

Microsoft’s Project Scorpio: More Hardware Details Revealed


This news piece contains speculation and suggests a silicon implementation based on released products and roadmaps. The only elements confirmed for Project Scorpio are the eight x86 cores, 6 TFLOPS, 326 GB/s of memory bandwidth, that it’s built by AMD, and that it is coming in 2017. If anyone wants to officially correct any speculation, please get in touch.

One of the critical points of contention with consoles, especially when viewed through the lens of the PC enthusiast, is the hardware specifications. Consoles have long development cycles, and are thus already behind the curve at launch – and that gap only grows over time as the life-cycle of the console is anywhere from five to seven years. The trade-off is usually that the console is an optimized platform, particularly for software: performance is a known quantity and it is much easier to optimize for.

For ten months or so now, Microsoft has been teasing its next-generation console. Aside from launching the Xbox One S as a minor mid-generation revision to the Xbox One, Microsoft is aiming for the next-generation ‘Project Scorpio’ to be the most powerful console available. While this is a commendable aspiration (one that would look odd if it weren’t achieved), the meat and potatoes of the hardware discussion have remained relatively unknown. Now some of those details have surfaced through a PR reveal with Eurogamer’s Digital Foundry.

We know the aim with Project Scorpio is to support 4K playback (4K UHD Blu-ray) as well as a substantial amount of 4K gaming. With ‘VR-capable’ hardware in the PC space recently coming down in price, Microsoft has room to be selective about the hardware it sources. This generation was expected to rely once again on AMD’s semi-custom business, given that high-end consoles are now built on x86 and Intel’s custom foundry business is still being brought up (and is also expected to be expensive). Pairing an AMD CPU with an AMD GPU is the sensible choice here, especially with AMD having launched a new GPU architecture, Polaris, last year.

Here’s a table summarizing what was revealed:

Microsoft Console Specification Comparison
                            Xbox 360            Xbox One            Project Scorpio
CPU Cores/Threads           3/6                 8/8                 8/?
CPU Frequency               3.2 GHz             1.75 GHz            2.3 GHz
CPU µArch                   IBM PowerPC         AMD Jaguar          AMD Jaguar
Shared L2 Cache             1MB                 2 x 2MB             2 x 2MB?
GPU Cores                   -                   12 CUs (768 SPs)    40 CUs (2560 SPs)
GPU Frequency               -                   853 MHz             1172 MHz
Peak Shader Throughput      0.24 TFLOPS         1.31 TFLOPS         6 TFLOPS
Embedded Memory             10MB eDRAM          32MB eSRAM          None
Embedded Memory Bandwidth   32 GB/s             102-204 GB/s        None
System Memory               512MB GDDR3-1400    8GB DDR3-2133       12GB GDDR5 (6.8 Gbps)
System Memory Bus           128-bit             256-bit             384-bit
System Memory Bandwidth     22.4 GB/s           68.3 GB/s           326 GB/s
Manufacturing Process       Various             TSMC 28nm           TSMC 16nm

At a high level, we have eight ‘custom’ x86 cores running at 2.3 GHz for the CPU, and 40 compute units at 1172 MHz for the GPU. The SoC is paired with 12GB of GDDR5 on a 384-bit bus, giving 326GB/s of bandwidth. Storage is via a 1TB HDD, and the optical drive supports 4K UHD Blu-ray.
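The headline figures can be reproduced from unit counts and clocks. This is a minimal sketch assuming the standard GCN arrangement of 64 stream processors per CU, each retiring one fused multiply-add (counted as two FLOPs) per clock, and treating peak bandwidth as bus width in bytes times the per-pin data rate. The helper names are illustrative only and are not taken from any Microsoft or AMD tool.

```python
# Peak FP32 shader throughput and peak memory bandwidth from published specs.
# Assumption: GCN-style CUs with 64 stream processors each, and one fused
# multiply-add (counted as 2 FLOPs) per stream processor per clock.

SP_PER_CU = 64
FLOPS_PER_SP_PER_CLOCK = 2


def peak_tflops(compute_units: int, clock_mhz: float) -> float:
    """Peak single-precision TFLOPS for a GCN-style GPU."""
    stream_processors = compute_units * SP_PER_CU
    return stream_processors * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6 / 1e12


def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times the per-pin rate."""
    return bus_width_bits / 8 * gbps_per_pin


print(f"Xbox One GPU: {peak_tflops(12, 853):.2f} TFLOPS")    # 1.31
print(f"Scorpio GPU:  {peak_tflops(40, 1172):.2f} TFLOPS")   # 6.00
print(f"Xbox 360, 128-bit GDDR3-1400: {bandwidth_gbs(128, 1.4):.1f} GB/s")      # 22.4
print(f"Xbox One, 256-bit DDR3-2133:  {bandwidth_gbs(256, 2.133):.1f} GB/s")    # 68.3
print(f"Scorpio, 384-bit GDDR5 6.8 Gbps: {bandwidth_gbs(384, 6.8):.1f} GB/s")   # 326.4
```

The Xbox 360 is left out of the TFLOPS calculation because its GPU predates GCN and doesn't fit the 64-SPs-per-CU model.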

Let’s break this down with some explanation and predictions.

Eight Custom CPU Cores: But They’re Still Jaguar (or almost)

The Xbox One uses AMD’s Jaguar cores. These are simpler, low-power cores aimed at a low-performance profile and optimized for cost and power. In non-custom designs we saw these cores clocked above 2 GHz, but they were limited to 1.75 GHz in the Xbox One.

AMD technically has several cores potentially available for Scorpio: Excavator (Bulldozer-based, as seen on 28nm), Jaguar (also on 28nm), or Zen (seen on GlobalFoundries 14nm). Zen has returned AMD to the high end of x86 performance computing, offering high performance for reasonable power, but using it here would mean a very quick turnaround from its consumer launch only a month ago. And even if Microsoft could have chosen Zen for Scorpio within that time frame, porting the cores to TSMC’s 16nm process would have increased the base cost of the console.


A full shot of the motherboard in Scorpio. Source: Digital Foundry

In the Digital Foundry piece, Microsoft stated that the CPU portion of Scorpio offers a 31% performance gain over the Xbox One. This isn’t an IPC figure, just raw performance. Moving from Jaguar to Zen would have yielded a gain of more than 60%, whereas the frequency difference between the 2.3 GHz in Scorpio and the 1.75 GHz in the Xbox One works out to almost exactly 31%. So we are dealing with a Jaguar-style core (although perhaps modified).
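The frequency argument is easy to check. Assuming per-clock performance (IPC) is unchanged between the Xbox One and Scorpio cores, which is exactly the hypothesis being tested, raw CPU performance scales with clock speed alone:

```python
def percent_gain(new: float, old: float) -> float:
    """Percentage increase of `new` relative to `old`."""
    return (new / old - 1.0) * 100.0


XB1_CPU_GHZ = 1.75
SCORPIO_CPU_GHZ = 2.3

# With identical IPC, the clock ratio is the whole performance gain.
clock_gain = percent_gain(SCORPIO_CPU_GHZ, XB1_CPU_GHZ)
print(f"CPU clock gain: {clock_gain:.1f}%")  # 31.4%
```

A Zen-class core would stack an IPC uplift on top of any clock gain, which is why a total of about 31% points back toward a Jaguar-class design.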

That being said, this is a ‘custom’ x86 core. Microsoft could have requested specific IP blocks and features not present in the original Jaguar CPUs but found in newer designs such as Zen, power management techniques for example. A console typically shares DRAM between the CPU and GPU, so the customization might be something as simple as the CPU memory controller supporting GDDR5. So instead of seeing Zen come to consoles, we’re seeing another crack at Jaguar (or Jaguar+), revised for a smaller process node to keep overall costs down; given that the main focus of a console is the GPU, that’s entirely plausible.

40 Compute Units: Likely Based on AMD’s Polaris GPU Architecture

When it comes to the GPU side of the Scorpio SoC, things get a little more nebulous and interesting. Simply put, we have a pretty good idea that the GPU is based on AMD’s Polaris (GCN4) architecture, but this isn’t something Microsoft is confirming at this time.

While AMD continually moves forward with their GPU architectures every generation, the long development time of the consoles and Microsoft/Sony’s need to customize means that console GPUs can branch off of AMD’s architectures at any number of points. Paradoxically, they can even branch off from future architectures, which is what we saw for the PlayStation 4 Pro last year. There, Sony confirmed that they had used the core shader design from AMD’s forthcoming Vega architecture, which even now has yet to be released on the PC.

For reference, the original Xbox One and the Xbox One S use a GPU design based on AMD’s GCN 1.1 architecture, roughly equivalent to the Radeon HD 7790. Microsoft’s options for their newest console then are to stick with GCN 1.1, use Polaris (GCN 4), or, like Sony, use something at least in part based on Vega (GCN 5).

So what did Microsoft use as a base in Scorpio? Right now, based on admittedly limited information from Microsoft’s carefully orchestrated reveal, all signs point to Polaris. Scorpio supports newer features not found in GCN 1.1 such as Delta Color Compression, which is a dead giveaway it’s based on something newer. At the same time, there is no mention in Microsoft’s reveal of any Vega-level features like rapid packed math (2xFP16) or a programmable geometry pipeline. As a result, the safe bet right now is that we’re looking at something principally derived from Polaris.

Now there is a bit of nuance here, as AMD’s GPU architecture is offered piecemeal: the shader cores, the memory controllers, the display controllers, etc. are all separate blocks that can be mixed and matched. This is how the PS4 Pro uses just parts of Vega. So it’s entirely possible that there are other bits and pieces in Scorpio that are newer than Polaris; however, the all-important shader cores and ROP backends clearly point to Polaris.

Diving into the specs a bit deeper, we do have the clockspeeds and configurations for both the GPU and the memory. Scorpio’s GPU is a 40 CU (2,560 SP) wide design – a bit wider than the Radeon RX 480 – which is a rather extensive upgrade over the original Xbox One. Ignoring clockspeeds for the moment (more in a sec), just the CU count itself is 3.33 times the 12 CUs in the original XB1. Similarly, Microsoft has doubled the number of ROP backends from 16 to 32. The ROP change is badly needed in order for Microsoft to reach their 4K goal, and it has been a pretty universal suspicion that the original XB1’s 16 ROPs were a big part of the reason that major multiplatform games tend to go with 900p instead of a native 1080p.
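The unit-count ratios, and why 16 ROPs was a sore point, can be put in numbers. This sketch takes 900p to mean 1600x900 (an assumption; the exact resolution isn't stated) and compares pixels per frame, since ROP work scales roughly with the number of pixels written:

```python
XB1 = {"cus": 12, "rops": 16}
SCORPIO = {"cus": 40, "rops": 32}


def pixels(width: int, height: int) -> int:
    """Pixels per frame at a given resolution."""
    return width * height


cu_ratio = SCORPIO["cus"] / XB1["cus"]      # 40 / 12
rop_ratio = SCORPIO["rops"] / XB1["rops"]   # 32 / 16

# Native 1080p versus the 900p many XB1 multiplatform titles settled on
p1080_vs_900 = pixels(1920, 1080) / pixels(1600, 900)

print(f"CU count:  {cu_ratio:.2f}x")                        # 3.33x
print(f"ROP count: {rop_ratio:.2f}x")                       # 2.00x
print(f"1080p has {p1080_vs_900:.2f}x the pixels of 900p")  # 1.44x
```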

Meanwhile on the clockspeed front, the new GPU is clocked at 1172MHz, giving Microsoft 6 TFLOPS right on the dot. This is a 37% clockspeed increase over the original XB1, and a 28% increase over the XB1S, which received a slight clockspeed bump of its own. These clockspeeds are well within the range of what the Polaris architecture can offer, and while not as conservative as Sony’s design choices, should still be reasonably power efficient, though I’m very much interested in seeing what total power consumption is like.
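Two pieces of arithmetic sit behind this paragraph: the clock at which 2,560 SPs reach exactly 6 TFLOPS, and the percentage increases. The Xbox One S GPU clock of 914 MHz used below is the commonly reported figure and is not stated in this article, so treat it as an assumption.

```python
SCORPIO_SPS = 2560
XB1_GPU_MHZ = 853
XB1S_GPU_MHZ = 914        # assumption: commonly reported XB1S clock, not from the article
SCORPIO_GPU_MHZ = 1172


def percent_gain(new: float, old: float) -> float:
    """Percentage increase of `new` relative to `old`."""
    return (new / old - 1.0) * 100.0


# Clock at which 2,560 SPs x 2 FLOPs per clock deliver exactly 6 TFLOPS
clock_for_6tf_mhz = 6e12 / (SCORPIO_SPS * 2) / 1e6

print(f"Clock for 6.0 TFLOPS: {clock_for_6tf_mhz:.3f} MHz")             # 1171.875 MHz
print(f"vs XB1:  +{percent_gain(SCORPIO_GPU_MHZ, XB1_GPU_MHZ):.0f}%")   # +37%
print(f"vs XB1S: +{percent_gain(SCORPIO_GPU_MHZ, XB1S_GPU_MHZ):.0f}%")  # +28%
```

In other words, 1172 MHz looks like 6 TFLOPS divided by 5,120 FLOPs per clock, rounded up to the nearest megahertz.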

More importantly, combined with the much wider GPU, the impact on the various throughput metrics is staggering. Shader/texture throughput will be 4.58x the original XB1, and ROP throughput will be 2.75x. Microsoft had a very large gap to close from the original Xbox One if they wanted to do 4K, and they have certainly put together a design large enough to help close it. That said, with performance that is, on paper, slightly ahead of a Radeon RX 480, I expect we’re still going to see some compromises to consistently hit Microsoft’s 4K goal. 6 TFLOPS often isn’t enough for native 4K at current image quality levels, which means developers will have to resort to some clever optimizations or image scaling.
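Throughput scales with unit count times clock, so the multipliers follow directly from the spec table. The last line compares them with the pixel increase from a 900p XB1 title (1600x900 assumed) to native 4K, under the simplistic assumption that per-pixel cost stays the same; it illustrates why compromises are expected:

```python
def throughput_ratio(units_new: int, mhz_new: float,
                     units_old: int, mhz_old: float) -> float:
    """Ratio of (units x clock) between two GPU configurations."""
    return (units_new * mhz_new) / (units_old * mhz_old)


shader_ratio = throughput_ratio(40, 1172, 12, 853)   # CUs x MHz
rop_ratio = throughput_ratio(32, 1172, 16, 853)      # ROPs x MHz

# Pixel count of native 4K versus 900p
pixels_4k_vs_900p = (3840 * 2160) / (1600 * 900)

print(f"Shader/texture throughput: {shader_ratio:.2f}x")       # 4.58x
print(f"ROP throughput:            {rop_ratio:.2f}x")          # 2.75x
print(f"4K vs 900p pixels:         {pixels_4k_vs_900p:.2f}x")  # 5.76x
```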

Now when it comes to feeding the beast, things take a very interesting turn. Scorpio comes with 12GB of GDDR5 attached to a 384-bit memory bus. This is as opposed to the original Xbox One, which used 8GB of DDR3 on a 256-bit bus, coupled with 32MB of eSRAM on the SoC itself. Swapping out the DDR3 + eSRAM for GDDR5 makes a lot of sense in the long run, as GDDR5 (as configured on Scorpio) offers 3.2x the bandwidth per pin of the DDR3. Microsoft scaled up the GPU, so they needed to scale up the memory that feeds it as well.
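The per-pin comparison, and how it combines with the 50% wider bus, works out as follows. Data rates are in Gbps per pin (DDR3-2133 runs at 2.133 Gbps), and only the XB1’s DDR3 pool is counted; its 32MB of eSRAM is left out, as in the text.

```python
DDR3_GBPS = 2.133    # XB1 DDR3-2133, per pin
GDDR5_GBPS = 6.8     # Scorpio GDDR5, per pin
XB1_BUS_BITS = 256
SCORPIO_BUS_BITS = 384

per_pin = GDDR5_GBPS / DDR3_GBPS
bus_scale = SCORPIO_BUS_BITS / XB1_BUS_BITS
total = per_pin * bus_scale

print(f"Per-pin bandwidth: {per_pin:.1f}x")    # 3.2x
print(f"Bus width:         {bus_scale:.1f}x")  # 1.5x
print(f"Total bandwidth:   {total:.1f}x")      # 4.8x
```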

What makes things especially interesting, though, is that Microsoft didn’t just switch out DDR3 for GDDR5; they’re using a wider memory bus as well, expanding it by 50% to 384 bits. Not only does this further expand the console’s memory bandwidth, now to a total of 326GB/sec or 4.8x the XB1’s DDR3, but it also creates an odd mismatch between the ROP backends and the memory bus. Briefly, the ROP backends and memory bus are typically balanced in a GPU so that each memory controller feeds one or two ROP partitions. In this case, however, we have a 384-bit bus feeding 32 ROPs, which doesn’t map cleanly.
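The mismatch can be illustrated with a simple divisibility check. This sketch assumes GCN-style 64-bit memory controllers and ROP partitions (render back-ends) of four ROPs each; those block sizes are assumptions for illustration, while the one-or-two-partitions-per-controller rule is the one described above.

```python
MC_WIDTH_BITS = 64   # assumed width of one memory controller
ROPS_PER_RBE = 4     # assumed ROPs per render back-end (ROP partition)


def maps_cleanly(bus_bits: int, rops: int) -> bool:
    """True if every memory controller feeds exactly one or two ROP partitions."""
    controllers = bus_bits // MC_WIDTH_BITS
    partitions = rops // ROPS_PER_RBE
    return partitions in (controllers, 2 * controllers)


# Radeon RX 480 (Polaris 10): 256-bit bus, 32 ROPs -> 4 controllers, 8 partitions
print(maps_cleanly(256, 32))   # True
# Project Scorpio: 384-bit bus, 32 ROPs -> 6 controllers, 8 partitions
print(maps_cleanly(384, 32))   # False
```

When the check fails, the ROPs can’t each be wired to a fixed slice of the bus, which is where a crossbar comes in.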

What this means is that at some level, Microsoft is running an additional memory crossbar in the SoC, very similar to what AMD did back in 2012 with the Radeon HD 7970. Because the console SoC needs to split its memory bandwidth between the CPU and the GPU, things aren’t as cut and dried here as they are with discrete GPUs. But at a high level, what we saw from the 7970 is that the extra bandwidth + crossbar setup did not offer much of a benefit over a straight-connected, lower-bandwidth configuration, and AMD has never done it again in their dGPUs. So I think it will be very interesting to see if developers can consistently consume more than 218GB/sec or so of bandwidth using the GPU.
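The ~218GB/sec figure appears to correspond to what a 256-bit bus, the width that would pair cleanly with 32 ROPs, would deliver at Scorpio’s 6.8 Gbps. That reading is our inference rather than something Microsoft stated; the arithmetic is:

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


full_bus = bandwidth_gbs(384, 6.8)       # Scorpio as announced
straight_bus = bandwidth_gbs(256, 6.8)   # hypothetical straight-connected 256-bit bus

print(f"384-bit: {full_bus:.1f} GB/s")                 # 326.4 GB/s
print(f"256-bit: {straight_bus:.1f} GB/s")             # 217.6 GB/s
print(f"Extra:   {full_bus - straight_bus:.1f} GB/s")  # 108.8 GB/s
```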

Finally, while not touched upon in great detail in Microsoft’s reveal, it’s clear that the GPU portion of Scorpio is otherwise fully modern with respect to its video and display blocks. This doesn’t come as much of a surprise, as it’s necessary to support the 4K UHD Blu-ray standard, and indeed the Xbox One S is already in the same boat. So that means we’re looking at full 4Kp60 HEVC decode with HDMI 2.0 out.

Designing for 16nm at TSMC

If we move forward with a Jaguar plus Polaris prediction, both designs will have to be reconfigured for TSMC’s 16nm process. For the Jaguar-based CPU, that would mean much lower power than on 28/32nm, as well as a much smaller die area, so an 8-core Jaguar design might account for only 10-15% of the entire chip. The GPU will likely be ported on similar terms, although with a wider memory bus and more CUs than Polaris 10 (44 in the design, 40 in use).

AMD recently took on additional quarterly costs for using foundries other than GlobalFoundries (as per their renegotiated wafer supply agreement), which a number of analysts chalked up to future server designs being made elsewhere. A few of us postulated that it has more to do with AMD’s semi-custom business, and either way it points to AMD silicon being redesigned for TSMC’s 16nm process.

Digital Foundry reported that the combined CPU and GPU die measures 360mm² and contains seven billion transistors, with four shader engines each containing 11 compute units (one of which is disabled per engine). It was also mentioned that the floor plan of the silicon, alongside the four groups of 11 CUs, has two clusters of four CPU cores.

Given the use of Jaguar, the CPUs are a tiny chunk of the die area, probably under one fifth of the chip. We don’t know the size of the GPU, but Polaris 10, with 36 CUs on GloFo’s 14nm process, measures 232mm² with 5.7 billion transistors. Scaled up to 40 CUs, this is around 257mm², leaving roughly 100mm² to account for the lower density of TSMC’s 16nm process, the CPU cores, the memory controller, and other IO.
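The area estimate above is a straight linear scaling of the whole Polaris 10 die by CU count. A minimal sketch of that scaling follows. Since the die physically carries 44 CUs (four disabled), scaling by 44 rather than 40 may be the fairer comparison, so both are computed. Note that the Polaris 10 figures already include its own memory controllers, display and IO blocks, so this crude approach likely overstates the GPU’s share.

```python
from typing import Tuple

POLARIS10_MM2 = 232.0           # 36 CUs, GlobalFoundries 14nm
POLARIS10_TRANSISTORS_B = 5.7
POLARIS10_CUS = 36
SCORPIO_MM2 = 360.0             # whole SoC, per Digital Foundry


def scaled_polaris(cus: int) -> Tuple[float, float]:
    """Naively scale the whole Polaris 10 die (area, transistors) by CU count."""
    factor = cus / POLARIS10_CUS
    return POLARIS10_MM2 * factor, POLARIS10_TRANSISTORS_B * factor


for cus in (40, 44):   # 40 enabled, 44 physically on the die
    area, transistors = scaled_polaris(cus)
    leftover = SCORPIO_MM2 - area
    print(f"{cus} CUs: ~{area:.1f}mm², ~{transistors:.2f}B transistors, "
          f"~{leftover:.1f}mm² left for CPU, IO and density differences")
```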

Microsoft also states that the unit’s power supply is rated for up to 245W. If we assume a low-frequency Jaguar CPU inside, that could draw around 25W max, leaving 150-220W for the GPU. A full-sized RX 480 comes in at 150W, and since this GPU is a little wider than that, it might land nearer 170W (or be tuned to 100-150W, depending on the base frequencies). In a Jaguar + Polaris configuration, the power supply seems to have a good 20-25% of power budget in hand.
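The 20-25% headroom figure follows from the estimates above. Like the text, this sketch ignores memory, storage, fan and PSU efficiency losses; the two ways of expressing the spare capacity give the two ends of the range:

```python
PSU_W = 245   # Microsoft's stated PSU rating
CPU_W = 25    # article's estimate for a low-clocked 8-core Jaguar
GPU_W = 170   # article's estimate for the 40-CU GPU

load_w = CPU_W + GPU_W
spare_w = PSU_W - load_w

print(f"Estimated SoC load: {load_w} W, spare: {spare_w} W")
print(f"Spare as share of PSU rating: {spare_w / PSU_W:.0%}")   # 20%
print(f"Spare relative to the load:   {spare_w / load_w:.0%}")  # 26%
```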


Source: Digital Foundry

Based on the discussion in the source, it would seem that AMD is implementing a good number of its power-saving features from Excavator and Zen, particularly unique DVFS profiles for each silicon die as it comes off the production line, rather than a one-size-fits-all approach. The silicon will also be paired with a vapor chamber cooler using a custom centrifugal fan.
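To illustrate why per-die profiles help, here is a minimal sketch under stated assumptions: dynamic power scales with V²·f, each die has a measured minimum stable voltage at the target clock, and a fixed guardband is added on top. The voltages and the guardband are hypothetical values, and this is not a description of AMD’s or Microsoft’s actual characterization process.

```python
from typing import List

GUARDBAND_V = 0.025   # hypothetical safety margin added on top of a measured Vmin


def shared_voltage(vmins: List[float]) -> float:
    """One-size-fits-all: every die runs at the worst die's requirement."""
    return max(vmins) + GUARDBAND_V


def per_die_voltage(vmin: float) -> float:
    """Per-die profile: each die runs at its own measured minimum plus margin."""
    return vmin + GUARDBAND_V


def relative_dynamic_power(v: float, v_ref: float) -> float:
    """Dynamic power at the same clock relative to v_ref, using P ~ C * V^2 * f."""
    return (v / v_ref) ** 2


# Hypothetical minimum stable voltages measured for three dies at one target clock
vmins = [0.95, 1.00, 1.05]
v_shared = shared_voltage(vmins)
for vmin in vmins:
    v = per_die_voltage(vmin)
    saving = 1 - relative_dynamic_power(v, v_shared)
    print(f"Vmin {vmin:.2f} V: runs at {v:.3f} V instead of {v_shared:.3f} V, "
          f"~{saving:.0%} less dynamic power")
```

Only the best dies benefit; the worst die ends up exactly where a fleet-wide setting would have put it.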

What We Don’t Know

Hardware aside, the launch titles will be an interesting story in their own right, especially given the recent closures of dedicated MS studios such as Lionhead.

Project Scorpio is due out in Fall / Q3 2017.

This article originally predicted a Zen + Polaris configuration, but following further analysis it now predicts Jaguar + Polaris.

 

Source: Digital Foundry


Microsoft and Qualcomm Collaborate to Bring Windows 10 & x86 Emulation to Snapdragon Processors


Today at Microsoft’s WinHEC event in Shenzhen, China, the company announced that it’s working with Qualcomm to bring the full Windows 10 experience to future devices powered by Snapdragon processors. Terry Myerson, executive vice president of the Windows and Devices Group at Microsoft, is “excited to bring Windows 10 to the ARM ecosystem” and looks forward to bringing “Windows 10 to life with a range of thin, light, power-efficient and always-connected devices,” which may include anything from smartphones to tablets to ultraportable laptops to servers. These new Snapdragon-powered devices should support all things Microsoft, including Microsoft Office, Windows Hello, Windows Pen, and the Edge browser, alongside third-party Universal Windows Platform (UWP) apps and, most interestingly, x86 (32-bit) Win32 apps. They should even be able to play Crysis 2.

This announcement fits nicely with Microsoft’s “Windows Everywhere” doctrine and should come as no surprise. It’s not even the first time we’ve seen Windows running on ARM processors. Microsoft’s failed Windows RT operating system was a modified version of Windows 8 that targeted the ARMv7-A 32-bit architecture. It grew from Microsoft’s MinWin effort to make Windows more modular by reorganizing the operating system and cleaning up API dependencies.

This work first surfaced in Windows Server 2008, which could be installed with a stripped-down, command-line only interface that did not include components such as Internet Explorer that were not necessary for specific server roles. Windows RT also leveraged the newer Windows Runtime (WinRT) API that offered several new features such as digitally signed app packages distributed through the centralized Windows Store and the ability to run apps within a sandbox. It also made it easier for software developers to target multiple CPU architectures. However, Microsoft’s rework of Windows was not yet complete, leaving Windows RT with a bunch of legacy Win32 code that went unused. It also could not run Win32 desktop apps, severely limiting the number of available apps to only those using WinRT and distributed through the Windows Store.

MinWin and its derivatives have continued to evolve over the past few years, getting a major boost in 2013 when Microsoft reorganized its disparate software platforms into a single Operating Systems Engineering Group. The end result is Windows 10, a modular OS that can run on anything from low-powered IoT devices to high-performance workstations and servers. Its foundation is OneCore, MinWin’s direct descendant, which includes only the operating system kernel and the components essential for any hardware platform. OneCore UAP (Universal App Platform) is another major module for Windows 10 whose groundwork was laid during the creation of Windows Phone and Windows RT. It provides support for Universal Windows Apps and Drivers, along with more advanced features such as the Edge browser and DirectX. On top of these modules, Microsoft can add modules targeting specific device families (desktop, mobile, Xbox, HoloLens, etc.) that provide specialized features and shells.

Also included in OneCore UAP is Universal Windows Platform (UWP). An extension of the WinRT API used in Windows 8, it allows developers to create universal apps that are CPU architecture agnostic and can run on multiple devices, seamlessly adapting their user interface and input methods to the hardware they’re running on. With UWP, the architecutre independence is achieved by having pre-compiled versions for each platform available from the Store, which will then download and install the correct version for the individual device. The major change with today’s announcement over Windows RT and UWP is that x86 apps will be able to run on Qualcomm’s ARM-based SoCs, along with support for all of the peripherals that are already supported with Windows 10. This alone is a huge change from Windows RT, which would only work with a small subset of peripherals.

Microsoft is also focusing on having these devices always connected through cellular, which is something that is not available for many PCs at the moment. Support will be available for eSIM to avoid having to find room in a cramped design to accomodate a physical SIM, and Microsoft is going so far as to call these “cellular PCs” meaning they are expecting broad support for this class of computer, rather than the handful available now with cellular connectivity.

The ability to run x86 Win32 apps on ARM will come through emulation, and to demonstrate the performance Microsoft has released a video of an ARM PC running Photoshop.

This of course raises several questions, few-if-any of which Microsoft is willing to answer. Intel has long exerted strong control over the x86 ISA, limiting or outright preventing competitors like NVIDIA from implementing x86 support. So how Microsoft and Qualcomm are able to (for lack of a better way to put it) get away with this is a big question. Certainly there’s no indication right now that this has Intel’s formal blessing.

The key points here are that this is a form of software emulation – Microsoft even calls it as much – and that only 32-bit x86 support is being offered. On the former, this means that there’s no hardware execution of x86 instructions taking place – though Microsoft and Qualcomm are certainly lining up instructions as best they can – which avoids many of the obvious patent pitfalls of doing x86 in hardware, and puts it in the same category as other x86 emulation mechanisms like DOSBox and QEMU. Meanwhile only supporting 32-bit x86 code further rolls back the clock, as the most important of those instructions are by now quite old, x86 having made the jump to 64-bit x86-64 back in 2003. So it may very well be that it’s easier to avoid any potential legal issues by sticking with 32-bit code, though that’s supposition on our part. In any case it will be interesting to see what instructions Microsoft’s emulator supports, and whether newer instructions and instruction set extensions (e.g SSE2) are supported in some fashion.

Of course, the performance of this solution remains to be seen. x86 is not easy or cheap to emulate, and an “emulator” as opposed to a Denver-like instruction translation makes that all the harder. On the other hand, while maximizing x86 compatibility is great for Microsoft and Qualcomm, what they really need x86 for is legacy applications, which broadly speaking aren’t performance-critical. So while x86 on a phone/tablet ARM SoC may not be fast, it need only be “good enough.”

In any case, Windows 10’s ability to scale and adapt to essentially any hardware platform is a remarkable feat of engineering, and it’s what makes today’s joint announcement with Qualcomm possible. The first devices with Snapdragon SoCs running the full Windows 10 experience should be available in the second half of 2017.

It will be interesting to see what shape these devices take and which companies produce them. Some new lower-cost, full-featured Windows 10 tablets would be a welcome addition, and Qualcomm has its eyes on the low-powered server market too with its Centriq product family. A Windows 10 smartphone with a Snapdragon SoC is also likely, but with Windows Phone 8 holding less than 1% global market share, according to Gartner, Microsoft is essentially starting from scratch. Will the benefits of universal apps be enough to lure software developers and users of other Windows products away from Android and iOS? Can Windows 10 reestablish Microsoft as a major player in the smartphone market, or is the hole it has dug over the past decade too deep?

Microsoft and Qualcomm Collaborate to Bring Windows 10 & x86 Emulation to Snapdragon Processors

Today at Microsoft’s WinHEC event in Shenzhen, China, the company announced that it’s working with Qualcomm to bring the full Windows 10 experience to future devices powered by Snapdragon processors. Terry Myerson, executive vice president of the Windows and Devices Group at Microsoft, is “excited to bring Windows 10 to the ARM ecosystem” and looks forward to bringing “Windows 10 to life with a range of thin, light, power-efficient and always-connected devices,” which may include anything from smartphones to tablets to ultraportable laptops to servers. These new Snapdragon-powered devices should support all things Microsoft, including Microsoft Office, Windows Hello, Windows Pen, and the Edge browser, alongside third-party Universal Windows Platform (UWP) apps and, most interestingly, x86 (32-bit) Win32 apps. They should even be able to play Crysis 2.

This announcement fits nicely with Microsoft’s “Windows Everywhere” doctrine and should come as no surprise. It’s not even the first time we’ve seen Windows running on ARM processors. Microsoft’s failed Windows RT operating system was a modified version of Windows 8 that targeted the ARMv7-A 32-bit architecture. It grew from Microsoft’s MinWin effort to make Windows more modular by reorganizing the operating system and cleaning up API dependencies.

This work first surfaced in Windows Server 2008, which could be installed with a stripped-down, command-line-only interface that omitted components, such as Internet Explorer, that were not necessary for specific server roles. Windows RT also leveraged the newer Windows Runtime (WinRT) API, which offered several new features such as digitally signed app packages distributed through the centralized Windows Store and the ability to run apps within a sandbox. It also made it easier for software developers to target multiple CPU architectures. However, Microsoft’s rework of Windows was not yet complete, leaving Windows RT with a bunch of legacy Win32 code that went unused. It also could not run third-party Win32 desktop apps, limiting the available apps to those built on WinRT and distributed through the Windows Store.

MinWin and its derivatives have continued to evolve over the past few years, getting a major boost in 2013 when Microsoft reorganized its disparate software platforms into a single Operating Systems Engineering Group. The end result is Windows 10, a modular OS that can run on anything from low-powered IoT devices to high-performance workstations and servers. Its foundation is OneCore, MinWin’s direct descendant, which includes only the operating system kernel and the components essential for any hardware platform. OneCore UAP (Universal App Platform) is another major Windows 10 module whose groundwork was laid during the creation of Windows Phone and Windows RT. It provides support for Universal Windows Apps and Drivers, along with more advanced features such as the Edge browser and DirectX. On top of these modules, Microsoft can add modules that target specific device families (desktop, mobile, Xbox, HoloLens, etc.) and provide specialized features and shells.

Also included in OneCore UAP is the Universal Windows Platform (UWP). An extension of the WinRT API used in Windows 8, it allows developers to create universal apps that are CPU-architecture agnostic and can run on multiple devices, seamlessly adapting their user interface and input methods to the hardware they’re running on. With UWP, architecture independence is achieved by making pre-compiled versions for each platform available from the Store, which then downloads and installs the correct version for the individual device. The major change with today’s announcement, compared with Windows RT and UWP, is that x86 apps will be able to run on Qualcomm’s ARM-based SoCs, along with support for all of the peripherals already supported by Windows 10. This alone is a huge change from Windows RT, which only worked with a small subset of peripherals.
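As a rough illustration of the per-architecture selection described above, here is a hypothetical sketch. The `select_package` function, the bundle layout and the file names are invented for illustration; the Store's real deployment logic is more involved and is not described in the announcement. The sketch prefers a build compiled for the device's own CPU, falls back to an architecture-neutral package (the AppX format does allow a `neutral` processor architecture), and otherwise offers nothing.

```python
from typing import Mapping, Optional


def select_package(packages: Mapping[str, str], device_arch: str) -> Optional[str]:
    """Pick the pre-compiled package to install on a device (hypothetical sketch)."""
    # 1. Prefer the build compiled natively for the device's CPU architecture.
    if device_arch in packages:
        return packages[device_arch]
    # 2. Otherwise fall back to an architecture-neutral build, if one exists.
    if "neutral" in packages:
        return packages["neutral"]
    # 3. No compatible build: the app would not be offered for this device.
    return None


# Hypothetical bundle with one pre-compiled package per architecture
bundle = {
    "x86": "MyApp_x86.appx",
    "x64": "MyApp_x64.appx",
    "arm": "MyApp_arm.appx",
}

for arch in ("x64", "arm", "arm64"):
    print(arch, "->", select_package(bundle, arch))
```

Note that nothing in this selection step runs x86 code on ARM: an app with no matching build is simply unavailable, which is the gap the new emulation layer is meant to close for x86 Win32 apps.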

Microsoft is also focusing on keeping these devices always connected through cellular, something that is not available for many PCs at the moment. Support will be available for eSIM, avoiding the need to find room in a cramped design to accommodate a physical SIM, and Microsoft is going so far as to call these “cellular PCs”, meaning it expects broad support for this class of computer rather than the handful of cellular-connected PCs available now.

The ability to run x86 Win32 apps on ARM will come through emulation, and to demonstrate its performance, Microsoft has released a video of an ARM PC running Photoshop.

This of course raises several questions, few if any of which Microsoft is willing to answer. Intel has long exerted strong control over the x86 ISA, limiting or outright preventing competitors like NVIDIA from implementing x86 support. So how Microsoft and Qualcomm are able to (for lack of a better way to put it) get away with this is a big question. Certainly there’s no indication right now that this has Intel’s formal blessing.

The key points here are that this is a form of software emulation – Microsoft even calls it as much – and that only 32-bit x86 support is being offered. On the former, this means that there’s no hardware execution of x86 instructions taking place – though Microsoft and Qualcomm are certainly lining up instructions as best they can – which avoids many of the obvious patent pitfalls of doing x86 in hardware, and puts it in the same category as other x86 emulation mechanisms like DOSBox and QEMU. Meanwhile, only supporting 32-bit x86 code further rolls back the clock, as the most important of those instructions are by now quite old, x86 having made the jump to 64-bit x86-64 back in 2003. So it may very well be that it’s easier to avoid any potential legal issues by sticking with 32-bit code, though that’s supposition on our part. In any case, it will be interesting to see what instructions Microsoft’s emulator supports, and whether newer instructions and instruction set extensions (e.g. SSE2) are supported in some fashion.
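Microsoft has not described how its emulator works internally. To make “software emulation” concrete, the toy interpreter below handles a tiny subset of real 32-bit x86 encodings: `MOV r32, imm32` (opcodes `B8`–`BF`), the register-to-register forms of `ADD r/m32, r32` (`01`) and `SUB r/m32, r32` (`29`), and `HLT` (`F4`). Every guest instruction is fetched, decoded and executed by ordinary host code, and the 32-bit register width is enforced by masking. It is an illustration only, not Microsoft’s design; `emulate` is a hypothetical name.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around at 2**32
# x86 register numbering used in the opcode and ModRM fields
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def emulate(code: bytes) -> List[int]:
    """Interpret a tiny subset of 32-bit x86 machine code, one instruction at a time."""
    regs = [0] * 8
    ip = 0  # guest instruction pointer (offset into code)
    while ip < len(code):
        opcode = code[ip]                        # fetch
        if 0xB8 <= opcode <= 0xBF:               # decode: MOV r32, imm32
            # Destination register is in the low 3 bits; a little-endian imm32 follows.
            regs[opcode - 0xB8] = int.from_bytes(code[ip + 1:ip + 5], "little")
            ip += 5
        elif opcode in (0x01, 0x29):             # decode: ADD / SUB r/m32, r32
            modrm = code[ip + 1]
            if modrm >> 6 != 0b11:
                raise NotImplementedError("memory operands are not handled in this sketch")
            src = (modrm >> 3) & 0b111           # ModRM.reg: source register
            dst = modrm & 0b111                  # ModRM.rm: destination register
            if opcode == 0x01:
                regs[dst] = (regs[dst] + regs[src]) & MASK32  # execute ADD
            else:
                regs[dst] = (regs[dst] - regs[src]) & MASK32  # execute SUB
            ip += 2
        elif opcode == 0xF4:                     # HLT: stop emulation
            break
        else:
            raise NotImplementedError(f"unsupported opcode {opcode:#04x} at offset {ip}")
    return regs


program = bytes([
    0xB8, 0x05, 0x00, 0x00, 0x00,  # mov eax, 5
    0xB9, 0xFF, 0xFF, 0xFF, 0xFF,  # mov ecx, 0xFFFFFFFF
    0x01, 0xC8,                    # add eax, ecx  (ModRM 11 001 000); eax wraps to 4
    0xF4,                          # hlt
])
final = emulate(program)
print({name: hex(value) for name, value in zip(REG_NAMES, final)})
```

Even this trivial loop shows where the overhead comes from: several host operations (fetch, bit-field extraction, dispatch, masking) are spent on every single guest instruction.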

Of course, the performance of this solution remains to be seen. x86 is not easy or cheap to emulate, and an “emulator” as opposed to a Denver-like instruction translation makes that all the harder. On the other hand, while maximizing x86 compatibility is great for Microsoft and Qualcomm, what they really need x86 for is legacy applications, which broadly speaking aren’t performance-critical. So while x86 on a phone/tablet ARM SoC may not be fast, it need only be “good enough.”
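The cost argument above can be sketched in a few lines. This uses a made-up three-field instruction format rather than x86, and every name (`interpret`, `translate`, `TranslationCache`) is hypothetical. An interpreter re-decodes each instruction every time it runs, while a translator, loosely in the spirit of NVIDIA Denver’s dynamic translation, decodes a block once, caches host code for it (Python closures here) and replays that. Real translators also optimize the code they generate, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple, Union

MASK32 = 0xFFFFFFFF
Operand = Union[str, int]          # register name or immediate value
Instr = Tuple[str, str, Operand]   # (opcode, destination register, source operand)
Regs = Dict[str, int]
Step = Callable[[Regs], None]


def interpret(block: List[Instr], regs: Regs, stats: Dict[str, int]) -> None:
    """Interpreter: decode and execute every instruction each time the block runs."""
    for op, dst, src in block:
        stats["decodes"] += 1
        value = regs[src] if isinstance(src, str) else src
        if op == "add":
            regs[dst] = (regs[dst] + value) & MASK32
        elif op == "mov":
            regs[dst] = value & MASK32
        else:
            raise ValueError(f"unsupported opcode {op!r}")


def _make_step(op: str, dst: str, src: Operand) -> Step:
    """Turn one decoded instruction into a host closure; the operand kind is resolved now."""
    if isinstance(src, str):
        reg_name = src
        read: Callable[[Regs], int] = lambda regs: regs[reg_name]
    else:
        imm = src
        read = lambda regs: imm
    if op == "add":
        def step(regs: Regs) -> None:
            regs[dst] = (regs[dst] + read(regs)) & MASK32
    elif op == "mov":
        def step(regs: Regs) -> None:
            regs[dst] = read(regs) & MASK32
    else:
        raise ValueError(f"unsupported opcode {op!r}")
    return step


def translate(block: List[Instr], stats: Dict[str, int]) -> Step:
    """Translator: decode the whole block once and return replayable host code."""
    steps = []
    for op, dst, src in block:
        stats["decodes"] += 1
        steps.append(_make_step(op, dst, src))

    def run_block(regs: Regs) -> None:
        for step in steps:
            step(regs)
    return run_block


class TranslationCache:
    """Translate each guest block on first execution, then reuse the cached result."""

    def __init__(self) -> None:
        self.blocks: Dict[int, Step] = {}
        self.stats: Dict[str, int] = {"decodes": 0}

    def execute(self, address: int, block: List[Instr], regs: Regs) -> None:
        if address not in self.blocks:
            self.blocks[address] = translate(block, self.stats)
        self.blocks[address](regs)


# The same guest block, run ten times by each approach
block: List[Instr] = [("add", "eax", 1), ("add", "ecx", "eax")]

interp_regs = {"eax": 0, "ecx": 0}
interp_stats = {"decodes": 0}
for _ in range(10):
    interpret(block, interp_regs, interp_stats)

cache = TranslationCache()
jit_regs = {"eax": 0, "ecx": 0}
for _ in range(10):
    cache.execute(0x1000, block, jit_regs)  # 0x1000: hypothetical guest address

print(interp_regs, interp_stats["decodes"])
print(jit_regs, cache.stats["decodes"])
```

Both approaches leave `eax = 10` and `ecx = 55`, but the interpreter performs 20 decodes while the translation cache performs only 2. Hot loops are where that difference compounds, which is why a pure emulator has the harder job.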

In any case, Windows 10’s ability to scale and adapt to essentially any hardware platform is a remarkable feat of engineering, and it’s what makes today’s joint announcement with Qualcomm possible. The first devices with Snapdragon SoCs running the full Windows 10 experience should be available in the second half of 2017.

It will be interesting to see what shape these devices take and which companies produce them. Some new lower-cost, full-featured Windows 10 tablets would be a welcome addition, and Qualcomm has its eyes on the low-powered server market too with its Centriq product family. A Windows 10 smartphone with a Snapdragon SoC is also likely, but with Windows Phone 8 holding less than 1% global market share, according to Gartner, Microsoft is essentially starting from scratch. Will the benefits of universal apps be enough to lure software developers and users of other Windows products away from Android and iOS? Can Windows 10 reestablish Microsoft as a major player in the smartphone market, or is the hole it has dug over the past decade too deep?

Microsoft Announces the Surface Studio: 28-inch AIO with Touch, Pen, 4500x3000, Skylake, GTX 980M

As part of the now-annual Microsoft Surface event, Panos Panay announced the next member of the Surface family, the Surface Studio. The Studio is a prosumer all-in-one that promises more functionality and versatility than any other desktop all-in-one PC, because it can also turn a desk into a studio.

Front and center in what makes the Studio impressive is the size of the display: a 28-inch thin-bezel LCD with a 3:2 aspect ratio, a 4500×3000 resolution and 192 pixels per inch. Compared with 4K UHD, this is 13.5 million pixels versus 8.3 million, and Microsoft is promoting True Scale with the Studio, such that two A4 pieces of paper can be rendered side by side at full resolution and at a higher DPI than most standard office printers. The display is 12.5mm thin, with Microsoft redesigning the LCD stack to ensure a slim profile.
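The resolution figures above can be checked with a few lines of arithmetic. The A4 check is our own reading of what True Scale implies: two portrait ISO 216 A4 sheets (210 by 297 mm) placed side by side at the quoted 192 PPI. Computing density from the nominal 28-inch diagonal gives roughly 193 PPI, slightly above Microsoft’s 192, presumably because the diagonal figure is rounded.

```python
import math

# Figures quoted for the Surface Studio panel
STUDIO_W, STUDIO_H = 4500, 3000
NOMINAL_DIAGONAL_IN = 28
QUOTED_PPI = 192

# Reference points: 4K UHD and ISO 216 A4 paper (portrait), in millimetres
UHD_W, UHD_H = 3840, 2160
A4_W_MM, A4_H_MM = 210, 297
MM_PER_INCH = 25.4


def mm_to_px(mm: float, ppi: float) -> float:
    """Convert a physical length in millimetres to pixels at a given density."""
    return mm / MM_PER_INCH * ppi


studio_pixels = STUDIO_W * STUDIO_H          # 13,500,000
uhd_pixels = UHD_W * UHD_H                   # 8,294,400
pixel_ratio = studio_pixels / uhd_pixels     # roughly 1.63x the pixels of UHD

# Density from the diagonal pixel count over the nominal diagonal size
computed_ppi = math.hypot(STUDIO_W, STUDIO_H) / NOMINAL_DIAGONAL_IN  # roughly 193

# Two portrait A4 sheets side by side at the quoted density
two_a4_w_px = mm_to_px(2 * A4_W_MM, QUOTED_PPI)  # roughly 3175 px
two_a4_h_px = mm_to_px(A4_H_MM, QUOTED_PPI)      # roughly 2245 px
fits_two_a4 = two_a4_w_px <= STUDIO_W and two_a4_h_px <= STUDIO_H

print(f"{studio_pixels:,} vs {uhd_pixels:,} pixels ({pixel_ratio:.2f}x)")
print(f"computed density: {computed_ppi:.1f} PPI")
print(f"two A4 sheets: {two_a4_w_px:.0f} x {two_a4_h_px:.0f} px, fits: {fits_two_a4}")
```

At 192 PPI the two sheets need only about 3175 by 2245 pixels, so they fit within the 4500×3000 panel with room to spare.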

The display connects to the base via a specialist hinge, featuring 80 machined parts on each side, which Microsoft calls the ‘Zero Gravity Hinge’. This allows the display to be moved smoothly to any plausible angle, as well as taking on extra weight in studio mode. The display has two buttons on the right-hand side for power and volume. On top of the display is the Windows Hello-enabled camera, with a 5.0 MP sensor capable of 1080p video (we assume at 30 FPS). The Studio supports the Surface Pen, which can attach to the side of the display.

For color reproduction, Microsoft is advertising the display as supporting both DCI-P3 and sRGB with a simple toggle on the Windows sidebar to switch between the two. While Microsoft says that the displays are calibrated for both, this has fundamental issues with color reproduction.

In the base is a set of arguably last-generation specifications: 6th generation (Skylake) Intel Core i5 and Core i7 processor options (probably 65W desktop parts?) paired with up to 32GB of DDR4 memory (probably DDR4-2133). Two of the three options come with an NVIDIA GTX 965M 2GB, and the high-end model gets a GTX 980M 4GB. Connectivity comes via USB 3.0, rather than USB 3.1/Thunderbolt. Storage is listed as ‘1TB or 2TB Rapid Hybrid Drive’ options, which in the presentation looked like an M.2 drive, but it has not yet been stated whether this is SATA or PCIe (or whether a Rapid Hybrid Drive actually means an SSHD).

Microsoft Surface Studio

| | Core i5 | Core i7 | Core i7 |
|---|---|---|---|
| CPU | Intel Core i5 (Skylake) | Intel Core i7 (Skylake) | Intel Core i7 (Skylake) |
| GPU | NVIDIA GTX 965M 2GB | NVIDIA GTX 965M 2GB | NVIDIA GTX 980M 4GB |
| DRAM | 8GB DDR4 | 16GB DDR4 | 32GB DDR4 |
| Storage | 1TB Rapid Hybrid Drive | 1TB Rapid Hybrid Drive | 2TB Rapid Hybrid Drive |
| Price | $2999 | $3499 | $4199 |

Common to all configurations:
Storage interface: unconfirmed (SATA? PCIe? SSHD?)
Display: 28-inch 4500×3000 LCD, 12.5mm thin, 10-point multitouch, magnetic pen support
Connectivity: 802.11ac WiFi (Intel AC 8260?), Gigabit Ethernet, Xbox Wireless
IO: 4 x USB 3.0, full-size SD card reader (SDXC), Mini DisplayPort, 3.5mm headset jack
Camera: 5MP front-facing, Windows Hello, 1080p recording
OS: Windows 10 Pro, 30-day Office trial
Dimensions: display 637.35 x 438.90 x 12.50 mm; base 250.00 x 200.00 x 32.2 mm
Weight: 9.56 kg / 21 lbs

Connectivity comes via four USB 3.0 ports, a full-size SD card reader, a mini DisplayPort output and a 3.5mm headset jack. WiFi is provided by an 802.11ac unit, although Microsoft does not say which one (I’d hazard a guess and say Intel’s AC8260 2×2 solution). The unit also supports Xbox Wireless, so Xbox controllers can be connected for gaming.

The whole unit weighs in at 21 lbs (9.56 kg), and Microsoft has stated that it will be available only in limited quantities during Q4, with an official release date of 15th December. The configurations available will be:

$2999 : Intel Core i5 (Skylake), 8 GB DDR4, 1TB, GTX 965M 2GB
$3499 : Intel Core i7 (Skylake), 16 GB DDR4, 1TB, GTX 965M 2GB
$4199 : Intel Core i7 (Skylake), 32 GB DDR4, 2TB, GTX 980M 4GB

Windows 10 Pro is included with a 30-day Office trial.

Edit: This piece was originally posted with the incorrect Intel generation code name in the title. It should read ‘Skylake’, not ‘Haswell’. The piece has been edited to correct this.