News


Intel Launches 3D NAND SSDs For Client And Enterprise

Today Intel is announcing a variety of new SSDs built on their 3D NAND flash memory. The new models use a mix of 3D MLC and 3D TLC, some SATA and some PCIe, and variously target the consumer, business, embedded and data center markets. While we are still awaiting full details on some of these product releases, it is clear that Intel is eager to put planar flash behind them. The push is especially strong because the models being replaced are all based either on Intel’s relatively expensive 20nm flash or on 16nm flash that Intel had to buy on the open market due to their decision not to participate in the 16nm node at IMFT.

Product Series | Market | Interface | 3D NAND
SSD 600p | Consumer Client | M.2 PCIe 3 x4 | TLC
SSD Pro 6000p | Business Client | M.2 PCIe 3 x4 | TLC
SSD E 6000p | Embedded, IoT | M.2 PCIe 3 x4 | TLC
SSD E 5420s | Embedded, IoT | 2.5″ and M.2 SATA | MLC
SSD DC S3520 | Data Center | 2.5″ and M.2 SATA | MLC
SSD DC P3520 | Data Center | U.2 and PCIe x4 HHHL | MLC

First up, we have an M.2 PCIe SSD branded three different ways for three different markets. In the consumer market we have the SSD 600p series, while the business market will get the Pro 6000p series. The specs released so far differ only in mentioning that the Pro 6000p series will be supported by the remote secure erase feature of Intel’s Active Management Technology. The third variant—for the embedded and Internet of Things market—will only get the two smallest capacities, which gives us a look at how this design will perform with the limited parallelism that results from using IMFT’s high-capacity 384Gb 3D TLC die.
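
For a rough sense of what that means, here’s a back-of-the-envelope die count per capacity, assuming 48GB (384Gb) dies and ignoring spare area; the die configurations here are our assumption, not something Intel has disclosed:

```python
# Rough NAND die count per drive capacity, assuming 384Gb (= 48GB) 3D TLC dies.
# Fewer dies means fewer flash targets the controller can keep busy in parallel.
DIE_GB = 384 / 8  # one 384Gb die holds 48GB

for drive_gb in (128, 256, 512, 1024):
    dies = round(drive_gb / DIE_GB)
    print(f"{drive_gb:>5}GB drive -> ~{dies} dies")
```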

Intel Client and Embedded PCIe SSDs
Model | Pro 6000p, 600p | E 6000p (256GB) | E 6000p (128GB) | 750
Capacity | 128GB, 256GB, 360GB, 512GB, 1024GB | 256GB | 128GB | 400GB, 800GB, 1.2TB
NAND | IMFT 32-layer 3D TLC | IMFT 32-layer 3D TLC | IMFT 32-layer 3D TLC | IMFT 20nm MLC
Interface | M.2 2280 PCIe 3 x4 (single-sided) | M.2 2280 PCIe 3 x4 (single-sided) | M.2 2280 PCIe 3 x4 (single-sided) | U.2 or PCIe 3 x4 HHHL
Sequential Read | up to 1800 MB/s | 1570 MB/s | 770 MB/s | up to 2500 MB/s
Sequential Write | up to 560 MB/s | 540 MB/s | 450 MB/s | up to 1200 MB/s
4kB Random Read | up to 155k IOPS | 71k IOPS | 35k IOPS | up to 460k IOPS
4kB Random Write | up to 128k IOPS | 112k IOPS | 91.5k IOPS | up to 290k IOPS
Idle Power | 10mW | 10mW | 10mW | 4W
Warranty | 5 years | 5 years | 5 years | 5 years

The 600p and 6000p series are a much more mainstream design than Intel’s previous NVMe SSD for the client market. The SSD 750 was a thinly-disguised enterprise drive, with power consumption and physical dimensions far too big for the M.2 form factor that has become the preferred choice for client PCIe storage. The SSD 750 was in many ways overkill from the start, and more recent M.2 drives (especially from Samsung) have caught up in peak performance while offering much better value for typical client usage. The 600p will be going after the client PCIe storage market from the opposite end: as one of the first TLC PCIe SSDs, its performance specifications don’t set any records, but it will be a much more value-oriented product than any of the M.2 PCIe SSDs currently on the market. Intel has confirmed that the 600p and 6000p are using a third-party controller.

UPDATE: Allyn Malventano at PC Perspective has uncovered a forum post with an uncensored picture of the 600p. The controller has “SMI” in big letters, suggesting that it is a Silicon Motion SM2260 or a relative thereof, but with different markings than the samples Silicon Motion has been showing off at conventions. Intel has also used Silicon Motion controllers in drives like the 540s.


SSD 600p

In addition to the SSD E 6000p, there is a new series of SATA drives for the embedded market. The SSD E 5420s series consists of a 240GB 2.5″ drive and a 150GB M.2 drive, both with 3D MLC and full power loss protection. The E 5420s is rated for one drive write per day, a substantial improvement over the 0.3 DWPD rating of the E 5410s or the 20GB/day of the E 5400s.
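
To put those three endurance ratings on a common footing, here’s a quick sketch converting each to total terabytes written over the five-year warranty. This assumes the rating holds for the full warranty period, and the capacities picked are just examples from the table below:

```python
# Convert endurance ratings to terabytes written (TBW) over a 5-year warranty.
def tbw(dwpd: float, capacity_gb: float, years: int = 5) -> float:
    """Drive writes per day x capacity x warranty length, in TB."""
    return dwpd * capacity_gb * 365 * years / 1000

print(f"E 5420s 240GB @ 1 DWPD:   {tbw(1.0, 240):.0f} TBW")
print(f"E 5410s 120GB @ 0.3 DWPD: {tbw(0.3, 120):.0f} TBW")
print(f"E 5400s @ 20GB/day:       {20 * 365 * 5 / 1000:.1f} TBW")
```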

Intel Embedded/IoT SATA SSDs
Model | E 5420s (2.5″) | E 5420s (M.2) | E 5410s | E 5400s
Capacity | 240GB | 150GB | 80GB, 120GB | 48GB, 80GB, 120GB, 180GB
NAND | IMFT 32-layer 3D MLC | IMFT 32-layer 3D MLC | 16nm MLC | 16nm TLC
Interface | 2.5″ SATA | M.2 SATA | 2.5″ SATA | 2.5″ and M.2 SATA
Sequential Read | 320 MB/s | 165 MB/s | up to 475 MB/s | up to 560 MB/s
Sequential Write | 300 MB/s | 145 MB/s | up to 135 MB/s | up to 475 MB/s
4kB Random Read | 65k IOPS | 39k IOPS | up to 68k IOPS | up to 71k IOPS
4kB Random Write | 16k IOPS | 28k IOPS | up to 84k IOPS | up to 85k IOPS
Warranty | 5 years | 5 years | 5 years | 5 years


SSD E 5420s

Moving on to the data center products, the SSD DC S3520 is a new mid-range enterprise SATA SSD for read-oriented workloads and the third iteration of the S3500 series. The M.2 form factor has returned as an option after the DC S3510 series was only offered in the 2.5″ form factor. As with the SATA drives for the embedded market, performance has decreased but endurance has been bumped up from 0.3 DWPD to 1 DWPD. The larger per-die capacity of the 3D MLC has caused the smallest capacity option to increase from 80GB to 150GB, but 1.6TB is still the largest option for the 2.5″ form factor.

Intel Enterprise SATA SSDs
Model | DC S3520 (2.5″) | DC S3520 (M.2) | DC S3510
Capacity | 150GB, 240GB, 480GB, 800GB, 960GB, 1.2TB, 1.6TB | 150GB, 240GB, 480GB, 760GB, 960GB | 80GB, 120GB, 240GB, 480GB, 800GB, 1.2TB, 1.6TB
NAND | IMFT 32-layer 3D MLC | IMFT 32-layer 3D MLC | 16nm MLC
Interface | 2.5″ SATA | M.2 SATA | 2.5″ SATA
Sequential Read (up to) | 450 MB/s | 410 MB/s | 500 MB/s
Sequential Write (up to) | 380 MB/s | 320 MB/s | 460 MB/s
4kB Random Read (up to) | 67.5k IOPS | 53k IOPS | 68k IOPS
4kB Random Write (up to) | 17k IOPS | 14.4k IOPS | 20k IOPS
Endurance | 1 DWPD | 1 DWPD | 0.3 DWPD
Warranty | 5 years | 5 years | 5 years

SSD DC S3520

(UPDATED) Finally, for the enterprise PCIe space we have the SSD DC P3520. In March the DC P3320 was announced as Intel’s first 3D NAND SSD and the P3520 was mentioned, but specifications were not provided at that time. Intel has since decided to cancel the P3320, produce only the P3520, and price it close to the level of SATA SSDs. The reduced performance relative to the DC P3500 is a consequence of reduced parallelism: at the same capacity, a drive needs only half as many 256Gb 3D MLC dies as it did 128Gb 20nm MLC dies, and the size of this performance regression is a bit dismaying. The DC P3520 is clearly based on the same hardware platform as the rest of the PCIe data center drives, with a familiar layout for the PCB and heatsink evident in the add-in card version.
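
To illustrate that parallelism gap, a quick sketch of approximate die counts at the 1.2TB capacity, ignoring overprovisioning and the actual channel layout (which Intel hasn’t detailed):

```python
# At the same capacity, doubling the per-die capacity halves the die count,
# leaving the controller fewer independently addressable NAND targets.
CAPACITY_GB = 1200  # the 1.2TB models
for name, die_gb in (("DC P3500, 128Gb 20nm MLC", 16),
                     ("DC P3520, 256Gb 3D MLC", 32)):
    print(f"{name}: ~{CAPACITY_GB // die_gb} dies")
```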

Intel Enterprise PCIe SSDs
Model | DC P3520 | DC P3320 (canceled) | DC P3500
Capacity | 450GB (U.2 only), 1.2TB, 2TB | 450GB (U.2 only), 1.2TB, 2TB | 400GB, 1.2TB, 2TB
NAND | IMFT 32-layer 3D MLC | IMFT 32-layer 3D MLC | IMFT 20nm MLC
Interface | U.2 and PCIe 3 x4 HHHL | U.2 and PCIe 3 x4 HHHL | U.2 and PCIe 3 x4 HHHL
Sequential Read (up to) | 1700 MB/s | 1600 MB/s | 2700 MB/s
Sequential Write (up to) | 1350 MB/s | 1400 MB/s | 1800 MB/s
4kB Random Read (up to) | 375k IOPS | 365k IOPS | 430k IOPS
4kB Random Write (up to) | 26k IOPS | 22k IOPS | 28k IOPS
4kB Random 70/30 Read/Write (up to) | 80k IOPS | 65k IOPS | 80k IOPS
Warranty | 5 years | 5 years | 5 years


SSD DC P3520 U.2

These new SSDs will have a staggered release over the rest of the year. Starting next week the DC P3520 will be shipping, as well as the 128GB, 256GB and 512GB capacities of the SSD 600p and SSD Pro 6000p. The 2.5″ DC S3520 will ship in early September. The rest are planned to be available in Q4.


Hot Chips 2016: NVIDIA Discloses Tegra Parker Details

At CES 2016 we saw that DRIVE PX2 had a new Tegra SoC in it, but to some extent NVIDIA was still being fairly cagey about what was actually in this SoC or what the block diagram for any of these platforms really looked like. Fortunately, at Hot Chips 2016 we finally got to see some details around the architecture of both Tegra Parker and DRIVE PX2.

Starting with Parker, this is an SoC that has been a long time coming for NVIDIA. The codename and its basic architectural composition were announced all the way back at GTC in 2013, as the successor to the Logan (Tegra K1) SoC. However, Erista (Tegra X1) was later added mid-generation – and wound up being NVIDIA’s 20nm generation SoC – so until now the fate of Parker has not been clear. As it turns out, Parker is largely in line with NVIDIA’s original 2013 announcement, except instead of a Maxwell GPU we get something based on the newer Pascal architecture.

But first, let’s talk about the CPU. The CPU complex has been disclosed as two Denver 2 cores combined with a quad-core Cortex-A57 cluster, with the entire SoC built on TSMC’s 16nm FinFET process. This marks the second SoC to use NVIDIA’s custom-developed ARM CPU core, the first being the Denver version of the Tegra K1. Relative to K1, Parker (I suspect NVIDIA doesn’t want to end up with TP1 here) represents an upgrade both to the Denver CPU core itself and to how NVIDIA structures their overall CPU complex, with a quartet of ARM Cortex-A57 cores joining the two Denver 2 cores.

The big question for most readers, I suspect, is about the Denver 2 CPU cores. NVIDIA hasn’t said a whole lot about them – bearing in mind that Hot Chips is not an exhaustive deep-dive style architecture event – so unfortunately there’s not a ton of information to work with. What NVIDIA has said is that they’ve worked to improve the overall power efficiency of the cores (though I’m not sure if this factors in 16nm FinFET or not), including by implementing some new low power states. Meanwhile, on the performance side of matters, NVIDIA has confirmed that this is still a 7-wide design, and that Denver 2 uses “an improved dynamic code optimization algorithm.” What little was said about Denver 2 in particular was focused on energy efficiency, so it may very well be that the execution architecture is not substantially different from Denver 1’s.

With that in mind, the bigger news from a performance standpoint is that with Parker, the Denver CPU cores are not alone. For Parker the CPU has evolved into a full CPU complex, pairing up the two Denver cores with a quad-core Cortex-A57 implementation. NVIDIA cheekily refers to this as “Big + Super”, a subversion of ARM’s big.LITTLE design, as this combines “big” A57 cores with the “super” Denver cores. There are no formal low power cores here, so when it comes to low power operation it looks like NVIDIA is relying on Denver.

That NVIDIA would pair up Denver with ARM’s cores is an interesting move, in part because Denver was originally meant to solve the middling single-threaded performance of ARM’s earlier A-series cores. Secondary to this was avoiding big.LITTLE-style computing by making a core that could scale the full range. For Parker this is still the case, but NVIDIA seems to have come to the conclusion that both responsiveness and the total performance of the CPU complex needed to be addressed. The end result is the quad-core A57 cluster joining the two Denver cores.

NVIDIA didn’t just stop at adding A57 cores though; they also made the design a full Heterogeneous Multi-Processing (HMP) design – a fully coherent one at that, utilizing a proprietary coherency fabric specifically to allow the two rather different CPU clusters to maintain coherency. The significance of this – besides the unusual CPU pairing – is that it should allow NVIDIA to efficiently migrate threads between the Denver and A57 cores as power and performance require it. This also allows NVIDIA to use all 6 CPU cores at once to maximize performance. And since Parker is primarily meant for automotive applications – featuring more power and better cooling than mobile environments – it’s entirely reasonable to expect that the design can sustain operation across all 6 of those CPU cores for extended periods of time.

Overall this setup is very close to big.LITTLE, except with the Denver cores seemingly encompassing parts of both “big” and “little” depending on the task. With all of that said, however, it should be noted that NVIDIA has not had great luck with multiple CPU clusters; Tegra X1 featured cluster migration, but it never seemed to use its A53 CPU cores at all. So without having had a chance to see Parker’s HMP in action, I have some skepticism about how well HMP is working in Parker.

Overall, NVIDIA is claiming about 40-50% more overall CPU performance than the A9X or Kirin 950, which is to say that if your workload can take advantage of all 6 CPU cores in the system then it’s going to be noticeably faster than two Twister CPUs at 2.2 GHz. But there’s no comparison to Denver 1 (TK1) here, nor any discussion of single-thread performance. Though on the latter, admittedly I’m not sure quite how relevant that is for NVIDIA now that Parker is primarily an automotive SoC rather than a general purpose SoC.

Outside of the CPU, NVIDIA has added some new features to Parker such as doubling memory bandwidth. For the longest time NVIDIA stuck with a 64-bit memory bus on what was essentially a tablet SoC lineup, which, despite what you may think from the specs, worked well enough for NVIDIA, presumably due to their experience in GPU designs and, as we’ve since learned, compression & tiling. Parker in turn finally moves to a 128-bit memory bus, doubling the aggregate memory bandwidth to 50GB/sec (which works out to roughly LPDDR4-3200).
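
The arithmetic behind that figure is straightforward: peak DRAM bandwidth is the bus width in bytes multiplied by the transfer rate.

```python
# 128-bit bus at LPDDR4-3200 speeds:
bus_bytes = 128 / 8      # 16 bytes per transfer
transfer_rate = 3200e6   # 3200 MT/s
print(f"{bus_bytes * transfer_rate / 1e9:.1f} GB/s")  # 51.2 GB/s ~ NVIDIA's 50GB/sec
```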

More interesting, however, is the addition of ECC support to the memory subsystem. This seems to be in place specifically to address the automotive market by improving the reliability of the memory and SoC. A cell phone and its user can deal with the rare bitflip; things like self-driving vehicles can’t afford the same luxury. Though I should note it’s not clear whether ECC support is just some kind of soft ECC for the memory or if it’s hardwired ECC (NVIDIA calls it “in-line” DRAM ECC). But it’s clear that whatever it is, it extends beyond the DRAM, as NVIDIA notes that there’s ECC or parity protection for “key on-die memories”, which is something we’d expect to see on a more hardened design like NVIDIA is promoting.
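
As a refresher on what ECC buys you, here’s a toy single-error-correcting Hamming(7,4) code. Real DRAM ECC uses much wider codewords (commonly 72 bits protecting 64), so this sketch is purely illustrative of the principle rather than NVIDIA’s actual scheme:

```python
# Toy Hamming(7,4): 3 parity bits protect 4 data bits, letting the decoder
# locate and flip any single corrupted bit.
def encode(d):  # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]  # codeword positions 1..7

def correct(c):  # c = possibly corrupted 7-bit codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 = clean, else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]  # recovered data bits

word = [1, 0, 1, 1]
code = encode(word)
code[4] ^= 1                         # simulate a single-bit memory upset
assert correct(code) == word         # the flipped bit is found and fixed
```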

Finally, NVIDIA has also significantly improved their I/O functionality, which again is being promoted particularly in the context of automotive applications. There’s support for more cameras to improve ADAS and self-driving systems, as well as 4Kp60 video encode, CAN bus support, hardware virtualization, and additional safety features that help to make this SoC truly automotive-focused.

The hardware virtualization of Parker is particularly interesting. It’s both a safety feature – isolating various systems from each other – and a cost-saving one, as OEMs have less need for separate hardware to avoid a single point of failure in critical systems. There’s a lot of extra logic going on to make this all work properly, and things like running dual Parker SoCs in a soft lockstep mode are also possible. In the case of DRIVE PX2, an Aurix TC297 functions as a safety system and controls both of the Parker SoCs, with a PCIe switch to connect the SoCs to the GPUs and to each other.
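
Conceptually, soft lockstep boils down to duplicate execution plus comparison. A minimal sketch follows, with run_on_soc() as a hypothetical stand-in for dispatching work to each Parker SoC rather than any real NVIDIA API:

```python
# Soft lockstep: run the same work on two independent units and compare,
# flagging a mismatch instead of trusting either result.
def run_on_soc(soc_id: int, frame: bytes) -> int:
    # Hypothetical placeholder for dispatching work to one Parker SoC;
    # both units compute the same deterministic result on good hardware.
    return hash(frame)

def lockstep_step(frame: bytes) -> int:
    a, b = run_on_soc(0, frame), run_on_soc(1, frame)
    if a != b:
        raise RuntimeError("lockstep mismatch: escalate to the safety MCU")
    return a

lockstep_step(b"sensor frame")
```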

Meanwhile, it’s interesting to note that the GPU of Parker was not a big part of NVIDIA’s presentation. Part of this is because Parker’s GPU architecture, Pascal, has already launched in desktops and is essentially a known quantity now. At the same time, Parker’s big use (at least within NVIDIA) is for the DRIVE PX2 system, which is going to be combining Parker with a pair of dGPUs. So in the big picture Parker’s greater role is in its CPUs, I/O, and system management rather than its iGPU.

Either way, NVIDIA’s presentation confirms that Parker integrates a 256 CUDA core Pascal design. This is the same number of CUDA cores as on TX1, so there has not been a gross increase in GPU hardware. At the same time, moving from TSMC’s 20nm planar process to their 16nm FinFET process did not significantly increase transistor density, so there’s also not a lot of new space to put GPU hardware in. NVIDIA quotes an FP16 rate of 1.5 TFLOPS for Parker, which implies a GPU clockspeed of around 1.5GHz. This is consistent with other Pascal-based GPUs, in that NVIDIA seems to have invested most of their 16nm gains into ramping up clockspeeds rather than building wider GPUs.
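
Working backwards from the quoted number, assuming all 256 CUDA cores issue fused multiply-adds (2 FLOPs) at Pascal’s double FP16 rate:

```python
# clock = FP16 FLOPS / (cores x 2 FLOPs per FMA x 2 for double-rate FP16)
cores = 256
clock_hz = 1.5e12 / (cores * 2 * 2)
print(f"~{clock_hz / 1e9:.2f} GHz")  # ~1.46 GHz, i.e. "around 1.5GHz"
```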

As the unique Maxwell implementation in TX1 was already closer to Pascal than any NVIDIA dGPU – in particular, it supported double rate FP16 when no other Maxwell did – the change from Maxwell to Pascal isn’t as dramatic here. However some of Pascal’s other changes, such as fine-grained context switching for CUDA applications, seem to play into Parker’s other features such as hardware virtualization. So Pascal should still be a notable improvement over Maxwell for the purposes of Parker.

Overall, it’s interesting to see how Tegra has evolved from an almost purely mobile-focused SoC to a truly automotive-focused SoC. It’s fairly obvious at this point that Tegra is headed towards higher TDPs than we’ve seen before, beyond even what small tablets can handle. Given this automotive focus, it’ll be interesting to see whether NVIDIA starts to integrate advanced DSPs or anything similar, or if they continue to rely mostly on the CPU and GPU for most processing tasks.


Zotac ZBOX MAGNUS EN980 SFF PC Review – An Innovative VR-Ready Gaming Powerhouse

The PC market has been subject to challenges over the last several years. However, gaming systems and small form-factor (SFF) PCs have weathered the storm particularly well. Many vendors have tried to combine the two, but space constraints and power concerns have ended up limiting the gaming performance of such systems. Zotac, in particular, has been very active in this space with their E-series SFF PCs. The Zotac ZBOX MAGNUS EN980 that we are reviewing today is the follow-up to last year’s MAGNUS EN970 that combined a Broadwell-U CPU with a GTX 970M (rebadged as a GTX 960). The EN980’s full-blown 65W Core i5-6400 Skylake desktop CPU and a no-holds barred VR-ready desktop GTX 980 coupled with an all-in-one watercooling solution seem to have addressed the EN970’s shortcomings. Read on to find out how the unit performs in our rigorous benchmarking and evaluation process.