News


NVIDIA Announces Tesla M40 & M4 Server Cards - Data Center Machine Learning

NVIDIA Announces Tesla M40 & M4 Server Cards – Data Center Machine Learning

Slowly but steadily NVIDIA has been rotating in Maxwell GPUs into the company’s lineup of Tesla server cards. Though Maxwell is not well-suited towards the kind of high precision HPC work that the Tesla lineup was originally crafted for, Maxwell is plenty suitable for just about every other server use NVIDIA can think of. And as a result the company has been launching what’s best described as new breeds of Maxwell cards in the last few months.

After August’s announcement of the Tesla M60 and M6 cards – with a focus on VDI and video encoding – NVIDIA is back today for the announcement of the next set of Tesla cards, the M40 and the M4. In what the company is dubbing their “hyperscale accelerators,” NVIDIA is launching these two cards with a focus on capturing a larger portion of the machine learning market.

NVIDIA Tesla Family Specification Comparison
  Tesla M40 Tesla M4 Tesla M60 Tesla K40
Stream Processors 3072 1024 2 x 2048
(4096)
2880
Boost Clock(s) ~1140MHz ~1075MHz ~1180MHz 810MHz, 875MHz
Memory Clock 6GHz GDDR5 5.5GHz GDDR5 5GHz GDDR5 6GHz GDDR5
Memory Bus Width 384-bit 128-bit 2 x 256-bit 384-bit
VRAM 12GB 4GB 2 x 8GB
(16GB)
12GB
Single Precision (FP32) 7 TFLOPS 2.2 TFLOPS 9.7 TFLOPS 4.29 TFLOPS
Double Precision (FP64) 0.21 TFLOPS (1/32) 0.07 TFLOPS (1/32) 0.3 TFLOPS (1/32) 1.43 TFLOPS (1/3)
Transistor Count 8B 2.94B 2x 5.2B 7.1B
TDP 250W 50W-75W 225W-300W 235W
Cooling Passive Passive
(Low Profile)
Active/Passive Active/Passive
Manufacturing Process TSMC 28nm TSMC 28nm TSMC 28nm TSMC 28nm
GPU GM200 GM206 GM204 GK110
Target Market Machine Learning Machine Learning VDI Compute

First let’s quickly talk about the cards themselves. The Tesla M40 marks the introduction of the GM200 GPU to the Tesla lineup, with NVIDIA looking to put their best single precision (FP32) GPU to good use. This is a 250 Watt full power and fully enabled GM200 card – though with Maxwell this distinction loses some meaning – with NVIDIA outfitting the card with 12GB of GDDR5 VRAM clocked at 6GHz. We know that Maxwell doesn’t support on-chip ECC for the RAM and caches, but it’s not clear at this time whether soft-ECC is supported for the VRAM. Otherwise, with the exception of the change in coolers this card is a spitting image of the consumer GeForce GTX Titan X.

Joining the Tesla M40 is the Tesla M4. As hinted at by its single-digit product number, the M4 is a small, low powered card. In fact this is the first Tesla card to be released in a PCIe half-height low profile form factor, with NVIDIA specifically aiming for dense clusters of these cards. Tesla M4 is based on GM206 – this being the GPU’s first use in a Tesla product as well – and is paired with 4GB of GDDR5 clocked at 5GHz. NVIDIA offers multiple power/performance configurations of the M4 depending on server owner’s needs, ranging from 50W to 75W, with the highest power mode rated to deliver up to 2.2TFLOPS of FP32 performance.

Both the Tesla M40 and M4 are being pitched at the machine learning market, which has been a strong focus for NVIDIA since the very start of the year. The company believes that machine learning is the next great frontier for GPUs, capitalizing on neural net research that has shown GPUs to be capable of both quickly training and quickly executing neural nets. Neural nets in turn are increasingly being used as more efficient means for companies to process vast amounts of audio & video data (e.g. the Facebooks of the world).

To that end we have seen the company focus on machine learning in the automotive sector with products such as the Drive PX system and lay out their long-term plans for machine learning with the forthcoming Pascal architecture at GTC 2015. In the interim then we have the Tesla M40 and Tesla M4 for building machine learning setups with NVIDIA’s current-generation architecture.

Given their performance and power profiles, Tesla M40 and M4 are intended to split the machine learning market on the basis of training versus execution The powerful M40 being well-suited for quicker training of neural nets and other systems, while the more compact M4 is well-suited for dense clusters of systems actually executing various machine learning tasks. Note that it’s interesting that NVIDIA is pitching the M40 and not the more powerful M60 for training tasks; as NVIDIA briefly discussed among their long-term plans at GTC 2015, current training algorithms don’t scale very well beyond a couple of GPUs, so users are better off with a couple top-tier GM200 GPUs than a larger array of densely packed GM204 GPUs. As a result the M40 occupies an interesting position as the company’s top Tesla card for machine learning tasks that aren’t trivially scalable to many GPUs.

Meanwhile, along with today’s hardware announcement NVIDIA is also announcing a new software suite to tie together their hyperscale ambitions. Dubbed the “NVIDIA Hyperscale Suite,” the company is putting together software targeted at end-user facing web services. Arguably the lynchpin of the suite is NVIDIA’s GPU REST Engine, a service for RESTful APIs to utilize the GPU, and in turn allowing web services to easily access GPU resources. NVIDIA anticipates the GPU REST Engine enabling everything from search acceleration to image classification, and to start things off they are providing the NVIDIA Image Compute Engine, a REST-capable service for GPU image resizing. Meanwhile the company is also be providing their cuDNN neural net software as part of the suite, and versions of FFmpeg with support for NVIDIA’s hardware video encode and decode blocks to speed up video processing and transcoding.

Wrapping things up, as is common with Tesla product releases, today’s announcements will predate the hardware itself by a bit. NVIDIA tells us that the Tesla M40 and the hyperscale software suite will be available later this year (with just over a month and a half remaining). Meanwhile the Tesla M4 will be released in Q1 of 2016. NVIDIA has not announced card pricing at this time.

NVIDIA Announces Tesla M40 & M4 Server Cards - Data Center Machine Learning

NVIDIA Announces Tesla M40 & M4 Server Cards – Data Center Machine Learning

Slowly but steadily NVIDIA has been rotating in Maxwell GPUs into the company’s lineup of Tesla server cards. Though Maxwell is not well-suited towards the kind of high precision HPC work that the Tesla lineup was originally crafted for, Maxwell is plenty suitable for just about every other server use NVIDIA can think of. And as a result the company has been launching what’s best described as new breeds of Maxwell cards in the last few months.

After August’s announcement of the Tesla M60 and M6 cards – with a focus on VDI and video encoding – NVIDIA is back today for the announcement of the next set of Tesla cards, the M40 and the M4. In what the company is dubbing their “hyperscale accelerators,” NVIDIA is launching these two cards with a focus on capturing a larger portion of the machine learning market.

NVIDIA Tesla Family Specification Comparison
  Tesla M40 Tesla M4 Tesla M60 Tesla K40
Stream Processors 3072 1024 2 x 2048
(4096)
2880
Boost Clock(s) ~1140MHz ~1075MHz ~1180MHz 810MHz, 875MHz
Memory Clock 6GHz GDDR5 5.5GHz GDDR5 5GHz GDDR5 6GHz GDDR5
Memory Bus Width 384-bit 128-bit 2 x 256-bit 384-bit
VRAM 12GB 4GB 2 x 8GB
(16GB)
12GB
Single Precision (FP32) 7 TFLOPS 2.2 TFLOPS 9.7 TFLOPS 4.29 TFLOPS
Double Precision (FP64) 0.21 TFLOPS (1/32) 0.07 TFLOPS (1/32) 0.3 TFLOPS (1/32) 1.43 TFLOPS (1/3)
Transistor Count 8B 2.94B 2x 5.2B 7.1B
TDP 250W 50W-75W 225W-300W 235W
Cooling Passive Passive
(Low Profile)
Active/Passive Active/Passive
Manufacturing Process TSMC 28nm TSMC 28nm TSMC 28nm TSMC 28nm
GPU GM200 GM206 GM204 GK110
Target Market Machine Learning Machine Learning VDI Compute

First let’s quickly talk about the cards themselves. The Tesla M40 marks the introduction of the GM200 GPU to the Tesla lineup, with NVIDIA looking to put their best single precision (FP32) GPU to good use. This is a 250 Watt full power and fully enabled GM200 card – though with Maxwell this distinction loses some meaning – with NVIDIA outfitting the card with 12GB of GDDR5 VRAM clocked at 6GHz. We know that Maxwell doesn’t support on-chip ECC for the RAM and caches, but it’s not clear at this time whether soft-ECC is supported for the VRAM. Otherwise, with the exception of the change in coolers this card is a spitting image of the consumer GeForce GTX Titan X.

Joining the Tesla M40 is the Tesla M4. As hinted at by its single-digit product number, the M4 is a small, low powered card. In fact this is the first Tesla card to be released in a PCIe half-height low profile form factor, with NVIDIA specifically aiming for dense clusters of these cards. Tesla M4 is based on GM206 – this being the GPU’s first use in a Tesla product as well – and is paired with 4GB of GDDR5 clocked at 5GHz. NVIDIA offers multiple power/performance configurations of the M4 depending on server owner’s needs, ranging from 50W to 75W, with the highest power mode rated to deliver up to 2.2TFLOPS of FP32 performance.

Both the Tesla M40 and M4 are being pitched at the machine learning market, which has been a strong focus for NVIDIA since the very start of the year. The company believes that machine learning is the next great frontier for GPUs, capitalizing on neural net research that has shown GPUs to be capable of both quickly training and quickly executing neural nets. Neural nets in turn are increasingly being used as more efficient means for companies to process vast amounts of audio & video data (e.g. the Facebooks of the world).

To that end we have seen the company focus on machine learning in the automotive sector with products such as the Drive PX system and lay out their long-term plans for machine learning with the forthcoming Pascal architecture at GTC 2015. In the interim then we have the Tesla M40 and Tesla M4 for building machine learning setups with NVIDIA’s current-generation architecture.

Given their performance and power profiles, Tesla M40 and M4 are intended to split the machine learning market on the basis of training versus execution The powerful M40 being well-suited for quicker training of neural nets and other systems, while the more compact M4 is well-suited for dense clusters of systems actually executing various machine learning tasks. Note that it’s interesting that NVIDIA is pitching the M40 and not the more powerful M60 for training tasks; as NVIDIA briefly discussed among their long-term plans at GTC 2015, current training algorithms don’t scale very well beyond a couple of GPUs, so users are better off with a couple top-tier GM200 GPUs than a larger array of densely packed GM204 GPUs. As a result the M40 occupies an interesting position as the company’s top Tesla card for machine learning tasks that aren’t trivially scalable to many GPUs.

Meanwhile, along with today’s hardware announcement NVIDIA is also announcing a new software suite to tie together their hyperscale ambitions. Dubbed the “NVIDIA Hyperscale Suite,” the company is putting together software targeted at end-user facing web services. Arguably the lynchpin of the suite is NVIDIA’s GPU REST Engine, a service for RESTful APIs to utilize the GPU, and in turn allowing web services to easily access GPU resources. NVIDIA anticipates the GPU REST Engine enabling everything from search acceleration to image classification, and to start things off they are providing the NVIDIA Image Compute Engine, a REST-capable service for GPU image resizing. Meanwhile the company is also be providing their cuDNN neural net software as part of the suite, and versions of FFmpeg with support for NVIDIA’s hardware video encode and decode blocks to speed up video processing and transcoding.

Wrapping things up, as is common with Tesla product releases, today’s announcements will predate the hardware itself by a bit. NVIDIA tells us that the Tesla M40 and the hyperscale software suite will be available later this year (with just over a month and a half remaining). Meanwhile the Tesla M4 will be released in Q1 of 2016. NVIDIA has not announced card pricing at this time.

The Microsoft Surface Book Review

Microsoft has released what they are calling “The Ultimate Laptop” and with their first attempt at moving outside the tablet segment, we take a look at the new Surface Book and how it compares. Competition in the notebook segment is much more intense than the high end tablet market, and Microsoft is aiming for the top.

The Microsoft Surface Book Review

Microsoft has released what they are calling “The Ultimate Laptop” and with their first attempt at moving outside the tablet segment, we take a look at the new Surface Book and how it compares. Competition in the notebook segment is much more intense than the high end tablet market, and Microsoft is aiming for the top.

Imagination Announces New P6600, M6200, M6250 Warrior CPUs

Imagination Announces New P6600, M6200, M6250 Warrior CPUs

Today Imagination launches three new MIPS processor IPs: One in the performance category of Warrior CPUs, the P6600 and two embedded M-class core, the M6200 and M6250.

Warrior P6600

Starting off with the P6600, this is Imagination’s new MIPS flagship core succeeding the P5600. The P5600 was a 3-wide out-of-order design with a pipeline depth of up to 16 stages. The P6600 keeps most of the predecessor’s characteristics such as the main architectural features or full hardware virtualization and security through OmniShield, but adds compatibility for MIPS64 64-bit processing on top. Imagination first introduced a mobile oritented 64-bit MIPS CPU back with the I6400 a little more than a year ago but we’ve yet to see vendors announce products with it. 

We’re still lacking any details on the architectural improvements of the P6600 over the P5600 so it seems that for now we’re left with guessing what kind of performance the new core will bring. The P5600 was directly competing with ARM’s Cortex A15 in terms of IPC, but ARM has since then not only announced but also seen silicon with two successor IPs to the A15 (A57 and A72), so the P6600 will have some tough competition ahead of itself once it arrives in products.

The P6600, much like the P5600 can be implemented from single-core to six-core cluster configurations. What is interesting that as opposed to ARM CPU IP, the MIPS cores allow for asynchronous clock planes between the individual cores if the vendors wishes to implement the SoC’s power management in this way (It can also be set up to work in a synchronous way).

“MIPS P6600 is the next evolution of the high-end MIPS P-class family and builds on the 32-bit P5600 CPU. P6600 is a balanced CPU for mainstream/high-performance computing, enabling powerful multicore 64-bit SoCs with optimal area efficiency for applications in segments including mobile, home entertainment, networking, automotive, HPC or servers, and more. Customers have already licensed the P6600 for applications including high-performance computing and advanced image and vision systems.”

Warrior M6200 & M6250

Also as part of today’s announcement we see two new embedded CPU cores, the M6200 and M6250. Both cores are successors to the microAptiv-UP and UC but able to run at up to 30% higher frequency. The new processors also see an ISA upgrade to MIPS32 Release 6 instead of Release 5.

The M6200 is targeted at real-time embedded operating systems with minimal funtionality for cost- and power-savings. It has no MMU and as such can only be described as a microcontroller part.

The M6250 is the bigger brother of the M6200 and the biggest difference is the inclusion of a memory management unit (MMU) that makes this a full fledged processor core that can run operating systems like Linux.

“M6200 and M6250 are configurable and fully synthesizable solutions for devices requiring a high level of performance efficiency and small silicon area including wireless or wired modems, GPU supervisors, flash and SSD controllers, industrial and motor control, advanced audio and more.”