AMD EPYC Launch Event Live Blog (Starts 4pm ET)
I’m in Austin for AMD’s launch event for their new server parts, named EPYC. Come back at 4pm ET (3pm Austin) for the Live Blog on the official announcement!
Similar to last year, at this year’s International Supercomputing Conference (ISC) NVIDIA has announced and detailed a PCI Express version of their latest Tesla GPU accelerator, the Volta-based V100. The conference itself runs from June 19 to 22, and with several speakers from NVIDIA scheduled for events tomorrow, NVIDIA is set to outline its next-generation efforts in HPC and deep learning with Volta.
With Volta discussed and described at their GPU Technology Conference in mid-May, NVIDIA upped the ante in terms of both features and reticle size: V100 is 815mm2 of custom TSMC 12FFN silicon, chock full of tensor cores and a unified L1 cache per SM, along with many more fundamental – and not yet fully revealed – microarchitectural changes.
Like the previous Pascal iteration, the Tesla V100 PCIe offers a more traditional form factor as opposed to NVIDIA’s own mezzanine-type SXM2 form factor. This allows vendors to drop Tesla cards into traditional PCIe systems, making the cards far more accessible to server builders who don’t want to build around NVIDIA’s SXM2 connector or carrier board. The tradeoff is that the PCIe cards have a lower 250W TDP, and they don’t get NVLink, instead relying on PCIe alone.
| NVIDIA Tesla Family Specification Comparison | Tesla V100 (SXM2) | Tesla V100 (PCIe) | Tesla P100 (SXM2) | Tesla P100 (PCIe) |
|---|---|---|---|---|
| CUDA Cores | 5120 | 5120 | 3584 | 3584 |
| Tensor Cores | 640 | 640 | N/A | N/A |
| Core Clock | ? | ? | 1328MHz | ? |
| Boost Clock(s) | 1455MHz | ~1370MHz | 1480MHz | 1300MHz |
| Memory Clock | 1.75Gbps HBM2 | 1.75Gbps HBM2 | 1.4Gbps HBM2 | 1.4Gbps HBM2 |
| Memory Bus Width | 4096-bit | 4096-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 900GB/sec | 900GB/sec | 720GB/sec | 720GB/sec |
| VRAM | 16GB | 16GB | 16GB | 16GB |
| L2 Cache | 6MB | 6MB | 4MB | 4MB |
| Half Precision | 30 TFLOPS | 28 TFLOPS | 21.2 TFLOPS | 18.7 TFLOPS |
| Single Precision | 15 TFLOPS | 14 TFLOPS | 10.6 TFLOPS | 9.3 TFLOPS |
| Double Precision | 7.5 TFLOPS (1/2 rate) | 7 TFLOPS (1/2 rate) | 5.3 TFLOPS (1/2 rate) | 4.7 TFLOPS (1/2 rate) |
| Tensor Performance (Deep Learning) | 120 TFLOPS | 112 TFLOPS | N/A | N/A |
| GPU | GV100 (815mm2) | GV100 (815mm2) | GP100 (610mm2) | GP100 (610mm2) |
| Transistor Count | 21B | 21B | 15.3B | 15.3B |
| TDP | 300W | 250W | 300W | 250W |
| Form Factor | Mezzanine (SXM2) | PCIe | Mezzanine (SXM2) | PCIe |
| Cooling | Passive | Passive | Passive | Passive |
| Manufacturing Process | TSMC 12nm FFN | TSMC 12nm FFN | TSMC 16nm FinFET | TSMC 16nm FinFET |
| Architecture | Volta | Volta | Pascal | Pascal |
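The memory bandwidth rows in the table follow directly from the memory clock and bus width: bandwidth equals the per-pin data rate times the bus width, divided by 8 to convert bits to bytes. A quick sanity check:

```python
# Sanity check of the table's memory bandwidth figures:
# bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8
def hbm2_bandwidth_gbs(data_rate_gbps, bus_width_bits=4096):
    return data_rate_gbps * bus_width_bits / 8

print(hbm2_bandwidth_gbs(1.75))  # 896.0 GB/s -> quoted as "900GB/sec" (V100)
print(hbm2_bandwidth_gbs(1.4))   # 716.8 GB/s -> quoted as "720GB/sec" (P100)
```

NVIDIA's quoted figures are simply these values rounded.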
On the surface, the addition of tensor cores is the most noticeable change. To recap, tensor cores can be likened to a series of unified ALUs that multiply two 4×4 FP16 matrices together and then add that product to an FP16 or FP32 4×4 matrix in a fused multiply-add operation, as opposed to the conventional FP32 or FP64 CUDA cores. In the end, this means that for very specific kinds of (and specifically programmed) workloads, Volta can take advantage of the 100+ TFLOPS capability that NVIDIA has tossed into the mix.
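As a rough functional illustration – a sketch of the arithmetic only, not NVIDIA's actual hardware implementation – the per-tensor-core operation D = A×B + C can be modeled in Python with NumPy, taking FP16 inputs and accumulating at FP32:

```python
import numpy as np

def tensor_core_fma(a, b, c):
    """Model of one tensor core op: D = A x B + C on 4x4 matrices.
    A and B are FP16 inputs; the product is accumulated into C at FP32."""
    assert a.shape == b.shape == c.shape == (4, 4)
    # Promote to FP32 to model the full-precision accumulation path
    product = a.astype(np.float32) @ b.astype(np.float32)
    return product + c.astype(np.float32)

a = np.full((4, 4), 0.5, dtype=np.float16)
b = np.full((4, 4), 2.0, dtype=np.float16)
c = np.ones((4, 4), dtype=np.float32)
d = tensor_core_fma(a, b, c)
print(d[0, 0])  # 4 * (0.5 * 2.0) + 1.0 = 5.0 in every element
```

Each such operation is 4×4×4 = 64 multiply-adds, or 128 FLOPs. Multiplying out against the table above, 640 tensor cores × 128 FLOPs × ~1455MHz comes to roughly 119 TFLOPS, matching the quoted 120 TFLOPS figure for the SXM2 card.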
As for the specifications of the PCIe Tesla V100, it’s configured similarly to the SXM2 version, with the same number of CUDA cores and the same memory capacity, but operating at a lower clockspeed in line with its reduced 250W TDP. Based on NVIDIA’s throughput figures, this puts the PCIe card’s boost clock at around 1370MHz, 85MHz (~6%) slower than the SXM2 version.
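The ~1370MHz estimate can be back-computed from NVIDIA's throughput figures: peak FP32 throughput is the CUDA core count times two FLOPs per clock (one fused multiply-add per core), times the clock. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: derive the boost clock from peak FP32 throughput.
# Peak FLOPS = CUDA cores * 2 (one fused multiply-add per clock) * clock
cuda_cores = 5120
peak_fp32 = 14e12  # 14 TFLOPS for the PCIe Tesla V100

boost_clock_mhz = peak_fp32 / (cuda_cores * 2) / 1e6
print(round(boost_clock_mhz))  # 1367, i.e. "around 1370MHz"
```

The same arithmetic on the SXM2 card's 15 TFLOPS yields roughly 1465MHz, close to its quoted 1455MHz boost clock.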
Interestingly, unlike the Tesla P100 family, NVIDIA isn’t offering a second-tier PCIe card based on salvaged chips; so this generation doesn’t have an equivalent to the 12GB PCIe Tesla P100. NVIDIA’s experience with GP100/interposer/HBM2 assembly as well as continuing production of HBM2 has likely reduced the need for memory-salvaged parts.
Finally, PCIe-based Tesla V100 accelerators are “expected to be available later this year from NVIDIA reseller partners and manufacturers,” including Hewlett Packard Enterprise, which will offer three different PCIe Volta systems.
Lenovo has unveiled a new ThinkStation model, the P320 Tiny, based on a Kaby Lake / Q270 platform with NVIDIA’s Quadro P600 GPU. The unique aspect is the dimensions: at 1.4″ x 7.1″ x 7.2″ (1L in volume), it is one of the smallest systems we have seen that includes a discrete GPU. In order to achieve this compact size, the 135W power adapter is external to the system.
The P320 Tiny supports Kaby Lake CPUs with TDPs of up to 35W (such as the Intel Core i7-7700T). NVIDIA’s Quadro P600 is a GP107-based GPU with a 40W TDP. The system comes with two DDR4 SODIMM slots and two M.2 NVMe SSD slots. There is a rich variety of I/O ports: audio jacks in the front, a total of six USB 3.0 ports spread across the front and the rear, an RJ-45 GbE port, and six display outputs (4x mini-DP + 2x DP). Thanks to the Quadro GPU, the P320 Tiny is able to ship with ISV certifications for various professional applications such as AutoCAD.
| Lenovo ThinkStation P320 Tiny: General Specifications | |
|---|---|
| CPU | Intel Kaby Lake (up to Core i7, 35W TDP max.) |
| Chipset | Intel Q270 |
| RAM | Up to 32 GB DDR4-2400 (2x SODIMM) |
| GPU | NVIDIA Quadro P600 |
| Storage | 2x M.2 PCIe: up to 1 TB NVMe SSD each; ODD optional with add-on |
| Networking | Gigabit Ethernet; Intel 802.11ac 2x2 (2.4 GHz/5 GHz) + Bluetooth 4.0 |
| I/O | 6x USB 3.0; serial port optional |
| Dimensions | 1.4″ x 7.1″ x 7.2″ |
| Weight | 2.9 lbs |
The board used in the system appears to be a custom one – it is larger than a mini-STX board but smaller than an ITX one. The system is perfect for space-constrained setups and comes with extensibility options, such as add-ons for extra USB ports and a COM port, or for an optical drive, as shown in the gallery below.
As for operating systems, the new Lenovo ThinkStation P320 Tiny workstation supports both Windows and Linux. The P320 Tiny starts at $799 and is available now.
Micron has made a number of announcements in recent weeks regarding its GDDR memory for graphics cards, game consoles, and networking applications. The company reports that it has been able to hit 16 Gbps data rates in the lab on its latest generation of GDDR5X devices, and it has also reiterated its long-term plans for GDDR6 and GDDR5: GDDR6 memory is due in a couple of quarters, while GDDR5 will remain in production for a long time to come.
Graphics DRAM has been a hot topic in the industry in recent years, as GPU demand for memory bandwidth is growing rapidly and different companies offer different types of memory to satisfy these increasing requirements. For example, SK Hynix and Samsung rolled out HBM (Gen 1 and Gen 2) memory in 2015 and 2016 for ultra-high-end consumer and HPC applications, whereas Micron introduced its GDDR5X for high-end graphics cards last year. At present, HBM offers the greatest potential bandwidth; however, the complexity of the multi-layer chips and 2.5D packaging keeps costs high, so it remains to be seen which mass consumer applications adopt it. Meanwhile, conventional graphics memory, with its BGA packaging and proven architecture, continues to evolve and hit new performance targets thanks to architectural improvements intended to keep it competitive in the coming years.
When Micron announced its GDDR5X memory in late 2015, it set two targets for data transfer rates: the initial target of 10 – 12 Gbps and the longer-term target of 16 Gbps. Initially, the company only supplied GDDR5X ICs validated at 10 and 11 Gbps, but this year the company also started to bin the chips for 12 Gbps. The latter are used on NVIDIA’s Titan Xp graphics card. What is noteworthy is that engineers from Micron’s development center in Munich (also known as Graphics DRAM Design Center) recently managed to run the company’s mass-produced GDDR5X chips at 16 Gbps in the lab.
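To put the 16 Gbps figure in context, each GDDR5X device has a 32-bit interface, so per-chip bandwidth scales linearly with the data rate. The 256-bit card configuration below (i.e. eight chips) is a hypothetical example for illustration, not a specific product:

```python
# Per-chip and per-card bandwidth at a given GDDR5X data rate.
# Each GDDR5X device exposes a 32-bit interface; a hypothetical
# 256-bit card would populate eight such chips.
def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    return data_rate_gbps * bus_width_bits / 8  # bits -> bytes

print(bandwidth_gbs(16, 32))   # 64.0 GB/s per chip at 16 Gbps
print(bandwidth_gbs(12, 256))  # 384.0 GB/s for a 256-bit card at 12 Gbps
print(bandwidth_gbs(16, 256))  # 512.0 GB/s for the same card at 16 Gbps
```

In other words, moving a given card design from today's 12 Gbps bin to 16 Gbps would add a third more memory bandwidth with no change in bus width.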
While the achievement doesn’t have an impact on actual products available today, it has a number of important implications. Primarily, it means that Micron has refined its process to the point where it can build graphics DRAM with 16 Gbps signaling, and this is something the company is going to need going forward. Additionally, it shows that the current GDDR5X technology has untapped potential, and that Micron’s customers might release new products with faster memory.
Micron has been quite busy in the last couple of years working on the GDDR5X memory specification, the physical implementation of such ICs, and then developing the GDDR6 chips that the company plans to launch by early 2018. In fact, GDDR5X and GDDR6 are not that different: both are based on a 16n prefetch architecture, and this is the key to their additional performance compared to GDDR5. Meanwhile, GDDR6 also features a dual-channel mode, which is meant to ensure better channel utilization and hence improve performance in cases that can take advantage of the feature.
| Micron’s GDDR Memory at a Glance | GDDR5 | GDDR5X | GDDR6 |
|---|---|---|---|
| Capacity | 4 Gb – 8 Gb | 8 Gb | 8 Gb |
| Data Rate | 5 – 8 Gbps | 10 – 12 Gbps | Over 12 Gbps |
| Process Technology | 20+ nm, 20 nm; 16 nm (planned) | 20 nm | 16 nm |
Meanwhile, Micron will be using 16 nm fab lines to produce GDDR6 memory devices, which may add frequency potential to the upcoming chips compared to ICs made using the company’s 20 nm fabrication process. Speaking of 16 nm, Micron also plans to use it for newer GDDR5 chips, which makes a lot of sense considering that such devices are going to be used in graphics cards and game consoles for years to come.
Summing up: Micron has demonstrated GDDR5X memory chips running at 16 Gbps in the lab using test equipment; these chips are made using 20 nm process technology. Meanwhile, Micron will be using its 16 nm fabrication process to produce GDDR6, as well as newer GDDR5, by 2018.