Vik


Microsoft Surface Pro 3 Review

I can’t believe it’s only been sixteen months since I published our review of the original Microsoft Surface Pro. It feels like longer but that’s likely because Surface RT made its sale debut a few months prior to that, and both devices were announced in the Summer of 2012. As far as an end user is concerned however, in February 2013 Microsoft released Surface Pro and proceeded to deliver two more iterations of the hardware in sixteen months. That’s three Surface Pros in less than two years. Read on for our review of Surface Pro 3.

ISC 2014: NVIDIA Tesla Cards Add ARM64 Host Compatibility

ISC 2014: NVIDIA Tesla Cards Add ARM64 Host Compatibility

Kicking off this week for the world of supercomputing is the 2014 International Supercomputing Conference in Leipzig, Germany. One of the major supercomputing conferences, ISC is Europe’s largest supercomputing conference and as one would exp…

QNAP TS-x51 NAS Series: Intel Quick Sync Gets its Killer App

At Computex 2014, we visited QNAP and came away with a lot of information (some of which had already been demonstrated at CES). After Computex, QNAP got in touch with me to better explain the various features of the newly introduced TS-x51 series (which was not at CES). And, boy, was I floored?! Usually, you don’t see me getting very excited over a product announcement. However, I believe that QNAP’s TS-x51 family has the capability to revolutionize the NAS market for home users and media enthusiasts, particularly in the way it utilizes Intel Quick Sync technology. Read on to for our analysis of where that market segment is headed, and why the TS-x51’s unique feature set may be the start of interesting things to come.

FPGA news roundup: Microsoft "Catapult", Intel's hybrid and Xilinx OpenCL

FPGA news roundup: Microsoft “Catapult”, Intel’s hybrid and Xilinx OpenCL

There has been some activity in the FPGA realm lately.  First, Microsoft has published a paper at ISCA (a very well-known peer-reviewed computer architecture conference) about using FPGAs in datacenters for page ranking processing for Bing. In a test deployment, MS reported up to 95% more throughput for only 10% more power. The added total cost of ownership (TCO) was less than 30%. Microsoft used Altera Stratix V FPGAs in a PCIe form-factor with 8GB of DDR3 RAM on each board. The FPGAs were connected with each other in a 6×8 torus configuration using a 10Gb SAS network. Microsoft mentioned that it programmed the FPGAs in Verilog and that this hand-coding was one of the challenging aspects of the work. However, Microsoft believes using high-level tools such as Scala (presumably a domain-specific subset), OpenCL or “C-to-gates” tools such as AutoESL or ImpulseC might also be suitable for such jobs in the future. Microsoft appears to be pretty happy with the results overall and hopes to deploy the system in production in 2015.

Intel revealed plans to manufacture a CPU-FPGA hybrid chip that combines Intel’s traditional Xeon-line CPU cores with FPGAs on a single chip. The FPGA and CPU will have coherent access to memory.  The exact details of the chip, such as number of CPU cores or the amount of logic and other resources of the FPGA, or even who is the source for the FPGAs (likely Altera), is not revealed. However, we do know that the chip will be package compatible with the existing Xeon E5 line. Intel mentions that FPGAs can deliver “up to 10x” the performance on unspecified industry benchmarks. Intel further claims its implementation will deliver another 2x improvement (so 20x total) because of coherency and lower CPU-FPGA latency. We will have to wait for more information about this product to validate any of Intel’s claims. It will also be interesting what software and development tools Intel provides for this chip.

Finally, you may remember our previous coverage of OpenCL on Altera’s FPGAs and we had mentioned that Xilinx had some plans for OpenCL as well.  Recently (~2 months ago) Xilinx updated its Vivado design suite and now includes “early access” support for OpenCL.

Overall, these announcements point to increased adoption of FPGAs in mainstream applications. Datacenters are very focused on performance per watt as they tend to be power limited, with increasing performance needs. Progress on scaling performance through multicore CPUs has slowed, and relying on GPUs to increase overall performance per watt has an upper bound as well. In a power constrained environment where two different types of general purpose processors are limited by progress on the process node side, we need to find another option to continue to scale performance. In an ideal world, one may design application-specific integrated circuits (ASICs) to get the highest performance/watt, but ASICs are hard to design and once deployed cannot be changed. This solution is not a good fit for datacenter applications, where the workload (such as algorithms for search) are tweaked and changed over time. FPGAs can offer a happy medium between CPUs and ASICs in that they offer programmable and reconfigurable hardware and can still offer a performance/watt advantage over CPUs because they effectively customize themselves to the algorithm. Making FPGAs accessible to more mainstream application programmers (i.e. those who are used to writing C, C++, Java etc. and not Verilog) will be one of the problems and tools such as OpenCL (and more) are gaining steam in this space.

FPGA news roundup: Microsoft "Catapult", Intel's hybrid and Xilinx OpenCL

FPGA news roundup: Microsoft “Catapult”, Intel’s hybrid and Xilinx OpenCL

There has been some activity in the FPGA realm lately.  First, Microsoft has published a paper at ISCA (a very well-known peer-reviewed computer architecture conference) about using FPGAs in datacenters for page ranking processing for Bing. In a test deployment, MS reported up to 95% more throughput for only 10% more power. The added total cost of ownership (TCO) was less than 30%. Microsoft used Altera Stratix V FPGAs in a PCIe form-factor with 8GB of DDR3 RAM on each board. The FPGAs were connected with each other in a 6×8 torus configuration using a 10Gb SAS network. Microsoft mentioned that it programmed the FPGAs in Verilog and that this hand-coding was one of the challenging aspects of the work. However, Microsoft believes using high-level tools such as Scala (presumably a domain-specific subset), OpenCL or “C-to-gates” tools such as AutoESL or ImpulseC might also be suitable for such jobs in the future. Microsoft appears to be pretty happy with the results overall and hopes to deploy the system in production in 2015.

Intel revealed plans to manufacture a CPU-FPGA hybrid chip that combines Intel’s traditional Xeon-line CPU cores with FPGAs on a single chip. The FPGA and CPU will have coherent access to memory.  The exact details of the chip, such as number of CPU cores or the amount of logic and other resources of the FPGA, or even who is the source for the FPGAs (likely Altera), is not revealed. However, we do know that the chip will be package compatible with the existing Xeon E5 line. Intel mentions that FPGAs can deliver “up to 10x” the performance on unspecified industry benchmarks. Intel further claims its implementation will deliver another 2x improvement (so 20x total) because of coherency and lower CPU-FPGA latency. We will have to wait for more information about this product to validate any of Intel’s claims. It will also be interesting what software and development tools Intel provides for this chip.

Finally, you may remember our previous coverage of OpenCL on Altera’s FPGAs and we had mentioned that Xilinx had some plans for OpenCL as well.  Recently (~2 months ago) Xilinx updated its Vivado design suite and now includes “early access” support for OpenCL.

Overall, these announcements point to increased adoption of FPGAs in mainstream applications. Datacenters are very focused on performance per watt as they tend to be power limited, with increasing performance needs. Progress on scaling performance through multicore CPUs has slowed, and relying on GPUs to increase overall performance per watt has an upper bound as well. In a power constrained environment where two different types of general purpose processors are limited by progress on the process node side, we need to find another option to continue to scale performance. In an ideal world, one may design application-specific integrated circuits (ASICs) to get the highest performance/watt, but ASICs are hard to design and once deployed cannot be changed. This solution is not a good fit for datacenter applications, where the workload (such as algorithms for search) are tweaked and changed over time. FPGAs can offer a happy medium between CPUs and ASICs in that they offer programmable and reconfigurable hardware and can still offer a performance/watt advantage over CPUs because they effectively customize themselves to the algorithm. Making FPGAs accessible to more mainstream application programmers (i.e. those who are used to writing C, C++, Java etc. and not Verilog) will be one of the problems and tools such as OpenCL (and more) are gaining steam in this space.