DirectX 12 Performance Preview, Part 3: Star Swarm & Intel's iGPUs

DirectX 12 Performance Preview, Part 3: Star Swarm & Intel’s iGPUs

We’re back once again for the 3rd and likely final part to our evolving series previewing the performance of DirectX 12. After taking an initial look at discrete GPUs from NVIDIA and AMD in part 1, and then looking at AMD’s integrated GPUs in part 2, today we’ll be taking a much requested look at the performance of Intel’s integrated GPUs. Does Intel benefit from DirectX 12 in the same way the dGPUs and AMD’s iGPU have? And where does Intel’s most powerful Haswell GPU configuration, Iris Pro (GT3e) stack up? Let’s find out.

As our regular readers may recall, when we were initially given early access to WDDM 2.0 drivers and a DirectX 12 version of Star Swarm, it only included drivers for AMD and NVIDIA GPUs. Those drivers in turn only supported Kepler and newer on the NVIDIA side and GCN 1.1 and newer on the AMD side, which is why we haven’t yet been able to look at older AMD or NVIDIA cards, or for that matter any Intel iGPUs. However as of late last week that changed when Microsoft began releasing WDDM 2.0 drivers for all 3 vendors through Windows Update on Windows 10, enabling early DirectX 12 functionality on many supported products.

With Intel WDDM 2.0 drivers now in hand, we’re able to take a look at how Intel’s iGPUs are affected in this early benchmark. Driver version, these drivers enable DirectX 12 functionality on Gen 7.5 (Haswell) and newer GPUs, with Gen 7.5 being the oldest Intel GPU generation that will support DirectX 12.

Today we’ll be looking at all 3 Haswell GPU tiers, GT1, GT2, and GT3e. We also have our AMD A10 and A8 results from earlier this month to use as a point of comparison (though please note that this combination of Mantle + SS is still non-functional on AMD APUs). With that said, before starting we’d like to once again remind everyone that this is an early driver on an early OS running an early DirectX 12 application, so everything here is subject to change. Furthermore Star Swarm itself is a very directed benchmark designed primarily to showcase batch counts, so what we see here should not be considered a well-rounded look at the benefits of DirectX 12. At the end of the day this is a test that more closely measures potential than real-world performance.

CPU: AMD A10-7800
AMD A8-7600
Intel Core i3-4330
Intel Core i5-4690
Intel Core i7-4770R
Intel Core i7-4790K
Motherboard: GIGABYTE F2A88X-UP4 for AMD
ASUS Maximus VII Impact for Intel LGA-1150
Zotac ZBOX EI750 Plus for Intel BGA
Power Supply: Rosewill Silent Night 500W Platinum
Hard Disk: OCZ Vertex 3 256GB OS SSD
Memory: G.Skill 2x4GB DDR3-2133 9-11-10 for AMD
G.Skill 2x4GB DDR3-1866 9-10-9 at 1600 for Intel
Video Cards: AMD APU Integrated
Intel CPU Integrated
Video Drivers: AMD Catalyst 15.200 Beta
OS: Windows 10 Technical Preview 2 (Build 9926)

Since we’re looking at fully integrated products this time around, we’ll invert our usual order and start with our GPU-centric view first before taking a CPU-centric look.

Star Swarm GPU Scaling - Mid Quality

Star Swarm GPU Scaling - Low Quality

As Star Swarm was originally created to demonstrate performance on discrete GPUs, these integrated GPUs do not perform well. Even at low settings nothing cracks 30fps on DirectX 12. None the less there are a few patterns here that can help us understand what’s going on.

Right off the bat then there are two very apparent patterns, one of which is expected and one which caught us by surprise. At a high level, both AMD APUs outperform our collection of Intel processors here, and this is to be expected. AMD has invested heavily in iGPU performance across their entire lineup, where most Intel desktop SKUs come with the mid-tier GT2 GPU.

However what’s very much not expected is the ranking of the various Intel processors. Despite having all 3 Intel GPU tiers represented here, the performance between the Intel GPUs is relatively close, and this includes the Core i7-4770R and its GT3e GPU. GT3e’s performance here immediately raises some red flags – under normal circumstances it substantially outperforms GT2 – and we need to tackle this issue first before we can discuss any other aspects of Intel’s performance.

As long-time readers may recall from our look at Intel’s Gen 7.5 GPU architecture, Intel scales up from GT1 through GT3 by both duplicating the EU/texture unit blocks (the subslice) and the ROP/L3 blocks (the slice common). In the case of GT3/GT3e, it has twice as many slices as GT2 and consequently by most metrics is twice the GPU that GT2 is, with GT3e’s Crystal Well eDRAM providing an extra bandwidth kick. Immediately then there is an issue, since in none of our benchmarks does the GT3e equipped 4770R surpass any of the GT2 equipped SKUs.

The explanation, we believe, lies in the one part of an Intel GPU that doesn’t get duplicated in GT3e, which is the front-end, or as Intel calls it the Global Assets. Regardless of which GPU configuration we’re looking at – GT1, GT2, or GT3e – all Gen 7.5 configurations share what’s essentially the same front-end, which means front-end performance doesn’t scale up with the larger GPUs beyond any minor differences in GPU clockspeed.

Star Swarm for its part is no average workload, as it emphasizes batch counts (draw calls) above all else. Even though the low quality setting has much smaller batch counts than the extreme setting we use on the dGPUs, it’s still over 20K batches per frame, a far higher number than any game would use if it was trying to be playable on an iGPU. Consequently based on our GT2 results and especially our GT3e result, we believe that Star Swarm is actually exposing the batch processing limits of Gen 7.5’s front-end, with the front-end bottlenecking performance once the CPU bottleneck is scaled back by the introduction of DirectX 12.

The result of this is that while the Intel iGPUs are technically GPU limited under DirectX 12, it’s not GPU limited in a traditional sense; it’s not limited by shading performance, or memory bandwidth, or ROP throughput. This means that although Intel’s iGPUs benefit from DirectX 12, it’s not by nearly as much as AMD’s iGPUs did, never mind the dGPUs.

Update: Between when this story was written and when it was published, we heard back from Intel on our results. We are publishing our results as-is, but Intel believes that the lack of scaling with GT3e stems in part from a lack of optimizations for lower performnace GPUs in our build of Star Swarm, which is from an October branch of Oxide's code base. Intel tells us that newer builds do show much better overall performance and more consistent gains for the GT3e, all the while the Oxide engine itself is in flux with its continued development. In any case this reiterates the fact that we're still looking at early code here from all parties and performance is subject to change, especially on a test as directed/non-standard as Star Swarm.

So how much does Intel actually benefit from DirectX 12 under Star Swarm? As one would reasonably expect, with their desktop processors configured for very high CPU performance and much more limited GPU performance, Intel is the least CPU bottlenecked in the first place. That said, if we take a look at the mid quality results in particular, what we find is that Intel still benefits from DX12. The 4770R is especially important here, as it’s a relatively weaker GPU (base frequency 3.2GHz) coupled with a more powerful GPU. It starts out trailing the other Core processors in DX11, only to reach parity with them under DX12 when the bottleneck shifts from the CPU to the GPU front-end. The performance gain is only 25% – and at framerates in the single digits – but conceptually it shows that even Intel can benefit from DX12. Meanwhile the other Intel processors see much smaller, but none the less consistent gains, indicating that there’s at least a trivial benefit from DX12.

Star Swarm CPU Batch Submission Time - Mid - iGPU

Taking a look under the hood at our batch submission times, we can much more clearly see the CPU usage benefits of DX12. The Intel CPUs actually start at a notable deficit here under DX11, with batch submission times worse than the AMD APUs and their relatively weaker CPUs, and 4770R in particular taking nearly 200ms to submit a batch. Enabling DX12 in turn causes the same dramatic reduction in batch submission times we’ve seen elsewhere, with Intel’s batch submission times dropping to below 20ms. Somewhat surprisingly Intel’s times are still worse than AMD’s, though at this point we’re so badly GPU limited on all platforms that it’s largely academic. None the less it shows that Intel may have room for future improvements.

Star Swarm CPU Scaling - Mid Quality - iGPUStar Swarm CPU Scaling - Low Quality - iGPU

With this data in hand, we can finally make better sense of the results we’re seeing today. Just as with AMD and NVIDIA, using DirectX 12 has a noticeable and dramatic reduction in batch submission times for Intel’s iGPUs. However in the case of Star Swarm the batch counts are so high that it appears GT2 and GT3e are bottlenecked by their GPU front-ends, and as a result the gains from enabling DX12 at very limited. In fact at this point we’re probably at the limits of Star Swarm’s usefulness, since it’s meant more for discrete GPUs.

The end result though is that one way or another Intel ends up shifting from being CPU limited to GPU limited under DX12. And with a weaker GPU than similar AMD parts, performance tops out much sooner. That said, it’s worth pointing out that we are looking at desktop parts here, where Intel goes heavy on the CPU and light on the GPU; in mobile parts where Intel’s CPU and GPU configurations are less lopsided, it’s likely that Intel would benefit more than they do on the desktop, though again probably not as much as AMD has.

As for real world games, just as with our other GPUs we’re in a wait-and-see situation. An actual game designed to be playable on Intel’s iGPUs is very unlikely to push as many batch calls as Star Swarm, so the front-end bottleneck and GT3e’s poor performance are similarly unlikely to recur. But at the same time with Intel generally being the least CPU bottlenecked in the first place, their overall gains under DX12 may be the smallest, particularly when exploiting the API’s vastly improved draw call performance.

In the meantime GDC 2015 will be taking place next week, where we will be hearing more from Microsoft and its GPU partners about DirectX 12. With last year’s unveiling being an early teaser of the API, the sessions this year will be focusing on helping programmers ramp up for its formal launch later this year, and with any luck we’ll find the final details on feature level 12_0 and whether any current GPUs are 12_0 compliant. Along with more on OpenGL Next (aka glNext), it should make for an exciting show for GPU events.