DDR4 Haswell-E Scaling Review: 2133 to 3200 with G.Skill, Corsair, ADATA and Crucial
by Ian Cutress on February 5, 2015 10:10 AM EST
For any user interested in performance, memory speed is an important part of the equation when building a system. This applies across the board, from integrated graphics throughput to gaming and prosumer environments such as finance or oil and gas. Individuals with an opinion on memory speed fall into two broad camps: those who say faster memory has no effect, and those who insist you ‘make sure you get at least XYZ’. Following on from our previous Haswell DDR3 scaling coverage, we have now secured enough memory kits to perform a thorough test of the effect of memory speed on DDR4 and Haswell-E.
DDR4 vs. DDR3
On the face of it, direct comparisons between DDR4 and DDR3 are difficult to make. With the switch over from DDR2 to DDR3, there were some platforms that could use both types of memory, and we could perform tests on both in the same environment. The current situation with DDR4 limits users to the extreme platform only, where DDR3 is not welcome (except for a few high minimum-order-quantity SKUs which are rarer than hen's teeth). The platform dictates the memory compatibility, and the main characteristics of DDR4 are straightforward.
DDR4 brings to the table a lower operating voltage, down from 1.5 volts to 1.2 volts. This is the main characteristic touted by the memory manufacturers and those that use DDR4. It does not sound like a lot, especially when we can be dealing with systems from 300W to 1200W quite easily under Haswell-E. The quoted numbers are a 1-2W saving per module per system, which for a fully laden home-user desktop might approach 15W at the high end of savings over DDR3, but for a server farm with 1000 CPUs, this means a 15kW saving which adds up. The low voltage specification for DDR4L comes down from DDR3L as well, from 1.35 volts to 1.05 volts.
DRAM Comparison | |||
| Low Voltage | Standard Voltage | Performance Voltage |
DDR | 1.80 V | 2.50 V | |
DDR2 | 1.80 V | 1.90 V | |
DDR3 | 1.35 V | 1.50 V | 1.65 V |
DDR4 | 1.05 V | 1.20 V | 1.35 V |
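The savings arithmetic quoted above can be checked with a short sketch. The function name and the choice of eight modules per desktop (a fully populated Haswell-E board) are assumptions made for illustration; only the 1-2W per-module figure, the roughly 15W desktop figure and the 1000-CPU farm come from the text.

```python
def ddr4_power_saving_w(modules, saving_per_module_w=(1.0, 2.0)):
    """Return the (low, high) power saving in watts for a number of modules.

    saving_per_module_w is the quoted 1-2 W saving per module over DDR3.
    """
    low, high = saving_per_module_w
    return modules * low, modules * high


# Assumption: a fully laden desktop carries 8 DIMMs.
desktop_low_w, desktop_high_w = ddr4_power_saving_w(8)
print(f"Desktop saving: {desktop_low_w:.0f}-{desktop_high_w:.0f} W")

# The server farm figure: 1000 CPUs, each saving about 15 W.
farm_saving_kw = 1000 * 15 / 1000
print(f"Farm saving: {farm_saving_kw:.0f} kW")
```

With eight modules the high end of the range lands just above the 15W the article mentions, which is where the 15kW farm figure comes from.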
The lower voltage is also enhanced by voltage reference ICs before each memory chip in order to ensure that a consistent voltage is applied across each of them individually rather than the whole module at once. With DDR3, a single voltage source was applied across the whole module which can cause a more significant voltage drop, affecting stability. With this new design any voltage drop is IC dependent and can be corrected.
The other main adjustment from DDR3 to DDR4 is the rated speed. DDR3 JEDEC specifications started at 800 MT/s and moved through to 1600 MT/s, while some of the latest Intel DDR3 processors moved up to 1866 and AMD up to 2133. DDR4’s initial JEDEC speed for most consumer and server platforms is set at 2133 MT/s, coupled with an increase in latency; the design intent is that sustained transfers are quicker while overall latency stays comparable to that of DDR2 and DDR3. Technically there is a DDR4-1600 specification for scenarios that want bargain basement memory and are unfazed by actual performance.
As a result of this increase in speed, overall bandwidth is increased as well.
Bandwidth Comparison | |||||
| Bus Clock | Internal Rate | Prefetch | Transfer Rate | Channel Bandwidth |
DDR | 100-200 MHz | 100-200 MHz | 2n | 0.20-0.40 GT/s | 1.60-3.20 GBps |
DDR2 | 200-533 MHz | 100-266 MHz | 4n | 0.40-1.06 GT/s | 3.20-8.50 GBps |
DDR3 | 400-1066 MHz | 100-266 MHz | 8n | 0.80-2.13 GT/s | 6.40-17.0 GBps |
DDR4 | 1066-2133 MHz | 100-266 MHz | 8n | 2.13-4.26 GT/s | 12.80-25.60 GBps |
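Channel bandwidth follows from the transfer rate multiplied by the width of the channel. The sketch below assumes a standard 64-bit (8-byte) channel, which the text does not state explicitly, and the function name is hypothetical. It gives peak theoretical figures only, not measured throughput.

```python
def channel_bandwidth_gbps(transfer_rate_gts, bus_width_bits=64):
    """Peak theoretical bandwidth of one memory channel in GB/s.

    Assumes every transfer moves bus_width_bits across the channel.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_gts * bytes_per_transfer


# Transfer rates in GT/s for a few speed grades discussed in the article.
for label, rate in [("DDR3-800", 0.8), ("DDR3-2133", 2.133),
                    ("DDR4-2133", 2.133), ("DDR4-3200", 3.2)]:
    per_channel = channel_bandwidth_gbps(rate)
    # Haswell-E is quad channel, so the platform peak is four times this.
    print(f"{label}: {per_channel:.2f} GB/s per channel, "
          f"{4 * per_channel:.2f} GB/s quad channel")
```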
Latency moves from DDR3-1600 at CL 11 to DDR4-2133 at CL 15, which was an expected jump as JEDEC tends to increase CL by 2 for a jump in frequency. While a latency of 15 clocks might come across as worse, the fact that the clocks run at 2133 MT/s means the overall performance is still comparable. At DDR3-1600 and CL11, time to initiate a read is 13.75 nanoseconds, compared to 14.06 nanoseconds for DDR4-2133 at CL15, which is a 2% jump.
One of the things that will offset the increase in latency is that CL15 seems to be a common standard no matter what frequency the memory is. Currently on the market we are seeing modules range from DDR4-2133 CL15 up to DDR4-3200 CL15 or DDR4-3400 CL16, marking a read latency down to 9.375 nanoseconds. With DDR3, we saw kits of DDR3-2400 CL10 for 8.33 nanoseconds, showing how aggressive memory manufacturing over the lifetime of the product can increase the efficiency.
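The nanosecond figures above come from multiplying the CAS latency in clocks by the clock period, where the clock runs at half the transfer rate. A minimal sketch of that conversion, with a hypothetical function name and the nominal (rounded) speed grades as inputs:

```python
def cas_latency_ns(transfer_rate_mts, cl):
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is
    half of the transfer rate in MT/s.
    """
    clock_mhz = transfer_rate_mts / 2
    period_ns = 1000.0 / clock_mhz
    return cl * period_ns


# Kits mentioned in the text: (label, transfer rate in MT/s, CL).
kits = [
    ("DDR3-1600 CL11", 1600, 11),
    ("DDR4-2133 CL15", 2133, 15),
    ("DDR4-3200 CL15", 3200, 15),
    ("DDR3-2400 CL10", 2400, 10),
]
for label, rate, cl in kits:
    print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

This makes it easy to compare a kit that is not listed, for example the DDR4-3400 CL16 modules mentioned above, against the JEDEC baseline.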
Another noticeable difference from DDR3 to DDR4 is the design of the module itself.
DDR3 (top) vs DDR4 (bottom)
As with most technology updates notches are shifted in order to ensure that the right product fits in the right hole, but DDR4 changes a bit more than that. DDR4 is now a 288-pin package, moving up from 240-pin in DDR3. As the modules are the same length, this means a reduction in pin-to-pin distance from 1.00 mm to 0.85 mm (with a ±0.13 tolerance), decreasing the overall per-pin contact.
The other big design change is the sticky-out bits in the middle. Moving from pin 35 to pin 47, and back from pin 105 to pin 117, the pin contacts get longer as well as the PCB by 0.5 mm.
This is a gradient change rather than a full quick change:
Initially when dealing with these modules, I had the issue of not actually placing them in the slot correctly when using a motherboard with single sided latches. Over the past couple of weeks it has started to make more sense to place both ends in at the same time due to this protruding design, despite the fact it can be harder to do when on your hands and knees in a case.
Along with the pin size and arrangement, the modules are ever so slightly taller than DDR3 (31.25 mm rather than 30.35 mm) to make routing easier, and the PCB is thicker (1.2 mm from 1.0 mm) to allow for more signal layers. This has implications for future designs, which we will mention later in the review.
There are other non-obvious benefits and considerations baked into the DDR4 design to mention.
DDR4 supports a low-power auto self-refresh (listed in the documentation as LPASR), which does the standard job of refreshing the contents of memory but uses an adaptive algorithm based on temperature in order to avoid signal drift. The refresh modes of each module can also adjust each array independently, and the controller must support a fine-grained optimization routine that tracks which parts of the memory are in use. This has power as well as stability implications for the long-term future of DDR4 design.
Module training when the system boots is also a key feature of DDR4. During the start-up routine, the system must sweep through reference voltages to find a maximum passing window for the speeds selected, rather than just apply the voltage set in the options. The training steps through the voltage reference in increments of 0.5% to 0.8% of VDDQ (typically 1.2V), and the set tolerance of the module must be within 1.625%. Calibration errors of one step size (9.6 mV at 1.2V) are plausible, and the slew margin loss due to calibration error must also be considered. Margins and tolerances have a greater effect on losses with DDR4, and the training ensures stable operation during use. The downside to the user is that the number of modules in the system affects the boot time of the device. A fully laden quad-channel Haswell-E system adds another 5-8 seconds to perform this procedure, and it is something that cannot be circumvented through a different routine without disregarding part of the specifications.
Source: Altera
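To put the training percentages into absolute terms, the sketch below converts a percentage of VDDQ into millivolts. The function name is hypothetical, and VDDQ is taken as the typical 1.2V quoted in the text; a kit running at a higher voltage would scale accordingly.

```python
def vref_step_mv(vddq_v, step_percent):
    """Return a voltage step in millivolts, given as a percentage of VDDQ."""
    return vddq_v * (step_percent / 100.0) * 1000.0


VDDQ_V = 1.2  # typical DDR4 VDDQ quoted in the text

smallest_step_mv = vref_step_mv(VDDQ_V, 0.5)
largest_step_mv = vref_step_mv(VDDQ_V, 0.8)
set_tolerance_mv = vref_step_mv(VDDQ_V, 1.625)

print(f"Step size: {smallest_step_mv:.1f} mV to {largest_step_mv:.1f} mV")
print(f"Set tolerance: {set_tolerance_mv:.1f} mV")
```

The largest step matches the 9.6 mV calibration error quoted above, which is small next to a 1.2V rail and shows why the sweep has to be done per system at boot.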
DDR4 is also designed with the future in mind. Current memory on the market, except what we saw with Intelligent Memory, is a monolithic die solution. The base JEDEC specification will allow for 3D stacking of dies with through-silicon-vias (TSVs) should any memory manufacturer wish to go down this route to increase module density. To support this adjustment there are 3 chip select signals, bringing the total of bank select bits to 7 for a total of 128 possible banks. At current UDIMM specifications, there is provision for up to 8 stacked dies; however, DDR4 is listed only to support x4/x8/x16 ICs with capacities of 2, 4, 8 and 16 Gibit (gibibit). This would suggest that the stacked die configuration is more suited to devices where x-y dimensions are at a premium, or to the server markets. When it comes to higher capacity modules, we have already reported that 16GB UDIMMs should be coming to market, representing an 8*16Gb dual rank arrangement. We are working to make sure we can report on these as soon as they land; however, when it comes to higher density UDIMM parts (i.e. not RDIMM or LRDIMM) we might have to start looking at newer technologies.
There are a significant number of other differences between DDR4 and DDR3, but most of these lie in the electronic engineer/design role for the memory and motherboard manufacturers, such as signal termination, extra programmable latencies and internal register adjustment. For a more in-depth read into these, a good Google search can yield results, although a thorough understanding of Rajinder Gill’s AnandTech piece about ‘Everything You Always Wanted To Know About SDRAM But Were Afraid To Ask’ is a great place to start about general memory operation. I still go back and refer to that piece more frequently than I admit, and end up scratching my head until I reach bone.
120 Comments
Dasa2 - Thursday, February 5, 2015 - link
To back up some of what i said here is a few links
I3 2100 matching 2500k@4ghz in dirt 3
http://www.tomshardware.com/reviews/gaming-fx-pent...
Arma a cpu bottlnecked game where a 2600k@4.3ghz with 2133c9 ram is faster than at 4.9ghz with 1600c11
http://forums.bistudio.com/showthread.php?166512-A...
Thief CPU|RAM performance
http://forums.atomicmpc.com.au/index.php/topic/557...
Bf4 1600c9=60fps 2400c10=70fps
http://www.team-greatbritain.com/call-of-duty-ghos...
Xbit ddr3 review looks a bit different to yours...
http://www.xbitlabs.com/articles/memory/display/ha...
Margalus - Friday, February 6, 2015 - link
And not one of those is using ddr4...
Dasa2 - Friday, February 6, 2015 - link
Nope hence why I would like a decent review site like anandtech to do a proper job of there ddr4 review
Im not expecting a big of a difference from higher speeds quad channel ddr4 by comparison to what can be seen in dual channel ddr3 but even there haswell ddr3 tests showed jack all due to the same problem with there tests so how can we know for sure
FlushedBubblyJock - Sunday, February 15, 2015 - link
You're correct, you made your points, so of course someone without many watts currently on display there said something silly, as usual being stupid pays off and those not dumbed down to base below average levels suffer the frustrating beyond belief consequences.
mrcaffeinex - Friday, February 6, 2015 - link
The problem is that we currently do not have a non-enthusiast platform available that supports DDR4. The new X99 platform is also running quad-channel, so the best comparison to a prior platform would have to be using X79 (attempting to keep as close to apples to apples as possible). The point that can be taken from this article as it is right now, is that you can skip buying insanely-priced DDR4-3000+ memory because your X99 rig will probably not perform noticeably different with DDR4-2133.
As the process matures and more systems adopt DDR4, then you'll be able to do a better comparison across multiple performance levels, but as it is right now, if you're buying into X99, you're buying a high-end CPU. I look forward to the extensive comparative tests that you have mentioned, but I do not see them happening until either the mainstream platform (LGA 115x) is running DDR4 or AMD has any offering that supports DDR4.
Dasa2 - Friday, February 6, 2015 - link
Unfortunately you cant take that from this article as the gaming tests wouldnt show if there was any gain from faster ram even if it did boost cpu performance by 15%
These tests were worse than a complete waist of time from a gaming perspective as they could be very misleading
At a guess i would expect to see somewhere between 3-7% difference going from ddr4 2133 to ddr4 3200 at the same timings although most of that gain will probably be between 2133 and 2666 happy to be proven wrong though
Sushisamurai - Friday, February 6, 2015 - link
although I agree it would be nice to see the impact DDR4 timings and speeds on CPU bound games, I unfortunately don't see the real world application to it. With DDR4, we're working on Haswell-E, which already has a lot of compute power - if we were to run into any CPU bottlenecks, wouldn't it make more sense to spend more of the budget into the CPU instead of RAM? Unless, you had enough money to buy top CPU and top RAM, then the point becomes quite moot no?
Dasa2 - Friday, February 6, 2015 - link
Depends how big the gain is from faster ram doesnt it and we wont know that until its tested properly with the ram speed compared cpu speeds to
Testing cpu or ram performance with gpu bottleneck games is a waist of time unless your AMD trying to sell fx8150...
The only cpu limited games at this stage on Haswell-E will be the ones with bad multithreading support so spending a heap more on the cpu for extra cores from the 5960x wont help
What will help is spending extra for a better overclock and maybe faster ram but how far do you go
tim851 - Friday, February 6, 2015 - link
> The games you chose to review are so badly GPU bottlenecked its sad.
That's why they were running these games at reduced resolutions and IQ settings, Einstein.
What game should Anandtech benchmark that is NOT GPU LIMITED - Quake 3 Arena?
Dasa2 - Friday, February 6, 2015 - link
They shouldnt reduce detail settings just no aa and resolution to 1080p while running a gtx980 or two (r9-290\gtx970\gtx780oc minimum)
But with the likes of dirt 3 even if they do reduce detail settings its still gpu bottlnecked
Arma\Dayz are some of the only games that can be cpu bottlnecked with a single gtx770
Dying Light is very demanding on both cpu and gpu
http://translate.googleusercontent.com/translate_c...
There is a lot of games that can be a bit of a blend of cpu\gpu limitation with enough gpu power although most of these will run 60fps fine on a 5820k a fair few of them wont do 120-144fps
http://translate.google.com/translate?depth=6&...
As they are a blend there limitation can vary from one part of the game to the next for example testing BF4 SP although easier to get consistent results will be far more gpu limited than MP some levels will also be more gpu limited than others
This is why i suggest putting different models and clocks speeds of cpu in against ram speed results so that people can see where the limit really is and where money is best spent