Itanium - is there light at the end of the tunnel?
by Johan De Gelas on November 9, 2005 12:05 AM EST- Posted in
- CPUs
The benefits of TLP...
It is clear that the Itanium core has a big advantage in the area of threading and power dissipation constraints. If you are not convinced, the dual core Itanium Montecito (90 nm process) has no less than 1.72 billion transistors, but it is still able to consume less than 130 W. Compare this with the 300 million transistor Power 5+, which consumes about 170 W on a 90 nm SOI process.
And there is more. X86 CPUs are limited to a maximum of 3 decoded, issued and retired instructions. This might increase to 4 next year. But compared to the best x86 design today - the AMD Opteron -, the Itanium does about 60% more work per clock cycle in integer, and about 115% more work per cycle in floating point. Don't get me wrong, these numbers are no indication of superiority of any kind - clock speed matters just as much. But what these numbers tell you is that x86 designs are less brainiac in nature, and that the x86 ISA limits the ILP much more than IA-64 (we will give more proof in a later article). x86 designs prefer the speed-demon approach with deeper pipelines.
The Itanium can sustain 6 instructions per cycle and can issue up to 11 instructions. A lot of this potential goes to waste, but it also means that the potential gains for Multi-Threading techniques are much higher. While the Pentium 4 Xeon was unable to show any significant performance advantage due to SMT in our server tests[4], Montecito is claimed to be 30% faster in typical database loads, thanks to a Coarse Multi-Threading technique that is less advanced than Hyper Threading.
Itanium's future...
There is no doubt about it, the delay of Montecito and Intel's poor execution is a serious blow to the Itanium family. The Montecito based Itanium 2 has the features that it needs to be competitive in the server world for the next years: dual core, multi-threading and virtualization (Silverdale). Without these features, Itanium is hopelessly behind the competition, especially the dual core Xeon, Opteron and Power 5+. The Xeon and Opteron might still be a bit behind on the RAS features, but this can change quickly and is only important for a small part of the market.
If we ignore Intel's poor execution during the past months and the economic realities, and focus on the architecture, it is clear, however, that the Itanium has time on its side and is most likely the architecture with the highest potential.
Although the Itanium is capable of sustaining a theoretical maximum of 6 instructions and executing up to 11 instructions, and despite its massive register set, it uses fewer transistors for its core than all competitors. The main disadvantage is that it needs much more cache and instruction fetch width, but the disadvantage of needing more cache diminish as process technology gets better (smaller). To improve performance, the Itanium needs much bigger caches than its competitors, but this adds very little to the overall power consumption. As superscalar RISCs in x86 competitors increase their instruction execution width, they need to upgrade the Out-Of-Order buffers and more importantly, increase the complexity of the schedulers. This leads to a much higher complexity and power consumption.
As the focus shifts to Thread Level Parallellism, the Itanium's small cores make it easier to use more cores without increasing the power consumption too much. Montecito will be the living proof of this. The Itanium is also wider than the competition, which results in bigger benefits from threading techniques.
While Itanium may not be very popular in the hardware enthusiast community, it is definitely an architecture that, from an academic and technical point of view, deserves a lot more attention. We'll delve deeper in upcoming articles.
References
[1] The Quest for More Processing Power, Part One: "Is the single core CPU doomed?"
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2343
[2] Hyper-Threading Technology Architecture and Microarchitecture
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
[3] Ace's hardware Specmine
http://www.aceshardware.com/SPECmine/
[4] Linux database server CPU comparison
http://www.anandtech.com/IT/showdoc.aspx?i=2447
It is clear that the Itanium core has a big advantage in the area of threading and power dissipation constraints. If you are not convinced, the dual core Itanium Montecito (90 nm process) has no less than 1.72 billion transistors, but it is still able to consume less than 130 W. Compare this with the 300 million transistor Power 5+, which consumes about 170 W on a 90 nm SOI process.
And there is more. X86 CPUs are limited to a maximum of 3 decoded, issued and retired instructions. This might increase to 4 next year. But compared to the best x86 design today - the AMD Opteron -, the Itanium does about 60% more work per clock cycle in integer, and about 115% more work per cycle in floating point. Don't get me wrong, these numbers are no indication of superiority of any kind - clock speed matters just as much. But what these numbers tell you is that x86 designs are less brainiac in nature, and that the x86 ISA limits the ILP much more than IA-64 (we will give more proof in a later article). x86 designs prefer the speed-demon approach with deeper pipelines.
The Itanium can sustain 6 instructions per cycle and can issue up to 11 instructions. A lot of this potential goes to waste, but it also means that the potential gains for Multi-Threading techniques are much higher. While the Pentium 4 Xeon was unable to show any significant performance advantage due to SMT in our server tests[4], Montecito is claimed to be 30% faster in typical database loads, thanks to a Coarse Multi-Threading technique that is less advanced than Hyper Threading.
Itanium's future...
There is no doubt about it, the delay of Montecito and Intel's poor execution is a serious blow to the Itanium family. The Montecito based Itanium 2 has the features that it needs to be competitive in the server world for the next years: dual core, multi-threading and virtualization (Silverdale). Without these features, Itanium is hopelessly behind the competition, especially the dual core Xeon, Opteron and Power 5+. The Xeon and Opteron might still be a bit behind on the RAS features, but this can change quickly and is only important for a small part of the market.
If we ignore Intel's poor execution during the past months and the economic realities, and focus on the architecture, it is clear, however, that the Itanium has time on its side and is most likely the architecture with the highest potential.
Although the Itanium is capable of sustaining a theoretical maximum of 6 instructions and executing up to 11 instructions, and despite its massive register set, it uses fewer transistors for its core than all competitors. The main disadvantage is that it needs much more cache and instruction fetch width, but the disadvantage of needing more cache diminish as process technology gets better (smaller). To improve performance, the Itanium needs much bigger caches than its competitors, but this adds very little to the overall power consumption. As superscalar RISCs in x86 competitors increase their instruction execution width, they need to upgrade the Out-Of-Order buffers and more importantly, increase the complexity of the schedulers. This leads to a much higher complexity and power consumption.
As the focus shifts to Thread Level Parallellism, the Itanium's small cores make it easier to use more cores without increasing the power consumption too much. Montecito will be the living proof of this. The Itanium is also wider than the competition, which results in bigger benefits from threading techniques.
While Itanium may not be very popular in the hardware enthusiast community, it is definitely an architecture that, from an academic and technical point of view, deserves a lot more attention. We'll delve deeper in upcoming articles.
References
[1] The Quest for More Processing Power, Part One: "Is the single core CPU doomed?"
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2343
[2] Hyper-Threading Technology Architecture and Microarchitecture
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
[3] Ace's hardware Specmine
http://www.aceshardware.com/SPECmine/
[4] Linux database server CPU comparison
http://www.anandtech.com/IT/showdoc.aspx?i=2447
43 Comments
View All Comments
lifeguard1999 - Wednesday, November 9, 2005 - link
Johan has written a good article on the Itanium and its advantages. However, just because something is good from an engineers point of view, does not mean that it will be a market success. There is the business side of the equation that is just as important, and I will be looking forward to future articles on this.I live and play in the HPC world. Historically, this world has been small and based on (for lack of a better word), big-iron chips such as those found in the Cray C-90 (early 1990's technology) to the Cray X1E (today's tech). In the 1990's people clustered together "commodity" PCs (commonly called Beowulf Clusters) which culminated in 1997 Gordon Bell Prize at SC97 for a cluster of 16 Intel Pentium Pros (200 MHz). Today, these cluster-based supercomputers are everywhere (Cray sells a XT3 based on Opterons). The advantage of the cluster-based supercomputer is price/performance, or said another way: cost.
And that is where this ties back into the business case. Can Itanium compete based on cost? Now cost is more than just how much to produce the physical chip. There are systems adminstrator costs, cooling costs, user-needs-to-learn-to-program-it costs, etc. Cooling concerns are coming to the forefront now as supercomputers may need a dedicated power plant in the near future. Imagine, if you will, how much heat 10,000 Opterons can produce and much electricity it consumes (we only have 4096).
SGI was a big seller of supercomputers based on the MIPS chips (low power, low performance, but easy to use). They transitioned over to the Itanium chips and have had a successful run of supercomputers called the Altix. The problem is that anyone can buy a Opteron cluster supercomputer for much less cost than an Itanium supercomputer. While this is not the only reason for the decline of SGI and its recent delisting from the NYSE (inept management is the main reason) it is a contributing factor.
That leaves HP as the largest seller of Itaniums. Did I mention inept management two sentences back? Maybe I should mention it here again.
Itanium may be a great architecture, and it may survive and thrive. Right now however, it appears that there are dark days ahead.
highlandsun - Wednesday, November 9, 2005 - link
There are still a variety of problems that the Altix design can handle more easily than any cluster-based approach. I'm not convinced that the Altix architecture is tied to Itanium, though. It'd be cool to see an Altix-like machine based on Opterons.ksherman - Wednesday, November 9, 2005 - link
seems like a really good article! Too bad most of it goes over my head :(ceefka - Wednesday, November 9, 2005 - link
Me too, I can finally make some sense of what Itanium is all about. It may have potential in a technical sense but until it comes at an affordable price it doesn't stand a chance imho. It's not always the best tech that sells best, Intel knows ;-)xbdestroya - Wednesday, November 9, 2005 - link
Nice Article. I personally don't see too much of a future for Itanium with the environment it's presently operating in coupled with Intel's missteps, but I feel that for all the heat Itanium: 'The Project' often takes, the architecture itself is unduly maligned.Plus, I love to see articles analysing architectures other than the bread-and-butter x86 ones we're used to seeing. Some more on EPIC, Power... Sun/Fujitsu chips - maybe some NEC - let's spice things up!
There was one problem though with the article on the last page though:
"But the best x86 design - the AMD Opteron - does about 60% less work per clock cycle in integer, and about 115% less work per cycle in floating point than the Itanium."
How does something do 115% *less* work per cycle? Obviously not.
JohanAnandtech - Wednesday, November 9, 2005 - link
Mathematiques was never my best course. :-) Indeed the Itanium does 60% more integer work and 115% more FP.Calin - Thursday, November 10, 2005 - link
So the Opteron does just 62.5% integer work and 46.5% floating point work per clock cycle compared to Itanium.I learned this mostly after introduction of VAT in economy
snorre - Wednesday, November 9, 2005 - link
You write:"it is clear, however, that the Itanium has time on its side and is most likely the architecture with the highest potential."
No, that is not true by any standards. I've tested Itanium systems from day one, including several compilers and development tools and I don't see any high potential with this platform. It's over-expensive, under-performing and quite frankly a big flop.
Don't keep this pace maker going any further, please let it die in peace. Some good ideas just dosen't work well in practice, and EPIC is just another like them.
Starglider - Wednesday, November 9, 2005 - link
Here's a scenario I like to imagine. After many years of research, marketing and general toil Intel claim that their new Itanium-5 chip will finally be the one to popularise the platform. The day before the launch, AMD announce their new x86-64+++ architecture, which extends x86 (again) to allow a scheduling/cache/decoding-hints metadata stream interleaved with the main instruction stream. The new design combines the code density and dynamic optimisation of x86 with all the static optimisation power and execution width of Itanium (but done better becasue AMD have learned from Intel's mistakes), is binary compatible with legacy applications at full speed, and has AMD's onboard memory and PCI express controllers. AMD own 90% of the high-end space by the end of the year and Itanium is finally killed off. ;)dexvx - Wednesday, November 9, 2005 - link
You have no idea what you're talking about do you?I'd like to see a scheduler that is both dynamic and static at the same time.