Superscalar to the Rescue

If deepening the pipeline gives us higher clock speeds and more instructions being worked on at a time, but at the expense of lower performance when things aren’t working optimally, what other options do we have for increasing performance?

Instead of going deeper, what about making our chip wider? In our previous example only a single instruction could be active at any given stage in the pipeline - what if we removed that limitation?

A superscalar processor is one that allows multiple instructions to be active at any given stage in the pipeline. Through some duplication of resources you can now have two or more instructions at the same stage at the same time. The simplest superscalar implementation is a dual-issue design, where two instructions can go down the pipe in parallel. Today’s Core 2 and Core i7 processors are four-issue designs (four instructions go down the pipe in parallel); the high end hasn’t been dual-issue since the days of the original Pentium processor.

The benefits of a superscalar chip are obvious: a dual-issue design can potentially double the number of instructions completed per clock. Combine that with a reasonably pipelined, high clock speed architecture and you have the makings of a high-performance processor.

The drawbacks are also obvious: enabling a multi-issue architecture requires more transistors, which drive up die size (cost) and power (heat). Only recently have superscalar designs made their way into mobile devices, thanks to smaller, cooler-switching transistors (e.g. 45nm). You also have to worry even more about keeping the CPU fed with instructions, which means larger caches, faster memory buses and clever architectural tricks to extract as much instruction-level parallelism as possible. A dual-issue chip is a waste if you can’t keep it fed consistently.
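To make the "keeping it fed" point concrete, here is a minimal Python sketch of an in-order issue stage. It assumes every instruction has a one-cycle latency and that an instruction cannot issue in the same cycle as an instruction it depends on. The names (`cycles_to_issue`, `independent`, `chain`) and both instruction streams are hypothetical, and the model counts issue cycles only, ignoring pipeline fill, caches and branches.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction: (name, names of earlier instructions whose results it needs).
Instr = Tuple[str, Set[str]]


def cycles_to_issue(program: List[Instr], width: int) -> int:
    """Count cycles an in-order core of the given issue width needs to issue every instruction."""
    seen: Set[str] = set()
    for name, deps in program:
        # Every dependency must name an earlier instruction, or the loop below would never finish.
        if not deps <= seen:
            raise ValueError(f"{name} depends on an unknown or later instruction")
        seen.add(name)

    issued_at: Dict[str, int] = {}  # instruction name -> cycle in which it issued
    cycle = 0
    i = 0
    while i < len(program):
        slots_used = 0
        while i < len(program) and slots_used < width:
            name, deps = program[i]
            # Assumed one-cycle latency: a result is usable only in a later cycle.
            if all(issued_at[d] < cycle for d in deps):
                issued_at[name] = cycle
                i += 1
                slots_used += 1
            else:
                break  # in-order: a stalled instruction blocks everything behind it
        cycle += 1
    return cycle


# Eight independent instructions versus a chain where each one needs the previous result.
independent = [(f"i{n}", set()) for n in range(8)]
chain = [("i0", set())] + [(f"i{n}", {f"i{n - 1}"}) for n in range(1, 8)]

for label, program in (("independent", independent), ("chain", chain)):
    print(label, "single-issue:", cycles_to_issue(program, 1),
          "dual-issue:", cycles_to_issue(program, 2))
```

With independent work the dual-issue model halves the cycle count (8 to 4); with the dependency chain the second issue slot sits idle every cycle and both widths need 8 cycles, which is exactly the wasted hardware described above.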

Raw Clock Speed

The previous two examples of architectural enhancements are major design improvements. Designing a modern CPU with more pipeline stages, or moving from a single-issue to a dual-issue design, takes a team years; these are not trivial improvements.

A simpler path to improving performance is to just increase the clock speed of the CPU. In the first example I provided, our CPU could only run as fast as its most complex pipeline stage allowed. In the real world, however, there are other limitations on clock speed.
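A minimal sketch of that first constraint, assuming a hypothetical five-stage pipeline with made-up stage delays: the clock period can be no shorter than the slowest stage, plus whatever latch overhead you choose to assume (zero by default here).

```python
# Hypothetical per-stage logic delays in nanoseconds; these are illustrative, not measured.
stage_delays_ns = {
    "fetch": 1.2,
    "decode": 1.0,
    "execute": 1.6,
    "memory": 1.4,
    "writeback": 0.9,
}


def max_clock_mhz(delays_ns: dict, latch_overhead_ns: float = 0.0) -> float:
    """Highest clock the pipeline can run at: one period must fit the slowest stage."""
    period_ns = max(delays_ns.values()) + latch_overhead_ns
    return 1000.0 / period_ns  # a 1 ns period corresponds to 1000 MHz


slowest = max(stage_delays_ns, key=stage_delays_ns.get)
print(f"slowest stage: {slowest}, max clock: {max_clock_mhz(stage_delays_ns):.0f} MHz")
```

Here the 1.6 ns execute stage caps the chip at about 625MHz; speeding up only that stage would raise the ceiling until the 1.4 ns memory stage (about 714MHz) became the new limit.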

Manufacturing issues alone can severely limit clock speed. Even though an architecture may be capable of running at 1GHz, the transistors used to make the chip may only be yielding well at 600MHz. Power is also a major concern. A transistor usually has a range of switching speeds. Our hypothetical 45nm process may be able to run at 300MHz at 0.95V or 600MHz at 1.30V; higher frequencies generally mean higher voltage, which results in higher power consumption - a big issue for mobile devices.
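The usual first-order model for CMOS switching power is P ≈ αCV²f, so the switched capacitance cancels out when comparing two operating points of the same chip. A minimal sketch using the two hypothetical operating points above, assuming performance scales linearly with clock and ignoring leakage (static) power:

```python
def relative_dynamic_power(f_mhz: float, volts: float, f_ref_mhz: float, v_ref: float) -> float:
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f (C cancels out)."""
    return (volts / v_ref) ** 2 * (f_mhz / f_ref_mhz)


low = (300.0, 0.95)   # MHz, volts: the hypothetical low-power operating point
high = (600.0, 1.30)  # MHz, volts: the hypothetical high-frequency operating point

power_ratio = relative_dynamic_power(high[0], high[1], low[0], low[1])
perf_ratio = high[0] / low[0]  # assumes performance tracks clock speed exactly
perf_per_watt_ratio = perf_ratio / power_ratio

print(f"2x clock costs {power_ratio:.2f}x dynamic power; perf/W falls to {perf_per_watt_ratio:.2f}x")
```

Under these assumptions, doubling the clock costs about 3.75x the dynamic power, and performance per watt drops to roughly 0.53x of the low-voltage point.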

The iPhone’s processor is based on a SoC that can operate at up to 600MHz, but for power (and battery life) reasons Apple/Samsung limit the CPU core to 412MHz. The architecture can clearly handle more, but power and battery life hold it back. In general, increasing clock speed alone isn’t a desirable way to improve performance in a mobile device like a smartphone, because performance per watt improves little, if at all.

In terms of sheer performance, however, simply increasing clock speed is preferable to deepening your pipeline and increasing clock speed. With no increase in pipeline depth, you don’t have to worry about keeping any more stages full; everything just runs faster.
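One rough way to see why: model the average cycles per instruction (CPI) as 1 plus the cost of branch mispredictions, where each misprediction flushes a number of cycles that grows with pipeline depth. The branch frequency, misprediction rate and flush penalties below are assumed values for illustration only, and the `mips` helper is hypothetical.

```python
def mips(clock_mhz: float, flush_penalty_cycles: int,
         branch_fraction: float = 0.2, mispredict_rate: float = 0.1) -> float:
    """Millions of instructions per second for a scalar pipeline with a simple stall model."""
    # Every instruction takes 1 cycle, plus the flush cost averaged over all instructions.
    cpi = 1.0 + branch_fraction * mispredict_rate * flush_penalty_cycles
    return clock_mhz / cpi


baseline = mips(500, flush_penalty_cycles=5)             # original design
faster_clock = mips(600, flush_penalty_cycles=5)         # same pipeline, higher clock
deeper_and_faster = mips(600, flush_penalty_cycles=10)   # deeper pipeline at the same higher clock

for label, value in (("baseline", baseline), ("faster clock", faster_clock),
                     ("deeper + faster", deeper_and_faster)):
    print(f"{label}: {value:.0f} MIPS")
```

Both faster designs beat the baseline, but at the same 600MHz the deeper pipeline pays twice the misprediction penalty, so simply raising the clock comes out ahead.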

The key takeaway here is that you can’t just look at clock speed when it comes to processors. We learned this a long time ago in the desktop space, but it seems to be getting glossed over in the smartphone market. A 400MHz dual-issue core is going to be a better performer than a 500MHz single-issue core with a deeper pipeline, and the 528MHz processor in the iPod Touch is nowhere near as fast as the 600MHz processor in the iPhone 3GS, even though their clock speeds differ by only about 14%.
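A rough way to see why clock speed alone misleads: throughput is approximately instructions per clock (IPC) times clock frequency. The sustained IPC figures below are assumptions chosen only to illustrate the arithmetic (a dual-issue core peaks at 2 IPC but rarely sustains it; a single-issue core can never exceed 1); they are not measurements of any real chip.

```python
# Hypothetical sustained IPC values; real numbers depend heavily on the workload.
cores = {
    "400MHz dual-issue": (400, 1.3),
    "500MHz single-issue, deeper pipeline": (500, 0.8),
}

for name, (clock_mhz, ipc) in cores.items():
    # Throughput in millions of instructions per second = clock (MHz) * IPC.
    print(f"{name}: ~{clock_mhz * ipc:.0f} MIPS")

# Clock alone would predict only a ~14% gap between the 600MHz and 528MHz parts.
print(f"600MHz / 528MHz = {600 / 528:.2f}")
```

Under these assumed IPCs the lower-clocked dual-issue core comes out about 30% ahead; other workloads could shrink or widen the gap, but MHz by itself doesn't tell you which core is faster.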

60 Comments

  • iwodo - Thursday, July 9, 2009

    I would really love for Anand to dig deeper and give us some more info. The info I could find for Atom says it has 47 million transistors. Ars reports 40% of that is cache, while others report the core is 13.7 million. In the previous iPhone article, Jarred Walton commented that the x86 decoder no longer matters because 1.5 - 2 million transistors inside a billion-transistor CPU is negligible. However, in the mobile space, 2M inside 13.7M is nearly 15%, not to mention the other transistors needed for this decoding.

    The die area required for Atom is 25mm2 on 45nm (including all cache). Cortex A8 requires 9mm2 (don't know how much cache) on 65nm.

    What is interesting is how Intel managed to squeeze the north bridge inside the Atom CPU (more transistors) while making the die smaller. (I don't know if Intel's slides were referring to the total package size or the die size itself.)
  • snookie - Thursday, July 9, 2009

    The Pre hardware (case, screen, keyboard) is terrible: cheap, plasticky and breaking left and right on people. If Palm survives long enough to get to Verizon etc., here's hoping they come out with better hardware soon. I've used BlackBerries for years, but I see no need for a physical keyboard. With the new iPhone widescreen keyboard I type with both thumbs very quickly, and I have big hands.
  • snookie - Thursday, July 9, 2009

    Jason, Apple has in fact agreed to using mini-usb as a standard. As if that is really a reason to buy a phone or not.

    To say Apple never changes shows no knowledge of the history of Apple, even their recent history.
  • Itaintrite - Wednesday, July 8, 2009

    Heh, it's funny how you say that you can't just look at clock speed, then followed with "the 528MHz processor in the iPod Touch is no where near as fast as the 600MHz processor in the iPhone 3GS." Heh.
  • Anonymous Freak - Wednesday, July 8, 2009

    I want my punch and pie!

    Or a lollipop.

    Good review, I could feel your hunger pangs toward both Palm and Apple toward the end...
  • monomer - Wednesday, July 8, 2009

    Regarding Anand's comments about Android phones needing an upgraded CPU, rumors are that the upcoming Sony Xperia Rachael will be sporting a 1GHz Qualcomm Snapdragon processor (ARM Cortex A8 derivative). Would love to find out the details of this phone when they become available.

    http://www.engadget.com/2009/07/04/sony-ericsson-r...
  • Affectionate-Bed-980 - Thursday, July 9, 2009

    Well, I'd like to see Anand's experience with Android phones. What is it, just the G1? Look at the new Hero or even the G2. What about the Samsung i7500? Sorry, I'm afraid the limited cell phone selection in the US makes it VERY HARD to review cell phones well here. I haven't seen a good cell phone site that's run by people in the US and covers only the US. Phone Arena, Mobile Burn, Phonescoop, GSM Arena: it seems the international guys get a LOT more exposure, and this is why I feel like Anand's comments about phones in general make him sound inexperienced, which I'd bet is the case.

    If you limit yourself to only carrier-offered phones, then I don't think you can make accurate assessments about manufacturers like Nokia or certain OS phones like WinMo, Symbian or even Android, unless the US starts offering more of what the world considers top-notch popular phones.
  • Affectionate-Bed-980 - Wednesday, July 8, 2009

    N97 specs should be 434 MHz ARM11 not 424...
  • Affectionate-Bed-980 - Wednesday, July 8, 2009

    BTW, I don't believe you should be commenting about the N97. Gizmodo is heavily biased towards iPhones, and unless you yourself, Anand, use some Symbian S60 phones in detail, I don't really think you should join in the S60 bashing. I think a lot of us Symbian users AGREE that the platform needs to improve, but considering we were like ZOMG434MHZFAIL, the N97 is not bad in response time if you look at a few videos. The UI exceeded a lot of expectations among the Symbian crowd. If anything, why don't you throw the Samsung i8910 Omnia HD in there instead? That has a Cortex A8 and uses Symbian S60v5 (not to mention it has been out longer than the N97). The other S60v5 phone to come out is the Sony Satio, which also uses a Cortex A8.

    You might as well comment on why Cortex A8 isn't being implemented in all new phones. WinMo phones are still on ARM11, and even HTC's newest announcements are ARM11 chips. The iPhone doesn't define what high end is. If you want to point out that it's unacceptable to have an ARM11 chip in an N97, then it's just as unacceptable for the iPhone not to have a 5MP camera and multitasking.
  • straubs - Wednesday, July 8, 2009

    1. It IS unacceptable for any flagship phone to use ARM11. The iPhone, Pre, and Omnia HD (as you pointed out) all use the Cortex A8, so why wouldn't Nokia put it in its $700 N-series flagship? It doesn't make sense. I'm surprised he didn't mention the crappy screen on the N97.

    2. He did comment on how the iPhone needs multi-tasking and how much he missed the Pre's implementation of it.

    3. Doesn't everyone at this point agree that the number of megapixels in a phone camera is not a huge deal, considering the size of the sensor and optics? I would guess the N97 pictures are better than those from the 3GS, but the difference is nothing like the jump from an ARM11 to an A8.
