It's this kind of heterogeneous SoC structure with different CPU architectures on a single die that will nail Intel to the wall and prevent them from ever really penetrating the mobile market.
Intel is perfectly capable of doing the same thing, so I'm not sure why you say that. They do it now in a different direction with the on-die SB GPU...
They won't be able to implement a small, energy-efficient processor similar to Kingfisher (Cortex A7). While at performance/complexity levels of A15, the x86 decode penalty is relatively small, as you get down to ~100mW levels at the die area we're talking about, there simply isn't an x86 core out there that is feasible.
Intel already has the CE4100, a SoC designed for TVs. It just isn't a main point they are focusing on right now, as they are competing against IBM RISC (some 200W per CPU is possible on those :eek:), while having to scale all the way down. Intel even has to focus on IGP, wireless, ethernet, etc. Intel is doing the best they can against an overwhelming slew of competitors at all angles in all directions. However, I do not have doubts in their ability to compete when they need to. Right now, they have some big fish to fry (IBM), and it's not like Intel hasn't made ARM chips before (XScale).
Not to mention... they still hold their traditional fab advantage over everyone else.
But do I think Intel may lose out if they don't start making some serious pushes into the smartphone SoC market? Yes, I do. However, Google/Intel have already announced that all future versions of Android will officially support x86, in addition to ARM. This works for existing Android apps, too, due to the JIT nature of Android apps (Dalvik).
"Intel is doing the best they can against an overwhelming slew of competitors at all angles in all directions."
Actually Intel has made the deliberate decision that FULL x86 compatibility trumps everything else. This saddles Atom with a huge amount of basically useless baggage that ARM is not carrying. This baggage takes up space, uses power, and, most importantly, makes it that much more difficult to execute and validate a chip fast.
This is not "doing the best they can". It is a stupid, short-sighted decision made by execs who have drunk their own koolaid and can't imagine that some decisions made by Intel in the 1980s may not be optimal for today. Atom SHOULD have been a stripped down x86-64 CPU with EVERYTHING not essential to that mission jettisoned. One (modern) way of doing everything and the rest --- 286 mode, V x86 mode, PAE, SMM, x87, MMX etc etc tossed overboard. But that ship has sailed. Intel have made their bed, and now they will have to lie in it --- the same mindset that sank iTanic will sink Atom in its target market.
x86 compatibility is not a significant burden on modern chips; transistor density is increasing far faster than the number of x86 instructions is, and Intel's chips have been effectively RISC since the Pentium Pro, when they started translating x86 into micro-ops internally. CISC on the outside, RISC on the inside.
In 2008, when Anand wrote his Atom architecture article (http://www.anandtech.com/show/2493/3), he pointed out AMD had told him that x86 decoding consumed only 10% of the transistor count of the K8, and current transistor densities are ~35x higher than they were then (130nm -> 22nm).
By that math, that means that x86 decoding consumes only about 0.3% of a modern desktop processor, almost inconsequential.
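The arithmetic behind that estimate can be checked with a rough sketch. It assumes ideal area scaling (density grows with the square of the feature-size ratio) and that the decoder's transistor count stays fixed while the total budget grows; both are simplifying assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the scaling argument above.
# Assumption: the decoder keeps the same transistor count while the
# transistor budget of a same-sized die grows with density.

def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal area scaling: density grows with the square of the feature-size ratio."""
    return (old_nm / new_nm) ** 2

def decode_share(old_share: float, gain: float) -> float:
    """Share of the die taken by a fixed-size block once the budget grows by `gain`."""
    return old_share / gain

gain = density_gain(130, 22)       # 130nm -> 22nm
share = decode_share(0.10, gain)   # 10% of the K8, per the quoted figure
print(f"density gain: {gain:.1f}x, decode share: {share:.2%}")
```

The same function cuts the other way for a very small core: if the total budget shrinks instead of grows, the fixed-size decoder takes a larger share.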
It is not a significant burden on modern large chips. The type of profile that the Cortex A7 fits in (~0.5mm^2) will see a large burden from x86 decoding.
On a desktop or even low-profile laptop chip, x86 compatibility doesn't cost much. On a tiny CPU meant to only process a smartphone's background processes, it can cost a lot.
Allow me to explain how English works to you. You see this statement "and, most importantly, makes it that much more difficult to execute and validate a chip fast."? That means if you wish to refute my point, you have to attack the claim that I said was MOST IMPORTANT.
Do you deny that Intel has an insanely complex task designing and validating their chips, vastly more so than ARM? Do you deny that the major reason for that complexity is all the x86 baggage?
Like I wrote below, Atom and Sandy Bridge are different things. Atom does not support SSE4, some models do not support Hyper-Threading, some don't have Intel 64, and it also lacks the new Sandy Bridge AVX.
I am not an expert, but since Intel Atom and SB will have different micro-op caches, unless you write your software for the lowest common denominator, which is the Atom, you can't have the software work the same way as ARM has currently shown with A7 and A9.
Yes, with some software tricks and profiling I suppose the problem isn't very hard to tackle. But in terms of software development ARM should be much easier.
The problem with Atom power-wise was that Intel stupidly decided to saddle it with a chipset that was 3 (?) fab generations behind (Atom: 45nm, chipset: 130nm) and used more power than the actual CPU. I don't know if they have corrected this part of the problem but it seems to be an Intel trait: get most of it right but screw up on the last-mile thing. (Compare that to AMD, which either gets it right (Zacate, Llano) or gets it very wrong (Bulldozer), or ARM, which seems to get almost everything right.)
Good point. Chipset development has been a secondary priority for Intel; this way they ensure the combo solution is good enough for the market to make the volumes they intended. Looking into the past, other chipset makers like Nvidia, AMD and SiS, even VIA at some stage, did a better chipset implementation than Intel.
Atom was not aggressive enough on low power and has little integration of other SoC components, like most of the chipset features, or at least most of the north/south bridges, leaving only external I/O interfaces. A lousy, slow GPU is their burden, so plenty of legacy remains unsolved.
Intel had better be careful because ARM A15 has the capability to upset X86 in future by software emulation with multicore heterogeneous chips.
Well said. Atom has already sunk; Intel is just in denial mode. I was suggesting that Intel swallow their pride and GET an ARM license so they can design and manufacture these chips and compete with Qualcomm and Samsung in the ARM market. At least it would give them some volume instead of having zero in mobile. If they continue with the Atom architecture, they will learn a costly lesson later on. This way they can evolve a single-core low-power SB and maybe a bare-bones Atom core as a small.BIG evolution. It could just secure their Win8 tablets for x86 (if that market ever develops...).
A GPU is not a CPU. And Intel have shown more than once before that they are NOT capable of matching ARM. And I don't expect them to for a long time, if ever.
Yeah, I was thinking "hey, Intel could stick an Atom on a better chip" before he said that.
This IS very interesting (as is what Nvidia is doing even before this), but it's interesting because it's an interesting idea... I don't see how it affects Intel one way or the other. Obviously if other companies can do it, Intel can do it too.
<quote>It's this kind of heterogeneous SoC structure with different CPU architectures on a single die that will nail Intel to the wall and prevent them from ever really penetrating the mobile market. </quote>
The idea of heterogeneous architectures isn't new. ARM is simply applying them differently. IBM's Cell processor (used in the PS3) uses a combination of general-purpose processing core(s) and specialized lighter-weight cores. Quite a while back Intel's vision of the future involved processors with a combination of a few complex heavyweight cores and many lightweight cores (think Larrabee or similar). With power-saving features largely complete and an upcoming GPU that is supposed to be competitive, I wouldn't be surprised if Intel started to get more serious about bringing this to market. They have already made great strides with their tera-scale research: 48 core single chip, 80 core research chip.
What ARM did that was innovative was to use a heterogeneous architecture for the purposes of power savings, and to make it appear homogeneous. I would argue that with Intel's focus on power gating and other power-saving features, the idea of using a heterogeneous architecture to save power hasn't escaped them. However, full instruction set compatibility between the two architectures makes things much simpler as the different cores remain largely transparent to the OS and applications. While it isn't really that hard to develop separate code paths to use more efficient instructions when available, this does raise the complexity on the OS for thread scheduling. Hiding these cores is mostly a convenience, though. It puts the burden of moving to a lower-power core largely on the chip and again reduces the complexity of the thread scheduler.
A more effective use of heterogeneous architectures would be to reveal the presence of all cores to the OS and to individually power gate them. (Individual power states would be even better.) This would allow the use of lower power cores any time for threads that don't require higher performance. I.E. two high performance apps and low OS background tasks would take place on two A15 cores and an A7 rather than three A15s. Further, once the OS starts intelligently assigning tasks to processors, it can become advantageous to have slight differences in the architectures of some cores to support specific tasks.
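The placement described above (two heavy apps on A15 cores, background work on an A7) can be sketched as a simple threshold rule. This is only an illustration: the task names, the demand estimates, and the `little_capacity` cutoff are all hypothetical, and a real scheduler would have to measure demand rather than be told it.

```python
# Toy placement rule for an OS that can see every core.
# Assumption: each task's demand is already known as a fraction (0..1)
# of one big core's throughput; real schedulers have to estimate this.

def place_tasks(tasks: dict[str, float], little_capacity: float = 0.3) -> dict[str, str]:
    """Send a task to an A7 if it fits within the little core's capacity, else to an A15."""
    return {
        name: ("A15" if demand > little_capacity else "A7")
        for name, demand in tasks.items()
    }

placement = place_tasks({"game": 0.9, "video_encode": 0.8, "os_background": 0.1})
for name, core in placement.items():
    print(f"{name:>14} -> {core}")
```

With these made-up numbers the two heavy tasks land on A15 cores and the background work lands on an A7, so the third A15 can stay power gated.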
I see this move as a necessary one to get OS makers and app developers thinking along the lines of heterogeneous processing while providing a progressive migration path. Intel's IA-64 architecture failed largely due to the fact that it forced a clean break from past applications. AMD's AMD64 architecture succeeded because people didn't have to leave behind old applications and code going forward (at least not until they were ready to). That said, I don't think ARM intends to stop here long term. While a Cell-like approach with significantly different cores would be less than optimal, smaller differences like the lack of full NEON or SSE4 support on lower-power cores shouldn't be much of a burden once the OS/apps are smart enough to route threads to a core with the necessary units available.
This is exactly the kind of competition that the market has needed. AMD used to be able to keep at Intel's heels with intelligent decisions and hand-tuning to make the most of being on a mature process node as opposed to a cutting/bleeding edge one. ARM's decisions here represent the logic of applying that at a macroscopic (architectural) level.
I was contemplating this a few months ago, before Kal-El was described in the media and the existence of its extra core was revealed to the public. Something along the lines of 2x Atom + 2x Sandy Bridge cores, with all cores visible to the OS.
The OS should be able to identify each core and allocate the workload to it accordingly, i.e. the OS would grab one of the weaker cores for itself, schedule CPU-intensive processes on the more powerful cores and have one low-power core in reserve just in case (for antivirus etc.).
This would result in CPUs with maximum TDP only ~5-10W above existing models, yet it would allow for far less context switching. There is no point in hiding those weaker cores from the OS; instead the OS should be intelligent enough to utilize them to the fullest extent.
For seamless *running* application migration between the different core types, they should both support the exact same instruction set extensions, which currently Atom and SB don't. I don't think that AMD's Bobcat and Bulldozer do either.
I wouldn't say no to a chip comprising a Bulldozer module or two (like Trinity), and a couple of Bobcat cores as well for lower-power modes. This would surely save a lot of power over even Bulldozer in its lowest operational clock/power state.
However neither AMD nor Intel can compete in power against this ARM technology - A15 for power (around Bobcat performance per core) and A7 for power saving (around 1GHz Atom performance per core I would imagine). As soon as Intel takes a step towards lower power with Atom, ARM moves the goalposts. Even an Atom core implemented at 22nm can't compete with a 28nm 0.5mm^2 core... which is practically free in terms of silicon (even with a small L2 cache added on top).
Wouldn't a dynamic frequency design (like SpeedStep) be a better implementation? Rather than having two different architectures exchanging data and handling different tasks.
DVFS is in use in almost all current-gen SoCs. This certainly does bring with it power saving, but given the present nature of workloads on most mobile devices, the CPU is either in standby (most of the time) or ramped up fully (for most of the remainder). Having cores running at different frequency steps, while a good idea on paper, can prove detrimental to performance if not implemented correctly.
Having a low-power 'companion' core shows power savings more readily, especially given the extremes in mobile CPU workload (standby-to-full-clock). The companion core is capable of running the exact set of tasks as the main-core(s), albeit at lower performance levels. This is completely transparent to the OS and software layers above since they are in fact the exact same architecture (or instruction set, to be clearer).
Even at the lowest frequency and voltage, a complex core will still use more power than a simple core. Take a Cortex A5 compared to a Cortex A15 -- even if you step down the voltage to minimum (~0.7V) on the Cortex A15, it would still consume more power than the Cortex A5 at max speed.
And that's not even accounting for the power savings operating an A5 at lower voltage/frequency would do.
There are issues like transistor leakage, etc that larger cores cannot fully overcome just by clocking down. This is why there's a move to unbalanced MP.
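A rough way to see why clocking down is not enough is the usual first-order model: dynamic power scales as C*V^2*f, and leakage adds a static term on top. The sketch below uses that model with entirely made-up capacitance, leakage, voltage and frequency figures (they are not measurements of any Cortex core), just to show the shape of the argument.

```python
# First-order CMOS power model: dynamic C*V^2*f plus static leakage V*I.
# Every number below is hypothetical and chosen only for illustration.

def core_power_mw(c_eff_nf: float, volts: float, freq_mhz: float, leak_ma: float) -> float:
    """Return estimated core power in milliwatts."""
    dynamic = c_eff_nf * volts ** 2 * freq_mhz   # nF * V^2 * MHz = mW
    static = volts * leak_ma                     # V * mA = mW
    return dynamic + static

# Hypothetical big core at its lowest voltage/frequency step.
big_at_min = core_power_mw(c_eff_nf=1.0, volts=0.7, freq_mhz=500, leak_ma=50)
# Hypothetical small core at its highest voltage/frequency step.
small_at_max = core_power_mw(c_eff_nf=0.15, volts=1.1, freq_mhz=1000, leak_ma=5)

print(f"big core, lowest step: {big_at_min:.0f} mW")
print(f"small core, top step:  {small_at_max:.0f} mW")
```

Under these assumptions the big core still draws more at its floor than the small core does flat out, because its switched capacitance and leakage are both several times larger.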
@gostan: "Wouldn't a dynamic frequency design (like SpeedStep) be a better implementation?"
NO! You cannot change the number of pipelines in the CPU, nor the components it needs: cache, EU, IU, FPU, etc. So the number of transistors that need current stays the same even when each draws less current. If the number of transistors is 1/3, then you get 3x savings! So multiple simpler cores save way more power; it scales well.
Yet when Intel demoed their Claremont prototype they were able to demonstrate scaling by a factor of 1000. This renders the multi-chip approach an expensive crutch.
I remember NEC subscribing to much the same philosophy many years ago when they started doing embedded multi-core development. Did ARM tread on similar ground or is it me?
People have been doing this with ARM designs for ages anyhow, although not necessarily for power efficiency reasons.
Nintendo has done it since the GBA. The GBA shipped with an ARM7 and Z80 and the DS shipped with an ARM7 and ARM9. The 3DS was the first to go homogeneous, with two ARM11 cores.
To go off on a bit of a tangent, the 3DS's CPU is rather disappointing, as two 266MHz ARM11 cores are pretty pathetic, with similar performance to a first-gen iPhone. The PS Vita's quad-core Cortex A9 probably has 10-15x the performance... Makes me kind of regret buying a 3DS ;)
The A7 is at best 1/3 the die size of the A8, but the article doesn't state its power compared to the A8, and I don't understand where the 5x power efficiency is coming from. I am guessing it will be able to deliver double the performance of the A8 while using half the power. (While that is amazing, it is still only 4x power efficiency!!!!)
It talks about powering individual cores up and down. What about having the A7 constantly running tasks on phones, such as signal, phone calls, email, etc., and only using the A9 if there is a need? I.e. delegating tasks to that core only.
The most amazing thing is that the A15 and A7 would appear to be the same to applications. That is unlike the current Atom and Sandy Bridge, where SB supports additional instructions and features. This puts Atom even further away from getting to A7 level.
We all thought with further tweaking, and 22nm die shrink, Atom would only be one or two steps away from ARM on Mobile Phones. Not anymore with Cortex A7.
And we have PowerVR 6 coming out soon plus their Power VR RTX ( Hardware Ray Trace ).
I wonder when ARM will start to tackle the server market.
Too true. Being British, I'm kinda loving the secret surge in British influence on microprocessors/mini GPUs; they have really caught the big boys with their pants down!
Just a quick question to anyone really (hopefully Anand/Brian): has anyone got any news on the ARM Mali T-604 design? Seeing how powerful the Samsung implementation of the Mali 400 was, it will be interesting to see how the T-604 stacks up against the other next-generation GPUs, i.e. PowerVR Series 6 and the Adreno 225-3xx series.
As I am really interested in this topic and I can't find any new info on these designs, has anyone got any info/updates on what new features/API/performance we could expect from the next generation? I heard that the Mali T-604 will be DX11!?
I believe it's too early in the development stage and no public data is available for the T-604 nor Rogue (VR6 series). This should change in the coming months I assume, since they announced both quite a while back
So they are envisioning a SoC with dual A15 and dual A7 cores, the A15 pair used for high performance and the A7 pair used for low power. Is there a way to use both the A15 and A7 together if there is thermal room since they are the same ISA?
From the article: ARM did add that SoC vendors are free to expose all cores to the OS if they would like, although would obviously require OS awareness of the different core types.
You could also focus on advancing battery technology.
I understand battery tech is much more mature than SoC tech, so revolutionary advancements in this field may be wishful thinking, but it would be cool to see what things are being done in the R&D labs of our smartphone battery producers :)
meh that's a dead end. Lithium batteries only store 30% or so of what they are capable of storing. Why don't they? Because at full energy density that energy is just asking to get out-- a pin prick into the cell would set it on fahr.
There are technologies that are much safer than Li-ion currently in development. While Pros/Cons differ between different battery technologies, developing a better battery is never a dead end.
That said, I'm certain that there are advancements in battery technology all the time, we just read articles about them.
Battery technology is by no means "more mature". The techniques developed that are in use today are actually pretty recent advances when you consider that high-output LiFePO4 batteries are a pretty recent invention, and aren't even really on the market yet.
Current batteries are dangerous because they use LiCoO2, which decomposes at high temperature to release lithium metal and oxygen. I think it's obvious why that's a bad setup, but we already have a number of potential solutions fairly far along through the research stage.
Other potential advances have come from some recent potentially game-changing success in 3D batteries, which use materials such as aerogels, foams, etc. to give exceptionally high surface areas, which gives them the potential to have much higher power and energy densities. This system is just more difficult because it requires even coatings of subsequent layers, which requires somewhat complicated, though potentially very efficient, chemical methods.
So battery research is extremely important, and there's a lot of progress being made, though it's suffering right now due to research budget shortfalls, at least in the USA.
While this is a nice article, I really hate that people usually forget the Marvell Armada 628 -- a heterogeneous triple-core -- and just talk about NVidia Kal-El. Otherwise, a combination of a single-core A7 + 4-8 cores of A15 looks like the killer SoC. :-)
@Kgardas: While the chip looks good on paper, it supports USB3 which is no use in mobile, so it is likely to be used in NAS, MediaPlayer and STB applications. The chip also looks big in size so I guess its power consumption on full bore is significant in ARM terms.
The OMAP family also does this with Cortex-A and Cortex-M cores put together in a die. It's quite similar to what ARM describes here, except of course the instruction sets are not the same
But it is very hard to realize. Today's OSes are not application aware, meaning the OS would not know if a thread is from application X and needs 1GHz vs. another thread from application Y that only requires 1MHz. As such, it would not be able to dynamically move threads from one core to another with a guarantee of not missing deadlines. If the small core is dedicated to housekeeping threads only (i.e. sync, standby, etc.), that is all good, but there is no need to do that anyway because such tasks are so infrequent (every hundred ms or so). Therefore, you can just wake up the big core and then shut it down.
I don't think it's so hard. The scheduler would start all processes on the slow core, and if the CPU utilization doesn't exceed its maximum over a very short period of time, keep it there because it obviously doesn't need the extra processing power.
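The heuristic described above could look something like the sketch below: every task starts on the slow core and is only promoted once its utilization has stayed pinned at the maximum for a short window. The class name, window length and threshold are hypothetical, and this models only the decision, not the migration itself.

```python
from collections import deque

# Toy version of "start on the slow core, promote only if it stays saturated".
# Assumption: the OS samples per-task utilization (0.0..1.0) at a fixed interval.

class CorePicker:
    def __init__(self, window: int = 3, threshold: float = 0.95):
        self.samples = deque(maxlen=window)  # most recent utilization samples
        self.threshold = threshold           # what counts as "maxed out"
        self.core = "little"                 # every task starts on the slow core

    def observe(self, utilization: float) -> str:
        """Record one sample and return the core the task should run on."""
        self.samples.append(utilization)
        window_full = len(self.samples) == self.samples.maxlen
        if self.core == "little" and window_full and min(self.samples) >= self.threshold:
            self.core = "big"                # saturated for the whole window
        return self.core

picker = CorePicker()
history = [picker.observe(u) for u in (0.5, 1.0, 1.0, 1.0)]
print(history)
```

A single spike does not trigger a move; the task is promoted only on the fourth sample, once all three samples in the window are at or above the threshold. A real policy would also need a rule for moving back down.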
The Android schedulers (CFS and BFS) are nanosecond-aware, so the latency penalty could be managed.
Of course, it would be best if the programmer could explicitly place their program into a core, but you can already do that with sched_realtime, sched_fifo, and sched_batch policies. The question is really how far Android optimizes for this sort of thing. Right now I think they treat everything as realtime fifo queues, instead of letting the built-in Linux schedulers do their thing.
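One caveat on explicit placement: on Linux the scheduling policies decide how a thread is scheduled rather than where, and choosing a core is normally done through CPU affinity. Below is a minimal sketch using Python's `os.sched_setaffinity`, which is only available on some platforms (Linux among them). Treating the lowest-numbered allowed CPU as the "little" core is purely an assumption for illustration; which core IDs are the small ones depends on the SoC.

```python
import os

def pin_to_one_cpu():
    """Pin the calling process to a single CPU it is already allowed to use.

    Returns the new affinity set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                        # platform without affinity support
    allowed = os.sched_getaffinity(0)      # CPUs this process may run on
    target = min(allowed)                  # hypothetical "little" core: lowest id
    os.sched_setaffinity(0, {target})      # pid 0 means the calling process
    return os.sched_getaffinity(0)

print(pin_to_one_cpu())
```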
Keep in mind that there are performance and latency penalties when powering the big core on/off. When you power up the big core, it takes time (to power up, reset and boot). Its L2$ is empty, so it needs to be warmed up. All of this adds a performance impact that is very hard for the OS to take into account.
As you said, if the programmer can specify performance/deadline/CPU requirements, then everything would be simple. However, we can't expect that from the millions of developers out there. It's just not practical.
Boot? They would lie there as cores in a low-power state, of which there would be several; it won't take tens of seconds to start up. I'm sure they can handle the power states, clocks and scheduling pretty well in the OSes. The OS would know when it's in a power-saving mode or not, or when it needs the performance or not. It would very much depend on power profiles and so on. I would not expect them to be in a deep sleep mode every time resources are needed. But its point is of course battery life, not performance. A flag on programs that want to run on the big cores would probably be easy to implement on a system like Android, I would think, as it's not just pure Linux ELFs. But we will see what kind of schemes there will be soon, I guess.
Leakage is the reason they have "big-small" setup. So if the big core is not used, it will be "shut-down". So it will boot up from ROM when woken up. In 28 and 20nm, leakage will be the dominating power factor.
You'd be surprised. Most modern OS's (including Android) have not only profiling but API support for applications to poll for resources such as CPU. Most of the time, you won't be seeing something like Pandora take the CPU up to 100% even if it could, in theory, burst process a lot of data and then go to sleep.
The problem is that a lot of things can indeed be done faster -- web browser rendering is one primary example of something that would hog up as much CPU as it can.
And there's not really a way for a user to specify "hey, I don't mind if the page renders slower, stop using so much power".
True, the OSes do profiling, but it is hardly accurate enough to guarantee real-time performance. Yes, the API would help, but not many applications, if any, specify the resources they need, as that would depend on the system. It's just not practical to require software developers to specify the resources needed given how many developers and applications we have today. Big guys like Pandora, sure, but not the millions of little guys out there. Deciding when to switch back and forth between the LITTLE and big cores is hard because it's not free. It costs power and performance (latency). If you switch too often, then you end up using more power. The problem is there is no fixed criterion for switching. If you have the little core handle "system tasks" and the big core handle applications (like Tegra 3), then it may work. However, that only helps standby power and won't do much to extend web-browsing time.
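The "switching is not free" point can be put as a break-even calculation: a move to the little core only pays off if the task stays there long enough for the power saved to cover the energy spent switching. The sketch below uses hypothetical figures for the switch energy and for the power each core draws on the same light task.

```python
# Break-even residency for a big -> little -> big round trip.
# All figures are hypothetical; real values depend on the SoC and workload.

def min_residency_ms(switch_energy_uj: float, big_power_mw: float, little_power_mw: float) -> float:
    """Time the task must spend on the little core before the switch saves energy."""
    saving_mw = big_power_mw - little_power_mw
    if saving_mw <= 0:
        raise ValueError("little core must draw less power than the big core")
    return switch_energy_uj / saving_mw      # uJ / mW = ms

# Hypothetical: 200 uJ for the round trip, 300 mW on big vs 100 mW on little.
break_even = min_residency_ms(switch_energy_uj=200, big_power_mw=300, little_power_mw=100)
print(f"stay on the little core at least {break_even:.1f} ms")
```

If the expected stay is shorter than that, switching costs more energy than it saves, which is one way to frame the missing switching criterion.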
The Cortex-A5 processor is the smallest, lowest power ARM multicore processor capable of delivering the Internet to the widest possible range of devices: from ultra low cost handsets, feature phones and smart mobile devices, to pervasive embedded, consumer and industrial devices. The Cortex-A5 processor is fully application compatible with the Cortex-A8, Cortex-A9, and Cortex-A15 processors, enabling immediate access to an established developer and software ecosystem including Android, Adobe Flash, Java Platform Standard Edition (Java SE), JavaFX, Linux, Microsoft Windows Embedded, Symbian and Ubuntu. Cortex-A5 benefits include:
- Full application compatibility with the Cortex-A8, Cortex-A9, and Cortex-A15 processors
- Provides a high-value migration path for the large number of existing ARM926EJ-S™ and ARM1176JZ-S™ processor licensees.
- 1/3 the power and area of Cortex-A9, with full instruction set compatibility.
The Cortex-A5 was announced in 2009 and apparently there hasn't been much demand for it (according to one article). At 1.57 DMIPS / MHz (according to the ARM page) it's significantly weaker than the A8, and I figure that was one problem. My guess is that the Cortex-A7 is a response to that, with higher clock rates and performance that should surpass A8 in most cases.
But it wasn't supposed to replace Cortex A8, but ARM11, which is still in all low-end Android phones today, and I hate it. Cortex A5 with close to Cortex A8 performance, and 3x more efficient, would've been a really nice replacement for ARM11.
Because 8 is an even number, and ARM was cursed so that every even-numbered architecture they make is bad. You don't hear about ARM6, ARM8 or ARM10, but ARM7, ARM9 and ARM11 are still very much alive everywhere (low end, Tegra2, etc). The Cortex A8 was a bit more successful because of the new instruction set and raw power, but it was still a bad design. We probably won't be hearing about any new SoC using it in the future.
Cortex A9, A5, A15 and now A7. ARM is on a roll now, as they stopped being stubborn and are side-stepping the even numbers. ;-)
The Cortex A8 is bigger and theoretically faster clock for clock than the Cortex A7, even if in practice it will likely be slower because of its laughably slow FPU, lower efficiency and lower core counts. And as it isn't faster than A9, 7 is the logical number to use.
Anand, could you do an article on xx-bit CPUs? With 64-bit x86 CPUs we get two major benefits: memory addressing space, and extra registers for faster performance. But other than that, how many programs actually use 64-bit integer and floating point?
ARM A7 / A15 seems to provide 40-bit addressing, 1TB of memory or 256 times more than the current 4GB limit. I remember Intel also had 40-bit memory addressing, but it required software, OS and BIOS working together and it didn't work very well for software development. Is this still the case with ARM?
I can't wait to boot Ubuntu on those. With little tweaks we'll be able to have nice threads go to A7 and others to A15 automatically. Quad cores at 1-1.5 GHz should be enough for mostly anything on Linux. And if we get it packaged with 543MP2 (and good drivers) this would kill x86.
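The address-space figures are easy to verify: each extra address bit doubles the reachable memory, so going from 32 to 40 bits multiplies it by 2^8. A quick check, with a helper name made up for illustration:

```python
# Byte-addressable memory reachable with a given number of address bits.
GIB = 2 ** 30

def addressable_bytes(bits: int) -> int:
    return 2 ** bits

limit_32 = addressable_bytes(32)   # the familiar 4 GiB limit
limit_40 = addressable_bytes(40)   # 40-bit physical addressing
ratio = limit_40 // limit_32

print(limit_32 // GIB, "GiB vs", limit_40 // GIB, "GiB, ratio", ratio)
```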
Hey, I posted a comment earlier, replying to someone else, asking a question or two, nothing impolite or anything like that, and when I looked back to see if I got an answer my comment had been removed!?? Why??
Does anyone else find their naming conventions infuriating? 11->7->8->9->15 HUH? plus they label their instruction sets with an A and a similar but different number... and the other licensees of ARM have their own gimpy naming conventions with one for the base core and one for the chip and so on and so forth...
I don't think A15 is really a replacement of A9. Each member of the Cortex-A family seems to have its place, except the A8 (which is the oldest and obsolete).
A5: very small, low power - though once you include larger L1 caches and NEON it seems A7 would be a better fit as it's hardly smaller anymore.
A7: small, low power, probably highest perf/power and perf/area of the whole family. Unlike A5, L1 cache sizes are fixed and NEON always included, and with all features of the A15 (including virtualization for instance).
A8: a dud, unlike all others not MP capable and with nonpipelined FPU. Worst efficiency of the family (by a large margin) in perf/area and perf/power. I don't think there's any reason at all why you'd want to use this in a new design. In some areas it might be faster than A7 (I think NEON might be twice as fast).
A9: similar size to A8 but quite a bit more advanced (out-of-order) and with higher efficiency.
A15: quite a beast compared to A9, much more complex and faster, but much bigger - the first to target low-power servers too. Efficiency might be similar to A9, not sure.
Of course, this completely ignores the timeframe - A8 was the only option for quite some time, and apart from that only A9 has made it to devices yet (I think we should see A5 soon enough - MSM7227A has Cortex-A5 and possibly quite a few low-end smartphones might use it).
I do see the A15 as a replacement for the A9. The high end was A9 and will be A15. The mid range was A8 and will be A7. Both will offer significant performance increases. A9 will survive a little longer (as will the A8, probably), but I don't think it has a real place between the A15 and A7.
As for the names, ARM Cortex family names reflect core complexity or size, as far as I understand, not how new the core is.
Intel has its ace, the Atom E6xx, which has already been proved to be much more power efficient than ARM. For example, the Sony PRS T1 is using an Intel 1GHz processor (Atom E640) in its e-book reader running on Android.
Good point, though there should really be a rather large difference in performance (and in chip size) between an A7 and an A15, hence I think there's still some room for A9 (which should still be a fair bit faster than A7 after all) - say for a low-end smartphone in 2013. I guess though this will rather depend on the licensing cost differences (afaik the less complex designs are cheaper) between A7/A9/A15.
I don't think the replacements are intentional. It is the markets overlapping when one product's power and efficiency leap forward.
It is like how a Pentium Sandy Bridge was never meant to replace the good old Core 2 Duo, but Sandy Bridge is just so much better and cheaper that it happened to replace it.
is that ARM cannot boost core performance without sucking up more power. The power versus performance chart shows a straight line. So they have to employ this trick to maintain power efficiency, while getting enough horse power for difficult tasks.
Believe Intel has done this sort of thing already. What else is new?
No one can boost performance without sucking more power. But ARM still have many more things to do with IPC. And as a matter of fact, a Quad Core Cortex A15 @ 2.5Ghz is very capable, and faster then a Core 2 Duo.
"No one can boost performance without sucking more power." I beg your pardon ? I hope what you mean is that that`s impossible over one architecture type and/or manufacturing process ? Because if not, you`re talking nonsense - how about Intel SB compared to Nehalem or Ivy Bridge which is said to provide both up to 15-20% boost over SB while still lowering TDP for some cpu`s (e.g. 77 W desktop quadcore) ?
Didn't Nvidia do some type of demo with Tegra 3 where it beat a Core 2 Duo? It seemed a bit suspicious to me, but if that is true then a quad Cortex-A15 might match up against a low-end Core 2 Quad!
We really need to find a way to keep a display from draining the battery. If they can use these processors so that you can keep the brightness but have as little impact as possible from the display draining the battery, we might have something here to jump on. I mean, the phones today use about 3/4 of the battery life on the display alone (and I'm not even talking about the LTE band for you people on Verizon), so this would be a really nice feature. I hear they're already trying some battery-saving features in Ice Cream Sandwich (4.0). Any takes on this?
Would you please do a quick comparison of modern x86 vs. ARM solutions in terms of performance and power efficiency? The arguments keep going back and forth, and some reliable numbers would be very useful. Thanx!
75 Comments
dagamer34 - Wednesday, October 19, 2011 - link
It's this kind of heterogeneous SoC structure with different CPU architectures on a single die that will nail Intel to the wall and prevent them from ever really penetrating the mobile market.
A5 - Wednesday, October 19, 2011 - link
Intel is perfectly capable of doing the same thing, so I'm not sure why you say that. They do it now in a different direction with the on-die SB GPU...
metafor - Wednesday, October 19, 2011 - link
They won't be able to implement a small, energy-efficient processor similar to Kingfisher (Cortex A7). While at performance/complexity levels of A15, the x86 decode penalty is relatively small, as you get down to ~100mW levels at the die area we're talking about, there simply isn't an x86 core out there that is feasible.
jeremyshaw - Wednesday, October 19, 2011 - link
Intel already has the CE4100, a SoC designed for TVs. It just isn't a main point they are focusing on right now, as they are competing against IBM RISC (some 200W per CPU is possible on those :eek:), while having to scale all the way down. Intel even has to focus on IGP, wireless, ethernet, etc. Intel is doing the best they can against an overwhelming slew of competitors at all angles in all directions. However, I do not have doubts in their ability to compete when they need to. Right now, they have some big fish to fry (IBM), and it's not like Intel hasn't made ARM chips before (XScale).
Not to mention... they still hold their traditional fab advantage over everyone else.
But do I think Intel may lose out if they don't start making some serious pushes into the smartphone SoC market? Yes, I do. However, Google/Intel have already announced that all future versions of Android will officially support x86, in addition to ARM. This works for existing Android apps, too, due to the JIT nature of Android apps (Dalvik).
jeremyshaw - Wednesday, October 19, 2011 - link
the next time I post on a tablet... I won't, lol.
name99 - Wednesday, October 19, 2011 - link
"Intel is doing the best they can against an overwhelming slew of competitors at all angles in all directions."
Actually Intel has made the deliberate decision that FULL x86 compatibility trumps everything else. This saddles Atom with a huge amount of basically useless baggage that ARM is not carrying. This baggage takes up space, uses power, and, most importantly, makes it that much more difficult to execute and validate a chip fast.
This is not "doing the best they can". It is a stupid, short-sighted decision made by execs who have drunk their own koolaid and can't imagine that some decisions made by Intel in the 1980s may not be optimal for today. Atom SHOULD have been a stripped down x86-64 CPU with EVERYTHING not essential to that mission jettisoned. One (modern) way of doing everything and the rest --- 286 mode, V x86 mode, PAE, SMM, x87, MMX etc etc tossed overboard. But that ship has sailed. Intel have made their bed, and now they will have to lie in it --- the same mindset that sank iTanic will sink Atom in its target market.
Guspaz - Wednesday, October 19, 2011 - link
x86 compatibility is not a significant burden on modern chips; transistor density is increasing far faster than the number of x86 instructions is, and Intel's chips have been effectively RISC since the Pentium Pro, when they started translating x86 into micro-ops internally. CISC on the outside, RISC on the inside.
In 2008, when Anand wrote his Atom architecture article (http://www.anandtech.com/show/2493/3), he pointed out AMD had told him that x86 decoding consumed only 10% of the transistor count of the K8, and current transistor densities are ~35x higher than they were then (130nm -> 22nm).
By that math, that means that x86 decoding consumes only about 0.3% of a modern desktop processor, almost inconsequential.
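For anyone who wants to check the arithmetic in the comment above, here is a rough sketch. It assumes ideal area scaling (density grows with the square of the linear shrink) and that the decoder's transistor count stays fixed while the total budget grows; both are simplifications, and the variable names are just for illustration.

```python
# Rough check of the scaling argument (assumptions labelled inline).
old_node_nm = 130.0      # K8-era process, as quoted in the comment
new_node_nm = 22.0       # process the comment compares against
decode_share_k8 = 0.10   # "10% of the transistor count", as quoted

# Assumption: ideal scaling, so density grows with the square of the shrink.
density_gain = (old_node_nm / new_node_nm) ** 2

# Assumption: the decoder keeps the same transistor count while the
# total transistor budget grows in step with density.
decode_share_now = decode_share_k8 / density_gain

print(f"density gain: {density_gain:.1f}x")
print(f"decode share: {decode_share_now:.2%}")
```

This lands at roughly 35x and roughly 0.3%, matching the figures in the comment. It says nothing about small cores, where the fixed decoder cost is a much bigger slice of the area.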
metafor - Wednesday, October 19, 2011 - link
It is not a significant burden on modern large chips. The type of profile that the Cortex A7 fits in (~0.5mm^2) will see a large burden from x86 decoding.
On a desktop or even low-profile laptop chip, x86 compatibility doesn't cost much. On a tiny CPU meant to only process a smartphone's background processes, it can cost a lot.
name99 - Wednesday, October 19, 2011 - link
Allow me to explain how English works to you. You see this statement "and, most importantly, makes it that much more difficult to execute and validate a chip fast."?
That means if you wish to refute my point, you have to attack the claim that I said was MOST IMPORTANT.
Do you deny that Intel has an insanely complex task designing and validating their chips, vastly more so than ARM? Do you deny that the major reason for that complexity is all the x86 baggage?
iwod - Thursday, October 20, 2011 - link
Like I wrote below, Atom and Sandy Bridge are different things. Atom does not support SSE4, some models do not support Hyper-Threading, some don't have Intel 64, and it also lacks Sandy Bridge's new AVX.
I am no expert, but since Intel Atom and SB will have different micro-op caches, unless you write your software for the lowest common denominator, which is the Atom, you can't have the software work the same way as ARM has currently shown with A7 and A9.
Yes, with some software tricks and profiling I suppose the problem isn't very hard to tackle. But in terms of software development ARM should be much easier.
Manabu - Saturday, October 22, 2011 - link
The decode portion of the chip also has to grow if one wants higher IPC/power efficiency. So it is probably more than 0.3% nowadays.
fic2 - Wednesday, October 19, 2011 - link
The problem with Atom power-wise was that Intel stupidly decided to saddle it with a chipset that was 3 (?) fab generations behind (Atom: 45nm, chipset: 130nm) and used more power than the actual CPU. I don't know if they have corrected this part of the problem, but it seems to be an Intel trait: get most of it right but screw up on the last mile. (Compare that to AMD, which either gets it right (Zacate, Llano) or gets it very wrong (Bulldozer), or ARM, which seems to get almost everything right.)
fteoath64 - Thursday, October 20, 2011 - link
Good point. Chipset development has been a secondary priority for Intel; this way they ensure the combo solution is good enough for the market to make the volumes they intended. Looking into the past, other chipset makers like Nvidia, AMD and SiS, even VIA at some stage, did a better chipset implementation than Intel.
Atom was not aggressive enough in leveraging low power and has little integration of other SoC components like most of the chipset features. It should at least integrate most of the north/south bridges, leaving only external I/O interfaces. A lousy slow GPU is their burden, so plenty of legacy not solved.
Intel had better be careful because ARM A15 has the capability to upset X86 in future by software emulation with multicore heterogeneous chips.
fteoath64 - Thursday, October 20, 2011 - link
Well said. Atom has already sunk; Intel is just in denial mode. I was suggesting that Intel swallow their pride and GET an ARM license so they can design and manufacture these chips and compete with Qualcomm and Samsung in the ARM market. At least it will give them some volume instead of having zero in mobile.
If they continue with the Atom architecture, they will learn a costly lesson later on. This way they can evolve a low-power single-core SB and maybe a bare-bones Atom core into a small.BIG design. It could just secure their Win8 tablets for x86 (if that market ever develops...).
B3an - Wednesday, October 19, 2011 - link
A GPU is not a CPU. And Intel have shown more than once before that they are NOT capable of matching ARM. And I don't expect them to for a long time, if ever.
Wolfpup - Wednesday, October 19, 2011 - link
Yeah, I was thinking "hey, Intel could stick an Atom on a better chip" before he said that.
This IS very interesting (as is what Nvidia is doing even before this), but it's interesting because it's an interesting idea...I don't see how it affects Intel one way or the other. Obviously if other companies can do it, Intel can do it too.
JPForums - Thursday, October 20, 2011 - link
<quote>It's this kind of heterogeneous SoC structure with different CPU architectures on a single die that will nail Intel to the wall and prevent them from ever really penetrating the mobile market. </quote>
The idea of heterogeneous architectures isn't new. ARM is simply applying them differently. IBM's Cell processor (used in the PS3) uses a combination of general purpose processing core(s) and specialized lighter weight cores. Quite a while back Intel's vision of the future involved processors with a combination of a few complex heavyweight cores and many lightweight cores (think Larrabee or similar). With power saving features largely complete and an upcoming GPU that is supposed to be competitive I wouldn't be surprised if Intel started to get more serious about bringing this to market. They have already made great strides with their tera-scale research: 48 core single chip, 80 core research chip.
What ARM did that was innovative, was to use a heterogeneous architecture for the purposes of power savings, and to make it appear homogenous. I would argue that with Intel's focus on power gating and other power saving features, the idea of using a heterogeneous architecture to save power hasn't escaped them. However, full instruction set compatibility between the two architectures makes things much simpler as the different cores remain largely transparent to the OS and applications. While it isn't really that hard to develop separate code paths to use more efficient instructions when available, this does raise the complexity on the OS for thread scheduling. Hiding these cores is mostly a convenience, though. It puts the burden of moving to a lower power core largely on the chip and again reduces the complexity of the thread scheduler.
A more effective use of heterogeneous architectures would be to reveal the presence of all cores to the OS and to individually power gate them. (Individual power states would be even better.) This would allow the use of lower power cores any time for threads that don't require higher performance. I.E. two high performance apps and low OS background tasks would take place on two A15 cores and an A7 rather than three A15s. Further, once the OS starts intelligently assigning tasks to processors, it can become advantageous to have slight differences in the architectures of some cores to support specific tasks.
I see this move as a necessary one to get OS makers and app developers thinking along the lines of heterogeneous processing while providing a progressive move-over path. Intel's IA-64 architecture failed largely due to the fact that it forced a clean break from past applications. AMD's AMD64 architecture succeeded because people didn't have to leave behind old applications and code going forward (at least not until they were ready to). That said, I don't think ARM intends to stop here long term. While a Cell-like approach with significantly different cores would be less than optimal, smaller differences like the lack of full NEON or SSE4 support on lower power cores shouldn't be much of a burden once the OS/apps are smart enough to route threads to a core with the necessary units available.
MJEvans - Wednesday, October 19, 2011 - link
This is exactly the kind of competition that the market has needed. AMD used to be able to keep at Intel's heels with intelligent decisions and hand-tuning to make the most of being on a mature process node as opposed to a cutting/bleeding edge one. ARM's decisions here represent the logic of applying that at a macroscopic (architectural) level.
Arnulf - Wednesday, October 19, 2011 - link
I was contemplating this a few months ago, before Kal-El was described in the media and the existence of its extra core revealed to the public. Something along the lines of 2x Atom + 2x Sandy Bridge cores, with all cores visible to the OS.
The OS should be able to identify each core and allocate the workload to it accordingly, i.e. the OS would grab one of the weaker cores for itself, schedule CPU-intensive processes on the more powerful cores and have one low-power core in reserve just in case (for antivirus etc.).
This would result in CPUs with maximum TDP only ~5-10W above existing models, yet it would allow for far less context switching. There is no point in hiding those weaker cores from the OS; instead the OS should be intelligent enough to utilize them to the fullest extent.
psychobriggsy - Thursday, October 20, 2011 - link
For seamless *running* application migration between the different core types, they should both support the exact same instruction set extensions, which currently Atom and SB don't. I don't think that AMD's Bobcat and Bulldozer do either.
I wouldn't say no to a chip comprising a Bulldozer module or two (like Trinity), and a couple of Bobcat cores as well for lower-power modes. This would surely save a lot of power over even Bulldozer in its lowest operational clock/power state.
However neither AMD nor Intel can compete in power against this ARM technology - A15 for power (around Bobcat performance per core) and A7 for power saving (around 1GHz Atom performance per core I would imagine). As soon as Intel takes a step towards lower power with Atom, ARM moves the goalposts. Even an Atom core implemented at 22nm can't compete with a 28nm 0.5mm^2 core... which is practically free in terms of silicon (even with a small L2 cache added on top).
gostan - Wednesday, October 19, 2011 - link
Wouldn't a dynamic frequency design (like speedstep) be a better implementation? Rather than having two different architectures exchanging data and handling different tasks.
mythun.chandra - Wednesday, October 19, 2011 - link
DVFS is in use in almost all current-gen SoCs. This certainly does bring with it power saving, but given the present nature of workloads on most mobile devices, the CPU is either in standby (most of the time) or ramped up fully (for most of the remainder). Having cores running at different frequency steps, while a good idea on paper, can prove detrimental to performance if not implemented correctly.
Having a low-power 'companion' core shows power savings more readily, especially given the extremes in mobile CPU workload (standby-to-full-clock). The companion core is capable of running the exact set of tasks as the main-core(s), albeit at lower performance levels. This is completely transparent to the OS and software layers above since they are in fact the exact same architecture (or instruction set, to be clearer).
metafor - Wednesday, October 19, 2011 - link
Even at the lowest frequency and voltage, a complex core will still use more power than a simple core. Take a Cortex A5 compared to a Cortex A15 -- even if you step down the voltage to minimum (~0.7V) on the Cortex A15, it would still consume more power than the Cortex A5 at max speed.
And that's not even accounting for the power savings operating an A5 at lower voltage/frequency would do.
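To make the point above concrete, here is a minimal sketch using the usual first-order CMOS model: dynamic power is roughly C*V^2*f, plus static leakage of V*I. Every number below (capacitance, leakage current, clocks) is a made-up placeholder chosen only to illustrate the shape of the argument, not a measurement of any real Cortex core.

```python
def core_power_mw(cap_nf, volts, freq_mhz, leak_ma):
    """First-order power model: dynamic C*V^2*f plus static V*I, in milliwatts."""
    dynamic_mw = cap_nf * 1e-9 * volts**2 * freq_mhz * 1e6 * 1e3  # watts -> mW
    static_mw = volts * leak_ma  # volts * milliamps = milliwatts
    return dynamic_mw + static_mw

# Hypothetical "big" core: lots of switched capacitance and leakage,
# stepped all the way down to 0.7 V and a low clock.
big_at_min = core_power_mw(cap_nf=1.0, volts=0.7, freq_mhz=200, leak_ma=50)

# Hypothetical "small" core: a tenth of the capacitance and leakage,
# running flat out at a higher voltage and clock.
small_at_max = core_power_mw(cap_nf=0.1, volts=1.1, freq_mhz=1000, leak_ma=5)

print(f"big core at minimum V/f:  {big_at_min:.1f} mW")
print(f"small core at maximum V/f: {small_at_max:.1f} mW")
```

With these placeholder values the throttled big core still draws more than the small core at full speed, because the switched capacitance and leakage do not shrink when the clock drops. Different assumed values could flip the result, so treat it as an illustration of the model rather than evidence.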
bnolsen - Wednesday, October 19, 2011 - link
There are issues like transistor leakage, etc. that larger cores cannot fully overcome just by clocking down. This is why there's a move to unbalanced MP.
fteoath64 - Thursday, October 20, 2011 - link
@gostan: "Wouldn't a dynamic frequency design (like speedstep) be a better implementation?"
NO! You cannot change the number of pipelines in the CPU, nor the components it needs: cache, EU, IU, FPU etc. So the number of transistors drawing current is the same even at a lower clock. If the number of transistors is 1/3, then you get 3X savings! So multiple simpler cores save far more power, and it scales well.
Rick83 - Friday, October 21, 2011 - link
Yet when Intel demo'ed their Claremont prototype they were able to demonstrate scaling by a factor of 1000.
This renders the multi-chip approach an expensive crutch.
rupaniii - Wednesday, October 19, 2011 - link
I remember NEC subscribing to much the same philosophy many years ago when they started doing embedded multi-core development.
Did ARM tread on similar ground or is it me?
Guspaz - Wednesday, October 19, 2011 - link
People have been doing this with ARM designs for ages anyhow, although not necessarily for power efficiency reasons.
Nintendo has done it since the GBA. The GBA shipped with an ARM7 and Z80 and the DS shipped with an ARM7 and ARM9. The 3DS was the first to go homogeneous, with two ARM11 cores.
To go off on a bit of a tangent, the 3DS's CPU is rather disappointing, as two 266MHz ARM11 chips is pretty pathetic, with similar performance to a first-gen iPhone. The PS Vita's quad-core Cortex A9 probably has 10-15x the performance... Makes me kind of regret buying a 3DS ;)
iwod - Wednesday, October 19, 2011 - link
While the A7 is at best 1/3 the die size of the A8, it doesn't state the power compared to the A8. And I don't understand where the 5x power efficiency is coming from. I am guessing it will be able to deliver double the performance of the A8 while using half the power. (While that is amazing, it is still only 4x power efficiency!)
It talks about powering individual cores up and down. What about having the A7 constantly running tasks on phones, such as signal, phone calls, email etc., and only using the A9 if there is a need, i.e. delegating tasks to that core only?
The most amazing thing is that A15 and A7 would appear to be the same to applications. That is unlike the current Atom and Sandy Bridge, where SB supports additional instructions and features. This makes Atom even further away from getting to A7 level.
We all thought with further tweaking, and 22nm die shrink, Atom would only be one or two steps away from ARM on Mobile Phones. Not anymore with Cortex A7.
And we have PowerVR 6 coming out soon plus their Power VR RTX ( Hardware Ray Trace ).
I wonder when will ARM start to tackle the server market.
french toast - Thursday, October 20, 2011 - link
Too true. Being British, I'm kinda loving the secret surge in British influence on microprocessors/mini GPUs; they have really caught the big boys with their pants down!
Just a quick question to anyone really (hopefully Anand/Brian): has anyone got any news on the ARM Mali T-604 design? Seeing how powerful the Samsung implementation of the Mali 400 was, it will be interesting to see how the T-604 stacks up against the other next-generation GPUs, i.e. PowerVR Series 6 and the Adreno 225/3xx series.
As I am really interested in this topic and I can't find any new info on these designs, has anyone got any info/updates on what new features/API/performance we could expect from the next generation? I heard that Mali T-604 will be DX11!?
Any response would be much appreciated.....
introiboad - Thursday, October 20, 2011 - link
I believe it's too early in the development stage and no public data is available for the T-604 or Rogue (VR6 series). This should change in the coming months I assume, since they announced both quite a while back.
french toast - Thursday, October 20, 2011 - link
cheers, hopefully they will spill the beans soon!
ltcommanderdata - Wednesday, October 19, 2011 - link
So they are envisioning a SoC with dual A15 and dual A7 cores, the A15 pair used for high performance and the A7 pair used for low power. Is there a way to use both the A15 and A7 together if there is thermal room since they are the same ISA?
fic2 - Wednesday, October 19, 2011 - link
From the article:
ARM did add that SoC vendors are free to expose all cores to the OS if they would like, although would obviously require OS awareness of the different core types.
geniekid - Wednesday, October 19, 2011 - link
You could also focus on advancing battery technology.
I understand battery tech is much more mature than SoC tech, so revolutionary advancements in this field may be wishful thinking, but it would be cool to see what things are being done in the R&D labs of our smartphone battery producers :)
bjacobson - Wednesday, October 19, 2011 - link
meh that's a dead end.
Lithium batteries only store 30% or so of what they are capable of storing. Why don't they? Because at full energy density that energy is just asking to get out-- a pin prick into the cell would set it on fahr.
Etsp - Thursday, October 20, 2011 - link
There are technologies that are much safer than Li-ion currently in development. While Pros/Cons differ between different battery technologies, developing a better battery is never a dead end.
That said, I'm certain that there are advancements in battery technology all the time, we just read articles about them.
Etsp - Thursday, October 20, 2011 - link
We just *don't* read articles about them.
Steel77 - Thursday, October 20, 2011 - link
Battery technology is by no means "more mature". The techniques developed that are in use today are actually pretty recent advances when you consider that high-output LiFePO4 batteries are a pretty recent invention, and aren't even really on the market yet.
Current batteries are dangerous because they use LiCoO2, which decomposes at high temperature to release lithium metal and oxygen. I think it's obvious why that's a bad setup, but we already have a number of potential solutions fairly far along through the research stage.
Other potential advances have come from some recent potentially game-changing success in 3D batteries, which use materials such as aerogels, foams, etc. to give exceptionally high surface areas, which gives them the potential to have much higher power and energy densities. This system is just more difficult because it requires even coatings of subsequent layers, which requires somewhat complicated, though potentially very efficient, chemical methods.
So battery research is extremely important, and there's a lot of progress being made, though it's suffering right now due to research budget shortfalls, at least in the USA.
secretmanofagent - Wednesday, October 19, 2011 - link
This is not my area of expertise, but it really seems like Krait would have an advantage over this design. Am I reading into this incorrectly?
kgardas - Wednesday, October 19, 2011 - link
While this is a nice article, I really hate that people usually forget the Marvell Armada 628, a heterogeneous triple-core, and just talk about NVidia Kal-El.
Otherwise a combination of a single-core A7 + 4-8 cores of A15 looks like the killer SoC. :-)
fteoath64 - Thursday, October 20, 2011 - link
@Kgardas: While the chip looks good on paper, it supports USB3 which is no use in mobile, so it is likely to be used in NAS, MediaPlayer and STB applications. The chip also looks big in size so I guess its power consumption on full bore is significant in ARM terms.
introiboad - Thursday, October 20, 2011 - link
The OMAP family also does this with Cortex-A and Cortex-M cores put together in a die. It's quite similar to what ARM describes here, except of course the instruction sets are not the same.
lancedal - Wednesday, October 19, 2011 - link
But it is very hard to realize. Today's OSes are not application aware, meaning they would not know if a thread is from application X and needs 1GHz vs. another thread from application Y that only requires 1MHz. As such, the OS would not be able to dynamically move threads from one core to another while guaranteeing no missed deadlines.
If the small core is dedicated to housekeeping threads only (i.e. sync, standby etc.), that is all good, but there is no need to do that anyway because such tasks are so infrequent (every hundred ms or so). Therefore, you can wake up the big core and then shut it down.
hechacker1 - Wednesday, October 19, 2011 - link
I don't think it's so hard. The scheduler would start all processes on the slow core, and if the CPU utilization doesn't exceed its maximum over a very short period of time, keep it there because it obviously doesn't need the extra processing power.
The Android schedulers (CFS and BFS) are nano-second time aware, so the latency penalty could be managed.
Of course, it would be best if the programmer could explicitly place their program into a core, but you can already do that with sched_realtime, sched_fifo, and sched_batch policies. The question is really how far Android optimizes for this sort of thing. Right now I think they treat everything as realtime fifo queues, instead of letting the built-in Linux schedulers do their thing.
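The "start on the slow core, move up only if it stays pegged" idea from this comment can be sketched in a few lines. This is only a toy illustration of the heuristic, not how CFS, BFS or any real big.LITTLE switcher works; the function name, the 0.9 threshold and the three-sample window are all assumptions picked for the example.

```python
def pick_core(utilization_samples, threshold=0.9, window=3):
    """Toy heuristic: stay on the little core unless the most recent
    `window` utilization samples (0.0 to 1.0) all reach `threshold`."""
    recent = utilization_samples[-window:]
    if len(recent) == window and all(u >= threshold for u in recent):
        return "big"
    return "little"

# A task that saturates the little core for three samples in a row moves up.
print(pick_core([0.20, 0.95, 0.97, 0.99]))

# A single burst is not enough to justify waking the big core.
print(pick_core([0.95, 0.30, 0.99]))
```

Requiring several consecutive saturated samples is one simple way to avoid bouncing between cores on short bursts; a real scheduler would also have to weigh the migration cost that comes up later in the thread.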
lancedal - Thursday, October 20, 2011 - link
Keep in mind that there are performance and latency penalties when powering the big core on/off. When you power up the big core, it takes time (to power up, reset and boot). Its L2$ is empty, so it needs to be warmed up. All of this adds a performance impact that is very hard for the OS to take into account.
As you said, if programmers could specify performance/deadline/CPU requirements, then everything would be simple. However, we can't expect that from millions of developers out there. It's just not practical.
Penti - Thursday, October 20, 2011 - link
Boot? They would lie there as cores in a low power state, of which there would be several; it won't take tens of seconds to start up. I'm sure they can handle the power states, clocks and scheduling pretty well in the OSes. The OS would know when it's in a power saving mode or not or when it needs the performance or not. It would very much depend on power profiles and so on. I would not expect them to be in a deep sleep mode at every time when resources are needed. But its point is of course battery life, not performance. A flag on programs that want to run on the big cores would probably be easy to implement on a system like Android, I would think, as it's not just pure Linux ELFs. But we will see what kind of schemes there will be soon I guess.
lancedal - Tuesday, October 25, 2011 - link
Leakage is the reason they have the "big-small" setup. So if the big core is not used, it will be "shut down". So it will boot up from ROM when woken up. In 28 and 20nm, leakage will be the dominating power factor.
metafor - Thursday, October 20, 2011 - link
You'd be surprised. Most modern OSes (including Android) have not only profiling but API support for applications to poll for resources such as CPU. Most of the time, you won't be seeing something like Pandora take the CPU up to 100% even if it could, in theory, burst process a lot of data and then go to sleep.
The problem is that a lot of things can indeed be done faster -- web browser rendering is one primary example of something that would hog up as much CPU as it can.
And there's not really a way for a user to specify "hey, I don't mind if the page renders slower, stop using so much power".
lancedal - Thursday, October 20, 2011 - link
True, the OSes do profiling, but it is hardly accurate enough to guarantee real-time performance. Yes, the API would help, but not many applications, if any, specify the resources they need, as that would depend on the system. It's just not practical to require software developers to specify the resources needed given how many developers and applications we have today. Big guys like Pandora, sure, but not the millions of little guys out there.
Deciding when to switch back and forth between the LITTLE and BIG cores is hard because it's not free. It costs power and performance (latency). If you switch too often, then you end up using more power. The problem is there is no fixed criterion for switching.
If you have the little core handle "system tasks" and the big core handle applications (like Tegra 3), then it may work. However, that only helps standby power and won't do much to extend web-browsing time.
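The "switching is not free" argument above boils down to a break-even calculation: a move to the little core only pays off if the task stays there long enough for the power saving to cover the energy spent on the switch. A rough sketch follows; the power and switch-cost figures are hypothetical placeholders, not measured values for any real SoC.

```python
def break_even_seconds(big_mw, little_mw, switch_cost_mj):
    """Time a task must stay on the little core before the switch saves energy.

    big_mw / little_mw: power drawn running the task on each core (milliwatts).
    switch_cost_mj: energy spent on the migration itself (millijoules).
    """
    saving_mw = big_mw - little_mw
    if saving_mw <= 0:
        return float("inf")  # the little core saves nothing, never worth it
    return switch_cost_mj / saving_mw  # millijoules / milliwatts = seconds

# Hypothetical numbers: 500 mW on the big core, 100 mW on the little core,
# and 2 mJ burned per switch (state save/restore, cache warm-up).
print(break_even_seconds(big_mw=500, little_mw=100, switch_cost_mj=2))
```

With these placeholder values the break-even point is 5 ms, so tasks shorter than that would be better left where they are. The hard part the comment points at is that the OS rarely knows in advance how long a task will run.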
sarge78 - Wednesday, October 19, 2011 - link
http://www.arm.com/products/processors/cortex-a/co...
The Cortex-A5 processor is the smallest, lowest power ARM multicore processor capable of delivering the Internet to the widest possible range of devices: from ultra low cost handsets, feature phones and smart mobile devices, to pervasive embedded, consumer and industrial devices.
The Cortex-A5 processor is fully application compatible with the Cortex-A8, Cortex-A9, and Cortex-A15 processors, enabling immediate access to an established developer and software ecosystem including Android, Adobe Flash, Java Platform Standard Edition (Java SE), JavaFX, Linux, Microsoft Windows Embedded, Symbian and Ubuntu. Cortex-A5 benefits include:
- Full application compatibility with the Cortex-A8, Cortex-A9, and Cortex-A15 processors
- Provides a high-value migration path for the large number of existing ARM926EJ-S™ and ARM1176JZ-S™ processor licensees.
- 1/3 the power and area of Cortex-A9, with full instruction set compatibility.
Why didn't nVidia use a cortex A5 for Kal-El?
ET - Thursday, October 20, 2011 - link
The Cortex-A5 was announced in 2009 and apparently there hasn't been much demand for it (according to one article). At 1.57 DMIPS / MHz (according to the ARM page) it's significantly weaker than the A8, and I figure that was one problem. My guess is that the Cortex-A7 is a response to that, with higher clock rates and performance that should surpass A8 in most cases.
Lucian Armasu - Saturday, October 22, 2011 - link
But it wasn't supposed to replace Cortex A8, but ARM11, which is still in all low-end Android phones today, and I hate it. Cortex A5 with close to Cortex A8 performance, and 3x more efficient, would've been a really nice replacement for ARM11.
bjacobson - Wednesday, October 19, 2011 - link
wait, so why is the A7 better if 7 is a smaller number than 8? O.o
Manabu - Sunday, October 23, 2011 - link
Because 8 is an even number, and ARM was cursed so that every even-numbered architecture they make is bad. You don't hear about ARM6, ARM8 or ARM10, but ARM7, ARM9 and ARM11 are still very much alive everywhere (low end, Tegra2, etc). The Cortex A8 was a bit more successful because of the new instruction set and raw power, but it was still a bad design. We probably won't be hearing about any new SoC using it in the future.
Cortex A9, A5, A15 and now A7. ARM is on a roll now, as they stopped being stubborn and are side-stepping the even numbers. ;-)
The Cortex A8 is bigger and theoretically faster clock for clock than the Cortex A7, even if in practice it will likely be slower because of its laughably slow FPU, lower efficiency and lower core counts. And as it isn't faster than A9, 7 is the logical number to use.
iwod - Thursday, October 20, 2011 - link
Anand, could you do an article on xx-bit CPUs? With 64-bit x86 CPUs we get two major benefits: memory addressing space, and extra registers for faster performance. But other than that, how many programs actually use 64-bit integer and floating point?
ARM A7 / A15 seems to provide 40-bit addressing, i.e. 1TB of memory, or 256 times more than the current 4GB limit. I remember Intel also had 40bit memory addressing, but it required software, OS and BIOS working together and it didn't work very well for software development. Is this still the case with ARM?
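As a quick sanity check on the address-space figures in this comment, the arithmetic can be worked out directly. This only assumes byte-addressable memory and binary units (1 GiB = 2^30 bytes); the variable names are just for illustration.

```python
GIB = 2**30  # bytes in one gibibyte

addr32 = 2**32  # bytes reachable with a 32-bit physical address
addr40 = 2**40  # bytes reachable with a 40-bit physical address

print(addr32 // GIB)     # size of the 32-bit space in GiB
print(addr40 // GIB)     # size of the 40-bit space in GiB
print(addr40 // addr32)  # how many times larger the 40-bit space is
```

So 32 bits covers 4 GiB, 40 bits covers 1024 GiB (1 TiB), and the ratio between them is exactly 2^8 = 256.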
mihaimm - Thursday, October 20, 2011 - link
I can't wait to boot Ubuntu on those. With little tweaks we'll be able to have nice threads go to A7 and others to A15 automatically. Quad cores at 1-1.5 GHz should be enough for mostly anything on Linux. And if we get it packaged with 543MP2 (and good drivers) this would kill x86.
Even better than 543MP2, if it runs Ubuntu and by the time this comes to market, we'll have Rogue.french toast - Thursday, October 20, 2011 - link
hey i posted i comment earlier, replying to someone else, asking a question or two, nothing impolite or anythng like that, and when i have looked back to see if i got an answer my comment has been removed!?? why??Snowstorms - Thursday, October 20, 2011 - link
these guys are so close to competing for desktop x86 space, they are already down to 28nm and are using the exact same ASML as Intel and AMDPessimism - Thursday, October 20, 2011 - link
Does anyone else find their naming conventions infuriating?
11->7->8->9->15 HUH?
plus they label their instruction sets with an A and a similar but different number... and the other licensees of ARM have their own gimpy naming conventions with one for the base core and one for the chip and so on and so forth...
iwod - Thursday, October 20, 2011 - link
11 is ARM11, which is different from the Cortex series.
If we inflated the numbers based on timing, the A7 would have been named A16.
A8 is replaced by A9, which will subsequently be replaced by A15.
A7 is more like a replacement for the ultra-low-power A5. It just happens to be even more powerful than A8, therefore they used A8 as the comparison.
mczak - Thursday, October 20, 2011 - link
I don't think A15 is really a replacement of A9.
Each member of the Cortex-A family seems to have its place, except the A8 (which is the oldest and obsolete).
A5: very small, low power - though once you include larger L1 caches and NEON it seems A7 would be a better fit as it's hardly smaller anymore.
A7: small, low power, probably highest perf/power and perf/area of the whole family. Unlike A5, L1 cache sizes are fixed and NEON is always included, and it has all the features of the A15 (including virtualization, for instance).
A8: a dud, unlike all others not MP capable and with nonpipelined FPU. Worst efficiency of the family (by a large margin) in perf/area and perf/power. I don't think there's any reason at all why you'd want to use this in a new design. In some areas it might be faster than A7 (I think NEON might be twice as fast).
A9: similar size to A8 but quite a bit more advanced (out-of-order) and with higher efficiency.
A15: quite a beast compared to A9, much more complex and faster, but much bigger - the first to target low-power servers too. Efficiency might be similar to A9, not sure.
Of course, this completely ignores the timeframe - A8 was the only option for quite some time, and apart from that only A9 has made it to devices yet (I think we should see A5 soon enough - MSM7227A has Cortex-A5 and possibly quite a few low-end smartphones might use it).
ET - Thursday, October 20, 2011 - link
I do see the A15 as a replacement for the A9. The high end was A9 and will be A15. The mid range was A8 and will be A7. Both will offer significant performance increases. A9 will survive a little longer (as will the A8, probably), but I don't think it has a real place between the A15 and A7.
As for the names, ARM Cortex family names reflect core complexity or size, as far as I understand, not how new the core is.
C300fans - Thursday, October 20, 2011 - link
Intel has its ace, the Atom E6xx, which has already been proven to be much more power efficient than ARM. For example, the Sony PRS-T1 is using an Intel 1GHz processor (Atom E640) in its e-book reader running on Android.
mczak - Thursday, October 20, 2011 - link
Good point, though there should really be a rather large difference in performance (and in chip size) between an A7 and an A15, hence I think there's still some room for A9 (which should still be a fair bit faster than A7 after all) - say for a low-end smartphone in 2013. I guess though this will rather depend on the licensing cost differences (afaik the less complex designs are cheaper) between A7/A9/A15.
iwod - Thursday, October 20, 2011 - link
I don't think the replacements are intentional. It is the markets overlapping when one product's power and efficiency leap forward.
It is like how a Sandy Bridge Pentium was never meant to replace the good old Core 2 Duo, but Sandy Bridge is just so much better and cheaper that it happens to have replaced it.
nofumble62 - Thursday, October 20, 2011 - link
is that ARM cannot boost core performance without sucking up more power. The power versus performance chart shows a straight line. So they have to employ this trick to maintain power efficiency, while getting enough horsepower for difficult tasks.
I believe Intel has done this sort of thing already. What else is new?
What's next for ARM? double the core count?
iwod - Thursday, October 20, 2011 - link
No one can boost performance without sucking more power. But ARM still has many more things to do with IPC. And as a matter of fact, a quad-core Cortex A15 @ 2.5GHz is very capable, and faster than a Core 2 Duo.
leonzio666 - Friday, October 21, 2011 - link
"No one can boost performance without sucking more power."
I beg your pardon? I hope what you mean is that that's impossible within one architecture type and/or manufacturing process? Because if not, you're talking nonsense - how about Intel SB compared to Nehalem, or Ivy Bridge, which is said to provide up to a 15-20% boost over SB while still lowering TDP for some CPUs (e.g. the 77 W desktop quad-core)?
C300fans - Friday, October 21, 2011 - link
Better performance requires more power. Pentium M, Core2, SB, they are all labeled 35W TDP on laptops, although SB has the latest 32nm technology.
french toast - Friday, October 21, 2011 - link
More performance only requires more power on the same manufacturing process and with the same bandwidth/cache etc.
Penryn increased performance whilst lowering TDP over the previous Core 2 Quad, with virtually the same architecture, mainly thanks to the superior 45nm HKMG process.
french toast - Friday, October 21, 2011 - link
Didn't nvidia do some type of demo with Tegra 3 where it beat a Core 2 Duo?? Seemed a bit suspicious to me, but if that is true then a quad Cortex-A15 might match up against a low-level Core 2 Quad!
What are your thoughts on Krait??
blueboy11 - Friday, October 21, 2011 - link
We really need to find a way to keep the display from draining the battery. If they can use these processors so that you can keep the brightness but have the display make as little of an impact on the battery as possible, we might have something here to jump on. I mean, the phones today use about 3/4 of the battery life on just the display alone (and I'm not even talking about the LTE band for you people on Verizon), so this would be a really nice feature. I hear they're already trying some battery-saving features in Ice Cream Sandwich (4.0). Any takes on this?
hyvonen - Saturday, October 22, 2011 - link
Would you please do a quick comparison of modern x86 vs. ARM solutions in terms of performance and power efficiency? The arguments keep going back and forth, and some reliable numbers would be very useful. Thanx!