So, roughly speaking, how does ARM IPC compare to x86? Obviously it's not going to be as high as on modern big-core desktop x86 parts like SB/IB/Haswell, but how does it compare to Atom (both the current generation and the new one in the pipeline) and Bobcat/Kabini?
I think that was covered before: http://www.anandtech.com/show/6936/intels-silvermo...

The performance expectations (which relate to IPC) were misguided, though. Intel's compiler was, essentially, cheating by skipping entire sections of the benchmark. http://www.androidauthority.com/analyst-says-intel...
If I'm interpreting the labels on that graph correctly ("K900 at 1.0"), they've underclocked the Atom by 50% (from 2GHz to 1GHz) from what it normally operates at, which would flip the results back to Intel winning the majority of the tests.
No, that's actually the *real* clock speed of Atom, which Intel misleadingly calls a "2GHz" core. The 2GHz figure is the turbo-boost speed, just like Haswell for laptops will really be clocked at 1.3GHz (the same performance level as 1.5GHz IVB) and go up to 2.3GHz, or whatever, with Turbo-Boost.

The problem is that benchmarks, I think, make full use of the Turbo-Boost speed, which means Atom will do very well in benchmarks, while that may not be the case in real day-to-day life, where the phone might never activate the Turbo-Boost speed and just use the slower "real" clock speed all the time.
The fact of Turbo-Boost is a problem for lazy benchmarking, it is NOT a problem for Intel.
Why do you want a high speed core in your phone? There is a population that wants to run aggressive games, or transcode video or whatever, and these people care about sustained performance. But they're in the dramatic minority. For most users, the value of a high speed core is that it makes the phone more zippy, meaning that operations are fast when they need to be fast, after which the phone can go back to sleeping. The user-level "speed" of a phone is measured by how fast it draws a (single) PDF page, or renders a (single) complex web page, or launches an app, not by how it performs over any task that takes longer than a second. In such a world, if Turbo-Boost allows the app to sprint for a second, then go back to low-power mode, the user is very happy with that behavior.
The only "problem" with this strategy, for Intel, is that it is obvious and will be copied by everyone. Intel is there first and most aggressively for historical and process reasons, but there's no reason they will remain the only player. (It's also quite likely that competitors will adopt the ideas of Turbo-Boost, just never call it that. After all the problem to be solved for phones is different from the REAL Turbo-Boost problem. Turbo-Boost comes from a world where you run the chip as hot as it can go --- till it just about to overheat. If an ARM core has no danger of actually overheating, then the design space is different. Now it's simply "we'll rate the core for 2GHz, but at that speed it uses up 10 nJ/op, so as far as possible we'll try to run it a 1GHz (using up 2 nJ/op) or better yet 100MHz (using up 10 pJ/op) [all numbers made up, but you get the point].)
All CPUs typically run at a far lower frequency than the maximum - I'm sure nobody believes, e.g., that Krait 800 runs all 4 cores at 2.3GHz all the time. So if you call a specific frequency the "base", then anything faster than that is automatically a turbo boost. In that sense Intel's turbo boost is largely marketing: a way to claim a low TDP by setting the base frequency arbitrarily low, while allowing the chip to go well over that TDP for a certain amount of time at a much higher boost frequency.
My 3.5/3.9GHz advertised i7 4770K runs at 800MHz at idle :)

My desktop Haswell with Intel's retail cooler runs all four cores 24/7 turbo-boosted - I'm not quite sure to what; it reports 3401MHz in Linux, but that's because /proc/cpuinfo asks ACPI, which isn't fully compatible with turbo-boost. The machine draws 100W from the mains while doing so, which (given that it idles at 28W) is entirely consistent with the 84W TDP.

And, indeed, it runs at 800MHz at idle, and I suspect often slower than that, but /proc/cpuinfo doesn't report C-states.
Intel's turbostat has proven very useful for getting good reporting of CPU clock speeds under Linux. With the -v option it also displays the maximum speeds the CPU will run at as the number of active cores varies. Recommended. i7z is another option, but I've seen it do a bad job of showing which cores are active when hyperthreading is enabled.
The difference is that those ARM chips do take full advantage of the maximum core speed. Say you start a web page - any web page. It WILL activate the maximum clock speed - whereas the Turbo-Boost in Atom doesn't activate all the time.

If we're talking about receiving notifications and such, then obviously the ARM processors won't go to 2GHz either, but that's not really what we're talking about here, is it? We're talking about what happens when you're doing normal heavy stuff (web browsing, apps, games).
That's the problem I have with performance benchmarks on cell phones. At some point thermal throttling kicks in because you're draining the battery a ton running your CPUs at full tilt. IPC improvements will be felt far more than clock speed ramping. If you ever look at CPU-Z on Android, you'll notice that a Snapdragon 600 with 4 cores clocked at 1.7GHz tries its hardest to downclock to 1 core at 384MHz. Even just scrolling up and down the monitoring screen pumps the CPU speed up to 1134MHz and turns on a second core as well. Peak performance is nice, but ideally it should rarely be utilized.
No, I meant it's a problem because Atom chips look like they are "competitive" in benchmarks, when in reality they have HALF the performance. That's what I was saying. It's a problem for US, not Intel. Intel wins by being misleading.
Intel didn't mislead you. In the SLM review, there is a very clear description of turbo. Copied here: "Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. [...] this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals."
That's not cheating: it's what compilers are supposed to do. For example, if you write "for (i=0; i<1000; i++);", a good optimizing compiler will analyze the loop, realize that it does nothing, resolve it to "i=1000;" and compile that. I believe the first use of this type of aggressive compiler technology was seen in Sun's C compiler for whatever version of Solaris it was that ran on the SPARC chips back in the '80s. The fact that the ARM compilers didn't do this speaks more about the expected performance of the chipset than anything else: you can build hardware to be as fast as you like, but if the compilers can't keep up, you might as well be running your code on a Commodore PET.
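To make that example concrete, here is roughly what such a compiler sees - a minimal sketch; exactly what any given compiler emits will of course vary with version and flags:

    #include <stdio.h>

    int main(void)
    {
        int i;
        /* The loop body is empty and has no observable side effects, so
           an optimizer (e.g. gcc or clang at -O2) can prove that only the
           final value of i matters and compile this as if it were simply
           "i = 1000;". */
        for (i = 0; i < 1000; i++)
            ;
        printf("%d\n", i); /* prints 1000 either way */
        return 0;
    }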
Speaking of the Sun thing: I distinctly remember that the then-current version of the Sun "pizza-box"-style workstation appeared in benchmarks to be 100 times faster than the IBM PC-RT (another RISC architecture competing with Sun's platform) even though, on paper, the PC-RT was running on faster hardware: analysis of the benchmarks' compiled code revealed that Sun's compiler had effectively edited out the loops as I described above. Result: the PC-RT died off very quickly.

The PC-RT didn't last long, but the processor (in its children) lives on as the RS-6000/PPC/iSeries/Z.
It's certainly cheating. If you followed the whole thing, it was not just about ICC optimizing much of the benchmark away. The particular optimization was added recently to ICC - it was a lot more complex than an empty loop; it only optimized a very specific loop by a huge factor (so specific that if you compiled all open source code, it would likely only apply to the benchmark and nothing else). For some odd reason AnTuTu then secretly switched to that ICC version despite ICC not being a standard Android compiler. Finally, it turned out the settings for ARM were non-optimal, using an older GCC version with pretty much all loop optimizations disabled. Intel and ABI Research then started making false claims about how fast Atom was compared to the Galaxy S4 based on the parts of AnTuTu that were broken (without actually mentioning AnTuTu).
Giving one side such a huge unfair advantage is called cheating. As a result AnTuTu will now stop using ICC.
This is why benchmarks have to be taken with a healthy dose of skepticism.
First, if the benchmark program isn't open source, right off the bat it's worthless. If you can't see the code, you can't trust it.
Second, if the program isn't compiled with the same compiler and the same compiler options, the results are crap. You're not getting a valid comparison of the hardware itself.
It's kind of ridiculous seeing many of the journalists out there who took this sensational headline and ran with it without even questioning its legitimacy.
The IPC comparison for integer code goes like: Silverthorne < A7 < A9 < A9R4 < Silvermont < A12 < Bobcat < A15 < Jaguar.

This is based on fair comparisons using Geekbench, and so doesn't reflect what some marketing departments claim or what cheated benchmarks (i.e. AnTuTu) appear to show.
Where do various Kraits fit in?

The Krait 800 as used in the Galaxy S4 fits between A9 and A9R4 (it has ~5% better IPC than A9 - it makes up for that by clocking very high).

Can you provide a reference for these values? The Geekbench numbers are all over the place even for the same device (for instance, you see iPhone 5 results that vary by 6%, while the GS4 can easily vary by 16%).
Not sure what Geekbench measures and saves. But tweaking an Android system is quite easy, considering the multitude of options. Just change some governors around and use a different scheduler and you can get quite a range of results.
It really does sound like the Swift or Krait cores, but widely available now to the Rockchips and MediaTeks. Even if it comes out next year, it means $200 smartphones with the raw performance of an iPhone 5 or Galaxy S4 while Apple and Samsung sell something similar for $450. The real question then is how Qualcomm, Samsung, and Apple will push their architectures other than more die-shrinks. Apple still has the option of moving to 4 cores as well as big.LITTLE, and Qualcomm still has the option of big.LITTLE as well, but where is Exynos going to head? 8-core big.LITTLE (for 16 total cores)? Asymmetric big.LITTLE with 8 big cores and 4 small cores? Something else altogether?
Great point regarding big.LITTLE design in the SoC. There are many ways to implement big.LITTLE on the wafer; I think only the rudimentary one has been used so far, and it doesn't link to OS optimizations as much as we would like. It takes effort, complexity, and great code in drivers and kernel changes to take advantage of the design and maximise what the hardware can do.

There is also the variant that could go Big.Medium.Little. If you look at the frequency charts of typical use, the Medium state takes up most of the time, the Big state very little (only in the spikes), and the Little state takes care of near-idle time. Having a Medium module takes die space but might be worth it in power savings over plain Big.Little switching: the switching cost in power is negligible, while sustained residency at a given frequency offers good savings (e.g. 30% in the Medium state vs 5% in the Big state). Optimizing how quickly the OS changes state matters too: the slower it switches, the more power it draws for a given duration.

Another software optimization is to pin threads to specific core types or counts to optimise performance - e.g. the Little core does all I/O since that is slow, while FP/INT goes to Big, or INT is split between Big and Little. Dynamically keeping one Big core active for several seconds longer might be a benefit if it gets triggered again soon after - why switch when a delay in switching solves the problem? Of course a huge simulation is needed to find the optimal design points worth implementing; it is an iterative process.

The same goes for GPU cores becoming active and boosting frequency on demand. For now, they kick in at full blast when the game wants it. A good feedback mechanism would be an FPS counter to throttle the GPUs down, since >60fps is pretty useless unless you are running 120fps 3D displays - in that mode, cap it at 120fps.

Due to time-to-release pressures, I am certain many compromises were made just to get the silicon out. ARM vendors are not like Intel, who could afford to wait on a release because they had a monopoly on the chips in the market. Competition ensures that ARM evolves quickly and efficiently. This is where you can see Qualcomm succeeding while Tegra falters. Samsung is trying to find their secret sauce with the octa-core design; I think the next iteration, coupled with the process node improvements they will use, might be a good one for them.

I see a 2:3:4 design as optimal: 2 Big, 3 Medium, 4 Little. Here is how it should work:
Full bore: 2 Big, 2 Medium and 1 Little active (penta-core).
Medium operation: 3 Medium and 2 Little active (still holding penta-core threading).
Step down 1: 1 Medium, 2 Little.
Idle: 1 Little only. Note the Little core takes ALL the I/O traffic.
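For what it's worth, that proposed 2:3:4 arrangement could be written down as a simple state table - purely a sketch of the hypothetical design above, not any shipping scheduler:

    /* State table for the hypothetical 2 Big / 3 Medium / 4 Little
       cluster proposed above; values are cores active of each type. */
    typedef struct {
        const char *name;
        int big, medium, little;
    } cluster_state_t;

    static const cluster_state_t states[] = {
        { "full bore",   2, 2, 1 }, /* penta-core */
        { "medium",      0, 3, 2 }, /* still five threads */
        { "step down 1", 0, 1, 2 },
        { "idle",        0, 0, 1 }, /* Little core handles all I/O */
    };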
Looks pretty clear to me that there will be an A55 at 14nm, or at least 10nm. The A12 is technically replacing the A9, right at the start of the next gen of chips, which are all 64-bit. It doesn't do them any good to have high-end and low-end chips that are 64-bit and a mid-range chip that is only 32-bit. But the power/performance claims are very close to the A15... so this is basically replacing the A15, from that perspective.

The A57 will expire sometime at/after 14nm, and new designs will come out. At that time, an A55 that replaces it would make sense, fulfilling the same role as the A12 does at 32-bit.
I'm sure I remember reading somewhere (some interview?) that they decided it just didn't make sense (yet) to go 64-bit for the sorts of devices the A12 will be targeting. The A57 obviously has to go 64-bit to support servers and the like, and that presumably means the A53 has to follow in order to be matched for big.LITTLE purposes in high-end smartphones/tablets etc.

As michael2k refers to above, the A12 is aimed more at mid-range and, in time, low-end phones and the like. Much less reason to push 64-bit there just yet. ARM has to support this sort of thing, but I guess the business model means that they can, too.
The NEON improvements are compelling, but it would be nice to peek behind the curtain of the 48% improvement claim on FFmpeg.

For a start, FFmpeg covers a vast amount of functionality, but certain codecs like H.264 are much more relevant than the obscure ones. So which codecs were used, and are the improvements seen in encoding or decoding, or both?
As we learned with AVX and x264, it's not always easy to realize big gains in real life scenarios with new SIMD hardware.
If there's interest in an article benchmarking x264 on the A9/A15/Krait (to tide us over until the A12 arrives) let me know, been trying to find a way to contribute to AT. :)
Is it just me or have the ARM architecture reviews become more interesting than the Intel or AMD ones?

Your tastes are changing. I used to read power supply and case reviews; now I usually don't even peek at the summary. I guess you could say some people are interested in where the action is, and while Intel is making bold moves (I still read Intel articles), there are still many things that we already know from previous reviews. ARM is still novel to us.

They are constantly evolving. Intel isn't experiencing hyper-growth in performance anymore, instead optimizing for power, which is far less sexy.
Cortex A12 is too late to the game. Too frustratingly so.

What game is that? ARM is in the business of making money. It will sell hundreds of millions of these to the developing world, and make money on each one. If you want to view low-power CPUs as porn, not a business, you should be spending your time watching Apple (and to a lesser extent Qualcomm and, maybe one day, Intel), not ARM.
I'm strictly speaking business, not porn, thank you. To me, Cortex A12 is addressing the same problems that the Krait and Swift cores addressed (specifically in memory bandwidth), where competitors will clearly be >2 years ahead by the time of availability. Look how "successful" A15 vendors are (/s) while Qualcomm is taking a huge share of the pie.
big.LITTLE is proving too hard to implement. Samsung has succeeded in providing Apple with their needs when other fabs failed, all while having difficulties with their own implementation of big.LITTLE. Samsung even ditched the CHEAPER licence of ARM's Mali GPU in favor of Imagination's solution. There's clearly a problem somewhere. Yes, Cortex A15 is faster, but the "average" performance of Krait 200 compared with big.LITTLE (A15/A7) is VERY comparable. However, in heavy workloads, Cortex A15 consumes significantly more power.
ARM has this "view" on how the market "should" be heading, while the market is heading in a clearly different power-envelope/performance direction. Reason? Android. Cortex A9 is not powerful enough for Android, and A15 consumes too much power. I'm a big believer in power efficiency, but ARM seriously need to revise their power envelope charts. Cortex A15 should have been a target of the 20/22nm process, NOT 28nm. That's how demand is working now. Cortex A12 SHOULD HAVE BEEN prioritized over Cortex A15 on 28nm. OEMs (including Samsung) are preferring Snapdragons over Exynos 5 and Tegra 4 even on more power tolerable devices like tablets.
That said, even their high-performance Cortex A15 is seriously threatened by Krait 300 and Silvermont cores in power efficiency at comparable performance. And by the time the A57 is implemented, where do you think the competition will be?
Someday Intel? Dude, for $400 you can either get a Windows RT ARM tablet or a FULL Windows 8 tablet running Saltwell (and Silvermont in the very near future); which one would you pick? Android tablet, you say? Guess what Samsung is doing now with their Tab 3 10.1.
Developing countries? Don't worry about those, Krait 200 cores will be too darn cheap in 2 years when A12 is ready to ship. Oh, they also have modem integration......
The business world works VERY differently from the world enthusiasts live in...
You make very valid points, but you have a very developed-world-centric point of view.
Yes, the A12 is a response to Swift and Krait, and it is over two years too late. We'll of course still have to wait a while to see how it stacks up against its competition in 2015, but I agree with you that it should have been prioritized.
What you are missing, however, is that there is a huge swath of the market where ARM doesn't really have any competition. Swift and Krait are non-entities there; they are too expensive for your average Chinese OEM like Zopo or THL or Cube or FAEA (or many dozens of others). These phones and tablets are now ruling China, India and south-east Asia, and they are all using Mediatek Cortex A7s in the phones, and Rockchip or Allwinner Cortex A9s in their tablets and Android sticks. These are huge markets - we are talking over 3.5 billion people who live in these countries - and yes, Samsung and Apple sell their phones there as well, but they are tiny (especially Apple). Something like 37% of ARM's Cortex A chips were produced by Mediatek alone in 2012, and no one in the developed world has even heard of them.
Sure, Qualcomm is trying to play in this market with their confusingly named Snapdragon S4 Play (which is just a Cortex A5), but Mediatek beats it hands down in both performance and price. Modem integration, you say? It just isn't a factor. They all have single-band HSPA modems on 2100MHz, or at best dual-band 850/2100 or 900/2100, and that's all these markets care about. Oh, and dual SIM - they really care about their dual SIM. No one cares about Qualcomm's fancy-pants LTE modems; there are no networks to use them on!
Will you ever see Cortex A12 in a flagship Samsung or Apple or HTC product? Probably not, but that doesn't really matter. You'll see it in hundreds of millions of products that will line up on AliExpress and Taobao and that's where the vast bulk of the market is going to be.
The other thing to remember is that ARM aren't in competition with Qualcomm, Apple etc. You could even reasonably argue that the existence of Krait meant they didn't need to cover that niche as urgently as the newer markets that the A15/57 etc can be aimed at.
Folk like MediaTek must be a massive long-term worry for everyone in the SoC market (and for Intel trying to break into it) - at some point (reasonably soon?) the bulk of people even in the developed world might well decide that they've got 'enough' performance, and then all those very cheap chips will be waiting to destroy margins. ARM, of course, are set up not to mind.
Of course you are right, ARM doesn't really view Krait and Swift as its competition... but just because ARM is happy to license its instruction set doesn't mean they like Krait and Swift. ARM gets a higher initial revenue when it licenses out its instruction set, but after that, the percentage that it gets on shipping Krait and Swift cores is far lower than what it gets from a straight implementation of one of its Cortex A designs... in ARM's ideal world, everyone would be implementing straight from its designs and giving it the higher percentage fee.
And yes, MediaTek is a massive worry to all the big guys. There is also a price war happening right now between the Chinese companies (Rockchip and Allwinner) and MediaTek (which is Taiwanese); prices of the highest-end products have come down by over a third since the start of the year... you can now buy MediaTek's highest-end 1.5GHz quad-core A7 chip for $8... this is just crazy!
Of course ARM is sitting comfortably at the top, but the one company that can give ARM some headaches is Imagination. They've been beating ARM in the embedded GPU space for a good while now, and now that they have a CPU architecture of their own... this is going to get very interesting. In the 90s, MIPS had far better IPC than ARM, so if Imagination really is set to revive MIPS and get aggressive on price, it will be very interesting to watch, especially with over 95% of Android apps working just fine on MIPS. Pity no one is paying attention to that battle; AnandTech didn't even cover the launch of Warrior.
Most of the analysis of MIPS implies that it has a chance in the embedded world, but not a prayer where the chips listed in this article play. I would assume that Imagination has a long-term plan to break into this market, but it will take some sort of extreme cost/power/performance advantage to convince anyone to give up an architecture. There is a reason that ARM is still a dominant architecture, and it has nothing to do with any inherent superiority of the instruction set (indeed, it is a disaster and an unholy kludge; while most of "the backward stuff" might be irrelevant to modern computing, it still takes up area, has to be validated, and has warts that have to be dealt with in every design). Changing architectures isn't taken lightly (see how wonderfully Windows RT is doing).
Isn't the business model the major reason for the architecture getting so dominant? Given how cheaply/freely they license the architecture, you need a really strong motivation and/or massive scale to consider using anything else. It limits ARM's size, of course.
Qualcomm is not worried right now because it is busy serving higher-priced solutions, and it can hardly supply those customers! Besides, Qualcomm has access to next-gen processes in volumes that could crush MediaTek, Actions, HiSilicon etc. by dropping prices on next-gen offerings. That market will run out for MediaTek within 12 months, so Qualcomm is playing the game right for now. All ARM licensees can do whatever they want as long as they stay within their contractual obligations. These Chinese licensees had better be careful lest they get cut off from the license and have to seek alternative architectures (meaning none - going Intel is suicide!).

The idea of MediaTek and others taking the "lower tier" of the market for 3-4 quarters is enough for all to feast on the market. They do not want Intel to come in, cream everything off, and leave nothing for the partners to live on. There is strategy and turf protection; these are no dumb companies, having made it this far. They know most of the tricks and can outwit the rest, or else they will die.

You should know the difference between Chinese vendors and western vendors: the Chinese manufacturers are happy to sell a wholesaler a phone for $110 when it costs them $80 to make, i.e. a $30 profit. The wholesaler turns around and sells it for $180, making $40-50 each after all the distribution costs etc. Western companies will sell this unit for $300 retail! The difference is greed, even when the manufacturer provided goodwill in the factory price. The reason factories put a limited profit on each unit is to move volume, because they know full well what the market price is going to be. They do not want to inflate it further.
I'm sure you have valid points too, but you're not getting my side of the story. MediaTek's solution is pretty solid, yes, but it's strictly Cortex A9. Consumer demand, even in developing countries, is growing, and the need for faster chips is growing as well. By 2015, what makes you think that Krait won't be competitive in price with higher performance? Qualcomm needs only to re-badge their already developed chips with higher clock speeds. Profit margins are way higher in developed countries; it's just a matter of time before even current flagship devices are slashed in price (with a slight change in design and chassis), and then they're ready to take on the cheapest the Chinese OEMs have to offer...
Anyway, this doesn't change the fact that ARM made a mistake in its priorities. Cortex A15 (big.LITTLE) should have been targeted at 20/22nm and smaller processes. Priority should have gone to Cortex A12. And yes, flagships (hero phones) could have made use of that core if it had been ready by this time, especially since it could have competed really well with Krait in both power efficiency and performance (most probably beating it, if ARM's claims in that regard are to be believed).
The world, and the A15 in particular, isn't just mobile phones :) The Samsung Chromebook, early versions of micro-server hardware, and Shield aren't all massive volume, but they've started the process of getting ARM chips accepted in a bunch of device classes where they just didn't previously exist. (Phone-wise it did end up powering a lot of S4s.)
You can see how that could easily be more attractive/important as a priority for ARM than picking fights with existing licensees over high-end mobile phones.
If Qualcomm (and Intel too, of course) end up directly taking on the cheapest that the Chinese OEMs have to offer, they've basically already lost, as there's then so little profit left. They have to somehow make the case for more expensive but also more powerful/efficient chips. At some point that'll get hard.
True, it isn't only about mobile devices (smartphones and tablets), but those make up the absolute majority of demand. ARM has too much competition to face in the server world, a world that already has tons of existing x86 code. By the time they're ready to seriously fight in the server world, 14nm will be the norm.
Again, my argument isn't about which chip goes where, or who should compete where, it's about timing and priorities of architecture designs, which ARM clearly screwed up on. At least that's how I see it.
"By 2015, what makes you think that Krait wont be competitive in price with higher performance? Qualcomm needs only to re-badge their already developed chips with higher clock speeds."
And what makes you think that the competition will stand still until then? They could get better SoCs based on the A12 by that time, to name just one (out of many) possibilities.
They WILL stand still, because Cortex A12 isn't ready to ship yet. It'll be a good 2 years before it's "cheaply" manufactured by the likes of MediaTek. By the way, Krait-powered Nokia Lumias are going for around (and less than) the $200 mark right now. People are forgetting that Android isn't the only contender in the market. OS market share will most probably look very different in 2015. There are many dynamics running the low-power processor market.
A12 will beat A53 by about 40%: A53 delivers performance comparable to A9 and A12 is 40% faster than A9. Note that 64-bit is not relevant and certainly doesn't provide a big speedup - even on x86 almost all software is still 32-bit as there is little to gain from going to 64-bit.
Well, this isn't entirely accurate, as some of us are running almost completely pure 64-bit systems. And it does appear that video encoding and decoding are common operations that can benefit from 64-bit software, from some of the benchmarks I've seen - but that might actually have been down to compiler options, so maybe not. Otherwise, yes, the 64-bit ARM chips are only really important for server-type workloads where people already have 64-bit software that they don't want to rework.
64-bit has pros and cons. x64 provides more registers, so some applications run faster when built for 64-bit - that may well be the case for the video codecs you mention. However, there are downsides as well: all your pointers double in size, which slows down pointer-heavy code. On 64-bit ARM things will be similar.
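The pointer-size cost is easy to see with a toy struct - the sizes below assume typical ILP32 vs LP64 ABIs:

    #include <stdio.h>

    struct node {
        struct node *next;
        struct node *prev;
        int          key;
    };

    int main(void)
    {
        /* On a typical 32-bit ABI this struct is 12 bytes; on LP64 the
           two pointers double to 8 bytes each and alignment padding
           brings it to 24 - so a linked list's working set roughly
           doubles, and half as many nodes fit per cache line. */
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }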
Note the main reason for 64-bit is allowing more than 4GB of memory. The latest 32-bit ARMs already support this, so for mobiles/tablets etc there is no need to change to 64-bit.
Er, no. Just no. From what I can tell, 32-bit ARM chips use (from a developer's view) the exact same mechanism as x86 and PAE. This might use 4G of RAM efficiently (and for OSes that like to leave all apps in RAM, it might work well for a bit more). Trying to address more memory than an integer register can map is always going to be an unholy kludge (although I would personally recommend all computer architects design such a system into the architecture, because it *will* happen if the architecture succeeds at all). Since ARM chips tend to go into machines that rarely allow memory to be upgraded, no vendor really should be selling machines with >4G RAM and 32-bit addressing. The size/power/cost tradeoff isn't worth it.
google "Support for ARM LPAE". All my links go straight to the pdfs and I wind up with all the google code inbedded in my links.
A15-based servers will have 8-32GB. So yes, it does go well over 4GB; that's the whole point of PAE. Mobiles will end up with 4GB RAM soon, and because of PAE there is no need to use 64-bit CPUs (which would be way overkill for a mobile).
Yes I know about ARM LPAE and that it is supported in Linux.
"Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for integer divides (like the A7 and A15, other A-series architectures generally lacked support for hardware int divides)"
The parenthetical material is very unclear. It sounds like it's saying "The A12 has hardware divides, the A15 and A7 don't, and other A-series archs likewise don't." A simple edit makes the sentence much more clear and slightly more concise:
"Cortex A12 retains the two integer pipelines of the Cortex A9; however, like the A7 and A15, it adds support for hardware integer divides (which previous A-series architectures generally lacked)."
An even simpler edit is to just change the parenthetical comma into a semi-colon:
"Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for integer divides (like the A7 and A15; other A-series architectures generally lacked support for hardware int divides)"
And I didn't see any way for 32-bit ARM to access more than 3G per process. Maybe there is, beyond the PAE-style mechanism that allows each process to access 4G of RAM (well, 2-3G of user space and 1-2G of OS space). It looks like each process sees 32-bit MMU tags, meaning no way to address the whole RAM. Again, somewhere in there they might have an unholy kludge, but I suspect they are more than willing to do things the [PAE] Intel way [not the 286 way that Microsoft forced everyone to support a decade after it was consigned to the junkyard].
So how does one process access more than 4G (3G on Linux, likely less elsewhere)? There is a reason nobody uses 32-bit chips. If you look up the datasheets, the *80386* could address way more than 64G of virtual memory (it just didn't have the pins for more than 4G of physical memory). You could even access it fairly easily within a process, but as far as I know *nobody* ever tried that.
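For what it's worth, the usual workaround on a 32-bit OS is exactly the kind of kludge being described: window a big object through the small address space using 64-bit file offsets. A minimal Linux sketch (huge.dat is a hypothetical >4GB file):

    #define _FILE_OFFSET_BITS 64  /* 64-bit off_t even on 32-bit Linux */
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("huge.dat", O_RDONLY);
        if (fd < 0)
            return 1;
        /* Map a 1MB window starting 6GB into the file: the offset is
           64-bit even though every pointer in this process is 32-bit. */
        off_t offset = 6LL * 1024 * 1024 * 1024;
        void *win = mmap(NULL, 1 << 20, PROT_READ, MAP_PRIVATE, fd, offset);
        if (win == MAP_FAILED)
            return 1;
        /* ... read through the window, then munmap and remap to slide it ... */
        munmap(win, 1 << 20);
        close(fd);
        return 0;
    }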
Note: Linux 0.x and I think 1.x could both handle 3G per process. Maybe not - I know Linus used the 386 segmentation scheme natively for different processes, but I have no idea if the 386 MMU could handle tags that depended on the segmentation scheme (it was quite possible you could either go wild with segments, or use them traditionally and have full MMU operation). I haven't looked at this stupid idea since 1990, when I learned what a disaster x86 is.
We use 64-bit chips for a reason. If we didn't need to address memory the size of an integer register, I would strongly suspect that all integer registers would be 16 bits long (note the Pentium 4 computed integer operations 16 bits at a time; they are notably faster). Using a 64-bit register and 64-bit addressing means that you can access an entire database of arbitrary size (2^63 bytes, whatever that is), while using 32-bit machines requires a "networking" OS call to whichever process happens to have that particular datum in memory. It is yet another unholy kludge, and the reason that "the only fatal mistake a computer architecture can have is too small a word size".
You don't need to access more than 3GB per process on a mobile! Mobiles/tablets will have 4GB of RAM, but each process still has its own 32-bit address space and uses only a portion of the available RAM.
There is no need to be so obsessed about 64-bit - you know the Windows world is still mostly 32-bit 10 years after the introduction of the Athlon 64... Even Windows 8 still has a 32-bit version. So while there are lots of 64-bit chips around, most run only 32-bit code. My Athlon 64, which I retired last year, never ran 64-bit code during its entire life!
You only really require 64-bit addressing if you have big applications that need more than 3GB per process. Such applications are rare (your database is an example) and they only run on large expensive servers, not on mobiles. So clearly the need for 64-bit is extremely small, and mobiles/tablets will simply use PAE rather than switch to 64-bit for the foreseeable future.
The timing on this seems weird. Didn't they know they needed a smaller jump between A9 and A15 years ago? I HOPE it's not really needed by late 2014/2015... I mean, I hope by then we're all using A15 and maybe A5x or whatever... or AMD's low-power chips and Silvermont!
We’ve updated our terms. By continuing to use the site and/or by logging into your account, you agree to the Site’s updated Terms of Use and Privacy Policy.
65 Comments
Back to Article
JDG1980 - Wednesday, July 17, 2013 - link
So, roughly speaking, how does ARM IPC compare to x86? Obviously it's not going to be as high as on modern big-core desktop x86 parts like SB/IB/Haswell, but how does it compare to Atom (both the current generation and the new one in the pipeline) and Bobcat/Kabini?nathanddrews - Wednesday, July 17, 2013 - link
I think that was covered before:http://www.anandtech.com/show/6936/intels-silvermo...
coder543 - Wednesday, July 17, 2013 - link
The performance expectations (which relate to IPC) were misguided though. Intel's compiler was, essentially, cheating by skipping entire sections of the benchmark. http://www.androidauthority.com/analyst-says-intel...DanNeely - Wednesday, July 17, 2013 - link
If I'm interpreting the labels on that graph correctly ("K900 at 1.0"); they've under clocked the atom by 50% (from 2ghz to 1ghz) from what it normally operates at; which would flip the results back to Intel winning the majority of the tests.Krysto - Wednesday, July 17, 2013 - link
No, that's actually the *real* clock speed of Atom, which Intel misleadingly calls a "2 Ghz" core. The 2 Ghz speed is the turbo-boost speed, just like for laptops Haswell will really be clocked at 1.3 Ghz (same performance level as 1.5 Ghz IVB), and it goes up to 2.3 Ghz, or whatever, with Turbo-Boost.The problem is that I think benchmarks do use the Turbo-Boost speed fully, which means Atom will do very well in benchmarks, while that may not be the case in real day to day life, where the phone might never activate the Turbo-Boost speed, and just use the slower "real clock speed" all the time.
name99 - Wednesday, July 17, 2013 - link
The fact of Turbo-Boost is a problem for lazy benchmarking, it is NOT a problem for Intel.Why do you want a high speed core in your phone? There is a population that wants to run aggressive games, or transcode video or whatever, and these people care about sustained performance. But they're in the dramatic minority. For most users, the value of a high speed core is that it makes the phone more zippy, meaning that operations are fast when they need to be fast, after which the phone can go back to sleeping. The user-level "speed" of a phone is measured by how fast it draws a (single) PDF page, or renders a (single) complex web page, or launches an app, not by how it performs over any task that takes longer than a second. In such a world, if Turbo-Boost allows the app to sprint for a second, then go back to low-power mode, the user is very happy with that behavior.
The only "problem" with this strategy, for Intel, is that it is obvious and will be copied by everyone. Intel is there first and most aggressively for historical and process reasons, but there's no reason they will remain the only player.
(It's also quite likely that competitors will adopt the ideas of Turbo-Boost, just never call it that. After all the problem to be solved for phones is different from the REAL Turbo-Boost problem. Turbo-Boost comes from a world where you run the chip as hot as it can go --- till it just about to overheat. If an ARM core has no danger of actually overheating, then the design space is different. Now it's simply "we'll rate the core for 2GHz, but at that speed it uses up 10 nJ/op, so as far as possible we'll try to run it a 1GHz (using up 2 nJ/op) or better yet 100MHz (using up 10 pJ/op) [all numbers made up, but you get the point].)
Wilco1 - Wednesday, July 17, 2013 - link
All CPUs typically run at a far lower frequency than the maximum - I'm sure nobody believes eg. Krait 800 runs all 4 cores at 2.3GHz all the time. So if you call a specific frequency the "base" then anything faster than that is automatically a turbo boost. In that sense Intel's turbo boost is largely marketing, a way to claim a low TDP by setting the base frequency arbitrarily low, and allowing to go well over that TDP for a certain amount of time at a much higher boost frequency.inighthawki - Wednesday, July 17, 2013 - link
My 3.5/3.9GHz advertised i7 4770K runs at 800Mhz at idle :)TomWomack - Thursday, July 18, 2013 - link
My desktop Haswell with Intel's retail cooler runs all four cores 24/7 turbo-boosted - I'm not quite sure what to, it reports 3401MHz in Linux but that's because /proc/cpuinfo asks ACPI which isn't fully compatible with turbo-boost. The machine draws 100W from the mains while doing so, which (given that it idles at 28W) is entirely consistent with the 84W TDP.And, indeed, it runs at 800MHz at idle; and I suspect often slower than that, but /proc/cpuinfo doesn't report C-states
RicDavis - Friday, July 19, 2013 - link
Intel's turbostat has proven very useful for getting good reporting of CPU clock speeds under Linux. with the -v option it also displays the maximum speeds that CPU will run at as the no of active cores varies. Recommended. i7z is another option, but I've seen it do a bad job of showing which cores are active when hyperthreading is enabled.Krysto - Saturday, July 20, 2013 - link
The difference is those ARM chips do take full advantage of the maximum core speed. Saying you start a web page - any web page. It WILL activate the maximum clock speed - whereas the Turbo-Boost in Atom doesn't activate all the time.If we're talking about receiving notifications and such, then obviously the ARM processors won't go to 2 Ghz either, but that's not really what we're talking about here, is it? We're talking about what happens when you're doing normal heavy stuff (web browsing, apps, games).
jeffkibuule - Monday, July 22, 2013 - link
That's the problem I have with performance benchmarks on cell phones. At some point thermal throttling kicks in because you're draining the battery a ton running your CPUs at full tilt. IPC improvements will be felt far more than clock speed ramping. If you ever look at CPU-Z on Android, you'll notice that a Snapdragon 600 with 4 cores clocked at 1.7Ghz tries its hardest to downclock to 1 core at 384Mhz. Even just scrolling up and down the monitoring screen pumps up the CPU speed to 1134Mhz and turns on a second core as well. Peak performance is nice, but ideally should rarely be utilized.Krysto - Saturday, July 20, 2013 - link
No, I meant it's a problem because Atom chips look like they are "competitive" in benchmarks, when in reality they have HALF the performance. That's what I was saying. It's a problem for US, not Intel. Intel wins by being misleading.felixyang - Thursday, July 18, 2013 - link
intel didn't mislead you. In SLM's review, they have very clear description about turbo. Copied here.Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. ........ this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals.
opwernby - Thursday, July 18, 2013 - link
That's not cheating: it's what compilers are supposed to do. For example, if you write, "for (i=0; i<1000; i++);" a good optimizing compiler will analyze the loop, realize that it does nothing, resolve it to "i=1000;" and compile that. I believe the first use of this type of aggressive compiler technology was seen in Sun's C compiler for whatever version of Solaris it was that ran on the Sparc chips back in the '80s. The fact that the ARM compilers didn't do this speaks more about the expected performance of the chipset than anything else: you can build hardware to be as fast as you like, but if the compilers can't keep up, you might as well be running your code on a Commodore Pet.opwernby - Thursday, July 18, 2013 - link
Speaking of the Sun thing: I distinctly remember that the then-current version of the Sun "pizza-box"-style workstation appeared in benchmarks to be 100 times faster than the IBM PC-RT (another RISC architecture competing with Sun's platform) even though, on paper, the PC-RT was running on faster hardware: analysis of the benchmarks' compiled code revealed that Sun's compiler had effectively edited out the loops as I described above. Result: the PC-RT died off very quickly.FunBunny2 - Friday, July 19, 2013 - link
The PC-RT didn't last long, but the processor (in its children) lives on as the RS-6000/PPC/iSeries/ZWilco1 - Thursday, July 18, 2013 - link
It's certainly cheating, if you followed the whole thing it was not just about ICC optimizing much of the benchmark away. The particular optimization was added recently to ICC - it was a lot more complex than an empty loop, it only optimized a very specific loop by a huge factor (so specific that if you compiled all open source code it would likely only apply to the benchmark and nothing else). For some odd reason AnTuTu then secretly switched to that ICC version despite ICC not being a standard Android compiler. Finally it turned out the settings for ARM were non-optimal, using an older GCC version with pretty much all loop optimizations disabled. Intel and ABI research then started making false claims on how fast Atom was compared to Galaxy S4 based on the parts of AnTuTu that were broken (without actually mentioning AnTuTu).Giving one side such a huge unfair advantage is called cheating. As a result AnTuTu will now stop using ICC.
jwcalla - Thursday, July 18, 2013 - link
This is why benchmarks have to be taken with a healthy dose of skepticism.First, if the benchmark program isn't open source, right off the bat it's worthless. If you can't see the code, you can't trust it.
Second, if the program isn't compiled with the same compiler and the same compiler options, the results are crap. You're not getting a valid comparison of the hardware itself.
It's kind of ridiculous seeing many of the journalists out there who took this sensational headline and ran with it without even questioning its legitimacy.
Wilco1 - Wednesday, July 17, 2013 - link
The IPC comparison for integer code goes like:Silverthorne < A7 < A9 < A9R4 < Silvermont < A12 < Bobcat < A15 < Jaguar
This is based on fair comparisons using Geekbench and so doesn't reflect what some marketing departments claim or what cheated benchmarks (ie. AnTuTu) appear to show.
lmcd - Wednesday, July 17, 2013 - link
Where do various Kraits fit in?Wilco1 - Wednesday, July 17, 2013 - link
The Krait 800 as used in Galaxy S4 fits between A9 and A9R4 (it is ~5% better IPC than A9 - it makes up for that by clocking very high).tuxRoller - Wednesday, July 17, 2013 - link
Can you provide a reference for these values?The geekbench numbers are all over the place even for the same device (for instance, you see iphone 5 results that vary by 6%, while gs4 can vary by 16% easily).
Death666Angel - Thursday, July 18, 2013 - link
Not sure what geekbench measures and saves. But considering the multitude of options for tweaking an Android system is quite easy. Just change a some governors around and use a different scheduler and you can get quite the range of results.tuxRoller - Sunday, July 21, 2013 - link
That's kinda my point.Where exactly is he getting his numbers from.
michael2k - Wednesday, July 17, 2013 - link
It really does sound like the Swift or Krait cores, but widely available now to the Rockchips and Mediateks. Even if it comes out next year, it means $200 smartphones with the raw performance of an iPhone 5 or Galaxy S4 while Apple and Samsung sell something similar for $450. The real question then is how Qualcomm, Samsung, and Apple will push their architectures other than more die-shrinks. Apple still has the option of moving to 4 core as well as BIG.little, and Qualcomm still has the option of BIG.little as well, but where is Exynos going to head? 8 core BIG.little (for 16 total cores?) Asymmetric B.l with 8 big cores and 4 small cores? Something else altogether?fteoath64 - Friday, July 19, 2013 - link
Great point regarding Big.Little design in the SoC. There are many ways to implement Big.Little design on the wafer. I think only the rudimentary one has been used and this does not really link as much to OS optimizations as we would like. It takes effort/complexity and great code in drivers and kernel changes to take advantage of the design in order to maximise what the hardware can do. And there is the variant that could go Big.Medium.Little. If you look at the frequency charts of typical use, the Medium state do take a lot of the time duration while the Big takes very little (only in the spikes) then the Little takes care of near idle time. Having a Medium module takes space but might be worth the effort in power savings more than just Big.Little switching. The switching cost in power is negligible but time sustained power use on a certain frequency do have good savings (eg 30% in Medium state vs 5% on Big state). Optimizing the OS to change state is important as the efficiency and time savings are there to be had. The slower it does it, the more power it draws for a given duration. Another software optimizing is to split threads to specific core types or core number to optimise performance. eg Little core does all I/O since that is slow while FP/INT goes to Big, or INT split between Big and Little. Dynamic switching to keep one Big core active for several seconds longer might be a benefit if it gets triggered soon after, ie Why switch when a delay in switch solves the problem!. OF course a huge simulation is needed to find the optimal design points that are worth implementing. It is an iterative process. The same goes for GPU cores to get active and boost frequency on demand. For now, they kick fully blast when the game wants it. A great feedback way would be an FPS counter to throttle down the gpus since > 60fps is pretty useless unless you are running 120fps 3D displays. For that cap it at 120fps when the is mode is used. Due to the time to release durations, I am certain many compromised were made just to get the silicon out. ARM vendors are not like Intel who can afford the wait on a release because they had a monopoly on the chip in the market. Competition ensure that ARM evolves quickly and efficiently. This is where you can see Qualcomm succeeding while Tegra falters. Samsung is trying to find their secret sauce toward success with Octa-core design. I think next iteration might be a good one for them coupled with process node improvements they will use.I see a 2:3:4 design as optimum. 2Big 3Medium 4 Little. Here is how it should work:
Full Bore: 2Big 2Medium and 1 Little active (PentaCore design).
Medium operation: 3Medium and 2 Little active (Still holding PentaCore threading)
Step Down1: 1Medium 2 Little.
Idle: 1 Little only. Note Little takes ALL the I/O traffic.
roberto.tomas - Wednesday, July 17, 2013 - link
Looks pretty clear to me that there will be an A55 at 14nm or at least 10nm. The A12 is technically replacing the A9, right at the start of the next gen of chips which are all 64 bit. It doesn't do them any good to have a high end and low end chip that is 64 bit, and a mid range chip that is only 32 bit. But the power/performance claims are very close to the A15... so this is basically replacing the A15, from that perspective.The A57 will expire sometime at/after 14nm, and new designs will come out. At that time, an A55 that replaces it would make sense, fulfilling the same roll as the A12 at 32-bit.
Qwertilot - Wednesday, July 17, 2013 - link
I'm sure I remember reading somewhere (some interview?) that they decided that it just didn't make sense (yet) to go 64 bit for the sorts of devices that the A12 will be targeting. The A57 obviously has to go 64 bit to support servers and the like, and that presumably means that the A53 has to follow in order to be matched for bigLittle purposes for high end smart phones/tablets etc.As michael2k refers to above, the A12 is aimed more at mid/in time, low end phones and the like. Much less reason to push 64 bit there just yet. ARM have to support this sort of thing but I guess the business model means that they can too.
WhitneyLand - Wednesday, July 17, 2013 - link
The NEON improvements are compelling, but it would be nice to peek behind the curtain of the 48% improvement claims on FFMPEG.To start FFMPEG covers a vast amount of functionality, but certain FFMPEG codecs like h.264 are much more relevant than the obscure ones. So which codecs were used, and are the improvements seen in encoding or decoding, or both?
As we learned with AVX and x264, it's not always easy to realize big gains in real life scenarios with new SIMD hardware.
If there's interest in an article benchmarking x264 on the A9/A15/Krait (to tide us over until the A12 arrives) let me know, been trying to find a way to contribute to AT. :)
crypticsaga - Wednesday, July 17, 2013 - link
Is it just me or have the arm architecture reviews become more interesting than the intel or amd ones?WhitneyLand - Wednesday, July 17, 2013 - link
Your tastes are changing. I used to read power supply and case reviews, now I usually don't even peek at the summary. I guess you could say some people are interested in where the action is, and while Intel is making bold moves (I still read Intel articles) there are still many things that we already know from previous reviews. ARM is still novel to us.jeffkibuule - Monday, July 22, 2013 - link
They are constantly evolving. Intel isn't experiencing hyper growth in performance anymore, instead optimizing for power which is far less sexy.lilmoe - Wednesday, July 17, 2013 - link
Cortex A12 is too late to the game. Too frustratingly so.name99 - Wednesday, July 17, 2013 - link
What game is that? ARM is in the business of making money. It will sell hundreds of millions of these to the developing world, and make money on each one.If you want to view low-power CPUs as porn, not a business, you should be spending your time watching Apple (and to a lesser extent, Qualcomm and, maybe one-day Intel) not ARM.
lilmoe - Wednesday, July 17, 2013 - link
I'm strictly speaking business, not porn, thank you. To me, Cortex A12 is addressing the same problem that Krait and Swift cores addressed (specifically in memory bandwidth), where competitors are clearly >2 years ahead by the time of availability. Look how "successful" A15 vendors are (/s) while Qualcomm is taking a huge share of the pie.big.LITTLE is proving too hard to implement. Samsung has succeeded in providing Apple with their needs when other fabs failed, all while having difficulties with their implementation of big.LITTLE. Samsung even ditched the CHEAPER licence of ARM's Mali GPU in favor of IT's solution. There's clearly a problem somewhere. Yes, Cortex A15 is faster, but "average" performance of Krait 200 compared with big.LITTLE (a15/a7) is VERY comparable. However, in heavy workloads, Cortex A15 consumes significantly more power.
ARM has this "view" on how the market "should" be heading, while the market is heading in a clearly different power-envelope/performance direction. Reason? Android. Cortex A9 is not powerful enough for Android, and A15 consumes too much power. I'm a big believer in power efficiency, but ARM seriously need to revise their power envelope charts. Cortex A15 should have been a target of the 20/22nm process, NOT 28nm. That's how demand is working now. Cortex A12 SHOULD HAVE BEEN prioritized over Cortex A15 on 28nm. OEMs (including Samsung) are preferring Snapdragons over Exynos 5 and Tegra 4 even on more power tolerable devices like tablets.
That said, even they're high performance Cortex A15 is seriously threatened by Krait 300 and Silvermont cores in power efficiency at relative performance. And by the time A57 is implemented, where do you think the competition will be?
Someday Intel? Dude, for $400, can either get a Windows RT ARM tablet, or a FULL Windows 8 tablet running Saltwell (and Silvermont in the very near future), which one would you pick? Android tablet you say? Guess what Samsung is doing now with their Tab 3 10.1.
Developing countries? Don't worry about those, Krait 200 cores will be too darn cheap in 2 years when A12 is ready to ship. Oh, they also have modem integration......
The business world works VERY differently from the world enthusiasts live in...
aryonoco - Wednesday, July 17, 2013 - link
You make very valid points, but you have a very developed-world-centric point of view.Yes A12 is too late, it is a response to Swift and Krait, and it is over two years too late. We'll have to wait a while still of course to see how it stacks up against its competition in 2015, but I agree with you in that it should have been prioritized, and it's late.
What you are missing however is that there is a huge swath of the market where ARM doesn't really have a competition. Swift and Krait are non-entities there, they are too expensive for your average Chinese OEM like Zopo or THL or Cube or FAEA (or many dozen others). These phones and tablets are now ruling China, India and south-east Asia, and they are all using Mediatek Cortex A7s in the phones, and Rockchip or Allwinner Cortex A9s in their tablets and Android sticks. These are huge markets, we are talking over 3.5 Billion people who live in these countries, and yes Samsung and Apple sell their phones there as well, but they are tiny (especially Apple). Something like 37% of ARM's Cortex A chips were produced by Mediatek alone in 2012, and no one in the developed world has even heard of them.
Sure Qualcomm is trying to play in this market, with their confusingly named Snapdragon S4 Play (which is just a cortex A5) but Mediatek beats it hands down both in performance and price. Modem integration you say? It just isn't a factor. They all have single band HSPA modems on 2100 Mhz, or at best dual band 850/2100 or 900/2100 and that's all that these markets care about. Oh and dual SIM, they really care about their dual SIM. No one cares about Qualcomm's fancy pants LTE modems, there are no networks to use them on!
Will you ever see Cortex A12 in a flagship Samsung or Apple or HTC product? Probably not, but that doesn't really matter. You'll see it in hundreds of millions of products that will line up on AliExpress and Taobao and that's where the vast bulk of the market is going to be.
parim - Thursday, July 18, 2013 - link
I completely agree with all your points, the sub-$200 Android phone market in India is dominated by MediaTek. All of India is on 2100MHz HSPA.
Qualcomm/Nvidia need to reduce their prices by a lot to avoid being completely washed away by the MediaTek wave.
Qwertilot - Thursday, July 18, 2013 - link
The other thing to remember is that ARM aren't in competition with Qualcomm, Apple etc. You could even reasonably argue that the existence of Krait meant they didn't need to cover that niche as urgently as the newer markets that the A15/A57 etc. can be aimed at.
Folk like MediaTek must be a massive long-term worry for everyone in the SoC market (and Intel trying to break into it) - at some point (reasonably soon?) the bulk of people even in the developed world might well decide that they've got 'enough' performance, and then all those very cheap chips will be waiting to destroy margins. ARM are of course set up not to mind.
aryonoco - Thursday, July 18, 2013 - link
Of course you are right, ARM doesn't really view Krait and Swift as its competition... but just because ARM is happy to license its instruction set doesn't mean they like Krait and Swift. ARM gets higher initial revenue when it licenses out its instruction set, but after that, the percentage it gets on shipping Krait and Swift cores is far lower than what it gets from a straight implementation of one of its Cortex A designs... in ARM's ideal world, everyone would be implementing straight from its designs and paying it the higher percentage fee.
And yes, MediaTek is a massive worry to all the big guys. There is also a price war happening between Chinese companies (Rockchip and Allwinner) and MediaTek (which is Taiwanese) right now; prices of the highest-end products have come down by over a third since the start of the year... you can now buy MediaTek's highest-end 1.5GHz quad-core A7 chip for $8... this is just crazy!
Of course ARM is sitting comfortably at the top, but the one company that can give ARM some headache is Imagination. They've been beating ARM in the embedded GPU space for a good while now, and now that they have a CPU architecture of their own... this is going to get very interesting. In the 90s, MIPS had a far better IPC than ARM, so if Imagination really is set to revive MIPS and get aggressive on price, it will be very interesting to watch, especially with over 95% of Android apps working just fine on MIPS. Pity no one is paying attention to that battle; AnandTech didn't even cover the launch of Warrior.
wumpus - Thursday, July 18, 2013 - link
Most of the analysis of MIPS implies that it has a chance in the embedded world, but not a prayer where the chips listed in this article play. I would assume that Imagination has a long-term plan to break into this market, but it will take some sort of extreme cost/power/performance advantage to convince anyone to give up an architecture. There is a reason that ARM is still a dominant architecture, and it has nothing to do with any inherent superiority of the instruction set (indeed, it is a disaster and an unholy kludge; while most of the time "the backward stuff" might be irrelevant to modern computing, it still takes up area, has to be validated, and still has warts that have to be dealt with in every design). Changing architectures isn't taken lightly (see how wonderfully Windows RT is doing).
Qwertilot - Thursday, July 18, 2013 - link
Isn't the business model the major reason for the architecture getting so dominant? Given how cheaply/freely they license the architecture, you need a really strong motivation and/or massive scale to consider using anything else. It limits ARM's size, of course.
Mondozai - Friday, July 19, 2013 - link
"There is a reason that ARM64 is still a dominant architecture, and it has nothing to do with any inherent superiority of the instruction set"Sorry that's just lazy.
ARM is where it is because no competitor has managed an alternative that is sufficiently competitive with their architecture.
Legacy isn't an issue. If ARM were becoming irrelevant, the switch would occur, and very fast too.
fteoath64 - Friday, July 19, 2013 - link
Qualcomm is not worried right now because it is busy serving higher-priced solutions, and it can hardly supply those customers! Besides, Qualcomm has access to next-gen processes in volumes that can crush MediaTek, Actions, HiSilicon etc. by dropping prices on next-gen offerings. That market will run out for MediaTek within 12 months, so Qualcomm is playing the game right for now. All ARM licensees can do whatever they want as long as they stay within their contractual obligations. These Chinese licensees better be careful lest they get cut off from the license and have to seek alternative architectures (meaning none; going Intel is suicide!). The idea of MediaTek and others taking the "lower tier" of the market for 3-4 quarters is enough for all to feast on the market. They do not want Intel to come in to cream everything off and leave nothing for the partners to live on. There is strategy and turf protection; these are no dumb companies, having made it this far. They know most of the tricks and can outwit the competition, or else they will die. You should know the difference between Chinese vendors and western vendors: the Chinese manufacturers are happy to sell a wholesaler a phone for $110 when it costs them $80 to make, i.e. a $30 profit. The wholesaler turns around and sells it for $180, making $40-50 each after all the distribution costs etc. Western companies will sell this unit for $300 retail! The difference is greed, even when the manufacturer provided goodwill in the factory price. The reason factories put a limited profit on each unit is to move the volumes, because they know full well what the market price is going to be. They do not want to inflate it further.
lilmoe - Thursday, July 18, 2013 - link
I'm sure you have valid points too, but you're not getting my side of the story. MediaTek's solution is pretty solid, yes, but it's strictly Cortex A9. Consumer demand, even in developing countries, is growing, and the need for faster chips is growing as well. By 2015, what makes you think that Krait won't be competitive in price with higher performance? Qualcomm only needs to re-badge their already-developed chips with higher clock speeds.
Profit margins are way higher in developed countries; it's just a matter of time before even current flagship devices are slashed in price (with a slight change in design and chassis), and then they're ready to take on the cheapest the Chinese OEMs have to offer...
Anyway, this doesn't change the fact that ARM made a mistake in its priorities. Cortex A15 (big.LITTLE) should have been targeted at 20/22nm and smaller processes. Priority should have gone to Cortex A12. And yes, flagships (hero phones) could have made use of that core had it been ready by this time, especially since it could have competed really well with Krait in both power efficiency and performance (most probably beating it, if ARM's claims are to be believed in that regard).
Qwertilot - Thursday, July 18, 2013 - link
The world, and A15 in particular, isn't just mobile phones :) The Samsung Chromebook, early versions of micro-server hardware, and Shield aren't all massive volume, but they've started the process of getting ARM chips accepted in a bunch of device classes where they just didn't previously exist. (Phone-wise, it did end up powering a lot of S4s.)
You can see how that could easily be a more attractive/important priority for ARM than picking fights with existing licensees over high-end mobile phones.
If Qualcomm (and Intel too, of course) end up directly taking on the cheapest that the Chinese OEMs have to offer, they've basically already lost, as there's then so little profit left. They have to somehow make the case for more expensive but also more powerful/efficient chips. At some point that'll get hard.
lilmoe - Thursday, July 18, 2013 - link
True, it isn't only for mobile devices (smartphones and tablets), but those make up the absolute majority of demand. ARM has too much competition to face in the server world, a world that already has tons of existing x86 code. By the time they're ready to seriously fight in the server world, 14nm will be the norm.
Again, my argument isn't about which chip goes where, or who should compete where; it's about the timing and priorities of architecture designs, which ARM clearly screwed up on. At least that's how I see it.
Mondozai - Friday, July 19, 2013 - link
lilmoe wrote:
"By 2015, what makes you think that Krait won't be competitive in price with higher performance? Qualcomm only needs to re-badge their already-developed chips with higher clock speeds."
And what makes you think that the competition will stand still until that time?
They could get better SoCs based on the A12 by that time, to name just one (out of many) possibilities.
lilmoe - Friday, July 19, 2013 - link
They WILL stand still, because Cortex A12 isn't ready to ship yet. It'll be a good 2 years before it's "cheaply" manufactured by the likes of MediaTek.
By the way, Krait-powered Nokia Lumias are going for around (and under) the $200 mark right now. People are forgetting that Android isn't the only contender in the market. OS market share will most probably look very different in 2015.
There are many dynamics driving the low-power processor market.
Wilco1 - Wednesday, July 17, 2013 - link
It's only late if you consider it a direct competitor of Silvermont. However, the A9R4 as used in Tegra 4i seems perfectly timed.
haukionkannel - Wednesday, July 17, 2013 - link
I am interested in how this A12 compares to the A53 speed-wise... The A53 has wider (64-bit) registers, but the A12 runs at higher frequencies and has more cores?
Wilco1 - Wednesday, July 17, 2013 - link
A12 will beat A53 by about 40%: A53 delivers performance comparable to A9, and A12 is 40% faster than A9. Note that 64-bit is not relevant and certainly doesn't provide a big speedup - even on x86 almost all software is still 32-bit, as there is little to gain from going to 64-bit.
jwcalla - Wednesday, July 17, 2013 - link
Well this isn't entirely accurate, as some of us are running almost completely pure 64-bit systems. And it does appear that video encoding and decoding are common operations that can benefit from 64-bit software, from some of the benchmarks I've seen, but that might actually have been down to compiler options, so maybe not. But otherwise yes, the 64-bit ARM chips are only really important for server-type workloads where people already have 64-bit software that they don't want to rework.
Wilco1 - Wednesday, July 17, 2013 - link
64-bit has pros and cons. x64 provides more registers, so some applications run faster when built for 64-bit - that may well be the case for the video codecs you mention. However, there are downsides as well: all your pointers double in size, which slows down pointer-heavy code. On 64-bit ARM things will be similar.
Note that the main reason for 64-bit is allowing more than 4GB of memory. The latest 32-bit ARMs already support this, so for mobiles/tablets etc. there is no need to change to 64-bit.
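To make the pointer-doubling point concrete, here's a minimal C sketch (the struct is illustrative, not from any real codebase); build it as 32-bit and as 64-bit on a toolchain that supports both and compare the output:

```c
#include <stdio.h>

/* A pointer-heavy node, typical of linked lists and trees. */
struct node {
    struct node *next;
    struct node *prev;
    int          value;
};

int main(void) {
    /* ILP32 build: 4-byte pointers, so the struct is ~12 bytes.
     * LP64 build: 8-byte pointers plus padding, ~24 bytes -
     * the same data, but half as many nodes per cache line. */
    printf("sizeof(void *)      = %zu\n", sizeof(void *));
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    return 0;
}
```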
wumpus - Thursday, July 18, 2013 - link
Er, no. Just no.
From what I can tell, 32-bit ARM chips use (from a developer's view) the exact same mechanism as x86 and PAE. This might use 4G of RAM efficiently (and for those OSes that like to leave all apps in RAM, it might work well for a bit more).
Trying to address more memory than an integer register can map is always going to be an unholy kludge (although I would personally recommend all computer architects design such a system into the architecture, because it *will* happen if the architecture succeeds at all). Since ARM chips tend to go into machines that rarely allow memory to be upgraded, no vendor really should be selling machines with >4G RAM and 32-bit addressing. The size/power/cost tradeoff isn't worth it.
google "Support for ARM LPAE". All my links go straight to the pdfs and I wind up with all the google code inbedded in my links.
Calinou__ - Thursday, July 18, 2013 - link
PAE doesn't allow for more than 3 GB per process. ;)
Wilco1 - Thursday, July 18, 2013 - link
A15-based servers will have 8-32GB. So yes, it does go well over 4GB; that's the whole point of PAE. Mobiles will end up with 4GB RAM soon, and because of PAE there is no need to use 64-bit CPUs (which would be way overkill for a mobile).
Yes, I know about ARM LPAE and that it is supported in Linux.
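For the curious, a simplified, hypothetical sketch in C of the idea behind LPAE (field layout and names are illustrative; the real descriptor format is in ARM's documentation): page-table descriptors are 64 bits wide, so a 32-bit virtual address can translate to a 40-bit physical one - each process keeps its 4GB view while the machine holds far more RAM.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified LPAE-style page-table descriptor: 64 bits wide,
 * with the physical frame address in bits 39:12. */
typedef uint64_t lpae_desc_t;

#define PAGE_SHIFT      12
#define PHYS_ADDR_MASK  0x000000FFFFFFF000ULL  /* bits 39:12 */

uint64_t translate(lpae_desc_t desc, uint32_t vaddr) {
    /* Physical frame from the descriptor plus the page offset
     * from the 32-bit virtual address: the process only ever
     * sees a 32-bit (4GB) virtual space, but the frame can sit
     * anywhere in a 2^40-byte (1TB) physical space. */
    return (desc & PHYS_ADDR_MASK) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}

int main(void) {
    lpae_desc_t desc = 0x0000008040000000ULL;  /* frame above 4GB */
    printf("phys = 0x%llx\n",
           (unsigned long long)translate(desc, 0x12345678u));
    return 0;
}
```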
fteoath64 - Friday, July 19, 2013 - link
"PAE there is no need to use 64-bit CPUs". Agreed. More effort to be spend on optimizing for speed and power efficiency.jensend - Friday, July 19, 2013 - link
"Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for integer divides (like the A7 and A15, other A-series architectures generally lacked support for hardware int divides)"The parenthetical material is very unclear. It sounds like it's saying "The A12 has hardware divides, the A15 and A7 don't, and other A-series archs likewise don't." A simple edit makes the sentence much more clear and slightly more concise:
"Cortex A12 retains the two integer pipelines of the Cortex A9; however, like the A7 and A15, it adds support for hardware integer divides (which previous A-series architectures generally lacked)."
phoenix_rizzen - Friday, July 19, 2013 - link
An even simpler edit is to just change the parenthetical comma into a semi-colon:
"Cortex A12 retains the two integer pipelines of the Cortex A9, but adds support for integer divides (like the A7 and A15; other A-series architectures generally lacked support for hardware int divides)"
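As a footnote on what the hardware divide actually buys: the same C source compiles to either a runtime-library call or a single instruction depending on the target core. A rough sketch (flags and output are approximate - check your own toolchain):

```c
/* quot.c - one signed 32-bit division. */
int quot(int a, int b) { return a / b; }

/* Targeting Cortex-A9 (no hardware integer divide), GCC emits a
 * call to the ARM EABI runtime helper, roughly:
 *     bl   __aeabi_idiv     @ software divide, tens of cycles
 *
 * Targeting Cortex-A15 or A7 (and, per the article, A12), it can
 * instead emit a single instruction:
 *     sdiv r0, r0, r1       @ hardware divide
 *
 * Approximate invocations to see the difference:
 *     arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a9  -S quot.c
 *     arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a15 -S quot.c
 */
```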
wumpus - Friday, July 19, 2013 - link
And I didn't see any way for 32-bit ARM to access more than 3G. Maybe there is, but the PAE-style mechanism only allowed each process to access 4G of RAM (well, 2-3G of user space and 1-2G of OS space). It looks like each process sees 32-bit MMU tags, meaning no way to access the whole RAM. Again, somewhere in there they might have an unholy kludge, but I suspect that they are more than willing to do things the [PAE] Intel way [not the 286 way that Microsoft forced everyone to support a decade after it was consigned to the junkyard].
wumpus - Friday, July 19, 2013 - link
So how does one process access more than 4G (3G if Linux, likely less elsewhere)? There is a reason nobody uses 32-bit chips. If you really looked up the datasheets, the *80386* chip could access way more than 64G of virtual RAM (it didn't have the pins for more than 4G of memory). You could even access it fairly easily in a process, but as far as I know *nobody* ever tried that.
Note: Linux 0.x and I think 1.x could both handle 3G per process. Maybe not - I know Linus used the 386 segmentation scheme natively on different processes, but I have no idea if the 386 MMU could handle tags that depended on the segmentation scheme (it was quite possible you could either go wild with segments, or use them traditionally and have full MMU operation). I haven't looked at this stupid idea since 1990, when I learned the disaster that is x86.
We use 64-bit chips for a reason. If we didn't need to access memory the size of an integer register, I would strongly suspect that all integer registers would be 16 bits long (note the Pentium 4 computed integer operations 16 bits at a time; they are notably faster). Using a 64-bit register and 64-bit addressing means that you can access an entire database of arbitrary size (2^63 bytes, whatever that is), while using 32-bit machines requires a "networking" OS call to whichever process happens to have that particular datum in memory. It is yet another unholy kludge, and the reason that "the only fatal mistake a computer architecture can have is too small a word size".
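For scale on that "2^63, whatever that is" aside, a quick C sketch of what each register width can address flatly:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Flat byte-addressability of each register width. */
    uint64_t max32 = 1ULL << 32;  /* 4,294,967,296 bytes = 4 GiB */
    uint64_t max63 = 1ULL << 63;  /* 9,223,372,036,854,775,808 bytes = 8 EiB */

    printf("32-bit flat address space: %llu bytes (%.0f GiB)\n",
           (unsigned long long)max32, max32 / 1073741824.0);
    printf("2^63 bytes: %llu (%.0f EiB)\n",
           (unsigned long long)max63, max63 / 1152921504606846976.0);
    return 0;
}
```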
Wilco1 - Friday, July 19, 2013 - link
You don't need to access more than 3GB per process on a mobile! Mobiles/tablets will have 4GB of RAM; however, each process still has its own 32-bit address space and uses only a portion of the available RAM.
There is no need to be so obsessed with 64-bit; you know the Windows world is still mostly 32-bit 10 years after the introduction of the Athlon64... Even Windows 8 still has a 32-bit version. So while there are lots of 64-bit chips around, most run only 32-bit code. My Athlon64, which I retired last year, never ran 64-bit code during its entire life!
You only really require 64-bit addressing if you have big applications that need more than 3GB per process. Such applications are rare (your database is an example) and they only run on large expensive servers, not on mobiles. So clearly the need for 64-bit is extremely small, and mobiles/tablets will simply use PAE rather than switch to 64-bit for the foreseeable future.
Calinou__ - Saturday, July 20, 2013 - link
64-bit is still a technology of the future. Not to mention PAE can be quite buggy sometimes, especially when running e.g. proprietary drivers.
Wolfpup - Thursday, July 25, 2013 - link
The timing on this seems weird. Didn't they know they needed a smaller jump between A9 and A15 years ago? I HOPE it's not really needed by late 2014/2015... I mean, I hope by then we're all using A15, and maybe A5x or whatever... or AMD's low-power chips and Silvermont!