I think you are right. I would think Bulldozer can manage the same theoretical throughput by issuing one combined FMA instruction (16 flop) per clock per module.
More importantly, Bulldozer will achieve high throughput for all the existing SSE code by having two independent FMA units. I have no idea how Anand could make such a mistake.
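The peak-throughput argument in this sub-thread can be checked with a few lines of arithmetic. The sketch below only encodes the premises the commenters state (one 256-bit add plus one 256-bit multiply per clock per Sandy Bridge core; two 128-bit FMA units per Bulldozer module); those premises are assumptions from the discussion, not confirmed hardware figures, and the function names are made up for illustration.

```python
# Peak single-precision FLOPs per clock under the premises stated in the thread.
# Every figure here is an assumption taken from the comments, not a measurement.

SP_BITS = 32  # width of one single-precision float


def lanes(vector_bits: int) -> int:
    """Number of single-precision lanes in a vector of the given width."""
    return vector_bits // SP_BITS


def sandy_bridge_flops_per_clock(cores: int) -> int:
    # Premise: each core issues one 256-bit add and one 256-bit multiply per clock.
    return cores * (lanes(256) + lanes(256))


def bulldozer_flops_per_clock(modules: int) -> int:
    # Premise: each module (two cores) has two 128-bit FMA units,
    # and an FMA counts as two FLOPs per lane (one multiply, one add).
    return modules * 2 * lanes(128) * 2


sb = sandy_bridge_flops_per_clock(cores=4)
bd = bulldozer_flops_per_clock(modules=4)
print(f"4-core Sandy Bridge: {sb} flop/clock, 4-module Bulldozer: {bd} flop/clock")
```

Under these premises both chips reach the same peak, but only if Sandy Bridge code mixes adds and multiplies evenly and Bulldozer code can use FMA; clock speed and real scheduler behaviour are left out entirely.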
Well, I found a summary of the prefixes. Interestingly, there are some exceptions, like I guessed, e.g. a VEX.128 prefix does not exist for conversion of packed floating points<->packed integers and for CRC32c + POPCNT.
Anand: The best info available on an exciting platform, good job.
I wonder if for the next article you could test DirectX / OpenGL compatibility? Intel advertises compliance for a lot of its products, but in reality the support is partial, and some applications that use DirectX / OpenGL entirely correctly are not supported by Intel graphics, including the current HD graphics.
I've found this with FastPictureViewer (DirectX, I think 9) and Photoshop CS5 (OpenGL 2).
This is quite shocking. Given that Intel is doing this currently, it would be great if reviewers could prod it into action, but unfortunately they tend to place speed first, correctness second or nowhere.
Hi CSMR, could you please give more details about the problems with DX and OGL on Intel HD graphics (including gfx driver version, system config, ...)? You mentioned two applications, FastPictureViewer and PS CS5, so could you please write up steps to reproduce the problem for each of them? Thanks a lot.
What's the point of extreme editions if we're going to have affordable K SKUs? Or will socket 2011 not have any K SKUs? I'm guessing they'll leave the BCLK unlocked on the 2011, and only have normal and extreme processors (no K processors). Or maybe extreme editions will just have more cores, like the 980X?
The extreme editions have always been for people who buy retail or who're playing with LN2 and need the most insanely binned part available. They've never been a mainstream OCer part.
I have a bad feeling about the "K" chips and the future of overclocking. Sure, Intel gave us turbo mode, but that almost seems like appeasement before the last shoe drops. First, limited turbo with good overclocking, then better turbo and less overclocking, and now it's sounding like slightly better turbo and even less overclocking. It looks like we are moving to Intel-controlled overclocking. There's virtually no value left for the enthusiast, a user that is already just a small part of the market. Intel just decided what the enthusiast needs, but I don't think they get what those users actually want.
I just don't buy that these limits are to prevent fraud. Mom and Pop stores are virtually all gone now, and I'd hate to think what Intel would do to a Dell or HP if they got caught overclocking desktops.
I guess this leaves another door open for AMD. Sad, cause SnB looks like a great design.
Hopefully Intel will allow the 'energy budget' to be increased when an extreme edition processor detects less thermal resistance (i.e. a bloody big heat sink). This would allow an EE CPU to either run with a higher multiplier or run at its turbo frequency longer. (I'd like this feature on all CPUs.)
This would make EE CPUs interesting if K CPUs catch up in terms of cores.
What are the prospects for using Intel's transcoder to convert DVDs to 700MB avi files? Either DivX, Xvid, or H.264? Or anything else better than MPEG-2?
Since this seems to be, overall, a refinement rather than an improvement with new capabilities, and given Anand's comments about the scalability of GPU-related enhancements, is Intel taking a two-step approach towards CPU releases, in addition to its fab strategy? E.g., we see a new CPU, then it gets shrunk, then it gets improved (like this), then it gets bells and whistles (like a GPU etc.), then we start over again with a really new architecture...
This is no secret. This is exactly Intel's tick-tock strategy that has been in place for years now.
The one thing you have to keep in mind is that designing these CPUs now takes on the order of SEVEN YEARS (!!!) from conception to ship, which means that slips and mistakes do occur. Intel (and I guess AMD) have to make their best guess as to what the market will look like in seven years, and sometimes they do guess incorrectly. Of course there is scope for small changes along the way closer to the release date, but not for changes in the grand strategy.
I think that the roadmap probably refers to OEM shipments, whereas, Anand was probably referring to when consumers would actually be able to buy devices.
I just realized that my computer will no longer scream when I do webcam video conferencing with Skype! With the encoder engine and decoder engine, all I am doing is feeding in USB 3.0 data and moving it around...
No, it seems to be right. Core Duo belongs to the Pentium M microarchitecture which implemented the SSE registers as two 64bit registers. So the largest registers were the x87-registers, but I'm not sure whether upon register renaming the registers were really copied.
For most people it makes perfect sense to get a new socket. Most people don't buy every new CPU from Intel or AMD because it would be a waste of money. My current CPU is a Core 2 Quad processor with a 775 socket; I skipped the Nehalem generation and will buy a Sandy Bridge early next year. So why should I keep my motherboard and the old 775 socket? Of course I will buy a new motherboard for the new processor. So I think for most people this is not a real issue.
There's a lot of "neato" stuff that does a lot to improve the user experience by making the chip use its design resources more intelligently (smarter turbo - that 'comcast turbo-boost' feature should really make a difference for end users); but in terms of actual throughput it looks like Intel left FP performance the same; and there certainly isn't any new integer hardware.
K11, on the other hand, doubled integer ALU's (though the raw number of execution units is now the same as in a Nehalem core) and added a half-width (compared to Intel) FP unit.
First, I'd be interested to see if the whizz-bangies AMD was talking about for the K11 FPU a year ago make the execution time for 128-bit FP instructions comparable to, better than, or still slower than Intel's FPU.
Second, I'd be quadruple interested to see what impact the way AMD is allocating the new integer hardware is going to have on performance. A monolithic Nehalem core is going to be able to handle more complex (wider) threads better than a K11 core (that's a 2-integer and 1-FPU Bulldozer); but in SMT mode (or pseudo-SMT mode) what happens? We know Intel experiences a performance hit in HTT mode which they are only able to offset because Nehalem is so wide. AMD thinks it isn't going to get the expected hit in the front end, and they won't have the thread-switching penalty that Intel does. My prediction is that 8-core K11/Bulldozer will crush Sandy Bridge in multithreaded FP-light workloads and be 5-20% slower in everything else (the possible exception being 128-bit floats).
I'm actually kind of disappointed by this update to Nehalem... Intel did a lot of "uncore" stuff and implemented AVX. Where's our wider back-end? More execution hardware drives better single-thread performance... the rest is just undoing the damage from the CISC-RISC transition in the front end and OoO.
To me the problem is that instead of me overclocking without regard to TDP, now Intel will do the overclocking for me, but it will be within the TDP that Intel thinks is best. Won't this just kill the aftermarket cooler makers with an almost locked TDP, and to some degree the high-end memory makers with a locked BCLK? This will change how overclocking is done from now on, unless AMD keeps things as they are and keeps Intel from going down this road.
It's true that the CPU will turbo boost within the CPU's TDP, but exactly how much it will turbo boost (how many bins it will gain) will depend on how well the CPU is being cooled. So having a better (read: aftermarket) cooler will allow you to get the most out of your CPU's turbo boost.
Hi Anand, while I expect the ring bus to provide great performance, I doubt that it has no impact on die size and power consumption compared with the Nehalem/Westmere L3 organization.
Let me explain... From my own tests, the Nehalem/Westmere L3 cache seems to be accessed through four 64-bit channels (one per core). With the L3 cache at 3 GHz, this translates into a maximum of 24 GB/s per core, or 96 GB/s for 4 cores. This cache organization seems to be confirmed by the tests at Techreport (on an i975X, SANDRA's L3 cumulative bandwidth is about 60 GB/s: http://www.techreport.com/articles.x/18581/5) and Xbitlabs (EVEREST single-core L3 bandwidth of about 20 GB/s: http://www.xbitlabs.com/articles/cpu/display/intel...). So, on Nehalem/Westmere I do not expect 4 x 256 wires, but only 4 x 64 wires (more or less).
Now, let's examine SB... We have a 4 x 256-bit bus (4 independent rings) that runs around the L3 cache, for a total of 1024 wires. So we have a lot of wires that need to be powered. These wires, in turn, need additional die space, and to me that seems the main reason why most models will have "only" 6 MB of L3.
What do you think? Is it possible to ask Intel something about the Nehalem L3 cache organization and/or about the decision to equip most SB models with 6 MB of L3 cache?
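The bandwidth and wire-count figures in this comment follow from simple arithmetic, sketched below. It assumes one transfer per clock on each channel, which is the commenter's implied model rather than anything Intel has documented here, and the helper name is hypothetical.

```python
# Back-of-the-envelope check of the L3 figures quoted in the comment above.
# Assumption: one transfer per clock on each channel; real hardware may differ.


def peak_bandwidth_gb_s(bus_bits: int, clock_ghz: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times billions of transfers per second."""
    return bus_bits / 8 * clock_ghz


CORES = 4
per_core = peak_bandwidth_gb_s(bus_bits=64, clock_ghz=3.0)
total = CORES * per_core
sb_ring_wires = CORES * 256  # the comment's 4 x 256-bit rings

# How close the cited benchmark results come to these estimated peaks.
sandra_ratio = 60 / total      # cumulative figure, all four cores
everest_ratio = 20 / per_core  # single-core figure

print(f"per core: {per_core} GB/s, total: {total} GB/s, SB ring wires: {sb_ring_wires}")
print(f"SANDRA / peak: {sandra_ratio:.3f}, EVEREST / peak: {everest_ratio:.3f}")
```

The cited benchmark numbers land below the estimated peaks, which is at least consistent with the 4 x 64-bit guess; it does not prove it.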
Knowing what you do about Nehalem EX and SNB on socket H2, any speculation on what we can expect from the Socket B2/R chips when they finally arrive sometime next year?
I am mainly thinking of Northbridge/QPI and PCIe Lanes as compared to DMI used on the Mainstream parts discussed in this article.
I waited and waited for Westmere Core i7 to become "cheap" and thought the 970 was going to be my chip of choice @ $550. When they released it at $900 (you could already find 980Xs for less) it pretty much killed my plans to upgrade.
So now I am basically debating whether to build a high-end H2 system or wait for the enthusiast version to arrive instead. My understanding from seeing the server roadmap is that there will be Socket B2 and Socket R, with the differences between them mainly consisting of memory channels and number of PCIe lanes. I have also read that both will support PCIe 3.0 whereas H2 will continue to use 2.0.
Add all these changes up and I am also hopeful we will see USB3 on the Enthusiast platform as well since it will have an additional 3-6 months to mature.
So any ideas/insight you have here would be awesome.
With the price of LCDs dropping, I am noticing that more and more consumers have more than one display for their mainstream machines. Has Intel said anything about how many displays the onboard graphics will be able to push? Have they said anything about what tech they are going to use ie. display port, HDMI, DVI-D, something else?
I can see myself getting a new SB machine sometime in Q1 2011 but I run at least 2 monitors at all times ( need the real estate for the type of work that I do ). I don't play many games but having the video decode/encode is important to me since I do tend to do some videoconferencing now a days.
The last thing I would like to know is if Intel is going to do the right thing with the drivers for their graphics. Will we humble Linux users finally have a graphics driver that does not suck? Will Intel finally open source the driver so that the community can keep it updated and optimize it for X?
x264, the best H.264 encoder there is, produces better quality video at similar speed when using the "ultrafast" setting. And with 2 or 4 cores we could even transcode 2 to 4 videos at the same time.
The hardware encoder inside Sandy Bridge is not that speedy. I could sacrifice quality for speed, but Power VR's VRE Core manages 1000+ fps; at 400 fps the encoder is like a waste of die space.
Intel could have further tuned x264 for Sandy Bridge for speed and just released it with their drivers. If the hardware encoder isn't giving a many-fold increase in speed, then what is the point? They may as well have added an extra 6 EUs to the GPU.
10-30% improvement-obviously that's great, but not as big as their previous tocks if I'm remembering right, and not much different from what "ticks" like Penryn did...I know Penryn was like a 10% boost minimum over Conroe...
I'm guessing it's because they're wasting effort and die area on a worthless GPU. I *HOPE* no one on this site, no power users are going to be using that thing. (Well, okay, for a tiny notebook or something maybe...)
Conroe was the first tock, and certainly, it was a major leap over the P4 line. But its bloodline was actually derived from P6, carried through Banias, Dothan, then Yonah. The improvement over Yonah was in the 10-20% IPC range.
Then came the next tock, which was Nehalem. In single-threaded performance it was roughly another 5-10% over Penryn, but in multithreaded, again clock for clock, it had leaps of performance, around 20-40% again. http://www.anandtech.com/show/2658/20
The tick of Nehalem was Westmere. Westmere did not launch a quad-core part, so it is hard to find a clock-for-clock comparison, but single-threaded performance is roughly the same as Nehalem, factoring out any turbo advantages...
Now SB, a tock, with another 10-30% across the board, both single and multithreaded, depending on workload.
Of course, the GPU is not so worthless, it is indeed challenging low end GPUs -- no doubt Llano will offer up strong GPU performance, but for the majority of the market SB is perfectly fine.
Indeed, AMD is not too happy here, at least I would suspect. On the CPU side, Intel will crush anything AMD has in the same market segment where SB resides... On the GPU side, surely AMD will crush SB. On the CPU front, AMD is already 20-40% behind Nehalem clock for clock, core for core; SB just extends that another 10-30%.
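One way to read the two ranges in the last comment is as leads that compound rather than add. The sketch below assumes "20-40% behind" means Nehalem performs 1.2x to 1.4x what AMD does per clock, and that Sandy Bridge adds 10-30% on top of Nehalem; both readings are interpretations of the comment, not benchmark results.

```python
# Compounding two relative performance leads (0.20 means 20%).
# The input ranges are the commenter's estimates, read as multiplicative leads.


def compound_lead(first: float, second: float) -> float:
    """Combined relative lead when the second gain is applied on top of the first."""
    return (1 + first) * (1 + second) - 1


low = compound_lead(0.20, 0.10)   # pessimistic end of both ranges
high = compound_lead(0.40, 0.30)  # optimistic end of both ranges
print(f"combined lead: about {low:.0%} to {high:.0%}")
```

Simply adding the percentages would give 30-70%; compounding widens that a little because the second gain applies to an already larger baseline.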
62 Comments
beginner99 - Tuesday, September 14, 2010 - link
AMD's been talking about fusion forever but I can't get rid of the feeling that this Intel implementation will be much more "fused" than the AMD one will be. AMD barely has CPU turbo, so adding a combined CPU/GPU turbo at once... maybe they can pull it off, but experience makes me doubt that very much.
BTW, if it takes like 3mm^2 for a super fast video encoder I ask myself, why wasn't this done before?
duploxxx - Tuesday, September 14, 2010 - link
first or not, doesn't really matter.
who says AMD needs GPU turbo? If Llano really is a 400SP GPU it will knock out any Intel GPU with or without turbo.
If we look at the first results of the AnandTech review, which seems to be of a GT2 part, it doesn't have a chance at all.
Core i5 is really castrated due to the lack of HT. This is exactly where Llano will fight, with a bit less CPU power.
B3an - Tuesday, September 14, 2010 - link
Even if AMD's GPU in Llano is faster, Intel's GPU is finally decent and good enough for most people, but more importantly more people will care about CPU performance because most users don't play games and this GPU can more than easily handle HD video. And i'm sure SB will be faster than anything AMD has. Then throw in the AVX and i'd say Intel clearly have a better option for the vast majority of people, it just comes down to price now.
B3an - Tuesday, September 14, 2010 - link
Sorry, didn't mean AVX, i meant the hardware accelerated video encoding.
bitcrazed - Tuesday, September 14, 2010 - link
But it's not just about raw power - it's about power per dollar.
If you've got $500 to spend on a mobo and CPU, where do you spend it? On a slower Intel platform or on a faster AMD platform?
If AMD get their pricing right, they could turn this into a no-brainer decision, greatly increasing their sales.
duploxxx - Tuesday, September 14, 2010 - link
now here comes the issue with the real fanboys:
"And i'm sure SB will be faster than anything AMD has."
It's exactly on price that AMD has the better option. It's the "known brand name" that keeps people buying the same thing without knowledge... yeah, let's buy a Pentium.
takeulo - Wednesday, September 15, 2010 - link
hahahahah yeah i agree AMD is the better option overall. if i had a high budget i'd go for Insane, i mean Intel, but since i'm only "poor" and can't afford it i'll stick with AMD and get my money's worth
sorry for my bad english XD
MySchizoBuddy - Monday, December 20, 2010 - link
how do you know Intel GPU has reached a good enough state (do you have benchmarks to support your hypothesis)? they have been trying to reach this state for as long as i can remember.
your good enough state might be very different from somebody else's good enough state.
bindesh - Tuesday, September 20, 2011 - link
All your doubts will be cleared after watching this video, and related ones.
http://www.youtube.com/watch?v=XqBk0uHrxII&fea...
I have 3 AMDs and 1 Intel. Believe me, for the price of AMD CPUs I can only get a Celeron from Intel, which cannot run NFS Shift or TimeShift. On the other hand, with an AMD Athlon, I have completed Devil May Cry 4 at decent speed. And the laptop costs 24K: Toshiba C650, psg xxxxx18 model. It has 360 GB SSD, ATI 4200HD.
Can you get such price and performance with Intel?
Best part is that I am running it at 800MHz CPU speed, with performance much, much greater than my friend's 55K Intel dual-core laptop.
vlado08 - Tuesday, September 14, 2010 - link
Still no word on the 23.976 FPS playback?
iwodo - Tuesday, September 14, 2010 - link
Many questions still not answered; maybe Anand could find out for us.
1. Was the GPU performance we saw from 6 EU or 12 EU?
2. Where is FMA (Fused Multiply Add)? Will we see it in Ivy Bridge?
3. Can all software developers access the decoding engine? We could see many codecs being optimized for playback on the Intel hardware decoder, whether the codec is fully or partially supported.
4. Hardware encoder? Is it a full hardware encoder? Free to use for software devs?
5. OpenCL not possible?
6. What % of die size is given to graphics?
7. Gfx drivers: will Intel commit more resources to driver updates? Or will they open source them?
Apart from Sandy Bridge, looking forward to reports on the USB 3.0 situation, LightPeak, and Gen 3 SSDs.
trivik12 - Tuesday, September 14, 2010 - link
1) I believe it was the 12 EU part.
2) FMA will be introduced with Haswell (next tock). So we have to wait until early 2013 for that.
Foo999 - Tuesday, September 14, 2010 - link
> 2. Where is FMA ( Fused Multiply Add ) ? Will we see it in Ivy Bridge?
You can check out the full current (and Ivy Bridge) AVX instructions in the AVX reference manual available from software.intel.com/en-us/avx/
spart - Tuesday, September 14, 2010 - link
1. 6 EU. The 12 is only for laptops and the high-end range.
gvaley - Tuesday, September 14, 2010 - link
So, was it playable, I mean Starcraft II?
therealnickdanger - Tuesday, September 14, 2010 - link
Yeah, the caption said "310M vs Sandy Bridge" so I assume you could see the settings and frames per second. Details, man, details!!
:)
Anand Lal Shimpi - Tuesday, September 14, 2010 - link
Yes, it was playable at medium quality settings. They only had the single player campaign running however.
Take care,
Anand
Carleh - Tuesday, September 14, 2010 - link
With BCLK locked, where does that leave the motherboard manufacturers?
I mean, what are they left to offer to enthusiasts, if the BCLK is locked? How are they going to differentiate an enthusiast-class motherboard from a mainstream one?
ssj4Gogeta - Tuesday, September 14, 2010 - link
Will they be locking the socket 2011 parts as well?
Zoomer - Sunday, September 19, 2010 - link
Sell more Bulldozer boards. I was all set to get a nice Sandy Bridge and overclock it to hell, but now I think I'll get a Bulldozer instead.
Sure there's the K, but it costs more. That kinda defeats the point, unless the aim is to get a high clk for epeen.
FXi - Tuesday, September 14, 2010 - link
Only thing I am saddened by is that hybrid graphics apparently won't be "working" on the mobile high end chipset with the dual pci-e x8 lanes. It's extremely nice to have 2x a good modern mobile GPU, but still be able to switch to the Intel built in GPU when you want longer battery life on the road.
That ability in the 2920 was something I was truly hoping for.
The rest of its abilities are quite nice and very welcome. USB 3 really is something to be sure they didn't miss. But otherwise kudos Intel.
Drazick - Tuesday, September 14, 2010 - link
Anand, a few questions with your permission:
I wonder if we could use a discrete graphics card and still enable the Media Engine.
What about the DMI bus? Hasn't it become a bottleneck with SSDs and USB3?
Does Intel have plans to address it?
Thanks.
EricZBA - Tuesday, September 14, 2010 - link
Someone please release a decent 13.3 inch laptop using Sandy Bridge.
bitcrazed - Tuesday, September 14, 2010 - link
I have a sneaking suspicion that Intel will be at the core of Apple's next laptop platform refresh with both SandyBridge and LightPeak.
Apple's MacBook lineup is starting to feel a little pressure from the other PC laptop vendors who are starting to produce some nicely designed tin and will need to stay current in order to continue to sell their products at such high premiums.
I'm imagining the next MacBook Pro lineup to offer 13" MBP's running i3 2120's and the 15" and 17" models running i5 2400/2500's or i7 2600's.
Apple already have their own dynamic integrated/discrete GPU switching technology (as do nVidia) and can make even better use of SB's integrated GPU augmented by a modest discrete GPU to deliver the performance that most users need but with much reduced power drain.
So how to differentiate themselves? LightPeak. Apple was the instigator of LightPeak to start with and Intel claimed at CES 2010 that it'd appear around a year later. That's next spring.
One thing's for sure: 2011 is going to be a VERY interesting year for new laptop and desktop devices :)
name99 - Tuesday, September 14, 2010 - link
LightPeak WITHOUT USB3 will go over like a lead zeppelin.
There are already plenty of USB3 peripherals available. I have never in my life seen a LightPeak peripheral, or even a review or sneak peek of one. Light Peak is coming, but I'm not sure that 2011 is its year.
The rate at which CPU speeds now increase is low enough that very few buyers feel any sort of pressure to upgrade the machine they bought 3 years ago. Apple can't deal with that by simply offering new iMacs and MacBooks with the newest Intel offering, since no normal person is much excited by another 10% CPU boost.
They have done an adequate job of dealing with this so far by boosting battery life, something (some) portable users do care about.
They have done a mixed job of making more cores, hyperthreading and better GPUs a reason to upgrade. We have some low-level infrastructure in Snow Leopard, but we have fsckall user level apps that take advantage of this. Where is the multi-threaded Safari? Where is the iTunes that utilizes multiple cores, and the GPU for transcoding audio? Does FileVault use AES-NI --- apparently not.
But Apple has done a truly astonishingly lousy job of tracking the one remaining piece of obvious slowness --- IO. Still no TRIM, still no eSATA, still no USB3.
My point is that I don't know the Apple politics, but I do know that they are doing a very very bad job of shipping machines that compel one to upgrade. There is no need for me to upgrade my 3+yr old Penryn iMac, for example --- I'd get a replacement with more cores (not used by any of my software), a better GPU (but what I have plays video just fine), and most importantly, NO FASTER IO.
Adding LightPeak to this mix without USB3 is not going to help any. People are still going to hold off on upgrades until USB3 is available, and no-one is going to rush to buy a LightPeak system so that they can then NOT run any of the many unavailable LightPeak peripherals on the shelves at Fry's.
NaN42 - Tuesday, September 14, 2010 - link
On page 3: "Compared to an 8-core Bulldozer a 4-core Sandy Bridge has twice the 256-bit AVX throughput."
WTF? 8*128 = 4*256. Based on the premise that the fp-scheduler of one Bulldozer module (two cores) can schedule e.g. one add and one mul AVX instruction per clock cycle, they have the same throughput. I think both architectures will have a delay for e.g. shuffling ymm-registers (compared to current xmm-instructions) because data has to be exchanged between different pipelines/ports (hopefully the picture provided by Intel is correct). Perhaps the delay is smaller in Sandy Bridge cores. I expect some delays when one mixes floating-point and integer instructions on Sandy Bridge. (Currently I don't know whether there exists a VEX prefix for xmm integer instructions. If there's no VEX prefix the delays will be great on both platforms.)
gvaley - Tuesday, September 14, 2010 - link
"...you get two 256-bit AVX operations per clock."
"AMD sees AVX support in a different light than Intel. Bulldozer features two 128-bit SSE paths that can be combined for 256-bit AVX operations."
So it's actually 8*256 = 4*2*256. At least this is how I see it.
NaN42 - Tuesday, September 14, 2010 - link
"So it's actually 8*256 = 4*2*256. At least this is how I see it."
Ok, my calculation was a bit different. 4*2*256 will be true, but only if you mix additions and multiplications. Whether AMD is 8*2*128 depends on the fp-scheduler (based on the premise that one SIMD unit consists of a fmul, fadd and fmisc unit or something similar).
NaN42 - Tuesday, September 14, 2010 - link
... one can do another floating point operation which goes through port 5, but the peak performance of additions and multiplications is more relevant in applications.
Spacksack - Tuesday, September 14, 2010 - link
I think you are right. I would think Bulldozer can manage the same theoretical throughput by issuing one combined FMA instruction (16 flop) / clock and module.
More importantly, Bulldozer will achieve high throughput for all the existing SSE code by having two independent FMA units. I have no idea how Anand could make such a mistake.
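To put numbers on the argument above, here is a rough sketch of peak single-precision flops per clock. It only encodes the assumptions made in this thread (one 256-bit add plus one 256-bit multiply per clock per Sandy Bridge core, and one combined 256-bit FMA per clock per Bulldozer module); none of it is a measured or vendor-confirmed figure, and the function name and parameters are made up for illustration.

```python
# Peak single-precision FLOPs per clock, using only the assumptions made in
# this thread (not measured or vendor-confirmed figures).

FLOAT_BITS = 32  # width of one single-precision element


def flops_per_clock(units, ops_per_unit, vector_bits, flops_per_element):
    """Peak FLOPs per clock for `units` identical cores or modules."""
    lanes = vector_bits // FLOAT_BITS  # elements processed by one vector op
    return units * ops_per_unit * lanes * flops_per_element


# Assumption: each Sandy Bridge core issues one 256-bit add and one 256-bit
# multiply per clock (two AVX ops, one flop per element each).
sandy_bridge = flops_per_clock(
    units=4, ops_per_unit=2, vector_bits=256, flops_per_element=1
)

# Assumption: each Bulldozer module (two cores) combines its two 128-bit FMA
# units into one 256-bit FMA per clock (two flops per element).
bulldozer = flops_per_clock(
    units=4, ops_per_unit=1, vector_bits=256, flops_per_element=2
)

print(sandy_bridge, bulldozer)
```

Under these assumptions both chips land on the same peak, which matches the "16 flop / clock and module" figure. The Sandy Bridge number only holds when adds and multiplies are mixed evenly, and the Bulldozer number only holds for code that actually uses FMA.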
yuhong - Tuesday, September 14, 2010 - link
There is no VEX.256 for 256-bit integer ops, but there is a VEX.128 prefix that zeros the upper part of YMM registers to reduce the delays.
NaN42 - Tuesday, September 14, 2010 - link
Well, I found a summary of the prefixes. Interestingly there are some exceptions, as I guessed, e.g. a VEX.128 prefix does not exist for conversion of packed floating points<->packed integers and for CRC32c + POPCNT.
CSMR - Tuesday, September 14, 2010 - link
Anand:
The best info available on an exciting platform, good job.
I wonder if for the next article you could test DirectX / OpenGL compatibility? Intel advertises compliance for a lot of its products, but in reality the support is partial, and some applications that use DirectX / OpenGL entirely correctly are not supported by Intel graphics, including the current HD graphics.
I've found this with fastpictureviewer (DirectX, I think 9) and Photoshop CS5 (OpenGL 2)
This is quite shocking. Given that Intel is doing this currently, it would be great if reviewers could prod it into action, but unfortunately they tend to place speed first, correctness second or nowhere.
marass31 - Thursday, September 16, 2010 - link
Hi CSMR,
Could you please give more details about the problems with DX and OGL on Intel HD graphics (including gfx driver version, system config ...)? You mentioned two applications, Fastpictureviewer and PSCS5, so could you please write some steps to reproduce for each of them? THX a lot.
ssj4Gogeta - Tuesday, September 14, 2010 - link
What's the point of extreme editions if we're going to have affordable K SKUs?
Or will socket 2011 not have any K SKUs? I'm guessing they'll leave the BCLK unlocked on the 2011, and only have normal and extreme processors (no K processors). Or maybe extreme editions will just have more cores like 980X?
DanNeely - Tuesday, September 14, 2010 - link
The extreme editions have always been for people who buy retail or who're playing with LN2 and need the most insanely binned part available. They've never been a mainstream OCer part.
MonkeyPaw - Tuesday, September 14, 2010 - link
I have a bad feeling about the "k" chips and the future of overclocking. Sure, Intel gave us turbo mode, but that almost seems like appeasement before the last shoe drops. First, limited turbo with good overclocking, then better turbo and less overclocking, and now it's sounding like slightly better turbo and even less overclocking. It looks like we are moving to Intel-controlled overclocking. There's virtually no value left for the enthusiast--a user that is already just a small part of the market. Intel just decided what the enthusiast needs, but I don't think they get what those users actually want.
I just don't buy that these limits are to prevent fraud. Mom and Pop stores are virtually all gone now, and I'd hate to think what Intel would do to a Dell or HP if they got caught overclocking desktops.
I guess this leaves another door open for AMD. Sad, cause SnB looks like a great design.
This Guy - Wednesday, September 15, 2010 - link
Hopefully Intel will allow the 'energy budget' to be increased when an extreme edition processor detects less thermal resistance (i.e. a bloody big heat sink). This would allow an EE CPU to either run with a higher multiplier or run at its turbo frequency longer. (I'd like this feature on all CPUs.)
This would make EE CPUs interesting if K CPUs catch up in terms of cores.
Shadowmaster625 - Tuesday, September 14, 2010 - link
What are the prospects for using Intel's transcoder to convert DVDs to 700MB avi files? Either DivX, Xvid, or H.264? Or anything else better than MPEG-2?
Dfere - Tuesday, September 14, 2010 - link
Since this seems to be, overall, a refinement and not so much an improvement with new capabilities, and given Anand's comments about the scalability of GPU-related enhancements, is Intel taking a two-step approach towards CPU releases, in addition to its fab strategies? E.g., we see a new CPU, then it gets shrunk, then it gets improved (like this), then it gets bells and whistles (like a GPU etc.), then we start over again with a really new architecture...
name99 - Tuesday, September 14, 2010 - link
This is no secret. This is exactly Intel's tick-tock strategy that has been in place for years now.
The one thing you have to keep in mind is that designing these CPUs now takes on the order of SEVEN YEARS (!!!) from conception to ship, which means that slips and mistakes do occur. Intel (and I guess AMD) have to make their best guess as to what the market will look like in seven years and sometimes they do guess incorrectly. Of course there is scope for small changes along the way closer to the release date, but not for changes in the grand strategy.
medi01 - Tuesday, September 14, 2010 - link
Agreed, it was two things: greed and the fact that AMD is currently not in a position to be a threat.
tatertot - Tuesday, September 14, 2010 - link
"The value segments won’t see Sandy Bridge until 2012."
You later show a roadmap slide which indicates Sandy Bridge in the value segment in Q3 2011.
Perhaps you meant "H2 '11" instead of "2012" ?
J_Tarasovic - Thursday, September 16, 2010 - link
I think that the roadmap probably refers to OEM shipments, whereas Anand was probably referring to when consumers would actually be able to buy devices.
iwodo - Tuesday, September 14, 2010 - link
I just realized that my computer will no longer scream when I do webcam video conferencing with Skype! With the encoder engine and decoder engine, all I am doing is feeding in USB 3.0 data and moving it around...
yuhong - Tuesday, September 14, 2010 - link
"Back in the Core Duo days that was 80-bits of data. When Intel implemented SSE, the burden grew to 128-bits."
"Core Duo" Huh?
NaN42 - Tuesday, September 14, 2010 - link
No, it seems to be right. Core Duo belongs to the Pentium M microarchitecture which implemented the SSE registers as two 64-bit registers. So the largest registers were the x87-registers, but I'm not sure whether upon register renaming the registers were really copied.
aka_Warlock - Tuesday, September 14, 2010 - link
New CPU from Intel... and guess what?!! New SOCKET!! Lol.
Intel do know how to milk the stupid cow.
bernpi - Sunday, November 14, 2010 - link
For most people it makes perfect sense to get a new socket. Most people don't buy every new CPU from Intel or AMD because it would be a waste of money. My current CPU is a Core2Duo Quad processor with a 775 socket, I skipped the Nehalem generation and will buy a SandyBridge early next year. So why should I keep my motherboard and the old 775 socket? Of course I will buy a new motherboard for the new processor. So I think for most people this is not a real issue.
Sahrin - Tuesday, September 14, 2010 - link
There's a lot of "neato" stuff that does a lot to improve the user experience by making the chip use its design resources more intelligently (smarter turbo - that 'comcast turbo-boost' feature should really make a difference for end users); but in terms of actual throughput it looks like Intel left FP performance the same; and there certainly isn't any new integer hardware.
K11, on the other hand, doubled integer ALUs (though the raw number of execution units is now the same as in a Nehalem core) and added a half-width (compared to Intel) FP unit.
First, I'd be interested to see if the whizz-bangies AMD was talking about for the K11 FPU a year ago make the execution time for 128-bit FP instructions comparable to, better than, or still slower than Intel's FPU.
Second, I'd be quadruple interested to see what impact the way AMD is allocating the new integer hardware is going to have on performance. A monolithic Nehalem core is going to be able to handle more complex (wider) threads better than a K11 core (that's a 2-integer and 1-FPU Bulldozer); but in SMT-mode (or pseudo-SMT mode) what happens? We know Intel experiences a performance hit in HTT mode which they are only able to offset because Nehalem is so wide. AMD thinks it isn't going to get the expected hit in the front end, and they won't have the thread-switching penalty that Intel does. My prediction is that 8-core K11/Bulldozer will crush Sandy Bridge in multithreaded FP-light workloads and be 5-20% slower in everything else (the possible exception being 128-bit floats).
I'm actually kind of disappointed by this update to Nehalem...Intel did a lot of "uncore" stuff and implemented AVX. Where's our wider back-end? More execution hardware drives better single-thread performance...the rest is just undoing the damage from the CISC-RISC transition in the front end and OoO.
JoJoman88 - Wednesday, September 15, 2010 - link
To me the problem is that instead of me overclocking without regard to TDP, Intel will now do the overclocking for me, but within the TDP that Intel thinks is best. Will this not just kill the aftermarket cooler makers with an almost locked TDP, and to some degree the high-end memory makers with a locked BCLK?
This will change how overclocking is done from now on, unless AMD keeps things as they are and keeps Intel from going down this road.
gvaley - Wednesday, September 15, 2010 - link
It's true that the CPU will turboboost within the CPU's TDP, but exactly how much it will turboboost (how many bins it will gain) will depend on how well the CPU is being cooled. So having a better (read: aftermarket) cooler will allow you to get the most out of your CPU's turboboost.
shodanshok - Wednesday, September 15, 2010 - link
Hi Anand,
while I expect the ring bus to provide great performance, I doubt that it has no impact on die size and power consumption compared to the Nehalem/Westmere L3 organization.
Let me explain...
From my internal tests, the Nehalem/Westmere L3 cache seems to be accessed through four 64-bit channels (one per core). With the L3 cache at 3 GHz, that translates to a maximum of 24 GB/s per core, or 96 GB/s for 4 cores. This cache organization seems to be confirmed by the tests at Techreport (on a i975X, SANDRA's L3 cumulative bandwidth is at about 60 GB/s: http://www.techreport.com/articles.x/18581/5) and Xbitlabs (EVEREST single-core L3 bandwidth of about 20 GB/s: http://www.xbitlabs.com/articles/cpu/display/intel...
So, on Nehalem/Westmere I do not expect 4 x 256 wires, but only 4 x 64 wires (more or less).
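The 24 GB/s and 96 GB/s figures follow directly from channel width times clock. A minimal sketch of that arithmetic, assuming one 64-bit transfer per channel per clock at 3 GHz as stated above (the function name and parameters are made up for illustration):

```python
def l3_bandwidth_gb_s(channel_bits, clock_ghz, channels=1):
    """Peak bandwidth in GB/s, assuming one transfer per channel per clock."""
    bytes_per_transfer = channel_bits / 8
    return bytes_per_transfer * clock_ghz * channels


# Assumption from the post: 64-bit channels, L3 clocked at 3 GHz.
per_core = l3_bandwidth_gb_s(channel_bits=64, clock_ghz=3.0)
all_cores = l3_bandwidth_gb_s(channel_bits=64, clock_ghz=3.0, channels=4)

print(per_core, all_cores)
```

The benchmark numbers quoted above (about 20 GB/s single-core, about 60 GB/s cumulative) sit below these peaks, which is what you would expect if the 64-bit-per-core guess is right.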
Now, let's examine SB...
We have a 4 x 256 bit bus (4 independent rings) that runs around the L3 cache, for a total of 1024 wires. So, we have a lot of wires that need to be powered. These wires, in turn, need additional die space, and to me that seems the main reason why most models will have "only" a 6 MB L3.
What do you think? Is it possible to ask Intel something about the Nehalem L3 cache organization and/or about the decision to equip most SB models with 6 MB of L3 cache?
Thanks.
Casper42 - Wednesday, September 15, 2010 - link
Knowing what you do about Nehalem EX and SNB on socket H2, any speculation on what we can expect from the Socket B2/R chips when they finally arrive sometime next year?
I am mainly thinking of Northbridge/QPI and PCIe Lanes as compared to DMI used on the Mainstream parts discussed in this article.
I waited and waited for Westmere Core i7 to become "cheap" and thought the 970 was going to be my chip of choice @ $550. When they released it at $900 (you could already find 980Xs for less) it pretty much killed my plans to upgrade.
So now I am basically debating on do I build a high end H2 or wait for the enthusiast version to arrive instead?
My understanding from seeing the server roadmap is there will be Socket B2 and Socket R with the differences between them mainly consisting of memory channels and # of PCIe Lanes. I have also read that both will support PCIe 3.0 whereas H2 will continue to use 2.0.
Add all these changes up and I am also hopeful we will see USB3 on the Enthusiast platform as well since it will have an additional 3-6 months to mature.
So any ideas/insight you have here would be awesome.
linkages - Thursday, September 16, 2010 - link
With the price of LCDs dropping, I am noticing that more and more consumers have more than one display for their mainstream machines. Has Intel said anything about how many displays the onboard graphics will be able to push? Have they said anything about what tech they are going to use, i.e. DisplayPort, HDMI, DVI-D, something else?
I can see myself getting a new SB machine sometime in Q1 2011 but I run at least 2 monitors at all times (need the real estate for the type of work that I do). I don't play many games but having the video decode/encode is important to me since I do tend to do some videoconferencing nowadays.
The last thing I would like to know is if Intel is going to do the right thing with the drivers for their graphics. Will we humble Linux users finally have a graphics driver that does not suck? Will Intel finally open source the driver so that the community can keep it updated and optimize it for X?
chukked - Thursday, September 16, 2010 - link
Hi Anand,
thanks for the review, you addressed everything but left virtualization :(
which processors support vt-x and vt-d ?
iwodo - Friday, September 17, 2010 - link
x264, the best H.264 encoder there is, produces better quality video at similar speed when using the "ultrafast" setting. And with 2 / 4 cores we could even transcode 2 - 4 videos at the same time.
The hardware encoder inside Sandy Bridge is not that speedy. I could sacrifice quality for speed, but Power VR's VRE core manages 1000fps+; at 400fps the encoder is like a waste of die space.
Intel could have further tuned x264 for Sandy Bridge for speed and just released it with their drivers. If the hardware encoder isn't giving many times the increase in speed, then what is the point? They may as well have added an extra 6 EUs for the GPU inside.
A Link to someone's blog posting some figures.
http://lee.hdgreetings.com/2010/09/intel-cpu-vs-nv...
Wolfpup - Wednesday, September 29, 2010 - link
Pretty disappointing. I'm sure AMD's glad though!
10-30% improvement - obviously that's great, but not as big as their previous tocks if I'm remembering right, and not much different from what "ticks" like Penryn did...I know Penryn was like a 10% boost minimum over Conroe...
I'm guessing it's because they're wasting effort and die area on a worthless GPU. I *HOPE* no one on this site, no power users are going to be using that thing. (Well, okay, for a tiny notebook or something maybe...)
JumpingJack - Wednesday, September 29, 2010 - link
I don't believe you are remembering correctly.
Conroe was the first tock, and certainly, it was a major leap over the P4 line. But its bloodline was actually derived from P6, which was carried through Banias, Dothan, then Yonah. The improvement over Yonah was in the 10-20% IPC range.
Then came Penryn the tick, which was on average only 5%, http://www.anandtech.com/show/2306/3
Then came the tock, which was Nehalem. In single threaded performance, it was roughly another 5-10% over Penryn, but in multithreaded -- again, clock for clock, it had leaps of performance, around 20-40% again. http://www.anandtech.com/show/2658/20
The tick of Nehalem was Westmere, now Westmere did not launch a quad core part so it is hard to find a clock for clock, but in single threaded performance -- roughly the same as Nehalem, factoring out any turbo advantages...
Now SB, a tock, with another 10-30% across the board both single and multithreaded, depending on workload.
Of course, the GPU is not so worthless, it is indeed challenging low end GPUs -- no doubt Llano will offer up strong GPU performance, but for the majority of the market SB is perfectly fine.
Indeed, AMD is not too happy here, at least I would suspect. On the CPU side, Intel will crush anything AMD has in the same market segment where SB resides... GPU, surely AMD will crush SB. On the CPU front, AMD is already 20-40% behind Nehalem clock for clock, core for core, SB just extends that another 10-30%.
gundersausage - Tuesday, October 26, 2010 - link
i7-950 vs i7-2500K... So which will be faster and a better gaming chip? anyone?
markhennry - Monday, November 29, 2010 - link
wow awesome site about the computer parts i thinking buy a cpu and now i got a good help from this sites..
katleo123 - Tuesday, February 1, 2011 - link
It is not expected to compete Core i7 processors to take its place
visit http://www.techreign.com/2010/12/intels-sandy-brid...