The Source of Intel's Cougar Point SATA Bug
by Anand Lal Shimpi on January 31, 2011 6:05 PM EST
Posted in: CPUs, Intel, Sandy Bridge, Recall
I just got off the phone with Intel’s Steve Smith (VP and Director of Intel Client PC Operations and Enabling) and got some more detail on this morning’s 6-series chipset/SATA bug.
The Problem
Cougar Point (Intel’s 6-series chipsets: H67/P67) has two sets of SATA ports: four that support 3Gbps operation, and two that support 6Gbps operation. Each set of ports requires its own PLL source.
The problem in the chipset was traced back to a transistor in the 3Gbps PLL clocking tree. The aforementioned transistor has a very thin gate oxide, which allows you to turn it on with a very low voltage. Unfortunately, in this case Intel biased the transistor with too high a voltage, resulting in higher than expected leakage current. Depending on the physical characteristics of the transistor, the leakage current can increase over time, which can ultimately cause the 3Gbps ports to fail. The fact that the 3Gbps and 6Gbps circuits have their own independent clocking trees is what ensures that this problem is limited to ports 2-5 off the controller.
You can coax the problem out earlier by testing the PCH at increased voltage and temperature levels. By increasing one or both of these values you can simulate load over time and that’s how the problem was initially discovered. Intel believes that any current issues users have with SATA performance/compatibility/reliability are likely unrelated to the hardware bug.
One fix for this type of problem would be to scale down the voltage applied across the problematic transistor. In this case there’s a much simpler option. The source of the problem is actually not even a key part of the 6-series chipset design; it’s a remnant of an earlier design that’s no longer needed. In our Sandy Bridge review I pointed out the fair amount of design reuse that was done in creating the 6-series chipset. The solution Intel has devised is to simply remove voltage to the transistor. The chip is functionally no different, but by permanently disabling the transistor the problem will never arise.
To make matters worse, the problem was inserted at the B-stepping of the 6-series chipsets. Earlier steppings (such as what we previewed last summer) didn’t have the problem. Unfortunately for Intel, only B-stepping chipsets shipped to customers. Since the fix involves cutting off voltage to a transistor it will be fixed with a new spin of metal and you’ll get a new associated stepping (presumably C-stepping?).
While Steve wouldn’t go into greater detail, he kept mentioning that this bug was completely an oversight. It sounds to me like an engineer did something without thinking and this was the result. This is a bit different from my initial take on the problem. Intel originally characterized the issue as purely statistical, but the source sounds a lot more like a design problem than completely random chance.
It’s Notta Recall
Intel has shipped around 8 million 6-series chipsets since the launch at CES. It also committed to setting aside $700 million to deal with the repair and replacement of any affected chipsets. That works out to be $87.50 per chipset if there are 8 million affected chipsets in the market, nearly the cost of an entire motherboard. Now the funds have to cover supplying the new chipset, bringing in the affected motherboard, and repairing it or sending out a new one. Intel can eat the cost of the chipset, leaving the $87.50 for shipping, labor and time, as well as any other consideration Intel provides the OEM with (here’s $5, don’t hate us too much). At the end of the day it seems like enough money to handle the problem. However, Intel was very careful to point out that this is not a full-blown recall. The why is simple.
If you have a desktop system with six SATA ports driven off of a P67/H67 chipset, there’s a chance (at least 5%) that during normal use some of the 3Gbps ports will stop working over the course of 3 years. The longer you use the ports, the higher that percentage will be. If you fall into this category, chances are your motherboard manufacturer will set up some sort of an exchange where you get a fixed board. The motherboard manufacturer could simply desolder your 6-series chipset and replace it with a newer stepping if it wanted to be frugal.
If you have a notebook system with only two SATA ports, however, the scenario is a little less clear. Notebooks don’t have tons of storage bays and thus they don’t always use all of the ports a chipset offers. If a notebook design only uses ports 0 & 1 off the chipset (the unaffected ports), then the end user would never encounter an issue and the notebook may not even be recalled. In fact, if there are notebook designs currently in the pipeline that only use ports 0 & 1, they may not be delayed by today’s announcement. This is the only source of hope if you’re looking for an unaffected release schedule for your dual-core SNB notebook.
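As a quick sanity check on the $87.50 figure above, here is a minimal Python sketch of the arithmetic. The $700 million reserve and the 8 million shipped chipsets are the numbers quoted earlier in this section; spreading the reserve evenly across every shipped unit is a simplifying assumption, and the function name is purely illustrative.

```python
# Figures quoted in the article. Treating every shipped chipset as affected
# and splitting the reserve evenly is an assumption made for illustration.
RESERVE_USD = 700_000_000
CHIPSETS_SHIPPED = 8_000_000


def cost_per_chipset(reserve_usd: int, chipsets: int) -> float:
    """Average budget per chipset if the reserve is spread evenly."""
    return reserve_usd / chipsets


per_unit = cost_per_chipset(RESERVE_USD, CHIPSETS_SHIPPED)
print(f"${per_unit:.2f} per chipset")
```

If fewer than 8 million boards actually need to be repaired or replaced, the same reserve divided by a smaller count gives a larger per-unit budget.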
Final Words
Intel maintains that Sandy Bridge CPUs are not affected, and current users are highly unlikely to encounter the issue even under heavy loads. So far Intel has only been able to document the issue after running extended testing at high temperatures (in a thermal chamber) and voltages. My recommendation is to try to only use ports 0 & 1 (the 6Gbps ports) on your 6-series motherboard until you get a replacement in place.
OEMs and motherboard manufacturers are going to be talking to Intel over the next week to figure out the next steps. Intel plans to deliver fixed silicon to its partners at the end of February; however, it’ll still take time for the motherboard makers to turn those chips into products. I wouldn’t expect replacements until March at the earliest.
I maintain that the best gesture of goodwill on Intel’s part would be to enable motherboard manufacturers to replace P67/H67 motherboards with Z68 boards for those users who want them.
127 Comments
TivoLi - Saturday, February 5, 2011
Changing the MB will have an impact on the OS OEM versions (or so I believe), who will pay for that? Will MS be so nice just to give another try?
Assgier - Sunday, February 6, 2011
You're implying everyone uses Windows. I have news for you: not everyone uses Windows.
METALMORPHASIS - Saturday, February 5, 2011
I think between Intel and the board manufacturers, they should just send us all new boards without even having to rip out our old ones to send back. They did this with my Logitech speaker system that had the snap, crackle & pop in the audio and that I paid well over $100 for. (Now I have 2 speaker systems.) A proof of purchase slip should be all that is required, and they should send us all new and possibly updated versions. And in the end there would still be frosting on the cake for Intel, as I'm sure a lot of people would buy new chips to go with their new boards!
Sactodd - Friday, February 11, 2011
Instead of repairing these motherboards, they should be re-purposed for schools or non-profit organizations (maybe even the Intel Computer Clubhouse Network). Let the two 6Gbps SATA ports be used with some notice (sticker?). It's not worth the $$$ for Intel to deal with. That's my $.02.
Groovester - Thursday, February 24, 2011
Why not fix the GPU problem while they're at it?
From: http://www.anandtech.com/show/4083/the-sandy-bridg...
"despite having a 23Hz setting in the driver, Intel’s GPU would never output anything other than 24Hz to a display"
swinster - Monday, May 25, 2015
OK, 4-5 years on from this, am I now starting to see an issue? I have a Tyan S5510 server board that uses the Cougar C204 chipset. A while back I ended up with some data corruption on drives that were connected to the SATA connector, but it was all very sporadic. I thought the drives might be failing, and although I replaced them (as a matter of course), I have since tested them with various diagnostic utilities on other systems and couldn't find anything actually wrong with them.
The drives I used as replacements were OK for a while, but again this week, one started to get odd errors, and it was indeed connected to one of the 3Gbps ports.
ManInBrown - Sunday, May 1, 2016
Hi guys, very in-depth article. I own a Toshiba Qosmio X770 laptop (UK model). It comes with the HM65 chipset. CPU-Z calls it: Intel Sandy Bridge Rev 09 - - Intel HM65 Rev B2
And Device Manager displays the dreaded: Intel(R) 6 Series/C200 Series Chipset family PCI Express Root Port 1 (1C10), Port 2 (1C12), Port 4 (1C16) and Port 6 (1C1A).
This is my 4th year running this. I am glad this intel-defect does not affect my USB3 port (which I have only one). There are no eSata ports on my laptop. I have an SSD (primary), a 7200rpm HDD (secondary) and a DVD-RW drive.
From what I understand, Toshiba engineers should only use Sata3 ports 0 and 1 because the Sata2 ports (2-5) will cause problems.
(1) If my primary hard-drive (Samsung Evo830 SSD) is connected to port 0, then why am I only getting Sata2 speeds (250-300 MB/s read and write)? All benchmark utilities I have used report similar speeds (Windows power settings set to Highest). They also report that SSD is on Sata3 (iaStor - Ok) and the secondary hard-drive is on a Sata-2 port. I conducted the SSD test on another laptop and the speeds reported were 520MB/s R/W seq. So the SSD is not faulty.
(2) If SATA3 ports 0 were soldered to primary hard-drives and SATA3 port 1 to the DVD drive, does this mean that my secondary hard-drive is on a faulty SATA2 port?
(3) I wish there was software which could gather this information and tell me what is connected to which of the HM65 SATA ports. Is there one? Maybe a hacking utility, which could enable Sata-3 speeds for my SSD (if indeed it is connected to SATA3 PORT 0).
(4) Can I swap HM65 with a non-faulty B3 chipset or HM70?
Thanks for any answers. Best regards.