Vik


Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required

Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required

Last week, Paul Alcorn over at Tom’s Hardware picked up on an interesting statement made by Intel in their Q4 2016 earnings call.  The company, whose Data Center group’s profits had slipped a bit year-over-year, was “observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints.” As a result the company had setup a reserve fund as part of their larger effort to deal with the issue, which would include a “minor” design (i.e. silicon) fix to permanently resolve the problem.

A bit more digging by Paul further turned up that the problem was with Intel’s Atom C2000 family, better known by the codenames Avoton and Rangeley. As a refresher, the Silvermont-based server SoCs were launched in Q3 of 2013 – about three and a half years ago – and are offered with 2, 4, and 8 cores. These chips are, in turn, meant for use in lower-power and reasonably highly threaded applications such as microservers, communication/networking gear, and storage. As a result the C2000 is an important part of Intel’s product lineup – especially as it directly competes with various ARM-based processors in many of its markets – but it’s a name that’s better known to device manufacturers and IT engineers than it is to consumers. Consequently, an issue with the C2000 family doesn’t immediately raise any eyebrows.

Jumping a week into the present, since their earnings call Intel has posted an updated spec sheet for the Atom C2000 family. More importantly, device manufacturers have started posting new product errata notices; and while they are keeping their distance away from naming the C2000 directly, all signs point to the affected products being C2000 based. As a result we finally have some insight into what the issue is with C2000. And while the news isn’t anywhere close to dire, it’s certainly not good news for Intel. As it turns out, there’s a degradation issue with at least some (if not all) parts in the Atom C2000 family, which over time can cause chips to fail only a few years into their lifetimes.

The Problem: Early Circuit Degradation

To understand what’s going on and why C2000 SoCs can fail early, let’s start with Intel’s updated spec sheet, which contains the new errata for the problem.

AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication:   If the LPC clock(s) stop functioning the system will no longer be able to boot.

Workaround:  A platform level change has been identified and may be implemented as a workaround for this erratum.

At a high-level, the problem is that the operating clock for the Low Pin Count bus can stop working. Essentially a type of legacy bus, the LPC bus is a simple bus for simple peripherals, best known for supporting legacy devices such as serial and parallel ports. It is not a bus that’s strictly necessary for the operation of a computer or embedded device, and instead its importance depends on what devices are being hung off of it. Along with legacy I/O devices, the second most common device type to hang off of the LPC is the boot ROM/BIOS– owing to the fact that it’s a simple device that needs little bandwidth – and this is where the C2000 flaw truly rears its head.

As Intel’s errata succinctly explains, if the LPC bus breaks, then any system using it to host the boot ROM will no longer be able boot, as the system would no longer be able to access said boot ROM. The good news is that Intel has a workaround (more on that in a second), so it’s an avoidable failure, but it’s a hardware workaround, meaning the affected boards have to be reworked to fix them. Complicating matters, since Atom C2000 is a BGA chip being used in an embedded fashion, an LPC failure means that the entire board (if not the entire device) has to be replaced.

Diving deeper, the big question of course is how the LPC bus could break in this fashion. To that end, The Register reached out to Intel and has been able to get a few more details. As quoted by The Register, Intel is saying that the problem is “a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service.”

Though we tend to think of solid-state electronics as just that – solid and unchanging – circuit degradation is a normal part of the lifecycle of a complex semiconductor like a processor. Quantum tunneling and other effects on a microscopic scale will wear down processors while they’re in use, leading to eventual performance degradation or operational failure. However even with modern processors the effect should take a decade or longer, much longer than the expected service lifetime of a chip. So when something happens to speed up the degradation process, if severe enough it can cut the lifetime of a chip to a fraction of what it was planned for, causing a chip (or line of chips) to fail while still in active use. And this is exactly what’s happening with the Atom C2000.

For Intel, this is the second time this decade that they’ve encountered a degradation issue like this. Back in 2011 the company had to undertake a much larger and more embarrassing repair & replacement program for motherboards using early Intel 6-series chipsets. On those boards an overbiased (overdriven) transistor controlling some of the SATA ports could fail early, disabling those SATA ports. And while Intel hasn’t clarified whether something similar to this is happening on the Atom C2000, I wouldn’t be too surprised if it was. Which isn’t to unnecessarily pick on Intel here; given the geometries at play (bear in mind just how small a 22nm transistor is) transistor reliability is a significant challenge for all players. Just a bit too much voltage on a single transistor out of billions can be enough to ultimately break a chip.

The Solution: New Silicon & Reworked Motherboards

Anyhow, the good news is that Intel has developed both a silicon workaround and a platform workaround. The long-term solution is of course rolling out a new revision of the C2000 silicon that incorporates a fix for the issue, and Intel has told The Register they’ll be doing just that. This will actually come somewhat late in the lifetime of the processor, as the current B0 revision was launched three and a half years ago and will be succeeded by Denverton this year. At the same time though, as an IT-focused product Intel will still need to offer the Atom C2000 series to customers for a number of years to come, so even with the cost of a new revision of the silicon, it’s in Intel’s long-term interest.

More immediately, the platform fix can be used to prevent the issue on boards with the B0 silicon. Unfortunately Intel isn’t disclosing just what the platform fix is, but if it is a transistor bias issue, then the fix is likely to involve reducing the voltage to the transistor, essentially bringing its degradation back to expected levels. Some individual product vendors are also reporting that the fix can be reworked into existing (post-production) boards, though it sounds like this can only prevent the issue, not fix an already-unbootable board.

Affected Products: Routers, Servers, & NASes

As a result of the nature of a problem, the situation is a mixed bag for device manufacturers and owners. First and foremost, while most manufacturers have used the LPC bus to host the boot ROM, not all of them have. For the smaller number of manufacturers who are using SPI Flash, this wouldn’t impact them unless they were using the LPC bus for something else. Otherwise for those manufacturers who are impacted, transistor degradation is heavily dependent on ambient temperature and use: the hotter a chip and the harder its run, the faster a transistor will degrade. Consequently, while all C2000 chips have the flaw, not all C2000 chips will have their LPC clock fail before a device reaches the end of its useful lifetime. And certainly not all C2000 chips will fail at the same time.

Cisco, whose routers are impacted, estimates that while issues can occur as early as 18 months in, they don’t expect a meaningful spike in failures until 3 years (36 months) in. This of course happens to be just a bit shorter than the age of the first C2000 products, which is likely why this issue hasn’t come to light until now. Failures would then become increasingly likely as time goes on, and accordingly Cisco will be replacing the oldest affected routers first, as they’re the most vulnerable to the degradation issue.

As for other vendors shipping Atom C2000-based products, those vendors are setting up their own support programs. Patrick Kennedy over at ServeTheHome has already started compiling a list of vendor responses, including Supermicro and Netgate. However as it stands a lot of vendors are still developing their response to the issue, so this will be an ongoing process.

Finally, what’s likely to be the most affected on the consumer side of matters will be on the Network Attached Storage front. As pointed out to me by our own Ganesh TS, Seagate, Synology, ASRock, Advantronix, and other NAS vendors have all shipped devices using the flawed chips, and as a result all of these products are vulnerable to early failures. These vendors are still working on their respective support programs, but for covered devices the result is going to be the same: the affected NASes will need to be swapped for models with fixed boards/silicon. So NAS owners will want to pay close attention here, as while these devices aren’t necessarily at risk of immediate failure, they are at risk of failure in the long term.

Sources: Tom’s Hardware, The Register, & ServeTheHome

Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required

Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required

Last week, Paul Alcorn over at Tom’s Hardware picked up on an interesting statement made by Intel in their Q4 2016 earnings call.  The company, whose Data Center group’s profits had slipped a bit year-over-year, was “observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints.” As a result the company had setup a reserve fund as part of their larger effort to deal with the issue, which would include a “minor” design (i.e. silicon) fix to permanently resolve the problem.

A bit more digging by Paul further turned up that the problem was with Intel’s Atom C2000 family, better known by the codenames Avoton and Rangeley. As a refresher, the Silvermont-based server SoCs were launched in Q3 of 2013 – about three and a half years ago – and are offered with 2, 4, and 8 cores. These chips are, in turn, meant for use in lower-power and reasonably highly threaded applications such as microservers, communication/networking gear, and storage. As a result the C2000 is an important part of Intel’s product lineup – especially as it directly competes with various ARM-based processors in many of its markets – but it’s a name that’s better known to device manufacturers and IT engineers than it is to consumers. Consequently, an issue with the C2000 family doesn’t immediately raise any eyebrows.

Jumping a week into the present, since their earnings call Intel has posted an updated spec sheet for the Atom C2000 family. More importantly, device manufacturers have started posting new product errata notices; and while they are keeping their distance away from naming the C2000 directly, all signs point to the affected products being C2000 based. As a result we finally have some insight into what the issue is with C2000. And while the news isn’t anywhere close to dire, it’s certainly not good news for Intel. As it turns out, there’s a degradation issue with at least some (if not all) parts in the Atom C2000 family, which over time can cause chips to fail only a few years into their lifetimes.

The Problem: Early Circuit Degradation

To understand what’s going on and why C2000 SoCs can fail early, let’s start with Intel’s updated spec sheet, which contains the new errata for the problem.

AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication:   If the LPC clock(s) stop functioning the system will no longer be able to boot.

Workaround:  A platform level change has been identified and may be implemented as a workaround for this erratum.

At a high-level, the problem is that the operating clock for the Low Pin Count bus can stop working. Essentially a type of legacy bus, the LPC bus is a simple bus for simple peripherals, best known for supporting legacy devices such as serial and parallel ports. It is not a bus that’s strictly necessary for the operation of a computer or embedded device, and instead its importance depends on what devices are being hung off of it. Along with legacy I/O devices, the second most common device type to hang off of the LPC is the boot ROM/BIOS– owing to the fact that it’s a simple device that needs little bandwidth – and this is where the C2000 flaw truly rears its head.

As Intel’s errata succinctly explains, if the LPC bus breaks, then any system using it to host the boot ROM will no longer be able boot, as the system would no longer be able to access said boot ROM. The good news is that Intel has a workaround (more on that in a second), so it’s an avoidable failure, but it’s a hardware workaround, meaning the affected boards have to be reworked to fix them. Complicating matters, since Atom C2000 is a BGA chip being used in an embedded fashion, an LPC failure means that the entire board (if not the entire device) has to be replaced.

Diving deeper, the big question of course is how the LPC bus could break in this fashion. To that end, The Register reached out to Intel and has been able to get a few more details. As quoted by The Register, Intel is saying that the problem is “a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service.”

Though we tend to think of solid-state electronics as just that – solid and unchanging – circuit degradation is a normal part of the lifecycle of a complex semiconductor like a processor. Quantum tunneling and other effects on a microscopic scale will wear down processors while they’re in use, leading to eventual performance degradation or operational failure. However even with modern processors the effect should take a decade or longer, much longer than the expected service lifetime of a chip. So when something happens to speed up the degradation process, if severe enough it can cut the lifetime of a chip to a fraction of what it was planned for, causing a chip (or line of chips) to fail while still in active use. And this is exactly what’s happening with the Atom C2000.

For Intel, this is the second time this decade that they’ve encountered a degradation issue like this. Back in 2011 the company had to undertake a much larger and more embarrassing repair & replacement program for motherboards using early Intel 6-series chipsets. On those boards an overbiased (overdriven) transistor controlling some of the SATA ports could fail early, disabling those SATA ports. And while Intel hasn’t clarified whether something similar to this is happening on the Atom C2000, I wouldn’t be too surprised if it was. Which isn’t to unnecessarily pick on Intel here; given the geometries at play (bear in mind just how small a 22nm transistor is) transistor reliability is a significant challenge for all players. Just a bit too much voltage on a single transistor out of billions can be enough to ultimately break a chip.

The Solution: New Silicon & Reworked Motherboards

Anyhow, the good news is that Intel has developed both a silicon workaround and a platform workaround. The long-term solution is of course rolling out a new revision of the C2000 silicon that incorporates a fix for the issue, and Intel has told The Register they’ll be doing just that. This will actually come somewhat late in the lifetime of the processor, as the current B0 revision was launched three and a half years ago and will be succeeded by Denverton this year. At the same time though, as an IT-focused product Intel will still need to offer the Atom C2000 series to customers for a number of years to come, so even with the cost of a new revision of the silicon, it’s in Intel’s long-term interest.

More immediately, the platform fix can be used to prevent the issue on boards with the B0 silicon. Unfortunately Intel isn’t disclosing just what the platform fix is, but if it is a transistor bias issue, then the fix is likely to involve reducing the voltage to the transistor, essentially bringing its degradation back to expected levels. Some individual product vendors are also reporting that the fix can be reworked into existing (post-production) boards, though it sounds like this can only prevent the issue, not fix an already-unbootable board.

Affected Products: Routers, Servers, & NASes

As a result of the nature of a problem, the situation is a mixed bag for device manufacturers and owners. First and foremost, while most manufacturers have used the LPC bus to host the boot ROM, not all of them have. For the smaller number of manufacturers who are using SPI Flash, this wouldn’t impact them unless they were using the LPC bus for something else. Otherwise for those manufacturers who are impacted, transistor degradation is heavily dependent on ambient temperature and use: the hotter a chip and the harder its run, the faster a transistor will degrade. Consequently, while all C2000 chips have the flaw, not all C2000 chips will have their LPC clock fail before a device reaches the end of its useful lifetime. And certainly not all C2000 chips will fail at the same time.

Cisco, whose routers are impacted, estimates that while issues can occur as early as 18 months in, they don’t expect a meaningful spike in failures until 3 years (36 months) in. This of course happens to be just a bit shorter than the age of the first C2000 products, which is likely why this issue hasn’t come to light until now. Failures would then become increasingly likely as time goes on, and accordingly Cisco will be replacing the oldest affected routers first, as they’re the most vulnerable to the degradation issue.

As for other vendors shipping Atom C2000-based products, those vendors are setting up their own support programs. Patrick Kennedy over at ServeTheHome has already started compiling a list of vendor responses, including Supermicro and Netgate. However as it stands a lot of vendors are still developing their response to the issue, so this will be an ongoing process.

Finally, what’s likely to be the most affected on the consumer side of matters will be on the Network Attached Storage front. As pointed out to me by our own Ganesh TS, Seagate, Synology, ASRock, Advantronix, and other NAS vendors have all shipped devices using the flawed chips, and as a result all of these products are vulnerable to early failures. These vendors are still working on their respective support programs, but for covered devices the result is going to be the same: the affected NASes will need to be swapped for models with fixed boards/silicon. So NAS owners will want to pay close attention here, as while these devices aren’t necessarily at risk of immediate failure, they are at risk of failure in the long term.

Sources: Tom’s Hardware, The Register, & ServeTheHome

Logitech Launches the BRIO 4K Pro: Its First 4K UHD Webcam with HDR

Logitech Launches the BRIO 4K Pro: Its First 4K UHD Webcam with HDR

Logitech has announced its BRIO 4K Pro Webcam, one of the world’s first webcams that features an Ultra HD resolution as well as HDR. The camera also has an infrared sensor to support facial recognition and Windows Hello.

The Logitech BRIO aka 4K Pro Webcam (960-001105) is based on the company’s new 4K sensor, supports up to 4096×2160 resolution at 30 fps (or 1920×1080 at 60 fps), as well as RightLight 3 with HDR backlighting technology. The webcam has autofocus, 5X digital zoom, and is equipped with an infrared sensor. The BRIO can also select between 65°, 78°, and 90° field of view (FOV) options, and the device has two omni-directional microphones with additional noise cancellation options.

Logitech does not reveal a lot of information about its webcam and we do not know what codec it uses or what video encoder chip is inside. What we do know is that the webcam requires USB 3.0 interconnection for 4K video recording, which suggests that video bitrates it supports are fairly high. As with the recently released C922, the Brio 4K will use Logitech’s latest software package which includes background replacement (similar to green screen detection). Logitech also lists a physical external privacy shutter on the feature set, allowing users to visually confirm that the camera is not taking video by other means (although, it is not mentioned if this also physically stops audio recording).

As for compatibility, the Logitech BRIO 4K Pro can work with all modern versions of Windows, Apple’s macOS 10.10 and higher as well as with Google Chrome OS version 29.0.1547.70. In addition, Logitech likes to point out that the webcam is suitable with multiple enterprise-class applications, including Skype for Business and Cisco programs with compatible certifications.  It is worth noting that due to various reasons, Windows 7 supports up to 1080p only.

The Logitech BRIO 4K Pro webcam is designed to be attached to the top of a monitor, so weighs 63 grams (2.2 oz) while being 102 mm (4”) wide and 27 mm (1”) tall and 27 mm (1”) deep. The clip weighs 44 grams, making the whole construction around 107 grams total. Logitech lists that the webcam comes with a USB 3.0 Type-A cable, indicating a C-to-A connection rather than a straight C-to-C.

The Logitech BRIO 4K Pro Webcam is available today directly from Logitech.com as well as from the company’s resellers worldwide. The MSRP of the product is $199 in the U.S., £199 in the UK, and €239 in Europe (those last two include tax). 

Logitech Launches the BRIO 4K Pro: Its First 4K UHD Webcam with HDR

Logitech Launches the BRIO 4K Pro: Its First 4K UHD Webcam with HDR

Logitech has announced its BRIO 4K Pro Webcam, one of the world’s first webcams that features an Ultra HD resolution as well as HDR. The camera also has an infrared sensor to support facial recognition and Windows Hello.

The Logitech BRIO aka 4K Pro Webcam (960-001105) is based on the company’s new 4K sensor, supports up to 4096×2160 resolution at 30 fps (or 1920×1080 at 60 fps), as well as RightLight 3 with HDR backlighting technology. The webcam has autofocus, 5X digital zoom, and is equipped with an infrared sensor. The BRIO can also select between 65°, 78°, and 90° field of view (FOV) options, and the device has two omni-directional microphones with additional noise cancellation options.

Logitech does not reveal a lot of information about its webcam and we do not know what codec it uses or what video encoder chip is inside. What we do know is that the webcam requires USB 3.0 interconnection for 4K video recording, which suggests that video bitrates it supports are fairly high. As with the recently released C922, the Brio 4K will use Logitech’s latest software package which includes background replacement (similar to green screen detection). Logitech also lists a physical external privacy shutter on the feature set, allowing users to visually confirm that the camera is not taking video by other means (although, it is not mentioned if this also physically stops audio recording).

As for compatibility, the Logitech BRIO 4K Pro can work with all modern versions of Windows, Apple’s macOS 10.10 and higher as well as with Google Chrome OS version 29.0.1547.70. In addition, Logitech likes to point out that the webcam is suitable with multiple enterprise-class applications, including Skype for Business and Cisco programs with compatible certifications.  It is worth noting that due to various reasons, Windows 7 supports up to 1080p only.

The Logitech BRIO 4K Pro webcam is designed to be attached to the top of a monitor, so weighs 63 grams (2.2 oz) while being 102 mm (4”) wide and 27 mm (1”) tall and 27 mm (1”) deep. The clip weighs 44 grams, making the whole construction around 107 grams total. Logitech lists that the webcam comes with a USB 3.0 Type-A cable, indicating a C-to-A connection rather than a straight C-to-C.

The Logitech BRIO 4K Pro Webcam is available today directly from Logitech.com as well as from the company’s resellers worldwide. The MSRP of the product is $199 in the U.S., £199 in the UK, and €239 in Europe (those last two include tax).