The Zen Thread

w00key

Ars Tribunus Angusticlavius
6,865
Subscriptor
I avoid the problem by not transcoding; I use wired clients and just play original quality. Some WiFi-only clients may not like that, but I prefer bigger screens anyway. Android streaming boxes aren't ridiculously expensive.
Never stream outside the home? Upload capacity may be a problem, and so may marginal signal on the other side. I had 1 bar of LTE today in a swimming pool with a metal roof and coated glass. Even a 20-second WhatsApp video clip took some time to download.

I have 200/200 at home now, so my next NAS will have either QS or NVENC, which makes transcoding NBD. I want clients to be able to pick a quality a la YT: 480/720p while away and full quality on WiFi. Plus support for remixing down to stereo and tonemapping HDR to SDR.
 

w00key

Ars Tribunus Angusticlavius
6,865
Subscriptor
It used to be NBD, then fragmentation of streaming services happened. Now I rip my media and make it available on Jellyfin. I'm not paying for Prime + Netflix + HBO + Disney etc., and it's still a toss-up whether something is available. Paramount+ and other American services yanked all their content in preparation for a European launch and then just didn't launch here.

Yeah nevermind then. I'll just spend that monthly budget on media and rip it myself, or purchase and download a bd/webrip.
 

fitten

Ars Legatus Legionis
53,329
Subscriptor++
Is there a tl;dw? Is it a widespread issue, or specific mobo / user problems?

Not really... still not known for sure but some of the 9800X3D user failures were 'revived' by BIOS updates which means they weren't chip kills. ASRock (and others, maybe, IIRC) advised to not update your BIOS if you were currently stable. Less likely the real issue, but some are claiming that most of the 'failures' (either chip kill or BIOS revive) are from two bins (so possibly a bad batch) but that's just speculation at the moment.
 
The ASRock problem appears to be that some BIOS versions are undervolting the uncore to the point the system isn't booting anymore.
Do we have more investigation / measurements / evidence for the root cause? Hadn't seen that one yet. But would be kinda funny if the pendulum had finally swung too far to the other side.
 
The ASRock problem appears to be that some BIOS versions are undervolting the uncore to the point the system isn't booting anymore.

It's not high voltage killing CPUs like the previous round.
Well, that's good news. It's ASRock only? I was under the impression that there were other vendors also showing the issue?
 
According to the GN video, ASRock is like >80% of the complaints. Apparently, starting with BIOS 3.1x they started 'killing' CPUs in the sense they wouldn't boot. They've released BIOS 3.20, which addresses the issue. Whether it's 100% solved is not known yet.

For the remaining <20%, it looks like the usual mix of manual overclocking gone wrong, CPU wrongly inserted & all. The 9800X3D is the #1 CPU for enthusiast PC home builders, so it shouldn't be surprising it's #1 in complaints too.
 

theevilsharpie

Ars Scholae Palatinae
1,457
Subscriptor++
So undervolting can of course cause stability issues, but that wouldn't permanently kill a CPU like overvolting, right?

It wouldn't kill a CPU at a physical level, but if you've got a firmware that defaults to an undervolt such that it can't boot anymore, then the machine is effectively dead until the faulty firmware is replaced, which can be difficult if the CPU is needed to update the firmware.
 

Drizzt321

Ars Legatus Legionis
30,842
Subscriptor++
It wouldn't kill a CPU at a physical level, but if you've got a firmware that defaults to an undervolt such that it can't boot anymore, then the machine is effectively dead until the faulty firmware is replaced, which can be difficult if the CPU is needed to update the firmware.
That's what I thought, but was a bit unsure from some of the wording above. Thanks.
 

evan_s

Ars Tribunus Angusticlavius
6,387
Subscriptor
It wouldn't kill a CPU at a physical level, but if you've got a firmware that defaults to an undervolt such that it can't boot anymore, then the machine is effectively dead until the faulty firmware is replaced, which can be difficult if the CPU is needed to update the firmware.

Fortunately, BIOS Flashback-type updates that don't need a working CPU are mandatory on all AM5 boards AFAIK, so you shouldn't end up in a situation where you can't flash the BIOS to fix this issue.
 

Xavin

Ars Legatus Legionis
30,551
Subscriptor++
So undervolting can of course cause stability issues, but that wouldn't permanently kill a CPU like overvolting, right?
Chronic undervolting can cause issues with electronics, but it's not likely to happen in this case since nobody is going to try and boot a non-functional machine enough for that kind of damage to build up. Depending on how it's failing the CPU might not even be being fully powered up.
 
It wouldn't kill a CPU at a physical level, but if you've got a firmware that defaults to an undervolt such that it can't boot anymore, then the machine is effectively dead until the faulty firmware is replaced, which can be difficult if the CPU is needed to update the firmware.
I think all AM5 boards are required to have CPU-less flashing, so as long as someone has enough of a computer to put files on a USB stick, they can recover, even if the main system isn't working.
 
Overvolting shouldn't kill your CPU either. Nor should overheating. In fact, unless you go to extraordinary measures to increase voltage beyond what the motherboard even supports, every CPU should be absolutely bulletproof, other than the defective Intel 13th/14th-gen of course.
You're depending on CPU safety mechanisms in that case, which is rather like throwing yourself off the tightrope and counting on the nets to save you.

The CPU circuitry itself can be killed instantly by overvoltage or overheating. Ask me how I know.
 

Drizzt321

Ars Legatus Legionis
30,842
Subscriptor++
Well, this is wild: a vulnerability with Zen (1-4?) microcode patching. If I'm understanding it all right, it boils down to AMD using a non-secure hashing function for some weird, crazy reason, rather than a secure hashing function. Sigh, WHY.

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking
They used a bad hash function (CMAC instead of something like SHA-1), but the real foulup was publishing their Zen 1 private key and then keeping it through Zen 4. What a gigantic fuckup. Without the private key, the poor choice of hash wouldn't have been a big deal.

The mitigation appears to be loading a microcode update that blocks the exploit and makes sure any further microcode updates are actually from AMD. That's something you'll definitely want in the BIOS, so it loads first. Combined, however, with an exploited BIOS, recovering a machine could be pretty difficult. There's a definite possibility that, given a sufficiently sophisticated exploit, the only possible recovery would be the circular file for the CPU and motherboard.
 

Drizzt321

Ars Legatus Legionis
30,842
Subscriptor++
They used a bad hash function (CMAC instead of something like SHA-1), but the real foulup was publishing their Zen 1 private key and then keeping it through Zen 4. What a gigantic fuckup. Without the private key, the poor choice of hash wouldn't have been a big deal.
Oh yeah, forgot that part of what I read.

Yeah, it was just really silly. Posting the private key was basically game over regardless, even if it was still complicated.
 

w00key

Ars Tribunus Angusticlavius
6,865
Subscriptor
Well, this is wild: a vulnerability with Zen (1-4?) microcode patching. If I'm understanding it all right, it boils down to AMD using a non-secure hashing function for some weird, crazy reason, rather than a secure hashing function. Sigh, WHY.

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking
It gets worse. Instead of generating a new key they used one of the examples from the docs. Like, wtaf?!

We noticed that the key from an old Zen 1 CPU was the example key of the NIST SP 800-38B publication (Appendix D.1 2b7e1516 28aed2a6 abf71588 09cf4f3c) and was reused until at least Zen 4 CPUs. Using this key we could break the two usages of AES-CMAC: the RSA public key and the microcode patch contents.

Key reuse is a big no-no, but using the example key from the NIST docs is next level. You only type that in test code to verify your algo is implemented okay and poops out the expected value for certain inputs. Very wow.
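
To make it concrete why a known CMAC key is fatal when CMAC is used as a "hash": anyone holding the key can run the block cipher backwards and solve for a final block that lands on any tag they want. Below is a minimal sketch of that idea, not the actual attack from the blog post. Big assumptions: the block cipher is a toy 4-round Feistel network built on SHA-256 standing in for AES (so the tags are not real AES-CMAC values), only whole 16-byte blocks are handled, and the names `encrypt`, `decrypt`, `cmac` and `forge` are made up for illustration. The key is the NIST example key quoted above.

```python
import hashlib

BLOCK = 16                      # block size in bytes, same as AES
MASK = (1 << 128) - 1
# NIST SP 800-38B example key, as quoted in the blog post.
KEY = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function. ASSUMPTION: stand-in for AES, not AES itself.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def encrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def subkey_k1(key: bytes) -> bytes:
    # CMAC subkey K1: encrypt the zero block, then double it in GF(2^128).
    l = int.from_bytes(encrypt(key, bytes(BLOCK)), "big")
    k1 = (l << 1) & MASK
    if l >> 127:
        k1 ^= 0x87
    return k1.to_bytes(BLOCK, "big")


def _chain(key: bytes, data: bytes) -> bytes:
    # CBC-style chaining over whole blocks, starting from a zero state.
    state = bytes(BLOCK)
    for i in range(0, len(data), BLOCK):
        state = encrypt(key, _xor(state, data[i:i + BLOCK]))
    return state


def cmac(key: bytes, msg: bytes) -> bytes:
    # Simplification: only messages made of whole blocks are supported.
    assert msg and len(msg) % BLOCK == 0
    state = _chain(key, msg[:-BLOCK])
    last = _xor(msg[-BLOCK:], subkey_k1(key))
    return encrypt(key, _xor(state, last))


def forge(key: bytes, target_tag: bytes, prefix: bytes) -> bytes:
    # Append one block to `prefix` so the result has tag `target_tag`.
    assert len(prefix) % BLOCK == 0
    state = _chain(key, prefix)
    needed = decrypt(key, target_tag)   # cipher input that yields the tag
    last = _xor(_xor(needed, state), subkey_k1(key))
    return prefix + last


original = b"A" * 32
tag = cmac(KEY, original)
forged = forge(KEY, tag, b"attacker payload")
print(cmac(KEY, forged) == tag, forged != original)
```

The forged message starts with content the attacker chose and still produces the original tag, which is why a keyed MAC gives no collision resistance once the key is public. As far as I understand the write-up, this is also why they go after the hash: the RSA signature covers the CMAC value, so matching the CMAC lets an existing signature keep verifying.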
 
It gets worse. Instead of generating a new key they used one of the examples from the docs. Like, wtaf?!



Key reuse is a big no-no, but using the example key from the NIST docs is next level. You only type that in test code to verify your algo is implemented okay and poops out the expected value for certain inputs. Very wow.
Which brings up, in turn: why on earth are they going through the song-and-dance to generate a hash collision when they have the private key and can just sign any binary they like?
 

BO(V)BZ

Ars Tribunus Militum
2,235
That vulnerability has been known since early Feb. Basically you need local admin access to be vulnerable. Not even really a huge concern for datacenters, as an attacker would need to escape the VM or container first and then escalate to root.

It is pretty hilarious, in a tragi-comedy sort of way.
Well, there's a trio of VMware vulnerabilities that can do exactly that. I skimmed it as it doesn't really affect me, but basically a compromised VM can end up rooting the hypervisor and compromising any other VMs on the same platform, so maybe a bigger problem? Still, getting root-level access to the hypervisor is already game over; I don't know why you'd need much more than that.
 
It gets worse. Instead of generating a new key they used one of the examples from the docs. Like, wtaf?!
Things like that typically happen when someone without cryptography expertise is tasked with implementing security. Ask yourself how many or how few people might have deep knowledge about the arcane details of a processor's microcode engine. And on top of that they need substantial knowledge about cryptography, too.

I won't excuse this. I merely note that this is one of the most common failure modes of products which require cryptography. IMHO this is most often a structural failure of the company itself; whenever security is implemented anywhere, it should never be done by a single person alone; and at least one specialized security expert should necessarily be part of the team.

Not sure if AMD of the Zen 1 days was big enough and organized enough to do it right. But by Zen 3 at the very latest they should have had the resources, given the success of Zen 2 on desktop and in the data center.
 
It was probably just one of those stupid corporate miscommunication things, where the later teams had no idea the Zen 1 key had been published in the first place. The bad hash choice wouldn't really have mattered, otherwise.

I'm still confused about why they're attacking the hash, though, and not just directly using the published private key.
 
I'm still confused about why they're attacking the hash, though, and not just directly using the published private key.
I am not sure the key was actually published. My guess is they attacked the hash because it is vulnerable to hash collisions. And with a colliding hash, the attackers could try to generate their own private key, not even knowing AMD's, that would have the same hash (and fulfill several obscure mathematical properties such that AMD's firmware would successfully decrypt the researchers' illegitimate microcode). Then the illegitimate firmware would be accepted and be active in the CPU.

Presumably after the attackers had come this far, they started attacking AMD's private key, and immediately stumbled over the fact that their own first experiment with the example key succeeded right away?
 

Xavin

Ars Legatus Legionis
30,551
Subscriptor++
Well, this is wild: a vulnerability with Zen (1-4?) microcode patching. If I'm understanding it all right, it boils down to AMD using a non-secure hashing function for some weird, crazy reason, rather than a secure hashing function. Sigh, WHY.

https://bughunters.google.com/blog/5424842357473280/zen-and-the-art-of-microcode-hacking
Programmers love to use their custom crypto/hashing functions they cooked up over a weekend, usually because they didn't understand the complexity of the standard ones and were trying to understand the principles then got attached to it or think they can do it better and more simply. 99.999% of the time they are missing critical reasons for that complexity and that's where the exploits live. Sometimes they just don't take time to understand all the implementation requirements, which usually seem arcane and are confusing.

Give me a slow uninspired programmer any day over a motivated clever one. I have spent too much of my career cleaning up the fallout from "clever" programming (including some I inflicted on myself in my younger days).
 
Motherboard I can see. Is there anything potentially persistent in the CPU?
An evil microcode update breaking all the security parameters isn't persistent across poweroffs. However, if the BIOS is corrupted to load a malicious microcode first, then that microcode could refuse any replacement attempts. And an evil BIOS can often defeat reflash attempts, though the new CPU-less flash mechanisms can potentially defeat that. However, not all the flashback methods correctly clear all of the flash, which can sometimes let malware stay present.
 
I am not sure the key was actually published. My guess is they attacked the hash because it is vulnerable to hash collisions.
But they explicitly say it was:

We noticed that the key from an old Zen 1 CPU was the example key of the NIST SP 800-38B publication (Appendix D.1 2b7e1516 28aed2a6 abf71588 09cf4f3c) and was reused until at least Zen 4 CPUs. Using this key we could break the two usages of AES-CMAC: the RSA public key and the microcode patch contents.

So why aren't they just using the key directly?
 

Drizzt321

Ars Legatus Legionis
30,842
Subscriptor++
So why aren't they just using the key directly?
I think when they did a search for the key after they reverse-engineered it, it turned out to be a published key. I don't think they knew it was the reference key ahead of time. Although maybe security researchers should just start with the reference key these days. Apparently it's the most common one used...