Backdoored package in Go mirror site went unnoticed for >3 years

hobotron · Feb 5, 2025

FYI: the "sequence of events" section is repeated twice in the article, specifically the bulleted content.

Thynix · Feb 5, 2025

I think software development has a fundamental problem caused by its need for trust. As much as researchers admonish people to audit their dependencies by reading through its code, that's a cost in time and effort that's difficult at best to justify to one's managers. Strictly speaking, they're right, assuming the backdoors are obvious enough to be spotted by someone unfamiliar with the codebase, and that they're present in the source (to my understanding Go has no library binaries, so that's not a problem here) which isn't the case for everything as we saw with the xz backdoor.

In this case in particular, it'd be difficult to detect even if one did read through the code - if I were auditing something, I'd clone its repository and read its tagged version, not open the version Go had squirreled away locally somewhere. This is an amazingly well thought out attack enabled by this caching layer - I didn't even know the caching layer existed.

By default, when someone uses command-line tools built into Go to download or install packages

is WILD

SpaceHamster · Feb 5, 2025

AFAIK this is how most/all package managers work, whether they serve bundled source code, compiled binaries, or a combo (like a Python wheel).

What could a package manager even do to mitigate this? Periodically compare the contents of the original source code repository to the contents of the package that they're storing? That's a BIG job. There are many tens (hundreds?) of thousands of packages served by the popular package managers, each with dozens or even hundreds of versions.

Maybe GitHub could help by letting a package manager like PyPi provide lists of repos and tags, and then GitHub could let the manager know whenever one of those tags is changed? Then the manager could invalidate the bundles that they've cached and inform the authors. But that's assuming that a package has a corresponding repo, and that they're doing something like tagging each of their versions, which certainly not always the case.

What a mess.

no_great_name · Feb 5, 2025

Thynix said:
I think software development has a fundamental problem caused by its need for trust. As much as researchers admonish people to audit their dependencies by reading through its code, that's a cost in time and effort that's difficult at best to justify to one's managers. Strictly speaking, they're right, assuming the backdoors are obvious enough to be spotted by someone unfamiliar with the codebase, and that they're present in the source (to my understanding Go has no library binaries, so that's not a problem here) which isn't the case for everything as we saw with the xz backdoor.

In this case in particular, it'd be difficult to detect even if one did read through the code - if I were auditing something, I'd clone its repository and read its tagged version, not open the version Go had squirreled away locally somewhere. This is an amazingly well thought out attack enabled by this caching layer - I didn't even know the caching layer existed.

is WILD

Even if you took the time to scan through the source of the module, you would need to understand all the code you’re looking at well enough to know if it was legitimate or not.

That might be straight forward (if time consuming) for convenience packages that include code you might write yourself if you had to. But for lower level packages outside of your domain expertise? Good luck.

And that’s before you consider obfuscation. I doubt we can rely on the malware devs to document their code for us so it’s easy to spot the malicious CreateBackdoor() method…

radarskiy · Feb 5, 2025

Cache invalidation is one of the two hard problems in computer science*. Part of this process is observing traffic that involves the source address. Software mirrors assume the source is invariant until it goes missing, but even outside of malicious behavior this isn't true.

*: The others being naming things and off-by-one errors.

c3b4fc · Feb 5, 2025

This could be mitigated by not using the typosquatted version in the first place and by using the commit hash instead of the tag, which is best practice anyway to prevent supply chain attacks.

Joel622 · Feb 5, 2025

Why doesn't Go provide a hash for its packages like most linux distributions do these days?

e: ^^^ beaten by seconds, lol. It sounds like there is a hash but it is optional and not ergonomic to use.

fuzzyfuzzyfungus · Feb 5, 2025

It seems minimally surprising that a big package cache would be serving some malware if it is just automatically scraping stuff; but it seems like a bigger "you had one job" problem that a 'cache' would be serving something different(identically numbered) than contained by the source.

There's a realistic limit on how often they are going to be able to verify the cache against the source without getting into a scraper war with MS; but (especially with git being deliberately unhelpful to quiet alteration) that limit should be significantly shorter than whatever interval they are using, if they are doing any periodic cache verification at all.

PhilipStorry · Feb 5, 2025

I think that within the next decade SBOMs will become mandatory for governments and large companies. And I'm not talking about what they write themselves - I'm talking about what they buy or use.

It will probably come in as an optional requirement at first - if you provide an SBOM whilst bidding, it's an extra tick when being evaluated. After a while it will become mandatory.

And it won't be a one-off thing - you'll have to provide an updated one every year. Large customers will want to know that you're actively maintaining and reviewing your SBOM. Security teams will be querying their SBOM databases when they learn about vulnerabilities, and following up with their suppliers.

Most importantly, working like this would probably mean a few customers saying "Are you sure this is right?", and typo-squatting attacks would then be spotted faster.

There may be a little resistance, but it's not really something that can be stopped. There's going to be a tipping point where it becomes "de rigeuer" for CTOs to be talking about their supply chain and how they manage it via SBOMs. There will be money to be made off the back of that, which will create a virtuous cycle that normalises it.

You should never make a prediction that can't be tested. If I'm right, within the next fifteen years there will be the first lawsuit based on a falsified SBOM that left a company or organisation open to attack. That's my test. Let's see how it goes...

adamsc · Feb 5, 2025

SpaceHamster said:
What could a package manager even do to mitigate this? Periodically compare the contents of the original source code repository to the contents of the package that they're storing? That's a BIG job. There are many tens (hundreds?) of thousands of packages served by the popular package managers, each with dozens or even hundreds of versions.

What PyPI is moving towards for Python packages is signed, verifiable builds where the package listed on the PyPI site is linked to a specific build and commit in your repository. That allows you to audit what was built and could allow you to automatically notice that the package is based on something no longer present.

This only helps so much, however, because most people do not refuse to install unsigned packages or retroactively audit their dependencies to see if something has been removed.

More fundamentally, the problem here is an untrustworthy source and so if this obfuscation wasn’t available they would easily have been able to try something else like the xz approach of hiding a bomb in the build scripts. Package managers really need to migrate away from the cultural assumption that you can just type something in by hand and hope for the best. That’s going to upend the open source world and it also seems likely to encourage more corporate control since people are going to - for quite legitimate reasons - do things like restrict the namespaces available on their projects and that’s going to make it hard for less established independent developers to see usage. I don’t know whether we’ll be able to find a good middle option but it makes me sad to see a war on the open source world I’ve spent decades in even though I can acknowledge the very good reasons for closing things down.

fuzzyfuzzyfungus · Feb 5, 2025

PhilipStorry said:
I think that within the next decade SBOMs will become mandatory for governments and large companies. And I'm not talking about what they write themselves - I'm talking about what they buy or use.

It will probably come in as an optional requirement at first - if you provide an SBOM whilst bidding, it's an extra tick when being evaluated. After a while it will become mandatory.

And it won't be a one-off thing - you'll have to provide an updated one every year. Large customers will want to know that you're actively maintaining and reviewing your SBOM. Security teams will be querying their SBOM databases when they learn about vulnerabilities, and following up with their suppliers.

Most importantly, working like this would probably mean a few customers saying "Are you sure this is right?", and typo-squatting attacks would then be spotted faster.

There may be a little resistance, but it's not really something that can be stopped. There's going to be a tipping point where it becomes "de rigeuer" for CTOs to be talking about their supply chain and how they manage it via SBOMs. There will be money to be made off the back of that, which will create a virtuous cycle that normalises it.

You should never make a prediction that can't be tested. If I'm right, within the next fifteen years there will be the first lawsuit based on a falsified SBOM that left a company or organisation open to attack. That's my test. Let's see how it goes...

I don't know if this will pan out; but I'd love to see it. I've had security vendors act like I'm asking crazy questions when I inquire about why other tools are calling out some of their dependencies(for installed software; unfortunately we don't generally have the resources to be cracking open appliances and analyzing them; especially when spiteful encryption of update packages seems to be increasingly common; even for ones that are only available to download with service contracts are are supposed to be just a trivial embedded linux environment with a few vendor daemons and a web UI); and those guys at least understand the question; unlike a lot of their 'general'/'line of business' counterparts; who often act like they've been sent to the crazytown gulag if you say that you'll be setting the application up on a system where the firewall blocks by default and you'll need port numbers and information on what needs to talk to what within the system.(speaking of firewall and communication configs; is there any sort of vaguely standardized notation for describing the communications topology of an application or system that is designed to be both machine readable for conversion to firewall rules and either human readable enough to read or convertible to human readable diagrams? That'd be handy vs. the usual free-text descriptions you get from the people who will shake the data loose.)

But at least, upon such rotten foundations, they will promise AI-Enabled Zero Trust Security Solutions.

BrianB_NY · Feb 5, 2025

hobotron said:
FYI: the "sequence of events" section is repeated twice in the article, specifically the bulleted content.

Double the (bullet) points, double the miles.

It's actually a subliminal advert for a certain credit card.

ProdigySim · Feb 5, 2025

Thynix said:
I think software development has a fundamental problem caused by its need for trust.

If this is a "fundamental problem" I'm afraid it affects almost all of society. It's not so much a problem as an oft-ignored constraint.

Emon · Feb 5, 2025

ProdigySim said:
If this is a "fundamental problem" I'm afraid it affects almost all of society. It's not so much a problem as an oft-ignored constraint.

It will always be there as long as short term stock price gains are what drives investors rather than long term dividends. Which is how things were until 40-50 years ago...stocks that didn't return dividends were usually considered worthless.

Zukunftsweber · Feb 5, 2025

c3b4fc said:
This could be mitigated by not using the typosquatted version in the first place and by using the commit hash instead of the tag, which is best practice anyway to prevent supply chain attacks.

Easier said than done in this case—the authentic module was no longer maintained (it was archived), so this apparently looked like a legitimate fork/continuation

cameron2 · Feb 5, 2025

Like most languages and ecosystems, security in Go is an afterthought. And without support for capabilities, this type of problem cannot be prevented in the future, either.

perholmes · Feb 5, 2025

I love GitHub's Dependabot, which gives us alerts on any upstream vulnerabilities, and even offers to create a patch commit that updates the package manifests or Dockerfiles with non-vulnerable versions.

So far, none of the vulnerabilities have applied to us because most of our containers aren't directly exposed to HTTP traffic. But I check every alert thoroughly.

The only other solution is that we start committing our dependencies, which is still on the table.

jonomacd · Feb 5, 2025

SpaceHamster said:
AFAIK this is how most/all package managers work, whether they serve bundled source code, compiled binaries, or a combo (like a Python wheel).

What could a package manager even do to mitigate this? Periodically compare the contents of the original source code repository to the contents of the package that they're storing? That's a BIG job. There are many tens (hundreds?) of thousands of packages served by the popular package managers, each with dozens or even hundreds of versions.

Maybe GitHub could help by letting a package manager like PyPi provide lists of repos and tags, and then GitHub could let the manager know whenever one of those tags is changed? Then the manager could invalidate the bundles that they've cached and inform the authors. But that's assuming that a package has a corresponding repo, and that they're doing something like tagging each of their versions, which certainly not always the case.

What a mess.

Yup, this kind of attack can take root in almost any supply chain. I would be very surprised if there weren't things like this lingering in almost any popular languages package management.

Dadlyedly · Feb 5, 2025

BrianB_NY said:
Double the (bullet) points, double the miles.

It's actually a subliminal advert for a certain credit card.

Not Doublemint Gum?

Dadlyedly · Feb 5, 2025

jonomacd said:
Yup, this kind of attack can take root in almost any supply chain. I would be very surprised if there weren't things like this lingering in almost any popular languages package management.

While reading the article, I was thinking to myself "This is quite clever of the people who did it, and I seriously doubt this is the first or last time they've done something like it, and I'll be reading about them again later." Comments above have reinforced that feeling.

tydavis · Feb 5, 2025

Couple of things:
1. This kind of “mistake” was flagged for companies a while ago. We identified it as “someone is caching a broken copy of the package and it’s automatically upgrading our clients,” but the outcome was the same: blanket disable of the proxy and sumdb.
2. There has been a solution that isn’t backdoored since go 1.15 — vendoring dependencies. The reason that hasn’t taken off is because people complain about “inflation of git repo size,” but it’s always a worthwhile trade off.

Go already encourages C-like copy paste instead of using a third party library dependency, but the Java/c++/nodejs crowd don’t do that. Google “solved” that problem with the proxy cache and here we are.

Yes, Google must do better, but also, Devs need to listen to others’ advice from the last decade.

DaveSimmons · Feb 5, 2025

radarskiy said:
Cache invalidation is one of the two hard problems in computer science*. Part of this process is observing traffic that involves the source address. Software mirrors assume the source is invariant until it goes missing, but even outside of malicious behavior this isn't true.

*: The others being naming things and off-by-one errors.

LOL.

On-topic, "trust us, we're Google" means less than it used to.

MilleniX · Feb 5, 2025

The research tells a cautionary tale of the importance of properly vetting code before running it on production devices

The same need for caution applies to running on developer devices as well, with its own exciting set of distinct hazards.

kruzes · Feb 5, 2025

Caching is no excuse. If this was detected and someone legitimate took hold of the typo squatted name, they could've retracted all published module versions prior. That would purge them from cache.

whiteknave · Feb 5, 2025

hobotron said:
FYI: the "sequence of events" section is repeated twice in the article, specifically the bulleted content.

It was cached. ;-)

ERIFNOMI · Feb 5, 2025

Joel622 said:
Why doesn't Go provide a hash for its packages like most linux distributions do these days?

e: ^^^ beaten by seconds, lol. It sounds like there is a hash but it is optional and not ergonomic to use.

What would a hash do the mitigate this? A hash of a malicious library just tells you that you got the malicious library you asked for.

The only thing they could have done in this case is update the cache when the source was force-updated and the backdoor was removed. But that doesn't solve the problem of someone making a malicious library and people accidentally grabbing it instead of a similarly named legitimate lib.

chanman819 · Feb 5, 2025

no_great_name said:
Even if you took the time to scan through the source of the module, you would need to understand all the code you’re looking at well enough to know if it was legitimate or not.

That might be straight forward (if time consuming) for convenience packages that include code you might write yourself if you had to. But for lower level packages outside of your domain expertise? Good luck.

And that’s before you consider obfuscation. I doubt we can rely on the malware devs to document their code for us so it’s easy to spot the malicious CreateBackdoor() method…

It's like the gap between a company's financial report and conducting forensic accounting. Someone would need both access and the specialized domain knowledge to make sense of it all to find evidence of deliberate wrongdoing.

DaveSimmons · Feb 5, 2025

It seems like repos need to take typosquatting a lot more seriously but this doesn't seem easy to automate over hundreds of thousands of package names.

Adding a domain, user, company prefix to a package name sounds good, but then the attackers just typosquat that too.

dooferorg · Feb 5, 2025

Crappy cache if it can't tell when the source was changed/updated. That's pretty pathetic.

sPOiDar · Feb 5, 2025

Omitted from this article, but included in the original write-up, is a description of why the module proxy cache serving immutable versions is actually a net positive: it allows for reproducable builds by ensuring that every pull of a version through the proxy produces exactly the same code, and it prevents a future malicious actor from rewriting a previously vetted version with malicious code.

The fact that this was able to be abused via the method described here doesn't negate those benefits.

That said, there could potentially be some sort of warning produced when pulling a dep that differs from upstream, by periodically comparing the cached hash to the upstream tag and flagging the cached version, so that users can make a more informed decision.

adamsc · Feb 5, 2025

DaveSimmons said:
It seems like repos need to take typosquatting a lot more seriously but this doesn't seem easy to automate over hundreds of thousands of package names.

Adding a domain, user, company prefix to a package name sounds good, but then the attackers just typosquat that too.

Yes, it's definitely a challenge but at the very least you could make it harder to get a prefix than register packages (e.g. an individual developer only needs to do that once in their career rather than every time they create a new package) and potentially that could also be tied to some kind of reputation where better known projects like the LF, CNCF, PSF, etc. could get a special status indicator or potentially even “vouch” for other prefixes (perhaps by listing them as published dependencies with an aging period).

For example, if the Python Software Foundation said “we confirmed that django/ is owned by the Django Software Foundation” and the DSF said “we confirmed that wagtail/ is controlled by the real developers” that transitive trust could be really useful for someone looking to audit their dependencies or have a policy where their CI system requires approval to build when new dependencies are added from prefixes they haven't previously used. That kind of system needs care to avoid successful spamming or social attacks – e.g. Jia Tan is going to helpfully submit pull requests to other projects – and if it's too hard to get approved it'll normalize the culture of not checking, but I think future developers are going to look back at things like the 2010s-era NPM world the same way most of us react when hearing about leaded gasoline or pre-HIV era swinger parties where people were having unprotected sex with strangers.

The good news is that open source decisively won the software development competition but the bad news is that now that the entire world depends on it people have much higher expectations for safety and stability.

Kebba · Feb 5, 2025

PhilipStorry said:
I think that within the next decade SBOMs will become mandatory for governments and large companies. And I'm not talking about what they write themselves - I'm talking about what they buy or use.

It will probably come in as an optional requirement at first - if you provide an SBOM whilst bidding, it's an extra tick when being evaluated. After a while it will become mandatory.

And it won't be a one-off thing - you'll have to provide an updated one every year. Large customers will want to know that you're actively maintaining and reviewing your SBOM. Security teams will be querying their SBOM databases when they learn about vulnerabilities, and following up with their suppliers.

Most importantly, working like this would probably mean a few customers saying "Are you sure this is right?", and typo-squatting attacks would then be spotted faster.

There may be a little resistance, but it's not really something that can be stopped. There's going to be a tipping point where it becomes "de rigeuer" for CTOs to be talking about their supply chain and how they manage it via SBOMs. There will be money to be made off the back of that, which will create a virtuous cycle that normalises it.

You should never make a prediction that can't be tested. If I'm right, within the next fifteen years there will be the first lawsuit based on a falsified SBOM that left a company or organisation open to attack. That's my test. Let's see how it goes...

It is already becoming a thing in automotive, at least the embedded software. Full SBOM from suppliers, and the supplier is responsible to monitor and notify the OEM of vulnerabilities.

And yes, it is very expensive as one would expect. Both to monitor, develop and patch

PhilipStorry · Feb 6, 2025

Kebba said:
It is already becoming a thing in automotive, at least the embedded software. Full SBOM from suppliers, and the supplier is responsible to monitor and notify the OEM of vulnerabilities.

And yes, it is very expensive as one would expect. Both to monitor, develop and patch

I was vaguely aware it's becoming a thing in some specialist circles, but I was thinking more military than automotive.

Part of why I think it will happen is that it will be an attempt to reduce the cost overall - a "many eyes" theory, whereby if there are lots of different security teams using different tools to analyse SBOMs, we're more likely to find issues than we are now.

Although to be fair that's very much a "something is better than nothing" argument now I come to think about it.

menkhaf · Feb 6, 2025

Seems like this particular bug was reported in April 2024 and ignored by the Go team: https://github.com/golang/go/issues/66653

Reading through the "hidden as off-topic" comments doesn't do the Go team any good -- they seem to completely misinterpret the issue. Not a good look.

Someone working on Golang will hopefully run some checks on how many other modules are "backdoored" in a similar way; should be simple to compare the information from pkg.go.dev with the actual source code.

dbarowy · Feb 6, 2025

ProdigySim said:
If this is a "fundamental problem" I'm afraid it affects almost all of society. It's not so much a problem as an oft-ignored constraint.

There is literally a classic CS paper about trust being a key component of a software system. It’s a Turing Award lecture no less!

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf

The attack in this story definitely has shades of Thompson’s attack.

AppalachianEccentric · Feb 6, 2025

Joel622 said:
[...] there is [an X] but it is optional and not ergonomic to use.

Behold, the Go programming language!

Backdoored package in Go mirror site went unnoticed for >3 years

Wise, Aged Ars Veteran

Seniorius Lurkius

Wise, Aged Ars Veteran

Ars Centurion

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Ars Centurion

Ars Legatus Legionis

Ars Scholae Palatinae

Ars Praefectus

Ars Legatus Legionis

Ars Praetorian

Ars Centurion

Ars Praefectus

Wise, Aged Ars Veteran

Ars Centurion

Ars Centurion

Ars Praetorian

Ars Tribunus Militum

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Legatus Legionis

Ars Tribunus Angusticlavius

Ars Tribunus Militum

Ars Praefectus

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Legatus Legionis

Ars Scholae Palatinae

Wise, Aged Ars Veteran

Ars Praefectus

Ars Scholae Palatinae

Ars Scholae Palatinae

Seniorius Lurkius

Smack-Fu Master, in training

Seniorius Lurkius

nproxy.org