Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> its stored forever in the proxy cache

This is mistaken. The Go module proxy doesn't make any guarantee that it will permanently store the checksum for any given module. From the outside, we would expect that their policy is to only ever delete checksums for modules that haven't been fetched in a long time. But in general, you should not base your security model on the notion that these checksums are stored permanently.





> The Go module proxy doesn't make any guarantee that it will permanently store the checksum for any given module

Incorrect. Checksums are stored forever, in a Merkle Tree, meaning if the proxy were to ever delete a checksum, it would be detected (and yes, people like me are checking - https://sourcespotter.com/sumdb).

Like any code host, the proxy does not guarantee that the code for a module will be available forever, since code may have to be removed for legal reasons.

But you absolutely can rely on the checksum being preserved and thus you can be sure you'll never be given different code for a particular version.


Here's another person auditing the checksum database: https://raphting.dev/posts/gosumdb-live-again/

Ah, my mistake. I had read in the FAQ that it does not guarantee that data is stored forever, but overlooked the part about preserving checksums specifically.

To be very pedantic, there are two separate services: The module proxy (proxy.golang.org) serves cached modules and makes no guarantees about how long cache entries are kept. The sum database (sum.golang.org) serves module checksums, which are kept forever in a Merkle tree/transparency log.

Ok. So to answer the question whether the code for v1.0.0 that I downloaded today is the same as I downloaded yesterday (or whether the code that I get is the same as the one my coworker is getting) you basically have to trust Google.

The checksums are published in a transparency log, which uses a Merkle Tree[1] to make the attack you describe detectable. Source Spotter, which is unaffiliated with Google, continuously verifies that the log contains only one checksum per module version.

If Google were to present you with a different view of the Merkle Tree with different checksums in it, they'd have to forever show you, and only you, that view. If they accidentally show someone else that view, or show you the real view, the go command would detect it. This will eventually be strengthened further with witnessing[2], which will ensure that everyone's view of the log is the same. In the meantime, you / your coworker can upload your view of the log (in $GOPATH/pkg/sumdb/sum.golang.org/latest) to Source Spotter and it will tell you if it's consistent with its view:

  $ curl --data-binary "@$(go env GOPATH)/pkg/sumdb/sum.golang.org/latest" https://gossip.api.sourcespotter.com/sum.golang.org 
  consistent: this STH is consistent with other STHs that we've seen from sum.golang.org
[1] https://research.swtch.com/tlog

[2] https://github.com/C2SP/C2SP/blob/main/tlog-witness.md


Not really.

For the question “is the data in the checksum database immutable” you can trust people like the parent, who double checks what Google is doing.

For the question “is it the same data that can be downloaded directly from the repos” you can skip the proxy to download dependencies, then do it again with the proxy, and compare.

So I'd say you don't need to trust Google at all in this case.


ok, I guess I was wrong about the cache, but not the checksums. I was somewhat under the impression that it was forever due to the getting rid of vendoring. Getting rid of vendoring (to me) only makes sense if its cached forever (otherwise vendoring has significant value).

Go modules did not get rid of vendoring. You can do 'go mod vendor' and have been able to do so since Go modules were first introduced.

How long the google-run module cache (aka, module proxy or module mirror) at https://proxy.golang.org caches the contents of modules is I think slightly nuanced.

That page includes:

> Whenever possible, the mirror aims to cache content in order to avoid breaking builds for people that depend on your package

But that page also discusses how modules might need to be removed for legal reasons or if a module does not have a known Open Source license:

> proxy.golang.org does not save all modules forever. There are a number of reasons for this, but one reason is if proxy.golang.org is not able to detect a suitable license. In this case, only a temporarily cached copy of the module will be made available, and may become unavailable if it is removed from the original source and becomes outdated.

If interested, there's a good overview of how it all works in one of the older official announcement blog posts (in particular, the "Module Index", "Module Authentication", "Module Mirrors" sections there):

https://go.dev/blog/modules2019#module-index


ok, 1) so would it be fair to modify my statement that it basically tries to cache forever unless its can't determine that its legally allowed to cache forever?

2) you're right, glanced at kubernetes (been a long time since I worked on it) and they still have a vendor directory that gets updated regularly.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: