Most git objects are tiny files, so internal tree-based parallelization won't bring much compared to file parallelization (git is a hash tree itself, with variable-length leaves).
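
For illustration, a minimal sketch of per-file parallelization in Python (the helper names, the blob-style header, and the choice of SHA-256 are assumptions for the example, not git's actual code path):

  import hashlib
  from concurrent.futures import ProcessPoolExecutor
  from pathlib import Path

  def hash_object(path: Path) -> tuple[str, str]:
      # Each object is small, so one worker per file is plenty;
      # the git-like "blob <size>\0" header is illustrative only.
      data = path.read_bytes()
      blob = b"blob %d\0" % len(data) + data
      return str(path), hashlib.sha256(blob).hexdigest()

  def hash_all(paths: list[Path]) -> dict[str, str]:
      # The speedup comes from hashing many small files concurrently,
      # not from splitting one hash computation into a tree.
      with ProcessPoolExecutor() as pool:
          return dict(pool.map(hash_object, paths))

  if __name__ == "__main__":
      print(hash_all(list(Path(".").glob("*.txt"))))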

SHA-256 is actually a lot faster on modern CPUs thanks to https://en.wikipedia.org/wiki/Intel_SHA_extensions (and their Arm equivalents), which are implemented for SHA-256 but not for SHA-512. For example, openssl speed sha256 sha512 on an M1 (figures are thousands of bytes processed per second):

  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha256           89474.97k   283341.15k   901724.41k  1730980.24k  2339109.86k
  sha512           66160.19k   262139.03k   365675.96k   487572.26k   545142.91k
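
If you'd rather sanity-check this from Python than via openssl speed, something along these lines gives a rough per-buffer-size comparison (hashlib is typically backed by OpenSSL; the absolute numbers won't match the table, since Python call overhead dominates at small buffer sizes):

  import hashlib
  import time

  def throughput(algo: str, size: int, seconds: float = 1.0) -> float:
      # Hash the same buffer repeatedly for ~1 second, report bytes/s.
      buf = b"\0" * size
      n, deadline = 0, time.perf_counter() + seconds
      while time.perf_counter() < deadline:
          hashlib.new(algo, buf).digest()
          n += 1
      return n * size / seconds

  for size in (16, 64, 256, 1024, 8192):
      for algo in ("sha256", "sha512"):
          print(f"{algo} {size:5d}B: {throughput(algo, size) / 1e6:8.1f} MB/s")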


A fair point about the instruction sets, and it is also true that “most” files are small.

But again, precisely because of their size, large files take a disproportionate amount of time to process.

Don’t confuse the typical use-case with the fundamental concept: versioning.

Git could be a general-purpose versioning system with many more use-cases, but limitations like this hold it back unnecessarily…


Hashing is not the only thing that stops git from being useful for versioning large files. For that purpose, splitting files into chunks using a rolling hash (similar to what git's packfiles, rsync, tarsnap, and IPFS do) would work better. This again doesn't require "internal" tree hashing, since each chunk is hashed separately.
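
A minimal sketch of that idea, assuming a Rabin-Karp-style rolling hash (the base, window size, and chunk-size parameters below are arbitrary illustrative choices, not what any of those tools actually uses):

  import hashlib

  BASE, MOD = 257, 1 << 32        # polynomial base, wrap at 32 bits
  WINDOW = 48                     # rolling-hash window in bytes
  POW = pow(BASE, WINDOW - 1, MOD)
  MASK = (1 << 13) - 1            # boundary mask: average chunk ~8 KiB
  MIN_CHUNK, MAX_CHUNK = 2048, 65536

  def chunks(data: bytes):
      # Boundaries follow the content, not fixed offsets, so an insertion
      # early in a file only changes nearby chunks; later ones resync.
      start, h = 0, 0
      for i, b in enumerate(data):
          if i >= WINDOW:
              # Remove the outgoing byte, then shift in the new one.
              h = (h - data[i - WINDOW] * POW) * BASE + b
          else:
              h = h * BASE + b
          h %= MOD
          length = i + 1 - start
          if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
              yield data[start:i + 1]
              start = i + 1
      if start < len(data):
          yield data[start:]

  def chunk_ids(data: bytes) -> list[str]:
      # Each chunk is hashed independently (no internal tree hash needed),
      # and unchanged chunks keep their IDs, enabling deduplication.
      return [hashlib.sha256(c).hexdigest() for c in chunks(data)]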



