Most git objects are tiny files, so internal tree-based parallelization won't bring much compared to file parallelization (git is a hash tree itself, with variable-length leaves).
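
For illustration, a minimal sketch of per-file parallelization in Python (the helper names, the blob-style header, and the choice of SHA-256 are assumptions for the example, not git's actual code path):

  import hashlib
  from concurrent.futures import ProcessPoolExecutor
  from pathlib import Path

  def hash_object(path: Path) -> tuple[str, str]:
      # Each object is small, so one worker per file is plenty;
      # the git-like "blob <size>\0" header is illustrative only.
      data = path.read_bytes()
      blob = b"blob %d\0" % len(data) + data
      return str(path), hashlib.sha256(blob).hexdigest()

  def hash_all(paths: list[Path]) -> dict[str, str]:
      # The speedup comes from hashing many small files concurrently,
      # not from splitting one hash computation into a tree.
      with ProcessPoolExecutor() as pool:
          return dict(pool.map(hash_object, paths))

  if __name__ == "__main__":
      print(hash_all(list(Path(".").glob("*.txt"))))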

SHA-256 is actually a lot faster on modern CPUs thanks to https://en.wikipedia.org/wiki/Intel_SHA_extensions (and their Arm equivalents), which are implemented for SHA-256 but not for SHA-512. For example, openssl speed sha256 sha512 on an M1 (figures are thousands of bytes processed per second):

  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha256           89474.97k   283341.15k   901724.41k  1730980.24k  2339109.86k
  sha512           66160.19k   262139.03k   365675.96k   487572.26k   545142.91k
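
If you'd rather sanity-check this from Python than via openssl speed, something along these lines gives a rough per-buffer-size comparison (hashlib is typically backed by OpenSSL; the absolute numbers won't match the table, since Python call overhead dominates at small buffer sizes):

  import hashlib
  import time

  def throughput(algo: str, size: int, seconds: float = 1.0) -> float:
      # Hash the same buffer repeatedly for ~1 second, report bytes/s.
      buf = b"\0" * size
      n, deadline = 0, time.perf_counter() + seconds
      while time.perf_counter() < deadline:
          hashlib.new(algo, buf).digest()
          n += 1
      return n * size / seconds

  for size in (16, 64, 256, 1024, 8192):
      for algo in ("sha256", "sha512"):
          print(f"{algo} {size:5d}B: {throughput(algo, size) / 1e6:8.1f} MB/s")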


A fair point about the instruction sets, and it is also true that “most” files are small.

But again, precisely because of their size, large files take a disproportionate amount of time to process.

Don’t confuse the typical use-case with the fundamental concept: versioning.

Git could be a general-purpose versioning system with many more use-cases, but limitations like this hold it back unnecessarily…


Hashing is not the only thing that stops git from being useful for versioning large files. For that purpose, splitting files into chunks using a rolling hash (similar to what git's packfiles, rsync, tarsnap, and IPFS do) would work better. This again doesn't require "internal" tree hashing, since each chunk is hashed separately.
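
A minimal sketch of that idea, assuming a Rabin-Karp-style rolling hash (the base, window size, and chunk-size parameters below are arbitrary illustrative choices, not what any of those tools actually uses):

  import hashlib

  BASE, MOD = 257, 1 << 32        # polynomial base, wrap at 32 bits
  WINDOW = 48                     # rolling-hash window in bytes
  POW = pow(BASE, WINDOW - 1, MOD)
  MASK = (1 << 13) - 1            # boundary mask: average chunk ~8 KiB
  MIN_CHUNK, MAX_CHUNK = 2048, 65536

  def chunks(data: bytes):
      # Boundaries follow the content, not fixed offsets, so an insertion
      # early in a file only changes nearby chunks; later ones resync.
      start, h = 0, 0
      for i, b in enumerate(data):
          if i >= WINDOW:
              # Remove the outgoing byte, then shift in the new one.
              h = (h - data[i - WINDOW] * POW) * BASE + b
          else:
              h = h * BASE + b
          h %= MOD
          length = i + 1 - start
          if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
              yield data[start:i + 1]
              start = i + 1
      if start < len(data):
          yield data[start:]

  def chunk_ids(data: bytes) -> list[str]:
      # Each chunk is hashed independently (no internal tree hash needed),
      # and unchanged chunks keep their IDs, enabling deduplication.
      return [hashlib.sha256(c).hexdigest() for c in chunks(data)]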



