
There’s a very good reason to prefer .tar.gz (or xz or whatever) to .zip: tar.gz files deliver better compression (ranging from “marginally better” to “significantly better” depending on what you’re compressing).

In a .zip, the files are each compressed individually using DEFLATE and then concatenated to create an archive, whereas in a .tar.gz the files are first concatenated into a .tar archive and then DEFLATEd all together.

As a result, a .tar.gz often achieves much better compression on archives containing many small files, since the compression algorithm can eliminate redundancies across files. The downside is that you can’t decompress an individual file without decompressing every preceding file in the stream, because DEFLATE does not support random access. (And so tar’s lack of an index is an advantage here; an index would not be useful if you can’t seek.)
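
To make the difference concrete, here is a rough sketch using Python's standard zipfile and tarfile modules. The sample files (200 tiny, near-identical "source" files) and the helper names make_files, zip_size and tar_gz_size are made up for illustration; real-world ratios will depend on what you're compressing.

```python
import io
import tarfile
import zipfile


def make_files(n=200):
    """Generate n small, similar text files (hypothetical sample data)."""
    return {
        f"src/module_{i}.py": (
            f"def handler_{i}(request):\n"
            f"    value = request.get('key_{i}')\n"
            f"    return {{'status': 'ok', 'value': value}}\n"
        ).encode()
        for i in range(n)
    }


def zip_size(files):
    """Size of a .zip where each member is DEFLATEd on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    """Size of a .tar.gz where the whole tar stream is compressed at once."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


if __name__ == "__main__":
    files = make_files()
    print("zip   :", zip_size(files), "bytes")
    print("tar.gz:", tar_gz_size(files), "bytes")
```

On input like this the .tar.gz should come out noticeably smaller, even though the uncompressed tar stream is larger than the raw data because of its 512-byte headers and padding.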

This is why e.g. open source software downloads often use .tar.gz. A source code archive has hundreds or thousands of tiny text files with a ton of redundancy between files in variable & function names, keywords, common code snippets/patterns, etc., so tar.gz delivers significantly better compression than zip. And there’s little use for random access of individual files, since all the files need to be extracted in order to compile the program anyway.

The name “tape archive” may be anachronistic nowadays, but the performance characteristics of a tape drive (namely, fast sequential access but absolutely awful random access) coincide with the performance characteristics of compression algorithms. So an archive format designed for packaging files up to be stored on a tape is perfect for packaging files up to be compressed.



The trade-off is that it takes a long time to extract a single file from a solid archive. The 7z format supports a "solid block size" parameter for this reason (for all supported compression algorithms AFAICT), which can be set to anything from "compress all files individually" to "size of whole archive".
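
A toy illustration of the solid-block idea, using plain zlib rather than 7z itself: this is not how 7z is implemented, and here the block size is counted in files rather than bytes, purely as a simplifying assumption. The function name solid_size and the sample data are hypothetical.

```python
import zlib


def solid_size(files, files_per_block):
    """Total compressed size when files are grouped into fixed-size blocks.

    Each block is compressed independently, so extracting one file only
    requires decompressing the block that contains it.
    """
    total = 0
    for start in range(0, len(files), files_per_block):
        block = b"".join(files[start:start + files_per_block])
        total += len(zlib.compress(block))
    return total


# Hypothetical sample data: 200 tiny, similar files.
files = [
    f"def handler_{i}(request):\n    return request.get('key_{i}')\n".encode()
    for i in range(200)
]

if __name__ == "__main__":
    for n in (1, 20, len(files)):
        print(f"{n:>3} files per block:", solid_size(files, n), "bytes")
```

Bigger blocks give the compressor more cross-file redundancy to exploit, but mean more data to decompress before you reach the one file you wanted.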



