I've thought a little bit about those when working on Pijul, there are multiple ...

dan-robertson · on Dec 4, 2021

You didn’t mention the lack of a good theory for tree diffs/patches here. Is that because you think there is a good theory or because you think not having a theory doesn’t matter? Or just that it wasn’t worth mentioning?

It strikes me that there is another big difference between dealing with conflicts in DVCSes and in the kind of collaborative-editing CRDTs people seem to be mostly thinking about in the article and the comments section here: merges happen on the scale of seconds in the latter case and potentially much longer in the former, and the cost of a bad merge probably increases with its age.

I’m very sympathetic to your point about people not seeing the complexity in VCSs. I wonder if it is that they are frequently used tools that seem to be simple, though I don’t see that behaviour with operating systems. Perhaps they are considered too sacred or something. I think most programmers aren’t stressing file systems enough to find that they are sometimes buggy and hard to deal with accurately and efficiently, and if pressed might mumble something about calling fsync. But maybe most people don’t think it is easy, and only those who do think so write their opinions down.

pmeunier · on Dec 5, 2021

> You didn’t mention the lack of a good theory for tree diffs/patches here. Is that because you think there is a good theory or because you think not having a theory doesn’t matter? Or just that it wasn’t worth mentioning?

I'm convinced there is a good theory. Moreover, since line graphs (i.e. totally ordered bytes in a file) are just a particular case of trees, that theory could even be implemented in the same way Pijul is now, where the "easy" case just means blobs linked together.
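To make the "particular case" claim concrete, here is a toy sketch. It is not how Pijul stores anything; the `Node`, `from_lines`, `is_chain` and `flatten` names are made up for illustration. It only shows that a totally ordered file is a tree in which every node has at most one child, so the same traversal handles both:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: a blob of bytes plus an ordered list of children."""
    blob: bytes
    children: List["Node"] = field(default_factory=list)


def from_lines(lines: List[bytes]) -> Optional[Node]:
    """Build a chain (a degenerate tree) from totally ordered blobs."""
    root: Optional[Node] = None
    for blob in reversed(lines):
        root = Node(blob, [root] if root is not None else [])
    return root


def is_chain(node: Optional[Node]) -> bool:
    """True if every node has at most one child, i.e. the 'plain file' case."""
    while node is not None:
        if len(node.children) > 1:
            return False
        node = node.children[0] if node.children else None
    return True


def flatten(node: Optional[Node]) -> bytes:
    """Pre-order traversal; on a chain this is just the file contents in order."""
    out = []
    stack = [node] if node is not None else []
    while stack:
        current = stack.pop()
        out.append(current.blob)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(current.children))
    return b"".join(out)
```

A file built with `from_lines` passes `is_chain`, while a node with two children does not, yet `flatten` works unchanged on both.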

But I'm also convinced that there are so many edge cases that I wouldn't want to even start considering that before Pijul sees some real-world usage at a decent scale. If everybody misses the point and ends up preferring Git, then why even bother?

> though I don’t see that behaviour with operating systems. Perhaps they are considered too sacred or something.

I guess everybody sees how they would start to write a VCS: just read files, maybe write a diff, imagine some data structures to store that, fix the bugs and problems, and that's it. Fewer people can see how to write an operating system, and even people who follow hobby tutorials immediately realise the complexity of handling all the hardware.

> I think most programmers aren’t stressing file systems enough to find that they are sometimes buggy and hard to deal with accurately and efficiently, and if pressed might mumble something about calling fsync.

That's right, but on the other hand if you look at the "massively parallel, structured filesystems", called databases, the space is tiny:

- There's a multi-billion-dollar company (Oracle) built at a time when nobody wanted to write fast databases.

- An academic project (Postgres) which grew extraordinarily slowly, and somehow managed to survive as a research platform supported by public funds.

- Two single-author projects led by extraordinarily motivated people: one (SQLite) isn't meant to support concurrency, and the other (MySQL/MariaDB) was built because there was no affordable, non-experimental database solution.

- Recent years have seen some progress funded by giant companies like Google for their cloud infrastructure, but we're not yet seeing many CRDTs in production.

The same goes for theory-based programming languages: both Haskell and OCaml barely survived for 20 years before being used at massive scale. Rust has had a slightly easier time, but not much easier, despite being supported by Mozilla.

Yet, most engineers would consider databases "basic" technology, because they use them a lot, and (quite fortunately) don't need to implement them to build on top of them.