
I've thought a little bit about those when working on Pijul. There are multiple issues, mostly at the interface between the technical and the human:

- There is no good diff for trees. Some formulations of the problem are NP-complete, meaning that patches won't be minimal, and hence users will get unexpected conflicts.

- Most VCS users have learned to be afraid of their favourite tool, and treat it like something extremely fragile that can break at any time for unknown reasons. Words like "porcelain" show that feeling is even present in the tool's authors themselves. For that reason, the more layers and heuristics one adds on top of basic tools, the scarier it becomes (especially for experienced users).

Writing a VCS is hard and takes way more time than most people expect. Few applications share the stateful nature of VCSs: one needs to feel the pain of designing and implementing databases and/or filesystems to understand these systems, where any bug can mean the state becomes corrupt or data is lost, which is the very thing the system you're writing is meant to prevent. So that kind of effort only makes sense when there are enough users to justify it and support the development, in terms of both man-hours and money.



You didn’t mention the lack of a good theory for tree diffs/patches here. Is that because you think there is a good theory or because you think not having a theory doesn’t matter? Or just that it wasn’t worth mentioning?

It strikes me that another big difference between dealing with conflicts in DVCSes and in the kind of collaborative editing CRDTs people seem to be mostly thinking about in the article and the comments section here, is that merges happen on the scale of seconds in the latter case and potentially much longer in the former, and the cost of a bad merge probably increases with its age.

I’m very sympathetic to your point about people not seeing the complexity in VCSs. I wonder if it is that they are frequently used tools that seem to be simple, though I don’t see that behaviour with operating systems. Perhaps they are considered too sacred or something. I think most programmers aren’t stressing file systems enough to find that they are sometimes buggy and hard to deal with accurately and efficiently, and if pressed might mumble something about calling fsync. But maybe most people don’t think it is easy, and only those people who do write their opinions on the matter down.


> You didn’t mention the lack of a good theory for tree diffs/patches here. Is that because you think there is a good theory or because you think not having a theory doesn’t matter? Or just that it wasn’t worth mentioning?

I'm convinced there is a good theory. Moreover, since line graphs (i.e. totally ordered bytes in a file) are just a particular case of trees, that theory could even be implemented in the same way Pijul is now, where the "easy" case just means blobs linked together.

But I'm also convinced that there are so many edge cases that I wouldn't want to even start considering that before Pijul sees some real-world usage at a decent scale. If everybody misses the point and ends up preferring Git, then why even bother?
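To make the "line graphs are a particular case of trees" remark concrete, here is a minimal sketch in Python. It is only an illustration and not Pijul's actual representation: the `Node` type, the empty root, and the helper names are all assumptions invented for this example. A file is modelled as a tree in which every node has at most one child, so a structure that handles general trees would also cover ordinary files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical tree node holding a blob of content."""
    content: str
    children: List["Node"] = field(default_factory=list)


def from_lines(lines: List[str]) -> Node:
    """Build a chain: each line is the single child of the previous one."""
    root = Node("")  # assumed empty root, purely for convenience
    current = root
    for line in lines:
        child = Node(line)
        current.children.append(child)
        current = child
    return root


def is_line_graph(root: Node) -> bool:
    """True if the tree is a single path, i.e. a totally ordered sequence."""
    node = root
    while node.children:
        if len(node.children) != 1:
            return False
        node = node.children[0]
    return True


def to_lines(root: Node) -> List[str]:
    """Read the content back by following the single path from the root."""
    out = []
    node = root
    while node.children:
        node = node.children[0]
        out.append(node.content)
    return out
```

Building `from_lines(["a", "b", "c"])` gives a tree for which `is_line_graph` holds. As soon as any node gets a second child, the structure is a genuine tree and the "easy" case no longer applies.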

> though I don’t see that behaviour with operating systems. Perhaps they are considered too sacred or something.

I guess everybody sees how they would start to write a VCS: just read files, maybe write a diff, imagine some data structures to store that, fix the bugs and problems, that's it. Fewer people can see how to write operating systems, and even people who follow hobby tutorials immediately realise the complexity of handling all the hardware.

> I think most programmers aren’t stressing file systems enough to find that they are sometimes buggy and hard to deal with accurately and efficiently, and if pressed might mumble something about calling fsync.

That's right, but on the other hand if you look at the "massively parallel, structured filesystems", called databases, the space is tiny:

- There's a multi-billion-dollar company (Oracle) built at a time when nobody wanted to write fast databases.

- An academic project (Postgres) which grew extraordinarily slowly, and somehow managed to survive as a research platform, funded by public funds.

- Two single-author projects led by extraordinarily motivated people: one (SQLite) isn't meant to support concurrency, and the other (MySQL/MariaDB) was built because there was no affordable, non-experimental database solution.

- Recent years have seen some progress funded by giant companies like Google for their cloud infrastructure, but we're not yet seeing many CRDTs in production.

The same goes for theory-based programming languages: both Haskell and OCaml barely survived for 20 years before being used at massive scale. Rust has had a slightly easier time, but not much easier, despite being supported by Mozilla.

Yet, most engineers would consider databases "basic" technology, because they use them a lot, and (quite fortunately) don't need to implement them to build on top of them.



