Pijul – A free and open source distributed version control system (pijul.org)
259 points by dgellow on Feb 14, 2017 | hide | past | favorite | 172 comments


Sadly, I'm afraid the AGPL license is going to make this untouchable to businesses… :-(

[Edit] Interestingly, they cover that. I suppose if you're just running Pijul rather than integrating with its code, it might be safe to use in a corporate environment. Still, it's likely to be offputting.


Why do you find it offputting? IANAL, but it basically doesn't allow you to add code to (or take code from) Pijul itself and release it under another license. You can use it to store closed source code, you can host it yourself, offer it as a service for money, you can even create your own extensions as long as you release them AGPL...

I feel that AGPL for a product is misunderstood and pre-rejected without justification by far too many. Now, if you are developing a library, then AGPL sucks as a license for it, but for a cohesive product? Seems acceptable to me (not sure if it's my favorite choice)


>I feel that AGPL for a product is misunderstood and pre-rejected without justification by far too many.

Isn't that precisely the parent comment's point?

Perhaps it shouldn't be this way, but the argument is that the AGPL will make pijul "untouchable" by businesses, de facto.


but does the grandparent commenter find it offputting now that they realize it's not viral with respect to content? It's weird that it's not today's Oracle or Microsoft equivalent spreading the FUD/false info but developers in discussions on this very site. The AGPL explicitly does not touch the content it processes; I expect a legal department to be pretty good at understanding that, unless they heard something secondhand and didn't bother to read the license.

Anyway I don't really care anymore. We act based on what we believe not on facts (this is not directed to any parent comments, to be clear). Whatever


Again, I think the two of you are arguing on different levels.

You're saying AGPL shouldn't be offputting. He's saying that by some quirk of circumstance, it is.

In other words, his argument is that picking AGPL was a poor strategic decision, given its (admittedly undeserved) baggage.


Precisely. I'm not arguing based on some reasoning of my own that AGPL is bad. I'm arguing based on working at a previous big company where if you checked in AGPL code, Ninjas in hazmat suits broke through the skylights and parasailed down to exorcise the toxic intrusion. Enough mixed metaphors?


> if you checked in AGPL code

This sounds like taking some of Pijul's source code and putting it inside another project. That would certainly have business implications, which justifies ninjas. It's possible to use AGPL as a business strategy (I worked at a company whose main product used CPAL[1] which has a similar network-use clause); but such decisions should not be made via VCS commit.

Of course, if the choice of AGPL prevents a business from reusing Pijul's code then presumably that's why they chose AGPL. That's kind of the point of copyleft.

I imagine very few businesses would care about Pijul's source code though. If Pijul matures into a compelling tool, then the relevant phrase would be "if you used an AGPL command"; no need for ninjas there, unless (as others note) you're building a PijulHub or something.

[1]: https://en.wikipedia.org/wiki/Common_Public_Attribution_Lice...


If a BigCorp is to use AGPL code, it needs to erect internal barriers (on code search, etc) to ensure that said code does not find its way into other projects. Those internal barriers usually incur a cost that outweighs whatever benefit the code provides, relative to unencumbered alternatives.

If you say "we don't plan to patch the code, just use the binaries as-is" then you're asserting that the software as it exists today will meet your needs forever into the future. That's a terribly foolish bet.

A business founded on a principle of radical openness might be compatible with AGPL. But any business that wants to have some internal software (HR, etc) is well advised to stay the hell away from AGPL.


> If a BigCorp is to use AGPL code, it needs to erect internal barriers (on code search, etc) to ensure that said code does not find its way into other projects.

Yes, but this has nothing in particular to do with the AGPL and everything to do with copyright laws. There is no licence except public domain/CC-0 that allows BigCorp to incorporate other people's code without any obligations whatsoever (attribution, at least). Copyright laws forbid such incorporation by default.


> If you say "we don't plan to patch the code, just use the binaries as-is" then you're asserting that the software today your needs forever into the future. That's a terribly foolish bet.

I agree; one of the many reasons I'm against proprietary software is that it forces users into this helpless situation.

I don't quite get the "internal barriers" idea; is it common for companies to mix together code from multiple projects, including ones they don't own, such that it's difficult to disentangle them? Regarding your example, why would copy/pasting code from an internal code search be treated any differently from, say, searchcode.com? A modicum of diligence is always required regarding ownership, licensing, appropriateness, trust, etc.


For more info (can I just light up the DannyBee bat signal?), see all of these… https://hn.algolia.com/?query=dannybee%20agpl&sort=byPopular...


You should check how Ghostscript handles the AGPL. Personally, I will stay away from any AGPL code.

Basically, ghostscript says you cannot distribute AGPL code with your commercial code, even though they are not linked together, not even on the same media.

https://ghostscript.com/doc/current/Commprod.htm


> offer it as a sevice for money

Yes, you can, but if you add something - like issue tracking, user/access management etc.. around that then you need to publish that under AGPL as well..

It means it will never be part of something like github, bitbucket or AWS CodeCommit.

And then there are plugins to CIs (checkout from repository ...) - would those be affected?


> Yes, you can, but if you add something - like issue tracking, user/access management etc.. around that then you need to publish that under AGPL as well..

Mission accomplished?


I literally wrote that you need to publish any extensions under the AGPL, so you didn't really blow my mind there...

It can definitely be part of something like GitLab (or GitHub, Bitbucket or AWS CodeCommit), you just have to model your business around the fact that the software is available to everybody. You know, like WordPress (sure, they are GPL so they can have some proprietary components, but the practical implication is that you CAN create your own, for-pay, wordpress.com-style service).

Plugins can be a different story. Does the software have a REST interface? Then use that; no license virality. If it doesn't, you release them under the AGPL.


Correct me if I'm wrong, but this only means that if you modify the code or directly incorporate the code library into your own, you must distribute your work.

If you are just calling the service through its API (either the CLI or the programming language interface), then you don't need to distribute anything.

This just protects against people taking GPL open source code, modifying it for themselves, using it in backend services, and then never distributing their modifications.


> If you are just calling the service through it's API

That's contestable afaik.

> through the programming language interface

i.e. dynamically linking which is usually understood to be prohibited unless linking code has a compatible license.

> either CLI

Here's the problem - you can take any GPL library, make small CLI or REST adapter for it and license it GPL as well, then use that adapter from your proprietary application - is that still allowed? Because if it is then GPL can't ever be enforced and if it isn't you can't call CLI APIs even if original libraries themselves provide it.


I was reading this[0], but didn't scroll down through the comments. It appears you are correct.

However, I think if you just installed Pijul on a server and then called it through your operating system interface, then you _might_ be fine. You might need to make it so the interface to Pijul is generic and could swap out with other VC systems.

I still might also be wrong about this. I'm not a lawyer and the comments in [0] are on both sides of the argument.

[0] http://softwareengineering.stackexchange.com/questions/10788...

EDIT: edited to express less certainty over my interpretation of the license


> then called it through your operating system interface, then you _might_ be fine

Might, exactly. Depending on various courts accepting that there is a loophole in the GPL, and my layman understanding is that there isn't. Skimming GPLv2, I don't see it differentiating, among derivative works, between those that use compile-time linking and those that use mechanisms like a CLI. I'm wary of bringing after-the-fact constructs to justify something that the GPL doesn't talk about. And after all, how is a CLI that much different from dynamic linking? A CLI is merely an interface that is subjectively a bit more friendly in certain situations, but that imo shouldn't have a bearing in legal discussions.


>directly incorporate

afaik this means any kind of linking (static or dynamic), using a jar or using a npm package or similar..

>through the programming language interface

which implies some version of the above


As long as you keep them separate enough, via APIs or the like, no issue at all.


You can't add proprietary bits to their system without contributing those bits back so everyone can have them. Boo hoo.


This is the big conceit of copyleft licenses: that "derivative works" are exclusively "modified and extended versions of the program" to use the FSF's terminology, and not a small piece of a much larger system.

It's easy to imagine wanting to add Pijul support to Tower, SourceTree, Gerrit, Phabricator, etc. - projects that all dwarf Pijul. But these projects will be unable or unwilling to risk doing so, because of the AGPL.


I'm sorry, but that's nothing but FUD. Tell me exactly why those projects would be unable to integrate.


Here's the thing: Lawyers are expensive and legal implications are often not obvious to non-lawyers. Without a really really compelling reason to invoke them to clear the use of an AGPL project, clearing the use of it as non-threatening to the business' distribution model is probably not worth it. This is why businesses ban it outright.


Hi! Author here. The situation is slightly more complex than what you describe. Pijul was announced on the darcs blog, not by its authors, way before it was ready (about a year before). We had not really chosen a license back then, so we made the most conservative and protective choice available, and agreed to think again whenever we would have time.

Now that we've started to use Pijul for the website (about two weeks ago), we want to change the license.

I'm slightly annoyed by all the political statements in GPL3, AGPL3 and several versions of LGPL, such as "you cannot use this on impure devices", or "you cannot use this on platforms with poor support for shared libraries such as windows or mac os" (by which I mean that these platforms don't have real package managers to handle dependencies, and the end user has to either (1) install DLLs manually or (2) install "unshared shared libraries", i.e. one full instance of the library per program using it, which OSX calls an "app").

I am not a lawyer, but GPL2 seems to be free of these. We're not likely to pick anything much more permissive for now. Also, the Pijul and Darcs teams agree that we don't particularly enjoy discussions about licenses, especially when they're not based on factual arguments. Here are answers we've already given:

- If you think that "a new anarchistic jurisdiction not recognizing any copyright law will soon emerge, hence Pijul should be in the public domain", we don't agree. The movie Dunyayi Kurtaran Adam, also known as Turkish Star Wars, is available full-length on YouTube, to remind everyone that copyright laws may be broken, but are not totally useless. Therefore, your dream country may not "emerge" that soon.

- Or maybe you think "I'd like to make a living from selling a small wiki based on Pijul, leveraging not only your research ideas, but also the database backend you spent 6 months full-time writing, as well as your SSH library. Why would you not allow me to do that?". The answer is: because if your wiki is useful, we want to use it too, without having to pay! What sense of fairness is this?


First of all, from the FAQ:

> But maybe we’ve missed something, and the AGPL actually prevents some use of Pijul that we’ve not thought of, and that does not aim at centralizing the internet. If this is the case, please discuss your idea with us on the mailing list.

Second, I wonder how many companies really had to change the source code of git, subversion, mercurial, etc. What somebody could do is build a web service around it. As long as the service interfaces with pijul via system() it won't have any licensing problems. Encapsulate it into some RPC wrapper for extra safety. You might have to distribute the wrapper, but that won't make things much easier for competitors. The bulk of the web service can still be closed source.

Third, GPL licenses are fair to the original developers and the end users (they get to see the code they use). I understand why other developers might like BSD style licenses and we could spend another 30 years arguing about the virtues and the flaws of the two approaches. I won't get into that but given their goal of decentralizing the Internet I think they picked the right license.
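The system()/RPC-wrapper idea mentioned above can be sketched quickly. This is an architecture sketch only, not a legal opinion on whether it clears the AGPL; the `Vcs` class name is invented, and `echo` stands in for a real VCS binary so the example runs anywhere:

```python
import subprocess

class Vcs:
    """Talk to a version control tool only through its command line,
    behind a small generic interface, so the binary stays a separate
    program and can be swapped out later."""

    def __init__(self, binary):
        self.binary = binary  # e.g. "pijul" or "git" -- swappable

    def run(self, *args, cwd="."):
        # Invoke the tool as a subprocess and return its stdout.
        result = subprocess.run([self.binary, *args], cwd=cwd,
                                capture_output=True, text=True, check=True)
        return result.stdout

# Demo with a stand-in binary, since pijul may not be installed:
print(Vcs("echo").run("record", "--message", "fix"))
```

Swapping the backend is then a one-line change, which is the "generic interface" point made elsewhere in the thread.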


> I wonder how many companies really had to change the source code of git, subversion, mercurial etc.

Even that overstates the matter. Companies could modify the code all they want as long as they don't distribute that code. That would impact Github competitors - maybe 10 or 20 companies in the entire US.


Facebook uses derivatives of PHP and MySQL today, because those are the technology choices Zuckerberg made in 2004. You should hope to be so lucky!

Also, "distribute that code" can be interpreted very broadly. A contractor may access your internal HR system, and now you have to share it with them. A factory line worker may be entitled to the source controlling the robot fixture. An airline passenger may use the in-flight entertainment unit, and be entitled to its source. Etc.


If the HR, the robot and the entertainment systems are AGPL licensed chances are that their source is already on github.

Spot on for PHP and MySQL. PHP has its own license (kind of BSD?) and MySQL is GPL2 or commercial. I bet Facebook could buy the rights of any AGPL product, unless the owner is really firm about principles.


I may be wrong but I think some (most? all?) companies that are blacklisting agpl3 won't bother going into small print details, license alone is a no-go, no?


Any company blacklisting AGPL is going to be blacklisting GPL as well, which would preclude using git as well (even the more permissive libgit2 is just GPL with a linking exception).


That's not true, I know several companies happy with GPL, and not AGPL.

The thing they worry about is deciding what falls under AGPL. With GPL it is easy: if an executable leaves the building, source goes with it. With AGPL, when almost everything is web-connected, it can be hard to draw the line on what you have to open source (or just tell people you are using... do I have to account for every AGPL program in an Ubuntu server install, just in case one is getting used by something else?)


Can confirm that this is absolutely not the case. I have encountered a number of companies that are absolutely fine with [L]GPL but not going to consider AGPL.

Whether rightly or not, https://www.theregister.co.uk/2011/03/31/google_on_open_sour... and other writings on the topic had a big impact on enterprise adoption of AGPL software.

There's also the fact that most prominent AGPL licensing tends to be around commercially backed software where the main backer owns all the source and is therefore in position to not hold themselves to the same standard of sharing changes as their customers and third parties must do.


That doesn't seem to be the case in practice?


Only uneducated people think that merely using or deploying AGPL software is 'problematic' in any way. Please don't spread that unreasonable fear.


That sort of response, "only uneducated people think...", really shuts down conversation, and is a hindrance to constructive discussion. Whether or not AGPL is 'problematic' is something that can be discussed without ad-hominem attacks.


Sorry, I did not mean it like that. ESL here.


Thanks for clarifying! Reading your original comment in the context of this comment, I understand how your first comment probably is not what I thought it was originally.


Apple's lawyers are definitely educated, and they have put their foot down within Apple. And Apple is not the only large organization where exactly that has happened. You may well disagree with them, but they are the ones far more educated in the law, and they are concerned by it, so there is probably something to it.

The speculation I have heard is that the terms "deploy" and "link" are both ill-defined and have not gotten proper testing in the courts. So there is no case-law saying that pushing your changed binaries to 1,000 internal sites (or even better: partially owned subsidiaries) does not invoke the clause. Or what does "linking" mean in the context of a database driver? What happens if a well-meaning employee loans out a modified binary to a customer to see if it fixes their problem? All of that makes lawyers nervous, and what makes one lawyer nervous has the potential of making other lawyers rich at their companies expense.

Just because you read something and come to a conclusion does not mean that people who come to other conclusions are "uneducated".


I used to be at Google; AGPL was banned there. I hear that it is banned at many other large/huge shops.

I run a tiny business. AGPL is banned there too.

It isn't an "unreasonable fear". It is a reasonable decision based on mitigation of risk. AGPL is a risky license for end users; it goes too far beyond the Four Freedoms.


I can't even.


Out of curiosity, why the downvotes? Is there a better way to respond to being told you lack education to discuss a matter? (Other than the obvious ignoring completely?)


Since when does anybody ever check the license of compiled binary applications?


If you buy a macOS machine today, it ships with bash3.2 by default. That's the last version that was GPLv2. It is a decade old by now. Some companies most definitely care.


As I've mentioned elsewhere, git is GPL, so Macs are screwed regardless. :P


And yet macOS ships with Git, version 2.10. GPL(v2) licensed and everything.

It's not about GPL, it's about GPLv2 vs GPLv3 and the requirements that come with it.


I had no idea that Macs shipped with git, happy to hear it. :)


git and bash3.2 are GPLv2. They didn't stop at the version of bash that was GPL; bash has always been GPL. They stopped at v3 specifically.


Could you please tell me why GPLv2 allows Apple to include bash 3.2 with macOS, but not the newer versions under GPLv3? I always had the idea that including GPL software requires the rest to be GPL too.


The usual issue is the "anti-tivoization" language (basically: the GPLv3 requires hardware that uses GPLv3-licensed software to allow users to install modified versions of that software on the hardware in question). There's also some language around patent licenses that might be unattractive to a company that deals in a lot of software patents (like Apple).


Yes, you are right companies care, but they aren't good companies.


And there will be no PijulHub.


Would love to see more explanation (with diagrams) of their patching model. Right now I'm not convinced it has any intuitive advantage.


This might be deeper than you are looking for, but this is the theory it's based on, and even has diagrams. :)

https://arxiv.org/abs/1311.3903


Here is one concrete example compared to git I found digging around: https://tahoe-lafs.org/~zooko/badmerge/simple.html


For anyone following along still, I just tried the bad merge example that vesinisia linked to with both git and mercurial. Git does indeed do the wrong thing as illustrated on that web page. Mercurial gets it right.

UPDATE: both git and mercurial get the concrete version wrong (https://tahoe-lafs.org/~zooko/badmerge/concrete-good-semanti...)


Is there any new status on this project?

The last blog post, from 2017-Jan-10, says "I'm pleased to announce that we are starting to test the first usable version of Pijul. We are not quite ready to release..."

I was hoping that getting to the front page meant pijul had done a release but I see nothing to that effect.



I've shared the link because that's an interesting project and I'm interested in what people have to say. I wasn't expecting to reach the front page to be honest.


How does this compare to git?


Git (and mercurial, and a few others) are snapshot-based. That means that they think of the world as states, with changes between them.

Darcs, and pijul, are patch-based. That means that they think of the world as an ordering of patches. Patches aren't the same as commits: commit orderings, for example, are fixed, whereas patch orderings are computed. They can change when you e.g. merge a "branch". Branching is similarly "simpler": a branch is just a collection of patches, not a single commit with an implicit DAG attached to it.


This distinction between commits and patches (I've been following git, hg, and darcs for over a decade) is one I've never quite understood. A patch says, take this original line of code found here and change it into this new line of code. A commit is just a patch that records when in the history you can be sure to find that original line of code. If you take an arbitrary patch and apply it to an arbitrary set of code you might not even be able to find that original line of code. How do darcs and/or pijul solve that problem? Git and mercurial have tools like rebase and cherry pick that allow you to rearrange commits and they can even use three way merge algorithms (because of the DAG) to help recompute the patches for when that original line of code isn't quite what the patch specifies anymore. How are darcs and/or pijul different in that regard?


A commit is not a patch. A commit is just often represented as a patch (i.e. a diff), but in general neither git nor hg even save a commit as a patch, except as a minor, optional, and opaque optimisation. What they really save is the entire state of your repo at a particular time, with the hash(es) of the state of the repo just before this one. In the case of git, it saves a tree of hashes that refer to all of the blobs (files) at the current commit. In the case of Mercurial, it saves a manifest of hashes that point to files represented in revlogs. Whenever both git and hg show you a commit as a diff, this involves a computation. They are not merely showing you diffs that they have pre-computed and stored.
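The snapshot model described above can be sketched in a few lines. This is a hypothetical toy, not git's real object format: a commit stores only hashes of whole file states plus parent commit hashes, and any diff shown in a log is computed afterwards from two snapshots.

```python
import difflib, hashlib, json

store = {}  # hash -> object (file blob or commit record)

def h(obj):
    # Content-address an object by hashing its serialized form.
    return hashlib.sha1(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def commit(files, parents):
    """Store every file as a blob, then a commit holding the tree of
    blob hashes plus parent hashes -- no diff is stored anywhere."""
    for content in files.values():
        store[h(content)] = content
    c = {"tree": {name: h(content) for name, content in files.items()},
         "parents": parents}
    store[h(c)] = c
    return h(c)

c1 = commit({"a.txt": "hello\n"}, [])
c2 = commit({"a.txt": "hello\nworld\n"}, [c1])

# The "patch" you see in a log is a computation over two snapshots:
old = store[store[c1]["tree"]["a.txt"]]
new = store[store[c2]["tree"]["a.txt"]]
diff = list(difflib.unified_diff(old.splitlines(), new.splitlines()))
assert "+world" in diff
```

The point of the sketch: the diff exists only after the fact, exactly as the comment says of git and Mercurial.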

Bitkeeper does have a weave data structure that more closely resembles patches. It's an encoded set of instructions for transforming one file from one state into another:

https://www.bitkeeper.org/src-notes/SCCSWEAVE.html

This data structure has a big advantage when computing annotations (blames): it's much faster than Mercurial's revlog (which in turn is faster than git's blob-tree-ref structure).


All that sounds like differences in the internal model. From everything I have understood, it would be possible to reimplement git to store only patches, their position in the commit tree and the commit hash, and have it behave exactly like the git reference implementation.

Is the difference between patches and git commits in a DAG really only a difference in internal representations or is there a user-facing difference?


Speed aside, the big user-facing difference is that commits are glued to their parents. They really are glued; the merges and potential conflicts involved with rebasing and cherry-picking are a consequence of trying to undo this glue.

Darcs' and pijuls' patches aren't glued, they only either commute or do not, and the conflict resolution mechanisms for non-commutative patches are different.


What does commute mean in this context? I'm getting the impression that it means that the original line of code that the patch references and is going to change is where the patch expects it to be.


It's probably best to get this from the source:

https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theo...
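As a rough intuition, here is a toy model (made-up line positions, not Darcs' or Pijul's actual patch theory): two patches commute when they can be applied in either order and produce the same document, possibly after adjusting offsets; patches that target the same spot don't commute, and that's where dependencies and conflicts come from.

```python
def apply(doc, patch):
    # A "patch" here is just (position, lines to insert).
    pos, lines = patch
    return doc[:pos] + lines + doc[pos:]

def commute(p1, p2):
    """Given patches applied in the order p1 then p2, return (p2', p1')
    giving the same result in the other order, or None if they target
    the same spot (i.e. they don't commute)."""
    (pos1, lines1), (pos2, lines2) = p1, p2
    if pos2 >= pos1 + len(lines1):
        # p2 lands after p1's insertion: shift p2 back
        return (pos2 - len(lines1), lines2), (pos1, lines1)
    if pos2 < pos1:
        # p2 lands before p1: shift p1 forward
        return (pos2, lines2), (pos1 + len(lines2), lines1)
    return None  # same spot: order matters, so one depends on the other

doc = ["a", "b", "c"]
p1 = (1, ["x"])   # insert "x" at index 1
p2 = (3, ["y"])   # insert "y" at index 3 (a position after p1 applied)
q2, q1 = commute(p1, p2)
# Either order yields the same document:
assert apply(apply(doc, p1), p2) == apply(apply(doc, q2), q1)
```

Real patch theory handles deletions, conflicts, and more, but the commute-or-depend dichotomy is the core idea.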


It would certainly be possible to reimplement git using only patches, showing that Pijul cannot be worse than git.

However, it would not be possible to implement a patch-based system (Pijul/darcs) based on git.

- One example is cherry-picking: in git, when you are on some branch A, and cherry-pick from another branch B, after the cherry-picking is done, if you try to cherry-pick from B again, you'll get conflicts. In Pijul and darcs, that comes for free.

- Another example is merge: merge between commits is provably wrong (https://tahoe-lafs.org/~zooko/badmerge/simple.html). With patches, this cannot happen.


Thanks for that link, that finally helped me understand the real benefit of the patch model, this quote in particular:

> The difference between what svn does and what darcs does here is, contrary to popular belief, not that darcs makes a better or luckier guess as to where the line from c1 should go, but that darcs uses information that svn does not use -- namely the information contained in b1 -- to learn that the location has moved and precisely to where it has moved.

Darcs/Pijul understand which lines a patch is interested in, and if those lines get moved around in another branch, the patch "follows" them to the right place, rather than just finding/applying the shortest possible diff.


Yes. The DAG in a context in darcs or pijul (so, I have a checkout; how did those files come into existence) is a consequence of the set of patches; the order is computed, not intrinsically part of every commit. This has many UX consequences; for example, when you darcs pull, you automatically cherry-pick patches in any order (as long as you have the dependencies) -- when you git pull, you pull in a commit in a position in a DAG, and automatically merge its entire history.

If you've heard someone say "never git pull, always git fetch and then either merge or rebase as appropriate", then they've noticed the difference between the two.


A commit stores information about the state of the whole repository (in hashes). A patch is stateless, just describing what operations you do on the repository.

One confusing thing for git users is, git represents a number of commits (but not all types of commits) as patches.

Two differences:

- Cherry-picking is possible, but when you cherry-pick twice from the same branch, you get conflicts with commits, because cherry-picking changes their identity. With patches, this works just as expected.

- Merging can be made associative with patches, not with commits. Concretely, in git, if Alice and Bob add lines to a file, even when there are no conflicts, Alice's new lines can be merged in the middle of parts added by Bob, even though she's never seen these parts. Even worse, there is no way to tell when this happens to you (git doesn't say). "Associativity" is the mathematical property that this never happens.


> Git and mercurial have tools like rebase and cherry pick

darcs had incredible cherry picking about ten years ago.

Instead of saying "get this commit" and solve merge conflicts manually, like git does, darcs would get one patch and every other that was necessary for it.

It effectively made cherry picking work as in "I want this feature from that branch" instead of "I want some code from that branch".

It was glorious, other than the little detail of occasionally exponential merge times...


darcs and pijul have an internal model that tells it where patches can go; it does not suggest patches can go pretty much everywhere (in fact; at any given time it probably only has exactly one idea of which order the patches go in). Once those patches have dependencies, it enforces their order; but e.g. groups of dependent patches can still be transplanted around.

The Darcs wiki has some cool graphics around cherry-picking merges that might answer your question better than a paragraph of prose can: http://darcs.net/Using/Model#merging-with-cherry-picking


This doesn't seem significantly different to git, from the perspective of a user. Under the hood, I model a git branch as a ref pointing to a commit object in a DAG. In the driver's seat (git log --patch branch..upstream), I think of it as an ordered collection of diffs.


In darcs or pijul there is no DAG; there is only a collection of patches. The ordering between them is implicit, computed on demand, and can change as a consequence of merges.


It sounds as this is mostly an internal difference. Certainly, the mental model you described can be used on git reasonably well, from the user's point of view, and they won't get steered too far off course with it.


It can be used in git, with a lot of duct tape like git rebase and git cherry-pick and occasional dynamiting through rough git merges.

The darcs/pijul world of repositories as loose ordered sets of patches makes cherry-picking the rule rather than the exception. A branch is mostly just the subset of patches you are interested in at a given moment. A "trunk" is just the superset of all possible patches. You can make interesting and easy usages of things like set intersections [1]: the intersection of the patches in two branches in darcs/pijul can be much more interesting than nearest common parent commit in the git DAG, and especially can be a lot more informative in the cases where things like bug fixes are cherry-picked across branches, which in git is a special bit of tree/commit surgery but in the darcs/pijul world that patch can be often the exact "same" in both branches.

[1] Aside, I love the concept of using intersection branches for consensus-oriented development (what releases to Production are the patches that every developer has pulled into their own working branch), which is a neat form of decentralized development that I think can only really be handled in the darcs/pijul model. (I have an ancient blog post on the idea of such a starfish development workflow.)
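The set-intersection idea above can be shown directly (the patch names are invented for illustration): if a branch is just a set of patches, a "consensus branch" is an ordinary set intersection.

```python
# Each branch is modeled as the set of patch IDs it contains.
alice = {"base", "refactor", "bugfix", "feature-x"}
bob   = {"base", "refactor", "bugfix", "feature-y"}

# Patches every developer has pulled -- a candidate for release:
consensus = alice & bob
assert consensus == {"base", "refactor", "bugfix"}
```

In the git model the analogous operation is finding a common ancestor commit, which loses cherry-picked patches that exist on both branches with different commit hashes.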


There are definitely many implications for users.

One thing you'll notice very quickly with Darcs (and presumably Pijul) is that the system always manages dependencies between patches. If you try to cherry-pick a single patch from a branch, you will get that patch and all the patches it depends on; you don't get the full linear history, you only get a subset.

In other words, you get the intuitive feeling that you're operating not on a log, but on a graph. Pulling one thread necessarily pulls other threads, and the whole graph rearranges itself to accommodate your changes. This has its downsides compared to the strictly-linear snapshot model, but the upside for most users is incredible. You can just commit and merge, and the system handles ordering for you.

Git was a major step down, UX-wise, when we switched from Darcs back in 2008, and it's still less user-friendly today. (It was also a major step up in some ways: Darcs, at the time, had a huge performance edge case where conflicts were sometimes effectively unresolvable because they took too much time to compute.)


I miss Darcs too. One potential downside of making cherry-picks very easy is that abusing them can bite you: two patches might be independent in terms of conflicts but functionally dependent (e.g., code in patch A calls a function introduced by patch B).


It's not quite the same.

Here is a very short video on a project called "Camp" that stalled out, nearly a decade ago. I think it very nicely explains how the user interface differs:

https://www.youtube.com/watch?v=iOGmwA5yBn0

The most important thing is that because there is no DAG, when you say "Darcs, pull this patch for me", like saying `git cherry-pick ABCDEF` -- the dependencies are automatically computed and pulled as well. You can sort of imagine it as if you had a git branch and ran 'cherry-pick' on one of the commits to bring it into your 'master' branch (because you wanted it). But rather than pulling that one thing, 'cherry-pick' implicitly traverses the dependent patches and picks them all as well. And because there is no DAG, a dependency doesn't mean "parent commit". It means "the other patches that are mathematically required for this patch to work out". That means cherry-pick always works: you never have to calculate the dependencies yourself. To merge a patch is to implicitly merge all of its dependencies.

I've spent plenty of my time as an OSS maintainer dealing with merging multiple bug fixes from a development branch into stable branches. For example, a bug may already be fixed in HEAD when it's reported, but not STABLE, so you want to pull changes from HEAD into STABLE. Many times this requires multiple, carefully curated sequences of 'git cherry-pick' in order to correctly get the dependencies right. For example, the author may have made a small refactoring, then implemented the bugfix on top of that. Or it requires a complete reformulation or re-commit of a new change that matches the STABLE branch.

In a sense: this never happens with Darcs. If there's a bugfix, I say "Get me that bugfix patch". It always gets every dependent patch that is necessary, and never anything more. Every time. It always just works. Remember: no DAG. You aren't traversing parent commits. You are, in a sense, finding the transitive closure of "patches that cannot commute with this patch" (IIRC). That means: if a given patch does not commute with this patch -- i.e. the two must be applied in a certain order, so there is a dependency -- then you also need that patch. And you need to apply that rule to that patch, and to every patch it depends on, and so on and so forth (hence 'transitive closure')...
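Roughly, the closure computation looks like this (the dependency table here is hypothetical and hand-written; Darcs/Pijul derive the non-commutation relation from the patches themselves):

```python
# Hypothetical dependency table: patch -> patches it cannot commute past.
deps = {
    "bugfix":   {"refactor"},   # the fix was written on top of a refactor
    "refactor": {"initial"},
    "feature":  {"initial"},
    "initial":  set(),
}

def closure(patch, deps):
    """All patches that must be pulled along with `patch`."""
    needed, frontier = set(), [patch]
    while frontier:
        p = frontier.pop()
        if p not in needed:
            needed.add(p)
            frontier.extend(deps[p])
    return needed

# Pulling the bugfix drags in the refactor and the initial patch --
# but never the unrelated feature.
print(sorted(closure("bugfix", deps)))  # ['bugfix', 'initial', 'refactor']
```

The "never anything more" part falls out of the same computation: `feature` is not in the closure, so a `darcs pull` of the bugfix leaves it behind.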

This allows a very powerful form of development, where features and bugfixes can coexist. But they do not necessarily need separate 'branches', so to speak. Merging a feature into a repository implicitly pulls its dependencies, and the same goes for bugfixes. The net effect of this is that Darcs almost always gets merges correct, or it fails to do the merge at all. This kind of means that merges are sound ('kind of' because I don't know about an actual soundness proof, but the intuitive idea roughly is right): if Darcs pulls off the merge, then it's always correct, but it may not be able to always actually do that merge (perhaps not every merge is actually sensible, in the theoretical view of things, or perhaps the merge is sensible but the model doesn't allow it to handle that case).

The "fails to do so" is the tricky part, where Darcs 1 originally went exponential in some cases, though Darcs 2 mitigates this. It looks like Pijul will finally nail this problem dead, although admittedly I haven't looked over the theory.

Side note: Camp was originally envisioned to be the successor to Darcs, or at least the basis for "Darcs 3", using Coq to build formal proofs about the underlying patch theory to show it worked out correctly and avoided the hairy bits that plagued Darcs 2. Unfortunately, it never panned out that way (due to time and lack of funding). The project was actually started by Ian Lynagh, who worked at my current company before me and was one of the founders.


> If there's a bugfix, I say "Get me that bugfix patch". [..] It always just works.

> Darcs almost always gets merges correct, or it fails to do the merge at all.

One of these is not like the other, which IMO is the problem with "magical" merging systems. Great when they work, f*cking hell nightmare when they don't.

I'd rather have something like git that works in normal usage all the time, and when it fails, is easy to fix. YMMV.


If you try to analyse merge systems mathematically, git's merge system is the "magical" one. It is just a heuristic algorithm, with no solid property you can rely on.

In contrast, Darcs and Pijul's merge are associative, and Pijul's merge is commutative. Even if you don't like maths, this means that they will always behave deterministically. This also means you can use them in scripts, although darcs might sometimes have performance problems (pretty bad ones, actually).

In git, you can get the following: https://tahoe-lafs.org/~zooko/badmerge/simple.html
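The difference in guarantees can be illustrated with a toy model where a repository state is just the set of applied patches and merge is set union (real patch theory is much richer than this, but union already has the algebraic properties being claimed):

```python
import itertools

# Toy model: a "state" is the set of patches applied. Merging two
# states is set union, which is associative and commutative -- so the
# result cannot depend on the order in which branches are merged.
base = {"p0"}
a = base | {"pa"}
b = base | {"pb"}
c = base | {"pc"}

def merge(x, y):
    return x | y

# Every way of folding the three branches together yields the same state.
results = {frozenset(merge(merge(x, y), z))
           for x, y, z in itertools.permutations([a, b, c])}
assert len(results) == 1

# Contrast: git's three-way merge is a heuristic over snapshots and a
# chosen merge base; nothing in its model guarantees order independence
# (hence results like the badmerge example linked above).
```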


FWIW, this is traditionally quite possible with Darcs. I sort of misstated it in my original post; it's not like it just gives up and you're at square one. The workflow IIRC was basically the same as Git: it'll throw its hands up and you make a new commit to fix everything. So that isn't really a problem or any different. Note that Darcs 1 did have the exponential merge case on top of this, however, which was pretty unfortunate (and really a byproduct of the design of the change format, among other things).

In all honesty, given years of experience with Git, and having fondly used Darcs as my first version control system: I still think merges are absolutely the one thing Darcs beats Git at, hands down. When it works and does its job, it is always correct. When it doesn't, you can bail it out. Not much different, but the "always correct" part and the dependencies being implicit are what make it good. I estimate Darcs could have saved me at least dozens of hours of hair-pulling when doing STABLE merges... Git's still good. I wish it could do that, though...

Your note about git is interesting. In fact, Git is, in at least some cases, more magical than other VCSs in the merge department. You might just not be aware of it due to being so familiar. When I say "Darcs always gets the merge correct", I don't just mean it literally finishes with exit code 0, but also that the semantic model is, in some sense, more 'correct' or 'intuitive':

http://r6.ca/blog/20110416T204742Z.html

Darcs (and others) always get this 'merge associativity' case correct, where 'Base+A+B' where (+) is merge is associative (so it doesn't matter how you 'bundle' the changes or whatever). That means you have fewer edge cases to worry about. And to be fair, I don't think there's anything inherent about Git where this particular case can't be fixed. It's just a good example of why people are trying projects like Pijul/Darcs at all, so these things can be formalized and understood. The theory of patches is actually rather rich and helps formalize a lot of these notions of what a "merge" really is in an algebraic sense, how patches relate to one another, etc.


It is different. When you commit in darcs, you cherry-pick by default.


I don't understand, you will have to use more words. I've never used darcs, and as a git user I don't see how you could cherry-pick by default for committing. By default, I would say you… create a commit object with the author/date/message/tree/parent metadata recorded in it.


When you type git pull, it has a remote and a ref at that remote, and it attempts to merge (ff notwithstanding) the DAG you have with the DAG the remote has.

When you type darcs pull, darcs lets you pick and choose which patches (subject to the dependency constraints) you want to pull in. Those patches then get applied however darcs wants to apply them (again, subject to dependency constraints, of course). Because you do not necessarily need to pull all of them, you are always "cherry picking" by default.

This is different from cherry-picking in git, because when you cherry-pick in git you still have 2+ commits that exist in the context of a DAG; you're just transplanting the contents elsewhere to create a new commit in a different position in the DAG.


I still don't get it: pulls don't merge DAGs, they add objects and move refs (and maybe merge); every time darcs fans say 'patch' (as a collection of related changes) I think, 'oh like a branch?'

This sounds less and less like a tools/implementation thing and more like the default recommended/enforced workflow thing.


To start by adding to your confusion: one of the early problems encountered in darcs<->git interaction actually was that sometimes a darcs patch acts more like a git branch than a git commit...

It might help to compare the "identity" structures of git commits versus darcs/pijul patches. In a pseudo-C, you can see a git commit as something like:

    struct commit {
      string author;
      string description;
      tree_id tree_snapshot;
      commit_id[] parent_commits;
    }
If any of those fields change (are amended), you have a new commit.

This is a directed acyclic graph (DAG) because of that `parent_commits` link from one commit to the immediate previous parents (it can be multiple parents in the case of a merge commit). Git can only just move refs on a pull in the case of a "fast forward" when the remote branch is "simply" ahead of the current branch and all of its new commits "point to" the last commit in the current branch. Every other case it is a merge of the graph (via a merge commit with two or more parent commits).

(While git outputs a diff as the representation of the commit in places like `git show`, a commit doesn't store the diff but instead a link to a snapshot of the tree at the time of the commit.)

For something like darcs/pijul, the identifying information of a patch looks something more like:

    struct patch {
      string author;
      string name;
      change[] changes;
    }
If any of these things are changed (amended) you have a different patch.

This may seem like semantic quibbling in that the patch here actually contains the diffs as a part of its identity rather than a snapshot of a source tree, but that's not actually the important difference.

The important difference is that the context of the patch is no longer a part of its identity: there is no "parent patch" information, and the change structures don't directly refer to previous changes.

The reason that difference matters is because in the darcs/pijul models the context of the patch is more "metadata" about the patch than a direct part of the patch. Patches aren't "nailed" to a graph like a commit is, they "float in a basket" together. Darcs and pijul do the work to figure out which patches need to be in which order in a branch/repository.

This can be a nightmare to someone expecting a strict graph. Darcs and pijul can and will reorder history during a pull. You can see "newer" patches float down under "older" patches in the patch log as the systems work to build a stable sort of patches.

That movement, however, is also where the systems draw the most strength. That movement of the patches can be seen as a continual, rustling "cherry arranging" as the systems work to figure out the minimal set of previous changes that patch needs in order to exist.

If you cherry-pick a commit in git you copy the changes from that commit to the new branch (a new spot in the DAG) into a new commit with its own new identity. Down the line when you go to reintegrate/remerge the branches between the original branch and the cherry picked branch, git doesn't see the same commit/change and its merge can (in my experience, will) see conflicts in the exact same change made in different contexts.

When you cherry pick a darcs/pijul patch, you bring over the same exact patch and the system lets you know any other minimal dependencies that you need and brings them over as well. When you reintegrate/remerge the cherry picked branch, those exact same cherry-picked patches are already "in" the original branch and so don't necessarily need to be remerged/rearranged again.
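One way to see the identity difference concretely is to hash the two pseudo-structures from above (these are stand-in dicts hashed with SHA-1 over sorted JSON, not git's or pijul's actual serialization formats):

```python
import hashlib
import json

def ident(obj):
    """Stable content hash of a dict -- stand-in for a commit/patch ID."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()[:8]

# Git-style identity: the parents (and resulting tree) are part of the
# commit, so the "same" change cherry-picked onto another branch gets a
# brand-new ID -- and later merges see two unrelated commits.
fix_on_dev    = {"msg": "fix bug", "tree": "t1", "parents": ["dev-head"]}
fix_on_stable = {"msg": "fix bug", "tree": "t2", "parents": ["stable-head"]}
assert ident(fix_on_dev) != ident(fix_on_stable)

# Darcs/pijul-style identity: no parent pointer in the patch itself, so
# the cherry-picked patch is literally the same object in both branches.
fix_patch = {"author": "me", "name": "fix bug", "changes": ["..."]}
assert ident(fix_patch) == ident(dict(fix_patch))
```

That is why reintegration after a cherry-pick is cheap in the patch model: the branch being merged already contains the identical patch, so there is nothing to re-reconcile.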

You can duplicate git workflows on top of darcs/pijul, but it is very hard to duplicate some of the more interesting darcs/pijul workflows on top of git. Among other things, rebase/cherry-picking merge hell is a very real problem in the git ecosystem, whereas darcs/pijul almost seem like crazy smart magic in comparison when it comes to some of the scenarios where you might rebase or cherry-pick.

It might be something that won't entirely make sense until you try experimenting with it yourself: you might want to take darcs for a spin for a small project or two. I think you can feel a lot of the difference as you use it, especially as you start to push/pull between branches/repositories.

(Anecdotally, my workflows are quite different on darcs versus git, knowing that typically I could fix a bug discovered elsewhere in the code in the middle of a bigger project, without needing to branch, I would often just record that change into a tiny patch on its own right there on the spot, and generally know that if I needed to get just that one patch into another branch I could rely on darcs to cherry pick it for me later.)


  Down the line when you go to reintegrate/remerge the
  branches between the original branch and the cherry picked 
  branch, git doesn't see the same commit/change and its merge
  can (in my experience, will) see conflicts in the exact same
  change made in different contexts.
Finally I see some common ground: by tracking changes separately to commits, users won't see merge conflicts when the commits get moved.

Sadly, git also recognises this on the user's behalf, so likely your experience was due to some other delightful quirk of the git UI.

edit: I'd also recommend not calling trees 'tree snapshots', because that will confuse people familiar with trees. Same for 'cherry-picking a commit', since from your description darcs seems to use 'cherry-picking' to mean 'fetch a set of changes from someone else', which maps to 'fetch a branch' in git-land. 'git cherry-pick' means, 'copy a single change from one local branch to another', so has almost no overlap.


Obviously there are plenty of anecdotes on both sides, but I had to (try to) force a moratorium on git cherry-pick between long running branches at a previous job because merge problems became a huge sink of time. (This was after they'd already had the same problems in TFSVC and seemed adamant to recreate that problem in git.)

You seem to think it's "fetch a branch", but I'm trying to tell you that the `darcs pull` experience is a lot more like doing `git fetch && git cherry-pick origin/TIP --interactive` every time than `git pull`, but with a much, much better merge experience than that implies.

I'm not trying to confuse different concepts, I'm trying to show that the hard concept in the git case was the easy concept in the darcs case.


How does the fetch part know which objects to grab? (Answer: because it cared about the DAG first.)


That's begging the question: git-upload-pack takes a ref and a commit ID, and returns a pack of objects to the requester to do as they like. I wouldn't call it a merge, but a set of operational transformation actions to be added to the requester's repo. It may not even examine the DAG at all, since it's possible to save those packs as 'bundles'!

My point is, if you focus on the implementation differences, my understanding won't increase because we don't have the same mental model of how git works under the hood.


Git stores diffs, not snapshots. A commit is a changeset, literally a patch that you can export with `git diff`. Ordering is layered on top of that and informs things like merges.

Can you explain a little more what you mean?


Internally, git objects are snapshots -- the differences are computed on demand. That is a critical difference, because it has implications for how merging can work. For example: https://tahoe-lafs.org/~zooko/badmerge/simple.html

Even when using git am or send-email or whatever, yes, you're sending a patch -- but the way to apply that patch is to turn it into a commit and then cherry pick or rebase or merge or manually fix conflicts or whatever. In darcs and pijul, the model is _always_ that set of patches.


So, I didn't believe you because my internal mental model of a git repo is a DAG of changesets. And indeed, that is a good way to think about it, because almost all of git's operations behave like this. Commit history is almost always presented to the user as a series of diffs.

But, you are correct. Internally, git stores the full contents of files and computes the diffs on the fly.

For others' benefit, if you want to test for yourself, create a new repo, add a file and make a series of commits with changes. Git objects are compressed with DEFLATE (zlib) so gunzip and unzip won't work. I used https://github.com/jezell/zlibber because I was too lazy to write my own quick zlib wrapper. Then doing

    for o in .git/objects/*/*; do cat "$o" | inflate ; echo ""; done
lists the contents of all the git objects. Notice that there are no changesets, only full copies of the file you modified at different states.

This was surprising to me, since I had a very different model mentally. I still think the DAG of diffs is the better model mentally, but it is worth understanding that this is not what git is actually doing under the hood. It explains issues that arise doing rebases, cherry-picks, etc.

I now also understand the motivation behind Pijul. If I understand correctly, Pijul does use a collection of changesets as the underlying model. Like you say, that can be a critical difference.


I thought git stored diffs too. But I don't understand what difference it makes, isn't the commit storage format an implementation detail? If you want to get to B from A, then I can store B or store B-A. When I have B and want to show a diff, I can calculate B-A. When I have B-A and want to show a diff, it's a no-op. When I have B and want to work, it's a no-op, but when I have B-A and want to work, I have to populate the working tree with A + B-A. It doesn't feel like there's a fundamental difference in the model either way.

But if you don't have A in the first place, then there's a real problem, how to take B-A and reconstruct B, without knowing what A is. You have to find A, and there may be multiple acceptable A's. It seems like the fundamental difference here would be not having a strict "parent" for any given patch. I can see why that would make some workflows a little nicer, not being forced to rebase, but I don't see any massive advantages -- as a user what does this really buy me? Does it enable some things like that are impossible with git? Or does it mainly make some advanced git workflows easier?


Maybe lvh or one of the others who has more experience with darcs or Pijul will chime in. They've probably spent more time thinking about this.

One difference is that by storing the patches only you can understand more clearly what the intended change was. When you store the whole file it is easy to compute the difference between A and B, but may be impossible to compute the correct differences between A, B, and C. By storing the whole file you now have to consider all the possible differences between them, not just the ones introduced by the commits you are trying to merge.

I would have to play around with it, but I know there are scenarios involving rebase, revert, and cherry-picking commits that can cause trouble in git that I now understand comes because of the fact that git is storing contents, not diffs.

One that I've run into regularly is cherry-picking commits from a dev branch into a master branch to hot-fix bug fixes directly into a prod release instead of waiting until dev gets merged as part of our regular process. If I had commit A on dev and cherry-pick it to master it creates a totally new commit A-1 that becomes part of the history of master. We lost the fact that A and A-1 represent the exact same changeset. Depending on what the changes are, and what further changes happen on dev afterwards, this can cause failed merges requiring manual resolution when dev does finally get merged into master.

I imagine that would not be a problem for Pijul.


I think you've hit the nail on the head :)


Those reasons make sense. Would it be fair to summarize all that as "better merging with fewer merge conflicts"?

Git certainly has some room to improve in the merge conflict department. I looked at the bad merge example you posted -- I suspect I've hit that before. It's rare, but yeah it's there.

I also frequently notice that git complains about merge conflicts, while the custom diff tool I use to resolve them says there's no conflict, and I don't actually have to do anything. Good reason to use a custom merge tool with git.

But, given all this, is this really all an outcome of patches vs snapshots, or is this just git's merge algorithm being suboptimal? Certainly git could selectively ignore the DAG when merging, couldn't it? Even after reading the other comments here, it still seems to me like git has more information when merging than the "patch-based" workflow of darcs & Pijul.

It seems to me like there's a language problem with trying to draw a distinction between patches and snapshots. Git is still storing and transferring patches at the tree level, even if it's not happening at the file level. Git does not store a commit as a literal zip of the entire tree; unchanged files share the same blob objects, so only the changed files produce new objects. It would be fair (but not standard or common) to call the overlay of changed files a "patch" or a "diff". People do still use git format-patch, and email git "patches" to each other. So it's inherently confusing & problematic to talk about git and say that it doesn't use patches.

What does make sense to me is the distinction of having a strict DAG vs not having one -- is that actually what people mean when they talk about snapshots vs patches? Am I tripping on it because I'm being too pedantic about what a "patch" is?


Right, the raw representations are convertible (at some point a patch-oriented darcs/pijul has to build a snapshot so that it can build a working tree; at various times you want to see the diffs in git or format a patch file to email). It does have more to do with the representation of change context both between patches (strict DAG versus algebraic sets with looser change context models), and even to some extent within a patch (in a classic diff the tools use hardcoded line numbers; in something like darcs/pijul even the line numbers of a patch aren't necessarily taken as a given and are a part of the context of the change).


The difference is more than just the storage format. You could compute snapshots on demand and just store the diffs (that'd be impossibly slow, but you could), but you still would have git's DAG instead of darcs' theory of patches, with all of the consequences of that. (I'd explain them here, but I've already explained it in sibling comments on this thread).


A video referred to by another user on this thread puts it very clearly: https://news.ycombinator.com/item?id=13645102


That is the most basic storage format in git, but it has packfiles too where it uses deltas. But not necessarily changesets' diffs!


What other operations behave as if it's at its core diffs and not snapshots?

merging, rebasing, committing... all operate on refs. You might think you're transplanting changes (and you are), but the inputs are refs, and the outputs are refs. As you mentioned, refs are unambiguously snapshots.


Git does store snapshots, not diffs. Look into the files in .git/objects/* ;)


Not sure why you're getting downvoted, you're describing git correctly.

If anyone is questioning this, just play with `git rebase -i [some old changeset id]` and you'll see that it's just an ordering of patches.


git rebase -i does not demonstrate that git's internal model involves a DAG of snapshots (objects); it only demonstrates that git is sometimes willing to move the contents of one of those snapshots around to create a new snapshot. That is very different from a patches-always model, as I have illustrated in a sibling comment with an example link.


rebase turns the snapshots temporarily into a stack of patches, lets you play with them, then turns them back into snapshots.

In fact, this is why rebase is an out-of-band tool that has odd effects on shared history - specifically because it's inverting git's model into something more like pijul's, and therefore isn't really native-to-git.


I guess, you will see a lot more parallel branches with Pijul compared to git.

Assume you change file A, commit, then change file B, and commit. In git there is a dependency from the second to the first commit, because the state of the second is derived from the first one. However, the changes are unrelated because they are in different files. Pijul understands them as parallel unrelated changes (although with different time stamps).

(disclaimer: I infer that from using darcs many years ago. I never used Pijul)


Well, it depends. In practice, I think the number of branches is similar.

In theory, the number of possible branches is much, much greater (arguably infinite) for git. Commits have their own metadata, so just amending creates a new commit. Patches are immutable, and there's no independent artifact like a commit that incorporates the current set of patches.

Furthermore, darcs (and I assume pijul) absolutely let you make multi-file patches.


> How does this compare to git?

Apart from the other answers that go into theories of patches vs. commits, I always found Darcs much more intuitive to actually use than Git. It has fewer commands that do more intuitive stuff. To record a patch, you do "record" instead of separate "add" and "commit" steps. To revert some changes you have not recorded and that you want to get rid of, you do "revert" instead of "checkout -- filename". The "diff" command works more intuitively than git's "diff", which you sometimes have to use as "diff --cached" due to its staging model.

Another thing is that every branch is a separate copy of the entire source tree in your file system. A drawback is that this can be viewed as wasteful, but the advantage is that it's much much easier to work in parallel branches at the same time since changing the branch is just doing "cd" instead of some boring dance of "stash" and "checkout". You also don't get problems due to files intended for one branch still lying around after a "checkout" to switch to another branch.

(I'm assuming that Pijul preserves all or at least most of these properties of Darcs. The docs aren't very exhaustive.)


That's a bit involved, but I believe that the intent is that Pijul is to Darcs as Git is to Monotone.


"Because Pijul is based on a mathematical model of collaborative edition, its behavior matches intution, every time."

That's not how math and/or intuition works.


Math itself isn't necessarily intuitive in every case, but I think their point is this: if their system follows the simplest (i.e. most general) mathematical model, things will be more intuitive (i.e. more special cases == more difficult to reason about).

Let's look at an example from math: integers and addition. Addition is pretty general -- there aren't, say, weird special cases when one operand is even, or the current date during calculation is Friday the 13th. Addition is associative and commutative, so I can evaluate a long summation in any order I want, or I could chunk up the calculation and have multiple computers evaluate parts thereof, all without any coordination/locking. It's easy to reason about, because the rules are so general, and generalization is what math is all about.

Now let's look at software: packaging. What are the semantics for package installation for your language/OS? If you install package A and then B, do you end up with the same result as you would by installing B and then A (i.e. is installation commutative)? Many (if not most) package managers can only support one installed version of a package at a time, and thus installation can not be commutative: installing a package will pull in dependencies that will influence package constraint resolution in subsequent installations, so order does matter. Now you have to be careful not to fuck that up when you set up your cluster's configuration management (or, hell, just get what you need installed on your laptop so you can work on a new assigned project). Now, if the package manager in question supported multiple installed versions of a given package (and had a way of "activating" only a subset of all packages for a given project/application), installation would be commutative, freeing you of the burden of installing things just the right way and in just the right order. This is how Nix (OS pkg manager) and the latest Cabal (Haskell pkg manager) work.
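The packaging contrast can be sketched in a few lines (hypothetical package data; `install_single` and `install_store` are made-up names illustrating the two designs, not any real package manager's API):

```python
# Single-slot model: one installed version per package, last writer
# wins -- so installation order changes the final environment.
def install_single(env, pkg, version):
    env = dict(env)
    env[pkg] = version           # clobbers any existing version
    return env

e1 = install_single(install_single({}, "libfoo", "1.0"), "libfoo", "2.0")
e2 = install_single(install_single({}, "libfoo", "2.0"), "libfoo", "1.0")
assert e1 != e2                  # not commutative

# Nix/new-Cabal-style store: all versions coexist and projects activate
# the subset they need. Installation is set union, hence commutative --
# order cannot matter.
def install_store(store, pkg, version):
    return store | {(pkg, version)}

s1 = install_store(install_store(frozenset(), "libfoo", "1.0"), "libfoo", "2.0")
s2 = install_store(install_store(frozenset(), "libfoo", "2.0"), "libfoo", "1.0")
assert s1 == s2                  # commutative
```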

So, yes, coming up with a simple, consistent mathematical model for the semantics of your target system will definitely make things more intuitive. It's precisely because most developers are terrible at mathematical reasoning that so much software is so difficult to reason about -- there are tons of unnecessary special cases, when hidden inside all the tangled logic there's secretly a simple set of axioms and theorems that lend themselves to intuitive composition.


I think you showed the opposite point - you've argued commutativity is intuitive, but certainly non-commutativity can be part of a mathematical model.


Of course non-commutative can be a part of a mathematical model; no, I'm not showing the opposite point.

For clarification, my point is this: if some operation can be proven to be commutative, but you don't acknowledge that it is, you've just made things more difficult for yourself (and maybe others). It would be absurd if someone worked really hard to evaluate a_1+a_2+...+a_n strictly left to right, when there might be a more convenient (from a human grey-matter standpoint) order to evaluate those numbers. That's because we know that addition here is associative and commutative. Uncovering these properties is what math is all about -- discovering these truths and exploiting them to good effect.

If you can prove that something is not commutative given your axioms, you can attempt to revise your axioms -- maybe one axiom was redundant and only served to impose unnecessary constraints on your model. If you can't find a revised set of axioms that satisfy what you need, congrats -- you've found the simplest system you could come up with -- though someone might, down the road, come along and show you that there was, indeed, a simpler set of axioms that you couldn't envision. That's the process of mathematical development.

If you're good at math, you'll discover the axioms you need to get the generality you're looking for, and if commutativity is to be had, you'll find a way to get it. If you find no way to get you some commutativity, you at least avoid pretending that you have it (that is, you don't write buggy software).

Good math skills will either get you commutativity (which is intuitive) or prove non-commutativity (intuitive again, because you're avoiding bugs) -- it's win-win.


A simpler explanation to that of my sibling post -- your point is akin to this fallacy:

Susan: My brother plays such great music with his violin!

Bob: Actually, I think you're arguing the opposite point -- a violin has strings that can be mishandled such that they cause annoying screechy sounds, so surely he's capable of producing a cacophony of screechy sounds. Not so enjoyable.

Susan: Okay... but he's a good musician, so though he could fuck up a performance if he wanted to, he doesn't -- he plays to the best of his abilities.

If a math model claims that something isn't commutative, it's either as simple/general as it can be, or the creators of that model are bad at math (they left commutativity on the table due to bad axioms, or their theorem that the given operation was not commutative was wrong).


I have no idea where we're going with this.

Your original argument was that mathematics can model things that we might arrive at intuitively, without looking at the model - such as assuming commutativity in package manager installation order - correct?

I'm simply saying that mathematics can also describe things which are surprising; counter-intuitive even, so I agree with 'higher' comments in that being:

> based on a mathematical model of collaborative edition

is not sufficient for having that:

> behavior matches intution, [sic] every time

It just doesn't follow. Not least because one man's intuition differs from another's.


Emphasis mine:

> Your original argument was that mathematics can model things that we might arrive at intuitively, without looking at the model - such as assuming commutativity in package manager installation order - correct?

No, certainly not -- the right choice of model is key (a trivial example: one system for strings would be one in which there wasn't an identity element (that is, the empty string), thus making strings non-monoidal; that would complicate matters like concatenating a list of nullable/optional strings). I'm not claiming that to be the case, nor do I think the Pijul authors are claiming that. I think we both would say that there exists a model of any given domain that is the most suitable for that domain (not that just any given model will suffice), and that it's math that helps you discover it. I read that blurb on the site as (the slightly tautological) "because we chose a good mathematical model, you can expect that the entities and operations on those entities can be composed as advertised, rather than unintuitively yielding unexpected results (bugs) or unnecessarily prohibiting some composition of operations that is clearly logically sound (which is also unintuitive -- why the special cases, when these things should compose?)".
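To make the string-monoid aside concrete, a minimal Python sketch: with the empty string as the identity element, concatenating a list of nullable strings needs no special cases, because a missing value just maps to the identity.

```python
maybe_strings = ["foo", None, "bar", None, "baz"]

# With "" as the monoid identity for concatenation, a nullable string
# maps cleanly onto the identity element -- no special-casing needed.
result = "".join(s if s is not None else "" for s in maybe_strings)
assert result == "foobarbaz"
```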

> I'm simply saying that mathematics can also describe things which are surprising; counter-intuitive even [...]

Sure. An example: non-commutativity in package managers that only support one version of a given package being installed. As someone who uses more flexible package managers, I'm often surprised when I'm using another package manager and discover its operations aren't commutative (which, as I discover shortly thereafter, is inevitably because it only supports installing one version of a given package at a time). That model requires extra brainpower to think through how I'm going to coax the package manager into installing what I need without conflicts (of course, after I've jumped through hoops to uninstall all the bad versions first). A mathematical model can be given for these systems, and they surely are convoluted and bad -- but just because we can come up with convoluted messes in math doesn't make math any more antithetical to intuition, that's just user error.

In this case, the package manager developers never set out to codify the formal semantics for these systems -- they just grew organically from initial needs. If they had started with an explicit mathematical model and iterated on that model, and assuming they had any math proficiency here, they would have done the convenient thing and allowed multiple package versions, and would consequently have commutative package installation (and I wouldn't have a notepad full of notes on how I need to carefully serialize my installs so that I don't get conflicts).
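The difference can be sketched with two hypothetical toy models (not any real package manager): a single-version store makes installs conflict and order-sensitive, while a multi-version store makes an install a set-union, which commutes.

```python
def install_single(store, name, version):
    """Single-version store: a second version of the same package conflicts."""
    if name in store and store[name] != version:
        raise ValueError(f"conflict: {name} {store[name]} already installed")
    new = dict(store)
    new[name] = version
    return new

def install_multi(store, name, version):
    """Multi-version store: installs are set-unions, hence commutative."""
    new = {k: set(v) for k, v in store.items()}
    new.setdefault(name, set()).add(version)
    return new

# Multi-version installs commute:
a = install_multi(install_multi({}, "libfoo", "1.0"), "libfoo", "2.0")
b = install_multi(install_multi({}, "libfoo", "2.0"), "libfoo", "1.0")
assert a == b

# Single-version installs don't even compose here, let alone commute:
try:
    install_single(install_single({}, "libfoo", "1.0"), "libfoo", "2.0")
except ValueError:
    pass  # conflict, as described above
```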

>> based on a mathematical model of collaborative edition

>is not sufficient for having that:

>> behavior matches intution, [sic] every time

Sure, not sufficient, but necessary (unless you count the possibility of just randomly stumbling into the best model). It's also necessary that you choose the best model for what you want (where "what you want" is surely a subjective matter, but efficacy can often be measured objectively in terms of, say, how much time is spent doing the same thing in two systems (assuming a similar level of mastery in both systems)).

> It just doesn't follow. Not least because one man's intuition differs from another's.

Ok, intuition is a subjective measurement, but I think there's still value in trying to find a pattern in what people generally find intuitive, rather than dismiss the topic entirely. I'm suggesting, anecdotally, that fewer arbitrary edge-cases is easier for human brains to deal with, and I think few would argue with that. Math is, to a large extent, the process of shaking out those generalizations from a bunch of concrete observations, so I would surely trust a system where someone could point out their logic in the construction thereof, over some system where the authors shrug and say "I dunno, that's just the way I built it." (which is most software I've come across). Seeing that note on the Pijul site inspires confidence: even if their model isn't maximally generalized (yet), I know that it's something they have an appreciation for, so I can come in and propose improvements (the same cannot be said for projects where the leadership can't appreciate such proposals due to a lack of the mental/mathematical framework necessary to conceive of the positive consequences thereof).

Given my heuristic for intuition ("as few edge-cases as possible"), it would seem that math would be requisite here. Do you disagree with that, or do you perhaps interpret that blurb from the Pijul site as claiming that (any unqualified) application of math is sufficient for developing an intuitive system?


> It's precisely because most developers are terrible at mathematical reasoning that so much software is so difficult to reason about -- there are tons of unnecessary special cases, when hidden inside all the tangled logic there's secretly a simple set of axioms and theorems that lend themselves to intuitive composition.

Have you ever developed software outside of academia? Special cases arise when your perfect mathematically structured snowflake is put in the hands of actual users who want to do actual work with it.

A buddy of mine who did work with JPL has told me that NASA uses a decades-old FORTRAN app to calculate orbits. When I asked him why they didn't port it to a more modern language, he said it was because the system has so many special cases and fudges that porting would take forever, and the new system might not be able to replicate the behavior of the old one.

Of course, having a logical basic architecture (for example, a structure that encodes special cases as configuration) can make a project easier to maintain as it matures.


I think they're implying that the mathematical model creates a pattern of behaviour in the software that's intuitive.

For vim users, that's how vim keybindings work, for example (not everyone would agree vim is intuitive, hence I qualified the user-base). If you learn vim's keybindings, they're like axioms. You start to intuit the patterns like you would a language, eventually you don't think about how to do something, your fingers intuitively know.


Also, supposing it were, git is also mathemagically intuitive when you view commits as isomorphic contours in source-code phase space (http://tartley.com/?p=1267), so mathematical foundations alone are not enough to distinguish Pijul.


If you want to be good at math, you MUST gain an intuitive understanding of it.


Indeed, but if one can accurately deduce what the behavior will be from a few relatively simple rules, then that is an excellent basis for forming an intuition.


That's a deduction… intuition is usually drawing conclusions based on pattern-matching on similar situations (e.g. 'how do I do this in git/svn?').


A fair distinction, but I'm not claiming that deduction and intuition are the same. Rather, I'm saying it is easier to change the patterns you expect if the new patterns are simple and regular.

If by intuitive we mean "matches the patterns that other people expect (e.g. git/svn)" then obviously Pijul is not intuitive. But if once one learns the basics, it becomes fairly easy to infer what the tool will do in more advanced cases, then one might say it is at least "intuitable".


What is "collaborative edition"? Does it mean the same thing as collaborative editing?


Exactly so. It's not strictly correct modern English as far as I know, but it's quite a common slip amongst French speakers (from their names I would strongly suspect at least one of the core Pijul folks fits that category) for whom the native word is indeed "edition".


Is there any sound reason to use AGPL3 license in this case? Is it to avoid private, non-public modifications by some future pijulhub.com? Something, somewhere feels irrational in my head, especially when I think about popular projects like SQLite which seem to be doing just fine without the whole license thing.


The case is simply that the authors probably have a different view of licensing, and wish that their code remains free software even when integrated in other products. It probably wouldn't help that much in the case of something like Github because Github uses their own version of git.

Depending on the outlook, AGPL is simply there to fix a loophole in the GPL, and rightly so.


Hopefully they release "workable pijul" soon (they said on January 10th that they were almost there) [0]; they haven't updated the source in the darcs repository since June 1st.

[0] https://pijul.org/2017/01/10/first-working-pijul.html


Last time I checked, the Darcs project had mostly fixed the performance issues. Shouldn't it be better to concentrate and unite their efforts?


The challenges with darcs' performance lie at the core of its theory of patches; IIUC, there's no way to make exponential merge go away without also breaking darcs.

pijul changes the theory, and does so in a way specifically to avoid slow algorithms.


This isn't the first time that an entirely different patch algebra has been tested outside of darcs itself (camp is mentioned in other posts here, darcs-2 versus darcs-1 was sort of built this way too). Sometimes it can be easier to prototype and/or try entirely different directions without a backwards compatibility liability, and then come back to the "parent" project with working code and figure out the migration strategy then.

One thing I see here (as a long time reader/follower of the space) is that pijul has been using other languages than Haskell and there's some thought that a more approachable language might bring in more developers outside of the subset of Haskell developers that have been darcs' main source of development experience.


Do I understand it correctly if I say this is like the event-sourcing model applied to version control?


Yes; "event sourcing" is a niche term for "maintaining a change log" which is quite literally the idea: keep patches, not states.
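A minimal sketch of that idea (a hypothetical patch format, not Pijul's actual one): the state is never stored directly, only reconstructed by folding the patch log.

```python
def apply_patch(lines, patch):
    """Apply a single insert/delete patch to a list of lines."""
    op, index, text = patch
    new = list(lines)
    if op == "insert":
        new.insert(index, text)
    elif op == "delete":
        assert new[index] == text  # deletes record what they remove
        del new[index]
    return new

# The repository keeps patches; the current state is just a fold over them:
log = [
    ("insert", 0, "hello"),
    ("insert", 1, "world"),
    ("delete", 0, "hello"),
]
state = []
for p in log:
    state = apply_patch(state, p)
# state is now ["world"]
```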


For those curious about it as I was: this looks to be inspired by Darcs -- it was originally an attempt to improve on it. It is written in Rust, which gives it all of the benefits (and weaknesses?) of Rust.


Is anyone using Darcs or Pijul at work?


I used darcs for a while a few years back, it was ok. But definitely less polished than git.

Also, it's the only vcs I've ever used where the repo managed to corrupt itself to the point that I lost committed but unpushed changes.


Says it's only text yet - is there a plan for binary? Does darcs work well with binary?


Darcs handles binaries, but not particularly well (i.e., nothing close to the ease and brilliance of working with text documents). Binary merging is not well defined in general by any means, so binary files are handled essentially at the whole-file level. Because darcs patches need to [1] store the previous file state as well as the new/current one, patches in darcs with binary changes can get huge quickly.

Optimizations have been added to darcs over the years including some changes to something increasingly akin to git's hash storage for binary objects, but so far as I know, binary files that might change a lot over time are still mostly discouraged in darcs.

[1] To keep the patches reversible, for commutation.
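A rough sketch of why reversibility inflates binary patches (a hypothetical format, not darcs' actual on-disk representation): to stay invertible, a whole-file binary patch has to carry both the old and the new contents, so each change costs roughly the size of both file versions combined.

```python
def make_binary_patch(old: bytes, new: bytes):
    """A reversible whole-file patch keeps both sides, so it can be undone."""
    return {"old": old, "new": new}

def invert(patch):
    return {"old": patch["new"], "new": patch["old"]}

old = bytes(1000)        # 1 KB of zeros
new = bytes([1]) * 1000  # 1 KB of ones

p = make_binary_patch(old, new)
# The patch is as large as both file versions together:
size = len(p["old"]) + len(p["new"])  # 2000 bytes for a 1 KB file change

# Reversibility: inverting twice is the identity.
assert invert(invert(p)) == p
```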


$ alias pjl 'pijul'

... much better now..


oh what a great idea, let's build pijulhub!


How does this compare to darcs?

http://darcs.net/

Edit: they have some minor text on how it's better, but a more detailed description would be appreciated.


It is heavily inspired by Darcs, and is based on an alternative "theory of patches" that reportedly has much better worst-case performance characteristics.

It is written in Rust, instead of Haskell, also contributing to better/more predictable performance.

The Darcs folks are very aware and supportive of it, and last I heard, Darcs might add support for the Pijul patch format.

This is an extremely exciting project.

(speaking as a long time Darcs fan and passive onlooker who saw the Pijul presentation at rustconf)


I find it very uplifting to see both team exchanging so nicely.

Feels like good research and progress.


Not sure why this was downvoted, it is a legitimate question.

https://pijul.org/faq.html#did-you-solve-the-exponential-mer... answers it, at least partially. There may be a more comprehensive comparison, though.


I looked through the site, and saw some high-level comparisons, but there's not much meat to it.

Where Darcs has this:

https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theo...

Pijul has some text saying "we're better than Darcs in these areas".


It's supposedly faster, or at least has tolerable performance (I've never used Pijul, and never used darcs to the extent that it became slow).


How is this different from, my personal favorite, Fossil? http://fossil-scm.org/index.html/doc/trunk/www/index.wiki


Do you have a moment to explain what makes Fossil your favorite?

As far as I know, the only unique aspect of Fossil is "Integrated Bug Tracking, Wiki, and Technotes".


Those are certainly very nice features to have. For me the syntax and ease of use is what I like. Although that is all subjective to each person of course. Plus the author is a great guy. No ego at all. Plus the license is more appealing to me.


Why don't I ever see CC on OSS? Always these complicated licenses.


https://creativecommons.org/faq/#can-i-apply-a-creative-comm...

> Unlike software-specific licenses, CC licenses do not contain specific terms about the distribution of source code, which is often important to ensuring the free reuse and modifiability of software. Many software licenses also address patent rights, which are important to software but may not be applicable to other copyrightable works. Additionally, our licenses are currently not compatible with the major software licenses, so it would be difficult to integrate CC-licensed work with other free software. Existing software licenses were designed specifically for use with software and offer a similar set of rights to the Creative Commons licenses.


just to understand, have you ever read the BSD license or the MIT license? Both very prevalent in OSS

https://opensource.org/licenses/MIT


AGPL3 sounds like a death sentence given recent RethinkDB struggles, which is just sad, because the project looks very cool.


The reasons RethinkDB the company failed very probably have little to do with AGPL.

http://www.defstartup.org/2017/01/18/why-rethinkdb-failed.ht...


Any sources behind the claim that agpl is behind the lack of success for rethinkdb?

I am very bullish about agpl. As far as I understand, the choice of agpl has no effect on your code which you put in a repository. Is that not the case?


As I wrote in another comment here I'm really surprised by the shallow reasoning of fellow developers regarding AGPL: https://news.ycombinator.com/item?id=13643645

I mean, I might not understand the finer legal details, but thinking "all your code has to be GPL once you use anything GPL" is really weird -- did these people ever use a Linux kernel? (Yes, I know GPL is different from AGPL, and at a high level I also know what the difference is.)


Define "your code which you put in a repository"?

Disclaimer: IANAL. AGPL enforcement is generally a little unclear. If you ship a binary, you have to make the source available. If your binary answers on the wire but you don't ship it, you (under the AGPL) still have to make the source available.

So, uh, yes -- as long as you never use it, which seems a little silly of a definition?


> Define "your code which you put in a repository"?

... Code you add with the equivalent of `git add; git commit`. I.e. suppose git was AGPL instead of GPL, that would have no effect on the license of say, Rust (MIT+Apache) regardless of the fact that git is used in the development.


My apologies, I completely misunderstood your question.

Again, (A)GPL enforcement is unclear and subject of real debate, but my understanding is that nobody thinks virality impacts data being used by the program (e.g. source code in a VCS).


Honest question, although on a bit different topic I guess.

Does AGPL prevents some company creating Github like services for Pijul?


There is serious disagreement between serious legal scholars (Moglen, Rosen) about what kind of cooperation between components would trigger (A)GPL virality clauses. Linking counts according to one, but not the other; IIUC everyone agrees subprocesses don't count.

So, assuming said company would not want to AGPL their code (and assuming they don't distribute binaries), that choice would imply some technical decisions they would presumably not be very happy about, but overall: no.
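The process-boundary pattern both camps seem to agree is safe can be sketched like this (the pijul subcommand shown is illustrative, not verified against its CLI):

```python
import subprocess

def run_vcs(argv, cwd="."):
    """Talk to the (A)GPL tool over a process boundary instead of
    linking it as a library -- the one mode of cooperation both sides
    of the linking debate agree does not trigger virality."""
    return subprocess.run(argv, cwd=cwd, capture_output=True, text=True)

# Hypothetical usage -- subcommand and flags are illustrative:
# run_vcs(["pijul", "record", "--message", "fix typo"], cwd="/path/to/repo")
```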


I think the problem with the AGPL is that we don't know. The virality of the AGPL hasn't really been tested in courts of law yet, and it is easy to assume the worst, especially if you consider trying to need to explain AGPL terms to lawyers in a court case, much less a potentially non-technical juror.


It does not prevent any company from creating a Pijulhub. However, if you make modifications to Pijul or link it as a library into other code, that code must be made open-source.

(And note that GitHub doesn't even use the original git written by Linus, they wrote their own implementation, libgit2.)


Afaik libgit2 is a derivative of git.

Anyways, for Pijul to be successful, there must be something like PijulHub. The bar is higher now. Thus, make it easy to build this hub!


Being open-source doesn't make it any harder to build a centralized source code hosting site, it just makes it harder to get VC funding. :P


It sounds like the authors' motivation is to create something that doesn't need a central hub to compete, to fight centralization.

I expect their idea of success is also very different from github's.


> Does AGPL prevents some company creating Github like services for Pijul?

Not at all, if their service is AGPL licensed.

Of course, many companies may not want to use AGPL, and there's a whole lot of discussion on this page regarding what might/might-not be allowed to work around the AGPL. However, I didn't see anyone mention the possibility of, you know, using the AGPL, which is the entire reason the authors chose it ;)


Not really; it just means the ticket-tracking/wiki/etc. features can't get so cozy with pijul that they become 'derivative works'.



