
I know (or, at least, have known) how git works, in the way most people mean that (the data structures & on-disk layout, what a commit is, what a tag is, what a branch is, what HEAD is, staging, et c.). What I can't keep straight is WTF the commands are actually doing, in that low-level sense, which is a different thing, and there's approximately a 0% chance I'm ever going to use more than a tiny fraction of the commands often enough to remember that information.


definitely agree, and I'm in the same boat. I don't even think the data model of git is that hard to grok at all, it's mostly that commands are very unclear on what they operate on and in particular people get really tripped up about how many levels of state there are (stage, working tree, local branches, remote refs) that they have to interact with.

Like, I've had to explain a lot of times why you `git pull origin master` but when you want to interact with that remote branch otherwise it's `origin/master` instead. The lack of clarity is in what commands operate on what levels, with many of them operating on several at once.

There have been some efforts to reform the command set to be clearer, like `git switch`, but the old commands will persist forever along with a lot of other footguns (like how `git push --force` really ought to be replaced by `git push --force-with-lease`, with the old behavior moved to `git push --force-I-really-mean-it`), so it hardly matters.


I've actually worked on git internals and I'm in the same boat.

As part of a security-related project some years ago, my team and I hacked jgit to use SHA256, which required changing the length of pretty much every on-disk data structure. Sadly, there was (probably still is) no HASH_LEN constant, just a lot of magic offsets strewn throughout the code. I had to compare lengths against the git spec at every step.

And yet I still scramble for stackoverflow every time something goes slightly amiss.


There's an ongoing effort to rework core Git so that the hash implementation can be swapped out for, e.g., SHA256. [1]

jGit is actually a separate project from core Git, but once the new hash support gets adopted into core Git we can expect that jGit will follow suit, given that it's critical to Gerrit and other projects.

[1] https://lore.kernel.org/git/20191223011306.GF163225@camp.cru...


What a pointless project! I hope you were paid well, at least.


I was. But it wasn't quite as pointless as it sounds - the tool was a sort of tripwire-like system, with changes shipped to an append-only log, that itself was checkpointed in an early blockchain-ish structure. The threat model was "nation state actor" so the client wouldn't accept SHA1.

It was actually a pretty cool system. I don't think it was ever sold though.


Man, I thought zero days and secret backdoors were bad enough. Now we have to worry about manufactured hash collisions in all our repos' files dating back forever?


That seems like overkill. Couldn't you combine the hash with the date to obtain uniqueness?


The date isn't really meaningful since it can be set to anything on a file. But if you can force two dissimilar files to have the same hash, you can combine that with some other attack to inject it into some sort of chain of trust, whether it's git or some other type of checksum based system. Then combine that with a SolarWinds like attack and even if they try to revert to something from years earlier, they can't guarantee that the rollback files are still unaltered unless they had multiple hashes to compare it to or diffed it manually. But multiply that by X thousand files over Y commits during Z years and it would be very difficult to detect.


I do not remember jgit internals, but its API is pretty bad. I always assumed it was some kind of throwaway PoC suddenly turned popular.


> some kind of throwaway PoC suddenly turned popular

Wow, that description feels spot on.


> levels of state

This is the crux for me. Command naming is completely unrelated to and unindicative of state.

It feels like surely there's an opportunity for the basic CRUD operations to be collapsed down into a standard "{action} {source} {target}" style.

There will be nuances, specifically around branching, but the basics should be basic. As opposed to a Swiss Army knife, where you have to pull out the scissors and squeeze them three times before you can unfold and use the blade.


i can't stress just how great magit is. it's worth trying out emacs for just that. something like spacemacs as a wrapper is useful too since it gives you some well configured defaults for file operations. emacs is a kinda trash text editor but an amazing text utility toolkit that enabled magit.


I'll stress with you. Even if you hate Emacs, magit alone is a valid single use-case to start up emacs.


Are there any git frontends that do this today?


The one built into IntelliJ IDEs is pretty good. SourceTree is decent too. They both cover the vast majority of day-to-day operations. I only very rarely have to resort to the command line for ritualistic summoning of the git demons.


Magit comes close to this action-source-target model whenever possible.


can you explain the `git pull origin master` thing one more time here?


I don't think using `git pull` is a particularly good way of working. A pull is a fetch combined with either a merge or a rebase.

If it's difficult to keep your mental model of some system up to date, I doubt that doing bigger steps at once makes things easier.

So

1. run `git fetch`

2. if the textual output does not tell you what has happened, run `gitk --all`

3. Decide what to do. Rebase, merge, whatever.

Of course if you know exactly what you are doing, pull can be fine. If you changed the repo yourself on another computer that is the case. Otherwise, how can you know your second step, before having even seen the data you are operating on? Well, it can work, but if it doesn't, don't complain.
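That fetch-first flow can be sketched like this (remote and branch names are illustrative):

```shell
# 1. download remote refs without touching the working tree
git fetch origin

# 2. inspect what arrived (gitk --all also works well here)
git log --oneline --graph HEAD..origin/master

# 3. only now decide what to do with it
git rebase origin/master    # or: git merge origin/master
```

The point is that step 3 happens only after you've seen the data you're operating on.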


> I don't think using `git pull` is a particular good way of working.

I agree. For a DVCS like git, separating the network transaction from updating the working copy on disk is the best way to go about it. Going in the other direction, this is the default since git add, git commit and git push are executed separately.


This is literally the first advice I give when teaching people git. The first months of use, just run the two commands separate. Many mistakes are avoided that way.


I agree, but I end up using `pull` anyway just because the alternative is so tedious. I wish there was a short command that did the same thing as pull without fetch: merge the remote-tracking version of the current branch's default upstream into the current branch.

Essentially the whole concept of "upstream" is weird and non-orthogonal. Another one that bothers me is that as far as I can see there's no way to globally turn off setting an upstream on newly created branches (I can pass a flag to the specific "git branch" command, but that's tedious and error-prone).


like why it's different?

`git fetch` (and by extension `git pull` when given a remote) and `git push` copy data to and from a remote. When you specify `git pull origin master` you're saying "pull down a copy of the remote ref master from origin", which it then saves locally as the ref `origin/master`.

Everything under `origin/` (or really `refs/remotes/origin/`) is just a cached pointer to the last known state of that ref on the remote.

All other commands operate only on these local references. So when you want to refer to what you know to be the state of things on `origin`, you can use `origin/master`. Otherwise that command has no particular knowledge of how to talk to origin.

Incidentally this is a shortcut I use all the time to update my local master from a remote:

`git fetch origin master:master`

It's super unclear in its meaning, but it says: fetch origin's master and store it in my local master ref. I actually use this more often than git pull nowadays.
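For example, while a feature branch is checked out (names here are illustrative), the local master ref can be fast-forwarded without ever checking it out:

```shell
git checkout feature
# update the local master ref straight from origin's master;
# refuses if the update would not be a fast-forward
git fetch origin master:master
```

Your current branch and working tree are untouched; only the master ref moves.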


I tend to default to `git pull --rebase`.


I have this configured as default everywhere and strongly believe that merge-pulls are always wrong. The first place I used git we were learning together (i.e. nobody knew what a sensible workflow was) and people would push their local merge commits back to master. It was horrible.


`git config merge.ff only` is really helpful for enforcing this. It makes you say what you want for any non-trivial update of a ref through pull or merge.


Strongly disagree. Never rewriting local commits is great for the same reasons that never rewriting published commits is great; if you rebase you lose the ability to fearlessly work on multiple branches in parallel that's the great advantage of git.

Pushing merges is great. Pushing random (unreviewed) local commits directly to master is bad, but it's no worse when those commits are merges than when they're not. Conversely, rebasing master (which is quite easy to do if you're inexperienced but have been advised to use git pull --rebase) and pushing that creates a self-perpetuating mess that is very hard to fix (because even if you fix what you did, any other user who did a rebase-pull of master in the meantime is going to reintroduce the problem). Using rebase also trains you to force-push which makes messing up published branches much easier.


Also, one advantage of `git fetch origin master:master` is that you don't have to check out master first.


so the distinction here is

- origin master <=== the actual remote version of the master branch

- origin/master <=== a local remote-tracking ref that caches the "origin master" branch; it may or may not be in sync with the real "origin master"


origin and master are completely arbitrary too...

`git pull remote_repository_name branch_name` is the generic way to look at it instead of some magic incantation.

I like to call origin "upstream" to differentiate them.

And then `git pull` is, roughly, `git fetch` and `git merge` combined into one command.


yep, that's right. Or rather, origin and master are just two parameters given to pull/fetch/push to describe a target while origin/master is just the local name for, as you say, the locally cached ref.

Comparing against that locally cached ref is also what git uses to tell you how far behind/ahead of the upstream you are in `git status` or whatever. Fetch and push are the only git commands that actually talk to a remote (at the "user level" of the command set anyways, those are also composed of lower level commands).


> (or really `refs/remotes/origin/`)

It is worth the time to fully understand refspecs. Once people do, they tend to understand all essential ramifications of branch and repository naming.


What's wrong with `git push -f`? When I'm working on a branch that's been previously pushed with `-u`, it's pretty normal to force push it, particularly if you're amending or reordering commits in response to review feedback, or rebasing due to conflicts in preparation to merge.


changing `-f/--force` to act like `--force-with-lease` would have no effect on that flow whatsoever. What it would prevent is you accidentally overwriting something on the remote because you didn't know its current state, potentially silently backing out changes someone else (or perhaps you yourself on another machine) had pushed.

All it does is add this simple check before actually pushing:

    if (remote_ref("blah") != local_ref("remote/blah"))
        fail();
Most of the time it doesn't matter, and for most people's uses of --force it would have no effect (because most people are just pushing to a branch they're the only one pushing to). But every now and then it helps a lot to avoid losing data.
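A sketch of the difference in practice (remote and branch names are illustrative):

```shell
# plain --force: overwrite the remote branch unconditionally
git push --force origin feature

# --force-with-lease: refuse if origin/feature no longer matches our
# remote-tracking ref, i.e. if someone pushed since we last fetched
git push --force-with-lease origin feature
```

When the lease check fails, you fetch, look at what changed, and then decide whether overwriting is still what you want.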


It’s important to also understand where this might fall down: many tools fetch automatically in the background, which updates your remote-tracking refs and can quietly defeat the lease check.


Ultimately, I suppose, git usage is somewhat cultural. I personally have an aversion to push -f, along the premise that once it’s pushed it’s public and someone else may have branched (and pushed changes of their own) or simply had it checked out for review; doing push -f “changes reality,” while checking out a new branch is idempotent. If someone else has committed on that branch it’s especially jerky to push -f.

I try to be pragmatic about this sort of thing, yet `push --force` is one of those cultural no-nos for me.


It means you can't fearlessly pull from other people's feature branches. So people mostly don't bother looking at each other's feature branches (because there's nothing you can reasonably do with someone else's change-in-progress except wait for the branch to hit master), so you collaborate later and end up with more conflicts.


I think it's because the commands are poorly named. "Reset" vs. "Revert" tells me nothing about what is happening at the low level, I just have to remember it. And yet the two operations, despite having fairly similar English language meanings, have entirely different meanings in the context of Git.


Yes. Especially

git init --submodule --recursive

Or is it

git submodule --init --recursive?

God I hate this UX so much I usually have a ./fetch-subrepos.sh that runs a bunch of "git clone" commands.

And if I push without first pulling, must it always punish me with a merge commit? Can't I say "oh shit I don't want to do this, go back and git pull"?


> And if I push without first pulling, must it always punish me with a merge commit? Can't I say "oh shit I don't want to do this, go back and git pull"?

This is a source of probably 50% of my "ah, fuck, time to undo..." moments with git, these days. I hate that shit. Muscle-memory gets ahead of me and I commit on a shared remote branch, which would be fine given our workflow except that I didn't pull first. What a pain in the ass.


I have this in my .gitconfig so the pull will fail rather than merge.

    [pull]
        ff = only
If it does fail I can decide whether to merge or rebase.


I guess I have `git pull --rebase` as muscle memory.

I would guess there's an easy way to make git do this automatically for you via config so you never forget, but I just never, ever `git pull`

Or:

> git config --global alias.up '!git fetch && git rebase --autostash FETCH_HEAD'

From:

https://github.com/JKrag/git-up


git config --global pull.rebase true

You probably also want:

git config --global rebase.autostash true


For the sake of your coworkers (and your future self), please don't lie just to make your history look pretty.


I try to make each commit a snapshot of a working repository (compileable or runnable or testable, whatever is the heuristic for working) where the difference between each snapshot can be explained by the commit message. Ideally they are isolated to a single logical "unit" of change (fix, refactor, add, remove). The goal here being to minimize the amount of confusion and work for anyone traveling up and down the tree. I often have to rewrite my local history to make this happen, because the actual changes that I make can happen in a somewhat arbitrary order. How has local history revision bitten you?


Rebasing leads to either having long stretches of non-compiling commits in history or giant non-bisectable commits. E.g. you added a method call on your feature branch, but that method was renamed in master while you were working on your feature. If you merge, your commits still compile and I can use automated `git bisect` the way it's intended. If you rebase, your commits don't compile and I can't bisect through commits on your feature branch. If you squash, your whole feature development becomes a single monster commit and I can't bisect through it.

I agree with having as many commits as possible be compilable, but that's not the sole criterion, because there's a tension between that and having granular history: if you squash the whole history of the repo into a single commit then that means 100% of commits are compilable, but it's still a bad move. Conversely, a non-compiling commit in between two compiling commits is not a big problem (you just make sure your git bisect script skips non-compiling commits) - what really matters is keeping the diff between two successive compiling commits as small as possible. IME the best way to achieve that is never rewriting history.


That's a very good point with rebasing. Thanks for explaining.


It is for this reason that I have changed my workflow to always stash first, then pull, then pop the stash and do the merges locally, then push.


And always diff before stash because sometimes it's just random shit I wasn't serious about, so I'll re-checkout that file and then stash the rest.


> And if I push without first pulling

I think I know git well, but you got me confused. I've never heard of pushes causing merges. Surely you are talking about pulls, right?


The push causes the error; the resolving pull creates the merge. The correct resolution, as pointed out, is `git pull --rebase`, but most people don't realize this.


Maybe somebody who has a habit of using --force when pushing. A major downside of rebase-centric workflows is that it teaches you to ignore the safety rails when pushing, or when deleting branches.


`--force-with-lease` would fix this problem (it needs an alias). Also, `--force` wouldn't cause a merge commit; it would overwrite the remote changes.

The only theory that makes sense is that this person doesn't know how to `pull --rebase`, but the order of `push` vs `pull` wouldn't change the presence of merge commits, so I'm still confused.


I don’t know git well, but I often run into the problem being discussed.

If I pull from origin before making my changes, I don’t have to merge, obviously.

But correct me if I’m wrong: I think that if I don’t pull first, but my changes don’t conflict with any part of what was done by the previous commit(s) I missed, I’ll still have to merge if I touched a file they touched.

This is a common scenario for me. Correct some typos in comments for example, and I get forced to figure out how to merge using vim, which I don’t know how to use at all (being a nano user). I’m sure I could and should switch to at least using nano by default, but I don’t know how merging really works, either.

What I really want to do is undo my commit, pull, and redo my commit. Then I don’t have to figure out git merge.


> I don’t know git well, but I often run into the problem being discussed.

I do understand the problem being discussed; what I don't understand is what it has to do with pushing first. You have the same problem no matter which order you use `git push` vs `git pull`.

> I think that if I don’t pull first, but my changes don’t conflict with any part of what was done by the previous commit(s) I missed, I’ll still have to merge if I touched a file they touched.

Yes, that's true.

> What I really want to do is undo my commit, pull, and redo my commit. Then I don’t have to figure out git merge.

You can do that with `git pull --rebase`, which, as others have mentioned, you can set as the default behavior of `git pull` like this:

https://news.ycombinator.com/item?id=27581416


Ooh, --force-with-lease looks like a nice feature, especially for updating github PRs that aren't yet merged. I still wouldn't want to use it where anybody else has a copy of the changes, since that's where you need a merge commit to avoid breaking somebody else's repo, but that gives me a safer option than a blind --force.


Just remember that --force-with-lease only protects you from overriding commits you have not yet fetched.


Wait, what? I've probably been using Gerrit too long but why do you ever need force in a rebase workflow?


These may be specific to a workflow with git + github, when using git from the command line, but here are the cases I've run into where overriding safeties is needed.

1. After making a PR, there are conflicts when merging into main. In a merge-based workflow, I would merge main into the feature branch, resolve any conflicts, then push. In a rebase-based workflow, I rebase the branch onto main, resolve any conflicts, but now I need to push --force. As some of the other comments have mentioned, this can be improved with --force-with-lease, but still isn't the greatest.

2. After making a PR, there are some typos that need to be fixed. Fix these in an interactive rebase, to edit the same commit that introduced the typos. Also requires either --force or --force-with-lease.

3. When the PR is accepted, the result is rebased on top of main. My local branch still exists, and must be deleted. I would prefer to use `git branch -d` to delete the feature branch, but this rightfully says that the feature branch hasn't been merged in. I instead need to use `git branch -D` to forcefully delete it, introducing a point of human error. (There are some cases where git can delete the branch safely, which I think occurs either when the feature branch has only a single commit, or when the feature branch can be applied on top of main without a rebase, but I haven't exactly determined it.)

#1 and #3 are cases where a safer option cannot be used due to a rebase-workflow. #2 would exist in either case, since even in a merge workflow, rebasing of branches before they are pulled makes sense to do.
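Case 1 can be sketched like this (`main` and `feature` are placeholder names):

```shell
git fetch origin
git checkout feature
git rebase origin/main            # replay feature's commits onto main
# history was rewritten, so a plain push is rejected;
# --force-with-lease at least checks nobody else pushed meanwhile
git push --force-with-lease origin feature
```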


> There are some cases where git can delete the branch safely, which I think occurs either when the feature branch has only a single commit, or when the feature branch can be applied on top of main without a rebase, but I haven't exactly determined it.

FWIW: it occurs when the feature branch was based on the tip of master (because no-one else has committed to master since you branched/since you rebased onto master) - in this case rebasing your feature branch onto master is a no-op and the commits that go into master have the same hashes as they had on your feature branch.


I usually use `git pull -r` to rebase upstream changes


Git init inits a git repo.

Git submodule runs commands on submodules.

What is hard about this UX?

And it's not punishing you, it's doing what you asked: to pull into a non-matching head. How does it know you're not using git in the intended, distributed way?

Btw, just quit the editor without saving, it aborts.


The "intended" way generates a completely spurious merge commit - it doesn't represent a real commit, and rarely do you care about keeping track of merges into a short lived branch which are already tracked on master.

Most people want a single source of truth workflow that corresponds to the old total ordering imposed by svn or p4.


> The "intended" way generates a completely spurious merge commit - it doesn't represent a real commit, and rarely do you care about keeping track of merges into a short lived branch which are already tracked on master.

On the contrary, you want those commits for bisection, which is the main reason to have a VCS history at all.

> Most people want a single source of truth workflow that corresponds to the old total ordering imposed by svn or p4.

People think they want that, but I've never seen a convincing case for why. Bisect works better if you use merge. Blame works better if you use merge. And if you really want to see the history without merges (why?), it's one flag to do that.


So use svn?


I would, but the option is not mine to make.


[misunderstanding removed]


I think they mean the commit message editor, which git will use to open a temp file to save the message to if you don't specify a message in-line with the "-m" flag when committing, including when a merge commit is initiated by a "pull". This happens on the CLI, it's just usually (though doesn't have to be!) a command line editor that it opens. I think vim's a common default.

AFAIK whatever's opened does need to block the CLI, so you can't use a command that opens a GUI editor then returns immediately or git will interpret that as your having closed the file without saving, but otherwise any editor should work, CLI or GUI, and can be assigned in your git config.
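A hedged example: any editor works as long as its launcher blocks, and VS Code's `--wait` flag does exactly that, so you could set it as the commit editor like this:

```shell
git config --global core.editor "code --wait"
```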


In a thread about common sources of confusion, I think it would be more helpful to leave the misunderstanding so others might learn from it. Ie, edit to add "this is a misunderstanding" to the top, not replace it entirely.


Git invokes an editor, for writing commit messages, etc. (it looks in the VISUAL and EDITOR env vars). That could be a GUI text editor, or something running in the CLI (personally, I use emacsclient to open a new buffer in an existing Emacs window)

What they're saying is: if you quit that editor without saving the commit message, git will abort.


You can also quit the CLI editor, e.g. vim.


YES! It always seems like people’s issues are handwaved away with something like “oh you just need to understand the underlying data structures better.” No, the UX is often very bad! Like, I know exactly what I want the underlying repo to do, but how the hell am I supposed to remember which `--option` of which command is going to do that thing?


YES! you learn the happy-path commands you use all the time and the handful of "sadder-path" approaches you try when things go south, but there is a dramatic fall-off in knowledge and understanding from there that leaves otherwise clever and confident people feeling stupid and frustrated. This is not a silver-bullet for productivity but still a very worthy problem to address that could have meaningful impact for a lot of people.


I'm the same, but I think that's... Fine? If I understand what I want to do in terms of first principles, there's no harm in searching for the exact incantation if I do that only once every few months.

For the rest, there's shell autocomplete and muscle memory.


Maybe for low-level, somewhat rare tasks the ideal Git porcelain would be a GUI that just exposes the data model directly.


SmartGit is pretty good. It's $70/yr though (but well worth it IMO)

I'm the guy people go to to fix Git screw-ups at my job, but I just click a few buttons or drag a few commits...


I wonder: should Undo as a concept apply to all Git actions/commands that have state side-effects on the repo or working dir, or should Undo only cover certain operations (and if so, which ones)?


Take a look at https://eagain.net/articles/git-for-computer-scientists/?

Maybe you've already read it, but this is what let me grok the underlying data.


The parent commenter makes it clear that they already grok the underlying data. The problem with Git, as explained so, so many times, is its horribly unintuitive mapping from UI commands to the operations they perform on that model.

Comments like this, which point to a resource intended to help people "grok the underlying data", have the effect of seizing the focus of the conversation and implicitly retargeting it toward people who don't understand the underlying data model. When you've been through this enough times, it just comes off as incredibly annoying and tiresome.


I often come back to a local repository to change something and think, while I'm at it, I'll just `git pull` and end up with a non-working working directory. Surely I should know better, but I think it's also hostile to users, when the easy thing to do is often the wrong thing to do.

Even worse, I'm not sure I correctly remembered the weird combination of actions and flags to use to get back to the state where I can continue with what I wanted to do in the first place.

That article is a good example of the problem. It tells me `git rebase` is an easy thing to do but I better not use that distributed VCS to publish my work that way, where 'publish' probably also applies to different machines of mine.


But... a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor ... so what is the problem? /s


I think the parent commenter says that they do understand the underlying data.

It's just that the command-line interface is very opaque regarding what it does to that data.

For instance, say I want to apply the last three commits I made in one branch to another branch. It's a very simple operation conceptually.

Good luck remembering that the command that does it is rebase, and what the arguments for it are.



