Git for Subversion users, Part 2
Untangle the complexities of branch merging
This is the second of a two-part series. You should read Part 1 if you haven't already, as I'll use the same Git and Subversion (SVN) setup, and it will get you used to my sense of humor.
Branching and merging in SVN
Branching and merging are easily the greatest source of headaches for version control system (VCS) managers. The vast majority of developers prefer to commit all of their changes in the trunk. As soon as branching and merging come up, developers start to complain, and the VCS manager gets to deal with it.
To be fair to developers, branching and merging are scary operations. The results are not always obvious, and merging can cause problems by undoing other people's work.
SVN manages the trunk well and many developers don't bother with branching.
SVN clients before 1.5 were a bit primitive about tracking merges, so if you're used to older SVN clients, you might not know about SVN's svn:mergeinfo property. There's also a tool called svnmerge.py (see Related topics for a link). svnmerge.py can track merges without the svn:mergeinfo support and thus works for older SVN clients.
Because of the complexity and variations in SVN's merge support, I won't provide specific examples. Instead, let's just talk about Git's branch merging. You can read the SVN manual referenced in the Related topics section if you are interested.
Branching and merging in Git
If Concurrent Versions System (CVS) is the village idiot when it comes to branching and merging, SVN is the vicar and Git is the mayor. Git was practically designed to support easy branching and merging. This Git feature not only impresses in the demo but is also handy every day.
To give you an example, Git has multiple merge strategies, including one called the octopus strategy, which allows you to merge multiple branches at once. An octopus strategy! Just think about the insanity of attempting to do this kind of merge in CVS or SVN. Git also supports a different kind of merging called rebasing. I won't examine rebasing here, but it is quite helpful for simplifying the repository history, so you might want to look it up.
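If you want to see the octopus in action, here is a minimal sketch in a throwaway repository. Everything in it is an assumption made for illustration: the branch names topic-a and topic-b, the file names, and the demo identity are all hypothetical, and the script never touches the datatest repository.

```shell
#!/bin/sh
# Sketch: octopus merge of two hypothetical topic branches in a scratch repo.
set -e

# demo identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
# call the initial branch master regardless of the local default
git symbolic-ref HEAD refs/heads/master

echo base > base.txt
git add base.txt
git commit -q -m "base"

# two topic branches, each adding its own file
for topic in topic-a topic-b; do
    git checkout -q -b "$topic" master
    echo "$topic" > "$topic.txt"
    git add "$topic.txt"
    git commit -q -m "add $topic.txt"
done

# give master a commit of its own so it is a real third parent
git checkout -q master
echo master > master-only.txt
git add master-only.txt
git commit -q -m "add master-only.txt"

# naming more than one branch makes git merge pick the octopus strategy
git merge -q --no-edit topic-a topic-b

# count the parents of the resulting merge commit
parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
echo "parents of merge commit: $parents"
```

The single merge commit records master, topic-a, and topic-b as its parents. The octopus strategy is meant for branches that do not conflict; it gives up rather than asking you to resolve conflicts by hand.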
Before I proceed with the merge example below, you should be familiar with the branch setup in Part 1. We have HEAD (the current branch, in this case master) and the empty-gdbinit branch. First, let's merge empty-gdbinit into HEAD, then make a change in HEAD and merge it the other way into empty-gdbinit.
Listing 1. Merging changes from branch to HEAD with Git
    # start clean
    % git clone email@example.com:tzz/datatest.git
    # ...clone output...
    # what branches are available?
    % git branch -a
    #* master
    #  origin/HEAD
    #  origin/empty-gdbinit
    #  origin/master
    # do the merge
    % git merge origin/empty-gdbinit
    #Updating 6750342..5512d0a
    #Fast forward
    # gdbinit | 1005 ---------------------------------------------------------------
    # 1 files changed, 0 insertions(+), 1005 deletions(-)
    # now push the merge to the server
    % git push
    #Total 0 (delta 0), reused 0 (delta 0)
    #To firstname.lastname@example.org:tzz/datatest.git
    #   6750342..5512d0a  master -> master
This is not hard as long as you realize that master is HEAD, and that after the merge with the empty-gdbinit branch, the master branch gets pushed to the remote server to synchronize with origin/master. In other words, you merged locally from a remote branch and then pushed the result to another remote branch.
What's important here is to see how Git does not care which branch is authoritative. You can merge from a local branch to another local branch or to a remote branch. The Git server only gets involved for remote operations. In contrast, SVN always requires the SVN server, because with SVN the repository on the server is the only authoritative version.
Of course, Git is a distributed VCS, so none of this is surprising. It was designed to work without central authority. Still, the freedom can be a bit jarring to developers used to CVS and SVN.
Now, properly prepared with all this grand talk, let's make another local branch:
Listing 2. Creating and switching to a release branch on machine A
    # create and switch to the stable branch
    % git checkout -b release-stable
    #Switched to a new branch "release-stable"
    % git branch
    #  master
    #* release-stable
    # push the new branch to the origin
    % git push --all
    #Total 0 (delta 0), reused 0 (delta 0)
    #To email@example.com:tzz/datatest.git
    # * [new branch]      release-stable -> release-stable
Now, on a different machine, we will remove the gdbinit file from the master branch. Of course, it doesn't have to be a different machine; it can simply be a different directory. But I'm reusing "The Other Ted" identity on Ubuntu from Part 1 for machine B.
Listing 3. Removing gdbinit from master branch on machine B
    # start clean
    % git clone firstname.lastname@example.org:tzz/datatest.git
    # ...clone output...
    % git rm gdbinit
    # rm 'gdbinit'
    # hey, what branch am I in?
    % git branch
    #* master
    # all right, commit my changes
    % git commit -m "removed gdbinit"
    #Created commit 259e0fd: removed gdbinit
    # 1 files changed, 0 insertions(+), 1 deletions(-)
    # delete mode 100644 gdbinit
    # and now push the change to the remote branch
    % git push
    #updating 'refs/heads/master'
    #  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
    #  to   259e0fda9a8e9f3b0a4b3019781b99a914891150
    #Generating pack...
    #Done counting 3 objects.
    #Result has 2 objects.
    #Deltifying 2 objects...
    # 100% (2/2) done
    #Writing 2 objects...
    # 100% (2/2) done
    #Total 2 (delta 1), reused 0 (delta 0)
Nothing crazy here (except for "deltifying," which sounds like something
you'd do at the gym or something a river might do near a large body
of water). But what happens on machine A in the release-stable branch?
Listing 4. Merging removal of gdbinit from master branch to release-stable branch on machine A
    # remember, we're in the release-stable branch
    % git branch
    #  master
    #* release-stable
    # what's different vs. the master?
    % git diff origin/master
    #diff --git a/gdbinit b/gdbinit
    #new file mode 100644
    #index 0000000..8b13789
    #--- /dev/null
    #+++ b/gdbinit
    #@@ -0,0 +1 @@
    #+
    # pull in the changes (removal of gdbinit)
    % git pull origin master
    #From email@example.com:tzz/datatest
    # * branch            master     -> FETCH_HEAD
    #Updating 5512d0a..259e0fd
    #Fast forward
    # gdbinit |    1 -
    # 1 files changed, 0 insertions(+), 1 deletions(-)
    # delete mode 100644 gdbinit
    # push the changes to the remote server (updating the remote release-stable branch)
    % git push
    #Total 0 (delta 0), reused 0 (delta 0)
    #To firstname.lastname@example.org:tzz/datatest.git
    #   5512d0a..259e0fd  release-stable -> release-stable
The mentat interface, which I referred to in Part 1, strikes again in the diff. You're supposed to know that /dev/null is a special file that contains nothing, and thus the remote master branch has nothing, whereas the local release-stable branch has the gdbinit file. That's not obvious to most users.
After all that fun, the pull merges the local branch with origin/master, and then the push updates origin/release-stable with the changes. As usual, "delta" is the Git developer's favorite word; one never misses a chance to use it.
I won't go into the
git bisect command in detail
here, because it is quite complicated, but I wanted to mention it because
it is a terrific tool. Bisecting changes is really a binary search across
the commit log. "Binary" means that the search splits the search interval
down the middle and tests the middle each time to decide if the wanted
segment is above or below the middle.
The way it works is simple. You tell Git that version A is good and version Z is bad. Git then asks you (or asks an automated script) if the version halfway between A and Z, say Q, is bad. If Q is bad, then the bad commit is between A and Q; otherwise the bad commit is between Q and Z. The process is repeated until the bad commit is found.
It's especially nice that bisecting can be automated with a test script. That makes it possible to write a test for version Z and use it backwards to find when a feature broke, which most developers would call an automated regression test. Those will save you time.
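To make the automated version concrete, here is a minimal sketch rather than a full tutorial. It builds a throwaway repository with ten commits, plants a hypothetical bug in the seventh, and lets git bisect run hunt it down with a one-line test. The file names status.txt and counter.txt, the commit messages, and the demo identity are my inventions for the example.

```shell
#!/bin/sh
# Sketch: automated bisect over ten commits in a scratch repo.
set -e

# demo identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# ten commits; the pretend bug arrives with commit 7 and stays
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo broken > status.txt
    else
        echo ok > status.txt
    fi
    echo "$i" > counter.txt
    git add status.txt counter.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# bad commit first, good commit second: HEAD is bad, HEAD~9 (commit 1) is good
git bisect start HEAD HEAD~9 >/dev/null

# the test script: exit 0 means good, exit 1 means bad
git bisect run sh -c 'grep -qx ok status.txt' >/dev/null

# when the run finishes, refs/bisect/bad points at the first bad commit
first_bad=$(git log -1 --pretty=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

# return to the branch we started on
git bisect reset >/dev/null 2>&1
```

Git needs only a handful of test runs here instead of ten, and the savings grow quickly with longer histories. In real use the one-line grep would be replaced by whatever regression test detects your bug.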
Merge conflicts are inevitable in any VCS and especially so in a distributed VCS such as Git. What happens if two people change a file in conflicting ways in the same branch? Both of the following examples are in the master branch of the datatest repository we've been using so far.
First we make a change to encode.pl on machine B:
Listing 5. "Does not work" on machine B
    # we're at time T1
    # change the contents
    % echo "# this script doesn't work" > encode.pl
    % git commit -a -m 'does not work'
    #Created commit e61713b: does not work
    # 1 files changed, 1 insertions(+), 1 deletions(-)
    # we're at time T2 now, what's our status?
    % git status
    # On branch master
    #nothing to commit (working directory clean)
Now we make a change to encode.pl on machine A without awareness of the changes on machine B, and push it:
Listing 6. "Does work" on machine A
    # we're at time T2
    # change the contents
    % echo "this script does work" > encode.pl
    % git commit -a -m 'does work'
    #Created commit f949703: does work
    # 1 files changed, 1 insertions(+), 1 deletions(-)
    # we're at time T3 now, what's our status?
    % git status
    # On branch master
    # Your branch is ahead of 'origin/master' by 1 commit.
    #
    #nothing to commit (working directory clean)
    % git push
    #Counting objects: 5, done.
    #Delta compression using 2 threads.
    #Compressing objects: 100% (2/2), done.
    #Writing objects: 100% (3/3), 298 bytes, done.
    #Total 3 (delta 0), reused 0 (delta 0)
    #To email@example.com:tzz/datatest.git
    #   259e0fd..f949703  master -> master
Now, on machine B, we do a
git pull and realize
things are not so wonderful:
Listing 7. Uh-oh on machine B
    % git pull
    #remote: Counting objects: 5, done.
    #Compressing objects: 100% (2/2), done.
    #remote: Total 3 (delta 0), reused 0 (delta 0)
    #Unpacking 3 objects...
    # 100% (3/3) done
    #* refs/remotes/origin/master: fast forward to branch 'master'
    #  of firstname.lastname@example.org:tzz/datatest
    #  old..new: 259e0fd..f949703
    #Auto-merged encode.pl
    #CONFLICT (content): Merge conflict in encode.pl
    #Automatic merge failed; fix conflicts and then commit the result.
    # the next command is optional
    % echo uh-oh
    #uh-oh
    # you can also use "git diff" to see the conflicts
    % cat encode.pl
    #<<<<<<< HEAD:encode.pl
    ## this script doesn't work
    #=======
    #this script does work
    #>>>>>>> f9497037ce14f87ff984c1391b6811507a4dd86c:encode.pl
This situation is very common in SVN as well. Someone else's changes disagree with your version of a file. Just edit the file and commit:
Listing 8. Fixing and committing on machine B
    # fix encode.pl before this to contain only "# this script doesn't work"...
    % echo "# this script doesn't work" > encode.pl
    # commit, conflict resolved
    % git commit -a -m ''
    #Created commit 05ecdf1: Merge branch 'master' of email@example.com:tzz/datatest
    % git push
    #updating 'refs/heads/master'
    #  from f9497037ce14f87ff984c1391b6811507a4dd86c
    #  to   05ecdf164f17cd416f356385ce8f5c491b40bf01
    #updating 'refs/remotes/origin/HEAD'
    #  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
    #  to   f9497037ce14f87ff984c1391b6811507a4dd86c
    #updating 'refs/remotes/origin/master'
    #  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
    #  to   f9497037ce14f87ff984c1391b6811507a4dd86c
    #Generating pack...
    #Done counting 8 objects.
    #Result has 4 objects.
    #Deltifying 4 objects...
    # 100% (4/4) done
    #Writing 4 objects...
    # 100% (4/4) done
    #Total 4 (delta 0), reused 0 (delta 0)
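The sequence in Listings 5 through 8 can be replayed without a server or a second machine. The following is only a sketch under stated assumptions: a local bare repository (server.git) stands in for the remote, two clones named machine-a and machine-b stand in for the two machines, and the seed commit and demo identity are hypothetical. Output from a current Git will be worded differently from the listings above.

```shell
#!/bin/sh
# Sketch: reproduce the encode.pl conflict with two local clones.
set -e

# demo identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)
cd "$work"

# seed repository with one commit, then a bare copy playing the server
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "original contents" > encode.pl
git add encode.pl
git commit -q -m "seed"
cd ..
git clone -q --bare seed server.git
git clone -q server.git machine-a
git clone -q server.git machine-b

# machine A changes the file and pushes first
cd machine-a
echo "this script does work" > encode.pl
git commit -q -a -m "does work"
git push -q origin master
cd ..

# machine B commits a conflicting change, then pulls
cd machine-b
echo "# this script doesn't work" > encode.pl
git commit -q -a -m "does not work"
conflicted=no
if ! git pull -q --no-rebase origin master; then
    conflicted=yes
fi
# count the conflict markers Git wrote into the file
markers=$(grep -c '^<<<<<<<' encode.pl || true)
echo "conflict: $conflicted, marker lines: $markers"

# resolve by keeping machine B's version, as in Listing 8, then push
echo "# this script doesn't work" > encode.pl
git commit -q -a -m "merge master, keep machine B's version"
git push -q origin master
merge_parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
resolved=$(cat encode.pl)
```

The --no-rebase flag asks git pull to merge rather than rebase; recent Git versions want that choice spelled out when the two histories have diverged.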
That was easy, wasn't it? Let's see what happens on machine A next time it updates.
Listing 9. Updating machine A after the conflict is resolved
    % git pull
    #remote: Counting objects: 8, done.
    #remote: Compressing objects: 100% (3/3), done.
    #remote: Total 4 (delta 0), reused 0 (delta 0)
    #Unpacking objects: 100% (4/4), done.
    #From firstname.lastname@example.org:tzz/datatest
    #   f949703..05ecdf1  master     -> origin/master
    #Updating f949703..05ecdf1
    #Fast forward
    # encode.pl |    2 +-
    # 1 files changed, 1 insertions(+), 1 deletions(-)
    % cat encode.pl
    ## this script doesn't work
Fast forward means that the local branch caught up with the remote branch automatically, because it contained nothing new to the remote branch. In other words, a fast forward implies no merging was needed; the local branch had no commits that the remote branch was missing.
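A fast forward is easy to watch in isolation. This sketch uses two local branches instead of a local and a remote one, which is an assumption made to keep the example short; the branch name feature, the file name, and the demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch: a fast-forward merge between two local branches in a scratch repo.
set -e

# demo identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

echo one > file.txt
git add file.txt
git commit -q -m "first"

# all new work happens on the feature branch
git checkout -q -b feature
echo two >> file.txt
git commit -q -a -m "second"

git checkout -q master
before=$(git rev-parse HEAD)

# master has no commits of its own since the branch point, so Git only
# moves the master pointer; --ff-only refuses anything but a fast forward
git merge -q --ff-only feature

after=$(git rev-parse HEAD)
feature_tip=$(git rev-parse feature)
parents=$(git log -1 --pretty=%P | wc -w | tr -d ' ')
echo "master moved from $before to $after"
```

After the merge, master and feature name the same commit, and that commit has a single parent: no merge commit was created.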
Finally, I should mention
git revert and
git reset, which are very useful for undoing a
commit or other changes to the Git tree. There's no room to explain them
here, but make sure you know how to use them.
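As a starting point, here is a minimal sketch of the difference between the two, in a scratch repository with a hypothetical notes.txt file and demo identity. It is a taste, not a substitute for reading the manual pages.

```shell
#!/bin/sh
# Sketch: git revert versus git reset --hard in a scratch repo.
set -e

# demo identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.invalid"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# three commits, each adding one line
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "add line 1"
echo "line 2" >> notes.txt
git commit -q -a -m "add line 2"
echo "line 3" >> notes.txt
git commit -q -a -m "add line 3"

# revert: a NEW commit that undoes the last one; history only grows
git revert --no-edit HEAD >/dev/null
lines_after_revert=$(wc -l < notes.txt | tr -d ' ')
commits_after_revert=$(git rev-list --count HEAD)

# reset --hard: move the branch back three commits and rewrite the
# working tree to match; the later commits drop out of the branch
git reset -q --hard HEAD~3
lines_after_reset=$(wc -l < notes.txt | tr -d ' ')
commits_after_reset=$(git rev-list --count HEAD)

echo "after revert: $lines_after_revert lines, $commits_after_revert commits"
echo "after reset:  $lines_after_reset lines, $commits_after_reset commits"
```

Because revert adds history instead of removing it, it is the safer choice for commits that other people may already have pulled. A hard reset is best kept for local commits that have not been pushed.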
This article opened up the concept of merging, showing what it's like to keep the local and remote branches on two machines and resolving conflicts between them. I also drew attention to the complicated, even arcane Git messages, because compared with SVN, Git is much more verbose and much less intelligible. When you couple this fact with the complex syntax of Git's commands, it can make Git pretty intimidating for most beginners. However, once a few basic concepts are explained, Git gets much easier—even pleasant!
Related topics
- "Git for Subversion users, Part 1: Getting started" (developerWorks, August 2009) shows how to install Git and set up a repository.
- In the SVN manual, read all you ever wanted to know about SVN branching and merging.
- Learn more about the Git merge option, rebasing.
- In case you were caught short on this bit of trivia, read the Wikipedia entry on mentats.
- This crash course tutorial on Git is a good place to find more examples. Another tutorial worth reading is Flavio Castelli's "Howto use Git and svn together."
- For Git hosting resources, try GitHub, Gitorious, repo.or.cz, and this list of public Git hosting sites.
- The Wikipedia entries on Git and Subversion are both quite thorough.
- svnmerge.py is a tool to automate merge tracking. It allows branch maintainers to merge changes from and to their branches easily, and it automatically records which changes were already merged.
- Git downloads, documentation, and tools are available from the Git Web site.