Git for Subversion users, Part 2: Taking control

Untangle the complexities of branch merging

Git offers Linux® developers a number of advantages over Subversion for software version control, so developers working collaboratively owe it to themselves to get familiar with the basic concepts behind it. In this installment, Ted dissects branching and merging in both Git and Subversion, introduces "git bisect" for bisecting changes, and shows how to resolve merge conflicts.


Teodor Zlatanov, Programmer, Gold Software Systems

Teodor Zlatanov emerged with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, database architectures, user interfaces, and UNIX system administration.



25 November 2009


This is the second of a two-part series. You should read Part 1 if you haven't already, as I'll use the same Git and Subversion (SVN) setup, and it will get you used to my sense of humor.

Branching and merging in SVN

Easily the greatest source of headaches for version control system (VCS) managers is branching and merging. The vast majority of developers prefer to commit all of their changes in the trunk. As soon as branching and merging come up, developers start to complain, and the VCS manager gets to deal with it.

To be fair to developers, branching and merging are scary operations. The results are not always obvious, and merging can cause problems by undoing other people's work.

SVN manages the trunk well, and many developers don't bother with branching. SVN clients before 1.5 were a bit primitive about tracking merges, so if you're used to older SVN clients, you might not know about SVN's svn:mergeinfo property.

There's also a tool called svnmerge.py (see Resources for a link). svnmerge.py can track merges without relying on svn:mergeinfo and thus works with older SVN clients.

Because of the complexity and variations in SVN's merge support, I won't provide specific examples. Instead, let's just talk about Git's branch merging. You can read the SVN manual referenced in the Resources section if you are interested.


Branching and merging in Git

If Concurrent Versions System (CVS) is the village idiot when it comes to branching and merging, SVN is the vicar and Git is the mayor. Git was practically designed to support easy branching and merging. This Git feature not only impresses in the demo but is also handy every day.

To give you an example, Git has multiple merge strategies, including one called the octopus strategy, which allows you to merge multiple branches at once. An octopus strategy! Just think about the insanity of attempting to do this kind of merge in CVS or SVN. Git also supports a different kind of merging called rebasing. I won't examine rebasing here, but it is quite helpful for simplifying the repository history, so you might want to look it up.
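If you want to see an octopus merge for yourself, the following sketch builds a throwaway repository with two topic branches and merges both into master in one step. It assumes a POSIX shell and a reasonably recent Git. The branch and file names are made up for illustration, and the symbolic-ref line only pins the initial branch name to master so the script behaves the same whatever your init.defaultBranch setting is.

```shell
#!/bin/sh
# Sketch: an octopus merge in a throwaway local repository.
# Branch names (topic-a, topic-b) and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name to master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# two topic branches, each adding a different file
git checkout -q -b topic-a
echo a > a.txt
git add a.txt
git commit -q -m "add a.txt"

git checkout -q -b topic-b master
echo b > b.txt
git add b.txt
git commit -q -m "add b.txt"

# master moves on too, so neither topic is a simple fast forward
git checkout -q master
echo m > master.txt
git add master.txt
git commit -q -m "add master.txt"

# merge both topics into master at once with the octopus strategy
git merge -q --no-edit -s octopus topic-a topic-b

# count the parents of the merge commit: master, topic-a, topic-b
git cat-file -p HEAD | grep -c '^parent '
```

Because master also gained a commit of its own, Git records a single merge commit with three parents instead of fast-forwarding.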

Before I proceed with the merge example below, you should be familiar with the branch setup in Part 1. You have HEAD (pointing at the current branch, in this case master) and the empty-gdbinit branch. First, let's merge empty-gdbinit into HEAD; later, we'll make a change in master and merge it the other way, into a new release branch:

Listing 1. Merging changes from branch to HEAD with Git
# start clean
% git clone git@github.com:tzz/datatest.git
# ...clone output...
# what branches are available?
% git branch -a
#* master
#  origin/HEAD
#  origin/empty-gdbinit
#  origin/master
# do the merge
% git merge origin/empty-gdbinit
#Updating 6750342..5512d0a
#Fast forward
# gdbinit | 1005 ---------------------------------------------------------------
# 1 files changed, 0 insertions(+), 1005 deletions(-)
# now push the merge to the server
% git push
#Total 0 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
#   6750342..5512d0a  master -> master

This is not hard as long as you realize that HEAD points to master, and that after the merge with the empty-gdbinit branch, the master branch gets pushed to the remote server to synchronize with origin/master. In other words, you merged a remote branch (origin/empty-gdbinit) into your local master and then pushed the result to a different remote branch (origin/master).

What's important here is that Git does not care which branch is authoritative. You can merge any branch, local or remote, into any local branch, and then push the result wherever you like. The Git server only gets involved for remote operations such as fetching and pushing. In contrast, SVN always requires the SVN server, because with SVN the repository on the server is the only authoritative version.

Of course, Git is a distributed VCS, so none of this is surprising. It was designed to work without central authority. Still, the freedom can be a bit jarring to developers used to CVS and SVN.
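Here is a small sketch of that freedom, assuming a POSIX shell and a local bare repository (server.git, a hypothetical stand-in for the GitHub remote in Listing 1). The merge step works purely on local data; only the push commands contact the "server."

```shell
#!/bin/sh
# Sketch: the merge happens entirely in the local clone; only pushes
# talk to the "server". server.git is a hypothetical stand-in for GitHub.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

git clone -q server.git machine-a
cd machine-a
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line 1\nline 2\n' > gdbinit
git add gdbinit
git commit -q -m "add gdbinit"
git push -q origin master

# a branch that empties gdbinit, published to the server
git checkout -q -b empty-gdbinit
: > gdbinit
git commit -q -a -m "empty gdbinit"
git push -q origin empty-gdbinit

# back on master: the merge itself needs no network access
git checkout -q master
git merge -q origin/empty-gdbinit    # a fast forward, as in Listing 1

# only now is the server involved again
git push -q origin master
```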

Now, properly prepared with all this grand talk, let's make another local branch:

Listing 2. Creating and switching to a release branch on machine A
# create and switch to the stable branch
% git checkout -b release-stable
#Switched to a new branch "release-stable"
% git branch
#  master
#* release-stable
# push the new branch to the origin
% git push --all
#Total 0 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
# * [new branch]      release-stable -> release-stable

Now, on a different machine, we will remove the gdbinit file from the master branch. It doesn't have to be a different machine; a separate clone in a different directory works just as well, but I'm reusing "The Other Ted" identity on Ubuntu from Part 1 for machine B.

Listing 3. Removing gdbinit from master branch on machine B
# start clean
% git clone git@github.com:tzz/datatest.git
# ...clone output...
% git rm gdbinit
# rm 'gdbinit'
# hey, what branch am I in?
% git branch
#* master
# all right, commit my changes
% git commit -m "removed gdbinit"
#Created commit 259e0fd: removed gdbinit
# 1 files changed, 0 insertions(+), 1 deletions(-)
# delete mode 100644 gdbinit
# and now push the change to the remote branch
% git push
#updating 'refs/heads/master'
#  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#  to   259e0fda9a8e9f3b0a4b3019781b99a914891150
#Generating pack...
#Done counting 3 objects.
#Result has 2 objects.
#Deltifying 2 objects...
# 100% (2/2) done
#Writing 2 objects...
# 100% (2/2) done
#Total 2 (delta 1), reused 0 (delta 0)

Nothing crazy here (except for "deltifying," which sounds like something you'd do at the gym or something a river might do near a large body of water). But what happens on machine A in the release-stable branch?

Listing 4. Merging removal of gdbinit from master branch to release-stable branch on machine A
# remember, we're in the release-stable branch
% git branch
#  master
#* release-stable
# what's different vs. the master?
% git diff origin/master
#diff --git a/gdbinit b/gdbinit
#new file mode 100644
#index 0000000..8b13789
#--- /dev/null
#+++ b/gdbinit
#@@ -0,0 +1 @@
#+
# pull in the changes (removal of gdbinit)
% git pull origin master
#From git@github.com:tzz/datatest
# * branch            master     -> FETCH_HEAD
#Updating 5512d0a..259e0fd
#Fast forward
# gdbinit |    1 -
# 1 files changed, 0 insertions(+), 1 deletions(-)
# delete mode 100644 gdbinit
# push the changes to the remote server (updating the remote release-stable branch)
% git push
#Total 0 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
#   5512d0a..259e0fd  release-stable -> release-stable

The mentat interface, which I referred to in Part 1, strikes again in the diff. You're supposed to know that /dev/null is a special file that contains nothing, so a side of the diff labeled /dev/null means the file does not exist there: the remote master branch has no gdbinit, whereas the local release-stable branch still has the gdbinit file. That's not obvious to most users.

After all that fun, the pull fetches master from origin and merges it into the local release-stable branch (a fast forward here), and then the push updates origin/release-stable with the changes. As usual, "delta" is the Git developer's favorite word; one never misses a chance to use it.
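The pull in Listing 4 is really two steps, a fetch followed by a merge. The sketch below spells them out, again using a local bare repository as a hypothetical stand-in for GitHub and a second clone as machine B. Apart from the branch names, everything in it is made up for illustration.

```shell
#!/bin/sh
# Sketch: "git pull origin master" on release-stable = fetch + merge.
# server.git stands in for GitHub; clones a and b play machines A and B.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# machine A: shared history plus a release-stable branch
git clone -q server.git a
cd a
git symbolic-ref HEAD refs/heads/master
git config user.name "Machine A"
git config user.email "a@example.com"
echo > gdbinit
git add gdbinit
git commit -q -m "one-line gdbinit"
git push -q origin master
git checkout -q -b release-stable
git push -q origin release-stable
cd ..

# machine B removes gdbinit on master and pushes
git clone -q -b master server.git b
cd b
git config user.name "Machine B"
git config user.email "b@example.com"
git rm -q gdbinit
git commit -q -m "removed gdbinit"
git push -q origin master
cd ../a

# machine A, still on release-stable: the two halves of the pull
git fetch -q origin master    # records the fetched tip in FETCH_HEAD
git merge -q FETCH_HEAD       # a fast forward, as in Listing 4
git push -q origin release-stable
```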


Bisecting changes

I won't go into the git bisect command in detail here, because it is quite complicated, but I want to mention it because it is a terrific tool. Bisecting changes is really a binary search across the commit log. "Binary" means that each step splits the search interval in half and tests the commit in the middle to decide whether the change you're looking for lies before or after it.

The way it works is simple. You tell Git that version A is good and version Z is bad. Git then asks you (or asks an automated script) if the version halfway between A and Z, say Q, is bad. If Q is bad, then the bad commit is between A and Q; otherwise the bad commit is between Q and Z. The process is repeated until the bad commit is found.
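As a rough, back-of-the-envelope estimate, assuming a linear history, each test halves the remaining candidates, so the number of tests grows only logarithmically with the number of commits:

```latex
% N: number of candidate commits after the known-good version A,
%    up to and including the known-bad version Z (linear history assumed)
\text{tests} \approx \lceil \log_2 N \rceil,
\qquad \text{e.g. } N = 1000:\quad 2^{9} = 512 < 1000 \le 1024 = 2^{10}
\;\Rightarrow\; \lceil \log_2 1000 \rceil = 10
```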

It's especially nice that bisecting can be automated with a test script. You can write a test that fails on version Z and run it against older versions to find when a feature broke; most developers would call that an automated regression test, and it can save you a lot of time.
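The following sketch shows automated bisection with git bisect run, which treats exit status 0 from the test command as good, 125 as "skip this commit," and other values from 1 to 127 as bad. The 16-commit history, the status file, and the is-good.sh script are hypothetical; in a real project, the script would build and test your code.

```shell
#!/bin/sh
# Sketch: automated bisection in a throwaway repo. The history, the
# "status" file, and is-good.sh are hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# 16 linear commits; commit 11 is the one that breaks things
i=1
while [ "$i" -le 16 ]; do
    if [ "$i" -ge 11 ]; then echo broken > status; else echo ok > status; fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# the test lives outside the work tree so checkouts never touch it;
# grep exits 0 ("good") if status says ok, 1 ("bad") otherwise
cat > "$work/is-good.sh" <<'EOF'
#!/bin/sh
grep -q '^ok$' status
EOF
chmod +x "$work/is-good.sh"

# HEAD (commit 16) is bad, HEAD~15 (commit 1) is good
git bisect start HEAD HEAD~15 > /dev/null
git bisect run "$work/is-good.sh" > "$work/bisect.log"

# when bisection ends, refs/bisect/bad names the first bad commit
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1
git log -1 --format=%s "$first_bad"    # prints: commit 11
```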


Resolving conflicts

Merge conflicts are inevitable in any VCS and especially so in a distributed VCS such as Git. What happens if two people change a file in conflicting ways in the same branch? Both of the following examples are in the master branch of the datatest repository we've been using so far.

First we make a change to encode.pl on machine B:

Listing 5. "Does not work" on machine B
# we're at time T1
# change the contents
% echo "# this script doesn't work" > encode.pl
% git commit -a -m 'does not work'
#Created commit e61713b: does not work
# 1 files changed, 1 insertions(+), 1 deletions(-)
# we're at time T2 now, what's our status?
% git status
# On branch master
#nothing to commit (working directory clean)

Now we make a change to encode.pl on machine A, unaware of the changes on machine B, and push it:

Listing 6. "Does work" on machine A
# we're at time T2
# change the contents
% echo "this script works" > encode.pl
% git commit -a -m 'does work'
#Created commit f949703: does work
# 1 files changed, 1 insertions(+), 1 deletions(-)
# we're at time T3 now, what's our status?
% git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
#nothing to commit (working directory clean)
% git push
#Counting objects: 5, done.
#Delta compression using 2 threads.
#Compressing objects: 100% (2/2), done.
#Writing objects: 100% (3/3), 298 bytes, done.
#Total 3 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
#   259e0fd..f949703  master -> master

Now, on machine B, we do a git pull and realize things are not so wonderful:

Listing 7. Uh-oh on machine B
% git pull
#remote: Counting objects: 5, done.
#Compressing objects: 100% (2/2), done.
#remote: Total 3 (delta 0), reused 0 (delta 0)
#Unpacking 3 objects...
# 100% (3/3) done
#* refs/remotes/origin/master: fast forward to branch 'master' 
#   of git@github.com:tzz/datatest
#  old..new: 259e0fd..f949703
#Auto-merged encode.pl
#CONFLICT (content): Merge conflict in encode.pl
#Automatic merge failed; fix conflicts and then commit the result.
# the next command is optional
% echo uh-oh
#uh-oh
# you can also use "git diff" to see the conflicts
% cat encode.pl
#<<<<<<< HEAD:encode.pl
## this script doesn't work
#=======
#this script works
#>>>>>>> f9497037ce14f87ff984c1391b6811507a4dd86c:encode.pl

This situation is very common in SVN as well: someone else's changes disagree with your version of a file. Just edit the file, replacing the conflict markers and both versions with the content you want, and commit:

Listing 8. Fixing and committing on machine B
# resolve the conflict: keep only "# this script doesn't work" in encode.pl
% echo "# this script doesn't work" > encode.pl
# commit, conflict resolved (--no-edit keeps Git's prepared merge message)
% git commit -a --no-edit
#Created commit 05ecdf1: Merge branch 'master' of git@github.com:tzz/datatest
% git push
#updating 'refs/heads/master'
#  from f9497037ce14f87ff984c1391b6811507a4dd86c
#  to   05ecdf164f17cd416f356385ce8f5c491b40bf01
#updating 'refs/remotes/origin/HEAD'
#  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#  to   f9497037ce14f87ff984c1391b6811507a4dd86c
#updating 'refs/remotes/origin/master'
#  from 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#  to   f9497037ce14f87ff984c1391b6811507a4dd86c
#Generating pack...
#Done counting 8 objects.
#Result has 4 objects.
#Deltifying 4 objects...
# 100% (4/4) done
#Writing 4 objects...
# 100% (4/4) done
#Total 4 (delta 0), reused 0 (delta 0)
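To reproduce the conflict and its resolution from Listings 5 through 8 without GitHub or a second machine, you can use two clones of a local bare repository. This is a sketch under those assumptions, with made-up paths and identities; --no-rebase makes the pull merge explicitly, which some newer Git versions insist on being told when the branches have diverged.

```shell
#!/bin/sh
# Sketch: the encode.pl conflict with two clones of a local bare repo.
# server.git stands in for GitHub; clones a and b play machines A and B.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# machine A creates the shared history
git clone -q server.git a
cd a
git symbolic-ref HEAD refs/heads/master
git config user.name "Machine A"
git config user.email "a@example.com"
echo "# original" > encode.pl
git add encode.pl
git commit -q -m "initial"
git push -q origin master
cd ..

# machine B clones it and commits its own change (time T1)
git clone -q -b master server.git b
cd b
git config user.name "Machine B"
git config user.email "b@example.com"
echo "# this script doesn't work" > encode.pl
git commit -q -a -m "does not work"
cd ..

# machine A changes the same line and pushes first (time T2)
cd a
echo "this script works" > encode.pl
git commit -q -a -m "does work"
git push -q origin master
cd ../b

# machine B pulls; both sides changed the same line, so the merge stops
if git pull -q --no-rebase origin master; then
    echo "expected a conflict" >&2
    exit 1
fi
git diff --name-only --diff-filter=U    # lists the conflicted encode.pl

# write the content we want, mark it resolved, and commit the merge
echo "# this script doesn't work" > encode.pl
git add encode.pl
git commit -q --no-edit
git push -q origin master
```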

That was easy, wasn't it? Let's see what happens on machine A next time it updates.

Listing 9. Pulling the resolved merge on machine A
% git pull
#remote: Counting objects: 8, done.
#remote: Compressing objects: 100% (3/3), done.
#remote: Total 4 (delta 0), reused 0 (delta 0)
#Unpacking objects: 100% (4/4), done.
#From git@github.com:tzz/datatest
#   f949703..05ecdf1  master     -> origin/master
#Updating f949703..05ecdf1
#Fast forward
# encode.pl |    2 +-
# 1 files changed, 1 insertions(+), 1 deletions(-)
% cat encode.pl
## this script doesn't work

Fast forward means that the local branch caught up with the remote branch automatically, because it contained nothing the remote branch didn't already have. In other words, no merge was needed: the local branch's latest commit was an ancestor of the remote branch's latest commit, so Git simply moved the local branch forward to match.
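A quick way to check whether a merge will be a fast forward is to ask whether your branch tip is an ancestor of the other tip. This sketch assumes a Git recent enough to support git merge-base --is-ancestor; the branch named upstream is a hypothetical stand-in for origin/master.

```shell
#!/bin/sh
# Sketch: detecting and performing a fast-forward merge.
# "upstream" is a hypothetical stand-in for origin/master.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file
git add file
git commit -q -m "one"

# upstream gets one extra commit that master does not have
git checkout -q -b upstream
echo two > file
git commit -q -a -m "two"
git checkout -q master

# master's tip is an ancestor of upstream's tip, so no merge commit is needed
if git merge-base --is-ancestor master upstream; then
    echo "fast forward possible"
fi

# --ff-only refuses to create a merge commit; here it just moves master forward
git merge -q --ff-only upstream
git rev-list --count HEAD    # 2: no merge commit was added
```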

Finally, I should mention git revert and git reset, which are very useful for undoing a commit or other changes to the Git tree. There's no room to explain them here, but make sure you know how to use them.
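For a taste of the difference, here is a hedged sketch with made-up file contents: git revert adds a new commit that undoes an earlier one, while git reset --hard moves the current branch to another commit and resets the index and working tree to match.

```shell
#!/bin/sh
# Sketch: undoing a commit with git revert versus git reset.
# File contents and commit messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo good > encode.pl
git add encode.pl
git commit -q -m "good version"
echo bad > encode.pl
git commit -q -a -m "bad version"

# revert: record a NEW commit that undoes "bad version"; history is kept,
# so this is the safe choice for commits others have already pulled
git revert --no-edit HEAD > /dev/null
cat encode.pl                 # good
git rev-list --count HEAD     # 3

# reset --hard: move master back to the first commit, dropping the later
# commits from the branch (rewrites history; avoid on pushed work)
git reset -q --hard HEAD~2
cat encode.pl                 # good
git rev-list --count HEAD     # 1
```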


Conclusion

This article opened up the concept of merging, showing how to keep local and remote branches in sync on two machines and how to resolve conflicts between them. I also drew attention to the complicated, even arcane Git messages, because compared with SVN, Git is much more verbose and much less intelligible. When you couple this with the complex syntax of Git's commands, Git can be pretty intimidating for beginners. However, once a few basic concepts are explained, Git gets much easier, even pleasant!

Resources

Learn

Get products and technologies

  • svnmerge.py is a tool to automate merge tracking. It allows branch maintainers to merge changes from and to their branches easily, and it automatically records which changes were already merged.
  • Git downloads, documentation, and tools are available from the Git Web site.
