Git for Subversion users, Part 1
Git gets demystified for Subversion version control system users
For anyone unfamiliar with free and open source version control systems (VCSs), Subversion has become the standard non-commercial VCS, replacing the old champ, Concurrent Versions System (CVS). CVS is still just fine for limited use, but Subversion's allure is that it requires only a little bit of setup on a Web server and not much beyond that. Subversion does have some issues, which I'll discuss here, but for the most part, it just works.
So, why do we need another one? Git (capital "G"; git is the command-line tool) is in many ways designed to be better than Subversion. It is one of many distributed VCSs. My own first experience with these was with Arch/tla, as well as Mercurial, Bazaar, darcs, and a few others. For many reasons, which I'll discuss as far as they are relevant, Git has become popular and is often considered together with Subversion as the two leading choices for a personal or corporate VCS.
There are two important reasons to be interested in Git if you are a Subversion user.
- You are looking to move to Git because Subversion is limiting you in some way.
- You are curious about Git and want to find out how it compares to Subversion.
Well, perhaps there's a third reason: Git is a relatively hot technology you want to include on your resume. I hope that's not your primary goal; learning about Git is one of the most rewarding things a developer can do. Even if you don't use Git now, the concepts and workflow embodied in this distributed VCS are certain to be crucial knowledge for most segments of the IT industry in the next 10 years as the industry undergoes massive changes in scope and geographical distribution.
Finally, though it might not be a compelling reason if you're not a Linux kernel developer, the kernel and a number of other important projects are maintained using Git, so you'll want to be familiar with it if you plan to contribute.
This article is intended for beginning-to-intermediate Subversion users. It requires beginner-level knowledge of Subversion and some general knowledge of version control systems. The information here is mainly for users of UNIX®-like (Linux® and Mac OS X) systems, with a little bit thrown in for Windows® users.
Part 2 of this series will discuss more advanced uses of Git: merging branches, generating diffs, and other common tasks.
Subversion and Git basics
Henceforth, I'll abbreviate "Subversion" as "SVN" to save wear and tear on my U, B, E, R, S, I, and O keys.
So, what's SVN good for? You might already know this, but a VCS is not about files; it's about changes. SVN, running on a central server, adds changes to its repository of data and can give you a snapshot after every change. That snapshot has a revision number; the revision number is very important to SVN and the people who use it. If your change goes in after mine, you're guaranteed to have a higher revision number.
Git has a similar goal—tracking changes—but has no centralized server. This difference is crucial. Where SVN is centralized, Git is distributed; therefore, Git has no way to provide an increasing revision number, because there is no "latest revision." It still has unique revision IDs; they are just not as useful on their own as the SVN revision numbers.
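To make the point about revision IDs concrete, here is a minimal sketch that makes two commits in a throwaway repository and prints their IDs. The directory name, file name, and identity are placeholders I made up for the illustration; nothing here touches a real project.

```shell
set -e
work=$(mktemp -d)                         # scratch area for the demo
cd "$work"
git init -q ids-demo
cd ids-demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master", as in this article
git config user.name "Example User"       # placeholder identity, stored in this repository only
git config user.email "user@example.invalid"

echo one > notes.txt
git add notes.txt
git commit -q -m 'first change'
first_id=$(git rev-parse HEAD)

echo two >> notes.txt
git commit -q -a -m 'second change'
second_id=$(git rev-parse HEAD)

# The IDs are hashes, so neither one is "higher" than the other
echo "first:  $first_id"
echo "second: $second_id"

# The closest thing to a revision number is a count of commits reachable from HEAD
commit_count=$(git rev-list --count HEAD)
echo "commits on this branch: $commit_count"
```

The count is only meaningful inside one repository and one branch, which is why it cannot stand in for SVN's global revision number.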
With Git, the crucial action is no longer the commit; it is the merge. Anyone can clone a repository and commit to the clone. The owner of the repository is given the choice of merging changes back. Alternatively, developers can push changes back to the repository. I'll explore only the latter, authorized-push model.
Keeping a directory under SVN
Let's start with a simple common example: tracking a directory's contents with SVN. You'll need a SVN server and, obviously, a directory of files, as well as an account on that server with commit rights for at least one path. Get started by adding and committing the directory:
Listing 1. Setting up a directory under SVN
% svn co http://svnserver/...some path here.../top
% cd top
% cp -r ~/my_directory .
% svn add my_directory
% svn commit -m 'added directory'
What does this let you do? Now you can get the latest version of any file committed under this directory, delete files, rename them, create new files or directories, commit changes to existing files, and more:
Listing 2. Basic file operations under SVN
# get latest
% svn up
# what's the status?
% svn st
# delete files
% svn delete
# rename files (really a delete + add that keeps history)
% svn rename
# make directory
% svn mkdir
# add file
% svn add
# commit changes (everything above, plus any content changes)
% svn commit
I won't examine these commands in detail here, but do keep them in mind. For help on any of these commands, just type svn help COMMAND, and Subversion will show you some basic help; go to the manual for more.
Keeping a directory under Git
I'll follow the same path as I did with the SVN example. As before, I'm assuming you already have a directory full of data.
For the remote server, I'll use the free github.com service, but, of course, you can set up your own server if you like. GitHub is an easy way to play with a remote Git repository. As of this writing, for a free account you're limited to 300MB of data and your repository must be public. I signed up as user "tzz" and created a public repository called "datatest"; feel free to use it. I gave my public SSH key; you should generate one if you don't have one already. You may also want to try the Gitorious server or repo.or.cz. You'll find a long list of Git hosting services on the git.or.cz Wiki (see Related topics for a link).
One nice thing about GitHub is that it's friendly. It tells you exactly what commands are needed to set up Git and initialize the repository. I'll walk through those with you.
First, you need to install Git, which is different on every platform, and then initialize it. The Git download page (see Related topics) lists a number of options depending on platform. (On Mac OS X, I used the port install git-core command, but you need to set up MacPorts first. There's also a standalone Mac OS X Git installer linked from the Git download page; that will probably work better for most people.)
Once you have it installed, here are the commands I used for a basic setup (pick your own user name and e-mail address, naturally):
Listing 3. Basic Git setup
% git config --global user.name "Ted Zlatanov"
% git config --global user.email "firstname.lastname@example.org"
Already you might see a difference from SVN; there, your user identity was server-side and you were whomever the server said you were. In Git, you can be The Wonderful Monkey Of Wittgenstein if you want (I resisted the temptation).
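As a small illustration of identity being a client-side setting, the sketch below sets a name and e-mail for one scratch repository and reads them back from a commit. The e-mail address and file name are invented for the example, and I use repository-level settings (no --global) so the demo leaves your real configuration alone.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q identity-demo
cd identity-demo
git symbolic-ref HEAD refs/heads/master
# Without --global, these settings apply to this repository only
git config user.name "The Wonderful Monkey Of Wittgenstein"
git config user.email "monkey@example.invalid"   # made-up address

echo hello > greeting.txt
git add greeting.txt
git commit -q -m 'who am I?'

# %an and %ae are the author name and e-mail recorded in the commit
author=$(git log -1 --format='%an <%ae>')
echo "$author"
```

Whatever the local configuration says is what ends up in the commit; a server can still decide whether to accept the push.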
Next, I set up the data files and initialize my repository with them. (GitHub will also import from a public SVN repository, which can be helpful.)
Listing 4. Directory setup and first commit
# grab some files
% cp -rp ~/.gdbinit gdbinit
% mkdir fortunes
% cp -rp ~/.fortunes.db fortunes/data.txt
# initialize
% git init
# "Initialized empty Git repository in /Users/tzz/datatest/.git/"
# add the file and the directory
% git add gdbinit fortunes
% git commit -m 'initializing'
#[master (root-commit) b238ddc] initializing
# 2 files changed, 2371 insertions(+), 0 deletions(-)
# create mode 100644 fortunes/data.txt
# create mode 100644 gdbinit
In the output above, Git is telling us about file modes; 100644 refers to the octal version of the permission bits on those files. You don't need to worry about that, but 2371 insertions is puzzling. It only changed two files, right? That number actually refers to the number of lines inserted. We didn't delete any, of course.
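If you want to check for yourself that "insertions" counts lines rather than files, here is a minimal sketch in a scratch repository. The three-line file and the identity are invented for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q stat-demo
cd stat-demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"            # placeholder identity
git config user.email "user@example.invalid"

# One file, three lines
printf 'line 1\nline 2\nline 3\n' > three-lines.txt
git add three-lines.txt
git commit -q -m 'three lines'

# --numstat prints "added<TAB>deleted<TAB>path"; --format= suppresses the commit header
stats=$(git show --numstat --format= HEAD)
echo "$stats"
added=$(printf '%s\n' "$stats" | cut -f1)
echo "lines added: $added"
```

One file went in, but the first column reports three, matching the number of lines.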
How about pushing our new changes to the GitHub server? The docs tell us how to add a remote server called "origin" (you can use any name). I should mention here that if you want to learn more about any Git command, for example git remote, you'd type git remote --help or git help remote. This is typical for command-line tools, and SVN does something very similar.
Listing 5. Push the changes to the remote
# remember the remote repository is called "datatest"?
% git remote add origin git@github.com:tzz/datatest.git
# push the changes
% git push origin master
#Warning: Permanently added 'github.com,126.96.36.199' (RSA) to the list of known hosts.
#Counting objects: 5, done.
#Delta compression using 2 threads.
#Compressing objects: 100% (4/4), done.
#Writing objects: 100% (5/5), 29.88 KiB, done.
#Total 5 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
# * [new branch] master -> master
The warning is from OpenSSH because github.com was not a known host before. Nothing to worry about.
Git messages are, shall we say, thorough. Unlike SVN's messages, which are easy to understand, Git is written for mentats, by mentats. If you're from Frank Herbert's Dune universe and are trained as a human computer, you've probably already written your own version of Git, just because you can. For the rest of us, delta compression and the number of threads used by it are just not very relevant (and they make our heads hurt).
The push was done over SSH, but you can use other protocols, such as HTTP,
HTTPS, rsync, and file. See
git push --help.
Here is the most important basic difference between SVN and Git. SVN's commit says "push this to the central server." Until you commit in SVN, your changes are ethereal. With Git, your commit is local, and you have a local repository no matter what happens on the remote side. You can back out a change, branch, commit to the branch, and so on without any interaction with the remote server. Pushing with Git is effectively a synchronization of your repository's state with the remote server.
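The split between committing and pushing can be sketched without any network at all. In the example below, a bare repository in a temporary directory stands in for the remote server; that stand-in, the file name, and the identity are my assumptions for the demo, not part of the GitHub setup above.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git              # local stand-in for the remote server
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"
git remote add origin "$work/remote.git"

echo data > file.txt
git add file.txt
git commit -q -m 'local only so far'

# The commit exists locally, but the remote has no branches yet
before=$(git ls-remote --heads origin | wc -l | tr -d ' ')

git push -q origin master

# Only now does the remote know about master
after=$(git ls-remote --heads origin | wc -l | tr -d ' ')
echo "remote branches before push: $before, after push: $after"
```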
All right, so finally let's see the Git log of what just happened:
Listing 6. The Git log
% git log
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <email@example.com>
#Date: ...commit date here...
#
# initializing
Only the commit is in the log (note the long, random-looking commit ID as opposed to the SVN revision number). There is no mention of the push.
Collaborating through Git
So far we've been using Git as a SVN replacement. Of course, to make it interesting, we have to get multiple users and changesets involved. I'll check out the repository to another machine (running Ubuntu GNU/Linux in this case; you'll need to install Git there as well).
Listing 7. Setting up another Git identity and checking out the repository
% git config --global user.name "The Other Ted"
% git config --global user.email "firstname.lastname@example.org"
% git clone git@github.com:tzz/datatest.git
#Initialized empty Git repository in /home/tzz/datatest/.git/
#Warning: Permanently added 'github.com,188.8.131.52' (RSA) to the list of known hosts.
#remote: Counting objects: 5, done.
#remote: Compressing objects: 100% (4/4), done.
#Indexing 5 objects...
#remote: Total 5 (delta 0), reused 0 (delta 0)
# 100% (5/5) done
% ls datatest
#fortunes gdbinit
% ls -a datatest/.git
# . .. branches config description HEAD hooks index info logs objects refs
% ls -a datatest/.git/hooks
# . .. applypatch-msg commit-msg post-commit post-receive post-update
# pre-applypatch pre-commit pre-rebase update
Again, notice the OpenSSH warning indicating we have not done business with GitHub over SSH before from this machine. The git clone command is like a SVN checkout, but instead of getting a synthesized version of the contents (a snapshot as of a particular revision, or the latest revision), you are getting the whole repository.
I included the contents of the datatest/.git directory and the hooks subdirectory under it to show that you really do get everything. Git keeps no secrets by default, unlike SVN, which keeps the repository private by default and only allows access to snapshots.
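To see that a clone really carries the full history, the following sketch clones one scratch repository from another on the same disk and compares commit counts. The repository names, file name, and identity are made up for the illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"

# Three commits, each appending one line
for n in 1 2 3; do
    echo "change $n" >> history.txt
    git add history.txt
    git commit -q -m "change $n"
done

cd "$work"
git clone -q source copy

# The clone carries every commit, not just the latest snapshot
source_commits=$(git -C source rev-list --count HEAD)
copy_commits=$(git -C copy rev-list --count HEAD)
echo "source has $source_commits commits, clone has $copy_commits"

# Older versions of a file can be read from the clone without contacting the source
oldest=$(git -C copy show HEAD~2:history.txt)
echo "history.txt two commits ago: $oldest"
```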
Incidentally, if you want to enforce some rules on your Git repository, whether on every commit or at other times, the hooks are the place. They are shell scripts, much like the SVN hooks, and have the same "return zero for success, anything else for failure" standard UNIX convention. I won't go into more detail on hooks here, but if your ambition is to use Git in a team, you should definitely read up on them.
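As a taste of what a hook looks like, here is a minimal sketch of a pre-commit hook in a scratch repository. The rule it enforces (reject any commit that adds the text "DO NOT COMMIT") is a hypothetical policy I invented for the example, as are the file name and identity.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q hook-demo
cd hook-demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"

# Hypothetical rule: refuse any commit whose staged changes add "DO NOT COMMIT"
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached | grep -q '^+.*DO NOT COMMIT'; then
    echo "pre-commit: found a DO NOT COMMIT marker" >&2
    exit 1      # non-zero means failure, so Git aborts the commit
fi
exit 0          # zero means success
EOF
chmod +x .git/hooks/pre-commit

echo 'DO NOT COMMIT: debugging hack' > scratch.txt
git add scratch.txt
if git commit -q -m 'should be rejected'; then
    hook_result=accepted
else
    hook_result=rejected
fi
echo "commit was $hook_result"
```

Hooks live in each clone's .git/hooks directory and are not copied by git clone, so a team that relies on them needs its own way of distributing them.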
All right, so "The Other Ted" is frisky and wants to add a new file in the master branch (roughly equivalent to SVN's TRUNK) and also make a new branch with some changes to the gdbinit file.
Listing 8. Adding a file and making a new branch
# get a file to add...
% cp ~/bin/encode.pl .
% git add encode.pl
% git commit -m 'adding encode.pl'
#Created commit 6750342: adding encode.pl
# 1 files changed, 1 insertions(+), 0 deletions(-)
# create mode 100644 encode.pl
% git log
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <firstname.lastname@example.org>
#Date: ...commit date here...
#
# adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <email@example.com>
#Date: ...commit date here...
#
# initializing
% git branch empty-gdbinit
% git branch
# empty-gdbinit
#* master
% git checkout empty-gdbinit
#Switched to branch "empty-gdbinit"
% git branch
#* empty-gdbinit
# master
% git add gdbinit
% git commit -m 'empty gdbinit'
#Created commit 5512d0a: empty gdbinit
# 1 files changed, 0 insertions(+), 1005 deletions(-)
% git push
#updating 'refs/heads/master'
# from b238ddca99ee582e1a184658405e2a825f0815da
# to 675034202629e5497ed10b319a9ba42fc72b33e9
#Generating pack...
#Done counting 4 objects.
#Result has 3 objects.
#Deltifying 3 objects...
# 100% (3/3) done
#Writing 3 objects...
# 100% (3/3) done
#Total 3 (delta 0), reused 0 (delta 0)
That was a long example and I hope you didn't fall asleep; if you did, I hope you dreamt of Git repositories synchronizing in an endless waltz of changesets. (Oh, you'll have those dreams, don't worry.)
First, I added a file (encode.pl, only one line) and committed it. After the commit, the remote repository at GitHub did not have any idea I had made changes. I then made a new branch called empty-gdbinit and switched to it (I could have done this with git checkout -b empty-gdbinit as well). In that branch, I emptied the gdbinit file and committed that change. Finally, I pushed to the remote server.
If I switch to the master branch, I won't see the empty gdbinit in the logs. So, each branch has its own log, which makes sense.
Listing 9. Looking at logs between branches
# we are still in the empty-gdbinit branch
% git log
#commit 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#Author: The Other Ted <firstname.lastname@example.org>
#Date: ...commit date here...
#
# empty gdbinit
#
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <email@example.com>
#Date: ...commit date here...
#
# adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <firstname.lastname@example.org>
#Date: ...commit date here...
#
# initializing
% git checkout master
#Switched to branch "master"
% git log
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <email@example.com>
#Date: ...commit date here...
#
# adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <firstname.lastname@example.org>
#Date: ...commit date here...
#
# initializing
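Switching branches to compare logs works, but Git can also be asked directly which commits one branch has that another lacks. This is a minimal sketch in a scratch repository; the one-line gdbinit content and the identity are invented, and the branch name is borrowed from the example above.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q branch-demo
cd branch-demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"

echo 'set history save on' > gdbinit       # made-up file content
git add gdbinit
git commit -q -m 'initializing'

git checkout -q -b empty-gdbinit           # create the branch and switch to it in one step
: > gdbinit                                # truncate the file
git commit -q -a -m 'empty gdbinit'

# master..empty-gdbinit means "commits reachable from empty-gdbinit but not from master"
only_on_branch=$(git log --format=%s master..empty-gdbinit)
on_master=$(git log --format=%s master)
echo "only on empty-gdbinit: $only_on_branch"
echo "on master:             $on_master"
```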
When I did the push, Git said, "Hey, look at that, a new file called encode.pl" on GitHub's servers.
GitHub's Web interface will now display encode.pl. But there's still only one branch on GitHub. Why was the empty-gdbinit branch not synchronized? It's because Git doesn't assume you want to push branches and their changes by default. For that, you need to push everything:
% git push --all
#updating 'refs/heads/empty-gdbinit'
# from 0000000000000000000000000000000000000000
# to 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#updating 'refs/remotes/origin/HEAD'
# from 0000000000000000000000000000000000000000
# to b238ddca99ee582e1a184658405e2a825f0815da
#updating 'refs/remotes/origin/master'
# from 0000000000000000000000000000000000000000
# to b238ddca99ee582e1a184658405e2a825f0815da
#Generating pack...
#Done counting 5 objects.
#Result has 3 objects.
#Deltifying 3 objects...
# 100% (3/3) done
#Writing 3 objects...
# 100% (3/3) done
#Total 3 (delta 1), reused 0 (delta 0)
Again, the mentat interface is here in full glory. But we can figure things out, right? We may not be mentats, but at least we have the common sense to figure that 0000000000000000000000000000000000000000 is some kind of special initial ID. We can also see from the logs in Listing 9 that commit 5512d0a4327416c499dcb5f72c3f4f6a257d209f is the last (and only) commit in the empty-gdbinit branch. The rest might as well be in Aramaic for most users; they just won't care. GitHub will now show the new branch and the changes in it.
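The effect of pushing one branch versus pushing all of them can be reproduced locally. In this sketch a bare repository in a temporary directory plays the part of GitHub; that stand-in, the file name, and the identity are assumptions for the demo.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git              # local stand-in for the remote server
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"
git remote add origin "$work/remote.git"

echo data > file.txt
git add file.txt
git commit -q -m 'initializing'
git branch empty-gdbinit                   # second local branch

# Pushing master sends only master
git push -q origin master
heads_after_master=$(git ls-remote --heads origin | wc -l | tr -d ' ')

# --all sends every local branch
git push -q --all origin
heads_after_all=$(git ls-remote --heads origin | wc -l | tr -d ' ')
echo "remote branches: $heads_after_master after pushing master, $heads_after_all after --all"
```

If you only want to publish one extra branch, naming it explicitly (git push origin empty-gdbinit) does that without sending the others.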
You can use git mv and git rm to manage files, renaming and removing them, respectively.
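Here is a minimal sketch of git mv and git rm in a scratch repository. The file names and identity are invented for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q files-demo
cd files-demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.invalid"

echo a > old-name.txt
echo b > unwanted.txt
git add old-name.txt unwanted.txt
git commit -q -m 'two files'

git mv old-name.txt new-name.txt           # rename the file and stage the rename
git rm -q unwanted.txt                     # delete the file and stage the removal
git commit -q -m 'rename one file, remove another'

# ls-files lists what Git is tracking now
tracked=$(git ls-files)
echo "$tracked"
```

As with the SVN equivalents, nothing is final until the commit; both commands only stage the change.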
In this article, I explained basic Git concepts and used Git to keep a simple directory's contents under version control, comparing Git with Subversion along the way. I explained branching using a simple example.
In Part 2, I will explore merging, generating diffs, and some of the other Git commands. I strongly encourage you to read the very understandable Git manual or at least go through the tutorial. It's all accessible from the Git home page, so please spend some time exploring it. (See the Related topics below for links.) From the perspective of a SVN user, you don't need much more.
On the other hand, Git is a very rich DVCS; learning more about its features will almost certainly lead to using them to simplify and improve your VCS workflow. Plus, you may even have a dream or two about Git repositories.
Related topics
- The Git - SVN Crash Course is a handy reference for those already familiar with SVN. Another good tutorial is Flavio Castelli's Howto use Git and SVN together.
- For Git hosting resources, try GitHub, Gitorious, repo.or.cz, and this list of public Git hosting sites.
- Wikipedia's take on Subversion is quite full-featured, as is the entry on Git.
- Hear from Robby Russell, who migrated from Subversion to Git and actually lived to tell about it.
- Get Git, and see the Git download page, as well as tons of documentation and tools.
- In the developerWorks Linux zone, find more resources for Linux developers, and scan our most popular articles and tutorials.
- See all Linux tips and Linux tutorials on developerWorks.