Git for Subversion users, Part 1: Getting started

Git gets demystified for Subversion version control system users

Distributed version control systems (DVCSs) offer a number of advantages over centralized VCSs, and for Subversion users looking to explore this model, Git is a great place to start. Using Subversion as a baseline, this first of two articles shows how to install Git, set up a remote repository, and begin using basic Git commands.

Teodor Zlatanov, Programmer, Gold Software Systems

Teodor Zlatanov emerged with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java, C, and C++. His interests are in open source work on text parsing, database architectures, user interfaces, and UNIX system administration.



04 August 2009


For anyone unfamiliar with free and open source version control systems (VCSs), Subversion has become the standard non-commercial VCS, replacing the old champ, Concurrent Versions System (CVS). CVS is still just fine for limited use, but Subversion's allure is that it requires only a little bit of setup on a Web server and not much beyond that. Subversion does have some issues, which I'll discuss here, but for the most part, it just works.

So, why do we need another one? Git (capital "G"; git is the command-line tool) is in many ways designed to be better than Subversion. It is one of many distributed VCSs. My own first experience with these was with Arch/tla, as well as Mercurial, Bazaar, darcs, and a few others. For many reasons, which I'll discuss as far as they are relevant, Git has become popular and is often considered together with Subversion as the two leading choices for a personal or corporate VCS.

There are two important reasons to be interested in Git if you are a Subversion user.

  • You are looking to move to Git because Subversion is limiting you in some way.
  • You are curious about Git and want to find out how it compares to Subversion.

Well, perhaps there's a third reason: Git is a relatively hot technology you want to include on your resume. I hope that's not your primary goal; learning about Git is one of the most rewarding things a developer can do. Even if you don't use Git now, the concepts and workflow embodied in this distributed VCS are certain to be crucial knowledge for most segments of the IT industry in the next 10 years as the industry undergoes massive changes in scope and geographical distribution.

Finally, though it might not be a compelling reason if you're not a Linux kernel developer, the kernel and a number of other important projects are maintained using Git, so you'll want to be familiar with it if you plan to contribute.

This article is intended for beginning-to-intermediate Subversion users. It requires beginner-level knowledge of Subversion and some general knowledge of version control systems. The information here is mainly for users of UNIX®-like (Linux® and Mac OS X) systems, with a little bit thrown in for Windows® users.

Part 2 of this series will discuss more advanced uses of Git: merging branches, generating diffs, and other common tasks.

Subversion and Git basics

Henceforth, I'll abbreviate "Subversion" as "SVN" to save wear and tear on my U, B, E, R, S, I, and O keys.

git-svn

You may have heard of git-svn, a tool that lets you use Git against a Subversion repository. Though useful in some situations, being halfway distributed and using a centralized VCS is not the same as switching to a distributed VCS.

So, what's SVN good for? You might already know this, but a VCS is not about files; it's about changes. SVN, running on a central server, adds changes to its repository of data and can give you a snapshot after every change. That snapshot has a revision number; the revision number is very important to SVN and the people who use it. If your change goes in after mine, you're guaranteed to have a higher revision number.

Git has a similar goal—tracking changes—but has no centralized server. This difference is crucial. Where SVN is centralized, Git is distributed; therefore, Git has no way to provide an increasing revision number, because there is no "latest revision." It still has unique revision IDs; they are just not as useful on their own as the SVN revision numbers.
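To make the revision ID point concrete, here is a minimal sketch that builds a throwaway repository and prints the IDs of two successive commits. The directory, file name, and identity are placeholders I made up for illustration, and it assumes git and mktemp are installed.

```shell
#!/bin/sh
# Sketch: Git commit IDs are hashes, not a rising counter.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m 'first change'
first=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -a -m 'second change'
second=$(git rev-parse HEAD)

echo "first:  $first"
echo "second: $second"

# The order of commits is recorded by the parent link, not by the ID's value.
parent=$(git rev-parse HEAD^)
echo "parent of second: $parent"
```

Neither ID tells you which commit came first; only the parent link does, which is why the IDs are less useful on their own than SVN revision numbers.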

With Git, the crucial action is no longer the commit; it is the merge. Anyone can clone a repository and commit to the clone. The owner of the repository is given the choice of merging changes back. Alternatively, developers can push changes back to the repository. I'll explore only the latter, authorized-push model.


Keeping a directory under SVN

Let's start with a simple common example: tracking a directory's contents with SVN. You'll need an SVN server and, obviously, a directory of files, as well as an account on that server with commit rights for at least one path. Get started by adding and committing the directory:

Listing 1. Setting up a directory under SVN
% svn co http://svnserver/...some path here.../top
% cd top
% cp -r ~/my_directory .
% svn add my_directory
% svn commit -m 'added directory'

What does this let you do? Now you can get the latest version of any file committed under this directory, delete files, rename them, create new files or directories, commit changes to existing files, and more:

Listing 2. Basic file operations under SVN
# get latest
% svn up
# what's the status?
% svn st
# delete files
% svn delete
# rename files (really a delete + add that keeps history)
% svn rename
# make directory
% svn mkdir
# add file
% svn add
# commit changes (everything above, plus any content changes)
% svn commit

I won't examine these commands in detail here, but do keep them in mind. For help on any of these commands, just type svn help COMMAND, and Subversion will show you some basic help; go to the manual for more.


Keeping a directory under Git

I'll follow the same path as I did with the SVN example. As before, I'm assuming you already have a directory full of data.

For the remote server, I'll use the free github.com service, but, of course, you can set up your own server if you like. GitHub is an easy way to play with a remote Git repository. As of this writing, for a free account you're limited to 300MB of data and your repository must be public. I signed up as user "tzz" and created a public repository called "datatest"; feel free to use it. I gave my public SSH key; you should generate one if you don't have one already. You may also want to try the Gitorious server or repo.or.cz. You'll find a long list of Git hosting services on the git.or.cz Wiki (see Resources for a link).

One nice thing about GitHub is that it's friendly. It tells you exactly what commands are needed to set up Git and initialize the repository. I'll walk through those with you.

First, you need to install Git, which is different on every platform, and then initialize it. The Git download page (see Resources) lists a number of options depending on platform. (On Mac OS X, I used the port install git-core command, but you need to set up MacPorts first. There's also a standalone Mac OS X Git installer linked from the Git download page; that will probably work better for most people.)

Once you have it installed, here are the commands I used for a basic setup (pick your own user name and e-mail address, naturally):

Listing 3. Basic Git setup
% git config --global user.name "Ted Zlatanov"
% git config --global user.email "tzz@lifelogs.com"

Already you might see a difference from SVN; there, your user identity was server-side and you were whoever the server said you were. In Git, you can be The Wonderful Monkey Of Wittgenstein if you want (I resisted the temptation).
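If you want to see the client-side identity in action without touching your global settings, the following sketch sets it per repository instead (no --global flag) in a scratch directory. The e-mail address and file name are invented for the example.

```shell
#!/bin/sh
# Sketch: the author recorded in a commit is whatever the local config says.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q

# Per-repository identity; these values are placeholders.
git config user.name "The Wonderful Monkey Of Wittgenstein"
git config user.email "monkey@example.com"

echo hi > hello.txt
git add hello.txt
git commit -q -m 'whoever I say I am'

# %an and %ae are the author name and e-mail stored in the commit.
author=$(git log -1 --format='%an <%ae>')
echo "$author"
```

A server can still refuse a push, of course, but the name stored in the commit itself comes from your own configuration.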

Next, I set up the data files and initialize my repository with them. (GitHub will also import from a public SVN repository, which can be helpful.)

Listing 4. Directory setup and first commit
# grab some files
% cp -rp ~/.gdbinit gdbinit
% mkdir fortunes
% cp -rp ~/.fortunes.db fortunes/data.txt
# initialize
% git init
# "Initialized empty Git repository in /Users/tzz/datatest/.git/"
# add the file and the directory
% git add gdbinit fortunes
% git commit -m 'initializing'
#[master (root-commit) b238ddc] initializing
# 2 files changed, 2371 insertions(+), 0 deletions(-)
# create mode 100644 fortunes/data.txt
# create mode 100644 gdbinit

In the output above, Git is telling us about file modes; 100644 refers to the octal version of the permission bits on those files. You don't need to worry about that, but the 2371 insertions is puzzling. It only changed two files, right? That number actually refers to the number of lines inserted. We didn't delete any, of course.
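A small experiment, sketched here with two made-up files of three and two lines, shows that the insertion count really is a line count and where the 100644 mode can be read back. The file names and identity are assumptions for the example only.

```shell
#!/bin/sh
# Sketch: "insertions" counts lines, not files.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'a\nb\nc\n' > three.txt             # three lines
printf 'x\ny\n'    > two.txt               # two lines
git add three.txt two.txt
git commit -q -m 'two files'

# Summary line for the commit, similar to what git commit prints.
git show --shortstat --format= HEAD

# Add up the per-file "lines added" column.
inserted=$(git show --numstat --format= HEAD | awk '{ sum += $1 } END { print sum }')
echo "lines inserted: $inserted"

# The mode Git stored for one of the files.
mode=$(git ls-files -s three.txt | cut -d' ' -f1)
echo "stored mode: $mode"
```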

How about pushing our new changes to the GitHub server? The docs tell us how to add a remote server called "origin" (you can use any name). I should mention here that if you want to learn more about any Git command, for example, git remote, you'd type git remote --help or git help remote. This is typical for command-line tools, and SVN does something very similar.

Listing 5. Push the changes to the remote
# remember the remote repository is called "datatest"?
% git remote add origin git@github.com:tzz/datatest.git
# push the changes
% git push origin master
#Warning: Permanently added 'github.com,65.74.177.129' (RSA) to the list of known hosts.
#Counting objects: 5, done.
#Delta compression using 2 threads.
#Compressing objects: 100% (4/4), done.
#Writing objects: 100% (5/5), 29.88 KiB, done.
#Total 5 (delta 0), reused 0 (delta 0)
#To git@github.com:tzz/datatest.git
# * [new branch]      master -> master

The warning is from OpenSSH because github.com was not a known host before. Nothing to worry about.

Git messages are, shall we say, thorough. Unlike SVN's messages, which are easy to understand, Git's are written for mentats, by mentats. If you're from Frank Herbert's Dune universe and are trained as a human computer, you've probably already written your own version of Git, just because you can. For the rest of us, delta compression and the number of threads used by it are just not very relevant (and they make our heads hurt).

The push was done over SSH, but you can use other protocols, such as HTTP, HTTPS, rsync, and file. See git push --help.

Here is the crucial, most important, basic difference between SVN and Git. SVN's commit says "push this to the central server." Until you commit in SVN, your changes are ethereal. With Git, your commit is local, and you have a local repository no matter what happens on the remote side. You can back out a change, branch, commit to the branch, and so on without any interaction with the remote server. Pushing with Git is effectively a synchronization of your repository's state with the remote server.
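You can watch this difference without a GitHub account. The sketch below stands in a bare repository on the local disk for the remote server (an assumption made so the example runs offline); the paths and identity are placeholders.

```shell
#!/bin/sh
# Sketch: a commit is local; only the push changes the remote.
set -e
work=$(mktemp -d)                          # throwaway directory
git init -q --bare "$work/remote.git"      # plays the role of the server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # use the branch name from this article
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo hello > notes.txt
git add notes.txt
git commit -q -m 'local only, so far'

# The remote still has no branches at this point.
before=$(git ls-remote --heads origin | wc -l | tr -d ' ')

git push -q origin master

# Now the remote has caught up.
after=$(git ls-remote --heads origin | wc -l | tr -d ' ')
echo "remote branches before push: $before, after push: $after"
```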

All right, so finally let's see the Git log of what just happened:

Listing 6. The Git log
% git log
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <tzz@lifelogs.com>
#Date:   ...commit date here...
#
#    initializing

Only the commit is in the log (note the long, random-looking commit ID as opposed to the SVN revision number). There is no mention of the synchronization via git push.


Collaborating through Git

So far we've been using Git as an SVN replacement. Of course, to make it interesting, we have to get multiple users and changesets involved. I'll check out the repository to another machine (running Ubuntu GNU/Linux in this case; you'll need to install git-core and not git):

Listing 7. Setting up another Git identity and checking out the repository
% git config --global user.name "The Other Ted"
% git config --global user.email "tzz@bu.edu"
% git clone git@github.com:tzz/datatest.git
#Initialized empty Git repository in /home/tzz/datatest/.git/
#Warning: Permanently added 'github.com,65.74.177.129' (RSA) to the list of known hosts.
#remote: Counting objects: 5, done.
#remote: Compressing objects: 100% (4/4), done.
#Indexing 5 objects...
#remote: Total 5 (delta 0), reused 0 (delta 0)
# 100% (5/5) done
% ls datatest
#fortunes  gdbinit
% ls -a datatest/.git
# .  ..  branches  config  description  HEAD  hooks  index  info  logs  objects  refs
% ls -a datatest/.git/hooks
# .  ..  applypatch-msg  commit-msg  post-commit  post-receive post-update
#  pre-applypatch  pre-commit  pre-rebase  update

Again, notice the OpenSSH warning indicating we have not done business with GitHub over SSH before from this machine. The git clone command is like an SVN checkout, but instead of getting a synthesized version of the contents (a snapshot as of a particular revision, or the latest revision), you are getting the whole repository.

I included the contents of the datatest/.git directory and the hooks subdirectory under it to show that you really do get everything. Git keeps no secrets by default, unlike SVN, which keeps the repository private by default and only allows access to snapshots.
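As a quick check that a clone carries the full history, here is a sketch that clones a small local repository (standing in for GitHub, so no network is needed) and counts the commits available in the copy. All names are invented for the example.

```shell
#!/bin/sh
# Sketch: a clone contains every commit, not just the latest snapshot.
set -e
work=$(mktemp -d)                          # throwaway directory
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > a.txt
git add a.txt
git commit -q -m 'first'
echo two >> a.txt
git commit -q -a -m 'second'

cd "$work"
git clone -q "$work/origin" copy
cd copy

# Both commits are in the clone's own object store.
commits=$(git rev-list --count HEAD)
echo "commits available locally in the clone: $commits"
ls -a .git
```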

Incidentally, if you want to enforce some rules on your Git repository, whether on every commit or at other times, the hooks are the place. The sample hooks are shell scripts, much like the SVN hooks (any executable will do), and they follow the same standard UNIX convention: return zero for success, anything else for failure. I won't go into more detail on hooks here, but if your ambition is to use Git in a team, you should definitely read up on them.
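To give a flavor of the exit-code convention, here is a sketch of a pre-commit hook. The policy it enforces (refusing any staged change that contains the marker DO-NOT-COMMIT) is entirely made up for illustration; a real team would substitute its own checks.

```shell
#!/bin/sh
# Sketch: a pre-commit hook that vetoes a commit by exiting non-zero.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Hypothetical policy: refuse staged changes that contain DO-NOT-COMMIT.
if git diff --cached | grep -q 'DO-NOT-COMMIT'; then
    echo "pre-commit: found DO-NOT-COMMIT, refusing" >&2
    exit 1
fi
exit 0
EOF
chmod +x .git/hooks/pre-commit

echo "fine" > notes.txt
git add notes.txt
git commit -q -m 'a clean commit'

echo "DO-NOT-COMMIT: scratch work" >> notes.txt
git add notes.txt
if git commit -q -m 'should be refused'; then
    result=accepted
else
    result=rejected
fi
echo "second commit was $result"
commits=$(git rev-list --count HEAD)
```

Note that hooks live in your own .git directory, so a hook like this one guards only the repository where it is installed.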

All right, so "The Other Ted" is frisky and wants to add a new file in the master branch (roughly equivalent to SVN's trunk) and also make a new branch with some changes to the gdbinit file.

Listing 8. Adding a file and making a new branch
# get a file to add...
% cp ~/bin/encode.pl .
% git add encode.pl
% git commit -m 'adding encode.pl'
#Created commit 6750342: adding encode.pl
# 1 files changed, 1 insertions(+), 0 deletions(-)
# create mode 100644 encode.pl
% git log
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <tzz@bu.edu>
#Date:   ...commit date here...
#
#    adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <tzz@lifelogs.com>
#Date:   ...commit date here...
#
#    initializing
% git branch empty-gdbinit
% git branch
#  empty-gdbinit
#* master
% git checkout empty-gdbinit
#Switched to branch "empty-gdbinit"
% git branch
#* empty-gdbinit
#  master
# empty the file, then stage the change
% cat /dev/null > gdbinit
% git add gdbinit
% git commit -m 'empty gdbinit'
#Created commit 5512d0a: empty gdbinit
# 1 files changed, 0 insertions(+), 1005 deletions(-)
% git push
#updating 'refs/heads/master'
#  from b238ddca99ee582e1a184658405e2a825f0815da
#  to   675034202629e5497ed10b319a9ba42fc72b33e9
#Generating pack...
#Done counting 4 objects.
#Result has 3 objects.
#Deltifying 3 objects...
# 100% (3/3) done
#Writing 3 objects...
# 100% (3/3) done
#Total 3 (delta 0), reused 0 (delta 0)

That was a long example and I hope you didn't fall asleep; if you did, I hope you dreamt of Git repositories synchronizing in an endless waltz of changesets. (Oh, you'll have those dreams, don't worry.)

First, I added a file (encode.pl, only one line) and committed it. After the commit, the remote repository at GitHub did not have any idea I had made changes. I then made a new branch called empty-gdbinit and switched to it (I could have done this with git checkout -b empty-gdbinit as well). In that branch, I emptied the gdbinit file and committed that change. Finally, I pushed to the remote server.

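If I switch to the master branch, I won't see the empty gdbinit commit in the log. That's because git log shows only the commits reachable from the branch you are on, so each branch effectively has its own log, which makes sense.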
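The same situation can be reproduced in miniature. This sketch reuses the branch and file names from the listings but runs in a scratch repository with placeholder content, and uses the master..empty-gdbinit range to list only what the branch adds.

```shell
#!/bin/sh
# Sketch: each branch's log shows only the commits reachable from it.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this article
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "placeholder settings" > gdbinit
git add gdbinit
git commit -q -m 'initializing'

# Create the branch and switch to it in one step.
git checkout -q -b empty-gdbinit
: > gdbinit                                # empty the file
git commit -q -a -m 'empty gdbinit'

on_branch=$(git rev-list --count empty-gdbinit)
on_master=$(git rev-list --count master)
echo "commits on empty-gdbinit: $on_branch, on master: $on_master"

# Commits that are in empty-gdbinit but not in master.
only_branch=$(git log --format=%s master..empty-gdbinit)
echo "only on the branch: $only_branch"
```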

Listing 9. Looking at logs between branches
# we are still in the empty-gdbinit branch
% git log
#commit 5512d0a4327416c499dcb5f72c3f4f6a257d209f
#Author: The Other Ted <tzz@bu.edu>
#Date:   ...commit date here...
#
#    empty gdbinit
#
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <tzz@bu.edu>
#Date:   ...commit date here...
#
#    adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <tzz@lifelogs.com>
#Date:   ...commit date here...
#
#    initializing
% git checkout master
#Switched to branch "master"
% git log
#commit 675034202629e5497ed10b319a9ba42fc72b33e9
#Author: The Other Ted <tzz@bu.edu>
#Date:   ...commit date here...
#
#    adding encode.pl
#
#commit b238ddca99ee582e1a184658405e2a825f0815da
#Author: Ted Zlatanov <tzz@lifelogs.com>
#Date:   ...commit date here...
#
#    initializing

When I did the push, Git effectively told GitHub's servers, "Hey, look at that, a new file called encode.pl."

GitHub's Web interface will now display encode.pl. But there's still only one branch on GitHub. Why was the empty-gdbinit branch not synchronized? It's because the plain git push above updated only master, the one branch the remote already had; Git doesn't assume you want to publish new local branches and their changes. For that, you can push everything:

Listing 10. Pushing all
% git push --all
#updating 'refs/heads/empty-gdbinit'
#  from 0000000000000000000000000000000000000000
#  to   5512d0a4327416c499dcb5f72c3f4f6a257d209f
#updating 'refs/remotes/origin/HEAD'
#  from 0000000000000000000000000000000000000000
#  to   b238ddca99ee582e1a184658405e2a825f0815da
#updating 'refs/remotes/origin/master'
#  from 0000000000000000000000000000000000000000
#  to   b238ddca99ee582e1a184658405e2a825f0815da
#Generating pack...
#Done counting 5 objects.
#Result has 3 objects.
#Deltifying 3 objects...
# 100% (3/3) done
#Writing 3 objects...
# 100% (3/3) done
#Total 3 (delta 1), reused 0 (delta 0)

Again, the mentat interface is here in full glory. But we can figure things out, right? We may not be mentats, but at least we have the common sense to figure that 0000000000000000000000000000000000000000 is a special placeholder ID meaning the reference did not exist on the remote before. We can also see from the logs in Listing 9 that 5512d0a4327416c499dcb5f72c3f4f6a257d209f is the ID of the latest commit in the empty-gdbinit branch (and the only one that is not also in master). The rest might as well be in Aramaic for most users; they just won't care. GitHub will now show the new branch and the changes in it.
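Here is an offline sketch of publishing a new branch, using the long --all option and a bare repository on disk in place of GitHub (an assumption so the example runs anywhere). The file content and identity are placeholders.

```shell
#!/bin/sh
# Sketch: a new local branch reaches the remote only when you push it.
set -e
work=$(mktemp -d)                          # throwaway directory
git init -q --bare "$work/remote.git"      # plays the role of the server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # use the branch name from this article
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "placeholder settings" > gdbinit
git add gdbinit
git commit -q -m 'initializing'
git push -q origin master

git checkout -q -b empty-gdbinit
: > gdbinit                                # empty the file
git commit -q -a -m 'empty gdbinit'

# The remote knows only about master so far.
heads_before=$(git ls-remote --heads origin | wc -l | tr -d ' ')

git push -q --all origin

heads_after=$(git ls-remote --heads origin | wc -l | tr -d ' ')
echo "remote branches before: $heads_before, after: $heads_after"
git ls-remote --heads origin
```

If you want to publish just one branch rather than all of them, naming it explicitly (git push origin empty-gdbinit) does the same job for that branch alone.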

You can use git mv and git rm to manage files, renaming and removing them, respectively.
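A minimal sketch of both commands, with file names invented for the example:

```shell
#!/bin/sh
# Sketch: renaming and removing tracked files.
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo a > old-name.txt
echo b > unwanted.txt
git add old-name.txt unwanted.txt
git commit -q -m 'two files'

git mv old-name.txt new-name.txt           # rename and stage the rename
git rm -q unwanted.txt                     # delete the file and stage the deletion
git commit -q -m 'rename one file, remove another'

# Only the renamed file is still tracked.
tracked=$(git ls-files)
echo "tracked files: $tracked"
```

As with the SVN equivalents, both commands only stage the change; nothing is recorded until the commit.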


Conclusion

In this article, I explained basic Git concepts and used Git to keep a simple directory's contents under version control, comparing Git with Subversion along the way. I explained branching using a simple example.

In Part 2, I will explore merging, generating diffs, and some of the other Git commands. I strongly encourage you to read the very understandable Git manual or at least go through the tutorial. It's all accessible from the Git home page, so please spend some time exploring it. (See the Resources below for links.) From the perspective of an SVN user, you don't need much more.

On the other hand, Git is a very rich DVCS; learning more about its features will almost certainly lead to using them to simplify and improve your VCS workflow. Plus, you may even have a dream or two about Git repositories.

Resources

Get products and technologies

  • Get Git, and see the Git download page, as well as tons of documentation and tools.
  • With IBM trial software, available for download directly from developerWorks, build your next development project on Linux.
