Git changes the game of distributed Web development
Modern version control systems provide powerful support for collaboration
Version control systems (VCSs) provide a mechanism for applying and managing changes to sets of files in a project and are commonly used in team-oriented software, documentation, and other online development projects. VCSs are just as critical to development projects as system backups, as they enable multiple users to submit changes to the same files or project without one developer's changes accidentally overwriting another's.
Even if Linus Torvalds hadn't developed the Linux® operating system kernel, he might be just as famous for having created the Git VCS. Projects as complex as Linux are the ultimate system test for a VCS, so it's no surprise that Git has quickly evolved into a stable, powerful, and flexible system.
Linux and UNIX® systems are knee-deep in VCSs, ranging from dinosaurs such as the Revision Control System (RCS) and the Concurrent Versions System (CVS) to more modern systems such as Subversion, Mercurial, Bazaar, Arch, and Darcs. Ironically (especially in the Linux domain), Git was developed as a replacement for a commercial VCS called BitKeeper, which had both unique, impressive capabilities and a free version. BitKeeper is still impressive, but licensing changes eventually caused Torvalds to look for alternatives; in the finest free software tradition, he eventually decided to write his own. Like the Linux kernel, Git is now the product of enhancements, bug fixes, and other contributions from countless open source developers.
Git is attractive both because of its power and because of the cost savings associated with free software, and it is rapidly being adopted by many open source software projects, academic institutions, and organizations. Once "inside" a corporate or academic environment, Git's benefits as both a VCS and a platform for collaboration have led to its adoption by many projects outside the traditional "source code" arena. As this article discusses, Git is especially useful in complex, distributed Web development projects that have both content and development requirements and therefore require up-to-date interaction among different people.
Git: more than just a VCS
Git was designed to facilitate distributed development among thousands of developers working at many different locations with different degrees of Internet connectivity without introducing significant performance or access bottlenecks. The most important aspects supporting these fundamental requirements in Git are:
- Using a central repository but providing each developer on a project with a complete copy of the source code for that project. This guarantees that all developers can get work done regardless of their current connectivity.
- Providing fast and reliable support for creating different working sets, known as branches, within a software project and sharing changes across them, known as merges. Branches make it easy to support different versions of a software package, regardless of whether those versions are permanent or were created for experimentation. Merges are a key aspect of a source code control system in general and are especially common in a branch-oriented VCS.
- Making it easy to share in-progress branches and code changes between subsets of developers without requiring that those changes first be checked in to the central repository.
These design decisions and their implementation are key aspects of Git's success and usability. Of course, Git also satisfies standard VCS requirements such as immutability and accountability. Immutability means that once changes are committed to a repository, they are a permanent part of a project's historical record. Even though they can subsequently be undone (known as reverting), both the changes and the replacement code that undoes those changes are a permanent part of project history. Accountability means that it is easy to determine who made a specific change and when that change was made. Why a change was made may still be a mystery (though some comment is required whenever a change is committed), but at least you know whom to ask.
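As a rough illustration of immutability and accountability, the following sketch makes a change, reverts it, and then prints the history. The temporary directory, file contents, and the "Demo User" identity are assumptions invented for this example.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)            # throwaway project directory
cd "$demo"
git init -q

echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

echo "<h1>Oops</h1>" > index.html
git commit -q -a -m "Break the heading"

# Undo the last commit by recording a NEW commit that reverses it
git revert --no-edit HEAD > /dev/null

# Both the mistake and its reversal stay in the record,
# each with an author and a date
git log --format='%h %an %ad %s' --date=short
```

The log should list three commits: the original, the mistake, and the revert. Nothing is erased from the project's history.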
Git uses an internal index to track the state of the files and directories in a repository. It also stores objects that reflect changes to those files and directories to simplify subsequent updates. Git's index and these objects are distinct from the actual files and directories present in a local repository—a model that makes it easy to identify files and directories that have changed locally but have not yet been committed to a local repository or a remote central repository (if one is present). Some Git commands operate on the index, while others operate on actual file and directory contents, which can be confusing if you use the wrong command and wonder why files haven't been updated.
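The distinction between the index and the working files is easier to see in a small experiment. This is a minimal sketch; the file name, its contents, and the "Demo User" identity are assumptions made up for the illustration.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

echo "version 1" > index.html
git add index.html
git commit -q -m "Initial checkin"

echo "version 2" > index.html
git add index.html               # the index now holds version 2
echo "version 3" > index.html    # the working file moves ahead of the index

git diff --cached --stat         # last commit vs. index (version 1 -> 2)
git diff --stat                  # index vs. working file (version 2 -> 3)
git status --short               # MM = staged change plus unstaged change
```

Running git commit at this point would record version 2, not version 3, because commit operates on the index rather than on the file as it currently exists on disk.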
Most Linux distributions provide the packages for Git in their package repository. On Ubuntu, Debian, and similar systems that use the .deb package format, you need to install the git-core package. On RPM-based systems such as Fedora, Red Hat, CentOS, SUSE, and Mandriva, the main Git package is simply called git. The basic Git package requires that Perl, Perl libraries for encryption and error handling, and the patch utility are also installed on your system.
If you need the latest and greatest version of Git for your Linux system, the Git Web site provides links to prepackaged .deb and RPM packages as well as the latest Git source code (if you want to build your own version). The Git site also provides links to precompiled versions of Git for Mac® OS X, native Windows®, Cygwin on Windows systems, and Sun/Oracle Solaris® systems. At the moment, IBM® AIX® and Hewlett-Packard® HP-UX systems administrators have to build Git from source code for their platforms. See Related topics for information about obtaining or building Git for your platform.
The main Git package contains the git executable and a few auxiliary Git applications. As you might expect, a huge number of other Git-related packages are also available. Some of the most popular Git-related packages include:
- git-cola. A GUI for working with files and directories in a Git repository
- git-doc. Installs the Git user's manual, tutorial, and documentation locally
- git-gui. A GUI for browsing and working with Git repositories; uses the gitk package
- git-web. A graphical, Web-based interface to Git repositories
- gitk. A simple GUI for browsing and working with Git repositories
- qgit. A graphical Qt-based application for viewing and exploring Git repositories
The git-gui, git-web, gitk, and qgit packages provide similar functionality, though git-web is Web based, while all the others run locally. Any of these packages can be useful when you're getting started with Git, though the git-web package is probably the best choice in a distributed environment.
If you're interested in experimenting with Git but are already committed to another VCS, the following packages can be useful:
- git-cvs. This package provides interoperability between Git and CVS repositories, enabling you to import CVS repositories and change history into Git, work in Git, merge your changes back to the CVS repository, and import updates from the CVS repository.
- git-svn. This package provides interoperability between Git and Subversion repositories, enabling you to import Subversion repositories and change history into Git, work in Git, merge your changes back to the Subversion repository, and import updates from the Subversion repository.
Common Git commands
Git is what is traditionally referred to as a command suite, meaning that the primary command you use is git and that the first argument on the command line indicates the specific Git command you want to run. You can see a list of common Git commands at any time by running the git command with no arguments. The following is a subset of this list:
- add. Adds a new file to Git's index. You will still have to commit this file to actually add its contents to a Git repository.
- branch. Enables you to list branches that you have checked out, identify the branch that you're currently working on, create new branches, or destroy local copies of branches that you have created or checked out. This command does not switch to other branches: You do that using Git's checkout command.
- checkout. Checks out a specified branch or file/directory. If you are checking out a branch, that branch becomes your working branch. If you specify a certain file or directory, that file or directory will be updated to match the version currently checked in to the repository on the branch where you are working. You can also use this command to create a new branch that is based on and optionally tracks changes to a specified existing branch.
- commit. Records changes to files and directories in Git's index. You can either specify the files and directories whose changes you want to commit, use the -a option to add all pending changes to files Git is tracking, or use the --interactive option to select the file or directory changes that you want to commit together. (The latter is especially useful if you are working on several different tasks involving large numbers of files but want to commit certain changes together.) Commits are made to the local repository; if you are using a remote, central repository, you must still use Git's push command to push your local changes to the remote repository.
- diff. Displays differences between local files and other commits or between files in two different commits. This command is most commonly used by simply specifying a file name, which displays the differences between the specified file and the version of that file committed to your repository on the current branch.
- fetch. Retrieves index updates from another repository, identifying new tags that have been created and providing information about file or directory changes that have been committed to that repository but are not yet present locally. You can then inspect available changes using the git log command. To actually retrieve associated file changes after a fetch, you would use either the
- grep. Looks for and displays matches for a pattern in files in your current branch. This command supports most of your favorite GNU grep options.
- log. Shows commit log information for the current branch or for specified files in the current branch.
- merge. Merges changes from one branch to another. This command provides options to determine whether merged changes are automatically committed, enabling you to explore the impact of a merge before actually accepting those changes.
- mv. Renames a file, directory, or symbolic link that Git is currently tracking.
- pull. Retrieves index updates from another repository and merges them into the files and directories in the current branch.
- push. Updates a remote repository with local index and object change information.
- rebase. Updates your current branch to match a remote branch and modifies any local commits that have not been pushed to the remote branch so that they can be applied to the current state of that remote branch. This is a powerful but potentially dangerous command, because it literally rewrites commits as needed so that they can be merged. Depending on the frequency and extent of changes to a remote repository, simply using the git pull command is often an attractive alternative.
- rm. Removes a file, directory, or symbolic link that Git is currently tracking.
- stash. Temporarily pushes your current changes onto a stack and returns your current checkout to a clean state. git stash save saves your current changes on a local stack, while git stash apply retrieves and reapplies them. This can be useful when you want to fetch remote changes or rebase without permanently committing in-progress changes.
- status. Shows the status of the current branch, identifying changes that have not been committed, files that are not being tracked, and so on.
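Several of the commands above are easiest to understand when used together. The following sketch strings branch, checkout, commit, merge, and stash into one short session. The branch name redesign, the file contents, and the "Demo User" identity are assumptions chosen for the illustration, and plain git stash is used here in place of git stash save.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q
echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Initial checkin"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on setup

# Create a branch, switch to it, and commit a change there
git checkout -q -b redesign
echo "<p>New layout</p>" >> index.html
git commit -q -a -m "Try a new layout"

# Switch back and merge the branch into the original one
git checkout -q "$main_branch"
git merge -q redesign
git branch                       # lists both branches; * marks the current one

# Shelve an unfinished edit, confirm the checkout is clean, then restore it
echo "<p>draft</p>" >> index.html
git stash > /dev/null
git status --short               # prints nothing: no pending changes
git stash apply > /dev/null
git status --short               # index.html shows as modified again
```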
All Git commands accept a --help option, which is your best friend when trying to get more detailed information about any of these commands, the list of command-line options that each accepts, and so on. You can also execute the git help command to obtain help about any Git command.
For a complete list of Git commands, execute the
command to see the online reference information for Git.
Setting up a new Git project
To begin using Git with an existing project that is not under any sort of revision control, perform the following steps:
- Change to the directory containing the source code for your project:
$ cd www.move2linux.com
$ ls
greenbar_wide.gif  images  index.html  legacy.html  services.html
- Use the git init command to create an empty repository in the current directory:
$ git init
Initialized empty Git repository in .git/
- Optionally, use the git status command to check the status of your new project.
This command lists all the files and directories in the current directory as untracked, meaning that Git knows that they are present but has not been instructed to track the files:
$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       greenbar_wide.gif
#       images/
#       index.html
#       legacy.html
#       services.html
nothing added to commit but untracked files present...
- Add the files and directories in your project to your new Git repository.
You can either list them explicitly or use a period (.) as the traditional shortcut for "the contents of the current directory":
$ git add .
- Re-execute the git status command to verify that all the files in the current directory and all its subdirectories have been added to your new project:
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   greenbar_wide.gif
#       new file:   images/digits/b/0.gif
#       new file:   images/digits/b/1.gif
#       new file:   images/digits/b/4.gif
#       new file:   images/digits/b/5.gif
#       new file:   images/digits/b/6.gif
#       new file:   images/digits/b/7.gif
#       new file:   images/digits/b/8.gif
#       new file:   images/digits/b/9.gif
#       new file:   index.html
#       new file:   legacy.html
#       new file:   services.html
#
- Execute the git commit command to check in your initial files.
Unless you use the -m "commit message" option to specify your commit message on the command line, this command starts the default editor, in which you must enter a comment that will be associated with the commit. After saving your comment and exiting the editor, Git checks in the files associated with the change and displays information about the commit and the associated files.
$ git commit
Created initial commit dfbd6cc: Initial checkin
 12 files changed, 285 insertions(+), 0 deletions(-)
 ...
At this point, you're ready to use the commands discussed above to begin working in Git with your project files.
Making changes to an existing Git project
If you want to begin making your own changes to a Git project that someone has already created, getting started is even easier. You simply use the git clone command to create your own working copy of the Git project:
$ git clone /home/jake/src/maps
This example creates a copy of the Git maps project in your current working directory. You can then change directory to your copy of the maps project and use the commands discussed earlier to begin working with the files in that project in Git.
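To try cloning without access to someone else's project, you can build a stand-in repository first. In this sketch the maps project lives in a temporary directory rather than under /home/jake, and the directory name maps-copy, the README file, and the "Demo User" identity are all assumptions made for the example.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# A stand-in for the existing maps project
git init -q "$work/maps"
cd "$work/maps"
echo "map data" > README
git add README
git commit -q -m "Initial checkin"

# Clone it; the clone remembers its source as the remote named "origin"
cd "$work"
git clone -q "$work/maps" maps-copy
cd maps-copy
git remote -v
git log --oneline

# Later changes in the original project arrive with git pull
cd "$work/maps"
echo "more map data" >> README
git commit -q -a -m "Add more data"
cd "$work/maps-copy"
git pull -q
git log --oneline                # now shows both commits
```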
Cloning a Git project
Cloning a Git project located on another machine is equally simple. Git supports the Secure Shell (SSH) and HTTP protocols by default and can also use its own hyper-efficient git protocol if a Git daemon is running on the remote system and exporting the project that you're interested in. Git uses SSH by default, so the syntax for cloning a repository via SSH is exactly what you'd expect:
$ git clone remote-machine:/home/jake/src/maps
$ git clone ssh://remote-machine/home/jake/src/maps
Note: Cloning a Git project via SSH requires that you have authenticated access to the remote system, which is a good argument for using the git protocol, even though it requires that you run the Git daemon.
Distributed Web development with Git
Repositories such as the one you created above contain a working copy of all the files in a Git project as well as all the files required for Git's use in tracking changes, branches, tags, and so on. By default, pushing to Git repositories that contain working copies of the files in a project only updates the index for that project, not the actual files in the project. This is because trying to update the files themselves could introduce merge conflicts if you were also working on the same files.
To create a Git project that you can push to, you need to create what is known as a bare repository: one that does not contain a working copy of your files but contains the Git index, objects reflecting updates to that index, and other files that Git requires. Because a bare repository does not contain a working copy of your files, no one can actually work there, and it simply serves as a collection point for all developers working on the project that it contains.
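The following sketch shows what a bare repository looks like from the outside. It creates one with git init --bare (an alternative to the git clone --bare approach used in the procedure that follows), pushes a commit into it, and lists its contents. The names site.git and checkout and the "Demo User" identity are assumptions for the example.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

git init -q --bare site.git      # collection point, no working files
git clone -q site.git checkout   # Git warns that the repository is empty

cd checkout
echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Initial checkin"
git push -q origin HEAD          # publish the current branch

cd "$work"
git --git-dir=site.git rev-parse --is-bare-repository   # prints: true
ls site.git                      # Git's own files only; no index.html here
git --git-dir=site.git log --all --oneline
```

The commit is stored in site.git, but the only place index.html exists as an ordinary file is in the checkout directory.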
Perform the following steps to create a Git repository on a Web server that contains the content of your Web site and that you and other developers can push to. This procedure also replaces your existing Web content directory with a directory that contains a checked-out version of your new Git repository and that is updated automatically whenever files are pushed to the shared Git repository. There are many ways to do this: This example was designed to be easy to follow and therefore only involves the HTML content on your Web site. You can follow the same principles to work on any other portion of your Web site.
- Use SSH on your Web server, and change to the directory containing your Web content.
If your Web content is not already being tracked by Git, use the process described in the previous section to set up a Git repository there. For example:
$ ssh somehost
$ cd /var/www/html
$ git init
$ git add .
$ git commit -m "Initial commit"
- Change directory up one level, and create a bare Git repository by cloning the project that you just created for your Web content:
$ cd ..
$ git clone --bare html html.git
It is good practice to create your bare repository with the .git extension so that you can view it with tools such as gitweb, which requires this extension.
- Rename your existing Web directory, and create a new Git project with the same name by cloning your bare repository:
$ mv html html.OLD && git clone html.git html
The new project directory contains a checked-out version of all the files in the Git project that correspond to your Web server content.
- Edit (or create) a post-update script in the hooks subdirectory of your bare repository that pushes changes to the new project directory that contains the checked-out files for your Web content.
This script will be executed once after any of the files being tracked by your bare repository are updated. Make sure that this script is executable:
$ pushd html.git/hooks
$ emacs post-update
$ chmod 755 post-update
Your post-update script should look something like Listing 1.
Listing 1. The post-update script
#!/bin/bash
#
# Update the checked-out Web content after each push to the bare repository.
WEB_DIR="/var/www/html"
export GIT_DIR="$WEB_DIR/.git"
pushd "$WEB_DIR" > /dev/null
git pull
popd > /dev/null
Note that this script must use /bin/bash as its interpreter to be able to use that shell's pushd and popd built-in commands. If the post-update file already exists, you can simply append the rest of the script to it after verifying the interpreter. You must also make sure that any existing commands in an existing post-update script are not preceded by an exec command, which would prevent subsequent lines in the file from being executed.
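Before touching a real Web server, you may want to rehearse the whole procedure in a scratch directory. The sketch below follows the same steps with a temporary directory standing in for /var/www. The dev clone, the file contents, and the "Demo User" identity are assumptions for the rehearsal, and the hook adds -q to git pull only to keep the output short.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

root=$(mktemp -d)                # stands in for /var/www
cd "$root"

# Put the existing content under Git, make a bare copy, then re-clone
mkdir html
cd html
git init -q
echo "<h1>Version 1</h1>" > index.html
git add .
git commit -q -m "Initial commit"
cd "$root"
git clone -q --bare html html.git
mv html html.OLD
git clone -q html.git html

# Install a post-update hook modeled on Listing 1
mkdir -p html.git/hooks
cat > html.git/hooks/post-update <<EOF
#!/bin/bash
WEB_DIR="$root/html"
export GIT_DIR="\$WEB_DIR/.git"
pushd "\$WEB_DIR" > /dev/null
git pull -q
popd > /dev/null
EOF
chmod 755 html.git/hooks/post-update

# A developer clones the bare repository, edits a page, and pushes
git clone -q html.git dev
cd dev
echo "<h1>Version 2</h1>" > index.html
git commit -q -a -m "Update heading"
git push -q origin HEAD

# The hook should have refreshed the live directory
cat "$root/html/index.html"
```

If the final command still shows Version 1, check that the hook is executable and that its interpreter line points at a bash that exists on your system.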
At this point, you or any other developer can clone the bare repository on your Web server and begin working on your Web site. You and any other developer can then work together on the files that make up your Web site by performing any of the following tasks:
- Work on files in your checkout of the Web site and push them directly to the shared central repository.
- Work on files in your checkout of the Web site and have a coworker pull committed changes from your checkout into his or hers. This enables you to collaboratively work on files before pushing them to the shared central repository and therefore to your Web site.
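The second option, pulling directly from a coworker's checkout, can be sketched as follows. The developers alice and bob, the sidebar branch, the directory layout, and the "Demo User" identity are all hypothetical names used only for this illustration.

```shell
set -eu
# Hypothetical identity so commits work on a machine with no Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

root=$(mktemp -d)
cd "$root"

# A shared bare repository with one commit, and two developer clones
mkdir seed
cd seed
git init -q
echo "<h1>Home</h1>" > index.html
git add .
git commit -q -m "Initial commit"
cd "$root"
git clone -q --bare seed site.git
git clone -q site.git alice
git clone -q site.git bob

# Alice commits work in progress on a branch but does not push it
cd "$root/alice"
git checkout -q -b sidebar
echo "<div>Sidebar draft</div>" > sidebar.html
git add sidebar.html
git commit -q -m "Draft sidebar"

# Bob pulls that branch straight from Alice's checkout
cd "$root/bob"
git remote add alice "$root/alice"
git pull -q alice sidebar
ls                               # sidebar.html is now present

# The shared repository, and therefore the Web site, is unchanged
git --git-dir="$root/site.git" log --all --oneline
```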
Although Git is much faster than most VCSs, it may be impractical to work directly in your Web site's directory if you are working with a large and complex site that is heavily used. Visitors may accidentally receive updates to some files while other files have not yet been updated. This can be especially problematic if CSS updates are not received when updated pages that use them are loaded. You can often prevent this problem using solutions like symbolic links in your Web server to actually point to your content directory and switching them when you want updated content to go live, or by modifying your Web server configuration file to point to a different directory and gracefully restarting the Web server when you want your new content to go live.
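The symbolic link approach might look something like the following sketch. The directory names release-1, release-2, and current are assumptions. In practice, each release directory would be a checkout of your Git project, and your Web server's document root would point at the current link.

```shell
set -eu
root=$(mktemp -d)                # stands in for the Web server's content area

# Two complete copies of the site: what is live now, and what comes next
mkdir "$root/release-1" "$root/release-2"
echo "old site" > "$root/release-1/index.html"
echo "new site" > "$root/release-2/index.html"

# The Web server's document root would point at this link
ln -s release-1 "$root/current"
cat "$root/current/index.html"   # served from release-1

# Go live: repoint the link once release-2 is fully in place
ln -sfn release-2 "$root/current"
cat "$root/current/index.html"   # served from release-2
```

Because visitors only ever see one directory or the other, they never receive a mixture of old and new files. Note that ln -sfn removes and re-creates the link, so there is a very brief moment when the link does not exist; on a heavily used site you may prefer to create a new link under a temporary name and rename it over the old one.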
Git is a powerful, flexible VCS that provides many collaborative capabilities because of the distributed audience for which it was designed. Git is enabling collaborative development in new and different ways, including shared development of Web sites, Web-based applications, and so on, and is well worth the time it takes to understand its internals and learn the subset of Git commands that you will commonly use.
Related topics
- Git user's manual: See this manual for detailed, procedural information about using Git to manage development projects.
- Git Magic: Read this free online book on Git for detailed, procedural information about using Git to manage development projects.
- Git tutorial: Take this tutorial for a quick overview of how to get started with Git.
- "Manage source code using Git" (Eli M. Dow, developerWorks, July 2006): Get a quick introduction to building Git and using it with the Linux kernel sources.
- Git home page: Check out the home page for the latest releases of Git and for pointers to additional Git resources.
- Building Git Version Control System on AIX, HP-UX and Solaris: Check out this page for information on building Git for AIX and HP-UX systems.
- Download Git: Download versions of Git for Cygwin, Linux, Mac OS X, Solaris, and Windows.
- IBM product evaluation versions: Download these versions today and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.