Git changes the game of distributed Web development

Modern version control systems provide powerful support for collaboration

Version control systems are a core component of most development projects, regardless of whether you're developing an application, a Web site, or an operating system. Most projects involve multiple developers, often working at widely separated physical locations. Distributed version control systems are nothing new, but the Git version control system provides unique support for collaboration and interaction among developers.


William von Hagen, Systems Administrator, Writer, WordSmiths

William (Bill) von Hagen has been a writer and UNIX systems administrator for more than 20 years and a Linux advocate since 1993. Bill is the author or co-author of books on subjects such as Ubuntu Linux, Xen Virtualization, the GNU Compiler Collection (gcc), SUSE Linux, Mac OS X, Linux file systems, and SGML. He has also written numerous articles for Linux and Mac OS X publications and Web sites. You can reach Bill at wvh@vonhagen.org.



25 August 2009


Version control systems (VCSs) provide a mechanism for applying and managing changes to sets of files in a project and are commonly used in team-oriented software, documentation, and other online development projects. VCSs are just as critical to development projects as system backups, as they enable multiple users to submit changes to the same files or project without one developer's changes accidentally overwriting another's.

Frequently used acronyms

  • CSS: Cascading Style Sheet
  • GUI: Graphical user interface
  • HTML: Hypertext Markup Language
  • HTTP: Hypertext Transfer Protocol

Even if Linus Torvalds hadn't developed the Linux® operating system kernel, he might be just as famous for having created the Git VCS. Projects as complex as Linux are the ultimate system test for a VCS, so it's no surprise that Git has quickly evolved into a stable, powerful, and flexible system.

Linux and UNIX® systems are knee-deep in VCSs, ranging from dinosaurs such as the Revision Control System (RCS) and the Concurrent Versions System (CVS) to more modern systems such as Subversion, Mercurial, Bazaar, Arch, and Darcs. Ironically (especially in the Linux domain), Git was developed as a replacement for a commercial VCS called BitKeeper, which had both unique, impressive capabilities and a free version. BitKeeper is still impressive, but licensing changes eventually caused Torvalds to look for alternatives; in the finest free software tradition, he eventually decided to write his own. Like the Linux kernel, Git is now the product of enhancements, bug fixes, and other contributions from countless open source developers.

Git is attractive both because of its power and because of the cost savings associated with free software, and it is rapidly being adopted by many open source software projects, academic institutions, and organizations. Once "inside" a corporate or academic environment, Git's benefits as both a VCS and a platform for collaboration have led to its adoption by many projects outside the traditional "source code" arena. As this article discusses, Git is especially useful in complex, distributed Web development projects that have both content and development requirements and therefore require up-to-date interaction among different people.

Git: more than just a VCS

Git was designed to facilitate distributed development among thousands of developers working at many different locations with different degrees of Internet connectivity without introducing significant performance or access bottlenecks. The most important aspects supporting these fundamental requirements in Git are:

  • Using a central repository but providing each developer on a project with a complete copy of the source code for that project. This guarantees that all developers can get work done regardless of their current connectivity.
  • Providing fast and reliable support for creating different working sets, known as branches, within a software project and sharing changes across them, known as merges. Branches make it easy to support different versions of a software package, regardless of whether those versions are permanent or were created for experimentation. Merges are a key aspect of a source code control system in general and are especially common in a branch-oriented VCS.
  • Making it easy to share in-progress branches and code changes between subsets of developers without requiring that those changes first be checked in to the central repository.

These design decisions and their implementation are key aspects of Git's success and usability. Of course, Git also satisfies standard VCS requirements such as immutability and accountability. Immutability means that once changes are committed to a repository, they are a permanent part of a project's historical record. Even though they can subsequently be undone (known as reverting), both the changes and the replacement code that undoes those changes are a permanent part of project history. Accountability means that it is easy to determine who made a specific change and when that change was made. Why a change was made may still be a mystery (though some comment is required whenever a change is committed), but at least you know whom to ask.
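The following is a minimal sketch of immutability and accountability in practice, not a recommended workflow. It assumes a throwaway directory and a placeholder identity (Example Dev), neither of which comes from the article. It commits a change, commits a mistake, and then reverts the mistake, after which the log still contains all three commits along with their author and date.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)        # hypothetical scratch project
cd "$demo"
git init -q

echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

echo "<h1>Oops</h1>" > index.html
git commit -q -a -m "Break home page"

# Reverting adds a new commit that undoes the previous one; nothing is erased.
git revert --no-edit HEAD > /dev/null

# Accountability: each entry shows the author, the date, and the comment.
git log --format='%h %an %ad %s'
commit_count=$(git log --format='%h' | wc -l | tr -d ' ')
echo "commits in history: $commit_count"
```

Both the faulty commit and the commit that reverts it remain visible in the log, which is the behavior the paragraph above describes.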

Git uses an internal index to track the state of the files and directories in a repository. It also stores objects that reflect changes to those files and directories to simplify subsequent updates. Git's index and these objects are distinct from the actual files and directories present in a local repository—a model that makes it easy to identify files and directories that have changed locally but have not yet been committed to a local repository or a remote central repository (if one is present). Some Git commands operate on the index, while others operate on actual file and directory contents, which can be confusing if you use the wrong command and wonder why files haven't been updated.
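To make the distinction between the index and the actual files concrete, here is a small sketch. The file name notes.txt, the scratch directory, and the placeholder identity are assumptions made for illustration. It compares what `git diff` (working files against the index) and `git diff --cached` (index against the last commit) report before and after `git add`.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)        # hypothetical scratch project
cd "$demo"
git init -q
echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "two" >> notes.txt                          # the change exists only in the working file
unstaged_before=$(git diff --name-only)          # working files vs. index
staged_before=$(git diff --cached --name-only)   # index vs. last commit

git add notes.txt                                # copy the change into the index
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```

Before `git add`, the change shows up only in the plain `git diff`; afterward it shows up only in `git diff --cached`, which is why running the wrong one can make it look as if nothing has changed.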


Getting Git

Most Linux distributions provide the packages for Git in their package repository. On Ubuntu, Debian, and similar systems that use the .deb package format, you need to install the git-core package (newer releases of these distributions name the package simply git). On RPM-based systems such as Fedora, Red Hat, CentOS, SUSE, and Mandriva, the main Git package is simply called git. The basic Git package requires that Perl, Perl libraries for encryption and error handling, and the patch utility are also installed on your system.
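As a rough sketch of how you might check for Git and pick an install command, the hypothetical helper below only prints a suggestion and never installs anything. The function name suggest_git_install is invented for this example, the package names follow the paragraph above, and the use of sudo and of the apt-get, yum, and zypper front ends is an assumption about your system.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) a plausible install command.
suggest_git_install() {
    if command -v git > /dev/null 2>&1; then
        echo "git is already installed: $(git --version)"
    elif command -v apt-get > /dev/null 2>&1; then
        # Package name as given in the article; newer releases call it "git".
        echo "sudo apt-get install git-core"
    elif command -v yum > /dev/null 2>&1; then
        echo "sudo yum install git"
    elif command -v zypper > /dev/null 2>&1; then
        echo "sudo zypper install git"
    else
        echo "no known package manager found; see the Git Web site"
    fi
}

suggest_git_install
```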

If you need the latest and greatest version of Git for your Linux system, the Git Web site provides links to prepackaged .deb and RPM packages as well as the latest Git source code (if you want to build your own version). The Git site also provides links to precompiled versions of Git for Mac® OS X, native Windows®, Cygwin on Windows systems, and Sun/Oracle Solaris® systems. At the moment, IBM® AIX® and Hewlett-Packard® HP-UX systems administrators have to build Git from source code for their platforms. See Resources for information about obtaining or building Git for your platform.

The main Git package contains the git executable and a few auxiliary Git applications. As you might expect, a huge number of other Git-related packages are also available. Some of the most popular Git-related packages include:

  • git-cola. A GUI for working with files and directories in a Git repository
  • git-doc. Installs the Git user's manual, tutorial, and documentation locally
  • git-gui. A GUI for browsing and working with Git repositories; uses the gitk package
  • gitweb. A graphical, Web-based interface to Git repositories
  • gitk. A simple GUI for browsing and working with Git repositories
  • qgit. A graphical Qt-based application for viewing and exploring Git repositories

The git-gui, gitweb, gitk, and qgit packages provide similar functionality, though gitweb is Web based, while all the others run locally. Any of these packages can be useful when you're getting started with Git, though the gitweb package is probably the best choice in a distributed environment.

If you're interested in experimenting with Git but are already committed to another VCS, the following packages can be useful:

  • git-cvs. This package provides interoperability between Git and CVS repositories, enabling you to import CVS repositories and change history into Git, work in Git, merge your changes back to the CVS repository, and import updates from the CVS repository.
  • git-svn. This package provides interoperability between Git and Subversion repositories, enabling you to import Subversion repositories and change history into Git, work in Git, merge your changes back to the Subversion repository, and import updates from the Subversion repository.

Common Git commands

Git is what is traditionally referred to as a command suite, meaning that the primary command you use is git and that the first argument on the command line indicates the specific Git command you want to run.

Git analogues

Older versions of Git also installed analogues for many Git commands in the form git-command, where command was a specific Git command. This enabled you to execute a Git command as, for example, either git push or git-push. This functionality was useful in the early days of Git, when many auxiliary Git commands were actively under development, because it provided a command syntax for Git commands that were provided in the main Git binary and those that were still external, but it is no longer necessary now that Git is more mature.

You can see a list of common Git commands at any time by running the git command with no arguments. The following is a subset of this list:

  • add. Adds a new file, or the latest changes to a file that Git is already tracking, to Git's index. You will still have to commit to actually record those contents in a Git repository.
  • branch. Enables you to list branches that you have checked out, identify the branch that you're currently working on, create new branches, or destroy local copies of branches that you have created or checked out. This command does not switch to other branches: You do that using Git's checkout command.
  • checkout. Checks out a specified branch or file/directory. If you are checking out a branch, that branch becomes your working branch. If you specify a certain file or directory, that file or directory will be updated to match the version currently checked in to the repository on the branch where you are working. You can also use this command to create a new branch that is based on and optionally tracks changes to a specified existing branch.
  • commit. Records the changes held in Git's index as a new commit in your repository. You can either specify the files and directories whose changes you want to commit, use the -a option to add all pending changes to files Git is tracking, or use the --interactive option to select the file or directory changes that you want to commit together. (The latter is especially useful if you are working on several different tasks involving large numbers of files but want to commit certain changes together.) Commits are made to the local repository; if you are using a remote, central repository, you must still use Git's push command to push your local changes to the remote repository.
  • diff. Displays differences between local files and other commits or between files in two different commits. This command is most commonly used by simply specifying a file name, which displays the changes in that file that you have not yet added to Git's index; if you have staged nothing, that is the difference from the version of the file committed to your repository on the current branch.
  • fetch. Downloads new commits and tags from another repository without changing your working files, so that changes committed to that repository but not yet present locally become available for inspection. You can then inspect those changes using the git log command. To actually apply fetched changes to your current branch, you would use either the git merge or git rebase command (git pull performs the fetch and the merge in one step).
  • grep. Looks for and displays matches for a pattern in files in your current branch. This command supports most of your favorite GNU grep options.
  • log. Shows commit log information for the current branch or for specified files in the current branch.
  • merge. Merges changes from one branch to another. This command provides options to determine whether merged changes are automatically committed, enabling you to explore the impact of a merge before actually accepting those changes.
  • mv. Renames a file, directory, or symbolic link that Git is currently tracking.
  • pull. Fetches new commits from another repository and merges them into the current branch, updating its files and directories.
  • push. Updates a remote repository with your local commits and the objects that they reference.
  • rebase. Updates your current branch to match a remote branch and modifies any local commits that have not been pushed to the remote branch so that they can be applied to the current state of that remote branch. This is a powerful but potentially dangerous command, because it literally rewrites commits as needed so that they can be merged. Depending on the frequency and extent of changes to a remote repository, simply using the git pull command is often an attractive alternative.
  • rm. Removes a file, directory, or symbolic link that Git is currently tracking.
  • stash. Temporarily pushes your current changes onto a stack and returns your current checkout to a clean state that matches the last commit. git stash save saves your current changes on a local stack, while git stash apply retrieves and reapplies them. This can be useful when you want to fetch remote changes or rebase without permanently committing in-progress changes.
  • status. Shows the status of the current branch, identifying changes that have not been committed, files that are not being tracked, and so on.
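Several of the commands above are easiest to understand together. The following sketch assumes a scratch directory, a placeholder identity, and a hypothetical branch named redesign. It uses add, commit, checkout, merge, branch, and log to create a branch, commit on it, and merge it back.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)        # hypothetical scratch project
cd "$demo"
git init -q
echo "<p>Home</p>" > index.html
git add index.html
git commit -q -m "Add home page"

# Remember the starting branch; its name depends on your Git configuration.
main_branch=$(git rev-parse --abbrev-ref HEAD)

# checkout -b creates the branch and switches to it in one step.
git checkout -q -b redesign
echo "<p>Services</p>" > services.html
git add services.html
git commit -q -m "Add services page"

# Switch back and merge the branch's commits into the starting branch.
git checkout -q "$main_branch"
git merge -q redesign

# The branch is fully merged, so its local copy can be deleted safely.
git branch -d redesign
git log --format='%h %s'
```

Because the starting branch did not change while redesign was being worked on, this particular merge simply moves the starting branch forward; a merge between branches that have both changed would create a merge commit instead.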

All Git commands accept a --help option, which is your best friend when trying to get more detailed information about any of these commands, the list of command-line options that each accepts, and so on. You can also execute git help command, replacing command with the name of any Git command, to obtain help about that command.

For a complete list of Git commands, execute the man git command to see the online reference information for Git.
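Of the commands listed earlier, stash is probably the one whose effect is least obvious from its description, so here is a brief sketch. The file name, the stash message, and the placeholder identity are assumptions made for illustration. It uses the git stash save form that this article describes; newer versions of Git also offer git stash push for the same purpose.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)        # hypothetical scratch project
cd "$demo"
git init -q
echo "v1" > page.html
git add page.html
git commit -q -m "Add page"

echo "work in progress" >> page.html

# Put the uncommitted edit on the stash stack; the file returns to its committed state.
git stash save "half-finished edit" > /dev/null
clean=$(git status --short)          # empty when nothing is pending

# Reapply the stashed edit (apply leaves the entry on the stack).
git stash apply > /dev/null
restored=$(tail -n 1 page.html)

echo "pending changes after stash: '$clean'"
echo "last line after apply: '$restored'"
```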


Setting up a new Git project

To begin using Git with an existing project that is not under any sort of revision control, perform the following steps:

  1. Change to the directory containing the source code for your project:
    $ cd www.move2linux.com
    $ ls
    greenbar_wide.gif  images  index.html  legacy.html  services.html
  2. Use the git init command to create an empty repository in the current directory:
    $ git init
    Initialized empty Git repository in .git/
  3. Optionally, use the git status command to check the status of your new project.

    This command lists all the files and directories in the current directory as untracked, meaning that Git knows that they are present but has not been instructed to track the files:

    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #
    #       greenbar_wide.gif
    #       images/
    #       index.html
    #       legacy.html
    #       services.html
    nothing added to commit but untracked files present...
  4. Add the files and directories in your project to your new Git repository.

    You can either list them explicitly or use a period (.) as the traditional shortcut for "the contents of the current directory":

    $ git add .
  5. Re-execute the git status command to verify that all the files in the current directory and all its subdirectories have been added to your new project:
    $ git status
    # On branch master
    #
    # Initial commit
    #
    # Changes to be committed:
    #   (use "git rm --cached <file>..." to unstage)
    #
    #       new file: greenbar_wide.gif
    #       new file: images/digits/b/0.gif
    #       new file: images/digits/b/1.gif
    #       new file: images/digits/b/4.gif
    #       new file: images/digits/b/5.gif
    #       new file: images/digits/b/6.gif
    #       new file: images/digits/b/7.gif
    #       new file: images/digits/b/8.gif
    #       new file: images/digits/b/9.gif
    #       new file: index.html
    #       new file: legacy.html
    #       new file: services.html
    #
  6. Execute the git commit command to check in your initial files.

    Unless you use the -m "commit message" option to specify your commit message on the command line, this command starts the default editor, in which you must enter a comment that will be associated with the commit. After saving your comment and exiting the editor, Git checks in the files associated with the change and displays information about the commit and the associated files.

    $ git commit 
    Created initial commit dfbd6cc: Initial checkin
    12 files changed, 285 insertions(+), 0 deletions(-)
    ...

At this point, you're ready to use the commands discussed above to begin working in Git with your project files.
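The steps above can also be run without an editor, which is convenient in scripts. The sketch below is one way to do so. It assumes a scratch directory with a few stand-in files (not the real site content) and a placeholder identity, and it uses the -m option so that git commit does not open an editor.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

site=$(mktemp -d)        # stands in for the project directory
cd "$site"
mkdir images
echo "<html><body>Home</body></html>" > index.html
echo "<html><body>Services</body></html>" > services.html
echo "placeholder" > images/logo.txt     # hypothetical file

git init -q                       # step 2: create the empty repository
git add .                         # step 4: add everything in this directory
git commit -q -m "Initial checkin"   # step 6: commit without opening an editor

git status --short                # prints nothing once everything is committed
tracked=$(git ls-files | wc -l | tr -d ' ')
echo "tracked files: $tracked"
```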

Setting up a central repository

A repository created as described in this section contains a working copy of your files, which complicates sharing: You must either pull changes from the other users who are working with this repository or, if they push changes to your repository, check out the affected branch again to see those changes in your files. See the section, Distributed Web development with Git, for information about setting up a central repository that does not host a working copy of all your files and therefore does not have this issue.

Making changes to an existing Git project

If you want to begin making your own changes to a Git project that someone has already created, getting started is even easier. You simply use the git clone command to create your own working copy of the Git project:

$ git clone /home/jake/src/maps

This example creates a copy of the Git maps project in your current working directory. You can then change directory to your copy of the maps project and use the commands discussed earlier to begin working with the files in that project in Git.
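As a self-contained illustration of cloning a local project and working in the copy, the sketch below first creates a small stand-in for the maps project. The file routes.txt, the directory name maps-copy, and the placeholder identity are all assumptions, not details from the article.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# A stand-in for someone else's existing project.
git init -q maps
( cd maps && echo "route 1" > routes.txt && git add routes.txt && git commit -q -m "Add routes" )

# Clone it; the copy remembers where it came from as the remote named "origin".
git clone -q "$work/maps" maps-copy
cd maps-copy
git remote -v

# Commit locally; nothing reaches the original until you push or someone pulls.
echo "route 2" >> routes.txt
git commit -q -a -m "Add second route"

ahead=$(git rev-list --count "@{upstream}..HEAD")
echo "local commits not yet in the original: $ahead"
```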

Cloning a Git project

Cloning a Git project located on another machine is equally simple. Git supports the Secure Shell (SSH) and HTTP protocols by default and can also use its own hyper-efficient git protocol if a Git daemon is running on the remote system and exporting the project that you're interested in. Git uses SSH by default, so the syntax for cloning a repository via SSH is exactly what you'd expect:

$ git clone remote-machine:/home/jake/src/maps
$ git clone ssh://remote-machine/home/jake/src/maps

Note: Cloning a Git project via SSH requires that you have authenticated access to the remote system, which is a good argument for using the git protocol, even though it requires that you run the Git daemon.


Distributed Web development with Git

Repositories such as the one you created above contain a working copy of all the files in a Git project as well as all the files required for Git's use in tracking changes, branches, tags, and so on. Pushing to a Git repository that contains a working copy of the files in a project updates only the history that Git stores for that project, not the checked-out files themselves, and recent versions of Git refuse by default to accept a push to the branch that is currently checked out. This is because trying to update the files themselves could introduce merge conflicts if you were also working on the same files.

To create a Git project that you can push to, you need to create what is known as a bare repository, that is, one that does not contain a working copy of your files but contains the Git index, objects reflecting updates to that index, and other files that Git requires. Because a bare repository does not contain a working copy of your files, no one can actually work there, and it simply serves as a collection point for all developers working on the project that it contains.
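The difference between the two kinds of repository is easy to see on disk. This sketch assumes a scratch directory, a one-file stand-in project, and a placeholder identity. It clones a project with the --bare option and asks Git whether each repository is bare.

```shell
#!/bin/sh
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# An ordinary repository: working files plus a .git subdirectory.
git init -q html
( cd html && echo "<p>Home</p>" > index.html && git add . && git commit -q -m "Initial commit" )

# A bare clone: only Git's own files, with no checked-out content.
git clone -q --bare html html.git

ordinary=$(git -C html rev-parse --is-bare-repository)
bare=$(git --git-dir=html.git rev-parse --is-bare-repository)
echo "html is bare: $ordinary"
echo "html.git is bare: $bare"
ls html.git
```

The listing of html.git shows Git's internal files (such as HEAD, config, and objects) at the top level and no index.html, which is why no one can edit files there directly.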

Perform the following steps to create a Git repository on a Web server that contains the content of your Web site and that you and other developers can push to. This procedure also replaces your existing Web content directory with a directory that contains a checked-out version of your new Git repository and that is updated automatically whenever files are pushed to the shared Git repository. There are many ways to do this: This example was designed to be easy to follow and therefore only involves the HTML content on your Web site. You can follow the same principles to work on any other portion of your Web site.

  1. Use SSH to log in to your Web server, and change to the directory containing your Web content.

    If your Web content is not already being tracked by Git, use the process described in the previous section to set up a Git repository there. For example:

    $ ssh somehost
    $ cd /var/www/html
    $ git init
    $ git add .
    $ git commit -m "Initial commit"
  2. Change directory up one level, and create a bare Git repository by cloning the project that you just created for your Web content:
    $ cd ..
    $ git clone --bare html html.git

    It is good practice to create your bare repository with the .git extension so that you can view it with tools such as gitweb, which requires this extension.

  3. Rename your existing Web directory, and create a new Git project by the same name by cloning your bare repository:
    $ mv html html.OLD && git clone html.git html

    The new project directory contains a checked-out version of all the files in the Git project that correspond to your Web server content.

  4. Edit (or create) a post-update script in the hooks subdirectory of your bare repository that pushes changes to the new project directory that contains the checked-out files for your Web content.

    This script will be executed once after any of the files being tracked by your bare repository are updated. Make sure that this script is executable:

    $ pushd html.git/hooks
    $ emacs post-update
    $ chmod 755 post-update

    Your post-update script should look something like Listing 1.

    Listing 1. The post-update script
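    #!/bin/bash
    # post-update hook: refresh the checked-out Web content after each push.
    WEB_DIR="/var/www/html"
    # Git runs hooks with GIT_DIR set to the bare repository, so point it
    # at the repository inside the Web content directory instead.
    export GIT_DIR="$WEB_DIR/.git"
    # Stop if the Web directory is missing rather than pulling elsewhere.
    pushd "$WEB_DIR" > /dev/null || exit 1
    git pull
    popd > /dev/null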

    Note that this script must use /bin/bash as its interpreter to be able to use that shell's pushd and popd built-in commands. If the post-update file already exists, you can simply append the rest of the script to it after verifying the interpreter. You must also make sure that any existing commands in an existing post-update script are not preceded by an exec command, which would prevent subsequent lines in the file from being executed.
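To try the whole procedure safely before touching a real server, you can rehearse it in a scratch directory. The sketch below is such a rehearsal under stated assumptions: a temporary directory stands in for /var/www, a hypothetical dev directory stands in for a developer's machine, the identity is a placeholder, and the hook is modeled on Listing 1 with the -q option added to keep the output short.

```shell
#!/bin/bash
set -e
# Placeholder identity (an assumption) so the sketch runs without global Git configuration.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)        # stands in for /var/www
cd "$work"

# Steps 1-3: track the content, create the bare repository, re-create the checkout.
git init -q html
( cd html && echo "<p>v1</p>" > index.html && git add . && git commit -q -m "Initial commit" )
git clone -q --bare html html.git
mv html html.OLD
git clone -q html.git html

# Step 4: install a hook modeled on Listing 1 (paths adjusted to the scratch directory).
mkdir -p html.git/hooks
cat > html.git/hooks/post-update <<EOF
#!/bin/bash
WEB_DIR="$work/html"
export GIT_DIR="\$WEB_DIR/.git"
pushd "\$WEB_DIR" > /dev/null || exit 1
git pull -q
popd > /dev/null
EOF
chmod 755 html.git/hooks/post-update

# A developer clones the bare repository, commits a change, and pushes it.
git clone -q html.git dev
( cd dev && echo "<p>v2</p>" > index.html && git commit -q -a -m "Update home page" && git push -q origin HEAD )

# The hook should have pulled the change into the checked-out Web content.
cat "$work/html/index.html"
```

If the hook ran as intended, the final command shows the content committed in the dev directory, confirming that a push to the bare repository reaches the checked-out copy.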

At this point, you or any other developer can clone the bare repository on your Web server and begin working on your Web site. You and any other developer can then work together on the files that make up your Web site by performing any of the following tasks:

  • Work on files in your checkout of the Web site and push them directly to the shared central repository.
  • Work on files in your checkout of the Web site and have a coworker pull committed changes from your checkout into his or hers. This enables you to collaboratively work on files before pushing them to the shared central repository and therefore to your Web site.

Although Git is much faster than most VCSs, it may be impractical to work directly in your Web site's directory if you are working with a large and complex site that is heavily used. Visitors may accidentally receive updates to some files while other files have not yet been updated. This can be especially problematic if CSS updates are not received when updated pages that use them are loaded. You can often prevent this problem using solutions like symbolic links in your Web server to actually point to your content directory and switching them when you want updated content to go live, or by modifying your Web server configuration file to point to a different directory and gracefully restarting the Web server when you want your new content to go live.
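The symbolic-link approach mentioned above might look like the following sketch. The directory names release-1 and release-2 and the link name current are hypothetical, and the Web server is assumed to be configured to serve whatever current points to. Note that ln -sfn replaces the link in two steps, so the switch is very fast but not strictly atomic.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)        # stands in for the directory that holds your releases
cd "$work"

# Two complete copies of the site content (hypothetical names).
mkdir release-1 release-2
echo "<p>old site</p>" > release-1/index.html
echo "<p>new site</p>" > release-2/index.html

# The Web server would serve "current", which starts out pointing at the old release.
ln -s release-1 current
cat current/index.html

# Prepare release-2 completely, then repoint the link in a single command.
ln -sfn release-2 current
cat current/index.html
```

Because visitors only ever see a fully prepared directory, they do not receive a mixture of old and new files while an update is in progress.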


Conclusion

Git is a powerful, flexible VCS that provides many collaborative capabilities because of the distributed audience for which it was designed. Git is enabling collaborative development in new and different ways, including shared development of Web sites, Web-based applications, and so on, and is well worth the time it takes to understand its internals and learn the subset of Git commands that you will commonly use.

Resources

Get products and technologies

  • Download Git: Download versions of Git for Cygwin, Linux, Mac OS X, Solaris, and Windows.
  • IBM product evaluation versions: Download these versions today and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
