Suppose you've just taken over a software project that's been running for a few years. How do you get an understanding of the project's development history? The best way is probably to just talk to the programmers involved, but that's easier said than done. They've often moved on to other projects and can be hard to track down. You can look at the release frequency, although that may be governed by a nontechnical mandate (something along the lines of, "We'll do a release only after the end of the fiscal year"). You can poke around the bug and feature-request trackers, dredging up discussions on bugs opened and closed. Or you can go right to the source code history and use a utility like StatCVS to see what changes have been made and who made them. I've used StatCVS for several years on various large projects, and the reports it generates have always been well received. In this article, I'll show you how to set up and run StatCVS on your project, how to read the reports it generates, and where it has room to improve.
StatCVS is a Java™ program and requires JDK 1.4 or higher to use. It's easiest to install StatCVS from the command line: Just download the most recent release (see Resources) and unzip it into a directory; I used /usr/local/statcvs/. Also, as shown in Listing 1, I created a symbolic link called statcvs that links to the version we just installed. This saves a bit of typing later and, more importantly, allows us to toggle between StatCVS versions by simply changing the symbolic link to point to the version we want to use.
Listing 1. A symbolic link for StatCVS
[root@hal local]# pwd
/usr/local
[root@hal local]# ln -s statcvs-0.2.2 statcvs
[root@hal local]# ls -l | grep statcvs
lrwxrwxrwx 1 root root 13 Jan 13 14:27 statcvs -> statcvs-0.2.2
drwxrwxr-x 2 root root 4096 Oct 13 23:32 statcvs-0.2.2
-rw-rw-r-- 1 tom tom 1344753 Jan 13 13:49 statcvs-0.2.2.zip
If you list the files in the statcvs directory, you'll see that there aren't any supporting JAR files for StatCVS. The only JAR file is statcvs.jar, which includes the single third-party library that StatCVS uses: JFreeChart. This approach makes it a bit easier to get started because there's no need to muck around with the classpath.
To illustrate how StatCVS works, we need to find a project with an interesting CVS history and generate some activity reports. The developerWorks project Jikes (see Resources) has been around for a while, has numerous developers, and has a public CVS repository, so it's a good fit.
To get StatCVS reports for Jikes, we need the latest source code and a CVS log file for StatCVS to analyze, so the first step is to check out the Jikes code from its CVS repository. The Jikes developers allow anonymous read-only access to their repository, so I'll use that method to get the code, as shown in Listing 2:
Listing 2. Checking out code from the Jikes CVS repository
[tom@hal tmp]$ cvs -d:pserver:anoncvs@www-124.ibm.com:/usr/cvs/jikes login
Logging in to :pserver:anoncvs@www-124.ibm.com:2401/usr/cvs/jikes
CVS password: [ Type "anoncvs" here ]
[tom@hal tmp]$ cvs -d:pserver:anoncvs@www-124.ibm.com:/usr/cvs/jikes co jikes
cvs server: Updating jikes
U jikes/.cvsignore
... several thousand lines elided ...
Now that I've got the Jikes code on my machine, I need to create a CVS log file for StatCVS to chew on. To create this file, I'll move into the jikes directory and run the cvs log command. As you can see in Listing 3, I'm redirecting the command output to a file called logfile.txt:
Listing 3. Creating a CVS log file
[tom@hal tmp]$ cd jikes/
[tom@hal jikes]$ time cvs -d:pserver:anoncvs@www-124.ibm.com:/usr/cvs/jikes log > logfile.txt

real 0m40.719s
user 0m0.516s
sys 0m0.314s
[tom@hal jikes]$
Just for fun, I timed it. It took about 40 seconds on my workstation, and the resulting log-file size is about 3.3 MB.
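If you're curious what is in that log file, here is a minimal Ruby sketch that counts revisions per committer. It assumes the usual cvs log revision line of the form "date: ...;  author: NAME;  state: ...;". The sample text and the author names in it are invented for illustration and are not Jikes data, and the approach is only a rough stand-in for what StatCVS itself does with the file.

```ruby
# Count revisions per author in the text produced by `cvs log`.
# Assumes each revision has a line like:
#   date: 2004/12/12 10:15:00;  author: alice;  state: Exp;  lines: +4 -1
AUTHOR_LINE = /^date: [^;]+;\s+author: ([^;]+);/

def revisions_per_author(log_text)
  counts = Hash.new(0)
  log_text.each_line do |line|
    match = AUTHOR_LINE.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Invented sample in the shape of cvs log output (not real project data).
SAMPLE_LOG = <<~LOG
  Working file: src/example.cpp
  ----------------------------
  revision 1.3
  date: 2004/12/12 10:15:00;  author: alice;  state: Exp;  lines: +4 -1
  Add a guard clause.
  ----------------------------
  revision 1.2
  date: 2004/11/02 09:00:00;  author: bob;  state: Exp;  lines: +10 -2
  Fix a crash.
  ----------------------------
  revision 1.1
  date: 2004/10/01 08:30:00;  author: alice;  state: Exp;
  Initial import.
  =============================================================================
LOG

# Print authors, busiest first.
revisions_per_author(SAMPLE_LOG).sort_by { |_, n| -n }.each do |author, n|
  puts "#{author}: #{n}"
end
```

To try it on a real log, replace SAMPLE_LOG with File.read("logfile.txt").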
Now I'm ready to run StatCVS and generate my reports. You can run StatCVS from the command line or from Ant (see Resources). I'll look at the command-line interface first, then discuss Ant.
StatCVS is easy to run from the command line because there's only one JAR file, and you can pass that JAR file name directly to the virtual machine. There are a variety of options you can use to tweak the output. Here are some of the more useful ones:
- -title [title] -- A display title to put on the report
- -output-dir [directory] -- A place to put the report files; the directory will be automatically created if it doesn't already exist
- -include [pattern] -- Include only files that match the given pattern
- -viewcvs [ViewCVS url] -- The URL of a ViewCVS Web interface to the repository (see Resources)
I'll use the above options to create a report. First, I must move to the directory above jikes/, then I can run StatCVS from the command line, as shown in Listing 4:
Listing 4. Running StatCVS from the command line
[tom@hal tmp]$ time java -jar /usr/local/statcvs/statcvs.jar \
-include "**/*.cpp;**/*.h" \
-output-dir report \
-title "Jikes" \
-viewcvs http://www-124.ibm.com/developerworks/oss/cvs/jikes/jikes/ \
jikes/logfile.txt jikes/
StatCVS - CVS statistics generation

real 0m15.232s
user 0m12.014s
sys 0m0.326s
[tom@hal tmp]$
Note that I'm using the -include parameter to catch only C++ source code files and header files. There may be lots of other files in the CVS module (documentation, configuration scripts, reports, Web pages), but I'll just stick with the source code for this article.
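To get a feel for which files an include list like "**/*.cpp;**/*.h" selects, here is a small Ruby sketch that approximates the matching with File.fnmatch. This is an assumption-laden stand-in: StatCVS has its own pattern matcher, and Ruby's glob rules may differ from it in corner cases. The file paths are hypothetical examples.

```ruby
# Approximate a semicolon-separated include list with Ruby's File.fnmatch.
# This is NOT StatCVS's own matcher, just an illustration of the idea.
def included?(path, include_spec)
  include_spec.split(";").any? do |pattern|
    # FNM_PATHNAME makes "**/" match across directory levels.
    File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
end

INCLUDE_SPEC = "**/*.cpp;**/*.h"

# Hypothetical paths, chosen only to exercise the patterns.
%w[src/decl.cpp src/ast.h doc/index.html src/Makefile.in].each do |path|
  verdict = included?(path, INCLUDE_SPEC) ? "included" : "skipped"
  puts "#{path}: #{verdict}"
end
```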
Listing 5 shows the Ant task definition that would work the same as the command-line invocation in Listing 4:
Listing 5. Running StatCVS using Ant
<?xml version="1.0"?>
<project name="StatCvsAnt" default="main" basedir=".">
  <taskdef name="statcvs" classname="net.sf.statcvs.ant.StatCvsTask"/>
  <target name="main">
    <statcvs
      projectDirectory="jikes"
      cvsLogFile="jikes/logfile.txt"
      outputDirectory="report"
      title="Jikes"
      viewcvsURL="http://www-124.ibm.com/developerworks/oss/cvs/jikes/jikes/"
      includeFiles="**/*.cpp;**/*.h"/>
  </target>
</project>
The reports are placed in the report directory specified in Listing 4. If you're following along, opening a browser to that directory should pick up the index.html page, which is shown in Figure 1:
Figure 1. The main StatCVS report page for Jikes
You can see the sorts of reports available: statistics on the code authors, a view into the Commit Log, a Lines of Code section, and some statistics on file and directory sizes.
The Lines of Code chart, shown in Figure 2, can be pretty interesting:
Figure 2. Lines of code for Jikes over time
You can see from this chart that the initial code import occurred in early 1999. From then on, it grew fairly steadily until late 2001, when the amount of code began to decrease a bit. There are various times when new code was introduced or old code was refactored away as indicated by sharp rises or drops in the lines of code count. Not much code seems to have been added or removed since mid-2004, indicating, perhaps, that Jikes has matured to the point where it's mostly being maintained.
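The raw material for a chart like this is in the CVS log: most revision lines carry a "lines: +added -removed" field. The Ruby sketch below sums the net change per year from those fields. Treat it as a rough illustration rather than StatCVS's actual calculation: in cvs log output the first revision of a file has no lines field, so this only captures changes after each file's initial commit. The sample entries are invented.

```ruby
# Sum the net line change per year from `cvs log` revision lines.
# Only revisions that carry a "lines: +N -M" field are counted.
REVISION_LINE = /^date: (\d{4})[^;]*;\s+author: [^;]+;.*?lines: \+(\d+) -(\d+)/

def net_lines_by_year(log_text)
  totals = Hash.new(0)
  log_text.each_line do |line|
    m = REVISION_LINE.match(line)
    next unless m
    totals[m[1].to_i] += m[2].to_i - m[3].to_i
  end
  totals
end

# Invented revision lines in cvs log format (not real project data).
SAMPLE_LOG = <<~LOG
  date: 2003/05/01 12:00:00;  author: alice;  state: Exp;
  date: 2003/06/10 12:00:00;  author: bob;  state: Exp;  lines: +40 -5
  date: 2004/01/15 12:00:00;  author: alice;  state: Exp;  lines: +3 -20
  date: 2004/02/20 12:00:00;  author: alice;  state: Exp;  lines: +7 -0
LOG

net_lines_by_year(SAMPLE_LOG).sort.each do |year, net|
  puts format("%d: %+d", year, net)
end
```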
If you click on the Authors link from the main report page, you'll see the numbers and a chart showing how much code each of the committers has contributed, as shown in Figure 3:
Figure 3. Lines of code per committer
It's clear that ericb and shields have contributed the lion's share of the code, while the other committers have occasionally chipped in a bit. Note that none of the committers has been with the project for its duration. This speaks volumes about the need for clear code with good variable names and clean design on a long-term project.
Incidentally, StatCVS is pretty clever in terms of report generation. If a CVS repository has had only one committer, the link on the main report page simply says "Author page for joe_smith" and the comparison charts are not generated. This makes StatCVS run faster and keeps the report pages clean.
Let's look at one more committer activity chart. The Author Activity chart, which you reach from the main report page by clicking on the Authors link and which is shown in Figure 4, shows whether each committer tended to add files or modify them:
Figure 4. Adding vs. modifying code
You can see that shields added the most code, which is what you would expect, as this person was apparently the first maintainer of the code after it was imported into CVS. Along the same lines, ericb came along several years after the project was started and mostly modified files.
The Commit Log is simply a list of all the changes made to the module. This report shows who made the changes and the comment that the committers associated with their changes. Also, because Jikes has a ViewCVS interface to its repository and I ran StatCVS with the -viewcvs parameter, the report includes links to the actual source code change. For example, there was a change made to src/decl.cpp on 12 Dec 2004. If you click on decl.cpp, you'll see that an if statement was added, as well as a comment. Figure 5 shows a portion of the ViewCVS display of the differences between the two file versions:
Figure 5. A specific code change
There are a few other reports: One shows the average file size, another shows how the code is distributed over the directory tree, and another shows which files were revised the most. The entire report for Jikes is available by clicking on the Code icon at the top or bottom of this article, or from the Download section. Simply unzip it and point your browser to index.html to see the report.
Generating reports for several projects
You've seen how to run StatCVS on one CVS repository. But suppose you have multiple repositories, and you'd like a way to generate StatCVS reports for all of them nightly. You can run StatCVS from the command line, so it's a simple matter to do that in a script. Here are a few things to keep in mind:
- StatCVS is a Java program, so it needs a lot of memory just to get started. It can take a while to process large CVS repositories, too. So if you run it on a machine that's also being used for other purposes, it's best to give the machine a bit of a break between repositories. If you're running a system that supports priorities, it's a good idea to run a long job like this at a low priority.
- If repositories are being added to and removed from the machine on a regular basis, some repositories may not contain modules. Check for this possibility first so that you don't start up StatCVS unnecessarily.
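As one way to act on the priority advice, a Ruby script can lower its own scheduling priority before it starts any work; commands it launches afterward inherit that setting. This is a minimal sketch that assumes a Unix-like system where Process.setpriority is available. The helper name and the niceness value of 19 are illustrative choices.

```ruby
# Lower this process's scheduling priority (raise its "niceness").
# Child processes started later with backticks or system() inherit it.
# Assumes a Unix-like system; 19 is the usual lowest-priority value.
def run_at_low_priority(niceness = 19)
  Process.setpriority(Process::PRIO_PROCESS, 0, niceness)
  Process.getpriority(Process::PRIO_PROCESS, 0)
end

puts "niceness is now #{run_at_low_priority}"
```

An unprivileged process can usually lower its priority but not raise it again, so call this once near the top of a long-running script.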
Listing 6 shows a small Ruby script to run StatCVS on several repositories that have a common parent directory (for more on Ruby, see Resources); this script is also available as part of the code download:
Listing 6. A script to run StatCVS for multiple repositories
#!/usr/local/bin/ruby

require 'fileutils'

HOME_DIR = "/tmp/"
CVS_DIR = "/path/to/my/cvs/"
BASE_OUTPUT_DIR = "/var/www/my-projects/"
DELAY = 5

Dir.chdir(HOME_DIR)

# get a list of all the repositories
Dir.new(CVS_DIR).entries.grep(/^[^.]/).each {|file|
  # create a working directory
  working_directory = "tmp_" + rand().to_s
  Dir.mkdir(working_directory)
  Dir.chdir(working_directory)
  `cvs -d#{CVS_DIR}#{file} -Q co .`
  FileUtils.rm_rf(%w{CVS CVSROOT})

  # no need to run StatCVS if no modules exist yet
  if !Dir.new(".").entries.grep(/^[^.]/).empty?
    `cvs -d#{CVS_DIR}#{file} -Q log > log`
    # assumption: one report directory per repository under BASE_OUTPUT_DIR
    output_directory = BASE_OUTPUT_DIR + file
    cmd = "/usr/java/java/bin/java "
    cmd = cmd + "-jar /usr/local/statcvs/statcvs.jar "
    cmd = cmd + "-output-dir #{output_directory} log ."
    `#{cmd}`
    FileUtils.rm("log", :force=>true)
  end

  # clean up and sleep for a bit to let things settle down
  Dir.chdir(HOME_DIR)
  FileUtils.rm_rf(working_directory)
  sleep DELAY
}
StatCVS internals and limitations
Because StatCVS is an open source project, the code is available. To get the StatCVS code, download the source zip file from the StatCVS page (see Resources) or check it out from the CVS repository on that Web site.
Here are some vital code statistics:
- 4,463 lines of code as measured by JavaNCSS
- 176 JUnit tests
- A nice Ant build file to facilitate compilation of custom builds
- A good sign -- PMD could find no examples of unused code within StatCVS
See Resources for additional information on JavaNCSS, JUnit, Ant, and PMD.
StatCVS uses JFreeChart to generate the charts and graphs. All the charts are generated in the Portable Network Graphics (PNG) format, which is supported by most modern Web browsers. The code for generating the charts is nicely encapsulated in the net.sf.statcvs.renderer package.
Probably the biggest limitation is that StatCVS does not support branches; it reports only changes made on the HEAD of each module. Thus, if your development team's usage pattern is to make a new branch for each version of the product and only commit to that branch, StatCVS will not return accurate results. This issue has been discussed on the StatCVS mailing list (see Resources), but it doesn't seem likely to be fixed anytime soon. But then again, it's open source, so who knows?
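Before relying on the reports, you can get a rough idea of how much of a project's history sits on branches by looking at revision numbers in the cvs log output. In CVS, trunk revisions have two components (such as 1.7), while revisions on a branch have four or more (such as 1.7.2.1). The Ruby sketch below counts both kinds. It assumes the standard "revision N.N" lines in the log, uses an invented sample, and counts vendor-branch import revisions (such as 1.1.1.1) as branch revisions.

```ruby
# Count trunk and branch revisions in `cvs log` output.
# Trunk revisions look like "1.7"; branch revisions like "1.7.2.1".
REVISION_NUMBER = /^revision (\d+(?:\.\d+)+)\s*$/

def trunk_and_branch_counts(log_text)
  counts = { trunk: 0, branch: 0 }
  log_text.each_line do |line|
    m = REVISION_NUMBER.match(line)
    next unless m
    kind = m[1].split(".").length == 2 ? :trunk : :branch
    counts[kind] += 1
  end
  counts
end

# Invented sample revision headers (not real project data).
SAMPLE_LOG = <<~LOG
  revision 1.3
  revision 1.2
  revision 1.2.2.2
  revision 1.2.2.1
  revision 1.1
LOG

counts = trunk_and_branch_counts(SAMPLE_LOG)
puts "trunk: #{counts[:trunk]}, branch: #{counts[:branch]}"
```

A high proportion of branch revisions suggests that the HEAD-only reports leave out a meaningful share of the team's work.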
Another limitation is that StatCVS supports only CVS. With Subversion rapidly gaining ground as the heir apparent to CVS, it'd be great if StatCVS could support both. There's been some discussion of this on the StatCVS mailing list, as well, but the Subversion changeset formats seem to be the current obstacle.
Mining a CVS repository for usage information can produce lots of numbers and charts. It's up to you to recognize how valid these numbers are and what -- if any -- insight they give you into how a particular project has been developed. With those caveats in mind, StatCVS can provide some interesting visual snapshots of what's happened to a project's source code in its lifetime.
Also, StatCVS is an excellent model of good execution of an open source project. The code is clean, the build process is simple, and the documentation is clear. If you're interested in how to do open source right, you can learn a lot from looking into StatCVS.
| Name | Size | Download method |
|---|---|---|
| statcvs-download.zip | 563 KB | HTTP |
- Get the latest version of StatCVS. Get the source code for the latest StatCVS release.
- Jikes is a fast open source compiler for Java programs.
- Learn more about JFreeChart, the graphing library that StatCVS uses.
- Download the latest release of CVS. Lots of documentation is up there, too, and the gurus on the CVS mailing lists can be quite helpful.
- Don't miss this tutorial covering usage of the CVS client and the CVS server (developerWorks, July 2003).
- You'll also want to read this article, which examines using Eclipse with CVS and Ant (developerWorks, March 2003).
- Open Source Development with CVS, Third Edition (Paraglyph Press, 2003) is a helpful guide to all the CVS features. It's also available for free in PDF format.
- Ruby is an interpreted scripting language for quick and easy object-oriented programming. It has many features to process text files and to do systems management tasks. It is simple, straight-forward, extensible, portable, and has friendly, active mailing lists.
- You can find several hundred open source Ruby projects on RubyForge, including some projects that may assist in your Java development. For example, Ruby JDWP is an effort to create a Ruby implementation of a Java Debug Wire Protocol client.
- Michael Squillace and Barry Feigenbaum discuss Ruby interaction with Java code in their article Take a shine to JRuby (developerWorks, September 2004).
- ViewCVS provides a read-only HTTP interface to a CVS repository. It shows differences between file versions, displays branches and tags, and supports syntax colorizing to make the code easier to read.
- JavaNCSS counts lines of code in Java programs. It provides an accurate measurement because it skips comments and white space.
- The JUnit framework automates the running of unit tests. It provides a nice GUI that shows a green progress bar when all the tests pass. This gives rise to the tagline "If the bar is green, the code is clean." Also, developerWorks has an article on using Jython to create JUnit tests.
- Apache Ant is a build tool based on the Java programming language, and developerWorks hosts many articles and tutorials dealing with Ant, including Apache Ant 101: Make Java builds a snap (December 2003).
- Maven is a project management and project comprehension tool designed for Java programs. It defines a standard project description model and automates many standard project tasks.
- Be sure to see the documentation for the StatCVS-XML plug-in for Maven.
- PMD is a static source code analyzer for Java source code that finds unused variables, empty catch blocks, unnecessary object creation, and more.
- Elliotte Harold thinks the world of PMD and wrote this article to explain how to Zap bugs with PMD (developerWorks, January 2005).
- To learn more about open source projects, visit the developerWorks open source zone. You'll find many useful projects, the latest open source news, tutorials, books, and much more.
- You'll find articles about every aspect of Java programming in the developerWorks Java technology zone.
- Browse for books on these and other technical topics.
- Get involved in the developerWorks community by participating in developerWorks blogs.
Tom Copeland started programming on a TRS-80 Model III, but demand for that skill has waned, and he now mostly writes Ruby and Java code. He contributes to various open source projects, including PMD and GForge, and he helps administer RubyForge, an open source Ruby project repository. He and his wife, Alina, have four children (Maria, Tommy, Anna, and Sarah) and live in northern Virginia.