Lessons from the Agile trenches

The ups and downs of Agile development


You've read the hype surrounding Agile software development. You've persuaded your boss that the business will benefit from adopting it and should bring in some expert consultants to help. Your team has embraced the advantages of frequent releases, iterative development, and daily stand-ups. Everything is going fine.

But as the team grows and the systems mature, you start to run into problems. The codebase is too large for collective ownership. People forget details, and no documentation exists. Developers complain that the build takes too long. Some even check in code before the build completes, and sometimes this breaks it. You start to think that the expensive Agile consultants you hired were armchair generals who knew nothing about being a foot soldier.

This story is all too common. Sadly, some teams then go back to the waterfall approach. Some become "watergile." But a few continue being highly Agile and prosper. I've had the good fortune to work as a developer with one such department in a large telecommunications company. They've been delivering software in an Agile way for nearly a decade, long after the Agile consultants left. This article tells how they do it. I'll give you a picture of the work environment, the processes used, and the challenges — both met and so far unmet. I can't offer an Agile blueprint (no one can), but the lessons learned from my experience can, I hope, help other organizations in their battles to become truly Agile.

Team and system configuration

Teams of database administrators, system administrators, data warehouse administrators, and network administrators are all essential to system operations at the telecoms company. But when it comes to being Agile, it's all about (to quote Steve Ballmer) developers, developers, developers.

Development team composition

All 50 or so developers are based on one floor and work in about 10 teams of 6 or fewer. With teams this size, there's little chance of miscommunication, and daily stand-ups take no more than 15 minutes. Each team is responsible for writing and managing a handful of small applications that address a discrete area of business. Apart from agreeing on its interfaces with other teams, each team is an autonomous unit that decides its own way of doing things.

Although the applications under development have a clear architecture, we have no architects. All architectural decisions are made by consensus amongst the developers who work at the code face. Each team does, however, have a business analyst. He or she acts as a proxy for the customer, who might not be located in the same city. The business analyst feeds requirements for each iteration to the developers (who are careful not to overcommit). The business analyst is also the closest thing to a manager that the developers ever see. In addition to the business analyst, each team has its own dedicated user-acceptance tester. Both the tester and the analyst attend the developer stand-ups.

In addition to the developer stand-ups, we hold a daily stand-up that a representative of each development team attends. Cross-cutting concerns such as expiring certificates and licenses are discussed here.

The architecture

The architecture of all the applications reflects the team composition. All the apps talk to one another via XML. The transfer protocol is sometimes Java™ Message Service (JMS) but more often HTTP. In keeping with autonomous teams, the choice of web server is irrelevant, and large Enterprise Service Bus solutions have so far been avoided. This approach largely works well, as long as some guidelines are followed.

The first guideline is to bake versioning into the messages sent between apps by including it either in the URL or in the XML payload. This way, the server application can fail fast if it detects that it has been sent data in a version it does not support.
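As an illustrative sketch (the class name and version scheme here are hypothetical, not the department's actual code), the server-side check can be as simple as comparing the version declared in the URL or payload against the one the app supports and throwing immediately on a mismatch:

```java
// Hypothetical sketch: fail fast when a message declares an unsupported version.
public final class VersionGuard {
    private final String supportedVersion;

    public VersionGuard(String supportedVersion) {
        this.supportedVersion = supportedVersion;
    }

    // Throws immediately if the version extracted from the URL or the XML
    // payload does not match the version this server supports.
    public void check(String messageVersion) {
        if (!supportedVersion.equals(messageVersion)) {
            throw new IllegalArgumentException(
                "Unsupported message version: " + messageVersion
                    + " (expected " + supportedVersion + ")");
        }
    }
}
```

Failing at the boundary like this is far cheaper to diagnose than letting a half-understood message propagate into business logic.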

Another is to give client teams the XML schema (XSD) for the XML payload. This simple gesture elicits a surprising amount of gratitude from the client team, which then feels more confident about what its code is asking for.

A third is to use the free and open source Yatspec for documentation that runs as part of the test suite (see Related topics at the end of the article). If documentation becomes out of date, then the build breaks until it is updated. The documentation Yatspec produces uses human-readable web pages to describe the XML that is passed back and forth for a variety of paths through the system. At the end of every iteration it can be given to the client. It is far more readable than a Web Services Description Language (WSDL) file. I'll discuss Yatspec more later.

Perfect harmony?

With such small, self-organizing teams following sensible guidelines, what can possibly go wrong? Lots, actually.

Intrateam friction

Egoless programming can only truly be achieved in a world of egoless beings. Wherever that world is, it's not Earth. Although the right conditions can minimize friction between strong personalities, breakdowns will inevitably occur when enough people interact. One instance I saw was when half a team was adding Spring to the codebase while the other half was busy removing it. In cases like this, management must be strong and break the deadlock by whatever means are necessary. Sometimes, there's no alternative but to swap people out of a team that is suffering from a civil war.

Incompatible architectures

Teams that make their own architectural decisions leave little room for prima donnas. But architectural decisions that one team makes can impact others.

For instance, one team decided to use a NoSQL, in-memory database. Later, they decided that in fact the data must survive a crash. So they asked the team from which they took the data to store it for them. This was a suboptimal idea for the team providing the data, which had not planned for this work. They felt that it was architecturally repugnant for business concerns to span multiple apps (and that the team consuming the data had indulged in Career Driven Development).

The teams turn to an arbitrator when situations like this arise. The arbitrator is a developer who does not sit in any one team; he or she is not necessarily the best developer on the floor but one who is commonly regarded as having good people skills and the ability to hear both sides of an argument impartially.

Release-night blues

Unfortunately, we are currently not leveraging our federated environment of small apps quite as much as we could. On the night of the release, if a single application fails to function, the release of all applications is rolled back.

This necessity is not simply due to a difference in interfaces between releases. (Because of version numbers in the messages sent between systems, an application can be prepared for that contingency.) Rather, it's due to the possibility that messages might disappear.

Consider the scenario where one successfully deployed application starts up and passes a message to another newly deployed app. The first app then considers its work done. Later, the second application, which consumed the message, is found to be faulty, so both its deployment and its database are rolled back. The first application knows nothing of its collaborator's rollback and so makes no attempt to resend its message.

The first approach to this problem was to break down the applications into different families. It was hoped that this would at least mean that only a subset of applications would be rolled back if a problem occurred. For instance, it seemed that the workflow of customer ordering (the client first wants to be connected to our network), provisioning (the client's line needs to be physically connected at the telephone exchange), and service assurance (checking the client's connectivity) were totally separate domains. But it later became clear that a support engineer who wants to know why a new client's telephone line is not working crosses all domains. By necessity, therefore, the apps need to communicate with one another.

A further problem with this superficial analysis is that applications even in the same family didn't necessarily talk to one another. Although the families seemed obvious from a business viewpoint, they failed to capture the dependencies between apps. For example, there was no common work flow in checking a landline and checking a connection from a wireless hotspot. The two applications that handled this flow did not talk to each other, nor did they share common dependencies despite being part of the putative assurance family.

Alas, we have not yet managed to produce a solution to all-or-nothing releases, although we are lobbying for more equipment so we can thoroughly test in a production-like environment. Until we solve this problem, either all applications must be deployed or all must be rolled back. A single application that fails to deploy will then jeopardize the whole release — a very expensive eventuality.

Reinventing the wheel

Although the pain of rewriting applications is reduced when they are small, not all rewrites are painless. Certain cross-cutting concerns keep getting rewritten. One example is an authentication system that touches nearly every app; a lack of long-term planning has caused it to go through several rewrites, each one rippling through almost all applications. Unfortunately, we have not yet solved this problem.

The folly of crowds

It is still perfectly possible to make architectural mistakes even when two teams agree on an approach.

A case in point is when a team short of work agreed to take on common functionality from an overworked team. The idea was for both teams to work on a shared library. But this quickly led to one team inadvertently breaking the build of the other as they fixated on their own build monitors. The other team's build was a strange land they did not wish to explore.

In retrospect, the first team should have offered some sort of XML/RPC service to the second. And because the first team was overworked, it could have borrowed members from the second to build it. This painful experience is still ongoing after six months.

Agile/waterfall impedance mismatch

The whole floor operates on iterations that last two weeks. On iterations that have odd numbers, the apps are promoted to the environment just before production but not into production itself. This gives the system administrators and database administrators a dummy run. On iterations that have even numbers, the apps go live. All apps are released on the same night, ensuring that they all work together.

The problem is that there are teams and systems outside the department that do not follow the same cycle or perhaps don't even use the iterative releases of Agile at all. They might belong to different departments or even different companies.

Because we don't have control over our clients, there's a limit to what we can do. But we have discovered that providing the client with a "primer" application for each client-facing app helps their adoption. These primer apps stub out back-end systems that our client-facing app talks to. The primers have a simple web front end that enables our client to set up the systems in whatever configuration they want to test against.

Because releases of these primers are in lockstep with the production releases, they too are available in the test environment with every incremental build. This enables our clients to play with the applications as they progress through the iteration.

Falling between the cracks

It is easier for some work to fall between the cracks if each team is self-contained in its own silo. For instance, one application relied heavily on Apache ActiveMQ, but who was responsible for it was a constant source of argument. The developers saw it as infrastructure. The system admins argued that the developers were better placed to deal with messaging issues because they knew the software much better. Only trial-and-error plus a little horse-trading resulted in an agreement.

Tricks of the trade

By sometimes being far-sighted — but more often through trial and error — we have come up with some best practices to help us deliver. These recognize that the code is only part of a successful iteration. Tests, builds, deployment, and documentation all have their own processes that must be tuned over time. Here, I'll share some techniques that work for us. Your mileage might vary, but if you adopt and adapt them your productivity could increase.


Fast builds

It is essential to have fast builds if the developers' attention is to be maintained. Fast builds also enable them to be brave when making changes, because they can very quickly verify that they haven't broken anything.

Build times are naturally lower if applications are smaller. But another way to reduce the build time is to make the build multithreaded. Listing 1 shows part of an Apache Maven project object model (POM) file that uses the Maven Surefire Plugin (see Related topics at the end of this article) for running multithreaded tests:

Listing 1. Snippet from a Maven pom.xml for multithreaded tests
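The original listing did not survive conversion; a representative snippet (the plugin version and thread count are illustrative) that enables Surefire's parallel test execution looks like this:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.22.2</version>
  <configuration>
    <!-- Run test classes concurrently on four threads -->
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```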

A full build of the application in question takes less than a minute.

You should heed two pieces of advice when making your builds multithreaded. The first is that doing so on a mature project often does not work out of the box. Tests that pass when run serially can fail mysteriously when run in parallel. It is therefore better to make builds multithreaded from the start.

Second, thought must be given to how the tests are to be run. For instance, it's common for some tests to clean the database before they start. These tests are not appropriate for multithreading, because one might be tearing down the data while another is populating the table.
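One pragmatic way to keep such tests out of the parallel run is to tag them with a marker category and exclude that category from the multithreaded build. This sketch assumes JUnit 4 categories and a hypothetical SerialOnly marker interface:

```xml
<configuration>
  <parallel>classes</parallel>
  <threadCount>4</threadCount>
  <!-- Tests annotated @Category(SerialOnly.class) are skipped here
       and run in a separate, single-threaded Surefire execution -->
  <excludedGroups>com.example.SerialOnly</excludedGroups>
</configuration>
```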


Automating deployment

A cross-cutting concern for our many federated applications is their release into the various environments. To this end, we have written and open-sourced an Ant library called Conduit that facilitates the release process. It has helpful macros that, for instance, scp the artifact to an appropriate environment and start it remotely using SSH.
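Conduit's own macros are not reproduced here, but the steps it automates can be sketched with Ant's standard optional scp and sshexec tasks (both require Ant's JSch dependency; the host names, paths, and file names below are placeholders):

```xml
<target name="deploy">
  <!-- Copy the built artifact to the target environment -->
  <scp file="build/myapp.jar"
       todir="deployer@test-env-1:/opt/myapp"
       keyfile="${user.home}/.ssh/id_rsa"/>
  <!-- Restart the application remotely over SSH -->
  <sshexec host="test-env-1"
           username="deployer"
           keyfile="${user.home}/.ssh/id_rsa"
           command="/opt/myapp/restart.sh"/>
</target>
```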

One nice side effect of having a uniform way of starting and stopping applications is that integration tests that must start and stop whole sets of applications become much easier to write: the same scripts work for every app. Another advantage of a library that deploys the application for you is that you can release to your many environments many times a day. The more often you do so, the less chance of surprises on the night of the real release.

One word of warning, however: Conduit is still somewhat immature, and the error messages can be rather esoteric.

Living documentation

Agile consultants often advise avoiding documentation, saying that it quickly falls out of date. But this need not be true if the documentation is part of the build.

The department I work in uses Yatspec (see Related topics at the end of this article), a tool that they wrote and open-sourced. It runs tests written in Java and turns the results and the source code into HTML documents. The output looks something like Figure 1:

Figure 1. Yatspec documentation
Screenshot of HTML documentation generated by the Yatspec documentation tool

The test method that generated the output in Figure 1 can look as simple as the code in Listing 2:

Listing 2. A Yatspec test method to generate documentation
import org.junit.Test;

    @Test
    public void pingAWifiServiceWithARouterAndCheckItIsReachable() throws Exception {
        // body elided in the original: ping the wifi service and assert reachability
    }

The test method in Listing 2 makes assertions and highlights data that is particularly interesting.

Yatspec can even automatically generate sequence diagrams like the one in Figure 2:

Figure 2. Automatically generated sequence diagram
Screenshot of a sequence diagram generated by the Yatspec documentation tool

Capturing the actions in the sequence diagram is simply a matter of injecting a test-code listener into your production code.

Source code as documentation

Ultimately, the source code is the documentation for developers. And sooner or later another team is going to have to look at yours. I've found that you will receive the undying admiration of your neighboring table of developers if you release your source code with each build.

This is easy to do in Maven using just a standard plug-in (see Related topics at the end of this article). Listing 3 shows you how to use the Maven Source Plugin:

Listing 3. Snippet from a Maven pom.xml for deploying source code
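The original snippet was lost; the standard configuration that attaches a sources JAR to every build looks like this:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-source-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sources</id>
      <goals>
        <!-- jar-no-fork avoids re-running earlier lifecycle phases -->
        <goal>jar-no-fork</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn deploy` publishes a `-sources.jar` alongside each binary artifact, so neighboring teams can read your code without hunting down the repository.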

Release-night lifesavers

Imagine this scenario: It's 3.30 a.m. on the night of a release, and your application fails to start. You wonder why, because you've spent most of a whole month writing tests. You're tired, you want to go home, and your colleagues are looking at you with growing annoyance. You double-check the code, but nothing seems to be wrong with it....

It's easy to forget that configuration is as much a part of your application as your code is. For instance, one application of ours went three whole releases without a single code bug. However, it was plagued with configuration issues that made release night a nail-biting experience.

To avoid this unpleasantness, we wrote a class that would sanity-check all the properties. This simple JUnit test checks all the values in the property files for all the environments. It might execute a logical check of the value, maybe just checking that it matches a regular expression. Or, it might simply check to ensure that the value actually exists. This kind of test has prevented no end of late-night panics.
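A minimal sketch of such a check (the property names and rules below are invented for illustration, not our actual configuration) pairs each property with a regular expression that its value must match:

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the property sanity check: each known property
// name maps to a regular expression that its value must satisfy.
public final class PropertyChecker {
    private static final Map<String, Pattern> RULES = Map.of(
            "db.port", Pattern.compile("\\d{1,5}"),           // numeric port
            "service.url", Pattern.compile("https?://\\S+")); // plausible URL

    // Returns true only when the property is known, present, and matches its rule.
    public static boolean isValid(String name, String value) {
        Pattern rule = RULES.get(name);
        return rule != null && value != null && rule.matcher(value).matches();
    }
}
```

In the real test, a JUnit method loads every environment's property files and asserts that each entry passes its rule, so a typo in a port number fails the build rather than the release.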


Conclusion

Agile methodologies recognize that projects never end. That's how software development rolls in the real world. Consequently, even after nearly a decade of Agile development, new issues, such as our coarse-grained release process and ownership of cross-cutting concerns, are still hitting us.

But having spent 16 years as a developer, I believe that we're in better shape than most other companies. Releases are rarely rolled back. Production issues tend to be minor. Team spirit is mostly high.

Agile methodologies will vastly improve your delivery of software, but you must still be realistic about what can be achieved and how quickly. Any consultant who tells you that Agile effortlessly fixes all the problems with your process is probably trying to sell you something. To paraphrase Winston Churchill, Agile is the worst methodology — except for all the others.


Related topics


