Better habits, better results
Earlier this year, Kyle Brown wrote a great article on paying back technical debt. That article explained that "technical debt" is incurred when you put off completing an important software development activity in order to complete some other activity. If you have ever put off completing part of a project with the intention of coming back later to finish it -- writing documentation, for example, or changing a piece of code to make it more understandable or maintainable -- then you have incurred technical debt.
This information hit home with me, and while I encourage you to revisit that article and apply the steps it outlines for resolving technical debt, I also realize that concepts like this are often difficult to put into practice, particularly in the middle of a project or development cycle. But with year-end upon us and a new year right around the corner, I thought that this might actually be an appropriate time to suggest taking a proactive approach to avoiding technical debt.
Here, then, is a short but important checklist to help you apply a fresh perspective, better organization, and, ultimately, greater overall integrity to your development projects -- and even your development and production environments.
- Define and understand service level agreements and non-functional requirements for your system
A question that few enterprises seem prepared to answer during the planning phase of a project concerns their desired service level agreement (SLA), which here refers primarily to system availability. As you can imagine, this is a crucial determination in the design, development, and support of any given system because it will have an enormous impact on the funding and effort required. For example, supporting a system that requires 99.9% uptime (also known as three nines) is much less expensive than supporting one that requires 99.999% continuous availability (also known as five nines).
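To make the gap between three nines and five nines concrete, the sketch below converts an availability target into the downtime it permits per year. It assumes a 365-day year and counts all downtime, planned or not, against the target; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_percent):
    """Return the minutes of downtime per year that a target permits."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year, while five nines allows a little over 5 minutes.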
Non-functional requirements, which more or less describe how the application should perform in the real world, often feed into the performance testing phase. It is critical to understand how the application should perform in production so that objectives and expectations are understood by all parties; otherwise, performance and capacity planning will be inconclusive.
- Audit the enterprise infrastructure
A chain is only as strong as its weakest link. In an enterprise, this starts with the hardware and network infrastructure. Ideally, the infrastructure should be audited at least every quarter by a competent audit team that can examine the configurations of the various network elements (servers, routers, and so on) to ensure that the settings are correct.
For example, a common problem occurs when the configuration of the network interface controllers (NICs) on a server does not match that of the routers they are connected to. Another common error involves enabling autonegotiation on one side while the other side is fixed at a setting that the first side cannot configure itself for.
It must be the responsibility of someone in a network administration or system administration role, as appropriate, to ensure that things of this nature are identified and corrected in a timely manner.
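An audit of this kind lends itself to simple automated checks. The following is a minimal sketch, assuming the settings for both ends of a link have already been collected into dictionaries; the keys `autoneg`, `speed_mbps`, and `duplex` and the function name are hypothetical, not taken from any particular tool.

```python
def link_mismatches(nic, peer):
    """Return a list of human-readable problems for one link.

    Each side is a dict with the hypothetical keys "autoneg" (bool),
    "speed_mbps" (int), and "duplex" ("full" or "half").
    """
    problems = []
    if nic["autoneg"] != peer["autoneg"]:
        # One side negotiates while the other is fixed.
        problems.append("autonegotiation is enabled on only one side")
    elif not nic["autoneg"]:
        # Both sides are fixed, so their settings must agree exactly.
        for key in ("speed_mbps", "duplex"):
            if nic[key] != peer[key]:
                problems.append(f"{key} differs: {nic[key]} vs {peer[key]}")
    return problems


if __name__ == "__main__":
    server_nic = {"autoneg": True, "speed_mbps": 1000, "duplex": "full"}
    router_port = {"autoneg": False, "speed_mbps": 100, "duplex": "full"}
    for problem in link_mismatches(server_nic, router_port):
        print(problem)
```

When both sides have autonegotiation enabled, the sketch reports nothing, on the assumption that the two ends will settle on a common setting themselves.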
- Review application architecture and code
Business-critical and high-volume applications always benefit from an application architecture and code review by an independent, outside expert (ideally, someone unrelated to the application). Mature, static applications do not need to be reviewed as frequently as newer, more active applications that get new versions pushed out every week. The enterprise must determine a legitimate review cycle for an application's architecture as well as its code.
Why is this necessary? It would be beneficial to know, for example, whether an application that is expected to be highly available is in fact being built for high availability. This not only helps strengthen and support the application, but can also help identify skill development needs.
Yes, this is a time-consuming activity, but a necessary one. At the very least, an annual review is much better than no review.
- Conduct performance testing
An essential phase in any business-critical application's lifecycle is performance testing. Performance testing should begin as soon as any application functionality is available for testing. For any running application, this involves at least the following high-level steps:
- Review the logs and extract user patterns and use cases.
- Determine the frequency of each use case.
- Build use cases based on the information culled from the logs.
- Clear the logs; uncleared logs can grow very large and be difficult to consume during troubleshooting.
- Run the use cases at the frequency determined from the log information.
- Review and analyze the performance test data.
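The log-driven steps above can be sketched in a few lines. This is a minimal illustration, not a real test harness: it assumes a hypothetical log format in which each line reads `timestamp user use_case`, and the names `use_case_frequencies` and `build_test_mix` are invented for this example.

```python
from collections import Counter


def use_case_frequencies(log_lines):
    """Count how often each use case appears and return its share of traffic."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip lines that do not match the assumed format
        counts[fields[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def build_test_mix(frequencies, total_requests):
    """Scale the observed shares to a number of requests per use case."""
    return {name: round(share * total_requests)
            for name, share in frequencies.items()}


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample_log = [
        "09:00:00 alice login",
        "09:00:05 alice search",
        "09:00:09 bob login",
        "09:00:12 bob checkout",
    ]
    print(build_test_mix(use_case_frequencies(sample_log), 1000))
```

A load-testing tool would then replay each use case the indicated number of times; how that replay is driven depends on the tool and is outside this sketch.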
Effective performance testing can lead to changes in the configuration, the application code, and sometimes the application placement in the infrastructure. Any troubleshooting that occurs during performance testing should lead back to defects opened against the appropriate function, and those defects should be followed up through an appropriate change management tool.
- Perform capacity planning
At the completion of performance testing, the capacity planner will be able to take the observed characteristics and test results and apply capacity planning principles to establish appropriate bandwidth and server load estimates.
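As a rough illustration of how test results can feed a server estimate, the sketch below divides the expected peak load by the throughput one server sustained during performance testing, leaves some headroom, and adds spare capacity. The 70% utilization target and the single spare server are assumptions chosen for the example, not recommendations, and real capacity planning would also account for bandwidth, memory, and growth.

```python
import math


def servers_needed(peak_requests_per_sec, per_server_requests_per_sec,
                   target_utilization=0.7, spare_servers=1):
    """Estimate a server count from measured per-server throughput."""
    if per_server_requests_per_sec <= 0:
        raise ValueError("per-server throughput must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    # Only plan to use a fraction of each server's measured capacity.
    usable = per_server_requests_per_sec * target_utilization
    return math.ceil(peak_requests_per_sec / usable) + spare_servers


if __name__ == "__main__":
    # Hypothetical figures: 900 requests/s at peak, 200 requests/s per server.
    print(servers_needed(900, 200))
```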
- Enable change management
A worst-case scenario might involve a production enterprise environment that is working fine on Friday, but in which nothing is working on Monday morning. If a change was applied over the weekend and a good change management tool is not in use, then it will be extremely difficult to determine which change needs to be backed out. Sadly, this is a common problem.
Administrators and developers must be equipped with an effective and reliable change management tool and approval process, not only to accelerate troubleshooting, but also to help stay out of technical debt in general.
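With change records in hand, the Friday-to-Monday question becomes a simple query. The sketch below assumes the records have been exported from a change management tool as dictionaries; the keys `applied_at` and `summary` and the function name are hypothetical.

```python
from datetime import datetime


def changes_in_window(changes, start, end):
    """Return the changes applied between start and end, newest first."""
    hits = [c for c in changes if start <= c["applied_at"] <= end]
    return sorted(hits, key=lambda c: c["applied_at"], reverse=True)


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"applied_at": datetime(2009, 12, 3, 14, 0), "summary": "Deploy build"},
        {"applied_at": datetime(2009, 12, 5, 22, 30), "summary": "Router firmware"},
        {"applied_at": datetime(2009, 12, 6, 9, 15), "summary": "JVM heap change"},
    ]
    friday_evening = datetime(2009, 12, 4, 18, 0)
    monday_morning = datetime(2009, 12, 7, 8, 0)
    for change in changes_in_window(records, friday_evening, monday_morning):
        print(change["applied_at"], change["summary"])
```

Listing the newest change first reflects the usual practice of backing out the most recent change before older ones.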
- Staff well
Staffing and education are the most obvious -- and most challenging -- tools to help fight technical debt. In some cases, a single person might be able to fulfill multiple roles, but the fact is that most business-critical sites have several fairly intensive job roles that simply are not conducive to double duty. Therefore, managers must take the time to at least understand which roles are understaffed and which require additional skills.
For example, system administrator is a role that is commonly shared across multiple (production, development, test) environments. However, if there is heavy demand in the production environment(s), then less focus and attention are available for the others. Delays in testing can have a snowball effect on the other environments, possibly resulting in fewer application features being promoted to production or, worse, insufficient testing of features that get promoted to production anyway -- both of which end up affecting the business.
- Performance Testing Protocol for WebSphere Application Server-based Applications
- Paying back technical debt
- Planning for Availability in the Enterprise
- Why do non-functional requirements matter?
- IBM developerWorks WebSphere