The six million dollar patch
Stu Walker
Passing time between flights from one city to the next, I'll often head over to 'The Daily WTF' for some light reading and entertainment. Recently a post caught my eye because it illustrates how businesses are missing a trick when it comes to delivering software on time and to quality.
It is easy for most of us IT practitioners to relate in one way or another to John's story and his frustration at the eventual outcome. If we take a moment to deconstruct the incident (one that ultimately cost his business $6,000,000 in 12 months), we can look at the root cause and propose some ways John could have worked smarter and mitigated risk.
John's story starts by explaining his company's patch release procedure. Before releasing code he has to run tests in QA, UAT and Performance before seeking authority to push into production. Many of the companies I work with don't enforce this kind of procedural rigour, so he's on a fairly good footing. However, if we asked John how good his tests were, I suspect he'd struggle to explain the coverage. He would probably say there was a mixture of automated and manual tests with a sprinkling of best endeavours. In an ideal world the majority of John's tests would be automated so that he doesn't need to think twice about the coverage or setup of the environment prior to release, but we'll leave that one for another discussion.
Reading further into John's story, the alarms really start to sound when we see how John has to change the code being tested before he can test it. He adds flags that bridge production order data into his test environment so that the system has enough state to run through his tests. Bridging of production data into test environments, usually via an overnight batch job, is perhaps more common than you might think. Regardless, it is wrong on many levels and opens his company up to the types of risk and resultant problems that he goes on to explain. I don't know what type of industry John operates in, but by bringing real customer data into test systems he's risking more than just wasted time. He also puts at risk data security compliance, customer confidence and loyalty, and potentially exposes the company to financial penalties.
So why does John take such risks just to test his update?
John doesn't give a lot of detail about his application, but let's assume its purpose is to manage shipment of orders. An order management system needs access to data from the wider organisation to perform its function: data on stock inventory, data on shipment prices and timing, data on customers and their accounts, and data on the orders received. In a production environment that data comes from John's corporate IT infrastructure, or perhaps from external vendors in the case of shipping. In his DEV and TEST environments he doesn't have those systems available, and yet somehow he needs to make his order management system think the rest of the IT infrastructure is out there by feeding it enough good data to achieve test coverage that will validate his code changes.
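To make that dependency problem concrete, here is a minimal Python sketch of an order system whose upstream systems are replaced by hand-built stand-ins fed with synthetic data. Every name in it (`OrderManager`, `StubInventory`, `StubShipping`, the flat shipping price) is a hypothetical of mine for illustration; John's post gives no such detail about his actual system.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    sku: str
    quantity: int


class StubInventory:
    """Stands in for the real stock system; the stock levels are synthetic."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def available(self, sku):
        return self._stock.get(sku, 0)


class StubShipping:
    """Stands in for an external shipping vendor; the flat price is invented."""

    def quote(self, sku, quantity):
        return 5.0 * quantity


class OrderManager:
    """Hypothetical order system that only talks to its dependencies' interfaces."""

    def __init__(self, inventory, shipping):
        self.inventory = inventory
        self.shipping = shipping

    def process(self, order):
        # Not enough stock: the order cannot ship yet.
        if self.inventory.available(order.sku) < order.quantity:
            return {"order_id": order.order_id, "status": "backordered"}
        cost = self.shipping.quote(order.sku, order.quantity)
        return {"order_id": order.order_id, "status": "shipped", "shipping_cost": cost}


# The test environment wires in stand-ins instead of production systems.
manager = OrderManager(StubInventory({"WIDGET": 3}), StubShipping())
shipped = manager.process(Order("T-1", "WIDGET", 2))
backordered = manager.process(Order("T-2", "WIDGET", 10))
```

Even in this toy form, someone has to invent and maintain the stock levels and prices by hand, and that is exactly the cost that grows painful at the scale of a real project.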
In a project such as John's, data will undoubtedly be one of the most expensive and time-consuming assets to produce, usually requiring teams of people churning out documents and spreadsheets defining generic and desensitised data values. The problem is compounded by the fact that data is usually last on the list of things to be ready in large projects, which means testing can be delayed and coverage hindered.
So with all that said, you can forgive John's company for using a risky, if apparently easy, technique to pull production data into test. After all, what alternatives do they have? Rustling up replicas of all the legacy systems, mainframes, databases, in-house systems and external interfaces that are needed, on demand and replicated across each test environment, just isn't economically viable or practical.
In the past it was almost impossible to imagine a simple way to replicate these systems in a test environment. Some people tried to solve the problem by building coded frameworks for mocking interfaces, but very quickly found they were brittle and cumbersome, and relied too heavily on teams of developers to support and extend them. If nothing else, the ever-changing nature of IT infrastructure and the need to keep these mocking tools up to date saw to that. Nowadays we have off-the-shelf Service Virtualization (SV) solutions that take away the pain of dealing with highly integrated systems. With SV John could easily record, desensitise and replay versions of realistic data to mimic the interfaces and databases with a few clicks of a button, ready to be played back reliably, repeatedly and on demand. If only he'd known!
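For readers who prefer code to prose, the record, desensitise and replay cycle can be sketched in a few lines of Python. This is a toy of my own, not how any particular SV product works: `VirtualService`, `desensitise`, `fake_live_lookup` and the field names are all hypothetical, and the hashing shown is only a placeholder for proper data masking.

```python
import hashlib
import json


def desensitise(record, sensitive_fields=("customer_name", "email")):
    """Swap sensitive values for a stable token (placeholder for real masking)."""
    clean = dict(record)
    for field in sensitive_fields:
        if field in clean:
            digest = hashlib.sha256(str(clean[field]).encode("utf-8")).hexdigest()
            clean[field] = "anon-" + digest[:8]
    return clean


class VirtualService:
    """Records request/response pairs once, then replays them on demand."""

    def __init__(self):
        self._recordings = {}

    @staticmethod
    def _key(request):
        # Canonical JSON so that equal requests map to the same recording.
        return json.dumps(request, sort_keys=True)

    def record(self, request, response):
        # Only the desensitised response is ever stored.
        self._recordings[self._key(request)] = desensitise(response)

    def replay(self, request):
        return self._recordings[self._key(request)]


def fake_live_lookup(request):
    # Stand-in for a call to a real backend; the data here is invented.
    return {
        "order_id": request["order_id"],
        "customer_name": "Jane Example",
        "email": "jane@example.com",
        "status": "dispatched",
    }


service = VirtualService()
request = {"order_id": "A-100"}
service.record(request, fake_live_lookup(request))  # done once, against the live system
replayed = service.replay(request)                  # done as often as tests need
```

The point of the sketch is the shape of the workflow: the live system is touched once, sensitive fields are masked before anything is stored, and tests afterwards run against the recording rather than against production.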
In a future blog I'll talk about Service Virtualization and how not to position it as the silver bullet I've made it out to be here. ROI at project level as a point solution is incredibly easy to achieve, but the real benefits of SV are only seen through staged enterprise adoption, which involves changes to process and practices and an aggressive approach to shifting defect resolution to the left. Further down the line I'll also talk about how the benefits of SV are amplified by good integration of tooling and methodology.