What demolishing the QA wall looks like
By Gene Kim, Award-winning CTO, Author and DevOps Enthusiast
At Appvance, we’re passionate about helping companies break through the QA Wall, a bottleneck created because legacy software test automation tools were not designed to move as fast as Dev and Ops now can. Every day we talk with enterprise technology leaders who want to accelerate software releases while maintaining (or improving) application quality, performance, and security. That’s why we asked Gene Kim, our favorite DevOps enthusiast, to share his insights in this post. Be sure to download an excerpt from his newly published book, The DevOps Handbook, and order your copy.
For many organizations, creating effective automated testing for legacy applications is likely to be the largest challenge in their DevOps transformation. This is especially true when we are 100% reliant on manual testing, perhaps performed by a third party and only at the end of a project.
When this happens, it becomes extremely difficult, or even impossible, for us to create fast flow from Dev into Test and Ops and then into production, while preserving world-class availability, reliability, and security. If testing is only performed a few times a year, developers learn about their mistakes months after they introduced the change that caused the error. By then, the link between cause and effect has likely faded, solving the problem requires firefighting and archaeology, and, worst of all, our ability to learn from the mistake and integrate it into our future work is significantly diminished.
Instead, we need to build quality into our product, even at the earliest stages, by having developers build automated tests as part of their daily work. This creates a fast feedback loop that helps developers find problems early and fix them quickly, when there are the fewest constraints (e.g., time, resources).
One of my favorite case studies in The DevOps Handbook that shows the value of creating effective automated testing and continuous integration processes is the HP LaserJet firmware story, as told by Gary Gruver. When I first heard about this in 2012 during a conversation with my co-author Jez Humble, it just blew my mind.
Gruver describes, better than anyone, the business value created by automated testing and why it is a competency that every technology leader must care about. As Gruver once observed, “Without automated testing, the more code we write, the more time and money is required to test our code; in most cases, this is a totally unscalable business model for any technology organization.”
Furthermore, his experience shows that if we can create effective automated testing for firmware, we can do it for anything!
Case Study: Continuous Delivery For HP LaserJet Firmware
The ability to “branch” in version control systems was created primarily to enable developers to work on different parts of the software system in parallel, without the risk of individual developers checking in changes that could destabilize or introduce errors into trunk (sometimes also called master or mainline).†
However, the longer developers are allowed to work in their branches in isolation, the more difficult it becomes to integrate and merge everyone’s changes back into trunk. In fact, integrating those changes becomes exponentially more difficult as we increase the number of branches and the number of changes in each code branch.
Integration problems result in a significant amount of rework to get back into a deployable state, including conflicting changes that must be manually merged or merges that break our automated or manual tests, usually requiring multiple developers to successfully resolve. And because integration has traditionally been done at the end of the project, when it takes far longer than planned, we are often forced to cut corners to make the release date.
This causes another downward spiral: when merging code is painful, we tend to do it less often, making future merges even worse. Continuous integration was designed to solve this problem by making merging into trunk a part of everyone’s daily work.
The surprising breadth of problems that continuous integration solves, as well as the solutions themselves, are exemplified in Gary Gruver’s experience in 2007 as the director of engineering for HP’s LaserJet Firmware division, which builds the firmware that runs all their scanners, printers, and multifunction devices.
The team consisted of four hundred developers distributed across the US, Brazil, and India. Despite the size of their team, they were moving far too slowly. For years, they were unable to deliver new features as quickly as the business needed.
Gruver described the problem thus, “Marketing would come to us with a million ideas to dazzle our customer, and we’d just tell them, ‘Out of your list, pick the two things you’d like to get in the next six to twelve months.’”
They were only completing two firmware releases per year, with the majority of their time spent porting code to support new products. Gruver estimated that only 5% of their time was spent creating new features—the rest of the time was spent on non-productive work associated with their technical debt, such as managing multiple code branches and manual testing, as shown below:
- 20% spent on detailed planning (their poor throughput and high lead times were misattributed to faulty estimation, so, hoping to get a better answer, they were asked to estimate the work in greater detail)
- 25% spent porting code, all maintained on separate code branches
- 10% spent integrating their code between developer branches
- 15% spent completing manual testing
Gruver and his team set a goal of increasing the time spent on innovation and new functionality by a factor of ten. The team hoped to achieve this goal through:
- Continuous integration and trunk-based development
- Significant investment in test automation
- Creation of a hardware simulator so tests could be run on a virtual platform
- The reproduction of test failures on developer workstations
- A new architecture to support running all printers off a common build and release
Before this, each product line required a new code branch, and each model had a unique firmware build with capabilities defined at compile time.† The new architecture would have all developers working in a common code base, with a single firmware release, built from trunk, supporting all LaserJet models, and with printer capabilities established at runtime from an XML configuration file.
Four years later, they had one codebase, developed on trunk, supporting all twenty-four HP LaserJet product lines. Gruver admits trunk-based development requires a big mindset shift: “Engineers thought trunk-based development would never work, but once they started, they couldn’t imagine ever going back. Over the years we’ve had several engineers leave HP, and they would call me to tell me about how backward development was in their new companies, pointing out how difficult it is to be effective and release good code when there is no feedback that continuous integration gives them.”
However, trunk-based development required them to build more effective automated testing. Gruver observed, “Without automated testing, continuous integration is the fastest way to get a big pile of junk that never compiles or runs correctly.” In the beginning, a full manual testing cycle required six weeks.
In order to have all firmware builds automatically tested, they invested heavily in their printer simulators and created a testing farm in six weeks. Within a few years, two thousand printer simulators ran on six racks of servers that would load the firmware builds from their deployment pipeline. Their continuous integration (CI) system ran their entire set of automated unit, acceptance, and integration tests on builds from trunk. Furthermore, they created a culture that halted all work any time a developer broke the deployment pipeline, ensuring that developers quickly brought the system back into a green state.
Automated testing created fast feedback that enabled developers to quickly confirm that their committed code actually worked. Unit tests would run on their workstations in minutes, and three levels of automated testing would run on every commit as well as every two and four hours. The final full regression testing would run every twenty-four hours. During this process, they:
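To make the shift from compile-time to runtime configuration concrete, here is a minimal Python sketch. The XML schema, the model names, and the `load_capabilities` and `Firmware` names are hypothetical illustrations, not HP's actual firmware design. The idea it shows: one build ships to every model, reads a per-model capability list, and checks it before handling a job, instead of compiling features in or out per branch.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration schema: one <model> element per printer model,
# listing the capabilities that model supports. Element and attribute names
# are illustrative only, not HP's real format.
CONFIG_XML = """
<models>
  <model name="basic-mono">
    <capability name="print"/>
  </model>
  <model name="office-mfp">
    <capability name="print"/>
    <capability name="scan"/>
    <capability name="duplex"/>
  </model>
</models>
"""


def load_capabilities(xml_text: str) -> dict[str, set[str]]:
    """Parse the XML into a mapping of model name -> set of capability names."""
    root = ET.fromstring(xml_text.strip())
    return {
        model.get("name"): {cap.get("name") for cap in model.findall("capability")}
        for model in root.findall("model")
    }


class Firmware:
    """One firmware build whose behavior is selected at runtime, not compile time."""

    def __init__(self, model: str, capabilities: dict[str, set[str]]):
        self.model = model
        # Unknown models get no capabilities rather than crashing.
        self.features = capabilities.get(model, set())

    def supports(self, feature: str) -> bool:
        return feature in self.features

    def handle_job(self, job: str) -> str:
        # The same code path exists on every model; unsupported jobs are
        # rejected by a runtime check instead of being compiled out.
        if not self.supports(job):
            return f"{self.model}: '{job}' not supported"
        return f"{self.model}: running '{job}'"


caps = load_capabilities(CONFIG_XML)
print(Firmware("basic-mono", caps).handle_job("scan"))  # basic-mono: 'scan' not supported
print(Firmware("office-mfp", caps).handle_job("scan"))  # office-mfp: running 'scan'
```

Because every model runs the same build, a single set of automated tests on trunk exercises the code for all of them; only the configuration file differs.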
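The “halt all work” rule can be modeled as a gate on trunk. The following toy Python sketch assumes a hypothetical `Trunk` class that refuses ordinary commits while the pipeline is red and accepts only changes flagged as fixes. It illustrates the policy only; it is not the CI tooling HP actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Trunk:
    """Toy model of a trunk that 'stops the line' while the pipeline is red."""

    green: bool = True
    history: list[str] = field(default_factory=list)

    def commit(self, change: str, is_fix: bool = False) -> bool:
        # While the build is broken, only changes that repair it are accepted;
        # all other work waits until trunk is green again.
        if not self.green and not is_fix:
            return False
        self.history.append(change)
        return True

    def record_pipeline_result(self, passed: bool) -> None:
        # Called when the CI pipeline finishes testing the latest commit.
        self.green = passed


trunk = Trunk()
print(trunk.commit("add duplex setting"))                  # True
trunk.record_pipeline_result(passed=False)                 # pipeline goes red
print(trunk.commit("new scan feature"))                    # False: the line is stopped
print(trunk.commit("fix duplex regression", is_fix=True))  # True
trunk.record_pipeline_result(passed=True)                  # back to green
```

In practice the "gate" is usually a team norm backed by visible build status rather than a hard lock, but the effect is the same: fixing a red build takes priority over new work.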
- Reduced the build cycle to one build per day, eventually doing ten to fifteen builds per day
- Went from around twenty commits per day performed by a “build boss” to over one hundred commits per day performed by individual developers
- Enabled developers to change or add 75k–100k lines of code each day
- Reduced regression test times from six weeks to one day
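The testing cadence described above can be sketched as a simple schedule. This Python example assumes that each of the three automated levels maps to one cadence (every commit, every two hours, every four hours), with full regression running daily; the stage names are hypothetical, and real CI servers express such schedules in their own configuration formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    # None means "run on every commit"; otherwise run every N hours.
    every_hours: Optional[int]


# Assumed mapping of the automated levels to the cadences in the text.
# Unit tests run on developer workstations before commit, so they are not
# part of this server-side schedule.
SCHEDULE = [
    Stage("level-1", None),
    Stage("level-2", 2),
    Stage("level-3", 4),
    Stage("full-regression", 24),
]


def stages_for_commit() -> list[str]:
    """Stages triggered by every commit to trunk."""
    return [s.name for s in SCHEDULE if s.every_hours is None]


def stages_due_at(hour: int) -> list[str]:
    """Periodic stages that fire at a given hour of the day (0-23)."""
    return [
        s.name
        for s in SCHEDULE
        if s.every_hours is not None and hour % s.every_hours == 0
    ]


print(stages_for_commit())  # ['level-1']
print(stages_due_at(4))     # ['level-2', 'level-3']
print(stages_due_at(0))     # ['level-2', 'level-3', 'full-regression']
print(stages_due_at(3))     # []
```

The design choice mirrors the trade-off in the case study: the fastest, cheapest checks run most often, while slower suites run on a timer against whatever is on trunk at that moment.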
This level of productivity could never have been supported prior to adopting continuous integration, when merely creating a green build required days of heroics. The resulting business benefits were astonishing:
- Time spent on driving innovation and writing new features increased from 5% of developer time to 40%
- Overall development costs were reduced by approximately 40%
- Programs under development were increased by about 140%
- Development costs per program were decreased by 78%
What Gruver’s experience shows is that, after comprehensive use of version control, continuous integration is one of the most critical practices that enable the fast flow of work in our value stream, enabling many development teams to independently develop, test, and deliver value. Nevertheless, continuous integration remains a controversial practice.
Gene Kim, “The Amazing DevOps Transformation of the HP LaserJet Firmware Team (Gary Gruver),” ITRevolution.com, 2013
Gary Gruver and Tommy Mouser, Leading the Transformation: Applying Agile and DevOps Principles at Scale (Portland, OR: IT Revolution Press), 60.
†The term “branching” in version control has been used in many ways, but is typically used to divide work between team members by release, promotion, task, component, technology platforms and so forth.
Read more in the DevOps Handbook
The remainder of this chapter describes the practices required to implement continuous integration, as well as how to overcome common objections. Download the 130-page excerpt and pre-order your copy of The DevOps Handbook.
Read more about the QA Wall
Download the Appvance Business Brief: The QA Wall: Why it is time to re-think software test automation