Sunday, November 9, 2008

Software and Chaos

As with many of my blog posts, this is a little long. To make things slightly easier, here is a map:

The Problem
My Experience
Business Expectations and Results
Software and the Elevator Speech
All is Not Lost

THE PROBLEM

Roughly seventy-five percent of all software projects fail in the sense that they cost more than expected, deliver less than expected, or are delivered later than expected. This has been true for decades. Behind the scenes of these projects is untold stress and pain as development teams and management try to cope. The coping typically starts when it becomes apparent that the carefully constructed project plan does not describe what is happening and that "success" is unlikely. I would like to believe that my projects perform better than average, but many of them go through this readjustment to reality. It starts with a meeting where the current situation is explained and options are explored. At some point, someone asks what went wrong, and all eyes turn toward me as system architect and technical manager.

I pride myself on being honest and not ducking blame. I go through a careful analysis of projections and actualities. I point out places where we made mistakes and where we may have lost time, but also explain that this is not where most of the problems lie. I point out the areas where there was unanticipated work or where an underlying system did not perform as expected.

All of this is true, but it is not the actual problem. The actual problems are structural, and most organizations I work with will never make the changes necessary to rationally develop software. They will not change because they cannot accept the truth about the type of software development I work on: no team, no matter how smart or how much time it spends planning, can accurately predict the effort necessary to produce the software. Predictions of effort are based on what is known, and they are always optimistic.

MY EXPERIENCE

Let me preface this with some characteristics of the projects I work on; my generalizations may not apply to other types of work. I work on new development. Typically it means putting a system in place to replace some existing process. Often there is no existing software in place; where there is, the organization plans to scrap that system completely. Along with the new software, the business processes will be substantially changed. The projects are usually medium-sized: say, three to fifteen developers working for between six months and a year and a half.

The parent organization is usually quite large, but not a software development house. They usually have an IT department that maintains and expands existing software. They may have a reasonably large body of homegrown software, but developing new products is not their expertise. The existing IT staff is often expected to do much of the development work. It is common for the organization to have outsourced their data center operations.

These are generally business and process problems, not the development of shrink-wrapped products. I have worked on products as well. The details of the problems change in product development, but the basic outline remains.

BUSINESS EXPECTATIONS AND RESULTS

Modern business is based on rational-sounding self-deception. As a rational business person faced with investing in the future, I want answers to a set of questions:

What business are we in?
What and when are the opportunities?
How much will it cost to address the opportunity?
How much will we earn if we are successful?
What are the risks of action and inaction?

There are existing tools to address each of these questions. Upper management usually spends a lot of time and money defining the business; this includes vision statements as well as long- and short-term goals. Market research is used to answer the what and when of opportunities and to estimate their size, as well as the cost of inaction. Technology research and initial product research are used to estimate the cost of action.

Given this information, I can compare projects and choose the ones that promise the highest return on investment. As a rational person, I want to guard against incomplete or incorrect information, so I will want information at every step of the way to make sure that assumptions were valid and to readjust course. Recognizing that it is cheaper and easier to correct errors early in the process, I will want to make sure that we understand the problem and the solution as early as possible. This leads to a typical waterfall process:

Gather Requirements – understand the problem
Detailed Design – understand the solution
Implementation – fill in the details
Test – make sure users see a reliable product
Release – cha-ching: here is where we get the return on investment
Support – if we have done things well, we can minimize support

After every step of this process there are checkpoints to assess progress and readjust based on our best information.

This is called a waterfall process because it moves in one direction: at the end of any stage, it is difficult to move back upstream.

The end result of this rational process is generally failure. Not just failure to deliver the software as envisaged, but failure at every step. Management changes in the middle of a project are common, and with every change in management there is a shift in the way the business is understood; what was critical yesterday is marginal today. Market research is almost invariably wrong. You can confirm this yourself: take a look at your marketing reports from two years ago. Project requirements change in the middle of the project; this is one of the most commonly cited critical problems in software post-mortems. The implementation takes longer than projected, and as the pressure mounts because the project is late, developers hack functionality together to meet deadlines. Adequate testing is often one of the first casualties. As the bugs mount and the project slips, functionality is cut. Testing exposes defects that require rework, and sometimes the rework is not simple bug fixing; sometimes the fix requires major architectural work. When release finally occurs, users find that things they consider essential have been chopped out, or that the carefully crafted requirements do not correspond to the way users work.

The sad reality is that this process will fail for all but the most constrained projects. The large uncertainties inherent in the world combined with our limited ability to understand and communicate our understanding doom the entire methodology.

In the software community, almost no one believes that waterfall processes are either effective or desirable. Even the first descriptions of the process describe its failures. The failures have been spectacular: the US Federal Government has had numerous multi-billion-dollar projects that were complete losses (FAA, $2.6 billion; IRS, $4 billion; see http://spectrum.ieee.org/sep05/1685/failt1). The money would have been better spent if it had been bundled into logs and burned to provide heat for the poor. Despite this, many large organizations mandate a waterfall approach, and around the world untold amounts of money are spent to describe, mandate, and adjust a process that will never work.

SOFTWARE AND THE ELEVATOR SPEECH

"I'm a busy man, tell me what you want." This is the origin of the elevator speech. As an entrepreneur you find yourself in an elevator with the very person who can fund your project. You walk in and by the time the elevator gets to your floor you want to hook the prospective funder. To do so you answer the implicit questions, What am I going to get (and why should I care), when am I going to get it, and what will it cost? You have perhaps a minute to hook the person.

The basic problem is that the "what," "when," and "how much" questions are essentially impossible to answer. The world is chaotic in the mathematical sense: the circumstances that lead to wild success are often indistinguishable from the circumstances that lead to complete disaster.
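
To make "chaotic in the mathematical sense" a little more concrete, here is a rough, purely illustrative sketch in Python using the logistic map, a textbook chaotic system, as an analogy. Two starting points that differ by one part in a million end up on completely different trajectories within a couple of dozen steps.

def logistic(x, r=4.0):
    # One step of the logistic map x -> r * x * (1 - x); chaotic for r = 4.
    return r * x * (1 - x)

a, b = 0.400000, 0.400001   # nearly indistinguishable starting circumstances
for step in range(1, 26):
    a, b = logistic(a), logistic(b)
    if step % 5 == 0:
        print("step %2d: a=%.6f  b=%.6f  gap=%.6f" % (step, a, b, abs(a - b)))

The analogy is the point: tiny, unmeasurable differences in starting conditions get amplified until the outcome is unrecognizable, which is why up-front what/when/how-much predictions carry so little information.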

Even in constrained fields like books and movies, where the processes to develop the end product are well known and understood, no one can predict the success of the final product.

For new software, the prediction problems are magnified. The problems I address typically span multiple groups in an organization and involve manual processes that currently require human judgement. It is common that no single person in the organization understands the entire problem. Like the blind men and the elephant, each individual understands the part they are touching, but no one sees the whole.

I always say that there is only one thing worse than fully specifying a computer system before code is written. It is getting the system that was specified. No matter how carefully we work, the requirements specified at the start of the project often prove untenable in real life.

At best, the process itself encourages wishful thinking; at worst, outright lying. Software is expensive, and in a world where the low bid sets the general bar, there is every reason to underestimate the uncertainty and to assume that everything will work as predicted. As a result, I get to sit in meetings where someone asks what went wrong and all eyes turn to me.


ALL IS NOT LOST

Although this picture is bleak, the software development community has developed techniques that actually work. Most of these fall under the rubric "agile development".

Agile development is the antithesis of the elevator speech. The answers to the what, when, and how-expensive questions are: I don't know, I don't know, and I don't know. Agile development says that if you want to get effective systems in place, you get a bunch of smart people in a room and let them loose. The pact with the business is:

Choose the project as best you can.
Define a level of investment.
Involve customers from the first day to the last.
Address the most important customer problems first.
Produce working software early and often.
Insist on complete transparency.
If cost is too high or benefit too low, pull the plug.

That is, as a software provider I will deliver actual value as quickly as possible, and if I don't earn my keep, fire me. Because large systems generally have a roll-out to a large number of end users, it may be a long time before the system is actually deployed, but every three weeks or a month there is a new, working system that can be evaluated, tested, and redirected. When that system provides enough value to warrant its deployment, roll it out. After roll-out, continue investment so long as the implemented improvements justify their own cost.
