This is an essay on why software fails: there are so many different configurations of computer systems. I originally wrote this for a presentation on 06 Mar 2000. I updated it for the web on 19 Dec 2004.
Software often fails because the developer has never considered, or even encountered, a system configuration that differs from every configuration seen before. In some cases it's a different set of files: some may be missing, or extra ones may be present. There may be newer or older versions of some files. The user may have changed settings that alter how the system operates. Different hardware can also make the system behave differently.
To begin, let us substitute a tangible system for software: an automobile. For purposes of the example, let us also assume that all the mechanical systems described herein can be built in a reasonable manner, and that no physical roadblocks would prevent an automaker from building a car as we will suggest. It's an ideal world.
The Original Car
We start with a conventional economy automobile, one with a floor-mounted manual transmission and a simple heat-only climate system.
It is important to realize that, when an upgrade ships, not all customers install it. So the cars in the field end up with every possible combination of some, all, or none of the updates.
Upgrade 1: Automatic Transmission
We decide to add a model with an automatic transmission, so we build two models, manual and auto. We now have three versions of the car (the original manual, the new factory manual, and the new factory auto), although both manual versions are identical at this point.
Upgrade 2: Automatic Transmission Retrofit
The company decides to offer the auto transmission as an upgrade to all current owners, just as software vendors do.
Let us depart for a moment from the mechanics of the car and presume that the owner can make the upgrade merely by plugging the car into a large black box called the Installer. In a few minutes the car is transformed without human intervention. This magical automatic upgrade method seems wonderful, but it has a serious shortcoming: no human has overseen the modifications to this particular car, so nobody in the field actually knows what was done. As long as everything comes out OK, we don't care. But how will we fix the car if something goes wrong during the installation?
Now we have several versions of the car in the field: new cars with factory auto transmissions, older cars with the auto upgrade, new factory manual cars, and original manual cars whose owners didn't upgrade.
But we just figure that's a tech-support problem, not an engineering problem.
Upgrade 3: Speed Sensor
After a short while, we are told that someone accidentally downshifted the auto transmission at too high a speed and damaged it. So we design a speed sensor and modify the brake assembly on one wheel to mount it. We of course offer this upgrade to all owners, but now we must change the Installer so it doesn't try to install the speed sensor on cars with only the manual transmission. Then we have to test the Installer, which takes additional time. Once the upgrade is in the field, we end up with even more versions of the car.
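In software terms, the Installer change above is a prerequisite check: before applying an upgrade, confirm the target already has what the upgrade depends on. Here is a minimal sketch, assuming a car's configuration can be modeled as a set of installed feature names; the names `PREREQUISITES`, `can_install`, `"auto_transmission"`, and `"speed_sensor"` are hypothetical labels for illustration, not taken from any real installer.

```python
# Hypothetical prerequisite table: each upgrade lists the features
# that must already be present before the Installer may apply it.
PREREQUISITES = {
    "auto_transmission": set(),
    "speed_sensor": {"auto_transmission"},
}


def can_install(upgrade: str, installed: set[str]) -> bool:
    """Return True if every prerequisite of `upgrade` is already installed."""
    # Subset test: all required features must appear in `installed`.
    return PREREQUISITES[upgrade] <= installed


manual_car: set[str] = set()
auto_car = {"auto_transmission"}

print(can_install("speed_sensor", manual_car))  # False
print(can_install("speed_sensor", auto_car))    # True
```

Every new upgrade adds another entry to a table like this, and every entry is one more case the Installer's testers have to exercise.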
Tech Support hires a few more people.
Upgrade 4: Speed Sensor Reset When Starting
Some time later, we learn that the speed sensor has a minor problem that can be solved by resetting it when the car is started. But how do we know when the engine starts? The Engine Team is too busy to help us, so we just cheat and wire to the switch on the oil-pressure light. If the ignition is on and the oil light goes off, then the engine must have started. We modify the transmission system, modify the oil light circuit a little bit, and ship the upgrade. We don't bother to tell the engine guys that we tweaked their oil-pressure circuit, but just keep that in mind for later. How many different versions of the car do we have now?
In Tech Support the blood pressure is already rising. Forget oil pressure.
Upgrade 5: Air Conditioning
Now the Climate-Control Team decides to add air conditioning both on the assembly line for new cars and as an upgrade. They go through basically the same process we did with the transmission, coming up with an automatic system that installs the new A/C alongside the original heater. During design, they realize that the air conditioner needs to know whether the engine is running. Taking a cue from us, they tap the oil light just as we did. Let's ignore the fact that they just doubled the number of versions of the car, but let's remember that oil light. How many versions now? Note that we now build new manual cars with and without A/C, plus we have the "old" manual cars that might upgrade. Or might not.
Tech Support simply kicks up their medications.
Upgrade 6: Brake Interlock
At this point, the Safety Team tells us that the brake must be pressed before the driver can shift out of Park. We come up with an elegant solution: rather than install another switch or sensor for the brake pedal, we simply wire to the brake lights! It's quick, cheap, and cool. After all, it worked just fine for that oil light. We ship this upgrade, blissfully ignoring the fact that no two cars are alike any more. How many versions now?
Tech Support throws away their meds and switches to straight bourbon.
A Mystery Failure
Somebody had a brake light burn out, and he couldn't get the automatic transmission out of Park. The service people couldn't figure out what was wrong. Eventually, Engineering told them about the link between the transmission and the brake (or, more accurately, the brake lights). The customer found this totally unbelievable; he would never have dreamed that the transmission failure was related to a burned-out light bulb! We never considered that situation, and the fix is going to take a little time. In the meantime, we come up with a workaround: go under the dash and pull out the fuse for the interior lights; you can live without the courtesy lights for a while.
Tech Support sobers up and begins stockpiling small arms.
Upgrade 7: Sport Instrument Panel
Now comes the Sport Package: the Mid-Life Crisis Department puts in a tachometer and an oil-pressure gauge, removing the oil light in the process. Oops: without an oil light, the speed sensor reset system can't tell when the engine starts. (Remember when we hacked that change? We never coordinated it with the Engine Department, but it seemed to work OK for us.) We decide to modify things, via another hack, to detect engine start by sensing the tachometer coming up from zero. We have factory-built Sport models and upgraded Sport models. Of course, our fix needs to go to only a small percentage of the cars out there: the ones with both the Sport Package upgrade and the automatic transmission.
Tech Support spends their time browsing the Web for good prices on ammunition.
The Climate-Control guys now find out that they, too, needed that missing oil light. So we — the transmission engineers — get stuck faking an oil-light circuit for them. It's not our problem to solve, and we have absolutely nothing to do with the air conditioning. But we get ordered to do this, so we do it. Now the Climate guys are depending on a hack made by the Transmission guys. Will they understand how it works? How to fix it if something else forces a change? Of course not.
Tech Support quits en masse to join a startup automaker that promises never to do things like this. One of the experienced support guys carries his meds just in case.
A dealer in Texas calls to tell us that the brakes are overheating on some cars. It's that speed sensor we put in. Don't worry, we'll stop now.
So Where Are We At This Point?
We've only made seven changes, and now we have way more than seven versions of the car in the field. Let's count them:
There are four basic cars. Three of them are straightforward: factory manual, with none of the later upgrades; factory auto, with all of them; and original manual, with none of the other upgrades. That's three versions.
We have to double that to six, because some have air conditioning and some don't.
But the upgraded cars in the field (upgraded auto) cause the most variations, because the owner can apply none, some, or all of the later upgrades. This is where we run into trouble, and how our car example becomes similar to the world of software upgrades.
We know the user has upgraded to the automatic transmission, but each of the five other upgrades and fixes may or may not be installed. Two choices for each of five upgrades gives 32 possible combinations.
Our total, after seven upgrades, is 6 + 32 = 38 models!
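The count above can be checked by brute-force enumeration. This is a minimal sketch that follows the essay's own model: three non-retrofit cars, each with or without A/C, plus the retrofitted auto car, on which each of five later upgrades is independently present or absent. Reading "the five other upgrades" as Upgrades 3 through 7 is an interpretation, and the short labels below are illustrative only.

```python
from itertools import product

# The three cars that are not retrofits, per the count above.
BASIC_CARS = ["factory manual", "factory auto", "original manual"]

# Upgrades 3 through 7, read here as the "five other upgrades" a
# retrofitted auto car may or may not have (an interpretation).
LATER_UPGRADES = [
    "speed sensor",
    "sensor reset",
    "air conditioning",
    "brake interlock",
    "sport panel",
]


def count_versions() -> int:
    # Each basic car comes with or without A/C: 3 * 2 = 6 versions.
    basic = [(car, has_ac) for car in BASIC_CARS for has_ac in (False, True)]

    # The retrofitted auto car: every on/off combination of the five
    # later upgrades is a distinct version, 2 ** 5 = 32 in all.
    retrofit = [
        frozenset(u for u, present in zip(LATER_UPGRADES, flags) if present)
        for flags in product((False, True), repeat=len(LATER_UPGRADES))
    ]

    return len(basic) + len(retrofit)


print(count_versions())  # 38
```

Because the retrofit share grows as a power of two, one more independent upgrade would double it from 32 to 64, which is why each new upgrade hurts more than the one before.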
No Two Alike
We practically have to give one-on-one support for each owner, because everyone's car is different from his neighbor's. The upgrade process has become risky, because it's easy to overlook some obscure versions of the car. All the interconnects between systems in the car — like the air conditioning and the automatic transmission system that share the fake oil-light circuit — create problems when we make changes.
We might change one of the car's systems without expecting it to affect another system, yet the change somehow breaks that other system. Now the customer gets involved, trying to avoid problems by choosing which upgrades to take and which to skip. But making these decisions requires understanding how things work, and we don't give out enough information for anyone to figure that out.
There's another twist: dozens of other companies make upgrades for the car, just as the manufacturer does. Some of these companies don't have good access to technical information: they either guess at how things operate, or they tap into something that is subject to change later. It's now impossible to determine how many variations of the car there can be. Worse, even third-party upgrades that don't interact directly can still affect each other in hidden ways, simply because both are installed.
What does the engineer do?
- He keeps adding new features to meet competition.
- He keeps fixing unintended problems (bugs).
- His changes cause more unintended bugs.
After a while, he decides to quit driving altogether, and he switches to a bicycle.
Only he doesn't realize that the bicycle manufacturers have an upgrade for him...