I lived through the Chernobyl disaster back in 1986, as a teenager, in Germany. I recall watching the radioactive cloud coming closer and closer on TV, wondering whether it was ever going to be safe to go outside again. That was before I became an engineer, but I remember pondering even then whether it was just shoddy Soviet work that was the cause of the disaster, or whether there was something inherently unsafe about nuclear power plants.
The question: If there’s enough money and time and brainpower, and no political obstacles, is it possible to engineer safe nuclear power plants? It should be, shouldn’t it: it’s just engineering after all, and there is no mystery at all how nuclear power plants are supposed to work.
25 years later, watching the Japanese disaster unfold in the past few days, I’ve made up my mind. The answer is: no, it is not possible to engineer safe nuclear power plants.
Here is why. It has to do with complex systems, learning from experience, and the limits of human imagination:
Any complex system, like a PC, or a car, or some enterprise software, or a nuclear power plant, has one fundamental property that makes it similar to all other complex systems: it cannot be fully understood by a single person. There are other definitions of “complex system”, but this is mine: no one person can understand all aspects of it.
For non-engineers, that may sound alarming. It was alarming to me when, as a kid, I first read about SDI in its day, but it is a fact of life that we see all around us: iPhones are complex systems, and so are the electricity grid, the logistics that decide when the UPS truck arrives, Amazon shipping, and even more and more toys. Nobody understands all aspects of any of them.
As engineers, we use plenty of tricks to deal with systems that are larger and more complicated than we can comprehend, like teamwork (“I understand my part, you understand yours, and we negotiate how they interact”) and abstraction. For example: “when I turn the key in the ignition, the car will start”. Many people do not know this, but not even the engineers who designed your car can tell you in detail what happens when you turn the key. All of them (and I used to work for a car company) will use abstractions, which means they will not know about details they think are unimportant, such as how exactly the network controller for the injection system initializes itself when power comes on, or what would happen if a lightning bolt hit the car at the same time. That’s because they might understand the logical connectivity of the car’s parts, but not what lightning would do to them. (Some other engineers will understand the latter, but they won’t know about the software, and so on.)
The only way we ever get anything to work, as engineers, is to design our systems as well as we can, and then try them out, over and over, and fix what needs fixing. Flippantly, we could call the philosophy: “who cares whether we understand what exactly it does, as long as it does the right thing”; and that is demonstrated by trying out as many things as we can think of. Engineers call that “testing”.
As a result, the highest-quality engineering approaches that I know of for complex systems all put an incredible amount of resources into some form of testing, because that’s the only thing we can think of to make progress. The design engineer (or team) has come up with a design, and they, their peers, and the test engineers now throw the kitchen sink at it, trying it out in as many circumstances as they can think of. If even one of those tests does not produce the desired result, the design engineers go back and change their design until all test cases work. (For the engineers: to keep it simple in this article, I also consider design reviews etc. forms of testing, as they follow the same iterative pattern.)
Here’s an example. Let’s say somebody gives you a new pocket calculator prototype that supposedly knows how to simplify fractions, and you are supposed to test it. What will you do? You probably start with simple tests, like “4/6 makes 2/3, that’s right”. And “999/111 makes 9”. But it goes on from there: what if we try negative numbers? What if the denominator is 0? What if the numerator is 1.3 and the denominator is itself a fraction, like 3/4? Etc. etc. Many people would not think to test even those simple examples. Test engineers will, and many more, like: what if the fraction is 1.2E81 over PI? Or what if one of the numbers is only an open parenthesis, as you can enter in some calculators?
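To make the calculator example a bit more concrete, here is a minimal sketch in Python of what the first few of those tests might look like. Everything in it is an assumption for illustration: `simplify` is a hypothetical stand-in for the calculator, it only handles whole numbers, and the choice to keep the minus sign on the numerator is mine, not a rule any real calculator necessarily follows.

```python
from math import gcd


def simplify(numerator, denominator):
    """Reduce an integer fraction to lowest terms.

    Hypothetical stand-in for the calculator under test.
    """
    if not isinstance(numerator, int) or not isinstance(denominator, int):
        raise TypeError("only integer fractions are supported in this sketch")
    if denominator == 0:
        raise ZeroDivisionError("denominator must not be zero")
    divisor = gcd(numerator, denominator)
    if denominator < 0:
        # Flip the sign so the result always has a positive denominator.
        divisor = -divisor
    return numerator // divisor, denominator // divisor


def run_tests():
    # The easy cases most people would think of.
    assert simplify(4, 6) == (2, 3)
    assert simplify(999, 111) == (9, 1)
    # Negative numbers: this sketch keeps the sign on the numerator.
    assert simplify(-4, 6) == (-2, 3)
    assert simplify(4, -6) == (-2, 3)
    # A zero denominator must be rejected, not silently "simplified".
    try:
        simplify(1, 0)
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("zero denominator was accepted")
    # Non-integer input such as 1.3 is outside what this sketch supports.
    try:
        simplify(1.3, 3)
    except TypeError:
        pass
    else:
        raise AssertionError("non-integer numerator was accepted")


run_tests()
```

Note how quickly the list of tests outgrows the few lines that do the actual work, and this sketch does not even attempt the nastier inputs, such as a lone open parenthesis.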
The only reason why cars start when you turn the ignition, and why iPads play movies when you touch the right buttons, and why Google delivers the right answer to your search, is testing. For many complex systems, testing and fixing bugs takes a lot more time and money than actually building the system in the first place. If there wasn’t any testing, none — I repeat, NONE — of the complex systems I know would even come on, never mind do anything useful.
And even after gazillions of dollars in engineering salaries and testing equipment were spent, Windows still crashes. (So do many other products; even my lowly computer keyboard seems to manage to crash occasionally.)
Which brings us back to nuclear power plants.
The problem with nuclear power plants is that they cannot be tested, at least not tested sufficiently. All the things that have been going on in Japan since the tsunami wave hit — today’s headlines all talk about “heroic staff at the reactor” — have been totally outside of the range of anything that has ever been tested. (Yes, I wasn’t around when they tested, so I don’t know for sure. But trust me on this: engineering pride does not allow for several roofs being blown off by sudden explosions, or seawater corroding the machinery, or remaining staff having to be evacuated, as “normal events”. So it’s a safe bet it wasn’t tested.)
Which means that fundamentally, we have no idea what will happen next: nobody understands the whole system because it is complex, and nobody has ever tried it out (“tested”) in the state it is in now, so nobody has any clue. My heart is with the engineers fighting to keep those plants under control; they are doing an incredibly hard job while in mortal danger themselves. But ever since we left the realm of what had been tested before (I think that was probably just after the tsunami knocked out the generators), they have had to make everything up on the fly.
Leaving the realm of what has been tested happens frequently with other kinds of complex systems, say, Windows. But there is one crucial difference: if Windows crashes, you reboot. If it keeps doing it, you re-install, or buy a Mac. The costs are rather limited. In the case of a nuclear power plant, the costs are catastrophic, because people might start dying, and not just a few of them.
Why weren’t they tested better? There are two reasons:
First, your neighbors will not approve of repeated testing that involves blowing up the roof of your nuclear power plant and releasing some radiation, which is of course what happened recently. You can guess all day long, using simulations and what have you, but because they are simulations and not the real thing, they are not sufficient. (I’m sure these plants had lots of simulations run on them, but clearly that wasn’t good enough.)
And secondly, in my now 20+ years of experience building complex systems, I have never, not once, seen a complex system that didn’t behave rather badly from time to time. Because I usually build software, it would be the users (they are very good at finding very unlikely bugs!) who managed to do something to the system that left all engineers scratching their beards saying “that’s impossible”. I’ve learned that there are no bug-free complex systems, and the reason is always the same: the engineers did not have the imagination required to come up with all the circumstances that the system found itself in. “What do you mean, the user managed to click the button twice before the popup went away?” (well, Ms Mary Smith of Hamstead managed) and “Yep, once the data was corrupted in this particular way, and there wasn’t enough memory, bad things might happen. We tested each, but not together.”
In Japan, they did not test (not even simulate) a 9.0 earthquake followed by a tsunami, and certainly nothing that came after that. I’m sure those plants in Japan were tested well, much, much better than your average software system. But earthquake plus tsunami plus generators-out plus out of fuel etc. etc. was clearly something not considered. And if it had been considered, something else would not have been.
There is always one more bug. In the case of nuclear power plants, that one more bug is going to have catastrophic consequences. We cannot avoid that one more bug because we don’t know how to, as I’ve tried to show. So let’s stop assuming that nuclear power can be made safe; it can’t. There will be catastrophes from time to time, like the one in Japan this week.
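“We tested each, but not together” has some simple arithmetic behind it, which the following Python sketch tries to show. It is an illustration under assumptions of my choosing, not a model of any real test plan: the condition names are made up, and each condition is treated as simply present or absent.

```python
from itertools import combinations


def count_scenarios(conditions):
    """Count test scenarios for conditions that are each present or absent.

    Returns (one at a time, two at a time, any non-empty combination).
    """
    n = len(conditions)
    singles = n
    pairs = len(list(combinations(conditions, 2)))
    every_combination = 2 ** n - 1  # all non-empty subsets
    return singles, pairs, every_combination


# Hypothetical conditions, loosely based on the bug reports quoted above.
conditions = ["double click", "corrupted data", "low memory"]
print(count_scenarios(conditions))            # 3 singles, 3 pairs, 7 in total
print(count_scenarios(list(range(20))))       # 20 singles, 190 pairs, 1048575 in total
```

With twenty such conditions, testing each one alone takes twenty tests, but covering every combination takes over a million, and that is before anyone has to imagine the twenty-first condition.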
The question is simply this: do we need nuclear power so badly that we need to be okay with an occasional nuclear catastrophe?