Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design. Martin L. Shooman. Copyright 2002 John Wiley & Sons, Inc. ISBNs: 0-471-29342-3 (Hardback); 0-471-22460-X (Electronic)

1 INTRODUCTION

The central theme of this book is the use of reliability and availability computations as a means of comparing fault-tolerant designs. This chapter defines fault-tolerant computer systems and illustrates the prime importance of such techniques in improving the reliability and availability of digital systems that are ubiquitous in the 21st century. The main impetus for complex digital systems is the microelectronics revolution, which provides engineers and scientists with inexpensive and powerful microprocessors, memories, storage systems, and communication links. Many complex digital systems serve us in areas requiring high reliability, availability, and safety, such as control of air traffic, aircraft, nuclear reactors, and space systems. However, it is likely that planners of financial transaction systems, telephone and other communication systems, computer networks, the Internet, military systems, office and home computers, and even home appliances would argue that fault tolerance is necessary in their systems as well. The concluding section of this chapter explains how the chapters and appendices of this book interrelate.
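To make the idea of comparing designs by reliability and availability computations concrete, here is a minimal Python sketch. It compares a single unit with a duplicated (parallel) design in which either of two identical units can do the job. It assumes independent failures and a constant failure rate, so R(t) = exp(-λt). For availability, it also assumes that each unit is repaired independently. The function names and all numeric parameters are hypothetical and chosen only for illustration. They are not taken from the text.

```python
import math


def reliability_single(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for one unit with a constant failure rate."""
    return math.exp(-failure_rate * t)


def reliability_parallel(r: float, n: int = 2) -> float:
    """n identical, independent units in parallel: the system survives
    if at least one unit survives, so R_sys = 1 - (1 - r)**n."""
    return 1.0 - (1.0 - r) ** n


def availability_single(mttf: float, mttr: float) -> float:
    """Steady-state availability of one repairable unit: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


# Hypothetical parameters, chosen only for illustration.
lam = 1e-4                    # failures per hour (assumed)
mission = 1000.0              # mission time in hours (assumed)
mttf, mttr = 10_000.0, 10.0   # hours (assumed; MTTF = 1 / lam)

r1 = reliability_single(lam, mission)
r2 = reliability_parallel(r1)
a1 = availability_single(mttf, mttr)
# Assumes each unit is repaired independently (one repair crew per unit),
# so the pair is down only when both units are down at the same time.
a2 = 1.0 - (1.0 - a1) ** 2

print(f"reliability:  single {r1:.4f}, pair {r2:.4f}")
print(f"availability: single {a1:.6f}, pair {a2:.9f}")
```

With these assumed numbers, the duplicated design raises mission reliability from about 0.905 to about 0.991. The independence assumption is the weak point of this simple model. A common-mode failure, such as a software fault present in both units, defeats both copies at once, and the parallel formula does not capture it.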
WHAT IS FAULT-TOLERANT COMPUTING?

Literally, fault-tolerant computing means computing correctly despite the existence of errors in a system. Basically, any system containing redundant components or functions has some of the properties of fault tolerance. A desktop computer and a notebook computer loaded with the same software, with files stored on floppy disks or other media, together form an example of a redundant system. Since either computer can be used, the pair is tolerant of most hardware and some software failures. The sophistication and power of modern digital systems give rise to a host of possible sophisticated approaches to fault tolerance