Experimental Evaluation of the Fail-silent behaviour in Computers without error masking

Henrique Madeira, Joao Gabriel Silva.


International Symposium on Fault Tolerant Computing Systems, FTCS-24, June 1994.


Abstract
Traditionally, fail-silent computers are implemented using massive redundancy (hardware or software). In this research we investigate whether a high degree of fail-silent behavior can be obtained from a computer without hardware or software replication, using only simple behavior-based error detection techniques. It is assumed that if the errors caused by a fault are detected in time, the erroneous computer behavior can be stopped, thus preventing a violation of the fail-silent model. The evaluation technique used in this research is physical fault injection at the pin level. Results obtained by injecting about 20000 different faults into two different target systems show that 1) in a system without error detection, up to 46% of the faults caused a violation of the fail-silent model; 2) in a computer with behavior-based error detection, the percentage of faults that caused a violation of the fail-silent model was reduced to values from 2.3% to 0.4%; 3) the results depend strongly on the target system, on the program under execution during the fault injection, and on the type of faults.
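The "detected in time" criterion above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the record type `FaultRun`, its fields, and the helper names are hypothetical, and the rule that detection at the same instant as the wrong output counts as "in time" is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FaultRun:
    """Hypothetical record of one injected fault (names are assumptions)."""
    wrong_output_at: Optional[float]  # time of first wrong output, None if none
    detected_at: Optional[float]      # time of error detection, None if undetected


def violates_fail_silence(run: FaultRun) -> bool:
    """A run violates the fail-silent model if a wrong output escapes
    before any error detection mechanism has fired."""
    if run.wrong_output_at is None:
        return False  # no wrong output was ever produced
    if run.detected_at is None:
        return True   # wrong output and no detection at all
    # Assumption: detection at the same instant still counts as "in time".
    return run.detected_at > run.wrong_output_at


def violation_rate(runs: Iterable[FaultRun]) -> float:
    """Percentage of injected faults that led to a fail-silent violation."""
    runs = list(runs)
    if not runs:
        raise ValueError("at least one fault injection run is required")
    violations = sum(violates_fail_silence(r) for r in runs)
    return 100.0 * violations / len(runs)


# Made-up records for illustration only; these are not data from the paper.
example_runs = [
    FaultRun(wrong_output_at=None, detected_at=None),  # fault had no effect
    FaultRun(wrong_output_at=None, detected_at=1.0),   # detected, no wrong output
    FaultRun(wrong_output_at=5.0, detected_at=2.0),    # detected in time
    FaultRun(wrong_output_at=5.0, detected_at=None),   # violation: undetected
    FaultRun(wrong_output_at=3.0, detected_at=4.0),    # violation: detected too late
]
```

Calling `violation_rate(example_runs)` on these illustrative records yields the share of runs in which a wrong output escaped before detection, which is the kind of percentage the abstract reports.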