Hardware Acceleration of EDA Algorithms, P8: Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU.

[...] offered by GPUs, our implementation of the gate evaluation thread uses a memory lookup-based logic simulation paradigm. Fault simulation of a logic netlist consists of multiple logic simulations of the netlist with faults injected on specific nets. In the next three subsections, we discuss (i) GPU-based implementation of logic simulation at a gate, (ii) fault injection at a gate, and (iii) fault detection at a gate. Then we discuss (iv) the implementation of fault simulation for a circuit, which uses the implementations described in the first three subsections.

Logic Simulation at a Gate

Logic simulation on the GPU is implemented using a lookup table (LUT) based approach. In this approach, the truth tables of all gates in the library are stored in a LUT. The output of the simulation of a gate of type G is computed by looking up the LUT at the address corresponding to the sum of the gate offset of G (G_off) and the value of the gate inputs.

Fig. Truth tables stored in a lookup table. LUT contents: 1000 10 11111110 0001, with the NOR2, INV, NAND3, and AND2 offsets marking the start of each gate's truth table.

The figure shows the truth tables for a single NOR2, INV, NAND3, and AND2 gate stored in a one-dimensional lookup table. Consider a gate g of type NAND3 with inputs A, B, and C and output O. For instance, if ABC = 110, O should be 1.
In this case, logic simulation is performed by reading the value stored in the LUT at the address NAND3_off + 6. Thus the value returned from the LUT is the output of the gate being simulated for that particular input value. LUT-based simulation is a fast technique, even when used on a serial processor, since any gate (including complex gates) can be evaluated by a single lookup. Since the LUT is typically small, these lookups are usually cached. Further, this technique is highly amenable to parallelization, as will be shown in the sequel.

Note that in our implementation, each LUT enables the simulation of two identical gates (with possibly different inputs) simultaneously. In our [...]
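The lookup scheme described above can be illustrated with a minimal sketch. The Python below runs on a CPU and is not the monograph's GPU implementation. It concatenates the four truth tables from the figure into a one-dimensional LUT, records each gate's offset, and evaluates a gate with a single indexed read. The names `build_lut`, `simulate_gate`, `LUT`, and `OFFSETS` are hypothetical. Treating the first input as the most significant address bit is an assumption, chosen because it makes ABC = 110 map to address NAND3_off + 6 as in the text. The sketch evaluates one gate per lookup and does not model the two-gates-per-LUT arrangement mentioned above.

```python
# Truth tables as they appear in the figure, in storage order.
# Each string lists the gate output for input values 0, 1, 2, ...
TRUTH_TABLES = [
    ("NOR2", "1000"),
    ("INV", "10"),
    ("NAND3", "11111110"),
    ("AND2", "0001"),
]


def build_lut(tables):
    """Concatenate the truth tables into one flat LUT and record offsets."""
    lut = []
    offsets = {}
    for name, bits in tables:
        offsets[name] = len(lut)          # gate offset = start of its table
        lut.extend(int(b) for b in bits)
    return lut, offsets


LUT, OFFSETS = build_lut(TRUTH_TABLES)


def simulate_gate(gate_type, inputs):
    """Evaluate one gate with a single lookup at offset + input value.

    Assumption: inputs are ordered most significant bit first,
    so (A, B, C) = (1, 1, 0) encodes the value 6.
    """
    value = 0
    for bit in inputs:
        value = (value << 1) | bit
    return LUT[OFFSETS[gate_type] + value]
```

With these tables the NAND3 offset is 6, so `simulate_gate("NAND3", (1, 1, 0))` reads address 6 + 6 = 12 and returns 1, matching the example in the text.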