Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, simple, blocking caches are used. Such caches are not appropriate for processor designs such as runahead and out-of-order execution that require non-blocking caches to tolerate main memory latencies. Conventional nonblocking caches are expensive and slow on FPGAs as they use content-addressable memories (CAMs). This work exploits key properties of runahead execution and demonstrates an FPGA-friendly non-blocking cache design that does not require CAMs. A non-blocking 4KB cache operates at 329MHz on Stratix III FPGAs while it uses only 270 logic elements. A 32KB non-blocking cache operates at 278Mhz and uses 269 logic elements.