At the heart of Greenplum Database is the Parallel Dataflow Engine, which does the real work of processing and analyzing data. The engine is an optimized parallel processing infrastructure designed to process data as it flows from disk, from external files or applications, or from other segments over the gNet interconnect (Figure 9). It is inherently parallel: it spans all segments of a Greenplum cluster and can scale effectively to thousands of commodity processing cores. The engine was designed based on supercomputing principles, with the idea that large volumes of data have.