Vector processors provide high-level operations that work on vectors, i.e., linear arrays of numbers [20]. A typical vector register holds between 64 and 512 64-bit elements, and a single instruction can perform an operation on all of them in parallel. Such processors require specialized architectures with large numbers of simple processing elements. SIMD registers, by contrast, hold only a small number of elements: on a Pentium 4, for example, SIMD instructions can process at most four 32-bit values at once. SIMD on commodity processors therefore offers a much lower degree of parallelism. On the other hand, load-store instructions have much higher latency on vector processors than on commodity SIMD processors.
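To make the four-element limit concrete, the following is a minimal sketch, assuming an x86 processor with SSE (which the Pentium 4 supports). It adds two float arrays using 128-bit SSE registers, so each instruction handles four 32-bit values; a vector processor could instead cover 64 to 512 elements per instruction. The function name `add_arrays` is hypothetical and used only for illustration. A scalar loop handles leftover elements, and it also serves as the fallback when SSE is not available.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: 128-bit registers = 4 x 32-bit floats */
#endif

/* Hypothetical helper: element-wise c[i] = a[i] + b[i] for n elements. */
void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* SIMD loop: each _mm_add_ps adds four 32-bit floats in one instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* store 4 sums */
    }
#endif
    /* Scalar tail for the remaining 0-3 elements (or all of them without SSE). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Even for long arrays, the loop advances only four elements per SIMD instruction, which illustrates the lower degree of parallelism that SIMD on commodity processors offers compared with vector registers.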