Vector processing was conceived in the 1960s and is one of the earliest applications of SIMD computing. Instead of using a single instruction to process one value at a time, researchers looked for ways to use a single instruction to process multiple related values, grouped together as a vector. The Cray-1 supercomputer is said to be one of the first supercomputers to successfully implement vector processing.
Vector processing improves performance through the following mechanisms:
Less Instruction Overhead
With one instruction per vector, there are far fewer fetch and decode stages per data element. After the initial overhead of a single instruction fetch and decode, multiple data elements can flow through the remaining stages of the pipeline.
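The saving can be put into numbers with a toy model. The sketch below is only an illustration, not a description of any real processor: the function name `fetch_decode_count` is hypothetical, and the array size and vector length of 64 are values assumed for the example. It counts how many fetch and decode pairs are needed to apply one operation to every element of an array.

```python
import math

def fetch_decode_count(num_elements, elements_per_instruction):
    """Fetch/decode pairs needed to apply one operation to every element.

    Toy model (assumption): each instruction is fetched and decoded once
    and then covers a fixed number of elements.
    """
    return math.ceil(num_elements / elements_per_instruction)

# Assumed example values: a 64-element array and a vector length of 64.
scalar_count = fetch_decode_count(64, 1)    # one instruction per element
vector_count = fetch_decode_count(64, 64)   # one instruction per vector
print(scalar_count, vector_count)  # prints: 64 1
```

In this model the scalar version pays the fetch and decode cost 64 times, while the vector version pays it once.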
Memory Access Bandwidth
It is not only instructions that can be pipelined, but also memory accesses. When an instruction begins, a certain amount of data is retrieved from memory. When that data moves to the next stage in the pipeline, the next set of data can be retrieved from memory. Overlapping memory accesses in this way reduces the overhead of retrieving large amounts of data from memory.
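A rough timing model shows why the overlap helps. This is a sketch under stated assumptions: the data is split into equal chunks, every chunk takes a fixed number of cycles to retrieve and a fixed number to process, and the cycle counts used (10 and 4) are made up for the example. The function names are hypothetical.

```python
def serial_cycles(chunks, mem_cycles, proc_cycles):
    """No overlap: each chunk is fully retrieved, then fully processed."""
    return chunks * (mem_cycles + proc_cycles)

def overlapped_cycles(chunks, mem_cycles, proc_cycles):
    """Two-stage overlap: retrieving the next chunk runs alongside
    processing the current one, so the slower stage sets the pace."""
    slower_stage = max(mem_cycles, proc_cycles)
    return mem_cycles + (chunks - 1) * slower_stage + proc_cycles

# Assumed example values: 8 chunks, 10 cycles to retrieve, 4 to process.
print(serial_cycles(8, 10, 4))      # prints: 112
print(overlapped_cycles(8, 10, 4))  # prints: 84
```

With these assumed numbers, most of the processing time is hidden behind the memory accesses, and only the first retrieval and the last processing step are fully exposed.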
A vector processor uses the Fetch, Decode, Execute, Memory Access, and Write Back instruction cycle and can therefore leverage pipelining to increase data throughput.
The pseudocode in the workspace shows two approaches to processing an array located in memory.
The scalar approach is split into two parts: setup and loop. The setup section loads the memory location of the data, initializes a counter, and defines the number of iterations. The loop section retrieves one element, processes it, and writes it back to memory. The counter and memory location are then incremented, and execution branches back to the start of the loop.
The vector approach is much shorter at four lines. The memory location of the data is set just as in the scalar approach. The retrieval step then loads a whole vector of elements at once. One instruction processes all the elements, and another writes them all back to memory.
The difference in instruction count is already noticeable in the written code, and it grows further once every instruction the scalar loop executes across all of its iterations is counted. Fewer executed instructions mean the processor spends less time and energy fetching and decoding them.
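The two approaches described above can be sketched in Python to make the executed-instruction count concrete. This is a minimal illustration, not the lesson's own pseudocode: the doubling operation, the function names, and the per-step instruction counts are assumptions, and the list comprehension in `run_vector` merely stands in for what would be a single hardware vector instruction. The sketch also assumes the whole array fits in one vector.

```python
def run_scalar(data):
    """Scalar approach: process one element per loop iteration."""
    memory = list(data)
    # Setup: load the memory location, set the counter, set the loop limit.
    address, counter, limit = 0, 0, len(memory)
    executed = 3
    while counter < limit:
        value = memory[address]   # retrieve one element
        value = value * 2         # process it (doubling is an assumed operation)
        memory[address] = value   # write it back to memory
        address += 1              # increment the memory location
        counter += 1              # increment the counter
        executed += 6             # the five steps above plus the branch back
    return memory, executed

def run_vector(data):
    """Vector approach: four instructions, regardless of array length."""
    address = 0                                  # 1: set the memory location
    vector = data[address:address + len(data)]   # 2: retrieve a vector of elements
    vector = [v * 2 for v in vector]             # 3: one instruction processes all
    memory = list(vector)                        # 4: write them all back
    return memory, 4

values = [1, 2, 3, 4, 5, 6, 7, 8]
scalar_result, scalar_executed = run_scalar(values)
vector_result, vector_executed = run_vector(values)
print(scalar_executed, vector_executed)  # prints: 51 4
```

Both functions produce the same doubled array, but under these assumptions the scalar loop executes 3 + 6 × 8 = 51 instructions for eight elements, and that count keeps growing with the array length, while the vector version stays at 4.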