Learn

So how is a vector architecture different from a scalar one?

##### Vector Registers

To operate on large amounts of data, the CPU needs somewhere to put it. Vector registers are like regular single-element registers, except that each one holds multiple data elements. The Cray-1 worked on 64-bit data, and each of its vector registers could hold up to 64 elements. That makes a single register 64 x 64 = 4096 bits long!
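The arithmetic above can be checked with a small sketch. This is only a toy model: the names `ELEMENT_BITS`, `ELEMENTS_PER_REGISTER`, and `VectorRegister` are hypothetical and chosen for illustration, and the class does not describe how the Cray-1 hardware was actually built.

```python
ELEMENT_BITS = 64            # width of one data element (Cray-1 figure from the text)
ELEMENTS_PER_REGISTER = 64   # elements per vector register (Cray-1 figure from the text)


class VectorRegister:
    """Toy model of a vector register: a fixed number of fixed-width slots."""

    def __init__(self):
        self.elements = [0] * ELEMENTS_PER_REGISTER

    def store(self, index, value):
        # Keep only the low ELEMENT_BITS bits, as a fixed-width slot would.
        self.elements[index] = value & ((1 << ELEMENT_BITS) - 1)

    def size_in_bits(self):
        return len(self.elements) * ELEMENT_BITS


v0 = VectorRegister()
register_bits = v0.size_in_bits()
print(register_bits)  # 4096
```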

##### Internal Looping

A scalar processor uses a code loop to perform the same operation on multiple data elements. While there are mechanisms to simplify coded loops, every iteration still requires its own instruction fetches and decodes. Since a vector architecture commonly performs the same operation on every element of a vector, the looping is assumed and built into the hardware: a vector instruction is fetched and decoded once, then applied to each element without any extra instruction fetches.
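To get a feel for the difference, here is a rough sketch that counts instruction fetches for adding a constant to 64 elements. The instruction counts are assumptions made for illustration (a five-instruction scalar loop body and a three-instruction vector sequence), and the function names are hypothetical; real instruction sets will differ.

```python
def scalar_instruction_fetches(n_elements, instructions_per_iteration=5):
    # Assumed loop body: load, add, store, increment counter, branch.
    # Every iteration fetches and decodes all of these again.
    return n_elements * instructions_per_iteration


def vector_instruction_fetches(instructions_in_sequence=3):
    # Assumed sequence: vector load, vector add, vector store.
    # Each instruction is fetched once; the hardware steps through the
    # elements on its own, so the count does not grow with the element count
    # (as long as the data fits in one vector register).
    return instructions_in_sequence


scalar_fetches = scalar_instruction_fetches(64)
vector_fetches = vector_instruction_fetches()
print(scalar_fetches, vector_fetches)  # 320 3
```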

##### Lanes

Vector architectures can also use multiple lanes to process data elements simultaneously. Each lane can contain the functional units a scalar processor has, such as arithmetic, floating-point, and logic units. Lanes make the architecture more complex, but they can speed up processing drastically.
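The effect of lanes can be sketched in software. The helper `process_with_lanes` below is hypothetical, and it assumes a simplified model in which each lane finishes one element per cycle; real hardware is pipelined and its timing is more involved.

```python
def process_with_lanes(elements, num_lanes, operation):
    """Apply `operation` to every element, counting one cycle per group of lanes."""
    results = [None] * len(elements)
    cycles = 0
    for start in range(0, len(elements), num_lanes):
        # In hardware the lanes would handle this group at the same time;
        # this inner loop only simulates that in software.
        for i in range(start, min(start + num_lanes, len(elements))):
            results[i] = operation(elements[i])
        cycles += 1
    return results, cycles


data = list(range(64))
doubled, cycles_one_lane = process_with_lanes(data, 1, lambda x: x * 2)
_, cycles_four_lanes = process_with_lanes(data, 4, lambda x: x * 2)
print(cycles_one_lane, cycles_four_lanes)  # 64 16
```

Under this model, four lanes cut the cycle count for a 64-element register from 64 to 16, while the results are the same either way.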

Vector processors also contain much of the same hardware that scalar processors do. One example is scalar registers, which hold single values such as constants. If a program needs to add 5 to every element of a vector register, it makes sense to hold the value 5 in a scalar register rather than in an element of a vector register.
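As a minimal sketch of that idea, the function below models a vector-plus-scalar operation. The names `vector_add_scalar`, `v1`, `s1`, and `v2` are hypothetical, a Python list stands in for the vector register, and the register is shortened to 8 elements to keep the output readable.

```python
def vector_add_scalar(vector_register, scalar_register):
    # The single scalar value is reused for every element of the vector.
    return [element + scalar_register for element in vector_register]


v1 = list(range(8))   # stand-in for a vector register: [0, 1, ..., 7]
s1 = 5                # stand-in for a scalar register holding the constant
v2 = vector_add_scalar(v1, s1)
print(v2)  # [5, 6, 7, 8, 9, 10, 11, 12]
```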

### Instructions

The workspace shows the components of a vector architecture discussed in this exercise. At the top is a vector register with multiple data elements.

The middle portion shows the elements of the vector register stacked and ready to be processed. This is a visual representation of internal looping, where the processor knows which element is next in line once the previous one is done.

Lastly, the bottom portion of the diagram shows 4 lanes that allow the processing of multiple data elements simultaneously. Each lane contains functional units that might perform arithmetic processing, floating-point processing, and other system-specific functions.