
Vector Register Mode

Short Page Mode radically transforms the instruction format so that a larger portion of the addressing modes can be used without switching modes and without increasing the instruction size.

Vector Register Mode transforms the instruction format in an even more radical manner. Here, the instructions are lengthened in many cases.

In this mode, instead of using the scratchpad pointer registers to indicate areas with 64 elements, additional register banks are used: one set of 64 supplementary accumulator/index registers, and one set of 64 supplementary floating-point registers.

In addition, to further approach the capabilities provided by Cray computers, two sets of eight long vector registers are provided. In one set, each long vector register contains sixty-four 64-bit fixed-point elements; in the other, each long vector register contains sixty-four 128-bit floating-point elements.

This resembles the complement of registers provided by the earliest Cray computers; the later Cray computers had considerably more, and even larger, vector registers. If sufficient register space is available in an implementation of this architecture, this too can be approached, by a set of sixty-four integer long vector registers and a set of sixty-four floating-point long vector registers, forming the long vector register scratchpad. (Upon re-examining the specifications for these computers, it appears I may have misread them, and, while they did increase the number of elements in a vector register from 64 to 128, or perhaps even the number of vector registers from 8 to 16, they did not have a second set of 64 vector registers as I mistakenly thought.)
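To give a sense of the register space involved, the following is a minimal sketch that simply multiplies out the figures given above. The constant and function names are my own, chosen for illustration; only the counts and widths come from the description.

```python
# Figures taken from the description above; the names are illustrative only.
ELEMENTS_PER_LONG_VECTOR = 64
FIXED_ELEMENT_BITS = 64     # fixed-point long vector elements
FLOAT_ELEMENT_BITS = 128    # floating-point long vector elements


def bank_bytes(num_registers, element_bits):
    """Bytes of storage in a bank of long vector registers."""
    return num_registers * ELEMENTS_PER_LONG_VECTOR * element_bits // 8


# The two sets of eight long vector registers.
basic_bytes = bank_bytes(8, FIXED_ELEMENT_BITS) + bank_bytes(8, FLOAT_ELEMENT_BITS)

# The optional long vector register scratchpad: sixty-four registers of each kind.
scratchpad_bytes = bank_bytes(64, FIXED_ELEMENT_BITS) + bank_bytes(64, FLOAT_ELEMENT_BITS)

print(basic_bytes)       # 12288
print(scratchpad_bytes)  # 98304
```

On these figures, the basic long vector registers occupy 12 kilobytes, and the long vector register scratchpad would occupy a further 96 kilobytes.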

Also, the scratchpad pointer registers are now used as additional base registers, so that there are now 32 base registers:

new base registers  0- 7: Base Registers 0-7
new base registers  8-15: Scratchpad Pointer Registers 0-7
new base registers 16-23: Pointer Scratchpad Base Registers 0-7
new base registers 24-31: Array Scratchpad Base Registers 0-7
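The numbering in the table above can be sketched as a simple lookup. This assumes that the new base register number is divided into a group of eight and a position within that group; how the number is actually encoded in an instruction is not stated here, and the function name is hypothetical.

```python
# Register groups in the order given by the table above.
GROUPS = [
    "Base Register",
    "Scratchpad Pointer Register",
    "Pointer Scratchpad Base Register",
    "Array Scratchpad Base Register",
]


def decode_base_register(number):
    """Map a new base register number (0-31) to (group name, register 0-7).

    Assumption: the group is number // 8 and the register within the
    group is number % 8, as the ranges in the table suggest.
    """
    if not 0 <= number <= 31:
        raise ValueError("base register number must be in the range 0-31")
    return GROUPS[number // 8], number % 8


print(decode_base_register(13))  # ('Scratchpad Pointer Register', 5)
```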

The additional register banks used in this mode should be regarded as an optional feature of the architecture. Since they represent a large additional increment in the complexity of an implementation, and they are only applicable to some types of problems, there will be applications for which including these register banks would be wasteful.

The complement of registers provided in a full implementation of this mode is illustrated in the diagram below:

Because of the power and complexity of this mode, its description will be split up among several succeeding pages.

While a short vector is composed of four double-precision floating-point numbers or eight single-precision floating-point numbers, and so on, a long vector always has 64 elements.

This fixed length will lead to some complications in arranging the path from memory to the arithmetic units if these operations are implemented by means of 64 arithmetic units operating in parallel, as they may be in a high-performance implementation of this architecture (a method of dealing with these complications is shown on the next page). But it also means that these vector operations can be implemented simply by the high-speed pipelining of a single arithmetic unit, as they were on many of the early vector architectures.
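The two implementation choices can be contrasted with a small, idealized model. This is only a sketch under assumptions of my own: the parallel version is counted as a single step, the pipelined version accepts one element pair per cycle, and the pipeline depth of 4 is an arbitrary illustrative value, not a figure from this architecture.

```python
VECTOR_LENGTH = 64  # a long vector always has 64 elements


def add_parallel(a, b):
    """Model 64 arithmetic units: every element pair is added in one step."""
    result = [x + y for x, y in zip(a, b)]
    steps = 1
    return result, steps


def add_pipelined(a, b, depth=4):
    """Model one pipelined arithmetic unit fed one element pair per cycle.

    The first result emerges after `depth` cycles, and one more result
    emerges on each cycle after that.
    """
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    cycles = len(a) + depth - 1
    return result, cycles


a = list(range(VECTOR_LENGTH))
b = [2 * x for x in a]

parallel_result, parallel_steps = add_parallel(a, b)
pipelined_result, pipelined_cycles = add_pipelined(a, b)

print(parallel_result == pipelined_result)  # True
print(parallel_steps, pipelined_cycles)     # 1 67
```

Both versions produce the same 64 results; in this model they differ only in how many arithmetic units are needed and how many cycles elapse.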

Another Vector Register Mode

The long vector registers and the long vector scratchpad are also available for use in symmetric vector register mode, which will be described later.

A Further Consequence

The potential presence of 64 arithmetic-logic units in an ultimate-performance implementation of the architecture suggests that a further increment in computing power might be obtained if each arithmetic-logic unit had a simple control unit associated with it, creating 64 separate computers. It would be possible to parcel out some portion of the internal cache memory of the chip to each of these processors; if some eight megabytes of cache memory were provided, for example, using half of that could provide each one with 65,536 bytes of memory.
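The arithmetic behind that example can be checked as follows. The variable names are illustrative, and a megabyte is taken to be 1,048,576 bytes.

```python
CACHE_BYTES = 8 * 1024 * 1024   # eight megabytes of cache memory
PROCESSORS = 64                 # one per arithmetic-logic unit

# Half of the cache is parcelled out equally among the processors.
shared_bytes = CACHE_BYTES // 2
bytes_per_processor = shared_bytes // PROCESSORS

print(bytes_per_processor)  # 65536
```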

The method of making use of this capability is described in some of the later sections.

It may be noted, however, that if 64 arithmetic-logic units essentially identical to the main arithmetic-logic unit are available, attempting 65-way superscalar operation will suggest itself, and fully employing this would require the ability to decode instructions from the computer's full instruction set in parallel. Nonetheless, there is an advantage in keeping each control unit simple for this form of operation, so that it can be provided on less ambitious implementations.

