
Alternate Mode, Semi-RISC Mode, and Full Opcode Alternate Mode

Alternate Mode is an attempt to take the concepts developed in Universal Mode and further refine the tradeoffs made in the various alternate modes offered, so as to produce a single mode that can replace many of them.

As in vector register mode, and unlike the scratchpad modes, this mode does not use the scratchpad pointer registers to indicate areas of 64 elements. Instead, it uses additional register banks: one set of 64 supplementary accumulator/index registers and one set of 64 supplementary floating-point registers.

In addition, to approach more closely the capabilities of Cray computers, two sets of eight long vector registers are provided. Each register in the first set holds sixty-four 64-bit fixed-point elements, and each register in the second set holds sixty-four 128-bit floating-point elements.

This resembles the complement of registers provided by the earliest Cray computers; later Cray computers had considerably more, and even larger, vector registers. If enough register space is available in an implementation of this architecture, this too can be approached, by a set of sixty-four integer long vector registers and a set of sixty-four floating-point long vector registers, which together form the long vector register scratchpad. (Upon re-examining the specifications for these computers, it appears I may have misread them: while they did increase the number of elements in a vector register from 64 to 128, and perhaps the number of vector registers from 8 to 16, they did not have a second set of 64 vector registers as I had mistakenly thought.)
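
As a rough check on the amount of storage involved, the sketch below computes the bytes needed by the basic long vector complement and by the full long vector register scratchpad. It assumes that each element occupies exactly its nominal width with no tag or padding bits; the text does not say how the registers are physically stored.

```python
# Storage estimate for the long vector registers.
# Assumption: each element occupies exactly its nominal width (no tags or padding).

ELEMENTS_PER_LONG_VECTOR = 64

def long_vector_bytes(num_registers: int, element_bits: int) -> int:
    """Bytes needed for num_registers long vectors of 64 elements each."""
    return num_registers * ELEMENTS_PER_LONG_VECTOR * element_bits // 8

# Basic complement: eight 64-bit fixed-point and eight 128-bit floating-point long vectors.
basic_fixed = long_vector_bytes(8, 64)     # 4,096 bytes
basic_float = long_vector_bytes(8, 128)    # 8,192 bytes

# Full long vector register scratchpad: sixty-four registers of each kind.
pad_fixed = long_vector_bytes(64, 64)      # 32,768 bytes
pad_float = long_vector_bytes(64, 128)     # 65,536 bytes

print(f"basic complement: {basic_fixed + basic_float} bytes")   # 12288
print(f"scratchpad:       {pad_fixed + pad_float} bytes")       # 98304
```

Under this assumption, the full scratchpad needs eight times the storage of the basic complement.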

The additional register banks used in this mode should be regarded as an optional feature of the architecture. Since they represent a large additional increment in the complexity of an implementation, and they are only applicable to some types of problems, there will be applications for which including these register banks would be wasteful.

The complement of registers provided in a full implementation of this mode is illustrated in the diagram below:

Because of the power and complexity of this mode, its description is split among several succeeding pages.

While a short vector is composed of four double-precision floating-point numbers, eight single-precision floating-point numbers, and so on, a long vector always has 64 elements, regardless of the type of its elements.
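
The contrast between the two kinds of vector can be made concrete. The sketch below assumes a short vector is 256 bits wide, which is inferred from its holding four double-precision or eight single-precision numbers. Because a long vector always has 64 elements, its size in bytes grows with the element width.

```python
# Short vectors have a fixed width in bits; long vectors have a fixed element count.
SHORT_VECTOR_BITS = 256      # assumption, inferred from "four double-precision or eight single-precision"
LONG_VECTOR_ELEMENTS = 64    # always 64, whatever the element type

def short_vector_elements(element_bits: int) -> int:
    """Elements in one short vector: fewer as the elements get wider."""
    return SHORT_VECTOR_BITS // element_bits

def long_vector_footprint(element_bits: int) -> int:
    """Bytes occupied by one long vector: more as the elements get wider."""
    return LONG_VECTOR_ELEMENTS * element_bits // 8

for bits in (8, 16, 32, 64, 128):
    print(f"{bits:3}-bit elements: short vector holds {short_vector_elements(bits):2}, "
          f"long vector occupies {long_vector_footprint(bits):4} bytes")
```

A long vector's footprint thus ranges from 64 bytes for byte elements to 1,024 bytes for 128-bit elements, which is presumably the source of the complications in the memory path mentioned in the next paragraph.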

This complicates the path from memory to the arithmetic units if these operations are implemented by 64 arithmetic units operating in parallel, as they may be in a high-performance implementation of this architecture (a method of dealing with these complications is shown on the next page). However, it also means that these vector operations can be implemented simply by pipelining a single arithmetic unit at high speed, as was done on many of the early vector architectures.
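
The tradeoff between the two implementation strategies can be illustrated with an idealized timing model. The assumptions here are for illustration only: every unit has the same pipeline depth, each unit accepts one new element per cycle, and memory never stalls. Both strategies produce the same results; only the cycle on which the last result appears differs.

```python
def vector_op_cycles(depth: int, units: int, n: int = 64) -> int:
    """Idealized cycle on which the last of n results appears.

    Assumes `units` identical pipelined units, each accepting one new element
    per cycle, a pipeline `depth` cycles deep, and a memory that never stalls.
    """
    issue_cycles = -(-n // units)   # ceiling division: cycles needed to issue all n elements
    return issue_cycles + depth - 1

# A six-stage pipeline, chosen arbitrarily as an example:
print(vector_op_cycles(depth=6, units=1))    # 69: one pipelined unit
print(vector_op_cycles(depth=6, units=64))   # 6: sixty-four units in parallel
```

In this model, the single pipelined unit needs one sixty-fourth of the arithmetic hardware and finishes a 64-element operation in 69 cycles rather than 6.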

A Further Consequence

The potential presence of 64 arithmetic-logic units in an ultimate-performance implementation of the architecture suggests that a further increase in computing power might be obtained if each arithmetic-logic unit had a simple control unit associated with it, creating 64 separate computers. Some portion of the chip's internal cache memory could be parcelled out to each of these processors; if eight megabytes of cache memory were provided, for example, half of it would give each processor 65,536 bytes of memory.
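
The figure of 65,536 bytes follows if "eight megabytes" is read as 8 × 2^20 bytes. The arithmetic is shown below; the function name and parameters are illustrative only.

```python
def per_processor_bytes(cache_bytes: int, fraction_denominator: int = 2,
                        processors: int = 64) -> int:
    """Bytes each simple processor receives when 1/fraction_denominator
    of the cache is divided evenly among the processors."""
    return cache_bytes // fraction_denominator // processors

EIGHT_MEGABYTES = 8 * 1024 * 1024                # read as 8 x 2**20 bytes
print(per_processor_bytes(EIGHT_MEGABYTES))      # 65536
```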

How this capability is used is described in some of the sections that follow.

It may be noted, however, that if 64 arithmetic-logic units essentially identical to the main arithmetic-logic unit are available, 65-way superscalar operation (the main unit plus the other 64) will suggest itself, and exploiting it fully would require decoding instructions from the computer's full instruction set in parallel. Nonetheless, there is an advantage in keeping each control unit simple for this form of operation, so that it can be provided on less ambitious implementations.

The instruction formats are divided among several diagrams, because many formats are needed for memory-reference and register-to-register instructions to handle all the types of operation this mode is intended to support.

This diagram shows the most basic instruction formats for the standard memory-reference instructions and their related register-to-register instructions:

In most of the formats, the opcode corresponds to the last seven bits of an opcode for full opcode mode, and is thus a seven-bit opcode in the same format as used with universal mode.

In the scratchpad format instructions, however, a five-bit opcode is used. As a result, these instructions work with only three data types: 32-bit fixed-point quantities, and 32-bit and 64-bit floating-point quantities. These are the data types most commonly used in FORTRAN programs.
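
In the opcode table further down, the leading bits of each five-bit code select the data type. Codes beginning with 0 operate on 32-bit fixed-point values, codes beginning with 10 on 32-bit floating-point values (the "Floating" rows), and codes beginning with 11 on 64-bit floating-point values (the "Double" rows). The small decoder below reflects that reading of the table; it is a sketch derived from the table, not a rule stated in the text.

```python
def scratchpad_operand_type(op5: int) -> str:
    """Data type of a register scratchpad instruction, from its five-bit opcode.

    Derived from the opcode table: 0xxxx codes are 32-bit fixed point,
    10xxx are 32-bit floating point ("Floating"), 11xxx are 64-bit floating point ("Double").
    """
    if not 0 <= op5 < 32:
        raise ValueError("five-bit opcode expected")
    if op5 >> 4 == 0:
        return "fixed32"
    return "float32" if op5 >> 3 == 0b10 else "float64"

print(scratchpad_operand_type(0b00010))   # L  (Load):          fixed32
print(scratchpad_operand_type(0b10010))   # LF (Load Floating): float32
print(scratchpad_operand_type(0b11010))   # LD (Load Double):   float64
```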

A modification of this mode produces semi-RISC mode:

In this mode, only arithmetic/index and floating-point registers 0 to 3 are used in the register scratchpad instructions, so that indexed instructions using the normal base registers become possible. These indexed instructions are limited to load and store instructions, which is what earns this mode its name, even though its instruction set is not particularly "reduced".

Also, in this mode, the seven-bit opcodes are the same as those in normal mode, rather than those shown below, except that the unnormalized floating-point operations are omitted. This frees additional opcode space for use in multi-way vector operations.

Another modification of this mode produces full opcode alternate mode, in which the standard memory-reference instructions are extended, as in full opcode mode, to include simple floating, register packed, and register compressed operations. Here, the addressing formats look like this:

In this mode, the vector register operations are modified to reflect the changed form of the opcode field and the bits preceding it. However, register packed and register compressed instructions, if supported, may not run at speeds comparable to other vector register instructions, since implementations are not expected to provide more than one packed decimal arithmetic unit. The same may apply to the decimal exponent modes, but the multiple integer ALUs that may be provided to accelerate vector register operations will support simple floating operation.

The eight-bit opcodes are the same as those for full opcode mode, and can be found in the section for that mode.

Two operate instructions are also shown in this chart, the population count instruction (SEBI: Separate Bits) and the short form of the shift instruction, because they lie outside the standard space for operate instructions, which remains largely the same across instruction modes.
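
As a reference model for what a population count computes, the sketch below counts the 1 bits in a register-sized value. The 32-bit default width is an assumption, and any further behaviour implied by the SEBI mnemonic is not described here.

```python
def population_count(value: int, width: int = 32) -> int:
    """Number of 1 bits in the low `width` bits of value.

    Negative values are taken in two's complement, as a register would hold them.
    The 32-bit default width is an assumption.
    """
    mask = (1 << width) - 1
    return bin(value & mask).count("1")

print(population_count(0b1011))      # 3
print(population_count(-1))          # 32: every bit of a 32-bit register is set
print(population_count(-1, 64))      # 64
```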

As in universal mode, opcode space is economized by using two three-bit register fields in standard instructions instead of three. Two fields suffice for a register-to-register instruction and for a memory-reference instruction that is not indexed; these two types of instruction must now be distinguished by prefix bits.

When the base register field in a memory-reference instruction is zero, the instruction belongs to an additional addressing mode. If the first bit of the halfword following the first halfword of the instruction is zero, the indexed addressing format is used; if this bit is one, the additional addressing modes described on the following pages apply.

Of the remaining 31 bits of these two halfwords, three form an index register field, which may be zero for compatible non-indexed instructions, and three form a base register field, which refers to one of the eight scratchpad registers rather than one of the eight base/address registers. The remaining 25 bits are the displacement.
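
The field widths above account for all 32 bits: 1 + 3 + 3 + 25. The decoder below splits such a word. The field order follows the order in which the text lists the fields, but the exact bit positions, and the treatment of the displacement as unsigned, are assumptions.

```python
def decode_address_extension(word: int) -> dict:
    """Split the 32 bits following the first halfword of a memory-reference
    instruction whose base register field is zero.

    Assumed layout (field order from the text, bit positions hypothetical):
      bit 31     : 0 = indexed form, 1 = additional addressing modes
      bits 30-28 : index register (0 = not indexed)
      bits 27-25 : base register, naming one of the eight scratchpad registers
      bits 24-0  : displacement (treated here as unsigned)
    """
    if not 0 <= word < 1 << 32:
        raise ValueError("32-bit value expected")
    if word >> 31:
        return {"form": "additional"}   # these modes are described on later pages
    return {
        "form": "indexed",
        "index": (word >> 28) & 0b111,
        "base": (word >> 25) & 0b111,
        "displacement": word & ((1 << 25) - 1),
    }

print(decode_address_extension((3 << 28) | (5 << 25) | 0x1234))
```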

In the scratchpad instructions, the source register is one of the 64 supplementary registers of the appropriate type, fixed or floating, while the destination register is either one of the eight arithmetic/index registers or one of the eight floating-point registers. The large number of supplementary registers allows many short routines to avoid using memory and to avoid needing instructions longer than 16 bits.
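
The field widths involved show why these instructions fit in a halfword. A five-bit opcode, a six-bit field selecting one of the 64 supplementary registers, and a three-bit field selecting one of the eight destination registers total 14 bits, leaving two bits for whatever prefix identifies the format. The field order and the two-bit prefix in the sketch below are hypothetical.

```python
OPCODE_BITS, DEST_BITS, SOURCE_BITS = 5, 3, 6                  # 32 opcodes, 8 destinations, 64 sources
PREFIX_BITS = 16 - (OPCODE_BITS + DEST_BITS + SOURCE_BITS)     # bits left over in the halfword

def pack_scratchpad(prefix: int, opcode: int, dest: int, source: int) -> int:
    """Pack a hypothetical halfword: prefix | opcode | dest | source (most significant first)."""
    word = 0
    for value, bits in ((prefix, PREFIX_BITS), (opcode, OPCODE_BITS),
                        (dest, DEST_BITS), (source, SOURCE_BITS)):
        if not 0 <= value < 1 << bits:
            raise ValueError(f"{value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word

def unpack_scratchpad(word: int) -> tuple:
    """Inverse of pack_scratchpad: returns (prefix, opcode, dest, source)."""
    source = word & 0o77
    dest = (word >> SOURCE_BITS) & 0o7
    opcode = (word >> (SOURCE_BITS + DEST_BITS)) & 0o37
    prefix = word >> (SOURCE_BITS + DEST_BITS + OPCODE_BITS)
    return prefix, opcode, dest, source

print(PREFIX_BITS)   # 2
```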

The available memory-reference instructions are shown in the table below. The opcodes are shown in three columns. The first gives, in binary, the four-bit opcodes for the indexed load and store instructions in semi-RISC mode. The second gives, in binary, the five-bit opcodes for the register scratchpad instructions. The third gives, in octal, the first halfword of an instruction containing a seven-bit opcode. The second octal digit of that halfword, which is either 0 or 1 in this table, actually has its first two bits determined by the type of the instruction.
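
Written as six octal digits, a 16-bit halfword has a single bit in its leading digit and three bits in each of the other five. The sketch below assumes the layout this implies for the third column: the leading bit (0 throughout this table), the two type bits at the top of the second digit, the seven-bit opcode, and the two three-bit register fields shown as xx. The type-bit values and the register field names are hypothetical.

```python
def encode_first_halfword(opcode7: int, reg_a: int, reg_b: int, type_bits: int = 0) -> int:
    """Assemble a first halfword laid out as 0 | type(2) | opcode(7) | reg_a(3) | reg_b(3)."""
    if not (0 <= type_bits < 4 and 0 <= opcode7 < 0o200 and 0 <= reg_a < 8 and 0 <= reg_b < 8):
        raise ValueError("field out of range")
    return (type_bits << 13) | (opcode7 << 6) | (reg_a << 3) | reg_b

def decode_first_halfword(halfword: int) -> dict:
    """Split a first halfword back into the fields assumed above."""
    return {
        "type_bits": (halfword >> 13) & 0b11,
        "opcode7": (halfword >> 6) & 0o177,
        "reg_a": (halfword >> 3) & 0o7,
        "reg_b": halfword & 0o7,
    }

# Load (L) appears in the table as 0042xx; with both register fields zero:
print(f"{encode_first_halfword(0o042, 0, 0):06o}")                 # 004200
# A nonzero type value changes only the top two bits of the second octal digit:
print(f"{encode_first_halfword(0o122, 1, 2, type_bits=1):06o}")    # 032212
```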

           0000xx  SWB    Swap Byte
           0001xx  CB     Compare Byte
0000       0002xx  LB     Load Byte
0001       0003xx  STB    Store Byte
           0004xx  AB     Add Byte
           0005xx  SB     Subtract Byte

           0010xx  IB     Insert Byte
           0011xx  UCB    Unsigned Compare Byte
           0012xx  ULB    Unsigned Load Byte
           0013xx  XB     XOR Byte
           0014xx  NB     AND Byte
           0015xx  OB     OR Byte
           0016xx  STGB   Store if Greater Byte

           0020xx  SWH    Swap Halfword
           0021xx  CH     Compare Halfword
0010       0022xx  LH     Load Halfword
0011       0023xx  STH    Store Halfword
           0024xx  AH     Add Halfword
           0025xx  SH     Subtract Halfword
           0026xx  MH     Multiply Halfword
           0027xx  DH     Divide Halfword

           0030xx  IH     Insert Halfword
           0031xx  UCH    Unsigned Compare Halfword
           0032xx  ULH    Unsigned Load Halfword
           0033xx  XH     XOR Halfword
           0034xx  NH     AND Halfword
           0035xx  OH     OR Halfword
           0036xx  MEH    Multiply Extensibly Halfword
           0037xx  DEH    Divide Extensibly Halfword

     00000 0040xx  SW     Swap
     00001 0041xx  C      Compare
0100 00010 0042xx  L      Load
0101 00011 0043xx  ST     Store
     00100 0044xx  A      Add
     00101 0045xx  S      Subtract
     00110 0046xx  M      Multiply
     00111 0047xx  D      Divide

     01001 0051xx  UC     Unsigned Compare

     01011 0053xx  X      XOR
     01100 0054xx  N      AND
     01101 0055xx  O      OR
     01110 0056xx  ME     Multiply Extensibly
     01111 0057xx  DE     Divide Extensibly

           0060xx  SWL    Swap Long
           0061xx  CL     Compare Long
0110       0062xx  LL     Load Long
0111       0063xx  STL    Store Long
           0064xx  AL     Add Long
           0065xx  SL     Subtract Long
           0066xx  ML     Multiply Long
           0067xx  DL     Divide Long

           0071xx  UCL    Unsigned Compare Long

           0073xx  XL     XOR Long
           0074xx  NL     AND Long
           0075xx  OL     OR Long
           0076xx  MEL    Multiply Extensibly Long
           0077xx  DEL    Divide Extensibly Long

           0100xx  SWM    Swap Medium
           0101xx  CM     Compare Medium
1000       0102xx  LM     Load Medium
1001       0103xx  STM    Store Medium
           0104xx  AM     Add Medium
           0105xx  SM     Subtract Medium
           0106xx  MM     Multiply Medium
           0107xx  DM     Divide Medium

           0110xx  MEUM   Multiply Extensibly Unnormalized Medium
           0111xx  DEUM   Divide Extensibly Unnormalized Medium
           0112xx  LUM    Load Unnormalized Medium
           0113xx  STUM   Store Unnormalized Medium
           0114xx  AUM    Add Unnormalized Medium
           0115xx  SUM    Subtract Unnormalized Medium
           0116xx  MUM    Multiply Unnormalized Medium
           0117xx  DUM    Divide Unnormalized Medium

     10000 0120xx  SWF    Swap Floating
     10001 0121xx  CF     Compare Floating
1010 10010 0122xx  LF     Load Floating
1011 10011 0123xx  STF    Store Floating
     10100 0124xx  AF     Add Floating
     10101 0125xx  SF     Subtract Floating
     10110 0126xx  MF     Multiply Floating
     10111 0127xx  DF     Divide Floating

           0130xx  MEU    Multiply Extensibly Unnormalized
           0131xx  DEU    Divide Extensibly Unnormalized
           0132xx  LU     Load Unnormalized
           0133xx  STU    Store Unnormalized
           0134xx  AU     Add Unnormalized
           0135xx  SU     Subtract Unnormalized
           0136xx  MU     Multiply Unnormalized
           0137xx  DU     Divide Unnormalized

     11000 0140xx  SWD    Swap Double
     11001 0141xx  CD     Compare Double
1100 11010 0142xx  LD     Load Double
1101 11011 0143xx  STD    Store Double
     11100 0144xx  AD     Add Double
     11101 0145xx  SD     Subtract Double
     11110 0146xx  MD     Multiply Double
     11111 0147xx  DD     Divide Double

           0150xx  MEUD   Multiply Extensibly Unnormalized Double
           0151xx  DEUD   Divide Extensibly Unnormalized Double
           0152xx  LUD    Load Unnormalized Double
           0153xx  STUD   Store Unnormalized Double
           0154xx  AUD    Add Unnormalized Double
           0155xx  SUD    Subtract Unnormalized Double
           0156xx  MUD    Multiply Unnormalized Double
           0157xx  DUD    Divide Unnormalized Double

           0160xx  SWQ    Swap Quad
           0161xx  CQ     Compare Quad
1110       0162xx  LQ     Load Quad
1111       0163xx  STQ    Store Quad
           0164xx  AQ     Add Quad
           0165xx  SQ     Subtract Quad
           0166xx  MQ     Multiply Quad
           0167xx  DQ     Divide Quad

           0170xx  MEUQ   Multiply Extensibly Unnormalized Quad
           0171xx  DEUQ   Divide Extensibly Unnormalized Quad
           0172xx  LUQ    Load Unnormalized Quad
           0173xx  STUQ   Store Unnormalized Quad
           0174xx  AUQ    Add Unnormalized Quad
           0175xx  SUQ    Subtract Unnormalized Quad
           0176xx  MUQ    Multiply Unnormalized Quad
           0177xx  DUQ    Divide Unnormalized Quad
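
The three columns are related systematically. Comparing rows, each four-bit semi-RISC code abcd corresponds to the seven-bit opcode abc001d (the Load and Store of each of the eight operand sizes). Each five-bit scratchpad code maps onto the 040 to 057, 120 to 127, or 140 to 147 rows, depending on its leading bits. The functions below encode this observation, which is derived from the table rather than stated in the text, and check it against several rows.

```python
def semi_risc_to_op7(op4: int) -> int:
    """Four-bit indexed load/store code abcd -> seven-bit opcode abc001d."""
    return ((op4 >> 1) << 4) | 0b0010 | (op4 & 1)

def scratchpad_to_op7(op5: int) -> int:
    """Five-bit scratchpad code -> seven-bit opcode, following the table."""
    if op5 < 0b10000:                   # 0xxxx: 32-bit fixed point, rows 040-057
        return 0o040 + op5
    if op5 < 0b11000:                   # 10xxx: Floating, rows 120-127
        return 0o120 + (op5 - 0b10000)
    return 0o140 + (op5 - 0b11000)      # 11xxx: Double, rows 140-147

# Rows from the table: mnemonic -> (four-bit code or None, five-bit code or None, seven-bit opcode)
ROWS = {
    "LB":  (0b0000, None,    0o002),
    "STH": (0b0011, None,    0o023),
    "STQ": (0b1111, None,    0o163),
    "L":   (0b0100, 0b00010, 0o042),
    "UC":  (None,   0b01001, 0o051),
    "LF":  (0b1010, 0b10010, 0o122),
    "STD": (0b1101, 0b11011, 0o143),
    "DD":  (None,   0b11111, 0o147),
}

for name, (op4, op5, op7) in ROWS.items():
    if op4 is not None:
        assert semi_risc_to_op7(op4) == op7, name
    if op5 is not None:
        assert scratchpad_to_op7(op5) == op7, name
print("all rows consistent")
```

The five-bit codes 01000 and 01010 map to 050 and 052, which have no instruction assigned in the table.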

The operate instructions have the same format as in normal mode. If an index register field is present, it can be used, just as it is available for the memory-reference instructions with five-bit opcodes. As before, when the base register field is zero, the sixteen-bit address field is replaced by a 32-bit field containing an index and base register specification; in that case, the index register field in the instruction itself must be zero.

