
The Short Vector Instructions

The short vector instructions operate on vectors of a fixed length, with a set of vector registers of that length available. These instructions resemble the MMX feature, or the later Streaming SIMD Extensions feature, of Intel microprocessors, or the AltiVec feature of Motorola microprocessors, by operating on a relatively long word that can be split into multiple smaller segments.

The Lincoln Laboratory TX-2 computer, built from discrete transistors, could perform arithmetic on a 36-bit word, or on its two halves or four quarters simultaneously. Earlier still, the AN/FSQ-7 computer designed by IBM for SAGE operated on pairs of 16-bit numbers at a time. So there are some precedents for operations on vectors of small to medium size. (Also, both the VIS instructions of the Sun SPARC and the MAX instructions of Hewlett-Packard's PA-RISC preceded MMX, though by a much shorter interval.)

These instructions, in normal mode, have the formats:

and their opcodes are:

140xxx x00xxx     SWBSV   Swap Byte Short Vector

140xxx x02xxx     LBSV    Load Byte Short Vector
140xxx x03xxx     STBSV   Store Byte Short Vector
140xxx x04xxx     ABSV    Add Byte Short Vector
140xxx x05xxx     SBSV    Subtract Byte Short Vector

140xxx x10xxx     SMBPB   Set Mask Bit if Positive Byte
140xxx x11xxx     SMBZB   Set Mask Bit if Zero Byte
140xxx x12xxx     SMBNB   Set Mask Bit if Negative Byte

140xxx x13xxx     XBSV    XOR Byte Short Vector
140xxx x14xxx     NBSV    AND Byte Short Vector
140xxx x15xxx     OBSV    OR Byte Short Vector

140xxx 0362xx     LSVM    Load Short Vector Multiple
140xxx 0363xx     STSVM   Store Short Vector Multiple

141xxx x00xxx     SWHSV   Swap Halfword Short Vector

141xxx x02xxx     LHSV    Load Halfword Short Vector
141xxx x03xxx     STHSV   Store Halfword Short Vector
141xxx x04xxx     AHSV    Add Halfword Short Vector
141xxx x05xxx     SHSV    Subtract Halfword Short Vector
141xxx x06xxx     MHSV    Multiply Halfword Short Vector
141xxx x07xxx     DHSV    Divide Halfword Short Vector

141xxx x10xxx     SMBPH   Set Mask Bit if Positive Halfword
141xxx x11xxx     SMBZH   Set Mask Bit if Zero Halfword
141xxx x12xxx     SMBNH   Set Mask Bit if Negative Halfword
141xxx x13xxx     XHSV    XOR Halfword Short Vector
141xxx x14xxx     NHSV    AND Halfword Short Vector
141xxx x15xxx     OHSV    OR Halfword Short Vector

142xxx x00xxx     SWSV    Swap Short Vector

142xxx x02xxx     LSV     Load Short Vector
142xxx x03xxx     STSV    Store Short Vector
142xxx x04xxx     ASV     Add Short Vector
142xxx x05xxx     SSV     Subtract Short Vector
142xxx x06xxx     MSV     Multiply Short Vector
142xxx x07xxx     DSV     Divide Short Vector

142xxx x10xxx     SMBP    Set Mask Bit if Positive
142xxx x11xxx     SMBZ    Set Mask Bit if Zero
142xxx x12xxx     SMBN    Set Mask Bit if Negative
142xxx x13xxx     XSV     XOR Short Vector
142xxx x14xxx     NSV     AND Short Vector
142xxx x15xxx     OSV     OR Short Vector

143xxx x00xxx     SWLSV   Swap Long Short Vector

143xxx x02xxx     LLSV    Load Long Short Vector
143xxx x03xxx     STLSV   Store Long Short Vector
143xxx x04xxx     ALSV    Add Long Short Vector
143xxx x05xxx     SLSV    Subtract Long Short Vector
143xxx x06xxx     MLSV    Multiply Long Short Vector
143xxx x07xxx     DLSV    Divide Long Short Vector

143xxx x10xxx     SMBPL   Set Mask Bit if Positive Long
143xxx x11xxx     SMBZL   Set Mask Bit if Zero Long
143xxx x12xxx     SMBNL   Set Mask Bit if Negative Long
143xxx x13xxx     XLSV    XOR Long Short Vector
143xxx x14xxx     NLSV    AND Long Short Vector
143xxx x15xxx     OLSV    OR Long Short Vector

144xxx x02xxx     LSMSV   Load Small Short Vector
144xxx x03xxx     STSMSV  Store Small Short Vector
144xxx x04xxx     ASMSV   Add Small Short Vector
144xxx x05xxx     SSMSV   Subtract Small Short Vector
144xxx x06xxx     MSMSV   Multiply Small Short Vector
144xxx x07xxx     DSMSV   Divide Small Short Vector

144xxx x10xxx     SMBPSM  Set Mask Bit if Positive Small
144xxx x11xxx     SMBZSM  Set Mask Bit if Zero Small
144xxx x12xxx     SMBNSM  Set Mask Bit if Negative Small

144xxx 016200     SHSMHSV Shuffle Small/Halfword Short Vector

145xxx x02xxx     LFSV    Load Floating Short Vector
145xxx x03xxx     STFSV   Store Floating Short Vector
145xxx x04xxx     AFSV    Add Floating Short Vector
145xxx x05xxx     SFSV    Subtract Floating Short Vector
145xxx x06xxx     MFSV    Multiply Floating Short Vector
145xxx x07xxx     DFSV    Divide Floating Short Vector

145xxx x10xxx     SMBPF   Set Mask Bit if Positive Floating
145xxx x11xxx     SMBZF   Set Mask Bit if Zero Floating
145xxx x12xxx     SMBNF   Set Mask Bit if Negative Floating

145xxx 016200     SHFWSV  Shuffle Floating/Word Short Vector

146xxx x02xxx     LDSV    Load Double Short Vector
146xxx x03xxx     STDSV   Store Double Short Vector
146xxx x04xxx     ADSV    Add Double Short Vector
146xxx x05xxx     SDSV    Subtract Double Short Vector
146xxx x06xxx     MDSV    Multiply Double Short Vector
146xxx x07xxx     DDSV    Divide Double Short Vector

146xxx x10xxx     SMBPD   Set Mask Bit if Positive Double
146xxx x11xxx     SMBZD   Set Mask Bit if Zero Double
146xxx x12xxx     SMBND   Set Mask Bit if Negative Double

146xxx 016200     SHDLSV  Shuffle Double/Long Short Vector

147xxx x00xxx     SWQSV   Swap Quad Short Vector

147xxx x02xxx     LQSV    Load Quad Short Vector
147xxx x03xxx     STQSV   Store Quad Short Vector
147xxx x04xxx     AQSV    Add Quad Short Vector
147xxx x05xxx     SQSV    Subtract Quad Short Vector
147xxx x06xxx     MQSV    Multiply Quad Short Vector
147xxx x07xxx     DQSV    Divide Quad Short Vector

147xxx x10xxx     SMBPQ   Set Mask Bit if Positive Quad
147xxx x11xxx     SMBZQ   Set Mask Bit if Zero Quad
147xxx x12xxx     SMBNQ   Set Mask Bit if Negative Quad

147xxx 016200     SHQSV   Shuffle Quad Short Vector

These registers are 256 bits long, and are fully packed with data. Therefore, register-to-register floating-point operations of this type do not provide any guard bits that are retained between operations. This is unlike the normal floating-point registers, which retain floating-point numbers in an internal form (described in the section on the basic aspects of this architecture) that includes some additional bits of precision when values are of a type which does not fill the register.

This is not necessarily a bad thing, as it does lead to more consistent results. A limited number of guard bits, following normal ALU design practice, are used when carrying out the calculations themselves, and conversions similar to those made to internal floating-point formats would also be used during calculations to simplify ALU operation. Thus, the short vector ALU would, in many cases, work with numbers in the same format as the internal format used in the regular floating-point registers, but four bits shorter; the conversions to and from the external representation, however, take place with every arithmetic operation. Similarly, the regular ALUs would also, when performing calculations, use four guard bits internally in addition to those maintained in the regular floating-point registers. Therefore, the short vector ALUs would usually be eight bits narrower, not four bits narrower, than the regular ALUs for numbers of the same precision.

Quad precision floating-point numbers, which occupy 128 bits in the register in both cases, would usually be an exception to this. That exception does not extend to the compatible floating-point format: its 128-bit formats have an eight-bit redundant exponent field that is eliminated internally, which leaves room for eight additional in-register guard bits, unlike the other 128-bit floating-point format.

These guard bits are not to be confused with the guard bit from the set of guard, round, and sticky bits used during the course of a single calculation to ensure an accurate result. While these are not available with the simple floating type, they are available for use with short vector operations.

This distinction may, perhaps, be clarified by means of the following table:

Floating-point Type                         Guard, Round, and    Additional
                                            Sticky Bits          Guard Bits

Floating-Point Shorter than 128 bits
  in Regular Floating-Point Registers       Yes                  Yes
128-bit Floating-Point                      Yes                  No
Floating-Point in Short Vector Registers    Yes                  No
Simple Floating Type                        No                   No

If the bit marked M in the instruction is set, the bits of the accumulator/index register indicated by mR indicate which of the elements of the vector are operated on by the instruction. The short vector registers are each 256 bits in length; they can contain anything from two 128-bit quad precision floating-point numbers to thirty-two 8-bit bytes. For cases other than byte operations, the mask bits used are the contiguous least significant bits of the register selected.
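
As a rough illustration of how a mask selects elements, the following Python sketch models a 256-bit register as a list of integer elements. The function names, the pairing of mask bit i with element i, the choice to leave unselected destination elements unchanged, and the wrap-around integer sum are all assumptions made for the example; the text above does not specify them.

```python
VECTOR_BITS = 256  # length of a short vector register, from the text


def element_count(element_bits):
    """Number of elements of the given width in one short vector register."""
    if VECTOR_BITS % element_bits != 0:
        raise ValueError("element width does not evenly divide the register")
    return VECTOR_BITS // element_bits


def masked_add(dest, src, mask, element_bits):
    """Add src to dest, element by element, where the mask bit is set.

    Assumptions (not from the text): mask bit i selects element i,
    unselected elements of dest are left unchanged, and sums wrap around.
    """
    n = element_count(element_bits)
    wrap = (1 << element_bits) - 1
    return [
        (d + s) & wrap if (mask >> i) & 1 else d
        for i, (d, s) in enumerate(zip(dest[:n], src[:n]))
    ]
```

With 32-bit elements there are eight elements per register, so only the eight least significant mask bits matter; with bytes, all thirty-two are needed. A 48-bit element width is rejected because it does not evenly divide 256.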

The mode field in the instruction indicates the addressing mode. Its values are:

00 register-register
01 register-memory
10 memory-register

Thus, in mode 00, sX and sB are not used; in mode 01, sR is not used; in mode 10, dR is not used, and sX and sB are used with a memory operand that is actually the destination rather than the source for the instruction. Also, in mode 00, the Address field shown in the illustration is not present.

Also, if the four-bit operate code is 1110, then:

Two additional instructions are defined with type 000 and mode 01:

140xxx 0362xx  LSVM    Load Short Vector Multiple
140xxx 0363xx  STSVM   Store Short Vector Multiple

These instructions allow a range of the short vector registers to be saved or loaded for purposes of context switching.

As well, four additional instructions are defined with types 100, 101, 110, and 111, and mode 00:

144xxx 016200  SHSMHSV Shuffle Small/Halfword Short Vector
145xxx 016200  SHFWSV  Shuffle Floating/Word Short Vector
146xxx 016200  SHDLSV  Shuffle Double/Long Short Vector
147xxx 016200  SHQSV   Shuffle Quad Short Vector

These instructions have a format similar to the short vector multiple-register instructions, but sX and sB are not used, and the fields shown as dRl and dRh for that format instead serve again as dR and sR respectively.

All of these instructions require that the source and the destination each be an even-numbered short vector register, as they move data from the source register and the one following it to the destination register and the one following it.

The SHQSV instruction takes the four 128-bit quad-precision floating-point numbers in the source, and places them in the destination in the order:

0 2 1 3

The SHDLSV instruction divides the source into eight 64-bit blocks, and places them in the destination in the order:

0 4 1 5 2 6 3 7

The SHFWSV instruction divides the source into sixteen 32-bit blocks, and places them in the destination in the order:

 0  8  1  9  2 10  3 11  4 12  5 13  6 14  7 15

The SHSMHSV instruction divides the source into thirty-two halfwords, and places them in the destination in the order:

 0 16  1 17  2 18  3 19  4 20  5 21  6 22  7 23
 8 24  9 25 10 26 11 27 12 28 13 29 14 30 15 31

These instructions are intended to assist in performing Fast-Fourier Transform operations using the short vector registers when either the long vector registers are not available to a process, or they do not exist, or are implemented in a slow fashion (i.e. simulated in main memory) on a particular implementation of the architecture.

Fast-Fourier Transform operations may be performed on different operand types, and the short vector registers are structured differently from the long vector registers. For these reasons, the instructions provided with the short vector registers assist with the Stockham framework of the FFT, rather than with the Pease framework.

This is discussed more fully in the section on those instruction modes in which the long vector registers and scratchpad are used. One page within the section about the Vector Register Mode illustrates the FFT operations used for those modes, which involve the Pease framework, as well as the other possible frameworks.
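
The four shuffle orders listed earlier all follow one pattern: the blocks of the first half of the source alternate with the blocks of the second half. A minimal Python sketch of that pattern follows; the function name is hypothetical, and the numbers in the lists are read here as source block indices given in destination order, which is an assumption about the notation.

```python
def shuffle_blocks(blocks):
    """Interleave the first half of a sequence of blocks with its second half.

    `blocks` stands for the contents of a register pair, already split into
    equal-sized blocks; its length is assumed to be even.
    """
    half = len(blocks) // 2
    out = []
    for i in range(half):
        out.append(blocks[i])         # block i of the first register
        out.append(blocks[half + i])  # matching block of the second register
    return out
```

Calling `shuffle_blocks(list(range(8)))` reproduces the order given for SHDLSV, and the same function with 4, 16 or 32 blocks gives the orders for SHQSV, SHFWSV and SHSMHSV.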

The type field, in the first word of the instruction, can take eight values. Seven of them are types that are also used with conventional memory-reference instructions. The eighth of those types, the 48-bit Medium floating-point type, is not allowed, as that length does not evenly subdivide a 256-bit vector. Instead, it is replaced by the Small type, which provides 16-bit floating-point numbers for use in applications such as signal analysis.

Three formats are available for floating-point numbers with this length. Two bits of the Program Status Block control the format of Small floating-point numbers; these are independent of the nine-bit field in the Program Status Block that controls the format of floating-point numbers of other sizes.

Small With Gradual Underflow

The first possible format is modelled after the Standard floating-point format. It is also a standard format in its own right, as it is used by advanced graphics chips that perform 3-D acceleration on personal computers.

In this format, numbers consist of a sign bit, five exponent bits in excess-14 format, and ten mantissa bits, not including the first bit of the mantissa, which is a hidden 1 bit. For an all-zero exponent field, there is no longer a hidden one bit, but numbers can be unnormalized to allow gradual underflow.

In this format, some possible numeric values are:

Data Item            Numeric Value             Power of Two

0 11110 1111111111    65,504
0 11110 0000000000    32,768                     15
0 10000 0000000000         2                      1
0 01111 1000000000         1.5
0 01111 0000000000         1                      0
0 01110 0000000000          .5                   -1
0 00001 0000000000         6.10352 * 10^(-5)    -14
0 00000 1000000000         3.05176 * 10^(-5)    -15
0 00000 0100000000         1.52588 * 10^(-5)    -16
0 00000 0000000001         5.96046 * 10^(-8)    -24
0 00000 0000000000         0

The maximum possible exponent value, 11111, is reserved for infinities and NaN values exactly as in the Standard floating-point format.
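
The field layout just described can be checked against the Power of Two column with a short decoder. The following is only a sketch in Python; the function name is hypothetical, and the scaling constants are derived from the excess-14 convention above, in which the binary point precedes the hidden bit.

```python
def decode_small_gradual(word):
    """Decode a 16-bit Small value in the gradual-underflow format."""
    word &= 0xFFFF
    sign = -1.0 if word & 0x8000 else 1.0
    exponent = (word >> 10) & 0x1F   # five exponent bits
    mantissa = word & 0x3FF          # ten stored mantissa bits
    if exponent == 0x1F:
        # Reserved for infinities and NaN values.
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # No hidden bit: unnormalized values give gradual underflow.
        return sign * mantissa * 2.0 ** -24
    # Hidden bit restored: an 11-bit significand, scaled by the exponent.
    return sign * (1024 + mantissa) * 2.0 ** (exponent - 25)
```

For instance, `decode_small_gradual(0b0_01111_1000000000)` yields 1.5, and the smallest nonzero pattern yields 2 to the power -24.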

Small With Extremely Gradual Underflow

Numbers in the second of these formats have two exponent bits, and are encoded using extremely gradual underflow in a sophisticated manner which allows them to be compared using integer comparison instructions.

For a positive number, the fields in its representation are:

If the number is negative, 1 is used for the sign bit, and the remaining portion of the number still follows the same format, but with all of its bits inverted (that is, a one's complement is performed).

The exponent is taken as being an unsigned number, and the binary point of the mantissa as being before its first digit; thus, some example values in this encoding are shown below:

16-bit Small       Fields                Numeric             Power of
Data Item                                Value               Two
              
0111111111111111   0 1 11 111111111111   7.99902
0111000000000000   0 1 11 000000000000   4                          2
0110111111111111   0 1 10 111111111111   3.99951
0110000000000000   0 1 10 000000000000   2                          1
0101000000000000   0 1 01 000000000000   1                          0
0100000000000000   0 1 00 000000000000    .5                       -1
0011100000000000   0 01 11 00000000000    .25                      -2
0011000000000000   0 01 10 00000000000    .125                     -3
0010100000000000   0 01 01 00000000000    .0625                    -4
0010000000000000   0 01 00 00000000000    .03125                   -5
0001110000000000   0 001 11 0000000000    .015625                  -6
0000111000000000   0 0001 11 000000000   9.76562 * 10^(-4)        -10
0000011100000000   0 00001 11 00000000   6.10352 * 10^(-5)        -14
0000001110000000   0 000001 11 0000000   3.81470 * 10^(-6)        -18
0000000111000000   0 0000001 11 000000   2.38419 * 10^(-7)        -22
0000000011100000   0 00000001 11 00000   1.49012 * 10^(-8)        -26
0000000001110000   0 000000001 11 0000   9.31323 * 10^(-10)       -30
0000000000111000   0 0000000001 11 000   5.82077 * 10^(-11)       -34
0000000000011100   0 00000000001 11 00   3.63798 * 10^(-12)       -38
0000000000001110   0 000000000001 11 0   2.27374 * 10^(-13)       -42
0000000000000111   0 0000000000001 11    1.42109 * 10^(-14)       -46
0000000000000110   0 0000000000001 10    7.10543 * 10^(-15)       -47
0000000000000101   0 0000000000001 01    3.55271 * 10^(-15)       -48
0000000000000100   0 0000000000001 00    1.77636 * 10^(-15)       -49
0000000000000011   0 00000000000001 1    8.88178 * 10^(-16)       -50
0000000000000010   0 00000000000001 0    4.44089 * 10^(-16)       -51
0000000000000001   0 000000000000001     2.22045 * 10^(-16)       -52
0000000000000000   0 000000000000000     0  

Note that at the low end of the range, the exponent field shrinks from two bits to one and then zero bits. This produces a distribution of represented points similar to that provided by A-law audio encoding.
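
One way to read the table is sketched below in Python. The decoder is inferred from the Fields and Power of Two columns, not taken from a formal definition: after the sign bit comes a run of k zeros closed by a 1, then a two-bit exponent, then the mantissa with a hidden first bit, and each extra leading zero lowers the power of two by four. The function name and the treatment of the last few patterns as special cases are assumptions of this sketch.

```python
def decode_small_egu(word):
    """Decode a 16-bit Small value with extremely gradual underflow."""
    word &= 0xFFFF
    if word & 0x8000:
        # Negative values are the one's complement of the positive pattern.
        return -decode_small_egu(~word & 0xFFFF)
    if word == 0:
        return 0.0
    k = 15 - word.bit_length()       # zeros between the sign and the marker 1
    rest_bits = 14 - k               # bits left after the marker 1
    rest = word & ((1 << rest_bits) - 1)
    if rest_bits >= 2:
        mbits = rest_bits - 2        # mantissa shrinks as the prefix grows
        exponent = rest >> mbits
        mantissa = rest & ((1 << mbits) - 1)
        return 2.0 ** (exponent - 1 - 4 * k) * (1 + mantissa / (1 << mbits))
    if rest_bits == 1:
        return 2.0 ** (-51 + rest)   # one-bit exponent field
    return 2.0 ** -52                # no exponent field remains
```

Under this reading, sorting all 65,536 patterns as signed 16-bit integers also sorts the decoded values, which is the property that lets integer comparison instructions be used on this format.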

As early music-quality digital audio systems used 14-bit fixed-point samples instead of 16-bit ones, I envisaged this format as an alternative to fixed-point samples for uncompressed digital audio applications. However, there is a problem with using floating-point samples; since soft sounds are not always masked by loud sounds in different frequency ranges, the shifting noise floor of floating-point encoding can be distracting. One remedy would be to apply floating-point encoding to a transformed signal that has already been divided into critical bands: these are the narrow frequency ranges within which sounds do mask each other, and they are used as part of the compression algorithms for the Digital Compact Cassette (DCC) from Philips and the MiniDisc from Sony. It would also be appropriate to apply equalization, because low-frequency components of music will typically have much larger amplitudes than high-frequency components; the less rapid motion in low-frequency vibrations means that they have much less energy for a given amplitude than high-frequency vibrations.

Small With Hyper-Gradual Overflow and Underflow

The third possible format for 16-bit floating-point numbers attempts to provide a very wide exponent range. Numbers normally consist of a sign bit, three exponent bits in excess-4 format, and twelve mantissa bits, not including the first bit of the mantissa, which is a hidden 1 bit. This is the case when the exponent begins with 01 or 10. The size of the exponent is increased by two bits for every additional 0 or 1 that follows the initial 0 or 1 respectively, until the size of the mantissa field is reduced to a minimum of eight bits in length. The lowest possible value for the exponent field is thus an all-zeroes exponent field, which will be seven bits long. This receives the same special treatment as it does in the Standard floating-point format, moving the radix point one place and not having a hidden first one bit, to allow zero to be represented. Thus, only gradual underflow takes place for the most extreme small values, which are also the only values to have a precision of less than nine bits.

Some representative values in this format are:

0 1111111 11111111    2,093,056
0 1111111 00000000    1,048,576                    20
0 1110111 00000000        4,096                    12
0 1110000 00000000           32                     5
0 11011 0000000000           16                     4
0 11000 0000000000            2                     1
0 101 110000000000            1.75
0 101 100000000000            1.5
0 101 010000000000            1.25
0 101 000000000000            1                     0
0 100 000000000000             .5                  -1
0 011 000000000000             .25                 -2
0 010 000000000000             .125                -3
0 00111 0000000000             .0625               -4
0 00100 0000000000             .0078125            -7
0 0001111 00000000             .00390625           -8
0 0001000 00000000            3.05176 * 10^(-5)   -15
0 0000001 00000000            2.38419 * 10^(-7)   -22
0 0000000 10000000            1.19209 * 10^(-7)   -23
0 0000000 01000000            5.96046 * 10^(-8)   -24
0 0000000 00000001            9.31323 * 10^(-10)  -30
0 0000000 00000000            0
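
The variable-length exponent can be made concrete with a small decoder. This Python sketch is inferred from the representative values above; the function name is hypothetical, and the handling of negative numbers as plain sign-and-magnitude is an assumption, since the text does not say how the sign bit combines with the other fields in this format.

```python
def decode_small_hyper(word):
    """Decode a 16-bit Small value with hyper-gradual overflow and underflow."""
    word &= 0xFFFF
    sign = -1.0 if word & 0x8000 else 1.0   # assumed sign-and-magnitude
    body = word & 0x7FFF
    top3 = body >> 12                       # first three exponent bits
    if top3 in (0b110, 0b001):
        # Five-bit exponent, ten mantissa bits.
        low = (body >> 10) & 0b11
        power = (1 if top3 == 0b110 else -7) + low
        mbits = 10
    elif top3 in (0b111, 0b000):
        # Seven-bit exponent, eight mantissa bits.
        low = (body >> 8) & 0b1111
        if top3 == 0b000 and low == 0:
            # All-zero exponent: no hidden bit, so zero can be represented.
            return sign * (body & 0xFF) * 2.0 ** -30
        power = (5 if top3 == 0b111 else -23) + low
        mbits = 8
    else:
        # Three-bit exponent (begins with 01 or 10), twelve mantissa bits.
        power = top3 - 5
        mbits = 12
    mantissa = body & ((1 << mbits) - 1)
    return sign * 2.0 ** power * (1 + mantissa / (1 << mbits))
```

The `power` values are the entries of the Power of Two column; for example, the pattern `0 11011 0000000000` has `top3` equal to 110 and low bits 11, giving a power of 4 and a value of 16.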

Small Format Conversion Instructions

Extra instructions are defined to permit conversion between this particular number type, not used anywhere else, and more conventional types:

142xxx x16x20  CFSMSV  Convert Floating to Small Short Vector 
142xxx x16x30  CSMFSV  Convert Small to Floating Short Vector

Here, to permit use of the mask bit and a mask register with the instructions, where the type bits are 2 or 3, the three-bit opcode field is moved from the mR field to the dR field.

The destination operand is always a single short vector register considered as being divided into sixteen quantities of type small; the source operand is either a pair of short vector registers containing sixteen quantities of type floating or four short vector registers containing sixteen quantities of type long.

A pair of short vector registers must begin with an even-numbered one; a group of four must begin with short vector register 0, 4, 8 or 12.
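
The alignment rule amounts to requiring that the first register number be a multiple of the group size. A minimal check in Python might look like the following; the function name is hypothetical, and the assumption that there are sixteen short vector registers, numbered 0 to 15, is drawn only from the list 0, 4, 8 or 12 above.

```python
def valid_group_start(register, group_size):
    """True if `register` may begin a group of `group_size` registers."""
    if group_size not in (1, 2, 4):
        raise ValueError("group size must be 1, 2 or 4")
    # Assumes sixteen short vector registers, numbered 0 to 15.
    return 0 <= register < 16 and register % group_size == 0
```

So register 6 may begin a pair but not a group of four, while register 8 may begin either.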

