The Short Vector Instructions

The short vector instructions operate on vectors of a fixed length, with a set of vector registers of that length available. These instructions resemble the MMX feature, and the later SSE (Streaming SIMD Extensions) and AVX (Advanced Vector eXtensions) features, of Intel microprocessors; AMD's 3DNow! instructions; and the AltiVec feature of PowerPC microprocessors from Motorola, IBM, and others. Like all of these, they operate on a relatively long word that can be split into multiple smaller segments.

The instructions dealt with on this page are available from both Normal Mode and Compact Mode. The Short Vector Fourier Assist and Short Vector Type Conversion instructions are not available from Simple Mode; the others are available, but have different opcodes.

The Lincoln Laboratory TX-2 computer, built from discrete transistors, could perform arithmetic on a 36-bit word or on its two halves or four quarters simultaneously. The AN/FSQ-30 and 31 computers offered similar capabilities with their 48-bit word, and, earlier, the AN/FSQ-7 computer designed by IBM for SAGE operated on pairs of 16-bit numbers at a time. There are thus some precedents for operations on vectors of small to medium size.

Also, both the VIS (Visual Instruction Set) instructions of the Sun SPARC and the MAX (Multimedia Acceleration eXtensions) instructions of Hewlett-Packard's PA-RISC were available shortly before Intel introduced their MMX (Multi-Media eXtensions).

The short vector instructions have the formats:

and the opcodes of the standard short vector instructions are:

010422 1000xx 00xxxx     SWBSV   Swap Byte Short Vector

010422 1010xx 00xxxx     LBSV    Load Byte Short Vector
010422 1014xx 00xxxx     STBSV   Store Byte Short Vector
010422 1020xx 00xxxx     ABSV    Add Byte Short Vector
010422 1024xx 00xxxx     SBSV    Subtract Byte Short Vector

010422 1040xx 00xxxx     SMBPB   Set Mask Bit if Positive Byte
010422 1044xx 00xxxx     SMBZB   Set Mask Bit if Zero Byte
010422 1050xx 00xxxx     SMBNB   Set Mask Bit if Negative Byte
010422 1054xx 00xxxx     XBSV    XOR Byte Short Vector
010422 1060xx 00xxxx     NBSV    AND Byte Short Vector
010422 1064xx 00xxxx     OBSV    OR Byte Short Vector

010422 1000xx 01xxxx     SWHSV   Swap Halfword Short Vector

010422 1010xx 01xxxx     LHSV    Load Halfword Short Vector
010422 1014xx 01xxxx     STHSV   Store Halfword Short Vector
010422 1020xx 01xxxx     AHSV    Add Halfword Short Vector
010422 1024xx 01xxxx     SHSV    Subtract Halfword Short Vector
010422 1030xx 01xxxx     MHSV    Multiply Halfword Short Vector
010422 1034xx 01xxxx     DHSV    Divide Halfword Short Vector

010422 1040xx 01xxxx     SMBPH   Set Mask Bit if Positive Halfword
010422 1044xx 01xxxx     SMBZH   Set Mask Bit if Zero Halfword
010422 1050xx 01xxxx     SMBNH   Set Mask Bit if Negative Halfword
010422 1054xx 01xxxx     XHSV    XOR Halfword Short Vector
010422 1060xx 01xxxx     NHSV    AND Halfword Short Vector
010422 1064xx 01xxxx     OHSV    OR Halfword Short Vector

010422 1000xx 02xxxx     SWSV    Swap Short Vector

010422 1010xx 02xxxx     LSV     Load Short Vector
010422 1014xx 02xxxx     STSV    Store Short Vector
010422 1020xx 02xxxx     ASV     Add Short Vector
010422 1024xx 02xxxx     SSV     Subtract Short Vector
010422 1030xx 02xxxx     MSV     Multiply Short Vector
010422 1034xx 02xxxx     DSV     Divide Short Vector

010422 1040xx 02xxxx     SMBP    Set Mask Bit if Positive
010422 1044xx 02xxxx     SMBZ    Set Mask Bit if Zero
010422 1050xx 02xxxx     SMBN    Set Mask Bit if Negative
010422 1054xx 02xxxx     XSV     XOR Short Vector
010422 1060xx 02xxxx     NSV     AND Short Vector
010422 1064xx 02xxxx     OSV     OR Short Vector

010422 1000xx 03xxxx     SWLSV   Swap Long Short Vector

010422 1010xx 03xxxx     LLSV    Load Long Short Vector
010422 1014xx 03xxxx     STLSV   Store Long Short Vector
010422 1020xx 03xxxx     ALSV    Add Long Short Vector
010422 1024xx 03xxxx     SLSV    Subtract Long Short Vector
010422 1030xx 03xxxx     MLSV    Multiply Long Short Vector
010422 1034xx 03xxxx     DLSV    Divide Long Short Vector

010422 1040xx 03xxxx     SMBPL   Set Mask Bit if Positive Long
010422 1044xx 03xxxx     SMBZL   Set Mask Bit if Zero Long
010422 1050xx 03xxxx     SMBNL   Set Mask Bit if Negative Long
010422 1054xx 03xxxx     XLSV    XOR Long Short Vector
010422 1060xx 03xxxx     NLSV    AND Long Short Vector
010422 1064xx 03xxxx     OLSV    OR Long Short Vector

010422 1010xx 04xxxx     LSMSV   Load Small Short Vector
010422 1014xx 04xxxx     STSMSV  Store Small Short Vector
010422 1020xx 04xxxx     ASMSV   Add Small Short Vector
010422 1024xx 04xxxx     SSMSV   Subtract Small Short Vector
010422 1030xx 04xxxx     MSMSV   Multiply Small Short Vector
010422 1034xx 04xxxx     DSMSV   Divide Small Short Vector

010422 1040xx 04xxxx     SMBPSM  Set Mask Bit if Positive Small
010422 1044xx 04xxxx     SMBZSM  Set Mask Bit if Zero Small
010422 1050xx 04xxxx     SMBNSM  Set Mask Bit if Negative Small


010422 1010xx 05xxxx     LFSV    Load Floating Short Vector
010422 1014xx 05xxxx     STFSV   Store Floating Short Vector
010422 1020xx 05xxxx     AFSV    Add Floating Short Vector
010422 1024xx 05xxxx     SFSV    Subtract Floating Short Vector
010422 1030xx 05xxxx     MFSV    Multiply Floating Short Vector
010422 1034xx 05xxxx     DFSV    Divide Floating Short Vector

010422 1040xx 05xxxx     SMBPF   Set Mask Bit if Positive Floating
010422 1044xx 05xxxx     SMBZF   Set Mask Bit if Zero Floating
010422 1050xx 05xxxx     SMBNF   Set Mask Bit if Negative Floating


010422 1010xx 06xxxx     LDSV    Load Double Short Vector
010422 1014xx 06xxxx     STDSV   Store Double Short Vector
010422 1020xx 06xxxx     ADSV    Add Double Short Vector
010422 1024xx 06xxxx     SDSV    Subtract Double Short Vector
010422 1030xx 06xxxx     MDSV    Multiply Double Short Vector
010422 1034xx 06xxxx     DDSV    Divide Double Short Vector

010422 1040xx 06xxxx     SMBPD   Set Mask Bit if Positive Double
010422 1044xx 06xxxx     SMBZD   Set Mask Bit if Zero Double
010422 1050xx 06xxxx     SMBND   Set Mask Bit if Negative Double


010422 1000xx 07xxxx     SWQSV   Swap Quad Short Vector

010422 1010xx 07xxxx     LQSV    Load Quad Short Vector
010422 1014xx 07xxxx     STQSV   Store Quad Short Vector
010422 1020xx 07xxxx     AQSV    Add Quad Short Vector
010422 1024xx 07xxxx     SQSV    Subtract Quad Short Vector
010422 1030xx 07xxxx     MQSV    Multiply Quad Short Vector
010422 1034xx 07xxxx     DQSV    Divide Quad Short Vector

010422 1040xx 07xxxx     SMBPQ   Set Mask Bit if Positive Quad
010422 1044xx 07xxxx     SMBZQ   Set Mask Bit if Zero Quad
010422 1050xx 07xxxx     SMBNQ   Set Mask Bit if Negative Quad

In Simple Mode, these instructions will start with 044220 instead of 010422. This follows the general rule that instructions of the form:

00010abcdexyzxyz

in Normal Mode and Compact Mode are represented by instructions of the form

01abcdexyzxyz000

in Simple Mode, while instructions in Normal Mode and Compact Mode of the form

00011abcdexyzxyz

are not available from Simple Mode.
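
The rule above amounts to a small bit manipulation on the first 16-bit word of the instruction. The following Python sketch is only an illustration of that rule; the function name normal_to_simple is invented here and is not part of the architecture.

```python
def normal_to_simple(word):
    """Map a 16-bit Normal/Compact Mode word to its Simple Mode form.

    Returns None for words of the form 00011..., which have no
    Simple Mode equivalent. (Illustrative helper, hypothetical name.)
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit word")
    prefix = word >> 11                 # the top five bits
    if prefix == 0b00010:
        payload = word & 0x07FF         # the eleven bits abcdexyzxyz
        return (0b01 << 14) | (payload << 3)   # 01 abcdexyzxyz 000
    if prefix == 0b00011:
        return None
    raise ValueError("word is not covered by the rule sketched here")


print(oct(normal_to_simple(0o010422)))  # 0o44220, that is, 044220
```

Applied to 010422, the first word of the standard short vector instructions, the sketch yields 044220, in agreement with the text.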

In addition, a set of instructions similar to the long format memory reference instructions and the long vector instructions is available that operates on short vectors:

These registers are 256 bits long, and are fully packed with data. Therefore, register-to-register floating-point operations of this type do not provide any guard bits which are retained between operations. In this they differ from the normal floating-point registers, which retain floating-point numbers in an internal form (to be described later, in the section on the basic aspects of this architecture) that does include some additional bits of precision when values are of a type which does not fill the register.

This is not necessarily a bad thing, as it does lead to more consistent results. A limited number of guard bits, following normal ALU design practice, are used when carrying out the calculations themselves, and conversions similar to those made to internal floating-point formats would also be used during calculations to simplify ALU operation. Thus, the short vector ALU would, in many cases, be working with numbers in the same format as the internal format used in the regular floating-point registers, but four bits shorter; the difference is that the conversions to and from the external representation take place with every arithmetic operation. Similarly, the regular ALUs would also, when performing calculations, use four guard bits internally in addition to those maintained in the regular floating-point registers. Therefore, the short vector ALUs would usually be eight bits less wide, not four bits less wide, than the regular ALUs for numbers of the same precision. Quad precision floating-point numbers, which occupy 128 bits in the register in both cases, would usually be an exception to this. The compatible floating-point format is the one case in which the rule still holds for 128-bit numbers: because it has an eight-bit redundant exponent field to eliminate internally, it does have eight additional in-register guard bits for its 128-bit floating-point formats, unlike the other 128-bit floating-point format.

These guard bits are not to be confused with the guard bit from the set of guard, round, and sticky bits used during the course of a single calculation to ensure an accurate result. While these are not available with the simple floating type, they are available for use with short vector operations.

This distinction may, perhaps, be clarified by means of the following table:

Floating-point Type                         Guard, Round, and   Additional
                                            Sticky Bits         Guard Bits

Floating-Point Shorter than 128 bits
  in Regular Floating-Point Registers       Yes                 Yes
128-bit Floating-Point                      Yes                 No
Floating-Point in Short Vector Registers    Yes                 No
Simple Floating Type                        No                  No

If the bit marked M in the instruction is set, the bits of the accumulator/index register indicated by mR indicate which of the elements of the vector are operated on by the instruction. The short vector registers are each 256 bits in length; they can contain anything from two 128-bit quad precision floating-point numbers to thirty-two 8-bit bytes. For cases other than byte operations, the mask bits used are the contiguous least significant bits of the register selected.
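
As a rough illustration of how the element count, and hence the number of mask bits consulted, follows from the element width, consider the Python sketch below. The element widths are those implied by the text; the mapping of mask bit i (counting from the least significant bit) to element i is an assumption made only for this example, as is every name in the code.

```python
VECTOR_BITS = 256

# Element widths in bits for the eight short vector types.
ELEMENT_BITS = {
    "byte": 8, "halfword": 16, "small": 16, "word": 32,
    "floating": 32, "long": 64, "double": 64, "quad": 128,
}


def element_count(type_name):
    """Number of elements of the given type in one short vector register."""
    return VECTOR_BITS // ELEMENT_BITS[type_name]


def selected_elements(mask, type_name):
    """Indices of the elements enabled by the mask register contents.

    Assumption: mask bit i, counted from the least significant bit,
    controls element i. Only the low element_count() bits are examined.
    """
    n = element_count(type_name)
    low_bits = mask & ((1 << n) - 1)
    return [i for i in range(n) if (low_bits >> i) & 1]


print(element_count("quad"), element_count("byte"))   # 2 32
print(selected_elements(0b11110101, "long"))          # [0, 2]
```

In the second call only the four least significant mask bits are examined, since a 256-bit register holds four 64-bit elements.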

The mode field in the instruction indicates the addressing mode. Its values are:

01 register-memory
10 memory-register

Thus, in mode 01, sR is not used; in mode 10, dR is not used, and sX and sB are used with a memory operand that is actually the destination rather than the source for the instruction.

Also, if the four-bit operate code is 1110, then:

The opcodes of the short vector multiple register instructions are:

Normal         Simple
Compact
-------------  -------------
000222 xxx0xx  102220 xxx0xx   LSVM    Load Short Vector Multiple
000322 xxx0xx  103220 xxx0xx   STSVM   Store Short Vector Multiple

These instructions allow a range of the short vector registers to be saved or loaded for purposes of context switching.

The opcodes of the short vector Fourier assist instructions are:

140000 044xxx     SHSMHSV Shuffle Small/Halfword Short Vector
140000 045xxx     SHFWSV  Shuffle Floating/Word Short Vector
140000 046xxx     SHDLSV  Shuffle Double/Long Short Vector
140000 047xxx     SHQSV   Shuffle Quad Short Vector

These instructions are available only from Normal Mode and Compact Mode.

These instructions have a format similar to the short vector multiple-register instructions, but sX and sB are not used, and the fields shown as dRl and dRh for that format instead serve again as dR and sR respectively.

All of these instructions require that the source and the destination each be an even-numbered short vector register, as they move data from the source register and the one following it to the destination register and the one following it.

The SHQSV instruction takes the four 128-bit quad-precision floating-point numbers in the source, and places them in the destination in the order:

0 2 1 3

The SHDLSV instruction divides the source into eight 64-bit blocks, and places them in the destination in the order:

0 4 1 5 2 6 3 7

The SHFWSV instruction divides the source into sixteen 32-bit blocks, and places them in the destination in the order:

 0  8  1  9  2 10  3 11  4 12  5 13  6 14  7 15

The SHSMHSV instruction divides the source into thirty-two halfwords, and places them in the destination in the order:

 0 16  1 17  2 18  3 19  4 20  5 21  6 22  7 23
 8 24  9 25 10 26 11 27 12 28 13 29 14 30 15 31
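
All four orderings listed above follow one pattern: the first half of the blocks is interleaved with the second half. A minimal Python sketch of that pattern follows; the function names are invented for illustration, and the sketch assumes that the k-th number in each list names the source block that lands in destination position k.

```python
def shuffle_order(n):
    """Source block index for each destination position, for n blocks."""
    half = n // 2
    order = []
    for i in range(half):
        order += [i, i + half]      # one block from each half in turn
    return order


def shuffle(blocks):
    """Apply the interleaving to a list of blocks of any type."""
    return [blocks[j] for j in shuffle_order(len(blocks))]


print(shuffle_order(4))             # [0, 2, 1, 3]
print(shuffle_order(8))             # [0, 4, 1, 5, 2, 6, 3, 7]
print(shuffle(list("ABCD")))        # ['A', 'C', 'B', 'D']
```

With n set to 4, 8, 16, and 32, the sketch reproduces the orders given for SHQSV, SHDLSV, SHFWSV, and SHSMHSV respectively.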

These instructions are intended to assist in performing Fast-Fourier Transform operations using the short vector registers when either the long vector registers are not available to a process, or they do not exist, or are implemented in a slow fashion (i.e. simulated in main memory) on a particular implementation of the architecture.

Because Fast-Fourier Transform operations may be performed on different operand types, and because of the structure of the short vector registers, the instructions provided with the short vector registers, unlike those provided with the long vector registers, assist with the Stockham framework of the FFT rather than the Pease framework of the FFT.

This is more fully discussed on the next page, in which the FFT operations used for those modes, involving the Pease framework, as well as the other possible frameworks, are illustrated.

The type field, in the first word of the instruction, can represent eight different types; seven of them are among the eight types used with conventional memory-reference instructions. The 48-bit Medium floating-point type is not allowed, as that length does not evenly subdivide a 256-bit vector. Instead, it is replaced by the Small type, which provides 16-bit floating-point numbers for use in applications such as signal analysis.

Three formats are available for floating-point numbers with this length. Two bits of the Program Status Block control the format of Small floating-point numbers; these are independent of the nine-bit field in the Program Status Block that controls the format of floating-point numbers of other sizes.

Small With Gradual Underflow

The first possible format is modelled after the Standard floating-point format. It is also a standard format in its own right, as it is used by advanced graphics chips that perform 3-D acceleration on personal computers.

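In this format, numbers consist of a sign bit, five exponent bits in excess-14 format, and ten mantissa bits, not including the first bit of the mantissa, which is a hidden 1 bit. (The excess-14 figure takes the binary point as coming before the hidden bit; with the binary point placed after the hidden bit, as in the usual description of the IEEE 754 half-precision format, the same encoding has a bias of 15.) For an all-zero exponent field, there is no hidden 1 bit, and numbers can be unnormalized to allow gradual underflow.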

In this format, some possible numeric values are:

Data Item            Numeric Value             Power of Two

0 11110 1111111111    65,504
0 11110 0000000000    32,768                     15
0 10000 0000000000         2                      1
0 01111 1000000000         1.5
0 01111 0000000000         1                      0
0 01110 0000000000          .5                   -1
0 00001 0000000000         6.10352 * 10^(-5)    -14
0 00000 1000000000         3.05176 * 10^(-5)    -15
0 00000 0100000000         1.52588 * 10^(-5)    -16
0 00000 0000000001         5.96046 * 10^(-8)    -24
0 00000 0000000000         0

The maximum possible exponent value, 11111, is reserved for infinities and NaN values exactly as in the Standard floating-point format.
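
To make the field layout concrete, here is a small Python sketch that decodes a 16-bit word in this format. It uses the conventional bias of 15 with the binary point after the hidden bit, which is equivalent to the excess-14 description given above; the function name is hypothetical.

```python
import math


def decode_small_gradual(word):
    """Decode a 16-bit Small value with gradual underflow.

    Layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
    """
    sign = -1.0 if word & 0x8000 else 1.0
    exponent = (word >> 10) & 0x1F
    fraction = word & 0x3FF
    if exponent == 0x1F:                       # reserved: infinity or NaN
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:                          # unnormalized, no hidden bit
        return sign * math.ldexp(fraction, -24)
    # Normal number: (1 + fraction/1024) * 2^(exponent - 15)
    return sign * math.ldexp(1024 + fraction, exponent - 25)


print(decode_small_gradual(0b0111101111111111))   # 65504.0
print(decode_small_gradual(0b0011111000000000))   # 1.5
```

The smallest normalized value, 0 00001 0000000000, decodes to exactly 2^(-14), and the smallest unnormalized value, 0 00000 0000000001, to 2^(-24).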

Small With Extremely Gradual Underflow

Numbers in the second of these formats have two exponent bits, and are encoded using extremely gradual underflow in a sophisticated manner which allows them to be compared using integer comparison instructions.

For a positive number, the fields in its representation are:

The sign bit (which is 0)
The mantissa bits up to and including the first 1 bit in the mantissa
The exponent
The remaining mantissa bits

If the number is negative, 1 is used for the sign bit, and the remaining portion of the number still follows the same format, but all the bits in it are inverted (that is, a one's complement is performed).

The exponent is taken as being an unsigned number, and the binary point of the mantissa as being before its first digit; thus, some example values in this encoding are shown below:

16-bit Small       Fields                Numeric             Power of
Data Item                                Value               Two
              
0111111111111111   0 1 11 111111111111   7.99902
0111000000000000   0 1 11 000000000000   4                          2
0110111111111111   0 1 10 111111111111   3.99951
0110000000000000   0 1 10 000000000000   2                          1
0101000000000000   0 1 01 000000000000   1                          0
0100000000000000   0 1 00 000000000000    .5                       -1
0011100000000000   0 01 11 00000000000    .25                      -2
0011000000000000   0 01 10 00000000000    .125                     -3
0010100000000000   0 01 01 00000000000    .0625                    -4
0010000000000000   0 01 00 00000000000    .03125                   -5
0001110000000000   0 001 11 0000000000    .015625                  -6
0000111000000000   0 0001 11 000000000   9.76562 * 10^(-4)        -10
0000011100000000   0 00001 11 00000000   6.10352 * 10^(-5)        -14
0000001110000000   0 000001 11 0000000   3.81470 * 10^(-6)        -18
0000000111000000   0 0000001 11 000000   2.38419 * 10^(-7)        -22
0000000011100000   0 00000001 11 00000   1.49012 * 10^(-8)        -26
0000000001110000   0 000000001 11 0000   9.31323 * 10^(-10)       -30
0000000000111000   0 0000000001 11 000   5.82077 * 10^(-11)       -34
0000000000011100   0 00000000001 11 00   3.63798 * 10^(-12)       -38
0000000000001110   0 000000000001 11 0   2.27374 * 10^(-13)       -42
0000000000000111   0 0000000000001 11    1.42109 * 10^(-14)       -46
0000000000000110   0 0000000000001 10    7.10543 * 10^(-15)       -47
0000000000000101   0 0000000000001 01    3.55271 * 10^(-15)       -48
0000000000000100   0 0000000000001 00    1.77636 * 10^(-15)       -49
0000000000000011   0 00000000000001 1    8.88178 * 10^(-16)       -50
0000000000000010   0 00000000000001 0    4.44089 * 10^(-16)       -51
0000000000000001   0 000000000000001     2.22045 * 10^(-16)       -52
0000000000000000   0 000000000000000     0

Note that at the low end of the range, the exponent field shrinks from two bits to one and then zero bits. This produces a distribution of represented points similar to that provided by A-law audio encoding.
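
The following Python sketch decodes this format according to one reading of the field description and the Power of Two column of the table above. It is a reconstruction for illustration only: the function names are invented, and the handling of the shortened exponent field at the bottom of the range is inferred from the last rows of the table.

```python
import math


def decode_small_egu(word):
    """Decode a 16-bit Small value with extremely gradual underflow."""
    if word & 0x8000:
        # Negative: the other fifteen bits are a one's complement.
        return -decode_small_egu(~word & 0xFFFF)
    if word == 0:
        return 0.0
    zeros = 15 - word.bit_length()       # mantissa zeros before the first 1
    rest_len = 14 - zeros                # bits that follow the first 1
    rest = word & ((1 << rest_len) - 1)
    exp_len = min(2, rest_len)           # exponent shrinks to 1, then 0 bits
    frac_len = rest_len - exp_len
    exponent = rest >> frac_len
    fraction = rest & ((1 << frac_len) - 1)
    if exp_len == 2:
        power = exponent - 1 - 4 * zeros
    elif exp_len == 1:
        power = exponent - 51
    else:
        power = -52
    return math.ldexp(1 + fraction / (1 << frac_len), power)


def as_signed16(word):
    """Reinterpret a 16-bit pattern as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


print(decode_small_egu(0b0101000000000000))   # 1.0
print(decode_small_egu(0b0011100000000000))   # 0.25
```

Under this reading, sorting all 65,536 bit patterns as signed 16-bit integers also sorts the decoded values, which is the property that permits integer comparison instructions to be used on these numbers.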

As early music-quality digital audio systems used 14-bit fixed-point samples instead of 16-bit ones, I envisaged this format as an alternative to fixed-point samples for uncompressed digital audio applications. However, there is a problem with using floating-point samples; since soft sounds are not always masked by loud sounds in different frequency ranges, the shifting noise floor of floating-point encoding can be distracting. One remedy would be to apply floating-point encoding to a transformed signal that has already been divided into critical bands: these are the narrow frequency ranges within which sounds do mask each other, and they are used as part of the compression algorithms for the Digital Compact Cassette (DCC) from Philips and the MiniDisc from Sony. It would also be appropriate to apply equalization, because low-frequency components of music will typically have much larger amplitudes than high-frequency components; the less rapid motion in low-frequency vibrations means that they have much less energy for a given amplitude than high-frequency vibrations.

Small With Hyper-Gradual Overflow and Underflow

The third possible format for 16-bit floating-point numbers attempts to provide a very wide exponent range. Numbers normally consist of a sign bit, three exponent bits in excess-4 format, and twelve mantissa bits, not counting a hidden 1 bit. This is the case when the exponent begins with 01 or 10. The size of the exponent is increased by two bits for every additional 0 or 1 that follows the initial 0 or 1 respectively, until the size of the mantissa field is reduced to a minimum of eight bits in length. The lowest possible value for the exponent field is thus an all-zeroes exponent field, which will be seven bits long. This will receive the same special treatment as it does in the Standard floating-point format, moving the radix point one place and not having a hidden first one bit, to allow zero to be represented. Thus, only gradual underflow takes place for the most extreme small values, which are also the only values to have a precision of less than nine bits.

Some representative values in this format are:

0 1111111 11111111    2,093,056
0 1111111 00000000    1,048,576                    20
0 1110111 00000000        4,096                    12
0 1110000 00000000           32                     5
0 11011 0000000000           16                     4
0 11000 0000000000            2                     1
0 101 110000000000            1.75
0 101 100000000000            1.5
0 101 010000000000            1.25
0 101 000000000000            1                     0
0 100 000000000000             .5                  -1
0 011 000000000000             .25                 -2
0 010 000000000000             .125                -3
0 00111 0000000000             .0625               -4
0 00100 0000000000             .0078125            -7
0 0001111 00000000             .00390625           -8
0 0001000 00000000            3.05176 * 10^(-5)   -15
0 0000001 00000000            2.38419 * 10^(-7)   -22
0 0000000 10000000            1.19209 * 10^(-7)   -23
0 0000000 01000000            5.96046 * 10^(-8)   -24
0 0000000 00000001            9.31323 * 10^(-10)  -30
0 0000000 00000000            0
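
As a check on the variable-length exponent, the Python sketch below decodes non-negative values in this format. It is one interpretation, fitted to the representative values listed above; the table of base exponents and all names are assumptions of the sketch, and negative values are left out because the text does not say how they are encoded.

```python
import math

# (first exponent bit, run length) -> power of two when the free bits are 0
BASE_POWER = {
    (1, 1): -1, (1, 2): 1, (1, 3): 5,
    (0, 1): -3, (0, 2): -7, (0, 3): -23,
}


def decode_small_hyper(word):
    """Decode a non-negative 16-bit Small value, hyper-gradual format."""
    if not 0 <= word <= 0x7FFF:
        raise ValueError("this sketch handles only words with sign bit 0")
    first = (word >> 14) & 1
    run = 1                               # length of the leading run, max 3
    while run < 3 and ((word >> (14 - run)) & 1) == first:
        run += 1
    exp_len = 1 + 2 * run                 # 3, 5, or 7 exponent bits
    man_len = 15 - exp_len                # 12, 10, or 8 mantissa bits
    free_len = exp_len - run - (1 if run < 3 else 0)
    free = (word >> man_len) & ((1 << free_len) - 1)
    mantissa = word & ((1 << man_len) - 1)
    if first == 0 and run == 3 and free == 0:
        # All-zero exponent: no hidden bit, gradual underflow.
        return math.ldexp(mantissa, -22 - man_len)
    power = BASE_POWER[(first, run)] + free
    return math.ldexp((1 << man_len) + mantissa, power - man_len)


print(decode_small_hyper(0b0101000000000000))   # 1.0
print(decode_small_hyper(0b0111111111111111))   # 2093056.0
```

Under this reading the decoded value increases strictly with the bit pattern taken as an unsigned integer, so the three exponent lengths join without gaps or overlaps.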

Small Format Conversion Instructions

Extra instructions are defined to permit conversion between this particular number type, not used anywhere else, and more conventional types:

140000 06x0xx  CFSMSV  Convert Floating to Small Short Vector 
140000 07x0xx  CSMFSV  Convert Small to Floating Short Vector

Here, to permit use of the mask bit and a mask register with the instructions, where the type bits are 2 or 3, the three-bit opcode field is moved from the mR field to the dR field.

The destination operand is always a single short vector register considered as being divided into sixteen quantities of type small; the source operand is either a pair of short vector registers containing sixteen quantities of type floating or four short vector registers containing sixteen quantities of type long.

A pair of short vector registers must begin with an even-numbered one; a group of four must begin with short vector register 0, 4, 8 or 12.
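
The alignment rule for register groups can be stated as a simple divisibility test. The Python sketch below assumes sixteen short vector registers, numbered 0 to 15, which is inferred from the permitted starting registers 0, 4, 8 and 12 rather than stated outright; the function name is hypothetical.

```python
SHORT_VECTOR_REGISTERS = 16   # assumption: registers are numbered 0 to 15


def valid_group_start(register, group_size):
    """True if a group of 1, 2, or 4 registers may begin at this register."""
    if group_size not in (1, 2, 4):
        raise ValueError("groups consist of 1, 2, or 4 registers")
    in_range = 0 <= register < SHORT_VECTOR_REGISTERS
    return in_range and register % group_size == 0


print(valid_group_start(6, 2))    # True
print(valid_group_start(6, 4))    # False
```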