
Classic VLIW

Signal processing chips, such as the Texas Instruments TMS320C, are often described as having a VLIW architecture. The instructions for these processors resemble those of a RISC design, but one bit in each 32-bit instruction indicates whether a block of eight instructions must be broken before that instruction is executed, for example because it depends on the result of a closely preceding instruction.
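As a rough illustration of the break bit just described, the following Python sketch splits a block of eight instructions into groups that may issue together. The function name and the list-of-booleans representation are assumptions made for this example; it follows the description above rather than the exact encoding of any particular chip.

```python
def split_block(break_before):
    """Split a block of instructions into groups that can issue together.

    break_before[i] is True when the block must be broken before
    instruction i (hypothetical representation of the per-instruction bit).
    Returns a list of groups, each a list of instruction indices.
    """
    groups = []
    current = []
    for index, must_break in enumerate(break_before):
        # Start a new group when the bit is set, unless nothing precedes it.
        if must_break and current:
            groups.append(current)
            current = []
        current.append(index)
    if current:
        groups.append(current)
    return groups
```

With no bits set, all eight instructions form one group; each set bit adds one more group, and the instructions within a group are the ones that can be started in the same cycle.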

An even simpler case is the VLIW mode of the Intel i860. This chip had a conventional RISC instruction set, but when VLIW mode was turned on, it read and executed instructions in pairs, each pair consisting of one integer instruction and one floating-point instruction.

The Itanium has a 128-bit instruction word. Five bits indicate which of the three instructions in the word can be executed in parallel with each other, and the remaining 123 bits hold three instructions of 41 bits each. The instruction format differs from slot to slot, because the instructions allowed in each slot are different.
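The arithmetic of that word (5 + 3 × 41 = 128) can be checked with a short Python sketch that packs and unpacks such a word. The function names are hypothetical, and placing the five-bit field in the low-order bits with the three slots following in order is an assumption that should be checked against Intel's documentation.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit slots into one integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("exactly three slots are required")
    word = template  # assumed position: the low-order five bits
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + SLOT_BITS * i)
    return word


def unpack_bundle(word):
    """Return (template, (slot0, slot1, slot2)) from a 128-bit integer."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (word >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots
```

Packing the largest value of every field yields a word with all 128 bits set, which confirms that the fields fill the word exactly.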

However, earlier computer designs known as VLIW, such as the Advanced Flexible Processor (AFP) and Cyberplus from Control Data, were significantly different.

Let us imagine such a computer to have four add/subtract units, two multiply units, and one divide unit. These units operate on 64-bit floating-point numbers only. As well, there are two add/subtract units, one multiply unit, and a logical operation unit, for 16-bit integers.

One instruction is executed per cycle, and each instruction controls the routing of the inputs and outputs of each of the units. These units would all have two inputs and one output.

The units are presumed to be pipelined; a floating-point add/subtract might take three cycles, a multiply five cycles, a divide fifteen cycles.
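To make the pipelining assumption concrete, here is a minimal Python model of one such unit. The class name, the `step` interface, and the convention that a result appears on the output in the cycle numbered by the latency (counting the issue cycle as the first) are all assumptions of this sketch, not details given above.

```python
from collections import deque


class PipelinedUnit:
    """A two-input, one-output arithmetic unit with a fixed latency."""

    def __init__(self, latency, op):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        self.op = op
        # Results in flight; None marks a cycle in which nothing was issued.
        self.in_flight = deque([None] * (latency - 1))
        self.output = None

    def step(self, operands=None):
        """Advance one cycle; operands is (a, b) to start an operation."""
        result = None if operands is None else self.op(*operands)
        self.in_flight.appendleft(result)
        # The oldest entry leaves the pipeline and becomes the output.
        self.output = self.in_flight.pop()
        return self.output
```

Because a new pair of operands can be supplied on every call to `step`, a three-cycle adder still delivers one sum per cycle once the pipeline is full; only the first result is delayed.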

Memory would also have to be accessed.

I am going to assume that this VLIW computer is an add-on processor to a computer with a more conventional architecture. Thus, I will provide it with storage for 32,768 floating-point numbers, at 64 bits each, and a separate memory containing 4,096 integers, at 16 bits each. It is because the floating-point memory needs only 15 bits for addressing that the integers are only 16 bits long: the computer is intended primarily for computations on floating-point numbers, but integers may occasionally be needed to find the floating-point number one wants.

The memory, however, will have an unusual structure. It will be fast, internal memory, so data can be both read from it and written to it in every cycle.

Only one word of data can be written to each of the two memory banks in a given cycle. However, in the case of the floating-point memory, eight different words, at any arbitrary positions, can be read in one cycle. In the case of the integer memory, four different words, at any arbitrary positions, can be read.

A simple and obvious way to implement a memory with that property is to have eight memories, each with 32,768 locations of 64 bits. Writes would be broadcast to all eight memories at once; reads would come from each memory individually.

In practice, this kind of memory does exist; it is called multiport memory, and is usually implemented by adding one extra output transistor to each memory cell for each port to be added, and then providing an extra copy of the addressing circuitry to handle each set of outputs.
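The replicated arrangement described above (one copy of the memory per read port, with every write broadcast to all copies) can be sketched in Python as follows. The class and method names are hypothetical, and the sketch models only the behavior, not the timing.

```python
class ReplicatedMemory:
    """One write port and several read ports, built from identical copies."""

    def __init__(self, size, read_ports):
        self.size = size
        # One full copy of the contents for each read port.
        self.copies = [[0.0] * size for _ in range(read_ports)]

    def _check(self, address):
        if not 0 <= address < self.size:
            raise IndexError("address out of range")

    def write(self, address, value):
        """Broadcast a single write to every copy."""
        self._check(address)
        for copy in self.copies:
            copy[address] = value

    def read(self, addresses):
        """Read up to one word per port; port i is served by copy i."""
        if len(addresses) > len(self.copies):
            raise ValueError("more reads requested than there are ports")
        for address in addresses:
            self._check(address)
        return [self.copies[port][address]
                for port, address in enumerate(addresses)]
```

For the floating-point memory this would be created as `ReplicatedMemory(32768, 8)`. The cost of the scheme is plain from the constructor: the storage is multiplied by the number of read ports, which is what true multiport memory avoids.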


A VLIW instruction word would look like this:

For the four floating-point add/subtract units:

One bit indicating if an operation will be initiated this cycle.

One bit indicating if the operation is an add or a subtract.

Two fields of four bits indicating the source of each of the two operands.

For the two floating-point multiply units, and the floating-point divide unit, the fields are the same, except the bit indicating which operation is to be performed is not present.

The four-bit source fields can refer to the output (at the end of the previous cycle) of one of these seven arithmetic units, or to the data from one of the eight memory ports.
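Seven unit outputs plus eight memory ports make fifteen possible sources, so a four-bit field has one code to spare. The Python sketch below shows one way the codes might be assigned; the particular numbering (adders first, then multipliers, the divider, and the read ports) is purely an assumption, since no assignment is specified here.

```python
# Hypothetical assignment of four-bit source codes, in order from code 0.
FP_SOURCES = (
    [("add", i) for i in range(4)]          # codes 0-3: add/subtract units
    + [("mul", i) for i in range(2)]        # codes 4-5: multiply units
    + [("div", 0)]                          # code 6: divide unit
    + [("read_port", i) for i in range(8)]  # codes 7-14: memory read ports
)


def decode_fp_source(code):
    """Map a four-bit source code to (kind, index), or None if unused."""
    if not 0 <= code < 16:
        raise ValueError("source code must fit in four bits")
    if code >= len(FP_SOURCES):
        return None  # code 15 is left unassigned in this sketch
    return FP_SOURCES[code]
```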

For six of the eight read ports of the memory for floating-point data, the VLIW word would have the following fields:

One bit indicating if a fetch will be performed.

Fifteen bits indicating the address from which the fetch is performed.

For the remaining two read ports of the memory for floating-point data, the VLIW word would have the following fields:

One bit indicating if a fetch will be performed.

Three bits indicating the source of the address from which to fetch.

These three bits would indicate either the output from an integer arithmetic unit, or the output from a read port of the integer memory.

Finally, there would be a field in the VLIW instruction that controls the write port for the floating-point memory. Its structure would be:

One bit indicating if a store will be performed this cycle.

Four bits indicating the source of the data to be stored.

One bit indicating if the address is specified or calculated.

Fifteen bits indicating the address to be used if specified.

Three bits indicating the integer source of the address to be used if calculated.
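The five fields of the write-port control add up to 24 bits. As an illustration, the Python sketch below packs and unpacks them; the field names and their order within the word, starting from the low-order end, are assumptions of this example.

```python
# (name, width in bits), listed from the low-order end; order is hypothetical.
STORE_FIELDS = (
    ("store", 1),           # a store is performed this cycle
    ("data_source", 4),     # source of the data to be stored
    ("calculated", 1),      # 0 = specified address, 1 = calculated address
    ("address", 15),        # address used when specified
    ("address_source", 3),  # integer source used when calculated
)


def encode_fields(layout, values):
    """Pack a dict of field values into an integer according to layout."""
    word = 0
    shift = 0
    for name, width in layout:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def decode_fields(layout, word):
    """Unpack an integer into a dict of field values according to layout."""
    values = {}
    for name, width in layout:
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Note that the specified address and the address source are both always present, even though only one of them is used in any given instruction; a denser encoding could overlap them at the cost of more complicated decoding.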

Then the VLIW word would continue with fields describing the integer operations to take place during the cycle.

For the two integer add/subtract units:

One bit to indicate if an operation is initiated this cycle.

One bit to indicate if the operation is an add or a subtract.

Three bits for each of the two operands to indicate its source.

For the integer multiply unit, the bit indicating which operation is to be performed is not required.

For the logical operation unit, the operation is instead indicated by a two-bit field, as it can be AND, OR, or XOR. As well, an extra bit is provided to invert the output of the operation.

For each of the four integer memory read ports, the control field will have the structure:

One bit to indicate if a fetch is performed.

One bit to indicate if the address is specified or calculated.

A twelve-bit specified address.

A three-bit source field for a calculated address.

This structure will also apply to the integer memory write port.
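Adding up the fields listed above gives a sense of how long the instruction word is. The Python sketch below does the tally. It assumes that the logical unit has the same initiate bit and source fields as the other integer units, and it takes the integer write port literally as having the same fields as a read port; the number of integer read ports is left as a parameter.

```python
def fp_bits():
    """Bits in the floating-point part of the instruction word."""
    add_sub = 4 * (1 + 1 + 4 + 4)    # initiate, add/subtract, two sources
    multiply = 2 * (1 + 4 + 4)       # initiate, two sources
    divide = 1 + 4 + 4
    direct_reads = 6 * (1 + 15)      # fetch bit, specified address
    indirect_reads = 2 * (1 + 3)     # fetch bit, address source
    write = 1 + 4 + 1 + 15 + 3       # store, data, mode, address, source
    return (add_sub + multiply + divide
            + direct_reads + indirect_reads + write)


def int_bits(read_ports):
    """Bits in the integer part, for a given number of read ports."""
    add_sub = 2 * (1 + 1 + 3 + 3)
    multiply = 1 + 3 + 3
    logical = 1 + 2 + 1 + 3 + 3      # initiate, operation, invert, sources
    port = 1 + 1 + 12 + 3            # enable, mode, address, source
    write = port                     # taken literally: same fields as a read
    return add_sub + multiply + logical + read_ports * port + write


def word_bits(read_ports):
    """Total width of one instruction word under these assumptions."""
    return fp_bits() + int_bits(read_ports)
```

Under these assumptions the floating-point part alone comes to 195 bits; `word_bits(4)` evaluates to 313 and `word_bits(2)` to 279, so the instruction word is roughly three hundred bits wide either way.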


Note that this format makes no provision for transfers of control. Programs in this form could be called by more conventional instructions, which could indicate that a block of m instructions is to be executed n times. However, some intrinsic capacity for control in the VLIW words themselves would be helpful.
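The repeat-a-block arrangement mentioned above amounts to a pair of nested loops on the host side. A minimal Python sketch, in which the function name and the `execute` callback that stands in for the VLIW processor are both hypothetical:

```python
def run_block(program, start, m, n, execute):
    """Execute the m instruction words beginning at start, n times over.

    program is a sequence of instruction words and execute is a callable
    that carries out one word (a stand-in for the hardware).
    """
    if start < 0 or m < 0 or start + m > len(program):
        raise ValueError("block lies outside the program")
    for _ in range(n):
        for word in program[start:start + m]:
            execute(word)
```

Since the VLIW words themselves contain no branches, every pass through the block performs exactly the same sequence of operations; any data-dependent decision has to be made by the calling processor between blocks.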

