The Concertina II Architecture

Welcome to the home page of the Concertina II computer architecture.

The original Concertina computer architecture was first intended as a simple example of a conventional old-style CISC architecture, to help explain how computers work. It was expanded over time to include many features from a wide selection of historical computer architectures, to explain those as well.

Concertina II was intended as an ISA that could conceivably be of practical use in an actual implementation. However, I cannot make ambitious claims for it, as my experience in this area is quite limited. This architecture went through quite a number of drafts before I felt that I had struck an acceptable balance between the various factors that had to be compromised to provide the architecture with the capabilities I sought.

However, I believe that the current version of the ISA is a sound basis on which to proceed, and I only expect to be changing it with minor tweaks as I continue to flesh out the architecture and describe its features.

Once I have it completed, it may serve as an alternative to RISC-V, even though the designer of that architecture is far more knowledgeable and experienced than I am. This is because I feel it may at least suit some people's tastes more than RISC-V does.

Introduction

What is the Concertina II ISA, and what choices were made in its design?

The Concertina II design is still unfinished; many parts of it are yet to be described. I do not intend to tear it up and start afresh, as I no longer feel I will be able to do better, but it is still subject to minor tweaks.

It will be freely available to all to implement without restrictions once completed, subject to export controls on computer technology.

The basic Concertina II instruction set is largely patterned after today's most popular type of ISA (instruction set architecture) design, RISC (reduced instruction set computing), but it does not qualify as a genuine RISC design by any reasonable contemporary definition of RISC, even the least puristic.

The basic instruction set consists of 32-bit instructions, but also adds the ability to use a pair of 16-bit instructions at any point in the sequence of instructions in place of a 32-bit instruction.

This allows increasing code density by using smaller instructions for many operations, without losing the simplicity of fetching and decoding instructions gained by having all instructions of the same length.

As in many RISC designs, there are two main register files, one for integer values (with registers that are 64 bits wide) and one for floating-point values (with registers that are 128 bits wide), each of which contains 32 registers.

Also, the memory-reference instructions are of the load-store variety, following standard RISC practice.

The following extensions to the RISC model are included in the most basic portion of the instruction set:

For integer variable types shorter than the full register length of 64 bits, following the lead of the IBM System/360, there are three kinds of load instruction, all of which load values into the least significant bits of the destination register: Load, which performs sign extension; Unsigned Load, which clears the bits more significant than those of the value loaded; and Insert, which leaves all other bits intact and unaffected.
Full base-index addressing is provided. Memory-reference instructions typically have a three-bit index register field, a two-bit or three-bit base register field, and a displacement which may be 16, 15, or 12 bits in width. The 32 integer registers are divided into four groups of eight registers; the index registers are found in the first group, base registers for 20-bit displacements are found in the second, base registers for 12-bit displacements are found in the third, and base registers for 16-bit displacements are found in the fourth.
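
As a rough illustration, the three kinds of load for short integer types might be modelled in Python as follows. This is only a sketch: the function names, and the use of a Python integer to stand for a 64-bit register, are assumptions made for the example and are not part of the architecture's definition.

```python
MASK64 = (1 << 64) - 1  # integer registers are 64 bits wide


def load(value, width):
    """Load: place a width-bit value in the register with sign extension."""
    value &= (1 << width) - 1
    if value >> (width - 1):          # sign bit of the short value is set
        value -= 1 << width           # make it negative, then wrap to 64 bits
    return value & MASK64


def unsigned_load(value, width):
    """Unsigned Load: clear every bit above the loaded value."""
    return value & ((1 << width) - 1)


def insert(old_register, value, width):
    """Insert: replace only the low width bits; all other bits are kept."""
    low_mask = (1 << width) - 1
    return (old_register & MASK64 & ~low_mask) | (value & low_mask)
```

For example, loading the byte 0x80 yields 0xFFFFFFFFFFFFFF80 with Load and 0x0000000000000080 with Unsigned Load, while Insert changes only the last byte of whatever the register already held.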

It is precisely because base-index addressing is provided by restricting potential index registers to registers 1-7, and potential base registers to groups of 7 (which group depends on the displacement length) that this design does not qualify as RISC, and instead could be called CISC in RISC clothing.

RISC architectures normally allow only two registers to be indicated in a memory-reference instruction. One is the destination register of the instruction, and the other is the one whose contents are added to the displacement to form the effective address. Since a base register is needed for any memory access when the displacement is not large enough to indicate any location in the available memory, this means that the advantage of having an index register isn't available, and array accesses require additional explicit arithmetic instructions to compute addresses.

Thus, since the use of arrays is a very common operation, full base-index addressing was considered a very important feature to add.

In order to make it possible to provide this feature, the integer registers were split up into groups of eight so that the index register and base register fields could be only three bits long instead of five bits long, thus allowing both to fit in an instruction.
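
The way three-bit fields can select registers from groups of eight may be sketched in Python as below. Several things here are assumptions made for illustration: that a field value of zero means "no register" (suggested by the restriction to registers 1-7 within a group), that groups are numbered from 0, that the displacement is added as given, and that the sum wraps at 64 bits. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def field_to_register(group, field):
    """Map a 3-bit field to a register number within a group of eight.

    group 0 is registers 0-7, group 1 is registers 8-15, and so on.
    Returns None for a zero field (assumed here to mean "no register").
    """
    if not 0 <= field <= 7:
        raise ValueError("register field must fit in three bits")
    return None if field == 0 else group * 8 + field


def effective_address(regs, index_field, base_field, base_group, displacement):
    """Sum base, index, and displacement, wrapping to 64 bits."""
    address = displacement
    index_reg = field_to_register(0, index_field)          # index: first group
    base_reg = field_to_register(base_group, base_field)   # base: group set by displacement size
    for reg in (index_reg, base_reg):
        if reg is not None:
            address += regs[reg]
    return address & MASK64
```

With the third group (group number 2 in this sketch) serving 12-bit displacements, a base field of 2 selects register 18.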

Normally, if one allocates a block of memory containing 65,536 bytes, using a base register to point to that block, it is not useful to have addressing modes that can only access the first 4,096 bytes of that block. Therefore, separate groups of registers are used as the possible base registers for different sizes of displacement values.

Only one register serves as the implicit base register for 15-bit displacements; this is done to allow one larger block of memory to be used in conjunction with those accessed with 12-bit displacements. This permits more compact memory-reference instructions, and is inspired by the System/360 Model 20 computer.

The above summarizes how the basic instruction set of this computer was designed to take the basic RISC design, and offer important extensions to it, while still having instructions that fit in 32 bits.

But a number of other extensions are also offered. These require going beyond the somewhat RISC-like model of the basic instruction set, and instead recognizing that this architecture also has VLIW (Very Long Instruction Word) characteristics.

Instructions are grouped in blocks of 256 bits, each of which contains eight 32-bit instruction slots. If feasible, an implementation aiming for maximum performance should have at least a 256-bit data bus to main memory, permitting a block of instructions to be fetched at once.

A small portion of the opcode space for instructions is dedicated to codes which represent headers instead of instructions. A block may begin with a header, and if it does, an additional header may follow it. A header may be 32, 48, or 64 bits long. 48-bit long headers are possible because some headers indicate that the instruction set to be used in the current block will not be the basic one composed only of 32-bit instructions, but instead one containing variable-length instructions, with the length of each instruction being a multiple of 16 bits.

Headers, if any, are processed before the instructions in a block are decoded.

After the headers are processed, or after it is determined that the block does not begin with a header, the computer has the information required to decode all the instructions in the block in parallel.

One of the most important features that having headers provides, which is still considered part of the basic instruction set of the Concertina II architecture, is pseudo-immediate values.

Some register-to-register instructions may have a source register specification replaced by a five-bit byte pointer to an address within the current instruction block, which points to an operand for that instruction.

This capability is supported by headers which contain a three-bit decode field, which indicates that some of the eight 32-bit instruction slots in the current block are to be ignored during instruction decoding, and skipped over in execution, so that pseudo-immediate values can be placed in them.

What are pseudo-immediate values, and why are they included in this ISA? Essentially, they are inspired by the Heads and Tails design of Heidi Pan. As Mitch Alsup has reminded us all in the design of his "My 66000" ISA, immediate mode instructions have the advantage that a constant value can be used in a calculation without requiring an additional fetch of data, with all the delays and overhead of memory accesses in modern architectures, where DRAM is slow compared to processor logic.

This is because the immediate value is part of the instruction itself, and thus has already been fetched as part of the instruction stream.

But since data items come in several widths, comprehensive support of immediate values means that instructions must come in many different lengths, and I felt this would complicate their decoding to an unacceptable extent.

With pseudo-immediate values, the length of the instruction doesn't have to be changed. A pointer to the value only takes up the same space as a register specification.

But if the value is fetched from a location indicated by a pointer, it isn't an immediate value any more. Hence the term "pseudo-immediate" - given that instructions are fetched from memory in 256-bit blocks, and the data to which the pointer refers is within the same block as the instruction itself, even though the values are not actually immediate values, they still offer the same basic advantage as immediate values. (To some extent, of course, this depends on how the implementation handles the instruction stream. Specifically, to gain the full advantages of this, the entire block needs to be buffered within the processor during instruction decoding.)
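
A minimal sketch of fetching a pseudo-immediate, assuming the block is already buffered as 32 bytes, might look like this in Python. The function name, the big-endian byte order, and the rule that an operand may not run past the end of the block are all assumptions of the sketch rather than statements about the architecture.

```python
def pseudo_immediate(block, pointer, length):
    """Read a length-byte operand from inside a 32-byte instruction block.

    pointer is the five-bit byte offset taken from the instruction.
    Big-endian byte order is an assumption of this sketch.
    """
    if len(block) != 32:
        raise ValueError("an instruction block is 256 bits (32 bytes)")
    if not 0 <= pointer < 32:
        raise ValueError("pointer must fit in five bits")
    if pointer + length > 32:
        raise ValueError("operand would extend past the end of the block")
    return int.from_bytes(block[pointer:pointer + length], "big")
```

No access outside the buffered block is needed, which is the point of the technique.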

In addition to pseudo-immediate values, headers allow two basic sets of features to be added to the ISA that go beyond the RISC model.

The common VLIW feature of a break bit in association with each instruction can be added, which indicates that all the instructions in a block between a pair of break bits that are set to 1 may be executed in parallel.
Variable-length instructions, both in the sense of allowing 16-bit instructions to occur singly, and also allowing instructions longer than 32 bits, as in CISC architectures, are possible; the header explicitly indicates where instructions begin and end, so advanced algorithms and sophisticated logic design, and, even more importantly, the sequential decoding of instructions, are not required, permitting this to be implemented in a highly efficient manner.

Thus, while the architecture initially has the appearance of a conventional RISC architecture, it is intended to combine the basic features and advantages of RISC, CISC, and VLIW architectures.

Note, however, that by VLIW, I mean modern VLIW architectures, such as the Itanium or, even more particularly, the Texas Instruments TMS320C6000 chip, and not the type of classic VLIW architecture the term was originally conceived as referring to, such as that of the Control Data Cyber 200 computer.

Given that both the Itanium and the i860 were failures in the marketplace, despite being backed by the might of Intel, it is understandable that some might doubt my sanity in proposing a VLIW design in this day and age.

However, instead of including a break bit in every instruction, the break bits are in an optional header at the beginning of a 256-bit block of instructions. Implementations don't need to be designed around VLIW operation, but they can be, if they are aimed at a niche where a VLIW design is appropriate.

The Architecture

There are 32 integer general registers and 32 floating-point registers, and those instructions that perform arithmetic or logical operations include a bit for enabling changes to the condition codes as a result of those instructions. These are characteristics found in RISC architectures.

Having register banks of 32 registers allows different calculations to be intertwined in the code, and being able to control whether instructions affect the condition codes allows more intervening instructions between an instruction that sets the condition codes and a branch instruction that makes use of those results. Both of these things allow code to be designed to offer some of the same benefits as are obtained from out-of-order execution, without the hardware overhead. At the microprocessor clock rates in use today, these measures are normally not enough to be effective on their own; however, if code written this way is combined with simultaneous multi-threading (SMT), then there is still the potential for competing with out-of-order execution.

Also, the architecture provides extended register banks of 128 integer registers, 64 bits in width, and 128 floating-point registers, 128 bits in width, which will also promote efficient VLIW operation.

Block Organization

Instructions are organized into 256-bit blocks which contain eight 32-bit instruction slots.

These blocks are always aligned on 32-byte boundaries in memory, so an instruction slot that may contain the initial header of a block must have an address the last five bits of which are zero.
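
The address arithmetic implied by this alignment can be sketched in Python as follows; the function names are hypothetical and chosen only for the example.

```python
BLOCK_BYTES = 32  # one block is 256 bits


def block_base(address):
    """Address of the 32-byte block that contains the given byte address."""
    return address & ~(BLOCK_BYTES - 1)


def slot_index(address):
    """Which of the eight 32-bit slots (0-7) a 4-byte-aligned address names."""
    return (address & (BLOCK_BYTES - 1)) >> 2


def may_hold_initial_header(address):
    """True when the last five address bits are zero (slot 0 of a block)."""
    return (address & (BLOCK_BYTES - 1)) == 0
```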

When a block header makes provision for instructions longer than 32 bits, it is possible that these instructions may cross block boundaries, depending on the rules applicable to the particular block header format in use.

The instruction set is organized so that the computer is able to fetch a 256-bit block of instructions, process any block header within the block to determine what special processing, if any, is required, and then immediately begin decoding each 32-bit instruction slot independently of the others in the block.

There are several different types of block header, which are shown in the diagram below.

Sixteen types of header are illustrated in this diagram.

For ease of understanding, the headers may be divided into three groups.

The first group of header consists of those which apply attributes to the instructions in a block which consists only of 32-bit instructions.

These headers are illustrated in the diagram below:

and their descriptions follow:

The first type of header also functions as a two-operand register-to-register operate instruction, as well as a header which, with its decode field, specifies the number of 32-bit instruction slots at the end of the block which are not decoded as instructions, but are instead reserved for other purposes, such as the data values for pseudo-immediates.

The decode field is used to indicate the number of 32-bit instruction slots that are reserved for data other than instructions, such as pseudo-immediate values, for which no attempt is to be made to decode them as instructions. A value of 000 in the decode field indicates that all the remaining instruction slots are to be decoded as instructions; a value of 001 indicates the last instruction slot is to be reserved, and not decoded, and so on.
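
The effect of the decode field can be sketched in Python as below. The function names are hypothetical, and the sketch assumes the header occupies slot 0, so that slots 1 through 7 are the remaining ones.

```python
def decoded_slots(decode_field, first_slot=1):
    """Return the slot numbers that are decoded as instructions.

    decode_field is the three-bit count of slots reserved at the end of
    the block. first_slot defaults to 1 because this sketch assumes the
    header occupies slot 0.
    """
    if not 0 <= decode_field <= 7:
        raise ValueError("decode field must fit in three bits")
    return list(range(first_slot, 8 - decode_field))


def reserved_slots(decode_field):
    """Return the slot numbers left undecoded, for example for constants."""
    if not 0 <= decode_field <= 7:
        raise ValueError("decode field must fit in three bits")
    return list(range(8 - decode_field, 8))
```

A decode field of 001 thus leaves slots 1 to 6 to be decoded and reserves slot 7.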

An immediate value in an instruction allows it to perform an arithmetic operation involving a constant without having to perform a fetch of data from memory in addition to the fetching from memory already performed as part of reading in the instruction stream.

An important design goal of the Concertina II architecture has been to drastically simplify the decoding of instructions; once a 256-bit instruction block has been checked for a header, and that header, if present, has been processed, all the instructions in the block can be decoded in parallel independently. The varying lengths of different data types mean that including a wide selection of instructions with immediate values would conflict with this.

A pseudo-immediate is addressed by a pointer in the instruction, which seems to be the same thing as a memory-to-register instruction making use of a constant value stored somewhere else. However, the pointer is a short-range one, which only points to a location within the same 256-bit instruction block as the current instruction is contained in.

Therefore, although it involves a pointer reference, and thus is not "really" an immediate (hence the name "pseudo-immediate"), it provides the same advantage of the constant argument having been fetched as part of the instruction stream!

This first type of header reserves space for these constants which therefore won't be decoded erroneously as instructions, and because the header is also an instruction, it lets these three bits of information be provided without the overhead of using a full 32-bit instruction slot for a header and nothing else.

An A bit is present in this header type, to select instructions in the Augmented Short Instruction format; this was considered important as improving the quality of the short instructions available with VLIW features makes it more likely that it will be possible to craft blocks of code that could potentially take advantage of the full power of fourteen-way superscalar operation.

When this option is selected, a choice is available between a decode field, as previously explained, and a position field. The position field, if it contains a nonzero value, indicates which one of the remaining seven 32-bit instruction slots will contain an instruction in the alternate 32-bit instruction format. Thus, in this case, one may have either or both of the following for the block: all (or all but one) of the instructions in the augmented short instruction format, and one instruction in the alternate 32-bit format.

The fourth header type attempts to provide an alternative way to include 48-bit instructions in programs with a lower overhead than imposed by the third and fourth header types when only one of them is needed in a block.

The position field in the header indicates which instruction slot, from 1 to 7, of the remaining ones (the header occupies instruction slot 0) in the block, is to contain a 48-bit instruction. The first 32 bits of that instruction go in the instruction slot, and the last 16 bits go in the instruction end field of the header. A decode field is also provided, so that this type of block may make use of pseudo-immediates as well.

The ninth type of header provides supplementary information which allows the computer to provide VLIW functionality.

The primary feature of this type of header is to provide for VLIW features which can be used to accelerate the speed of instruction execution, particularly on lightweight implementations of the architecture which lack out-of-order execution.

There are seven bits marked B, for break; they correspond to the seven remaining 32-bit instruction slots in the block, and if a bit marked B is set, this indicates that the instruction in its corresponding instruction slot may not be executed in parallel with the instructions that precede it.
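
One way to picture the effect of the break bits is to split the seven remaining slots into groups whose members may execute together. The Python sketch below does this under two assumptions: that the first break bit belongs to slot 1, and that only one block is considered at a time (the relation between slot 1 and the instructions of the preceding block is not modelled). The function name is hypothetical.

```python
def parallel_groups(break_bits):
    """Split slots 1-7 into groups that may execute in parallel.

    break_bits is a sequence of seven 0/1 values; element 0 is assumed
    to belong to slot 1. A set bit starts a new group at that slot.
    """
    if len(break_bits) != 7:
        raise ValueError("expected one break bit per remaining slot")
    groups = []
    for slot, bit in enumerate(break_bits, start=1):
        if bit or not groups:
            groups.append([slot])     # cannot join the preceding instructions
        else:
            groups[-1].append(slot)   # may run alongside the current group
    return groups
```

With break bits 0010010, for instance, slots 1-2, 3-5, and 6-7 form three groups.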

Important note: it is intended that this ISA may be implemented in a number of ways. Specifically, in relation to the VLIW feature of the break bit, these three classes of implementations are possible:

Implementations without superpipelining (that is, pipelining of the execution of instructions; a pipeline that breaks instructions into fetch, decode, and execute, performing fetch and decode of subsequent instructions in parallel with the execution of one instruction is still possible) or superscalar capabilities, which simply execute instructions serially one after another, and thus ignore the break bit as they cannot execute instructions in parallel;
Implementations where the break bit materially speeds up execution, by allowing more efficient pipelining of instructions;
Implementations which have out-of-order execution, guided by a full set of interlocks, which do not require explicit guidance from break bits for the optimum execution of a sequence of instructions.

In consequence, any programs which would produce a different result on the first two types of implementation listed above are to be considered to be invalid programs which have been written incorrectly.

Thus, the architecture specification requires implementations to execute code which does not contain any explicit indications of parallel execution with sequential consistency.

When code does contain such indications, implementations may follow those indications, or they may execute the code sequentially, even if different results are produced in the two cases; it is the programmer's responsibility, if consistent model-independent execution of programs is desired, only to indicate parallelism where it does not lead to results different from those of completely sequential code.

In this header format, there is also a four-bit flag field. This indicates which of the sixteen flag bits may be used for predicating instructions in this block. A seven-bit predicated field indicates which instruction slots contain an instruction the execution of which is conditional, based on that flag bit. There is also a bit marked S, for sense; if that bit is zero, a predicated instruction will execute if and only if the selected flag bit is set (equal to 1); if it is one, the predicated instruction will instead execute if and only if the selected flag bit is cleared (equal to 0).
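
The predication rule just described can be written out as a small Python sketch. The function name and the argument layout are hypothetical, and the sketch assumes that the first bit of the predicated field belongs to slot 1.

```python
def slot_executes(slot, flags, flag_field, predicated, sense):
    """Decide whether the instruction in a slot (1-7) is executed.

    flags: sequence of sixteen 0/1 flag bits
    flag_field: four-bit number selecting one of those flags
    predicated: sequence of seven 0/1 values, element 0 assumed to be slot 1
    sense: the S bit; 0 executes on a set flag, 1 on a cleared flag
    """
    if not predicated[slot - 1]:
        return True                       # unconditional instruction
    flag = flags[flag_field]
    return flag == 1 if sense == 0 else flag == 0
```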

As noted for the headers of Type II, it is also true here that if this header is preceded by a prefix header, so that fewer instruction slots remain following this header than there are bits in the predicated field and break bits, then the bits are aligned to the right: the rightmost bit corresponds to the last instruction slot, and the appropriate number of the leftmost bits are the ones that are unused. Again, this is a general rule, applying in all similar cases.

Also present in this header type is an A bit, to select instructions in the Augmented Short Instruction format; this was considered important as improving the quality of the short instructions available with VLIW features makes it more likely that it will be possible to craft blocks of code that could potentially take advantage of the full power of fourteen-way superscalar operation. There is also a bit marked X. If this bit is set along with the A bit, it indicates that a modified version of the Augmented Short Instruction format is used, in which the opportunity to increase the size of the opcode fields in operate instructions is forgone so that what would be the second 0 bit in a pair of 15-bit instructions is instead available for use as an additional break bit. Parallelism can then be indicated for individual 15-bit instructions, and not just for the pair as a whole: the break bit associated with the instruction slot applies to the first 15-bit instruction, and the additional one, preceding the second 15-bit instruction, applies to that instruction.

The X bit may also be set when the A bit is not set. In the regular 32-bit instruction set, the initial bits 1110 are reserved to indicate one type of header, while a pair of 14-bit instructions is indicated by 1111 as initial bits. Therefore, when the X bit is set without the A bit, then 111 at the beginning of an instruction slot otherwise containing an unmodified instruction from the standard 32-bit instruction set, will indicate a pair of 14-bit instructions, and the following bit will function as the break bit for the second of those instructions.
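
The case of the X bit without the A bit can be illustrated with a Python sketch that splits a slot into its pair of 14-bit instructions. It rests on assumptions not stated above: that the most significant bit of the 32-bit word is the first bit of the slot, and that the two 14-bit instructions occupy the remaining 28 bits in order. The function name is hypothetical.

```python
def split_short_pair(slot_word, x_bit=False):
    """Split a 32-bit slot into two 14-bit instructions, if it holds a pair.

    Returns (first, second, break_bit), or None when the slot is not a pair.
    break_bit is None unless the X bit is set. The most significant bit of
    slot_word is treated as the first bit of the slot (an assumption).
    """
    top4 = (slot_word >> 28) & 0xF
    first = (slot_word >> 14) & 0x3FFF
    second = slot_word & 0x3FFF
    if x_bit:
        if (top4 >> 1) != 0b111:
            return None
        return first, second, top4 & 1     # fourth bit is the extra break bit
    if top4 != 0b1111:
        return None
    return first, second, None
```

Without the X bit, a slot beginning with 1110 is not a pair at all; with it, the same slot is a pair whose second instruction has a break bit of 0.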

The tenth header type allows it to be indicated if memory-reference instructions are to use scaled indexes.

When the use of scaled indexes is indicated for a memory-reference instruction, if that instruction is indexed, and its operand type is other than byte, the contents of the index register used are shifted left the appropriate number of times in order that the index is in units of the operand length before being added into the effective address.
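
In Python, the scaling of an index might be sketched as follows. The function name is hypothetical, operand lengths are assumed to be powers of two in bytes, and the 64-bit wraparound of the shifted value is an assumption of the sketch.

```python
def scaled_index(index_value, operand_bytes, scaled):
    """Return the index contribution to the effective address.

    operand_bytes must be a power of two (1, 2, 4, 8, ...).
    With scaling on, the index counts operands rather than bytes.
    """
    if operand_bytes < 1 or operand_bytes & (operand_bytes - 1):
        raise ValueError("operand length must be a power of two")
    shift = operand_bytes.bit_length() - 1 if scaled else 0
    return (index_value << shift) & ((1 << 64) - 1)
```

With an 8-byte operand and scaling on, an index of 5 contributes 40 bytes to the address; for byte operands the index is unchanged.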

In addition to a scaled bit corresponding to each of the remaining instruction slots, there is also an alternate bit. Normally, when the contents of the option field are 000, a bit that is set here indicates that the instruction in that slot is to be taken from the set of alternate 32-bit instructions.

When the option field contains 001 instead, a bit that is set in the alternate field instead indicates a 33-bit instruction, and the bits in the scaled field are instead used to supply the first bit of that instruction. (When an alternate bit is not set, the scaled bit still has its normal function.)

And when the option field contains 010, then a bit set in the alternate field instead indicates an instruction in the Augmented Short Instruction Mode.

This header also has a decode field.

The thirteenth type of header is also a prefix header, similar to the tenth type of header.

It blocks any instructions in the block from being the target of a branch operation, unless they correspond to bits that are set to 1 in the target field of the header.

There is also a scaled field, allowing this to be combined with indicating that instructions have scaled indexes. If the option field contains 1, however, this field has the function of an alternate field instead.

As it may be desired to combine indicating instructions as branch targets with other special header functions, if the bit marked F in the header is set to 1, it indicates that this header may be followed by another header.

Because the target field is only seven bits long, it assumes the block is composed of 32-bit instructions. Therefore, this type of block, although it can be followed by some other types of header, may not be followed by headers of types II, III, V, XI, or XII. However, it may be followed by a header of type VI, since while that creates a block with a single 48-bit instruction, all the instructions begin at the normal starting positions of 32-bit instructions.

Since it is this type of header which contains the F bit and no other, it must precede the other header with which it is combined; as with the type VII, VIII, and IX headers, the order may not be reversed.

Since headers have special bit patterns which are distinct from those of normal instructions, it was noted that when the F bit is set, this header may be followed by another header.

Specifically, this header may be followed by headers of types IV, IX, and X.

The fourteenth type of header uses the last dregs of available opcode space for headers to provide a truly limitless potential for expansion of the instruction repertoire of the Concertina II.

In this header, the option field is six bits long, allowing the final seven bits, each of which corresponds to one of the remaining instruction slots in the block, to take on up to 64 possible significations.

In the place of the bit marked F in the preceding two prefix headers, this header type has a bit marked C, which stands for "chain" rather than "following". If this bit is set, this header may also be followed by another header, but in addition to a header of type IV, VI, and XI, this header may also be followed by one of the other prefix headers, such as that of type XIII, and it may even be followed by another header of its own type, type XIV.

The fifteenth type of header illustrates how the use of a header extension, as described in the discussion of the header of type XI, for the header of type XIV allows two instruction slots to do the work of three, permitting three modifications to instruction slots to be specified, which would otherwise require chaining three headers of type XIII.

An additional type of header, which belongs to the first group of headers, was omitted for reasons of space from the general diagram of all header types which appeared above.

This type of header has the following formats:

In addition to the zero-overhead header of Type I which contains an operate instruction, these provide zero-overhead headers which include memory-reference instructions.

These memory-reference instructions are extremely restricted, as follows:

They may only have 12-bit displacements.
They may only use registers 17, 18, and 19 as base registers for those displacements.
They may only have aligned memory operands.
If indexed, they may only use register 1 as their index register.
Their destination register must be one of registers 0, 1, 2, or 3.

While there is no A bit in this header format, if the decode field contains 111, which would otherwise render the header unnecessary, as it indicates that all instruction slots are to be decoded, then the header has the function of indicating that the instructions in the remaining instruction slots are in the Augmented Short Instruction format.

The second group of headers consists of those which indicate that the current block will consist of variable-length instructions, with the length of each instruction being a multiple of 16 bits.

These headers are illustrated in the diagram below:

and their descriptions follow:

The second header format provides a three-bit prefix field for every 16 bits in the remainder of the block, and also includes a two-bit option field.

Here, in the case where the option bits are 00, the prefix fields are interpreted as follows:

000 a 17-bit instruction starting with 0
001 a 17-bit instruction starting with 1
010 augmented short format type instruction
011 not the start of an instruction
100 the start of a normal 32-bit instruction
101 alternate 32-bit instruction
110 a 33-bit instruction starting with 0
111 a 33-bit instruction starting with 1
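
The table above lends itself to a simple lookup. The Python sketch below finds where instructions start and joins a 17-bit instruction from its prefix and its 16 bits. It assumes that the bit supplied by the prefix is the first bit of the instruction and that the 16 bits in the block are the rest; the function names are hypothetical.

```python
NOT_A_START = 0b011  # prefix value meaning "not the start of an instruction"


def instruction_starts(prefixes):
    """Return the indices of the 16-bit units that begin an instruction."""
    return [i for i, prefix in enumerate(prefixes) if prefix != NOT_A_START]


def assemble_17_bit(prefix, halfword):
    """Join prefix 000 or 001 with its 16 bits to form a 17-bit instruction."""
    if prefix not in (0b000, 0b001):
        raise ValueError("prefix does not mark a 17-bit instruction")
    return ((prefix & 1) << 16) | (halfword & 0xFFFF)
```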

All instructions longer than 32 bits are allowed with this header format, since instructions longer than 32 bits do not need to be specifically indicated in the header. The instructions that are 48 bits long and longer all begin with 111, which distinguishes them from the standard 32-bit instructions. Note that since this is used to distinguish longer instructions from 32-bit instructions, pairs of 14-bit instructions are not available with this header format, which is, of course, not an issue since 17-bit instructions are available.

And while the augmented short format type instructions allow pairs of 15-bit instructions, this is also not useful because of the availability of 17-bit instructions. But this instruction format is still valuable to have here because it provides register-to-register operate instructions with longer opcodes.

In the case where the option field contains 10 or 11, this header is used with 18-bit instructions, to permit more effective use of the superscalar capabilities of the processor.

If the option field contains 10, the prefix bits are interpreted as follows:

000 a 17-bit instruction starting with 0
001 a 17-bit instruction starting with 1
010 the start of a normal 32-bit instruction (or a 48-bit or longer instruction)
011 not the start of an instruction
100 an 18-bit instruction starting with 00
101 an 18-bit instruction starting with 01
110 an 18-bit instruction starting with 10
111 an 18-bit instruction starting with 11

If the option field contains 11, the last two bits of the prefix fields are used to indicate the first two bits of each 18-bit instruction the block contains, and the block may only contain 18-bit instructions.

The first bit of each prefix field is used instead as a break bit, so that this header does not need to be preceded by a header of another type for VLIW functionality.
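
For the case where the option field contains 11, the split of each prefix field into a break bit and two instruction bits can be sketched in Python as below. The function name is hypothetical, and the sketch assumes the two prefix bits become the two most significant bits of the 18-bit instruction.

```python
def decode_option_11(prefix, halfword):
    """Return (break_bit, instruction) for one 18-bit instruction.

    prefix is the three-bit field; its first bit is the break bit and its
    last two bits supply the first two bits of the 18-bit instruction.
    """
    break_bit = (prefix >> 2) & 1
    instruction = ((prefix & 0b11) << 16) | (halfword & 0xFFFF)
    return break_bit, instruction
```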

The third type of header creates a block which can allow 32-bit instructions to be freely mixed with 17-bit short instructions. Some of the instructions longer than 32 bits are also allowed in this mode: specifically, those which are 64 bits long or longer, but not those which are only 48 bits long.

If a block begins with an instruction slot that begins with the bits 1110, that instruction slot contains this type of header.

Here, each of the fields marked pre corresponds to one of the remaining 16-bit halves of the seven remaining 32-bit instruction slots in the block.

If a pre field contains 0 as its first bit, then the corresponding 16 bits in the block are the last sixteen bits of a seventeen bit short-format instruction; the first bit of the instruction is the second bit in the pre field following the leading zero.

If a pre field contains 10, then the corresponding 16 bits in the block are normally the first 16 bits of a 32-bit instruction in the same standard format as is used when there is no block header, or with the other type of header described above. They may also be the first 16 bits of an instruction that is longer than 32 bits in length, as these instructions begin with 111 which distinguishes them from regular 32 bit instructions.

If a pre field contains 11, this indicates the corresponding 16 bits in the block are not to be decoded unless decoding is initiated by a preceding 16-bit field in the block. That is, they will be decoded if they are part of a 32-bit (or longer) instruction that began before it. Thus, in addition to containing the later parts of instructions, the 16-bit extents indicated by these pre bits may also be used for pseudo-immediate values.

Because pre bit values of 00, 01, and 10, in addition to initiating the decoding of instructions, also control execution, as only the instructions that are decoded can be executed, it is not necessary for pseudo-immediate values to be placed at the end of the block; they can be placed in any space that is indicated as not being decoded by a pre value of 11. As we shall see below, taking advantage of this opportunity is necessary in one case.
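
The three cases for the pre field can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the function name and the labels are hypothetical, and for instructions of 32 bits or more only the first 16 bits are recorded, since their full length depends on the instruction format itself.

```python
def scan_type_iii(pre_fields, halves):
    """List the instructions that start in a block with a 2-bit-prefix header.

    pre_fields: 2-bit values (0-3), one per 16-bit half slot.
    halves:     the corresponding 16-bit values from the block.
    Returns (index, kind, bits) tuples for each half slot where decoding starts.
    """
    if len(pre_fields) != len(halves):
        raise ValueError("one pre field is needed per 16-bit half slot")
    found = []
    for i, (pre, half) in enumerate(zip(pre_fields, halves)):
        if pre in (0b00, 0b01):
            # Leading 0: a 17-bit short instruction whose first bit is the
            # second pre bit, followed by the 16 bits in the block.
            found.append((i, "short17", ((pre & 1) << 16) | half))
        elif pre == 0b10:
            # Start of a 32-bit or longer instruction; only its first 16 bits
            # are recorded here.
            found.append((i, "long-start", half))
        # pre == 0b11: not decoded on its own (instruction tail or
        # pseudo-immediate data), so nothing is recorded.
    return found
```

Half slots marked 11 never appear in the result, which is why a pseudo-immediate can sit in any such position without being mistaken for an instruction.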

Instructions in blocks of this format may be the targets of jump and jump to subroutine instructions; their addresses are always those of the first 16-bit part of the instruction, with the first bit in the header for 17-bit instructions not being considered.

Because the positions where instructions start are explicitly indicated, instructions may cross block boundaries in this type of block.

Because this type of block header does not contain a decode field, any instruction that will continue into the next block must be located at the physical end of the block. Therefore, if pseudo-immediates are also used in the specific case of a block containing a partial instruction at the end, then they must be placed between instructions instead.

When a header of this type is preceded by a prefix header, fewer 16-bit half slots will remain following this header than there are prefix fields within it. In this case, the rightmost prefix field will correspond to the rightmost half slot in the block, and the unused prefix fields will be the leftmost ones. This is a general rule applying to all similar header types.
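
The right-alignment rule can be expressed as a short Python sketch. The function name is hypothetical, and the prefix fields are assumed to be given as a list ordered from leftmost to rightmost.

```python
def active_prefix_fields(prefix_fields, remaining_half_slots):
    """Return the prefix fields that are actually used.

    The rightmost field pairs with the rightmost half slot, so the unused
    fields are the leftmost ones.
    """
    if remaining_half_slots > len(prefix_fields):
        raise ValueError("more half slots than prefix fields")
    if remaining_half_slots == 0:
        return []
    return prefix_fields[-remaining_half_slots:]
```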

The eighth type of header provides four prefix bits for each 16 bits of the remaining part of the block.

When what would have been the option field in the header begins with 11, the prefix fields are interpreted as follows:

0000 a 17-bit instruction starting with 0
0001 a 17-bit instruction starting with 1
0010 a special 16-bit instruction
0011 not the start of an instruction
0100 the start of a normal 32-bit instruction
0101 alternate 32-bit instruction
0110 a 33-bit instruction starting with 0
0111 a 33-bit instruction starting with 1
1000 a 35-bit instruction starting with 0
1001 a 35-bit instruction starting with 1
1010 a 53-bit instruction starting with 0
1011 a 53-bit instruction starting with 1
1100 supplemental bits 00 for a 35- or 53- bit instruction
1101 supplemental bits 01 for a 35- or 53- bit instruction
1110 supplemental bits 10 for a 35- or 53- bit instruction
1111 supplemental bits 11 for a 35- or 53- bit instruction

The last eight codes, additional to those available in headers of type III, allow a 35-bit instruction to occupy two 16-bit half slots, with the first prefix code corresponding to those slots contributing one additional bit, and the second one contributing two additional bits, and they allow a 53-bit instruction to occupy three 16-bit half slots, again with the first prefix code contributing one additional bit, and the next two prefix codes each contributing two more bits.
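
The bit counts in the preceding paragraph can be checked with a small Python sketch. The function name is hypothetical; the sketch only adds up lengths and makes no assumption about where the contributed bits are placed within the instruction.

```python
def extended_length(prefix_codes):
    """Total bit length of one 35- or 53-bit instruction.

    prefix_codes: the 4-bit prefix codes of the consecutive 16-bit half
    slots the instruction occupies, first half slot first.
    """
    first, rest = prefix_codes[0], prefix_codes[1:]
    if first >> 1 == 0b100:
        expected_halves = 2        # 1000 / 1001: 35-bit instruction
    elif first >> 1 == 0b101:
        expected_halves = 3        # 1010 / 1011: 53-bit instruction
    else:
        raise ValueError("first code does not start a 35- or 53-bit instruction")
    if len(prefix_codes) != expected_halves:
        raise ValueError("wrong number of half slots")
    if any(code >> 2 != 0b11 for code in rest):
        raise ValueError("later half slots need supplemental codes 11xx")
    # The first code adds one bit; each supplemental code adds two bits.
    return (1 + 16) + sum(2 + 16 for _ in rest)
```

That is, 17 + 18 = 35 bits for two half slots, and 17 + 18 + 18 = 53 bits for three.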

When what would have been the option field in the header begins with 11, the header contains a bit marked C. This indicates that, for all operate instructions in the block of a type such that the opcode field may indicate floating-point instructions with the default Standard format for floating-point numbers, but which cannot, perhaps because of not being long enough, indicate floating-point instructions in the Compatible floating-point format, those instructions are interpreted as being for the Compatible floating-point format instead.

This unusual option is provided for this particular block format as it is focused on providing instructions which perform the same operations as those of a particular popular mainframe architecture.

There is also a bit marked T, which causes floating-point operations in the Compatible floating-point type performed by instructions within the block with this header, instead of having normal rounding behavior (which is to be rounded to the value closest to the exact result, as specified in IEEE 754, for addition, subtraction, and multiplication, and to a result within 1/64 of the units in the last place of the exact result for division, just as in the case of the Standard floating-point type), to be truncated, for further compatibility. This bit does not affect any instructions in which the rounding type is explicitly indicated in the instruction.

In addition, this header type contains a bit marked E. Its purpose is specific to the kind of instructions that are added to the instruction set by this type of header with 11 beginning what would have been the option field. When this bit is set, for the instructions in the block, the first sixteen floating-point registers, as seen by the program, are changed from 128-bit registers to 64-bit registers, and they are placed in pairs in the first eight actual 128-bit floating-point registers of the machine. Registers 16 through 31 are not changed.

This affects instructions operating on both the Standard and the Compatible floating-point formats.

The purpose of this is to enable code with this header type to interface with code running in emulation mode for one particular computer architecture.

The locations of the 64-bit registers within the 128-bit registers are shown in the table below:

128-bit   64-bit
register  registers

0         0,2
1         4,6
2         8,10
3         12,14
4         1,3
5         5,7
6         9,11
7         13,15
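
The table follows a regular pattern, which the Python sketch below reproduces. The function name is hypothetical, and the "position" it returns is simply the order in which the two 64-bit registers are listed in each row of the table; which half of the 128-bit register that corresponds to is not assumed here.

```python
def locate_64bit_register(r):
    """Map a program-visible 64-bit register (0-15) to its physical home.

    Returns (physical_128bit_register, position), where position 0 is the
    first register listed in the table row and 1 is the second.
    """
    if not 0 <= r <= 15:
        raise ValueError("only registers 0 through 15 are remapped")
    physical = (r & 1) * 4 + (r >> 2)   # odd-numbered registers live in 4-7
    position = (r >> 1) & 1             # second of the pair when bit 1 is set
    return physical, position
```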

This arrangement stems from the historical characteristics of the architecture being emulated; originally, it only had four floating-point registers, and they were numbered 0, 2, 4, and 6, and so when a register pair was needed, only even-numbered registers were available out of which to build it.

A pictorial representation of this arrangement is shown below:

The rightmost portion of the image is after Figure 2-2 on page 2-5 of the Ninth Edition of Enterprise Systems Architecture/390 Principles of Operation, publication SA22-7201-08, by IBM.

A block header with this bit set also modifies the behavior of floating-point instructions involving the Standard floating-point type in another important way. Since the first sixteen registers are now only 64 bits long, floating point values in these registers will be in the same form as they are kept in main memory, and will not be converted to internal form on being loaded, and from internal form on being stored.

The conversion will remain in place for registers 16 through 31, since the purpose of this block format is to facilitate communication between programs running in emulation mode and ordinary programs. Thus, instructions operating on 128-bit floats in the Standard floating-point type will continue to use the internal form of floats without a hidden first bit, rather than the IEEE 754 standard format for 128-bit floats.

When the option field contains 010 or 011, the block instead allows the use of 19-bit instructions, which facilitate the use of the extended register banks for enhancing superscalar operation.

When the option field contains 001, the prefix bits are interpreted as follows:

0000 a 17-bit instruction starting with 0
0001 a 17-bit instruction starting with 1
0010 augmented short format type instruction
0011 not the start of an instruction
0100 the start of a normal 32-bit instruction
0101 alternate 32-bit instruction
0110 a 33-bit instruction starting with 0
0111 a 33-bit instruction starting with 1
1000 supplemental bits 000 for 32-bit, 48-bit, and 64-bit instructions
1001 supplemental bits 001 for 32-bit, 48-bit, and 64-bit instructions
1010 supplemental bits 010 for 32-bit, 48-bit, and 64-bit instructions
1011 supplemental bits 011 for 32-bit, 48-bit, and 64-bit instructions
1100 supplemental bits 100 for 32-bit, 48-bit, and 64-bit instructions
1101 supplemental bits 101 for 32-bit, 48-bit, and 64-bit instructions
1110 supplemental bits 110 for 32-bit, 48-bit, and 64-bit instructions
1111 supplemental bits 111 for 32-bit, 48-bit, and 64-bit instructions

simply allowing the opcode field to be extended by three bits for 32-bit instructions, six bits for 48-bit instructions, and nine bits for 64-bit instructions, thus making many more instructions available with this header.
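
The figures of three, six, and nine bits follow from the number of 16-bit half slots after the first one, as this Python sketch shows. The function name and the bits_per_code parameter are hypothetical; the parameter is included only so the same arithmetic can be tried with narrower supplemental codes.

```python
def supplemental_opcode_bits(instruction_bits, bits_per_code=3):
    """Extra opcode bits gained by a 32-, 48-, or 64-bit instruction."""
    if instruction_bits not in (32, 48, 64):
        raise ValueError("only 32-, 48-, and 64-bit instructions are covered")
    half_slots = instruction_bits // 16
    # The first half slot carries the code that starts the instruction;
    # each later half slot can carry one supplemental code.
    return (half_slots - 1) * bits_per_code
```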

Since the alternate 32-bit instructions and the 33-bit instructions are fully defined by their first set of prefix bits, they may also receive supplemental bits in this header type.

Since the supplemental bits only extend the opcode fields of instructions they modify, and do not alter the format of the instruction, they cannot remedy other basic limitations of a given instruction format. Thus, for example, the alternate 32-bit instructions include memory-to-register operate instructions which can only have the first eight registers as their destination registers. Supplemental bits may allow these instructions to perform additional operations on additional data types, but they cannot address the limitation to the first eight registers. Addressing this issue is done by means of the next available value for the option field, described below, where it becomes possible to use supplemental bits with 35-bit instructions.

When the option field contains 011, the prefix bits are interpreted as follows:

0000 a 17-bit instruction starting with 0
0001 a 17-bit instruction starting with 1
0010 the start of a normal 32-bit instruction
0011 not the start of an instruction
0100 supplemental bits 00 for 32-bit, 48-bit, and 64-bit instructions
0101 supplemental bits 01 for 32-bit, 48-bit, and 64-bit instructions
0110 supplemental bits 10 for 32-bit, 48-bit, and 64-bit instructions
0111 supplemental bits 11 for 32-bit, 48-bit, and 64-bit instructions
1000 a 35-bit instruction starting with 000
1001 a 35-bit instruction starting with 001
1010 a 35-bit instruction starting with 010
1011 a 35-bit instruction starting with 011
1100 a 35-bit instruction starting with 100
1101 a 35-bit instruction starting with 101
1110 a 35-bit instruction starting with 110
1111 a 35-bit instruction starting with 111

This adds supplemental bits, but fewer of them, to many instruction types in a similar manner to the previous type of header. Because it allows a 35-bit instruction to be fully defined by its first set of prefix bits, it allows those instructions to receive supplemental bits.

This is important, because supplemental bits only extend the opcode of an instruction, so they can't make up for other deficiencies in an instruction's format; for example, they can't add a C bit to a load/store instruction to allow it to be used as an operate instruction.

When the option field contains 100, the prefix bits are interpreted as follows:

0000 a 17-bit instruction starting with 0
0001 a 17-bit instruction starting with 1
0010 augmented short format type instruction
0011 not the start of an instruction
0100 the start of a normal 32-bit instruction
0101 alternate 32-bit instruction
0110 a 33-bit instruction starting with 0
0111 a 33-bit instruction starting with 1
1000 a 19-bit instruction starting with 000
1001 a 19-bit instruction starting with 001
1010 a 19-bit instruction starting with 010
1011 a 19-bit instruction starting with 011
1100 a 19-bit instruction starting with 100
1101 a 19-bit instruction starting with 101
1110 a 19-bit instruction starting with 110
1111 a 19-bit instruction starting with 111

and when, instead, the option bits are 101, then the first prefix bit serves as a break bit, and the remaining three bits are the ones prefixed to the corresponding 16 bits to form a 19-bit instruction.
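
A minimal Python sketch of how a 19-bit instruction would be formed under these two option values is given below. The function name is hypothetical, and returning None for the break bit under option 100 is merely a convention of this example.

```python
def form_19bit(prefix, half, option):
    """Build a 19-bit instruction from a 4-bit prefix and a 16-bit half slot.

    option is the 3-bit option field value (0b100 or 0b101).
    Returns (break_bit, instruction); break_bit is None under option 100.
    """
    if option == 0b100:
        if prefix >> 3 != 1:
            raise ValueError("under option 100 only codes 1xxx are 19-bit")
        break_bit = None
    elif option == 0b101:
        break_bit = prefix >> 3
    else:
        raise ValueError("option must be 0b100 or 0b101")
    # The low three prefix bits become the first three instruction bits.
    return break_bit, ((prefix & 0b111) << 16) | half
```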

What these two values of the option code do for 19-bit instructions is similar to what the Type III header does for 18-bit instructions.

The eleventh type of header provides a two-bit prefix field for each 16 bit extent in the remainder of the instruction block. If the option bits in the header are 00, they have the same meaning as in the header of type II.

The length field in this header type, if it contains a value other than 000, permits the use of a very specialized feature of this ISA. It is intended to facilitate emulation, or either running or converting legacy code in higher-level languages, and is not expected to be included on most implementations of this architecture.

Use of this feature requires the operating system to allocate memory to programs that differs in width from what is usually provided (typically 64, 128, or 256 bits in width). This is made possible either through use of such features as three-channel memory, or by ignoring some bits of each word of memory, and multiplying or dividing each memory address, relative to a given starting point in memory, by such numbers as three, nine, five, or fifteen.

The values of the length field affect the sizes of variables in various data types as follows:

       Byte Halfword Word Long  Single  Medium  Double  Extended

000       8       16   32   64      32      48      64       128

001       6       12   24   48      36      48      60       120
010       6       12   24   48      36      48      72       120
011       6       12   24   48      48      60      96       120

100       9       18   36   72      36      54      72       108

and for programs to successfully use this feature, they will need to use base registers that point into memory of the appropriate size that has been allocated to them.
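
For reference, the table of sizes can be transcribed into a small Python lookup. The names used here are hypothetical; the numbers are copied directly from the table.

```python
TYPE_NAMES = ("Byte", "Halfword", "Word", "Long",
              "Single", "Medium", "Double", "Extended")

# Sizes in bits, one row per length field value, copied from the table.
SIZES_BY_LENGTH_FIELD = {
    0b000: (8, 16, 32, 64, 32, 48, 64, 128),
    0b001: (6, 12, 24, 48, 36, 48, 60, 120),
    0b010: (6, 12, 24, 48, 36, 48, 72, 120),
    0b011: (6, 12, 24, 48, 48, 60, 96, 120),
    0b100: (9, 18, 36, 72, 36, 54, 72, 108),
}

def size_in_bits(length_field, type_name):
    """Look up the size of a data type for a given length field value."""
    row = SIZES_BY_LENGTH_FIELD[length_field]
    return row[TYPE_NAMES.index(type_name)]
```

In every row, each size is a whole multiple of that row's byte size.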

However, programs using a special length value, one other than zero, still need to be able to have immediate values. As well, constants may be placed in code segments, rather than in the data segment where variables are contained, to simplify the task of the program loader. Therefore, stores and loads of data of one length type to and from memory of a different length type do need to have a well-defined action which produces useful and consistent results.

That action is: a wrong-length value is padded on the right with zeroes, so as to fit in the minimum-size addressable unit of the memory in which it is to be stored, and it is stored aligned to the left at the destination address. And it is read in the corresponding manner, starting from the left.
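
The padding rule can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that "fit" means using the smallest whole number of addressable units that can hold the value.

```python
def store_padded(value, value_bits, unit_bits):
    """Left-align value in the smallest whole number of addressable units.

    Returns (stored, stored_bits): the zero-padded bit pattern and its width.
    """
    units = -(-value_bits // unit_bits)          # ceiling division
    stored_bits = units * unit_bits
    return value << (stored_bits - value_bits), stored_bits

def load_padded(stored, stored_bits, value_bits):
    """Read value_bits back, starting from the left of the stored pattern."""
    return stored >> (stored_bits - value_bits)
```

For example, a 32-bit value stored into memory made of 9-bit units would occupy four units, or 36 bits, with four zero bits added on the right.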

The twelfth type of header allows the length field to be used with a greater variety of instructions, by associating a three-bit prefix field with each 16 bits in the remainder of the block, and having a four-bit option field.

The third group of headers consists of those which apply attributes to the instructions in a block consisting of variable-length instructions.

Headers in this group are always prefix headers, as one or more headers in this group must ultimately be followed by a header in the second group.

These headers are illustrated in the diagram below:

and their descriptions follow:

The fifth type of header has the same basic function as the thirteenth type of header, described previously as one of the first group of headers: if the option field contains 0000, it blocks all instructions in a block from being the targets of branch operations except those specifically indicated by a 1 in the target field.

As this instruction assumes instructions begin on 16-bit boundaries in the block, it will normally be followed by a header of type II, type III, type VIII, type XI, or type XII. However, it is possible to omit the second header; in this case, the only use to which this additional functionality will be put is to allow the first and second 14-bit instructions in an instruction slot containing two such instructions to be indicated.

Note that branching also treats the two 14-bit instructions in a 32-bit instruction slot as if they occupied two successive 16-bit units of storage, even though that is not actually the case.

Note that the type II, type III, type VIII, type XI, and type XII headers modify the interpretation of instructions that follow them in such a way as to make it impossible to indicate the presence of a header instruction, so it is not possible to reverse the order of these headers.

The bit marked H, if set, causes the contents of the block to be saved in the same special buffer register as for the first, third, and fifth header types. If this header is followed by a header of type V or type X, the H bit in this header takes precedence over the one in that header which follows.

If the option field contains 0001 then the target field is used instead as a previous field: if pointers to pseudo-immediates, or other similar pointers, are to point into the register containing a saved previous block instead of into the current block, for a given instruction, then the bit corresponding to the first 16 bits of that instruction is to be set in that field.

It is only when the bit marked I is set that the block used will be the one saved as the result of a bit marked H being set. Otherwise, the previous block used is instead the immediately preceding block; all blocks are saved in a second buffer register, but only for the use of the next block to be read in and executed. This allows the overhead of having a header with the H bit set in the earlier block used to be avoided in most cases.

The sixth type of header has part of the same basic function as the ninth type of header: it provides information which allows the computer to provide VLIW functionality. Like the eighth type of header, however, it is designed to work with blocks in which instructions begin on 16-bit boundaries, and so it should normally be followed by a header of type II, type III, type VIII, type XI, or type XII, with the same exception as noted for the sixth type of header.

The seventh type of header provides the same type of supplementary information for VLIW functionality as the ninth type of header, but as a prefix to the type II, type III, type VIII, type XI, and type XII headers, so as to provide this functionality for variable-length instructions as well.

It also contains a bit marked A. This indicates that the normal 17-bit short instructions are replaced by an alternate set of 17-bit short instructions designed to make use of the extended register banks. This is because having more registers available enables a program to work better with the explicit indication of parallelism, by allowing more instructions to be placed between a given instruction and a later instruction which depends on its result.

The sixteenth type of header provides the same limitless possibilities as provided by the twelfth type of header to variable-length instructions.

This header may precede headers of types V, VI, and VII in addition to headers of types II, III, V, and X. Just like the headers of types V, VI, and VII, it must be followed (ultimately, in its case) by a header of type II, III, VIII, XI, or XII; therefore, there is no C bit in this header, just as there was no F bit in the headers of types V, VI, and VII.

It seems to be in order, given the number of cases involved, to include this table showing the availability of instructions by length and block type:

Header     Length

           14  15  16  17  18  19  24  30  32  33  35  48  53  64, 80, 96
No Header  *                               *
      I    *   *                           *
      II   *           *   *               *   *       *       *
      III              *                   *           *       *
      IV   *                               *           *
      V    (followed by II, III, VIII, XI, or XII)
      VI   (followed by II, III, VIII, XI, or XII)
      VII  (followed by II, III, VIII, XI, or XII)
      VIII *       *   *       *           *   *       *   *   *
      IX   *                               *  
      X    *   *                           *   *
      XI               *                   *           *       *
      XII              *                   *           *       *
      XIII *   +                           *   +       #  (+: only if followed by X) (#: only if followed by IV)
      XIV  *   *                           *   *       #  (#: only if followed by IV)
      XV   (followed by II, III, VIII, XI, or XII)
      XVI  (followed by II, III, VIII, XI, or XII)

However, while 14-bit instructions are available in blocks with header type III, IV, and VI, their use is not recommended, as they are completely superseded by 17-bit instructions. (The bit pattern 111 indicates instructions that are 48 bits or longer in the same instruction slots which could be occupied by 32-bit instructions, which takes away the opcode space where 14-bit instructions are located.)

Registers and Data Formats

The basic complement of registers included with this architecture is as follows:

There are 32 integer registers, each of which is 64 bits in length, numbered from 0 to 31.

Registers 1 through 7 may be used as index registers.

Registers 25 through 31 may be used as base registers, each of which points to an area of 65,536 bytes in length.

Registers 17 through 23 may be used as base registers, each of which points to an area of 4,096 bytes in length.

At least part of the area of 3,072 bytes in length pointed to by register 16 will normally be used to contain up to 384 pointers, each 64 bits in length, for use in either Array Mode addressing or Address Table addressing.

Registers 9 through 15 may be used as base registers, each of which points to an area of 1,048,576 bytes in length. This addressing format is used for 48-bit extended memory-reference instructions.
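
The register roles listed above can be summarized in a short Python sketch. The function name is hypothetical, and returning None simply means that the register is not described above as a base register.

```python
def base_register_area(register):
    """Size in bytes of the area addressed through a base register, or None."""
    if 25 <= register <= 31:
        return 65_536        # 2**16 bytes
    if 17 <= register <= 23:
        return 4_096         # 2**12 bytes
    if register == 16:
        return 3_072         # area holding the pointer table
    if 9 <= register <= 15:
        return 1_048_576     # 2**20 bytes, 48-bit extended instructions
    return None

POINTER_BYTES = 64 // 8      # each pointer is 64 bits long
```

The figure of 384 pointers follows from 3,072 bytes divided by 8 bytes per pointer.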

There are 32 floating-point registers, each of which is 128 bits in length, numbered from 0 to 31.

Floating point numbers in IEEE 754 format have exponent fields of different length, depending on the size of the number. For faster computation, floating-point numbers are stored in floating-point registers in an internal form which corresponds to the format in which extended precision floating-point numbers are stored in memory: with a 15-bit exponent field, and without a hidden first bit in the significand.

As 128-bit extended floating-point numbers are already in this format in memory, all floating-point numbers will fit in a 128-bit register, although shorter floating-point numbers are expanded.
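
To make the expansion concrete, here is a hedged Python sketch that expands a normal IEEE 754 double into a sign, a 15-bit exponent, and a significand with an explicit leading bit. Several details are assumptions not stated above: an exponent bias of 16383, a 112-bit significand (128 bits less one sign bit and 15 exponent bits), and the function name itself. Zero, subnormal numbers, infinities, and NaNs are deliberately left out.

```python
import struct

def double_to_internal(x):
    """Expand a normal IEEE 754 double into (sign, exponent15, significand112).

    Assumptions (not from the text): exponent bias 16383, and a 112-bit
    significand whose top bit is the explicit leading 1.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent in (0, 0x7FF):
        raise ValueError("only normal numbers are handled in this sketch")
    exponent15 = exponent - 1023 + 16383         # rebias to 15 bits
    # Make the hidden bit explicit, then left-align in 112 bits.
    significand = ((1 << 52) | fraction) << (112 - 53)
    return sign, exponent15, significand
```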

However, the 32 floating-point registers may also be used for Decimal Floating-Point (DFP) numbers. These numbers will also be expanded into an internal form for faster computation, but that internal form may take more than 128 bits.

This is dealt with as follows: Only 24 DFP numbers that are 128 bits in length may be stored in the 32 floating-point registers. When such a DFP number is stored in an even-numbered register, it is stored in that register, and the first 32 bits of the following register. When it is stored in a register the number of which is of the form 4n + 1 for integer n, the first 84 bits of the internal form of that number are stored in the last 84 bits of that register, and the remainder of the internal form of that number is stored in the last 84 bits of the second register after that register.
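
The placement rule can be checked with a small Python sketch that records which bits of which registers each DFP number occupies. The function name is hypothetical, and bit 0 is taken to be the first bit of a 128-bit register.

```python
def dfp_extent(register):
    """Bit ranges occupied by a 128-bit DFP number held in `register`.

    Returns a list of (register, first_bit, bit_count) tuples.
    """
    if not 0 <= register <= 31:
        raise ValueError("register must be 0 through 31")
    if register % 2 == 0:
        # The whole register plus the first 32 bits of the next one.
        return [(register, 0, 128), (register + 1, 0, 32)]
    if register % 4 == 1:
        # The last 84 bits of this register and of the second one after it.
        return [(register, 128 - 84, 84), (register + 2, 128 - 84, 84)]
    raise ValueError("registers of the form 4n + 3 cannot hold a DFP number")
```

Sixteen even-numbered registers plus eight registers of the form 4n + 1 give the 24 usable positions, and none of the ranges returned overlap.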

In this way, the same principle as that behind storing double-length numbers in two adjacent registers is respected: numbers too long to be stored in a given register are stored in that register, and in another register of the same register file that is nearby. But the method is extended to allow more efficient use of the available space.

The same technique is used for the 128-bit floating-point format which has recently been added to IEEE 754 which does have a hidden first bit; therefore, in order to support this format, the usual 128-bit floating-point format offered by this architecture, while similar to, and based on, the Temporary Real format of the original 8087 coprocessor, has an exponent field that is one bit longer than that of the Temporary Real format.

There are 16 short vector registers, each of which is 256 bits in length.

Each of these registers may contain:

Two 128-bit floating-point numbers.
Four 64-bit floating-point numbers.
Four 64-bit integers.
Eight 32-bit floating-point numbers.
Eight 32-bit integers.
Sixteen 16-bit integers.
Thirty-two 8-bit integers.

As well, they may contain sixteen 16-bit short floating-point numbers in one of two formats.

These numbers all remain in these registers in the same format as that in which they appear in memory.

The entire set of 16 short vector registers can contain a table of bits used for bit-matrix-multiply operations on 64 bit binary words. As well, the short vector registers may also be used as four string registers, each 128 bytes in length.
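
The capacities mentioned above are simple arithmetic, shown here as a Python sketch with hypothetical names.

```python
REGISTER_BITS = 256
REGISTER_COUNT = 16

def lanes(element_bits):
    """Number of elements of a given width in one short vector register."""
    return REGISTER_BITS // element_bits

total_bits = REGISTER_BITS * REGISTER_COUNT      # bits in the whole set
matrix_rows = total_bits // 64                   # rows of 64 bits each
string_registers = (total_bits // 8) // 128      # strings of 128 bytes each
```

The whole set therefore holds enough bits for 64 rows of 64 bits, or equally for four strings of 128 bytes.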

This is done, rather than using them as two string registers, each containing 256 bytes, because four registers are the minimum number of registers required for the general register style of operations, at least as claimed in advertising literature for the Data General Nova. Having these strings only half the maximum length of those available to memory-to-memory string operations is presumed to be acceptable, since strings "really" only have to be at least 80 characters long, as everyone knows.

In addition to the basic set of registers, two other larger sets of registers are also included in the architecture:

A set of 128 64-bit integer registers, and a set of 128 128-bit floating point registers.

A set of 8 vector registers, each of which contains 64 storage locations for floating-point numbers, each one 80 bits wide. This allows the computer to process vectors of 72-bit floating-point numbers in addition to vectors of 64-bit floating-point numbers, if the optional variable memory width feature is included.

As for how data values are stored in memory:

Signed integer values are stored in binary two's complement format.

Floating-point numbers are stored in IEEE 754 format, but in addition there are instructions for processing data in the format originally used by IBM's System/360 computers, including the Extended Precision format introduced on the Model 85.

The architecture is big-endian: the most significant bits of a value are stored in the byte at the lowest numbered address.
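
As a brief illustration of two's complement values in big-endian byte order, the following Python sketch uses the built-in int.to_bytes method; the wrapper function name is hypothetical.

```python
def to_memory_bytes(value, size_bytes=8):
    """Two's complement, big-endian byte sequence for a signed integer."""
    return value.to_bytes(size_bytes, byteorder="big", signed=True)
```

The first byte of the result is the one that would be stored at the lowest-numbered address.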

As well, there are 16 flag bits which are used for instruction predication, and of course there is a 64-bit program counter. The program status quadword includes eight sets of condition codes, and the program counter and flag bits are also part of the program status quadword.

32-bit Instruction Format