This is now my seventh attempt to propose a successor to my original Concertina architecture.
I hope that this time I have found a way to achieve the goals I have set for myself while avoiding excessive complexity. The basic instruction formats for this architecture have the form:
All instructions are, or at least start out as, 32 bits in length, and the way in which they are processed is organized to be suitable for an implementation in which instructions are fetched eight at a time, in blocks of 256 bits.
The intent of this design is that any portion of a program which consists only of instructions that are 32 bits long may be executed without any overhead caused by the possibility of instructions of other lengths being present. The principle used to achieve this is as follows. When, and only when, instructions of other lengths are needed, one or more instructions in the preceding block indicate how many of the first few 32-bit instruction words in the next block are to be skipped over, because they contain data additional to normal 32-bit instructions. The longer instructions themselves appear in the instruction stream as normal 32-bit instructions, but each contains a pointer to the additional information it requires within that skipped-over part of the block.
The first two instruction formats are similar to those of many RISC architectures. There are memory-reference instructions, and register-to-register operate instructions that work with a bank of 32 registers.
The bit in the register-to-register instructions marked B, if it is zero, indicates that the instruction is guaranteed not to be dependent on the preceding instruction. This allows more rapid processing of instructions; they are considered to be grouped in blocks, where the first instruction of a block either has the B bit set, or is of a type without a B bit. Every instruction within a block must be one that can be safely executed in parallel.
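As an illustration only, the grouping rule for the B bit can be sketched in a few lines of Python. The function name and the use of None to stand for an instruction type without a B bit are assumptions made for this sketch, not part of the architecture.

```python
from typing import List, Optional


def group_by_b_bit(b_bits: List[Optional[int]]) -> List[List[int]]:
    """Split an instruction sequence into groups that may run in parallel.

    b_bits[i] is 1 or 0 for an instruction that has a B bit, and None for an
    instruction type without one (a hypothetical encoding for this sketch).
    Returns one list of instruction indices per group.
    """
    groups: List[List[int]] = []
    for i, b in enumerate(b_bits):
        # A group starts at an instruction with B set, at an instruction
        # without a B bit, or at the very first instruction of the sequence.
        starts_group = b is None or b == 1 or not groups
        if starts_group:
            groups.append([i])
        else:
            groups[-1].append(i)
    return groups


print(group_by_b_bit([1, 0, 0, None, 0, 1, 0]))
# prints [[0, 1, 2], [3, 4], [5, 6]]
```

Each inner list is a run of instructions that the encoding promises are independent of their predecessors within the run.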
The register-to-register instructions, as well as the augmented memory reference instructions, which may also perform calculations instead of merely loading or storing data, also have a bit marked C, which must be set to allow the instruction to change the condition codes.
Unlike most RISC architectures, but like the System/360, memory-reference instructions offer full base-index addressing.
There are 32 integer data registers, each one 64 bits wide, but only 8 address registers, also each 64 bits wide. As well, there are 32 floating-point data registers, each one 128 bits wide.
The base register field refers to one of the eight address registers, except that address register zero is not used as a base register.
The index register field indicates that indexing is not taking place if it contains all zeroes. Otherwise, it indicates the register that is to be used as an index register as follows:
001 Integer data register 1
010 Integer data register 2
011 Integer data register 3
100 Address register 0
101 Address register 1
110 Address register 2
111 Address register 3
Thus, some of the index registers are among the integer data registers, to allow index values to be the result of complex calculations. Since many programs will not require seven different base register values, however, some of the address registers may also be used as index registers; and, specifically, address register 0, which would otherwise not be useful, is allowed to serve as an index register.
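The mapping of the three-bit index field can be written out as a small Python sketch. The function names are hypothetical. The address calculation also assumes a System/360-style sum of base, index, and displacement that wraps at 64 bits. Since the meaning of a base field of zero is not covered here, the sketch simply rejects it.

```python
from typing import Optional, Sequence, Tuple

MASK64 = (1 << 64) - 1


def decode_index_field(field: int) -> Optional[Tuple[str, int]]:
    """Return (register bank, register number), or None when not indexing."""
    if not 0 <= field <= 7:
        raise ValueError("the index field is three bits wide")
    if field == 0:
        return None                      # all zeroes: no indexing
    if field <= 3:
        return ("integer", field)        # integer data registers 1 to 3
    return ("address", field - 4)        # address registers 0 to 3


def effective_address(base_field: int, index_field: int, displacement: int,
                      addr_regs: Sequence[int], int_regs: Sequence[int]) -> int:
    """Assumed model: base + index + displacement, truncated to 64 bits."""
    if not 1 <= base_field <= 7:
        raise ValueError("this sketch only handles base registers 1 to 7")
    address = addr_regs[base_field] + displacement
    index = decode_index_field(index_field)
    if index is not None:
        bank, number = index
        address += (int_regs if bank == "integer" else addr_regs)[number]
    return address & MASK64


addr_regs = [0x10, 0x1000, 0x2000, 0x3000, 0, 0, 0, 0]
int_regs = [0] * 32
int_regs[2] = 8
print(hex(effective_address(1, 0b010, 4, addr_regs, int_regs)))  # prints 0x100c
```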
The fourth line of the diagram shows a modified form of the register-to-register instruction in which only a source and destination register are specified. This frees up space to allow the instruction to include three bits which indicate, for the following block of 256 bits of instructions, a number of 32-bit instruction spaces at the beginning of that block to be skipped over.
This supports the other instruction formats described in subsequent lines of the diagram, in which additional information required by the instruction, either immediate values or supplemental data that extends the instruction beyond 32 bits, is taken from this skipped-over area.
It is recommended that this two-address form should always be used in preference to a three-address form where the destination register and the operand register are the same. As this may mean that there is more than one instruction in a block with a skip field, all the skip fields should contain the same value. This permits simple implementations that set up the skip for the next block based on every skip field encountered, so that they will not give inconsistent results depending on the order in which instructions complete.
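A minimal Python sketch of how the skip fields of one block could determine the division of the next block is given below. The function names are hypothetical, and rejecting inconsistent skip fields with an exception is only this sketch's way of showing the rule. Nothing above says what real hardware would do in that case.

```python
from typing import List, Sequence, Tuple

WORDS_PER_BLOCK = 8  # a 256-bit block holds eight 32-bit words


def resolve_skip(skip_fields: Sequence[int]) -> int:
    """Combine every three-bit skip field found in one block."""
    if not skip_fields:
        return 0  # no skip field present: nothing is skipped in the next block
    for value in skip_fields:
        if not 0 <= value <= 7:
            raise ValueError("a skip field is three bits wide")
    if len(set(skip_fields)) != 1:
        raise ValueError("all skip fields in a block should hold the same value")
    return skip_fields[0]


def split_next_block(words: Sequence[int], skip: int) -> Tuple[List[int], List[int]]:
    """Split the next block into (skipped data words, instruction words)."""
    if len(words) != WORDS_PER_BLOCK:
        raise ValueError("a block holds exactly eight 32-bit words")
    return list(words[:skip]), list(words[skip:])


skip = resolve_skip([3, 3])
data, code = split_next_block(list(range(8)), skip)
print(data, code)  # prints [0, 1, 2] [3, 4, 5, 6, 7]
```

Because every skip field holds the same value, the result does not depend on which of them is examined first, which is the property the recommendation above is meant to secure.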
The P bit is used to indicate that the first 32 bits of the following 256-bit block of instructions will be used to provide predication information for the instructions in that block. In this case, the skip count must be 1 or greater, as those 32 bits are still included in the count of 32-bit instruction words to be skipped over.
In this case, the first 32 bits of the next block will have one of the forms shown in the following diagram:
In the first format, up to seven predicated instructions may be present in the block. If the C bit is zero, one of the first eight flag bits, numbered from 0 to 7, determines if the instruction is executed. If the C bit is one, the three remaining bits in the field indicate the state of the condition codes for which the instruction is executed. Values from 1 through 7 correspond to those in the conditional branch instructions; 000 indicates the instruction is executed if there is an overflow (as opposed to never being executed).
In the second format, which allows for up to four predicated instructions, if the C bit is zero, the following bit, if 1, indicates the instruction is to be executed when the flag bit selected by the last five bits of the field is cleared instead of set. If the C bit is one, the last five bits of the field correspond to condition codes as used in the conditional branch instructions.
In the third format, which allows for up to six predicated instructions, sixteen of the flag bits, and the first sixteen of the possible condition code tests are available.
When this method of indicating predication is used, if a branch takes place into a block set up in this way, the predication information will be ignored, as there will not be anything visible within the block itself to indicate that predication is taking place. This is the one deliberate exception to the general principle that branching into code will not cause problems. That principle holds, despite the lack of fixed overhead in the form of a block type indicator at the start of each block, because the first part of a block is skipped over and the material within it is only ever reached through pointers.
It should be noted, however, that there is also another inherent unavoidable situation where branching to the wrong location is capable of causing problems.
It has been noted above that there may be more than one instruction with a three-bit skip field in a block, and, if so, all the skip fields should contain the same value. (The accompanying P bit should also be either set or cleared in the same fashion in all cases.) It should also be noted that if a block contains one or more instructions with a skip field, it is not necessarily the case that the last instruction in a block contains a skip field.
Thus, if a block contains one or more instructions with a skip field, and these skip fields contain a nonzero value, and a branch is made to that block after the last instruction with a skip field in that block, then the entire following block will be treated as composed of normal 32-bit instructions, with none skipped over.
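The situation can be modelled in a few lines of Python. This is a toy model under stated assumptions: the block is represented as a list with one entry per 32-bit slot, holding that instruction's skip field or None if it has none, and only instructions at or after the entry point are taken to be seen. The function name is hypothetical.

```python
from typing import Optional, Sequence


def skip_seen_from(entry_slot: int, skip_by_slot: Sequence[Optional[int]]) -> int:
    """Skip count applied to the next block when execution enters at entry_slot."""
    seen = [s for s in skip_by_slot[entry_slot:] if s is not None]
    # With no skip field encountered, the whole next block is treated as
    # normal 32-bit instructions.
    return seen[0] if seen else 0


block = [None, 2, None, 2, None, None, None, None]
print(skip_seen_from(0, block))  # prints 2: entered at the start, skip is honoured
print(skip_seen_from(4, block))  # prints 0: entered after the last skip field
```

One way to avoid the problem in this model is to place a skip field in the last instruction of the block, so that every entry point still passes at least one.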
The third line of the diagram shows an instruction format in which the source operand is an immediate value. But instead of being in a fixed location as part of the instruction, it is referenced by a pointer, like a memory operand instead of an immediate value, which is why this instruction format has been labelled "pseudo-immediate".
The pointer, however, points to one of the 32 bytes of the current 256 bit instruction block, so accessing the item it refers to, from what has already been fetched into the instruction buffer, should be at least as rapid as accessing an operand in a register, and thus in practice there should be no significant difference between operands of this form and conventional immediate operands.
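As a sketch of what fetching a pseudo-immediate amounts to, the following Python function reads an operand out of the 32 bytes of the current block. The function name is hypothetical. That the operand length is known from the instruction type, that the value is read as unsigned, and that the byte order is big-endian are all assumptions made for illustration.

```python
BLOCK_BYTES = 32  # one 256-bit instruction block


def fetch_pseudo_immediate(block: bytes, pointer: int, size: int) -> int:
    """Read a size-byte operand starting at byte `pointer` of the block."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is 32 bytes long")
    if not 0 <= pointer < BLOCK_BYTES:
        raise ValueError("the pointer selects one of the 32 bytes of the block")
    if size <= 0 or pointer + size > BLOCK_BYTES:
        raise ValueError("the operand must lie entirely within the block")
    # Byte order is an assumption of this sketch.
    return int.from_bytes(block[pointer:pointer + size], "big")


block = bytes(range(32))
print(hex(fetch_pseudo_immediate(block, 4, 4)))  # prints 0x4050607
```

No memory access beyond the block already in the instruction buffer is involved, which is the point made above about speed.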
Note also that with this scheme there is no need to place restrictions on what can be in a block that contains a branch target. Since the pseudo-immediates are pointed to, instead of being in locations deduced from what has happened in previous instructions in the block, as long as one only branches to actual code and not to constant values, things will simply work: the interpretation of instructions after the branch point will not depend on what portion of the block prior to the branch point consists of instructions, and what portion is skipped over to contain immediates or other data.
In the fifth, seventh, and eighth lines of the diagram are instruction formats that allow the instruction to be longer than 32 bits. Here, the pointer to additional material is four bits long, so that instructions may be lengthened in steps of 16 bits. How many of those are used depends on the particular instruction. Two different formats are provided, so that for a lengthened register-to-register instruction, two of the register operands can be in their usual position, and for a lengthened memory-reference instruction, the displacement field in the original 32-bit body of the instruction can be used.
The sixth and seventh lines of the diagram show instruction formats similar to those in the fourth and fifth lines, except that they are modified so that the source operand is immediate. Thus, an immediate operand may be combined with the formats shown in those lines.
The additional data which is used for immediates or supplementary bits in an instruction normally precedes the instruction within a 256-bit block, so a pointer value of 1111, which would point to the last 16 bits of the block, is never needed for that purpose. Thus, although it is not shown in this diagram, a 32-bit instruction without supplementary bits may also have the format shown in the eighth line of the diagram, but with the four bits in the pointer to the supplementary bits set to 1111. This provides for additional 32-bit instructions which do not follow either the standard memory-reference or register-to-register formats.
The value 1110 points to the first 16 bits of the last 32 bits in the block, which is also not available. This value is used, as is shown in the ninth and tenth lines of the diagram, for an alternate means to indicate predication. As this method places the instructions to be predicated within the skipped part of the block, and accesses them by a pointer to them, a branch to a block in which this method of indicating predication is used won't cause predication to be ignored.
This method is used when correct operation in the event of a branch to a block where predication is used is essential. The other method described above, as it allows the entire first 32 bits of a block to describe the predication, has less overhead and is more flexible where branching into the block is not an issue, which is why it is provided as well.
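The three uses of the four-bit pointer described above can be summarised in a short Python sketch. The function name and the labels it returns are hypothetical. The byte offset follows from the pointer addressing the block in 16-bit steps.

```python
from typing import Optional, Tuple


def classify_pointer(pointer: int) -> Tuple[str, Optional[int]]:
    """Classify a four-bit supplementary-data pointer.

    Returns a label and, for an ordinary pointer, the byte offset within the
    32-byte block at which the supplementary data begins.
    """
    if not 0 <= pointer <= 0b1111:
        raise ValueError("the pointer is four bits wide")
    if pointer == 0b1111:
        return ("plain 32-bit instruction", None)   # no supplementary bits
    if pointer == 0b1110:
        return ("alternate predication", None)      # ninth and tenth formats
    return ("supplementary data", pointer * 2)      # 16-bit steps


for p in (0b0011, 0b1110, 0b1111):
    print(format(p, "04b"), classify_pointer(p))
```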
If there are from one to three predicated instructions, six bits are available to describe the predication for each one. If the first bit, marked C, in each six-bit field is zero, the five remaining bits indicate one of thirty-two flag bits, and the corresponding instruction is executed if the flag bit is set. If the C bit is 1, the five remaining bits indicate the condition code value, in the same manner as in a conditional branch instruction, under which the instruction is executed.
If there are four to six predicated instructions, the three bits corresponding to the instruction, if all zero, indicate the instruction is executed unconditionally; if they have a value from 1 to 7, flag bits 1 to 7, from among the thirty-two flag bits, numbered 0 to 31, are then used to control whether the instruction is executed.
In either case, for any number of instructions to be predicated from 1 to 6, the predication fields are used from left to right, with the contents of unused fields on the right being ignored. (The 32-bit header indicated by the P bit in instruction formats above follows the opposite convention.)
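The two kinds of predication field can be decoded as in the following Python sketch. It handles a single field at a time, since the position of the fields within the word depends on the diagram. The function names and returned labels are hypothetical, and the "first bit" of the six-bit field is assumed to be its most significant bit.

```python
from typing import Optional, Tuple


def decode_six_bit_field(field: int) -> Tuple[str, int]:
    """Six-bit form, used when one to three instructions are predicated."""
    if not 0 <= field < 64:
        raise ValueError("the field is six bits wide")
    c_bit = field >> 5           # assumed: C is the most significant bit
    low_five = field & 0b11111
    if c_bit:
        return ("condition code", low_five)   # as in a conditional branch
    return ("flag set", low_five)             # one of flag bits 0 to 31


def decode_three_bit_field(field: int) -> Tuple[str, Optional[int]]:
    """Three-bit form, used when four to six instructions are predicated."""
    if not 0 <= field < 8:
        raise ValueError("the field is three bits wide")
    if field == 0:
        return ("unconditional", None)
    return ("flag set", field)                # flag bits 1 to 7 only


print(decode_six_bit_field(0b010011))    # prints ('flag set', 19)
print(decode_six_bit_field(0b100111))    # prints ('condition code', 7)
print(decode_three_bit_field(0b000))     # prints ('unconditional', None)
```

Note that in the three-bit form flag bit 0 cannot be selected, as the all-zero value is taken to mean unconditional execution.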
Single-operand register instructions, unlike the additional instructions noted as starting with the bits 11C1 1111, do follow the standard register instruction formats, but one or two of their register fields are used to supply additional opcode bits instead.
The opcodes of the memory-reference instructions are:
00000 LB   Load Byte
00001 STB  Store Byte
00010 ULB  Unsigned Load Byte
00011 IB   Insert Byte
00100 LH   Load Halfword
00101 STH  Store Halfword
00110 ULH  Unsigned Load Halfword
00111 IH   Insert Halfword
01000 L    Load
01001 ST   Store
01010 UL   Unsigned Load
01011 I    Insert
01100 LL   Load Long
01101 STL  Store Long
01110 JC   Jump on Condition
01111 JS   Jump to Subroutine
10000 LM   Load Medium
10001 STM  Store Medium
10010 LF   Load Floating
10011 STF  Store Floating
10100 LD   Load Double
10101 STD  Store Double
10110 LQ   Load Quad
10111 STQ  Store Quad
Byte, Halfword, Word, and Long are integer formats 8, 16, 32, and 64 bits in length respectively; Medium, Floating, Double, and Quad are floating-point formats 48, 32, 64, and 128 bits in length respectively. Due to their odd length, Medium-format floating-point numbers are considered to be aligned when they are aligned to a 16-bit boundary.
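The sizes and the alignment rule can be tabulated in Python as follows. Only the 16-bit alignment of the Medium format is stated above. That every other format is aligned to its own length is an assumption of this sketch, and the names are hypothetical.

```python
FORMAT_BITS = {
    "Byte": 8, "Halfword": 16, "Word": 32, "Long": 64,          # integer
    "Medium": 48, "Floating": 32, "Double": 64, "Quad": 128,    # floating-point
}


def alignment_bytes(fmt: str) -> int:
    """Boundary, in bytes, on which an operand of this format is aligned."""
    if fmt == "Medium":
        return 2                       # 48 bits is not a power of two
    return FORMAT_BITS[fmt] // 8       # assumed: natural alignment


def is_aligned(fmt: str, address: int) -> bool:
    return address % alignment_bytes(fmt) == 0


print(is_aligned("Medium", 6))   # prints True: 6 is a multiple of 2
print(is_aligned("Double", 6))   # prints False under the natural-alignment assumption
```

Under this rule, consecutive Medium values packed six bytes apart in an array are all aligned.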
In the case of the Jump on Condition instruction, the destination register field is used as part of the opcode, to indicate the condition under which branching takes place. The various forms of the Jump on Condition instruction are:
01110 00000 NOP  No-operation
01110 00001 JL   Jump if low
01110 00010 JE   Jump if equal
01110 00011 JLE  Jump if low or equal
01110 00100 JH   Jump if high
01110 00101 JNE  Jump if not equal
01110 00110 JHE  Jump if high or equal
01110 00111 J    Jump
01110 01000 JV   Jump if overflow
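The condition values 1 through 7 in this table read naturally as a three-bit mask, with one bit each for low, equal, and high. The following Python sketch tests a jump on that basis. The mask reading is taken from the table itself, but the function name and the model of the condition codes as exactly one of three relations plus a separate overflow indication are assumptions.

```python
JC_MNEMONICS = {
    0b00000: "NOP", 0b00001: "JL", 0b00010: "JE", 0b00011: "JLE",
    0b00100: "JH", 0b00101: "JNE", 0b00110: "JHE", 0b00111: "J",
    0b01000: "JV",
}

RELATION_BIT = {"low": 0b001, "equal": 0b010, "high": 0b100}


def jump_taken(condition: int, relation: str, overflow: bool = False) -> bool:
    """Decide whether a Jump on Condition with this condition field is taken."""
    if condition not in JC_MNEMONICS:
        raise ValueError("condition value not listed in the table")
    if condition == 0b01000:
        return overflow                       # JV
    # Values 0 to 7: taken if the bit for the current relation is set.
    return bool(condition & RELATION_BIT[relation])


print(JC_MNEMONICS[0b00011], jump_taken(0b00011, "equal"))  # prints JLE True
print(JC_MNEMONICS[0b00101], jump_taken(0b00101, "equal"))  # prints JNE False
```

With this reading, NOP (no bits set) is never taken and J (all three bits set) is always taken, in agreement with the table.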
In the case of the Jump to Subroutine instruction, the destination register field indicates where the return address is to be placed: if it contains a value from 0 to 7, it is placed in the address register indicated by the number in that field; otherwise, it is placed in the integer data register indicated by the number in that field.
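Expressed as a Python sketch, with a hypothetical function name, the selection of the return address register is:

```python
from typing import Tuple


def return_address_register(field: int) -> Tuple[str, int]:
    """Map the five-bit destination field of JS to (register bank, number)."""
    if not 0 <= field <= 31:
        raise ValueError("the destination register field is five bits wide")
    if field <= 7:
        return ("address", field)     # address registers 0 to 7
    return ("integer", field)         # integer data registers 8 to 31


print(return_address_register(5))    # prints ('address', 5)
print(return_address_register(12))   # prints ('integer', 12)
```

A consequence of this encoding is that integer data registers 0 through 7 cannot receive a return address.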
The basic floating-point formats supported by this architecture are patterned after those of IEEE 754, as used by many other computers, and are shown below:
In addition to the standard 32-bit and 64-bit types specified by IEEE 754, a similar type occupying 48 bits is defined. The size of the exponent field is chosen to be the minimum that allows numbers from 10^-99 to 10^99 to be represented, and with that exponent field, 11 digits of precision are provided. Thus, this format matches the precision provided by many pocket calculators, as well as that used in mathematical tables and mechanical calculators; historically, it therefore appears to be a good fit to what many scientific problems require.
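The arithmetic behind these figures can be checked with a short Python sketch. It assumes an IEEE 754-style layout: one sign bit, a biased exponent whose all-zeroes and all-ones values are reserved, and a hidden leading bit in the significand. The field widths it prints are derived under those assumptions. The function name is hypothetical.

```python
import math


def min_exponent_bits(decimal_exponent: int) -> int:
    """Smallest exponent field covering 10**-n to 10**n with normal numbers."""
    target = 10 ** decimal_exponent
    bits = 2
    while True:
        half = 2 ** (bits - 1)
        # Largest power of two is 2**(half - 1); smallest normal is 2**(2 - half).
        largest_ok = 2 ** (half - 1) >= target
        smallest_ok = 2 ** (half - 2) >= target   # i.e. 2**(2 - half) <= 1/target
        if largest_ok and smallest_ok:
            return bits
        bits += 1


exponent_bits = min_exponent_bits(99)
fraction_bits = 48 - 1 - exponent_bits          # remove sign and exponent
significand_bits = fraction_bits + 1            # add the hidden bit
decimal_digits = math.floor(significand_bits * math.log10(2))

print(exponent_bits, fraction_bits, decimal_digits)  # prints 10 37 11
```

A nine-bit exponent would reach only about 10^76, so ten bits is the minimum. The remaining 38 bits of significand correspond to roughly 11.4 decimal digits, consistent with the 11 digits quoted above.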