Speed Without Compromise

Although I have now pointed out how the end-around carry allows accurate division by a fixed amount through multiplication by the reciprocal, requiring that for every memory access, although a few gate delays are insignificant compared to the access time of DRAM, still seems to me to fall short of the ideal at which I am aiming.

How can accessing data in memory, when that data is of odd lengths, be made as fast, straightforward, and simple as accessing memory in power-of-2 lengths on conventional architectures?

On further reflection, I found one further alternative for where to place the inevitable compromises that such a system must have, so that they don't get in the way of handling data with more flexible lengths in an efficient manner.

Before decimalisation in Britain, many computers made for use in the United Kingdom had a limited provision for mixed-radix arithmetic, so as to handle financial calculations involving the old British monetary system of pounds, shillings, and pence, with 20 shillings to the pound, and 12 pence to the shilling.

And so I propose to use mixed-radix arithmetic to facilitate addressing memory in which data items are aliquot parts of a memory word, where the number of these data items in a memory word, although it comes out even, is other than a power of two.

Thus, this page follows on from the ideas presented on this earlier page, where diifferent lengths of data items are handled by building addressing around a fixed unit that is a multiple of the lengths of each of the different data items was first presented.

As a concrete example:

Imagine a computer in which the RAM is organized into lines of data that are 180 bits wide.

Each such line, therefore, consists of four 45-bit words. But, in addition, it can be divided into:

Five single-precision floating-point numbers that are 36 bits long,

Three double-precision floating-point numbers that are 60 bits long.

If one has an intermediate-precision floating-point number that is 45 bits long, then it will offer only 10 digits of precision instead of 11 digits, but that may still allow it to be useful on occasion as an alternative to single precision.

Base registers on the kind of architecture I propose wouold contain pure binary numbers, those numbers being the address of a 180-bit memory line.

These registers would be used in conjunction with index registers and the displacements in instructions to generate four kinds of address.

Therefore, I propose to have four distinct sets of index registers in this architecture.

To address 45-bit words, the contents of an index register and of the displacement in an instruction would just be a binary number, with two additional bits after the end of the contents of the base registers.

To address 9-bit characters, possibly 18-bit data items of some kind, and 36-bit single-precision floating-point numbers, an address would consist of one part corresponding to the binary address of a 180-bit data line, and another part containing a number from 0 to 19 which would point to one of the twenty 9-bit characters within that data line.

The last two bits of that part of the address would be zero for an aligned 36-bit floating-point number.

And so one would have a set of index registers containing displacements in this format.

To address 15-bit instruction syllables, 30-bit integers, and 60-bit double-precision floating-point numbers, an address would consist of one part corresponding to the binary address of a 180-bit data line, and another part containing a number from 0 to 11 which would point to one of the twelve 15-bit instruction syllables within that data line.

To address 12-bit storage cells, so as to facilitate the processing of data consisting of 6-bit characters or 4-bit BCD digits, an address would consist of one part corresponding to the binary address of a 180-bit data line, and another part containing a number from 0 to 14 which would point to one of the fifteen 12-bit storage cells in a data line.

Adding two mixed-radix quantities is simple.

Multiplying a mixed-radix quantity by a binary number, as would be needed to address array elements, can be performed when necessary by "Russian Peasant Multiplication".

This method of multiplication is similar to the algorithm used to efficiently use multiplication to take numbers to integer powers. The mixed-radix number to be multiplied by a binary integer would be added to itself repeatedly to obtain its power-of-two multiples corresponding to the digits of the binary integer; those corresponding to a digit that is 1 instead of 0 would be added to form the final product.

In this scheme:

Data that is 36 bits long, 60 bits long, or of other odd lengths, is stored as aligned data items, always contained entirely within a single 180-bit memory line without wastage.

Addresses are always maintained in native form, indicating the binary address of a 180-bit memory line, and the index of an element within that memory line, so that there would be no need to convert addresses to another form when accessing data.

Using a 180-bit wide memory line is only one possibility. Here is a table of some possible options:

Memory line: 144 bits

Single Precision: 36 bits (*4)
Intermediate Precision: 48 bits (*3)
Double Precision: 72 bits (*2)
Temporary Real: 144 bits
Extended: 288 bits

Memory line: 180 bits

Single Precision: 36 bits (*5)
Intermediate Precision: 45 bits (*4)
Double Precision: 60 bits (*3)
Temporary Real: 90 bits (*2)
Extended: 180 bits

Memory line: 192 bits

Single Precision: 38 bits (*5, 2 unused bits)
Intermediate Precision: 48 bits (*4)
Double Precision: 64 bits (*3)
Temporary Real: 96 bits (*2)
Extended: 192 bits

Memory line: 256 bits

Single Precision: 36 bits (*7, 4 unused bits)
Intermediate Precision: 51 bits (*5, 1 unused bit)
Double Precision: 64 bits (*4)
Temporary Real (alternate): 85 bits (*3, 1 unused bit)
Temporary Real: 128 bits (*2)
Extended: 256 bits

Temporary Real is a format like the 80-bit Temporary Real format used with the Intel 8087 floating-point coprocessor, where there is no hidden first bit.

Extended is a format composed of two halves, each in the Temporary Real format, with the sign and exponent of the second half ignored. This is similar to double precision on the IBM 704 computer, or Extended Precision on the IBM System/360 Model 85.

The Problem with this Scheme

Although this way of dealing with the issues of memory addressing delivers what was sought - fast and efficient access to data items of lengths not related as powers of two, it does have some costs.

More index registers are needed; this adds transistors to the design, and they need to be saved and restored for interrupts and subroutine calls.

But there is a more serious problem.

Assembly-language code to operate on floating-point numbers of odd lengths efficiently can be written.

But the popular higher-level languages that people normally use don't require array subscripts to have types like "displacement in terms of 36-bit units" versus "displacement in terms of 60-bit units". It's assumed that subscripts are just binary numbers, that only need to be shifted to apply to a different type of data.

If code in a conventional language were just compiled in a simplistic manner, each time an array element was addressed, the subscripts would be converted from integer form to mixed-radix form. That would negate the advantages of this scheme; instead, loop counters are to be in the appropriate mixed-radix type for the data types used in the loop.

A fast divide-by-3 or divide-by-5 circuit, on the other hand, allows programs to be written in a conventional manner.