The table given previously, showing IBM's official formats for Densely Packed Decimal and Chen-Ho encoding, is reproduced here:
BCD digits        Chen-Ho encoding   Densely Packed Decimal encoding
0abc 0pqr 0uvw    0abpquvcrw         abcpqr0uvw
0abc 0pqr 100W    110pqabcrW         abcpqr100W
0abc 100R 0uvw    101abuvcRw         abcuvR101w
100C 0pqr 0uvw    100pquvCrw         uvCpqr110w
0abc 100R 100W    11110abcRW         abc10R111W
100C 0pqr 100W    11101pqCrW         pqC01r111W
100C 100R 0uvw    11100uvCRw         uvC00R111w
100C 100R 100W    1111100CRW         00C11R111W

0pqr 0uvw         0pqruvw            pqr0uvw
0pqr 100W         111rpqW            pqr100W
100R 0uvw         100Ruvw            uvR101w
100R 100W         110R00W            10R111W
This is because Densely Packed Decimal forms part of the specification for decimal floating point included in the revision of the IEEE 754 standard that was under consideration at the time this page was originally prepared. Since then, the format under consideration was adopted, but an additional variation of that format was adopted as well: one which, instead of using Densely Packed Decimal to represent decimal digits, simply represents the entire significand (or mantissa) as a binary integer.
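The Densely Packed Decimal column of the table can be turned directly into code. The following Python sketch is my own illustration, with variable names following the table's bit labels; it packs three decimal digits into a ten-bit declet:

```python
def dpd_encode(d1, d2, d3):
    """Pack three decimal digits (d1 most significant) into one 10-bit declet."""
    # the three low-order bits of each digit; the fourth (high) bit
    # distinguishes the "large" digits 8 and 9
    a, b, c = d1 >> 2 & 1, d1 >> 1 & 1, d1 & 1
    p, q, r = d2 >> 2 & 1, d2 >> 1 & 1, d2 & 1
    u, v, w = d3 >> 2 & 1, d3 >> 1 & 1, d3 & 1
    big1, big2, big3 = d1 >= 8, d2 >= 8, d3 >= 8
    if not (big1 or big2 or big3):
        bits = (a, b, c, p, q, r, 0, u, v, w)   # abcpqr0uvw
    elif big3 and not big1 and not big2:
        bits = (a, b, c, p, q, r, 1, 0, 0, w)   # abcpqr100W
    elif big2 and not big1 and not big3:
        bits = (a, b, c, u, v, r, 1, 0, 1, w)   # abcuvR101w
    elif big1 and not big2 and not big3:
        bits = (u, v, c, p, q, r, 1, 1, 0, w)   # uvCpqr110w
    elif not big1:                              # second and third digits large
        bits = (a, b, c, 1, 0, r, 1, 1, 1, w)   # abc10R111W
    elif not big2:                              # first and third digits large
        bits = (p, q, c, 0, 1, r, 1, 1, 1, w)   # pqC01r111W
    elif not big3:                              # first and second digits large
        bits = (u, v, c, 0, 0, r, 1, 1, 1, w)   # uvC00R111w
    else:                                       # all three digits large
        bits = (0, 0, c, 1, 1, r, 1, 1, 1, w)   # 00C11R111W
    declet = 0
    for bit in bits:
        declet = declet << 1 | bit
    return declet
```

One pleasant property this makes visible: a group whose first two digits are zero encodes to a declet with the same numerical value as its last digit, so small numbers look the same in DPD as in binary.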
The new z9 computer from IBM has added a decimal floating-point format which makes use of IBM's Densely Packed Decimal encoding.
Floating-point numbers in this format are not necessarily normalized. The intent behind this does not appear to be to provide significance arithmetic as it is normally understood, as this number format is largely intended for use with quantities whose significance is specified.
The number of decimal digits of precision in this format is always of the form 3n+1. The reason for this could be to allow an efficient way to encode special values indicating infinities and NaN (not a number) quantities, but it is also true that this choice allows the lengths of the component fields in the number, occupying either 32, 64, or 128 bits, to work out more nicely, with a gradual increase in exponent size along with precision.
A number in this decimal floating point format consists of the following elements: the sign, the combination field (CF), the biased exponent continuation field (BXCF), and the coefficient continuation field (CCF).
The lengths of these fields are:
Overall Length   Sign   CF   BXCF   CCF
     32           1      5     6     20
     64           1      5     8     50
    128           1      5    12    110
the length of the CCF field always being a multiple of ten bits for effective use of the Densely Packed Decimal format.
The format of the CF is as follows:
First digit of mantissa/coefficient:

  0 to 7: aaa
    0  000
    1  001
    2  010
    3  011
    4  100
    5  101
    6  110
    7  111

  8 or 9: A
    8  0
    9  1

First two bits of biased exponent: 00, 01, or 10: bb

Formats of the CF:

  bbaaa
  11bbA
  11110: infinity
  11111: NaN
Note that the CF is encoded using the same division of the decimal digits into a group of eight digits and a group of two digits that lies at the basis of Chen-Ho encoding and Densely Packed Decimal encoding.
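Assembling the five-bit CF from the leading coefficient digit and the top two bits of the biased exponent can be sketched as follows (my own illustration; `e_top` holds the top two exponent bits as the value 0, 1, or 2):

```python
INFINITY_CF = 0b11110
NAN_CF = 0b11111

def combination_field(first_digit, e_top):
    """Build the 5-bit combination field from the leading coefficient
    digit (0-9) and the top two bits of the biased exponent (0, 1, or 2)."""
    if first_digit <= 7:
        # small leading digit: bbaaa
        return e_top << 3 | first_digit
    # leading digit 8 or 9: 11bbA, with A the digit's low bit
    return 0b11000 | e_top << 1 | (first_digit & 1)
```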
The remaining piece of information about the format is the bias used for the exponent:
Length of Number   Number of Possible    Exponent           Precision of Number
in Bits            Exponent Values       Bias               in Digits
      32           3 *    64 =    192      101 (   96+ 5)    2 * 3 + 1 =  7
      64           3 *   256 =    768      398 (  384+14)    5 * 3 + 1 = 16
     128           3 * 4,096 = 12,288    6,176 (6,144+32)   11 * 3 + 1 = 34
The exponent bias is, surprisingly, not the exact midpoint of the range of the possible exponents. However, given that it is intended to use numbers routinely in unnormalized form in this format, increasing the exponent bias facilitates this, and, in fact, the discrepancy between the exponent bias and half the exponent range is always two less than the number of digits of precision provided by the given format.
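This relationship can be checked directly against the table above; the tuples here are (possible exponent values, exponent bias, digits of precision):

```python
# (possible exponent values, exponent bias, digits of precision) per format width
formats = {32: (192, 101, 7), 64: (768, 398, 16), 128: (12288, 6176, 34)}
for width, (values, bias, digits) in formats.items():
    # the bias exceeds half the exponent range by (precision - 2)
    assert bias - values // 2 == digits - 2
```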
The following diagram illustrates these formats:
It should also be noted that the exponent bias, as given, is based on the decimal point being placed at the right of the coefficient, not at its left, so when the exponent, after the bias is added, equals zero, the number is an integer.
It has been noted that numbers in this format may be unnormalized.
One possible use for unnormalized numbers is significance arithmetic. This format, however, comes with a set of rules about the "ideal exponent" of the result of an arithmetic operation (this term acknowledges that the range of exponents is finite, and thus cases will arise where the choice of exponents to use in the representation of a number may be limited) that do not correspond to the rules of significance arithmetic. Instead, they follow the IEEE-754 philosophy of producing exact results as far as possible.
The basic intent of those rules is that 100 plus 5.25 should be 105.25 and not 105.250000000; 2.7 times 8.4 should be 22.68 and not 22.680000000. Thus, it is intended that the routines that input and output numbers should create unnormalized values based on the form of numbers read in, and should print numbers with trailing zeroes omitted to the extent indicated by the degree of unnormalization to be printed.
This is a further extension of the reason for using a decimal exponent base in the JOSS system, so that .3 plus .7 might be 1.0 instead of 0.9999999999; the goal is not merely decimal arithmetic, but humanized arithmetic. Doing this within the computations themselves, rather than merely removing trailing zeroes on output, is what is novel about this format.
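The ideal-exponent rules described above are also those of the General Decimal Arithmetic specification, which Python's decimal module implements, so the behaviour can be observed directly:

```python
from decimal import Decimal

# exact results carry the "ideal exponent" of the operation, so trailing
# zeroes are neither invented nor discarded
print(Decimal("100") + Decimal("5.25"))   # 105.25, not 105.250000000
print(Decimal("2.7") * Decimal("8.4"))    # 22.68
print(Decimal("0.3") + Decimal("0.7"))    # 1.0, not 0.9999999999
```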
Previous attempts at humanizing the arithmetic operations of computers such as that in JOSS have tended to be dismissed by the computing community as not worth the trouble, but given the popularity of spreadsheets, for example, it may be that this will prove to be a useful idea.
One thing that occurs to me is that perhaps a decimal floating-point number ought to have a flag bit indicating whether the digits past the end of the number are to be taken as certainly zero, or as unknown. If either of the numbers in an operation has that bit set, the rules of significance arithmetic would be followed instead of those of humanized arithmetic; this would make for a general floating-point arithmetic that is also able to handle the numbers one usually thinks of floating-point as being applicable to: values of physical quantities of limited precision. Actually, this is somewhat of an oversimplification. If a trailing asterisk is used to indicate the flag bit, for addition the rules would work like this:
2.345  + 7.1  = 9.445
2.345* + 7.1  = 9.445*
2.345  + 7.1* = 9.4*
2.345* + 7.1* = 9.4*
If the less precise quantity in an addition has the flag bit set, the rules of significance arithmetic are followed, and the flag is preserved; but if the more precise one has the flag bit set, then the less precise one is still taken as the exact quantity it claims to be.
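A sketch of how such a flagged addition might work, using Python's Decimal type to track each operand's last place (the flag convention here is, of course, my hypothetical one, not part of any standard):

```python
from decimal import Decimal

def flagged_add(x, x_flag, y, y_flag):
    """Add two Decimals, each paired with a flag meaning 'digits beyond
    the last place are unknown' rather than certainly zero."""
    total = x + y
    # the operand whose last digit sits in the coarser (higher) place
    less_precise = x if x.as_tuple().exponent > y.as_tuple().exponent else y
    if x_flag if less_precise is x else y_flag:
        # significance arithmetic: round to the coarser operand's last place
        unit = Decimal(1).scaleb(less_precise.as_tuple().exponent)
        return total.quantize(unit), True
    # the less precise operand claims to be exact, so keep the full result
    return total, x_flag or y_flag

print(flagged_add(Decimal("2.345"), False, Decimal("7.1"), True))   # (Decimal('9.4'), True)
```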
In the case of multiplication, we also have multiple cases:
26.34  * 1.7  = 44.778
26.34* * 1.7  = 44.77*
26.34  * 1.7* = 45*
26.34* * 1.7* = 45*
Here, the number of significant digits, instead of the precision as a magnitude, is what is compared.
When I thought of numbers represented internally in decimal form, I also tended to think of COBOL programs, not spreadsheets; and if one is using a program to calculate a payroll, one would normally be using fixed-point numbers as well: if new rounding rules are needed, inventing a new floating-point format for that purpose seemed wasteful to me. But once it is understood that the idea is to have a general tool that can be easily used for arbitrary calculations, relieving users, as opposed to programmers, of having to specify the range of numbers becomes an obvious necessity.
It may also be noted that IBM intends to license its Densely Packed Decimal patent on a royalty-free basis to implementors of this format as it is about to be specified in the revised IEEE 754 standard.
The alternative format for Decimal Floating Point which is used on Intel microprocessors is as illustrated below:
This format does not have a five-bit combination field, but it is still true that the leading four bits of the significand in its raw form are encoded to either three bits or one bit, with a prefix indicating the latter case, so that the exponent range and precision of this format are the same as those of the other encoding.
The remainder of the significand, combined with its encoded first bits, forms a binary integer with the same numerical value as the number that the corresponding sequence of decimal digits in the other format would represent in decimal notation.
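For example, a significand whose digits would be 1, 2, 3, 4, 5, 6, 7 in the DPD-based encoding is stored in this format simply as the binary form of the integer 1234567:

```python
# the digit string, read as one decimal number, becomes a binary integer
digits = [1, 2, 3, 4, 5, 6, 7]
significand = 0
for d in digits:
    significand = significand * 10 + d
print(significand, bin(significand))   # 1234567 and its binary form
```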
Thus, this format follows the same principle as originally used in the JOSS interpreter on RAND's JOHNNIAC computer.
So far, the Binary Integer Decimal format is implemented by a software package on Intel microprocessors rather than being fully implemented in hardware.
The same was also true of a decimal floating-point format provided for the Motorola 68040 microprocessor by a software package; that same package also provided some floating-point operations which the 68881 and 68882 coprocessors had performed in hardware, but which were not part of the hardware floating-point of the 68040 itself.
This format was 96 bits in length, and is as illustrated below:
Here, the significand consists of BCD digits, each four-bit field containing a digit from 0000 (0) to 1001 (9); this favors speed in processing over efficiency in storage.
First, there is the sign of the number, and then the sign of the exponent, also represented in sign-magnitude representation.
Two bits are then used to indicate NaN values; in that case, both of those bits are set to 1; for normal values they are both zero.
Then there is the exponent, represented as three packed decimal BCD digits. It is followed by a one-digit field which allows an additional overflow decimal digit for the exponent.
Finally, the remainder of the number consists of the 19 digits of the mantissa.
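Taking the fields in the order just described, most significant first, a value of this format could be unpacked as follows (the exact bit positions are my assumption for illustration, not taken from Motorola documentation):

```python
def unpack_motorola_decimal(value):
    """Split a 96-bit packed-decimal float into the fields described above,
    assuming the fields are laid out most significant first."""
    return {
        "mantissa_sign":  value >> 95 & 1,
        "exponent_sign":  value >> 94 & 1,
        "nan_bits":       value >> 92 & 0b11,   # 11 for NaN, 00 for normal values
        "exponent_bcd":   [value >> s & 0xF for s in (88, 84, 80)],
        "overflow_digit": value >> 76 & 0xF,
        "mantissa_bcd":   [value >> s & 0xF for s in range(72, -1, -4)],
    }
```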
The reason for having an overflow decimal digit in this format is that Motorola microprocessors also had a 96-bit extended precision format which contained the same data as the 80-bit Temporary Real floating-point format of the Intel 8087 math coprocessor, but with 16 unused bits in addition.
Thus, the format had one bit for the sign, fifteen bits for the binary exponent, and sixty-four bits for the significand.
A sixty-four bit significand (or mantissa) provides just over nineteen decimal digits of precision. And note that this 96-bit decimal floating-point format provides nineteen decimal digits in BCD form.
Thus, the overflow decimal digit was present so that any floating-point number in the Temporary Real format could be converted to decimal form without error; 2^16384 is about equal to 10^4932, making a four-digit exponent field necessary. By treating the most significant digit as an overflow digit, however, ordinary values which are allowed to be used in computation, with only three-digit exponents, can all be converted in the opposite direction, to binary, whereas values with exponents over 5,000 would not have binary equivalents.
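The size of that exponent range is easy to check: 2^16384 has 4,933 decimal digits, so it lies just above 10^4932.

```python
import math

# decimal exponent of 2^16384
print(16384 * math.log10(2))   # about 4932.1
print(len(str(2 ** 16384)))    # 4933 digits, i.e. just over 10^4932
```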
Somewhat less recent, but also of interest as a decimal floating-point format provided on a primarily binary machine (as opposed to the many floating-point formats of decimal machines such as the IBM 7070 and 7074) is the decimal floating-point provided on the Wang VS computer.
This computer strongly resembled the IBM System/360. It also provided a decimal floating-point format which was almost identical to the format of double-precision floating-point numbers on the System/360, except that the exponent was now a power of ten, and the mantissa was composed of BCD digits. The exponent was still a binary number in excess-64 notation.
Below, I discuss an alternative way of handling decimal floating point, with the intent of obtaining a closer resemblance to the binary floating-point representation in the IEEE 754 standards. While I believe an improvement in numerical properties is obtained, changing to a format which must always be normalized of course loses the important property of the format described above of retaining an indication of how many trailing zeroes it is reasonable to print when a floating-point number is output.
The original floating-point format of the IBM System/360 architecture was subject to criticism on the basis that its large radix, 16, meant that the precision of floating-point numbers was highly variable.
Since this effect was judged tolerable in the Burroughs B5500 and the Atlas, which had a floating-point radix of 8, perhaps a radix of 10 is not excessive.
But if we view a floating-point radix of 2 as the ideal, can we modify decimal floating-point to have, in effect, a radix near 2, and, if so, can we further accomplish what has been achieved with binary floating-point: a gain in precision by hiding the first bit of the number, which (except for the number zero) must always be 1?
As a matter of fact, it *is* possible to do this. A scheme for doing so is outlined below, in a first, crude version:
1  1     00  ***
2  10    01  0**
3  11    01  1**
4  100   10  00*
5  101   10  01*
6  110   10  10*
7  111   10  11*
8  1000  11  000
9  1001  11  001
The first column shows the leading decimal digit of a floating-point quantity. The second column shows its binary representation. The third column shows the two bits to be appended after the least significant bits of the exponent; these affect only the value of the first digit, not the rest of the mantissa/coefficient/significand. The fourth column shows how the first digit of the mantissa is represented, leaving space, represented by asterisks, for a fraction to appear at the end of the number. Three asterisks allow nothing, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, or 7/8 of the units value of the least significant digit to be added at the end; two asterisks allow nothing, 1/4, 1/2, or 3/4; one asterisk, nothing or 1/2.
This crude scheme has an obvious flaw, however. The extra precision added to the end of the number is *binary* in nature. Instead, something that integrates well with decimal notation is desired.
Another flaw that is easier to remedy was left in: as ten, rather than sixteen, digit values are encoded, not all eight possible values occupying three bits are used, so some precision is wasted.
If one is familiar with the gradations along the span of a slide rule, however, an alternative offers itself. Instead of trying to use 1/4 and 1/8 as additional units, in addition to 1/2, simply use 1/5 in addition to 1/2. This means that instead of four exponent values corresponding to the same shift of the decimal place for most of the mantissa, only three exponent values so correspond.
The least significant part of the decimal exponent and the first digit, left-justified, along with the additional data to be appended to the least significant part of the number as a result of the space saved by the left-justification, can be combined into a single four-bit field as shown below:
0000  1 ... -    1 ... 0
0001  1 ... 1/5  1 ... 2
0010  1 ... 2/5  1 ... 4
0011  1 ... 3/5  1 ... 6
0100  1 ... 4/5  1 ... 8
----------------------------
0101  2 ... -    2 ... 0
0110  2 ... 1/2  2 ... 5
0111  3 ... -    3 ... 0
1000  3 ... 1/2  3 ... 5
1001  4 ... -    4 ... 0
1010  4 ... 1/2  4 ... 5
----------------------------
1011  5          5 ... 0
1100  6          6 ... 0
1101  7          7 ... 0
1110  8          8 ... 0
1111  9          9 ... 0
If the first digit of a number is 1, then a fraction in fifths is included at the end of the number after the least significant digit of the mantissa. If it is 2, 3, or 4, a fraction in halves is included; if it is 5, 6, 7, 8, or 9, then no significance is added to the mantissa.
Instead of thinking in terms of fractions, it may perhaps be easier to understand if the code is thought of as indicating in combination the digit to be appended to the left of the main mantissa field, and the digit to be appended to the right of the main mantissa field.
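Under that interpretation, the four-bit field can be decoded by table lookup into the pair of digits to wrap around the main mantissa field (a sketch of my own, following the sixteen-entry table above):

```python
# code -> (digit prepended to the mantissa, digit appended after it)
COMBINED = {
    0b0000: (1, 0), 0b0001: (1, 2), 0b0010: (1, 4), 0b0011: (1, 6), 0b0100: (1, 8),
    0b0101: (2, 0), 0b0110: (2, 5), 0b0111: (3, 0), 0b1000: (3, 5),
    0b1001: (4, 0), 0b1010: (4, 5),
    0b1011: (5, 0), 0b1100: (6, 0), 0b1101: (7, 0), 0b1110: (8, 0), 0b1111: (9, 0),
}

def expand(code, middle_digits):
    """Wrap the first digit and the partial last digit around the
    middle of the mantissa, given as a string of decimal digits."""
    first, last = COMBINED[code]
    return str(first) + middle_digits + str(last)

print(expand(0b0010, "234"))   # '12344', i.e. the digit string of 1.2344
```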
In effect, instead of the radix jumping by steps of 10, the least significant unit of a decimal number now moves by three gentler steps of 2, 2.5, and 2.
If the middle of the mantissa holds three decimal digits only, as an example, the result of representing an additional partial last digit of the mantissa along with the first digit in this form is to keep the value of the unit in the last place of the number within tighter bounds, as shown in the table below:
Mantissa Range      Unit in      Size relative
                    last place   to number
.10000 to .19998    .00002       .02%  to .01%
.20000 to .49995    .00005       .025% to .01%
.5000  to .9999     .0001        .02%  to .01%
whereas, if one simply had a four-digit mantissa going from .1000 to .9999, one unit in the last place would vary from .1% to .01%, a factor of 10, instead of the maximum range of a factor of 2.5 achieved above.
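These bounds can be confirmed by dividing each unit in the last place by the endpoints of its range:

```python
# (low end, high end, unit in the last place) for each segment
segments = [(0.10000, 0.19998, 0.00002),
            (0.20000, 0.49995, 0.00005),
            (0.50000, 0.99990, 0.00010)]
for low, high, ulp in segments:
    # relative size of one unit in the last place across the segment
    print(f"{ulp / low:.3%} down to {ulp / high:.3%}")
```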
If we were to change the three ranges in the table above to:
Mantissa Range      Unit in      Size relative
                    last place   to number
.10000 to .19998    .00002       .02%  to .01%
.20000 to .39995    .00005       .025% to .0125%
.4000  to .9999     .0001        .025% to .01%
we still keep the precision of numbers within the range of a factor of 2.5, even though we lose one bit of precision for numbers whose first digit is 4.
Thus, our combined first-and-last digit field now can have the following coding:
0000  0
----------------------------
0001  1 ... -    1 ... 0
0010  1 ... 1/5  1 ... 2
0011  1 ... 2/5  1 ... 4
0100  1 ... 3/5  1 ... 6
0101  1 ... 4/5  1 ... 8
----------------------------
0110  2 ... -    2 ... 0
0111  2 ... 1/2  2 ... 5
1000  3 ... -    3 ... 0
1001  3 ... 1/2  3 ... 5
----------------------------
1010  4          4 ... 0
1011  5          5 ... 0
1100  6          6 ... 0
1101  7          7 ... 0
1110  8          8 ... 0
1111  9          9 ... 0
which frees up the code 0000 to represent 0 as a leading digit. But, as a way of allowing unnormalized numbers, it has the serious flaw that the last digit, having five possible values when the first digit is 1, has now suddenly disappeared. So a jarring loss of precision takes place when a number is unnormalized; in effect, it is possible to indicate when the first two or more digits are zero, but not when the first digit is zero.
Fixing this seems to require accepting a result which no longer makes such a good fit: as a minimum, zero as a first digit has to be replaced by five different codes, making for a total of 20 codes, which is somewhat wasteful to code in five bits. One way to fix this is to go from 20 codes to 60 codes, a good fit to six bits, using the same trick with the leading part of the exponent that was used for the combination field in the standard decimal floating point format described above.
Perhaps we can also solve matters by adding a zero first digit to another coding, which is not quite as good a fit to begin with.
As a first try, we can think in terms of adding a third of a digit to our target numerical precision, to get a table like the following:
0...0  1...0  2...0  3...0  4...0  5...0  6...0  7...0  8...0  9...0
0...1  1...1
0...2  1...2  2...2  3...2
0...3  1...3
0...4  1...4  2...4  3...4
0...5  1...5  4...5  5...5  6...5  7...5  8...5  9...5
0...6  1...6  2...6  3...6
0...7  1...7
0...8  1...8  2...8  3...8
0...9  1...9
This results in a need for 42 additional codes; this is just over twice 20 codes, and thus still has its limitations as a fit. We could, however, go to 52 codes by appending 00, 05, 10, 15, 20, and so on to the end of the mantissa when the first digit is zero, and this would be a reasonable manifestation of concern for the precision of unnormalized numbers. A similar step would increase the number of codes in the case discussed above from 20 codes to 25.
The next possibility is to add two-thirds of a digit to our target precision.
So, the first digit 1 would have twenty codes allocated to it, for appended digits 00, 05, 10, to 95 after the main mantissa. The first digits 2 and 3 would each have ten codes allocated to them, for appended digits 0, 1, 2, through 9 appended after the main mantissa. The first digits 4 through 7 would each have five codes, for appended digits 0, 2, 4, 6, and 8 appended. As with the first digit 4 in the first case considered, keeping the precision within bounds of a factor of 2.5 only requires two codes (instead of five) for the first digits 8 and 9, for appended digits 0 and 5.
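The code counts just described add up to 64:

```python
# codes per leading digit: 1 -> 20; 2 and 3 -> 10 each;
# 4 through 7 -> 5 each; 8 and 9 -> 2 each
codes = 20 + 2 * 10 + 4 * 5 + 2 * 2
print(codes)   # 64
```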
This works out to a total of 64 codes. While above, a tight fit took 15 codes, and a loose fit 16, here a tight fit takes 64 codes, and a loose fit 70 codes.
An additional 20 codes are needed for zero; if that number is increased, the result would be 50 codes. 50 plus 70 is 120, which is indeed a good fit to 128. That is equivalent to adding a bit to the length of the mantissa in order to allow unnormalized values... which, of course, illustrates that the previous scheme really did achieve the near-equivalent of suppressing the first bit of a decimal mantissa!
But it does also mean that it is awkward to set up a system which combines this particular modification of floating-point with support for unnormalized numbers.