IBM has recently developed a decimal floating point format which it is including on its new z9 computers. These computers replace the z990, the previous top-of-the-line z/Architecture machine from IBM, z/Architecture being the 64-bit extension to the architecture which began with System/360 and continued with extensions to System/370 and ESA/390.
This section refers to instructions which implement operations on numbers in that format and in related formats.
This format is also described on this page.
The basic characteristics of this data type are as follows:
Three data types are defined. All three data types feature a five-bit field which contains both the first decimal digit of the mantissa (or coefficient) of the floating-point number, and the first two bits of the exponent (which is in binary form), those two bits being allowed to take only the values 00, 01, or 10, but not 11.
This provides an efficient means of coding decimal floating-point numbers, as in each case, the remaining digits of the mantissa are all contained within 10-bit fields. Had there been no extra decimal digit left over, of course, a simple binary exponent field would have been just as efficient, and simpler, but as it happened, the coding scheme used allowed efficient coding to be retained while providing an exponent field which was neither too large nor too small, particularly for 32-bit and 64-bit floating-point values, and it also ensured that the size of the exponent field would monotonically increase as the length of the number increased.
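As an illustration of how the five-bit field can hold ten digit values and three exponent-prefix values at once, here is a minimal Python sketch. It assumes the bit assignment used by the IEEE 754-2008 decimal interchange formats, which the text above does not spell out; the function name `decode_combination` is hypothetical.

```python
def decode_combination(field):
    """Split a 5-bit combination field into (exponent_high_bits, leading_digit).

    Assumed layout (IEEE 754-2008 decimal interchange formats):
      ab cde   with ab != 11 -> exponent bits ab, digit 0cde (0 to 7)
      11 cd e  with cd != 11 -> exponent bits cd, digit 100e (8 or 9)
      11110 -> infinity, 11111 -> NaN
    """
    if not 0 <= field < 32:
        raise ValueError("combination field is five bits wide")
    if field == 0b11110:
        return ("inf", None)
    if field == 0b11111:
        return ("nan", None)
    if field >> 3 != 0b11:
        # Leading digit 0-7: top two bits are the exponent prefix.
        return (field >> 3, field & 0b111)
    # Leading digit 8 or 9: the exponent prefix moves down two places.
    return ((field >> 1) & 0b11, 8 + (field & 1))


print(decode_combination(0b01101))  # (1, 5)
print(decode_combination(0b11011))  # (1, 9)
```

The 30 ordinary codes cover every pairing of an exponent prefix of 0, 1, or 2 with a digit from 0 to 9, which is why the prefix can never be 11.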
Since this data type permits unnormalized values to be represented, three groups of instructions are provided: the humanized floating-point instructions given below, which follow the "ideal exponent" rules described in the standard; instructions for conventional unnormalized operation, for the purpose of carrying out significance arithmetic; and instructions for conventional normalized arithmetic.
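The effect of the "ideal exponent" rules can be seen with Python's standard `decimal` module, which follows the same preferred-exponent conventions (the smaller of the two exponents for addition, the sum of the exponents for multiplication). This is only an illustration in software; it says nothing about how the instructions described here are implemented.

```python
from decimal import Decimal

a = Decimal("1.20")   # coefficient 120, exponent -2
b = Decimal("1.3")    # coefficient 13,  exponent -1

total = a + b         # preferred exponent: min(-2, -1) = -2
product = a * b       # preferred exponent: (-2) + (-1) = -3

print(total, total.as_tuple().exponent)      # 2.50 -2
print(product, product.as_tuple().exponent)  # 1.560 -3
```

The trailing zeros are retained because the results are not normalized; they record how many decimal places the operands carried.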
In addition, another type is provided that allows only normalized numbers to be represented, and which may include a partial decimal digit appended at the end of the number depending on the value of the first digit. This type is called numeric floating register compressed decimal. The coding of the first digit is shown in the table below:
0000   1 ... -      1 ... 0
0001   1 ... 1/5    1 ... 2
0010   1 ... 2/5    1 ... 4
0011   1 ... 3/5    1 ... 6
0100   1 ... 4/5    1 ... 8
----------------------------
0101   2 ... -      2 ... 0
0110   2 ... 1/2    2 ... 5
0111   3 ... -      3 ... 0
1000   3 ... 1/2    3 ... 5
1001   4 ... -      4 ... 0
1010   4 ... 1/2    4 ... 5
----------------------------
1011   5            5 ... 0
1100   6            6 ... 0
1101   7            7 ... 0
1110   8            8 ... 0
1111   9            9 ... 0
The four-bit field containing the first and last digits of the mantissa is referred to as the compound field. In the case of 32, 64, and 128-bit numbers, it replaces the combination field, and so the length of the exponent field is increased by one bit, leading to the range of exponents being first divided by three and then multiplied by two as compared to that in the standard format.
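The compound field coding in the table above amounts to a sixteen-entry lookup. The following Python sketch transcribes it and also checks the change in exponent range described in the preceding paragraph; the names `COMPOUND`, `decode_compound`, and `numeric_exponent_values` are hypothetical, and the sketch ignores the special handling that applies when the exponent is at its minimum value.

```python
# (first digit, appended last digit) for each 4-bit compound field value,
# transcribed from the table in the text.
COMPOUND = [
    (1, 0), (1, 2), (1, 4), (1, 6), (1, 8),          # 0000-0100: fifths
    (2, 0), (2, 5), (3, 0), (3, 5), (4, 0), (4, 5),  # 0101-1010: halves
    (5, 0), (6, 0), (7, 0), (8, 0), (9, 0),          # 1011-1111: no extra digit
]


def decode_compound(code):
    """Return (first_digit, appended_digit) for a 4-bit compound field."""
    return COMPOUND[code]


def numeric_exponent_values(standard_values):
    """Exponent range after the combination field is replaced:
    divided by three, then multiplied by two."""
    return standard_values // 3 * 2


print(decode_compound(0b0110))        # (2, 5)
print(numeric_exponent_values(192))   # 128
```

Numbers beginning with 1 get five possible final digits, those beginning with 2, 3, or 4 get two, and the rest get none, so the relative precision varies less across a decade than it would with a fixed digit count.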
In alternate precisions, the general rule is that the compound field encodes the most significant digit of the number, and then the remaining digits are encoded as appropriate, with a combination field if the number of remaining digits is of the form 3n+1.
For the regular floating register compressed decimal type, when a combination field is present, the values 11110 and 11111 in that field are used, as provided for by the revised IEEE 754 standard, to encode infinity and NaN. When one is not present, inadmissible codes for the 7-bit or 10-bit field including the most significant digit of the number will be used.
For the numeric floating register compressed decimal type, gradual underflow is provided for by replacing the compound field with a four-bit field containing a single BCD digit when the exponent is at its minimum value; thus, in that case, the most significant digit may be zero. For this type, whether or not a combination field is present, the values E and F in the compound field, when the exponent is at its minimum value, encode infinity and NaN respectively.
The instructions which deal with these numbers have the opcodes shown below:
173706 000xxx  SWFRC   Swap Floating Register Compressed
173706 001xxx  CFRC    Compare Floating Register Compressed
173706 002xxx  LFRC    Load Floating Register Compressed
173706 003xxx  STFRC   Store Floating Register Compressed
173706 004xxx  AFRC    Add Floating Register Compressed
173706 005xxx  SFRC    Subtract Floating Register Compressed
173706 006xxx  MFRC    Multiply Floating Register Compressed
173706 007xxx  DFRC    Divide Floating Register Compressed

173706 012xxx  LUFRC   Load Unnormalized Floating Register Compressed
173706 013xxx  STUFRC  Store Unnormalized Floating Register Compressed
173706 014xxx  AUFRC   Add Unnormalized Floating Register Compressed
173706 015xxx  SUFRC   Subtract Unnormalized Floating Register Compressed
173706 016xxx  MUFRC   Multiply Unnormalized Floating Register Compressed
173706 017xxx  DUFRC   Divide Unnormalized Floating Register Compressed

173706 024xxx  AFRCH   Add Floating Register Compressed Humanized
173706 025xxx  SFRCH   Subtract Floating Register Compressed Humanized
173706 026xxx  MFRCH   Multiply Floating Register Compressed Humanized
173706 027xxx  DFRCH   Divide Floating Register Compressed Humanized

173706 040xxx  SWDRC   Swap Double Register Compressed
173706 041xxx  CDRC    Compare Double Register Compressed
173706 042xxx  LDRC    Load Double Register Compressed
173706 043xxx  STDRC   Store Double Register Compressed
173706 044xxx  ADRC    Add Double Register Compressed
173706 045xxx  SDRC    Subtract Double Register Compressed
173706 046xxx  MDRC    Multiply Double Register Compressed
173706 047xxx  DDRC    Divide Double Register Compressed

173706 052xxx  LUDRC   Load Unnormalized Double Register Compressed
173706 053xxx  STUDRC  Store Unnormalized Double Register Compressed
173706 054xxx  AUDRC   Add Unnormalized Double Register Compressed
173706 055xxx  SUDRC   Subtract Unnormalized Double Register Compressed
173706 056xxx  MUDRC   Multiply Unnormalized Double Register Compressed
173706 057xxx  DUDRC   Divide Unnormalized Double Register Compressed

173706 064xxx  AFDCH   Add Double Register Compressed Humanized
173706 065xxx  SFDCH   Subtract Double Register Compressed Humanized
173706 066xxx  MFDCH   Multiply Double Register Compressed Humanized
173706 067xxx  DFDCH   Divide Double Register Compressed Humanized

173706 100xxx  SWQRC   Swap Quad Register Compressed
173706 101xxx  CQRC    Compare Quad Register Compressed
173706 102xxx  LQRC    Load Quad Register Compressed
173706 103xxx  STQRC   Store Quad Register Compressed
173706 104xxx  AQRC    Add Quad Register Compressed
173706 105xxx  SQRC    Subtract Quad Register Compressed
173706 106xxx  MQRC    Multiply Quad Register Compressed
173706 107xxx  DQRC    Divide Quad Register Compressed

173706 112xxx  LUQRC   Load Unnormalized Quad Register Compressed
173706 113xxx  STUQRC  Store Unnormalized Quad Register Compressed
173706 114xxx  AUQRC   Add Unnormalized Quad Register Compressed
173706 115xxx  SUQRC   Subtract Unnormalized Quad Register Compressed
173706 116xxx  MUQRC   Multiply Unnormalized Quad Register Compressed
173706 117xxx  DUQRC   Divide Unnormalized Quad Register Compressed

173706 124xxx  AFDCH   Add Quad Register Compressed Humanized
173706 125xxx  SFDCH   Subtract Quad Register Compressed Humanized
173706 126xxx  MFDCH   Multiply Quad Register Compressed Humanized
173706 127xxx  DFDCH   Divide Quad Register Compressed Humanized

173706 140xxx  SWNFRC  Swap Numerical Floating Register Compressed
173706 141xxx  CNFRC   Compare Numerical Floating Register Compressed
173706 142xxx  LNFRC   Load Numerical Floating Register Compressed
173706 143xxx  STNFRC  Store Numerical Floating Register Compressed
173706 144xxx  ANFRC   Add Numerical Floating Register Compressed
173706 145xxx  SNFRC   Subtract Numerical Floating Register Compressed
173706 146xxx  MNFRC   Multiply Numerical Floating Register Compressed
173706 147xxx  DNFRC   Divide Numerical Floating Register Compressed

173706 150xxx  SWNDRC  Swap Numerical Double Register Compressed
173706 151xxx  CNDRC   Compare Numerical Double Register Compressed
173706 152xxx  LNDRC   Load Numerical Double Register Compressed
173706 153xxx  STNDRC  Store Numerical Double Register Compressed
173706 154xxx  ANDRC   Add Numerical Double Register Compressed
173706 155xxx  SNDRC   Subtract Numerical Double Register Compressed
173706 156xxx  MNDRC   Multiply Numerical Double Register Compressed
173706 157xxx  DNDRC   Divide Numerical Double Register Compressed

173706 160xxx  SWNQRC  Swap Numerical Quad Register Compressed
173706 161xxx  CNQRC   Compare Numerical Quad Register Compressed
173706 162xxx  LNQRC   Load Numerical Quad Register Compressed
173706 163xxx  STNQRC  Store Numerical Quad Register Compressed
173706 164xxx  ANQRC   Add Numerical Quad Register Compressed
173706 165xxx  SNQRC   Subtract Numerical Quad Register Compressed
173706 166xxx  MNQRC   Multiply Numerical Quad Register Compressed
173706 167xxx  DNQRC   Divide Numerical Quad Register Compressed
As well, a few additional instructions are provided for the regular register compressed formats that provide targeted arithmetic.
173706 17nnnn 014xxx  ATFRC   Add Targeted Floating Register Compressed
173706 17nnnn 015xxx  STFRC   Subtract Targeted Floating Register Compressed
173706 17nnnn 016xxx  MTFRC   Multiply Targeted Floating Register Compressed
173706 17nnnn 017xxx  DTFRC   Divide Targeted Floating Register Compressed

173706 17nnnn 024xxx  AETFRC  Add Extensibly Targeted Floating Register Compressed
173706 17nnnn 025xxx  SETFRC  Subtract Extensibly Targeted Floating Register Compressed
173706 17nnnn 026xxx  METFRC  Multiply Extensibly Targeted Floating Register Compressed
173706 17nnnn 027xxx  DETFRC  Divide Extensibly Targeted Floating Register Compressed

173706 17nnnn 054xxx  ATDRC   Add Targeted Double Register Compressed
173706 17nnnn 055xxx  STDRC   Subtract Targeted Double Register Compressed
173706 17nnnn 056xxx  MTDRC   Multiply Targeted Double Register Compressed
173706 17nnnn 057xxx  DTDRC   Divide Targeted Double Register Compressed

173706 17nnnn 064xxx  AETDRC  Add Extensibly Targeted Double Register Compressed
173706 17nnnn 065xxx  SETDRC  Subtract Extensibly Targeted Double Register Compressed
173706 17nnnn 066xxx  METDRC  Multiply Extensibly Targeted Double Register Compressed
173706 17nnnn 067xxx  DETDRC  Divide Extensibly Targeted Double Register Compressed

173706 17nnnn 114xxx  ATQRC   Add Targeted Quad Register Compressed
173706 17nnnn 115xxx  STQRC   Subtract Targeted Quad Register Compressed
173706 17nnnn 116xxx  MTQRC   Multiply Targeted Quad Register Compressed
173706 17nnnn 117xxx  DTQRC   Divide Targeted Quad Register Compressed

173706 17nnnn 124xxx  AETQRC  Add Extensibly Targeted Quad Register Compressed
173706 17nnnn 125xxx  SETQRC  Subtract Extensibly Targeted Quad Register Compressed
173706 17nnnn 126xxx  METQRC  Multiply Extensibly Targeted Quad Register Compressed
173706 17nnnn 127xxx  DETQRC  Divide Extensibly Targeted Quad Register Compressed
In these instructions, the field marked xxx contains the destination register, the index register or source register, and the base register in the usual manner for memory-reference instructions. The field marked nnnn contains a twelve-bit target exponent value in excess-6,176 format, matching the exponent in the largest size of register compressed decimal numbers.
For decimal fixed-point arithmetic where all the numbers involved have the same exponent value, only a small range of exponent values is useful, since otherwise multiplication and division cannot produce a usable result. However, the inputs to a targeted instruction may have any exponent, and so the target exponent of the result can be one applicable to holding the result of an operation on two operands whose exponents are themselves determined through previous targeted operations, but which differ from that which is specified for the result.
A targeted arithmetic operation has the final operand aligned so that its exponent has the value specified as the target. This permits fixed-point arithmetic to be carried out automatically, without separate instructions for alignment, and in addition it has the benefit that since the fixed-point quantities are valid floating-point quantities, they are tagged with an indication of their magnitude. Normally, fixed-point arithmetic depends on adjustment steps being carried out after multiplies and divides, and the fixed-point quantities, being no different from the patterns of bits that represent integers, can easily be used incorrectly in calculations that assume a different location of the radix point.
Extensibly targeted arithmetic operations are carried out without rounding, and overflows from the most significant part of the mantissa will be ignored unless integer overflows are trapped, so they behave like integer operations in this respect as well. Ordinary targeted arithmetic operations, on the other hand, do not do this, so as to produce valid numerical results that can be incorporated into floating-point calculations.
This is inspired by a capability provided by the NORC computer.
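The behavior of an ordinary targeted operation, as described above, resembles performing the arithmetic and then forcing the result to a chosen exponent. A rough software analogue, using Python's standard `decimal` module, is sketched below. The helper name `targeted` is hypothetical, and the choice of round-half-even is an assumption, since the text does not say which rounding mode the instructions would use.

```python
import operator
from decimal import Decimal, ROUND_HALF_EVEN


def targeted(op, a, b, target_exponent):
    """Apply op to a and b, then align the result so that its exponent
    equals target_exponent (rounding mode is an assumption)."""
    unit = Decimal(1).scaleb(target_exponent)   # e.g. 0.01 for a target of -2
    return op(a, b).quantize(unit, rounding=ROUND_HALF_EVEN)


price = Decimal("1.25")      # exponent -2
quantity = Decimal("3.1")    # exponent -1

# The operands have different exponents; the result is still delivered
# with exactly two decimal places, with no separate alignment step.
print(targeted(operator.mul, price, quantity, -2))  # 3.88
print(targeted(operator.add, price, quantity, -3))  # 4.350
```

Because the results remain ordinary decimal floating-point values, each one carries its own exponent, which is the tagging benefit mentioned above.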
Note that the use of a combination field, while it is appropriate with floating-point sizes of 32, 64, and 128 bits, may not necessarily work well with floating-point sizes of 48 and 96 bits, 36 and 72 bits, 30, 60, and 120 bits, or 40 and 80 bits.
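This is because the overall length of the field in memory allocated to a floating-point number determines the number of decimal digits of precision it may have. Given that the compressed decimal format involves placing three digits at a time in a 10-bit long field, and the design of the combination field was predicated on there being one digit left over after a number of such fields for each of the three formats defined, we can conclude that there are three possible cases, according to the number of digits left over after the 10-bit fields: one digit, held in the five-bit combination field together with the first two bits of the exponent; two digits, held in a 7-bit field; or none, so that the coefficient consists of 10-bit fields alone.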
Given these three choices of format, it seems as though decimal floating-point when implemented across varying word sizes, if it is desired to maintain a relatively close correspondence with the exponent sizes provided by the existing IBM formats, and to follow the same rule as they in regards to choice of exponent bias, might lead to the following formats:
Value size   Exponent Values         Exponent Bias        Precision in Digits   Sign   Exponent   Coefficient   Conventional Exponent Bias
 32 bits     3 *    64 =    192        101 (   96+ 5)      2 * 3 + 1 =  7        1      6+(2-)     20+(3+)          94
 64 bits     3 *   256 =    768        398 (  384+14)      5 * 3 + 1 = 16        1      8+(2-)     50+(3+)         382
128 bits     3 * 4,096 = 12,288      6,176 (6,144+32)     11 * 3 + 1 = 34        1     12+(2-)    110+(3+)       6,142

 36 bits                    256        134 (  128+ 6)      2 * 3 + 2 =  8        1      8          20+7            126
 72 bits                  2,048      1,040 (1,024+16)      6 * 3     = 18        1     11          60            1,022

 48 bits                  1,024        521 (  512+ 9)      3 * 3 + 2 = 11        1     10          30+7            510
 96 bits     3 * 1,024 =  3,072      1,559 (1,536+23)      8 * 3 + 1 = 25        1     10+(2-)     80+(3+)       1,534

 30 bits                    512        260 (  256+ 4)      2 * 3     =  6        1      9          20              254
 60 bits                    512        269 (  256+13)      5 * 3     = 15        1      9          50              254
120 bits                  4,096      2,078 (2,048+30)     10 * 3 + 2 = 32        1     12         100+7          2,046

 40 bits                    512        263 (  256+ 7)      3 * 3     =  9        1      9          30              254
 80 bits                  4,096      2,066 (2,048+18)      6 * 3 + 2 = 20        1     12          60+7          2,046
The notations (2-) and (3+) above refer to components of the 5-bit field which combines a value from 0 to 2 for the beginning of the exponent with a value from 0 to 9 for the beginning of the mantissa included in IBM's decimal floating point format.
The final column, Conventional Exponent Bias, shows what the exponent bias would be, if the radix point of the coefficient (or mantissa) were regarded, as has been the more common convention, as being at the beginning of the field rather than at the end of the field. This is derived by subtracting the precision of the number in digits from the exponent bias value normally given for the format, which is itself that number of digits, less two, added to half the exponent range.
An exponent in excess-n notation has n subtracted from the exponent to determine the power of the radix by which the mantissa is to be multiplied, and so, if we regard the mantissa as a fraction instead of an integer, we are making it smaller, and that power needs to be increased. Therefore, n, which is subtracted from it, is decreased. Thus, the difference between this floating-point format and conventional formats, which place the radix point in front of the mantissa and simply choose an exponent bias which is half the exponent range without adjustment, is not as large as it seems at first.
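The rule just stated can be written out directly. The short Python sketch below computes both bias values from the number of exponent values and the precision in digits; the function name `exponent_biases` is hypothetical.

```python
def exponent_biases(exponent_values, digits):
    """Return (bias, conventional_bias) for a format.

    bias: radix point taken to be at the end of the coefficient,
          that is, half the exponent range plus the digits, less two.
    conventional_bias: radix point taken to be at the start of the
          coefficient, that is, the bias less the precision in digits.
    """
    half_range = exponent_values // 2
    bias = half_range + digits - 2
    conventional_bias = bias - digits
    return bias, conventional_bias


print(exponent_biases(192, 7))      # (101, 94)
print(exponent_biases(768, 16))     # (398, 382)
print(exponent_biases(12288, 34))   # (6176, 6142)
```

The conventional figure always works out to half the exponent range less two, which is the sense in which the difference from conventional formats is smaller than it first appears.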
Since it is felt that each of the series of word sizes would normally be used independently, strict monotonicity between series is not treated as an overriding goal. In one case, the series of 30, 60, and 120 bits, even monotonicity in the growth of the exponent field within a series had to be set aside in order to achieve a reasonably large exponent field for the 30 bit size without this leading to excessively-large exponent fields for the other sizes.
Note that, in the absence of a 5-bit field combining the start of the exponent and mantissa, it is assumed that no limitation is placed on the range of the exponent field in order to indicate infinity and NaN values. Thus, either the 7 bit field giving the first two digits of the mantissa, or the 10 bit field giving the first three digits of the mantissa, would presumably be used for that purpose, two of the 28 or 24 unused combinations of bits serving this purpose.
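The way the precision in digits determines which leading field is needed, and the count of spare codes available for infinity and NaN, can be checked with a few lines of Python. This is only a sketch of the bookkeeping; the names `coefficient_fields` and `total_bits` are hypothetical, and the exponent width passed in is the separate exponent field, not counting the two exponent bits inside a combination field.

```python
def coefficient_fields(digits):
    """Return (ten_bit_groups, leading_field_bits) for a coefficient.

    One digit left over  -> 5-bit combination field (digit plus 2 exponent bits)
    Two digits left over -> 7-bit field holding both digits
    No digits left over  -> no leading field at all
    """
    groups, leftover = divmod(digits, 3)
    leading_bits = {0: 0, 1: 5, 2: 7}[leftover]
    return groups, leading_bits


def total_bits(exponent_field_bits, digits):
    """Sign bit + exponent field + leading field + 10-bit groups."""
    groups, leading_bits = coefficient_fields(digits)
    return 1 + exponent_field_bits + leading_bits + 10 * groups


# Codes left unused when decimal digits are packed into binary fields.
UNUSED_7_BIT = 2**7 - 10**2     # two digits in seven bits
UNUSED_10_BIT = 2**10 - 10**3   # three digits in ten bits

print(total_bits(6, 7), total_bits(8, 8), total_bits(11, 18))  # 32 36 72
print(UNUSED_7_BIT, UNUSED_10_BIT)                             # 28 24
```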
In the case of the numerical register compressed decimal floating-point data type, for the 32, 64, and 128 bit-long data types, the five-bit combination field representing the first two bits of the exponent and the first digit of the number is replaced by a four-bit field representing the first digit of the number, an extra partial digit appended to the end of the number, and, in effect, the last two bits of the exponent if it is thought of as applying to a mixed-radix system with radices alternating between 2, 2.5, and 2 in a cycle of three.
For other lengths, this four bit field representing the first digit of the number must be retained, and therefore the presence of a seven-bit field containing the next two digits of the number, or a combination field, following the form used in the previous numerical format, but in this case containing the second most significant digit of the number, is determined by the number of digits represented (ignoring the final appended partial digit) less one.
The resulting numerical formats are:
Value size   Exponent Values        Precision in Digits   Sign   Exponent   Compound   Mantissa
 32 bits                   128       2 * 3 + 1 =  7        1      7          4          20
 64 bits                   512       5 * 3 + 1 = 16        1      9          4          50
128 bits                 8,192      11 * 3 + 1 = 34        1     13          4         110

 36 bits     3 *    64 =   192       2 * 3 + 2 =  8        1      6+(2-)     4          20+(3+)
 72 bits                 1,024       6 * 3     = 18        1     10          4          50+7

 48 bits     3 *   256 =   768       3 * 3 + 2 = 11        1      8+(2-)     4          30+(3+)
 96 bits                 2,048       8 * 3 + 1 = 25        1     11          4          80

 30 bits                   256       2 * 3     =  6        1      8          4          10+7
 60 bits                   256       5 * 3     = 15        1      8          4          40+7
120 bits     3 * 1,024 = 3,072      10 * 3 + 2 = 32        1     10+(2-)     4         100+(3+)

 40 bits                   256       3 * 3     =  9        1      8          4          20+7
 80 bits     3 * 1,024 = 3,072       6 * 3 + 2 = 20        1     10+(2-)     4          60+(3+)
In this format, unlike the one supporting unnormalized operation, the decimal point of the mantissa field lies before the most significant digit, and the exponent bias is always one-half the number of possible values for the exponent.
When the exponent is at its minimum value, the four bit compound field instead contains a single BCD digit, which may be zero, to allow gradual underflow as with the standard floating-point type.
The layout of the formats in the different sizes for these two types of floating-point number is illustrated below:
