Why Are Floating Point Numbers Inaccurate?
Why do some numbers lose accuracy when stored as floating point numbers?

For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:

32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875

How can such an apparently simple number be "too big" to express in 64 bits of memory?
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, 9.2, is actually this fraction:

5179139571476070 * 2^-49

where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.

9.2 may be simply 92/10, but 10 cannot be expressed as 2^n if n is limited to integer values.
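You can check this claim directly in a Python session: 5179139571476070 is below 2^53, so it converts to a float without rounding, and multiplying by a power of two only adjusts the exponent, so the product is exact:

>>> 9.2 == 5179139571476070 * 2**-49
True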
Seeing the Data
First, a few functions to see the components that make up a 32- and 64-bit float. Gloss over these if you only care about the output (example in Python):
import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    if bits == 32:          # single precision
        int_pack      = 'I'
        float_pack    = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:        # double precision. all Python floats are this
        int_pack      = 'Q'
        float_pack    = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    # reinterpret the float's bytes as an unsigned integer, then render
    # that integer as a zero-padded binary string
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    # split the bit string into its sign, exponent and mantissa fields
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity behind that function, and it'd be quite the tangent to explain here; if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double precision is a separate type, double, which is often implemented as 64 bits.
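You can confirm this from the interpreter; sys.float_info describes the platform's float type:

>>> import sys
>>> sys.float_info.mant_dig   # 53 significand bits: 52 stored + 1 implicit
53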
When we call that function with our example, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
Interpreting the Data
You'll see I've split the return value into three components. These components are:

- Sign
- Exponent
- Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
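For instance, passing a negative number to the function above flips that first bit, while the other two components are unchanged:

>>> float_to_bin_parts(-9.2)[0]
'1'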
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
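The same subtraction in the REPL, using the middle element returned above:

>>> int('10000000010', 2) - 1023   # raw exponent minus the bias
3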
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand it, consider a number in scientific notation, like this:

6.0221413 x 10^23

The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary has only two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.

When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal; for more detail, see: 675539944105574 / 4503599627370496.)

Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
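The division step, reproduced in Python 3 (true division of the 52 stored bits by 2^52; adding 1 to the result then yields the true mantissa):

>>> frac = int('0010011001100110011001100110011001100110011001100110', 2)
>>> frac
675539944105574
>>> frac / 2**52   # shift the bits 52 places right of the radix point
0.1499999999999999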
Recapping the Components
- Sign (first component): 0 for positive, 1 for negative
- Exponent (middle component): subtract 2^(# of bits - 1) - 1 to get the true exponent
- Mantissa (last component): divide by 2^(# of bits) and add 1 to get the true mantissa
Calculating the Number
Putting all three parts together, we're given this binary number:

1.0010011001100110011001100110011001100110011001100110 x 10^11
which we can then convert from binary to decimal:

1.1499999999999999 x 2^3 (inexact!)
and multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
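Here's that whole reassembly as a small Python sketch. The name parts_to_float is mine, it handles only the normal 64-bit case (no zeros, subnormals, infinities or NaN), and the result prints as 9.2 because Python shows the shortest decimal string that round-trips to the stored value:

def parts_to_float(sign, exponent, mantissa, exponent_bias=1023):
    true_exponent = int(exponent, 2) - exponent_bias            # remove the bias
    true_mantissa = 1 + int(mantissa, 2) / 2 ** len(mantissa)   # restore the leading 1
    return (-1) ** int(sign, 2) * true_mantissa * 2 ** true_exponent

>>> parts_to_float(*float_to_bin_parts(9.2))
9.2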
Representing as a Fraction

9.2
Now that we've built the number, it's possible to reconstruct it as a simple fraction:
1.0010011001100110011001100110011001100110011001100110 x 1011
Shift the mantissa to a whole number:

10010011001100110011001100110011001100110011001100110 x 10^(11-110100)

Convert to decimal:

5179139571476070 x 2^(3-52)

Subtract the exponent:

5179139571476070 x 2^-49

Turn the negative exponent into division:

5179139571476070 / 2^49

Multiply out the exponent:

5179139571476070 / 562949953421312

which equals:
9.1999999999999993
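The standard library will confirm this fraction: a fractions.Fraction built from a float uses the exact stored value. (It reduces the ratio to lowest terms, cancelling a factor of 2 out of the even mantissa, but it is the same number.)

>>> from fractions import Fraction
>>> Fraction(9.2)
Fraction(2589569785738035, 281474976710656)
>>> Fraction(9.2) == Fraction(5179139571476070, 562949953421312)
True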
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces anyway.
Assemble the binary scientific notation:

1.0011 x 10^11

Shift the decimal point:

10011 x 10^(11-100)

Subtract the exponent:

10011 x 10^-1

Binary to decimal:

19 x 2^-1

Negative exponent to division:

19 / 2^1

Multiply out the exponent:

19 / 2

which equals:
9.5
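Exactly 9.5: because 9.5 is 19/2, an integer over a power of two, it is stored with no error at all. You can see the exact ratio directly:

>>> (9.5).as_integer_ratio()
(19, 2)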
Further Reading
- The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why Don't My Numbers Add Up? (floating-point-gui.de)
- What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
- IEEE double-precision floating-point format (Wikipedia)
- Floating Point Arithmetic: Issues and Limitations (docs.python.org)
- Floating point binary