# Module `Base.Float`

Floating-point representation and utilities.

If using 32-bit OCaml, you cannot quite assume operations act as you'd expect for IEEE 64-bit floats. E.g., one can have `let x = ~-. (2. ** 62.) in x = x -. 1.` evaluate to `false` while `let x = ~-. (2. ** 62.) in let y = x -. 1. in x = y` evaluates to `true`. This is related to 80-bit registers being used for calculations; you can force representation as a 64-bit value by let-binding.

`val hash_fold_t : Hash.state -> t -> Hash.state`

`val hash : t -> Hash.hash_value`

`max` and `min` will return nan if either argument is nan.

The `validate_*` functions always fail if class is `Nan` or `Infinite`.
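The nan-propagating behavior of `max` and `min` described above can be illustrated with a small sketch written against the OCaml standard library only. The names `max_nan` and `min_nan` are hypothetical stand-ins for illustration, not Base's implementation.

```ocaml
(* Hypothetical stand-ins for a nan-propagating max/min; not Base's code. *)
let max_nan (x : float) (y : float) : float =
  if Float.is_nan x || Float.is_nan y then Float.nan
  else if x >= y then x
  else y

let min_nan (x : float) (y : float) : float =
  if Float.is_nan x || Float.is_nan y then Float.nan
  else if x <= y then x
  else y
```

A caller that wants to ignore nan instead has to filter it out before comparing.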

`include Identifiable.S with type t := t`

`val hash_fold_t : Hash.state -> t -> Hash.state`

`val hash : t -> Hash.hash_value`

`include Comparable.S with type t := t`

`include Base__.Comparable_intf.Polymorphic_compare`

`val ascending : t -> t -> int`

`ascending` is identical to `compare`. `descending x y = ascending y x`. These are intended to be mnemonic when used like `List.sort ~compare:ascending` and `List.sort ~compare:descending`, since they cause the list to be sorted in ascending or descending order, respectively.
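As a minimal sketch of the mnemonic, the following uses only the standard library, where `List.sort` takes the comparison function as a positional argument rather than a labeled one. The two definitions mirror the relationship stated above and are stand-ins, not Base's source.

```ocaml
(* Stand-ins built on Stdlib that mirror the definitions stated above. *)
let ascending : float -> float -> int = Float.compare
let descending x y = ascending y x

(* Sorting the same list with each comparison function. *)
let sorted_up = List.sort ascending [ 3.; 1.; 2. ]
let sorted_down = List.sort descending [ 3.; 1.; 2. ]
```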

`val descending : t -> t -> int`

`val between : t -> low:t -> high:t -> bool`

`between t ~low ~high` means `low <= t <= high`.

`val clamp_exn : t -> min:t -> max:t -> t`

`clamp_exn t ~min ~max` returns `t'`, the closest value to `t` such that `between t' ~low:min ~high:max` is true.

Raises if `not (min <= max)`.
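The clamping rule can be sketched in a few lines of plain OCaml. The function name `clamp_exn_sketch` and the exception it raises are assumptions made for illustration; Base's actual error type may differ.

```ocaml
(* Hypothetical sketch of the documented clamping rule. *)
let clamp_exn_sketch (t : float) ~(min : float) ~(max : float) : float =
  (* Reject an empty interval, as the documentation requires. *)
  if not (min <= max) then invalid_arg "clamp_exn_sketch: requires min <= max";
  if t < min then min else if t > max then max else t
```

For example, `clamp_exn_sketch 5. ~min:0. ~max:1.` yields `1.`, while a value already inside the interval is returned unchanged.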

`val clamp : t -> min:t -> max:t -> t Or_error.t`

`include Comparator.S with type t := t`

`val comparator : (t, comparator_witness) Comparator.comparator`

`include Base__.Comparable_intf.Validate with type t := t`

`val validate_lbound : min:t Maybe_bound.t -> t Validate.check`

`val validate_ubound : max:t Maybe_bound.t -> t Validate.check`

`val validate_bound : min:t Maybe_bound.t -> max:t Maybe_bound.t -> t Validate.check`

`include Comparable.With_zero with type t := t`

`val validate_positive : t Validate.check`

`val validate_non_negative : t Validate.check`

`val validate_negative : t Validate.check`

`val validate_non_positive : t Validate.check`

`val is_positive : t -> bool`

`val is_non_negative : t -> bool`

`val is_negative : t -> bool`

`val is_non_positive : t -> bool`

`val sign : t -> Base__.Sign0.t`

Returns `Neg`, `Zero`, or `Pos` in a way consistent with the above functions.

`val validate_ordinary : t Validate.check`

`validate_ordinary` fails if class is `Nan` or `Infinite`.

`val min_value : t`

Equal to `neg_infinity`.

`val sqrt_pi : t`

The constant sqrt(pi).

`val sqrt_2pi : t`

The constant sqrt(2 * pi).

`val euler : t`

Euler-Mascheroni constant (γ).

`val epsilon_float : t`

The difference between 1.0 and the smallest exactly representable floating-point number greater than 1.0. That is:

`epsilon_float = (one_ulp `Up 1.0) -. 1.0`

This gives the relative accuracy of type `t`, in the sense that for numbers on the order of `x`, the roundoff error is on the order of `x *. epsilon_float`.

See also: Machine epsilon.
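The defining identity can be checked with the standard library, under the assumption that `Float.succ` plays the role of ``one_ulp `Up`` for finite inputs. The second binding illustrates how the spacing between floats scales with magnitude.

```ocaml
(* Float.succ stands in for [one_ulp `Up] on finite inputs (an assumption). *)
let eps_from_ulp = Float.succ 1.0 -. 1.0

(* The gap between adjacent floats near 1024. is 1024 times larger. *)
let gap_near_1024 = Float.succ 1024.0 -. 1024.0
```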

`val max_finite_value : t`

`val min_positive_subnormal_value : t`

`val min_positive_normal_value : t`

`val to_int64_preserve_order : t -> int64 option`

An order-preserving bijection between all floats except for nans, and all int64s with absolute value smaller than or equal to `2**63 - 2**52`. Note both 0. and -0. map to 0L.

`val to_int64_preserve_order_exn : t -> int64`

`val of_int64_preserve_order : int64 -> t`

Returns `nan` if the absolute value of the argument is too large.
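One way such an order-preserving mapping can be built is from the raw IEEE bit pattern: non-negative floats already sort like their bits, and negative floats are mirrored. The sketch below uses only `Int64` from the standard library; the `_sketch` names are hypothetical and this is not claimed to be Base's source.

```ocaml
(* Map a float to an int64 so that float order matches int64 order. *)
let to_int64_preserve_order_sketch (t : float) : int64 option =
  if Float.is_nan t then None
  else if t = 0. then Some 0L (* covers both 0. and -0. *)
  else if t > 0. then Some (Int64.bits_of_float t)
  else Some (Int64.neg (Int64.bits_of_float (-.t)))

(* Inverse direction; out-of-range arguments give nan. *)
let of_int64_preserve_order_sketch (i : int64) : float =
  let bound = Int64.bits_of_float infinity (* 2**63 - 2**52 *) in
  if Int64.compare i bound > 0 || Int64.compare i (Int64.neg bound) < 0 then nan
  else if Int64.compare i 0L >= 0 then Int64.float_of_bits i
  else -.(Int64.float_of_bits (Int64.neg i))
```

Incrementing or decrementing the integer image then steps to the neighboring float, which is one way to think about `one_ulp`.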

`val one_ulp : [ `Up | `Down ] -> t -> t`

The next or previous representable float. ULP stands for "unit of least precision", and is the spacing between floating point numbers. Both `one_ulp `Up infinity` and `one_ulp `Down neg_infinity` return a nan.

`val of_int : int -> t`

Note that this doesn't round trip in either direction. For example, `Float.to_int (Float.of_int max_int) <> max_int`.
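The loss of precision can be observed with the standard library conversions, assuming a platform where `int` is 63 bits wide (typical 64-bit native OCaml). The helper names are hypothetical; on narrower platforms the sketch reports nothing.

```ocaml
(* Only meaningful where int is 63 bits wide; otherwise returns None. *)
let round_trip_demo () : (int * int) option =
  if Sys.int_size < 63 then None
  else
    (* 2**53 + 1 needs 54 significant bits, one more than a float holds. *)
    let big = (1 lsl 53) + 1 in
    Some (big, int_of_float (float_of_int big))

(* max_int = 2**62 - 1 is not representable and rounds up to 2**62. *)
let max_int_rounds_up () : bool =
  Sys.int_size = 63 && float_of_int max_int = ldexp 1.0 62
```

Because the float nearest to `max_int` is larger than `max_int`, converting it back to an int cannot return the original value.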

`val to_int : t -> int`

`val of_int63 : Int63.t -> t`

`val of_int64 : int64 -> t`

`val to_int64 : t -> int64`

`val round : ?dir:[ `Zero | `Nearest | `Up | `Down ] -> t -> t`

`round` rounds a float to an integer float. `iround{,_exn}` rounds a float to an int. Both round according to a direction `dir`, with default `dir` being `` `Nearest ``.

| `dir` | Behavior |
| --- | --- |
| `` `Down `` | rounds toward Float.neg_infinity |
| `` `Up `` | rounds toward Float.infinity |
| `` `Nearest `` | rounds to the nearest int ("round half-integers up") |
| `` `Zero `` | rounds toward zero |

`iround_exn` raises when trying to handle nan or trying to handle a float outside the range [float min_int, float max_int).

Here are some examples for `round` for each direction:

| `dir` | Examples |
| --- | --- |
| `` `Down `` | [-2.,-1.) to -2.; [-1.,0.) to -1.; [0.,1.) to 0.; [1.,2.) to 1. |
| `` `Up `` | (-2.,-1.] to -1.; (-1.,0.] to -0.; (0.,1.] to 1.; (1.,2.] to 2. |
| `` `Zero `` | (-2.,-1.] to -1.; (-1.,1.) to 0.; [1.,2.) to 1. |
| `` `Nearest `` | [-1.5,-0.5) to -1.; [-0.5,0.5) to 0.; [0.5,1.5) to 1. |

For convenience, versions of these functions with the `dir` argument hard-coded are provided. If you are writing performance-critical code you should use the versions with the hard-coded arguments (e.g. `iround_down_exn`). The `_exn` ones are the fastest.

The following properties hold:

- `of_int (iround_*_exn i) = i` for any float `i` that is an integer with `min_int <= i <= max_int`.
- `round_* i = i` for any float `i` that is an integer.
- `iround_*_exn (of_int i) = i` for any int `i` with `-2**52 <= i <= 2**52`.
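The four directions can be approximated with standard library functions, as in the sketch below. The bindings are stand-ins that reuse the documented names for readability. The naive `round_nearest_sketch` is an assumption made for illustration and is not robust for every input, as the comment notes.

```ocaml
(* Stdlib-based stand-ins for the hard-coded rounding directions. *)
let round_down = floor
let round_up = ceil
let round_towards_zero = Float.trunc

(* Naive "round half up". A careful implementation must also handle inputs
   where [x +. 0.5] is itself rounded, e.g. the float just below 0.5. *)
let round_nearest_sketch (x : float) : float = floor (x +. 0.5)
```

Note that the standard library's `Float.round` rounds half-integers away from zero, so it differs from "round half-integers up" on negative ties such as `-1.5`.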

`val iround : ?dir:[ `Zero | `Nearest | `Up | `Down ] -> t -> int option`

`val iround_exn : ?dir:[ `Zero | `Nearest | `Up | `Down ] -> t -> int`

`val round_towards_zero : t -> t`

`val round_down : t -> t`

`val round_up : t -> t`

`val round_nearest : t -> t`

Rounds half integers up.

`val iround_towards_zero : t -> int option`

`val iround_down : t -> int option`

`val iround_up : t -> int option`

`val iround_nearest : t -> int option`

`val iround_towards_zero_exn : t -> int`

`val iround_down_exn : t -> int`

`val iround_up_exn : t -> int`

`val iround_nearest_exn : t -> int`

`val int63_round_down_exn : t -> Int63.t`

`val int63_round_up_exn : t -> Int63.t`

`val int63_round_nearest_exn : t -> Int63.t`

`val iround_lbound : t`

If `f <= iround_lbound || f >= iround_ubound`, then `iround*` functions will refuse to round `f`, returning `None` or raising as appropriate.
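A bounds-checked conversion in the option-returning style might look like the sketch below. The name `iround_down_sketch` and the exact bounds are assumptions: the sketch uses the range quoted for `iround_exn`, namely [float min_int, float max_int), rather than Base's `iround_lbound` and `iround_ubound` values.

```ocaml
(* Hypothetical option-returning rounding with an explicit range check. *)
let iround_down_sketch (f : float) : int option =
  (* float_of_int max_int is not exact on 63-bit ints, so the upper bound is
     written as -. (float_of_int min_int) and kept exclusive. Comparisons
     against nan are false, so nan falls through to None. *)
  if f >= float_of_int min_int && f < -. (float_of_int min_int)
  then Some (int_of_float (floor f))
  else None
```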

`val iround_ubound : t`

`val round_significant : float -> significant_digits:int -> float`

`round_significant x ~significant_digits:n` rounds to the nearest number with `n` significant digits. More precisely: it returns the representable float closest to `x rounded to n significant digits`. It is meant to be equivalent to `sprintf "%.*g" n x |> Float.of_string` but faster (10x-15x). Exact ties are resolved as round-to-even.

However, it might in rare cases break the contract above.

It might in some cases appear as if it violates the round-to-even rule:

`let x = 4.36083208835;; let z = 4.3608320883;; assert (z = round_significant x ~significant_digits:11)`

But in this case so does sprintf, since `x` as a float is slightly under-represented:

`sprintf "%.11g" x = "4.3608320883";; sprintf "%.30g" x = "4.36083208834999958014577714493"`

More importantly, `round_significant` might sometimes give a different result than `sprintf ... |> Float.of_string` because it round-trips through an integer. For example, the decimal fraction 0.009375 is slightly under-represented as a float:

`sprintf "%.17g" 0.009375 = "0.0093749999999999997"`

But:

`0.009375 *. 1e5 = 937.5`

Therefore:

`round_significant 0.009375 ~significant_digits:3 = 0.00938`

whereas:

`sprintf "%.3g" 0.009375 = "0.00937"`

In general we believe (and have tested on numerous examples) that the following holds for all x:

`let s = sprintf "%.*g" significant_digits x |> Float.of_string in s = round_significant ~significant_digits x || s = round_significant ~significant_digits (one_ulp `Up x) || s = round_significant ~significant_digits (one_ulp `Down x)`

Also, for float representations of decimal fractions (like 0.009375), `round_significant` is more likely to give the "desired" result than `sprintf ... |> of_string` (that is, the result of rounding the decimal fraction, rather than its float representation). But it's not guaranteed either--see the `4.36083208835` example above.
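The `sprintf`-based reference behavior that `round_significant` is compared against can be written directly with the standard library. The name `round_significant_ref` is hypothetical; this is the slow reference formulation from the text, not the fast implementation.

```ocaml
(* Reference formulation: print with n significant digits, then parse back. *)
let round_significant_ref (x : float) ~(significant_digits : int) : float =
  float_of_string (Printf.sprintf "%.*g" significant_digits x)

(* The intermediate product used in the discussion above. *)
let scaled = 0.009375 *. 1e5
```

On the `0.009375` example this reference rounds down to `0.00937`, which is exactly the case where the documented fast path gives `0.00938` instead.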

`val round_decimal : float -> decimal_digits:int -> float`

`round_decimal x ~decimal_digits:n` rounds `x` to the nearest `10**(-n)`. For positive `n` it is meant to be equivalent to `sprintf "%.*f" n x |> Float.of_string`, but faster.

All the considerations mentioned in `round_significant` apply (both functions use the same code path).

`val min_inan : t -> t -> t`

`val max_inan : t -> t -> t`

`val (+) : t -> t -> t`

`val (-) : t -> t -> t`

`val (/) : t -> t -> t`

`val (*) : t -> t -> t`

`val (**) : t -> t -> t`

`val (~-) : t -> t`

`module Parts : sig ... end with type outer := t`

Returns the fractional part and the whole (i.e., integer) part. For example, `modf (-3.14)` returns `{ fractional = -0.14; integral = -3.; }`!

`val modf : t -> Parts.t`
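The same decomposition is available in the standard library as `modf`, which returns a pair rather than a `Parts.t` record. The sketch below shows that both parts carry the sign of the argument and that the fractional part is only approximately `-0.14` because `3.14` is not exactly representable.

```ocaml
(* Stdlib modf returns (fractional, integral) as a pair. *)
let fractional, integral = modf (-3.14)

(* Adding the parts back recovers the original float. *)
let recombined = fractional +. integral
```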

`val mod_float : t -> t -> t`

`mod_float x y` returns a result with the same sign as `x`. It returns `nan` if `y` is `0`. It is basically

`let mod_float x y = x -. float(truncate(x/.y)) *. y`

not

`let mod_float x y = x -. floor(x/.y) *. y`

and therefore resembles `mod` on integers more than `%`.
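The difference between the truncating and flooring definitions shows up on negative dividends. The sketch below uses the standard library's `mod_float`, which follows the same truncating convention; `floored` is a hypothetical helper that implements the other formula for comparison.

```ocaml
(* Truncating remainder: takes the sign of the dividend. *)
let truncated = mod_float (-7.) 3.

(* Flooring remainder, shown only for contrast. *)
let floored (x : float) (y : float) : float = x -. (floor (x /. y) *. y)
```

Here `truncated` is `-1.`, matching `(-7) mod 3` on integers, while `floored (-7.) 3.` is `2.`.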

`val add : t -> t -> t`

###### Ordinary functions for arithmetic operations

These are for modules that inherit from `t`, since the infix operators are more convenient.

`module O : sig ... end`

A sub-module designed to be opened to make working with floats more convenient.

`module O_dot : sig ... end`

Similar to `O`, except that operators are suffixed with a dot, allowing one to have both int and float operators in scope simultaneously.

`val to_string : t -> string`

`to_string x` builds a string `s` representing the float `x` that guarantees the round trip, that is such that `Float.equal x (Float.of_string s)`.

It usually yields as few significant digits as possible. That is, it won't print `3.14` as `3.1400000000000001243`. The only exception is that occasionally it will output 17 significant digits when the number can be represented with just 16 (but not 15 or fewer) of them.
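One common way to obtain a short string that still round-trips is to try increasing precisions until parsing gives back the original value. The sketch below does this with `Printf` from the standard library; `shortest_round_trip` is a hypothetical name, and the exact sequence of precisions tried (and the output format for integral values) is an assumption that may differ from Base's `to_string`.

```ocaml
(* Try 15, then 16, then 17 significant digits; 17 always round-trips
   for finite IEEE doubles. *)
let shortest_round_trip (x : float) : string =
  let with_digits n = Printf.sprintf "%.*g" n x in
  let s15 = with_digits 15 in
  if float_of_string s15 = x then s15
  else
    let s16 = with_digits 16 in
    if float_of_string s16 = x then s16 else with_digits 17
```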

`val to_string_hum : ?delimiter:char -> ?decimals:int -> ?strip_zero:bool -> t -> string`

Pretty print float, for example `to_string_hum ~decimals:3 1234.1999 = "1_234.200"` and `to_string_hum ~decimals:3 ~strip_zero:true 1234.1999 = "1_234.2"`. No delimiters are inserted to the right of the decimal.

`val to_padded_compact_string : t -> string`

Produce a lossy compact string representation of the float. The float is scaled by an appropriate power of 1000 and rendered with one digit after the decimal point, except that the decimal point is written as '.', 'k', 'm', 'g', 't', or 'p' to indicate the scale factor. (However, if the digit after the "decimal" point is 0, it is suppressed.)

The smallest scale factor that allows the number to be rendered with at most 3 digits to the left of the decimal is used. If the number is too large for this format (i.e., the absolute value is at least 999.95e15), scientific notation is used instead. E.g.:

`to_padded_compact_string (-0.01) = "-0  "`

`to_padded_compact_string 1.89 = "1.9"`

`to_padded_compact_string 999_949.99 = "999k9"`

`to_padded_compact_string 999_950. = "1m "`

In the case where the digit after the "decimal", or the "decimal" itself is omitted, the numbers are padded on the right with spaces to ensure the last two columns of the string always correspond to the decimal and the digit afterward (except in the case of scientific notation, where the exponent is the right-most element in the string and could take up to four characters).

`to_padded_compact_string 1. = "1  "`

`to_padded_compact_string 1.e6 = "1m "`

`to_padded_compact_string 1.e16 = "1.e+16"`

`to_padded_compact_string max_finite_value = "1.8e+308"`

Numbers in the range -.05 < x < .05 are rendered as "0  " or "-0  ".

Other cases:

`to_padded_compact_string nan = "nan  "`

`to_padded_compact_string infinity = "inf  "`

`to_padded_compact_string neg_infinity = "-inf  "`

Exact ties are resolved to even in the decimal:

`to_padded_compact_string 3.25 = "3.2"`

`to_padded_compact_string 3.75 = "3.8"`

`to_padded_compact_string 33_250. = "33k2"`

`to_padded_compact_string 33_350. = "33k4"`

`val int_pow : t -> int -> t`

`int_pow x n` computes `x ** float n` via repeated squaring. It is generally much faster than `**`.

Note that `int_pow x 0` always returns `1.`, even if `x = nan`. This coincides with `x ** 0.` and is intentional.

For `n >= 0` the result is identical to an n-fold product of `x` with itself under `*.`, with a certain placement of parentheses. For `n < 0` the result is identical to `int_pow (1. /. x) (-n)`.

The error will be on the order of `|n|` ulps, essentially the same as if you perturbed `x` by up to a ulp and then exponentiated exactly.

Benchmarks show a factor of 5-10 speedup (relative to `**`) for exponents up to about 1000 (approximately 10ns vs. 70ns). For larger exponents the advantage is smaller but persists into the trillions. For a recent or more detailed comparison, run the benchmarks.

Depending on context, calling this function might or might not allocate 2 minor words. Even if called in a way that causes allocation, it still appears to be faster than `**`.
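Repeated squaring itself is short enough to sketch. The version below follows the rules stated above (the `n = 0` case and the reciprocal for negative `n`), but `int_pow_sketch` is a hypothetical name and the placement of parentheses in the product is not claimed to match Base's.

```ocaml
(* Exponentiation by repeated squaring: O(log |n|) multiplications. *)
let int_pow_sketch (x : float) (n : int) : float =
  let rec go base e acc =
    if e = 0 then acc
    else
      (* Multiply in the current power when the low bit of e is set. *)
      let acc = if e land 1 = 1 then acc *. base else acc in
      go (base *. base) (e lsr 1) acc
  in
  if n >= 0 then go x n 1.0 else go (1.0 /. x) (-n) 1.0
```

Because the accumulator starts at `1.0` and is never touched when `n = 0`, the result is `1.` even for a nan base.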

`val frexp : t -> t * int`

`frexp f` returns the pair of the significand and the exponent of `f`. When `f` is zero, the significand `x` and the exponent `n` of `f` are equal to zero. When `f` is non-zero, they are defined by `f = x *. 2 ** n` and `0.5 <= x < 1.0`.
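The standard library exposes the same decomposition as `frexp`, with `ldexp` as its inverse. A minimal illustration:

```ocaml
(* 8.0 = 0.5 * 2**4, so the significand is 0.5 and the exponent is 4. *)
let significand, exponent = frexp 8.0

(* ldexp x n computes x *. 2 ** n and undoes the decomposition. *)
let rebuilt = ldexp significand exponent
```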

`val expm1 : t -> t`

`expm1 x` computes `exp x -. 1.0`, giving numerically-accurate results even if `x` is close to `0.0`.

`val log1p : t -> t`

`log1p x` computes `log(1.0 +. x)` (natural logarithm), giving numerically-accurate results even if `x` is close to `0.0`.
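The accuracy gain near zero is easy to observe with the standard library's `expm1` and `log1p`. For a tiny `x`, both functions should be very close to `x` itself, whereas the naive formula loses most of its digits to cancellation.

```ocaml
let x = 1e-10

(* Naive formula: exp x is rounded to a float near 1.0 before subtracting. *)
let naive_expm1 = exp x -. 1.0

(* Dedicated functions avoid the cancellation. *)
let accurate_expm1 = expm1 x
let accurate_log1p = log1p x
```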

`val copysign : t -> t -> t`

`copysign x y` returns a float whose absolute value is that of `x` and whose sign is that of `y`. If `x` is `nan`, returns `nan`. If `y` is `nan`, returns either `x` or `-. x`, but it is not specified which.
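A few cases with the standard library's `copysign` illustrate the rule, including the fact that the sign of a negative zero counts as negative.

```ocaml
let a = copysign 3. (-1.)   (* magnitude of 3., sign of -1. *)
let b = copysign (-3.) 1.   (* magnitude of -3., sign of 1. *)
let c = copysign 2. (-0.)   (* -0. carries a negative sign *)
```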

`val acos : t -> t`

Arc cosine. The argument must fall within the range `[-1.0, 1.0]`. Result is in radians and is between `0.0` and `pi`.

`val asin : t -> t`

Arc sine. The argument must fall within the range `[-1.0, 1.0]`. Result is in radians and is between `-pi/2` and `pi/2`.

`val atan2 : t -> t -> t`

`atan2 y x` returns the arc tangent of `y /. x`. The signs of `x` and `y` are used to determine the quadrant of the result. Result is in radians and is between `-pi` and `pi`.
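The role of the signs is visible when comparing `atan2` with `atan` applied to the quotient. In the sketch below the point `(-1, 1)` lies in the second quadrant, which only `atan2` can report. The local `pi` is computed here so the example depends on the standard library alone.

```ocaml
let pi = 4.0 *. atan 1.0

(* Uses both signs: the angle of the point (x, y) = (-1, 1). *)
let with_quadrant = atan2 1.0 (-1.0)

(* The quotient alone cannot distinguish (-1, 1) from (1, -1). *)
let without_quadrant = atan (1.0 /. (-1.0))
```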

`val hypot : t -> t -> t`

`hypot x y` returns `sqrt(x *. x +. y *. y)`, that is, the length of the hypotenuse of a right-angled triangle with sides of length `x` and `y`, or, equivalently, the distance of the point `(x,y)` to the origin.
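Beyond convenience, a dedicated `hypot` typically avoids overflow in the intermediate squares. The sketch below uses the standard library's `hypot`, which delegates to the platform's C library, so the overflow behavior shown is what is expected on typical platforms rather than a guarantee stated in this document.

```ocaml
let big = 1e200

(* big *. big overflows to infinity before the square root is taken. *)
let naive = sqrt ((big *. big) +. (big *. big))

(* hypot is expected to return a finite value of about 1.414e200. *)
let robust = hypot big big
```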

`module Class : sig ... end`

Excluding nan the floating-point "number line" looks like:

`val classify : t -> Class.t`

`val is_finite : t -> bool`

`is_finite t` returns `true` iff `classify t` is in `Normal; Subnormal; Zero;`.

`val sign : t -> Sign.t`

`val sign_exn : t -> Sign.t`

The sign of a float. Both `-0.` and `0.` map to `Zero`. Raises on nan. All other values map to `Neg` or `Pos`.

`val sign_or_nan : t -> Sign_or_nan.t`

The sign of a float, with support for NaN. Both `-0.` and `0.` map to `Zero`. All NaN values map to `Nan`. All other values map to `Neg` or `Pos`.
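The classification can be sketched with ordinary comparisons, relying on the fact that every comparison against nan is false. The variant type and function name below are hypothetical stand-ins for `Sign_or_nan.t` and `sign_or_nan`.

```ocaml
(* Hypothetical stand-in for Sign_or_nan.t. *)
type sign_or_nan = Neg | Zero | Pos | Nan

let sign_or_nan_sketch (t : float) : sign_or_nan =
  if t > 0. then Pos
  else if t < 0. then Neg
  else if t = 0. then Zero (* true for both 0. and -0. *)
  else Nan (* only nan fails all three comparisons *)
```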

`val create_ieee : negative:bool -> exponent:int -> mantissa:Int63.t -> t Or_error.t`

These functions construct and destruct 64-bit floating point numbers based on their IEEE representation with a sign bit, an 11-bit non-negative (biased) exponent, and a 52-bit non-negative mantissa (or significand). See Wikipedia for details of the encoding.

In particular, if 1 <= exponent <= 2046, then:

`create_ieee_exn ~negative:false ~exponent ~mantissa = 2 ** (exponent - 1023) * (1 + (2 ** -52) * mantissa)`

`val create_ieee_exn : negative:bool -> exponent:int -> mantissa:Int63.t -> t`

`val ieee_negative : t -> bool`

`val ieee_exponent : t -> int`

`val ieee_mantissa : t -> Int63.t`
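The encoding described above can be sketched with `Int64.float_of_bits` and `Int64.bits_of_float` from the standard library. Two assumptions are made for illustration: the mantissa is passed as a plain `int64` instead of `Int63.t`, and no range validation is performed, so the `_sketch` functions are not substitutes for `create_ieee` and friends.

```ocaml
(* Assemble sign (1 bit), biased exponent (11 bits), mantissa (52 bits). *)
let create_ieee_sketch ~(negative : bool) ~(exponent : int) ~(mantissa : int64)
    : float =
  let sign = if negative then Int64.min_int else 0L in
  let exp_bits = Int64.shift_left (Int64.of_int exponent) 52 in
  Int64.float_of_bits (Int64.logor sign (Int64.logor exp_bits mantissa))

(* Take a float apart again. *)
let ieee_negative_sketch (t : float) : bool =
  Int64.compare (Int64.bits_of_float t) 0L < 0

let ieee_exponent_sketch (t : float) : int =
  let shifted = Int64.shift_right_logical (Int64.bits_of_float t) 52 in
  Int64.to_int (Int64.logand shifted 0x7FFL)

let ieee_mantissa_sketch (t : float) : int64 =
  Int64.logand (Int64.bits_of_float t) 0xF_FFFF_FFFF_FFFFL
```

For instance, exponent `1023` with mantissa `0` gives `1.0`, and setting the top mantissa bit (`2**51`) adds one half, giving `1.5`, in line with the formula above.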

`module Terse : sig ... end`

S-expressions contain at most 8 significant digits.