
std.numeric

This module is a port of a growing fragment of the numeric header in Alexander Stepanov's Standard Template Library, with a few additions.

License:
Boost License 1.0.

Authors:
Andrei Alexandrescu, Don Clugston, Robert Jacques

Source:
std/numeric.d

enum CustomFloatFlags: int;
Format flags for CustomFloat.

signed
Adds a sign bit to allow for signed numbers.

storeNormalized
Store values in normalized form by default. The actual precision of the significand is extended by 1 bit by assuming an implicit leading bit of 1 instead of 0, i.e. 1.nnnn instead of 0.nnnn. True for all IEEE754 types.

allowDenorm
Stores the significand in IEEE754 denormalized form when the exponent is 0. Required to express the value 0.

infinity
Allows the storage of IEEE754 infinity values.

nan
Allows the storage of IEEE754 Not a Number values.

probability
If set, selects an exponent bias such that max_exp = 1, i.e. so that the maximum value is >= 1.0 and < 2.0. Ignored if the exponent bias is manually specified.

negativeUnsigned
If set, unsigned custom floats are assumed to be negative.

allowDenormZeroOnly
If set, 0 is the only allowed IEEE754 denormalized number. Requires allowDenorm and storeNormalized.

ieee
Include all of the IEEE754 options.

none
Include none of the above options.
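
The flags above are bit masks, so they can be combined with the usual bitwise operators. The following is a minimal sketch, assuming only what the descriptions above state (that ieee includes the signed and nan options, and that none includes no options); the names unsignedIeee and minimal are illustrative and not part of the module.

```d
import std.numeric : CustomFloatFlags;

void main()
{
    // Remove the sign bit from the full IEEE754 option set.
    enum unsignedIeee = CustomFloatFlags.ieee ^ CustomFloatFlags.signed;
    static assert((unsignedIeee & CustomFloatFlags.signed) == 0);
    static assert((unsignedIeee & CustomFloatFlags.nan) != 0);

    // Build a set from scratch: signed, normalized, nothing else.
    enum minimal = CustomFloatFlags.signed | CustomFloatFlags.storeNormalized;
    static assert((minimal & CustomFloatFlags.infinity) == 0);
    static assert((minimal & CustomFloatFlags.signed) != 0);
}
```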

template CustomFloat(uint bits) if (bits == 8 || bits == 16 || bits == 32 || bits == 64 || bits == 80)
template CustomFloat(uint precision, uint exponentWidth, CustomFloatFlags flags = CustomFloatFlags.ieee) if (((flags & flags.signed) + precision + exponentWidth) % 8 == 0 && precision + exponentWidth > 0)
struct CustomFloat(uint precision, uint exponentWidth, CustomFloatFlags flags, uint bias) if (((flags & flags.signed) + precision + exponentWidth) % 8 == 0 && precision + exponentWidth > 0);
Allows user code to define custom floating-point formats. These formats are for storage only; all operations on them are performed by first implicitly extracting them to real. After the operation is completed, the result can be stored in a custom floating-point value via assignment.

Example:
// Define 16-bit floating-point values
CustomFloat!16                                x;     // Using the number of bits
CustomFloat!(10, 5)                           y;     // Using the precision and exponent width
CustomFloat!(10, 5,CustomFloatFlags.ieee)     z;     // Using the precision, exponent width and format flags
CustomFloat!(10, 5,CustomFloatFlags.ieee, 15) w;     // Using the precision, exponent width, format flags and exponent offset bias

// Use the 16-bit floats mostly like normal numbers
w = x*y - 1;
writeln(w);

// Function calls require conversion
z = sin(+x)           + cos(+y);                     // Use unary plus to concisely convert to a real
z = sin(x.re)         + cos(y.re);                   // Or use the .re property to convert to a real
z = sin(x.get!float)  + cos(y.get!float);            // Or use get!T
z = sin(cast(float)x) + cos(cast(float)y);           // Or use cast(T) to explicitly convert

// Define an 8-bit custom float for storing probabilities
alias CustomFloat!(4, 4, CustomFloatFlags.ieee^CustomFloatFlags.probability^CustomFloatFlags.signed ) Probability;
auto p = Probability(0.5);

template FPTemporary(F) if (isFloatingPoint!F)
Defines the fastest type to use when storing temporaries of a calculation intended to ultimately yield a result of type F (where F must be one of float, double, or real). When doing a multi-step computation, you may want to store intermediate results as FPTemporary!F.

Example:
// Average numbers in an array
double avg(in double[] a)
{
    if (a.length == 0) return 0;
    FPTemporary!double result = 0;
    foreach (e; a) result += e;
    return result / a.length;
}

The necessity of FPTemporary stems from the optimized floating-point operations and registers present in virtually all processors. When adding numbers in the example above, the addition may in fact be done in real precision internally. In that case, storing the intermediate result in double format is not only less precise, it is also (surprisingly) slower, because a conversion from real to double is performed every pass through the loop. This being a lose-lose situation, FPTemporary!F has been defined as the fastest type to use for calculations at precision F. There is no need to define a type for the most accurate calculations, as that is always real.

Finally, there is no guarantee that using FPTemporary!F will always be fastest, as the speed of floating-point calculations depends on many factors.
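
The same pattern applies to any accumulation loop. As a further sketch, the function below (dot is a hypothetical helper, not part of the module) accumulates a float dot product in FPTemporary!float and narrows the result to float only once, at the end.

```d
import std.numeric : FPTemporary;

// Hypothetical helper, named for this sketch only.
float dot(const float[] a, const float[] b)
{
    assert(a.length == b.length, "inputs must have equal length");
    FPTemporary!float sum = 0;          // accumulator in the "fast" type
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return cast(float) sum;             // narrow to float once, at the end
}

void main()
{
    // 1*4 + 2*5 + 3*6 = 32, exactly representable in float
    assert(dot([1.0f, 2.0f, 3.0f], [4.0f, 5.0f, 6.0f]) == 32.0f);
}
```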

template secantMethod(alias fun)
Implements the secant method for finding a root of the function fun, starting from the two points xn_1 and xn (ideally close to the root). Num, the type of the starting points and of the result, may be float, double, or real.

Example:
float f(float x) {
    return cos(x) - x*x*x;
}
auto x = secantMethod!(f)(0f, 1f);
assert(approxEqual(x, 0.865474));
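
For readers unfamiliar with the technique, the following is a minimal sketch of a textbook secant iteration. It is not the Phobos implementation: the name secantSketch, the tolerance tol, and the iteration cap maxIter are assumptions made for this illustration. Each step replaces the older point with the x-intercept of the line through the two most recent points.

```d
import std.math : cos, fabs;

// Hypothetical helper illustrating the secant update; not std.numeric code.
double secantSketch(double function(double) f, double x0, double x1,
                    double tol = 1e-12, int maxIter = 100)
{
    double f0 = f(x0);
    double f1 = f(x1);
    foreach (i; 0 .. maxIter)
    {
        if (f1 == f0) break;            // flat secant line: cannot proceed
        // x-intercept of the line through (x0, f0) and (x1, f1)
        immutable x2 = x1 - f1 * (x1 - x0) / (f1 - f0);
        x0 = x1; f0 = f1;
        x1 = x2; f1 = f(x2);
        if (fabs(x1 - x0) < tol) break; // successive iterates agree
    }
    return x1;
}

double g(double x) { return cos(x) - x * x * x; }

void main()
{
    // Same function and starting points as the example above.
    immutable root = secantSketch(&g, 0.0, 1.0);
    assert(fabs(root - 0.865474) < 1e-5);
}
```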