std.numeric
This module is a port of a growing fragment of the numeric header in Alexander Stepanov's Standard Template Library, with a few additions.
License: Boost License 1.0.
Authors: Andrei Alexandrescu, Don Clugston, Robert Jacques
Source: std/numeric.d
- enum CustomFloatFlags: int;
- Format flags for CustomFloat.
- signed
- Adds a sign bit to allow for signed numbers.
- storeNormalized
- Store values in normalized form by default. The actual precision of the significand is extended by 1 bit by assuming an implicit leading bit of 1 instead of 0, i.e. 1.nnnn instead of 0.nnnn. True for all IEEE754 types.
- allowDenorm
- Stores the significand in IEEE754 denormalized form when the exponent is 0. Required to express the value 0.
- infinity
- Allows the storage of IEEE754 infinity values.
- nan
- Allows the storage of IEEE754 Not a Number values.
- probability
- If set, select an exponent bias such that max_exp = 1, i.e. so that the maximum value is >= 1.0 and < 2.0. Ignored if the exponent bias is manually specified.
- negativeUnsigned
- If set, unsigned custom floats are assumed to be negative.
- allowDenormZeroOnly
- If set, 0 is the only allowed IEEE754 denormalized number. Requires allowDenorm and storeNormalized.
- ieee
- Include all of the IEEE754 options.
- none
- Include none of the above options.
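Because the flags are bit values, they can be combined with bitwise operators before being passed to CustomFloat. The following is a minimal sketch, not library code: hasFlag and unsignedProbability are hypothetical names introduced here for illustration. It assumes that ieee already includes signed but not probability, so XOR with signed clears the sign bit while XOR with probability sets the probability option.

```d
import std.numeric : CustomFloatFlags;
import std.stdio : writeln;

// Hypothetical helper (not part of std.numeric): true when `flag` is set in `flags`.
bool hasFlag(CustomFloatFlags flags, CustomFloatFlags flag)
{
    return (flags & flag) != 0;
}

// Illustrative combination; the name is an assumption, not a library symbol.
// XOR toggles a bit: it removes `signed` (assumed present in `ieee`)
// and adds `probability` (assumed absent from `ieee`).
enum unsignedProbability =
    CustomFloatFlags.ieee ^ CustomFloatFlags.signed ^ CustomFloatFlags.probability;

void main()
{
    writeln("signed:      ", hasFlag(unsignedProbability, CustomFloatFlags.signed));
    writeln("probability: ", hasFlag(unsignedProbability, CustomFloatFlags.probability));
    writeln("nan:         ", hasFlag(unsignedProbability, CustomFloatFlags.nan));
}
```

Note that XOR toggles rather than clears, so this idiom only removes a flag that is known to be set; `flags & ~flag` is the usual alternative when that is not guaranteed.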
- template CustomFloat(uint bits) if (bits == 8 || bits == 16 || bits == 32 || bits == 64 || bits == 80)
template CustomFloat(uint precision, uint exponentWidth, CustomFloatFlags flags = CustomFloatFlags.ieee) if (((flags & flags.signed) + precision + exponentWidth) % 8 == 0 && precision + exponentWidth > 0)
struct CustomFloat(uint precision, uint exponentWidth, CustomFloatFlags flags, uint bias) if (((flags & flags.signed) + precision + exponentWidth) % 8 == 0 && precision + exponentWidth > 0);
- Allows user code to define custom floating-point formats. These formats are
for storage only; all operations on them are performed by first implicitly
extracting them to real. After the operation is completed, the
result can be stored in a custom floating-point value via assignment.
Example:
    import std.math : cos, sin;
    import std.numeric : CustomFloat, CustomFloatFlags;
    import std.stdio : writeln;

    // Define 16-bit floating point values
    CustomFloat!16 x;                                 // Using the number of bits
    CustomFloat!(10, 5) y;                            // Using the precision and exponent width
    CustomFloat!(10, 5, CustomFloatFlags.ieee) z;     // Using the precision, exponent width and format flags
    CustomFloat!(10, 5, CustomFloatFlags.ieee, 15) w; // Using the precision, exponent width, format flags and exponent offset bias

    // Use the 16-bit floats mostly like normal numbers
    w = x*y - 1;
    writeln(w);

    // Function calls require conversion
    z = sin(+x) + cos(+y);                         // Use unary plus to concisely convert to a real
    z = sin(x.re) + cos(y.re);                     // Or use the .re property to convert to a real
    z = sin(x.get!float) + cos(y.get!float);       // Or use get!T
    z = sin(cast(float)x) + cos(cast(float)y);     // Or use cast(T) to explicitly convert

    // Define an 8-bit custom float for storing probabilities
    alias CustomFloat!(4, 4, CustomFloatFlags.ieee^CustomFloatFlags.probability^CustomFloatFlags.signed ) Probability;
    auto p = Probability(0.5);
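Since these formats are for storage only, the precision loss happens when a value is stored, not when it is used. The sketch below illustrates this by storing a double in a 16-bit CustomFloat and reading it back. roundTrip16 is a hypothetical helper written for this illustration, and the sample values are arbitrary; the sketch assumes only the constructor and get!T calls shown in the example above.

```d
import std.math : fabs;
import std.numeric : CustomFloat;
import std.stdio : writefln;

// Hypothetical helper (not part of std.numeric): store a double in a
// 16-bit CustomFloat, then read it back as a double.
double roundTrip16(double value)
{
    auto stored = CustomFloat!16(value); // precision is reduced here, on storage
    return stored.get!double;            // reading back widens the stored value
}

void main()
{
    // Arbitrary sample values chosen for illustration.
    foreach (v; [0.5, 0.1, 3.14159])
    {
        immutable back = roundTrip16(v);
        writefln("%.10f -> %.10f (error %.3g)", v, back, fabs(back - v));
    }
}
```

With a 10-bit stored significand, the round-trip error should stay small relative to the value, but it is generally not zero.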
- template FPTemporary(F) if (isFloatingPoint!F)
- Defines the fastest type to use when storing temporaries of a
calculation intended to ultimately yield a result of type F
(where F must be one of float, double, or real). When doing a multi-step computation, you may want to store
intermediate results as FPTemporary!F.
Example:
    // Average numbers in an array
    double avg(in double[] a)
    {
        if (a.length == 0) return 0;
        FPTemporary!double result = 0;
        foreach (e; a) result += e;
        return result / a.length;
    }
The necessity of FPTemporary stems from the optimized floating-point operations and registers present in virtually all processors. When adding numbers in the example above, the addition may in fact be done in real precision internally. In that case, storing the intermediate result in double format is not only less precise, it is also (surprisingly) slower, because a conversion from real to double is performed every pass through the loop. This being a lose-lose situation, FPTemporary!F has been defined as the fastest type to use for calculations at precision F. There is no need to define a type for the most accurate calculations, as that is always real.
Finally, there is no guarantee that using FPTemporary!F will always be fastest, as the speed of floating-point calculations depends on very many factors.
- template secantMethod(alias fun)
- Implements the secant method for finding a
root of the function fun starting from points [xn_1, x_n]
(ideally close to the root). Num may be float, double,
or real.
Example:
    float f(float x)
    {
        return cos(x) - x*x*x;
    }
    auto x = secantMethod!(f)(0f, 1f);
    assert(approxEqual(x, 0.865474));
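To make the iteration behind the secant method concrete, here is a minimal standalone sketch. It is not the std.numeric implementation: secantSketch, its tol and maxIter parameters, and the stopping rule are all assumptions made for illustration. Each step replaces the older of the two points with the x-intercept of the line through (x0, f(x0)) and (x1, f(x1)).

```d
import std.math : cos, fabs;
import std.stdio : writefln;

// Hypothetical helper, not part of std.numeric. The tolerance and the
// iteration cap are illustrative choices, not library defaults.
double secantSketch(double function(double) f, double x0, double x1,
                    double tol = 1e-10, int maxIter = 50)
{
    double f0 = f(x0);
    double f1 = f(x1);
    foreach (i; 0 .. maxIter)
    {
        immutable denom = f1 - f0;
        if (denom == 0)
            break; // flat secant line: no x-intercept to step to

        // x-intercept of the line through (x0, f0) and (x1, f1)
        immutable x2 = x1 - f1 * (x1 - x0) / denom;

        // Slide the window: keep the two most recent points
        x0 = x1;
        f0 = f1;
        x1 = x2;
        f1 = f(x1);

        if (fabs(x1 - x0) < tol)
            break; // successive estimates agree to within tol
    }
    return x1;
}

// Same function as in the example above, in double precision.
double g(double x)
{
    return cos(x) - x * x * x;
}

void main()
{
    immutable root = secantSketch(&g, 0.0, 1.0);
    writefln("root = %.6f, g(root) = %.3g", root, g(root));
}
```

Unlike bracketing methods, this iteration does not keep the root between the two points, so it can diverge when the starting points are far from the root; that is why starting close to the root is recommended.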