Hacker News
2 years ago by rocqua

I found the following note for -ffinite-math-only and -fno-signed-zeros quite worrying:

The program may behave in strange ways (such as not evaluating either the true or false part of an if-statement) if calculations produce Inf, NaN, or -0.0 when these flags are used.

I always thought that -ffast-math was telling the compiler: "I do not care about floating-point standards compliance, and I do not rely on it, so optimize in ways that break the standard."

But instead it seems like this also implies a promise to the compiler: a promise that you will not produce Inf, NaN, or -0.0. And Inf and NaN in particular can be hard to exclude.

This changes the flag from saying "don't care about standards, make it fast" to "I hereby guarantee this code meets a stricter standard", where it also becomes quite hard to actually verify that you meet this standard, especially if you want to keep your performance. Because if you need to start wrapping every division in an if statement to prevent Inf or NaN, that is a huge performance penalty.

2 years ago by adrian_b

The whole rationale of the standard for floating-point operations is to specify FP operations with such properties that a naive programmer will be able to write programs that behave as expected.

If you choose any option that is not compliant with the standard, that means, exactly as you have noticed, that you claim that you are an expert in FP computations and you know how to write FP programs that will give the desired results even with FP operations that can behave in strange ways.

So yes, that means that you become responsible for either guaranteeing that erroneous results do not matter, or for always checking the ranges of the input operands, as "okl" has already posted, to ensure that no overflows, underflows or undefined operations will happen.

2 years ago by celrod

> So yes, that means that you become responsible for either guaranteeing that erroneous results do not matter, or for always checking the ranges of the input operands, as "okl" has already posted, to ensure that no overflows, underflows or undefined operations will happen.

This is why I dislike the fact that it removes simple methods for checking if values are NaN, for example. I do not find it ergonomic to disable these checks.

2 years ago by kazinator

If NaN and Inf values can still happen, but you don't have reliable ways of working with them, that is poor.

I think what people really want is a sane floating point mode that throws exceptions instead of these Inf and NaN things, so if their program isn't debugged, they get a clear signal by way of its abnormal termination.

2 years ago by billfruit

Is it advised to detect overflow and do something about it after an FP computation, rather than adding pre-checks to try to avert the overflow?

2 years ago by owlbite

So it depends on how you want to detect overflow.

Trapping on math exceptions or otherwise pulling an emergency brake doesn't really fit with modern hardware design if you want performance (for one thing, what do you do with the rest of your vector? do you just live with a pipeline flush?).

Adding targeted checks used sparingly can work, but probably requires a deeper understanding of the underlying numerical analysis of your problem. Generally just checking for NaN or Inf at the end is a better solution as these are in most cases absorbing states.

2 years ago by KarlKemp

The only obvious way to test if some operation is going to produce some value(s) is to perform that operation. Yes, for, say, a single multiplication I can anticipate the necessary bounds for each input value. But even short sequences of operations would quickly devolve into labyrinthine nested switch statements.

2 years ago by owlbite

Rename to: -fdisable-math-safeties

2 years ago by pinteresting

Generally if you're seeing a NaN/Inf something has gone wrong. It's very difficult to recover from gracefully, and if you tried I think you would lose both sanity and performance!

Regarding performance, the cost of a real division is about 3-4 orders of magnitude worse than an if statement whose branch is very predictable. But the usual approach is to have fast and safe versions of functions: where you need performance and can deduce that something is always/never true, create/use the fast function; by default, everything uses the slower, safer function.
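
A minimal sketch of that fast/safe split in C (the names safe_unit_price and fast_unit_price are made up for illustration; note that the isfinite guard itself may be optimized away if this file is compiled with -ffinite-math-only, so the safe path belongs in code built without that flag):

    #include <math.h>
    #include <stdio.h>

    /* Safe version: guards the division so it can never produce Inf or NaN. */
    static double safe_unit_price(double volume, int count) {
        if (count <= 0 || !isfinite(volume))
            return 0.0;                     /* or signal an error, as the caller prefers */
        return volume / count;
    }

    /* Fast version: only valid when the caller has already established
       that count > 0 and volume is finite, e.g. in a hot loop over
       validated data. */
    static double fast_unit_price(double volume, int count) {
        return volume / count;
    }

    int main(void) {
        printf("%f\n", safe_unit_price(10.0, 0));   /* guarded: 0.0 instead of Inf */
        printf("%f\n", fast_unit_price(10.0, 4));   /* 2.5, caller guarantees count > 0 */
        return 0;
    }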

2 years ago by stncls

An example use of Inf is bounds on a value. Say you want to express that x sometimes has upper and/or lower bounds. With Inf you can simply write

    l <= x <= u
accepting that l is sometimes -Inf and u is sometimes +Inf. Without Inf, you get four different cases to handle. This is particularly handy when operations get slightly more complex, like transforming

    l' <= x + b <= u'
into the canonical form above, for some finite b. With Inf you can simply write

    l = l' - b
    u = u' - b
and things will work as one expects. Again, without Inf, multiple cases to handle correctly.
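
In C this falls out of IEEE Inf arithmetic directly; a small sketch using INFINITY from math.h for "no bound" (variable names chosen to mirror the comment above):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Constraint in the form l' <= x + b <= u', with no lower bound. */
        double lp = -INFINITY, up = 7.0, b = 2.0;

        /* Shift into the canonical form l <= x <= u: no case analysis needed. */
        double l = lp - b;   /* -Inf - 2.0 is still -Inf */
        double u = up - b;   /* 5.0 */

        double x = 3.0;
        printf("in bounds: %d\n", l <= x && x <= u);   /* -Inf <= 3.0 <= 5.0 -> 1 */
        return 0;
    }
This is, of course, exactly the kind of code that -ffinite-math-only licenses the compiler to break.
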
2 years ago by nemetroid

> Generally if you're seeing a NaN/Inf something has gone wrong

That's a bold claim.

2 years ago by xioxox

As a scientist, it depends on what you mean by wrong. It's nice to push an array of numbers through some array operation. If some of the outputs look like NaN or Inf then that tells me something went wrong in the operation and to take a closer look. If some optimizer was told that NaN or Inf couldn't happen, meaning that they wouldn't be generated, then some of this output could be pure nonsense and I wouldn't notice. As NaNs propagate through the calculation they're extremely helpful to indicate something went wrong somewhere in the call chain.

NaN is also very useful in itself as a missing-data value in an array. If you have some regularly sampled data with some items missing, it makes a lot of things easier to populate the holes with a value that is guaranteed not to produce misleading output.
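
A sketch of that missing-value pattern in C (compiled without -ffast-math, so that NAN and isnan keep their usual meaning):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Regularly sampled data with two missing samples marked as NaN. */
        double samples[] = { 1.0, 2.0, NAN, 4.0, NAN, 6.0 };
        int n = sizeof samples / sizeof samples[0];

        double sum = 0.0;
        int valid = 0;
        for (int i = 0; i < n; i++) {
            if (isnan(samples[i]))
                continue;                    /* skip the holes */
            sum += samples[i];
            valid++;
        }
        printf("mean of valid samples: %f\n", valid ? sum / valid : NAN);
        return 0;
    }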

2 years ago by pinteresting

Can you think of a function where the input is valid, the output is NaN and nothing has gone wrong in the process?

I can't think of any, haven't experienced any, not heard of any examples of it, so you're welcome to break my ignorance on the subject.

2 years ago by jes

What makes it a bold claim?

2 years ago by LeanderK

It's true in my experience. NaN/Inf was always a bug and resulted in needing to rewrite the calculation.

2 years ago by beagle3

I often use NaNs on purpose to designate "unknown", and its properties of "infecting" computations mean that I don't have to check every step - just at the end of the computation.

R goes even farther, and uses one specific NaN (out of the million-billions) to signal "not available", while other NaNs are just NaNs.

Its properties, used properly, make code much more simple and sane.

2 years ago by kloch

NaN usually indicates a bug or algorithmic problem, but not always. I have had cases where I need to detect NaN and present a useful message to the user, even though everything is working as designed.

2 years ago by okl

> A promise that you will not produce Inf, NaN, or -0.0. And Inf and NaN in particular can be hard to exclude.

Cumbersome but not that difficult: Range-check all input values.
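
A sketch of what such a range check might look like in C (the bounds are illustrative, not derived from any particular application):

    #include <math.h>
    #include <stdio.h>

    /* Writes num/den to *out and returns 1 only if the inputs are in a range
       where the result is provably finite; returns 0 otherwise. */
    static int checked_ratio(double num, double den, double *out) {
        if (!isfinite(num) || !isfinite(den))
            return 0;
        if (fabs(num) > 1e100 || fabs(den) < 1e-100)   /* illustrative bounds */
            return 0;                                   /* quotient could overflow */
        *out = num / den;
        return 1;
    }

    int main(void) {
        double r;
        if (checked_ratio(1.5, 3.0, &r))
            printf("ratio = %f\n", r);
        else
            printf("inputs rejected\n");
        return 0;
    }
As the replies below point out, the checking code itself has to be compiled without the finite-math assumption, otherwise the isfinite calls may simply be folded away.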

2 years ago by orangepanda

As the compiler assumes you won't produce such values, wouldn't it also optimise those range checks away?

2 years ago by jcelerier

At least GCC is, I believe, fairly buggy in that regard, since isnan is optimised out; thus you cannot check if, say, an incoming value read from the network is a NaN. There's some lengthy discussion about it on the bug tracker IIRC.
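
One workaround that comes up in those discussions (not something the GCC documentation promises) is to test the bit pattern with integer operations, which the finite-math assumption does not reach; a sketch for IEEE 754 binary64:

    #include <stdint.h>
    #include <string.h>

    /* NaN test without floating-point comparisons: a double is NaN iff its
       exponent bits are all ones and its mantissa is non-zero, i.e. the
       absolute-value bits exceed those of +Inf (0x7ff0000000000000). */
    static int is_nan_bits(double x) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);    /* well-defined way to inspect the bits */
        return (bits & UINT64_C(0x7fffffffffffffff)) > UINT64_C(0x7ff0000000000000);
    }
This only helps for values arriving from outside (a file, the network); once a value has flowed through arithmetic compiled with -ffinite-math-only, the optimizer is again entitled to assume no NaN was produced.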

2 years ago by nly

He means simple stuff like ensuring a function like

    float
    unit_price (float volume, int count) {
       return volume / count;
    }
is only called where count > 0
2 years ago by benttoothpaste

The checks could be expensive though, more expensive than the gains from this optimization. Often it is better to let the calculation proceed and use isnan to check the final result.

2 years ago by fho

That's where more advanced type systems shine. You get to encode things like "this number is non-zero", and a safe division function will only take non-zero values as input.

The workflow would then be (see the sketch after the list):

    1. prove that the input is non-zero (check if input == 0), yielding a "non-zero number"
    2. run your calculation (no check needed for multiplication or division; if you add something you have to re-prove that the number is non-zero)
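
C cannot enforce this the way a language with a richer type system can, but the shape of the idea can be approximated with a wrapper type whose only intended constructor performs the check (NonZero and nonzero_make are made-up names for illustration):

    #include <stdio.h>

    typedef struct { double value; } NonZero;   /* intended invariant: value != 0.0 */

    /* The only intended way to obtain a NonZero is to pass the check. */
    static int nonzero_make(double x, NonZero *out) {
        if (x == 0.0)
            return 0;
        out->value = x;
        return 1;
    }

    /* Division takes the "proof-carrying" type, so no check is needed here. */
    static double div_by(double num, NonZero den) {
        return num / den.value;
    }

    int main(void) {
        NonZero d;
        if (nonzero_make(4.0, &d))
            printf("%f\n", div_by(10.0, d));   /* 2.5 */
        return 0;
    }
In C nothing stops someone from writing d.value = 0.0 directly, which is exactly where a stronger type system earns its keep.
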
2 years ago by contravariant

And don't do any division.

Easy.

2 years ago by okl

There's nothing wrong with divisions if you exclude problematic combinations of input values.

2 years ago by scheme271

Or multiplications or additions to avoid overflows or denormals

2 years ago by gpderetta

I think the documentation is fairly clear:

"Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

The obvious implication is that arguments and results are always finite, and as a programmer you are responsible for guaranteeing the correct preconditions.

Certainly do not use finite-math-only when dealing with external data.

2 years ago by hurrrrrrrr

>Certainly do not use finite-math-only when dealing with external data.

so... never?

2 years ago by gpderetta

Let me rephrase: do not use it on data which you cannot guarantee meets the preconditions. Either the input is known good data, or you sanitize it through some code path not compiled with fast-math.

2 years ago by mhh__

If you use the LLVM D compiler you can opt in or out of these individually on a per-function basis. I don't trust them globally.

2 years ago by stabbles

In Julia you can do `@fastmath ...`, which is basically just a find-and-replace of math operations and does not propagate into functions called in that code block, even when they are ultimately inlined. So what it does is:

    julia> @macroexpand @fastmath x + y
    :(Base.FastMath.add_fast(x, y))
And maybe that's a good thing, because the scope of @fastmath is as limited as it gets.
2 years ago by celrod

Yeah, I really like that Julia and LLVM allow applying it on a per-operation basis.

Because most of LLVM's backends don't allow for the same level of granularity, they do end up propagating some information more than I would like. For example, marking an operation as fast lets LLVM assume that it does not result in NaNs, letting NaN checks get compiled away even though they themselves are not marked fast:

  julia> add_isnan(a,b) = isnan(@fastmath(a+b))
  add_isnan (generic function with 1 method)
  
  julia> @code_llvm debuginfo=:none add_isnan(1.2,2.3)
  define i8 @julia_add_isnan_597(double %0, double %1) #0 {
  top:
    ret i8 0
  }
meaning it remains more dangerous to use than it IMO should be. For this reason, LoopVectorization.jl does not apply "nonans".
2 years ago by gpderetta

With GCC you can also use #pragma GCC optimize or __attribute__((optimize(...))) for a similar effect.

It is not 100% bug-free (at least it didn't use to be), and it often prevents inlining a function into another one with different optimization settings (so in practice its use has to be coarse-grained).
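
For example, something along these lines (the attribute spelling follows the GCC manual; which options you actually enable is up to you):

    /* Only this function is compiled with fast-math-style assumptions. */
    __attribute__((optimize("fast-math")))
    float dot3_fast(const float *a, const float *b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /* The rest of the translation unit keeps strict IEEE semantics.
       Alternatively, #pragma GCC optimize ("fast-math") applies to every
       function that follows it in the file. */
    float dot3_strict(const float *a, const float *b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }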

2 years ago by nly

This pragma doesn't quite work for -ffast-math

https://gcc.godbolt.org/z/voMK7x7hG

Try it with and without the pragma, and adding -ffast-math to the compiler command line. It seems that with the pragma sqrt(x) * sqrt(x) becomes sqrt(x*x), but with the command line version it is simplified to just x.
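
The exact snippet behind the link isn't reproduced here, but a minimal function of the same shape is enough to try the comparison (behaviour may differ between GCC versions):

    #include <math.h>

    #pragma GCC optimize ("fast-math")

    /* As described above: with the pragma, GCC reportedly contracts this to
       sqrtf(x * x); with -ffast-math on the command line (and no pragma) it
       folds all the way down to x, which is the same value for finite,
       non-negative x. */
    float square_of_sqrt(float x) {
        return sqrtf(x) * sqrtf(x);
    }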

2 years ago by gpderetta

That's very interesting. The pragma does indeed do a lot of optimizations compared to no-pragma (for example it doesn't call sqrtf at all), but the last simplification is only done with the global flag set. I wonder if it is a missed optimization or if there is a reason for that.

edit: well, with the fast-math pragma it appears it is simply assuming finite math and x > 0, thus skipping the call to sqrtf as there are not going to be any errors, basically only saving a jmp. Using the finite-math-only pragma and an explicit check is enough to trigger the optimization (but passing finite-math-only on the command line is not).

Generally it seems that the various math flags behave slightly differently in the pragma/attribute.

2 years ago by p0nce

+1. Every time I tried, -ffast-math made things a bit slower. It's not that valuable with LLVM.

2 years ago by ISL

Fascinating to learn that something labelled 'unsafe' stands a reasonable chance of making a superior mathematical/arithmetic choice (even if it doesn't match the unoptimized FP result exactly).

If you're taking the ratio of sine and cosine, I'll bet that most of the time you're better off with tangent....

2 years ago by gigatexal

I am not a C dev, but this was a really fascinating blog post about how compilers optimize things.

2 years ago by superjan

Excellent post, I agree, but isn't it more about how they can't optimize? Compilers can seem magical in how they optimize integer math, but this is a great explanation of why not to rely on the compiler if you want fast floating-point code.

2 years ago by andraz

The question is how to enable all this in Rust... Right now there's no simple way to just tell Rust to be fast & loose with floats.

2 years ago by pornel

There are fast float intrinsics: https://doc.rust-lang.org/std/intrinsics/fn.fadd_fast.html

but better support dies in endless bikeshed of:

• People imagine enabling fast float by "scope", but there's no coherent way to specify that when it can mix with closures (even across crates) and math operators expand to std functions defined in a different scope.

• Type-based float config could work, but any proposal of just "fastfloat32" grows into a HomerMobile of "Float<NaN=false, Inf=Maybe, NegZeroEqualsZero=OnFullMoonOnly, etc.>"

• Rust doesn't want to allow UB in safe code, and LLVM will cry UB if it sees Inf or NaN, but nobody wants the compiler inserting div != 0 checks.

• You could wrap existing fast intrinsics in a newtype, except newtypes don't support literals, and <insert user-defined literals bikeshed here>

2 years ago by kzrdude

It must be clearly understood which of the flags are entirely safe and which need to be under 'unsafe', as a prerequisite.

Blanket flags for the whole program do not fit very well with Rust, while targeted use of these flags is inconvenient or needs new syntax... but there are discussions about these topics.

2 years ago by atoav

Also, maybe you would like to have more granular control over which parts of your program prioritize speed and which parts favour accuracy. Maybe this could be done with a separate type (e.g. ff64) or a decorator (which would be useful if you want to enable this for someone else's library).

2 years ago by andraz

Maybe... or maybe there would just be a flag to enable all this and be fine with the consequences.

2 years ago by varajelle

Rust is not really a fast & loose language where anything can be UB.

At least it doesn't need the errno ones, since Rust doesn't use errno.

2 years ago by pjmlp

The whole point of ALGOL-derived languages for systems programming is not being fast & loose with anything, unless the seatbelt and helmet are on as well.

2 years ago by tcpekin

Why is this compiler optimization beneficial with -ffinite-math-only and -fno-signed-zeros?

From

    if (x > y) {
      do_something();
    } else {
      do_something_else();
    }
to the form

    if (x <= y) {
      do_something_else();
    } else {
      do_something();
    }
What happens when x or y are NaN?
2 years ago by progbits

Any ordered comparison involving NaN is false. So in the first block do_something_else() gets executed when either is NaN; in the second it is do_something().
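
A small C demonstration, compiled without -ffast-math so the comparisons keep their IEEE meaning:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = NAN, y = 1.0;

        if (x > y)                            /* false: ordered comparisons with NaN are false */
            printf("do_something\n");
        else
            printf("do_something_else\n");    /* this branch runs */

        if (x <= y)                           /* also false, for the same reason */
            printf("do_something_else\n");
        else
            printf("do_something\n");         /* now this branch runs */

        return 0;
    }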

2 years ago by zro

> NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN [0]

My read of this is that comparisons involving NaN on either side always evaluate to false.

In the first one if X or Y is NaN then you'll get do_something_else, and in the second one you'll get do_something.

As far as why one order would be more optimal than the other, I'm not sure. Maybe something to do with branch prediction?

[0] https://www.gnu.org/software/libc/manual/html_node/Infinity-...

2 years ago by ptidhomme

Do some applications actually use denormal floats? I'm curious.

2 years ago by adrian_b

Denormal floats are not a purpose, they are not used intentionally.

When the CPU generates denormal floats on underflow, that ensures that underflow does not matter, because the errors remain the same as at any other floating-point operation.

Without denormal floats, underflow must be an exception condition that must be handled somehow by the program, because otherwise the computation errors can be much higher than expected.

Enabling flush-to-zero instead is an optimization of the same kind as ignoring integer overflow.

It can be harmless in a game or graphic application where a human will not care if the displayed image has some errors, but it can be catastrophic in a simulation program that is expected to provide reliable results.

Providing a flush-to-zero option is a lazy solution for CPU designers, because there have been many examples in the past of CPU designs where denormal numbers were handled without penalties for the user (but of course with a slightly higher cost for the manufacturer) so there was no need for a flush-to-zero option.

2 years ago by marcan_42

Flushing denormals to zero only matters if your calculations are already running into the lower end of floating-point exponents (and even with denormals, if they're doing that, they're going to run into lost precision anyway sooner or later).

The useful thing denormals do is make the loss of precision at that point gradual, instead of sudden. But you're still losing precision, and a few orders of magnitude later you're going to wind up at 0 either way.

If your formulas are producing intermediate results with values that run into the lower end of FP range, and it matters that those values retain precision there, then you're either using the wrong FP type or you're using the wrong formulas. Your code is likely already broken, the breakage is just rarer than in flush-to-zero mode.

So just enable FTZ mode, and if you run into issues, you need to fix that code (e.g. don't divide by values that can be too close to 0), not switch denormals on.

2 years ago by adrian_b

Your arguments are correct, but the conclusion does not result from them.

If we assume that underflows happen in your program and that this, as you say, is a sign of greater problems, then you must not enable flush-to-zero but rather trap-on-underflow, to see where the underflows happen, investigate the reason, and maybe rearrange your formulas to avoid the too-small results.

Flush-to-zero may sometimes lead to crashes, when it becomes obvious that something is wrong, but more often you just get some results with large errors that are difficult to distinguish from good results.

Opinions obviously vary, but I have never seen any good use case for flush-to-zero.

Underflows are either so rare that their performance does not matter, or, if they are frequent enough that flush-to-zero increases the speed, your code has a problem and the formulas should be restructured, which will both increase the speed and eliminate the loss of precision.

2 years ago by ptidhomme

Thanks for this explanation. But yeah, I meant: are there applications where the use of denormals vs. flush-to-zero is actually useful...? If your variable will be nearing zero, and e.g. you use it as a divisor, don't you need to handle a special case anyway? Just like you should handle your integer overflows.

I'm seeing denormals as an extension of the floating-point range, but with a tradeoff that's not worth it. Maybe I got it wrong?

2 years ago by adrian_b

The use of denormals is always useful except when you want to make your program faster no matter what and your target CPU happens to be a CPU where operations with denormals are slower than operations with other numbers.

You must keep in mind that if you enable flush-to-zero and you really see an increase in speed, that means that you have now injected errors in your FP computations. Whether the errors matter or not, that depends on the application.

If you do not see any serious speed improvement, then you have no reason to enable flush-to-zero. Even when you might want to use flush-to-zero in a production binary, you should not use it during development.

Prior to enabling flush-to-zero it would be good to enable trap-on-underflow instead of denormals, to see if this really happens frequently enough to matter for performance and possibly to investigate the reason, to see if underflow could be avoided in other ways.

In conclusion using denormals is always useful, because it limits the errors accumulated during FP computation. On some CPUs you may get higher speed by not using denormals, but only if large errors do not matter. If you avoid underflows by other means, e.g. range checking of inputs, then it does not matter how slow the operations with denormals are, so there is no reason to use any option that enables flush-to-zero (e.g. -ffast-math).

Making a FP computation with "double" numbers and enabling flush-to-zero on that would usually be quite stupid, because if the errors do not matter, then that is a task for single-precision "float" (unless you need "double" only for the exponent range, not for precision).

2 years ago by amadvance

Denormals give the guarantee that if a != b, then a - b != 0.

Without denormals a check before a division may fail to detect a division by 0.
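
A small demonstration of that failure mode in C, assuming an x86 target with SSE where the flush-to-zero bit can be set through xmmintrin.h (GCC's -ffast-math startup code typically sets the same bit for you):

    #include <stdio.h>
    #include <xmmintrin.h>   /* MXCSR access; x86-specific */

    int main(void) {
        /* Flush subnormal results to zero, as fast-math builds usually do. */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

        volatile double a = 3.0e-308;   /* normal double */
        volatile double b = 2.5e-308;   /* normal double, b != a */
        double diff = a - b;            /* 5.0e-309 is subnormal -> flushed to 0 */

        if (a != b)
            printf("a != b, yet a - b = %g and 1/(a - b) = %g\n", diff, 1.0 / diff);
        return 0;
    }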

2 years ago by marcan_42

I agree with you; if your code "works" with denormals and "doesn't work" with flush-to-zero, then it's already broken and likely doesn't work for some inputs anyway, you just haven't run into it yet.

2 years ago by PaulDavisThe1st

If you want maximum possible accuracy for FP computation, you need to pay the performance price that denormals incur. In many cases, that will likely be a long running computation where responsiveness from a user-perspective and/or meeting realtime deadlines is not an issue. Any time "I want the best possible answer that we can compute", you should leave denormals enabled.

By contrast, in media software (audio and/or video) denormals getting flushed to zero is frequently necessary to meet performance goals, and ultimately has no impact on the result. It's quite common for several audio FX (notably reverb) to generate denormals as the audio fades out, but those values are so far below the physical noise floor that it makes no difference to flush them to zero.

2 years ago by enriquto

It's not that you decide specifically to use them, but sometimes they sort of happen very naturally. For example, if you evaluate a Gaussian function f(x)=e^(-x^2), f(x) is already denormal for moderately large x (in double precision, once x exceeds roughly 26.6, e^(-x^2) drops below the smallest normal double, about 2.2e-308).

2 years ago by im3w1l

Accidentally trigger that path now and then? Probably. Depends on subnormals to give a reasonable result? Probably rare. But what do I know...

2 years ago by SeanLuke

* x+0.0 cannot be optimized to x because that is not true when x is -0.0

Wait, what? What is x + -0.0 then? What are the special cases? The only case I can think of would be 0.0 + -0.0.

2 years ago by foo92691

Seems like it might depend on the FPU rounding mode?

https://en.wikipedia.org/wiki/Signed_zero#Arithmetic
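
A quick way to see the special case in C, assuming the default round-to-nearest mode:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = -0.0;
        double y = x + 0.0;   /* -0.0 + 0.0 is +0.0 under round-to-nearest */

        /* x and y compare equal but differ in sign bit, so rewriting x + 0.0
           into x is not value-preserving without -fno-signed-zeros. */
        printf("x == y: %d, signbit(x): %d, signbit(y): %d\n",
               x == y, signbit(x) != 0, signbit(y) != 0);
        /* prints: x == y: 1, signbit(x): 1, signbit(y): 0 */
        return 0;
    }
As for x + (-0.0): that does leave every x unchanged under the default rounding mode (even +0.0), which is why that direction can be folded; it is x + 0.0 that turns -0.0 into +0.0.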
