Hey again,

Thank you for your opinions. I will take them into consideration. A few comments...

On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:

> If the performance penalty is unclear to you, that means you haven't
> measured it. Until you measure, you have absolutely no business
> complaining about a potential performance problem. Measure, and then
> come back with numbers.

Unfortunately, I am restricted from publicly sharing performance results
without going through an extensive, expensive legal process. Not fun!

Some thoughts, though... In order to test the performance of this Clang
feature, I would have to build it into my frontend, and that's not
cost-effective for me, for the following reason. It seems to me, a priori,
that the code currently generated by Clang would indeed carry a performance
penalty on an in-order processor without branch prediction. Take Xeon Phi,
for example. Albeit a small penalty. Please correct me if my assumptions
are incorrect. Our team's culture dictates that "an instruction is an
instruction", hence a performance problem. I understand that "performance
problem" will have different definitions among different tribes.

> > Although, I've been contemplating x86-64's behaviour for this case when
> > floating point traps are disabled. Ideally, the compiler should preserve
> > that behaviour, which might make this software implementation messy.
> > Especially if different processors have different implementations. The
> > simplest solution... let the hardware behave as it should.
>
> To be clear, you're asking to turn off a set of optimizations. That
> is, you're asking to make code in general run slower, so that you can
> get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of
optimizations, not to turn off optimizations in general. I would like to
make it easy for a compiler implementor to choose the desired behaviour. I
whole-heartedly believe that both behaviours (undefined and trap) have
merit.

To digress in the interest of light-heartedness, this reminds me of the old
joke: "My program's performance improved 20x, but the results aren't
correct!" :)

> >> You might need to do this in the processor-specific backend to avoid
> >> other undefined-behavior-based optimizations—that is, recognize
> >> "if (x == 0) goto err_handler; else y/x;" and replace it with
> >> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
> >
> > I believe that the constant folder would remove the constant division
> > by zero and the conditional before the backend could have its say. We
> > would be left with only the jump to the error handler. That may
> > complicate things.
>
> If the compiler can prove x==0, then yes, you'd be left with just a
> jump to the error handler. That's more efficient than handling a
> hardware trap, so it's what you ought to want.

I would like a trap, i.e. x86-64's expected behaviour. I would also not
like a branch on non-constant integer divisions. As a reminder, this
discussion originated in the constant folder; the non-constant behaviour
works just fine.

Thanks again,
Cameron
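To make the trade-off under discussion concrete, here is a minimal C sketch
of the two lowerings being compared. The function and handler names are
illustrative, not Clang's actual output; the point is only the extra
compare-and-branch that a software guard adds to every division, which is
nearly free with branch prediction but measurable on an in-order core.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical error handler, standing in for a real runtime's. */
    static void div_by_zero_handler(void) {
        fprintf(stderr, "integer division by zero\n");
        abort();
    }

    /* Guarded lowering: one extra compare and branch per division. */
    int guarded_div(int y, int x) {
        if (x == 0)
            div_by_zero_handler();
        return y / x;
    }

    /* Bare lowering: just the divide. On x86-64, idiv itself raises a
     * #DE fault when x == 0, so no explicit guard is emitted. */
    int bare_div(int y, int x) {
        return y / x;
    }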
I think this entire conversation is going a bit off the rails. Let's try to
stay focused on the specific request, and why there may be problems with it.

On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:

> > To be clear, you're asking to turn off a set of optimizations. That
> > is, you're asking to make code in general run slower, so that you can
> > get a particular behavior on some CPUs in unusual cases.
>
> I respectfully disagree. I am asking for an *option* to turn off a set of
> optimizations, not to turn off optimizations in general. I would like to
> make it easy for a compiler implementor to choose the desired behaviour.
> I whole-heartedly believe that both behaviours (undefined and trap) have
> merit.

I think you're both misconstruing what this would involve.

You're actually asking for the formal model of the LLVM IR to be
*parameterized*. In one mode, an instruction would produce undefined
behavior on division by zero, and in another mode it would produce a trap.
Then you are asking for the optimizer stack to support either semantic
model, and produce efficient code regardless.

This is completely intractable for LLVM to support. It would make both the
optimizers and the developers of LLVM crazy to have deep parameterization
of the fundamental semantic model for integer division.

The correct way to support *exactly* reproducing the architectural
peculiarities of the x86-64 integer divide instruction is to add a
target-specific intrinsic which does this. It will have defined behavior
(of trapping in some cases) as you want, and you can emit this in your FE
easily. However, even this has the risk of incurring a high maintenance
burden. If you want much in the way of optimizations of this intrinsic,
you'll have to go through the optimizer and teach each pass about your
intrinsic. Some of these will be easy, but some will be hard, and there
will be a *lot* of them. =/

Cameron, you (and others interested) will certainly need to provide all of
the patches and work to support this if you think this is an important use
case, as the existing developers have found other trade-offs and solutions.
And even then, if it requires really substantial changes to the optimizer,
I'm not sure it's worth pursuing in LLVM. My primary concerns are two-fold.
First, I think that the amount of work required to recover the
optimizations which could theoretically apply to both of these operations
will be massive. Second, I fear that after having done this work, you will
immediately find the need to remove some other undefined behavior from the
IR which happens to have defined behavior on x86-64.

Fundamentally, the idea of undefined behavior is at the core of the design
of LLVM's optimizers. It is leveraged everywhere, and without it many
algorithms that are fast would become slow, transformations that are cheap
would become expensive, and passes that operate locally would be forced to
operate across ever-growing scopes in order to be certain the optimizations
apply in a specific case. Trying to remove undefined behavior from LLVM
seems unlikely to be a productive pursuit.

More productive (IMO) is to emit explicit guards against the undefined
behavior in your language, much as -fsanitize does for undefined behavior
in C++. Then work to build a mode where a specific target can take
advantage of target-specific trapping behaviors to emit these guards more
efficiently. This will allow LLVM's optimizers to continue to function in
the world they were designed for, with a set of rules that we know how to
build efficient optimizers around, while your source programs operate in a
world with checked behavior rather than undefined behavior. As a useful
side effect, you can defer the target-specific optimizations until you have
benchmarks (internal benchmarks are fine!) and can demonstrate the
performance problems (if any).

Cameron, you may disagree, but honestly, if you were going to convince
folks here, I think it would have happened already. I'm not likely to
continue the theoretical debate about whether LLVM's stance on UB (as I've
described it above) is a "good" or "bad" stance. Not because I wouldn't
enjoy the debate (especially at a bar some time), but because I fear it
isn't a productive way to spend the time of folks on this list. So let's
try to stick to the technical discussion of strategies, costs, and
trade-offs.
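The intrinsic suggestion can be made concrete with a sketch. No such LLVM
intrinsic exists today, so the snippet below uses GCC/Clang-style x86-64
inline assembly purely as a stand-in for the contract such an intrinsic
would pin down: the hardware divide always executes, and faults on a zero
divisor, regardless of what the optimizer can prove.

    /* Sketch only: inline asm modeling a hypothetical trapping-divide
     * intrinsic. The idiv is opaque to the optimizer, so it is never
     * folded away, and it raises a #DE fault when x == 0. */
    static inline int x86_trapping_sdiv(int y, int x) {
        int quot;
        __asm__ volatile("cltd\n\t"     /* sign-extend eax into edx:eax */
                         "idivl %2"     /* edx:eax / x; traps if x == 0 */
                         : "=a"(quot)   /* quotient comes back in eax */
                         : "a"(y), "r"(x)
                         : "edx", "cc"); /* remainder in edx discarded */
        return quot;
    }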
Well put, Chandler!

On Sun, Apr 7, 2013 at 6:23 PM, Chandler Carruth <chandlerc at google.com> wrote:

> I think you're both misconstruing what this would involve.
>
> You're actually asking for the formal model of the LLVM IR to be
> *parameterized*. In one mode, an instruction would produce undefined
> behavior on division by zero, and in another mode it would produce a
> trap. Then you are asking for the optimizer stack to support either
> semantic model, and produce efficient code regardless.
>
> This is completely intractable for LLVM to support. It would make both
> the optimizers and the developers of LLVM crazy to have deep
> parameterization of the fundamental semantic model for integer division.
>
> The correct way to support *exactly* reproducing the architectural
> peculiarities of the x86-64 integer divide instruction is to add a
> target-specific intrinsic which does this. It will have defined behavior
> (of trapping in some cases) as you want, and you can emit this in your
> FE easily. However, even this has the risk of incurring a high
> maintenance burden. If you want much in the way of optimizations of this
> intrinsic, you'll have to go through the optimizer and teach each pass
> about your intrinsic. Some of these will be easy, but some will be hard,
> and there will be a *lot* of them. =/
>
> Cameron, you (and others interested) will certainly need to provide all
> of the patches and work to support this if you think this is an
> important use case, as the existing developers have found other
> trade-offs and solutions. And even then, if it requires really
> substantial changes to the optimizer, I'm not sure it's worth pursuing
> in LLVM. My primary concerns are two-fold. First, I think that the
> amount of work required to recover the optimizations which could
> theoretically apply to both of these operations will be massive. Second,
> I fear that after having done this work, you will immediately find the
> need to remove some other undefined behavior from the IR which happens
> to have defined behavior on x86-64.

Alas, I must have been shortsighted. For my purposes, I had envisioned
using this target-specific intrinsic only when undefined behaviour was
imminent. That information is available before the IR is generated, and it
would work around the constant folder. I did not anticipate needing
optimizations around the intrinsic, since it would ultimately trap.

Supporting the intrinsic as a proper alternative to the integer division
operator(s) sounds like a lot of work. I do not believe that the reward is
worth the effort, at least for my purposes. Others may feel differently.

> Fundamentally, the idea of undefined behavior is at the core of the
> design of LLVM's optimizers. It is leveraged everywhere, and without it
> many algorithms that are fast would become slow, transformations that
> are cheap would become expensive, and passes that operate locally would
> be forced to operate across ever-growing scopes in order to be certain
> the optimizations apply in a specific case. Trying to remove undefined
> behavior from LLVM seems unlikely to be a productive pursuit.

Fair enough.

> More productive (IMO) is to emit explicit guards against the undefined
> behavior in your language, much as -fsanitize does for undefined
> behavior in C++. Then work to build a mode where a specific target can
> take advantage of target-specific trapping behaviors to emit these
> guards more efficiently. This will allow LLVM's optimizers to continue
> to function in the world they were designed for, with a set of rules
> that we know how to build efficient optimizers around, while your source
> programs operate in a world with checked behavior rather than undefined
> behavior. As a useful side effect, you can defer the target-specific
> optimizations until you have benchmarks (internal benchmarks are fine!)
> and can demonstrate the performance problems (if any).

Regrettably, this implementation does not suit my needs. The constant
folding would still occur, and I would like to produce the actual division,
since the trap it raises is non-maskable on x86. Others may have a better
use for this implementation, though, so I don't want to shoot the idea down
for everyone.

> Cameron, you may disagree, but honestly, if you were going to convince
> folks here, I think it would have happened already. I'm not likely to
> continue the theoretical debate about whether LLVM's stance on UB (as
> I've described it above) is a "good" or "bad" stance. Not because I
> wouldn't enjoy the debate (especially at a bar some time), but because I
> fear it isn't a productive way to spend the time of folks on this list.
> So let's try to stick to the technical discussion of strategies, costs,
> and trade-offs.

Oh, no. Your analysis was thorough, and I can sympathize with it. The seams
between C/C++ and the x86 architecture are foggy, and I understand that my
interpretation of their interactions is not gospel.

Thanks again for the thoughtful reply!
-Cameron
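For illustration only, here is one way to see the effect Cameron is after
at the source level. The volatile qualifier is used here purely as a sketch
of "make the divisor opaque so the constant folder cannot remove the
divide"; it is a workaround for demonstration, not the intrinsic-based
design discussed above.

    /* Sketch: the divisor is provably zero in the source, but the
     * volatile load is opaque to the optimizer, so the idiv is really
     * emitted and the non-maskable #DE fault fires at run time. This
     * remains undefined behavior in C; only the emitted code changes. */
    int trap_on_known_zero(int y) {
        volatile int divisor = 0;
        return y / divisor;
    }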
Chandler Carruth <chandlerc at google.com> writes:

> You're actually asking for the formal model of the LLVM IR to be
> *parameterized*. In one mode, an instruction would produce undefined
> behavior on division, and in another mode it would produce a trap.
> Then you are asking for the optimizer stack to support either semantic
> model, and produce efficient code regardless.
>
> This is completely intractable for LLVM to support. It would make both
> the optimizers and the developers of LLVM crazy to have deep
> parameterization of the fundamental semantic model for integer
> division.

You're making this *way* too complicated. We have plenty of examples of
options to turn off optimizations, and most compilers provide options to
preserve traps. The division would still semantically have undefined
behavior; the implementation would simply make that behavior a trap instead
of returning some random garbage value. Sure, it *may* change how other
code is optimized, but that is the choice the implementor makes when
choosing to use the option to preserve traps. LLVM developers not concerned
with preserving traps need not give it a second thought.

> The correct way to support *exactly* reproducing the architectural
> peculiarities of the x86-64 integer divide instruction is to add a
> target-specific intrinsic which does this. It will have defined
> behavior (of trapping in some cases) as you want, and you can emit
> this in your FE easily.

You can't do that if the FE doesn't see the constant expression;
optimization may reveal it later.

> Cameron, you (and others interested) will certainly need to provide
> all of the patches and work to support this if you think this is an
> important use case, as the existing developers have found other
> trade-offs and solutions.

Certainly. A software test+trap is theoretically possible, and the target
optimizer could theoretically get rid of it, but I share Cameron's concern
about the work required to turn theory into reality.

He's not asking to redefine the LLVM IR. He's asking for an option to
control the implementation of that IR. Preserving traps is a real-world
need, and LLVM itself doesn't currently provide a way to do it. It seems
like it should have one, and not rely on particular frontends. It's
supposed to be an independent set of libraries, right?

-David
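David's "software test+trap" can be sketched in a few lines of C. The
function name is illustrative; the idea is that the guard delivers the same
signal a faulting x86-64 idiv would produce, so trap-preserving behavior is
available even on targets whose divide instruction does not trap, and a
backend could fold the guard away where the hardware traps natively.

    #include <signal.h>

    /* Minimal sketch of software test + trap: guard the division and
     * raise SIGFPE explicitly, mimicking the #DE -> SIGFPE delivery
     * that a hardware divide-by-zero produces on POSIX x86-64. */
    int div_preserving_traps(int y, int x) {
        if (x == 0)
            raise(SIGFPE);
        return y / x;
    }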