On 05/02/2014 11:57 AM, Filip Pizlo wrote:
> On May 2, 2014 at 11:53:25 AM, Eric Christopher (echristo at gmail.com
> <mailto:echristo at gmail.com>) wrote:
>
>> On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames
>> <listmail at philipreames.com> wrote:
>> > Andy - If you're not already following this closely, please start. We've gotten into fairly fundamental questions of what a patchpoint does.
>> >
>> > Filip,
>> >
>> > I think you've hit the nail on the head. What I'm thinking of as being patchpoints are not what you think they are. Part of that is that I've got a local change which adds a very similar construction (called "statepoints" for the moment), but I was trying to keep that separate. That also includes a lot of GC semantics which are not under discussion currently. My apologies if that experience bled over into this conversation and made things more confusing.
>> >
>> > I will note that the documentation for patchpoint says explicitly the following: "The ‘llvm.experimental.patchpoint.*‘ intrinsics creates a function call to the specified <target> and records the location of specified values in the stack map."
>> >
>> > My reading has always been that a patchpoint *that isn't patched* is simply a call with a stackmap associated with it. To my reading, this can (and did, and does) indicate my proposed usage would be legal.
>> >
>>
>> I like the idea that the target can be assumed to be called. It makes optimization of the call possible, etc. I think it's definitely worth exploring before we lock down the patchpoint intrinsic.
>
> I will actively oppose this.
>
I think it sounds like we need to split patchpoint into two cases. I'm going to send a rough proposal for this later today.
> -Filip
>
>
>>
>>
>> -eric
>>
>> > I will agree that I've confused the topic badly on the optimization front. My "statepoint" isn't patchable, so a lot more optimizations are legal. Sorry about that. To restate what I think you've been saying all along, the optimizer can't make assumptions about what function is called by a patchpoint because that might change based on later patching. Is this the key point you've been trying to make?
>> >
>> > I'm not objecting to separating "my patchpoint" from "your patchpoint". Let's just hammer out the semantics of each first. :)
>> >
>> > Again, longer response to follow in a day or so. :)
>> >
>> > Philip
>> >
>> >
>> > On 04/30/2014 10:09 PM, Filip Pizlo wrote:
>> >
>> >
>> >
>> > On April 30, 2014 at 9:06:20 PM, Philip Reames (listmail at philipreames.com) wrote:
>> >
>> > On 04/29/2014 12:39 PM, Filip Pizlo wrote:
>> >
>> > On April 29, 2014 at 11:27:06 AM, Philip Reames (listmail at philipreames.com) wrote:
>> >
>> > On 04/29/2014 10:44 AM, Filip Pizlo wrote:
>> >
>> > TL;DR: Your desire to use trapping on x86 only further convinces me that Michael's proposed intrinsics are the best way to go.
>> >
>> > I'm still not convinced, but am not going to actively oppose it either. I'm leery of designing a solution with major assumptions we don't have data to back up.
>> >
>> > I worry your assumptions about deoptimization are potentially unsound. But I don't have data to actually show this (yet).
>> >
>> > I *think* I may have been unclear about my assumptions; in particular, my claims with respect to deoptimization are probably more subtle than they appeared. WebKit can use LLVM and it has divisions and we do all possible deoptimization/profiling/etc tricks for it, so this is grounded in experience. Forgive me if the rest of this e-mail contains a lecture on things that are obvious - I'll try to err on the side of clarity and completeness since this discussion is sufficiently dense that we run the risk of talking cross-purposes unless some baseline assumptions are established.
>> >
>> > I think we're using the same terminology, but with slightly different sets of assumptions. I'll point this out below where relevant.
>> >
>> > Also, thanks for taking the time to expand. It helps clarify the discussion quite a bit.
>> >
>> > I think we may be converging to an understanding of what you want versus what I want, and I think that there are some points - possibly unrelated to division - that are worth clarifying. I think that the main difference is that when I say "patchpoint", I am referring to a concrete intrinsic with specific semantics that cannot change without breaking WebKit, while you are using the term to refer to a broad concept, or rather, a class of as-yet-unimplemented intrinsics that share some of the same features with patchpoints but otherwise have incompatible semantics.
>> >
>> > Also, when I say that you wouldn't want to use the existing patchpoint to do your trapping deopt, what I mean is that the performance of doing this would suck for reasons that are not related to deoptimization or trapping. I'm not claiming that deoptimization performs poorly (trust me, I know better) or that trapping to deoptimize is bad (I've done this many, many times and I know better). I'm saying that with the current patchpoint intrinsics in LLVM, as they are currently specified and implemented, doing it would be a bad idea because you'd have to compromise a bunch of other optimizations to achieve it.
>> >
>> > You have essentially described new intrinsics that would make this less of a bad idea and I am interested in your intrinsics, so I'll try to both respond with why patchpoints don't currently give you what you want (and why simply changing patchpoint semantics would be evil) and I'll also try to comment on what I think of the intrinsic that you're effectively proposing. Long story short, I think you should formally propose your intrinsic even if it's not completely fleshed out. I think that it's an interesting capability and in its most basic form, it is a simple variation of the current patchpoint/stackmap intrinsics.
>> >
>> >
>> >
>> >
>> >
>> > On April 29, 2014 at 10:09:49 AM, Philip Reames (listmail at philipreames.com) wrote:
>> >
>> > As the discussion has progressed and I've spent more time thinking about the topic, I find myself less and less enthused about the current proposal. I'm in full support of having idiomatic ways to express safe division, but I'm starting to doubt that using an intrinsic is the right way at the moment.
>> >
>> > One case I find myself thinking about is how one would combine profiling information and implicit div-by-zero/overflow checks with this proposal. I don't really see a clean way. Ideally, for a "safe div" which never has the exceptional paths taken, you'd like to completely do away with the control flow entirely. (And rely on hardware traps w/exceptions instead.) I don't really see a way to represent that type of construct given the current proposal.
>> >
>> > This is a deeper problem and to solve it you'd need a solution to trapping in general. Let's consider the case of Java. A Java program may want to catch the arithmetic exception due to divide by zero. How would you do this with a trap in LLVM IR? Spill all state that is live at the catch? Use a patchpoint for the entire division instruction?
>> >
>> > We'd likely use something similar to a patchpoint. You'd need the "abstract vm state" (which is not the compiled frame necessarily) available at the div instruction. You could then re-enter the interpreter at the specified index (part of the vm state). We have almost all of these mechanisms in place. Ideally, you'd trigger a recompile and otherwise ensure re-entry into compiled code at the soonest possible moment.
>> >
>> > This requires a lot of runtime support, but we already have most of it implemented for another compiler. From our perspective, the runtime requirements are not a major blocker.
>> >
>> > Right, you'll use a patchpoint. That's way more expensive than using a safe division intrinsic with branches, because it's opaque to the optimizer.
>> >
>> > This statement is true at the moment, but it shouldn't be. I think this is our fundamental difference in approach.
>> >
>> > You should be able to write something like:
>> > i32 %res = invoke patchpoint (... x86_trapping_divide, a, b) normal_dest invoke_dest
>> >
>> > normal_dest:
>> > ;; use %res
>> > invoke_dest:
>> > landingpad
>> > ;; dispatch edge cases
>> > ;; this could be unreachable code if you deopt this frame in the trap handler and jump directly to an interpreter or other bit of code
>> >
>> > I see. It sounds like you want a generalization of the "div.with.stackmap" that I thought you wanted - you want to be able to wrap anything in a stackmap.
>> >
>> > The current patchpoint intrinsic does not do this, and you run the risk of breaking existing semantics if you changed this. You'd probably break WebKit, which treats the call target of the patchpoint as nothing more than a quirk - we always pass null. Also, the current patchpoint treats the callee as an i8* if I remember right and it would be super weird if all LLVM phases had to interpret this i8* by unwrapping a possible bitcast to get to a declared function that may be an intrinsic. Yuck! Basically, the call target of existing patchpoints is meant to be a kind of convenience feature rather than the core of the mechanism.
>> >
>> > I agree in principle that the intrinsic that you want would be a useful intrinsic. But let's not call it a patchpoint for the purposes of this discussion, and let's not confuse the discussion by claiming (incorrectly) that the existing patchpoint facility gives you what you want. It doesn't: patchpoints are designed to make the call target opaque (hence the use of i8*) and there shouldn't be a correlation between what the patchpoint does at run-time and what the called function would have done. You could make the call target be null (like WebKit does) and the patchpoint should still mean "this code can do anything" because the expectation is that the client JIT will patch over it anyway.
>> >
>> > Also, "patchpoint" would probably not be the right term for the intrinsic that you want. I think that what you want is "call.with.stackmap". Or maybe "stackmap.wrapper". Or just "stackmap" - I'd be OK, in principle, with changing the name of the current "stackmap" intrinsic to something that reflects the fact that it's less of a stackmap than what you want.
>> >
>> > To summarize. A patchpoint's main purpose is that you can patch it with arbitrary code. The current "stackmap" means that you can patch it with arbitrary code and that patching may be destructive to a shadow of machine code bytes, so it's really just like patchpoints - we could change its name to "patchpoint.shadow" for example.
>> >
>> > If you were to propose such a stackmap intrinsic, then I think there could be some ways of doing it that wouldn't be too terrible. Basically you want something that is like a patchpoint in that it reports a stackmap via a side channel, but unlike patchpoints, it doesn't allow arbitrary patching - instead the optimizer should be allowed to assume that the code within the patchpoint will always do the same thing that the call target would have done. There are downsides to truly doing this. For example, to make division efficient with such an intrinsic, you'd have to do something that is somewhat worse than just recognizing intrinsics in the optimizer - you'd have to first recognize a call to your "stackmap wrapper" intrinsic and then observe that its call target argument is in turn another intrinsic. To me personally, that's kind of yucky, but I won't deny that it could be useful.
>> >
>> > As to the use of invoke: I don't believe that the use of invoke versus my suggested "branch on a trap predicate" idea are different in any truly meaningful way. I buy that either would work.
>> >
>> >
>> >
>> > A patchpoint should not require any excess spilling. If values are live in registers, that should be reflected in the stack map. (I do not know if this is the case for patchpoint at the moment or not.)
>> >
>> > Patchpoints do not require spilling.
>> >
>> > My point was that with existing patchpoints, you *either* use a patchpoint for the entire division which makes the division opaque to the optimizer - because a patchpoint means "this code can do anything" - *or* you could spill stuff to the stack prior to your trapping division intrinsic, since spilling is something that you could do as a workaround if you didn't have a patchpoint.
>> >
>> > The reason why I brought up spilling at all is that I suspect that spilling all state to the stack might be cheaper - for some systems - than turning the division into a patchpoint. Turning the division into a patchpoint is horrendously brutal - the patchpoint looks like it clobbers the heap (which a division doesn't do), has to execute (a division is an obvious DCE candidate), cannot be hoisted (hoisting divisions is awesome), etc. Perhaps most importantly, though, a patchpoint doesn't tell LLVM that you're *doing a division* - so all constant folding, strength reduction, and algebraic reasoning flies out the window. On the other hand, spilling all state to the stack is an arguably sound and performant solution to a lot of VM problems. I've seen JVM implementations that ensure that there is always a copy of state on the stack at some critical points, just because it makes loads of stuff simpler (debugging, profiling, GC, and of course deopt). I personally prefer to stay away from such a strategy because it's not free.
>> >
>> > On the other hand, branches can be cheap. A branch on a divide is cheaper than not being able to optimize the divide.
>> >
>> >
>> >
>> > The Value called by a patchpoint should participate in optimization normally.
>> >
>> > I agree that you could have a different intrinsic that behaves like this.
>> >
>> > We really want the patchpoint part of the call to be supplemental. It should still be a call. It should be constant propagated, transformed, etc.. This is not the case currently. I've got a couple of off the wall ideas for improving the current status, but I'll agree this is a hardish problem.
>> >
>> > It should be legal to use a patchpoint in an invoke. It's an ABI issue of how the invoke path gets invoked. (i.e. side tables for the runtime to lookup, etc..) This is not possible today, and probably requires a fair amount of work. Some of it, I've already done and will be sharing shortly. Other parts, I haven't even thought about.
>> >
>> > Right, it's significantly more complex than either the existing patchpoints or Michael's proposed safe.div.
>> >
>> >
>> >
>> > If you didn't want to use the trapping semantics, you'd insert dedicated control flow _before_ the divide. This would allow normal optimization of the control flow.
>> >
>> > Notes:
>> > 1) This might require a new PATCHPOINT pseudo op in the backend. Haven't thought much about that yet.
>> > 2) I *think* your current intrinsic could be translated into something like this. (Leaving aside the question of where the deopt state comes from.) In fact, the more I look at this, the less difference I actually see between the approaches.
>> >
>> >
>> >
>> > In a lot of languages, a divide produces some result even in the exceptional case and this result requires effectively deoptimizing since the result won't be the one you would have predicted (double instead of int, or BigInt instead of small int), which sort of means that if the CPU exception occurs you have to be able to reconstruct all state. A patchpoint could do this, and so could spilling all state to the stack before the divide - but both are very heavy hammers that are sure to be more expensive than just doing a branch.
>> >
>> > This isn't necessarily as expensive as you might believe. I'd recommend reading the Graal project papers on this topic.
>> >
>> > Whether deopt or branching is more profitable *in this case*, I can't easily say. I'm not yet to the point of being able to run that experiment. We can argue about what "should" be better all we want, but real performance data is the only way to truly know.
>> >
>> > My point may have been confusing. I know that deoptimization is cheap and WebKit uses it everywhere, including division corner cases, if profiling tells us that it's profitable to do so (which it does, in the common case). WebKit is a heavy user of deoptimization in general, so you don't need to convince me that it's worth it.
>> >
>> > Acknowledged.
>> >
>> > Note that I want *both* deopt *and* branching, because in this case, a branch is the fastest overall way of detecting when to deopt. In the future, I will want to implement the deopt in terms of branching, and when we do this, I believe that the most sound and performant approach would be using Michael's intrinsics. This is subtle and I'll try to explain why it's the case.
>> >
>> > The point is that you wouldn't want to do deoptimization by spilling state on the main path or by using a patchpoint for the main path of the division.
>> >
>> > This is the main point I disagree with. I don't believe that having a patchpoint on the main path should be any more expensive than the original call. (see above)
>> >
>> > The reason why the patchpoint is expensive is that if you use a patchpoint to implement a division then the optimizer won't be allowed to assume that it's a division, because the whole point of "patchpoint" is to tell the optimizer to piss off.
>> >
>> >
>> >
>> > Worth noting explicitly: I'm assuming that all of your deopt state would already be available for other purposes in nearby code. It's on the stack or in registers. I'm assuming that by adding the deopt point, you are not radically changing the set of computations which need to be done. If that's not the case, you should avoid deopt and instead just inline the slow paths with explicit checks.
>> >
>> > Yes, of course it is. That's not the issue.
>> >
>> > I'll note that given your assumptions about the cost of a patchpoint, the rest of your position makes a lot more sense. :) As I spelled out above, I believe this cost is not fundamental.
>> >
>> > You don't want the common path of executing the division to involve a patchpoint instruction, although using a patchpoint or stackmap to implement deoptimization on the failing path is great:
>> >
>> > Good: if (division would fail) { call @patchpoint(all of my state) } else { result = a / b }
>> >
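[Editorial note: the "Good" idiom above, written out as a hypothetical LLVM IR sketch. The stackmap ID, branch-weight metadata node, and block names are invented for illustration; the point is that the sdiv stays visible to the optimizer and the stackmap call sits only on the cold path.]

```llvm
entry:
  %is.zero = icmp eq i32 %b, 0
  %is.m1   = icmp eq i32 %b, -1
  %is.min  = icmp eq i32 %a, -2147483648
  %ovf     = and i1 %is.m1, %is.min
  %fail    = or i1 %is.zero, %ovf
  br i1 %fail, label %deopt, label %fast, !prof !0   ; !0: weights marking %deopt cold

fast:
  %res = sdiv i32 %a, %b                             ; ordinary, optimizable divide
  br label %cont

deopt:                                               ; OSR exit: record live state
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 42, i32 0, i32 %a, i32 %b)
  unreachable
```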
>> > Given your cost assumptions, I'd agree.
>> >
>> > Not my cost assumptions. The reason why this is better is that the division is expressed in LLVM IR so that LLVM can do useful things to it - like eliminate it, for example.
>> >
>> > Bad: call @patchpoint(all of my state) // patch with a divide instruction - bad because the optimizer has no clue what you're doing and assumes the very worst
>> >
>> > Yuck. Agreed.
>> >
>> > To be clear, this is what you're proposing - except that you're assuming that LLVM will know that you've patched a division because you're expecting the call target to have semantic meaning. Or, rather, you're expecting that you can make the contents of the patchpoint be a division by having the call target be a division intrinsic. In the current implementation and as it is currently specified, the call target has no meaning and so you get the yuck that I'm showing.
>> >
>> > Worse: spill all state to the stack; call @trapping.div(a, b) // the spills will hurt you far more than a branch, so this should be avoided
>> >
>> > I'm assuming this is an explicit spill rather than simply recording a stack map *at the div*. If so, agreed.
>> >
>> > I suppose we could imagine a fourth option that involves a patchpoint to pick up the state and a trapping divide intrinsic. But a trapping divide intrinsic alone is not enough. Consider this:
>> >
>> > result = call @trapping.div(a, b); call @stackmap(all of my state)
>> >
>> > As soon as these are separate instructions, you have no guarantees that the state that the stackmap reports is sound for the point at which the div would trap.
>> >
>> > This is the closest to what I'd propose, except that the two calls would be merged into a single patchpoint. Isn't the entire point of a patchpoint to record the stack map for a call?
>> >
>> > No. It would be bad if that's what the documentation says. That's not at all how WebKit uses it or probably any IC client would use it.
>> >
>> > Patchpoints are designed to be inline assembly on steroids. They're there to allow the client JIT to tell LLVM to piss off.
>> >
>> > (Well, ignoring the actual patching part..) Why not write this as:
>> > patchpoint(..., trapping.div, a, b);
>> >
>> > Is there something I'm missing here?
>> >
>> > Just to note: I fully agree that the two call proposal is unsound and should be avoided.
>> >
>> > So, the division itself shouldn't be a trapping instruction and instead you want to detect the bad case with a branch.
>> >
>> > To be clear:
>> >
>> > - Whether you use deoptimization for division or anything else - like WebKit has done since before any of the Graal papers were written - is mostly unrelated to how you represent the division, unless you wanted to add a new intrinsic that is like a trapping-division-with-stackmap:
>> >
>> > result = call @trapping.div.with.stackmap(a, b, ... all of my state ...)
>> >
>> > Now, maybe you do want such an intrinsic, in which case you should propose it!
>> >
>> > Given what we already have with patchpoints, I don't think a merged intrinsic is necessary. (See above). I believe we have all the parts to build this solution, and that - in theory - they should compose neatly.
>> >
>> > p.s. The bits I was referring to were not deopt per se. It was particularly which set of deopt state you used for each deopt point. That's a bit of a tangent for the rest of the discussion now though.
>> >
>> > The reason why I haven't proposed it is that I think that long-term, the currently proposed intrinsics are a better path to getting the trapping optimizations. See my previous mail, where I show how we could tell LLVM what the failing path is (which may have deoptimization code that uses a stackmap or whatever), what the trapping predicate is (it comes from the safe.div intrinsic), and the fact that trapping is wise (branch weights).
>> >
>> > - If you want to do the deoptimization with a trap, then your only choice currently is to use a patchpoint for the main path of the division. This will be slower than using a branch to an OSR exit basic block, because you're making the division itself opaque to the optimizer (bad) just to get rid of a branch (which was probably cheap to begin with).
>> >
>> > Hence, what you want to do - one way or another, regardless of whether this proposed intrinsic is added - is to branch on the corner case condition, and have the slow case of the branch go to a basic block that deoptimizes. Unless of course you have profiling that says that the case does happen often, in which case you can have that basic block handle the corner case inline without leaving optimized code (FWIW, we do have such paths in WebKit and they are useful).
>> >
>> > So the question for me is whether the branching involves explicit control flow or is hidden inside an intrinsic. I prefer for it to be within an intrinsic because it:
>> >
>> > - allows the optimizer to do more interesting things in the common cases, like hoisting the entire division.
>> >
>> > - will give us a clearer path for implementing trapping optimizations in the future.
>> >
>> > - is an immediate win on ARM.
>> >
>> > I'd be curious to hear what specific idea you have about how to implement trap-based deoptimization with your trapping division intrinsic for x86 - maybe it's different from the two "bad" idioms I showed above.
>> >
>> > I hope my explanation above helps. If not, ask, and I'll try to explain more clearly.
>> >
>> > I think I understand it. I think that the only issue is that:
>> >
>> > - Patchpoints currently don't do what you want.
>> >
>> > - If you made patchpoints do what you want then you'd break WebKit - not to mention anyone who wants to use them for inline caches.
>> >
>> > So it seems like you want a new intrinsic. You should officially propose this new intrinsic, particularly since the core semantic differences are not so great from what we have now. OTOH, if you truly believe that patchpoints should just be changed to your semantics in a way that does break WebKit, then that's probably also something you should get off your chest. ;-)
>> >
>> >
>> >
>> > One point just for clarity; I don't believe this affects the conclusions of our discussion so far. I'm also fairly sure that you (Filip) are aware of this, but want to spell it out for other readers.
>> >
>> > You seem to be assuming that compiled code needs to explicitly branch to a point where deopt state is known to exit a compiled frame.
>> >
>> > This is a slightly unclear characterization of my assumptions. Our JIT does deoptimization without explicit branches for many, many things. You should look at it some time, it's pretty fancy. ;-)
>> >
>> > Worth noting is that you can also exit a compiled frame on a trap (without an explicit condition/branch!) if the deopt state is known at the point you take the trap. This "exit frame on trap" behavior shows up with null pointer exceptions as well. I'll note that both compilers in OpenJDK support some combination of "exit-on-trap" conditions for division and null dereferences. The two differ on exactly what strategies they use in each case though. :)
>> >
>> > Yeah, and I've also implemented VMs that do this - and I endorse the basic idea. I know what you want, and my only point is that the existing patchpoints only give you this if you're willing to make a huge compromise: namely, that you're willing to make the division (or heap load for the null case) completely opaque to the compiler to the point that GVN, LICM, SCCP, and all algebraic reasoning have to give up on optimizing it. The point of using LLVM is that it can optimize code. It can optimize branches and divisions pretty well. So, eliminating an explicit branch by replacing it with a construct that appears opaque to the optimizer is not a smart trade-off.
>> >
>> > You could add a new intrinsic that, like patchpoints, records the layout of state in a side-table, but that is used as a kind of wrapper for operations that LLVM understands. This may or may not be hairy - you seem to have sort of acknowledged that it's got some complexity and I've also pointed out some possible issues. If this is something that you want, you should propose it so that others know what you're talking about. One danger of how we're discussing this right now is that you're overloading patchpoints to mean the thing you want them to mean rather than what they actually mean, which makes it seem like we don't need Michael's intrinsics on the grounds that patchpoints already offer a solution. They don't already offer a solution precisely because patchpoints don't do what your intrinsics would do.
>> >
>> >
>> >
>> > I'm not really arguing that either scheme is "better" in all cases. I'm simply arguing that we should support both and allow optimization and tuning between them. As far as I can tell, you seem to be assuming that an explicit branch to a known exit point is always superior.
>> >
>> > Ok, back to the topic at hand...
>> >
>> > With regards to the current proposal, I'm going to take a step back. You guys seem to have already looked into this in a fair amount of depth. I'm not necessarily convinced you've come to the best solution, but at some point, we need to make forward progress. What you have is clearly better than nothing.
>> >
>> > Please go ahead and submit your current approach. We can come back and revise later if we really need to.
>> >
>> > I do request the following changes:
>> > - Mark it clearly as experimental.
>> > - Either don't specify the value computed in the edge cases, or allow those values to be specified as constant arguments to the call. This allows efficient lowering to x86's div instruction if you want to make use of the trapping semantics.
>> >
>> > Once again: how would you use this to get trapping semantics without throwing all of LLVM's optimizations out the window, in the absence of the kind of patchpoint-like intrinsic that you want? I ask just to make sure that we're on the same page.
>> >
>> >
>> >
>> > Finally, as for performance data, which part of this do you want performance data for? I concede that I don't have performance data for using Michael's new intrinsic. Part of what the intrinsic accomplishes is it gives a less ugly way of doing something that is already possible with target intrinsics on ARM. I think it would be great if you could get those semantics - along with a known-good implementation - on other architectures as well.
>> >
>> > I would be very interested in seeing data comparing two schemes:
>> > - Explicit control flow emitted by the frontend
>> > - The safe.div intrinsic emitted by the frontend, desugared in CodeGenPrep
>> >
>> > My strong suspicion is that each would perform well in some cases and not in others. At least on x86. Since the edge-checks are essentially free on ARM, the second scheme would probably be strictly superior there.
>> >
>> > I am NOT asking that we block submission on this data however.
>> >
>> > But this discussion has also involved suggestions that we should use trapping to implement deoptimization, and the main point of my message is to strongly argue against anything like this given the current state of trapping semantics and how patchpoints work. My point is that using traps for division corner cases would either be unsound (see the stackmap after the trap, above), or would require you to do things that are obviously inefficient. If you truly believe that the branch to detect division slow paths is more expensive than spilling all bytecode state to the stack or using a patchpoint for the division, then I could probably hack something up in WebKit to show you the performance implications. (Or you could do it yourself, the code is open source...)
>> >
>> > In a couple of months, I'll probably have the performance data to discuss this for real. When that happens, let's pick this up and continue the debate. Alternatively, if you want to chat this over more with a beer in hand at the social next week, let me know. In the meantime, let's not stall the current proposal any more.
>> >
>> > Philip
>> >
>> >