thr3ads.net - llvm dev - [LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal [Oct 2013]

Filip Pizlo

2013-Oct-22 22:08 UTC

[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal

On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>>
>>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>>>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>>>> first client of these features is the JavaScript compiler within the
>>>> open source WebKit project.
>>>>
>>> I have a couple of comments on your proposal.  None of these are
>>> major enough to prevent submission.
>>>
>>> - As others have said, I'd prefer an experimental namespace rather
>>> than a webkit namespace.  (minor)
>>> - Unless I am misreading your proposal, your proposed StackMap
>>> intrinsic duplicates existing functionality already in llvm.  In
>>> particular, much of the StackMap construction seems similar to the
>>> Safepoint mechanism used by the in-tree GC support. (See
>>> CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you
>>> examined these mechanisms to see if you can share implementations?
>>> - To my knowledge, there is nothing that prevents an LLVM
>>> optimization pass from manufacturing new pointers which point
>>> inside an existing data structure.  (e.g. an interior pointer to an
>>> array when blocking a loop)  Does your StackMap mechanism need to
>>> be able to inspect/modify these manufactured temporaries?  If so, I
>>> don't see how you could generate an intrinsic which would include
>>> this manufactured pointer in the live variable list.  Is there
>>> something I'm missing here?
>> These stackmaps have nothing to do with GC.  Interior pointers are a
>> problem unique to precise copying collectors.
> I would argue that while the use of the stack maps might be
> different, the mechanism is fairly similar.

It's not at all similar.  These stackmaps are only useful for
deoptimization, since the only way to make use of the live state information is
to patch the stackmap with a jump to a deoptimization off-ramp.  You won't
use these for a GC.

> In general, if the expected semantics are the same, a shared
> implementation would be desirable.  This is more a suggestion for
> future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts. 
While it's possible to unify the two, this isn't the intent here.  In
particular, you can use these stackmaps for deoptimization without having to
unwind the stack.
> 
> I agree that interior pointers are primarily a problem for relocating
> collectors. (Though I disagree with the characterization of it being
> *uniquely* a problem for such collectors.)  Since I was unaware of
> what you're using your stackmap mechanism for, I wanted to ask.
> Sounds like this is not an intended use case for you.
>>
>> In particular, the stackmaps in this proposal are likely to be used
>> for capturing only a select subset of state and that subset may fail
>> to include all possible GC roots.  These stackmaps are meant to be
>> used for reconstructing state-in-bytecode (where bytecode = whatever
>> your baseline execution engine is, could be an AST) for performing a
>> deoptimization, if LLVM was used for compiling code that had some
>> type/value/behavior speculations.
> Thanks for the clarification.  This is definitely a useful mechanism.
> Thank you for contributing it back.
>>
>>> - Your patchpoint mechanism appears to be one very specialized use
>>> of a patchable location.  Would you mind renaming it to something
>>> like patchablecall to reflect this specialization?
>> The top use case will be heap access dispatch inline cache, which is
>> not a call.
>> You can also use it to implement call inline caches, but that's not
>> the only thing you can use it for.
> Er, possibly I'm misunderstanding you.  To me, an inline call cache is
> a mechanism to optimize a dynamic call by adding a
> typecheck+directcall fastpath.

Inline caches don't have to be calls.  For example, in JavaScript, the
expression "o.f" is fully dynamic but usually does not result in a
call.  The inline cache - and hence patchpoint - for such an expression will not
have a call in the common case.

Similar things arise in other dynamic languages.  You can have inline caches for
arithmetic.  Or for array accesses.  Or for any other dynamic operation in your
language.

> (i.e. avoiding the dynamic dispatch logic in the common case)  I'm
> assuming this is what you mean with the term "call inline cache", but
> I have never heard of a "heap access dispatch inline cache".  I've
> done a google search and didn't find a definition.  Could you point
> me to a reference or provide a brief explanation?

Every JavaScript engine does it, and usually the term "inline cache"
in the context of JS engines implies dispatching on the shape of the object in
order to find the offset at which a field is located, rather than dispatching on
the class of an object to determine what method to call.
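
As a rough illustration of the shape-based dispatch described above, here is a minimal sketch in Python rather than machine code. The names `Shape`, `Obj`, and `PropertyInlineCache` are hypothetical and do not correspond to any real engine's API. A real engine would rewrite the machine code at the patchpoint, where this sketch only updates two fields.

```python
class Shape:
    """Hypothetical hidden class: maps field names to slot offsets."""

    def __init__(self, offsets):
        self.offsets = dict(offsets)


class Obj:
    """Hypothetical object: a shape plus a flat list of slots."""

    def __init__(self, shape, slots):
        self.shape = shape
        self.slots = list(slots)


class PropertyInlineCache:
    """One cache per `o.f` site; remembers the last (shape, offset) pair."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_offset = None
        self.misses = 0

    def get(self, obj):
        # Fast path: a shape check plus a direct load, with no call involved.
        if obj.shape is self.cached_shape:
            return obj.slots[self.cached_offset]
        # Slow path: full lookup, then "patch" the cache for the next access.
        self.misses += 1
        offset = obj.shape.offsets[self.name]
        self.cached_shape = obj.shape
        self.cached_offset = offset
        return obj.slots[offset]
```

Two objects that share a shape hit the fast path after the first access; an object with a different shape takes the slow path once and re-patches the cache.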

-Filip
> 
> Philip

Andrew Trick

2013-Oct-22 23:18 UTC

On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> I would argue that while the use of the stack maps might be
>> different, the mechanism is fairly similar.
>
> It's not at all similar.  These stackmaps are only useful for
> deoptimization, since the only way to make use of the live state
> information is to patch the stackmap with a jump to a deoptimization
> off-ramp.  You won't use these for a GC.
>
>> In general, if the expected semantics are the same, a shared
>> implementation would be desirable.  This is more a suggestion for
>> future refactoring than anything else.
>
> I think that these stackmaps and GC stackmaps are fairly different
> beasts.  While it's possible to unify the two, this isn't the intent
> here.  In particular, you can use these stackmaps for deoptimization
> without having to unwind the stack.

I think Philip R is asking a good question. To paraphrase: If we introduce a
generically named feature, shouldn’t it be generically useful? Stack maps are
used in other ways, and there are other kinds of patching. I agree and I think
these are intended to be generically useful features, but not necessarily
sufficient for every use.

The proposed stack maps are very different from LLVM’s gcroot because gcroot
does not provide stack maps! llvm.gcroot effectively designates a stack location
for each root for the duration of the current function, and forces the root to
be spilled to the stack at all call sites (the client needs to disable
StackColoring). This is really the opposite of a stack map and I’m not aware of
any functionality that can be shared. It also requires a C++ plugin to process
the roots. llvm.stackmap generates data in a section that MCJIT clients can
parse.

If someone wanted to use stack maps for GC, I don’t know why they wouldn’t
leverage llvm.stackmap. Maybe Filip can see a problem with this that I
can't. The runtime can add GC roots to the stack map just like other live
values, and it should know how to interpret the records. The intrinsic doesn’t
bake in any particular interpretation of the mapped values. That said, my
proposal deliberately does not cover GC. I think that stack maps are the easy
part of the problem. The hard problem is tracking interior pointers, or for that
matter exterior/out-of-bounds or swizzled pointers. LLVM’s machine IR simply
doesn’t have the necessary facilities for doing this. But if you don’t need a
moving collector, then you don’t need to track derived pointers as long as the
roots are kept live. In that case, llvm.stackmap might be a nice optimization
over llvm.gcroot.

Now with regard to patching. I think llvm.patchpoint is generally useful for any
type of patching I can imagine. It does look like a call site in IR, and it’s
nice to be able to leverage calling conventions to inform the location of
arguments. But the patchpoint does not have to be a call after patching, and you
can specify zero arguments to avoid using a calling convention. In fact, we only
currently emit a call out of convenience. We could splat nops in place and
assume the runtime will immediately find and patch all occurrences before the
code executes. In the future we may want to handle NULL call target, bypass call
emission, and allow the reserved bytes to be less than that required to emit a
call.
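
To make the "reserved bytes" idea concrete, here is a minimal sketch that models a code buffer as a Python `bytearray`. It is not how LLVM emits or a runtime patches real code: the helper names `emit_patchpoint` and `patch` are hypothetical, and the only machine-level fact assumed is that `0x90` is the single-byte x86 NOP.

```python
NOP = 0x90  # single-byte x86 NOP, used here only as filler


def emit_patchpoint(code, reserved_bytes):
    """Append `reserved_bytes` NOPs to `code` and return the start offset."""
    start = len(code)
    code.extend([NOP] * reserved_bytes)
    return start


def patch(code, start, reserved_bytes, new_bytes):
    """Overwrite the reserved region with `new_bytes`, padding with NOPs."""
    if len(new_bytes) > reserved_bytes:
        raise ValueError("patch does not fit in the reserved region")
    padding = bytes([NOP] * (reserved_bytes - len(new_bytes)))
    # Same-length slice assignment keeps the surrounding code in place.
    code[start:start + reserved_bytes] = bytes(new_bytes) + padding
```

The runtime is free to write anything that fits in the reserved region, which is why the patched sequence does not have to be a call.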

-Andy

Philip R

2013-Oct-23 00:25 UTC

On 10/22/13 3:08 PM, Filip Pizlo wrote:
> I think that these stackmaps and GC stackmaps are fairly different
> beasts.  While it's possible to unify the two, this isn't the intent
> here.  In particular, you can use these stackmaps for deoptimization
> without having to unwind the stack.

I'm going to respond to Andrew Trick's followup for this portion.

>
> Inline caches don't have to be calls.  For example, in JavaScript, the
> expression "o.f" is fully dynamic but usually does not result in a
> call.  The inline cache - and hence patchpoint - for such an
> expression will not have a call in the common case.
>
> Similar things arise in other dynamic languages.  You can have inline
> caches for arithmetic.  Or for array accesses.  Or for any other
> dynamic operation in your language.
>
>> (i.e. avoiding the dynamic dispatch logic in the common case)  I'm
>> assuming this is what you mean with the term "call inline cache",
>> but I have never heard of a "heap access dispatch inline cache".
>> I've done a google search and didn't find a definition.  Could you
>> point me to a reference or provide a brief explanation?
>
> Every JavaScript engine does it, and usually the term "inline cache"
> in the context of JS engines implies dispatching on the shape of the
> object in order to find the offset at which a field is located, rather
> than dispatching on the class of an object to determine what method to
> call.

Thank you for the clarification.  I am familiar with the patching
optimizations performed for property access, but had not been aware of
the modified usage of the term "inline cache".  I was also unaware of
the term "heap access dispatch inline cache".  I believe I now
understand your intent.

Taking a step back in the conversation, my original question was about
the naming of the patchpoint intrinsic.  I am now convinced that you
could use your patchpoint intrinsic for a number of different inline
caching schemes (method dispatch, property access, etc.).  Given that,
my concern about naming is diminished, but not completely eliminated.  I
don't really have a suggestion for a better name, but given that a
"stackmap" intrinsic can be patched, the "patchpoint" intrinsic name
doesn't seem particularly descriptive.  To put it another way, how are
the stackmap and patchpoint intrinsics different?  Can this difference
be encoded in a descriptive name for one or the other?

As a secondary point, it would be good to update the proposed 
documentation with a brief description of the intended usage (i.e. 
inline caching).  This might prevent a future developer from being 
confused on the same issues.

Yours,
Philip


Filip Pizlo

2013-Oct-23 01:23 UTC

On Oct 22, 2013, at 4:18 PM, Andrew Trick <atrick at apple.com> wrote:
> If someone wanted to use stack maps for GC, I don’t know why they
> wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with
> this that I can't.

You're right, it could work.

If you were happy with spilling all of your GC roots, then you could put them
into allocas and then pass the allocas' addresses to a stackmap.  This will
give you an FP (frame pointer) offset for each root.

If you were happy with an accurate GC that couldn't move objects referenced
from the stack then you could have each safepoint call use patchpoint, and then
if you also implemented stack unwinding, you could use the patchpoints'
implicit stackmaps to figure out which registers (or stack slots) contained
pointers.

These would be niche uses, I think.  If you care about performance then
you're not going to use an accurate GC that requires spilling roots;
you'll go for some GC algorithm that can handle conservative stack roots. 
If you're using accurate GC support for moving objects then it's usually
because you need to move *all* objects (after all you can move *most* objects
without any GC roots or stackmaps by using Bartlett's algorithm or similar)
so the calls-as-patchpoints approach won't work.

I could kind of see some real-time GCs using the alloca+stackmap approach,
but it's a bit of a stretch.

So, I don't see stackmaps as being particularly practical for accurate GC,
but I do concede that you *could* implement some kind of accurate GC that uses
stackmaps for some part of its stack scanning.

Philip R

2013-Oct-23 01:24 UTC

Adding Gael as someone who has previously discussed vmkit topics on the 
list.  Since I'm assuming this is where the GC support came from, I 
wanted to draw this conversation to the attention of someone more 
familiar with the LLVM implementation than myself.

On 10/22/13 4:18 PM, Andrew Trick wrote:
> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>> I think that these stackmaps and GC stackmaps are fairly different
>> beasts.  While it's possible to unify the two, this isn't the intent
>> here.  In particular, you can use these stackmaps for deoptimization
>> without having to unwind the stack.
>
> I think Philip R is asking a good question. To paraphrase: If we
> introduce a generically named feature, shouldn’t it be generically
> useful? Stack maps are used in other ways, and there are other kinds
> of patching. I agree and I think these are intended to be generically
> useful features, but not necessarily sufficient for every use.

Thank you for the restatement.  You summarized my view well.

> The proposed stack maps are very different from LLVM’s gcroot because
> gcroot does not provide stack maps! llvm.gcroot effectively designates
> a stack location for each root for the duration of the current
> function, and forces the root to be spilled to the stack at all call
> sites (the client needs to disable StackColoring). This is really the
> opposite of a stack map and I’m not aware of any functionality that
> can be shared. It also requires a C++ plugin to process the roots.
> llvm.stackmap generates data in a section that MCJIT clients can parse.

Er, I think we're talking past each other again.  Let me lay out my
current understanding of the terminology and existing infrastructure in
LLVM.  Please correct me where I go wrong.

stack map - A mapping from "values" to storage locations.  Storage
locations primarily take the form of registers or stack offsets, but
could in principle refer to other well-known locations (e.g. offsets
into thread local state).  A stack map is specific to a particular PC
and describes the state at that instruction only.
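
To make that definition concrete, here is a minimal sketch of such a record as a Python data structure. It models only the definition above, not the binary format that llvm.stackmap or GCMetadata actually emit. The names `InRegister`, `OnStack`, `StackMapRecord`, and `read_value` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class InRegister:
    name: str  # e.g. "rax"


@dataclass(frozen=True)
class OnStack:
    offset: int  # byte offset from the frame pointer


Location = Union[InRegister, OnStack]


@dataclass
class StackMapRecord:
    pc: int  # the one instruction address this record describes
    locations: Dict[str, Location]  # value name -> where it lives at `pc`


def read_value(record, value, registers, frame):
    """Fetch `value` from a captured register file or frame snapshot."""
    loc = record.locations[value]
    if isinstance(loc, InRegister):
        return registers[loc.name]
    return frame[loc.offset]
```

A runtime would keep one such record per interesting PC and consult it only while execution is stopped at exactly that PC.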

In a precise garbage collector, stack maps are used to ensure that the
stack can be understood by the collector.  When a stop-the-world
safepoint is reached, the collector needs to be able to identify any
pointers to heap objects which may exist on the stack.  This explicitly
includes both the frame which actually contains the safepoint and any
caller frames back to the root of the thread.  To accomplish this, a
stack map is generated at every call site and another is generated for
the safepoint itself.
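
The frame walk described above can be sketched as follows. This is a simplified model under stated assumptions: each frame is represented as a `(pc, slots)` pair, where `pc` is the safepoint or return address in that frame and `slots` maps stack offsets to values, and each stack map is just a list of offsets that hold heap pointers. The function name `collect_roots` is hypothetical.

```python
def collect_roots(frames, stack_maps):
    """Gather every heap pointer visible from a stopped thread's stack.

    frames:     list of (pc, slots), innermost frame first
    stack_maps: dict mapping pc -> offsets in that frame holding pointers
    """
    roots = []
    for pc, slots in frames:
        # The innermost frame is stopped at the safepoint itself; every
        # caller frame is stopped at a call site.  Both need a stack map.
        for offset in stack_maps[pc]:
            roots.append(slots[offset])
    return roots
```

A frame whose PC has no stack map raises `KeyError` in this sketch, which mirrors the requirement that a map exist at every call site on the stack.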

In LLVM currently, the GCStrategy records "safepoints" which are really
points at which stack maps need to be remembered.  (i.e. calls and
actual stop-the-world safepoints)  The GCMetadata mechanism gives a
generic way to emit the binary encoding of a stack map in a collector
specific way.  The current stack maps supported by this mechanism only
allow abstract locations on the stack which force all registers to be
spilled around "safepoints" (i.e. calls and stop-the-world safepoints).
Also, the set of roots (which are recorded in the stack map) must be
provided separately using the gcroot intrinsic.

In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request 
for a location with a stack map.  The SafePoints structure in 
GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage.  See 
llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp

Note: The summary of existing LLVM details above is based on reading the 
code.  I haven't actually implemented anything which used this mechanism 
yet.  As such, take it with a grain of salt.

In your change, you are adding a mechanism which is intended to enable
runtime calls and inline cache patching.  (Right?)  Your stack maps seem
to match the definition of a stack map I gave above and (I believe) the
implementation currently in LLVM.  The only difference might be that
your stack maps are partial (i.e. might not contain all "values" which
are live at a particular PC) and your implementation includes Register
locations which the current implementation in LLVM does not.  One other
possible difference: are you intending to include "values" which aren't
of pointer type?

Before moving on, am I interpreting your proposal and changes correctly?

Assuming I'm still correct so far, how might we combine these
implementations?  It looks like your implementation is much more mature
than what exists in tree at the moment.  One possibility would be to
express the needed GC stack maps in terms of your new infrastructure.
(i.e. convert a GCStrategy request for a safepoint into a StackMap (as
you've implemented it) with the list of explicit GC roots as its
arguments).  What would you think of this?

p.s. This discussion has gotten sufficiently abstract that it should in
no way block your plan to submit these changes.  I appreciate your
willingness to discuss.

> If someone wanted to use stack maps for GC, I don’t know why they
> wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with
> this that I can't. The runtime can add GC roots to the stack map just
> like other live values, and it should know how to interpret the
> records. The intrinsic doesn’t bake in any particular interpretation
> of the mapped values.

I think this is a restatement of my last paragraph above, which would
mean we're actually in agreement.

> That said, my proposal deliberately does not cover GC. I think that
> stack maps are the easy part of the problem. The hard problem is
> tracking interior pointers, or for that matter exterior/out-of-bounds
> or swizzled pointers. LLVM’s machine IR simply doesn’t have the
> necessary facilities for doing this. But if you don’t need a moving
> collector, then you don’t need to track derived pointers as long as
> the roots are kept live. In that case, llvm.stackmap might be a nice
> optimization over llvm.gcroot.

Oddly enough, I'll be raising the issue of how to go about supporting a
relocating collector on list shortly.  We've been looking into this
independently, but are at the point we'd like to get feedback from
others.  :)

> Now with regard to patching. I think llvm.patchpoint is generally
> useful for any type of patching I can imagine. It does look like a
> call site in IR, and it’s nice to be able to leverage calling
> conventions to inform the location of arguments.

Agreed.  My concern is mostly about naming and documentation of intended
usages.  Speaking as someone who's likely to be using this in the very
near future, I'd like to make sure I understand how you intend it to be
used.  The last thing I want to do is misconstrue your intent and become
reliant on a quirk of the implementation you later want to change.

> But the patchpoint does not have to be a call after patching, and you
> can specify zero arguments to avoid using a calling convention.

Er, not quite true.  Your calling convention also influences what
registers stay live across the call.  But in general, I see your point.

(Again, this is touching an area of LLVM I'm not particularly familiar
with.)

> In fact, we only currently emit a call out of convenience. We could
> splat nops in place and assume the runtime will immediately find and
> patch all occurrences before the code executes. In the future we may
> want to handle NULL call target, bypass call emission, and allow the
> reserved bytes to be less than that required to emit a call.

If you were to do that, how would the implementation be different than
the new stackmap intrinsic?  Does that difference imply a clarification
in intended usage or naming?

p.s. The naming discussion has gotten rather abstract and is starting to
feel like a "what color is the bikeshed" discussion. Feel free to just
tell me to go away at some point.  :)

Philip
