Adding Gael as someone who has previously discussed vmkit topics on the list. Since I'm assuming this is where the GC support came from, I wanted to draw this conversation to the attention of someone more familiar with the LLVM implementation than myself.

On 10/22/13 4:18 PM, Andrew Trick wrote:
> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>> On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
>>
>>> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>>>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>>>>
>>>>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>>>>>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The first client of these features is the JavaScript compiler within the open source WebKit project.
>>>>>
>>>>> I have a couple of comments on your proposal. None of these are major enough to prevent submission.
>>>>>
>>>>> - As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
>>>>> - Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
>>>>> - To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
>>>> These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.
>>> I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
>>
>> It's not at all similar. These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp. You won't use these for a GC.
>>
>>> In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.
>>
>> I think that these stackmaps and GC stackmaps are fairly different beasts. While it's possible to unify the two, this isn't the intent here. In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
>
> I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.

Thank you for the restatement. You summarized my view well.

> The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.

Er, I think we're talking past each other again. Let me lay out my current understanding of the terminology and existing infrastructure in LLVM. Please correct me where I go wrong.

stack map - A mapping from "values" to storage locations. Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well-known locations (e.g. offsets into thread-local state). A stack map is specific to a particular PC and describes the state at that instruction only.

In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector. When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack. This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of the thread. To accomplish this, a stack map is generated at any call site and a stack map is generated for the safepoint itself.

In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered (i.e. calls and actual stop-the-world safepoints). The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector-specific way. The current stack maps supported by this mechanism only allow abstract locations on the stack, which forces all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints). Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic.

In code:
- GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map. The SafePoints structure in GCFunctionInfo contains a list of these locations.
- The Ocaml GC is probably the best example of usage. See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp

Note: The summary of existing LLVM details above is based on reading the code. I haven't actually implemented anything which used this mechanism yet. As such, take it with a grain of salt.

In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching. (Right?) Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM. The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations, which the current implementation in LLVM does not. One other possible difference: are you intending to include "values" which aren't of pointer type?

Before moving on, am I interpreting your proposal and changes correctly?

Assuming I'm still correct so far, how might we combine these implementations? It looks like your implementation is much more mature than what exists in tree at the moment. One possibility would be to express the needed GC stack maps in terms of your new infrastructure (i.e. convert a GCStrategy request for a safepoint into a StackMap, as you've implemented it, with the list of explicit GC roots as its arguments). What would you think of this?

p.s. This discussion has gotten sufficiently abstract that it should in no way block your plan to submit these changes. I appreciate your willingness to discuss.

> If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't. The runtime can add GC roots to the stack map just like other live values, and it should know how to interpret the records. The intrinsic doesn’t bake in any particular interpretation of the mapped values.

I think this is a restatement of my last paragraph above, which would mean we're actually in agreement.

> That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM’s machine IR simply doesn’t have the necessary facilities for doing this. But if you don’t need a moving collector, then you don’t need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.

Oddly enough, I'll be raising the issue of how to go about supporting a relocating collector on list shortly. We've been looking into this independently, but are at the point where we'd like to get feedback from others. :)

> Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments.

Agreed. My concern is mostly about naming and documentation of intended usages. Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used. The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.

> But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.

Er, not quite true. Your calling convention also influences what registers stay live across the call. But in general, I see your point. (Again, this is touching an area of LLVM I'm not particularly familiar with.)

> In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.

If you were to do that, how would the implementation be different than the new stackmap intrinsic? Does that difference imply a clarification in intended usage or naming?

p.s. The naming discussion has gotten rather abstract and is starting to feel like a "what color is the bikeshed" discussion. Feel free to just tell me to go away at some point. :)

Philip
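The stack map definition given in the message above (a per-PC mapping from "values" to register or stack-offset locations) can be illustrated with a minimal Python sketch. This is not LLVM code: the names `Location`, `StackMap`, `read_value`, and `live_values`, and the toy machine state made of a register dictionary and a frame dictionary, are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Location:
    kind: str                # "register" or "stack" (hypothetical encoding)
    where: Union[str, int]   # register name, or byte offset from the frame base


@dataclass
class StackMap:
    pc: int                      # the single instruction this map describes
    values: Dict[str, Location]  # value name -> where it lives at that PC


def read_value(loc: Location, registers: dict, frame: dict):
    """Fetch a value from the toy machine state given its recorded location."""
    if loc.kind == "register":
        return registers[loc.where]
    if loc.kind == "stack":
        return frame[loc.where]
    raise ValueError(f"unknown location kind: {loc.kind}")


def live_values(stack_maps: List[StackMap], pc: int, registers: dict, frame: dict) -> dict:
    """Return {value name: contents} for the map recorded at exactly `pc`."""
    by_pc = {m.pc: m for m in stack_maps}
    if pc not in by_pc:
        raise KeyError(f"no stack map recorded at pc={pc:#x}")
    return {name: read_value(loc, registers, frame)
            for name, loc in by_pc[pc].values.items()}
```

A map is only meaningful at the PC it was recorded for, which is why the lookup refuses any other address rather than guessing from a nearby entry.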
Andrew Trick
2013-Oct-23 05:48 UTC
[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal
I'll respond to a few questions below. I'll start a new thread for GC discussion.

On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
>> Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments.
> Agreed. My concern is mostly about naming and documentation of intended usages. Speaking as someone who's likely to be using this in the very near future, I'd like to make sure I understand how you intend it to be used. The last thing I want to do is misconstrue your intent and become reliant on a quirk of the implementation you later want to change.

I don't think the intrinsic names will be able to capture their semantics. I think that's why we need documentation, which I've been working on: http://llvm-reviews.chandlerc.com/D1981. For example, the "stackmap" intrinsic isn't really a stack map, it's something that allows generation of a stack map in which the entries don't actually need to be on the stack... confusing, but still a good name I think.

>> But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention.
> Er, not quite true. Your calling convention also influences what registers stay live across the call. But in general, I see your point.

You get around that by defining a new calling convention. Each patchpoint intrinsic call can be marked with a different calling convention if you choose. For example, we'll be adding a dynamic calling convention called AnyRegCC. You can use that to effectively specify the number of arguments that you want to force into registers. The stack map will tell you which registers were used for arguments. The "call" will preserve most registers, but clobbers one register (on x86) for use within the code. Another potential extension is to add an entry to the stackmap marking physical registers that are actually in use across the stack map or patch point.

It helps me to think of llvm.patchpoint as a replacement for any situation where a JIT would have otherwise needed inline asm to generate the desired code sequence.

> (Again, this is touching an area of LLVM I'm not particularly familiar with.)
>> In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.
> If you were to do that, how would the implementation be different than the new stackmap intrinsic? Does that difference imply a clarification in intended usage or naming?

The implementation of the two intrinsics is actually very similar. In this case, the difference would be that llvm.stackmap does not reserve space for patching, while llvm.patchpoint does. We could have defined different intrinsics for all variations of use cases, but I think two is the right number:

- Use llvm.stackmap if you just want a stack map. No code will be emitted. There is no calling convention. If the runtime patches the code here, it will be destructive.
- Use llvm.patchpoint if you want to reserve space for patching the code. When you do that, you can optionally specify a number of arguments that will follow a specified calling convention. You also get a stack map here because it can be useful to fuse the stack map to the patch point location. After all, the runtime needs to know where to patch.

-Andy
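The difference described above, that a patchpoint reserves bytes for later patching while patching at a plain stack map overwrites whatever code follows, can be sketched with a toy code buffer. This is a hedged illustration in Python, not how LLVM or any JIT actually emits code: `emit_patchpoint` and `patch` are hypothetical helpers, and the only fact borrowed from real hardware is that `0x90` is the single-byte x86 nop.

```python
NOP = 0x90  # single-byte x86 no-op


def emit_patchpoint(code: bytearray, reserved: int) -> int:
    """Append `reserved` nop bytes and return the offset where they start."""
    offset = len(code)
    code.extend(bytes([NOP]) * reserved)
    return offset


def patch(code: bytearray, offset: int, reserved: int, new_bytes: bytes) -> None:
    """Overwrite a reserved region in place, padding any remainder with nops."""
    if len(new_bytes) > reserved:
        # Writing past the reservation would clobber the following
        # instructions, which is the "destructive" case for a plain stack map.
        raise ValueError("patch does not fit in the reserved space")
    padded = new_bytes + bytes([NOP]) * (reserved - len(new_bytes))
    code[offset:offset + reserved] = padded
```

Because the replacement slice has exactly the reserved length, the instructions after the patch point keep their addresses, which is what makes this kind of patching non-destructive.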
Andrew Trick
2013-Oct-23 06:12 UTC
[LLVMdev] GC StackMaps (was Stackmap and Patchpoint Intrinsic Proposal)
I'm moving this to a different thread. I think the newly proposed intrinsic definitions and their current implementation are valuable regardless of how it gets tied into GC...

On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
> Er, I think we're talking past each other again. Let me lay out my current understanding of the terminology and existing infrastructure in LLVM. Please correct me where I go wrong.
>
> stack map - A mapping from "values" to storage locations. Storage locations primarily take the form of registers or stack offsets, but could in principle refer to other well-known locations (e.g. offsets into thread-local state). A stack map is specific to a particular PC and describes the state at that instruction only.
>
> In a precise garbage collector, stack maps are used to ensure that the stack can be understood by the collector. When a stop-the-world safepoint is reached, the collector needs to be able to identify any pointers to heap objects which may exist on the stack. This explicitly includes both the frame which actually contains the safepoint and any caller frames back to the root of the thread. To accomplish this, a stack map is generated at any call site and a stack map is generated for the safepoint itself.
>
> In LLVM currently, the GCStrategy records "safepoints" which are really points at which stack maps need to be remembered (i.e. calls and actual stop-the-world safepoints). The GCMetadata mechanism gives a generic way to emit the binary encoding of a stack map in a collector-specific way. The current stack maps supported by this mechanism only allow abstract locations on the stack, which forces all registers to be spilled around "safepoints" (i.e. calls and stop-the-world safepoints). Also, the set of roots (which are recorded in the stack map) must be provided separately using the gcroot intrinsic.
>
> In code:
> - GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for a location with a stack map. The SafePoints structure in GCFunctionInfo contains a list of these locations.
> - The Ocaml GC is probably the best example of usage. See llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
>
> Note: The summary of existing LLVM details above is based on reading the code. I haven't actually implemented anything which used this mechanism yet. As such, take it with a grain of salt.

That's an excellent description of stack maps, GCStrategy, and safepoints. Now let me explain how I see it.

GCStrategy provides layers of abstraction that allow plugins to specialize GC metadata. Conceptually, a plugin can generate what looks like stack map data to the collector. But there isn't any direct support in LLVM IR for the kind of stack maps that we need.

When I talk about adding stack map support, I'm really talking about support for mapping values to registers, where the set of values and their locations are specific to the "safepoint".

We're adding an underlying implementation of per-safepoint live values. There isn't a lot of abstraction built up around it, just a couple of intrinsics that directly expose the functionality.

We're also approaching the interface very differently. We're enabling an MCJIT client. The interface to the client is the stack map format.

> In your change, you are adding a mechanism which is intended to enable runtime calls and inline cache patching. (Right?) Your stack maps seem to match the definition of a stack map I gave above and (I believe) the implementation currently in LLVM. The only difference might be that your stack maps are partial (i.e. might not contain all "values" which are live at a particular PC) and your implementation includes Register locations, which the current implementation in LLVM does not. One other possible difference: are you intending to include "values" which aren't of pointer type?

Yes, the values will be of various types (although only 32/64 bit types are currently allowed because of DWARF register number weirdness). More importantly, our stack maps record locations of a specific set of values, which may be in registers, at a specific location. In fact, that, along with reserving space for code patching, is *all* we're doing. GCRoot doesn't do this at all. So there is effectively no overlap in implementation.

> Before moving on, am I interpreting your proposal and changes correctly?

Yes, except I don’t see a direct connection between the functionality we’re adding and “the implementation currently in LLVM”.

> Assuming I'm still correct so far, how might we combine these implementations? It looks like your implementation is much more mature than what exists in tree at the moment. One possibility would be to express the needed GC stack maps in terms of your new infrastructure (i.e. convert a GCStrategy request for a safepoint into a StackMap, as you've implemented it, with the list of explicit GC roots as its arguments). What would you think of this?

I can imagine someone wanting to leverage some of the new implementation without using it end-to-end as-is, although I'm not entirely sure what the motivation would be. For example:

- A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint calls at custom safepoints after determining GC root liveness at those points.
- Something like a GCStrategy could intercept our implementation of stack map generation and emit a custom format.

Keep in mind though that the format that LLVM emits does not need to be the format read by the collector. The JIT/runtime can parse LLVM's stack map data and encode it using its own data structures. That way, the JIT/runtime can change without customizing LLVM.

As far as hooking the new stack map support into the GCMetadata abstraction, I'm not sure how that would work. GCMachineCodeAnalysis is currently a standalone MI pass. We can't generate our stack maps here. Technically, a preEmitPass can come along later and reassign registers, invalidating the stack map. That's why we generate the maps during MC lowering.

So, currently, the new intrinsics are serving a different purpose than GCMetadata. I think someone working on GC support needs to be convinced that they really need the new stack map features. Then we can build something on top of the underlying functionality that works for them.

-Andy
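The remark above, that the runtime can parse the emitted stack map data and re-encode it in its own data structures, can be sketched as follows. The binary layout used here is entirely made up for illustration and is not the section format LLVM emits: a record header of (ID, code offset, location count) followed by (kind, register number, signed offset) triples. The helper names `encode` and `parse` are likewise hypothetical.

```python
import struct

# Hypothetical toy layout, NOT LLVM's real stack map section format.
RECORD_HEADER = struct.Struct("<IIH")  # id, code offset, number of locations
LOCATION = struct.Struct("<BHi")       # kind, register number, signed offset


def encode(records):
    """Serialize [(id, code_offset, [(kind, reg, off), ...]), ...] to bytes."""
    out = bytearray()
    for rec_id, code_offset, locs in records:
        out += RECORD_HEADER.pack(rec_id, code_offset, len(locs))
        for kind, reg, off in locs:
            out += LOCATION.pack(kind, reg, off)
    return bytes(out)


def parse(blob):
    """Decode the toy format into the runtime's own table: id -> (offset, locs)."""
    table = {}
    pos = 0
    while pos < len(blob):
        rec_id, code_offset, count = RECORD_HEADER.unpack_from(blob, pos)
        pos += RECORD_HEADER.size
        locs = []
        for _ in range(count):
            locs.append(LOCATION.unpack_from(blob, pos))
            pos += LOCATION.size
        table[rec_id] = (code_offset, locs)
    return table
```

Keeping the parser on the runtime side means the collector's internal table can change shape without touching the compiler, which is the decoupling the message argues for.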
Gaël Thomas
2013-Oct-23 08:17 UTC
[LLVMdev] GC StackMaps (was Stackmap and Patchpoint Intrinsic Proposal)
Hi all,

I don't know if I understand everything, but it seems really interesting for a runtime developer; stackmap and patchpoint look perfect for a lot of optimizations :) I just have a few questions to verify that I understand what these stackmaps and patchpoints are, and I discuss the GC afterwards.

* I have a first very simple scenario (useful in vmkit). Let's imagine that we want to lazily build the layout of an object at runtime, i.e., we don't know the layout of the object when we are emitting the code. And we want to access a field of this object identified by a symbol. If I understand correctly, we can use your stackmap to define the offset of this field and then patch the code that uses this offset? The machine code will look like mov offset(%rax), .., and the stackmap will generate a map that contains the location of "offset" in the code? If that's the case, it's perfect.

* Now, let's imagine that I want to lazily call a virtual method (aka a single dispatch call in Java). I have two problems. First, I have to know the offset of the method in the virtual table (just like for virtual calls in C++). Second, the entry in the table should contain a stub able to link/compile the target method at runtime. And the stub has to know which object has received the call (which drives the method resolution and the update of the virtual table). With stackmaps and patchpoints, I can imagine something like this (in pseudo-llvm without typing):

    %r0 = load %obj, 0 ; the virtual table is at offset 0
    %r1 = 0
    stackmap %r1, ID_OFFSET ; contains the offset of the target method in the virtual table
    %r2 = add %r1, %r0
    %r3 = load %r2
    patchpoint ID_CALL %r3, %obj, other parameters ; to find %obj in the stub

I should be able to:
- patch ID_OFFSET when I load the description of obj (before the call, when the object is allocated)
- use ID_CALL to know which object is the target of the call in order to find the appropriate method.

If that could be the case, your patchpoint and safepoint are very interesting for vmkit. We just need a function to retrieve, in the callee, which patchpoint we are coming from (probably by inspecting the stack, we can find the program pointer of the call site and then find the patchpoint descriptor?)

* Now, for the GC, if I understand correctly, instead of declaring a variable as a root, you can declare the safepoints explicitly by using patchpoints with something like

    patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the alloca (or registers) that contain objects

Then in suspendTheThreadForCollection, we can see that we are coming from safepoint_17 and then find the locations of the objects? If a patchpoint can work like this, it's probably a good building block for the GC. Currently, we have to declare the root objects with the root intrinsic, then add the appropriate safepoints (it's just a call to GCFunctionInfo.addSafePoint). As root objects are marked as roots, modifying GCFunctionInfo.addSafePoint to generate a patchpoint with all the GC roots as arguments (instead of using the current infrastructure) should not be difficult. And it probably means that the current GC infrastructure could use patchpoint as a backend. The only problem that I see is that all the objects will be transmitted as arguments to suspendTheThreadForCollection, which is maybe not the best way to do that. Probably, something like:

    safepoint ID_safepoint_17, list of alloca that contain objects
    patchpoint ID_safepoint_17, suspendTheThreadForCollection

should be better to avoid useless arguments?

See you,
Gaël

PS: just tell me if the code is already in the trunk, because I would like to see if these intrinsics can work for vmkit :)

2013/10/23 Andrew Trick <atrick at apple.com>:
> I'm moving this to a different thread. I think the newly proposed intrinsic definitions and their current implementation are valuable regardless of how it gets tied into GC...
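The safepoint lookup asked about in the message above (find the call site from the return address on the stack, then find the descriptor that lists where the objects live) can be sketched in Python. This is only a guess at one possible runtime-side design, not something the proposal specifies: `Frame`, `Descriptors`, and `collect_roots` are hypothetical names, and frames are modelled as plain dictionaries from frame offset to stored word.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    return_address: int    # address of the instruction after the call
    slots: Dict[int, int]  # frame offset -> stored word (toy stack frame)


# return address -> (safepoint ID, frame offsets that hold object pointers)
Descriptors = Dict[int, Tuple[int, List[int]]]


def collect_roots(frames: List[Frame], descriptors: Descriptors) -> List[Tuple[int, int]]:
    """Walk frames from innermost to outermost and gather (safepoint ID, pointer)."""
    roots = []
    for frame in frames:
        if frame.return_address not in descriptors:
            continue  # frame with no recorded safepoint, e.g. native code
        safepoint_id, offsets = descriptors[frame.return_address]
        for off in offsets:
            roots.append((safepoint_id, frame.slots[off]))
    return roots
```

In this model the roots never need to be passed as call arguments; the callee recovers them from the descriptor keyed by the return address, which is the property the second safepoint/patchpoint form in the message is after.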
More importantly, our stack maps record locations of a > specific set of values, which may be in registers, at a specific > location. In fact, that, along with reserving space for code patching, > is *all* we're doing. GCRoot doesn't do this at all. So there is > effectively no overlap in implementation. > > > Before moving on, am I interpreting your proposal and changes correctly? > > > Yes, except I don’t see a direct connection between the functionality we’re > adding and “the implementation currently in LLVM”. > > Assuming I'm still correct so far, how might we combine these > implementations? It looks like your implementation is much more mature than > what exists in tree at the moment. One possibility would be to express the > needed GC stack maps in terms of your new infrastructure. (i.e. convert a > GCStrategy request for a safepoint into a StackMap (as you've implemented > it) with the list of explicit GC roots as it's arguments). What would you > think of this? > > > I can imagine someone wanting to leverage some of the new > implementation without using it end-to-end as-is. Although I'm not > entirely sure what the motivation would be. For example: > > - A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint > calls at custom safepoints after determining GC root liveness at > those points. > > - Something like a GCStrategy could intercept our implementation of > stack map generation and emit a custom format. Keep in mind though > that the format that LLVM emits does not need to be the format read > by the collector. The JIT/runtime can parse LLVM's stack map data > and encode it using it's own data structures. That way, the > JIT/runtime can change without customizing LLVM. > > As far as hooking the new stack map support into the GCMetaData > abstraction, I'm not sure how that would work. GCMachineCodeAnalysis > is currently a standalone MI pass. We can't generate our stack maps > here. 
Technically, a preEmitPass can come along later and reassign > registers invalidating the stack map. That's why we generate the maps > during MC lowering. > > So, currently, the new intrinsics are serving a different purpose than > GCMetaData. I think someone working on GC support needs to be > convinced that they really need the new stack map features. Then we > can build something on top of the underlying functionality that works > for them. > > -Andy-- ------------------------------------------------------------------- Gaël Thomas, Associate Professor, UPMC http://pagesperso-systeme.lip6.fr/Gael.Thomas/ -------------------------------------------------------------------
Philip Reames
2013-Oct-24 00:27 UTC
[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal
On 10/22/13 10:48 PM, Andrew Trick wrote:
> I'll respond to a few questions below. I'll start a new thread for GC
> discussion.

Good idea. Thanks.

> On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
>
>>> Now with regard to patching. I think llvm.patchpoint is generally
>>> useful for any type of patching I can imagine. It does look like a
>>> call site in IR, and it’s nice to be able to leverage calling
>>> conventions to inform the location of arguments.
>> Agreed. My concern is mostly about naming and documentation of
>> intended usages. Speaking as someone who's likely to be using this in
>> the very near future, I'd like to make sure I understand how you
>> intend it to be used. The last thing I want to do is misconstrue
>> your intent and become reliant on a quirk of the implementation you
>> later want to change.
>
> I don't think the intrinsic names will be able to capture their
> semantics. I think that's why we need documentation, which I've been
> working on: http://llvm-reviews.chandlerc.com/D1981.
>
> For example, the "stackmap" intrinsic isn't really a stack map, it's
> something that allows generation of a stack map in which the entries
> don't actually need to be on the stack... confusing, but still a good
> name I think.

"stack map" is also a fairly well understood term in the GC/compiler
world. It's better to stick with well known terminology where possible.

As for naming vs documentation, I tend to believe that naming should be
as descriptive as is reasonable. Having said that, we're well past the
point of adding value with this discussion. Please update the
documentation to reflect some of the clarifications on usage that have
come up here and let's move on.

>>> But the patchpoint does not have to be a call after patching, and
>>> you can specify zero arguments to avoid using a calling convention.
>> Er, not quite true. Your calling convention also influences what
>> registers stay live across the call. But in general, I see your point.
>
> You get around that by defining a new calling convention. Each
> patchpoint intrinsic call can be marked with a different calling
> convention if you choose. For example, we'll be adding a dynamic
> calling convention called AnyRegCC. You can use that to effectively
> specify the number of arguments that you want to force into registers.
> The stack map will tell you which registers were used for arguments.
> The "call" preserves most registers, but clobbers one register
> (on x86) for use within the code.

Nice trick. I'll have to remember that.

> Another potential extension is to add an entry to the stackmap marking
> physical registers that are actually in-use across the stack map or
> patch point.
>
> It helps me to think of llvm.patchpoint as a replacement for any
> situation where a JIT would have otherwise needed inline asm to
> generate the desired code sequence.
>
>> (Again, this is touching an area of LLVM I'm not particularly
>> familiar with.)
>>> In fact, we only currently emit a call out of convenience. We could
>>> splat nops in place and assume the runtime will immediately find and
>>> patch all occurrences before the code executes. In the future we may
>>> want to handle NULL call target, bypass call emission, and allow the
>>> reserved bytes to be less than that required to emit a call.
>> If you were to do that, how would the implementation be different
>> than the new stackmap intrinsic? Does that difference imply a
>> clarification in intended usage or naming?
>
> The implementation of the two intrinsics is actually very similar. In
> this case, the difference would be that llvm.stackmap does not reserve
> space for patching, while llvm.patchpoint does.

I'm slightly confused by this given that stackmap takes an argument
indicating the number of nops to emit as well, but it's not worth
debating this any more. Let's move on. We can revisit this once I'm
actually using the new intrinsics and can provide real concrete feedback.

> We could have defined different intrinsics for all variations of use
> cases, but I think two is the right number:
>
> - Use llvm.stackmap if you just want a stack map. No code will be
>   emitted. There is no calling convention. If the runtime patches the
>   code here, it will be destructive.
>
> - Use llvm.patchpoint if you want to reserve space for patching the
>   code. When you do that, you can optionally specify a number of
>   arguments that will follow a specified calling convention. You also
>   get a stack map here because it can be useful to fuse the stack map to
>   the patch point location. After all, the runtime needs to know where
>   to patch.

This summary is really helpful. This summary and the other points
you've made in our discussion are exactly what should be in the
documentation. They provide the thought process behind the design and
the intended usage scenarios. Can you add a section with this meta
information?

Yours,
Philip

p.s. Thank you both for taking the time to hash this through.
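The two-bullet summary above can be made concrete with a small toy model. This is only a sketch under stated assumptions: `ToyEmitter`, `SiteRecord`, and `patch` are hypothetical names invented for illustration and are not LLVM APIs, the "code" is an arbitrary byte buffer, and the model follows the summary literally (a stackmap site reserves no bytes, a patchpoint site reserves a requested number of nops) rather than the real intrinsic signatures.

```python
from dataclasses import dataclass, field

NOP = 0x90  # x86 one-byte nop, used here purely as filler


@dataclass
class SiteRecord:
    """Hypothetical record kept for each stackmap/patchpoint site."""
    site_id: int
    code_offset: int      # where the site sits in the code buffer
    reserved_bytes: int   # 0 for a stackmap, N for a patchpoint
    locations: list       # where the live values are (registers, stack slots)


@dataclass
class ToyEmitter:
    code: bytearray = field(default_factory=bytearray)
    records: list = field(default_factory=list)

    def emit(self, raw: bytes) -> None:
        self.code += raw

    def stackmap(self, site_id: int, locations: list) -> None:
        # Record locations only; no bytes are added to the code buffer.
        self.records.append(
            SiteRecord(site_id, len(self.code), 0, list(locations)))

    def patchpoint(self, site_id: int, num_bytes: int, locations: list) -> None:
        # Record locations and reserve num_bytes of nops for later patching.
        self.records.append(
            SiteRecord(site_id, len(self.code), num_bytes, list(locations)))
        self.code += bytes([NOP]) * num_bytes


def patch(code: bytearray, record: SiteRecord, new_bytes: bytes) -> bool:
    """Overwrite code at the recorded offset.

    Returns True if the patch fit inside the reserved bytes, False if it
    overwrote bytes belonging to later instructions (a destructive patch).
    """
    start, end = record.code_offset, record.code_offset + len(new_bytes)
    if end > len(code):
        raise ValueError("patch runs past the end of the code buffer")
    code[start:end] = new_bytes
    return len(new_bytes) <= record.reserved_bytes


em = ToyEmitter()
em.emit(b"\x01\x02")
em.stackmap(1, ["rax", "rbp-8"])       # no code emitted here
em.emit(b"\x03\x04\x05\x06\x07")
em.patchpoint(2, 5, ["rdi"])           # five nops reserved here
em.emit(b"\x08")

jump = b"\xe9\x00\x00\x00\x00"         # stand-in for a 5-byte jump
fits_patchpoint = patch(em.code, em.records[1], jump)  # lands in the nops
fits_stackmap = patch(em.code, em.records[0], jump)    # clobbers real code
```

In this model the patch at the patchpoint site lands entirely in reserved nops, while the same patch at the stackmap site overwrites the five bytes that followed it, which is the sense in which patching at a stackmap is destructive.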
Philip Reames
2013-Oct-24 01:23 UTC
[LLVMdev] GC StackMaps (was Stackmap and Patchpoint Intrinsic Proposal)
On 10/22/13 11:12 PM, Andrew Trick wrote:
> I'm moving this to a different thread. I think the newly proposed
> intrinsic definitions and their current implementation are valuable
> regardless of how it gets tied into GC...

Agreed. As Gaël said, I'm looking forward to being able to play with
this in tree. :)

> On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
>
>> Adding Gael as someone who has previously discussed vmkit topics on
>> the list. Since I'm assuming this is where the GC support came from,
>> I wanted to draw this conversation to the attention of someone more
>> familiar with the LLVM implementation than myself.
>>
>> On 10/22/13 4:18 PM, Andrew Trick wrote:
>>> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>
>>>> On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
>>>>
>>>>> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>>>>>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>>>>>>
>>>>>>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>>>>>>>> This is a proposal for adding Stackmaps and Patchpoints to
>>>>>>>> LLVM. The first client of these features is the JavaScript
>>>>>>>> compiler within the open source WebKit project.
>>>>>>>>
>>>>>>> I have a couple of comments on your proposal. None of these are
>>>>>>> major enough to prevent submission.
>>>>>>>
>>>>>>> - As others have said, I'd prefer an experimental namespace
>>>>>>>   rather than a webkit namespace. (minor)
>>>>>>> - Unless I am misreading your proposal, your proposed StackMap
>>>>>>>   intrinsic duplicates existing functionality already in llvm. In
>>>>>>>   particular, much of the StackMap construction seems similar to
>>>>>>>   the Safepoint mechanism used by the in-tree GC support. (See
>>>>>>>   CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you
>>>>>>>   examined these mechanisms to see if you can share
>>>>>>>   implementations?
>>>>>>> - To my knowledge, there is nothing that prevents an LLVM
>>>>>>>   optimization pass from manufacturing new pointers which point
>>>>>>>   inside an existing data structure. (e.g. an interior pointer to
>>>>>>>   an array when blocking a loop) Does your StackMap mechanism
>>>>>>>   need to be able to inspect/modify these manufactured
>>>>>>>   temporaries? If so, I don't see how you could generate an
>>>>>>>   intrinsic which would include this manufactured pointer in the
>>>>>>>   live variable list. Is there something I'm missing here?
>>>>>> These stackmaps have nothing to do with GC. Interior pointers
>>>>>> are a problem unique to precise copying collectors.
>>>>> I would argue that while the use of the stack maps might be
>>>>> different, the mechanism is fairly similar.
>>>>
>>>> It's not at all similar. These stackmaps are only useful for
>>>> deoptimization, since the only way to make use of the live state
>>>> information is to patch the stackmap with a jump to a
>>>> deoptimization off-ramp. You won't use these for a GC.
>>>>
>>>>> In general, if the expected semantics are the same, a shared
>>>>> implementation would be desirable. This is more a suggestion for
>>>>> future refactoring than anything else.
>>>>
>>>> I think that these stackmaps and GC stackmaps are fairly different
>>>> beasts. While it's possible to unify the two, this isn't the
>>>> intent here. In particular, you can use these stackmaps for
>>>> deoptimization without having to unwind the stack.
>>>
>>> I think Philip R is asking a good question. To paraphrase: If we
>>> introduce a generically named feature, shouldn’t it be generically
>>> useful? Stack maps are used in other ways, and there are other kinds
>>> of patching. I agree and I think these are intended to be
>>> generically useful features, but not necessarily sufficient for
>>> every use.
>> Thank you for the restatement. You summarized my view well.
>>>
>>> The proposed stack maps are very different from LLVM’s gcroot
>>> because gcroot does not provide stack maps! llvm.gcroot effectively
>>> designates a stack location for each root for the duration of the
>>> current function, and forces the root to be spilled to the stack at
>>> all call sites (the client needs to disable StackColoring). This is
>>> really the opposite of a stack map and I’m not aware of any
>>> functionality that can be shared. It also requires a C++ plugin to
>>> process the roots. llvm.stackmap generates data in a section that
>>> MCJIT clients can parse.
>> Er, I think we're talking past each other again. Let me lay out my
>> current understanding of the terminology and existing infrastructure
>> in LLVM. Please correct me where I go wrong.
>>
>> stack map - A mapping from "values" to storage locations. Storage
>> locations primarily take the form of registers or stack offsets, but
>> could in principle refer to other well known locations (i.e. offsets
>> into thread local state). A stack map is specific to a particular PC
>> and describes the state at that instruction only.
>>
>> In a precise garbage collector, stack maps are used to ensure that
>> the stack can be understood by the collector. When a stop-the-world
>> safepoint is reached, the collector needs to be able to identify any
>> pointers to heap objects which may exist on the stack. This
>> explicitly includes both the frame which actually contains the
>> safepoint and any caller frames back to the root of the thread. To
>> accomplish this, a stack map is generated at any call site and a
>> stack map is generated for the safepoint itself.
>>
>> In LLVM currently, the GCStrategy records "safepoints" which are
>> really points at which stack maps need to be remembered. (i.e. calls
>> and actual stop-the-world safepoints) The GCMetadata mechanism gives
>> a generic way to emit the binary encoding of a stack map in a
>> collector specific way. The current stack maps supported by this
>> mechanism only allow abstract locations on the stack which force all
>> registers to be spilled around "safepoints" (i.e. calls and
>> stop-the-world safepoints). Also, the set of roots (which are
>> recorded in the stack map) must be provided separately using the
>> gcroot intrinsic.
>>
>> In code:
>> - GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a
>>   request for a location with a stack map. The SafePoints structure in
>>   GCFunctionInfo contains a list of these locations.
>> - The Ocaml GC is probably the best example of usage. See
>>   llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
>>
>> Note: The summary of existing LLVM details above is based on reading
>> the code. I haven't actually implemented anything which used this
>> mechanism yet. As such, take it with a grain of salt.
>
> That's an excellent description of stack maps, GCStrategy, and
> safepoints. Now let me explain how I see it.
>
> GCStrategy provides layers of abstraction that allow plugins to
> specialize GC metadata. Conceptually, a plugin can generate what looks
> like stack map data to the collector. But there isn't any direct
> support in LLVM IR for the kind of stack maps that we need.
>
> When I talk about adding stack map support, I'm really talking about
> support for mapping values to registers, where the set of values and
> their locations are specific to the "safepoint".
>
> We're adding an underlying implementation of per-safepoint live
> values. There isn't a lot of abstraction built up around it. Just a
> couple of intrinsics that directly expose the functionality.
>
> We're also approaching the interface very differently. We're enabling
> an MCJIT client. The interface to the client is the stack map format.

For the record, I actually prefer your approach to the interface. :)

>> In your change, you are adding a mechanism which is intended to
>> enable runtime calls and inline cache patching. (Right?) Your stack
>> maps seem to match the definition of a stack map I gave above and (I
>> believe) the implementation currently in LLVM. The only difference
>> might be that your stack maps are partial (i.e. might not contain all
>> "values" which are live at a particular PC) and your implementation
>> includes Register locations which the current implementation in LLVM
>> does not. One other possible difference: are you intending to
>> include "values" which aren't of pointer type?
>
> Yes, the values will be of various types (although only 32/64 bit
> types are currently allowed because of DWARF register number
> weirdness). More importantly, our stack maps record locations of a
> specific set of values, which may be in registers, at a specific
> location.

The fact that you're interested in more than information about which
locations contain pointers into the heap is the key point here. Your
stack map is actually slightly more general than the form used by a
garbage collector. For example, your mechanism allows you to describe
where the iteration variable ("int i") in a loop lives. This is not
something a stack map (in the sense I've been using it to refer to GC
usage) would enable.

> In fact, that, along with reserving space for code patching,
> is *all* we're doing. GCRoot doesn't do this at all. So there is
> effectively no overlap in implementation.

I've actually come around to agreeing with you. I think you're slightly
misunderstanding the role of "safepoints" and "gcroot" in the current
implementation, but the fact that your mechanism is significantly more
general than standard GC stackmaps (which LLVM implements currently) is
a strong reason for having them as a separate implementation. (For
performance reasons, a GC framework would probably want to use more
concise stack maps which only encode pointer roots.)

>> Before moving on, am I interpreting your proposal and changes correctly?
>
> Yes, except I don’t see a direct connection between the functionality
> we’re adding and “the implementation currently in LLVM”.
>
>> Assuming I'm still correct so far, how might we combine these
>> implementations? It looks like your implementation is much more
>> mature than what exists in tree at the moment. One possibility would
>> be to express the needed GC stack maps in terms of your new
>> infrastructure. (i.e. convert a GCStrategy request for a safepoint
>> into a StackMap (as you've implemented it) with the list of explicit
>> GC roots as its arguments). What would you think of this?
>
> I can imagine someone wanting to leverage some of the new
> implementation without using it end-to-end as-is. Although I'm not
> entirely sure what the motivation would be. For example:
>
> - A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint
>   calls at custom safepoints after determining GC root liveness at
>   those points.
>
> - Something like a GCStrategy could intercept our implementation of
>   stack map generation and emit a custom format. Keep in mind though
>   that the format that LLVM emits does not need to be the format read
>   by the collector. The JIT/runtime can parse LLVM's stack map data
>   and encode it using its own data structures. That way, the
>   JIT/runtime can change without customizing LLVM.

I think this is a very good point. Alternately, you could frame your
encoding as being the default representation provided by LLVM and
provide a plugin mechanism to modify it. (Not proposing this should
actually be done at the moment. This would be by demand only.)

> As far as hooking the new stack map support into the GCMetadata
> abstraction, I'm not sure how that would work. GCMachineCodeAnalysis
> is currently a standalone MI pass. We can't generate our stack maps
> here. Technically, a preEmitPass can come along later and reassign
> registers, invalidating the stack map. That's why we generate the maps
> during MC lowering.

I agree. I think this is actually a problem with the existing
implementation as well. It gets around it by (I believe) forcing all
roots to the stack when a stack map is needed.

> So, currently, the new intrinsics are serving a different purpose than
> GCMetadata. I think someone working on GC support needs to be
> convinced that they really need the new stack map features. Then we
> can build something on top of the underlying functionality that works
> for them.
>
> -Andy

Just to note, the person working on the GC support is very likely to be
me (or one of my coworkers) in the near future. That's why I've been so
interested in your changes. :)

As background, we're investigating using LLVM as a JIT compiler for a VM
which uses a precise relocating collector. The existing collector
support appears problematic with regards to a relocating collector and
we're investigating approaches to enhance it. My coworker will be
opening another thread on that topic in the next few days.

Philip
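The distinction drawn in this message, between a general per-PC map of live values and the narrower pointer-only map a collector wants, can be sketched as a data structure. This is a minimal illustration, not LLVM's stack map format: `Location`, `LiveValue`, `StackMaps`, and `gc_roots` are hypothetical names, and the PC, register, and offset values are made up. It also shows the point made about re-encoding: a runtime could read the general map and derive its own, more concise structure from it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    kind: str      # "register" or "stack"
    where: str     # register name, or frame-relative slot such as "fp-16"


@dataclass(frozen=True)
class LiveValue:
    name: str
    is_heap_pointer: bool
    location: Location


# One entry per PC; each entry describes the state at that instruction only.
StackMaps = Dict[int, List[LiveValue]]


def gc_roots(stack_maps: StackMaps, pc: int) -> List[Location]:
    """Derive the pointer-only view a collector would need at `pc`."""
    return [v.location for v in stack_maps.get(pc, []) if v.is_heap_pointer]


maps: StackMaps = {
    0x40: [
        LiveValue("obj", True, Location("register", "rbx")),
        LiveValue("i", False, Location("stack", "fp-8")),    # loop counter
        LiveValue("arr", True, Location("stack", "fp-16")),
    ],
}

# The general map records all three values; the derived GC view drops "i".
roots = gc_roots(maps, 0x40)
```

The general map is what a deoptimization client would consume, since it needs the loop counter as well as the pointers. The filtered list is the smaller structure a collector could keep instead.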