thr3ads.net - llvm dev - [llvm-dev] Proposal for llvm.experimental.gc intrinsics for inttoptr and ptrtoint [Oct 2019]


Jameson Nash via llvm-dev

2019-Oct-01 19:21 UTC

[llvm-dev] Proposal for llvm.experimental.gc intrinsics for inttoptr and ptrtoint

As a data point, Julia uses the following function declaration to implement
approximately the capability of those functions. We then also verify, with a
custom verifier pass (among other sanity checks), that there's no direct
inttoptr into, or ptrtoint out of, our gc-tracked AddressSpace. I can provide
additional details and pointers to our gc-root tracking algorithm
implementation if desired (I also plan to be at the llvm-devmtg). It'd be
great to know if there are opportunities for collaboration, or at least for
sharing insights and experiences!
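A rough model of the verifier check described above, written against a hypothetical toy instruction list rather than the LLVM API (the `Inst` struct, the `Op` enum, and the tracked address-space number are all assumptions for illustration). It flags an inttoptr that directly produces a tracked pointer, or a ptrtoint that directly consumes one:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's API): just enough to model casts between
// integers and pointers in a particular address space.
enum class Op { IntToPtr, PtrToInt, AddrSpaceCast, Other };

struct Inst {
    Op op;
    unsigned srcAddrSpace;  // address space of the pointer operand (0 if none)
    unsigned dstAddrSpace;  // address space of the pointer result (0 if none)
    std::string name;
};

// Assumption: one GC-tracked address space; the number is arbitrary here.
constexpr unsigned kTrackedAS = 10;

// Collects the names of instructions that cast directly between an integer
// and the tracked address space. An inttoptr into an untracked space (even
// if it is later addrspacecast into the tracked one) is not reported.
std::vector<std::string> findIllegalCasts(const std::vector<Inst>& insts) {
    std::vector<std::string> bad;
    for (const Inst& inst : insts) {
        bool intoTracked =
            inst.op == Op::IntToPtr && inst.dstAddrSpace == kTrackedAS;
        bool outOfTracked =
            inst.op == Op::PtrToInt && inst.srcAddrSpace == kTrackedAS;
        if (intoTracked || outOfTracked)
            bad.push_back(inst.name);
    }
    return bad;
}
```

Under this model, the "untracked -> inttoptr -> addrspacecast -> tracked" sequence discussed under llvm.experimental.gc.inttoptr is not flagged, because the inttoptr itself produces an untracked pointer.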


llvm.experimental.gc.ptrtoint:
    dropgcroot_type = FunctionType::get(PtrIntTy,
        makeArrayRef(PointerType::get(AddressSpace::Derived)), false);
    dropgcroot_func = Function::Create(dropgcroot_type,
        Function::ExternalLinkage, "julia.pointer_from_objref");
    dropgcroot_func->addFnAttr(Attribute::ReadNone);
    dropgcroot_func->addFnAttr(Attribute::NoUnwind);

    declare void* @"julia.pointer_from_objref"(void addrspace(2)*) readnone nounwind

(AddressSpace::Derived in the signature means it doesn't need to be valid
as a root itself, but needs to be traced back to locate the base object)


llvm.experimental.gc.inttoptr:
    This didn't need a custom function, since doing "untracked -> inttoptr
-> addrspacecast -> tracked" is considered a legal transform in Julia. We
later have an optimization pass that may see this and decide to weaken a
tracked object back into an untracked one (the root scanning pass can
similarly find that the base object is not tracked and ignore it).
Non-moving GC means we can do this for many values, including those loaded
from constants and arguments. In your case, this could also apply to
integers that needed to get cast to a pointer for the calling convention.
Note that the validity of introducing and allowing this can be pretty
subtle, since it implies that it may be impossible to "take back" a value
into the GC once it has released its gc root. This is true for several
reasons. First, we already can't guarantee that the object's lifetime is
appropriate once it has been hidden from the analysis passes (via the
ptrtoint); giving up that guarantee is what allows stronger optimizations
(stack promotion, early freeing, memory reuse, etc). Second, it may also be
true because of the suggested IntrNoMem annotation: this states that the
instruction has no side effects, but if you expect the value to resume being
tracked by the gc, that would imply these instructions do have some sort of
observable side effect on memory (possibly ReadOnly, as well as perhaps the
absence of nosync and nofree).
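The weakening and root-scanning idea above can be sketched as a walk from a value back to its base object: if the base turns out to be a pointer manufactured by inttoptr from an untracked integer, no root is needed. This is a toy model with hypothetical names (`Kind`, `Value`, `findBase`, `needsRoot`), not Julia's actual implementation, and it assumes such a value never has to be "taken back" into the GC, which is exactly the caveat raised above:

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy value graph (not LLVM's or Julia's data structures).
enum class Kind {
    Allocation,     // a GC-managed object: a genuine base that needs a root
    IntToPtr,       // pointer manufactured from an untracked integer
    AddrSpaceCast,  // cast into the tracked space; shares its operand's base
    Derived         // interior pointer (e.g. a field address); same base
};

struct Value {
    Kind kind;
    int operand;  // index of the source value, or -1 if there is none
};

// Walk back through casts and derived pointers to the base value.
int findBase(const std::vector<Value>& vals, int v) {
    while (vals[v].kind == Kind::AddrSpaceCast || vals[v].kind == Kind::Derived) {
        assert(vals[v].operand >= 0 && "a derived value needs an operand");
        v = vals[v].operand;
    }
    return v;
}

// A value needs a root only if its base is a real GC object; a base that
// came from inttoptr is treated as untracked and ignored by root scanning.
bool needsRoot(const std::vector<Value>& vals, int v) {
    return vals[findBase(vals, v)].kind != Kind::IntToPtr;
}
```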

On Mon, Sep 30, 2019 at 8:35 PM Sanjoy Das via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Adding some folks from Azul.
>
> On Mon, Sep 30, 2019 at 4:00 PM Jake Ehrlich via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi All,
>>
>> I'm working on a project converting Dart to llvm. Dart uses a relocating
>> GC but additionally uses pointer tagging: the lowest bit of a pointer is a
>> tag. For integers a '0' bit is used and for pointers to objects a '1' bit
>> is used. V8 apparently uses a similar technique. Generated code may need to
>> check which bit is set when this information isn't statically known.
>> Additionally, a function might have a parameter of a dynamic type, so a
>> caller might pass either an object or an integer for the same parameter,
>> meaning that this parameter must have a single type in the llvm IR.
>>
>> I'd like to make use of the existing llvm.experimental.gc.statepoint
>> intrinsics, but they strictly use non-integral pointer types. This is
>> required to prevent certain optimizations that conflict with finding
>> base pointers.
>>
>> After speaking about this (primarily with Sanjoy Das) and gathering the
>> set of issues involved, it seems it might be possible to resolve this by
>> adding two new intrinsics that mirror inttoptr and ptrtoint:
>> llvm.experimental.gc.inttoptr and llvm.experimental.gc.ptrtoint. These will
>> be opaque to all existing abstractions. An additional pass would be added
>> as well that would lower these versions of inttoptr and ptrtoint to their
>> standard forms. As long as this pass runs after the other optimizations,
>> it should in theory be safe. Some optimizations might still be safe to
>> perform after this point, but it isn't clear which ones would actually be
>> both useful and safe there. The user of such a pass is responsible for not
>> running it before any optimizations that might alter the representation
>> of a pointer in an invalid manner.
>>
>> So specifically the proposal is just the following:
>> 1) Add llvm.experimental.gc.inttoptr and llvm.experimental.gc.ptrtoint as
>> opaque "semanticless" intrinsic calls. They will be defined as `IntrNoMem`
>> operations since they won't ever be lowered to anything that may perform
>> any memory operations.
>>
>> 2) Add a pass LowerOpaqueIntegralPointerOps to perform the specified
>> lowering in order to allow these intrinsics to be compiled to code. Use of
>> these intrinsics without this lowering step will fail in code generation,
>> since these intrinsics will not participate in code generation.
>>
>> Does this seem like a sound approach? Does this seem like an acceptable
>> way forward to the community? What tweaks or alterations would people
>> prefer?
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
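To make the two-step proposal concrete, here is a minimal sketch of the intended pass ordering on a hypothetical toy instruction list (not LLVM's IR classes): the opaque casts survive optimization untouched, a late lowering step rewrites them to plain casts, and a stand-in for code generation rejects any that were never lowered. The function names only echo the proposed pass name and are assumptions:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical toy IR: the proposed opaque GC-aware casts are modeled as
// distinct opcodes that no optimization would look through.
enum class Op { GcIntToPtr, GcPtrToInt, IntToPtr, PtrToInt, Other };

// Late lowering step: rewrite each opaque cast to its plain counterpart.
// Meant to run only after the optimizations that could alter a pointer's
// representation in an invalid way.
void lowerOpaqueIntegralPointerOps(std::vector<Op>& insts) {
    for (Op& op : insts) {
        if (op == Op::GcIntToPtr)
            op = Op::IntToPtr;
        else if (op == Op::GcPtrToInt)
            op = Op::PtrToInt;
    }
}

// Stand-in for code generation: the opaque casts have no lowering of their
// own, so reaching one here is an error.
void emitCode(const std::vector<Op>& insts) {
    for (Op op : insts)
        if (op == Op::GcIntToPtr || op == Op::GcPtrToInt)
            throw std::runtime_error("unlowered gc cast reached codegen");
}
```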

Jake Ehrlich via llvm-dev

2019-Oct-01 19:37 UTC


[llvm-dev] Proposal for llvm.experimental.gc intrinsics for inttoptr and ptrtoint

Ah ok, and you then inline `julia.pointer_from_objref` at the end? I
suppose I could use that technique instead of introducing new intrinsics,
but it seems like we both have a use case for this.

> We later have an optimization pass that may see this and decide to weaken
> a tracked object back into an untracked one
>
Yeah, I was thinking about this. In my case the only such values that
aren't tracked are integers. As I understand it, the pass that adds
relocations won't need to relocate a value prior to passing it to a
function. The best way to handle this seems to be to keep values known to
be integers in an integer type as long as possible, and only convert back
to a pointer when passing them to a function. The goal should be to use the
integer type only on branch paths where I've checked that the pointer is an
integer (or have additional knowledge for other reasons). If I never use
the pointer value on branches after confirming the value is an integer,
then I should never need a relocation. Passing to a function requires
downcasting, of course, but I don't have to relocate for that kind of
downcast. That function would then have to perform the check itself.
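A small sketch of the strategy described in this reply, assuming the tagging scheme from the original proposal (low bit 0 for a small integer, 1 for an object pointer) and additionally assuming the integer payload is stored shifted left by one bit. `addOne` and `addOneSlow` are hypothetical helpers. On the branch where the tag check proves the value is an integer, the code works on a plain `std::intptr_t`, so nothing live there is a GC pointer that would need relocating:

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: the least-significant bit is 0 for a small integer
// (payload shifted left by one, an assumption) and 1 for an object pointer.
using Tagged = std::uintptr_t;

constexpr bool isSmallInt(Tagged v) { return (v & 1) == 0; }

constexpr Tagged tagInt(std::intptr_t n) {
    return static_cast<Tagged>(n) << 1;
}

constexpr std::intptr_t untagInt(Tagged v) {
    // Arithmetic right shift restores the sign (guaranteed since C++20 and
    // what mainstream compilers already did before that).
    return static_cast<std::intptr_t>(v) >> 1;
}

// Hypothetical slow path for heap objects. Placeholder: it returns its
// argument unchanged; a real runtime would dispatch on the object's class.
Tagged addOneSlow(Tagged obj) { return obj; }

Tagged addOne(Tagged v) {
    if (isSmallInt(v)) {
        // Proven integer: operate on a plain integer from here on.
        std::intptr_t n = untagInt(v);
        return tagInt(n + 1);
    }
    // Not proven: hand the still-tagged value to the callee, which must
    // perform its own check.
    return addOneSlow(v);
}
```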


Jameson Nash via llvm-dev

2019-Oct-01 20:59 UTC


[llvm-dev] Proposal for llvm.experimental.gc intrinsics for inttoptr and ptrtoint

Yes. After we do gc-root placement, the same pass usually needs to drop it,
along with all of the lifetime information (including other things like
invariant.load), before proceeding, to make sure late passes don't move
the gc-tracked objects around beyond their now-fixed lifetimes.

Aside: while checking what LLVM would do with some aspects of this
intrinsic, I stumbled across https://reviews.llvm.org/D59065, which added an
"llvm.ptrmask" intrinsic and may also be of interest (currently we use a
specific function, "julia.typeof", just for looking through the gc tagging
bits).

Right, we do that too in codegen. The optimization pass helps find
additional opportunities, such as a SelectInst fed by a (discovered)
constant condition, or a PHINode once all of its inputs have been inspected.
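The select/phi reasoning mentioned above can be modeled as a recursive query over a hypothetical toy value graph (not LLVM's `SelectInst`/`PHINode` API): a select with a known constant condition reduces to one arm, and a phi is untracked only if all of its incoming values are. Loop phis are handled optimistically via a visiting set, which is one common choice and an assumption here, not a description of Julia's pass:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy value graph (not LLVM's SelectInst/PHINode classes).
enum class Kind { Untracked, Tracked, Select, Phi };

struct Value {
    Kind kind;
    std::vector<int> ops;  // Select: {trueVal, falseVal}; Phi: incoming values
    int constCond;         // Select only: 1 or 0 if known constant, else -1
};

bool isUntrackedImpl(const std::vector<Value>& vals, int v,
                     std::set<int>& visiting) {
    const Value& val = vals[v];
    switch (val.kind) {
    case Kind::Untracked:
        return true;
    case Kind::Tracked:
        return false;
    case Kind::Select:
        assert(val.ops.size() == 2);
        if (val.constCond == 1) return isUntrackedImpl(vals, val.ops[0], visiting);
        if (val.constCond == 0) return isUntrackedImpl(vals, val.ops[1], visiting);
        // Unknown condition: both arms must be untracked.
        return isUntrackedImpl(vals, val.ops[0], visiting) &&
               isUntrackedImpl(vals, val.ops[1], visiting);
    case Kind::Phi:
        // A phi seen before in this query is either on a cycle (assumed
        // untracked) or already proven untracked; any tracked input found
        // anywhere makes the whole query return false.
        if (!visiting.insert(v).second) return true;
        for (int in : val.ops)
            if (!isUntrackedImpl(vals, in, visiting)) return false;
        return true;
    }
    return false;  // unreachable for valid Kind values
}

// True if value v provably never refers to a GC-tracked object.
bool isUntracked(const std::vector<Value>& vals, int v) {
    std::set<int> visiting;
    return isUntrackedImpl(vals, v, visiting);
}
```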


