thr3ads.net - llvm dev - [llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR [Jan 2022]

If this information is useful, please help other people find it:
Share via:

Jessica Clarke via llvm-dev

2022-Jan-06 00:58 UTC

[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR

On 5 Jan 2022, at 22:32, Augie Fackler via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> Hi everyone! I’m working on making the Rust compiler being able to track
LLVM HEAD more closely, and as part of that we need to obviate a patch[0] that
teaches LLVM about some Rust allocator implementation details. This proposal is
the product of many conversations and a couple of failed attempts at simpler
implementations.
> 
> Background
> =======> Rust uses LLVM for codegen, and has its own allocator
functions. In order for LLVM to correctly optimize out allocations we have to
tell the optimizer about the allocation/deallocation functions used by Rust.
> 
> Languages supported by Clang, such as C and C++, have stable symbol names
for their allocation functions, which are hardcoded in LLVM[1][2].
Unfortunately, this strategy does not work for Rust, where developers don't
want to commit to a particular symbol name and calling convention yet.
> 
> Proposal
> ======> We add two attributes to LLVM IR:
> 
>  * `allocator(FAMILY)`: Marks a function as part of an allocator family,
named by the “primary” allocation function (e.g. `allocator(“malloc”)`,
`allocator(“_Znwm”)`, or `allocator(“__rust_alloc”)`).
Why do you need a family? What’s insufficient about just using allocsize(idx),
as used by __attribute__((alloc_size(...)) in GNU C? (Which you acknowledge the
existence of, but don’t justify why you need your own attribute.) What to use as
the allocator “family” string for C++ operator new seems pretty unclear too, so
I’m not sure how good an idea this free-form string argument is in your
proposal.
>  * `releaseptr(idx)`: Indicates that the function releases the pointer that
is its Nth argument.
This should probably just be free(idx)/frees(idx)/willfree(idx) (we already have
nofree as an attribute for arguments), or an attribute on the argument itself.
Talking about releasing makes it sound like reference counting semantics.

Jess
> These attributes, combined with the existing `allocsize(n[, m])` attribute
lets us annotate alloc, realloc, and free type functions in LLVM IR, which
relieves Rust of the need to carry a patch to describe its allocator functions
to LLVM’s optimizer. Some example IR of what this might look like:
> 
> ; Function Attrs: nounwind ssp
> define i8* @test5(i32 %n) #4 {
> entry:
>   %0 = tail call noalias dereferenceable_or_null(20) i8* @malloc(i32 20) #8
>   %1 = load i8*, i8** @s, align 8
>   call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 1
dereferenceable(10) %0, i8* noundef nonnull align 1 dereferenceable(10) %1, i32
10, i1 false) #0
>   ret i8* %0
> }
> 
> attributes #8 = { nounwind allocsize(0)
"allocator"="malloc" }
> 
> Similarly, the call `free(foo)` would get the attributes
`”allocator”=”malloc” releaseptr(1)` and `realloc(foo, N)` gets
`”allocator”=”malloc” releaseptr(1) allocsize(1)`. Note that the `releaseptr(n)`
attribute is 1-indexed to avoid issues with storing zero values in attributes in
my current draft - I’m very open to suggestions to change that, this just seemed
like the right solution rather than adding getters/setters everywhere to
increment/decrement a value.
> 
> Benefits
> ======> In addition to the benefits for Rust, the LLVM optimizer could
also be improved to not optimize away defects like
> 
> {
>   auto *foo = new Thing();
>   free(foo);
> }
> 
> which would then correctly crash instead of silently “working” until
something actually uses the allocation. Similarly, there’s a potential defect
when only one side of an overridden operator::new and operator::delete is
visible to the optimizer and inlineable, which can look indistinguishable from
the above after inlining.
> 
> This also probably opens the door to fixing issues like
https://bugs.llvm.org/show_bug.cgi?id=49022 caused by overloading the `builtin`
annotation on allocator functions, but I’m unlikely to continue in that
direction.
> 
> What do people think?
> 
> Thanks,
> Augie
> 
> [0]
https://github.com/rust-lang/llvm-project/commit/b1f55f7159540862c407a2d89d49434ce65892e5
> [1]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L73
> [2]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L433
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Augie Fackler via llvm-dev

2022-Jan-06 15:45 UTC

head link

[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR

On Wed, Jan 5, 2022 at 7:58 PM Jessica Clarke <jrtc27 at jrtc27.com>
wrote:
> On 5 Jan 2022, at 22:32, Augie Fackler via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > Proposal
> > ======> > We add two attributes to LLVM IR:
> >
> >  * `allocator(FAMILY)`: Marks a function as part of an allocator
family,
> named by the “primary” allocation function (e.g. `allocator(“malloc”)`,
> `allocator(“_Znwm”)`, or `allocator(“__rust_alloc”)`).
>
> Why do you need a family? What’s insufficient about just using
> allocsize(idx), as used by __attribute__((alloc_size(...)) in GNU C? (Which
> you acknowledge the existence of, but don’t justify why you need your own
> attribute.) What to use as the allocator “family” string for C++ operator
> new seems pretty unclear too, so I’m not sure how good an idea this
> free-form string argument is in your proposal.
>
For C++ new we'd use the mangled form of ::operator::new, aka _Znwm. The
reason for the family is so that we can track which allocations are
related, so that incorrect code like

{
  auto *foo = new Foo();
  free(foo);
}

doesn't get optimized away. I believe others I talked to while designing
this might have had more interesting ideas for how the family could be
used, but it's easy to put in today, and roughly impossible to add via an
IR upgrade if we don't put it in, so it seems sensible to try and track the
family now. I actually initially argued against the family argument, but
was won over by the limitation that we'd be unable to upgrade it in the
future.

> >  * `releaseptr(idx)`: Indicates that the function releases the pointer
> that is its Nth argument.
>
> This should probably just be free(idx)/frees(idx)/willfree(idx) (we
> already have nofree as an attribute for arguments), or an attribute on the
> argument itself. Talking about releasing makes it sound like reference
> counting semantics.
>
Sure, that seems reasonable. I'll update my prototype to use that name
instead. Thanks!

>
> Jess
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20220106/23722615/attachment.html>

llvm dev - Jan 2022 - [RFC] Adding support for marking allocator functions in LLVM IR

[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR

[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR