thr3ads.net - llvm dev - [llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR [Jan 2022]

Augie Fackler via llvm-dev

2022-Jan-05 22:32 UTC

[llvm-dev] [RFC] Adding support for marking allocator functions in LLVM IR

Hi everyone! I’m working on making the Rust compiler able to track
LLVM HEAD more closely, and as part of that we need to obviate a patch[0]
that teaches LLVM about some Rust allocator implementation details. This
proposal is the product of many conversations and a couple of failed
attempts at simpler implementations.

Background
==========
Rust uses LLVM for codegen, and has its own allocator functions. In order
for LLVM to correctly optimize out allocations we have to tell the
optimizer about the allocation/deallocation functions used by Rust.

Languages supported by Clang, such as C and C++, have stable symbol names
for their allocation functions, which are hardcoded in LLVM[1][2].
Unfortunately, this strategy does not work for Rust, where developers don't
want to commit to a particular symbol name and calling convention yet.

Proposal
========
We add two attributes to LLVM IR:

 * `allocator(FAMILY)`: Marks a function as part of an allocator family,
named by the “primary” allocation function (e.g. `allocator("malloc")`,
`allocator("_Znwm")`, or `allocator("__rust_alloc")`).

 * `releaseptr(idx)`: Indicates that the function releases the pointer that
is its Nth argument.

These attributes, combined with the existing `allocsize(n[, m])` attribute,
let us annotate alloc-, realloc-, and free-type functions in LLVM IR, which
relieves Rust of the need to carry a patch to describe its allocator
functions to LLVM’s optimizer. Some example IR of what this might look like:

; Function Attrs: nounwind ssp
define i8* @test5(i32 %n) #4 {
entry:
  %0 = tail call noalias dereferenceable_or_null(20) i8* @malloc(i32 20) #8
  %1 = load i8*, i8** @s, align 8
  call void @llvm.memcpy.p0i8.p0i8.i32(
      i8* noundef nonnull align 1 dereferenceable(10) %0,
      i8* noundef nonnull align 1 dereferenceable(10) %1,
      i32 10, i1 false) #0
  ret i8* %0
}

attributes #8 = { nounwind allocsize(0) "allocator"="malloc" }

Similarly, the call `free(foo)` would get the attributes
`"allocator"="malloc" releaseptr(1)` and `realloc(foo, N)` gets
`"allocator"="malloc" releaseptr(1) allocsize(1)`. Note that the
`releaseptr(n)` attribute is 1-indexed to avoid issues with storing zero
values in attributes in my current draft. I’m very open to suggestions to
change that; this just seemed like the right solution rather than adding
getters/setters everywhere to increment/decrement a value.
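
To make the 1-indexing concrete, here is a minimal C++ sketch of how the three call shapes above could be described. The `AllocAttrs` struct, its field names, and the encode/decode helpers are hypothetical and are not LLVM's attribute API. The only assumption carried over from the text is that a stored value of 0 means "attribute absent", which is why the index is offset by one.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the attributes on one call site; not LLVM's API.
struct AllocAttrs {
    std::string family;   // value of "allocator"="..."
    unsigned releaseptr;  // 1-indexed argument number; 0 means "absent"
    int allocsize;        // 0-indexed size argument; -1 means "absent"
};

// Encode a 0-based argument index as the stored 1-based value.
unsigned encodeReleasePtr(unsigned argIndex) { return argIndex + 1; }

// Decode a stored value back to a 0-based argument index.
unsigned decodeReleasePtr(unsigned stored) {
    assert(stored != 0 && "0 is reserved for 'attribute absent'");
    return stored - 1;
}

// The three call shapes described above.
AllocAttrs mallocAttrs()  { return {"malloc", 0, 0}; }
AllocAttrs freeAttrs()    { return {"malloc", encodeReleasePtr(0), -1}; }
AllocAttrs reallocAttrs() { return {"malloc", encodeReleasePtr(0), 1}; }
```

A 0-indexed `releaseptr` would drop the two helper functions, at the cost of needing some other way to represent "absent".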

Benefits
========
In addition to the benefits for Rust, the LLVM optimizer could also be
improved to not optimize away defects like

{
  auto *foo = new Thing();
  free(foo);
}

which would then correctly crash instead of silently “working” until
something actually uses the allocation. Similarly, there’s a potential
defect when only one side of an overridden operator new and
operator delete is visible to the optimizer and inlineable, which can look
indistinguishable from the above after inlining.
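
As a rough illustration of the pairing check this would enable, the sketch below compares the family names of an allocating call and a releasing call before treating them as a removable pair. `AllocCall` and `canElidePair` are hypothetical names invented for this example; they do not correspond to anything in LLVM.

```cpp
#include <string>

// Hypothetical summary of one allocator-related call site.
struct AllocCall {
    std::string family;  // e.g. "malloc", "_Znwm", "__rust_alloc"
    bool allocates;      // the call returns fresh memory
    bool releases;       // the call releases one of its arguments
};

// An otherwise unused alloc/release pair is only a candidate for removal
// when both calls belong to the same allocator family.
bool canElidePair(const AllocCall &alloc, const AllocCall &release) {
    return alloc.allocates && release.releases &&
           !alloc.family.empty() && alloc.family == release.family;
}
```

Under this model, `new Thing()` followed by `free(foo)` pairs the "_Znwm" family with the "malloc" family, so the check fails and the mismatched calls are left in place.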

This also probably opens the door to fixing issues like
https://bugs.llvm.org/show_bug.cgi?id=49022 caused by overloading the
`builtin` annotation on allocator functions, but I’m unlikely to continue
in that direction.

What do people think?

Thanks,

Augie

[0]
https://github.com/rust-lang/llvm-project/commit/b1f55f7159540862c407a2d89d49434ce65892e5

[1]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L73

[2]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L433

Jessica Clarke via llvm-dev

2022-Jan-06 00:58 UTC


On 5 Jan 2022, at 22:32, Augie Fackler via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>  * `allocator(FAMILY)`: Marks a function as part of an allocator family,
> named by the “primary” allocation function (e.g. `allocator("malloc")`,
> `allocator("_Znwm")`, or `allocator("__rust_alloc")`).

Why do you need a family? What’s insufficient about just using allocsize(idx),
as used by __attribute__((alloc_size(...))) in GNU C? (Which you acknowledge the
existence of, but don’t justify why you need your own attribute.) What to use as
the allocator “family” string for C++ operator new seems pretty unclear too, so
I’m not sure how good an idea this free-form string argument is in your
proposal.

>  * `releaseptr(idx)`: Indicates that the function releases the pointer that
> is its Nth argument.

This should probably just be free(idx)/frees(idx)/willfree(idx) (we already have
nofree as an attribute for arguments), or an attribute on the argument itself.
Talking about releasing makes it sound like reference counting semantics.

Jess

Nikita Popov via llvm-dev

2022-Jan-06 09:41 UTC


On Wed, Jan 5, 2022 at 11:32 PM Augie Fackler via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>  * `allocator(FAMILY)`: Marks a function as part of an allocator family,
> named by the “primary” allocation function (e.g. `allocator("malloc")`,
> `allocator("_Znwm")`, or `allocator("__rust_alloc")`).
>
> What do people think?
>
An important bit I'm missing in this proposal is what the actual semantics
of the "allocator" attribute are -- what optimizations is LLVM permitted to
perform if this attribute is present?

I've looked through various uses of isAllocLikeFn(), and I think a few of
them can be replaced by isNoAliasCall() instead, which is our existing
mechanism to annotate that a function returns a distinct memory object.
Sample change for LICM here: https://reviews.llvm.org/D116728 I think we
should try to migrate isAllocLikeFn() -> isNoAliasCall() for cases that
don't need any additional guarantees.

I assume the only optimization that "allocator" should control is the
elimination of unused alloc+free pairs. Is that correct? Or are there other
optimizations that should be bound to it?
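
For readers unfamiliar with that optimization, here is a small C++ example of the kind of pair in question. The function name is made up for illustration, and whether a given compiler actually removes the pair depends on its version and optimization level.

```cpp
#include <cstdlib>

// The buffer is never read, written, or passed anywhere else, so the
// malloc/free pair has no effect the caller can observe. An optimizer that
// recognizes the two calls as a matched allocation pair may delete both and
// keep only the arithmetic.
int squarePlusOne(int x) {
    void *unused = std::malloc(64);
    int result = x * x + 1;
    std::free(unused);
    return result;
}
```

The function returns the same value with or without the pair, which is what makes the removal legal for a known allocator.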

Regards,
Nikita

James Y Knight via llvm-dev

2022-Jan-06 16:10 UTC


On Wed, Jan 5, 2022 at 5:32 PM Augie Fackler via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> In addition to the benefits for Rust, the LLVM optimizer could also be
> improved to not optimize away defects like
>
> {
>   auto *foo = new Thing();
>   free(foo);
> }
>
> which would then correctly crash instead of silently “working” until
> something actually uses the allocation. Similarly, there’s a potential
> defect when only one side of an overridden operator new and
> operator delete is visible to the optimizer and inlineable, which can look
> indistinguishable from the above after inlining.
>
I think this is important to note -- tracking the pairing is something we
should be doing already with the existing hardcoded list, as well. E.g.
compile the following example with -O1 (https://godbolt.org/z/WsYKcexYG).

When compiling "main", at first we cannot see the matched new/delete pair,
because delete is hidden in the "deleteit" function. Then, we run the
inliner, which inlines the "operator new" and "deleteit" functions. Now,
main has a call to malloc followed by ::operator delete, which we proceed
to remove, because they're allocation functions. But they're *unmatched*
allocation functions, so this weirdly ends up skipping the side-effects of
our custom operator delete, but NOT those of our custom operator new.

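#include <stdlib.h>
#include <stdio.h>

// Number of live allocations made through the replaced operators.
int allocs = 0;

// Replacement operator new: counts the allocation, then defers to malloc.
void *operator new(size_t n) {
  allocs++;
  void *mem = malloc(n);
  if (!mem) abort();
  return mem;
}

// Replacement operator delete, kept out of line so that only the "new"
// side gets inlined into main.
__attribute__((noinline)) void operator delete(void *mem) noexcept {
  allocs--;
  free(mem);
}

void deleteit(int *i) { delete i; }

int main() {
  int *i = new int;
  deleteit(i);
  // If both operators ran, the counter is back at zero here.
  if (allocs != 0)
    printf("MEMORY LEAK! allocs: %d\n", allocs);
}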

Philip Reames via llvm-dev

2022-Jan-06 16:40 UTC


On 1/5/22 2:32 PM, Augie Fackler via llvm-dev wrote:
>  * `releaseptr(idx)`: Indicates that the function releases the pointer
> that is its Nth argument.

Can you expand a bit on the motivation for this one? What are some
small examples that you think this will enable?

I don't see how this could allow allocation elimination without 
aggressive inlining.  Maybe you could use it to prove a particular bit 
of storage is undefined after return, but what does that buy you in 
terms of practical optimization benefit?  Do you have something else in 
mind?

p.s. In terms of spelling, I strongly agree with the suggestion 
elsewhere to recast this as a parameter attribute and use the "free" 
naming.


Bryce Wilson via llvm-dev

2022-Jan-07 09:23 UTC


Hi all,

It's quite a coincidence to see this proposal. I just joined this list a few
days ago specifically to ask about the correct way to annotate allocation and
freeing functions. I'll create a separate thread to ask questions about my
specific situation but I wanted to add my support for this proposal here.

My main question is whether there should be some way to specify the kind of
allocation function. In the hardcoded AllocationFnData array, there is a field
to specify if the function acts like malloc, new, calloc, realloc, etc. This
could be added to the annotation, but I think a better way would be to specify
the actual properties of interest: can it return null, does it align the
allocation, and what are the values in the newly allocated space (undef for
malloc, 0 for calloc, something unknown but defined for strdup)? This would also
allow for creating new types of allocators that don't already exist.

I've created a patch with what this might look like for the existing
hardcoded functions here: https://reviews.llvm.org/D116797. In my initial
implementation, I realized that there are a lot of places where argument
positions are hardcoded and special detection of strdup and strndup is
hardcoded. Regardless of whether we add the ability to specify these properties
in attribute form, we will at least need to ensure that the correct arguments
are used based on an allocsize annotation if available.
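
One way to picture the "properties of interest" idea is the C++ sketch below. All names in it are hypothetical and are not taken from the patch linked above. It also reads "does it align the allocation" as "takes an explicit alignment request", which is an assumption.

```cpp
// Hypothetical description of an allocation function by its properties
// instead of by a "kind" such as malloc-like or calloc-like.
enum class InitialContents { Undefined, Zeroed, UnknownButDefined };

struct AllocatorProperties {
    bool mayReturnNull;        // can the call fail by returning null?
    bool alignsAllocation;     // assumed meaning: explicit alignment request
    InitialContents contents;  // state of the freshly allocated bytes
};

// Examples taken from the description above; throwing operator new is the
// one entry that never returns null.
constexpr AllocatorProperties kMallocLike{true, false, InitialContents::Undefined};
constexpr AllocatorProperties kCallocLike{true, false, InitialContents::Zeroed};
constexpr AllocatorProperties kStrdupLike{true, false, InitialContents::UnknownButDefined};
constexpr AllocatorProperties kNewLike{false, false, InitialContents::Undefined};

// A load from fresh memory can be folded to zero only when the allocator
// guarantees zeroed contents.
constexpr bool loadFoldsToZero(const AllocatorProperties &p) {
    return p.contents == InitialContents::Zeroed;
}
```

A new kind of allocator would then be described by picking a combination of properties, with no new "kind" added to a hardcoded list.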

Sincerely,
Bryce Michael Wilson
