thr3ads.net - llvm dev - [LLVMdev] alloc

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2012-May-25 16:28 UTC

[LLVMdev] alloc_size metadata

On Fri, 25 May 2012 17:43:52 +0200
Duncan Sands <baldrick at free.fr> wrote:
> Hi John,
> 
> On 25/05/12 17:22, John Criswell wrote:
> > On 5/25/12 2:16 AM, Duncan Sands wrote:
> >> Hi John,
> >>
> >>>>> I'm implementing the alloc_size function attribute
in clang.
> >>>> does anyone actually use this attribute? And if they do,
can it
> >>>> really buy them anything? How about
"implementing" it by
> >>>> ignoring it!
> >>>
> >> ...
> >>>
> >>> Currently, SAFECode has a pass which just recognizes certain
> >>> functions as allocators and knows how to interpret the
arguments
> >>> to find the size. If we want SAFECode to work with another
> >>> allocator (like a program's custom allocator, the
Objective-C
> >>> allocator, the Boehm garbage collector, etc), then that pass
> >>> needs to be modified to recognize it. Having to update this
pass
> >>> for every allocator name and type is one of the few reasons
why
> >>> SAFECode only works with C/C++ and not just any old language
that
> >>> is compiled down to LLVM IR.
> >>
> >>
> >>> Nuno's proposed feature would allow programmers to
communicate
> >>> the relevant information about allocators to tools like
SAFECode
> >>> and ASan. I think it might also make some of the optimizations
in
> >>> LLVM that require knowing about allocators work on non-C/C++
code.
> >>
> >> these are good points. The attribute and proposed implementation
> >> feel pretty clunky though, which is my main gripe.
> >
> > Hrm. I haven't formed an opinion on what the attributes should
look
> > like. I think supporting the ones established by GCC would be
> > important for compatibility, and on the surface, they look
> > reasonable. Devising better ones for Clang is fine with me. What
> > about them feels klunky?
> 
> basically it feels like "I only know about C, here's something
that
> pretends to be general but only handles C".  Consider a language with
> a string type that contains the string length as well as the
> characters.  It has a library function allocate_string(length).  How
> much does it allocate?  length+4 bytes. That can't be represented by
> alloc_size.  What's more, it may well store the length at the start,
> and return a pointer to just after the length: a pointer to the first
> character.  alloc_size can't represent "the allocated memory
starts 4
> bytes before the return value" either.  In short, it feels like a
> hack for handling something that turns up in some particular C code
> that someone has, rather than a general solution to the general
> problem.
I think this is a good point, here's a suggestion:

Have the metadata name two functions, both assumed to have the same
signature as the tagged function, one which returns the offset of the
start of the allocated region and one which returns the length of the
allocated region. Alternatively, these functions could take the same
signature and additionally the returned pointer of the tagged
function, and then one function can return the start of the region and
the other the length.

For static analysis, we can attempt to inline these functions and then
use SCEV (dead code elimination will then get rid of the unused
results). For runtime checks, calls (which may also be inlined) can be
easily constructed.
> 
> >> Since LLVM already has utility functions for recognizing
> >> allocators (i.e. that know about malloc, realloc and -fno-builtin
> >> etc) can't SAFECode just make use of them?
> >
> > It probably could. It doesn't simply because SAFECode was written
> > before these features existed within LLVM.
> > :)
> >
> >> Then either (1) something like alloc_size is implemented, the LLVM
> >> utility learns about it, and SAFECode benefits automagically, or
> >> (2) the LLVM utility is taught about other allocators like
Ada's,
> >> and SAFECode benefits automagically.
> >
> > I'm not sure what you mean by "LLVM utility," but I
think we're
> > thinking along the same lines. Clang/LLVM implement the alloc_size
> > attributes, we change SAFECode to recognize it, and so when people
> > use it, SAFECode benefits automagically.
> >
> > Am I right that we're thinking the same thing, or did I completely
> > misunderstand you?
> 
> no, I'm thinking that SAFECode won't need to look at or worry about
> the attribute at all, because the LLVM methods will know about it and
> serve up the appropriate info.  Take a look at
> Analysis/MemoryBuiltins.h.  In spite of the names, things like
> extractMallocCall are dealing with "malloc like" functions, such
as
> C++'s "new" as well as malloc.  Similarly for calloc.  So you
could
> use those right now to extract "malloc" and "calloc"
sizes.  If
> alloc_size is implemented, presumably these would just magically
> start to give you useful sizes for functions annotated with that
> attribute too.
Does the current code even handle calloc? I only see malloc and new.

 -Hal
> 
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Nuno Lopes

2012-May-29 16:37 UTC

head link

[LLVMdev] alloc_size metadata

>> On 25/05/12 17:22, John Criswell wrote:
>> > On 5/25/12 2:16 AM, Duncan Sands wrote:
>> >> Hi John,
>> >>
>> >>>>> I'm implementing the alloc_size function
attribute in clang.
>> >>>> does anyone actually use this attribute? And if they
do, can it
>> >>>> really buy them anything? How about
"implementing" it by
>> >>>> ignoring it!
>> >>>
>> >> ...
>> >>>
>> >>> Currently, SAFECode has a pass which just recognizes
certain
>> >>> functions as allocators and knows how to interpret the
arguments
>> >>> to find the size. If we want SAFECode to work with another
>> >>> allocator (like a program's custom allocator, the
Objective-C
>> >>> allocator, the Boehm garbage collector, etc), then that
pass
>> >>> needs to be modified to recognize it. Having to update
this pass
>> >>> for every allocator name and type is one of the few
reasons why
>> >>> SAFECode only works with C/C++ and not just any old
language that
>> >>> is compiled down to LLVM IR.
>> >>
>> >>
>> >>> Nuno's proposed feature would allow programmers to
communicate
>> >>> the relevant information about allocators to tools like
SAFECode
>> >>> and ASan. I think it might also make some of the
optimizations in
>> >>> LLVM that require knowing about allocators work on
non-C/C++ code.
>> >>
>> >> these are good points. The attribute and proposed
implementation
>> >> feel pretty clunky though, which is my main gripe.
>> >
>> > Hrm. I haven't formed an opinion on what the attributes should
look
>> > like. I think supporting the ones established by GCC would be
>> > important for compatibility, and on the surface, they look
>> > reasonable. Devising better ones for Clang is fine with me. What
>> > about them feels klunky?
>>
>> basically it feels like "I only know about C, here's something
that
>> pretends to be general but only handles C".  Consider a language
with
>> a string type that contains the string length as well as the
>> characters.  It has a library function allocate_string(length).  How
>> much does it allocate?  length+4 bytes. That can't be represented
by
>> alloc_size.  What's more, it may well store the length at the
start,
>> and return a pointer to just after the length: a pointer to the first
>> character.  alloc_size can't represent "the allocated memory
starts 4
>> bytes before the return value" either.  In short, it feels like a
>> hack for handling something that turns up in some particular C code
>> that someone has, rather than a general solution to the general
>> problem.
>
> I think this is a good point, here's a suggestion:
>
> Have the metadata name two functions, both assumed to have the same
> signature as the tagged function, one which returns the offset of the
> start of the allocated region and one which returns the length of the
> allocated region. Alternatively, these functions could take the same
> signature and additionally the returned pointer of the tagged
> function, and then one function can return the start of the region and
> the other the length.
Ok, so this seems to be the most general proposal, which can obviously  
handle all cases.
Something like this would work:

define i8* @foo() {
   %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
   %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,  
!alloc_size !{i32 %0}
   ret i8* %call
}

Basically I just added a function call as the metadata (it's not  
currently possible to add the function itself to the metadata; the  
function call is required instead).
As long as the function is marked as readnone, I think it shouldn't  
interfere with the optimizers, and we can have a later pass to drop  
the metadata and remove the calls.  I still don't like having the  
explicit calls there, though.  Any suggestions to remove the functions  
calls from there?

I feel that the offset function is probably not required. I've never  
seen an allocation function that doesn't return a pointer to the  
beginning of the allocated buffer. Also, I cannot remember of any  
function in the C library that has that behavior.

We will also need a convenient syntax to export this feature in the  
languages we support.
I personally would like to see '__attribute__((alloc_size( strlen(x)+1  
))' in C, but the implementation seems to be non-trivial.

About Duncan's comment about having the memory builtin analysis  
recognize this intrinsic, well I agree it should (and I'll take care  
of that), but I'm not sure if we should be very aggressive in  
optimizing based on this metadata.
For example, do we really want to remove a call to a custom allocator  
whose return value is unused (like we do for malloc)?  If so, we'll  
also need a metadata node to mark de-allocation functions  (so that  
sequences like my_free(my_malloc(xx)) are removed).

Any feedback on the issues described is highly appreciated!

Thanks,
Nuno

Duncan Sands

2012-May-29 17:23 UTC

head link

[LLVMdev] alloc_size metadata

Hi Nuno,
>> I think this is a good point, here's a suggestion:
>>
>> Have the metadata name two functions, both assumed to have the same
>> signature as the tagged function, one which returns the offset of the
>> start of the allocated region and one which returns the length of the
>> allocated region. Alternatively, these functions could take the same
>> signature and additionally the returned pointer of the tagged
>> function, and then one function can return the start of the region and
>> the other the length.
>
> Ok, so this seems to be the most general proposal, which can obviously
> handle all cases.
I agree.  Variation: have one function return the offset of the start of
the memory, and the other the offset of the end of the memory (or the end
plus 1), i.e. a range.  This seems more uniform to me, but I don't have a
strong preference.
> Something like this would work:
>
> define i8* @foo() {
>     %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
>     %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,
> !alloc_size !{i32 %0}
>     ret i8* %call
> }
>
> Basically I just added a function call as the metadata (it's not
> currently possible to add the function itself to the metadata; the
> function call is required instead).
> As long as the function is marked as readnone, I think it shouldn't
> interfere with the optimizers, and we can have a later pass to drop
> the metadata and remove the calls.  I still don't like having the
> explicit calls there, though.  Any suggestions to remove the functions
> calls from there?
How about this:

define i32 @lo(i32) {
   ret i32 0
}

define i32 @hi(i32 %n) {
   ret i32 %n
}

declare i8* @wonder_allocator(i32)

define i8* @foo(i32 %n) {
   %r = call i8* @wonder_allocator(i32 %n), !alloc !0
   ret i8* %r
}

!0 = metadata !{ i32 (i32)* @lo, i32 (i32)* @hi }


The main problem I see is that if you declare @lo and @hi to have internal
linkage then the optimizers will zap them.  Maybe there's a neat solution
to that.
> I feel that the offset function is probably not required. I've never
> seen an allocation function that doesn't return a pointer to the
> beginning of the allocated buffer. Also, I cannot remember of any
> function in the C library that has that behavior.
Yes, in C you probably never see such a thing, but we are not just dealing
with C here.  I think it is important to have the start offset as well as the
length.
> We will also need a convenient syntax to export this feature in the
> languages we support.
Actually, no you don't.  You could just implement GCC's alloc_size in
terms of
this, at least for the moment.  Even in the long term it's probably pretty
pointless for clang to ever expose the start offset functionality since clang
only supports C-like languages and probably (as you mentioned) this is pretty
useless for them.
> I personally would like to see '__attribute__((alloc_size( strlen(x)+1
> ))' in C, but the implementation seems to be non-trivial.
>
> About Duncan's comment about having the memory builtin analysis
> recognize this intrinsic, well I agree it should (and I'll take care
> of that), but I'm not sure if we should be very aggressive in
> optimizing based on this metadata.
It would be great for understanding that loads/stores from/to outside the bounds
of the allocation result in undef.  I think the optimizers already exploit this
kind of info in the case of alloca - maybe this helps generalize to heap
allocations.
> For example, do we really want to remove a call to a custom allocator
> whose return value is unused (like we do for malloc)?
No we don't, so LLVM's interface to malloc-like and calloc-like things
would
have to be reworked to extract out different kinds of knowledge.

   If so, we'll> also need a metadata node to mark de-allocation functions  (so that
> sequences like my_free(my_malloc(xx)) are removed).
Maybe!

Ciao, Duncan.

Hal Finkel

2012-May-29 18:02 UTC

head link

[LLVMdev] alloc_size metadata

On Tue, 29 May 2012 17:37:40 +0100
Nuno Lopes <nunoplopes at sapo.pt> wrote:
> >> On 25/05/12 17:22, John Criswell wrote:
> >> > On 5/25/12 2:16 AM, Duncan Sands wrote:
> >> >> Hi John,
> >> >>
> >> >>>>> I'm implementing the alloc_size function
attribute in clang.
> >> >>>> does anyone actually use this attribute? And if
they do, can
> >> >>>> it really buy them anything? How about
"implementing" it by
> >> >>>> ignoring it!
> >> >>>
> >> >> ...
> >> >>>
> >> >>> Currently, SAFECode has a pass which just recognizes
certain
> >> >>> functions as allocators and knows how to interpret
the
> >> >>> arguments to find the size. If we want SAFECode to
work with
> >> >>> another allocator (like a program's custom
allocator, the
> >> >>> Objective-C allocator, the Boehm garbage collector,
etc), then
> >> >>> that pass needs to be modified to recognize it.
Having to
> >> >>> update this pass for every allocator name and type is
one of
> >> >>> the few reasons why SAFECode only works with C/C++
and not
> >> >>> just any old language that is compiled down to LLVM
IR.
> >> >>
> >> >>
> >> >>> Nuno's proposed feature would allow programmers
to communicate
> >> >>> the relevant information about allocators to tools
like
> >> >>> SAFECode and ASan. I think it might also make some of
the
> >> >>> optimizations in LLVM that require knowing about
allocators
> >> >>> work on non-C/C++ code.
> >> >>
> >> >> these are good points. The attribute and proposed
implementation
> >> >> feel pretty clunky though, which is my main gripe.
> >> >
> >> > Hrm. I haven't formed an opinion on what the attributes
should
> >> > look like. I think supporting the ones established by GCC
would
> >> > be important for compatibility, and on the surface, they look
> >> > reasonable. Devising better ones for Clang is fine with me.
What
> >> > about them feels klunky?
> >>
> >> basically it feels like "I only know about C, here's
something that
> >> pretends to be general but only handles C".  Consider a
language
> >> with a string type that contains the string length as well as the
> >> characters.  It has a library function allocate_string(length).
> >> How much does it allocate?  length+4 bytes. That can't be
> >> represented by alloc_size.  What's more, it may well store the
> >> length at the start, and return a pointer to just after the
> >> length: a pointer to the first character.  alloc_size can't
> >> represent "the allocated memory starts 4 bytes before the
return
> >> value" either.  In short, it feels like a hack for handling
> >> something that turns up in some particular C code that someone
> >> has, rather than a general solution to the general problem.
> >
> > I think this is a good point, here's a suggestion:
> >
> > Have the metadata name two functions, both assumed to have the same
> > signature as the tagged function, one which returns the offset of
> > the start of the allocated region and one which returns the length
> > of the allocated region. Alternatively, these functions could take
> > the same signature and additionally the returned pointer of the
> > tagged function, and then one function can return the start of the
> > region and the other the length.
> 
> Ok, so this seems to be the most general proposal, which can
> obviously handle all cases.
> Something like this would work:
> 
> define i8* @foo() {
>    %0 = tail call i32 @get_realloc_size(i8* null, i32 42)
>    %call = tail call i8* @my_recalloc(i8* null, i32 42) nounwind,  
> !alloc_size !{i32 %0}
>    ret i8* %call
> }
> 
> Basically I just added a function call as the metadata (it's not  
> currently possible to add the function itself to the metadata; the  
> function call is required instead).
I had in mind essentially what Duncan wrote in his response (just
adding the function names, and using some rule to construct the
function calls from the allocator calls).
> As long as the function is marked as readnone, I think it shouldn't  
> interfere with the optimizers, and we can have a later pass to drop  
> the metadata and remove the calls.  I still don't like having the  
> explicit calls there, though.  Any suggestions to remove the
> functions calls from there?
> 
> I feel that the offset function is probably not required. I've never  
> seen an allocation function that doesn't return a pointer to the  
> beginning of the allocated buffer. Also, I cannot remember of any  
> function in the C library that has that behavior.
Isn't this what posix_memalign does? Anything that needs to 'round
up'
the alignment will return a pointer which may not point to the
beginning of the buffer. As Duncan noted, there are other uses as well.
> 
> We will also need a convenient syntax to export this feature in the  
> languages we support.
> I personally would like to see
> '__attribute__((alloc_size( strlen(x)+1 ))' in C, but the
> implementation seems to be non-trivial.
IMHO, this would be a very useful feature.
> 
> About Duncan's comment about having the memory builtin analysis  
> recognize this intrinsic, well I agree it should (and I'll take care  
> of that), but I'm not sure if we should be very aggressive in  
> optimizing based on this metadata.
> For example, do we really want to remove a call to a custom
> allocator whose return value is unused (like we do for malloc)?  If
> so, we'll also need a metadata node to mark de-allocation functions
> (so that sequences like my_free(my_malloc(xx)) are removed).
You might want to restrict that to custom allocators that have an
additional 'removable' attribute.

Out of curiosity, does any of this potentially effect the presence or
absence of 'inbounds' on GEP instructions?

Thanks again,
Hal
> 
> Any feedback on the issues described is highly appreciated!
> 
> Thanks,
> Nuno
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - May 2012 - [LLVMdev] alloc_size metadata

[LLVMdev] alloc_size metadata

[LLVMdev] alloc_size metadata

[LLVMdev] alloc_size metadata

[LLVMdev] alloc_size metadata

Reasonably Related Threads