thr3ads.net - llvm dev - [LLVMdev] alloc

John Criswell

2012-May-25 15:22 UTC

[LLVMdev] alloc_size metadata

On 5/25/12 2:16 AM, Duncan Sands wrote:
> Hi John,
>
>>>> I'm implementing the alloc_size function attribute in clang.
>>> does anyone actually use this attribute? And if they do, can it really buy
>>> them anything? How about "implementing" it by ignoring it!
>>
> ...
>>
>> Currently, SAFECode has a pass which just recognizes certain functions as
>> allocators and knows how to interpret the arguments to find the size. If we
>> want SAFECode to work with another allocator (like a program's custom
>> allocator, the Objective-C allocator, the Boehm garbage collector, etc), then
>> that pass needs to be modified to recognize it. Having to update this pass for
>> every allocator name and type is one of the few reasons why SAFECode only works
>> with C/C++ and not just any old language that is compiled down to LLVM IR.
>
>
>> Nuno's proposed feature would allow programmers to communicate the relevant
>> information about allocators to tools like SAFECode and ASan. I think it might
>> also make some of the optimizations in LLVM that require knowing about
>> allocators work on non-C/C++ code.
>
> these are good points.  The attribute and proposed implementation feel pretty
> clunky though, which is my main gripe.
Hrm.  I haven't formed an opinion on what the attributes should look 
like.  I think supporting the ones established by GCC would be important 
for compatibility, and on the surface, they look reasonable.  Devising 
better ones for Clang is fine with me.  What about them feels klunky?
>
> Since LLVM already has utility functions for recognizing allocators (i.e. that
> know about malloc, realloc and -fno-builtin etc) can't SAFECode just make use
> of them?
It probably could.  It doesn't simply because SAFECode was written 
before these features existed within LLVM.
:)
> Then either (1) something like alloc_size is implemented, the LLVM
> utility learns about it, and SAFECode benefits automagically, or (2) the LLVM
> utility is taught about other allocators like Ada's, and SAFECode benefits
> automagically.
I'm not sure what you mean by "LLVM utility," but I think we're thinking
along the same lines.  Clang/LLVM implement the alloc_size attributes,
we change SAFECode to recognize it, and so when people use it, SAFECode
benefits automagically.

Am I right that we're thinking the same thing, or did I completely 
misunderstand you?

-- John T.
>
> Ciao, Duncan.

Duncan Sands

2012-May-25 15:43 UTC

[LLVMdev] alloc_size metadata

Hi John,

On 25/05/12 17:22, John Criswell wrote:
> On 5/25/12 2:16 AM, Duncan Sands wrote:
>> Hi John,
>>
>>>>> I'm implementing the alloc_size function attribute in clang.
>>>> does anyone actually use this attribute? And if they do, can it really buy
>>>> them anything? How about "implementing" it by ignoring it!
>>>
>> ...
>>>
>>> Currently, SAFECode has a pass which just recognizes certain functions as
>>> allocators and knows how to interpret the arguments to find the size. If we
>>> want SAFECode to work with another allocator (like a program's custom
>>> allocator, the Objective-C allocator, the Boehm garbage collector, etc), then
>>> that pass needs to be modified to recognize it. Having to update this pass for
>>> every allocator name and type is one of the few reasons why SAFECode only works
>>> with C/C++ and not just any old language that is compiled down to LLVM IR.
>>
>>
>>> Nuno's proposed feature would allow programmers to communicate the relevant
>>> information about allocators to tools like SAFECode and ASan. I think it might
>>> also make some of the optimizations in LLVM that require knowing about
>>> allocators work on non-C/C++ code.
>>
>> these are good points. The attribute and proposed implementation feel pretty
>> clunky though, which is my main gripe.
>
> Hrm. I haven't formed an opinion on what the attributes should look like. I
> think supporting the ones established by GCC would be important for
> compatibility, and on the surface, they look reasonable. Devising better ones
> for Clang is fine with me. What about them feels klunky?
basically it feels like "I only know about C, here's something that pretends to
be general but only handles C".  Consider a language with a string type that
contains the string length as well as the characters.  It has a library function
allocate_string(length).  How much does it allocate?  length+4 bytes. That can't
be represented by alloc_size.  What's more, it may well store the length at the
start, and return a pointer to just after the length: a pointer to the first
character.  alloc_size can't represent "the allocated memory starts 4 bytes
before the return value" either.  In short, it feels like a hack for handling
something that turns up in some particular C code that someone has, rather than
a general solution to the general problem.
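
The allocate_string case can be made concrete with a minimal C sketch. The 4-byte header, the layout, and the helper names string_length and free_string are assumptions for illustration, not taken from any real language runtime:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { HEADER_BYTES = 4 };  /* assumed size of the stored length */

/* Hypothetical runtime routine: allocates length + 4 bytes, stores the
 * length in the first 4 bytes, and returns a pointer to the first
 * character, i.e. 4 bytes past the start of the allocated block. */
char *allocate_string(uint32_t length)
{
    unsigned char *base = malloc((size_t)length + HEADER_BYTES);
    if (base == NULL)
        return NULL;
    memcpy(base, &length, HEADER_BYTES);
    return (char *)(base + HEADER_BYTES);
}

/* Reads the length back from the header that sits before the characters. */
uint32_t string_length(const char *s)
{
    uint32_t length;
    assert(s != NULL);
    memcpy(&length, (const unsigned char *)s - HEADER_BYTES, HEADER_BYTES);
    return length;
}

/* Frees the whole block, which starts before the pointer the caller holds. */
void free_string(char *s)
{
    if (s != NULL)
        free((unsigned char *)s - HEADER_BYTES);
}
```

An annotation that only says "argument 1 is the size" would describe neither the extra 4 bytes nor the fact that the block begins before the returned pointer.
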
>> Since LLVM already has utility functions for recognizing allocators (i.e. that
>> know about malloc, realloc and -fno-builtin etc) can't SAFECode just make use
>> of them?
>
> It probably could. It doesn't simply because SAFECode was written before these
> features existed within LLVM.
> :)
>
>> Then either (1) something like alloc_size is implemented, the LLVM
>> utility learns about it, and SAFECode benefits automagically, or (2) the LLVM
>> utility is taught about other allocators like Ada's, and SAFECode benefits
>> automagically.
>
> I'm not sure what you mean by "LLVM utility," but I think we're thinking along
> the same lines. Clang/LLVM implement the alloc_size attributes, we change
> SAFECode to recognize it, and so when people use it, SAFECode benefits
> automagically.
>
> Am I right that we're thinking the same thing, or did I completely misunderstand
> you?
no, I'm thinking that SAFECode won't need to look at or worry about the
attribute at all, because the LLVM methods will know about it and serve
up the appropriate info.  Take a look at Analysis/MemoryBuiltins.h.  In
spite of the names, things like extractMallocCall are dealing with "malloc
like" functions, such as C++'s "new" as well as malloc.  Similarly for
calloc.  So you could use those right now to extract "malloc" and "calloc"
sizes.  If alloc_size is implemented, presumably these would just magically
start to give you useful sizes for functions annotated with that attribute too.
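
The layering described here can be illustrated with a toy model in C. This is not the MemoryBuiltins.h interface; the table, the struct, and the function allocated_size are hypothetical names chosen for the sketch. The point is only that clients ask one shared utility for a size, so teaching the utility a new rule helps every client at once:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How the size of a known allocator's result is derived from its arguments. */
struct alloc_rule {
    const char *name;  /* callee name */
    int first;         /* index of the size argument */
    int second;        /* index of a second factor, or -1 if none */
};

/* One shared table. A rule derived from an alloc_size-style annotation
 * would simply be one more entry here. */
static const struct alloc_rule known_allocators[] = {
    { "malloc", 0, -1 },
    { "calloc", 0, 1 },
    { "_Znwm", 0, -1 },  /* Itanium-mangled operator new(unsigned long) */
};

/* Returns 1 and stores the byte count in *size if callee is a known
 * allocator, 0 otherwise. Overflow of the product is ignored in this sketch. */
int allocated_size(const char *callee, const size_t *args, size_t nargs,
                   size_t *size)
{
    size_t count = sizeof known_allocators / sizeof known_allocators[0];
    assert(callee != NULL && size != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct alloc_rule *r = &known_allocators[i];
        if (strcmp(callee, r->name) != 0)
            continue;
        if ((size_t)r->first >= nargs)
            return 0;
        size_t total = args[r->first];
        if (r->second >= 0) {
            if ((size_t)r->second >= nargs)
                return 0;
            total *= args[r->second];
        }
        *size = total;
        return 1;
    }
    return 0;
}
```

A checker written against allocated_size never inspects attributes itself, which is the arrangement proposed for SAFECode above.
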

Ciao, Duncan.

Hal Finkel

2012-May-25 16:28 UTC

[LLVMdev] alloc_size metadata

On Fri, 25 May 2012 17:43:52 +0200
Duncan Sands <baldrick at free.fr> wrote:
> Hi John,
> 
> On 25/05/12 17:22, John Criswell wrote:
> > On 5/25/12 2:16 AM, Duncan Sands wrote:
> >> Hi John,
> >>
> >>>>> I'm implementing the alloc_size function attribute in clang.
> >>>> does anyone actually use this attribute? And if they do, can it
> >>>> really buy them anything? How about "implementing" it by
> >>>> ignoring it!
> >>>
> >> ...
> >>>
> >>> Currently, SAFECode has a pass which just recognizes certain
> >>> functions as allocators and knows how to interpret the arguments
> >>> to find the size. If we want SAFECode to work with another
> >>> allocator (like a program's custom allocator, the Objective-C
> >>> allocator, the Boehm garbage collector, etc), then that pass
> >>> needs to be modified to recognize it. Having to update this pass
> >>> for every allocator name and type is one of the few reasons why
> >>> SAFECode only works with C/C++ and not just any old language that
> >>> is compiled down to LLVM IR.
> >>
> >>
> >>> Nuno's proposed feature would allow programmers to communicate
> >>> the relevant information about allocators to tools like SAFECode
> >>> and ASan. I think it might also make some of the optimizations in
> >>> LLVM that require knowing about allocators work on non-C/C++ code.
> >>
> >> these are good points. The attribute and proposed implementation
> >> feel pretty clunky though, which is my main gripe.
> >
> > Hrm. I haven't formed an opinion on what the attributes should look
> > like. I think supporting the ones established by GCC would be
> > important for compatibility, and on the surface, they look
> > reasonable. Devising better ones for Clang is fine with me. What
> > about them feels klunky?
> 
> basically it feels like "I only know about C, here's something that
> pretends to be general but only handles C".  Consider a language with
> a string type that contains the string length as well as the
> characters.  It has a library function allocate_string(length).  How
> much does it allocate?  length+4 bytes. That can't be represented by
> alloc_size.  What's more, it may well store the length at the start,
> and return a pointer to just after the length: a pointer to the first
> character.  alloc_size can't represent "the allocated memory starts 4
> bytes before the return value" either.  In short, it feels like a
> hack for handling something that turns up in some particular C code
> that someone has, rather than a general solution to the general
> problem.
I think this is a good point. Here's a suggestion:

Have the metadata name two functions, both assumed to have the same
signature as the tagged function: one returns the offset of the start
of the allocated region, and the other returns the length of the
allocated region. Alternatively, these functions could take the same
arguments plus the pointer returned by the tagged function; then one
function can return the start of the region and the other the length.

For static analysis, we can attempt to inline these functions and then
use SCEV (dead code elimination will then get rid of the unused
results). For runtime checks, calls (which may also be inlined) can be
easily constructed.
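
A minimal C sketch of the first variant, assuming an allocator with an 8-byte header. All names (hdr_alloc, hdr_alloc_offset, hdr_alloc_length, access_in_bounds) and the header size are hypothetical; the companions share the allocator's parameter list, and the last function shows the sort of runtime check that could be constructed from them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { HDR = 8 };  /* assumed header size in bytes */

/* The tagged allocator: keeps HDR bytes of bookkeeping in front of the
 * n bytes handed to the caller and returns a pointer past the header. */
void *hdr_alloc(size_t n)
{
    char *base = malloc(n + HDR);
    return base ? base + HDR : NULL;
}

/* Companion 1: offset of the region start, relative to the pointer
 * that hdr_alloc returns. */
ptrdiff_t hdr_alloc_offset(size_t n)
{
    (void)n;  /* the offset does not depend on n for this allocator */
    return -(ptrdiff_t)HDR;
}

/* Companion 2: total length of the allocated region. */
size_t hdr_alloc_length(size_t n)
{
    return n + HDR;
}

/* Runtime check built from the companions: is [p, p + len) inside the
 * region that hdr_alloc(n) returned as ret? */
int access_in_bounds(const void *ret, size_t n, const void *p, size_t len)
{
    assert(ret != NULL);
    uintptr_t start = (uintptr_t)((const char *)ret + hdr_alloc_offset(n));
    uintptr_t end = start + hdr_alloc_length(n);
    uintptr_t addr = (uintptr_t)p;
    return addr >= start && addr <= end && len <= end - addr;
}
```

Because both companions are tiny and side-effect free, they are also the kind of function an optimizer could inline and fold when the arguments are known.
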
> 
> >> Since LLVM already has utility functions for recognizing
> >> allocators (i.e. that know about malloc, realloc and -fno-builtin
> >> etc) can't SAFECode just make use of them?
> >
> > It probably could. It doesn't simply because SAFECode was written
> > before these features existed within LLVM.
> > :)
> >
> >> Then either (1) something like alloc_size is implemented, the LLVM
> >> utility learns about it, and SAFECode benefits automagically, or
> >> (2) the LLVM utility is taught about other allocators like Ada's,
> >> and SAFECode benefits automagically.
> >
> > I'm not sure what you mean by "LLVM utility," but I think we're
> > thinking along the same lines. Clang/LLVM implement the alloc_size
> > attributes, we change SAFECode to recognize it, and so when people
> > use it, SAFECode benefits automagically.
> >
> > Am I right that we're thinking the same thing, or did I completely
> > misunderstand you?
> 
> no, I'm thinking that SAFECode won't need to look at or worry about
> the attribute at all, because the LLVM methods will know about it and
> serve up the appropriate info.  Take a look at
> Analysis/MemoryBuiltins.h.  In spite of the names, things like
> extractMallocCall are dealing with "malloc like" functions, such as
> C++'s "new" as well as malloc.  Similarly for calloc.  So you could
> use those right now to extract "malloc" and "calloc" sizes.  If
> alloc_size is implemented, presumably these would just magically
> start to give you useful sizes for functions annotated with that
> attribute too.
Does the current code even handle calloc? I only see malloc and new.

 -Hal
> 
> Ciao, Duncan.


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

John Criswell

2012-May-25 16:38 UTC

[LLVMdev] alloc_size metadata

On 5/25/12 10:43 AM, Duncan Sands wrote:
> Hi John,
>
> On 25/05/12 17:22, John Criswell wrote:
>> On 5/25/12 2:16 AM, Duncan Sands wrote:
>>> Hi John,
>>>
>>>>>> I'm implementing the alloc_size function attribute in clang.
>>>>> does anyone actually use this attribute? And if they do, can it really buy
>>>>> them anything? How about "implementing" it by ignoring it!
>>>>
>>> ...
>>>>
>>>> Currently, SAFECode has a pass which just recognizes certain functions as
>>>> allocators and knows how to interpret the arguments to find the size. If we
>>>> want SAFECode to work with another allocator (like a program's custom
>>>> allocator, the Objective-C allocator, the Boehm garbage collector, etc), then
>>>> that pass needs to be modified to recognize it. Having to update this pass for
>>>> every allocator name and type is one of the few reasons why SAFECode only works
>>>> with C/C++ and not just any old language that is compiled down to LLVM IR.
>>>
>>>
>>>> Nuno's proposed feature would allow programmers to communicate the relevant
>>>> information about allocators to tools like SAFECode and ASan. I think it might
>>>> also make some of the optimizations in LLVM that require knowing about
>>>> allocators work on non-C/C++ code.
>>>
>>> these are good points. The attribute and proposed implementation feel pretty
>>> clunky though, which is my main gripe.
>>
>> Hrm. I haven't formed an opinion on what the attributes should look like. I
>> think supporting the ones established by GCC would be important for
>> compatibility, and on the surface, they look reasonable. Devising better ones
>> for Clang is fine with me. What about them feels klunky?
>
> basically it feels like "I only know about C, here's something that pretends to
> be general but only handles C".  Consider a language with a string type that
> contains the string length as well as the characters.  It has a library function
> allocate_string(length).  How much does it allocate?  length+4 bytes. That can't
> be represented by alloc_size.  What's more, it may well store the length at the
> start, and return a pointer to just after the length: a pointer to the first
> character.  alloc_size can't represent "the allocated memory starts 4 bytes
> before the return value" either.  In short, it feels like a hack for handling
> something that turns up in some particular C code that someone has, rather than
> a general solution to the general problem.
True.  It also doesn't handle a number of "C" allocators like strdup(),
etc.  Making it that general, though, may be tricky, and I don't think
it negates the utility of the simpler form.  I suspect a fair number of
allocators could be described by the alloc_size feature.

Even in the C/C++ world, I think it'd be useful.  There's the 
GC_malloc() in Boehm's garbage collector, kmalloc() in the Linux kernel, 
malloc wrappers in applications, memalign(), etc.
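
For the simple wrappers mentioned here, GCC's existing attribute syntax is enough. A minimal sketch, assuming a compiler that accepts GCC-style attributes; the wrapper names app_malloc and app_calloc and the ALLOC_SIZE macro are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand to GCC's alloc_size attribute where available, to nothing elsewhere. */
#if defined(__GNUC__)
#define ALLOC_SIZE(...) __attribute__((alloc_size(__VA_ARGS__)))
#else
#define ALLOC_SIZE(...)
#endif

/* malloc-style wrapper: the block size is argument 1. */
ALLOC_SIZE(1) void *app_malloc(size_t n)
{
    assert(n > 0);  /* this sketch treats zero-size requests as a caller bug */
    void *p = malloc(n);
    if (p == NULL)
        abort();
    return p;
}

/* calloc-style wrapper: the block size is argument 1 times argument 2. */
ALLOC_SIZE(1, 2) void *app_calloc(size_t count, size_t size)
{
    assert(count > 0 && size > 0);
    void *p = calloc(count, size);
    if (p == NULL)
        abort();
    return p;
}
```

Both forms only name which arguments carry the size, which is why they cover wrappers of this shape but not allocators like strdup().
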
>
>>> Since LLVM already has utility functions for recognizing allocators (i.e. that
>>> know about malloc, realloc and -fno-builtin etc) can't SAFECode just make use
>>> of them?
>>
>> It probably could. It doesn't simply because SAFECode was written before these
>> features existed within LLVM.
>> :)
>>
>> [snip]
> no, I'm thinking that SAFECode won't need to look at or worry about the
> attribute at all, because the LLVM methods will know about it and serve
> up the appropriate info.  Take a look at Analysis/MemoryBuiltins.h.  In
> spite of the names, things like extractMallocCall are dealing with "malloc
> like" functions, such as C++'s "new" as well as malloc.  Similarly for
> calloc.  So you could use those right now to extract "malloc" and "calloc"
> sizes.  If alloc_size is implemented, presumably these would just magically
> start to give you useful sizes for functions annotated with that attribute too.
I see.  That makes sense.

-- John T.

Nuno Lopes

2012-May-25 17:41 UTC

[LLVMdev] alloc_size metadata

>>>> Currently, SAFECode has a pass which just recognizes certain functions as
>>>> allocators and knows how to interpret the arguments to find the size. If we
>>>> want SAFECode to work with another allocator (like a program's custom
>>>> allocator, the Objective-C allocator, the Boehm garbage collector, etc), then
>>>> that pass needs to be modified to recognize it. Having to update this pass for
>>>> every allocator name and type is one of the few reasons why SAFECode only works
>>>> with C/C++ and not just any old language that is compiled down to LLVM IR.
>>>
>>>
>>>> Nuno's proposed feature would allow programmers to communicate the relevant
>>>> information about allocators to tools like SAFECode and ASan. I think it might
>>>> also make some of the optimizations in LLVM that require knowing about
>>>> allocators work on non-C/C++ code.
>>>
>>> these are good points. The attribute and proposed implementation feel pretty
>>> clunky though, which is my main gripe.
>>
>> Hrm. I haven't formed an opinion on what the attributes should look like. I
>> think supporting the ones established by GCC would be important for
>> compatibility, and on the surface, they look reasonable. Devising better ones
>> for Clang is fine with me. What about them feels klunky?
>
> basically it feels like "I only know about C, here's something that pretends to
> be general but only handles C".  Consider a language with a string type that
> contains the string length as well as the characters.  It has a library function
> allocate_string(length).  How much does it allocate?  length+4 bytes. That can't
> be represented by alloc_size.  What's more, it may well store the length at the
> start, and return a pointer to just after the length: a pointer to the first
> character.  alloc_size can't represent "the allocated memory starts 4 bytes
> before the return value" either.  In short, it feels like a hack for handling
> something that turns up in some particular C code that someone has, rather than
> a general solution to the general problem.
It's not a general solution, not even for C, of course.
But it's very useful for applications that have their own malloc
wrappers and implementations. For example, LLVM, which has its own
allocators! Without this metadata, you'll never be able to analyze
LLVM's code at all. It's simply impossible to detect, in general, if a
function is a custom allocator.
So, yes, some metadata is necessary. I agree my proposal is not
general enough for all applications. For example, I ran the tool over
some code yesterday and found an allocator of the following form:
alloc(x, y, z), which allocates 'x * y + z' bytes. That can be
represented neither at source-code level (with GCC's attribute) nor at
IR level following my metadata proposal.
I'm happy to implement something more general if we come up with a
better design.
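
A minimal C sketch of an allocator with that shape. The names alloc3 and alloc3_size are hypothetical, and the overflow checks are an addition for the sketch rather than something known about the code that was analyzed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes x * y + z into *out. Returns 1 on success, 0 on overflow. */
int alloc3_size(size_t x, size_t y, size_t z, size_t *out)
{
    assert(out != NULL);
    if (y != 0 && x > SIZE_MAX / y)
        return 0;
    size_t product = x * y;
    if (product > SIZE_MAX - z)
        return 0;
    *out = product + z;
    return 1;
}

/* Three-argument allocator returning x * y + z bytes. An annotation of
 * the form alloc_size(1, 2) would claim x * y bytes, which understates
 * the block; no choice of argument positions expresses the "+ z" term. */
void *alloc3(size_t x, size_t y, size_t z)
{
    size_t total;
    if (!alloc3_size(x, y, z, &total))
        return NULL;
    return malloc(total);
}
```

The size here is an arithmetic expression over the arguments rather than a single argument or a product of two, which is the gap described above.
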

Nuno
