thr3ads.net - llvm dev - [LLVMdev] Error handling in LLVMObject library [Jun 2015]

If this information is useful, please help other people find it:
Share via:

Sean Silva

2015-May-30 01:45 UTC

[LLVMdev] Error handling in LLVMObject library

On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <vonosmas at gmail.com>
wrote:
> Hi everyone,
>
> Having proper error handling in LLVM's Object parsing library is a nice
> thing by itself, but it would additionally allow us to find bugs by fuzzing
> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the clean
> input validation is essential.
>
> This is a generic discussion of state of affairs. I want to do some
> progress in fuzzing before we finish it (especially if we decide to make a
> significant intrusive changes), you may scroll down for my plan.
>
> The code in lib/Object calls report_fatal_error() far too often, both when
> we're (a) just constructing the specific implementation of ObjectFile,
and
> (b) when we access its contents and find out the file is broken and
can't
> be parsed properly.
>
> We should just go and fix (a): ObjectFile factory methods return
> ErrorOr<std::unique_ptr<ObjectFile>>, and we should propagate
the error
> appropriately.
>
> (b) is harder. The current approach is to use std::error_code as a return
> type, and store the result in by-reference argument, for instance:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> I wanted to follow this approach in a proposed large MachO API change
> (http://reviews.llvm.org/D10111
>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
> but it raised discussion on whether this approach is right.
> Moving this discussion here. I see the following options:
>
> 1. Use the current approach:
>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t &Res);
>
> Pros:
>   * don't need to change a large number of (often virtual) API
functions
>   * has a nice error handling pattern used in LLVM tools:
>   uint64_t Addr;
>   if (error(getSymbolAddress(Symbol, Addr)))
>     return;  // or continue, or do anything else.
>
> Cons:
>   * return value can just be silently ignored. Adding warn_unused_result
> attribute on per-function basis is ugly
>   * adds extra overhead for cases when we're certain the result would
be
> valid.
>
> 2. Switch to ErrorOr wrapper:
>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>
> Pros:
>   * handling the error is now mandatory and explicit.
>   * callers can explicitly skip error handling if they know the result
> would be valid:
>     uint64_t Addr = getSymbolAddress(Symbol).get();
>   and it would fail the assert if they are wrong.
>
> Cons:
>   * need to change lots of old code, or live with two versions of functions
>   * error handling boilerplate in regular code on call site is ugly:
>   auto AddrOrErr = getSymbolAddress(Symbol);
>   if (AddrOrErr.hasError())
>     return;  // or continue, or whatever
>   uint64_t Addr = AddrOrErr.get();
>   (can probably be improved with a macro)
>   * adds extra overhead for cases when we're certain the result would
be
> valid.
>
> On IRC discussion Lang suggested
> 3. Check the whole object file contents in constructor or validate()
> method, and get rid
> of all error codes in regular accessors.
>
Unfortunately this option isn't possible with the current libObject.
libObject is specifically designed to avoid paging in or allocating any
memory beyond simply mmap'ing the file. This makes it annoying to use, but
also allows it to be used for demanding use cases like a linker. For many
things, I've always wished we had a "libEasyObject" that avoided
this
annoyance, but it seems like a lot of work compared to the benefit.

-- Sean Silva

>
> Pros:
>   * the interface is much cleaner
>   * no extra overhead for trusted (e.g. JIT) object files.
>
> Cons:
>   * significant change, fundamentally shifting the way object files are
> handled
>   * verifier function should now absolutely everything about the object
> file, and anticipate all possible use cases. Especially hard, assuming that
> ObjectFile interface allows user to pass any garbage as input arguments
> (e.g. as DataRefImpl in the example above).
>   * verifier can be slow, and might be an overkill if we strongly desire
> to parse some bits of object file lazily.
>
> ===============>
> Instead of http://reviews.llvm.org/D10111
>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
> I'm going to proceed with minimal incremental changes, that would allow
> fuzzer to move forward. Namely, I want to keep the changes to headers as
> small as possible, changing functions one by one, and preferring to use
> ErrorOr<> construct (option 2 listed above). An additional benefit of
this
> is that each small incremental change would be accompanied by the test case
> generated by fuzzer, that exposed this problem.
>
> Let me know if you think it's a good or terrible idea.
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150529/5142b4e8/attachment.html>

Rafael Espíndola

2015-Jun-02 01:16 UTC

head link

[LLVMdev] Error handling in LLVMObject library

With comdats parts of the file might never be read.

There are also multiple levels of cache and while all the relocations of a
file will fit in ram, it is probably still more efficient to validate them
as they are read.

But I think (typing on a phone) that relocations are another case where the
best is to have a more specific api: the only way the relocation is invalid
from the perspective of the api we provide is if the symbol index is
invalid. We can just return symbol_end for that and avoid the error_code
and the ErrorOr.

Cheers,
Rafael
On Jun 1, 2015 6:19 PM, "Lang Hames" <lhames at gmail.com>
wrote:
> Out of interest, what parts of an object file commonly don't get mapped
in
> by the linker? I'd have assumed linkers would usually end up touching
just
> about everything.
>
> Even assuming that whole file validation is too coarse for performance
> reasons, I'd prefer something coarser than what we currently have. For
> example, right now every access to a relocation has to deal with an error
> return, but I assume it's rare to parse a single relocation in
isolation.
> Why don't we validate all relocations for each section up front, then
have
> a simpler API for accessing them?
>
> I don't have especially strong feelings about this, just a nagging
sense
> that libObject's error handling is unnecessarily fine grained.
>
> FWIW, of the other two proposals I prefer the ErrorOr approach.
>
> Cheers,
> Lang.
>
> On Fri, May 29, 2015 at 6:45 PM, Sean Silva <chisophugis at
gmail.com> wrote:
>
>>
>>
>> On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <vonosmas at
gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Having proper error handling in LLVM's Object parsing library
is a nice
>>> thing by itself, but it would additionally allow us to find bugs by
fuzzing
>>> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the
clean
>>> input validation is essential.
>>>
>>> This is a generic discussion of state of affairs. I want to do some
>>> progress in fuzzing before we finish it (especially if we decide to
make a
>>> significant intrusive changes), you may scroll down for my plan.
>>>
>>> The code in lib/Object calls report_fatal_error() far too often,
both
>>> when we're (a) just constructing the specific implementation of
ObjectFile,
>>> and (b) when we access its contents and find out the file is broken
and
>>> can't be parsed properly.
>>>
>>> We should just go and fix (a): ObjectFile factory methods return
>>> ErrorOr<std::unique_ptr<ObjectFile>>, and we should
propagate the error
>>> appropriately.
>>>
>>> (b) is harder. The current approach is to use std::error_code as a
>>> return type, and store the result in by-reference argument, for
instance:
>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t
&Res);
>>>
>>> I wanted to follow this approach in a proposed large MachO API
change
>>> (http://reviews.llvm.org/D10111
>>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
>>> but it raised discussion on whether this approach is right.
>>> Moving this discussion here. I see the following options:
>>>
>>> 1. Use the current approach:
>>>   std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t
&Res);
>>>
>>> Pros:
>>>   * don't need to change a large number of (often virtual) API
functions
>>>   * has a nice error handling pattern used in LLVM tools:
>>>   uint64_t Addr;
>>>   if (error(getSymbolAddress(Symbol, Addr)))
>>>     return;  // or continue, or do anything else.
>>>
>>> Cons:
>>>   * return value can just be silently ignored. Adding
warn_unused_result
>>> attribute on per-function basis is ugly
>>>   * adds extra overhead for cases when we're certain the result
would be
>>> valid.
>>>
>>> 2. Switch to ErrorOr wrapper:
>>>   ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>>>
>>> Pros:
>>>   * handling the error is now mandatory and explicit.
>>>   * callers can explicitly skip error handling if they know the
result
>>> would be valid:
>>>     uint64_t Addr = getSymbolAddress(Symbol).get();
>>>   and it would fail the assert if they are wrong.
>>>
>>> Cons:
>>>   * need to change lots of old code, or live with two versions of
>>> functions
>>>   * error handling boilerplate in regular code on call site is
ugly:
>>>   auto AddrOrErr = getSymbolAddress(Symbol);
>>>   if (AddrOrErr.hasError())
>>>     return;  // or continue, or whatever
>>>   uint64_t Addr = AddrOrErr.get();
>>>   (can probably be improved with a macro)
>>>   * adds extra overhead for cases when we're certain the result
would be
>>> valid.
>>>
>>> On IRC discussion Lang suggested
>>> 3. Check the whole object file contents in constructor or
validate()
>>> method, and get rid
>>> of all error codes in regular accessors.
>>>
>>
>> Unfortunately this option isn't possible with the current
libObject.
>> libObject is specifically designed to avoid paging in or allocating any
>> memory beyond simply mmap'ing the file. This makes it annoying to
use, but
>> also allows it to be used for demanding use cases like a linker. For
many
>> things, I've always wished we had a "libEasyObject" that
avoided this
>> annoyance, but it seems like a lot of work compared to the benefit.
>>
>> -- Sean Silva
>>
>>
>>>
>>> Pros:
>>>   * the interface is much cleaner
>>>   * no extra overhead for trusted (e.g. JIT) object files.
>>>
>>> Cons:
>>>   * significant change, fundamentally shifting the way object files
are
>>> handled
>>>   * verifier function should now absolutely everything about the
object
>>> file, and anticipate all possible use cases. Especially hard,
assuming that
>>> ObjectFile interface allows user to pass any garbage as input
arguments
>>> (e.g. as DataRefImpl in the example above).
>>>   * verifier can be slow, and might be an overkill if we strongly
desire
>>> to parse some bits of object file lazily.
>>>
>>> ===============>>>
>>> Instead of http://reviews.llvm.org/D10111
>>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
>>> I'm going to proceed with minimal incremental changes, that
would allow
>>> fuzzer to move forward. Namely, I want to keep the changes to
headers as
>>> small as possible, changing functions one by one, and preferring to
use
>>> ErrorOr<> construct (option 2 listed above). An additional
benefit of this
>>> is that each small incremental change would be accompanied by the
test case
>>> generated by fuzzer, that exposed this problem.
>>>
>>> Let me know if you think it's a good or terrible idea.
>>>
>>> --
>>> Alexey Samsonov
>>> vonosmas at gmail.com
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/dd40fac9/attachment.html>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Jun 2015 - [LLVMdev] Error handling in LLVMObject library

[LLVMdev] Error handling in LLVMObject library

[LLVMdev] Error handling in LLVMObject library

Maybe Matching Threads