With comdats parts of the file might never be read.
There are also multiple levels of cache and while all the relocations of a
file will fit in ram, it is probably still more efficient to validate them
as they are read.
But I think (typing on a phone) that relocations are another case where the
best is to have a more specific api: the only way the relocation is invalid
from the perspective of the api we provide is if the symbol index is
invalid. We can just return symbol_end for that and avoid the error_code
and the ErrorOr.
Cheers,
Rafael
On Jun 1, 2015 6:19 PM, "Lang Hames" <lhames at gmail.com>
wrote:
> Out of interest, what parts of an object file commonly don't get mapped
in
> by the linker? I'd have assumed linkers would usually end up touching
just
> about everything.
>
> Even assuming that whole file validation is too coarse for performance
> reasons, I'd prefer something coarser than what we currently have. For
> example, right now every access to a relocation has to deal with an error
> return, but I assume it's rare to parse a single relocation in
isolation.
> Why don't we validate all relocations for each section up front, then
have
> a simpler API for accessing them?
>
> I don't have especially strong feelings about this, just a nagging
sense
> that libObject's error handling is unnecessarily fine grained.
>
> FWIW, of the other two proposals I prefer the ErrorOr approach.
>
> Cheers,
> Lang.
>
> On Fri, May 29, 2015 at 6:45 PM, Sean Silva <chisophugis at
gmail.com> wrote:
>
>>
>>
>> On Fri, May 29, 2015 at 4:06 PM, Alexey Samsonov <vonosmas at
gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Having proper error handling in LLVM's Object parsing library
is a nice
>>> thing by itself, but it would additionally allow us to find bugs by
fuzzing
>>> (see r238451 that adds llvm-dwarfdump-fuzzer tool), for which the
clean
>>> input validation is essential.
>>>
>>> This is a generic discussion of state of affairs. I want to do some
>>> progress in fuzzing before we finish it (especially if we decide to
make a
>>> significant intrusive changes), you may scroll down for my plan.
>>>
>>> The code in lib/Object calls report_fatal_error() far too often,
both
>>> when we're (a) just constructing the specific implementation of
ObjectFile,
>>> and (b) when we access its contents and find out the file is broken
and
>>> can't be parsed properly.
>>>
>>> We should just go and fix (a): ObjectFile factory methods return
>>> ErrorOr<std::unique_ptr<ObjectFile>>, and we should
propagate the error
>>> appropriately.
>>>
>>> (b) is harder. The current approach is to use std::error_code as a
>>> return type, and store the result in by-reference argument, for
instance:
>>> std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t
&Res);
>>>
>>> I wanted to follow this approach in a proposed large MachO API
change
>>> (http://reviews.llvm.org/D10111
>>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>),
>>> but it raised discussion on whether this approach is right.
>>> Moving this discussion here. I see the following options:
>>>
>>> 1. Use the current approach:
>>> std::error_code getSymbolAddress(DataRefImpl Symbol, uint64_t
&Res);
>>>
>>> Pros:
>>> * don't need to change a large number of (often virtual) API
functions
>>> * has a nice error handling pattern used in LLVM tools:
>>> uint64_t Addr;
>>> if (error(getSymbolAddress(Symbol, Addr)))
>>> return; // or continue, or do anything else.
>>>
>>> Cons:
>>> * return value can just be silently ignored. Adding
warn_unused_result
>>> attribute on per-function basis is ugly
>>> * adds extra overhead for cases when we're certain the result
would be
>>> valid.
>>>
>>> 2. Switch to ErrorOr wrapper:
>>> ErrorOr<uint64_t> getSymbolAddress(DataRefImpl Symbol);
>>>
>>> Pros:
>>> * handling the error is now mandatory and explicit.
>>> * callers can explicitly skip error handling if they know the
result
>>> would be valid:
>>> uint64_t Addr = getSymbolAddress(Symbol).get();
>>> and it would fail the assert if they are wrong.
>>>
>>> Cons:
>>> * need to change lots of old code, or live with two versions of
>>> functions
>>> * error handling boilerplate in regular code on call site is
ugly:
>>> auto AddrOrErr = getSymbolAddress(Symbol);
>>> if (AddrOrErr.hasError())
>>> return; // or continue, or whatever
>>> uint64_t Addr = AddrOrErr.get();
>>> (can probably be improved with a macro)
>>> * adds extra overhead for cases when we're certain the result
would be
>>> valid.
>>>
>>> On IRC discussion Lang suggested
>>> 3. Check the whole object file contents in constructor or
validate()
>>> method, and get rid
>>> of all error codes in regular accessors.
>>>
>>
>> Unfortunately this option isn't possible with the current
libObject.
>> libObject is specifically designed to avoid paging in or allocating any
>> memory beyond simply mmap'ing the file. This makes it annoying to
use, but
>> also allows it to be used for demanding use cases like a linker. For
many
>> things, I've always wished we had a "libEasyObject" that
avoided this
>> annoyance, but it seems like a lot of work compared to the benefit.
>>
>> -- Sean Silva
>>
>>
>>>
>>> Pros:
>>> * the interface is much cleaner
>>> * no extra overhead for trusted (e.g. JIT) object files.
>>>
>>> Cons:
>>> * significant change, fundamentally shifting the way object files
are
>>> handled
>>> * verifier function should now absolutely everything about the
object
>>> file, and anticipate all possible use cases. Especially hard,
assuming that
>>> ObjectFile interface allows user to pass any garbage as input
arguments
>>> (e.g. as DataRefImpl in the example above).
>>> * verifier can be slow, and might be an overkill if we strongly
desire
>>> to parse some bits of object file lazily.
>>>
>>> ===============>>>
>>> Instead of http://reviews.llvm.org/D10111
>>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10111&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=t8fOzjcE-HffAHdlkx8x2RzDYXmOrfwTVRZMBY_I1-k&s=vIEUEH66ZENYUbm5_hrLkGBXJih9kgG91w9SNPrpGGU&e=>,
>>> I'm going to proceed with minimal incremental changes, that
would allow
>>> fuzzer to move forward. Namely, I want to keep the changes to
headers as
>>> small as possible, changing functions one by one, and preferring to
use
>>> ErrorOr<> construct (option 2 listed above). An additional
benefit of this
>>> is that each small incremental change would be accompanied by the
test case
>>> generated by fuzzer, that exposed this problem.
>>>
>>> Let me know if you think it's a good or terrible idea.
>>>
>>> --
>>> Alexey Samsonov
>>> vonosmas at gmail.com
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150601/dd40fac9/attachment.html>