thr3ads.net - llvm dev - [llvm-dev] RFC: Extend UBSan with qsort checks [Jan 2016]

If this information is useful, please help other people find it:
Share via:

Yuri Gribov via llvm-dev

2016-Jan-14 06:37 UTC

[llvm-dev] RFC: Extend UBSan with qsort checks

On Thu, Jan 14, 2016 at 3:43 AM, Paul Pluzhnikov <ppluzhnikov at
google.com> wrote:> On Wed, Jan 13, 2016 at 4:17 PM, Kostya Serebryany <kcc at
google.com> wrote:
>> Inviting Paul to the party (he wrote the libstdc++ sort checker).
>>
>> On Tue, Jan 12, 2016 at 11:09 PM, Yury Gribov <y.gribov at
samsung.com> wrote:
>>>
>>> On 01/13/2016 10:08 AM, Yury Gribov wrote:
>>>>
>>>> On 01/13/2016 09:57 AM, Kostya Serebryany wrote:
>>>>>
>>>>> On Tue, Jan 12, 2016 at 10:28 PM, Yury Gribov <y.gribov
at samsung.com>
>>>>> wrote:
>>>>>
>>>>>> On 01/13/2016 03:10 AM, Kostya Serebryany wrote:
>>>>>>
>>>>>>> FTR, here is one way to implement this in the
library:
>>>>>>>
>>>>>>>
>>>>>>>
https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_9/libstdc++-v3/include/bits/stl_algo.h
>>>>>>>
>>>>>>> Search for "check sort predicate for strict
weak ordering"
>>>>>>>
>>>>>>
>>>>>> Nice, although this wouldn't catch violations of
transitivity (which is
>>>>>> probably the most important type of bug).
>>>>>>
>>>>>>
>>>>> I thought it would...
>>>>
>>>>
>>>> But debug operator() only knows about 2 inputs whereas for
transitivity
>>>> you'll need to consider at least 3.
>
> As I privately replied to Kostya, I was only willing to add constant
overhead.
Sure, there are various conflicting tradeoffs. SortChecker started as
an experiment so I was as agressive as possible.
> SortChecker adds N*N overhead:
> https://github.com/yugr/sortcheck/blob/master/src/sortchecker.c#L427
> except he caps it at N=32 on line 413
I'm afraid it's even N^3 (with N=32 cap). This indeed sounds scary but
I have not seen significant slowdowns when running instrumented
distro.
> (and thus will not detect all
> violations either).
Right, I thought about improving this with testing random 32 elements
instead of the first ones.
> Adding up to 1024 comparisons to large sort()s maybe isn't so bad.
>
>>>>> Can you give a test?
>>>
>>>
>>> Here's an example of transitivity violation in GCC:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68988
>
> Here is one from GDB:
> https://sourceware.org/bugzilla/show_bug.cgi?id=19361
Yeah but this one is still in discussion so I didn't bother to mention it.
>>>>>> On Tue, Jan 12, 2016 at 2:57 PM, Alexey Samsonov via
llvm-dev <
>>>>>>>
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> (+correct cfe-dev list)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 12, 2016 at 2:57 PM, Alexey
Samsonov <vonosmas at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Yuri,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 11, 2016 at 9:53 AM, Yury
Gribov via llvm-dev <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> UndefinedBehaviorSanitizer currently
does not check for undefined
>>>>>>>>>> behaviors which result from improper
usage of standard library
>>>>>>>>>> functions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> It's not really an undefined behavior
per se: you just violate the
>>>>>>>>> contract provided by library functions (at
least I haven't found a
>>>>>>>>> clause
>>>>>>>>> in C++ standard
>>>>>>>>> that specifies that the behavior of smth.
like std::sort() with
>>>>>>>>> invalid
>>>>>>>>> operator< is invalid). The actual
behavior of these functions is
>>>>>>>>> determined by library authors, and can be
anything from UB to
>>>>>>>>> unspecified
>>>>>>>>> result, to nicely formatted error report
(if you're using a debug
>>>>>>>>> version
>>>>>>>>> of system library):
>>>>>>>>> I think I actually encountered all these
cases.
>>>>>>>>>
>>>>>>>>> That said, the benefit of checking the
sorting routines is clear:
>>>>>>>>> I'm
>>>>>>>>> not
>>>>>>>>> surprised you were able to catch numerous
bugs with your tool.
>>>>>>>>>
>>>>>>>>> UBSan currently doesn't have
interceptors, and all of its checks
>>>>>>>>> [1] are
>>>>>>>>> designed to work without runtime library
support, so that it can be
>>>>>>>>> used as
>>>>>>>>> a mitigation
>>>>>>>>> tool (with -fsanitize-trap=undefined).
It's challenging to add UBSan
>>>>>>>>> interceptors at this point: this would
limit the tool portability
>>>>>>>>> (at
>>>>>>>>> least
>>>>>>>>> portability of some functionality),
>>>>>>>>> and complicate interoperation of UBSan with
another tools. There
>>>>>>>>> is an
>>>>>>>>> opportunity to, say, add qsort()
interceptor to ASan/TSan/MSan, and
>>>>>>>>> check
>>>>>>>>> arguments there.
>>>>>>>>> This functionality can be controlled by a
runtime flag.
>>>>>>>>>
>>>>>>>>> Why do you need compiler instrumentation?
Do you want to
>>>>>>>>> automatically
>>>>>>>>> inject hooks to SortChecker into
std::sort() functions, so that you
>>>>>>>>> don't
>>>>>>>>> need to annotate C++ standard library code?
>>>>>>>>> We submitted some annotations to libc++
code (e.g. to report
>>>>>>>>> containter
>>>>>>>>> overflow bugs in sanitizers).
>>>>>>>>>
>>>>>>>>> [1] -fsanitize=vptr is an only notable
exception
>>>>>>>>>
>>>>>>>>> One notorious instance of such errors is
invalid usage of qsort or
>>>>>>>>>
>>>>>>>>>> bsearch routines (or std::sort and
friends in case of C++):
>>>>>>>>>> * using comparison function that
violates ordering axioms
>>>>>>>>>> (reflexivity,
>>>>>>>>>> symmetry, transitivity)
>>>>>>>>>> * returning unstable results from
comparison function
>>>>>>>>>> * passing unsorted array to bsearch
>>>>>>>>>> * etc.
>>>>>>>>>>
>>>>>>>>>> Errors like this will usually result in
slightly invalid output
>>>>>>>>>> (not
>>>>>>>>>> fully sorted arrays, rare failed
bsearches, etc.) but may as well
>>>>>>>>>> cause
>>>>>>>>>> aborts on some systems (
>>>>>>>>>>
https://bugzilla.samba.org/show_bug.cgi?id=3959
>>>>>>>>>> ).
>>>>>>>>>>
>>>>>>>>>> I've recently developed a simple
proof-of-concept tool SortChecker
>>>>>>>>>> (
>>>>>>>>>> https://github.com/yugr/sortcheck)
which intercepts calls to
>>>>>>>>>> qsort and
>>>>>>>>>> friends (via LD_PRELOAD) and performs
various sanity checks before
>>>>>>>>>> passing
>>>>>>>>>> control to libc e.g.
>>>>>>>>>> * sign(cmp(a, b)) == - sign(cmp(b, a))
for all array elements
>>>>>>>>>> * etc.
>>>>>>>>>>
>>>>>>>>>> Results were quite inspiring: I've
found several errors in popular
>>>>>>>>>> open-source packages (GCC, Harfbuzz,
dpkg, libXt, etc.). I'm also
>>>>>>>>>> pretty
>>>>>>>>>> sure most C++ developers have failed to
write correct comparison
>>>>>>>>>> function
>>>>>>>>>> at least once.
>>>>>>>>>>
>>>>>>>>>> Could SortChecker functionality be
considered for integration into
>>>>>>>>>> UBSan? Extending SortChecker to C++
land would require
>>>>>>>>>> significant work
>>>>>>>>>> which has already been done in
sanitizers (compiler
>>>>>>>>>> instrumentation,
>>>>>>>>>> portable parallel code, etc.). On the
other hand, ability to
>>>>>>>>>> detect new
>>>>>>>>>> classes of bugs should benefit UBSan
and community.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Y
>>>>>>>>>>
_______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Alexey Samsonov
>>>>>>>>> vonosmas at gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Alexey Samsonov
>>>>>>>> vonosmas at gmail.com
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Paul Pluzhnikov

Paul Pluzhnikov via llvm-dev

2016-Jan-14 07:06 UTC

head link

[llvm-dev] RFC: Extend UBSan with qsort checks

On Wed, Jan 13, 2016 at 10:37 PM, Yuri Gribov <tetra2005 at gmail.com>
wrote:
> I'm afraid it's even N^3 (with N=32 cap). This indeed sounds scary
but
> I have not seen significant slowdowns when running instrumented
> distro.
Have you looked at how many sort invocations there are, and what their
size distribution is?

Perhaps naively, I expect very few sort()s to be performed in day to
day distro operation, and even fewer sort()s with more than 20
elements.
> Right, I thought about improving this with testing random 32 elements
> instead of the first ones.
That has its own disadvantages: it makes the failure detection
non-repeatable, and makes it harder to understand the conditions under
which the predicate is broken.

-- 
Paul Pluzhnikov

Yuri Gribov via llvm-dev

2016-Jan-14 09:51 UTC

head link

[llvm-dev] RFC: Extend UBSan with qsort checks

On Thu, Jan 14, 2016 at 10:06 AM, Paul Pluzhnikov
<ppluzhnikov at google.com> wrote:> On Wed, Jan 13, 2016 at 10:37 PM, Yuri Gribov <tetra2005 at
gmail.com> wrote:
>
>> I'm afraid it's even N^3 (with N=32 cap). This indeed sounds
scary but
>> I have not seen significant slowdowns when running instrumented
>> distro.
>
> Have you looked at how many sort invocations there are, and what their
> size distribution is?
>
> Perhaps naively, I expect very few sort()s to be performed in day to
> day distro operation, and even fewer sort()s with more than 20
> elements.
That's my feeling as well but I never had time to really profile this.
>> Right, I thought about improving this with testing random 32 elements
>> instead of the first ones.
>
> That has its own disadvantages: it makes the failure detection
> non-repeatable, and makes it harder to understand the conditions under
> which the predicate is broken.
Absolutely (although could be mitigated by printing the seed together
with error message).
> --
> Paul Pluzhnikov

llvm dev - Jan 2016 - RFC: Extend UBSan with qsort checks

[llvm-dev] RFC: Extend UBSan with qsort checks

[llvm-dev] RFC: Extend UBSan with qsort checks

[llvm-dev] RFC: Extend UBSan with qsort checks