Yuri Gribov via llvm-dev
2016-Jan-14 06:37 UTC
[llvm-dev] RFC: Extend UBSan with qsort checks
On Thu, Jan 14, 2016 at 3:43 AM, Paul Pluzhnikov <ppluzhnikov at google.com> wrote:> On Wed, Jan 13, 2016 at 4:17 PM, Kostya Serebryany <kcc at google.com> wrote: >> Inviting Paul to the party (he wrote the libstdc++ sort checker). >> >> On Tue, Jan 12, 2016 at 11:09 PM, Yury Gribov <y.gribov at samsung.com> wrote: >>> >>> On 01/13/2016 10:08 AM, Yury Gribov wrote: >>>> >>>> On 01/13/2016 09:57 AM, Kostya Serebryany wrote: >>>>> >>>>> On Tue, Jan 12, 2016 at 10:28 PM, Yury Gribov <y.gribov at samsung.com> >>>>> wrote: >>>>> >>>>>> On 01/13/2016 03:10 AM, Kostya Serebryany wrote: >>>>>> >>>>>>> FTR, here is one way to implement this in the library: >>>>>>> >>>>>>> >>>>>>> https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_9/libstdc++-v3/include/bits/stl_algo.h >>>>>>> >>>>>>> Search for "check sort predicate for strict weak ordering" >>>>>>> >>>>>> >>>>>> Nice, although this wouldn't catch violations of transitivity (which is >>>>>> probably the most important type of bug). >>>>>> >>>>>> >>>>> I thought it would... >>>> >>>> >>>> But debug operator() only knows about 2 inputs whereas for transitivity >>>> you'll need to consider at least 3. > > As I privately replied to Kostya, I was only willing to add constant overhead.Sure, there are various conflicting tradeoffs. SortChecker started as an experiment so I was as agressive as possible.> SortChecker adds N*N overhead: > https://github.com/yugr/sortcheck/blob/master/src/sortchecker.c#L427 > except he caps it at N=32 on line 413I'm afraid it's even N^3 (with N=32 cap). This indeed sounds scary but I have not seen significant slowdowns when running instrumented distro.> (and thus will not detect all > violations either).Right, I thought about improving this with testing random 32 elements instead of the first ones.> Adding up to 1024 comparisons to large sort()s maybe isn't so bad. > >>>>> Can you give a test? >>> >>> >>> Here's an example of transitivity violation in GCC: >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68988 > > Here is one from GDB: > https://sourceware.org/bugzilla/show_bug.cgi?id=19361Yeah but this one is still in discussion so I didn't bother to mention it.>>>>>> On Tue, Jan 12, 2016 at 2:57 PM, Alexey Samsonov via llvm-dev < >>>>>>> >>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>> (+correct cfe-dev list) >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jan 12, 2016 at 2:57 PM, Alexey Samsonov <vonosmas at gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Yuri, >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Jan 11, 2016 at 9:53 AM, Yury Gribov via llvm-dev < >>>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> UndefinedBehaviorSanitizer currently does not check for undefined >>>>>>>>>> behaviors which result from improper usage of standard library >>>>>>>>>> functions. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> It's not really an undefined behavior per se: you just violate the >>>>>>>>> contract provided by library functions (at least I haven't found a >>>>>>>>> clause >>>>>>>>> in C++ standard >>>>>>>>> that specifies that the behavior of smth. like std::sort() with >>>>>>>>> invalid >>>>>>>>> operator< is invalid). The actual behavior of these functions is >>>>>>>>> determined by library authors, and can be anything from UB to >>>>>>>>> unspecified >>>>>>>>> result, to nicely formatted error report (if you're using a debug >>>>>>>>> version >>>>>>>>> of system library): >>>>>>>>> I think I actually encountered all these cases. >>>>>>>>> >>>>>>>>> That said, the benefit of checking the sorting routines is clear: >>>>>>>>> I'm >>>>>>>>> not >>>>>>>>> surprised you were able to catch numerous bugs with your tool. >>>>>>>>> >>>>>>>>> UBSan currently doesn't have interceptors, and all of its checks >>>>>>>>> [1] are >>>>>>>>> designed to work without runtime library support, so that it can be >>>>>>>>> used as >>>>>>>>> a mitigation >>>>>>>>> tool (with -fsanitize-trap=undefined). It's challenging to add UBSan >>>>>>>>> interceptors at this point: this would limit the tool portability >>>>>>>>> (at >>>>>>>>> least >>>>>>>>> portability of some functionality), >>>>>>>>> and complicate interoperation of UBSan with another tools. There >>>>>>>>> is an >>>>>>>>> opportunity to, say, add qsort() interceptor to ASan/TSan/MSan, and >>>>>>>>> check >>>>>>>>> arguments there. >>>>>>>>> This functionality can be controlled by a runtime flag. >>>>>>>>> >>>>>>>>> Why do you need compiler instrumentation? Do you want to >>>>>>>>> automatically >>>>>>>>> inject hooks to SortChecker into std::sort() functions, so that you >>>>>>>>> don't >>>>>>>>> need to annotate C++ standard library code? >>>>>>>>> We submitted some annotations to libc++ code (e.g. to report >>>>>>>>> containter >>>>>>>>> overflow bugs in sanitizers). >>>>>>>>> >>>>>>>>> [1] -fsanitize=vptr is an only notable exception >>>>>>>>> >>>>>>>>> One notorious instance of such errors is invalid usage of qsort or >>>>>>>>> >>>>>>>>>> bsearch routines (or std::sort and friends in case of C++): >>>>>>>>>> * using comparison function that violates ordering axioms >>>>>>>>>> (reflexivity, >>>>>>>>>> symmetry, transitivity) >>>>>>>>>> * returning unstable results from comparison function >>>>>>>>>> * passing unsorted array to bsearch >>>>>>>>>> * etc. >>>>>>>>>> >>>>>>>>>> Errors like this will usually result in slightly invalid output >>>>>>>>>> (not >>>>>>>>>> fully sorted arrays, rare failed bsearches, etc.) but may as well >>>>>>>>>> cause >>>>>>>>>> aborts on some systems ( >>>>>>>>>> https://bugzilla.samba.org/show_bug.cgi?id=3959 >>>>>>>>>> ). >>>>>>>>>> >>>>>>>>>> I've recently developed a simple proof-of-concept tool SortChecker >>>>>>>>>> ( >>>>>>>>>> https://github.com/yugr/sortcheck) which intercepts calls to >>>>>>>>>> qsort and >>>>>>>>>> friends (via LD_PRELOAD) and performs various sanity checks before >>>>>>>>>> passing >>>>>>>>>> control to libc e.g. >>>>>>>>>> * sign(cmp(a, b)) == - sign(cmp(b, a)) for all array elements >>>>>>>>>> * etc. >>>>>>>>>> >>>>>>>>>> Results were quite inspiring: I've found several errors in popular >>>>>>>>>> open-source packages (GCC, Harfbuzz, dpkg, libXt, etc.). I'm also >>>>>>>>>> pretty >>>>>>>>>> sure most C++ developers have failed to write correct comparison >>>>>>>>>> function >>>>>>>>>> at least once. >>>>>>>>>> >>>>>>>>>> Could SortChecker functionality be considered for integration into >>>>>>>>>> UBSan? Extending SortChecker to C++ land would require >>>>>>>>>> significant work >>>>>>>>>> which has already been done in sanitizers (compiler >>>>>>>>>> instrumentation, >>>>>>>>>> portable parallel code, etc.). On the other hand, ability to >>>>>>>>>> detect new >>>>>>>>>> classes of bugs should benefit UBSan and community. >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> -Y >>>>>>>>>> _______________________________________________ >>>>>>>>>> LLVM Developers mailing list >>>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Alexey Samsonov >>>>>>>>> vonosmas at gmail.com >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Alexey Samsonov >>>>>>>> vonosmas at gmail.com >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> llvm-dev at lists.llvm.org >>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > > > > -- > Paul Pluzhnikov
Paul Pluzhnikov via llvm-dev
2016-Jan-14 07:06 UTC
[llvm-dev] RFC: Extend UBSan with qsort checks
On Wed, Jan 13, 2016 at 10:37 PM, Yuri Gribov <tetra2005 at gmail.com> wrote:> I'm afraid it's even N^3 (with N=32 cap). This indeed sounds scary but > I have not seen significant slowdowns when running instrumented > distro.Have you looked at how many sort invocations there are, and what their size distribution is? Perhaps naively, I expect very few sort()s to be performed in day to day distro operation, and even fewer sort()s with more than 20 elements.> Right, I thought about improving this with testing random 32 elements > instead of the first ones.That has its own disadvantages: it makes the failure detection non-repeatable, and makes it harder to understand the conditions under which the predicate is broken. -- Paul Pluzhnikov
Yuri Gribov via llvm-dev
2016-Jan-14 09:51 UTC
[llvm-dev] RFC: Extend UBSan with qsort checks
On Thu, Jan 14, 2016 at 10:06 AM, Paul Pluzhnikov <ppluzhnikov at google.com> wrote:> On Wed, Jan 13, 2016 at 10:37 PM, Yuri Gribov <tetra2005 at gmail.com> wrote: > >> I'm afraid it's even N^3 (with N=32 cap). This indeed sounds scary but >> I have not seen significant slowdowns when running instrumented >> distro. > > Have you looked at how many sort invocations there are, and what their > size distribution is? > > Perhaps naively, I expect very few sort()s to be performed in day to > day distro operation, and even fewer sort()s with more than 20 > elements.That's my feeling as well but I never had time to really profile this.>> Right, I thought about improving this with testing random 32 elements >> instead of the first ones. > > That has its own disadvantages: it makes the failure detection > non-repeatable, and makes it harder to understand the conditions under > which the predicate is broken.Absolutely (although could be mitigated by printing the seed together with error message).> -- > Paul Pluzhnikov