thr3ads.net - llvm dev - [llvm-dev] Fuzzing complex programs [Sep 2015]

Greg Stark via llvm-dev

2015-Aug-30 14:30 UTC

[llvm-dev] Fuzzing complex programs

I have a project I want to do based on Libfuzzer. Is there a separate
list for it or should I bring up any ideas for it here?

What I have in mind is to fuzz Postgres. Trying to fuzz the SQL
interpreter in general is not very productive because traditional
fuzzers try to execute the entire program repeatedly, and it has a
fairly high startup and shutdown cost. Also, the instrumentation-guided
approach has limitations due to the way lexing and parsing work, as
well as the large amount of internal state causing non-deterministic
internal behaviour (garbage collecting persistent data structures, etc).

However there are a number of internal functions that would be very
feasible to fuzz. Things like the datatype input/output functions (I'm
particularly thinking of the datetime parser), regular expression
library, etc.

To do this effectively I think it would be best to invoke the fuzzer
from inside Postgres. Essentially, provide bindings for libFuzzer so
that I can have libFuzzer provide all the test cases to repeatedly
call the internal functions on.

Is there any example of doing something like this already? Am I taking
a crazy approach?

There are other approaches possible. It would be nice if I could run
afl or libfuzzer on a client program and have the client program tell
afl or libfuzzer the pid of the server to watch and then request test
cases to feed to the server. That seems like it would be a more
flexible approach for a lot of use cases where the server requires
setting up a complex environment.

-- 
greg

Brian Cain via llvm-dev

2015-Aug-30 16:11 UTC

On Sun, Aug 30, 2015 at 9:30 AM, Greg Stark via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I have a project I want to do based on Libfuzzer. Is there a separate
> list for it or should I bring up any ideas for it here?
>
> What I have in mind is to fuzz Postgres. Trying to fuzz the SQL
> interpreter in general
> is not very productive because traditional fuzzers try to execute the
> entire program repeatedly and it has a fairly high startup and
> shutdown cost. Also the instrumentation-guided approach has
>
One challenge in leaving the daemon up while testing is knowing how well
isolated the test cases are from one another.  It may be the case that the
test cases somehow accumulate some global state (test case N triggers heap
corruption, N + 23 crashes as a result of that earlier corruption).  At
least that specific failure mode can probably be mitigated by using one or
more of the sanitizers though.

> limitations due to the way lexing and parsing works as well as the
> large amount of
> internal state causing non-deterministic internal behaviour (garbage
> collecting persistent data structures, etc).
>
> However there are a number of internal functions that would be very
> feasible to fuzz. Things like the datatype input/output functions (I'm
> particularly thinking of the datetime parser), regular expression
> library, etc.
>
> To do this effectively I think it would be best to invoke the fuzzer
> from inside Postgres. Essentially provide bindings for Libfuzzer so
> you can I can have Libfuzzer provide all the test cases to repeatedly
> call the internal functions on.
>
> Is there any example of doing something like this already? Am I taking
> a crazy approach?
>
I don't have enough experience to say if it's crazy or not.  But if
your LLVMFuzzerTestOneInput() queues some work for the server and pends on
a response -- that seems like a sane approach.

> There are other approaches possible. It would be nice if I could run
> afl or libfuzzer on a client program and have the client program tell
> afl or libfuzzer the pid of the server to watch and then request test
> cases to feed to the server. That seems like it would be a more
> flexible approach for a lot of use cases where the server requires
> setting up a complex environment.
>
Great idea, but it seems tricky to get the execution coverage feedback in
this case.

Let me know if you're interested in collaborating; it sounds interesting.
Though at first glance, I'd prefer the "not very productive" brute force
option and just toss more resources at it.

-- 
-Brian

Kostya Serebryany via llvm-dev

2015-Aug-30 19:04 UTC

On Sun, Aug 30, 2015 at 9:11 AM, Brian Cain via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Sun, Aug 30, 2015 at 9:30 AM, Greg Stark via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I have a project I want to do based on Libfuzzer. Is there a separate
>> list for it or should I bring up any ideas for it here?
>>
No separate list so far; this one should be good.

>> What I have in mind is to fuzz Postgres. Trying to fuzz the SQL
>> interpreter in general
>> is not very productive because traditional fuzzers try to execute the
>> entire program repeatedly and it has a fairly high startup and
>> shutdown cost. Also the instrumentation-guided approach has
>>
>
> One challenge in leaving the daemon up while testing is knowing how well
> isolated the test cases are from one another.  It may be the case that the
> test cases somehow accumulate some global state (test case N triggers heap
> corruption, N + 23 crashes as a result of that earlier corruption).  At
> least that specific failure mode can probably be mitigated by using one or
> more of the sanitizers though.
>
>
This is true; however, accumulating global state increases the chances of
finding complex bugs (at the price of making those bugs harder to analyze).
We have seen a few such cases, e.g.
https://sourceware.org/bugzilla/show_bug.cgi?id=18043#c11
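
The trade-off discussed here can be sketched with a toy target. Everything
in it is hypothetical (the names g_pending, g_isolate, ResetState and
ProcessInput are made up for illustration, and the "bug" is just a flag);
it only shows how a condition that depends on earlier inputs disappears
once state is reset before every test case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical long-lived state that survives between test cases,
// standing in for whatever a real daemon keeps around.
static int g_pending = 0;

// When true, the target resets state before every input (isolated runs).
static bool g_isolate = false;

// Set when the state-dependent condition is reached. A real target would
// crash or trip a sanitizer instead of setting a flag.
static bool g_bug_seen = false;

static void ResetState() { g_pending = 0; }

// Toy processing step: inputs starting with 'A' queue work, anything else
// drains the queue. Returns true once three items are queued in a row.
static bool ProcessInput(const uint8_t *data, size_t size) {
  if (size > 0 && data[0] == 'A')
    ++g_pending;
  else
    g_pending = 0;
  return g_pending >= 3;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  assert(data != nullptr || size == 0);
  if (g_isolate) ResetState();
  if (ProcessInput(data, size)) g_bug_seen = true;
  return 0;
}
```

With g_isolate left false, three consecutive inputs starting with 'A'
reach the condition, but no single input reproduces it on its own, which
is the extra analysis cost. With g_isolate set, every run is reproducible
and the condition is never reached.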

>> limitations due to the way lexing and parsing works as well as the
>> large amount of
>> internal state causing non-deterministic internal behaviour (garbage
>> collecting persistent data structures, etc).
>>
>> However there are a number of internal functions that would be very
>> feasible to fuzz. Things like the datatype input/output functions (I'm
>> particularly thinking of the datetime parser), regular expression
>> library, etc.
>>
In my (biased) opinion libFuzzer is particularly well suited for this task
(fuzzing individual libraries, as opposed to fuzzing the whole of Postgres).
I've played with a dozen regular expression libs and found bugs in all
of them (e.g. search for "Fuzzer" in
http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup&pathrev=360).


>> To do this effectively I think it would be best to invoke the fuzzer
>> from inside Postgres.
>>
Never tried this.
Can't you just link libFuzzer with a part of the code you want to test?
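
A minimal sketch of what linking libFuzzer against one piece of code looks
like. The parser here is a hypothetical stand-in, not Postgres code:
Date, ParseDate and FormatDate are invented for illustration and accept
only "YYYY-MM-DD". The entry point uses the int-returning
LLVMFuzzerTestOneInput signature from current libFuzzer documentation;
the libFuzzer of this thread's era may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an internal datatype input function.
struct Date { int year, month, day; };

// Accepts exactly "YYYY-MM-DD"; returns true and fills *out on success.
static bool ParseDate(const std::string &s, Date *out) {
  if (s.size() != 10 || s[4] != '-' || s[7] != '-') return false;
  int fields[3] = {0, 0, 0};
  int field = 0;
  for (size_t i = 0; i < s.size(); ++i) {
    if (i == 4 || i == 7) { ++field; continue; }
    if (s[i] < '0' || s[i] > '9') return false;
    fields[field] = fields[field] * 10 + (s[i] - '0');
  }
  if (fields[1] < 1 || fields[1] > 12) return false;
  if (fields[2] < 1 || fields[2] > 31) return false;
  *out = Date{fields[0], fields[1], fields[2]};
  return true;
}

// Hypothetical matching output function.
static std::string FormatDate(const Date &d) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%04d-%02d-%02d", d.year, d.month, d.day);
  return buf;
}

// libFuzzer calls this once per generated input, inside the same process.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string input(data, data + size);
  Date d;
  if (ParseDate(input, &d)) {
    // Round-trip property: anything accepted must print back unchanged.
    assert(FormatDate(d) == input);
  }
  return 0;
}
```

There is no main() because the fuzzer's driver supplies it. With a recent
clang this kind of file is typically built with -fsanitize=fuzzer (often
combined with address), which is a different build route from the one
described later in this thread.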

>> Essentially provide bindings for Libfuzzer so
>> you can I can have Libfuzzer provide all the test cases to repeatedly
>> call the internal functions on.
>>
>> Is there any example of doing something like this already? Am I taking
>> a crazy approach?
>>
>>
> I don't have enough experience to say if it's crazy or not.  But if
> your LLVMFuzzerTestOneInput() queues some work for the server and pends on
> a response -- that seems like a sane approach.
>
>
>> There are other approaches possible. It would be nice if I could run
>> afl or libfuzzer on a client program and have the client program tell
>> afl or libfuzzer the pid of the server to watch and then request test
>> cases to feed to the server. That seems like it would be a more
>> flexible approach for a lot of use cases where the server requires
>> setting up a complex environment.
>>
>>
> Great idea, but it seems tricky to get the execution coverage feedback in
> this case.
>
Not very tricky, but less efficient.
The major benefit of libFuzzer is that it gets the coverage feedback from
inside the process, avoiding any kind of inter-process communication (no
syscalls even). So for things like simple parsers you can get 50K
executions per second (unless the fuzzer finds exponential algorithms in
the parser).


>
> Let me know if you're interested in collaborating, it sounds interesting.
> Though at first glance, I'd prefer the "not very productive" brute force
> option and just toss more resources at it.
>
> --
> -Brian
>

Greg Stark via llvm-dev

2015-Sep-03 11:26 UTC

On Sun, Aug 30, 2015 at 3:30 PM, Greg Stark <stark at mit.edu> wrote:
> To do this effectively I think it would be best to invoke the fuzzer
> from inside Postgres. Essentially provide bindings for Libfuzzer so
> you can I can have Libfuzzer provide all the test cases to repeatedly
> call the internal functions on.
>
> Is there any example of doing something like this already? Am I taking
> a crazy approach?

So on further inspection it seems the API I want, at least for the
in-process plan, is mostly there in LLVMFuzzerNoMain. It would be nice
if I could call the driver with a function pointer and a void*, and it
would call my callback, passing that closure along with the fuzzed
input. But I can probably work around that with a global variable.
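
A minimal sketch of that global-variable workaround, assuming the driver
only ever calls a plain LLVMFuzzerTestOneInput with no user pointer. The
names FuzzCallback, SetFuzzCallback, Stats and CountBytes are hypothetical,
and the call into the driver itself is left out because its exact signature
is not given in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical callback type: closure pointer plus the fuzzed input.
using FuzzCallback = void (*)(void *closure, const uint8_t *data, size_t size);

// The "closure" lives in globals because the fuzzer entry point has no
// parameter through which user data could be passed.
static FuzzCallback g_callback = nullptr;
static void *g_closure = nullptr;

// Register the callback and its closure before starting the driver.
void SetFuzzCallback(FuzzCallback cb, void *closure) {
  assert(cb != nullptr);
  g_callback = cb;
  g_closure = closure;
}

// Trampoline: forwards each fuzzed input to the registered callback.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (g_callback) g_callback(g_closure, data, size);
  return 0;
}

// Example closure and callback: count calls and bytes seen.
struct Stats {
  size_t calls;
  size_t bytes;
};

static void CountBytes(void *closure, const uint8_t *data, size_t size) {
  (void)data;  // contents are not inspected in this example
  Stats *stats = static_cast<Stats *>(closure);
  ++stats->calls;
  stats->bytes += size;
}
```

The cost of this design is that only one callback can be registered per
process and the globals are not thread-safe, which is probably acceptable
for a single in-process fuzzing loop.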

I'm actually kind of frustrated by a more basic problem: the build
system. It seems libFuzzer is meant to be compiled as part of LLVM, but
it didn't get compiled when I built LLVM because I didn't build it
with sanitize-coverage enabled. Now I can't get it to build because I
get errors like:

$ for i in *.cpp ; do clang -c -std=c++11 $i ; done
$ clang -std=c++11 *.o
FuzzerDriver.o: In function `fuzzer::ReadTokensFile(char const*)':
FuzzerDriver.cpp:(.text+0x56): undefined reference to `std::allocator<char>::allocator()'
FuzzerDriver.cpp:(.text+0x6d): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)'
FuzzerDriver.cpp:(.text+0x8d): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()'
FuzzerDriver.cpp:(.text+0x96): undefined reference to `std::allocator<char>::~allocator()'
FuzzerDriver.cpp:(.text+0xab): undefined reference to `std::__cxx11::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >::basic_istringstream(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::_Ios_Openmode)'
FuzzerDriver.cpp:(.text+0x14c): undefined reference to `std::allocator<char>::allocator()'
FuzzerDriver.cpp:(.text+0x166): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)'
FuzzerDriver.cpp:(.text+0x18f): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()'

I also get errors if I try to build it using the LLVM CMake-generated
makefiles (after running "cmake -DLLVM_USE_SANITIZE_COVERAGE=1" in the
libFuzzer directory); in that case the errors say that I need
-std=c++11. Do I need to recompile *all* of LLVM, as if I were going to
fuzz LLVM itself, just to get libFuzzer built?

-- 
greg

mats petersson via llvm-dev

2015-Sep-03 13:55 UTC

I'm fairly sure your compiler (or rather linker) errors come from the
fact that you are not linking against the C++ runtime library: the plain
`clang` driver does not add it at link time. Use `clang++ -std=c++11 *.o`,
and I'm reasonably sure it will do what you want.

--
Mats

