thr3ads.net - llvm dev - [llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes) [Mar 2021]

张驰斌 via llvm-dev

2021-Mar-09 09:12 UTC

[llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Hi Johannes,
       Glad to hear from you! I understand that the title listed on the LLVM
GSoC 2021 webpage serves as a general guideline, but a project proposal might
need to limit its scope and focus on the deliverables. The proposed ideas all
seem quite appealing and relevant to me. I’ve been browsing through
llvm.org/docs/FuzzingLLVM.html and llvm-project/*/tools/*-fuzzer recently, as
well as the YouTube video that you mentioned on the GSoC site. The following are
some questions I’ve accumulated (forgive me if they are too naïve…).

1.      Truth be told, I’ve used OpenMP before for a course project, but I
haven’t looked into its inner workings, e.g. how it actually instruments
programs decorated with #pragma, and how it interacts with the OS’s threading.
If LLVM’s OpenMP implementation hasn’t been fuzzed before, then it surely is a
valuable fuzz target. Could you give some clues on how we could fuzz OpenMP?
For example, by writing a parser for the fuzzer input and calling OpenMP
library functions in the LLVMFuzzerTestOneInput function? Or do we fuzz it
through clang? I’ll look into llvm-project/openmp some more.

2.      For the custom mutator idea: my understanding is that there are
currently two kinds of mutators, the generic one that ships with libFuzzer (bit
flipping, splicing, etc.) and a structural mutator. Is the structural mutator
related to IRMutator.cpp in the FuzzMutate folder?

3.      Most of the bugs found by fuzzers are crashes or hangs.
Correctness testing is interesting but, from my limited knowledge, hard to
achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian?
The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the
transformation is valid. Please correct me if my understanding is wrong…

To be honest, the LLVM passes I have written so far are out-of-tree passes.
I’ve just set up my machine, built LLVM configured with fuzzer support, and
started fiddling around lately. I have a rough picture of what each idea is
about, but it would take some preparation work for me to split them into
incremental steps and deliverables. Since it’s still early in the application
process, I wonder if you could give me some time to research the ideas that you
proposed and ask questions before I finally decide on my project proposal? 😊

I live in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow
(March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time
zone you are located in, so feel free to propose another time slot if those
two are not convenient for you (later that day or on the weekend are both fine).
I hope to have a chat with you soon.

Cheers,
Chibin Zhang
2021.3.9

From: Johannes Doerfert<mailto:johannesdoerfert at gmail.com>
Sent: March 9, 2021 7:17
To: Florian Hahn<mailto:florian_hahn at apple.com>;
llvm-dev<mailto:llvm-dev at lists.llvm.org>
Cc: 张驰斌<mailto:zhangchb1 at shanghaitech.edu.cn>; John
Regehr<mailto:regehr at cs.utah.edu>
Subject: Re: [llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Having Alive2 as oracle would certainly be great.

Some rough ideas that can be worked on in parallel if we have multiple
GSoC students:
  - mutation rules we know are sound, e.g., remove guarantees, add 1
iteration loops, etc.
  - input generation, equivalence checking (alive, partial evaluation, ...)
  - fragment extraction from larger codes + input tracking ->
reproducer splitting, faster equivalence checking, ...

We certainly can come up with more things.

Would either or both of you (or anyone else) be interested in
co-mentoring students?
We have multiple interested ones already, even though my project
description is lacking any detail.

~ Johannes

On 3/8/21 3:34 PM, Florian Hahn wrote:
>
>> On Mar 8, 2021, at 20:26, John Regehr via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi folks, an angle related to IR fuzzing that I would be happy to help
>> out with is using Alive2 as a test oracle.
>>
>> Using Alive2 incurs a set of problems (not all IR features supported,
>> can be very slow) but has corresponding advantages (considers all inputs at
>> once, handles UB gracefully).
>>
> If anyone’s interested in combining LLVM’s libFuzzer & Alive2, I’ve put
> up https://reviews.llvm.org/D96654 which uses Alive2 to verify candidates
> generated by fuzzing. It works out quite well, but I think there’s lots of
> potential to improve the ‘interestingness’ of the IR generated by libFuzzer.
>
> Cheers,
> Florian
>

John Regehr via llvm-dev

2021-Mar-09 18:24 UTC

[llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

> 3. Most of the bugs found by fuzzers are usually crashes or hangs.
> Correctness testing is interesting but hard to achieve from my limited
> knowledge. I wonder if this is related to the ‘Alive’ tool mentioned by
> Florian? The fuzzer provides input to some llvm pass, and ‘Alive’ will
> verify that the transformation is valid. Please correct me if my
> understanding is wrong…
Yes, exactly. Currently what we do is run the LLVM test suite with 
Alive2 watching every transformation and looking for problems -- this 
has found a number of issues.

A similar process, but with inputs supplied by a random IR generator, 
should work quite well.

John

Johannes Doerfert via llvm-dev

2021-Mar-10 16:53 UTC

[llvm-dev] Applying for GSoC 2021(Fuzzing LLVM-IR Passes)

Hi Chibin,

On 3/9/21 3:12 AM, 张驰斌 wrote:
> Hi Johannes,
>         Glad to hear from you! I understand that the title listed on the
> LLVM GSoC 2021 webpage serves as a general guideline, but a project proposal
> might need to limit its scope and focus on the deliverables.
Yes, students will write the actual proposal which should contain more 
details and scope discussion.

>   The proposed ideas all seem quite appealing and relevant to me. I’ve
> been browsing through llvm.org/docs/FuzzingLLVM.html and
> llvm-project/*/tools/*-fuzzer recently, as well as the YouTube video that you
> mentioned on the GSoC site. The following are some questions I’ve accumulated
> (forgive me if they are too naïve…).
Questions are always good.

> 1.      Truth be told, I’ve used OpenMP before for a course project,
> but I haven’t looked into its inner workings, e.g. how it actually
> instruments programs decorated with #pragma, and how it interacts with the
> OS’s threading. If LLVM’s OpenMP implementation hasn’t been fuzzed before,
> then it surely is a valuable fuzz target. Could you give some clues on how we
> could fuzz OpenMP? For example, by writing a parser for the fuzzer input and
> calling OpenMP library functions in the LLVMFuzzerTestOneInput function? Or
> do we fuzz it through clang? I’ll look into llvm-project/openmp some more.
So the OpenMP runtime has an "internal" and an external part. The
internal part is full of undocumented dependences, so I doubt we can fuzz
it without breaking at least one for each test. The external one is
fuzzable, however. That said, generating OpenMP programs to be fed to
clang seems like a good thing to do. OpenMP has its own set of
"documented" dependences, e.g., nesting restrictions, but that is not
necessarily a problem.
If we generate an invalid OpenMP program we should gracefully fail, in 
most cases. If we don't we have good test cases for an OpenMP sanitizer 
later on. We could also embed knowledge about nesting and other OpenMP 
restrictions into the fuzzer/mutation tester/test generator. Long story 
short, generating a large corpus of OpenMP inputs is certainly something 
I'm interested in, we can start with "random" programs and evolve 
towards more targeted approaches.
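
As a rough illustration of the "random programs first, targeted later" idea,
here is a minimal Python sketch of a generator that emits small C programs with
OpenMP pragmas. The FORBIDS table is a hand-written, simplified subset of the
"closely nested" rules (an assumption for illustration, not a full encoding of
the OpenMP specification), and all names (gen_body, render, generate_program,
respect_rules) are hypothetical. Passing respect_rules=False deliberately
allows invalid nestings, which corresponds to the "should fail gracefully"
inputs.

```python
import random

DIRECTIVES = ["parallel", "for", "single", "master", "barrier"]

# Simplified assumption: FORBIDS[outer] lists directives that may not be
# closely nested inside `outer`. A new `parallel` region resets the context.
FORBIDS = {
    "for": {"for", "single", "master", "barrier"},
    "single": {"for", "single", "master", "barrier"},
    "master": {"for", "single", "barrier"},
}


def allowed(directive, context):
    """True if `directive` may appear inside every construct in `context`."""
    return not any(directive in FORBIDS.get(outer, ()) for outer in context)


def gen_body(rng, depth, context, respect_rules=True):
    """Return a list of (kind, children) nodes describing a block."""
    nodes = []
    for _ in range(rng.randint(1, 3)):
        options = ["stmt"]
        if depth > 0:
            options += [d for d in DIRECTIVES
                        if not respect_rules or allowed(d, context)]
        kind = rng.choice(options)
        if kind in ("stmt", "barrier"):
            nodes.append((kind, []))
        else:
            inner = [] if kind == "parallel" else list(context) + [kind]
            nodes.append((kind, gen_body(rng, depth - 1, inner, respect_rules)))
    return nodes


def count_violations(nodes, context=()):
    """Count directives that break the simplified nesting table."""
    total = 0
    for kind, children in nodes:
        if kind in DIRECTIVES and not allowed(kind, context):
            total += 1
        inner = [] if kind == "parallel" else list(context) + [kind]
        total += count_violations(children, inner)
    return total


def render(nodes, indent=1):
    """Turn the node tree into lines of C source."""
    pad = "  " * indent
    out = []
    for kind, children in nodes:
        if kind == "stmt":
            # Atomic update so that the shared counter is not a data race.
            out += [pad + "#pragma omp atomic", pad + "x += 1;"]
        elif kind == "barrier":
            out.append(pad + "#pragma omp barrier")
        elif kind == "for":
            out += [pad + "#pragma omp for",
                    pad + "for (int i = 0; i < 4; ++i) {"]
            out += render(children, indent + 1) + [pad + "}"]
        else:
            out += [pad + "#pragma omp " + kind, pad + "{"]
            out += render(children, indent + 1) + [pad + "}"]
    return out


def generate_program(seed, max_depth=3, respect_rules=True):
    rng = random.Random(seed)
    body = gen_body(rng, max_depth, [], respect_rules)
    lines = ["#include <stdio.h>", "", "int main(void) {", "  int x = 0;"]
    lines += render(body)
    lines += ['  printf("%d\\n", x);', "  return 0;", "}"]
    return "\n".join(lines) + "\n"
```

The string returned by generate_program(seed) could be written to a .c file
and compiled with clang -fopenmp; that driver step is left out of the sketch.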

> 2.      For the custom mutator idea: my understanding is that there are
> currently two kinds of mutators, the generic one that ships with libFuzzer
> (bit flipping, splicing, etc.) and a structural mutator. Is the structural
> mutator related to IRMutator.cpp in the FuzzMutate folder?
I'm not sure myself. I think "structural" here means it fuzzes a
well-defined structure, here protobuf. I might be wrong.

What I was looking for, among other things, is a way to do CFG 
transformations and less obvious IR transformations, maybe:
   - Add a "while-loop" with one iteration around a (set of) block(s) 
(various ways to "hide" the one iteration part)
   - Add a "do-loop" with zero iterations around a (set of) block(s) 
(various ways to "hide" the zero iterations part)
   - Add a call to a function SCC which does effectively nothing but
write to new buffers passed to it or allocated within.
   - Add branches that will not be taken with various targets, 
unreachable, some arbitrary block in the function, etc.
   - Add arguments to functions that are effectively useless.
   - ...

We would do those and record if and how the change impacted passes or 
the entire O3 pipeline. Learn about our heuristics and cutoffs and such, 
build a database, etc.
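
To make the "branch that will not be taken" item concrete, here is a small
Python sketch on a toy control-flow graph rather than on real LLVM IR. Blocks
are (action, terminator) pairs of Python callables, the mutation guards a bogus
edge with an opaque predicate that is always false, and a record is kept per
mutation. Everything here (run, opaque_false, add_dead_branch,
record_mutations) is hypothetical, and the "trace_unchanged" field is only a
stand-in for whatever pass statistics one would actually record.

```python
import random


def run(cfg, entry, state, limit=1000):
    """Execute a toy CFG and return the list of visited block labels."""
    trace, label = [], entry
    while label is not None and len(trace) < limit:
        trace.append(label)
        action, terminator = cfg[label]
        action(state)
        label = terminator(state)
    return trace


def opaque_false(state):
    # x * (x + 1) is a product of consecutive integers and therefore even,
    # so this condition never holds, although that is not obvious locally.
    x = state["x"]
    return (x * (x + 1)) % 2 == 1


def add_dead_branch(cfg, rng):
    """Return a copy of cfg in which one block gets a never-taken extra edge."""
    victim = rng.choice(sorted(cfg))
    bogus_target = rng.choice(sorted(cfg))
    action, old_terminator = cfg[victim]

    def new_terminator(state):
        if opaque_false(state):
            return bogus_target
        return old_terminator(state)

    mutated = dict(cfg)
    mutated[victim] = (action, new_terminator)
    return mutated, victim, bogus_target


def example_cfg():
    """A countdown loop: entry -> head -> (body -> head)* -> exit."""
    def nop(state):
        pass

    def decrement(state):
        state["x"] -= 1

    return {
        "entry": (nop, lambda s: "head"),
        "head": (nop, lambda s: "body" if s["x"] > 0 else "exit"),
        "body": (decrement, lambda s: "head"),
        "exit": (nop, lambda s: None),
    }


def record_mutations(seeds, x=3):
    """Apply one mutation per seed and record whether behavior changed."""
    baseline = run(example_cfg(), "entry", {"x": x})
    records = []
    for seed in seeds:
        mutated, victim, target = add_dead_branch(example_cfg(),
                                                  random.Random(seed))
        trace = run(mutated, "entry", {"x": x})
        records.append({"seed": seed, "block": victim, "target": target,
                        "trace_unchanged": trace == baseline})
    return records
```

In a real setup the interesting column would not be the trace but how a pass
or the O3 pipeline reacts to the mutated function compared with the original.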

>
> 3.      Most of the bugs found by fuzzers are crashes or hangs.
> Correctness testing is interesting but, from my limited knowledge, hard to
> achieve. I wonder if this is related to the ‘Alive’ tool mentioned by Florian?
> The fuzzer provides input to some LLVM pass, and ‘Alive’ verifies that the
> transformation is valid. Please correct me if my understanding is wrong…
Yes, that is the idea. If we fuzz blindly, as opposed to guided test 
mutation or synthesis, we will generate a lot of garbage inputs which 
can only be used to detect crashes and hangs. However, given Alive we 
can verify if the output of the compiler is an implementation of the 
input, for some cases.
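
The loop described here (generate an input, run a transformation, ask an oracle
whether the output still implements the input) can be sketched in Python on a
toy scale. This is not Alive2 and does not touch LLVM: expressions are tuples
over a single 8-bit variable, the "pass" is a peephole with one sound and one
deliberately unsound rule, and the oracle simply enumerates all 256 inputs. All
names (evaluate, optimize, counterexample, random_expr, fuzz) are hypothetical.

```python
import random

MASK = 0xFF  # model 8-bit wrapping integers so every input can be enumerated


def evaluate(expr, x):
    """Evaluate a tuple expression for the input value x."""
    op = expr[0]
    if op == "x":
        return x
    if op == "const":
        return expr[1] & MASK
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    if op == "add":
        return (a + b) & MASK
    if op == "mul":
        return (a * b) & MASK
    if op == "udiv":
        return a // b  # the generator below never emits a zero divisor
    raise ValueError(op)


def optimize(expr):
    """Toy peephole 'pass' with one sound and one deliberately unsound rule."""
    if expr[0] in ("x", "const"):
        return expr
    op, a, b = expr[0], optimize(expr[1]), optimize(expr[2])
    if op == "add" and b == ("const", 0):
        return a  # sound: a + 0 == a
    if op == "udiv" and a[0] == "mul" and a[2] == b:
        return a[1]  # unsound: (a * c) / c != a once a * c wraps around
    return (op, a, b)


def counterexample(src, tgt):
    """Brute-force oracle: first input on which src and tgt differ, or None."""
    for x in range(MASK + 1):
        if evaluate(src, x) != evaluate(tgt, x):
            return x
    return None


def random_expr(rng, depth):
    """Generate a random expression of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.6 else ("const", rng.randint(0, 3))
    op = rng.choice(["add", "mul", "udiv"])
    lhs = random_expr(rng, depth - 1)
    if op == "udiv":
        return (op, lhs, ("const", rng.randint(1, 3)))  # nonzero divisor only
    return (op, lhs, random_expr(rng, depth - 1))


def fuzz(seed, iterations=2000):
    """Return (source, optimized, failing_input) for every miscompile found."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        src = random_expr(rng, 3)
        tgt = optimize(src)
        x = counterexample(src, tgt)
        if x is not None:
            failures.append((src, tgt, x))
    return failures
```

The unsound rule is the kind of bug that only a correctness oracle catches: the
optimized expression never crashes or hangs, it just computes a different value
for inputs where the multiplication wraps. Exhaustive enumeration works only
because the toy domain is tiny, which is where a tool that reasons about all
inputs at once comes in.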

> To be honest, the LLVM passes I have written so far are out-of-tree passes.
> I’ve just set up my machine, built LLVM configured with fuzzer support, and
> started fiddling around lately. I have a rough picture of what each idea is
> about, but it would take some preparation work for me to split them into
> incremental steps and deliverables. Since it’s still early in the application
> process, I wonder if you could give me some time to research the ideas that
> you proposed and ask questions before I finally decide on my project
> proposal? 😊
As mentioned, students write the proposal. You should determine which of
the "areas" you like best and then do some research towards that. We can
stay in contact while you start writing up what you want to do.

>
> I live in Shanghai, in the GMT+8 time zone. How about 15:00 tomorrow
> (March 10), or 13:30 on Friday afternoon (March 12)? I am not sure which time
> zone you are located in, so feel free to propose another time slot if those
> two are not convenient for you (later that day or on the weekend are both
> fine). I hope to have a chat with you soon.
This week is full, I'll get back to you.

~ Johannes
