thr3ads.net - llvm dev - [LLVMdev] Google SoC Proposal Draft [Mar 2007]

Tilmann Scheller

2007-Mar-24 23:41 UTC

[LLVMdev] Google SoC Proposal Draft

Hello,

here's my proposal for a GSoC project with LLVM. I'm happy for any 
feedback or advice you can give me.

Thanks in Advance

Tilmann


* Proposal for Google Summer of Code Project

** Using LLVM as a backend for QEMU's dynamic binary translation


*** Abstract:
The goal of this project is to modify the QEMU dynamic binary translator 
to use components of the LLVM compiler infrastructure to turn it into a 
highly optimizing dynamic binary translator in order to increase the 
performance of QEMU even further. Instead of directly emitting code for 
the host architecture QEMU is running on, the target code is first 
translated to LLVM IR, then a selection of LLVM's optimization functions 
is applied to the IR and as a last step the LLVM JIT is used to generate 
code from the optimized IR for the host architecture. Detailed speed 
measurements will be performed in order to evaluate the efficiency of 
this approach, especially in comparison to the approach currently used 
by QEMU.
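
As a rough illustration of the three stages described in the abstract (translate guest code to an IR, optimize the IR, generate host code), here is a toy sketch in Python. Nothing in it is QEMU or LLVM code: the guest instruction tuples, the IR format, and the functions `translate`, `fold_constants` and `codegen` are all hypothetical stand-ins, and the "host code" is just a Python closure.

```python
# Toy model of the proposed pipeline: guest code -> IR -> optimized IR -> host code.
# All names and formats here are hypothetical; no QEMU or LLVM API is used.

def translate(guest_block):
    """Stage 1: turn guest instructions into a simple three-address IR."""
    ir = []
    for op, dst, a, b in guest_block:
        ir.append({"op": op, "dst": dst, "args": (a, b)})
    return ir

def fold_constants(ir):
    """Stage 2: one stand-in optimization. An 'add' of two integer
    literals is replaced by a 'mov' of the precomputed sum."""
    out = []
    for ins in ir:
        a, b = ins["args"]
        if ins["op"] == "add" and isinstance(a, int) and isinstance(b, int):
            out.append({"op": "mov", "dst": ins["dst"], "args": (a + b, None)})
        else:
            out.append(ins)
    return out

def codegen(ir):
    """Stage 3: 'generate host code', here a Python closure over the IR."""
    def value(regs, x):
        # Operands are either register names (str) or integer literals.
        return regs[x] if isinstance(x, str) else x

    def run(regs):
        for ins in ir:
            a, b = ins["args"]
            if ins["op"] == "mov":
                regs[ins["dst"]] = value(regs, a)
            elif ins["op"] == "add":
                regs[ins["dst"]] = value(regs, a) + value(regs, b)
            else:
                raise ValueError("unknown op: " + ins["op"])
        return regs
    return run

guest_block = [
    ("add", "r0", 2, 3),        # r0 = 2 + 3 (foldable at translation time)
    ("add", "r1", "r0", "r2"),  # r1 = r0 + r2 (depends on run-time state)
]
optimized = fold_constants(translate(guest_block))
host_fn = codegen(optimized)
result = host_fn({"r2": 10})
```

In this model the first instruction is folded into a plain move before any "host code" exists, which is the kind of saving the optimization stage is meant to provide.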


*** Benefits:
QEMU will largely benefit from this project through an expected increase 
in speed, while remaining portable.
Through this project LLVM will effectively get frontends for all target 
architectures supported by QEMU (at the moment this are x86, ARM, SPARC, 
PowerPC and MIPS). This offers many opportunities and new fields for the 
application of LLVM on binary code e.g. optimization of binaries where 
no source code is available. Also since the LLVM JIT will be used for 
the final code generation QEMU can be hosted on any architecture 
targetted by the LLVM JIT (at the moment this are x86, x86-64, PowerPC 
and PowerPC 64), at least concerning code generation. Further 
adjustments to QEMU might be necessary though to get QEMU to run on a 
certain architecture which is supported by the LLVM JIT but not by QEMU. 
This project will show the applicability of LLVM in an emulation 
environment, especially in regard to dynamic binary translation. It can 
also be used as a basis to try out concepts like profile-guided 
optimization or static optimization in the context of an emulator.


*** Deliverables:
- a version of QEMU with an optimizing dynamic binary translator 
utilizing LLVM components
- a set of test suites which are created during the development
- all necessary documentation to understand and be able to maintain the 
software


*** Plan:
The development of the software will be done within the three month 
timeframe of GSoC. Weekly status reports will be given.

Week 1, 2:
      - get familiar with LLVM and QEMU
      - write small test programs for certain LLVM components, or even a 
simple prototype
      - get to know LLVM example programs
Week 3, 4:
      - modify QEMU's dynamic binary translator to emit LLVM IR
      - create tests to verify the translation
Week 5, 6:
      - integrate LLVM JIT into QEMU's dynamic binary translator
      - perform first speed measurements
Week 7, 8:
      - integrate LLVM optimizations into QEMU
      - perform more speed measurements, select useful optimizations
Week 9, 10:
      - test the system extensively
      - write final documentation
Week 11, 12:
      - time buffer to deal with unexpected events


*** Qualification:
I'm a graduate student studying Software Engineering at the University 
of Stuttgart in Germany. I have a strong interest in compiler technology 
and see this project as a great opportunity to gain experience in this 
field. I have taken a compiler building class and plan to focus my 
future studies in this area.
Emulation is another area I'm interested in. I wrote a Game Boy Advance 
emulator in C from scratch and a GP32 emulator based on QEMU (also C). 
While doing this I gained a basic understanding of the QEMU codebase.
I'm currently involved in a university project which develops a testing 
tool for glass box tests for Java and COBOL, which allows to gather 
certain coverage metrics, and which will be opensourced later this year.
I have decent experience with C and Java and I'm familiar with C++. Also 
I have a deep understanding of the ARM architecture and I'm familiar 
with x86.
This project is a big chance for me to give something back to the open 
source community, especially since both LLVM and QEMU can profit from 
this project.

Reid Spencer

2007-Mar-25 00:27 UTC

[LLVMdev] Google SoC Proposal Draft

Hi Tilmann,

Thanks for submitting this. Here's some feedback.

On Sun, 2007-03-25 at 00:41 +0100, Tilmann Scheller wrote:
> Hello,
> 
> here's my proposal for a GSoC project with LLVM. I'm happy for any 
> feedback or advice you can give me.
> 
> Thanks in Advance
> 
> Tilmann
> 
> 
> * Proposal for Google Summer of Code Project
> 
> ** Using LLVM as a backend for QEMU's dynamic binary translation
> 
> 
> *** Abstract:
> The goal of this project is to modify the QEMU dynamic binary translator 
> to use components of the LLVM compiler infrastructure to turn it into a 
> highly optimizing dynamic binary translator in order to increase the 
> performance of QEMU even further. Instead of directly emitting code for 
> the host architecture QEMU is running on, the target code is first 
> translated to LLVM IR, then a selection of LLVM's optimization functions
> is applied to the IR and as a last step the LLVM JIT is used to generate 
> code from the optimized IR for the host architecture. Detailed speed 
> measurements will be performed in order to evaluate the efficiency of 
> this approach, especially in comparison to the approach currently used 
> by QEMU.
One thing I find lacking here is any mention of how you'll address the
extra time taken to generate, optimize and code gen with LLVM. If I
understand QEMU, it translates to an intermediate representation and
then (fairly directly) executes on the native machine. I assume you will
translate this intermediate representation to LLVM IR, run passes, and
then JIT. You probably only want to do that if you know the function is
going to be called a lot or it has loops, etc. The cost of a second
layer of representation (LLVM IR), optimization and code generation is
non-trivial and could easily dwarf the execution time of the
functions/programs involved. Do you have a strategy for addressing this?
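
One common way to handle this concern is to pay the optimization cost only for code that proves to be hot. The sketch below illustrates that idea and is not a design taken from this thread: blocks start on a cheap path and are retranslated on the expensive path once an execution counter reaches a threshold. `HOT_THRESHOLD`, `cheap_translate`, `optimizing_translate` and `TranslationCache` are hypothetical names, and the two translators are placeholders that only label what they would produce.

```python
# Hotness-based tiering: optimize a block only after it has run often enough.
# HOT_THRESHOLD and both translate functions are hypothetical placeholders.

HOT_THRESHOLD = 3  # assumed value; a real system would tune this by measurement

def cheap_translate(block_id):
    return ("fast-path", block_id)   # stands in for the existing direct translator

def optimizing_translate(block_id):
    return ("optimized", block_id)   # stands in for the IR + passes + JIT path

class TranslationCache:
    def __init__(self):
        self.code = {}     # block_id -> translated code
        self.counts = {}   # block_id -> number of executions so far

    def lookup(self, block_id):
        """Return the code to run for block_id, promoting hot blocks."""
        count = self.counts.get(block_id, 0) + 1
        self.counts[block_id] = count
        if block_id not in self.code:
            self.code[block_id] = cheap_translate(block_id)
        elif count == HOT_THRESHOLD:
            # Pay the expensive translation once, when the block turns hot.
            self.code[block_id] = optimizing_translate(block_id)
        return self.code[block_id]

cache = TranslationCache()
tiers = [cache.lookup(0x1000)[0] for _ in range(5)]
```

With the assumed threshold of 3, the first two executions use the cheap translation and every later one uses the optimized code. Promotion pays off only if the time saved per execution, multiplied by the number of remaining executions, exceeds the one-time cost of the optimizing path.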
> 
> 
> *** Benefits:
> QEMU will largely benefit from this project through an expected increase 
> in speed, while remaining portable.
> Through this project LLVM will effectively get frontends for all target 
frontends -> front ends
> architectures supported by QEMU (at the moment this are x86, ARM, SPARC, 
> PowerPC and MIPS).
this -> these

I'm not sure how strong a point this is. The only processor there that
LLVM lacks is MIPS so we can always test on the native hardware. A MIPS
BE has been proposed as well.
> This offers many opportunities and new fields for the 
> application of LLVM on binary code e.g. optimization of binaries where 
> no source code is available. 
The main advantage I see here is that QEMU would allow us to *test*
binaries for multiple targets (those supported by QEMU) without the need
for that target's hardware. While that's an advantage, I don't think
it's a "new field".

What other "opportunities and new fields" do you see for this? If I
were Google, I wouldn't find the argument that it "allows cross-target
testing for LLVM JIT" very compelling.
> Also since the LLVM JIT will be used for 
> the final code generation QEMU can be hosted on any architecture 
> targetted by the LLVM JIT (at the moment this are x86, x86-64, PowerPC
targetted -> targeted
> and PowerPC 64), at least concerning code generation. Further 
> adjustments to QEMU might be necessary though to get QEMU to run on a 
> certain architecture which is supported by the LLVM JIT but not by QEMU. 
> This project will show the applicability of LLVM in an emulation 
> environment, especially in regard to dynamic binary translation. It can 
> also be used as a basis to try out concepts like profile-guided 
> optimization or static optimization in the context of an emulator.
> 
> 
> *** Deliverables:
> - a version of QEMU with an optimizing dynamic binary translator 
> utilizing LLVM components
> - a set of test suites which are created during the development
You might want to hint at the level of coverage you can reasonably do in
the time frame you have. 
> - all necessary documentation to understand and be able to maintain the 
> software
> 
> 
> *** Plan:
> The development of the software will be done within the three month 
> timeframe of GSoC. Weekly status reports will be given.
> 
> Week 1, 2:
>       - get familiar with LLVM and QEMU
>       - write small test programs for certain LLVM components, or even a 
> simple prototype
>       - get to know LLVM example programs
> Week 3, 4:
>       - modify QEMU's dynamic binary translator to emit LLVM IR
>       - create tests to verify the translation
I think this is going to take longer than 2 weeks, but that's just my
guess.
> Week 5, 6:
>       - integrate LLVM JIT into QEMU's dynamic binary translator
>       - perform first speed measurements
> Week 7, 8:
>       - integrate LLVM optimizations into QEMU
>       - perform more speed measurements, select useful optimizations
> Week 9, 10:
>       - test the system extensively
>       - write final documentation
> Week 11, 12:
>       - time buffer to deal with unexpected events
> 
> 
> *** Qualification:
> I'm a graduate student studying Software Engineering at the University 
> of Stuttgart in Germany. I have a strong interest in compiler technology 
> and see this project as a great opportunity to gain experience in this 
> field. I have taken a compiler building class and plan to focus my 
> future studies in this area.
> Emulation is another area I'm interested in. I wrote a Game Boy Advance
> emulator in C from scratch and a GP32 emulator based on QEMU (also C). 
> While doing this I gained a basic understanding of the QEMU codebase.
How much more familiar do you need to get? This is somewhat
contradictory with spending 1-2 weeks learning LLVM/QEMU that you have
in the Plan.
> I'm currently involved in a university project which develops a testing
> tool for glass box tests for Java and COBOL, which allows to gather 
> certain coverage metrics, and which will be opensourced later this year.
> I have decent experience with C and Java and I'm familiar with C++. Also
> I have a deep understanding of the ARM architecture and I'm familiar 
> with x86.
> This project is a big chance for me to give something back to the open 
> source community, especially since both LLVM and QEMU can profit from 
> this project.
Yup, I'm sure both projects will welcome it.

Sounds like an interesting project. Hope you get it.

Reid.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Vikram S. Adve

2007-Mar-25 01:44 UTC

[LLVMdev] Google SoC Proposal Draft

If I understand right, Tilmann is proposing to create *front-ends*  
from binary code in various architectures to LLVM, not back ends.  In  
fact, having a binary-to-LLVM translator would be quite valuable  
because there are many projects where being able to apply compiler  
techniques to binary code would be useful.  Even if the results are  
more conservative than source, the advantage of not requiring source  
code (or analyzing both source code and binary code, such as for  
external libraries) would outweigh that.

For example, it would be fabulous to be able to apply the SAFECode  
compiler to binary code, instead of source.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/


On Mar 24, 2007, at 7:27 PM, Reid Spencer wrote:
> Hi Tilmann,
>
> Thanks for submitting this. Here's some feedback.
>
> On Sun, 2007-03-25 at 00:41 +0100, Tilmann Scheller wrote:
>> Hello,
>>
>> here's my proposal for a GSoC project with LLVM. I'm happy for any
>> feedback or advice you can give me.
>>
>> Thanks in Advance
>>
>> Tilmann
>>
>>
>> * Proposal for Google Summer of Code Project
>>
>> ** Using LLVM as a backend for QEMU's dynamic binary translation
>>
>>
>> *** Abstract:
>> The goal of this project is to modify the QEMU dynamic binary  
>> translator
>> to use components of the LLVM compiler infrastructure to turn it  
>> into a
>> highly optimizing dynamic binary translator in order to increase the
>> performance of QEMU even further. Instead of directly emitting  
>> code for
>> the host architecture QEMU is running on, the target code is first
>> translated to LLVM IR, then a selection of LLVM's optimization  
>> functions
>> is applied to the IR and as a last step the LLVM JIT is used to  
>> generate
>> code from the optimized IR for the host architecture. Detailed speed
>> measurements will be performed in order to evaluate the efficiency of
>> this approach, especially in comparison to the approach currently  
>> used
>> by QEMU.
>
> One thing I find lacking here is any mention of how you'll address the
> extra time taken to generate, optimize and code gen with LLVM. If I
> understand QEMU, it translates to an intermediate representation and
> then (fairly directly) executes on the native machine. I assume you  
> will
> translate this intermediate representation to LLVM IR, run passes, and
> then JIT. You probably only want to do that if you know the  
> function is
> going to be called a lot or it has loops, etc. The cost of a second
> layer of representation (LLVM IR), optimization and code generation is
> non trivial and could easily dwarf the execution time of the
> functions/programs involved. Do you have a strategy for addressing  
> this?
>
>>
>>
>> *** Benefits:
>> QEMU will largely benefit from this project through an expected  
>> increase
>> in speed, while remaining portable.
>> Through this project LLVM will effectively get frontends for all  
>> target
> frontends -> front ends
>> architectures supported by QEMU (at the moment this are x86, ARM,  
>> SPARC,
>> PowerPC and MIPS).
> this -> these
>
> I'm not sure how strong a point this is. The only processor there that
> LLVM lacks is MIPS so we can always test on the native hardware. A  
> MIPS
> BE has been proposed as well.
>
>> This offers many opportunities and new fields for the
>> application of LLVM on binary code e.g. optimization of binaries  
>> where
>> no source code is available.
>
> The main advantage I see here is that QEMU would allow us to *test*
> binaries for multiple targets (those supported by QEMU) without the  
> need
> for that target's hardware. While that's an advantage, I don't
> think it's a "new field".
>
> What other "opportunities and new fields" do you see for this? If I
> were Google, I wouldn't find the argument that it "allows cross-target
> testing for LLVM JIT" very compelling.
>
>> Also since the LLVM JIT will be used for
>> the final code generation QEMU can be hosted on any architecture
>> targetted by the LLVM JIT (at the moment this are x86, x86-64,  
>> PowerPC
> targetted -> targeted
>
>> and PowerPC 64), at least concerning code generation. Further
>> adjustments to QEMU might be necessary though to get QEMU to run on a
>> certain architecture which is supported by the LLVM JIT but not by  
>> QEMU.
>> This project will show the applicability of LLVM in an emulation
>> environment, especially in regard to dynamic binary translation.  
>> It can
>> also be used as a basis to try out concepts like profile-guided
>> optimization or static optimization in the context of an emulator.
>>
>>
>> *** Deliverables:
>> - a version of QEMU with an optimizing dynamic binary translator
>> utilizing LLVM components
>> - a set of test suites which are created during the development
>
> You might want to hint at the level of coverage you can reasonably  
> do in
> the time frame you have.
>
>> - all necessary documentation to understand and be able to  
>> maintain the
>> software
>>
>>
>> *** Plan:
>> The development of the software will be done within the three month
>> timeframe of GSoC. Weekly status reports will be given.
>>
>> Week 1, 2:
>>       - get familiar with LLVM and QEMU
>>       - write small test programs for certain LLVM components, or  
>> even a
>> simple prototype
>>       - get to know LLVM example programs
>> Week 3, 4:
>>       - modify QEMU's dynamic binary translator to emit LLVM IR
>>       - create tests to verify the translation
> I think this is going to take longer than 2 weeks, but that's just my
> guess.
>> Week 5, 6:
>>       - integrate LLVM JIT into QEMU's dynamic binary translator
>>       - perform first speed measurements
>> Week 7, 8:
>>       - integrate LLVM optimizations into QEMU
>>       - perform more speed measurements, select useful optimizations
>> Week 9, 10:
>>       - test the system extensively
>>       - write final documentation
>> Week 11, 12:
>>       - time buffer to deal with unexpected events
>>
>>
>> *** Qualification:
>> I'm a graduate student studying Software Engineering at the  
>> University
>> of Stuttgart in Germany. I have a strong interest in compiler  
>> technology
>> and see this project as a great opportunity to gain experience in  
>> this
>> field. I have taken a compiler building class and plan to focus my
>> future studies in this area.
>> Emulation is another area I'm interested in. I wrote a Game Boy  
>> Advance
>> emulator in C from scratch and a GP32 emulator based on QEMU (also  
>> C).
>> While doing this I gained a basic understanding of the QEMU codebase.
>
> How much more familiar do you need to get? This is somewhat
> contradictory with spending 1-2 weeks learning LLVM/QEMU that you have
> in the Plan.
>
>> I'm currently involved in a university project which develops a  
>> testing
>> tool for glass box tests for Java and COBOL, which allows to gather
>> certain coverage metrics, and which will be opensourced later this  
>> year.
>> I have decent experience with C and Java and I'm familiar with C++.
>> Also I have a deep understanding of the ARM architecture and I'm
>> familiar with x86.
>> This project is a big chance for me to give something back to the  
>> open
>> source community, especially since both LLVM and QEMU can profit from
>> this project.
>
> Yup, I'm sure both projects will welcome it.
>
> Sounds like an interesting project. Hope you get it.
>
> Reid.
>

Anton Korobeynikov

2007-Mar-25 10:30 UTC

[LLVMdev] Google SoC Proposal Draft

Vikram,
> If I understand right, Tilmann is proposing to create *front-ends*
> from binary code in various architectures to LLVM, not back ends.

Yes. AFAIK, currently QEMU disassembles the source binary, converts
instructions to "ops" (which are something like direct C equivalents of
the instructions) and executes those "ops".

The idea was to emit LLVM IR instead of those "ops", and afterwards use
all the power of LLVM (JIT'ing, transformations, etc.).
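
To make the "ops" idea concrete, here is a small sketch of a guest instruction being expanded into a sequence of micro-operations that are then executed one by one. It is only a model of the scheme as described above: the `add` instruction format, the op names (`op_load_T0` and so on), and the `T0`/`T1` temporaries are chosen for illustration and are not copied from QEMU's source.

```python
# Model of "disassemble -> micro-ops -> execute". Op names are illustrative only.

def op_load_T0(state, reg):
    state["T0"] = state[reg]          # temporary T0 <- guest register

def op_load_T1(state, reg):
    state["T1"] = state[reg]          # temporary T1 <- guest register

def op_add_T0_T1(state, _):
    state["T0"] = (state["T0"] + state["T1"]) & 0xFFFFFFFF  # 32-bit wraparound

def op_store_T0(state, reg):
    state[reg] = state["T0"]          # guest register <- temporary T0

def expand(instr):
    """Expand one guest 'add dst, src' into a list of (op, operand) pairs."""
    name, dst, src = instr
    if name != "add":
        raise ValueError("only 'add' is modeled here")
    return [
        (op_load_T0, dst),
        (op_load_T1, src),
        (op_add_T0_T1, None),
        (op_store_T0, dst),
    ]

def execute(ops, state):
    """Run each micro-op in order against the guest state."""
    for op, operand in ops:
        op(state, operand)
    return state

state = {"r0": 0xFFFFFFFF, "r1": 2, "T0": 0, "T1": 0}
ops = expand(("add", "r0", "r1"))
execute(ops, state)
```

Under the proposal, a step like `expand` would presumably emit LLVM IR instead of such an op list, so that LLVM's passes could remove redundant loads and stores between adjacent guest instructions.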

> > I'm not sure how strong a point this is. The only processor there
> > that
> > LLVM lacks is MIPS so we can always test on the native hardware. A
> > MIPS BE has been proposed as well.

These are "source" architectures, i.e. the architectures QEMU will
translate code from.


-- 
With best regards, Anton Korobeynikov.

Faculty of Mathematics & Mechanics, Saint Petersburg State University.
