Eli Friedman <eli.friedman at gmail.com> writes:

> On Tue, Aug 25, 2009 at 4:58 PM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>> Eli Friedman <eli.friedman at gmail.com> writes:
>>
>>> On Wed, Aug 26, 2009 at 1:10 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>>>> While compiling some sources, translating from my compiler's IR to LLVM
>>>> using the C++ API requires 2.5 seconds. If the resulting LLVM module is
>>>> dumped as LLVM assembler, the file is 240,000 lines long. Generating
>>>> LLVM code is fast.
>>>>
>>>> However, generating the native code is quite slow: 33 seconds. I force
>>>> native code generation by calling ExecutionEngine::getPointerToFunction
>>>> for each function in the module.
>>>>
>>>> This is on x86/Windows/MinGW. The only pass is TargetData, so no fancy
>>>> optimizations.
>>>>
>>>> I don't think that a static compiler (llvm-gcc, for instance) needs so
>>>> much time to generate unoptimized native code for a similarly sized
>>>> module. Is there something special about the JIT that makes it so slow?
>>>
>>> For comparison, how long does it take to write the whole thing out as
>>> native assembler?
>>
>> What kind of metric is this? How are string manipulation and I/O a
>> better indication than the number of LLVM assembly lines generated or
>> the ratio (LLVM IR generation time / native code generation time)?
>
> I wanted the comparison to check whether the issue is just "codegen is
> slow", or more specifically that JIT codegen is slow. You seem to be
> under the impression that it will be significantly slower, but I don't
> think it's self-evident. (The output of "time llc dumpedmodule.bc"
> would be sufficient.)

Sorry Eli. I misread your message as if you were suggesting to measure
the time required for dumping the module as LLVM assembler.

llc needs 45 seconds. This is far worse than the 33 seconds used by the
JIT. Maybe llc is using optimizations. My JIT has no optimizations
enabled.

Yup, llc -O0 takes 37.5 seconds.

llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
still slow, IMO. The question is whether this avoids the non-linear
algorithms and whether the generated code is fast enough to justify
LLVM. I'll do some experimentation.

The generated assembly file is 290K lines for unadorned llc and 616K
lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
hope :-)

-- 
Óscar
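[Editor's note: the flag comparisons above can be scripted. The sketch below is illustrative and not from the thread; the commented usage assumes `llc` is on PATH and that the module has been dumped to a hypothetical `code.bc`.]

```python
import subprocess
import time

def time_command(cmd):
    """Run cmd, returning its wall-clock time in seconds, or None on failure."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True)
    elapsed = time.monotonic() - start
    return elapsed if result.returncode == 0 else None

# Hypothetical usage, assuming llc is on PATH and code.bc exists:
# for flags in ([], ["-O0"], ["-O0", "-pre-RA-sched=fast", "-regalloc=local"]):
#     secs = time_command(["llc", *flags, "code.bc", "-o", "/dev/null"])
#     print(flags, secs)
```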
On 2009-08-26 16:57, Óscar Fuentes wrote:
> llc needs 45 seconds. This is far worse than the 33 seconds used by the
> JIT. Maybe llc is using optimizations. My JIT has no optimizations
> enabled.
>
> Yup, llc -O0 takes 37.5 seconds.
>
> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
> still slow, IMO. The question is whether this avoids the non-linear
> algorithms and whether the generated code is fast enough to justify
> LLVM. I'll do some experimentation.
>
> The generated assembly file is 290K lines for unadorned llc and 616K
> lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
> hope :-)

Is this a Release or a Release-Asserts build? You could try how much
time it takes with a Release-Asserts build.

Also, if you use -time-passes with llc it should show which pass in llc
takes so much time.

Best regards,
--Edwin
Hello Török.

Török Edwin <edwintorok at gmail.com> writes:

> On 2009-08-26 16:57, Óscar Fuentes wrote:
>> llc needs 45 seconds. This is far worse than the 33 seconds used by the
>> JIT. Maybe llc is using optimizations. My JIT has no optimizations
>> enabled.
>>
>> Yup, llc -O0 takes 37.5 seconds.
>>
>> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
>> still slow, IMO. The question is whether this avoids the non-linear
>> algorithms and whether the generated code is fast enough to justify
>> LLVM. I'll do some experimentation.
>>
>> The generated assembly file is 290K lines for unadorned llc and 616K
>> lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
>> hope :-)
>
> Is this a Release or a Release-Asserts build?
> You could try how much time it takes on a Release-Asserts build.

Assertions are disabled.

> Also if you use -time-passes with llc it should show which pass in llc
> takes so much time.

These are the three main culprits for llc -O0:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
   10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
    4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator

And these for -pre-RA-sched=fast -regalloc=simple -O0 code.bc:

   10.7187 ( 45.4%)   0.4375 ( 60.8%)  11.1562 ( 45.8%)  11.1718 ( 45.4%)  X86 DAG->DAG Instruction Selection
    7.4687 ( 31.6%)   0.0156 (  2.1%)   7.4843 ( 30.7%)   7.5312 ( 30.6%)  Simple Register Allocator
    1.9531 (  8.2%)   0.1406 ( 19.5%)   2.0937 (  8.6%)   2.1093 (  8.5%)  X86 Intel-Style Assembly Printer

I suppose we can't get rid of instruction selection :-)

-- 
Óscar
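[Editor's note: the culprit passes in a -time-passes report can be picked out programmatically. This is an illustrative sketch, not part of the thread; the regex assumes the four "seconds ( percent%)" columns shown in the report above.]

```python
import re

# One row of llc -time-passes output: four "secs ( pct%)" columns
# (user, system, user+system, wall) followed by the pass name.
ROW = re.compile(
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # user time
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # system time
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # user+system
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # wall time
    r"(.+)$"                            # pass name
)

def slowest_passes(report, n=3):
    """Return the n passes with the largest wall-clock time, slowest first."""
    rows = []
    for line in report.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((float(m.group(4)), m.group(5).strip()))
    rows.sort(reverse=True)
    return rows[:n]

# Hypothetical usage, assuming the report was saved to a file:
# print(slowest_passes(open("timing.txt").read()))
```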
On Aug 26, 2009, at 6:57 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>
> Yup, llc -O0 takes 37.5 seconds.
>
> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds.

Another important flag for testing llc time is llc -asm-verbose=false.

Dan
Hello Dan.

Dan Gohman <gohman at apple.com> writes:

> On Aug 26, 2009, at 6:57 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>>
>> Yup, llc -O0 takes 37.5 seconds.
>>
>> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds.
>
> Another important flag for testing llc time is llc -asm-verbose=false.

Adding -asm-verbose=false to -pre-RA-sched=fast -regalloc=simple
-time-passes -O0 made no significant difference (it saved 0.1 seconds
out of 27.7 total) while outputting a 637K-line .s file.

-- 
Óscar