Eli Friedman <eli.friedman at gmail.com> writes:

> On Tue, Aug 25, 2009 at 4:58 PM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>> Eli Friedman <eli.friedman at gmail.com> writes:
>>
>>> On Wed, Aug 26, 2009 at 1:10 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>>>> While compiling some sources, translating from my compiler's IR to LLVM
>>>> using the C++ API requires 2.5 seconds. If the resulting LLVM module is
>>>> dumped as LLVM assembler, the file is 240,000 lines long. Generating
>>>> LLVM code is fast.
>>>>
>>>> However, generating the native code is quite slow: 33 seconds. I force
>>>> native code generation by calling ExecutionEngine::getPointerToFunction
>>>> for each function in the module.
>>>>
>>>> This is on x86/Windows/MinGW. The only pass is TargetData, so no fancy
>>>> optimizations.
>>>>
>>>> I don't think that a static compiler (llvm-gcc, for instance) needs so
>>>> much time to generate unoptimized native code for a similarly sized
>>>> module. Is there something special about the JIT that makes it so slow?
>>>
>>> For comparison, how long does it take to write the whole thing out as
>>> native assembler?
>>
>> What kind of metric is this? How are string manipulation and I/O a
>> better indication than the number of LLVM assembly lines generated or
>> the ratio (LLVM IR generation time / native code generation time)?
>
> I wanted the comparison to check whether the issue is just "codegen is
> slow", or more specifically that JIT codegen is slow. You seem to be
> under the impression that it will be significantly slower, but I don't
> think it's self-evident. (The output of "time llc dumpedmodule.bc"
> would be sufficient.)

Sorry Eli. I misread your message as if you were suggesting to measure
the time required for dumping the module as LLVM assembler.

llc needs 45 seconds. This is far worse than the 33 seconds used by the
JIT. Maybe llc is using optimizations. My JIT has no optimizations
enabled.

Yup, llc -O0 takes 37.5 seconds.

llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
still slow, IMO. The question is whether this avoids the non-linear
algorithms and whether the generated code is fast enough to justify
LLVM. I'll do some experimentation.

The generated assembly file is 290K lines for unadorned llc and 616K
lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
hope :-)

-- 
Óscar
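[Editor's note: the flag comparisons above can be scripted. The sketch below is illustrative and not from the thread; the commented usage assumes `llc` is on PATH and that the module has been dumped to a hypothetical `code.bc`.]

```python
import subprocess
import time

def time_command(cmd):
    """Run cmd, returning its wall-clock time in seconds, or None on failure."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True)
    elapsed = time.monotonic() - start
    return elapsed if result.returncode == 0 else None

# Hypothetical usage, assuming llc is on PATH and code.bc exists:
# for flags in ([], ["-O0"], ["-O0", "-pre-RA-sched=fast", "-regalloc=local"]):
#     secs = time_command(["llc", *flags, "code.bc", "-o", "/dev/null"])
#     print(flags, secs)
```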
On 2009-08-26 16:57, Óscar Fuentes wrote:
> llc needs 45 seconds. This is far worse than the 33 seconds used by the
> JIT. Maybe llc is using optimizations. My JIT has no optimizations
> enabled.
>
> Yup, llc -O0 takes 37.5 seconds.
>
> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
> still slow, IMO. The question is whether this avoids the non-linear
> algorithms and whether the generated code is fast enough to justify
> LLVM. I'll do some experimentation.
>
> The generated assembly file is 290K lines for unadorned llc and 616K
> lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
> hope :-)

Is this a Release or a Release-Asserts build? You could try how much
time it takes with a Release-Asserts build.

Also, if you use -time-passes with llc it should show which pass in llc
takes so much time.

Best regards,
--Edwin
Hello Török.

Török Edwin <edwintorok at gmail.com> writes:

> On 2009-08-26 16:57, Óscar Fuentes wrote:
>> llc needs 45 seconds. This is far worse than the 33 seconds used by the
>> JIT. Maybe llc is using optimizations. My JIT has no optimizations
>> enabled.
>>
>> Yup, llc -O0 takes 37.5 seconds.
>>
>> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
>> still slow, IMO. The question is whether this avoids the non-linear
>> algorithms and whether the generated code is fast enough to justify
>> LLVM. I'll do some experimentation.
>>
>> The generated assembly file is 290K lines for unadorned llc and 616K
>> lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
>> hope :-)
>
> Is this a Release or a Release-Asserts build?
> You could try how much time it takes on a Release-Asserts build.

Assertions are disabled.

> Also if you use -time-passes with llc it should show which pass in llc
> takes so much time.

These are the three main culprits for llc -O0:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
   10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
    4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator

And these for -pre-RA-sched=fast -regalloc=simple -O0 code.bc:

   10.7187 ( 45.4%)   0.4375 ( 60.8%)  11.1562 ( 45.8%)  11.1718 ( 45.4%)  X86 DAG->DAG Instruction Selection
    7.4687 ( 31.6%)   0.0156 (  2.1%)   7.4843 ( 30.7%)   7.5312 ( 30.6%)  Simple Register Allocator
    1.9531 (  8.2%)   0.1406 ( 19.5%)   2.0937 (  8.6%)   2.1093 (  8.5%)  X86 Intel-Style Assembly Printer

I suppose we can't get rid of instruction selection :-)

-- 
Óscar
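[Editor's note: the culprit passes in a -time-passes report can be picked out programmatically. This is an illustrative sketch, not part of the thread; the regex assumes the four "seconds ( percent%)" columns shown in the report above.]

```python
import re

# One row of llc -time-passes output: four "secs ( pct%)" columns
# (user, system, user+system, wall) followed by the pass name.
ROW = re.compile(
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # user time
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # system time
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # user+system
    r"(\d+\.\d+)\s*\(\s*[\d.]+%\)\s*"   # wall time
    r"(.+)$"                            # pass name
)

def slowest_passes(report, n=3):
    """Return the n passes with the largest wall-clock time, slowest first."""
    rows = []
    for line in report.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((float(m.group(4)), m.group(5).strip()))
    rows.sort(reverse=True)
    return rows[:n]

# Hypothetical usage, assuming the report was saved to a file:
# print(slowest_passes(open("timing.txt").read()))
```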
On Aug 26, 2009, at 6:57 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>
> Yup, llc -O0 takes 37.5 seconds.
>
> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds.

Another important flag for testing llc time is llc -asm-verbose=false.

Dan
Hello Dan.

Dan Gohman <gohman at apple.com> writes:

> On Aug 26, 2009, at 6:57 AM, Óscar Fuentes <ofv at wanadoo.es> wrote:
>>
>> Yup, llc -O0 takes 37.5 seconds.
>>
>> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds.
>
> Another important flag for testing llc time is llc -asm-verbose=false.

Adding -asm-verbose=false to -pre-RA-sched=fast -regalloc=simple
-time-passes -O0 made no significant difference (it saved 0.1 seconds
out of 27.7 total) while outputting a 637K-line .s file.

-- 
Óscar