Hello Török.

Török Edwin <edwintorok at gmail.com> writes:

> On 2009-08-26 16:57, Óscar Fuentes wrote:
>> llc needs 45 seconds. This is far worse than the 33 seconds used by the
>> JIT. Maybe llc is using optimizations. My JIT has no optimizations
>> enabled.
>>
>> Yup, llc -O0 takes 37.5 seconds.
>>
>> llc -pre-RA-sched=fast -regalloc=local takes 26 seconds. Much better but
>> still slow IMO. The question is if this avoids the non-linear algorithms
>> and if the generated code is fast enough to justify LLVM. I'll do some
>> experimentation.
>>
>> The generated assembly file is 290K lines for unadorned llc and 616K
>> lines for -pre-RA-sched=fast -regalloc=local. This does not inspire much
>> hope :-)
>
> Is this a Release or a Release-Asserts build?
> You could try how much time it takes on a Release-Asserts build.

Assertions are disabled.

> Also if you use -time-passes with llc it should show which pass in llc
> takes so much time.

These are the three main culprits for llc -O0

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
  10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
   4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator

And these for -pre-RA-sched=fast -regalloc=simple -O0 code.bc

  10.7187 ( 45.4%)   0.4375 ( 60.8%)  11.1562 ( 45.8%)  11.1718 ( 45.4%)  X86 DAG->DAG Instruction Selection
   7.4687 ( 31.6%)   0.0156 (  2.1%)   7.4843 ( 30.7%)   7.5312 ( 30.6%)  Simple Register Allocator
   1.9531 (  8.2%)   0.1406 ( 19.5%)   2.0937 (  8.6%)   2.1093 (  8.5%)  X86 Intel-Style Assembly Printer

I suppose we can't get rid of instruction selection :-)

-- 
Óscar
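For anyone comparing several llc runs like the two above, here is a minimal sketch of a script that pulls the rows out of a -time-passes report and ranks the passes by wall time. It assumes the four-column layout shown in this thread (user, system, user+system, wall, then the pass name); the function name parse_time_passes and the sample text are only illustrative, and a report with a different set of columns would need a different pattern.

```python
import re

# One data row: four "seconds ( percent%)" columns followed by the pass name.
# Assumption: the report has exactly the four timing columns seen in this thread.
ROW = re.compile(r"^\s*((?:\d+\.\d+\s*\(\s*\d+\.\d+%\)\s*){4})(\S.*?)\s*$")
PAIR = re.compile(r"(\d+\.\d+)\s*\(\s*(\d+\.\d+)%\)")


def parse_time_passes(text):
    """Return (pass name, wall seconds, wall percent) tuples, slowest first."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or unrelated output
        pairs = PAIR.findall(match.group(1))
        wall_seconds, wall_percent = pairs[3]  # fourth column is wall time
        rows.append((match.group(2), float(wall_seconds), float(wall_percent)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample taken from the llc -O0 figures quoted in this thread.
SAMPLE = """\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
  10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
   4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator
"""

if __name__ == "__main__":
    for name, seconds, percent in parse_time_passes(SAMPLE):
        print(f"{seconds:8.4f}s  {percent:5.1f}%  {name}")
```

Any summary row that uses the same column layout would also match and appear in the ranking, so filter it out by name if it gets in the way.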
On Aug 26, 2009, at 7:47 AM, Óscar Fuentes wrote:

>> Also if you use -time-passes with llc it should show which pass in
>> llc takes so much time.
>
> These are the three main culprits for llc -O0
>
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>   10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
>   10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
>    4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator
>
> And these for -pre-RA-sched=fast -regalloc=simple -O0 code.bc
>
>   10.7187 ( 45.4%)   0.4375 ( 60.8%)  11.1562 ( 45.8%)  11.1718 ( 45.4%)  X86 DAG->DAG Instruction Selection
>    7.4687 ( 31.6%)   0.0156 (  2.1%)   7.4843 ( 30.7%)   7.5312 ( 30.6%)  Simple Register Allocator
>    1.9531 (  8.2%)   0.1406 ( 19.5%)   2.0937 (  8.6%)   2.1093 (  8.5%)  X86 Intel-Style Assembly Printer
>
> I suppose we can't get rid of instruction selection :-)

Pass -fast-isel to speed up instruction selection.

Dan, I think that this should be made "non hidden" and updated (from
llc --help):

   -fast-isel   - Enable the experimental "fast" instruction selector

-Chris
On Aug 26, 2009, at 8:32 AM, Chris Lattner <clattner at apple.com> wrote:

> On Aug 26, 2009, at 7:47 AM, Óscar Fuentes wrote:
>>> Also if you use -time-passes with llc it should show which pass in
>>> llc takes so much time.
>>
>> These are the three main culprits for llc -O0
>>
>>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>   10.9531 ( 30.0%)   0.4687 ( 58.8%)  11.4218 ( 30.6%)  11.5468 ( 30.6%)  X86 DAG->DAG Instruction Selection
>>   10.2500 ( 28.0%)   0.0156 (  1.9%)  10.2656 ( 27.5%)  10.2500 ( 27.2%)  Live Variable Analysis
>>    4.8593 ( 13.3%)   0.0000 (  0.0%)   4.8593 ( 13.0%)   4.8593 ( 12.9%)  Linear Scan Register Allocator
>>
>> And these for -pre-RA-sched=fast -regalloc=simple -O0 code.bc
>>
>>   10.7187 ( 45.4%)   0.4375 ( 60.8%)  11.1562 ( 45.8%)  11.1718 ( 45.4%)  X86 DAG->DAG Instruction Selection
>>    7.4687 ( 31.6%)   0.0156 (  2.1%)   7.4843 ( 30.7%)   7.5312 ( 30.6%)  Simple Register Allocator
>>    1.9531 (  8.2%)   0.1406 ( 19.5%)   2.0937 (  8.6%)   2.1093 (  8.5%)  X86 Intel-Style Assembly Printer
>>
>> I suppose we can't get rid of instruction selection :-)
>
> Pass -fast-isel to speed up instruction selection.
>
> Dan, I think that this should be made "non hidden" and updated (from
> llc --help):
>
>    -fast-isel   - Enable the experimental "fast" instruction selector

It's turned on by -O0. And I guess it's not so "experimental" at this
point :). It hasn't been tuned for a wide variety of applications yet
though.

An interesting option to add is -fast-isel-verbose, which prints out
the LLVM instructions that aren't going down the fast path. If there's
something that shows up a lot, it may be worthwhile looking into why
the front-end is using it, or looking into adding support for that
instruction to the fast path.

LLVM has made progress in this area, but there's more to be done.

Dan
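To see whether "something shows up a lot" in the -fast-isel-verbose output, a small tally by opcode may help. The sketch below is only that: the thread does not show what the verbose output looks like, so the "FastISel miss:" line prefix is an assumption, and the sample log lines are made up for illustration. The names opcode_of and tally_misses are hypothetical too. Adjust PREFIX to whatever your build actually prints.

```python
from collections import Counter

# Assumption: each instruction that misses the fast path is reported on its
# own line starting with this prefix. The thread does not confirm the format.
PREFIX = "FastISel miss:"

# Call markers that precede the "call" opcode in LLVM IR.
CALL_MARKERS = {"tail", "musttail", "notail"}


def opcode_of(instruction):
    """Return the opcode of a textual LLVM instruction, e.g. 'call' or 'store'."""
    # Drop a leading "%result = " if the instruction produces a value.
    text = instruction.split(" = ", 1)[-1]
    tokens = text.split()
    while tokens and tokens[0] in CALL_MARKERS:
        tokens.pop(0)
    return tokens[0] if tokens else "<unknown>"


def tally_misses(log_text):
    """Count fast-path misses per opcode in captured llc output."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            counts[opcode_of(line[len(PREFIX):].strip())] += 1
    return counts


# Made-up sample, only to exercise the functions above.
SAMPLE_LOG = """\
FastISel miss: %3 = call i32 @f(i32 %2)
FastISel miss: %5 = tail call i32 @g()
FastISel miss: store i64 %7, i64* %8
unrelated diagnostic line
FastISel miss: %9 = call i32 @h()
"""

if __name__ == "__main__":
    for opcode, count in tally_misses(SAMPLE_LOG).most_common():
        print(f"{count:6d}  {opcode}")
```

The opcodes at the top of the list are the candidates either to avoid in the front-end or to support in the fast path.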