Hello,

I recently noticed a performance gap between JIT execution and native code for the following simple function, which computes Fibonacci numbers:

    uint64_t fib(int n) {
        if (n <= 2) {
            return 1;
        } else {
            return fib(n-1) + fib(n-2);
        }
    }

When compiled natively using clang++ with -O3, it took 0.17s to compute fib(40). However, when executed using LLJIT, fed with the IR output of "clang++ -emit-llvm -O3", it took 0.26s.

I don't know much about the internals of LLJIT, but my guess is that, since the IR is the same, maybe LLJIT uses a cheaper but lower-quality instruction selection pass, resulting in the slower runtime? Could someone working on LLJIT clarify the difference in lowering passes between LLJIT and clang++? And if I wanted to change this behavior, which APIs should I look at to begin with?

Thanks for your time!

Best regards,
Haoran
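For anyone wanting to reproduce the native side of this comparison, a minimal timing sketch might look like the following. The fib function is the one from the message above; the helper name time_fib and the use of std::chrono::steady_clock are assumptions for illustration, not the harness that produced the 0.17s and 0.26s figures.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// fib as given in the message above.
uint64_t fib(int n) {
  if (n <= 2) {
    return 1;
  }
  return fib(n - 1) + fib(n - 2);
}

// Hypothetical helper: runs fib(n) once and returns {result, elapsed seconds}.
std::pair<uint64_t, double> time_fib(int n) {
  auto start = std::chrono::steady_clock::now();
  uint64_t result = fib(n);
  auto stop = std::chrono::steady_clock::now();
  std::chrono::duration<double> elapsed = stop - start;
  return {result, elapsed.count()};
}
```

Calling time_fib(40) from main and printing the second member gives the wall-clock time of a single call. When timing the JIT'd version the same way, it is probably worth starting the clock only after the symbol lookup has returned, since LLJIT compiles the module during lookup and that cost would otherwise be counted as execution time.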
Hi Haoran,

LLJIT uses CodeGenOpt::Default by default, whereas I suspect -O3 uses CodeGenOpt::Aggressive. You can configure this by setting/modifying the JITTargetMachineBuilder member of your LLJITBuilder before calling create.

You can also try attaching a DumpObjects instance to your JIT to dump the JIT'd objects to disk: sometimes comparing objects can offer useful insights. You can find an example of this in llvm/examples/OrcV2Examples/LLJITDumpObjects.

Regards,
Lang.

On Thu, Sep 3, 2020 at 7:01 PM Haoran Xu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...] Could someone working on LLJIT clarify the difference in lowering
> passes between LLJIT and clang++? And if I were to change this behavior,
> which APIs should I look at to begin with?
Thanks for the clarification, Lang! I didn't know about CodeGenOpt before. I'll give it a try to see if it fixes the issue.

Thanks again!
Haoran

Lang Hames <lhames at gmail.com> wrote on Sun, Sep 6, 2020 at 11:03 PM:
> LLJIT uses CodeGenOpt::Default by default, whereas I suspect -O3 uses
> CodeGenOpt::Aggressive. [...]