Here is some timing information from running the Zig standard library tests:
$ ./zig test ../std/index.zig --enable-timing-info
                Name       Start         End    Duration     Percent
          Initialize      0.0000      0.0010      0.0010      0.0001
   Semantic Analysis      0.0010      0.9968      0.9958      0.1192
     Code Generation      0.9968      1.4000      0.4032      0.0483
    LLVM Emit Output      1.4000      8.1759      6.7760      0.8112
  Build Dependencies      8.1759      8.3341      0.1581      0.0189
           LLVM Link      8.3341      8.3530      0.0189      0.0023
               Total      0.0000      8.3530      8.3530      1.0000
81% of the time was spent waiting for LLVM to turn a Module into an object
file. This is with optimizations off, FastISel, no module verification, etc.
How can I speed this up? Any tips or things to look into?
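For reference, the Percent column in the table above appears to be each phase's duration divided by the total run time. Below is a minimal C++ sketch that recomputes it, assuming that reading is right. The names `Phase`, `phase_share`, and `print_timing_table` are hypothetical and are not taken from the Zig source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one row of the timing table (times in seconds).
struct Phase {
    std::string name;
    double start;
    double end;
};

// Fraction of the total run time spent in one phase.
double phase_share(const Phase &phase, double total) {
    assert(total > 0.0);
    return (phase.end - phase.start) / total;
}

// Prints rows in the same column order as the table in the message.
// Assumes phases are listed in order, so the total spans first start to last end.
void print_timing_table(const std::vector<Phase> &phases) {
    if (phases.empty()) return;
    const double total = phases.back().end - phases.front().start;
    std::printf("%20s %11s %11s %11s %11s\n", "Name", "Start", "End", "Duration", "Percent");
    for (const Phase &p : phases) {
        std::printf("%20s %11.4f %11.4f %11.4f %11.4f\n", p.name.c_str(), p.start, p.end,
                    p.end - p.start, phase_share(p, total));
    }
}
```

With the "LLVM Emit Output" row (start 1.4000, end 8.1759) and the 8.3530 second total, `phase_share` gives roughly 0.811, which matches the 81% figure quoted above.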
Here's the function that 81% of the time is spent inside:
bool ZigLLVMTargetMachineEmitToFile(LLVMTargetMachineRef targ_machine_ref, LLVMModuleRef module_ref,
        const char *filename, ZigLLVM_EmitOutputType output_type, char **error_message,
        bool is_debug, bool is_small)
{
    std::error_code EC;
    raw_fd_ostream dest(filename, EC, sys::fs::F_None);
    if (EC) {
        *error_message = strdup((const char *)StringRef(EC.message()).bytes_begin());
        return true;
    }
    TargetMachine* target_machine = reinterpret_cast<TargetMachine*>(targ_machine_ref);
    target_machine->setO0WantsFastISel(true);
    Module* module = unwrap(module_ref);
    PassManagerBuilder *PMBuilder = new(std::nothrow) PassManagerBuilder();
    if (PMBuilder == nullptr) {
        *error_message = strdup("memory allocation failure");
        return true;
    }
    PMBuilder->OptLevel = target_machine->getOptLevel();
    PMBuilder->SizeLevel = is_small ? 2 : 0;
    PMBuilder->DisableTailCalls = is_debug;
    PMBuilder->DisableUnitAtATime = is_debug;
    PMBuilder->DisableUnrollLoops = is_debug;
    PMBuilder->SLPVectorize = !is_debug;
    PMBuilder->LoopVectorize = !is_debug;
    PMBuilder->RerollLoops = !is_debug;
    // Leaving NewGVN as default (off) because when on it caused issue #673
    //PMBuilder->NewGVN = !is_debug;
    PMBuilder->DisableGVNLoadPRE = is_debug;
    PMBuilder->VerifyInput = assertions_on;
    PMBuilder->VerifyOutput = assertions_on;
    PMBuilder->MergeFunctions = !is_debug;
    PMBuilder->PrepareForLTO = false;
    PMBuilder->PrepareForThinLTO = false;
    PMBuilder->PerformThinLTO = false;
    TargetLibraryInfoImpl tlii(Triple(module->getTargetTriple()));
    PMBuilder->LibraryInfo = &tlii;
    if (is_debug) {
        PMBuilder->Inliner = createAlwaysInlinerLegacyPass(false);
    } else {
        target_machine->adjustPassManager(*PMBuilder);
        PMBuilder->addExtension(PassManagerBuilder::EP_EarlyAsPossible, addDiscriminatorsPass);
        PMBuilder->Inliner = createFunctionInliningPass(PMBuilder->OptLevel, PMBuilder->SizeLevel, false);
    }
    addCoroutinePassesToExtensionPoints(*PMBuilder);
    // Set up the per-function pass manager.
    legacy::FunctionPassManager FPM = legacy::FunctionPassManager(module);
    auto tliwp = new(std::nothrow) TargetLibraryInfoWrapperPass(tlii);
    FPM.add(tliwp);
    FPM.add(createTargetTransformInfoWrapperPass(target_machine->getTargetIRAnalysis()));
    if (assertions_on) {
        FPM.add(createVerifierPass());
    }
    PMBuilder->populateFunctionPassManager(FPM);
    // Set up the per-module pass manager.
    legacy::PassManager MPM;
    MPM.add(createTargetTransformInfoWrapperPass(target_machine->getTargetIRAnalysis()));
    PMBuilder->populateModulePassManager(MPM);
    // Set output pass.
    TargetMachine::CodeGenFileType ft;
    if (output_type != ZigLLVM_EmitLLVMIr) {
        switch (output_type) {
            case ZigLLVM_EmitAssembly:
                ft = TargetMachine::CGFT_AssemblyFile;
                break;
            case ZigLLVM_EmitBinary:
                ft = TargetMachine::CGFT_ObjectFile;
                break;
            default:
                abort();
        }
        if (target_machine->addPassesToEmitFile(MPM, dest, ft)) {
            *error_message = strdup("TargetMachine can't emit a file of this type");
            return true;
        }
    }
    // run per function optimization passes
    FPM.doInitialization();
    for (Function &F : *module)
      if (!F.isDeclaration())
        FPM.run(F);
    FPM.doFinalization();
    MPM.run(*module);
    if (output_type == ZigLLVM_EmitLLVMIr) {
        if (LLVMPrintModuleToFile(module_ref, filename, error_message)) {
            return true;
        }
    }
    return false;
}
On 9/11/2018 5:48 PM, Andrew Kelley via llvm-dev wrote:
> 81% of the time was spent waiting for LLVM to turn a Module into an
> object file. This is with optimizations off, FastISel, no module
> verification, etc.
>
> How can I speed this up? Any tips or things to look into?

First step is probably setting TimePassesIsEnabled to true and looking at the output. It's hard to say where the time is going without any numbers.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
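The idea behind the suggestion above is that a global switch makes the pass manager time each pass and report the results. The following self-contained C++ sketch is only an analogy of that mechanism, not LLVM's implementation. `g_time_passes_enabled`, `Pass`, `PassTiming`, and `run_passes` are all hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a global timing switch; this is not an LLVM symbol.
static bool g_time_passes_enabled = false;

// One line of the timing report: pass name and wall-clock seconds.
struct PassTiming {
    std::string name;
    double seconds;
};

// Hypothetical pass: a name plus a callable that does the work.
using Pass = std::pair<std::string, std::function<void()>>;

// Runs every pass in order. When the switch is on, each pass is wrapped in a
// wall-clock timer and one report entry is recorded per pass.
std::vector<PassTiming> run_passes(const std::vector<Pass> &passes) {
    std::vector<PassTiming> report;
    for (const Pass &pass : passes) {
        if (!g_time_passes_enabled) {
            pass.second();
            continue;
        }
        const auto start = std::chrono::steady_clock::now();
        pass.second();
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        report.push_back({pass.first, elapsed.count()});
    }
    return report;
}
```

With the switch off, the passes run unmeasured and the report is empty. With it on, the report has one entry per pass, which is the kind of per-pass breakdown the reply asks for.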
Thanks, that was a really helpful suggestion. If you're curious, here are some of the high cost areas:

===-------------------------------------------------------------------------==
                              DWARF Emission
===-------------------------------------------------------------------------==
  Total Execution Time: 2.0117 seconds (2.0185 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7977 ( 64.4%)   0.4517 ( 58.4%)   1.2494 ( 62.1%)   1.2557 ( 62.2%)  Debug Info Emission
   0.4383 ( 35.4%)   0.3221 ( 41.6%)   0.7603 ( 37.8%)   0.7608 ( 37.7%)  DWARF Exception Writer
   0.0019 (  0.2%)   0.0000 (  0.0%)   0.0019 (  0.1%)   0.0019 (  0.1%)  DWARF Debug Writer
   1.2379 (100.0%)   0.7738 (100.0%)   2.0117 (100.0%)   2.0185 (100.0%)  Total

===-------------------------------------------------------------------------==
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------==
  Total Execution Time: 10.3340 seconds (10.3289 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   2.4747 ( 31.5%)   1.6998 ( 68.5%)   4.1744 ( 40.4%)   4.1745 ( 40.4%)  X86 Assembly Printer
   2.3159 ( 29.5%)   0.0103 (  0.4%)   2.3262 ( 22.5%)   2.3262 ( 22.5%)  Coroutine Splitting
   1.4348 ( 18.3%)   0.4092 ( 16.5%)   1.8440 ( 17.8%)   1.8428 ( 17.8%)  X86 DAG->DAG Instruction Selection
   0.1875 (  2.4%)   0.0384 (  1.5%)   0.2259 (  2.2%)   0.2252 (  2.2%)  Fast Register Allocator

I'm curious about the X86 Assembly Printer. What is it doing? The module in question has some inline assembly in it. So I would think that LLVM would parse it and then render it as machine code. At what point does it get printed?

As for Coroutine Splitting, is this about as fast as it will get? If not, then what's the point of LLVM having coroutines? Frontends would be better off implementing coroutines on top of structs, like Rust does.
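To see how much of the run the listed passes account for, the wall-time column of the pass report above can be ranked and summed. This is a minimal C++ sketch using only the four rows and the 10.3289 second wall-clock total quoted in this message. `PassRow`, `reported_rows`, `rank_by_wall_time`, and `combined_share` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one row of the pass timing report.
struct PassRow {
    std::string name;
    double wall_seconds;
};

// The four rows quoted in the message, wall-time column only.
std::vector<PassRow> reported_rows() {
    return {
        {"Fast Register Allocator", 0.2252},
        {"X86 DAG->DAG Instruction Selection", 1.8428},
        {"Coroutine Splitting", 2.3262},
        {"X86 Assembly Printer", 4.1745},
    };
}

// Returns the rows sorted from most to least expensive.
std::vector<PassRow> rank_by_wall_time(std::vector<PassRow> rows) {
    std::sort(rows.begin(), rows.end(), [](const PassRow &a, const PassRow &b) {
        return a.wall_seconds > b.wall_seconds;
    });
    return rows;
}

// Fraction of the total wall time covered by the given rows.
double combined_share(const std::vector<PassRow> &rows, double total_wall_seconds) {
    assert(total_wall_seconds > 0.0);
    double sum = 0.0;
    for (const PassRow &row : rows) {
        sum += row.wall_seconds;
    }
    return sum / total_wall_seconds;
}
```

Calling `combined_share(reported_rows(), 10.3289)` gives about 0.83, in line with the percentages shown in the report, so these four passes cover most of the measured time and the assembly printer alone is the largest single entry.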
On Tue, Sep 11, 2018 at 9:01 PM Friedman, Eli <efriedma at codeaurora.org> wrote:
> First step is probably setting TimePassesIsEnabled to true and looking at
> the output. It's hard to say where the time is going without any numbers.
>
> -Eli