thr3ads.net - llvm dev - [LLVMdev] Removing contention in PassRegistry accesses to speed up compiles [Feb 2015]

If this information is useful, please help other people find it:
Share via:

Nipun Sehrawat

2015-Feb-24 21:50 UTC

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

Hi,

We use LLVM libraries to compile C++ code and noticed slow downs when
multiple threads of a process were compiling at once. *perf *indicated that
most of the CPU time was spent in a spin lock, which was being
locked/unlocked from llvm::PassRegistry::getPassInfo().

We read the relevant LLVM code and found out that PassRegistry is a
ManagedStatic and is shared among all threads in case of a multi-threaded
setup. This sharing requires locking in PassRegistry's method, which
becomes source of the contention. To get rid of the contention, we made a
change to make PassRegistry thread-local and got rid of the locking. This
removed all the contention and we noticed a 2x speed up in single thread
compiles and 7x improvement when ten threads were compiling in parallel.

Please find attached the diff for this change. We are using a old version
of LLVM code (svn revision 170375), so the code might be quite outdated.

We have two questions:
1. Does the change look reasonable? Or are we missing something here?
2. When we run  with 1000 threads compiling concurrently, we
deterministically run into a segfault in PassRegistry lookup. Any insights
into the segfault?


Please find attached the following files:
1. *pass_registry.txt*: Git diff of our change. Note that it is against
LLVM svn revision 170375.
2.* contention.txt*: Perf report with existing LLVM code - shows contention
in llvm::PassRegistry::getPassInfo()
3. *no_contention.txt*: Perf report of LLVM built with our change.
4. *segfault.txt*: Segfault we are encountering after our change.
5. *clang_compile.cpp*: Snippet of code we use to compile code using LLVM.


Thanks a lot,
Nipun
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.html>
-------------- next part --------------
Samples: 696K of event 'cycles', Event count (approx.): 427355036253
-  48.98%  caching_compile  [kernel.kallsyms]          [k] __ticket_spin_lock
   - __ticket_spin_lock
      - 98.91% _raw_spin_lock
         - 57.49% futex_wake
              do_futex
              SyS_futex
            - system_call_fastpath
               - 100.00% __lll_unlock_wake
                  + 99.77% llvm::PassRegistry::getPassInfo(void const*) const
         - 38.01% futex_wait_setup
              futex_wait
              do_futex
              SyS_futex
            - system_call_fastpath
               - 100.00% __lll_lock_wait
                  + 99.86% llvm::PassRegistry::getPassInfo(void const*) const
         + 4.03% try_to_wake_up
      + 0.61% _raw_spin_lock_irq
+   4.10%  caching_compile  libpthread.so.0            [.] pthread_mutex_lock
+   1.81%  caching_compile  caching_compiler_test_old  [.]
llvm::sys::MemoryFence()
+   1.63%  caching_compile  libpthread.so.0            [.]
__pthread_mutex_unlock_usercnt
-------------- next part --------------
Samples: 113K of event 'cycles', Event count (approx.): 70370381305
+  10.38%  caching_compile  caching_compiler_test  [.]
common::RecordParser::MatchData(common::RecordParser::Record*, unsigned long)
+   5.45%  caching_compile  caching_compiler_test  [.]
common::RecordParser::Stream::FileStreamAdapter::good()
+   3.31%  caching_compile  caching_compiler_test  [.]
common::RecordParser::Stream::FileStreamAdapter::get()
+   1.17%  caching_compile  caching_compiler_test  [.]
llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*,
unsigned int)
+   1.10%  caching_compile  caching_compiler_test  [.] operator new(unsigned
long)
+   1.07%  caching_compile  caching_compiler_test  [.]
llvm::SmallPtrSetImpl::insert_imp(void const*)
+   1.00%  caching_compile  caching_compiler_test  [.]
llvm::PMDataManager::removeNotPreservedAnalysis(llvm::Pass*)
+   0.99%  caching_compile  caching_compiler_test  [.] operator delete(void*)
+   0.69%  caching_compile  caching_compiler_test  [.] llvm::sys::MemoryFence()
+   0.64%  caching_compile  caching_compiler_test  [.]
llvm::PMDataManager::initializeAnalysisImpl(llvm::Pass*)
+   0.59%  caching_compile  libc.so.6              [.] __memcpy_ssse3_back
+   0.57%  caching_compile  caching_compiler_test  [.]
llvm::PassRegistry::getPassInfo(void const*) const
+   0.54%  caching_compile  libc.so.6              [.] __memset_sse2
+   0.53%  caching_compile  caching_compiler_test  [.] (anonymous
namespace)::VectorLegalizer::LegalizeOp(llvm::SDValue)
+   0.47%  caching_compile  caching_compiler_test  [.]
llvm::PMDataManager::findAnalysisPass(void const*, bool)
-------------- next part --------------
diff --git a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
index 3633f47..44b4d56 100644
--- a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
+++ b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
@@ -130,25 +130,14 @@ private:
   PassInfo(const PassInfo &) LLVM_DELETED_FUNCTION;
 };
 
+
+
 #define CALL_ONCE_INITIALIZATION(function) \
-  static volatile sys::cas_flag initialized = 0; \
-  sys::cas_flag old_val = sys::CompareAndSwap(&initialized, 1, 0); \
-  if (old_val == 0) { \
+  static __thread int initialized = 0; \
+  if (initialized == 0) { \
     function(Registry); \
-    sys::MemoryFence(); \
-    TsanIgnoreWritesBegin(); \
-    TsanHappensBefore(&initialized); \
-    initialized = 2; \
-    TsanIgnoreWritesEnd(); \
-  } else { \
-    sys::cas_flag tmp = initialized; \
-    sys::MemoryFence(); \
-    while (tmp != 2) { \
-      tmp = initialized; \
-      sys::MemoryFence(); \
-    } \
-  } \
-  TsanHappensAfter(&initialized);
+    initialized = 1; \
+  }
 
 #define INITIALIZE_PASS(passName, arg, name, cfg, analysis) \
   static void* initialize##passName##PassOnce(PassRegistry &Registry) { \
diff --git a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
index 5d22879..8cca689 100644
--- a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
+++ b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
@@ -30,12 +30,17 @@ using namespace llvm;
 // llvm_shutdown clear this map prevents successful resurrection after
 // llvm_shutdown is run.  Ideally we should find a solution so that we
don't
 // leak the map, AND can still resurrect after shutdown.
-static ManagedStatic<PassRegistry> PassRegistryObj;
+
+// Each thread gets its own PassRegistry.
+static __thread PassRegistry* PassRegistryObj = NULL;
 PassRegistry *PassRegistry::getPassRegistry() {
-  return &*PassRegistryObj;
+  // TODO(nipun): Handle deletion.
+  if (NULL == PassRegistryObj) {
+    PassRegistryObj = new PassRegistry();
+  }
+  return PassRegistryObj;
 }
 
-static ManagedStatic<sys::SmartMutex<true> > Lock;
 
 //===----------------------------------------------------------------------===//
 // PassRegistryImpl
@@ -72,26 +77,23 @@ void *PassRegistry::getImpl() const {
 //
 
 PassRegistry::~PassRegistry() {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(pImpl);
-  
+
   for (std::vector<const PassInfo*>::iterator I =
Impl->ToFree.begin(),
        E = Impl->ToFree.end(); I != E; ++I)
     delete *I;
-  
+
   delete Impl;
   pImpl = 0;
 }
 
 const PassInfo *PassRegistry::getPassInfo(const void *TI) const {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   PassRegistryImpl::MapType::const_iterator I = Impl->PassInfoMap.find(TI);
   return I != Impl->PassInfoMap.end() ? I->second : 0;
 }
 
 const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   PassRegistryImpl::StringMapType::const_iterator
     I = Impl->PassInfoStringMap.find(Arg);
@@ -103,7 +105,6 @@ const PassInfo *PassRegistry::getPassInfo(StringRef Arg)
const {
 //
 
 void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   bool Inserted     
Impl->PassInfoMap.insert(std::make_pair(PI.getTypeInfo(),&PI)).second;
@@ -115,24 +116,22 @@ void PassRegistry::registerPass(const PassInfo &PI,
bool ShouldFree) {
   for (std::vector<PassRegistrationListener*>::iterator
        I = Impl->Listeners.begin(), E = Impl->Listeners.end(); I != E;
++I)
     (*I)->passRegistered(&PI);
-  
+
   if (ShouldFree) Impl->ToFree.push_back(&PI);
 }
 
 void PassRegistry::unregisterPass(const PassInfo &PI) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
-  PassRegistryImpl::MapType::iterator I = 
+  PassRegistryImpl::MapType::iterator I     
Impl->PassInfoMap.find(PI.getTypeInfo());
   assert(I != Impl->PassInfoMap.end() && "Pass registered but
not in map!");
-  
+
   // Remove pass from the map.
   Impl->PassInfoMap.erase(I);
   Impl->PassInfoStringMap.erase(PI.getPassArgument());
 }
 
 void PassRegistry::enumerateWith(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   for (PassRegistryImpl::MapType::const_iterator I =
Impl->PassInfoMap.begin(),
        E = Impl->PassInfoMap.end(); I != E; ++I)
@@ -160,8 +159,6 @@ void PassRegistry::registerAnalysisGroup(const void
*InterfaceID,
     assert(ImplementationInfo &&
            "Must register pass before adding to AnalysisGroup!");
 
-    sys::SmartScopedLock<true> Guard(*Lock);
-    
     // Make sure we keep track of the fact that the implementation implements
     // the interface.
     ImplementationInfo->addInterfaceImplemented(InterfaceInfo);
@@ -180,20 +177,18 @@ void PassRegistry::registerAnalysisGroup(const void
*InterfaceID,
       InterfaceInfo->setNormalCtor(ImplementationInfo->getNormalCtor());
     }
   }
-  
+
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   if (ShouldFree) Impl->ToFree.push_back(&Registeree);
 }
 
 void PassRegistry::addRegistrationListener(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   Impl->Listeners.push_back(L);
 }
 
 void PassRegistry::removeRegistrationListener(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
-  
+
   // NOTE: This is necessary, because removeRegistrationListener() can be
called
   // as part of the llvm_shutdown sequence.  Since we have no control over the
   // order of that sequence, we need to gracefully handle the case where the
-------------- next part --------------
*** Aborted at 1424812565 (unix time) try "date -d @1424812565" if you
are using GNU date ***
PC: @           0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
*** SIGSEGV (@0x20) received by PID 50599 (TID 0x7f1fab00e700) from PID 32;
stack trace: ***
    @     0x7f20d332a0d0 (unknown)
    @           0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
    @           0x51f508 llvm::PMDataManager::initializeAnalysisImpl()
    @           0x5232fb llvm::FPPassManager::runOnFunction()
    @           0x523426 llvm::FunctionPassManagerImpl::run()
    @           0x52354d llvm::FunctionPassManager::run()
    @           0x5407aa llvm::JIT::jitTheFunction()
    @           0x540db8 llvm::JIT::runJITOnFunctionUnlocked()
    @           0x540fbf llvm::JIT::getPointerToFunction()
    @           0x424793
falcon::ClangCompiler::ClangModuleHandle::GetPointerToFunction()
    @           0x40f92d
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN6falcon29CachingCompiler_PerfTest_Test8TestBodyEvEUlvE0_vEEE6_M_runEv
    @     0x7f20d23ab784 execute_native_thread_routine
    @     0x7f20d3322dc2 start_thread
    @     0x7f20d1b413fd __clone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang_compiler.cpp
Type: text/x-c++src
Size: 3554 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.cpp>

Nipun Sehrawat

2015-Feb-26 17:58 UTC

head link

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

We have root caused the segfault - it was due to a caching layer we have in
our code, which is to avoid duplicate compilations. Basically,
llvm::JIT::getPointerToFunction() looks up PassRegistry, but as our change
introduces a separate PassRegistry for each thread, this means that the
thread that calls llvm::JIT::getPointerToFunction() should have appropriate
PassRegistry setup. In our setup, some threads were witnessing a cache hit
on a code that was compiled by another thread, but when such a thread
called llvm::JIT::getPointerToFunction(), it was getting a segfault as its
PassRegistry was not setup.

Any comments on the our change to the PassRegistry?

Thanks,
Nipun

On Tue, Feb 24, 2015 at 1:50 PM, Nipun Sehrawat <nipun at thoughtspot.com>
wrote:
> Hi,
>
> We use LLVM libraries to compile C++ code and noticed slow downs when
> multiple threads of a process were compiling at once. *perf *indicated
> that most of the CPU time was spent in a spin lock, which was being
> locked/unlocked from llvm::PassRegistry::getPassInfo().
>
> We read the relevant LLVM code and found out that PassRegistry is a
> ManagedStatic and is shared among all threads in case of a multi-threaded
> setup. This sharing requires locking in PassRegistry's method, which
> becomes source of the contention. To get rid of the contention, we made a
> change to make PassRegistry thread-local and got rid of the locking. This
> removed all the contention and we noticed a 2x speed up in single thread
> compiles and 7x improvement when ten threads were compiling in parallel.
>
> Please find attached the diff for this change. We are using a old version
> of LLVM code (svn revision 170375), so the code might be quite outdated.
>
> We have two questions:
> 1. Does the change look reasonable? Or are we missing something here?
> 2. When we run  with 1000 threads compiling concurrently, we
> deterministically run into a segfault in PassRegistry lookup. Any insights
> into the segfault?
>
>
> Please find attached the following files:
> 1. *pass_registry.txt*: Git diff of our change. Note that it is against
> LLVM svn revision 170375.
> 2.* contention.txt*: Perf report with existing LLVM code - shows
> contention in llvm::PassRegistry::getPassInfo()
> 3. *no_contention.txt*: Perf report of LLVM built with our change.
> 4. *segfault.txt*: Segfault we are encountering after our change.
> 5. *clang_compile.cpp*: Snippet of code we use to compile code using LLVM.
>
>
> Thanks a lot,
> Nipun
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150226/b7aa95c6/attachment.html>

Sanjoy Das

2015-Feb-26 18:42 UTC

head link

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

Hi Nipun,

The usual way to get feedback on patches is to upload them on
phabricator ( http://llvm.org/docs/Phabricator.html ) and send them to
llvm-commits.  If you do the first part (upload patch to phabricator)
correctly, phabricator will mail your patch to llvm-commits with an
appropriate description automatically.

-- Sanjoy

On Thu, Feb 26, 2015 at 9:58 AM, Nipun Sehrawat <nipun at thoughtspot.com>
wrote:> We have root caused the segfault - it was due to a caching layer we have in
> our code, which is to avoid duplicate compilations. Basically,
> llvm::JIT::getPointerToFunction() looks up PassRegistry, but as our change
> introduces a separate PassRegistry for each thread, this means that the
> thread that calls llvm::JIT::getPointerToFunction() should have appropriate
> PassRegistry setup. In our setup, some threads were witnessing a cache hit
> on a code that was compiled by another thread, but when such a thread
called
> llvm::JIT::getPointerToFunction(), it was getting a segfault as its
> PassRegistry was not setup.
>
> Any comments on the our change to the PassRegistry?
>
> Thanks,
> Nipun
>
> On Tue, Feb 24, 2015 at 1:50 PM, Nipun Sehrawat <nipun at
thoughtspot.com>
> wrote:
>>
>> Hi,
>>
>> We use LLVM libraries to compile C++ code and noticed slow downs when
>> multiple threads of a process were compiling at once. perf indicated
that
>> most of the CPU time was spent in a spin lock, which was being
>> locked/unlocked from llvm::PassRegistry::getPassInfo().
>>
>> We read the relevant LLVM code and found out that PassRegistry is a
>> ManagedStatic and is shared among all threads in case of a
multi-threaded
>> setup. This sharing requires locking in PassRegistry's method,
which becomes
>> source of the contention. To get rid of the contention, we made a
change to
>> make PassRegistry thread-local and got rid of the locking. This removed
all
>> the contention and we noticed a 2x speed up in single thread compiles
and 7x
>> improvement when ten threads were compiling in parallel.
>>
>> Please find attached the diff for this change. We are using a old
version
>> of LLVM code (svn revision 170375), so the code might be quite
outdated.
>>
>> We have two questions:
>> 1. Does the change look reasonable? Or are we missing something here?
>> 2. When we run  with 1000 threads compiling concurrently, we
>> deterministically run into a segfault in PassRegistry lookup. Any
insights
>> into the segfault?
>>
>>
>> Please find attached the following files:
>> 1. pass_registry.txt: Git diff of our change. Note that it is against
LLVM
>> svn revision 170375.
>> 2. contention.txt: Perf report with existing LLVM code - shows
contention
>> in llvm::PassRegistry::getPassInfo()
>> 3. no_contention.txt: Perf report of LLVM built with our change.
>> 4. segfault.txt: Segfault we are encountering after our change.
>> 5. clang_compile.cpp: Snippet of code we use to compile code using
LLVM.
>>
>>
>> Thanks a lot,
>> Nipun
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Feb 2015 - [LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

Apparently Analagous Threads