Lang Hames via llvm-dev
2020-May-21 23:53 UTC
[llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert
Hi Adam, Calls to the listeners should be protected by the RTDyldLayerMutex. Could you apply the attached patch and share the debugging output from one of the failing runs? Regards, Lang. On Wed, May 20, 2020 at 8:00 PM David Blaikie <dblaikie at gmail.com> wrote:> +Lang > > On Wed, May 20, 2020 at 4:44 PM Straw, Adam D via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi all, >> >> Attention: Lang Hames >> >> >> >> I am developing the nGraph MLIR >> <https://github.com/NervanaSystems/ngraph/tree/master/src/contrib/mlir> >> implementation and hitting the following assert while running nGraph unit >> tests: >> >> >> >> assert(ObjectBufferMap.find(K) == ObjectBufferMap.end() && >> >> "Second attempt to perform debug registration."); >> >> >> >> Here is a permalink >> <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L163> >> for that line of code on GitHub. The failure only occurs when running >> multiple unit tests back-to-back. Inevitably, if I rerun the failing unit >> test, it passes. The failure also tends to move around with a different >> unit tests failing on successive runs. >> >> >> >> I am able to hit the failure in GDB. Here is a partial backtrace: >> >> >> >> #3 0x00007fffec4cf412 in __GI___assert_fail (assertion=assertion at entry=0x7ffff1d145e0 >> "ObjectBufferMap.find(K) == ObjectBufferMap.end() && \"Second attempt to >> perform debug registration.\"", >> >> file=file at entry=0x7ffff1d144d8 >> "/localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp", >> line=line at entry=164, >> >> function=function at entry=0x7ffff1d15520 <(anonymous >> namespace)::GDBJITRegistrationListener::notifyObjectLoaded(unsigned long, >> llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo >> const&)::__PRETTY_FUNCTION__> "virtual void >> {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey, >> const llvm::object::ObjectFile&, const >> llvm::RuntimeDyld::LoadedObjectInfo&)") >> >> at assert.c:101 >> >> #4 0x00007ffff01ed4d3 in (anonymous >> namespace)::GDBJITRegistrationListener::notifyObjectLoaded >> (this=0x5555591a9270, K=93825060893584, Obj=..., L=...) >> >> at >> /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:163 >> >> #5 0x00007ffff01b0eca in std::function<void (unsigned long, >> llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo >> const&)>::operator()(unsigned long, llvm::object::ObjectFile const&, >> llvm::RuntimeDyld::LoadedObjectInfo const&) const (__args#2=..., >> __args#1=..., __args#0=<optimized out>, this=0x5555594e09b8) at >> /usr/include/c++/7/bits/std_function.h:706 >> >> #6 llvm::orc::RTDyldObjectLinkingLayer::onObjLoad (this=<optimized out>, >> K=3, R=..., Obj=..., MemMgr=0x5555591c4160, >> LoadedObjInfo=std::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo> = {...}, >> >> Resolved=std::map with 8 elements = {...}, InternalSymbols=std::set >> with 0 elements) >> >> at >> /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:267 >> >> >> >> A little debug context: >> >> >> >> - *Frame #6* with K=3 we are about to execute the `NotifyLoaded` >> callback >> - *Frame #5* we execute the `NotifyLoaded` callback which apparently >> transforms K from 3 to 93825060893584 (0x5555596CF390) which looks like a >> pointer. I don’t know the code well enough to grok what’s going on here. >> I’m not even sure where this callback resides, but it could be here >> <https://github.com/llvm/llvm-project/blob/c66f89005f6d23b6885d8f93f33ff27dc60ce7dd/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp#L312> >> in RTDyldObjectLinkingLater.cpp where we are using the address of the >> `MemMgr` as K. >> - *Frame #4 *now in `notifyObjectLoaded` with K=93825060893584 which >> will go on to assert in *frame #3* due to the fact that that key (K) >> already exists in the map >> >> >> >> All of this leads me to believe there is some sort of race. Theory: >> with two unit tests A and B running back to back… >> >> >> >> 1. Unit test A allocates and deallocates a `MemMgr` (or whatever >> other object we are using for a key) at address X >> 2. Unit test A calls ` notifyFreeingObject` to free the object at K=X >> but gets stuck (does not lock) the ` JITDebugLock` mutex here >> <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L181> >> for whatever reason >> 3. Meanwhile unit test B allocates a `MemMgr` (or whatever other >> object we are using for a key) at newly freed address X >> 4. Unit test B calls `notifyObjectLoaded` to register the object at >> K=X and locks the ` JITDebugLock` here >> <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L162> >> with unit test A still waiting in step #2 above >> 5. This goes on to assert as the key was not erased (step #2) before >> it was added (step #4) >> >> >> >> I could use a little help to debug the error further. Curiously, we do >> hit this issue in continuous integration (CI) --- only when running on >> development systems “by hand”. I am investigating the differences (perhaps >> single vs. multi threaded?) as to why this might be the case. >> >> >> >> Note that I am using a slightly old version of MLIR master, I’m at this >> commit: >> >> >> >> commit 2f8b4545f4960778e37114c024073d208751ca89 >> >> Author: Adam Straw <adam.d.straw at intel.com> >> >> Date: Tue Apr 14 22:49:18 2020 +0300 >> >> >> >> [mlir] Fix assert on signed integer type in EDSC >> >> >> >> Integer type in Std dialect is signless so we should be checking >> >> for signless integer type instead of signed integer type in EDSC. >> >> >> >> Differential Revision: https://reviews.llvm.org/D78144 >> >> >> >> Thanks in advance for the help. >> >> >> >> Adam >> >> >> >> Note: I am out of office this Fri and the following Mon but otherwise >> should be prompt with email replies. >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200521/f30cc7c9/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: rtdyldlayer-debug-out.patch Type: application/octet-stream Size: 1431 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200521/f30cc7c9/attachment.obj>
Straw, Adam D via llvm-dev
2020-May-26 16:41 UTC
[llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert
Sorry for the delay. Hope everyone had a good long weekend. See attached log captured from GDB session. Here is the punchline: Adding MemMgr 0x55555959f440 unit-test: /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:164: virtual void {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey, const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&): Assertion `ObjectBufferMap.find(K) == ObjectBufferMap.end() && "Second attempt to perform debug registration."' failed. (gdb) print/x K $2 = 0x555559605c50 The key (K) causing the assert a) does not match the addresses of the most recently added MemMgr and b) does not match the address of ANY MemMgr in the entire log file. The plot thickens. Wondering if there is some other object (besides a MemMgr) that registers its address-as-key into `ObjectBufferMap`. Adam From: Lang Hames <lhames at gmail.com> Sent: Thursday, May 21, 2020 4:53 PM To: David Blaikie <dblaikie at gmail.com> Cc: Straw, Adam D <adam.d.straw at intel.com>; llvm-dev at lists.llvm.org; Cyphers, Scott <scott.cyphers at intel.com>; Bobba, Jayaram <jayaram.bobba at intel.com>; Mostafa, Nagy H <nagy.h.mostafa at intel.com> Subject: Re: [llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert Hi Adam, Calls to the listeners should be protected by the RTDyldLayerMutex. Could you apply the attached patch and share the debugging output from one of the failing runs? Regards, Lang. On Wed, May 20, 2020 at 8:00 PM David Blaikie <dblaikie at gmail.com<mailto:dblaikie at gmail.com>> wrote: +Lang On Wed, May 20, 2020 at 4:44 PM Straw, Adam D via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hi all, Attention: Lang Hames I am developing the nGraph MLIR<https://github.com/NervanaSystems/ngraph/tree/master/src/contrib/mlir> implementation and hitting the following assert while running nGraph unit tests: assert(ObjectBufferMap.find(K) == ObjectBufferMap.end() && "Second attempt to perform debug registration."); Here is a permalink<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L163> for that line of code on GitHub. The failure only occurs when running multiple unit tests back-to-back. Inevitably, if I rerun the failing unit test, it passes. The failure also tends to move around with a different unit tests failing on successive runs. I am able to hit the failure in GDB. Here is a partial backtrace: #3 0x00007fffec4cf412 in __GI___assert_fail (assertion=assertion at entry=0x7ffff1d145e0 "ObjectBufferMap.find(K) == ObjectBufferMap.end() && \"Second attempt to perform debug registration.\"", file=file at entry=0x7ffff1d144d8 "/localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp", line=line at entry=164, function=function at entry=0x7ffff1d15520 <(anonymous namespace)::GDBJITRegistrationListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&)::__PRETTY_FUNCTION__> "virtual void {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey, const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&)") at assert.c:101 #4 0x00007ffff01ed4d3 in (anonymous namespace)::GDBJITRegistrationListener::notifyObjectLoaded (this=0x5555591a9270, K=93825060893584, Obj=..., L=...) at /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:163 #5 0x00007ffff01b0eca in std::function<void (unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&)>::operator()(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) const (__args#2=..., __args#1=..., __args#0=<optimized out>, this=0x5555594e09b8) at /usr/include/c++/7/bits/std_function.h:706 #6 llvm::orc::RTDyldObjectLinkingLayer::onObjLoad (this=<optimized out>, K=3, R=..., Obj=..., MemMgr=0x5555591c4160, LoadedObjInfo=std::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo> = {...}, Resolved=std::map with 8 elements = {...}, InternalSymbols=std::set with 0 elements) at /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:267 A little debug context: * Frame #6 with K=3 we are about to execute the `NotifyLoaded` callback * Frame #5 we execute the `NotifyLoaded` callback which apparently transforms K from 3 to 93825060893584 (0x5555596CF390) which looks like a pointer. I don’t know the code well enough to grok what’s going on here. I’m not even sure where this callback resides, but it could be here<https://github.com/llvm/llvm-project/blob/c66f89005f6d23b6885d8f93f33ff27dc60ce7dd/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp#L312> in RTDyldObjectLinkingLater.cpp where we are using the address of the `MemMgr` as K. * Frame #4 now in `notifyObjectLoaded` with K=93825060893584 which will go on to assert in frame #3 due to the fact that that key (K) already exists in the map All of this leads me to believe there is some sort of race. Theory: with two unit tests A and B running back to back… 1. Unit test A allocates and deallocates a `MemMgr` (or whatever other object we are using for a key) at address X 2. Unit test A calls ` notifyFreeingObject` to free the object at K=X but gets stuck (does not lock) the ` JITDebugLock` mutex here<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L181> for whatever reason 3. Meanwhile unit test B allocates a `MemMgr` (or whatever other object we are using for a key) at newly freed address X 4. Unit test B calls `notifyObjectLoaded` to register the object at K=X and locks the ` JITDebugLock` here<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L162> with unit test A still waiting in step #2 above 5. This goes on to assert as the key was not erased (step #2) before it was added (step #4) I could use a little help to debug the error further. Curiously, we do hit this issue in continuous integration (CI) --- only when running on development systems “by hand”. I am investigating the differences (perhaps single vs. multi threaded?) as to why this might be the case. Note that I am using a slightly old version of MLIR master, I’m at this commit: commit 2f8b4545f4960778e37114c024073d208751ca89 Author: Adam Straw <adam.d.straw at intel.com<mailto:adam.d.straw at intel.com>> Date: Tue Apr 14 22:49:18 2020 +0300 [mlir] Fix assert on signed integer type in EDSC Integer type in Std dialect is signless so we should be checking for signless integer type instead of signed integer type in EDSC. Differential Revision: https://reviews.llvm.org/D78144 Thanks in advance for the help. Adam Note: I am out of office this Fri and the following Mon but otherwise should be prompt with email replies. _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/6db7f0f3/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: log2 Type: application/octet-stream Size: 88604 bytes Desc: log2 URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/6db7f0f3/attachment-0001.obj>
Straw, Adam D via llvm-dev
2020-May-26 20:16 UTC
[llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert
Referring to the log messages from my previous mail… I confused myself (and probably others) by reading the “Adding MemMgr 0x55555959f440“ message as “Registering MemMgr 0x55555959f440”. Thus the address mismatch made no sense. How could we be registering a `MemMgr` address/key that does not match once we arrive in `notifyObjectLoaded` method? Answer: Because the registrations is NOT coming from RTDyldObjectLinkingLayer.cpp. I found this first by adding an additional debug statement (highlighted) thinking the `reinterpret_cast` may be at fault for the key mismatch. I never saw this print and I then realized my “Adding” vs. “Registering” mistake. // Run EventListener notifyLoaded callbacks. { std::lock_guard<std::mutex> Lock(RTDyldLayerMutex); dbgs() << "Registering MemMgr " << MemMgr << "\n"; auto LOIItr = LoadedObjInfos.find(MemMgr); assert(LOIItr != LoadedObjInfos.end() && "LoadedObjInfo missing"); for (auto *L : EventListeners) { dbgs() << " key = " << reinterpret_cast<uintptr_t>(MemMgr) << "\n"; L->notifyObjectLoaded( static_cast<uint64_t>(reinterpret_cast<uintptr_t>(MemMgr)), *Obj, *LOIItr->second); } LoadedObjInfos.erase(MemMgr); } So, where is the registration coming from? Remember that I am on a slightly older version of MLIR. My mlir/lib/ExecutionEngine/ExecutionEngine.cpp has the following code with highlighted debug statement added by me: objectLayer->setNotifyLoaded( [engine = engine.get()]( llvm::orc::VModuleKey, const llvm::object::ObjectFile &object, const llvm::RuntimeDyld::LoadedObjectInfo &objectInfo) { if (engine->gdbListener) { uint64_t key = static_cast<uint64_t>( reinterpret_cast<uintptr_t>(object.getData().data())); dbgs() << "ExecutionEngine key = " << key << "\n"; engine->gdbListener->notifyObjectLoaded(key, object, objectInfo); } }); The key printed here matches the key printed in `notifyObjectLoaded`. This code is NOT present at the tip of MLIR master. Hence the solution here may be to move to latest MLIR master which I probably should have tried before sending this mail. Will do that now. Adam From: Straw, Adam D Sent: Tuesday, May 26, 2020 9:41 AM To: Lang Hames <lhames at gmail.com>; David Blaikie <dblaikie at gmail.com> Cc: llvm-dev at lists.llvm.org; Cyphers, Scott <Scott.Cyphers at intel.com>; Bobba, Jayaram <jayaram.bobba at intel.com>; Mostafa, Nagy H <Nagy.H.Mostafa at intel.com>; Iyer, Swapna <swapna.iyer at intel.com> Subject: RE: [llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert Sorry for the delay. Hope everyone had a good long weekend. See attached log captured from GDB session. Here is the punchline: Adding MemMgr 0x55555959f440 unit-test: /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:164: virtual void {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey, const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&): Assertion `ObjectBufferMap.find(K) == ObjectBufferMap.end() && "Second attempt to perform debug registration."' failed. (gdb) print/x K $2 = 0x555559605c50 The key (K) causing the assert a) does not match the addresses of the most recently added MemMgr and b) does not match the address of ANY MemMgr in the entire log file. The plot thickens. Wondering if there is some other object (besides a MemMgr) that registers its address-as-key into `ObjectBufferMap`. Adam From: Lang Hames <lhames at gmail.com<mailto:lhames at gmail.com>> Sent: Thursday, May 21, 2020 4:53 PM To: David Blaikie <dblaikie at gmail.com<mailto:dblaikie at gmail.com>> Cc: Straw, Adam D <adam.d.straw at intel.com<mailto:adam.d.straw at intel.com>>; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; Cyphers, Scott <scott.cyphers at intel.com<mailto:scott.cyphers at intel.com>>; Bobba, Jayaram <jayaram.bobba at intel.com<mailto:jayaram.bobba at intel.com>>; Mostafa, Nagy H <nagy.h.mostafa at intel.com<mailto:nagy.h.mostafa at intel.com>> Subject: Re: [llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert Hi Adam, Calls to the listeners should be protected by the RTDyldLayerMutex. Could you apply the attached patch and share the debugging output from one of the failing runs? Regards, Lang. On Wed, May 20, 2020 at 8:00 PM David Blaikie <dblaikie at gmail.com<mailto:dblaikie at gmail.com>> wrote: +Lang On Wed, May 20, 2020 at 4:44 PM Straw, Adam D via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Hi all, Attention: Lang Hames I am developing the nGraph MLIR<https://github.com/NervanaSystems/ngraph/tree/master/src/contrib/mlir> implementation and hitting the following assert while running nGraph unit tests: assert(ObjectBufferMap.find(K) == ObjectBufferMap.end() && "Second attempt to perform debug registration."); Here is a permalink<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L163> for that line of code on GitHub. The failure only occurs when running multiple unit tests back-to-back. Inevitably, if I rerun the failing unit test, it passes. The failure also tends to move around with a different unit tests failing on successive runs. I am able to hit the failure in GDB. Here is a partial backtrace: #3 0x00007fffec4cf412 in __GI___assert_fail (assertion=assertion at entry=0x7ffff1d145e0 "ObjectBufferMap.find(K) == ObjectBufferMap.end() && \"Second attempt to perform debug registration.\"", file=file at entry=0x7ffff1d144d8 "/localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp", line=line at entry=164, function=function at entry=0x7ffff1d15520 <(anonymous namespace)::GDBJITRegistrationListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&)::__PRETTY_FUNCTION__> "virtual void {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey, const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&)") at assert.c:101 #4 0x00007ffff01ed4d3 in (anonymous namespace)::GDBJITRegistrationListener::notifyObjectLoaded (this=0x5555591a9270, K=93825060893584, Obj=..., L=...) at /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:163 #5 0x00007ffff01b0eca in std::function<void (unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&)>::operator()(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) const (__args#2=..., __args#1=..., __args#0=<optimized out>, this=0x5555594e09b8) at /usr/include/c++/7/bits/std_function.h:706 #6 llvm::orc::RTDyldObjectLinkingLayer::onObjLoad (this=<optimized out>, K=3, R=..., Obj=..., MemMgr=0x5555591c4160, LoadedObjInfo=std::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo> = {...}, Resolved=std::map with 8 elements = {...}, InternalSymbols=std::set with 0 elements) at /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:267 A little debug context: * Frame #6 with K=3 we are about to execute the `NotifyLoaded` callback * Frame #5 we execute the `NotifyLoaded` callback which apparently transforms K from 3 to 93825060893584 (0x5555596CF390) which looks like a pointer. I don’t know the code well enough to grok what’s going on here. I’m not even sure where this callback resides, but it could be here<https://github.com/llvm/llvm-project/blob/c66f89005f6d23b6885d8f93f33ff27dc60ce7dd/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp#L312> in RTDyldObjectLinkingLater.cpp where we are using the address of the `MemMgr` as K. * Frame #4 now in `notifyObjectLoaded` with K=93825060893584 which will go on to assert in frame #3 due to the fact that that key (K) already exists in the map All of this leads me to believe there is some sort of race. Theory: with two unit tests A and B running back to back… 1. Unit test A allocates and deallocates a `MemMgr` (or whatever other object we are using for a key) at address X 2. Unit test A calls ` notifyFreeingObject` to free the object at K=X but gets stuck (does not lock) the ` JITDebugLock` mutex here<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L181> for whatever reason 3. Meanwhile unit test B allocates a `MemMgr` (or whatever other object we are using for a key) at newly freed address X 4. Unit test B calls `notifyObjectLoaded` to register the object at K=X and locks the ` JITDebugLock` here<https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L162> with unit test A still waiting in step #2 above 5. This goes on to assert as the key was not erased (step #2) before it was added (step #4) I could use a little help to debug the error further. Curiously, we do hit this issue in continuous integration (CI) --- only when running on development systems “by hand”. I am investigating the differences (perhaps single vs. multi threaded?) as to why this might be the case. Note that I am using a slightly old version of MLIR master, I’m at this commit: commit 2f8b4545f4960778e37114c024073d208751ca89 Author: Adam Straw <adam.d.straw at intel.com<mailto:adam.d.straw at intel.com>> Date: Tue Apr 14 22:49:18 2020 +0300 [mlir] Fix assert on signed integer type in EDSC Integer type in Std dialect is signless so we should be checking for signless integer type instead of signed integer type in EDSC. Differential Revision: https://reviews.llvm.org/D78144 Thanks in advance for the help. Adam Note: I am out of office this Fri and the following Mon but otherwise should be prompt with email replies. _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/af98b28a/attachment.html>
Possibly Parallel Threads
- [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert
- [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert
- Invalid Signature of orc::RTDyldObjectLinkingLayer::NotifyLoadedFtor
- ORC Assertion failure
- Debugging JIT'ed code with ORC JIT?