Hi,

I am observing a deadlock with llvm-lit on Windows 7. When I attached a debugger, the communicate() call was blocked. In file utils/lit/lit/TestRunner.py:

> # FIXME: There is probably still deadlock potential here. Yawn.
> procData = [None] * len(procs)
> procData[-1] = procs[-1].communicate()

I am invoking Python directly on Windows to run the unit tests:

C:\Python27\python.exe C:\build\llvm\Release\bin\llvm-lit.py -v -j 12 --param build_mode=Release --param build_config=Win32 llvm_site_config=C:\llvm_on_win\nightly\build\llvm\tools\polly\test\lit.site.cfg test

Note: if I invoke it with "-j 1", the unit tests finish (though they take a long time) and there is no deadlock. I am using Python 2.7.6.

At this moment, I believe the issue is caused by stdout filling the OS buffer, thereby blocking the communicate() call. It is possible that some of the unit tests dump a lot of text/data to stdout. FYI, I have a couple of unit tests of my own in the code base.

On Linux there is no deadlock, but on Windows I hit a deadlock 7 out of 10 times. I tried invoking Python with "-u", but in vain.

When I looked at the llvm-lit code, I saw code meant to avoid deadlocks, but with no guarantee, like the snippet I pasted above. I would appreciate it if someone could take a look at it and provide more context on these deadlocks.

--Sumanth G
Have you been able to isolate the test which causes the deadlock? You should be able to get close to figuring it out by seeing which processes are hung and maybe which tests aren't finished...

 - Daniel

On Mon, Jan 26, 2015 at 9:57 AM, Sumanth Gundapaneni <sgundapa at codeaurora.org> wrote:
> Hi,
>
> I am observing a deadlock with llvm-lit on windows 7.
> [...]
Honestly, lit should just create temp files and get out of the business of polling and reading from subprocess pipes.

I think you have more or less diagnosed the problem, with the caveat that communicate() will not block because that process's own pipes are full. It is more likely that some other process is blocked writing to a full pipe, and the process under communication is also waiting on that pipe. Consider this pipeline:

    llc -debug -mtriple=x86_64-linux < %s | FileCheck %s

In this case, llc will dump lots of text to stderr, which is piped to lit. That buffer will fill and llc's writes will block, so llc never finishes writing its stdout and FileCheck never sees EOF. lit will 'communicate' with FileCheck, and no progress will be made.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
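A minimal sketch of the failure mode Reid describes, assuming Python 3.3+ for the `timeout` argument (Python 2.7's subprocess module has none). PRODUCER and CONSUMER are hypothetical stand-ins for `llc -debug` and `FileCheck`, and run_with_pipes / run_with_temp_files are illustrative helpers, not lit's actual TestRunner code. The first runner pipes every stream back to the parent but only drains the last process, as in the TestRunner.py snippet quoted at the top of the thread; the second follows Reid's suggestion and sends the parent-facing output to temp files.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for `llc -debug ... < %s | FileCheck %s`:
# the producer floods stderr (like -debug output) before printing its real
# output; the consumer reads all of stdin (like FileCheck) and prints its length.
PRODUCER = [sys.executable, "-c",
            "import sys; sys.stderr.write('d' * (1 << 20)); "
            "sys.stderr.flush(); print('ok')"]
CONSUMER = [sys.executable, "-c",
            "import sys; print(len(sys.stdin.read()))"]


def run_with_pipes(timeout=5.0):
    """Pipe everything back to the parent, but only drain the last process."""
    p1 = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    p2 = subprocess.Popen(CONSUMER, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # p2 now holds the only read end of the p1 -> p2 pipe
    try:
        out, _ = p2.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        # p1 is blocked writing to its full stderr pipe, so it never prints
        # 'ok' or exits, and p2 never sees EOF on stdin.
        return None
    finally:
        for p in (p2, p1):
            if p.poll() is None:
                p.kill()
            p.wait()
        p1.stderr.close()
        p2.stdout.close()


def run_with_temp_files():
    """Send parent-facing output to temp files; keep only the p1 -> p2 pipe."""
    with tempfile.TemporaryFile() as err, tempfile.TemporaryFile() as out:
        p1 = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE, stderr=err)
        p2 = subprocess.Popen(CONSUMER, stdin=p1.stdout, stdout=out)
        p1.stdout.close()
        # The parent reads no pipes here, so waiting on either child is safe.
        p2.wait()
        p1.wait()
        out.seek(0)
        err.seek(0)
        return out.read(), len(err.read())
```

On a typical Linux machine, run_with_pipes(timeout=2.0) should give up and return None, while run_with_temp_files() completes with the consumer's output (b"3\n") and the full 1 MiB of stderr. The design point is that once the parent reads no pipes, no child can stall waiting for the parent; the only remaining pipe connects two children that are both making progress.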
The Python 2.7.2 docs specifically warn that calling wait() when stdout or stderr is PIPE, or using .stdin.write, .stdout.read or .stderr.read directly, can cause deadlocks. This happens when an OS pipe buffer fills up, so the root-cause analysis earlier in this thread is correct.

I see uses of all of those in this script, but I am not well versed in Python and can't offer any suggestions other than the one on the linked page: use communicate() for everything.

https://python.readthedocs.org/en/v2.7.2/library/subprocess.html

Cheers,
Gordon Keiser
Software Development Engineer
Arxan Technologies

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Reid Kleckner
> Sent: Monday, January 26, 2015 4:32 PM
>
> Honestly, lit should just create temp files and get out of the business of polling and reading from subprocess pipes.
> [...]
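The two patterns from that subprocess page can be contrasted in a minimal Python 3 sketch. It assumes Python 3.3+ for the `timeout` argument to wait(), and CHILD is a hypothetical noisy command, not one of the LLVM tools. Calling wait() before draining the pipes lets the child fill a pipe and stall, while communicate() reads stdout and stderr concurrently until EOF.

```python
import subprocess
import sys

# Hypothetical noisy child: writes 1 MiB to stdout, then 1 MiB to stderr,
# far more than a typical OS pipe buffer holds.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write('o' * (1 << 20)); sys.stdout.flush(); "
         "sys.stderr.write('e' * (1 << 20))"]


def wait_before_reading(timeout=2.0):
    """The pattern the docs warn about: wait() while the pipes are undrained."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        proc.wait(timeout=timeout)  # child stalls on a full stdout pipe
        return "finished"
    except subprocess.TimeoutExpired:
        return "stuck"
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.communicate()  # drain remaining output and reap the child


def use_communicate():
    """The recommended pattern: communicate() drains both pipes until EOF."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, len(out), len(err)
```

As Reid points out in this thread, communicate() only drains the pipes of the process it is called on; it does not help when a different process in the same pipeline is the one blocked on a full pipe.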
I think I am observing this on multiple test cases, but I haven't narrowed it down to any particular one.

--Sumanth G

From: daniel.dunbar at gmail.com [mailto:daniel.dunbar at gmail.com] On Behalf Of Daniel Dunbar
Sent: Monday, January 26, 2015 1:29 PM
To: Sumanth Gundapaneni
Cc: LLVM Developers Mailing List; Rafael Ávila de Espíndola
Subject: Re: Deadlock in llvm-lit on windows 7

Have you been able to isolate the test which causes the deadlock? [...]
I agree with you, Reid, on creating temp files.

> In this case, llc will dump lots of text to stderr, which is piped to lit. That buffer will fill and writes will block. lit will 'communicate' with FileCheck, and no progress will be made.

This is exactly what I was trying to convey.

--Sumanth G

From: Reid Kleckner [mailto:rnk at google.com]
Sent: Monday, January 26, 2015 1:32 PM
To: Sumanth Gundapaneni
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Deadlock in llvm-lit on windows 7

Honestly, lit should just create temp files and get out of the business of polling and reading from subprocess pipes. [...]