I ran the LLVM regression tests today (via make check) and noticed that llvm-ranlib crashes with a Bus Error on my test system (a fairly old RedHat 9 system), using the latest CVS version. I did some digging and I think I know what the problem is. I have attached a quick and dirty patch that fixes the problem for me, but I need a suggestion about how it should be integrated properly. Here are the details:

To reproduce the crash, run llvm-ranlib on the "GNU.a" file in the llvm/test/Regression/Archive directory (make a copy first: it corrupts it). It then crashes with a Bus Error.

The stack trace is:

#0  0x4207c1aa in memcpy () from /lib/tls/libc.so.6
#1  0x400d55e8 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, int) () from /usr/lib/libstdc++.so.5
#2  0x4009c818 in std::basic_filebuf<char, std::char_traits<char> >::xsputn(char const*, int) () from /usr/lib/libstdc++.so.5
#3  0x400cbed1 in std::ostream::write(char const*, int) () from /usr/lib/libstdc++.so.5
#4  0x0829c9d0 in llvm::Archive::writeMember(llvm::ArchiveMember const&, std::basic_ofstream<char, std::char_traits<char> >&, bool, bool, bool) (this=0x8356088, member=@0x8356180, ARFile=@0xbfffd630, CreateSymbolTable=false, TruncateNames=false, ShouldCompress=false) at ArchiveWriter.cpp:294
#5  0x0829d297 in llvm::Archive::writeToDisk(bool, bool, bool) (this=0x8356088, CreateSymbolTable=true, TruncateNames=false, Compress=false) at ArchiveWriter.cpp:439
#6  0x081a5618 in main (argc=2, argv=0xbfffd9b4) at llvm-ranlib.cpp:76
#7  0x42015574 in __libc_start_main () from /lib/tls/libc.so.6

The code at frame #4 (Archive::writeMember) looks like this:

>   // Write the (possibly compressed) member's content to the file.
>   ARFile.write(data,fSize);

If I examine the backtrace, fSize equals 46, and "data" points to 46 null bytes. However, the "data" pointer is invalid, since if I inspect it *before* the crash, the crash does not occur.

Frame #5 (Archive::writeToDisk) looks like this:

>   // If there is a foreign symbol table, put it into the file now. Most
>   // ar(1) implementations require the symbol table to be first but llvm-ar
>   // can deal with it being after a foreign symbol table. This ensures
>   // compatibility with other ar(1) implementations as well as allowing the
>   // archive to store both native .o and LLVM .bc files, both indexed.
>   if (foreignST) {
>     writeMember(*foreignST, FinalFile, false, false, false);
>   }

So I tracked back the foreignST pointer, and when it is set the "data" pointer is *not* 46 null bytes. It is valid data mmap-ed from the archive file. But when it gets to the call to writeMember, that data pointer is no longer valid. Running "strace" on llvm-ranlib solved the mystery. Here are the relevant calls:

open("temp.GNU.a", O_RDONLY) = 13
fstat64(13, {st_mode=S_IFREG|0600, st_size=4210, ...}) = 0
mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE, 13, 0) = 0x40017000

** The source file is mapped, and a lot of stuff happens **

open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 15
fstat64(15, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0

** Here the source file is TRUNCATED. Essentially, this invalidates the data pointer. Two lines follow in the trace: **

_llseek(15, 0, [0], SEEK_CUR) = 0
--- SIGBUS (Bus error) @ 0 (0) ---

So the fix is pretty simple: before opening the file again, unlink it. This has the effect of creating a *new* file, instead of overwriting the old data. I've attached my quick-and-dirty patch that will only work on Unix. I'm not sure how this should be solved correctly.

The other strange part is why hasn't anyone else seen this problem? I would think that this would occur pretty reliably on all systems. Any ideas?

Evan Jones

-------------- next part --------------
A non-text attachment was scrubbed...
Name: archive.patch
Type: application/octet-stream
Size: 1071 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051122/e5d04595/attachment.obj>

--
Evan Jones
http://evanjones.ca/

Evan,

Your patch uses an operating system call that is not portable. All non-portable code needs to be located in the lib/System library.

I'm not sure why this problem appears on an old Red Hat system. Perhaps the C++ io library is not up to snuff on that platform? What compiler are you using?

Reid.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

On Nov 22, 2005, at 17:18, Reid Spencer wrote:

> Your patch uses an operating system call that is not portable. All
> non-portable code needs to be located in the lib/System library.

Yep! I know. That is why I posted it for discussion. I'm not sure if this is the "right" way to fix the problem, or if there is a different fix that should be applied (like perhaps copying the data out of the mmap-ed archive?).

> I'm not sure why this problem appears on an old Red Hat system.
> Perhaps the C++ io library is not up to snuff on that platform? What
> compiler are you using?

It is very strange to me that it doesn't appear on other systems. I'll try to load LLVM on my bleeding edge Debian laptop tomorrow and see what happens there.

I am pretty certain that this has nothing to do with the C++ library, and everything to do with the behaviour of mmap when the file that was mmap-ed is modified. I can actually reproduce this behaviour with the attached C test case. The program mmaps a file called 'data', prints the last byte, truncates the file, then tries to read the last byte again. It causes a Bus Error on both the RedHat system and my Mac OS X workstation. Hence, this appears to be valid (or at least common) mmap behaviour.

rn-spra1c07:~ ejones$ dd if=/dev/zero of=data bs=1 count=4096
4096+0 records in
4096+0 records out
4096 bytes transferred in 0.067263 secs (60895 bytes/sec)
rn-spra1c07:~ ejones$ ./mmaptest
last byte = 0x00
Bus error

I can also reproduce it with a minimal LLVM example, also attached. That program needs the "GNU.a" file in the current directory. It opens the archive and scans through all the members, printing out the first byte of each one. Then it truncates the file and repeats that experiment. It also causes a Bus Error.

Essentially, this is what happens in ArchiveWriter.cpp:429. This bug will be triggered by any archive that has a native symbol table, since that member (foreignST) references data that was mmap-ed from the original file. All the other members are copied from the temporary archive, so they are not a problem.

Evan Jones

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mmaptest.c
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051122/fc0b2085/attachment.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: llvm-buserror.cpp
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051122/fc0b2085/attachment.ksh>

--
Evan Jones
http://evanjones.ca/
