On Nov 22, 2005, at 19:10, Reid Spencer wrote:> 1. What is the path name associated with TmpArchive? If its the same > as the path name associated with archPath then that's a bug, probably > introduced when Path::makeUnique is called from > Path::createTemporaryFileOnDisk which is called from line 377 of > ArchiveWriter.cpp.This does not appear to be the problem. I excluded the lines from the strace that created this temporary file. After line 377: (gdb) p TmpArchive $2 = {path = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x835407c "temp.GNU.a-PozKFJ"}, static _S_empty_rep_storage = {0, 0, 4, 0}}} (gdb) p archPath $3 = {path = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage = {0, 0, 4, 0}}} So these two variables are pointing to different files, and the creation of TmpArchive works just fine. The strace including the parts that reference the temporary file is appended to the end of the email. The very last open to the "archPath" file is at line 429, where it truncates it even though the foreignST pointer refers to data mmaped in that file. Is this data supposed to be copied out of the original file, or is another temporary supposed to be created and then the original could be replaced using a file move operation instead? I'll try this on my Debian unstable system tomorrow. If ranlib works there, maybe I can track down the difference. Evan Jones open("temp.GNU.a", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0600, st_size=4210, ...}) = 0 mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000 gettimeofday({1132714484, 283020}, NULL) = 0 getpid() = 28656 open("temp.GNU.a-O1Q6E8", O_RDWR|O_CREAT|O_EXCL, 0600) = 4 close(4) = 0 open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 close(4) = 0 *** SIGNAL HANDLING REMOVED *** open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 brk(0) = 0x8357000 brk(0x8359000) = 0x8359000 fstat64(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40019000 _llseek(4, 0, [0], SEEK_CUR) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 brk(0) = 0x8359000 brk(0x8369000) = 0x8369000 brk(0) = 0x8369000 brk(0x836a000) = 0x836a000 mmap2(NULL, 2002944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4013b000 munmap(0x4013b000, 2002944) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 write(4, "!<arch>\nevenlen/ 11008330"..., 4040) = 4040 close(4) = 0 munmap(0x40019000, 4096) = 0 access("temp.GNU.a-O1Q6E8", F_OK) = 0 open("temp.GNU.a-O1Q6E8", O_RDONLY) = 4 fstat64(4, {st_mode=S_IFREG|0600, st_size=4040, ...}) = 0 mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40019000 open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5 fstat64(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001a000 _llseek(5, 0, [0], SEEK_CUR) = 0 --- SIGBUS (Bus error) @ 0 (0) --- -- Evan Jones http://evanjones.ca/
Evan Jones wrote:> On Nov 22, 2005, at 19:10, Reid Spencer wrote: > >> 1. What is the path name associated with TmpArchive? If its the same >> as the path name associated with archPath then that's a bug, probably >> introduced when Path::makeUnique is called from >> Path::createTemporaryFileOnDisk which is called from line 377 of >> ArchiveWriter.cpp. > > > This does not appear to be the problem. I excluded the lines from the > strace that created this temporary file. After line 377: > > (gdb) p TmpArchive > $2 = {path = {static npos = 4294967295, > _M_dataplus = {<allocator<char>> = {<No data fields>}, > _M_p = 0x835407c "temp.GNU.a-PozKFJ"}, static _S_empty_rep_storageThat looks like mkstemp working correctly.> = {0, > 0, 4, 0}}} > (gdb) p archPath > $3 = {path = {static npos = 4294967295, > _M_dataplus = {<allocator<char>> = {<No data fields>}, > _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage =What's with the "5\b" at the end? Looks like garbage to me. Not sure what's up with that.> {0, 0, > 4, 0}}} > > So these two variables are pointing to different files, and the creation > of TmpArchive works just fine. The strace including the parts that > reference the temporary file is appended to the end of the email.Okay, the previous strace looked like they were both opening "temp.GNU.a"; perhaps the names were truncated in the strace. Anyway, you're right, this isn't the problem if the file names are different.> The > very last open to the "archPath" file is at line 429, where it truncates > it even though the foreignST pointer refers to data mmaped in that file.Yes, the foreignST points to a non-LLVM symbol table which must be retained for compatibility with other AR implementations. Unfortunately, it is being held from a file that is about to be mmap'd.> Is this data supposed to be copied out of the original file,That's one solution but it defeats the purpose/efficiency of the mmap. or is> another temporary supposed to be created and then the original could be > replaced using a file move operation instead?Another temporary file would be even slower than copying the memory in memory.> > I'll try this on my Debian unstable system tomorrow. If ranlib works > there, maybe I can track down the difference.Sounds like a plan. Its a bit difficult to debug this over email. If you get stuck, let me know and I'll gen up a debug environment to take a look at it. Chances are, however, that I won't be able to replicate this problem on FC3 since it passes the nightly test here.>> open("temp.GNU.a", O_RDONLY) = 3 Original file being opened (read-only, because we're not supposed to modify it)> fstat64(3, {st_mode=S_IFREG|0600, st_size=4210, ...}) = 0 > mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000 > gettimeofday({1132714484, 283020}, NULL) = 0 > getpid() = 28656 > open("temp.GNU.a-O1Q6E8", O_RDWR|O_CREAT|O_EXCL, 0600) = 4The temporary file that will contain the symbol table> close(4) = 0 > open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 > close(4) = 0Not sure why this is opened twice .. probably an llvm::sys::Path issue.> > *** SIGNAL HANDLING REMOVED *** > > open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4Open yet again the temnporary file into which we'll build the symtab.> brk(0) = 0x8357000 > brk(0x8359000) = 0x8359000 > fstat64(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x40019000This mmap is just allocating memory> _llseek(4, 0, [0], SEEK_CUR) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > brk(0) = 0x8359000 > brk(0x8369000) = 0x8369000 > brk(0) = 0x8369000 > brk(0x836a000) = 0x836a000 > mmap2(NULL, 2002944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, > -1, 0) = 0x4013b000more memory allocations> munmap(0x4013b000, 2002944) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > _llseek(4, 0, [0], SEEK_SET) = 0 > write(4, "!<arch>\nevenlen/ 11008330"..., 4040) = 4040 > close(4) = 0 > munmap(0x40019000, 4096) = 0 > access("temp.GNU.a-O1Q6E8", F_OK) = 0 > open("temp.GNU.a-O1Q6E8", O_RDONLY) = 4 > fstat64(4, {st_mode=S_IFREG|0600, st_size=4040, ...}) = 0 > mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40019000map in the contents of the file we wrote.> open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5wipe out the original file> fstat64(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x4001a000allocate memory> _llseek(5, 0, [0], SEEK_CUR) = 0 > --- SIGBUS (Bus error) @ 0 (0) ---From this strace it looks to me like the only solution is to copy the data pointed to by foreignST into malloc'd space. Its just a symbol table so it shouldn't be too huge. The problem, as you've noted, is that the foreignST pointer is pointing into file handle 3. When file handle 5 is opened, the O_TRUNC parameter causes the memory pointed to by foreignST to be invalidated (unmapped). When we try to read this data in order to write it back to temp.GNU.a, it is getting the bus error trying to access the foreignST data. There's two solutions: (1) copy the data pointed to by foreignST (2) build the new file with the symbol table in a 3rd temporary file which is later renamed as the original (temp.GNU.a in this case). Reid.
On Nov 22, 2005, at 23:59, Reid Spencer wrote:>> = {0, >> 0, 4, 0}}} >> (gdb) p archPath >> $3 = {path = {static npos = 4294967295, >> _M_dataplus = {<allocator<char>> = {<No data fields>}, >> _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage > What's with the "5\b" at the end? Looks like garbage to me. Not sure > what's up with that.The implementation of std::string on this system does not always NULL terminate strings. I verified that the path.length() value in that case was 10, and calling path.c_str() returned the correct NULL terminated string. I found a system where llvm-ranlib "works," and I have figured out the problem. Here is the crashed strace, after it opens the file: open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 15 fstat64(15, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001a000 _llseek(15, 0, [0], SEEK_CUR) = 0 --- SIGBUS (Bus error) @ 0 (0) --- and here is the "working" strace, after it opens the file: open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 17 write(17, "!<arch>\n", 8) = 8 _llseek(17, 0, [8], SEEK_CUR) = 0 write(17, "/ 1100833490 0 "..., 106) = 106 _llseek(17, 0, [114], SEEK_CUR) = 0 Note that in the "working" case, the C++ library decides to flush the magic number to the file immediately. This causes the first page in the MMAPed file to be allocated (with zeros). Hence, when we go to read the foreignST pointer, it doesn't crash, but it has silently corrupted the data. The foreignST pointer looks like this when it is first read from the file: (gdb) x/46bx foreignST->getData() 0xf6ffe044: 0x00 0x00 0x00 0x02 0x00 0x00 0x07 0x4e 0xf6ffe04c: 0x00 0x00 0x07 0x4e 0x5f 0x5a 0x4e 0x34 0xf6ffe054: 0x6c 0x6c 0x76 0x6d 0x35 0x49 0x73 0x4e 0xf6ffe05c: 0x41 0x4e 0x45 0x66 0x00 0x5f 0x5a 0x4e 0xf6ffe064: 0x34 0x6c 0x6c 0x76 0x6d 0x35 0x49 0x73 0xf6ffe06c: 0x4e 0x41 0x4e 0x45 0x64 0x00 And it looks like this just before it gets written out: (gdb) x/46bx foreignST->getData() 0xf6ffe044: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe04c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe054: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe05c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe064: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe06c: 0x00 0x00 0x00 0x00 0x00 0x00 You can tell that it is getting corrupted by looking at the file with "hexdump -C" before and after llvm-ranlib. The native symbol table is supposed to start at offset 0x44.>> Is this data supposed to be copied out of the original file, > That's one solution but it defeats the purpose/efficiency of the mmap.Well, only the native symbol table would need to be copied, and only in the writeToDisk function, before the file gets invalidated.>> another temporary supposed to be created and then the original could >> be replaced using a file move operation instead? > Another temporary file would be even slower than copying the memory in > memory.I'm not so sure about that. The archive file is being truncated anyway. How much worse would it be to unlink the file before writing it out? The operating system probably already deallocates the disk blocks as soon as it is truncated. That also fixes the problem, since the original mapped file still exists until it is unmapped.> There's two solutions: > (1) copy the data pointed to by foreignST > (2) build the new file with the symbol table in a 3rd temporary file > which is later renamed as the original (temp.GNU.a in this case).(3) unlink the original file first (basically equivalent to (2)) (4) Write the foreignST into the TmpArchive file. Is there any reason that this isn't possible? Then the final archive would be created in a single pass, and it could just be moved into place. Also, since writeToDisk() invalidates the mapped file that was originally used to create the archive, shouldn't this function also unmap that file and erase all its members? This would prevent and further bugs like this from happening. Evan Jones -- Evan Jones http://evanjones.ca/
Possibly Parallel Threads
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix