On Nov 22, 2005, at 17:18, Reid Spencer wrote:
> Your patch uses an operating system call that is not portable. All
> non-portable code needs to be located in the lib/System library.

Yep! I know. That is why I posted it for discussion. I'm not sure if this is the "right" way to fix the problem, or if there is a different fix that should be applied (like perhaps copying the data out of the mmap-ed archive?).

> I'm not sure why this problem appears on an old Red Hat system.
> Perhaps the C++ io library is not up to snuff on that platform? What
> compiler are you using?

It is very strange to me that it doesn't appear on other systems. I'll try to load LLVM on my bleeding edge Debian laptop tomorrow and see what happens there.

I am pretty certain that this has nothing to do with the C++ library, and everything to do with the behaviour of mmap when the file that was mmaped is modified. I actually can reproduce this behaviour with the attached C test case. The program mmaps a file called 'data,' prints the last byte, truncates the file, then tries to read the last byte again. It causes a Bus Error on both the RedHat system and my Mac OS X workstation. Hence, this appears to be valid (or at least common) mmap behaviour.

rn-spra1c07:~ ejones$ dd if=/dev/zero of=data bs=1 count=4096
4096+0 records in
4096+0 records out
4096 bytes transferred in 0.067263 secs (60895 bytes/sec)
rn-spra1c07:~ ejones$ ./mmaptest
last byte = 0x00
Bus error

I can also reproduce it with a minimal LLVM example, also attached. That program needs the "GNU.a" file in the current directory. It opens the archive and scans through all the members, printing out the first byte of each one. Then it truncates the file and repeats that experiment. It also causes a Bus Error.

Essentially, this is what happens in ArchiveWriter.cpp:429. This bug will be triggered by any archive that has a native symbol table, since that member (foreignST) references data that was mmaped from the original file. All the other members are copied from the temporary archive, so they are not a problem.

Evan Jones

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mmaptest.c
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051122/fc0b2085/attachment.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: llvm-buserror.cpp
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051122/fc0b2085/attachment.ksh>
-------------- next part --------------
--
Evan Jones
http://evanjones.ca/
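
Since the attached mmaptest.c was scrubbed from the archive, here is a minimal sketch of the experiment described above. It is not the original attachment. The function name read_after_truncate, the SIGBUS handler, and the fact that the sketch creates its own 4096-byte file (instead of relying on dd) are assumptions made so the example reports the fault instead of crashing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;

/* Jump back into read_after_truncate() instead of killing the process. */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(bus_jmp, 1);
}

/* Hypothetical helper: creates a 4096-byte file at `path`, maps it,
 * reads the last byte, truncates the file to zero length, then reads the
 * last byte again through the still-live mapping.
 * Returns 1 if the second read raised SIGBUS, 0 if it succeeded,
 * and -1 on a setup error. The file is removed before returning. */
int read_after_truncate(const char *path) {
    enum { SIZE = 4096 };
    char zeros[SIZE];
    assert(path != NULL);
    memset(zeros, 0, sizeof zeros);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (write(fd, zeros, SIZE) != SIZE) { close(fd); unlink(path); return -1; }

    void *addr = mmap(NULL, SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { close(fd); unlink(path); return -1; }
    const volatile unsigned char *map = addr;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int result;
    if (sigsetjmp(bus_jmp, 1) == 0) {
        printf("last byte = 0x%02x\n", map[SIZE - 1]);
        if (ftruncate(fd, 0) != 0) {
            result = -1;
        } else {
            unsigned char b = map[SIZE - 1]; /* page is now past end of file */
            printf("last byte after truncate = 0x%02x\n", b);
            result = 0;
        }
    } else {
        result = 1; /* arrived here via siglongjmp from the SIGBUS handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(addr, SIZE);
    close(fd);
    unlink(path);
    return result;
}
```

Calling read_after_truncate("data") should return 1 on systems that behave like the Red Hat and Mac OS X machines in this thread. A platform that reports the fault with a different signal would need the handler adjusted.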
Evan Jones wrote:
> I am pretty certain that this has nothing to do with the C++ library,
> and everything to do with the behaviour of mmap when the file that was
> mmaped is modified. I actually can reproduce this behaviour with the
> attached C test case. The program mmaps a file called 'data,' prints the
> last byte, truncates the file, then tries to read the last byte again.
> It causes a Bus Error on both the RedHat system and my Mac OS X
> workstation. Hence, this appears to be valid (or at least common) mmap
> behaviour.

Yes, this is the correct behavior for mmap in such a situation. The mapped file, when it is truncated, invalidates the memory corresponding to the truncated portion of the file. The memory is taken out of the virtual memory table so that any attempt to access it generates a, you guessed it, bus error.

>
> rn-spra1c07:~ ejones$ dd if=/dev/zero of=data bs=1 count=4096
> 4096+0 records in
> 4096+0 records out
> 4096 bytes transferred in 0.067263 secs (60895 bytes/sec)
> rn-spra1c07:~ ejones$ ./mmaptest
> last byte = 0x00
> Bus error
>
> I can also reproduce it with a minimal LLVM example, also attached. That
> program needs the "GNU.a" file in the current directory. It opens the
> archive and scans through all the members, printing out the first byte
> of each one. Then it truncates the file and repeats that experiment. It
> also causes a Bus Error.
>
> Essentially, this is what happens in ArchiveWriter.cpp:429. This bug
> will be triggered by any archive that has a native symbol table, since
> that member (foreignST) references data that was mmaped from the
> original file. All the other members are copied from the temporary
> archive, so they are not a problem.

The file gets corrupted because it is overwriting itself. The strace showed that it opened the same file for reading and writing with two file handles. This isn't what the code is supposed to do. The TmpArchive variable in ArchiveWriter.cpp is supposed to reference a unique file name and it does not.

At that point in the ArchiveWriter, it is trying to insert the symbol table. It does that by creating a temporary file and mmaping the original file. The temporary file is written with the symbol table, and then the entire content of the mmaped file is written into the temporary file (a single write using the mmaped pointer). When that is done, it renames the temporary file to the name of the original. The problem is, the temporary and the original are the same file! This is a failure of Path::makeUnique, which is system dependent.

I don't have a debugging environment handy to track this down, but I would suggest that you break out a debugger and investigate the following:

1. What is the path name associated with TmpArchive? If it's the same as the path name associated with archPath then that's a bug, probably introduced when Path::makeUnique is called from Path::createTemporaryFileOnDisk, which is called from line 377 of ArchiveWriter.cpp.

2. If item 1 holds, break in Path::makeUnique and see how it is computing the temporary name. There are three mechanisms: mkstemp, mktemp, and "manual". I don't know which mechanism it is using or why it's not creating a unique file name. If your Red Hat system is really old, it's possible one of the system mechanisms is broken and you'll need to adjust the code for the broken (but available) library call.

Reid.
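
The check suggested in item 1 can also be done outside a debugger. The following is a minimal POSIX sketch, not LLVM's Path::makeUnique: the helper names same_file and make_unique_temp are hypothetical, and mkstemp is only one of the three mechanisms mentioned above. It creates a uniquely named temporary next to a base path and compares device and inode numbers to decide whether two paths name the same file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both paths refer to the same file
 * (same device and inode), 0 if they differ, -1 if either stat() fails. */
int same_file(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper: creates an empty file named "<base>-XXXXXX", with
 * the X's replaced by mkstemp(), and stores the resulting name in `out`.
 * Returns 0 on success and -1 on failure. */
int make_unique_temp(const char *base, char *out, size_t out_size) {
    assert(base != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s-XXXXXX", base);
    if (n < 0 || (size_t)n >= out_size) return -1; /* name did not fit */
    int fd = mkstemp(out); /* creates the file exclusively (O_EXCL) */
    if (fd < 0) return -1;
    close(fd);
    return 0;
}
```

If same_file(original, temporary) ever returned 1 right after make_unique_temp succeeded, the temporary-name mechanism on that system would be suspect.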
On Nov 22, 2005, at 19:10, Reid Spencer wrote:
> 1. What is the path name associated with TmpArchive? If it's the same
> as the path name associated with archPath then that's a bug, probably
> introduced when Path::makeUnique is called from
> Path::createTemporaryFileOnDisk which is called from line 377 of
> ArchiveWriter.cpp.

This does not appear to be the problem. I excluded the lines from the strace that created this temporary file. After line 377:

(gdb) p TmpArchive
$2 = {path = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x835407c "temp.GNU.a-PozKFJ"}, static _S_empty_rep_storage = {0, 0, 4, 0}}}
(gdb) p archPath
$3 = {path = {static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage = {0, 0, 4, 0}}}

So these two variables are pointing to different files, and the creation of TmpArchive works just fine. The strace including the parts that reference the temporary file is appended to the end of the email. The very last open of the "archPath" file is at line 429, where it truncates the file even though the foreignST pointer refers to data mmaped from that file.

Is this data supposed to be copied out of the original file, or is another temporary supposed to be created, so that the original could then be replaced using a file move operation instead?

I'll try this on my Debian unstable system tomorrow. If ranlib works there, maybe I can track down the difference.

Evan Jones

open("temp.GNU.a", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=4210, ...}) = 0
mmap2(NULL, 8192, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000
gettimeofday({1132714484, 283020}, NULL) = 0
getpid() = 28656
open("temp.GNU.a-O1Q6E8", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
close(4) = 0
open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
close(4) = 0
*** SIGNAL HANDLING REMOVED ***
open("temp.GNU.a-O1Q6E8", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
brk(0) = 0x8357000
brk(0x8359000) = 0x8359000
fstat64(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40019000
_llseek(4, 0, [0], SEEK_CUR) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
brk(0) = 0x8359000
brk(0x8369000) = 0x8369000
brk(0) = 0x8369000
brk(0x836a000) = 0x836a000
mmap2(NULL, 2002944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4013b000
munmap(0x4013b000, 2002944) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
write(4, "!<arch>\nevenlen/ 11008330"..., 4040) = 4040
close(4) = 0
munmap(0x40019000, 4096) = 0
access("temp.GNU.a-O1Q6E8", F_OK) = 0
open("temp.GNU.a-O1Q6E8", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0600, st_size=4040, ...}) = 0
mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40019000
open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
fstat64(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001a000
_llseek(5, 0, [0], SEEK_CUR) = 0
--- SIGBUS (Bus error) @ 0 (0) ---

--
Evan Jones
http://evanjones.ca/
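
The second option raised above (write a new temporary, then move it over the original) can be sketched as follows. This is an illustration under stated assumptions, not the fix that was applied to ArchiveWriter.cpp: rewrite_with_prefix and write_all are hypothetical names, and the "prefix" stands in for a symbol table. The point is that rename() replaces the directory entry without truncating the mapped file, so data read through the mapping stays valid until munmap.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: rewrites `path` as `prefix` followed by the old
 * contents, without ever opening `path` with O_TRUNC while it is mapped.
 * Returns 0 on success, -1 on failure (including an empty input file).
 * Note: the new file gets mkstemp()'s 0600 mode, not the original mode. */
int rewrite_with_prefix(const char *path, const char *prefix) {
    assert(path != NULL && prefix != NULL);
    int in = open(path, O_RDONLY);
    if (in < 0) return -1;
    struct stat st;
    if (fstat(in, &st) != 0 || st.st_size == 0) { close(in); return -1; }
    size_t size = (size_t)st.st_size;
    void *addr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, in, 0);
    close(in); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) return -1;
    const volatile unsigned char *map = addr;
    unsigned char first = map[0];

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s-XXXXXX", path);
    int out = (n > 0 && (size_t)n < sizeof tmp) ? mkstemp(tmp) : -1;
    int rc = -1;
    if (out >= 0) {
        int ok = write_all(out, prefix, strlen(prefix)) == 0 &&
                 write_all(out, addr, size) == 0;
        if (close(out) != 0) ok = 0;
        if (ok && rename(tmp, path) == 0) rc = 0; /* move over the original */
        else unlink(tmp);
    }
    /* The old file contents are still reachable through the mapping. */
    if (rc == 0 && map[0] != first) rc = -1;
    munmap(addr, size);
    return rc;
}
```

The other option mentioned above, copying the mapped member data into heap memory before the original is truncated, would avoid the second temporary at the cost of an extra copy.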