On Nov 22, 2005, at 23:59, Reid Spencer wrote:>> = {0, >> 0, 4, 0}}} >> (gdb) p archPath >> $3 = {path = {static npos = 4294967295, >> _M_dataplus = {<allocator<char>> = {<No data fields>}, >> _M_p = 0x83545f4 "temp.GNU.a5\b"}, static _S_empty_rep_storage > What's with the "5\b" at the end? Looks like garbage to me. Not sure > what's up with that.The implementation of std::string on this system does not always NULL terminate strings. I verified that the path.length() value in that case was 10, and calling path.c_str() returned the correct NULL terminated string. I found a system where llvm-ranlib "works," and I have figured out the problem. Here is the crashed strace, after it opens the file: open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 15 fstat64(15, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001a000 _llseek(15, 0, [0], SEEK_CUR) = 0 --- SIGBUS (Bus error) @ 0 (0) --- and here is the "working" strace, after it opens the file: open("temp.GNU.a", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 17 write(17, "!<arch>\n", 8) = 8 _llseek(17, 0, [8], SEEK_CUR) = 0 write(17, "/ 1100833490 0 "..., 106) = 106 _llseek(17, 0, [114], SEEK_CUR) = 0 Note that in the "working" case, the C++ library decides to flush the magic number to the file immediately. This causes the first page in the MMAPed file to be allocated (with zeros). Hence, when we go to read the foreignST pointer, it doesn't crash, but it has silently corrupted the data. The foreignST pointer looks like this when it is first read from the file: (gdb) x/46bx foreignST->getData() 0xf6ffe044: 0x00 0x00 0x00 0x02 0x00 0x00 0x07 0x4e 0xf6ffe04c: 0x00 0x00 0x07 0x4e 0x5f 0x5a 0x4e 0x34 0xf6ffe054: 0x6c 0x6c 0x76 0x6d 0x35 0x49 0x73 0x4e 0xf6ffe05c: 0x41 0x4e 0x45 0x66 0x00 0x5f 0x5a 0x4e 0xf6ffe064: 0x34 0x6c 0x6c 0x76 0x6d 0x35 0x49 0x73 0xf6ffe06c: 0x4e 0x41 0x4e 0x45 0x64 0x00 And it looks like this just before it gets written out: (gdb) x/46bx foreignST->getData() 0xf6ffe044: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe04c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe054: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe05c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe064: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xf6ffe06c: 0x00 0x00 0x00 0x00 0x00 0x00 You can tell that it is getting corrupted by looking at the file with "hexdump -C" before and after llvm-ranlib. The native symbol table is supposed to start at offset 0x44.>> Is this data supposed to be copied out of the original file, > That's one solution but it defeats the purpose/efficiency of the mmap.Well, only the native symbol table would need to be copied, and only in the writeToDisk function, before the file gets invalidated.>> another temporary supposed to be created and then the original could >> be replaced using a file move operation instead? > Another temporary file would be even slower than copying the memory in > memory.I'm not so sure about that. The archive file is being truncated anyway. How much worse would it be to unlink the file before writing it out? The operating system probably already deallocates the disk blocks as soon as it is truncated. That also fixes the problem, since the original mapped file still exists until it is unmapped.> There's two solutions: > (1) copy the data pointed to by foreignST > (2) build the new file with the symbol table in a 3rd temporary file > which is later renamed as the original (temp.GNU.a in this case).(3) unlink the original file first (basically equivalent to (2)) (4) Write the foreignST into the TmpArchive file. Is there any reason that this isn't possible? Then the final archive would be created in a single pass, and it could just be moved into place. Also, since writeToDisk() invalidates the mapped file that was originally used to create the archive, shouldn't this function also unmap that file and erase all its members? This would prevent and further bugs like this from happening. Evan Jones -- Evan Jones http://evanjones.ca/
On Nov 23, 2005, at 8:16, Evan Jones wrote:> (4) Write the foreignST into the TmpArchive file. Is there any reason > that this isn't possible? Then the final archive would be created in a > single pass, and it could just be moved into place.Ah. I see: It needs to be written in order to compute the offsets. However, it would be possible to use a stringstream for this. I think the best thing to do is to write the final archive to a temporary file, and then move it into place. This has the advantage that if something goes wrong the original archive will not be corrupted. Additionally, all the members of the current archive object instance stay valid, since the original mmap is still valid. One disadvantage of this approach is that if CreateSymbolTable is false, it will be slightly more expensive as it will build the archive in memory before writing it to disk, instead of just writing it out to disk directly.> Also, since writeToDisk() invalidates the mapped file that was > originally used to create the archive, shouldn't this function also > unmap that file and erase all its members? This would prevent and > further bugs like this from happening.This is not necessary using the replace method described above. However, it *would* be necessary on Windows since you can't unlink/replace an open file on that platform. To fix this, the original archive file must be closed before performing the move. I've attached a patch that builds the temporary archive in a stringstream, then writes the temporary file and moves it into place. It fixes the bug on the old RedHat system. Evan Jones -------------- next part -------------- A non-text attachment was scrubbed... Name: archive.patch Type: application/octet-stream Size: 5759 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20051123/7855778c/attachment.obj> -------------- next part -------------- -- Evan Jones http://evanjones.ca/
Evan Jones wrote:> On Nov 23, 2005, at 8:16, Evan Jones wrote: > >> (4) Write the foreignST into the TmpArchive file. Is there any reason >> that this isn't possible? Then the final archive would be created in a >> single pass, and it could just be moved into place. > > > Ah. I see: It needs to be written in order to compute the offsets.Exactly.> However, it would be possible to use a stringstream for this.Aggg! No! Those perform horribly with anything more than a small amount of data. Some Archive files can be huge (e.g. not fit in memory).> > I think the best thing to do is to write the final archive to a > temporary file, and then move it into place. This has the advantage that > if something goes wrong the original archive will not be corrupted. > Additionally, all the members of the current archive object instance > stay valid, since the original mmap is still valid.Yeah, that sounds like the safest/sanest strategy.> > One disadvantage of this approach is that if CreateSymbolTable is false, > it will be slightly more expensive as it will build the archive in > memory before writing it to disk, instead of just writing it out to disk > directly.You can always use CreateSymbolTable as a flag to determine which mechanism to use.> >> Also, since writeToDisk() invalidates the mapped file that was >> originally used to create the archive, shouldn't this function also >> unmap that file and erase all its members? This would prevent and >> further bugs like this from happening. > > > This is not necessary using the replace method described above. However, > it *would* be necessary on Windows since you can't unlink/replace an > open file on that platform. To fix this, the original archive file must > be closed before performing the move.Yup. If you use the sys::Path class to do the "unlink" (removeFromDisk) then the platform differences should be accounted for. Please don't put the direct unlink call into ArchiveWriter.cpp.> > I've attached a patch that builds the temporary archive in a > stringstream, then writes the temporary file and moves it into place. It > fixes the bug on the old RedHat system.Please test the performance of that on the large .a file in the regression tests. I have my doubts about the scalability and speed for stringstream. Why can't you just write it out directly to the temporary file? Reid.
Apparently Analagous Threads
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix
- [LLVMdev] llvm-ranlib: Bus Error in regressions + fix