Existing LLVM code tends to use raw_ostream for writing files. But raw_ostream is not a good match for a linker, for several reasons:

1) When the linker creates an executable, the file needs the 'x' bit set. Currently raw_fd_ostream has no way to set that.

2) The Unix conformance suite actually has some test cases where the linker is run and the output file exists but is not writable, or is not writable but is in a writable directory, or with funky umask values. The raw_fd_ostream interface has no way to match those semantics.

3) On Darwin we have found the linker performs better if it opens the output file, truncates it to the output size, then mmaps in the file, then writes directly into that memory buffer. This avoids the memory copy from the private buffer to the OS file system buffer in the write() syscall.

4) In the model we are using for lld, a streaming output interface is not optimal. Currently, lld copies chunks of code from the (read-only) input files to a temporary buffer, then applies any fixups (relocations), then streams out that temporary buffer. If instead we had a big output buffer, the linker could copy the code chunks directly to the output buffer and apply the fixups there, avoiding an extra copy.

Is there an existing solution for these issues in LLVM that I've overlooked? I've searched the bug database and did not find any similar requests.

Should I propose a new llvm/Support/ class?

-Nick
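To make point 4 concrete, here is a minimal sketch of the "copy directly into one big output buffer, then patch in place" model. The Chunk and Fixup types and the layoutInto() function are hypothetical names invented for this illustration; they are not lld's actual data structures, and a real fixup would compute its value from symbol addresses rather than carry it precomputed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical types for illustration only (not lld's real data model).
struct Fixup {
  size_t offsetInChunk; // where inside the chunk the 32-bit value is patched
  uint32_t value;       // already-resolved value to store there
};

struct Chunk {
  const uint8_t *data;  // points into a read-only input file
  size_t size;          // number of bytes to copy
  size_t outputOffset;  // where the chunk lands in the output buffer
  std::vector<Fixup> fixups;
};

// Copy every chunk straight into the output buffer and apply its fixups
// there, so no per-chunk temporary buffer is needed.
void layoutInto(uint8_t *out, size_t outSize,
                const std::vector<Chunk> &chunks) {
  for (const Chunk &c : chunks) {
    assert(c.outputOffset + c.size <= outSize && "chunk exceeds output");
    std::memcpy(out + c.outputOffset, c.data, c.size);
    for (const Fixup &f : c.fixups) {
      assert(f.offsetInChunk + sizeof(f.value) <= c.size);
      // Patch the copied bytes in place (host byte order in this sketch).
      std::memcpy(out + c.outputOffset + f.offsetInChunk, &f.value,
                  sizeof(f.value));
    }
  }
}
```

With a streaming interface the same work needs a scratch buffer per chunk, because the fixup has to be applied before the bytes are handed to the stream.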
On May 3, 2012, at 6:10 PM, Nick Kledzik wrote:
> Existing llvm code tends to use raw_ostream for writing files. But raw_ostream is not a good match for a linker for a couple of reasons:
>
> 1) When the linker creates an executable, the file needs the 'x' bit set. Currently raw_fd_ostream has no way to set that.

If this were the only problem, I'd suggest just generalizing raw_fd_ostream to support this use case. It would be straightforward to do.

> 2) The Unix conformance suite actually has some test cases where the linker is run and the output file exists but is not writable, or is not writable but is in a writable directory, or with funky umask values. raw_fd_ostream interface has no way to match those semantics.

If this were the only problem :), I would suggest using a new raw_ostream subclass, where you do custom stuff to get the file system behavior you want, but reuse all of the streaming API of raw_ostream.

> 3) On darwin we have found the linker performs better if it opens the output file, truncates it to the output size, then mmaps in the file, then writes directly into that memory buffer. This avoids the memory copy from the private buffer to the OS file system buffer in the write() syscall.

This should also be possible with raw_ostream.

> 4) In the model we are using for lld, a streaming output interface is not optimal.

This is a show-stopper for raw_ostream. I really don't want raw_ostream to support seeking or other non-stream behavior. :)

> Is there an existing solution for these issues in llvm I've overlooked? I've searched the bug database and did not find any similar requests.
>
> Should I propose a new llvm/Support/ class?

Yes please. We have a variety of other places in the codebase that are using open/close/read etc. directly (grep for #includes of unistd.h). Using these APIs is annoying because they are really low level (and expose nonsense like EINTR handling) and because Windows makes them annoying to use.
Having a better wrapper for doing real low-level file system stuff (including seeking) makes perfect sense for llvm/Support! -Chris
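As an illustration of the kind of low-level detail such a wrapper would hide, here is a minimal sketch of a "write everything" helper that retries on EINTR and on short writes. The name writeAll() is hypothetical and is not an existing LLVM function; only the POSIX write() call and the errno values are real.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not an LLVM API): write the whole buffer to fd,
// retrying when write() is interrupted by a signal or writes only part
// of the data. Returns 0 on success, or the errno value on failure.
int writeAll(int fd, const char *data, size_t size) {
  assert(data != nullptr || size == 0);
  while (size > 0) {
    ssize_t n = ::write(fd, data, size);
    if (n < 0) {
      if (errno == EINTR)
        continue; // interrupted before any data was written; retry
      return errno;
    }
    data += n; // short write: advance and keep going
    size -= static_cast<size_t>(n);
  }
  return 0;
}
```

Every direct caller of write() otherwise has to repeat this loop, which is the annoyance described above.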
For the reasons listed in my 03-May-2012 email, I am proposing a new llvm/Support class for use in writing binary files:

  /// OutputBuffer - This interface provides a simple way to create an
  /// in-memory buffer which, when done, will be written to a file. During
  /// the lifetime of these objects, the content or existence of the
  /// specified file is undefined. That is, creating an OutputBuffer for a
  /// file may immediately remove the file.
  /// If the OutputBuffer is committed, the target file's content will become
  /// the buffer content at the time of the commit. If the OutputBuffer is
  /// not committed, the file will be deleted in the OutputBuffer destructor.
  class OutputBuffer {
  public:
    enum Flags {
      F_executable = 1, /// set the 'x' bit on the resulting file
    };

    /// Factory method to create an OutputBuffer object which manages a
    /// read/write buffer of the specified size. When committed, the buffer
    /// will be written to the file at the specified path.
    static error_code createFile(StringRef filePath, Flags flags,
                                 size_t size,
                                 OwningPtr<OutputBuffer> &result);

    /// Returns a pointer to the start of the buffer.
    uint8_t *bufferStart();

    /// Returns a pointer to the end of the buffer.
    uint8_t *bufferEnd();

    /// Returns size of the buffer.
    size_t size();

    /// Flushes the content of the buffer to its file and deallocates the
    /// buffer. If commit() is not called before this object's destructor
    /// is called, the file is deleted in the destructor. The optional
    /// parameter is used if it turns out you want the file size to be
    /// smaller than initially requested.
    void commit(int64_t newSmallerSize = -1);
  };

The Flags will probably need to be extended over time to handle other clients' needs.
For Unix/Darwin, my plan is to implement this by:

1) delete the file
2) create a new file with a random name in the same directory
3) truncate the file to the new size
4) mmap() in the file r/w
5) on commit, unmap the file and rename() it to the final name
6) in the destructor, if not committed, unmap and delete the randomly named file

I'll leave the Windows implementation empty and let someone with Windows experience do the implementation.

Comments? Suggestions?

-Nick
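The six steps above can be sketched with plain POSIX calls roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class name MappedOutput and its members are hypothetical, errors are reported as raw errno values rather than error_code, the temporary name comes from mkstemp(), the permission bits are derived from the current umask, and the optional "smaller size on commit" parameter is left out.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical stand-in for the proposed class (POSIX only).
class MappedOutput {
public:
  MappedOutput() : base(nullptr), length(0) {}
  ~MappedOutput() { discard(); } // step 6: clean up if never committed
  MappedOutput(const MappedOutput &) = delete;
  MappedOutput &operator=(const MappedOutput &) = delete;

  // Returns 0 on success or an errno value on failure.
  int create(const std::string &path, bool executable, size_t size) {
    assert(base == nullptr && size > 0);
    ::unlink(path.c_str());                 // step 1 (a missing file is fine)
    finalPath = path;
    tempPath = path + ".tmpXXXXXX";
    int fd = ::mkstemp(&tempPath[0]);       // step 2: random name, same dir
    if (fd < 0) { int e = errno; tempPath.clear(); return e; }
    mode_t mask = ::umask(0);               // read the current umask...
    ::umask(mask);                          // ...and restore it
    mode_t mode = (executable ? 0777 : 0666) & ~mask;
    void *p = MAP_FAILED;
    if (::fchmod(fd, mode) == 0 &&
        ::ftruncate(fd, static_cast<off_t>(size)) == 0)       // step 3
      p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);                          // step 4
    int err = (p == MAP_FAILED) ? errno : 0;
    ::close(fd);                            // the mapping outlives the fd
    if (err) { ::unlink(tempPath.c_str()); tempPath.clear(); return err; }
    base = static_cast<uint8_t *>(p);
    length = size;
    return 0;
  }

  uint8_t *data() { return base; }
  size_t size() const { return length; }

  // Step 5: unmap, then rename the temporary file to the final name.
  int commit() {
    assert(base != nullptr);
    ::munmap(base, length);
    base = nullptr;
    int err = ::rename(tempPath.c_str(), finalPath.c_str()) == 0 ? 0 : errno;
    if (err) ::unlink(tempPath.c_str());
    tempPath.clear();
    return err;
  }

private:
  // Unmap and delete the temporary file if it still exists.
  void discard() {
    if (base) { ::munmap(base, length); base = nullptr; }
    if (!tempPath.empty()) { ::unlink(tempPath.c_str()); tempPath.clear(); }
  }

  uint8_t *base;
  size_t length;
  std::string tempPath, finalPath;
};
```

A caller would create() the buffer, write into data(), and call commit(); if the object goes out of scope without commit(), neither the old file nor the temporary one remains.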