Alexey Lapshin via llvm-dev
2021-Jan-18 17:13 UTC
[llvm-dev] Refactoring. Using streams for llvm-objcopy.
Folks, we are trying to reuse some part of llvm-objcopy. To make it possible we want to move the main implementation of llvm-objcopy from "tools" subdirectory into the "Object" library(D88827). One of the problems is using custom buffer class(tools/llvm-objcopy/Buffer.h/cpp) as an output buffer by llvm-objcopy: Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer &In, ***Buffer &Out***); class Buffer { StringRef Name; virtual Error allocate(size_t Size) = 0; virtual uint8_t *getBufferStart() = 0; virtual Error commit() = 0; } There are two drawbacks of using custom Buffer class: 1. It is not good to move the custom Buffer class into the general llvm Object library. It is better to use some standard, already existed solution. 2. Interface of class Buffer assumes that the entire buffer must be preallocated. i.e. before writing to the output file we need to pre-allocate the space. Such pre-allocation is not a problem if memory-mapped files are used behind Buffer. But it could be a wasting of memory resources in other cases. A library might be used in a wider number of scenarios than a separate tool. So it would not be good for the library to work effectively only if memory-mapped files are used. ====================================== We propose to use streams instead of custom Buffer(D91028): Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer &In, ***raw_ostream &Out***); That solution has the following benefits: 1. it uses standard llvm streams. 2. it does not require pre-allocating of the entire space. 3. it allows easily replace kind of destinations(raw_fd_ostream, raw_null_ostream, raw_svector_ostream, raw_sha1_ostream, raw_string_ostream). 4. for some usages it could avoid memory allocations at all(using raw_sha1_ostream as a destination for sha calculation would not require to allocate space for the output file). That solution has the following drawbacks: 1. There is not memory-mapped file implementation for streams. 2. Some formats could not be generated through one pass. f.e. the ELF format needs to go back(after the section header table is generated it needs to go back and update the ELF header). For the first point, we might create such an implementation(raw_mmap_stream). For the second point, it looks like we have three alternatives: a) Current implementation of ELF writer already has preliminary steps which calculate sizes. Before allocating destination buffer, it calculates the size of the resulting binary: ELFWriter<ELFT>::finalize(). So it looks like all required ELF header information might be precalculated during this finalizing step. It allows writing data to the output stream by one pass. b) use raw_pwrite_stream as the output. It would allow seeking and updating. c) use internal memory buffer, generate the file into that memory buffer(memory buffer allows to go back and update) and then stream that buffer into the output. ====================================== D91028 suggests the following roadmap to replace Buffer with streams: 1. Implement interfaces using raw_ostream: Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, raw_ostream &Out); 2. Use additional internal buffers for file generation and not change the writer's implementation. After the files are generated stream buffers into the output(raw_ostream &Out). Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, raw_ostream &Out) { MemoryBuffer internal; ELFWriter.write(internal); Out.write(internal.data(), internal.size()); } 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use internal buffers. So that writers store data into the output stream directly. Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, raw_ostream &Out) { ELFWriter.write(Out); } If all implementations are successful - then leave raw_ostream in interfaces. If some implementations would still require seek/update functionality then change raw_ostream into raw_pwrite_stream: Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, raw_pwrite_stream &Out); ====================================== So, what do you think? Would it be good to use streams as an output format for objcopy code in Object library? Or Do we need to use some other solution here? Thank you, Alexey.
David Blaikie via llvm-dev
2021-Jan-19 22:42 UTC
[llvm-dev] Refactoring. Using streams for llvm-objcopy.
Sure, streaming sounds fine to me. The more this looks like the code we already have for writing object files in LLVM, the better, hopefully (though that code is a bit expensive, last I played with it - MCStreamer would buffer each output section, etc, before writing the output - so some lower-level abstraction below that which I think we're missing - would be great to have something below MCStreamer that was higher level than the raw_pwrite_stream it uses - that could be shared with lld, objcopy, dwp, dsymutil, etc). On Mon, Jan 18, 2021 at 9:13 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:> > Folks, > > we are trying to reuse some part of llvm-objcopy. > To make it possible we want to move the main implementation of llvm-objcopy > from "tools" subdirectory into the "Object" library(D88827). > One of the problems is using custom buffer > class(tools/llvm-objcopy/Buffer.h/cpp) > as an output buffer by llvm-objcopy: > > Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer > &In, ***Buffer &Out***); > > class Buffer { > StringRef Name; > > virtual Error allocate(size_t Size) = 0; > virtual uint8_t *getBufferStart() = 0; > virtual Error commit() = 0; > } > > There are two drawbacks of using custom Buffer class: > > 1. It is not good to move the custom Buffer class into the general llvm > Object library. > It is better to use some standard, already existed solution. > > 2. Interface of class Buffer assumes that the entire buffer must be > preallocated. > i.e. before writing to the output file we need to pre-allocate the > space. > Such pre-allocation is not a problem if memory-mapped files are > used behind Buffer. > But it could be a wasting of memory resources in other cases. > A library might be used in a wider number of scenarios than a > separate tool. > So it would not be good for the library to work effectively only if > memory-mapped files are used. > > ======================================> > We propose to use streams instead of custom Buffer(D91028): > > Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer > &In, ***raw_ostream &Out***); > > That solution has the following benefits: > > 1. it uses standard llvm streams. > 2. it does not require pre-allocating of the entire space. > 3. it allows easily replace kind of destinations(raw_fd_ostream, > raw_null_ostream, > raw_svector_ostream, raw_sha1_ostream, raw_string_ostream). > 4. for some usages it could avoid memory allocations at all(using > raw_sha1_ostream > as a destination for sha calculation would not require to allocate > space for the output file). > > That solution has the following drawbacks: > > 1. There is not memory-mapped file implementation for streams. > 2. Some formats could not be generated through one pass. > f.e. the ELF format needs to go back(after the section header table > is generated > it needs to go back and update the ELF header). > > For the first point, we might create such an > implementation(raw_mmap_stream). > > For the second point, it looks like we have three alternatives: > > a) Current implementation of ELF writer already has preliminary steps > which calculate sizes. > Before allocating destination buffer, it calculates the size of > the resulting binary: ELFWriter<ELFT>::finalize(). > So it looks like all required ELF header information might be > precalculated during this finalizing step. > It allows writing data to the output stream by one pass. > > b) use raw_pwrite_stream as the output. It would allow seeking and > updating. > > c) use internal memory buffer, generate the file into that memory > buffer(memory buffer allows to > go back and update) and then stream that buffer into the output. > > ======================================> > D91028 suggests the following roadmap to replace Buffer with streams: > > > 1. Implement interfaces using raw_ostream: > > Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, > raw_ostream &Out); > > 2. Use additional internal buffers for file generation and not change > the writer's implementation. > > After the files are generated stream buffers into the > output(raw_ostream &Out). > > Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, > raw_ostream &Out) { > MemoryBuffer internal; > ELFWriter.write(internal); > Out.write(internal.data(), internal.size()); > } > > > 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use > internal buffers. > So that writers store data into the output stream directly. > > Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, > raw_ostream &Out) { > ELFWriter.write(Out); > } > > If all implementations are successful - then leave raw_ostream in > interfaces. > > If some implementations would still require seek/update > functionality then change raw_ostream into raw_pwrite_stream: > > Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary > &In, raw_pwrite_stream &Out); > > ======================================> > > So, what do you think? Would it be good to use streams as an output > format for objcopy code in Object library? > > Or Do we need to use some other solution here? > > Thank you, Alexey. >
Alexey Lapshin via llvm-dev
2021-Feb-11 09:53 UTC
[llvm-dev] Refactoring. Using streams for llvm-objcopy.
David, Thank you for the opinion. What might be the next step to make a progress with it? Do I need to initiate proposal review, like described here - https://github.com/llvm/llvm-www/blob/main/proposals/LP0001-LLVMDecisionMaking.md ? Thank you, Alexey. On 20.01.2021 01:42, David Blaikie wrote:> Sure, streaming sounds fine to me. The more this looks like the code > we already have for writing object files in LLVM, the better, > hopefully (though that code is a bit expensive, last I played with it > - MCStreamer would buffer each output section, etc, before writing the > output - so some lower-level abstraction below that which I think > we're missing - would be great to have something below MCStreamer that > was higher level than the raw_pwrite_stream it uses - that could be > shared with lld, objcopy, dwp, dsymutil, etc). > > On Mon, Jan 18, 2021 at 9:13 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote: >> Folks, >> >> we are trying to reuse some part of llvm-objcopy. >> To make it possible we want to move the main implementation of llvm-objcopy >> from "tools" subdirectory into the "Object" library(D88827). >> One of the problems is using custom buffer >> class(tools/llvm-objcopy/Buffer.h/cpp) >> as an output buffer by llvm-objcopy: >> >> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer >> &In, ***Buffer &Out***); >> >> class Buffer { >> StringRef Name; >> >> virtual Error allocate(size_t Size) = 0; >> virtual uint8_t *getBufferStart() = 0; >> virtual Error commit() = 0; >> } >> >> There are two drawbacks of using custom Buffer class: >> >> 1. It is not good to move the custom Buffer class into the general llvm >> Object library. >> It is better to use some standard, already existed solution. >> >> 2. Interface of class Buffer assumes that the entire buffer must be >> preallocated. >> i.e. before writing to the output file we need to pre-allocate the >> space. >> Such pre-allocation is not a problem if memory-mapped files are >> used behind Buffer. >> But it could be a wasting of memory resources in other cases. >> A library might be used in a wider number of scenarios than a >> separate tool. >> So it would not be good for the library to work effectively only if >> memory-mapped files are used. >> >> ======================================>> >> We propose to use streams instead of custom Buffer(D91028): >> >> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer >> &In, ***raw_ostream &Out***); >> >> That solution has the following benefits: >> >> 1. it uses standard llvm streams. >> 2. it does not require pre-allocating of the entire space. >> 3. it allows easily replace kind of destinations(raw_fd_ostream, >> raw_null_ostream, >> raw_svector_ostream, raw_sha1_ostream, raw_string_ostream). >> 4. for some usages it could avoid memory allocations at all(using >> raw_sha1_ostream >> as a destination for sha calculation would not require to allocate >> space for the output file). >> >> That solution has the following drawbacks: >> >> 1. There is not memory-mapped file implementation for streams. >> 2. Some formats could not be generated through one pass. >> f.e. the ELF format needs to go back(after the section header table >> is generated >> it needs to go back and update the ELF header). >> >> For the first point, we might create such an >> implementation(raw_mmap_stream). >> >> For the second point, it looks like we have three alternatives: >> >> a) Current implementation of ELF writer already has preliminary steps >> which calculate sizes. >> Before allocating destination buffer, it calculates the size of >> the resulting binary: ELFWriter<ELFT>::finalize(). >> So it looks like all required ELF header information might be >> precalculated during this finalizing step. >> It allows writing data to the output stream by one pass. >> >> b) use raw_pwrite_stream as the output. It would allow seeking and >> updating. >> >> c) use internal memory buffer, generate the file into that memory >> buffer(memory buffer allows to >> go back and update) and then stream that buffer into the output. >> >> ======================================>> >> D91028 suggests the following roadmap to replace Buffer with streams: >> >> >> 1. Implement interfaces using raw_ostream: >> >> Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, >> raw_ostream &Out); >> >> 2. Use additional internal buffers for file generation and not change >> the writer's implementation. >> >> After the files are generated stream buffers into the >> output(raw_ostream &Out). >> >> Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, >> raw_ostream &Out) { >> MemoryBuffer internal; >> ELFWriter.write(internal); >> Out.write(internal.data(), internal.size()); >> } >> >> >> 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use >> internal buffers. >> So that writers store data into the output stream directly. >> >> Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, >> raw_ostream &Out) { >> ELFWriter.write(Out); >> } >> >> If all implementations are successful - then leave raw_ostream in >> interfaces. >> >> If some implementations would still require seek/update >> functionality then change raw_ostream into raw_pwrite_stream: >> >> Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary >> &In, raw_pwrite_stream &Out); >> >> ======================================>> >> >> So, what do you think? Would it be good to use streams as an output >> format for objcopy code in Object library? >> >> Or Do we need to use some other solution here? >> >> Thank you, Alexey. >>