thr3ads.net - llvm dev - [llvm-dev] Refactoring. Using streams for llvm-objcopy. [Jan 2021]

If this information is useful, please help other people find it:
Share via:

Alexey Lapshin via llvm-dev

2021-Jan-18 17:13 UTC

[llvm-dev] Refactoring. Using streams for llvm-objcopy.

Folks,

we are trying to reuse some part of llvm-objcopy.
To make it possible we want to move the main implementation of llvm-objcopy
from "tools" subdirectory into the "Object" library(D88827).
One of the problems is using custom buffer 
class(tools/llvm-objcopy/Buffer.h/cpp)
as an output buffer by llvm-objcopy:

Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer 
&In, ***Buffer &Out***);

class Buffer {
   StringRef Name;

   virtual Error allocate(size_t Size) = 0;
   virtual uint8_t *getBufferStart() = 0;
   virtual Error commit() = 0;
}

There are two drawbacks of using custom Buffer class:

1. It is not good to move the custom Buffer class into the general llvm 
Object library.
     It is better to use some standard, already existed solution.

2. Interface of class Buffer assumes that the entire buffer must be 
preallocated.
     i.e. before writing to the output file we need to pre-allocate the 
space.
     Such pre-allocation is not a problem if memory-mapped files are 
used behind Buffer.
     But it could be a wasting of memory resources in other cases.
     A library might be used in a wider number of scenarios than a 
separate tool.
     So it would not be good for the library to work effectively only if 
memory-mapped files are used.

======================================
We propose to use streams instead of custom Buffer(D91028):

Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer 
&In, ***raw_ostream &Out***);

That solution has the following benefits:

1. it uses standard llvm streams.
2. it does not require pre-allocating of the entire space.
3. it allows easily replace kind of destinations(raw_fd_ostream, 
raw_null_ostream,
     raw_svector_ostream, raw_sha1_ostream, raw_string_ostream).
4. for some usages it could avoid memory allocations at all(using 
raw_sha1_ostream
     as a destination for sha calculation would not require to allocate 
space for the output file).

That solution has the following drawbacks:

1. There is not memory-mapped file implementation for streams.
2. Some formats could not be generated through one pass.
     f.e. the ELF format needs to go back(after the section header table 
is generated
     it needs to go back and update the ELF header).

For the first point, we might create such an 
implementation(raw_mmap_stream).

For the second point, it looks like we have three alternatives:

a) Current implementation of ELF writer already has preliminary steps 
which calculate sizes.
      Before allocating destination buffer, it calculates the size of 
the resulting binary:  ELFWriter<ELFT>::finalize().
      So it looks like all required ELF header information might be 
precalculated during this finalizing step.
      It allows writing data to the output stream by one pass.

b) use raw_pwrite_stream as the output. It would allow seeking and 
updating.

c) use internal memory buffer, generate the file into that memory 
buffer(memory buffer allows to
go back and update) and then stream that buffer into the output.

======================================
D91028 suggests the following roadmap to replace Buffer with streams:


1. Implement interfaces using raw_ostream:

    Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
raw_ostream &Out);

2. Use additional internal buffers for file generation and not change 
the writer's implementation.

     After the files are generated stream buffers into the 
output(raw_ostream &Out).

    Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
raw_ostream &Out) {
      MemoryBuffer internal;
      ELFWriter.write(internal);
      Out.write(internal.data(), internal.size());
    }


3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use 
internal buffers.
     So that writers store data into the output stream directly.

    Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
raw_ostream &Out) {
      ELFWriter.write(Out);
    }

     If all implementations are successful - then leave raw_ostream in 
interfaces.

     If some implementations would still require seek/update 
functionality then change raw_ostream into raw_pwrite_stream:

     Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary 
&In, raw_pwrite_stream &Out);

======================================

So, what do you think? Would it be good to use streams as an output 
format for objcopy code in Object library?

Or Do we need to use some other solution here?

Thank you, Alexey.

David Blaikie via llvm-dev

2021-Jan-19 22:42 UTC

head link

[llvm-dev] Refactoring. Using streams for llvm-objcopy.

Sure, streaming sounds fine to me. The more this looks like the code
we already have for writing object files in LLVM, the better,
hopefully (though that code is a bit expensive, last I played with it
- MCStreamer would buffer each output section, etc, before writing the
output - so some lower-level abstraction below that which I think
we're missing - would be great to have something below MCStreamer that
was higher level than the raw_pwrite_stream it uses - that could be
shared with lld, objcopy, dwp, dsymutil, etc).

On Mon, Jan 18, 2021 at 9:13 AM Alexey Lapshin <avl.lapshin at gmail.com>
wrote:>
> Folks,
>
> we are trying to reuse some part of llvm-objcopy.
> To make it possible we want to move the main implementation of llvm-objcopy
> from "tools" subdirectory into the "Object"
library(D88827).
> One of the problems is using custom buffer
> class(tools/llvm-objcopy/Buffer.h/cpp)
> as an output buffer by llvm-objcopy:
>
> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
> &In, ***Buffer &Out***);
>
> class Buffer {
>    StringRef Name;
>
>    virtual Error allocate(size_t Size) = 0;
>    virtual uint8_t *getBufferStart() = 0;
>    virtual Error commit() = 0;
> }
>
> There are two drawbacks of using custom Buffer class:
>
> 1. It is not good to move the custom Buffer class into the general llvm
> Object library.
>      It is better to use some standard, already existed solution.
>
> 2. Interface of class Buffer assumes that the entire buffer must be
> preallocated.
>      i.e. before writing to the output file we need to pre-allocate the
> space.
>      Such pre-allocation is not a problem if memory-mapped files are
> used behind Buffer.
>      But it could be a wasting of memory resources in other cases.
>      A library might be used in a wider number of scenarios than a
> separate tool.
>      So it would not be good for the library to work effectively only if
> memory-mapped files are used.
>
> ======================================>
> We propose to use streams instead of custom Buffer(D91028):
>
> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
> &In, ***raw_ostream &Out***);
>
> That solution has the following benefits:
>
> 1. it uses standard llvm streams.
> 2. it does not require pre-allocating of the entire space.
> 3. it allows easily replace kind of destinations(raw_fd_ostream,
> raw_null_ostream,
>      raw_svector_ostream, raw_sha1_ostream, raw_string_ostream).
> 4. for some usages it could avoid memory allocations at all(using
> raw_sha1_ostream
>      as a destination for sha calculation would not require to allocate
> space for the output file).
>
> That solution has the following drawbacks:
>
> 1. There is not memory-mapped file implementation for streams.
> 2. Some formats could not be generated through one pass.
>      f.e. the ELF format needs to go back(after the section header table
> is generated
>      it needs to go back and update the ELF header).
>
> For the first point, we might create such an
> implementation(raw_mmap_stream).
>
> For the second point, it looks like we have three alternatives:
>
> a) Current implementation of ELF writer already has preliminary steps
> which calculate sizes.
>       Before allocating destination buffer, it calculates the size of
> the resulting binary:  ELFWriter<ELFT>::finalize().
>       So it looks like all required ELF header information might be
> precalculated during this finalizing step.
>       It allows writing data to the output stream by one pass.
>
> b) use raw_pwrite_stream as the output. It would allow seeking and
> updating.
>
> c) use internal memory buffer, generate the file into that memory
> buffer(memory buffer allows to
> go back and update) and then stream that buffer into the output.
>
> ======================================>
> D91028 suggests the following roadmap to replace Buffer with streams:
>
>
> 1. Implement interfaces using raw_ostream:
>
>     Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
&In,
> raw_ostream &Out);
>
> 2. Use additional internal buffers for file generation and not change
> the writer's implementation.
>
>      After the files are generated stream buffers into the
> output(raw_ostream &Out).
>
>     Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
&In,
> raw_ostream &Out) {
>       MemoryBuffer internal;
>       ELFWriter.write(internal);
>       Out.write(internal.data(), internal.size());
>     }
>
>
> 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use
> internal buffers.
>      So that writers store data into the output stream directly.
>
>     Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
&In,
> raw_ostream &Out) {
>       ELFWriter.write(Out);
>     }
>
>      If all implementations are successful - then leave raw_ostream in
> interfaces.
>
>      If some implementations would still require seek/update
> functionality then change raw_ostream into raw_pwrite_stream:
>
>      Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
> &In, raw_pwrite_stream &Out);
>
> ======================================>
>
> So, what do you think? Would it be good to use streams as an output
> format for objcopy code in Object library?
>
> Or Do we need to use some other solution here?
>
> Thank you, Alexey.
>

Alexey Lapshin via llvm-dev

2021-Feb-11 09:53 UTC

head link

[llvm-dev] Refactoring. Using streams for llvm-objcopy.

David, Thank you for the opinion. What might be the next step to make a 
progress with it?

Do I need to initiate proposal review, like described here - 
https://github.com/llvm/llvm-www/blob/main/proposals/LP0001-LLVMDecisionMaking.md
?

Thank you, Alexey.

On 20.01.2021 01:42, David Blaikie wrote:> Sure, streaming sounds fine to me. The more this looks like the code
> we already have for writing object files in LLVM, the better,
> hopefully (though that code is a bit expensive, last I played with it
> - MCStreamer would buffer each output section, etc, before writing the
> output - so some lower-level abstraction below that which I think
> we're missing - would be great to have something below MCStreamer that
> was higher level than the raw_pwrite_stream it uses - that could be
> shared with lld, objcopy, dwp, dsymutil, etc).
>
> On Mon, Jan 18, 2021 at 9:13 AM Alexey Lapshin <avl.lapshin at
gmail.com> wrote:
>> Folks,
>>
>> we are trying to reuse some part of llvm-objcopy.
>> To make it possible we want to move the main implementation of
llvm-objcopy
>> from "tools" subdirectory into the "Object"
library(D88827).
>> One of the problems is using custom buffer
>> class(tools/llvm-objcopy/Buffer.h/cpp)
>> as an output buffer by llvm-objcopy:
>>
>> Error executeObjcopyOnRawBinary(const CopyConfig &Config,
MemoryBuffer
>> &In, ***Buffer &Out***);
>>
>> class Buffer {
>>     StringRef Name;
>>
>>     virtual Error allocate(size_t Size) = 0;
>>     virtual uint8_t *getBufferStart() = 0;
>>     virtual Error commit() = 0;
>> }
>>
>> There are two drawbacks of using custom Buffer class:
>>
>> 1. It is not good to move the custom Buffer class into the general llvm
>> Object library.
>>       It is better to use some standard, already existed solution.
>>
>> 2. Interface of class Buffer assumes that the entire buffer must be
>> preallocated.
>>       i.e. before writing to the output file we need to pre-allocate
the
>> space.
>>       Such pre-allocation is not a problem if memory-mapped files are
>> used behind Buffer.
>>       But it could be a wasting of memory resources in other cases.
>>       A library might be used in a wider number of scenarios than a
>> separate tool.
>>       So it would not be good for the library to work effectively only
if
>> memory-mapped files are used.
>>
>> ======================================>>
>> We propose to use streams instead of custom Buffer(D91028):
>>
>> Error executeObjcopyOnRawBinary(const CopyConfig &Config,
MemoryBuffer
>> &In, ***raw_ostream &Out***);
>>
>> That solution has the following benefits:
>>
>> 1. it uses standard llvm streams.
>> 2. it does not require pre-allocating of the entire space.
>> 3. it allows easily replace kind of destinations(raw_fd_ostream,
>> raw_null_ostream,
>>       raw_svector_ostream, raw_sha1_ostream, raw_string_ostream).
>> 4. for some usages it could avoid memory allocations at all(using
>> raw_sha1_ostream
>>       as a destination for sha calculation would not require to
allocate
>> space for the output file).
>>
>> That solution has the following drawbacks:
>>
>> 1. There is not memory-mapped file implementation for streams.
>> 2. Some formats could not be generated through one pass.
>>       f.e. the ELF format needs to go back(after the section header
table
>> is generated
>>       it needs to go back and update the ELF header).
>>
>> For the first point, we might create such an
>> implementation(raw_mmap_stream).
>>
>> For the second point, it looks like we have three alternatives:
>>
>> a) Current implementation of ELF writer already has preliminary steps
>> which calculate sizes.
>>        Before allocating destination buffer, it calculates the size of
>> the resulting binary:  ELFWriter<ELFT>::finalize().
>>        So it looks like all required ELF header information might be
>> precalculated during this finalizing step.
>>        It allows writing data to the output stream by one pass.
>>
>> b) use raw_pwrite_stream as the output. It would allow seeking and
>> updating.
>>
>> c) use internal memory buffer, generate the file into that memory
>> buffer(memory buffer allows to
>> go back and update) and then stream that buffer into the output.
>>
>> ======================================>>
>> D91028 suggests the following roadmap to replace Buffer with streams:
>>
>>
>> 1. Implement interfaces using raw_ostream:
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config,
object::Binary &In,
>> raw_ostream &Out);
>>
>> 2. Use additional internal buffers for file generation and not change
>> the writer's implementation.
>>
>>       After the files are generated stream buffers into the
>> output(raw_ostream &Out).
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config,
object::Binary &In,
>> raw_ostream &Out) {
>>        MemoryBuffer internal;
>>        ELFWriter.write(internal);
>>        Out.write(internal.data(), internal.size());
>>      }
>>
>>
>> 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use
>> internal buffers.
>>       So that writers store data into the output stream directly.
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config,
object::Binary &In,
>> raw_ostream &Out) {
>>        ELFWriter.write(Out);
>>      }
>>
>>       If all implementations are successful - then leave raw_ostream in
>> interfaces.
>>
>>       If some implementations would still require seek/update
>> functionality then change raw_ostream into raw_pwrite_stream:
>>
>>       Error executeObjcopyOnBinary(CopyConfig &Config,
object::Binary
>> &In, raw_pwrite_stream &Out);
>>
>> ======================================>>
>>
>> So, what do you think? Would it be good to use streams as an output
>> format for objcopy code in Object library?
>>
>> Or Do we need to use some other solution here?
>>
>> Thank you, Alexey.
>>

llvm dev - Jan 2021 - Refactoring. Using streams for llvm-objcopy.

[llvm-dev] Refactoring. Using streams for llvm-objcopy.

[llvm-dev] Refactoring. Using streams for llvm-objcopy.

[llvm-dev] Refactoring. Using streams for llvm-objcopy.