On Oct 22, 2012, at 5:07 PM, Nick Kledzik wrote:> On Oct 22, 2012, at 4:40 PM, Sean Silva wrote: >> Hey Nick, what's the status on YAML IO? The other thread seems to have died. > > I'm waiting on Michael Spencer's feedback.To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer). This code requires no changes to the rest of lld. The traits based approach was thus non-invasive. It is able to produce yaml from existing data structures and when reading yaml recreate the existing data structures. The example also shows how context sensitive yaml conversion is done, using io.getContext() to make conversion decisions. The StringRef ownership works, but is a little clunky. Because one yaml stream can contain many documents, the ownership of the input file MemoryBuffer cannot be handed off to the newly created lld::File object (which would have allowed any StringRefs provided by the parse to be used as is). Instead whenever a trait needs to keep a StringRef it must make a copy of the underlying string, and the copies are owned by the generated lld::File object. -------------- next part -------------- A non-text attachment was scrubbed... Name: ReaderWriterYAML.cpp Type: application/octet-stream Size: 42575 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121024/90283f3f/attachment.obj> -------------- next part -------------- -Nick
> To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer).Fantastic!> The StringRef ownership works, but is a little clunky. Because one yaml stream can contain many documents, the ownership of the input file MemoryBuffer cannot be handed off to the newly created lld::File object (which would have allowed any StringRefs provided by the parse to be used as is). Instead whenever a trait needs to keep a StringRef it must make a copy of the underlying string, and the copies are owned by the generated lld::File object.Copying seems like a performance problem waiting to happen. Maybe this could be addressed through reference counting? What are "typical" sizes of strings that would be copied? -- Sean Silva On Wed, Oct 24, 2012 at 9:34 PM, Nick Kledzik <kledzik at apple.com> wrote:> > On Oct 22, 2012, at 5:07 PM, Nick Kledzik wrote: >> On Oct 22, 2012, at 4:40 PM, Sean Silva wrote: >>> Hey Nick, what's the status on YAML IO? The other thread seems to have died. >> >> I'm waiting on Michael Spencer's feedback. > > To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer). > > This code requires no changes to the rest of lld. The traits based approach was thus non-invasive. It is able to produce yaml from existing data structures and when reading yaml recreate the existing data structures. > > The example also shows how context sensitive yaml conversion is done, using io.getContext() to make conversion decisions. > > The StringRef ownership works, but is a little clunky. Because one yaml stream can contain many documents, the ownership of the input file MemoryBuffer cannot be handed off to the newly created lld::File object (which would have allowed any StringRefs provided by the parse to be used as is). Instead whenever a trait needs to keep a StringRef it must make a copy of the underlying string, and the copies are owned by the generated lld::File object. > > > > > > -Nick > >
On Oct 25, 2012, at 9:59 AM, Sean Silva wrote:>> To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer). > > Fantastic! > >> The StringRef ownership works, but is a little clunky. Because one yaml stream can contain many documents, the ownership of the input file MemoryBuffer cannot be handed off to the newly created lld::File object (which would have allowed any StringRefs provided by the parse to be used as is). Instead whenever a trait needs to keep a StringRef it must make a copy of the underlying string, and the copies are owned by the generated lld::File object. > > Copying seems like a performance problem waiting to happen. Maybe this > could be addressed through reference counting?There are no separate strings to reference count. There is just the one big MemoryBuffer which all the parsed StringRefs point into. I don't think this is a general issue with YAML I/O. Most clients will not need to support multiple documents and will have a natural owner for the MemoryBuffer. The lld test cases uses multiple yaml documents because lld is a linker and links multiple files. That said, I think a better implementation for lld's ReaderWriterYAML would be a BumpPtrAllocator per File to hold any strings it needs to copy.> What are "typical" > sizes of strings that would be copied?Most values in key/value pairs are converted to some enum value, so a temporary StringRef into the MemoryBuffer is fine. The only ones that need to remain as strings are the atoms' names (e.g. "malloc"). -Nick> > On Wed, Oct 24, 2012 at 9:34 PM, Nick Kledzik <kledzik at apple.com> wrote: >> >> On Oct 22, 2012, at 5:07 PM, Nick Kledzik wrote: >>> On Oct 22, 2012, at 4:40 PM, Sean Silva wrote: >>>> Hey Nick, what's the status on YAML IO? The other thread seems to have died. >>> >>> I'm waiting on Michael Spencer's feedback. >> >> To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer). >> >> This code requires no changes to the rest of lld. The traits based approach was thus non-invasive. It is able to produce yaml from existing data structures and when reading yaml recreate the existing data structures. >> >> The example also shows how context sensitive yaml conversion is done, using io.getContext() to make conversion decisions. >> >> The StringRef ownership works, but is a little clunky. Because one yaml stream can contain many documents, the ownership of the input file MemoryBuffer cannot be handed off to the newly created lld::File object (which would have allowed any StringRefs provided by the parse to be used as is). Instead whenever a trait needs to keep a StringRef it must make a copy of the underlying string, and the copies are owned by the generated lld::File object. >> >> >> >> >> >> -Nick >> >>
Michael, To validate the refactor of YAML Reader/Writer using YAML I/O. I updated all the test cases to be compatible with YAML I/O. One issue that was a gnarly was how to handle the test cases with archives. Currently, we have test cases like: --- atoms: - name: foo # more stuff --- archive: - name bar.o atoms: - name: bar # more stuff This sort of weak/dynamic typing is hard to using with YAML I/O which enforces stronger typing which helps it do better error checking. The core of the problem is when a new document is started, we don't know what kind of file it is going to be, to know what keys are legal. I first looked into used tags to specify the document type. For example: --- !archive members: - name: bar.o # more stuff But after modifying YAMLParser to make that the tag available, then trying to figure out how to make the tag actionable in the trait, I realized that for maps, the tag is just like another key. So, if every client agreed that the first key/value was a particular key name (e.g. tag: type) which YAML I/O already supports, then there is no need for tags and no need for an additional mechanism in YAML I/O. So, I know have the traits set up to support archives assuming the first (option) key of each document type read by lld will be "kind:". The archive-basic.objctxt case now looks like: # RUN: lld-core %s | FileCheck %s # # Tests archives in YAML. Tests that an undefined in a regular file will load # all atoms in select archive members. # --- defined-atoms: - name: foo type: code undefined-atoms: - name: bar --- kind: archive members: - name: bar.o content: defined-atoms: - name: bar scope: global type: code - name: bar2 type: code - name: baz.o content: defined-atoms: - name: baz scope: global type: code - name: baz2 type: code ... # CHECK: name: foo # CHECK-NOT: undefined-atoms: # CHECK: name: bar # CHECK: name: bar2 # CHECK-NOT: name: baz # CHECK: ... My thinking is that we can extend this to support embedded COFF/ELF/MachO in yaml by using new kind values. For example: --- kind: object-coff header: # stuff sections: # stuff symbols: # stuff ... The MappingTrait<const ld::File*> will look at the kind value and switch off it. We just need an external function (per file format) which can be called with the same mapping() parameters which will do the io.map*() calls and have traits for platform specific types, which turns the yaml into an in-memory binary object, then runs the Reader to return a File*. I'll be prototyping this approach for mach-o. -Nick On Oct 25, 2012, at 9:59 AM, Sean Silva wrote:>> To better understand how a client would use YAML I/O. I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer). > > Fantastic!-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121029/3d3ace69/attachment.html>