Leonardo Santagada via llvm-dev
2018-Jan-26 17:59 UTC
[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)
I'm now thinking that there's a bug in either obj2yaml or yaml2obj: if I run just those two tools on my codebase, they generate yaml files that can't be decoded. I'll now try not adding any sections to the obj file in llvm-objcopy, to see if I can link with obj files that I rewrite (but without adding symbols or sections).

One of the bugs that does annoy me is that the TimeDateStamp is not carried over when obj2yaml writes a file. Another is that the layout function in yaml2coff assigns different indexes to the sections; none of them look wrong, but it seems to leave some padding. I didn't have time to look too closely at why.

On Fri, Jan 26, 2018 at 6:52 PM, Zachary Turner <zturner at google.com> wrote:
> Hmm, ok. In that case let me try again without my local changes. Maybe
> they are getting in the way :-/
>
> On Fri, Jan 26, 2018 at 9:51 AM Leonardo Santagada <santagada at gmail.com> wrote:
>
>> It is identical for me... weird.
>>
>> On Fri, Jan 26, 2018 at 6:49 PM, Zachary Turner <zturner at google.com> wrote:
>>
>>> (Ignore the fact that my hashes are 8 bytes in the "good" file; this is
>>> due to some local changes I've been experimenting with.)
>>>
>>> On Fri, Jan 26, 2018 at 9:48 AM Zachary Turner <zturner at google.com> wrote:
>>>
>>>> I did this:
>>>>
>>>>   // a.cpp
>>>>   static int x = 0;
>>>>   void b(int);
>>>>   void a(int) {
>>>>     if (x)
>>>>       b(x);
>>>>   }
>>>>   int main(int argc, char **argv) {
>>>>     a(argc);
>>>>     return x;
>>>>   }
>>>>
>>>>   clang-cl /Z7 /c a.cpp /Foa.noghash.obj
>>>>   clang-cl /Z7 /c a.cpp -mllvm -emit-codeview-ghash-section /Foa.ghash.good.obj
>>>>   llvm-objcopy a.noghash.obj a.ghash.bad.obj
>>>>   obj2yaml a.ghash.good.obj > a.ghash.good.yaml
>>>>   obj2yaml a.ghash.bad.obj > a.ghash.bad.yaml
>>>>
>>>> Then open these 2 yaml files up in a diff viewer. It looks like the
>>>> hashes aren't getting emitted at all. For example, in the good yaml file I
>>>> see this:
>>>>
>>>>   - Name: '.debug$H'
>>>>     Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
>>>>     Alignment: 4
>>>>     SectionData: C5C93301000001005549419E78044E3896D45CD7009428758BE4A1E2B3E022BA267DEE221F5C42B17BCA182AF84584814A8B5E7E3FB17B397A9E3DEA75CD5627
>>>>     GlobalHashes:
>>>>       Version: 0
>>>>       HashAlgorithm: 1
>>>>       HashValues:
>>>>         - 5549419E78044E38
>>>>         - 96D45CD700942875
>>>>         - 8BE4A1E2B3E022BA
>>>>         - 267DEE221F5C42B1
>>>>         - 7BCA182AF8458481
>>>>         - 4A8B5E7E3FB17B39
>>>>         - 7A9E3DEA75CD5627
>>>>   - Name: .pdata
>>>>
>>>> And in the bad yaml file I see this:
>>>>
>>>>   - Name: '.debug$H'
>>>>     Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
>>>>     Alignment: 4
>>>>     SectionData: C5C9330100000000
>>>>     GlobalHashes:
>>>>       Version: 0
>>>>       HashAlgorithm: 0
>>>>   - Name: .pdata
>>>>
>>>> Don't focus too much on trying to figure out weird linker errors. Just
>>>> get the output of obj2yaml to be identical when run under a diff utility,
>>>> then everything should work fine.
>>>>
>>>> On Fri, Jan 26, 2018 at 7:27 AM Leonardo Santagada <santagada at gmail.com> wrote:
>>>>
>>>>> I'm so close I can almost smell it :)
>>>>>
>>>>> I know how bad the code looks, and I don't intend to submit this, but if
>>>>> you want to try it out it's at:
>>>>> https://gist.github.com/santagada/544136b1ee143bf31653b1158ac6829e
>>>>>
>>>>> I'm seeing:
>>>>>
>>>>>   lld-link.exe: error: duplicate symbol: "<redacted_unmangled>" (<redacted>) in <internal> and in <redacted_filename>.obj
>>>>>
>>>>> Looking at the .yaml dump, the symbols are all similar to this:
>>>>>
>>>>>   - Name: <redacted>
>>>>>     Value: 0
>>>>>     SectionNumber: 0
>>>>>     SimpleType: IMAGE_SYM_TYPE_NULL
>>>>>     ComplexType: IMAGE_SYM_DTYPE_FUNCTION
>>>>>     StorageClass: IMAGE_SYM_CLASS_WEAK_EXTERNAL
>>>>>     WeakExternal:
>>>>>       TagIndex: 134
>>>>>       Characteristics: IMAGE_WEAK_EXTERN_SEARCH_LIBRARY
>>>>>
>>>>> On Thu, Jan 25, 2018 at 8:01 PM, Zachary Turner <zturner at google.com> wrote:
>>>>>
>>>>>> I haven't really dabbled in this part of the COFF format personally,
>>>>>> so hopefully I'm not leading you astray :)
>>>>>>
>>>>>> But I checked the code for coff2yaml, and I see this:
>>>>>>
>>>>>>   } else if (Symbol.isSectionDefinition()) {
>>>>>>     // This symbol represents a section definition.
>>>>>>     assert(Symbol.getNumberOfAuxSymbols() == 1 &&
>>>>>>            "Expected a single aux symbol to describe this section!");
>>>>>>     const object::coff_aux_section_definition *ObjSD =
>>>>>>         reinterpret_cast<const object::coff_aux_section_definition *>(
>>>>>>             AuxData.data());
>>>>>>
>>>>>> So it looks like you need exactly 1 aux symbol for each section symbol.
>>>>>>
>>>>>> I then scrolled up in this function to figure out where AuxData comes
>>>>>> from, and it comes from COFFObjectFile::getSymbolAuxData. I think that
>>>>>> function holds the clue to what you need to do. It looks like you need
>>>>>> to set coff::symbol::NumberOfAuxSymbols to 1, and then there is a
>>>>>> comment in getSymbolAuxData which says:
>>>>>>
>>>>>>   // AUX data comes immediately after the symbol in COFF
>>>>>>   Aux = reinterpret_cast<const uint8_t *>(Symbol.getRawPtr()) + SymbolSize;
>>>>>>
>>>>>> So I think you just need to write the bytes immediately after the
>>>>>> coff::symbol. The thing you need to write looks like a
>>>>>> coff::coff_aux_section_definition structure.
>>>>>>
>>>>>> For the CheckSum, look at WinCOFFObjectWriter::writeSection. It looks
>>>>>> like it's a CRC32 of the actual section contents, which you can
>>>>>> generate with a couple of lines of code:
>>>>>>
>>>>>>   JamCRC JC(/*Init=*/0);
>>>>>>   JC.update(DebugHContents);
>>>>>>   AuxSymbol.CheckSum = JC.getCRC();
>>>>>>
>>>>>> Hope this helps

--
Leonardo Santagada
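The quoted advice amounts to a specific on-disk layout: a section symbol with NumberOfAuxSymbols = 1, followed directly by an 18-byte section-definition record whose CheckSum is a CRC of the section contents. Below is a minimal, self-contained sketch of that serialization; it is not the llvm-objcopy or WinCOFFObjectWriter code. Assumptions: the field layout follows the 18-byte symbol record and auxiliary format 5 of the PE/COFF specification; the helper names jamCrc, putLE and writeSectionSymbol are hypothetical; and jamCrc is intended to match what JamCRC JC(/*Init=*/0) computes (reflected CRC-32 polynomial, caller-chosen initial value, no final inversion), which is generally not the same value as a standard CRC-32 of the same bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// CRC-32 over Data using the reflected polynomial 0xEDB88320, starting from
// Init and with NO final inversion (the "JamCRC" convention).
uint32_t jamCrc(const std::vector<uint8_t> &Data, uint32_t Init) {
  uint32_t Crc = Init;
  for (uint8_t Byte : Data) {
    Crc ^= Byte;
    for (int Bit = 0; Bit < 8; ++Bit)
      Crc = (Crc >> 1) ^ (0xEDB88320u & (0u - (Crc & 1u)));
  }
  return Crc;
}

// Append the low NumBytes bytes of Value in little-endian order.
void putLE(std::vector<uint8_t> &Out, uint32_t Value, int NumBytes) {
  for (int I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}

// Hypothetical helper: one 18-byte section symbol record followed immediately
// by exactly one 18-byte section-definition aux record (PE/COFF aux format 5).
std::vector<uint8_t> writeSectionSymbol(const std::string &Name,
                                        uint16_t SectionNumber,
                                        const std::vector<uint8_t> &Contents,
                                        uint16_t NumRelocations) {
  assert(Name.size() <= 8 && "longer names need a string table entry");
  std::vector<uint8_t> Out;
  // Symbol record: short name, NUL-padded to 8 bytes.
  uint8_t ShortName[8] = {};
  std::memcpy(ShortName, Name.data(), Name.size());
  Out.insert(Out.end(), ShortName, ShortName + 8);
  putLE(Out, 0, 4);             // Value
  putLE(Out, SectionNumber, 2); // SectionNumber (1-based)
  putLE(Out, 0, 2);             // Type
  Out.push_back(3);             // StorageClass: IMAGE_SYM_CLASS_STATIC
  Out.push_back(1);             // NumberOfAuxSymbols: exactly one
  // Aux record, written directly after the symbol record.
  putLE(Out, static_cast<uint32_t>(Contents.size()), 4); // Length
  putLE(Out, NumRelocations, 2);                        // NumberOfRelocations
  putLE(Out, 0, 2);                                     // NumberOfLinenumbers
  putLE(Out, jamCrc(Contents, /*Init=*/0), 4);          // CheckSum
  putLE(Out, 0, 2); // Number: only meaningful for associative COMDATs; 0 here
  Out.push_back(0); // Selection: not a COMDAT section
  putLE(Out, 0, 3); // Unused padding
  return Out;
}
```

For example, writeSectionSymbol(".debug$H", 5, DebugHContents, 0) would produce 36 bytes (two 18-byte records); the section number 5 is arbitrary here. Real object writers may fill the Number field differently, so compare against a clang-cl-produced object before relying on it.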
Zachary Turner via llvm-dev
2018-Jan-26 18:06 UTC
Interesting. If it is generating yaml files that can't be decoded, then that definitely sounds like a bug. If you can provide a reduced test case we can try to fix it, but admittedly it can often take some effort to generate one. The best way is to use creduce: use cl or clang-cl to write the preprocessed output to a file, then run creduce on that file with an interestingness test that compiles it, roundtrips the object through obj2yaml and yaml2obj, and reports the variant as interesting when the roundtrip fails (creduce keeps a variant when the test exits with status 0). Then let it run for a couple of hours (or days) and you should come back to a minimal repro.

Granted, it's understandable if you don't have the time for that :)

Also, I got rid of my local changes and re-ran the test case, and I'm seeing what you see: the 2 yaml files are identical, but the 2 binary files aren't.

00000004: 83 94
00000050: 00 08
00000051: 00 02
00000074: 00 04
0000077C: 04 0F
000007A0: 0F 04
000007C5: 61 62
000007D0: 62 61

Luckily 00000004 is a pretty easy offset to identify, so we should be able to figure this out. It looks like some header fields probably aren't being initialized correctly (not sure why obj2yaml isn't printing this information).
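A listing like the one above (offset, then the differing byte from each file, in the style printed by Windows fc /b) is easy to regenerate while iterating on the objcopy changes. A minimal sketch, assuming both files fit in memory; readFile, diffBytes and formatDiff are hypothetical helper names, and only the common prefix is compared when the files differ in length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

struct ByteDiff {
  size_t Offset;
  uint8_t A, B; // byte from the first and second file
};

// Read a whole file into memory (empty vector if it cannot be opened).
std::vector<uint8_t> readFile(const std::string &Path) {
  std::ifstream In(Path, std::ios::binary);
  return std::vector<uint8_t>(std::istreambuf_iterator<char>(In),
                              std::istreambuf_iterator<char>());
}

// Compare the common prefix of two buffers byte by byte.
std::vector<ByteDiff> diffBytes(const std::vector<uint8_t> &A,
                                const std::vector<uint8_t> &B) {
  std::vector<ByteDiff> Diffs;
  size_t N = A.size() < B.size() ? A.size() : B.size();
  for (size_t I = 0; I < N; ++I)
    if (A[I] != B[I])
      Diffs.push_back({I, A[I], B[I]});
  return Diffs;
}

// Format one difference as "OOOOOOOO: AA BB" (offset as 8 hex digits).
std::string formatDiff(const ByteDiff &D) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%08zX: %02X %02X", D.Offset,
                static_cast<unsigned>(D.A), static_cast<unsigned>(D.B));
  return Buf;
}
```

Usage would look like: for (const ByteDiff &D : diffBytes(readFile("first.obj"), readFile("second.obj"))) std::puts(formatDiff(D).c_str()); where the file names are placeholders.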
Leonardo Santagada via llvm-dev
2018-Jan-26 18:21 UTC
Offset 04 is the TimeDateStamp... weird that it's different. For me, on that peview.exe app I talked about, I see a difference in the pointer to the symbol table. That is because yaml2coff lays out all sections in the same order as they appear in the section headers, while clang-cl puts them in the file in a different order (apparently trying to keep all the .debug$S sections together). That is very different from what Visual Studio does (there, the headers are in the same order as the section data later in the file).
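Leonardo's explanation suggests a quick check for any object file: does the raw data of its sections appear in the file in the same order as the section headers? A minimal sketch, assuming the section headers have already been decoded into PointerToRawData/SizeOfRawData pairs; SectionInfo, rawDataOrder and followsHeaderOrder are hypothetical names, and sections with no raw data in the file (PointerToRawData or SizeOfRawData of 0, as for .bss) are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SectionInfo {
  uint32_t PointerToRawData; // 0 when the section has no data in the file
  uint32_t SizeOfRawData;
};

// Returns 1-based section numbers in the order their raw data appears in the
// file, skipping sections without raw data.
std::vector<size_t> rawDataOrder(const std::vector<SectionInfo> &Sections) {
  std::vector<size_t> Order;
  for (size_t I = 0; I < Sections.size(); ++I)
    if (Sections[I].PointerToRawData != 0 && Sections[I].SizeOfRawData != 0)
      Order.push_back(I + 1);
  std::stable_sort(Order.begin(), Order.end(), [&](size_t A, size_t B) {
    return Sections[A - 1].PointerToRawData < Sections[B - 1].PointerToRawData;
  });
  return Order;
}

// True if raw data appears in the same order as the section headers.
bool followsHeaderOrder(const std::vector<SectionInfo> &Sections) {
  std::vector<size_t> Order = rawDataOrder(Sections);
  return std::is_sorted(Order.begin(), Order.end());
}
```

Comparing rawDataOrder for a clang-cl object and for its yaml2obj roundtrip would show whether a layout-order difference alone explains a shifted PointerToSymbolTable.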
Zachary Turner via llvm-dev
2018-Jan-26 18:23 UTC
dumpbin has some clues. I ran dumpbin /all on both object files and diffed the results. In the good object file, Section #2 (.data) has File Pointer to Raw Data 208, but in the bad file Section #2 (.data) has File Pointer to Raw Data 0. Also, Section #3 (.bss) in the good file has Size of Raw Data = 4, but in the bad file Section #3 (.bss) has Size of Raw Data = 0.
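These dumpbin findings can be matched against the byte offsets from the earlier diff. Assuming SizeOfOptionalHeader is 0, as is normal for .obj files, the section table starts right after the 20-byte file header and each section header is 40 bytes, so section n (1-based) starts at 20 + 40 * (n - 1). Section #2 then starts at 0x3C, and its PointerToRawData (field offset 20) occupies 0x50 to 0x53; the bytes 08 02 at 0x50 and 0x51 read as little-endian 0x0208, consistent with dumpbin's 208 if, as usual, dumpbin prints header values in hex. Section #3 starts at 0x64, and its SizeOfRawData (field offset 16) is at 0x74, matching the 00/04 difference there. A minimal sketch that performs this mapping; describeOffset and the field tables are hypothetical names, with offsets taken from the PE/COFF file header and section header layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Field { const char *Name; uint32_t Offset; uint32_t Size; };

static const Field FileHeaderFields[] = {
    {"Machine", 0, 2},           {"NumberOfSections", 2, 2},
    {"TimeDateStamp", 4, 4},     {"PointerToSymbolTable", 8, 4},
    {"NumberOfSymbols", 12, 4},  {"SizeOfOptionalHeader", 16, 2},
    {"Characteristics", 18, 2}};

static const Field SectionHeaderFields[] = {
    {"Name", 0, 8},                  {"VirtualSize", 8, 4},
    {"VirtualAddress", 12, 4},       {"SizeOfRawData", 16, 4},
    {"PointerToRawData", 20, 4},     {"PointerToRelocations", 24, 4},
    {"PointerToLinenumbers", 28, 4}, {"NumberOfRelocations", 32, 2},
    {"NumberOfLinenumbers", 34, 2},  {"Characteristics", 36, 4}};

const uint32_t FileHeaderSize = 20;
const uint32_t SectionHeaderSize = 40;

// Map a byte offset to a header field name, assuming no optional header.
std::string describeOffset(uint32_t Offset, uint32_t NumberOfSections) {
  if (Offset < FileHeaderSize) {
    for (const Field &F : FileHeaderFields)
      if (Offset >= F.Offset && Offset < F.Offset + F.Size)
        return std::string("FileHeader.") + F.Name;
    return "FileHeader.?"; // unreachable: the fields cover bytes 0-19
  }
  uint32_t TableEnd = FileHeaderSize + SectionHeaderSize * NumberOfSections;
  if (Offset < TableEnd) {
    uint32_t Rel = Offset - FileHeaderSize;
    uint32_t Number = Rel / SectionHeaderSize + 1; // 1-based, like dumpbin
    uint32_t Within = Rel % SectionHeaderSize;
    for (const Field &F : SectionHeaderFields)
      if (Within >= F.Offset && Within < F.Offset + F.Size)
        return "Section #" + std::to_string(Number) + "." + F.Name;
  }
  return "past the section table";
}
```

Offsets past the section table (such as 0x77C and beyond in the diff) land in raw data, relocations, the symbol table or the string table, which this sketch does not try to decode.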