Zachary Turner via llvm-dev
2018-Jan-26 18:23 UTC
[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)
dumpbin has some clues. I ran dumpbin /all on both object files and diffed the results.

In the good object file, Section #2 (.data) has File Pointer to Raw Data 208, but in the bad file Section #2 (.data) has File Pointer to Raw Data 0.
Also, Section #3 (.bss) in the good file has Size of Raw Data = 4, but in the bad file Section #3 (.bss) has Size of Raw Data = 0.

On Fri, Jan 26, 2018 at 10:06 AM Zachary Turner <zturner at google.com> wrote:
> Interesting. If it is generating yaml files that can't be decoded, then it definitely sounds like a bug. If you can provide a reduced test case we can try to fix it, but admittedly it can often take some effort to generate a reduced test case. The best way is to use creduce. Use cl or clang-cl and write the pre-processed output to a file, then run creduce on that file with a test that basically round-trips from obj2yaml to yaml2obj and returns 1 if there's an error. Then let it run for a couple of hours (or days) and you should come back to a minimal repro.
>
> Granted, it's understandable if you don't have the time for that :)
>
> Also, I got rid of my local changes and re-ran the test case, and I'm seeing what you see. The 2 yaml files are identical, but the 2 binary files aren't:
>
> 00000004: 83 94
> 00000050: 00 08
> 00000051: 00 02
> 00000074: 00 04
> 0000077C: 04 0F
> 000007A0: 0F 04
> 000007C5: 61 62
> 000007D0: 62 61
>
> Luckily 00000004 is a pretty easy offset to identify, so we should be able to figure this out. It looks like some header fields probably aren't being initialized correctly (not sure why obj2yaml isn't printing this information).
>
> On Fri, Jan 26, 2018 at 9:59 AM Leonardo Santagada <santagada at gmail.com> wrote:
>
>> I'm now thinking that there's a bug in either obj2yaml or yaml2obj, because if I run just those two tools on my codebase they generate yaml files that can't be decoded. I will now try not adding any section to the obj file in llvm-objcopy, to see if I can link with obj files that I rewrite (but without adding symbols or sections).
>>
>> One of the bugs that does annoy me is that the TimeDateStamp is not carried over when obj2yaml writes a file, and that the layout function in yaml2coff generates different indexes for the sections. None of them look wrong, but it seems to leave some padding; I didn't have time to look too closely at why.
>>
>> On Fri, Jan 26, 2018 at 6:52 PM, Zachary Turner <zturner at google.com> wrote:
>>
>>> Hmm, ok. In that case let me try again without my local changes. Maybe they are getting in the way :-/
>>>
>>> On Fri, Jan 26, 2018 at 9:51 AM Leonardo Santagada <santagada at gmail.com> wrote:
>>>
>>>> It is identical for me... weird.
>>>>
>>>> On Fri, Jan 26, 2018 at 6:49 PM, Zachary Turner <zturner at google.com> wrote:
>>>>
>>>>> (Ignore the fact that my hashes are 8 bytes in the "good" file; this is due to some local changes I've been experimenting with.)
>>>>>
>>>>> On Fri, Jan 26, 2018 at 9:48 AM Zachary Turner <zturner at google.com> wrote:
>>>>>
>>>>>> I did this:
>>>>>>
>>>>>> // a.cpp
>>>>>> static int x = 0;
>>>>>> void b(int);
>>>>>> void a(int) {
>>>>>>   if (x)
>>>>>>     b(x);
>>>>>> }
>>>>>> int main(int argc, char **argv) {
>>>>>>   a(argc);
>>>>>>   return x;
>>>>>> }
>>>>>>
>>>>>> clang-cl /Z7 /c a.cpp /Foa.noghash.obj
>>>>>> clang-cl /Z7 /c a.cpp -mllvm -emit-codeview-ghash-section /Foa.ghash.good.obj
>>>>>> llvm-objcopy a.noghash.obj a.ghash.bad.obj
>>>>>> obj2yaml a.ghash.good.obj > a.ghash.good.yaml
>>>>>> obj2yaml a.ghash.bad.obj > a.ghash.bad.yaml
>>>>>>
>>>>>> Then open these 2 yaml files up in a diff viewer. It looks like the hashes aren't getting emitted at all. For example, in the good yaml file I see this:
>>>>>>
>>>>>>   - Name: '.debug$H'
>>>>>>     Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
>>>>>>     Alignment: 4
>>>>>>     SectionData: C5C93301000001005549419E78044E3896D45CD7009428758BE4A1E2B3E022BA267DEE221F5C42B17BCA182AF84584814A8B5E7E3FB17B397A9E3DEA75CD5627
>>>>>>     GlobalHashes:
>>>>>>       Version: 0
>>>>>>       HashAlgorithm: 1
>>>>>>       HashValues:
>>>>>>         - 5549419E78044E38
>>>>>>         - 96D45CD700942875
>>>>>>         - 8BE4A1E2B3E022BA
>>>>>>         - 267DEE221F5C42B1
>>>>>>         - 7BCA182AF8458481
>>>>>>         - 4A8B5E7E3FB17B39
>>>>>>         - 7A9E3DEA75CD5627
>>>>>>   - Name: .pdata
>>>>>>
>>>>>> And in the bad yaml file I see this:
>>>>>>
>>>>>>   - Name: '.debug$H'
>>>>>>     Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_DISCARDABLE, IMAGE_SCN_MEM_READ ]
>>>>>>     Alignment: 4
>>>>>>     SectionData: C5C9330100000000
>>>>>>     GlobalHashes:
>>>>>>       Version: 0
>>>>>>       HashAlgorithm: 0
>>>>>>   - Name: .pdata
>>>>>>
>>>>>> Don't focus too much on trying to figure out weird linker errors. Just get the output of obj2yaml to be identical when run under a diff utility, then everything should work fine.
>>>>>>
>>>>>> On Fri, Jan 26, 2018 at 7:27 AM Leonardo Santagada <santagada at gmail.com> wrote:
>>>>>>
>>>>>>> I'm so close I can almost smell it :)
>>>>>>>
>>>>>>> I know how bad the code looks, and I don't intend to submit this, but if you want to try it out it's at:
>>>>>>> https://gist.github.com/santagada/544136b1ee143bf31653b1158ac6829e
>>>>>>>
>>>>>>> I'm seeing: lld-link.exe: error: duplicate symbol: "<redacted_unmangled>" (<redacted>) in <internal> and in <redacted_filename>.obj. Looking at the .yaml dump, the symbols are all similar to this:
>>>>>>>
>>>>>>>   - Name: <redacted>
>>>>>>>     Value: 0
>>>>>>>     SectionNumber: 0
>>>>>>>     SimpleType: IMAGE_SYM_TYPE_NULL
>>>>>>>     ComplexType: IMAGE_SYM_DTYPE_FUNCTION
>>>>>>>     StorageClass: IMAGE_SYM_CLASS_WEAK_EXTERNAL
>>>>>>>     WeakExternal:
>>>>>>>       TagIndex: 134
>>>>>>>       Characteristics: IMAGE_WEAK_EXTERN_SEARCH_LIBRARY
>>>>>>>
>>>>>>> On Thu, Jan 25, 2018 at 8:01 PM, Zachary Turner <zturner at google.com> wrote:
>>>>>>>
>>>>>>>> I haven't really dabbled in this part of the COFF format personally, so hopefully I'm not leading you astray :)
>>>>>>>>
>>>>>>>> But I checked the code for coff2yaml, and I see this:
>>>>>>>>
>>>>>>>>   } else if (Symbol.isSectionDefinition()) {
>>>>>>>>     // This symbol represents a section definition.
>>>>>>>>     assert(Symbol.getNumberOfAuxSymbols() == 1 &&
>>>>>>>>            "Expected a single aux symbol to describe this section!");
>>>>>>>>     const object::coff_aux_section_definition *ObjSD =
>>>>>>>>         reinterpret_cast<const object::coff_aux_section_definition *>(
>>>>>>>>             AuxData.data());
>>>>>>>>
>>>>>>>> So it looks like you need exactly 1 aux symbol for each section symbol.
>>>>>>>>
>>>>>>>> I then scrolled up in this function to figure out where AuxData comes from, and it comes from COFFObjectFile::getSymbolAuxData. I think that function holds the clue to what you need to do. It looks like you need to set coff::symbol::NumberOfAuxSymbols to 1, and then there is a comment in getSymbolAuxData which says:
>>>>>>>>
>>>>>>>>   // AUX data comes immediately after the symbol in COFF
>>>>>>>>   Aux = reinterpret_cast<const uint8_t *>(Symbol.getRawPtr()) + SymbolSize;
>>>>>>>>
>>>>>>>> So I think you just need to write the bytes immediately after the coff::symbol. The thing you need to write looks like a coff::coff_aux_section_definition structure.
>>>>>>>>
>>>>>>>> For the CheckSum, look at WinCOFFObjectWriter::writeSection. It looks like it's a CRC32 of the actual section contents, which you can generate with a couple of lines of code:
>>>>>>>>
>>>>>>>>   JamCRC JC(/*Init=*/0);
>>>>>>>>   JC.update(DebugHContents);
>>>>>>>>   AuxSymbol.CheckSum = JC.getCRC();
>>>>>>>>
>>>>>>>> Hope this helps
>>
>> --
>> Leonardo Santagada
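The byte offsets in the binary diff can be mapped back to COFF header fields. The sketch below does that mapping under stated assumptions: a relocatable .obj with no optional header (SizeOfOptionalHeader = 0), so the 20-byte file header is followed directly by 40-byte section headers, with field offsets taken from the PE/COFF specification. `describeCoffOffset` is a hypothetical helper, not an LLVM API. Under these assumptions offset 0x04 is the first byte of TimeDateStamp (plausibly the timestamp obj2yaml does not carry over), 0x50/0x51 hold Section #2's PointerToRawData (08 02 is 0x208 little-endian, the value dumpbin showed for the good file), and 0x74 is Section #3's SizeOfRawData (04, the .bss size in the good file).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sizes from the PE/COFF specification (object files have no optional header).
constexpr uint32_t kFileHeaderSize = 20;
constexpr uint32_t kSectionHeaderSize = 40;

struct FieldSpan {
  uint32_t Offset; // offset within the header
  uint32_t Size;   // field size in bytes
  const char *Name;
};

// IMAGE_FILE_HEADER layout.
constexpr FieldSpan kFileHeaderFields[] = {
    {0, 2, "Machine"},          {2, 2, "NumberOfSections"},
    {4, 4, "TimeDateStamp"},    {8, 4, "PointerToSymbolTable"},
    {12, 4, "NumberOfSymbols"}, {16, 2, "SizeOfOptionalHeader"},
    {18, 2, "Characteristics"},
};

// IMAGE_SECTION_HEADER layout.
constexpr FieldSpan kSectionHeaderFields[] = {
    {0, 8, "Name"},                  {8, 4, "VirtualSize"},
    {12, 4, "VirtualAddress"},       {16, 4, "SizeOfRawData"},
    {20, 4, "PointerToRawData"},     {24, 4, "PointerToRelocations"},
    {28, 4, "PointerToLinenumbers"}, {32, 2, "NumberOfRelocations"},
    {34, 2, "NumberOfLinenumbers"},  {36, 4, "Characteristics"},
};

template <std::size_t N>
static const char *lookup(const FieldSpan (&Fields)[N], uint32_t Rel) {
  for (const FieldSpan &F : Fields)
    if (Rel >= F.Offset && Rel < F.Offset + F.Size)
      return F.Name;
  return "?";
}

// Hypothetical helper: names the header field that the byte at FileOffset
// belongs to, assuming NumSections section headers follow the file header.
std::string describeCoffOffset(uint32_t FileOffset, uint32_t NumSections) {
  if (FileOffset < kFileHeaderSize)
    return std::string("FileHeader.") + lookup(kFileHeaderFields, FileOffset);
  uint32_t Rel = FileOffset - kFileHeaderSize;
  uint32_t Index = Rel / kSectionHeaderSize;
  if (Index >= NumSections)
    return "past the section table (raw data, relocations or symbols)";
  // dumpbin numbers sections from 1, so report 1-based numbers as well.
  return "Section #" + std::to_string(Index + 1) + "." +
         lookup(kSectionHeaderFields, Rel % kSectionHeaderSize);
}
```

The remaining offsets in the diff (0x77C and up) fall past any plausible section table for this small file, so they would need to be matched against section raw data or the symbol table instead.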
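The two `.debug$H` SectionData strings can also be decoded by hand. This sketch assumes the layout suggested by the yaml dumps themselves: a 4-byte little-endian magic (C5 C9 33 01, i.e. 0x0133C9C5), a 2-byte Version, a 2-byte HashAlgorithm, then fixed-size hash records (8 bytes each in the "good" file, per the note about local changes). `DebugHHeader`, `fromHex` and `parseDebugH` are hypothetical names, not LLVM's API. Decoding this way shows that the bad file carries only the 8-byte header with HashAlgorithm 0 and no hash records at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical decoded view of a .debug$H section.
struct DebugHHeader {
  uint32_t Magic = 0;
  uint16_t Version = 0;
  uint16_t HashAlgorithm = 0;
  std::vector<std::vector<uint8_t>> Hashes; // one entry per hash record
};

// Converts an obj2yaml-style hex string ("C5C93301...") into bytes.
std::vector<uint8_t> fromHex(const std::string &Hex) {
  std::vector<uint8_t> Bytes;
  for (std::size_t I = 0; I + 1 < Hex.size(); I += 2)
    Bytes.push_back(
        static_cast<uint8_t>(std::stoul(Hex.substr(I, 2), nullptr, 16)));
  return Bytes;
}

// Reads an N-byte little-endian integer starting at Off.
static uint32_t readLE(const std::vector<uint8_t> &B, std::size_t Off,
                       std::size_t N) {
  assert(Off + N <= B.size());
  uint32_t V = 0;
  for (std::size_t I = 0; I < N; ++I)
    V |= uint32_t(B[Off + I]) << (8 * I);
  return V;
}

// Hypothetical parser: reads the 8-byte header and splits the rest into
// HashSize-byte records. Returns std::nullopt if the sizes don't line up.
std::optional<DebugHHeader> parseDebugH(const std::vector<uint8_t> &Data,
                                        std::size_t HashSize) {
  if (Data.size() < 8 || HashSize == 0 || (Data.size() - 8) % HashSize != 0)
    return std::nullopt;
  DebugHHeader H;
  H.Magic = readLE(Data, 0, 4);
  H.Version = static_cast<uint16_t>(readLE(Data, 4, 2));
  H.HashAlgorithm = static_cast<uint16_t>(readLE(Data, 6, 2));
  for (std::size_t Off = 8; Off < Data.size(); Off += HashSize)
    H.Hashes.emplace_back(Data.begin() + Off, Data.begin() + Off + HashSize);
  return H;
}
```

Fed the good SectionData with `HashSize = 8`, this yields seven records matching the seven HashValues obj2yaml printed.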
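The advice about section symbols (exactly one aux record written right after the symbol, with CheckSum computed by JamCRC over the section contents) can be sketched without LLVM. JamCRC is the reflected CRC-32 (polynomial 0xEDB88320) without the final XOR; the code reimplements it bit by bit. The 18-byte record follows the section-definition auxiliary format in the PE/COFF specification: Length, NumberOfRelocations, NumberOfLinenumbers, CheckSum, Number, Selection, then 3 unused bytes. `jamCrc` and `makeSectionAux` are hypothetical names; inside LLVM one would use `JamCRC` and the aux-section structures quoted above instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reimplementation of the JAMCRC variant: reflected CRC-32
// (polynomial 0xEDB88320) with no final XOR. COFF checksums use Init = 0.
uint32_t jamCrc(const std::vector<uint8_t> &Data, uint32_t Init) {
  uint32_t Crc = Init;
  for (uint8_t Byte : Data) {
    Crc ^= Byte;
    for (int Bit = 0; Bit < 8; ++Bit)
      Crc = (Crc >> 1) ^ (0xEDB88320u & (0u - (Crc & 1u)));
  }
  return Crc;
}

// Writes the low N bytes of V in little-endian order at Off.
static void putLE(std::array<uint8_t, 18> &Out, std::size_t Off, uint32_t V,
                  std::size_t N) {
  assert(Off + N <= Out.size());
  for (std::size_t I = 0; I < N; ++I)
    Out[Off + I] = static_cast<uint8_t>(V >> (8 * I));
}

// Hypothetical helper: builds the 18-byte section-definition aux record that
// follows a section symbol (whose NumberOfAuxSymbols must then be 1).
std::array<uint8_t, 18> makeSectionAux(const std::vector<uint8_t> &Contents,
                                       uint16_t NumRelocs, uint16_t Number) {
  std::array<uint8_t, 18> Rec{}; // zero-filled, including the unused bytes
  putLE(Rec, 0, static_cast<uint32_t>(Contents.size()), 4); // Length
  putLE(Rec, 4, NumRelocs, 2);                    // NumberOfRelocations
  putLE(Rec, 6, 0, 2);                            // NumberOfLinenumbers
  putLE(Rec, 8, jamCrc(Contents, /*Init=*/0), 4); // CheckSum
  putLE(Rec, 12, Number, 2); // Number (associated section for COMDATs)
  // Byte 14 is Selection (COMDAT only); bytes 15..17 stay zero.
  return Rec;
}
```

Seeding `jamCrc` with 0xFFFFFFFF reproduces the published CRC-32/JAMCRC check value for "123456789" (0x340BC6D9), which is a quick way to validate the polynomial and bit order before switching to Init = 0.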
Leonardo Santagada via llvm-dev
2018-Jan-26 18:32 UTC
Yeah, apparently .bss has an uninitialized-data flag that is not being respected by the layout of the COFF files (it should skip those sections), but I don't know what to do with .data, as it doesn't have a size.

--
Leonardo Santagada
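The diagnosis that sections flagged as uninitialized data should be skipped when laying out raw data can be sketched as a small layout pass. This is a hypothetical stand-in for yaml2coff's layout step, not its actual code: it assumes IMAGE_SCN_CNT_UNINITIALIZED_DATA is 0x00000080 (per the PE/COFF specification), places raw data back to back, and ignores relocations and alignment padding. Uninitialized sections keep their SizeOfRawData but get PointerToRawData = 0; an empty initialized section (like .data here) still receives the current offset, which is the shape dumpbin reported for the good file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Flag value from the PE/COFF specification.
constexpr uint32_t kScnCntUninitializedData = 0x00000080;

// Hypothetical, simplified view of one section header.
struct SectionLayout {
  std::string Name;
  uint32_t Characteristics;
  uint32_t SizeOfRawData;
  uint32_t PointerToRawData = 0;
};

// Hypothetical layout pass: assigns file offsets to section contents,
// starting at DataStart. Relocations and alignment are deliberately ignored.
void layoutRawData(std::vector<SectionLayout> &Sections, uint32_t DataStart) {
  uint32_t Offset = DataStart;
  for (SectionLayout &S : Sections) {
    if (S.Characteristics & kScnCntUninitializedData) {
      // .bss-style section: keep its size, but it occupies no file space.
      S.PointerToRawData = 0;
      continue;
    }
    // Empty sections still get the current offset, so they share a pointer
    // with the next section that has data.
    S.PointerToRawData = Offset;
    Offset += S.SizeOfRawData;
  }
}
```

With made-up sizes (a 16-byte .text starting at 0x1F8), the empty .data lands on 0x208, the same offset as the following section, while .bss keeps SizeOfRawData = 4 with a zero pointer.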
Leonardo Santagada via llvm-dev
2018-Jan-26 19:05 UTC
Yeah, apparently .bss has an uninitialized-data flag that is not being respected by the layout of the COFF files (it should skip those sections), but I don't know what to do with .data, as it doesn't have a size.

(Resending, as apparently my pastes generated a ton of hidden HTML data and this message hit the mailing list's 100k limit.)

--
Leonardo Santagada
Leonardo Santagada via llvm-dev
2018-Jan-26 19:23 UTC
Okay, apparently coff2yaml and yaml2coff are not in a great place, as neither of them deals well with the fact that you can have overlapping sections, which seems to be what clang-cl produces (the .data section points to the same place as a later section). That isn't a big problem for me in particular, because msvc doesn't even generate .data sections in .obj files.

I'm trying to add support for .bss sections to both coff2yaml and yaml2coff... but I can still link just fine with clang-cl-generated files after my transformations. What does give me problems is msvc .obj files. Have you tried to link one of these?

--
Leonardo Santagada
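Whether sections really overlap can be checked mechanically. The sketch below is a hypothetical helper, not part of coff2yaml: it reports pairs of sections whose [PointerToRawData, PointerToRawData + SizeOfRawData) byte ranges intersect. It treats zero-size sections and uninitialized-data sections as occupying no bytes, so if, as the thread suggests, the "overlap" is an empty .data section sharing its pointer with the next section, nothing is flagged; a round-tripping tool would still need to preserve that shared pointer to reproduce the file byte for byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one section's raw-data placement.
struct RawRange {
  uint32_t Pointer;   // PointerToRawData
  uint32_t Size;      // SizeOfRawData
  bool Uninitialized; // IMAGE_SCN_CNT_UNINITIALIZED_DATA is set
};

// Hypothetical helper: returns index pairs (i < j) whose raw-data byte
// ranges actually intersect in the file.
std::vector<std::pair<std::size_t, std::size_t>>
findOverlaps(const std::vector<RawRange> &R) {
  std::vector<std::pair<std::size_t, std::size_t>> Out;
  auto occupiesBytes = [](const RawRange &X) {
    return X.Size != 0 && !X.Uninitialized;
  };
  for (std::size_t I = 0; I < R.size(); ++I) {
    if (!occupiesBytes(R[I]))
      continue;
    for (std::size_t J = I + 1; J < R.size(); ++J) {
      if (!occupiesBytes(R[J]))
        continue;
      // 64-bit arithmetic so Pointer + Size cannot wrap around.
      uint64_t BeginI = R[I].Pointer, EndI = BeginI + R[I].Size;
      uint64_t BeginJ = R[J].Pointer, EndJ = BeginJ + R[J].Size;
      if (BeginI < EndJ && BeginJ < EndI)
        Out.emplace_back(I, J);
    }
  }
  return Out;
}
```

A quadratic scan is fine for object files with a handful of sections; sorting by pointer first would be the obvious change for larger inputs.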
David Chisnall via llvm-dev
2018-Jan-31 09:03 UTC
On 26 Jan 2018, at 19:05, Leonardo Santagada via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> yeah, apparently .bss has a flag of unitialized data that is not being respected on the layout of the coff files (it should skip those sections) but I dunno what to do with .data as it doesn't have a size.
>
> (resending as apparently my pastes generated a ton of hidden html data and this message hit the mailinglist limit of 100k)

This email also embedded this tracker:

<img class=3D"gmail-ajz" id=3D"gmail-:od" src=3D"https://mail.google.com/mail/u/0/images/cleardot.gif" alt=3D”">

Which has then been duplicated in every reply so far. I don’t know if it’s malicious on your part or if your mail client has been compromised, but it’s definitely impolite.

David