Ramkumar Ramachandra
2015-Aug-04 15:10 UTC
[LLVMdev] [BUG] Incorrect ASCII escape characters on Mac
Hi,

The bug originated when one of our tests failed with an answer mismatch on Mac, after upgrading to Xcode 6.2. The internal IR after all our transforms, just before it hits the final LLVM lowering, is identical on Mac and Linux:

    var t6 : int8_T{col}[10] = {34, -48, 18, -12, 33, 0, 21, -7, -20, -31};

The LLVM lowering itself doesn't try to do anything smart. When we dumpModule, we see the difference:

     @2 = internal global [24 x i8] c"\10\02\03\0D\05\0B\0A\08\09\07\06\0C\04\0E\0F\01\01\02\03\04\06\02\00\05"
     @3 = internal global [10 x i8] c"\22\00\12\00!\00\15\00\00\00"
     @4 = internal global [10 x i8] c"\00\19\00+\00\00#\03\00\11"
    -@5 = internal global [10 x i8] c"\22\D0\12\F4!\00\15\F9\EC\E1"
    -@6 = internal global [10 x i8] c"\D0\19\FB+\FD\F8#\03\E2\11"
    +@5 = internal global [10 x i8] c"\22Ð\12ô!\00\15ùìá"
    +@6 = internal global [10 x i8] c"Ð\19û+ýø#\03â\11"

The diff is between Linux and Mac, where the added lines are from Mac. Both @5 character sequences represent:

    34 208 18 244 33 0 21 249 236 225

in decimal, which is our original array (from https://r12a.github.io/apps/conversion/). Let's try to feed this into llc then:

    llc: maci.ll:1:32: error: constant expression type mismatch
    @0 = internal global [10 x i8] c"\22Ð\12ô!\00\15ùìá"

The Linux string is fine, of course. So my conclusion is that some Unicode normalization code is broken. Although, if they represent the same code points, why should the program fail?

Thanks.

Ram
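For reference, the mapping from the signed int8 array to the escaped Linux string can be reproduced with a short Python sketch. It assumes a printer that emits printable ASCII bytes literally (except the double quote and backslash) and writes every other byte as a two-digit hex escape. The helper name `escape_ir_string` is hypothetical and is not an LLVM API; it only illustrates the escaping seen in the @5 line from Linux.

```python
def escape_ir_string(values):
    """Render signed 8-bit values in the style of an LLVM c"..." constant body.

    Assumption: printable ASCII is emitted as-is, except '"' and '\\';
    every other byte becomes a backslash followed by two hex digits.
    """
    out = []
    for v in values:
        b = v & 0xFF  # reinterpret the signed value as an unsigned byte
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            out.append("\\%02X" % b)
    return "".join(out)


# The array from the internal IR quoted in the message.
t6 = [34, -48, 18, -12, 33, 0, 21, -7, -20, -31]

print([v & 0xFF for v in t6])  # unsigned view of the same bytes
print(escape_ir_string(t6))    # \22\D0\12\F4!\00\15\F9\EC\E1
```

Under that assumption, every byte above 0x7F is escaped, which matches the Linux output and keeps the .ll file pure ASCII.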
Ramkumar Ramachandra
2015-Aug-05 14:02 UTC
[llvm-dev] [BUG] Incorrect ASCII escape characters on Mac
[Posting to new list]
David Woodhouse
2015-Aug-05 14:23 UTC
[llvm-dev] [BUG] Incorrect ASCII escape characters on Mac
On Wed, 2015-08-05 at 10:02 -0400, Ramkumar Ramachandra wrote:
> -@5 = internal global [10 x i8] c"\22\D0\12\F4!\00\15\F9\EC\E1"
> -@6 = internal global [10 x i8] c"\D0\19\FB+\FD\F8#\03\E2\11"
> +@5 = internal global [10 x i8] c"\22Ð\12ô!\00\15ùìá"
> +@6 = internal global [10 x i8] c"Ð\19û+ýø#\03â\11"
>
> The diff is between Linux and Mac, where lines added are from Mac.
> Both the @5 character sequences represent:
>
> 34 208 18 244 33 0 21 249 236 225

Not in this century, they don't.

That Ð, for example, is U+00D0 LATIN CAPITAL LETTER ETH, which in any 21st century system should be represented by the UTF-8 bytes 195,144. Your string "\22Ð\12ô!\00\15ùìá" is much more likely to be:

    34 195 144 18 195 180 33 0 21 195 185 195 172 195 161

Your "Linux" version is encoding the bytes directly and not making assumptions about character sets.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                              Intel Corporation
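The byte counts in this reply can be checked with a small Python sketch. It takes the ten intended byte values, reads them as the code points U+0000 to U+00FF (which is what the Mac output displays), and encodes that text two ways. The variable names are illustrative only.

```python
# The ten byte values the array is meant to hold.
expected = bytes([34, 208, 18, 244, 33, 0, 21, 249, 236, 225])

# Read each byte as the Unicode code point with the same number
# (this is exactly what decoding as Latin-1 does).
text = expected.decode("latin-1")

# Latin-1 maps every code point below 256 back to a single byte.
latin1_bytes = text.encode("latin-1")

# UTF-8 needs two bytes for each code point in U+0080..U+07FF.
utf8_bytes = text.encode("utf-8")

print(list(latin1_bytes))  # 10 values, identical to the original array
print(list(utf8_bytes))    # 15 values, the sequence given in the reply
print(len(latin1_bytes), len(utf8_bytes))
```

If the Mac .ll file is stored as UTF-8, the string constant holds 15 bytes rather than 10, which would be consistent with llc rejecting it against the declared type `[10 x i8]`.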