Michael Spencer
2011-Nov-25  15:39 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
So I was taking a look at Microsoft C++ ABI support while on vacation,
and ran into a major issue. Given the following LLVM IR:
$ clang++ -S -emit-llvm -O3 mangling.cpp -o - -Xclang -cxx-abi -Xclang microsoft
; ModuleID = 'mangling.cpp'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S32"
target triple = "i686-pc-win32"
define i32 @"?heyimacxxfunction@@YAHXZ"() nounwind readnone {
entry:
  ret i32 42
}
Note the ?heyimacxxfunction@@YAHXZ. Now if I generate assembly (using
clang or llc) I get:
$ clang++ -S -O3 mangling.cpp -o - -Xclang -cxx-abi -Xclang microsoft
        .def     __3F_heyimacxxfunction@@YAHXZ;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  __3F_heyimacxxfunction@@YAHXZ
        .align  16, 0x90
__3F_heyimacxxfunction@@YAHXZ:          # @"?heyimacxxfunction@@YAHXZ"
# BB#0:                                 # %entry
        movl    $42, %eax
        ret
It turned the ? into _3F_, and prepended _ someplace. I get the same
symbol if I use integrated-as to generate an object file.
I've been unable thus far to find out exactly where this occurs. And
once I do find it, I'm not sure of the correct fix. I assume that LLVM
mangles the '?' because it means something special in the gas syntax.
There also has to be a way to communicate to llvm from clang that this
symbol should not receive a prepended _.
Thanks,
- Michael Spencer
Joe Abbey
2011-Nov-25  16:29 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
Looks to me like it converted the ? into its ASCII hexadecimal
representation, _3F_.  I don't think another underscore was prepended.
This is probably thanks to lib/Target/Mangler.cpp.  You'll want to let ? be
treated as an acceptable character.
static bool isAcceptableChar(char C, bool AllowPeriod) {
  if ((C < 'a' || C > 'z') &&
      (C < 'A' || C > 'Z') &&
      (C < '0' || C > '9') &&
      C != '_' && C != '$' && C != '@' &&
      !(AllowPeriod && C == '.'))
    return false;
  return true;
}
But you might want to add doesAllowQuestionMarksInName() to MCAsmInfo; it's
possible that other name manglers don't accept question marks.
Cheers,
 
Joe Abbey
Software Architect
Arxan Technologies, Inc.
1305 Cumberland Ave, Ste 215
West Lafayette, IN 47906
jabbey at arxan.com
www.arxan.com
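The escaping Joe describes can be modelled in a few lines. This is only a rough sketch, not the code in lib/Target/Mangler.cpp: escapeName and the AllowQuestionMark parameter are hypothetical names, with the flag standing in for whatever a doesAllowQuestionMarksInName() hook would report. It assumes every rejected character becomes _XX_, where XX is the byte value in uppercase hex, which matches the _3F_ seen in the assembly above.

```cpp
#include <cassert>
#include <string>

// Same character test as the snippet quoted in the thread, plus a
// hypothetical opt-in for '?' (not part of the original function).
bool isAcceptableChar(char C, bool AllowPeriod, bool AllowQuestionMark) {
  if ((C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
      (C >= '0' && C <= '9') || C == '_' || C == '$' || C == '@')
    return true;
  if (AllowPeriod && C == '.')
    return true;
  return AllowQuestionMark && C == '?';
}

// Hypothetical helper: copy acceptable characters through unchanged and
// rewrite every other byte as _XX_ (XX = uppercase hex of the byte).
std::string escapeName(const std::string &Name, bool AllowPeriod,
                       bool AllowQuestionMark) {
  static const char Hex[] = "0123456789ABCDEF";
  std::string Out;
  for (char C : Name) {
    if (isAcceptableChar(C, AllowPeriod, AllowQuestionMark)) {
      Out += C;
      continue;
    }
    unsigned char U = static_cast<unsigned char>(C);
    Out += '_';
    Out += Hex[U >> 4];
    Out += Hex[U & 15];
    Out += '_';
  }
  return Out;
}

// Usage: '?' is byte 0x3F, so it is escaped unless the flag allows it.
void checkEscapeExample() {
  const std::string Name = "?heyimacxxfunction@@YAHXZ";
  assert(escapeName(Name, true, false) == "_3F_heyimacxxfunction@@YAHXZ");
  assert(escapeName(Name, true, true) == Name);
}
```

With the flag off, the result is the symbol from the original post minus its leading underscore, which suggests the extra _ comes from a separate step.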
Joe Abbey
2011-Nov-25  16:45 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
> it's possible that other name manglers don't accept question marks.

I mean assemblers/linkers. Dur,

Joe
Rafael Espíndola
2011-Nov-25  17:13 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
> I've been unable thus far to find out exactly where this occurs. And
> once I do find it, I'm not sure of the correct fix. I assume that LLVM
> mangles the '?' because it means something special in the gas syntax.
> There also has to be a way to communicate to llvm from clang that this
> symbol should not receive a prepended _.

You can prepend a \01. That is what is done with inline assembly.

I would love to remove all mangling from the LLVM IL, but last time I
mentioned it, sabre wanted to keep at least the prepending of '_'. Not
sure about the rest (like expanding ?).

Cheers,
Rafael
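The \01 convention mentioned here can be sketched as follows. This is a simplified model of the prefix rule only, not LLVM's implementation: applyGlobalPrefix is a hypothetical name, and GlobalPrefix is assumed to be "_" for the i686-pc-win32 target in the original post. Character escaping is deliberately left out, since the thread treats it as a separate question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a name whose first byte is \1 is taken literally
// (the marker is dropped and no prefix is added); any other name gets the
// target's global prefix prepended.
std::string applyGlobalPrefix(const std::string &Name,
                              const std::string &GlobalPrefix) {
  assert(!Name.empty() && "expected a non-empty symbol name");
  if (Name[0] == '\1')
    return Name.substr(1); // literal name: strip the marker only
  return GlobalPrefix + Name;
}

// Usage: the same name with and without the \1 marker.
void checkPrefixExample() {
  assert(applyGlobalPrefix("?heyimacxxfunction@@YAHXZ", "_") ==
         "_?heyimacxxfunction@@YAHXZ");
  assert(applyGlobalPrefix("\1?heyimacxxfunction@@YAHXZ", "_") ==
         "?heyimacxxfunction@@YAHXZ");
}
```

Under this model the marker accounts for the unwanted leading underscore, but it says nothing about whether '?' is rewritten afterwards.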
Charles Davis
2011-Nov-25  19:15 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
On Nov 25, 2011, at 8:39 AM, Michael Spencer wrote:
> There also has to be a way to communicate to llvm from clang that this
> symbol should not receive a prepended _.

I could swear that when I wrote that code I stuck a '\1' character in
front of the name to prevent LLVM's mangler from doing anything to it
(see lib/AST/MicrosoftMangle.cpp:162). (I knew the prepending of '_'
characters was a problem, but I didn't know LLVM magically transformed
chars it doesn't like into _xx_.) That is a magic marker that says "this
is the literal name of this symbol, don't mangle it in the usual way."

It looks like, for some reason, the '\1' character isn't getting emitted.
I wish I had an answer for you, but I can't debug this because I don't
run Windows. Works fine for me here on Mac. Running your command to
generate LLVM IR, I get:

; ModuleID = 'mangling.cpp'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.6.8"

define i32 @"\01?heyimacxxfunction@@YAHXZ"() nounwind readnone ssp {
  ret i32 42
}

Note the '\01'.

Chip
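The '\1' in the C++ source and the '\01' in the IR listing are the same byte. A minimal sketch of why, assuming textual IR prints any non-printable byte (and backslash or double quote) in a quoted name as a backslash followed by two hex digits; markLiteral and escapeForIR are hypothetical names, not clang or LLVM functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical front-end step: mark an already-mangled name as literal by
// putting the single byte 0x01 in front of it.
std::string markLiteral(const std::string &Mangled) {
  assert(!Mangled.empty() && "expected a non-empty mangled name");
  return std::string(1, '\1') + Mangled;
}

// Hypothetical printer step: emit printable characters as-is and every
// other byte (plus backslash and double quote) as \XX in hex.
std::string escapeForIR(const std::string &Name) {
  static const char Hex[] = "0123456789ABCDEF";
  std::string Out;
  for (unsigned char C : Name) {
    if (std::isprint(C) && C != '\\' && C != '"') {
      Out += static_cast<char>(C);
      continue;
    }
    Out += '\\';
    Out += Hex[C >> 4];
    Out += Hex[C & 15];
  }
  return Out;
}

// Usage: the marked name prints with a leading \01, as in the IR above.
void checkIRNameExample() {
  assert(escapeForIR(markLiteral("?heyimacxxfunction@@YAHXZ")) ==
         "\\01?heyimacxxfunction@@YAHXZ");
}
```

So a missing \01 in the printed IR means the marker byte was never added to the name, rather than that it was lost while printing.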
bigcheesegs at gmail.com
2011-Nov-25  21:22 UTC
[LLVMdev] Where does LLVM mangle characters from llvm-ir names while generating native code?
In the case I posted I had removed that line. However, you still get
the __3F_ in the generated assembly with it.

On Nov 25, 2011, at 2:15 PM, Charles Davis <cdavis at mymail.mines.edu> wrote:
> I could swear that when I wrote that code I stuck a '\1' character in
> front of the name to prevent LLVM's mangler from doing anything to it
> (see lib/AST/MicrosoftMangle.cpp:162).