Hi,

I've been trying to set up clang/LLVM to compile for big endian ARM and I
need a little help. The code generation works for the most part and most of
my regression tests pass, but I noticed that code like this

    extern void g(void);
    int *p;

    int main()
    {
        if (*p & 0x01000000) g();
    }

generates

    ldr r0, [r0]
    ldrb r0, [r0, #3]
    tst r0, #1

i.e. the test of the value is optimized to use a byte load, but the ldrb is
done assuming a little endian address space.

I've been snooping around, but can't seem to find where the conversion to a
byte operation is done. Could someone point me in the right direction?

-Rich

On Saturday, June 02, 2012 10:20:03 AM Richard Pennington wrote:
> Hi,
>
> I've been trying to set up clang/LLVM to compile for big endian ARM and I
> need a little help. The code generation works for the most part and most of
> my regression tests pass, but I noticed that code like this
>
>     extern void g(void);
>     int *p;
>
>     int main()
>     {
>         if (*p & 0x01000000) g();
>     }
>
> generates
>
>     ldr r0, [r0]
>     ldrb r0, [r0, #3]
>     tst r0, #1
>
> i.e. the test of the value is optimized to use a byte load, but the ldrb is
> done assuming a little endian address space.
>
> I've been snooping around, but can't seem to find where the conversion to a
> byte operation is done. Could someone point me in the right direction?
>

I've figured out my problem. I didn't adjust the data layout description string
in ARMTargetMachine.cpp for big endian targets.

This brings up another question. clang has its own set of description strings
for varying ABIs, etc. Should those strings somehow override in the code
generators?

-Rich

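To illustrate why the `#3` offset in the `ldrb` is wrong on a big endian target, here is a minimal sketch in C. It is not LLVM's code: the function names `byte_offset_of_bit` and `test_bit_via_byte_load` are hypothetical, and the sketch only models how a 32-bit word is laid out in memory under each byte order. The mask 0x01000000 is bit 24, which lives in the most significant byte of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset, within a 32-bit word stored in memory, of the
 * byte that holds bit `bit` (bit 0 is the least significant bit). */
unsigned byte_offset_of_bit(unsigned bit, int big_endian)
{
    assert(bit < 32);
    unsigned byte = bit / 8;              /* 0 = least significant byte */
    return big_endian ? 3 - byte : byte;  /* big endian stores the MSB first */
}

/* Hypothetical helper: store `value` in the given byte order, then test one
 * bit with a single byte load, the way the narrowed ldrb/tst pair does. */
int test_bit_via_byte_load(uint32_t value, unsigned bit, int big_endian)
{
    unsigned char mem[4];
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
        mem[i] = (unsigned char)(value >> shift);
    }
    unsigned off = byte_offset_of_bit(bit, big_endian);
    return (mem[off] >> (bit % 8)) & 1;
}
```

Under these assumptions, `byte_offset_of_bit(24, 0)` gives 3 (the offset seen in the generated code) while `byte_offset_of_bit(24, 1)` gives 0, which is the offset a big endian target would need for the same test.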
Hi Rich,

> I've figured out my problem. I didn't adjust the data layout description string
> in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of description strings
> for varying ABIs, etc. Should those strings somehow override in the code
> generators?

no, they shouldn't override it. These strings exist AFAIK so that clang doesn't
have to pull in all of LLVM's codegen just to know data layout, i.e. it gives
better decoupling. What would make sense is to have LLVM codegen check that the
data layout string in the module matches the string that codegen is going to
use and error out if not.

Ciao, Duncan.

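The check suggested above could look roughly like the following sketch in C. It is an illustration only, not LLVM's API: `layout_endianness` and `layouts_match` are hypothetical names, the layout strings used with them are made-up examples rather than the real ARM strings, and exact string equality is assumed to be a strict enough notion of "matches". The one fact relied on is from the LangRef: a data layout string is a list of specifications separated by `-`, where `e` means little endian and `E` means big endian.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum layout_endian { LAYOUT_UNSPECIFIED, LAYOUT_LITTLE, LAYOUT_BIG };

/* Hypothetical helper: scan the '-'-separated specifications of a data
 * layout string and report the first endianness specification found. */
enum layout_endian layout_endianness(const char *layout)
{
    assert(layout != NULL);
    const char *p = layout;
    while (*p) {
        size_t len = strcspn(p, "-");     /* length of the current spec */
        if (len == 1 && p[0] == 'e') return LAYOUT_LITTLE;
        if (len == 1 && p[0] == 'E') return LAYOUT_BIG;
        p += len;
        if (*p == '-') p++;               /* skip the separator */
    }
    return LAYOUT_UNSPECIFIED;
}

/* Hypothetical check: the module's string must equal the code generator's.
 * Exact equality is an assumption made for this sketch. */
int layouts_match(const char *module_layout, const char *codegen_layout)
{
    return strcmp(module_layout, codegen_layout) == 0;
}
```

With the illustrative strings "e-p:32:32" and "E-p:32:32", `layouts_match` returns 0, and `layout_endianness` reports the two as little and big endian respectively, which is the kind of mismatch that produced the wrong `ldrb` offset earlier in the thread.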
On Jun 2, 2012, at 8:35 PM, Richard Pennington wrote:
>> i.e. the test of the value is optimized to use a byte load, but the ldrb is
>> done assuming a little endian address space.
>>
>> I've been snooping around, but can't seem to find where the conversion to a
>> byte operation is done. Could someone point me in the right direction?
>>
>
> I've figured out my problem. I didn't adjust the data layout description string
> in ARMTargetMachine.cpp for big endian targets.
>
> This brings up another question. clang has its own set of description strings
> for varying ABIs, etc. Should those strings somehow override in the code
> generators?

The current design is that the frontend (if it attaches a TD string) is
*required* to match the code generator:

http://llvm.org/docs/LangRef.html#datalayout

It is intended to allow the mid-level optimizers to know about data layout
without having the code generator linked in (e.g. "opt").

-Chris