Jeff Kuskin
2014-Jun-10 15:04 UTC
[LLVMdev] Help with new backend: byte-sized loads being generated for 'int' array access
First, apologies because I'm quite new to LLVM backend development. I very much appreciate any help from more experienced folks.

I'm running into a problem in which byte-sized loads are _sometimes_ being generated for a read access to an external array of 4-byte ints, depending on how the array is declared.

I am hoping someone can perhaps point me to possible sources of the problem in my backend code. I would be happy to supply additional details; I'm trying to keep this message relatively short.

The issue arises when I compile the following C code:

    extern int EI[];
    int MYFUNC() { return EI[1288]; }

I run the code through 'clang -emit-llvm' and end up with bitcode of:

    ; ModuleID = './clang2.c'
    target datalayout = "e-m:e-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64"
    target triple = "dgc"

    @EI = external global [0 x i32]

    ; Function Attrs: nounwind
    define i32 @MYFUNC() #0 {
    entry:
      %0 = load i32* getelementptr inbounds ([0 x i32]* @EI, i32 0, i32 1288), align 1
      ret i32 %0
    }

    attributes #0 = { nounwind "less-precise-fpmad"="false"
                      "no-frame-pointer-elim"="true"
                      "no-frame-pointer-elim-non-leaf"
                      "no-infs-fp-math"="false"
                      "no-nans-fp-math"="false"
                      "stack-protector-buffer-size"="8"
                      "unsafe-fp-math"="false" "use-soft-float"="false" }

    !llvm.ident = !{!0}
    !0 = metadata !{metadata !"clang version 3.5.0 (209307)"}

When I then run the bitcode through llc, the memory load for the 'EI[1288]' reference is generated with a series of four byte-sized loads, followed by the appropriate shifting and OR'ing to get all the bytes into the proper place in the result.

This is not what I want, of course. I want a single, word-sized load to be generated. I have various sizes of load instructions defined in my TableGen file, an excerpt of which I've included at the end of this message.

Other backends built from the same source tree -- mipsel and xcore, for instance -- do indeed generate a single word-sized load, as expected, so I'm confident the problem is in my backend code.
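For illustration, the expansion described above (four byte-sized loads combined with shifts and ORs) computes the same value as the C sketch below. The function name load_u32_bytewise is made up for this example, and little-endian byte order is an assumption taken from the leading "e" in the datalayout string; this is not the code llc emits, only a model of what it computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rebuild a 32-bit value from four separate byte
 * loads, the way a backend does when it cannot assume the address is
 * 4-byte aligned. Little-endian layout is assumed. */
uint32_t load_u32_bytewise(const unsigned char *p)
{
    assert(p != NULL);

    uint32_t b0 = p[0];   /* least significant byte */
    uint32_t b1 = p[1];
    uint32_t b2 = p[2];
    uint32_t b3 = p[3];   /* most significant byte */

    /* Shift each byte into position and OR the pieces together. */
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
}
```

With 'align 4' on the load, the same value can be fetched with one word-sized instruction, which is why the alignment recorded in the IR matters here.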
What's interesting is that my backend *DOES* generate a single word-sized load if I make either of the following changes to the declaration of 'EI':

(1) Provide an array size in the EI declaration:

    extern int EI[5000];

This yields the following in the bitcode, replacing the like lines from above:

    @EI = external global [5000 x i32]

    ; Function Attrs: nounwind
    define i32 @MYFUNC() #0 {
    entry:
      %0 = load i32* getelementptr inbounds ([5000 x i32]* @EI, i32 0, i32 1288), align 4
      ret i32 %0
    }

(2) Change EI to be an int*:

    extern int* EI;

This yields the following in the bitcode:

    @EI = external global i32*

    ; Function Attrs: nounwind
    define i32 @MYFUNC() #0 {
    entry:
      %0 = load i32** @EI, align 4
      %arrayidx = getelementptr inbounds i32* %0, i32 1288
      %1 = load i32* %arrayidx, align 4
      ret i32 %1
    }

I have tried a number of things to figure out this issue, but to no avail. For some reason the 'EI[1288]' reference is being treated as possibly unaligned ("align=1"), but I can't figure out why.

TD file excerpt (modeled after the MIPS .td file):

    def DGCAddrDefault :
      ComplexPattern<iPTR, 2, "selectAddrDefault", [frameindex]>;
    def DGCAddrInt :
      ComplexPattern<iPTR, 2, "selectAddrInt", [frameindex]>;

    def DGCMemSrc : Operand<iPTR> {
      let MIOperandInfo = (ops ptr_rc, i32imm);
      let OperandType = "OPERAND_MEMORY";
    }

    let canFoldAsLoad = 1,
        mayLoad = 1 in
    {
      def LB : InstrDGC64_s__s_s<
          (outs IntRegs:$rd),
          (ins DGCMemSrc:$addr),
          !strconcat("lb", "\t$rd, $addr"),
          [(set i32:$rd, (sextloadi8 DGCAddrInt:$addr))],
          0b10011, 0b000, 0, 0>;
      def LH : InstrDGC64_s__s_s<
          (outs IntRegs:$rd),
          (ins DGCMemSrc:$addr),
          !strconcat("lh", "\t$rd, $addr"),
          [(set i32:$rd, (sextloadi16 DGCAddrDefault:$addr))],
          0b10011, 0b001, 0, 0>;
      def LBU : InstrDGC64_s__s_s<
          (outs IntRegs:$rd),
          (ins DGCMemSrc:$addr),
          !strconcat("lbu", "\t$rd, $addr"),
          [(set i32:$rd, (zextloadi8 DGCAddrDefault:$addr))],
          0b10011, 0b100, 0, 0>;
      def LHU : InstrDGC64_s__s_s<
          (outs IntRegs:$rd),
          (ins DGCMemSrc:$addr),
          !strconcat("lhu", "\t$rd, $addr"),
          [(set i32:$rd, (zextloadi16 DGCAddrInt:$addr))],
          0b10011, 0b101, 0, 0>;
      def LW : InstrDGC64_s__s_s<
          (outs IntRegs:$rd),
          (ins DGCMemSrc:$addr),
          !strconcat("lw", "\t$rd, $addr"),
          [(set i32:$rd, (load DGCAddrDefault:$addr))],
          0b10011, 0b010, 0, 0>;
    }
Hal Finkel
2014-Jun-10 15:54 UTC
[LLVMdev] Help with new backend: byte-sized loads being generated for 'int' array access
----- Original Message -----
> From: "Jeff Kuskin" <jk500500 at yahoo.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, June 10, 2014 10:04:45 AM
> Subject: [LLVMdev] Help with new backend: byte-sized loads being generated for 'int' array access
>
> [...]
>
> I have tried a number of things to figure out this issue, but to no
> avail. For some reason the 'EI[1288]' reference is being treated as
> possibly unaligned ("align=1"), but I can't figure out why.

For the question of how C is being translated into LLVM IR (why there is the 'align 1' vs 'align 4'), you should ask on the cfe-dev list (not here).

To mention a related point, if your target supports unaligned loads for 4-byte integers, then you need to override the *TargetLowering::allowsUnalignedMemoryAccesses callback for your target.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
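As a rough picture of the point about allowsUnalignedMemoryAccesses, the C sketch below models how the alignment recorded on a load and the target's answer to that callback together decide how many load operations come out. The function loads_emitted and its parameters are hypothetical names invented for this example; it is a simplified model that assumes power-of-two sizes and alignments, not LLVM's actual legalization code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of load operations needed for an access of
 * size_bytes whose address is only known to be align_bytes-aligned.
 * allows_unaligned stands in for the target's answer to the
 * unaligned-access callback. Power-of-two inputs are assumed. */
unsigned loads_emitted(unsigned size_bytes, unsigned align_bytes,
                       bool allows_unaligned)
{
    assert(size_bytes != 0 && align_bytes != 0);

    /* Sufficiently aligned, or the target tolerates misalignment:
     * one load of the full width. */
    if (align_bytes >= size_bytes || allows_unaligned)
        return 1;

    /* Otherwise split into pieces no wider than the known alignment. */
    return size_bytes / align_bytes;
}
```

In this model the 'align 1' case from the original message gives loads_emitted(4, 1, false), which is four byte loads, while the 'align 4' variants give one. A target reporting that unaligned 4-byte accesses are allowed would also get a single load for 'align 1'.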