John,

I have looked at the SAFECode source and thought the following should work:

    if (isa<Constant>(I.getOperand(0))) {
      Out << "*** operand 0 is a constant ******";
      if (Instruction *operandI = dyn_cast<Instruction>(I.getOperand(0))) {
        Out << "********operand is an instruction ****";
        if (GetElementPtrInst *gepI = dyn_cast<GetElementPtrInst>(operandI)) {
          Out << "*** operand is a gep instruction ********";
          if (const ArrayType *ar =
                  dyn_cast<ArrayType>(gepI->getPointerOperandType()->getElementType()))
            hi = ar->getNumElements();
        }
      }
    }

But this does not recognize that operand 0 of instruction I is even an
instruction, let alone a getelementptr instruction. I took the code from
lines 632 and 757 of safecode/lib/ArrayBoundsChecks/ArrayBoundCheck.cpp.

I must be doing something wrong. What is it?

Surinder Kumar Jain

PS: Yes, I will be using SAFECode, but I still want to know why the above
code does not work. I am posting a separate mail with the title "OPT
optimizations".

On Fri, Jan 21, 2011 at 3:12 PM, John Criswell <criswell at illinois.edu> wrote:
> On 1/20/2011 10:02 PM, Surinder wrote:
>> When I compile C programs into LLVM, it produces load instructions in
>> two different flavours.
>>
>> (1) %8 = load i8** %bp, align 8
>>
>> (2) %1 = load i8* getelementptr inbounds ([4 x i8]* @.str, i64 0,
>> i64 0), align 1
>>
>> I know that %bp in the first case and the entire "getelementptr inbounds
>> ([4 x i8]* @.str, i64 0, i64 0)" in the second case can be obtained by
>> dump'ing I.getOperand(0).
>>
>> However, I want to find out which of the two forms of load has been
>> produced because in the second case, I want to insert checks for array
>> bounds.
>>
>> How can I find out, when I am in Instruction object I and I.getOpcode()
>> == 29, whether I am dealing with type (1) or type (2) above?
>
> The second load instruction is not really a "special" load instruction.
> Rather, its pointer argument is an LLVM constant expression (class
> llvm::ConstantExpr). The getelementptr (GEP), which would normally be a
> GEP instruction, is instead a constant expression that will be converted
> into a constant numeric value at code generation time.
>
> So, what you need to do is to examine the LoadInst's operand and see if
> it's a ConstantExpr, and then see whether the ConstantExpr's opcode is a
> GEP opcode.
>
> However, there's an easier way to handle this. SAFECode
> (http://safecode.cs.illinois.edu) has an LLVM pass which converts
> constant expression GEPs into GEP instructions. If you run it on your
> code first, you'll get the following instruction sequence:
>
> %tmp = getelementptr inbounds [4 x i8]* @.str, i64 0, i64 0
> %1 = load i8* %tmp, align 1
>
> You then merely search for GEP instructions and put run-time checks on
> those (which you have to do anyway if you're adding array bounds
> checking). The only ConstantExpr GEPs that aren't converted, I think,
> are those in global variable initializers.
>
> Now, regarding the insertion of array bounds checks, SAFECode does that,
> too (it is a memory safety compiler for C code). It also provides a
> simple static array bounds checker and some array bounds check
> optimization passes.
>
> I can direct you to the relevant portions of the source code if you're
> interested.
>
> -- John T.
>
>> Thanks.
>>
>> Surinder Kumar Jain
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
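A note on why the nested test in the message above can never reach its inner branches: in LLVM's class hierarchy, Constant and Instruction are separate branches under Value, so a value that passes isa<Constant> cannot also be an Instruction. The pointer operand in load form (2) is a ConstantExpr whose opcode is GetElementPtr. The sketch below illustrates this with toy stand-in classes in plain C++. It does not use LLVM: the struct definitions, classifyPointerOperand, and nestedCheckEverFires are hypothetical names invented for illustration, and dynamic_cast stands in for LLVM's dyn_cast.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for a slice of the LLVM value hierarchy. These are NOT the
// real LLVM classes; they only mirror the inheritance relationships
// (Instruction and Constant are separate branches under Value).
enum class Opcode { Load, GetElementPtr, Other };

struct Value {
  virtual ~Value() = default;
};

struct Instruction : Value {
  Opcode op;
  explicit Instruction(Opcode o) : op(o) {}
};

struct Constant : Value {};

struct ConstantExpr : Constant {
  Opcode op;
  explicit ConstantExpr(Opcode o) : op(o) {}
};

// Classify a load's pointer operand: test for a constant expression first
// (and inspect its opcode), then for an ordinary instruction.
std::string classifyPointerOperand(const Value *v) {
  assert(v != nullptr && "operand must not be null");
  if (const auto *ce = dynamic_cast<const ConstantExpr *>(v))
    return ce->op == Opcode::GetElementPtr ? "constant-expression GEP"
                                           : "other constant expression";
  if (const auto *inst = dynamic_cast<const Instruction *>(v))
    return inst->op == Opcode::GetElementPtr ? "GEP instruction"
                                             : "other instruction";
  return "other value";
}

// Mirrors the nesting from the first attempt: an Instruction test placed
// inside a Constant test. The two branches of the hierarchy are disjoint,
// so this returns false for every value.
bool nestedCheckEverFires(const Value *v) {
  if (dynamic_cast<const Constant *>(v) != nullptr)
    return dynamic_cast<const Instruction *>(v) != nullptr;
  return false;
}
```

Passing a ConstantExpr built with Opcode::GetElementPtr to classifyPointerOperand takes the first branch, while nestedCheckEverFires rejects both that object and a GEP Instruction, which matches the behaviour reported in the message.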
On 1/21/2011 10:46 PM, Surinder wrote:
> John,
>
> I have looked at the SAFECode and thought following should work
>
> if (isa<Constant>(I.getOperand(0)))
> { Out << "*** operand 0 is a constant ******";
>   if (Instruction *operandI = dyn_cast<Instruction>(I.getOperand(0)))
>   { Out << "********operand is an instruction ****";
>     if (GetElementPtrInst *gepI =
>         dyn_cast<GetElementPtrInst>(operandI))
>     { Out << "*** operand is a gep instruction ********";
>       if (const ArrayType *ar =
>           dyn_cast<ArrayType>(gepI->getPointerOperandType()->getElementType()))
>         hi = ar->getNumElements();
>
> But this does not recognize that operand(0) of instruction I is even
> an instruction, let alone a get element pointer instruction. I have
> taken the code from line 632 and line 757 of
> safecode/lib/ArrayBoundsChecks/ArrayBoundCheck.cpp
>
> I must be doing something wrong, what is it?

The problem is simple: you're looking at the wrong source file.
:)

More specifically, you're looking at the very antiquated static array
bounds checking pass (it hasn't compiled in several years now). The file
you want to look at is lib/InsertPoolChecks/insert.cpp. This file
contains the InsertPoolChecks pass which, in mainline SAFECode, is
responsible for inserting array bounds checks and indirect function call
checks. In particular, you want to look at the addGetElementPtrChecks()
method.

As for constant expression GEPs, you want to look at the
BreakConstantGEPs pass, located in
lib/ArrayBoundsChecks/BreakConstantGEPs.cpp.

The BreakConstantGEPs pass is run first. All it does is find
instructions that use constant expression GEPs and replace each constant
expression GEP with a GEP instruction. All of the other SAFECode passes
that worry about array bounds checks (i.e., the static array bounds
checking passes in lib/ArrayBoundsChecks and the run-time
instrumentation pass in lib/InsertPoolChecks/insert.cpp) only look for
GEP instructions.

-- John T.

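To illustrate what "breaking" a constant expression GEP means, here is a minimal sketch on a toy IR in plain C++. It is not SAFECode's BreakConstantGEPs implementation and it does not use LLVM: the structs, the Block alias, and the breakConstantGEPs function are hypothetical names chosen for illustration, and the toy GEP instruction carries no operands of its own. The idea it shows is the one described above: for each instruction that uses a constant expression GEP, create a GEP instruction immediately before the user and make the user refer to it instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy IR, NOT LLVM: just enough structure to show the rewrite.
enum class Opcode { Load, GetElementPtr };

struct Value {
  virtual ~Value() = default;
};

struct Constant : Value {};

struct ConstantExpr : Constant {
  Opcode op;
  explicit ConstantExpr(Opcode o) : op(o) {}
};

struct Instruction : Value {
  Opcode op;
  std::vector<std::shared_ptr<Value>> operands;
  explicit Instruction(Opcode o) : op(o) {}
};

// A basic block modelled as an ordered list of instructions.
using Block = std::vector<std::shared_ptr<Instruction>>;

// Replace every constant-expression GEP operand with a new GEP instruction
// inserted directly before the instruction that uses it.
// Returns the number of operands rewritten.
std::size_t breakConstantGEPs(Block &block) {
  std::size_t rewritten = 0;
  for (std::size_t i = 0; i < block.size(); ++i) {
    // Hold our own reference: inserting below may reallocate the vector.
    std::shared_ptr<Instruction> user = block[i];
    assert(user && "null instruction in block");
    for (auto &operand : user->operands) {
      auto ce = std::dynamic_pointer_cast<ConstantExpr>(operand);
      if (!ce || ce->op != Opcode::GetElementPtr)
        continue;
      auto gep = std::make_shared<Instruction>(Opcode::GetElementPtr);
      block.insert(block.begin() + static_cast<std::ptrdiff_t>(i), gep);
      ++i;            // the user has moved one slot to the right
      operand = gep;  // the user now refers to the instruction
      ++rewritten;
    }
  }
  return rewritten;
}
```

After the rewrite, a later pass only has to look for instructions whose opcode is GetElementPtr; a second run over the same block finds nothing left to convert.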
John,

I have looked at the real code (instead of the obsolete one) and it
appears to be easy to find out whether an operand is a getelementptr
constant expression:

    if (ConstantExpr *CE = dyn_cast<ConstantExpr>(I.getOperand(0))) {
      Out << "*** operand 0 is a constant Expr******";
      if (CE->getOpcode() == Instruction::GetElementPtr) {
        Out << "*** operand 0 is a gep constant expression ********";
        // Operand 0 of the GEP is the base pointer; take its pointee type.
        if (const ArrayType *ar = dyn_cast<ArrayType>(
                cast<PointerType>(CE->getOperand(0)->getType())->getElementType()))
          hi = ar->getNumElements();
      }
    }

Thank you for that.

I would like to use the SAFECode passes rather than write my own code.
However, the SAFECode website says that it works only with version 2.6
or 2.7 of LLVM, whereas I use version 2.8. To get around the problem, I
plan to do as follows:

(1) Do not install SAFECode with LLVM 2.8 (as it may or may not work).

(2) Create a new pass named "unGep", with the description "Breaks
    Constant GEPs".

(3) Derive the new pass from FunctionPass (because SAFECode does so; if
    I had to write it myself, ModulePass would have been good enough).

(4) Have the runOnFunction method of the unGep pass invoke
    addPoolChecks(F), passing it the function F. I will modify
    addGetElementPtrChecks so that it produces array bounds checks in
    the way I need. (For my research on detecting overflows, I need a
    check that reports when array bounds are being violated.)

I will then run opt as

    opt -load ../unGep.so

to produce LLVM code without GEPs as operands.

Please advise if this will work or if there is an easier way.

Thanks.

Surinder Kumar Jain