Wang Nan via llvm-dev
2015-Aug-14 10:05 UTC
[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic
This is for BPF output. A BPF program outputs bytes to perf through a
tracepoint. To decode such data, we need a way to describe the format
of the buffer. This patch is an attempt that gives each structure type a
unique number by introducing a new intrinsic, 'llvm.typeid.for'.

At the bottom is an example of using that intrinsic and the result
of
  $ clang -target bpf -O2 -c -S ./test_typeid.c

The newly introduced intrinsic has a limitation: I can't find a way to
make it accept all types without name mangling. Therefore, we have to
define a different intrinsic for each type. In the example below, a
macro trick defines llvm.typeid.for.p0struct.mystr and
llvm.typeid.for.p0struct.mystr2, along with the corresponding output
functions.

Another problem is that I'm still unable to find a way to insert DWARF
information at this stage. After clang, the debug information is already
isolated, and its entries are linked together. Adjusting it requires
creating new metadata and new debug info entries, linking them properly,
and inserting them into the correct place. That is possible, but it
makes the code ugly.

Because of these two problems, I decided to try a clang builtin
again. I think that should be the last try. If it still doesn't work,
I'd like to stop working on this until I have a better idea (the BCC
rewriter could be a reasonable solution). Let the patch series
'Make eBPF programs output data to perf' be merged upstream
without the 'typeid' change. Until the decoding problem is solved,
users have to decode the BPF output manually or with a
perf script or babeltrace script.

Thank you.
----------------- EXAMPLE -----------------
extern void output(int id, void *ptr, int size);
#define OUTPUT_STR(name) \
struct name {
#define OUTPUT_STR_END(name) \
}; \
unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
static inline void output_##name(struct name *str) \
{\
	output(__get_typeid_##name(str), str, sizeof(struct name));\
};\
static struct name __g_##name;

OUTPUT_STR(mystr)
	int x;
	int y;
	int z;
OUTPUT_STR_END(mystr);

OUTPUT_STR(mystr2)
	int x;
	int y;
OUTPUT_STR_END(mystr2);
--------------- RESULT -------------
int func(void)
{
	int x = 123;
	struct mystr myvar;
	struct mystr2 myvar2;
	output_mystr(&myvar);
	output_mystr2(&myvar2);
	output_mystr(&myvar);
	return 0;
}

int func2(void)
{
	int x = 123;
	struct mystr myvar;
	struct mystr2 myvar2;
	output_mystr2(&myvar2);
	output_mystr(&myvar);
	output_mystr2(&myvar2);
	return 0;
}

	.text
	.globl	func
	.align	8
func:                                   # @func
# BB#0:                                 # %entry
	mov	r6, r10
	addi	r6, -16
	mov	r1, 1
	mov	r2, r6
	mov	r3, 12
	call	output
	mov	r2, r10
	addi	r2, -24
	mov	r1, 2
	mov	r3, 8
	call	output
	mov	r1, 1
	mov	r2, r6
	mov	r3, 12
	call	output
	mov	r0, 0
	ret

	.globl	func2
	.align	8
func2:                                  # @func2
# BB#0:                                 # %entry
	mov	r6, r10
	addi	r6, -24
	mov	r1, 2
	mov	r2, r6
	mov	r3, 8
	call	output
	mov	r2, r10
	addi	r2, -16
	mov	r1, 1
	mov	r3, 12
	call	output
	mov	r1, 2
	mov	r2, r6
	mov	r3, 8
	call	output
	mov	r0, 0
	ret

Signed-off-by: Wang Nan <wangnan0 at huawei.com>
---
 include/llvm/IR/Intrinsics.td                    |  1 +
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | 27 ++++++++++++++++++++++++
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h   |  1 +
 3 files changed, 29 insertions(+)

diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
index 83cfebe..8ebeb24 100644
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -640,6 +640,7 @@ def int_masked_scatter: Intrinsic<[],
 def int_bitset_test : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
                                 [IntrNoMem]>;
+def int_typeid_for : Intrinsic<[llvm_i64_ty], [llvm_any_ty], [IntrNoMem]>;

 //===----------------------------------------------------------------------===//
 // Target-specific intrinsics
 //===----------------------------------------------------------------------===//
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ce4912f..ff453cd 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5077,6 +5077,10 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
     setValue(&I, N);
     return nullptr;
   }
+  case Intrinsic::typeid_for: {
+    visitTypeidfor(I);
+    return nullptr;
+  }
   }
 }

@@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
   FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
 }

+void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
+  SDValue Res;
+  static std::vector<const StructType *> StructTypes;
+  int ID = -1;
+  Value *PtrArg = CI.getArgOperand(0);
+  PointerType *PTy = cast<PointerType>(PtrArg->getType());
+  if (PTy) {
+    StructType *STy = cast<StructType>(PTy->getElementType());
+    if (STy) {
+      for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
+        if (StructTypes[i] == STy)
+          ID = i + 1;
+      if (ID == -1) {
+        StructTypes.push_back(STy);
+        ID = StructTypes.size();
+      }
+    }
+  }
+
+  Res = DAG.getConstant(ID, getCurSDLoc(), MVT::i32);
+  setValue(&CI, Res);
+}
+
 /// Returns an AttributeSet representing the attributes applied to the return
 /// value of the given call.
 static AttributeSet getReturnAttrs(TargetLowering::CallLoweringInfo &CLI) {
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index f71190d..f037689 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -855,6 +855,7 @@ private:
   void visitStatepoint(const CallInst &I);
   void visitGCRelocate(const CallInst &I);
   void visitGCResult(const CallInst &I);
+  void visitTypeidfor(const CallInst &I);

   void visitUserOp1(const Instruction &I) {
     llvm_unreachable("UserOp1 should not exist at instruction selection time!");
-- 
1.8.3.4
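The ID assignment done by visitTypeidfor can be modelled outside LLVM. The following is a minimal C sketch, not the patch itself: it assumes types are identified by name strings (the patch compares StructType pointers), and the names type_id_for, seen_types and MAX_TYPES are hypothetical. It illustrates why the first struct type seen gets ID 1 and the second gets ID 2, which matches the constants loaded into r1 in the assembly above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16 /* hypothetical capacity for this sketch */

/* Names of the types seen so far; index i holds the type with ID i + 1. */
static const char *seen_types[MAX_TYPES];
static size_t num_seen;

/*
 * Return the ID for a type name, assigning the next free ID the first
 * time a name is seen. Returns -1 if the table is full.
 */
int type_id_for(const char *type_name)
{
    for (size_t i = 0; i < num_seen; ++i)
        if (strcmp(seen_types[i], type_name) == 0)
            return (int)i + 1;

    if (num_seen == MAX_TYPES)
        return -1;

    seen_types[num_seen++] = type_name;
    return (int)num_seen;
}
```

In this model, as in the patch, an ID depends only on the order in which types are first encountered, so the number by itself says nothing about the layout of the type it stands for.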
Alexei Starovoitov via llvm-dev
2015-Aug-16 23:40 UTC
[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic
On Fri, Aug 14, 2015 at 10:05:12AM +0000, Wang Nan via llvm-dev wrote:
> This is for BPF output. BPF program output bytes to perf through a
> tracepoint. For decoding such data, we need a way to describe the format
> of the buffer. This patch is a try which gives each variable a unique
> number by introducing a new intrinsic 'llvm.typeid.for'.
>
> At the bottom is an example of using that intrinsic and the result
> of
>   $ clang -target bpf -O2 -c -S ./test_typeid.c
>
> There is a limitation of the newly introduced intrinsic that, I can't
> find a way to make the intrinsic to accept all types without name
> mangling. Therefore, we have to define different intrinsics for
> different type. See the example below, by using macro trick, we define
> llvm.typeid.for.p0struct.mystr and llvm.typeid.for.p0struct.mystr2, and
> also the different output functions.
>
> Another problem is that I'm still unable to find a way to insert dwarf
> information in this stage. After clang, debug information are already
> isolated, and debug information entries are linked together. Adjusting
> debug information requires me to create new metadata and new debug info
> entries, link them properly then insert into correct place. Which is
> possible, but makes code ugly.
>
> Because of the above two problems, I decided to try clang builtin
> again. I think that should be the last try. If still not work, then
> I'd like to stop working on it until I have any better idea (BCC
> rewriter should be a considerable solution). Let patch series
> 'Make eBPF programs output data to perf' be merged into upstream
> without the 'typeid' change. Before the decoding problem solved, we
> have to let user decode the BPF output themself manually or use
> perf script or babeltrace script.
>
> Thank you.
>
> @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
>    FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
>  }
>
> +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
> +  SDValue Res;
> +  static std::vector<const StructType *> StructTypes;

'static' is obviously a short-term hack for illustration purposes, right?

> +  int ID = -1;
> +  Value *PtrArg = CI.getArgOperand(0);
> +  PointerType *PTy = cast<PointerType>(PtrArg->getType());
> +  if (PTy) {
> +    StructType *STy = cast<StructType>(PTy->getElementType());
> +    if (STy) {
> +      for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
> +        if (StructTypes[i] == STy)
> +          ID = i + 1;
> +      if (ID == -1) {
> +        StructTypes.push_back(STy);
> +        ID = StructTypes.size();
> +      }
> +    }
> +  }

> unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \

The macro hack and the loop are quite ugly.
Also, how do you plan to correlate such an ID with the DWARF info?
Instead of StructType we need to look up DICompositeType,
but it looks like there is no clear connection between the call
arguments and the metadata provided by clang.
Maybe it would indeed be easier to add a clang intrinsic
that adds the metadata number as an explicit constant.

I didn't really have time to explore this problem in depth.
Maybe we can write a clear problem statement, and someone
on the llvm list who is familiar with debug info can help design
a solution.
Let me state what I think we're trying to do.
For the program:
void foo(void * ptr);
void bar(...)
{
  struct S s;
  ...
  foo(&s);
}
We want to be able to scan the .o file and, for the call site of
foo, find the ID of the DICompositeType by looking at the binary
code of the .o, so that we can look up this ID in the
DWARF info (which is also part of the .o) and figure out the layout
of the struct passed into the function foo.
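The consumer side of this problem statement can be sketched in C. This is only an illustration under stated assumptions: the layout table is filled in by hand here, whereas in the scheme described above it would be derived from the DWARF info in the .o file; struct_layout, layout_table and find_layout are hypothetical names; and the IDs and sizes are taken from the 12-byte and 8-byte structs in the original example.

```c
#include <stddef.h>

/* Hypothetical description of one struct type, as a decoder might
 * build it from debug info. */
struct struct_layout {
    int id;            /* ID found at the call site */
    const char *name;  /* struct tag */
    size_t size;       /* size of the struct in bytes */
    size_t num_fields; /* number of int members */
};

/* Hand-written stand-in for data that would come from DWARF. */
static const struct struct_layout layout_table[] = {
    { 1, "mystr",  12, 3 },
    { 2, "mystr2",  8, 2 },
};

/* Find the layout for an ID, or return NULL if the ID is unknown. */
const struct struct_layout *find_layout(int id)
{
    size_t n = sizeof(layout_table) / sizeof(layout_table[0]);

    for (size_t i = 0; i < n; ++i)
        if (layout_table[i].id == id)
            return &layout_table[i];
    return NULL;
}
```

The open question in the thread is how to fill such a table automatically, which requires the ID visible in the generated code to be recoverable from the debug info as well.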
Wangnan (F) via llvm-dev
2015-Aug-17 01:54 UTC
[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic
On 2015/8/17 7:40, Alexei Starovoitov wrote:
> On Fri, Aug 14, 2015 at 10:05:12AM +0000, Wang Nan via llvm-dev wrote:
>> This is for BPF output. BPF program output bytes to perf through a
>> tracepoint. For decoding such data, we need a way to describe the format
>> of the buffer. This patch is a try which gives each variable a unique
>> number by introducing a new intrinsic 'llvm.typeid.for'.
>>
>> At the bottom is an example of using that intrinsic and the result
>> of
>>   $ clang -target bpf -O2 -c -S ./test_typeid.c
>>
>> There is a limitation of the newly introduced intrinsic that, I can't
>> find a way to make the intrinsic to accept all types without name
>> mangling. Therefore, we have to define different intrinsics for
>> different type. See the example below, by using macro trick, we define
>> llvm.typeid.for.p0struct.mystr and llvm.typeid.for.p0struct.mystr2, and
>> also the different output functions.
>>
>> Another problem is that I'm still unable to find a way to insert dwarf
>> information in this stage. After clang, debug information are already
>> isolated, and debug information entries are linked together. Adjusting
>> debug information requires me to create new metadata and new debug info
>> entries, link them properly then insert into correct place. Which is
>> possible, but makes code ugly.
>>
>> Because of the above two problems, I decided to try clang builtin
>> again. I think that should be the last try. If still not work, then
>> I'd like to stop working on it until I have any better idea (BCC
>> rewriter should be a considerable solution). Let patch series
>> 'Make eBPF programs output data to perf' be merged into upstream
>> without the 'typeid' change. Before the decoding problem solved, we
>> have to let user decode the BPF output themself manually or use
>> perf script or babeltrace script.
>>
>> Thank you.
>>
>> @@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
>>    FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
>>  }
>>
>> +void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
>> +  SDValue Res;
>> +  static std::vector<const StructType *> StructTypes;
> 'static' is obviously short term hack for illustration purpose, right?

Of course. Actually, I don't like this solution. Please see my commit
message.

>
>> +  int ID = -1;
>> +  Value *PtrArg = CI.getArgOperand(0);
>> +  PointerType *PTy = cast<PointerType>(PtrArg->getType());
>> +  if (PTy) {
>> +    StructType *STy = cast<StructType>(PTy->getElementType());
>> +    if (STy) {
>> +      for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
>> +        if (StructTypes[i] == STy)
>> +          ID = i + 1;
>> +      if (ID == -1) {
>> +        StructTypes.push_back(STy);
>> +        ID = StructTypes.size();
>> +      }
>> +    }
>> +  }
>> unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
> the macro hack and the loop are quite ugly.

Agreed. This is a hard limitation if we implement this as an LLVM
intrinsic. In clang we can use varargs instead:

BUILTIN(__builtin_bpf_typeid, "Wi.", "nc")

> Also how do you plane to correlate such ID to dwarf info?
> Instead of StructType we need to lookup DICompositeType,
> but looks like there is no clear connection between call
> arguments to metadata provided by clang.

Not sure. I'd like to try a clang intrinsic again.

> May be indeed it would be easier to add clang intrinsic
> that will add metadata number as explicit constant.
>
> I didn't really have time to explore this problem in depth.
> May be we can make the clear problem statement and someone
> on llvm list that familiar with debug info can help design
> a solution.
> Let me state what I think we're trying to do.
> For the program:
> void foo(void * ptr);
> void bar(...)
> {
>   struct S s;
>   ...
>   foo(&s);
> }
> We want to be able to scan .o file and for the callsite of
> foo, we want to be able to find an id of DICompositeType
> looking at binary code of .o, so we can lookup this id in
> dwarf info (that is also part of .o) and figure out the layout
> of the struct passed into the function foo.
>

Yes. I think that if we can generate a program like this, the problem
is solved:

struct structure1 {
	int ID;
	int x;
	int y;
};

struct structure2 {
	int ID;
	int a;
	int b;
};

enum bpf_types {
	BPF_TYPE_structure1 = 1,
	BPF_TYPE_structure2 = 2,
};

int func(void)
{
	struct structure1 var1;
	struct structure2 var2;

	var1.ID = BPF_TYPE_structure1;
	var2.ID = BPF_TYPE_structure2;
	foo(&var1);
	foo(&var2);
	return 0;
}

The key is the enum type. The values of BPF_TYPE_structure{1,2} will be
recorded in the DWARF info like this:

 <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
    <2b>   DW_AT_name        : (indirect string, offset: 0xf4): bpf_types
    <2f>   DW_AT_byte_size   : 4
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 12
 <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
    <33>   DW_AT_name        : (indirect string, offset: 0xcc): BPF_TYPE_structure1
    <37>   DW_AT_const_value : 1
 <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
    <39>   DW_AT_name        : (indirect string, offset: 0xe0): BPF_TYPE_structure2
    <3d>   DW_AT_const_value : 2

So we can connect the ID field and the type through them. DW_AT_const_value
can also be used by const variables, so maybe the enum can be replaced.

Thank you.
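A decoder for records laid out this way might look like the following C sketch. It assumes every record starts with an int ID field, as in the structures above. The function decode_record and its return convention are hypothetical, and the switch is written by hand where a real tool would use the enumerator values read from the DWARF info.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct structure1 { int ID; int x; int y; };
struct structure2 { int ID; int a; int b; };

enum bpf_types {
    BPF_TYPE_structure1 = 1,
    BPF_TYPE_structure2 = 2,
};

/*
 * Decode one record from a raw buffer and format it into out.
 * Returns the record's ID, or -1 if the buffer is too short or the
 * ID is unknown.
 */
int decode_record(const void *buf, size_t len, char *out, size_t out_len)
{
    int id;

    if (len < sizeof(id))
        return -1;
    memcpy(&id, buf, sizeof(id)); /* the ID is always the first field */

    switch (id) {
    case BPF_TYPE_structure1: {
        struct structure1 s;

        if (len < sizeof(s))
            return -1;
        memcpy(&s, buf, sizeof(s));
        snprintf(out, out_len, "structure1 x=%d y=%d", s.x, s.y);
        return id;
    }
    case BPF_TYPE_structure2: {
        struct structure2 s;

        if (len < sizeof(s))
            return -1;
        memcpy(&s, buf, sizeof(s));
        snprintf(out, out_len, "structure2 a=%d b=%d", s.a, s.b);
        return id;
    }
    default:
        return -1;
    }
}
```

Because the ID travels inside the record, the decoder needs no information about the call site; it only needs the mapping from ID values to struct types, which is what the enumerators in the DWARF dump provide.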