Wangnan (F) via llvm-dev
2015-Aug-12 05:28 UTC
[llvm-dev] llvm bpf debug info. Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event
On 2015/8/12 12:57, Alexei Starovoitov wrote:
> On Wed, Aug 12, 2015 at 10:34:43AM +0800, Wangnan (F) via llvm-dev wrote:
>> Think about a program like this:
>>
>> struct strA { int a; }
>> struct strB { int b; }
>> int func() {
>>     struct strA a;
>>     struct strB b;
>>
>>     a.a = 1;
>>     b.b = 2;
>>     bpf_output(gettype(a), &a);
>>     bpf_output(gettype(b), &b);
>>     return 0;
>> }
>>
>> BPF backend can't (and needn't) tell the difference between local
>> variables a and b in theory. In LLVM implementation, it filters type
>> information out using ComputeValueVTs(). Please have a look at
>> SelectionDAGBuilder::visitIntrinsicCall in
>> lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp and
>> SelectionDAGBuilder::visitTargetIntrinsic in the same file. in
>> visitTargetIntrinsic, ComputeValueVTs acts as a barrier which strips
>> type information out from CallInst ("I"), and leave SDValue and SDVTList
>> ("Ops" and "VTs") to target code. SDValue and SDVTList are wrappers of
>> EVT and MVT, all information we concern won't be passed here.
>>
>> I think now we have 2 choices:
>>
>> 1. Hacking into clang, implement target specific builtin function. Now I
>>    have worked out a ugly but workable patch which setup a builtin function:
>>    __builtin_bpf_typeid(), which accepts local or global variable then
>>    returns different constant for different types.
>>
>> 2. Implementing an LLVM intrinsic call (llvm.typeid), make it be processed
>>    in visitIntrinsicCall(). I think we can get something useful if it is
>>    processed with that function.
> Yeah. You're right about pure target intrinsics.
> I think llvm.typeid might work. imo it's cleaner than
> doing it at clang level.
>
>> The next thing should be generating debug information to map type and
>> constants which issued by __builtin_bpf_typeid() or llvm.typeid. Now we
>> have a crazy idea that, if we limit the name of the structure to 8 bytes,
>> we can insert the name into a u64, then there would be no need to consider
>> type information in DWARF. For example, in the above sample code, gettype(a)
>> will issue 0x0000000041727473 because its type is "strA". What do you think?
> that's way too hacky.
> I was thinking when compiling we can keep llvm ir along with .o
> instead of dwarf and extract type info from there.
> dwarf has names and other things that we don't need. We only
> care about actual field layout of the structs.
> But it probably won't be easy to parse llvm ir on perf side
> instead of dwarf.

Shipping both LLVM IR and .o to perf makes it harder to use. I'm not
sure whether it is a good idea. If we are unable to encode the structure
using a u64, let's still dig into dwarf.

We have another idea: we can utilize an existing dwarf feature. For
example, when __builtin_bpf_typeid() gets called, define an enumeration
type in the dwarf info, so you'll find:

 <1><2a>: Abbrev Number: 2 (DW_TAG_enumeration_type)
    <2b>   DW_AT_name        : (indirect string, offset: 0xec): TYPEINFO
    <2f>   DW_AT_byte_size   : 4
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 3
 <2><32>: Abbrev Number: 3 (DW_TAG_enumerator)
    <33>   DW_AT_name        : (indirect string, offset: 0xcc): __typeinfo_strA
    <37>   DW_AT_const_value : 2
 <2><38>: Abbrev Number: 3 (DW_TAG_enumerator)
    <39>   DW_AT_name        : (indirect string, offset: 0xdc): __typeinfo_strB
    <3d>   DW_AT_const_value : 3

or this:

 <3><54>: Abbrev Number: 4 (DW_TAG_variable)
    <55>   DW_AT_const_value : 2
    <66>   DW_AT_name        : (indirect string, offset: 0x1e): __typeinfo_strA
    <6a>   DW_AT_decl_file   : 1
    <6b>   DW_AT_decl_line   : 29
    <6c>   DW_AT_type        : <0x72>

then from DW_AT_name and DW_AT_const_value we can do the mapping.
The drawback is that all __typeinfo_ prefixed names become reserved.

> btw, if you haven't looked at iovisor/bcc, there we're solving
> similar problem differently. There we use clang rewriter, so all
> structs fields are visible at this level, then we use bpf backend
> in JIT mode and push bpf instructions into the kernel on the fly
> completely skipping ELF and .o
> For example in:
> https://github.com/iovisor/bcc/blob/master/examples/distributed_bridge/tunnel.c
> when you see
> struct ethernet_t {
>   unsigned long long dst:48;
>   unsigned long long src:48;
>   unsigned int type:16;
> } BPF_PACKET_HEADER;
> struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
> ... ethernet->src ...
> is recognized by clang rewriter and ->src is converted to a different
> C code that is sent again into clang.
> So there is no need to use dwarf or patch clang/llvm. clang rewriter
> has all the info.

Could you please give us further information about your clang rewriter?
I guess you need a new .so when injecting that code into the kernel?

> I'm not sure you can live with clang/llvm on the host where you
> want to run the tracing bits, but if you can that's an easier option.
>

I'm not sure. Our target platform should be embedded devices like
smartphones. Bringing a full clang/llvm environment there is not
acceptable.

Thank you.
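To make the u64 idea discussed above concrete, here is a minimal sketch in C. It assumes the name is packed little-endian (first character in the lowest byte), which is what the value 0x0000000041727473 for "strA" implies. The helpers pack_type_name() and unpack_type_name() are hypothetical names used for illustration only; they are not part of any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 bytes of a type name into a u64,
 * with the first character in the least significant byte.
 * Names longer than 8 bytes are silently truncated. */
static uint64_t pack_type_name(const char *name)
{
	uint64_t id = 0;
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len > 8)
		len = 8;
	for (size_t i = 0; i < len; i++)
		id |= (uint64_t)(unsigned char)name[i] << (8 * i);
	return id;
}

/* Hypothetical inverse: recover the name from the packed value.
 * The output buffer must hold 9 bytes (8 characters plus NUL). */
static void unpack_type_name(uint64_t id, char out[9])
{
	for (int i = 0; i < 8; i++)
		out[i] = (char)((id >> (8 * i)) & 0xff);
	out[8] = '\0';
}
```

With this encoding, pack_type_name("strA") yields 0x0000000041727473, and unpack_type_name() turns that value back into "strA". The 8-byte limit also shows why the scheme is fragile: two structure names that share their first 8 characters would collide.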
Brenden Blanco via llvm-dev
2015-Aug-12 13:15 UTC
[llvm-dev] llvm bpf debug info. Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event
Hi Wangnan, I've been authoring the BCC development, so I'll answer
those specific questions.

> Could you please give us further information about your clang rewriter?
> I guess you need a new .so when injecting those code into kernel?

The rewriter runs all of its passes in a single process, creating no
files on disk and having no external dependencies in terms of
toolchain.
1. Entry point: bpf_module_create() - C API call to create module, can
   take filename or directly a c string with the full contents of the
   program
2. Convert contents into a clang memory buffer
3. Set up a clang driver::CompilerInvocation in the style of the clang
   interpreter example
4. Run a rewriter pass over the memory buffer file, annotating and/or
   doing BPF specific magic on the input source
   a. Open BPF maps with a call to bpf_create_map directly
   b. Convert references to map operations with the specific FD of the new map
   c. Convert arguments to bpf_probe_read calls as needed
   d. Collect the externed function names to avoid section() hack in the language
5. Re-run the CompilerInvocation on the modified sources
6. JIT the llvm::Module to bpf arch
7. Load the resulting in-memory ".o" to bpf_prog_load, keeping the FD
   alive in the compiler process
8. Attach the FD as necessary to perf events, socket, tc, etc.
9. goto 1

The above steps are captured in the BCC github repo in src/cc, with
the clang specific bits inside of the frontends/clang subdirectory.

> I'm not sure. Our target platform should be embedded devices like
> smartphone.
> Bringing full clang/llvm environment there is not acceptable.

The artifact from the build process of BCC is a shared library, which
has the clang/llvm .a embedded within them. It is not yet a single
binary, but not unfeasible to make it so. The clang toolchain itself
does not need to exist on the target. I have not attempted to
cross-compile BCC to any architecture, currently x86_64 only.

If you have more BCC specific questions not involving clang/llvm,
perhaps you can ping Alexei/myself off of the llvm-dev list, in case
this discussion is not relevant to them.
Wangnan (F) via llvm-dev
2015-Aug-13 06:24 UTC
[llvm-dev] llvm bpf debug info. Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event
Thank you for your reply. Add He Kuang to CC list.

On 2015/8/12 21:15, Brenden Blanco wrote:
> Hi Wangnan, I've been authoring the BCC development, so I'll answer
> those specific questions.
>>
>> Could you please give us further information about your clang rewriter?
>> I guess you need a new .so when injecting those code into kernel?
> The rewriter runs all of its passes in a single process, creating no
> files on disk and having no external dependencies in terms of
> toolchain.
> 1. Entry point: bpf_module_create() - C API call to create module, can
>    take filename or directly a c string with the full contents of the
>    program
> 2. Convert contents into a clang memory buffer
> 3. Set up a clang driver::CompilerInvocation in the style of the clang
>    interpreter example
> 4. Run a rewriter pass over the memory buffer file, annotating and/or
>    doing BPF specific magic on the input source
>    a. Open BPF maps with a call to bpf_create_map directly
>    b. Convert references to map operations with the specific FD of the new map
>    c. Convert arguments to bpf_probe_read calls as needed
>    d. Collect the externed function names to avoid section() hack in the language
> 5. Re-run the CompilerInvocation on the modified sources
> 6. JIT the llvm::Module to bpf arch
> 7. Load the resulting in-memory ".o" to bpf_prog_load, keeping the FD
>    alive in the compiler process
> 8. Attach the FD as necessary to perf events, socket, tc, etc.
> 9. goto 1
>
> The above steps are captured in the BCC github repo in src/cc, with
> the clang specific bits inside of the frontends/clang subdirectory.
>
>> I'm not sure. Our target platform should be embedded devices like
>> smartphone.
>> Bringing full clang/llvm environment there is not acceptable.
> The artifact from the build process of BCC is a shared library, which
> has the clang/llvm .a embedded within them. It is not yet a single
> binary, but not unfeasible to make it so. The clang toolchain itself
> does not need to exist on the target. I have not attempted to
> cross-compile BCC to any architecture, currently x86_64 only.
>
> If you have more BCC specific questions not involving clang/llvm,
> perhaps you can ping Alexei/myself off of the llvm-dev list, in case
> this discussion is not relevant to them.
Wang Nan via llvm-dev
2015-Aug-14 10:05 UTC
[llvm-dev] [LLVM RFC] Add llvm.typeid.for intrinsic
This is for BPF output. BPF programs output bytes to perf through a
tracepoint. To decode such data, we need a way to describe the format
of the buffer.

This patch is an attempt that gives each structure type a unique number
by introducing a new intrinsic, 'llvm.typeid.for'.

At the bottom is an example of using that intrinsic and the result of

 $ clang -target bpf -O2 -c -S ./test_typeid.c

The newly introduced intrinsic has a limitation: I can't find a way to
make it accept all types without name mangling. Therefore, we have to
define a different intrinsic for each type. See the example below: by
using a macro trick, we define llvm.typeid.for.p0struct.mystr and
llvm.typeid.for.p0struct.mystr2, along with the corresponding output
functions.

Another problem is that I'm still unable to find a way to insert dwarf
information at this stage. After clang, debug information is already
isolated, and debug information entries are linked together. Adjusting
debug information requires me to create new metadata and new debug info
entries, link them properly, then insert them into the correct place.
That is possible, but makes the code ugly.

Because of the above two problems, I decided to try a clang builtin
again. I think that should be the last try. If it still does not work,
I'd like to stop working on it until I have a better idea (the BCC
rewriter should be a considerable solution). Let the patch series
'Make eBPF programs output data to perf' be merged upstream without the
'typeid' change. Until the decoding problem is solved, users have to
decode the BPF output manually themselves, or use a perf script or a
babeltrace script.

Thank you.
----------------- EXAMPLE -----------------
extern void output(int id, void *ptr, int size);
#define OUTPUT_STR(name) \
struct name {
#define OUTPUT_STR_END(name) \
}; \
unsigned long long __get_typeid_##name(struct name *) asm ("llvm.typeid.for.p0struct."#name); \
static inline void output_##name(struct name *str) \
{\
	output(__get_typeid_##name(str), str, sizeof(struct name));\
};\
static struct name __g_##name;

OUTPUT_STR(mystr)
	int x;
	int y;
	int z;
OUTPUT_STR_END(mystr);

OUTPUT_STR(mystr2)
	int x;
	int y;
OUTPUT_STR_END(mystr2);

--------------- RESULT -------------

int func(void)
{
	int x = 123;
	struct mystr myvar;
	struct mystr2 myvar2;
	output_mystr(&myvar);
	output_mystr2(&myvar2);
	output_mystr(&myvar);
	return 0;
}

int func2(void)
{
	int x = 123;
	struct mystr myvar;
	struct mystr2 myvar2;
	output_mystr2(&myvar2);
	output_mystr(&myvar);
	output_mystr2(&myvar2);
	return 0;
}

	.text
	.globl	func
	.align	8
func:                                   # @func
# BB#0:                                 # %entry
	mov	r6, r10
	addi	r6, -16
	mov	r1, 1
	mov	r2, r6
	mov	r3, 12
	call	output
	mov	r2, r10
	addi	r2, -24
	mov	r1, 2
	mov	r3, 8
	call	output
	mov	r1, 1
	mov	r2, r6
	mov	r3, 12
	call	output
	mov	r0, 0
	ret

	.globl	func2
	.align	8
func2:                                  # @func2
# BB#0:                                 # %entry
	mov	r6, r10
	addi	r6, -24
	mov	r1, 2
	mov	r2, r6
	mov	r3, 8
	call	output
	mov	r2, r10
	addi	r2, -16
	mov	r1, 1
	mov	r3, 12
	call	output
	mov	r1, 2
	mov	r2, r6
	mov	r3, 8
	call	output
	mov	r0, 0
	ret

Signed-off-by: Wang Nan <wangnan0 at huawei.com>
---
 include/llvm/IR/Intrinsics.td                    |  1 +
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | 27 ++++++++++++++++++++++++
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h   |  1 +
 3 files changed, 29 insertions(+)

diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
index 83cfebe..8ebeb24 100644
--- a/include/llvm/IR/Intrinsics.td
+++ b/include/llvm/IR/Intrinsics.td
@@ -640,6 +640,7 @@ def int_masked_scatter: Intrinsic<[],
 def int_bitset_test : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
                                 [IntrNoMem]>;
 
+def int_typeid_for : Intrinsic<[llvm_i64_ty], [llvm_any_ty], [IntrNoMem]>;
 
 //===----------------------------------------------------------------------===//
 // Target-specific intrinsics
 //===----------------------------------------------------------------------===//
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ce4912f..ff453cd 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5077,6 +5077,10 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
     setValue(&I, N);
     return nullptr;
   }
+  case Intrinsic::typeid_for: {
+    visitTypeidfor(I);
+    return nullptr;
+  }
   }
 }
 
@@ -6769,6 +6773,29 @@ void SelectionDAGBuilder::visitPatchpoint(ImmutableCallSite CS,
   FuncInfo.MF->getFrameInfo()->setHasPatchPoint();
 }
 
+void SelectionDAGBuilder::visitTypeidfor(const CallInst &CI) {
+  SDValue Res;
+  static std::vector<const StructType *> StructTypes;
+  int ID = -1;
+  Value *PtrArg = CI.getArgOperand(0);
+  PointerType *PTy = cast<PointerType>(PtrArg->getType());
+  if (PTy) {
+    StructType *STy = cast<StructType>(PTy->getElementType());
+    if (STy) {
+      for (unsigned i = 0, N = StructTypes.size(); i != N; ++i)
+        if (StructTypes[i] == STy)
+          ID = i + 1;
+      if (ID == -1) {
+        StructTypes.push_back(STy);
+        ID = StructTypes.size();
+      }
+    }
+  }
+
+  Res = DAG.getConstant(ID, getCurSDLoc(), MVT::i32);
+  setValue(&CI, Res);
+}
+
 /// Returns an AttributeSet representing the attributes applied to the return
 /// value of the given call.
 static AttributeSet getReturnAttrs(TargetLowering::CallLoweringInfo &CLI) {
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index f71190d..f037689 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -855,6 +855,7 @@ private:
   void visitStatepoint(const CallInst &I);
   void visitGCRelocate(const CallInst &I);
   void visitGCResult(const CallInst &I);
+  void visitTypeidfor(const CallInst &I);
 
   void visitUserOp1(const Instruction &I) {
     llvm_unreachable("UserOp1 should not exist at instruction selection time!");
-- 
1.8.3.4
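The ID assignment performed by visitTypeidfor() in the patch can be illustrated outside LLVM with a small C sketch. This is only an approximation under stated assumptions: the patch keys its table on StructType pointers, whereas this sketch keys on type-name strings so that it can be compiled standalone. The names type_registry, typeid_for() and MAX_TYPES are hypothetical, and the fixed table size is a simplification the patch does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 64	/* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for the patch's static vector of StructType
 * pointers: here a type is identified by its name. */
struct type_registry {
	const char *names[MAX_TYPES];
	int count;
};

/* Return the 1-based ID of a type, registering it on first use.
 * A type seen before gets the same ID again. Returns -1 when the
 * table is full, a case that does not arise in the patch. */
static int typeid_for(struct type_registry *reg, const char *name)
{
	assert(reg != NULL && name != NULL);
	for (int i = 0; i < reg->count; i++)
		if (strcmp(reg->names[i], name) == 0)
			return i + 1;
	if (reg->count == MAX_TYPES)
		return -1;
	reg->names[reg->count++] = name;
	return reg->count;
}
```

Starting from an empty registry, asking for "mystr", then "mystr2", then "mystr" again gives 1, 2, 1, which matches the constants loaded into r1 in func in the assembly above. Because IDs follow first-use order, they are only meaningful within a single compilation, which is one reason a mapping in debug information is still needed for decoding.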