Hück, Alexander via llvm-dev
2018-Jul-19 15:55 UTC
[llvm-dev] Possible to query type information from a malloc in optimized codes
Hello,

I am working on a pass that tries to extract type information from, say, all malloc statements in LLVM-IR (source language is C).

For debug code, this can be achieved by looking up the respective bitcast instruction and extracting the type from it.

However, in optimized code, the LLVM-IR omits these direct bitcasts in different scenarios (see example after the question).

My question now: is there any way to use, e.g., debug data or some use-def search to reliably extract the correct type information for such a malloc?

For one instance, consider the following C code:

    #include <stdlib.h>

    typedef struct {
      int nvars;
      int* vars;
    } struct_grid;

    void set(struct_grid* pgrid, int nvars, int* vars_n) {
      int* new_vars;
      new_vars = (int*)malloc(nvars * sizeof(int));
      for (int i = 0; i < nvars; i++) {
        new_vars[i] = vars_n[i];
      }
      pgrid->vars = new_vars;
    }

Compiled with -g, we get the expected bitcast. With optimizations, we get:

    %6 = tail call i8* @malloc(i64 %5) ; the malloc, no subsequent bitcast
    ...
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %6, i8* %10, i64 %12, i32 4, i1 false)

Thus, %6 is never cast, as it is passed directly to the memcpy operation.

Only later, through some indirection when new_vars is assigned to pgrid->vars, can we get the real type:

    %14 = getelementptr inbounds %struct.struct_grid, %struct.struct_grid* %0, i64 0, i32 1, !dbg !38
    %15 = bitcast i32** %14 to i8**, !dbg !39
    store i8* %6, i8** %15, align 8, !dbg !39, !tbaa !40
    ret void

Thanks in advance.
Stephen Kell via llvm-dev
2018-Jul-23 10:54 UTC
[llvm-dev] Possible to query type information from a malloc in optimized codes
> I am working on a pass that tries to extract type information from,
> say, all malloc statements in LLVM-IR (source language is C).
>
> For debug code, this can be achieved by looking up the respective
> bitcast instruction and extracting the type from it.
>
> However, in optimized code, the LLVM-IR omits these direct bitcasts
> in different scenarios (see example after the question).
>
> My question now, is there any way to use, e.g., debug data or some
> use-def search to reliably extract the correct type information for
> such a malloc?

Hi Alexander.

Not an LLVM-flavoured answer, but in case it's useful, this is something that the tooling from my liballocs project can do for C source code. <https://github.com/stephenrkell/liballocs>

Looking at bitcasts is at best heuristic since even in debug code there need not be a bitcast in all circumstances. My approach -- also heuristic, I admit -- has been to analyse the use of "sizeof" in C source code. This works pretty well, with the caveat that if you have malloc wrappers in the mix, since the sizeof occurs at the wrapper call, not the malloc call, you have to declare such wrappers to the tool.

(I agree with you that allocation sites could usefully be described in debugging information; at present I'm not aware of any toolchains that do this.)

Feel free to mail me off-list if you have questions about building/using liballocs... it's not mega-friendly as yet, though I am interested in improving that.

Stephen
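
[Editorial sketch, not part of Stephen's message.] The wrapper caveat above can be illustrated with a small C fragment. The names xmalloc, make_vars_direct and make_grid_wrapped are hypothetical and exist only for illustration; this is not liballocs code and says nothing about how liballocs is configured. The point is only where the sizeof expression appears relative to the malloc call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  int nvars;
  int *vars;
} struct_grid;

/* Hypothetical wrapper. Inside it, malloc only sees an opaque byte count,
   so a sizeof-based analysis finds no type hint at this malloc call. */
static void *xmalloc(size_t nbytes) {
  void *p = malloc(nbytes);
  if (p == NULL) {
    abort(); /* keep the sketch short: treat allocation failure as fatal */
  }
  return p;
}

/* Direct call: sizeof(int) appears in the malloc argument itself. */
int *make_vars_direct(int nvars) {
  assert(nvars >= 0);
  return malloc((size_t)nvars * sizeof(int));
}

/* Wrapped call: sizeof(struct_grid) appears at the xmalloc call site,
   one level above the real malloc. */
struct_grid *make_grid_wrapped(void) {
  struct_grid *g = xmalloc(sizeof(struct_grid));
  g->nvars = 0;
  g->vars = NULL;
  return g;
}
```

In make_vars_direct the type hint and the allocation are in the same expression. In make_grid_wrapped the hint is only visible at the caller of xmalloc, which is presumably why such wrappers have to be declared to a sizeof-based tool.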
David Blaikie via llvm-dev
2018-Jul-24 00:59 UTC
[llvm-dev] Possible to query type information from a malloc in optimized codes
Type information isn't preserved in LLVM IR. The debug info provides a best-effort approximation, but optimizations might pull apart structures, collapse values across different variables, etc., so it's potentially lossy.

You can either use debug info (which carries as much type information as is available right now; there are many quality-of-implementation issues and areas for improvement where the debug information is lossy) or potentially insert your own intrinsics in the frontend to track the properties you care about.

- Dave
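
[Editorial sketch, not part of Dave's message.] Inserting custom intrinsics requires changing the frontend. A much simpler source-level stand-in for the same idea, assuming the C sources can be edited, is to route allocations through a macro that passes the type name along explicitly. Everything here (TYPED_MALLOC, typed_malloc_impl, last_recorded_type, last_recorded_count) is a hypothetical name invented for illustration, not an LLVM or Clang facility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: remembers only the most recent allocation.
   A real tool would more likely keep a table keyed by returned pointer. */
static const char *last_alloc_type = NULL;
static size_t last_alloc_count = 0;

/* Hypothetical allocation hook. The type name arrives as an explicit
   string argument instead of being inferred from a later cast. */
void *typed_malloc_impl(size_t count, size_t elem_size, const char *type_name) {
  assert(type_name != NULL);
  if (elem_size != 0 && count > SIZE_MAX / elem_size) {
    return NULL; /* count * elem_size would overflow */
  }
  void *p = malloc(count * elem_size);
  if (p != NULL) {
    last_alloc_type = type_name;
    last_alloc_count = count;
  }
  return p;
}

/* The macro stringifies the type, so the call site carries both
   sizeof(type) and the spelled-out type name. */
#define TYPED_MALLOC(type, count) \
  ((type *)typed_malloc_impl((count), sizeof(type), #type))

const char *last_recorded_type(void) { return last_alloc_type; }
size_t last_recorded_count(void) { return last_alloc_count; }
```

With this in place, the allocation in the original example could be written as new_vars = TYPED_MALLOC(int, nvars);. This trades transparency for explicitness: it only covers allocation sites that have been rewritten, which is the same limitation a frontend-inserted marker would avoid.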