The front end I'm building for an existing interpreted language is unfortunately producing output similar to this far too often:

define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {
  %1 = tail call noalias i8* @malloc(i32 %len) nounwind
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %src, i32 %len, i32 1, i1 false)
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %len, i32 1, i1 false)
  tail call void @free(i8* %1) nounwind
  ret void
}

I'd like to be able to reduce this pattern to this:

define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 %len, i32 1, i1 false)
  ret void
}

Optimising all cases of this pattern from within my front end's AST would be difficult. I'd rather implement this as an LLVM pass or two that runs after the other function passes have already cleaned up the mess I've made.

Has anyone written any passes to detect and combine multiple memory copies that originated from the same data? And then eliminate stores and malloc/free pairs for local pointers that are never read from or captured?
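A minimal sketch of the two rewrites asked about above, assuming a hypothetical straight-line toy IR (the Op and Inst types and the forwardCopies and removeWriteOnlyBuffers functions are made up for illustration and are not LLVM's API). One wrinkle: %dest and %src are only nocapture, not noalias, so they might overlap. The forwarded copy is therefore only safe as a memmove unless one of the two pointers is known not to alias; as far as I know, LLVM's MemCpyOpt makes the same memmove fallback when it forwards one memcpy into another.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical straight-line toy IR, loosely modelled on the IR above.
// This is NOT LLVM's API; it only illustrates the two rewrites.
enum class Op { Malloc, Memcpy, Memmove, Free, Ret };

struct Inst {
  Op op;
  std::string a;    // Malloc: result; Memcpy/Memmove: dest; Free: pointer
  std::string b;    // Memcpy/Memmove: source
  std::string len;  // Malloc size or copy length (compared symbolically)
};

static bool isCopy(Op op) { return op == Op::Memcpy || op == Op::Memmove; }
static bool writesMemory(Op op) { return isCopy(op) || op == Op::Free; }

// Rewrite "T <- S; D <- T" (same length) into "D <- S". Conservative: the
// copy that filled T must be the nearest earlier memory write.
void forwardCopies(std::vector<Inst>& body, const std::set<std::string>& noalias) {
  for (const Inst& in : body)
    assert(!isCopy(in.op) || (!in.a.empty() && !in.b.empty()));  // toy precondition
  for (std::size_t j = 0; j < body.size(); ++j) {
    Inst& second = body[j];
    if (second.op != Op::Memcpy) continue;
    for (std::size_t i = j; i-- > 0;) {
      const Inst& first = body[i];
      if (!writesMemory(first.op)) continue;
      if (first.op == Op::Memcpy && first.a == second.b && first.len == second.len) {
        second.b = first.b;
        // D and S may overlap unless one is known not to alias,
        // so fall back to memmove to keep the original semantics.
        if (!noalias.count(second.a) && !noalias.count(second.b)) second.op = Op::Memmove;
      }
      break;  // any other intervening write might clobber T or S
    }
  }
}

// A malloc'd buffer that is never used as a copy source is never read, so its
// malloc, the copies into it and its free are all dead. (The toy has no
// instruction that could capture a pointer; a real pass must check that.)
void removeWriteOnlyBuffers(std::vector<Inst>& body) {
  std::set<std::string> dead;
  for (const Inst& m : body) {
    if (m.op != Op::Malloc) continue;
    bool read = false;
    for (const Inst& u : body) read = read || (isCopy(u.op) && u.b == m.a);
    if (!read) dead.insert(m.a);
  }
  std::vector<Inst> kept;
  for (Inst& in : body)
    if (in.op == Op::Ret || !dead.count(in.a)) kept.push_back(std::move(in));
  body = std::move(kept);
}
```

On the function above with no noalias information, running forwardCopies and then removeWriteOnlyBuffers leaves a single memmove from %src to %dest followed by ret; passing %dest in the noalias set keeps it a memcpy, which matches the desired output.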
Hi Jeremy,

On 21/05/13 13:15, Jeremy Lakeman wrote:
>   %1 = tail call noalias i8* @malloc(i32 %len) nounwind

Could you allocate the memory on the stack instead (alloca instruction)?

Ciao, Duncan.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> could you allocate the memory on the stack instead (alloca instruction)?

This is mainly for string or binary blob handling, so using the stack isn't a great idea for size reasons.

I'm only experimenting with simple code examples for now, and I picked a simple one for this email. I'm certain things will get much more complicated once I implement more features of the language.
I have been playing with some ideas in this space. I haven't gotten beyond toy implementations yet, but would be happy to brainstorm if nothing else. I'm traveling at the moment, but should have some time next week if you want to discuss.

Philip Reames

----
Apologies for any terseness; typing on a phone's keyboard does not lend itself to verbosity.
Hi Jeremy,

On 21/05/13 14:34, Jeremy Lakeman wrote:
> This is mainly for string or binary blob handling, using the stack isn't a great
> idea for size reasons.

The optimizer that does memcpy forwarding is in lib/Transforms/Scalar/MemCpyOptimizer.cpp. You might want to look into teaching it how to handle malloc'd memory and not just alloca instructions. I think the logic is in MemCpyOpt::performCallSlotOptzn.

Ciao, Duncan.
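One precondition that teaching a memcpy optimizer about malloc'd memory would likely need is a check that the heap buffer behaves like an alloca: it comes from malloc, is freed exactly once, and never escapes. A minimal sketch of that check, assuming a hypothetical straight-line toy IR (Op, Inst and isAllocaLike are made up for illustration, not LLVM's API). A real pass would also have to handle control flow (a free on every path to a return) and would presumably build on LLVM's capture-tracking utilities, such as PointerMayBeCaptured in llvm/Analysis/CaptureTracking.h.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical straight-line toy IR (not LLVM's API).
enum class Op { Malloc, Memcpy, Free, Call, Ret };

struct Inst {
  Op op;
  std::string def;                // Malloc result, if any
  std::vector<std::string> args;  // Memcpy: {dest, src}; Free/Ret: {ptr}; Call: any
};

// Can the malloc'd buffer `p` be treated like an alloca?
// Conservative: only memcpy may touch it, it is freed exactly once, and
// nothing uses it after the free. Passing it to an unknown call or
// returning it counts as a capture, so we give up.
bool isAllocaLike(const std::vector<Inst>& body, const std::string& p) {
  assert(!p.empty());
  bool defined = false, freed = false;
  for (const Inst& in : body) {
    if (in.op == Op::Malloc && in.def == p) { defined = true; continue; }
    bool uses = false;
    for (const std::string& a : in.args) uses = uses || a == p;
    if (!uses) continue;
    if (!defined || freed) return false;  // use before the malloc or after the free
    switch (in.op) {
      case Op::Memcpy: break;             // reading/writing its contents is fine
      case Op::Free: freed = true; break;
      default: return false;              // Call, Ret, ... may capture p
    }
  }
  return defined && freed;
}
```

For the function in the original post, the buffer %1 passes this check; replacing either memcpy with an opaque call that receives %1, or dropping the free, makes it fail.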