thr3ads.net - llvm dev - [LLVMdev] malloc / free & memcpy optimisations. [May 2013]

Jeremy Lakeman

2013-May-21 11:15 UTC

[LLVMdev] malloc / free & memcpy optimisations.

The front end I'm building for an existing interpreted language is
unfortunately producing output similar to this far too often;

define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {
  %1 = tail call noalias i8* @malloc(i32 %len) nounwind
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %src, i32 %len, i32 1, i1 false)
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %len, i32 1, i1 false)
  tail call void @free(i8* %1) nounwind
  ret void
}

I'd like to be able to reduce this pattern to this;

define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {
  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 %len, i32 1, i1 false)
  ret void
}

Optimising all cases of this pattern from within my front end's AST would
be difficult. I'd rather implement this as an llvm pass or two that runs
after other function passes have already cleaned up the mess I've made.

Has anyone written any passes to detect and combine multiple memory copies
that originated from the same data?
And then eliminate stores and malloc / free pairs for local pointers that
are never read from or captured?
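
For readers who find C easier to scan than IR, the two versions of @foo above can be written roughly as below. This is only an illustrative sketch: the names foo_before, foo_after and same_result are made up here, and the NULL check on malloc is an addition that the IR in the post does not have.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What the front end currently emits: copy src into a heap temporary,
   copy the temporary into dest, then free the temporary. */
static void foo_before(char *dest, const char *src, size_t len) {
    char *tmp = malloc(len);
    if (tmp == NULL)          /* assumption: the original IR has no such check */
        return;
    memcpy(tmp, src, len);
    memcpy(dest, tmp, len);
    free(tmp);
}

/* The desired result: a single direct copy. */
static void foo_after(char *dest, const char *src, size_t len) {
    memcpy(dest, src, len);
}

/* Hypothetical helper: returns 1 if both versions leave dest identical. */
static int same_result(const char *src, size_t len) {
    char a[64] = {0}, b[64] = {0};
    assert(len <= sizeof a);
    foo_before(a, src, len);
    foo_after(b, src, len);
    return memcmp(a, b, sizeof a) == 0;
}
```

Two things differ between the versions even in this small case. If src and dest overlap, the temporary makes foo_before well defined while the direct memcpy in foo_after is not, so a pass would need to prove the two do not overlap or emit a memmove instead. The rewrite also removes the path on which malloc fails.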

Duncan Sands

2013-May-21 11:57 UTC

Hi Jeremy,

On 21/05/13 13:15, Jeremy Lakeman wrote:
> The front end I'm building for an existing interpreted language is
> unfortunately producing output similar to this far too often;
>
> define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {
>   %1 = tail call noalias i8* @malloc(i32 %len) nounwind
>   tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %src, i32 %len, i32 1, i1 false)
>   tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %len, i32 1, i1 false)
>   tail call void @free(i8* %1) nounwind
>   ret void
> }

Could you allocate the memory on the stack instead (alloca instruction)?

Ciao, Duncan.

Jeremy Lakeman

2013-May-21 12:34 UTC

> could you allocate the memory on the stack instead (alloca instruction)?
This is mainly for string or binary blob handling; using the stack isn't a
great idea for size reasons.

I'm experimenting with simple code examples for now, and I picked a simple
one for this email, but I'm certain things will get much more complicated
once I implement more features of the language.



Philip Reames

2013-May-21 16:00 UTC

I have been playing with some ideas in this space.  I haven't gotten beyond
toy implementations yet, but would be happy to brainstorm if nothing else.

I'm traveling at the moment, but should have some time next week if you want
to discuss.

Philip Reames

----
Apologies for any terseness; typing on a phone's keyboard does not lend
itself to verbosity.


Duncan Sands

2013-May-21 18:50 UTC

Hi Jeremy,

On 21/05/13 14:34, Jeremy Lakeman wrote:
> > could you allocate the memory on the stack instead (alloca instruction)?
>
> This is mainly for string or binary blob handling, using the stack isn't a great
> idea for size reasons.
>
> While I'm experimenting with simple code examples now, and I picked a simple one
> for this email. I'm certain things will get much more complicated once I
> implement more features of the language.

The optimizer that does memcpy forwarding is in
   lib/Transforms/Scalar/MemCpyOptimizer.cpp
You might want to look into teaching it how to handle malloc'd memory and not
just alloca instructions.  I think the logic is in
   MemCpyOpt::performCallSlotOptzn

Ciao, Duncan.
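
To make the suggestion concrete, here is a rough sketch of the checks such a transformation would need, written against a toy instruction list rather than the real LLVM API. Everything in it (the struct inst layout, the OP_* opcodes, forward_through_temp) is hypothetical and is not how MemCpyOptimizer.cpp is structured. It also assumes that distinct value ids never alias and that two operands with the same length id have the same length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for IR instructions; not the LLVM API. */
enum op { OP_MALLOC, OP_MEMCPY, OP_FREE, OP_OTHER_USE, OP_DELETED };

struct inst {
    enum op op;
    int dst;  /* MALLOC: value defined; MEMCPY: destination; FREE, OTHER_USE: pointer used */
    int src;  /* MEMCPY: source; unused otherwise */
    int len;  /* MALLOC, MEMCPY: value id of the length; unused otherwise */
};

/* For each malloc'd temporary t whose only uses are, in order,
   memcpy(t, s, len); memcpy(d, t, len); free(t), rewrite the second copy
   to read from s and delete the malloc, the first copy and the free.
   Returns the number of temporaries removed. */
static int forward_through_temp(struct inst *code, size_t n) {
    int removed = 0;
    assert(code != NULL || n == 0);
    for (size_t m = 0; m < n; m++) {
        if (code[m].op != OP_MALLOC)
            continue;
        int t = code[m].dst, len = code[m].len;
        size_t fill = n, drain = n, fr = n;  /* n means "not seen yet" */
        int ok = 1;
        for (size_t i = m + 1; i < n && ok; i++) {
            const struct inst *in = &code[i];
            if (in->op == OP_DELETED)
                continue;
            int is_copy = in->op == OP_MEMCPY;
            int uses_t = in->dst == t || (is_copy && in->src == t);
            if (!uses_t) {
                /* Between the two copies, anything but a malloc might
                   overwrite or free the source, so give up. */
                if (fill < n && drain == n && in->op != OP_MALLOC)
                    ok = 0;
                continue;
            }
            if (is_copy && in->dst == t && in->src != t && in->len == len && fill == n)
                fill = i;   /* first copy: source -> temporary */
            else if (is_copy && in->src == t && in->dst != t && in->len == len
                     && fill < n && drain == n)
                drain = i;  /* second copy: temporary -> destination */
            else if (in->op == OP_FREE && drain < n && fr == n)
                fr = i;
            else
                ok = 0;     /* any other use may read or capture t */
        }
        if (ok && fr < n) {
            code[drain].src = code[fill].src;
            code[fill].op = OP_DELETED;
            code[fr].op = OP_DELETED;
            code[m].op = OP_DELETED;
            removed++;
        }
    }
    return removed;
}
```

The sketch is deliberately conservative: it bails out on any use of the temporary it does not recognise. A real pass would rely on alias analysis and capture tracking instead of the "distinct ids never alias" assumption made here.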
