thr3ads.net - llvm dev - [LLVMdev] Backend optimizations [Jan 2015]

Rinaldini Julien

2015-Jan-26 14:33 UTC

[LLVMdev] Backend optimizations

> From the department of ignorance and stupid suggestions: Run this
> conversion after other passes that deal with call instructions?
Yes, but my modifications are not made in a MachineFunctionPass; they are
in a custom inserter for an intrinsic. I'm not sure when intrinsic lowering
is applied, but I guess it happens before any MachineFunctionPass, so I'm
not sure I can choose the order at this point.
> Side question: Is this targeting some unusual x86 architecture, as I
> believe this would be quite detrimental to performance on modern CPUs,
> as they use the pair of call/return to do predictive execution, so if
> you remove the CALL, the return will "unbalance" the call stack
> management, and lead to slower execution.
The goal here is to add some obfuscation to the final binary, so some
performance loss is expected!
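
To make the quoted concern about unbalanced call/return pairs concrete, here is a minimal sketch of a toy return-address predictor in C. It is only an illustration under simplifying assumptions, not a model of any particular CPU: the names (`ras_t`, `ras_call`, `ras_ret`) and the addresses `0x100` and `0x200` are made up for the example. A real CALL pushes a predicted return target, a RET pops one, and a call replaced by push + jmp never pushes anything.

```c
#include <stddef.h>

/* Toy return-address predictor: a small stack of predicted return targets.
   Hypothetical model for illustration only. */
#define RAS_DEPTH 16

typedef struct {
    unsigned long entries[RAS_DEPTH];
    size_t top;            /* number of valid entries */
    unsigned mispredicts;  /* RETs whose prediction was wrong or missing */
} ras_t;

static void ras_init(ras_t *r) {
    r->top = 0;
    r->mispredicts = 0;
}

/* A real CALL records where the matching RET is expected to go. */
static void ras_call(ras_t *r, unsigned long return_addr) {
    if (r->top < RAS_DEPTH)
        r->entries[r->top++] = return_addr;
}

/* A RET pops the latest prediction and compares it with the real target. */
static void ras_ret(ras_t *r, unsigned long actual_target) {
    if (r->top == 0 || r->entries[--r->top] != actual_target)
        r->mispredicts++;
}

/* main calls A, A calls B, both through real CALL instructions. */
unsigned mispredicts_with_calls(void) {
    ras_t r;
    ras_init(&r);
    ras_call(&r, 0x100);  /* main -> A, A will return to 0x100 */
    ras_call(&r, 0x200);  /* A -> B, B will return to 0x200 */
    ras_ret(&r, 0x200);   /* B returns: prediction matches */
    ras_ret(&r, 0x100);   /* A returns: prediction matches */
    return r.mispredicts;
}

/* Same control flow, but A reaches B with push + jmp instead of CALL. */
unsigned mispredicts_with_push_jmp(void) {
    ras_t r;
    ras_init(&r);
    ras_call(&r, 0x100);  /* main -> A, still a real CALL */
    /* A -> B via "push 0x200; jmp B": the predictor sees no call */
    ras_ret(&r, 0x200);   /* B's RET pops 0x100, actual is 0x200: miss */
    ras_ret(&r, 0x100);   /* A's RET finds the predictor empty: miss */
    return r.mispredicts;
}
```

In this toy model the balanced version produces no mispredictions, while the push + jmp version mispredicts both returns, because the single unpaired RET also consumes the entry that belonged to the caller.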

Cheers

Herbie Robinson

2015-Jan-27 15:49 UTC

On 1/26/15 9:33 AM, Rinaldini Julien wrote:
> The goal here is to add some obfuscation to the final binary, so some
> performance loss is expected!
You could solve the unbalanced call/return by replacing the return with
a pop and jump, but I don't think that's going to fool anybody for
long...

But this brings forth an idea: one interesting optimization for
internal subroutines that are not internally recursive would be to:

1.  Put the return address into a temp.
2.  Jump to the entry point (using a PHI instruction to collect the
    arguments).
3.  Use a computed branch on the temp instead of return.
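
As a source-level illustration of the three steps above, here is a minimal sketch in C. It assumes GCC or Clang, since it relies on the GNU "labels as values" extension (`&&label` and `goto *ptr`), and the function name `sum_of_two_squares` and its tiny "subroutine" are hypothetical. The variable `arg` plays the role of the PHI that collects the argument from each call site.

```c
/* Requires GCC or Clang: uses the GNU "labels as values" extension. */
int sum_of_two_squares(int a, int b) {
    void *ret_to = 0;  /* step 1: the "return address" lives in a temp */
    int arg = 0;       /* merged argument, analogous to a PHI node */
    int result = 0;
    int first = 0;

    /* First "call" site. */
    arg = a;
    ret_to = &&after_first;
    goto square;       /* step 2: jump to the entry point */
after_first:
    first = result;

    /* Second "call" site. */
    arg = b;
    ret_to = &&after_second;
    goto square;
after_second:
    return first + result;

square:                /* shared body of the internal subroutine */
    result = arg * arg;
    goto *ret_to;      /* step 3: computed branch instead of return */
}
```

For example, `sum_of_two_squares(3, 4)` evaluates to 25. No call or return instruction is needed for the shared body, which is the point of the idea; whether a compiler would benefit from doing this automatically is a separate question.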

This would be applicable for things that are too big to inline and don't
allocate a lot of variables on the stack frame.  One wouldn't want to do
it all the time, because kernel coders will occasionally segregate stub
routines that need a large stack frame for temporary storage out of the
main data path, to keep the frame size down in hot paths (and so reduce
cache footprint).  I've gotten a 5-10% improvement in overall performance
by doing that (with putnext, which was allocating a very large frame to
format an error message that essentially never occurred).

Of course, now somebody will tell me LLVM is already doing this one :-)

Rinaldini Julien

2015-Jan-27 17:05 UTC

> On 1/26/15 9:33 AM, Rinaldini Julien wrote:
>> The goal here is to add some obfuscation to the final binary, so some
>> performance loss is expected!
> You could solve the unbalanced call/return by replacing the return with
> a pop and jump, but I don't think that's going to fool anybody for
> long...
Yeah, I know, you should not use that alone... But I have some other
stuff ;)
> But this brings forth an idea: one interesting optimization for
> internal subroutines that are not internally recursive would be to:
> 
> 1.   Put the return address into a temp.
> 2.  Jump to the entry point (using a PHI instruction to collect the
> arguments).
> 3.  Use a computed branch on the temp instead of return.
It should be possible to do that too, I guess.

But actually I have some other problems. I tried to expand the call
as Tim Northover suggested, but I was not able to make it work. You have
to push the return address, and I did not find a way to put the
future address of the next instruction in the 'push'.

I tried to keep my solution in ISelLowering, so that I can create the new
basic block, get its address and push it, add the 'jmp', and leave the
normal call in place so the arguments are not destroyed. Then, in
MCInstLower, every time there is a call, I detect whether my intrinsic is
present and delete the call. But this doesn't work: something else
reorders the basic blocks, and it fails at link time because it cannot
find the basic block address :(

Cheers
