> From the department of ignorance and stupid suggestions: Run this
> conversion after other passes that deal with call instructions?

Yes, but my modifications are not made in a MachineFunctionPass; it's a custom inserter for an intrinsic... I'm not sure when intrinsic lowering is applied, but I guess it's before any MachineFunctionPass? So I'm not sure I can choose the order at this point.

> Side question: Is this targeting some unusual x86 architecture, as I
> believe this would be quite detrimental to performance on modern CPUs,
> as they use the pair of call/return to do predictive execution, so if
> you remove the CALL, the return will "unbalance" the call stack
> management, and lead to slower execution.

The goal here is to add some obfuscation to the final binary, so some performance loss is expected!

Cheers
On 1/26/15 9:33 AM, Rinaldini Julien wrote:
> The goal here is to add some obfuscation to the final binary, so some
> performance loss is expected!

You could solve the unbalanced call/return by replacing the return with a pop and jump, but I don't think that's going to fool anybody for long...

But this brings forth an idea: one interesting optimization for internal subroutines that are not internally recursive would be to:

1. Put the return address into a temp.
2. Jump to the entry point (using a PHI instruction to collect the arguments).
3. Use a computed branch on the temp instead of return.

This would be applicable to routines that are too big to inline and don't allocate a lot of variables on the stack frame. One wouldn't want to do it all the time, because kernel coders will occasionally move stub routines that need a large stack for temporary storage out of the main data path, to keep the frame size down in hot paths (and so reduce the cache footprint). I've gotten a 5-10% improvement in overall performance by doing that (with putnext, which was allocating a very large frame to format an error message that essentially never occurred).

Of course, now somebody will tell me LLVM is already doing this one :-)
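For illustration only, the three steps above can be sketched at the source level in GNU C. This relies on labels-as-values (`&&label` and `goto *ptr`), which is a GCC/Clang extension and not ISO C; the function and variable names are hypothetical, and plain local variables stand in for the PHI that would collect the arguments. It is a sketch of the idea, not a description of anything LLVM actually does.

```c
/* Sketch (GNU C extension): an internal, non-recursive "subroutine" that is
 * entered with a plain jump and left with a computed branch on a temp that
 * holds the return address. Names are hypothetical. */
int sum_of_squares(int a, int b)
{
    void *ret;           /* step 1: the return address lives in a temp */
    int arg;             /* argument "register" (stands in for the PHI) */
    int result = 0;      /* result "register" */
    int total = 0;

    arg = a;
    ret = &&after_first;
    goto square;         /* step 2: jump to the entry point, no CALL */
after_first:
    total += result;

    arg = b;
    ret = &&after_second;
    goto square;
after_second:
    total += result;
    return total;

square:                  /* body of the internal subroutine */
    result = arg * arg;
    goto *ret;           /* step 3: computed branch instead of RET */
}
```

For example, `sum_of_squares(3, 4)` enters `square` twice from two different call sites and returns 25. Because there is only one `ret` temp, this pattern is only safe when the subroutine never re-enters itself, which matches the "not internally recursive" restriction above.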
> On 1/26/15 9:33 AM, Rinaldini Julien wrote:
>> The goal here is to add some obfuscation to the final binary, so some
>> performance loss is expected!
>
> You could solve the unbalanced call/return by replacing the return with
> a pop and jump, but I don't think that's going to fool anybody for long...

Yeah, I know, you should not use that alone... But I have some other stuff ;)

> But this brings forth an idea: one interesting optimization for
> internal subroutines that are not internally recursive would be to:
>
> 1. Put the return address into a temp.
> 2. Jump to the entry point (using a PHI instruction to collect the
> arguments).
> 3. Use a computed branch on the temp instead of return.

It should be possible to do that too, I guess. But actually I have some other problems. I tried to expand the call as Tim Northover suggested, but I was not able to make it work. You have to push the return address, and I did not find a way to put the future address of the next instruction into the 'push'.

I tried to keep my solution in ISelLowering: there I create the new basic block, get its address and push it, add the 'jmp', and leave the normal call in place so the arguments are not destroyed. Then, in MCInstLower, every time there is a call, I check whether my intrinsic is present and delete the call. But this doesn't work: something else reorders the basic blocks, and it fails at link time because it cannot find the basic block's address :(

Cheers
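As a source-level model only, the expansion discussed in this thread (CALL becomes "push the address of the next block, then jmp"; RET becomes "pop, then jmp") can be written in GNU C with labels-as-values, a GCC/Clang extension. The array `ret_stack`, the constant `MAX_DEPTH` and the function name are hypothetical stand-ins for the hardware stack and a real callee; the sketch does not show how to obtain a basic block's address inside an LLVM backend. It assumes `n <= 20` so the result fits in a `long long` and the depth stays below `MAX_DEPTH`.

```c
/* Sketch (GNU C extension): "call" = push return label + goto,
 * "ret" = pop label + computed goto. ret_stack is a hypothetical
 * stand-in for the machine stack. Assumes 0 <= n <= 20. */
#define MAX_DEPTH 64

long long factorial_expanded(int n)
{
    void *ret_stack[MAX_DEPTH];
    int sp = 0;
    int arg = n;
    long long result = 1;

    /* "call fact": push the address of the block after the jump, then jump */
    ret_stack[sp++] = &&done;
    goto fact;

fact:
    if (arg <= 1) {
        result = 1;
        goto *ret_stack[--sp];          /* "ret": pop the return address and jump */
    }
    arg -= 1;
    ret_stack[sp++] = &&after_recursive_call;
    goto fact;                          /* recursive "call fact" */
after_recursive_call:
    arg += 1;                           /* restore this activation's argument */
    result *= arg;
    goto *ret_stack[--sp];              /* "ret" */

done:
    return result;
}
```

Calling `factorial_expanded(5)` pushes the `done` label once and `after_recursive_call` four times, and the pops unwind them in reverse order to produce 120. Unlike a single return-address temp, an explicit stack of return addresses also handles recursion, at the cost of keeping that stack balanced by hand.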