Joerg Sonnenberger
2011-Feb-27 00:29 UTC
[LLVMdev] TableGen syntax for matching a constant load
On Sat, Feb 26, 2011 at 02:04:54PM -0800, Jakob Stoklund Olesen wrote:
>
> On Feb 26, 2011, at 1:36 PM, Joerg Sonnenberger wrote:
>
> > On Sat, Feb 26, 2011 at 01:07:39PM -0800, Jakob Stoklund Olesen wrote:
> >>
> >> You may want to consider using xorl+decl instead. It is also three
> >> bytes, and there are no false dependencies. The xor idiom is recognized
> >> by processors as old as Pentium 4 as having no dependencies.
> >
> > Any examples of how to create more than one instruction for a given
> > pattern? There are some other cases I could use this for.
>
> def : Pat<(i32 -1), (DEC32r (MOV32r0))>;

Hm. Right. This gives me the first set of size peephole optimisations, as
attached. I didn't add the above rule for 64-bit builds, since it is larger
than the to-be-figured-out OR32rmi8 / OR64rmi8.

Joerg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: X86InstrCompiler.td.diff
Type: text/x-diff
Size: 876 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110227/94a8caea/attachment.diff>
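For reference, a minimal sketch of how the rule quoted above could be restricted to size-optimised 32-bit builds. The predicate names OptForSize and In32BitMode are assumed to be the ones defined in X86InstrInfo.td at the time. This fragment is not taken from the scrubbed attachment, and it is only meaningful inside the X86 target's .td files. Whether it would win over the existing MOV32ri pattern without an AddedComplexity adjustment is not established in the thread.

```tablegen
// Sketch only, not the attached patch.
// Materialize -1 as xorl+decl when optimizing for size.
// 32-bit mode: xorl %eax,%eax (2 bytes) + decl %eax (1 byte) = 3 bytes,
// versus 5 bytes for movl $-1,%eax.
// Left out of 64-bit mode, where the one-byte decl encoding is not
// available (0x48 is a REX prefix there), so the pair grows to 4 bytes.
// In32BitMode is an assumed predicate name.
let Predicates = [OptForSize, In32BitMode] in
def : Pat<(i32 -1), (DEC32r (MOV32r0))>;
```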
Joerg Sonnenberger
2011-Feb-27 00:50 UTC
[LLVMdev] TableGen syntax for matching a constant load
On Sun, Feb 27, 2011 at 01:29:25AM +0100, Joerg Sonnenberger wrote:
> +let Predicates = [OptForSize] in {
> +def : Pat<(store (i32 0), addr:$dst), (AND32mi8 addr:$dst, 0)>;
> +def : Pat<(store (i32 0), addr:$dst), (AND32mi8 addr:$dst, 0)>;
> +def : Pat<(store (i64 -1), addr:$dst), (OR64mi8 addr:$dst, -1)>;
> +def : Pat<(store (i64 -1), addr:$dst), (OR64mi8 addr:$dst, -1)>;
> +}

All these patterns have one important downside: they are suboptimal if
more than one store happens in a row. E.g. the 0 store is better
expressed as xor followed by two register moves, if a register is
available... This is most noticeable when memset() gets inlined.

Joerg
Jakob Stoklund Olesen
2011-Feb-27 02:12 UTC
[LLVMdev] TableGen syntax for matching a constant load
On Feb 26, 2011, at 4:50 PM, Joerg Sonnenberger wrote:
> On Sun, Feb 27, 2011 at 01:29:25AM +0100, Joerg Sonnenberger wrote:
>> +let Predicates = [OptForSize] in {
>> +def : Pat<(store (i32 0), addr:$dst), (AND32mi8 addr:$dst, 0)>;
>> +def : Pat<(store (i32 0), addr:$dst), (AND32mi8 addr:$dst, 0)>;
>> +def : Pat<(store (i64 -1), addr:$dst), (OR64mi8 addr:$dst, -1)>;
>> +def : Pat<(store (i64 -1), addr:$dst), (OR64mi8 addr:$dst, -1)>;
>> +}
>
> All these patterns have one important downside. They are suboptimal if
> more than one store happens in a row. E.g. the 0 store is better
> expressed as xor followed by two register moves, if a register is
> available... This is most noticeable when memset() gets inlined

Note that LLVM's -Os option does not quite mean the same as GCC's flag. It
disables optimizations that increase code size without a clear performance
gain. It does not try to minimize code size at any cost.

When you said you weren't concerned about performance, I assumed you
wouldn't be submitting patches. Sorry about the confusion.

Implementing constant stores as load+bitop+store is almost certainly not
worth the small size win.

As for materializing (i32 -1) in 3 bytes instead of 5, but with 2 µ-ops
instead of 1, I would like to see some performance numbers first. It might
be cheap enough that it is worth it.

The MOV32ri instruction can be rematerialized and is considered to be as
cheap as a move. That is not true for xorl+decl, and unfortunately the
register allocator currently doesn't know how to rematerialize multiple
instructions, which means that the register containing -1 could get
spilled. We really don't want that to happen.

Until the register allocator learns how to rematerialize multiple
instructions, you would need to use a pseudo-instruction representing the
xorl+decl pair. That instruction could be marked as rematerializable and
even as cheap as a move.

Then you can start measuring the performance impact ;-)

Thanks,
/jakob
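The pseudo-instruction approach described in this message might look roughly like the following sketch. The name MOV32r_1 is hypothetical, and the flags mirror the ones mentioned above (rematerializable, as cheap as a move). The expansion of the pseudo into the actual xorl+decl pair would still have to be written in C++ after register allocation; that part is not shown, and none of this comes from the patch in the thread.

```tablegen
// Sketch only; MOV32r_1 is a hypothetical name.
// A single pseudo stands for the xorl+decl pair, so the register
// allocator sees one instruction and can rematerialize -1 instead of
// spilling the register that holds it.
// Both xorl and decl write EFLAGS, hence the Defs list.
let Defs = [EFLAGS], isReMaterializable = 1, isAsCheapAsAMove = 1,
    Predicates = [OptForSize] in
def MOV32r_1 : I<0, Pseudo, (outs GR32:$dst), (ins), "",
                 [(set GR32:$dst, -1)]>;
```

Selecting (i32 -1) through a pseudo like this, rather than through a two-instruction Pat, is what keeps the value rematerializable; the performance numbers asked for above would still be needed to decide whether the extra µ-op is acceptable.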