"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs
are not folded into 1."
I don't see why the unrolling explains it.
"I think he is trying to say this expression generated by unrolling by a
factor of 4 can indeed be folded into a single XOR, SHL and OR. "
Precisely. The code generated by unrolling can be folded into a single XOR
and SHL. Even if it were not inside a loop, it could still be optimized.
What I want to know is: is there an optimization that is supposed to handle
this code but for some reason decides it is not possible, or is there no
optimization for this situation at all?
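
To make the question concrete, here is a purely hypothetical sketch in C. It is not the code from this thread: the loop body `x = (x << 1) ^ K`, the constant `K`, and the function names `unrolled` and `folded` are all assumptions chosen for illustration. It shows the kind of folding being asked about. Because a left shift distributes over XOR, four unrolled iterations collapse into one SHL and one XOR with a precomputed constant.

```c
#include <stdint.h>

/* Illustrative constant; an assumption, not taken from the thread. */
#define K 0x5u

/* Hypothetical loop body x = (x << 1) ^ K, unrolled by a factor of 4. */
uint32_t unrolled(uint32_t x) {
    x = (x << 1) ^ K;
    x = (x << 1) ^ K;
    x = (x << 1) ^ K;
    x = (x << 1) ^ K;
    return x;
}

/* Same result with a single SHL and a single XOR.
 * (a ^ b) << n == (a << n) ^ (b << n), so the four copies of K
 * combine into one constant that can be computed at compile time. */
uint32_t folded(uint32_t x) {
    const uint32_t c = (K << 3) ^ (K << 2) ^ (K << 1) ^ K;
    return (x << 4) ^ c;
}
```

Whether a given optimizer performs this rewrite depends on the passes that run and on the exact shape of the expression, so the sketch only shows that the two forms are equivalent.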
Thanks for the help, guys.
On Tue, Jul 26, 2011 at 9:55 AM, Alistair Lynn <arplynn at gmail.com> wrote:
> Hi-
>
> > I haven't seen a machine in which OR is faster or more energy-efficient
> > than ADD. They're all done by the same ALU circuitry, which delays the
> > pipeline by its worst-case path timing. So, for modern processor
> > hardware purposes, OR is exactly equal to ADD. Transforming ADD to OR
> > isn't strength reduction at all. Maybe this is beneficial only if you
> > have a backend generating circuitry (programming FPGAs).
>
> I believe that in cases where ADD and OR are equivalent, LLVM prefers the
> latter because it's easier to reason about the bits in the result of an OR
> in complex cases. The x86 backend, for instance, transforms ORs in such
> cases back into adds, presumably in case it may be matched to an lea where
> that's beneficial.
>
> Alistair
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
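
As a small illustration of the quoted point about ADD and OR being equivalent: when two operands have no set bits in common, the addition produces no carries, so ADD and OR give the same result. The sketch below is an assumption-laden example, not code from the thread; the function names `with_or` and `with_add` and the constant 3 are made up for illustration.

```c
#include <stdint.h>

/* (x << 2) always has its two low bits clear, and 3 occupies only
 * those two bits, so the operands share no set bits. */
uint32_t with_or(uint32_t x) {
    return (x << 2) | 3u;
}

/* With disjoint bits there are no carries, so ADD yields the same value. */
uint32_t with_add(uint32_t x) {
    return (x << 2) + 3u;
}
```

Either form can therefore stand in for the other in this situation, which is what allows a compiler to pick whichever is more convenient for analysis or for instruction selection.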