2011 Jul 26
[LLVMdev] XOR Optimization
...eq i32 %inc.3, 128
> br i1 %exitcond.3, label %while.end, label %while.body
>
> while.end: ; preds = %while.body
> ret void
>
>
>
> It is clear that we are able to fold all XORs into a single XOR, and the
> same happens to all SHLs and ORs.
> I am using -O3, but the code is not optimized, so I am assuming there is no
> optimization for this case. Am I correct?
The loop is being unrolled by a factor of 4. This breaks the artificial
dependence between loop iterations, and should yield a substantial improvement
on machin...
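In the IR quoted in this thread, the unrolled loop steps its induction variable by 4 (`%inc.3 = add i32 %0, 4`) and tests the exit condition against 128 only once per four copies of the body. The minimal Python sketch below illustrates that shape. The real loop body is elided in this excerpt, so `body` is a hypothetical stand-in that only echoes the XOR/SHL/OR mix under discussion. The sketch shows that the rolled and 4x-unrolled loops compute the same values. No remainder loop is needed because 128 is a multiple of 4.

```python
TRIP_COUNT = 128  # matches `icmp eq i32 %inc.3, 128` in the quoted IR


def body(i: int) -> int:
    """Hypothetical stand-in for the elided loop body (XOR, SHL, OR)."""
    return ((i ^ 0x5A) << 1) | 1


def rolled() -> list[int]:
    """One body copy and one exit test per iteration."""
    out = []
    i = 0
    while i != TRIP_COUNT:
        out.append(body(i))
        i += 1
    return out


def unrolled_by_4() -> list[int]:
    """Four body copies per trip, one exit test per four iterations."""
    out = []
    i = 0
    while True:
        out.append(body(i))
        out.append(body(i + 1))
        out.append(body(i + 2))
        out.append(body(i + 3))
        i += 4  # mirrors `%inc.3 = add i32 %0, 4`
        if i == TRIP_COUNT:  # only valid because 128 % 4 == 0
            break
    return out
```

With the four copies in a single basic block, later passes can look at all of them together. Whether they actually fold anything depends on the real body.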
2016 Jan 11
Some llvm questions (for tgsi backend)
...mnipotent char", !10, i64 0}
!10 = !{!"Simple C/C++ TBAA"}
And the "tgsi" looks like this:
.text
.file "/home/hans/foo.cl"
.globl test_kern
test_kern:
BGNSUB
MOVis TEMP1x, 0
CAL _Z13get_global_idj
SHLs TEMP1y, TEMP1x, 7
LOADiis TEMP1z, [4]
UADDs TEMP1y, TEMP1z, TEMP1y
SHLs TEMP1x, TEMP1x, 2
LOADiis TEMP1z, [0]
UADDs TEMP1x, TEMP1z, TEMP1x
LOADgis TEMP1x, [TEMP1x]
INEGs TEMP1x, TEMP1x
LOADgis TEMP1z, [TEMP1y]
UADDs TE...
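In this listing, the two SHLs appear to scale an array index. TEMP1x seems to hold the result of `get_global_id` (called after `MOVis TEMP1x, 0`, presumably passing dimension 0). `SHLs ..., 2` then multiplies it by 4 and `SHLs ..., 7` by 128, and each product is added to a base value loaded from `[0]` or `[4]`. The sketch below assumes those two `LOADiis` reads fetch kernel-argument base addresses and that the arithmetic is unsigned 32-bit. The function and parameter names are hypothetical and are not part of the tgsi backend.

```python
MASK = 0xFFFFFFFF  # assume 32-bit unsigned arithmetic, as UADDs/SHLs suggest


def element_addresses(gid: int, base0: int, base4: int) -> tuple[int, int]:
    """Hypothetical model of the two address computations in test_kern.

    base0 / base4 stand in for the values loaded by `LOADiis ..., [0]` and
    `LOADiis ..., [4]`; gid stands in for the result of get_global_id(0).
    """
    addr_x = (base0 + ((gid << 2) & MASK)) & MASK  # SHLs by 2 then UADDs: base0 + gid * 4
    addr_y = (base4 + ((gid << 7) & MASK)) & MASK  # SHLs by 7 then UADDs: base4 + gid * 128
    return addr_x, addr_y
```

Under these assumptions, the loads through `[TEMP1x]` and `[TEMP1y]` read 4-byte and 128-byte-strided elements, respectively.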
2011 Jul 26
[LLVMdev] XOR Optimization
...ond.3, label %while.end, label %while.body
> >
> > while.end: ; preds = %while.body
> > ret void
> >
> >
> >
> > It is clear that we are able to fold all XORs into a single XOR, and the
> > same happens to all SHLs and ORs.
> > I am using -O3, but the code is not optimized, so I am assuming there is no
> > optimization for this case. Am I correct?
>
> The loop is being unrolled by a factor of 4. This breaks the artificial
> dependence between loop iterations, and should yield a subs...
2016 Jan 12
Some llvm questions (for tgsi backend)
...he "tgsi" looks like this:
>>
>> .text
>> .file "/home/hans/foo.cl"
>> .globl test_kern
>> test_kern:
>> BGNSUB
>> MOVis TEMP1x, 0
>> CAL _Z13get_global_idj
>> SHLs TEMP1y, TEMP1x, 7
>> LOADiis TEMP1z, [4]
>> UADDs TEMP1y, TEMP1z, TEMP1y
>> SHLs TEMP1x, TEMP1x, 2
>> LOADiis TEMP1z, [0]
>> UADDs TEMP1x, TEMP1z, TEMP1x
>> LOADgis TEMP1x, [TEMP1x]
>> INEGs TEM...
2016 Jan 11
Some llvm questions (for tgsi backend)
..."Simple C/C++ TBAA"}
>
> And the "tgsi" looks like this:
>
> .text
> .file "/home/hans/foo.cl"
> .globl test_kern
> test_kern:
> BGNSUB
> MOVis TEMP1x, 0
> CAL _Z13get_global_idj
> SHLs TEMP1y, TEMP1x, 7
> LOADiis TEMP1z, [4]
> UADDs TEMP1y, TEMP1z, TEMP1y
> SHLs TEMP1x, TEMP1x, 2
> LOADiis TEMP1z, [0]
> UADDs TEMP1x, TEMP1z, TEMP1x
> LOADgis TEMP1x, [TEMP1x]
> INEGs TEMP1x, TEMP1x
> LOADgis TE...
2011 Jul 26
[LLVMdev] XOR Optimization
"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are
not folded into 1."
I don't see why the unrolling explains it.
"I think he is trying to say this expression generated by unrolling by a
factor of 4 can indeed be folded into a single XOR, SHL and OR. "
Precisely. The code generated by unrolling can be folded into a sin...
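The claim that the unrolled code can be folded into a single XOR, SHL and OR can be illustrated with a toy peephole pass. The sketch below assumes a hypothetical straight-line chain of `(opcode, constant)` steps applied to one i32 value. It merges neighbouring steps that share an opcode: XORs combine by XOR-ing their constants, ORs by OR-ing them, and SHLs by adding their amounts, but only while the total stays below 32. It is an illustration only, not how LLVM's InstCombine or Reassociate passes are implemented.

```python
MASK = 0xFFFFFFFF  # model i32 as unsigned 32-bit


def run(chain: list[tuple[str, int]], x: int) -> int:
    """Evaluate a chain of (op, const) steps on a 32-bit value."""
    for op, c in chain:
        if op == "xor":
            x ^= c
        elif op == "or":
            x |= c
        elif op == "shl":
            x = (x << c) & MASK
        else:
            raise ValueError(f"unsupported op: {op}")
    return x & MASK


def fold_adjacent(chain: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Merge neighbouring steps with the same opcode into a single step."""
    out: list[tuple[str, int]] = []
    for op, c in chain:
        if out and out[-1][0] == op:
            prev = out[-1][1]
            if op == "xor":
                out[-1] = (op, prev ^ c)
            elif op == "or":
                out[-1] = (op, prev | c)
            elif op == "shl" and prev + c < 32:
                out[-1] = (op, prev + c)
            else:
                out.append((op, c))  # e.g. shifts whose total would reach 32
        else:
            out.append((op, c))
    return out


chain = [("xor", 0x5), ("xor", 0x3), ("shl", 2), ("shl", 3), ("or", 0x10), ("or", 0x1)]
folded = fold_adjacent(chain)  # [("xor", 6), ("shl", 5), ("or", 0x11)]
```

If the unrolled copies interleave different opcodes, they would first need reassociating before this merge applies. For example, `(x ^ a) << s == (x << s) ^ (a << s)` (mod 2**32) moves a shift past an XOR.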
2011 Jul 26
[LLVMdev] XOR Optimization
Hi-
> I haven't seen a machine in which OR is faster or more energy-efficient than ADD. They're all done by the same ALU circuitry, which delays the pipeline by its worst-case path timing. So, for modern processor hardware purposes, OR is exactly equal to ADD. Transforming ADD to OR isn't strength reduction at all. Maybe this is beneficial only if you have a backend generating circuitry.
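The ADD-to-OR rewrite under debate rests on a simple identity. When two operands have no set bits in common, no carries occur, so `a + b` and `a | b` produce the same value. The minimal Python sketch below checks that condition under a 32-bit unsigned model. The helper name is hypothetical, and this is not LLVM's own code.

```python
from typing import Optional

MASK = 0xFFFFFFFF  # model i32 as unsigned 32-bit


def add_as_or(a: int, b: int) -> Optional[int]:
    """Return a | b when it provably equals a + b (mod 2**32), else None."""
    a &= MASK
    b &= MASK
    if a & b:  # a shared set bit could generate a carry, so OR may differ from ADD
        return None
    result = a | b
    assert result == (a + b) & MASK  # disjoint bits => no carries => same result
    return result
```

The identity only shows that the rewrite is safe. As the quoted message argues, it says nothing about whether OR is actually cheaper than ADD on a given target.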
2011 Jul 26
[LLVMdev] XOR optimization
...%inc.3 = add i32 %0, 4
%exitcond.3 = icmp eq i32 %inc.3, 128
br i1 %exitcond.3, label %while.end, label %while.body
while.end: ; preds = %while.body
ret void
It is clear that we are able to fold all XORs into a single XOR, and the
same happens to all SHLs and ORs.
I am using -O3, but the code is not optimized, so I am assuming there is no
optimization for this case. Am I correct?
If yes, I have a few other questions:
- Do you know of any other similar optimization that could do something
here but is not being triggered for some reason?
- Do you...