thr3ads.net - search: "shls"

2011 Jul 26

2

[LLVMdev] XOR Optimization

...eq i32 %inc.3, 128 > br i1 %exitcond.3, label %while.end, label %while.body > > while.end: ; preds = %while.body > ret void > > > > It is clear that we are able to fold all XORs into a single XOR, and the > same happens to all SHLs and ORs. > I am using -O3, but the code is not optimized, so I am assuming there is no > optimization for this case. Am I correct? The loop is being unrolled by a factor of 4. This breaks the artificial dependence between loop iterations, and should yield a substantial improvement on machin...

Some llvm questions (for tgsi backend)

2016 Jan 11

4

Some llvm questions (for tgsi backend)

...mnipotent char", !10, i64 0} !10 = !{!"Simple C/C++ TBAA"} And the "tgsi" looks like this: .text .file "/home/hans/foo.cl" .globl test_kern test_kern: BGNSUB MOVis TEMP1x, 0 CAL _Z13get_global_idj SHLs TEMP1y, TEMP1x, 7 LOADiis TEMP1z, [4] UADDs TEMP1y, TEMP1z, TEMP1y SHLs TEMP1x, TEMP1x, 2 LOADiis TEMP1z, [0] UADDs TEMP1x, TEMP1z, TEMP1x LOADgis TEMP1x, [TEMP1x] INEGs TEMP1x, TEMP1x LOADgis TEMP1z, [TEMP1y] UADDs TE...

[LLVMdev] XOR Optimization

2011 Jul 26

0

[LLVMdev] XOR Optimization

...ond.3, label %while.end, label %while.body > > > > while.end: ; preds = %while.body > > ret void > > > > > > > > It is clear that we are able to fold all XORs into a single XOR, and the > > same happens to all SHLs and ORs. > > I am using -O3, but the code is not optimized, so I am assuming there is > no > > optimization for this case. Am I correct? > > The loop is being unrolled by a factor of 4. This breaks the artificial > dependence between loop iterations, and should yield a subs...

Some llvm questions (for tgsi backend)

2016 Jan 12

1

Some llvm questions (for tgsi backend)

...he "tgsi" looks like this: >> >> .text >> .file "/home/hans/foo.cl" >> .globl test_kern >> test_kern: >> BGNSUB >> MOVis TEMP1x, 0 >> CAL _Z13get_global_idj >> SHLs TEMP1y, TEMP1x, 7 >> LOADiis TEMP1z, [4] >> UADDs TEMP1y, TEMP1z, TEMP1y >> SHLs TEMP1x, TEMP1x, 2 >> LOADiis TEMP1z, [0] >> UADDs TEMP1x, TEMP1z, TEMP1x >> LOADgis TEMP1x, [TEMP1x] >> INEGs TEM...

Some llvm questions (for tgsi backend)

2016 Jan 11

0

Some llvm questions (for tgsi backend)

..."Simple C/C++ TBAA"} > > And the "tgsi" looks like this: > > .text > .file "/home/hans/foo.cl" > .globl test_kern > test_kern: > BGNSUB > MOVis TEMP1x, 0 > CAL _Z13get_global_idj > SHLs TEMP1y, TEMP1x, 7 > LOADiis TEMP1z, [4] > UADDs TEMP1y, TEMP1z, TEMP1y > SHLs TEMP1x, TEMP1x, 2 > LOADiis TEMP1z, [0] > UADDs TEMP1x, TEMP1z, TEMP1x > LOADgis TEMP1x, [TEMP1x] > INEGs TEMP1x, TEMP1x > LOADgis TE...

Some llvm questions (for tgsi backend)

2016 Jan 11

0

Some llvm questions (for tgsi backend)

..."Simple C/C++ TBAA"} > > And the "tgsi" looks like this: > > .text > .file "/home/hans/foo.cl" > .globl test_kern > test_kern: > BGNSUB > MOVis TEMP1x, 0 > CAL _Z13get_global_idj > SHLs TEMP1y, TEMP1x, 7 > LOADiis TEMP1z, [4] > UADDs TEMP1y, TEMP1z, TEMP1y > SHLs TEMP1x, TEMP1x, 2 > LOADiis TEMP1z, [0] > UADDs TEMP1x, TEMP1z, TEMP1x > LOADgis TEMP1x, [TEMP1x] > INEGs TEMP1x, TEMP1x > LOADgis TE...

[LLVMdev] XOR Optimization

2011 Jul 26

0

[LLVMdev] XOR Optimization

"The fact that the loop is unrolled explains why the XORs, SHLs, and ORs are not folded into 1." I dont see why the unrolling explains it. "I think he is trying to say this expression generated by unrolling by a factor of 4 can indeed be folded into a single XOR, SHL and OR. " Precisely. The code generated by unrolling can be folded into a sin...

[LLVMdev] XOR Optimization

2011 Jul 26

2

[LLVMdev] XOR Optimization

Hi- > I haven't seen a machine in which OR is faster than ADD nor more energy-efficient. They're all done by the same ALU circuitry which delays the pipeline by its worst-case path timing. So, for modern processor hardware purposes, OR is exactly equal to ADD. Transforming ADD to OR isn't strength reduction at all. Maybe this is beneficial only if you have a backend generating circuitry
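
As a side note on when ADD and OR are interchangeable at all: the two give the same result whenever the operands have no set bits in common, because no carry is ever produced. The sketch below is only an illustration of that identity; the function names are made up here and nothing in it says anything about relative speed.

```c
#include <stdint.h>

/* Returns 1 when a + b and a | b produce the same value.
   This holds exactly when a and b share no set bits (a & b == 0). */
int add_equals_or(uint32_t a, uint32_t b) {
    return (a + b) == (a | b);
}

/* (i << 2) always has its two low bits clear, so adding 3 cannot carry. */
uint32_t shifted_add(uint32_t i) {
    return (i << 2) + 3u;
}

/* Same computation written with OR; equal to shifted_add for every i. */
uint32_t shifted_or(uint32_t i) {
    return (i << 2) | 3u;
}
```

For instance, shifted_add(5) and shifted_or(5) both yield 23, while add_equals_or(1, 1) is 0 because the operands overlap in bit 0.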

[LLVMdev] XOR optimization

2011 Jul 26

2

[LLVMdev] XOR optimization

...%inc.3 = add i32 %0, 4 %exitcond.3 = icmp eq i32 %inc.3, 128 br i1 %exitcond.3, label %while.end, label %while.body while.end: ; preds = %while.body ret void It is clear that we are able to fold all XORs into a single XOR, and the same happens to all SHLs and ORs. I am using -O3, but the code is not optimized, so I am assuming there is no optimization for this case. Am I correct? If yes, I have a few other questions: - Do you know of any other similar optimization that could do something here but is not being triggered for some reason?? - Do you...
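
To make the folding claim concrete, here is a minimal C sketch. The loop body from the thread is elided in the snippet, so the constants and function names below are hypothetical stand-ins: each pair shows a chain of four operations, as a 4x unrolled loop might produce, next to the single operation it could be folded into.

```c
#include <stdint.h>

/* Hypothetical constants; the thread's actual values are not shown. */
#define K1 0x01u
#define K2 0x02u
#define K3 0x04u
#define K4 0x08u

/* Four XORs in sequence, as after unrolling by 4. */
uint32_t xor_chain(uint32_t x) {
    x ^= K1; x ^= K2; x ^= K3; x ^= K4;
    return x;
}

/* XOR is associative, so the constants combine into one. */
uint32_t xor_folded(uint32_t x) {
    return x ^ (K1 ^ K2 ^ K3 ^ K4);
}

/* Four ORs in sequence fold the same way. */
uint32_t or_chain(uint32_t x) {
    x |= K1; x |= K2; x |= K3; x |= K4;
    return x;
}

uint32_t or_folded(uint32_t x) {
    return x | (K1 | K2 | K3 | K4);
}

/* Four left shifts by 1 equal one shift by 4 (total stays below 32). */
uint32_t shl_chain(uint32_t x) {
    x <<= 1; x <<= 1; x <<= 1; x <<= 1;
    return x;
}

uint32_t shl_folded(uint32_t x) {
    return x << 4;
}
```

Each chain and its folded form agree for every 32-bit input, which is the kind of equivalence the poster expected the optimizer to exploit.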
