thr3ads.net - llvm dev - [llvm-dev] [LLVMdev] Question on BlendSplat Code - LLVM Commit 72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24 [Aug 2015]

Tyler Kenney

2015-Jul-30 14:29 UTC

[LLVMdev] Question on BlendSplat Code - LLVM Commit 72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24

An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150730/121669c8/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blend-splat-test.tar.gz
Type: application/octet-stream
Size: 11716 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150730/121669c8/attachment.obj>

Hal Finkel

2015-Aug-05 06:31 UTC

Hi Tyler,

First, as a procedural note, we always refer to commits by their svn revision
number. What is the svn revision number corresponding to
72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24 on the git mirror?

Second, as I read your message, you sound skeptical about the utility of the
transformation, even for x86. That seems unjustified, however, because running
your test case through llc -mtriple=x86_64 -mcpu=corei7-avx generates vpunpckhbw
and vpmovzxbw (which seem to be the two corresponding shuffle instructions).

That having been said, Chandler, could you please explain the general strategy
that x86 uses here? We obviously might wish to emulate it in the PowerPC
backend.

 -Hal

----- Original Message -----
> From: "Tyler Kenney" <tjkenney at us.ibm.com>
> To: chandlerc at gmail.com
> Cc: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>, llvmdev at cs.uiuc.edu
> Sent: Thursday, July 30, 2015 9:29:07 AM
> Subject: [LLVMdev] Question on BlendSplat Code - LLVM Commit 72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24
> 
> Hey Chandler,
> 
> 
> I'm working on a modification of the Power LLVM backend and I have
> some questions about the 'BlendSplat' code in
> SelectionDAG::GetVectorShuffle(). Basically, I'm wondering if you
> can give a little more detail about the goal of this function? It
> seems like your code is increasing the chances of the mask matching
> the subsequent checks for an identity shuffle or all LHS/RHS, which
> is clearly beneficial. Are you also claiming the altered mask is
> easier to match, even if it's not caught by those special cases?
> 
> 
> I attached a tarball with .cl & .ll source for one case where the
> altered mask seems much more difficult to match; the shufflevector
> instruction in the IR is a fairly straightforward interleave of two
> variables, but your blend code eliminates this pattern when building
> the dag. Like I said, I'm targeting power here, so I want the
> shufflevector instructions to match vmrghb & vmrglb. I'm assuming
> x86 has similar instructions? Is the altered mask in the .ps file
> really easier to match on x86? I attached the power assembly
> generated for this function with & without the blendsplat code and I
> think it's clear that, at least in the case of power, the altered
> mask is not preferable. Agreed? I'd like to understand the intent of
> your code better so I can either (a) figure out how to properly
> avoid modification of the mask in this case or (b) invert this
> modification in the power backend so we can match this to vmrg*
> instructions and avoid the use of vperm.
> 
> 
> Thanks,
> Tyler
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
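
The rewrite Tyler asks about can be illustrated with a small standalone sketch. It is not the code from the commit itself: `blendSplat`, `isIdentity` and `undefLane` are hypothetical names, and the rule is only the one described in the question. When an operand is a splat, every defined lane holds the same value, so a mask entry that reads that operand can be redirected to the lane at its own position. The sketch assumes -1 marks an undef mask entry, and that `offset` is 0 for the first operand and N for the second.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not LLVM's API. Redirects every mask entry that reads
// the splat operand starting at `offset` so that it reads the lane at its own
// position instead. undefLane[j] says whether lane j of that operand is undef.
std::vector<int> blendSplat(std::vector<int> mask, int offset,
                            const std::vector<bool> &undefLane) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i) {
    if (mask[i] < offset || mask[i] >= offset + n)
      continue;                 // entry does not read this operand (or is undef)
    if (undefLane[mask[i] - offset]) {
      mask[i] = -1;             // reading an undef lane yields undef
      continue;
    }
    if (!undefLane[i])
      mask[i] = i + offset;     // same value, but taken from the "own" lane
  }
  return mask;
}

// True if every defined entry selects the lane at its own position in the
// first operand, i.e. the shuffle is a no-op on that operand.
bool isIdentity(const std::vector<int> &mask) {
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] >= 0 && mask[i] != static_cast<int>(i))
      return false;
  return true;
}
```

Under these assumptions, a 4-lane shuffle of a splat with mask <0,0,0,0> is rewritten to <0,1,2,3>, which the identity check then catches. That is the benefit the question describes; whether the rewritten mask is also easier to match in other cases is what the thread goes on to discuss.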

Tyler Kenney

2015-Aug-05 21:02 UTC

Hal,

Sorry about that, svn rev = 229308.

I really can't comment on the utility of the transformation when targeting
x86 as I am not familiar with the instruction set. I am, however, skeptical of
the transformation's utility in this specific test case for *general*
compilation. Seeing as this code is in the target independent code generator,
I think whether or not this is a generally useful transformation is the
question here.

To be clear, the transformation in this case is:

        <0,32,2,33,4,34,6,35,8,36,10,37,12,38,14,39,16,40,18,41,20,42,22,43,24,44,26,45,28,46,30,47>

to:

        <0,33,2,35,4,37,6,39,8,41,10,43,12,45,14,47,16,40,18,41,20,42,22,43,24,44,26,45,28,46,30,47>

In my test case, this instruction is preceded by a shufflevector with the
following mask:

        <0,undef,1,undef,2,undef,3,undef,4,undef,5,undef,6,undef,7,undef,8,undef,9,undef,10,undef,11,undef,12,undef,13,undef,14,undef,15,undef>

When the blend-splat code is ifdef'd out, the dag combiner can combine these
two masks into a straightforward interleave.

Like you said, the x86 backend does still seem to do a good job recognizing
the merge/interleave operation here, so I can take a look at the x86 lowering
code to see how they handle this case. However, Chandler if you could still
offer some insight into why the transformed mask is preferred in this case,
I'd appreciate it.

Tyler
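
The effect on mask combining can be checked with a small standalone sketch. This is not the DAG combiner's code: `composeMasks` and `isInterleave` are hypothetical names, -1 stands for undef, and the sketch assumes the preceding shufflevector feeds the first operand of the 32-lane shuffle and that lane numbers of its input carry through unchanged.

```cpp
#include <vector>

// Hypothetical helper: fold an inner single-input shuffle into the outer
// two-input shuffle that consumes it as its first operand. Entries 0..n-1
// select from the first operand, n..2n-1 from the second, -1 is undef.
std::vector<int> composeMasks(const std::vector<int> &outer,
                              const std::vector<int> &inner) {
  const int n = static_cast<int>(outer.size());
  std::vector<int> result(n);
  for (int i = 0; i < n; ++i) {
    const int m = outer[i];
    // Entries that read the first operand are looked up through the inner
    // mask; entries that read the second operand (or are undef) are kept.
    result[i] = (m >= 0 && m < n) ? inner[m] : m;
  }
  return result;
}

// True if the mask is exactly <0, n, 1, n+1, 2, n+2, ...>, i.e. it alternates
// consecutive lanes of the first and second operands.
bool isInterleave(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int k = 0; k < n / 2; ++k)
    if (mask[2 * k] != k || mask[2 * k + 1] != n + k)
      return false;
  return true;
}
```

With the untransformed mask, composing through the preceding mask gives <0,32,1,33,...,15,47>, which `isInterleave` accepts. With the blend-splat mask the second entry becomes 33 rather than 32, so the same check fails even though the selected values are identical when the second operand is a splat.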
