thr3ads.net - llvm dev - [LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Steve Montgomery

2013-Nov-10 18:35 UTC

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

I forgot to mention that I used EXTRACT_ELEMENT in my backend to get the high
and low parts of an SDValue.

On 10 Nov 2013, at 17:50, Steve Montgomery <stephen.montgomery3 at
btinternet.com> wrote:
> I had a similar problem with a backend for the 68HC12 family which also has
no barrel shifter. Some 68HC12 CPUs support shift for just one of the 16-bit
registers and only support rotation on the 2 8-bit subregs of that 16-bit
register. That means the only practical solution for 32-bit shifts is to lower
to a libcall but my situation for 16-bit shifts sounds similar to yours for
32-bit shifts.
> 
> I defined pseudo-instructions in my InstrInfo.td that implemented 8 and
16-bit SHL, SRL and SRA with one variant of each for the shift count in a
register and another for an immediate shift count. Some shifts by immediate
values and all shifts by non-immediate values need to be implemented using a
loop so I used a custom inserter. I added a pass to expand the remaining shift
pseudos into shift and rotate sequences with masking where required (e.g.
replacing SRL x,7 with ROL x,2)
> 
> I've no idea if that's the best way but it seems to work for me.
> 
> Steve
> 
> On 9 Nov 2013, at 21:29, MathOnNapkins . <mathonnapkins at gmail.com>
wrote:
> 
>> Dear All,
>> 
>> I am trying to custom lower 32-bit ISD::SHL and SHR in a backend for
6502 family CPUs. The particular subtarget has 16-bit registers at most, so a
32-bit result is not legal. Normally, if you mark this as "Legal" or
"Expand", then it will expand the node into a more nodes as follows in
an example:
>> 
>> shl i32 %a , 2
>> 
>> => high_sdvalue = (or (shr %b, 14), (shl %c, 2) )
>> => low_sdvalue = (shl %b, 2)
>> 
>> where %b would be the lower 16 bits of %a, and %c is the upper 16 bits
of %a.
>> 
>> Since this target doesn't have a barrel shifter, it would be a huge
performance hit to construct the DAG in this way, due to the large number of
right shifts). Thus, I wish to mark ISD::SHL/SHR as "Custom" in
setOperationAction(), but I'm unsure of how to manually split the 32-bit
shiftee SDValue into two 16-bit values. The standard logic in
/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp and LegalizeIntegerTypes.cpp has a
member function called GetExpandedInteger(). Unfortunately, this is a private
method, so I can't gain access to this without hacking on code outside of my
target's (/lib/Target/x65 in this case). To be 100% clear, I actually
already did attempt this route, and when compiling received an error that
GetExpandedInteger was a private method.
>> 
>> What I would ultimately like to lower this to is probably a 16-bit
ISD::SHL_PARTS, which I'd then further lower to a sequence of pairs of {
shift left, rotate left through carry} instructions. Example of how that would
look in relation to the aforementioned example:
>> 
>> asl %b
>> rol %c
>> asl %b
>> rol %c
>> 
>> There doesn't seem to be a set of standard node types analogous to
ADD / ADDC / ADDE for shift operations, which is really what I'm after here.
>> 
>> I would very much appreciate any advice or insight you could provide on
this matter.
>> 
>> Thanks,
>> ~Dave Waggoner / MathOnNapkins
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

MathOnNapkins .

2013-Nov-11 06:18 UTC

head link

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

Hi Steve,

Thanks for confirming that EXTRACT_ELEMENT is something I can use. I had
seen it in the generated DAGs but was unsure whether I was "allowed"
to use
it, if that's the right word. I checked up on it more and indeed the
mainstream targets like ARM use that node type in custom lowering code, so
that should solve that. Perhaps in the future I might submit a patch for
TargetLowering to set whether the target has a barrel shifter, because I
believe that should alter the default emitted DAGs considerably. For
example, I tried lowering SHL %a, 5 to a series of SHL %a, 1's at one point
and the framework somewhat hilariously (to me) recombined them. If CodeGen
knew not to do that, it could save some effort for people trying to target
instruction sets with this limitation. The only current mainstream target
without a barrel shifter, though, is the MSP430, and it's still a bit
experimental (though a very helpful example for me at times).

Could I also ask whether you had to design your shift nodes with Glue or
Chain operands? I haven't taken extra that step yet, but I imagine it will
be necessary.

Sincerely,
~Dave Waggoner / MathOnNapkins


On Sun, Nov 10, 2013 at 1:35 PM, Steve Montgomery <
stephen.montgomery3 at btinternet.com> wrote:
> I forgot to mention that I used EXTRACT_ELEMENT in my backend to get the
> high and low parts of an SDValue.
>
> On 10 Nov 2013, at 17:50, Steve Montgomery <
> stephen.montgomery3 at btinternet.com> wrote:
>
> > I had a similar problem with a backend for the 68HC12 family which
also
> has no barrel shifter. Some 68HC12 CPUs support shift for just one of the
> 16-bit registers and only support rotation on the 2 8-bit subregs of that
> 16-bit register. That means the only practical solution for 32-bit shifts
> is to lower to a libcall but my situation for 16-bit shifts sounds similar
> to yours for 32-bit shifts.
> >
> > I defined pseudo-instructions in my InstrInfo.td that implemented 8
and
> 16-bit SHL, SRL and SRA with one variant of each for the shift count in a
> register and another for an immediate shift count. Some shifts by immediate
> values and all shifts by non-immediate values need to be implemented using
> a loop so I used a custom inserter. I added a pass to expand the remaining
> shift pseudos into shift and rotate sequences with masking where required
> (e.g. replacing SRL x,7 with ROL x,2)
> >
> > I've no idea if that's the best way but it seems to work for
me.
> >
> > Steve
> >
> > On 9 Nov 2013, at 21:29, MathOnNapkins . <mathonnapkins at
gmail.com>
> wrote:
> >
> >> Dear All,
> >>
> >> I am trying to custom lower 32-bit ISD::SHL and SHR in a backend
for
> 6502 family CPUs. The particular subtarget has 16-bit registers at most, so
> a 32-bit result is not legal. Normally, if you mark this as
"Legal" or
> "Expand", then it will expand the node into a more nodes as
follows in an
> example:
> >>
> >> shl i32 %a , 2
> >>
> >> => high_sdvalue = (or (shr %b, 14), (shl %c, 2) )
> >> => low_sdvalue = (shl %b, 2)
> >>
> >> where %b would be the lower 16 bits of %a, and %c is the upper 16
bits
> of %a.
> >>
> >> Since this target doesn't have a barrel shifter, it would be a
huge
> performance hit to construct the DAG in this way, due to the large number
> of right shifts). Thus, I wish to mark ISD::SHL/SHR as "Custom"
in
> setOperationAction(), but I'm unsure of how to manually split the
32-bit
> shiftee SDValue into two 16-bit values. The standard logic in
> /lib/CodeGen/SelectionDAG/LegalizeTypes.cpp and LegalizeIntegerTypes.cpp
> has a member function called GetExpandedInteger(). Unfortunately, this is a
> private method, so I can't gain access to this without hacking on code
> outside of my target's (/lib/Target/x65 in this case). To be 100%
clear, I
> actually already did attempt this route, and when compiling received an
> error that GetExpandedInteger was a private method.
> >>
> >> What I would ultimately like to lower this to is probably a 16-bit
> ISD::SHL_PARTS, which I'd then further lower to a sequence of pairs of
{
> shift left, rotate left through carry} instructions. Example of how that
> would look in relation to the aforementioned example:
> >>
> >> asl %b
> >> rol %c
> >> asl %b
> >> rol %c
> >>
> >> There doesn't seem to be a set of standard node types
analogous to ADD
> / ADDC / ADDE for shift operations, which is really what I'm after
here.
> >>
> >> I would very much appreciate any advice or insight you could
provide on
> this matter.
> >>
> >> Thanks,
> >> ~Dave Waggoner / MathOnNapkins
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131111/cc7dcca0/attachment.html>

Steve Montgomery

2013-Nov-11 08:23 UTC

head link

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

Hi Dave,

I wasn't sure whether I was "allowed" to use EXTRACT_ELEMENT
either but, as you say, it's used in other targets and it seems to work so
I'll stick with it.

No, I didn't use Glue or Chain operands because I'm lowering each shift
operation to a single node corresponding to a pseudo-instruction which is
expanded into Machineinstrs either by the custom inserter or by a later pass. I
think the MSP430 does it in a similar way, the difference being that it always
expands shifts using a loop whereas I chose to leave some to be expanded later
without use of a loop.

Steve

On 11 Nov 2013, at 06:18, MathOnNapkins . <mathonnapkins at gmail.com>
wrote:
> Hi Steve,
> 
> Thanks for confirming that EXTRACT_ELEMENT is something I can use. I had
seen it in the generated DAGs but was unsure whether I was "allowed"
to use it, if that's the right word. I checked up on it more and indeed the
mainstream targets like ARM use that node type in custom lowering code, so that
should solve that. Perhaps in the future I might submit a patch for
TargetLowering to set whether the target has a barrel shifter, because I believe
that should alter the default emitted DAGs considerably. For example, I tried
lowering SHL %a, 5 to a series of SHL %a, 1's at one point and the framework
somewhat hilariously (to me) recombined them. If CodeGen knew not to do that, it
could save some effort for people trying to target instruction sets with this
limitation. The only current mainstream target without a barrel shifter, though,
is the MSP430, and it's still a bit experimental (though a very helpful
example for me at times).
> 
> Could I also ask whether you had to design your shift nodes with Glue or
Chain operands? I haven't taken extra that step yet, but I imagine it will
be necessary.
> 
> Sincerely,
> ~Dave Waggoner / MathOnNapkins
> 
> 
> On Sun, Nov 10, 2013 at 1:35 PM, Steve Montgomery <stephen.montgomery3
at btinternet.com> wrote:
> I forgot to mention that I used EXTRACT_ELEMENT in my backend to get the
high and low parts of an SDValue.
> 
> On 10 Nov 2013, at 17:50, Steve Montgomery <stephen.montgomery3 at
btinternet.com> wrote:
> 
> > I had a similar problem with a backend for the 68HC12 family which
also has no barrel shifter. Some 68HC12 CPUs support shift for just one of the
16-bit registers and only support rotation on the 2 8-bit subregs of that 16-bit
register. That means the only practical solution for 32-bit shifts is to lower
to a libcall but my situation for 16-bit shifts sounds similar to yours for
32-bit shifts.
> >
> > I defined pseudo-instructions in my InstrInfo.td that implemented 8
and 16-bit SHL, SRL and SRA with one variant of each for the shift count in a
register and another for an immediate shift count. Some shifts by immediate
values and all shifts by non-immediate values need to be implemented using a
loop so I used a custom inserter. I added a pass to expand the remaining shift
pseudos into shift and rotate sequences with masking where required (e.g.
replacing SRL x,7 with ROL x,2)
> >
> > I've no idea if that's the best way but it seems to work for
me.
> >
> > Steve
> >
> > On 9 Nov 2013, at 21:29, MathOnNapkins . <mathonnapkins at
gmail.com> wrote:
> >
> >> Dear All,
> >>
> >> I am trying to custom lower 32-bit ISD::SHL and SHR in a backend
for 6502 family CPUs. The particular subtarget has 16-bit registers at most, so
a 32-bit result is not legal. Normally, if you mark this as "Legal" or
"Expand", then it will expand the node into a more nodes as follows in
an example:
> >>
> >> shl i32 %a , 2
> >>
> >> => high_sdvalue = (or (shr %b, 14), (shl %c, 2) )
> >> => low_sdvalue = (shl %b, 2)
> >>
> >> where %b would be the lower 16 bits of %a, and %c is the upper 16
bits of %a.
> >>
> >> Since this target doesn't have a barrel shifter, it would be a
huge performance hit to construct the DAG in this way, due to the large number
of right shifts). Thus, I wish to mark ISD::SHL/SHR as "Custom" in
setOperationAction(), but I'm unsure of how to manually split the 32-bit
shiftee SDValue into two 16-bit values. The standard logic in
/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp and LegalizeIntegerTypes.cpp has a
member function called GetExpandedInteger(). Unfortunately, this is a private
method, so I can't gain access to this without hacking on code outside of my
target's (/lib/Target/x65 in this case). To be 100% clear, I actually
already did attempt this route, and when compiling received an error that
GetExpandedInteger was a private method.
> >>
> >> What I would ultimately like to lower this to is probably a 16-bit
ISD::SHL_PARTS, which I'd then further lower to a sequence of pairs of {
shift left, rotate left through carry} instructions. Example of how that would
look in relation to the aforementioned example:
> >>
> >> asl %b
> >> rol %c
> >> asl %b
> >> rol %c
> >>
> >> There doesn't seem to be a set of standard node types
analogous to ADD / ADDC / ADDE for shift operations, which is really what
I'm after here.
> >>
> >> I would very much appreciate any advice or insight you could
provide on this matter.
> >>
> >> Thanks,
> >> ~Dave Waggoner / MathOnNapkins
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131111/25c1d6c0/attachment.html>

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Nov 2013 - [LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

[LLVMdev] [Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

Possibly Parallel Threads