Displaying 20 results from an estimated 27 matches for "armisd".
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
On 12 November 2010 17:52, Bob Wilson <bob.wilson at apple.com> wrote:
> I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):
Hi Bob,
I thought so... I'll get cracked and see if I can generate som...
2010 Nov 12
0
[LLVMdev] Simple NEON optimization
...uld I put this as a special case in NEON lowering or make it as
> part of an optimization pass? Which classes should I look first?
I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):
0. (You don't actually need to do anything, but I'm just mentioning...
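The fold described in this reply can be sketched with a toy model. This is not LLVM code: `Node`, `Opcode`, `is_zero_splat`, and `combine_vceq` are hypothetical stand-ins for the real SelectionDAG types, the `VCEQZ` opcode name is an assumption made for illustration, and a `VMOVIMM` with `imm == 0` is assumed to mean an all-zero vector.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for DAG opcodes (hypothetical, not LLVM's ARMISD enum).
enum class Opcode { Input, VMOVIMM, VCEQ, VCEQZ };

// Toy node: an opcode, up to two operands, and an immediate payload.
struct Node {
    Opcode op;
    const Node* lhs;
    const Node* rhs;
    uint64_t imm;
};

// True if the operand is a vector move of the immediate zero.
inline bool is_zero_splat(const Node* n) {
    return n != nullptr && n->op == Opcode::VMOVIMM && n->imm == 0;
}

// Fold VCEQ(x, VMOVIMM 0) into a single compare-against-zero node.
// Returns the node unchanged when the pattern does not match.
inline Node combine_vceq(const Node& n) {
    if (n.op != Opcode::VCEQ)
        return n;
    assert(n.lhs != nullptr && n.rhs != nullptr && "VCEQ takes two operands");
    if (is_zero_splat(n.rhs))
        return Node{Opcode::VCEQZ, n.lhs, nullptr, 0};
    // Equality is commutative, so a zero on the left folds the same way.
    if (is_zero_splat(n.lhs))
        return Node{Opcode::VCEQZ, n.rhs, nullptr, 0};
    return n;
}
```

In the real backend the equivalent logic would live in a target-specific combine hook in ARMISelLowering.cpp, as the message says; the toy only shows the shape of the pattern match.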
2010 Nov 12
2
[LLVMdev] Simple NEON optimization
Hi folks, me again,
So, I want to implement a simple optimization for a NEON case I've seen
recently, mostly as an exercise, but it also simplifies (just
a bit) the generated code.
The case is simple:
uint32x2_t x, res;
res = vceq_u32(x, vcreate_u32(0));
This will generate the following code:
; zero d16
vmov.i32 d16, #0x0
; load a
2009 Feb 17
1
[LLVMdev] ARM backend playing with alternative jump table implementations
....data
.LJTI9_0_0:
.long .LBB9_2
.long .LBB9_5
.long .LBB9_7
.long .LBB9_4
.long .LBB9_8
.text
The code for the lowering lives mostly in SDValue
ARMTargetLowering::LowerBR_JT
with some more heavy lifting done by ARMISD::WrapperJT
My attempts at this are marked in the code below.
My problem is to come up with the right item/value to put into the constant
pool.
SDValue ARMTargetLowering::LowerBR_JT(SDValue Op, SelectionDAG &DAG) {
SDValue Chain = Op.getOperand(0);
SDValue Table = Op.getOperand(1);
SDValu...
2010 Nov 12
0
[LLVMdev] Simple NEON optimization
...12, 2010, at 10:42 AM, Renato Golin wrote:
> On 12 November 2010 17:52, Bob Wilson <bob.wilson at apple.com> wrote:
>> I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):
>
> Hi Bob,
>
> I thought so... I'll get cracked and see i...
2019 Jan 04
2
Potential bug in SelectionDAGLegalize::ConvertNodeToLibcall()?
+ Eli Friedman as he often has very insightful comments regarding back end
changes.
On Fri, Jan 4, 2019 at 9:03 AM Nemanja Ivanovic <nemanja.i.ibm at gmail.com>
wrote:
> The changes seem fine to me. I don't think this is excessively intrusive
> and it accomplishes what is needed by targets whose call lowering can
> introduce illegal types.
> Adding Justin Bogner as the
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...G combine.
>
> Let me know if there is another, better supported, approach for this kind of problem.
>
> ** Motivating Example **
> The motivating example comes from the lowering of vector code on armv7.
> More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types.
>
> This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code.
>
> Att...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...if there is another, better supported, approach for this kind
>> of problem.
>>
>> ** Motivating Example **
>> The motivating example comes from the lowering of vector code on armv7.
>> More specifically, the build_vector node is lowered to a target specific
>> ARMISD::build_vector where all the parameters are bitcasted to floating
>> point types.
>>
>> This works well, unless the inserted bitcasts survive until instruction
>> selection. In that case, they incur moves between integer unit and floating
>> point unit that may result i...
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...g will be eliminated during DAG combine.
Let me know if there is another, better supported, approach for this kind of problem.
** Motivating Example **
The motivating example comes from the lowering of vector code on armv7.
More specifically, the build_vector node is lowered to a target specific ARMISD::build_vector where all the parameters are bitcasted to floating point types.
This works well, unless the inserted bitcasts survive until instruction selection. In that case, they incur moves between integer unit and floating point unit that may result in inefficient code.
Attached motivating_exa...
2013 Feb 02
0
[LLVMdev] Moving return value registers from MRI to return instructions
...e way.
I'll be updating the in-tree targets. Other targets need to make three changes:
1. The XXXretflag SDNode needs to be variadic like the call SDNodes are:
--- a/lib/Target/ARM/ARMInstrInfo.td
+++ b/lib/Target/ARM/ARMInstrInfo.td
@@ -117,7 +117,7 @@ def ARMcall_nolink : SDNode<"ARMISD::CALL_NOLINK", SDT_ARMcall,
SDNPVariadic]>;
def ARMretflag : SDNode<"ARMISD::RET_FLAG", SDTNone,
- [SDNPHasChain, SDNPOptInGlue]>;
+ [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...e.
>
> Let me know if there is another, better supported, approach for this kind
> of problem.
>
> ** Motivating Example **
> The motivating example comes from the lowering of vector code on armv7.
> More specifically, the build_vector node is lowered to a target specific
> ARMISD::build_vector where all the parameters are bitcasted to floating
> point types.
>
> This works well, unless the inserted bitcasts survive until instruction
> selection. In that case, they incur moves between integer unit and floating
> point unit that may result in inefficient code....
2009 Jun 03
5
[LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
...RMConstantPoolValue(".T", Num,
+ ARMCP::CPDataSegmentJumpTable);
+ const SDValue CPAddr = DAG.getTargetConstantPool(CPV, PTy, 4);
+
+ // An ARM idiosyncrasy: wrap each constant pool entry before accessing it
+ const SDValue Wrapper = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr);
+
+ // Load Table start from constant pool
+ const SDValue Table = DAG.getLoad(PTy, dl, DAG.getEntryNode(), Wrapper, NULL, 0);
+
+ // table entries are 4 bytes, so multiply index by 4
+ const SDValue ScaledIndex = DAG.getNode(ISD::MUL, dl, PTy, Index, DAG.getCons...
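The addressing arithmetic in this patch fragment (scale the index by the 4-byte entry size, then load from the table start plus that offset) can be sketched outside of SelectionDAG. The names `kEntrySize`, `scaled_offset`, and `load_entry` are hypothetical, and the table is modelled as a plain byte buffer rather than a real .data section.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Size of one jump-table entry in bytes (a 32-bit address).
const uint32_t kEntrySize = 4;

// Byte offset of entry `index` from the start of the table.
inline uint32_t scaled_offset(uint32_t index) {
    return index * kEntrySize;
}

// Load the 32-bit entry at `index` from a table viewed as raw bytes.
// The caller is assumed to have range-checked the index already.
inline uint32_t load_entry(const unsigned char* table_bytes,
                           uint32_t num_entries, uint32_t index) {
    assert(index < num_entries && "index out of range");
    uint32_t value;
    std::memcpy(&value, table_bytes + scaled_offset(index), sizeof value);
    return value;
}
```

A shift left by 2 would compute the same offset as the multiply; the multiply is used here only because the patch comment describes it that way.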
2010 Jan 15
4
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
Hi,
On ARMv6T2 this turns cttz into rbit, clz instead of the 4 instruction sequence it is now.
I'm not sure if adding RBIT to ARMISD and doing this optimization in the legalize pass is the best option, but the only better way I could think of doing it was to add a bitreverse intrinsic to llvm ir, which itself might not be the best option since bitreverse probably isn't too common.
Other targets that I know of that could pot...
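The identity behind this patch is that counting trailing zeros equals counting leading zeros of the bit-reversed word. Below is a minimal portable sketch of that identity, not the patch itself: `reverse_bits` and `count_leading_zeros` are hypothetical software stand-ins for the RBIT and CLZ instructions, and returning 32 for a zero input is an assumption chosen to mirror CLZ.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit word (software model of RBIT).
inline uint32_t reverse_bits(uint32_t x) {
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

// Count leading zero bits; yields 32 for x == 0 (software model of CLZ).
inline unsigned count_leading_zeros(uint32_t x) {
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// cttz expressed as the two-step sequence: reverse, then count leading zeros.
inline unsigned cttz_via_rbit_clz(uint32_t x) {
    return count_leading_zeros(reverse_bits(x));
}

// Straightforward reference: shift right until the low bit is set.
inline unsigned cttz_reference(uint32_t x) {
    if (x == 0)
        return 32;
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

// Self-check helper: both implementations must agree on x.
inline void check_agreement(uint32_t x) {
    assert(cttz_via_rbit_clz(x) == cttz_reference(x));
}
```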
2009 Jun 11
0
[LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
On Jun 8, 2009, at 2:42 PM, robert muth wrote:
> On Sun, Jun 7, 2009 at 11:53 PM, Evan Cheng <evan.cheng at apple.com>
> wrote:
>>
>> On Jun 7, 2009, at 6:59 AM, robert muth wrote:
>>
>>> On Sat, Jun 6, 2009 at 4:51 PM, Evan Cheng<evan.cheng at apple.com>
>>> wrote:
>>>> +cl::opt<std::string>
2009 Jun 08
2
[LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
On Sun, Jun 7, 2009 at 11:53 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>
> On Jun 7, 2009, at 6:59 AM, robert muth wrote:
>
>> On Sat, Jun 6, 2009 at 4:51 PM, Evan Cheng<evan.cheng at apple.com>
>> wrote:
>>> +cl::opt<std::string> FlagJumpTableSection("jumptable-section",
>>> +
2010 Nov 12
1
[LLVMdev] Simple NEON optimization
...10:42 AM, Renato Golin wrote:
>
>> On 12 November 2010 17:52, Bob Wilson <bob.wilson at apple.com> wrote:
>>> I recommend implementing this as a target-specific DAG combine optimization. We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM). You just need to teach the DAG combiner how to fold them together. Here's what you need to do (all of this code is in ARMISelLowering.cpp):
>>
>> Hi Bob,
>>
>> I thought so... I'll get c...
2010 Jan 15
1
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
...Chris Lattner <clattner at apple.com> wrote:
>
> On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
>
>> Hi,
>>
>> On ARMv6T2 this turns cttz into rbit, clz instead of the 4
>> instruction sequence it is now.
>>
>> I'm not sure if adding RBIT to ARMISD and doing this optimization in
>> the legalize pass is the best option, but the only better way I
>> could think of doing it was to add a bitreverse intrinsic to llvm
>> ir, which itself might not be the best option since bitreverse
>> probably isn't too common.
>
>...
2012 Feb 17
0
[LLVMdev] ARM/Thumb2/ISEL Need help tracing down a failing match: (HOW?)
...e78210, 0x1e78310<LD4[ConstantPool]> [ID=10]
Initial Opcode index to 24435
......
Morphed node: 0x1e7adf0: i32,ch = LDRi12 0x1e78210, 0x1e78010, 0x1e7aef0,
0x1e7b0f0, 0x1e4c030<Mem:LD4[ConstantPool]>
ISEL: Match complete!
ISEL: Starting pattern match on root node: 0x1e78210: i32 = ARMISD::Wrapper
0x1e77f10 [ID=9]
Initial Opcode index to 49796
OpcodeSwitch from 49799 to 49891
Skipped scope entry (due to false predicate) at index 49896, continuing
at 49914
Morphed node: 0x1e78210: i32 = MOVi32imm 0x1e77f10
ISEL: Match complete!
Here is the failing case in Thumb2 mode
ISE...
2010 Jan 15
0
[LLVMdev] [PATCH] Emit rbit, clz on ARM for __builtin_ctz
On Jan 14, 2010, at 10:13 PM, David Conrad wrote:
> Hi,
>
> On ARMv6T2 this turns cttz into rbit, clz instead of the 4
> instruction sequence it is now.
>
> I'm not sure if adding RBIT to ARMISD and doing this optimization in
> the legalize pass is the best option, but the only better way I
> could think of doing it was to add a bitreverse intrinsic to llvm
> ir, which itself might not be the best option since bitreverse
> probably isn't too common.
I haven't l...
2009 Jun 24
2
[LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
...RMConstantPoolValue(".T", Num,
+ ARMCP::CPDataSegmentJumpTable);
+ const SDValue CPAddr = DAG.getTargetConstantPool(CPV, PTy, 4);
+
+ // An ARM idiosyncrasy: wrap each constant pool entry before accessing it
+ const SDValue Wrapper = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr);
+
+ // Load Table start from constant pool
+ const SDValue Table = DAG.getLoad(PTy, dl, DAG.getEntryNode(), Wrapper, NULL, 0);
+
+ // table entries are 4 bytes, so multiply index by 4
+ const SDValue ScaledIndex = DAG.getNode(ISD::MUL, dl, PTy, Index, DAG.getCons...