Alex Susu via llvm-dev
2017-Feb-10 20:42 UTC
[llvm-dev] Specify special cases of delay slots in the back end
Hello.
I am making slow progress with the post-RA scheduler (PostRASchedulerList.cpp with
ScoreboardHazardRecognizer). The problem is that it does not advance to the next
available instruction when the overridden ScoreboardHazardRecognizer::getHazardType()
method returns NoopHazard; it gets stuck at the same instruction (a store in my runs).

Just to make sure: I am trying to use the post-RA (Register Allocation) scheduler to
avoid data hazards by inserting, if possible, other USEFUL instructions from the program
instead of (just) NOPs. Does this kind of out-of-order scheduling (e.g., using the
ScoreboardHazardRecognizer), which employs useful program instructions instead of NOPs,
work well with the post-RA scheduler?
Otherwise, if the post-RA scheduler only inserts NOPs, then since I have issues using
it, I could just as well insert the NOPs in the [Target]AsmPrinter.cpp module.
Thank you,
Alex
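
To make the distinction in the question above concrete, here is a minimal, self-contained sketch of a top-down list scheduler that prefers an independent instruction over a NOP when a store would otherwise directly follow the instruction producing its operand. It is a toy model, not LLVM code: the Op struct, the dependsOn() rule and the fillSlots() function are all hypothetical names invented for illustration, and the one-slot hazard rule is the one described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: not an LLVM type.
struct Op {
  std::string Text;
  int Def;               // register written, -1 if none
  std::vector<int> Uses; // registers read
  bool IsStore;
};

static bool reads(const Op &O, int Reg) {
  for (int U : O.Uses)
    if (U == Reg)
      return true;
  return false;
}

// True if Later must stay after Earlier (RAW, WAW, WAR; stores stay in order).
static bool dependsOn(const Op &Later, const Op &Earlier) {
  if (Earlier.Def != -1 && (reads(Later, Earlier.Def) || Later.Def == Earlier.Def))
    return true;
  if (Later.Def != -1 && reads(Earlier, Later.Def))
    return true;
  return Later.IsStore && Earlier.IsStore;
}

// Issue one op per slot. A store may not directly follow the op that wrote
// one of its operands; fill that slot with another ready op, else with a NOP.
std::vector<std::string> fillSlots(const std::vector<Op> &Block) {
  std::vector<bool> Done(Block.size(), false);
  std::vector<std::string> Out;
  int LastDef = -1; // register written in the previous slot, -1 if none
  std::size_t Remaining = Block.size();
  while (Remaining > 0) {
    int Pick = -1;
    for (std::size_t I = 0; I < Block.size() && Pick < 0; ++I) {
      if (Done[I])
        continue;
      bool Ready = true;
      for (std::size_t J = 0; J < I; ++J)
        if (!Done[J] && dependsOn(Block[I], Block[J]))
          Ready = false;
      bool Hazard = Block[I].IsStore && LastDef != -1 && reads(Block[I], LastDef);
      if (Ready && !Hazard)
        Pick = static_cast<int>(I);
    }
    if (Pick < 0) {
      // Nothing useful can issue: only possible right after a defining op.
      assert(LastDef != -1);
      Out.push_back("NOP");
      LastDef = -1; // the NOP occupies the slot, so the hazard is gone
      continue;
    }
    Done[Pick] = true;
    --Remaining;
    LastDef = Block[Pick].Def;
    Out.push_back(Block[Pick].Text);
  }
  return Out;
}
```

With the two-instruction example from the original post (R1 = R2 + R3 followed by LS[R10] = R1) this model emits a NOP between them; if an unrelated R4 = R5 + R6 follows in the same block, it is moved into the slot instead.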
On 2/10/2017 1:42 AM, Hal Finkel wrote:
>
> On 02/09/2017 04:46 PM, Alex Susu via llvm-dev wrote:
>> Hello.
>> Hal, thank you for the information.
>> I managed to get inspired from PPCHazardRecognizers.cpp. So I created my very simple
>> [Target]HazardRecognizers.cpp pass that is also derived from ScoreboardHazardRecognizer.
>> My class only implements the method getHazardType(), which checks if, as stated in my
>> first email, for example, I have a store instruction that is storing the value updated
>> by the instruction immediately above, which is NOT ok, since for my processor this is a
>> data hazard and in this case I have to insert a NOP in between by making getHazardType()
>> to:
>>     return NoopHazard; // this basically emits noop
>>
>> However, to my surprise, my very simple post-RA scheduler (using my class derived
>> from ScoreboardHazardRecognizer) is cycling FOREVER after this return NoopHazard, by
>> calling getHazardType() again and again for this SAME store instruction I found in the
>> first place with the data hazard problem. So, llc is no longer finishing - I have to
>> stop the process because of this strange behavior.
>> I was expecting after the first call to getHazardType() with the respective store
>> instruction (and return NoopHazard) that the scheduler would move forward to the other
>> instructions in the DAG/basic-block.
>
> It should emit a nop if all available instructions return NoopHazard.
>
>>
>> Do you have an idea what can I do to fix this problem?
>
> I'm not sure. I recall running into a situation like this years ago, but I don't recall
> now how I resolved it. Are you correctly handling the Stalls argument to getHazardType?
>
> -Hal
>
>>
>> Thank you very much,
>> Alex
>>
>> On 2/3/2017 10:25 PM, Hal Finkel wrote:
>>> Hi Alex,
>>>
>>> You can program a post-RA scheduler which will return NoopHazard in the appropriate
>>> circumstances. You can look at the PowerPC target (e.g.
>>> lib/Target/PowerPC/PPCHazardRecognizers.cpp) as an example.
>>>
>>> -Hal
>>>
>>>
>>> On 02/02/2017 05:03 PM, Alex Susu via llvm-dev wrote:
>>>> Hello.
>>>> I see there is little information on specifying instructions with delay slots.
>>>> So could you please tell me how can I insert NOPs (BEFORE or after an instruction)
>>>> or how to make an aware instruction scheduler in order to avoid miscalculations due to
>>>> the delay slot effect?
>>>>
>>>> More exactly, I have the following constraints on my (SIMD) processor:
>>>> - certain stores or loads, must be executed 1 cycle after the instruction
>>>> generating their input operands ends. For example, if I have:
>>>>     R1 = R2 + R3
>>>>     LS[R10] = R1 // this will not produce the correct result because it does not
>>>>                  // see the updated value of R1 from the previous instruction
>>>> To make this code execute correctly we need to insert a NOP:
>>>>     R1 = R2 + R3
>>>>     NOP // or other instruction to fill the delay slot
>>>>     LS[R10] = R1
>>>>
>>>> - a compare instruction requires to add a NOP after it, before the predicated
>>>> block (something like a conditional JMP instruction) starts.
>>>>
>>>>
>>>> Thank you,
>>>> Alex
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
Hal Finkel via llvm-dev
2017-Feb-10 21:33 UTC
[llvm-dev] Specify special cases of delay slots in the back end
Hi Alex,

All of this makes sense, but are you correctly handling the Stalls argument to
getHazardType? What are you doing with it?

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Alex Susu via llvm-dev
2017-Feb-11 12:39 UTC
[llvm-dev] Specify special cases of delay slots in the back end
Hello.
Hal, the problem I have is that the scheduler does not advance to the next available
instruction - it always picks the same store. This might be because I did not specify
the functional units, processor and instruction itineraries in a file like
[Target]Schedule.td.
Regarding the Stalls argument to my method
[Target]DispatchGroupSBHazardRecognizer::getHazardType(): I always get Stalls = 0.
This is no surprise, since PostRASchedulerList.cpp has only one call to it, in method
SchedulePostRATDList::ListScheduleTopDown():
    ScheduleHazardRecognizer::HazardType HT =
        HazardRec->getHazardType(CurSUnit, 0/*no stalls*/);
Let me state what I have added to my back end to enable scheduling with hazards:
- taking inspiration from lib/Target/PowerPC/PPCHazardRecognizers.h, I have created a
  class [Target]DispatchGroupSBHazardRecognizer : public ScoreboardHazardRecognizer (I use
  ScoreboardHazardRecognizer because I hope in the near future to make my class use
  USEFUL program instructions, scheduled "out of order", instead of NOPs to handle my
  data hazards), implementing for it only one method:
    HazardType getHazardType(SUnit *SU, int Stalls);
  In this method I check whether the current SU is a vector store and the previous
  instruction updates the register used by the store, which on my processor is a data
  hazard, in which case I return:
    return NoopHazard;
  and otherwise I return:
    return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
- I implemented 2 more methods in [Target]InstrInfo.cpp:
  - CreateTargetPostRAHazardRecognizer() to register the
    [Target]DispatchGroupSBHazardRecognizer()
  - insertNoop(), which inserts the target's NOP
- note that my vector (and scalar) instructions are inspired from the Mips back end,
  which has MSAInst (and MipsInst) with NoItinerary InstrItinClass. Currently I am not
  using a [Target]Schedule.td specifying functional units, processor and instruction
  itineraries. This might be a problem - I guess ScoreboardHazardRecognizer relies on
  this information.
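
One possible explanation for the endless loop, offered only as a guess and not confirmed anywhere in this thread: if getHazardType() decides purely from the fixed "previous instruction" of the store, its answer never changes, so the scheduler keeps being told NoopHazard even after it has emitted a NOP. A recognizer that records what was actually issued, through the EmitInstruction() and EmitNoop() hooks of the ScheduleHazardRecognizer interface, would clear the hazard once the NOP is in place. The sketch below is a self-contained toy model of that idea. Instr, StatefulRecognizer and schedule() are hypothetical names that only imitate the shape of the LLVM interface, and the driver is a simplified in-order loop, not ListScheduleTopDown().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these imitate the shape of LLVM's interface, they are not
// LLVM classes.
enum HazardType { NoHazard, NoopHazard };

struct Instr {
  std::string Name;
  int Def;      // register written, -1 if none
  int Use;      // register read, -1 if none
  bool IsStore;
};

// Remembers the register written by whatever was issued in the previous slot.
class StatefulRecognizer {
  int LastDef = -1; // -1: previous slot was a NOP, or nothing issued yet
public:
  HazardType getHazardType(const Instr &I, int /*Stalls*/) const {
    bool Hazard = I.IsStore && I.Use != -1 && I.Use == LastDef;
    return Hazard ? NoopHazard : NoHazard;
  }
  void EmitInstruction(const Instr &I) { LastDef = I.Def; }
  void EmitNoop() { LastDef = -1; } // the NOP fills the slot: hazard cleared
};

// Simplified in-order driver: ask the recognizer, emit NOPs while it objects.
std::vector<std::string> schedule(const std::vector<Instr> &Block) {
  StatefulRecognizer HR;
  std::vector<std::string> Out;
  for (const Instr &I : Block) {
    int Guard = 0;
    while (HR.getHazardType(I, /*Stalls=*/0) == NoopHazard) {
      ++Guard;
      // A recognizer that never updates its state would trip this guard.
      assert(Guard < 8 && "recognizer never clears the hazard");
      HR.EmitNoop();
      Out.push_back("NOP");
    }
    HR.EmitInstruction(I);
    Out.push_back(I.Name);
  }
  return Out;
}
```

In this model the Stalls argument is ignored, matching the observation that the post-RA list scheduler always passes 0; the state kept between calls is what lets the answer change after a NOP.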
In principle, should I maybe use the post-RA MI-scheduler instead of the standard
post-RA scheduler (maybe also
http://llvm.org/docs/doxygen/html/classllvm_1_1MachineSchedStrategy.html ) to deal with
my hazards?
According to http://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf, the
MI-scheduler also handles hazards, but I guess it is less documented, although the
AArch64 back end is using it.
Thank you,
Alex