thr3ads.net - llvm dev - [llvm-dev] Some questions about software pipeline in LLVM 4.0.0 [Jun 2017]


zhangqiang (CO) via llvm-dev

2017-May-25 08:33 UTC

[llvm-dev] Some questions about software pipeline in LLVM 4.0.0

Hi,

I have some questions about the implementation of Software pipeline in
MachinePipeliner.cpp.

First, in the Hexagon backend, several passes run between MachinePipeliner and
the register allocator, such as PHI elimination, two-address conversion, and
register coalescing. These may change instructions or insert new ones (for
example 'copy') in the MBB, so the swp kernel loop may be destroyed by them.
Why not put MachinePipeliner just before the register allocation pass, as gcc's
modulo scheduler does? Is it in order to keep SSA form?
I found a lot of code that processes PHI nodes in MachinePipeliner.cpp, so I
think moving MachinePipeliner to just before regalloc would simplify the
data/resource dependency graph for SMS.

Another question: in gcc there is a basic-block flag, BB_DISABLE_SCHEDULE,
which SMS uses to prevent other schedulers from messing with the loop
schedule. Where can I find a similar flag in LLVM to prevent the machine
scheduler from touching the kernel loop?
I have debugged some swp cases (Hexagon) and found that the machine scheduler
re-schedules the SMS kernel loop. Why not add such a flag?


Best Regards,

Thanks

陳韋任 via llvm-dev

2017-May-31 11:53 UTC


Hi Brendon,

  Maybe you can explain the reason behind current SMS implementation
somehow? :-)

Regards,
chenwj




-- 
Wei-Ren Chen (陳韋任)
Homepage: https://people.cs.nctu.edu.tw/~chenwj

Brendon Cahoon via llvm-dev

2017-Jun-01 00:00 UTC


Hi - I replied to the original sender only by mistake. Sorry about that.

 

When we started working on the pipeliner, and added it before the scheduler,
we were also concerned that the scheduler or other passes would undo the
work of the pipeliner. The initial thought was that we would add information
(using metadata or some other way like you've suggested) to the basic block
to tell the scheduler not to schedule the block.  It turned out that, for us,
we never needed to do so.  It was pretty rare that the scheduler would
"undo" the work of the pipeliner. Actually, in the cases where it did, it
turned out to be a problem with the scheduler, since it wasn't making good
decisions.

 

In general, most of the extra copies that are added prior to the register
allocator are eliminated.  Certainly, there are some real copies that end up
being generated, but I think it's better to exclude the copies from the
schedule since most will be eliminated.  Otherwise, including the copies in
the schedule will require resources that may never be used, which is worse
in my opinion.
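
As a rough illustration of the resource argument (a sketch that is not part of
the original message; the machine model, the unit counts, and the `res_mii`
helper are all assumptions made up for the example), consider how reserving
slots for copies can raise the resource-constrained minimum initiation
interval of a toy kernel:

```python
import math


def res_mii(op_counts, unit_counts):
    """Resource-constrained minimum initiation interval for a toy machine.

    op_counts:   resource name -> operations per iteration that need it
    unit_counts: resource name -> number of such units available per cycle
    """
    return max(math.ceil(n / unit_counts[r]) for r, n in op_counts.items())


# Hypothetical machine with two ALU slots and one load/store slot.
units = {"alu": 2, "mem": 1}

# Kernel as the pipeliner sees it on SSA form: no copies yet.
without_copies = {"alu": 4, "mem": 2}

# Same kernel if two PHI-related copies were scheduled as ALU operations.
with_copies = {"alu": 6, "mem": 2}

print(res_mii(without_copies, units))  # 2
print(res_mii(with_copies, units))     # 3
```

In this toy model the kernel grows from 2 to 3 cycles per iteration even if
both copies are later coalesced away, which is the cost being described.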

 

We decided to run the pipeliner on SSA form since the presence of the Phis
helps identify recurrences and other dependences.  Without the Phis, we need
another way to identify recurrences.  Also, if it's done just prior to
register allocation we need to re-generate the liveness information for all
the new virtual registers and CFG. Unfortunately, you're correct - there is
a lot of code that deals with Phis. The code that generates the Phis in the
swp kernel and epilogs is a mess and very complicated.  This portion of the
pipeliner really needs some attention to reduce the complexity and improve
readability.  This has been on my list for quite a while.
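
To make the point about Phis concrete, here is a minimal sketch (not code from
MachinePipeliner.cpp; the tuple-based loop model and the `find_recurrences`
helper are hypothetical) of how a PHI exposes a recurrence: the PHI's
loop-carried operand names the value that closes the cycle, so a walk along
def-use edges from the PHI's result to that value yields the recurrence chain.

```python
def find_recurrences(body):
    """Return one def-use chain per PHI that closes a loop-carried cycle.

    body: list of (dest, opcode, operands) in program order. For a PHI the
    operands are (value_from_preheader, value_from_previous_iteration).
    """
    # Def-use edges within one iteration; edges into PHIs are loop-carried
    # and are deliberately left out.
    users = {}
    for dest, opcode, operands in body:
        if opcode == "PHI":
            continue
        for op in operands:
            users.setdefault(op, []).append(dest)

    def path(src, dst, seen):
        # Depth-first search from src to dst along def-use edges.
        if src == dst:
            return [src]
        for nxt in users.get(src, []):
            if nxt not in seen:
                seen.add(nxt)
                rest = path(nxt, dst, seen)
                if rest:
                    return [src] + rest
        return None

    recurrences = []
    for dest, opcode, operands in body:
        if opcode == "PHI":
            chain = path(dest, operands[1], {dest})
            if chain:
                recurrences.append(chain)
    return recurrences


# Toy loop: acc += load(ptr) * k; ptr += 4
body = [
    ("acc", "PHI", ("init", "acc.next")),
    ("ptr", "PHI", ("base", "ptr.next")),
    ("x", "LOAD", ("ptr",)),
    ("prod", "MUL", ("x", "k")),
    ("acc.next", "ADD", ("acc", "prod")),
    ("ptr.next", "ADD", ("ptr", "four")),
]

print(find_recurrences(body))
# [['acc', 'acc.next'], ['ptr', 'ptr.next']]
```

Once the Phis are gone, the same cycles have to be recovered some other way,
for example by tracking which register definitions reach the top of the loop
from the back edge.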

 

While I think we could move the location of the pipeliner, I don't think the
extra work to do so would provide much benefit. In general, we've been able
to work around the cases when extra copies or instructions are added, or
when the scheduler messes up the kernel.  Also, for Hexagon, there are many
passes that run after the register allocator that deal with scheduling. If
you have specific cases where you're seeing a problem, it would be
interesting to take a look at them.

 

Thanks,

Brendon

 



Ehsan Amiri via llvm-dev

2017-Jun-19 06:55 UTC


Hi Brendon

> Certainly, there are some real copies that end up being generated, but I
> think it's better to exclude the copies from the schedule since most will
> be eliminated.

I was wondering what the cause of the real copies being generated was, in your
experience. Something I noticed when experimenting with LLVM on our
out-of-tree backend is that some copy instructions are generated **because
of** modulo scheduling.

For example before modulo scheduling I have

%vreg6<def> = PHI %vreg23, <BB#1>, %vreg17
%vreg25<def> = INSN1 %vreg1, %vreg6;
%vreg26<def> = INSN1 %vreg2, %vreg6     <-- same opcode as previous insn
%vreg17<def> = INSN2 %vreg6, %vreg5;
So for the phi node here, if we do phi elimination and register coalescing, we
won't have any copy insn left. But after modulo scheduling the instructions
above, now appear like this:

%vreg73<def> = PHI %vreg59, <BB#5>, %vreg62, <BB#6>;
%vreg61<def> = INSN1 %vreg1, %vreg73;
%vreg62<def> = INSN2 %vreg73, %vreg5;
%vreg64<def> = INSN1 %vreg2, %vreg73;

Now if you look right after the third insn after modulo scheduling, both vreg73
and vreg62 are live here. So when we remove the corresponding phi instruction,
we end up with a copy instruction that cannot be removed by register coalescing.
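
The interference can be checked with a small sketch (a simplification that is
not how LLVM's register coalescer is implemented; the tuple model and the
`phi_copy_is_coalescable` helper are hypothetical). It treats the PHI result
and the loop-carried value as coalescable only if the PHI result has no use
after the loop-carried value is defined:

```python
def phi_copy_is_coalescable(body, phi_def, carried_def):
    """True if phi_def is never read after carried_def has been written.

    body: list of (dest, opcode, operands) for the loop block without its
    PHIs, in scheduled order. An instruction that reads phi_def and writes
    carried_def is fine, because the read happens before the write.
    """
    def_pos = next(i for i, (dest, _, _) in enumerate(body) if dest == carried_def)
    last_use = max(i for i, (_, _, ops) in enumerate(body) if phi_def in ops)
    return last_use <= def_pos


# Order before modulo scheduling (PHI result vreg6, loop-carried vreg17).
before = [
    ("vreg25", "INSN1", ("vreg1", "vreg6")),
    ("vreg26", "INSN1", ("vreg2", "vreg6")),
    ("vreg17", "INSN2", ("vreg6", "vreg5")),
]

# Order after modulo scheduling (PHI result vreg73, loop-carried vreg62).
after = [
    ("vreg61", "INSN1", ("vreg1", "vreg73")),
    ("vreg62", "INSN2", ("vreg73", "vreg5")),
    ("vreg64", "INSN1", ("vreg2", "vreg73")),
]

print(phi_copy_is_coalescable(before, "vreg6", "vreg17"))   # True
print(phi_copy_is_coalescable(after, "vreg73", "vreg62"))   # False
```

In the second ordering the last INSN1 still reads vreg73 after vreg62 has been
written, so the two values overlap and need separate registers.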

IIUC, this is a byproduct of modulo scheduling. I have not really started tuning
modulo scheduling for our target, so I don't know whether this is a result of
modulo scheduling not being tuned. Have you seen this type of copy? Any
insights are greatly appreciated.

Thanks
Ehsan


