thr3ads.net - llvm dev - [LLVMdev] [RFC] Bundling support in the PostRA Scheduler [Aug 2012]

If this information is useful, please help other people find it:
Share via:

Ivan Llopard

2012-Jul-31 15:37 UTC

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Hi,

I'm working on a custom top-down post RA scheduler which builds bundles 
at the same time for our VLIW processor. I've borrowed most of the 
implementation from the resource priority queue implemented for the 
existent VLIW scheduler but applied to the context of MI scheduling. 
Basically, instructions that are likely to be bundled must be scheduled 
first (i.e. get higher priority).
This work should integrate very well with the current infrastructure and 
I'd like to contribute with a patch to add bundling capabilities to the 
current post RA scheduler if the LLVM community is interested :-) (May 
Hexagon need it as well?). It would also be a great opportunity for us 
to get feedback from the community about this.

We have a non-interlocked processor which relies on the post ra 
scheduler to emit cycle-accurate bundles (valid bundles without 
incurring in structural hazards). The construction of bundles outside 
the scope of post RA scheduling will require structural hazard 
information to work properly for processors without pipeline interlocks. 
For example, we can discover that an instruction can fit into the 
current packet (following a schema of linear code scanning, just like 
the current DFAPacketizer does) while in fact it cannot because of 
structural hazards. The two terms are strongly connected and necessary 
to build valid packets.
The concerns are mainly based on our non-interlocked processor, where 
cycle-accurate bundle emission is necessary. Other approaches/ideas are 
very welcome.
Do you have any plan for adding a more robust bundler into the current 
infrastructure ?

Ivan

Tom Stellard

2012-Jul-31 20:47 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

On Tue, Jul 31, 2012 at 05:37:27PM +0200, Ivan Llopard
wrote:> Hi,
> 
> I'm working on a custom top-down post RA scheduler which builds bundles
> at the same time for our VLIW processor. I've borrowed most of the 
> implementation from the resource priority queue implemented for the 
> existent VLIW scheduler but applied to the context of MI scheduling. 
> Basically, instructions that are likely to be bundled must be scheduled 
> first (i.e. get higher priority).
> This work should integrate very well with the current infrastructure and 
> I'd like to contribute with a patch to add bundling capabilities to the
> current post RA scheduler if the LLVM community is interested :-) (May 
> Hexagon need it as well?). It would also be a great opportunity for us 
> to get feedback from the community about this.
>
This could potentially be useful for the R600 backend, which targets VLIW
GPUs (AMD HD2XXX-HD6XXX), so I would be very interested in this.


-Tom
> We have a non-interlocked processor which relies on the post ra 
> scheduler to emit cycle-accurate bundles (valid bundles without 
> incurring in structural hazards). The construction of bundles outside 
> the scope of post RA scheduling will require structural hazard 
> information to work properly for processors without pipeline interlocks. 
> For example, we can discover that an instruction can fit into the 
> current packet (following a schema of linear code scanning, just like 
> the current DFAPacketizer does) while in fact it cannot because of 
> structural hazards. The two terms are strongly connected and necessary 
> to build valid packets.
> The concerns are mainly based on our non-interlocked processor, where 
> cycle-accurate bundle emission is necessary. Other approaches/ideas are 
> very welcome.
> Do you have any plan for adding a more robust bundler into the current 
> infrastructure ?
> 
> Ivan
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Jordy Potman

2012-Aug-06 08:48 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

On Tue, 2012-07-31 at 17:37 +0200, Ivan Llopard wrote:> Hi,
> 
> I'm working on a custom top-down post RA scheduler which builds bundles
> at the same time for our VLIW processor. I've borrowed most of the 
> implementation from the resource priority queue implemented for the 
> existent VLIW scheduler but applied to the context of MI scheduling. 
> Basically, instructions that are likely to be bundled must be scheduled 
> first (i.e. get higher priority).
> This work should integrate very well with the current infrastructure and 
> I'd like to contribute with a patch to add bundling capabilities to the
> current post RA scheduler if the LLVM community is interested :-) (May 
> Hexagon need it as well?). It would also be a great opportunity for us 
> to get feedback from the community about this.
> 
> We have a non-interlocked processor which relies on the post ra 
> scheduler to emit cycle-accurate bundles (valid bundles without 
> incurring in structural hazards). The construction of bundles outside 
> the scope of post RA scheduling will require structural hazard 
> information to work properly for processors without pipeline interlocks. 
> For example, we can discover that an instruction can fit into the 
> current packet (following a schema of linear code scanning, just like 
> the current DFAPacketizer does) while in fact it cannot because of 
> structural hazards. The two terms are strongly connected and necessary 
> to build valid packets.
> The concerns are mainly based on our non-interlocked processor, where 
> cycle-accurate bundle emission is necessary. Other approaches/ideas are 
> very welcome.
We would be interested in this since we are targeting a non-interlocked
VLIW processor as well. Currently we are using a custom post RA
scheduler implemented in a pre emit pass. However, we would like to
integrate better with and reuse more of the current LLVM infrastructure
such as the post RA scheduler framework and bundling support.

Jordy
> Do you have any plan for adding a more robust bundler into the current 
> infrastructure ?
> 
> Ivan
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Andrew Trick

2012-Aug-06 17:12 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com>
wrote:
> Hi,
> 
> I'm working on a custom top-down post RA scheduler which builds bundles
> at the same time for our VLIW processor. I've borrowed most of the 
> implementation from the resource priority queue implemented for the 
> existent VLIW scheduler but applied to the context of MI scheduling. 
> Basically, instructions that are likely to be bundled must be scheduled 
> first (i.e. get higher priority).
> This work should integrate very well with the current infrastructure and 
> I'd like to contribute with a patch to add bundling capabilities to the
> current post RA scheduler if the LLVM community is interested :-) (May 
> Hexagon need it as well?). It would also be a great opportunity for us 
> to get feedback from the community about this.
> 
> We have a non-interlocked processor which relies on the post ra 
> scheduler to emit cycle-accurate bundles (valid bundles without 
> incurring in structural hazards). The construction of bundles outside 
> the scope of post RA scheduling will require structural hazard 
> information to work properly for processors without pipeline interlocks. 
> For example, we can discover that an instruction can fit into the 
> current packet (following a schema of linear code scanning, just like 
> the current DFAPacketizer does) while in fact it cannot because of 
> structural hazards. The two terms are strongly connected and necessary 
> to build valid packets.
> The concerns are mainly based on our non-interlocked processor, where 
> cycle-accurate bundle emission is necessary. Other approaches/ideas are 
> very welcome.
> Do you have any plan for adding a more robust bundler into the current 
> infrastructure ?
> 
> Ivan
Hi Ivan,

Your description sounds fine to me. I assume you are totally decoupled from what
LLVM currently calls the "postRA" scheduling pass. Hopefully you
don't need anything in PostRASchedulerList.cpp.

Running your bundler as a preEmit pass is the cleanest approach. But if need be,
we can support preRA bundling at the time the MachineScheduler currently runs
(if enabled). TargetPassConfig allows you to substitute your own pass in place
of MachineScheduler. Passes that run after MachineScheduler are intended to
support instruction bundles. This feature is not extensively tested, so anyone
taking this approach would need to work with backend maintainers to get things
fixed.

-Andy

Sergei Larin

2012-Aug-09 19:03 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Ivan, 

  IMHO the current bundle infrastructure is in need of some additional
features - not only bundle handling utilities, but also support for liveness
computation, several (key) code transformation passes etc. (see my recent
post for conditional defs for instance). Does your back-end perform any
substantial transformations to the code _after_ the second pass (postRA)
scheduler and bundle formation/finalization? Does any of them might demand
bundle decomposition? If so, I really wonder how do you plan to address
that.

  We also have an MI based version of the current VLIW scheduler, that I am
planning to bring up to date with the tip, and upstream ASAP. But it is
preRA scheduler, and it does not preserve bundles (though it constructs them
and easily could keep them around). The issue is what I have said above,
many current passes do not understand bundles (well or at all) and accurate
liveness update has been an issue for me recently.

  I also tried to mess with PostRA scheduler to achieve similar goals, only
to find out that all the additional dependencies after RA make it virtually
impossible to produce high quality schedule, and obviously it is too late at
that point to address reg pressure via scheduling techniques, so I have put
that project on the back burner for a while.

  I guess what I am saying is that you are not alone in this :) ...and we do
need to formulate our requirements clearly... and repeat them regularly :)
Hope this helps.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu]
> On Behalf Of Ivan Llopard
> Sent: Tuesday, July 31, 2012 10:37 AM
> To: LLVM Developers Mailing List
> Subject: [LLVMdev] [RFC] Bundling support in the PostRA Scheduler
> 
> Hi,
> 
> I'm working on a custom top-down post RA scheduler which builds bundles
> at the same time for our VLIW processor. I've borrowed most of the
> implementation from the resource priority queue implemented for the
> existent VLIW scheduler but applied to the context of MI scheduling.
> Basically, instructions that are likely to be bundled must be scheduled
> first (i.e. get higher priority).
> This work should integrate very well with the current infrastructure
> and I'd like to contribute with a patch to add bundling capabilities to
> the current post RA scheduler if the LLVM community is interested :-)
> (May Hexagon need it as well?). It would also be a great opportunity
> for us to get feedback from the community about this.
> 
> We have a non-interlocked processor which relies on the post ra
> scheduler to emit cycle-accurate bundles (valid bundles without
> incurring in structural hazards). The construction of bundles outside
> the scope of post RA scheduling will require structural hazard
> information to work properly for processors without pipeline
> interlocks.
> For example, we can discover that an instruction can fit into the
> current packet (following a schema of linear code scanning, just like
> the current DFAPacketizer does) while in fact it cannot because of
> structural hazards. The two terms are strongly connected and necessary
> to build valid packets.
> The concerns are mainly based on our non-interlocked processor, where
> cycle-accurate bundle emission is necessary. Other approaches/ideas are
> very welcome.
> Do you have any plan for adding a more robust bundler into the current
> infrastructure ?
> 
> Ivan
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Pekka Jääskeläinen

2012-Aug-10 07:28 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Hi,

On 08/09/2012 10:03 PM, Sergei Larin wrote:>    I also tried to mess with PostRA scheduler to achieve similar goals,
only
> to find out that all the additional dependencies after RA make it virtually
> impossible to produce high quality schedule, and obviously it is too late
at
> that point to address reg pressure via scheduling techniques, so I have put
> that project on the back burner for a while.
Has anyone studied how much work it would be to implement
an integrated allocator/scheduler in LLVM now? It should be
an efficient solution to the VLIW RA/scheduling phase ordering
problem, but of course it's more complex to implement and
modularize. (I'm unsure if it's possible to exploit the current
register allocator code easily with it).

Another solution (which we use in TCE) is to use register renaming.
That is, rename schedule-restricting antideps introduced by
the pre-RA "on the fly" during scheduling. It's sort of a middle
ground solution that allows modularizing the proper RA to
a separate pass cleanly. However, its proper implementation goes
step-by-step towards the integrated allocator (smarter the renamer
becomes, more it looks like a proper RA).

BR,
-- 
Pekka

Ivan Llopard

2012-Aug-13 10:07 UTC

head link

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Hi all,

Thanks for your feed-backs :-)

@Andrew: In fact, I've reused most of the postRA list scheduler code and 
the resource priority queue. Every time it needs to move forward, either 
because of a res hazard (HazardRec) or an invalid combination of 
instructions in the current packet (DFA), it closes the current bundle 
and advances to the next cycle. The non-interlocked nature of our 
processor forces the bundling logic to live with the scheduling logic. 
We cannot build bundles without the scoreboard.

I also tried to build bundles as a preRA pass in order to reduce the 
register pressure (so the RA will take full advantage of the vliw 
architecture). I've ran into some problems such as the 
re-materialization one that we have discussed some time ago 
(http://llvm.1065342.n5.nabble.com/Instruction-bundles-before-RA-Rematerialization-td45900.html)
and the liveness re-computation while moving MI's into packets, where we 
have contributed with a patch. Other problems are related to our 
specific BE implementation which doesn't allow us to get good 
performances with the preRA bundler. The preRA bundling forces a 
starting point from which the postRA bundler must start with which may 
or may not be the optimal point. Without bundles before RA, the register 
pressure will be higher but the postRA bundler will get the freedom it 
needs to build better bundles. It seems to have a trade-off between reg 
pressure and bundling capabilities. We have chosen the latter.

@Sergei: It's good to see that you are working on it also :-). ATM, we 
don't do any transformation which may affect bundles. The postRA 
scheduler will schedule and packetize all MI's then we run the 
packetFinalization pass. Bundle decomposition is somewhat complex in 
non-interlocked processors where the DFA is not enough to rebuild them.

@Pekka: We don't care about anti-deps as far as the dependent MI's can 
fit into the same bundle. There are anti-dep breakers in llvm and the 
requested one runs together with the postRA scheduler.

The changes I'd like to propose are mainly based in:
- Adapting the current resource priority queue to work with MI's.
- And either add a new postRA SchedBundler or modify the existent one.

But I think I should wait until Sergei send upstream his MI based 
scheduler. Sergei, are you working on some resource priority queue at MI 
level?

Ivan

On 06/08/2012 19:12, Andrew Trick wrote:> On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com>
wrote:
>
>> Hi,
>>
>> I'm working on a custom top-down post RA scheduler which builds
bundles
>> at the same time for our VLIW processor. I've borrowed most of the
>> implementation from the resource priority queue implemented for the
>> existent VLIW scheduler but applied to the context of MI scheduling.
>> Basically, instructions that are likely to be bundled must be scheduled
>> first (i.e. get higher priority).
>> This work should integrate very well with the current infrastructure
and
>> I'd like to contribute with a patch to add bundling capabilities to
the
>> current post RA scheduler if the LLVM community is interested :-) (May
>> Hexagon need it as well?). It would also be a great opportunity for us
>> to get feedback from the community about this.
>>
>> We have a non-interlocked processor which relies on the post ra
>> scheduler to emit cycle-accurate bundles (valid bundles without
>> incurring in structural hazards). The construction of bundles outside
>> the scope of post RA scheduling will require structural hazard
>> information to work properly for processors without pipeline
interlocks.
>> For example, we can discover that an instruction can fit into the
>> current packet (following a schema of linear code scanning, just like
>> the current DFAPacketizer does) while in fact it cannot because of
>> structural hazards. The two terms are strongly connected and necessary
>> to build valid packets.
>> The concerns are mainly based on our non-interlocked processor, where
>> cycle-accurate bundle emission is necessary. Other approaches/ideas are
>> very welcome.
>> Do you have any plan for adding a more robust bundler into the current
>> infrastructure ?
>>
>> Ivan
> Hi Ivan,
>
> Your description sounds fine to me. I assume you are totally decoupled from
what LLVM currently calls the "postRA" scheduling pass. Hopefully you
don't need anything in PostRASchedulerList.cpp.
>
> Running your bundler as a preEmit pass is the cleanest approach. But if
need be, we can support preRA bundling at the time the MachineScheduler
currently runs (if enabled). TargetPassConfig allows you to substitute your own
pass in place of MachineScheduler. Passes that run after MachineScheduler are
intended to support instruction bundles. This feature is not extensively tested,
so anyone taking this approach would need to work with backend maintainers to
get things fixed.
>
> -Andy

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Aug 2012 - [LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Maybe Matching Threads