thr3ads.net - llvm dev - [LLVMdev] Scheduler Roadmap [May 2012]


Sergei Larin

2012-May-11 18:28 UTC

[LLVMdev] Scheduler Roadmap

Dave, 

  Thank you for your interest. Please see my replies below. Sorry that my
terminology is not as crisp as Andy's, but I think you can see what I mean.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: dag at cray.com [mailto:dag at cray.com]
> Sent: Friday, May 11, 2012 12:14 PM
> To: Sergei Larin
> Cc: 'Andrew Trick'; 'Hal Finkel'; wrf at cray.com; 'LLVM Developers Mailing List'
> Subject: Re: [LLVMdev] Scheduler Roadmap
> 
> Sergei Larin <slarin at codeaurora.org> writes:
> 
> >   - We do need to have a way to assign bundles much earlier than we
> > do now.
> 
> Yeah, I can imagine why this would be useful.
> 
> > And it needs to be intertwined with scheduling (Bundler currently
> > reuses a good chunk of scheduler infrastructure).
> 
> Just to clarify, is the need due to the current bundling implementation
> of reusing scheduler infrastructure or is there a more fundamental
> reason the two should be tied together?  I can imagine some advantages
> of fusing the two but I'm no VLIW expert.
[Larin, Sergei] A little bit of both. Current bundler uses DAG dep builder
to facilitate its analysis, for that it actually instantiates full MI
scheduler... without scheduling.  Ideally scheduler itself should be able to
produce bundled code (since it has the best picture of machine resources and
instruction stream), but standalone bundler by itself might be needed to
re-bundle incrementally (which it does not do right now). In short - I see
bundler as a utility, not as a pass.
> 
> > It is also obvious that it will have adverse effect on all the
> > downstream passes.
> 
> How so?  Isn't the bundle representation supposed to be fairly
> transparent to passes that don't care about it?
[Larin, Sergei] Kind of. Once bundles are finalized, bundle header becomes a
new "super instruction", and if a pass does not need to look at individual
(MI) instructions, there will not be any difference for it. But if a pass
needs to deal with individual MIs, things get interesting. For one, we lack
API for moving/adding/removing individual MIs to/from finalized bundles. We
also lack API to move MIs between BBs in presence of bundles. Live Intervals
obviously do not work with bundles...
  Two, semantics (dependencies) within a bundle are parallel (think { r0 = r1;
r1 = r0 } in serial vs. parallel semantics) and if a pass needs to
"understand" it, it will need to be "taught" how to do it. This is where
incremental rebundling might come in handy. Fortunately we currently do
bundling fairly late, so it is not an issue yet.
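
The { r0 = r1; r1 = r0 } example can be made concrete with a small sketch. It is only an illustration: the two-register file and the names RegFile, Copy, runSerial and runParallel are hypothetical and are not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Toy register file: Regs[i] holds the value of register ri.
using RegFile = std::array<int, 2>;

// A register-to-register copy "rDst = rSrc" (hypothetical mini-ISA).
struct Copy {
  unsigned Dst;
  unsigned Src;
};

// Serial semantics: each copy sees the results of the copies before it.
RegFile runSerial(RegFile Regs, const std::vector<Copy> &Insts) {
  for (const Copy &C : Insts)
    Regs[C.Dst] = Regs[C.Src];
  return Regs;
}

// Parallel (bundle) semantics: every copy reads the register values that
// were current on entry to the bundle; the writes take effect together.
RegFile runParallel(RegFile Regs, const std::vector<Copy> &Bundle) {
  const RegFile Entry = Regs; // snapshot of the inputs
  for (const Copy &C : Bundle)
    Regs[C.Dst] = Entry[C.Src];
  return Regs;
}
```

Starting from r0 = 1, r1 = 2, the serial reading leaves both registers at 2, while the parallel reading swaps them. A pass that reorders or splits the two copies without knowing which reading applies would change the result.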
> 
> > It is further insulting due to the fact that bundling is trivial to do
> > during scheduling, but it goes hard against the original assumptions
> > made elsewhere.
> 
> Can you explain more about this?
[Larin, Sergei] The core of bundler is the DFA state machine, which also
_must_ be a part of any VLIW scheduler, so in my Hexagon VLIW "custom"
scheduler (I actually have two - SDNode and MI based) I virtually __create__
bundles, but discard them at the end of pass, only to recreate them again
later in the standalone bundler. Second attempt is riddled with additional
false dependencies (anti, output etc.) introduced by the register
allocation, so bundling quality is affected.
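
As a rough illustration of why a resource tracker inside the scheduler already yields bundles, here is a minimal sketch. It is not LLVM's DFAPacketizer: the issue-slot bitmask model, the rule that a reader cannot share a bundle with its producer, and the names Inst, SlotTracker and formBundles are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction: a bitmask of issue slots it may use, the
// register it writes and the register it reads (-1 if none).
struct Inst {
  unsigned SlotMask;
  int Def;
  int Use;
};

// Tracks which issue slots are taken in the bundle being formed. This is a
// hand-written stand-in for a generated DFA, not LLVM's implementation.
class SlotTracker {
  unsigned Used = 0;

public:
  void reset() { Used = 0; }

  // Claims the lowest free slot allowed by Mask; returns false if none.
  bool reserve(unsigned Mask) {
    unsigned Free = Mask & ~Used;
    if (Free == 0)
      return false;
    Used |= Free & (~Free + 1); // isolate the lowest set bit
    return true;
  }
};

// Greedy in-order bundling: an instruction joins the current bundle unless
// it reads a register defined inside that bundle or no slot is left.
// Returns the bundle index assigned to each instruction.
std::vector<int> formBundles(const std::vector<Inst> &Insts) {
  std::vector<int> BundleOf;
  std::vector<int> DefsInBundle;
  SlotTracker Slots;
  int Current = 0;
  for (const Inst &I : Insts) {
    bool DependsOnBundle = false;
    for (int D : DefsInBundle)
      if (I.Use >= 0 && I.Use == D)
        DependsOnBundle = true;
    if (DependsOnBundle || !Slots.reserve(I.SlotMask)) {
      // Close the bundle and start a new one with this instruction.
      ++Current;
      DefsInBundle.clear();
      Slots.reset();
      Slots.reserve(I.SlotMask);
    }
    if (I.Def >= 0)
      DefsInBundle.push_back(I.Def);
    BundleOf.push_back(Current);
  }
  return BundleOf;
}
```

The greedy lowest-slot choice can reject combinations that a real DFA would accept, so this only shows the shape of the idea: the same reserve/reset calls a scheduler makes while picking instructions also determine where bundle boundaries fall.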
> 
> > Re-bundling is also a distinct task that might need to be addressed in
> > this context.
[Larin, Sergei] I think above explanation covers this.
> 
> Rebundling is certainly useful.  I'm not sure what you mean by "in this
> context."
> 
> >   - We do need to have at least a distant plan for global scheduling.
> 
> Yes, definitely!
> 
> > BB scope is nice and manageable, but I can easily get several percent
> > of "missed" performance by simply implementing a pull-up pass at the
> > very end of code generation... meaning multiple opportunities were
> > lost earlier.  Current way to express and maintain state needed for
> > global scheduling remains to be improved.
> 
> We could also use a global scheduler in the medium-term though it's not
> absolutely critical.  A nice clean infrastructure to support global
> scheduling with multiple heuristics, etc. would be very valuable.  It
> would also be a lot of work.  :)
[Larin, Sergei] I will need to start doing this very soon to meet my
performance goals, but extensive discussion and generic support might take
some time to crystallize... but indeed critically needed.
> 
> >   - SW pipelining and scheduler interaction with it. When (not if :)
> > we will have a robust SW pipeliner it will likely take place before
> > first scheduling pass, and we do not want to "undo" some decisions
> > made there.
> 
> Right.  We (the LLVM community) will need ways to mark regions "don't
> schedule."
> 
>                               -Dave

Andrew Trick

2012-May-11 19:13 UTC

[LLVMdev] Scheduler Roadmap

Thanks for helping explain the infrastructure. One amendment...

On May 11, 2012, at 11:28 AM, Sergei Larin <slarin at codeaurora.org> wrote:
> [Larin, Sergei] Kind of. Once bundles are finalized, bundle header becomes a
> new "super instruction", and if a pass does not need to look at individual
> (MI) instructions, there will not be any difference for it. But if a pass
> needs to deal with individual MIs, things get interesting. For one, we lack
> API for moving/adding/removing individual MIs to/from finalized bundles. We
> also lack API to move MIs between BBs in presence of bundles. Live Intervals
> obviously do not work with bundles...
>  Two, semantics (dependencies) within a bundle are parallel (think { r0 =
> r1; r1 = r0 } in serial vs. parallel semantics) and if a pass needs to
> "understand" it, it will need to be "taught" how to do it. This is where
> incremental rebundling might come in handy. Fortunately we currently do
> bundling fairly late, so it is not an issue yet.

LiveIntervals should work for your bundles by giving each instruction in the
bundle the same slot index. Regalloc should do the right thing. There's a
proof-of-concept API in LiveIntervalAnalysis, handleMoveIntoBundle(), that you
should try to use. As I was saying in my last message, we may want to improve
that API as you actually start testing it.
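
The "same slot index" idea can be sketched without any LLVM types. The code below is an illustration only: Inst, its BundledWithPred flag and assignSlotIndexes are hypothetical names, and real slot indexes are not consecutive integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction record: true when the instruction belongs to the
// same bundle as the instruction immediately before it.
struct Inst {
  bool BundledWithPred;
};

// Numbers the instructions of one block so that every member of a bundle
// shares the index of the bundle's first instruction.
std::vector<unsigned> assignSlotIndexes(const std::vector<Inst> &Block) {
  std::vector<unsigned> Index;
  Index.reserve(Block.size());
  unsigned Next = 0;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (I > 0 && Block[I].BundledWithPred)
      Index.push_back(Index.back()); // reuse the bundle's index
    else
      Index.push_back(Next++);       // unbundled, or first in a bundle
  }
  return Index;
}
```

With one index per bundle, a live range that ends at a read inside the bundle and one that starts at a write inside the same bundle meet at a single point, which is the parallel reading of the bundle.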

-Andy

dag at cray.com

2012-May-11 20:24 UTC

[LLVMdev] Scheduler Roadmap

Sergei Larin <slarin at codeaurora.org> writes:

> Ideally scheduler itself should be able to produce bundled code (since
> it has the best picture of machine resources and instruction stream)

Ok, that's what I was imagining.

>> How so?  Isn't the bundle representation supposed to be fairly
>> transparent to passes that don't care about it?
>
> We also lack API to move MIs between BBs in presence of bundles.

Why would you want to do that?  Wouldn't you just move bundles across
BBs?  A rebundling pass could split out the individual MIs you want to
move if there are dependence issues.

> Live Intervals obviously do not work with bundles...

I don't see that as obvious.  I'd been imagining LiveIntervals to use a
generic bundle interface to get at the defs and uses of the MIs inside
it.  Is it just a matter of not having those interfaces yet?

> Two, semantics (dependencies) within a bundle are parallel (think { r0
> = r1; r1 = r0 } in serial vs. parallel semantics) and if a pass needs to
> "understand" it, it will need to be "taught" how to do it.

But if a pass needs to know about that, it is a special bundle-aware pass
and I wouldn't expect the bundle to be transparent.

> This is where incremental rebundling might come in handy. Fortunately
> we currently do bundling fairly late, so it is not an issue yet.

Right, but I'm glad you're thinking ahead.  :)

> [Larin, Sergei] The core of bundler is the DFA state machine, which also
> _must_ be a part of any VLIW scheduler, so in my Hexagon VLIW "custom"
> scheduler (I actually have two - SDNode and MI based) I virtually
> __create__ bundles, but discard them at the end of pass, only to recreate
> them again later in the standalone bundler. Second attempt is riddled with
> additional false dependencies (anti, output etc.) introduced by the
> register allocation, so bundling quality is affected.

They're only "false" in the sense of the register names being fixed at
that point.  It's essentially the same problem as trying to schedule for
latency after register allocation.  You're constrained by hazards
imposed by a reduced namespace.  If the scheduler-created bundles were
maintained, then the register allocator would have less freedom to
assign registers.  It's a classic scheduler/regalloc tradeoff.  Bundling
is "just" another fun detail, I think.  :)

>> We could also use a global scheduler in the medium-term though it's not
>> absolutely critical.  A nice clean infrastructure to support global
>> scheduling with multiple heuristics, etc. would be very valuable.  It
>> would also be a lot of work.  :)
>
> [Larin, Sergei] I will need to start doing this very soon to meet my
> performance goals, but extensive discussion and generic support might take
> some time to crystallize... but indeed critically needed.

Yes, I fully expect it will take a while to mature.  I'm really excited
that you're starting to look at this!

                               -Dave

Sergei Larin

2012-Aug-09 16:48 UTC

[LLVMdev] MI bundle liveness attributes

Hello everyone,

  Let me (re)present a question that might have previously been discussed,
but did not result in any code (AFAIK).

  How do we represent a _conditional_ assignment (def) in a bundle MI?

  More context - currently we expose internal def/use/kill information to a
bundle header - something like this:


BUNDLE %PC<imp-def>, %R0<imp-def>, %P0<imp-use,kill>, %R16<imp-use>
   * %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;
   * %P0<def> = CMPEQri %R16, 0;

  Here CMPEQri is a compare to a predicate register instruction, and
LDriuh_cdnNotPt is a _conditional_ load, which might or might not
take place based on the outcome of the compare... As such R0 might or might
not be defined in this bundle, which obviously changes the liveness update
process.

  My question: do we need another attribute along with isImplicit and
isEarlyClobber etc. to designate a conditional def? Furthermore, depending
on architectural details we might well have a conditional use as well... and
what about the individual (unbundled) def/use? Should this:

%R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;

...become this:

%R0<def-cond> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0;

or even:

%R0<def-cond> = LDriuh_cdnNotPt %P0<kill,internal>, %R16<use-cond>, 0;

  So, if I am missing something in the current implementation or an ongoing
discussion (and that is entirely possible since I am just back from
vacation), please let me know how to achieve this functionality, but if this
is something missing in the implementation, let's discuss how we want to
realize it.

  Thanks.

Sergei Larin

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

Arnold Schwaighofer

2012-Aug-09 18:53 UTC

[LLVMdev] MI bundle liveness attributes

Hi Sergei,

It seems to me that you can represent the semantics of a conditional
instruction by adding a use of the conditionally defined register to the
instruction.

The value of the output register of an instruction is either the value
of the instruction if it was conditionally executed or the value of the
output register before the instruction.

The Bundle would be:

BUNDLE %PC<imp-def>, %R0<imp-def>, %P0<imp-use,kill>, %R16<imp-use>,
    %R0<imp-use,kill>
      * %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0,
                          %R0<imp-use,kill>
      * %P0<def> = CMPEQri %R16, 0

The individual instruction would be:

   %R0<def> = LDriuh_cdnNotPt %P0<kill,internal>, %R16, 0,
                              %R0<imp-use,kill>;
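
The effect of the extra implicit use on liveness can be checked with a small sketch of the usual backward transfer function. The names Inst and liveBefore are hypothetical, and registers are plain strings here rather than LLVM operands.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical instruction summary: registers written and registers read.
// Implicit uses are listed in Uses like any other read.
struct Inst {
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Standard backward liveness step: LiveIn = (LiveOut - Defs) + Uses.
std::set<std::string> liveBefore(const Inst &I, std::set<std::string> Live) {
  for (const std::string &D : I.Defs)
    Live.erase(D);
  for (const std::string &U : I.Uses)
    Live.insert(U);
  return Live;
}
```

If R0 is live after the load, a plain def of R0 ends its live range at the load, which is wrong when the load may not execute. Listing R0 among the uses as well keeps it live before the instruction, so no separate conditional-def attribute is needed for this analysis.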

How would you use cond-def/uses? How would they change liveness?

Best,
Arnold

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.


Andrew Trick

2012-Aug-10 00:08 UTC

[LLVMdev] MI bundle liveness attributes

Hi Sergei. If an instruction conditionally writes R0 then I think it needs to
implicitly use R0 for proper liveness.

Andy 


Sergei Larin

2012-Oct-16 20:43 UTC

[LLVMdev] MI DAG constructor indeterminism

Andy, 

 

  This is less of a question but rather a status quo verification. 

 

   We currently have certain indeterminism in MI scheduler DAG construction
- it is introduced by the use of std::map/std::set during edge traversal.

Result - a random variation in SUnit edge order (which will remain fixed
thereafter). Logically, it is the same DAG, but topologically it is a
slightly different one, and if some algorithm is dependent on the order of
edge traversal, we can have performance and debugging indeterminism. The way
I have discovered it - VLIW scheduler can produce identical cost function
for a pair of SUs, making visitation order the tie breaker, which is not
deterministic per above discussion. For me it is trivial to fix, but I
wonder if this might become a source of well hidden issues in the future.
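
One possible way to remove the dependence on visitation order, sketched under assumptions: the message does not say how the containers are keyed or what the fix would be. The struct below is a toy with a stable NodeNum and a Cost field, and pickBest is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Toy scheduling unit: NodeNum is a stable number assigned in program
// order; Cost is the heuristic value, higher being better.
struct SUnit {
  unsigned NodeNum;
  int Cost;
};

// Picks the best candidate. Equal costs are resolved by the lower NodeNum,
// so the result does not depend on the order in which candidates arrive.
const SUnit *pickBest(const std::vector<const SUnit *> &Candidates) {
  const SUnit *Best = nullptr;
  for (const SUnit *SU : Candidates) {
    if (!Best || SU->Cost > Best->Cost ||
        (SU->Cost == Best->Cost && SU->NodeNum < Best->NodeNum))
      Best = SU;
  }
  return Best;
}
```

With an explicit tie-breaker, two runs that enumerate the same candidates in different orders still pick the same unit. The alternative is to make the enumeration order itself stable, for example by keying the containers on node numbers rather than on addresses.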

 

  I am at this time not proposing anything - a fix is definitely possible,
but I wonder what people think about it before I even consider this a bug.

 

Thanks.

 

Sergei

 

---

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

 
