thr3ads.net - llvm dev - [LLVMdev] Scheduling question (memory dependency) [Sep 2012]

William J. Schmidt

2012-Sep-21 18:04 UTC

[LLVMdev] Scheduling question (memory dependency)

On Fri, 2012-09-21 at 11:34 -0500, William J. Schmidt wrote:
> Hi Sergei,
> 
> Thanks for the response!  We just discovered there is likely a bug
> happening during post-RA list scheduling.  There's an invalid successor
> index in the scheduling graph that is probably supposed to be the
> missing arc.  Starting to investigate further now.  This is recorded in
> http://llvm.org/bugs/show_bug.cgi?id=13891.
That appears to have been a red herring; I believe the value of -1 is an
artificial dependency indicating the scheduling barrier at the end of
the group, or something along those lines.  The problem appears to be
that the load and store both return a value from
getUnderlyingObjectForInstr, but they are two different objects...

Thanks,
Bill
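Bill reads the dump's "ch SU(4294967295)" successor as -1. Assuming the node number is held in a 32-bit unsigned field (the dump suggests this, but the thread does not say so), the arithmetic checks out: -1 converted to a 32-bit unsigned integer is 2^32 - 1 = 4294967295. A minimal sketch; kBoundaryNodeNum and suLabel are hypothetical names, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sentinel node number: -1 stored in a 32-bit unsigned field.
// The conversion is well defined (modulo 2^32), giving 4294967295.
constexpr std::uint32_t kBoundaryNodeNum = static_cast<std::uint32_t>(-1);

// Format a node number in the SU(n) form used by the -debug dump.
std::string suLabel(std::uint32_t nodeNum) {
  return "SU(" + std::to_string(nodeNum) + ")";
}
```

So an unsigned node number that is "really" -1 prints as SU(4294967295), which fits Bill's reading of it as an artificial boundary node rather than a corrupt index.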
> 
> Thanks,
> Bill
> 
> On Fri, 2012-09-21 at 11:15 -0500, Sergei Larin wrote:
> > Hi Bill,
> > 
> >    Which scheduler do you use? MI or SDNode one? In either case the problem
> > is likely the same, but the cause might be in a different place...
> > 
> > The way I see it, you have an issue with the alias analyzer, not the scheduler.
> > When the scheduling DAG is constructed, AA is checked for pairs of memory-accessing
> > objects, and if no potential interference is flagged by the AA, the chain
> > edge is _not_ inserted. If that decision is wrong, you will end up with
> > well-hidden, randomly appearing bugs.
> > 
> >   So the question much more likely is: why does AA see these two objects as
> > not aliasing, and are they properly described and presented to it?
> > 
> >   Do the ld/bitcast have proper memory operands? Any flags on them? Does the
> > underlying memory object make sense?
> > 
> >   You can look at getUnderlyingObjectForInstr and MIsNeedChainEdge in the MI
> > scheduling framework to see what I mean.
> > 
> >   If you are still using the SDNode scheduling framework, it has very similar
> > functionality in slightly different code.
> > 
> >   Hope this helps.
> > 
> > Sergei
> > 
> > ---
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> > The Linux Foundation
> > 
> > > -----Original Message-----
> > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > On Behalf Of William J. Schmidt
> > > Sent: Friday, September 21, 2012 9:07 AM
> > > To: llvmdev at cs.uiuc.edu
> > > Subject: Re: [LLVMdev] Scheduling question (memory dependency)
> > > 
> > > Here's another data point that may be useful.  [Scheduling experts,
> > > please help! :) ]
> > > 
> > > If the two-byte bitfield is replaced by a two-byte struct (replace
> > > "short i:8" with "short i", etc.), the scheduler properly generates a
> > > dependency between the store and the load.  For this case, a GEP is
> > > used instead of a bitcast:
> > > 
> > > ------------------------------------------------------------------
> > > define void @_Z5check3fooj(%struct.foo* nocapture byval %f, i32 %i) noinline {
> > > entry:
> > >   %i1 = getelementptr inbounds %struct.foo* %f, i64 0, i32 0
> > >   %0 = load i16* %i1, align 2, !tbaa !0
> > > ------------------------------------------------------------------
> > > 
> > > One notable difference is the "!tbaa !0" decoration on the load.  I
> > > don't know whether this helps or not.  Later the lowered instructions
> > > look like:
> > > 
> > > ------------------------------------------------------------------
> > > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > 64B		%vreg0<def> = LHZ 0, <fi#-1>; mem:LD2[%i11] GPRC:%vreg0
> > >                 ...
> > > ------------------------------------------------------------------
> > > 
> > > Note the %i11 instead of %0 on the LHZ as another difference.  The
> > > scheduler then generates a dependency between the store and the load,
> > > and everything works properly.
> > > 
> > > Does this help tickle any memories?
> > > 
> > > Thanks,
> > > Bill
> > > 
> > > 
> > > On Thu, 2012-09-20 at 18:02 -0500, William J. Schmidt wrote:
> > > > Greetings,
> > > >
> > > > I'm investigating a bug in the PowerPC back end in which a load from a
> > > > storage address is being reordered prior to a store to the same
> > > > storage address.  I'm quite new to LLVM, so I would appreciate some
> > > > help understanding what I'm seeing from the dumps.  I assume that some
> > > > information is missing that would represent the memory dependency, but
> > > > I don't know what form that should take.
> > > >
> > > > Example source code is as follows:
> > > >
> > > > ----------------------------------------------------------------
> > > > extern "C" { int printf(const char *, ...); void exit(int);}
> > > > struct foo {
> > > >   short i:8;
> > > > };
> > > >
> > > > void check(struct foo f, short i) __attribute__((noinline)) {
> > > >   if (f.i != i) {
> > > >     short fi = f.i;
> > > >     printf("problem with %u != %u\n", fi, i);
> > > >     exit(0);
> > > >   }
> > > > }
> > > > ---------------------------------------------------------------
> > > >
> > > > The initial portion of the Clang output is:
> > > >
> > > > define void @_Z5check3foos(%struct.foo* nocapture byval %f, i16 signext %i) noinline {
> > > > entry:
> > > >   %0 = bitcast %struct.foo* %f to i16*
> > > >   %1 = load i16* %0, align 2
> > > >   ...
> > > > ---------------------------------------------------------------
> > > >
> > > > The code works OK at -O0.  At -O1, the first part of the generated
> > > > code is:
> > > >
> > > > ---------------------------------------------------------------
> > > > .L._Z5check3foos:
> > > > 	.cfi_startproc
> > > > # BB#0:                                 # %entry
> > > > 	mflr 0
> > > > 	std 0, 16(1)
> > > > 	stdu 1, -112(1)
> > > > .Ltmp1:
> > > > 	.cfi_def_cfa_offset 112
> > > > .Ltmp2:
> > > > 	.cfi_offset lr, 16
> > > > 	lha 5, 162(1)
> > > > 	sth 3, 162(1)
> > > >         ...
> > > > ---------------------------------------------------------------
> > > >
> > > > The problem here is that the incoming parameter in register 3 is
> > > > stored too late, after an attempt to load the value into register 5.
> > > >
> > > > Looking at dumps with -debug, I see the following:
> > > >
> > > > ---------------------------------------------------------------
> > > > ********** MACHINEINSTRS **********
> > > > # Machine code for function _Z5check3foos: Post SSA
> > > > Frame Objects:
> > > >   fi#-1: size=2, align=2, fixed, at location [SP+50]
> > > > Function Live Ins: %X3 in %vreg1, %X4 in %vreg2
> > > >
> > > > 0B	BB#0: derived from LLVM BB %entry
> > > > 	    Live Ins: %X3 %X4
> > > > 16B		%vreg2<def> = COPY %X4; G8RC_with_sub_32:%vreg2
> > > > 32B		%vreg1<def> = COPY %X3; G8RC:%vreg1
> > > > 48B		STH8 %vreg1<kill>, 0, <fi#-1>; mem:ST2[FixedStack-1] G8RC:%vreg1
> > > > 64B		%vreg4<def> = LHA 0, <fi#-1>; mem:LD2[%0] GPRC:%vreg4
> > > >                 ...
> > > > ---------------------------------------------------------------
> > > >
> > > > So far, so good.  When we get to list scheduling, not quite so good:
> > > >
> > > > ---------------------------------------------------------------
> > > > ********** List Scheduling **********
> > > > SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > >   # preds left       : 0
> > > >   # succs left       : 4
> > > >   # rdefs left       : 0
> > > >   Latency            : 3
> > > >   Depth              : 0
> > > >   Height             : 0
> > > >   Successors:
> > > >    antiSU(2): Latency=0
> > > >    antiSU(2): Latency=0
> > > >    ch  SU(5): Latency=0
> > > >    ch  SU(4294967295) *: Latency=0
> > > >
> > > > SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > >   # preds left       : 0
> > > >   # succs left       : 3
> > > >   # rdefs left       : 0
> > > >   Latency            : 5
> > > >   Depth              : 0
> > > >   Height             : 0
> > > >   Successors:
> > > >    out SU(3): Latency=1
> > > >    val SU(2): Latency=5
> > > >    ch  SU(5): Latency=0
> > > > ...
> > > > ---------------------------------------------------------------
> > > >
> > > > There is no dependency expressed between these two memory operations,
> > > > although they both access the stack address 162(X1).  The scheduler
> > > > then sees both instructions as ready, and chooses the load based on
> > > > critical path height:
> > > >
> > > > ---------------------------------------------------------------
> > > > *** Examining Available
> > > > Height 9: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > Height 4: SU(0):   STH8 %X3<kill>, 162, %X1; mem:ST2[FixedStack-1]
> > > > *** Scheduling [0]: SU(1):   %R5<def> = LHA 162, %X1; mem:LD2[%0]
> > > > ---------------------------------------------------------------
> > > >
> > > > The obvious questions are:  Why is there no dependence between these
> > > > two instructions?  And what needs to be done to ensure there is one?
> > > > My guess is that we somehow need to unify FixedStack-1 with %0, but
> > > > it's not clear to me how this would be accomplished.
> > > >
> > > > (The store is generated as part of SelectionDAGISel::LowerArguments
> > > > from lib/CodeGen/SelectionDAG/SelectionDAGBuilder, using the
> > > > PowerPC-specific code in lib/Target/PowerPC/PPCISelLowering.cpp.  The
> > > > load is generated directly from the "load" in the LLVM IR at some
> > > > other time.)
> > > >
> > > > Thanks very much for any help!
> > > >
> > > > Bill
> > > >
> > > 
> > > 
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
>
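The symptom in the original report (load at height 9, store at height 4, no edge between them, load scheduled first) can be reproduced in miniature. The sketch below is a hypothetical, much-simplified top-down list scheduler, not LLVM's ScheduleDAG code: a node's height is its longest latency-weighted path to the end of the DAG, and among ready nodes (all predecessors scheduled) the tallest is picked. The four-node DAG, its labels, and its latencies are invented for illustration; only the presence or absence of the store-to-load chain edge matters.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One scheduling unit: a label plus outgoing edges (successor index, latency).
struct SUnit {
  std::string name;
  std::vector<std::pair<int, int>> succs;
};

// Height: longest latency-weighted path from node n to any DAG exit (memoized).
int computeHeight(const std::vector<SUnit>& dag, int n, std::vector<int>& memo) {
  if (memo[n] >= 0) return memo[n];
  int h = 0;
  for (const auto& e : dag[n].succs)
    h = std::max(h, e.second + computeHeight(dag, e.first, memo));
  memo[n] = h;
  return h;
}

// Top-down list scheduling of an acyclic DAG: repeatedly pick the ready node
// with the greatest height (ties go to the lower index).
std::vector<int> listSchedule(const std::vector<SUnit>& dag) {
  const int n = static_cast<int>(dag.size());
  std::vector<int> memo(n, -1), predsLeft(n, 0), order;
  std::vector<bool> done(n, false);
  for (const SUnit& su : dag)
    for (const auto& e : su.succs) ++predsLeft[e.first];
  while (static_cast<int>(order.size()) < n) {
    int best = -1;
    for (int i = 0; i < n; ++i) {
      if (done[i] || predsLeft[i] != 0) continue;  // not ready yet
      if (best < 0 || computeHeight(dag, i, memo) > computeHeight(dag, best, memo))
        best = i;
    }
    done[best] = true;
    order.push_back(best);
    for (const auto& e : dag[best].succs) --predsLeft[e.first];
  }
  return order;
}

// Hypothetical DAG: 0 = store to the stack slot, 1 = load from the same slot,
// 2 = compare using the loaded value, 3 = branch on the compare.
std::vector<SUnit> makeDag(bool withChainEdge) {
  std::vector<SUnit> dag = {
      {"store", {}},
      {"load", {{2, 5}}},
      {"compare", {{3, 4}}},
      {"branch", {}},
  };
  if (withChainEdge) dag[0].succs.push_back({1, 0});  // store must precede load
  return dag;
}
```

Without the chain edge, listSchedule(makeDag(false)) puts the load first because it is the tallest ready node, even though it reads the slot the store has not written yet. Adding the single store-to-load edge makes the load unready until the store is scheduled, so the order becomes store, load, compare, branch. Height only ranks nodes that are already ready; correctness depends entirely on the edges.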

William J. Schmidt

2012-Sep-21 18:57 UTC

OK, finally found it.  The AliasChain in
ScheduleDAGInstrs::buildSchedGraph is not acting as a chain for loads
and stores (the head of the chain is not being updated as they are
encountered, so dependencies aren't being added solely on the basis of
may-aliasing in some cases).  Will test a patch.
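The failure mode described here can be illustrated with a toy model. The sketch below is hypothetical and far simpler than ScheduleDAGInstrs::buildSchedGraph: it assumes that every memory operation that could not be disambiguated is serialized behind the current chain head, and that calls act as barriers. Kind, MemOp, and buildAliasChain are invented names. The headTracksLoadsAndStores flag switches between the intended behavior and the one Bill describes, where loads and stores never become the new head.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class Kind { Barrier, Load, Store };  // Barrier: e.g., a call

struct MemOp {
  Kind kind;
  bool mayAliasAnything;  // true when disambiguation failed for this access
};

using Edge = std::pair<int, int>;  // (earlier op index, later op index)

// Serialize every barrier and every non-disambiguated load/store behind the
// current chain head. With headTracksLoadsAndStores == false, only barriers
// advance the head, mimicking the bug reported in this thread.
std::vector<Edge> buildAliasChain(const std::vector<MemOp>& ops,
                                  bool headTracksLoadsAndStores) {
  std::vector<Edge> edges;
  int head = -1;  // index of the current chain head; -1 means no head yet
  for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
    const MemOp& op = ops[i];
    const bool isBarrier = op.kind == Kind::Barrier;
    if (!isBarrier && !op.mayAliasAnything) continue;  // disambiguated: not modeled
    if (head >= 0) edges.push_back({head, i});
    if (isBarrier || headTracksLoadsAndStores) head = i;
  }
  return edges;
}
```

For the two-instruction case from the PowerPC dump (a may-alias store followed by a may-alias load), the intended variant returns the single edge (0, 1), while the buggy variant returns no edge at all, matching the missing store-to-load arc. With a call in front, the buggy variant still orders both accesses after the call but not the load after the store.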

Sergei Larin

2012-Sep-21 21:40 UTC

Bill, 

  I am not sure what you mean by "is not acting as a chain for loads and
stores"...
  There is probably a reason why those mem ops are not in AliasChain. Do you
know what that reason is? Are they marked as invariant, for instance?
  If getUnderlyingObjectForInstr really thinks they are the same object, and they
are not, there would probably be a false edge, but you are missing one...

Sergei

> -----Original Message-----
> From: William J. Schmidt [mailto:wschmidt at linux.vnet.ibm.com]
> Sent: Friday, September 21, 2012 1:57 PM
> To: Sergei Larin
> Cc: llvmdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] Scheduling question (memory dependency)
> 
> OK, finally found it.  The AliasChain in
> ScheduleDAGInstrs::buildSchedGraph is not acting as a chain for loads
> and stores (the head of the chain is not being updated as they are
> encountered, so dependencies aren't being added solely on the basis of
> may-aliasing in some cases).  Will test a patch.
>
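Sergei's point is that a chain edge may be omitted only when two accesses are proven independent, and that flags such as invariance are legitimate reasons to omit one. The predicate below is a hypothetical sketch of that conservative rule, not LLVM's MIsNeedChainEdge; MemAccess, its fields, and the object labels borrowed from the dump are assumptions for illustration. It also shows the concern Bill raised earlier in the thread: two different underlying-object labels prove nothing unless both are known to name disjoint memory.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one memory access (illustration only).
struct MemAccess {
  bool isStore;
  bool isInvariant;    // load from memory assumed never to be written
  std::string object;  // underlying-object label; empty when unknown
  bool identified;     // label assumed disjoint from every other identified label
  long offset;         // byte offset within the object
  int size;            // access size in bytes
};

// Conservative rule: omit the chain edge only when independence is proven.
bool needChainEdge(const MemAccess& a, const MemAccess& b) {
  if (!a.isStore && !b.isStore) return false;  // two loads never conflict
  if ((!a.isStore && a.isInvariant) || (!b.isStore && b.isInvariant))
    return false;  // invariant memory is never stored to
  if (a.object.empty() || b.object.empty()) return true;  // unknown: may alias
  if (a.object != b.object)
    return !(a.identified && b.identified);  // different labels alone prove nothing
  // Same object: independent only when the byte ranges do not overlap.
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Under this rule a store labeled FixedStack-1 and a load labeled %0 get an edge, because %0 is not known to be disjoint from the stack slot. If both labels were (wrongly) treated as identified, the edge would be silently dropped, which is exactly the kind of well-hidden bug Sergei warns about.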
