Bill Wendling via llvm-dev
2020-May-29 23:29 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> On 28 May 2020, at 18:42, Bill Wendling wrote:
>>
>> > On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> At least in this test-case, the "bitfield" part of this seems to be a
>> >> distraction. As Eli notes, Clang has lowered the function to LLVM IR
>> >> containing consistent i16 operations. Despite that being a different
>> >> choice from GCC, it should still be correct and consistent.
>> >>
>> > I suspect that this is more prevalent with bitfields as they're more
>> > likely to have the load / bitwise op / store operations done on them,
>> > resulting in an access type that can be shortened. But yes, it's not
>> > specific to just bitfields.
>> >
>> > I'm more interested in consistency, to be honest. If the loads and
>> > stores for the bitfields (or other such shorten-able objects) were the
>> > same, then we wouldn't run into the store-to-load forwarding issue on
>> > x86 (I don't know about other platforms, but suspect that consistency
>> > wouldn't hurt). I liked Arthur's idea of accessing the object using
>> > the type size the bitfield was defined with (i8, i16, i256). It would
>> > help with improving the heuristic. The downside is that it could lead
>> > to un-optimal code, but that's the situation we have now, so...
>>
>> Okay, but what concretely are you suggesting here? Clang IRGen is
>> emitting accesses with consistent sizes, and LLVM is making them
>> inconsistent. Are you just asking Clang to emit smaller accesses
>> in the hope that LLVM won’t mess them up?
>
> I don't think this has anything to do with bit-fields or Clang's
> lowering. This seems to "just" be an optimizer issue (one that happens
> to show up for bit-field access patterns, but also affects other cases).
> Much-reduced testcase:
>
>   unsigned short n;
>   void set() { n |= 1; }
>
> For this testcase, -O2 generates a 1-byte 'or' instruction, which will
> often be a pessimization when there are also full-width accesses. I
> don't think the frontend can or should be working around this.

We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey that
we're working with a bitfield, but as many have pointed out, it's not
bitfield-specific (I didn't mean to imply with my initial post that *only*
bitfields are affected by this; that's just where it showed up). The
front-end generates code that's perfectly valid for the various accesses,
and changing what it generates doesn't seem like it would be worth the
hassle (e.g. using i1, i3, etc. for the bitfields -- note that I'm *not*
suggesting we do this, it's just an example).

So what we're left with is trying to ensure that we don't generate
sub-optimal code via the DAG combiner and other passes. Some ideas off the
top of my head:

1. Avoid generating byte-sized bitwise operations, because, as Richard
   pointed out, they can lead to pessimizations.

2. Come up with a better heuristic to determine whether we're going to
   generate inconsistent load/store accesses to the same object.

3. Transform the machine instructions before register allocation to
   ensure that all accesses to the same address are the same size, or at
   least that the stores are of equal or greater size than the loads.

Option (1) may be the simplest and could potentially solve our immediate
issue.

Option (2) is difficult because the DAG combiner operates on a single BB
at a time and doesn't have visibility into past DAGs it modified.

Option (3) allows us to retain the current behavior and gives us
visibility into the whole function, not just an individual BB.

I kind of like option 3, but I'm open to other suggestions.

-bw
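As an illustration of the inconsistency being discussed (hypothetical
code, not taken from the original report; whether the narrowing actually
happens depends on the target and optimizer version):

    /* Clang emits consistent i16 accesses for both functions in IR, but
       the backend may narrow the read-modify-write of `a` to an 8-bit
       'or' (e.g. orb on x86) while `b` is still read with a 16-bit load
       of the same storage unit.  When the narrow store and the wide load
       execute close together, store-to-load forwarding can fail and the
       load stalls. */
    struct S { unsigned short a : 4, b : 12; };

    void set_a(struct S *s) { s->a |= 1; }           /* store may be narrowed */
    unsigned short get_b(struct S *s) { return s->b; } /* 16-bit load */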
Craig Topper via llvm-dev
2020-May-29 23:49 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
I thought the IR you showed in March had the store in one function and the
load in another function. Option 3 would not handle that.

~Craig

On Fri, May 29, 2020 at 4:29 PM Bill Wendling via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> [...]
>
> 3. Transform the machine instructions before register allocation to
> ensure that all accesses to the same address are the same size, or at
> least that the stores are of equal or greater size than the loads.
>
> [...]
>
> I kind of like option 3, but I'm open to other suggestions.
>
> -bw
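A sketch of the cross-function case being referred to (hypothetical code,
not the IR from March): the store and the load of the same object live in
different functions, possibly in different translation units, so a pass
that only looks at one function at a time never sees both access widths,
and matching them up would require interprocedural analysis.

    /* writer.c */
    unsigned short flags;
    void set_flag(void) { flags |= 1; }   /* may become a byte-sized 'or' */

    /* reader.c */
    extern unsigned short flags;
    unsigned short read_flags(void) { return flags; }   /* 16-bit load */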
Bill Wendling via llvm-dev
2020-May-29 23:56 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
On Fri, May 29, 2020 at 4:50 PM Craig Topper <craig.topper at gmail.com> wrote:
>
> I thought the IR you showed in March had the store in one function and
> the load in another function. Option 3 would not handle that.
>
Yeah. That's an entirely different issue, though, as it'll need IPA. I'm a
bit skeptical about the initial report, as I suspect that a store at the
end of a function would have retired, or at least made it to a cache, by
the time the function returns, the stack's readjusted, registers are
restored, etc. So I think some of the details in the initial reports may
have been a bit off.

-bw

> On Fri, May 29, 2020 at 4:29 PM Bill Wendling via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> [...]
John McCall via llvm-dev
2020-May-30 00:59 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
On 29 May 2020, at 19:29, Bill Wendling wrote:
> On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk> wrote:
>> [...]
>>
>> I don't think this has anything to do with bit-fields or Clang's
>> lowering. This seems to "just" be an optimizer issue (one that
>> happens to show up for bit-field access patterns, but also affects
>> other cases). Much-reduced testcase:
>>
>>   unsigned short n;
>>   void set() { n |= 1; }

To be clear, I agree with you that this is not a frontend problem.
I asked because the question seemed to lead towards changing the width
that Clang uses for accesses.

> [...]
>
> 3. Transform the machine instructions before register allocation to
> ensure that all accesses to the same address are the same size, or at
> least that the stores are of equal or greater size than the loads.
>
> [...]
>
> I kind of like option 3, but I'm open to other suggestions.

What’s the concrete suggestion for (3), that we analyze the uses
of a single address and try to re-widen the access to make it use
a consistent width? Widening accesses is difficult in general,
unless the information is still present that it was originally
wider before DAG combine.

At any rate, seems like a backend problem. :)

John.
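A rough sketch of what the analysis side of option (3) could look like.
This is not an existing LLVM pass: the name AccessWidthCheck is made up,
the APIs are those of roughly the LLVM 10 era, and the hard part John
describes -- actually re-widening the narrow accesses and the ALU
operations that feed them -- is left out.

    #include "llvm/ADT/DenseMap.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineFunction.h"
    #include "llvm/CodeGen/MachineFunctionPass.h"
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineMemOperand.h"
    #include "llvm/IR/Value.h"
    #include <algorithm>
    #include <cstdint>

    using namespace llvm;

    namespace {
    // Narrowest store and widest load seen for each underlying IR object.
    struct Widths {
      uint64_t MinStore = UINT64_MAX;
      uint64_t MaxLoad = 0;
    };

    struct AccessWidthCheck : MachineFunctionPass {
      static char ID;
      AccessWidthCheck() : MachineFunctionPass(ID) {}

      bool runOnMachineFunction(MachineFunction &MF) override {
        DenseMap<const Value *, Widths> Seen;
        for (MachineBasicBlock &MBB : MF)
          for (MachineInstr &MI : MBB)
            for (MachineMemOperand *MMO : MI.memoperands()) {
              const Value *V = MMO->getValue();
              if (!V)
                continue; // Unknown underlying object; skip in this sketch.
              Widths &W = Seen[V];
              if (MI.mayStore())
                W.MinStore = std::min(W.MinStore, MMO->getSize());
              if (MI.mayLoad())
                W.MaxLoad = std::max(W.MaxLoad, MMO->getSize());
            }
        for (const auto &Entry : Seen)
          if (Entry.second.MinStore < Entry.second.MaxLoad) {
            // A store to this object is narrower than some load of it:
            // a candidate for re-widening, which is where the difficult
            // work (rewriting the store and its feeding ALU ops) begins.
          }
        return false; // Analysis only; nothing is modified here.
      }
    };
    } // end anonymous namespace

    char AccessWidthCheck::ID = 0;

Registering the pass, deciding where it runs before register allocation,
and how any rewriting interacts with the DAG combines that caused the
narrowing in the first place are the parts the thread leaves open.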
Bill Wendling via llvm-dev
2020-May-30 03:23 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
On Fri, May 29, 2020 at 5:59 PM John McCall <rjmccall at apple.com> wrote:
> [...]
>
> What’s the concrete suggestion for (3), that we analyze the uses
> of a single address and try to re-widen the access to make it use
> a consistent width? Widening accesses is difficult in general,
> unless the information is still present that it was originally
> wider before DAG combine.
>
Alternatively, we could move the shortening logic out of DAG combine and
into a separate pass. I'll come up with a more concrete proposal. In the
meantime, some flags were added which may get us past the initial issue.
(Crossing fingers that it doesn't pessimise other code.)

> At any rate, seems like a backend problem. :)

Yes. :)

-bw
Carrot Wei via llvm-dev
2020-May-30 05:09 UTC
[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
On Fri, May 29, 2020 at 5:59 PM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
>
> What’s the concrete suggestion for (3), that we analyze the uses
> of a single address and try to re-widen the access to make it use
> a consistent width? Widening accesses is difficult in general,
> unless the information is still present that it was originally
> wider before DAG combine.
>
> At any rate, seems like a backend problem. :)
>
In order to modify load / store instructions, all of the related ALU
instructions must also be analyzed and modified. That is not convenient at
the machine-instruction level; LLVM IR is more appropriate for this work.
But DAG combine and selection can narrow and widen those loads / stores
again -- actually, most of the current load / store narrowing / widening
problems come from DAG combine and selection. Maybe we should move these
load / store narrowing / widening optimizations from the DAG to an LLVM IR
pass.
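To make the point about the ALU instructions concrete, here is the
narrowing written out by hand in C (illustration only; the names are made
up and the byte selected assumes a little-endian layout):

    unsigned short n;

    void set_wide(void) {              /* the consistent 16-bit form */
      unsigned short t = n;            /* 16-bit load  */
      t = (unsigned short)(t | 1u);    /* 16-bit OR    */
      n = t;                           /* 16-bit store */
    }

    void set_narrow(void) {            /* what byte-narrowing amounts to */
      unsigned char *p = (unsigned char *)&n;  /* low byte (little-endian) */
      unsigned char b = *p;            /* 8-bit load   */
      b = (unsigned char)(b | 1u);     /* 8-bit OR: width and constant changed */
      *p = b;                          /* 8-bit store  */
    }

Turning the narrow form back into the wide one means rewriting the OR (its
width and, in general, its constant and any masks), not just the load and
store, which is why doing it on machine instructions is awkward and an
IR-level pass looks more natural.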