Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:34 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> So, here is the model that LLVM is using: a volatile memcpy is lowered
> to a loop of loads and stores of indeterminate width. As such, splitting
> a memcpy is always valid.
>
> If we want a very specific load and store width for volatile accesses, I
> think that the frontend should generate concrete loads and stores of a
> type with that width. Ultimately, memcpy is a pretty bad model for
> *specific* width accesses, it is best at handling indeterminate sized
> accesses, which is exactly what doesn't make sense for device backed
> volatile accesses.

Yeah, the remark about devices I made in my post was a result of a
"last-minute" thought to add some rationale. It doesn't actually apply
to SROA, since there are no devices that are mapped to the stack, which
is what SROA is interested in.

The concern with the testcase I attached is really about performance.
Would it be reasonable to control the splitting in SROA via TTI?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
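For concreteness, the two forms being contrasted above might look roughly like this in LLVM IR. This is only a minimal sketch: the sizes, names, and function bodies are invented, and the memcpy intrinsic is shown with the five-argument (alignment plus isvolatile) signature in use in 2015.

; A volatile memcpy guarantees that the copy itself is performed and not
; elided, but says nothing about the width of the individual accesses,
; so SROA is free to split it.
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

define void @copy_indeterminate_width(i8* %dst, i8* %src) {
entry:
  ; the final argument (i1 true) is the isvolatile flag
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i32 4, i1 true)
  ret void
}

; If a specific access width is required, the frontend can emit it
; directly: volatile loads and stores of a concrete type are neither
; split nor merged by the optimizer.
define void @copy_fixed_width(i32* %dst, i32* %src) {
entry:
  %v = load volatile i32, i32* %src, align 4
  store volatile i32 %v, i32* %dst, align 4
  ret void
}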
----- Original Message -----
> From: "Krzysztof Parzyszek" <kparzysz at codeaurora.org>
> To: "Chandler Carruth" <chandlerc at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org
> Sent: Wednesday, November 11, 2015 9:34:01 AM
> Subject: Re: [llvm-dev] SROA and volatile memcpy/memset
>
> On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> > So, here is the model that LLVM is using: a volatile memcpy is
> > lowered to a loop of loads and stores of indeterminate width. As
> > such, splitting a memcpy is always valid.
> >
> > If we want a very specific load and store width for volatile
> > accesses, I think that the frontend should generate concrete loads
> > and stores of a type with that width. Ultimately, memcpy is a pretty
> > bad model for *specific* width accesses, it is best at handling
> > indeterminate sized accesses, which is exactly what doesn't make
> > sense for device backed volatile accesses.
>
> Yeah, the remark about devices I made in my post was a result of a
> "last-minute" thought to add some rationale. It doesn't actually apply
> to SROA, since there are no devices that are mapped to the stack,
> which is what SROA is interested in.
>
> The concern with the testcase I attached is really about performance.
> Would it be reasonable to control the splitting in SROA via TTI?

How so?

 -Hal

>
> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Chandler Carruth via llvm-dev
2015-Nov-11 15:40 UTC
[llvm-dev] SROA and volatile memcpy/memset
I'm pretty sure volatile access voids your performance warranty....

I assume the issue is that the loads and stores aren't combined late in
the back end because we propagate the volatile? I think the fix for
performance is "don't use volatile". I'm sure you've looked at that
option, but we'll need a lot more context on what problem you're
actually hitting to provide more realistic options.

I think TTI is a very bad fit here -- target customization would really
hurt the entire canonicalization effort of the middle end....

On Wed, Nov 11, 2015, 10:34 Krzysztof Parzyszek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> > So, here is the model that LLVM is using: a volatile memcpy is lowered
> > to a loop of loads and stores of indeterminate width. As such, splitting
> > a memcpy is always valid.
> >
> > If we want a very specific load and store width for volatile accesses, I
> > think that the frontend should generate concrete loads and stores of a
> > type with that width. Ultimately, memcpy is a pretty bad model for
> > *specific* width accesses, it is best at handling indeterminate sized
> > accesses, which is exactly what doesn't make sense for device backed
> > volatile accesses.
>
> Yeah, the remark about devices I made in my post was a result of a
> "last-minute" thought to add some rationale. It doesn't actually apply
> to SROA, since there are no devices that are mapped to the stack, which
> is what SROA is interested in.
>
> The concern with the testcase I attached is really about performance.
> Would it be reasonable to control the splitting in SROA via TTI?
>
> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
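As a rough illustration of what "don't use volatile" means at the IR level, here is a sketch with an invented struct type and the 2015-era intrinsic signature. When the isvolatile flag on the copy into a stack temporary is false, SROA can take the alloca apart and promote the pieces, and the usual freedom to combine accesses is preserved.

%struct.S = type { i32, i32, i32, i32 }

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

; Non-volatile copy into a stack temporary: SROA can split the alloca,
; forward the copied values, and usually make the temporary disappear
; entirely, so there is nothing left for the backend to recombine.
define i32 @use_first_field(%struct.S* %src) {
entry:
  %tmp = alloca %struct.S, align 4
  %dst.i8 = bitcast %struct.S* %tmp to i8*
  %src.i8 = bitcast %struct.S* %src to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst.i8, i8* %src.i8, i64 16, i32 4, i1 false)
  %a.ptr = getelementptr inbounds %struct.S, %struct.S* %tmp, i32 0, i32 0
  %a = load i32, i32* %a.ptr, align 4
  ret i32 %a
}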
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:54 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:36 AM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Krzysztof Parzyszek" <kparzysz at codeaurora.org>
>>
>> Yeah, the remark about devices I made in my post was a result of a
>> "last-minute" thought to add some rationale. It doesn't actually apply
>> to SROA, since there are no devices that are mapped to the stack,
>> which is what SROA is interested in.
>>
>> The concern with the testcase I attached is really about performance.
>> Would it be reasonable to control the splitting in SROA via TTI?
>
> How so?

I'm not sure which part you are referring to.

The "volatileness" of the structure in question does not place the same
restrictions on how we can access it as it would in the case of a device
access. The broken-up loads and stores are legal in the sense that they
won't cause any hardware issues; however, they would take longer to
execute because the resulting instructions would be marked as volatile
and thus "non-optimizable".

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
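To make the performance concern concrete, the code after SROA splits a volatile copy has roughly the following shape. This is a hand-written sketch of the pattern with an invented struct type, not literal pass output: every narrow access carries the volatile flag, so later passes and the backend will not merge them back into wider loads and stores.

%struct.S = type { i32, i32, i32, i32 }

; A 16-byte volatile memcpy split into per-field pieces: each load and
; store inherits the volatile flag from the original intrinsic.
define void @split_volatile_copy(%struct.S* %dst, %struct.S* %src) {
entry:
  %s0 = getelementptr inbounds %struct.S, %struct.S* %src, i32 0, i32 0
  %d0 = getelementptr inbounds %struct.S, %struct.S* %dst, i32 0, i32 0
  %v0 = load volatile i32, i32* %s0, align 4
  store volatile i32 %v0, i32* %d0, align 4
  %s1 = getelementptr inbounds %struct.S, %struct.S* %src, i32 0, i32 1
  %d1 = getelementptr inbounds %struct.S, %struct.S* %dst, i32 0, i32 1
  %v1 = load volatile i32, i32* %s1, align 4
  store volatile i32 %v1, i32* %d1, align 4
  ; ... and likewise for the remaining two fields.
  ret void
}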
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:57 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:40 AM, Chandler Carruth wrote:
> I'm pretty sure volatile access voids your performance warranty....
>
> I assume the issue is that the loads and stores aren't combined late in
> the back end because we propagate the volatile? I think the fix for
> performance is "don't use volatile". I'm sure you've looked at that
> option, but we'll need a lot more context on what problem you're
> actually hitting to provide more realistic options.

The testcase is a synthetic scenario that a customer gave us. I don't
have much more insight into it than what I can get by just looking at
it. I can try to find out more about its origin.

> I think TTI is a very bad fit here -- target customization would really
> hurt the entire canonicalization effort of the middle end....

I see. Hmm.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation