Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:34 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> So, here is the model that LLVM is using: a volatile memcpy is lowered
> to a loop of loads and stores of indeterminate width. As such, splitting
> a memcpy is always valid.
>
> If we want a very specific load and store width for volatile accesses, I
> think that the frontend should generate concrete loads and stores of a
> type with that width. Ultimately, memcpy is a pretty bad model for
> *specific* width accesses, it is best at handling indeterminate sized
> accesses, which is exactly what doesn't make sense for device backed
> volatile accesses.

Yeah, the remark about devices I made in my post was a result of a
"last-minute" thought to add some rationale. It doesn't actually apply
to SROA, since there are no devices that are mapped to the stack, which
is what SROA is interested in.

The concern with the testcase I attached is really about performance.
Would it be reasonable to control the splitting in SROA via TTI?

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
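For concreteness, the two forms being contrasted above might look roughly like this in LLVM IR. This is only a minimal sketch: the sizes, names, and function bodies are invented, and the memcpy intrinsic is shown with the five-argument (alignment plus isvolatile) signature in use in 2015.

; A volatile memcpy guarantees that the copy itself is performed and not
; elided, but says nothing about the width of the individual accesses,
; so SROA is free to split it.
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

define void @copy_indeterminate_width(i8* %dst, i8* %src) {
entry:
  ; the final argument (i1 true) is the isvolatile flag
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 16, i32 4, i1 true)
  ret void
}

; If a specific access width is required, the frontend can emit it
; directly: volatile loads and stores of a concrete type are neither
; split nor merged by the optimizer.
define void @copy_fixed_width(i32* %dst, i32* %src) {
entry:
  %v = load volatile i32, i32* %src, align 4
  store volatile i32 %v, i32* %dst, align 4
  ret void
}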
----- Original Message -----
> From: "Krzysztof Parzyszek" <kparzysz at codeaurora.org>
> To: "Chandler Carruth" <chandlerc at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org
> Sent: Wednesday, November 11, 2015 9:34:01 AM
> Subject: Re: [llvm-dev] SROA and volatile memcpy/memset
>
> On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> > So, here is the model that LLVM is using: a volatile memcpy is
> > lowered to a loop of loads and stores of indeterminate width. As
> > such, splitting a memcpy is always valid.
> >
> > If we want a very specific load and store width for volatile
> > accesses, I think that the frontend should generate concrete loads
> > and stores of a type with that width. Ultimately, memcpy is a pretty
> > bad model for *specific* width accesses, it is best at handling
> > indeterminate sized accesses, which is exactly what doesn't make
> > sense for device backed volatile accesses.
>
> Yeah, the remark about devices I made in my post was a result of a
> "last-minute" thought to add some rationale. It doesn't actually apply
> to SROA, since there are no devices that are mapped to the stack,
> which is what SROA is interested in.
>
> The concern with the testcase I attached is really about performance.
> Would it be reasonable to control the splitting in SROA via TTI?

How so?

 -Hal

>
> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Chandler Carruth via llvm-dev
2015-Nov-11 15:40 UTC
[llvm-dev] SROA and volatile memcpy/memset
I'm pretty sure volatile access voids your performance warranty....

I assume the issue is that the loads and stores aren't combined late in
the back end because we propagate the volatile? I think the fix for
performance is "don't use volatile". I'm sure you've looked at that
option, but we'll need a lot more context on what problem you're
actually hitting to provide more realistic options.

I think TTI is a very bad fit here -- target customization would really
hurt the entire canonicalization effort of the middle end....

On Wed, Nov 11, 2015, 10:34 Krzysztof Parzyszek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 11/11/2015 9:28 AM, Chandler Carruth wrote:
> > So, here is the model that LLVM is using: a volatile memcpy is lowered
> > to a loop of loads and stores of indeterminate width. As such, splitting
> > a memcpy is always valid.
> >
> > If we want a very specific load and store width for volatile accesses, I
> > think that the frontend should generate concrete loads and stores of a
> > type with that width. Ultimately, memcpy is a pretty bad model for
> > *specific* width accesses, it is best at handling indeterminate sized
> > accesses, which is exactly what doesn't make sense for device backed
> > volatile accesses.
>
> Yeah, the remark about devices I made in my post was a result of a
> "last-minute" thought to add some rationale. It doesn't actually apply
> to SROA, since there are no devices that are mapped to the stack, which
> is what SROA is interested in.
>
> The concern with the testcase I attached is really about performance.
> Would it be reasonable to control the splitting in SROA via TTI?
>
> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
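As a rough illustration of what "don't use volatile" means at the IR level, here is a sketch with an invented struct type and the 2015-era intrinsic signature. When the isvolatile flag on the copy into a stack temporary is false, SROA can take the alloca apart and promote the pieces, and the usual freedom to combine accesses is preserved.

%struct.S = type { i32, i32, i32, i32 }

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1)

; Non-volatile copy into a stack temporary: SROA can split the alloca,
; forward the copied values, and usually make the temporary disappear
; entirely, so there is nothing left for the backend to recombine.
define i32 @use_first_field(%struct.S* %src) {
entry:
  %tmp = alloca %struct.S, align 4
  %dst.i8 = bitcast %struct.S* %tmp to i8*
  %src.i8 = bitcast %struct.S* %src to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst.i8, i8* %src.i8, i64 16, i32 4, i1 false)
  %a.ptr = getelementptr inbounds %struct.S, %struct.S* %tmp, i32 0, i32 0
  %a = load i32, i32* %a.ptr, align 4
  ret i32 %a
}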
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:54 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:36 AM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Krzysztof Parzyszek" <kparzysz at codeaurora.org>
>>
>> Yeah, the remark about devices I made in my post was a result of a
>> "last-minute" thought to add some rationale. It doesn't actually apply
>> to SROA, since there are no devices that are mapped to the stack,
>> which is what SROA is interested in.
>>
>> The concern with the testcase I attached is really about performance.
>> Would it be reasonable to control the splitting in SROA via TTI?
>
> How so?

I'm not sure which part you are referring to.

The "volatileness" of the structure in question does not place the same
restrictions on how we can access it as it would in the case of a device
access. The broken-up loads and stores are legal in the sense that they
won't cause any hardware issues; however, they would take longer to
execute because the resulting instructions would be marked as volatile
and thus "non-optimizable".

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
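To make the performance concern concrete, the code after SROA splits a volatile copy has roughly the following shape. This is a hand-written sketch of the pattern with an invented struct type, not literal pass output: every narrow access carries the volatile flag, so later passes and the backend will not merge them back into wider loads and stores.

%struct.S = type { i32, i32, i32, i32 }

; A 16-byte volatile memcpy split into per-field pieces: each load and
; store inherits the volatile flag from the original intrinsic.
define void @split_volatile_copy(%struct.S* %dst, %struct.S* %src) {
entry:
  %s0 = getelementptr inbounds %struct.S, %struct.S* %src, i32 0, i32 0
  %d0 = getelementptr inbounds %struct.S, %struct.S* %dst, i32 0, i32 0
  %v0 = load volatile i32, i32* %s0, align 4
  store volatile i32 %v0, i32* %d0, align 4
  %s1 = getelementptr inbounds %struct.S, %struct.S* %src, i32 0, i32 1
  %d1 = getelementptr inbounds %struct.S, %struct.S* %dst, i32 0, i32 1
  %v1 = load volatile i32, i32* %s1, align 4
  store volatile i32 %v1, i32* %d1, align 4
  ; ... and likewise for the remaining two fields.
  ret void
}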
Krzysztof Parzyszek via llvm-dev
2015-Nov-11 15:57 UTC
[llvm-dev] SROA and volatile memcpy/memset
On 11/11/2015 9:40 AM, Chandler Carruth wrote:
> I'm pretty sure volatile access voids your performance warranty....
>
> I assume the issue is that the loads and stores aren't combined late in
> the back end because we propagate the volatile? I think the fix for
> performance is "don't use volatile". I'm sure you've looked at that
> option, but we'll need a lot more context on what problem you're
> actually hitting to provide more realistic options.

The testcase is a synthetic scenario that a customer gave us. I don't
have much more insight into it than what I can get by just looking at
it. I can try to find out more about its origin.

> I think TTI is a very bad fit here -- target customization would really
> hurt the entire canonicalization effort of the middle end....

I see. Hmm.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation