Jon Chesterfield via llvm-dev
2017-Sep-15 19:10 UTC
[llvm-dev] What should a truncating store do?
OK, I'm clear on scalars. Data races are thankfully OK in this context.

Densely packing vectors sounds efficient and is clear in the case where lanes * width is a multiple of 8 bits. I don't think I understand how it works in other cases.

Take a store of <4 x i8> truncating to <4 x i7> as an example. This can be converted into four scalar i8 -> i7 stores with corresponding increments to the address, in which case the final layout in memory is 0b01111111011111110111111101111111. Or it can be written as a packed vector, which I think would resemble 0b00001111111111111111111111111111.

This would mean the memory layout changes depending on how/whether the legaliser breaks large vectors down into smaller types. Is this the case? For example, <4 x i32> => <4 x i31> converts to two <2 x i32> => <2 x i31> stores on a target with <2 x i32> legal, but would not be split if <4 x i32> were declared legal.

Thanks

Jon

On Fri, Sep 15, 2017 at 7:41 PM, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 9/15/2017 11:30 AM, Jon Chesterfield wrote:
>> Interesting, thank you. I expected both answers to be "unchanged" so was
>> surprised by the zero extend in the legaliser.
>>
>> The motivation here is that it's faster for us to load N bytes, apply
>> whatever masks are necessary to reproduce the truncating store then store
>> all N bytes. This is only a good plan if there's no change to the
>> semantics :)
>
> See llvm.org/docs/LangRef.html#store-instruction . In general, you have
> to be careful to avoid data races, but that might not apply to your
> target.
>
>> Are scalar integer types zero extended to the next multiple of 8 or to
>> the next power of 2 greater than 7? For example, i17 => i24 or
>> i17 => i32?
>
> Multiple of 8.
>
>> I think this means truncating stores of vector types will introduce zero
>> bits at the end of each element instead of grouping all the zeros at the
>> end. For example, <i6 63, i6 63> writes to sixteen bits as
>> 0b0011111100111111, not as 0b0000111111111111?
>
> Vector types are tightly packed, so <8 x i1> is 1 byte, not 8 bytes.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
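The two candidate layouts discussed in this message can be reproduced with a small model. This is only a sketch of the arithmetic, not of what any LLVM back end actually emits: the function names are made up for illustration, memory is modelled as a single integer, and lane 0 is assumed to occupy the lowest bits (the all-ones examples from the thread do not depend on that choice).

```python
def store_size_bits(width):
    """Round a bit width up to the next multiple of 8 (scalar rule from the thread)."""
    return (width + 7) // 8 * 8


def layout_per_element(values, width):
    """Model each lane as its own scalar store, zero extended to whole bytes."""
    slot = store_size_bits(width)
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * slot)
    return word, slot * len(values)


def layout_packed(values, width):
    """Model lanes packed back to back, with zero padding only at the top."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word, store_size_bits(width * len(values))


def as_bits(word, nbits):
    """Format the modelled memory as a binary literal of exactly nbits digits."""
    return "0b" + format(word, "0{}b".format(nbits))


if __name__ == "__main__":
    # <4 x i7> with every lane set to 127, as in the message above.
    print(as_bits(*layout_per_element([127] * 4, 7)))
    print(as_bits(*layout_packed([127] * 4, 7)))
    # i17 occupies 24 bits under the "multiple of 8" rule.
    print(store_size_bits(17))
```

Under these assumptions the per-element model yields the first bit pattern quoted above and the packed model yields the second; the same holds for the <i6 63, i6 63> example in the quoted text.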
Friedman, Eli via llvm-dev
2017-Sep-15 20:41 UTC
[llvm-dev] What should a truncating store do?
On 9/15/2017 12:10 PM, Jon Chesterfield wrote:
> OK, I'm clear on scalars. Data races are thankfully OK in this context.
>
> Densely packing vectors sounds efficient and is clear in the case
> where lanes * width is a multiple of 8 bits. I don't think I
> understand how it works in other cases.
>
> If we could take store <4 x i8> truncating to <4 x i7> as an example.
> This can be converted into four scalar i8 -> i7 stores with
> corresponding increments to the address, in which case the final
> layout in memory is 0b01111111011111110111111101111111. Or it can be
> written as a packed vector which I think would
> resemble 0b00001111111111111111111111111111.
>
> This would mean the memory layout changes depending on how/whether the
> legaliser breaks large vectors down into smaller types. Is this the
> case? For example, <4xi32> => <4 x i31> converts to two <2 x i32> =>
> <2 x i31> stores on a target with <2 x i32> legal but would not be
> split if <4 x i32> were declared legal.

Vectors get complicated; I don't recall all the details of what the code generator currently does/is supposed to do. See also bugs.llvm.org/show_bug.cgi?id=31265 .

-Eli
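The concern about splitting can be made concrete with the same kind of model. The sketch below assumes, purely for illustration, that a split store packs each half independently and pads each half to a whole number of bytes; whether the legaliser really behaves this way is exactly the open question in this thread. The helper names are hypothetical, and lane 0 is again assumed to sit in the lowest bits.

```python
def round_up_to_bytes(bits):
    """Round a bit count up to the next multiple of 8."""
    return (bits + 7) // 8 * 8


def pack(values, width):
    """Pack lanes back to back, lane 0 in the lowest bits."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def store_whole(values, width):
    """Model one packed store of the entire vector."""
    return pack(values, width), round_up_to_bytes(width * len(values))


def store_split(values, width, parts):
    """Model the vector split into equal parts, each packed and byte padded separately."""
    chunk = len(values) // parts
    word = 0
    offset = 0
    for p in range(parts):
        part = values[p * chunk:(p + 1) * chunk]
        word |= pack(part, width) << offset
        offset += round_up_to_bytes(width * chunk)
    return word, offset


if __name__ == "__main__":
    ones31 = [(1 << 31) - 1] * 4
    whole, whole_bits = store_whole(ones31, 31)
    split, split_bits = store_split(ones31, 31, 2)
    print(whole_bits, split_bits)   # both models cover 128 bits
    print(whole == split)           # but the padding bits land in different places
```

In this model, <4 x i31> stored whole leaves its four padding bits at the top of the 128-bit region, while two <2 x i31> halves each leave two padding bits at the top of their own 64-bit region. With a byte-multiple element width such as i32 the two models coincide, which is why the question only arises for odd widths.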
Jon Chesterfield via llvm-dev
2017-Sep-15 21:28 UTC
[llvm-dev] What should a truncating store do?
They are starting to look complicated. The patch linked is interesting; perhaps v1 vectors are special cased. It shouldn't be too onerous to work out what one or two in-tree back ends do by experimentation.

Thanks again, it's great to have context beyond the source.