Hi all,

On Tue, 7 Apr 2020 at 11:04, Renato Golin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On Tue, 7 Apr 2020 at 09:30, Kai Wang via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >         | LMUL = 1         | LMUL = 2         | LMUL = 4          | LMUL = 8
> > int64_t | vscale x 1 x i64 | vscale x 2 x i64 | vscale x 4 x i64  | vscale x 8 x i64
> > int32_t | vscale x 2 x i32 | vscale x 4 x i32 | vscale x 8 x i32  | vscale x 16 x i32
> > int16_t | vscale x 4 x i16 | vscale x 8 x i16 | vscale x 16 x i16 | vscale x 32 x i16
> > int8_t  | vscale x 8 x i8  | vscale x 16 x i8 | vscale x 32 x i8  | vscale x 64 x i8
> >
> > We have another architecture parameter, ELEN, which means the maximum size of a single vector element in bits.
>
> Hi,
>
> For my own education, some quick questions:
>
> 1. is LMUL always a multiple of ELEN?

This happens to be true (at least in the current spec, disregarding
some in-progress proposals) just because both are powers of two and
the largest possible LMUL equals the smallest possible ELEN (8), but I
don't think there is any meaning to be found in this observation. The
two values govern unrelated aspects of the vector unit.

> 2. Is this fixed on the hardware, depending on the actual lengths, or
> is this dynamically set by software (on a register or status flag)?
> 2a. If dynamic, can it change from program to program? Function to function?

It's not clear whether by "this" you mean ELEN, LMUL, or something
else. ELEN is fixed in hardware. LMUL is a property of each individual
instruction. Most instructions take it from a control register, a few
encode it in the instruction as an immediate, but in any case it needs
to be statically determined (on a per-instruction basis) to be able to
allocate registers. This is not just a constraint for
compiler-generated code, but also for all hand-written assembly code
I've seen or can imagine.

> > We hope the type system could be consistent under ELEN = 32 and ELEN = 64. However, vscale may be a fractional value under ELEN = 32 in the above type system. When ELEN = 32, i64 is an invalid type (we could ignore the first row for ELEN = 32) and vscale may become 1/2 at run time to fit the architecture (if the vector register only has 32 bits).
>
> Do you mean ELEN=32 like this?
> int32_t | vscale x 1 x i32 | vscale x 2 x i32 | vscale x 4 x i32 | vscale x 8 x i32
> int16_t | vscale x 2 x i16 | vscale x 4 x i16 | vscale x 8 x i16 | vscale x 16 x i16
> int8_t  | vscale x 4 x i8  | vscale x 8 x i8  | vscale x 16 x i8 | vscale x 32 x i8
>
> If the type is invalid, you would need to legalise it, and in that
> case create some cluttered accessors (via insert/extract element) and
> possibly use intrinsics to expose underlying instructions that can
> deal with it.
>
> Perhaps I'm not clear on what you need, but vscale is supposed to be
> the number of valid elements (lanes), and given i64 is invalid, vscale
> wouldn't apply?

I don't know what "vscale wouldn't apply" is supposed to mean. Whether
it's legal or not, you can write LLVM IR using (for example) the type
<vscale x 1 x i64> even if the target doesn't natively support it. The
purpose of legalization is to make sure that this results in the behavior
the type is supposed to have. For <vscale x 1 x i64>, this means among
other things:

- it has the same number of elements as <vscale x 1 x i32>, but each
  element is twice as big
- it has half as many elements (each of the same size) as <vscale x 2 x i64>
- its total size in bits is the same as <vscale x 2 x i32>

I think that focusing on the completely illegal i64 might obscure the
real problem I see with the fractional vscale concept. Let's look at
<vscale x 1 x i32> instead. The elements are clearly legal in this
context, even in some vector types, but the <vscale x 1 x i32> type is
absent from Kai's table. This makes sense: the same vector register
fits twice as many i32 elements as i64 elements, so if you start with
<vscale x 1 x i64> mapping to a single register, then
<vscale x 2 x i32> is the same size and fits in the same register
class, while <vscale x 1 x i32> is too small and must be legalized
somehow.

But how? If we take Kai's table as gospel and look at a VLEN = ELEN = 32
machine, the vector type <vscale x 2 x i32> is supposed to map to a
single vector register, which is only 32 bits wide, and thus
<vscale x 2 x i32> would have just one element in this context
(matching the "vscale = 1/2" intuition). To be consistent with this,
<vscale x 1 x i32> would have to contain just *half* an element. No
legalization strategy can achieve this, because it is a fundamentally
impossible notion. So we end up in a situation where some types are
not just illegal and have to be legalized, but are contradictory and
can't be legalized in any meaningful way.

I don't think LLVM can/should support this kind of contradiction. Some
types have to be legalized, sometimes the legalization is not
efficient, sometimes it's not even implemented, that's all fine. But
letting some targets decide that <vscale x 1 x i32> is a fundamentally
impossible type to even assign a meaning to... that seems
unprecedented and contrary to the philosophy of LLVM IR as reasonably
target-independent IR.

The obvious solution is to use a different set of legal vector types
(and thus, a different interpretation of vscale) depending on the
largest legal element type (ELEN in RISC-V jargon). This corresponds
to the table for ELEN=32 that Renato gave above. Kai's proposal is
intended to avoid this, and I can understand the desire for that, but
it really seems like the lesser evil to me.

Best regards
Hanna

> > Is there any problem with assuming vscale to be fractional under some circumstances? vscale should be an unknown value when compiling. So, it should have no impact on code generation and optimization. The relationship between types is correct regardless of vscale's value. Is there anything I missed?
>
> I believe the assumption was always that vscale is an integer.
> Representing it as a fraction would need code changes for sure, but
> also a re-evaluation of the assumptions.
>
> I'm copying some SVE and LV people to give a more informed opinion.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
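As a rough illustration of the arithmetic behind Hanna's argument, here is a small Python model (an editorial sketch, not code from the thread or from LLVM). It assumes, as in Kai's table, that one LMUL=1 register holds vscale x 64 bits, so a VLEN = ELEN = 32 machine would need vscale = 1/2. The helper names `kai_table`, `lanes` and `bits` are hypothetical.

```python
from fractions import Fraction

LMULS = (1, 2, 4, 8)
ELEM_BITS = (64, 32, 16, 8)


def kai_table():
    """Known-minimum lane counts from Kai's table, keyed by (element bits, LMUL).

    Assumption: one LMUL=1 register holds vscale * 64 bits, so
    <vscale x N x iB> with N = LMUL * 64 / B fills LMUL registers.
    """
    return {(b, lmul): lmul * 64 // b for b in ELEM_BITS for lmul in LMULS}


def lanes(known_min, vscale):
    """Runtime lane count of <vscale x known_min x iB>."""
    return known_min * vscale


def bits(known_min, elem_bits, vscale):
    """Runtime size in bits of <vscale x known_min x iB>."""
    return known_min * elem_bits * vscale


# Hanna's size relationships for <vscale x 1 x i64> hold whatever vscale is.
for vs in (Fraction(1), Fraction(2), Fraction(4)):
    assert lanes(2, vs) == 2 * lanes(1, vs)    # nxv1i64 has half the lanes of nxv2i64
    assert bits(1, 64, vs) == bits(2, 32, vs)  # nxv1i64 is the same size as nxv2i32

# Kai's table on a VLEN = ELEN = 32 machine: vscale = 32 / 64.
vscale = Fraction(32, 64)
table = kai_table()
print(table[(32, 1)])                 # 2: <vscale x 2 x i32> is the LMUL=1 i32 type
print(lanes(table[(32, 1)], vscale))  # 1: one lane fills the 32-bit register
print(lanes(1, vscale))               # 1/2: <vscale x 1 x i32> has half a lane
```

The last line is the "half an element" case: the lane count is not a whole number, so there is nothing for a legalization strategy to map it onto.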
Hi,

thanks Hanna for pointing out the contradiction under this modeling.

I wonder if HwModes can help us here. I feel that in some way ELEN plays a role similar to RISC-V's XLEN. This way we could assign different value types to the register classes associated with the different LMUL values.

E.g.
ELEN=32: the register class for base registers (i.e. LMUL=1) could include nxv1i32, nxv1f32, nxv2i16, nxv2i8, etc.
ELEN=64: the register class for base registers could include nxv1i64, nxv1f64, nxv2i32, nxv2f32, ... but does not have to include nxv1f32, nxv1i32, nxv2i16, etc. (my understanding is that there is an ongoing proposal to efficiently allow manipulating such values as if they were subregisters of the base registers, but I'm ignoring that for now).

Kind regards,

-- 
Roger Ferrer Ibáñez
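To make the "different legal types per ELEN" idea a little more concrete, the following Python sketch derives, for a given ELEN, the integer vector types that exactly fill 1, 2, 4 or 8 registers. It is only an illustration: it does not use TableGen or LLVM's actual HwMode machinery, it assumes vscale = VLEN / ELEN as in the ELEN=32 table quoted earlier, and it covers only integer types that fill whole registers (so it omits the f32/f64 types and smaller types such as nxv2i8 that Roger also lists). The names `legal_types`, `vscale_for` and `MODES` are hypothetical.

```python
LMULS = (1, 2, 4, 8)
ELEM_BITS = (64, 32, 16, 8)


def vscale_for(vlen, elen):
    """vscale under the ELEN-dependent scheme (assumes VLEN is a multiple of ELEN)."""
    if vlen % elen != 0:
        raise ValueError("VLEN must be a multiple of ELEN in this sketch")
    return vlen // elen


def legal_types(elen):
    """Integer types that exactly fill LMUL registers of vscale * ELEN bits each."""
    return {
        lmul: [f"nxv{lmul * elen // b}i{b}" for b in ELEM_BITS if b <= elen]
        for lmul in LMULS
    }


# One entry per "hardware mode", loosely mirroring the HwModes idea.
MODES = {elen: legal_types(elen) for elen in (32, 64)}

print(MODES[32][1])        # ['nxv1i32', 'nxv2i16', 'nxv4i8']
print(MODES[64][1])        # ['nxv1i64', 'nxv2i32', 'nxv4i16', 'nxv8i8']
print(vscale_for(32, 32))  # 1: the VLEN = ELEN = 32 case gets an integer vscale
```

Note that the single-register i32 type differs between the two modes (nxv1i32 versus nxv2i32), which is exactly the trade-off debated in the rest of the thread.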
On Tue, 7 Apr 2020 at 14:03, Roger Ferrer Ibáñez <rofirrim at gmail.com> wrote:
>
> Hi,
>
> thanks Hanna for pointing out the contradiction under this modeling.
>
> I wonder if HwModes can help us here. I feel that in some way ELEN plays a role similar to RISC-V's XLEN. This way we could assign different value types to the register classes associated with the different LMUL values.
>
> E.g.
> ELEN=32: the register class for base registers (i.e. LMUL=1) could include nxv1i32, nxv1f32, nxv2i16, nxv2i8, etc.
> ELEN=64: the register class for base registers could include nxv1i64, nxv1f64, nxv2i32, nxv2f32, ... but does not have to include nxv1f32, nxv1i32, nxv2i16, etc. (my understanding is that there is an ongoing proposal to efficiently allow manipulating such values as if they were subregisters of the base registers, but I'm ignoring that for now).

Hi Roger,

HwModes indeed seem like a nice way to model "different legal types depending on ELEN" in the backend. However, at the moment it seems there's still no consensus that this is the route we should/need to take. Maybe we should settle that question before giving these implementation details more thought?

Kind regards,
Hanna
On Tue, 7 Apr 2020 at 12:51, Hanna Kruppe <hanna.kruppe at gmail.com> wrote:
> > 1. is LMUL always a multiple of ELEN?
> This happens to be true (at least in the current spec, disregarding
> some in-progress proposals) just because both are powers of two and
> the largest possible LMUL equals the smallest possible ELEN (8), but I
> don't think there is any meaning to be found in this observation. The
> two values govern unrelated aspects of the vector unit.

Sorry, I meant a multiple of the basic types. But you have answered my question. :)

> > 2. Is this fixed on the hardware, depending on the actual lengths, or
> > is this dynamically set by software (on a register or status flag)?
> > 2a. If dynamic, can it change from program to program? Function to function?
> It's not clear whether by "this" you mean ELEN, LMUL, or something
> else. ELEN is fixed in hardware. LMUL is a property of each individual
> instruction.

Sorry again, "this" as in both ELEN and LMUL and their relationship. Ack.

> I don't know what "vscale wouldn't apply" is supposed to mean.

Legalisation-wise, you got it right: something like <n x 0.5 x i64> is invalid and gets converted to <n x 1 x i32>, which is effectively what it is. By "wouldn't apply" I meant "what would be the point of having half-scale on a type that needs to be broken in half", and thus made whole. You explain it better below, so ignore it for now.

> But how? If we take Kai's table as gospel and look at a VLEN = ELEN = 32
> machine, the vector type <vscale x 2 x i32> is supposed to map to a
> single vector register, which is 32b small, and thus <vscale x 2 x
> i32> would have just one element in this context (matching the "vscale
> = 1/2" intuition). To be consistent with this, <vscale x 1 x i32>
> would have to contain just *half* an element. This is not something
> any legalization strategy can achieve, because it is a fundamentally
> impossible notion. So we end up in a situation where some types are
> not just illegal and have to be legalized, but are contradictory and
> can't be legalized in any meaningful way.

Right, we have faced that problem before with non-scalable vector extensions, for example when vectorising 3 operations in a 4-wide vector and adding an undef in the last lane. Many years ago, that wasn't possible in the general case. And if you look at register aliasing (VFP and NEON in ARMv7), we had the idea of a different number of elements in the same register, depending on how you look at it.

I'm not proposing to create all combinations of half-vscale shadowing, but perhaps adding half-length types as valid and lowering them in a special way could be much simpler than changing the interpretation of vscale.

Also, I'm playing devil's advocate, so don't take my comments as a rejection of your proposal; I'm just trying to understand where you are coming from.

cheers,
--renato
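Renato's fixed-width example (three useful lanes in a four-lane vector, with an undef in the last lane) can be sketched in plain Python. This is a hypothetical model, not LLVM code: `widen` and `add_widened` are made-up names, and `None` stands in for undef.

```python
def widen(values, legal_width):
    """Pad a vector to a multiple of legal_width lanes; the pad lanes model undef."""
    if legal_width <= 0:
        raise ValueError("legal_width must be positive")
    padded_len = -(-len(values) // legal_width) * legal_width  # ceiling division
    return list(values) + [None] * (padded_len - len(values))


def add_widened(a, b, legal_width):
    """Add two short vectors by operating on their widened forms."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    wa, wb = widen(a, legal_width), widen(b, legal_width)
    # The "hardware" add runs on every lane; undef lanes stay undef.
    out = [x + y if x is not None and y is not None else None for x, y in zip(wa, wb)]
    # Only the original lanes are observable after narrowing back.
    return out[: len(a)]


print(widen([1, 2, 3], 4))                      # [1, 2, 3, None]
print(add_widened([1, 2, 3], [10, 20, 30], 4))  # [11, 22, 33]
```

A scalable analogue would widen something like <vscale x 1 x i32> into a register-sized <vscale x 2 x i32> with the upper lanes undef. That only works when the narrow type has a whole number of lanes, which is the property a fractional vscale would take away.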
Thanks, Hanna. On Tue, Apr 7, 2020 at 7:51 PM Hanna Kruppe <hanna.kruppe at gmail.com> wrote:> Hi all, > > On Tue, 7 Apr 2020 at 11:04, Renato Golin via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > > On Tue, 7 Apr 2020 at 09:30, Kai Wang via llvm-dev > > <llvm-dev at lists.llvm.org> wrote: > > > LMUL = 1 LMUL = 2 LMUL = 4 > LMUL = 8 > > > int64_t | vscale x 1 x i64 | vscale x 2 x i64 | vscale x 4 x i64 | > vscale x 8 x i64 > > > int32_t | vscale x 2 x i32 | vscale x 4 x i32 | vscale x 8 x i32 | > vscale x 16 x i32 > > > int16_t | vscale x 4 x i16 | vscale x 8 x i16 | vscale x 16 x i16 | > vscale x 32 x i16 > > > int8_t | vscale x 8 x i8 | vscale x 16 x i8 | vscale x 32 x i8 | > vscale x 64 x i8 > > > > > > We have another architecture parameter, ELEN, which means the maximum > size of a single vector element in bits. > > > > Hi, > > > > For my own education, some quick questions: > > > > 1. is LMUL always a multiple of ELEN? > > > This happens to be true (at least in the current spec, disregarding > some in-progress proposals) just because both are powers of two and > the largest possible LMUL equals the smallest possible ELEN (8), but I > don't think there is any meaning to be found in this observation. The > two values govern unrelated aspects of the vector unit. > > > 2. Is this fixed on the hardware, depending on the actual lengths, or > > is this dynamically set by software (on a register or status flag)? > > 2a. If dynamic, can it change from program to program? Function to > function? > > > It's not clear whether by "this" you mean ELEN, LMUL, or something > else. ELEN is fixed in hardware. LMUL is a property of each individual > instruction. Most instructions take it from a control register, a few > encode it in the instruction as an immediate, but in any case it needs > to be statically determined (on a per-instruction basis) to be able to > allocate registers. 
This is not just a constraint for > compiler-generated code, but also for all hand-written assembly code > I've seen or can imagine. > > > > > > We hope the type system could be consistent under ELEN = 32 and ELEN > 64. However, vscale may be a fractional value under ELEN = 32 in the above > type system. When ELEN = 32, i64 is an invalid type (we could ignore the > first row for ELEN = 32) and vscale may become 1/2 on run time to fit the > architecture (if the vector register only has 32 bits). > > > > Do you mean ELEN=32 like this? > > int32_t | vscale x 1 x i32 | vscale x 2 x i32 | vscale x 4 x i32 | > > vscale x 8 x i32 > > int16_t | vscale x 2 x i16 | vscale x 4 x i16 | vscale x 8 x i16 | > > vscale x 16 x i16 > > int8_t | vscale x 4 x i8 | vscale x 8 x i8 | vscale x 16 x i8 | > > vscale x 32 x i8 > > > > If the type is invalid, you would need to legalise it, and in that > > case create some cluttered accessors (via insert/extract element) and > > possibly use intrinsics to expose underlying instructions that can > > deal with it. > > > > Perhaps I'm not clear on what you need, but vscale is supposed to be > > the number of valid elements (lanes), and given i64 is invalid, vscale > > wouldn't apply? > > > I don't know what "vscale wouldn't apply" is supposed to mean. Whether > it's legal or not, you can write LLVM IR using (for example) the type > <vscale x 1 x i64> even if the target doesn't natively support it. The > purpose of legalization is to make sure that results in the behavior > the type is supposed to have. For <vscale x 1 x i32>, this means among > other things: > > - it has the same number of elements as <vscale x 1 x i32>, but each > element is twice as big > - it has half as many elements (each of the same size) as <vscale x 2 x > i64> > - its total size in bits is the same as <vscale x 2 x i32> > > I think that focusing on the completely illegal i64 might obscure the > real problem I see with the fractional vscale concept. 
Let's look at > <vscale x 1 x i32> instead. The elements are clearly legal in this > context, even in some vector types, but the <vscale x 1 x i32> type is > absent from Kai's table. This makes sense: the same vector register > fits 2x as many i32 elements as i64 elements, so if you start with > <vscale x 1 x i64> mapping to a single register, then <vscale x 2 x > i32> is the same size and fits in the same register class, while > <vscale x 1 x i32> is too small and must be legalized somehow. > > But how? If we take Kai's table as gospel and look at a VLEN = ELEN > 32 machine, the vector type <vscale x 2 x i32> is supposed to map to a > single vector register, which is 32b small, and thus <vscale x 2 x > i32> would have just one element in this context (matching the "vscale > = 1/2" intuition). To be consistent with this, <vscale x 1 x i32> > would have be contain just *half* an element. This is not something > any legalization strategy can achieve, because it is a fundamentally > impossible notion. So we end up in a situation where some types are > not just illegal and have to be legalized, but are contradictory and > can't be legalized in any meaningful way. > > I don't think LLVM can/should support this kind of contradiction. Some > types have to be legalized, sometimes the legalization is not > efficient, sometimes it's not even implemented, that's all fine. But > letting some targets decide that <vscale x 1 x i32> is a fundamentally > impossible type to even assign a meaning to... that seems > unprecedented and contrary to the philosophy of LLVM IR as reasonably > target-independent IR. >If we apply the type system pointed out by Renato, is the vector type <vscale x 1 x i16> legal? If we decide that <vscale x 1 x i16> is a fundamentally impossible type, does it contrary to the philosophy of LLVM IR as reasonably target-independent IR? 
I do not get the point of your argument.> > The obvious solution is to use a different set of legal vector types > (and thus, a different interpretation of vscale) depending on the > largest legal element type (ELEN in RISC-V jargon). This corresponds > to the table for ELEN=32 that Renato gave above. Kai's proposal is > intended to avoid this, and I can understand the desire for that, but > it really seems like the lesser evil to me. >The problem of defining a different type system depending on the largest legal element type (ELEN in RISC-V jargon) is that they are not compatible. I assume that programs compiled under ELEN = 32 could be run on ELEN = 64 machines. It should be possible to link ELEN = 32 objects with ELEN = 64 objects. If we use the type <vscale x 1 x i32> under ELEN = 32, there is no corresponding type under ELEN = 64 for <vscale x 1 x i32> (look up in my table). It seems an illegal type under ELEN = 64. Does it follow the philosophy of target independent IR? I hope we could design an unified type system for different ELEN. However, the vscale may be fractional on run time under some circumstances (VLEN 32, ELEN = 32) in my proposal. That is why I wonder to know whether the fractional vscale is matter or not. Thanks, Kai> Best regards > Hanna > > > > > Is there any problem to assume vscale to be fractional under some > circumstances? vscale should be an unknown value when compiling. So, it > should have no impact on code generation and optimization. The relationship > between types is correct regardless vscale’s value. Is there anything I > missed? > > > > I believe the assumption was always that vscale is an integer. > > Representing it as a fraction would need code change for sure, but > > also reevaluate the assumptions. > > > > I'm copying some SVE and LV people to give a more informed opinion. 
> > > > cheers, > > --renato > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
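To make the disagreement concrete, here is a minimal Python sketch of the two mappings being discussed. It assumes (1) that in Kai's unified table <vscale x 1 x i64> fills exactly one LMUL=1 register, so vscale = VLEN / 64, and (2) that in the ELEN-dependent alternative one LMUL=1 register holds <vscale x 1 x iELEN>, so vscale = VLEN / ELEN. The helper names (`vscale`, `element_count`, `lmul_of`, `is_legal`) are hypothetical and only model the arithmetic; they are not LLVM APIs.

```python
from fractions import Fraction

# Assumption: only integer LMUL values (fractional LMUL proposal ignored).
LEGAL_LMULS = {1, 2, 4, 8}

def vscale(vlen_bits: int, bits_per_vscale: int) -> Fraction:
    """vscale if one LMUL=1 register holds vscale * bits_per_vscale bits.
    Kai's unified table: bits_per_vscale = 64; ELEN-dependent table: = ELEN."""
    return Fraction(vlen_bits, bits_per_vscale)

def element_count(vlen_bits: int, bits_per_vscale: int, min_elts: int) -> Fraction:
    """Runtime element count of <vscale x min_elts x iK> (independent of K)."""
    return vscale(vlen_bits, bits_per_vscale) * min_elts

def lmul_of(min_elts: int, elt_bits: int, bits_per_vscale: int) -> Fraction:
    """Register-group size that <vscale x min_elts x i{elt_bits}> occupies."""
    return Fraction(min_elts * elt_bits, bits_per_vscale)

def is_legal(min_elts: int, elt_bits: int, elen: int, bits_per_vscale: int) -> bool:
    """Legal if the element fits in ELEN and the type fills a whole register group."""
    return elt_bits <= elen and lmul_of(min_elts, elt_bits, bits_per_vscale) in LEGAL_LMULS

if __name__ == "__main__":
    # Kai's table on a VLEN = ELEN = 32 machine: vscale becomes 1/2.
    print(vscale(32, 64))                   # 1/2
    print(element_count(32, 64, 2))         # <vscale x 2 x i32>: 1
    print(element_count(32, 64, 1))         # <vscale x 1 x i32>: 1/2
    # The same IR type occupies a different LMUL under the two tables.
    print(lmul_of(2, 32, 64), lmul_of(2, 32, 32))            # 1 2
    # <vscale x 1 x i32>: legal in the ELEN = 32 table, LMUL 1/2 in Kai's.
    print(is_legal(1, 32, 32, 32), is_legal(1, 32, 64, 64))  # True False
```

Under the ELEN-dependent mapping, vscale stays a whole number on this machine (32 / 32 = 1), at the cost of the same IR type denoting a different register-group size than it would in an ELEN = 64 object, which is the compatibility concern raised above.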
On Wed, 8 Apr 2020 at 04:23, Kai Wang <kai.wang at sifive.com> wrote:> If we apply the type system pointed out by Renato, is the vector type <vscale x 1 x i16> legal? If we decide that <vscale x 1 x i16> is a fundamentally impossible type, does it run contrary to the philosophy of LLVM IR as reasonably target-independent IR? I do not get the point of your argument.

Hi Kai, Don't worry about target-independent IR in your design of intermediate passes or lowering. By the time the front-end lowers to LLVM IR, it already has often-irreversible target-specific knowledge in it. If by some stroke of luck that doesn't happen, then using "<vscale x anything>" is enough indication that you should not try to lower that onto a target it wasn't specifically aimed at. No one expects the middle-end to be target-neutral. That's the whole point of constantly asking target-specific machinery (like TTI) about what's possible or what's "good" and what's not. More importantly, the closer you are to the end of the pass pipeline, the closer the IR is to machine IR. It's not uncommon, and often expected, to see "just the right amount of shuffles" to match lowering patterns into MIR and then Asm. IIRC, OpenCL or some other parallel-compute/graphics-oriented pipeline does use odd vector shapes and legalise them later on. I may be severely outdated in my opinion, and I'm happy to be corrected, but I don't think it would be totally egregious to carry on with (whole-numbered) vector shapes that aren't strictly legal, as long as you guarantee that *any* such pattern gets correctly legalised by the lowering. If you can also make the adversarial cases perform well, that's a bonus, not a target. Hope this helps. cheers, --renato
On Tue, 7 Apr 2020 at 16:09, Renato Golin <rengolin at gmail.com> wrote:> > On Tue, 7 Apr 2020 at 12:51, Hanna Kruppe <hanna.kruppe at gmail.com> wrote: > > > 1. is LMUL always a multiple of ELEN? > > This happens to be true (at least in the current spec, disregarding > > some in-progress proposals) just because both are powers of two and > > the largest possible LMUL equals the smallest possible ELEN (8), but I > > don't think there is any meaning to be found in this observation. The > > two values govern unrelated aspects of the vector unit. > > Sorry, I meant multiple of basic types. But you have answered my question. :) > > > > 2. Is this fixed on the hardware, depending on the actual lengths, or > > > is this dynamically set by software (on a register or status flag)? > > > 2a. If dynamic, can it change from program to program? Function to function? > > It's not clear whether by "this" you mean ELEN, LMUL, or something > > else. ELEN is fixed in hardware. LMUL is a property of each individual > > instruction. > > Sorry again, "this" as in both ELEN and LMUL and their relationship. Ack. > > > I don't know what "vscale wouldn't apply" is supposed to mean. > > Legalisation-wise, you got it right, like <n x 0.5 x i64> is invalid and > gets converted to <n x 1 x i32>, which it is. > > "Wouldn't apply" as in "what would be the point of having half-scale > on a type that needs to be broken in half", and thus making it whole. > You explain better below, so ignore it for now. > > > But how? If we take Kai's table as gospel and look at a VLEN = ELEN = 32 machine, the vector type <vscale x 2 x i32> is supposed to map to a > > single vector register, which is 32b small, and thus <vscale x 2 x > > i32> would have just one element in this context (matching the "vscale > > = 1/2" intuition). To be consistent with this, <vscale x 1 x i32> > > would have to contain just *half* an element.
This is not something > > any legalization strategy can achieve, because it is a fundamentally > > impossible notion. So we end up in a situation where some types are > > not just illegal and have to be legalized, but are contradictory and > > can't be legalized in any meaningful way. > > Right, we have faced that problem before on non-scalable vector extensions. > > For example, vectorising 3 operations in a 4-wide vector and adding an > undef in the last lane. > > It didn't use to be possible to do that, many years ago, as a general > case. But if you look at register aliasing (VFP and NEON in ARMv7), we > had the idea of a different number of elements on the same register, > depending on how you look. > > I'm not proposing to create all combinations of half-vscale shadowing, > but perhaps adding half-length types as valid and lowering them in a > special way could be much simpler than changing the interpretation > of vscale.

[re-sending because I dropped the list -- sorry for the extra copy, Renato!] I don't see how the situation you mention is comparable. Legalization for e.g. <3 x i32> was not implemented at first, but as demonstrated by the fact that it *was* implemented later, there's no conceptual problem with legalizing that kind of type. You don't even have to legalize such types in vector registers; three scalar registers work fine (you can even do that at the IR level). For <vscale x 1 x i32> with a fractional value of vscale, there are several conceivable ways to "legalize" this type, but none of them work. Legalization (and codegen in general) does not know whether the machine code will eventually run on a chip with vector registers so small that vscale works out to 1/2, but it has to choose some legalization strategy.
I can imagine several approaches to this, but since the actual value of vscale is not known at compile time, it has to map the illegal scalable vector types to the vector registers in some way that ensures there's enough space even when vscale is very large in some executions of the program. Depending on how exactly you do that, the generated code might behave differently when running on a vscale == 1/2 machine: e.g., you might end up with a vector register holding *one* i32 element or a vector register holding *zero* i32 elements (i.e., the sole lane of the 32-bit vector register is masked out). Other approaches might result in yet another behavior, such as a hardware fault, but crashes and other immediate problems aside, you're going to end up with some whole number of i32 values. That's a problem. If <vscale x 1 x i32> ends up having one element, and <vscale x 2 x i32> also has one (= 2 * 0.5) element, then that's wrong: the latter type must have twice as many elements as the former (one example where this matters: split_low / split_high / concat shuffle patterns). The second option, a vector with *zero* elements, is just as wrong, if not worse. It's not that a correct legalization exists but is too annoying to implement, or that one might exist but I'm too lazy to work it out. Nor are we running into a limitation or oddity of the RISC-V vector ISA in particular. It's simply that, if you set vscale == 0.5, then by the way scalable vector types work (vscale * N elements for a constant N), some vector types that can be written in the IR would need to have a fractional number of elements to be consistent with the other scalable vector types. As that is not possible (not even conceptually), whatever code you emit to try to legalize such a type will end up being wrong in some respect. So if we decided to support fractional vscale, we couldn't call these types "illegal".
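The contrast between the two cases can be sketched in Python. A fixed-width <3 x i32> has a whole number of lanes, so widening it to four lanes (with a padding lane whose value is never observed, like undef) computes the same result as three scalar adds. For <vscale x 1 x i32> with vscale = 1/2, the sketch checks the concat invariant mentioned above against every candidate whole-number lane count. Modeling "legalization" as picking an integer lane count is a simplifying assumption of this sketch, and the function names are hypothetical.

```python
import math
from fractions import Fraction

MASK32 = 0xFFFFFFFF

def add_v3i32_scalarized(a, b):
    """Reference semantics: three independent 32-bit wrapping adds."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def add_v3i32_widened(a, b, pad=0):
    """Widen <3 x i32> to <4 x i32>. The padding lane may hold any value
    because it is dropped afterwards, so it behaves like undef."""
    wide_a, wide_b = list(a) + [pad], list(b) + [pad]
    wide = [(x + y) & MASK32 for x, y in zip(wide_a, wide_b)]
    return wide[:3]

# Assumption: Kai's table on a VLEN = ELEN = 32 machine.
VSCALE = Fraction(1, 2)

def concat_invariant_holds(lanes_v1, lanes_v2) -> bool:
    """concat(<vscale x 1 x i32>, <vscale x 1 x i32>) must be a <vscale x 2 x i32>."""
    return lanes_v2 == 2 * lanes_v1

if __name__ == "__main__":
    a, b = [1, 2, MASK32], [10, 20, 1]
    print(add_v3i32_widened(a, b) == add_v3i32_scalarized(a, b))  # True

    lanes_v2 = VSCALE * 2   # the legal <vscale x 2 x i32> has exactly 1 lane
    exact_v1 = VSCALE * 1   # <vscale x 1 x i32> "should" have 1/2 lane
    for name, rnd in (("round down", math.floor), ("round up", math.ceil)):
        n = rnd(exact_v1)
        print(name, n, concat_invariant_holds(n, lanes_v2))
    # No whole number of lanes satisfies the invariant at all.
    print(any(concat_invariant_holds(n, lanes_v2) for n in range(64)))  # False
```

Rounding down gives the zero-element vector and rounding up gives the one-element vector described above; the final check shows that no integer lane count repairs the relationship, whereas the fixed-width case works with any padding value.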
In LLVM parlance, illegal types can be used in LLVM IR, and targets aspire to turn them into something that works correctly, even if it's very inefficient. Sometimes a legalization is unimplemented or buggy, but these problems can be patched, and this has often happened in the past. With fractional vscale, the situation is quite different: nobody will ever be able to use certain scalable vector types on the target in question, because they can't be legalized even in principle. In contrast, scalable vector types that are illegal because they're too large (e.g. <vscale x 32 x i64>) can be legalized just fine. For example, you could split them across a sufficiently large (fixed) number of vector registers and maybe spill them to the stack for inserts/extracts/shuffles/etc. that cross lanes or access elements at data-dependent positions. Implementing this will probably not be a priority for any target, but it can be implemented whenever it does become important to someone. I hope this lengthy explanation helps you see where I'm coming from. Thanks, Hanna

> Also, I'm acting as devil's advocate, so don't take my comments as a > rejection of your proposal, I'm just trying to understand where you > are coming from. > > cheers, > --renato
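The splitting strategy for an oversized type such as <vscale x 32 x i64> can be sketched as repeated halving until each part fits in one register group. This sketch assumes Kai's mapping (<vscale x 1 x i64> fills one LMUL=1 register) and a maximum register-group size of LMUL=8; `split_scalable` and `split_values` are hypothetical helpers, not LLVM functions. Because every part is itself a scalable type with a whole-number minimum element count, the split is valid for any integer vscale.

```python
from typing import List, Tuple

def split_scalable(min_elts: int, elt_bits: int, max_lmul: int = 8,
                   bits_per_vscale: int = 64) -> Tuple[int, int]:
    """Halve <vscale x min_elts x i{elt_bits}> until each part occupies at most
    max_lmul registers. Returns (number_of_parts, min_elts_per_part).
    Assumption: one LMUL=1 register holds vscale * bits_per_vscale bits."""
    parts = 1
    while min_elts * elt_bits > max_lmul * bits_per_vscale:
        if min_elts % 2:
            raise ValueError("only power-of-two splitting is sketched here")
        min_elts //= 2
        parts *= 2
    return parts, min_elts

def split_values(values: List[int], parts: int) -> List[List[int]]:
    """Split the runtime lanes of a vector into `parts` equal, contiguous pieces."""
    size = len(values) // parts
    return [values[i * size:(i + 1) * size] for i in range(parts)]

if __name__ == "__main__":
    parts, per_part = split_scalable(32, 64)   # <vscale x 32 x i64>
    print(parts, per_part)                     # 4 8, i.e. four <vscale x 8 x i64>
    for vs in (1, 2, 4):                       # any whole-number vscale works
        lanes = list(range(32 * vs))
        pieces = split_values(lanes, parts)
        assert all(len(p) == per_part * vs for p in pieces)
        assert sum(pieces, []) == lanes        # concatenation restores the original
```

The same halving would fail for a fractional vscale only in the sense discussed above: the pieces would need non-integer lane counts, whereas here every piece has `per_part * vscale` lanes, which is always a whole number.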
On Wed, 8 Apr 2020 at 05:23, Kai Wang <kai.wang at sifive.com> wrote:> > [snip] > > If we apply the type system pointed out by Renato, is the vector type <vscale x 1 x i16> legal? If we decide that <vscale x 1 x i16> is a fundamentally impossible type, does it run contrary to the philosophy of LLVM IR as reasonably target-independent IR? I do not get the point of your argument. >

<vscale x 1 x i16> would be illegal, but like other illegal types, it can be legalized. It does not run into the problem I see with fractional vscale, since the number of elements each vector type is supposed to have is still a whole number. For example, it could be legalized by using a full vector register as for <vscale x 1 x i32> and sign-extending or zero-extending each element as needed by the operations performed on it. Another possibility is using a full vector register with SEW=16, but computing `vl` as for SEW=32, which effectively means using only the lower half of the vector register. Both options ensure that we always correctly (if slowly) emulate a vector containing as many i16 elements as <vscale x 1 x i32> has i32 elements. To be clear: in LLVM jargon, a type being "illegal" does not mean that the type is not supposed to be used. It only means that the type isn't directly supported by the hardware, but can be mapped to the things the hardware does support with extra effort and at the expense of some performance. For example, i64 is illegal on typical 32-bit targets, but clang happily uses i64 for C types like `long` or `long long` (depending on ABI), and backends support this. Other examples are the float and double types on any target without an FPU (including RISC-V without the F/D extensions). I hope this clarifies the distinction I made before. Kind regards, Hanna PS: I'm ignoring the "fractional LMUL" proposal in this discussion; I hope that's okay with you.
If it were adopted, it would give us a larger set of legal types, but all the concepts we're discussing here would still apply to other types, so let's stick with LMUL >= 1 to avoid confusion.

>> >> >> The obvious solution is to use a different set of legal vector types >> (and thus, a different interpretation of vscale) depending on the >> largest legal element type (ELEN in RISC-V jargon). This corresponds >> to the table for ELEN=32 that Renato gave above. Kai's proposal is >> intended to avoid this, and I can understand the desire for that, but >> it really seems like the lesser evil to me. > > > The problem of defining a different type system depending on the largest legal element type (ELEN in RISC-V jargon) is that they are not compatible. I assume that programs compiled under ELEN = 32 could be run on ELEN = 64 machines. It should be possible to link ELEN = 32 objects with ELEN = 64 objects. If we use the type <vscale x 1 x i32> under ELEN = 32, there is no corresponding type under ELEN = 64 for <vscale x 1 x i32> (look up in my table). It seems an illegal type under ELEN = 64. Does it follow the philosophy of target independent IR? > > I hope we could design an unified type system for different ELEN. However, the vscale may be fractional on run time under some circumstances (VLEN = 32, ELEN = 32) in my proposal. That is why I wonder to know whether the fractional vscale is matter or not. > > Thanks, > Kai > >> >> Best regards >> Hanna >> >> >> > > Is there any problem to assume vscale to be fractional under some circumstances? vscale should be an unknown value when compiling. So, it should have no impact on code generation and optimization. The relationship between types is correct regardless vscale’s value. Is there anything I missed? >> > >> > I believe the assumption was always that vscale is an integer. >> > Representing it as a fraction would need code change for sure, but >> > also reevaluate the assumptions.
>> > >> > I'm copying some SVE and LV people to give a more informed opinion. >> > >> > cheers, >> > --renato
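Both legalizations Hanna describes for illegal-but-meaningful types can be modeled lane by lane. The Python sketch below assumes Renato's ELEN = 32 table, where <vscale x 1 x i32> fills one LMUL=1 register (vscale = VLEN / 32). It checks that holding each i16 element in a sign-extended 32-bit lane and truncating after a 32-bit add matches i16 wraparound semantics, and, for the scalar analogy, that a 64-bit add can be expanded into 32-bit operations plus a carry. `vlmax` uses the RVV relation VLMAX = LMUL * VLEN / SEW; the other helpers are hypothetical models, not backend code.

```python
from typing import List

def sext16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed i16."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def vlmax(vlen: int, sew: int, lmul: int = 1) -> int:
    """RVV: VLMAX = LMUL * VLEN / SEW."""
    return lmul * vlen // sew

def add_nxv1i16_reference(a: List[int], b: List[int]) -> List[int]:
    """Semantics of an add on <vscale x 1 x i16>: lane-wise i16 wraparound."""
    return [sext16(x + y) for x, y in zip(a, b)]

def add_nxv1i16_promoted(a: List[int], b: List[int]) -> List[int]:
    """Promotion: keep each i16 in an i32 lane (sign-extended), add in
    32 bits, then truncate back to 16 bits."""
    wide_a = [sext16(x) for x in a]
    wide_b = [sext16(x) for x in b]
    wide_sum = [(x + y) & 0xFFFFFFFF for x, y in zip(wide_a, wide_b)]
    return [sext16(s) for s in wide_sum]

def add_i64_expanded(a: int, b: int) -> int:
    """Unsigned 64-bit add using only 32-bit halves and a carry bit."""
    mask = 0xFFFFFFFF
    lo = (a & mask) + (b & mask)
    carry = lo >> 32
    hi = ((a >> 32) + (b >> 32) + carry) & mask
    return (hi << 32) | (lo & mask)

if __name__ == "__main__":
    vlen = 128
    # Assumption (ELEN = 32 table): <vscale x 1 x i32> has VLMAX(SEW=32) lanes,
    # and <vscale x 1 x i16> must have the same number of lanes.
    lanes = vlmax(vlen, sew=32)
    print(lanes, vlmax(vlen, sew=16))   # 4 8: SEW=16 with vl as for SEW=32 uses half
    a = [32767, -1, 100, -32768][:lanes]
    b = [1, -1, 200, -1][:lanes]
    print(add_nxv1i16_promoted(a, b) == add_nxv1i16_reference(a, b))  # True
    print(hex(add_i64_expanded(0xFFFFFFFF, 1)))  # 0x100000000
```

In both cases the number of elements is a whole number, so a slower but exact emulation exists; that is the property the fractional-vscale types lack.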