Andrew, your response highlights a naming problem in LLVM, which is that "array" and "vector" mean the same thing in normal computer language and compiler theory usage, so it is inconvenient and misleading within LLVM to give one a very specific meaning that is different from the other.... to the LLVM developers I would suggest using the term "packed" to refer to the type of data that the SPARC-VIS, the PPC-Altivec, and the Intel-mmx/sse (among others) instruction sets support. As far as I am aware not a single one of any of the above types of instruction sets allows the "subscripting" of packed data within a register (the Maspar computer had hardware that would allow subscripting of sub-elements of data within a larger/wider register, but it was the exception, not the rule, and it did not support any of the saturating arithmetic that is part-and-parcel of the packed data types in the currently existing "multi-media instruction sets"). sincerely, Peter Lawrence.> > Could you use an array of pointers instead? As far as I'm aware, the > vector types in LLVM are intended to abstract the vector units on > modern > CPUs (SSE, etc.) which generally support operations on only > integers and > floating point values, which may be why there isn't native support for > vectors of pointers. > > Andrew
Agreed, I too was wondering why we need both arrays and vectors. It goes against the grain, I think, of the structure typing system used by LLVM. For example, a vector of 4 floats and an array of 4 floats are structurally the same type. Would it be feasible in the future to consolidate the two types by allowing "vector" operations (add, multiply, etc.) on arrays where it makes sense, and doing away with the specialized vector types? Andrew On 02/14/2011 03:49 PM, Peter Lawrence wrote:> Andrew, > your response highlights a naming problem in LLVM, > which is that "array" and "vector" > mean the same thing in normal computer language and compiler theory > usage, so it is > inconvenient and misleading within LLVM to give one a very specific > meaning that is different > from the other.... > > to the LLVM developers I would suggest using the term "packed" to > refer to the type of > data that the SPARC-VIS, the PPC-Altivec, and the Intel-mmx/sse > (among others) instruction > sets support. > > As far as I am aware not a single one of any of the above types of > instruction sets allows the "subscripting" of packed data within a > register (the Maspar > computer had hardware that would allow subscripting of sub-elements > of data within > a larger/wider register, but it was the exception, not the rule, and > it did not support > any of the saturating arithmetic that is part-and-parcel of the > packed data types in > the currently existing "multi-media instruction sets"). > > > sincerely, > Peter Lawrence. >
On Mon, Feb 14, 2011 at 3:49 PM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:> Andrew, > your response highlights a naming problem in LLVM, > which is that "array" and "vector" > mean the same thing in normal computer language and compiler theory > usage, so it is > inconvenient and misleading within LLVM to give one a very specific > meaning that is different > from the other....The only place in CS I know of that "vector" is used to mean arbitrarily long sequence of elements is the C++ STL vector template class. So far as I know in the field of compilers "vectorization" always refers to lowering code down to use SIMD operations.> to the LLVM developers I would suggest using the term "packed" to > refer to the type of > data that the SPARC-VIS, the PPC-Altivec, and the Intel-mmx/sse > (among others) instruction > sets support."packed" is already used to refer to "packed structs", ie structs without any padding. Reid
Reid, although K&R's "The C Programming Language" does not use the term vector, note that Stroustrup's "The C++ Programming Language" does (way before STL was ever envisioned) ! Metcalf's "Effective Fortran-77" does not use the term vector, except that it is frequently used as the "name" for an array in array usage examples, but then in "Effective Fortran-90", when an array is used as an index into an array it is referred to as a "vector-valued-subscript", not an "array-valued-subscript" ! Guy Steel's "Common Lisp" quote: "one-dimensional arrays are called _vectors_ in Common Lisp...". the equivalent and interchangable usage of "array" and "vector" goes back to the beginning of computing.... when multiple items are placed within a single word of computer memory or a single register, it is "packed" data, regardless of whether that it is of uniform type and size (array/vector-like) or of different type and size (struct/record-like). best, -Peter Lawrence. On Feb 14, 2011, at 1:27 PM, Reid Kleckner wrote:> On Mon, Feb 14, 2011 at 3:49 PM, Peter Lawrence > <peterl95124 at sbcglobal.net> wrote: >> Andrew, >> your response highlights a naming problem in LLVM, >> which is that "array" and "vector" >> mean the same thing in normal computer language and compiler theory >> usage, so it is >> inconvenient and misleading within LLVM to give one a very specific >> meaning that is different >> from the other.... > > The only place in CS I know of that "vector" is used to mean > arbitrarily long sequence of elements is the C++ STL vector template > class. So far as I know in the field of compilers "vectorization" > always refers to lowering code down to use SIMD operations. > >> to the LLVM developers I would suggest using the term "packed" to >> refer to the type of >> data that the SPARC-VIS, the PPC-Altivec, and the Intel-mmx/sse >> (among others) instruction >> sets support. > > "packed" is already used to refer to "packed structs", ie structs > without any padding. > > Reid
Andrew, This is one area of LLVM that maps very nicely to our GPU architecture. A vector is a native data type on these architectures. For example, on AMD's GPUs, the native type is 4x32bit vector with sub-components. Each of the individual 32bits can be indexed separately, but not dynamically. This is a big difference from an array of 32bit values. So there are cases where the meaning of the two are different on modern hardware. Micah> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Andrew Clinton > Sent: Monday, February 14, 2011 1:14 PM > To: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] LLVMdev Digest, Vol 80, Issue 13 > > Agreed, I too was wondering why we need both arrays and vectors. It > goes against the grain, I think, of the structure typing system used by > LLVM. For example, a vector of 4 floats and an array of 4 floats are > structurally the same type. Would it be feasible in the future to > consolidate the two types by allowing "vector" operations (add, > multiply, etc.) on arrays where it makes sense, and doing away with the > specialized vector types? > > Andrew > > On 02/14/2011 03:49 PM, Peter Lawrence wrote: > > Andrew, > > your response highlights a naming problem in LLVM, > > which is that "array" and "vector" > > mean the same thing in normal computer language and compiler theory > > usage, so it is > > inconvenient and misleading within LLVM to give one a very specific > > meaning that is different > > from the other.... > > > > to the LLVM developers I would suggest using the term "packed" to > > refer to the type of > > data that the SPARC-VIS, the PPC-Altivec, and the Intel-mmx/sse > > (among others) instruction > > sets support. > > > > As far as I am aware not a single one of any of the above types of > > instruction sets allows the "subscripting" of packed data within a > > register (the Maspar > > computer had hardware that would allow subscripting of sub-elements > > of data within > > a larger/wider register, but it was the exception, not the rule, and > > it did not support > > any of the saturating arithmetic that is part-and-parcel of the > > packed data types in > > the currently existing "multi-media instruction sets"). > > > > > > sincerely, > > Peter Lawrence. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Peter Lawrence <peterl95124 at sbcglobal.net> writes:> Andrew, your response highlights a naming problem in LLVM, which is > that "array" and "vector" mean the same thing in normal computer > language and compiler theory usage, so it is inconvenient and > misleading within LLVM to give one a very specific meaning that is > different from the other....I think any sort of separation at all is counterproductive. The existing array/vector split is artificial. It would be better to have one array-like type and allow a reasonable set of operations on it. Target lowering should take care of legality issues. For best performance various transformation patterns will want to know about the target but that's true independent of vector types. Scalar optimizers want to know about targets too.> As far as I am aware not a single one of any of the above types of > instruction sets allows the "subscripting" of packed data within a > registerGiven what we know of Larrabee and speculating that the "Knights" family is likely a derivative of it, it's safe to assume that future Intel architectures will be much more like traditional vector machines. That means gather/scatter, element indexing, etc. The existing PINSR/PEXTR and shuffle instructions already allow a degree of element indexing. Note that the existing LLVM vector types already have insert/extract operators. Unifying array and vector and generalizing the result would open a lot of optimization opportunities. -Dave
Andrew Clinton <andrew at sidefx.com> writes:> Agreed, I too was wondering why we need both arrays and vectors. It > goes against the grain, I think, of the structure typing system used by > LLVM. For example, a vector of 4 floats and an array of 4 floats are > structurally the same type. Would it be feasible in the future to > consolidate the two types by allowing "vector" operations (add, > multiply, etc.) on arrays where it makes sense, and doing away with the > specialized vector types?Actually, I do not believe an array of 4 floats and a vector of 4 floats are the same. There are various bit packing and alignment semantics that differ. -Dave
Reid Kleckner <reid.kleckner at gmail.com> writes:> So far as I know in the field of compilers "vectorization" always > refers to lowering code down to use SIMD operations.Not true at all. "Vectorization" as mapping exclusicely to fixed-length SSE-style instructions is relatively new. Traditionally, vectorization has meant targeting vector registers with variable-length operations and masks. The SSE-style instructions sets are very limited in comparison. The Allen/Kennedy book is a good resource. http://www.amazon.com/Optimizing-Compilers-Modern-Architectures-Dependence-based/dp/1558602860 -Dave
Dave,> Unifying array and vector and generalizing the result would open a lot > of optimization opportunities.you would be piling an incomplete optimization on top of a pile of already incomplete optimizations... Vectorization in Fortran is already a "hard" problem, requiring alias analysis (always an incomplete and inaccurate (conservative) analysis) and loop-carried array subscript dependence analysis (which is equivalent to the Diophantine Equation problem in Mathematics which is in general not solvable, so you end up again with incomplete and inaccurate (conservative) analysis). Doing this in C (without first class array/ matrix types), and with even more alias analysis issues, makes it more problematic. The final straw with all these multimedia instruction sets is that they require large alignment on their "packed array" data types (even Intel which started out not requiring alignment with MMX (though unaligned data invoked hugh performance penalties), did evolve with SSE to what everyone else requires). data alignment can (and should!) be analyzed within the same algorithms that do alias analysis, and the analysis has the same inherent limitations. The problem is that in real world applications it is typical for array data slices (ie sections of arrays that are passed to subroutines to be processed) to be unaligned, even if the array base address is aligned, the bounds of the section being processed are in general not aligned. You end up with wanting to clone your algorithm kernel for various incoming alignments (just like memcpy, memcmp, etc are often cloned internally for various relative alignments of the incoming arguments), but with a kernel accessing N different arrays you end up needing 2**N clones, which in general is an impractical code-explosion. The reason I object to the use of "vector" and "simd" when describing these "packed data" multimedia instruction sets is that in practical reality the traditional vectorization optimization technology just does not apply. You can always come up with geewiz examples where it does, but you cannot make it work in the general case. No matter what fancy data shuffling/permuting/inserting/extracting instructions get added to MMX/SSE/SSE2, they will still not solve the data alignment problem, so the instruction sets remain incompatible with "traditional vector machines" where there was always one-data-item-per-HW-register and there was never any alignment issue. best, Peter Lawrence. On Feb 15, 2011, at 8:38 AM, David A. Greene wrote:> Peter Lawrence <peterl95124 at sbcglobal.net> writes: > >> Andrew, your response highlights a naming problem in LLVM, which is >> that "array" and "vector" mean the same thing in normal computer >> language and compiler theory usage, so it is inconvenient and >> misleading within LLVM to give one a very specific meaning that is >> different from the other.... > > I think any sort of separation at all is counterproductive. The > existing array/vector split is artificial. It would be better to have > one array-like type and allow a reasonable set of operations on it. > Target lowering should take care of legality issues. For best > performance various transformation patterns will want to know about > the > target but that's true independent of vector types. Scalar optimizers > want to know about targets too. > >> As far as I am aware not a single one of any of the above types of >> instruction sets allows the "subscripting" of packed data within a >> register > > Given what we know of Larrabee and speculating that the "Knights" > family > is likely a derivative of it, it's safe to assume that future Intel > architectures will be much more like traditional vector machines. > That > means gather/scatter, element indexing, etc. The existing PINSR/PEXTR > and shuffle instructions already allow a degree of element indexing. > Note that the existing LLVM vector types already have insert/extract > operators. > > Unifying array and vector and generalizing the result would open a lot > of optimization opportunities. > > -Dave