Hi all,

Is there a way to express in the IR that a pointer's value is a multiple of, say, 32 bytes, i.e. that the data the pointer points to is aligned to 32 bytes? I do not mean the natural alignment determined by the object's size. I am talking about a double* pointer for which I would like to explicitly state a larger alignment than the natural one.

I am trying to attach this information to a pointer that is a function argument, so that aligned (vector) loads get generated later. Here is pseudo code for what I am trying to accomplish:

    define void @foo( double* noalias %arg0 ) {
      // switching to C style
      for( int outer=0 ; outer < end ; ++outer ) {
        for( int inner=0 ; inner < 4 ; ++inner ) {
          arg0[ outer*4 + inner ] += arg0[ outer*4 + inner ];
        }
      }
    }

The loop vectorizer does its job on the 'inner' loop and generates vector loads/adds/stores for this code. However, the vector loads/stores are not as well aligned as they could be, which results in a lot of boilerplate code in codegen (lots of permutations). After vectorization the code looks similar to:

    define void @foo( double* noalias %arg0 ) {
      // switching to C style
      for( int outer=0 ; outer < end ; ++outer ) {
        vector.body:                 ; preds = %vector.body, %L5
          %index = phi i64 [ 0, %L5 ], [ %index.next, %vector.body ]
          %42 = add i64 %7, %index
          %43 = getelementptr double* %arg1, i64 %42
          %44 = bitcast double* %43 to <4 x double>*
          %wide.load = load <4 x double>* %44, align 8
          %132 = fadd <4 x double> %wide.load, %wide.load54
          %364 = getelementptr double* %arg0, i64 %93
          %365 = bitcast double* %364 to <4 x double>*
          store <4 x double> %329, <4 x double>* %365, align 8
      }
    }

One can see that if the pointee of %arg0 were initially aligned to 32 bytes, then, since the vectorizer operates on a loop with a fixed trip count of 4 and the size of double is 8 bytes, the vector loads and stores could be ideally aligned to 32 bytes (which on my target architecture would result in vector loads without additional permutations).

Is it somehow possible to achieve this? I am generating the IR with the builder, i.e. I am not coming from C or clang.

Thank you,
Frank
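The alignment argument in this message can be checked with plain address arithmetic. The following is a minimal C sketch, not code from the thread: the helper name `count_misaligned_rows` and the two constants are assumptions made for illustration, and it assumes `sizeof(double)` is 8. It counts the rows of 4 doubles whose start address is not a multiple of 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROW_DOUBLES = 4, VECTOR_ALIGN = 32 };

/* Hypothetical helper: counts the rows of 4 doubles whose first element
   does not start on a 32-byte boundary. Each row corresponds to one
   <4 x double> load in the vectorized loop. */
size_t count_misaligned_rows(const double *base, size_t rows)
{
    size_t misaligned = 0;
    for (size_t outer = 0; outer < rows; ++outer) {
        uintptr_t addr = (uintptr_t)(base + outer * ROW_DOUBLES);
        if (addr % VECTOR_ALIGN != 0)
            ++misaligned;
    }
    return misaligned;
}
```

If the base pointer is 32-byte aligned, every row start is too, because each outer iteration advances the address by 4 * 8 = 32 bytes. If the base is off by a single double, every row start is misaligned.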
On 3/25/2014 8:53 AM, Frank Winter wrote:
>
> One can see that if the initial alignment of the pointee of %arg0 was 32
> bytes and since the vectorizer operates on a loop with a fixed trip
> count of 4 and the size of double is 8 bytes, the vector loads and
> stores could be ideally aligned with 32 bytes (which on my target
> architecture would result in vector loads without additional permutations.
>
> Is it somehow possible to achieve this? I am generating the IR with the
> builder, i.e. I am not coming from C or clang.

If you are generating the loads and stores, you could just set the alignment to whatever you want, i.e. 32 bytes in your case.

I have wondered about this in the general case, when you simply want to have alignment information on the pointer, and not on the loads/stores. My idea was to invent a builtin, something like "assert_aligned", that does nothing other than manifest the alignment by the fact of its existence. For example:

    %argp = call i8* llvm.assert.aligned(%arg0, 32)

would state that the pointer %argp is aligned to 32 bytes, and that its value is the same as %arg0 at the place of the "call".

That was a while ago and maybe there are other ways of doing it now.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
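For comparison, GCC and Clang provide a C-level builtin, `__builtin_assume_aligned`, that behaves much like the builtin described above: it returns its pointer argument unchanged and lets the compiler assume the stated alignment. The sketch below is only an illustration of that idea and is not the proposed `llvm.assert.aligned`. The names `assert_aligned_32` and `double_in_place` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper. __builtin_assume_aligned is a GCC/Clang extension.
   Passing a pointer that is not actually 32-byte aligned is undefined
   behavior, so the caller must guarantee the alignment. */
double *assert_aligned_32(double *p)
{
    return (double *)__builtin_assume_aligned(p, 32);
}

/* Same computation as the loop nest in the original post, with the
   alignment stated once, at the pointer, rather than on each load. */
void double_in_place(double *arg0, size_t end)
{
    double *a = assert_aligned_32(arg0);
    for (size_t outer = 0; outer < end; ++outer)
        for (size_t inner = 0; inner < 4; ++inner)
            a[outer * 4 + inner] += a[outer * 4 + inner];
}
```

The returned pointer has the same value as the argument. Only the alignment knowledge attached to it differs, which is the property the "assert_aligned" idea relies on.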
On 03/25/2014 10:08 AM, Krzysztof Parzyszek wrote:
> On 3/25/2014 8:53 AM, Frank Winter wrote:
>>
>> One can see that if the initial alignment of the pointee of %arg0 was 32
>> bytes and since the vectorizer operates on a loop with a fixed trip
>> count of 4 and the size of double is 8 bytes, the vector loads and
>> stores could be ideally aligned with 32 bytes (which on my target
>> architecture would result in vector loads without additional
>> permutations.
>>
>> Is it somehow possible to achieve this? I am generating the IR with the
>> builder, i.e. I am not coming from C or clang.
>
> If you are generating the loads and stores, you could just set the
> alignment to whatever you want, i.e. 32 bytes in your case.

I can't. Take another look at the first piece of code. The loads occur in the 'inner' loop. The 32-byte alignment holds only for the first iteration, not for iterations 2, 3 and 4. So the alignment information cannot be supplied at the point of loading. Hence the idea of attaching the information right at the pointer's definition, i.e. at the argument.

>
> I have wondered about it in a general case, when you simply want to
> have an alignment information on the pointer, and not on
> loads/stores. My idea was to invent a builtin, something like
> "assert_aligned", that does nothing, other than manifest the alignment
> by the fact of its existence. For example:
> %argp = call i8* llvm.assert.aligned(%arg0, 32)
> would state that the pointer %argp is aligned to 32 bytes, and the
> value of it is the same as %arg0 at the place of the "call".
>
> That was a while ago and maybe there are other ways of doing it now.

That should be doable, although I am not sure whether an assertion or an annotation would be cleaner. There should already be a solution.

>
> -Krzysztof
>
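The point about the inner iterations can be made concrete with a small calculation. This is a minimal C sketch with a hypothetical helper, `provable_alignment`. Given a base pointer known to be aligned to `base_align` bytes (assumed to be a power of two) and a byte offset from it, the helper returns the largest power-of-two alignment that is guaranteed for the resulting address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest power-of-two alignment guaranteed for
   (base + byte_offset) when base is aligned to base_align bytes. */
size_t provable_alignment(size_t base_align, size_t byte_offset)
{
    /* Lowest set bit of the offset, i.e. the largest power of two
       that divides it (0 when the offset itself is 0). */
    size_t offset_align = byte_offset & (~byte_offset + 1);
    if (byte_offset == 0 || offset_align > base_align)
        return base_align;
    return offset_align;
}
```

With a 32-byte-aligned base and 8-byte doubles, the four inner iterations access byte offsets 0, 8, 16 and 24, for which the helper yields 32, 8, 16 and 8. Only the first scalar access is 32-byte aligned, so the alignment cannot be written on the scalar loads. It has to be known for the pointer itself, so that the vectorized access starting at offset 0 can use it.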