A fix for this is in r194876. Thanks for reporting this! On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote:> Nadav, > > I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. > > v/r, > Josh > > [1] http://pastebin.com/kc95WtGG > [2] http://pastebin.com/VY3ZLVJK > > > On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com> wrote: > > On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: > >> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: >> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! >> >> I think that's a fair question, and it's about safety. If you're getting this on the JIT, means we may be generating unsafe transformations on the vectorizer. >> >> Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one? > > > If the the vectorizer generates aligned memory accesses to unaligned addresses then this is a serious bug. But I don’t think that Josh said that the vectorizer generated aligned accesses to unaligned pointers. > > There is no point in LLVM checking for alignment because if the memory is unaligned then the program will crash. Users who want to crash with a readable error message can simply write code that checks the pointer (by masking the high bits and comparing to zero). > >
I confirm that r194876 fixes the issue, i.e. segfault not caused. My program still passed 16 byte aligned pointers to the function which the loop vectorizer processes successfully: LV: Vector loop of width 8 costs: 1. LV: Selecting VF = : 8. LV: Found a vectorizable loop (8) in func_orig.ll LV: Unroll Factor is 1 Since the program runs fine, it seems to be allowed for the CPU to issue a vector load (8 floats) to a 16 byte aligned address (as opposed to 32 byte aligned). Or does in fact the loop vectorizer handle this case in the preamble and the vector.body issues only 32 byte aligned accesses. In which case I would align the payload to 32 byte in order to save a little in the preamble. Frank On 15/11/13 18:15, Arnold Schwaighofer wrote:> A fix for this is in r194876. > > Thanks for reporting this! > > > On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > >> Nadav, >> >> I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. >> >> v/r, >> Josh >> >> [1] http://pastebin.com/kc95WtGG >> [2] http://pastebin.com/VY3ZLVJK >> >> >> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com> wrote: >> >> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: >> >>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: >>> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! >>> >>> I think that's a fair question, and it's about safety. If you're getting this on the JIT, means we may be generating unsafe transformations on the vectorizer. >>> >>> Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one? >> >> If the the vectorizer generates aligned memory accesses to unaligned addresses then this is a serious bug. But I don’t think that Josh said that the vectorizer generated aligned accesses to unaligned pointers. >> >> There is no point in LLVM checking for alignment because if the memory is unaligned then the program will crash. Users who want to crash with a readable error message can simply write code that checks the pointer (by masking the high bits and comparing to zero). >> >>
The vectorizer will now emit = load <8 x i32>, align #TargetAlignmentOfScalari32 where before it would emit = load <8 x i32> (which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction). When the backend generates code for the former it will emit an unaligned move: = vmovups ... wheres for the later it will use an aligned move: = vmovaps … vmovups can load from unaligned addresses while vmovaps can not. No, we currently don’t peel loops for alignment. Best, Arnold On Nov 15, 2013, at 7:23 PM, Frank Winter <fwinter at jlab.org> wrote:> I confirm that r194876 fixes the issue, i.e. segfault not caused. > > My program still passed 16 byte aligned pointers to the function > which the loop vectorizer processes successfully: > > LV: Vector loop of width 8 costs: 1. > LV: Selecting VF = : 8. > LV: Found a vectorizable loop (8) in func_orig.ll > LV: Unroll Factor is 1 > > Since the program runs fine, it seems to be allowed for the CPU > to issue a vector load (8 floats) to a 16 byte aligned address (as > opposed to 32 byte aligned). Or does in fact the loop vectorizer > handle this case in the preamble and the vector.body issues only > 32 byte aligned accesses. In which case I would align the payload > to 32 byte in order to save a little in the preamble. > > Frank > > > > On 15/11/13 18:15, Arnold Schwaighofer wrote: >> A fix for this is in r194876. >> >> Thanks for reporting this! >> >> >> On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: >> >>> Nadav, >>> >>> I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. >>> >>> v/r, >>> Josh >>> >>> [1] http://pastebin.com/kc95WtGG >>> [2] http://pastebin.com/VY3ZLVJK >>> >>> >>> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com> wrote: >>> >>> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: >>> >>>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: >>>> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! >>>> >>>> I think that's a fair question, and it's about safety. If you're getting this on the JIT, means we may be generating unsafe transformations on the vectorizer. >>>> >>>> Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one? >>> >>> If the the vectorizer generates aligned memory accesses to unaligned addresses then this is a serious bug. But I don’t think that Josh said that the vectorizer generated aligned accesses to unaligned pointers. >>> >>> There is no point in LLVM checking for alignment because if the memory is unaligned then the program will crash. Users who want to crash with a readable error message can simply write code that checks the pointer (by masking the high bits and comparing to zero). >>> >>> > >