Daniel Neilson via llvm-dev
2017-May-08  19:08 UTC
[llvm-dev] RFC: Element-atomic memory intrinsics
Hi Sanjoy, Responses inlined…> On May 8, 2017, at 12:49 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Daniel, > > [+CC Mehdi, Vedant for the auto upgrade issue] > > On Mon, May 8, 2017 at 7:54 AM, Daniel Neilson via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> **Method** >> >> Clearly we are going to have to teach LLVM about unordered memory >> intrinsics. There are, as I can see it, a couple of ways to accomplish this. >> I’d like your opinions on which are preferable, or if you can think of any >> other options. In no particular order… >> >> Option 1) >> Introduce a new unordered/element-atomic version of each of the memory >> intrinsics. >> Ex: @llvm.memcpy_element_atomic — work was already started to introduce >> this one in D27133, but could be backed out and restarted. >> Intrinsic prototype: @llvm.memcpy_element_atomic.<overload desc>(<ty>* >> dest, <ty>* src, <ty> len, i32 align, i2 isunordered, i16 element_size) >> Semantics: >> * Will do a memcpy of len bytes from src to dest. >> * len must = k * lcm( #bytes in dest type, #bytes in src type), for >> some non-negative integer k [note: lcm = least-common multiple] >> * load/store size given by the constant power-of-2 parameter >> “element_size”; expected to be the lcm(sizeof(dest_ty), sizeof(src_ty)) > > I'm not sure if sizeof(dest_ty) and sizeof(src_ty) adds anything here. > > LLVM is moving towards "typeless pointers" (i.e. pointers will not > have pointee types, instead they will just be a "generic pointer" in > some address space), so working in the types of dest and src into the > specification seems awkward. >Poor choice of wording on my part. By sizeof(<thing>) I mean the element-size of the load/store that appeared in the original loop from which the memory intrinsic was materialized — arguably, these will be the same in any real cases so it probably doesn’t make any sense to even mention it...> Also, does the non-overlap restriction of src and dest (as in the > regular llvm.memcpy) apply here as well? >Yes, I would think so.>> * isunordered param: bit 0 == 1 => stores to dest must be marked >> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered' > > What if the bits are zero -- will the stores / loads (depending on > which bit) be "ordered" in that case, or something stronger? >This is partly why I prefer option 2. An ‘isunordered’ value of 0 is nonsense for the standalone atomic-unordered memory intrinsic. It would imply that neither the source nor dest needs to be loaded/stored via unordered-atomic ops, and so the memory intrinsic is identical to the ordinary non-atomic one.>> Option 2) >> Expand the current/existing memory intrinsics to identify the unordered >> constraint, if one exists, in much the same way that volatility is expressed >> — i.e. add an ‘isunordered’ parameter(s) to the intrinsic. >> This option has the same semantics as option 1; the only difference is, >> literally, that we expand the existing memcpy/memset/memmove intrinsics to >> have an ‘isunordered’ parameter and an ‘element_size’ parameter, so the >> prototype becomes something like: >> @llvm.memcpy.<overload desc>(<ty>* dest, <ty>* src, <ty> len, i32 align, >> i1 isvolatile, i2 isunordered, i16 element_size) >> >> Pros: >> * Minimal extra work to handle the new version in existing passes — only >> need to change passes that create calls to memory intrinsics, expand memory >> intrinsics, or that need to care about unordered (which none should that are >> reasoning about memory intrinsic semantics). >> * New code that’s introduced by others to exploit/handle memory >> intrinsics should just handle unordered for free — unordered being a part of >> the memory intrinsic means it’s more likely that the person will realize >> that they have to think about it (i.e. it raises the profile of unordered >> memory intrinsics). > > I like the second point, but (unfortunately) I suspect in practice > you'll see new code do: > > if (MCI->isOrdered()) > return false; // be conservative >Yes, that would be an unfortunate reality, but one can hope. :-)>> Cons: >> * Breaks backward compatibility of the IR — is there a mechanism for >> migrating old IR into a new IR version when loading the IR into LLVM? > > I think the migration here will be fairly straightforward -- you can > just auto-upgrade old calls to memcpy to pass in 0 for the isordered > argument. But I've CC'd Mehdi and Vedant to help shed some light on > this. > > — SanjoyThanks! -Daniel --- Daniel Neilson, Ph.D. Azul Systems
Sanjoy Das via llvm-dev
2017-May-08  20:55 UTC
[llvm-dev] RFC: Element-atomic memory intrinsics
Hi Daniel, On Mon, May 8, 2017 at 12:08 PM, Daniel Neilson <dneilson at azul.com> wrote:>>> * isunordered param: bit 0 == 1 => stores to dest must be marked >>> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered' >> >> What if the bits are zero -- will the stores / loads (depending on >> which bit) be "ordered" in that case, or something stronger? >> > > This is partly why I prefer option 2. An ‘isunordered’ value of 0 is nonsense for the standalone atomic-unordered memory intrinsic. It would imply that neither the source nor dest needs to be loaded/stored via unordered-atomic ops, and so the memory intrinsic is identical to the ordinary non-atomic one.Perhaps the appropriate thing to do for this option is to not have the isunordered flag at all and instead call the intrinsic @llvm.memcpy_element_unordered_atomic? I suspect we probably won't care about anything stronger than unordered atomic memcpys, and we can always generalize if there is a need later. -- Sanjoy
Daniel Neilson via llvm-dev
2017-May-08  21:09 UTC
[llvm-dev] RFC: Element-atomic memory intrinsics
Hi Sanjoy,> On May 8, 2017, at 3:55 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Daniel, > > On Mon, May 8, 2017 at 12:08 PM, Daniel Neilson <dneilson at azul.com> wrote: >>>> * isunordered param: bit 0 == 1 => stores to dest must be marked >>>> ‘unordered’; bit 1 == 1 => loads from src must be marked ‘unordered' >>> >>> What if the bits are zero -- will the stores / loads (depending on >>> which bit) be "ordered" in that case, or something stronger? >>> >> >> This is partly why I prefer option 2. An ‘isunordered’ value of 0 is nonsense for the standalone atomic-unordered memory intrinsic. It would imply that neither the source nor dest needs to be loaded/stored via unordered-atomic ops, and so the memory intrinsic is identical to the ordinary non-atomic one. > > Perhaps the appropriate thing to do for this option is to not have the > isunordered flag at all and instead call the intrinsic > @llvm.memcpy_element_unordered_atomic? I suspect we probably won't > care about anything stronger than unordered atomic memcpys, and we can > always generalize if there is a need later.Removing that ‘isunordered’ parameter would mean that @llvm.memcpy_element_unordered_atomic() would require that both the src & dest arrays’ elements are accessed via unordered-atomic ops. Leaving the parameter in we allow the possibility that only one of src/dest’s elements need to be accessed via unordered-atomic ops, and the other may be accessed via non-atomic ops. That added degree of freedom might be useful. -Daniel