> On Nov 18, 2014, at 2:35 PM, Chandler Carruth <chandlerc at google.com> wrote: > > > On Fri, Nov 14, 2014 at 1:09 PM, Sahasrabuddhe, Sameer <sameer.sahasrabuddhe at amd.com <mailto:sameer.sahasrabuddhe at amd.com>> wrote: > 1. Update the synchronization scope field in atomic instructions from a > single bit to a wider field, say 32-bit unsigned integer. > > I think this should be an arbitrary bit width integer. I think baking any size into this is a mistake unless that size is "1”....> If we go with your proposed constraint below, I think it is essential to model single-thread-scope as the maximum integer. It should be a strict subset of all inter-thread scopes.These seem mutually contradictory.> > 5. Possibly add the following constraint on memory scopes: "The scope > represented by a larger value is nested inside (is a proper subset > of) the scope represented by a smaller value." This would also imply > that the value used for single-thread scope must be the largest > value used by the target. > This constraint on "nesting" is easily satisfied by HSAIL (and also > OpenCL), where synchronization scopes increase from a single > work-item to the entire system. But it is conceivable that other > targets do not have this constraint. For example, a platform may > define synchronization scopes in terms of overlapping sets instead > of proper subsets. > > I think this is the important thing to settle on in the design. I'd really like to hear from a diverse set of vendors and folks operating in the GPU space to understand whether having this constraint is critically important or problematic for any reasons.I am not aware of any systems (including GPUs) that would need non-nested memory scopes. If such exist, I might expect them to be some kind of clustered NUMA HPC machine. —Owen -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141118/8b396e3e/attachment.html>
Sahasrabuddhe, Sameer
2014-Dec-01 11:03 UTC
[LLVMdev] memory scopes in atomic instructions
On 11/19/2014 6:11 AM, Owen Anderson wrote:>> On Nov 18, 2014, at 2:35 PM, Chandler Carruth <chandlerc at google.com >> <mailto:chandlerc at google.com>> wrote: >> >> I think this should be an arbitrary bit width integer. I think baking >> any size into this is a mistake unless that size is "1”. > ... >> If we go with your proposed constraint below, I think it is essential >> to model single-thread-scope as the maximum integer. It should be a >> strict subset of all inter-thread scopes. > > These seem mutually contradictory.I managed to implement wider synchronization scopes in the IR as follows. Would like to put the changes up for review, but I still need some high-level inputs. 1. The synchronization scope is already implemented as an unsigned integer in the bitcode, as far as I can tell (but I am still new at understanding BitcodeWriter). Only BitcodeReader needs to change so that it is aware that CrossThread and SingleThread are not the only two values possible. 2. My change redefines CrossThread to be "0" from the current "1", and SingleThread to "~0U" from the current "0". 3. The LLVM assembly language needs a way to specify all the scopes. The proposed syntax is: 1. singlethread: This represents "the largest integer representable in the underlying implementation" and hence "the smallest scope". LLParser will interpret this "~0U". 2. synchscope(n): Used to specify any synchronization scope that is larger than a single thread. The notation is similar to that used for address spaces. 3. The default scope is the same as specifying "synchscope(0)". 4. But the above encoding is actually incompatible with the current encoding; I had got this wrong in my original post. In particular, the following tests failed because the meaning of 0/1 changed: LLVM :: Bitcode/cmpxchg-upgrade.ll LLVM :: Bitcode/memInstructions.3.2.ll LLVM :: Bitcode/weak-cmpxchg-upgrade.ll All three tests contain comments on making sure that older bitcode files can be read correctly. The simplest approach is to invert the new encoding and say that "0" remains as SingleThread, and CrossThread is now the largest integer. This will need a new token "crossthread" in the assembly, and point (3) above will change accordingly. This breaks away from the general tradition of "zero as default", but this was not true with synchronization scopes anyway. 5. And yes, "crossthread" needs to be renamed to something better, like "allthreads", or "systemscope", or "addressspacescope". The last one is accurate but takes a small effort to see why. I would go for "allthreads". Sameer.
Hi Sameer, Did you ever get any further with this proposal? —Owen> On Dec 1, 2014, at 3:03 AM, Sahasrabuddhe, Sameer <sameer.sahasrabuddhe at amd.com> wrote: > > > On 11/19/2014 6:11 AM, Owen Anderson wrote: >>> On Nov 18, 2014, at 2:35 PM, Chandler Carruth <chandlerc at google.com <mailto:chandlerc at google.com>> wrote: >>> >>> I think this should be an arbitrary bit width integer. I think baking any size into this is a mistake unless that size is "1”. >> ... >>> If we go with your proposed constraint below, I think it is essential to model single-thread-scope as the maximum integer. It should be a strict subset of all inter-thread scopes. >> >> These seem mutually contradictory. > > I managed to implement wider synchronization scopes in the IR as follows. Would like to put the changes up for review, but I still need some high-level inputs. > > 1. The synchronization scope is already implemented as an unsigned > integer in the bitcode, as far as I can tell (but I am still new at > understanding BitcodeWriter). Only BitcodeReader needs to change so > that it is aware that CrossThread and SingleThread are not the only > two values possible. > > 2. My change redefines CrossThread to be "0" from the current "1", and > SingleThread to "~0U" from the current "0". > > 3. The LLVM assembly language needs a way to specify all the scopes. > The proposed syntax is: > 1. singlethread: This represents "the largest integer representable > in the underlying implementation" and hence "the smallest > scope". LLParser will interpret this "~0U". > 2. synchscope(n): Used to specify any synchronization scope that is > larger than a single thread. The notation is similar to that > used for address spaces. > 3. The default scope is the same as specifying "synchscope(0)". > > 4. But the above encoding is actually incompatible with the current > encoding; I had got this wrong in my original post. In particular, > the following tests failed because the meaning of 0/1 changed: > > LLVM :: Bitcode/cmpxchg-upgrade.ll > LLVM :: Bitcode/memInstructions.3.2.ll > LLVM :: Bitcode/weak-cmpxchg-upgrade.ll > > All three tests contain comments on making sure that older bitcode > files can be read correctly. > > The simplest approach is to invert the new encoding and say that "0" > remains as SingleThread, and CrossThread is now the largest integer. > This will need a new token "crossthread" in the assembly, and point > (3) above will change accordingly. This breaks away from the general > tradition of "zero as default", but this was not true with > synchronization scopes anyway. > > 5. And yes, "crossthread" needs to be renamed to something better, like > "allthreads", or "systemscope", or "addressspacescope". The last one > is accurate but takes a small effort to see why. I would go for > "allthreads". > > Sameer. > >