Chandler Carruth
2014-Dec-24 00:24 UTC
[LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
I've not had a good chance to look at the patches in detail, but just to clarify one point: I don't really care whether we number things going up or down from single threaded to "every thread". I just think it makes sense to expose them in the in-memory IR interface as an enum with a particular ordering so that code can use the obvious sorts of tests for comparing two orderings and not have to worry (overly much) about edge cases. This doesn't really need to be reflected in the bitcode encoding though, so I'm fine with whatever steps are needed to keep the bitcode compatible and sane. I also agree with having the text format use a symbolic thing for both extremes. It doesn't seem super important, but it seems nice. Regarding the bitcode encoding, I would consider whether one encoding is more space efficient than another. I don't recall whether we default to zero or whether we use a varint encoding in the bitcode here, but if we do, it would make sense to optimize the encoding around cross thread being the most common. I'm not really a bitcode expert, so I'd rather defer to someone who has hacked on this part of LLVM more recently there. I can try to take a look at the higher level patches soon though. On Fri, Dec 12, 2014 at 10:25 AM, Sahasrabuddhe, Sameer < sameer.sahasrabuddhe at amd.com> wrote:> > On 12/11/2014 4:28 PM, Sahasrabuddhe, Sameer wrote: > > > Attached is a sequence of patches that changes the IR to support more than > two synchronization scopes. This is still a work in progress, and these > patches are only meant to start a more detailed discussion on the way > forward. > > > Ping! > > Found a simple way to preserve forward compatibility. See below. > > One big issue is the absence of any backend that actually makes use of > intermediate synchronization scopes. This work is meant to be just one part > of the ground work required for landing the much-anticipated HSAIL backend. > Also, more work might be needed for emitting atomic instructions via Clang. > > The proposed syntax for synchronization scope is as follows: > > 1. Synchronization scopes are of arbitrary width, but implemented as > unsigned in the bitcode, just like address spaces. > 2. Cross-thread is default, but now encoded as 0. > 3. Keyword 'singlethread' is unchanged, but now encoded as the > largest integer (which happens to be ~0U in bitcode). > 4. New syntax "synchscope(n)" for other scopes. > 5. There is no keyword for cross-thread, but it can be specified as > "synchscope(0)". > > This change breaks forward compatibility for the bitcode, since the > meaning of the zero/one values are now changed. > > enum SynchronizationScope { > - SingleThread = 0, > - CrossThread = 1 > + CrossThread = 0, > + SingleThread = ~0U > }; > > The change passes almost all lit tests including one new test (see patch > 0005). The failing tests are specifically checking for forward > compatibility: > > Failing Tests (3): > LLVM :: Bitcode/cmpxchg-upgrade.ll > LLVM :: Bitcode/memInstructions.3.2.ll > LLVM :: Bitcode/weak-cmpxchg-upgrade.ll > > This breakage remains even if we reverse the order of synchronization > scopes. One simple way to preserve compatibility is to retain 0 and 1 with > their current meanings, and specify that intermediate scopes are > represented in an ordered way with numbers greater than one. But this is > pretty ugly to work with. Would appreciate inputs on how to fix this! > > > The issue here is purely in the bitcode, and we need an encoding that can > represent new intermediate scopes while preserving the two known values of > zero and one. Note that the earlier zero is now ~0U in the in-memory > representation, and the earlier 1 is now zero. This mapping can be easily > accomplished with a simple increment/decrement by one, ignoring overflow. > So the bitreader now subtracts a one when decoding the synch scope, and > bitwriter adds a one when encoding the synch scope. The attached change > number 0006 is meant to replace changes 0003 and 0005 in the previous list, > since the assembly and the bitcode need to be updated simultaneously for > this to work. > > The new change passes all tests, including the ones checking for forward > compatibility. > > Sameer. > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141223/c884c0c6/attachment.html>
Sahasrabuddhe, Sameer
2015-Jan-05 10:51 UTC
[LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
On 12/24/2014 5:54 AM, Chandler Carruth wrote:> I've not had a good chance to look at the patches in detail, but just > to clarify one point: > > I don't really care whether we number things going up or down from > single threaded to "every thread". I just think it makes sense to > expose them in the in-memory IR interface as an enum with a particular > ordering so that code can use the obvious sorts of tests for comparing > two orderings and not have to worry (overly much) about edge cases. > This doesn't really need to be reflected in the bitcode encoding > though, so I'm fine with whatever steps are needed to keep the bitcode > compatible and sane.Right. The second version of my patches fixes the bitcode encoding. But now I see another potential problem with future bitcode if we require an ordering on the scopes. What happens when a backend later introduces a new scope that goes into the middle of the order? If they renumber the scopes to accomodate this, then existing bitcode for that backend will no longer work. The bitcode reader/writer cannot compensate for this since the values are backend-specific. If we agree that this problem is real, then we cannot force an ordering on the scope numbers.> I also agree with having the text format use a symbolic thing for both > extremes. It doesn't seem super important, but it seems nice.So far, I have refrained from proposing a keyword for cross thread scope in the text format, because (a) there never was one and (b) it is not strictly needed since it is the default anyway. I am fine either way, but we will first have to decide what the new keyword should be. I find "allthreads" to be a decent counterpart for "singlethread" ... "crossthread" is not good enough since intermediate scopes have multiple threads too.> Regarding the bitcode encoding, I would consider whether one encoding > is more space efficient than another. I don't recall whether we > default to zero or whether we use a varint encoding in the bitcode > here, but if we do, it would make sense to optimize the encoding > around cross thread being the most common. I'm not really a bitcode > expert, so I'd rather defer to someone who has hacked on this part of > LLVM more recently there.Indeed, the text format is defined around cross thread being the most common, but strangely it was not encoded as zero, even in bitcode. So the most common case turns out to be a one stored as a uint32 in the bitcode. The new scopes fit into that existing space, while the most common case changes from one to ~0U. Maintaining forward compatibility for older bitcode would mean that we can't optimize by changing the common case to zero.> I can try to take a look at the higher level patches soon though.Great! I intend to clean up and submit the in-memory patches first. These simply upgrade the representation from a single bit to an unsigned integer, without affecting any "end points" of the compiler. Sameer.> > On Fri, Dec 12, 2014 at 10:25 AM, Sahasrabuddhe, Sameer > <sameer.sahasrabuddhe at amd.com <mailto:sameer.sahasrabuddhe at amd.com>> > wrote: > > > On 12/11/2014 4:28 PM, Sahasrabuddhe, Sameer wrote: >> >> Attached is a sequence of patches that changes the IR to support >> more than two synchronization scopes. This is still a work in >> progress, and these patches are only meant to start a more >> detailed discussion on the way forward. >> > > Ping! > > Found a simple way to preserve forward compatibility. See below. > >> One big issue is the absence of any backend that actually makes >> use of intermediate synchronization scopes. This work is meant to >> be just one part of the ground work required for landing the >> much-anticipated HSAIL backend. Also, more work might be needed >> for emitting atomic instructions via Clang. >> >> The proposed syntax for synchronization scope is as follows: >> >> 1. Synchronization scopes are of arbitrary width, but >> implemented as unsigned in the bitcode, just like address spaces. >> 2. Cross-thread is default, but now encoded as 0. >> 3. Keyword 'singlethread' is unchanged, but now encoded as the >> largest integer (which happens to be ~0U in bitcode). >> 4. New syntax "synchscope(n)" for other scopes. >> 5. There is no keyword for cross-thread, but it can be specified >> as "synchscope(0)". >> >> This change breaks forward compatibility for the bitcode, since >> the meaning of the zero/one values are now changed. >> >> enum SynchronizationScope { >> - SingleThread = 0, >> - CrossThread = 1 >> + CrossThread = 0, >> + SingleThread = ~0U >> }; >> >> The change passes almost all lit tests including one new test >> (see patch 0005). The failing tests are specifically checking for >> forward compatibility: >> >> Failing Tests (3): >> LLVM :: Bitcode/cmpxchg-upgrade.ll >> LLVM :: Bitcode/memInstructions.3.2.ll >> LLVM :: Bitcode/weak-cmpxchg-upgrade.ll >> >> This breakage remains even if we reverse the order of >> synchronization scopes. One simple way to preserve compatibility >> is to retain 0 and 1 with their current meanings, and specify >> that intermediate scopes are represented in an ordered way with >> numbers greater than one. But this is pretty ugly to work with. >> Would appreciate inputs on how to fix this! >> > > The issue here is purely in the bitcode, and we need an encoding > that can represent new intermediate scopes while preserving the > two known values of zero and one. Note that the earlier zero is > now ~0U in the in-memory representation, and the earlier 1 is now > zero. This mapping can be easily accomplished with a simple > increment/decrement by one, ignoring overflow. So the bitreader > now subtracts a one when decoding the synch scope, and bitwriter > adds a one when encoding the synch scope. The attached change > number 0006 is meant to replace changes 0003 and 0005 in the > previous list, since the assembly and the bitcode need to be > updated simultaneously for this to work. > > The new change passes all tests, including the ones checking for > forward compatibility. > > Sameer. > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150105/134d454e/attachment.html>
Owen Anderson
2015-Jan-06 06:51 UTC
[LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
Hi Sameer,> On Jan 5, 2015, at 4:51 AM, Sahasrabuddhe, Sameer <Sameer.Sahasrabuddhe at amd.com> wrote: > > Right. The second version of my patches fixes the bitcode encoding. But now I see another potential problem with future bitcode if we require an ordering on the scopes. What happens when a backend later introduces a new scope that goes into the middle of the order? If they renumber the scopes to accomodate this, then existing bitcode for that backend will no longer work. The bitcode reader/writer cannot compensate for this since the values are backend-specific. If we agree that this problem is real, then we cannot force an ordering on the scope numbers.That’s an interesting consideration, and something I hadn’t thought of. I’m unsure offhand of how much it matters in practice. The alternative, I suppose, is having something like string-named scopes, but then we can’t do much with them at the IR level.> So far, I have refrained from proposing a keyword for cross thread scope in the text format, because (a) there never was one and (b) it is not strictly needed since it is the default anyway. I am fine either way, but we will first have to decide what the new keyword should be. I find "allthreads" to be a decent counterpart for "singlethread" ... "crossthread" is not good enough since intermediate scopes have multiple threads too.This actually raises another question. In principle, the “most visible” scope ought to be something like “system” or “device”, meaning a completely uncached memory access that is visible to all peripherals in a heterogeneous system. However, this is almost certainly not what we want to have for typical memory accesses. To summarize, a prototypical scope nest, from most to least visible (aka least to most cacheable) might look like: System —> AllThreads —> Various target-specific local scopes —> SingleThread If we wanted to go really gonzo, there could be a Network scope at the beginning for large-scale HPC systems, but I’m not sure how important that is to anyone. As a related question, do we actually need the local scopes to be target specific? Are there systems, real or planned, that *aren’t* captured by: [Network —> ] System —> AllThreads —> ThreadGroup —> SingleThread ? —Owen
Possibly Parallel Threads
- [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
- [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
- [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
- [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux
- [LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux