Philip Reames
2014-May-26 18:49 UTC
[LLVMdev] Why can't atomic loads and stores handle floats?
David provided one good answer. I'll give another.

The current design pushes complexity into the language frontend for, as far as I know, no good reason. I can say from recent experience that the corner cases around atomics are surprising and result in odd-looking hacks in the frontend. To put it differently: why should marking loads and stores atomic require me to rewrite largish chunks of code around the load or store?

There's nothing "wrong" per se with that design, but why complicate a bunch of frontends when a single IR-level desugaring pass could perform the same logic?

Another answer would be that bitcasts make the IR less readable. They consume memory. Unless handled carefully, they inhibit optimizations (e.g. if you forget to strip casts in a peephole optimization). When dealing with large IR files from a language where *every* field access is atomic "unordered", the first two are particularly important.

P.S. I'm currently operating under the assumption that there is no *technical* reason LLVM could not represent atomic loads and stores on floating point types. If this is not true, please correct me.

Philip

On 05/24/2014 03:18 PM, Filip Pizlo wrote:
> What is the downside of the currently generated IR? There ain't
> nothin' wrong with bitcasts, IMO.
>
> -Filip
>
> On May 24, 2014, at 2:17 PM, Philip Reames <listmail at philipreames.com> wrote:
>
>> Looking through the documentation, I discovered that atomic loads and
>> stores are only supported for integer types. Can anyone provide some
>> background on this? Why is this true?
>>
>> Currently, given code:
>>   std::atomic<float> aFloat;
>>   void foo() {
>>     float f = atomic_load(&aFloat);
>>     ..
>>   }
>>
>> Clang generates code like:
>>   %"struct.std::atomic.2" = type { float }
>>   @aFloat = global %"struct.std::atomic.2" zeroinitializer, align 4
>>
>>   define void @foo() {
>>     %1 = load atomic i32* bitcast (%"struct.std::atomic.2"* @aFloat to i32*) seq_cst, align 4
>>     %2 = bitcast i32 %1 to float
>>     ...
>>   }
>>
>> This seems less than ideal. I would expect that we might have to
>> desugar floats into integer & cast operations in the backend, but why
>> is this imposed on the frontend?
>>
>> More generally, is there anyone who is knowledgeable and/or working
>> on atomics and synchronization in LLVM? I've got a number of
>> questions w.r.t. semantics and have found a number of what I believe
>> to be missed optimizations. I'm happy to file the latter, but I'd
>> like to talk them over with a knowledgeable party first.
>>
>> Philip
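For readers following along, here is a minimal C++ sketch (not part of the original message) of the same pattern written by hand: the atomic access is done on a 32-bit integer and the value is reinterpreted as a float afterwards, which is roughly what the quoted IR does with `load atomic i32` followed by `bitcast`. The names `aFloatBits`, `floatToBits`, `bitsToFloat`, `storeFloat` and `loadFloat` are illustrative assumptions, and the sketch assumes a 32-bit `float`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical storage: the float lives in an integer-typed atomic, mirroring
// the fact that the atomic load in the quoted IR is emitted on i32.
std::atomic<std::uint32_t> aFloatBits{0};

// memcpy plays the role of the IR-level bitcast (same bits, different type).
std::uint32_t floatToBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Integer atomic store of the float's bit pattern.
void storeFloat(float f) {
    aFloatBits.store(floatToBits(f), std::memory_order_seq_cst);
}

// Integer atomic load (seq_cst, as in the quoted IR), then the "bitcast".
float loadFloat() {
    return bitsToFloat(aFloatBits.load(std::memory_order_seq_cst));
}
```

Every frontend that wants an atomic float access has to reproduce this cast-load-cast dance in its own code generator, which is the duplication the message objects to.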
Tim Northover
2014-May-26 19:51 UTC
[LLVMdev] Why can't atomic loads and stores handle floats?
> There's nothing "wrong" per se with that design, but why
> complicate a bunch of frontends when a single IR-level desugaring pass
> could perform the same logic?

I quite like this idea. It could give David his atomic ops where an integer really can't do the right thing, and it isn't just shunting the burden onto all of the backends. Some restrictions would still be needed. A "load atomic [1000 x i64]* %addr" is just being cheeky.

The biggest issue I see is modelling the legal loads for a target. For example, AArch64 probably has "legal" monotonic loads for most sane types, in the sense that they can be implemented in the same way as non-atomic ones. But there's no "ldar s0, [addr]", and you can't simply replace an atomic load with a normal load even in the weaker cases, because you have no say in what passes run after your shiny expansion pass.

With appropriate target hooks, I think it could be made to work.

Cheers.

Tim.
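To make the "target hooks" idea concrete, here is a small self-contained C++ model (an illustration, not LLVM's actual API). `TargetAtomicHooks`, `atomicLoadAction` and `ToyAArch64Hooks` are hypothetical names, and the toy AArch64 rule simply encodes the example given above: weakly ordered floating-point loads can stay as they are, while stronger orderings fall back to an integer load plus a cast because there is no "ldar s0, [addr]".

```cpp
#include <cassert>

enum class Ordering { Unordered, Monotonic, Acquire, SeqCst };
enum class TypeClass { Integer, Float };

// What an expansion pass should do with a given atomic load.
enum class AtomicLoadAction { Legal, CastToInteger };

// Hypothetical hook interface that each target would implement.
struct TargetAtomicHooks {
    virtual ~TargetAtomicHooks() = default;
    virtual AtomicLoadAction atomicLoadAction(TypeClass type,
                                              Ordering ordering) const = 0;
};

// Toy model based only on the example in the message, not on a full
// reading of the architecture.
struct ToyAArch64Hooks : TargetAtomicHooks {
    AtomicLoadAction atomicLoadAction(TypeClass type,
                                      Ordering ordering) const override {
        if (type == TypeClass::Integer)
            return AtomicLoadAction::Legal;
        const bool weak = ordering == Ordering::Unordered ||
                          ordering == Ordering::Monotonic;
        // Weak FP loads look like plain loads; acquire and stronger have
        // no FP-register form, so go through an integer load.
        return weak ? AtomicLoadAction::Legal
                    : AtomicLoadAction::CastToInteger;
    }
};
```

Note that "Legal" here means the instruction stays atomic in the IR and the backend selects an ordinary load for it; it does not mean the pass may drop the atomic marking, which is exactly the hazard described above.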
David Chisnall
2014-May-26 20:26 UTC
[LLVMdev] Why can't atomic loads and stores handle floats?
On 26 May 2014, at 20:51, Tim Northover <t.p.northover at gmail.com> wrote:
> I quite like this idea. It could give David his atomic ops where an
> integer really can't do the right thing, and isn't just shunting the
> burden onto all of the backends. Some restrictions would still be
> needed. A "load atomic [1000 x i64]* %addr" is just being cheeky.

Currently, the frontend will have to lower these to calls to the __atomic functions, but there's no technical reason for this on all architectures. Haswell and newer Intel chips *can* implement atomic loads of 1000 x i64: with the transactional extensions, the limit for loads is very large (the limit for atomic writes is around 30KB).

As transactional memory becomes more common, large atomicrmw operations become possible, but LLVM IR can't meaningfully express them. Currently, two architectures in LLVM support hardware transactional memory in some form: x86 and BlueGene/Q.

One of the biggest issues I face implementing the back end for our architecture is LLVM's willingness, both when mapping to IR and then when lowering to SelectionDAG, to throw away information that is not yet meaningful to existing back ends. Let's try not to make that any worse for future back-end authors. Lots of people are trying to use LLVM for custom ASICs, and at EuroLLVM there were a number of people who had encountered exactly this kind of problem.

David
Philip Reames
2014-May-27 16:30 UTC
[LLVMdev] Why can't atomic loads and stores handle floats?
On 05/26/2014 12:51 PM, Tim Northover wrote:
>> There's nothing "wrong" per se with that design, but why
>> complicate a bunch of frontends when a single IR-level desugaring pass
>> could perform the same logic?
> I quite like this idea. It could give David his atomic ops where an
> integer really can't do the right thing, and isn't just shunting the
> burden onto all of the backends. Some restrictions would still be
> needed. A "load atomic [1000 x i64]* %addr" is just being cheeky.
>
> The biggest issue I see is modelling the legal loads for a target. For
> example AArch64 probably has "legal" monotonic loads for most sane
> types, in the sense that they can be implemented in the same way as
> non-atomic ones. But there's no "ldar s0, [addr]", and you can't
> simply replace an atomic load with a normal load even in the weaker
> cases because you have no say in what passes run after your shiny
> expansion pass.
>
> With appropriate target hooks, I think it could be made to work.

I'm willing to take this on. It's not going to happen immediately, but cleaning up the code in my frontend is worth the work.

It seems like the logical place for this would be either CodeGenPrep or SelectionDAGBuilder. Does anyone know of any reason why it would need to be done earlier?

I'm going to ignore the generalized transactional-memory use cases for the moment. I want to stick to the subset of features which have fairly wide support across platforms. Honestly, the transactional memory bits feel like they should be solved differently anyway.

Philip
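As a rough picture of what such a desugaring step does, here is a toy C++ model (an illustration only; it does not use LLVM's classes, and `Inst` and `desugarAtomicFloatLoads` are hypothetical names). It rewrites an atomic load of `float` into the three-instruction form that frontends emit by hand today: cast the pointer, do the atomic load as `i32`, cast the value back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: just enough fields to show the rewrite.
struct Inst {
    std::string op;       // "load atomic" or "bitcast"
    std::string type;     // result type, e.g. "float", "i32", "i32*"
    std::string result;   // name of the value defined
    std::string operand;  // name of the value consumed
};

// Replace each atomic float load with pointer cast + i32 load + value cast.
// Other instructions are copied through unchanged.
std::vector<Inst> desugarAtomicFloatLoads(const std::vector<Inst>& in) {
    std::vector<Inst> out;
    for (const Inst& inst : in) {
        if (inst.op == "load atomic" && inst.type == "float") {
            const std::string intPtr = inst.operand + ".int";
            const std::string intVal = inst.result + ".int";
            out.push_back({"bitcast", "i32*", intPtr, inst.operand});
            out.push_back({"load atomic", "i32", intVal, intPtr});
            // The final cast keeps the original result name, so existing
            // users of the loaded value need no changes.
            out.push_back({"bitcast", "float", inst.result, intVal});
        } else {
            out.push_back(inst);
        }
    }
    return out;
}
```

A real pass would also have to preserve ordering and alignment on the new load and consult the target before rewriting; those details are omitted here.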