Hi David, sorry for sending you the mail two times, I forgot to send to the list the first time. On 2014-03-12 09:48, David Chisnall wrote:> I have some patches that automatically expand all memcpy and similar > if the operands are not in AS 0. I think this is probably not quite > the right approach though, and we should be asking the back end for > the function that does a memcpy / memset / whatever in a non-0 address > space, and expand automatically if it doesn't provide one.Can you share these patches? This would be a tentative solution for the reporter of the bug I linked in the original post.> In an ideal world, I'd rather have the memcpy / memset lowering moved > entirely out of SelectionDAG and into a FunctionPass, where it would > be much easier to debug. I'd also want to do the same for lowering of > unaligned loads / stores, so by the time you get to the back end every > load and store is something that can map trivially to a single > instruction (assuming an adequate addressing mode exists).While I agree that the memcpy lowering pass could be done as an IR pass because it involves loops, I don't think you should do that for lowering of unaligned loads / stores. But that's mostly unrelated to this thread and should be discussed separately. There are still some advantages of lowering the memcpy / memset in SelectionDAGBuilder. The infrastructure (e.g. target hooks for determining the right register class for memory operations) is already there. I don't know how hard it is to generate loops in SelectionDAGBuilder, though. -Manuel> David > > On 11 Mar 2014, at 22:23, Manuel Jacob <me at manueljacob.de> wrote: > >> Hi, >> >> SelectionDAGBuilder doesn't know how to lower a Memcpy and Memset if >> one of the pointer operands have an address space >= 256. This is >> understandable since the libc's memcpy / memset don't work for these >> address spaces. However, both Clang (when copying a struct) and some >> optimization passes (LoopIdiomRecognize, MemCpyOpt) can emit memcpy / >> memset for these address spaces. This triggers an assert in >> SelectionDAGBuilder. The optimization passes could be modified to >> give up when they encounter an address space >= 256, but I think clang >> would need some new code that emits a struct copy member-by-member. I >> think it's better to extend the code generator to be able to emit code >> for that. What do you think? >> >> The problem is also described here: >> http://llvm.org/bugs/show_bug.cgi?id=18549 >> >> -Manuel
On 13 Mar 2014, at 05:34, Manuel Jacob <me at manueljacob.de> wrote:> On 2014-03-12 09:48, David Chisnall wrote: >> I have some patches that automatically expand all memcpy and similar >> if the operands are not in AS 0. I think this is probably not quite >> the right approach though, and we should be asking the back end for >> the function that does a memcpy / memset / whatever in a non-0 address >> space, and expand automatically if it doesn't provide one. > > Can you share these patches? This would be a tentative solution for the reporter of the bug I linked in the original post.They're quite ugly (hence not upstreaming yet - most of the stuff in this repo needs some tidying and may be the wrong approach). Looking at the specific change, it's actually a one liner (and a really, really ugly hack): https://github.com/CTSRD-CHERI/llvm/commit/a3e044242ab109a7dad2589f4cc1461b08cb513d This basically sets the limit for the size of a memcpy to expand to infinite. I'm not sure it works for variable-sized memcpys, but I don't think those are inserted by optimisations so we haven't hit them yet. David
On 2014-03-13 02:29, David Chisnall wrote:> On 13 Mar 2014, at 05:34, Manuel Jacob <me at manueljacob.de> wrote: >> On 2014-03-12 09:48, David Chisnall wrote: >>> I have some patches that automatically expand all memcpy and similar >>> if the operands are not in AS 0. I think this is probably not quite >>> the right approach though, and we should be asking the back end for >>> the function that does a memcpy / memset / whatever in a non-0 >>> address >>> space, and expand automatically if it doesn't provide one. >> >> Can you share these patches? This would be a tentative solution for >> the reporter of the bug I linked in the original post. > > They're quite ugly (hence not upstreaming yet - most of the stuff in > this repo needs some tidying and may be the wrong approach). Looking > at the specific change, it's actually a one liner (and a really, > really ugly hack): > > https://github.com/CTSRD-CHERI/llvm/commit/a3e044242ab109a7dad2589f4cc1461b08cb513dWould passing AlwaysInline=true as a parameter have the same effect as this change?> This basically sets the limit for the size of a memcpy to expand to > infinite. I'm not sure it works for variable-sized memcpys, but I > don't think those are inserted by optimisations so we haven't hit them > yet.I previously thought LoopIdiomRecognize can insert variable-sized memcpys / memsets but it only looks into "countable" loops which are defined as ScalarEvolution's hasLoopInvariantBackedgeTakenCount(). This probably means no variable-sized memcpys / memsets are inserted. Or does it only mean the loop count has to be independent from the loop body? I'll try out passing "AlwaysInline" to getMemcpy() and see whether it works for the code presented in the bug report linked in the original post. However, for a fix that can be integrated into the tree, variable-sized memcpys / memsets either have to be implemented or a better (and one when built without assertions) error message has to be presented. -Manuel