Olivier Goffart
2014-Jan-21 13:21 UTC
[LLVMdev] [Patches] Some LazyValueInfo and related patches
Hi.

Attached you will find a set of patches which I wrote while trying to solve two problems. I did not manage to fully solve what I wanted to improve, but I think it is still a step in the right direction.

The patches are hopefully self-explanatory. The biggest change is that LazyValueInfo no longer maintains a separate stack of work to do, but does the work directly, recursively.

The test included in patch 4 also tests patch 2.

The first problem I was trying to solve is to let the code give hints about the range of values.

Imagine, in a library:

    class CopyOnWrite {
        char *stuff;
        int ref_count;
        void detach_internal();
        inline void detach() {
            if (ref_count > 1) {
                detach_internal();
                /* ref_count = 1; */
            }
        }
    public:
        char &operator[](int i) { detach(); return stuff[i]; }
    };

Then, in code like this:

    int doStuffWithStuff(CopyOnWrite &stuff) {
        return stuff[0] + stuff[1] * stuff[2];
    }

the generated code will contain three tests of ref_count and three calls to detach_internal.

Is there a way to tell the compiler that ref_count is actually smaller than or equal to 1 after a call to detach_internal? Making the "ref_count = 1" explicit in the code helps (with my patches), but then the operation itself ends up in the code, and I don't want that.

Something like

    if (ref_count > 1)
        __builtin_unreachable();

works fine in GCC, but does not work with LLVM. Well, it almost works, but the problem is that the whole condition is removed before inlining is done. So what can be done to make this work? One option is to delay the removal of __builtin_unreachable() until after inlining (but when?). Another would be, while removing branches because they are unreachable, to somehow keep the range information around. I was thinking about !range metadata, but I don't know where to put it.
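For concreteness, here is a minimal, self-contained sketch of the hint pattern described above. It assumes GCC or Clang (for `__builtin_unreachable`). The constructor, the `size` and `owns` members and the `detach_calls` counter are hypothetical additions made only so the example compiles and can be observed; they are not part of the original class. Whether the optimizer actually drops the repeated checks depends on the compiler and its version, which is exactly the issue raised in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, compilable variant of the CopyOnWrite class from the message.
class CopyOnWrite {
    char *stuff;
    std::size_t size;         // hypothetical: length of the buffer
    int ref_count;
    bool owns = false;        // hypothetical: true once we hold a private copy
    void detach_internal();   // out of line, as it would be in a library

    inline void detach() {
        if (ref_count > 1) {
            detach_internal();
            // Range hint: promise the optimizer that ref_count <= 1 here.
            // This is undefined behaviour if the promise is ever false.
            if (ref_count > 1)
                __builtin_unreachable();
        }
    }

public:
    int detach_calls = 0;     // instrumentation for this example only

    CopyOnWrite(char *shared, std::size_t n, int refs)
        : stuff(shared), size(n), ref_count(refs) {}
    CopyOnWrite(const CopyOnWrite &) = delete;
    CopyOnWrite &operator=(const CopyOnWrite &) = delete;
    ~CopyOnWrite() { if (owns) delete[] stuff; }

    char &operator[](int i) { detach(); return stuff[i]; }
};

void CopyOnWrite::detach_internal() {
    assert(ref_count > 1);    // only called while the buffer is shared
    char *copy = new char[size];
    std::memcpy(copy, stuff, size);
    stuff = copy;             // the other owners keep the old buffer
    owns = true;
    ref_count = 1;
    ++detach_calls;
}

int doStuffWithStuff(CopyOnWrite &stuff) {
    return stuff[0] + stuff[1] * stuff[2];
}
```

At run time only the first `operator[]` call detaches. The hint matters for the generated code: with it, a compiler that keeps the range information across inlining could remove the second and third `ref_count` checks entirely.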
The other problem was that I was analyzing code like this:

    void toLatin1(uchar *dst, const ushort *src, int length)
    {
        if (length) {
    #if defined(__SSE2__)
            if (length >= 16) {
                for (int i = 0; i < length >> 4; ++i) {
                    /* skipped code using SSE2 intrinsics */
                    src += 16; dst += 16;
                }
                length = length % 16;
            }
    #endif
            while (length--) {
                *dst++ = (*src>0xff) ? '?' : (uchar) *src;
                ++src;
            }
        }
    }

I was wondering: when compiling with AVX, would clang/LLVM be able to widen the SSE2 intrinsics to wider vectors? Or would the non-intrinsics branch be better?

It turns out the result is not great. LLVM leaves the intrinsics code unchanged (that's OK), but also tries to vectorize the second loop. (And the result of this vectorization is quite horrible.)

Shouldn't the compiler see that length never reaches 16 in the second loop and hence deduce that there is no point in vectorizing? This is why I implemented srem and urem in LVI. But then, maybe some other pass (a loop pass) should use LVI to see that a loop is never entered, or the loop vectorizer could use LVI to avoid creating the loop in the first place.

-- 
Olivier

Attachments (all text/x-patch):

    0001-SCCP-Do-not-transform-load-of-a-null-pointer-into-0.patch (1003 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment.bin>
    0002-LVI-Be-able-to-optimize-the-condition-with-and-and-o.patch (7857 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0001.bin>
    0003-LVI-Re-order-the-check-that-the-second-operand-is-co.patch (2949 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0002.bin>
    0004-LVI-Look-recursively-the-dependencies-for-finding-ra.patch (3152 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0003.bin>
    0007-LVI-Support-range-detection-of-srem-and-urem.patch (8355 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0004.bin>
    0005-LVI-simplify-a-bit-by-not-having-a-separate-stack.patch (12836 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0005.bin>
    0008-CVP-Look-for-LVI-information-when-there-is-a-compari.patch (7736 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0006.bin>
    0006-LVI-simplify-remove-hasBlockValue-and-solve-from-get.patch (6095 bytes)
        <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0007.bin>
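The remainder argument from the second problem (a value produced by `length % 16` cannot reach 16) can be sketched as plain interval arithmetic. This is only an illustration of the reasoning and makes no claim about how the attached srem/urem patch is written. `Range`, `uremRange` and `sremRange` are hypothetical names, not LLVM APIs, and the sketch assumes values that fit comfortably in 64 bits so that negation cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [lo, hi]. This is not LLVM's ConstantRange.
struct Range {
    std::int64_t lo;
    std::int64_t hi;
};

// Range of x % d for non-negative x and strictly positive d (urem-like).
Range uremRange(Range x, Range d) {
    assert(x.lo >= 0 && x.lo <= x.hi);
    assert(d.lo >= 1 && d.lo <= d.hi);
    // The remainder is below the divisor and never exceeds the dividend.
    return {0, std::min(x.hi, d.hi - 1)};
}

// Range of x % d with C++ semantics (the result takes the sign of x and
// its magnitude is below |d|). Division by zero is undefined, so a zero
// inside the divisor range is simply ignored.
Range sremRange(Range x, Range d) {
    assert(x.lo <= x.hi && d.lo <= d.hi);
    // Since d.lo <= d.hi, the largest |d| is max(-d.lo, d.hi).
    std::int64_t maxAbs = std::max(-d.lo, d.hi);
    assert(maxAbs >= 1);
    std::int64_t m = maxAbs - 1;
    // Negative results are only possible for negative dividends,
    // positive results only for positive dividends.
    std::int64_t lo = x.lo < 0 ? std::max(x.lo, -m) : 0;
    std::int64_t hi = x.hi > 0 ? std::min(x.hi, m) : 0;
    return {lo, hi};
}
```

Applied to `toLatin1`, a `length` known to be in [16, INT_MAX] with a divisor of exactly 16 gives a remainder in [0, 15], which is the fact a later pass would need in order to conclude that the scalar loop runs fewer than 16 times.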
Olivier Goffart
2014-Jan-24 07:34 UTC
[LLVMdev] [Patches] Some LazyValueInfo and related patches
Ping?

On Tuesday 21 January 2014 14:21:43 Olivier Goffart wrote:
> Hi.
>
> Attached you will find a set of patches which I did while I was trying to
> solve two problems.
> [...]