thr3ads.net - llvm dev - [LLVMdev] [Patches] Some LazyValueInfo and related patches [Jan 2014]


Olivier Goffart

2014-Jan-21 13:21 UTC

[LLVMdev] [Patches] Some LazyValueInfo and related patches

Hi.

Attached you will find a set of patches I wrote while trying to solve
two problems.
I did not manage to fully solve what I wanted to improve, but I think this is
still a step in the right direction.

The patches are hopefully self-explanatory.
The biggest change is that LazyValueInfo no longer maintains a separate stack
of pending work; it now does the work directly, recursively.

The test included in patch 4 also tests patch 2.
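To picture the structural change (recursing into dependencies instead of pushing them onto a separate work stack), here is a toy range solver. It is a sketch under assumptions, not the patch: Interval, Node, Kind and RecursiveSolver are hypothetical, and real LazyValueInfo tracks lattice values per basic block on LLVM IR. It only shows the recursive shape, with an "in progress" set that cuts cycles conservatively.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical inclusive range of 32-bit values, widened to avoid overflow.
struct Interval { int64_t lo, hi; };
const Interval kOverdefined = {INT32_MIN, INT32_MAX};  // nothing known

enum class Kind { Const, Add, Phi };
struct Node {
    Kind kind;
    int64_t value;          // used by Const
    std::vector<int> ops;   // operand node indices, used by Add and Phi
};

class RecursiveSolver {
    const std::vector<Node> &nodes;
    std::map<int, Interval> cache;
    std::set<int> inProgress;

public:
    explicit RecursiveSolver(const std::vector<Node> &n) : nodes(n) {}

    // Compute the range of node `id`, recursing directly into its operands
    // instead of queueing them and retrying later.
    Interval get(int id) {
        assert(id >= 0 && id < static_cast<int>(nodes.size()));
        auto it = cache.find(id);
        if (it != cache.end())
            return it->second;
        // Re-entering a node that is still being solved means a cycle
        // (e.g. a loop phi): answer conservatively instead of looping forever.
        if (!inProgress.insert(id).second)
            return kOverdefined;

        const Node &n = nodes[id];
        Interval r = kOverdefined;
        switch (n.kind) {
        case Kind::Const:
            r = {n.value, n.value};
            break;
        case Kind::Add: {
            Interval a = get(n.ops[0]), b = get(n.ops[1]);
            int64_t lo = a.lo + b.lo, hi = a.hi + b.hi;
            // If the 32-bit add might wrap, keep the overdefined result.
            if (lo >= INT32_MIN && hi <= INT32_MAX)
                r = {lo, hi};
            break;
        }
        case Kind::Phi:
            // Union of all incoming ranges.
            r = get(n.ops[0]);
            for (std::size_t k = 1; k < n.ops.size(); ++k) {
                Interval o = get(n.ops[k]);
                r = {std::min(r.lo, o.lo), std::max(r.hi, o.hi)};
            }
            break;
        }
        inProgress.erase(id);
        cache[id] = r;
        return r;
    }
};
```

The usual trade-off of the recursive form is that a very long dependency chain consumes native call stack, which is a common reason solvers keep an explicit worklist instead.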


The first problem I was trying to solve was letting the code give the compiler
hints about the range of values.

Imagine, in a library:

class CopyOnWrite {
    char *stuff;
    int ref_count;
    void detach_internal();
    inline void detach() {
        if (ref_count > 1) {
            detach_internal();
            /* ref_count = 1; */
        }
    }
public:
    char &operator[](int i) { detach(); return stuff[i]; }
};

Then, in code like this:

int doStuffWithStuff(CopyOnWrite &stuff) {
    return stuff[0] + stuff[1] * stuff[2];
}

The generated code will contain three tests of ref_count and three calls to
detach_internal().

Is there a way to tell the compiler that ref_count is actually less than or
equal to 1 after a call to detach_internal()?
Making "ref_count = 1" explicit in the code helps (with my patches), but then
the store itself ends up in the generated code, and I don't want that.

Something like

 if (ref_count > 1)
     __builtin_unreachable();

works fine in GCC, but does not work with LLVM.
Well, it almost works, but the problem is that the whole condition is removed
before inlining happens.
So what can be done to make this work? One option is to delay the removal of
__builtin_unreachable() until after inlining (but when exactly?).
Another option would be to somehow keep the range information while removing
the branches that are unreachable.
I was thinking about !range metadata, but I don't know where to attach it.
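The hint pattern above can be packaged so that the promise is checked in debug builds and only handed to the optimizer in release builds. This is a sketch under assumptions: the ASSUME macro, the constructor, refs() and the body of detach_internal() are hypothetical additions that exist only so the example compiles and runs; __builtin_unreachable() is the GCC/Clang builtin discussed above. As described in this message, LLVM at the time discarded such a hint before inlining, so whether the repeated ref_count tests actually disappear depends on the compiler and its version.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper macro: debug builds check the promise with assert();
// release builds turn it into an optimizer hint via __builtin_unreachable().
#ifdef NDEBUG
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) assert(cond)
#endif

class CopyOnWrite {
    char *stuff;
    int ref_count;
    void detach_internal();   // defined out of line, opaque to callers
    inline void detach() {
        if (ref_count > 1) {
            detach_internal();
            // State what detach_internal() guarantees, without emitting
            // a store to ref_count.
            ASSUME(ref_count <= 1);
        }
    }
public:
    // Hypothetical constructor so the sketch can run: copies s and starts
    // with the given reference count.
    CopyOnWrite(const char *s, int refs)
        : stuff(new char[std::strlen(s) + 1]), ref_count(refs)
    {
        std::strcpy(stuff, s);
    }
    ~CopyOnWrite() { delete[] stuff; }
    CopyOnWrite(const CopyOnWrite &) = delete;
    CopyOnWrite &operator=(const CopyOnWrite &) = delete;

    char &operator[](int i) { detach(); return stuff[i]; }
    int refs() const { return ref_count; }
};

// Hypothetical body. In this toy nothing is really shared, so the old
// buffer is ours to free; a real implementation would only drop a reference.
void CopyOnWrite::detach_internal()
{
    char *copy = new char[std::strlen(stuff) + 1];
    std::strcpy(copy, stuff);
    delete[] stuff;
    stuff = copy;
    ref_count = 1;
}

int doStuffWithStuff(CopyOnWrite &stuff)
{
    return stuff[0] + stuff[1] * stuff[2];
}
```

The hint never changes the result: for "abc" with three references, doStuffWithStuff() returns 'a' + 'b' * 'c' (9799 with ASCII), and only one of the operator[] calls actually detaches. What it may change is how many ref_count tests the optimizer keeps.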

The other problem came up while I was analyzing code like this:

typedef unsigned char uchar;   /* Qt-style typedefs */
typedef unsigned short ushort;

void toLatin1(uchar *dst, const ushort *src, int length)
{
    if (length) {
#if defined(__SSE2__)
        if (length >= 16) {
            for (int i = 0; i < length >> 4; ++i) {
                /* skipped code using SSE2 intrinsics */
                src += 16; dst += 16;
            }
            length = length % 16;
        }
#endif
        while (length--) {
            *dst++ = (*src>0xff) ? '?' : (uchar) *src;
            ++src;
        }
    }
}
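To experiment with the tail-loop question without SSE2, the function can be made self-contained by replacing the skipped intrinsics with a plain scalar helper. This is a sketch: toLatin1Chunk16 is hypothetical and is not the original SSE2 code, and the assert() only spells out the invariant this message argues the compiler should be able to derive, namely that the tail loop runs fewer than 16 times.

```cpp
#include <cassert>

typedef unsigned char uchar;   // Qt-style typedefs
typedef unsigned short ushort;

// Hypothetical portable stand-in for the skipped SSE2 block: converts
// exactly 16 code units the same way the tail loop does.
static void toLatin1Chunk16(uchar *dst, const ushort *src)
{
    for (int k = 0; k < 16; ++k)
        dst[k] = (src[k] > 0xff) ? '?' : (uchar)src[k];
}

void toLatin1(uchar *dst, const ushort *src, int length)
{
    if (length) {
        if (length >= 16) {
            for (int i = 0; i < length >> 4; ++i) {
                toLatin1Chunk16(dst, src);
                src += 16; dst += 16;
            }
            length = length % 16;
        }
        // For a non-negative input, 0 <= length < 16 on both paths, so the
        // tail loop below runs at most 15 times.
        assert(length >= 0 && length < 16);
        while (length--) {
            *dst++ = (*src > 0xff) ? '?' : (uchar)*src;
            ++src;
        }
    }
}
```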

I was wondering: when compiling with AVX, would clang/LLVM be able to
vectorize the SSE2 intrinsics further, into wider vectors? Or would the
non-intrinsics branch be better?
It turns out the result is not great. LLVM leaves the intrinsics code
unchanged (that's OK), but it also tries to vectorize the second loop, and the
result of this vectorization is quite horrible.
Shouldn't the compiler see that length is always smaller than 16 at that
point, and hence deduce that there is no point in vectorizing? This is why I
implemented range support for srem and urem in LVI.
But then, maybe some other pass (a loop pass?) should use LVI to see that a
loop is never entered, or the loop vectorizer could use LVI to avoid creating
the vector loop in the first place.
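The range facts behind the srem/urem support can be sketched on plain intervals. This is not the code from patch 0007: Range, uremRange and sremRange are hypothetical names, and the sketch only encodes standard properties of the remainder by a constant, non-zero divisor c (an srem result takes the sign of the dividend, its magnitude is at most |c| - 1, and it never exceeds |x|).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical inclusive interval [lo, hi] of 32-bit values, widened to
// int64_t so the bound arithmetic below cannot overflow.
struct Range {
    int64_t lo, hi;
};

// Range of (x urem c) for an unsigned dividend range x and a constant c > 0.
Range uremRange(Range x, int64_t c)
{
    assert(c > 0 && x.lo >= 0);
    if (x.hi < c)          // every x is already below c: x urem c == x
        return x;
    return {0, c - 1};     // otherwise the remainder lies in [0, c-1]
}

// Range of (x srem c) for a signed dividend range x and a constant c != 0.
Range sremRange(Range x, int64_t c)
{
    assert(c != 0);
    int64_t m = (c < 0 ? -c : c) - 1;   // largest possible |remainder|
    if (x.lo >= -m && x.hi <= m)        // |x| < |c| for all x: x srem c == x
        return x;
    // The sign follows the dividend, and |r| <= min(|x|, m).
    int64_t lo = x.lo >= 0 ? 0 : std::max(x.lo, -m);
    int64_t hi = x.hi <= 0 ? 0 : std::min(x.hi, m);
    return {lo, hi};
}
```

In toLatin1, inside the `length >= 16` branch length lies in [16, INT32_MAX], so sremRange gives [0, 15] for `length % 16`; a vector loop that needs at least 16 iterations of the tail could therefore never execute on that path.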

--
Olivier
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-SCCP-Do-not-transform-load-of-a-null-pointer-into-0.patch
Type: text/x-patch
Size: 1003 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-LVI-Be-able-to-optimize-the-condition-with-and-and-o.patch
Type: text/x-patch
Size: 7857 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-LVI-Re-order-the-check-that-the-second-operand-is-co.patch
Type: text/x-patch
Size: 2949 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-LVI-Look-recursively-the-dependencies-for-finding-ra.patch
Type: text/x-patch
Size: 3152 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-LVI-Support-range-detection-of-srem-and-urem.patch
Type: text/x-patch
Size: 8355 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-LVI-simplify-a-bit-by-not-having-a-separate-stack.patch
Type: text/x-patch
Size: 12836 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-CVP-Look-for-LVI-information-when-there-is-a-compari.patch
Type: text/x-patch
Size: 7736 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-LVI-simplify-remove-hasBlockValue-and-solve-from-get.patch
Type: text/x-patch
Size: 6095 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140121/f4ef0d84/attachment-0007.bin>

Olivier Goffart

2014-Jan-24 07:34 UTC


[LLVMdev] [Patches] Some LazyValueInfo and related patches

Ping?

On Tuesday 21 January 2014 14:21:43 Olivier Goffart wrote:
> [...]
