thr3ads.net - llvm dev - [llvm-dev] the as-if rule / perf vs. security [Mar 2016]


Sanjay Patel via llvm-dev

2016-Mar-15 15:46 UTC

[llvm-dev] the as-if rule / perf vs. security

[cc'ing cfe-dev because this may require some interpretation of language
law]

My understanding is that the compiler has the freedom to access extra data
in C/C++ (not sure about other languages); AFAIK, the LLVM LangRef is
silent about this. In C/C++, this is based on the "as-if rule":
http://en.cppreference.com/w/cpp/language/as_if

So the question is: where should the optimizer draw the line with respect
to perf vs. security if it involves operating on unknown data? Are there
guidelines that we can use to decide this?

The masked load transform referenced below is not unique in accessing /
operating on unknown data. In addition to the related scalar loads ->
vector load transform that I've mentioned earlier in this thread, see for
example:
https://llvm.org/bugs/show_bug.cgi?id=20358
(and the security paper and patch review linked there)


On Mon, Mar 14, 2016 at 10:26 PM, Shahid, Asghar-ahmad <
Asghar-ahmad.Shahid at amd.com> wrote:
> Hi Sanjay,
>
>
>
> > The real question I have is whether it is legal to read the extra
> > memory, regardless of whether this is a masked load or something else.
>
> No, it is not legal AFAIK, because doing so exposes memory contents
> that the programmer did not intend to expose. This may be open to
> exploitation.
>
>
>
> Regards,
>
> Shahid
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of* Sanjay Patel via llvm-dev
> *Sent:* Monday, March 14, 2016 10:37 PM
> *To:* Nema, Ashutosh
> *Cc:* llvm-dev
> *Subject:* Re: [llvm-dev] masked-load endpoints optimization
>
>
>
> I checked in a patch to do this transform for x86-only for now:
> http://reviews.llvm.org/D18094 / http://reviews.llvm.org/rL263446
>
>
>
> On Fri, Mar 11, 2016 at 9:57 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> Thanks, Ashutosh.
>
> Yes, either TTI or TLI could be used to limit the transform if we do it in
> CGP rather than the DAG.
>
> The real question I have is whether it is legal to read the extra memory,
> regardless of whether this is a masked load or something else.
>
> Note that the x86 backend already does this, so either my proposal is ok
> for x86, or we're already doing an illegal optimization:
>
>
> define <4 x i32> @load_bonus_bytes(i32* %addr1, <4 x i32> %v) {
>   %ld1 = load i32, i32* %addr1
>   %addr2 = getelementptr i32, i32* %addr1, i64 3
>   %ld2 = load i32, i32* %addr2
>   %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0
>   %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3
>   ret <4 x i32> %vec2
> }
>
> $ ./llc -o - loadcombine.ll
> ...
>     movups    (%rdi), %xmm0
>     retq
>
>
>
>
> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote:
>
> This looks interesting. The main motivation appears to be replacing a
> masked vector load with a general vector load followed by a select.
>
> We have observed that masked vector loads are generally expensive
> compared with a plain vector load.
>
> But if the first and last elements of a masked vector load are guaranteed
> to be accessed, then it can be transformed into a vector load.
>
> In opt this can be driven by TTI, which should check whether the
> transformation is beneficial.
>
>
>
> Regards,
>
> Ashutosh
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of* Sanjay Patel via llvm-dev
> *Sent:* Friday, March 11, 2016 3:37 AM
> *To:* llvm-dev
> *Subject:* [llvm-dev] masked-load endpoints optimization
>
>
>
> If we're loading the first and last elements of a vector using a masked
> load [1], can we replace the masked load with a full vector load?
>
> "The result of this operation is equivalent to a regular vector load
> instruction followed by a ‘select’ between the loaded and the passthru
> values, predicated on the same mask. However, using this intrinsic prevents
> exceptions on memory access to masked-off lanes."
>
> I think the fact that we're loading the endpoints of the vector guarantees
> that a full vector load can't have any different faulting/exception
> behavior on x86 and most (?) other targets. We would, however, be reading
> memory that the program has not explicitly requested.
>
> IR example:
>
> define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
>   ; load the first and last elements pointed to by %addr and shuffle those into %v
>   %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)
>   ret <4 x i32> %res
> }
>
> would become something like:
>
>
> define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
>   %vecload = load <4 x i32>, <4 x i32>* %addr, align 4
>   %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v
>   ret <4 x i32> %sel
> }
>
> If this isn't valid as an IR optimization, would it be acceptable as a
> DAG combine with a target hook to opt in?
>
>
> [1] http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
>
>
>
>
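To make the question in the quoted thread concrete, here is a minimal C sketch that models a 4-lane masked load next to the proposed replacement (a full load followed by a select). The function names `masked_load4`, `full_load_select4`, and `same_result4` are hypothetical and are not LLVM APIs. The model captures only the returned values. It says nothing about faulting behavior or about which bytes the hardware actually touches, which is exactly the part under debate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };

/* Model of a masked load: only lanes whose mask bit is set are read from
   memory; masked-off lanes take the passthru value and memory is untouched. */
void masked_load4(const int32_t *addr, const bool mask[LANES],
                  const int32_t passthru[LANES], int32_t out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* Model of the proposed replacement: read all four lanes, then select.
   This reads addr[1] and addr[2] even when the mask excludes them. */
void full_load_select4(const int32_t *addr, const bool mask[LANES],
                       const int32_t passthru[LANES], int32_t out[LANES]) {
    int32_t loaded[LANES];
    memcpy(loaded, addr, sizeof loaded);   /* the full 16-byte load */
    for (int i = 0; i < LANES; ++i)
        out[i] = mask[i] ? loaded[i] : passthru[i];
}

/* Hypothetical helper: do both models return the same vector? */
bool same_result4(const int32_t *addr, const bool mask[LANES],
                  const int32_t passthru[LANES]) {
    int32_t a[LANES], b[LANES];
    assert(addr != NULL);
    masked_load4(addr, mask, passthru, a);
    full_load_select4(addr, mask, passthru, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

With the endpoint mask `{1, 0, 0, 1}` and all 16 bytes readable, the two models return identical vectors. The only difference is that the second one also reads the two middle lanes, whose values are then discarded by the select.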

Craig, Ben via llvm-dev

2016-Mar-16 13:59 UTC


[llvm-dev] the as-if rule / perf vs. security

Regarding accessing extra data, there are at least some limits as to 
what can be accessed.  You can't generate extra loads or stores to 
volatiles.  You can't generate extra stores to atomics, even if the 
extra stores appear to be the same value as the old value.

As for determining where the perf vs. security line should be drawn, I
would argue that most compilers have gone too far on the perf side while
optimizing undefined behavior. Examples include dead store elimination
leaving passwords in memory, integer overflow checks getting optimized
out, and NULL checks being optimized away. Linus Torvalds was complaining
about those just recently on this list, and while I don't share his tone,
I agree with him regarding the harm these optimizations can cause.
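Two of the hazards listed above can be sidestepped at the source level. The following is a minimal C sketch, not something proposed in this thread: `checked_add` and `wipe` are hypothetical helper names, and whether a particular compiler actually removes the naive versions depends on its version and flags. The volatile-pointer wipe relies on compilers treating each store through a volatile lvalue as an access they must keep, which holds in practice. Platform facilities such as `explicit_bzero` or `memset_s`, where available, serve the same purpose.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Overflow test done BEFORE the addition, so no signed overflow (undefined
   behavior) ever occurs and the optimizer has no license to drop the check.
   The commonly seen `a + b < a` test relies on UB for signed int. */
bool checked_add(int a, int b, int *sum) {
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;            /* the addition would overflow */
    *sum = a + b;
    return true;
}

/* Zero a buffer through a volatile-qualified pointer. Assumption: the
   compiler keeps every volatile store, unlike a plain memset on a buffer
   that is never read again, which dead store elimination may remove. */
void wipe(void *buf, size_t len) {
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

A caller would use `checked_add(a, b, &sum)` in place of a post-hoc overflow test, and call `wipe(password, sizeof password)` just before the buffer goes out of scope.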

If I'm understanding correctly, for your specific cases, you are 
wondering if it is fine to load and operate on a floating point value 
that the user did not specifically request you to operate on. This could 
cause (at least) two different problems.  First, it could cause a 
floating point exception.  I think the danger of the floating point 
exception should rule out loading values the user didn't request.  
Second, loading values the user didn't specify could enable a timing 
attack.  The timing attack is scary, but I don't think it is something 
we can really fix in the general case. As long as individual assembly 
instructions have impractical-to-predict execution times, we will be at 
the mercy of the current hardware state.  There are timing attacks that 
can determine TLS keys in a different VM instance based off of how 
quickly loads in the current process execute.  If our worst timing 
attack problems are floating point denormalization issues, then I think 
we are in a pretty good state.

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project


Sanjay Patel via llvm-dev

2016-Mar-16 16:38 UTC


[llvm-dev] the as-if rule / perf vs. security

Hi Ben -

Thanks for your response. For the sake of argument, let's narrow the scope
of the problem to eliminate some of the variables you have rightfully
cited.

Let's assume we're not dealing with volatiles, atomics, or FP operands.
We'll even guarantee that the extra loaded value is never used. This is, in
fact, the scenario that http://reviews.llvm.org/rL263446 is concerned with.

Related C example:

typedef int v4i32 __attribute__((__vector_size__(16)));

// Load some almost-consecutive ints as a vector.
v4i32 foo(int *x) {
   int x0 = x[0];
   int x1 = x[1];
// int x2 = x[2];   // U can't touch this?
   int x3 = x[3];
   return (v4i32) { x0, x1, 0, x3 };
}

For x86, we notice that we have nearly a v4i32 vector's worth of loads, so
we just turn that into a vector load and mask out the element that's
getting set to zero:
    movups    (%rdi), %xmm0            ; load 128 bits instead of three 32-bit elements
    andps     LCPI0_0(%rip), %xmm0     ; put zero bits into the 3rd element of the vector

Should that optimization be disabled by a hypothetical -fextra-secure flag?
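For readers without an x86 toolchain at hand, the transform above can be modeled in portable C. This is a sketch under assumptions: `foo_scalar` and `foo_wide` are hypothetical names, plain `int32_t` arrays stand in for the vector type, and `memcpy` plus a lane mask stand in for `movups` and `andps`. It shows that the two forms return the same value, not what any compiler emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What the source asks for: three scalar loads; x[2] is never read. */
void foo_scalar(const int32_t *x, int32_t out[4]) {
    assert(x != NULL);
    out[0] = x[0];
    out[1] = x[1];
    out[2] = 0;
    out[3] = x[3];
}

/* Model of the x86 code: one 16-byte load, then an AND with a constant
   mask that clears lane 2. x[2] IS read here, but its value cannot
   reach the result. */
void foo_wide(const int32_t *x, int32_t out[4]) {
    static const uint32_t lane_mask[4] = {
        0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0xFFFFFFFFu
    };
    uint32_t v[4];
    assert(x != NULL);
    memcpy(v, x, sizeof v);              /* models movups */
    for (int i = 0; i < 4; ++i)
        v[i] &= lane_mask[i];            /* models andps  */
    memcpy(out, v, sizeof v);
}
```

For any readable 16-byte input the two functions produce identical output, so the as-if rule is satisfied with respect to the returned value. The open question in this thread is only whether the extra read of `x[2]` is itself acceptable.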




mats petersson via llvm-dev

2016-Mar-16 16:39 UTC


[llvm-dev] [cfe-dev] the as-if rule / perf vs. security

I'm by no means an expert on security, and I'm not even going to discuss
the implications of that. But surely a load of, say, 4 x i32, with or
without a mask, will still load those elements at least into the cache.
Whether they are then loaded into some hardware register inside the CPU is
an implementation detail of the given processor. One could imagine that on
some hardware the masking is done by a suitable AND with a constant, so
even if a masked load is used, it's not guaranteed NOT to read that data.
The PROCESS can certainly read that data anyway, so in my rather naive view
of how secure software works, this is a very extreme case of worrying
about something that `gdb` can see, for example.

--
Mats
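The granularity argument can be checked with simple address arithmetic. The sketch below assumes 64-byte cache lines and 4096-byte pages, which are typical for x86-64 but not guaranteed, and `covered_by_endpoints` is a hypothetical helper name. It models addresses only. It does not model what a particular CPU does for masked-off lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { VEC_BYTES = 16, LINE_BYTES = 64, PAGE_BYTES = 4096 };  /* assumed sizes */

/* True if every byte of the 16-byte vector starting at `base` lies in a
   granule (cache line or page) that also holds the vector's first or last
   byte, i.e. a granule that loading lanes 0 and 3 already touches. */
bool covered_by_endpoints(uint64_t base, uint64_t granule) {
    assert(granule != 0);
    uint64_t first = base / granule;
    uint64_t last = (base + VEC_BYTES - 1) / granule;
    for (uint64_t off = 0; off < VEC_BYTES; ++off) {
        uint64_t g = (base + off) / granule;
        if (g != first && g != last)
            return false;
    }
    return true;
}
```

Because the vector is smaller than either granule, it can span at most two of them, and the endpoint loads touch both. The function therefore returns true for any base address with `LINE_BYTES` or `PAGE_BYTES`, and returns false only for an artificially small granule such as 4 bytes.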

