thr3ads.net - llvm dev - [llvm-dev] @llvm.memcpy not honoring volatile? [Jun 2019]


JF Bastien via llvm-dev

2019-Jun-13 04:44 UTC

[llvm-dev] @llvm.memcpy not honoring volatile?

> On Jun 12, 2019, at 9:38 PM, James Y Knight <jyknight at google.com> wrote:
> 
> 
>> On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> I think we want option 2: keep volatile memcpy, and implement it as
>> touching each byte exactly once. That’s unlikely to be particularly useful for
>> every direct-to-hardware use, but it behaves intuitively enough that I think
>> it’s desirable.
> 
> As Eli pointed out, that precludes lowering a volatile memcpy into a call to
> the memcpy library function. The usual "memcpy" library function may
> well use the same overlapping-memory trick, and there is no
> "volatile_memcpy" libc function which would provide a guarantee of not
> touching bytes multiple times. Perhaps it's okay to just always emit an
> inline loop instead of falling back to a memcpy call.
In which circumstances does this matter?
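A minimal sketch of the "inline loop" option, assuming the caller passes non-overlapping buffers. `volatile_copy_bytes` is a hypothetical name; as noted above, libc has no such function. Each byte is read once and written once through a volatile lvalue:

```c
#include <stddef.h>

/* Hypothetical helper (not part of libc or LLVM): copies n bytes using one
 * volatile byte load and one volatile byte store per byte, so every source
 * and destination byte is touched exactly once, in ascending order.
 * Assumes dst and src do not overlap. */
void volatile_copy_bytes(volatile unsigned char *dst,
                         const volatile unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i]; /* one volatile read, one volatile write */
    }
}
```

Because every access goes through a volatile lvalue, compilers in practice neither merge these byte accesses into wider ones nor turn the loop back into a memcpy call. The price is one load and one store per byte.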

> But, possibly option 3 would be better. Maybe it's better to force
> people/compiler-frontends to emit the raw load/store operations, so that
> it's more clear exactly what semantics are desired.
> 
> The fundamental issue to me is that for reasonable usages of volatile, the
> operand size and number of memory instructions generated for a given operation
> actually matter. Certainly, this is a somewhat unfortunate situation, since the
> C standard explicitly doesn't forbid implementing any volatile access with
> smaller memory operations. (Which, among other issues, allows tearing as your
> wg21 doc nicely points out.) Nevertheless, it _is_ an important property --
> required by POSIX for accesses of a volatile sig_atomic_t, even -- and is a
> property which LLVM/Clang does provide when dealing with volatile accesses of
> target-specific appropriate sizes and alignments.
> 
> But, what does that mean for volatile memcpy? What size should it use?
Any size that makes sense to HW. 

> Always a byte-by-byte copy?
It can. 
> May it do larger-sized reads/writes as well?
Any size, but no larger than the size passed to memcpy. 
> Must it do so?
No, but it has to be sensible (whatever that means). 
> Does it have to read/write the data in order?
No. 
> Or can it do so in reverse order?
Yes. 
> Can it use CPU's block-copy instructions (e.g. rep movsb on x86) which
> may sometimes cause effectively-arbitrarily-sized memory-ops, in arbitrary
> order, in hardware?
Sure. 
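The constraints in the answers above (any access size no larger than the copy, any order, each byte covered once) can be illustrated with a small planner that only decides which accesses to perform. This is a hypothetical sketch, not LLVM's actual lowering: `struct chunk` and `plan_volatile_copy` are made-up names, and it assumes both buffers are aligned to `max_width`:

```c
#include <stddef.h>

/* One access in a hypothetical lowering plan: `width` bytes starting at
 * byte `offset` of both the source and the destination. */
struct chunk {
    size_t offset;
    size_t width;
};

/* Hypothetical planner, not LLVM's lowering. Splits an n-byte copy into
 * non-overlapping, naturally aligned power-of-two chunks no wider than
 * max_width (which must be a power of two >= 1), assuming both buffers
 * start at an address aligned to max_width. Chunks are stored in `out`
 * in descending offset order, to show that order is a free choice.
 * Returns the number of chunks; `out` must have room for n entries. */
size_t plan_volatile_copy(size_t n, size_t max_width, struct chunk *out) {
    size_t count = 0;
    size_t offset = 0;
    while (offset < n) {
        size_t width = max_width;
        /* Shrink until the chunk fits in the remaining bytes and is
         * naturally aligned at this offset (width 1 always qualifies). */
        while (width > n - offset || offset % width != 0) {
            width /= 2;
        }
        out[count].offset = offset;
        out[count].width = width;
        ++count;
        offset += width;
    }
    /* Reverse in place so the highest-addressed chunk comes first. */
    for (size_t i = 0; i < count / 2; ++i) {
        struct chunk tmp = out[i];
        out[i] = out[count - 1 - i];
        out[count - 1 - i] = tmp;
    }
    return count;
}
```

For a 7-byte copy with a 4-byte maximum width, this yields accesses of 1, 2 and 4 bytes at offsets 6, 4 and 0: every byte is covered exactly once, in descending address order.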

> If we're going to keep volatile memcpy support, possibly those other
> questions ought to be answered too?
Paul McKenney has a follow-on paper (linked from R2 of mine) which addresses
some of your questions I think. LLVM can do what it wants for now since there’s
no standard, but there’s likely to be one eventually and we probably should
match what it’s likely to be.


> I dunno...

James Y Knight via llvm-dev

2019-Jun-13 15:08 UTC


[llvm-dev] @llvm.memcpy not honoring volatile?

On Thu, Jun 13, 2019 at 12:54 AM JF Bastien <jfbastien at apple.com> wrote:
>
>
> On Jun 12, 2019, at 9:38 PM, James Y Knight <jyknight at google.com> wrote:
>
> 
> On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I think we want option 2: keep volatile memcpy, and implement it as
>> touching each byte exactly once. That’s unlikely to be particularly useful
>> for every direct-to-hardware use, but it behaves intuitively enough that I
>> think it’s desirable.
>>
>
> As Eli pointed out, that precludes lowering a volatile memcpy into a call to
> the memcpy library function. The usual "memcpy" library function may well
> use the same overlapping-memory trick, and there is no "volatile_memcpy"
> libc function which would provide a guarantee of not touching bytes
> multiple times. Perhaps it's okay to just always emit an inline loop
> instead of falling back to a memcpy call.
>
>
> In which circumstances does this matter?
>
If it's problematic to touch a byte multiple times when emitting inlined
instructions for a "volatile memcpy", surely it's also problematic to emit
a library function call which does the same thing?

But -- I don't know of any realistic circumstance where either one would be
important to actual users. Someone would need to have a situation where
doing 2 overlapping 4-byte writes to implement a 7-byte memcpy is
problematic, but where it doesn't matter to them what permutation of
non-overlapping memory read/write sizes is used -- and furthermore, where
the order doesn't matter. That seems extremely unlikely to ever be the case.
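To make the 7-byte case concrete, here is a model of the overlapping trick a library memcpy may use: two 4-byte copies, one anchored at the start of the buffer and one ending at its last byte. The names are hypothetical and this is not taken from any particular libc; the `touches` counter exists only to show which bytes get written twice:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model, not any specific libc: copies 4 bytes at `off` and
 * counts how many times each destination byte is written. */
static void store4(unsigned char *dst, const unsigned char *src,
                   size_t off, int *touches) {
    memcpy(dst + off, src + off, 4);
    for (size_t i = off; i < off + 4; ++i) {
        touches[i]++;
    }
}

/* Copies n bytes (requires 4 <= n <= 8) with two possibly overlapping
 * 4-byte stores: one starting at offset 0, one ending at offset n - 1.
 * `touches` must point to n zero-initialized ints. */
void copy_overlapping_4_to_8(unsigned char *dst, const unsigned char *src,
                             size_t n, int *touches) {
    store4(dst, src, 0, touches);
    store4(dst, src, n - 4, touches);
}
```

With n == 7 the copy is correct, but the destination byte at offset 3 is stored twice, which is exactly what a "touch each byte exactly once" volatile memcpy would have to avoid.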

> Paul McKenney has a follow-on paper (linked from R2 of mine) which
> addresses some of your questions I think. LLVM can do what it wants for now
> since there’s no standard, but there’s likely to be one eventually and we
> probably should match what it’s likely to be.
>
I agree, Paul's paper describes the actually-required (vs
C-standard-required) semantics for volatile loads and stores today -- that
they must use non-tearing operations for sizes/alignments where the
hardware provides such. (IMO, any usage of volatile where that cannot be
done is extremely questionable). Of course, that doesn't say anything about
memcpy, since volatile memcpy isn't part of C, just part of LLVM.
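The canonical volatile access that must not tear is a `volatile sig_atomic_t` flag written from a signal handler; the C standard defines `sig_atomic_t` as an integer type that can be accessed as an atomic entity even in the presence of asynchronous interrupts. A minimal sketch (`got_signal`, `on_signal` and `demo_signal_flag` are illustrative names):

```c
#include <signal.h>

/* Flag shared between the handler and normal code. Accesses to a
 * volatile sig_atomic_t object are not torn by signal delivery. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
    (void)sig;
    got_signal = 1; /* a single volatile store of a sig_atomic_t */
}

/* Installs the handler, raises SIGINT synchronously, and returns the flag.
 * raise() does not return until the handler has returned. */
int demo_signal_flag(void) {
    if (signal(SIGINT, on_signal) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT);
    return got_signal;
}
```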

> Always a byte-by-byte copy?
>
> It can.

So, why does llvm even provide a volatile memcpy intrinsic? One possible
answer is that it was needed in order to implement volatile aggregate
copies generated by the C frontend. So, given the real-world requirement to
use single instructions where possible...what about this code:

struct X {int n;};
void foo(volatile struct X *n) {
    n[0] = n[1];
}

Clang implements it by creating a volatile llvm.memcpy call, which is
currently generally lowered as a 32-bit read/write. Maybe it should be
_required_ to always emit a 32-bit read/write instruction, just as if you
were directly operating on a 'volatile int *n'? (Assuming a 32-bit platform
which has such instructions, of course.)
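If a volatile aggregate copy were required to behave like an access through a `volatile int *`, the copy in `foo` would be equivalent to the hand-written version below. `foo_as_int` is a hypothetical name, and the "32-bit" comments assume a platform where `int` is 32 bits:

```c
/* Same struct as in the example above. */
struct X { int n; };

/* Hand-written equivalent of `n[0] = n[1]` for a volatile struct X:
 * exactly one volatile int load followed by one volatile int store. */
void foo_as_int(volatile struct X *n) {
    int tmp = n[1].n; /* one volatile 32-bit load (assuming 32-bit int) */
    n[0].n = tmp;     /* one volatile 32-bit store */
}
```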

Or -- maybe memcpy is actually not a reasonable thing to use for copying a
volatile struct at all. Perhaps a volatile struct copy should do volatile
element-wise copies of each fundamentally-typed field? That might make some
sense. (But...unions?).
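A sketch of the element-wise alternative, using a hypothetical register-like struct (`struct regs` and `copy_regs_fieldwise` are made up for illustration). Each fundamentally-typed field gets its own volatile load and store at that field's size, and padding bytes are not copied at all. A union has no single field type to copy this way, which is the objection raised above:

```c
#include <stdint.h>

/* Hypothetical register-like struct, for illustration only. */
struct regs {
    uint32_t ctrl;
    uint16_t status;
    uint8_t  flags;
};

/* Field-wise volatile copy: one load and one store per field, each at that
 * field's own size, performed in declaration order. Padding is skipped. */
void copy_regs_fieldwise(volatile struct regs *dst,
                         const volatile struct regs *src) {
    dst->ctrl   = src->ctrl;
    dst->status = src->status;
    dst->flags  = src->flags;
}
```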

JF Bastien via llvm-dev

2019-Jun-13 15:55 UTC


[llvm-dev] @llvm.memcpy not honoring volatile?

> On Jun 13, 2019, at 8:08 AM, James Y Knight <jyknight at google.com> wrote:
> 
> On Thu, Jun 13, 2019 at 12:54 AM JF Bastien <jfbastien at apple.com> wrote:
> 
> 
>> On Jun 12, 2019, at 9:38 PM, James Y Knight <jyknight at google.com> wrote:
>> 
>> 
>> On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> I think we want option 2: keep volatile memcpy, and implement it as
>> touching each byte exactly once. That’s unlikely to be particularly useful for
>> every direct-to-hardware use, but it behaves intuitively enough that I think
>> it’s desirable.
>> 
>> As Eli pointed out, that precludes lowering a volatile memcpy into a
>> call to the memcpy library function. The usual "memcpy" library function
>> may well use the same overlapping-memory trick, and there is no
>> "volatile_memcpy" libc function which would provide a guarantee of not
>> touching bytes multiple times. Perhaps it's okay to just always emit an
>> inline loop instead of falling back to a memcpy call.
> 
> In which circumstances does this matter?
> 
> If it's problematic to touch a byte multiple times when emitting
> inlined instructions for a "volatile memcpy", surely it's also
> problematic to emit a library function call which does the same thing?
> 
> But -- I don't know of any realistic circumstance where either one
> would be important to actual users. Someone would need to have a situation where
> doing 2 overlapping 4-byte writes to implement a 7-byte memcpy is problematic,
> but where it doesn't matter to them what permutation of non-overlapping
> memory read/write sizes is used -- and furthermore, where the order doesn't
> matter. That seems extremely unlikely to ever be the case.
Agreed, but that’s the stated intended behavior of volatile. Makes sense for
hardware, weird otherwise, but we don’t need to do it any other way. I could
construct a case where volatile is used in a signal handler, and where a partial
result with overlap breaks expectations, but… I agree it’s unlikely.

> Paul McKenney has a follow-on paper (linked from R2 of mine) which
> addresses some of your questions I think. LLVM can do what it wants for now
> since there’s no standard, but there’s likely to be one eventually and we
> probably should match what it’s likely to be.
> 
> I agree, Paul's paper describes the actually-required (vs
> C-standard-required) semantics for volatile loads and stores today -- that they
> must use non-tearing operations for sizes/alignments where the hardware provides
> such. (IMO, any usage of volatile where that cannot be done is extremely
> questionable). Of course, that doesn't say anything about memcpy, since
> volatile memcpy isn't part of C, just part of LLVM.
Indeed. After we standardize Paul’s paper I expect to also do something like
volatile memcpy based on it.

> Always a byte-by-byte copy?
>  
> It can. 
>  
> So, why does llvm even provide a volatile memcpy intrinsic? One possible
> answer is that it was needed in order to implement volatile aggregate copies
> generated by the C frontend. So, given the real-world requirement to use single
> instructions where possible...what about this code:
> 
> struct X {int n;}; 
> void foo(volatile struct X *n) {
>     n[0] = n[1];
> }
> 
> Clang implements it by creating a volatile llvm.memcpy call, which is
> currently generally lowered as a 32-bit read/write. Maybe it should be
> _required_ to always emit a 32-bit read/write instruction, just as if you were
> directly operating on a 'volatile int *n'? (Assuming a 32-bit platform
> which has such instructions, of course.)
In general, volatile instructions should behave as expected. Of course, we can
disagree on expectations ;-)
Interpreting a non-specification is wonderful.

> Or -- maybe memcpy is actually not a reasonable thing to use for copying a
> volatile struct at all. Perhaps a volatile struct copy should do volatile
> element-wise copies of each fundamentally-typed field? That might make some
> sense. (But...unions?).
I think copying a volatile struct is the unreasonable part, but that’s not super
relevant to how we implement this silly thing :-)
Don’t get me started on volatile unions (and bitfields).
