Bruce Hoult via llvm-dev
2016-Oct-19 12:44 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 2:52 PM, Nuno Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only
> the byte containing that bit becomes poison.
> Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is
> not equivalent to "load i32" since load makes the whole value poison if any
> of the bits is poison.
> The alternative as given by Eli is to use "load <4 x i8>". Since now we
> are loading 4 separate values, poison does not extend past the byte
> boundary. When load is lowered, you should get exactly the same code as
> with "load i32", though.
> So the hope is that there's no diff at assembly level.

I'm curious. Where is it defined that memcpy is byte by byte, not, for example, bit by bit? Why is the destination not identical to the source, with exactly the same bits poison?
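The distinction drawn in the quoted text (memcpy of 4 bytes versus "load i32" versus "load <4 x i8>") can be sketched as a toy model in C. This is only an illustration of the proposed semantics, not LLVM code: `ModelByte`, `ModelI32`, and the `model_*` functions are hypothetical names invented here, and the single poison flag per byte is an assumption taken from the byte-by-byte reading described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: each byte of memory carries one poison flag. */
typedef struct {
    uint8_t value;
    bool    poison;
} ModelByte;

/* Hypothetical result of a scalar load: one value, one poison flag. */
typedef struct {
    uint32_t value;
    bool     poison;
} ModelI32;

/* Models "load i32": a single poisoned byte poisons the whole value. */
ModelI32 model_load_i32(const ModelByte *src) {
    ModelI32 r = {0, false};
    assert(src != NULL);
    for (size_t i = 0; i < 4; i++) {
        /* Little-endian byte order is assumed for the value itself. */
        r.value |= (uint32_t)src[i].value << (8 * i);
        r.poison = r.poison || src[i].poison;
    }
    return r;
}

/* Models "load <4 x i8>": four separate lanes, each with its own flag. */
void model_load_v4i8(const ModelByte *src, ModelByte *lanes) {
    assert(src != NULL && lanes != NULL);
    for (size_t i = 0; i < 4; i++) {
        lanes[i] = src[i];
    }
}

/* Models memcpy(dst, src, n) under the byte-by-byte reading:
   poison travels with each byte and never crosses a byte boundary. */
void model_memcpy(ModelByte *dst, const ModelByte *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

In this model, a source with one poisoned byte yields a fully poisoned result from `model_load_i32`, while `model_memcpy` and `model_load_v4i8` leave the other three bytes usable, which is why the vector load is suggested as the equivalent of a 4-byte memcpy.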
Nuno Lopes via llvm-dev
2016-Oct-19 13:08 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> On Wed, Oct 19, 2016 at 2:52 PM, Nuno Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the byte containing that bit becomes poison.
>> Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4) is not equivalent to "load i32" since load makes the whole value poison if any of the bits is poison.
>> The alternative as given by Eli is to use "load <4 x i8>". Since now we are loading 4 separate values, poison does not extend past the byte boundary. When load is lowered, you should get exactly the same code as with "load i32", though.
>> So the hope is that there's no diff at assembly level.
>
> I'm curious. Where is it defined that memcpy is byte by byte, not, for example, bit by bit? Why is the destination not identical to the source, with exactly the same bits poison?

I don't think it's written explicitly anywhere, but the C++ standard says the following:
"The fundamental storage unit in the C++ memory model is the byte."
[intro.memory - 1.7]

And then:
"most derived object shall have a non-zero size and shall occupy one or more bytes of storage."
"An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."
[intro.object - 1.8]

I'm not a language lawyer, but it seems that for C/C++, defining memcpy in terms of byte copying is sufficient. This holds even for bit-fields, since these are lowered into words whose size is a multiple of a byte.

However, even though this is fine for C, it's true that we may choose to do something else at the IR level. It's easy to make memcpy a bit-wise copy; the question is whether there's a client that cares or not.
Please let us know if you are aware of such a client.

Nuno
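The two candidate semantics discussed in this reply, byte-wise versus bit-wise memcpy, differ only in how much poison the destination ends up with. A minimal sketch in C, assuming a hypothetical shadow representation where each data byte has a mask of poisoned bits (`ShadowByte` and the `shadow_*` functions are illustrative names, not part of LLVM or of any proposal in the thread):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow model: a set bit in poison_mask marks that data bit as poison. */
typedef struct {
    uint8_t bits;
    uint8_t poison_mask;
} ShadowByte;

/* Byte-wise reading: one poisoned bit makes the whole destination byte poison. */
void shadow_memcpy_bytewise(ShadowByte *dst, const ShadowByte *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i].bits = src[i].bits;
        dst[i].poison_mask = (src[i].poison_mask != 0) ? 0xFF : 0x00;
    }
}

/* Bit-wise reading: the destination has exactly the same poison bits as the source. */
void shadow_memcpy_bitwise(ShadowByte *dst, const ShadowByte *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Counts the bits in buf that are not marked as poison. */
size_t shadow_defined_bits(const ShadowByte *buf, size_t n) {
    size_t count = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = 0; b < 8; b++) {
            if (((buf[i].poison_mask >> b) & 1u) == 0) {
                count++;
            }
        }
    }
    return count;
}
```

For a source byte whose low 3 bits are defined and whose upper 5 bits are poison, the byte-wise copy leaves no defined bits in that byte, while the bit-wise copy keeps the 3 defined ones. Whether any client depends on that difference is the question the message asks.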
Joerg Sonnenberger via llvm-dev
2016-Oct-19 13:12 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 03:44:05PM +0300, Bruce Hoult via llvm-dev wrote:
> I'm curious. Where is it defined that memcpy is byte by byte, not, for
> example, bit by bit? Why is the destination not identical to the source,
> with exactly the same bits poison?

The fundamental unit in C is a char.

Joerg
Bruce Hoult via llvm-dev
2016-Oct-19 14:49 UTC
[llvm-dev] RFC: Killing undef and spreading poison
On Wed, Oct 19, 2016 at 4:08 PM, Nuno Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> > On Wed, Oct 19, 2016 at 2:52 PM, Nuno Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >> Memcpy does a byte-by-byte copy. So if one of the bits is poison then
> >> only the byte containing that bit becomes poison.
> >> Therefore, memcpy(x, y, 1) is equivalent to load i8. But memcpy(x,y,4)
> >> is not equivalent to "load i32" since load makes the whole value poison if
> >> any of the bits is poison.
> >> The alternative as given by Eli is to use "load <4 x i8>". Since now
> >> we are loading 4 separate values, poison does not extend past the byte
> >> boundary. When load is lowered, you should get exactly the same code as
> >> with "load i32", though.
> >> So the hope is that there's no diff at assembly level.
> >
> > I'm curious. Where is it defined that memcpy is byte by byte, not, for
> > example, bit by bit? Why is the destination not identical to the source,
> > with exactly the same bits poison?
>
> I don't think it's written explicitly anywhere, but the C++ standard says
> the following:
> "The fundamental storage unit in the C++ memory model is the byte."
> [intro.memory - 1.7]
>
> And then:
> "most derived object shall have a non-zero size and shall occupy one or
> more bytes of storage."
> "An object of trivially copyable or standard-layout type (3.9) shall
> occupy contiguous bytes of storage."
> [intro.object - 1.8]
>
> I'm not a language lawyer, but it seems that for C/C++, defining memcpy in
> terms of byte copying is sufficient. This holds even for bit-fields, since
> these are lowered into words whose size is a multiple of a byte.
>
> However, even though this is fine for C, it's true that we may
> choose to do something else at the IR level. It's easy to make memcpy a
> bit-wise copy; the question is whether there's a client that cares or not.
> Please let us know if you are aware of such a client.

Isn't a struct containing a bit-field whose width is not a multiple of 8 bits already an example? A previous post indicated that the last byte, the one holding the padding bits, needed special treatment.
Chris Lattner via llvm-dev
2016-Oct-24 19:57 UTC
[llvm-dev] RFC: Killing undef and spreading poison
> On Oct 19, 2016, at 6:12 AM, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Wed, Oct 19, 2016 at 03:44:05PM +0300, Bruce Hoult via llvm-dev wrote:
>> I'm curious. Where is it defined that memcpy is byte by byte, not, for
>> example, bit by bit? Why is the destination not identical to the source,
>> with exactly the same bits poison?
>
> The fundamental unit in C is a char.

This approach won’t work for LLVM, though. It is perfectly fine in C to have an initialized bit-field in an otherwise uninitialized object. Use of the initialized part is acceptable, and the initialized aspect of it needs to propagate through memcpy.

-Chris
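The C pattern described in this message can be sketched as follows. The struct `Flags` and the function `copy_and_read_ready` are hypothetical names chosen for illustration; the claim that this usage is acceptable in C is the message's own and is not independently verified here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct: 5 bits of bit-fields; the rest of the storage unit is padding. */
struct Flags {
    unsigned ready : 3;
    unsigned mode  : 2;
};

/* Initializes only `ready`, copies the whole object, and reads `ready` from the copy. */
unsigned copy_and_read_ready(unsigned ready_value) {
    struct Flags a;              /* `mode` and the padding bits stay uninitialized */
    struct Flags b;

    assert(ready_value < 8u);    /* must fit in the 3-bit field */
    a.ready = ready_value;
    memcpy(&b, &a, sizeof b);    /* copies initialized and uninitialized bits alike */
    return b.ready;              /* only the initialized field is read */
}
```

Here `ready` shares a byte with `mode` and with padding bits that were never written. Under a byte-wise poison rule that whole byte would become poison after the memcpy, so reading `b.ready` would be affected even though the program only ever uses the bits it initialized.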