thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM [Jun 2021]

If this information is useful, please help other people find it:
Share via:

Nuno Lopes via llvm-dev

2021-Jun-06 19:15 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Of course, let's look at a simple example with GVN:
if (p == q) {
  *(p-1) = 0;
}

If p/q are integers, then it's perfectly fine to replace p with q or
vice-versa inside the 'if'. GVN may decide to produce this:
if (p == q) {
  *(q-1) = 0;
}

Now consider that p/q are pointers. In particular they are the following:
char a[n];
char q[n];
char *p = a + n;

inlining the definitions, originally we had:
if (a+n == q) {
  *(a+n-1) = 0;
}

The program is well-defined. However, GVN produces a ill-defined program:
if (a+n == q) {
  *(q-1) = 0;
}

While a+n may have the same address as q, their provenance is different. You can
dereference a[n-1], but you can't dereference q[-1], even if they have
exactly the same address.

Pointer equalities can only be propagated if both pointers have the same
provenance information. If two pointers are inbounds and compare equal, then
they have the same provenance, for example. That's a case when GVN for
pointers is correct.

Another class of issues is related with alias analysis and implicit casts done
by integer loads. AA doesn't consider these implicit casts, which is
incorrect. When we load an integer, we don't know if we are actually loading
an integer or a "raw byte", as there's no way to distinguish. So,
to be correct, we need to consider all integer loads as escaping pointers
(unless we introduce a way to distinguish "raw byte" loads from
integer loads). I gave one such example in my last reply to Johannes.

Please let me know if the example above isn't clear. Happy to elaborate
further :)

Thanks,
Nuno 


-----Original Message-----
From: David Blaikie <dblaikie at gmail.com> 
Sent: 06 June 2021 19:58
Subject: Re: [llvm-dev] [RFC] Introducing a byte type to LLVM

Perhaps a small concrete example of the GVN (or otherwise, if there are smaller
examples in other optimizations) transformation that you find to be invalid -
showing how it's important for integers, and invalid for pointers (smallest
example that manifests that invalidity in an observable way) might help ground
the conversation a bit.

On Sun, Jun 6, 2021 at 11:26 AM Nuno Lopes via llvm-dev <llvm-dev at
lists.llvm.org> wrote:>
> Let me try to explain the issue around typed memory. It’s orthogonal to
TBAA. Plus I agree the allocation type is irrelevant; it’s the type of the
load/store operations that matters.
>
>
>
> Fact 1: LLVM needs an “universal holder” type in its IR.
>
> Why?
>
> Frontends load data from memory and don’t necessarily know the type they
are loading. When you load a char in C, it can be part of a pointer or an
integer, for example. Clang has no way to know.
> That’s necessary to express implementations of memcpy in IR, as it copies
raw data without knowing what the type is.
> LLVM lowers small memcpys into load/store.
>
>
>
> I hope you agree LLVM needs a type that can contain any other type.
>
>
>
>
>
> Fact 2: optimizations for this “universal holder” type need to be correct
for any LLVM IR type.
>
> Since this type can hold anything, optimizations that apply to this type
must be correct for integers, pointers, etc. Otherwise, the results would be
incorrect if the data was cast to the “wrong” type.
>
>
>
> The corollary of this fact is that GVN is not correct for the “universal
holder” type in general. That’s because GVN is not correct for pointers in
general (though correct in some cases). GVN doesn’t work for pointers since just
because 2 pointers compare equal, it doesn’t mean they have the same provenance.
As LLVM’s AA uses provenance information, this information must be preserved.
>
>
>
> Hopefully you agree with everything I wrote so far. If not please stop me
and ask.
>
>
>
>
>
> Ok, so what’s the solution?
>
> We have two candidates for this “universal holder” type:
>
> integer (i<x>)
> byte (b<x>), a new dedicated type
>
>
>
> The advantage of going with integers is that they already exist. The
disadvantage? To make LLVM correct, we would need to disable GVN in most cases
or stop using provenance information in AA (make pointers == integers).
>
> This sounds pretty bad to me due to the expect performance hit.
>
>
>
> Introducing a new type requires work, yes. But it allows us to keep all
integer optimizations as aggressive as we want, and only throttle down when
encountering the byte type. Another benefit is that loading a byte from memory
doesn’t escape any pointer, while with the first solution you need to consider
all stored pointers as escaped when loading an integer.
>
> Introducing the byte type allow us to fix all the integer/pointer casts
bugs in LLVM IR. And it enables more optimizations as we won’t need to be
careful because of the LLVM IR bug.
>
>
>
>
>
> A 3rd option is to keep everything as-is. That is, keep wrong optimizations
in LLVM and keep adding hacks to limit the miscompilations in real-world
programs.
>
> That has bitten us multiple times. Last time someone told me my
miscompilation example was academic, LLVM ended up miscompiling itself, and then
miscompiled Google’s code. Only this year we are planning to fully fix that bug
(we have a hack right now).
>
> Last week Alive2 caught a miscompilation in the Linux kernel, in the
network stack. The optimization got the pointer arithmetic wrong. Pretty scary,
and the bug may have security implications.
>
>
>
> Anyway, just to argue that leaving things as-is is not an option. We have
momentum and 3 GSoC students ready to help to fix the last few design bugs in
LLVM IR.
>
>
>
> I know this stuff is complex; please ask questions if something is not
clear or if you disagree with anything I wrote above.
>
>
>
> Thanks,
>
> Nuno
>
>
>
>
>
>
>
> From: Chris Lattner via llvm-dev
> Sent: 06 June 2021 05:27
> To: John McCall <rjmccall at apple.com>
> Cc: llvm-dev at lists.llvm.org; Ralf Jung <jung at mpi-sws.org>; 
> cfe-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to 
> LLVM
>
>
>
> On Jun 4, 2021, at 11:25 AM, John McCall via cfe-dev <cfe-dev at
lists.llvm.org> wrote:On 4 Jun 2021, at 11:24, George Mitenkov wrote:
>
> Hi all,
>
> Together with Nuno Lopes and Juneyoung Lee we propose to add a new 
> byte type to LLVM to fix miscompilations due to load type punning. 
> Please see the proposal below. It would be great to hear the 
> feedback/comments/suggestions!
>
>
> Motivation
> =========>
> char and unsigned char are considered to be universal holders in C. 
> They can access raw memory and are used to implement memcpy. i8 is the 
> LLVM’s counterpart but it does not have such semantics, which is also 
> not desirable as it would disable many optimizations.
>
> I don’t believe this is correct. LLVM does not have an innate concept 
> of typed memory. The type of a global or local allocation is just a 
> roundabout way of giving it a size and default alignment, and 
> similarly the type of a load or store just determines the width and 
> default alignment of the access. There are no restrictions on what 
> types can be used to load or store from certain objects.
>
> C-style type aliasing restrictions are imposed using tbaa metadata, 
> which are unrelated to the IR type of the access.
>
> I completely agree with John.  “i8” in LLVM doesn’t carry any implications
about aliasing (in fact, LLVM pointers are going towards being typeless).  Any
such thing occurs at the accesses, and are part of TBAA.
>
>
>
> I’m opposed to adding a byte type to LLVM, as such semantic carrying types
are entirely unprecedented, and would add tremendous complexity to the entire
system.
>
>
>
> -Chris
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

James Courtier-Dutton via llvm-dev

2021-Jun-06 19:29 UTC

head link

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

See inline

On Sun, 6 Jun 2021 at 20:15, Nuno Lopes via cfe-dev
<cfe-dev at lists.llvm.org> wrote:> Now consider that p/q are pointers. In particular they are the following:
> char a[n];
> char q[n];
> char *p = a + n;
>
> inlining the definitions, originally we had:
> if (a+n == q) {     <-  This is the problem. This being true or false is
undefined. One compiler might make them equal, another will not.
>   *(a+n-1) = 0;
> }
>

Jeroen Dobbelaere via llvm-dev

2021-Jun-07 12:15 UTC

head link

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

IMHO, the ptr_provenance support that is part of the full restrict patches [0]
can likely help for these cases ? The allow to explicitly associate the
ptr_provenance
of a load/store instruction.

In the past LLVM AA Tech Calls, we decided to pull those out of the full
restrict
patches and prepare them for inclusion. (Which I still have on my list of
todo's)

Greetings,

Jeroen Dobbelaere
[0] https://reviews.llvm.org/D68488 [PATCH 03/27] [noalias] [IR] Introduce
ptr_provenance for LoadInst/StoreInst


> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Nuno
Lopes via
> cfe-dev
> Sent: Sunday, June 6, 2021 21:16
> To: 'David Blaikie' <dblaikie at gmail.com>
> Cc: 'llvm-dev' <llvm-dev at lists.llvm.org>; 'Ralf
Jung' <jung at mpi-sws.org>;
> 'Clang Dev' <cfe-dev at lists.llvm.org>; 'Chris
Lattner' <clattner at nondot.org>
> Subject: Re: [cfe-dev] [RFC] Introducing a byte type to LLVM
> 
> Of course, let's look at a simple example with GVN:
> if (p == q) {
>   *(p-1) = 0;
> }
> 
> If p/q are integers, then it's perfectly fine to replace p with q or
vice-
> versa inside the 'if'. GVN may decide to produce this:
> if (p == q) {
>   *(q-1) = 0;
> }
> 
> Now consider that p/q are pointers. In particular they are the following:
> char a[n];
> char q[n];
> char *p = a + n;
> 
> inlining the definitions, originally we had:
> if (a+n == q) {
>   *(a+n-1) = 0;
> }
> 
> The program is well-defined. However, GVN produces a ill-defined program:
> if (a+n == q) {
>   *(q-1) = 0;
> }
> 
> While a+n may have the same address as q, their provenance is different.
You
> can dereference a[n-1], but you can't dereference q[-1], even if they
have
> exactly the same address.
> 
> Pointer equalities can only be propagated if both pointers have the same
> provenance information. If two pointers are inbounds and compare equal,
then
> they have the same provenance, for example. That's a case when GVN for
> pointers is correct.
> 
> Another class of issues is related with alias analysis and implicit casts
done
> by integer loads. AA doesn't consider these implicit casts, which is
> incorrect. When we load an integer, we don't know if we are actually
loading
> an integer or a "raw byte", as there's no way to distinguish.
So, to be
> correct, we need to consider all integer loads as escaping pointers (unless
we
> introduce a way to distinguish "raw byte" loads from integer
loads). I gave
> one such example in my last reply to Johannes.
> 
> Please let me know if the example above isn't clear. Happy to elaborate
> further :)
> 
> Thanks,
> Nuno
> 
> 
> -----Original Message-----
> From: David Blaikie <dblaikie at gmail.com>
> Sent: 06 June 2021 19:58
> Subject: Re: [llvm-dev] [RFC] Introducing a byte type to LLVM
> 
> Perhaps a small concrete example of the GVN (or otherwise, if there are
> smaller examples in other optimizations) transformation that you find to be
> invalid - showing how it's important for integers, and invalid for
pointers
> (smallest example that manifests that invalidity in an observable way)
might
> help ground the conversation a bit.
> 
> On Sun, Jun 6, 2021 at 11:26 AM Nuno Lopes via llvm-dev <llvm-
> dev at lists.llvm.org> wrote:
> >
> > Let me try to explain the issue around typed memory. It’s orthogonal
to
> TBAA. Plus I agree the allocation type is irrelevant; it’s the type of the
> load/store operations that matters.
> >
> >
> >
> > Fact 1: LLVM needs an “universal holder” type in its IR.
> >
> > Why?
> >
> > Frontends load data from memory and don’t necessarily know the type
they are
> loading. When you load a char in C, it can be part of a pointer or an
integer,
> for example. Clang has no way to know.
> > That’s necessary to express implementations of memcpy in IR, as it
copies
> raw data without knowing what the type is.
> > LLVM lowers small memcpys into load/store.
> >
> >
> >
> > I hope you agree LLVM needs a type that can contain any other type.
> >
> >
> >
> >
> >
> > Fact 2: optimizations for this “universal holder” type need to be
correct
> for any LLVM IR type.
> >
> > Since this type can hold anything, optimizations that apply to this
type
> must be correct for integers, pointers, etc. Otherwise, the results would
be
> incorrect if the data was cast to the “wrong” type.
> >
> >
> >
> > The corollary of this fact is that GVN is not correct for the
“universal
> holder” type in general. That’s because GVN is not correct for pointers in
> general (though correct in some cases). GVN doesn’t work for pointers since
> just because 2 pointers compare equal, it doesn’t mean they have the same
> provenance. As LLVM’s AA uses provenance information, this information must
be
> preserved.
> >
> >
> >
> > Hopefully you agree with everything I wrote so far. If not please stop
me
> and ask.
> >
> >
> >
> >
> >
> > Ok, so what’s the solution?
> >
> > We have two candidates for this “universal holder” type:
> >
> > integer (i<x>)
> > byte (b<x>), a new dedicated type
> >
> >
> >
> > The advantage of going with integers is that they already exist. The
> disadvantage? To make LLVM correct, we would need to disable GVN in most
cases
> or stop using provenance information in AA (make pointers == integers).
> >
> > This sounds pretty bad to me due to the expect performance hit.
> >
> >
> >
> > Introducing a new type requires work, yes. But it allows us to keep
all
> integer optimizations as aggressive as we want, and only throttle down when
> encountering the byte type. Another benefit is that loading a byte from
memory
> doesn’t escape any pointer, while with the first solution you need to
consider
> all stored pointers as escaped when loading an integer.
> >
> > Introducing the byte type allow us to fix all the integer/pointer
casts bugs
> in LLVM IR. And it enables more optimizations as we won’t need to be
careful
> because of the LLVM IR bug.
> >
> >
> >
> >
> >
> > A 3rd option is to keep everything as-is. That is, keep wrong
optimizations
> in LLVM and keep adding hacks to limit the miscompilations in real-world
> programs.
> >
> > That has bitten us multiple times. Last time someone told me my
> miscompilation example was academic, LLVM ended up miscompiling itself, and
> then miscompiled Google’s code. Only this year we are planning to fully fix
> that bug (we have a hack right now).
> >
> > Last week Alive2 caught a miscompilation in the Linux kernel, in the
network
> stack. The optimization got the pointer arithmetic wrong. Pretty scary, and
> the bug may have security implications.
> >
> >
> >
> > Anyway, just to argue that leaving things as-is is not an option. We
have
> momentum and 3 GSoC students ready to help to fix the last few design bugs
in
> LLVM IR.
> >
> >
> >
> > I know this stuff is complex; please ask questions if something is not
clear
> or if you disagree with anything I wrote above.
> >
> >
> >
> > Thanks,
> >
> > Nuno
> >
> >
> >
> >
> >
> >
> >
> > From: Chris Lattner via llvm-dev
> > Sent: 06 June 2021 05:27
> > To: John McCall <rjmccall at apple.com>
> > Cc: llvm-dev at lists.llvm.org; Ralf Jung <jung at mpi-sws.org>;
> > cfe-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to
> > LLVM
> >
> >
> >
> > On Jun 4, 2021, at 11:25 AM, John McCall via cfe-dev <cfe-
> dev at lists.llvm.org> wrote:On 4 Jun 2021, at 11:24, George Mitenkov
wrote:
> >
> > Hi all,
> >
> > Together with Nuno Lopes and Juneyoung Lee we propose to add a new
> > byte type to LLVM to fix miscompilations due to load type punning.
> > Please see the proposal below. It would be great to hear the
> > feedback/comments/suggestions!
> >
> >
> > Motivation
> > =========> >
> > char and unsigned char are considered to be universal holders in C.
> > They can access raw memory and are used to implement memcpy. i8 is the
> > LLVM’s counterpart but it does not have such semantics, which is also
> > not desirable as it would disable many optimizations.
> >
> > I don’t believe this is correct. LLVM does not have an innate concept
> > of typed memory. The type of a global or local allocation is just a
> > roundabout way of giving it a size and default alignment, and
> > similarly the type of a load or store just determines the width and
> > default alignment of the access. There are no restrictions on what
> > types can be used to load or store from certain objects.
> >
> > C-style type aliasing restrictions are imposed using tbaa metadata,
> > which are unrelated to the IR type of the access.
> >
> > I completely agree with John.  “i8” in LLVM doesn’t carry any
implications
> about aliasing (in fact, LLVM pointers are going towards being typeless). 
Any
> such thing occurs at the accesses, and are part of TBAA.
> >
> >
> >
> > I’m opposed to adding a byte type to LLVM, as such semantic carrying
types
> are entirely unprecedented, and would add tremendous complexity to the
entire
> system.
> >
> >
> >
> > -Chris
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://urldefense.com/v3/__https://lists.llvm.org/cgi-
> bin/mailman/listinfo/llvm-
>
dev__;!!A4F2R9G_pg!OaorXrTDpbE6DTSL_1JEUI55JMoefP2WL3XL_8iPWUad0B4K4HEnbYNE32W
> YUj8oOMYIgVLz$
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://urldefense.com/v3/__https://lists.llvm.org/cgi-
> bin/mailman/listinfo/cfe-
>
dev__;!!A4F2R9G_pg!OaorXrTDpbE6DTSL_1JEUI55JMoefP2WL3XL_8iPWUad0B4K4HEnbYNE32W
> YUj8oOCB56bnt$

Richard Smith via llvm-dev

2021-Jun-11 00:25 UTC

head link

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

On Sun, 6 Jun 2021 at 12:15, Nuno Lopes via cfe-dev <cfe-dev at
lists.llvm.org>
wrote:
> Of course, let's look at a simple example with GVN:
> if (p == q) {
>   *(p-1) = 0;
> }
>
> If p/q are integers, then it's perfectly fine to replace p with q or
> vice-versa inside the 'if'.

If we start with:

char a[n];
char b[n];
long p = (long)(a+n);
long q = (long)b;
if (p == q) {
   *((char*)p-1) = 0;
}

... what prevents us from observing the same issue?

GVN may decide to produce this:> if (p == q) {
>   *(q-1) = 0;
> }
>
> Now consider that p/q are pointers. In particular they are the following:
> char a[n];
> char q[n];
> char *p = a + n;
>
> inlining the definitions, originally we had:
> if (a+n == q) {
>   *(a+n-1) = 0;
> }
>
> The program is well-defined. However, GVN produces a ill-defined program:
> if (a+n == q) {
>   *(q-1) = 0;
> }
>
> While a+n may have the same address as q, their provenance is different.
> You can dereference a[n-1], but you can't dereference q[-1], even if
they
> have exactly the same address.
>
> Pointer equalities can only be propagated if both pointers have the same
> provenance information. If two pointers are inbounds and compare equal,
> then they have the same provenance, for example. That's a case when GVN
for
> pointers is correct.
>
> Another class of issues is related with alias analysis and implicit casts
> done by integer loads. AA doesn't consider these implicit casts, which
is
> incorrect. When we load an integer, we don't know if we are actually
> loading an integer or a "raw byte", as there's no way to
distinguish. So,
> to be correct, we need to consider all integer loads as escaping pointers
> (unless we introduce a way to distinguish "raw byte" loads from
integer
> loads). I gave one such example in my last reply to Johannes.
>
> Please let me know if the example above isn't clear. Happy to elaborate
> further :)
>
> Thanks,
> Nuno
>
>
> -----Original Message-----
> From: David Blaikie <dblaikie at gmail.com>
> Sent: 06 June 2021 19:58
> Subject: Re: [llvm-dev] [RFC] Introducing a byte type to LLVM
>
> Perhaps a small concrete example of the GVN (or otherwise, if there are
> smaller examples in other optimizations) transformation that you find to be
> invalid - showing how it's important for integers, and invalid for
pointers
> (smallest example that manifests that invalidity in an observable way)
> might help ground the conversation a bit.
>
> On Sun, Jun 6, 2021 at 11:26 AM Nuno Lopes via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Let me try to explain the issue around typed memory. It’s orthogonal
to
> TBAA. Plus I agree the allocation type is irrelevant; it’s the type of the
> load/store operations that matters.
> >
> >
> >
> > Fact 1: LLVM needs an “universal holder” type in its IR.
> >
> > Why?
> >
> > Frontends load data from memory and don’t necessarily know the type
they
> are loading. When you load a char in C, it can be part of a pointer or an
> integer, for example. Clang has no way to know.
> > That’s necessary to express implementations of memcpy in IR, as it
> copies raw data without knowing what the type is.
> > LLVM lowers small memcpys into load/store.
> >
> >
> >
> > I hope you agree LLVM needs a type that can contain any other type.
> >
> >
> >
> >
> >
> > Fact 2: optimizations for this “universal holder” type need to be
> correct for any LLVM IR type.
> >
> > Since this type can hold anything, optimizations that apply to this
type
> must be correct for integers, pointers, etc. Otherwise, the results would
> be incorrect if the data was cast to the “wrong” type.
> >
> >
> >
> > The corollary of this fact is that GVN is not correct for the
“universal
> holder” type in general. That’s because GVN is not correct for pointers in
> general (though correct in some cases). GVN doesn’t work for pointers since
> just because 2 pointers compare equal, it doesn’t mean they have the same
> provenance. As LLVM’s AA uses provenance information, this information must
> be preserved.
> >
> >
> >
> > Hopefully you agree with everything I wrote so far. If not please stop
> me and ask.
> >
> >
> >
> >
> >
> > Ok, so what’s the solution?
> >
> > We have two candidates for this “universal holder” type:
> >
> > integer (i<x>)
> > byte (b<x>), a new dedicated type
> >
> >
> >
> > The advantage of going with integers is that they already exist. The
> disadvantage? To make LLVM correct, we would need to disable GVN in most
> cases or stop using provenance information in AA (make pointers =>
integers).
> >
> > This sounds pretty bad to me due to the expect performance hit.
> >
> >
> >
> > Introducing a new type requires work, yes. But it allows us to keep
all
> integer optimizations as aggressive as we want, and only throttle down when
> encountering the byte type. Another benefit is that loading a byte from
> memory doesn’t escape any pointer, while with the first solution you need
> to consider all stored pointers as escaped when loading an integer.
> >
> > Introducing the byte type allow us to fix all the integer/pointer
casts
> bugs in LLVM IR. And it enables more optimizations as we won’t need to be
> careful because of the LLVM IR bug.
> >
> >
> >
> >
> >
> > A 3rd option is to keep everything as-is. That is, keep wrong
> optimizations in LLVM and keep adding hacks to limit the miscompilations in
> real-world programs.
> >
> > That has bitten us multiple times. Last time someone told me my
> miscompilation example was academic, LLVM ended up miscompiling itself, and
> then miscompiled Google’s code. Only this year we are planning to fully fix
> that bug (we have a hack right now).
> >
> > Last week Alive2 caught a miscompilation in the Linux kernel, in the
> network stack. The optimization got the pointer arithmetic wrong. Pretty
> scary, and the bug may have security implications.
> >
> >
> >
> > Anyway, just to argue that leaving things as-is is not an option. We
> have momentum and 3 GSoC students ready to help to fix the last few design
> bugs in LLVM IR.
> >
> >
> >
> > I know this stuff is complex; please ask questions if something is not
> clear or if you disagree with anything I wrote above.
> >
> >
> >
> > Thanks,
> >
> > Nuno
> >
> >
> >
> >
> >
> >
> >
> > From: Chris Lattner via llvm-dev
> > Sent: 06 June 2021 05:27
> > To: John McCall <rjmccall at apple.com>
> > Cc: llvm-dev at lists.llvm.org; Ralf Jung <jung at mpi-sws.org>;
> > cfe-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to
> > LLVM
> >
> >
> >
> > On Jun 4, 2021, at 11:25 AM, John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:On 4 Jun 2021, at 11:24, George
Mitenkov
> wrote:
> >
> > Hi all,
> >
> > Together with Nuno Lopes and Juneyoung Lee we propose to add a new
> > byte type to LLVM to fix miscompilations due to load type punning.
> > Please see the proposal below. It would be great to hear the
> > feedback/comments/suggestions!
> >
> >
> > Motivation
> > =========> >
> > char and unsigned char are considered to be universal holders in C.
> > They can access raw memory and are used to implement memcpy. i8 is the
> > LLVM’s counterpart but it does not have such semantics, which is also
> > not desirable as it would disable many optimizations.
> >
> > I don’t believe this is correct. LLVM does not have an innate concept
> > of typed memory. The type of a global or local allocation is just a
> > roundabout way of giving it a size and default alignment, and
> > similarly the type of a load or store just determines the width and
> > default alignment of the access. There are no restrictions on what
> > types can be used to load or store from certain objects.
> >
> > C-style type aliasing restrictions are imposed using tbaa metadata,
> > which are unrelated to the IR type of the access.
> >
> > I completely agree with John.  “i8” in LLVM doesn’t carry any
> implications about aliasing (in fact, LLVM pointers are going towards being
> typeless).  Any such thing occurs at the accesses, and are part of TBAA.
> >
> >
> >
> > I’m opposed to adding a byte type to LLVM, as such semantic carrying
> types are entirely unprecedented, and would add tremendous complexity to
> the entire system.
> >
> >
> >
> > -Chris
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210610/5077bba2/attachment.html>

llvm dev - Jun 2021 - [cfe-dev] [RFC] Introducing a byte type to LLVM

[llvm-dev] [RFC] Introducing a byte type to LLVM

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM

[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM