James Courtier-Dutton via llvm-dev
2021-Jun-07 11:57 UTC
[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM
On Fri, 4 Jun 2021 at 17:35, George Mitenkov via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> Hi Johannes,
>
> Sure! The underlying problem is that raw-memory access handlers are treated
> as integers, while they are not really integers. Especially std::byte that specifically
> states that it has raw-memory access semantics. This semantic mismatch can make
> AA wrong and a pointer to escape.
>
> Consider the following LLVM IR that copies a pointer:

You are making an assumption here. By just looking at the IR code here, I don't think you can really be sure what the type of the thing being copied is.

>   %src8 = bitcast i8** %src to i8*
>   %dst8 = bitcast i8** %dst to i8*
>   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1 false)
>   %load = load i8*, i8** %dst
>   %addr = ptrtoint i8* %load to i64
>   ret i64 %addr
>
> If we optimize the call to memcpy, then the IR becomes

Same here, by just looking at the IR code here, I don't think you can really be sure what the type of the thing being copied is.

>   %src64 = bitcast i8** %src to i64*
>   %dst64 = bitcast i8** %dst to i64*
>   %addr = load i64, i64* %src64, align 1
>   store i64 %addr, i64* %dst64, align 1
>   ret i64 %addr
>

One can do bitcasts, etc., to obscure the actual type of the bytes being copied.
In both those examples, 8 bytes are copied, and the same value is returned. So the end program will function the same when run.
Essentially, there is not enough information in the above code to determine if the 8 bytes copied are part of a pointer or not.
For AA analysis, I would say, more information is needed.

One can only really be sure what type those bytes are, and that they are a pointer, when they are actually used as a pointer argument to a LOAD or STORE.
There are some other operations that can also be used to infer whether it is a pointer or not, but the LOAD/STORE is the simplest example.

Kind Regards

James
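For a source-level picture of the two IR snippets, here is a minimal C++ sketch. The function names `copy_then_addr` and `copy_as_int` are hypothetical, the comments show only a rough correspondence to the IR above (the exact IR clang emits depends on version and flags), and the sketch assumes a 64-bit target where a pointer is 8 bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical counterpart of the first snippet: copy a pointer object with
// memcpy, reload it as a pointer, then convert it to an integer.
std::uintptr_t copy_then_addr(char **dst, char *const *src) {
    std::memcpy(dst, src, sizeof(char *));          // ~ call @llvm.memcpy(..., i32 8, ...)
    char *load = *dst;                              // ~ %load = load i8*, i8** %dst
    return reinterpret_cast<std::uintptr_t>(load);  // ~ %addr = ptrtoint i8* %load to i64
}

// Hypothetical counterpart of the second snippet: the same 8 bytes travel
// through a 64-bit integer, so no pointer-typed load or ptrtoint appears.
std::uint64_t copy_as_int(char **dst, char *const *src) {
    static_assert(sizeof(char *) == sizeof(std::uint64_t), "assumes a 64-bit target");
    std::uint64_t bits;
    std::memcpy(&bits, src, sizeof bits);  // ~ %addr = load i64, i64* %src64
    std::memcpy(dst, &bits, sizeof bits);  // ~ store i64 %addr, i64* %dst64
    return bits;                           // ~ ret i64 %addr
}
```

Both functions return the same value at run time, which is the point made above about the end program behaving the same. What differs is that the second form contains no pointer-to-integer conversion, so the fact that a pointer's address leaves the function through the returned integer is no longer visible as a ptrtoint.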
George Mitenkov via llvm-dev
2021-Jun-07 12:25 UTC
[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM
The purpose of an example is to make an assumption about what IR we have, and to show that it becomes wrong after optimization. I am not sure I get your comment here.

> Same here, by just looking at the IR code here, I don't think you can
> really be sure what the type of the thing being copied is.

That is exactly the point. We do not know what type we are copying: it may be an integer, or it may be a pointer. Importantly, we can see that the ptrtoint instruction disappeared, and the optimized code, which returns the integer, can actually let the pointer escape. Using a byte type instead of i64 removes implicit pointer casts and therefore helps AA catch this case.

> One can do bitcasts, etc., to obscure the actual type of the bytes being
> copied.
> In both those examples, 8 bytes are copied, and the same value is
> returned. So the end program will function the same when run.
> Essentially, there is not enough information in the above code to
> determine if the 8 bytes copied are part of a pointer or not.
> For AA analysis, I would say, more information is needed.

This is just an example of a wrong optimization. It matters because bugs like this arise partly from this exact optimization: https://bugs.llvm.org/show_bug.cgi?id=37469

The memcpy is replaced with load/store pairs, and store forwarding happens incorrectly. I haven't shown it in full, so there may be a bit of confusion. I am happy to elaborate more if this is still unclear!

Thanks,
George
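The "we do not know what type we are copying" point can also be seen at the source level. The C++17 sketch below (the helper `copy_raw` is hypothetical) round-trips either an integer or a pointer through a `std::byte` buffer; nothing in the copy itself records which kind of value the bytes came from. `std::byte` is the C++ type intended for raw-memory access, which is the semantics the RFC proposes to model with a dedicated byte type in the IR rather than an integer type such as i64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy a trivially copyable object through a raw
// std::byte buffer and back. The buffer's element type says nothing about
// whether the bytes originally formed an integer or a pointer.
template <typename T>
T copy_raw(const T &value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise copying requires a trivially copyable type");
    std::byte buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));  // object -> raw bytes
    T out{};
    std::memcpy(&out, buf, sizeof(T));    // raw bytes -> object of the same type
    return out;
}
```

The same routine serves `copy_raw(std::int64_t{42})` and `copy_raw(&x)`; only the caller knows the type. In IR, such a copy currently ends up as integer loads and stores once the memcpy is expanded, which is where the implicit pointer-to-integer reinterpretation discussed in this thread comes from.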