Han Zhu via llvm-dev
2021-Jul-21 20:43 UTC
[llvm-dev] [cfe-dev] How to tell if a class contains tail padding?
After clang codegen, is the class layout final? Does the optimizer modify the class layout?

On Wed, Jul 21, 2021 at 12:31 PM Eli Friedman <efriedma at quicinc.com> wrote:

> I suspect the transform you’re trying to do is more complicated than
> you’re making it out to be.
>
> In general, if you have a class that isn’t “POD for the purpose of layout”
> (https://itanium-cxx-abi.github.io/cxx-abi/abi.html), derived classes can
> store data in the tail padding. So the “padding” might contain data the
> program cares about. If you want to overwrite that space, you need to
> prove there isn’t a derived class storing data there.
>
> Possible proof approaches:
>
> 1. If the class is marked “final”, there aren’t any derived classes.
> 2. Array indexing with the wrong pointer type might be illegal.
>
> -Eli
>
> *From:* cfe-dev <cfe-dev-bounces at lists.llvm.org> *On Behalf Of* Han Zhu via cfe-dev
> *Sent:* Wednesday, July 21, 2021 11:46 AM
> *To:* cfe-dev at lists.llvm.org; llvm-dev at lists.llvm.org
> *Subject:* [EXT] [cfe-dev] How to tell if a class contains tail padding?
>
> Hi,
>
> I'm working on an optimization to improve LoopIdiomRecognize pass. For a
> trivial loop like this:
>
> ```
> struct S {
>   int a;
>   int b;
>   char c;
>   // 3 bytes padding
> };
>
> unsigned copy_noalias(S* __restrict__ a, S* b, int n) {
>   for (int i = 0; i < n; i++) {
>     a[i] = b[i];
>   }
>   return sizeof(a[0]);
> }
> ```
>
> Clang generates the below loop (some parts of IR omitted):
> ```
> %struct.S = type { i32, i32, i8 }
>
> for.body:                                         ; preds = %for.cond
>   %2 = load %struct.S*, %struct.S** %b.addr, align 8
>   %3 = load i32, i32* %i, align 4
>   %idxprom = sext i32 %3 to i64
>   %arrayidx = getelementptr inbounds %struct.S, %struct.S* %2, i64 %idxprom
>   %4 = load %struct.S*, %struct.S** %a.addr, align 8
>   %5 = load i32, i32* %i, align 4
>   %idxprom1 = sext i32 %5 to i64
>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %4, i64 %idxprom1
>   %6 = bitcast %struct.S* %arrayidx2 to i8*
>   %7 = bitcast %struct.S* %arrayidx to i8*
>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %6, i8* align 4 %7, i64 12, i1 false)
>   br label %for.inc
> ```
>
> It can be transformed into a single memcpy:
>
> ```
> for.body.preheader:                               ; preds = %entry
>   %b10 = bitcast %struct.S* %b to i8*
>   %a9 = bitcast %struct.S* %a to i8*
>   %0 = zext i32 %n to i64
>   %1 = mul nuw nsw i64 %0, 12
>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a9, i8* align 4 %b10, i64 %1, i1 false)
>   br label %for.cond.cleanup
> ```
>
> The problem is, if the copied elements are a class, this doesn't work. For a
> class with the same members:
> ```
> %class.C = type <{ i32, i32, i8, [3 x i8] }>
> ```
>
> Clang does some optimization to generate a memcpy of nine bytes, omitting the
> tail padding:
>
> ```
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %6, i8* align 4 %7, i64 9, i1 false)
> ```
>
> Then in LLVM, we find the memcpy is not touching every byte of the array, so
> we abort the transformation.
>
> If we could tell the untouched three bytes are padding, we should be able to
> still do the optimization, but LLVM doesn't seem to have this information. I
> tried using `DataLayout::getTypeStoreSize()`, and it returned 12 bytes. I also
> tried `StructLayout`, and it treats the tail padding as a regular class
> member.
>
> Is there an API in LLVM to tell if a class has tail padding? If not, would it
> be useful to add this feature?
>
> Thanks,
> Han
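The size arithmetic in the quoted question can be modeled in a few lines of C++. This is only a sketch of the reasoning, not LLVM code: `PerIterationCopy`, `covers_every_byte`, `untouched_bytes_per_element`, and `merged_length` are hypothetical names introduced for illustration, and the real LoopIdiomRecognize check may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one loop iteration's copy; not an LLVM type.
struct PerIterationCopy {
    std::uint64_t stride_bytes;  // distance between consecutive array elements
    std::uint64_t copied_bytes;  // length of the memcpy emitted per iteration
};

// True when each iteration writes every byte of its element, so n iterations
// write one contiguous region with no gaps.
constexpr bool covers_every_byte(PerIterationCopy c) {
    return c.copied_bytes == c.stride_bytes;
}

// Bytes of each element that the per-iteration copy leaves untouched.
constexpr std::uint64_t untouched_bytes_per_element(PerIterationCopy c) {
    return c.stride_bytes - c.copied_bytes;
}

// Length of the single memcpy that would replace n iterations.
constexpr std::uint64_t merged_length(PerIterationCopy c, std::uint64_t n) {
    return c.stride_bytes * n;
}
```

With the numbers from the question, the struct case is `{12, 12}` and passes the check, while the class case is `{12, 9}` and leaves 3 bytes per element untouched. Without extra information, a merged memcpy of `12 * n` bytes would write those 3 bytes in every element, which is only safe if they are known to be padding that nothing else uses.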
Richard Smith via llvm-dev
2021-Jul-21 21:27 UTC
[llvm-dev] [cfe-dev] How to tell if a class contains tail padding?
On Wed, 21 Jul 2021 at 13:43, Han Zhu via cfe-dev <cfe-dev at lists.llvm.org> wrote:

> After clang codegen, is the class layout final? Does the optimizer modify
> the class layout?

LLVM IR does not contain enough information to determine whether the optimization is valid. The IR that we generate for:

  struct S {
    ~S() {}
    int a;
    int b;
    char c;
    // 3 bytes padding
  };

  unsigned copy_noalias(S* __restrict__ a, S* b, int n) {
    for (int i = 0; i < n; i++) {
      a[i] = b[i];
    }
    return sizeof(a[0]);
  }

... would also be correct IR to generate for:

  struct T : S {
    char x, y, z; // stored in S's tail padding
  };

  unsigned copy_noalias2(T* __restrict__ a, T* b, int n) {
    for (int i = 0; i < n; i++) {
      (S&)a[i] = (S&)b[i];
    }
    return sizeof((S&)a[0]);
  }

You'll need to generate some additional information from the frontend if you want to be able to do this. You could, in at least some cases, analyze the types and expressions involved and locally prove that you know the last few bytes are guaranteed to be padding, then generate !tbaa.struct metadata and attach it to the @llvm.memcpy call that the frontend emits.
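The tail-padding reuse described in this reply can be observed directly from C++. The sketch below reuses the `S` and `T` definitions from the message; the helpers `offset_within`, `offset_of_x`, and `derived_member_in_base_padding` are hypothetical names added for illustration. The layout is ABI-dependent: the sketch assumes a target that follows the Itanium C++ ABI with a 4-byte `int` (for example x86-64 Linux with GCC or Clang), and other ABIs may lay these types out differently.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    ~S() {}        // user-provided destructor: S is not "POD for the purpose of layout"
    int a;
    int b;
    char c;
    // tail padding follows c on typical targets
};

struct T : S {
    char x, y, z;  // may be placed in S's tail padding
};

// Byte offset of `member` from the start of `object` (hypothetical helper).
template <class Object, class Member>
std::ptrdiff_t offset_within(const Object& object, const Member& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&object);
}

// Offset of T::x from the start of a T object.
inline std::ptrdiff_t offset_of_x() {
    T t{};
    return offset_within(t, t.x);
}

// True when T::x lives inside the first sizeof(S) bytes of a T, that is,
// inside what looks like padding when only S is considered.
inline bool derived_member_in_base_padding() {
    return offset_of_x() < static_cast<std::ptrdiff_t>(sizeof(S));
}
```

Under the stated assumptions, `sizeof(S)` and `sizeof(T)` should both be 12 and `x` should sit at offset 9, so a 12-byte copy of the `S` subobject would overwrite `x`, `y`, and `z`. That is why the 9-byte memcpy in the IR cannot be widened without additional information from the frontend.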