Hi,

I've been wondering why LLVM's GEP instructions are based on types, rather than encoding the raw address calculation as a base pointer plus some scaled offsets (still in the form of a GEP, to retain provenance).

The type information does not seem particularly useful (it shouldn't be used as a basis for optimization, because struct layouts lie), but it enlarges the non-canonical IR space (there are many ways to encode the same GEP) and increases compile time (optimizations constantly need to decompose GEPs, e.g. to get constant offsets).

What am I missing here?

Regards,
Nikita
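To make the question above concrete, here is a minimal C sketch (not LLVM IR, and not taken from the LLVM sources) of one address computed three ways: through the types in one step, through the types in two steps, and as a constant byte offset plus scaled indices. `struct Elem` and all function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type, used only for illustration. */
struct Elem {
    char tag;
    int  vals[3];
};

/* Type-based form: index through the struct and array types in one step. */
int *addr_typed(struct Elem *base, ptrdiff_t i, ptrdiff_t j) {
    return &base[i].vals[j];
}

/* Same address, spelled as two chained type-based steps. */
int *addr_typed_alt(struct Elem *base, ptrdiff_t i, ptrdiff_t j) {
    int *first = &base[i].vals[0];
    return first + j;
}

/* Byte form: one constant offset plus scaled variable indices. */
int *addr_bytes(struct Elem *base, ptrdiff_t i, ptrdiff_t j) {
    assert(base != NULL);
    char *p = (char *)base;
    p += i * (ptrdiff_t)sizeof(struct Elem);  /* scaled index */
    p += offsetof(struct Elem, vals);         /* constant part */
    p += j * (ptrdiff_t)sizeof(int);          /* scaled index */
    return (int *)p;
}

/* Returns 1 when all three spellings agree for in-bounds indices. */
int all_forms_agree(void) {
    struct Elem arr[4];
    for (ptrdiff_t i = 0; i < 4; ++i) {
        for (ptrdiff_t j = 0; j < 3; ++j) {
            int *t = addr_typed(arr, i, j);
            if (t != addr_typed_alt(arr, i, j) || t != addr_bytes(arr, i, j))
                return 0;
        }
    }
    return 1;
}
```

The two typed spellings correspond to the "many ways to encode the same GEP" point, and `addr_bytes` is the decomposed form that an optimization would otherwise have to recover from the types.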
Hi,

Although I'm not an expert on the topic, there are at least two reasons:

1) It looks more like C/C++ than computing offsets. This goes hand in hand with the fact that GEP abstracts away target-specific information. For example, a pointer is 4 bytes on a typical 32-bit system but 8 bytes on a 64-bit system. If you have a struct like:

    struct { int *p; int v; };

then to get `v` with a GEP you just say "give me the second member". If you were to encode this with offsets, you would need to know the target, a dependency that front-ends are generally better off without (Clang and other front-ends actually do have it, and that's another big discussion).

2) It's very important for alias analysis. Again, I'm not an expert on that, but see e.g. the first rule on when a pointer is based on another pointer here: https://llvm.org/docs/LangRef.html#pointeraliasing

Best regards,
Stefanos

On Mon, 13 Jul 2020 at 11:08 PM, Nikita Popov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I've been wondering why LLVM's GEP instructions are based on types, rather
> than encoding the raw address calculation as a base pointer plus some
> scaled offsets (still in the form of a GEP, to retain provenance).
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
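The target dependence described in point 1 can be sketched in C. The struct is the one from the message, given the hypothetical tag `PV` so it can be named; the function names are likewise made up for illustration. The byte offset of `v` comes from the compiler's layout for the current target, which is the knowledge a front-end would need if it had to emit raw offsets itself.

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the message, with a tag added so it can be referenced. */
struct PV {
    int *p;
    int  v;
};

/* Byte offset of v: depends on the pointer size and alignment rules of the
 * target, typically 4 on 32-bit targets and 8 on 64-bit targets. */
size_t offset_of_v(void) {
    return offsetof(struct PV, v);
}

/* Type-based access: "give me the second member". */
int *addr_of_v_typed(struct PV *s) {
    assert(s != NULL);
    return &s->v;
}

/* Offset-based access: requires knowing the layout for this target. */
int *addr_of_v_bytes(struct PV *s) {
    assert(s != NULL);
    return (int *)((char *)s + offsetof(struct PV, v));
}

/* Returns 1 when both spellings yield the same address. */
int v_addresses_agree(void) {
    struct PV s = { NULL, 0 };
    return addr_of_v_typed(&s) == addr_of_v_bytes(&s);
}
```

On mainstream ABIs `offset_of_v()` equals `sizeof(int *)`, since `v` directly follows `p`, but that is an assumption about the target rather than something the C language guarantees.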
You are right that it’s mostly a convenience for the front-ends, so that they don’t have to deal with boring things like padding and sizes.

Otherwise it adds no semantic value. Object aliasing is not field-sensitive in LLVM, so it doesn’t matter, though someone may want to add support for that in the future for languages where it’s OK to do so.

FWIW, Alive2’s GEP instruction works over bytes only (pairs of constant * %reg), though I’m not sure I would advocate changing LLVM’s representation.

Nuno

From: Nikita Popov
Sent: 13 July 2020 21:08
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Why are GEPs type based?
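As a rough illustration of the bytes-only form mentioned above (a base pointer plus pairs of constant * %reg), here is a small C sketch. It is not Alive2's actual code or data structure; `struct ScaledIndex`, `byte_gep`, and the `int[3][5]` example are hypothetical names and choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of one "constant * %reg" term. */
struct ScaledIndex {
    ptrdiff_t scale;  /* constant known at compile time, in bytes */
    ptrdiff_t index;  /* value known only at run time */
};

/* Computes base + sum(scale_k * index_k), entirely in bytes. */
char *byte_gep(char *base, const struct ScaledIndex *terms, size_t n) {
    assert(base != NULL);
    assert(terms != NULL || n == 0);
    ptrdiff_t offset = 0;
    for (size_t k = 0; k < n; ++k)
        offset += terms[k].scale * terms[k].index;
    return base + offset;
}

/* Returns 1 if the byte form matches typed indexing into an int[3][5]. */
int byte_gep_matches_typed(void) {
    int grid[3][5];
    for (ptrdiff_t i = 0; i < 3; ++i) {
        for (ptrdiff_t j = 0; j < 5; ++j) {
            struct ScaledIndex terms[2] = {
                { (ptrdiff_t)sizeof(grid[0]), i },  /* row stride */
                { (ptrdiff_t)sizeof(int),     j },  /* element stride */
            };
            if (byte_gep((char *)grid, terms, 2) != (char *)&grid[i][j])
                return 0;
        }
    }
    return 1;
}
```

In this form the type plays no role once the scales are fixed: each index is only a multiplier for a byte constant, which is what makes the representation canonical.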
Good to know, thanks for the info.

- Stefanos

On Tue, Jul 14, 2020, 00:35 Nuno Lopes via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> You are right that it’s mostly a convenience for the front-ends. So they
> don’t have to deal with boring things like padding and sizing things.