Jie Zhou via llvm-dev
2020-Jan-07 02:45 UTC
[llvm-dev] Best way of implement a fat pointer for C
Dear All,

I’m working on a project that extends C. I’m adding a new kind of pointer, a fat pointer, which carries metadata about the pointed-to object in addition to the object's starting address. Currently I implement this pointer as an llvm::StructType: in the type generation function llvm::Type *CodeGenTypes::ConvertType(QualType T), in the case for clang::Type::Pointer, I create an llvm::StructType for the new pointer type instead of an llvm::PointerType. I also added some helper code to llvm::StructType, and in multiple places I added code to trick the compiler into believing that a struct is sometimes actually a pointer. So far test programs compile fine with -O0, but I get lots of assertion failures when compiling with -O1 or -O2, mostly because of type mismatches.

LLVM assumes that a PointerType is essentially an integer (32 or 64 bits depending on the architecture), and since this is quite a fundamental assumption, I have started to question whether my way of implementing the fat pointer is feasible. I thought about adding a new LLVM type that inherits from both llvm::PointerType and llvm::StructType, but I’m not sure this is the right path. It looks like it would demand substantial changes to the compiler, including adding code for bitcode generation.

Can you give me some advice on how to implement a fat pointer in LLVM?

Thanks,
- Jie
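For readers following the thread, here is a minimal sketch in plain C of the kind of fat pointer described above: a starting address plus metadata about the pointed-to object. The type name fat_ptr, the choice of a byte size as the only metadata, and the helper functions are all assumptions made for illustration; the original message does not say what metadata the project stores or how accesses are checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: the object's starting address plus metadata
   (here only the object's size in bytes). All names are illustrative. */
typedef struct {
    char  *base;  /* starting address of the pointed-to object */
    size_t size;  /* metadata: size of the object in bytes */
} fat_ptr;

/* Wrap a raw object in a fat pointer. */
fat_ptr fat_make(void *obj, size_t size) {
    fat_ptr p = { (char *)obj, size };
    return p;
}

/* True if the byte range [offset, offset + len) lies inside the object.
   Written so that the arithmetic cannot overflow. */
bool fat_in_bounds(fat_ptr p, size_t offset, size_t len) {
    return offset <= p.size && len <= p.size - offset;
}

/* Checked byte load: stores the byte in *out and returns true on success,
   or returns false without reading memory when out of bounds. */
bool fat_load_byte(fat_ptr p, size_t offset, char *out) {
    if (!fat_in_bounds(p, offset, 1)) {
        return false;
    }
    *out = p.base[offset];
    return true;
}
```

With a four-byte buffer wrapped by fat_make(buf, sizeof buf), a load at offset 3 succeeds and a load at offset 4 is rejected.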
Jacob Lifshay via llvm-dev
2020-Jan-07 05:20 UTC
[llvm-dev] Best way of implement a fat pointer for C
On Mon, Jan 6, 2020, 18:45 Jie Zhou via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Dear All,
> <snip>
> Can you give me some advice on how to implement a fat pointer in llvm?

Rustc currently implements fat pointers in function arguments by passing a pair of arguments: the pointer to the object's data, and the associated data, which is either the length as a usize (C's uintptr_t) or a pointer to the vtable. Fat pointers in return values are passed as a two-member struct where the first member is the pointer to the object's data and the second is the length or vtable pointer.

See https://rust.godbolt.org/z/cjoRNG for the (definitely non-idiomatic) Rust source code as well as the LLVM IR.

Btw, you should definitely check out Rust if you haven't already, at https://www.rust-lang.org/

Jacob Lifshay
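The convention described above can be mimicked by hand in C, which may help when deciding how a front end should lower fat pointers at function boundaries. The sketch below is only an analogy: int_slice, sum_ints and sub_slice are hypothetical names, and it shows the shape of the signatures (two scalar parameters in argument position, a two-member struct in return position), not what rustc actually emits.

```c
#include <stddef.h>

/* Hypothetical C analogue of a slice-style fat pointer. */
typedef struct {
    const int *data;  /* pointer to the object's data */
    size_t     len;   /* associated data: element count */
} int_slice;

/* Argument position: the fat pointer is split into two scalar parameters. */
long sum_ints(const int *data, size_t len) {
    long total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += data[i];
    }
    return total;
}

/* Return position: the fat pointer comes back as a two-member struct.
   Returns the sub-range [start, start + count), clamped to the input. */
int_slice sub_slice(const int *data, size_t len, size_t start, size_t count) {
    if (start > len) {
        start = len;
    }
    if (count > len - start) {
        count = len - start;
    }
    int_slice s = { data + start, count };
    return s;
}
```

A caller unpacks the returned struct back into the two-argument form, for example sum_ints(s.data, s.len).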
David Chisnall via llvm-dev
2020-Jan-07 11:51 UTC
[llvm-dev] Best way of implement a fat pointer for C
Hi,

For CHERI, we use pointers in address space 200 to represent memory capabilities (which are a kind of fat pointer). These are able to pass through the LLVM pipeline, and we then lower them to special instructions in the various targets that understand that pointers are a hardware-enforced type. It would be possible to add a late pass that then expanded these into a StructType containing the address and whatever metadata you wanted, and expanded loads and stores to use the address component. Note that if you want to hoist checks out of loops, you will want to do this expansion somewhere in the middle of your pass pipeline.

The tricky part here is in function parameters: you cannot easily change the type of a function after it has been created. Your best bet here is to always pass fat pointers as the structure representation.

LLVM does not assume that pointers are integers - we have done a lot of work to remove that assumption, and the IR has always made the two types distinct. There are still a few rough areas, but these are bugs. We are able to compile nontrivial codebases (e.g. FreeBSD, WebKit) with optimisations enabled for targets where pointers and integers are distinct types at the hardware level.

David

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
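To illustrate why the point about hoisting checks out of loops matters, here is a source-level sketch in C of the difference between checking every access and checking once before the loop. It is an analogy only: the real transformation would happen on LLVM IR inside the pass pipeline, and fat_int_ptr and both functions are hypothetical names, not part of CHERI or LLVM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expanded fat pointer: address component plus element count. */
typedef struct {
    int   *base;
    size_t count;
} fat_int_ptr;

/* Naive expansion: every load is guarded by its own bounds check. */
bool sum_checked_each(fat_int_ptr p, size_t n, long *out) {
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i >= p.count) {
            return false;  /* out-of-bounds access detected */
        }
        total += p.base[i];
    }
    *out = total;
    return true;
}

/* Hoisted form: one check before the loop covers all n accesses, and the
   loop body uses only the address component. */
bool sum_checked_once(fat_int_ptr p, size_t n, long *out) {
    if (n > p.count) {
        return false;
    }
    const int *raw = p.base;
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += raw[i];
    }
    *out = total;
    return true;
}
```

Both functions accept and reject the same inputs; the second simply performs the bounds comparison once. An optimiser can only derive the second form from the first while the checks are still visible as ordinary IR, which is one reading of the advice to expand in the middle of the pipeline.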
Jie Zhou via llvm-dev
2020-Jan-07 16:00 UTC
[llvm-dev] Best way of implement a fat pointer for C
> On Jan 7, 2020, at 06:51, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> For CHERI, we use pointers in address space 200 to represent memory capabilities (which are a kind of fat pointer). These are able to pass through the LLVM pipeline, and we then lower them to special instructions in the various targets that understand that pointers are a hardware-enforced type. It would be possible to add a late pass that then expanded these into a StructType containing the address and whatever metadata you wanted, and expanded loads and stores to use the address component. Note that if you want to hoist checks out of loops, you will want to do this expansion somewhere in the middle of your pass pipeline.
>
> The tricky part here is in function parameters: you cannot easily change the type of a function after it has been created. Your best bet here is to always pass fat pointers as the structure representation.

Hi David,

Yes, passing pointers through function parameters and return values is tricky; I got yelled at by the compiler when I tried to use llvm::Value::mutateType() to change the prototype of a function. :-)

> LLVM does not assume that pointers are integers - we have done a lot of work to remove that assumption, and the IR has always made the two types distinct. There are still a few rough areas, but these are bugs. We are able to compile nontrivial codebases (e.g. FreeBSD, WebKit) with optimisations enabled for targets where pointers and integers are distinct types at the hardware level.

You’re correct that LLVM does not assume that pointers are integers. I think what I was really trying to say in my previous email is that LLVM currently implements PointerType as an integer, and this gives me trouble in tons of places, such as the memory layout of a pointer. In your experience, do you think it’s feasible to create a new LLVM type that inherits from both llvm::PointerType and llvm::StructType, and to modify the bitcode generator to support this new type?

Thanks,
- Jie
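The memory-layout concern raised above can be made concrete with a small C sketch that compares a plain pointer with a struct-shaped fat pointer. The struct fat_ptr_layout and the three helper functions are hypothetical and assume a single size_t of metadata; the point is only that any code which sizes or aligns a slot as if it held one machine pointer will be wrong for the fat representation.

```c
#include <stddef.h>

/* Hypothetical fat pointer layout: address first, then one metadata word. */
typedef struct {
    void  *base;
    size_t size;
} fat_ptr_layout;

/* Storage needed for a plain pointer. */
size_t raw_ptr_size(void) {
    return sizeof(void *);
}

/* Storage needed for the fat pointer, including any padding. */
size_t fat_ptr_size(void) {
    return sizeof(fat_ptr_layout);
}

/* Byte offset of the metadata within the fat pointer. */
size_t fat_metadata_offset(void) {
    return offsetof(fat_ptr_layout, size);
}
```

On a typical 64-bit target one would expect 8 bytes for the plain pointer and 16 for the fat pointer, but the exact values are ABI-dependent, so the sketch reports them instead of hard-coding them.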