Tim Northover via llvm-dev
2020-Jan-09 10:08 UTC
[llvm-dev] Position independent code writes absolute pointer
Hi Gaier,

There's no way to do this automatically in LLVM at the moment. It sounds kind of related to pointer compression techniques (also not supported right now).

On Thu, 9 Jan 2020 at 08:14, Gaier, Bjoern via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Could it be possible to modify the code on the IR-Level to store PIC/offset address and not absolute address? I’m not familiar with the LLVM IR so I don’t know what is possible and how it affects the code at all.

It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major issues:

1. It's an ABI break, so you have to be able to recompile all code, including any system libraries you make use of.
2. LLVM can only convert the pointers it knows about, so it would still be broken by someone storing a pointer via an intptr_t cast and probably other things I haven't thought of.
3. There probably isn't even a relocation for any statically initialized pointers. You might be able to convert all of them to use a dynamic module initializer instead though.
4. I'd expect debugging to go horribly wrong.

Cheers.

Tim.
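To make the "relative to some fixed global" idea concrete, here is a minimal C sketch. It is not what LLVM would emit and not code from this thread: the `Image` struct, `rel_encode`, `rel_decode` and `relocate_and_read` are hypothetical names, and the struct's own start address stands in for the fixed anchor. Copying the struct with memcpy simulates the same bytes being seen at a different address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an image that two processes see at different
   addresses. Its start address plays the role of the fixed anchor. */
typedef struct {
    uintptr_t link;   /* pointer to `value`, stored as an offset from the anchor */
    int       value;  /* the pointee */
} Image;

/* Store side: turn an absolute pointer into an offset from the anchor. */
static uintptr_t rel_encode(const void *anchor, const void *p) {
    return (uintptr_t)p - (uintptr_t)anchor;
}

/* Load side: rebuild an absolute pointer that is valid for this anchor. */
static void *rel_decode(void *anchor, uintptr_t off) {
    return (char *)anchor + off;
}

/* Encode in one copy of the image, decode in another, read the pointee. */
int relocate_and_read(void) {
    Image a, b;
    a.value = 42;
    a.link = rel_encode(&a, &a.value);

    memcpy(&b, &a, sizeof a);  /* same bytes, different address */
    b.value = 7;               /* make the two copies distinguishable */

    int *p = rel_decode(&b, b.link);
    assert(p == &b.value);     /* the stored offset survived the move */
    return *p;
}
```

Because the stored value does not depend on where it is stored, memcpy of the field stays valid, which is the property the "relative to the address it's being stored to" variant would lose. An absolute pointer copied the same way would still point into `a`.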
Gaier, Bjoern via llvm-dev
2020-Jan-09 10:34 UTC
[llvm-dev] Position independent code writes absolute pointer
Hey Tim,

Thank you for the answer! I expected something like that sadly :<
However...

> It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major

This sounds interesting from a learning perspective, because I have never done something like that. Is this difficult to do? Also why only convert the stores? Shouldn't I also convert the reads so they are also valid?

Kind greetings
Björn
Tim Northover via llvm-dev
2020-Jan-09 11:27 UTC
[llvm-dev] Position independent code writes absolute pointer
Hi Bjoern,

On Thu, 9 Jan 2020 at 10:34, Gaier, Bjoern <Bjoern.Gaier at horiba.com> wrote:
> > It depends how much control you have over the code. You could instrument code so that it converted all stores of pointers to be relative to some fixed global (PC-relative doesn't work there because it will be loaded at a different address, and "relative to the address it's being stored to" would break memcpy). But that has some major
>
> This sounds interesting from a learning perspective, because I never have done something like that. Is this difficult to do? Also why only convert the stores? Shouldn't I also convert the reads so they are also valid?

Sorry, I meant to say you'd have to undo the transformation on the loads (and atomicrmw, cmpxchg) too.

I think getting something that sometimes works would actually be quite easy. You'd want to make it a ModulePass to handle the globals, then you'd iterate through each function, turning a store like:

    store %type* %val, %type** %ptr

into:

    %val.int = ptrtoint %type* %val to i64
    %val.int.new = sub i64 %val.int, ptrtoint(i8* @__GLOBAL_ANCHOR to i64)
    %val.new = inttoptr i64 %val.int.new to %type*
    store %type* %val.new, %type** %ptr

The corresponding load side would add back @__GLOBAL_ANCHOR.

At the Module level you'd add some kind of tentative definition for @__GLOBAL_ANCHOR so it can be merged if needed, and convert a definition like:

    @var = global i8* @other_global

into:

    @var = global i8* null
    define void @__MODULE_INIT() {
      ; Duplicate store code above to put a relative value for @other_global into @var
      ret void
    }
    %0 = type { i32, void ()*, i8* }
    @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @__MODULE_INIT, i8* null }]

Unfortunately I've also thought of a couple more nasty problems while writing this out:

1. Things like target-specific vector intrinsics that do loads and stores might obscure the fact that they're storing a pointer by casting it to an i64 or something.
2. You'd have to make sure the stack for both programs was in the shared region or no-one ever used a pointer to a local variable.

Cheers.

Tim.
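For readers more comfortable in C than in IR, the module-level rewrite described above can be approximated by hand. This is only a sketch under assumptions: `global_anchor`, `store_ptr`, `load_ptr`, `module_init` and `read_through_var` are hypothetical names, `__attribute__((constructor))` is a GCC/Clang extension used here as a rough analogue of an `@llvm.global_ctors` entry, and the integer/pointer round trip relies on the usual flat-address behaviour of mainstream targets.

```c
#include <assert.h>
#include <stdint.h>

static char global_anchor;        /* stands in for @__GLOBAL_ANCHOR */
static char other_global = 'x';   /* stands in for @other_global */
static char *var;                 /* was `char *var = &other_global;`, now starts null */

/* Store side: subtract the anchor before writing the pointer to memory. */
static void store_ptr(char **slot, char *val) {
    uintptr_t rel = (uintptr_t)val - (uintptr_t)&global_anchor;
    *slot = (char *)rel;
}

/* Load side: add the anchor back to recover a usable pointer. */
static char *load_ptr(char **slot) {
    uintptr_t rel = (uintptr_t)*slot;
    return (char *)(rel + (uintptr_t)&global_anchor);
}

/* Replaces the static initializer: runs before main, like a global ctor. */
__attribute__((constructor))
static void module_init(void) {
    store_ptr(&var, &other_global);
    assert(load_ptr(&var) == &other_global);  /* round trip must be lossless */
}

/* Every read of `var` has to go through the load-side conversion. */
char read_through_var(void) {
    return *load_ptr(&var);
}
```

Note that `var` itself no longer holds a dereferenceable address, so any code that reads it without `load_ptr` (for example through an integer cast the pass could not see) would break, which is the failure mode listed among the issues earlier in the thread.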