Rafael Espíndola via llvm-dev
2015-Aug-26 15:50 UTC
[llvm-dev] Proposal: arbitrary relocations in constant global initializers
Now with the correct list.

On 26 August 2015 at 11:49, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> This is pr10368.
>
> Do we really need to support hard coded relocation numbers? Looks like
> the examples above have a representation as constant expressions:
>
> (sub (add (ptrtoint @foo) 0xeafffffe) cur_pos)
>
> no?
>
> Why do you need to be able to avoid them showing up in function
> bodies? It would be unusual but valid to pass the above value as an
> argument to a function.
>
> Cheers,
> Rafael
>
> On 29 July 2015 at 15:44, Peter Collingbourne <peter at pcc.me.uk> wrote:
>> Hi,
>>
>> I’d like to make this proposal for extending the Constant hierarchy with
>> a mechanism for introducing custom relocations in global initializers. This
>> could also be seen as a first step towards adding a “bag-of-bytes with
>> relocations” representation for global initializers.
>>
>> Problem
>>
>> In order to implement control flow integrity for indirect function calls, we
>> would like to add a set of constructs to the IR that ultimately allow for a
>> jump table similar to that described for IFCC in [1] to be expressed. Ideally
>> the additions should be minimal and general-purpose enough to allow them to
>> be used for other purposes.
>>
>> IFCC, the previous attempt to teach LLVM to emit jump tables, was removed
>> for complicating how functions are emitted, in particular requiring a
>> subtarget-specific instruction emitter available in subtarget-independent
>> code. However, the form of a jump table entry is generally well known to
>> whichever component of the compiler is creating the jump table (for example, it
>> needs to know the size of each entry, and therefore the specific instructions
>> used), and we can therefore simplify things greatly by not considering jump
>> tables as consisting of instructions, but rather known strings of bytes in
>> the .text section with a relocation pointing to the function address.
>> For example, on x86:
>>
>> $ cat tc.ll
>> declare void @foo()
>>
>> define void @bar() {
>>   tail call void @foo()
>>   ret void
>> }
>> $ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
>> <stdin>: file format ELF64-x86-64
>>
>> Disassembly of section .text:
>> bar:
>>        0: e9 00 00 00 00 jmp 0 <bar+5>
>>         0000000000000001: R_X86_64_PC32 foo-4-P
>>
>> Or on ARM:
>>
>> $ ~/src/llvm-build-rel/bin/llc -filetype=obj -o - tc.ll -O3 -mtriple=armv7-unknown-linux |~/src/llvm-build-rel/bin/llvm-objdump -d -r -
>>
>> <stdin>: file format ELF32-arm-little
>>
>> Disassembly of section .text:
>> bar:
>>        0: fe ff ff ea b #-8 <bar>
>>         00000000: R_ARM_JUMP24 foo
>>
>> How can we represent such jump table entries in IR? One way that almost
>> works on x86 is to attach a constant to a function using either prefix data
>> or prologue data, or to place a GlobalVariable in the .text section using
>> the section attribute. The constant would use ConstantExpr arithmetic to
>> produce the required PC32 relocation:
>>
>> define void @bar() prefix <{ i8, i32, i8, i8, i8 }> <{ i8 -23, i32 trunc (i64 add (i64 sub (i64 ptrtoint (void ()* @foo to i64), i64 ptrtoint (void ()* @bar to i64)), i64 3) to i32), i8 -52, i8 -52, i8 -52 }> {
>>   ...
>> }
>>
>> However, this is awkward, and can’t be used to represent an ARM jump table
>> entry. (It also isn’t quite right; PC32 can trigger the creation of a
>> PLT entry, which doesn’t entirely match what the ConstantExpr arithmetic
>> is doing.)
>>
>> Design
>>
>> A relocation can be seen as having three inputs: the relocation type (on
>> Mach-O this also includes a pcrel flag), the target, and the addend. So
>> let’s define a relocation constant like this:
>>
>> iNN reloc relocation_type (ptr target, iNN addend)
>>
>> where iNN is some integer type, and ptr is some pointer type.
>> For example, an ARM jump table entry might look like this:
>>
>> i32 reloc 0x1d (void ()* @foo, i32 0xeafffffe) ; R_ARM_JUMP24 = 0x1d
>>
>> There is no error checking for this; if you use the wrong integer type for
>> a particular relocation, things will break and you get to keep both pieces.
>>
>> At the asm level, we would add a single directive, ".reloc", whose syntax
>> would look like this when targeting ELF and COFF:
>>
>> .reloc size relocation_type target addend
>>
>> or this when targeting Mach-O:
>>
>> .reloc size relocation_type pcrel target addend
>>
>> The code generator would emit this directive when emitting a reloc in a
>> constant initializer. (Note that this means that reloc constants would only
>> be supported with the integrated assembler.)
>>
>> For example, the ARM JUMP24 relocation would look like this:
>>
>> .reloc 4 0x1d foo 0xeafffffe
>>
>> We would need to add some mechanism for the assembler to evaluate relocations
>> in case the symbol is locally defined and not exported. For that reason,
>> we can start with a small set of supported "internal" relocations and expand
>> as needed.
>>
>> What about constant propagation?
>>
>> We do not want reloc constants to appear in functions' IR, or to be propagated
>> out of global initializers that use them. The simplest solution to this
>> problem is to only allow reloc constants in constant initializers where we
>> cannot/do not currently perform constant propagation, i.e. function prologue
>> data, prefix data and constants with weak linkage. This could be enforced
>> by the verifier. Later we can consider relaxing this constraint as needed.
>>
>> Other uses
>>
>> Relocation constants could be used for other purposes by frontends.
>> For example, a frontend may need to represent some other kind of
>> custom/specific instruction sequence in IR, or to create arbitrary kinds of
>> references between objects where that may be beneficial (for example,
>> -fsanitize=function may use this facility to create GOTOFF relocations in
>> function prologues to avoid creating dynamic relocations in the .text
>> section to fix PR17633).
>>
>> Thanks,
>> --
>> Peter
>>
>> [1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf
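
For readers following the objdump output quoted in this thread, here is a rough Python sketch of what a linker does with the two relocations shown (R_X86_64_PC32 and R_ARM_JUMP24) once foo has an address, which is the "known bytes plus a relocation" view of a jump table entry. It is not part of the proposal and not LLVM code: the function names, the addresses 0x1000 and 0x2000, and the restriction to an ARM-state (non-Thumb) target with the addend stored in the instruction are all assumptions made for illustration.

```python
import struct


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def apply_x86_64_pc32(code, offset, sym_addr, addend, base_addr):
    """R_X86_64_PC32: store S + A - P as a little-endian 32-bit field."""
    place = base_addr + offset  # P, the address of the field being patched
    value = (sym_addr + addend - place) & 0xFFFFFFFF
    code[offset:offset + 4] = struct.pack("<I", value)


def apply_arm_jump24(code, offset, sym_addr, base_addr):
    """R_ARM_JUMP24 with the addend held in the instruction's imm24 field.

    Simplification (assumption): the target is ARM-state code, so no
    Thumb interworking bit is involved.
    """
    (insn,) = struct.unpack_from("<I", code, offset)
    addend = sign_extend(insn, 24) << 2  # 0xfffffe encodes -2 words = -8 bytes
    place = base_addr + offset
    imm24 = ((sym_addr + addend - place) >> 2) & 0x00FFFFFF
    struct.pack_into("<I", code, offset, (insn & 0xFF000000) | imm24)


# Hypothetical layout: bar is placed at 0x1000 and foo at 0x2000.
BAR, FOO = 0x1000, 0x2000

x86_entry = bytearray.fromhex("e900000000")  # jmp rel32, field still zero
apply_x86_64_pc32(x86_entry, 1, FOO, -4, BAR)  # reloc at offset 1, addend -4

arm_entry = bytearray.fromhex("feffffea")  # 0xeafffffe, little-endian
apply_arm_jump24(arm_entry, 0, FOO, BAR)

print(x86_entry.hex(), arm_entry.hex())
```

With these made-up addresses the x86 entry becomes e9 fb 0f 00 00 and the ARM entry becomes fe 03 00 ea; in both cases the branch now lands on 0x2000. The -4 on x86 and the -8 baked into 0xeafffffe compensate for where each architecture measures the branch from: the end of the 4-byte field on x86, and the instruction address plus 8 on ARM.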