Mitch Phillips via llvm-dev
2020-Sep-17 22:05 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
Hi folks,

ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware extension that allows for detection of memory safety bugs (buffer overflows, use-after-free, etc.) with low overhead. So far, MTE support is implemented in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for the heap, and stack allocation is implemented in LLVM/Clang behind -fsanitize=memtag <https://llvm.org/docs/MemTagSanitizer.html>.

As part of a holistic MTE implementation, global memory should also be properly tagged. HWASan <http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html> (a software-only implementation of MTE) has a scheme that uses static tags; however, these can be trivially determined by an attacker with access to the ELF file. This would allow attackers with an arbitrary read/write primitive to trivially attack global variables. It would also allow attackers with a semilinear RW primitive to trivially attack global variables if the offset is controllable. Dynamic global tags are required to provide the same MTE mitigation guarantees that are afforded to stack and heap memory.

We've got a plan in mind for how to do MTE globals with fully dynamic tags, but we'd love to get feedback from the community. In particular, we'd like to try and align implementation details with GCC, as the scheme requires cooperation from the compiler, linker, and loader.

Our current ideas are outlined below. All the compiler features (including realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of read-only globals would be enabled by default, but could be disabled at compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).

a) Dynamic symbols (int f; / extern int f;)

1. Mark all tagged global data symbols in the dynamic symbol table as st_other.STO_TAGGED.
2. Teach the loader to read the symbol table at load time (and at dlopen()) prior to relocations, and apply random memory tags (via `irg` -> `stg`) to each STO_TAGGED-carrying global.
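To make (a) concrete, here is a rough C model of the loader-side pass, simulating only the address-tag arithmetic in pointer bits [56..59]. This is a sketch under stated assumptions: the STO_TAGGED bit value, the trimmed Sym layout, and the helper names are invented for illustration; a real loader would use `irg`/`stg` on MTE hardware.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch only: STO_TAGGED's bit value, the Sym layout, and
 * choose_random_tag() are assumptions; MTE address tags live in pointer
 * bits [56..59], and this models just that arithmetic, not real irg/stg. */
#define STO_TAGGED 0x80          /* assumed st_other flag bit */
#define TAG_SHIFT  56
#define TAG_MASK   (0xfULL << TAG_SHIFT)

typedef struct {
    uint64_t st_value;           /* virtual address of the global */
    uint64_t st_size;
    uint8_t  st_other;
} Sym;

/* Model of irg: pick a random 4-bit tag; 0 is kept for "untagged". */
static uint8_t choose_random_tag(void) { return (uint8_t)(1 + rand() % 15); }

/* Return the tagged address; a real loader would also stg this tag
 * into every 16-byte granule of the global's storage. */
static uint64_t insert_tag(uint64_t addr, uint8_t tag) {
    return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Loader pass: tag each STO_TAGGED global before relocations run. */
static void tag_dynamic_globals(Sym *syms, int n, uint64_t *tagged) {
    for (int i = 0; i < n; i++)
        tagged[i] = (syms[i].st_other & STO_TAGGED)
                        ? insert_tag(syms[i].st_value, choose_random_tag())
                        : syms[i].st_value;
}
```

The point of the sketch is only the ordering (tagging happens before relocations) and the top-byte arithmetic; everything else is stand-in.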
b) Hidden symbols (static int g; or -fvisibility=hidden)

1. Have the compiler mark hidden tagged globals in the symbol table as st_other.STO_TAGGED.
2. Have the linker read the symbol table and create a table of { unrelocated virtual address, size } pairs for each STO_TAGGED-carrying hidden global, storing this in a new section (.mteglobtab).
3. Create a new dynamic entry "DT_MTEGLOBTAB" that points to this table, along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ" for the size (in bytes) of the table.
4. Similar to dynamic symbols, teach the loader to read this table and apply random memory tags to each global prior to relocations.
5. Materialization of hidden symbols now fetches and inserts the memory tag via `ldg`. On AArch64, this means a non-PC-relative load/store/address-taken operation (*g = 7;) generates:

     adrp x0, g
     ldg  x0, [x0, :lo12:g]  // new instruction
     mov  x1, #7
     str  x1, [x0, :lo12:g]

Note that this materialization sequence means that executables built with MTE globals are not able to run on non-MTE hardware.

Note: Some dynamic symbols can be transformed at link time into hidden symbols if:

1. the symbol is in an object file that is statically linked into an executable and is not referenced from any shared libraries, or
2. the symbol has its visibility changed with a version script.

These globals always have their addresses derived from a GOT entry, and thus have their address tag materialized through the RELATIVE relocation of the GOT entry. Due to the lack of a dynamic symbol table entry, however, the memory would go untagged. The linker must ensure it creates an MTEGLOBTAB entry for all hidden MTE globals, including those that are transformed from external to hidden. DSOs linked with -Bsymbolic retain their dynamic symbol table entries, and thus require no special handling.

c) All symbols

1. Realign to granule size (16 bytes), and resize to a multiple of granule size (e.g. 40B -> 48B).
2.
Ban data folding (except where contents and size are the same; no tail merging).
3. In the loader, ensure writable segments (and possibly .rodata, see the next point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the mappings filled from the file), as file-based mappings aren't necessarily backed by tag-capable memory. This also requires in-place remapping of data segments from the program image (as they're already mapped by the kernel before PT_INTERP invokes the loader).
4. Make .rodata protection optional. When read-only protection is in use, the .rodata section should be moved into a separate segment. For Bionic libc, the .rodata section takes up 20% of its ALLOC | READ segment, and we'd like to be able to maintain page sharing for the remaining 189KiB of other read-only data in this segment.

d) Relocations

1. GLOB_DAT, ABS64, and RELATIVE relocations change semantics: they would be required to retrieve the memory tag of the symbol and insert it into the relocated value. For example, the ABS64 relocation becomes:

     sym_addr = get_symbol_address()        // sym_addr == 0x1008
     sym_addr |= get_tag(sym_addr & ~0xf)   // get_tag(0x1008 & ~0xf == 0x1000)
     *r_offset = sym_addr + r_addend;

2. Introduce a TAGGED_RELATIVE relocation, to solve the problem where the tag shouldn't be derived from the relocation result, e.g.:

     static int array[16] = {};
     // array_end must have the same tag as array[]. array_end is out of
     // bounds w.r.t. array, and may point to a completely different global.
     int *array_end = &array[16];

TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset == &array[16]) and keeps the address from which the tag should be derived in the addend (RELA-only, r_addend == &array[0]). For derived symbols where the granule-aligned address is in bounds of the symbol (e.g. array_end = &array[7] implies the tag can be derived from &array[7] & ~0xf), we can use a normal RELATIVE relocation.
The TAGGED_RELATIVE operation looks like:

     *r_offset |= get_tag(r_addend & ~0xf);

3. ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to grab the place's memory tag before use, as the place itself may be tagged. So, for example, the TAGGED_RELATIVE operation above actually becomes:

     r_offset = ldg(r_offset);
     *r_offset |= get_tag(r_addend & ~0xf);

4. Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the 9-bit immediate of the LDG instruction. This isn't MTE-globals specific; we just seem to be missing a relocation to encode the 9-bit immediate for LDG at bits [12..20]. This would save us an additional ADD instruction in the inline-LDG sequence for hidden symbols.

We considered a few other schemes, including:

1. Creating a dynamic symbol table entry for all hidden globals and giving them the same st_other.STO_TAGGED treatment. These entries would not require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8 bytes for the MTEGLOBTAB scheme under the small code model). For an AOSP build, using dynamic symbol entries instead of MTEGLOBTAB results in a 2.3MiB size increase across all DSOs.

2. Making all hidden symbol accesses go through a local GOT. This requires an extra indirection for all local symbols, resulting in increased cache pressure (and thus decreased performance) over a simple `ldg` of the tag (as the dcache and tag cache are going to be warmed anyway for the load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is backwards compatible, allowing binaries built with MTE globals to run on old ARM64 hardware (as no incompatible instructions are emitted), the same as heap tagging. Stack tagging requires a new ABI, and we expect the MTE globals scheme to be enabled in partnership with stack tagging, so we are unconcerned about the ABI requirement of the MTEGLOBTAB scheme.

Please let us know any feedback you have.
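For concreteness, the relocation semantics described in d) can be modelled in a few lines of C. This is a sketch, not the real dynamic linker: tags are kept in a small side table indexed by granule rather than in MTE tag storage, and the d)3 refinement (refreshing the place's own tag via `ldg`) is omitted.

```c
#include <stdint.h>

/* Illustrative model of the proposed relocation semantics: one 4-bit tag
 * per 16-byte granule of a toy 256-byte image, held in a side table
 * instead of real MTE tag memory. All names here are made up. */
#define TAG_SHIFT 56

static uint8_t tag_store[16];  /* tag per granule, indexed by addr >> 4 */

static uint64_t get_tag(uint64_t granule_addr) {
    return (uint64_t)tag_store[(granule_addr >> 4) & 0xf] << TAG_SHIFT;
}

/* ABS64: derive the tag from the symbol's own granule-aligned address. */
static uint64_t reloc_abs64(uint64_t sym_addr, uint64_t r_addend) {
    sym_addr |= get_tag(sym_addr & ~0xfULL);
    return sym_addr + r_addend;
}

/* TAGGED_RELATIVE: the untagged value is already stored in the place;
 * the tag is derived from r_addend, not from the value itself. */
static void reloc_tagged_relative(uint64_t *r_offset, uint64_t r_addend) {
    *r_offset |= get_tag(r_addend & ~0xfULL);
}
```

The test case to have in mind is the array_end example: the place holds &array[16], which lands in a granule with a different tag, but the relocated value picks up array's tag via r_addend.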
We're currently working on an experimental version and will update with any more details as they arise.

Thanks,
Mitch.
David Spickett via llvm-dev
2020-Sep-18 11:10 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
Hi Mitch,

In the intro you say:

> It would also allow attackers with a semilinear RW primitive to trivially
> attack global variables if the offset is controllable. Dynamic global tags
> are required to provide the same MTE mitigation guarantees that are
> afforded to stack and heap memory.

Then later:

> b) Hidden Symbols (static int g; or -fvisibility=hidden)
> Materialization of hidden symbols now fetch and insert the memory tag via
> `ldg`. On aarch64, this means non PC-relative loads/stores/address-taken
> (*g = 7;) generates:
>   adrp x0, g;
>   ldg x0, [x0, :lo12:g]; // new instruction
>   mov x1, #7;
>   str x1, [x0, :lo12:g];

Does the tagging of these hidden symbols only protect against RW primitives without a similar ldg? If I knew the address of the hidden symbol, I could presumably use the same sequence, but I think I'm stretching what memory tagging is supposed to protect against. Mostly wanted to check I understood.

Speaking of understanding...

> Introduce a TAGGED_RELATIVE relocation - in order to solve the problem
> where the tag derivation shouldn't be from the relocation result, e.g.
>   static int array[16] = {};
>   // array_end must have the same tag as array[]. array_end is out of
>   // bounds w.r.t. array, and may point to a completely different global.
>   int *array_end = &array[16];

Does this mean that the value of array_end must have the same tag as array[], and that &array_end would have a different tag since it's a different global?

For example, you might assign tag 1 to array, then tag 2 to array_end. Which means that array_end has a tag of 2, and so does array[16] (assuming they're sequential):

  | array           | array_end/array[16] |
  | <1> <1> <1> <1> | <2>                 |

So if we just did a RELATIVE relocation, then array_end's value would have a tag of 2, so you couldn't do:

  for (int* ptr=array; ptr != array_end; ++ptr)

since it's always != due to the tags. Do I have that right?

Also, if you have this same example but the array got rounded up to the nearest granule, e.g.
(4-byte ints, 16-byte granules):

  int array[3]; // rounded up to be array[4]
  int* array_end = &array[3];

Would you emit a normal RELATIVE relocation for array_end, because it's within the bounds of the rounded-up array, or a TAGGED_RELATIVE relocation, because it's out of bounds of the original size of the array? (I don't think doing the former is a problem, but I'm not a linker expert.)

Thanks,
David Spickett.
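David's tag-mismatch scenario can be simulated in plain C with software "tags" in pointer bits [56..59]; the addresses and tag assignments below are invented for illustration. With a plain RELATIVE relocation, array_end's value would carry the tag of its own granule (2), so a pointer stepped from array (tag 1) never compares equal to it.

```c
#include <stdint.h>

/* Software simulation of MTE address tags: addresses and tags made up. */
static uint64_t with_tag(uint64_t addr, uint8_t tag) {
    return addr | ((uint64_t)tag << 56);
}

/* Walk from `array` in int-sized steps until `end` matches; the guard
 * stops runaway loops. Returns the number of steps taken. */
static int steps_until_end(uint64_t array, uint64_t end, int guard) {
    int steps = 0;
    for (uint64_t p = array; p != end && steps < guard; p += sizeof(int32_t))
        steps++;
    return steps;
}
```

With matching tags (the TAGGED_RELATIVE outcome) the walk terminates after 16 elements; with mismatched tags (the plain-RELATIVE outcome) the pointers never compare equal, which is exactly the bug the new relocation avoids.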
Mitch Phillips via llvm-dev
2020-Sep-18 19:18 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
Hi David,

> Does the tagging of these hidden symbols only protect against RW
> primitives without a similar ldg? If I knew the address of the hidden
> symbol I could presumably use the same sequence, but I think I'm
> stretching what memory tagging is supposed to protect against.

I might be missing your point here, but don't forget that local globals are always accessed through PC-relative direct loads/stores. The `ldg` sequence in that example can only be used to get `&g` (and nothing else). There shouldn't be any `ldg`s of arbitrary addresses (unless an attacker already has control of the instruction pointer, which means they've already bypassed MTE).

> Does this mean that the value of array_end must have the same tag as
> array[]. Then &array_end would have a different tag since it's a
> different global?

Yes, exactly.

> For example you might assign tag 1 to array, then tag 2 to array_end.
> Which means that array_end has a tag of 2 and so does array[16].
> (assuming they're sequential)
> | array | array_end/array[16] |
> | <1> <1> <1> <1> | <2> |
>
> So if we just did a RELATIVE relocation then array_end's value would
> have a tag of 2, so you couldn't do:
>   for (int* ptr=array; ptr != array_end; ++ptr)
> Since it's always != due to the tags.
> Do I have that right?

Yep, you've got it right; this is why we need TAGGED_RELATIVE. For clarity, here's the memory layout where array_end is relocated using TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:

              array            array_end                 (padding)
  Memory tag  0x1 0x1 0x1 0x1  0x2                       0x2
  Value       0 0 0 0          (0x1 << 56) | &array[16]  0 0

So the address tags of `array` and `array_end`'s value are the same (only `&array_end` has a memory/address tag of 0x2), and thus `for (int* ptr=array; ptr != array_end; ++ptr)` works normally.

> Also, if you have this same example but the array got rounded up to
> the nearest granule e.g.
(4 byte ints, 16 byte granules) > int array[3]; // rounded up to be array[4] > int* array_end = array[3]; > Would you emit a normal RELATIVE relocation for array_end, because > it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE > relocation because it's out of bounds of the original size of the > array? > (I don't think doing the former is a problem but I'm not a linker expert)At this stage, this would generate a TAGGED_RELATIVE. We expect TAGGED_RELATIVE to be relatively scarce, and coming up with a more complex scheme for the linker to optimise this edge case where it's in bounds of the granule padding (but not the symbol itself) seems over-the-top. In saying that, it's a possibility for later revisions. On Fri, Sep 18, 2020 at 4:10 AM David Spickett <david.spickett at linaro.org> wrote:> Hi Mitch, > > In the intro you say: > > It would also allow attackers with a semilinear RW primitive to > trivially attack global variables if the offset is controllable. Dynamic > global tags are required to provide the same MTE mitigation guarantees that > are afforded to stack and heap memory. > > Then later: > > b) Hidden Symbols (static int g; or -fvisibility=hidden) > > Materialization of hidden symbols now fetch and insert the memory tag > via. `ldg`. On aarch64, this means non PC-relative > loads/stores/address-taken (*g = 7;) generates: > > adrp x0, g; > > ldg x0, [x0, :lo12:g]; // new instruction > > mov x1, #7; > > str x1, [x0, :lo12:g]; > > Does the tagging of these hidden symbols only protect against RW > primitives without a similar ldg? If I knew the address of the hidden > symbol I could presumably use the same sequence, but I think I'm > stretching what memory tagging is supposed to protect against. Mostly > wanted to check I understood. > > Speaking of understanding... > > > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem > where the tag derivation shouldn't be from the relocation result, e.g. 
> > static int array[16] = {}; > > // array_end must have the same tag as array[]. array_end is out of > > // bounds w.r.t. array, and may point to a completely different global. > > int *array_end = &array[16]; > > Does this mean that the value of array_end must have the same tag as > array[]. Then &array_end would have a different tag since it's a > different global? > > For example you might assign tag 1 to array, then tag 2 to array_end. > Which means that array_end has a tag of 2 and so does array[16]. > (assuming they're sequential) > | array | array_end/array[16] | > | < 1> <1> <1> <1> | <2> | > > So if we just did a RELATIVE relocation then array_end's value would > have a tag of 2, so you couldn't do: > for (int* ptr=array; ptr != array_end; ++ptr) > Since it's always != due to the tags. > > Do I have that right? > > Also, if you have this same example but the array got rounded up to > the nearest granule e.g. (4 byte ints, 16 byte granules) > int array[3]; // rounded up to be array[4] > int* array_end = array[3]; > > Would you emit a normal RELATIVE relocation for array_end, because > it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE > relocation because it's out of bounds of the original size of the > array? > (I don't think doing the former is a problem but I'm not a linker expert) > > Thanks, > David Spickett. > > On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > > Hi folks, > > > > > > ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware that > allows for detection of memory safety bugs (buffer overflows, > use-after-free, etc) with low overhead. So far, MTE support is implemented > in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for > heap, and stack allocation is implemented in LLVM/Clang behind > -fsanitize=memtag. > > > > > > As part of a holistic MTE implementation, global memory should also be > properly tagged. 
HWASan (a software-only implementation of MTE) has a > schema that uses static tags, however these can be trivially determined by > an attacker with access to the ELF file. This would allow attackers with > arbitrary read/write to trivially attack global variables. It would also > allow attackers with a semilinear RW primitive to trivially attack global > variables if the offset is controllable. Dynamic global tags are required > to provide the same MTE mitigation guarantees that are afforded to stack > and heap memory. > > > > > > We've got a plan in mind about how to do MTE globals with fully dynamic > tags, but we'd love to get feedback from the community. In particular - > we'd like to try and align implementation details with GCC as the scheme > requires cooperation from the compiler, linker, and loader. > > > > > > Our current ideas are outlined below. All the compiler features > (including realignment, etc.) would be guarded behind -fsanitize=memtag. > Protection of read-only globals would be enabled-by-default, but can be > disabled at compile time behind a flag (likely > -f(no)sanitize-memtag-ro-globals). > > > > > > a) Dynamic symbols (int f; extern int f;) > > > > Mark all tagged global data symbols in the dynamic symbol table as > st_other.STO_TAGGED. > > > > Teach the loader to read the symbol table at load time (and dlopen()) > prior to relocations, and apply random memory tags (via. `irg -> stg`) to > each STO_TAGGED carrying global. > > > > b) Hidden Symbols (static int g; or -fvisibility=hidden) > > > > Have the compiler mark hidden tagged globals in the symbol table as > st_other.STO_TAGGED. > > > > Have the linker read the symbol table and create a table of { > unrelocated virtual address, size } pairs for each STO_TAGGED carrying > hidden global, storing this in a new section (.mteglobtab). 
> > > > Create a new dynamic entry "DT_MTEGLOBTAB" that points to this segment, > along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ" > for the size (in bytes) of the table. > > > > Similar to dynamic symbols, teach the loader to read this table and > apply random memory tags to each global prior to relocations. > > > > Materialization of hidden symbols now fetch and insert the memory tag > via. `ldg`. On aarch64, this means non PC-relative > loads/stores/address-taken (*g = 7;) generates: > > adrp x0, g; > > ldg x0, [x0, :lo12:g]; // new instruction > > mov x1, #7; > > str x1, [x0, :lo12:g]; > > > > Note that this materialization sequence means that executables built > with MTE globals are not able to run on non-MTE hardware. > > > > Note: Some dynamic symbols can be transformed at link time into hidden > symbols if: > > > > The symbol is in an object file that is statically linked into an > executable and is not referenced in any shared libraries, or > > > > The symbol has its visibility changed with a version script. > > > > These globals always have their addresses derived from a GOT entry, and > thus have their address tag materialized through the RELATIVE relocation of > the GOT entry. Due to the lack of dynamic symbol table entry however, the > memory would go untagged. The linker must ensure it creates an MTEGLOBTAB > entry for all hidden MTE-globals, including those that are transformed from > external to hidden. DSO's linked with -Bsymbolic retain their dynamic > symbol table entries, and thus require no special handling. > > > > > > c) All symbols > > > > Realign to granule size (16 bytes), resize to multiple of granule size > (e.g. 40B -> 48B). > > > > Ban data folding (except where contents and size are same, no tail > merging). 
> > > > In the loader, ensure writable segments (and possibly .rodata, see next > dot point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the > mappings filled from the file), as file-based mappings aren't necessarily > backed by tag-capable memory. It also requires in-place remapping of data > segments from the program image (as they're already mapped by the kernel > before PT_INTERP invokes the loader). > > > > Make .rodata protection optional. When read-only protection is in use, > the .rodata section should be moved into a separate segment. For Bionic > libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd > like to be able to maintain page sharing for the remaining 189KiB of other > read-only data in this segment. > > > > d) Relocations > > > > GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would > be required to retrieve and insert the memory tag of the symbol into the > relocated value. For example, the ABS64 relocation becomes: > > sym_addr = get_symbol_address() // sym_addr = 0x1008 > > sym_addr |= get_tag(sym_addr & 0xf) // get_tag(0x1008 & 0xf == 0x1000) > > *r_offset = sym_addr + r_addend; > > > > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem > where the tag derivation shouldn't be from the relocation result, e.g. > > static int array[16] = {}; > > // array_end must have the same tag as array[]. array_end is out of > > // bounds w.r.t. array, and may point to a completely different global. > > int *array_end = &array[16]; > > > > TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset > == &array[16]), and keeps the address where the tag should be derived in > the addend (RELA-only r_addend == &array[0]). > > > > For derived symbols where the granule-aligned address is in-bounds of > the tag (e.g. array_end = &array[7] implies the tag can be derived from > (&array[0] & 0xf)), we can use a normal RELATIVE relocation. 
> > > > The TAGGED_RELATIVE operand looks like: > > *r_offset |= get_tag(r_addend & ~0xf); > > > > ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to > grab the place's memory tag before use, as the place itself may be tagged. > So, for example, the TAGGED_RELATIVE operation above actually becomes: > > r_offset = ldg(r_offset); > > *r_offset |= get_tag(r_addend & ~0xf); > > > > Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the > 9-bit immediate for the LDG instruction. This isn't MTE-globals specific, > we just seem to be missing the relocation to encode the 9-bit immediate for > LDG at bits [12..20]. This would save us an additional ADD instruction in > the inline-LDG sequence for hidden symbols. > > > > We considered a few other schemes, including: > > > > Creating a dynamic symbol table entry for all hidden globals and giving > them the same st_other.STO_TAGGED treatment. These entries would not > require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8 > bytes for the MTEGLOBTAB schema under the small code model). For an AOSP > build, using dynamic symbol entries instead of MTEGLOBTAB results in a > 2.3MiB code size increase across all DSO's. > > > > Making all hidden symbol accesses go through a local-GOT. Requires an > extra indirection for all local symbols - resulting in increased cache > pressure (and thus decreased performance) over a simple `ldg` of the tag > (as the dcache and tag-cache are going to be warmed anyway for the > load/store). Unlike the MTEGLOBTAG scheme however, this scheme is backwards > compatible, allowing MTE-globals built binaries to run on old ARM64 > hardware (as no incompatible instructions are emitted), the same as heap > tagging. Stack tagging requires a new ABI - and we expect the MTE globals > scheme to be enabled in partnership with stack tagging, thus we are > unconcerned about the ABI requirement for the MTEGLOBTAG scheme. 
Please let us know any feedback you have. We're currently working on an experimental version and will update with any more details as they arise.

Thanks,

Mitch.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Szabolcs Nagy via llvm-dev
2020-Oct-08 18:42 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
* Mitch Phillips via llvm-dev <llvm-dev at lists.llvm.org> [2020-09-17 15:05:18 -0700]:

> [...]
>
> Our current ideas are outlined below. All the compiler features (including
> realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of
> read-only globals would be enabled-by-default, but can be disabled at
> compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).

i think -fsanitize is not appropriate for an mte abi.
(i mean you can have an -fsanitize for it, but mte can be a proper abi between libc, linkers and compilers that several toolchains can implement independently, not an llvm vs compiler-rt internal design or android only design.)

> a) Dynamic symbols (int f; extern int f;)
>
> 1. Mark all tagged global data symbols in the dynamic symbol table as
>    st_other.STO_TAGGED.

note: these bits are not really reserved for os or processor specific use in ELF. in practice they are processor specific so it will be STO_AARCH64_TAGGED.

note2: undefined symbol references will need correct marking too if objects may get copy relocated into the main exe and linkers should check if definitions match references.

this will require an ABI bump (otherwise old tools will silently ignore the new STO flag).

but i'm not convinced yet that per symbol marking is needed.

it would be better to discuss on a linux abi or arm abi forum than on llvm-dev (at least in my experience unsubscribed mail gets dropped or significantly delayed here, and many linux or arm abi folks are not subscribed)

> 2. Teach the loader to read the symbol table at load time (and dlopen())
>    prior to relocations, and apply random memory tags (via `irg -> stg`)
>    to each STO_TAGGED carrying global.

are object sizes reliable in the dynamic symbol table? is this why there is a need for per symbol marking?

> b) Hidden Symbols (static int g; or -fvisibility=hidden)
>
> 1. Have the compiler mark hidden tagged globals in the symbol table as
>    st_other.STO_TAGGED.
>
> 2. Have the linker read the symbol table and create a table of
>    { unrelocated virtual address, size } pairs for each STO_TAGGED
>    carrying hidden global, storing this in a new section (.mteglobtab).
>
> 3. Create a new dynamic entry "DT_MTEGLOBTAB" that points to this segment,
>    along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ"
>    for the size (in bytes) of the table.
>
> 4. Similar to dynamic symbols, teach the loader to read this table and
>    apply random memory tags to each global prior to relocations.

for static linking it's possible to make a static exe self relocating like how static pie handles RELATIVE relocs, but this sounds a bit nasty (and will need to use rcrt1.o or a new *crt1.o entry that guarantees such self relocation).

> 5. Materialization of hidden symbols now fetches and inserts the memory
>    tag via `ldg`. On aarch64, this means a non-PC-relative
>    load/store/address-taken operation (*g = 7;) generates:
>
>    adrp x0, g;
>    ldg x0, [x0, :lo12:g]; // new instruction
>    mov x1, #7;
>    str x1, [x0, :lo12:g];
>
>    Note that this materialization sequence means that executables built
>    with MTE globals are not able to run on non-MTE hardware.

i need to think about this, i think a compiler may transform

    static int a[8];

    void f(int i)
    {
        a[i-5] = 0;
    }

to

    (a-5)[i] = 0;

i.e. instead of offsetting i, compute the address of a-5 with adrp, so that fewer instructions are needed for indexing. but then ldg on the computed address is not valid. (this is likely not a performance concern, but it implies there may be code generation trouble if we assume anything computed with adrp can be fixed up with ldg.)

> Note: Some dynamic symbols can be transformed at link time into hidden
> symbols if:
>
> 1. The symbol is in an object file that is statically linked into an
>    executable and is not referenced in any shared libraries, or
>
> 2. The symbol has its visibility changed with a version script.
>
> These globals always have their addresses derived from a GOT entry, and
> thus have their address tag materialized through the RELATIVE relocation
> of the GOT entry. Due to the lack of a dynamic symbol table entry,
> however, the memory would go untagged. The linker must ensure it creates
> an MTEGLOBTAB entry for all hidden MTE-globals, including those that are
> transformed from external to hidden.
> DSO's linked with -Bsymbolic retain their dynamic symbol table entries,
> and thus require no special handling.
>
> c) All symbols
>
> 1. Realign to granule size (16 bytes), resize to a multiple of granule
>    size (e.g. 40B -> 48B).
>
> 2. Ban data folding (except where contents and size are the same; no tail
>    merging).
>
> 3. In the loader, ensure writable segments (and possibly .rodata, see the
>    next point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of
>    the mappings filled from the file), as file-based mappings aren't
>    necessarily backed by tag-capable memory. This also requires in-place
>    remapping of data segments from the program image (as they're already
>    mapped by the kernel before PT_INTERP invokes the loader).

copying file data is a bit ugly but i think this is ok.

> 4. Make .rodata protection optional. When read-only protection is in use,
>    the .rodata section should be moved into a separate segment. For Bionic
>    libc, the rodata section takes up 20% of its ALLOC | READ segment, and
>    we'd like to be able to maintain page sharing for the remaining 189KiB
>    of other read-only data in this segment.

i think a design that prevents sharing is not acceptable.

> d) Relocations
>
> 1. GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would
>    be required to retrieve and insert the memory tag of the symbol into
>    the relocated value. For example, the ABS64 relocation becomes:
>
>    sym_addr = get_symbol_address()      // sym_addr = 0x1008
>    sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf == 0x1000)
>    *r_offset = sym_addr + r_addend;
>
> 2. Introduce a TAGGED_RELATIVE relocation, in order to solve the problem
>    where the tag shouldn't be derived from the relocation result, e.g.
>
>    static int array[16] = {};
>    // array_end must have the same tag as array[]. array_end is out of
>    // bounds w.r.t. array, and may point to a completely different global.
>    int *array_end = &array[16];
>
>    TAGGED_RELATIVE stores the untagged symbol value in the place
>    (*r_offset == &array[16]), and keeps the address the tag should be
>    derived from in the addend (RELA-only, r_addend == &array[0]).
>
>    For derived symbols where the granule-aligned address is in bounds of
>    the tagged object (e.g. array_end = &array[7] implies the tag can be
>    derived from (&array[0] & ~0xf)), we can use a normal RELATIVE
>    relocation.
>
>    The TAGGED_RELATIVE operation looks like:
>
>    *r_offset |= get_tag(r_addend & ~0xf);
>
> 3. ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to
>    grab the place's memory tag before use, as the place itself may be
>    tagged. So, for example, the TAGGED_RELATIVE operation above actually
>    becomes:
>
>    r_offset = ldg(r_offset);
>    *r_offset |= get_tag(r_addend & ~0xf);
>
> 4. Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the
>    9-bit immediate of the LDG instruction. This isn't MTE-globals
>    specific - we just seem to be missing a relocation that encodes a 9-bit
>    immediate at bits [12..20], as LDG requires. This would save us an
>    additional ADD instruction in the inline-LDG sequence for hidden
>    symbols.
>
> We considered a few other schemes, including:
>
> 1. Creating a dynamic symbol table entry for all hidden globals and giving
>    them the same st_other.STO_TAGGED treatment. These entries would not
>    require symbol names, but Elf(Sym) entries are 24 bytes (in comparison
>    to 8 bytes for the MTEGLOBTAB schema under the small code model). For
>    an AOSP build, using dynamic symbol entries instead of MTEGLOBTAB
>    results in a 2.3MiB size increase across all DSO's.
>
> 2. Making all hidden symbol accesses go through a local GOT. This requires
>    an extra indirection for all local symbols - resulting in increased
>    cache pressure (and thus decreased performance) over a simple `ldg` of
>    the tag (as the dcache and tag cache are going to be warmed anyway for
>    the load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is
>    backwards compatible, allowing binaries built with MTE globals to run
>    on old ARM64 hardware (as no incompatible instructions are emitted),
>    the same as heap tagging. Stack tagging requires a new ABI - and we
>    expect the MTE globals scheme to be enabled in partnership with stack
>    tagging, so we are unconcerned about the ABI requirement for the
>    MTEGLOBTAB scheme.

if object access goes via symbolic dynamic relocation (GOT, ABS) then there is no need to do anything special:

- pointer representation is controlled by the dynamic linker via the relocs

- location of object is known (object bounds and if it's in a PROT_MTE segment)

so it can be a completely dynamic linker internal decision what globals to tag and how. (it is also backward compat with existing binaries, but it might make sense to have an opt-in mechanism for such tagging.)

new abi is needed to protect local accesses, i'm not yet sure about the proposed design with two RELATIVE relocs. i think a RELATIVE reloc should not assume that the computed pointer can be dereferenced; this is not just for the array-end case but for other oob computed pointers too. e.g.

    static int a[8];
    static int *p = a - 5;
    ...
    p[10] = 1;

should work (even if it's not valid in c it can be valid as a c extension or written in asm, so ELF should support it).

e.g. the r_info field in the RELATIVE reloc can index the MTEGLOBTAB and use the object bounds from there for ldg (and if r_info==0 means untagged this falls back to normal RELATIVE reloc processing), but i don't yet know what the best solution is here.

i think tls needs some thought too, arrays are probably not common there, but some protection may be possible in some cases..
Mitch Phillips via llvm-dev
2020-Oct-09 20:17 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
> note: these bits are not really reserved for os or processor
> specific use in ELF. in practice they are processor specific
> so it will be STO_AARCH64_TAGGED.

Correct.

> note2: undefined symbol references will need correct marking
> too if objects may get copy relocated into the main exe and
> linkers should check if definitions match references.

Yep - at this point I expect that resolving an untagged reference with a tagged symbol (or vice versa) should result in a link-time error, but I don't feel particularly strongly about this. Downgrading to untagged should always be safe - but I think this subverts the object files' desire to have tagged globals. This also affects linking object files that are some tagged, some untagged into the same DSO.

> it would be better to discuss on a linux abi or arm abi forum
> than on llvm-dev

If you have any recommendations, that would be much appreciated. We have some ARM ELF folks on the line here, but it's probably not as broad as I would like.

> are object sizes reliable in the dynamic symbol table?
> is this why there is a need for per symbol marking?
>
> so it can be a completely dynamic linker internal decision
> what globals to tag and how. (it is also backward compat
> with existing binaries, but it might make sense to have
> an opt-in mechanism for such tagging.)

Object sizes are reliable - but marking symbols explicitly allows us to have mixed tagged and untagged symbols in the same segment (think of a symbol we know is used by non-compliant assembly, which we can mark with __attribute__((nosanitize("mte")))). IMO marking symbols in the dynamic symbol table gives us greater flexibility than indiscriminately tagging granule-aligned symbols that fall in the right segments.

> i think a design that prevents sharing is not acceptable.

Unfortunately, shared memory isn't required to be tag capable (DAX <https://www.kernel.org/doc/Documentation/filesystems/dax.txt> is an example) - so any PROT_MTE mappings must be anonymous.
That's why we'd like to carve out rodata into its own segment - to continue to allow page sharing for the remaining 80% of the contents of that segment.

> static int a[8];
> static int *p = a - 5;
> ...
> p[10] = 1;
>
> should work (even if it's not valid in c it can be valid as
> a c extension or written in asm, so ELF should support it).

IMO this is exactly the kind of thing that MTE is trying to *prevent*. I don't see why we would want to support something like this.

> i think tls needs some thought too, arrays are probably
> not common there, but some protection may be possible in
> some cases..

Definitely agreed - we haven't fleshed out a TLS story at this point in time, but we're considering it for later iterations.