David Spickett via llvm-dev
2020-Sep-21 14:05 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
> I might be missing your point here - but don't forget that the local globals are always PC-relative direct loads/stores.

I did forget! Thanks for clarifying, now I understand.

On Fri, 18 Sep 2020 at 20:51, Evgenii Stepanov <eugenis at google.com> wrote:
>
> On Fri, Sep 18, 2020 at 12:18 PM Mitch Phillips via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi David,
>>
>>> Does the tagging of these hidden symbols only protect against RW
>>> primitives without a similar ldg? If I knew the address of the hidden
>>> symbol I could presumably use the same sequence, but I think I'm
>>> stretching what memory tagging is supposed to protect against.
>>
>> I might be missing your point here - but don't forget that the local globals are always PC-relative direct loads/stores. The `ldg` sequence in that example can only be used to get `&g` (and nothing else). There shouldn't be any `ldg`s of arbitrary addresses (unless an attacker already has control of the instruction pointer, which means they've already bypassed MTE).
>>
>>> Does this mean that the value of array_end must have the same tag as
>>> array[]? Then &array_end would have a different tag since it's a
>>> different global?
>>
>> Yes, exactly.
>>
>>> For example you might assign tag 1 to array, then tag 2 to array_end.
>>> Which means that array_end has a tag of 2 and so does array[16]
>>> (assuming they're sequential):
>>> | array           | array_end/array[16] |
>>> | <1> <1> <1> <1> | <2>                 |
>>>
>>> So if we just did a RELATIVE relocation then array_end's value would
>>> have a tag of 2, so you couldn't do:
>>> for (int* ptr=array; ptr != array_end; ++ptr)
>>> since it's always != due to the tags.
>>> Do I have that right?
>>
>> Yep - you've got it right, this is why we need TAGGED_RELATIVE.
>> For clarity, here's the memory layout where array_end is relocated using TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:
>>
>>            | array   | array_end                | (padding) |
>> Memory Tag | 0x1 0x1 | 0x2                      | 0x2       |
>> Value      | 0 0 0 0 | (0x1 << 56) | &array[16] | 0 0       |
>>
>> So the address tag of `array` and `array_end` are the same (only `&array_end` has a memory/address tag of 0x2), and thus `for (int* ptr=array; ptr != array_end; ++ptr)` works normally.
>>
>>> Also, if you have this same example but the array got rounded up to
>>> the nearest granule e.g. (4 byte ints, 16 byte granules)
>>> int array[3]; // rounded up to be array[4]
>>> int* array_end = &array[3];
>>> Would you emit a normal RELATIVE relocation for array_end, because
>>> it's within the bounds of the rounded-up array, or a TAGGED_RELATIVE
>>> relocation because it's out of bounds of the original size of the
>>> array?
>>> (I don't think doing the former is a problem but I'm not a linker expert)
>>
>> At this stage, this would generate a TAGGED_RELATIVE. We expect TAGGED_RELATIVE to be relatively scarce, and coming up with a more complex scheme for the linker to optimise this edge case where it's in bounds of the granule padding (but not the symbol itself) seems over-the-top. In saying that, it's a possibility for later revisions.
>
> The plan calls for:
>> Realign to granule size (16 bytes), resize to multiple of granule size (e.g. 40B -> 48B).
> so this would never happen.
>
> The symbols are resized in order to prevent smaller untagged symbols from getting into the padding of the 16-byte-aligned tagged ones.
> I'm not sure it's desirable to change the symbol size just for this reason. The linker could always suppress such packing for STO_TAGGED symbols.
>
> In any case, since all sizes and alignments are known, the compiler should be allowed to emit RELATIVE in the rounded-up array case.
>>
>> On Fri, Sep 18, 2020 at 4:10 AM David Spickett <david.spickett at linaro.org> wrote:
>>>
>>> Hi Mitch,
>>>
>>> In the intro you say:
>>> > It would also allow attackers with a semilinear RW primitive to trivially attack global variables if the offset is controllable. Dynamic global tags are required to provide the same MTE mitigation guarantees that are afforded to stack and heap memory.
>>>
>>> Then later:
>>> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
>>> > Materialization of hidden symbols now fetches and inserts the memory tag via `ldg`. On aarch64, this means non-PC-relative loads/stores/address-taken (*g = 7;) generates:
>>> > adrp x0, g;
>>> > ldg x0, [x0, :lo12:g]; // new instruction
>>> > mov x1, #7;
>>> > str x1, [x0, :lo12:g];
>>>
>>> Does the tagging of these hidden symbols only protect against RW
>>> primitives without a similar ldg? If I knew the address of the hidden
>>> symbol I could presumably use the same sequence, but I think I'm
>>> stretching what memory tagging is supposed to protect against. Mostly
>>> wanted to check I understood.
>>>
>>> Speaking of understanding...
>>>
>>> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem where the tag derivation shouldn't be from the relocation result, e.g.
>>> > static int array[16] = {};
>>> > // array_end must have the same tag as array[]. array_end is out of
>>> > // bounds w.r.t. array, and may point to a completely different global.
>>> > int *array_end = &array[16];
>>>
>>> Does this mean that the value of array_end must have the same tag as
>>> array[]? Then &array_end would have a different tag since it's a
>>> different global?
>>>
>>> For example you might assign tag 1 to array, then tag 2 to array_end.
>>> Which means that array_end has a tag of 2 and so does array[16]
>>> (assuming they're sequential):
>>> | array           | array_end/array[16] |
>>> | <1> <1> <1> <1> | <2>                 |
>>>
>>> So if we just did a RELATIVE relocation then array_end's value would
>>> have a tag of 2, so you couldn't do:
>>> for (int* ptr=array; ptr != array_end; ++ptr)
>>> since it's always != due to the tags.
>>>
>>> Do I have that right?
>>>
>>> Also, if you have this same example but the array got rounded up to
>>> the nearest granule e.g. (4 byte ints, 16 byte granules)
>>> int array[3]; // rounded up to be array[4]
>>> int* array_end = &array[3];
>>>
>>> Would you emit a normal RELATIVE relocation for array_end, because
>>> it's within the bounds of the rounded-up array, or a TAGGED_RELATIVE
>>> relocation because it's out of bounds of the original size of the
>>> array?
>>> (I don't think doing the former is a problem but I'm not a linker expert)
>>>
>>> Thanks,
>>> David Spickett.
>>>
>>> On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware extension that allows for detection of memory safety bugs (buffer overflows, use-after-free, etc.) with low overhead. So far, MTE support is implemented in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for the heap, and stack allocation is implemented in LLVM/Clang behind -fsanitize=memtag.
>>> >
>>> > As part of a holistic MTE implementation, global memory should also be properly tagged. HWASan (a software-only implementation of MTE) has a schema that uses static tags; however, these can be trivially determined by an attacker with access to the ELF file. This would allow attackers with arbitrary read/write to trivially attack global variables. It would also allow attackers with a semilinear RW primitive to trivially attack global variables if the offset is controllable.
>>> > Dynamic global tags are required to provide the same MTE mitigation guarantees that are afforded to stack and heap memory.
>>> >
>>> > We've got a plan in mind for how to do MTE globals with fully dynamic tags, but we'd love to get feedback from the community. In particular, we'd like to try and align implementation details with GCC, as the scheme requires cooperation from the compiler, linker, and loader.
>>> >
>>> > Our current ideas are outlined below. All the compiler features (including realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of read-only globals would be enabled by default, but can be disabled at compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).
>>> >
>>> > a) Dynamic symbols (int f; extern int f;)
>>> >
>>> > Mark all tagged global data symbols in the dynamic symbol table as st_other.STO_TAGGED.
>>> >
>>> > Teach the loader to read the symbol table at load time (and dlopen()) prior to relocations, and apply random memory tags (via `irg -> stg`) to each STO_TAGGED-carrying global.
>>> >
>>> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
>>> >
>>> > Have the compiler mark hidden tagged globals in the symbol table as st_other.STO_TAGGED.
>>> >
>>> > Have the linker read the symbol table and create a table of { unrelocated virtual address, size } pairs for each STO_TAGGED-carrying hidden global, storing this in a new section (.mteglobtab).
>>> >
>>> > Create a new dynamic entry "DT_MTEGLOBTAB" that points to this table, along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ" for the size (in bytes) of the table.
>>> >
>>> > Similar to dynamic symbols, teach the loader to read this table and apply random memory tags to each global prior to relocations.
>>> >
>>> > Materialization of hidden symbols now fetches and inserts the memory tag via `ldg`.
>>> > On aarch64, this means non-PC-relative loads/stores/address-taken (*g = 7;) generates:
>>> > adrp x0, g;
>>> > ldg x0, [x0, :lo12:g]; // new instruction
>>> > mov x1, #7;
>>> > str x1, [x0, :lo12:g];
>>> >
>>> > Note that this materialization sequence means that executables built with MTE globals are not able to run on non-MTE hardware.
>>> >
>>> > Note: some dynamic symbols can be transformed at link time into hidden symbols if:
>>> >
>>> > - The symbol is in an object file that is statically linked into an executable and is not referenced in any shared libraries, or
>>> > - The symbol has its visibility changed with a version script.
>>> >
>>> > These globals always have their addresses derived from a GOT entry, and thus have their address tag materialized through the RELATIVE relocation of the GOT entry. Due to the lack of a dynamic symbol table entry, however, the memory would go untagged. The linker must ensure it creates an MTEGLOBTAB entry for all hidden MTE globals, including those that are transformed from external to hidden. DSOs linked with -Bsymbolic retain their dynamic symbol table entries, and thus require no special handling.
>>> >
>>> > c) All symbols
>>> >
>>> > Realign to granule size (16 bytes), resize to a multiple of granule size (e.g. 40B -> 48B).
>>> >
>>> > Ban data folding (except where contents and size are the same; no tail merging).
>>> >
>>> > In the loader, ensure writable segments (and possibly .rodata, see the next point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the mappings filled from the file), as file-based mappings aren't necessarily backed by tag-capable memory. This also requires in-place remapping of data segments from the program image (as they're already mapped by the kernel before PT_INTERP invokes the loader).
>>> >
>>> > Make .rodata protection optional. When read-only protection is in use, the .rodata section should be moved into a separate segment.
>>> > For Bionic libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd like to be able to maintain page sharing for the remaining 189KiB of other read-only data in this segment.
>>> >
>>> > d) Relocations
>>> >
>>> > GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would be required to retrieve and insert the memory tag of the symbol into the relocated value. For example, the ABS64 relocation becomes:
>>> > sym_addr = get_symbol_address()      // sym_addr = 0x1008
>>> > sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf) == get_tag(0x1000)
>>> > *r_offset = sym_addr + r_addend;
>>> >
>>> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem where the tag derivation shouldn't be from the relocation result, e.g.
>>> > static int array[16] = {};
>>> > // array_end must have the same tag as array[]. array_end is out of
>>> > // bounds w.r.t. array, and may point to a completely different global.
>>> > int *array_end = &array[16];
>>> >
>>> > TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset == &array[16]), and keeps the address the tag should be derived from in the addend (RELA-only, r_addend == &array[0]).
>>> >
>>> > For derived symbols where the granule-aligned address is in bounds of the tag (e.g. array_end = &array[7] implies the tag can be derived from (&array[7] & ~0xf)), we can use a normal RELATIVE relocation.
>>> >
>>> > The TAGGED_RELATIVE operation looks like:
>>> > *r_offset |= get_tag(r_addend & ~0xf);
>>> >
>>> > ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to grab the place's memory tag before use, as the place itself may be tagged. So, for example, the TAGGED_RELATIVE operation above actually becomes:
>>> > r_offset = ldg(r_offset);
>>> > *r_offset |= get_tag(r_addend & ~0xf);
>>> >
>>> > Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the 9-bit immediate for the LDG instruction.
>>> > This isn't MTE-globals specific; we just seem to be missing the relocation to encode the 9-bit immediate for LDG at bits [12..20]. This would save us an additional ADD instruction in the inline-LDG sequence for hidden symbols.
>>> >
>>> > We considered a few other schemes, including:
>>> >
>>> > Creating a dynamic symbol table entry for all hidden globals and giving them the same st_other.STO_TAGGED treatment. These entries would not require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8 bytes for the MTEGLOBTAB schema under the small code model). For an AOSP build, using dynamic symbol entries instead of MTEGLOBTAB results in a 2.3MiB size increase across all DSOs.
>>> >
>>> > Making all hidden symbol accesses go through a local GOT. This requires an extra indirection for all local symbols - resulting in increased cache pressure (and thus decreased performance) over a simple `ldg` of the tag (as the dcache and tag-cache are going to be warmed anyway for the load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is backwards compatible, allowing MTE-globals-built binaries to run on old ARM64 hardware (as no incompatible instructions are emitted), the same as heap tagging. Stack tagging requires a new ABI - and we expect the MTE globals scheme to be enabled in partnership with stack tagging, thus we are unconcerned about the ABI requirement for the MTEGLOBTAB scheme.
>>> >
>>> > Please let us know any feedback you have. We're currently working on an experimental version and will update with any more details as they arise.
>>> >
>>> > Thanks,
>>> >
>>> > Mitch.
>>> >
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Jessica Clarke via llvm-dev
2020-Sep-21 22:28 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
> On 21 Sep 2020, at 15:05, David Spickett via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> I might be missing your point here - but don't forget that the local globals are always PC-relative direct loads/stores.
>
> I did forget! Thanks for clarifying, now I understand.

I think it's worth pointing out this is only true on ABIs that implement pointers using integer addresses. On CHERI[1], and thus Arm's upcoming Morello research prototype[2,3], we use a pure-capability ABI where every C language pointer is a bounded capability with associated permissions. The same is also true for all the sub-language-level pointers such as the program counter, meaning that a PC-relative pointer has read and execute permission but not write permission. Thus, pointers to local globals still use a GOT (except containing capabilities, not addresses). It might be wise to pick a sufficiently flexible scheme such that it would compose properly with CHERI.

On the other hand, however, MTE on CHERI would be used for a very different purpose: by having our capabilities be bounded, we already enforce spatial memory safety and a notion of pointer provenance in a non-probabilistic manner, so there is no need to make use of the probabilistic protection that MTE can provide. One of our interests is using MTE to provide versioning of memory in order to be able to reuse the same memory multiple times in a temporally-safe way without having to perform revocation sweeps; anyone interested should take a look at §D.9 of CHERI ISAv7[4] (ISAv8 will be released within a few weeks and has a little more detail).
Jess

[1] https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
[2] https://developer.arm.com/architectures/cpu-architecture/a-profile/morello
[3] https://www.morello-project.org
[4] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-927.pdf
Mitch Phillips via llvm-dev
2020-Sep-22 16:56 UTC
[llvm-dev] [MTE] Globals Tagging - Discussion
Hi Jessica,

Thanks for the info. I'm assuming that the CHERI-on-Morello scheme is going to require its own relocation types and instructions in order to make different use of MTE. Is there anything in our specification that is cross-applicable under Arm+CHERI? I'm assuming the symbol tagging scheme might be useful, but not the TAGGED_RELATIVE relocation, as it's designed for spatial and temporal safety. Would you recommend any changes here to allow Arm+CHERI to take advantage?

On Mon, Sep 21, 2020 at 3:28 PM Jessica Clarke via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 21 Sep 2020, at 15:05, David Spickett via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> >> I might be missing your point here - but don't forget that the local globals are always PC-relative direct loads/stores.
> >
> > I did forget! Thanks for clarifying, now I understand.
>
> I think it's worth pointing out this is only true on ABIs that implement pointers using integer addresses. On CHERI[1], and thus Arm's upcoming Morello research prototype[2,3], we use a pure-capability ABI where every C language pointer is a bounded capability with associated permissions, but the same is also true for all the sub-language-level pointers such as the program counter, meaning that a PC-relative pointer has read and execute permission but not write permission. Thus, pointers to local globals still use a GOT (except containing capabilities, not addresses). It might be wise to pick a sufficiently flexible scheme such that it would compose properly with CHERI.
>
> On the other hand, however, MTE on CHERI would be used for a very different purpose: by having our capabilities be bounded we already enforce spatial memory safety and a notion of pointer provenance in a non-probabilistic manner, so there is no need to make use of the probabilistic protection that MTE can provide.
> One of our interests is using MTE to provide versioning of memory in order to be able to reuse the same memory multiple times in a temporally-safe way without having to perform revocation sweeps; anyone interested should take a look at §D.9 of CHERI ISAv7[4] (ISAv8 will be released within a few weeks and has a little more detail).
>
> Jess
>
> [1] https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
> [2] https://developer.arm.com/architectures/cpu-architecture/a-profile/morello
> [3] https://www.morello-project.org
> [4] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-927.pdf
>
> > On Fri, 18 Sep 2020 at 20:51, Evgenii Stepanov <eugenis at google.com> wrote:
> >>
> >> On Fri, Sep 18, 2020 at 12:18 PM Mitch Phillips via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> Hi David,
> >>>
> >>>> Does the tagging of these hidden symbols only protect against RW primitives without a similar ldg? If I knew the address of the hidden symbol I could presumably use the same sequence, but I think I'm stretching what memory tagging is supposed to protect against.
> >>>
> >>> I might be missing your point here - but don't forget that the local globals are always PC-relative direct loads/stores. The `ldg` sequence in that example can only be used to get `&g` (and nothing else). There shouldn't be any `ldg`'s of arbitrary addresses (unless an attacker already has control of the instruction pointer, which means they've already bypassed MTE).
> >>>
> >>>> Does this mean that the value of array_end must have the same tag as array[]. Then &array_end would have a different tag since it's a different global?
> >>>
> >>> Yes, exactly.
> >>>
> >>>> For example you might assign tag 1 to array, then tag 2 to array_end. Which means that array_end has a tag of 2 and so does array[16].
> >>>> (assuming they're sequential)
> >>>>   | array           | array_end/array[16] |
> >>>>   | <1> <1> <1> <1> | <2>                 |
> >>>>
> >>>> So if we just did a RELATIVE relocation then array_end's value would have a tag of 2, so you couldn't do:
> >>>>   for (int* ptr = array; ptr != array_end; ++ptr)
> >>>> since it's always != due to the tags.
> >>>> Do I have that right?
> >>>
> >>> Yep - you've got it right; this is why we need TAGGED_RELATIVE. For clarity, here's the memory layout where array_end is relocated using TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:
> >>>
> >>>              |   array   | array_end              | (padding) |
> >>>   Memory Tag |  0x1  0x1 | 0x2                    | 0x2       |
> >>>   Value      |  0 0  0 0 | (0x1 << 56)|&array[16] | 0 0       |
> >>>
> >>> So the address tag of `array` and `array_end` are the same (only `&array_end` has a memory tag of 0x2), and thus `for (int* ptr = array; ptr != array_end; ++ptr)` works normally.
> >>>
> >>>> Also, if you have this same example but the array got rounded up to the nearest granule, e.g. (4-byte ints, 16-byte granules):
> >>>>   int array[3]; // rounded up to be array[4]
> >>>>   int* array_end = &array[3];
> >>>> Would you emit a normal RELATIVE relocation for array_end, because it's within the bounds of the rounded-up array, or a TAGGED_RELATIVE relocation, because it's out of bounds of the original size of the array?
> >>>> (I don't think doing the former is a problem but I'm not a linker expert)
> >>>
> >>> At this stage, this would generate a TAGGED_RELATIVE. We expect TAGGED_RELATIVE to be relatively scarce, and coming up with a more complex scheme for the linker to optimise this edge case where it's in bounds of the granule padding (but not the symbol itself) seems over the top. That said, it's a possibility for later revisions.
> >>
> >> The plan calls for:
> >>> Realign to granule size (16 bytes), resize to multiple of granule size (e.g. 40B -> 48B).
> >> so this would never happen.
> >>
> >> The symbols are resized in order to prevent smaller untagged symbols from getting into the padding of the 16-byte-aligned tagged ones.
> >> I'm not sure if it's desirable to change the symbol size just for this reason. The linker could always suppress such packing for STO_TAGGED symbols.
> >>
> >> In any case, since all sizes and alignments are known, the compiler should be allowed to emit RELATIVE in the rounded-up array case.
> >>
> >>> On Fri, Sep 18, 2020 at 4:10 AM David Spickett <david.spickett at linaro.org> wrote:
> >>>>
> >>>> Hi Mitch,
> >>>>
> >>>> In the intro you say:
> >>>>> It would also allow attackers with a semilinear RW primitive to trivially attack global variables if the offset is controllable. Dynamic global tags are required to provide the same MTE mitigation guarantees that are afforded to stack and heap memory.
> >>>>
> >>>> Then later:
> >>>>> b) Hidden Symbols (static int g; or -fvisibility=hidden)
> >>>>> Materialization of hidden symbols now fetches and inserts the memory tag via `ldg`. On aarch64, this means non-PC-relative loads/stores/address-taken (*g = 7;) generates:
> >>>>>   adrp x0, g;
> >>>>>   ldg  x0, [x0, :lo12:g]; // new instruction
> >>>>>   mov  x1, #7;
> >>>>>   str  x1, [x0, :lo12:g];
> >>>>
> >>>> Does the tagging of these hidden symbols only protect against RW primitives without a similar ldg? If I knew the address of the hidden symbol I could presumably use the same sequence, but I think I'm stretching what memory tagging is supposed to protect against. Mostly wanted to check I understood.
> >>>>
> >>>> Speaking of understanding...
> >>>>
> >>>>> Introduce a TAGGED_RELATIVE relocation - in order to solve the problem where the tag derivation shouldn't be from the relocation result, e.g.
> >>>>>   static int array[16] = {};
> >>>>>   // array_end must have the same tag as array[]. array_end is out of
> >>>>>   // bounds w.r.t.
> >>>>>   // array, and may point to a completely different global.
> >>>>>   int *array_end = &array[16];
> >>>>
> >>>> Does this mean that the value of array_end must have the same tag as array[]? Then &array_end would have a different tag since it's a different global?
> >>>>
> >>>> For example you might assign tag 1 to array, then tag 2 to array_end. Which means that array_end has a tag of 2 and so does array[16]. (assuming they're sequential)
> >>>>   | array           | array_end/array[16] |
> >>>>   | <1> <1> <1> <1> | <2>                 |
> >>>>
> >>>> So if we just did a RELATIVE relocation then array_end's value would have a tag of 2, so you couldn't do:
> >>>>   for (int* ptr = array; ptr != array_end; ++ptr)
> >>>> since it's always != due to the tags.
> >>>>
> >>>> Do I have that right?
> >>>>
> >>>> Also, if you have this same example but the array got rounded up to the nearest granule, e.g. (4-byte ints, 16-byte granules):
> >>>>   int array[3]; // rounded up to be array[4]
> >>>>   int* array_end = &array[3];
> >>>>
> >>>> Would you emit a normal RELATIVE relocation for array_end, because it's within the bounds of the rounded-up array, or a TAGGED_RELATIVE relocation, because it's out of bounds of the original size of the array?
> >>>> (I don't think doing the former is a problem but I'm not a linker expert)
> >>>>
> >>>> Thanks,
> >>>> David Spickett.
> >>>>
> >>>> On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>>>
> >>>>> Hi folks,
> >>>>>
> >>>>> ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware extension that allows for detection of memory safety bugs (buffer overflows, use-after-free, etc.) with low overhead. So far, MTE support is implemented in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for heap, and stack allocation is implemented in LLVM/Clang behind -fsanitize=memtag.
> >>>>>
> >>>>> As part of a holistic MTE implementation, global memory should also be properly tagged. HWASan (a software-only implementation of MTE) has a schema that uses static tags; however, these can be trivially determined by an attacker with access to the ELF file. This would allow attackers with arbitrary read/write to trivially attack global variables. It would also allow attackers with a semilinear RW primitive to trivially attack global variables if the offset is controllable. Dynamic global tags are required to provide the same MTE mitigation guarantees that are afforded to stack and heap memory.
> >>>>>
> >>>>> We've got a plan in mind about how to do MTE globals with fully dynamic tags, but we'd love to get feedback from the community. In particular, we'd like to try and align implementation details with GCC, as the scheme requires cooperation from the compiler, linker, and loader.
> >>>>>
> >>>>> Our current ideas are outlined below. All the compiler features (including realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of read-only globals would be enabled by default, but can be disabled at compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).
> >>>>>
> >>>>> a) Dynamic symbols (int f; extern int f;)
> >>>>>
> >>>>> Mark all tagged global data symbols in the dynamic symbol table as st_other.STO_TAGGED.
> >>>>>
> >>>>> Teach the loader to read the symbol table at load time (and dlopen()) prior to relocations, and apply random memory tags (via `irg -> stg`) to each STO_TAGGED-carrying global.
> >>>>>
> >>>>> b) Hidden Symbols (static int g; or -fvisibility=hidden)
> >>>>>
> >>>>> Have the compiler mark hidden tagged globals in the symbol table as st_other.STO_TAGGED.
> >>>>>
> >>>>> Have the linker read the symbol table and create a table of {unrelocated virtual address, size} pairs for each STO_TAGGED-carrying hidden global, storing this in a new section (.mteglobtab).
> >>>>>
> >>>>> Create a new dynamic entry "DT_MTEGLOBTAB" that points to this segment, along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ" for the size (in bytes) of the table.
> >>>>>
> >>>>> Similar to dynamic symbols, teach the loader to read this table and apply random memory tags to each global prior to relocations.
> >>>>>
> >>>>> Materialization of hidden symbols now fetches and inserts the memory tag via `ldg`. On aarch64, this means non-PC-relative loads/stores/address-taken (*g = 7;) generates:
> >>>>>   adrp x0, g;
> >>>>>   ldg  x0, [x0, :lo12:g]; // new instruction
> >>>>>   mov  x1, #7;
> >>>>>   str  x1, [x0, :lo12:g];
> >>>>>
> >>>>> Note that this materialization sequence means that executables built with MTE globals are not able to run on non-MTE hardware.
> >>>>>
> >>>>> Note: some dynamic symbols can be transformed at link time into hidden symbols if:
> >>>>>
> >>>>> The symbol is in an object file that is statically linked into an executable and is not referenced in any shared libraries, or
> >>>>>
> >>>>> The symbol has its visibility changed with a version script.
> >>>>>
> >>>>> These globals always have their addresses derived from a GOT entry, and thus have their address tag materialized through the RELATIVE relocation of the GOT entry. Due to the lack of a dynamic symbol table entry, however, the memory would go untagged. The linker must ensure it creates an MTEGLOBTAB entry for all hidden MTE globals, including those that are transformed from external to hidden. DSOs linked with -Bsymbolic retain their dynamic symbol table entries, and thus require no special handling.
> >>>>>
> >>>>> c) All symbols
> >>>>>
> >>>>> Realign to granule size (16 bytes), resize to multiple of granule size (e.g. 40B -> 48B).
> >>>>>
> >>>>> Ban data folding (except where contents and size are the same; no tail merging).
> >>>>>
> >>>>> In the loader, ensure writable segments (and possibly .rodata, see the next point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the mappings filled from the file), as file-based mappings aren't necessarily backed by tag-capable memory. This also requires in-place remapping of data segments from the program image (as they're already mapped by the kernel before PT_INTERP invokes the loader).
> >>>>>
> >>>>> Make .rodata protection optional. When read-only protection is in use, the .rodata section should be moved into a separate segment. For Bionic libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd like to be able to maintain page sharing for the remaining 189KiB of other read-only data in this segment.
> >>>>>
> >>>>> d) Relocations
> >>>>>
> >>>>> GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would be required to retrieve and insert the memory tag of the symbol into the relocated value. For example, the ABS64 relocation becomes:
> >>>>>   sym_addr = get_symbol_address()        // sym_addr = 0x1008
> >>>>>   sym_addr |= get_tag(sym_addr & ~0xf)   // get_tag(0x1008 & ~0xf == 0x1000)
> >>>>>   *r_offset = sym_addr + r_addend;
> >>>>>
> >>>>> Introduce a TAGGED_RELATIVE relocation - in order to solve the problem where the tag derivation shouldn't be from the relocation result, e.g.
> >>>>>   static int array[16] = {};
> >>>>>   // array_end must have the same tag as array[]. array_end is out of
> >>>>>   // bounds w.r.t. array, and may point to a completely different global.
> >>>>>   int *array_end = &array[16];
> >>>>>
> >>>>> TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset == &array[16]), and keeps the address where the tag should be derived from in the addend (RELA-only, r_addend == &array[0]).
> >>>>>
> >>>>> For derived symbols where the granule-aligned address is in bounds of the symbol (e.g. array_end = &array[7] implies the tag can be derived from (&array[7] & ~0xf)), we can use a normal RELATIVE relocation.
> >>>>>
> >>>>> The TAGGED_RELATIVE operation looks like:
> >>>>>   *r_offset |= get_tag(r_addend & ~0xf);
> >>>>>
> >>>>> ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to grab the place's memory tag before use, as the place itself may be tagged. So, for example, the TAGGED_RELATIVE operation above actually becomes:
> >>>>>   r_offset = ldg(r_offset);
> >>>>>   *r_offset |= get_tag(r_addend & ~0xf);
> >>>>>
> >>>>> Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the 9-bit immediate for the LDG instruction. This isn't MTE-globals specific; we just seem to be missing the relocation to encode the 9-bit immediate for LDG at bits [12..20]. This would save us an additional ADD instruction in the inline-LDG sequence for hidden symbols.
> >>>>>
> >>>>> We considered a few other schemes, including:
> >>>>>
> >>>>> Creating a dynamic symbol table entry for all hidden globals and giving them the same st_other.STO_TAGGED treatment. These entries would not require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8 bytes for the MTEGLOBTAB schema under the small code model). For an AOSP build, using dynamic symbol entries instead of MTEGLOBTAB results in a 2.3MiB code size increase across all DSOs.
> >>>>>
> >>>>> Making all hidden symbol accesses go through a local-GOT.
> >>>>> Requires an extra indirection for all local symbols, resulting in increased cache pressure (and thus decreased performance) over a simple `ldg` of the tag (as the dcache and tag cache are going to be warmed anyway for the load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is backwards compatible, allowing MTE-globals-built binaries to run on old ARM64 hardware (as no incompatible instructions are emitted), the same as heap tagging. Stack tagging requires a new ABI, and we expect the MTE globals scheme to be enabled in partnership with stack tagging, thus we are unconcerned about the ABI requirement for the MTEGLOBTAB scheme.
> >>>>>
> >>>>> Please let us know any feedback you have. We're currently working on an experimental version and will update with any more details as they arise.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Mitch.