thr3ads.net - llvm dev - [llvm-dev] Extra questions about HWASAN [Sep 2019]

Matthew Malcomson via llvm-dev

2019-Sep-12 16:23 UTC

[llvm-dev] Requesting clarification of some HWASAN behaviours.

Hello,

I'm working on implementing hwasan instrumentation in GCC, and have just
started discussing my current work-in-progress on the gcc-patches
mailing list.
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00387.html -- the email
that Kostya saw and added people to).

I've gotten about as basic a user-space implementation as possible
(using the interceptor ABI) up and running, and would appreciate it if I
could get some clarification on some other parts of hwasan and its use
so I can look into those.

As you may guess by my current task I'm most interested in features that
require some compiler instrumentation.

Most of the questions I have are just double checking the impression I
have from skimming the LLVM source code, but I would very much
appreciate any extra clarification people think would be helpful.

Evgenii's recent reply on the GCC mailing list was very useful, so I
figure there could be a lot of information I don't quite know to ask for
that could be a great help.
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00773.html)

The questions I know I have are:

1) What code-generation differences are there between kernel and
    userspace hwasan?
    From what I can see in the llvm source code there is
    - No C++ exceptions, globals, or module constructor & init functions
    - Kernel has match-all of 0xFF by default.
    - Tagging code differences due to 0xFF in kernel address top byte.
    - Shadow memory is located in a different place.
    Is there anything else I should know around hwasan in the kernel?

2) I believe compiling while ignoring the "short-granule" feature would
    not add any incompatibilities with other code.  Is this correct?

3) Am I right in thinking that longjmp and setjmp are only handled in
    the platform ABI? (I couldn't see any interceptors for them).

Thanks,
Matthew

Evgenii Stepanov via llvm-dev

2019-Sep-12 19:47 UTC

[llvm-dev] Requesting clarification of some HWASAN behaviours.

On Thu, Sep 12, 2019 at 9:23 AM Matthew Malcomson
<Matthew.Malcomson at arm.com> wrote:
> Hello,
>
> I'm working on implementing hwasan instrumentation in GCC, and have just
> started discussing my current work-in-progress on the gcc-patches
> mailing list.
> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00387.html -- the email
> that Kostya saw and added people to).
I must say I really like your plan to reuse parts of HWASan codegen
for MTE in that change. In LLVM, we did hwasan before we had a good
idea of what MTE ISA would look like, and they ended up mostly
independent. I'm considering eventually refactoring HWASan to use more
of MTE codegen, but it does not look like this work would be
justifiable by the performance or code size gains it might bring.
> I've gotten about as basic a user-space implementation as possible
> (using the interceptor ABI) up and running, and would appreciate it if I
> could get some clarification on some other parts of hwasan and its use
> so I can look into those.
>
> As you may guess by my current task I'm most interested in features that
> require some compiler instrumentation.
>
> Most of the questions I have are just double checking the impression I
> have from skimming the LLVM source code, but I would very much
> appreciate any extra clarification people think would be helpful.
>
> Evgenii's recent reply on the GCC mailing list was very useful, so I
> figure there could be a lot of information I don't quite know to ask for
> that could be a great help.
> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00773.html)
>
> The questions I know I have are:
>
> 1) What code-generation differences are there between kernel and
>     userspace hwasan?
>     From what I can see in the llvm source code there is
>     - No C++ exceptions, globals, or module constructor & init functions
>     - Kernel has match-all of 0xFF by default.
>     - Tagging code differences due to 0xFF in kernel address top byte.
>     - Shadow memory is located in a different place.
>     Is there anything else I should know around hwasan in the kernel?
That's about all I can remember.
> 2) I believe compiling while ignoring the "short-granule" feature would
>     not add any incompatibilities with other code.  Is this correct?
Yes, as long as short granules are disabled in the runtime library too,
and no other code in the same process is compiled using short granules
for stack instrumentation. Basically, a short-granule-unaware tag
check will see a short granule as a tag mismatch. That is because with
short granules, the actual tag is stored in the last byte of the
granule, and the memory tag holds the number of valid bytes instead.
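
To illustrate why the two kinds of check disagree, here is a minimal
sketch in C that models a single 16-byte granule. It follows the
short-granule scheme as described in LLVM's HWASan design notes; the
function names, `kExampleGranule`, and the tag values are made up for
illustration and are not the runtime's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { GRANULE_SIZE = 16 };

/* Example data (hypothetical): a granule whose real tag, 0x2a, is kept in
 * its last byte, as it would be for a short granule. */
static const uint8_t kExampleGranule[GRANULE_SIZE] = {[GRANULE_SIZE - 1] = 0x2a};

/* Short-granule-unaware check: the pointer tag must equal the shadow byte. */
static bool check_plain(uint8_t ptr_tag, uint8_t shadow) {
  return ptr_tag == shadow;
}

/* Short-granule-aware check for an access of `size` bytes starting at
 * `offset` inside one granule. */
static bool check_short(uint8_t ptr_tag, uint8_t shadow,
                        const uint8_t granule[GRANULE_SIZE],
                        size_t offset, size_t size) {
  assert(size >= 1 && offset + size <= GRANULE_SIZE);
  if (ptr_tag == shadow)
    return true;                 /* ordinary full-granule match */
  if (shadow >= GRANULE_SIZE)
    return false;                /* shadow holds a real tag that differs */
  if (offset + size > shadow)
    return false;                /* access runs past the valid bytes */
  /* Shadow held a byte count, so the real tag is in the last byte. */
  return granule[GRANULE_SIZE - 1] == ptr_tag;
}
```

With a shadow value of 5 (five valid bytes), `check_plain(0x2a, 5)`
fails even though the pointer carries the right tag, whereas
`check_short(0x2a, 5, kExampleGranule, 0, 4)` passes.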
> 3) Am I right in thinking that longjmp and setjmp are only handled in
>     the platform ABI? (I couldn't see any interceptors for them).
Yes. I think this is a simple omission coming from the fact that the
interceptor ABI has not really seen a lot of use outside of
compiler-rt tests.
>
> Thanks,
> Matthew

Matthew Malcomson via llvm-dev

2019-Sep-20 13:48 UTC

[llvm-dev] Extra questions about HWASAN

Hello again,

I have been thinking more about the GCC implementation of hwasan and
found a few more questions that I would really appreciate help with.

---
I've noticed a match-all condition in the compiler inline
instrumentation, but can't see where it's used in the libhwasan function
call checks.  Is that a planned behaviour or am I just missing the part
in the code where this happens?

From reading the git history I'm guessing it's not in the library since
the feature was introduced for the kernel specifically and the kernel
doesn't use the library ... or is that wild speculation on my part?

---
Would it be OK to add `report_load{size}` functions to the library?  I
notice that LLVM emits an inline-assembler `brk` into the IR for the
inline tag-mismatch report.

I'm a little uncomfortable putting architecture-specific assembly code
into the mid-end of GCC (even though the entire HWASAN is AArch64
specific in GCC) and would like to put some "just report" functions into
libhwasan (in the same manner that libasan has
`__asan_report_(load|store){1,2,4,8,16,_n}{,_noabort}` functions).

Would that be OK?  It's a simple patch that I already have locally.  I
guess tag-mismatch reporting would then contain an extra function in the
stack report?

---
As I understand it, when the mainline kernel gets patched to accept
tagged pointers in syscalls, the relaxed ABI will be behind a `prctl`
call rather than being generically turned on.  I guess this means that
Android will eventually have the same requirement?

If/when that happens, would the initial call to `prctl` be put in
libhwasan, or would something like that be done in Bionic?  (n.b. this
call needs to be done for every program since the setting doesn't
propagate across fork).  I have a patch that adds the relevant `prctl`
call to what `__hwasan_init` does in libhwasan.  I do this because I'm
using a glibc unmodified for hwasan.

---
I've noticed code around maintaining a thread-specific ring-buffer of
the stack frames that have been used (recording the PC and SP) -- it
looks like this is in order to give better messages on tag-mismatch, is
that correct?

---
Would the addition of `longjmp` and `setjmp` to the "interceptor ABI"
build of libhwasan be OK?  I understand that LLVM pretty much only uses
that ABI for testing, but I'm working without a glibc that supports the
"platform ABI".

Thanks,
Matthew

Evgenii Stepanov via llvm-dev

2019-Sep-20 18:07 UTC

[llvm-dev] Extra questions about HWASAN

Hi,

On Fri, Sep 20, 2019 at 6:48 AM Matthew Malcomson
<Matthew.Malcomson at arm.com> wrote:
> Hello again,
>
> I have been thinking more about the GCC implementation of hwasan and
> found a few more questions that I would really appreciate help with.
>
> ---
> I've noticed a match-all condition in the compiler inline
> instrumentation, but can't see where it's used in the libhwasan function
> call checks.  Is that a planned behaviour or am I just missing the part
> in the code where this happens?
>
> From reading the git history I'm guessing it's not in the library since
> the feature was introduced for the kernel specifically and the kernel
> doesn't use the library ... or is that wild speculation on my part?
Yes, I think this is exactly right.
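
As a rough illustration of what the inline check with a match-all tag
amounts to, the following C sketch models the comparison only. It
assumes the AArch64 layout where the tag lives in the top byte of the
pointer; `pointer_tag` and `tag_check_passes` are hypothetical helper
names, not functions from LLVM or libhwasan.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the tag from the top byte of a 64-bit pointer value. */
static uint8_t pointer_tag(uint64_t ptr) {
  return (uint8_t)(ptr >> 56);
}

/* The access is allowed if the pointer tag equals the memory tag, or if a
 * match-all tag is configured and the pointer carries exactly that tag. */
static bool tag_check_passes(uint64_t ptr, uint8_t mem_tag,
                             bool has_match_all, uint8_t match_all_tag) {
  uint8_t tag = pointer_tag(ptr);
  if (tag == mem_tag)
    return true;
  return has_match_all && tag == match_all_tag;
}
```

With a match-all tag of 0xFF, as the kernel uses by default, a pointer
whose top byte is 0xFF passes regardless of the memory tag. A check
implemented without the second condition, as in the library call path
discussed here, would report the same access as a mismatch.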
> ---
> Would it be OK to add `report_load{size}` functions to the library?  I
> notice that LLVM emits an inline-assembler `brk` into the IR for the
> inline tag-mismatch report.
>
> I'm a little uncomfortable putting architecture specific assembly code
> into the mid-end of GCC (even though the entire HWASAN is AArch64
> specific in GCC) and would like to put some "just report" functions into
> libhwasan (in the same manner that libasan has
> `__asan_report_(load|store){1,2,4,8,16,_n}{,_noabort}` functions).
>
> Would that be OK?  It's a simple patch that I already have locally.  I
> guess tag-mismatch reporting would then contain an extra function in the
> stack report?
Does __hwasan_tag_mismatch / __hwasan_tag_mismatch_stub work for you?
The first one has a non-standard ABI, but it can save register state
at the point of the fault in the user code.
> ---
> As I understand it, when the mainline kernel gets patched to accept
> tagged pointers in syscalls, the relaxed ABI will be behind a `prctl`
> call rather than being generically turned on.  I guess this means that
> Android will eventually have the same requirement?
>
> If/when that happens, would the initial call to `prctl` be put in
> libhwasan, or would something like that be done in Bionic?  (n.b. this
> call needs to be done for every program since the setting doesn't
> propagate across fork).  I have a patch that adds the relevant `prctl`
> call to what `__hwasan_init` does in libhwasan.  I do this because I'm
> using a glibc unmodified for hwasan.
I agree, __hwasan_init is the right place for this.
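
For reference, a minimal sketch of such a call is shown below. It
assumes the tagged address ABI interface as it was merged into Linux 5.4
(`PR_SET_TAGGED_ADDR_CTRL` with `PR_TAGGED_ADDR_ENABLE`), which may
differ from the patch discussed here; `enable_tagged_addr_abi` is a
hypothetical helper name, and the fallback definitions are only there
for building against older headers.

```c
#include <sys/prctl.h>

/* Fallbacks for headers that predate the tagged address ABI. */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif

/* Ask the kernel to accept tagged pointers in syscalls for this process.
 * Returns 0 on success, or -1 with errno set if the kernel or the
 * architecture does not support the request. */
static int enable_tagged_addr_abi(void) {
  return prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0);
}
```

An init routine would call this once early on and decide how to proceed
if it returns -1, for example on a kernel without the feature.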
> ---
> I've noticed code around maintaining a thread-specific ring-buffer of
> the stack frames that have been used (recording the PC and SP) -- it
> looks like this is in order to give better messages on tag-mismatch, is
> that correct?
Yes, for stack errors. We emit the tag offset of each local variable
in DWARF debug info, and the ring buffer contains PC, SP and the base
tag for recent stack frames. The code in PrintStackAllocations
combines this info to find possible source(s) of a mismatching pointer
tag.
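
The idea can be sketched in C roughly as follows. This is a simplified
model, not the libhwasan data layout: `FrameRecord`, `FrameRing`, and
the helper names are hypothetical, the capacity is tiny for the sake of
the example, and it assumes that a local variable's tag is the frame's
base tag XORed with the tag offset recorded in the debug info.

```c
#include <stddef.h>
#include <stdint.h>

enum { RING_CAPACITY = 4 };

/* One entry per function entry: where it was and which base tag it used. */
typedef struct {
  uintptr_t pc;
  uintptr_t sp;
  uint8_t base_tag;
} FrameRecord;

/* In a real runtime there would be one of these per thread. */
typedef struct {
  FrameRecord slots[RING_CAPACITY];
  size_t next;   /* slot that the next push overwrites */
  size_t count;  /* number of valid slots, at most RING_CAPACITY */
} FrameRing;

static void ring_push(FrameRing *r, FrameRecord rec) {
  r->slots[r->next] = rec;
  r->next = (r->next + 1) % RING_CAPACITY;
  if (r->count < RING_CAPACITY)
    r->count++;
}

/* Count recorded frames in which a local with `tag_offset` would have
 * carried `ptr_tag` (assumption: local tag = base_tag XOR tag_offset). */
static size_t ring_count_candidates(const FrameRing *r, uint8_t tag_offset,
                                    uint8_t ptr_tag) {
  size_t hits = 0;
  for (size_t i = 0; i < r->count; i++)
    if ((uint8_t)(r->slots[i].base_tag ^ tag_offset) == ptr_tag)
      hits++;
  return hits;
}

/* Demo: record five frames in a four-slot ring (the oldest is evicted),
 * then look for frames that could explain `ptr_tag` for tag offset 3. */
static size_t demo_candidates(uint8_t ptr_tag) {
  FrameRing ring = {0};
  for (unsigned i = 1; i <= 5; i++) {
    FrameRecord rec = {0x1000u * i, 0x7ff000u - 0x40u * i, (uint8_t)(0x10u * i)};
    ring_push(&ring, rec);
  }
  return ring_count_candidates(&ring, 0x03, ptr_tag);
}
```

In the demo, a pointer tag of 0x23 is explained by the frame with base
tag 0x20, while 0x13 finds nothing because the frame with base tag 0x10
has already been overwritten.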
> ---
> Would the addition of `longjmp` and `setjmp` to the "interceptor ABI"
> build of libhwasan be OK?  I understand that LLVM pretty much only uses
> that ABI for testing, but I'm working without a glibc that supports the
> "platform ABI".
Sure, patches are welcome.
> Thanks,
> Matthew
