thr3ads.net - llvm dev - [llvm-dev] RFC: Generalize means the sanitizers work with memory [Feb 2017]

Ivan A. Kosarev via llvm-dev

2017-Feb-23 18:16 UTC

[llvm-dev] RFC: Generalize means the sanitizers work with memory

RFC: Generalize means the sanitizers work with memory

Overview
=======
Currently, LLVM sanitizers such as Asan and Tsan are tied to a specific
memory model that relies on the presence of hardware support for virtual
memory. This prevents sanitizers from being used on platforms that lack such
support but are otherwise capable of running sanitized programs. Our research
indicates that adding support for such platforms is possible with a relatively
small number of changes to the sanitizers' source code and zero performance
and size penalty on currently supported systems. We also found that these
changes clarify and formalize the functional and performance dependencies
between sanitizers and system memory, so they can be considered an improvement
in terms of design and readability regardless of the added capabilities. One
can think of it as a zero-cost abstraction layer.


The Approach
===========
To support platforms that do not have hardware virtual memory managers,
we need to introduce the concept of physical memory pages that serve as the
storage for data that sanitizers currently read and write through virtual
addresses. Once physical memory is a separate concept, every access to
virtual memory has to translate the given virtual address to a physical
one. For example, this check:

    *(u8 *)MEM_TO_SHADOW(allocated) == 0

becomes:

    *MEM_TO_PSHADOW(allocated) == 0

where the MEM_TO_PSHADOW(mem) macro is defined as:

    #define MEM_TO_PSHADOW(mem) VSHADOW_TO_PSHADOW(MEM_TO_VSHADOW(mem))
    #define MEM_TO_VSHADOW(mem) /* Whatever currently MEM_TO_SHADOW() is. */

The VSHADOW_TO_PSHADOW(vs) macro returns a pointer to a byte within the
physical page that corresponds to the given virtual address, and allocates
this page if it has not been allocated before. On platforms that leverage
hardware virtual memory managers this macro returns the virtual address as a
physical one:

    #define VSHADOW_TO_PSHADOW(vs) (reinterpret_cast<u8*>((vs)))

Physical pages are required to be aligned to their size. The size of a
physical page is a multiple of the shadow memory granularity (8 bytes for
Asan) and not less than the size of the widest scalar access we have to
support (16 bytes). This makes finding page offsets trivial, which we need to
implement RTL functions efficiently. It also simplifies handling of aligned
accesses to physical memory, as they are known not to cross the bounds of
physical pages. Note that RTL functions have to be fixed so that they do not
rely on a specific size, location or order of physical pages.
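
To make the software case concrete, here is a minimal sketch of what a
VSHADOW_TO_PSHADOW()-style translation could look like on a system with a
small static pool of physical pages. The page size, the pool size, the table
layout and the names kPShadowPageSize, g_pool, g_pool_used and g_page_table
are assumptions made for illustration only; they are not taken from the
patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using uptr = std::uintptr_t;
using u8 = std::uint8_t;

// Hypothetical parameters. The page size is a power of two, a multiple of
// the 8-byte shadow granularity and not smaller than a 16-byte access.
constexpr uptr kPShadowPageSize = 256;
constexpr uptr kPoolPages = 64;       // physical pages available
constexpr uptr kVShadowPages = 1024;  // virtual shadow range, in pages

// A physical page, aligned to its own size as the text requires.
struct alignas(kPShadowPageSize) PShadowPage {
  u8 bytes[kPShadowPageSize];
};

static PShadowPage g_pool[kPoolPages];            // zero-initialized storage
static uptr g_pool_used = 0;                      // bump-allocator cursor
static PShadowPage *g_page_table[kVShadowPages];  // nullptr = not allocated

// Returns a pointer to the physical byte backing virtual shadow address
// `vs`, taking a fresh zeroed page from the pool on first touch.
inline u8 *VShadowToPShadow(uptr vs) {
  const uptr page_no = vs / kPShadowPageSize;
  const uptr offset = vs & (kPShadowPageSize - 1);
  if (page_no >= kVShadowPages) std::abort();  // outside the shadow range
  PShadowPage *&page = g_page_table[page_no];
  if (!page) {
    if (g_pool_used == kPoolPages) std::abort();  // out of physical pages
    page = &g_pool[g_pool_used++];
  }
  return page->bytes + offset;
}

#define VSHADOW_TO_PSHADOW(vs) VShadowToPShadow((vs))
```

Because the page size is a power of two, the offset within a page is a single
mask operation, and only pages that are actually touched consume physical
memory.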

In addition to the facilities that allow handling of individual accesses to
the virtual memory we also need a set of functions that efficiently perform
operations on specified ranges of virtual addresses:

// Fills a range of virtual memory with a given value. May release zeroed
// pages. For DFsan we may need a version of this function that takes 16-bit
// values to fill with.
void vshadow_memset(uptr vs, u8 value, uptr size);

// Similarly to vshadow_memset(), this function fills a range of virtual
// memory with a given value and additionally claims that range as read-only
// so the memory manager is not required to support modifying accesses for
// these addresses.
void fill_rodata_vshadow(uptr vs, u8 value, uptr size);

// Copies potentially overlapping memory regions.
void vshadow_memmove(uptr dest, uptr src, uptr size);

// Returns the virtual address of the first non-zero byte in a given virtual
// address range. Can also be used to test for zeroed regions.
uptr find_non_zero_vshadow_byte(uptr vs, uptr size);

// Explicitly releases pages that fit the specified range.
void release_vshadow(uptr vs, uptr size);
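
As an illustration of how such range functions can walk a paged shadow, here
is a rough sketch of vshadow_memset() and find_non_zero_vshadow_byte() on top
of a hypothetical page table (g_pages) in which a missing page reads as zero.
The page size, the container and the choice to return the end of the range
when no non-zero byte is found are assumptions; the declarations above do not
specify them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

using uptr = std::uintptr_t;
using u8 = std::uint8_t;

constexpr uptr kPageSize = 256;  // hypothetical physical shadow page size

// Hypothetical page table: page number -> page bytes. Absent = all zeros.
static std::unordered_map<uptr, std::vector<u8>> g_pages;

void vshadow_memset(uptr vs, u8 value, uptr size) {
  const uptr end = vs + size;
  while (vs < end) {
    const uptr page_no = vs / kPageSize;
    const uptr offset = vs % kPageSize;
    const uptr chunk = std::min(kPageSize - offset, end - vs);
    if (value == 0 && chunk == kPageSize) {
      g_pages.erase(page_no);  // whole page zeroed: release it
    } else {
      auto it = g_pages.find(page_no);
      // Zeroing part of an absent page is a no-op; otherwise allocate.
      if (it == g_pages.end() && value != 0)
        it = g_pages.emplace(page_no, std::vector<u8>(kPageSize, 0)).first;
      if (it != g_pages.end())
        std::memset(it->second.data() + offset, value, chunk);
    }
    vs += chunk;
  }
}

// Assumption: returns vs + size when the whole range reads as zero.
uptr find_non_zero_vshadow_byte(uptr vs, uptr size) {
  const uptr end = vs + size;
  while (vs < end) {
    const uptr offset = vs % kPageSize;
    const uptr chunk = std::min(kPageSize - offset, end - vs);
    auto it = g_pages.find(vs / kPageSize);
    if (it != g_pages.end()) {  // absent pages are skipped without a scan
      const u8 *bytes = it->second.data() + offset;
      for (uptr i = 0; i < chunk; ++i)
        if (bytes[i] != 0) return vs + i;
    }
    vs += chunk;
  }
  return end;
}
```

Both functions process one physical page per iteration, so they never depend
on pages being contiguous or ordered in physical memory.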


The Proof-of-Concept Patch
=========================
To make sure the approach is feasible we have prepared a patch that
changes the Asan and Tsan RTL and instrumentation parts to translate virtual
shadow memory addresses to physical ones and to mmap() shadow memory as we
access it. This way we simulate a software virtual memory manager that
allocates physical storage for shadow memory on demand.

We used that as a mock RTL for the sanitizer tests. With this mock in place
we pass all Tsan tests and fail 3 of 610 Asan tests:

test/asan/TestCases/Linux/cuda_test.cc
test/asan/TestCases/Linux/nohugepage_test.cc
test/asan/TestCases/Linux/swapcontext_annotation.cc

The first two tests rely on a specific memory map after initialization of the
shadow memory and the last one takes too long to complete. It would probably
be acceptable to XFAIL them when run with a software memory manager enabled
and then consider ways to adapt them as necessary on a per-test basis.

* * *

With this paper we propose that the changes that make it possible to use
sanitizers on platforms that have no MMUs become part of the mainline.
However, before moving further we would like some feedback from the
community, so comments are very much appreciated.

If the approach is fine, we will prepare a set of patches shortly.

Thank you,

-- 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanitizers-instrumentation.diff
Type: text/x-patch
Size: 10554 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170223/bc9df223/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanitizers-rtl.diff
Type: text/x-patch
Size: 83463 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170223/bc9df223/attachment-0003.bin>

Kostya Serebryany via llvm-dev

2017-Feb-28 01:32 UTC


Hi Ivan,

I've seen your message but have not had a chance to read it carefully, sorry.
Busy weeks.
I may have time next week, or maybe someone else will reply earlier.
Don't hesitate to ping me around the middle of next week.

Some suggestions:
* if you use http://llvm.org/docs/Phabricator.html for patches you are more
  likely to get attention from us.
* be more concrete, e.g. instead of "platforms that lack such support"
  mention exactly which platforms are affected (and what the rest of the
  LLVM support story is for them)

Sean Silva via llvm-dev

2017-Feb-28 02:59 UTC


+Hal

IIRC, Hal mentioned that he did something like this for a no-MMU HPC
environment he was working in.

-- Sean Silva


Hal Finkel via llvm-dev

2017-Mar-09 13:58 UTC


Hi Ivan,

Thanks for posting this; I'm excited by this proposal - if we can get 
this kind of support in without making the implementation 
non-trivially-harder to maintain, that would be a positive development. 
As Sean mentioned, I did something along these lines to adapt ASan to 
the IBM BG/Q - an HPC system that uses a lightweight operating system. 
On the BG/Q, the lightweight operating system does support virtual 
memory for some special-purpose mappings, but it does not support 
mapping unreserved pages (i.e. MAP_NORESERVE is not supported, and this 
functionality is not supported any other way). As a result, the 
mechanism that the sanitizers use to cover the complete address space 
using shadow memory - by mapping a large region of unreserved pages - 
won't work in this environment. Systems without virtual memory at all 
will obviously have the same problem: All shadow memory must be 
physically backed. I'll also mention that many normal Linux HPC 
environments are configured with overcommit turned off, and I believe 
that using the sanitizers in such environments would also currently not 
work.

Because all shadow memory must be physically backed, it must be
allocated judiciously, and the mapping process might need to be more
complicated than a simple shift/offset. On the BG/Q, there were a few 
distinct regions of virtual memory that needed to be mapped into a 
single shadow region in the part of the address space where heap 
allocations could be made - as a result, I used a more-complicated 
mapping function.
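
For readers unfamiliar with the idea, a mapping of this kind can be sketched
as a small region table. This is not the actual BG/Q mapping; the region
boundaries, the shadow base and the names AppRegion, kRegions and
MemToShadow() are invented for illustration. The only thing carried over from
ASan is the 8-to-1 scale between application and shadow bytes.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;

constexpr uptr kShadowScale = 3;  // 8 application bytes per shadow byte

struct AppRegion {
  uptr begin;         // first application address in the region
  uptr end;           // one past the last application address
  uptr shadow_begin;  // where this region's shadow starts
};

// Hypothetical layout: three disjoint application regions whose shadows are
// packed back to back into one contiguous shadow area at 0x40000000.
constexpr AppRegion kRegions[] = {
    {0x00010000, 0x00110000, 0x40000000},  // 1 MiB -> 128 KiB of shadow
    {0x10000000, 0x10200000, 0x40020000},  // 2 MiB -> 256 KiB of shadow
    {0x7ff00000, 0x80000000, 0x40060000},  // 1 MiB -> 128 KiB of shadow
};

// Returns the shadow address for `mem`, or 0 if `mem` is outside every
// region (a real implementation would report an error instead).
inline uptr MemToShadow(uptr mem) {
  for (const AppRegion &r : kRegions)
    if (mem >= r.begin && mem < r.end)
      return r.shadow_begin + ((mem - r.begin) >> kShadowScale);
  return 0;
}
```

Compared with a single shift and offset, the lookup costs a few comparisons
per access, but the shadow area only has to be as large as the regions that
are actually sanitized.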

In this light, I'm trying to understand your proposal. I see that you're
proposing to add support for some kind of additional translation scheme
between virtual addresses and physical addresses, but I'm not exactly
sure how you propose to use it. It might help if you were to provide
some hypothetical implementation of these translations for a simple
system so that we can understand the usage model better. I'd also like
to better understand how the instrumentation works: is the mapping
always replaced by these __asan_mem_to_vshadow/__asan_mem_to_pshadow calls?

Finally, I recommend that we layer this support so that we have:

[regular system] -> [system without (sufficient) unreserved pages] -> 
[system without any mmu]

I'd like a clear explanation of how these last two differ. It looks like 
you have support for manually zeroing pages for the last category. 
Please explain exactly how this scheme works.

Thanks,

Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Ivan A. Kosarev via llvm-dev

2017-Mar-13 19:32 UTC


Hello Hal,

Thanks a lot for your feedback. In particular, I appreciate your mentioning
HPC systems as potential targets for this work, as it helps with figuring
out what the generalized memory interface would look like.

Answering your questions: what we propose with this RFC is to support
platforms for which there is no way to adapt sanitizers by adding a
special mapping or tweaking the MEM_TO_SHADOW() macro. As you already
said, the problem with such platforms is that the amount of available
physical memory is not sufficient to shadow-map the whole address space
to sanitize. The solution we propose is to define a fixed set of entry
points, such as macros and functions, that provide access to physical shadow
memory so that by implementing these macros and functions one can
support sanitizers even on platforms that: 1) do not have hardware
support for virtual memory and 2) can only allocate physical memory in
relatively small pieces whose base addresses are not known at
compile time. This includes implementing a software shadow memory
manager on top of a malloc()-like API. In addition, the resulting
support shall be compact and efficient enough to be practical on such
platforms, and the introduced abstraction layer shall have zero penalty
in terms of code space and performance for the already supported targets.

The proposed approach to the abstraction layer is to provide macros and
functions that perform the necessary operations on physical shadow memory
given virtual shadow addresses or virtual shadow address ranges. For
example, for Asan there is a function VShadowToPShadow() declared as:

u64 *VShadowToPShadow(uptr vs);

that returns a pointer to a physical shadow cell by its virtual address
and makes sure that the piece of physical shadow memory (the physical
shadow page) the address belongs to is allocated and accessible. For
performance reasons there are also block shadow memory functions that
perform various operations over virtual address ranges rather than
individual addresses. There is also a function that explicitly releases
physical shadow memory. That function can be implemented in any way
suitable for a given platform. The only requirement is that
subsequent read accesses to the released shadow memory yield zeros, so
the simplest implementation is zeroing out the specified region.

Please see the updated patch at:

https://reviews.llvm.org/D30583

for details. This patch implements a software shadow memory manager on 
top of Linux mmap(). With this patch we pass Asan and Tsan tests with a 
promising slowdown ratio.

Re: instrumentation: yes, to support platforms that only support 
manual/explicit allocation of physical memory the only way is to 
instrument the code to sanitize with RTL calls.

One important quality of the abstraction layer that we would like to
maintain is that it never requires backward physical-to-virtual
translations, as they may be extremely inefficient in some cases. Since
the sanitizers themselves require shadow-to-application memory
translations to be supported, we have to deal with both the concepts of
virtual and physical shadow addresses. Since the abstraction layer
operates in terms of virtual shadow addresses, it does not affect how
application memory addresses translate to virtual shadow addresses. This
means one can choose whatever mapping works best for the platform and
then decide whether to rely on hardware-driven allocation of physical
pages or implement a custom software memory manager.
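
A rough sketch of this layering, under stated assumptions: the configuration
macro SANITIZER_SOFTWARE_SHADOW and the function SoftwareVShadowToPShadow()
are hypothetical names, and the shadow offset is set to the address of a
small static buffer only so that the example can run as an ordinary program.
The shift-by-3 mapping is the usual ASan one.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;
using u8 = std::uint8_t;

#ifndef SANITIZER_SOFTWARE_SHADOW  // hypothetical configuration switch
#define SANITIZER_SOFTWARE_SHADOW 0
#endif

// Demo only: a tiny shadow area standing in for the real shadow region.
static u8 g_shadow[1 << 12];
static const uptr kShadowOffset = reinterpret_cast<uptr>(g_shadow);

// Step 1: application address -> virtual shadow address. The platform
// chooses this mapping independently of how shadow pages are backed.
#define MEM_TO_VSHADOW(mem) (((mem) >> 3) + kShadowOffset)

// Step 2: virtual shadow address -> physical shadow pointer.
#if SANITIZER_SOFTWARE_SHADOW
u8 *SoftwareVShadowToPShadow(uptr vs);  // supplied by the platform port
#define VSHADOW_TO_PSHADOW(vs) SoftwareVShadowToPShadow((vs))
#else
// With a hardware MMU the translation is the identity and costs nothing.
#define VSHADOW_TO_PSHADOW(vs) (reinterpret_cast<u8 *>((vs)))
#endif

#define MEM_TO_PSHADOW(mem) VSHADOW_TO_PSHADOW(MEM_TO_VSHADOW(mem))

// The check from the original RFC, written against the layered macros.
inline bool ShadowByteIsZero(uptr mem) { return *MEM_TO_PSHADOW(mem) == 0; }
```

With the switch off, MEM_TO_PSHADOW() expands to the same shift, add and cast
that MEM_TO_SHADOW() produces today, which is where the zero-penalty claim
for already supported targets comes from.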

Thanks again and please let me know if I can help more.

Regards,
Ivan


On 09/03/17 15:58, Hal Finkel wrote:>
> Hi Ivan,
>
> Thanks for posting this; I'm excited by this proposal - if we can get 
> this kind of support in without making the implementation 
> non-trivially-harder to maintain, that would be a positive 
> development. As Sean mentioned, I did something along these lines to 
> adapt ASan to the IBM BG/Q - an HPC system that uses a lightweight 
> operating system. On the BG/Q, the lightweight operating system does 
> support virtual memory for some special-purpose mappings, but it does 
> not support mapping unreserved pages (i.e. MAP_NORESERVE is not 
> supported, and this functionality is not supported any other way). As 
> a result, the mechanism that the sanitizers use to cover the complete 
> address space using shadow memory - by mapping a large region of 
> unreserved pages - won't work in this environment. Systems without 
> virtual memory at all will obviously have the same problem: All shadow 
> memory must be physically backed. I'll also mention that many normal 
> Linux HPC environments are configured with overcommit turned off, and 
> I believe that using the sanitizers in such environments would also 
> currently not work.
>
> Because all shadow memory must be physically backed, it must be 
> allocated judicially, and the mapping process might need to be more 
> complicated than a simple shift/offset. On the BG/Q, there were a few 
> distinct regions of virtual memory that needed to be mapped into a 
> single shadow region in the part of the address space where heap 
> allocations could be made - as a result, I used a more-complicated 
> mapping function.
>
> In this light, I'm trying to understand your proposal. I see that 
> you're proposing to add support for some kind of additional 
> translation scheme between virtual addresses and physical addresses, 
> but I'm not exactly sure how you propose to use them. It might help if 
> you were to provide some hypothetical implementation of these 
> translations for a simple system so that we can understand the usage 
> model better. I'd also like to better understand how the 
> instrumentation works; is the mapping always replaced by these 
> __asan_mem_to_vshadow/__asan_mem_to_pshadow calls?
>
> Finally, I recommend that we layer this support so that we have:
>
> [regular system] -> [system without (sufficient) unreserved pages] ->
> [system without any mmu]
>
> I'd like a clear explanation of how these last two differ. It looks 
> like you have support for manually zeroing pages for the last 
> category. Please explain exactly how this scheme works.
>
> Thanks,
>
> Hal
>
>
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory