thr3ads.net - Nouveau - [Nouveau] MmioTrace: Using the Instruction Decoder, etc. [Oct 2013]

Pekka Paalanen

2013-Oct-19 07:14 UTC

[Nouveau] MmioTrace: Using the Instruction Decoder, etc.

On Fri, 18 Oct 2013 00:11:15 +0400
Eugene Shatokhin <euspectre at gmail.com> wrote:
> Hi,
> 
> Good to know that!
> 
> Yes, it should be faster than page faulting, although I haven't done the
> benchmarking yet. And yes, there is no need to disable all but one CPU. In
> my current implementation, I use an ordered workqueue to send the data to
> the mmapped output buffer (from which they are read in user space), and
> that ensures the order of events is kept. It may be less than ideal, but
> it currently works quite well with network drivers; the performance
> overhead is acceptable there.
Ah, you are not using the ftrace framework nor relayfs? Mmiotrace
used to be relayfs at one point and then converted to ftrace.
> A subtle drawback may be that the system sees the memory reads and writes
> made by the code of the driver directly but if the driver uses some other
> kernel functions, it needs to intercept these calls and determine how they
> access the memory of interest. Theoretically, it could be less accurate
> than page fault handling. A page fault happens no matter if the driver
> accesses the memory directly or via strcpy(), for example. I doubt this
> would be a big problem for tracking the accesses to ioremapped memory
> though.
> 
> Nevertheless, it is manageable, the system already handles string
> functions, for example, and reports appropriate events. The handlers for
> other functions could be added as well. So this just requires a bit more
> maintenance work.
Are you saying that you intercept function calls, and *never* rely
on page faulting?

Does that mean that if a driver does the ugly thing and
dereferences an iomem pointer directly, you won't catch that?
Unfortunately, I think proprietary drivers do such uglies, since
they are x86 and x86_64 only where it works. Or they might have the
iomem accessor functions inlined.

What I had in mind was to still use page faulting to catch the
memory accessing machine instructions, but then use emulation to
execute that instruction with the memory address diverted to the
real ioremapped region instead of the dummy region given to the driver.
Currently for each access, on the page fault, mmiotrace uses single
stepping and page table manipulation to let the instruction run for
real, and immediately afterwards set things back to page faulting.

Sorry, I see my terminology was wrong. I don't think we can avoid
the page faulting, but I'd like to avoid the single-stepping and
page table mangling on the fly. Heh, things are slowly coming back
to me.
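
[Editor's illustration] The difference between the two schemes described above can be modelled in a few lines of user-space Python. This is only a toy sketch, not kernel code: the class `TrappedRegion`, its methods and the `pte_flips` counter are all hypothetical names, and a single flag stands in for the page table entry. It shows that both schemes start from a page fault and log the same events, but only the single-stepping one has to change the page table twice per access.

```python
# Toy user-space model, not kernel code. All names are hypothetical.

class TrappedRegion:
    """A fake MMIO page that is normally kept not-present so accesses fault."""

    def __init__(self, size):
        self.real = bytearray(size)  # stands in for the real ioremapped memory
        self.present = False         # armed: every access raises a "fault"
        self.pte_flips = 0           # number of simulated page table changes
        self.log = []                # traced (kind, offset, value) events

    def _do_access(self, kind, offset, value):
        """Perform the access directly on the backing memory."""
        if kind == "W":
            self.real[offset] = value
            return value
        return self.real[offset]

    def _cpu_access(self, kind, offset, value):
        """What the driver's instruction does: faults unless the page is present."""
        if not self.present:
            raise PermissionError("page fault")
        return self._do_access(kind, offset, value)

    def access_single_step(self, kind, offset, value=None):
        """Fault, mark the page present, re-run the instruction, re-arm."""
        try:
            return self._cpu_access(kind, offset, value)
        except PermissionError:
            self.present = True       # fault handler disarms the page
            self.pte_flips += 1
            result = self._cpu_access(kind, offset, value)  # single-stepped rerun
            self.present = False      # single-step trap re-arms the page
            self.pte_flips += 1
            self.log.append((kind, offset, result))
            return result

    def access_emulated(self, kind, offset, value=None):
        """Fault, then let the handler perform the access; the page stays armed."""
        try:
            return self._cpu_access(kind, offset, value)
        except PermissionError:
            result = self._do_access(kind, offset, value)
            self.log.append((kind, offset, result))
            return result
```

In the single-stepping variant there is a short window in which the page is present, so an access from elsewhere would go untraced. The emulated variant has no such window in this model.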

What do you think, would it still be interesting?
> > Unfortunately, my job exhausts my coding energy, and I haven't even
> > touched mmiotrace in years.
> 
> I understand. I have many other responsibilities too. Code to write, bugs
> to fix, etc. ;-)
> 
> Well, then, when time permits, I'll try to prepare a prototype so that its
> performance and reliability could be evaluated. Hard to tell what the
> numbers will be before that.
> 
> Suggestions, comments and other feedback are welcome of course.
> 
> And, by the way, video drivers do not use SSE and similar instructions when
> accessing ioremapped memory, do they?
> Such things are rare in the kernel and usually frowned upon so I opted not
> to handle them so far in KernelStrider.
I don't really know. I guess everything could be possible in
proprietary drivers, but you can look at the instruction decoding
code in mmiotrace, which digs up the type and size of access and
the value. That has been enough so far.
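
[Editor's illustration] To give an idea of what "digging up the type and size of access" involves, here is a minimal Python sketch that classifies four common x86-64 MOV encodings. It is not mmiotrace's decoder: the function name `classify_mov` and the table are hypothetical, only one optional 0x66 prefix and one optional REX prefix are recognised, and the accessed value is not extracted at all.

```python
# Minimal sketch, not mmiotrace's decoder. Covers only four MOV opcodes.

MOV_OPCODES = {
    0x88: ("write", True),   # MOV r/m8, r8
    0x89: ("write", False),  # MOV r/m16/32/64, r16/32/64
    0x8A: ("read", True),    # MOV r8, r/m8
    0x8B: ("read", False),   # MOV r16/32/64, r/m16/32/64
}


def classify_mov(code):
    """Return (access_type, width_in_bytes), or None if not a handled memory MOV."""
    i = 0
    opsize16 = False
    rex_w = False
    if i < len(code) and code[i] == 0x66:          # operand-size override prefix
        opsize16 = True
        i += 1
    if i < len(code) and 0x40 <= code[i] <= 0x4F:  # REX prefix, bit 3 is REX.W
        rex_w = bool(code[i] & 0x08)
        i += 1
    if i + 1 >= len(code) or code[i] not in MOV_OPCODES:
        return None
    if code[i + 1] >> 6 == 3:                      # ModRM.mod == 3: no memory operand
        return None
    kind, byte_op = MOV_OPCODES[code[i]]
    if byte_op:
        width = 1
    elif rex_w:
        width = 8                                  # REX.W wins over the 0x66 prefix
    elif opsize16:
        width = 2
    else:
        width = 4
    return kind, width
```

For example, the bytes `48 89 18` (`mov %rbx, (%rax)`) classify as an 8-byte write, and `8b 08` (`mov (%rax), %ecx`) as a 4-byte read. Anything the table does not cover returns `None`, which corresponds to the "access we couldn't handle" case a tracer has to report.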


Thanks,
pq
> 2013/10/17 Pekka Paalanen <pq at iki.fi>
> 
> > On Mon, 14 Oct 2013 22:45:09 +0400
> > Eugene Shatokhin <euspectre at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > There is an interesting TODO item on MmioTraceDeveloper page:
> > > "kprobes has a generic instruction decoding facility, use that instead
> > > of homebrewn (or KVM), and use emulation instead of page faulting"
> > >
> > > Actually, I have done something similar in one of my systems,
> > > KernelStrider (http://code.google.com/p/kernel-strider/). The system
> > > instruments a kernel module when that module is being loaded. The
> > > instrumented code executes instead of the original one and provides
> > > information about the memory accesses it makes and the functions it
> > > calls. These data are sent to user space for further analysis.
> > >
> > > Currently, I use this system to detect data races in the Linux kernel
> > > (and have found some). I suppose, it could probably be useful to
> > > MmioTrace as well.
> > >
> > > KernelStrider uses an enhanced version of the x86 instruction decoder
> > > that Kprobes use and relies on binary instrumentation rather than on
> > > page faults. So, it can track:
> > > - memory accesses (address and size of the accessed memory as well as
> > >   the access type are recorded)
> > > - function calls (exported functions and callbacks, one can setup pre-
> > >   and post- handlers for these)
> > >
> > > Is there any interest in trying this approach to the task of MmioTrace?
> > >
> > > If so, we can discuss it. When I have time, I could try to create a
> > > prototype based on KernelStrider's core that tracks the memory accesses
> > > Mmiotrace needs.
> > > What do you think?
> >
> > Hi Eugene,
> >
> > that is very interesting! I assume emulating the instructions is
> > not only cleaner, but also faster than page-faulting, right? Maybe
> > even more reliable, perhaps up to the point where we would not need
> > to disable all but one CPU.
> >
> > Unfortunately, my job exhausts my coding energy, and I haven't even
> > touched mmiotrace in years.
> >
> > However, let's see if there are interested people on the mailing
> > lists. I'm CC'ing nouveau, since that is where mmiotrace started,
> > and dri-devel in the hopes to catch other drivers' reverse
> > engineers.
> >

Eugene Shatokhin

2013-Oct-19 13:12 UTC

[Nouveau] MmioTrace: Using the Instruction Decoder, etc.

Hi,
> Ah, you are not using the ftrace framework nor relayfs? Mmiotrace
> used to be relayfs at one point and then converted to ftrace.

Yes, I considered these when I started working on KernelStrider but finally
borrowed ideas from Perf and implemented them. A mmapped ring buffer does
its job well and has a higher throughput than Ftrace in my case.
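
[Editor's illustration] A ring buffer of the general kind mentioned here can be sketched as follows. This is neither Perf's nor KernelStrider's actual layout: the class `Ring` and its fields are hypothetical, the storage is a Python list rather than a shared mmapped page, and the memory barriers a real single-producer/single-consumer buffer needs are left out. Dropping and counting events when the buffer is full is one possible policy, assumed here only for the sake of the example.

```python
# Toy single-producer/single-consumer ring. Names and drop policy are assumptions.

class Ring:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # total events written; advanced only by the producer
        self.tail = 0   # total events read; advanced only by the consumer
        self.lost = 0   # events dropped because the buffer was full

    def put(self, event):
        """Producer side: store one event, or count it as lost if full."""
        if self.head - self.tail == len(self.slots):
            self.lost += 1
            return False
        self.slots[self.head % len(self.slots)] = event
        self.head += 1
        return True

    def drain(self):
        """Consumer side: return all pending events in the order written."""
        out = []
        while self.tail != self.head:
            out.append(self.slots[self.tail % len(self.slots)])
            self.tail += 1
        return out
```

Because there is exactly one writer and events are read back in index order, the consumer sees them in the order they were submitted.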
> Are you saying that you intercept function calls, and *never* rely
> on page faulting?
The system intercepts both function calls *and* memory operations made by
the driver itself. Yes, it never relies on page faulting.

> Does that mean that if a driver does the ugly thing and
> dereferences an iomem pointer directly, you won't catch that?

It will be caught.

What my system actually does is as follows.

When the target kernel module has been loaded into memory but before it has
begun its initialization, KernelStrider processes it, function after
function. It creates an instrumented variant of each function in the module
mapping space and places a jump at the beginning of the original function
to point to the instrumented one. After instrumentation is done, the target
driver may start executing.

If some original function of the driver contained, say,

  mov 0xabcd (%rax), %rsi
  mov %rbx, 0xbeeffeed (%rsi)

that will be transformed to something like

  lea  0xabcd (%rax), %rbx
  mov %rbx, <local_storage1>
  mov 0xabcd (%rax), %rsi
  lea  0xbeeffeed (%rsi), %rbx
  mov %rbx, <local_storage2>
  mov %rbx, 0xbeeffeed (%rsi)
  ...
  <send the local_storage to the output system>

That is, the address which is about to be accessed is determined and stored
in 'local_storage', a special memory structure. At the end of the block of
instructions, the information from the local storage is sent to the output
system. So the addresses and sizes of the accessed memory areas as well as
the types of the accesses (read/write/update) will be available for reading
from the user space.

It is actually more complex than that (KernelStrider has to deal with
register allocation, relocations and other things) but the principle is as
I described.
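
[Editor's illustration] The principle can be modelled in Python with registers and memory as plain dictionaries. This is only a sketch under several assumptions: the function name `instrumented_block` is hypothetical, displacements are treated as plain offsets, both accesses are taken to be 8 bytes wide, and %rdx is used as the source of the store, as in the correction posted in the follow-up message. Saving and restoring the scratch register, which the real system must handle, is ignored.

```python
# Toy model of the instrumented block. Register and memory state are plain dicts.

def instrumented_block(regs, memory, output):
    """Record each effective address before the access, then flush at block end."""
    local_storage = []

    # lea 0xabcd(%rax), %rbx ; mov %rbx, <local_storage1>
    regs["rbx"] = regs["rax"] + 0xABCD
    local_storage.append(("read", regs["rbx"], 8))
    # mov 0xabcd(%rax), %rsi   (the original load, unchanged)
    regs["rsi"] = memory.get(regs["rax"] + 0xABCD, 0)

    # lea 0xbeeffeed(%rsi), %rbx ; mov %rbx, <local_storage2>
    regs["rbx"] = regs["rsi"] + 0xBEEFFEED
    local_storage.append(("write", regs["rbx"], 8))
    # mov %rdx, 0xbeeffeed(%rsi)   (the original store, unchanged)
    memory[regs["rsi"] + 0xBEEFFEED] = regs["rdx"]

    # <send the local_storage to the output system>
    output.extend(local_storage)
```

The model also shows why the source register of the store matters: %rbx is overwritten by the inserted `lea`, so a store that read %rbx would write the recorded address instead of the intended value.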

The function calls are processed too so that we can set our own handlers to
execute at the beginning of a function and right before its exit.

Yes, the functions like read[bwql]() and write[bwlq]() are usually inline
but they pose no problem: on x86 they compile to ordinary MOV instructions
and the like which are handled as I described above.

The instrumented code will access the ioremapped area the same way as the
original code would, no need for single-stepping or emulation in this case.

What I wrote in my previous letter is that there is a special case when the
target driver uses some non-inline function provided by the kernel proper
or by another driver and that function accesses the ioremapped memory area
of interest.

KernelStrider needs to track all such functions in order not to miss some
memory accesses to that ioremapped area. Perhaps that's manageable. There
are not too many such functions, are there?
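
[Editor's illustration] One way to picture such tracking is a table that maps each known external function to the memory events its call implies, consulted from a pre-call handler. The sketch below is hypothetical and does not show KernelStrider's handler API: `CALL_MODELS` and `pre_handler` are made-up names, and `memcpy_fromio()`, `memcpy_toio()` and `memset_io()` are picked only as examples of kernel helpers that touch I/O memory on the caller's behalf. Only the I/O-memory side of each call is reported.

```python
# Hypothetical call models: function name -> events implied by its arguments.

CALL_MODELS = {
    # memcpy_fromio(dst, src, count): reads `count` bytes from I/O memory at src
    "memcpy_fromio": lambda dst, src, count: [("read", src, count)],
    # memcpy_toio(dst, src, count): writes `count` bytes to I/O memory at dst
    "memcpy_toio": lambda dst, src, count: [("write", dst, count)],
    # memset_io(dst, value, count): writes `count` bytes to I/O memory at dst
    "memset_io": lambda dst, value, count: [("write", dst, count)],
}


def pre_handler(name, *args):
    """Return the events implied by a call, or None if the callee is not modelled."""
    model = CALL_MODELS.get(name)
    if model is None:
        return None  # accesses made inside this callee would go unnoticed
    return model(*args)
```

The `None` case is the maintenance burden discussed here: every callee missing from the table is a potential blind spot.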
> I don't really know. I guess everything could be possible in
> proprietary drivers, but you can look at the instruction decoding
> code in mmiotrace, which digs up the type and size of access and
> the value. That has been enough so far.
Yes, I will take a closer look at that part of MmioTrace, thanks for the
pointer.

Regards,

Eugene

Eugene Shatokhin

2013-Oct-19 13:16 UTC

[Nouveau] MmioTrace: Using the Instruction Decoder, etc.

Oh, messed up the registers in the example. Should be like this:

If some original function of the driver contained, say,

  mov 0xabcd (%rax), %rsi
  mov %rdx, 0xbeeffeed (%rsi)

that will be transformed to something like

  lea 0xabcd (%rax), %rbx
  mov %rbx, <local_storage1>
  mov 0xabcd (%rax), %rsi
  lea 0xbeeffeed (%rsi), %rbx
  mov %rbx, <local_storage2>
  mov %rdx, 0xbeeffeed (%rsi)
  ...
  <send the local_storage to the output system>

Regards,
Eugene

Pekka Paalanen

2013-Oct-25 09:08 UTC

[Nouveau] MmioTrace: Using the Instruction Decoder, etc.

On Sat, 19 Oct 2013 17:12:20 +0400
Eugene Shatokhin <euspectre at gmail.com> wrote:
> Hi,
> 
> > Ah, you are not using the ftrace framework nor relayfs? Mmiotrace
> > used to be relayfs at one point and then converted to ftrace.
> 
> Yes, I considered these when I started working on KernelStrider but finally
> borrowed ideas from Perf and implemented them. A mmapped ring buffer does
> its job well and has a higher throughput than Ftrace in my case.
> 
> > Are you saying that you intercept function calls, and *never* rely
> > on page faulting?
> 
> The system intercepts both function calls *and* memory operations made by
> the driver itself. Yes, it never relies on page faulting.
> 
> > Does that mean that if a driver does the ugly thing and
> > dereferences an iomem pointer directly, you won't catch that?
> 
> It will be caught.
> 
> What my system actually does is as follows.
> 
> When the target kernel module has been loaded into memory but before it has
> begun its initialization, KernelStrider processes it, function after
> function. It creates an instrumented variant of each function in the module
> mapping space and places a jump at the beginning of the original function
> to point to the instrumented one. After instrumentation is done, the target
> driver may start executing.
Oh, that works in a completely different way than I ever imagined,
a whole other level of complexity.


<...snip code you corrected in another email>
> That is, the address which is about to be accessed is determined and stored
> in 'local_storage', a special memory structure. At the end of the block of
> instructions, the information from the local storage is sent to the output
> system. So the addresses and sizes of the accessed memory areas as well as
> the types of the accesses (read/write/update) will be available for reading
> from the user space.
Just curious, how do you distinguish interesting instructions to
instrument from uninteresting ones that do not access mmio
areas?

Does it rely on post-processing, in that you instrument practically
everything, and then in post-processing you check if the accessed
memory address actually was interesting before sending the data to user
space?
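
[Editor's illustration] The post-processing variant asked about here could, in principle, be as simple as an interval-overlap test against the ioremapped ranges. The thread does not say how KernelStrider actually does this, so the following Python sketch is purely an assumption: `overlaps`, `filter_mmio` and the `(kind, address, size)` event tuples are hypothetical names and formats.

```python
# Hypothetical filter: keep only events that touch a known ioremapped range.

def overlaps(start, size, regions):
    """True if [start, start + size) intersects any (base, length) region."""
    return any(start < base + length and base < start + size
               for base, length in regions)


def filter_mmio(events, regions):
    """Keep the (kind, address, size) events that touch an MMIO region."""
    return [ev for ev in events if overlaps(ev[1], ev[2], regions)]
```

For instance, with a single 4 KiB region at `0xF0000000`, a 4-byte read at `0xF0000010` is kept, an 8-byte write at `0x1000` is dropped, and an 8-byte write at `0xF0000FFC` is kept because it starts inside the region even though it runs past its end.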
> It is actually more complex than that (KernelStrider has to deal with
> register allocation, relocations and other things) but the principle is as
> I described.
> 
> The function calls are processed too so that we can set our own handlers to
> execute at the beginning of a function and right before its exit.
> 
> Yes, the functions like read[bwql]() and write[bwlq]() are usually inline
> but they pose no problem: on x86 they compile to ordinary MOV instructions
> and the like which are handled as I described above.
> 
> The instrumented code will access the ioremapped area the same way as the
> original code would, no need for single-stepping or emulation in this case.
That is very cool, the possibility never even occurred to me.
> What I wrote in my previous letter is that there is a special case when the
> target driver uses some non-inline function provided by the kernel proper
> or by another driver and that function accesses the ioremapped memory area
> of interest.
> 
> KernelStrider needs to track all such functions in order not to miss some
> memory accesses to that ioremapped area. Perhaps that's manageable. There
> are not too many such functions, are there?
I don't really know, and personally I was never even interested,
since the page faulting approach was a catch-all method. We
could even detect when we hit some access we couldn't handle right
due to lacking instruction decoding.

I guess to be sure your approach does not miss anything, we'd still
need the page faulting setup as a safety net to know when or if
something is missed, right? And somehow have the instrumented code
circumvent it.

We could use some comments from the real reverse-engineers. I used
to be mostly a tool writer.


Thanks,
pq
