thr3ads.net - llvm dev - [LLVMdev] Proposal: Improved regression test support for RuntimeDyld/MCJIT. [Jun 2014]

Lang Hames

2014-Jun-23 22:01 UTC

[LLVMdev] Proposal: Improved regression test support for RuntimeDyld/MCJIT.

Hi Everyone,

For your consideration: A proposal to improve regression test support for
RuntimeDyld.

Short version: We can make RuntimeDyld far more testable by adding a
trivial pointer-expression language that allows us to describe how memory
should look post-relocation. Jump down to "The Proposal" for details.

Long version:

Background:

For those unfamiliar with it, RuntimeDyld is a component of MCJIT, LLVM's JIT
compiler infrastructure. MCJIT produces an object file in memory for each
module that is JIT'd. RuntimeDyld's job is to apply all the relocations
necessary to make the code in the object file runnable. In other words,
RuntimeDyld is acting as both the static and dynamic linker for the JIT.

The Problem:

We can't directly test RuntimeDyld at the moment. We currently infer the
correctness of RuntimeDyld indirectly from the success of the MCJIT
regression tests - if they pass, we assume RuntimeDyld must have done its
job right. That's far from ideal. The biggest issues with it are:

(1) Each platform is testing only its own relocations and no others. I.e.
X86 testers are testing X86 relocations only. ARM testers are testing ARM
relocations only. If someone running on X86 breaks a relocation for ARM
they won't see the error in their regression test run - they'll have to
wait until an ARM buildbot breaks before they realize anything is wrong.
Fixes for platforms that you don't have access to are difficult to test -
all you can do is eyeball disassembled memory and see if everything looks
sane. This is not much fun.

(2) Relocations are produced by CodeGen from IR, rather than described
directly. That's a lot of machinery to have between the test-case and the
final result. It is difficult to know what relocations each IR regression
test is testing (and they're often incidental - we don't have a dedicated
relocation test set). This also means that if/when the code generator
produces different relocation types the existing tests will keep on passing
but will silently stop testing the thing they used to test.

The Proposal:

(1) We provide a mechanism for describing how pieces of relocated memory
should look immediately prior to execution, and then inspect the memory
rather than executing it. This addresses point (1) above: Tests for any
platform can be loaded, linked and verified on any platform. If you're
coding on X86 and you break an ARM relocation you'll know about it
immediately.

(2) RuntimeDyld test cases should be written in assembly, rather than IR.
This addresses point (2) above - we can cut the code generators out and
guarantee that we're testing what we're interested in.

The way to do this is to introduce a simple pointer expression language.
This should be able to express things like: "The immediate for this call
points at symbol foo".

Symbolically, what I have in mind would look something like:

        // some asm ...
# assert *(inst1 + 1) = foo
inst1:
        callq   foo
        // some asm...

Here we add the "inst1" label to give us an address from which we can get at
the immediate for the call. The " + 1" expression skips the call opcode (we
know the size of the opcode ahead of time, since this is assembly and so
target-specific).

To verify that constraints expressed in this language hold, we can add an
expression evaluator to the llvm-rtdyld utility, which is a command-line
interface to RuntimeDyld.

I find these things are easier to discuss in the concrete, so I've attached
a basic implementation of this idea. The following discussion is in terms
of my patch, but I'm very open to tweaking all this.

The language I've implemented is:

test = expr '=' expr

expr = '*{' number '}' load_addr_expr
     | binary_expr
     | '(' expr ')'
     | symbol
     | number

load_addr_expr = symbol
               | '(' symbol '+' number ')'
               | '(' symbol '-' number ')'

binary_expr = expr '+' expr
            | expr '-' expr
            | expr '&' expr
            | expr '|' expr
            | expr '<<' expr
            | expr '>>' expr

This expression language supports simple pointer arithmetic, shifting,
masking and loading. All values are internally held as 64-bit unsigned
integers, since RuntimeDyld is designed to support cross-platform linking,
including linking for 64-bit targets from a 32-bit host. I think the only
stand-out wart is the *{#size}<addr> syntax for loads. This comes from the
fact that immediates aren't always 64 bits, so it's not safe to do a 64-bit
load: you could read past the end of allocated memory. The #size field
indicates how many bytes to read.
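
To make the grammar above concrete, here is a minimal sketch of an evaluator for it in Python. It is not the attached patch: the class name ToyChecker, the flat little-endian memory image with a base address, and strict left-to-right evaluation of binary operators (the grammar gives no precedence) are all assumptions made for illustration. For brevity the load address is parsed as any symbol or parenthesized expression, which is slightly more permissive than load_addr_expr.

```python
import re

MASK64 = (1 << 64) - 1  # values are kept as 64-bit unsigned integers

# Tokens: load prefix "*{", shifts, hex/decimal numbers, symbols, single chars.
TOKEN = re.compile(
    r"\s*(\*\{|<<|>>|0[xX][0-9a-fA-F]+|\d+|[A-Za-z_.$][\w.$]*|[-+&|()}])"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_number(tok):
    return int(tok, 16) if tok[:2].lower() == "0x" else int(tok, 10)


class ToyChecker:
    """Hypothetical checker over a flat memory image (not the real patch)."""

    def __init__(self, symbols, image, base):
        self.symbols, self.image, self.base = symbols, image, base

    def check(self, rule):
        # test = expr '=' expr
        lhs, rhs = rule.split("=", 1)
        return self.evaluate(lhs) == self.evaluate(rhs)

    def evaluate(self, text):
        self.toks, self.i = tokenize(text), 0
        value = self.expr()
        if self.i != len(self.toks):
            raise ValueError(f"trailing tokens: {self.toks[self.i:]}")
        return value

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        # Assumption: binary operators apply left to right, wrapping at 64 bits.
        value = self.primary()
        while self.peek() in OPS:
            op = self.take()
            value = OPS[op](value, self.primary()) & MASK64
        return value

    def primary(self):
        tok = self.take()
        if tok == "*{":  # sized load: *{size}addr
            size = parse_number(self.take())
            self.take("}")
            offset = self.primary() - self.base
            if offset < 0 or offset + size > len(self.image):
                raise ValueError("load outside the memory image")
            return int.from_bytes(self.image[offset:offset + size], "little")
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok[0].isdigit():
            return parse_number(tok)
        return self.symbols[tok]  # KeyError for an unknown symbol
```

Given a symbol table and an image, ToyChecker(symbols, image, base).check(rule) returns True when both sides of the rule evaluate to the same 64-bit value. The explicit size in *{size} is what keeps the load from running past the end of the image.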

This patch adds a "-verify" option to llvm-rtdyld to attach the expression
evaluator to a RuntimeDyld instance after linking. When -verify is passed,
llvm-rtdyld does not execute any code. Files containing rules are passed
via "-check=<filename>" arguments, and rules are read from any line
prefixed with the string "# rtdyld-check: ". The intended workflow is
modeled on the FileCheck regression tests.
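
As a rough illustration of the rule-file convention described above, the following Python sketch pulls rules out of a source file. The function name extract_rules is hypothetical, and it assumes "prefixed" means the line starts with the marker string exactly; the patch may treat whitespace differently.

```python
PREFIX = "# rtdyld-check: "


def extract_rules(source_text):
    """Return the expression text of every line that starts with PREFIX."""
    rules = []
    for line in source_text.splitlines():
        if line.startswith(PREFIX):
            # Keep only the expression that follows the marker.
            rules.append(line[len(PREFIX):].strip())
    return rules
```

Because rules live in ordinary assembler comments, the same file can be fed to the assembler and to the checker without any preprocessing.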

Here's an example of what a test case for an x86-64 PC-relative
MACHO_VANILLA relocation would look like:

; RUN: clang -triple x86_64-apple-macosx10.9.0 -c -o foo.o %s
; RUN: llvm-rtdyld -verify -check=foo.s foo.o
; RUN: rm foo.o
;
; Test an x86-64 PC-relative MACHO_VANILLA relocation.

        .text
        .globl  bar
        .align  16, 0x90
bar:
        retq

        .globl  foo
        .align  16, 0x90
foo:
# rtdyld-check: *{4}(inst1 - 4) = (bar - inst1) & 0xffffffff
        callq   bar
inst1:
        retq
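
The "& 0xffffffff" in the rule above matters because the right-hand side is computed as a 64-bit unsigned value while the load reads only 4 bytes. A small worked example in Python, using hypothetical addresses (bar at 0x1000, inst1 at 0x1015) and the fact that an x86-64 near call is the opcode 0xE8 followed by a 32-bit displacement relative to the next instruction:

```python
MASK64 = (1 << 64) - 1


def load_le(image, offset, size):
    """Read `size` bytes at `offset` as a little-endian unsigned integer."""
    return int.from_bytes(image[offset:offset + size], "little")


# Hypothetical layout: bar at 0x1000, and the instruction after the call
# (label inst1) at 0x1015, so the call itself starts at 0x1010.
BAR, INST1 = 0x1000, 0x1015
CALL_BYTES = bytes.fromhex("e8ebffffff")  # opcode 0xE8 + rel32 = bar - inst1

loaded = load_le(CALL_BYTES, 1, 4)       # what *{4}(inst1 - 4) would read
difference = (BAR - INST1) & MASK64      # bar - inst1 as a 64-bit unsigned

unmasked_matches = loaded == difference
masked_matches = loaded == (difference & 0xFFFFFFFF)
```

Here the backward call gives a negative displacement, so the 64-bit difference has its upper 32 bits set and only the masked comparison holds.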


With this system, we could write targeted regression tests for every
relocation type on every platform, and test them on any system. Failures
would immediately identify which target and relocation type broke.

I think this system would massively improve the testability of the
RuntimeDyld layer, which is good news in light of the increased usage MCJIT
is getting these days.

Please let me know what you think. Comments and critiques are very welcome,
both of the language and the proposed workflow.

Cheers,
Lang.

TL;DR: lhames responds to dblaikie's incessant demand for test cases. ;)

Lang Hames

2014-Jun-23 23:21 UTC

I should emphasize, in case it's not clear: This testing infrastructure is
target and format agnostic. It will work for MachO, ELF, and any other
object format without modification, as it talks only to the generic
RuntimeDyld interface.

Cheers,
Lang.



David Blaikie

2014-Jun-24 00:01 UTC

On Mon, Jun 23, 2014 at 3:01 PM, Lang Hames <lhames at gmail.com> wrote:
> Hi Everyone,
>
> For your consideration: A proposal to improve regression test support for
> RuntimeDyld.

Thanks for working on this, Lang. It's great to see.

> Short version: We can make RuntimeDyld far more testable by adding a trivial
> pointer-expression language that allows us to describe how memory should
> look post-relocation. Jump down to "The Proposal" for details.

I've been trying to puzzle over what this would look like with a
possibly more general feature*.

What would testing look like if we had an rtdyld dumping mode that
printed the disassembly of the relocated machine code, and a symbol
table (or just inserted the labels for the symbols into the
disassembly)?

I understand we'd need to beef up FileCheck with slightly more
arithmetic operations - but is it really so much (& would they be so
useless for other tests) that it's not worth putting it there?

To take your example, here's my vague idea of what it might look like
to use a dump+FileCheck. The dump would look something like:

(obviously I don't know, nor for this purpose care, how big the
instructions are, just that they have distinct addresses, etc)

  0x42: bar:
  0x42:   retq
  0x43: foo:
  0x43:   callq 0x42
  0x44: inst1:
  0x44:   retq

And the FileCheck equivalent of

  # rtdyld-check: *{4}(inst1 - 4) = (bar - inst1) & 0xffffffff

would be something like:

  CHECK: [[CALL_ADDR:.*]]: bar:
  CHECK: callq [[CALL_ADDR]]

Which, I suppose, depends on the disassembler working correctly; I'm not
sure if that's high risk/complicated.
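
The capture-then-reuse idea behind those two CHECK lines can be approximated in Python with the re module. This is only an emulation for illustration, not FileCheck itself: the function name call_targets_symbol is hypothetical, the dump text is the made-up one above, and the capture uses \S+ rather than .* so that leading whitespace is not captured.

```python
import re

DUMP = """\
  0x42: bar:
  0x42:   retq
  0x43: foo:
  0x43:   callq 0x42
  0x44: inst1:
  0x44:   retq
"""


def call_targets_symbol(dump, symbol):
    """True if a callq after `symbol`'s label uses that label's address."""
    # Step 1: capture the address printed in front of the symbol's label.
    label = re.search(rf"(?P<addr>\S+): {re.escape(symbol)}:", dump)
    if label is None:
        return False
    # Step 2: look for a later callq whose operand is the captured text.
    use = re.compile(r"callq\s+" + re.escape(label.group("addr")) + r"\b")
    return use.search(dump, label.end()) is not None
```

Like the CHECK pair, this only compares text: it passes when the disassembler prints the call target the same way it prints the label address.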

Alternatively - could llvm-rtdyld just print a simple description of
the relocations it has applied and the location of symbols (similar to
a static display of relocations like llvm-objdump -r), then FileCheck
that?

* all that said, a feature like you've proposed/implemented isn't
without precedent - clang's -verify is very similar to what you've got
here


Lang Hames

2014-Jun-24 03:54 UTC

Hi Dave,

Jim Grosbach asked the same question, so you're in good company. With
hindsight I think it was a mistake to say "FileCheck workflow". What I
really meant was that this system plays well with lit. Not that your question
about using FileCheck would have been any less valid.

I did consider using FileCheck for this, but decided it was the wrong approach.
The fundamental reason is that there's no demand for textually rendering
RuntimeDyld's memory, and developing a textual renderer so that we could
output text just to pattern match and re-assemble ints in FileCheck would be a
lot of pain for (as far as I can see) no gain.

If there's a desire for FileCheck to support expression evaluation we could
flesh out this evaluator and make it available as a support library that both
FileCheck and RuntimeDyld could use.

You (and independently Nick Kledzik) do raise the really useful idea of
leveraging the disassembler though. I like the idea of adding some special
syntax to disassemble an instruction at a label and use one of its immediates.
That would eliminate a lot of the bit bashing that would have been required on
instruction sets with tricky immediate encodings (e.g. ARM). Something like:

# rtdyld-check: @test_inst[0] = foo - (test_inst + 5)
test_inst:
  callq foo
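
The "+ 5" in that rule is the length of the x86-64 call (one opcode byte plus a 4-byte immediate), since the immediate is relative to the next instruction. A short Python sketch of the same relation read in the other direction, recovering the call target from a decoded immediate; the function names and addresses are hypothetical:

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def call_target(inst_addr, imm32, inst_size=5):
    """Address reached by a call at inst_addr with immediate imm32.

    inst_size defaults to 5: one opcode byte plus a 4-byte immediate.
    """
    return (inst_addr + inst_size + sign_extend32(imm32)) & MASK64


def expected_imm(target, inst_addr, inst_size=5):
    """The immediate the rule expects: target - (inst_addr + inst_size)."""
    return (target - (inst_addr + inst_size)) & 0xFFFFFFFF
```

With the immediate taken from the disassembler, the rule never needs to know where inside the instruction the bits are stored, which is the appeal for targets whose immediates are split across fields.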

Cheers,
Lang.
