thr3ads.net - llvm dev - [LLVMdev] MCJIT RemoteMemoryManager Failures on ARM [Nov 2013]


Renato Golin

2013-Nov-26 22:44 UTC

[LLVMdev] MCJIT RemoteMemoryManager Failures on ARM

On 26 November 2013 19:05, Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:
>  I would also note that the failure isn’t actually in anything
> MCJIT-specific.  Aside from the fact that it seems to be clang-specific,
> the code that is failing is specific to the lli remote implementation.
> It’s not clear to me why it would fail under aggressive optimization with
> clang, but I wouldn’t characterize that code as particularly robust.
>
I agree. I think this is more likely a codegen fault on Clang's side that
crashes the client, not the remote implementation, which, even being crude,
has very little room for failure of that magnitude.


> I just updated the bugzilla report with a few comments about the
> failure.  The short of it is that there’s nothing MCJIT-specific about this
> failure.
> It’s most likely a pipe I/O problem.  I think it’s possible that the clang
> optimizations are just exposing a timing-related vulnerability in the pipe
> handling.
>
Ok, I'll disable those tests for ARM for now and will look into the bug.

I don't know much about how MCJIT works, so creating the reduced test case
will prove difficult. But I'll keep working on it, because I do want MCJIT to
work well on ARM, and disabling tests is the wrong way to go. ;)

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131126/88b09dab/attachment.html>

Kaylor, Andrew

2013-Nov-26 23:29 UTC


[LLVMdev] MCJIT RemoteMemoryManager Failures on ARM

Looking at the code, one obvious source of intermittent failure is that the
Linux implementations of ReadBytes and WriteBytes don't check for EINTR.  I
doubt that's the failure you're seeing, because it would be more randomly
distributed, but it's something that should be fixed.
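A minimal sketch of what an EINTR check looks like, assuming the pipe helpers wrap plain POSIX `read(2)`/`write(2)`. The names `write_retry` and `read_retry` are hypothetical; they are not taken from the lli sources or from the attached patch.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call write(2), retrying whenever a signal interrupts
 * the call before any data is transferred (it fails with errno == EINTR).
 * Returns the byte count reported by write(2), or -1 on a real error. */
ssize_t write_retry(int fd, const void *buf, size_t count)
{
    ssize_t n;
    do {
        n = write(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Same pattern for read(2): only EINTR is retried; any other error, EOF (0)
 * or a successful but partial read is returned to the caller unchanged. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Retrying on EINTR alone does not help when a call succeeds but transfers fewer bytes than were requested; the caller still has to check the returned count.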

A more likely cause of the failure in your case is that read is returning fewer
bytes than were requested.  In theory, this can happen if we read one end of
the pipe while the other end is being written, but the current code doesn't
check for it.  A race condition like this seems more likely than a code
generation problem.
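The usual defence against short reads is to loop until the whole message has arrived. Below is a minimal sketch, assuming each message has a size known in advance; `read_all` and `write_all` are hypothetical names, and this is not necessarily how the attached patch solves it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read(2) until exactly `count` bytes have
 * arrived. A single read on a pipe may legitimately return fewer bytes than
 * requested (e.g. while the writer is still writing), so the remainder is
 * requested again. Returns false on an I/O error or if the writer closed the
 * pipe before the full message arrived. */
bool read_all(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return false;      /* genuine I/O error */
        }
        if (n == 0)
            return false;      /* EOF before the whole message arrived */
        done += (size_t)n;
    }
    return true;
}

/* Symmetric helper for the writing side: handles partial writes and EINTR. */
bool write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return false;
        }
        done += (size_t)n;
    }
    return true;
}
```

Treating EOF as a failure, rather than returning a partial buffer, makes a dead peer show up as an explicit error instead of a silently truncated message.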

I'm attaching a patch (which I haven't even tried to compile) that I
think addresses these issues.  Can you try it out and see if it fixes this
problem for you?

If this doesn't do the trick, you can step through the remote case in the
debugger to see what the communication looks like leading up to the failure.
From there it should be relatively simple to use just the RemoteTargetExternal
class to create a test driver that communicates with the child process in the
same way.  This ought to give you a failing test case that is independent of
any significant part of LLVM (unless the failure is entirely timing dependent).

Thanks,
Andy


From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Tuesday, November 26, 2013 2:44 PM
To: Kaylor, Andrew
Cc: NAKAMURA Takumi; LLVM Dev
Subject: Re: MCJIT RemoteMemoryManager Failures on ARM

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131126/77118bd3/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lli-remote-comm.patch
Type: application/octet-stream
Size: 2997 bytes
Desc: lli-remote-comm.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131126/77118bd3/attachment.obj>

Renato Golin

2013-Nov-27 00:30 UTC


[LLVMdev] MCJIT RemoteMemoryManager Failures on ARM

On 26 November 2013 23:29, Kaylor, Andrew <andrew.kaylor at intel.com>
wrote:
>  Looking at the code, one obvious source of intermittent failure is that
> the Linux implementations of ReadBytes and WriteBytes don’t check for
> EINTR.  I doubt that’s the failure you’re seeing because it would be more
> randomly distributed but it’s something that should be fixed.
>
Agreed.


> More likely as the cause of failure in your case is that read is returning
> less than the number of bytes requested.   In theory, this can happen if we
> read one end of the pipe while the other end is being written, but the
> current code doesn’t check for it.  A race condition like this seems more
> likely than a code generation problem.
>
Right. What I meant by a codegen problem was not *just* a crash in the
client, but code movement that could induce instability, like moving things
across memory barriers, etc. However, I agree that the code, as it is, is
not robust enough, and that a more aggressive compiler can upset the lucky
balance it has now.


> I’m attaching a patch (which I haven’t even tried to compile) that I think
> addresses these issues.  Can you try it out and see if it fixes this
> problem for you?
>
The patch indeed fixes the problem, but it introduces lock-ups in other
(random) tests when they run simultaneously, though not when I run them
individually.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131127/7dacd5f7/attachment.html>
