thr3ads.net - Xen devel - domU panic on nested call to arch_enter_lazy_mmu


Andrew Jones

2013-Apr-10 15:35 UTC

domU panic on nested call to arch_enter_lazy_mmu_mode()

Hi all,

A couple of years ago a thread[1] popped up here for a bug report that
Jeremy followed up with this patch[2]. That patch was never
committed though (likely because the issue was difficult to
reproduce/test). We've got a report now of the same issue for the
rhel6 kernel running on EC2. It's pretty certain that it's the same,
because the reproducer steps[3] given would certainly generate the
same call sequences shown in [1], and applying the proposed patch[2]
to the rhel6 kernel fixes it.

Now, while the grant table code has changed some between what rhel6
has and recent kernels, I believe the issue should still be present
with recent kernels. However, we attempted to reproduce using a
Fedora18 kernel (>3.8) and could not. So I'm writing to see if I'm
missing something in my analysis - meaning upstream is no longer at
risk of hitting this bug, and/or if Jeremy's proposed patch was
rejected for other reasons than not being testable (or just
forgotten). If not, then I'd suggest we repost it.

Thanks,
drew

[1] http://lists.xen.org/archives/html/xen-devel/2010-12/msg00440.html
[2] http://lists.xen.org/archives/html/xen-devel/2010-12/msg00505.html
[3] Reproducer steps
1. Start an Amazon EC2 instance of type c1.xlarge.
   (c1.xlarge has 8 cores)

2. Create 7 file systems (ext3) on top of Amazon EBS volumes.

3. Mount the 7 file systems you created.

4. To increase page table operations, create the following program:

--
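#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
        int status;
        pid_t pid;

        /* Fork and reap children forever to generate page table updates. */
        for (;;) {
                pid = fork();
                if (pid == 0) {
                        /* Child: exit immediately. */
                        return 0;
                }
                /* Parent: reap the child before forking again. */
                wait(&status);
        }
}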
--

5. Run the program pinned to CPU0:

# gcc fork.c
# taskset -c 0 ./a.out

6. To exercise the grant table, write to the 7 EBS volumes simultaneously.
   (c1.xlarge has 8 CPUs, so pin the writers to CPU1-CPU7, leaving out
   CPU0.)

For instance:
--
for i in `seq 1 7`;
do
        taskset -c $i dd if=/dev/zero of=/mnt/$i/testfile bs=10M count=10000 oflag=direct &
done
--

Konrad Rzeszutek Wilk

2013-May-03 13:03 UTC


Re: domU panic on nested call to arch_enter_lazy_mmu_mode()

On Wed, Apr 10, 2013 at 11:35:35AM -0400, Andrew Jones wrote:
> Hi all,
> 
> A couple of years ago a thread[1] popped up here for a bug report that
> Jeremy followed up with this patch[2]. That patch was never
> committed though (likely because the issue was difficult to
> reproduce/test). We've got a report now of the same issue for the
> rhel6 kernel running on EC2. It's pretty certain that it's the same,
> because the reproducer steps[3] given would certainly generate the
> same call sequences shown in [1], and applying the proposed patch[2]
> to the rhel6 kernel fixes it.
> 
> Now, while the grant table code has changed some between what rhel6
> has and recent kernels, I believe the issue should still be present
> with recent kernels. However, we attempted to reproduce using a
> Fedora18 kernel (>3.8) and could not. So I'm writing to see if I'm
> missing something in my analysis - meaning upstream is no longer at
> risk of hitting this bug, and/or if Jeremy's proposed patch was
> rejected for other reasons than not being testable (or just
> forgotten). If not, then I'd suggest we repost it.

The logic behind arch_enter/leave_lazy_mmu was that they would
be done within the context of the kernel, uninterrupted. That is, the
enter and the matching leave would both be done at some point, and
user-space would not be invoked in between (which is, btw, the issue
that Chuck spotted). There were a couple of bugs where that was not
done properly, and they have been fixed (I can't remember the exact
ones, but a git log --grep="lazy" should provide some idea).
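
To make the invariant above concrete, here is a toy user-space model of it. This is only a sketch, not kernel code: the names model_enter_lazy_mmu, model_leave_lazy_mmu and cpu_lazy_mode are made up for illustration, and a single global stands in for per-CPU state. The panic in the subject line is consistent with a check of this kind firing when a second enter happens before the first leave; the model reports an error where a kernel would hit its assertion.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: these names are illustrative, not kernel symbols. */
enum lazy_mode { LAZY_NONE, LAZY_MMU };

/* Stands in for a per-CPU variable; the model has exactly one "CPU". */
static enum lazy_mode cpu_lazy_mode = LAZY_NONE;

/*
 * Enter lazy MMU mode. Returns false if the mode is already active,
 * which is the nested call this thread is about.
 */
static bool model_enter_lazy_mmu(void)
{
        if (cpu_lazy_mode != LAZY_NONE) {
                fprintf(stderr, "nested enter: lazy mode already active\n");
                return false;
        }
        cpu_lazy_mode = LAZY_MMU;
        return true;
}

/* Leave lazy MMU mode; batched updates would be flushed at this point. */
static void model_leave_lazy_mmu(void)
{
        cpu_lazy_mode = LAZY_NONE;
}
```

In this model a call sequence of enter, enter fails on the second call, while enter, leave, enter succeeds throughout, which is the uninterrupted pairing described above.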

Most of the issues were not in the Xen code but in generic code, such
as vmalloc. For example:

commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5
Author: Samu Kallio <samu.kallio@aberdeencloud.com>
Date:   Sat Mar 23 09:36:35 2013 -0400

    x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates


But if you find this re-appearing, please do report it so we can
either track it down, or use that patch (and add some WARN) so
that the customers can still use the kernel but we can identify
the issues.
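
As a rough illustration of the "tolerate it but WARN" idea, here is a toy user-space sketch. It is an assumption about what such a fallback could look like, not the contents of patch [2]: the depth counter, the warning counter and all function names are hypothetical, and fprintf stands in for a kernel warning.

```c
#include <stdio.h>

/* Hypothetical state for one modelled CPU; not kernel symbols. */
static unsigned int lazy_depth;       /* current nesting depth */
static unsigned int nested_warnings;  /* how often nesting was seen */

/* Enter lazy mode; a nested enter is tolerated but reported. */
static void tolerant_enter_lazy_mmu(void)
{
        if (lazy_depth > 0) {
                nested_warnings++;
                fprintf(stderr, "WARN: nested lazy MMU enter (depth %u)\n",
                        lazy_depth);
        }
        lazy_depth++;
}

/*
 * Leave lazy mode. Returns 1 when the outermost section ends (the point
 * where batched updates would be flushed), 0 otherwise.
 */
static int tolerant_leave_lazy_mmu(void)
{
        if (lazy_depth == 0) {
                fprintf(stderr, "WARN: leave without matching enter\n");
                return 0;
        }
        lazy_depth--;
        return lazy_depth == 0;
}
```

With this shape the machine keeps running on a nested enter, and the warning count gives a way to identify which call paths nest.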
