Hello,

I'm running into the BUG_ON() after an incomplete XENMEM_decrease_reservation HYPERVISOR_memory_op call in balloon.c:decrease_reservation(). The reason for that is the huge number of nr_extents, where many of them are paged-out pages. Because they are paged out, they can simply be dropped from the xenpaging point of view; there is no need to page them in before calling p2m_remove_page() for the paged-out gfn.

Whatever strategy is chosen, the hypercall will be preempted. Because the hypercall is preempted, the arg is copied several times from the guest to the stack with copy_from_guest(). Now there is apparently nothing that stops the xenpaging binary in dom0 from making progress and eventually nominating the gfn which holds the guest's kernel stack page. This lets __hvm_copy() return HVMCOPY_gfn_paged_out, which means copy_from_user_hvm() "fails", and this lets the whole hypercall fail.

Now in my particular case, it's the first copy_from_user_hvm(), and I can probably come up with a simple patch which lets copy_from_user_hvm() return some sort of -EAGAIN. This could be used in do_memory_op() to just restart the hypercall once more until the gfn which holds the args is available again. Then my decrease_reservation() bug would have a workaround and I could move on.

However, I think there is nothing that would prevent the xenpaging binary from nominating a guest gfn while the actual work is done during the hypercall, and then copy_to_user_hvm() would fail. How should other hypercalls deal with the situation that a guest gfn gets into the paged-out state? Can they just sleep and do some sort of polling until the page is accessible again? Was this case considered while implementing xenpaging?

I'm currently reading through the callers of __hvm_copy(). Some of them detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just ignore the return codes, or turn them into generic errors. In the case of copy_from/to_guest, each caller needs an audit to see whether a retry is possible.

Olaf
On Wed, Nov 10, Olaf Hering wrote:
> I'm currently reading through the callers of __hvm_copy(). Some of them
> detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just
> ignore the return codes, or turn them into generic errors. In the case
> of copy_from/to_guest, each caller needs an audit to see whether a retry
> is possible.

A first patch which avoids the BUG_ON is below. It turned out that the guest's pagetables were just nominated and paged out during the preempted do_memory_op hypercall, so copy_from_guest failed in decrease_reservation() and do_memory_op().

I have also added some error handling in case the copy_to_user fails. However, only the decrease_reservation() code path is runtime tested, and in fact this whole patch is not yet compile-tested. It's just a heads-up.

So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out return codes from __hvm_copy? Or should I explore some different way, like spinning there and possibly letting other threads-of-execution make progress while waiting for the gfns to come back?

Olaf

---
 xen/arch/x86/hvm/hvm.c |    4 ++++
 xen/common/memory.c    |   43 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

--- xen-4.0.1-testing.orig/xen/arch/x86/hvm/hvm.c
+++ xen-4.0.1-testing/xen/arch/x86/hvm/hvm.c
@@ -1853,6 +1853,8 @@ unsigned long copy_to_user_hvm(void *to,

     rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
                                         len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }

@@ -1869,6 +1871,8 @@ unsigned long copy_from_user_hvm(void *t
 #endif

     rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }

--- xen-4.0.1-testing.orig/xen/common/memory.c
+++ xen-4.0.1-testing/xen/common/memory.c
@@ -47,6 +47,7 @@ static void increase_reservation(struct
 {
     struct page_info *page;
     unsigned long i;
+    unsigned long ctg_ret;
     xen_pfn_t mfn;
     struct domain *d = a->domain;

@@ -80,8 +81,14 @@ static void increase_reservation(struct
         if ( !guest_handle_is_null(a->extent_list) )
         {
             mfn = page_to_mfn(page);
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                free_domheap_pages(page, a->extent_order);
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }

@@ -93,6 +100,7 @@ static void populate_physmap(struct memo
 {
     struct page_info *page;
     unsigned long i, j;
+    unsigned long ctg_ret;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain;

@@ -111,8 +119,13 @@ static void populate_physmap(struct memo
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gpfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gpfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( a->memflags & MEMF_populate_on_demand )
         {
@@ -142,8 +155,17 @@ static void populate_physmap(struct memo
                 set_gpfn_from_mfn(mfn + j, gpfn + j);

             /* Inform the domain of the new page's machine address. */
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                for ( j = 0; j < (1 << a->extent_order); j++ )
+                    set_gpfn_from_mfn(mfn + j, INVALID_P2M_ENTRY);
+                guest_physmap_remove_page(d, gpfn, mfn, a->extent_order);
+                free_domheap_pages(page, a->extent_order);
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }

@@ -226,8 +248,13 @@ static void decrease_reservation(struct
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gmfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( tb_init_done )
         {
@@ -511,6 +538,7 @@ long do_memory_op(unsigned long cmd, XEN
     int rc, op;
     unsigned int address_bits;
     unsigned long start_extent;
+    unsigned long cfg_ret;
     struct xen_memory_reservation reservation;
     struct memop_args args;
     domid_t domid;
@@ -524,8 +552,13 @@ long do_memory_op(unsigned long cmd, XEN
     case XENMEM_populate_physmap:
         start_extent = cmd >> MEMOP_EXTENT_SHIFT;

-        if ( copy_from_guest(&reservation, arg, 1) )
+        cfg_ret = copy_from_guest(&reservation, arg, 1);
+        if ( unlikely(cfg_ret) )
+        {
+            if ( (long)cfg_ret == -EAGAIN )
+                return hypercall_create_continuation(__HYPERVISOR_memory_op, "lh", cmd, arg);
             return start_extent;
+        }

         /* Is size too large for us to encode a continuation? */
         if ( reservation.nr_extents > (ULONG_MAX >> MEMOP_EXTENT_SHIFT) )
On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> return codes from __hvm_copy?
> Or should I explore some different way, like spinning there and possibly
> letting other threads-of-execution make progress while waiting for the
> gfns to come back?

You can't just spin because Xen is not preemptible. If it were a single CPU system, for example, no other thread would ever run again. You have to 'spin' via a preemptible loop that returns to guest context and then back into the hypercall. Which appears to be what you're doing.

 -- Keir
On Thu, Nov 11, Keir Fraser wrote:
> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> > return codes from __hvm_copy?
> > Or should I explore some different way, like spinning there and possibly
> > letting other threads-of-execution make progress while waiting for the
> > gfns to come back?
>
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system, for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

Thanks for the answer.

It occurred to me that this is an issue for hypercalls made by the guest itself. There are probably not that many in use, so it shouldn't be that hard to audit the few drivers for what they use and add some error handling. Up to now, only do_memory_op had an issue.

Olaf
On 11/11/2010 20:34, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Nov 11, Keir Fraser wrote:
>
>> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>>> return codes from __hvm_copy?
>>> Or should I explore some different way, like spinning there and possibly
>>> letting other threads-of-execution make progress while waiting for the
>>> gfns to come back?
>>
>> You can't just spin because Xen is not preemptible. If it were a single CPU
>> system, for example, no other thread would ever run again. You have to 'spin'
>> via a preemptible loop that returns to guest context and then back into the
>> hypercall. Which appears to be what you're doing.
>
> Thanks for the answer.
>
> It occurred to me that this is an issue for hypercalls made by the guest
> itself. There are probably not that many in use, so it shouldn't be that
> hard to audit the few drivers for what they use and add some error handling.
> Up to now, only do_memory_op had an issue.

Only other thing I'd say is that, depending on how often this happens, because paging in may well require a slow I/O operation, it may even be nice to sleep the waiting vcpu rather than spin. Would require some mechanism to record what vcpus are waiting for what mfns, and to check that list when paging stuff in. I guess it's rather a 'phase 2' thing after things actually work reliably!

 -- Keir
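As a rough illustration of the bookkeeping Keir describes here (recording which vcpus wait for which gfns so that a page-in can wake exactly those), below is a hedged sketch expressed with the wait-queue interface that appears later in this thread. The bucket count, the page_in_wq[] array, the gfn_waitqueue() helper and the p2m_is_paged_out() predicate are invented names for the example, not existing Xen API, and the wake-up in the page-in path is only indicated in a comment.

    #include <xen/wait.h>

    #define PAGING_WQ_BUCKETS 16

    /* One wait queue per bucket of gfns: a page-in then only wakes the few
     * vcpus that might be waiting on a gfn hashing to that bucket. */
    static struct waitqueue_head page_in_wq[PAGING_WQ_BUCKETS];

    static void page_in_wq_init(void)
    {
        unsigned int i;

        for ( i = 0; i < PAGING_WQ_BUCKETS; i++ )
            init_waitqueue_head(&page_in_wq[i]);
    }

    static struct waitqueue_head *gfn_waitqueue(unsigned long gfn)
    {
        return &page_in_wq[gfn % PAGING_WQ_BUCKETS];
    }

    /* A vcpu that hits a paged-out gfn (and holds no spinlocks) would sleep
     * with something like:
     *     wait_event(*gfn_waitqueue(gfn), !p2m_is_paged_out(d, gfn));
     * and the page-in completion path, once the frame is resident again,
     * would issue the matching wake-up on gfn_waitqueue(gfn). */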
>>> On 11.11.10 at 21:08, Keir Fraser <keir@xen.org> wrote:
> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>> return codes from __hvm_copy?
>> Or should I explore some different way, like spinning there and possibly
>> letting other threads-of-execution make progress while waiting for the
>> gfns to come back?
>
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system, for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

This works in the context of do_memory_op(), which already has a way to encode a continuation. For other hypercalls (accessible to HVM guests) this may not be as simple, and all of them can potentially run into this same problem.

Furthermore, even for the do_memory_op() one, encoding a continuation for a failure of copying in the arguments is clearly acceptable (if no other solution can be found), but unwinding the whole operation when copying out the results fails is at least undesirable (and can lead to a live lock). So I think a general (hopefully transparent to the individual hypercall handlers) solution needs to be found, and a word on the general issue from the original paging code authors (and their thoughts of it when designing the whole thing) would be very much appreciated.

Jan
On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

We will at least have to enforce that no spinlocks are held during copy_to/from_guest operations. That's easily enforced, at least in debug builds, of course.

Beyond that, introducing some transparent mechanisms for sleeping in the hypervisor -- mutexes, wait queues, and the like -- is actually fine with me. Perhaps this will also help clean up the preemptible page-type-checking logic that you had to do some heavy lifting on?

I'm happy to help work on the basic mechanism of this, if it's going to be useful and widely used. I reckon I could get mutexes and wait queues going in a couple of days. This would be the kind of framework that the paging mechanisms should then properly be built on.

What do you think?

 -- Keir
>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> We will at least have to enforce that no spinlocks are held during
> copy_to/from_guest operations. That's easily enforced, at least in debug
> builds, of course.
>
> Beyond that, introducing some transparent mechanisms for sleeping in the
> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
> me. Perhaps this will also help clean up the preemptible page-type-checking
> logic that you had to do some heavy lifting on?

I'm not sure it would help there - this requires voluntary preemption rather than synchronization. But perhaps it can be built on top of this (or result as a side effect).

> I'm happy to help work on the basic mechanism of this, if it's going to be
> useful and widely used. I reckon I could get mutexes and wait queues going
> in a couple of days. This would be the kind of framework that the paging
> mechanisms should then properly be built on.
>
> What do you think?

Sounds good, and your helping with this will be much appreciated (Olaf - unless you had plans to do this yourself). Whether it's going to be widely used I can't tell immediately - for the moment, overcoming the paging problems seems like the only application.

Jan
On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
>> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
>>
>> Beyond that, introducing some transparent mechanisms for sleeping in the
>> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
>> me. Perhaps this will also help clean up the preemptible page-type-checking
>> logic that you had to do some heavy lifting on?
>
> I'm not sure it would help there - this requires voluntary
> preemption rather than synchronization. But perhaps it can be
> built on top of this (or result as a side effect).

Yes, voluntary preempt can be built from the same bits and pieces very easily. I will provide that too, and I think some simplification to the page-type functions and callers will result. No bad thing!

>> I'm happy to help work on the basic mechanism of this, if it's going to be
>> useful and widely used. I reckon I could get mutexes and wait queues going
>> in a couple of days. This would be the kind of framework that the paging
>> mechanisms should then properly be built on.
>>
>> What do you think?
>
> Sounds good, and your helping with this will be much appreciated
> (Olaf - unless you had plans to do this yourself). Whether it's going
> to be widely used I can't tell immediately - for the moment,
> overcoming the paging problems seems like the only application.

Yeah, I'll get on this.

 -- Keir
At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

Maybe Patrick can comment too, but my recollection of discussing this is that we would have to propagate failures caused by paging at least as far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel could deadlock with its one vcpu stuck in a hypercall (or continually having it preempted and retried) and the paging binary that would unstick it never getting scheduled.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>>> On 15.11.10 at 10:37, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
>> Furthermore, even for the do_memory_op() one, encoding a
>> continuation for a failure of copying in the arguments is clearly
>> acceptable (if no other solution can be found), but unwinding
>> the whole operation when copying out the results fails is at
>> least undesirable (and can lead to a live lock). So I think a
>> general (hopefully transparent to the individual hypercall
>> handlers) solution needs to be found, and a word on the
>> general issue from the original paging code authors (and their
>> thoughts of it when designing the whole thing) would be very
>> much appreciated.
>
> Maybe Patrick can comment too, but my recollection of discussing this is
> that we would have to propagate failures caused by paging at least as
> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
> could deadlock with its one vcpu stuck in a hypercall (or continually
> having it preempted and retried) and the paging binary that would
> unstick it never getting scheduled.

How's Dom0 involved here? The hypercall arguments live in guest memory.

Confused,
Jan
On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Maybe Patrick can comment too, but my recollection of discussing this is
>> that we would have to propagate failures caused by paging at least as
>> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
>> could deadlock with its one vcpu stuck in a hypercall (or continually
>> having it preempted and retried) and the paging binary that would
>> unstick it never getting scheduled.
>
> How's Dom0 involved here? The hypercall arguments live in
> guest memory.

Yes, and you'd never turn on paging for dom0 itself. That would never work!

Changing every user of the guest accessor macros to retry via guest space is really not tenable. We'd never get all the bugs out.

 -- Keir
At 10:09 +0000 on 15 Nov (1289815777), Keir Fraser wrote:
> On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>> Maybe Patrick can comment too, but my recollection of discussing this is
>>> that we would have to propagate failures caused by paging at least as
>>> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
>>> could deadlock with its one vcpu stuck in a hypercall (or continually
>>> having it preempted and retried) and the paging binary that would
>>> unstick it never getting scheduled.
>>
>> How's Dom0 involved here? The hypercall arguments live in
>> guest memory.
>
> Yes, and you'd never turn on paging for dom0 itself. That would never work!

:) No, the issue is if dom0 (or whichever dom the pager lives in) is trying an operation on domU's memory that hits a paged-out page (e.g. qemu or similar is mapping it) with its only vcpu - you can't just block or spin. You need to let dom0 schedule the pager process.

> Changing every user of the guest accessor macros to retry via guest space is
> really not tenable. We'd never get all the bugs out.

Right now, I can't see another way of doing it. Grants can be handled by shadowing the guest grant table and pinning granted frames so the block happens in domU (performance-- but you're already paging, right?) but what about qemu, xenctx, save/restore...?

Tim.
On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> Yes, and you'd never turn on paging for dom0 itself. That would never work!
>
> :) No, the issue is if dom0 (or whichever dom the pager lives in) is
> trying an operation on domU's memory that hits a paged-out page
> (e.g. qemu or similar is mapping it) with its only vcpu - you can't
> just block or spin. You need to let dom0 schedule the pager process.
>
>> Changing every user of the guest accessor macros to retry via guest space is
>> really not tenable. We'd never get all the bugs out.
>
> Right now, I can't see another way of doing it. Grants can be handled
> by shadowing the guest grant table and pinning granted frames so the
> block happens in domU (performance-- but you're already paging, right?)
> but what about qemu, xenctx, save/restore...?

We're talking about copy_to/from_guest, and friends, here. They always implicitly act on the local domain, so the issue you raise is not a problem there. Dom0 mappings of domU memory are a separate issue, presumably already considered and dealt with to some extent, no doubt.

 -- Keir
At 10:33 +0000 on 15 Nov (1289817224), Keir Fraser wrote:
> On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>
>>> Yes, and you'd never turn on paging for dom0 itself. That would never work!
>>
>> :) No, the issue is if dom0 (or whichever dom the pager lives in) is
>> trying an operation on domU's memory that hits a paged-out page
>> (e.g. qemu or similar is mapping it) with its only vcpu - you can't
>> just block or spin. You need to let dom0 schedule the pager process.
>>
>>> Changing every user of the guest accessor macros to retry via guest space is
>>> really not tenable. We'd never get all the bugs out.
>>
>> Right now, I can't see another way of doing it. Grants can be handled
>> by shadowing the guest grant table and pinning granted frames so the
>> block happens in domU (performance-- but you're already paging, right?)
>> but what about qemu, xenctx, save/restore...?
>
> We're talking about copy_to/from_guest, and friends, here.

Oh sorry, I had lost the context there.

Yes, for those the plan was just to pause and retry, just like all other cases where Xen needs to access guest memory. We hadn't particularly considered the case of large hypercall arguments that aren't all read up-front. How many cases of that are there? A bit of reordering on the memory-operation hypercalls could presumably let them be preempted and restart further in mid-operation next time. (IIRC the compat code already does something like this.)

Tim.
On 15/11/2010 10:49, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> We're talking about copy_to/from_guest, and friends, here.
>
> Oh sorry, I had lost the context there.
>
> Yes, for those the plan was just to pause and retry, just like all other
> cases where Xen needs to access guest memory.

Could you expand on what you mean by pause and retry? As that's what I think should be implemented, and it involves sleeping in hypervisor context afaics, which has led us to the current point in the discussion.

> We hadn't particularly
> considered the case of large hypercall arguments that aren't all read
> up-front. How many cases of that are there? A bit of reordering on the
> memory-operation hypercalls could presumably let them be preempted and
> restart further in mid-operation next time. (IIRC the compat code
> already does something like this.)

The issue is that there are hundreds of uses of the guest-accessor macros. Every single one would need updating to handle the paged-out-so-retry case, unless we can hide that *inside* the accessor macros themselves. It's a huge job, not to mention the bug tail on rarely-executed error paths.

Consider also the copy_to_* writeback case at the end of a hypercall. You've done the potentially non-idempotent work, you have some state cached in hypervisor regs/stack/heap and want to push it out to guest memory. The guest target memory is paged out. How do you encode the continuation for the dozens of cases like this without tearing your hair out?

I suppose *maybe* you could check-and-pin all memory that might be accessed before the meat of a hypercall begins. That seems a fragile pain in the neck too, however.

 -- Keir
At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
> The issue is that there are hundreds of uses of the guest-accessor macros.
> Every single one would need updating to handle the paged-out-so-retry case,
> unless we can hide that *inside* the accessor macros themselves. It's a huge
> job, not to mention the bug tail on rarely-executed error paths.

Right, I see. You're suggesting that we code up a sort of setjmp() that can be called in the __copy function, which will deschedule the vcpu and allow it to be rescheduled back where it was. Sounds ideal. Will it need per-vcpu stacks? (And will they, in turn, use order>0 allocations? :))

We'll have to audit the __copy functions to make sure they're not called with locks held. Sounds more fun than the alternative, I guess.

I think the ioreq code would be another candidate for tidying up if we had such a mechanism. Presumably some of the current users of hypercall_create_continuation() would benefit too.

> Consider also the copy_to_* writeback case at the end of a hypercall. You've
> done the potentially non-idempotent work, you have some state cached in
> hypervisor regs/stack/heap and want to push it out to guest memory. The
> guest target memory is paged out. How do you encode the continuation for the
> dozens of cases like this without tearing your hair out?
>
> I suppose *maybe* you could check-and-pin all memory that might be accessed
> before the meat of a hypercall begins. That seems a fragile pain in the neck
> too however.

Good point.

Tim.
On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
>> The issue is that there are hundreds of uses of the guest-accessor macros.
>> Every single one would need updating to handle the paged-out-so-retry case,
>> unless we can hide that *inside* the accessor macros themselves. It's a huge
>> job, not to mention the bug tail on rarely-executed error paths.
>
> Right, I see. You're suggesting that we code up a sort of setjmp() that
> can be called in the __copy function, which will deschedule the vcpu and
> allow it to be rescheduled back where it was. Sounds ideal.

Exactly so.

> Will it
> need per-vcpu stacks? (And will they, in turn, use order>0 allocations? :))

Of a sort. I propose to keep the per-pcpu stacks and then copy context to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will suffice.

In some ways this is a backwards version of the Linux stack-handling logic, which has a proper per-task kernel stack of moderate size (4kB?) and then larger per-cpu irq stacks to deal with deep irq nesting. We will have proper per-cpu hypervisor stacks of sufficient size to deal with guest and irq state -- our per-vcpu 'shadow stack' will then be the special case, and only of small/moderate size to deal with shallow guest call stacks.

> We'll have to audit the __copy functions to make sure they're not called
> with locks held. Sounds more fun than the alternative, I guess.

Exactly so. Best of a bad set of options. At least we can run-time assert this, and it's not error-path only.

> I think the ioreq code would be another candidate for tidying up if we
> had such a mechanism. Presumably some of the current users of
> hypercall_create_continuation() would benefit too.

Yeah, it needs a dash of thought but I think we will be able to move in this direction.

 -- Keir
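For illustration only, here is a rough sketch of the per-vcpu state such a setjmp-like scheme implies. The struct name, field layout and the 1.5kB figure are assumptions made for this example, not the layout that was eventually committed, and the arch-specific register/stack save and restore is left out entirely.

    #include <xen/list.h>
    #include <xen/sched.h>

    struct waitqueue_vcpu_sketch {
        struct list_head list;      /* entry on a waitqueue_head's list of sleepers */
        struct vcpu *v;             /* the vcpu parked here */
        unsigned long saved_rsp;    /* stack pointer at the point of the wait */
        unsigned char stack[1536];  /* copy of the (shallow) hypervisor call stack,
                                     * copied back just before the vcpu resumes */
    };

The per-pcpu stack stays the primary stack; only the small slice between the stack top and the waiting call frame needs to be copied into the per-vcpu area, which is why a 1kB-2kB 'shadow stack' can be enough.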
On Fri, Nov 12, Keir Fraser wrote:
> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Sounds good, and your helping with this will be much appreciated
>> (Olaf - unless you had plans to do this yourself). Whether it's going
>> to be widely used I can't tell immediately - for the moment,
>> overcoming the paging problems seems like the only application.
>
> Yeah, I'll get on this.

Sorry for being late here.

I'm glad you volunteered for this task.

Olaf
On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Fri, Nov 12, Keir Fraser wrote:
>
>> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>>> Sounds good, and your helping with this will be much appreciated
>>> (Olaf - unless you had plans to do this yourself). Whether it's going
>>> to be widely used I can't tell immediately - for the moment,
>>> overcoming the paging problems seems like the only application.
>>
>> Yeah, I'll get on this.
>
> Sorry for being late here.
>
> I'm glad you volunteered for this task.

The basis of what you need is checked in as xen-unstable:22396. You can include <xen/wait.h> and you get an interface like a very simplified version of Linux waitqueues. There are still some details to be worked out, but it basically works as-is and you can start using it now.

The one big cleanup/audit we will need is that all callers of __hvm_copy() (which ends up being all HVM guest callers of the copy_to/from_guest* macros) must not hold any locks. This is because you are going to modify __hvm_copy() such that it may sleep. Probably you should ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)

 -- Keir
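A minimal sketch of the audit hook suggested above; only the assertion comes from the mail, while the parameter list is assumed and the function body is elided.

    static enum hvm_copy_result __hvm_copy(
        void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec)
    {
        /* Waiting for a paged-out gfn means sleeping, which is only legal
         * outside atomic context: no spinlocks held, not in an IRQ handler.
         * A debug-build assertion catches offending callers early. */
        ASSERT(!in_atomic());

        /* ... the existing copy loop, which may now block on a wait queue ... */
        return HVMCOPY_okay;  /* placeholder for the elided body */
    }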
On 17/11/2010 16:52, "Keir Fraser" <keir@xen.org> wrote:
> On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> Sorry for being late here.
>>
>> I'm glad you volunteered for this task.
>
> The basis of what you need is checked in as xen-unstable:22396. You can
> include <xen/wait.h> and you get an interface like a very simplified version
> of Linux waitqueues. There are still some details to be worked out, but it
> basically works as-is and you can start using it now.
>
> The one big cleanup/audit we will need is that all callers of __hvm_copy()
> (which ends up being all HVM guest callers of the copy_to/from_guest*
> macros) must not hold any locks. This is because you are going to modify
> __hvm_copy() such that it may sleep. Probably you should
> ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)

I've done something along these lines now as xen-unstable:22402. It actually seems to work okay! So you can go ahead and use waitqueues in __hvm_copy() now.

 -- Keir
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

Thanks a lot for your work, Keir! I will get to it next week.

Olaf
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

This is my first attempt to do it. It crashed Xen on the very first try in a spectacular way, but it happened only once for some reason. See my other mail.

Olaf

--- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
+++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
@@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
 enum hvm_copy_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_to_guest_virt(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_fetch_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

My first attempt with the patch I sent crashed like this. Two threads run into a non-empty list:

  prepare_to_wait
  check_wakeup_from_wait

I could not reproduce this. Right now I'm running with a modified xenpaging policy which pages just the pagetable gfns around gfn 0x1800, but that almost stalls the guest due to the continuous paging. Any ideas how this crash can happen?

Olaf

....................
Welcome to SUSE Linux Enterprise Server 11 SP1 (x86_64) - Kernel 2.6.32.24-20101117.152845-xen (console).

stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (2 of 4)
(XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of 3)
[ 102.139380] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/768
[ 102.171632] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[ 102.209310] device vif1.0 entered promiscuous mode
[ 102.221776] br0: port 2(vif1.0) entering forwarding state
[ 102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i ffff8800f2420720 f ffff8800f1c2f980
[ 102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 103.241995] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/5632
[ 103.274444] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[ 103.301481] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=110) is a cdrom
[ 103.331978] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=112) xenstore wrote OK
[ 103.362764] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[ 104.538376] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/832
[ 104.570669] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[ 112.401097] vif1.0: no IPv6 routers present
(XEN) HVM1: HVM Loader
(XEN) HVM1: Detected Xen v4.1.22433-20101126
(XEN) HVM1: CPU speed is 2667 MHz
(XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5
(XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10
(XEN) HVM1: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:258: Dom1 PCI link 2 changed 0 -> 11
(XEN) HVM1: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 3 routed to IRQ5
(XEN) HVM1: pci dev 01:3 INTA->IRQ10
(XEN) HVM1: pci dev 03:0 INTA->IRQ5
(XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008
(XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008
(XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000
(XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001
(XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101
(XEN) HVM1: Multiprocessor initialisation:
(XEN) HVM1: - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: Testing HVM environment:
(XEN) HVM1: - REP INSB across page boundaries ... passed
(XEN) HVM1: - GS base MSRs and SWAPGS ... passed
(XEN) HVM1: Passed 2 of 2 tests
(XEN) HVM1: Writing SMBIOS tables ...
(XEN) HVM1: Loading ROMBIOS ...
(XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions:
(XEN) HVM1: Relocating to 0xfc000000-0xfc0025bc ... done
(XEN) HVM1: Creating MP tables ...
(XEN) HVM1: Loading Cirrus VGABIOS ...
(XEN) HVM1: Loading ACPI ...
(XEN) HVM1: - Lo data: 000ea020-000ea04f
(XEN) HVM1: - Hi data: fc002800-fc01291f
(XEN) HVM1: vm86 TSS at fc012c00
(XEN) HVM1: BIOS map:
(XEN) HVM1: c0000-c8fff: VGA BIOS
(XEN) HVM1: eb000-eb1d9: SMBIOS tables
(XEN) HVM1: f0000-fffff: Main BIOS
(XEN) HVM1: E820 table:
(XEN) HVM1: [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM1: [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED
(XEN) HVM1: [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED
(XEN) HVM1: HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM1: [03]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM1: [04]: 00000000:00100000 - 00000000:40000000: RAM
(XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fc000000
(XEN) HVM1: [05]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM1: Invoking ROMBIOS ...
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) stdvga.c:147:d1 entering stdvga and caching modes
(XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) HVM1: Bochs BIOS - build: 06/23/99
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) HVM1: Options: apmbios pcibios eltorito PMM
(XEN) HVM1:
(XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63
(XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes)
(XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
(XEN) HVM1: ata0 slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes)
(XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) HVM1: IDE time out
(XEN) HVM1:
(XEN) HVM1:
(XEN) HVM1:
(XEN) HVM1: Press F12 for boot menu.
(XEN) HVM1:
(XEN) HVM1: Booting from Hard Disk...
(XEN) HVM1: Booting from 0000:7c00
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82
(XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported!
(XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported!
(XEN) HVM1: KBD: unsupported int 16h function 03
(XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported!
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0
(XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0
(XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0
(XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA
[ 165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[ 165.330911] alloc irq_desc for 886 on node 0
[ 165.337115] alloc kstat_irqs on node 0
[ 165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
[ 165.366824] alloc irq_desc for 887 on node 0
[ 165.372089] alloc kstat_irqs on node 0
[ 165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
[ 165.402453] alloc irq_desc for 888 on node 0
[ 165.409108] alloc kstat_irqs on node 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
[ 168.016706] alloc irq_desc for 889 on node 0
[ 168.020103] alloc kstat_irqs on node 0
(XEN) Xen BUG at wait.c:118
(XEN) Assertion 'list_empty(&wqv->list)' failed at wait.c:130
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C ]----
(XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82c4801285c1>]RIP: e008:[<ffff82c4801283ab>] prepare_to_wait+0xf0/0x10f
(XEN) RFLAGS: 0000000000010212 check_wakeup_from_wait+0x27/0x61CONTEXT: hypervisor
(XEN)
(XEN) RFLAGS: 0000000000010293 rax: 0000000000000000 rbx: ffff8301337cd010 rcx: 0000000000000968
(XEN) CONTEXT: hypervisor
(XEN) rdx: ffff83013e737f18 rsi: ffff83013e7375b0 rdi: ffff8301337cd030
(XEN) rax: ffff8301337cd620 rbx: ffff830012b72000 rcx: 0000000000000000
(XEN) rbp: ffff83013e737648 rsp: ffff83013e737628 r8: ffff830138439f60
(XEN) rdx: ffff83013e707f18 rsi: 0000000000000003 rdi: ffff830012b73860
(XEN) r9: 000000000011622f r10: ffff83013e737950 r11: ffffffff8101f230
(XEN) rbp: ffff83013e707cb0 rsp: ffff83013e707cb0 r8: 0000000000000013
(XEN) r12: ffff83013e737668 r13: ffff8301337cd010 r14: ffff830012b74000
(XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
(XEN) r15: ffffffffff5fb300 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) r12: 0000000000000003 r13: ffff8300bf2fa000 r14: 0000002e80873e97
(XEN) cr3: 000000013444c000 cr2: ffffe8ffffc00000
(XEN) r15: ffff830012b72000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) cr3: 00000001347e5000 cr2: 00007f1c79acf000
(XEN) Xen stack trace from rsp=ffff83013e737628:
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) 0000000000000004Xen stack trace from rsp=ffff83013e707cb0:
(XEN) 0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935 ffff83013e7379b0 ffff83013e707d10
(XEN) ffff830012b72000 ffff83013e7376b8
(XEN) ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000 ffff83013e737668 0000002e80873e97
(XEN) ffff83013e70c040 000000000fff0001
(XEN) ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb 0000000000000004
(XEN) 0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000
(XEN) 0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913
(XEN) 0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000 ffff83013e737728
(XEN) ffff82c4801a6460 0000000000000000
(XEN) 0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001 0000002e80873cb4 ffff83013e7379b0
(XEN) 000000000000008b ffff82c4802e58c0
(XEN) ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001 0000000000000282 ffffffffff5fb300
(XEN) 0000000000000089 ffff82c48014d4a1
(XEN) 0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000 0000000000000000 00000003c18f8247 0000000000000000
(XEN)
(XEN) ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52
(XEN)
(XEN) 0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50
(XEN)
(XEN) 0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0 ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000
(XEN)
(XEN) ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50
(XEN)
(XEN) ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4 ffff83013e707e40 000000000011622f 0000002e80873e97 0000000000000011
(XEN)
(XEN) ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000 ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52
(XEN)
(XEN) ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4 ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50
(XEN)
(XEN) 0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008 ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff
(XEN)
(XEN) ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9 ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888
(XEN)
(XEN) ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a
(XEN)
(XEN) 0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619
(XEN)
(XEN) ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000 ffff83013e737998 0000000000000003 0000000000000000
(XEN) ffff8300bf2f6000Xen call trace:
(XEN)
(XEN) [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f
(XEN) 0000000000000000[<ffff82c4801ac622>] 0000000000000000 hvm_copy_to_guest_virt+0x65/0xb0
(XEN) 0000000000000000[<ffff82c4801a6460>]
(XEN) Xen call trace:
(XEN) hvmemul_write+0x113/0x1a2
(XEN) [<ffff82c4801283ab>][<ffff82c48019179c>] check_wakeup_from_wait+0x27/0x61
(XEN) [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae
(XEN) [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa
(XEN) [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192
(XEN) [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db
(XEN) [<ffff82c480157913>] handle_mmio+0x4e/0x17d
(XEN) [<ffff82c4801c844f>] context_switch+0xdbf/0xddb
(XEN) [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c
(XEN)
(XEN) schedule+0x5f3/0x619
(XEN)
(XEN) ****************************************
(XEN) [<ffff82c4801220d7>]Panic on CPU 1:
(XEN) __do_softirq+0x88/0x99
(XEN) Xen BUG at wait.c:118
(XEN) [<ffff82c480122152>]****************************************
(XEN)
(XEN) do_softirq+0x6a/0x7a
(XEN) Reboot in five seconds...
(XEN) [<ffff82c480155619>]Debugging connection not set up.
(XEN) idle_loop+0x64/0x66
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
On 02/12/2010 10:11, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Nov 18, Keir Fraser wrote:
>
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
>
> This is my first attempt to do it.
> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.

Firstly, the usage of waitqueues is broken. The waitqueue_head should be
shared with the code that pages in, so that vcpus can be *woken* at some point
after they start waiting. As it is, if a vcpu does sleep on its local
waitqueue_head, it will never wake. You might start with a single global
waitqueue_head and wake everyone on it every time a page (or maybe a page
batch) is paged in. More sophisticated might be to hash page numbers into an
array of waitqueue_heads, to reduce false wakeups. This is all similar to
Linux waitqueues by the way -- your current code would be just as broken in
Linux as it is in Xen.

Secondly, you should be able to hide the waiting inside __hvm_copy(). I doubt
you really need to touch the callers.

 -- Keir

>
> Olaf
>
> --- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
> +++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
> @@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
>  enum hvm_copy_result hvm_copy_to_guest_phys(
>      paddr_t paddr, void *buf, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_to_guest_virt(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_fetch_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
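To make the suggestion concrete, here is a minimal sketch of what one wrapper might look like against a shared waitqueue_head. It is an illustration only, not the change that was eventually committed: the per-domain field name "paging_wq" is an assumption, and per the second point the wait would preferably sit inside __hvm_copy() itself rather than in every caller.

    /* Sketch only: assumes struct domain carries a waitqueue_head named
     * paging_wq (hypothetical), set up with init_waitqueue_head() at
     * domain creation and woken by the code that completes page-ins. */
    enum hvm_copy_result hvm_copy_to_guest_phys(
        paddr_t paddr, void *buf, int size)
    {
        struct domain *d = current->domain;
        enum hvm_copy_result res;

        /* wait_event() checks the condition first; if the copy hits a
         * paged-out gfn the vcpu sleeps on d->paging_wq, and the copy is
         * retried after each wake_up() from the paging-in path.  A false
         * wakeup only costs one extra __hvm_copy() attempt. */
        wait_event(d->paging_wq,
                   (res = __hvm_copy(buf, paddr, size,
                                     HVMCOPY_to_guest | HVMCOPY_fault |
                                     HVMCOPY_phys, 0))
                   != HVMCOPY_gfn_paged_out);
        return res;
    }

The essential difference from the quoted patch is that the head outlives the hypercall and is visible to whoever pages the gfn back in, so a sleeping vcpu can actually be woken.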
>>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, Nov 18, Keir Fraser wrote:
>
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
>
> This is my first attempt to do it.

I didn't look in detail whether that's being done in a non-intuitive
way elsewhere, but I can't see how the event you're waiting on
would ever get signaled - wouldn't you need to pass it into
__hvm_copy() and further down from there?

Jan

> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.
>
> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 02/12/2010 10:18, "Olaf Hering" <olaf@aepfle.de> wrote:> On Thu, Nov 18, Keir Fraser wrote: > >> I''ve done something along these lines now as xen-unstable:22402. It actually >> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy() >> now. > > My first attempt with the patch I sent crashed like this. > Two threads run into a non-empty list: > > prepare_to_wait > check_wakeup_from_wait > > I could not reproduce this. Right now I''m running with a modified > xenpaging policy which pages just the pagetable gfns around gfn 0x1800. > But that almost stalls the guest due to the continous paging. > > Any ideas how this crash can happen?Since your current patch is conceptually quite broken anyway, there is little point in chasing down the crash. It might have something to do with allocating the waitqueue_head on the local stack -- which you would never want to do in a correct usage of waitqueues. So, back to square one and try again I''m afraid. -- Keir> Olaf > > .................... > > Welcome to SUSE Linux Enterprise Server 11 SP1 (x86_64) - Kernel > 2.6.32.24-20101117.152845-xen (console). > > > stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9 > extent: id=1 memflags=0 (2 of 4) > (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of > 3) > [ 102.139380] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/768 > [ 102.171632] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:0 > [ 102.209310] device vif1.0 entered promiscuous mode > [ 102.221776] br0: port 2(vif1.0) entering forwarding state > [ 102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i > ffff8800f2420720 f ffff8800f1c2f980 > [ 102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team > [ 102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > [ 103.241995] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/5632 > [ 103.274444] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:1 > [ 103.301481] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=110) is a cdrom > [ 103.331978] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=112) xenstore wrote OK > [ 103.362764] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:1 > [ 104.538376] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/832 > [ 104.570669] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:0 > [ 112.401097] vif1.0: no IPv6 routers present > (XEN) HVM1: HVM Loader > (XEN) HVM1: Detected Xen v4.1.22433-20101126 > (XEN) HVM1: CPU speed is 2667 MHz > (XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5 > (XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5 > (XEN) HVM1: PCI-ISA link 0 routed to IRQ5 > (XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10 > (XEN) HVM1: PCI-ISA link 1 routed to IRQ10 > (XEN) irq.c:258: Dom1 PCI link 2 
changed 0 -> 11 > (XEN) HVM1: PCI-ISA link 2 routed to IRQ11 > (XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5 > (XEN) HVM1: PCI-ISA link 3 routed to IRQ5 > (XEN) HVM1: pci dev 01:3 INTA->IRQ10 > (XEN) HVM1: pci dev 03:0 INTA->IRQ5 > (XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008 > (XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008 > (XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000 > (XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001 > (XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101 > (XEN) HVM1: Multiprocessor initialisation: > (XEN) HVM1: - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: Testing HVM environment: > (XEN) HVM1: - REP INSB across page boundaries ... passed > (XEN) HVM1: - GS base MSRs and SWAPGS ... passed > (XEN) HVM1: Passed 2 of 2 tests > (XEN) HVM1: Writing SMBIOS tables ... > (XEN) HVM1: Loading ROMBIOS ... > (XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions: > (XEN) HVM1: Relocating to 0xfc000000-0xfc0025bc ... done > (XEN) HVM1: Creating MP tables ... > (XEN) HVM1: Loading Cirrus VGABIOS ... > (XEN) HVM1: Loading ACPI ... > (XEN) HVM1: - Lo data: 000ea020-000ea04f > (XEN) HVM1: - Hi data: fc002800-fc01291f > (XEN) HVM1: vm86 TSS at fc012c00 > (XEN) HVM1: BIOS map: > (XEN) HVM1: c0000-c8fff: VGA BIOS > (XEN) HVM1: eb000-eb1d9: SMBIOS tables > (XEN) HVM1: f0000-fffff: Main BIOS > (XEN) HVM1: E820 table: > (XEN) HVM1: [00]: 00000000:00000000 - 00000000:0009e000: RAM > (XEN) HVM1: [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED > (XEN) HVM1: [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED > (XEN) HVM1: HOLE: 00000000:000a0000 - 00000000:000e0000 > (XEN) HVM1: [03]: 00000000:000e0000 - 00000000:00100000: RESERVED > (XEN) HVM1: [04]: 00000000:00100000 - 00000000:40000000: RAM > (XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fc000000 > (XEN) HVM1: [05]: 00000000:fc000000 - 00000001:00000000: RESERVED > (XEN) HVM1: Invoking ROMBIOS ... > (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $ > (XEN) stdvga.c:147:d1 entering stdvga and caching modes > (XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $ > (XEN) HVM1: Bochs BIOS - build: 06/23/99 > (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $ > (XEN) HVM1: Options: apmbios pcibios eltorito PMM > (XEN) HVM1: > (XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63 > (XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes) > (XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 > (XEN) HVM1: ata0 slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes) > (XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom > (XEN) HVM1: IDE time out > (XEN) HVM1: > (XEN) HVM1: > (XEN) HVM1: > (XEN) HVM1: Press F12 for boot menu. > (XEN) HVM1: > (XEN) HVM1: Booting from Hard Disk... > (XEN) HVM1: Booting from 0000:7c00 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82 > (XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported! > (XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported! > (XEN) HVM1: KBD: unsupported int 16h function 03 > (XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported! 
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88 > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89 > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30 > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20 > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20 > (XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0 > (XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0 > (XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0 > (XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0 > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > (XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA > [ 165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi) > [ 165.330911] alloc irq_desc for 886 on node 0 > [ 165.337115] alloc kstat_irqs on node 0 > [ 165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi) > [ 165.366824] alloc irq_desc for 887 on node 0 > [ 165.372089] alloc kstat_irqs on node 0 > [ 165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi) > [ 165.402453] alloc irq_desc for 888 on node 0 > [ 165.409108] alloc kstat_irqs on node 0 > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > [ 168.016706] alloc irq_desc for 889 on node 0 > [ 168.020103] alloc kstat_irqs on node 0 > (XEN) Xen BUG at wait.c:118 > (XEN) Assertion ''list_empty(&wqv->list)'' failed at wait.c:130 > (XEN) Debugging connection not set up. > (XEN) Debugging connection not set up. 
> (XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C > ]---- > (XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C > ]---- > (XEN) CPU: 1 > (XEN) CPU: 3 > (XEN) RIP: e008:[<ffff82c4801285c1>]RIP: e008:[<ffff82c4801283ab>] > prepare_to_wait+0xf0/0x10f > (XEN) RFLAGS: 0000000000010212 check_wakeup_from_wait+0x27/0x61CONTEXT: > hypervisor > (XEN) > (XEN) RFLAGS: 0000000000010293 rax: 0000000000000000 rbx: ffff8301337cd010 > rcx: 0000000000000968 > (XEN) CONTEXT: hypervisor > (XEN) rdx: ffff83013e737f18 rsi: ffff83013e7375b0 rdi: ffff8301337cd030 > (XEN) rax: ffff8301337cd620 rbx: ffff830012b72000 rcx: 0000000000000000 > (XEN) rbp: ffff83013e737648 rsp: ffff83013e737628 r8: ffff830138439f60 > (XEN) rdx: ffff83013e707f18 rsi: 0000000000000003 rdi: ffff830012b73860 > (XEN) r9: 000000000011622f r10: ffff83013e737950 r11: ffffffff8101f230 > (XEN) rbp: ffff83013e707cb0 rsp: ffff83013e707cb0 r8: 0000000000000013 > (XEN) r12: ffff83013e737668 r13: ffff8301337cd010 r14: ffff830012b74000 > (XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f > (XEN) r15: ffffffffff5fb300 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) r12: 0000000000000003 r13: ffff8300bf2fa000 r14: 0000002e80873e97 > (XEN) cr3: 000000013444c000 cr2: ffffe8ffffc00000 > (XEN) r15: ffff830012b72000 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) cr3: 00000001347e5000 cr2: 00007f1c79acf000 > (XEN) Xen stack trace from rsp=ffff83013e737628: > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) 0000000000000004Xen stack trace from rsp=ffff83013e707cb0: > (XEN) 0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935 > ffff83013e7379b0 ffff83013e707d10 > (XEN) ffff830012b72000 ffff83013e7376b8 > (XEN) ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000 > ffff83013e737668 0000002e80873e97 > (XEN) ffff83013e70c040 000000000fff0001 > (XEN) ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb > 0000000000000004 > (XEN) 0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000 > (XEN) 0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913 > (XEN) 0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000 > ffff83013e737728 > (XEN) ffff82c4801a6460 0000000000000000 > (XEN) 0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001 > 0000002e80873cb4 ffff83013e7379b0 > (XEN) 000000000000008b ffff82c4802e58c0 > (XEN) ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001 > 0000000000000282 ffffffffff5fb300 > (XEN) 0000000000000089 ffff82c48014d4a1 > (XEN) 0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000 > 0000000000000000 00000003c18f8247 0000000000000000 > (XEN) > (XEN) ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c > ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52 > (XEN) > (XEN) 0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4 > 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50 > (XEN) > (XEN) 0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0 > ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000 > (XEN) > (XEN) ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000 > 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50 > (XEN) > (XEN) ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4 > ffff83013e707e40 000000000011622f 0000002e80873e97 
0000000000000011 > (XEN) > (XEN) ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000 > ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52 > (XEN) > (XEN) ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4 > ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50 > (XEN) > (XEN) 0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008 > ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff > (XEN) > (XEN) ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9 > ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888 > (XEN) > (XEN) ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000 > 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a > (XEN) > (XEN) 0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152 > 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619 > (XEN) > (XEN) ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000 > ffff83013e737998 0000000000000003 0000000000000000 > (XEN) ffff8300bf2f6000Xen call trace: > (XEN) > (XEN) [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f > (XEN) 0000000000000000[<ffff82c4801ac622>] 0000000000000000 > hvm_copy_to_guest_virt+0x65/0xb0 > (XEN) 0000000000000000[<ffff82c4801a6460>] > (XEN) Xen call trace: > (XEN) hvmemul_write+0x113/0x1a2 > (XEN) [<ffff82c4801283ab>][<ffff82c48019179c>] > check_wakeup_from_wait+0x27/0x61 > (XEN) [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae > (XEN) [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa > (XEN) [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192 > (XEN) [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db > (XEN) [<ffff82c480157913>] handle_mmio+0x4e/0x17d > (XEN) [<ffff82c4801c844f>] context_switch+0xdbf/0xddb > (XEN) [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c > (XEN) > (XEN) schedule+0x5f3/0x619 > (XEN) > (XEN) **************************************** > (XEN) [<ffff82c4801220d7>]Panic on CPU 1: > (XEN) __do_softirq+0x88/0x99 > (XEN) Xen BUG at wait.c:118 > (XEN) [<ffff82c480122152>]**************************************** > (XEN) > (XEN) do_softirq+0x6a/0x7a > (XEN) Reboot in five seconds... > (XEN) [<ffff82c480155619>]Debugging connection not set up. > (XEN) idle_loop+0x64/0x66 > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.ý_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:
> Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> shared with the code that pages in, so that vcpus can be *woken* at some point
> after they start waiting. As it is, if a vcpu does sleep on its local
> waitqueue_head, it will never wake. You might start with a single global
> waitqueue_head and wake everyone on it every time a page (or maybe a page
> batch) is paged in. More sophisticated might be to hash page numbers into an
> array of waitqueue_heads, to reduce false wakeups.

...Or you might have a per-domain waitqueue_head, and do the wake_up() from
the code that adds paged-in entries to the guest physmap. That would seem a
pretty sensible way to proceed, to me.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
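For concreteness, the per-domain variant could be shaped roughly like the sketch below. Every name and hook point here is an assumption made for illustration; the thread had not yet settled on the real ones.

    /* Sketch only; all names are hypothetical. */

    /* 1. In struct domain (xen/include/xen/sched.h): one head shared by
     *    all vcpus of the domain that hit a paged-out gfn. */
    struct waitqueue_head paging_wq;

    /* 2. At domain creation (xen/common/domain.c): */
    init_waitqueue_head(&d->paging_wq);

    /* 3. On page-in completion, i.e. where the gfn is re-added to the
     *    guest physmap: wake every waiter; false wakeups merely retry. */
    wake_up(&d->paging_wq);

A single per-domain head trades some false wakeups for simplicity; the hashed array of waitqueue_heads mentioned earlier would reduce those at the cost of extra bookkeeping.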
On Mon, Nov 15, Keir Fraser wrote:

> On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> > Will it
> > need per-vcpu stacks? (and will they, in turn, use order>0 allocations? :))
>
> Of a sort. I propose to keep the per-pcpu stacks and then copy context
> to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call
> stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will
> suffice.

Keir,

in my testing the BUG_ON in __prepare_to_wait() triggers, 1500 is too
small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
3K would be enough as well.
How large can the stack get, is there an upper limit?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Jan Beulich wrote:

> >>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, Nov 18, Keir Fraser wrote:
> >
> >> I've done something along these lines now as xen-unstable:22402. It actually
> >> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> >> now.
> >
> > This is my first attempt to do it.
>
> I didn't look in detail whether that's being done in a non-intuitive
> way elsewhere, but I can't see how the event you're waiting on
> would ever get signaled - wouldn't you need to pass it into
> __hvm_copy() and further down from there?

I was relying on the kind-of wakeup in p2m_mem_paging_resume().

There will be a new patch shortly.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Keir Fraser wrote:

> On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:
>
> > Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> > shared with the code that pages in, so that vcpus can be *woken* at some point
> > after they start waiting. As it is, if a vcpu does sleep on its local
> > waitqueue_head, it will never wake. You might start with a single global
> > waitqueue_head and wake everyone on it every time a page (or maybe a page
> > batch) is paged in. More sophisticated might be to hash page numbers into an
> > array of waitqueue_heads, to reduce false wakeups.
>
> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
> the code that adds paged-in entries to the guest physmap. That would seem a
> pretty sensible way to proceed, to me.

That's what I'm doing right now.

It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
for this. I was messing with wait_event() until I realized that the vcpu
is stopped by p2m_mem_paging_populate() already and the wake_up() ran
before the vcpu got a chance to call schedule().

If a vcpu happens to be scheduled and the domain is destroyed, the
BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
there is still an entry in the list? The cleanup should handle this
situation so that it does not crash Xen itself.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 03/12/2010 01:03, "Olaf Hering" <olaf@aepfle.de> wrote:
> in my testing the BUG_ON in __prepare_to_wait() triggers, 1500 is too
> small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
> 3K would be enough as well.
> How large can the stack get, is there an upper limit?

It can get pretty deep with nested interrupts. I wouldn't expect a guest's
hypercall stack to get very deep at all. Send me a BUG_ON() backtrace. That
said, making the saved-stack area bigger isn't really a problem.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
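The check being discussed is essentially "does the live hypercall stack fit into the per-vcpu save area". A rough sketch of that logic follows; the structure layout, the helpers get_stack_top() and esp_snapshot(), and the constant are all illustrative assumptions, not the actual common/wait.c code.

    /* Sketch: per-vcpu save area for the setjmp-like context switch. */
    #define WQ_SAVE_SIZE 3072                /* 1500 was too small in testing */

    struct waitqueue_vcpu {
        struct list_head list;               /* entry on a waitqueue_head    */
        struct vcpu *vcpu;
        void *esp;                           /* stack pointer at wait time   */
        char stack[WQ_SAVE_SIZE];            /* copy of the in-use stack     */
    };

    /* In __prepare_to_wait() (sketch): everything between the current stack
     * pointer and the top of the per-pcpu stack must be preserved so the
     * vcpu can later be resumed from the same point. */
    void *esp = esp_snapshot();                              /* hypothetical */
    unsigned long used = (char *)get_stack_top() - (char *)esp;
    BUG_ON(used > WQ_SAVE_SIZE);             /* the BUG_ON that fired here   */
    memcpy(wqv->stack, esp, used);
    wqv->esp = esp;

The upper bound on "used" is whatever the deepest hypercall path consumes before it sleeps, which is why a couple of kB of slack is cheap insurance.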
On 03/12/2010 01:06, "Olaf Hering" <olaf@aepfle.de> wrote:
>> I didn't look in detail whether that's being done in a non-intuitive
>> way elsewhere, but I can't see how the event you're waiting on
>> would ever get signaled - wouldn't you need to pass it into
>> __hvm_copy() and further down from there?
>
> I was relying on the kind-of wakeup in p2m_mem_paging_resume().
>
> There will be a new patch shortly.

vcpu_pause() is nestable and counted. So the vcpu_unpause() on
MEM_EVENT_FLAG_VCPU_PAUSED will not be enough to wake up a vcpu that is also
paused on a waitqueue. Once the vcpu is asleep on a waitqueue it definitely
needs wake_up() to wake it.

Of course, p2m_mem_paging_resume() is quite likely the right place to put
the wake_up() call. But you do need it in addition to the unpause on the
MEM_EVENT flag.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
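Spelled out, the resume path ends up doing two different things for two different blocking mechanisms. An abbreviated sketch, reusing the hypothetical per-domain head from earlier and eliding the ring handling of the real p2m_mem_paging_resume():

    void p2m_mem_paging_resume(struct domain *d)
    {
        mem_event_response_t rsp;

        /* ... fetch the response from the ring and restore the p2m entry
         * for rsp.gfn so the page is usable again ... */

        /* 1. Drop the pause reference taken when the request was posted.
         *    vcpu_pause() is counted, so this only undoes that one pause. */
        if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
            vcpu_unpause(d->vcpu[rsp.vcpu_id]);

        /* 2. Additionally wake any vcpu sleeping on the waitqueue inside
         *    __hvm_copy(); the unpause above does nothing for these. */
        wake_up(&d->paging_wq);
    }

A vcpu that was both event-paused and asleep on the queue only resumes once both conditions clear, which is exactly the counted-pause behaviour described above.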
On 03/12/2010 01:14, "Olaf Hering" <olaf@aepfle.de> wrote:
>> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
>> the code that adds paged-in entries to the guest physmap. That would seem a
>> pretty sensible way to proceed, to me.
>
> That's what I'm doing right now.
>
> It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
> for this. I was messing with wait_event() until I realized that the vcpu
> is stopped by p2m_mem_paging_populate() already and the wake_up() ran
> before the vcpu got a chance to call schedule().

Hm, not sure what you mean. The vcpu does not get synchronously stopped by
_paging_populate(). Maybe you are confused.

> If a vcpu happens to be scheduled and the domain is destroyed, the
> BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
> there is still an entry in the list? The cleanup should handle this
> situation so that it does not crash Xen itself.

You'll get a crash if a vcpu is on a waitqueue when you kill the domain.
Yes, the destroydomain path needs code to handle that. It'll get added, once
I see an actual user of this waitqueue stuff. There are a few other places
that need fixing up like destroydomain, too.

I don't know what you mean by 'vcpu is scheduled and the domain is
destroyed' causing the BUG_ON(). If a vcpu is scheduled and running then
presumably it is not on a waitqueue.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Keir Fraser wrote:

> Since your current patch is conceptually quite broken anyway, there is
> little point in chasing down the crash. It might have something to do with
> allocating the waitqueue_head on the local stack -- which you would never
> want to do in a correct usage of waitqueues. So, back to square one and try
> again I'm afraid.

Keir,

yesterday I sent out my patch queue for xen-unstable. I think the approach
of making the active vcpu wait in p2m_mem_paging_populate() and waking it up
in p2m_mem_paging_resume() could work.
However, something causes what looks like stack corruption.

Any idea what's going on?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 07/12/2010 09:25, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Dec 02, Keir Fraser wrote:
>
>> Since your current patch is conceptually quite broken anyway, there is
>> little point in chasing down the crash. It might have something to do with
>> allocating the waitqueue_head on the local stack -- which you would never
>> want to do in a correct usage of waitqueues. So, back to square one and try
>> again I'm afraid.
>
> Keir,
>
> yesterday I sent out my patch queue for xen-unstable. I think the approach
> of making the active vcpu wait in p2m_mem_paging_populate() and waking it up
> in p2m_mem_paging_resume() could work.
> However, something causes what looks like stack corruption.
>
> Any idea what's going on?

No, I did some unit testing of the waitqueue stuff and it worked for me.
Perhaps you can suggest some reproduction steps.

 K.

> Olaf
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Dec 07, Keir Fraser wrote:

> No, I did some unit testing of the waitqueue stuff and it worked for me.
> Perhaps you can suggest some reproduction steps.

The patches 1 - 13 I sent out need to be applied.

My config for a SLES11-SP1-x86_64 guest looks like this; 1 vcpu appears
to make it crash faster:

# /etc/xen/vm/sles11_0
name="sles11_0"
description="None"
uuid="756210f5-cc53-2bc6-7db2-a0cefca17c0b"
memory=1024
maxmem=1024
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
keymap="de"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'file:/abuild/vdisk-sles11_0-disk0,hda,w',
       'file:/abuild/vdisk-sles11_0-disk1,hdb,w',
       'file:/abuild/bootiso-xenpaging-sles11_0.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:e0:f1:08:15:00,bridge=br0,model=rtl8139,type=netfront', ]
stdvga=0
vnc=1
vncunused=1
extid=0
acpi=1
pae=1
serial="pty"

The guest does not get very far, so all the IO part probably does not
matter. I stop the guest in grub, then run 'xenpaging 1 -1'. With patch
#13 only the guest's pagetables get paged once the kernel is started from
grub. For me it crashes in less than 255 populate/resume cycles.

Does that help to reproduce the crash?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Tue, Dec 07, Keir Fraser wrote:
>
>> No, I did some unit testing of the waitqueue stuff and it worked for me.
>> Perhaps you can suggest some reproduction steps.
>
> The patches 1 - 13 I sent out need to be applied.

I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
find time to have a go. I assume I should start the guest paused, attach
xenpaging, then when I unpause the guest it should crash the host straight
away pretty much?

 -- Keir

> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Dec 07, Keir Fraser wrote:

> On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > On Tue, Dec 07, Keir Fraser wrote:
> >
> >> No, I did some unit testing of the waitqueue stuff and it worked for me.
> >> Perhaps you can suggest some reproduction steps.
> >
> > The patches 1 - 13 I sent out need to be applied.
>
> I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
> find time to have a go. I assume I should start the guest paused, attach
> xenpaging, then when I unpause the guest it should crash the host straight
> away pretty much?

My testhost had hardware issues today, so I could not proceed with testing.

What I did was:

sync
xm create /etc/xen/vm/sles11_1 && xm vnc sles11_1 &
sleep 1
xenpaging 1 -1 &

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel