Olaf Hering
2011-Nov-08 21:20 UTC
[Xen-devel] Need help with fixing the Xen waitqueue feature
The patch 'mem_event: use wait queue when ring is full' I just sent out makes use of the waitqueue feature. There are two issues I get with the change applied:

I think I got the logic right, and in my testing vcpu->pause_count drops to zero in p2m_mem_paging_resume(). But for some reason the vcpu does not make progress after the first wakeup. In my debugging there is one wakeup, the ring is still full, but further wakeups don't happen. The fully decoded xentrace output may provide some hints about the underlying issue, but it's hard to get due to the second issue.

Another thing is that sometimes the host suddenly reboots without any message. I think the reason for this is that a vcpu whose stack was put aside and that was later resumed may find itself on another physical cpu. And if that happens, wouldn't that invalidate some of the local variables back in the call chain? If some of them point to the old physical cpu, how could this be fixed? Perhaps a few "volatiles" are needed in some places.

I will check whether pinning the guest's vcpus to physical cpus actually avoids the sudden reboots.

Olaf
Keir Fraser
2011-Nov-08 22:05 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the call chain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

From how many call sites can we end up on a wait queue? I know we were going to end up with a small and explicit number (e.g., in __hvm_copy()), but does this patch make it a more generally-used mechanism? There will unavoidably be many constraints on callers who want to be able to yield the cpu. We can add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually I don't think it's *that* common that hypercall contexts cache things like per-cpu pointers. But every caller will need auditing, I expect.

A sudden reboot is very extreme. No message even on a serial line? That most commonly indicates bad page tables. With most other bugs you'd at least get a double fault message.

-- Keir
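A hypothetical illustration of the kind of get_cpu/put_cpu bracket being suggested -- none of these names exist in the Xen tree, and the per-vcpu counter is invented purely for the sketch:

/* Hypothetical sketch only: bracket code that caches per-cpu state so
 * that an attempt to sleep on a waitqueue (and possibly resume on a
 * different physical cpu) inside the bracket can be caught.  Neither
 * get_cpu()/put_cpu() nor the no_sleep_count field exist in Xen. */
static inline unsigned int get_cpu(void)
{
    current->no_sleep_count++;          /* invented per-vcpu counter */
    return smp_processor_id();
}

static inline void put_cpu(void)
{
    ASSERT(current->no_sleep_count > 0);
    current->no_sleep_count--;
}

/* ...and prepare_to_wait() would then ASSERT(!current->no_sleep_count). */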
Olaf Hering
2011-Nov-08 22:20 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Keir Fraser wrote:
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > Another thing is that sometimes the host suddenly reboots without any
> > message. I think the reason for this is that a vcpu whose stack was put
> > aside and that was later resumed may find itself on another physical
> > cpu. And if that happens, wouldn't that invalidate some of the local
> > variables back in the call chain? If some of them point to the old
> > physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> > needed in some places.
>
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()), but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

I haven't started to audit the callers. In my testing mem_event_put_request() is called from p2m_mem_paging_drop_page() and p2m_mem_paging_populate(). The latter is called from more places.

My plan is to put the sleep into ept_get_entry(), but I'm not there yet. First I want to test waitqueues in a rather simple code path like mem_event_put_request().

> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. With most other bugs you'd at least get a
> double fault message.

There is no output on serial. I boot with this cmdline:

vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin dom0_max_vcpus=2

My base changeset is 24003, the test host is a Xeon X5670 @ 2.93GHz.

Olaf
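For reference, a rough sketch of the shape such a change could take, built on the wait_event() macro from xen/include/xen/wait.h; the waitqueue field and the mem_event_ring_free() helper are placeholders rather than the names used in the actual patch:

/* Sketch only, not the actual patch: a vcpu of the target domain that
 * finds the ring full sleeps until the consumer makes room; vcpus of
 * other domains (e.g. dom0 tools) must not sleep here. */
void mem_event_put_request(struct domain *d, mem_event_request_t *req)
{
    if ( current->domain == d )
        wait_event(d->mem_event.wq, mem_event_ring_free(d) != 0);

    /* ...copy *req into the ring and notify the consumer as before... */
}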
Keir Fraser
2011-Nov-08 22:54 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 08/11/2011 22:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 08, Keir Fraser wrote:
>
>> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> [...]
>>
>> From how many call sites can we end up on a wait queue? I know we were going
>> to end up with a small and explicit number (e.g., in __hvm_copy()), but does
>> this patch make it a more generally-used mechanism? There will unavoidably
>> be many constraints on callers who want to be able to yield the cpu. We can
>> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
>> I don't think it's *that* common that hypercall contexts cache things like
>> per-cpu pointers. But every caller will need auditing, I expect.
>
> I haven't started to audit the callers. In my testing
> mem_event_put_request() is called from p2m_mem_paging_drop_page() and
> p2m_mem_paging_populate(). The latter is called from more places.

Tbh I wonder anyway whether stale hypercall context would be likely to cause a silent machine reboot. Booting with max_cpus=1 would eliminate moving between CPUs as a cause of inconsistencies, or pin the guest under test. Another problem could be sleeping with locks held, but we do test for that (in debug builds at least) and I'd expect a crash/hang rather than a silent reboot. Another problem could be if the vcpu's own state is temporarily inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is then made to restore it during a waitqueue wakeup. That could certainly cause a reboot, but I don't know of an example where this might happen.

-- Keir

> My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
> First I want to test waitqueues in a rather simple code path like
> mem_event_put_request().
>
>> A sudden reboot is very extreme. No message even on a serial line? That most
>> commonly indicates bad page tables. With most other bugs you'd at least get a
>> double fault message.
>
> There is no output on serial. I boot with this cmdline:
> vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
> sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
> dom0_max_vcpus=2
> My base changeset is 24003, the test host is a Xeon X5670 @ 2.93GHz.
>
> Olaf
Andres Lagar-Cavilla
2011-Nov-09 03:37 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Olaf,
are waitqueues on the mem-event ring meant to be the way to deal with ring exhaustion? i.e. is this meant to go beyond a testing vehicle for waitqueues?

With the pager itself generating events, and foreign mappings generating events, you'll end up putting dom0 vcpus in a waitqueue. This will basically deadlock the host. Am I missing something here?

Andres

> Date: Tue, 8 Nov 2011 22:20:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> Subject: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: xen-devel@lists.xensource.com
> Message-ID: <20111108212024.GA5276@aepfle.de>
> Content-Type: text/plain; charset=utf-8
>
> [...]
Andres Lagar-Cavilla
2011-Nov-09 03:52 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
> Date: Tue, 08 Nov 2011 22:05:41 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: Olaf Hering <olaf@aepfle.de>, <xen-devel@lists.xensource.com>
> Message-ID: <CADF5835.245E1%keir.xen@gmail.com>
> Content-Type: text/plain; charset="US-ASCII"
>
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> [...]
>
> From how many call sites can we end up on a wait queue? I know we were
> going to end up with a small and explicit number (e.g., in __hvm_copy())
> but does this patch make it a more generally-used mechanism? There will
> unavoidably be many constraints on callers who want to be able to yield
> the cpu. We can add Linux-style get_cpu/put_cpu abstractions to catch
> some of them. Actually I don't think it's *that* common that hypercall
> contexts cache things like per-cpu pointers. But every caller will need
> auditing, I expect.

Tbh, for paging to be effective, we need to be prepared to yield on every p2m lookup.

Let's compare paging to PoD. They're essentially the same thing: pages disappear, and get allocated on the fly when you need them. PoD is a highly optimized in-hypervisor optimization that does not need a user-space helper -- but the pager could do PoD easily and remove all that p2m-pod.c code from the hypervisor.

PoD only introduces extraneous side-effects when there is a complete absence of memory to allocate pages. The same cannot be said of paging, to put it mildly. It returns EINVAL all over the place. Right now, qemu can be crashed in a blink by paging out the right gfn.

To get paging to where PoD is, all these situations need to be handled in a manner other than returning EINVAL. That means putting the vcpu on a waitqueue at every location where p2m_pod_demand_populate() is called, not just __hvm_copy(). I don't know that that's gonna be altogether doable. Many of these gfn lookups happen in atomic contexts, not to mention cpu-specific pointers. But at least we should aim for that.

Andres

> A sudden reboot is very extreme. No message even on a serial line? That
> most commonly indicates bad page tables. With most other bugs you'd at
> least get a double fault message.
>
> -- Keir
Olaf Hering
2011-Nov-09 07:02 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:
> Olaf,
> are waitqueues on the mem-event ring meant to be the way to deal with
> ring exhaustion? i.e. is this meant to go beyond a testing vehicle for
> waitqueues?

Putting the guest to sleep when the ring is full is at least required for p2m_mem_paging_drop_page(), so that the pager gets informed about all gfns from decrease_reservation.

> With the pager itself generating events, and foreign mappings generating
> events, you'll end up putting dom0 vcpus in a waitqueue. This will
> basically deadlock the host.

Those vcpus can not go to sleep, and my change handles that case.

Olaf
Olaf Hering
2011-Nov-09 07:09 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:
> Tbh, for paging to be effective, we need to be prepared to yield on every
> p2m lookup.

Yes, if a gfn is missing the vcpu should go to sleep rather than returning -ENOENT to the caller. Only the query part of gfn_to_mfn should return the p2m paging types.

> Let's compare paging to PoD. They're essentially the same thing: pages
> disappear, and get allocated on the fly when you need them. PoD is a
> highly optimized in-hypervisor optimization that does not need a
> user-space helper -- but the pager could do PoD easily and remove all that
> p2m-pod.c code from the hypervisor.

Perhaps PoD and paging could be merged; I haven't had time to study the PoD code.

> PoD only introduces extraneous side-effects when there is a complete
> absence of memory to allocate pages. The same cannot be said of paging, to
> put it mildly. It returns EINVAL all over the place. Right now, qemu can
> be crashed in a blink by paging out the right gfn.

I have seen qemu crashes when using emulated storage, but haven't debugged them yet. I suspect they were caused by a race between nominate and evict.

Olaf
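A sketch of that idea for the lookup path; p2m_is_paging() and p2m_mem_paging_populate() are real, but the waitqueue, the gfn_is_paged() helper and the lookup wrapper are illustrative only:

/* Sketch only: a guest vcpu that hits a paged-out gfn kicks the pager
 * and sleeps until the page is resident again, instead of the caller
 * seeing -ENOENT.  Query-style lookups and foreign vcpus (dom0 tools,
 * the pager itself) keep seeing the paging type so they can make
 * progress. */
do {
    mfn = lookup_gfn(d, gfn, &p2mt);          /* placeholder for gfn_to_mfn */
    if ( !p2m_is_paging(p2mt) || current->domain != d )
        break;
    p2m_mem_paging_populate(d, gfn);          /* argument list may differ */
    wait_event(d->paging_wq, !gfn_is_paged(d, gfn));  /* placeholder waitqueue */
} while ( 1 );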
Andres Lagar-Cavilla
2011-Nov-09 21:21 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Hi there,

> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.

Well, PoD can be implemented with a pager that simply shortcuts the step that actually populates the page with contents. A zeroed heap page is good enough. It's fairly simple for a pager to know for which pages it should return zero.

PoD also does emergency sweeps under memory pressure to identify zeroes; that could be easily implemented by a user-space utility.

The hypervisor code keeps a list of 2M superpages -- that feature would be lost. But I doubt this would fly anyway: PoD works for non-EPT modes, which I guess don't want to lose that functionality.

Andres
Andres Lagar-Cavilla
2011-Nov-09 21:30 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Also,

> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
> [...]
>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging, to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.

After a bit of thinking, things are far more complicated. I don't think this is a "race." If the pager removed a page that later gets scheduled by the guest OS for IO, qemu will want to foreign-map that. With the hypervisor returning ENOENT, the foreign map will fail, and there goes qemu. The same will happen for pv backends mapping grants, or for the checkpoint/migrate code.

I guess qemu/migrate/libxc could retry until the pager is done and the mapping succeeds. It will be delicate. It won't work for pv backends. It will flood the mem_event ring. Wait-queueing the dom0 vcpu is a no-go -- the machine will deadlock quickly.

My thinking is that the best bet is to wait-queue the dom0 process. The dom0 kernel code handling the foreign map will need to put the mapping thread on a wait queue. It can establish a ring-based notification mechanism with Xen. When Xen completes the page-in, it can add a notification to the ring. dom0 can then wake the mapping thread and retry. Not simple at all.

Ideas out there?

Andres
Olaf Hering
2011-Nov-09 22:11 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> After a bit of thinking, things are far more complicated. I don't think
> this is a "race." If the pager removed a page that later gets scheduled by
> the guest OS for IO, qemu will want to foreign-map that. With the
> hypervisor returning ENOENT, the foreign map will fail, and there goes
> qemu.

The tools are supposed to catch ENOENT and try again. linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map() appears to do that as well. What code path does qemu use that leads to a crash?

> I guess qemu/migrate/libxc could retry until the pager is done and the
> mapping succeeds. It will be delicate. It won't work for pv backends. It
> will flood the mem_event ring.

There will be no flood; only one request is sent per gfn in p2m_mem_paging_populate().

Olaf
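The retry pattern being referred to, paraphrased (the real linux_privcmd_map_foreign_bulk() tracks per-page errors and retries only the slots that came back as -ENOENT; structure and header details are omitted here):

/* Paraphrase, not the literal libxc code: with IOCTL_PRIVCMD_MMAPBATCH_V2
 * a paged-out gfn shows up as -ENOENT, which simply means "not resident
 * yet, try again", so the mapper keeps retrying until the pager has
 * brought the page back. */
static int map_foreign_with_retry(int fd, struct privcmd_mmapbatch_v2 *batch)
{
    int rc;

    do {
        rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, batch);
    } while ( rc < 0 && errno == ENOENT );

    return rc;
}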
Andres Lagar-Cavilla
2011-Nov-10 04:29 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Olaf,

> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
>
>> After a bit of thinking, things are far more complicated. I don't think
>> this is a "race." If the pager removed a page that later gets scheduled
>> by the guest OS for IO, qemu will want to foreign-map that. With the
>> hypervisor returning ENOENT, the foreign map will fail, and there goes
>> qemu.
>
> The tools are supposed to catch ENOENT and try again.
> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> appears to do that as well. What code path does qemu use that leads to a
> crash?

The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?

And for backend drivers implemented in the kernel (netback, etc.), there is no retrying.

All those ram_paging types and their interactions give me a headache, but I'll trust you that only one event is put in the ring.

I'm using 24066:54a5e994a241. I start Windows 7 and make xenpaging try to evict 90% of the RAM; qemu lasts for about two seconds. Linux fights harder, but qemu also dies. No pv drivers. I haven't been able to trace back the qemu crash (segfault on a NULL ide_if field for a dma callback) to the exact paging action yet, but no crashes without paging.

Andres

>> I guess qemu/migrate/libxc could retry until the pager is done and the
>> mapping succeeds. It will be delicate. It won't work for pv backends. It
>> will flood the mem_event ring.
>
> There will be no flood; only one request is sent per gfn in
> p2m_mem_paging_populate().
>
> Olaf
Jan Beulich
2011-Nov-10 09:20 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc.

Seems like nobody cared to port over the code from the old 2.6.18 tree (or the forward ports thereof).

> Which dom0 kernel are you using?

Certainly one of our forward port kernels.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

As above, seems like nobody cared to forward port those bits either.

Jan
Keir Fraser
2011-Nov-10 09:26 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On 10/11/2011 04:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:

>> The tools are supposed to catch ENOENT and try again.
>> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
>> appears to do that as well. What code path does qemu use that leads to a
>> crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?
>
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

Getting this working without a new Linux kernel -- and with as-yet-to-be-written new stuff in it -- is unlikely to be on the cards, is it?

I think you suggested an in-kernel mechanism to wait for page-in and then retry the mapping. If that could be used by the in-kernel drivers and exposed via our privcmd interface for qemu and the rest of userspace too, that may be the best single solution. Perhaps it could be largely hidden behind the existing privcmd-mmap ioctls.

-- Keir
Olaf Hering
2011-Nov-10 10:18 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> Olaf,
>
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path does qemu use that leads to a
> > crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. It's really odd that there is no ENOENT handling in mainline; I will go and check the code.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling; perhaps that change was never forwarded to pvops, at least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start Windows 7 and make xenpaging try to
> evict 90% of the RAM; qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops it may need some audit to check the ENOENT handling.

Olaf
Olaf Hering
2011-Nov-10 12:05 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, Olaf Hering wrote:
> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> > The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> > it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?
> I'm running SLES11 as dom0. It's really odd that there is no ENOENT
> handling in mainline; I will go and check the code.

xen_remap_domain_mfn_range() has to catch -ENOENT returned from HYPERVISOR_mmu_update() and return it to its callers. Then drivers/xen/xenfs/privcmd.c:traverse_pages() will do the right thing. See http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/0051d294bb60

The granttable part needs more changes, see http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/7c7efaea8b54

Olaf
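A heavily simplified sketch of the change being described; the real xen_remap_domain_mfn_range() batches its updates, so the actual patch looks different:

/* Sketch only: the MMU hypercall returns -ENOENT for a gfn the pager has
 * evicted, and that value must be passed back unchanged so that
 * privcmd's traverse_pages() -- and ultimately userspace -- can retry,
 * instead of being flattened into a generic failure. */
static int remap_one_gfn(pte_t *ptep, unsigned long long mpte, domid_t domid)
{
    struct mmu_update u;

    u.ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
    u.val = mpte;                       /* machine pte for the foreign gfn */

    return HYPERVISOR_mmu_update(&u, 1, NULL, domid);   /* may be -ENOENT */
}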
Andres Lagar-Cavilla
2011-Nov-10 13:57 UTC
Re: [Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try to cherry-pick those into my dom0 (Debian mainline 3.0). Somebody else should get those into mainline, though. Soonish :)

Andres

>>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
>> it isn't on mainline Linux 3.0, 3.1, etc.
>
> Seems like nobody cared to port over the code from the old 2.6.18 tree
> (or the forward ports thereof).
>
>> Which dom0 kernel are you using?
>
> Certainly one of our forward port kernels.
>
>> And for backend drivers implemented in the kernel (netback, etc), there is
>> no retrying.
>
> As above, seems like nobody cared to forward port those bits either.
>
> Jan
Konrad Rzeszutek Wilk
2011-Nov-10 15:32 UTC
Re: [Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, 2011 at 05:57:18AM -0800, Andres Lagar-Cavilla wrote:
> Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try
> to cherry-pick those into my dom0 (Debian mainline 3.0). Somebody else
> should get those into mainline, though. Soonish :)

Well, could you post them once you have cherry-picked them? Thanks.

> Andres
>
> [...]
Olaf Hering
2011-Nov-11 22:56 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
Keir,

just to dump my findings to the list:

On Tue, Nov 08, Keir Fraser wrote:

> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> between CPUs as a cause of inconsistencies, or pin the guest under test.
> Another problem could be sleeping with locks held, but we do test for that
> (in debug builds at least) and I'd expect a crash/hang rather than a silent
> reboot. Another problem could be if the vcpu's own state is temporarily
> inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is
> then made to restore it during a waitqueue wakeup. That could certainly
> cause a reboot, but I don't know of an example where this might happen.

The crashes also happen with maxcpus=1 and a single guest cpu. Today I added wait_event to ept_get_entry and this works.

But at some point the codepath below is executed, after that wake_up the host hangs hard. I will trace it further next week, maybe the backtrace gives a clue what the cause could be.

Also, the 3K stacksize is still too small, this path uses 3096.

(XEN) prep 127a 30 0
(XEN) wake 127a 30
(XEN) prep 1cf71 30 0
(XEN) wake 1cf71 30
(XEN) prep 1cf72 30 0
(XEN) wake 1cf72 30
(XEN) prep 1cee9 30 0
(XEN) wake 1cee9 30
(XEN) prep 121a 30 0
(XEN) wake 121a 30

(This means 'gfn (p2m_unshare << 4) in_atomic')

(XEN) prep 1ee61 20 0
(XEN) max stacksize c18
(XEN) Xen WARN at wait.c:126
(XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830201f76000 rcx: 0000000000000000
(XEN) rdx: ffff82c4802b7f18 rsi: 000000000000000a rdi: ffff82c4802673f0
(XEN) rbp: ffff82c4802b73a8 rsp: ffff82c4802b7378 r8: 0000000000000000
(XEN) r9: ffff82c480221da0 r10: 00000000fffffffa r11: 0000000000000003
(XEN) r12: ffff82c4802b7f18 r13: ffff830201f76000 r14: ffff83003ea5c000
(XEN) r15: 000000000001ee61 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000020336d000 cr2: 00007fa88ac42000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7378:
(XEN) 0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
(XEN) ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
(XEN) ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
(XEN) 0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
(XEN) 000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
(XEN) 0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
(XEN) ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
(XEN) 00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
(XEN) ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
(XEN) ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
(XEN) 000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
(XEN) ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
(XEN) ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
(XEN) ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
(XEN) ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
(XEN) 0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
(XEN) 0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
(XEN) ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
(XEN) 00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
(XEN) ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
(XEN) Xen call trace:
(XEN) [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
(XEN) [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
(XEN) [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
(XEN) [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
(XEN) [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
(XEN) [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
(XEN) [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
(XEN) [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
(XEN) [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
(XEN) [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
(XEN) [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
(XEN) [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
(XEN)
(XEN) wake 1ee61 20
Keir Fraser
2011-Nov-12 07:00 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:

> Keir,
>
> just to dump my findings to the list:
>
> On Tue, Nov 08, Keir Fraser wrote:
>
>> Tbh I wonder anyway whether stale hypercall context would be likely to cause
>> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
>> between CPUs as a cause of inconsistencies, or pin the guest under test.
>> Another problem could be sleeping with locks held, but we do test for that
>> (in debug builds at least) and I'd expect a crash/hang rather than a silent
>> reboot. Another problem could be if the vcpu's own state is temporarily
>> inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is
>> then made to restore it during a waitqueue wakeup. That could certainly
>> cause a reboot, but I don't know of an example where this might happen.
>
> The crashes also happen with maxcpus=1 and a single guest cpu.
> Today I added wait_event to ept_get_entry and this works.
>
> But at some point the codepath below is executed, after that wake_up the
> host hangs hard. I will trace it further next week, maybe the backtrace
> gives a clue what the cause could be.

So you run with a single CPU, and with wait_event() in one location, and that works for a while (actually doing full waitqueue work: executing wait() and wake_up()), but then hangs? That's weird, but pretty interesting if I've understood correctly.

> Also, the 3K stacksize is still too small, this path uses 3096.

I'll allocate a whole page for the stack then.

-- Keir

> [...]
On Sat, Nov 12, Keir Fraser wrote:
> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > The crashes also happen with maxcpus=1 and a single guest cpu.
> > Today I added wait_event to ept_get_entry and this works.
> >
> > But at some point the codepath below is executed, after that wake_up the
> > host hangs hard. I will trace it further next week, maybe the backtrace
> > gives a clue what the cause could be.
>
> So you run with a single CPU, and with wait_event() in one location, and
> that works for a while (actually doing full waitqueue work: executing wait()
> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
> understood correctly.

Yes, that's what happens with a single cpu in dom0 and domU. I have added some more debug. After the backtrace below I see one more call to check_wakeup_from_wait() for dom0, then the host hangs hard.

> > Also, the 3K stacksize is still too small, this path uses 3096.
>
> I'll allocate a whole page for the stack then.

Thanks.

Olaf

> > [...]
On 22/11/2011 11:40, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Sat, Nov 12, Keir Fraser wrote:
>> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>> So you run with a single CPU, and with wait_event() in one location, and
>> that works for a while (actually doing full waitqueue work: executing wait()
>> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
>> understood correctly.
>
> Yes, that's what happens with a single cpu in dom0 and domU.
> I have added some more debug. After the backtrace below I see one more
> call to check_wakeup_from_wait() for dom0, then the host hangs hard.

I think I checked before, but: also unresponsive to serial debug keys? And dom0 isn't getting put on a waitqueue, I assume? Since I guess dom0 is doing the work to wake things from the waitqueue, that couldn't go well. :-)

>>> Also, the 3K stacksize is still too small, this path uses 3096.
>>
>> I'll allocate a whole page for the stack then.
>
> Thanks.

Forgot about it. Done now!

-- Keir

> Olaf
On Tue, Nov 22, Keir Fraser wrote:
> I think I checked before, but: also unresponsive to serial debug keys?

Good point, I will check that. So far I haven't used these keys.

> Forgot about it. Done now!

What about domain_crash() instead of BUG_ON() in __prepare_to_wait()? If the stack size were checked before it is copied, the hypervisor could survive.

Olaf
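A sketch of that suggestion (not the patch Keir attaches in his reply): measure the live stack frame first, and take down only the offending domain if it will not fit.

/* Sketch only, not actual wait.c code: check how much stack would have
 * to be saved before copying it into the per-vcpu buffer, and call
 * domain_crash() instead of BUG_ON() if it does not fit. */
static int save_stack(struct waitqueue_vcpu *wqv, unsigned long esp)
{
    unsigned long used = (unsigned long)get_cpu_info() - esp;

    if ( used > sizeof(wqv->stack) )    /* buffer was 3K here, a page later */
    {
        gdprintk(XENLOG_ERR, "waitqueue stack frame too large (%lu bytes)\n",
                 used);
        domain_crash(current->domain);
        return -1;
    }

    memcpy(wqv->stack, (void *)esp, used);
    return 0;
}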
On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I think I checked before, but: also unresponsive to serial debug keys?
>
> Good point, I will check that. So far I haven't used these keys.

If they work then 'd' will give you a backtrace on every CPU, and 'q' will dump domain and vcpu states. That should make things easier!

>> Forgot about it. Done now!
>
> What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
> If the stack size were checked before it is copied, the hypervisor could
> survive.

Try the attached patch (please also try reducing the size of the new parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the domain-crashing path).

-- Keir

> Olaf
On Wed, Nov 9, 2011 at 9:21 PM, Andres Lagar-Cavilla <andres@lagarcavilla.org> wrote:
> PoD also does emergency sweeps under memory pressure to identify zeroes;
> that could be easily implemented by a user-space utility.

PoD is certainly a special-case, hypervisor-handled version of paging. The main question is whether a user-space version can be made to perform well enough. My guess is that it can, but it's far from certain. If it can, I'm all in favor of making paging handle PoD.

> The hypervisor code keeps a list of 2M superpages -- that feature would be
> lost.

This is actually pretty important; Windows scrubs memory on boot, so it's guaranteed that the majority of the memory will be touched and re-populated.

> But I doubt this would fly anyway: PoD works for non-EPT modes, which I
> guess don't want to lose that functionality.

Is there a particular reason we can't do paging on the shadow code? I thought it was just that doing HAP was simpler to get started with. That would be another blocker to getting rid of PoD, really.

-George
On 22/11/2011 15:07, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>> On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> I think I checked before, but: also unresponsive to serial debug keys?
>>>
>>> Good point, I will check that. So far I haven't used these keys.
>>
>> If they work then 'd' will give you a backtrace on every CPU, and 'q' will
>> dump domain and vcpu states. That should make things easier!
>
> They do indeed work. The backtrace below is from another system.
> Looks like hpet_broadcast_exit() is involved.
>
> Does that output below give any good hints?

It tells us that the hypervisor itself is in good shape. The deterministic RIP in hpet_broadcast_exit() is simply because the serial rx interrupt is always waking us from the idle loop. That RIP value will simply be the first possible interruption point after the HLT instruction.

I have a new theory, which is that if we go round the for-loop in wait_event() more than once, the vcpu's pause counter gets messed up and goes negative, condemning it to sleep forever. I have *just* pushed a change to the debug 'q' key (ignore the changeset comment referring to the 'd' key, I got that wrong!) which will print per-vcpu and per-domain pause_count values. Please get the system stuck again, and send the output from the 'q' key with that new changeset (c/s 24178).

Finally, I don't really know what the prep/wake/done messages from your logs mean, as you didn't send the patch that prints them.

-- Keir

>> Try the attached patch (please also try reducing the size of the new
>> parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
>> domain-crashing path).
>
> Thanks, I will try it.
>
> Olaf
>
> ..........
>
> (XEN) 'q' pressed -> dumping domain info (now=0x5E:F50D77F8)
> (XEN) General information for domain 0:
> (XEN) refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
> (XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb, d00-ffff }
> (XEN) Interrupts { 0-207 }
> (XEN) I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN) DomPage list too long to display
> (XEN) XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN) VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
> (XEN) 250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN) refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={} max_pages=131328
> (XEN) handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN) paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) PoD entries=0 cachesize=0
> (XEN) XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
> (XEN) paging assistance: hap, 4 levels
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
>
> [...]
>
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218  x86_64  debug=y  Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40 rbx: 000000674742e72d rcx: 0000000000000001
> (XEN) rdx: 0000000000000000 rsi: ffff82c48030f000 rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0 rsp: ffff82c4802bfe78 r8: 000000008c858211
> (XEN) r9: 0000000000000003 r10: ffff82c4803064e0 r11: 000000676bf885a3
> (XEN) r12: ffff83021e70e840 r13: ffff83021e70e8d0 r14: 00000067471bdb62
> (XEN) r15: ffff82c48030e440 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000 cr2: 0000000000beb000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN) ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN) 0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN) 0000152900006fe3 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN) ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN) ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN) 0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN) 8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN) ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN) 00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN) 000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN) 000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN) 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
>
> [...]
On 22/11/2011 15:40, "Keir Fraser" <keir@xen.org> wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

Further to this, can you please try moving the call to __prepare_to_wait()
from just after the spinlock region to just before it (i.e., immediately
after the ASSERT) in prepare_to_wait(). That could well make things work
better, on UP at least -- for SMP systems I will also need to fix the broken
usage of wqv->esp...

 -- Keir
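A rough sketch of the ordering being suggested, assuming prepare_to_wait() in
common/wait.c is structured approximately as below (waitqueue_vcpu and
vcpu_pause_nosync reflect the existing wait.c design; details may differ from
the actual tree):

    void prepare_to_wait(struct waitqueue_head *wq)
    {
        struct vcpu *curr = current;
        struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;

        ASSERT(!in_atomic());

        /* Save the vcpu's context/stack *before* it becomes visible on the
         * waitqueue, so a wakeup cannot race with an incomplete save. */
        __prepare_to_wait(wqv);

        spin_lock(&wq->lock);
        if ( list_empty(&wqv->list) )
        {
            list_add_tail(&wqv->list, &wq->list);
            vcpu_pause_nosync(curr);
        }
        spin_unlock(&wq->lock);
    }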
On Tue, Nov 22, Keir Fraser wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

I have added a check for that, it's not negative.

> I have *just* pushed a change to the debug 'q' key (ignore the changeset
> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
> and per-domain pause_count values. Please get the system stuck again, and
> send the output from 'q' key with that new changeset (c/s 24178).

To me it looks like dom0 gets paused, perhaps due to some uneven
pause/unpause calls. I will see if I can figure it out.

Olaf

(XEN) 'q' pressed -> dumping domain info (now=0xA1:4BC733CC)
(XEN) General information for domain 0:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=5991502 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
(XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
(XEN) Rangesets belonging to domain 0:
(XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-407, 40c-cfb, d00-ffff }
(XEN) Interrupts { 0-303 }
(XEN) I/O Memory { 0-febff, fec01-fec8f, fec91-fedff, fee01-ffffffffffffffff }
(XEN) Memory pages belonging to domain 0:
(XEN) DomPage list too long to display
(XEN) XenPage 000000000036ff8d: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 000000000036ff8c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8b: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8a: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000008befd: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 0:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=1 pause_flags=0
(XEN) 250 Hz periodic timer (period 4 ms)
(XEN) General information for domain 1:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=17549 xenheap_pages=6 dirty_cpus={} max_pages=131328
(XEN) handle=5499728e-7f38-dbb0-b6cc-22866a6864f3 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 1:
(XEN) I/O Ports { }
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN) DomPage list too long to display
(XEN) PoD entries=0 cachesize=0
(XEN) XenPage 0000000000200b7c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000203bfe: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000200b48: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000021291d: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000003ebfc: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000202ef4: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 1:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=0 pause_flags=4
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
(XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
On 22/11/2011 17:36, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I have a new theory, which is that if we go round the for-loop in
>> wait_event() more than once, the vcpu's pause counter gets messed up and
>> goes negative, condemning it to sleep forever.
>
> I have added a check for that, it's not negative.
>
>> I have *just* pushed a change to the debug 'q' key (ignore the changeset
>> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
>> and per-domain pause_count values. Please get the system stuck again, and
>> send the output from 'q' key with that new changeset (c/s 24178).
>
> To me it looks like dom0 gets paused, perhaps due to some uneven
> pause/unpause calls. I will see if I can figure it out.

Could it have ended up on the waitqueue?

 -- Keir
On Tue, Nov 22, Keir Fraser wrote:

> Could it have ended up on the waitqueue?

Unlikely, but I will add checks for that as well.

Olaf
On Tue, Nov 22, Olaf Hering wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> Could it have ended up on the waitqueue?
>
> Unlikely, but I will add checks for that as well.

I posted three changes which make use of the wait queues.
For some reason the code at the very end of p2m_mem_paging_populate()
triggers when d is dom0, so its vcpu is put to sleep.

Olaf
On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Olaf Hering wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> Could it have ended up on the waitqueue?
>>
>> Unlikely, but I will add checks for that as well.
>
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.

We obviously can't have dom0 going to sleep on paging work. This, at least,
isn't a wait-queue bug.

> Olaf
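An illustrative guard for that case -- not the actual patch under discussion,
just a sketch of the idea that only a vcpu belonging to the paged domain may
be paused and left waiting on the ring, so a foreign (dom0) vcpu is never put
to sleep there. Variable names follow common p2m.c usage and are assumptions:

    /* Sketch: tail of p2m_mem_paging_populate(), where 'd' is the target
     * domain and 'v' is current. Only pause a vcpu of the paged domain;
     * the paging client (dom0) must never wait on its own event ring. */
    if ( v->domain == d )
    {
        req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
        vcpu_pause_nosync(v);
    }
    mem_event_put_request(d, &req);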
On Tue, Nov 22, Keir Fraser wrote:

> We obviously can't have dom0 going to sleep on paging work. This, at least,
> isn't a wait-queue bug.

I had to rearrange some code in p2m_mem_paging_populate for my debug
stuff. This led to an uninitialized req, and as a result req.flags
sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
not catch that. Now waitqueues appear to work ok for me. Thanks!

What do you think about C99 initializers in p2m_mem_paging_populate,
just to avoid such mistakes?

  mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

Olaf
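A minimal before/after sketch of the failure mode and the designated-initializer
fix; the two declarations are alternatives shown together for contrast, and the
.gfn field plus surrounding code are illustrative assumptions:

    /* Before: 'req' starts out as stack garbage; any member that is not
     * explicitly assigned (here: req.flags) is undefined, so a rearranged
     * code path can hand stale bits such as MEM_EVENT_FLAG_VCPU_PAUSED
     * to the paging client. */
    mem_event_request_t req;
    req.type = MEM_EVENT_TYPE_PAGING;
    req.gfn  = gfn;                   /* .flags left uninitialized */

    /* After: a C99 designated initializer zeroes every unnamed member. */
    mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn };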
On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>> isn't a wait-queue bug.
>
> I had to rearrange some code in p2m_mem_paging_populate for my debug
> stuff. This led to an uninitialized req, and as a result req.flags
> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> not catch that. Now waitqueues appear to work ok for me. Thanks!

Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
pretty sure that the hypervisor will blow up pretty quickly when you resume
testing with multiple physical CPUs, for example. I need to create a couple
of fixup patches which I will then send to you for test.

By the way, did you test my patch to domain_crash when the stack-save area
isn't large enough?

> What do you think about C99 initializers in p2m_mem_paging_populate,
> just to avoid such mistakes?
>
>   mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

We like them.

 -- Keir
On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>>
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

Good, I will look forward to these fixes.

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?

I ran into the ->esp == 0 case right away, but I need to retest with a
clean tree.

Olaf
On 23/11/2011 17:16, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>>
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

We have quite a big waitqueue problem actually. The current scheme of
per-cpu stacks doesn't work nicely, as the stack pointer will change if a
vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
work nicely with preempted C code, which may be built with frame pointers
and/or arbitrarily take the address of on-stack variables. The result will
be hideous cross-stack corruption, as those frame pointers and cached
addresses of automatic variables will reference the wrong cpu's stack!
Fixing or detecting this in general is not possible afaics.

So we'll have to switch to per-vcpu stacks, probably with separate per-cpu
irq stacks (as a later followup). That's quite a nuisance!

 -- Keir

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?
>
>> What do you think about C99 initializers in p2m_mem_paging_populate,
>> just to avoid such mistakes?
>>
>>   mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
>
> We like them.
>
> -- Keir
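To illustrate the hazard with purely hypothetical code: any preempted C frame
that holds a pointer into its own stack becomes poisonous once the vcpu is
resumed on another pcpu's stack. All names below are made up for the example;
only the general waitqueue behaviour is assumed:

    /* Hypothetical hypercall path, for illustration only. */
    static int fetch_page_and_wait(struct domain *d, unsigned long gfn)
    {
        int rc = -EAGAIN;      /* automatic variable on the current pcpu's stack */
        int *prc = &rc;        /* cached address of that on-stack variable       */

        /* Suppose this ends up sleeping on a waitqueue. With per-cpu stacks
         * the vcpu may be resumed on a different pcpu, i.e. its saved frames
         * are replayed on a different region of memory. */
        wait_for_page(d, gfn); /* hypothetical helper that sleeps on a waitqueue */

        /* After wakeup, 'prc' (and any frame pointer further up the call
         * chain) may still point into the *old* pcpu's stack; writing through
         * it corrupts whatever happens to be running there now. */
        *prc = 0;
        return rc;
    }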
On 23/11/2011 18:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>>> isn't a wait-queue bug.
>>>
>>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>>> stuff. This led to an uninitialized req, and as a result req.flags
>>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>>
>> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
>> pretty sure that the hypervisor will blow up pretty quickly when you resume
>> testing with multiple physical CPUs, for example. I need to create a couple
>> of fixup patches which I will then send to you for test.
>
> Good, I will look forward to these fixes.
>
>> By the way, did you test my patch to domain_crash when the stack-save area
>> isn't large enough?
>
> I ran into the ->esp == 0 case right away, but I need to retest with a
> clean tree.

I think I have a test the wrong way round. This doesn't really matter now
anyway. As I say in my previous email, stack management will have to be
redone for waitqueues.

 -- Keir

> Olaf
On Wed, Nov 23, Keir Fraser wrote:

> We have quite a big waitqueue problem actually. The current scheme of
> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
> work nicely with preempted C code, which may be built with frame pointers
> and/or arbitrarily take the address of on-stack variables. The result will
> be hideous cross-stack corruption, as those frame pointers and cached
> addresses of automatic variables will reference the wrong cpu's stack!
> Fixing or detecting this in general is not possible afaics.

Yes, I was thinking about that wakeup on a different cpu as well.
As a quick fix/hack, perhaps the scheduler could make sure the vcpu
wakes up on the same cpu?

Olaf
On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> We have quite a big waitqueue problem actually. The current scheme of
>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>> work nicely with preempted C code, which may be built with frame pointers
>> and/or arbitrarily take the address of on-stack variables. The result will
>> be hideous cross-stack corruption, as those frame pointers and cached
>> addresses of automatic variables will reference the wrong cpu's stack!
>> Fixing or detecting this in general is not possible afaics.
>
> Yes, I was thinking about that wakeup on a different cpu as well.
> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
> wakes up on the same cpu?

We could save the old affinity and then vcpu_set_affinity(). That will have
to do for now. Actually it should work okay as long as the toolstack doesn't
mess with affinity in the meantime. I'll sort out a patch for this.

 -- Keir

> Olaf
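Roughly what that quick fix could look like, as a sketch only: the
saved_affinity field on waitqueue_vcpu, the helper names, and the exact
cpumask field/types are assumptions, and the real patch may differ:

    /* Before sleeping: remember the current affinity and pin to this pcpu,
     * so the vcpu cannot be resumed on a different pcpu's stack. */
    static void wqv_pin_to_current_cpu(struct waitqueue_vcpu *wqv)
    {
        struct vcpu *curr = current;

        cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
        if ( vcpu_set_affinity(curr, cpumask_of(smp_processor_id())) )
            domain_crash_synchronous();  /* cannot risk waking elsewhere */
    }

    /* After waking / leaving the waitqueue: restore what the toolstack
     * or guest had set before. */
    static void wqv_restore_affinity(struct waitqueue_vcpu *wqv)
    {
        vcpu_set_affinity(current, &wqv->saved_affinity);
    }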
On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Wed, Nov 23, Keir Fraser wrote:
>>
>>> We have quite a big waitqueue problem actually. The current scheme of
>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>> work nicely with preempted C code, which may be built with frame pointers
>>> and/or arbitrarily take the address of on-stack variables. The result will
>>> be hideous cross-stack corruption, as those frame pointers and cached
>>> addresses of automatic variables will reference the wrong cpu's stack!
>>> Fixing or detecting this in general is not possible afaics.
>>
>> Yes, I was thinking about that wakeup on a different cpu as well.
>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>> wakes up on the same cpu?
>
> We could save the old affinity and then vcpu_set_affinity(). That will have
> to do for now. Actually it should work okay as long as the toolstack doesn't
> mess with affinity in the meantime. I'll sort out a patch for this.

Attached three patches for you to try. They apply in sequence.

00: A fixed version of "domain_crash on stack overflow"
01: Reorders prepare_to_wait so that the vcpu will always be on the
    waitqueue on exit (even if it has just been woken).
02: Ensures the vcpu wakes up on the same cpu that it slept on.

We need all of these. Just need testing to make sure they aren't horribly
broken. You should be able to test a multi-processor host again with these.

 -- Keir

> -- Keir
>
>> Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Wed, Nov 23, Keir Fraser wrote:

> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>     waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test a multi-processor host again with these.

Thanks Keir. In a first test they work ok on a multi-processor host.

I get vcpu hangs when I balloon up and down with mem-set. That is most
likely caused by uneven vcpu_pause/unpause calls in my changes, which use
wait queues in mem_event handling and ept_get_entry. I will debug that
further.

After the vcpu hung I killed the guest and tried to start a new one.
Oddly enough I wasn't able to fully kill the guest, it remained in --p--d
state. Most vcpus were in a paused state before that.

In another attempt I was able to run firefox in a guest. But after trying
to open all "latest headlines" in tabs the guest crashed. The qemu-dm log
had a lot of this (but nothing in the xen dmesg):

  track_dirty_vram(f0000000, 12c) failed (-1, 3)

xl vcpu-list shows:

(null) 1 0 - --p 47.3 any cpu
(null) 1 1 12 --- 13.4 any cpu
(null) 1 2 - --p 4.3 any cpu
(null) 1 3 - --p 7.8 any cpu
(null) 1 4 - --p 3.5 any cpu
(null) 1 5 - --p 1.9 any cpu
(null) 1 6 - --p 1.6 any cpu
(null) 1 7 - --p 1.4 any cpu

Hmm, qemu-dm doesn't get killed in all cases, killing it destroys the guest.
I have seen that before already.

I will provide more test results tomorrow.

Olaf
On 23/11/2011 22:30, "Olaf Hering" <olaf@aepfle.de> wrote:

> After the vcpu hung I killed the guest and tried to start a new one.
> Oddly enough I wasn't able to fully kill the guest, it remained in --p--d
> state. Most vcpus were in a paused state before that.

Dying, but kept as a zombie by memory references...

> In another attempt I was able to run firefox in a guest. But after trying
> to open all "latest headlines" in tabs the guest crashed. The qemu-dm log
> had a lot of this (but nothing in the xen dmesg):
>
>   track_dirty_vram(f0000000, 12c) failed (-1, 3)
>
> xl vcpu-list shows:
>
> (null) 1 0 - --p 47.3 any cpu
> (null) 1 1 12 --- 13.4 any cpu
> (null) 1 2 - --p 4.3 any cpu
> (null) 1 3 - --p 7.8 any cpu
> (null) 1 4 - --p 3.5 any cpu
> (null) 1 5 - --p 1.9 any cpu
> (null) 1 6 - --p 1.6 any cpu
> (null) 1 7 - --p 1.4 any cpu
>
> Hmm, qemu-dm doesn't get killed in all cases, killing it destroys the guest.
> I have seen that before already.

...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
qemu-dm is not responding to some shutdown signal.

> I will provide more test results tomorrow.

Thanks.

 -- Keir
>>> On 23.11.11 at 22:03, Keir Fraser <keir.xen@gmail.com> wrote:
> On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:
>
>> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Wed, Nov 23, Keir Fraser wrote:
>>>
>>>> We have quite a big waitqueue problem actually. The current scheme of
>>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>>> work nicely with preempted C code, which may be built with frame pointers
>>>> and/or arbitrarily take the address of on-stack variables. The result will
>>>> be hideous cross-stack corruption, as those frame pointers and cached
>>>> addresses of automatic variables will reference the wrong cpu's stack!
>>>> Fixing or detecting this in general is not possible afaics.
>>>
>>> Yes, I was thinking about that wakeup on a different cpu as well.
>>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>>> wakes up on the same cpu?
>>
>> We could save the old affinity and then vcpu_set_affinity(). That will have
>> to do for now. Actually it should work okay as long as the toolstack doesn't
>> mess with affinity in the meantime. I'll sort out a patch for this.
>
> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>     waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.

Didn't we (long ago) settle on not permitting new calls to
domain_crash_synchronous()? Is it really impossible to just
domain_crash() in any of the instances these add?

Jan

> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test a multi-processor host again with these.
>
> -- Keir
>
>> -- Keir
>>
>>> Olaf
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>>     waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()? Is it really impossible to just
> domain_crash() in any of the instances these add?

It's safe because you must be in a context that is safe to preempt. That's a
pre-condition for using a waitqueue. It's not safe to use domain_crash()
because the caller of wait_event() may not handle the exceptional return.

 -- Keir
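The lack of an error path is visible from the rough shape of wait_event(),
paraphrased from memory of xen/include/xen/wait.h (the actual macro may
differ in detail):

    /* The macro either returns with the condition true or keeps sleeping;
     * there is no error return the caller could check, so a failure inside
     * prepare_to_wait()/wait() can only be handled by crashing the domain
     * synchronously. */
    #define wait_event(wq, condition)             \
    do {                                          \
        if ( condition )                          \
            break;                                \
        for ( ; ; )                               \
        {                                         \
            prepare_to_wait(&(wq));               \
            if ( condition )                      \
                break;                            \
            wait();                               \
        }                                         \
        finish_wait(&(wq));                       \
    } while (0)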
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>>     waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()?

This was a reaction to lazy patches which sprinkled d_c_s calls around
liberally, and in unsafe locations, as a dodge around proper error handling.
On Wed, Nov 23, Keir Fraser wrote:

> ...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
> qemu-dm is not responding to some shutdown signal.

In the first crash there was no qemu-dm process left from what I
remember. I will see if it happens again.

Olaf
On Thu, Nov 24, Olaf Hering wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> ...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
>> qemu-dm is not responding to some shutdown signal.
>
> In the first crash there was no qemu-dm process left from what I
> remember. I will see if it happens again.

I see the patches were already committed. Thanks.

After more investigation: my config file has on_crash="preserve". To me it
looks like the guest kills itself, since nothing is in the logs. So after
all that's not a waitqueue issue, and most likely also not a paging bug.

Olaf
One more thing:

Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
happens to be on a queue by the time xl destroy is called, the hypervisor
will crash.

Perhaps there should be some sort of domain destructor for each
waitqueue?

Olaf
On 25/11/2011 18:26, "Olaf Hering" <olaf@aepfle.de> wrote:

> One more thing:
>
> Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
> happens to be on a queue by the time xl destroy is called, the hypervisor
> will crash.

We could fix this by having waitqueues that contain a vcpu hold a reference
to that vcpu's domain.

> Perhaps there should be some sort of domain destructor for each
> waitqueue?

Not sure what you mean.

 -- Keir

> Olaf
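One possible shape for that, sketched under the assumption that vcpus only
enter and leave waitqueues via the add/remove paths in common/wait.c; not an
actual patch, and the helper names below are illustrative (get_knownalive_domain
and put_domain are the generic domain refcount helpers):

    /* Sketch: keep the domain alive while one of its vcpus sits on a waitqueue. */
    static void wq_add_vcpu(struct waitqueue_head *wq, struct waitqueue_vcpu *wqv)
    {
        struct vcpu *v = current;

        /* Take a domain reference so domain destruction cannot complete
         * while the vcpu is still queued; dropped again on removal. */
        get_knownalive_domain(v->domain);
        list_add_tail(&wqv->list, &wq->list);
    }

    static void wq_del_vcpu(struct vcpu *v, struct waitqueue_vcpu *wqv)
    {
        list_del_init(&wqv->list);
        put_domain(v->domain);
    }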