Olaf Hering
2011-Nov-08 21:20 UTC
[Xen-devel] Need help with fixing the Xen waitqueue feature
The patch 'mem_event: use wait queue when ring is full' I just sent out makes use of the waitqueue feature. There are two issues I get with the change applied:

I think I got the logic right, and in my testing vcpu->pause_count drops to zero in p2m_mem_paging_resume(). But for some reason the vcpu does not make progress after the first wakeup. In my debugging there is one wakeup, the ring is still full, but further wakeups don't happen. The fully decoded xentrace output may provide some hints about the underlying issue, but it's hard to get due to the second issue.

Another thing is that sometimes the host suddenly reboots without any message. I think the reason for this is that a vcpu whose stack was put aside and that was later resumed may find itself on another physical cpu. And if that happens, wouldn't that invalidate some of the local variables back in the call chain? If some of them point to the old physical cpu, how could this be fixed? Perhaps a few "volatiles" are needed in some places.

I will check whether pinning the guest's vcpus to physical cpus actually avoids the sudden reboots.

Olaf
Keir Fraser
2011-Nov-08 22:05 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the call chain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

From how many call sites can we end up on a wait queue? I know we were going to end up with a small and explicit number (e.g., in __hvm_copy()), but does this patch make it a more generally-used mechanism? There will unavoidably be many constraints on callers who want to be able to yield the cpu. We can add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually I don't think it's *that* common that hypercall contexts cache things like per-cpu pointers. But every caller will need auditing, I expect.

A sudden reboot is very extreme. No message even on a serial line? That most commonly indicates bad page tables. With most other bugs you'd at least get a double fault message.

-- Keir
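A hypothetical illustration of the kind of get_cpu/put_cpu bracket being suggested -- none of these names exist in the Xen tree, and the per-vcpu counter is invented purely for the sketch:

/* Hypothetical sketch only: bracket code that caches per-cpu state so
 * that an attempt to sleep on a waitqueue (and possibly resume on a
 * different physical cpu) inside the bracket can be caught.  Neither
 * get_cpu()/put_cpu() nor the no_sleep_count field exist in Xen. */
static inline unsigned int get_cpu(void)
{
    current->no_sleep_count++;          /* invented per-vcpu counter */
    return smp_processor_id();
}

static inline void put_cpu(void)
{
    ASSERT(current->no_sleep_count > 0);
    current->no_sleep_count--;
}

/* ...and prepare_to_wait() would then ASSERT(!current->no_sleep_count). */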
Olaf Hering
2011-Nov-08 22:20 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Keir Fraser wrote:
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > Another thing is that sometimes the host suddenly reboots without any
> > message. I think the reason for this is that a vcpu whose stack was put
> > aside and that was later resumed may find itself on another physical
> > cpu. And if that happens, wouldn't that invalidate some of the local
> > variables back in the call chain? If some of them point to the old
> > physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> > needed in some places.
>
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()), but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

I haven't started to audit the callers. In my testing mem_event_put_request() is called from p2m_mem_paging_drop_page() and p2m_mem_paging_populate(). The latter is called from more places.

My plan is to put the sleep into ept_get_entry(), but I'm not there yet. First I want to test waitqueues in a rather simple code path like mem_event_put_request().

> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. With most other bugs you'd at least get a
> double fault message.

There is no output on serial. I boot with this cmdline:

vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin dom0_max_vcpus=2

My base changeset is 24003, the test host is a Xeon X5670 @ 2.93GHz.

Olaf
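For reference, a rough sketch of the shape such a change could take, built on the wait_event() macro from xen/include/xen/wait.h; the waitqueue field and the mem_event_ring_free() helper are placeholders rather than the names used in the actual patch:

/* Sketch only, not the actual patch: a vcpu of the target domain that
 * finds the ring full sleeps until the consumer makes room; vcpus of
 * other domains (e.g. dom0 tools) must not sleep here. */
void mem_event_put_request(struct domain *d, mem_event_request_t *req)
{
    if ( current->domain == d )
        wait_event(d->mem_event.wq, mem_event_ring_free(d) != 0);

    /* ...copy *req into the ring and notify the consumer as before... */
}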
Keir Fraser
2011-Nov-08 22:54 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 08/11/2011 22:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 08, Keir Fraser wrote:
>
>> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> [...]
>>
>> From how many call sites can we end up on a wait queue? I know we were going
>> to end up with a small and explicit number (e.g., in __hvm_copy()), but does
>> this patch make it a more generally-used mechanism? There will unavoidably
>> be many constraints on callers who want to be able to yield the cpu. We can
>> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
>> I don't think it's *that* common that hypercall contexts cache things like
>> per-cpu pointers. But every caller will need auditing, I expect.
>
> I haven't started to audit the callers. In my testing
> mem_event_put_request() is called from p2m_mem_paging_drop_page() and
> p2m_mem_paging_populate(). The latter is called from more places.

Tbh I wonder anyway whether stale hypercall context would be likely to cause a silent machine reboot. Booting with max_cpus=1 would eliminate moving between CPUs as a cause of inconsistencies, or pin the guest under test. Another problem could be sleeping with locks held, but we do test for that (in debug builds at least) and I'd expect a crash/hang rather than a silent reboot. Another problem could be if the vcpu's own state is temporarily inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is then made to restore it during a waitqueue wakeup. That could certainly cause a reboot, but I don't know of an example where this might happen.

-- Keir

> My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
> First I want to test waitqueues in a rather simple code path like
> mem_event_put_request().
>
>> A sudden reboot is very extreme. No message even on a serial line? That most
>> commonly indicates bad page tables. With most other bugs you'd at least get a
>> double fault message.
>
> There is no output on serial. I boot with this cmdline:
> vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
> sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
> dom0_max_vcpus=2
> My base changeset is 24003, the test host is a Xeon X5670 @ 2.93GHz.
>
> Olaf
Andres Lagar-Cavilla
2011-Nov-09 03:37 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Olaf,
are waitqueues on the mem-event ring meant to be the way to deal with ring exhaustion? i.e. is this meant to go beyond a testing vehicle for waitqueues?

With the pager itself generating events, and foreign mappings generating events, you'll end up putting dom0 vcpus in a waitqueue. This will basically deadlock the host. Am I missing something here?

Andres

> Date: Tue, 8 Nov 2011 22:20:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> Subject: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: xen-devel@lists.xensource.com
> Message-ID: <20111108212024.GA5276@aepfle.de>
> Content-Type: text/plain; charset=utf-8
>
> [...]
Andres Lagar-Cavilla
2011-Nov-09 03:52 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
> Date: Tue, 08 Nov 2011 22:05:41 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: Olaf Hering <olaf@aepfle.de>, <xen-devel@lists.xensource.com>
> Message-ID: <CADF5835.245E1%keir.xen@gmail.com>
> Content-Type: text/plain; charset="US-ASCII"
>
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> [...]
>
> From how many call sites can we end up on a wait queue? I know we were
> going to end up with a small and explicit number (e.g., in __hvm_copy())
> but does this patch make it a more generally-used mechanism? There will
> unavoidably be many constraints on callers who want to be able to yield
> the cpu. We can add Linux-style get_cpu/put_cpu abstractions to catch
> some of them. Actually I don't think it's *that* common that hypercall
> contexts cache things like per-cpu pointers. But every caller will need
> auditing, I expect.

Tbh, for paging to be effective, we need to be prepared to yield on every p2m lookup.

Let's compare paging to PoD. They're essentially the same thing: pages disappear, and get allocated on the fly when you need them. PoD is a highly optimized in-hypervisor optimization that does not need a user-space helper -- but the pager could do PoD easily and remove all that p2m-pod.c code from the hypervisor.

PoD only introduces extraneous side-effects when there is a complete absence of memory to allocate pages. The same cannot be said of paging, to put it mildly. It returns EINVAL all over the place. Right now, qemu can be crashed in a blink by paging out the right gfn.

To get paging to where PoD is, all these situations need to be handled in a manner other than returning EINVAL. That means putting the vcpu on a waitqueue at every location where p2m_pod_demand_populate() is called, not just __hvm_copy(). I don't know that that's gonna be altogether doable. Many of these gfn lookups happen in atomic contexts, not to mention cpu-specific pointers. But at least we should aim for that.

Andres

> A sudden reboot is very extreme. No message even on a serial line? That
> most commonly indicates bad page tables. With most other bugs you'd at
> least get a double fault message.
>
> -- Keir
Olaf Hering
2011-Nov-09 07:02 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:
> Olaf,
> are waitqueues on the mem-event ring meant to be the way to deal with
> ring exhaustion? i.e. is this meant to go beyond a testing vehicle for
> waitqueues?

Putting the guest to sleep when the ring is full is at least required for p2m_mem_paging_drop_page(), so that the pager gets informed about all gfns from decrease_reservation.

> With the pager itself generating events, and foreign mappings generating
> events, you'll end up putting dom0 vcpus in a waitqueue. This will
> basically deadlock the host.

Those vcpus can not go to sleep, and my change handles that case.

Olaf
Olaf Hering
2011-Nov-09 07:09 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:
> Tbh, for paging to be effective, we need to be prepared to yield on every
> p2m lookup.

Yes, if a gfn is missing the vcpu should go to sleep rather than returning -ENOENT to the caller. Only the query part of gfn_to_mfn should return the p2m paging types.

> Let's compare paging to PoD. They're essentially the same thing: pages
> disappear, and get allocated on the fly when you need them. PoD is a
> highly optimized in-hypervisor optimization that does not need a
> user-space helper -- but the pager could do PoD easily and remove all that
> p2m-pod.c code from the hypervisor.

Perhaps PoD and paging could be merged; I haven't had time to study the PoD code.

> PoD only introduces extraneous side-effects when there is a complete
> absence of memory to allocate pages. The same cannot be said of paging, to
> put it mildly. It returns EINVAL all over the place. Right now, qemu can
> be crashed in a blink by paging out the right gfn.

I have seen qemu crashes when using emulated storage, but haven't debugged them yet. I suspect they were caused by a race between nominate and evict.

Olaf
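A sketch of that idea for the lookup path; p2m_is_paging() and p2m_mem_paging_populate() are real, but the waitqueue, the gfn_is_paged() helper and the lookup wrapper are illustrative only:

/* Sketch only: a guest vcpu that hits a paged-out gfn kicks the pager
 * and sleeps until the page is resident again, instead of the caller
 * seeing -ENOENT.  Query-style lookups and foreign vcpus (dom0 tools,
 * the pager itself) keep seeing the paging type so they can make
 * progress. */
do {
    mfn = lookup_gfn(d, gfn, &p2mt);          /* placeholder for gfn_to_mfn */
    if ( !p2m_is_paging(p2mt) || current->domain != d )
        break;
    p2m_mem_paging_populate(d, gfn);          /* argument list may differ */
    wait_event(d->paging_wq, !gfn_is_paged(d, gfn));  /* placeholder waitqueue */
} while ( 1 );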
Andres Lagar-Cavilla
2011-Nov-09 21:21 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Hi there,

> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.

Well, PoD can be implemented with a pager that simply shortcuts the step that actually populates the page with contents. A zeroed heap page is good enough. It's fairly simple for a pager to know for which pages it should return zero.

PoD also does emergency sweeps under memory pressure to identify zeroes; that could be easily implemented by a user-space utility.

The hypervisor code keeps a list of 2M superpages -- that feature would be lost. But I doubt this would fly anyway: PoD works for non-EPT modes, which I guess don't want to lose that functionality.

Andres
Andres Lagar-Cavilla
2011-Nov-09 21:30 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Also,

> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
> [...]
>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging, to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.

After a bit of thinking, things are far more complicated. I don't think this is a "race." If the pager removed a page that later gets scheduled by the guest OS for IO, qemu will want to foreign-map that. With the hypervisor returning ENOENT, the foreign map will fail, and there goes qemu. The same will happen for pv backends mapping grants, or for the checkpoint/migrate code.

I guess qemu/migrate/libxc could retry until the pager is done and the mapping succeeds. It will be delicate. It won't work for pv backends. It will flood the mem_event ring. Wait-queueing the dom0 vcpu is a no-go -- the machine will deadlock quickly.

My thinking is that the best bet is to wait-queue the dom0 process. The dom0 kernel code handling the foreign map will need to put the mapping thread on a wait queue. It can establish a ring-based notification mechanism with Xen. When Xen completes the page-in, it can add a notification to the ring. dom0 can then wake the mapping thread and retry. Not simple at all.

Ideas out there?

Andres
Olaf Hering
2011-Nov-09 22:11 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> After a bit of thinking, things are far more complicated. I don't think
> this is a "race." If the pager removed a page that later gets scheduled by
> the guest OS for IO, qemu will want to foreign-map that. With the
> hypervisor returning ENOENT, the foreign map will fail, and there goes
> qemu.

The tools are supposed to catch ENOENT and try again. linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map() appears to do that as well. What code path does qemu use that leads to a crash?

> I guess qemu/migrate/libxc could retry until the pager is done and the
> mapping succeeds. It will be delicate. It won't work for pv backends. It
> will flood the mem_event ring.

There will be no flood; only one request is sent per gfn in p2m_mem_paging_populate().

Olaf
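The retry pattern being referred to, paraphrased (the real linux_privcmd_map_foreign_bulk() tracks per-page errors and retries only the slots that came back as -ENOENT; structure and header details are omitted here):

/* Paraphrase, not the literal libxc code: with IOCTL_PRIVCMD_MMAPBATCH_V2
 * a paged-out gfn shows up as -ENOENT, which simply means "not resident
 * yet, try again", so the mapper keeps retrying until the pager has
 * brought the page back. */
static int map_foreign_with_retry(int fd, struct privcmd_mmapbatch_v2 *batch)
{
    int rc;

    do {
        rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, batch);
    } while ( rc < 0 && errno == ENOENT );

    return rc;
}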
Andres Lagar-Cavilla
2011-Nov-10 04:29 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Olaf,

> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
>
>> After a bit of thinking, things are far more complicated. I don't think
>> this is a "race." If the pager removed a page that later gets scheduled
>> by the guest OS for IO, qemu will want to foreign-map that. With the
>> hypervisor returning ENOENT, the foreign map will fail, and there goes
>> qemu.
>
> The tools are supposed to catch ENOENT and try again.
> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> appears to do that as well. What code path does qemu use that leads to a
> crash?

The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?

And for backend drivers implemented in the kernel (netback, etc.), there is no retrying.

All those ram_paging types and their interactions give me a headache, but I'll trust you that only one event is put in the ring.

I'm using 24066:54a5e994a241. I start Windows 7 and make xenpaging try to evict 90% of the RAM; qemu lasts for about two seconds. Linux fights harder, but qemu also dies. No pv drivers. I haven't been able to trace back the qemu crash (segfault on a NULL ide_if field for a dma callback) to the exact paging action yet, but no crashes without paging.

Andres

>> I guess qemu/migrate/libxc could retry until the pager is done and the
>> mapping succeeds. It will be delicate. It won't work for pv backends. It
>> will flood the mem_event ring.
>
> There will be no flood; only one request is sent per gfn in
> p2m_mem_paging_populate().
>
> Olaf
Jan Beulich
2011-Nov-10 09:20 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc.

Seems like nobody cared to port over the code from the old 2.6.18 tree (or the forward ports thereof).

> Which dom0 kernel are you using?

Certainly one of our forward port kernels.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

As above, seems like nobody cared to forward port those bits either.

Jan
Keir Fraser
2011-Nov-10 09:26 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On 10/11/2011 04:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:

>> The tools are supposed to catch ENOENT and try again.
>> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
>> appears to do that as well. What code path does qemu use that leads to a
>> crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?
>
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

Getting this working without a new Linux kernel -- and with as-yet-to-be-written new stuff in it -- is unlikely to be on the cards, is it?

I think you suggested an in-kernel mechanism to wait for page-in and then retry the mapping. If that could be used by the in-kernel drivers and exposed via our privcmd interface for qemu and the rest of userspace too, that may be the best single solution. Perhaps it could be largely hidden behind the existing privcmd-mmap ioctls.

-- Keir
Olaf Hering
2011-Nov-10 10:18 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> Olaf,
>
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path does qemu use that leads to a
> > crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. It's really odd that there is no ENOENT handling in mainline; I will go and check the code.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling; perhaps that change was never forwarded to pvops, at least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start Windows 7 and make xenpaging try to
> evict 90% of the RAM; qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops it may need some audit to check the ENOENT handling.

Olaf
Olaf Hering
2011-Nov-10 12:05 UTC
[Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, Olaf Hering wrote:
> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> > The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> > it isn't on mainline Linux 3.0, 3.1, etc. Which dom0 kernel are you using?
> I'm running SLES11 as dom0. It's really odd that there is no ENOENT
> handling in mainline; I will go and check the code.

xen_remap_domain_mfn_range() has to catch -ENOENT returned from HYPERVISOR_mmu_update() and return it to its callers. Then drivers/xen/xenfs/privcmd.c:traverse_pages() will do the right thing. See http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/0051d294bb60

The granttable part needs more changes, see http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/7c7efaea8b54

Olaf
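A heavily simplified sketch of the change being described; the real xen_remap_domain_mfn_range() batches its updates, so the actual patch looks different:

/* Sketch only: the MMU hypercall returns -ENOENT for a gfn the pager has
 * evicted, and that value must be passed back unchanged so that
 * privcmd's traverse_pages() -- and ultimately userspace -- can retry,
 * instead of being flattened into a generic failure. */
static int remap_one_gfn(pte_t *ptep, unsigned long long mpte, domid_t domid)
{
    struct mmu_update u;

    u.ptr = virt_to_machine(ptep).maddr | MMU_PT_UPDATE_PRESERVE_AD;
    u.val = mpte;                       /* machine pte for the foreign gfn */

    return HYPERVISOR_mmu_update(&u, 1, NULL, domid);   /* may be -ENOENT */
}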
Andres Lagar-Cavilla
2011-Nov-10 13:57 UTC
Re: [Xen-devel] Re: Need help with fixing the Xen waitqueue feature
Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try to cherry-pick those into my dom0 (Debian mainline 3.0). Somebody else should get those into mainline, though. Soonish :)

Andres

>>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
>> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
>> it isn't on mainline Linux 3.0, 3.1, etc.
>
> Seems like nobody cared to port over the code from the old 2.6.18 tree
> (or the forward ports thereof).
>
>> Which dom0 kernel are you using?
>
> Certainly one of our forward port kernels.
>
>> And for backend drivers implemented in the kernel (netback, etc), there is
>> no retrying.
>
> As above, seems like nobody cared to forward port those bits either.
>
> Jan
Konrad Rzeszutek Wilk
2011-Nov-10 15:32 UTC
Re: [Xen-devel] Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, 2011 at 05:57:18AM -0800, Andres Lagar-Cavilla wrote:
> Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try
> to cherry-pick those into my dom0 (Debian mainline 3.0). Somebody else
> should get those into mainline, though. Soonish :)

Well, could you post them once you have cherry-picked them? Thanks.

> Andres
>
> [...]
Olaf Hering
2011-Nov-11 22:56 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
Keir,

just to dump my findings to the list:

On Tue, Nov 08, Keir Fraser wrote:

> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> between CPUs as a cause of inconsistencies, or pin the guest under test.
> Another problem could be sleeping with locks held, but we do test for that
> (in debug builds at least) and I'd expect a crash/hang rather than a silent
> reboot. Another problem could be if the vcpu's own state is temporarily
> inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is
> then made to restore it during a waitqueue wakeup. That could certainly
> cause a reboot, but I don't know of an example where this might happen.

The crashes also happen with maxcpus=1 and a single guest cpu. Today I added wait_event to ept_get_entry and this works.

But at some point the codepath below is executed, after that wake_up the host hangs hard. I will trace it further next week, maybe the backtrace gives a clue what the cause could be.

Also, the 3K stacksize is still too small, this path uses 3096.

(XEN) prep 127a 30 0
(XEN) wake 127a 30
(XEN) prep 1cf71 30 0
(XEN) wake 1cf71 30
(XEN) prep 1cf72 30 0
(XEN) wake 1cf72 30
(XEN) prep 1cee9 30 0
(XEN) wake 1cee9 30
(XEN) prep 121a 30 0
(XEN) wake 121a 30

(This means 'gfn (p2m_unshare << 4) in_atomic')

(XEN) prep 1ee61 20 0
(XEN) max stacksize c18
(XEN) Xen WARN at wait.c:126
(XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830201f76000 rcx: 0000000000000000
(XEN) rdx: ffff82c4802b7f18 rsi: 000000000000000a rdi: ffff82c4802673f0
(XEN) rbp: ffff82c4802b73a8 rsp: ffff82c4802b7378 r8: 0000000000000000
(XEN) r9: ffff82c480221da0 r10: 00000000fffffffa r11: 0000000000000003
(XEN) r12: ffff82c4802b7f18 r13: ffff830201f76000 r14: ffff83003ea5c000
(XEN) r15: 000000000001ee61 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000020336d000 cr2: 00007fa88ac42000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7378:
(XEN) 0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
(XEN) ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
(XEN) ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
(XEN) 0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
(XEN) 000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
(XEN) 0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
(XEN) ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
(XEN) 00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
(XEN) ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
(XEN) ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
(XEN) 000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
(XEN) ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
(XEN) ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
(XEN) ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
(XEN) ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
(XEN) 0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
(XEN) 0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
(XEN) ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
(XEN) 00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
(XEN) ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
(XEN) Xen call trace:
(XEN) [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
(XEN) [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
(XEN) [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
(XEN) [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
(XEN) [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
(XEN) [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
(XEN) [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
(XEN) [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
(XEN) [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
(XEN) [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
(XEN) [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
(XEN) [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
(XEN)
(XEN) wake 1ee61 20
Keir Fraser
2011-Nov-12 07:00 UTC
Re: [Xen-devel] Need help with fixing the Xen waitqueue feature
On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:

> Keir,
>
> just to dump my findings to the list:
>
> On Tue, Nov 08, Keir Fraser wrote:
>
>> Tbh I wonder anyway whether stale hypercall context would be likely to cause
>> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
>> between CPUs as a cause of inconsistencies, or pin the guest under test.
>> Another problem could be sleeping with locks held, but we do test for that
>> (in debug builds at least) and I'd expect a crash/hang rather than a silent
>> reboot. Another problem could be if the vcpu's own state is temporarily
>> inconsistent/invalid (e.g., its pagetable base pointers) and an attempt is
>> then made to restore it during a waitqueue wakeup. That could certainly
>> cause a reboot, but I don't know of an example where this might happen.
>
> The crashes also happen with maxcpus=1 and a single guest cpu.
> Today I added wait_event to ept_get_entry and this works.
>
> But at some point the codepath below is executed, after that wake_up the
> host hangs hard. I will trace it further next week, maybe the backtrace
> gives a clue what the cause could be.

So you run with a single CPU, and with wait_event() in one location, and that works for a while (actually doing full waitqueue work: executing wait() and wake_up()), but then hangs? That's weird, but pretty interesting if I've understood correctly.

> Also, the 3K stacksize is still too small, this path uses 3096.

I'll allocate a whole page for the stack then.

-- Keir

> [...]
On Sat, Nov 12, Keir Fraser wrote:
> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > The crashes also happen with maxcpus=1 and a single guest cpu.
> > Today I added wait_event to ept_get_entry and this works.
> >
> > But at some point the codepath below is executed, after that wake_up the
> > host hangs hard. I will trace it further next week, maybe the backtrace
> > gives a clue what the cause could be.
>
> So you run with a single CPU, and with wait_event() in one location, and
> that works for a while (actually doing full waitqueue work: executing wait()
> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
> understood correctly.

Yes, that's what happens with a single cpu in dom0 and domU. I have added some more debug. After the backtrace below I see one more call to check_wakeup_from_wait() for dom0, then the host hangs hard.

> > Also, the 3K stacksize is still too small, this path uses 3096.
>
> I'll allocate a whole page for the stack then.

Thanks.

Olaf

> > [...]
On 22/11/2011 11:40, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Sat, Nov 12, Keir Fraser wrote:
>> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>> So you run with a single CPU, and with wait_event() in one location, and
>> that works for a while (actually doing full waitqueue work: executing wait()
>> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
>> understood correctly.
>
> Yes, that's what happens with a single cpu in dom0 and domU.
> I have added some more debug. After the backtrace below I see one more
> call to check_wakeup_from_wait() for dom0, then the host hangs hard.

I think I checked before, but: also unresponsive to serial debug keys? And dom0 isn't getting put on a waitqueue, I assume? Since I guess dom0 is doing the work to wake things from the waitqueue, that couldn't go well. :-)

>>> Also, the 3K stacksize is still too small, this path uses 3096.
>>
>> I'll allocate a whole page for the stack then.
>
> Thanks.

Forgot about it. Done now!

-- Keir

> Olaf
On Tue, Nov 22, Keir Fraser wrote:
> I think I checked before, but: also unresponsive to serial debug keys?

Good point, I will check that. So far I haven't used these keys.

> Forgot about it. Done now!

What about domain_crash() instead of BUG_ON() in __prepare_to_wait()? If the stack size were checked before it is copied, the hypervisor could survive.

Olaf
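A sketch of that suggestion (not the patch Keir attaches in his reply): measure the live stack frame first, and take down only the offending domain if it will not fit.

/* Sketch only, not actual wait.c code: check how much stack would have
 * to be saved before copying it into the per-vcpu buffer, and call
 * domain_crash() instead of BUG_ON() if it does not fit. */
static int save_stack(struct waitqueue_vcpu *wqv, unsigned long esp)
{
    unsigned long used = (unsigned long)get_cpu_info() - esp;

    if ( used > sizeof(wqv->stack) )    /* buffer was 3K here, a page later */
    {
        gdprintk(XENLOG_ERR, "waitqueue stack frame too large (%lu bytes)\n",
                 used);
        domain_crash(current->domain);
        return -1;
    }

    memcpy(wqv->stack, (void *)esp, used);
    return 0;
}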
On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I think I checked before, but: also unresponsive to serial debug keys?
>
> Good point, I will check that. So far I haven't used these keys.

If they work then 'd' will give you a backtrace on every CPU, and 'q' will dump domain and vcpu states. That should make things easier!

>> Forgot about it. Done now!
>
> What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
> If the stack size were checked before it is copied, the hypervisor could
> survive.

Try the attached patch (please also try reducing the size of the new parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the domain-crashing path).

-- Keir

> Olaf
On Wed, Nov 9, 2011 at 9:21 PM, Andres Lagar-Cavilla <andres@lagarcavilla.org> wrote:
> PoD also does emergency sweeps under memory pressure to identify zeroes;
> that could be easily implemented by a user-space utility.

PoD is certainly a special-case, hypervisor-handled version of paging. The main question is whether a user-space version can be made to perform well enough. My guess is that it can, but it's far from certain. If it can, I'm all in favor of making paging handle PoD.

> The hypervisor code keeps a list of 2M superpages -- that feature would be
> lost.

This is actually pretty important; Windows scrubs memory on boot, so it's guaranteed that the majority of the memory will be touched and re-populated.

> But I doubt this would fly anyway: PoD works for non-EPT modes, which I
> guess don't want to lose that functionality.

Is there a particular reason we can't do paging on the shadow code? I thought it was just that doing HAP was simpler to get started with. That would be another blocker to getting rid of PoD, really.

-George
On 22/11/2011 15:07, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>> On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> I think I checked before, but: also unresponsive to serial debug keys?
>>>
>>> Good point, I will check that. So far I haven't used these keys.
>>
>> If they work then 'd' will give you a backtrace on every CPU, and 'q' will
>> dump domain and vcpu states. That should make things easier!
>
> They do indeed work. The backtrace below is from another system.
> Looks like hpet_broadcast_exit() is involved.
>
> Does that output below give any good hints?

It tells us that the hypervisor itself is in good shape. The deterministic RIP in hpet_broadcast_exit() is simply because the serial rx interrupt is always waking us from the idle loop. That RIP value will simply be the first possible interruption point after the HLT instruction.

I have a new theory, which is that if we go round the for-loop in wait_event() more than once, the vcpu's pause counter gets messed up and goes negative, condemning it to sleep forever. I have *just* pushed a change to the debug 'q' key (ignore the changeset comment referring to the 'd' key, I got that wrong!) which will print per-vcpu and per-domain pause_count values. Please get the system stuck again, and send the output from the 'q' key with that new changeset (c/s 24178).

Finally, I don't really know what the prep/wake/done messages from your logs mean, as you didn't send the patch that prints them.

-- Keir

>> Try the attached patch (please also try reducing the size of the new
>> parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
>> domain-crashing path).
>
> Thanks, I will try it.
>
> Olaf
>
> ..........
>
> (XEN) 'q' pressed -> dumping domain info (now=0x5E:F50D77F8)
> (XEN) General information for domain 0:
> (XEN) refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
> (XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb, d00-ffff }
> (XEN) Interrupts { 0-207 }
> (XEN) I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN) DomPage list too long to display
> (XEN) XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN) VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
> (XEN) 250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN) refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={} max_pages=131328
> (XEN) handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN) paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) PoD entries=0 cachesize=0
> (XEN) XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
> (XEN) paging assistance: hap, 4 levels
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
>
> [...]
>
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218  x86_64  debug=y  Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40 rbx: 000000674742e72d rcx: 0000000000000001
> (XEN) rdx: 0000000000000000 rsi: ffff82c48030f000 rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0 rsp: ffff82c4802bfe78 r8: 000000008c858211
> (XEN) r9: 0000000000000003 r10: ffff82c4803064e0 r11: 000000676bf885a3
> (XEN) r12: ffff83021e70e840 r13: ffff83021e70e8d0 r14: 00000067471bdb62
> (XEN) r15: ffff82c48030e440 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000 cr2: 0000000000beb000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN) ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN) 0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN) 0000152900006fe3 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN) ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN) ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN) 0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN) 8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN) ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN) 00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN) 000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN) 000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN) 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
>
> [...]
On 22/11/2011 15:40, "Keir Fraser" <keir@xen.org> wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

Further to this, can you please try moving the call to __prepare_to_wait()
from just after the spinlock region to just before it (i.e., immediately
after the ASSERT) in prepare_to_wait(). That could well make things work
better, on UP at least -- for SMP systems I will also need to fix the broken
usage of wqv->esp...

 -- Keir
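A rough sketch of the ordering being suggested, assuming prepare_to_wait() in
common/wait.c is structured approximately as below (waitqueue_vcpu and
vcpu_pause_nosync reflect the existing wait.c design; details may differ from
the actual tree):

    void prepare_to_wait(struct waitqueue_head *wq)
    {
        struct vcpu *curr = current;
        struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;

        ASSERT(!in_atomic());

        /* Save the vcpu's context/stack *before* it becomes visible on the
         * waitqueue, so a wakeup cannot race with an incomplete save. */
        __prepare_to_wait(wqv);

        spin_lock(&wq->lock);
        if ( list_empty(&wqv->list) )
        {
            list_add_tail(&wqv->list, &wq->list);
            vcpu_pause_nosync(curr);
        }
        spin_unlock(&wq->lock);
    }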
On Tue, Nov 22, Keir Fraser wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

I have added a check for that, it's not negative.

> I have *just* pushed a change to the debug 'q' key (ignore the changeset
> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
> and per-domain pause_count values. Please get the system stuck again, and
> send the output from 'q' key with that new changeset (c/s 24178).

To me it looks like dom0 gets paused, perhaps due to some uneven
pause/unpause calls. I will see if I can figure it out.

Olaf

(XEN) 'q' pressed -> dumping domain info (now=0xA1:4BC733CC)
(XEN) General information for domain 0:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=5991502 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
(XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
(XEN) Rangesets belonging to domain 0:
(XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-407, 40c-cfb, d00-ffff }
(XEN) Interrupts { 0-303 }
(XEN) I/O Memory { 0-febff, fec01-fec8f, fec91-fedff, fee01-ffffffffffffffff }
(XEN) Memory pages belonging to domain 0:
(XEN) DomPage list too long to display
(XEN) XenPage 000000000036ff8d: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 000000000036ff8c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8b: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8a: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000008befd: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 0:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=1 pause_flags=0
(XEN) 250 Hz periodic timer (period 4 ms)
(XEN) General information for domain 1:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=17549 xenheap_pages=6 dirty_cpus={} max_pages=131328
(XEN) handle=5499728e-7f38-dbb0-b6cc-22866a6864f3 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 1:
(XEN) I/O Ports { }
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN) DomPage list too long to display
(XEN) PoD entries=0 cachesize=0
(XEN) XenPage 0000000000200b7c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000203bfe: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000200b48: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000021291d: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000003ebfc: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000202ef4: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 1:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=0 pause_flags=4
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
(XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
On 22/11/2011 17:36, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I have a new theory, which is that if we go round the for-loop in
>> wait_event() more than once, the vcpu's pause counter gets messed up and
>> goes negative, condemning it to sleep forever.
>
> I have added a check for that, it's not negative.
>
>> I have *just* pushed a change to the debug 'q' key (ignore the changeset
>> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
>> and per-domain pause_count values. Please get the system stuck again, and
>> send the output from 'q' key with that new changeset (c/s 24178).
>
> To me it looks like dom0 gets paused, perhaps due to some uneven
> pause/unpause calls. I will see if I can figure it out.

Could it have ended up on the waitqueue?

 -- Keir
On Tue, Nov 22, Keir Fraser wrote:

> Could it have ended up on the waitqueue?

Unlikely, but I will add checks for that as well.

Olaf
On Tue, Nov 22, Olaf Hering wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> Could it have ended up on the waitqueue?
>
> Unlikely, but I will add checks for that as well.

I posted three changes which make use of the wait queues.
For some reason the code at the very end of p2m_mem_paging_populate()
triggers when d is dom0, so its vcpu is put to sleep.

Olaf
On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Olaf Hering wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> Could it have ended up on the waitqueue?
>>
>> Unlikely, but I will add checks for that as well.
>
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.

We obviously can't have dom0 going to sleep on paging work. This, at least,
isn't a wait-queue bug.

> Olaf
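An illustrative guard for that case -- not the actual patch under discussion,
just a sketch of the idea that only a vcpu belonging to the paged domain may
be paused and left waiting on the ring, so a foreign (dom0) vcpu is never put
to sleep there. Variable names follow common p2m.c usage and are assumptions:

    /* Sketch: tail of p2m_mem_paging_populate(), where 'd' is the target
     * domain and 'v' is current. Only pause a vcpu of the paged domain;
     * the paging client (dom0) must never wait on its own event ring. */
    if ( v->domain == d )
    {
        req.flags |= MEM_EVENT_FLAG_VCPU_PAUSED;
        vcpu_pause_nosync(v);
    }
    mem_event_put_request(d, &req);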
On Tue, Nov 22, Keir Fraser wrote:

> We obviously can't have dom0 going to sleep on paging work. This, at least,
> isn't a wait-queue bug.

I had to rearrange some code in p2m_mem_paging_populate for my debug
stuff. This led to an uninitialized req, and as a result req.flags
sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
not catch that. Now waitqueues appear to work ok for me. Thanks!

What do you think about C99 initializers in p2m_mem_paging_populate,
just to avoid such mistakes?

  mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

Olaf
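A minimal before/after sketch of the failure mode and the designated-initializer
fix; the two declarations are alternatives shown together for contrast, and the
.gfn field plus surrounding code are illustrative assumptions:

    /* Before: 'req' starts out as stack garbage; any member that is not
     * explicitly assigned (here: req.flags) is undefined, so a rearranged
     * code path can hand stale bits such as MEM_EVENT_FLAG_VCPU_PAUSED
     * to the paging client. */
    mem_event_request_t req;
    req.type = MEM_EVENT_TYPE_PAGING;
    req.gfn  = gfn;                   /* .flags left uninitialized */

    /* After: a C99 designated initializer zeroes every unnamed member. */
    mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING, .gfn = gfn };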
On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>> isn't a wait-queue bug.
>
> I had to rearrange some code in p2m_mem_paging_populate for my debug
> stuff. This led to an uninitialized req, and as a result req.flags
> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> not catch that. Now waitqueues appear to work ok for me. Thanks!

Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
pretty sure that the hypervisor will blow up pretty quickly when you resume
testing with multiple physical CPUs, for example. I need to create a couple
of fixup patches which I will then send to you for test.

By the way, did you test my patch to domain_crash when the stack-save area
isn't large enough?

> What do you think about C99 initializers in p2m_mem_paging_populate,
> just to avoid such mistakes?
>
>   mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

We like them.

 -- Keir
On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>>
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

Good, I will look forward to these fixes.

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?

I ran into the ->esp == 0 case right away, but I need to retest with a
clean tree.

Olaf
On 23/11/2011 17:16, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>>
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

We have quite a big waitqueue problem actually. The current scheme of
per-cpu stacks doesn't work nicely, as the stack pointer will change if a
vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
work nicely with preempted C code, which may be built with frame pointers
and/or arbitrarily take the address of on-stack variables. The result will
be hideous cross-stack corruption, as those frame pointers and cached
addresses of automatic variables will reference the wrong cpu's stack!
Fixing or detecting this in general is not possible afaics.

So we'll have to switch to per-vcpu stacks, probably with separate per-cpu
irq stacks (as a later followup). That's quite a nuisance!

 -- Keir

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?
>
>> What do you think about C99 initializers in p2m_mem_paging_populate,
>> just to avoid such mistakes?
>>
>>   mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
>
> We like them.
>
> -- Keir
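To illustrate the hazard with purely hypothetical code: any preempted C frame
that holds a pointer into its own stack becomes poisonous once the vcpu is
resumed on another pcpu's stack. All names below are made up for the example;
only the general waitqueue behaviour is assumed:

    /* Hypothetical hypercall path, for illustration only. */
    static int fetch_page_and_wait(struct domain *d, unsigned long gfn)
    {
        int rc = -EAGAIN;      /* automatic variable on the current pcpu's stack */
        int *prc = &rc;        /* cached address of that on-stack variable       */

        /* Suppose this ends up sleeping on a waitqueue. With per-cpu stacks
         * the vcpu may be resumed on a different pcpu, i.e. its saved frames
         * are replayed on a different region of memory. */
        wait_for_page(d, gfn); /* hypothetical helper that sleeps on a waitqueue */

        /* After wakeup, 'prc' (and any frame pointer further up the call
         * chain) may still point into the *old* pcpu's stack; writing through
         * it corrupts whatever happens to be running there now. */
        *prc = 0;
        return rc;
    }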
On 23/11/2011 18:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>>> isn't a wait-queue bug.
>>>
>>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>>> stuff. This led to an uninitialized req, and as a result req.flags
>>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>>> not catch that. Now waitqueues appear to work ok for me. Thanks!
>>
>> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
>> pretty sure that the hypervisor will blow up pretty quickly when you resume
>> testing with multiple physical CPUs, for example. I need to create a couple
>> of fixup patches which I will then send to you for test.
>
> Good, I will look forward to these fixes.
>
>> By the way, did you test my patch to domain_crash when the stack-save area
>> isn't large enough?
>
> I ran into the ->esp == 0 case right away, but I need to retest with a
> clean tree.

I think I have a test the wrong way round. This doesn't really matter now
anyway. As I say in my previous email, stack management will have to be
redone for waitqueues.

 -- Keir

> Olaf
On Wed, Nov 23, Keir Fraser wrote:

> We have quite a big waitqueue problem actually. The current scheme of
> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
> work nicely with preempted C code, which may be built with frame pointers
> and/or arbitrarily take the address of on-stack variables. The result will
> be hideous cross-stack corruption, as those frame pointers and cached
> addresses of automatic variables will reference the wrong cpu's stack!
> Fixing or detecting this in general is not possible afaics.

Yes, I was thinking about that wakeup on a different cpu as well.
As a quick fix/hack, perhaps the scheduler could make sure the vcpu
wakes up on the same cpu?

Olaf
On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> We have quite a big waitqueue problem actually. The current scheme of
>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>> work nicely with preempted C code, which may be built with frame pointers
>> and/or arbitrarily take the address of on-stack variables. The result will
>> be hideous cross-stack corruption, as those frame pointers and cached
>> addresses of automatic variables will reference the wrong cpu's stack!
>> Fixing or detecting this in general is not possible afaics.
>
> Yes, I was thinking about that wakeup on a different cpu as well.
> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
> wakes up on the same cpu?

We could save the old affinity and then vcpu_set_affinity(). That will have
to do for now. Actually it should work okay as long as the toolstack doesn't
mess with affinity in the meantime. I'll sort out a patch for this.

 -- Keir

> Olaf
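Roughly what that quick fix could look like, as a sketch only: the
saved_affinity field on waitqueue_vcpu, the helper names, and the exact
cpumask field/types are assumptions, and the real patch may differ:

    /* Before sleeping: remember the current affinity and pin to this pcpu,
     * so the vcpu cannot be resumed on a different pcpu's stack. */
    static void wqv_pin_to_current_cpu(struct waitqueue_vcpu *wqv)
    {
        struct vcpu *curr = current;

        cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
        if ( vcpu_set_affinity(curr, cpumask_of(smp_processor_id())) )
            domain_crash_synchronous();  /* cannot risk waking elsewhere */
    }

    /* After waking / leaving the waitqueue: restore what the toolstack
     * or guest had set before. */
    static void wqv_restore_affinity(struct waitqueue_vcpu *wqv)
    {
        vcpu_set_affinity(current, &wqv->saved_affinity);
    }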
On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Wed, Nov 23, Keir Fraser wrote:
>>
>>> We have quite a big waitqueue problem actually. The current scheme of
>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>> work nicely with preempted C code, which may be built with frame pointers
>>> and/or arbitrarily take the address of on-stack variables. The result will
>>> be hideous cross-stack corruption, as those frame pointers and cached
>>> addresses of automatic variables will reference the wrong cpu's stack!
>>> Fixing or detecting this in general is not possible afaics.
>>
>> Yes, I was thinking about that wakeup on a different cpu as well.
>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>> wakes up on the same cpu?
>
> We could save the old affinity and then vcpu_set_affinity(). That will have
> to do for now. Actually it should work okay as long as the toolstack doesn't
> mess with affinity in the meantime. I'll sort out a patch for this.

Attached three patches for you to try. They apply in sequence.

00: A fixed version of "domain_crash on stack overflow"
01: Reorders prepare_to_wait so that the vcpu will always be on the
    waitqueue on exit (even if it has just been woken).
02: Ensures the vcpu wakes up on the same cpu that it slept on.

We need all of these. Just need testing to make sure they aren't horribly
broken. You should be able to test a multi-processor host again with these.

 -- Keir

> -- Keir
>
>> Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Wed, Nov 23, Keir Fraser wrote:

> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>     waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test a multi-processor host again with these.

Thanks Keir. In a first test they work ok on a multi-processor host.

I get vcpu hangs when I balloon up and down with mem-set. That is most
likely caused by uneven vcpu_pause/unpause calls in my changes, which use
wait queues in mem_event handling and ept_get_entry. I will debug that
further.

After the vcpu hung I killed the guest and tried to start a new one.
Oddly enough I wasn't able to fully kill the guest, it remained in --p--d
state. Most vcpus were in a paused state before that.

In another attempt I was able to run firefox in a guest. But after trying
to open all "latest headlines" in tabs the guest crashed. The qemu-dm log
had a lot of this (but nothing in the xen dmesg):

  track_dirty_vram(f0000000, 12c) failed (-1, 3)

xl vcpu-list shows:

(null) 1 0 - --p 47.3 any cpu
(null) 1 1 12 --- 13.4 any cpu
(null) 1 2 - --p 4.3 any cpu
(null) 1 3 - --p 7.8 any cpu
(null) 1 4 - --p 3.5 any cpu
(null) 1 5 - --p 1.9 any cpu
(null) 1 6 - --p 1.6 any cpu
(null) 1 7 - --p 1.4 any cpu

Hmm, qemu-dm doesn't get killed in all cases, killing it destroys the guest.
I have seen that before already.

I will provide more test results tomorrow.

Olaf
On 23/11/2011 22:30, "Olaf Hering" <olaf@aepfle.de> wrote:

> After the vcpu hung I killed the guest and tried to start a new one.
> Oddly enough I wasn't able to fully kill the guest, it remained in --p--d
> state. Most vcpus were in a paused state before that.

Dying, but kept as a zombie by memory references...

> In another attempt I was able to run firefox in a guest. But after trying
> to open all "latest headlines" in tabs the guest crashed. The qemu-dm log
> had a lot of this (but nothing in the xen dmesg):
>
>   track_dirty_vram(f0000000, 12c) failed (-1, 3)
>
> xl vcpu-list shows:
>
> (null) 1 0 - --p 47.3 any cpu
> (null) 1 1 12 --- 13.4 any cpu
> (null) 1 2 - --p 4.3 any cpu
> (null) 1 3 - --p 7.8 any cpu
> (null) 1 4 - --p 3.5 any cpu
> (null) 1 5 - --p 1.9 any cpu
> (null) 1 6 - --p 1.6 any cpu
> (null) 1 7 - --p 1.4 any cpu
>
> Hmm, qemu-dm doesn't get killed in all cases, killing it destroys the guest.
> I have seen that before already.

...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
qemu-dm is not responding to some shutdown signal.

> I will provide more test results tomorrow.

Thanks.

 -- Keir
>>> On 23.11.11 at 22:03, Keir Fraser <keir.xen@gmail.com> wrote:
> On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:
>
>> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Wed, Nov 23, Keir Fraser wrote:
>>>
>>>> We have quite a big waitqueue problem actually. The current scheme of
>>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>>> work nicely with preempted C code, which may be built with frame pointers
>>>> and/or arbitrarily take the address of on-stack variables. The result will
>>>> be hideous cross-stack corruption, as those frame pointers and cached
>>>> addresses of automatic variables will reference the wrong cpu's stack!
>>>> Fixing or detecting this in general is not possible afaics.
>>>
>>> Yes, I was thinking about that wakeup on a different cpu as well.
>>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>>> wakes up on the same cpu?
>>
>> We could save the old affinity and then vcpu_set_affinity(). That will have
>> to do for now. Actually it should work okay as long as the toolstack doesn't
>> mess with affinity in the meantime. I'll sort out a patch for this.
>
> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>     waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.

Didn't we (long ago) settle on not permitting new calls to
domain_crash_synchronous()? Is it really impossible to just
domain_crash() in any of the instances these add?

Jan

> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test a multi-processor host again with these.
>
> -- Keir
>
>> -- Keir
>>
>>> Olaf
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>>     waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()? Is it really impossible to just
> domain_crash() in any of the instances these add?

It's safe because you must be in a context that is safe to preempt. That's a
pre-condition for using a waitqueue. It's not safe to use domain_crash()
because the caller of wait_event() may not handle the exceptional return.

 -- Keir
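The lack of an error path is visible from the rough shape of wait_event(),
paraphrased from memory of xen/include/xen/wait.h (the actual macro may
differ in detail):

    /* The macro either returns with the condition true or keeps sleeping;
     * there is no error return the caller could check, so a failure inside
     * prepare_to_wait()/wait() can only be handled by crashing the domain
     * synchronously. */
    #define wait_event(wq, condition)             \
    do {                                          \
        if ( condition )                          \
            break;                                \
        for ( ; ; )                               \
        {                                         \
            prepare_to_wait(&(wq));               \
            if ( condition )                      \
                break;                            \
            wait();                               \
        }                                         \
        finish_wait(&(wq));                       \
    } while (0)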
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>>     waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()?

This was a reaction to lazy patches which sprinkled d_c_s calls around
liberally, and in unsafe locations, as a dodge around proper error handling.
On Wed, Nov 23, Keir Fraser wrote:

> ...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
> qemu-dm is not responding to some shutdown signal.

In the first crash there was no qemu-dm process left from what I
remember. I will see if it happens again.

Olaf
On Thu, Nov 24, Olaf Hering wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> ...from qemu-dm. Problem is that the toolstack is not killing qemu-dm, or
>> qemu-dm is not responding to some shutdown signal.
>
> In the first crash there was no qemu-dm process left from what I
> remember. I will see if it happens again.

I see the patches were already committed. Thanks.

After more investigation: my config file has on_crash="preserve". To me it
looks like the guest kills itself, since nothing is in the logs. So after
all that's not a waitqueue issue, and most likely also not a paging bug.

Olaf
One more thing:

Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
happens to be on a queue by the time xl destroy is called, the hypervisor
will crash.

Perhaps there should be some sort of domain destructor for each
waitqueue?

Olaf
On 25/11/2011 18:26, "Olaf Hering" <olaf@aepfle.de> wrote:

> One more thing:
>
> Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
> happens to be on a queue by the time xl destroy is called, the hypervisor
> will crash.

We could fix this by having waitqueues that contain a vcpu hold a reference
to that vcpu's domain.

> Perhaps there should be some sort of domain destructor for each
> waitqueue?

Not sure what you mean.

 -- Keir

> Olaf
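One possible shape for that, sketched under the assumption that vcpus only
enter and leave waitqueues via the add/remove paths in common/wait.c; not an
actual patch, and the helper names below are illustrative (get_knownalive_domain
and put_domain are the generic domain refcount helpers):

    /* Sketch: keep the domain alive while one of its vcpus sits on a waitqueue. */
    static void wq_add_vcpu(struct waitqueue_head *wq, struct waitqueue_vcpu *wqv)
    {
        struct vcpu *v = current;

        /* Take a domain reference so domain destruction cannot complete
         * while the vcpu is still queued; dropped again on removal. */
        get_knownalive_domain(v->domain);
        list_add_tail(&wqv->list, &wq->list);
    }

    static void wq_del_vcpu(struct vcpu *v, struct waitqueue_vcpu *wqv)
    {
        list_del_init(&wqv->list);
        put_domain(v->domain);
    }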