Hello,

I'm running into the BUG_ON() after an incomplete XENMEM_decrease_reservation HYPERVISOR_memory_op call in balloon.c:decrease_reservation(). The reason for that is the huge number of nr_extents, where many of them are paged-out pages. Because they are paged out, they can simply be dropped from the xenpaging point of view; there is no need to page them in before calling p2m_remove_page() for the paged-out gfn.

Whatever strategy is chosen, the hypercall will be preempted. Because the hypercall is preempted, the arg is copied several times from the guest to the stack with copy_from_guest(). Now there is apparently nothing that stops the xenpaging binary in dom0 from making progress and eventually nominating the gfn which holds the guest's kernel stack page. This lets __hvm_copy() return HVMCOPY_gfn_paged_out, which means copy_from_user_hvm() "fails", and this lets the whole hypercall fail.

Now in my particular case, it's the first copy_from_user_hvm(), and I can probably come up with a simple patch which lets copy_from_user_hvm() return some sort of -EAGAIN. This could be used in do_memory_op() to just restart the hypercall once more until the gfn which holds the args is available again. Then my decrease_reservation() bug would have a workaround and I could move on.

However, I think there is nothing that would prevent the xenpaging binary from nominating a guest gfn while the actual work is done during the hypercall, and then copy_to_user_hvm() would fail. How should other hypercalls deal with the situation that a guest gfn gets into the paged-out state? Can they just sleep and do some sort of polling until the page is accessible again? Was this case considered while implementing xenpaging?

I'm currently reading through the callers of __hvm_copy(). Some of them detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just ignore the return codes, or turn them into generic errors. In the case of copy_from/to_guest, each caller needs an audit to see whether a retry is possible.

Olaf
On Wed, Nov 10, Olaf Hering wrote:
> I'm currently reading through the callers of __hvm_copy(). Some of them
> detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just
> ignore the return codes, or turn them into generic errors. In the case
> of copy_from/to_guest, each caller needs an audit to see whether a retry
> is possible.

A first patch which avoids the BUG_ON is below. It turned out that the guest's pagetables were just nominated and paged out during the preempted do_memory_op hypercall, so copy_from_guest failed in decrease_reservation() and do_memory_op().

I have also added some error handling in case the copy_to_user fails. However, only the decrease_reservation() code path is runtime tested, and in fact this whole patch is not yet compile-tested. It's just a heads-up.

So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out return codes from __hvm_copy? Or should I explore some different way, like spinning there and possibly letting other threads-of-execution make progress while waiting for the gfns to come back?

Olaf

---
 xen/arch/x86/hvm/hvm.c |    4 ++++
 xen/common/memory.c    |   43 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

--- xen-4.0.1-testing.orig/xen/arch/x86/hvm/hvm.c
+++ xen-4.0.1-testing/xen/arch/x86/hvm/hvm.c
@@ -1853,6 +1853,8 @@ unsigned long copy_to_user_hvm(void *to,

     rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
                                         len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }

@@ -1869,6 +1871,8 @@ unsigned long copy_from_user_hvm(void *t
 #endif

     rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }

--- xen-4.0.1-testing.orig/xen/common/memory.c
+++ xen-4.0.1-testing/xen/common/memory.c
@@ -47,6 +47,7 @@ static void increase_reservation(struct
 {
     struct page_info *page;
     unsigned long i;
+    unsigned long ctg_ret;
     xen_pfn_t mfn;
     struct domain *d = a->domain;

@@ -80,8 +81,14 @@ static void increase_reservation(struct
         if ( !guest_handle_is_null(a->extent_list) )
         {
             mfn = page_to_mfn(page);
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                free_domheap_pages(page, a->extent_order);
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }

@@ -93,6 +100,7 @@ static void populate_physmap(struct memo
 {
     struct page_info *page;
     unsigned long i, j;
+    unsigned long ctg_ret;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain;

@@ -111,8 +119,13 @@ static void populate_physmap(struct memo
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gpfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gpfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( a->memflags & MEMF_populate_on_demand )
         {
@@ -142,8 +155,17 @@ static void populate_physmap(struct memo
                 set_gpfn_from_mfn(mfn + j, gpfn + j);

             /* Inform the domain of the new page's machine address. */
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                for ( j = 0; j < (1 << a->extent_order); j++ )
+                    set_gpfn_from_mfn(mfn + j, INVALID_P2M_ENTRY);
+                guest_physmap_remove_page(d, gpfn, mfn, a->extent_order);
+                free_domheap_pages(page, a->extent_order);
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }

@@ -226,8 +248,13 @@ static void decrease_reservation(struct
             goto out;
         }

-        if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gmfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }

         if ( tb_init_done )
         {
@@ -511,6 +538,7 @@ long do_memory_op(unsigned long cmd, XEN
     int rc, op;
     unsigned int address_bits;
     unsigned long start_extent;
+    unsigned long cfg_ret;
     struct xen_memory_reservation reservation;
     struct memop_args args;
     domid_t domid;
@@ -524,8 +552,13 @@ long do_memory_op(unsigned long cmd, XEN
     case XENMEM_populate_physmap:
         start_extent = cmd >> MEMOP_EXTENT_SHIFT;

-        if ( copy_from_guest(&reservation, arg, 1) )
+        cfg_ret = copy_from_guest(&reservation, arg, 1);
+        if ( unlikely(cfg_ret) )
+        {
+            if ( (long)cfg_ret == -EAGAIN )
+                return hypercall_create_continuation(__HYPERVISOR_memory_op, "lh", cmd, arg);
             return start_extent;
+        }

         /* Is size too large for us to encode a continuation? */
         if ( reservation.nr_extents > (ULONG_MAX >> MEMOP_EXTENT_SHIFT) )
On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> return codes from __hvm_copy?
> Or should I explore some different way, like spinning there and possibly
> letting other threads-of-execution make progress while waiting for the
> gfns to come back?

You can't just spin because Xen is not preemptible. If it were a single CPU system, for example, no other thread would ever run again. You have to 'spin' via a preemptible loop that returns to guest context and then back into the hypercall. Which appears to be what you're doing.

 -- Keir
On Thu, Nov 11, Keir Fraser wrote:
> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> > return codes from __hvm_copy?
> > Or should I explore some different way, like spinning there and possibly
> > letting other threads-of-execution make progress while waiting for the
> > gfns to come back?
>
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system, for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

Thanks for the answer.

It occurred to me that this is an issue for hypercalls made by the guest itself. There are probably not that many in use, so it shouldn't be that hard to audit the few drivers for what they use and add some error handling. Up to now, only do_memory_op had an issue.

Olaf
On 11/11/2010 20:34, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Nov 11, Keir Fraser wrote:
>
>> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>>> return codes from __hvm_copy?
>>> Or should I explore some different way, like spinning there and possibly
>>> letting other threads-of-execution make progress while waiting for the
>>> gfns to come back?
>>
>> You can't just spin because Xen is not preemptible. If it were a single CPU
>> system, for example, no other thread would ever run again. You have to 'spin'
>> via a preemptible loop that returns to guest context and then back into the
>> hypercall. Which appears to be what you're doing.
>
> Thanks for the answer.
>
> It occurred to me that this is an issue for hypercalls made by the guest
> itself. There are probably not that many in use, so it shouldn't be that
> hard to audit the few drivers for what they use and add some error handling.
> Up to now, only do_memory_op had an issue.

Only other thing I'd say is that, depending on how often this happens, because paging in may well require a slow I/O operation, it may even be nice to sleep the waiting vcpu rather than spin. Would require some mechanism to record what vcpus are waiting for what mfns, and to check that list when paging stuff in. I guess it's rather a 'phase 2' thing after things actually work reliably!

 -- Keir
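As a rough illustration of the bookkeeping Keir describes here (recording which vcpus wait for which gfns so that a page-in can wake exactly those), below is a hedged sketch expressed with the wait-queue interface that appears later in this thread. The bucket count, the page_in_wq[] array, the gfn_waitqueue() helper and the p2m_is_paged_out() predicate are invented names for the example, not existing Xen API, and the wake-up in the page-in path is only indicated in a comment.

    #include <xen/wait.h>

    #define PAGING_WQ_BUCKETS 16

    /* One wait queue per bucket of gfns: a page-in then only wakes the few
     * vcpus that might be waiting on a gfn hashing to that bucket. */
    static struct waitqueue_head page_in_wq[PAGING_WQ_BUCKETS];

    static void page_in_wq_init(void)
    {
        unsigned int i;

        for ( i = 0; i < PAGING_WQ_BUCKETS; i++ )
            init_waitqueue_head(&page_in_wq[i]);
    }

    static struct waitqueue_head *gfn_waitqueue(unsigned long gfn)
    {
        return &page_in_wq[gfn % PAGING_WQ_BUCKETS];
    }

    /* A vcpu that hits a paged-out gfn (and holds no spinlocks) would sleep
     * with something like:
     *     wait_event(*gfn_waitqueue(gfn), !p2m_is_paged_out(d, gfn));
     * and the page-in completion path, once the frame is resident again,
     * would issue the matching wake-up on gfn_waitqueue(gfn). */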
>>> On 11.11.10 at 21:08, Keir Fraser <keir@xen.org> wrote:
> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>> return codes from __hvm_copy?
>> Or should I explore some different way, like spinning there and possibly
>> letting other threads-of-execution make progress while waiting for the
>> gfns to come back?
>
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system, for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

This works in the context of do_memory_op(), which already has a way to encode a continuation. For other hypercalls (accessible to HVM guests) this may not be as simple, and all of them can potentially run into this same problem.

Furthermore, even for the do_memory_op() one, encoding a continuation for a failure of copying in the arguments is clearly acceptable (if no other solution can be found), but unwinding the whole operation when copying out the results fails is at least undesirable (and can lead to a live lock). So I think a general (hopefully transparent to the individual hypercall handlers) solution needs to be found, and a word on the general issue from the original paging code authors (and their thoughts of it when designing the whole thing) would be very much appreciated.

Jan
On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

We will at least have to enforce that no spinlocks are held during copy_to/from_guest operations. That's easily enforced, at least in debug builds, of course.

Beyond that, introducing some transparent mechanisms for sleeping in the hypervisor -- mutexes, wait queues, and the like -- is actually fine with me. Perhaps this will also help clean up the preemptible page-type-checking logic that you had to do some heavy lifting on?

I'm happy to help work on the basic mechanism of this, if it's going to be useful and widely used. I reckon I could get mutexes and wait queues going in a couple of days. This would be the kind of framework that the paging mechanisms should then properly be built on.

What do you think?

 -- Keir
>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> We will at least have to enforce that no spinlocks are held during
> copy_to/from_guest operations. That's easily enforced, at least in debug
> builds, of course.
>
> Beyond that, introducing some transparent mechanisms for sleeping in the
> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
> me. Perhaps this will also help clean up the preemptible page-type-checking
> logic that you had to do some heavy lifting on?

I'm not sure it would help there - this requires voluntary preemption rather than synchronization. But perhaps it can be built on top of this (or result as a side effect).

> I'm happy to help work on the basic mechanism of this, if it's going to be
> useful and widely used. I reckon I could get mutexes and wait queues going
> in a couple of days. This would be the kind of framework that the paging
> mechanisms should then properly be built on.
>
> What do you think?

Sounds good, and your helping with this will be much appreciated (Olaf - unless you had plans to do this yourself). Whether it's going to be widely used I can't tell immediately - for the moment, overcoming the paging problems seems like the only application.

Jan
On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
>> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
>>
>> Beyond that, introducing some transparent mechanisms for sleeping in the
>> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
>> me. Perhaps this will also help clean up the preemptible page-type-checking
>> logic that you had to do some heavy lifting on?
>
> I'm not sure it would help there - this requires voluntary
> preemption rather than synchronization. But perhaps it can be
> built on top of this (or result as a side effect).

Yes, voluntary preempt can be built from the same bits and pieces very easily. I will provide that too, and I think some simplification to the page-type functions and callers will result. No bad thing!

>> I'm happy to help work on the basic mechanism of this, if it's going to be
>> useful and widely used. I reckon I could get mutexes and wait queues going
>> in a couple of days. This would be the kind of framework that the paging
>> mechanisms should then properly be built on.
>>
>> What do you think?
>
> Sounds good, and your helping with this will be much appreciated
> (Olaf - unless you had plans to do this yourself). Whether it's going
> to be widely used I can't tell immediately - for the moment,
> overcoming the paging problems seems like the only application.

Yeah, I'll get on this.

 -- Keir
At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

Maybe Patrick can comment too, but my recollection of discussing this is that we would have to propagate failures caused by paging at least as far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel could deadlock with its one vcpu stuck in a hypercall (or continually having it preempted and retried) and the paging binary that would unstick it never getting scheduled.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
>>> On 15.11.10 at 10:37, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
>> Furthermore, even for the do_memory_op() one, encoding a
>> continuation for a failure of copying in the arguments is clearly
>> acceptable (if no other solution can be found), but unwinding
>> the whole operation when copying out the results fails is at
>> least undesirable (and can lead to a live lock). So I think a
>> general (hopefully transparent to the individual hypercall
>> handlers) solution needs to be found, and a word on the
>> general issue from the original paging code authors (and their
>> thoughts of it when designing the whole thing) would be very
>> much appreciated.
>
> Maybe Patrick can comment too, but my recollection of discussing this is
> that we would have to propagate failures caused by paging at least as
> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
> could deadlock with its one vcpu stuck in a hypercall (or continually
> having it preempted and retried) and the paging binary that would
> unstick it never getting scheduled.

How's Dom0 involved here? The hypercall arguments live in guest memory.

Confused,
Jan
On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Maybe Patrick can comment too, but my recollection of discussing this is
>> that we would have to propagate failures caused by paging at least as
>> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
>> could deadlock with its one vcpu stuck in a hypercall (or continually
>> having it preempted and retried) and the paging binary that would
>> unstick it never getting scheduled.
>
> How's Dom0 involved here? The hypercall arguments live in
> guest memory.

Yes, and you'd never turn on paging for dom0 itself. That would never work!

Changing every user of the guest accessor macros to retry via guest space is really not tenable. We'd never get all the bugs out.

 -- Keir
At 10:09 +0000 on 15 Nov (1289815777), Keir Fraser wrote:
> On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>> Maybe Patrick can comment too, but my recollection of discussing this is
>>> that we would have to propagate failures caused by paging at least as
>>> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
>>> could deadlock with its one vcpu stuck in a hypercall (or continually
>>> having it preempted and retried) and the paging binary that would
>>> unstick it never getting scheduled.
>>
>> How's Dom0 involved here? The hypercall arguments live in
>> guest memory.
>
> Yes, and you'd never turn on paging for dom0 itself. That would never work!

:) No, the issue is if dom0 (or whichever dom the pager lives in) is trying an operation on domU's memory that hits a paged-out page (e.g. qemu or similar is mapping it) with its only vcpu - you can't just block or spin. You need to let dom0 schedule the pager process.

> Changing every user of the guest accessor macros to retry via guest space is
> really not tenable. We'd never get all the bugs out.

Right now, I can't see another way of doing it. Grants can be handled by shadowing the guest grant table and pinning granted frames so the block happens in domU (performance-- but you're already paging, right?) but what about qemu, xenctx, save/restore...?

Tim.
On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> Yes, and you'd never turn on paging for dom0 itself. That would never work!
>
> :) No, the issue is if dom0 (or whichever dom the pager lives in) is
> trying an operation on domU's memory that hits a paged-out page
> (e.g. qemu or similar is mapping it) with its only vcpu - you can't
> just block or spin. You need to let dom0 schedule the pager process.
>
>> Changing every user of the guest accessor macros to retry via guest space is
>> really not tenable. We'd never get all the bugs out.
>
> Right now, I can't see another way of doing it. Grants can be handled
> by shadowing the guest grant table and pinning granted frames so the
> block happens in domU (performance-- but you're already paging, right?)
> but what about qemu, xenctx, save/restore...?

We're talking about copy_to/from_guest, and friends, here. They always implicitly act on the local domain, so the issue you raise is not a problem there. Dom0 mappings of domU memory are a separate issue, presumably already considered and dealt with to some extent, no doubt.

 -- Keir
At 10:33 +0000 on 15 Nov (1289817224), Keir Fraser wrote:
> On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>
>>> Yes, and you'd never turn on paging for dom0 itself. That would never work!
>>
>> :) No, the issue is if dom0 (or whichever dom the pager lives in) is
>> trying an operation on domU's memory that hits a paged-out page
>> (e.g. qemu or similar is mapping it) with its only vcpu - you can't
>> just block or spin. You need to let dom0 schedule the pager process.
>>
>>> Changing every user of the guest accessor macros to retry via guest space is
>>> really not tenable. We'd never get all the bugs out.
>>
>> Right now, I can't see another way of doing it. Grants can be handled
>> by shadowing the guest grant table and pinning granted frames so the
>> block happens in domU (performance-- but you're already paging, right?)
>> but what about qemu, xenctx, save/restore...?
>
> We're talking about copy_to/from_guest, and friends, here.

Oh sorry, I had lost the context there.

Yes, for those the plan was just to pause and retry, just like all other cases where Xen needs to access guest memory. We hadn't particularly considered the case of large hypercall arguments that aren't all read up-front. How many cases of that are there? A bit of reordering on the memory-operation hypercalls could presumably let them be preempted and restart further in mid-operation next time. (IIRC the compat code already does something like this.)

Tim.
On 15/11/2010 10:49, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
>> We're talking about copy_to/from_guest, and friends, here.
>
> Oh sorry, I had lost the context there.
>
> Yes, for those the plan was just to pause and retry, just like all other
> cases where Xen needs to access guest memory.

Could you expand on what you mean by pause and retry? As that's what I think should be implemented, and it involves sleeping in hypervisor context afaics, which has led us to the current point in the discussion.

> We hadn't particularly
> considered the case of large hypercall arguments that aren't all read
> up-front. How many cases of that are there? A bit of reordering on the
> memory-operation hypercalls could presumably let them be preempted and
> restart further in mid-operation next time. (IIRC the compat code
> already does something like this.)

The issue is that there are hundreds of uses of the guest-accessor macros. Every single one would need updating to handle the paged-out-so-retry case, unless we can hide that *inside* the accessor macros themselves. It's a huge job, not to mention the bug tail on rarely-executed error paths.

Consider also the copy_to_* writeback case at the end of a hypercall. You've done the potentially non-idempotent work, you have some state cached in hypervisor regs/stack/heap and want to push it out to guest memory. The guest target memory is paged out. How do you encode the continuation for the dozens of cases like this without tearing your hair out?

I suppose *maybe* you could check-and-pin all memory that might be accessed before the meat of a hypercall begins. That seems a fragile pain in the neck too, however.

 -- Keir
At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
> The issue is that there are hundreds of uses of the guest-accessor macros.
> Every single one would need updating to handle the paged-out-so-retry case,
> unless we can hide that *inside* the accessor macros themselves. It's a huge
> job, not to mention the bug tail on rarely-executed error paths.

Right, I see. You're suggesting that we code up a sort of setjmp() that can be called in the __copy function, which will deschedule the vcpu and allow it to be rescheduled back where it was. Sounds ideal. Will it need per-vcpu stacks? (And will they, in turn, use order>0 allocations? :))

We'll have to audit the __copy functions to make sure they're not called with locks held. Sounds more fun than the alternative, I guess.

I think the ioreq code would be another candidate for tidying up if we had such a mechanism. Presumably some of the current users of hypercall_create_continuation() would benefit too.

> Consider also the copy_to_* writeback case at the end of a hypercall. You've
> done the potentially non-idempotent work, you have some state cached in
> hypervisor regs/stack/heap and want to push it out to guest memory. The
> guest target memory is paged out. How do you encode the continuation for the
> dozens of cases like this without tearing your hair out?
>
> I suppose *maybe* you could check-and-pin all memory that might be accessed
> before the meat of a hypercall begins. That seems a fragile pain in the neck
> too however.

Good point.

Tim.
On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
>> The issue is that there are hundreds of uses of the guest-accessor macros.
>> Every single one would need updating to handle the paged-out-so-retry case,
>> unless we can hide that *inside* the accessor macros themselves. It's a huge
>> job, not to mention the bug tail on rarely-executed error paths.
>
> Right, I see. You're suggesting that we code up a sort of setjmp() that
> can be called in the __copy function, which will deschedule the vcpu and
> allow it to be rescheduled back where it was. Sounds ideal.

Exactly so.

> Will it
> need per-vcpu stacks? (And will they, in turn, use order>0 allocations? :))

Of a sort. I propose to keep the per-pcpu stacks and then copy context to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will suffice.

In some ways this is a backwards version of the Linux stack-handling logic, which has a proper per-task kernel stack of moderate size (4kB?) and then larger per-cpu irq stacks to deal with deep irq nesting. We will have proper per-cpu hypervisor stacks of sufficient size to deal with guest and irq state -- our per-vcpu 'shadow stack' will then be the special case, and only of small/moderate size to deal with shallow guest call stacks.

> We'll have to audit the __copy functions to make sure they're not called
> with locks held. Sounds more fun than the alternative, I guess.

Exactly so. Best of a bad set of options. At least we can run-time assert this, and it's not error-path only.

> I think the ioreq code would be another candidate for tidying up if we
> had such a mechanism. Presumably some of the current users of
> hypercall_create_continuation() would benefit too.

Yeah, it needs a dash of thought but I think we will be able to move in this direction.

 -- Keir
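For illustration only, here is a rough sketch of the per-vcpu state such a setjmp-like scheme implies. The struct name, field layout and the 1.5kB figure are assumptions made for this example, not the layout that was eventually committed, and the arch-specific register/stack save and restore is left out entirely.

    #include <xen/list.h>
    #include <xen/sched.h>

    struct waitqueue_vcpu_sketch {
        struct list_head list;      /* entry on a waitqueue_head's list of sleepers */
        struct vcpu *v;             /* the vcpu parked here */
        unsigned long saved_rsp;    /* stack pointer at the point of the wait */
        unsigned char stack[1536];  /* copy of the (shallow) hypervisor call stack,
                                     * copied back just before the vcpu resumes */
    };

The per-pcpu stack stays the primary stack; only the small slice between the stack top and the waiting call frame needs to be copied into the per-vcpu area, which is why a 1kB-2kB 'shadow stack' can be enough.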
On Fri, Nov 12, Keir Fraser wrote:
> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Sounds good, and your helping with this will be much appreciated
>> (Olaf - unless you had plans to do this yourself). Whether it's going
>> to be widely used I can't tell immediately - for the moment,
>> overcoming the paging problems seems like the only application.
>
> Yeah, I'll get on this.

Sorry for being late here.

I'm glad you volunteered for this task.

Olaf
On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Fri, Nov 12, Keir Fraser wrote:
>
>> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>>> Sounds good, and your helping with this will be much appreciated
>>> (Olaf - unless you had plans to do this yourself). Whether it's going
>>> to be widely used I can't tell immediately - for the moment,
>>> overcoming the paging problems seems like the only application.
>>
>> Yeah, I'll get on this.
>
> Sorry for being late here.
>
> I'm glad you volunteered for this task.

The basis of what you need is checked in as xen-unstable:22396. You can include <xen/wait.h> and you get an interface like a very simplified version of Linux waitqueues. There are still some details to be worked out, but it basically works as-is and you can start using it now.

The one big cleanup/audit we will need is that all callers of __hvm_copy() (which ends up being all HVM guest callers of the copy_to/from_guest* macros) must not hold any locks. This is because you are going to modify __hvm_copy() such that it may sleep. Probably you should ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)

 -- Keir
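A minimal sketch of the audit hook suggested above; only the assertion comes from the mail, while the parameter list is assumed and the function body is elided.

    static enum hvm_copy_result __hvm_copy(
        void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec)
    {
        /* Waiting for a paged-out gfn means sleeping, which is only legal
         * outside atomic context: no spinlocks held, not in an IRQ handler.
         * A debug-build assertion catches offending callers early. */
        ASSERT(!in_atomic());

        /* ... the existing copy loop, which may now block on a wait queue ... */
        return HVMCOPY_okay;  /* placeholder for the elided body */
    }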
On 17/11/2010 16:52, "Keir Fraser" <keir@xen.org> wrote:
> On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> Sorry for being late here.
>>
>> I'm glad you volunteered for this task.
>
> The basis of what you need is checked in as xen-unstable:22396. You can
> include <xen/wait.h> and you get an interface like a very simplified version
> of Linux waitqueues. There are still some details to be worked out, but it
> basically works as-is and you can start using it now.
>
> The one big cleanup/audit we will need is that all callers of __hvm_copy()
> (which ends up being all HVM guest callers of the copy_to/from_guest*
> macros) must not hold any locks. This is because you are going to modify
> __hvm_copy() such that it may sleep. Probably you should
> ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)

I've done something along these lines now as xen-unstable:22402. It actually seems to work okay! So you can go ahead and use waitqueues in __hvm_copy() now.

 -- Keir
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

Thanks a lot for your work, Keir! I will get to it next week.

Olaf
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

This is my first attempt to do it. It crashed Xen on the very first try in a spectacular way, but it happened only once for some reason. See my other mail.

Olaf

--- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
+++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
@@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
 enum hvm_copy_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_to_guest_virt(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_fetch_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }

 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
On Thu, Nov 18, Keir Fraser wrote:
> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

My first attempt with the patch I sent crashed like this. Two threads run into a non-empty list:

  prepare_to_wait
  check_wakeup_from_wait

I could not reproduce this. Right now I'm running with a modified xenpaging policy which pages just the pagetable gfns around gfn 0x1800, but that almost stalls the guest due to the continuous paging. Any ideas how this crash can happen?

Olaf

....................
Welcome to SUSE Linux Enterprise Server 11 SP1 (x86_64) - Kernel 2.6.32.24-20101117.152845-xen (console).

stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (2 of 4)
(XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of 3)
[ 102.139380] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/768
[ 102.171632] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[ 102.209310] device vif1.0 entered promiscuous mode
[ 102.221776] br0: port 2(vif1.0) entering forwarding state
[ 102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i ffff8800f2420720 f ffff8800f1c2f980
[ 102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 103.241995] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/5632
[ 103.274444] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[ 103.301481] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=110) is a cdrom
[ 103.331978] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=112) xenstore wrote OK
[ 103.362764] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[ 104.538376] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/832
[ 104.570669] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[ 112.401097] vif1.0: no IPv6 routers present
(XEN) HVM1: HVM Loader
(XEN) HVM1: Detected Xen v4.1.22433-20101126
(XEN) HVM1: CPU speed is 2667 MHz
(XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5
(XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10
(XEN) HVM1: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:258: Dom1 PCI link 2 changed 0 -> 11
(XEN) HVM1: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 3 routed to IRQ5
(XEN) HVM1: pci dev 01:3 INTA->IRQ10
(XEN) HVM1: pci dev 03:0 INTA->IRQ5
(XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008
(XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008
(XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000
(XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001
(XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101
(XEN) HVM1: Multiprocessor initialisation:
(XEN) HVM1: - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: Testing HVM environment:
(XEN) HVM1: - REP INSB across page boundaries ... passed
(XEN) HVM1: - GS base MSRs and SWAPGS ... passed
(XEN) HVM1: Passed 2 of 2 tests
(XEN) HVM1: Writing SMBIOS tables ...
(XEN) HVM1: Loading ROMBIOS ...
(XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions:
(XEN) HVM1: Relocating to 0xfc000000-0xfc0025bc ... done
(XEN) HVM1: Creating MP tables ...
(XEN) HVM1: Loading Cirrus VGABIOS ...
(XEN) HVM1: Loading ACPI ...
(XEN) HVM1: - Lo data: 000ea020-000ea04f
(XEN) HVM1: - Hi data: fc002800-fc01291f
(XEN) HVM1: vm86 TSS at fc012c00
(XEN) HVM1: BIOS map:
(XEN) HVM1: c0000-c8fff: VGA BIOS
(XEN) HVM1: eb000-eb1d9: SMBIOS tables
(XEN) HVM1: f0000-fffff: Main BIOS
(XEN) HVM1: E820 table:
(XEN) HVM1: [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM1: [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED
(XEN) HVM1: [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED
(XEN) HVM1: HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM1: [03]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM1: [04]: 00000000:00100000 - 00000000:40000000: RAM
(XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fc000000
(XEN) HVM1: [05]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM1: Invoking ROMBIOS ...
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) stdvga.c:147:d1 entering stdvga and caching modes
(XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) HVM1: Bochs BIOS - build: 06/23/99
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) HVM1: Options: apmbios pcibios eltorito PMM
(XEN) HVM1:
(XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63
(XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes)
(XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
(XEN) HVM1: ata0 slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes)
(XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) HVM1: IDE time out
(XEN) HVM1:
(XEN) HVM1:
(XEN) HVM1:
(XEN) HVM1: Press F12 for boot menu.
(XEN) HVM1:
(XEN) HVM1: Booting from Hard Disk...
(XEN) HVM1: Booting from 0000:7c00
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82
(XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported!
(XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported!
(XEN) HVM1: KBD: unsupported int 16h function 03
(XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported!
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0
(XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0
(XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0
(XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA
[ 165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[ 165.330911] alloc irq_desc for 886 on node 0
[ 165.337115] alloc kstat_irqs on node 0
[ 165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
[ 165.366824] alloc irq_desc for 887 on node 0
[ 165.372089] alloc kstat_irqs on node 0
[ 165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
[ 165.402453] alloc irq_desc for 888 on node 0
[ 165.409108] alloc kstat_irqs on node 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
[ 168.016706] alloc irq_desc for 889 on node 0
[ 168.020103] alloc kstat_irqs on node 0
(XEN) Xen BUG at wait.c:118
(XEN) Assertion 'list_empty(&wqv->list)' failed at wait.c:130
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C ]----
(XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82c4801285c1>]RIP: e008:[<ffff82c4801283ab>] prepare_to_wait+0xf0/0x10f
(XEN) RFLAGS: 0000000000010212 check_wakeup_from_wait+0x27/0x61CONTEXT: hypervisor
(XEN)
(XEN) RFLAGS: 0000000000010293 rax: 0000000000000000 rbx: ffff8301337cd010 rcx: 0000000000000968
(XEN) CONTEXT: hypervisor
(XEN) rdx: ffff83013e737f18 rsi: ffff83013e7375b0 rdi: ffff8301337cd030
(XEN) rax: ffff8301337cd620 rbx: ffff830012b72000 rcx: 0000000000000000
(XEN) rbp: ffff83013e737648 rsp: ffff83013e737628 r8: ffff830138439f60
(XEN) rdx: ffff83013e707f18 rsi: 0000000000000003 rdi: ffff830012b73860
(XEN) r9: 000000000011622f r10: ffff83013e737950 r11: ffffffff8101f230
(XEN) rbp: ffff83013e707cb0 rsp: ffff83013e707cb0 r8: 0000000000000013
(XEN) r12: ffff83013e737668 r13: ffff8301337cd010 r14: ffff830012b74000
(XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
(XEN) r15: ffffffffff5fb300 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) r12: 0000000000000003 r13: ffff8300bf2fa000 r14: 0000002e80873e97
(XEN) cr3: 000000013444c000 cr2: ffffe8ffffc00000
(XEN) r15: ffff830012b72000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) cr3: 00000001347e5000 cr2: 00007f1c79acf000
(XEN) Xen stack trace from rsp=ffff83013e737628:
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) 0000000000000004Xen stack trace from rsp=ffff83013e707cb0:
(XEN) 0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935 ffff83013e7379b0 ffff83013e707d10
(XEN) ffff830012b72000 ffff83013e7376b8
(XEN) ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000 ffff83013e737668 0000002e80873e97
(XEN) ffff83013e70c040 000000000fff0001
(XEN) ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb 0000000000000004
(XEN) 0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000
(XEN) 0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913
(XEN) 0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000 ffff83013e737728
(XEN) ffff82c4801a6460 0000000000000000
(XEN) 0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001 0000002e80873cb4 ffff83013e7379b0
(XEN) 000000000000008b ffff82c4802e58c0
(XEN) ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001 0000000000000282 ffffffffff5fb300
(XEN) 0000000000000089 ffff82c48014d4a1
(XEN) 0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000 0000000000000000 00000003c18f8247 0000000000000000
(XEN)
(XEN) ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52
(XEN)
(XEN) 0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50
(XEN)
(XEN) 0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0 ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000
(XEN)
(XEN) ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50
(XEN)
(XEN) ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4 ffff83013e707e40 000000000011622f 0000002e80873e97 0000000000000011
(XEN)
(XEN) ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000 ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52
(XEN)
(XEN) ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4 ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50
(XEN)
(XEN) 0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008 ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff
(XEN)
(XEN) ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9 ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888
(XEN)
(XEN) ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a
(XEN)
(XEN) 0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619
(XEN)
(XEN) ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000 ffff83013e737998 0000000000000003 0000000000000000
(XEN) ffff8300bf2f6000Xen call trace:
(XEN)
(XEN) [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f
(XEN) 0000000000000000[<ffff82c4801ac622>] 0000000000000000 hvm_copy_to_guest_virt+0x65/0xb0
(XEN) 0000000000000000[<ffff82c4801a6460>]
(XEN) Xen call trace:
(XEN) hvmemul_write+0x113/0x1a2
(XEN) [<ffff82c4801283ab>][<ffff82c48019179c>] check_wakeup_from_wait+0x27/0x61
(XEN) [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae
(XEN) [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa
(XEN) [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192
(XEN) [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db
(XEN) [<ffff82c480157913>] handle_mmio+0x4e/0x17d
(XEN) [<ffff82c4801c844f>] context_switch+0xdbf/0xddb
(XEN) [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c
(XEN)
(XEN) schedule+0x5f3/0x619
(XEN)
(XEN) ****************************************
(XEN) [<ffff82c4801220d7>]Panic on CPU 1:
(XEN) __do_softirq+0x88/0x99
(XEN) Xen BUG at wait.c:118
(XEN) [<ffff82c480122152>]****************************************
(XEN)
(XEN) do_softirq+0x6a/0x7a
(XEN) Reboot in five seconds...
(XEN) [<ffff82c480155619>]Debugging connection not set up.
(XEN) idle_loop+0x64/0x66
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
On 02/12/2010 10:11, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Nov 18, Keir Fraser wrote:
>
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
>
> This is my first attempt to do it.
> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.

Firstly, the usage of waitqueues is broken. The waitqueue_head should be
shared with the code that pages in, so that vcpus can be *woken* at some point
after they start waiting. As it is, if a vcpu does sleep on its local
waitqueue_head, it will never wake. You might start with a single global
waitqueue_head and wake everyone on it every time a page (or maybe a page
batch) is paged in. More sophisticated might be to hash page numbers into an
array of waitqueue_heads, to reduce false wakeups. This is all similar to
Linux waitqueues by the way -- your current code would be just as broken in
Linux as it is in Xen.

Secondly, you should be able to hide the waiting inside __hvm_copy(). I doubt
you really need to touch the callers.

 -- Keir

>
> Olaf
>
> --- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
> +++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
> @@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
>  enum hvm_copy_result hvm_copy_to_guest_phys(
>      paddr_t paddr, void *buf, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_to_guest_virt(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_fetch_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +        res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>
>  unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
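To make the suggestion concrete, here is a minimal sketch of what one wrapper might look like against a shared waitqueue_head. It is an illustration only, not the change that was eventually committed: the per-domain field name "paging_wq" is an assumption, and per the second point the wait would preferably sit inside __hvm_copy() itself rather than in every caller.

    /* Sketch only: assumes struct domain carries a waitqueue_head named
     * paging_wq (hypothetical), set up with init_waitqueue_head() at
     * domain creation and woken by the code that completes page-ins. */
    enum hvm_copy_result hvm_copy_to_guest_phys(
        paddr_t paddr, void *buf, int size)
    {
        struct domain *d = current->domain;
        enum hvm_copy_result res;

        /* wait_event() checks the condition first; if the copy hits a
         * paged-out gfn the vcpu sleeps on d->paging_wq, and the copy is
         * retried after each wake_up() from the paging-in path.  A false
         * wakeup only costs one extra __hvm_copy() attempt. */
        wait_event(d->paging_wq,
                   (res = __hvm_copy(buf, paddr, size,
                                     HVMCOPY_to_guest | HVMCOPY_fault |
                                     HVMCOPY_phys, 0))
                   != HVMCOPY_gfn_paged_out);
        return res;
    }

The essential difference from the quoted patch is that the head outlives the hypercall and is visible to whoever pages the gfn back in, so a sleeping vcpu can actually be woken.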
>>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, Nov 18, Keir Fraser wrote:
>
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
>
> This is my first attempt to do it.

I didn't look in detail whether that's being done in a non-intuitive
way elsewhere, but I can't see how the event you're waiting on
would ever get signaled - wouldn't you need to pass it into
__hvm_copy() and further down from there?

Jan

> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.
>
> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 02/12/2010 10:18, "Olaf Hering" <olaf@aepfle.de> wrote:> On Thu, Nov 18, Keir Fraser wrote: > >> I''ve done something along these lines now as xen-unstable:22402. It actually >> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy() >> now. > > My first attempt with the patch I sent crashed like this. > Two threads run into a non-empty list: > > prepare_to_wait > check_wakeup_from_wait > > I could not reproduce this. Right now I''m running with a modified > xenpaging policy which pages just the pagetable gfns around gfn 0x1800. > But that almost stalls the guest due to the continous paging. > > Any ideas how this crash can happen?Since your current patch is conceptually quite broken anyway, there is little point in chasing down the crash. It might have something to do with allocating the waitqueue_head on the local stack -- which you would never want to do in a correct usage of waitqueues. So, back to square one and try again I''m afraid. -- Keir> Olaf > > .................... > > Welcome to SUSE Linux Enterprise Server 11 SP1 (x86_64) - Kernel > 2.6.32.24-20101117.152845-xen (console). > > > stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9 > extent: id=1 memflags=0 (2 of 4) > (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of > 3) > [ 102.139380] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/768 > [ 102.171632] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:0 > [ 102.209310] device vif1.0 entered promiscuous mode > [ 102.221776] br0: port 2(vif1.0) entering forwarding state > [ 102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i > ffff8800f2420720 f ffff8800f1c2f980 > [ 102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team > [ 102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > [ 103.241995] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/5632 > [ 103.274444] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:1 > [ 103.301481] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=110) is a cdrom > [ 103.331978] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=112) xenstore wrote OK > [ 103.362764] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:1 > [ 104.538376] (cdrom_add_media_watch() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=108) nodename:backend/vbd/1/832 > [ 104.570669] (cdrom_is_type() > file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk > back/cdrom.c, line=95) type:0 > [ 112.401097] vif1.0: no IPv6 routers present > (XEN) HVM1: HVM Loader > (XEN) HVM1: Detected Xen v4.1.22433-20101126 > (XEN) HVM1: CPU speed is 2667 MHz > (XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5 > (XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5 > (XEN) HVM1: PCI-ISA link 0 routed to IRQ5 > (XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10 > (XEN) HVM1: PCI-ISA link 1 routed to IRQ10 > (XEN) irq.c:258: Dom1 PCI link 2 
changed 0 -> 11 > (XEN) HVM1: PCI-ISA link 2 routed to IRQ11 > (XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5 > (XEN) HVM1: PCI-ISA link 3 routed to IRQ5 > (XEN) HVM1: pci dev 01:3 INTA->IRQ10 > (XEN) HVM1: pci dev 03:0 INTA->IRQ5 > (XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008 > (XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008 > (XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000 > (XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001 > (XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101 > (XEN) HVM1: Multiprocessor initialisation: > (XEN) HVM1: - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... > done. > (XEN) HVM1: Testing HVM environment: > (XEN) HVM1: - REP INSB across page boundaries ... passed > (XEN) HVM1: - GS base MSRs and SWAPGS ... passed > (XEN) HVM1: Passed 2 of 2 tests > (XEN) HVM1: Writing SMBIOS tables ... > (XEN) HVM1: Loading ROMBIOS ... > (XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions: > (XEN) HVM1: Relocating to 0xfc000000-0xfc0025bc ... done > (XEN) HVM1: Creating MP tables ... > (XEN) HVM1: Loading Cirrus VGABIOS ... > (XEN) HVM1: Loading ACPI ... > (XEN) HVM1: - Lo data: 000ea020-000ea04f > (XEN) HVM1: - Hi data: fc002800-fc01291f > (XEN) HVM1: vm86 TSS at fc012c00 > (XEN) HVM1: BIOS map: > (XEN) HVM1: c0000-c8fff: VGA BIOS > (XEN) HVM1: eb000-eb1d9: SMBIOS tables > (XEN) HVM1: f0000-fffff: Main BIOS > (XEN) HVM1: E820 table: > (XEN) HVM1: [00]: 00000000:00000000 - 00000000:0009e000: RAM > (XEN) HVM1: [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED > (XEN) HVM1: [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED > (XEN) HVM1: HOLE: 00000000:000a0000 - 00000000:000e0000 > (XEN) HVM1: [03]: 00000000:000e0000 - 00000000:00100000: RESERVED > (XEN) HVM1: [04]: 00000000:00100000 - 00000000:40000000: RAM > (XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fc000000 > (XEN) HVM1: [05]: 00000000:fc000000 - 00000001:00000000: RESERVED > (XEN) HVM1: Invoking ROMBIOS ... > (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $ > (XEN) stdvga.c:147:d1 entering stdvga and caching modes > (XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $ > (XEN) HVM1: Bochs BIOS - build: 06/23/99 > (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $ > (XEN) HVM1: Options: apmbios pcibios eltorito PMM > (XEN) HVM1: > (XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63 > (XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes) > (XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 > (XEN) HVM1: ata0 slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes) > (XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom > (XEN) HVM1: IDE time out > (XEN) HVM1: > (XEN) HVM1: > (XEN) HVM1: > (XEN) HVM1: Press F12 for boot menu. > (XEN) HVM1: > (XEN) HVM1: Booting from Hard Disk... > (XEN) HVM1: Booting from 0000:7c00 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82 > (XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported! > (XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported! > (XEN) HVM1: KBD: unsupported int 16h function 03 > (XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported! 
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86 > (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87 > (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88 > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89 > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89 > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e > (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f > (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30 > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20 > (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20 > (XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0 > (XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0 > (XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0 > (XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0 > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > (XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA > [ 165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi) > [ 165.330911] alloc irq_desc for 886 on node 0 > [ 165.337115] alloc kstat_irqs on node 0 > [ 165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi) > [ 165.366824] alloc irq_desc for 887 on node 0 > [ 165.372089] alloc kstat_irqs on node 0 > [ 165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi) > [ 165.402453] alloc irq_desc for 888 on node 0 > [ 165.409108] alloc kstat_irqs on node 0 > (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t. > [ 168.016706] alloc irq_desc for 889 on node 0 > [ 168.020103] alloc kstat_irqs on node 0 > (XEN) Xen BUG at wait.c:118 > (XEN) Assertion ''list_empty(&wqv->list)'' failed at wait.c:130 > (XEN) Debugging connection not set up. > (XEN) Debugging connection not set up. 
> (XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C > ]---- > (XEN) ----[ Xen-4.1.22433-20101126.164804 x86_64 debug=y Tainted: C > ]---- > (XEN) CPU: 1 > (XEN) CPU: 3 > (XEN) RIP: e008:[<ffff82c4801285c1>]RIP: e008:[<ffff82c4801283ab>] > prepare_to_wait+0xf0/0x10f > (XEN) RFLAGS: 0000000000010212 check_wakeup_from_wait+0x27/0x61CONTEXT: > hypervisor > (XEN) > (XEN) RFLAGS: 0000000000010293 rax: 0000000000000000 rbx: ffff8301337cd010 > rcx: 0000000000000968 > (XEN) CONTEXT: hypervisor > (XEN) rdx: ffff83013e737f18 rsi: ffff83013e7375b0 rdi: ffff8301337cd030 > (XEN) rax: ffff8301337cd620 rbx: ffff830012b72000 rcx: 0000000000000000 > (XEN) rbp: ffff83013e737648 rsp: ffff83013e737628 r8: ffff830138439f60 > (XEN) rdx: ffff83013e707f18 rsi: 0000000000000003 rdi: ffff830012b73860 > (XEN) r9: 000000000011622f r10: ffff83013e737950 r11: ffffffff8101f230 > (XEN) rbp: ffff83013e707cb0 rsp: ffff83013e707cb0 r8: 0000000000000013 > (XEN) r12: ffff83013e737668 r13: ffff8301337cd010 r14: ffff830012b74000 > (XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f > (XEN) r15: ffffffffff5fb300 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) r12: 0000000000000003 r13: ffff8300bf2fa000 r14: 0000002e80873e97 > (XEN) cr3: 000000013444c000 cr2: ffffe8ffffc00000 > (XEN) r15: ffff830012b72000 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) cr3: 00000001347e5000 cr2: 00007f1c79acf000 > (XEN) Xen stack trace from rsp=ffff83013e737628: > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) 0000000000000004Xen stack trace from rsp=ffff83013e707cb0: > (XEN) 0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935 > ffff83013e7379b0 ffff83013e707d10 > (XEN) ffff830012b72000 ffff83013e7376b8 > (XEN) ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000 > ffff83013e737668 0000002e80873e97 > (XEN) ffff83013e70c040 000000000fff0001 > (XEN) ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb > 0000000000000004 > (XEN) 0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000 > (XEN) 0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913 > (XEN) 0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000 > ffff83013e737728 > (XEN) ffff82c4801a6460 0000000000000000 > (XEN) 0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001 > 0000002e80873cb4 ffff83013e7379b0 > (XEN) 000000000000008b ffff82c4802e58c0 > (XEN) ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001 > 0000000000000282 ffffffffff5fb300 > (XEN) 0000000000000089 ffff82c48014d4a1 > (XEN) 0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000 > 0000000000000000 00000003c18f8247 0000000000000000 > (XEN) > (XEN) ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c > ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52 > (XEN) > (XEN) 0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4 > 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50 > (XEN) > (XEN) 0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0 > ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000 > (XEN) > (XEN) ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000 > 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50 > (XEN) > (XEN) ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4 > ffff83013e707e40 000000000011622f 0000002e80873e97 
0000000000000011 > (XEN) > (XEN) ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000 > ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52 > (XEN) > (XEN) ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4 > ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50 > (XEN) > (XEN) 0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008 > ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff > (XEN) > (XEN) ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9 > ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888 > (XEN) > (XEN) ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000 > 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a > (XEN) > (XEN) 0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152 > 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619 > (XEN) > (XEN) ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000 > ffff83013e737998 0000000000000003 0000000000000000 > (XEN) ffff8300bf2f6000Xen call trace: > (XEN) > (XEN) [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f > (XEN) 0000000000000000[<ffff82c4801ac622>] 0000000000000000 > hvm_copy_to_guest_virt+0x65/0xb0 > (XEN) 0000000000000000[<ffff82c4801a6460>] > (XEN) Xen call trace: > (XEN) hvmemul_write+0x113/0x1a2 > (XEN) [<ffff82c4801283ab>][<ffff82c48019179c>] > check_wakeup_from_wait+0x27/0x61 > (XEN) [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae > (XEN) [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa > (XEN) [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192 > (XEN) [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db > (XEN) [<ffff82c480157913>] handle_mmio+0x4e/0x17d > (XEN) [<ffff82c4801c844f>] context_switch+0xdbf/0xddb > (XEN) [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c > (XEN) > (XEN) schedule+0x5f3/0x619 > (XEN) > (XEN) **************************************** > (XEN) [<ffff82c4801220d7>]Panic on CPU 1: > (XEN) __do_softirq+0x88/0x99 > (XEN) Xen BUG at wait.c:118 > (XEN) [<ffff82c480122152>]**************************************** > (XEN) > (XEN) do_softirq+0x6a/0x7a > (XEN) Reboot in five seconds... > (XEN) [<ffff82c480155619>]Debugging connection not set up. > (XEN) idle_loop+0x64/0x66 > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.ý_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:
> Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> shared with the code that pages in, so that vcpus can be *woken* at some point
> after they start waiting. As it is, if a vcpu does sleep on its local
> waitqueue_head, it will never wake. You might start with a single global
> waitqueue_head and wake everyone on it every time a page (or maybe a page
> batch) is paged in. More sophisticated might be to hash page numbers into an
> array of waitqueue_heads, to reduce false wakeups.

...Or you might have a per-domain waitqueue_head, and do the wake_up() from
the code that adds paged-in entries to the guest physmap. That would seem a
pretty sensible way to proceed, to me.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
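For concreteness, the per-domain variant could be shaped roughly like the sketch below. Every name and hook point here is an assumption made for illustration; the thread had not yet settled on the real ones.

    /* Sketch only; all names are hypothetical. */

    /* 1. In struct domain (xen/include/xen/sched.h): one head shared by
     *    all vcpus of the domain that hit a paged-out gfn. */
    struct waitqueue_head paging_wq;

    /* 2. At domain creation (xen/common/domain.c): */
    init_waitqueue_head(&d->paging_wq);

    /* 3. On page-in completion, i.e. where the gfn is re-added to the
     *    guest physmap: wake every waiter; false wakeups merely retry. */
    wake_up(&d->paging_wq);

A single per-domain head trades some false wakeups for simplicity; the hashed array of waitqueue_heads mentioned earlier would reduce those at the cost of extra bookkeeping.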
On Mon, Nov 15, Keir Fraser wrote:

> On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> > Will it
> > need per-vcpu stacks? (and will they, in turn, use order>0 allocations? :))
>
> Of a sort. I propose to keep the per-pcpu stacks and then copy context
> to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call
> stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will
> suffice.

Keir,

in my testing the BUG_ON in __prepare_to_wait() triggers, 1500 is too
small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
3K would be enough as well.
How large can the stack get, is there an upper limit?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Jan Beulich wrote:

> >>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, Nov 18, Keir Fraser wrote:
> >
> >> I've done something along these lines now as xen-unstable:22402. It actually
> >> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> >> now.
> >
> > This is my first attempt to do it.
>
> I didn't look in detail whether that's being done in a non-intuitive
> way elsewhere, but I can't see how the event you're waiting on
> would ever get signaled - wouldn't you need to pass it into
> __hvm_copy() and further down from there?

I was relying on the kind-of wakeup in p2m_mem_paging_resume().

There will be a new patch shortly.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Keir Fraser wrote:

> On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:
>
> > Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> > shared with the code that pages in, so that vcpus can be *woken* at some point
> > after they start waiting. As it is, if a vcpu does sleep on its local
> > waitqueue_head, it will never wake. You might start with a single global
> > waitqueue_head and wake everyone on it every time a page (or maybe a page
> > batch) is paged in. More sophisticated might be to hash page numbers into an
> > array of waitqueue_heads, to reduce false wakeups.
>
> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
> the code that adds paged-in entries to the guest physmap. That would seem a
> pretty sensible way to proceed, to me.

That's what I'm doing right now.

It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
for this. I was messing with wait_event() until I realized that the vcpu
is stopped by p2m_mem_paging_populate() already and the wake_up() ran
before the vcpu got a chance to call schedule().

If a vcpu happens to be scheduled and the domain is destroyed, the
BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
there is still an entry in the list? The cleanup should handle this
situation so that it does not crash Xen itself.

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 03/12/2010 01:03, "Olaf Hering" <olaf@aepfle.de> wrote:
> in my testing the BUG_ON in __prepare_to_wait() triggers, 1500 is too
> small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
> 3K would be enough as well.
> How large can the stack get, is there an upper limit?

It can get pretty deep with nested interrupts. I wouldn't expect a guest's
hypercall stack to get very deep at all. Send me a BUG_ON() backtrace. That
said, making the saved-stack area bigger isn't really a problem.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
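The check being discussed is essentially "does the live hypercall stack fit into the per-vcpu save area". A rough sketch of that logic follows; the structure layout, the helpers get_stack_top() and esp_snapshot(), and the constant are all illustrative assumptions, not the actual common/wait.c code.

    /* Sketch: per-vcpu save area for the setjmp-like context switch. */
    #define WQ_SAVE_SIZE 3072                /* 1500 was too small in testing */

    struct waitqueue_vcpu {
        struct list_head list;               /* entry on a waitqueue_head    */
        struct vcpu *vcpu;
        void *esp;                           /* stack pointer at wait time   */
        char stack[WQ_SAVE_SIZE];            /* copy of the in-use stack     */
    };

    /* In __prepare_to_wait() (sketch): everything between the current stack
     * pointer and the top of the per-pcpu stack must be preserved so the
     * vcpu can later be resumed from the same point. */
    void *esp = esp_snapshot();                              /* hypothetical */
    unsigned long used = (char *)get_stack_top() - (char *)esp;
    BUG_ON(used > WQ_SAVE_SIZE);             /* the BUG_ON that fired here   */
    memcpy(wqv->stack, esp, used);
    wqv->esp = esp;

The upper bound on "used" is whatever the deepest hypercall path consumes before it sleeps, which is why a couple of kB of slack is cheap insurance.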
On 03/12/2010 01:06, "Olaf Hering" <olaf@aepfle.de> wrote:
>> I didn't look in detail whether that's being done in a non-intuitive
>> way elsewhere, but I can't see how the event you're waiting on
>> would ever get signaled - wouldn't you need to pass it into
>> __hvm_copy() and further down from there?
>
> I was relying on the kind-of wakeup in p2m_mem_paging_resume().
>
> There will be a new patch shortly.

vcpu_pause() is nestable and counted. So the vcpu_unpause() on
MEM_EVENT_FLAG_VCPU_PAUSED will not be enough to wake up a vcpu that is also
paused on a waitqueue. Once the vcpu is asleep on a waitqueue it definitely
needs wake_up() to wake it.

Of course, p2m_mem_paging_resume() is quite likely the right place to put
the wake_up() call. But you do need it in addition to the unpause on the
MEM_EVENT flag.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
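Spelled out, the resume path ends up doing two different things for two different blocking mechanisms. An abbreviated sketch, reusing the hypothetical per-domain head from earlier and eliding the ring handling of the real p2m_mem_paging_resume():

    void p2m_mem_paging_resume(struct domain *d)
    {
        mem_event_response_t rsp;

        /* ... fetch the response from the ring and restore the p2m entry
         * for rsp.gfn so the page is usable again ... */

        /* 1. Drop the pause reference taken when the request was posted.
         *    vcpu_pause() is counted, so this only undoes that one pause. */
        if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
            vcpu_unpause(d->vcpu[rsp.vcpu_id]);

        /* 2. Additionally wake any vcpu sleeping on the waitqueue inside
         *    __hvm_copy(); the unpause above does nothing for these. */
        wake_up(&d->paging_wq);
    }

A vcpu that was both event-paused and asleep on the queue only resumes once both conditions clear, which is exactly the counted-pause behaviour described above.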
On 03/12/2010 01:14, "Olaf Hering" <olaf@aepfle.de> wrote:
>> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
>> the code that adds paged-in entries to the guest physmap. That would seem a
>> pretty sensible way to proceed, to me.
>
> That's what I'm doing right now.
>
> It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
> for this. I was messing with wait_event() until I realized that the vcpu
> is stopped by p2m_mem_paging_populate() already and the wake_up() ran
> before the vcpu got a chance to call schedule().

Hm, not sure what you mean. The vcpu does not get synchronously stopped by
_paging_populate(). Maybe you are confused.

> If a vcpu happens to be scheduled and the domain is destroyed, the
> BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
> there is still an entry in the list? The cleanup should handle this
> situation so that it does not crash Xen itself.

You'll get a crash if a vcpu is on a waitqueue when you kill the domain.
Yes, the destroydomain path needs code to handle that. It'll get added, once
I see an actual user of this waitqueue stuff. There are a few other places
that need fixing up like destroydomain, too.

I don't know what you mean by 'vcpu is scheduled and the domain is
destroyed' causing the BUG_ON(). If a vcpu is scheduled and running then
presumably it is not on a waitqueue.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Dec 02, Keir Fraser wrote:

> Since your current patch is conceptually quite broken anyway, there is
> little point in chasing down the crash. It might have something to do with
> allocating the waitqueue_head on the local stack -- which you would never
> want to do in a correct usage of waitqueues. So, back to square one and try
> again I'm afraid.

Keir,

yesterday I sent out my patch queue for xen-unstable. I think the approach
of making the active vcpu wait in p2m_mem_paging_populate() and waking it up
in p2m_mem_paging_resume() could work.
However, something causes what looks like stack corruption.

Any idea what's going on?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 07/12/2010 09:25, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Thu, Dec 02, Keir Fraser wrote:
>
>> Since your current patch is conceptually quite broken anyway, there is
>> little point in chasing down the crash. It might have something to do with
>> allocating the waitqueue_head on the local stack -- which you would never
>> want to do in a correct usage of waitqueues. So, back to square one and try
>> again I'm afraid.
>
> Keir,
>
> yesterday I sent out my patch queue for xen-unstable. I think the approach
> of making the active vcpu wait in p2m_mem_paging_populate() and waking it up
> in p2m_mem_paging_resume() could work.
> However, something causes what looks like stack corruption.
>
> Any idea what's going on?

No, I did some unit testing of the waitqueue stuff and it worked for me.
Perhaps you can suggest some reproduction steps.

 K.

> Olaf
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Dec 07, Keir Fraser wrote:

> No, I did some unit testing of the waitqueue stuff and it worked for me.
> Perhaps you can suggest some reproduction steps.

The patches 1 - 13 I sent out need to be applied.

My config for a SLES11-SP1-x86_64 guest looks like this; 1 vcpu appears
to make it crash faster:

# /etc/xen/vm/sles11_0
name="sles11_0"
description="None"
uuid="756210f5-cc53-2bc6-7db2-a0cefca17c0b"
memory=1024
maxmem=1024
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
keymap="de"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'file:/abuild/vdisk-sles11_0-disk0,hda,w',
       'file:/abuild/vdisk-sles11_0-disk1,hdb,w',
       'file:/abuild/bootiso-xenpaging-sles11_0.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:e0:f1:08:15:00,bridge=br0,model=rtl8139,type=netfront', ]
stdvga=0
vnc=1
vncunused=1
extid=0
acpi=1
pae=1
serial="pty"

The guest does not get very far, so all the IO part probably does not
matter. I stop the guest in grub, then run 'xenpaging 1 -1'. With patch
#13 only the guest's pagetables get paged once the kernel is started from
grub. For me it crashes in less than 255 populate/resume cycles.

Does that help to reproduce the crash?

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:
> On Tue, Dec 07, Keir Fraser wrote:
>
>> No, I did some unit testing of the waitqueue stuff and it worked for me.
>> Perhaps you can suggest some reproduction steps.
>
> The patches 1 - 13 I sent out need to be applied.

I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
find time to have a go. I assume I should start the guest paused, attach
xenpaging, then when I unpause the guest it should crash the host straight
away pretty much?

 -- Keir

> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Dec 07, Keir Fraser wrote:

> On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > On Tue, Dec 07, Keir Fraser wrote:
> >
> >> No, I did some unit testing of the waitqueue stuff and it worked for me.
> >> Perhaps you can suggest some reproduction steps.
> >
> > The patches 1 - 13 I sent out need to be applied.
>
> I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
> find time to have a go. I assume I should start the guest paused, attach
> xenpaging, then when I unpause the guest it should crash the host straight
> away pretty much?

My testhost had hardware issues today, so I could not proceed with testing.

What I did was:

sync
xm create /etc/xen/vm/sles11_1 && xm vnc sles11_1 &
sleep 1
xenpaging 1 -1 &

Olaf

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel