Ian Jackson
2012-Nov-16 18:19 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu Jinsong writes ("[PATCH V3] X86/vMCE: handle broken page with regard to migration"):
> This is V3 patch, adding handle for vMCE occur at last iteration of migration:

I think you should take off my ack, having added significant new code
in the tools section.

About that code:

> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */
> +            if ( last_iter )
> +            {
> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +
> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +
> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }

Is this really the best way of doing this?  Isn't there some single
flag the hypervisor sets somewhere?

Ian.
Liu, Jinsong
2012-Nov-16 18:31 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:
> Liu Jinsong writes ("[PATCH V3] X86/vMCE: handle broken page with
> regard to migration"):
>> This is V3 patch, adding handle for vMCE occur at last iteration of
>> migration:
>
> I think you should take off my ack, having added significant new code
> in the tools section.

OK.

> About that code:
>
>> +            /*
>> +             * if vMCE occur at last iter, do one more iter so that it get
>> +             * chance to transfer broken page's pfn_type and pfn number to
>> +             * target and then take appropriate action
>> +             */
>> +            if ( last_iter )
>> +            {
>> +                for ( j = 0; j < batch; j++ )
>> +                {
>> +                    if ( hvm )
>> +                        pfn_type[j] = pfn_batch[j];
>> +                    else
>> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
>> +                }
>> +
>> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
>> +                {
>> +                    PERROR("get_pfn_type_batch failed");
>> +                    goto out;
>> +                }
>> +
>> +                broken_page_num2 = 0;
>> +                for ( j = 0; j < batch; j++ )
>> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
>> +                        broken_page_num2++;
>> +
>> +                if ( broken_page_num1 < broken_page_num2 )
>> +                    last_iter = 0;
>> +            }
>
> Is this really the best way of doing this?  Isn't there some single
> flag the hypervisor sets somewhere?
>
> Ian.

That would involve adding a new hypercall, and notifying the hypervisor
whether the tools side is at the last iter round. This patch just re-uses
the existing mechanism.

Thanks,
Jinsong
Liu Jinsong
2012-Nov-17 02:04 UTC
[PATCH V3] X86/vMCE: handle broken page with regard to migration
Generally, there are 2 cases:
1. broken page occurs before migration
2. broken page occurs during migration

We had submitted 2 versions of patches. Their difference is how to handle case 2:
V1 patch aborted migration for the case 'broken page occurs during migration';
V2 patch marked the broken page in the dirty bitmap so it got handled the next round, with no abort;
This is V3 patch, adding handling for vMCE occurring at the last iteration of migration:

At the sender:
For case 1, the broken page will be mapped but not copied to the target
(otherwise it may trigger a more serious error, say, an SRAR error), while its
pfn_type and pfn number will be transferred to the target so that the target
can take appropriate action.
For case 2, the mce handler marks the broken page in the dirty bitmap, so that
at the copypages stage of migration, its pfn_type and pfn number will be
transferred to the target, which then takes appropriate action.

At the target:
When the migration target populates pages for the guest, in the case of a broken
page wrt migration, we prefer to keep the type of the page, for the sake of
seamless migration. The target will set the p2m entry to p2m_ram_broken for the
broken page. Guest MCE may have a good chance to handle its broken page, while if
the guest accesses the broken page again it will kill itself as expected.

If vMCE occurs at the last memory copy iteration of migration, we do one more
iteration so that the broken page's pfn_type and pfn number have a chance to be
transferred to the target.
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com> - for tools part of the patch
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com> - for hypervisor part of the patch
Acked-by: Jan Beulich <JBeulich@suse.com>

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain.c	Sat Nov 17 09:46:05 2012 +0800
@@ -283,6 +283,22 @@
     return ret;
 }
 
+/* set broken page p2m */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn)
+{
+    int ret;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_set_broken_page_p2m;
+    domctl.domain = (domid_t)domid;
+    domctl.u.set_broken_page_p2m.pfn = pfn;
+    ret = do_domctl(xch, &domctl);
+
+    return ret ? -1 : 0;
+}
+
 /* get info from hvm guest for save */
 int xc_domain_hvm_getcontext(xc_interface *xch,
                              uint32_t domid,

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_restore.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1023,9 +1023,15 @@
 
     countpages = count;
     for (i = oldcount; i < buf->nr_pages; ++i)
-        if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XTAB
-            ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XALLOC)
+    {
+        unsigned long pagetype;
+
+        pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
+        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB ||
+             pagetype == XEN_DOMCTL_PFINFO_BROKEN ||
+             pagetype == XEN_DOMCTL_PFINFO_XALLOC )
             --countpages;
+    }
 
     if (!countpages)
         return count;
@@ -1267,6 +1273,17 @@
                 /* a bogus/unmapped/allocate-only page: skip it */
                 continue;
 
+            if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
+            {
+                if ( xc_set_broken_page_p2m(xch, dom, pfn) )
+                {
+                    ERROR("Set p2m for broken page failed, "
+                          "dom=%d, pfn=%lx\n", dom, pfn);
+                    goto err_mapped;
+                }
+                continue;
+            }
+
             if (pfn_err[i])
             {
                 ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx p2m_mfn %lx",

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_save.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1118,7 +1118,7 @@
     /* Now write out each data page, canonicalising page tables as we go... */
     for ( ; ; )
     {
-        unsigned int N, batch, run;
+        unsigned int N, batch, run, broken_page_num1, broken_page_num2;
         char reportbuf[80];
 
         snprintf(reportbuf, sizeof(reportbuf),
@@ -1270,6 +1270,7 @@
                 goto out;
             }
 
+            broken_page_num1 = 0;
             for ( run = j = 0; j < batch; j++ )
             {
                 unsigned long gmfn = pfn_batch[j];
@@ -1277,6 +1278,14 @@
                 if ( !hvm )
                     gmfn = pfn_to_mfn(gmfn);
 
+                if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                {
+                    pfn_type[j] |= pfn_batch[j];
+                    ++broken_page_num1;
+                    ++run;
+                    continue;
+                }
+
                 if ( pfn_err[j] )
                 {
                     if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
@@ -1371,8 +1380,12 @@
                 }
             }
 
-            /* skip pages that aren't present or are alloc-only */
+            /*
+             * skip pages that aren't present,
+             * or are broken, or are alloc-only
+             */
             if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
+                 || pagetype == XEN_DOMCTL_PFINFO_BROKEN
                  || pagetype == XEN_DOMCTL_PFINFO_XALLOC )
                 continue;
 
@@ -1484,6 +1497,35 @@
 
             munmap(region_base, batch*PAGE_SIZE);
 
+            /*
+             * if vMCE occur at last iter, do one more iter so that it get
+             * chance to transfer broken page's pfn_type and pfn number to
+             * target and then take appropriate action
+             */
+            if ( last_iter )
+            {
+                for ( j = 0; j < batch; j++ )
+                {
+                    if ( hvm )
+                        pfn_type[j] = pfn_batch[j];
+                    else
+                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
+                }
+
+                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
+                {
+                    PERROR("get_pfn_type_batch failed");
+                    goto out;
+                }
+
+                broken_page_num2 = 0;
+                for ( j = 0; j < batch; j++ )
+                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                        broken_page_num2++;
+
+                if ( broken_page_num1 < broken_page_num2 )
+                    last_iter = 0;
+            }
         } /* end of this while loop for this iteration */
 
       skip:
@@ -1550,23 +1592,22 @@
                     PERROR("Error when writing to state file (tsc)");
                     goto out;
                 }
+            }
+        }
+        else
+            last_iter = 1;
 
+        if ( xc_shadow_control(xch, dom,
+                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
+                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
+        {
+            PERROR("Error flushing shadow PT");
+            goto out;
+        }
 
-            }
+        sent_last_iter = sent_this_iter;
 
-            if ( xc_shadow_control(xch, dom,
-                                   XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
-                                   dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
-            {
-                PERROR("Error flushing shadow PT");
-                goto out;
-            }
-
-            sent_last_iter = sent_this_iter;
-
-            print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
-
-        }
+        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
     } /* end of infinite for loop */
 
     DPRINTF("All memory is saved\n");

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xenctrl.h	Sat Nov 17 09:46:05 2012 +0800
@@ -575,6 +575,17 @@
                              xc_domaininfo_t *info);
 
 /**
+ * This function set p2m for broken page
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid the domain id which broken page belong to
+ * @parm pfn the pfn number of the broken page
+ * @return 0 on success, -1 on failure
+ */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn);
+
+/**
  * This function returns information about the context of a hvm domain
  * @parm xch a handle to an open hypervisor interface
  * @parm domid the domain to get information from

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/arch/x86/cpu/mcheck/mcaction.c
--- a/xen/arch/x86/cpu/mcheck/mcaction.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1,5 +1,6 @@
 #include <xen/types.h>
 #include <xen/sched.h>
+#include <asm/p2m.h>
 #include "mcaction.h"
 #include "vmce.h"
 #include "mce.h"
@@ -91,6 +92,24 @@
                         goto vmce_failed;
                     }
 
+                if ( is_hvm_domain(d) && !d->arch.hvm_domain.dirty_vram &&
+                     paging_mode_log_dirty(d) )
+                {
+                    /*
+                     * vMCE occur during migration
+                     *
+                     * At sender, it marks broken page to dirty bitmap,
+                     * so that at copypages stage of migration, broken
+                     * page's pfn_type and pfn number would be transferred
+                     * to target and then take appropriate action.
+                     *
+                     * At target, it would set p2m as p2m_ram_broken for
+                     * broken page, so that if guest access the broken page
+                     * again, it would kill itself as expected.
+                     */
+                    paging_mark_dirty(d, mfn);
+                }
+
                 if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
                 {
                     printk("Unmap broken memory %lx for DOM%d failed\n",

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/arch/x86/domctl.c
--- a/xen/arch/x86/domctl.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/domctl.c	Sat Nov 17 09:46:05 2012 +0800
@@ -209,12 +209,18 @@
             for ( j = 0; j < k; j++ )
             {
                 unsigned long type = 0;
+                p2m_type_t t;
 
-                page = get_page_from_gfn(d, arr[j], NULL, P2M_ALLOC);
+                page = get_page_from_gfn(d, arr[j], &t, P2M_ALLOC);
 
                 if ( unlikely(!page) ||
                      unlikely(is_xen_heap_page(page)) )
-                    type = XEN_DOMCTL_PFINFO_XTAB;
+                {
+                    if ( p2m_is_broken(t) )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
+                    else
+                        type = XEN_DOMCTL_PFINFO_XTAB;
+                }
                 else
                 {
                     switch( page->u.inuse.type_info & PGT_type_mask )
@@ -235,6 +241,9 @@
 
                     if ( page->u.inuse.type_info & PGT_pinned )
                         type |= XEN_DOMCTL_PFINFO_LPINTAB;
+
+                    if ( page->count_info & PGC_broken )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
                 }
 
                 if ( page )
@@ -1568,6 +1577,29 @@
     }
     break;
 
+    case XEN_DOMCTL_set_broken_page_p2m:
+    {
+        struct domain *d;
+
+        d = rcu_lock_domain_by_id(domctl->domain);
+        if ( d != NULL )
+        {
+            p2m_type_t pt;
+            unsigned long pfn = domctl->u.set_broken_page_p2m.pfn;
+            mfn_t mfn = get_gfn_query(d, pfn, &pt);
+
+            if ( unlikely(!mfn_valid(mfn_x(mfn)) || !p2m_is_ram(pt) ||
+                          (p2m_change_type(d, pfn, pt, p2m_ram_broken) != pt)) )
+                ret = -EINVAL;
+
+            put_gfn(d, pfn);
+            rcu_unlock_domain(d);
+        }
+        else
+            ret = -ESRCH;
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, u_domctl);
         break;

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/include/public/domctl.h	Sat Nov 17 09:46:05 2012 +0800
@@ -136,6 +136,7 @@
 #define XEN_DOMCTL_PFINFO_LPINTAB (0x1U<<31)
 #define XEN_DOMCTL_PFINFO_XTAB    (0xfU<<28) /* invalid page */
 #define XEN_DOMCTL_PFINFO_XALLOC  (0xeU<<28) /* allocate-only page */
+#define XEN_DOMCTL_PFINFO_BROKEN  (0xdU<<28) /* broken page */
 #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
 
 struct xen_domctl_getpageframeinfo {
@@ -834,6 +835,12 @@
 typedef struct xen_domctl_set_access_required xen_domctl_set_access_required_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_access_required_t);
 
+struct xen_domctl_set_broken_page_p2m {
+    uint64_aligned_t pfn;
+};
+typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -899,6 +906,7 @@
 #define XEN_DOMCTL_set_access_required           64
 #define XEN_DOMCTL_audit_p2m                     65
 #define XEN_DOMCTL_set_virq_handler              66
+#define XEN_DOMCTL_set_broken_page_p2m           67
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -954,6 +962,7 @@
         struct xen_domctl_audit_p2m         audit_p2m;
         struct xen_domctl_set_virq_handler  set_virq_handler;
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
+        struct xen_domctl_set_broken_page_p2m set_broken_page_p2m;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         uint8_t                             pad[128];
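[Editor's note] The last-iteration re-check in the xc_domain_save.c hunk above is easier to follow in isolation. The sketch below re-implements that decision as standalone functions, using a hypothetical PFINFO_BROKEN constant and plain arrays in place of the libxc types and xc_get_pfn_type_batch(); it illustrates the logic, it is not the actual libxc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for XEN_DOMCTL_PFINFO_BROKEN. */
#define PFINFO_BROKEN 0xdU

/* Count how many entries in a freshly queried pfn_type batch are broken. */
static unsigned int count_broken(const unsigned int *pfn_type, size_t batch)
{
    unsigned int n = 0;
    for (size_t j = 0; j < batch; j++)
        if (pfn_type[j] == PFINFO_BROKEN)
            n++;
    return n;
}

/*
 * Decide whether the sender must run one more copy iteration: if more
 * pages are broken after the batch was sent (broken_after, from a fresh
 * type query) than were broken before it (broken_before), a vMCE landed
 * during the last iteration and the new broken page's pfn_type still has
 * to reach the target.  Returns the updated last_iter flag.
 */
static int recheck_last_iter(int last_iter,
                             unsigned int broken_before,
                             const unsigned int *pfn_type, size_t batch)
{
    if (last_iter && broken_before < count_broken(pfn_type, batch))
        return 0;   /* not the last iteration after all: go round again */
    return last_iter;
}
```

Usage: after the final memory pass, the saver would call `recheck_last_iter(last_iter, broken_page_num1, pfn_type, batch)` and loop once more if it returns 0.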
Ian Campbell
2012-Nov-19 09:55 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Fri, 2012-11-16 at 18:31 +0000, Liu, Jinsong wrote:
> >> +            /*
> >> +             * if vMCE occur at last iter, do one more iter so that it get
> >> +             * chance to transfer broken page's pfn_type and pfn number to
> >> +             * target and then take appropriate action
[...]
> >> +            }
> >
> > Is this really the best way of doing this ? Isn't there some single
> > flag the hypervisor sets somewhere ?

> That will involve adding new hypercall, and notifying hypervisor
> whether tools side is at the last iter round.

If we get to this stage then haven't we either already sent something
over the wire for this page or marked it as dirty when we tried and
failed to send it?

In the former case we don't care that the page is now broken on the
source since the target has got a good pre-breakage copy.

In the latter case could we not set a flag at the same time as we mark
the page dirty which means "go round at least one more time"?

Ian.
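[Editor's note] Ian's alternative can be sketched as follows. Neither the flag nor these helpers exist in Xen; this is only a hypothetical illustration of the control flow that a "set a flag when marking the page dirty" design would give, replacing the counting loops in the patch.

```c
#include <assert.h>

/* Hypothetical per-save state; nothing like this exists in the tree. */
struct save_state {
    int last_iter;             /* saver believes this is the final pass */
    int broken_page_pending;   /* set on the (hypothetical) MCE path    */
};

/* What the MCE handler would do alongside paging_mark_dirty(d, mfn). */
static void mce_mark_broken(struct save_state *s)
{
    s->broken_page_pending = 1;
}

/* Checked by the save loop at the end of each iteration. */
static void end_of_iteration(struct save_state *s)
{
    if (s->last_iter && s->broken_page_pending) {
        s->broken_page_pending = 0;
        s->last_iter = 0;      /* "go round at least one more time" */
    }
}
```

The trade-off the thread discusses: this avoids the extra xc_get_pfn_type_batch() query per batch, but needs a new way for the hypervisor-side MCE path to expose the flag to the tools.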
George Dunlap
2012-Nov-19 15:29 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 19/11/12 09:55, Ian Campbell wrote:
> If we get to this stage then haven't we either already sent something
> over the wire for this page or marked it as dirty when we tried and
> failed to send it?
>
> In the former case we don't care that the page is now broken on the
> source since the target has got a good pre-breakage copy.
>
> In the latter case could we not set a flag at the same time as we mark
> the page dirty which means "go round at least one more time"?

Yeah -- on the last iteration, the VM itself has to be paused; if any
pages get broken after that, it doesn't really matter, does it? The real
thing is to have a consistent "snapshot" of behavior.

I guess the one potentially tricky case to worry about is whether to
deliver an MCE to the guest on restore. Consider the following scenario:

- Page A is modified (and marked dirty)
- VM paused for last iteration
- Page breaks, is marked broken in the p2m
- Save code sends page A

In that case, the save code would send a "broken" page, and the restore
code would mark a page as broken, and we *would* want to deliver an MCE
on the far side. But suppose the last two steps were reversed:

- Page A modified
- VM paused for last iteration
- Save code sends page A
- Page breaks, marked broken in the p2m

In that case, when the save code sends page A, it will send a good page;
there's no need to mark it broken, or to send the guest an MCE.

Am I understanding the situation correctly, Jinsong?

-George
Ian Campbell
2012-Nov-19 16:57 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
> On 19/11/12 09:55, Ian Campbell wrote:
> > If we get to this stage then haven't we either already sent something
> > over the wire for this page or marked it as dirty when we tried and
> > failed to send it?
> >
> > In the former case we don't care that the page is now broken on the
> > source since the target has got a good pre-breakage copy.
> >
> > In the latter case could we not set a flag at the same time as we mark
> > the page dirty which means "go round at least one more time"?
>
> Yeah -- on the last iteration, the VM itself has to be paused; if any
> pages get broken after that, it doesn't really matter, does it? The real
> thing is to have a consistent "snapshot" of behavior.
>
> I guess the one potentially tricky case to worry about is whether to
> deliver an MCE to the guest on restore. Consider the following scenario:
>
> - Page A is modified (and marked dirty)
> - VM paused for last iteration
> - Page breaks, is marked broken in the p2m
> - Save code sends page A
>
> In that case, the save code would send a "broken" page, and the restore
> code would mark a page as broken, and we *would* want to deliver an MCE
> on the far side. But suppose the last two steps were reversed:
>
> - Page A modified
> - VM paused for last iteration
> - Save code sends page A
> - Page breaks, marked broken in the p2m
>
> In that case, when the save code sends page A, it will send a good page;
> there's no need to mark it broken, or to send the guest an MCE.

I guess you'd want to err on the side of stopping using a good page, as
opposed to continuing to use a bad page? i.e. it's better to take a
spurious vMCE than to not take an actual one.

I'm not actually sure what a guest does with a vMCE. I guess it does
some sort of memory exchange to give the bad page back to the h/v and
get a good page in return? If the hypervisor thinks the old page is ok
rather than bad, I guess it'll just put it in the free list instead of
the bad list?

> Am I understanding the situation correctly, Jinsong?
>
> -George
George Dunlap
2012-Nov-20 15:08 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 19/11/12 16:57, Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent something
>>> over the wire for this page or marked it as dirty when we tried and
>>> failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we mark
>>> the page dirty which means "go round at least one more time"?
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The real
>> thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore. Consider the following scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the restore
>> code would mark a page as broken, and we *would* want to deliver an MCE
>> on the far side. But suppose the last two steps were reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good page;
>> there's no need to mark it broken, or to send the guest an MCE.
> I guess you'd want to err on the side of stopping using a good page, as
> opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.

While that's true, taking a spurious MCE means at the very least one less
page available to the guest to use (for HVM guests that haven't ballooned
down, at least), and the unnecessary loss of the data in that page.

The problem I guess is that the save code at the moment has no way of
distinguishing the following cases:
1. Marked broken after the last time I sent it, but before the VM was
   paused; but the page hasn't been written to
2. Marked broken after the VM was paused; page hadn't been written to
3. Marked broken after the VM was paused, but the page had been written to

In case 1, we definitely need to send a broken page; but the VM may have
already received a vMCE. In case 2, we don't need to send a broken page
or a vMCE, while in #3 we need to do both.

On the other hand, the whole situation is hopefully rare enough that
maybe we can just do the simple correct thing, even if it's a tiny bit
sub-optimal. In that case, assuming that spurious vMCEs aren't a problem
(e.g., #1), I think we basically just need to see if the last iteration
contains a broken page, and if so, send the guest a vMCE on resume.

Thoughts?

> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

Yes, I'm pretty sure the hypervisor's accounting of broken pages is
separate from guest p2m entries; I think if you mark a p2m entry broken,
the hypervisor will just free the ram page that was mapped there before.

I think the guest just tries to recover gracefully when it gets a vMCE
(e.g., by re-reading the page from disk or killing the process). I don't
think it asks the hypervisor for another page to replace it at this point.

-George
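[Editor's note] George's three cases can be captured in a small decision function. The bookkeeping below (when the page broke, whether it was written since the last send) is hypothetical; as the email notes, the save code has no way to obtain it, which is exactly the problem being discussed.

```c
#include <assert.h>

/* When the page was marked broken, relative to the final-iteration pause. */
enum broke_when { BEFORE_PAUSE, AFTER_PAUSE };

/* Hypothetical per-page bookkeeping the save code does not actually have. */
struct page_state {
    enum broke_when broke_at;
    int written_since_last_send;   /* was the page dirty when it broke? */
};

/*
 * Case 1: broke before the pause, clean  -> must resend as broken
 * Case 2: broke after the pause, clean   -> target already has a good copy
 * Case 3: broke after the pause, dirty   -> must send broken page + vMCE
 */
static int must_send_broken(const struct page_state *p)
{
    if (p->broke_at == BEFORE_PAUSE)
        return 1;                          /* case 1 */
    return p->written_since_last_send;     /* case 2 (0) vs case 3 (1) */
}
```

Because the real saver cannot distinguish case 2 from the others, the thread's conclusion is to accept an occasional spurious broken-page record (and vMCE) rather than add this tracking.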
Liu, Jinsong
2012-Nov-20 16:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> On Fri, 2012-11-16 at 18:31 +0000, Liu, Jinsong wrote:
>>>> +            /*
>>>> +             * if vMCE occur at last iter, do one more iter so that it get
>>>> +             * chance to transfer broken page's pfn_type and pfn number to
>>>> +             * target and then take appropriate action
[...]
>>>> +            }
>>>
>>> Is this really the best way of doing this ? Isn't there some single
>>> flag the hypervisor sets somewhere ?
>
>> That will involve adding new hypercall, and notifying hypervisor
>> whether tools side is at the last iter round.
>
> If we get to this stage then haven't we either already sent something
> over the wire for this page or marked it as dirty when we tried and
> failed to send it?
>
> In the former case we don't care that the page is now broken on the
> source since the target has got a good pre-breakage copy.

Yes, and theoretically it's better, if only the tools save code and the
hypervisor vmce code could *easily* know each other's status, like:
* whether vmce occurred at the last iteration
* whether the broken page is in the dirty bitmap at the last iteration

> In the latter case could we not set a flag at the same time as we mark
> the page dirty which means "go round at least one more time"?
>
> Ian.

That would involve adding new hypercalls for this corner case. IMO it's
better to keep things simple for the corner case and re-use the existing
mechanism.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-20 16:29 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 19/11/12 09:55, Ian Campbell wrote:
>> If we get to this stage then haven't we either already sent something
>> over the wire for this page or marked it as dirty when we tried and
>> failed to send it?
>>
>> In the former case we don't care that the page is now broken on the
>> source since the target has got a good pre-breakage copy.
>>
>> In the latter case could we not set a flag at the same time as we
>> mark the page dirty which means "go round at least one more time"?
>
> Yeah -- on the last iteration, the VM itself has to be paused; if any
> pages get broken after that, it doesn't really matter, does it? The

If the page (which breaks at the last iter) is in the dirty bitmap of
the last iter, it does matter, even though the VM is frozen at the last
iter.

> real thing is to have a consistent "snapshot" of behavior.
>
> I guess the one potentially tricky case to worry about is whether to
> deliver an MCE to the guest on restore. Consider the following
> scenario:
>
> - Page A is modified (and marked dirty)
> - VM paused for last iteration
> - Page breaks, is marked broken in the p2m
> - Save code sends page A
>
> In that case, the save code would send a "broken" page, and the
> restore code would mark a page as broken, and we *would* want to
> deliver an MCE on the far side. But suppose the last two steps were
> reversed:
>
> - Page A modified
> - VM paused for last iteration
> - Save code sends page A
> - Page breaks, marked broken in the p2m
>
> In that case, when the save code sends page A, it will send a good
> page; there's no need to mark it broken, or to send the guest an MCE.
>
> Am I understanding the situation correctly, Jinsong?
>
> -George

Yes, if we don't consider code complication :-)
Liu, Jinsong
2012-Nov-20 16:43 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent
>>> something over the wire for this page or marked it as dirty when we
>>> tried and failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we
>>> mark the page dirty which means "go round at least one more time"?
>>
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The
>> real thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore. Consider the following
>> scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the
>> restore code would mark a page as broken, and we *would* want to
>> deliver an MCE on the far side. But suppose the last two steps were
>> reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good
>> page; there's no need to mark it broken, or to send the guest an MCE.
>
> I guess you'd want to err on the side of stopping using a good page,
> as opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.
>
> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

The broken page is a h/w bad page, meaning its content is corrupted and
cannot be corrected by h/w circuits (say, ECC). As for vMCE and the
broken page, guest and h/v are agnostic of each other, so I don't think
the h/v should have a chance to do anything except inject vMCE# to the
guest. The guest handles vMCE# exactly as on native hardware: it may
simply ignore the vMCE, kill the process and isolate the broken page if
possible (in its own range), or kill itself if not possible, depending
on what the error is.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-20 17:08 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 19/11/12 16:57, Ian Campbell wrote:
>> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>>> On 19/11/12 09:55, Ian Campbell wrote:
>>>> If we get to this stage then haven't we either already sent
>>>> something over the wire for this page or marked it as dirty when
>>>> we tried and failed to send it?
>>>>
>>>> In the former case we don't care that the page is now broken on the
>>>> source since the target has got a good pre-breakage copy.
>>>>
>>>> In the latter case could we not set a flag at the same time as we
>>>> mark the page dirty which means "go round at least one more time"?
>>> Yeah -- on the last iteration, the VM itself has to be paused; if
>>> any pages get broken after that, it doesn't really matter, does it?
>>> The real thing is to have a consistent "snapshot" of behavior.
>>>
>>> I guess the one potentially tricky case to worry about is whether to
>>> deliver an MCE to the guest on restore. Consider the following
>>> scenario:
>>>
>>> - Page A is modified (and marked dirty)
>>> - VM paused for last iteration
>>> - Page breaks, is marked broken in the p2m
>>> - Save code sends page A
>>>
>>> In that case, the save code would send a "broken" page, and the
>>> restore code would mark a page as broken, and we *would* want to
>>> deliver an MCE on the far side. But suppose the last two steps
>>> were reversed:
>>>
>>> - Page A modified
>>> - VM paused for last iteration
>>> - Save code sends page A
>>> - Page breaks, marked broken in the p2m
>>>
>>> In that case, when the save code sends page A, it will send a good
>>> page; there's no need to mark it broken, or to send the guest an
>>> MCE.
>> I guess you'd want to err on the side of stopping using a good page,
>> as opposed to continuing to use a bad page? i.e. it's better to take a
>> spurious vMCE than to not take an actual one.
>
> While that's true, taking a spurious MCE means at the very least one
> less page available to the guest to use (for HVM guests that haven't
> ballooned down, at least), and the unnecessary loss of the data in
> that page.
>
> The problem I guess is that the save code at the moment has no way of
> distinguishing the following cases:
> 1. Marked broken after the last time I sent it, but before the VM was
>    paused; but the page hasn't been written to
> 2. Marked broken after the VM was paused; page hadn't been written to
> 3. Marked broken after the VM was paused, but the page had been
>    written to
>
> In case 1, we definitely need to send a broken page; but the VM may
> have already received a vMCE. In case 2, we don't need to send a
> broken page or a vMCE, while in #3 we need to do both.

Yes, and for case 3, strictly speaking there is still a chance not to
send the broken page to the target (and the target would happily run on
'good' pfn_type and a 'good' page), but it is too complicated to
distinguish.

> On the other hand, the whole situation is hopefully rare enough that
> maybe we can just do the simple correct thing, even if it's a tiny bit
> sub-optimal. In that case, assuming that spurious vMCEs aren't a
> problem (e.g., #1), I think we basically just need to see if the last
> iteration contains a broken page, and if so, send the guest a vMCE on
> resume.
>
> Thoughts?

Yes, that's exactly what the V3 patch did.

Thanks,
Jinsong
George Dunlap
2012-Nov-20 17:23 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 20/11/12 17:08, Liu, Jinsong wrote:
>> On the other hand, the whole situation is hopefully rare enough that
>> maybe we can just do the simple correct thing, even if it's a tiny bit
>> sub-optimal. In that case, assuming that spurious vMCEs aren't a
>> problem (e.g., #1), I think we basically just need to see if the last
>> iteration contains a broken page, and if so, send the guest a vMCE on
>> resume.
>>
>> Thoughts?
>
> Yes, that's exactly V3 patch did.

Right -- but you did it by adding an extra couple of loops, which made
us wonder if there was a more efficient way to do the same thing. Now
that we know *what* needs to be done, we're in a better position to
think about other options. Let me take a look...

-George
George Dunlap
2012-Nov-20 17:48 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com> wrote:

> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */
> +            if ( last_iter )
> +            {
> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +
> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +
> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }

I guess I still don't see the point of this loop. The pages this loop will catch are pages in this scenario:
- Page A marked dirty
- Last iteration begins; vm paused
- Page A sent
- Page A marked broken
- Last iteration ends

But in this case, it doesn't really matter that page A is clean on the receiving side; it's already got a clean working copy with the latest data, right?

>         } /* end of this while loop for this iteration */
>
>       skip:
> @@ -1550,23 +1592,22 @@
>                     PERROR("Error when writing to state file (tsc)");
>                     goto out;
>                 }
> +            }
> +        }
> +        else
> +            last_iter = 1;
>
> +        if ( xc_shadow_control(xch, dom,
> +                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
> +                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
> +        {
> +            PERROR("Error flushing shadow PT");
> +            goto out;
> +        }
>
> -        }
> +        sent_last_iter = sent_this_iter;
>
> -        if ( xc_shadow_control(xch, dom,
> -                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
> -                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
> -        {
> -            PERROR("Error flushing shadow PT");
> -            goto out;
> -        }
> -
> -        sent_last_iter = sent_this_iter;
> -
> -        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
> -
> -    }
> +    print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);

And what's all this stuff about? Why force a second sweep when we're saving to disk?

-George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Liu, Jinsong
2012-Nov-20 17:49 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:> On 20/11/12 17:08, Liu, Jinsong wrote: >>> On the other hand, the whole situation is hopefully rare enough that >>> maybe we can just do the simple correct thing, even if it's a tiny >>> bit sub-optimal. In that case, assuming that spurious vMCEs aren't >>> a problem (e.g., #1), I think we basically just need to see if the >>> last iteration contains a broken page, and if so, send the guest a >>> vMCE on resume. >>> >>> Thoughts? >> >> Yes, that's exactly what the V3 patch does. > > Right -- but you did it by adding an extra couple of loops, which made > us wonder if there was a more efficient way to do the same thing. Now > that we know *what* needs to be done, we're in a better position to > think about other options. Let me take a look... > > -George

Wow, thanks!
Jinsong
Liu, Jinsong
2012-Nov-20 18:13 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 1:48 AM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

+            /*
+             * if vMCE occur at last iter, do one more iter so that it get
+             * chance to transfer broken page's pfn_type and pfn number to
+             * target and then take appropriate action
+             */
+            if ( last_iter )
+            {
+                for ( j = 0; j < batch; j++ )
+                {
+                    if ( hvm )
+                        pfn_type[j] = pfn_batch[j];
+                    else
+                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
+                }
+
+                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
+                {
+                    PERROR("get_pfn_type_batch failed");
+                    goto out;
+                }
+
+                broken_page_num2 = 0;
+                for ( j = 0; j < batch; j++ )
+                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                        broken_page_num2++;
+
+                if ( broken_page_num1 < broken_page_num2 )
+                    last_iter = 0;
+            }

I guess I still don't see the point of this loop. The pages this loop will catch are pages in this scenario:
- Page A marked dirty
- Last iteration begins; vm paused
- Page A sent
- Page A marked broken
- Last iteration ends

[Jinsong] No, at the last iter, there are 4 points:
1. start last iter
2. get and transfer pfn_type to target
3. copy page to target
4. end last iter
This code monitors whether a vMCE occurred between point 2 and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2).

But in this case, it doesn't really matter that page A is clean on the receiving side; it's already got a clean working copy with the latest data, right?

        } /* end of this while loop for this iteration */

      skip:
@@ -1550,23 +1592,22 @@
                    PERROR("Error when writing to state file (tsc)");
                    goto out;
                }
+            }
+        }
+        else
+            last_iter = 1;

+        if ( xc_shadow_control(xch, dom,
+                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
+                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
+        {
+            PERROR("Error flushing shadow PT");
+            goto out;
+        }

-        }
+        sent_last_iter = sent_this_iter;

-        if ( xc_shadow_control(xch, dom,
-                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
-                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
-        {
-            PERROR("Error flushing shadow PT");
-            goto out;
-        }
-
-        sent_last_iter = sent_this_iter;
-
-        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
-
-    }
+    print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);

And what's all this stuff about? Why force a second sweep when we're saving to disk?

[Jinsong] If not, saving to disk would loop endlessly, since it never gets a chance to set last_iter = 1.
Ian Jackson
2012-Nov-20 18:21 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> No, at the last iter, there are 4 points: > 1. start last iter > 2. get and transfer pfn_type to target > 3. copy page to target > 4. end last iter > > this code monitors whether a vMCE occurred between point 2 and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2)

The migration code is rather tangled, but perhaps it would be best to check for an MCE after step 3 for each page, and keep a separate list of the pages which broke in that pass?

Ian.
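Ian's "separate list" idea might be sketched roughly as follows. All names here are invented for illustration (the real save loop lives in tools/libxc/xc_domain_save.c, and `PFINFO_BROKEN` stands in for `XEN_DOMCTL_PFINFO_BROKEN`); this is a sketch of the approach, not the actual libxc code:

```c
#include <assert.h>
#include <stddef.h>

#define PFINFO_BROKEN 0x1UL   /* stand-in for XEN_DOMCTL_PFINFO_BROKEN */

/* After the pages of a batch have been copied (step 3), re-read their
 * types and record only the pfns that turned BROKEN during the pass;
 * just those pfns would then need their pfn_type resent to the target. */
static size_t collect_newly_broken(const unsigned long *pfn_batch,
                                   const unsigned long *type_after_send,
                                   size_t batch,
                                   unsigned long *broken_list)
{
    size_t n = 0;

    for ( size_t j = 0; j < batch; j++ )
        if ( type_after_send[j] == PFINFO_BROKEN )
            broken_list[n++] = pfn_batch[j];

    return n;   /* number of pfns whose BROKEN type still needs sending */
}
```

The trade-off discussed later in the thread is that populating such a list still costs a loop over the batch, so it is not obviously cheaper than the patch's counter comparison.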
Liu, Jinsong
2012-Nov-20 18:39 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > broken page with regard to migration"): >> No, at the last iter, there are 4 points: >> 1. start last iter >> 2. get and transfer pfn_type to target >> 3. copy page to target >> 4. end last iter >> >> this code monitors whether a vMCE occurred between point 2 >> and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2) > > The migration code is rather tangled but perhaps it would be best to > check for an MCE after step 3 for each page, and keep a separate list > of the pages which broke in that pass? > > Ian.

It does check for an MCE after point 3 for each page, but what's the advantage of keeping a separate list?

Thanks,
Jinsong
Ian Jackson
2012-Nov-20 18:42 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> Ian Jackson wrote: > > Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > > broken page with regard to migration"): > >> No, at the last iter, there are 4 points: > >> 1. start last iter > >> 2. get and transfer pfn_type to target > >> 3. copy page to target > >> 4. end last iter...> It does check for an MCE after point 3 for each page, but what's the > advantage of keeping a separate list?

It avoids yet another loop over all the pages. Unless I have misunderstood. Which I may have, because: if it checks for an MCE after point 3, then surely that is sufficient? We don't need to worry about MCEs after that check.

Ian.
Liu, Jinsong
2012-Nov-20 18:54 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Well, let me elaborate more on my thinking about a broken page occurring at the last iter:

Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case:
1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page;
2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter)
    2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page;
    2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly, and the target will set the p2m as broken;
    2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case;

Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life):
- we don't distinguish whether the page is in the dirty bitmap of the last iter;
- we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall;
- if a vMCE occurs at the last iter, we do 1~2 more iters.

This is what the V3 patch does now. Thoughts?

Thanks,
Jinsong
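The simple rule described above reduces to a comparison of two counters; a minimal model follows, with invented names standing in for the libxc code (`broken_before` is the count of BROKEN pages seen when the batch's pfn_types were sent, `broken_after` the count after the page copies):

```c
#include <assert.h>

/* Minimal model of the V3 decision: if any page broke between sending
 * the pfn_types and finishing the page copies, its BROKEN type never
 * reached the target, so clear last_iter and go around again.
 * Illustrative only, not the real libxc API. */
static int another_iter_needed(unsigned int broken_before,
                               unsigned int broken_after)
{
    return broken_before < broken_after;
}
```

Note the comparison is strict: a page that was already counted broken before sending had its pfn_type transferred normally, so it does not force another pass.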
Liu, Jinsong
2012-Nov-20 19:07 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > broken page with regard to migration"): >> Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> No, at the last iter, there are 4 points: >>>> 1. start last iter >>>> 2. get and transfer pfn_type to target >>>> 3. copy page to target >>>> 4. end last iter > ... >> It does check for an MCE after point 3 for each page, but what's the >> advantage of keeping a separate list? > > It avoids yet another loop over all the pages. Unless I have > misunderstood. Which I may have, because: if it checks for an MCE > after point 3 then surely that is sufficient? We don't need to worry > about MCEs after that check. > > Ian.

In fact it will only do 1~2 more iters (if a vMCE occurs at the last iter), with exactly 1 more page (the broken page) -- the VM is frozen in this case, right? As for the case where a vMCE occurs after our check, the target will very happily (and luckily) run the VM w/o the broken page.

Thanks,
Jinsong
Ian Campbell
2012-Nov-21 11:07 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote:> Well, let me elaborate more on my thinking about a broken page occurring at the last iter: > > Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case: > 1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page; > 2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter) > 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page; > 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly; > 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case; > > Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life): > - we don't distinguish whether the page is in the dirty bitmap of the last iter; > - we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall; > - if a vMCE occurs at the last iter, we do 1~2 more iters.

Can't a page break on each "last" iter? I.e., you might actually go around as many times as there are pages in the last batch?

Ian.
George Dunlap
2012-Nov-21 11:18 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 11:07, Ian Campbell wrote:> On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote: >> Well, let me elaborate more on my thinking about a broken page occurring at the last iter: >> >> Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case: >> 1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page; >> 2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter) >> 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page; >> 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly; >> 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case; >> >> Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life): >> - we don't distinguish whether the page is in the dirty bitmap of the last iter; >> - we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall; >> - if a vMCE occurs at the last iter, we do 1~2 more iters. > Can't a page break on each "last" iter? I.e., you might actually go > around as many times as there are pages in the last batch?

That would happen only if the pages break one by one, at just the right time -- which seems pretty unlikely. :-)

-George
George Dunlap
2012-Nov-21 11:34 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 20/11/12 18:42, Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): >> Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> No, at the last iter, there are 4 points: >>>> 1. start last iter >>>> 2. get and transfer pfn_type to target >>>> 3. copy page to target >>>> 4. end last iter > ... >> It does check for an MCE after point 3 for each page, but what's the >> advantage of keeping a separate list? > It avoids yet another loop over all the pages. Unless I have > misunderstood. Which I may have, because: if it checks for an MCE after > point 3 then surely that is sufficient? We don't need to worry about > MCEs after that check.

It's sufficient, but wouldn't each check require a separate hypercall? That would surely be slower than just a single hypercall and a loop (which is what Jinsong's patch does).

We don't actually need a list -- I think we just need to know, "Have any pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" If so, we do another full iteration.

Since marking a page broken will also mark it dirty, I suppose what we could do is, on the last iteration, clear the dirty bitmap after getting the list of pages but before copying them; and then check the bit in the bitmap corresponding to the pfn after copying it.

But on the whole, is it really that much *faster* to do it that way? Either way you're still adding a single hypercall to the whole thing, and one iteration of a loop for each page; but in Jinsong's patch, the loop is done all at once, so presumably the compiler / processor will be able to make better use of loop unrolling / registers / the pipeline / branch prediction / caches &c.

The only way to avoid the loop would be to add an extra "have any pages been marked broken / dirty" bit somewhere, which is probably more trouble than it's worth.

-George
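The bitmap-based alternative sketched above (clear the dirty bitmap after reading the batch, then test each pfn's bit after copying it, relying on a broken page also being marked dirty) could look roughly like this. The byte-per-8-pfns bitmap layout is an assumption for illustration, not taken from libxc:

```c
#include <assert.h>

/* Hypothetical per-pfn check: on the last iteration the dirty bitmap
 * is cleared after the batch is read, so a set bit after the copy
 * means the page was dirtied -- and a broken page is also marked
 * dirty -- between the read and the send. */
static int pfn_dirtied_mid_send(const unsigned char *dirty_bitmap,
                                unsigned long pfn)
{
    return (dirty_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
```

As the email notes, this still amounts to one check per page, so it is not obviously cheaper than the single re-read of the p2m table plus one loop that the patch already does.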
Ian Jackson
2012-Nov-21 11:55 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap writes ("Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> But on the whole, is it really that much *faster* to do it that way? > Either way you're still adding a single hypercall to the whole thing, > and one iteration of a loop for each page; but in Jinsong's patch, the > loop is done all at once, so presumably the compiler / processor will be > able to make better use of loop unrolling / registers / the pipeline > / branch prediction / caches &c. > > The only way to avoid the loop would be to add an extra "have any pages > been marked broken / dirty" bit somewhere, which is probably more > trouble than it's worth.

OK, that makes sense.

Ian.
Liu, Jinsong
2012-Nov-21 12:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:> On 21/11/12 11:07, Ian Campbell wrote: >> On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote: >>> Well, let me elaborate more on my thinking about a broken page occurring at the >>> last iter: >>> >>> Theoretically we can separate 'broken page at last iter' into >>> several sub-cases, and optimize case by case: >>> 1. if the page (which will break at the last iter) is not in the dirty >>> bitmap (of the last iter) --> doing nothing is OK; the target will happily run >>> w/o the broken page; >>> 2. if the page (which will break at the last iter) is in the dirty bitmap >>> (of the last iter) >>> 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing >>> is OK; the target will happily run w/o the broken page; >>> 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; >>> the V2 patch already handles this case correctly; >>> 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy >>> to the target --> we need to handle this case; >>> >>> Practically, considering that it's rare enough, and to avoid code complication, we >>> handle it in a simple way (not fully optimized, but enough for real >>> life): >>> - we don't distinguish whether the page is in the dirty bitmap of the last iter; >>> - we prefer not to add a new hypercall for this corner case, >>> instead we re-use an existing hypercall; >>> - if a vMCE occurs at the last iter, we do 1~2 more iters. >> Can't a page break on each "last" iter? I.e., you might actually go >> around as many times as there are pages in the last batch? > > That would happen only if the pages break one by one, at just the > right time -- which seems pretty unlikely. :-) > > -George

And even in that case it is still OK w/ V3, which keeps iterating until the last broken page.

Thanks,
Jinsong
Ian Campbell
2012-Nov-21 12:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:> On 20/11/12 18:42, Ian Jackson wrote: > > Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): > >> Ian Jackson wrote: > >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > >>> broken page with regard to migration"): > >>>> No, at the last iter, there are 4 points: > >>>> 1. start last iter > >>>> 2. get and transfer pfn_type to target > >>>> 3. copy page to target > >>>> 4. end last iter > > ... > >> It does check for an MCE after point 3 for each page, but what's the > >> advantage of keeping a separate list? > > It avoids yet another loop over all the pages. Unless I have > > misunderstood. Which I may have, because: if it checks for an MCE after > > point 3 then surely that is sufficient? We don't need to worry about > > MCEs after that check. > > It's sufficient, but wouldn't each check require a separate hypercall? > That would surely be slower than just a single hypercall and a loop > (which is what Jinsong's patch does). > > We don't actually need a list -- I think we just need to know, "Have any > pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" > If so, we do another full iteration.

If a page fails between 2. and 3. above, then what happens at point 3? I presume we can't map and send the page (since it is broken); do we get some sort of failure to map?

What happens if the failure occurs during stage 3, i.e. while the page is mapped and we are reading from it?

Ian.
George Dunlap
2012-Nov-21 12:15 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 12:11, Ian Campbell wrote:> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote: >> On 20/11/12 18:42, Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): >>>> Ian Jackson wrote: >>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>>>> broken page with regard to migration"): >>>>>> No, at the last iter, there are 4 points: >>>>>> 1. start last iter >>>>>> 2. get and transfer pfn_type to target >>>>>> 3. copy page to target >>>>>> 4. end last iter >>> ... >>>> It does check for an MCE after point 3 for each page, but what's the >>>> advantage of keeping a separate list? >>> It avoids yet another loop over all the pages. Unless I have >>> misunderstood. Which I may have, because: if it checks for an MCE after >>> point 3 then surely that is sufficient? We don't need to worry about >>> MCEs after that check. >> It's sufficient, but wouldn't each check require a separate hypercall? >> That would surely be slower than just a single hypercall and a loop >> (which is what Jinsong's patch does). >> >> We don't actually need a list -- I think we just need to know, "Have any >> pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" >> If so, we do another full iteration. > If a page fails between 2. and 3. above, then what happens at point 3? I > presume we can't map and send the page (since it is broken); do we get > some sort of failure to map?

At that point the pages are already mapped. The order is:
1. From the dirty bitmap / whatever, get a batch of pfns to map
2. Map them with foreign_batch
3. Read the p2m table
4. Send a list of PFNs (this is where it will list the broken ones)
5. Send all the data from those pfns

So JS is adding:
6. Read the p2m table again to see if anything has broken between 3 and the end of 5

> What happens if the failure occurs during stage 3, i.e. while the page > is mapped and we are reading from it?

I was rather wondering the same thing. :-)

-George
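The window that step 6 closes -- pages that were not broken at step 3 but are broken by the end of step 5 -- can also be expressed per page. A sketch with invented names (`PFINFO_BROKEN` stands in for `XEN_DOMCTL_PFINFO_BROKEN`; this is not the actual libxc code, which compares aggregate counts instead):

```c
#include <assert.h>
#include <stddef.h>

#define PFINFO_BROKEN 0x1UL   /* stand-in for XEN_DOMCTL_PFINFO_BROKEN */

/* Count pages that broke between the p2m read (step 3) and the end of
 * the data send (step 5): exactly the pages whose BROKEN type was
 * never transmitted to the target.  Illustrative only. */
static size_t broke_in_window(const unsigned long *type_at_step3,
                              const unsigned long *type_after_step5,
                              size_t batch)
{
    size_t n = 0;

    for ( size_t j = 0; j < batch; j++ )
        if ( type_at_step3[j] != PFINFO_BROKEN &&
             type_after_step5[j] == PFINFO_BROKEN )
            n++;

    return n;   /* non-zero => do another iteration */
}
```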
George Dunlap
2012-Nov-21 12:17 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
OK, so now that we seem to have agreed on the overall approach, down to "brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com> wrote:> Generally, there are 2 cases: > 1. broken page occurs before migration > 2. broken page occurs during migration >

I think we probably need to include the "last iteration" subcases here as well -- although, probably it would be better to put them in the comments. Also make sure to note that now even suspend may do a second iteration if pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the patch does, and then at the bottom have a summary of the changes; for instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to handle > case 2: > V1 patch aborted migration for the case 'broken page occurs during > migration'; > V2 patch marked broken page to dirty bitmap then got handled next round > and no abort; > > This is V3 patch, adding handle for vMCE occur at last iteration of > migration: > At the sender > For case 1, the broken page will be mapped but not copied to target > (otherwise it may trigger more serious error, say, SRAR error). > While its pfn_type and pfn number will be transferred to target > so that target take appropriate action. > > For case 2, mce handler marks the broken page to dirty bitmap, so that > at copypages stage of migration, its pfn_type and pfn number will be > transferred to target and then take appropriate action. > > At the target > When migration target populates pages for guest. As for the case > of broken page wrt migration, we prefer to keep the type of the page, > for the sake of seamless migration. > > At target it will set p2m as p2m_ram_broken for broken page. 
> Guest MCE may have good chance to handle its broken page, while if guest access > the broken page again it will kill itself as expected. > > If vMCE occur at the last memory copy iteration of migration, we do one > more > iteration so that the broken page's pfn_type and pfn number has chance to > be transferred to target. > > Suggested-by: George Dunlap <george.dunlap@eu.citrix.com> > Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > - for tools part of the patch > Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com> >

You should remove these until you get another ack (since the patch has changed materially).

> > - for hypervisor part of the patch > Acked-by: Jan Beulich <JBeulich@suse.com> >

I think if you haven't changed the HV part since his ack, you can probably leave this one.

> > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c > --- a/tools/libxc/xc_domain.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain.c Sat Nov 17 09:46:05 2012 +0800 > @@ -283,6 +283,22 @@ > return ret; > } > > +/* set broken page p2m */ > +int xc_set_broken_page_p2m(xc_interface *xch, > + uint32_t domid, > + unsigned long pfn) > +{ > + int ret; > + DECLARE_DOMCTL; > + > + domctl.cmd = XEN_DOMCTL_set_broken_page_p2m; > + domctl.domain = (domid_t)domid; > + domctl.u.set_broken_page_p2m.pfn = pfn; > + ret = do_domctl(xch, &domctl); > + > + return ret ? 
-1 : 0; > +} > + > /* get info from hvm guest for save */ > int xc_domain_hvm_getcontext(xc_interface *xch, > uint32_t domid, > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c > --- a/tools/libxc/xc_domain_restore.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain_restore.c Sat Nov 17 09:46:05 2012 +0800 > @@ -1023,9 +1023,15 @@ > > countpages = count; > for (i = oldcount; i < buf->nr_pages; ++i) > - if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) => XEN_DOMCTL_PFINFO_XTAB > - ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) => XEN_DOMCTL_PFINFO_XALLOC) > + { > + unsigned long pagetype; > + > + pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK; > + if ( pagetype == XEN_DOMCTL_PFINFO_XTAB || > + pagetype == XEN_DOMCTL_PFINFO_BROKEN || > + pagetype == XEN_DOMCTL_PFINFO_XALLOC ) > --countpages; > + } > > if (!countpages) > return count; > @@ -1267,6 +1273,17 @@ > /* a bogus/unmapped/allocate-only page: skip it */ > continue; > > + if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN ) > + { > + if ( xc_set_broken_page_p2m(xch, dom, pfn) ) > + { > + ERROR("Set p2m for broken page failed, " > + "dom=%d, pfn=%lx\n", dom, pfn); > + goto err_mapped; > + } > + continue; > + } > + > if (pfn_err[i]) > { > ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx > p2m_mfn %lx", > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c > --- a/tools/libxc/xc_domain_save.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain_save.c Sat Nov 17 09:46:05 2012 +0800 > @@ -1118,7 +1118,7 @@ > /* Now write out each data page, canonicalising page tables as we > go... */ > for ( ; ; ) > { > - unsigned int N, batch, run; > + unsigned int N, batch, run, broken_page_num1, broken_page_num2; >You need more descriptive variables here. 
Maybe "broken_before_send" and "broken_after_send"?> char reportbuf[80]; > > snprintf(reportbuf, sizeof(reportbuf), > @@ -1270,6 +1270,7 @@ > goto out; > } > > + broken_page_num1 = 0; > for ( run = j = 0; j < batch; j++ ) > { > unsigned long gmfn = pfn_batch[j]; > @@ -1277,6 +1278,14 @@ > if ( !hvm ) > gmfn = pfn_to_mfn(gmfn); > > + if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN ) > + { > + pfn_type[j] |= pfn_batch[j]; > + ++broken_page_num1; > + ++run; > + continue; > + } > + > if ( pfn_err[j] ) > { > if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB ) > @@ -1371,8 +1380,12 @@ > } > } > > - /* skip pages that aren't present or are alloc-only */ > + /* > + * skip pages that aren't present, > + * or are broken, or are alloc-only > + */ > if ( pagetype == XEN_DOMCTL_PFINFO_XTAB > + || pagetype == XEN_DOMCTL_PFINFO_BROKEN > || pagetype == XEN_DOMCTL_PFINFO_XALLOC ) > continue; > > @@ -1484,6 +1497,35 @@ > > munmap(region_base, batch*PAGE_SIZE); > > + /* > + * if vMCE occur at last iter, do one more iter so that it get > + * chance to transfer broken page's pfn_type and pfn number to > + * target and then take appropriate action > + */ >

Maybe, "Pages may have broken between mapping them and sending them. 
Count the number of broken pages after sending, and if there are more than before sending, do another iteration to make sure the pages are marked broken on the receiving side"?

> + if ( last_iter ) > + { >

/* Read the p2m table again */

> + for ( j = 0; j < batch; j++ ) > + { > + if ( hvm ) > + pfn_type[j] = pfn_batch[j]; > + else > + pfn_type[j] = pfn_to_mfn(pfn_batch[j]); > + } > + > + if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) ) > + { > + PERROR("get_pfn_type_batch failed"); > + goto out; > + } > + >

/* Count the number of broken pages */

> + broken_page_num2 = 0; > + for ( j = 0; j < batch; j++ ) > + if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN ) > + broken_page_num2++; > + >

/* If there are more than before sending, do one more iteration */

> + if ( broken_page_num1 < broken_page_num2 ) > + last_iter = 0; > + } > } /* end of this while loop for this iteration */ >

I think that's all I have.

-George
Liu, Jinsong
2012-Nov-21 13:26 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote: >> On 20/11/12 18:42, Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> Ian Jackson wrote: >>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>>>> broken page with regard to migration"): >>>>>> No, at the last iter, there are 4 points: >>>>>> 1. start last iter >>>>>> 2. get and transfer pfn_type to target >>>>>> 3. copy page to target >>>>>> 4. end last iter >>> ... >>>> It does check for an MCE after point 3 for each page, but what's the >>>> advantage of keeping a separate list? >>> It avoids yet another loop over all the pages. Unless I have >>> misunderstood. Which I may have, because: if it checks for an MCE >>> after point 3 then surely that is sufficient? We don't need to worry >>> about MCEs after that check. >> >> It's sufficient, but wouldn't each check require a separate >> hypercall? That would surely be slower than just a single hypercall >> and a loop (which is what Jinsong's patch does). >> >> We don't actually need a list -- I think we just need to know, "Have >> any pages broken since we read the p2m table >> (with xc_get_pfn_type_batch())?" If so, we do another full iteration. > > If a page fails between 2. and 3. above, then what happens at point 3? > I presume we can't map and send the page (since it is broken); do we > get some sort of failure to map? > > What happens if the failure occurs during stage 3, i.e. while the page > is mapped and we are reading from it? > > Ian.

Reading a broken page generates a more serious error (say, an SRAR error). I don't think the guest has a good chance of surviving in this case --> most probably it kills itself, and of course we then don't need to care about migration. 
However, if the guest can luckily survive (say, it completes copying the broken page to the target), it's OK to continue --> its broken pfn_type will be transferred to the target in the next iter, so the guest will kill itself if it accesses the page then.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-21 13:31 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Yes, I will update and send V4 later.

Thanks,
Jinsong

________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 8:17 PM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

OK, so now that we seem to have agreed on the overall approach, down to
"brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

> Generally, there are 2 cases:
> 1. broken page occurs before migration
> 2. broken page occurs during migration

I think we probably need to include the "last iteration" subcases here as
well -- although, probably it would be better to put them in the comments.
Also make sure to note that now even suspend may do a second iteration if
pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the
patch does, and then at the bottom have a summary of the changes; for
instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to
> handle case 2:
> V1 patch aborted migration for the case 'broken page occurs during
> migration';
> V2 patch marked broken page to dirty bitmap then got handled next round
> and no abort;
> This is V3 patch, adding handle for vMCE occur at last iteration of
> migration:
>
> At the sender
> For case 1, the broken page will be mapped but not copied to target
> (otherwise it may trigger more serious error, say, SRAR error). While
> its pfn_type and pfn number will be transferred to target so that
> target take appropriate action.
>
> For case 2, mce handler marks the broken page to dirty bitmap, so that
> at copypages stage of migration, its pfn_type and pfn number will be
> transferred to target and then take appropriate action.
>
> At the target
> When migration target populates pages for guest. As for the case of
> broken page wrt migration, we prefer to keep the type of the page, for
> the sake of seamless migration. At target it will set p2m as
> p2m_ram_broken for broken page. Guest MCE may have good chance to
> handle its broken page, while if guest access the broken page again it
> will kill itself as expected.
>
> If vMCE occur at the last memory copy iteration of migration, we do one
> more iteration so that the broken page's pfn_type and pfn number has
> chance to be transferred to target.
>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> - for tools part of the patch
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com<mailto:Ian.Jackson@eu.citrix.com>>

You should remove these until you get another ack (since the patch has
changed materially)

> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>

I think if you haven't changed the HV part since his ack, you can probably
leave this one.

> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c
> --- a/tools/libxc/xc_domain.c   Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain.c   Sat Nov 17 09:46:05 2012 +0800
> @@ -283,6 +283,22 @@
>      return ret;
>  }
>
> +/* set broken page p2m */
> +int xc_set_broken_page_p2m(xc_interface *xch,
> +                           uint32_t domid,
> +                           unsigned long pfn)
> +{
> +    int ret;
> +    DECLARE_DOMCTL;
> +
> +    domctl.cmd = XEN_DOMCTL_set_broken_page_p2m;
> +    domctl.domain = (domid_t)domid;
> +    domctl.u.set_broken_page_p2m.pfn = pfn;
> +    ret = do_domctl(xch, &domctl);
> +
> +    return ret ? -1 : 0;
> +}
> +
>  /* get info from hvm guest for save */
>  int xc_domain_hvm_getcontext(xc_interface *xch,
>                               uint32_t domid,
> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c
> --- a/tools/libxc/xc_domain_restore.c   Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain_restore.c   Sat Nov 17 09:46:05 2012 +0800
> @@ -1023,9 +1023,15 @@
>      countpages = count;
>      for (i = oldcount; i < buf->nr_pages; ++i)
> -        if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XTAB
> -            ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XALLOC)
> +    {
> +        unsigned long pagetype;
> +
> +        pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
> +        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB ||
> +             pagetype == XEN_DOMCTL_PFINFO_BROKEN ||
> +             pagetype == XEN_DOMCTL_PFINFO_XALLOC )
>              --countpages;
> +    }
>
>      if (!countpages)
>          return count;
> @@ -1267,6 +1273,17 @@
>              /* a bogus/unmapped/allocate-only page: skip it */
>              continue;
>
> +        if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
> +        {
> +            if ( xc_set_broken_page_p2m(xch, dom, pfn) )
> +            {
> +                ERROR("Set p2m for broken page failed, "
> +                      "dom=%d, pfn=%lx\n", dom, pfn);
> +                goto err_mapped;
> +            }
> +            continue;
> +        }
> +
>          if (pfn_err[i])
>          {
>              ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx p2m_mfn %lx",
> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c
> --- a/tools/libxc/xc_domain_save.c      Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain_save.c      Sat Nov 17 09:46:05 2012 +0800
> @@ -1118,7 +1118,7 @@
>      /* Now write out each data page, canonicalising page tables as we go... */
>      for ( ; ; )
>      {
> -        unsigned int N, batch, run;
> +        unsigned int N, batch, run, broken_page_num1, broken_page_num2;

You need more descriptive variables here.  Maybe "broken_before_send" and
"broken_after_send"?

>          char reportbuf[80];
>
>          snprintf(reportbuf, sizeof(reportbuf),
> @@ -1270,6 +1270,7 @@
>                  goto out;
>              }
>
> +            broken_page_num1 = 0;
>              for ( run = j = 0; j < batch; j++ )
>              {
>                  unsigned long gmfn = pfn_batch[j];
> @@ -1277,6 +1278,14 @@
>                  if ( !hvm )
>                      gmfn = pfn_to_mfn(gmfn);
>
> +                if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                {
> +                    pfn_type[j] |= pfn_batch[j];
> +                    ++broken_page_num1;
> +                    ++run;
> +                    continue;
> +                }
> +
>                  if ( pfn_err[j] )
>                  {
>                      if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
> @@ -1371,8 +1380,12 @@
>                  }
>              }
>
> -            /* skip pages that aren't present or are alloc-only */
> +            /*
> +             * skip pages that aren't present,
> +             * or are broken, or are alloc-only
> +             */
>              if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
> +                 || pagetype == XEN_DOMCTL_PFINFO_BROKEN
>                   || pagetype == XEN_DOMCTL_PFINFO_XALLOC )
>                  continue;
> @@ -1484,6 +1497,35 @@
>              munmap(region_base, batch*PAGE_SIZE);
>
> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */

Maybe, "Pages may have broken between mapping them and sending them.
Count the number of broken pages after sending, and if there are more
than before sending, do another iteration to make sure the pages are
marked broken on the receiving side"?

> +            if ( last_iter )
> +            {

/* Read the p2m table again */

> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +

/* Count the number of broken pages */

> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +

/* If there are more than before sending, do one more iteration */

> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }
>          } /* end of this while loop for this iteration */

I think that's all I have.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Nov-21 13:37 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
>>> On 21.11.12 at 14:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Ian Campbell wrote:
>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>> broken page with regard to migration"):
>>>>> Ian Jackson wrote:
>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>>> broken page with regard to migration"):
>>>>>>> No, at last lter, there are 4 points:
>>>>>>> 1. start last iter
>>>>>>> 2. get and transfer pfn_type to target
>>>>>>> 3. copy page to target
>>>>>>> 4. end last iter
>>>> ...
>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>> advantage of keeping a separate list?
>>>> It avoids yet another loop over all the pages.  Unless I have
>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>> after point 3 then surely that is sufficient?  We don't need to
>>>> worry about mces after that check.
>>>
>>> It's sufficient, but wouldn't each check require a separate
>>> hypercall?  That would surely be slower than just a single hypercall
>>> and a loop (which is what Jinsong's patch does).
>>>
>>> We don't actually need a list -- I think we just need to know, "Have
>>> any pages broken between reading the p2m table (
>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>
>> If a page fails between 2. and 3. above then what happens at point 3?
>> I presume we can't map and send the page (since it is broken), do we
>> get some sort of failure to map?
>>
>> What happens if the failure occurs during stage 3, i.e. while the page
>> is mapped and we are reading from it?
>>
>> Ian.
>
> If read a broken page, it generates more serious error (say, SRAR
> error).  I don't think guest has good opportunity to survive under this
> case --> most probably it kill itself and of course we don't need care
> migration now.  However, if guest can luckly survive (say complete
> broken page copying to target), it's OK to continue --> its broken
> pfn_type will transfer to target next iter so guest will kill itself if
> access then.

I think you misread the question - it said "we", as in "the tools/
kernel/hypervisor" (at least that's how I'm reading it).  The MCE
would surface in host context in this case, and whether that's
fatal to the host depends on the precise properties of the event.

Jan
George Dunlap
2012-Nov-21 13:59 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 13:26, Liu, Jinsong wrote:
> Ian Campbell wrote:
>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>> broken page with regard to migration"):
>>>>> Ian Jackson wrote:
>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>>> broken page with regard to migration"):
>>>>>>> No, at last lter, there are 4 points:
>>>>>>> 1. start last iter
>>>>>>> 2. get and transfer pfn_type to target
>>>>>>> 3. copy page to target
>>>>>>> 4. end last iter
>>>> ...
>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>> advantage of keeping a separate list?
>>>> It avoids yet another loop over all the pages.  Unless I have
>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>> after point 3 then surely that is sufficient?  We don't need to
>>>> worry about mces after that check.
>>> It's sufficient, but wouldn't each check require a separate
>>> hypercall?  That would surely be slower than just a single hypercall
>>> and a loop (which is what Jinsong's patch does).
>>>
>>> We don't actually need a list -- I think we just need to know, "Have
>>> any pages broken between reading the p2m table (
>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>> If a page fails between 2. and 3. above then what happens at point 3?
>> I presume we can't map and send the page (since it is broken), do we
>> get some sort of failure to map?
>>
>> What happens if the failure occurs during stage 3, i.e. while the page
>> is mapped and we are reading from it?
>>
>> Ian.
> If read a broken page, it generates more serious error (say, SRAR
> error).  I don't think guest has good opportunity to survive under this
> case --> most probably it kill itself and of course we don't need care
> migration now.  However, if guest can luckly survive (say complete
> broken page copying to target), it's OK to continue --> its broken
> pfn_type will transfer to target next iter so guest will kill itself if
> access then.

But in this case, I'm asking what happens if the migration code reads the
page.  If reading the page in the migration code causes dom0 to crash,
then the whole "last iteration" stuff is fairly pointless. :-)

 -George
Liu, Jinsong
2012-Nov-22 11:23 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Jan Beulich wrote:
>>>> On 21.11.12 at 14:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Ian Campbell wrote:
>>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>> broken page with regard to migration"):
>>>>>> Ian Jackson wrote:
>>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE:
>>>>>>> handle broken page with regard to migration"):
>>>>>>>> No, at last lter, there are 4 points:
>>>>>>>> 1. start last iter
>>>>>>>> 2. get and transfer pfn_type to target
>>>>>>>> 3. copy page to target
>>>>>>>> 4. end last iter
>>>>> ...
>>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>>> advantage of keeping a separate list?
>>>>> It avoids yet another loop over all the pages.  Unless I have
>>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>>> after point 3 then surely that is sufficient?  We don't need to
>>>>> worry about mces after that check.
>>>>
>>>> It's sufficient, but wouldn't each check require a separate
>>>> hypercall?  That would surely be slower than just a single hypercall
>>>> and a loop (which is what Jinsong's patch does).
>>>>
>>>> We don't actually need a list -- I think we just need to know,
>>>> "Have any pages broken between reading the p2m table (
>>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>>
>>> If a page fails between 2. and 3. above then what happens at point
>>> 3?  I presume we can't map and send the page (since it is broken),
>>> do we get some sort of failure to map?
>>>
>>> What happens if the failure occurs during stage 3, i.e. while the
>>> page is mapped and we are reading from it?
>>>
>>> Ian.
>>
>> If read a broken page, it generates more serious error (say, SRAR
>> error).  I don't think guest has good opportunity to survive under
>> this case --> most probably it kill itself and of course we don't
>> need care migration now.  However, if guest can luckly survive (say
>> complete broken page copying to target), it's OK to continue --> its
>> broken pfn_type will transfer to target next iter so guest will kill
>> itself if access then.
>
> I think you misread the question - it said "we", as in "the tools/
> kernel/hypervisor" (at least that's how I'm reading it).  The MCE
> would surface in host context in this case, and whether that's
> fatal to the host depends on the precise properties of the event.
>
> Jan

Yes, depending on the error type, both the hypervisor and the guest may
crash.  As for the tools, I think they are OK as long as the hypervisor
is OK.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-22 11:44 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 21/11/12 13:26, Liu, Jinsong wrote:
>> Ian Campbell wrote:
>>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>> broken page with regard to migration"):
>>>>>> Ian Jackson wrote:
>>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE:
>>>>>>> handle broken page with regard to migration"):
>>>>>>>> No, at last lter, there are 4 points:
>>>>>>>> 1. start last iter
>>>>>>>> 2. get and transfer pfn_type to target
>>>>>>>> 3. copy page to target
>>>>>>>> 4. end last iter
>>>>> ...
>>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>>> advantage of keeping a separate list?
>>>>> It avoids yet another loop over all the pages.  Unless I have
>>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>>> after point 3 then surely that is sufficient?  We don't need to
>>>>> worry about mces after that check.
>>>> It's sufficient, but wouldn't each check require a separate
>>>> hypercall?  That would surely be slower than just a single hypercall
>>>> and a loop (which is what Jinsong's patch does).
>>>>
>>>> We don't actually need a list -- I think we just need to know,
>>>> "Have any pages broken between reading the p2m table (
>>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>> If a page fails between 2. and 3. above then what happens at point
>>> 3?  I presume we can't map and send the page (since it is broken),
>>> do we get some sort of failure to map?
>>>
>>> What happens if the failure occurs during stage 3, i.e. while the
>>> page is mapped and we are reading from it?
>>>
>>> Ian.
>> If read a broken page, it generates more serious error (say, SRAR
>> error).  I don't think guest has good opportunity to survive under
>> this case --> most probably it kill itself and of course we don't
>> need care migration now.  However, if guest can luckly survive (say
>> complete broken page copying to target), it's OK to continue --> its
>> broken pfn_type will transfer to target next iter so guest will kill
>> itself if access then.
>
> But in this case, I'm asking what happens if the migration code reads
> the page.  If reading the page in the migration code causes dom0 to
> crash, then the whole "last iteration" stuff is fairly pointless. :-)
>
> -George

If the migration code reads the page, it will trigger a more serious
error and may kill the hypervisor or the guest.  But unfortunately we
cannot prevent it, since we cannot predict whether a vmce will occur
*during* migration.

What we can do is our best to handle it:
1. for vmce occur before migration, we can safely handle it;
2. for vmce occur during migration, we can only do our best:
   2.1 if fortunately vmce occurs at some area (say, before point 2), we
       can successfully prevent page reading;
   2.2 if vmce occurs after point 2, it will read the page; under such
       case
       * if guest/hypervisor can survive, it's OK to transfer the broken
         pfn_type to target so that there is no further harm to target;
       * if guest/hypervisor crashes, we definitely needn't care about
         migration any more;

The key point is, before migration we have no way to predict it, and we
cannot forbid migration for fear that it potentially crashes the system.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-22 12:37 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 8:17 PM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

OK, so now that we seem to have agreed on the overall approach, down to
"brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

> Generally, there are 2 cases:
> 1. broken page occurs before migration
> 2. broken page occurs during migration

I think we probably need to include the "last iteration" subcases here as
well -- although, probably it would be better to put them in the comments.
Also make sure to note that now even suspend may do a second iteration if
pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the
patch does, and then at the bottom have a summary of the changes; for
instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to
> handle case 2:
> V1 patch aborted migration for the case 'broken page occurs during
> migration';
> V2 patch marked broken page to dirty bitmap then got handled next round
> and no abort;
> This is V3 patch, adding handle for vMCE occur at last iteration of
> migration:
>
> At the sender
> For case 1, the broken page will be mapped but not copied to target
> (otherwise it may trigger more serious error, say, SRAR error). While
> its pfn_type and pfn number will be transferred to target so that
> target take appropriate action.
>
> For case 2, mce handler marks the broken page to dirty bitmap, so that
> at copypages stage of migration, its pfn_type and pfn number will be
> transferred to target and then take appropriate action.
>
> At the target
> When migration target populates pages for guest. As for the case of
> broken page wrt migration, we prefer to keep the type of the page, for
> the sake of seamless migration. At target it will set p2m as
> p2m_ram_broken for broken page. Guest MCE may have good chance to
> handle its broken page, while if guest access the broken page again it
> will kill itself as expected.
>
> If vMCE occur at the last memory copy iteration of migration, we do one
> more iteration so that the broken page's pfn_type and pfn number has
> chance to be transferred to target.
>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> - for tools part of the patch
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com<mailto:Ian.Jackson@eu.citrix.com>>

You should remove these until you get another ack (since the patch has
changed materially)

> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>

I think if you haven't changed the HV part since his ack, you can probably
leave this one.

Jan, for hypervisor side your opinion of acked-by?  I'm updating to V4.

Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Nov-22 13:36 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
>>> On 22.11.12 at 13:37, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>
>
> I think if you haven't changed the HV part since his ack, you can probably
> leave this one.
>
> Jan, for hypervisor side your opinion of acked-by?  I'm updating to V4.

As George said - if you didn't change the hypervisor bits, you can keep
the ack.  Else remove it.

Jan