Ian Jackson
2012-Nov-16 18:19 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu Jinsong writes ("[PATCH V3] X86/vMCE: handle broken page with regard to migration"):
> This is V3 patch, adding handle for vMCE occur at last iteration of migration:

I think you should take off my ack, having added significant new code
in the tools section.

About that code:

> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */
> +            if ( last_iter )
> +            {
> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +
> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +
> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }

Is this really the best way of doing this?  Isn't there some single
flag the hypervisor sets somewhere?

Ian.
Liu, Jinsong
2012-Nov-16 18:31 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:
> Liu Jinsong writes ("[PATCH V3] X86/vMCE: handle broken page with
> regard to migration"):
>> This is V3 patch, adding handle for vMCE occur at last iteration of
>> migration:
>
> I think you should take off my ack, having added significant new code
> in the tools section.

OK.

> About that code:
>
>> +            /*
>> +             * if vMCE occur at last iter, do one more iter so that it get
>> +             * chance to transfer broken page's pfn_type and pfn number to
>> +             * target and then take appropriate action
>> +             */
>> +            if ( last_iter )
>> +            {
>> +                for ( j = 0; j < batch; j++ )
>> +                {
>> +                    if ( hvm )
>> +                        pfn_type[j] = pfn_batch[j];
>> +                    else
>> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
>> +                }
>> +
>> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
>> +                {
>> +                    PERROR("get_pfn_type_batch failed");
>> +                    goto out;
>> +                }
>> +
>> +                broken_page_num2 = 0;
>> +                for ( j = 0; j < batch; j++ )
>> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
>> +                        broken_page_num2++;
>> +
>> +                if ( broken_page_num1 < broken_page_num2 )
>> +                    last_iter = 0;
>> +            }
>
> Is this really the best way of doing this?  Isn't there some single
> flag the hypervisor sets somewhere?
>
> Ian.

That would involve adding a new hypercall, and notifying the hypervisor
whether the tools side is at the last iter round. This patch just re-uses
the existing mechanism.

Thanks,
Jinsong
Liu Jinsong
2012-Nov-17 02:04 UTC
[PATCH V3] X86/vMCE: handle broken page with regard to migration
Generally, there are 2 cases:
1. broken page occurs before migration
2. broken page occurs during migration

We had submitted 2 versions of patches. Their difference is how to handle case 2:
V1 patch aborted migration for the case 'broken page occurs during migration';
V2 patch marked the broken page in the dirty bitmap so it got handled the next round, with no abort;
This is V3 patch, adding handling for vMCE occurring at the last iteration of migration:

At the sender:
For case 1, the broken page will be mapped but not copied to the target
(otherwise it may trigger a more serious error, say, an SRAR error), while its
pfn_type and pfn number will be transferred to the target so that the target
can take appropriate action.
For case 2, the mce handler marks the broken page in the dirty bitmap, so that
at the copypages stage of migration, its pfn_type and pfn number will be
transferred to the target, which then takes appropriate action.

At the target:
When the migration target populates pages for the guest, in the case of a broken
page wrt migration, we prefer to keep the type of the page, for the sake of
seamless migration. The target will set the p2m entry to p2m_ram_broken for the
broken page. Guest MCE may have a good chance to handle its broken page, while if
the guest accesses the broken page again it will kill itself as expected.

If vMCE occurs at the last memory copy iteration of migration, we do one more
iteration so that the broken page's pfn_type and pfn number have a chance to be
transferred to the target.
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com> - for tools part of the patch
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com> - for hypervisor part of the patch
Acked-by: Jan Beulich <JBeulich@suse.com>

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain.c	Sat Nov 17 09:46:05 2012 +0800
@@ -283,6 +283,22 @@
     return ret;
 }
 
+/* set broken page p2m */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn)
+{
+    int ret;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_set_broken_page_p2m;
+    domctl.domain = (domid_t)domid;
+    domctl.u.set_broken_page_p2m.pfn = pfn;
+    ret = do_domctl(xch, &domctl);
+
+    return ret ? -1 : 0;
+}
+
 /* get info from hvm guest for save */
 int xc_domain_hvm_getcontext(xc_interface *xch,
                              uint32_t domid,

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_restore.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1023,9 +1023,15 @@
 
     countpages = count;
     for (i = oldcount; i < buf->nr_pages; ++i)
-        if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XTAB
-            ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XALLOC)
+    {
+        unsigned long pagetype;
+
+        pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
+        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB ||
+             pagetype == XEN_DOMCTL_PFINFO_BROKEN ||
+             pagetype == XEN_DOMCTL_PFINFO_XALLOC )
             --countpages;
+    }
 
     if (!countpages)
         return count;
@@ -1267,6 +1273,17 @@
                 /* a bogus/unmapped/allocate-only page: skip it */
                 continue;
 
+            if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
+            {
+                if ( xc_set_broken_page_p2m(xch, dom, pfn) )
+                {
+                    ERROR("Set p2m for broken page failed, "
+                          "dom=%d, pfn=%lx\n", dom, pfn);
+                    goto err_mapped;
+                }
+                continue;
+            }
+
             if (pfn_err[i])
             {
                 ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx p2m_mfn %lx",

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_save.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1118,7 +1118,7 @@
     /* Now write out each data page, canonicalising page tables as we go... */
     for ( ; ; )
     {
-        unsigned int N, batch, run;
+        unsigned int N, batch, run, broken_page_num1, broken_page_num2;
         char reportbuf[80];
 
         snprintf(reportbuf, sizeof(reportbuf),
@@ -1270,6 +1270,7 @@
                 goto out;
             }
 
+            broken_page_num1 = 0;
             for ( run = j = 0; j < batch; j++ )
             {
                 unsigned long gmfn = pfn_batch[j];
@@ -1277,6 +1278,14 @@
                 if ( !hvm )
                     gmfn = pfn_to_mfn(gmfn);
 
+                if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                {
+                    pfn_type[j] |= pfn_batch[j];
+                    ++broken_page_num1;
+                    ++run;
+                    continue;
+                }
+
                 if ( pfn_err[j] )
                 {
                     if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
@@ -1371,8 +1380,12 @@
                 }
             }
 
-            /* skip pages that aren't present or are alloc-only */
+            /*
+             * skip pages that aren't present,
+             * or are broken, or are alloc-only
+             */
             if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
+                 || pagetype == XEN_DOMCTL_PFINFO_BROKEN
                  || pagetype == XEN_DOMCTL_PFINFO_XALLOC )
                 continue;
 
@@ -1484,6 +1497,35 @@
 
             munmap(region_base, batch*PAGE_SIZE);
 
+            /*
+             * if vMCE occur at last iter, do one more iter so that it get
+             * chance to transfer broken page's pfn_type and pfn number to
+             * target and then take appropriate action
+             */
+            if ( last_iter )
+            {
+                for ( j = 0; j < batch; j++ )
+                {
+                    if ( hvm )
+                        pfn_type[j] = pfn_batch[j];
+                    else
+                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
+                }
+
+                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
+                {
+                    PERROR("get_pfn_type_batch failed");
+                    goto out;
+                }
+
+                broken_page_num2 = 0;
+                for ( j = 0; j < batch; j++ )
+                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                        broken_page_num2++;
+
+                if ( broken_page_num1 < broken_page_num2 )
+                    last_iter = 0;
+            }
         } /* end of this while loop for this iteration */
 
       skip:
@@ -1550,23 +1592,22 @@
                     PERROR("Error when writing to state file (tsc)");
                     goto out;
                 }
+            }
+        }
+        else
+            last_iter = 1;
 
+        if ( xc_shadow_control(xch, dom,
+                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
+                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
+        {
+            PERROR("Error flushing shadow PT");
+            goto out;
+        }
 
-            }
+        sent_last_iter = sent_this_iter;
 
-            if ( xc_shadow_control(xch, dom,
-                                   XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
-                                   dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
-            {
-                PERROR("Error flushing shadow PT");
-                goto out;
-            }
-
-            sent_last_iter = sent_this_iter;
-
-            print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
-
-        }
+        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
     } /* end of infinite for loop */
 
     DPRINTF("All memory is saved\n");

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xenctrl.h	Sat Nov 17 09:46:05 2012 +0800
@@ -575,6 +575,17 @@
                              xc_domaininfo_t *info);
 
 /**
+ * This function set p2m for broken page
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid the domain id which broken page belong to
+ * @parm pfn the pfn number of the broken page
+ * @return 0 on success, -1 on failure
+ */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn);
+
+/**
  * This function returns information about the context of a hvm domain
  * @parm xch a handle to an open hypervisor interface
  * @parm domid the domain to get information from

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/arch/x86/cpu/mcheck/mcaction.c
--- a/xen/arch/x86/cpu/mcheck/mcaction.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c	Sat Nov 17 09:46:05 2012 +0800
@@ -1,5 +1,6 @@
 #include <xen/types.h>
 #include <xen/sched.h>
+#include <asm/p2m.h>
 #include "mcaction.h"
 #include "vmce.h"
 #include "mce.h"
@@ -91,6 +92,24 @@
                         goto vmce_failed;
                     }
 
+                if ( is_hvm_domain(d) && !d->arch.hvm_domain.dirty_vram &&
+                     paging_mode_log_dirty(d) )
+                {
+                    /*
+                     * vMCE occur during migration
+                     *
+                     * At sender, it marks broken page to dirty bitmap,
+                     * so that at copypages stage of migration, broken
+                     * page's pfn_type and pfn number would be transferred
+                     * to target and then take appropriate action.
+                     *
+                     * At target, it would set p2m as p2m_ram_broken for
+                     * broken page, so that if guest access the broken page
+                     * again, it would kill itself as expected.
+                     */
+                    paging_mark_dirty(d, mfn);
+                }
+
                 if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
                 {
                     printk("Unmap broken memory %lx for DOM%d failed\n",

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/arch/x86/domctl.c
--- a/xen/arch/x86/domctl.c	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/domctl.c	Sat Nov 17 09:46:05 2012 +0800
@@ -209,12 +209,18 @@
             for ( j = 0; j < k; j++ )
             {
                 unsigned long type = 0;
+                p2m_type_t t;
 
-                page = get_page_from_gfn(d, arr[j], NULL, P2M_ALLOC);
+                page = get_page_from_gfn(d, arr[j], &t, P2M_ALLOC);
 
                 if ( unlikely(!page) ||
                      unlikely(is_xen_heap_page(page)) )
-                    type = XEN_DOMCTL_PFINFO_XTAB;
+                {
+                    if ( p2m_is_broken(t) )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
+                    else
+                        type = XEN_DOMCTL_PFINFO_XTAB;
+                }
                 else
                 {
                     switch( page->u.inuse.type_info & PGT_type_mask )
@@ -235,6 +241,9 @@
 
                     if ( page->u.inuse.type_info & PGT_pinned )
                         type |= XEN_DOMCTL_PFINFO_LPINTAB;
+
+                    if ( page->count_info & PGC_broken )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
                 }
 
                 if ( page )
@@ -1568,6 +1577,29 @@
     }
     break;
 
+    case XEN_DOMCTL_set_broken_page_p2m:
+    {
+        struct domain *d;
+
+        d = rcu_lock_domain_by_id(domctl->domain);
+        if ( d != NULL )
+        {
+            p2m_type_t pt;
+            unsigned long pfn = domctl->u.set_broken_page_p2m.pfn;
+            mfn_t mfn = get_gfn_query(d, pfn, &pt);
+
+            if ( unlikely(!mfn_valid(mfn_x(mfn)) || !p2m_is_ram(pt) ||
+                          (p2m_change_type(d, pfn, pt, p2m_ram_broken) != pt)) )
+                ret = -EINVAL;
+
+            put_gfn(d, pfn);
+            rcu_unlock_domain(d);
+        }
+        else
+            ret = -ESRCH;
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, u_domctl);
         break;

diff -r 8b93ac0c93f3 -r 446f6b9bfc89 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/include/public/domctl.h	Sat Nov 17 09:46:05 2012 +0800
@@ -136,6 +136,7 @@
 #define XEN_DOMCTL_PFINFO_LPINTAB (0x1U<<31)
 #define XEN_DOMCTL_PFINFO_XTAB    (0xfU<<28) /* invalid page */
 #define XEN_DOMCTL_PFINFO_XALLOC  (0xeU<<28) /* allocate-only page */
+#define XEN_DOMCTL_PFINFO_BROKEN  (0xdU<<28) /* broken page */
 #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
 
 struct xen_domctl_getpageframeinfo {
@@ -834,6 +835,12 @@
 typedef struct xen_domctl_set_access_required xen_domctl_set_access_required_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_access_required_t);
 
+struct xen_domctl_set_broken_page_p2m {
+    uint64_aligned_t pfn;
+};
+typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -899,6 +906,7 @@
 #define XEN_DOMCTL_set_access_required           64
 #define XEN_DOMCTL_audit_p2m                     65
 #define XEN_DOMCTL_set_virq_handler              66
+#define XEN_DOMCTL_set_broken_page_p2m           67
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -954,6 +962,7 @@
         struct xen_domctl_audit_p2m         audit_p2m;
         struct xen_domctl_set_virq_handler  set_virq_handler;
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
+        struct xen_domctl_set_broken_page_p2m set_broken_page_p2m;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         uint8_t                             pad[128];
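[Editor's note] The last-iteration re-check in the xc_domain_save.c hunk above is easier to follow in isolation. The sketch below re-implements that decision as standalone functions, using a hypothetical PFINFO_BROKEN constant and plain arrays in place of the libxc types and xc_get_pfn_type_batch(); it illustrates the logic, it is not the actual libxc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for XEN_DOMCTL_PFINFO_BROKEN. */
#define PFINFO_BROKEN 0xdU

/* Count how many entries in a freshly queried pfn_type batch are broken. */
static unsigned int count_broken(const unsigned int *pfn_type, size_t batch)
{
    unsigned int n = 0;
    for (size_t j = 0; j < batch; j++)
        if (pfn_type[j] == PFINFO_BROKEN)
            n++;
    return n;
}

/*
 * Decide whether the sender must run one more copy iteration: if more
 * pages are broken after the batch was sent (broken_after, from a fresh
 * type query) than were broken before it (broken_before), a vMCE landed
 * during the last iteration and the new broken page's pfn_type still has
 * to reach the target.  Returns the updated last_iter flag.
 */
static int recheck_last_iter(int last_iter,
                             unsigned int broken_before,
                             const unsigned int *pfn_type, size_t batch)
{
    if (last_iter && broken_before < count_broken(pfn_type, batch))
        return 0;   /* not the last iteration after all: go round again */
    return last_iter;
}
```

Usage: after the final memory pass, the saver would call `recheck_last_iter(last_iter, broken_page_num1, pfn_type, batch)` and loop once more if it returns 0.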
Ian Campbell
2012-Nov-19 09:55 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Fri, 2012-11-16 at 18:31 +0000, Liu, Jinsong wrote:
> >> +            /*
> >> +             * if vMCE occur at last iter, do one more iter so that it get
> >> +             * chance to transfer broken page's pfn_type and pfn number to
> >> +             * target and then take appropriate action
[...]
> >> +            }
> >
> > Is this really the best way of doing this ? Isn't there some single
> > flag the hypervisor sets somewhere ?

> That will involve adding new hypercall, and notifying hypervisor
> whether tools side is at the last iter round.

If we get to this stage then haven't we either already sent something
over the wire for this page or marked it as dirty when we tried and
failed to send it?

In the former case we don't care that the page is now broken on the
source since the target has got a good pre-breakage copy.

In the latter case could we not set a flag at the same time as we mark
the page dirty which means "go round at least one more time"?

Ian.
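[Editor's note] Ian's alternative can be sketched as follows. Neither the flag nor these helpers exist in Xen; this is only a hypothetical illustration of the control flow that a "set a flag when marking the page dirty" design would give, replacing the counting loops in the patch.

```c
#include <assert.h>

/* Hypothetical per-save state; nothing like this exists in the tree. */
struct save_state {
    int last_iter;             /* saver believes this is the final pass */
    int broken_page_pending;   /* set on the (hypothetical) MCE path    */
};

/* What the MCE handler would do alongside paging_mark_dirty(d, mfn). */
static void mce_mark_broken(struct save_state *s)
{
    s->broken_page_pending = 1;
}

/* Checked by the save loop at the end of each iteration. */
static void end_of_iteration(struct save_state *s)
{
    if (s->last_iter && s->broken_page_pending) {
        s->broken_page_pending = 0;
        s->last_iter = 0;      /* "go round at least one more time" */
    }
}
```

The trade-off the thread discusses: this avoids the extra xc_get_pfn_type_batch() query per batch, but needs a new way for the hypervisor-side MCE path to expose the flag to the tools.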
George Dunlap
2012-Nov-19 15:29 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 19/11/12 09:55, Ian Campbell wrote:
> If we get to this stage then haven't we either already sent something
> over the wire for this page or marked it as dirty when we tried and
> failed to send it?
>
> In the former case we don't care that the page is now broken on the
> source since the target has got a good pre-breakage copy.
>
> In the latter case could we not set a flag at the same time as we mark
> the page dirty which means "go round at least one more time"?

Yeah -- on the last iteration, the VM itself has to be paused; if any
pages get broken after that, it doesn't really matter, does it? The real
thing is to have a consistent "snapshot" of behavior.

I guess the one potentially tricky case to worry about is whether to
deliver an MCE to the guest on restore. Consider the following scenario:

- Page A is modified (and marked dirty)
- VM paused for last iteration
- Page breaks, is marked broken in the p2m
- Save code sends page A

In that case, the save code would send a "broken" page, and the restore
code would mark a page as broken, and we *would* want to deliver an MCE
on the far side. But suppose the last two steps were reversed:

- Page A modified
- VM paused for last iteration
- Save code sends page A
- Page breaks, marked broken in the p2m

In that case, when the save code sends page A, it will send a good page;
there's no need to mark it broken, or to send the guest an MCE.

Am I understanding the situation correctly, Jinsong?

-George
Ian Campbell
2012-Nov-19 16:57 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
> On 19/11/12 09:55, Ian Campbell wrote:
> > If we get to this stage then haven't we either already sent something
> > over the wire for this page or marked it as dirty when we tried and
> > failed to send it?
> >
> > In the former case we don't care that the page is now broken on the
> > source since the target has got a good pre-breakage copy.
> >
> > In the latter case could we not set a flag at the same time as we mark
> > the page dirty which means "go round at least one more time"?
>
> Yeah -- on the last iteration, the VM itself has to be paused; if any
> pages get broken after that, it doesn't really matter, does it? The real
> thing is to have a consistent "snapshot" of behavior.
>
> I guess the one potentially tricky case to worry about is whether to
> deliver an MCE to the guest on restore. Consider the following scenario:
>
> - Page A is modified (and marked dirty)
> - VM paused for last iteration
> - Page breaks, is marked broken in the p2m
> - Save code sends page A
>
> In that case, the save code would send a "broken" page, and the restore
> code would mark a page as broken, and we *would* want to deliver an MCE
> on the far side. But suppose the last two steps were reversed:
>
> - Page A modified
> - VM paused for last iteration
> - Save code sends page A
> - Page breaks, marked broken in the p2m
>
> In that case, when the save code sends page A, it will send a good page;
> there's no need to mark it broken, or to send the guest an MCE.

I guess you'd want to err on the side of stopping using a good page, as
opposed to continuing to use a bad page? i.e. it's better to take a
spurious vMCE than to not take an actual one.

I'm not actually sure what a guest does with a vMCE. I guess it does
some sort of memory exchange to give the bad page back to the h/v and
get a good page in return? If the hypervisor thinks the old page is ok
rather than bad, I guess it'll just put it in the free list instead of
the bad list?

> Am I understanding the situation correctly, Jinsong?
>
> -George
George Dunlap
2012-Nov-20 15:08 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 19/11/12 16:57, Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent something
>>> over the wire for this page or marked it as dirty when we tried and
>>> failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we mark
>>> the page dirty which means "go round at least one more time"?
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The real
>> thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore. Consider the following scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the restore
>> code would mark a page as broken, and we *would* want to deliver an MCE
>> on the far side. But suppose the last two steps were reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good page;
>> there's no need to mark it broken, or to send the guest an MCE.
> I guess you'd want to err on the side of stopping using a good page, as
> opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.

While that's true, taking a spurious MCE means at the very least one less
page available to the guest to use (for HVM guests that haven't ballooned
down, at least), and the unnecessary loss of the data in that page.

The problem I guess is that the save code at the moment has no way of
distinguishing the following cases:
1. Marked broken after the last time I sent it, but before the VM was
   paused; but the page hasn't been written to
2. Marked broken after the VM was paused; page hadn't been written to
3. Marked broken after the VM was paused, but the page had been written to

In case 1, we definitely need to send a broken page; but the VM may have
already received a vMCE. In case 2, we don't need to send a broken page
or a vMCE, while in #3 we need to do both.

On the other hand, the whole situation is hopefully rare enough that
maybe we can just do the simple correct thing, even if it's a tiny bit
sub-optimal. In that case, assuming that spurious vMCEs aren't a problem
(e.g., #1), I think we basically just need to see if the last iteration
contains a broken page, and if so, send the guest a vMCE on resume.

Thoughts?

> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

Yes, I'm pretty sure the hypervisor's accounting of broken pages is
separate from guest p2m entries; I think if you mark a p2m entry broken,
the hypervisor will just free the ram page that was mapped there before.

I think the guest just tries to recover gracefully when it gets a vMCE
(e.g., by re-reading the page from disk or killing the process). I don't
think it asks the hypervisor for another page to replace it at this point.

-George
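[Editor's note] George's three cases can be captured in a small decision function. The bookkeeping below (when the page broke, whether it was written since the last send) is hypothetical; as the email notes, the save code has no way to obtain it, which is exactly the problem being discussed.

```c
#include <assert.h>

/* When the page was marked broken, relative to the final-iteration pause. */
enum broke_when { BEFORE_PAUSE, AFTER_PAUSE };

/* Hypothetical per-page bookkeeping the save code does not actually have. */
struct page_state {
    enum broke_when broke_at;
    int written_since_last_send;   /* was the page dirty when it broke? */
};

/*
 * Case 1: broke before the pause, clean  -> must resend as broken
 * Case 2: broke after the pause, clean   -> target already has a good copy
 * Case 3: broke after the pause, dirty   -> must send broken page + vMCE
 */
static int must_send_broken(const struct page_state *p)
{
    if (p->broke_at == BEFORE_PAUSE)
        return 1;                          /* case 1 */
    return p->written_since_last_send;     /* case 2 (0) vs case 3 (1) */
}
```

Because the real saver cannot distinguish case 2 from the others, the thread's conclusion is to accept an occasional spurious broken-page record (and vMCE) rather than add this tracking.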
Liu, Jinsong
2012-Nov-20 16:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> On Fri, 2012-11-16 at 18:31 +0000, Liu, Jinsong wrote:
>>>> +            /*
>>>> +             * if vMCE occur at last iter, do one more iter so that it get
>>>> +             * chance to transfer broken page's pfn_type and pfn number to
>>>> +             * target and then take appropriate action
[...]
>>>> +            }
>>>
>>> Is this really the best way of doing this ? Isn't there some single
>>> flag the hypervisor sets somewhere ?
>
>> That will involve adding new hypercall, and notifying hypervisor
>> whether tools side is at the last iter round.
>
> If we get to this stage then haven't we either already sent something
> over the wire for this page or marked it as dirty when we tried and
> failed to send it?
>
> In the former case we don't care that the page is now broken on the
> source since the target has got a good pre-breakage copy.

Yes, and theoretically it's better, if only the tools save code and the
hypervisor vmce code could *easily* know each other's status, like:
* whether vmce occurred at the last iteration
* whether the broken page is in the dirty bitmap at the last iteration

> In the latter case could we not set a flag at the same time as we mark
> the page dirty which means "go round at least one more time"?
>
> Ian.

That would involve adding new hypercalls for this corner case. IMO it's
better to keep things simple for the corner case and re-use the existing
mechanism.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-20 16:29 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 19/11/12 09:55, Ian Campbell wrote:
>> If we get to this stage then haven't we either already sent something
>> over the wire for this page or marked it as dirty when we tried and
>> failed to send it?
>>
>> In the former case we don't care that the page is now broken on the
>> source since the target has got a good pre-breakage copy.
>>
>> In the latter case could we not set a flag at the same time as we
>> mark the page dirty which means "go round at least one more time"?
>
> Yeah -- on the last iteration, the VM itself has to be paused; if any
> pages get broken after that, it doesn't really matter, does it? The

If the page (which breaks at the last iter) is in the dirty bitmap of
the last iter, it does matter, even though the VM is frozen at the last
iter.

> real thing is to have a consistent "snapshot" of behavior.
>
> I guess the one potentially tricky case to worry about is whether to
> deliver an MCE to the guest on restore. Consider the following
> scenario:
>
> - Page A is modified (and marked dirty)
> - VM paused for last iteration
> - Page breaks, is marked broken in the p2m
> - Save code sends page A
>
> In that case, the save code would send a "broken" page, and the
> restore code would mark a page as broken, and we *would* want to
> deliver an MCE on the far side. But suppose the last two steps were
> reversed:
>
> - Page A modified
> - VM paused for last iteration
> - Save code sends page A
> - Page breaks, marked broken in the p2m
>
> In that case, when the save code sends page A, it will send a good
> page; there's no need to mark it broken, or to send the guest an MCE.
>
> Am I understanding the situation correctly, Jinsong?
>
> -George

Yes, if we don't consider code complication :-)
Liu, Jinsong
2012-Nov-20 16:43 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent
>>> something over the wire for this page or marked it as dirty when we
>>> tried and failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we
>>> mark the page dirty which means "go round at least one more time"?
>>
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The
>> real thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore. Consider the following
>> scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the
>> restore code would mark a page as broken, and we *would* want to
>> deliver an MCE on the far side. But suppose the last two steps were
>> reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good
>> page; there's no need to mark it broken, or to send the guest an MCE.
>
> I guess you'd want to err on the side of stopping using a good page,
> as opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.
>
> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

The broken page is a h/w bad page, meaning its content is corrupted and
cannot be corrected by h/w circuits (say, ECC). As for vMCE and the
broken page, guest and h/v are agnostic of each other, so I don't think
the h/v should have a chance to do anything except inject vMCE# to the
guest. The guest handles vMCE# exactly as on native hardware: it may
simply ignore the vMCE, kill the process and isolate the broken page if
possible (in its own range), or kill itself if not possible, depending
on what the error is.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-20 17:08 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 19/11/12 16:57, Ian Campbell wrote:
>> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>>> On 19/11/12 09:55, Ian Campbell wrote:
>>>> If we get to this stage then haven't we either already sent
>>>> something over the wire for this page or marked it as dirty when
>>>> we tried and failed to send it?
>>>>
>>>> In the former case we don't care that the page is now broken on the
>>>> source since the target has got a good pre-breakage copy.
>>>>
>>>> In the latter case could we not set a flag at the same time as we
>>>> mark the page dirty which means "go round at least one more time"?
>>> Yeah -- on the last iteration, the VM itself has to be paused; if
>>> any pages get broken after that, it doesn't really matter, does it?
>>> The real thing is to have a consistent "snapshot" of behavior.
>>>
>>> I guess the one potentially tricky case to worry about is whether to
>>> deliver an MCE to the guest on restore. Consider the following
>>> scenario:
>>>
>>> - Page A is modified (and marked dirty)
>>> - VM paused for last iteration
>>> - Page breaks, is marked broken in the p2m
>>> - Save code sends page A
>>>
>>> In that case, the save code would send a "broken" page, and the
>>> restore code would mark a page as broken, and we *would* want to
>>> deliver an MCE on the far side. But suppose the last two steps
>>> were reversed:
>>>
>>> - Page A modified
>>> - VM paused for last iteration
>>> - Save code sends page A
>>> - Page breaks, marked broken in the p2m
>>>
>>> In that case, when the save code sends page A, it will send a good
>>> page; there's no need to mark it broken, or to send the guest an
>>> MCE.
>> I guess you'd want to err on the side of stopping using a good page,
>> as opposed to continuing to use a bad page? i.e. it's better to take a
>> spurious vMCE than to not take an actual one.
>
> While that's true, taking a spurious MCE means at the very least one
> less page available to the guest to use (for HVM guests that haven't
> ballooned down, at least), and the unnecessary loss of the data in
> that page.
>
> The problem I guess is that the save code at the moment has no way of
> distinguishing the following cases:
> 1. Marked broken after the last time I sent it, but before the VM was
>    paused; but the page hasn't been written to
> 2. Marked broken after the VM was paused; page hadn't been written to
> 3. Marked broken after the VM was paused, but the page had been
>    written to
>
> In case 1, we definitely need to send a broken page; but the VM may
> have already received a vMCE. In case 2, we don't need to send a
> broken page or a vMCE, while in #3 we need to do both.

Yes, and for case 3, strictly speaking there is still a chance not to
send the broken page to the target (and the target would happily run on
'good' pfn_type and a 'good' page), but it is too complicated to
distinguish.

> On the other hand, the whole situation is hopefully rare enough that
> maybe we can just do the simple correct thing, even if it's a tiny bit
> sub-optimal. In that case, assuming that spurious vMCEs aren't a
> problem (e.g., #1), I think we basically just need to see if the last
> iteration contains a broken page, and if so, send the guest a vMCE on
> resume.
>
> Thoughts?

Yes, that's exactly what the V3 patch did.

Thanks,
Jinsong
George Dunlap
2012-Nov-20 17:23 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 20/11/12 17:08, Liu, Jinsong wrote:
>> On the other hand, the whole situation is hopefully rare enough that
>> maybe we can just do the simple correct thing, even if it's a tiny bit
>> sub-optimal. In that case, assuming that spurious vMCEs aren't a
>> problem (e.g., #1), I think we basically just need to see if the last
>> iteration contains a broken page, and if so, send the guest a vMCE on
>> resume.
>>
>> Thoughts?
>
> Yes, that's exactly V3 patch did.

Right -- but you did it by adding an extra couple of loops, which made
us wonder if there was a more efficient way to do the same thing. Now
that we know *what* needs to be done, we're in a better position to
think about other options. Let me take a look...

-George
George Dunlap
2012-Nov-20 17:48 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com> wrote:

> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */
> +            if ( last_iter )
> +            {
> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +
> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +
> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }

I guess I still don't see the point of this loop. The pages this loop will catch are pages in this scenario:
- Page A marked dirty
- Last iteration begins; vm paused
- Page A sent
- Page A marked broken
- Last iteration ends

But in this case, it doesn't really matter that page A is clean on the receiving side; it's already got a clean working copy with the latest data, right?

>         } /* end of this while loop for this iteration */
>
>       skip:
> @@ -1550,23 +1592,22 @@
>                     PERROR("Error when writing to state file (tsc)");
>                     goto out;
>                 }
> +            }
> +        }
> +        else
> +            last_iter = 1;
>
> +        if ( xc_shadow_control(xch, dom,
> +                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
> +                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
> +        {
> +            PERROR("Error flushing shadow PT");
> +            goto out;
> +        }
>
> -        }
> +        sent_last_iter = sent_this_iter;
>
> -        if ( xc_shadow_control(xch, dom,
> -                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
> -                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
> -        {
> -            PERROR("Error flushing shadow PT");
> -            goto out;
> -        }
> -
> -        sent_last_iter = sent_this_iter;
> -
> -        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
> -
> -    }
> +    print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);

And what's all this stuff about? Why force a second sweep when we're saving to disk?

-George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Liu, Jinsong
2012-Nov-20 17:49 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:> On 20/11/12 17:08, Liu, Jinsong wrote: >>> On the other hand, the whole situation is hopefully rare enough that >>> maybe we can just do the simple correct thing, even if it's a tiny >>> bit sub-optimal. In that case, assuming that spurious vMCEs aren't >>> a problem (e.g., #1), I think we basically just need to see if the >>> last iteration contains a broken page, and if so, send the guest a >>> vMCE on resume. >>> >>> Thoughts? >> >> Yes, that's exactly what the V3 patch does. > > Right -- but you did it by adding an extra couple of loops, which made > us wonder if there was a more efficient way to do the same thing. Now > that we know *what* needs to be done, we're in a better position to > think about other options. Let me take a look... > > -George

Wow, thanks!
Jinsong
Liu, Jinsong
2012-Nov-20 18:13 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 1:48 AM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

+            /*
+             * if vMCE occur at last iter, do one more iter so that it get
+             * chance to transfer broken page's pfn_type and pfn number to
+             * target and then take appropriate action
+             */
+            if ( last_iter )
+            {
+                for ( j = 0; j < batch; j++ )
+                {
+                    if ( hvm )
+                        pfn_type[j] = pfn_batch[j];
+                    else
+                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
+                }
+
+                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
+                {
+                    PERROR("get_pfn_type_batch failed");
+                    goto out;
+                }
+
+                broken_page_num2 = 0;
+                for ( j = 0; j < batch; j++ )
+                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                        broken_page_num2++;
+
+                if ( broken_page_num1 < broken_page_num2 )
+                    last_iter = 0;
+            }

I guess I still don't see the point of this loop. The pages this loop will catch are pages in this scenario:
- Page A marked dirty
- Last iteration begins; vm paused
- Page A sent
- Page A marked broken
- Last iteration ends

[Jinsong] No, at the last iter, there are 4 points:
1. start last iter
2. get and transfer pfn_type to target
3. copy page to target
4. end last iter
This code monitors whether a vMCE occurred between point 2 and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2).

But in this case, it doesn't really matter that page A is clean on the receiving side; it's already got a clean working copy with the latest data, right?

        } /* end of this while loop for this iteration */

      skip:
@@ -1550,23 +1592,22 @@
                    PERROR("Error when writing to state file (tsc)");
                    goto out;
                }
+            }
+        }
+        else
+            last_iter = 1;

+        if ( xc_shadow_control(xch, dom,
+                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
+                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
+        {
+            PERROR("Error flushing shadow PT");
+            goto out;
+        }

-        }
+        sent_last_iter = sent_this_iter;

-        if ( xc_shadow_control(xch, dom,
-                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
-                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
-        {
-            PERROR("Error flushing shadow PT");
-            goto out;
-        }
-
-        sent_last_iter = sent_this_iter;
-
-        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
-
-    }
+    print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);

And what's all this stuff about? Why force a second sweep when we're saving to disk?

[Jinsong] If not, saving to disk would loop endlessly, since it never gets a chance to set last_iter = 1.
Ian Jackson
2012-Nov-20 18:21 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> No, at the last iter, there are 4 points: > 1. start last iter > 2. get and transfer pfn_type to target > 3. copy page to target > 4. end last iter > > this code monitors whether a vMCE occurred between point 2 and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2)

The migration code is rather tangled, but perhaps it would be best to check for an MCE after step 3 for each page, and keep a separate list of the pages which broke in that pass?

Ian.
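Ian's "separate list" idea might be sketched roughly as follows. All names here are invented for illustration (the real save loop lives in tools/libxc/xc_domain_save.c, and `PFINFO_BROKEN` stands in for `XEN_DOMCTL_PFINFO_BROKEN`); this is a sketch of the approach, not the actual libxc code:

```c
#include <assert.h>
#include <stddef.h>

#define PFINFO_BROKEN 0x1UL   /* stand-in for XEN_DOMCTL_PFINFO_BROKEN */

/* After the pages of a batch have been copied (step 3), re-read their
 * types and record only the pfns that turned BROKEN during the pass;
 * just those pfns would then need their pfn_type resent to the target. */
static size_t collect_newly_broken(const unsigned long *pfn_batch,
                                   const unsigned long *type_after_send,
                                   size_t batch,
                                   unsigned long *broken_list)
{
    size_t n = 0;

    for ( size_t j = 0; j < batch; j++ )
        if ( type_after_send[j] == PFINFO_BROKEN )
            broken_list[n++] = pfn_batch[j];

    return n;   /* number of pfns whose BROKEN type still needs sending */
}
```

The trade-off discussed later in the thread is that populating such a list still costs a loop over the batch, so it is not obviously cheaper than the patch's counter comparison.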
Liu, Jinsong
2012-Nov-20 18:39 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > broken page with regard to migration"): >> No, at the last iter, there are 4 points: >> 1. start last iter >> 2. get and transfer pfn_type to target >> 3. copy page to target >> 4. end last iter >> >> this code monitors whether a vMCE occurred between point 2 >> and point 4 (we don't care about a vMCE that occurs at the last iter but before point 2) > > The migration code is rather tangled but perhaps it would be best to > check for an MCE after step 3 for each page, and keep a separate list > of the pages which broke in that pass? > > Ian.

It does check for an MCE after point 3 for each page, but what's the advantage of keeping a separate list?

Thanks,
Jinsong
Ian Jackson
2012-Nov-20 18:42 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> Ian Jackson wrote: > > Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > > broken page with regard to migration"): > >> No, at the last iter, there are 4 points: > >> 1. start last iter > >> 2. get and transfer pfn_type to target > >> 3. copy page to target > >> 4. end last iter...> It does check for an MCE after point 3 for each page, but what's the > advantage of keeping a separate list?

It avoids yet another loop over all the pages. Unless I have misunderstood. Which I may have, because: if it checks for an MCE after point 3, then surely that is sufficient? We don't need to worry about MCEs after that check.

Ian.
Liu, Jinsong
2012-Nov-20 18:54 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Well, let me elaborate more on my thinking about a broken page occurring at the last iter:

Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case:
1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page;
2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter)
    2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page;
    2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly, and the target will set the p2m as broken;
    2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case;

Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life):
- we don't distinguish whether the page is in the dirty bitmap of the last iter;
- we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall;
- if a vMCE occurs at the last iter, we do 1~2 more iters.

This is what the V3 patch does now. Thoughts?

Thanks,
Jinsong
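The simple rule described above reduces to a comparison of two counters; a minimal model follows, with invented names standing in for the libxc code (`broken_before` is the count of BROKEN pages seen when the batch's pfn_types were sent, `broken_after` the count after the page copies):

```c
#include <assert.h>

/* Minimal model of the V3 decision: if any page broke between sending
 * the pfn_types and finishing the page copies, its BROKEN type never
 * reached the target, so clear last_iter and go around again.
 * Illustrative only, not the real libxc API. */
static int another_iter_needed(unsigned int broken_before,
                               unsigned int broken_after)
{
    return broken_before < broken_after;
}
```

Note the comparison is strict: a page that was already counted broken before sending had its pfn_type transferred normally, so it does not force another pass.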
Liu, Jinsong
2012-Nov-20 19:07 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > broken page with regard to migration"): >> Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> No, at the last iter, there are 4 points: >>>> 1. start last iter >>>> 2. get and transfer pfn_type to target >>>> 3. copy page to target >>>> 4. end last iter > ... >> It does check for an MCE after point 3 for each page, but what's the >> advantage of keeping a separate list? > > It avoids yet another loop over all the pages. Unless I have > misunderstood. Which I may have, because: if it checks for an MCE > after point 3 then surely that is sufficient? We don't need to worry > about MCEs after that check. > > Ian.

In fact it will only do 1~2 more iters (if a vMCE occurs at the last iter), with exactly 1 more page (the broken page) -- the VM is frozen in this case, right? As for the case where a vMCE occurs after our check, the target will very happily (and luckily) run the VM w/o the broken page.

Thanks,
Jinsong
Ian Campbell
2012-Nov-21 11:07 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote:> Well, let me elaborate more on my thinking about a broken page occurring at the last iter: > > Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case: > 1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page; > 2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter) > 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page; > 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly; > 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case; > > Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life): > - we don't distinguish whether the page is in the dirty bitmap of the last iter; > - we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall; > - if a vMCE occurs at the last iter, we do 1~2 more iters.

Can't a page break on each "last" iter? I.e., you might actually go around as many times as there are pages in the last batch?

Ian.
George Dunlap
2012-Nov-21 11:18 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 11:07, Ian Campbell wrote:> On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote: >> Well, let me elaborate more on my thinking about a broken page occurring at the last iter: >> >> Theoretically we can separate 'broken page at last iter' into several sub-cases, and optimize case by case: >> 1. if the page (which will break at the last iter) is not in the dirty bitmap (of the last iter) --> doing nothing is OK; the target will happily run w/o the broken page; >> 2. if the page (which will break at the last iter) is in the dirty bitmap (of the last iter) >> 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing is OK; the target will happily run w/o the broken page; >> 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; the V2 patch already handles this case correctly; >> 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy to the target --> we need to handle this case; >> >> Practically, considering that it's rare enough, and to avoid code complication, we handle it in a simple way (not fully optimized, but enough for real life): >> - we don't distinguish whether the page is in the dirty bitmap of the last iter; >> - we prefer not to add a new hypercall for this corner case; instead we re-use an existing hypercall; >> - if a vMCE occurs at the last iter, we do 1~2 more iters. > Can't a page break on each "last" iter? I.e., you might actually go > around as many times as there are pages in the last batch?

That would happen only if the pages break one by one, at just the right time -- which seems pretty unlikely. :-)

-George
George Dunlap
2012-Nov-21 11:34 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 20/11/12 18:42, Ian Jackson wrote:> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): >> Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> No, at the last iter, there are 4 points: >>>> 1. start last iter >>>> 2. get and transfer pfn_type to target >>>> 3. copy page to target >>>> 4. end last iter > ... >> It does check for an MCE after point 3 for each page, but what's the >> advantage of keeping a separate list? > It avoids yet another loop over all the pages. Unless I have > misunderstood. Which I may have, because: if it checks for an MCE after > point 3 then surely that is sufficient? We don't need to worry about > MCEs after that check.

It's sufficient, but wouldn't each check require a separate hypercall? That would surely be slower than just a single hypercall and a loop (which is what Jinsong's patch does).

We don't actually need a list -- I think we just need to know, "Have any pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" If so, we do another full iteration.

Since marking a page broken will also mark it dirty, I suppose what we could do is, on the last iteration, clear the dirty bitmap after getting the list of pages but before copying them; and then check the bit in the bitmap corresponding to the pfn after copying it.

But on the whole, is it really that much *faster* to do it that way? Either way you're still adding a single hypercall to the whole thing, and one iteration of a loop for each page; but in Jinsong's patch, the loop is done all at once, so presumably the compiler / processor will be able to make better use of loop unrolling / registers / the pipeline / branch prediction / caches &c.

The only way to avoid the loop would be to add an extra "have any pages been marked broken / dirty" bit somewhere, which is probably more trouble than it's worth.

-George
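The bitmap-based alternative sketched above (clear the dirty bitmap after reading the batch, then test each pfn's bit after copying it, relying on a broken page also being marked dirty) could look roughly like this. The byte-per-8-pfns bitmap layout is an assumption for illustration, not taken from libxc:

```c
#include <assert.h>

/* Hypothetical per-pfn check: on the last iteration the dirty bitmap
 * is cleared after the batch is read, so a set bit after the copy
 * means the page was dirtied -- and a broken page is also marked
 * dirty -- between the read and the send. */
static int pfn_dirtied_mid_send(const unsigned char *dirty_bitmap,
                                unsigned long pfn)
{
    return (dirty_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
```

As the email notes, this still amounts to one check per page, so it is not obviously cheaper than the single re-read of the p2m table plus one loop that the patch already does.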
Ian Jackson
2012-Nov-21 11:55 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap writes ("Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"):> But on the whole, is it really that much *faster* to do it that way? > Either way you're still adding a single hypercall to the whole thing, > and one iteration of a loop for each page; but in Jinsong's patch, the > loop is done all at once, so presumably the compiler / processor will be > able to make better use of loop unrolling / registers / the pipeline > / branch prediction / caches &c. > > The only way to avoid the loop would be to add an extra "have any pages > been marked broken / dirty" bit somewhere, which is probably more > trouble than it's worth.

OK, that makes sense.

Ian.
Liu, Jinsong
2012-Nov-21 12:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:> On 21/11/12 11:07, Ian Campbell wrote: >> On Tue, 2012-11-20 at 18:54 +0000, Liu, Jinsong wrote: >>> Well, let me elaborate more on my thinking about a broken page occurring at the >>> last iter: >>> >>> Theoretically we can separate 'broken page at last iter' into >>> several sub-cases, and optimize case by case: >>> 1. if the page (which will break at the last iter) is not in the dirty >>> bitmap (of the last iter) --> doing nothing is OK; the target will happily run >>> w/o the broken page; >>> 2. if the page (which will break at the last iter) is in the dirty bitmap >>> (of the last iter) >>> 2.1 if, at the last iter, the vMCE occurs after the page copy --> doing nothing >>> is OK; the target will happily run w/o the broken page; >>> 2.2 if the vMCE occurs before the pfn_type transfer --> doing nothing is OK; >>> the V2 patch already handles this case correctly; >>> 2.3 if the vMCE occurs after the pfn_type transfer and before the page copy >>> to the target --> we need to handle this case; >>> >>> Practically, considering that it's rare enough, and to avoid code complication, we >>> handle it in a simple way (not fully optimized, but enough for real >>> life): >>> - we don't distinguish whether the page is in the dirty bitmap of the last iter; >>> - we prefer not to add a new hypercall for this corner case, >>> instead we re-use an existing hypercall; >>> - if a vMCE occurs at the last iter, we do 1~2 more iters. >> Can't a page break on each "last" iter? I.e., you might actually go >> around as many times as there are pages in the last batch? > > That would happen only if the pages break one by one, at just the > right time -- which seems pretty unlikely. :-) > > -George

And even in that case it is still OK w/ V3, which keeps iterating until the last broken page.

Thanks,
Jinsong
Ian Campbell
2012-Nov-21 12:11 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:> On 20/11/12 18:42, Ian Jackson wrote: > > Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): > >> Ian Jackson wrote: > >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle > >>> broken page with regard to migration"): > >>>> No, at the last iter, there are 4 points: > >>>> 1. start last iter > >>>> 2. get and transfer pfn_type to target > >>>> 3. copy page to target > >>>> 4. end last iter > > ... > >> It does check for an MCE after point 3 for each page, but what's the > >> advantage of keeping a separate list? > > It avoids yet another loop over all the pages. Unless I have > > misunderstood. Which I may have, because: if it checks for an MCE after > > point 3 then surely that is sufficient? We don't need to worry about > > MCEs after that check. > > It's sufficient, but wouldn't each check require a separate hypercall? > That would surely be slower than just a single hypercall and a loop > (which is what Jinsong's patch does). > > We don't actually need a list -- I think we just need to know, "Have any > pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" > If so, we do another full iteration.

If a page fails between 2. and 3. above, then what happens at point 3? I presume we can't map and send the page (since it is broken); do we get some sort of failure to map?

What happens if the failure occurs during stage 3, i.e. while the page is mapped and we are reading from it?

Ian.
George Dunlap
2012-Nov-21 12:15 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 12:11, Ian Campbell wrote:> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote: >> On 20/11/12 18:42, Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration"): >>>> Ian Jackson wrote: >>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>>>> broken page with regard to migration"): >>>>>> No, at the last iter, there are 4 points: >>>>>> 1. start last iter >>>>>> 2. get and transfer pfn_type to target >>>>>> 3. copy page to target >>>>>> 4. end last iter >>> ... >>>> It does check for an MCE after point 3 for each page, but what's the >>>> advantage of keeping a separate list? >>> It avoids yet another loop over all the pages. Unless I have >>> misunderstood. Which I may have, because: if it checks for an MCE after >>> point 3 then surely that is sufficient? We don't need to worry about >>> MCEs after that check. >> It's sufficient, but wouldn't each check require a separate hypercall? >> That would surely be slower than just a single hypercall and a loop >> (which is what Jinsong's patch does). >> >> We don't actually need a list -- I think we just need to know, "Have any >> pages broken since we read the p2m table (with xc_get_pfn_type_batch())?" >> If so, we do another full iteration. > If a page fails between 2. and 3. above, then what happens at point 3? I > presume we can't map and send the page (since it is broken); do we get > some sort of failure to map?

At that point the pages are already mapped. The order is:
1. From the dirty bitmap / whatever, get a batch of pfns to map
2. Map them with foreign_batch
3. Read the p2m table
4. Send a list of PFNs (this is where it will list the broken ones)
5. Send all the data from those pfns

So JS is adding:
6. Read the p2m table again to see if anything has broken between 3 and the end of 5

> What happens if the failure occurs during stage 3, i.e. while the page > is mapped and we are reading from it?

I was rather wondering the same thing. :-)

-George
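The window that step 6 closes -- pages that were not broken at step 3 but are broken by the end of step 5 -- can also be expressed per page. A sketch with invented names (`PFINFO_BROKEN` stands in for `XEN_DOMCTL_PFINFO_BROKEN`; this is not the actual libxc code, which compares aggregate counts instead):

```c
#include <assert.h>
#include <stddef.h>

#define PFINFO_BROKEN 0x1UL   /* stand-in for XEN_DOMCTL_PFINFO_BROKEN */

/* Count pages that broke between the p2m read (step 3) and the end of
 * the data send (step 5): exactly the pages whose BROKEN type was
 * never transmitted to the target.  Illustrative only. */
static size_t broke_in_window(const unsigned long *type_at_step3,
                              const unsigned long *type_after_step5,
                              size_t batch)
{
    size_t n = 0;

    for ( size_t j = 0; j < batch; j++ )
        if ( type_at_step3[j] != PFINFO_BROKEN &&
             type_after_step5[j] == PFINFO_BROKEN )
            n++;

    return n;   /* non-zero => do another iteration */
}
```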
George Dunlap
2012-Nov-21 12:17 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
OK, so now that we seem to have agreed on the overall approach, down to "brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com> wrote:> Generally, there are 2 cases: > 1. broken page occurs before migration > 2. broken page occurs during migration >

I think we probably need to include the "last iteration" subcases here as well -- although, probably it would be better to put them in the comments. Also make sure to note that now even suspend may do a second iteration if pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the patch does, and then at the bottom have a summary of the changes; for instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to handle > case 2: > V1 patch aborted migration for the case 'broken page occurs during > migration'; > V2 patch marked broken page to dirty bitmap then got handled next round > and no abort; > > This is V3 patch, adding handle for vMCE occur at last iteration of > migration: > At the sender > For case 1, the broken page will be mapped but not copied to target > (otherwise it may trigger more serious error, say, SRAR error). > While its pfn_type and pfn number will be transferred to target > so that target take appropriate action. > > For case 2, mce handler marks the broken page to dirty bitmap, so that > at copypages stage of migration, its pfn_type and pfn number will be > transferred to target and then take appropriate action. > > At the target > When migration target populates pages for guest. As for the case > of broken page wrt migration, we prefer to keep the type of the page, > for the sake of seamless migration. > > At target it will set p2m as p2m_ram_broken for broken page. 
> Guest MCE may have good chance to handle its broken page, while if guest access > the broken page again it will kill itself as expected. > > If vMCE occur at the last memory copy iteration of migration, we do one > more > iteration so that the broken page's pfn_type and pfn number has chance to > be transferred to target. > > Suggested-by: George Dunlap <george.dunlap@eu.citrix.com> > Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > - for tools part of the patch > Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com> >

You should remove these until you get another ack (since the patch has changed materially).

> > - for hypervisor part of the patch > Acked-by: Jan Beulich <JBeulich@suse.com> >

I think if you haven't changed the HV part since his ack, you can probably leave this one.

> > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c > --- a/tools/libxc/xc_domain.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain.c Sat Nov 17 09:46:05 2012 +0800 > @@ -283,6 +283,22 @@ > return ret; > } > > +/* set broken page p2m */ > +int xc_set_broken_page_p2m(xc_interface *xch, > + uint32_t domid, > + unsigned long pfn) > +{ > + int ret; > + DECLARE_DOMCTL; > + > + domctl.cmd = XEN_DOMCTL_set_broken_page_p2m; > + domctl.domain = (domid_t)domid; > + domctl.u.set_broken_page_p2m.pfn = pfn; > + ret = do_domctl(xch, &domctl); > + > + return ret ? 
-1 : 0; > +} > + > /* get info from hvm guest for save */ > int xc_domain_hvm_getcontext(xc_interface *xch, > uint32_t domid, > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c > --- a/tools/libxc/xc_domain_restore.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain_restore.c Sat Nov 17 09:46:05 2012 +0800 > @@ -1023,9 +1023,15 @@ > > countpages = count; > for (i = oldcount; i < buf->nr_pages; ++i) > - if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) => XEN_DOMCTL_PFINFO_XTAB > - ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) => XEN_DOMCTL_PFINFO_XALLOC) > + { > + unsigned long pagetype; > + > + pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK; > + if ( pagetype == XEN_DOMCTL_PFINFO_XTAB || > + pagetype == XEN_DOMCTL_PFINFO_BROKEN || > + pagetype == XEN_DOMCTL_PFINFO_XALLOC ) > --countpages; > + } > > if (!countpages) > return count; > @@ -1267,6 +1273,17 @@ > /* a bogus/unmapped/allocate-only page: skip it */ > continue; > > + if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN ) > + { > + if ( xc_set_broken_page_p2m(xch, dom, pfn) ) > + { > + ERROR("Set p2m for broken page failed, " > + "dom=%d, pfn=%lx\n", dom, pfn); > + goto err_mapped; > + } > + continue; > + } > + > if (pfn_err[i]) > { > ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx > p2m_mfn %lx", > diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c > --- a/tools/libxc/xc_domain_save.c Tue Nov 13 11:19:17 2012 +0000 > +++ b/tools/libxc/xc_domain_save.c Sat Nov 17 09:46:05 2012 +0800 > @@ -1118,7 +1118,7 @@ > /* Now write out each data page, canonicalising page tables as we > go... */ > for ( ; ; ) > { > - unsigned int N, batch, run; > + unsigned int N, batch, run, broken_page_num1, broken_page_num2; >You need more descriptive variables here. 
Maybe "broken_before_send" and "broken_after_send"?> char reportbuf[80]; > > snprintf(reportbuf, sizeof(reportbuf), > @@ -1270,6 +1270,7 @@ > goto out; > } > > + broken_page_num1 = 0; > for ( run = j = 0; j < batch; j++ ) > { > unsigned long gmfn = pfn_batch[j]; > @@ -1277,6 +1278,14 @@ > if ( !hvm ) > gmfn = pfn_to_mfn(gmfn); > > + if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN ) > + { > + pfn_type[j] |= pfn_batch[j]; > + ++broken_page_num1; > + ++run; > + continue; > + } > + > if ( pfn_err[j] ) > { > if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB ) > @@ -1371,8 +1380,12 @@ > } > } > > - /* skip pages that aren't present or are alloc-only */ > + /* > + * skip pages that aren't present, > + * or are broken, or are alloc-only > + */ > if ( pagetype == XEN_DOMCTL_PFINFO_XTAB > + || pagetype == XEN_DOMCTL_PFINFO_BROKEN > || pagetype == XEN_DOMCTL_PFINFO_XALLOC ) > continue; > > @@ -1484,6 +1497,35 @@ > > munmap(region_base, batch*PAGE_SIZE); > > + /* > + * if vMCE occur at last iter, do one more iter so that it get > + * chance to transfer broken page's pfn_type and pfn number to > + * target and then take appropriate action > + */ >

Maybe, "Pages may have broken between mapping them and sending them. 
Count the number of broken pages after sending, and if there are more than before sending, do another iteration to make sure the pages are marked broken on the receiving side"?

> + if ( last_iter ) > + { >

/* Read the p2m table again */

> + for ( j = 0; j < batch; j++ ) > + { > + if ( hvm ) > + pfn_type[j] = pfn_batch[j]; > + else > + pfn_type[j] = pfn_to_mfn(pfn_batch[j]); > + } > + > + if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) ) > + { > + PERROR("get_pfn_type_batch failed"); > + goto out; > + } > + >

/* Count the number of broken pages */

> + broken_page_num2 = 0; > + for ( j = 0; j < batch; j++ ) > + if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN ) > + broken_page_num2++; > + >

/* If there are more than before sending, do one more iteration */

> + if ( broken_page_num1 < broken_page_num2 ) > + last_iter = 0; > + } > } /* end of this while loop for this iteration */ >

I think that's all I have.

-George
Liu, Jinsong
2012-Nov-21 13:26 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote: >> On 20/11/12 18:42, Ian Jackson wrote: >>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>> broken page with regard to migration"): >>>> Ian Jackson wrote: >>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle >>>>> broken page with regard to migration"): >>>>>> No, at the last iter, there are 4 points: >>>>>> 1. start last iter >>>>>> 2. get and transfer pfn_type to target >>>>>> 3. copy page to target >>>>>> 4. end last iter >>> ... >>>> It does check for an MCE after point 3 for each page, but what's the >>>> advantage of keeping a separate list? >>> It avoids yet another loop over all the pages. Unless I have >>> misunderstood. Which I may have, because: if it checks for an MCE >>> after point 3 then surely that is sufficient? We don't need to worry >>> about MCEs after that check. >> >> It's sufficient, but wouldn't each check require a separate >> hypercall? That would surely be slower than just a single hypercall >> and a loop (which is what Jinsong's patch does). >> >> We don't actually need a list -- I think we just need to know, "Have >> any pages broken since we read the p2m table >> (with xc_get_pfn_type_batch())?" If so, we do another full iteration. > > If a page fails between 2. and 3. above, then what happens at point 3? > I presume we can't map and send the page (since it is broken); do we > get some sort of failure to map? > > What happens if the failure occurs during stage 3, i.e. while the page > is mapped and we are reading from it? > > Ian.

Reading a broken page generates a more serious error (say, an SRAR error). I don't think the guest has a good chance of surviving in this case --> most probably it kills itself, and of course we then don't need to care about migration. 
However, if the guest can luckily survive (say, it completes copying the broken page to the target), it's OK to continue --> its broken pfn_type will be transferred to the target in the next iter, so the guest will kill itself if it accesses the page then.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-21 13:31 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Yes, I will update and send V4 later.

Thanks,
Jinsong

________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 8:17 PM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

OK, so now that we seem to have agreed on the overall approach, down to
"brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

> Generally, there are 2 cases:
> 1. broken page occurs before migration
> 2. broken page occurs during migration

I think we probably need to include the "last iteration" subcases here as
well -- although, probably it would be better to put them in the comments.
Also make sure to note that now even suspend may do a second iteration if
pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the
patch does, and then at the bottom have a summary of the changes; for
instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to
> handle case 2:
> V1 patch aborted migration for the case 'broken page occurs during
> migration';
> V2 patch marked broken page to dirty bitmap then got handled next round
> and no abort;
> This is V3 patch, adding handle for vMCE occur at last iteration of
> migration:
>
> At the sender
> For case 1, the broken page will be mapped but not copied to target
> (otherwise it may trigger more serious error, say, SRAR error). While
> its pfn_type and pfn number will be transferred to target so that
> target take appropriate action.
>
> For case 2, mce handler marks the broken page to dirty bitmap, so that
> at copypages stage of migration, its pfn_type and pfn number will be
> transferred to target and then take appropriate action.
>
> At the target
> When migration target populates pages for guest. As for the case of
> broken page wrt migration, we prefer to keep the type of the page, for
> the sake of seamless migration. At target it will set p2m as
> p2m_ram_broken for broken page. Guest MCE may have good chance to
> handle its broken page, while if guest access the broken page again it
> will kill itself as expected.
>
> If vMCE occur at the last memory copy iteration of migration, we do one
> more iteration so that the broken page's pfn_type and pfn number has
> chance to be transferred to target.
>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> - for tools part of the patch
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com<mailto:Ian.Jackson@eu.citrix.com>>

You should remove these until you get another ack (since the patch has
changed materially)

> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>

I think if you haven't changed the HV part since his ack, you can probably
leave this one.

> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain.c
> --- a/tools/libxc/xc_domain.c   Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain.c   Sat Nov 17 09:46:05 2012 +0800
> @@ -283,6 +283,22 @@
>      return ret;
>  }
>
> +/* set broken page p2m */
> +int xc_set_broken_page_p2m(xc_interface *xch,
> +                           uint32_t domid,
> +                           unsigned long pfn)
> +{
> +    int ret;
> +    DECLARE_DOMCTL;
> +
> +    domctl.cmd = XEN_DOMCTL_set_broken_page_p2m;
> +    domctl.domain = (domid_t)domid;
> +    domctl.u.set_broken_page_p2m.pfn = pfn;
> +    ret = do_domctl(xch, &domctl);
> +
> +    return ret ? -1 : 0;
> +}
> +
>  /* get info from hvm guest for save */
>  int xc_domain_hvm_getcontext(xc_interface *xch,
>                               uint32_t domid,
> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_restore.c
> --- a/tools/libxc/xc_domain_restore.c   Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain_restore.c   Sat Nov 17 09:46:05 2012 +0800
> @@ -1023,9 +1023,15 @@
>      countpages = count;
>      for (i = oldcount; i < buf->nr_pages; ++i)
> -        if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XTAB
> -            ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XALLOC)
> +    {
> +        unsigned long pagetype;
> +
> +        pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
> +        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB ||
> +             pagetype == XEN_DOMCTL_PFINFO_BROKEN ||
> +             pagetype == XEN_DOMCTL_PFINFO_XALLOC )
>              --countpages;
> +    }
>
>      if (!countpages)
>          return count;
> @@ -1267,6 +1273,17 @@
>              /* a bogus/unmapped/allocate-only page: skip it */
>              continue;
>
> +        if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
> +        {
> +            if ( xc_set_broken_page_p2m(xch, dom, pfn) )
> +            {
> +                ERROR("Set p2m for broken page failed, "
> +                      "dom=%d, pfn=%lx\n", dom, pfn);
> +                goto err_mapped;
> +            }
> +            continue;
> +        }
> +
>          if (pfn_err[i])
>          {
>              ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx p2m_mfn %lx",
> diff -r 8b93ac0c93f3 -r 446f6b9bfc89 tools/libxc/xc_domain_save.c
> --- a/tools/libxc/xc_domain_save.c      Tue Nov 13 11:19:17 2012 +0000
> +++ b/tools/libxc/xc_domain_save.c      Sat Nov 17 09:46:05 2012 +0800
> @@ -1118,7 +1118,7 @@
>      /* Now write out each data page, canonicalising page tables as we go... */
>      for ( ; ; )
>      {
> -        unsigned int N, batch, run;
> +        unsigned int N, batch, run, broken_page_num1, broken_page_num2;

You need more descriptive variables here.  Maybe "broken_before_send" and
"broken_after_send"?

>          char reportbuf[80];
>
>          snprintf(reportbuf, sizeof(reportbuf),
> @@ -1270,6 +1270,7 @@
>                  goto out;
>              }
>
> +            broken_page_num1 = 0;
>              for ( run = j = 0; j < batch; j++ )
>              {
>                  unsigned long gmfn = pfn_batch[j];
> @@ -1277,6 +1278,14 @@
>                  if ( !hvm )
>                      gmfn = pfn_to_mfn(gmfn);
>
> +                if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                {
> +                    pfn_type[j] |= pfn_batch[j];
> +                    ++broken_page_num1;
> +                    ++run;
> +                    continue;
> +                }
> +
>                  if ( pfn_err[j] )
>                  {
>                      if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
> @@ -1371,8 +1380,12 @@
>                  }
>              }
>
> -            /* skip pages that aren't present or are alloc-only */
> +            /*
> +             * skip pages that aren't present,
> +             * or are broken, or are alloc-only
> +             */
>              if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
> +                 || pagetype == XEN_DOMCTL_PFINFO_BROKEN
>                   || pagetype == XEN_DOMCTL_PFINFO_XALLOC )
>                  continue;
> @@ -1484,6 +1497,35 @@
>              munmap(region_base, batch*PAGE_SIZE);
>
> +            /*
> +             * if vMCE occur at last iter, do one more iter so that it get
> +             * chance to transfer broken page's pfn_type and pfn number to
> +             * target and then take appropriate action
> +             */

Maybe, "Pages may have broken between mapping them and sending them.
Count the number of broken pages after sending, and if there are more
than before sending, do another iteration to make sure the pages are
marked broken on the receiving side"?

> +            if ( last_iter )
> +            {

/* Read the p2m table again */

> +                for ( j = 0; j < batch; j++ )
> +                {
> +                    if ( hvm )
> +                        pfn_type[j] = pfn_batch[j];
> +                    else
> +                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
> +                }
> +
> +                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
> +                {
> +                    PERROR("get_pfn_type_batch failed");
> +                    goto out;
> +                }
> +

/* Count the number of broken pages */

> +                broken_page_num2 = 0;
> +                for ( j = 0; j < batch; j++ )
> +                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
> +                        broken_page_num2++;
> +

/* If there are more than before sending, do one more iteration */

> +                if ( broken_page_num1 < broken_page_num2 )
> +                    last_iter = 0;
> +            }
>          } /* end of this while loop for this iteration */

I think that's all I have.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Nov-21 13:37 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
>>> On 21.11.12 at 14:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Ian Campbell wrote:
>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>> broken page with regard to migration"):
>>>>> Ian Jackson wrote:
>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>>> broken page with regard to migration"):
>>>>>>> No, at last lter, there are 4 points:
>>>>>>> 1. start last iter
>>>>>>> 2. get and transfer pfn_type to target
>>>>>>> 3. copy page to target
>>>>>>> 4. end last iter
>>>> ...
>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>> advantage of keeping a separate list?
>>>> It avoids yet another loop over all the pages.  Unless I have
>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>> after point 3 then surely that is sufficient?  We don't need to
>>>> worry about mces after that check.
>>>
>>> It's sufficient, but wouldn't each check require a separate
>>> hypercall?  That would surely be slower than just a single hypercall
>>> and a loop (which is what Jinsong's patch does).
>>>
>>> We don't actually need a list -- I think we just need to know, "Have
>>> any pages broken between reading the p2m table (
>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>
>> If a page fails between 2. and 3. above then what happens at point 3?
>> I presume we can't map and send the page (since it is broken), do we
>> get some sort of failure to map?
>>
>> What happens if the failure occurs during stage 3, i.e. while the page
>> is mapped and we are reading from it?
>>
>> Ian.
>
> If read a broken page, it generates more serious error (say, SRAR
> error).  I don't think guest has good opportunity to survive under this
> case --> most probably it kill itself and of course we don't need care
> migration now.  However, if guest can luckly survive (say complete
> broken page copying to target), it's OK to continue --> its broken
> pfn_type will transfer to target next iter so guest will kill itself if
> access then.

I think you misread the question - it said "we", as in "the tools/
kernel/hypervisor" (at least that's how I'm reading it).  The MCE
would surface in host context in this case, and whether that's
fatal to the host depends on the precise properties of the event.

Jan
George Dunlap
2012-Nov-21 13:59 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
On 21/11/12 13:26, Liu, Jinsong wrote:
> Ian Campbell wrote:
>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>> broken page with regard to migration"):
>>>>> Ian Jackson wrote:
>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>>> broken page with regard to migration"):
>>>>>>> No, at last lter, there are 4 points:
>>>>>>> 1. start last iter
>>>>>>> 2. get and transfer pfn_type to target
>>>>>>> 3. copy page to target
>>>>>>> 4. end last iter
>>>> ...
>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>> advantage of keeping a separate list?
>>>> It avoids yet another loop over all the pages.  Unless I have
>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>> after point 3 then surely that is sufficient?  We don't need to
>>>> worry about mces after that check.
>>> It's sufficient, but wouldn't each check require a separate
>>> hypercall?  That would surely be slower than just a single hypercall
>>> and a loop (which is what Jinsong's patch does).
>>>
>>> We don't actually need a list -- I think we just need to know, "Have
>>> any pages broken between reading the p2m table (
>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>> If a page fails between 2. and 3. above then what happens at point 3?
>> I presume we can't map and send the page (since it is broken), do we
>> get some sort of failure to map?
>>
>> What happens if the failure occurs during stage 3, i.e. while the page
>> is mapped and we are reading from it?
>>
>> Ian.
> If read a broken page, it generates more serious error (say, SRAR
> error).  I don't think guest has good opportunity to survive under this
> case --> most probably it kill itself and of course we don't need care
> migration now.  However, if guest can luckly survive (say complete
> broken page copying to target), it's OK to continue --> its broken
> pfn_type will transfer to target next iter so guest will kill itself if
> access then.

But in this case, I'm asking what happens if the migration code reads the
page.  If reading the page in the migration code causes dom0 to crash,
then the whole "last iteration" stuff is fairly pointless. :-)

 -George
Liu, Jinsong
2012-Nov-22 11:23 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Jan Beulich wrote:
>>>> On 21.11.12 at 14:26, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Ian Campbell wrote:
>>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>> broken page with regard to migration"):
>>>>>> Ian Jackson wrote:
>>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE:
>>>>>>> handle broken page with regard to migration"):
>>>>>>>> No, at last lter, there are 4 points:
>>>>>>>> 1. start last iter
>>>>>>>> 2. get and transfer pfn_type to target
>>>>>>>> 3. copy page to target
>>>>>>>> 4. end last iter
>>>>> ...
>>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>>> advantage of keeping a separate list?
>>>>> It avoids yet another loop over all the pages.  Unless I have
>>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>>> after point 3 then surely that is sufficient?  We don't need to
>>>>> worry about mces after that check.
>>>>
>>>> It's sufficient, but wouldn't each check require a separate
>>>> hypercall?  That would surely be slower than just a single hypercall
>>>> and a loop (which is what Jinsong's patch does).
>>>>
>>>> We don't actually need a list -- I think we just need to know,
>>>> "Have any pages broken between reading the p2m table (
>>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>>
>>> If a page fails between 2. and 3. above then what happens at point
>>> 3?  I presume we can't map and send the page (since it is broken),
>>> do we get some sort of failure to map?
>>>
>>> What happens if the failure occurs during stage 3, i.e. while the
>>> page is mapped and we are reading from it?
>>>
>>> Ian.
>>
>> If read a broken page, it generates more serious error (say, SRAR
>> error).  I don't think guest has good opportunity to survive under
>> this case --> most probably it kill itself and of course we don't
>> need care migration now.  However, if guest can luckly survive (say
>> complete broken page copying to target), it's OK to continue --> its
>> broken pfn_type will transfer to target next iter so guest will kill
>> itself if access then.
>
> I think you misread the question - it said "we", as in "the tools/
> kernel/hypervisor" (at least that's how I'm reading it).  The MCE
> would surface in host context in this case, and whether that's
> fatal to the host depends on the precise properties of the event.
>
> Jan

Yes, depending on the error type, both the hypervisor and the guest may
crash.  As for the tools, I think they are OK as long as the hypervisor
is OK.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-22 11:44 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 21/11/12 13:26, Liu, Jinsong wrote:
>> Ian Campbell wrote:
>>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>> broken page with regard to migration"):
>>>>>> Ian Jackson wrote:
>>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE:
>>>>>>> handle broken page with regard to migration"):
>>>>>>>> No, at last lter, there are 4 points:
>>>>>>>> 1. start last iter
>>>>>>>> 2. get and transfer pfn_type to target
>>>>>>>> 3. copy page to target
>>>>>>>> 4. end last iter
>>>>> ...
>>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>>> advantage of keeping a separate list?
>>>>> It avoids yet another loop over all the pages.  Unless I have
>>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>>> after point 3 then surely that is sufficient?  We don't need to
>>>>> worry about mces after that check.
>>>> It's sufficient, but wouldn't each check require a separate
>>>> hypercall?  That would surely be slower than just a single hypercall
>>>> and a loop (which is what Jinsong's patch does).
>>>>
>>>> We don't actually need a list -- I think we just need to know,
>>>> "Have any pages broken between reading the p2m table (
>>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>> If a page fails between 2. and 3. above then what happens at point
>>> 3?  I presume we can't map and send the page (since it is broken),
>>> do we get some sort of failure to map?
>>>
>>> What happens if the failure occurs during stage 3, i.e. while the
>>> page is mapped and we are reading from it?
>>>
>>> Ian.
>> If read a broken page, it generates more serious error (say, SRAR
>> error).  I don't think guest has good opportunity to survive under
>> this case --> most probably it kill itself and of course we don't
>> need care migration now.  However, if guest can luckly survive (say
>> complete broken page copying to target), it's OK to continue --> its
>> broken pfn_type will transfer to target next iter so guest will kill
>> itself if access then.
>
> But in this case, I'm asking what happens if the migration code reads
> the page.  If reading the page in the migration code causes dom0 to
> crash, then the whole "last iteration" stuff is fairly pointless. :-)
>
> -George

If the migration code reads the page, it will trigger a more serious
error and may kill the hypervisor or the guest.  But unfortunately we
cannot prevent it, since we cannot predict whether a vmce will occur
*during* migration.

What we can do is our best to handle it:
1. for vmce occur before migration, we can safely handle it;
2. for vmce occur during migration, we can only do our best:
   2.1 if fortunately vmce occurs at some area (say, before point 2), we
       can successfully prevent page reading;
   2.2 if vmce occurs after point 2, it will read the page; under such
       case
       * if guest/hypervisor can survive, it's OK to transfer the broken
         pfn_type to target so that there is no further harm to target;
       * if guest/hypervisor crashes, we definitely needn't care about
         migration any more;

The key point is, before migration we have no way to predict it, and we
cannot forbid migration for fear that it potentially crashes the system.

Thanks,
Jinsong
Liu, Jinsong
2012-Nov-22 12:37 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
________________________________
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Wednesday, November 21, 2012 8:17 PM
To: Liu, Jinsong
Cc: Ian Campbell; Ian Jackson; Jan Beulich; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

OK, so now that we seem to have agreed on the overall approach, down to
"brass tacks", as we say in English:

On Sat, Nov 17, 2012 at 2:04 AM, Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>> wrote:

> Generally, there are 2 cases:
> 1. broken page occurs before migration
> 2. broken page occurs during migration

I think we probably need to include the "last iteration" subcases here as
well -- although, probably it would be better to put them in the comments.
Also make sure to note that now even suspend may do a second iteration if
pages are broken during the suspend operation.

Also, what you should do is have the main description always say what the
patch does, and then at the bottom have a summary of the changes; for
instance:

v4:
- Adjust variables and patch description based on feedback
v3:
- Handle pages broken on the last iteration
&c

> We had submitted 2 versions of patches. Their difference is how to
> handle case 2:
> V1 patch aborted migration for the case 'broken page occurs during
> migration';
> V2 patch marked broken page to dirty bitmap then got handled next round
> and no abort;
> This is V3 patch, adding handle for vMCE occur at last iteration of
> migration:
>
> At the sender
> For case 1, the broken page will be mapped but not copied to target
> (otherwise it may trigger more serious error, say, SRAR error). While
> its pfn_type and pfn number will be transferred to target so that
> target take appropriate action.
>
> For case 2, mce handler marks the broken page to dirty bitmap, so that
> at copypages stage of migration, its pfn_type and pfn number will be
> transferred to target and then take appropriate action.
>
> At the target
> When migration target populates pages for guest. As for the case of
> broken page wrt migration, we prefer to keep the type of the page, for
> the sake of seamless migration. At target it will set p2m as
> p2m_ram_broken for broken page. Guest MCE may have good chance to
> handle its broken page, while if guest access the broken page again it
> will kill itself as expected.
>
> If vMCE occur at the last memory copy iteration of migration, we do one
> more iteration so that the broken page's pfn_type and pfn number has
> chance to be transferred to target.
>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com<mailto:jinsong.liu@intel.com>>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com<mailto:george.dunlap@eu.citrix.com>>
> - for tools part of the patch
> Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com<mailto:Ian.Jackson@eu.citrix.com>>

You should remove these until you get another ack (since the patch has
changed materially)

> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>

I think if you haven't changed the HV part since his ack, you can probably
leave this one.

Jan, for hypervisor side your opinion of acked-by?  I'm updating to V4.

Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2012-Nov-22 13:36 UTC
Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
>>> On 22.11.12 at 13:37, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> - for hypervisor part of the patch
> Acked-by: Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>>
>
> I think if you haven't changed the HV part since his ack, you can probably
> leave this one.
>
> Jan, for hypervisor side your opinion of acked-by?  I'm updating to V4.

As George said - if you didn't change the hypervisor bits, you can keep
the ack.  Else remove it.

Jan