Liu, Jinsong
2012-Nov-22 16:28 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Sorry all, please ignore the last 2 emails; I mis-operated hg email :( I'm checking the reason and will send them out again later.

Thanks,
Jinsong
Liu Jinsong
2012-Nov-23 00:25 UTC
[PATCH V4] X86/vMCE: handle broken page with regard to migration
This patch handles broken pages with regard to migration. Generally there
are the following cases:
 1. a page breaks before migration
 2. a page breaks during migration
    2.1 a page breaks during migration, but not at the last iteration
    2.2 a page breaks at the last iteration of migration

For case 1, at the sender the broken page will be mapped but not copied to
the target (otherwise it may trigger a more serious error, say, an SRAR
error), while its pfn_type and pfn number will be transferred to the target
so that the target can take appropriate action.

For case 2.1, at the sender the mce handler marks the broken page in the
dirty bitmap, so that at the next iteration its pfn_type and pfn number
will be transferred to the target, which then takes appropriate action.

For case 2.2, the sender adds a check to see whether a vMCE occurred at the
last iteration. If so, it does one or more further iterations so that the
broken page's pfn_type and pfn number still get transferred to the target.
Another point: if the guest is being saved to disk and a vMCE occurs during
the save, the extra iteration(s) are needed as well.

For all cases at the target (if migration is not aborted by the vMCE):
the target populates pages for the guest. For a broken page we prefer to
keep the type of the page, for the sake of seamless migration. The target
sets its p2m type to p2m_ram_broken; the guest's MCE handling has a good
chance of dealing with the broken page, while if the guest accesses the
broken page again it will kill itself, as expected.

All of the above assumes that migration will succeed. However, for case 2
there are scenarios that may result in a guest or hypervisor crash: if the
pfn_type query fails to observe p2m_ram_broken (the vMCE occurs after the
query) and the broken page is then read, the guest/hypervisor may survive
or crash, depending on the nature of the error and how the guest/hypervisor
handles it. If the guest/hypervisor survives, migration is fine, since the
pfn_type will be transferred to the target at the next iteration, preventing
further harm at the target. If the guest/hypervisor crashes, it clearly no
longer cares about the migration. Unfortunately we have no way to predict
the outcome, so the best we can do is handle it as described; after all, we
cannot forbid migration for fear that it may crash the guest/hypervisor.

Patch version history:
V4:
 - adjust variables and patch description based on feedback
V3:
 - handle pages broken at the last iteration
V2:
 - continue migration when a page breaks during migration, by marking
   the broken page in the dirty bitmap
V1:
 - abort migration when a page breaks during migration
 - transfer pfn_type to the target for pages broken before migration

Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

- for hypervisor part of the patch
Acked-by: Jan Beulich <JBeulich@suse.com>

diff -r 8b93ac0c93f3 -r d3378692eece tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c   Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain.c   Fri Nov 23 08:21:57 2012 +0800
@@ -283,6 +283,22 @@
     return ret;
 }

+/* set broken page p2m */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn)
+{
+    int ret;
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_set_broken_page_p2m;
+    domctl.domain = (domid_t)domid;
+    domctl.u.set_broken_page_p2m.pfn = pfn;
+    ret = do_domctl(xch, &domctl);
+
+    return ret ? -1 : 0;
+}
+
 /* get info from hvm guest for save */
 int xc_domain_hvm_getcontext(xc_interface *xch,
                              uint32_t domid,
diff -r 8b93ac0c93f3 -r d3378692eece tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c   Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_restore.c   Fri Nov 23 08:21:57 2012 +0800
@@ -1023,9 +1023,15 @@

     countpages = count;
     for (i = oldcount; i < buf->nr_pages; ++i)
-        if ((buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XTAB
-            ||(buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK) == XEN_DOMCTL_PFINFO_XALLOC)
+    {
+        unsigned long pagetype;
+
+        pagetype = buf->pfn_types[i] & XEN_DOMCTL_PFINFO_LTAB_MASK;
+        if ( pagetype == XEN_DOMCTL_PFINFO_XTAB ||
+             pagetype == XEN_DOMCTL_PFINFO_BROKEN ||
+             pagetype == XEN_DOMCTL_PFINFO_XALLOC )
             --countpages;
+    }

     if (!countpages)
         return count;
@@ -1267,6 +1273,17 @@
             /* a bogus/unmapped/allocate-only page: skip it */
             continue;

+        if ( pagetype == XEN_DOMCTL_PFINFO_BROKEN )
+        {
+            if ( xc_set_broken_page_p2m(xch, dom, pfn) )
+            {
+                ERROR("Set p2m for broken page failed, "
+                      "dom=%d, pfn=%lx\n", dom, pfn);
+                goto err_mapped;
+            }
+            continue;
+        }
+
         if (pfn_err[i])
         {
             ERROR("unexpected PFN mapping failure pfn %lx map_mfn %lx p2m_mfn %lx",
diff -r 8b93ac0c93f3 -r d3378692eece tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c   Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xc_domain_save.c   Fri Nov 23 08:21:57 2012 +0800
@@ -1118,7 +1118,7 @@
     /* Now write out each data page, canonicalising page tables as we go... */
     for ( ; ; )
     {
-        unsigned int N, batch, run;
+        unsigned int N, batch, run, broken_before_send, broken_after_send;
         char reportbuf[80];

         snprintf(reportbuf, sizeof(reportbuf),
@@ -1270,6 +1270,7 @@
                 goto out;
             }

+            broken_before_send = 0;
             for ( run = j = 0; j < batch; j++ )
             {
                 unsigned long gmfn = pfn_batch[j];
@@ -1277,6 +1278,14 @@
                 if ( !hvm )
                     gmfn = pfn_to_mfn(gmfn);

+                if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                {
+                    pfn_type[j] |= pfn_batch[j];
+                    ++broken_before_send;
+                    ++run;
+                    continue;
+                }
+
                 if ( pfn_err[j] )
                 {
                     if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
@@ -1371,8 +1380,12 @@
                     }
                 }

-                /* skip pages that aren't present or are alloc-only */
+                /*
+                 * skip pages that aren't present,
+                 * or are broken, or are alloc-only
+                 */
                 if ( pagetype == XEN_DOMCTL_PFINFO_XTAB
+                    || pagetype == XEN_DOMCTL_PFINFO_BROKEN
                     || pagetype == XEN_DOMCTL_PFINFO_XALLOC )
                     continue;

@@ -1484,6 +1497,38 @@

             munmap(region_base, batch*PAGE_SIZE);

+            /*
+             * At the last iter, count the number of broken pages after sending,
+             * and if there are more than before sending, do one or more iter
+             * to make sure the pages are marked broken on the receiving side.
+             */
+            if ( last_iter )
+            {
+                /* Read pfn type again */
+                for ( j = 0; j < batch; j++ )
+                {
+                    if ( hvm )
+                        pfn_type[j] = pfn_batch[j];
+                    else
+                        pfn_type[j] = pfn_to_mfn(pfn_batch[j]);
+                }
+
+                if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
+                {
+                    PERROR("get_pfn_type_batch failed");
+                    goto out;
+                }
+
+                /* Count the number of broken pages */
+                broken_after_send = 0;
+                for ( j = 0; j < batch; j++ )
+                    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
+                        broken_after_send++;
+
+                /* If there are more than before sending, do more iter(s) */
+                if ( broken_before_send < broken_after_send )
+                    last_iter = 0;
+            }
         } /* end of this while loop for this iteration */

       skip:
@@ -1550,23 +1595,22 @@
                     PERROR("Error when writing to state file (tsc)");
                     goto out;
                 }
+            }
+        }
+        else
+            last_iter = 1;

+        if ( xc_shadow_control(xch, dom,
+                               XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
+                               dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
+        {
+            PERROR("Error flushing shadow PT");
+            goto out;
+        }

-            }
+        sent_last_iter = sent_this_iter;

-            if ( xc_shadow_control(xch, dom,
-                                   XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
-                                   dinfo->p2m_size, NULL, 0, &shadow_stats) != dinfo->p2m_size )
-            {
-                PERROR("Error flushing shadow PT");
-                goto out;
-            }
-
-            sent_last_iter = sent_this_iter;
-
-            print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
-
-        }
+        print_stats(xch, dom, sent_this_iter, &time_stats, &shadow_stats, 1);
     } /* end of infinite for loop */

     DPRINTF("All memory is saved\n");
diff -r 8b93ac0c93f3 -r d3378692eece tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h   Tue Nov 13 11:19:17 2012 +0000
+++ b/tools/libxc/xenctrl.h   Fri Nov 23 08:21:57 2012 +0800
@@ -575,6 +575,17 @@
                           xc_domaininfo_t *info);

 /**
+ * This function sets the p2m type for a broken page
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid the domain id which the broken page belongs to
+ * @parm pfn the pfn number of the broken page
+ * @return 0 on success, -1 on failure
+ */
+int xc_set_broken_page_p2m(xc_interface *xch,
+                           uint32_t domid,
+                           unsigned long pfn);
+
+/**
  * This function returns information about the context of a hvm domain
  * @parm xch a handle to an open hypervisor interface
  * @parm domid the domain to get information from
diff -r 8b93ac0c93f3 -r d3378692eece xen/arch/x86/cpu/mcheck/mcaction.c
--- a/xen/arch/x86/cpu/mcheck/mcaction.c   Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c   Fri Nov 23 08:21:57 2012 +0800
@@ -1,5 +1,6 @@
 #include <xen/types.h>
 #include <xen/sched.h>
+#include <asm/p2m.h>
 #include "mcaction.h"
 #include "vmce.h"
 #include "mce.h"
@@ -91,6 +92,24 @@
                     goto vmce_failed;
                 }

+                if ( is_hvm_domain(d) && !d->arch.hvm_domain.dirty_vram &&
+                     paging_mode_log_dirty(d) )
+                {
+                    /*
+                     * vMCE occurred during migration
+                     *
+                     * At the sender, mark the broken page in the dirty
+                     * bitmap, so that at the copypages stage of migration
+                     * the broken page's pfn_type and pfn number are
+                     * transferred to the target, which then takes
+                     * appropriate action.
+                     *
+                     * At the target, the p2m type is set to p2m_ram_broken
+                     * for the broken page, so that if the guest accesses
+                     * the broken page again, it kills itself as expected.
+                     */
+                    paging_mark_dirty(d, mfn);
+                }
+
                 if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
                 {
                     printk("Unmap broken memory %lx for DOM%d failed\n",
diff -r 8b93ac0c93f3 -r d3378692eece xen/arch/x86/domctl.c
--- a/xen/arch/x86/domctl.c   Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/arch/x86/domctl.c   Fri Nov 23 08:21:57 2012 +0800
@@ -209,12 +209,18 @@
             for ( j = 0; j < k; j++ )
             {
                 unsigned long type = 0;
+                p2m_type_t t;

-                page = get_page_from_gfn(d, arr[j], NULL, P2M_ALLOC);
+                page = get_page_from_gfn(d, arr[j], &t, P2M_ALLOC);

                 if ( unlikely(!page) ||
                      unlikely(is_xen_heap_page(page)) )
-                    type = XEN_DOMCTL_PFINFO_XTAB;
+                {
+                    if ( p2m_is_broken(t) )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
+                    else
+                        type = XEN_DOMCTL_PFINFO_XTAB;
+                }
                 else
                 {
                     switch( page->u.inuse.type_info & PGT_type_mask )
@@ -235,6 +241,9 @@

                     if ( page->u.inuse.type_info & PGT_pinned )
                         type |= XEN_DOMCTL_PFINFO_LPINTAB;
+
+                    if ( page->count_info & PGC_broken )
+                        type = XEN_DOMCTL_PFINFO_BROKEN;
                 }

                 if ( page )
@@ -1568,6 +1577,29 @@
     }
     break;

+    case XEN_DOMCTL_set_broken_page_p2m:
+    {
+        struct domain *d;
+
+        d = rcu_lock_domain_by_id(domctl->domain);
+        if ( d != NULL )
+        {
+            p2m_type_t pt;
+            unsigned long pfn = domctl->u.set_broken_page_p2m.pfn;
+            mfn_t mfn = get_gfn_query(d, pfn, &pt);
+
+            if ( unlikely(!mfn_valid(mfn_x(mfn)) || !p2m_is_ram(pt) ||
+                          (p2m_change_type(d, pfn, pt, p2m_ram_broken) != pt)) )
+                ret = -EINVAL;
+
+            put_gfn(d, pfn);
+            rcu_unlock_domain(d);
+        }
+        else
+            ret = -ESRCH;
+    }
+    break;
+
     default:
         ret = iommu_do_domctl(domctl, u_domctl);
         break;
diff -r 8b93ac0c93f3 -r d3378692eece xen/include/public/domctl.h
--- a/xen/include/public/domctl.h   Tue Nov 13 11:19:17 2012 +0000
+++ b/xen/include/public/domctl.h   Fri Nov 23 08:21:57 2012 +0800
@@ -136,6 +136,7 @@
 #define XEN_DOMCTL_PFINFO_LPINTAB (0x1U<<31)
 #define XEN_DOMCTL_PFINFO_XTAB    (0xfU<<28) /* invalid page */
 #define XEN_DOMCTL_PFINFO_XALLOC  (0xeU<<28) /* allocate-only page */
+#define XEN_DOMCTL_PFINFO_BROKEN  (0xdU<<28) /* broken page */
 #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)

 struct xen_domctl_getpageframeinfo {
@@ -834,6 +835,12 @@
 typedef struct xen_domctl_set_access_required xen_domctl_set_access_required_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_access_required_t);

+struct xen_domctl_set_broken_page_p2m {
+    uint64_aligned_t pfn;
+};
+typedef struct xen_domctl_set_broken_page_p2m xen_domctl_set_broken_page_p2m_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_broken_page_p2m_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain       1
@@ -899,6 +906,7 @@
 #define XEN_DOMCTL_set_access_required 64
 #define XEN_DOMCTL_audit_p2m           65
 #define XEN_DOMCTL_set_virq_handler    66
+#define XEN_DOMCTL_set_broken_page_p2m 67
 #define XEN_DOMCTL_gdbsx_guestmemio  1000
 #define XEN_DOMCTL_gdbsx_pausevcpu   1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
@@ -954,6 +962,7 @@
         struct xen_domctl_audit_p2m         audit_p2m;
         struct xen_domctl_set_virq_handler  set_virq_handler;
         struct xen_domctl_gdbsx_memio       gdbsx_guest_memio;
+        struct xen_domctl_set_broken_page_p2m set_broken_page_p2m;
         struct xen_domctl_gdbsx_pauseunp_vcpu gdbsx_pauseunp_vcpu;
         struct xen_domctl_gdbsx_domstatus   gdbsx_domstatus;
         uint8_t                             pad[128];
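As an illustration of how a toolstack caller would use the new interface, a minimal sketch follows. It is hypothetical (not part of the patch) and assumes only the xc_set_broken_page_p2m() declaration added to xenctrl.h above, plus the standard libxc handle calls:

/* Hypothetical helper: mark one guest pfn of domain `domid' as broken
 * on the receiving host.  Error handling is deliberately minimal. */
#include <xenctrl.h>

int mark_page_broken(uint32_t domid, unsigned long pfn)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if ( !xch )
        return -1;

    /* 0 on success, -1 on failure, per the xenctrl.h comment above. */
    rc = xc_set_broken_page_p2m(xch, domid, pfn);

    xc_interface_close(xch);
    return rc;
}

Note that the domctl behind this call fails with -EINVAL unless the pfn currently maps valid RAM, which is why the restore path above only issues it for pages whose transferred type is XEN_DOMCTL_PFINFO_BROKEN.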
George Dunlap
2012-Nov-23 16:26 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Technically, since you changed part of the code I acked, you should have removed this ack. But now I've read the patch:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Thanks Jinsong!
Liu, Jinsong
2012-Nov-25 13:24 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Thanks George!
Liu, Jinsong
2012-Nov-28 14:37 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Ping?

Thanks,
Jinsong
Ian Campbell
2012-Nov-29 10:02 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
On Wed, 2012-11-28 at 14:37 +0000, Liu, Jinsong wrote:
> Ping?

Sorry, I've been meaning to reply but didn't manage to yet. Also you
replied to V4 saying to ignore it, so I was half waiting for V5, but I
see this should actually be labelled V5 anyway.

I'm afraid I still don't fully grok the reason for the loop that goes
with:

+            /*
+             * At the last iter, count the number of broken pages after sending,
+             * and if there are more than before sending, do one or more iter
+             * to make sure the pages are marked broken on the receiving side.
+             */

Can we go through it one more time? Sorry.

Let me outline the sequence of events and you can point out where I'm
going wrong. I'm afraid this has turned out to be rather long; again I'm
sorry for that.

First we do some number of iterations with the guest live. If an MCE
occurs during this phase then the page will be marked dirty and we will
pick this up on the next iteration and resend the page with the dirty
type etc., and all is fine. This all looks good to me, so we don't need
to worry about anything at this stage.

Eventually we get to the last iteration, at which point we pause the
guest. From here on in, the guest itself is not going to cause an MCE
(e.g. by touching its RAM) because it is not running, but we must still
account for the possibility of an asynchronous MCE of some sort, e.g.
triggered by the error detection mechanisms in the hardware, cosmic rays
and such like.

The final iteration proceeds roughly as follows.

     1. The domain is paused.
     2. We scan the dirty bitmap and add dirty pages to the batch of
        pages to process (there may be several batches in the last
        iteration; we only need to concern ourselves with any one batch
        here).
     3. We map all of the pages in the resulting batch with
        xc_map_foreign_bulk.
     4. We query the types of all the pages in the batch with
        xc_get_pfn_type_batch.
     5. We iterate over the batch, checking the type of each page; in
        some cases we do some incidental processing.
     6. We send the types of the pages in the batch over the wire.
     7. We iterate over the batch again, and send any normal page (not
        broken, xtab etc.) over the wire. Actually we do this as runs of
        normal pages, but the key point is we avoid touching any special
        page (including ones marked as broken by #4).

Is this sequence of events accurate?

Now let's consider the consequences of an MCE occurring at various
stages here.

Any MCE which happens before #4 is fine; we will pick that up in #4 and
the following steps will do the right thing.

Note that I am assuming that the mapping step in #3 is safe even for a
broken page, so long as we don't actually try and use the mapping (more
on that later). Is this true?

If an MCE occurs after #4 then the page will be marked as dirty in the
bitmap and Xen will internally mark it as broken, but we won't see
either of those with the current algorithm. There are two cases to think
about here AFAICT:

     A. The page was not already dirty at #2. In this case we know that
        the guest hasn't dirtied the page since the previous iteration
        and therefore the target has a good copy of this page from that
        time. The page isn't even in the batch we are processing. So we
        don't particularly care about the MCE here and can, from the PoV
        of migrating this guest, ignore it.
     B. The page was already dirty (but not broken; we handled that case
        above in "Any MCE which happens before #4...") at #2, which
        means we do not have an up-to-date copy on the target. This has
        two subcases:
             I. The MCE occurs before (or during) #6 (sending the page)
                and therefore we do not have a good up-to-date copy of
                that data at either end.
            II. The MCE occurs after #6, in which case we already have a
                good copy at the target end.

To fix B you have added an 8th step to the above:

     8. Query the types of the pages again, using xc_get_pfn_type_batch,
        and if there are more pages dirty now than we saw at #4
        (actually #5, when we scanned the array, but that distinction
        doesn't matter) then a new MCE must have occurred. Go back to #2
        and try again.

This won't do anything for A, since the page wasn't in the batch to
start with and so neither #4 nor #8 will look at its type; this is good
and proper.

So now we consider the two subcases of B. Let's consider B.II first
since it seems to be the more obvious case.

In case B.II the target end already has a good copy of the data page;
there is no need to mark the page as broken on the far end, nor to
arrange for a vMCE to be injected. I don't know if/how we arrange for
vMCEs to be injected under these circumstances; however, even if a vMCE
does get injected into the guest when it eventually gets unpaused on the
target, then all that will happen is that it will needlessly throw away
a good page. However, this is a rare corner case which is not worth
concerning ourselves with (it's largely indistinguishable from case A).
If the MCE had happened even a single cycle earlier then this would have
been a B.I event instead of a B.II one. In any case there is no need to
return to #2 and try again; everything will be just fine if we complete
the migration at this point.

In case B.I the MCE occurs before (or while) we send the page onto the
wire. We will therefore try to read from this page, because we haven't
looked at the type since #4 and have no idea that it is now broken.
Reading from the broken page will cause a fault, perhaps causing a vMCE
to be delivered to dom0, which causes the kernel to kill the process
doing the migration. Or maybe it kills dom0 or the host entirely. Either
way, the idea of looping again is rather moot.

Have I missed a case which needs thinking about?

I suspect B.I is the case where you are most likely to find a flaw in my
argument. Is there something else which is done in this case which would
allow us to continue?

Ian.
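For reference, the step-8 re-check corresponds to roughly the following control flow, simplified from the xc_domain_save.c hunk in the patch (batch bookkeeping and error reporting trimmed; broken_before_send is the count taken while scanning the batch at step 5):

/* Sketch: after sending a batch in what was believed to be the last
 * iteration, re-query page types and loop again if new pages broke. */
if ( last_iter )
{
    unsigned int j, broken_after_send = 0;

    /* Reload the pfn (or, for PV, mfn) of every page in the batch. */
    for ( j = 0; j < batch; j++ )
        pfn_type[j] = hvm ? pfn_batch[j] : pfn_to_mfn(pfn_batch[j]);

    /* Ask Xen for the current type of each page. */
    if ( xc_get_pfn_type_batch(xch, dom, batch, pfn_type) )
        goto out;                      /* query failed: abort the save */

    for ( j = 0; j < batch; j++ )
        if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
            broken_after_send++;

    /* An MCE landed mid-batch: do another pass so the new BROKEN
     * type reaches the target. */
    if ( broken_before_send < broken_after_send )
        last_iter = 0;
}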
George Dunlap
2012-Nov-29 10:50 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
I think your analysis is correct -- the only question is whether B.I
will 100% cause the migration to crash, or whether there's a chance of
not crashing on the read. I had tried to ask that question before, and
understood Jinsong's response to be that it's not 100% sure that the
read would cause an error. However, looking back at the thread, I think
I may have understood something that was not there. :-)

So I guess the question for Jinsong is this: the only time this extra
loop could help is if there is a page broken after being paused but
before being sent the last time (B.I in Ian's analysis) -- in which
case, the migration code is 100% guaranteed to read a now-broken page.
What are the chances that this read of a broken page will *not* cause a
fault which will kill at least the migration process, if not dom0? If
the chances are "slim-to-none", then there is no point in the extra
check.

 -George
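To pin down where that read happens, here is a stripped-down view of one last-iteration batch. send_types(), send_page() and is_special() are illustrative stand-ins, not real libxc functions; the real code in xc_domain_save.c writes runs of normal pages rather than one page at a time:

/* Hypothetical sketch of steps 4-7, marking the B.I window. */
xc_get_pfn_type_batch(xch, dom, batch, pfn_type);  /* step 4: snapshot types */

send_types(io_fd, pfn_type, batch);                /* step 6 */

for ( j = 0; j < batch; j++ )
{
    /* Special as of the step-4 snapshot: BROKEN, XTAB, XALLOC. */
    if ( is_special(pfn_type[j]) )
        continue;

    /* B.I window: if an MCE broke this page after step 4, the snapshot
     * is stale and this read touches a poisoned page -- the SRAR-class
     * fault George is asking about. */
    send_page(io_fd, region_base + j * PAGE_SIZE); /* step 7 */
}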
Liu, Jinsong
2012-Nov-30 18:51 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> On Wed, 2012-11-28 at 14:37 +0000, Liu, Jinsong wrote:
>> Ping?
>
> [...]
>
> Can we go through it one more time? Sorry.

Sure, and I very much appreciate your kind and patient review. Your
comments help (and force) me to re-think it in more detail.

> First we do some number of iterations with the guest live. If an MCE
> occurs during this phase then the page will be marked dirty and we will
> pick this up on the next iteration and resend the page with the dirty
> type etc. and all is fine. This all looks good to me, so we don't need
> to worry about anything at this stage.

Hmm, it seems not so safe. If the page is good and it is in the dirty
bitmap (obtained at step #2), but during that iteration it breaks after
#4, the same issue occurs as in your analysis of the last iteration -->
the migration process will read a broken page (apart from that, I agree
this phase is OK) --> so let's merge the analysis of 'migration process
reads a broken page' into the last-iteration B.I case.

> The final iteration proceeds roughly as follows.
>
> [steps #1-#7 snipped]
>
> Is this sequence of events accurate?

Yes, exactly.

> Note that I am assuming that the mapping step in #3 is safe even for a
> broken page, so long as we don't actually try to use the mapping (more
> on that later) -- is this true?

I think so. The #3 mmu mapping is safe in itself.

> In case B.I the MCE occurs before (or while) we send the page onto the
> wire. We will therefore try to read from this page, because we haven't
> looked at the type since #4 and have no idea that it is now broken.

Right.

> Reading from the broken page will cause a fault, perhaps causing a vMCE
> to be delivered to dom0, which causes the kernel to kill the process
> doing the migration. Or maybe it kills dom0 or the host entirely.

Not exactly, I think. Reading the broken page will trigger a serious
SRAR error. In that case the hypervisor will inject a vMCE into the
guest which was migrating, not into dom0. The reason for this injection
is that the guest is the best one to handle it, with sufficient
clue/status/information (other components like the hypervisor or dom0
are not appropriate). As for the xl migration process, after returning
from the MCE context it *again* reads the broken page ... this will
kill the system entirely --> so we definitely need not care about
migration any more.

> Either way the idea of looping again is rather moot.

Agree with the conclusion, though I have a slightly different opinion
about its reason. So let's remove step #8?

Thanks,
Jinsong

> Have I missed a case which needs thinking about?
>
> I suspect B.I is the case where you are most likely to find a flaw in
> my argument. Is there something else which is done in this case which
> would allow us to continue?
>
> Ian.
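To make the flow Jinsong describes concrete, the fragment below restates
it as pseudocode. It is purely illustrative -- not actual Xen
MCE-handler code -- and inject_vmce() and the function name are
placeholders for this sketch:

    /*
     * Illustrative pseudocode only: the described handling of an SRAR
     * raised when dom0's migration process reads a guest-owned broken
     * page.
     */
    void srar_on_foreign_read(struct domain *page_owner)
    {
        /* The poisoned frame belongs to the migrating guest, so the
         * vMCE goes to that guest -- it has the status and context
         * needed to react -- not to dom0, whose vcpu merely performed
         * the read. */
        inject_vmce(page_owner);   /* placeholder name */

        /* Nothing prevents the dom0 process from resuming and reading
         * the same mapping again; that second consumption of poisoned
         * data is fatal to the host, which is why no save-side loop
         * can recover from case B.I. */
    }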
Liu, Jinsong
2012-Nov-30 18:57 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 29/11/12 10:02, Ian Campbell wrote:
>> [Ian's analysis snipped -- quoted in full above]
>
> I think your analysis is correct -- the only question is whether B.I
> will 100% cause the migration to crash, or whether there's a chance of
> not crashing on the read. I had tried to ask that question before, and
> understood Jinsong's response to be that it's not 100% sure that the
> read would cause an error. However, looking back at the thread, I
> think I may have understood something that was not there. :-)
>
> So I guess the question for Jinsong is this:
>
> The only time this extra loop could help is if there is a page broken
> after being paused but before being sent the last time (B.I in Ian's
> analysis) -- in which case, the migration code is 100% guaranteed to
> read a now-broken page. What are the chances that this read of a
> broken page will *not* cause a fault which will kill at least the
> migration process, if not dom0? If the chances are "slim-to-none",
> then there is no point in the extra check.
>
>  -George

Yes, the extra check is pointless: xl migration reading a broken page
will eventually kill the system. See my reply to IanC's email :-)

Thanks,
Jinsong
George Dunlap
2012-Dec-03 11:24 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
On 30/11/12 18:51, Liu, Jinsong wrote:
> Not exactly, I think. Reading the broken page will trigger a serious
> SRAR error. In that case the hypervisor will inject a vMCE into the
> guest which was migrating, not into dom0. The reason for this
> injection is that the guest is the best one to handle it, with
> sufficient clue/status/information (other components like the
> hypervisor or dom0 are not appropriate). As for the xl migration
> process, after returning from the MCE context it *again* reads the
> broken page ... this will kill the system entirely --> so we
> definitely need not care about migration any more.

Why would the vMCE be sent to the guest, rather than to the vcpu that
was running when the SRAR error occurred (i.e., the vcpu on which the
migration process was running)?

 -George
Liu, Jinsong
2012-Dec-05 15:57 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Liu, Jinsong wrote:
> Ian Campbell wrote:
>> [Ian's analysis snipped -- quoted in full earlier in the thread]
>
> [point-by-point replies snipped -- see my mail of 30 Nov above]
>> Either way the idea of looping again is rather moot.
>
> Agree with the conclusion, though I have a slightly different opinion
> about its reason. So let's remove step #8?
>
> Thanks,
> Jinsong

More simply, we can remove not only step #8 (the last-iteration check)
but also the action to 'mark broken page to dirty bitmap':
In any iteration there is a key point, #4. If a vMCE occurs before #4,
the proper pfn_type/pfn number will be transferred and xl migration
will not access the broken page (if the guest accesses the broken page
again it will be killed by the hypervisor). If a vMCE occurs after #4,
the system will crash and we need not care about migration any more.

So we can go back to the original patch, which handled 'vmce occurs
before migration', and entirely avoid adding specific code to handle
'vmce occurs during migration', since it in fact handles both cases
(and is simpler). Thoughts?

Thanks,
Jinsong
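A minimal sketch of the per-batch handling under this simplified
proposal, assuming everything hinges on the single type query at step
#4 (schematic only; send_page() and region_base are stand-ins for the
real save-loop machinery):

    /* Per-batch handling with neither dirty-bitmap marking of broken
     * pages nor a post-send recount (step #8 removed). */
    for ( j = 0; j < batch; j++ )
    {
        if ( pfn_type[j] == XEN_DOMCTL_PFINFO_BROKEN )
            /* Broke before #4: the type says so; send only the type so
             * the target can set p2m_ram_broken, never touch the data. */
            continue;

        /* Broke after #4: nothing below can know; this read consumes
         * poisoned memory and the host dies, so no extra code helps. */
        send_page(io_fd, region_base + j * PAGE_SIZE);
    }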
George Dunlap
2012-Dec-05 15:59 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
On 05/12/12 15:57, Liu, Jinsong wrote:
> More simply, we can remove not only step #8 (the last-iteration check)
> but also the action to 'mark broken page to dirty bitmap':
> In any iteration there is a key point, #4. If a vMCE occurs before #4,
> the proper pfn_type/pfn number will be transferred and xl migration
> will not access the broken page (if the guest accesses the broken page
> again it will be killed by the hypervisor). If a vMCE occurs after #4,
> the system will crash and we need not care about migration any more.
>
> So we can go back to the original patch, which handled 'vmce occurs
> before migration', and entirely avoid adding specific code to handle
> 'vmce occurs during migration', since it in fact handles both cases
> (and is simpler). Thoughts?

I thought part of the point of that was to have consistent behaviour
from the presented virtual hardware -- i.e., if the guest OS receives a
vMCE, and subsequently touches that page, it should get an SRAR? Is
that important or not?

 -George
Ian Campbell
2012-Dec-05 16:01 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
(please trim your quotes)

> More simply, we can remove not only step #8 (the last-iteration check)
> but also the action to 'mark broken page to dirty bitmap':
> [...]
> So we can go back to the original patch, which handled 'vmce occurs
> before migration', and entirely avoid adding specific code to handle
> 'vmce occurs during migration', since it in fact handles both cases
> (and is simpler). Thoughts?

Do you not need that stuff to handle MCEs during the live phase?

Specifically, an MCE occurring on a page which is not in the current
batch at all can be handled safely during the live phase.

Ian.
Liu, Jinsong
2012-Dec-05 16:15 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
Ian Campbell wrote:
> (please trim your quotes)
>
> [...]
>
> Do you not need that stuff to handle MCEs during the live phase?
>
> Specifically, an MCE occurring on a page which is not in the current
> batch at all can be handled safely during the live phase.
>
> Ian.

No, we don't need to care about it. Whether the page the MCE occurred
at is in the current batch or not is not important:
1. if xl will not transfer it in the future, that's fine;
2. if xl will transfer it in the future, but it was not in the current
   batch, that's also fine --> it will be handled in its own batch
   (the guest will be killed if it accesses the broken page again).

Thanks,
Jinsong
Liu, Jinsong
2012-Dec-05 16:36 UTC
Re: [PATCH V4] X86/vMCE: handle broken page with regard to migration
George Dunlap wrote:
> On 05/12/12 15:57, Liu, Jinsong wrote:
>> [...]
>
> I thought part of the point of that was to have consistent behaviour
> from the presented virtual hardware -- i.e., if the guest OS receives
> a vMCE, and subsequently touches that page, it should get an SRAR? Is
> that important or not?
>
>  -George

Currently the hypervisor will kill the guest in such a case (instead of
injecting an SRAR into the guest). The reason is that not all hardware
platforms support SRAR, so if the guest accesses the broken page the
hardware may trigger a more serious error: on some old platforms which
do not support SRAR it will crash the whole system. To prevent this bad
situation the hypervisor prefers to kill the guest.

(Under the MCA architecture there is no CPUID bit to indicate whether
the platform supports SRAR or not --> if MCi_STATUS reports a known
type indicating an SRAR, the hardware supports SRAR; otherwise the
hardware reports an unknown error --> the hypervisor then kills the
system in that case.)

So from the guest's point of view it will not seem strange: if it
receives a vMCE (SRAO) and then accesses the page again, it is killed,
as if it were running on a platform that does not support SRAR.

Thanks,
Jinsong
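For reference, the MCi_STATUS-based classification Jinsong refers to
follows from the architectural bit layout of IA32_MCi_STATUS (bit
positions per the Intel MCA definition; the helper itself is only an
illustration, not code from this patch):

    #include <stdint.h>

    #define MCi_STATUS_VAL  (1ULL << 63)  /* register contents valid */
    #define MCi_STATUS_UC   (1ULL << 61)  /* uncorrected error */
    #define MCi_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */
    #define MCi_STATUS_S    (1ULL << 56)  /* signalled via #MC */
    #define MCi_STATUS_AR   (1ULL << 55)  /* action required */

    /* Classify an uncorrected error from its status bits.  Only
     * hardware that supports recoverable errors ever reports the
     * SRAR/SRAO encodings, which is why there is no CPUID feature bit
     * to check: support is inferred from what the hardware reports. */
    static inline const char *ucr_class(uint64_t status)
    {
        if ( !(status & MCi_STATUS_VAL) || !(status & MCi_STATUS_UC) )
            return "not an uncorrected error";
        if ( status & MCi_STATUS_PCC )
            return "fatal (processor context corrupt)";
        if ( (status & MCi_STATUS_S) && (status & MCi_STATUS_AR) )
            return "SRAR";  /* e.g. consuming poisoned data on a load */
        if ( status & MCi_STATUS_S )
            return "SRAO";  /* e.g. memory/patrol scrub */
        return "UCNA";      /* uncorrected, no action required */
    }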