thr3ads.net - Xen devel - [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels [Oct 2013]

David Vrabel

2013-Oct-08 16:55 UTC

[PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

The series (for Xen 4.4) improves the kexec hypercall by making Xen
responsible for loading and relocating the image.  This allows kexec
to be usable by pv-ops kernels and should allow kexec to be usable
from a HVM or PVH privileged domain.

The first patch is a simple clean-up.

Patch 2 introduces the new ABI.

Patch 3 and 4 nearly completely reimplement the kexec load, unload and
exec sub-ops.  The old load_v1 sub-op is then implemented on top of
the new code.

Patch 5 calls the kexec image when dom0 crashes.  This avoids having
to alter dom0 kernels to do an exec sub-op call on crash -- a
SHUTDOWN_crash by dom0 will trigger the kexec.

Patches 6 and 7 add the libxc API for the kexec calls.  These have
been acked-by Ian Campbell already.

Patch 8 adds a link time check for the size of the relocate code.

Patch 9 adds myself as the maintainer for kexec in Xen.

The required patch series for kexec-tools will be posted shortly and
is available from the xen-v6 branch of:

http://xenbits.xen.org/gitweb/?p=people/dvrabel/kexec-tools.git;a=summary

Changes in v9:

- Update comments to correctly say 4.4.
- Minor updates to the kexec_reloc assembly to improve maintainability a
  bit.

Changes in v8:

- Use #defines for compat ABI structures.
- Tweak link time check for kexec_reloc.

Changes in v7:

- No longer use GUEST_HANDLE_64(), get a uniform ABI by using unions
  and explicit padding.
- Only map the segments and not all of RAM.
- Add a mechanism to create mappings for use by the exec'd image (a
  segment with a NULL buf handle).
- Fix a bug where a crash image's code page would be placed at machine
  address 0 (instead of inside the crash region).
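
As an illustration of the union-and-padding approach listed above (not
part of the series): wrapping the handle in a union with a uint64_t pads
the handle slot to 8 bytes, so 32-bit and 64-bit callers should see the
same layout.  This is a minimal sketch that assumes a plain pointer can
stand in for the guest handle; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a segment descriptor; not the Xen definition. */
typedef struct demo_segment {
    union {
        const void *h;  /* stand-in for a guest handle (a wrapped pointer) */
        uint64_t _pad;  /* keeps the slot 8 bytes wide for 32-bit callers */
    } buf;
    uint64_t buf_size;
    uint64_t dest_maddr;
    uint64_t dest_size;
} demo_segment_t;

/* The layout does not depend on the pointer width of the caller. */
static_assert(sizeof(demo_segment_t) == 32, "uniform segment size");
static_assert(offsetof(demo_segment_t, buf_size) == 8, "handle slot is 8 bytes");

/* Nonzero when the segment has no source buffer (a NULL handle). */
static int demo_segment_is_mapping_only(const demo_segment_t *seg)
{
    return seg->buf.h == NULL;
}
```

A segment for which demo_segment_is_mapping_only() is nonzero corresponds
to the "NULL buf handle" case: no data is copied, the range is only
mapped for the exec'd image.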

Changes in v6:

- Fix double free in KEXEC_load_v1 failure path.
- Only copy the relocation code and not the whole page.
- Add myself as the kexec maintainer.

Changes in v5 (not posted to the list):

- _rsvd -> _pad in one of the public ABI structures.
- Fix bug where trailing pages were not zeroed. This fixes loading a
  64-bit Linux kernel using a more recent version of kexec-tools.
- Check the relocation code fits into a page at link time.

Changes in v4:

- Use paddr_t and page_to_maddr() etc. for portability.
- Add explicit padding to hypercall structures where required.
- Minor cleanup of the kexec_reloc assembly.
- Print a message before exec'ing a crash image.
- Style fixes (tabs, trailing whitespace) and typos.
- Fix a bug where using the V1 interface and unloading an image may crash.

Changes in v3:

- Provide old struct xen_kexec_load if __XEN_INTERFACE_VERSION__ < 4.3
- Adjust new struct xen_kexec_load to avoid unnecessary padding.
- Use domheap pages for the image and control pages.
- Remove the DBG() macros from the reloc code.

David

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 1/9] x86: give FIX_EFI_MPF its own fixmap entry

From: David Vrabel <david.vrabel@citrix.com>

FIX_EFI_MPF was the same as FIX_KEXEC_BASE_0 which is going away.  So
add its own entry.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 xen/arch/x86/mpparse.c       |    2 --
 xen/include/asm-x86/fixmap.h |    1 +
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index 97d34bc..3753704 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -538,8 +538,6 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
 	}
 }
 
-#define FIX_EFI_MPF FIX_KEXEC_BASE_0
-
 static __init void efi_unmap_mpf(void)
 {
 	if (efi_enabled)
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index d850be4..8b4266d 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -66,6 +66,7 @@ enum fixed_addresses {
     FIX_APEI_RANGE_BASE,
     FIX_APEI_RANGE_END = FIX_APEI_RANGE_BASE + FIX_APEI_RANGE_MAX -1,
     FIX_IGD_MMIO,
+    FIX_EFI_MPF,
     __end_of_fixed_addresses
 };
 
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 2/9] kexec: add public interface for improved load/unload sub-ops

From: David Vrabel <david.vrabel@citrix.com>

Add replacement KEXEC_CMD_load and KEXEC_CMD_unload sub-ops to the
kexec hypercall.  These new sub-ops allow a privileged guest to
provide the image data to be loaded into Xen memory or the crash
region instead of guests loading the image data themselves and
providing the relocation code and metadata.

The old interface is provided to guests requesting an interface
version prior to 4.4.

Bump __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400.
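
For illustration only (not part of this patch), the per-segment load
semantics documented in the new public header can be sketched as
follows: buf_size bytes are copied to dest_maddr and the rest of the
destination, up to dest_maddr + dest_size, is zeroed.  The helper, the
struct and the 4 KiB page size are assumptions made for the example;
the two rejection checks mirror the alignment and size checks made by
the image handling code in the next patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size for the example */

/* Half-open machine address range [start, end). */
struct demo_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Split a segment's destination into the part filled from the source
 * buffer and the trailing part that is zeroed.
 * Returns 0 on success, -1 if the segment would be rejected.
 */
static int demo_segment_ranges(uint64_t dest_maddr, uint64_t dest_size,
                               uint64_t buf_size,
                               struct demo_range *copied,
                               struct demo_range *zeroed)
{
    assert(copied && zeroed);

    /* Destination start and end must both be page aligned. */
    if ((dest_maddr % DEMO_PAGE_SIZE) || (dest_size % DEMO_PAGE_SIZE))
        return -1;

    /* The source buffer must fit inside the destination. */
    if (buf_size > dest_size)
        return -1;

    copied->start = dest_maddr;
    copied->end   = dest_maddr + buf_size;
    zeroed->start = copied->end;
    zeroed->end   = dest_maddr + dest_size;
    return 0;
}
```

A segment with buf_size == dest_size yields an empty zeroed range.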

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/kexec.c              |   12 +++---
 xen/include/public/kexec.h      |   78 +++++++++++++++++++++++++++++++++++++--
 xen/include/public/xen-compat.h |    2 +-
 3 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 7cd151f..7b23df0 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -734,7 +734,7 @@ static void crash_save_vmcoreinfo(void)
 #endif
 }
 
-static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
+static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
 {
     xen_kexec_image_t *image;
     int base, bit, pos;
@@ -781,7 +781,7 @@ static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_t *load)
 
 static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    xen_kexec_load_t load;
+    xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&load, uarg, 1)) )
         return -EFAULT;
@@ -793,8 +793,8 @@ static int kexec_load_unload_compat(unsigned long op,
                                     XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
 #ifdef CONFIG_COMPAT
-    compat_kexec_load_t compat_load;
-    xen_kexec_load_t load;
+    compat_kexec_load_v1_t compat_load;
+    xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&compat_load, uarg, 1)) )
         return -EFAULT;
@@ -866,8 +866,8 @@ static int do_kexec_op_internal(unsigned long op,
         else
                 ret = kexec_get_range(uarg);
         break;
-    case KEXEC_CMD_kexec_load:
-    case KEXEC_CMD_kexec_unload:
+    case KEXEC_CMD_kexec_load_v1:
+    case KEXEC_CMD_kexec_unload_v1:
         spin_lock_irqsave(&kexec_lock, flags);
         if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
         {
diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
index 36409ff..4e86f86 100644
--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -116,12 +116,12 @@ typedef struct xen_kexec_exec {
  * type  == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
  * image == relocation information for kexec (ignored for unload) [in]
  */
-#define KEXEC_CMD_kexec_load            1
-#define KEXEC_CMD_kexec_unload          2
-typedef struct xen_kexec_load {
+#define KEXEC_CMD_kexec_load_v1         1 /* obsolete since 0x00040400 */
+#define KEXEC_CMD_kexec_unload_v1       2 /* obsolete since 0x00040400 */
+typedef struct xen_kexec_load_v1 {
     int type;
     xen_kexec_image_t image;
-} xen_kexec_load_t;
+} xen_kexec_load_v1_t;
 
 #define KEXEC_RANGE_MA_CRASH      0 /* machine address and size of crash area */
 #define KEXEC_RANGE_MA_XEN        1 /* machine address and size of Xen itself */
@@ -152,6 +152,76 @@ typedef struct xen_kexec_range {
     unsigned long start;
 } xen_kexec_range_t;
 
+#if __XEN_INTERFACE_VERSION__ >= 0x00040400
+/*
+ * A contiguous chunk of a kexec image and its destination machine
+ * address.
+ */
+typedef struct xen_kexec_segment {
+    union {
+        XEN_GUEST_HANDLE(const_void) h;
+        uint64_t _pad;
+    } buf;
+    uint64_t buf_size;
+    uint64_t dest_maddr;
+    uint64_t dest_size;
+} xen_kexec_segment_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_segment_t);
+
+/*
+ * Load a kexec image into memory.
+ *
+ * For KEXEC_TYPE_DEFAULT images, the segments may be anywhere in RAM.
+ * The image is relocated prior to being executed.
+ *
+ * For KEXEC_TYPE_CRASH images, each segment of the image must reside
+ * in the memory region reserved for kexec (KEXEC_RANGE_MA_CRASH) and
+ * the entry point must be within the image. The caller is responsible
+ * for ensuring that multiple images do not overlap.
+ *
+ * All image segments will be loaded to their destination machine
+ * addresses prior to being executed.  The trailing portion of any
+ * segments with a source buffer (from dest_maddr + buf_size to
+ * dest_maddr + dest_size) will be zeroed.
+ *
+ * Segments with no source buffer will be accessible to the image when
+ * it is executed.
+ */
+
+#define KEXEC_CMD_kexec_load 4
+typedef struct xen_kexec_load {
+    uint8_t  type;        /* One of KEXEC_TYPE_* */
+    uint8_t  _pad;
+    uint16_t arch;        /* ELF machine type (EM_*). */
+    uint32_t nr_segments;
+    union {
+        XEN_GUEST_HANDLE(xen_kexec_segment_t) h;
+        uint64_t _pad;
+    } segments;
+    uint64_t entry_maddr; /* image entry point machine address. */
+} xen_kexec_load_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_load_t);
+
+/*
+ * Unload a kexec image.
+ *
+ * Type must be one of KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
+ */
+#define KEXEC_CMD_kexec_unload 5
+typedef struct xen_kexec_unload {
+    uint8_t type;
+} xen_kexec_unload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_unload_t);
+
+#else /* __XEN_INTERFACE_VERSION__ < 0x00040400 */
+
+#define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
+#define KEXEC_CMD_kexec_unload KEXEC_CMD_kexec_unload_v1
+#define xen_kexec_load xen_kexec_load_v1
+#define xen_kexec_load_t xen_kexec_load_v1_t
+
+#endif
+
 #endif /* _XEN_PUBLIC_KEXEC_H */
 
 /*
diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h
index 69141c4..3eb80a0 100644
--- a/xen/include/public/xen-compat.h
+++ b/xen/include/public/xen-compat.h
@@ -27,7 +27,7 @@
 #ifndef __XEN_PUBLIC_XEN_COMPAT_H__
 #define __XEN_PUBLIC_XEN_COMPAT_H__
 
-#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040300
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040400
 
 #if defined(__XEN__) || defined(__XEN_TOOLS__)
 /* Xen is built with matching headers and implements the latest interface. */
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 3/9] kexec: add infrastructure for handling kexec images

From: David Vrabel <david.vrabel@citrix.com>

Add the code needed to handle and load kexec images into Xen memory or
into the crash region.  This is needed for the new KEXEC_CMD_load and
KEXEC_CMD_unload hypercall sub-ops.

Much of this code is derived from the Linux kernel.
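
For illustration only (not part of this patch), the entry list that the
new code builds can be read like this: an IND_DESTINATION entry sets the
current destination address, and each following IND_SOURCE entry names a
page to be copied there, after which the destination advances by one
page.  The sketch below looks up the source page queued for a given
destination.  It assumes a single flat array (real lists chain pages
with IND_INDIRECTION entries), and the DEMO_* flag values and page size
are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE        0x1000u
#define DEMO_PAGE_MASK        (~(uint64_t)(DEMO_PAGE_SIZE - 1))
/* Assumed flag encodings, kept in the low bits of each entry. */
#define DEMO_IND_DESTINATION  0x1u
#define DEMO_IND_DONE         0x4u
#define DEMO_IND_SOURCE       0x8u

/*
 * Return the address of the source page queued for destination 'maddr',
 * or 0 if no source page targets it.  Simplified: the list is one flat
 * array terminated by a DONE (or zero) entry.
 */
static uint64_t demo_source_for_dest(const uint64_t *entries, uint64_t maddr)
{
    uint64_t destination = 0;
    size_t i;

    assert(entries != NULL);

    for (i = 0; entries[i] && !(entries[i] & DEMO_IND_DONE); i++) {
        uint64_t entry = entries[i];

        if (entry & DEMO_IND_DESTINATION) {
            /* Start of a new run of destination pages. */
            destination = entry & DEMO_PAGE_MASK;
        } else if (entry & DEMO_IND_SOURCE) {
            if (maddr == destination)
                return entry & DEMO_PAGE_MASK;
            /* Each source page fills one destination page. */
            destination += DEMO_PAGE_SIZE;
        }
    }
    return 0;
}
```

This is the same walk that kimage_dst_used() performs when it checks
whether a freshly allocated page is already the target of a queued copy.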

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/Makefile      |    1 +
 xen/common/kimage.c      |  817 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/kimage.h |   62 ++++
 3 files changed, 880 insertions(+), 0 deletions(-)
 create mode 100644 xen/common/kimage.c
 create mode 100644 xen/include/xen/kimage.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index fcb4a84..9ed0a39 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -11,6 +11,7 @@ obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
 obj-$(HAS_KEXEC) += kexec.o
+obj-$(HAS_KEXEC) += kimage.o
 obj-y += lib.o
 obj-y += memory.o
 obj-y += multicall.o
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
new file mode 100644
index 0000000..9783e5a
--- /dev/null
+++ b/xen/common/kimage.c
@@ -0,0 +1,817 @@
+/*
+ * Kexec Image
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Derived from kernel/kexec.c from Linux:
+ *
+ *   Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/errno.h>
+#include <xen/spinlock.h>
+#include <xen/guest_access.h>
+#include <xen/mm.h>
+#include <xen/kexec.h>
+#include <xen/kimage.h>
+
+#include <asm/page.h>
+
+/*
+ * When kexec transitions to the new kernel there is a one-to-one
+ * mapping between physical and virtual addresses.  On processors
+ * where you can disable the MMU this is trivial, and easy.  For
+ * others it is still a simple predictable page table to setup.
+ *
+ * The code for the transition from the current kernel to the new
+ * kernel is placed in the page-size control_code_buffer.  This memory
+ * must be identity mapped in the transition from virtual to physical
+ * addresses.
+ *
+ * The assembly stub in the control code buffer is passed a linked list
+ * of descriptor pages detailing the source pages of the new kernel,
+ * and the destination addresses of those source pages.  As this data
+ * structure is not used in the context of the current OS, it must
+ * be self-contained.
+ *
+ * The code has been made to work with highmem pages and will use a
+ * destination page in its final resting place (if it happens
+ * to allocate it).  The end product of this is that most of the
+ * physical address space, and most of RAM can be used.
+ *
+ * Future directions include:
+ *  - allocating a page table with the control code buffer identity
+ *    mapped, to simplify machine_kexec and make kexec_on_panic more
+ *    reliable.
+ */
+
+/*
+ * KIMAGE_NO_DEST is an impossible destination address..., for
+ * allocating pages whose destination address we do not care about.
+ */
+#define KIMAGE_NO_DEST (-1UL)
+
+/*
+ * Offset of the last entry in an indirection page.
+ */
+#define KIMAGE_LAST_ENTRY (PAGE_SIZE/sizeof(kimage_entry_t) - 1)
+
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       paddr_t start, paddr_t end);
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                           paddr_t dest);
+
+static struct page_info *kimage_alloc_zeroed_page(unsigned memflags)
+{
+    struct page_info *page;
+
+    page = alloc_domheap_page(NULL, memflags);
+    if ( !page )
+        return NULL;
+
+    clear_domain_page(page_to_mfn(page));
+
+    return page;
+}
+
+static int do_kimage_alloc(struct kexec_image **rimage, paddr_t entry,
+                           unsigned long nr_segments,
+                           xen_kexec_segment_t *segments, uint8_t type)
+{
+    struct kexec_image *image;
+    unsigned long i;
+    int result;
+
+    /* Allocate a controlling structure */
+    result = -ENOMEM;
+    image = xzalloc(typeof(*image));
+    if ( !image )
+        goto out;
+
+    image->entry_maddr = entry;
+    image->type = type;
+    image->nr_segments = nr_segments;
+    image->segments = segments;
+
+    image->next_crash_page = kexec_crash_area.start;
+
+    INIT_PAGE_LIST_HEAD(&image->control_pages);
+    INIT_PAGE_LIST_HEAD(&image->dest_pages);
+    INIT_PAGE_LIST_HEAD(&image->unusable_pages);
+
+    /*
+     * Verify we have good destination addresses.  The caller is
+     * responsible for making certain we don't attempt to load the new
+     * image into invalid or reserved areas of RAM.  This just
+     * verifies it is an address we can use.
+     *
+     * Since the kernel does everything in page size chunks ensure the
+     * destination addresses are page aligned.  Too many special cases
+     * crop up when we don't do this.  The most insidious is getting
+     * overlapping destination addresses simply because addresses are
+     * changed to page size granularity.
+     */
+    result = -EADDRNOTAVAIL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        paddr_t mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
+            goto out;
+    }
+
+    /*
+     * Verify our destination addresses do not overlap.  If we allowed
+     * overlapping destination addresses through very weird things can
+     * happen with no easy explanation as one segment stomps on
+     * another.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        paddr_t mstart, mend;
+        unsigned long j;
+
+        mstart = image->segments[i].dest_maddr;
+        mend   = mstart + image->segments[i].dest_size;
+        for (j = 0; j < i; j++ )
+        {
+            paddr_t pstart, pend;
+            pstart = image->segments[j].dest_maddr;
+            pend   = pstart + image->segments[j].dest_size;
+            /* Do the segments overlap? */
+            if ( (mend > pstart) && (mstart < pend) )
+                goto out;
+        }
+    }
+
+    /*
+     * Ensure our buffer sizes are no larger than our memory
+     * sizes.  This should always be the case, and it is easier to
+     * check up front than to be surprised later on.
+     */
+    result = -EINVAL;
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        if ( image->segments[i].buf_size > image->segments[i].dest_size )
+            goto out;
+    }
+
+    /* 
+     * Page for the relocation code must still be accessible after the
+     * processor has switched to 32-bit mode.
+     */
+    result = -ENOMEM;
+    image->control_code_page = kimage_alloc_control_page(image, MEMF_bits(32));
+    if ( !image->control_code_page )
+        goto out;
+
+    /* Add an empty indirection page. */
+    image->entry_page = kimage_alloc_control_page(image, 0);
+    if ( !image->entry_page )
+        goto out;
+
+    image->head = page_to_maddr(image->entry_page);
+    image->next_entry = 0;
+
+    result = 0;
+out:
+    if ( result == 0 )
+        *rimage = image;
+    else
+        kimage_free(image);
+
+    return result;
+
+}
+
+static int kimage_normal_alloc(struct kexec_image **rimage, paddr_t entry,
+                               unsigned long nr_segments,
+                               xen_kexec_segment_t *segments)
+{
+    return do_kimage_alloc(rimage, entry, nr_segments, segments,
+                           KEXEC_TYPE_DEFAULT);
+}
+
+static int kimage_crash_alloc(struct kexec_image **rimage, paddr_t entry,
+                              unsigned long nr_segments,
+                              xen_kexec_segment_t *segments)
+{
+    unsigned long i;
+    int result;
+
+    /* Verify we have a valid entry point */
+    if ( (entry < kexec_crash_area.start)
+         || (entry > kexec_crash_area.start + kexec_crash_area.size))
+        return -EADDRNOTAVAIL;
+
+    /*
+     * Verify we have good destination addresses.  Normally
+     * the caller is responsible for making certain we don't
+     * attempt to load the new image into invalid or reserved
+     * areas of RAM.  But crash kernels are preloaded into a
+     * reserved area of RAM.  We must ensure the addresses
+     * are in the reserved area otherwise preloading the
+     * kernel could corrupt things.
+     */
+    for ( i = 0; i < nr_segments; i++ )
+    {
+        paddr_t mstart, mend;
+
+        if ( guest_handle_is_null(segments[i].buf.h) )
+            continue;
+
+        mstart = segments[i].dest_maddr;
+        mend = mstart + segments[i].dest_size - 1;
+        /* Ensure we are within the crash kernel limits. */
+        if ( (mstart < kexec_crash_area.start )
+             || (mend > kexec_crash_area.start + kexec_crash_area.size))
+            return -EADDRNOTAVAIL;
+    }
+
+    /* Allocate and initialize a controlling structure. */
+    result = do_kimage_alloc(rimage, entry, nr_segments, segments,
+                             KEXEC_TYPE_CRASH);
+    if ( result )
+        return result;
+
+    return 0;
+}
+
+static int kimage_is_destination_range(struct kexec_image *image,
+                                       paddr_t start,
+                                       paddr_t end)
+{
+    unsigned long i;
+
+    for ( i = 0; i < image->nr_segments; i++ )
+    {
+        paddr_t mstart, mend;
+
+        mstart = image->segments[i].dest_maddr;
+        mend = mstart + image->segments[i].dest_size;
+        if ( (end > mstart) && (start < mend) )
+            return 1;
+    }
+
+    return 0;
+}
+
+static void kimage_free_page_list(struct page_list_head *list)
+{
+    struct page_info *page, *next;
+
+    page_list_for_each_safe(page, next, list)
+    {
+        page_list_del(page, list);
+        free_domheap_page(page);
+    }
+}
+
+static struct page_info *kimage_alloc_normal_control_page(
+    struct kexec_image *image, unsigned memflags)
+{
+    /*
+     * Control pages are special, they are the intermediaries that are
+     * needed while we copy the rest of the pages to their final
+     * resting place.  As such they must not conflict with either the
+     * destination addresses or memory the kernel is already using.
+     *
+     * The only case where we really need more than one of these are
+     * for architectures where we cannot disable the MMU and must
+     * instead generate an identity mapped page table for all of the
+     * memory.
+     *
+     * At worst this runs in O(N) of the image size.
+     */
+    struct page_list_head extra_pages;
+    struct page_info *page = NULL;
+
+    INIT_PAGE_LIST_HEAD(&extra_pages);
+
+    /*
+     * Loop while I can allocate a page and the page allocated is a
+     * destination page.
+     */
+    do {
+        unsigned long mfn, emfn;
+        paddr_t addr, eaddr;
+
+        page = kimage_alloc_zeroed_page(memflags);
+        if ( !page )
+            break;
+        mfn   = page_to_mfn(page);
+        emfn  = mfn + 1;
+        addr  = page_to_maddr(page);
+        eaddr = addr + PAGE_SIZE;
+        if ( kimage_is_destination_range(image, addr, eaddr) )
+        {
+            page_list_add(page, &extra_pages);
+            page = NULL;
+        }
+    } while ( !page );
+
+    if ( page )
+    {
+        /* Remember the allocated page... */
+        page_list_add(page, &image->control_pages);
+
+        /*
+         * Because the page is already in its destination location we
+         * will never allocate another page at that address.
+         * Therefore kimage_alloc_page will not return it (again) and
+         * we don't need to give it an entry in image->segments[].
+         */
+    }
+    /*
+     * Deal with the destination pages I have inadvertently allocated.
+     *
+     * Ideally I would convert multi-page allocations into single page
+     * allocations, and add everything to image->dest_pages.
+     *
+     * For now it is simpler to just free the pages.
+     */
+    kimage_free_page_list(&extra_pages);
+
+    return page;
+}
+
+static struct page_info *kimage_alloc_crash_control_page(struct kexec_image *image)
+{
+    /*
+     * Control pages are special, they are the intermediaries that are
+     * needed while we copy the rest of the pages to their final
+     * resting place.  As such they must not conflict with either the
+     * destination addresses or memory the kernel is already using.
+     *
+     * Control pages are also the only pages we must allocate when
+     * loading a crash kernel.  All of the other pages are specified
+     * by the segments and we just memcpy into them directly.
+     *
+     * The only case where we really need more than one of these are
+     * for architectures where we cannot disable the MMU and must
+     * instead generate an identity mapped page table for all of the
+     * memory.
+     *
+     * Given the low demand this implements a very simple allocator
+     * that finds the first hole of the appropriate size in the
+     * reserved memory region, and allocates all of the memory up to
+     * and including the hole.
+     */
+    paddr_t hole_start, hole_end;
+    struct page_info *page = NULL;
+
+    hole_start = PAGE_ALIGN(image->next_crash_page);
+    hole_end   = hole_start + PAGE_SIZE;
+    while ( hole_end <= kexec_crash_area.start + kexec_crash_area.size )
+    {
+        unsigned long i;
+
+        /* See if I overlap any of the segments. */
+        for ( i = 0; i < image->nr_segments; i++ )
+        {
+            paddr_t mstart, mend;
+
+            mstart = image->segments[i].dest_maddr;
+            mend   = mstart + image->segments[i].dest_size;
+            if ( (hole_end > mstart) && (hole_start < mend) )
+            {
+                /* Advance the hole to the end of the segment. */
+                hole_start = PAGE_ALIGN(mend);
+                hole_end   = hole_start + PAGE_SIZE;
+                break;
+            }
+        }
+        /* If I don't overlap any segments I have found my hole! */
+        if ( i == image->nr_segments )
+        {
+            page = maddr_to_page(hole_start);
+            break;
+        }
+    }
+    if ( page )
+    {
+        image->next_crash_page = hole_end;
+        clear_domain_page(page_to_mfn(page));
+    }
+
+    return page;
+}
+
+
+struct page_info *kimage_alloc_control_page(struct kexec_image *image,
+                                            unsigned memflags)
+{
+    struct page_info *pages = NULL;
+
+    switch ( image->type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        pages = kimage_alloc_normal_control_page(image, memflags);
+        break;
+    case KEXEC_TYPE_CRASH:
+        pages = kimage_alloc_crash_control_page(image);
+        break;
+    }
+    return pages;
+}
+
+static int kimage_add_entry(struct kexec_image *image, kimage_entry_t entry)
+{
+    kimage_entry_t *entries;
+
+    if ( image->next_entry == KIMAGE_LAST_ENTRY )
+    {
+        struct page_info *page;
+
+        page = kimage_alloc_page(image, KIMAGE_NO_DEST);
+        if ( !page )
+            return -ENOMEM;
+
+        entries = __map_domain_page(image->entry_page);
+        entries[image->next_entry] = page_to_maddr(page) | IND_INDIRECTION;
+        unmap_domain_page(entries);
+
+        image->entry_page = page;
+        image->next_entry = 0;
+    }
+
+    entries = __map_domain_page(image->entry_page);
+    entries[image->next_entry] = entry;
+    image->next_entry++;
+    unmap_domain_page(entries);
+
+    return 0;
+}
+
+static int kimage_set_destination(struct kexec_image *image,
+                                  paddr_t destination)
+{
+    return kimage_add_entry(image, (destination & PAGE_MASK) | IND_DESTINATION);
+}
+
+
+static int kimage_add_page(struct kexec_image *image, paddr_t maddr)
+{
+    return kimage_add_entry(image, (maddr & PAGE_MASK) | IND_SOURCE);
+}
+
+
+static void kimage_free_extra_pages(struct kexec_image *image)
+{
+    kimage_free_page_list(&image->dest_pages);
+    kimage_free_page_list(&image->unusable_pages);
+}
+
+static void kimage_terminate(struct kexec_image *image)
+{
+    kimage_entry_t *entries;
+
+    entries = __map_domain_page(image->entry_page);
+    entries[image->next_entry] = IND_DONE;
+    unmap_domain_page(entries);
+}
+
+/*
+ * Iterate over all the entries in the indirection pages.
+ *
+ * Call unmap_domain_page(ptr) after the loop exits.
+ */
+#define for_each_kimage_entry(image, ptr, entry)                        \
+    for ( ptr = map_domain_page(image->head >> PAGE_SHIFT);             \
+          (entry = *ptr) && !(entry & IND_DONE);                        \
+          ptr = (entry & IND_INDIRECTION) ?                             \
+              (unmap_domain_page(ptr), map_domain_page(entry >> PAGE_SHIFT)) \
+              : ptr + 1 )
+
+static void kimage_free_entry(kimage_entry_t entry)
+{
+    struct page_info *page;
+
+    page = mfn_to_page(entry >> PAGE_SHIFT);
+    free_domheap_page(page);
+}
+
+void kimage_free(struct kexec_image *image)
+{
+    kimage_entry_t *ptr, entry;
+    kimage_entry_t ind = 0;
+
+    if ( !image )
+        return;
+
+    kimage_free_extra_pages(image);
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_INDIRECTION )
+        {
+            /* Free the previous indirection page */
+            if ( ind & IND_INDIRECTION )
+                kimage_free_entry(ind);
+            /* Save this indirection page until we are done with it. */
+            ind = entry;
+        }
+        else if ( entry & IND_SOURCE )
+            kimage_free_entry(entry);
+    }
+    unmap_domain_page(ptr);
+
+    /* Free the final indirection page. */
+    if ( ind & IND_INDIRECTION )
+        kimage_free_entry(ind);
+
+    /* Free the kexec control pages. */
+    kimage_free_page_list(&image->control_pages);
+    xfree(image->segments);
+    xfree(image);
+}
+
+static kimage_entry_t *kimage_dst_used(struct kexec_image *image,
+                                       paddr_t maddr)
+{
+    kimage_entry_t *ptr, entry;
+    unsigned long destination = 0;
+
+    for_each_kimage_entry(image, ptr, entry)
+    {
+        if ( entry & IND_DESTINATION )
+            destination = entry & PAGE_MASK;
+        else if ( entry & IND_SOURCE )
+        {
+            if ( maddr == destination )
+                return ptr;
+            destination += PAGE_SIZE;
+        }
+    }
+    unmap_domain_page(ptr);
+
+    return NULL;
+}
+
+static struct page_info *kimage_alloc_page(struct kexec_image *image,
+                                           paddr_t destination)
+{
+    /*
+     * Here we implement safeguards to ensure that a source page is
+     * not copied to its destination page before the data on the
+     * destination page is no longer useful.
+     *
+     * To do this we maintain the invariant that a source page is
+     * either its own destination page, or it is not a destination
+     * page at all.
+     *
+     * That is slightly stronger than required, but the proof that no
+     * problems will occur is trivial, and the implementation is
+     * simple to verify.
+     *
+     * When allocating all pages normally this algorithm will run in
+     * O(N) time, but in the worst case it will run in O(N^2) time.
+     * If the runtime is a problem the data structures can be fixed.
+     */
+    struct page_info *page;
+    paddr_t addr;
+
+    /*
+     * Walk through the list of destination pages, and see if I have a
+     * match.
+     */
+    page_list_for_each(page, &image->dest_pages)
+    {
+        addr = page_to_maddr(page);
+        if ( addr == destination )
+        {
+            page_list_del(page, &image->dest_pages);
+            return page;
+        }
+    }
+    page = NULL;
+    for (;;)
+    {
+        kimage_entry_t *old;
+
+        /* Allocate a page, if we run out of memory give up. */
+        page = kimage_alloc_zeroed_page(0);
+        if ( !page )
+            return NULL;
+        addr = page_to_maddr(page);
+
+        /* If it is the destination page we want use it. */
+        if ( addr == destination )
+            break;
+
+        /* If the page is not a destination page use it. */
+        if ( !kimage_is_destination_range(image, addr,
+                                          addr + PAGE_SIZE) )
+            break;
+
+        /*
+         * I know that the page is someone's destination page.  See if
+         * there is already a source page for this destination page.
+         * And if so swap the source pages.
+         */
+        old = kimage_dst_used(image, addr);
+        if ( old )
+        {
+            /* If so move it. */
+            unsigned long old_mfn = *old >> PAGE_SHIFT;
+            unsigned long mfn = addr >> PAGE_SHIFT;
+
+            copy_domain_page(mfn, old_mfn);
+            clear_domain_page(old_mfn);
+            *old = (addr & PAGE_MASK) | IND_SOURCE;
+            unmap_domain_page(old);
+
+            page = mfn_to_page(old_mfn);
+            break;
+        }
+        else
+        {
+            /*
+             * Place the page on the destination list; I will use it
+             * later.
+             */
+            page_list_add(page, &image->dest_pages);
+        }
+    }
+    return page;
+}
+
+static int kimage_load_normal_segment(struct kexec_image *image,
+                                      xen_kexec_segment_t *segment)
+{
+    unsigned long to_copy;
+    unsigned long src_offset;
+    paddr_t dest, end;
+    int ret;
+
+    to_copy = segment->buf_size;
+    src_offset = 0;
+    dest = segment->dest_maddr;
+
+    ret = kimage_set_destination(image, dest);
+    if ( ret < 0 )
+        return ret;
+
+    while ( to_copy )
+    {
+        unsigned long dest_mfn;
+        struct page_info *page;
+        void *dest_va;
+        size_t size;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+
+        size = min_t(unsigned long, PAGE_SIZE, to_copy);
+
+        page = kimage_alloc_page(image, dest);
+        if ( !page )
+            return -ENOMEM;
+        ret = kimage_add_page(image, page_to_maddr(page));
+        if ( ret < 0 )
+            return ret;
+
+        dest_va = __map_domain_page(page);
+        ret = copy_from_guest_offset(dest_va, segment->buf.h, src_offset, size);
+        unmap_domain_page(dest_va);
+        if ( ret )
+            return -EFAULT;
+
+        to_copy -= size;
+        src_offset += size;
+        dest += PAGE_SIZE;
+    }
+
+    /* Remainder of the destination should be zeroed. */
+    end = segment->dest_maddr + segment->dest_size;
+    for ( ; dest < end; dest += PAGE_SIZE )
+        kimage_add_entry(image, IND_ZERO);
+
+    return 0;
+}
+
+static int kimage_load_crash_segment(struct kexec_image *image,
+                                     xen_kexec_segment_t *segment)
+{
+    /*
+     * For crash dump kernels we simply copy the data from user space
+     * to its destination.
+     */
+    paddr_t dest;
+    unsigned long sbytes, dbytes;
+    int ret = 0;
+    unsigned long src_offset = 0;
+
+    sbytes = segment->buf_size;
+    dbytes = segment->dest_size;
+    dest = segment->dest_maddr;
+
+    while ( dbytes )
+    {
+        unsigned long dest_mfn;
+        void *dest_va;
+        size_t schunk, dchunk;
+
+        dest_mfn = dest >> PAGE_SHIFT;
+
+        dchunk = PAGE_SIZE;
+        schunk = min(dchunk, sbytes);
+
+        dest_va = map_domain_page(dest_mfn);
+        if ( !dest_va )
+            return -EINVAL;
+
+        ret = copy_from_guest_offset(dest_va, segment->buf.h, src_offset, schunk);
+        memset(dest_va + schunk, 0, dchunk - schunk);
+
+        unmap_domain_page(dest_va);
+        if ( ret )
+            return -EFAULT;
+
+        dbytes -= dchunk;
+        sbytes -= schunk;
+        dest += dchunk;
+        src_offset += schunk;
+    }
+
+    return 0;
+}
+
+static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
+{
+    int result = -ENOMEM;
+
+    if ( !guest_handle_is_null(segment->buf.h) )
+    {
+        switch ( image->type )
+        {
+        case KEXEC_TYPE_DEFAULT:
+            result = kimage_load_normal_segment(image, segment);
+            break;
+        case KEXEC_TYPE_CRASH:
+            result = kimage_load_crash_segment(image, segment);
+            break;
+        }
+    }
+
+    return result;
+}
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment)
+{
+    int result;
+
+    switch( type )
+    {
+    case KEXEC_TYPE_DEFAULT:
+        result = kimage_normal_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    case KEXEC_TYPE_CRASH:
+        result = kimage_crash_alloc(rimage, entry_maddr, nr_segments, segment);
+        break;
+    default:
+        result = -EINVAL;
+        break;
+    }
+    if ( result < 0 )
+        return result;
+
+    (*rimage)->arch = arch;
+
+    return result;
+}
+
+int kimage_load_segments(struct kexec_image *image)
+{
+    int s;
+    int result;
+
+    for ( s = 0; s < image->nr_segments; s++ ) {
+        result = kimage_load_segment(image, &image->segments[s]);
+        if ( result < 0 )
+            return result;
+    }
+    kimage_terminate(image);
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
new file mode 100644
index 0000000..0ebd37a
--- /dev/null
+++ b/xen/include/xen/kimage.h
@@ -0,0 +1,62 @@
+#ifndef __XEN_KIMAGE_H__
+#define __XEN_KIMAGE_H__
+
+#define IND_DESTINATION  0x1
+#define IND_INDIRECTION  0x2
+#define IND_DONE         0x4
+#define IND_SOURCE       0x8
+#define IND_ZERO        0x10
+
+#ifndef __ASSEMBLY__
+
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <public/kexec.h>
+
+#define KEXEC_SEGMENT_MAX 16
+
+typedef paddr_t kimage_entry_t;
+
+struct kexec_image {
+    uint8_t type;
+    uint16_t arch;
+    uint64_t entry_maddr;
+    uint32_t nr_segments;
+    xen_kexec_segment_t *segments;
+
+    kimage_entry_t head;
+    struct page_info *entry_page;
+    unsigned next_entry;
+
+    struct page_info *control_code_page;
+    struct page_info *aux_page;
+
+    struct page_list_head control_pages;
+    struct page_list_head dest_pages;
+    struct page_list_head unusable_pages;
+
+    /* Address of next control page to allocate for crash kernels. */
+    paddr_t next_crash_page;
+};
+
+int kimage_alloc(struct kexec_image **rimage, uint8_t type, uint16_t arch,
+                 uint64_t entry_maddr,
+                 uint32_t nr_segments, xen_kexec_segment_t *segment);
+void kimage_free(struct kexec_image *image);
+int kimage_load_segments(struct kexec_image *image);
+struct page_info *kimage_alloc_control_page(struct kexec_image *image,
+                                            unsigned memflags);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __XEN_KIMAGE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 4/9] kexec: extend hypercall with improved load/unload ops

From: David Vrabel <david.vrabel@citrix.com>

In the existing kexec hypercall, the load and unload ops depend on
internals of the Linux kernel (the page list and code page provided by
the kernel).  The code page is used to transition between Xen context
and the image so using kernel code doesn't make sense and will not
work for PVH guests.

Add replacement KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload ops
that no longer require a code page to be provided by the guest -- Xen
now provides the code for calling the image directly.

The new load op looks similar to the Linux kexec_load system call and
allows the guest to provide the image data to be loaded.  The guest
specifies the architecture of the image, which may be a 32-bit subarch
of the hypervisor's architecture (e.g., an EM_386 image on an
EM_X86_64 hypervisor).

The toolstack can now load images without kernel involvement.  This is
required for supporting kexec when using a dom0 with an upstream
kernel.

Crash images are copied directly into the crash region on load.
Default images are copied into domheap pages and a list of source and
destination machine addresses is created.  This list is used in
kexec_reloc() to relocate the image to its destination.
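
As a rough illustration of how that list is consumed, the following is a
user-space C sketch of the walk performed by relocate_pages in
kexec_reloc.S.  It is not the hypervisor code: a byte array stands in for
machine memory, and relocate_pages_sketch, mem and mem_size are hypothetical
names.  The IND_* values are the ones defined in xen/include/xen/kimage.h by
this series, and head is taken to be the address of the first entry, as in
the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uint64_t)(PAGE_SIZE - 1))

/* Flag values from xen/include/xen/kimage.h in this series. */
#define IND_DESTINATION  0x1
#define IND_INDIRECTION  0x2
#define IND_DONE         0x4
#define IND_SOURCE       0x8
#define IND_ZERO        0x10

/*
 * Hypothetical helper: walk the entry list that starts at offset `head`
 * inside `mem`, which stands in for machine memory.  Each entry is a
 * page-aligned address with one IND_* flag in its low bits.
 */
static void relocate_pages_sketch(uint8_t *mem, size_t mem_size, uint64_t head)
{
    uint64_t next = head; /* address of the next entry to read */
    uint64_t dest = 0;    /* current destination address */

    for ( ;; )
    {
        uint64_t entry;

        assert(next + sizeof(entry) <= mem_size);
        memcpy(&entry, mem + next, sizeof(entry));
        next += sizeof(entry);

        if ( entry & IND_DESTINATION )
            dest = entry & PAGE_MASK;        /* start a new destination run */
        else if ( entry & IND_INDIRECTION )
            next = entry & PAGE_MASK;        /* continue in another entry page */
        else if ( entry & IND_DONE )
            return;
        else if ( entry & IND_SOURCE )
        {
            uint64_t src = entry & PAGE_MASK;

            assert(src + PAGE_SIZE <= mem_size);
            assert(dest + PAGE_SIZE <= mem_size);
            memcpy(mem + dest, mem + src, PAGE_SIZE);
            dest += PAGE_SIZE;               /* rep movsq advances %rdi */
        }
        else if ( entry & IND_ZERO )
        {
            assert(dest + PAGE_SIZE <= mem_size);
            memset(mem + dest, 0, PAGE_SIZE);
            dest += PAGE_SIZE;               /* rep stosq advances %rdi */
        }
    }
}
```

A destination entry only sets the write pointer.  Each following source or
zero entry fills one page and advances it, so one destination entry can be
followed by many source entries.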

The old load and unload sub-ops are still available (as
KEXEC_CMD_load_v1 and KEXEC_CMD_unload_v1) and are implemented on top
of the new infrastructure.
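
To give an idea of what "on top of the new infrastructure" involves, the v1
path has to turn the destination pages named in the old indirection list into
segments.  The sketch below coalesces physically contiguous pages into one
segment, modelled on kexec_segments_add_segment() in this patch.  It is a
simplified user-space illustration: struct segment_sketch and add_dest_page
are hypothetical names, and growing dest_size inside the helper is an
assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   4096u
#define SEGMENT_MAX 16 /* mirrors KEXEC_SEGMENT_MAX */

/* Hypothetical cut-down segment: only the destination range is kept. */
struct segment_sketch {
    uint64_t dest_maddr;
    uint64_t dest_size;
};

/*
 * Hypothetical helper: record one destination page.  A new segment is
 * started unless the page directly follows the end of the last segment.
 * Returns 0 on success, -1 if SEGMENT_MAX would be exceeded.
 */
static int add_dest_page(unsigned *nr, struct segment_sketch *segs,
                         unsigned long mfn)
{
    uint64_t maddr = (uint64_t)mfn << PAGE_SHIFT;
    unsigned n;

    assert(nr != NULL && segs != NULL);
    n = *nr;

    if ( n == 0 || segs[n-1].dest_maddr + segs[n-1].dest_size != maddr )
    {
        if ( n == SEGMENT_MAX )
            return -1;
        segs[n].dest_maddr = maddr;
        segs[n].dest_size = 0;
        *nr = ++n;
    }

    /* Assumption: grow the segment here rather than in the caller. */
    segs[n-1].dest_size += PAGE_SIZE;

    return 0;
}
```

For example, adding MFNs 0x100, 0x101 and 0x200 in that order yields two
segments: one of two pages at 0x100000 and one of a single page at 0x200000.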

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/machine_kexec.c        |  192 +++++++++++-------
 xen/arch/x86/x86_64/Makefile        |    2 +-
 xen/arch/x86/x86_64/compat_kexec.S  |  187 -----------------
 xen/arch/x86/x86_64/kexec_reloc.S   |  206 ++++++++++++++++++
 xen/common/kexec.c                  |  393 +++++++++++++++++++++++++++++------
 xen/common/kimage.c                 |  123 +++++++++++-
 xen/include/asm-x86/fixmap.h        |    3 -
 xen/include/asm-x86/machine_kexec.h |   16 ++
 xen/include/xen/kexec.h             |   16 +-
 xen/include/xen/kimage.h            |    6 +
 10 files changed, 809 insertions(+), 335 deletions(-)
 delete mode 100644 xen/arch/x86/x86_64/compat_kexec.S
 create mode 100644 xen/arch/x86/x86_64/kexec_reloc.S
 create mode 100644 xen/include/asm-x86/machine_kexec.h

diff --git a/xen/arch/x86/machine_kexec.c b/xen/arch/x86/machine_kexec.c
index 68b9705..b70d5a6 100644
--- a/xen/arch/x86/machine_kexec.c
+++ b/xen/arch/x86/machine_kexec.c
@@ -1,9 +1,18 @@
 /******************************************************************************
  * machine_kexec.c
  *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/machine_kexec_64.c.
+ *
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
  * Xen port written by:
  * - Simon 'Horms' Horman <horms@verge.net.au>
  * - Magnus Damm <magnus@valinux.co.jp>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
  */
 
 #include <xen/types.h>
@@ -11,63 +20,124 @@
 #include <xen/guest_access.h>
 #include <asm/fixmap.h>
 #include <asm/hpet.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
 
-typedef void (*relocate_new_kernel_t)(
-                unsigned long indirection_page,
-                unsigned long *page_list,
-                unsigned long start_address,
-                unsigned int preserve_context);
-
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image)
+/*
+ * Add a mapping for a page to the page tables used during kexec.
+ */
+int machine_kexec_add_page(struct kexec_image *image, unsigned long vaddr,
+                           unsigned long maddr)
 {
-    unsigned long prev_ma = 0;
-    int fix_base = FIX_KEXEC_BASE_0 + (slot * (KEXEC_XEN_NO_PAGES >> 1));
-    int k;
+    struct page_info *l4_page;
+    struct page_info *l3_page;
+    struct page_info *l2_page;
+    struct page_info *l1_page;
+    l4_pgentry_t *l4 = NULL;
+    l3_pgentry_t *l3 = NULL;
+    l2_pgentry_t *l2 = NULL;
+    l1_pgentry_t *l1 = NULL;
+    int ret = -ENOMEM;
+
+    l4_page = image->aux_page;
+    if ( !l4_page )
+    {
+        l4_page = kimage_alloc_control_page(image, 0);
+        if ( !l4_page )
+            goto out;
+        image->aux_page = l4_page;
+    }
 
-    /* setup fixmap to point to our pages and record the virtual address
-     * in every odd index in page_list[].
-     */
+    l4 = __map_domain_page(l4_page);
+    l4 += l4_table_offset(vaddr);
+    if ( !(l4e_get_flags(*l4) & _PAGE_PRESENT) )
+    {
+        l3_page = kimage_alloc_control_page(image, 0);
+        if ( !l3_page )
+            goto out;
+        l4e_write(l4, l4e_from_page(l3_page, __PAGE_HYPERVISOR));
+    }
+    else
+        l3_page = l4e_get_page(*l4);
+
+    l3 = __map_domain_page(l3_page);
+    l3 += l3_table_offset(vaddr);
+    if ( !(l3e_get_flags(*l3) & _PAGE_PRESENT) )
+    {
+        l2_page = kimage_alloc_control_page(image, 0);
+        if ( !l2_page )
+            goto out;
+        l3e_write(l3, l3e_from_page(l2_page, __PAGE_HYPERVISOR));
+    }
+    else
+        l2_page = l3e_get_page(*l3);
+
+    l2 = __map_domain_page(l2_page);
+    l2 += l2_table_offset(vaddr);
+    if ( !(l2e_get_flags(*l2) & _PAGE_PRESENT) )
+    {
+        l1_page = kimage_alloc_control_page(image, 0);
+        if ( !l1_page )
+            goto out;
+        l2e_write(l2, l2e_from_page(l1_page, __PAGE_HYPERVISOR));
+    }
+    else
+        l1_page = l2e_get_page(*l2);
+
+    l1 = __map_domain_page(l1_page);
+    l1 += l1_table_offset(vaddr);
+    l1e_write(l1, l1e_from_pfn(maddr >> PAGE_SHIFT, __PAGE_HYPERVISOR));
+
+    ret = 0;
+out:
+    if ( l1 )
+        unmap_domain_page(l1);
+    if ( l2 )
+        unmap_domain_page(l2);
+    if ( l3 )
+        unmap_domain_page(l3);
+    if ( l4 )
+        unmap_domain_page(l4);
+    return ret;
+}
 
-    for ( k = 0; k < KEXEC_XEN_NO_PAGES; k++ )
+int machine_kexec_load(struct kexec_image *image)
+{
+    void *code_page;
+    int ret;
+
+    switch ( image->arch )
     {
-        if ( (k & 1) == 0 )
-        {
-            /* Even pages: machine address. */
-            prev_ma = image->page_list[k];
-        }
-        else
-        {
-            /* Odd pages: va for previous ma. */
-            if ( is_pv_32on64_domain(dom0) )
-            {
-                /*
-                 * The compatability bounce code sets up a page table
-                 * with a 1-1 mapping of the first 1G of memory so
-                 * VA==PA here.
-                 *
-                 * This Linux purgatory code still sets up separate
-                 * high and low mappings on the control page (entries
-                 * 0 and 1) but it is harmless if they are equal since
-                 * that PT is not live at the time.
-                 */
-                image->page_list[k] = prev_ma;
-            }
-            else
-            {
-                set_fixmap(fix_base + (k >> 1), prev_ma);
-                image->page_list[k] = fix_to_virt(fix_base + (k >> 1));
-            }
-        }
+    case EM_386:
+    case EM_X86_64:
+        break;
+    default:
+        return -EINVAL;
     }
 
+    code_page = __map_domain_page(image->control_code_page);
+    memcpy(code_page, kexec_reloc, kexec_reloc_size);
+    unmap_domain_page(code_page);
+
+    /*
+     * Add a mapping for the control code page to the same virtual
+     * address as kexec_reloc.  This allows us to keep running after
+     * these page tables are loaded in kexec_reloc.
+     */
+    ret = machine_kexec_add_page(image, (unsigned long)kexec_reloc,
+                                 page_to_maddr(image->control_code_page));
+    if ( ret < 0 )
+        return ret;
+
     return 0;
 }
 
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image)
+void machine_kexec_unload(struct kexec_image *image)
 {
+    /* no-op. kimage_free() frees all control pages. */
 }
 
-void machine_reboot_kexec(xen_kexec_image_t *image)
+void machine_reboot_kexec(struct kexec_image *image)
 {
     BUG_ON(smp_processor_id() != 0);
     smp_send_stop();
@@ -75,13 +145,10 @@ void machine_reboot_kexec(xen_kexec_image_t *image)
     BUG();
 }
 
-void machine_kexec(xen_kexec_image_t *image)
+void machine_kexec(struct kexec_image *image)
 {
-    struct desc_ptr gdt_desc = {
-        .base = (unsigned long)(boot_cpu_gdt_table - FIRST_RESERVED_GDT_ENTRY),
-        .limit = LAST_RESERVED_GDT_BYTE
-    };
     int i;
+    unsigned long reloc_flags = 0;
 
     /* We are about to permenantly jump out of the Xen context into the kexec
      * purgatory code.  We really dont want to be still servicing interupts.
@@ -109,29 +176,12 @@ void machine_kexec(xen_kexec_image_t *image)
      * not like running with NMIs disabled. */
     enable_nmis();
 
-    /*
-     * compat_machine_kexec() returns to idle pagetables, which requires us
-     * to be running on a static GDT mapping (idle pagetables have no GDT
-     * mappings in their per-domain mapping area).
-     */
-    asm volatile ( "lgdt %0" : : "m" (gdt_desc) );
+    if ( image->arch == EM_386 )
+        reloc_flags |= KEXEC_RELOC_FLAG_COMPAT;
 
-    if ( is_pv_32on64_domain(dom0) )
-    {
-        compat_machine_kexec(image->page_list[1],
-                             image->indirection_page,
-                             image->page_list,
-                             image->start_address);
-    }
-    else
-    {
-        relocate_new_kernel_t rnk;
-
-        rnk = (relocate_new_kernel_t) image->page_list[1];
-        (*rnk)(image->indirection_page, image->page_list,
-               image->start_address,
-               0 /* preserve_context */);
-    }
+    kexec_reloc(page_to_maddr(image->control_code_page),
+                page_to_maddr(image->aux_page),
+                image->head, image->entry_maddr, reloc_flags);
 }
 
 int machine_kexec_get(xen_kexec_range_t *range)
diff --git a/xen/arch/x86/x86_64/Makefile b/xen/arch/x86/x86_64/Makefile
index d56e12d..7f8fb3d 100644
--- a/xen/arch/x86/x86_64/Makefile
+++ b/xen/arch/x86/x86_64/Makefile
@@ -11,11 +11,11 @@ obj-y += mmconf-fam10h.o
 obj-y += mmconfig_64.o
 obj-y += mmconfig-shared.o
 obj-y += compat.o
-obj-bin-y += compat_kexec.o
 obj-y += domain.o
 obj-y += physdev.o
 obj-y += platform_hypercall.o
 obj-y += cpu_idle.o
 obj-y += cpufreq.o
+obj-bin-y += kexec_reloc.o
 
 obj-$(crash_debug)   += gdbstub.o
diff --git a/xen/arch/x86/x86_64/compat_kexec.S b/xen/arch/x86/x86_64/compat_kexec.S
deleted file mode 100644
index fc92af9..0000000
--- a/xen/arch/x86/x86_64/compat_kexec.S
+++ /dev/null
@@ -1,187 +0,0 @@
-/*
- * Compatibility kexec handler.
- */
-
-/*
- * NOTE: We rely on Xen not relocating itself above the 4G boundary. This is
- * currently true but if it ever changes then compat_pg_table will
- * need to be moved back below 4G at run time.
- */
-
-#include <xen/config.h>
-
-#include <asm/asm_defns.h>
-#include <asm/msr.h>
-#include <asm/page.h>
-
-/* The unrelocated physical address of a symbol. */
-#define SYM_PHYS(sym)          ((sym) - __XEN_VIRT_START)
-
-/* Load physical address of symbol into register and relocate it. */
-#define RELOCATE_SYM(sym,reg)  mov $SYM_PHYS(sym), reg ; \
-                               add xen_phys_start(%rip), reg
-
-/*
- * Relocate a physical address in memory. Size of temporary register
- * determines size of the value to relocate.
- */
-#define RELOCATE_MEM(addr,reg) mov addr(%rip), reg ; \
-                               add xen_phys_start(%rip), reg ; \
-                               mov reg, addr(%rip)
-
-        .text
-
-        .code64
-
-ENTRY(compat_machine_kexec)
-        /* x86/64                        x86/32  */
-        /* %rdi - relocate_new_kernel_t  CALL    */
-        /* %rsi - indirection page       4(%esp) */
-        /* %rdx - page_list              8(%esp) */
-        /* %rcx - start address         12(%esp) */
-        /*        cpu has pae           16(%esp) */
-
-        /* Shim the 64 bit page_list into a 32 bit page_list. */
-        mov $12,%r9
-        lea compat_page_list(%rip), %rbx
-1:      dec %r9
-        movl (%rdx,%r9,8),%eax
-        movl %eax,(%rbx,%r9,4)
-        test %r9,%r9
-        jnz 1b
-
-        RELOCATE_SYM(compat_page_list,%rdx)
-
-        /* Relocate compatibility mode entry point address. */
-        RELOCATE_MEM(compatibility_mode_far,%eax)
-
-        /* Relocate compat_pg_table. */
-        RELOCATE_MEM(compat_pg_table,     %rax)
-        RELOCATE_MEM(compat_pg_table+0x8, %rax)
-        RELOCATE_MEM(compat_pg_table+0x10,%rax)
-        RELOCATE_MEM(compat_pg_table+0x18,%rax)
-
-        /*
-         * Setup an identity mapped region in PML4[0] of idle page
-         * table.
-         */
-        RELOCATE_SYM(l3_identmap,%rax)
-        or  $0x63,%rax
-        mov %rax, idle_pg_table(%rip)
-
-        /* Switch to idle page table. */
-        RELOCATE_SYM(idle_pg_table,%rax)
-        movq %rax, %cr3
-
-        /* Switch to identity mapped compatibility stack. */
-        RELOCATE_SYM(compat_stack,%rax)
-        movq %rax, %rsp
-
-        /* Save xen_phys_start for 32 bit code. */
-        movq xen_phys_start(%rip), %rbx
-
-        /* Jump to low identity mapping in compatibility mode. */
-        ljmp *compatibility_mode_far(%rip)
-        ud2
-
-compatibility_mode_far:
-        .long SYM_PHYS(compatibility_mode)
-        .long __HYPERVISOR_CS32
-
-        /*
-         * We use 5 words of stack for the arguments passed to the kernel. The
-         * kernel only uses 1 word before switching to its own stack. Allocate
-         * 16 words to give "plenty" of room.
-         */
-        .fill 16,4,0
-compat_stack:
-
-        .code32
-
-#undef RELOCATE_SYM
-#undef RELOCATE_MEM
-
-/*
- * Load physical address of symbol into register and relocate it. %rbx
- * contains xen_phys_start(%rip) saved before jump to compatibility
- * mode.
- */
-#define RELOCATE_SYM(sym,reg) mov $SYM_PHYS(sym), reg ; \
-                              add %ebx, reg
-
-compatibility_mode:
-        /* Setup some sane segments. */
-        movl $__HYPERVISOR_DS32, %eax
-        movl %eax, %ds
-        movl %eax, %es
-        movl %eax, %fs
-        movl %eax, %gs
-        movl %eax, %ss
-
-        /* Push arguments onto stack. */
-        pushl $0   /* 20(%esp) - preserve context */
-        pushl $1   /* 16(%esp) - cpu has pae */
-        pushl %ecx /* 12(%esp) - start address */
-        pushl %edx /*  8(%esp) - page list */
-        pushl %esi /*  4(%esp) - indirection page */
-        pushl %edi /*  0(%esp) - CALL */
-
-        /* Disable paging and therefore leave 64 bit mode. */
-        movl %cr0, %eax
-        andl $~X86_CR0_PG, %eax
-        movl %eax, %cr0
-
-        /* Switch to 32 bit page table. */
-        RELOCATE_SYM(compat_pg_table, %eax)
-        movl  %eax, %cr3
-
-        /* Clear MSR_EFER[LME], disabling long mode */
-        movl    $MSR_EFER,%ecx
-        rdmsr
-        btcl    $_EFER_LME,%eax
-        wrmsr
-
-        /* Re-enable paging, but only 32 bit mode now. */
-        movl %cr0, %eax
-        orl $X86_CR0_PG, %eax
-        movl %eax, %cr0
-        jmp 1f
-1:
-
-        popl %eax
-        call *%eax
-        ud2
-
-        .data
-        .align 4
-compat_page_list:
-        .fill 12,4,0
-
-        .align 32,0
-
-        /*
-         * These compat page tables contain an identity mapping of the
-         * first 4G of the physical address space.
-         */
-compat_pg_table:
-        .long SYM_PHYS(compat_pg_table_l2) + 0*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 1*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 2*PAGE_SIZE + 0x01, 0
-        .long SYM_PHYS(compat_pg_table_l2) + 3*PAGE_SIZE + 0x01, 0
-
-        .section .data.page_aligned, "aw", @progbits
-        .align PAGE_SIZE,0
-compat_pg_table_l2:
-        .macro identmap from=0, count=512
-        .if \count-1
-        identmap "(\from+0)","(\count/2)"
-        identmap "(\from+(0x200000*(\count/2)))","(\count/2)"
-        .else
-        .quad 0x00000000000000e3 + \from
-        .endif
-        .endm
-
-        identmap 0x00000000
-        identmap 0x40000000
-        identmap 0x80000000
-        identmap 0xc0000000
diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
new file mode 100644
index 0000000..de75e04
--- /dev/null
+++ b/xen/arch/x86/x86_64/kexec_reloc.S
@@ -0,0 +1,206 @@
+/*
+ * Relocate a kexec_image to its destination and call it.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ *
+ * Portions derived from Linux's arch/x86/kernel/relocate_kernel_64.S.
+ *
+ *   Copyright (C) 2002-2005 Eric Biederman  <ebiederm@xmission.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+#include <xen/config.h>
+#include <xen/kimage.h>
+
+#include <asm/asm_defns.h>
+#include <asm/msr.h>
+#include <asm/page.h>
+#include <asm/machine_kexec.h>
+
+        .text
+        .align PAGE_SIZE
+        .code64
+
+ENTRY(kexec_reloc)
+        /* %rdi - code page maddr */
+        /* %rsi - page table maddr */
+        /* %rdx - indirection page maddr */
+        /* %rcx - entry maddr */
+        /* %r8 - flags */
+
+        /* Setup stack. */
+        leaq    (reloc_stack - kexec_reloc)(%rdi), %rsp
+
+        /* Load reloc page table. */
+        movq    %rsi, %cr3
+
+        /* Jump to identity mapped code. */
+        leaq    (identity_mapped - kexec_reloc)(%rdi), %rax
+        jmpq    *%rax
+
+identity_mapped:
+        /*
+         * Set cr0 to a known state:
+         *  - Paging enabled
+         *  - Alignment check disabled
+         *  - Write protect disabled
+         *  - No task switch
+         *  - Don't do FP software emulation.
+         *  - Protected mode enabled
+         */
+        movq    %cr0, %rax
+        andl    $~(X86_CR0_AM | X86_CR0_WP | X86_CR0_TS | X86_CR0_EM), %eax
+        orl     $(X86_CR0_PG | X86_CR0_PE), %eax
+        movq    %rax, %cr0
+
+        /*
+         * Set cr4 to a known state:
+         *  - physical address extension enabled
+         */
+        movl    $X86_CR4_PAE, %eax
+        movq    %rax, %cr4
+
+        pushq   %rdi
+        movq    %rdx, %rdi
+        call    relocate_pages
+        popq    %rdi
+
+        /* Need to switch to 32-bit mode? */
+        testq   $KEXEC_RELOC_FLAG_COMPAT, %r8
+        jnz     call_32_bit
+
+call_64_bit:
+        /* Call the image entry point.  This should never return. */
+        callq   *%rcx
+        ud2
+
+call_32_bit:
+        /* Setup IDT. */
+        lidt    compat_mode_idt(%rip)
+
+        /* Load compat GDT. */
+        leaq    (compat_mode_gdt - kexec_reloc)(%rdi), %rax
+        movq    %rax, (compat_mode_gdt_desc + 2)(%rip)
+        lgdt    compat_mode_gdt_desc(%rip)
+
+        /* Relocate compatibility mode entry point address. */
+        leal    (compatibility_mode - kexec_reloc)(%edi), %eax
+        movl    %eax, compatibility_mode_far(%rip)
+
+        /* Enter compatibility mode. */
+        ljmp    *compatibility_mode_far(%rip)
+
+relocate_pages:
+        /* %rdi - indirection page maddr */
+        pushq   %rdi
+        pushq   %rsi
+        pushq   %rbx
+        pushq   %rcx
+
+        cld
+        movq    %rdi, %rbx
+        xorl    %edi, %edi
+        xorl    %esi, %esi
+
+next_entry: /* top, read another word for the indirection page */
+
+        movq    (%rbx), %rcx
+        addq    $8, %rbx
+is_dest:
+        testb   $IND_DESTINATION, %cl
+        jz      is_ind
+        movq    %rcx, %rdi
+        andq    $PAGE_MASK, %rdi
+        jmp     next_entry
+is_ind:
+        testb   $IND_INDIRECTION, %cl
+        jz      is_done
+        movq    %rcx, %rbx
+        andq    $PAGE_MASK, %rbx
+        jmp     next_entry
+is_done:
+        testb   $IND_DONE, %cl
+        jnz     done
+is_source:
+        testb   $IND_SOURCE, %cl
+        jz      is_zero
+        movq    %rcx, %rsi      /* For every source page do a copy */
+        andq    $PAGE_MASK, %rsi
+        movl    $(PAGE_SIZE / 8), %ecx
+        rep movsq
+        jmp     next_entry
+is_zero:
+        testb   $IND_ZERO, %cl
+        jz      next_entry
+        movl    $(PAGE_SIZE / 8), %ecx  /* Zero the destination page. */
+        xorl    %eax, %eax
+        rep stosq
+        jmp     next_entry
+done:
+        popq    %rcx
+        popq    %rbx
+        popq    %rsi
+        popq    %rdi
+        ret
+
+        .code32
+
+compatibility_mode:
+        /* Setup some sane segments. */
+        movl    $0x0008, %eax
+        movl    %eax, %ds
+        movl    %eax, %es
+        movl    %eax, %fs
+        movl    %eax, %gs
+        movl    %eax, %ss
+
+        movl    %ecx, %ebp
+
+        /* Disable paging and therefore leave 64 bit mode. */
+        movl    %cr0, %eax
+        andl    $~X86_CR0_PG, %eax
+        movl    %eax, %cr0
+
+        /* Disable long mode */
+        movl    $MSR_EFER, %ecx
+        rdmsr
+        andl    $~EFER_LME, %eax
+        wrmsr
+
+        /* Clear cr4 to disable PAE. */
+        xorl    %eax, %eax
+        movl    %eax, %cr4
+
+        /* Call the image entry point.  This should never return. */
+        call    *%ebp
+        ud2
+
+        .align 4
+compatibility_mode_far:
+        .long 0x00000000             /* set in call_32_bit above */
+        .word 0x0010
+
+compat_mode_gdt_desc:
+        .word (3*8)-1
+        .quad 0x0000000000000000     /* set in call_32_bit above */
+
+        .align 8
+compat_mode_gdt:
+        .quad 0x0000000000000000     /* null                              */
+        .quad 0x00cf92000000ffff     /* 0x0008 ring 0 data                */
+        .quad 0x00cf9a000000ffff     /* 0x0010 ring 0 code, compatibility */
+
+compat_mode_idt:
+        .word 0                      /* limit */
+        .long 0                      /* base */
+
+        /*
+         * 16 words of stack are more than enough.
+         */
+        .fill 16,8,0
+reloc_stack:
+
+        .globl kexec_reloc_size
+kexec_reloc_size:
+        .long . - kexec_reloc
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 7b23df0..5548c37 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,6 +25,7 @@
 #include <xen/version.h>
 #include <xen/console.h>
 #include <xen/kexec.h>
+#include <xen/kimage.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
 #include <xen/cpu.h>
@@ -47,7 +48,7 @@ static Elf_Note *xen_crash_note;
 
 static cpumask_t crash_saved_cpus;
 
-static xen_kexec_image_t kexec_image[KEXEC_IMAGE_NR];
+static struct kexec_image *kexec_image[KEXEC_IMAGE_NR];
 
 #define KEXEC_FLAG_DEFAULT_POS   (KEXEC_IMAGE_NR + 0)
 #define KEXEC_FLAG_CRASH_POS     (KEXEC_IMAGE_NR + 1)
@@ -311,14 +312,14 @@ void kexec_crash(void)
     kexec_common_shutdown();
     kexec_crash_save_cpu();
     machine_crash_shutdown();
-    machine_kexec(&kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
+    machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
 
     BUG();
 }
 
 static long kexec_reboot(void *_image)
 {
-    xen_kexec_image_t *image = _image;
+    struct kexec_image *image = _image;
 
     kexecing = TRUE;
 
@@ -734,63 +735,261 @@ static void crash_save_vmcoreinfo(void)
 #endif
 }
 
-static int kexec_load_unload_internal(unsigned long op, xen_kexec_load_v1_t *load)
+static void kexec_unload_image(struct kexec_image *image)
+{
+    if ( !image )
+        return;
+
+    machine_kexec_unload(image);
+    kimage_free(image);
+}
+
+static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_exec_t exec;
+    struct kexec_image *image;
+    int base, bit, pos, ret = -EINVAL;
+
+    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+        return -EFAULT;
+
+    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+        return -EINVAL;
+
+    pos = (test_bit(bit, &kexec_flags) != 0);
+
+    /* Only allow kexec/kdump into loaded images */
+    if ( !test_bit(base + pos, &kexec_flags) )
+        return -ENOENT;
+
+    switch (exec.type)
+    {
+    case KEXEC_TYPE_DEFAULT:
+        image = kexec_image[base + pos];
+        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
+        break;
+    case KEXEC_TYPE_CRASH:
+        kexec_crash(); /* Does not return */
+        break;
+    }
+
+    return -EINVAL; /* never reached */
+}
+
+static int kexec_swap_images(int type, struct kexec_image *new,
+                             struct kexec_image **old)
 {
-    xen_kexec_image_t *image;
     int base, bit, pos;
-    int ret = 0;
+    int new_slot, old_slot;
+
+    *old = NULL;
+
+    spin_lock(&kexec_lock);
+
+    if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    {
+        spin_unlock(&kexec_lock);
+        return -EBUSY;
+    }
 
-    if ( kexec_load_get_bits(load->type, &base, &bit) )
+    if ( kexec_load_get_bits(type, &base, &bit) )
         return -EINVAL;
 
     pos = (test_bit(bit, &kexec_flags) != 0);
+    old_slot = base + pos;
+    new_slot = base + !pos;
 
-    /* Load the user data into an unused image */
-    if ( op == KEXEC_CMD_kexec_load )
+    if ( new )
     {
-        image = &kexec_image[base + !pos];
+        kexec_image[new_slot] = new;
+        set_bit(new_slot, &kexec_flags);
+    }
+    change_bit(bit, &kexec_flags);
 
-        BUG_ON(test_bit((base + !pos), &kexec_flags)); /* must be free */
+    clear_bit(old_slot, &kexec_flags);
+    *old = kexec_image[old_slot];
 
-        memcpy(image, &load->image, sizeof(*image));
+    spin_unlock(&kexec_lock);
 
-        if ( !(ret = machine_kexec_load(load->type, base + !pos, image)) )
-        {
-            /* Set image present bit */
-            set_bit((base + !pos), &kexec_flags);
+    return 0;
+}
 
-            /* Make new image the active one */
-            change_bit(bit, &kexec_flags);
-        }
+static int kexec_load_slot(struct kexec_image *kimage)
+{
+    struct kexec_image *old_kimage;
+    int ret = -ENOMEM;
+
+    ret = machine_kexec_load(kimage);
+    if ( ret < 0 )
+        return ret;
+
+    crash_save_vmcoreinfo();
+
+    ret = kexec_swap_images(kimage->type, kimage, &old_kimage);
+    if ( ret < 0 )
+        return ret;
+
+    kexec_unload_image(old_kimage);
+
+    return 0;
+}
+
+static uint16_t kexec_load_v1_arch(void)
+{
+#ifdef CONFIG_X86
+    return is_pv_32on64_domain(dom0) ? EM_386 : EM_X86_64;
+#else
+    return EM_NONE;
+#endif
+}
+
+static int kexec_segments_add_segment(
+    unsigned *nr_segments, xen_kexec_segment_t *segments,
+    unsigned long mfn)
+{
+    paddr_t maddr = (paddr_t)mfn << PAGE_SHIFT;
+    int n = *nr_segments;
 
-        crash_save_vmcoreinfo();
+    /* Need a new segment? */
+    if ( n == 0
+         || segments[n-1].dest_maddr + segments[n-1].dest_size != maddr )
+    {
+        n++;
+        if ( n > KEXEC_SEGMENT_MAX )
+            return -EINVAL;
+        *nr_segments = n;
+
+        set_xen_guest_handle(segments[n-1].buf.h, NULL);
+        segments[n-1].buf_size = 0;
+        segments[n-1].dest_maddr = maddr;
+        segments[n-1].dest_size = 0;
     }
 
-    /* Unload the old image if present and load successful */
-    if ( ret == 0 && !test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
+    return 0;
+}
+
+static int kexec_segments_from_ind_page(unsigned long mfn,
+                                        unsigned *nr_segments,
+                                        xen_kexec_segment_t *segments,
+                                        bool_t compat)
+{
+    void *page;
+    kimage_entry_t *entry;
+    int ret = 0;
+
+    page = map_domain_page(mfn);
+
+    /*
+     * Walk the indirection page list, adding destination pages to the
+     * segments.
+     */
+    for ( entry = page; ; )
     {
-        if ( test_and_clear_bit((base + pos), &kexec_flags) )
+        unsigned long ind;
+
+        ind = kimage_entry_ind(entry, compat);
+        mfn = kimage_entry_mfn(entry, compat);
+
+        switch ( ind )
         {
-            image = &kexec_image[base + pos];
-            machine_kexec_unload(load->type, base + pos, image);
+        case IND_DESTINATION:
+            ret = kexec_segments_add_segment(nr_segments, segments, mfn);
+            if ( ret < 0 )
+                goto done;
+            break;
+        case IND_INDIRECTION:
+            unmap_domain_page(page);
+            page = map_domain_page(mfn);
+            if ( page == NULL )
+                return -ENOMEM;
+            entry = page;
+            continue;
+        case IND_DONE:
+            goto done;
+        case IND_SOURCE:
+            segments[*nr_segments-1].dest_size += PAGE_SIZE;
+            break;
+        default:
+            ret = -EINVAL;
+            goto done;
         }
+        entry = kimage_entry_next(entry, compat);
     }
+done:
+    unmap_domain_page(page);
+    return ret;
+}
+
+static int kexec_do_load_v1(xen_kexec_load_v1_t *load, int compat)
+{
+    struct kexec_image *kimage = NULL;
+    xen_kexec_segment_t *segments;
+    uint16_t arch;
+    unsigned nr_segments = 0;
+    unsigned long ind_mfn = load->image.indirection_page >> PAGE_SHIFT;
+    int ret;
+
+    arch = kexec_load_v1_arch();
+    if ( arch == EM_NONE )
+        return -ENOSYS;
+
+    segments = xmalloc_array(xen_kexec_segment_t, KEXEC_SEGMENT_MAX);
+    if ( segments == NULL )
+        return -ENOMEM;
+
+    /*
+     * Work out the image segments (destination only) from the
+     * indirection pages.
+     *
+     * This is needed so we don't allocate pages that will overlap
+     * with the destination when building the new set of indirection
+     * pages below.
+     */
+    ret = kexec_segments_from_ind_page(ind_mfn, &nr_segments, segments, compat);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_alloc(&kimage, load->type, arch, load->image.start_address,
+                       nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    /*
+     * Build a new set of indirection pages in the native format.
+     *
+     * This walks the guest provided indirection pages a second time.
+     * The guest could have altered them, invalidating the segment
+     * information constructed above.  This will only result in the
+     * resulting image being potentially unrelocatable.
+     */
+    ret = kimage_build_ind(kimage, ind_mfn, compat);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
 
+    return 0;
+
+error:
+    if ( !kimage )
+        xfree(segments);
+    kimage_free(kimage);
     return ret;
 }
 
-static int kexec_load_unload(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
     xen_kexec_load_v1_t load;
 
     if ( unlikely(copy_from_guest(&load, uarg, 1)) )
         return -EFAULT;
 
-    return kexec_load_unload_internal(op, &load);
+    return kexec_do_load_v1(&load, 0);
 }
 
-static int kexec_load_unload_compat(unsigned long op,
-                                    XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
 #ifdef CONFIG_COMPAT
     compat_kexec_load_v1_t compat_load;
@@ -809,49 +1008,113 @@ static int kexec_load_unload_compat(unsigned long op,
     load.type = compat_load.type;
     XLAT_kexec_image(&load.image, &compat_load.image);
 
-    return kexec_load_unload_internal(op, &load);
-#else /* CONFIG_COMPAT */
+    return kexec_do_load_v1(&load, 1);
+#else
     return 0;
-#endif /* CONFIG_COMPAT */
+#endif
 }
 
-static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
+static int kexec_load(XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    xen_kexec_exec_t exec;
-    xen_kexec_image_t *image;
-    int base, bit, pos, ret = -EINVAL;
+    xen_kexec_load_t load;
+    xen_kexec_segment_t *segments;
+    struct kexec_image *kimage = NULL;
+    int ret;
 
-    if ( unlikely(copy_from_guest(&exec, uarg, 1)) )
+    if ( copy_from_guest(&load, uarg, 1) )
         return -EFAULT;
 
-    if ( kexec_load_get_bits(exec.type, &base, &bit) )
+    if ( load.nr_segments >= KEXEC_SEGMENT_MAX )
         return -EINVAL;
 
-    pos = (test_bit(bit, &kexec_flags) != 0);
-
-    /* Only allow kexec/kdump into loaded images */
-    if ( !test_bit(base + pos, &kexec_flags) )
-        return -ENOENT;
+    segments = xmalloc_array(xen_kexec_segment_t, load.nr_segments);
+    if ( segments == NULL )
+        return -ENOMEM;
 
-    switch (exec.type)
+    if ( copy_from_guest(segments, load.segments.h, load.nr_segments) )
     {
-    case KEXEC_TYPE_DEFAULT:
-        image = &kexec_image[base + pos];
-        ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
-        break;
-    case KEXEC_TYPE_CRASH:
-        kexec_crash(); /* Does not return */
-        break;
+        ret = -EFAULT;
+        goto error;
     }
 
-    return -EINVAL; /* never reached */
+    ret = kimage_alloc(&kimage, load.type, load.arch, load.entry_maddr,
+                       load.nr_segments, segments);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kimage_load_segments(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    ret = kexec_load_slot(kimage);
+    if ( ret < 0 )
+        goto error;
+
+    return 0;
+
+error:
+    if ( ! kimage )
+        xfree(segments);
+    kimage_free(kimage);
+    return ret;
+}
+
+static int kexec_do_unload(xen_kexec_unload_t *unload)
+{
+    struct kexec_image *old_kimage;
+    int ret;
+
+    ret = kexec_swap_images(unload->type, NULL, &old_kimage);
+    if ( ret < 0 )
+        return ret;
+
+    kexec_unload_image(old_kimage);
+
+    return 0;
+}
+
+static int kexec_unload_v1(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_load_v1_t load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = load.type;
+    return kexec_do_unload(&unload);
+}
+
+static int kexec_unload_v1_compat(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+#ifdef CONFIG_COMPAT
+    compat_kexec_load_v1_t compat_load;
+    xen_kexec_unload_t unload;
+
+    if ( copy_from_guest(&compat_load, uarg, 1) )
+        return -EFAULT;
+
+    unload.type = compat_load.type;
+    return kexec_do_unload(&unload);
+#else
+    return 0;
+#endif
+}
+
+static int kexec_unload(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+    xen_kexec_unload_t unload;
+
+    if ( unlikely(copy_from_guest(&unload, uarg, 1)) )
+        return -EFAULT;
+
+    return kexec_do_unload(&unload);
 }
 
 static int do_kexec_op_internal(unsigned long op,
                                 XEN_GUEST_HANDLE_PARAM(void) uarg,
                                 bool_t compat)
 {
-    unsigned long flags;
     int ret = -EINVAL;
 
     ret = xsm_kexec(XSM_PRIV);
@@ -867,20 +1130,26 @@ static int do_kexec_op_internal(unsigned long op,
                 ret = kexec_get_range(uarg);
         break;
     case KEXEC_CMD_kexec_load_v1:
+        if ( compat )
+            ret = kexec_load_v1_compat(uarg);
+        else
+            ret = kexec_load_v1(uarg);
+        break;
     case KEXEC_CMD_kexec_unload_v1:
-        spin_lock_irqsave(&kexec_lock, flags);
-        if (!test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags))
-        {
-                if (compat)
-                        ret = kexec_load_unload_compat(op, uarg);
-                else
-                        ret = kexec_load_unload(op, uarg);
-        }
-        spin_unlock_irqrestore(&kexec_lock, flags);
+        if ( compat )
+            ret = kexec_unload_v1_compat(uarg);
+        else
+            ret = kexec_unload_v1(uarg);
         break;
     case KEXEC_CMD_kexec:
         ret = kexec_exec(uarg);
         break;
+    case KEXEC_CMD_kexec_load:
+        ret = kexec_load(uarg);
+        break;
+    case KEXEC_CMD_kexec_unload:
+        ret = kexec_unload(uarg);
+        break;
     }
 
     return ret;
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
index 9783e5a..6bee9cf 100644
--- a/xen/common/kimage.c
+++ b/xen/common/kimage.c
@@ -175,14 +175,22 @@ static int do_kimage_alloc(struct kexec_image **rimage, paddr_t entry,
     image->control_code_page = kimage_alloc_control_page(image, MEMF_bits(32));
     if ( !image->control_code_page )
         goto out;
+    result = machine_kexec_add_page(image,
+                                    page_to_maddr(image->control_code_page),
+                                    page_to_maddr(image->control_code_page));
+    if ( result < 0 )
+        return result;
 
     /* Add an empty indirection page. */
     image->entry_page = kimage_alloc_control_page(image, 0);
     if ( !image->entry_page )
         goto out;
+    result = machine_kexec_add_page(image, page_to_maddr(image->entry_page),
+                                    page_to_maddr(image->entry_page));
+    if ( result < 0 )
+        return result;
 
     image->head = page_to_maddr(image->entry_page);
-    image->next_entry = 0;
 
     result = 0;
 out:
@@ -591,7 +599,7 @@ static struct page_info *kimage_alloc_page(struct kexec_image *image,
         if ( addr == destination )
         {
             page_list_del(page, &image->dest_pages);
-            return page;
+            goto found;
         }
     }
     page = NULL;
@@ -643,6 +651,8 @@ static struct page_info *kimage_alloc_page(struct kexec_image *image,
             page_list_add(page, &image->dest_pages);
         }
     }
+found:
+    machine_kexec_add_page(image, page_to_maddr(page), page_to_maddr(page));
     return page;
 }
 
@@ -749,6 +759,7 @@ static int kimage_load_crash_segment(struct kexec_image *image,
 static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *segment)
 {
     int result = -ENOMEM;
+    paddr_t addr;
 
     if ( !guest_handle_is_null(segment->buf.h) )
     {
@@ -763,6 +774,14 @@ static int kimage_load_segment(struct kexec_image *image, xen_kexec_segment_t *s
         }
     }
 
+    for ( addr = segment->dest_maddr & PAGE_MASK;
+          addr < segment->dest_maddr + segment->dest_size; addr += PAGE_SIZE )
+    {
+        result = machine_kexec_add_page(image, addr, addr);
+        if ( result < 0 )
+            break;
+    }
+
     return result;
 }
 
@@ -806,6 +825,106 @@ int kimage_load_segments(struct kexec_image *image)
     return 0;
 }
 
+kimage_entry_t *kimage_entry_next(kimage_entry_t *entry, bool_t compat)
+{
+    if ( compat )
+        return (kimage_entry_t *)((uint32_t *)entry + 1);
+    return entry + 1;
+}
+
+unsigned long kimage_entry_mfn(kimage_entry_t *entry, bool_t compat)
+{
+    if ( compat )
+        return *(uint32_t *)entry >> PAGE_SHIFT;
+    return *entry >> PAGE_SHIFT;
+}
+
+unsigned long kimage_entry_ind(kimage_entry_t *entry, bool_t compat)
+{
+    if ( compat )
+        return *(uint32_t *)entry & 0xf;
+    return *entry & 0xf;
+}
+
+int kimage_build_ind(struct kexec_image *image, unsigned long ind_mfn,
+                     bool_t compat)
+{
+    void *page;
+    kimage_entry_t *entry;
+    int ret = 0;
+    paddr_t dest = KIMAGE_NO_DEST;
+
+    page = map_domain_page(ind_mfn);
+    if ( !page )
+        return -ENOMEM;
+
+    /*
+     * Walk the guest-supplied indirection pages, adding entries to
+     * the image's indirection pages.
+     */
+    for ( entry = page; ;  )
+    {
+        unsigned long ind;
+        unsigned long mfn;
+
+        ind = kimage_entry_ind(entry, compat);
+        mfn = kimage_entry_mfn(entry, compat);
+
+        switch ( ind )
+        {
+        case IND_DESTINATION:
+            dest = (paddr_t)mfn << PAGE_SHIFT;
+            ret = kimage_set_destination(image, dest);
+            if ( ret < 0 )
+                goto done;
+            break;
+        case IND_INDIRECTION:
+            unmap_domain_page(page);
+            page = map_domain_page(mfn);
+            entry = page;
+            continue;
+        case IND_DONE:
+            kimage_terminate(image);
+            goto done;
+        case IND_SOURCE:
+        {
+            struct page_info *guest_page, *xen_page;
+
+            guest_page = mfn_to_page(mfn);
+            if ( !get_page(guest_page, current->domain) )
+            {
+                ret = -EFAULT;
+                goto done;
+            }
+
+            xen_page = kimage_alloc_page(image, dest);
+            if ( !xen_page )
+            {
+                put_page(guest_page);
+                ret = -ENOMEM;
+                goto done;
+            }
+
+            copy_domain_page(page_to_mfn(xen_page), mfn);
+            put_page(guest_page);
+
+            ret = kimage_add_page(image, page_to_maddr(xen_page));
+            if ( ret < 0 )
+                goto done;
+            dest += PAGE_SIZE;
+            break;
+        }
+        default:
+            ret = -EINVAL;
+            goto done;
+        }
+        entry = kimage_entry_next(entry, compat);
+    }
+done:
+    unmap_domain_page(page);
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index 8b4266d..48c5676 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -56,9 +56,6 @@ enum fixed_addresses {
     FIX_ACPI_BEGIN,
     FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
     FIX_HPET_BASE,
-    FIX_KEXEC_BASE_0,
-    FIX_KEXEC_BASE_END = FIX_KEXEC_BASE_0 \
-      + ((KEXEC_XEN_NO_PAGES >> 1) * KEXEC_IMAGE_NR) - 1,
     FIX_TBOOT_SHARED_BASE,
     FIX_MSIX_IO_RESERV_BASE,
     FIX_MSIX_IO_RESERV_END = FIX_MSIX_IO_RESERV_BASE + FIX_MSIX_MAX_PAGES -1,
diff --git a/xen/include/asm-x86/machine_kexec.h b/xen/include/asm-x86/machine_kexec.h
new file mode 100644
index 0000000..ba0d469
--- /dev/null
+++ b/xen/include/asm-x86/machine_kexec.h
@@ -0,0 +1,16 @@
+#ifndef __X86_MACHINE_KEXEC_H__
+#define __X86_MACHINE_KEXEC_H__
+
+#define KEXEC_RELOC_FLAG_COMPAT 0x1 /* 32-bit image */
+
+#ifndef __ASSEMBLY__
+
+extern void kexec_reloc(unsigned long reloc_code, unsigned long reloc_pt,
+                        unsigned long ind_maddr, unsigned long entry_maddr,
+                        unsigned long flags);
+
+extern unsigned int kexec_reloc_size;
+
+#endif
+
+#endif /* __X86_MACHINE_KEXEC_H__ */
diff --git a/xen/include/xen/kexec.h b/xen/include/xen/kexec.h
index 1a5dda1..bd17747 100644
--- a/xen/include/xen/kexec.h
+++ b/xen/include/xen/kexec.h
@@ -6,6 +6,7 @@
 #include <public/kexec.h>
 #include <asm/percpu.h>
 #include <xen/elfcore.h>
+#include <xen/kimage.h>
 
 typedef struct xen_kexec_reserve {
     unsigned long size;
@@ -40,11 +41,13 @@ extern enum low_crashinfo low_crashinfo_mode;
 extern paddr_t crashinfo_maxaddr_bits;
 void kexec_early_calculations(void);
 
-int machine_kexec_load(int type, int slot, xen_kexec_image_t *image);
-void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image);
+int machine_kexec_add_page(struct kexec_image *image, unsigned long vaddr,
+                           unsigned long maddr);
+int machine_kexec_load(struct kexec_image *image);
+void machine_kexec_unload(struct kexec_image *image);
 void machine_kexec_reserved(xen_kexec_reserve_t *reservation);
-void machine_reboot_kexec(xen_kexec_image_t *image);
-void machine_kexec(xen_kexec_image_t *image);
+void machine_reboot_kexec(struct kexec_image *image);
+void machine_kexec(struct kexec_image *image);
 void kexec_crash(void);
 void kexec_crash_save_cpu(void);
 crash_xen_info_t *kexec_crash_save_info(void);
@@ -52,11 +55,6 @@ void machine_crash_shutdown(void);
 int machine_kexec_get(xen_kexec_range_t *range);
 int machine_kexec_get_xen(xen_kexec_range_t *range);
 
-void compat_machine_kexec(unsigned long rnk,
-                          unsigned long indirection_page,
-                          unsigned long *page_list,
-                          unsigned long start_address);
-
 /* vmcoreinfo stuff */
 #define VMCOREINFO_BYTES           (4096)
 #define VMCOREINFO_NOTE_NAME       "VMCOREINFO_XEN"
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
index 0ebd37a..d10ebf7 100644
--- a/xen/include/xen/kimage.h
+++ b/xen/include/xen/kimage.h
@@ -47,6 +47,12 @@ int kimage_load_segments(struct kexec_image *image);
 struct page_info *kimage_alloc_control_page(struct kexec_image *image,
                                             unsigned memflags);
 
+kimage_entry_t *kimage_entry_next(kimage_entry_t *entry, bool_t compat);
+unsigned long kimage_entry_mfn(kimage_entry_t *entry, bool_t compat);
+unsigned long kimage_entry_ind(kimage_entry_t *entry, bool_t compat);
+int kimage_build_ind(struct kexec_image *image, unsigned long ind_mfn,
+                     bool_t compat);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __XEN_KIMAGE_H__ */
-- 
1.7.2.5
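
The load_v1 path above derives destination segments by walking the guest's indirection list: an IND_DESTINATION entry starts a new segment unless it continues the previous one, and each IND_SOURCE entry grows the current segment by one page. The following is a minimal, self-contained sketch of that coalescing step, not the hypervisor code. It assumes the IND_* flag values match those used by Linux's kexec, uses a flat entry list (IND_INDIRECTION chaining is left out and rejected), and the names `struct seg`, `SEG_MAX` and `segments_from_entries()` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Flag values as used by Linux's kexec; assumed to match Xen's kimage.h. */
#define IND_DESTINATION 0x1
#define IND_INDIRECTION 0x2
#define IND_DONE        0x4
#define IND_SOURCE      0x8

#define SEG_MAX 16 /* hypothetical stand-in for KEXEC_SEGMENT_MAX */

/* Hypothetical reduced segment: destination address and size only. */
struct seg {
    uint64_t dest_maddr;
    uint64_t dest_size;
};

/*
 * Walk a flat, IND_DONE-terminated entry list and build the list of
 * destination segments.  Returns the number of segments, or -1 on a
 * malformed list (including IND_INDIRECTION, which this sketch omits).
 */
static int segments_from_entries(const uint64_t *entry, struct seg *segs)
{
    int n = 0;

    for ( ; ; entry++ )
    {
        uint64_t ind = *entry & 0xf;
        uint64_t maddr = (*entry >> PAGE_SHIFT) << PAGE_SHIFT;

        switch ( ind )
        {
        case IND_DESTINATION:
            /* Start a new segment unless this continues the last one. */
            if ( n == 0
                 || segs[n-1].dest_maddr + segs[n-1].dest_size != maddr )
            {
                if ( n == SEG_MAX )
                    return -1;
                segs[n].dest_maddr = maddr;
                segs[n].dest_size = 0;
                n++;
            }
            break;
        case IND_SOURCE:
            if ( n == 0 )
                return -1; /* source page before any destination */
            segs[n-1].dest_size += PAGE_SIZE;
            break;
        case IND_DONE:
            return n;
        default:
            return -1;
        }
    }
}
```

Unlike the patch, the sketch rejects a source entry that appears before any destination entry; that guard is an addition for the example and is not taken from the series.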

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 5/9] xen: kexec crash image when dom0 crashes

From: David Vrabel <david.vrabel@citrix.com>

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 xen/common/kexec.c    |    2 ++
 xen/common/shutdown.c |    3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 5548c37..fe0424e 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -307,6 +307,8 @@ void kexec_crash(void)
     if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
         return;
 
+    printk("Executing crash image\n");
+
     kexecing = TRUE;
 
     kexec_common_shutdown();
diff --git a/xen/common/shutdown.c b/xen/common/shutdown.c
index 20f04b0..9bccd34 100644
--- a/xen/common/shutdown.c
+++ b/xen/common/shutdown.c
@@ -47,6 +47,9 @@ void dom0_shutdown(u8 reason)
     {
         debugger_trap_immediate();
         printk("Domain 0 crashed: ");
+#ifdef CONFIG_KEXEC
+        kexec_crash();
+#endif
         maybe_reboot();
         break; /* not reached */
     }
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 6/9] libxc: add hypercall buffer arrays

From: David Vrabel <david.vrabel@citrix.com>

Hypercall buffer arrays are used when a hypercall takes a variable
length array of buffers.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 tools/libxc/xc_hcall_buf.c |   73 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h      |   27 ++++++++++++++++
 2 files changed, 100 insertions(+), 0 deletions(-)
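
To illustrate the lifecycle described above (create the array, allocate a buffer per slot, look a slot up later, destroy everything at once), here is a small self-contained model. It is a sketch only: it uses plain malloc() in place of real hypercall buffers, and `struct buf_array` and the `buf_array_*()` helpers are hypothetical names that mirror, but are not, the libxc functions added by this patch.

```c
#include <stdlib.h>

/* Hypothetical model: each slot is a plain heap pointer. */
struct buf_array {
    unsigned max_bufs;
    void **bufs;
};

static struct buf_array *buf_array_create(unsigned n)
{
    struct buf_array *a = malloc(sizeof(*a));

    if ( a == NULL )
        return NULL;

    /* calloc() leaves every slot NULL, i.e. "not yet allocated". */
    a->bufs = calloc(n, sizeof(*a->bufs));
    if ( a->bufs == NULL )
    {
        free(a);
        return NULL;
    }
    a->max_bufs = n;
    return a;
}

/* Allocate slot 'index'; misuse (bad index, slot in use) aborts. */
static void *buf_array_alloc(struct buf_array *a, unsigned index, size_t size)
{
    if ( index >= a->max_bufs || a->bufs[index] != NULL )
        abort();
    a->bufs[index] = malloc(size);
    return a->bufs[index];
}

/* Fetch a previously allocated slot; an empty slot aborts. */
static void *buf_array_get(struct buf_array *a, unsigned index)
{
    if ( index >= a->max_bufs || a->bufs[index] == NULL )
        abort();
    return a->bufs[index];
}

/* Free every slot and then the array itself; NULL is accepted. */
static void buf_array_destroy(struct buf_array *a)
{
    unsigned i;

    if ( a == NULL )
        return;
    for ( i = 0; i < a->max_bufs; i++ )
        free(a->bufs[i]);
    free(a->bufs);
    free(a);
}
```

The point of the pattern is that a caller handling a variable number of buffers needs only one cleanup call on every exit path, however many slots were filled.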

diff --git a/tools/libxc/xc_hcall_buf.c b/tools/libxc/xc_hcall_buf.c
index c354677..e762a93 100644
--- a/tools/libxc/xc_hcall_buf.c
+++ b/tools/libxc/xc_hcall_buf.c
@@ -228,6 +228,79 @@ void xc__hypercall_bounce_post(xc_interface *xch, xc_hypercall_buffer_t *b)
     xc__hypercall_buffer_free(xch, b);
 }
 
+struct xc_hypercall_buffer_array {
+    unsigned max_bufs;
+    xc_hypercall_buffer_t *bufs;
+};
+
+xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch,
+                                                              unsigned n)
+{
+    xc_hypercall_buffer_array_t *array;
+    xc_hypercall_buffer_t *bufs = NULL;
+
+    array = malloc(sizeof(*array));
+    if ( array == NULL )
+        goto error;
+
+    bufs = calloc(n, sizeof(*bufs));
+    if ( bufs == NULL )
+        goto error;
+
+    array->max_bufs = n;
+    array->bufs     = bufs;
+
+    return array;
+
+error:
+    free(bufs);
+    free(array);
+    return NULL;
+}
+
+void *xc__hypercall_buffer_array_alloc(xc_interface *xch,
+                                       xc_hypercall_buffer_array_t *array,
+                                       unsigned index,
+                                       xc_hypercall_buffer_t *hbuf,
+                                       size_t size)
+{
+    void *buf;
+
+    if ( index >= array->max_bufs || array->bufs[index].hbuf )
+        abort();
+
+    buf = xc__hypercall_buffer_alloc(xch, hbuf, size);
+    if ( buf )
+        array->bufs[index] = *hbuf;
+    return buf;
+}
+
+void *xc__hypercall_buffer_array_get(xc_interface *xch,
+                                     xc_hypercall_buffer_array_t *array,
+                                     unsigned index,
+                                     xc_hypercall_buffer_t *hbuf)
+{
+    if ( index >= array->max_bufs || array->bufs[index].hbuf == NULL )
+        abort();
+
+    *hbuf = array->bufs[index];
+    return array->bufs[index].hbuf;
+}
+
+void xc_hypercall_buffer_array_destroy(xc_interface *xc,
+                                       xc_hypercall_buffer_array_t *array)
+{
+    unsigned i;
+
+    if ( array == NULL )
+        return;
+
+    for (i = 0; i < array->max_bufs; i++ )
+        xc__hypercall_buffer_free(xc, &array->bufs[i]);
+    free(array->bufs);
+    free(array);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 58d51f3..1119943 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -321,6 +321,33 @@ void xc__hypercall_buffer_free_pages(xc_interface *xch, xc_hypercall_buffer_t *b
 #define xc_hypercall_buffer_free_pages(_xch, _name, _nr) xc__hypercall_buffer_free_pages(_xch, HYPERCALL_BUFFER(_name), _nr)
 
 /*
+ * Array of hypercall buffers.
+ *
+ * Create an array with xc_hypercall_buffer_array_create() and
+ * populate it by declaring one hypercall buffer in a loop and
+ * allocating the buffer with xc_hypercall_buffer_array_alloc().
+ *
+ * To access a previously allocated buffer, declare a new hypercall
+ * buffer and call xc_hypercall_buffer_array_get().
+ *
+ * Destroy the array with xc_hypercall_buffer_array_destroy() to free
+ * the array and all its allocated hypercall buffers.
+ */
+struct xc_hypercall_buffer_array;
+typedef struct xc_hypercall_buffer_array xc_hypercall_buffer_array_t;
+
+xc_hypercall_buffer_array_t *xc_hypercall_buffer_array_create(xc_interface *xch, unsigned n);
+void *xc__hypercall_buffer_array_alloc(xc_interface *xch, xc_hypercall_buffer_array_t *array,
+                                       unsigned index, xc_hypercall_buffer_t *hbuf, size_t size);
+#define xc_hypercall_buffer_array_alloc(_xch, _array, _index, _name, _size) \
+    xc__hypercall_buffer_array_alloc(_xch, _array, _index, HYPERCALL_BUFFER(_name), _size)
+void *xc__hypercall_buffer_array_get(xc_interface *xch, xc_hypercall_buffer_array_t *array,
+                                     unsigned index, xc_hypercall_buffer_t *hbuf);
+#define xc_hypercall_buffer_array_get(_xch, _array, _index, _name, _size) \
+    xc__hypercall_buffer_array_get(_xch, _array, _index, HYPERCALL_BUFFER(_name))
+void xc_hypercall_buffer_array_destroy(xc_interface *xc, xc_hypercall_buffer_array_t *array);
+
+/*
  * CPUMAP handling
  */
 typedef uint8_t *xc_cpumap_t;
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 7/9] libxc: add API for kexec hypercall

From: David Vrabel <david.vrabel@citrix.com>

Add xc_kexec_exec(), xc_kexec_get_range(), xc_kexec_load(), and
xc_kexec_unload().  The load and unload calls require the v2 load and
unload ops.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 tools/libxc/Makefile   |    1 +
 tools/libxc/xc_kexec.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h  |   55 +++++++++++++++++++
 3 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 tools/libxc/xc_kexec.c
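
The xc_kexec_load() documentation in this patch lists EINVAL for an entry point that is not within one of the segments. A caller may want to catch that before issuing the hypercall. The following is a minimal sketch of such a pre-check, under stated assumptions: `struct segment` is a hypothetical reduced form of xen_kexec_segment_t carrying only the destination address and size, `entry_in_segments()` is a hypothetical helper, and the exact validation Xen performs may differ.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical reduced segment: only the destination range matters here. */
struct segment {
    uint64_t dest_maddr;
    uint64_t dest_size;
};

/*
 * Return true if entry_maddr lies inside the destination range of at
 * least one segment.  The subtraction form avoids overflow when
 * dest_maddr + dest_size would wrap.
 */
static bool entry_in_segments(uint64_t entry_maddr,
                              const struct segment *segs, uint32_t nr)
{
    uint32_t i;

    for ( i = 0; i < nr; i++ )
    {
        if ( entry_maddr >= segs[i].dest_maddr
             && entry_maddr - segs[i].dest_maddr < segs[i].dest_size )
            return true;
    }
    return false;
}
```

A tool such as kexec-tools could run a check like this on its segment list and report a clearer error than a bare EINVAL from the hypercall; the hypervisor's own check remains authoritative.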

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 4c64c15..f2d6e56 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -31,6 +31,7 @@ CTRL_SRCS-y       += xc_mem_access.c
 CTRL_SRCS-y       += xc_memshr.c
 CTRL_SRCS-y       += xc_hcall_buf.c
 CTRL_SRCS-y       += xc_foreign_memory.c
+CTRL_SRCS-y       += xc_kexec.c
 CTRL_SRCS-y       += xtl_core.c
 CTRL_SRCS-y       += xtl_logger_stdio.c
 CTRL_SRCS-$(CONFIG_X86) += xc_pagetab.c
diff --git a/tools/libxc/xc_kexec.c b/tools/libxc/xc_kexec.c
new file mode 100644
index 0000000..a49cffb
--- /dev/null
+++ b/tools/libxc/xc_kexec.c
@@ -0,0 +1,140 @@
+/******************************************************************************
+ * xc_kexec.c
+ *
+ * API for loading and executing kexec images.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * Copyright (C) 2013 Citrix Systems R&D Ltd.
+ */
+#include "xc_private.h"
+
+int xc_kexec_exec(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_exec_t, exec);
+    int ret = -1;
+
+    exec = xc_hypercall_buffer_alloc(xch, exec, sizeof(*exec));
+    if ( exec == NULL )
+    {
+        PERROR("Could not alloc bounce buffer for kexec_exec hypercall");
+        goto out;
+    }
+
+    exec->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(exec);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, exec);
+
+    return ret;
+}
+
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_range_t, get_range);
+    int ret = -1;
+
+    get_range = xc_hypercall_buffer_alloc(xch, get_range, sizeof(*get_range));
+    if ( get_range == NULL )
+    {
+        PERROR("Could not alloc bounce buffer for kexec_get_range hypercall");
+        goto out;
+    }
+
+    get_range->range = range;
+    get_range->nr = nr;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_get_range;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(get_range);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+    *size = get_range->size;
+    *start = get_range->start;
+
+out:
+    xc_hypercall_buffer_free(xch, get_range);
+
+    return ret;
+}
+
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments)
+{
+    int ret = -1;
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BOUNCE(segments, sizeof(*segments) * nr_segments,
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_load_t, load);
+
+    if ( xc_hypercall_bounce_pre(xch, segments) )
+    {
+        PERROR("Could not allocate bounce buffer for kexec load hypercall");
+        goto out;
+    }
+    load = xc_hypercall_buffer_alloc(xch, load, sizeof(*load));
+    if ( load == NULL )
+    {
+        PERROR("Could not allocate buffer for kexec load hypercall");
+        goto out;
+    }
+
+    load->type = type;
+    load->arch = arch;
+    load->entry_maddr = entry_maddr;
+    load->nr_segments = nr_segments;
+    set_xen_guest_handle(load->segments.h, segments);
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_load;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(load);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, load);
+    xc_hypercall_bounce_post(xch, segments);
+
+    return ret;
+}
+
+int xc_kexec_unload(xc_interface *xch, int type)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_kexec_unload_t, unload);
+    int ret = -1;
+
+    unload = xc_hypercall_buffer_alloc(xch, unload, sizeof(*unload));
+    if ( unload == NULL )
+    {
+        PERROR("Could not alloc buffer for kexec unload hypercall");
+        goto out;
+    }
+
+    unload->type = type;
+
+    hypercall.op = __HYPERVISOR_kexec_op;
+    hypercall.arg[0] = KEXEC_CMD_kexec_unload;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(unload);
+
+    ret = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, unload);
+
+    return ret;
+}
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1119943..d77a8b7 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -46,6 +46,7 @@
 #include <xen/hvm/params.h>
 #include <xen/xsm/flask_op.h>
 #include <xen/tmem.h>
+#include <xen/kexec.h>
 
 #include "xentoollog.h"
 
@@ -2328,4 +2329,58 @@ int xc_compression_uncompress_page(xc_interface *xch, char *compbuf,
 				   unsigned long compbuf_size,
 				   unsigned long *compbuf_pos, char *dest);
 
+/*
+ * Execute an image previously loaded with xc_kexec_load().
+ *
+ * Does not return on success.
+ *
+ * Fails with:
+ *   ENOENT if the specified image has not been loaded.
+ */
+int xc_kexec_exec(xc_interface *xch, int type);
+
+/*
+ * Find the machine address and size of certain memory areas.
+ *
+ *   KEXEC_RANGE_MA_CRASH       crash area
+ *   KEXEC_RANGE_MA_XEN         Xen itself
+ *   KEXEC_RANGE_MA_CPU         CPU note for CPU number 'nr'
+ *   KEXEC_RANGE_MA_XENHEAP     xenheap
+ *   KEXEC_RANGE_MA_EFI_MEMMAP  EFI Memory Map
+ *   KEXEC_RANGE_MA_VMCOREINFO  vmcoreinfo
+ *
+ * Fails with:
+ *   EINVAL if the range or CPU number isn't valid.
+ */
+int xc_kexec_get_range(xc_interface *xch, int range,  int nr,
+                       uint64_t *size, uint64_t *start);
+
+/*
+ * Load a kexec image into memory.
+ *
+ * The image may be of type KEXEC_TYPE_DEFAULT (executed on request)
+ * or KEXEC_TYPE_CRASH (executed on a crash).
+ *
+ * The image architecture may be a 32-bit variant of the hypervisor
+ * architecture (e.g., EM_386 on an x86-64 hypervisor).
+ *
+ * Fails with:
+ *   ENOMEM if there is insufficient memory for the new image.
+ *   EINVAL if the image does not fit into the crash area or the entry
+ *          point isn't within one of the segments.
+ *   EBUSY  if another image is being executed.
+ */
+int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
+                  uint64_t entry_maddr,
+                  uint32_t nr_segments, xen_kexec_segment_t *segments);
+
+/*
+ * Unload a kexec image.
+ *
+ * This prevents a KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH image from
+ * being executed.  The crash images are not cleared from the crash
+ * region.
+ */
+int xc_kexec_unload(xc_interface *xch, int type);
+
 #endif /* XENCTRL_H */
-- 
1.7.2.5

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 8/9] x86: check kexec relocation code fits in a page

From: David Vrabel <david.vrabel@citrix.com>

The kexec relocation (control) code must fit in a single page so add a
link time check for this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/xen.lds.S |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 9600cdf..17db361 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -198,3 +198,5 @@ SECTIONS
   .stab.indexstr 0 : { *(.stab.indexstr) }
   .comment 0 : { *(.comment) }
 }
+
+ASSERT(kexec_reloc_size - kexec_reloc <= PAGE_SIZE, "kexec_reloc is too large")
-- 
1.7.2.5
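
The patch above enforces the one-page limit at link time with a linker-script ASSERT. The same kind of check can be sketched in plain C with a compile-time assertion. This is only an illustration under stated assumptions: a 4 KiB page, and a hypothetical `reloc_code` array and `fits_in_page()` helper that do not appear in the series.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical stand-in for the relocation code blob (not from the patch). */
static const unsigned char reloc_code[512] = { 0 };

/* Breaks the build, rather than the kexec, if the blob outgrows one page. */
_Static_assert(sizeof(reloc_code) <= PAGE_SIZE, "reloc_code is too large");

/* Run-time form of the same check, for sizes only known at run time. */
static int fits_in_page(size_t size)
{
    return size <= PAGE_SIZE;
}
```

A size of exactly PAGE_SIZE passes both checks, matching the `<=` used in the linker-script ASSERT.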

David Vrabel

2013-Oct-08 16:55 UTC

[PATCH 9/9] MAINTAINERS: Add KEXEC maintainer

From: David Vrabel <david.vrabel@citrix.com>

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 MAINTAINERS |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index adacac2..4aac28c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -197,6 +197,14 @@ X:	xen/drivers/passthrough/amd/
 X:	xen/drivers/passthrough/vtd/
 F:	xen/include/xen/iommu.h
 
+KEXEC
+M:      David Vrabel <david.vrabel@citrix.com>
+S:      Supported
+F:      xen/common/{kexec,kimage}.c
+F:      xen/include/{kexec,kimage}.h
+F:      xen/arch/x86/machine_kexec.c
+F:      xen/arch/x86/x86_64/kexec_reloc.S
+
 LINUX (PV_OPS)
 M:	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 S:	Supported
-- 
1.7.2.5

Andrew Cooper

2013-Oct-08 17:03 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 08/10/13 17:55, David Vrabel wrote:
> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> responsible for loading and relocating the image.  This allows kexec
> to be usable by pv-ops kernels and should allow kexec to be usable
> from a HVM or PVH privileged domain.

All Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Daniel Kiper

2013-Oct-09 15:26 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> responsible for loading and relocating the image.  This allows kexec
> to be usable by pv-ops kernels and should allow kexec to be usable
> from a HVM or PVH privileged domain.

As far as I can see, you have taken some suggestions into account. Thanks.
But... Why did you not send this patch series to kexec@lists.infradead.org
and my Oracle address? Why did you not implement a kexec hypercall
function to get info about loaded images? Andrew and I asked about that.
What about setting GPRs to a known value (e.g. 0, as in the Linux kernel)
before jumping into purgatory?

By the way, you do not need to save and restore %rdi, %rsi and %rbx
in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.

Daniel

Andrew Cooper

2013-Oct-09 15:52 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 09/10/13 16:26, Daniel Kiper wrote:
> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>> responsible for loading and relocating the image.  This allows kexec
>> to be usable by pv-ops kernels and should allow kexec to be usable
>> from a HVM or PVH privileged domain.
> As I can see you taken some sugestions into account. Thanks. But...
> Why did not you send this patch series to kexec@lists.infradead.org
> and my Oracle address? Why did not you implemented kexec hypercall
> function to get info about loaded images? Andrew and I asked about that.
> What about setting GPRs to known value (e.g. 0 like in Linux Kernel)
> before jumping into purgatory?

I had forgotten that I had already asked David that question (in person),
and that he had answered.

There is no real use case for such a hypercall, and it introduces a
time-of-check to time-of-use race condition, as nothing stops a
KEXECOP_unload hypercall from being interleaved.

The KEXECOP_kexec hypercall is well specified in so far as it will
return if an image is not loaded.
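
The race described above can be illustrated with a small, self-contained C model. Everything here is hypothetical: `mock_query_loaded()` stands in for the proposed query call, `mock_unload()` and `mock_exec()` model the unload and exec sub-ops, and the mock exec returns 0 on success whereas the real call does not return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of hypervisor state: is an image currently loaded? */
static bool image_loaded;

/* Hypothetical query call of the kind asked for in the thread. */
static bool mock_query_loaded(void) { return image_loaded; }

/* Models an unload issued by some other caller. */
static void mock_unload(void) { image_loaded = false; }

/* Models the exec sub-op: fails with ENOENT if nothing is loaded. */
static int mock_exec(void)
{
    if (!image_loaded) {
        errno = ENOENT;
        return -1;
    }
    return 0;
}

/* Check-then-use: the answer from the query may be stale by exec time. */
static int check_then_exec(void (*interleaved)(void))
{
    if (!mock_query_loaded())
        return -1;
    if (interleaved)
        interleaved();   /* another caller runs between check and use */
    return mock_exec();  /* ENOENT must be handled here regardless */
}
```

Since the caller has to handle ENOENT from the exec call either way, the preceding query adds nothing the error path does not already provide.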

~Andrew
>
> By the way, you do not need to save and restore %rdi, %rsi and %rbx
> in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.
>
> Daniel

David Vrabel

2013-Oct-09 16:03 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 09/10/13 16:26, Daniel Kiper wrote:
> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
>> The series (for Xen 4.4) improves the kexec hypercall by making
>> Xen responsible for loading and relocating the image.  This allows
>> kexec to be usable by pv-ops kernels and should allow kexec to be
>> usable from a HVM or PVH privileged domain.
> 
> As I can see you taken some suggestions into account. Thanks. But...
> Why did not you send this patch series to kexec@lists.infradead.org
> and my Oracle address?

I reckoned that the kexec list subscribers were sick of this Xen-only
series by now.  Omitting your Cc was accidental, sorry.

> Why did not you implemented kexec hypercall function to get info about
> loaded images?

Because it is not needed.

> What about setting GPRs to known value (e.g. 0 like in Linux Kernel)
> before jumping into purgatory?

I have (repeatedly) explained why and you have not provided a sensible
reason why they should be zeroed.

> By the way, you do not need to save and restore %rdi, %rsi and %rbx
> in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.

This is done so relocate_pages() behaves like a proper function with the
standard calling convention.

David

Daniel Kiper

2013-Oct-10 15:45 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Wed, Oct 09, 2013 at 05:03:22PM +0100, David Vrabel wrote:
> On 09/10/13 16:26, Daniel Kiper wrote:
> > On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> >> The series (for Xen 4.4) improves the kexec hypercall by making
> >> Xen responsible for loading and relocating the image.  This allows
> >> kexec to be usable by pv-ops kernels and should allow kexec to be
> >> usable from a HVM or PVH privileged domain.
> >
> > As I can see you taken some sugestions into account. Thanks. But...
> > Why did not you send this patch series to kexec@lists.infradead.org
> > and my Oracle address?
>
> I reckoned that the kexec list subscribers were sick of this Xen-only
I do not think so. I have not seen any complaints.
Please send the next version to the kexec list too.
> series by now.  Omitting your Cc was accidental, sorry.
OK.
> > Why did not you implemented kexec hypercall function to get info about
> > loaded images?
>
> Because it is not needed.
OK.
> > What about setting GPRs to known value (e.g. 0 like in Linux Kernel)
> > before jumping into purgatory?
>
> I have (repeatedly) explained why and you have not provided a sensible
What do you mean by that? I have not seen any real explanation.
You were saying only that I am defining an ABI. I do not buy it.
You did not even reply to my last question: could you tell me
where you see an ABI here?
> reason why they should be zeroed.
OK, security reasons could be quite difficult to prove in that case and
some may say that they are only theoretical here. However, why do we not
care about compatibility with the existing implementation? What is wrong with
that? We used it as a base for our development and we use a lot of code
existing in kexec-tools. Why did we drop these few xors during our work?
What is wrong with known values here? Are we going to write special purgatory
code for Xen if one day the original purgatory requires 0 or another known value
in one or more registers?
> > By the way, you do not need to save and restore %rdi, %rsi and %rbx
> > in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.
>
> This is done so relocate_pages() behaves like a proper function with the
> standard calling convention.
If you would like to be in line with the GCC (and a few others) calling convention
then you should save only %rbx in relocate_pages(). %rcx, %rdi and %rsi should
be saved by the caller if needed. Anyway, I do not care about saving registers not
used later in relocate_pages() or around it. Additionally, relocate_pages()
is called only once and I do not expect that it will be moved somewhere else.
So I think that some of the save and restore instructions are redundant.
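
The register split referred to above follows the System V AMD64 calling convention, which GCC uses on x86-64 Linux. A minimal sketch of that classification in C follows; the table and the `is_callee_saved()` helper are illustrative names, not code from kexec_reloc.S.

```c
#include <stdbool.h>
#include <string.h>

/*
 * General-purpose registers a callee must preserve under the System V
 * AMD64 ABI. %rsp must also be restored on return; it is left out here
 * because it is not saved and restored like the others.
 */
static const char *const callee_saved[] = {
    "rbx", "rbp", "r12", "r13", "r14", "r15",
};

/* Returns true if the callee, not the caller, is responsible for 'reg'. */
static bool is_callee_saved(const char *reg)
{
    size_t n = sizeof(callee_saved) / sizeof(callee_saved[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(reg, callee_saved[i]) == 0)
            return true;
    return false;
}
```

Of the registers named in this exchange, only %rbx is in the callee-saved set; %rcx, %rdi and %rsi are caller-saved.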

Daniel

David Vrabel

2013-Oct-10 16:35 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 10/10/13 16:45, Daniel Kiper wrote:
> On Wed, Oct 09, 2013 at 05:03:22PM +0100, David Vrabel wrote:
>> On 09/10/13 16:26, Daniel Kiper wrote:
>>> 
>>> What about setting GPRs to known value (e.g. 0 like in Linux Kernel)
>>> before jumping into purgatory?
>>
>> I have (repeatedly) explained why and you have not provided a sensible
> 
> What do you mean by that? I have not seen any real explanation.
> You were saying only that I am defining an ABI. I do not buy it.
> Even you did not reply to my last question: Could you tell me
> where do you see an ABI here?

I'm going to comment on your points one final time. I am not going to
debate with you any further on any of this.

The register state on executing the image is undefined (this is the
specified ABI), so there is no need to set the registers to any
particular value.

If the implementation did zero the registers then an image could rely on
this.  It would then be impossible to change the implementation to do
anything other than zero the registers as that would break existing
users.  Zeroing the registers is thus an implicit or de facto ABI (even
though we specified the register values as undefined).

If the registers are not zeroed then it is highly unlikely that an image
could make use of their values and thus if we wish to change the
specification to set some register values we can safely do so without
breaking existing images.

>> reason why they should be zeroed.
>
> OK, security reasons could be quite difficult to prove in that case and
> some may say that they are only theoretical here.

There is no security concern here.  The image must be fully trusted by
the host administrator since it has full access to all of the host's RAM
and devices.

If you're afraid an image might do something malicious then don't load
that image.

> However, why we do not care about compatibility with existing implementation?

Xen does diverge from the ABI provided by Linux where it makes sense, i.e.,
where doing so makes for a better ABI or a better implementation.  For
example, 64-bit images are exec'd with page tables that only cover the
image segments (unlike on Linux, where the page tables cover all of RAM,
which has problems, as noted by Jan, with cacheable mappings overlapping
with uncacheable regions).

Compatibility with existing Linux tools is a nice bonus but should not
and does not constrain the Xen ABI or implementation.

> Are we going to write special purgatory
> code for Xen if one day original purgatory require 0 or another known value
> in one or more registers?

If that happens we can always revisit the Xen implementation and consider
changing it (or just fix purgatory).

>>> By the way, you do not need to save and restore %rdi, %rsi and %rbx
>>> in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.
>>
>> This is done so relocate_pages() behaves like a proper function with the
>> standard calling convention.
>
> If you would like to be inline with GCC (and a few others) calling convention
> then you should save %rbx in relocate_pages() only. %rcx, %rdi and %rsi should
> be saved by caller if needed.

Yes, I got that wrong, but you're really into trivial nit-picking here
which is quite frankly neither helpful nor productive.

> Anyway, I do not care about saving registers not
> used later in relocate_pages() or around it.

This is stupid -- relocate_pages() is called like a function so it
should behave like one.  Anything else is going to trip up someone else
looking at or modifying the code in the future.

David

Daniel Kiper

2013-Oct-10 21:24 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Thu, Oct 10, 2013 at 05:35:39PM +0100, David Vrabel wrote:
> On 10/10/13 16:45, Daniel Kiper wrote:
> > On Wed, Oct 09, 2013 at 05:03:22PM +0100, David Vrabel wrote:
> >> On 09/10/13 16:26, Daniel Kiper wrote:
> >>>
> >>> What about setting GPRs to known value (e.g. 0 like in Linux Kernel)
> >>> before jumping into purgatory?
> >>
> >> I have (repeatedly) explained why and you have not provided a sensible
> >
> > What do you mean by that? I have not seen any real explanation.
> > You were saying only that I am defining an ABI. I do not buy it.
> > Even you did not reply to my last question: Could you tell me
> > where do you see an ABI here?
>
> I'm going to comment on your points one final time. I am not going to
> debate with you any further on any of this.
Well, we are here to discuss technical stuff and we are doing that.
Sometimes it is tough, but that does not mean that we should break off the
discussion, especially if the adversaries obey the rules. I think this
is the case here. Additionally, we have agreed on some things, so this is
not counterproductive. So I still count on cooperation with you.
> The register state on executing the image is undefined (this is the
> specified ABI), so there is no need to set the registers to any
> particular value.
So let's look into the docs. I have found at least two interesting source
files for us. linux/arch/x86/kernel/relocate_kernel_64.S (let's focus
on 64-bit; 32-bit is quite similar) contains something like:

        /*
         * set all of the registers to known values
         * leave %rsp alone
         */

        testq   %r11, %r11
        jnz 1f
        xorl    %eax, %eax
        xorl    %ebx, %ebx
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        xorl    %esi, %esi
        xorl    %edi, %edi
        xorl    %ebp, %ebp
        xorl    %r8d, %r8d
        xorl    %r9d, %r9d
        xorl    %r10d, %r10d
        xorl    %r11d, %r11d
        xorl    %r12d, %r12d
        xorl    %r13d, %r13d
        xorl    %r14d, %r14d
        xorl    %r15d, %r15d

        ret

The comment clearly states: set all of the registers to known values; leave
%rsp alone. So the register states, just before the jump into purgatory, are
clearly defined.

Now look into kexec-tools/purgatory/arch/x86_64/setup-x86_64.S. There is
not a single word about register states, and no instruction assumes anything
about them, except for %rsp, which should point to jump_back_entry. In our
case %rsp should point to 0UL stored on the stack (we have missed that and it
should be fixed by us).

We call purgatory. Purgatory is shared code used at least by Linux and Xen.
It was created for Linux by Linux guys, so we are the guests here. There are
no other docs (correct me if I missed something) saying anything about
register states just before the jump into purgatory. So we should obey the
rules described in linux/arch/x86/kernel/relocate_kernel_64.S. If we do not
like them, we should ask the authors of the original kexec/kdump about this
and act according to their reply. There is no other way here.
> If the implementation did zero the registers then an image could rely on
> this.  It would then be impossible to change the implementation to do
This is bogus. No sane developer or maintainer assumes any register
state if it is not clearly described by existing docs. And it is not. Such
brain-dead code would not be accepted, at least on the kexec list. Guys doing
that are crazy and I do not care about them. They are just asking for problems.
Additionally, there are more reliable ways to get 0 for our needs.
> anything other than zero the registers as that would break existing
> users.  Zeroing the register is thus an implicit or defacto ABI (even
> though we specified the register values as undefined).
>
> If the registers are not zeroed then it is highly unlikely that an image
> could make use of their values and thus if we wish to change the
> specification to set some register values we can safely do so without
> breaking existing images.
So how are you going to differentiate in purgatory (if it is needed) between
the old Xen kexec implementation (current proposal) and a new Xen kexec
implementation (current proposal + some changes) using only registers,
if the existing proposal does not enforce register values (they are simply
random)?
> > However, why we do not care about compatibility with existing implementation?
>
> Xen does diverge from ABI provided by Linux where it makes sense.  i.e.,
The ABI should be compatible with the existing implementation because we are
using existing code. Please look above.
> where doing so makes for a better ABI or a better implementation.  For
However, I do not object to a better implementation...
> example, 64-bit images are exec'd with page tables that only cover the
> image segments (unlike on Linux were the page tables cover all of RAM
> which has problems as noted by Jan with cachable mapping overlapping
> with uncachable regions).
...like in that case.
> Compatibility with existing Linux tools is a nice bonus but should not
> and does not constrain the Xen ABI or implementation.
Ditto.
> > Are we going to write special purgatory
> > code for Xen if one day original purgatory require 0 or another known value
> > in one or more registers?
>
> If that happens we can always revisit the Xen implementation and consider
> changing it (or just fix purgatory).
Why are we assuming, just before inclusion in Xen unstable, that we will need
to fix our implementation? Why can we not fix it now? Why should we not assume
that our implementation should run as long as possible without any changes?
> >>> By the way, you do not need to save and restore %rdi, %rsi and %rbx
> >>> in relocate_pages() in xen/arch/x86/x86_64/kexec_reloc.S.
> >>
> >> This is done so relocate_pages() behaves like a proper function with the
> >> standard calling convention.
> >
> > If you would like to be inline with GCC (and a few others) calling convention
> > then you should save %rbx in relocate_pages() only. %rcx, %rdi and %rsi should
> > be saved by caller if needed.
>
> Yes, I got that wrong, but you're really into trivial nit-picking here
> which is quite frankly neither helpful nor productive.
We are here to discuss nitpicks too. It is not counterproductive.
> > Anyway, I do not care about saving registers not
> > used later in relocate_pages() or around it.
>
> This is stupid -- relocate_pages() is called like a function so it
> should behave like one.  Anything else is going to trip up someone else
> looking at or modifying the code in the future.
I do not agree with you but I respect your opinion. If you insist on
making relocate_pages() a real function, do that. However, please
be in line with the GCC calling convention.

Daniel

Jan Beulich

2013-Oct-11 06:49 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

>>> On 10.10.13 at 23:24, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> On Thu, Oct 10, 2013 at 05:35:39PM +0100, David Vrabel wrote:
>> The register state on executing the image is undefined (this is the
>> specified ABI), so there is no need to set the registers to any
>> particular value.
> 
> So let's look into the docs.
"docs"?
> I have found at least two interesting source
> files for us. linux/arch/x86/kernel/relocate_kernel_64.S (let's focus
> on 64-bit; 32-bit is quite similar) contains something like:
> 
>         /*
>          * set all of the registers to known values
>          * leave %rsp alone
>          */
> 
>         testq   %r11, %r11
>         jnz 1f
>         xorl    %eax, %eax
>         xorl    %ebx, %ebx
>         xorl    %ecx, %ecx
>         xorl    %edx, %edx
>         xorl    %esi, %esi
>         xorl    %edi, %edi
>         xorl    %ebp, %ebp
>         xorl    %r8d, %r8d
>         xorl    %r9d, %r9d
>         xorl    %r10d, %r10d
>         xorl    %r11d, %r11d
>         xorl    %r12d, %r12d
>         xorl    %r13d, %r13d
>         xorl    %r14d, %r14d
>         xorl    %r15d, %r15d
> 
>         ret
This is all source code, not documentation.
> Comment clearly states: set all of the registers to known values; leave %rsp
> alone.
> So registers states, just before jump into purgatory, are clearly defined.
The implementer of this code wanted them to be, but none of the
above in any way says that there's a requirement. One could as
well load ~0 into all of them, or some random or garbage pattern.
> Now look into kexec-tools/purgatory/arch/x86_64/setup-x86_64.S. There is
> no any single word about registers states. Even any instruction assumes their
> states. Excluding %rsp which should point to jump_back_entry. In our case
> %rsp should point to 0UL stored at the stack (we have missed that and it
> should be fixed by us).
So then where's your problem with David leaving the register
values alone?
>> If the implementation did zero the registers then an image could rely on
>> this.  It would then be impossible to change the implementation to do
>
> This is bogus. Any sane developer or maintainer do not assume any register
> state if it is not clearly described by existing docs. And it is not. Such
> brain dead code would not be accepted at least at kexec list. Guys doing
> that are crazy and I do not care about them. They are just asking for
> problems.
So you seem to contradict yourself: First you demand the registers
to be zeroed, and then you explain why there's no need to? What's
your position then after all?

Jan

Daniel Kiper

2013-Oct-11 08:58 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Fri, Oct 11, 2013 at 07:49:21AM +0100, Jan Beulich wrote:
> >>> On 10.10.13 at 23:24, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> > On Thu, Oct 10, 2013 at 05:35:39PM +0100, David Vrabel wrote:
> >> The register state on executing the image is undefined (this is the
> >> specified ABI), so there is no need to set the registers to any
> >> particular value.
> >
> > So let's look into the docs.
>
> "docs"?
>
> > I have found at least two interesting source
> > files for us. linux/arch/x86/kernel/relocate_kernel_64.S (let's focus
> > on 64-bit; 32-bit is quite similar) contains something like:
> >
> >         /*
> >          * set all of the registers to known values
> >          * leave %rsp alone
> >          */
> >
> >         testq   %r11, %r11
> >         jnz 1f
> >         xorl    %eax, %eax
> >         xorl    %ebx, %ebx
> >         xorl    %ecx, %ecx
> >         xorl    %edx, %edx
> >         xorl    %esi, %esi
> >         xorl    %edi, %edi
> >         xorl    %ebp, %ebp
> >         xorl    %r8d, %r8d
> >         xorl    %r9d, %r9d
> >         xorl    %r10d, %r10d
> >         xorl    %r11d, %r11d
> >         xorl    %r12d, %r12d
> >         xorl    %r13d, %r13d
> >         xorl    %r14d, %r14d
> >         xorl    %r15d, %r15d
> >
> >         ret
>
> This is all source code, not documentation.
OK, maybe I should write "docs". Sadly, we do not have real docs in that case
(correct me if I am wrong). So we are forced to use the sources as a reference.
We did so when writing our kexec implementation. However, we missed that part.
> > Comment clearly states: set all of the registers to known values; leave %rsp
> > alone.
> > So registers states, just before jump into purgatory, are clearly defined.
>
> The implementer of this code wanted them to be, but none of the
> above in any way says that there's a requirement. One could as
> well load ~0 into all of them, or some random or garbage pattern.
I agree that it could be anything here. However, we have a real piece
of code and no other documentation. So we should read what is written,
no more, no less. And here the guys clear all registers. However, there is
no explanation of why they are doing that (I think that this is the problem
here). But this code, as far as I can see, is actively maintained and I suppose
that there is some reason to leave it as is. I will ask them why they do that.
> > Now look into kexec-tools/purgatory/arch/x86_64/setup-x86_64.S. There is
> > no any single word about registers states. Even any instruction assumes their
> > states. Excluding %rsp which should point to jump_back_entry. In our case
> > %rsp should point to 0UL stored at the stack (we have missed that and it
> > should be fixed by us).
>
> So then where's your problem with David leaving the register
> values alone?
We cannot state that registers have an undefined state on entry to purgatory
just from reading kexec-tools/purgatory/arch/x86_64/setup-x86_64.S. In theory
they may, but there is also the caller side in
linux/arch/x86/kernel/relocate_kernel_64.S and we should take it into
account too.
> >> If the implementation did zero the registers then an image could rely on
> >> this.  It would then be impossible to change the implementation to do
> >
> > This is bogus. Any sane developer or maintainer do not assume any register
> > state if it is not clearly described by existing docs. And it is not. Such
> > brain dead code would not be accepted at least at kexec list. Guys doing
> > that are crazy and I do not care about them. They are just asking for
> > problems.
>
> So you seem to contradict yourself: First you demand the registers
> to be zeroed, and then you explain why there's no need to? What's
> your position then after all?
If I understand correctly (correct me if I am wrong), David said that
some guys may assume that registers are zeroed on entry to purgatory
just because we zero them in relocate_kernel_64.S. I do not agree.
There is no statement in kexec-tools/purgatory/arch/x86_64/setup-x86_64.S
which allows them to do that, even if the registers are zeroed (or set to
another value) in relocate_kernel_64.S.

Daniel

David Vrabel

2013-Oct-11 09:56 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 11/10/13 07:49, Jan Beulich wrote:
>>>> On 10.10.13 at 23:24, Daniel Kiper <daniel.kiper@oracle.com> wrote:
>> On Thu, Oct 10, 2013 at 05:35:39PM +0100, David Vrabel wrote:
>>> The register state on executing the image is undefined (this is the
>>> specified ABI), so there is no need to set the registers to any
>>> particular value.
>>
>> So let's look into the docs.
>
> "docs"?

Yes, we should have some.  How about this as a start?

--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -105,7 +105,20 @@ typedef struct xen_kexec_image {
  * Perform kexec having previously loaded a kexec or kdump kernel
  * as appropriate.
  * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
+ *
+ * Control is transferred to the image entry point with the host in
+ * the following state.
+ *
+ * - The image may be executed on any PCPU and all other PCPUs are
+ *   stopped.
+ *
+ * - Local interrupts are disabled.
+ *
+ * - Register values are undefined.
+ *
+ * - The image segments have writeable 1:1 virtual to machine mappings.
+ *   The location of the page tables is undefined and the page table
+ *   frames are not mapped.
  */
 #define KEXEC_CMD_kexec                 0
 typedef struct xen_kexec_exec {

David

Daniel Kiper

2013-Oct-11 11:15 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Fri, Oct 11, 2013 at 10:56:25AM +0100, David Vrabel wrote:
> On 11/10/13 07:49, Jan Beulich wrote:
> >>>> On 10.10.13 at 23:24, Daniel Kiper <daniel.kiper@oracle.com> wrote:
> >> On Thu, Oct 10, 2013 at 05:35:39PM +0100, David Vrabel wrote:
> >>> The register state on executing the image is undefined (this is the
> >>> specified ABI), so there is no need to set the registers to any
> >>> particular value.
> >>
> >> So let's look into the docs.
> >
> > "docs"?
>
> Yes, we should have some.  How about this as a start?
>
> --- a/xen/include/public/kexec.h
> +++ b/xen/include/public/kexec.h
> @@ -105,7 +105,20 @@ typedef struct xen_kexec_image {
>   * Perform kexec having previously loaded a kexec or kdump kernel
>   * as appropriate.
>   * type == KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH [in]
> + *
> + * Control is transferred to the image entry point with the host in
> + * the following state.
> + *
> + * - The image may be executed on any PCPU and all other PCPUs are
> + *   stopped.
OK.
> + * - Local interrupts are disabled.
OK.
> + * - Register values are undefined.
If the Linux and kexec guys state that they do not care, then I do not care
either. Let's wait and see what happens in the "kexec: Clearing registers just
before jumping into purgatory" thread.
> + * - The image segments have writeable 1:1 virtual to machine mappings.
OK.
> + *   The location of the page tables is undefined and the page table
> + *   frames are not mapped.
OK.

Daniel

David Vrabel

2013-Oct-11 14:06 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 11/10/13 12:15, Daniel Kiper wrote:
>
>> + * - Register values are undefined.
>
> If Linux and kexec guys state that they do not care then I do not care too.
> Let's wait what will happen in "kexec: Clearing registers just before
> jumping into purgatory" thread.

How about we get the current series in as-is (plus the extra docs) and
then, since you feel so strongly about this minor point, you post a
follow-up patch to change the behaviour?

Does that work for you?  If so and if you're happy with everything else,
can I get your Reviewed-by on the whole series?

David

Daniel Kiper

2013-Oct-14 13:53 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Fri, Oct 11, 2013 at 03:06:09PM +0100, David Vrabel wrote:
> On 11/10/13 12:15, Daniel Kiper wrote:
> >
> >> + * - Register values are undefined.
> >
> > If Linux and kexec guys state that they do not care then I do not care too.
> > Let's wait what will happen in "kexec: Clearing registers just before
> > jumping into purgatory" thread.
>
> How about we get the current series in as-is (plus the extra docs) and
> then, since you feel so strongly about this minor point, you post a
> follow-up patch to change the behaviour?
>
> Does that work for you?  If so and if you're happy with everything else,
> can I get your Reviewed-by on the whole series?
What do you think about Eric's last comments? Should we continue our discussion?
If yes, I could do final tests of the latest series now and add my Tested-by and
Reviewed-by as needed. Later we could establish the details and post follow-up
patches (one for zeroing registers and one fixing/aligning the calling
convention for relocate_pages). It would be nice if we finished this by the
end of this week.

If not, please prepare patches with the above-mentioned fixes and post them ASAP.
I will test them within one day.

Daniel

David Vrabel

2013-Oct-14 14:14 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 14/10/13 14:53, Daniel Kiper wrote:
> On Fri, Oct 11, 2013 at 03:06:09PM +0100, David Vrabel wrote:
>> On 11/10/13 12:15, Daniel Kiper wrote:
>>>
>>>> + * - Register values are undefined.
>>>
>>> If Linux and kexec guys state that they do not care then I do not care too.
>>> Let's wait what will happen in "kexec: Clearing registers just before
>>> jumping into purgatory" thread.
>>
>> How about we get the current series in as-is (plus the extra docs) and
>> then, since you feel so strongly about this minor point, you post a
>> follow-up patch to change the behaviour?
>>
>> Does that work for you?  If so and if you're happy with everything else,
>> can I get your Reviewed-by on the whole series?
>
> What do you think about last Eric comments? Should we continue our discussion?
> If yes I could do final tests of latest series now and put my Tested-by and
> Reviewed-by as needed. Later we could establish details and put follow up patches
> (one for zeroing registers and one fixing/aligning calling convention for
> relocate_pages). It will be nice if we finish this stuff by the end of this week.

I think there are two[*] sensible options:

A. Registers are specified as undefined, register values are not
initialized.

B. Registers are specified as zeroed (%rsp, %rax excepted), register
values are initialized to zero.

If A is merged, then Xen can move to B later.  If B is merged, Xen
cannot go back to A.  Therefore, I think we should merge A and discuss
moving to B (or perhaps even C) as a separate item.

(FYI, I've already fixed up relocate_pages() to go into v10 since I need
to post v10 with the extra docs anyway.)

David

[*] There is a third way:

C. Registers are specified as undefined, but register values are
initialized to zero.

But I don't think the specification should diverge from the implementation.

Daniel Kiper

2013-Oct-14 18:13 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Mon, Oct 14, 2013 at 03:14:13PM +0100, David Vrabel wrote:
> On 14/10/13 14:53, Daniel Kiper wrote:
> > On Fri, Oct 11, 2013 at 03:06:09PM +0100, David Vrabel wrote:
> >> On 11/10/13 12:15, Daniel Kiper wrote:
> >>>
> >>>> + * - Register values are undefined.
> >>>
> >>> If Linux and kexec guys state that they do not care then I do not care too.
> >>> Let's wait what will happen in "kexec: Clearing registers just before
> >>> jumping into purgatory" thread.
> >>
> >> How about we get the current series in as-is (plus the extra docs) and
> >> then, since you feel so strongly about this minor point, you post a
> >> follow patch to change the behaviour?
> >>
> >> Does that work for you?  If so and if you're happy with everything else,
> >> can I get your Reviewed-by on the whole series?
> >
> > What do you think about last Eric comments? Should we continue our discussion?
> > If yes I could do final tests of latest series now and put my Tested-by and
> > Reviewed-by as needed. Later we could establish details and put follow up patches
> > (one for zeroing registers and one fixing/aligning calling convention for
> > relocate_pages). It will be nice if we finish this stuff by the end of this week.
>
> I think there are two[*] sensible options:
>
> A. Registers are specified as undefined, register values are not
> initialized.
>
> B. Registers are specified as zeroed (%rsp, %rax excepted), register
> values are initialized to zero.
>
> If A is merged, then Xen can move to B later.  If B is merged, Xen
> cannot go back to A.  Therefore, I think we should merge A and discuss
> moving to B (or perhaps even C) as a separate item.

OK.

> (FYI, I've already fixed up relocate_pages() to go into v10 since I need
> to post v10 with the extra docs anyway.)

Thanks.

> David
>
> [*] There is a third way:
>
> C. Registers are specified as undefined, but register values are
> initialized to zero.
>
> But I don't think the specification should diverge from the implementation.

I agree, but I think that we could solve that problem by adding a comment
which explains precisely what is going on and what the callee should expect
(uninitialized registers). Eric's comment is nice and could be used by us
as a starting point. Additionally, I think that a similar comment should
be added to the Linux kernel source and the purgatory entry (I could do that).

Daniel

Daniel Kiper

2013-Oct-16 21:09 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Mon, Oct 14, 2013 at 08:13:52PM +0200, Daniel Kiper wrote:
> On Mon, Oct 14, 2013 at 03:14:13PM +0100, David Vrabel wrote:
[...]
> > (FYI, I've already fixed up relocate_pages() to go into v10 since I need
> > to post v10 with the extra docs anyway.)
>
> Thanks.

Hmmm... David, are you going to send v10 soon, so I should wait,
or should I do tests of v9 now?

Daniel

Daniel Kiper

2013-Oct-18 18:40 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> responsible for loading and relocating the image.  This allows kexec
> to be usable by pv-ops kernels and should allow kexec to be usable
> from a HVM or PVH privileged domain.

I could not load panic image because Xen crashes in following way:

(XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d080114ef2>] kimage_free+0x67/0xd2
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff820040037000   rbx: f000ff53f000e2c3   rcx: 0000000000000037
(XEN) rdx: ffff820040000000   rsi: 0000000000000040   rdi: ffff83007faea1d8
(XEN) rbp: ffff83007fae7d48   rsp: ffff83007fae7d28   r8: fffffffffffffe31
(XEN) r9:  0000000000000009   r10: 0000000000000282   r11: 0000000000bfd000
(XEN) r12: ffff820040037000   r13: f000ff53f000e2c3   r14: ffff830076b1df20
(XEN) r15: 0000000013bfd000   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 0000000076389000   cr2: ffff820040037000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83007fae7d28:
(XEN)    ffff830076b1df20 00000000ffffff9d ffff830076b1de50 0000000000000001
(XEN)    ffff83007fae7d98 ffff82d0801151f9 0000000000000010 ffff83007fae7de0
(XEN)    00000000000000c0 ffff83007fae7de0 000000000000003e 0000000013f4f720
(XEN)    ffff880039c56b50 0000000000000000 ffff83007fae7dc8 ffff82d0801152fe
(XEN)    ffff83007fae7dc8 00000000fffffff2 ffff830076b1de50 00007fff4a4391f0
(XEN)    ffff83007fae7ee8 ffff82d0801144c0 ffff83007fae7ef8 0000000000000000
(XEN)    ffff83007fae7e48 ffff82d08011d7d2 ffff83007fae7e18 ffff82d080270d20
(XEN)    ffff83007fad9060 0000000000075a0c 0000000000000000 ffff83007fad9000
(XEN)    ffff820040035000 00007ff000000003 00000006003e0001 00007fbf8d9b0004
(XEN)    0000000013f4f720 ffff83007faea000 ffff83007fae7e68 ffff82d08016fa43
(XEN)    ffff83007fae7e88 ffff82d080221348 ffff83007faea000 ffff83007fae7f18
(XEN)    ffff83007fae7ef8 ffff82d0802214a8 000000000af2f749 0000000000000000
(XEN)    0000000000000217 00007fbf8cfd5577 0000000000000100 00007fbf8cfd5577
(XEN)    ffff83007fae7ed8 ffff82d08016fa43 ffff83007fad9000 0000000000000003
(XEN)    ffff83007fae7ef8 ffff82d0801145c9 00007cff805180c7 ffff82d0802268cb
(XEN)    ffffffff810014aa 0000000000000025 0000000000000000 00007fff4a439270
(XEN)    00000000000000a0 00007fbf8d9b1000 ffff88003951dea8 ffff880039912c00
(XEN)    0000000000000286 000000000155d850 0000000000200000 0000000013f4f720
(XEN)    0000000000000025 ffffffff810014aa 00007fbf8d8faa55 00007fbf8d9af004
(XEN)    0000000000000004 0001010000000000 ffffffff810014aa 000000000000e033
(XEN) Xen call trace:
(XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
(XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
(XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
(XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
(XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
(XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
(XEN)
(XEN) Pagetable walk from ffff820040037000:
(XEN)  L4[0x104] = 000000007ffd0063 ffffffffffffffff
(XEN)  L3[0x001] = 000000007ffce063 ffffffffffffffff
(XEN)  L2[0x000] = 000000007ffc5063 ffffffffffffffff
(XEN)  L1[0x037] = f000ff53f000e063 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0009]
(XEN) Faulting linear address: ffff820040037000
(XEN) ****************************************
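
The `error_code` value in a page-fault report like the one above is a bit field defined by the x86 architecture. The following is a minimal sketch of a decoder for it; the macro and function names are made up for illustration and are not Xen's own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits (names are illustrative). */
#define PFEC_PRESENT  (1u << 0) /* 1: protection violation; 0: page not present */
#define PFEC_WRITE    (1u << 1) /* 1: write access; 0: read access */
#define PFEC_USER     (1u << 2) /* 1: user-mode access; 0: supervisor-mode */
#define PFEC_RESERVED (1u << 3) /* 1: reserved bit set in a paging-structure entry */
#define PFEC_INSN     (1u << 4) /* 1: instruction fetch */

/*
 * Format a human-readable breakdown of error code `ec` into `buf`.
 * Returns the snprintf() result.
 */
static int describe_pfec(uint32_t ec, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s, %s, %s%s%s",
                    (ec & PFEC_PRESENT) ? "present" : "not present",
                    (ec & PFEC_WRITE) ? "write" : "read",
                    (ec & PFEC_USER) ? "user" : "supervisor",
                    (ec & PFEC_RESERVED) ? ", reserved bit set" : "",
                    (ec & PFEC_INSN) ? ", instruction fetch" : "");
}
```

For `error_code=0009` bits 0 and 3 are set: a supervisor-mode read hit a present mapping whose paging-structure entry has a reserved bit set. That would be consistent with the L1 entry in the page-table walk not being a valid page-table entry, though the trace alone does not say how it got that way.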

Normal kernel could be loaded but when it is executed something
crashes very early. Following message is displayed

I'm in purgatory
early console in decompress_kernel

and machine is restarted shortly.

I have done tests with latest kexec-tools and Xen versions.

Daniel

David Vrabel

2013-Oct-18 23:14 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 18/10/2013 19:40, Daniel Kiper wrote:
> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>> responsible for loading and relocating the image.  This allows kexec
>> to be usable by pv-ops kernels and should allow kexec to be usable
>> from a HVM or PVH privileged domain.
>
> I could not load panic image because Xen crashes in following way:
>
> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
[...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145

The appended patch should fix this crash which only occurs if there's an
error in do_kimage_alloc().

> Normal kernel could be loaded but when it is executed something
> crashes very early. Following message is displayed

Both normal and panic images work fine for me.  You're going to have to
provide more details.

David

8<---------------------------------
--- a/xen/common/kimage.c
+++ b/xen/common/kimage.c
@@ -179,7 +179,7 @@ static int do_kimage_alloc(struct kexec_image **rimage, paddr_t entry,
                                     page_to_maddr(image->control_code_page),
                                     page_to_maddr(image->control_code_page));
     if ( result < 0 )
-        return result;
+        goto out;

     /* Add an empty indirection page. */
     image->entry_page = kimage_alloc_control_page(image, 0);
@@ -188,7 +188,7 @@ static int do_kimage_alloc(struct kexec_image **rimage, paddr_t entry,
     result = machine_kexec_add_page(image, page_to_maddr(image->entry_page),
                                     page_to_maddr(image->entry_page));
     if ( result < 0 )
-        return result;
+        goto out;

     image->head = page_to_maddr(image->entry_page);

@@ -510,15 +510,14 @@ static void kimage_free_entry(kimage_entry_t entry)
     free_domheap_page(page);
 }

-void kimage_free(struct kexec_image *image)
+static void kimage_free_all_entries(struct kexec_image *image)
 {
     kimage_entry_t *ptr, entry;
     kimage_entry_t ind = 0;

-    if ( !image )
+    if ( !image->head )
         return;

-    kimage_free_extra_pages(image);
     for_each_kimage_entry(image, ptr, entry)
     {
         if ( entry & IND_INDIRECTION )
@@ -537,8 +536,15 @@ void kimage_free(struct kexec_image *image)
     /* Free the final indirection page. */
     if ( ind & IND_INDIRECTION )
         kimage_free_entry(ind);
+}

-    /* Free the kexec control pages. */
+void kimage_free(struct kexec_image *image)
+{
+    if ( !image )
+        return;
+
+    kimage_free_extra_pages(image);
+    kimage_free_all_entries(image);
     kimage_free_page_list(&image->control_pages);
     xfree(image->segments);
     xfree(image);
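
The two changes in the patch above (failures leave through a common `out` label instead of returning directly, and the free routine tolerates an image that was only partly built) follow a common C cleanup pattern. Here is a minimal, self-contained sketch of that pattern; `struct image`, `image_alloc()`, `image_free()` and the `fail_at` knob are all hypothetical and only stand in for the Xen code, which is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical image made of two separately allocated parts. */
struct image {
    void *control;   /* allocated first */
    void *entries;   /* allocated second; still NULL if step 1 failed */
};

/* Free an image in any state of construction, including NULL. */
static void image_free(struct image *img)
{
    if (!img)
        return;
    /* free(NULL) is a no-op, so parts never allocated are skipped safely. */
    free(img->entries);
    free(img->control);
    free(img);
}

/* Build an image; `fail_at` (test knob) forces step 1 or 2 to fail. */
static int image_alloc(struct image **out, int fail_at)
{
    int result = -ENOMEM;
    struct image *img;

    assert(out != NULL);
    img = calloc(1, sizeof(*img));   /* zeroed, so every part starts as NULL */
    if (!img) {
        *out = NULL;
        return -ENOMEM;
    }

    img->control = (fail_at == 1) ? NULL : malloc(64);
    if (!img->control)
        goto out;                    /* not "return": img must be freed */

    img->entries = (fail_at == 2) ? NULL : malloc(64);
    if (!img->entries)
        goto out;

    result = 0;
 out:
    if (result != 0) {
        image_free(img);             /* single cleanup path for every failure */
        img = NULL;
    }
    *out = img;
    return result;
}
```

The key property is that the free routine never dereferences a part that was not set up, which is the role the `!image->head` check appears to play in the patch.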

Andrew Cooper

2013-Oct-18 23:42 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 18/10/2013 19:40, Daniel Kiper wrote:
> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>> responsible for loading and relocating the image.  This allows kexec
>> to be usable by pv-ops kernels and should allow kexec to be usable
>> from a HVM or PVH privileged domain.
> I could not load panic image because Xen crashes in following way:
>
> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82d080114ef2>] kimage_free+0x67/0xd2
> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) rax: ffff820040037000   rbx: f000ff53f000e2c3   rcx: 0000000000000037
> (XEN) rdx: ffff820040000000   rsi: 0000000000000040   rdi: ffff83007faea1d8
> (XEN) rbp: ffff83007fae7d48   rsp: ffff83007fae7d28   r8: fffffffffffffe31
> (XEN) r9:  0000000000000009   r10: 0000000000000282   r11: 0000000000bfd000
> (XEN) r12: ffff820040037000   r13: f000ff53f000e2c3   r14: ffff830076b1df20
> (XEN) r15: 0000000013bfd000   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 0000000076389000   cr2: ffff820040037000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83007fae7d28:
> (XEN)    ffff830076b1df20 00000000ffffff9d ffff830076b1de50 0000000000000001
> (XEN)    ffff83007fae7d98 ffff82d0801151f9 0000000000000010 ffff83007fae7de0
> (XEN)    00000000000000c0 ffff83007fae7de0 000000000000003e 0000000013f4f720
> (XEN)    ffff880039c56b50 0000000000000000 ffff83007fae7dc8 ffff82d0801152fe
> (XEN)    ffff83007fae7dc8 00000000fffffff2 ffff830076b1de50 00007fff4a4391f0
> (XEN)    ffff83007fae7ee8 ffff82d0801144c0 ffff83007fae7ef8 0000000000000000
> (XEN)    ffff83007fae7e48 ffff82d08011d7d2 ffff83007fae7e18 ffff82d080270d20
> (XEN)    ffff83007fad9060 0000000000075a0c 0000000000000000 ffff83007fad9000
> (XEN)    ffff820040035000 00007ff000000003 00000006003e0001 00007fbf8d9b0004
> (XEN)    0000000013f4f720 ffff83007faea000 ffff83007fae7e68 ffff82d08016fa43
> (XEN)    ffff83007fae7e88 ffff82d080221348 ffff83007faea000 ffff83007fae7f18
> (XEN)    ffff83007fae7ef8 ffff82d0802214a8 000000000af2f749 0000000000000000
> (XEN)    0000000000000217 00007fbf8cfd5577 0000000000000100 00007fbf8cfd5577
> (XEN)    ffff83007fae7ed8 ffff82d08016fa43 ffff83007fad9000 0000000000000003
> (XEN)    ffff83007fae7ef8 ffff82d0801145c9 00007cff805180c7 ffff82d0802268cb
> (XEN)    ffffffff810014aa 0000000000000025 0000000000000000 00007fff4a439270
> (XEN)    00000000000000a0 00007fbf8d9b1000 ffff88003951dea8 ffff880039912c00
> (XEN)    0000000000000286 000000000155d850 0000000000200000 0000000013f4f720
> (XEN)    0000000000000025 ffffffff810014aa 00007fbf8d8faa55 00007fbf8d9af004
> (XEN)    0000000000000004 0001010000000000 ffffffff810014aa 000000000000e033
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
> (XEN)
> (XEN) Pagetable walk from ffff820040037000:
> (XEN)  L4[0x104] = 000000007ffd0063 ffffffffffffffff
> (XEN)  L3[0x001] = 000000007ffce063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000007ffc5063 ffffffffffffffff
> (XEN)  L1[0x037] = f000ff53f000e063 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0009]
> (XEN) Faulting linear address: ffff820040037000
> (XEN) ****************************************
>
> Normal kernel could be loaded but when it is executed something
> crashes very early. Following message is displayed
>
> I'm in purgatory
> early console in decompress_kernel
>
> and machine is restarted shortly.
>
> I have done tests with latest kexec-tools and Xen versions.
>
> Daniel

David - this looks curiously like the double-free crash I fixed on v6 in
XenServer.  Did you remember to merge the patch into your series?

Daniel: Your tests show that the newly loaded kernel is in control of the
machine.  The restart is therefore a bug in whichever kernel you used,
presumably making some false assumption about the state of the machine.

~Andrew

Xu, YongweiX

2013-Oct-21 03:11 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

Hi, David
I have tried to use kexec with upstream xen source (C/S:27758) with your patches
applied and it didn't work on Dom0. Steps as below:
1. use upstream xen and tools with your patch
2. use kexec-tools with branch "remotes/origin/xen-v6"
3. kexec -l -t multiboot-x86 /boot/xen.gz --module="/boot/vmlinuz-xen root=/dev/sda1 o" --module="/boot/initrd-xen.img"
4. kexec -e
The phenomenon was that the kernel didn't change, but the network broke.

Is there any problem in my operation?

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of David Vrabel
> Sent: Wednesday, October 09, 2013 12:55 AM
> To: xen-devel@lists.xen.org
> Cc: Keir Fraser; David Vrabel; Jan Beulich
> Subject: [Xen-devel] [PATCHv9 0/9] Xen: extend kexec hypercall for use with
> pv-ops kernels
> 
> The series (for Xen 4.4) improves the kexec hypercall by making Xen responsible
> for loading and relocating the image.  This allows kexec to be usable by pv-ops
> kernels and should allow kexec to be usable from a HVM or PVH privileged
> domain.
> 
> The first patch is a simple clean-up.
> 
> Patch 2 introduces the new ABI.
> 
> Patch 3 and 4 nearly completely reimplement the kexec load, unload and exec
> sub-ops.  The old load_v1 sub-op is then implemented on top of the new code.
> 
> Patch 5 calls the kexec image when dom0 crashes.  This avoids having to alter
> dom0 kernels to do a exec sub-op call on crash -- a SHUTDOWN_crash by dom0
> will trigger the kexec.
> 
> Patches 6 and 7 add the libxc API for the kexec calls.  These have been
> acked-by Ian Campbell already.
> 
> Patch 8 adds a link time check for the size of the relocate code.
> 
> Patch 9 adds myself as the maintainer for kexec in Xen.
> 
> The required patch series for kexec-tools will be posted shortly and are
> available from the xen-v6 branch of:
> 
> http://xenbits.xen.org/gitweb/?p=people/dvrabel/kexec-tools.git;a=summary
> 
> Changes in v9:
> 
> - Update comments to correctly say 4.4.
> - Minor updates the kexec_reloc assembly to improve maintainability a
>   bit.
> 
> Changes in v8:
> 
> - Use #defines for compat ABI structures.
> - Tweak link time check for kexec_reloc.
> 
> Changes in v7:
> 
> - No longer use GUEST_HANDLE_64(), get a uniform ABI by using unions
>   and explicit padding.
> - Only map the segments and not all of RAM.
> - Add a mechanism to create mappings for use by the exec'd image (a
>   segment with a NULL buf handle).
> - Fix a bug where a crash image's code page would be placed at machine
>   address 0 (instead of inside the crash region).
> 
> Changes in v6:
> 
> - Fix double free in KEXEC_load_v1 failure path.
> - Only copy the relocation code and not the whole page.
> - Add myself as the kexec maintainer.
> 
> Changes in v5 (not posted to the list):
> 
> - _rsvd -> _pad in one of the public ABI structures.
> - Fix bug where trailing pages were not zeroed. This fixes loading a
>   64-bit Linux kernel using a more recent version of kexec-tools.
> - Check the relocation code fits into a page at link time.
> 
> Changes in v4:
> 
> - Use paddr_t and page_to_maddr() etc. for portability.
> - Add explicit padding to hypercall structures where required.
> - Minor cleanup of the kexec_reloc assembly.
> - Print a message before exec'ing a crash image.
> - Style fixes (tabs, trailing whitespace) and typos.
> - Fix a bug where using the V1 interface and unloading a image may crash.
> 
> Changes in v3:
> 
> - Provide old struct xen_kexec_load if __XEN_INTERFACE_VERSION__ < 4.3
> - Adjust new struct xen_kexec_load to avoid unnecessary padding.
> - Use domheap pages for the image and control pages.
> - Remove the DBG() macros from the reloc code.
> 
> David
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

David Vrabel

2013-Oct-21 10:21 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 21/10/13 04:11, Xu, YongweiX wrote:
> Hi, David
> I have tried to use kexec with upstream xen source (C/S:27758) with your patches
> applied and it didn't work on Dom0. Steps as below:
> 1. use upstream xen and tools with your patch
> 2. use kexec-tools with branch "remotes/origin/xen-v6"
> 3. kexec -l -t multiboot-x86 /boot/xen.gz --module="/boot/vmlinuz-xen root=/dev/sda1 o" --module="/boot/initrd-xen.img"

kexec segfaults for me here because of a missing command line for Xen.
Specifying one with --command-line="console=com1 etc" will work around
this.

> 4. kexec -e
> The phenomenon was that the kernel didn't change, but the network broke.
>
> Is there any problem in my operation?

Yes, you're trying something that is not supported -- see the comment at
the top of kexec/arch/i386/kexec-multiboot-x86.c.  Only ELF32 images are
supported.
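
Whether a given file is an ELF32 image can be read from the class byte in the ELF identification header. The sketch below shows only that check; `is_elf32()` is a hypothetical helper, not the kexec-tools loader code, and it looks at raw bytes only, so a compressed image is reported as "not ELF" until it has been decompressed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ELF identification constants, as defined by the ELF specification. */
#define EI_CLASS    4   /* index of the class byte in e_ident[] */
#define ELFCLASS32  1
#define ELFCLASS64  2

/* Return 1 if `buf` starts with an ELF header of class ELFCLASS32, else 0. */
static int is_elf32(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len <= EI_CLASS)
        return 0;                    /* too short to hold e_ident[EI_CLASS] */
    if (memcmp(buf, magic, sizeof(magic)) != 0)
        return 0;                    /* not ELF (for example, still compressed) */
    return buf[EI_CLASS] == ELFCLASS32;
}
```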

David

Daniel Kiper

2013-Oct-21 12:19 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
> On 18/10/2013 19:40, Daniel Kiper wrote:
> > On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> >> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> >> responsible for loading and relocating the image.  This allows kexec
> >> to be usable by pv-ops kernels and should allow kexec to be usable
> >> from a HVM or PVH privileged domain.
> >
> > I could not load panic image because Xen crashes in following way:
> >
> > (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> [...]
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> > (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> > (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> > (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> > (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> > (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
>
> The appended patch should fix this crash which only occurs if there's an
> error in do_kimage_alloc().

Patch had wrapped lines. I hope that I fixed it properly.
I cannot load panic kernel. kexec fails with following message:

kexec_load failed: Cannot assign requested address
entry       = 0x13f4f720 flags = 0x3e0001
nr_segments = 5
segment[0].buf   = 0x7f5057554e10
segment[0].bufsz = 0x2d4d20
segment[0].mem   = 0x13000000
segment[0].memsz = 0xc0f000
segment[1].buf   = 0x144e620
segment[1].bufsz = 0x3f28
segment[1].mem   = 0x13f4a000
segment[1].memsz = 0x4000
segment[2].buf   = 0x14464c0
segment[2].bufsz = 0x80e0
segment[2].mem   = 0x13f4f000
segment[2].memsz = 0xa000
segment[3].buf   = 0x1445aa0
segment[3].bufsz = 0x400
segment[3].mem   = 0x13f5a000
segment[3].memsz = 0x4000
segment[4].buf   = 0x7f5058ad4010
segment[4].bufsz = 0x9f400
segment[4].mem   = 0x13f5f000
segment[4].memsz = 0xa0000

Normal kexec crashes as previously.

I boot my system in following way:

multiboot /boot/xen.gz apic_verbosity=debug com1=115200,8n1 conring_size=256k \
  crashkernel=256m@64m dom0_mem=1g,max:1g guest_loglvl=all loglvl=all \
  sync_console console=com1,vga
module /boot/vmlinuz root=/dev/sda3 ro apic=debug fbcon=scrollback:256k \
  libata.ignore_hpa=1 raid=noautodetect earlyprintk=xen console=tty1 console=hvc0

I load panic kernel with following command line:

/sbin/kexec -p /boot/vmlinuz --console-serial --serial=ttyS0 --serial-baud=115200 \
  --append='root=/dev/sda3 ro apic=debug fbcon=scrollback:256k libata.ignore_hpa=1
            nr_cpus=1 raid=noautodetect earlyprintk=ttyS0,115200 console=tty1
            console=ttyS0,115200n8'

I load normal kernel with following command line:

/sbin/kexec -l /boot/vmlinuz --console-serial --serial=ttyS0 --serial-baud=115200 \
  --append='root=/dev/sda3 ro apic=debug fbcon=scrollback:256k libata.ignore_hpa=1
            raid=noautodetect earlyprintk=ttyS0,115200 console=tty1
            console=ttyS0,115200n8'

I use xen-unstable tree with latest commit (f72cb6bbc10348f4f7671428e5db509731e9e6a5),
Linux Kernel 3.10.17 and kexec-tools with latest commit
(1cbddc80ddfe34cdcdac11c0562e4d8395c48b16)

If you need more info drop me a line.

Daniel

David Vrabel

2013-Oct-21 12:26 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 21/10/13 11:21, David Vrabel wrote:
> On 21/10/13 04:11, Xu, YongweiX wrote:
>> Hi, David
>> I have tried to use kexec with upstream xen source (C/S:27758) with your patches
>> applied and it didn't work on Dom0. Steps as below:
>> 1. use upstream xen and tools with your patch
>> 2. use kexec-tools with branch "remotes/origin/xen-v6"
>> 3. kexec -l -t multiboot-x86 /boot/xen.gz --module="/boot/vmlinuz-xen root=/dev/sda1 o" --module="/boot/initrd-xen.img"
>
> kexec segfaults for me here because of a missing command line for Xen.
> Specifying one with --command-line="console=com1 etc" will work around this.
>
>> 4. kexec -e
>> The phenomenon was that the kernel didn't change, but the network broke.
>>
>> Is there any problem in my operation?
>
> Yes, you're trying something that is not supported -- see the comment at
> the top of kexec/arch/i386/kexec-multiboot-x86.c.  Only ELF32 images are
> supported.

Huh.  It does sort of work.  There's a 3 minute or so delay between
starting purgatory and Xen starting.

I don't have time to look into this use case (kexec'ing into Xen) so if
you care you'll have to do some digging yourself.

David

David Vrabel

2013-Oct-21 12:56 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 21/10/13 13:19, Daniel Kiper wrote:
> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
>> On 18/10/2013 19:40, Daniel Kiper wrote:
>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
>>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
>>>> responsible for loading and relocating the image.  This allows kexec
>>>> to be usable by pv-ops kernels and should allow kexec to be usable
>>>> from a HVM or PVH privileged domain.
>>>
>>> I could not load panic image because Xen crashes in following way:
>>>
>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
>> [...]
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
>>> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
>>> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
>>> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
>>> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
>>> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
>>
>> The appended patch should fix this crash which only occurs if there's an
>> error in do_kimage_alloc().
>
> Patch had wrapped lines. I hope that I fixed it properly.
> I cannot load panic kernel. kexec fails with following message:
>
> kexec_load failed: Cannot assign requested address

This is -EADDRNOTAVAIL which means one of

a) the entry point isn't within a segment.
b) one of the segments is not page aligned.
c) one of the segments is not within the crash region.

But the segments kexec has constructed all looked fine to me (and
similar to the segments I see).
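
The three conditions can be checked mechanically against the values reported with the failing load. The following is a minimal sketch, not Xen's validation code: it assumes 4 KiB pages, assumes the crash region is exactly the 256 MiB at 64 MiB requested with `crashkernel=256m@64m`, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL                  /* assumption: 4 KiB pages */

struct seg { uint64_t mem, memsz; };         /* destination address and size */

/* Assumed crash region, taken from crashkernel=256m@64m. */
static const uint64_t crash_start = 64ULL << 20;
static const uint64_t crash_end   = (64ULL + 256ULL) << 20;

/* Entry point and segment destinations as printed by the failing kexec -p. */
static const uint64_t entry = 0x13f4f720;
static const struct seg segs[] = {
    { 0x13000000, 0xc0f000 },
    { 0x13f4a000, 0x4000 },
    { 0x13f4f000, 0xa000 },
    { 0x13f5a000, 0x4000 },
    { 0x13f5f000, 0xa0000 },
};

/* Return 0 if all conditions hold, else the letter of a violated one. */
static char check_crash_image(uint64_t ent, const struct seg *s, size_t n)
{
    int entry_ok = 0;
    size_t i;

    assert(s != NULL && n > 0);
    for (i = 0; i < n; i++) {
        uint64_t start = s[i].mem, end = s[i].mem + s[i].memsz;

        if ((start | end) & (PAGE_SIZE - 1))
            return 'b';                      /* segment not page aligned */
        if (start < crash_start || end > crash_end)
            return 'c';                      /* segment outside crash region */
        if (ent >= start && ent < end)
            entry_ok = 1;
    }
    return entry_ok ? 0 : 'a';               /* entry point in no segment */
}
```

Calling `check_crash_image(entry, segs, 5)` returns 0 under these assumptions, so if the load is still rejected, the crash region Xen actually reserved may differ from the one assumed here.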

I'm afraid I cannot reproduce either of your failures.  Are you sure
you've built everything correctly?  In particular has kexec-tools been
built against the correct version of Xen headers?

David

Daniel Kiper

2013-Oct-21 20:20 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
> On 21/10/13 13:19, Daniel Kiper wrote:
> > On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
> >> On 18/10/2013 19:40, Daniel Kiper wrote:
> >>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> >>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> >>>> responsible for loading and relocating the image.  This allows kexec
> >>>> to be usable by pv-ops kernels and should allow kexec to be usable
> >>>> from a HVM or PVH privileged domain.
> >>>
> >>> I could not load panic image because Xen crashes in following way:
> >>>
> >>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> >> [...]
> >>> (XEN) Xen call trace:
> >>> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> >>> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> >>> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> >>> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> >>> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> >>> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
> >>
> >> The appended patch should fix this crash which only occurs if there's an
> >> error in do_kimage_alloc().
> >
> > Patch had wrapped lines. I hope that I fixed it properly.
> > I cannot load panic kernel. kexec fails with following message:
> >
> > kexec_load failed: Cannot assign requested address
>
> This is -EADDRNOTAVAIL which means one of
>
> a) the entry point isn't within a segment.
> b) one of the segments is not page aligned.
> c) one of the segments is not within the crash region.
>
> But the segments kexec has constructed all looked fine to me (and
> similar to the segments I see).
>
> I'm afraid I cannot reproduce either of your failures.  Are you sure
> you've built everything correctly?  In particular has kexec-tools been
> built against the correct version of Xen headers?

It looks that I build it correctly but I will double check it.
Could you send me your Xen/Linux boot command lines and kexec
command lines for normal and panic kernel? Could you tell me
what is your RAM size?

Daniel

Daniel Kiper

2013-Oct-25 09:13 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Mon, Oct 21, 2013 at 10:20:32PM +0200, Daniel Kiper wrote:
> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
> > On 21/10/13 13:19, Daniel Kiper wrote:
> > > On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
> > >> On 18/10/2013 19:40, Daniel Kiper wrote:
> > >>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> > >>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> > >>>> responsible for loading and relocating the image.  This allows kexec
> > >>>> to be usable by pv-ops kernels and should allow kexec to be usable
> > >>>> from a HVM or PVH privileged domain.
> > >>>
> > >>> I could not load panic image because Xen crashes in following way:
> > >>>
> > >>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> > >> [...]
> > >>> (XEN) Xen call trace:
> > >>> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> > >>> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> > >>> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> > >>> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> > >>> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> > >>> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
> > >>
> > >> The appended patch should fix this crash which only occurs if there's an
> > >> error in do_kimage_alloc().
> > >
> > > Patch had wrapped lines. I hope that I fixed it properly.
> > > I cannot load panic kernel. kexec fails with following message:
> > >
> > > kexec_load failed: Cannot assign requested address
> >
> > This is -EADDRNOTAVAIL which means one of
> >
> > a) the entry point isn't within a segment.
> > b) one of the segments is not page aligned.
> > c) one of the segments is not within the crash region.
> >
> > But the segments kexec has constructed all looked fine to me (and
> > similar to the segments I see).
> >
> > I'm afraid I cannot reproduce either of your failures.  Are you sure
> > you've built everything correctly?  In particular has kexec-tools been
> > built against the correct version of Xen headers?
>
> It looks that I build it correctly but I will double check it.
> Could you send me your Xen/Linux boot command lines and kexec
> command lines for normal and panic kernel? Could you tell me
> what is your RAM size?

Ping? This info will help me to dig deeper in this issue.

Daniel

David Vrabel

2013-Oct-25 23:04 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 25/10/13 10:13, Daniel Kiper wrote:
> On Mon, Oct 21, 2013 at 10:20:32PM +0200, Daniel Kiper wrote:
>> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
>>> On 21/10/13 13:19, Daniel Kiper wrote:
>>>> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
>>>>> On 18/10/2013 19:40, Daniel Kiper wrote:
>>>>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel
wrote:
>>>>>>> The series (for Xen 4.4) improves the kexec
hypercall by making Xen
>>>>>>> responsible for loading and relocating the image. 
This allows kexec
>>>>>>> to be usable by pv-ops kernels and should allow
kexec to be usable
>>>>>>> from a HVM or PVH privileged domain.
>>>>>>
>>>>>> I could not load panic image because Xen crashes in
following way:
>>>>>>
>>>>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:
C ]----
>>>>> [...]
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82d080114ef2>]
kimage_free+0x67/0xd2
>>>>>> (XEN)    [<ffff82d0801151f9>]
do_kimage_alloc+0x29c/0x2f0
>>>>>> (XEN)    [<ffff82d0801152fe>]
kimage_alloc+0xb1/0xe6
>>>>>> (XEN)    [<ffff82d0801144c0>]
do_kexec_op_internal+0x68e/0x789
>>>>>> (XEN)    [<ffff82d0801145c9>]
do_kexec_op+0xe/0x12
>>>>>> (XEN)    [<ffff82d0802268cb>]
syscall_enter+0xeb/0x145
>>>>>
>>>>> The appended patch should fix this crash which only occurs
if there''s an
>>>>> error in do_kimage_alloc().
>>>>
>>>> Patch had wrapped lines. I hope that I fixed it properly.
>>>> I cannot load panic kernel. kexec fails with following message:
>>>>
>>>> kexec_load failed: Cannot assign requested address
>>>
>>> This is -EADDRINVALID which means one of
>>>
>>> a) the entry point isn''t within a segment.
>>> b) one of the segments is not page aligned.
>>> c) one of the segments is not within the crash region.
>>>
>>> But the segments kexec has constructed all looked fine to me (and
>>> similar to the segments I see).
>>>
>>> I''m afraid I cannot reproduce either of your failures. 
Are you sure
>>> you''ve built everything correctly?  In particular has
kexec-tools been
>>> built against the correct version of Xen headers?
>>
>> It looks that I build it correctly but I will double check it.
>> Could you send me your Xen/Linux boot command lines and kexec
>> command lines for normal and panic kernel? Could you tell me
>> what is your RAM size?
>
> Ping? This info will help me to dig deeper in this issue.

Sorry, I've been at the Xen Developers Summit and haven't had easy
access to my test box to gather this information.  I'll get this when
I'm back in the office on Monday.

David

David Vrabel

2013-Oct-30 16:57 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 21/10/13 21:20, Daniel Kiper wrote:
> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
>> On 21/10/13 13:19, Daniel Kiper wrote:
>>> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
>>>> On 18/10/2013 19:40, Daniel Kiper wrote:
>>>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel
wrote:
>>>>>> The series (for Xen 4.4) improves the kexec hypercall
by making Xen
>>>>>> responsible for loading and relocating the image.  This
allows kexec
>>>>>> to be usable by pv-ops kernels and should allow kexec
to be usable
>>>>>> from a HVM or PVH privileged domain.
>>>>>
>>>>> I could not load panic image because Xen crashes in
following way:
>>>>>
>>>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:   
C ]----
>>>> [...]
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
>>>>> (XEN)    [<ffff82d0801151f9>]
do_kimage_alloc+0x29c/0x2f0
>>>>> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
>>>>> (XEN)    [<ffff82d0801144c0>]
do_kexec_op_internal+0x68e/0x789
>>>>> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
>>>>> (XEN)    [<ffff82d0802268cb>]
syscall_enter+0xeb/0x145
>>>>
>>>> The appended patch should fix this crash which only occurs if
there''s an
>>>> error in do_kimage_alloc().
>>>
>>> Patch had wrapped lines. I hope that I fixed it properly.
>>> I cannot load panic kernel. kexec fails with following message:
>>>
>>> kexec_load failed: Cannot assign requested address
>>
>> This is -EADDRINVALID which means one of
>>
>> a) the entry point isn''t within a segment.
>> b) one of the segments is not page aligned.
>> c) one of the segments is not within the crash region.
>>
>> But the segments kexec has constructed all looked fine to me (and
>> similar to the segments I see).
>>
>> I''m afraid I cannot reproduce either of your failures.  Are
you sure
>> you''ve built everything correctly?  In particular has
kexec-tools been
>> built against the correct version of Xen headers?
> 
> It looks that I build it correctly but I will double check it.
> Could you send me your Xen/Linux boot command lines and kexec
> command lines for normal and panic kernel? Could you tell me
> what is your RAM size?

AMD Opteron 4264 with 8 GiB RAM.

Xen 4.4-unstable debug=y:

com1=115200,8n1 console=com1 crashkernel=256M@64M

Linux 3.12-rc4

root=/dev/mapper/cam--st09-root ro console=hvc0

Normal image:

build/sbin/kexec --debug --console-serial --serial-baud=115200
--command-line="console=ttyS0,115200n8 maxcpus=1" -l
/boot/vmlinuz-3.11.0.davidvr

Panic image:

build/sbin/kexec --debug --console-serial --serial-baud=115200
--command-line="console=ttyS0,115200n8 maxcpus=1" -p
/boot/vmlinuz-3.11.0.davidvr

David

Don Slutz

2013-Oct-31 16:59 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 10/30/13 12:57, David Vrabel wrote:
> On 21/10/13 21:20, Daniel Kiper wrote:
>> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
>>> On 21/10/13 13:19, Daniel Kiper wrote:
>>>> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
>>>>> On 18/10/2013 19:40, Daniel Kiper wrote:
>>>>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel
wrote:
>>>>>>> The series (for Xen 4.4) improves the kexec
hypercall by making Xen
>>>>>>> responsible for loading and relocating the image. 
This allows kexec
>>>>>>> to be usable by pv-ops kernels and should allow
kexec to be usable
>>>>>>> from a HVM or PVH privileged domain.
>>>>>> I could not load panic image because Xen crashes in
following way:
>>>>>>
>>>>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:
C ]----
>>>>> [...]
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82d080114ef2>]
kimage_free+0x67/0xd2
>>>>>> (XEN)    [<ffff82d0801151f9>]
do_kimage_alloc+0x29c/0x2f0
>>>>>> (XEN)    [<ffff82d0801152fe>]
kimage_alloc+0xb1/0xe6
>>>>>> (XEN)    [<ffff82d0801144c0>]
do_kexec_op_internal+0x68e/0x789
>>>>>> (XEN)    [<ffff82d0801145c9>]
do_kexec_op+0xe/0x12
>>>>>> (XEN)    [<ffff82d0802268cb>]
syscall_enter+0xeb/0x145
I get the same thing.

>>>>> The appended patch should fix this crash which only occurs if there's an
>>>>> error in do_kimage_alloc().
>>>> Patch had wrapped lines. I hope that I fixed it properly.
>>>> I cannot load panic kernel. kexec fails with following message:

My version of this patch is attached (0001...). With it applied, Xen has
sometimes crashed right away and sometimes not:

    (XEN) [2013-10-30 21:26:39] ----[ Xen-4.4-unstable x86_64  debug=y  Not
tainted ]----
    (XEN) [2013-10-30 21:26:39] CPU:    7
    (XEN) [2013-10-30 21:26:39] RIP: e008:[<ffff82d08012fd72>]
xmem_pool_free+0x6f/0x2e9
    (XEN) [2013-10-30 21:26:39] RFLAGS: 0000000000010286 CONTEXT: hypervisor
    (XEN) [2013-10-30 21:26:39] rax: ffff8308df5a5e90   rbx: ffff83083f48f9f0  
rcx: 000000000001b410
    (XEN) [2013-10-30 21:26:39] rdx: 00000000a01164a0   rsi: ffff83083a1ae000  
rdi: ffff83083a1af86c
    (XEN) [2013-10-30 21:26:39] rbp: ffff830823fbfd88   rsp: ffff830823fbfd68  
r8:  000000000000000c
    (XEN) [2013-10-30 21:26:39] r9:  0000000010000000   r10: ffff83083f4904f0  
r11: 00000000004c6000
    (XEN) [2013-10-30 21:26:39] r12: ffff83083a1ae000   r13: ffff83083a1af868  
r14: 00007fff5a9b7fc0
    (XEN) [2013-10-30 21:26:39] r15: 0000000000000003   cr0: 0000000080050033  
cr4: 00000000000426f0
    (XEN) [2013-10-30 21:26:39] cr3: 000000066b482000   cr2: ffff8308df5a5e98
    (XEN) [2013-10-30 21:26:39] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss:
e010   cs: e008
    (XEN) [2013-10-30 21:26:39] Xen stack trace from rsp=ffff830823fbfd68:
    (XEN) [2013-10-30 21:26:39]    00000000000000e0 00000000ffffff9d
ffff83083f4904f0 ffff83083f48f9f0
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfdc8 ffff82d0801304fe
ffff830823fbfdc8 00000000ffffff9d
    (XEN) [2013-10-30 21:26:39]    ffff83083f4904f0 ffff8800870bb5e8
00007fff5a9b7fc0 0000000000000003
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfee8 ffff82d08011450c
ffff830823fbfef8 0000000000000000
    (XEN) [2013-10-30 21:26:39]    0000000000000002 ffff830823fb10b8
ffff830823fbfe18 ffff82d08012b104
    (XEN) [2013-10-30 21:26:39]    ffff8300bf2f4060 000000000066a0cb
0000000000000000 ffff8300bf2f4000
    (XEN) [2013-10-30 21:26:39]    ffff82004001f000 00007ff000000003
00000007003e0001 00007f84d993b004
    (XEN) [2013-10-30 21:26:39]    000000001ff53720 ffff830823fb1000
ffff830823fbfe68 ffff82d08016fb23
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfe88 ffff82d080221348
ffff830823fb1000 ffff830823fbff18
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfef8 ffff82d0802214a8
00000000d69204a7 0000000000000000
    (XEN) [2013-10-30 21:26:39]    0000000000000217 0000003564eee0a7
0000000000000100 0000003564eee0a7
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfed8 ffff82d08016fb23
ffff8300bf2f4000 00007fff5a9b7fc0
    (XEN) [2013-10-30 21:26:39]    ffff830823fbfef8 ffff82d0801145d9
00007cf7dc0400c7 ffff82d0802268cb
    (XEN) [2013-10-30 21:26:39]    ffffffff810014aa 0000000000000025
0000001efd525f9a 0000001efd60d300
    (XEN) [2013-10-30 21:26:39]    0000000000000000 00000021d69204a7
ffff880087debe88 ffff880005d9a500
    (XEN) [2013-10-30 21:26:39]    0000000000000286 00007fff5a9b8180
ffff880087191480 000000001ff53720
    (XEN) [2013-10-30 21:26:39]    0000000000000025 ffffffff810014aa
0000003564a148e5 00007f84d8f3f004
    (XEN) [2013-10-30 21:26:39]    0000000000000004 0001010000000000
ffffffff810014aa 000000000000e033
    (XEN) [2013-10-30 21:26:39]    0000000000000286 ffff880087debde0
000000000000e02b d53835942492fce9
    ...

The auto reboot overwrote the rest.  When it did not crash right away, the next
day I got error messages about page table issues.  (I forgot that writing
hypervisor console data to a file is not enabled by default.)  I hope to still
have the data at home.

Best guess at this point is that the error handling still has issues.

>>>> kexec_load failed: Cannot assign requested address
>>> This is -EADDRINVALID which means one of
>>>
>>> a) the entry point isn''t within a segment.
>>> b) one of the segments is not page aligned.
>>> c) one of the segments is not within the crash region.
>>>
>>> But the segments kexec has constructed all looked fine to me (and
>>> similar to the segments I see).

I have tracked this down to the following code in kexec-tools:

    +    if (info->kexec_flags & KEXEC_ON_CRASH) {
    +        set_xen_guest_handle(xen_segs[s].buf.h, HYPERCALL_BUFFER_NULL);
    +        xen_segs[s].buf_size = 0;
    +        xen_segs[s].dest_maddr = info->backup_src_start;
    +        xen_segs[s].dest_size = info->backup_src_size;
    +        nr_segments++;
    +    }

In some cases this passes the first e820 entry, which for me is:

    (XEN) Xen-e820 RAM map:
    (XEN)  0000000000000000 - 000000000009b800 (usable)
    (XEN)  000000000009b800 - 00000000000a0000 (reserved)
    (XEN)  00000000000e0000 - 0000000000100000 (reserved)
    (XEN)  0000000000100000 - 00000000bf63f000 (usable)
    ...

000000000009b800 is not page aligned, so this test rejects the segment:

    if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
        goto out;
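To illustrate why the end of that first e820 entry trips the check, here is a
minimal sketch. It assumes 4 KiB pages, and every name in it (the SKETCH_*
macros and sketch_* functions) is hypothetical rather than taken from the Xen
source, which has its own PAGE_MASK definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. All names here are illustrative. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* True when addr lies exactly on a page boundary. */
bool sketch_page_aligned(uint64_t addr)
{
    return (addr & ~SKETCH_PAGE_MASK) == 0;
}

/*
 * Same shape as the quoted test: the segment is rejected when either
 * its start or its end address has any of the low 12 bits set.
 */
bool sketch_segment_rejected(uint64_t mstart, uint64_t mend)
{
    return !sketch_page_aligned(mstart) || !sketch_page_aligned(mend);
}
```

With the map above, sketch_segment_rejected(0x0, 0x9b800) is true because
0x9b800 & 0xfff is 0x800, whereas the range 0x100000 - 0xbf63f000 has both
ends on a page boundary and is accepted.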

A possible fix is attached (0002...); with it I can get into the crash
kernel.

    -Don Slutz
>>> I''m afraid I cannot reproduce either of your failures. 
Are you sure
>>> you''ve built everything correctly?  In particular has
kexec-tools been
>>> built against the correct version of Xen headers?
>> It looks that I build it correctly but I will double check it.
>> Could you send me your Xen/Linux boot command lines and kexec
>> command lines for normal and panic kernel? Could you tell me
>> what is your RAM size?
> AMD Opteron 4264 with 8 GiB RAM.
>
> Xen 4.4-unstable debug=y:
>
> com1=115200,8n1 console=com1 crashkernel=256M@64M
>
> Linux 3.12-rc4
>
> root=/dev/mapper/cam--st09-root ro console=hvc0
>
> Normal image:
>
> build/sbin/kexec --debug --console-serial --serial-baud=115200
> --command-line="console=ttyS0,115200n8 maxcpus=1" -l
> /boot/vmlinuz-3.11.0.davidvr
>
> Panic image:
>
> build/sbin/kexec --debug --console-serial --serial-baud=115200
> --command-line="console=ttyS0,115200n8 maxcpus=1" -p
> /boot/vmlinuz-3.11.0.davidvr
>
> David
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel





David Vrabel

2013-Oct-31 18:30 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 31/10/2013 16:59, Don Slutz wrote:
> On 10/30/13 12:57, David Vrabel wrote:
>> On 21/10/13 21:20, Daniel Kiper wrote:
>>> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
>>>> On 21/10/13 13:19, Daniel Kiper wrote:
>>>>> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel
wrote:
>>>>>> On 18/10/2013 19:40, Daniel Kiper wrote:
>>>>>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David
Vrabel wrote:
>>>>>>>> The series (for Xen 4.4) improves the kexec
hypercall by making Xen
>>>>>>>> responsible for loading and relocating the
image.  This allows kexec
>>>>>>>> to be usable by pv-ops kernels and should allow
kexec to be usable
>>>>>>>> from a HVM or PVH privileged domain.
>>>>>>> I could not load panic image because Xen crashes in
following way:
>>>>>>>
>>>>>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y 
Tainted:    C ]----
>>>>>> [...]
>>>>>>> (XEN) Xen call trace:
>>>>>>> (XEN)    [<ffff82d080114ef2>]
kimage_free+0x67/0xd2
>>>>>>> (XEN)    [<ffff82d0801151f9>]
do_kimage_alloc+0x29c/0x2f0
>>>>>>> (XEN)    [<ffff82d0801152fe>]
kimage_alloc+0xb1/0xe6
>>>>>>> (XEN)    [<ffff82d0801144c0>]
do_kexec_op_internal+0x68e/0x789
>>>>>>> (XEN)    [<ffff82d0801145c9>]
do_kexec_op+0xe/0x12
>>>>>>> (XEN)    [<ffff82d0802268cb>]
syscall_enter+0xeb/0x145
> I get the same thing.
>>>>>> The appended patch should fix this crash which only
occurs if there''s an
>>>>>> error in do_kimage_alloc().
>>>>> Patch had wrapped lines. I hope that I fixed it properly.
>>>>> I cannot load panic kernel. kexec fails with following
message:
> My version of this patch is attached (0001...). It has both crashed
> right away and not:
> 
>     (XEN) [2013-10-30 21:26:39] ----[ Xen-4.4-unstable  x86_64  debug=y 
>     Not tainted ]----
>     (XEN) [2013-10-30 21:26:39] CPU:    7
>     (XEN) [2013-10-30 21:26:39] RIP:    e008:[<ffff82d08012fd72>]
>     xmem_pool_free+0x6f/0x2e9

Looks like heap corruption.  I'll look into this.

>>>>> kexec_load failed: Cannot assign requested address
>>>> This is -EADDRINVALID which means one of
>>>>
>>>> a) the entry point isn''t within a segment.
>>>> b) one of the segments is not page aligned.
>>>> c) one of the segments is not within the crash region.
>>>>
>>>> But the segments kexec has constructed all looked fine to me
(and
>>>> similar to the segments I see).
> I have tracked this down to in kexec-tools:
> 
>     +    if (info->kexec_flags & KEXEC_ON_CRASH) {
>     +        set_xen_guest_handle(xen_segs[s].buf.h,
HYPERCALL_BUFFER_NULL);
>     +        xen_segs[s].buf_size = 0;
>     +        xen_segs[s].dest_maddr = info->backup_src_start;
>     +        xen_segs[s].dest_size = info->backup_src_size;
>     +        nr_segments++;
>     +    }
> 
> Which in some cases passes the 1st e820 line which for me is:
> 
>     (XEN) Xen-e820 RAM map:
>     (XEN)  0000000000000000 - 000000000009b800 (usable)
>     (XEN)  000000000009b800 - 00000000000a0000 (reserved)
>     (XEN)  00000000000e0000 - 0000000000100000 (reserved)
>     (XEN)  0000000000100000 - 00000000bf63f000 (usable)
>     ...
> 
> 000000000009b800 is not page aligned and so the test:
> 
>          if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
>             goto out;
> 
> Fails.
> 
> A possible fix is attached as (0002...) this does allow me to get into
> the crash kernel.

Thanks for tracking this down. This should be fixed in the tools by
correctly aligning that segment.
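
One way to picture such an alignment is to widen the range outward to page
boundaries: round the start down and the end up. The sketch below assumes
4 KiB pages and uses hypothetical names throughout; it is not the change made
to kexec-tools, and the thread does not say whether rounding outward or
inward is the right choice for this segment.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. All names here are illustrative. */
#define ALIGN_SKETCH_PAGE_SIZE UINT64_C(4096)

struct sketch_range {
    uint64_t start; /* first byte of the range */
    uint64_t size;  /* length in bytes */
};

/* Round addr down to the containing page boundary. */
uint64_t sketch_align_down(uint64_t addr)
{
    return addr & ~(ALIGN_SKETCH_PAGE_SIZE - 1);
}

/* Round addr up to the next page boundary (no change if already aligned). */
uint64_t sketch_align_up(uint64_t addr)
{
    return (addr + ALIGN_SKETCH_PAGE_SIZE - 1) & ~(ALIGN_SKETCH_PAGE_SIZE - 1);
}

/* Widen [start, start + size) so that both ends are page aligned. */
struct sketch_range sketch_page_align_range(uint64_t start, uint64_t size)
{
    struct sketch_range r;
    uint64_t end = sketch_align_up(start + size);

    r.start = sketch_align_down(start);
    r.size = end - r.start;
    return r;
}
```

For a range starting at 0 with size 0x9b800, this yields start 0 and size
0x9c000. Note that rounding the end up extends the range past the original
boundary, so a real fix has to consider what lies in the neighbouring memory.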

David

Don Slutz

2013-Oct-31 20:23 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 10/31/13 14:30, David Vrabel wrote:
> On 31/10/2013 16:59, Don Slutz wrote:
>> On 10/30/13 12:57, David Vrabel wrote:
[...]
>> Looks like heap corruption.  I'll look into this.

Here is the info that I have in case it helps:

    3737] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46610.043742] Hardware name: SM15000-XE
    [46610.043745] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46610.043804] Pid: 2129, comm: sshd Tainted: G B       
    3.8.11-100.fc17.x86_64 #1
    [46610.043810] Call Trace:
    [46610.043816]  [<ffffffff8105e685>] warn_slowpath_common+0x75/0xa0
    [46610.043824]  [<ffffffff8105e6ca>] warn_slowpath_null+0x1a/0x20
    [46610.043830]  [<ffffffff81004d47>] xen_mc_flush+0x1a7/0x1b0
    [46610.043837]  [<ffffffff81008c85>] __xen_pgd_pin+0xe5/0x260
    [46610.043844]  [<ffffffff81008e66>] xen_dup_mmap+0x26/0x40
    [46610.043850]  [<ffffffff8105c0ff>] dup_mm+0x44f/0x640
    [46610.043856]  [<ffffffff8105cd07>] copy_process.part.23+0x9d7/0x13e0
    [46610.043863]  [<ffffffff812970f6>] ? security_file_alloc+0x16/0x20
    [46610.043871]  [<ffffffff8119eb1c>] ? get_empty_filp+0x8c/0x190
    [46610.043877]  [<ffffffff8105d809>] do_fork+0xa9/0x350
    [46610.043884]  [<ffffffff811b933e>] ? __fd_install+0x2e/0x60
    [46610.043890]  [<ffffffff8105db36>] sys_clone+0x16/0x20
    [46610.043896]  [<ffffffff816587b9>] stub_clone+0x69/0x90
    [46610.043902]  [<ffffffff81658459>] ? system_call_fastpath+0x16/0x1b
    [46610.043908] ---[ end trace b3928f7451ca4cbd ]---
    (XEN) [2013-10-31 12:29:06] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:29:06] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46610.044162] ------------[ cut here ]------------
    [46610.044179] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46610.044188] Hardware name: SM15000-XE
    [46610.044193] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46610.044263] Pid: 0, comm: swapper/5 Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46610.044269] Call Trace:
    [46610.044272] ---[ end trace b3928f7451ca4cbe ]---
    (XEN) [2013-10-31 12:29:06] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:29:06] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:29:06] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:29:06] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:29:06] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46610.047984] ------------[ cut here ]------------
    [46610.047995] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46610.048001] Hardware name: SM15000-XE
    [46610.048004] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46610.048063] Pid: 0, comm: swapper/6 Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46610.048069] Call Trace:
    [46610.048072] ---[ end trace b3928f7451ca4cbf ]---
    [46611.322192] BUG: Bad page map in process sshd
    pte:b2f32206a28ec48d pmd:00000066
    [46611.322197] Hardware name: SM15000-XE
    [46611.322199] addr:00007ffe29ef7000 vm_flags:00100071
    anon_vma:ffff8800050e1840 mapping:ffff88000555fa00 index:c
    [46611.322204] vma->vm_ops->fault: filemap_fault+0x0/0x4a0
    [46611.322208] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
    [46611.322210] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46611.322240] Pid: 2129, comm: sshd Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46611.322242] Call Trace:
    [46611.322246]  [<ffffffff811599c7>] print_bad_pte+0x1e7/0x260
    [46611.322249]  [<ffffffff8115ae69>] vm_normal_page+0x79/0x80
    [46611.322252]  [<ffffffff8115c3d1>] copy_pte_range+0x1f1/0x4f0
    [46611.322256]  [<ffffffff8100b66e>] ? m2p_find_override+0xae/0xc0
    [46611.322259]  [<ffffffff8115e41c>] copy_page_range+0x2fc/0x4a0
    [46611.322263]  [<ffffffff8105bfe9>] dup_mm+0x339/0x640
    [46611.322266]  [<ffffffff8105cd07>] copy_process.part.23+0x9d7/0x13e0
    [46611.322270]  [<ffffffff812970f6>] ? security_file_alloc+0x16/0x20
    [46611.322273]  [<ffffffff8119eb1c>] ? get_empty_filp+0x8c/0x190
    [46611.322277]  [<ffffffff8105d809>] do_fork+0xa9/0x350
    [46611.322280]  [<ffffffff811b933e>] ? __fd_install+0x2e/0x60
    [46611.322283]  [<ffffffff8105db36>] sys_clone+0x16/0x20
    [46611.322286]  [<ffffffff816587b9>] stub_clone+0x69/0x90
    [46611.322289]  [<ffffffff81658459>] ? system_call_fastpath+0x16/0x1b
    [46611.322294] BUG: Bad page map in process sshd
    pte:ce27c72aa6fea78d pmd:00000066
    [46611.322297] Hardware name: SM15000-XE
    [46611.322299] addr:00007ffe29ef8000 vm_flags:00100073
    anon_vma:ffff8800050e1840 mapping:ffff88000555fa00 index:d
    [46611.322303] vma->vm_ops->fault: filemap_fault+0x0/0x4a0
    [46611.322306] vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x60
    [46611.322308] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46611.322336] Pid: 2129, comm: sshd Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46611.322339] Call Trace:
    [46611.322341]  [<ffffffff811599c7>] print_bad_pte+0x1e7/0x260
    [46611.322344]  [<ffffffff8115ae59>] vm_normal_page+0x69/0x80
    [46611.322347]  [<ffffffff8115c3d1>] copy_pte_range+0x1f1/0x4f0
    [46611.322350]  [<ffffffff8100b66e>] ? m2p_find_override+0xae/0xc0
    [46611.322353]  [<ffffffff8115e41c>] copy_page_range+0x2fc/0x4a0
    [46611.322357]  [<ffffffff8105bfe9>] dup_mm+0x339/0x640
    [46611.322360]  [<ffffffff8105cd07>] copy_process.part.23+0x9d7/0x13e0
    [46611.322364]  [<ffffffff812970f6>] ? security_file_alloc+0x16/0x20
    [46611.322367]  [<ffffffff8119eb1c>] ? get_empty_filp+0x8c/0x190
    [46611.322370]  [<ffffffff8105d809>] do_fork+0xa9/0x350
    [46611.322374]  [<ffffffff811b933e>] ? __fd_install+0x2e/0x60
    [46611.322377]  [<ffffffff8105db36>] sys_clone+0x16/0x20
    [46611.322380]  [<ffffffff816587b9>] stub_clone+0x69/0x90
    [46611.322383]  [<ffffffff81658459>] ? system_call_fastpath+0x16/0x1b
    (XEN) [2013-10-31 12:29:07] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:29:07] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:29:07] mm.c:2099:d0 Error while validating mfn
    669930 (pfn 8828b) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:29:07] mm.c:2995:d0 Error while pinning mfn 669930
    (XEN) [2013-10-31 12:29:07] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:29:07] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:29:07] mm.c:2099:d0 Error while validating mfn
    669930 (pfn 8828b) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:29:07] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-1
    [root@dcs-xen-51 ~]# reboot
    (XEN) [2013-10-31 12:30:12] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:30:12] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    669930 (pfn 8828b) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    83e7af (pfn 52af) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    83f1df (pfn 5cdf) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66ac55 (pfn 89766) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:2763:d0 Error while installing new
    baseptr 66ac55
    [46676.389678] ------------[ cut here ]------------
    [46676.389692] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46676.389697] Hardware name: SM15000-XE
    [46676.389703] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46676.389767] Pid: 689, comm: rsyslogd Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46676.389775] Call Trace:
    [46676.389779] ---[ end trace b3928f7451ca4cc2 ]---
    (XEN) [2013-10-31 12:30:12] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:30:12] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    669930 (pfn 8828b) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    83e7af (pfn 52af) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    83f1df (pfn 5cdf) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66ac55 (pfn 89766) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:2763:d0 Error while installing new
    baseptr 66ac55
    [46676.390686] ------------[ cut here ]------------
    [46676.390698] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46676.390704] Hardware name: SM15000-XE
    [46676.390709] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46676.390796] Pid: 722, comm: mcelog Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46676.390801] Call Trace:
    [46676.390805] ---[ end trace b3928f7451ca4cc3 ]---
    (XEN) [2013-10-31 12:30:12] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:30:12] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46676.390897] ------------[ cut here ]------------
    [46676.390909] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46676.390916] Hardware name: SM15000-XE
    [46676.390919] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46676.390988] Pid: 714, comm: acpid Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46676.390996] Call Trace:
    [46676.391001] ---[ end trace b3928f7451ca4cc4 ]---
    (XEN) [2013-10-31 12:30:12] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:30:12] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:30:12] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:30:12] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:30:12] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46676.391842] ------------[ cut here ]------------
    [46676.391850] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46676.391855] Hardware name: SM15000-XE
    [46676.391858] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46676.391911] Pid: 713, comm: systemd-logind Tainted: G    B   W   
    3.8.11-100.fc17.x86_64 #1
    [46676.391916] Call Trace:
    [46676.391919] ---[ end trace b3928f7451ca4cc5 ]---
    xencommons[3118]: Stopping xenconsoled
    xencommons[3118]: Stopping QEMU
    xencommons[3118]: WARNING: Not stopping xenstored, as it cannot be
    restarted.
    (XEN) [2013-10-31 12:31:20] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:31:20] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46744.392509] ------------[ cut here ]------------
    [46744.392524] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46744.392530] Hardware name: SM15000-XE
    [46744.392534] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46744.392598] Pid: 2556, comm: sshd Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46744.392607] Call Trace:
    [46744.392614] ---[ end trace b3928f7451ca4cc6 ]---
    (XEN) [2013-10-31 12:31:20] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:31:20] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46744.393790] ------------[ cut here ]------------
    [46744.393799] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46744.393804] Hardware name: SM15000-XE
    [46744.393807] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46744.393862] Pid: 61, comm: kworker/6:1 Tainted: G B   W   
    3.8.11-100.fc17.x86_64 #1
    [46744.393868] Call Trace:
    [46744.393871] ---[ end trace b3928f7451ca4cc7 ]---
    (XEN) [2013-10-31 12:31:20] mm.c:765:d0 Bad L1 flags 400000
    (XEN) [2013-10-31 12:31:20] mm.c:1221:d0 Failure in alloc_l1_table:
    entry 248
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    669913 (pfn 882a8) for type 1000000000000000: caf=8000000000000003
    taf=1000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:906:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1297:d0 Failure in alloc_l2_table:
    entry 335
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b81c (pfn 8a39f) for type 2000000000000000: caf=8000000000000003
    taf=2000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:948:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1379:d0 Failure in alloc_l3_table:
    entry 504
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66b0ea (pfn 89ad1) for type 3000000000000000: caf=8000000000000003
    taf=3000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:972:d0 Attempt to create linear
    p.t. with write perms
    (XEN) [2013-10-31 12:31:20] mm.c:1438:d0 Failure in alloc_l4_table:
    entry 255
    (XEN) [2013-10-31 12:31:20] mm.c:2099:d0 Error while validating mfn
    66bb52 (pfn 8a469) for type 4000000000000000: caf=8000000000000003
    taf=4000000000000001
    (XEN) [2013-10-31 12:31:20] mm.c:2763:d0 Error while installing new
    baseptr 66bb52
    [46744.394165] ------------[ cut here ]------------
    [46744.394175] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46744.394182] Hardware name: SM15000-XE
    [46744.394185] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46744.394248] Pid: 2531, comm: kworker/0:2 Tainted: G B   W   
    3.8.11-100.fc17.x86_64 #1
    [46744.394253] Call Trace:
    [46744.394257] ---[ end trace b3928f7451ca4cc8 ]---
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    4400000000000001 != exp 7000000000000000) for mfn 66c1ff (pfn 8a9bc)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66c1ff
    (pfn 8a9bc) from L1 entry 800000066c1ff063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    3400000000000002 != exp 7000000000000000) for mfn 66d0cb (pfn 8baf0)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66d0cb
    (pfn 8baf0) from L1 entry 800000066d0cb063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    2400000000000001 != exp 7000000000000000) for mfn 66ca08 (pfn 8b5b3)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66ca08
    (pfn 8b5b3) from L1 entry 800000066ca08063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66bd69 (pfn 8a652)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66bd69
    (pfn 8a652) from L1 entry 800000066bd69063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66bd83 (pfn 8a638)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66bd83
    (pfn 8a638) from L1 entry 800000066bd83063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cccc (pfn 8b6ef)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cccc
    (pfn 8b6ef) from L1 entry 800000066cccc063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66d178 (pfn 8ba43)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66d178
    (pfn 8ba43) from L1 entry 800000066d178063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66ac0b (pfn 897b0)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66ac0b
    (pfn 897b0) from L1 entry 800000066ac0b063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66d0ce (pfn 8baed)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66d0ce
    (pfn 8baed) from L1 entry 800000066d0ce063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 83f402 (pfn 5f02)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 83f402
    (pfn 5f02) from L1 entry 800000083f402063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 6691b9 (pfn 87a02)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 6691b9
    (pfn 87a02) from L1 entry 80000006691b9063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cd08 (pfn 8b6b3)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cd08
    (pfn 8b6b3) from L1 entry 800000066cd08063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 83ef6c (pfn 5a6c)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 83ef6c
    (pfn 5a6c) from L1 entry 800000083ef6c063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66be24 (pfn 8a997)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66be24
    (pfn 8a997) from L1 entry 800000066be24063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cdcb (pfn 8b5f0)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cdcb
    (pfn 8b5f0) from L1 entry 800000066cdcb063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66caaa (pfn 8b511)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66caaa
    (pfn 8b511) from L1 entry 800000066caaa063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cdc3 (pfn 8b5f8)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cdc3
    (pfn 8b5f8) from L1 entry 800000066cdc3063 for l1e_owner=0, pg_owner=0
    [46744.438090] ------------[ cut here ]------------
    [46744.438100] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46744.438106] Hardware name: SM15000-XE
    [46744.438109] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack i2c_i801
    i2c_core coretemp iTCO_wdt iTCO_vendor_support microcode lpc_ich
    mfd_core crc32c_intel ghash_clmulni_intel e1000 sunrpc
    [46744.438170] Pid: 2596, comm: sshd Tainted: G    B W   
    3.8.11-100.fc17.x86_64 #1
    [46744.438176] Call Trace:
    [46744.438184]  [<ffffffff8105e685>] warn_slowpath_common+0x75/0xa0
    [46744.438194]  [<ffffffff8105e6ca>] warn_slowpath_null+0x1a/0x20
    [46744.438200]  [<ffffffff81004d47>] xen_mc_flush+0x1a7/0x1b0
    [46744.438206]  [<ffffffff81004e4f>] __xen_mc_entry+0xff/0x130
    [46744.438215]  [<ffffffff81006f22>] xen_extend_mmuext_op+0x82/0x110
    [46744.438222]  [<ffffffff8100700b>] xen_do_pin+0x5b/0x60
    [46744.438231]  [<ffffffff810080d8>] ? pte_mfn_to_pfn+0x78/0x110
    [46744.438239]  [<ffffffff81007156>] xen_unpin_page+0x146/0x170
    [46744.438248]  [<ffffffff8100818e>] ? xen_pmd_val+0xe/0x10
    [46744.438257]  [<ffffffff810050f9>] ?
    __raw_callee_save_xen_pmd_val+0x11/0x1e
    [46744.438266]  [<ffffffff8100591d>] __xen_pgd_walk+0x24d/0x260
    [46744.438275]  [<ffffffff81007010>] ? xen_do_pin+0x60/0x60
    [46744.438284]  [<ffffffff8100724d>] __xen_pgd_unpin+0xcd/0x190
    [46744.438290]  [<ffffffff81007407>] xen_exit_mmap+0xf7/0x150
    [46744.438300]  [<ffffffff811656e8>] exit_mmap+0x48/0x170
    [46744.438311]  [<ffffffff810bc5aa>] ? exit_robust_list+0x8a/0x160
    [46744.438318]  [<ffffffff810de7a5>] ? __audit_free+0x1c5/0x230
    [46744.438325]  [<ffffffff8105b9d3>] mmput+0x83/0xf0
    [46744.438331]  [<ffffffff8106410b>] do_exit+0x24b/0x9d0
    [46744.438338]  [<ffffffff8119e9ae>] ? ____fput+0xe/0x10
    [46744.438345]  [<ffffffff8107ec8c>] ? task_work_run+0xac/0xe0
    [46744.438351]  [<ffffffff8106491f>] do_group_exit+0x3f/0xa0
    [46744.438358]  [<ffffffff81064997>] sys_exit_group+0x17/0x20
    [46744.438368]  [<ffffffff81658459>] system_call_fastpath+0x16/0x1b
    [46744.438374] ---[ end trace b3928f7451ca4cc9 ]---
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cdff (pfn 8b5bc)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cdff
    (pfn 8b5bc) from L1 entry 800000066cdff063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66ac37 (pfn 89784)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66ac37
    (pfn 89784) from L1 entry 800000066ac37063 for l1e_owner=0, pg_owner=0
    (XEN) [2013-10-31 12:31:20] mm.c:2352:d0 Bad type (saw
    1400000000000001 != exp 7000000000000000) for mfn 66cdd5 (pfn 8b5e6)
    (XEN) [2013-10-31 12:31:20] mm.c:846:d0 Could not get page type
    PGT_writable_page
    (XEN) [2013-10-31 12:31:20] mm.c:898:d0 Error getting mfn 66cdd5
    (pfn 8b5e6) from L1 entry 800000066cdd5063 for l1e_owner=0, pg_owner=0
    [46744.438603] ------------[ cut here ]------------
    [46744.438615] WARNING: at arch/x86/xen/multicalls.c:129
    xen_mc_flush+0x1a7/0x1b0()
    [46744.438621] Hardware name: SM15000-XE
    [46744.438623] Modules linked in: xen_acpi_processor xen_pciback
    xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs
    xen_privcmd nfsv3 nfs_acl tun nfsv4 auth_rpcgss nfs dns_resolver
    fscache lockd 8021q garp bridge stp llc ip6t_REJECT
    nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
    nf_conntrack_ipv4 nf_defrag(XEN) [2013-10-31 12:31:47] ----[
    Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
    (XEN) [2013-10-31 12:31:47] CPU:    1
    (XEN) [2013-10-31 12:31:47] RIP: e008:[<ffff82d080173f3d>]
    __put_page_type+0xdd/0x24e
    (XEN) [2013-10-31 12:31:47] RFLAGS: 0000000000010246 CONTEXT: hypervisor
    (XEN) [2013-10-31 12:31:47] rax: 0000000000000000   rbx:
    1c00000000000001   rcx: 000000000083f45c
    (XEN) [2013-10-31 12:31:47] rdx: 0000000000000000   rsi:
    0000000000000074   rdi: 0000000000000000
    (XEN) [2013-10-31 12:31:47] rbp: ffff83083a177a98   rsp:
    ffff83083a177a58   r8:  ffff8141002003b8
    (XEN) [2013-10-31 12:31:47] r9:  0000000000000077   r10:
    0000000000000200   r11: 0000000000000212
    (XEN) [2013-10-31 12:31:47] r12: ffff82e0107e8b80   r13:
    0000000000000000   r14: 007fffffffffffff
    (XEN) [2013-10-31 12:31:47] r15: 0000000000000000   cr0:
    0000000080050033   cr4: 00000000000426f0
    (XEN) [2013-10-31 12:31:47] cr3: 000000081dc0c000   cr2:
    00000000000009da
    (XEN) [2013-10-31 12:31:47] ds: 0000   es: 0000   fs: 0000   gs:
    0000   ss: e010   cs: e008
    (XEN) [2013-10-31 12:31:47] Xen stack trace from rsp=ffff83083a177a58:
    (XEN) [2013-10-31 12:31:47]    0000000000000097 1c00000000000001
    ffff82d080128626 ffff82e0107e8b80
    (XEN) [2013-10-31 12:31:47]    000000000000014f ffff830823fb1000
    0000000000000001 ffff820040075000
    (XEN) [2013-10-31 12:31:47]    ffff83083a177aa8 ffff82d0801743ad
    ffff83083a177ac8 ffff82d0801759cd
    (XEN) [2013-10-31 12:31:47]    0000000000000001 ffff82e0107ccd40
    ffff83083a177b28 ffff82d080173a91
    (XEN) [2013-10-31 12:31:47]    ffff83083a177ef8 ffff83083a170000
    ffff830800000000 000000000083e66a
    (XEN) [2013-10-31 12:31:47]    0000000000000206 2400000000000001
    ffff82e0107ccd40 ffff83083a170000
    (XEN) [2013-10-31 12:31:47]    007fffffffffffff 0000000000000001
    ffff83083a177b78 ffff82d080173f15
    (XEN) [2013-10-31 12:31:47]    ffff82d080287f4c 2400000000000001
    0000000000000002 000000000083e66a
    (XEN) [2013-10-31 12:31:47]    ffff82e0107ccd40 0000000000000001
    0000000000000000 ffff820040071fc0
    (XEN) [2013-10-31 12:31:47]    ffff83083a177b88 ffff82d0801740bc
    ffff83083a177bb8 ffff82d080173773
    (XEN) [2013-10-31 12:31:47]    ffff82e00cd7b8c0 00000000000001f8
    0000000000000001 0000000000000000
    (XEN) [2013-10-31 12:31:47]    ffff83083a177c18 ffff82d080173bf8
    0000000000000001 ffff830823fb1000
    (XEN) [2013-10-31 12:31:47]    ffff820040071000 000000000066bdc6
    ffff83083a177c48 3400000000000001
    (XEN) [2013-10-31 12:31:47]    ffff82e00cd7b8c0 ffff83083a170000
    007fffffffffffff 0000000000000001
    (XEN) [2013-10-31 12:31:47]    ffff83083a177c68 ffff82d080173f15
    0000000000000087 3400000000000001
    (XEN) [2013-10-31 12:31:47]    ffff82d080128626 ffff82e0107cf4c0
    ffff82e00cd7b8c0 0000000000000001
    (XEN) [2013-10-31 12:31:47]    0000000000000000 ffff830823fb1000
    ffff83083a177c78 ffff82d0801740bc
    (XEN) [2013-10-31 12:31:47]    ffff83083a177c98 ffff82d08017437c
    ffff82e0107cf4c0 00000000000000ff
    (XEN) [2013-10-31 12:31:47]    ffff83083a177cf8 ffff82d080173d60
    ffff83083a161ee0 0000000000000286
    (XEN) [2013-10-31 12:31:47]    ffff82004006f000 000000000083e7a6
    0000000000000000 4400000000000001
    (XEN) [2013-10-31 12:31:47] Xen call trace:
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173f3d>]
    __put_page_type+0xdd/0x24e
    (XEN) [2013-10-31 12:31:47]    [<ffff82d0801743ad>]
    put_page_type+0xe/0x16
    (XEN) [2013-10-31 12:31:47]    [<ffff82d0801759cd>]
    put_page_from_l2e+0x187/0x1a2
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173a91>]
    free_page_type+0x2ee/0x6bd
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173f15>]
    __put_page_type+0xb5/0x24e
    (XEN) [2013-10-31 12:31:47]    [<ffff82d0801740bc>]
    put_page_type_preemptible+0xe/0x10
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173773>]
    put_page_from_l3e+0x144/0x174
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173bf8>]
    free_page_type+0x455/0x6bd
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173f15>]
    __put_page_type+0xb5/0x24e
    (XEN) [2013-10-31 12:31:47]    [<ffff82d0801740bc>]
    put_page_type_preemptible+0xe/0x10
    (XEN) [2013-10-31 12:31:47]    [<ffff82d08017437c>]
    put_page_from_l4e+0x98/0xbb
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173d60>]
    free_page_type+0x5bd/0x6bd
    (XEN) [2013-10-31 12:31:47]    [<ffff82d080173f15>]
    __put_page_type+0xb5/0x24e
    (XEN) [2013-10-31 12:31:47]    [<ffff82d0801740bc>]
    put_page_type_preemptible+0x

   -Don Slutz
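
When reading the "Bad type (saw ... != exp ...)" and "for type ..." lines in the log above, it can help to decode the type_info words by hand. The following is only a sketch: the log itself shows that the top hex digit is 1 to 4 for L1 to L4 page tables and 7 for PGT_writable_page, but the positions of the pinned and validated bits and the width of the use count are assumptions about the x86-64 layout in Xen of this era, and the function names are made up for illustration.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: type in bits 63..60, pinned in bit 59,
 * validated in bit 58, use count in bits 54..0. */
#define PGT_TYPE_SHIFT  60
#define PGT_PINNED      (UINT64_C(1) << 59)
#define PGT_VALIDATED   (UINT64_C(1) << 58)
#define PGT_COUNT_MASK  ((UINT64_C(1) << 55) - 1)

unsigned int pgt_type(uint64_t t)  { return (unsigned int)(t >> PGT_TYPE_SHIFT); }
bool pgt_pinned(uint64_t t)        { return (t & PGT_PINNED) != 0; }
bool pgt_validated(uint64_t t)     { return (t & PGT_VALIDATED) != 0; }
uint64_t pgt_count(uint64_t t)     { return t & PGT_COUNT_MASK; }

/* Names for the type values that actually appear in the log. */
const char *pgt_type_name(unsigned int type)
{
    switch (type) {
    case 0:  return "none";
    case 1:  return "l1_page_table";
    case 2:  return "l2_page_table";
    case 3:  return "l3_page_table";
    case 4:  return "l4_page_table";
    case 7:  return "writable_page";
    default: return "other";
    }
}

/* Print one decoded type_info word on a single line. */
void pgt_print(uint64_t t)
{
    printf("%016" PRIx64 ": %s%s%s count=%" PRIu64 "\n",
           t, pgt_type_name(pgt_type(t)),
           pgt_pinned(t) ? " pinned" : "",
           pgt_validated(t) ? " validated" : "",
           pgt_count(t));
}
```

Under these assumptions, "saw 1400000000000001 != exp 7000000000000000" reads as: the frame is currently a validated L1 page table with one type reference, while the caller wanted it as a writable page.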




Daniel Kiper

2013-Oct-31 22:21 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Thu, Oct 31, 2013 at 06:30:32PM +0000, David Vrabel wrote:
> On 31/10/2013 16:59, Don Slutz wrote:
> > On 10/30/13 12:57, David Vrabel wrote:
> >> On 21/10/13 21:20, Daniel Kiper wrote:
> >>> On Mon, Oct 21, 2013 at 01:56:09PM +0100, David Vrabel wrote:
> >>>> On 21/10/13 13:19, Daniel Kiper wrote:
> >>>>> On Sat, Oct 19, 2013 at 12:14:24AM +0100, David Vrabel wrote:
> >>>>>> On 18/10/2013 19:40, Daniel Kiper wrote:
> >>>>>>> On Tue, Oct 08, 2013 at 05:55:01PM +0100, David Vrabel wrote:
> >>>>>>>> The series (for Xen 4.4) improves the kexec hypercall by making Xen
> >>>>>>>> responsible for loading and relocating the image.  This allows kexec
> >>>>>>>> to be usable by pv-ops kernels and should allow kexec to be usable
> >>>>>>>> from a HVM or PVH privileged domain.
> >>>>>>> I could not load panic image because Xen crashes in following way:
> >>>>>>>
> >>>>>>> (XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Tainted:    C ]----
> >>>>>> [...]
> >>>>>>> (XEN) Xen call trace:
> >>>>>>> (XEN)    [<ffff82d080114ef2>] kimage_free+0x67/0xd2
> >>>>>>> (XEN)    [<ffff82d0801151f9>] do_kimage_alloc+0x29c/0x2f0
> >>>>>>> (XEN)    [<ffff82d0801152fe>] kimage_alloc+0xb1/0xe6
> >>>>>>> (XEN)    [<ffff82d0801144c0>] do_kexec_op_internal+0x68e/0x789
> >>>>>>> (XEN)    [<ffff82d0801145c9>] do_kexec_op+0xe/0x12
> >>>>>>> (XEN)    [<ffff82d0802268cb>] syscall_enter+0xeb/0x145
> > I get the same thing.
> >>>>>> The appended patch should fix this crash which only occurs if there's an
> >>>>>> error in do_kimage_alloc().
> >>>>> Patch had wrapped lines. I hope that I fixed it properly.
> >>>>> I cannot load panic kernel. kexec fails with following message:
> > My version of this patch is attached (0001...). It has both crashed
> > right away and not:
> >
> >     (XEN) [2013-10-30 21:26:39] ----[ Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
> >     (XEN) [2013-10-30 21:26:39] CPU:    7
> >     (XEN) [2013-10-30 21:26:39] RIP:    e008:[<ffff82d08012fd72>] xmem_pool_free+0x6f/0x2e9
>
> Looks like heap corruption.  I'll look into this.
>
> >>>>> kexec_load failed: Cannot assign requested address
> >>>> This is -EADDRNOTAVAIL, which means one of
> >>>>
> >>>> a) the entry point isn't within a segment.
> >>>> b) one of the segments is not page aligned.
> >>>> c) one of the segments is not within the crash region.
> >>>>
> >>>> But the segments kexec has constructed all looked fine to me (and
> >>>> similar to the segments I see).
> > I have tracked this down to in kexec-tools:
> >
> >     +    if (info->kexec_flags & KEXEC_ON_CRASH) {
> >     +        set_xen_guest_handle(xen_segs[s].buf.h, HYPERCALL_BUFFER_NULL);
> >     +        xen_segs[s].buf_size = 0;
> >     +        xen_segs[s].dest_maddr = info->backup_src_start;
> >     +        xen_segs[s].dest_size = info->backup_src_size;
> >     +        nr_segments++;
> >     +    }
> >
> > Which in some cases passes the 1st e820 line which for me is:
> >
> >     (XEN) Xen-e820 RAM map:
> >     (XEN)  0000000000000000 - 000000000009b800 (usable)
> >     (XEN)  000000000009b800 - 00000000000a0000 (reserved)
> >     (XEN)  00000000000e0000 - 0000000000100000 (reserved)
> >     (XEN)  0000000000100000 - 00000000bf63f000 (usable)
> >     ...
> >
> > 000000000009b800 is not page aligned and so the test:
> >
> >          if ( (mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK) )
> >             goto out;
> >
> > Fails.
> >
> > A possible fix is attached as (0002...) this does allow me to get into
> > the crash kernel.
>
> Thanks for tracking this down. This should be fixed in the tools by
> correctly aligning that segment.
I have rebuilt all binaries and the results are the same. Configuration
changes do not help either. Don's patches solve some issues, but the dump
system still does not load (I use Linux kernel 3.10.17). In all cases I
see only the following messages:

I'm in purgatory
early console in decompress_kernel

I am going to investigate this further at the beginning of next week
(probably on Monday). If you find something new drop me a line.

Daniel

Daniel Kiper

2013-Nov-05 17:41 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Thu, Oct 31, 2013 at 11:21:48PM +0100, Daniel Kiper wrote:
> On Thu, Oct 31, 2013 at 06:30:32PM +0000, David Vrabel wrote:
> > On 31/10/2013 16:59, Don Slutz wrote:
[...]
> > > A possible fix is attached as (0002...) this does allow me to get into
> > > the crash kernel.
> >
> > Thanks for tracking this down. This should be fixed in the tools by
> > correctly aligning that segment.
With Don's patch (a workaround) the crash kernel works from time to time.
I hope that a proper fix will solve this issue.
> I have rebuilt all binaries and the results are the same. Configuration
> changes do not help either. Don's patches solve some issues but the dump
> system still does not load (I use Linux Kernel 3.10.17). In all cases I
> can see only the following messages:
>
> I'm in purgatory
> early console in decompress_kernel
It looks like the early VGA code touches the VGA memory buffer (0xb8000).
When I mapped the 0 - 1 MiB memory region (using machine_kexec_add_page()
in xen/arch/x86/machine_kexec.c:machine_kexec_load()) everything
started working. So, sadly, I think we should map this memory region
unconditionally too.
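
As a rough illustration of what mapping the low 1 MiB page by page involves, here is a minimal sketch. It assumes 4 KiB pages and an identity mapping; add_page_fn is a hypothetical callback standing in for a per-page helper such as machine_kexec_add_page(), whose real signature is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   UINT64_C(4096)
#define LOW_MEM_END (UINT64_C(1) << 20) /* 1 MiB */

/* Hypothetical per-page mapping callback (an assumption, not Xen's API). */
typedef int (*add_page_fn)(void *ctx, uint64_t vaddr, uint64_t maddr);

/* Identity-map every page in [0, 1 MiB). Returns 0 or the first error. */
static int map_low_1mib(add_page_fn add_page, void *ctx)
{
    uint64_t addr;

    assert(add_page != NULL);
    for (addr = 0; addr < LOW_MEM_END; addr += PAGE_SIZE) {
        int rc = add_page(ctx, addr, addr);

        if (rc)
            return rc;
    }
    return 0;
}

/* Example callback that only counts the pages it is asked to map. */
static int count_page(void *ctx, uint64_t vaddr, uint64_t maddr)
{
    (void)vaddr;
    (void)maddr;
    ++*(unsigned *)ctx;
    return 0;
}
```

Calling map_low_1mib(count_page, &n) with n initialised to 0 visits 256 pages, which includes the page holding the VGA text buffer at 0xb8000.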

Daniel

David Vrabel

2013-Nov-05 18:01 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 05/11/13 17:41, Daniel Kiper wrote:
> On Thu, Oct 31, 2013 at 11:21:48PM +0100, Daniel Kiper wrote:
>> On Thu, Oct 31, 2013 at 06:30:32PM +0000, David Vrabel wrote:
>>> On 31/10/2013 16:59, Don Slutz wrote:
>
> [...]
>
>>>> A possible fix is attached as (0002...) this does allow me to get into
>>>> the crash kernel.
>>>
>>> Thanks for tracking this down. This should be fixed in the tools by
>>> correctly aligning that segment.
>
> With Don's patch (a workaround) the crash kernel works from time to time.
> I hope that a proper fix will solve this issue.
>
>> I have rebuilt all binaries and the results are the same. Configuration
>> changes do not help either. Don's patches solve some issues but the dump
>> system still does not load (I use Linux Kernel 3.10.17). In all cases I
>> can see only the following messages:
>>
>> I'm in purgatory
>> early console in decompress_kernel
>
> It looks like the early VGA code touches the VGA memory buffer (0xb8000).
> When I mapped the 0 - 1 MiB memory region (using machine_kexec_add_page()
> in xen/arch/x86/machine_kexec.c:machine_kexec_load()) everything
> started working. So, sadly, I think we should map this memory region
> unconditionally too.
Thanks for tracking this down.

Mapping all of 0 - 1 MiB seems reasonable to me, but this should be done
by kexec-tools specifying this and not by Xen.

I'll post updated Xen and kexec-tools patches tomorrow, hopefully.

David

Don Slutz

2013-Nov-05 22:39 UTC

Re: [PATCH 3/9] kexec: add infrastructure for handling kexec images

On 10/08/13 12:55, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> Add the code needed to handle and load kexec images into Xen memory or
> into the crash region.  This is needed for the new KEXEC_CMD_load and
> KEXEC_CMD_unload hypercall sub-ops.
>
[...]
> +static int kimage_crash_alloc(struct kexec_image **rimage, paddr_t entry,
> +                              unsigned long nr_segments,
> +                              xen_kexec_segment_t *segments)
> +{
> +    unsigned long i;
> +    int result;
> +
> +    /* Verify we have a valid entry point */
> +    if ( (entry < kexec_crash_area.start)
> +         || (entry > kexec_crash_area.start + kexec_crash_area.size))
> +        return -EADDRNOTAVAIL;
> +
> +    /*
> +     * Verify we have good destination addresses.  Normally
> +     * the caller is responsible for making certain we don't
> +     * attempt to load the new image into invalid or reserved
> +     * areas of RAM.  But crash kernels are preloaded into a
> +     * reserved area of ram.  We must ensure the addresses
> +     * are in the reserved area otherwise preloading the
> +     * kernel could corrupt things.
> +     */
> +    for ( i = 0; i < nr_segments; i++ )
> +    {
> +        paddr_t mstart, mend;
> +
> +        if ( guest_handle_is_null(segments[i].buf.h) )
> +            continue;
> +
> +        mstart = segments[i].dest_maddr;
> +        mend = mstart + segments[i].dest_size - 1;

It is safer and matches the rest of the use to drop the -1 here, and use
"mend >=" below.  (The more I think on it, "mend >" below is still correct
after removing the "- 1".)

> +        /* Ensure we are within the crash kernel limits. */
> +        if ( (mstart < kexec_crash_area.start )
> +             || (mend > kexec_crash_area.start + kexec_crash_area.size))
> +            return -EADDRNOTAVAIL;
> +    }
> +
> +    /* Allocate and initialize a controlling structure. */
> +    result = do_kimage_alloc(rimage, entry, nr_segments, segments,
> +                             KEXEC_TYPE_CRASH);
> +    if ( result )
> +        return result;
> +
> +    return 0;

It can be proved that result == 0 here.  So "return result" and
"return 0" are the same.  This means that the if above is not needed.
When it is removed the only use of result is to be returned, so changing to:

    return do_kimage_alloc(rimage, entry, nr_segments, segments,
                           KEXEC_TYPE_CRASH);

makes sense to me.
> +}
[...]
> +static int kimage_add_entry(struct kexec_image *image, kimage_entry_t entry)
> +{
> +    kimage_entry_t *entries;
> +
> +    if ( image->next_entry == KIMAGE_LAST_ENTRY )
> +    {
> +        struct page_info *page;
> +
> +        page = kimage_alloc_page(image, KIMAGE_NO_DEST);
> +        if ( !page )
> +            return -ENOMEM;
> +
> +        entries = __map_domain_page(image->entry_page);

Not sure if entries needs to be checked for NULL.  Best guess is that it cannot
be NULL.

> +        entries[image->next_entry] = page_to_maddr(page) | IND_INDIRECTION;
> +        unmap_domain_page(entries);
> +
   -Don Slutz
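
The bounds check discussed for kimage_crash_alloc() can be sketched with an exclusive end address, which is the form the review comment suggests. This is only an illustration under stated assumptions: struct crash_area and seg_within_crash_area() are hypothetical names, and the wrap-around test is an addition not present in the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Hypothetical description of the reserved crash region. */
struct crash_area {
    paddr_t start;
    paddr_t size;
};

/* Returns 0 if [mstart, mstart + size) lies inside the crash area and
 * -EADDRNOTAVAIL otherwise. mend is an exclusive end, so comparing it
 * against the (also exclusive) end of the area uses '>'. */
static int seg_within_crash_area(const struct crash_area *area,
                                 paddr_t mstart, paddr_t size)
{
    paddr_t area_end, mend;

    assert(area != NULL);
    area_end = area->start + area->size;
    mend = mstart + size;

    if (mend < mstart)                 /* address arithmetic wrapped */
        return -EADDRNOTAVAIL;
    if (mstart < area->start || mend > area_end)
        return -EADDRNOTAVAIL;
    return 0;
}
```

With an inclusive end (the "- 1" form) combined with '>', a segment that overshoots the region by a single byte would appear to pass; the exclusive form rejects it.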


Don Slutz

2013-Nov-05 22:43 UTC

Re: [PATCH 4/9] kexec: extend hypercall with improved load/unload ops

On 10/08/13 12:55, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
[...]
> +
> +static int kexec_segments_from_ind_page(unsigned long mfn,
> +                                        unsigned *nr_segments,
> +                                        xen_kexec_segment_t *segments,
> +                                        bool_t compat)
> +{
> +    void *page;
> +    kimage_entry_t *entry;
> +    int ret = 0;
> +
> +    page = map_domain_page(mfn);
> +
> +    /*
> +     * Walk the indirection page list, adding destination pages to the
> +     * segments.
> +     */
> +    for ( entry = page; ; )
>       {
> -        if ( test_and_clear_bit((base + pos), &kexec_flags) )
> +        unsigned long ind;
> +
> +        ind = kimage_entry_ind(entry, compat);
> +        mfn = kimage_entry_mfn(entry, compat);
> +
> +        switch ( ind )
>           {
> -            image = &kexec_image[base + pos];
> -            machine_kexec_unload(load->type, base + pos, image);
> +        case IND_DESTINATION:
> +            ret = kexec_segments_add_segment(nr_segments, segments, mfn);
> +            if ( ret < 0 )
> +                goto done;
> +            break;
> +        case IND_INDIRECTION:
> +            unmap_domain_page(page);
> +            page = map_domain_page(mfn);
> +            if ( page == NULL )
> +                return -ENOMEM;
> +            entry = page;
> +            continue;
> +        case IND_DONE:
> +            goto done;
> +        case IND_SOURCE:
> +            segments[*nr_segments-1].dest_size += PAGE_SIZE;

I have not been able to prove that *nr_segments cannot be zero when you get
here.  So I think that this needs to be checked for instead of corrupting
memory.

> +            break;
> +        default:
> +            ret = -EINVAL;
> +            goto done;
>           }
> +        entry = kimage_entry_next(entry, compat);
>       }
> +done:
> +    unmap_domain_page(page);
> +    return ret;
> +}
[...]
   -Don Slutz
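
The check requested for the IND_SOURCE case could look roughly like the sketch below. The struct seg type and the helper name are hypothetical and trimmed down for illustration, and -EINVAL is an assumed error code chosen to match the default case of the quoted switch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)

/* Hypothetical, trimmed-down segment type for illustration. */
struct seg {
    uint64_t dest_maddr;
    uint64_t dest_size;
};

/* Account for one source-page entry by growing the most recent segment.
 * Fails with -EINVAL if no destination entry has opened a segment yet,
 * instead of writing to segments[-1]. */
static int seg_account_source_page(struct seg *segments, unsigned nr_segments)
{
    assert(segments != NULL);
    if (nr_segments == 0)
        return -EINVAL;
    segments[nr_segments - 1].dest_size += PAGE_SIZE;
    return 0;
}
```

An indirection list that starts with a source entry is then rejected as malformed rather than corrupting the memory just before the segments array.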

Jan Beulich

2013-Nov-06 08:12 UTC

Re: [PATCH 3/9] kexec: add infrastructure for handling kexec images

>>> On 05.11.13 at 23:39, Don Slutz <dslutz@verizon.com> wrote:
> On 10/08/13 12:55, David Vrabel wrote:
>> +        entries = __map_domain_page(image->entry_page);
> Not sure if entries needs to be checked for NULL.  Best guess is that it 
> cannot be NULL.
Indeed, map_domain_page() (other than map_domain_page_global())
can't fail.

Jan

Daniel Kiper

2013-Nov-14 11:20 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On Mon, Oct 14, 2013 at 08:13:52PM +0200, Daniel Kiper wrote:
> On Mon, Oct 14, 2013 at 03:14:13PM +0100, David Vrabel wrote:
> > On 14/10/13 14:53, Daniel Kiper wrote:
> > > On Fri, Oct 11, 2013 at 03:06:09PM +0100, David Vrabel wrote:
> > >> On 11/10/13 12:15, Daniel Kiper wrote:
> > >>>
> > >>>> + * - Register values are undefined.
> > >>>
> > >>> If Linux and kexec guys state that they do not care then I do not care too.
> > >>> Let's wait what will happen in "kexec: Clearing registers just before
> > >>> jumping into purgatory" thread.
> > >>
> > >> How about we get the current series in as-is (plus the extra docs) and
> > >> then, since you feel so strongly about this minor point, you post a
> > >> follow patch to change the behaviour?
> > >>
> > >> Does that work for you?  If so and if you're happy with everything else,
> > >> can I get your Reviewed-by on the whole series?
> > >
> > > What do you think about last Eric comments? Should we continue our discussion?
> > > If yes I could do final tests of latest series now and put my Tested-by and
> > > Reviewed-by as needed. Later we could establish details and put follow up patches
> > > (one for zeroing registers and one fixing/aligning calling convention for
> > > relocate_pages). It will be nice if we finish this stuff by the end of this week.
> >
> > I think there are two[*] sensible options:
> >
> > A. Registers are specified as undefined, register values are not
> > initialized.
> >
> > B. Registers are specified as zeroed (%rsp, %rax excepted), register
> > values are initialized to zero.
> >
> > If A is merged, then Xen can move to B later.  If B is merged, Xen
> > cannot go back to A.  Therefore, I think we should merge A and discuss
> > moving to B (or perhaps even C) as a separate item.
>
> OK.
>
> > (FYI, I've already fixed up relocate_pages() to go into v10 since I need
> > to post v10 with the extra docs anyway.)
>
> Thanks.
>
> > David
> >
> > [*] There is a third way:
> >
> > C. Registers are specified as undefined, but register values are
> > initialized to zero.
> >
> > But I don't think the specification should diverge from the implementation.
>
> I agree but I think that we could solve that problem by adding comment
> which precisely explains what is going on and what callee should expect
> (uninitialized registers). Eric comment is nice and could be used by us
> as a starting point. Additionally, I think that similar comment should
> be added to Linux Kernel source and purgatory entry (I could do that).
Now I think that we are at the point where we should resolve this issue.
Option A is merged now with a short comment. Personally I prefer C with
Eric Biederman's comment. However, if you are not convinced we could stay
with A, but I would prefer that the current comment be extended with a clear
statement of why we decided to deviate from the Linux implementation (which
we used as a base for our development).

Daniel

David Vrabel

2013-Nov-14 11:27 UTC

Re: [PATCHv9 0/9] Xen: extend kexec hypercall for use with pv-ops kernels

On 14/11/13 11:20, Daniel Kiper wrote:
>
> Now I think that we are at the point where we should resolve this issue.
> Option A is merged now with a short comment. Personally I prefer C with
> Eric Biederman's comment. However, if you are not convinced we could stay
> with A, but I would prefer that the current comment be extended with a clear
> statement of why we decided to deviate from the Linux implementation (which
> we used as a base for our development).
I still prefer option A but if you still think otherwise, post a patch
with a full explanation of why and I can reconsider it.

David
