This patch series aggregates a number of fixes:

- Fix the paging load operation to correct the m2p entry, and promote the
  p2m entry to the final guest-accessible type
- Fix locking around p2m_teardown
- Fix read-only mapping of shared pages
- Output to the console the per-domain count of paged pages
- Eliminate a stale var from a p2m audit debugtrace printk
- Correct accounting of paged out pages when a paging-out is interrupted by
  a guest access
- Simplify (and in some cases eliminate) use of p2m get_gfn*_unlocked
- Allow for recursive locking in the mm-locks.h deadlock detection scheme

Patches 1, 4 and 6 involve paging, cc'ed Olaf Hering.

Patches 2, 7 and 8 lay more groundwork in preparation for fully synchronized
p2m access.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

 xen/arch/x86/mm/p2m.c         |  18 ++++++++----------
 xen/arch/x86/mm/p2m.c         |   4 ++--
 xen/arch/x86/mm.c             |   8 +++++++-
 xen/common/keyhandler.c       |   6 ++++--
 xen/arch/x86/mm/p2m.c         |   4 ++--
 xen/arch/x86/mm/p2m.c         |   3 ++-
 xen/arch/x86/hvm/emulate.c    |  35 ++++++++++++++++++++++++++++-------
 xen/arch/x86/hvm/hvm.c        |   5 ++++-
 xen/arch/x86/hvm/stdvga.c     |  12 ++++++++++--
 xen/arch/x86/hvm/vmx/vmx.c    |   2 +-
 xen/arch/x86/mm/mem_sharing.c |   2 +-
 xen/arch/x86/mm/mm-locks.h    |   3 ++-
 12 files changed, 71 insertions(+), 31 deletions(-)
 xen/arch/x86/mm/p2m.c |  18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)


When restoring a p2m entry in the paging_load path, we were not updating the
m2p entry correctly.

Also take advantage of this to act on an old suggestion: once done with the
load, promote the p2m entry to the final guest-accessible type. This
simplifies logic.

Tested to work with xenpaging.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r f09f62ae92b7 -r 143e4982c9bf xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -975,7 +975,7 @@ void p2m_mem_paging_populate(struct doma
 int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
 {
     struct page_info *page;
-    p2m_type_t p2mt, target_p2mt;
+    p2m_type_t p2mt;
     p2m_access_t a;
     mfn_t mfn;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
@@ -1033,15 +1033,13 @@ int p2m_mem_paging_prep(struct domain *d
         }
     }
 
-    target_p2mt = (p2mt == p2m_ram_paging_in_start) ?
-        /* If we kicked the pager with a populate event, the pager will send
-         * a resume event back */
-        p2m_ram_paging_in :
-        /* If this was called asynchronously by the pager, then we can
-         * transition directly to the final guest-accessible type */
-        (paging_mode_log_dirty(d) ? p2m_ram_logdirty : p2m_ram_rw);
-    /* Fix p2m mapping */
-    set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, target_p2mt, a);
+    /* Make the page already guest-accessible. If the pager still has a
+     * pending resume operation, it will be idempotent p2m entry-wise,
+     * but will unpause the vcpu */
+    set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
+                  paging_mode_log_dirty(d) ? p2m_ram_logdirty :
+                  p2m_ram_rw, a);
+    set_gpfn_from_mfn(mfn_x(mfn), gfn);
 
     atomic_dec(&d->paged_pages);
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 2 of 8] x86/mm: Fix p2m teardown locking
 xen/arch/x86/mm/p2m.c |  4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


Holding the p2m lock during a p2m teardown, while unsharing entries pointing
to shared frames, causes a locking inversion and deadlock panic.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 143e4982c9bf -r 2e7c77585adb xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -343,8 +343,6 @@ void p2m_teardown(struct p2m_domain *p2m
     if (p2m == NULL)
         return;
 
-    p2m_lock(p2m);
-
 #ifdef __x86_64__
     for ( gfn=0; gfn < p2m->max_mapped_pfn; gfn++ )
     {
@@ -358,6 +356,8 @@ void p2m_teardown(struct p2m_domain *p2m
     }
 #endif
 
+    p2m_lock(p2m);
+
     p2m->phys_table = pagetable_null();
 
     while ( (pg = page_list_remove_head(&p2m->pages)) )
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 3 of 8] x86/mm: Allow foreign read-only mappings of shared pages
 xen/arch/x86/mm.c |  8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)


Because shared pages are owned by dom_cow, the ownership test during foreign
mapping fails.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 2e7c77585adb -r c4b96d2cba06 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -878,7 +878,8 @@ get_page_from_l1e(
         return 1;
     }
 
-    if ( unlikely(real_pg_owner != pg_owner) )
+    if ( unlikely( (real_pg_owner != pg_owner) &&
+                   (real_pg_owner != dom_cow) ) )
     {
         /*
          * Let privileged domains transfer the right to map their target
@@ -892,6 +893,11 @@ get_page_from_l1e(
         pg_owner = real_pg_owner;
     }
 
+    /* Extra paranoid check for shared memory. Writable mappings
+     * disallowed (unshare first!) */
+    if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) )
+        goto could_not_pin;
+
     /* Foreign mappings into guests in shadow external mode don't
      * contribute to writeable mapping refcounts. (This allows the
      * qemu-dm helper process in dom0 to map the domain's memory without
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 4 of 8] x86/mm: Output domain count of paged pages in console
 xen/common/keyhandler.c |  6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)


Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r c4b96d2cba06 -r 39e2fc847c57 xen/common/keyhandler.c
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -250,8 +250,10 @@ static void dump_domains(unsigned char k
         printk("    refcnt=%d dying=%d pause_count=%d\n",
                atomic_read(&d->refcnt), d->is_dying,
                atomic_read(&d->pause_count));
-        printk("    nr_pages=%d xenheap_pages=%d shared_pages=%u dirty_cpus=%s max_pages=%u\n",
-               d->tot_pages, d->xenheap_pages, atomic_read(&d->shr_pages), tmpstr, d->max_pages);
+        printk("    nr_pages=%d xenheap_pages=%d shared_pages=%u paged_pages=%u "
+               "dirty_cpus=%s max_pages=%u\n", d->tot_pages, d->xenheap_pages,
+               atomic_read(&d->shr_pages), atomic_read(&d->paged_pages),
+               tmpstr, d->max_pages);
         printk("    handle=%02x%02x%02x%02x-%02x%02x-%02x%02x-"
                "%02x%02x-%02x%02x%02x%02x%02x%02x vm_assist=%08lx\n",
                d->handle[ 0], d->handle[ 1], d->handle[ 2], d->handle[ 3],
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 5 of 8] x86/mm: Remove stale variable from debugtrace printk in p2m audit
 xen/arch/x86/mm/p2m.c |  4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 39e2fc847c57 -r c41436e555cd xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1534,8 +1534,8 @@ void audit_p2m(struct domain *d,
         }
 
         __put_gfn(p2m, gfn);
-        P2M_PRINTK("OK: mfn=%#lx, gfn=%#lx, p2mfn=%#lx, lp2mfn=%#lx\n",
-                   mfn, gfn, mfn_x(p2mfn), lp2mfn);
+        P2M_PRINTK("OK: mfn=%#lx, gfn=%#lx, p2mfn=%#lx\n",
+                   mfn, gfn, mfn_x(p2mfn));
     }
 
     spin_unlock(&d->page_alloc_lock);
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 6 of 8] x86/mm: Properly account for paged out pages
 xen/arch/x86/mm/p2m.c |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


If we hit the page after nominate but before paging it out, don't decrement
the domain count of paged out pages.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r c41436e555cd -r d4336e35f0bc xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1041,7 +1041,8 @@ int p2m_mem_paging_prep(struct domain *d
                   p2m_ram_rw, a);
     set_gpfn_from_mfn(mfn_x(mfn), gfn);
 
-    atomic_dec(&d->paged_pages);
+    if ( !page_extant )
+        atomic_dec(&d->paged_pages);
 
     ret = 0;
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 7 of 8] x86/mm: clean use of p2m unlocked queries
 xen/arch/x86/hvm/emulate.c    |  35 ++++++++++++++++++++++++++++-------
 xen/arch/x86/hvm/hvm.c        |   5 ++++-
 xen/arch/x86/hvm/stdvga.c     |  12 ++++++++++--
 xen/arch/x86/hvm/vmx/vmx.c    |   2 +-
 xen/arch/x86/mm/mem_sharing.c |   2 +-
 5 files changed, 44 insertions(+), 12 deletions(-)


Limit such queries only to p2m_query types. This is more compatible with
the name and intended semantics: perform only a lookup, and explicitly in
an unlocked way.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r d4336e35f0bc -r 2dc0ef05662e xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -660,7 +660,7 @@ static int hvmemul_rep_movs(
 {
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
-    unsigned long saddr, daddr, bytes;
+    unsigned long saddr, daddr, bytes, sgfn, dgfn;
     paddr_t sgpa, dgpa;
     uint32_t pfec = PFEC_page_present;
     p2m_type_t p2mt;
@@ -693,17 +693,28 @@ static int hvmemul_rep_movs(
     if ( rc != X86EMUL_OKAY )
         return rc;
 
-    /* Unlocked works here because we get_gfn for real in whatever
-     * we call later. */
-    (void)get_gfn_unlocked(current->domain, sgpa >> PAGE_SHIFT, &p2mt);
+    /* XXX In a fine-grained p2m locking scenario, we need to sort this
+     * get_gfn's, or else we might deadlock */
+    sgfn = sgpa >> PAGE_SHIFT;
+    (void)get_gfn(current->domain, sgfn, &p2mt);
     if ( !p2m_is_ram(p2mt) && !p2m_is_grant(p2mt) )
-        return hvmemul_do_mmio(
+    {
+        rc = hvmemul_do_mmio(
             sgpa, reps, bytes_per_rep, dgpa, IOREQ_READ, df, NULL);
+        put_gfn(current->domain, sgfn);
+        return rc;
+    }
 
-    (void)get_gfn_unlocked(current->domain, dgpa >> PAGE_SHIFT, &p2mt);
+    dgfn = dgpa >> PAGE_SHIFT;
+    (void)get_gfn(current->domain, dgfn, &p2mt);
     if ( !p2m_is_ram(p2mt) && !p2m_is_grant(p2mt) )
-        return hvmemul_do_mmio(
+    {
+        rc = hvmemul_do_mmio(
             dgpa, reps, bytes_per_rep, sgpa, IOREQ_WRITE, df, NULL);
+        put_gfn(current->domain, sgfn);
+        put_gfn(current->domain, dgfn);
+        return rc;
+    }
 
     /* RAM-to-RAM copy: emulate as equivalent of memmove(dgpa, sgpa, bytes). */
     bytes = *reps * bytes_per_rep;
@@ -718,7 +729,11 @@ static int hvmemul_rep_movs(
      * can be emulated by a source-to-buffer-to-destination block copy.
      */
     if ( ((dgpa + bytes_per_rep) > sgpa) && (dgpa < (sgpa + bytes)) )
+    {
+        put_gfn(current->domain, sgfn);
+        put_gfn(current->domain, dgfn);
         return X86EMUL_UNHANDLEABLE;
+    }
 
     /* Adjust destination address for reverse copy. */
     if ( df )
@@ -727,7 +742,11 @@ static int hvmemul_rep_movs(
     /* Allocate temporary buffer. Fall back to slow emulation if this fails. */
     buf = xmalloc_bytes(bytes);
     if ( buf == NULL )
+    {
+        put_gfn(current->domain, sgfn);
+        put_gfn(current->domain, dgfn);
         return X86EMUL_UNHANDLEABLE;
+    }
 
     /*
      * We do a modicum of checking here, just for paranoia's sake and to
@@ -738,6 +757,8 @@ static int hvmemul_rep_movs(
     rc = hvm_copy_to_guest_phys(dgpa, buf, bytes);
     xfree(buf);
 
+    put_gfn(current->domain, sgfn);
+    put_gfn(current->domain, dgfn);
     if ( rc == HVMCOPY_gfn_paged_out )
         return X86EMUL_RETRY;
diff -r d4336e35f0bc -r 2dc0ef05662e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3975,7 +3975,10 @@ long do_hvm_op(unsigned long op, XEN_GUE
         rc = -EINVAL;
         if ( is_hvm_domain(d) )
         {
-            get_gfn_unshare_unlocked(d, a.pfn, &t);
+            /* Use get_gfn query as we are interested in the current
+             * type, not in allocating or unsharing. That'll happen
+             * on access. */
+            get_gfn_query_unlocked(d, a.pfn, &t);
             if ( p2m_is_mmio(t) )
                 a.mem_type =  HVMMEM_mmio_dm;
             else if ( p2m_is_readonly(t) )
diff -r d4336e35f0bc -r 2dc0ef05662e xen/arch/x86/hvm/stdvga.c
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -482,15 +482,19 @@ static int mmio_move(struct hvm_hw_stdvg
             if ( hvm_copy_to_guest_phys(data, &tmp, p->size) !=
                  HVMCOPY_okay )
             {
-                (void)get_gfn_unlocked(d, data >> PAGE_SHIFT, &p2mt);
+                (void)get_gfn(d, data >> PAGE_SHIFT, &p2mt);
                 /*
                  * The only case we handle is vga_mem <-> vga_mem.
                  * Anything else disables caching and leaves it to qemu-dm.
                  */
                 if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
                      ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+                {
+                    put_gfn(d, data >> PAGE_SHIFT);
                     return 0;
+                }
                 stdvga_mem_write(data, tmp, p->size);
+                put_gfn(d, data >> PAGE_SHIFT);
             }
             data += sign * p->size;
             addr += sign * p->size;
@@ -504,11 +508,15 @@ static int mmio_move(struct hvm_hw_stdvg
             if ( hvm_copy_from_guest_phys(&tmp, data, p->size) !=
                  HVMCOPY_okay )
             {
-                (void)get_gfn_unlocked(d, data >> PAGE_SHIFT, &p2mt);
+                (void)get_gfn(d, data >> PAGE_SHIFT, &p2mt);
                 if ( (p2mt != p2m_mmio_dm) || (data < VGA_MEM_BASE) ||
                      ((data + p->size) > (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+                {
+                    put_gfn(d, data >> PAGE_SHIFT);
                     return 0;
+                }
                 tmp = stdvga_mem_read(data, p->size);
+                put_gfn(d, data >> PAGE_SHIFT);
             }
             stdvga_mem_write(addr, tmp, p->size);
             data += sign * p->size;
diff -r d4336e35f0bc -r 2dc0ef05662e xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2112,7 +2112,7 @@ static void ept_handle_violation(unsigne
         return;
 
     /* Everything else is an error. */
-    mfn = get_gfn_guest_unlocked(d, gfn, &p2mt);
+    mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
     gdprintk(XENLOG_ERR, "EPT violation %#lx (%c%c%c/%c%c%c), "
              "gpa %#"PRIpaddr", mfn %#lx, type %i.\n",
              qualification,
diff -r d4336e35f0bc -r 2dc0ef05662e xen/arch/x86/mm/mem_sharing.c
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -421,7 +421,7 @@ int mem_sharing_debug_gfn(struct domain
     p2m_type_t p2mt;
    mfn_t mfn;
 
-    mfn = get_gfn_unlocked(d, gfn, &p2mt);
+    mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
 
     gdprintk(XENLOG_DEBUG, "Debug for domain=%d, gfn=%lx, ",
              d->domain_id,
Andres Lagar-Cavilla
2012-Jan-26 03:53 UTC
[PATCH 8 of 8] x86/mm: Avoid spurious deadlock panic trigger
 xen/arch/x86/mm/mm-locks.h |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


In the mm layer, if we take lock A, then lock B, and then recursively lock A,
the deadlock detector panics. This is not a deadlock risk because we already
'own' the outer lock (A), so we will not contend for that resource.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

diff -r 2dc0ef05662e -r 409f0f368ae3 xen/arch/x86/mm/mm-locks.h
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -61,7 +61,8 @@ do {                                      \
 
 static inline void _mm_lock(mm_lock_t *l, const char *func, int level, int rec)
 {
-    __check_lock_level(level);
+    if ( !((mm_locked_by_me(l)) && rec) )
+        __check_lock_level(level);
     spin_lock_recursive(&l->lock);
     if ( l->lock.recurse_cnt == 1 )
    {
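[Editorial note: to make the scenario concrete, a minimal sketch follows. lock_a and lock_b are hypothetical mm_lock_t instances with made-up ordering levels; only _mm_lock(), mm_locked_by_me() and __check_lock_level() are the real helpers touched by the diff above.]

```c
/* Sketch only: lock_a/lock_b and the levels 10/20 are hypothetical. */
static mm_lock_t lock_a, lock_b;

static void example(void)
{
    _mm_lock(&lock_a, __func__, 10, 1);   /* outer lock, level 10, recursive */
    _mm_lock(&lock_b, __func__, 20, 0);   /* level 20, ordering respected    */

    /* Re-take lock_a while already holding it.  Before this patch the
     * detector panicked: __check_lock_level(10) fails because the current
     * level is already 20.  With the patch the level check is skipped,
     * since mm_locked_by_me(&lock_a) is true and the lock is recursive,
     * so no new contention is possible. */
    _mm_lock(&lock_a, __func__, 10, 1);

    /* matching mm_unlock() calls omitted */
}
```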
On Wed, Jan 25, Andres Lagar-Cavilla wrote:

> When restoring a p2m entry in the paging_load path, we were not updating the
> m2p entry correctly.

That's a good one.

> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

Acked-by: Olaf Hering <olaf@aepfle.de>
Olaf Hering
2012-Jan-26 09:47 UTC
Re: [PATCH 4 of 8] x86/mm: Output domain count of paged pages in console
On Wed, Jan 25, Andres Lagar-Cavilla wrote:

>  xen/common/keyhandler.c |  6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
>
> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>

Acked-by: Olaf Hering <olaf@aepfle.de>
Olaf Hering
2012-Jan-26 09:54 UTC
Re: [PATCH 6 of 8] x86/mm: Properly account for paged out pages
On Wed, Jan 25, Andres Lagar-Cavilla wrote:

>  xen/arch/x86/mm/p2m.c |  3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
>
> If we hit the page after nominate but before paging it out, don't decrement
> the domain count of paged out pages.

The meaning of paged_pages is to track released pages.
In p2m_mem_paging_evict() it gets incremented once the page is removed
from the domain, and in p2m_mem_paging_prep() it should be decremented
right after the alloc_domheap_page() call.
So the patch should move the atomic_dec() call after successful page
allocation.

Olaf

> diff -r c41436e555cd -r d4336e35f0bc xen/arch/x86/mm/p2m.c
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1041,7 +1041,8 @@ int p2m_mem_paging_prep(struct domain *d
>                    p2m_ram_rw, a);
>      set_gpfn_from_mfn(mfn_x(mfn), gfn);
>
> -    atomic_dec(&d->paged_pages);
> +    if ( !page_extant )
> +        atomic_dec(&d->paged_pages);
>
>      ret = 0;
>
Andres Lagar-Cavilla
2012-Jan-26 10:47 UTC
Re: [PATCH 6 of 8] x86/mm: Properly account for paged out pages
> On Wed, Jan 25, Andres Lagar-Cavilla wrote:
>
>>  xen/arch/x86/mm/p2m.c |  3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>>
>> If we hit the page after nominate but before paging it out, don't
>> decrement the
>> domain count of paged out pages.
>
> The meaning of paged_pages is to track released pages.
> In p2m_mem_paging_evict() it gets incremented once the page is removed
> from the domain, and in p2m_mem_paging_prep() it should be decremented
> right after the alloc_domheap_page() call.
> So the patch should move the atomic_dec() call after successful page
> allocation.
>
> Olaf

Hi Olaf,
there is a put_page in case things go wrong, after the alloc_domheap_page
call. So doing the decrement at alloc is a bit too soon.

Thanks
Andres

>> diff -r c41436e555cd -r d4336e35f0bc xen/arch/x86/mm/p2m.c
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1041,7 +1041,8 @@ int p2m_mem_paging_prep(struct domain *d
>>                     p2m_ram_rw, a);
>>      set_gpfn_from_mfn(mfn_x(mfn), gfn);
>>
>> -    atomic_dec(&d->paged_pages);
>> +    if ( !page_extant )
>> +        atomic_dec(&d->paged_pages);
>>
>>      ret = 0;
>>
>
> On Wed, Jan 25, Andres Lagar-Cavilla wrote:
>
>> When restoring a p2m entry in the paging_load path, we were not updating
>> the
>> m2p entry correctly.
>
> That's a good one.
>
>> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>
> Acked-by: Olaf Hering <olaf@aepfle.de>
>

Great!

Now, afaict, the p2m_ram_paging_in state is not needed anymore. Can you
provide feedback as to whether
1. remove p2m_ram_paging_in
2. rename p2m_ram_paging_in_start to p2m_ram_paging_in

sounds like a good plan?

Cheers!
Andres
On Thu, Jan 26, Andres Lagar-Cavilla wrote:

> Now, afaict, the p2m_ram_paging_in state is not needed anymore. Can you
> provide feedback as to whether
> 1. remove p2m_ram_paging_in
> 2. rename p2m_ram_paging_in_start to p2m_ram_paging_in
>
> sounds like a good plan?

In my opinion the common case is that evicted pages get populated, a
request is sent. Later a response is expected to make room in the ring.

If p2m_mem_paging_populate allocates a page for the guest, it can let
the pager know that it did so (or failed to allocate one).
If there is a page already, the pager can copy the gfn content into a
buffer, put a pointer to it in the response and let
p2m_mem_paging_resume() handle both the ring accounting (as it does now)
and also the copy_from_user.

If page allocation failed, the pager has to allocate one via
p2m_mem_paging_prep() as it is done now, as an intermediate step.

The buffer page handling in the pager is probably simple, it needs to
maintain RING_SIZE() buffers. There can't be more than that in flight
because that's the limit of requests as well. In other words, the pager
does not need to wait for p2m_mem_paging_resume() to run and pull the
buffer content.

If the "populate - allocate - put_request - get_request - fill_buffer -
put_response - resume get_response - copy_from_buffer - resume_vcpu"
cycle works, it would reduce the overall amount of work to be done
during paging, even if the hypercalls themselves are not the bottleneck.
It all depends on the possibility to allocate a page in the various
contexts where p2m_mem_paging_populate is called.

The resume part could be done via eventchannel and
XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME could be removed.

Also the question is if freeing one p2mt is more important than reducing
the number of hypercalls to execute at runtime.

Olaf
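[Editorial note: a rough sketch of the cycle Olaf describes above, for readers following along. It is illustrative only: the helper names, the MEM_EVENT_FLAG_PAGE_ALLOCATED flag and the idea of carrying a buffer pointer in the response are assumptions made up for the sketch, not interfaces that exist in the tree.]

```c
/* Illustrative sketch of the proposed populate/resume cycle.  All names
 * marked "hypothetical" are invented here; only alloc_domheap_page() and
 * the mem_event request/response ring are real. */

/* Hypervisor side: populate allocates the guest page up front and tells
 * the pager whether that worked. */
static void proposed_populate(struct domain *d, unsigned long gfn)
{
    mem_event_request_t req = { .gfn = gfn };
    struct page_info *page = alloc_domheap_page(d, 0);

    if ( page != NULL )
        req.flags |= MEM_EVENT_FLAG_PAGE_ALLOCATED;   /* hypothetical flag */
    /* put the request on the ring and pause the faulting vcpu, as today */
}

/* Pager side: fill one of RING_SIZE() preallocated buffers and hand it
 * back in the response; only the allocation-failure path still needs the
 * separate prep/load step. */
static void proposed_pager_handle(mem_event_request_t *req)
{
    void *buf = grab_free_buffer();                   /* hypothetical */

    read_page_from_pagefile(req->gfn, buf);           /* hypothetical */
    if ( !(req->flags & MEM_EVENT_FLAG_PAGE_ALLOCATED) )
        prep_with_retry(req->gfn, buf);               /* hypothetical slow path */
    /* put a response carrying a pointer to buf (hypothetical field) and
     * kick the event channel; resume then does the copy_from_user, fixes
     * the p2m/m2p entries and unpauses the vcpu. */
}
```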
Olaf Hering
2012-Jan-26 12:11 UTC
Re: [PATCH 6 of 8] x86/mm: Properly account for paged out pages
On Thu, Jan 26, Andres Lagar-Cavilla wrote:

> there is a put_page in case things go wrong, after the alloc_domheap_page
> call. So doing the decrement at alloc is a bit too soon.

That's probably true.

But in the case of failure the guest will likely hang soon. Is the
copy_from_user() (and p2m_mem_paging_prep itself) restartable? If not,
the guest should probably be crashed. The way xenpaging_populate_page()
is written right now, it will just exit, EFAULT isn't caught.

Olaf
> On Thu, Jan 26, Andres Lagar-Cavilla wrote:
>
>> Now, afaict, the p2m_ram_paging_in state is not needed anymore. Can you
>> provide feedback as to whether
>> 1. remove p2m_ram_paging_in
>> 2. rename p2m_ram_paging_in_start to p2m_ram_paging_in
>>
>> sounds like a good plan?
>
> In my opinion the common case is that evicted pages get populated, a
> request is sent. Later a response is expected to make room in the ring.
>
> If p2m_mem_paging_populate allocates a page for the guest, it can let
> the pager know that it did so (or failed to allocate one).
> If there is a page already, the pager can copy the gfn content into a
> buffer, put a pointer to it in the response and let
> p2m_mem_paging_resume() handle both the ring accounting (as it does now)
> and also the copy_from_user.

So, this would bounce the page contents twice for the case when the page
hasn't yet been evicted?

> If page allocation failed, the pager has to allocate one via
> p2m_mem_paging_prep() as it is done now, as an intermediate step.

The issue of failed allocations is more pervasive. It also affects mem
sharing. And even PoD. What I'm trying to say is that even though your
solution seems to work (as long as the pager does dom0 ballooning to free
up some memory in between populate and prep!), we need a more generic
mechanism. Something along the lines of an ENOMEM mem_event ring, and a
matching dom0 daemon.

> The buffer page handling in the pager is probably simple, it needs to
> maintain RING_SIZE() buffers. There can't be more than that in flight
> because that's the limit of requests as well. In other words, the pager
> does not need to wait for p2m_mem_paging_resume() to run and pull the
> buffer content.
>
> If the "populate - allocate - put_request - get_request - fill_buffer -
> put_response - resume get_response - copy_from_buffer - resume_vcpu"
> cycle works, it would reduce the overall amount of work to be done
> during paging, even if the hypercalls themselves are not the bottleneck.
> It all depends on the possibility to allocate a page in the various
> contexts where p2m_mem_paging_populate is called.

The gist here is that the paging_load call would be removed?

I like the general direction, but one excellent property of paging_resume
is that it doesn't fail. This is particularly important since we already
do resumes via ring responses and event channel kicks (see below). So, if
resume needs to propagate failures back to the pager, things get icky.

(paging_load is restartable, see other email)

> The resume part could be done via eventchannel and
> XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME could be removed.

This is already the case. I'm not eager to remove the domctl, but resuming
via event channel kick only is in place.

> Also the question is if freeing one p2mt is more important than reducing
> the number of hypercalls to execute at runtime.

Agreed. However, eliminating code complexity is also useful, and these two
ram_paging_in states cause everyone headaches.

Thanks
Andres

> Olaf
Andres Lagar-Cavilla
2012-Jan-26 12:26 UTC
Re: [PATCH 6 of 8] x86/mm: Properly account for paged out pages
> On Thu, Jan 26, Andres Lagar-Cavilla wrote:
>
>> there is a put_page in case things go wrong, after the
>> alloc_domheap_page
>> call. So doing the decrement at alloc is a bit too soon.
>
> That's probably true.
>
> But in the case of failure the guest will likely hang soon. Is the
> copy_from_user() (and p2m_mem_paging_prep itself) restartable? If not,
> the guest should probably be crashed. The way xenpaging_populate_page()
> is written right now, it will just exit, EFAULT isn't caught.

p2m_mem_paging_prep is restartable, afaict. It cleanly undoes its deeds.

xenpaging not handling EFAULT ... that's a different issue/patch altogether.

Andres

> Olaf
>
On Thu, Jan 26, Andres Lagar-Cavilla wrote:

> > In my opinion the common case is that evicted pages get populated, a
> > request is sent. Later a response is expected to make room in the ring.
> >
> > If p2m_mem_paging_populate allocates a page for the guest, it can let
> > the pager know that it did so (or failed to allocate one).
> > If there is a page already, the pager can copy the gfn content into a
> > buffer, put a pointer to it in the response and let
> > p2m_mem_paging_resume() handle both the ring accounting (as it does now)
> > and also the copy_from_user.
>
> So, this would bounce the page contents twice for the case when the page
> hasn't yet been evicted?

If it was not evicted, then nothing has to be done. In theory the page
could be resumed right away. But you wanted the notification, so a
kind-of dummy request has to be sent. The pager itself will figure out
quickly that the page was not evicted yet, so all it can do is to send
a dummy response.

I'm not sure what you mean by bouncing it twice. The pager itself has
likely written the page, so some IO happened. The hypervisor will find
a mfn for the gfn; right now it does nothing but p2mt changes.

> > If page allocation failed, the pager has to allocate one via
> > p2m_mem_paging_prep() as it is done now, as an intermediate step.
>
> The issue of failed allocations is more pervasive. It also affects mem
> sharing. And even PoD. What I'm trying to say is that even though your
> solution seems to work (as long as the pager does dom0 ballooning to free
> up some memory in between populate and prep!), we need a more generic
> mechanism. Something along the lines of an ENOMEM mem_event ring, and a
> matching dom0 daemon.

I'm not sure how all that relates to putting an alloc_domheap_page() into
p2m_mem_paging_populate() and letting the pager know about the results.

> > The buffer page handling in the pager is probably simple, it needs to
> > maintain RING_SIZE() buffers. There can't be more than that in flight
> > because that's the limit of requests as well. In other words, the pager
> > does not need to wait for p2m_mem_paging_resume() to run and pull the
> > buffer content.
> >
> > If the "populate - allocate - put_request - get_request - fill_buffer -
> > put_response - resume get_response - copy_from_buffer - resume_vcpu"
> > cycle works, it would reduce the overall amount of work to be done
> > during paging, even if the hypercalls themselves are not the bottleneck.
> > It all depends on the possibility to allocate a page in the various
> > contexts where p2m_mem_paging_populate is called.
>
> The gist here is that the paging_load call would be removed?

No, it is required if alloc_domheap_page() in p2m_mem_paging_populate()
fails. Then the pager has to take the "slow path" and try until it gets
a page, like it does now.

> I like the general direction, but one excellent property of paging_resume
> is that it doesn't fail. This is particularly important since we already
> do resumes via ring responses and event channel kicks (see below). So, if
> resume needs to propagate failures back to the pager, things get icky.
>
> (paging_load is restartable, see other email)

Once the request is sent, all that can happen is that the copy_from_user
fails. And in that case the guest is in an undefined state. Perhaps such
copy_from_user handling can be fixed to be reliable. In that case of
course my idea won't work.

> > The resume part could be done via eventchannel and
> > XEN_DOMCTL_MEM_EVENT_OP_PAGING_RESUME could be removed.
>
> This is already the case. I'm not eager to remove the domctl, but resuming
> via event channel kick only is in place.

It looks to me like resume is currently called twice in xenpaging, once
per ioctl and once per event channel.

> > Also the question is if freeing one p2mt is more important than reducing
> > the number of hypercalls to execute at runtime.
>
> Agreed. However, eliminating code complexity is also useful, and these two
> ram_paging_in states cause everyone headaches.

Well, it's not so alien to me after staring at and tracing it long enough.

Olaf
Olaf Hering
2012-Jan-26 13:08 UTC
Re: [PATCH 6 of 8] x86/mm: Properly account for paged out pages
On Thu, Jan 26, Andres Lagar-Cavilla wrote:

> xenpaging not handling EFAULT ... that's a different issue/patch altogether.

I'm not sure whether EFAULT is something an app can or should catch. I will
have a look.

Other than that, the patch is ok.

Olaf
At 22:53 -0500 on 25 Jan (1327532004), Andres Lagar-Cavilla wrote:

> This patch series aggregates a number of fixes:
>
> - Fix the paging load operation to correct the m2p entry, and promote the
>   p2m entry to the final guest-accessible type
> - Fix locking around p2m_teardown
> - Fix read-only mapping of shared pages
> - Output to the console the per-domain count of paged pages
> - Eliminate a stale var from a p2m audit debugtrace printk
> - Correct accounting of paged out pages when a paging-out is interrupted
>   by a guest access
> - Simplify (and in some cases eliminate) use of p2m get_gfn*_unlocked
> - Allow for recursive locking in the mm-locks.h deadlock detection scheme

All applied; thanks.

Tim.