I've encountered a rather unusual bug while implementing live migration on the Arndale board. After resume, the domU kernel starts invoking hypercalls, and at some point the hypercall parameters delivered to Xen are corrupted.

After some debugging (with the help of a HW debugger), I found that cache pollution happens. Here is the detailed sequence:

1) The DomU kernel allocates a local variable struct xen_add_to_physmap xatp,
   and the GVA of xatp is 0xc785fe38 (note: not cache-line aligned)
   (see the gnttab_map function in linux/drivers/xen/grant-table.c).

2) The GVA of xatp is mapped in the Xen page table by the raw_copy_from_guest
   function; the Xen VA is 0xae48ee38 and its contents are cached.

3) The DomU kernel reuses xatp to invoke a second hypercall with different
   parameters.

4) The GVA of xatp is mapped again at the same Xen VA, and the data cached at
   step 2) (the first hypercall's parameters) is loaded.

The patch below prevents the above problem. I'm wondering, as a better solution, whether unmap_domain_page should invalidate the cache before unmapping the page?

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/guestcopy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index d146cd6..9c4a7e4 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
         p += offset;
         memcpy(p, from, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         from += size;
@@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
         p += offset;
         memset(p, 0x00, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         to += size;
@@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
 
         memcpy(to, p, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p);
         len -= size;
         from += size;
-- 
1.7.9.5
Ian Campbell
2013-Jun-18 09:20 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 14:03 +0900, Jaeyong Yoo wrote:
> I've encountered a rather unusual bug while I'm implementing live
> migration on arndale board.
> After resume, domU kernel starts invoking hypercalls and at some point
> the hypercall parameters delivered to xen are corrupted.
> 
> After some debugging (with the help of HW debugger), I found that
> cache pollution happens, and here is the detailed sequence.
> 
> 1) DomU kernel allocates a local variable struct xen_add_to_physmap xatp
>    and the GVA of xatp is 0xc785fe38 (note that not cache-line aligned)
>    (see gnttab_map function in linux/drivers/xen/grant-table.c)
> 
> 2) GVA of xatp is mapped in xen page table at raw_copy_from_guest function,
>    and the VA of xen is 0xae48ee38 and its contents are cached.
> 
> 3) DomU kernel reuses xatp to invoke the second hypercall with different
>    parameters.
> 
> 4) GVA of xatp is mapped again in the same VA of xen, and the cached data
>    at step 2) (the first hypercall parameter) is loaded.
> 
> The below patch prevents the above problem.

Ouch! I wonder what other random unexplained events could be attributed to this!

> I'm wondering, as a better solution, that does unmap_domain_page should
> invalidate the cache before unmap the page?

I think that would be better, since the domain_page region mappings are reference counted, so you could avoid a flush every time you unmapped something and just issue it when the count hits zero.

But actually the domain_page stuff is lazy -- it won't unmap a refcount==0 slot until it comes to reuse it; this means that if you unmap and then remap the same address you should end up reusing the same mapping. This means we only need to remap on reuse and not when the count hits zero. That's why there are TLB flushes in map_domain_page but not unmap_domain_page. Obviously, as you have discovered, we need some data cache flushes in there as well.
So I think we probably actually need the dcache flush in domain_map_page at the "/* Commandeer this 2MB slot */" point. In that context I don't think we can avoid flushing anything other than the complete 2MB mapping. Does this work for you too?

The laziness of the remappings makes me wonder though. Do you know if the slot is reused between step #2 and #3? Otherwise I'd expect us to reuse the existing mapping with the cache intact. The caches are PIPT so I wouldn't expect the address aliasing to be an issue. Unless the mapping is reused for something else I'm not too sure where the cache pollution is coming from.

Ian.

> 
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/guestcopy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
> index d146cd6..9c4a7e4 100644
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
>          p += offset;
>          memcpy(p, from, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          from += size;
> @@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
>          p += offset;
>          memset(p, 0x00, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          to += size;
> @@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
> 
>          memcpy(to, p, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p);
>          len -= size;
>          from += size;
> So I think we probably actually need the dcache flush in domain_map_page
> at the "/* Commandeer this 2MB slot */" point. In that context I don't
> think we can avoid flushing anything other than the complete 2MB
> mapping. Does this work for you too?

I am not sure that this would work. If we map_domain_page and unmap_domain_page with the same mfn over and over again while the ref count is not zero (say 5), then the flush is not called. And I think we should call the flush, for the reason below:

> The laziness of the remappings makes me wonder though. Do you know if
> the slot is reused between step #2 and #3? Otherwise I'd expect us to
> reuse the existing mapping with the cache intact. The caches are PIPT so
> I wouldn't expect the address aliasing to be an issue. Unless the
> mapping is reused for something else I'm not too sure where the cache
> pollution is coming from.

Let me try to explain in more detail. We can consider the DomU as a producer (writing values to the mfn) and the hypervisor as a consumer (reading values from the mfn). While DomU is invoking multiple hypercalls, it reuses the same mfn and the same mapping in the Xen page table.

     (consumer)            (producer)
        xen                   DomU
          \                  /      (writing path)
           (cache)          /
              \            /
 (reading path)\          /
  _______________________________________
 |                 mfn                   |  (physical memory)
  ---------------------------------------

If we look at the above figure, Xen "may" keep reading the cached value while DomU is writing different values to the mfn. Here is my observation of where the cache pollution happens:

The pollution actually happens in the second cache line. The DomU-side hypercall parameter's local address is 0xc785fe38 (cache line size = 0x40 bytes), and the size of the hypercall parameter is 24 bytes, so the parameter lays out across two cache lines. When the hypervisor reads the hypercall parameter, it reads the first 8 bytes correctly (meaning the first cache line is flushed), and the other 16 bytes are polluted (meaning the second cache line is not flushed).
Honestly, I'm not sure why the first cache line is flushed and the second is not. I think we could also cache_line_align the hypercall param struct, but that only works when the sizes of all hypercall params are smaller than the cache line size.

I hope the alignment of the figure is not broken :)

Best,
Jaeyong
Ian Campbell
2013-Jun-18 11:45 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 11:22 +0000, Jaeyong Yoo wrote:
> > So I think we probably actually need the dcache flush in domain_map_page
> > at the "/* Commandeer this 2MB slot */" point. In that context I don't
> > think we can avoid flushing anything other than the complete 2MB
> > mapping. Does this work for you too?
> 
> I am not sure that this would work. If we map_domain_page and
> unmap_domain_page with the same mfn over and over again while the ref
> count is not zero (say 5), then flush is not called.
[...]
> If we see the above figure, xen "may" keep reading the cached value while
> DomU is writing different values to mfn.

But all of the caches on this platform are PIPT (right?) so isn't it actually:

     (consumer)            (producer)
        xen                   DomU
          \                  /      (writing path)
           \                /
            \              /
 (reading path)\          /
              \          /
               (cache)
                 ||
                 ||
                 \/
  _______________________________________
 |                 mfn                   |  (physical memory)
  ---------------------------------------

Or are you saying that the writing path is uncached?
I was chatting with Tim and he suggested that the issue might be the ReOrder Buffer, which is virtually tagged. In that case a DMB ought to be sufficient, and not a full cache flush, we think.

We were also speculating that we probably want some DMBs in context_switch_{from,to} as well as at return_to_guest.

> Here goes my observation where cache pollution happens:
> 
> The pollution actually happens in the second line of cache.
> DomU side hypercall param local address is 0xc785fe38 (cache line size = 0x40B),
> and the size of hypercall param is 24B. So the hypercall param lays out in two
> cache lines. When hypervisor is reading the hypercall param, it reads the first
> 8 bytes correctly (means the first cache line is flushed) and the other 16 bytes
> are polluted (means the second cache line is not flushed).
> Honestly, I'm not sure why the first cache line is flushed and the second is not.
> I think we can also cache_line_align the hypercall param struct, but that is only
> when the sizes of all hypercall params are smaller than cache line size.
> 
> I hope the alignment of the figure is not broken :)
> 
> Best,
> Jaeyong
> But all of the caches on this platform are PIPT (right?) so isn't it
> actually:
[...]
> Or are you saying that the writing path is uncached?

Oops, my mistake. As far as I know it is PIPT, and the writing should also be cached.

> I was chatting with Tim and he suggested that the issue might be the
> ReOrder Buffer, which is virtually tagged. In that case a DMB ought to
> be sufficient and not a full cache flush, we think.
> 
> We were also speculating that we probably want some DMBs in
> context_switch_{from,to} as well as at return_to_guest.

Actually, I just learned about the ReOrder Buffer, and it looks like that could be it.

Best,
Jaeyong
Ian Campbell
2013-Jun-18 12:18 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > We were also speculating that we probably want some DMBs in
> > context_switch_{from,to} as well as at return_to_guest.
> 
> Actually, I just learned ReOrder Buffer, and it looks like so.

Right. Tim and I were speculating about where and why the barriers were needed:

In unmap_domain_page: This ensures any writes to domain memory via the Xen mapping are seen by subsequent accesses via the guest mapping. Likewise map_domain_page needs one to ensure that previous accesses via the guest mapping will be observed (this is the issue you observed).

In context_switch_from: To ensure any writes done on a VCPU are seen before it is rescheduled, potentially on another PCPU. It seems likely that you'd want one in context_switch_to for the complementary reasons.

In return_to_guest: This is similar to the unmap_domain_page one, but handles writes to pages which are permanently shared by Xen and the guest, e.g. the shared info page or anything mapped with share_xen_page_with_guest etc. This one is debatable because all users of this are, I think, expected to get their own barriers correct, since these pages are typically part of some lock-free data structure arrangement.

Ian.
Ian Campbell
2013-Jun-19 15:12 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned ReOrder Buffer, and it looks like so.

Does this patch help with the issue you are seeing?

I think in theory it should *BUT* (and it's a big But)...

... it doesn't work for me on an issue I've been seeing booting dom0 on the Foundation model. However, doing the dmb in map_domain_page() *twice* does work for me. In fact doing a dmb+nop works too. This is inexplicable to me. It may be a model bug or it may (more likely) be indicative of some other underlying issue. I also tried dsb() instead of dmb() and that doesn't make a difference.

Anyway, I'm interested about its behaviour on real hardware.

These are doing full system barriers. I think in reality only a "dmb nsh" should be needed for this use case, since we only care about the local processor. I didn't want to go there when the supposedly more obvious case wasn't doing as I expected!

Ian.

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index beae61e..ef67953 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
 
     local_irq_save(flags);
 
+    /*
+     * Ensure that any previous writes which occurred via another
+     * mapping, specifically the guest's own mapping, have been
+     * completed such that they are visible via this mapping.
+     */
+    dmb();
+
     /* The map is laid out as an open-addressed hash table where each
      * entry is a 2MB superpage pte.  We use the available bits of each
      * PTE as a reference count; when the refcount is zero the slot can
@@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
 
     map[slot].pt.avail--;
 
+    /*
+     * Ensure that any writes which occurred via this mapping have been
+     * completed such that they are visible via other virtual
+     * mappings, specifically the guest's own mapping.
+     */
+    dmb();
+
     local_irq_restore(flags);
 }
> On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > We were also speculating that we probably want some DMBs in
> > > > context_switch_{from,to} as well as at return_to_guest.
> > > 
> > > Actually, I just learned ReOrder Buffer, and it looks like so.
> 
> Does this patch help with the issue you are seeing?

I tried the combinations and it does not work. I think my problem may stem from a different cause. Since this problem happens while we try to migrate domU, something really weird may be happening.

Actually, one of my colleagues told me that the problem I'm having magically disappeared when he tried copying more vcpu registers, with lots of printks here and there. At this moment, I'm not sure whether this is a general problem in Xen or a problem caused by our incomplete migration code, but I'm leaning towards the migration code. Since I'm still investigating this issue, I will tell you if anything turns out.

> I think in theory it should *BUT* (and it's a big But)...
> 
> ... it doesn't work for me on an issue I've been seeing booting dom0 on
> the Foundation model. However doing the dmb in map_domain_page() *twice*

BTW, did you put barriers in map_domain_page? The patch below looks like it is in unmap_domain_page.

> does work for me. In fact doing a dmb+nop works too. This is
> inexplicable to me. It may be a model bug or it may (more likely) be
> indicative of some other underlying issue. I also tried dsb() instead of
> dmb() and that doesn't make a difference.
> 
> Anyway, I'm interested about its behaviour on real hardware.
> 
> These are doing full system barriers. I think in reality only a "dmb
> nsh" should be needed for this use case, since we only care about the
> local processor. I didn't want to go there when the supposedly more
> obvious case wasn't doing as I expected!
> 
> Ian.
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index beae61e..ef67953 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
> 
>      local_irq_save(flags);
> 
> +    /*
> +     * Ensure that any previous writes which occurred via another
> +     * mapping, specifically the guest's own mapping have been
> +     * completed such that they are visible via this mapping.
> +     */
> +    dmb();
> +
>      /* The map is laid out as an open-addressed hash table where each
>       * entry is a 2MB superpage pte. We use the available bits of each
>       * PTE as a reference count; when the refcount is zero the slot can
> @@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
> 
>      map[slot].pt.avail--;
> 
> +    /*
> +     * Ensure that any writes which occurred via this mapping have been
> +     * completed such that they are visible via other virtual
> +     * mappings, specifically the guest's own mapping.
> +     */
> +    dmb();
> +
>      local_irq_restore(flags);
>  }
Stefano Stabellini
2013-Jun-20 11:55 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 18 Jun 2013, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned ReOrder Buffer, and it looks like so.
> 
> Right. Tim and I were speculating about where and why the barriers were
> needed:
> 
> In unmap_domain_page: This ensures any writes to domain memory via the
> Xen mapping are seen by subsequent accesses via the guest mapping.
> Likewise map_domain_page needs one to ensure that previous accesses via
> the guest mapping will be observed (this is the issue you observed).

What if the guest doesn't have caching enabled? It seems to me that the only way to make sure the guest is going to be able to see the changes made by Xen is to flush the dcache. Unless we mandate that all Xen guests are required to have caching enabled.

> In context_switch_from: To ensure any writes done on a VCPU are seen
> before it is rescheduled, potentially on another PCPU. It seems likely
> that you'd want one in context_switch_to for the complementary reasons,

Is the cache shared across multiple pcpus? Because if it isn't, then we need to flush the dcache in this case.

> In return_to_guest: This is similar to the unmap_domain_page one but
> handles writes to pages which are permanently shared by Xen and the
> guest, e.g. the shared info page or anything mapped with
> share_xen_page_with_guest etc. This one is debatable because all users
> of this are, I think, expected to get their own barriers correct since
> these pages are typically part of some lock free datastructure
> arrangement.

Same as above.
At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> On Tue, 18 Jun 2013, Ian Campbell wrote:
> > In unmap_domain_page: This ensures any writes to domain memory via the
> > Xen mapping are seen by subsequent accesses via the guest mapping.
> > Likewise map_domain_page needs one to ensure that previous accesses via
> > the guest mapping will be observed (this is the issue you observed).
> 
> What if the guest doesn't have caching enabled?

Erk! I think we can mandate that the guest use caching for memory it shares with the hypervisor (i.e. hypercall args & responses, shared pages, granted pages). Otherwise we end up flushing the dcache before and after every copy.

In any case, this is something that should get documented. :)

> > In context_switch_from: To ensure any writes done on a VCPU are seen
> > before it is rescheduled, potentially on another PCPU. It seems likely
> > that you'd want one in context_switch_to for the complementary reasons,
> 
> Is the cache shared across multiple pcpus? Because if it isn't then we
> need to flush the dcache in this case.

AIUI this is what marking accesses 'shareable' is for: to tell the caches to keep them coherent. If not, we're equally in trouble for all Xen's internal data structures.

Tim.
Ian Campbell
2013-Jun-25 09:22 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

Thanks, this is useful to know. It may be that our analysis is simply flawed and something else is at work in both cases.

> I think my problem maybe stem from a different reason. Since this
> problem happens while we try to migrate domU, something really weird
> may happen.
> 
> Actually, one of my colleage told me that this problem I'm having has
> been magically disappeared while he tried with copying more vcpu
> registers and lots of printks places to places.

printks change all sorts of things, like timing, their own use of barriers (e.g. in the serial driver), and simply adding more code, which increases the gap in the instruction stream between memory accesses that might require a barrier, such that the unwanted ordering no longer happens.

> At this moment, I'm not sure that this is the common problem in xen or
> the problem due to poor migration, but I'm more believing that maybe it
> is due to the poor migration. Since I'm keep investigating this issue,
> I will tell you if anything turns out.

Thanks.

> > I think in theory it should *BUT* (and it's a big But)...
> > 
> > ... it doesn't work for me on an issue I've been seeing booting dom0 on
> > the Foundation model. However doing the dmb in map_domain_page() *twice*
> 
> BTW, did you put barriers in map_domain_page?
> since the patch below looks like in unmap_domain_page.

The first hunk is in map_domain_page.

> > does work for me. In fact doing a dmb+nop works too. This is
> > inexplicable to me. It may be a model bug or it may (more likely) be
> > indicative of some other underlying issue. I also tried dsb() instead of
> > dmb() and that doesn't make a difference.
> > 
> > Anyway, I'm interested about its behaviour on real hardware.
> > 
> > These are doing full system barriers. I think in reality only a "dmb
> > nsh" should be needed for this use case, since we only care about the
> > local processor. I didn't want to go there when the supposedly more
> > obvious case wasn't doing as I expected!
Ian Campbell
2013-Jun-25 09:43 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 13:19 +0100, Tim Deegan wrote:
> At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> > What if the guest doesn't have caching enabled?
> 
> Erk! I think we can mandate that the guest use caching for memory it
> shares with the hypervisor (i.e. hypercall args & responses, shared
> pages, granted pages). Otherwise we end up flushing the dcache before
> and after every copy.

Yes. Worse, because of the reusing/reference-counting scheme used in map_domain_page, I'm not sure we can tell when we need to flush the cache vs. discard it (because the guest has written something useful behind the cache).

> In any case, this is something that should get documented. :)

Yes.

> > Is the cache shared across multiple pcpus? Because if it isn't then we
> > need to flush the dcache in this case.
> 
> AIUI this is what marking accesses 'shareable' is for: to tell the caches
> to keep them coherent. If not, we're equally in trouble for all Xen's
> internal data structures.

Right, they are either snooped or shared, depending on the level and the implementation. Either way it is fine for us, I think.

Ian.
Ian Campbell
2013-Jul-02 09:38 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

As an experiment, can you try the same patch with dsb instead of dmb throughout?

Could you also try the patch from my series "xen: arm: map normal memory as inner shareable, reduce scope of various barriers" in <1372435809.8976.169.camel@zakaz.uk.xensource.com>? Speaking with ARM, they reminded me that Xen currently uses Outer-Shareable mappings while Linux uses Inner-, which is more than likely the root cause of the issue you are seeing. You probably want the dmb() patch as well, to avoid speculative fetches.

Ian.
Sengul Thomas
2013-Jul-02 12:14 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
>> I tried the combinations and it does not work.
>
> As an experiment can you try the same patch with dsb instead of dmb
> throughout?

I tried several ways to recreate the problem I was having, but I failed. I was having that problem with some printks placed here and there, and sadly, I don't remember the printk positions. I'm sorry.

Jaeyong

> Could you also try the patch from my series "xen: arm: map normal memory
> as inner shareable, reduce scope of various barriers" in
> <1372435809.8976.169.camel@zakaz.uk.xensource.com>, speaking with ARM
> they reminded me that Xen currently uses Outer-shareable mappings while
> Linux uses Inner-, which is more than likely the root cause of the issue
> you are seeing. You probably want the dmb() patch as well to avoid
> speculative fetches.
>
> Ian.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Sengul Thomas
2013-Jul-02 12:24 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
> I tried several ways to recreate the problem I was having, but I
> failed. I was having that problem with some printks placed here and
> there, and sadly, I don't remember the printk positions.
> I'm sorry.
> 
> Jaeyong

BTW, thomas.sengul@gmail.com is my personal email and jaeyong.yoo@samsung.com is my work email; I use my work email for posting patches. It is quite confusing, sorry :)

Jaeyong/Thomas
Ian Campbell
2013-Jul-02 12:33 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-07-02 at 21:14 +0900, Sengul Thomas wrote:
> On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > As an experiment can you try the same patch with dsb instead of dmb
> > throughout?
> 
> I tried several ways to recreate the problem I was having, but I
> failed. I was having that problem with some printks placed here and
> there, and sadly, I don't remember the printk positions.
> I'm sorry.

Oh yes, I'd forgotten that you weren't able to reproduce any more.

Oh well, I think the switch to Inner-Shareable mappings in Xen is likely to remove the issue, and I think the justification for the dmb patch in (un)map_domain_page is pretty good. So hopefully this issue will just go away for good once we apply those (and if not, we can always reinvestigate).
Sengul Thomas
2013-Jul-02 12:39 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
> Oh well, I think the switch to Inner-shareable mappings in Xen is likely
> to remove the issue and I think the justification for the dmb patch in
> (un)map_domain_page is pretty good. So hopefully this issue will just go
> away for good once we apply those (and if not we can always
> reinvestigate).

Of course, yes. If I run into a similar bug, I will try the patches.

Jaeyong