I've encountered a rather unusual bug while implementing live migration on the Arndale board. After resume, the domU kernel starts invoking hypercalls, and at some point the hypercall parameters delivered to Xen are corrupted.

After some debugging (with the help of a HW debugger), I found that cache pollution happens. Here is the detailed sequence:

1) The DomU kernel allocates a local variable struct xen_add_to_physmap xatp,
   and the GVA of xatp is 0xc785fe38 (note: not cache-line aligned)
   (see the gnttab_map function in linux/drivers/xen/grant-table.c).

2) The GVA of xatp is mapped in the Xen page table by the raw_copy_from_guest
   function; the Xen VA is 0xae48ee38 and its contents are cached.

3) The DomU kernel reuses xatp to invoke a second hypercall with different
   parameters.

4) The GVA of xatp is mapped again at the same Xen VA, and the data cached at
   step 2) (the first hypercall's parameters) is loaded.

The patch below prevents the above problem. I'm wondering, as a better solution, whether unmap_domain_page should invalidate the cache before unmapping the page?

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/guestcopy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index d146cd6..9c4a7e4 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
         p += offset;
         memcpy(p, from, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         from += size;
@@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
         p += offset;
         memset(p, 0x00, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         to += size;
@@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
 
         memcpy(to, p, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p);
         len -= size;
         from += size;
-- 
1.7.9.5
Ian Campbell
2013-Jun-18 09:20 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 14:03 +0900, Jaeyong Yoo wrote:
> I've encountered a rather unusual bug while I'm implementing live
> migration on arndale board.
> After resume, domU kernel starts invoking hypercalls and at some point
> the hypercall parameters delivered to xen are corrupted.
> 
> After some debugging (with the help of HW debugger), I found that
> cache pollution happens, and here is the detailed sequence.
> 
> 1) DomU kernel allocates a local variable struct xen_add_to_physmap xatp
>    and the GVA of xatp is 0xc785fe38 (note that not cache-line aligned)
>    (see gnttab_map function in linux/drivers/xen/grant-table.c)
> 
> 2) GVA of xatp is mapped in xen page table at raw_copy_from_guest function,
>    and the VA of xen is 0xae48ee38 and its contents are cached.
> 
> 3) DomU kernel reuses xatp to invoke the second hypercall with different
>    parameters.
> 
> 4) GVA of xatp is mapped again in the same VA of xen, and the cached data
>    at step 2) (the first hypercall parameter) is loaded.
> 
> The below patch prevents the above problem.

Ouch! I wonder what other random unexplained events could be attributed to this!

> I'm wondering, as a better solution, that does unmap_domain_page should
> invalidate the cache before unmap the page?

I think that would be better, since the domain_page region mappings are reference counted, so you could avoid a flush every time you unmapped something and just issue it when the count hits zero.

But actually the domain_page stuff is lazy -- it won't unmap a refcount==0 slot until it comes to reuse it; this means that if you unmap and then remap the same address you should end up reusing the same mapping. This means we only need to remap on reuse and not when the count hits zero. That's why there are TLB flushes in map_domain_page but not unmap_domain_page. Obviously, as you have discovered, we need some data cache flushes in there as well.
So I think we probably actually need the dcache flush in domain_map_page at the "/* Commandeer this 2MB slot */" point. In that context I don't think we can avoid flushing anything other than the complete 2MB mapping. Does this work for you too?

The laziness of the remappings makes me wonder though. Do you know if the slot is reused between step #2 and #3? Otherwise I'd expect us to reuse the existing mapping with the cache intact. The caches are PIPT so I wouldn't expect the address aliasing to be an issue. Unless the mapping is reused for something else I'm not too sure where the cache pollution is coming from.

Ian.

> 
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/guestcopy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
> index d146cd6..9c4a7e4 100644
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
>          p += offset;
>          memcpy(p, from, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          from += size;
> @@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
>          p += offset;
>          memset(p, 0x00, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          to += size;
> @@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
> 
>          memcpy(to, p, size);
> 
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p);
>          len -= size;
>          from += size;
> So I think we probably actually need the dcache flush in domain_map_page
> at the "/* Commandeer this 2MB slot */" point. In that context I don't
> think we can avoid flushing anything other than the complete 2MB
> mapping. Does this work for you too?

I am not sure that this would work. If we map_domain_page and unmap_domain_page with the same mfn over and over again while the ref count is not zero (say 5), then the flush is not called. And I think we should call the flush, for the reason below:

> The laziness of the remappings makes me wonder though. Do you know if
> the slot is reused between step #2 and #3? Otherwise I'd expect us to
> reuse the existing mapping with the cache intact. The caches are PIPT so
> I wouldn't expect the address aliasing to be an issue. Unless the
> mapping is reused for something else I'm not too sure where the cache
> pollution is coming from.

Let me try to explain in more detail. We can consider the DomU as a producer (writing values to the mfn) and the hypervisor as a consumer (reading values from the mfn). While DomU is invoking multiple hypercalls, it reuses the same mfn and the same mapping in the Xen page table.

     (consumer)            (producer)
        xen                   DomU
          \                  /      (writing path)
           (cache)          /
              \            /
 (reading path)\          /
  _______________________________________
 |                 mfn                   |  (physical memory)
  ---------------------------------------

If we look at the above figure, Xen "may" keep reading the cached value while DomU is writing different values to the mfn. Here is my observation of where the cache pollution happens:

The pollution actually happens in the second cache line. The DomU-side hypercall parameter's local address is 0xc785fe38 (cache line size = 0x40 bytes), and the size of the hypercall parameter is 24 bytes, so the parameter lays out across two cache lines. When the hypervisor reads the hypercall parameter, it reads the first 8 bytes correctly (meaning the first cache line is flushed), and the other 16 bytes are polluted (meaning the second cache line is not flushed).
Honestly, I'm not sure why the first cache line is flushed and the second is not. I think we could also cache_line_align the hypercall param struct, but that only works when the sizes of all hypercall params are smaller than the cache line size.

I hope the alignment of the figure is not broken :)

Best,
Jaeyong
Ian Campbell
2013-Jun-18 11:45 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 11:22 +0000, Jaeyong Yoo wrote:
> > So I think we probably actually need the dcache flush in domain_map_page
> > at the "/* Commandeer this 2MB slot */" point. In that context I don't
> > think we can avoid flushing anything other than the complete 2MB
> > mapping. Does this work for you too?
> 
> I am not sure that this would work. If we map_domain_page and
> unmap_domain_page with the same mfn over and over again while the ref
> count is not zero (say 5), then flush is not called.
[...]
> If we see the above figure, xen "may" keep reading the cached value while
> DomU is writing different values to mfn.

But all of the caches on this platform are PIPT (right?) so isn't it actually:

     (consumer)            (producer)
        xen                   DomU
          \                  /      (writing path)
           \                /
            \              /
 (reading path)\          /
              \          /
               (cache)
                 ||
                 ||
                 \/
  _______________________________________
 |                 mfn                   |  (physical memory)
  ---------------------------------------

Or are you saying that the writing path is uncached?
I was chatting with Tim and he suggested that the issue might be the ReOrder Buffer, which is virtually tagged. In that case a DMB ought to be sufficient, and not a full cache flush, we think.

We were also speculating that we probably want some DMBs in context_switch_{from,to} as well as at return_to_guest.

> Here goes my observation where cache pollution happens:
> 
> The pollution actually happens in the second line of cache.
> DomU side hypercall param local address is 0xc785fe38 (cache line size = 0x40B),
> and the size of hypercall param is 24B. So the hypercall param lays out in two
> cache lines. When hypervisor is reading the hypercall param, it reads the first
> 8 bytes correctly (means the first cache line is flushed) and the other 16 bytes
> are polluted (means the second cache line is not flushed).
> Honestly, I'm not sure why the first cache line is flushed and the second is not.
> I think we can also cache_line_align the hypercall param struct, but that is only
> when the sizes of all hypercall params are smaller than cache line size.
> 
> I hope the alignment of the figure is not broken :)
> 
> Best,
> Jaeyong
> But all of the caches on this platform are PIPT (right?) so isn't it
> actually:
[...]
> Or are you saying that the writing path is uncached?

Oops, my mistake. As far as I know it is PIPT, and the writing should also be cached.

> I was chatting with Tim and he suggested that the issue might be the
> ReOrder Buffer, which is virtually tagged. In that case a DMB ought to
> be sufficient and not a full cache flush, we think.
> 
> We were also speculating that we probably want some DMBs in
> context_switch_{from,to} as well as at return_to_guest.

Actually, I just learned about the ReOrder Buffer, and it looks like that could be it.

Best,
Jaeyong
Ian Campbell
2013-Jun-18 12:18 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > We were also speculating that we probably want some DMBs in
> > context_switch_{from,to} as well as at return_to_guest.
> 
> Actually, I just learned ReOrder Buffer, and it looks like so.

Right. Tim and I were speculating about where and why the barriers were needed:

In unmap_domain_page: This ensures any writes to domain memory via the Xen mapping are seen by subsequent accesses via the guest mapping. Likewise map_domain_page needs one to ensure that previous accesses via the guest mapping will be observed (this is the issue you observed).

In context_switch_from: To ensure any writes done on a VCPU are seen before it is rescheduled, potentially on another PCPU. It seems likely that you'd want one in context_switch_to for the complementary reasons.

In return_to_guest: This is similar to the unmap_domain_page one, but handles writes to pages which are permanently shared by Xen and the guest, e.g. the shared info page or anything mapped with share_xen_page_with_guest etc. This one is debatable because all users of this are, I think, expected to get their own barriers correct, since these pages are typically part of some lock-free data structure arrangement.

Ian.
Ian Campbell
2013-Jun-19 15:12 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned ReOrder Buffer, and it looks like so.

Does this patch help with the issue you are seeing?

I think in theory it should *BUT* (and it's a big But)...

... it doesn't work for me on an issue I've been seeing booting dom0 on the Foundation model. However, doing the dmb in map_domain_page() *twice* does work for me. In fact doing a dmb+nop works too. This is inexplicable to me. It may be a model bug or it may (more likely) be indicative of some other underlying issue. I also tried dsb() instead of dmb() and that doesn't make a difference.

Anyway, I'm interested about its behaviour on real hardware.

These are doing full system barriers. I think in reality only a "dmb nsh" should be needed for this use case, since we only care about the local processor. I didn't want to go there when the supposedly more obvious case wasn't doing as I expected!

Ian.

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index beae61e..ef67953 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
 
     local_irq_save(flags);
 
+    /*
+     * Ensure that any previous writes which occurred via another
+     * mapping, specifically the guest's own mapping, have been
+     * completed such that they are visible via this mapping.
+     */
+    dmb();
+
     /* The map is laid out as an open-addressed hash table where each
      * entry is a 2MB superpage pte.  We use the available bits of each
      * PTE as a reference count; when the refcount is zero the slot can
@@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
 
     map[slot].pt.avail--;
 
+    /*
+     * Ensure that any writes which occurred via this mapping have been
+     * completed such that they are visible via other virtual
+     * mappings, specifically the guest's own mapping.
+     */
+    dmb();
+
     local_irq_restore(flags);
 }
> On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > We were also speculating that we probably want some DMBs in
> > > > context_switch_{from,to} as well as at return_to_guest.
> > > 
> > > Actually, I just learned ReOrder Buffer, and it looks like so.
> 
> Does this patch help with the issue you are seeing?

I tried the combinations and it does not work. I think my problem may stem from a different cause. Since this problem happens while we try to migrate domU, something really weird may be happening.

Actually, one of my colleagues told me that the problem I'm having magically disappeared when he tried copying more vcpu registers, with lots of printks here and there. At this moment, I'm not sure whether this is a general problem in Xen or a problem caused by our incomplete migration code, but I'm leaning towards the migration code. Since I'm still investigating this issue, I will tell you if anything turns out.

> I think in theory it should *BUT* (and it's a big But)...
> 
> ... it doesn't work for me on an issue I've been seeing booting dom0 on
> the Foundation model. However doing the dmb in map_domain_page() *twice*

BTW, did you put barriers in map_domain_page? The patch below looks like it is in unmap_domain_page.

> does work for me. In fact doing a dmb+nop works too. This is
> inexplicable to me. It may be a model bug or it may (more likely) be
> indicative of some other underlying issue. I also tried dsb() instead of
> dmb() and that doesn't make a difference.
> 
> Anyway, I'm interested about its behaviour on real hardware.
> 
> These are doing full system barriers. I think in reality only a "dmb
> nsh" should be needed for this use case, since we only care about the
> local processor. I didn't want to go there when the supposedly more
> obvious case wasn't doing as I expected!
> 
> Ian.
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index beae61e..ef67953 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
> 
>      local_irq_save(flags);
> 
> +    /*
> +     * Ensure that any previous writes which occurred via another
> +     * mapping, specifically the guest's own mapping have been
> +     * completed such that they are visible via this mapping.
> +     */
> +    dmb();
> +
>      /* The map is laid out as an open-addressed hash table where each
>       * entry is a 2MB superpage pte. We use the available bits of each
>       * PTE as a reference count; when the refcount is zero the slot can
> @@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
> 
>      map[slot].pt.avail--;
> 
> +    /*
> +     * Ensure that any writes which occurred via this mapping have been
> +     * completed such that they are visible via other virtual
> +     * mappings, specifically the guest's own mapping.
> +     */
> +    dmb();
> +
>      local_irq_restore(flags);
>  }
Stefano Stabellini
2013-Jun-20 11:55 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 18 Jun 2013, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned ReOrder Buffer, and it looks like so.
> 
> Right. Tim and I were speculating about where and why the barriers were
> needed:
> 
> In unmap_domain_page: This ensures any writes to domain memory via the
> Xen mapping are seen by subsequent accesses via the guest mapping.
> Likewise map_domain_page needs one to ensure that previous accesses via
> the guest mapping will be observed (this is the issue you observed).

What if the guest doesn't have caching enabled? It seems to me that the only way to make sure the guest is going to be able to see the changes made by Xen is to flush the dcache. Unless we mandate that all Xen guests are required to have caching enabled.

> In context_switch_from: To ensure any writes done on a VCPU are seen
> before it is rescheduled, potentially on another PCPU. It seems likely
> that you'd want one in context_switch_to for the complementary reasons,

Is the cache shared across multiple pcpus? Because if it isn't, then we need to flush the dcache in this case.

> In return_to_guest: This is similar to the unmap_domain_page one but
> handles writes to pages which are permanently shared by Xen and the
> guest, e.g. the shared info page or anything mapped with
> share_xen_page_with_guest etc. This one is debatable because all users
> of this are, I think, expected to get their own barriers correct since
> these pages are typically part of some lock free datastructure
> arrangement.

Same as above.
At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> On Tue, 18 Jun 2013, Ian Campbell wrote:
> > In unmap_domain_page: This ensures any writes to domain memory via the
> > Xen mapping are seen by subsequent accesses via the guest mapping.
> > Likewise map_domain_page needs one to ensure that previous accesses via
> > the guest mapping will be observed (this is the issue you observed).
> 
> What if the guest doesn't have caching enabled?

Erk! I think we can mandate that the guest use caching for memory it shares with the hypervisor (i.e. hypercall args & responses, shared pages, granted pages). Otherwise we end up flushing the dcache before and after every copy.

In any case, this is something that should get documented. :)

> > In context_switch_from: To ensure any writes done on a VCPU are seen
> > before it is rescheduled, potentially on another PCPU. It seems likely
> > that you'd want one in context_switch_to for the complementary reasons,
> 
> Is the cache shared across multiple pcpus? Because if it isn't then we
> need to flush the dcache in this case.

AIUI this is what marking accesses 'shareable' is for: to tell the caches to keep them coherent. If not, we're equally in trouble for all Xen's internal data structures.

Tim.
Ian Campbell
2013-Jun-25 09:22 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

Thanks, this is useful to know. It may be that our analysis is simply flawed and something else is at work in both cases.

> I think my problem maybe stem from a different reason. Since this
> problem happens while we try to migrate domU, something really weird
> may happen.
> 
> Actually, one of my colleage told me that this problem I'm having has
> been magically disappeared while he tried with copying more vcpu
> registers and lots of printks places to places.

printks change all sorts of things, like timing, their own use of barriers (e.g. in the serial driver), and simply adding more code, which increases the gap in the instruction stream between memory accesses that might require a barrier, such that the unwanted ordering no longer happens.

> At this moment, I'm not sure that this is the common problem in xen or
> the problem due to poor migration, but I'm more believing that maybe it
> is due to the poor migration. Since I'm keep investigating this issue,
> I will tell you if anything turns out.

Thanks.

> > I think in theory it should *BUT* (and it's a big But)...
> > 
> > ... it doesn't work for me on an issue I've been seeing booting dom0 on
> > the Foundation model. However doing the dmb in map_domain_page() *twice*
> 
> BTW, did you put barriers in map_domain_page?
> since the patch below looks like in unmap_domain_page.

The first hunk is in map_domain_page.

> > does work for me. In fact doing a dmb+nop works too. This is
> > inexplicable to me. It may be a model bug or it may (more likely) be
> > indicative of some other underlying issue. I also tried dsb() instead of
> > dmb() and that doesn't make a difference.
> > 
> > Anyway, I'm interested about its behaviour on real hardware.
> > 
> > These are doing full system barriers. I think in reality only a "dmb
> > nsh" should be needed for this use case, since we only care about the
> > local processor. I didn't want to go there when the supposedly more
> > obvious case wasn't doing as I expected!
Ian Campbell
2013-Jun-25 09:43 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 13:19 +0100, Tim Deegan wrote:
> At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> > What if the guest doesn't have caching enabled?
> 
> Erk! I think we can mandate that the guest use caching for memory it
> shares with the hypervisor (i.e. hypercall args & responses, shared
> pages, granted pages). Otherwise we end up flushing the dcache before
> and after every copy.

Yes. Worse, because of the reusing/reference-counting scheme used in map_domain_page, I'm not sure we can tell when we need to flush the cache vs. discard it (because the guest has written something useful behind the cache).

> In any case, this is something that should get documented. :)

Yes.

> > Is the cache shared across multiple pcpus? Because if it isn't then we
> > need to flush the dcache in this case.
> 
> AIUI this is what marking accesses 'shareable' is for: to tell the caches
> to keep them coherent. If not, we're equally in trouble for all Xen's
> internal data structures.

Right, they are either snooped or shared, depending on the level and the implementation. Either way it is fine for us, I think.

Ian.
Ian Campbell
2013-Jul-02 09:38 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

As an experiment, can you try the same patch with dsb instead of dmb throughout?

Could you also try the patch from my series "xen: arm: map normal memory as inner shareable, reduce scope of various barriers" in <1372435809.8976.169.camel@zakaz.uk.xensource.com>? Speaking with ARM, they reminded me that Xen currently uses Outer-Shareable mappings while Linux uses Inner-, which is more than likely the root cause of the issue you are seeing. You probably want the dmb() patch as well, to avoid speculative fetches.

Ian.
Sengul Thomas
2013-Jul-02 12:14 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
>> I tried the combinations and it does not work.
>
> As an experiment can you try the same patch with dsb instead of dmb
> throughout?

I tried several ways to recreate the problem I was having, but I failed. I was having that problem with some printks placed here and there, and sadly, I don't remember the printk positions. I'm sorry.

Jaeyong

> Could you also try the patch from my series "xen: arm: map normal memory
> as inner shareable, reduce scope of various barriers" in
> <1372435809.8976.169.camel@zakaz.uk.xensource.com>, speaking with ARM
> they reminded me that Xen currently uses Outer-shareable mappings while
> Linux uses Inner-, which is more than likely the root cause of the issue
> you are seeing. You probably want the dmb() patch as well to avoid
> speculative fetches.
>
> Ian.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Sengul Thomas
2013-Jul-02 12:24 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
> I tried several ways to recreate the problem I was having, but I
> failed. I was having that problem with some printks placed here and
> there, and sadly, I don't remember the printk positions.
> I'm sorry.
> 
> Jaeyong

BTW, thomas.sengul@gmail.com is my personal email and jaeyong.yoo@samsung.com is my work email; I use my work email for posting patches. It is quite confusing, sorry :)

Jaeyong/Thomas
Ian Campbell
2013-Jul-02 12:33 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
On Tue, 2013-07-02 at 21:14 +0900, Sengul Thomas wrote:
> On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > As an experiment can you try the same patch with dsb instead of dmb
> > throughout?
> 
> I tried several ways to recreate the problem I was having, but I
> failed. I was having that problem with some printks placed here and
> there, and sadly, I don't remember the printk positions.
> I'm sorry.

Oh yes, I'd forgotten that you weren't able to reproduce any more.

Oh well, I think the switch to Inner-Shareable mappings in Xen is likely to remove the issue, and I think the justification for the dmb patch in (un)map_domain_page is pretty good. So hopefully this issue will just go away for good once we apply those (and if not, we can always reinvestigate).
Sengul Thomas
2013-Jul-02 12:39 UTC
Re: [PATCH] ARM: cache coherence problem in guestcopy.c
> Oh well, I think the switch to Inner-shareable mappings in Xen is likely
> to remove the issue and I think the justification for the dmb patch in
> (un)map_domain_page is pretty good. So hopefully this issue will just go
> away for good once we apply those (and if not we can always
> reinvestigate).

Of course, yes. If I run into a similar bug, I will try the patches.

Jaeyong