Ian Campbell
2013-Dec-12 12:21 UTC
[PATCH] tools: libxc: flush data cache after loading images into guest memory
On ARM guest OSes are started with MMU and caches disabled (as they are on
native), however caching is enabled in the domain running the builder and
therefore we must flush the cache as we load the blobs, otherwise when the
guest starts running it may not see them. The dom0 build in the hypervisor has
the same requirements and already does the right thing.

The mechanism for performing a cache flush from userspace is OS specific, so
implement this as a new osdep hook:

 - On 32-bit ARM Linux provides a system call to flush the cache.
 - On 64-bit ARM Linux the processor is configured to allow cache flushes
   directly from userspace.
 - Non-Linux platforms will need to provide their own implementation. If
   similar mechanisms are not available then a new privcmd ioctl should be a
   suitable alternative.

No cache maintenance is required on x86, so provide a stub for all non-Linux
platforms which returns success on x86 only and logs an error otherwise.

This fixes guest building on Xgene, which has a very large L3 cache and so is
particularly susceptible to this problem. It has also been observed
sporadically on midway.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andre Przywara <andre.przywara@calxeda.com>
Cc: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>
Cc: Anup Patel <apatel@apm.com>
---
Freeze: Bugfix.
---
 tools/libxc/xc_dom_armzimageloader.c |    1 +
 tools/libxc/xc_dom_binloader.c       |    1 +
 tools/libxc/xc_dom_core.c            |    2 ++
 tools/libxc/xc_linux_osdep.c         |   39 ++++++++++++++++++++++++++++++++++
 tools/libxc/xc_minios.c              |   11 ++++++++++
 tools/libxc/xc_netbsd.c              |   12 +++++++++++
 tools/libxc/xc_private.c             |    5 +++++
 tools/libxc/xc_private.h             |    3 +++
 tools/libxc/xc_solaris.c             |   12 +++++++++++
 tools/libxc/xenctrl_osdep_ENOSYS.c   |    6 ++++++
 tools/libxc/xenctrlosdep.h           |    1 +
 11 files changed, 93 insertions(+)

diff --git a/tools/libxc/xc_dom_armzimageloader.c b/tools/libxc/xc_dom_armzimageloader.c
index e6516a1..508f74b 100644
--- a/tools/libxc/xc_dom_armzimageloader.c
+++ b/tools/libxc/xc_dom_armzimageloader.c
@@ -229,6 +229,7 @@ static int xc_dom_load_zimage_kernel(struct xc_dom_image *dom)
                __func__, dom->kernel_size, dom->kernel_blob, dst);
 
     memcpy(dst, dom->kernel_blob, dom->kernel_size);
+    xc_cache_flush(dom->xch, dst, dom->kernel_size);
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_binloader.c b/tools/libxc/xc_dom_binloader.c
index e1de5b5..aa0463c 100644
--- a/tools/libxc/xc_dom_binloader.c
+++ b/tools/libxc/xc_dom_binloader.c
@@ -301,6 +301,7 @@ static int xc_dom_load_bin_kernel(struct xc_dom_image *dom)
 
     memcpy(dest, image + skip, text_size);
     memset(dest + text_size, 0, bss_size);
+    xc_cache_flush(dom->xch, dest, text_size+bss_size);
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 77a4e64..d46ac22 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -978,6 +978,7 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         }
         else
             memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size);
+        xc_cache_flush(dom->xch, ramdiskmap, ramdisklen);
     }
 
     /* load devicetree */
@@ -997,6 +998,7 @@ int xc_dom_build_image(struct xc_dom_image *dom)
             goto err;
         }
         memcpy(devicetreemap, dom->devicetree_blob, dom->devicetree_size);
+        xc_cache_flush(dom->xch, devicetreemap, dom->devicetree_size);
     }
 
     /* allocate other pages */
diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
index 73860a2..8362495 100644
--- a/tools/libxc/xc_linux_osdep.c
+++ b/tools/libxc/xc_linux_osdep.c
@@ -30,6 +30,7 @@
 
 #include <sys/mman.h>
 #include <sys/ioctl.h>
+#include <sys/syscall.h>
 
 #include <xen/memory.h>
 #include <xen/sys/evtchn.h>
@@ -416,6 +417,42 @@ static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle
     return ret;
 }
 
+static void linux_privcmd_cache_flush(xc_interface *xch,
+                                      const void *ptr, size_t nr)
+{
+#if defined(__arm__)
+    unsigned long start = (unsigned long)ptr;
+    unsigned long end = start + nr;
+    /* cacheflush(unsigned long start, unsigned long end, int flags) */
+    int rc = syscall(__ARM_NR_cacheflush, start, end, 0);
+    if ( rc < 0 )
+        PERROR("cache flush operation failed: %d\n", errno);
+#elif defined(__aarch64__)
+    unsigned long start = (unsigned long)ptr;
+    unsigned long end = start + nr;
+    unsigned long p, ctr;
+    int stride;
+
+    /* Flush cache using direct DC CVAC instructions. This is
+     * available to EL0 when SCTLR_EL1.UCI is set, which Linux does.
+     *
+     * Bits 19:16 of CTR_EL0 are log2 of the minimum dcache line size
+     * in words, which we use as our stride length. This is readable
+     * when SCTLR_EL1.UCT is set, which Linux does.
+     */
+    asm volatile ("mrs %0, ctr_el0" : "=r" (ctr));
+
+    stride = 4 * (1 << ((ctr & 0xf0000UL) >> 16));
+
+    for ( p = start ; p < end ; p += stride )
+        asm volatile ("dc cvac, %0" : : "r" (p));
+#elif defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops linux_privcmd_ops = {
     .open = &linux_privcmd_open,
     .close = &linux_privcmd_close,
@@ -430,6 +467,8 @@ static struct xc_osdep_ops linux_privcmd_ops = {
         .map_foreign_bulk = &linux_privcmd_map_foreign_bulk,
         .map_foreign_range = &linux_privcmd_map_foreign_range,
         .map_foreign_ranges = &linux_privcmd_map_foreign_ranges,
+
+        .cache_flush = &linux_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_minios.c b/tools/libxc/xc_minios.c
index dec4d73..3b2f553 100644
--- a/tools/libxc/xc_minios.c
+++ b/tools/libxc/xc_minios.c
@@ -181,6 +181,15 @@ static void *minios_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handl
     return ret;
 }
 
+static void minios_privcmd_cache_flush(xc_interface *xch,
+                                       const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
 
 static struct xc_osdep_ops minios_privcmd_ops = {
     .open = &minios_privcmd_open,
@@ -196,6 +205,8 @@ static struct xc_osdep_ops minios_privcmd_ops = {
         .map_foreign_bulk = &minios_privcmd_map_foreign_bulk,
         .map_foreign_range = &minios_privcmd_map_foreign_range,
         .map_foreign_ranges = &minios_privcmd_map_foreign_ranges,
+
+        .cache_flush = &minios_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_netbsd.c b/tools/libxc/xc_netbsd.c
index 8a90ef3..11e1027 100644
--- a/tools/libxc/xc_netbsd.c
+++ b/tools/libxc/xc_netbsd.c
@@ -207,6 +207,16 @@ mmap_failed:
     return NULL;
 }
 
+static void netbsd_privcmd_cache_flush(xc_interface *xch,
+                                       const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops netbsd_privcmd_ops = {
     .open = &netbsd_privcmd_open,
     .close = &netbsd_privcmd_close,
@@ -221,6 +231,8 @@ static struct xc_osdep_ops netbsd_privcmd_ops = {
         .map_foreign_bulk = &xc_map_foreign_bulk_compat,
         .map_foreign_range = &netbsd_privcmd_map_foreign_range,
         .map_foreign_ranges = &netbsd_privcmd_map_foreign_ranges,
+
+        .cache_flush = &netbsd_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 838fd21..3ccee2b 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -249,6 +249,11 @@ int do_xen_hypercall(xc_interface *xch, privcmd_hypercall_t *hypercall)
     return xch->ops->u.privcmd.hypercall(xch, xch->ops_handle, hypercall);
 }
 
+void xc_cache_flush(xc_interface *xch, const void *p, size_t n)
+{
+    xch->ops->u.privcmd.cache_flush(xch, p, n);
+}
+
 xc_evtchn *xc_evtchn_open(xentoollog_logger *logger,
                           unsigned open_flags)
 {
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 92271c9..50a0aa7 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -304,6 +304,9 @@ void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
 /* Optionally flush file to disk and discard page cache */
 void discard_file_cache(xc_interface *xch, int fd, int flush);
 
+/* Flush data cache */
+void xc_cache_flush(xc_interface *xch, const void *p, size_t n);
+
 #define MAX_MMU_UPDATES 1024
 struct xc_mmu {
     mmu_update_t updates[MAX_MMU_UPDATES];
diff --git a/tools/libxc/xc_solaris.c b/tools/libxc/xc_solaris.c
index 7257a54..83c3777 100644
--- a/tools/libxc/xc_solaris.c
+++ b/tools/libxc/xc_solaris.c
@@ -178,6 +178,16 @@ mmap_failed:
     return NULL;
 }
 
+static void solaris_privcmd_cache_flush(xc_interface *xch,
+                                        const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops solaris_privcmd_ops = {
     .open = &solaris_privcmd_open,
     .close = &solaris_privcmd_close,
@@ -192,6 +202,8 @@ static struct xc_osdep_ops solaris_privcmd_ops = {
         .map_foreign_bulk = &xc_map_foreign_bulk_compat,
         .map_foreign_range = &solaris_privcmd_map_foreign_range,
         .map_foreign_ranges = &solaris_privcmd_map_foreign_ranges,
+
+        .cache_flush = &solaris_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xenctrl_osdep_ENOSYS.c b/tools/libxc/xenctrl_osdep_ENOSYS.c
index 4821342..c6fceff 100644
--- a/tools/libxc/xenctrl_osdep_ENOSYS.c
+++ b/tools/libxc/xenctrl_osdep_ENOSYS.c
@@ -63,6 +63,10 @@ static void *ENOSYS_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handl
     return MAP_FAILED;
 }
 
+static void ENOSYS_privcmd_cache_flush(xc_interface *xch, const void *p, size_t n)
+{
+}
+
 static struct xc_osdep_ops ENOSYS_privcmd_ops
 {
     .open = &ENOSYS_privcmd_open,
@@ -74,6 +78,8 @@ static struct xc_osdep_ops ENOSYS_privcmd_ops
     .map_foreign_bulk = &ENOSYS_privcmd_map_foreign_bulk,
     .map_foreign_range = &ENOSYS_privcmd_map_foreign_range,
     .map_foreign_ranges = &ENOSYS_privcmd_map_foreign_ranges,
+
+    .cache_flush = &ENOSYS_privcmd_cache_flush,
     }
 };
diff --git a/tools/libxc/xenctrlosdep.h b/tools/libxc/xenctrlosdep.h
index e610a24..6c9a005 100644
--- a/tools/libxc/xenctrlosdep.h
+++ b/tools/libxc/xenctrlosdep.h
@@ -89,6 +89,7 @@ struct xc_osdep_ops
         void *(*map_foreign_ranges)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
                                     size_t chunksize, privcmd_mmap_entry_t entries[],
                                     int nentries);
+        void (*cache_flush)(xc_interface *xch, const void *p, size_t n);
     } privcmd;
     struct {
         int (*fd)(xc_evtchn *xce, xc_osdep_handle h);
-- 
1.7.10.4
Julien Grall
2013-Dec-12 14:11 UTC
Re: [PATCH] tools: libxc: flush data cache after loading images into guest memory
On 12/12/2013 12:21 PM, Ian Campbell wrote:
> On ARM guest OSes are started with MMU and caches disabled (as they are on
> native), however caching is enabled in the domain running the builder and
> therefore we must flush the cache as we load the blobs, otherwise when the
> guest starts running it may not see them.

[..]

> diff --git a/tools/libxc/xenctrl_osdep_ENOSYS.c b/tools/libxc/xenctrl_osdep_ENOSYS.c
> index 4821342..c6fceff 100644
> --- a/tools/libxc/xenctrl_osdep_ENOSYS.c
> +++ b/tools/libxc/xenctrl_osdep_ENOSYS.c
> @@ -63,6 +63,10 @@ static void *ENOSYS_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handl
>      return MAP_FAILED;
>  }
> 
> +static void ENOSYS_privcmd_cache_flush(xc_interface *xch, const void *p, size_t n)
> +{
> +}
> +

Missing IPRINTF for this function.

-- 
Julien Grall
Ian Campbell
2013-Dec-12 14:23 UTC
[PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On ARM guest OSes are started with MMU and caches disabled (as they are on
native), however caching is enabled in the domain running the builder and
therefore we must flush the cache as we load the blobs, otherwise when the
guest starts running it may not see them. The dom0 build in the hypervisor has
the same requirements and already does the right thing.

The mechanism for performing a cache flush from userspace is OS specific, so
implement this as a new osdep hook:

 - On 32-bit ARM Linux provides a system call to flush the cache.
 - On 64-bit ARM Linux the processor is configured to allow cache flushes
   directly from userspace.
 - Non-Linux platforms will need to provide their own implementation. If
   similar mechanisms are not available then a new privcmd ioctl should be a
   suitable alternative.

No cache maintenance is required on x86, so provide a stub for all non-Linux
platforms which returns success on x86 only and logs an error otherwise.

This fixes guest building on Xgene, which has a very large L3 cache and so is
particularly susceptible to this problem. It has also been observed
sporadically on midway.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andre Przywara <andre.przywara@calxeda.com>
Cc: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>
Cc: Anup Patel <apatel@apm.com>
---
v2: Add IPRINTF to ENOSYS debug module.
Freeze: Bugfix.
---
 tools/libxc/xc_dom_armzimageloader.c |    1 +
 tools/libxc/xc_dom_binloader.c       |    1 +
 tools/libxc/xc_dom_core.c            |    2 ++
 tools/libxc/xc_linux_osdep.c         |   39 ++++++++++++++++++++++++++++++++++
 tools/libxc/xc_minios.c              |   11 ++++++++++
 tools/libxc/xc_netbsd.c              |   12 +++++++++++
 tools/libxc/xc_private.c             |    5 +++++
 tools/libxc/xc_private.h             |    3 +++
 tools/libxc/xc_solaris.c             |   12 +++++++++++
 tools/libxc/xenctrl_osdep_ENOSYS.c   |    9 ++++++++
 tools/libxc/xenctrlosdep.h           |    1 +
 11 files changed, 96 insertions(+)

diff --git a/tools/libxc/xc_dom_armzimageloader.c b/tools/libxc/xc_dom_armzimageloader.c
index e6516a1..508f74b 100644
--- a/tools/libxc/xc_dom_armzimageloader.c
+++ b/tools/libxc/xc_dom_armzimageloader.c
@@ -229,6 +229,7 @@ static int xc_dom_load_zimage_kernel(struct xc_dom_image *dom)
                __func__, dom->kernel_size, dom->kernel_blob, dst);
 
     memcpy(dst, dom->kernel_blob, dom->kernel_size);
+    xc_cache_flush(dom->xch, dst, dom->kernel_size);
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_binloader.c b/tools/libxc/xc_dom_binloader.c
index e1de5b5..aa0463c 100644
--- a/tools/libxc/xc_dom_binloader.c
+++ b/tools/libxc/xc_dom_binloader.c
@@ -301,6 +301,7 @@ static int xc_dom_load_bin_kernel(struct xc_dom_image *dom)
 
     memcpy(dest, image + skip, text_size);
     memset(dest + text_size, 0, bss_size);
+    xc_cache_flush(dom->xch, dest, text_size+bss_size);
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 77a4e64..d46ac22 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -978,6 +978,7 @@ int xc_dom_build_image(struct xc_dom_image *dom)
         }
         else
             memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size);
+        xc_cache_flush(dom->xch, ramdiskmap, ramdisklen);
     }
 
     /* load devicetree */
@@ -997,6 +998,7 @@ int xc_dom_build_image(struct xc_dom_image *dom)
             goto err;
         }
         memcpy(devicetreemap, dom->devicetree_blob, dom->devicetree_size);
+        xc_cache_flush(dom->xch, devicetreemap, dom->devicetree_size);
     }
 
     /* allocate other pages */
diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c
index 73860a2..8362495 100644
--- a/tools/libxc/xc_linux_osdep.c
+++ b/tools/libxc/xc_linux_osdep.c
@@ -30,6 +30,7 @@
 
 #include <sys/mman.h>
 #include <sys/ioctl.h>
+#include <sys/syscall.h>
 
 #include <xen/memory.h>
 #include <xen/sys/evtchn.h>
@@ -416,6 +417,42 @@ static void *linux_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handle
     return ret;
 }
 
+static void linux_privcmd_cache_flush(xc_interface *xch,
+                                      const void *ptr, size_t nr)
+{
+#if defined(__arm__)
+    unsigned long start = (unsigned long)ptr;
+    unsigned long end = start + nr;
+    /* cacheflush(unsigned long start, unsigned long end, int flags) */
+    int rc = syscall(__ARM_NR_cacheflush, start, end, 0);
+    if ( rc < 0 )
+        PERROR("cache flush operation failed: %d\n", errno);
+#elif defined(__aarch64__)
+    unsigned long start = (unsigned long)ptr;
+    unsigned long end = start + nr;
+    unsigned long p, ctr;
+    int stride;
+
+    /* Flush cache using direct DC CVAC instructions. This is
+     * available to EL0 when SCTLR_EL1.UCI is set, which Linux does.
+     *
+     * Bits 19:16 of CTR_EL0 are log2 of the minimum dcache line size
+     * in words, which we use as our stride length. This is readable
+     * when SCTLR_EL1.UCT is set, which Linux does.
+     */
+    asm volatile ("mrs %0, ctr_el0" : "=r" (ctr));
+
+    stride = 4 * (1 << ((ctr & 0xf0000UL) >> 16));
+
+    for ( p = start ; p < end ; p += stride )
+        asm volatile ("dc cvac, %0" : : "r" (p));
+#elif defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops linux_privcmd_ops = {
     .open = &linux_privcmd_open,
     .close = &linux_privcmd_close,
@@ -430,6 +467,8 @@ static struct xc_osdep_ops linux_privcmd_ops = {
         .map_foreign_bulk = &linux_privcmd_map_foreign_bulk,
         .map_foreign_range = &linux_privcmd_map_foreign_range,
         .map_foreign_ranges = &linux_privcmd_map_foreign_ranges,
+
+        .cache_flush = &linux_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_minios.c b/tools/libxc/xc_minios.c
index dec4d73..3b2f553 100644
--- a/tools/libxc/xc_minios.c
+++ b/tools/libxc/xc_minios.c
@@ -181,6 +181,15 @@ static void *minios_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handl
     return ret;
 }
 
+static void minios_privcmd_cache_flush(xc_interface *xch,
+                                       const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
 
 static struct xc_osdep_ops minios_privcmd_ops = {
     .open = &minios_privcmd_open,
@@ -196,6 +205,8 @@ static struct xc_osdep_ops minios_privcmd_ops = {
         .map_foreign_bulk = &minios_privcmd_map_foreign_bulk,
         .map_foreign_range = &minios_privcmd_map_foreign_range,
         .map_foreign_ranges = &minios_privcmd_map_foreign_ranges,
+
+        .cache_flush = &minios_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_netbsd.c b/tools/libxc/xc_netbsd.c
index 8a90ef3..11e1027 100644
--- a/tools/libxc/xc_netbsd.c
+++ b/tools/libxc/xc_netbsd.c
@@ -207,6 +207,16 @@ mmap_failed:
     return NULL;
 }
 
+static void netbsd_privcmd_cache_flush(xc_interface *xch,
+                                       const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops netbsd_privcmd_ops = {
     .open = &netbsd_privcmd_open,
     .close = &netbsd_privcmd_close,
@@ -221,6 +231,8 @@ static struct xc_osdep_ops netbsd_privcmd_ops = {
         .map_foreign_bulk = &xc_map_foreign_bulk_compat,
         .map_foreign_range = &netbsd_privcmd_map_foreign_range,
         .map_foreign_ranges = &netbsd_privcmd_map_foreign_ranges,
+
+        .cache_flush = &netbsd_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 838fd21..3ccee2b 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -249,6 +249,11 @@ int do_xen_hypercall(xc_interface *xch, privcmd_hypercall_t *hypercall)
     return xch->ops->u.privcmd.hypercall(xch, xch->ops_handle, hypercall);
 }
 
+void xc_cache_flush(xc_interface *xch, const void *p, size_t n)
+{
+    xch->ops->u.privcmd.cache_flush(xch, p, n);
+}
+
 xc_evtchn *xc_evtchn_open(xentoollog_logger *logger,
                           unsigned open_flags)
 {
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 92271c9..50a0aa7 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -304,6 +304,9 @@ void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
 /* Optionally flush file to disk and discard page cache */
 void discard_file_cache(xc_interface *xch, int fd, int flush);
 
+/* Flush data cache */
+void xc_cache_flush(xc_interface *xch, const void *p, size_t n);
+
 #define MAX_MMU_UPDATES 1024
 struct xc_mmu {
     mmu_update_t updates[MAX_MMU_UPDATES];
diff --git a/tools/libxc/xc_solaris.c b/tools/libxc/xc_solaris.c
index 7257a54..83c3777 100644
--- a/tools/libxc/xc_solaris.c
+++ b/tools/libxc/xc_solaris.c
@@ -178,6 +178,16 @@ mmap_failed:
     return NULL;
 }
 
+static void solaris_privcmd_cache_flush(xc_interface *xch,
+                                        const void *ptr, size_t nr)
+{
+#if defined(__i386__) || defined(__x86_64__)
+    /* No need for cache maintenance on x86 */
+#else
+    PERROR("No cache flush operation defined for architecture");
+#endif
+}
+
 static struct xc_osdep_ops solaris_privcmd_ops = {
     .open = &solaris_privcmd_open,
     .close = &solaris_privcmd_close,
@@ -192,6 +202,8 @@ static struct xc_osdep_ops solaris_privcmd_ops = {
         .map_foreign_bulk = &xc_map_foreign_bulk_compat,
         .map_foreign_range = &solaris_privcmd_map_foreign_range,
         .map_foreign_ranges = &solaris_privcmd_map_foreign_ranges,
+
+        .cache_flush = &solaris_privcmd_cache_flush,
     },
 };
diff --git a/tools/libxc/xenctrl_osdep_ENOSYS.c b/tools/libxc/xenctrl_osdep_ENOSYS.c
index 4821342..d911b10 100644
--- a/tools/libxc/xenctrl_osdep_ENOSYS.c
+++ b/tools/libxc/xenctrl_osdep_ENOSYS.c
@@ -63,6 +63,13 @@ static void *ENOSYS_privcmd_map_foreign_ranges(xc_interface *xch, xc_osdep_handl
     return MAP_FAILED;
 }
 
+static void ENOSYS_privcmd_cache_flush(xc_interface *xch, const void *p, size_t n)
+{
+    unsigned long start = (unsigned long)p;
+    unsigned long end = start + n;
+    IPRINTF(xch, "ENOSYS_privcmd: cache_flush: %#lx-%#lx\n", start, end);
+}
+
 static struct xc_osdep_ops ENOSYS_privcmd_ops
 {
     .open = &ENOSYS_privcmd_open,
@@ -74,6 +81,8 @@ static struct xc_osdep_ops ENOSYS_privcmd_ops
     .map_foreign_bulk = &ENOSYS_privcmd_map_foreign_bulk,
     .map_foreign_range = &ENOSYS_privcmd_map_foreign_range,
     .map_foreign_ranges = &ENOSYS_privcmd_map_foreign_ranges,
+
+    .cache_flush = &ENOSYS_privcmd_cache_flush,
     }
 };
diff --git a/tools/libxc/xenctrlosdep.h b/tools/libxc/xenctrlosdep.h
index e610a24..6c9a005 100644
--- a/tools/libxc/xenctrlosdep.h
+++ b/tools/libxc/xenctrlosdep.h
@@ -89,6 +89,7 @@ struct xc_osdep_ops
         void *(*map_foreign_ranges)(xc_interface *xch, xc_osdep_handle h, uint32_t dom, size_t size, int prot,
                                     size_t chunksize, privcmd_mmap_entry_t entries[],
                                     int nentries);
+        void (*cache_flush)(xc_interface *xch, const void *p, size_t n);
     } privcmd;
     struct {
         int (*fd)(xc_evtchn *xce, xc_osdep_handle h);
-- 
1.7.10.4
Stefano Stabellini
2013-Dec-12 14:30 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 12 Dec 2013, Ian Campbell wrote:
> On ARM guest OSes are started with MMU and caches disabled (as they are on
> native), however caching is enabled in the domain running the builder and
> therefore we must flush the cache as we load the blobs, otherwise when the
> guest starts running it may not see them. The dom0 build in the hypervisor has
> the same requirements and already does the right thing.
>
> [...]
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Andre Przywara <andre.przywara@calxeda.com>
> Cc: Pranavkumar Sawargaonkar <psawargaonkar@apm.com>
> Cc: Anup Patel <apatel@apm.com>

Looks good to me
Ian Campbell
2013-Dec-12 14:37 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 2013-12-12 at 14:30 +0000, Stefano Stabellini wrote:
> On Thu, 12 Dec 2013, Ian Campbell wrote:
> > On ARM guest OSes are started with MMU and caches disabled (as they are on
> > native), however caching is enabled in the domain running the builder and
> > therefore we must flush the cache as we load the blobs, otherwise when the
> > guest starts running it may not see them.
> >
> > [...]
>
> Looks good to me

Is that an Ack?
Stefano Stabellini
2013-Dec-12 14:45 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 12 Dec 2013, Ian Campbell wrote:
> On Thu, 2013-12-12 at 14:30 +0000, Stefano Stabellini wrote:
> > On Thu, 12 Dec 2013, Ian Campbell wrote:
> > > On ARM guest OSes are started with MMU and caches disabled (as they are on
> > > native), however caching is enabled in the domain running the builder and
> > > therefore we must flush the cache as we load the blobs, otherwise when the
> > > guest starts running it may not see them.
> > >
> > > [...]
> >
> > Looks good to me
>
> Is that an Ack?

Yep, but keep in mind that I don't maintain libxc :)
Ian Campbell
2013-Dec-12 14:49 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 2013-12-12 at 14:45 +0000, Stefano Stabellini wrote:
> On Thu, 12 Dec 2013, Ian Campbell wrote:
> > On Thu, 2013-12-12 at 14:30 +0000, Stefano Stabellini wrote:
> > > On Thu, 12 Dec 2013, Ian Campbell wrote:
> > > > [...]
> > >
> > > Looks good to me
> >
> > Is that an Ack?
>
> Yep, but keep in mind that I don't maintain libxc :)

An Ack from an ARM side person is still useful IMHO, since that's where
the bulk of the actual functionality is.

Ian.
Ian Campbell
2013-Dec-12 17:31 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 2013-12-12 at 14:23 +0000, Ian Campbell wrote:

Ian,

Since this is a tool patch more than an ARM one I should have CCd you,
sorry.

Ian.

> On ARM guest OSes are started with MMU and caches disabled (as they are on
> native), however caching is enabled in the domain running the builder and
> therefore we must flush the cache as we load the blobs, otherwise when the
> guest starts running it may not see them. The dom0 build in the hypervisor has
> the same requirements and already does the right thing.
>
> [...]
Ian Campbell
2013-Dec-12 17:33 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 2013-12-12 at 14:23 +0000, Ian Campbell wrote:
> +#elif defined(__aarch64__)
> +    unsigned long start = (unsigned long)ptr;
> +    unsigned long end = start + nr;
> +    unsigned long p, ctr;
> +    int stride;
> +
> +    /* Flush cache using direct DC CVAC instructions. This is
> +     * available to EL0 when SCTLR_EL1.UCI is set, which Linux does.
> +     *
> +     * Bits 19:16 of CTR_EL0 are log2 of the minimum dcache line size
> +     * in words, which we use as our stride length. This is readable
> +     * when SCTLR_EL1.UCT is set, which Linux does.
> +     */
> +    asm volatile ("mrs %0, ctr_el0" : "=r" (ctr));
> +
> +    stride = 4 * (1 << ((ctr & 0xf0000UL) >> 16));
> +
> +    for ( p = start ; p < end ; p += stride )
> +        asm volatile ("dc cvac, %0" : : "r" (p));

I wonder if I need a dsb here. I suspect I do.

Ian.
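For illustration, a version of the loop with such a barrier might look
like the sketch below. This is only a sketch, not the committed code:
the function name is invented, and it assumes GCC-style inline assembly
on AArch64 (a "dsb ish" would suffice if all observers are in the inner
shareable domain).

#include <stddef.h>

/* Illustrative variant of the aarch64 flush loop with a completion
 * barrier: "dsb sy" waits until all issued cache maintenance
 * operations have completed, so the cleaned lines are guaranteed to
 * be visible before the (cache-disabled) guest is started. */
static void cache_flush_with_barrier(const void *ptr, size_t nr)
{
    unsigned long start = (unsigned long)ptr;
    unsigned long end = start + nr;
    unsigned long p, ctr;
    int stride;

    /* CTR_EL0 bits 19:16: log2 of the minimum dcache line size in words. */
    asm volatile ("mrs %0, ctr_el0" : "=r" (ctr));
    stride = 4 * (1 << ((ctr & 0xf0000UL) >> 16));

    for ( p = start ; p < end ; p += stride )
        asm volatile ("dc cvac, %0" : : "r" (p));

    /* Complete every preceding "dc cvac" before returning. */
    asm volatile ("dsb sy" : : : "memory");
}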
Ian Jackson
2013-Dec-12 17:52 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
Ian Campbell writes ("Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory"):
> Since this is a tool patch more than an ARM one I should have CCd you,
> sorry.

NP. I haven't really any comment on whether your cache flushes are
good or not, but two things occur to me:

 * Are there other situations where the toolstack (or device model)
   maps domain memory, which also need to be treated ?

 * This:

> > +static void minios_privcmd_cache_flush(xc_interface *xch,
> > +                                       const void *ptr, size_t nr)
> > +{
> > +#if defined(__i386__) || defined(__x86_64__)
> > +    /* No need for cache maintenance on x86 */
> > +#else
> > +    PERROR("No cache flush operation defined for architecture");
> > +#endif
> > +}

That appears to just print a warning message to a file no-one will
read. I think it should crash.

You may save some code by having a single unimplemented_cache_flush
function to put in all these structs.

Ian.
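A minimal sketch of the consolidation suggested above: the name
unimplemented_cache_flush follows the suggestion and the abort()
reflects the "it should crash" point, but neither is the committed
implementation.

#include <stdlib.h>

/* Hypothetical shared fallback which every non-Linux osdep table
 * could use for its .cache_flush member: a no-op on x86, a hard
 * failure elsewhere rather than silently continuing with possibly
 * stale data in guest memory. */
static void unimplemented_cache_flush(xc_interface *xch,
                                      const void *ptr, size_t nr)
{
#if defined(__i386__) || defined(__x86_64__)
    /* No need for cache maintenance on x86 */
#else
    PERROR("No cache flush operation defined for architecture");
    abort();
#endif
}

Each osdep table would then set ".cache_flush = &unimplemented_cache_flush,"
instead of carrying its own copy.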
Ian Campbell
2013-Dec-12 19:32 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Thu, 2013-12-12 at 17:52 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory"):
> > Since this is a tool patch more than an ARM one I should have CCd you,
> > sorry.
>
> NP. I haven't really any comment on whether your cache flushes are
> good or not, but two things occur to me:
>
>  * Are there other situations where the toolstack (or device model)
>    maps domain memory, which also need to be treated ?

These are all the ones I know of/could find in the domain building
case, which is the main time we access guest memory like this, and the
one which is problematic because the guest starts with its caches
turned off. This is where we have an actual problem in practice today.

The other one would be migration when we get there, but that doesn't
exist yet.

Other than that we would normally expect/require that guests enable
their caches and run with them on. We already require that hypercall
argument buffers are in a cacheable region because doing otherwise adds
a load of complexity to the common case on the hypervisor side.

There is no device model on ARM. If and when there is we would have to
think about what that means wrt caches, especially given that in this
case the guest knows it will have to do cache maintenance to do DMA
etc.

>  * This:
>
> > > +static void minios_privcmd_cache_flush(xc_interface *xch,
> > > +                                       const void *ptr, size_t nr)
> > > +{
> > > +#if defined(__i386__) || defined(__x86_64__)
> > > +    /* No need for cache maintenance on x86 */
> > > +#else
> > > +    PERROR("No cache flush operation defined for architecture");
> > > +#endif
> > > +}
>
> That appears to just print a warning message to a file no-one will
> read. I think it should crash.

Actually, for minios there is no PERROR defined at all so it won't
compile; I clearly forgot to build test stubdoms. The rest of
xc_minios.c just uses printf, so I will do the same. I find it hard to
believe that whoever is developing a minios based builder on ARM or
some new platform wouldn't be looking at the stubdom console. Unless
you feel strongly that I should stick an abort() in here (not sure what
minios will do with that...).

> You may save some code by having a single unimplemented_cache_flush
> function to put in all these structs.

Only one is built at a time, depending on the platform, so there is no
duplication in the binary. I'm not too worried about the source
duplication in this instance; I think it's good to have these platform
files be pretty standalone.

Ian.
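Based on the reply above, the minios stub would presumably end up
looking something like this sketch (using printf as the rest of
xc_minios.c does; the exact message wording is a guess):

#include <stdio.h>

static void minios_privcmd_cache_flush(xc_interface *xch,
                                       const void *ptr, size_t nr)
{
#if defined(__i386__) || defined(__x86_64__)
    /* No need for cache maintenance on x86 */
#else
    /* minios has no PERROR, so follow the rest of xc_minios.c and
     * report the problem via plain printf. */
    printf("ERROR: No cache flush operation defined for architecture\n");
#endif
}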
Ian Jackson
2013-Dec-12 19:43 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
Ian Campbell writes ("Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory"):> These are all the ones I know of/could find in the domain building case, > which is the main time we access guest memory like this, and the one > which is problematic because the guest starts with its caches turned > off. This is where we have an actual problem in practice today.I''m reassured, thanks.> > That appears to just print a warning message to a file no-one will > > read. I think it should crash. > > Actually, for minios there is no PERROR defined at all so it won''t > compile, I clearly forgot to build test stubdoms.Heh.> The rest of xc_minios.c just uses printf, so I will do the same. I find > it hard to believe that whoever is developing a minios based builder on > ARM or some new platform wouldn''t be looking at the stubdom console. > Unless you feel strongly that I should stick an abort() in here (not > sure what minios will do with that...).Well, I would prefer an abort() in all of these cases. (Not just the minios one.) It seems to me that it''s better for the code to crash than to carry on and do something which probably has undefined behaviour! Ian.
Julien Grall
2013-Dec-13 00:49 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On 12/12/2013 02:23 PM, Ian Campbell wrote:
> On ARM guest OSes are started with MMU and Caches disabled (as they are on
> native) however caching is enabled in the domain running the builder and
> therefore we must flush the cache as we load the blobs, otherwise when the
> guest starts running it may not see them. The dom0 build in the hypervisor has
> the same requirements and already does the right thing.
>
> The mechanism for performing a cache flush from userspace is OS specific, so
> implement this as a new osdep hook:
>
>  - On 32-bit ARM Linux provides a system call to flush the cache.
>  - On 64-bit ARM Linux the processor is configured to allow cache flushes
>    directly from userspace.
>  - Non-Linux platforms will need to provide their own implementation. If
>    similar mechanisms are not available then a new privcmd ioctl should be a
>    suitable alternative.
>
> No cache maintenance is required on x86, so provide a stub for all non-Linux
> platforms which returns success on x86 only and logs an error otherwise.
>
> This fixes guest building on Xgene which has a very large L3 cache and so is
> particularly susceptible to this problem. It has also been observed
> sporadically on midway.

This patch doesn't solve the issue on Midway.

The cacheflush syscall on ARM32 calls DCCMVAU (Data Cache Clean by MVA
to PoU), which is not enough. As I understand the ARM ARM B2.2.6 (page
B2-1275):
 - PoC means the data will be written to the RAM
 - PoU means that, within the same inner shareable domain, the
   instruction cache, data cache and translation table walks will see
   the same value for a specific MVA. It doesn't mean that the data
   will reach the RAM.

I did some tests and indeed DCCMVAC (Data Cache Clean by MVA to PoC)
resolves the problem on Midway (and generally on ARMv7).

Unfortunately Linux doesn't provide any syscall to call this function
for ARMv7 and it's not possible to call cache instructions from
userspace. What we could do is:
 - Use the "flags" parameter of the cacheflush syscall and call a
   function which does DCCMVAC (for instance __cpuc_flush_dcache_area)
 - Extend privcmd to have a flush cache ioctl

Both solutions would mean waiting for Linux 3.14 (I don't think we can
get a patch accepted for 3.13).

I have also tried to trap PoU cache instructions (via HCR.TPU). But
when Xen calls DCCIMVAC/DCCIMVAU, the processor will raise a data abort
fault.

Any thoughts?

-- 
Julien Grall
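The userspace side under discussion reduces to roughly the following
(a sketch modelled on the patch's 32-bit Linux osdep hook; the helper
name is illustrative, and the syscall number macro normally comes from
asm/unistd.h on ARM). The kernel services this syscall with DCCMVAU,
i.e. a clean to PoU only, which is exactly the limitation described
above:

    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* __ARM_NR_cacheflush is __ARM_NR_BASE + 2 in asm/unistd.h. */
    #ifndef __ARM_NR_cacheflush
    #define __ARM_NR_cacheflush 0x0f0002
    #endif

    /* Sketch: ask the kernel to flush [ptr, ptr+nr). Note the kernel
     * implements this with DCCMVAU (clean to PoU), which does not
     * guarantee the data reaches RAM. */
    static void arm32_cache_flush(const void *ptr, size_t nr)
    {
        unsigned long start = (unsigned long)ptr;
        unsigned long end = start + nr;

        syscall(__ARM_NR_cacheflush, start, end, 0 /* flags, must be 0 */);
    }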
Ian Campbell
2013-Dec-13 08:26 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Fri, 2013-12-13 at 00:49 +0000, Julien Grall wrote:
>
> On 12/12/2013 02:23 PM, Ian Campbell wrote:
> > On ARM guest OSes are started with MMU and Caches disabled (as they are on
> > native) however caching is enabled in the domain running the builder and
> > therefore we must flush the cache as we load the blobs, otherwise when the
> > guest starts running it may not see them. The dom0 build in the hypervisor has
> > the same requirements and already does the right thing.
> >
> > The mechanism for performing a cache flush from userspace is OS specific, so
> > implement this as a new osdep hook:
> >
> >  - On 32-bit ARM Linux provides a system call to flush the cache.
> >  - On 64-bit ARM Linux the processor is configured to allow cache flushes
> >    directly from userspace.
> >  - Non-Linux platforms will need to provide their own implementation. If
> >    similar mechanisms are not available then a new privcmd ioctl should be a
> >    suitable alternative.
> >
> > No cache maintenance is required on x86, so provide a stub for all non-Linux
> > platforms which returns success on x86 only and logs an error otherwise.
> >
> > This fixes guest building on Xgene which has a very large L3 cache and so is
> > particularly susceptible to this problem. It has also been observed
> > sporadically on midway.
>
> This patch doesn't solve the issue on Midway.

That's a shame. I think we should go ahead with this patch regardless,
since it does fix arm64 and introduces the infrastructure for arm32. I
think there is no harm in adding the syscall on arm32 for now.

> The cacheflush syscall on ARM32 calls DCCMVAU (Data Cache Clean by MVA
> to PoU), which is not enough. As I understand the ARM ARM B2.2.6 (page
> B2-1275):
>  - PoC means the data will be written to the RAM
>  - PoU means that, within the same inner shareable domain, the
>    instruction cache, data cache and translation table walks will see
>    the same value for a specific MVA. It doesn't mean that the data
>    will reach the RAM.

This is essentially my understanding as well.

> I did some tests and indeed DCCMVAC (Data Cache Clean by MVA to PoC)
> resolves the problem on Midway (and generally on ARMv7).

Good.

> Unfortunately Linux doesn't provide any syscall to call this function
> for ARMv7 and it's not possible to call cache instructions from
> userspace. What we could do is:
>  - Use the "flags" parameter of the cacheflush syscall and call a
>    function which does DCCMVAC (for instance __cpuc_flush_dcache_area)
>  - Extend privcmd to have a flush cache ioctl

Personally I think the first is nicer, but ultimately we need input
from l-a-k on this one and would be happy with either.

> Both solutions would mean waiting for Linux 3.14 (I don't think we can
> get a patch accepted for 3.13).

That's a shame, but it is what it is. We could perhaps tag it for a
stable backport.

> I have also tried to trap PoU cache instructions (via HCR.TPU). But
> when Xen calls DCCIMVAC/DCCIMVAU, the processor will raise a data
> abort fault.

Xen MVAs are different to guest MVAs, and the guest MVA is very likely
not to correspond to a mapping in the Xen address space, so this
approach can't/won't work, I think.

> Any thoughts?

I think we'll just have to wait for the Linux part of the fix to land.
Are you going to look into this?

Ian.
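For comparison, the arm64 path that does work today can issue the
maintenance instructions directly from EL0, since Linux sets
SCTLR_EL1.UCI (which permits DC CVAC from userspace). A sketch along
those lines, assumed rather than the committed hunk:

    #include <stddef.h>

    /* Sketch of a 64-bit ARM userspace clean-to-PoC loop. Assumes
     * SCTLR_EL1.UCI is set so EL0 may execute DC CVAC; the cache line
     * stride is derived from CTR_EL0. */
    static void arm64_cache_flush(const void *ptr, size_t nr)
    {
        unsigned long start = (unsigned long)ptr;
        unsigned long end = start + nr;
        unsigned long p, ctr;
        unsigned int stride;

        /* CTR_EL0[19:16] is log2(words) of the smallest D-cache line. */
        asm volatile ("mrs %0, ctr_el0" : "=r" (ctr));
        stride = 4 << ((ctr >> 16) & 0xf);

        for ( p = start & ~(unsigned long)(stride - 1); p < end; p += stride )
            asm volatile ("dc cvac, %0" :: "r" (p));   /* clean to PoC */

        asm volatile ("dsb sy" ::: "memory");          /* wait for completion */
    }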
Stefano Stabellini
2013-Dec-13 12:01 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Fri, 13 Dec 2013, Ian Campbell wrote:
> On Fri, 2013-12-13 at 00:49 +0000, Julien Grall wrote:
> >
> > On 12/12/2013 02:23 PM, Ian Campbell wrote:
> > > On ARM guest OSes are started with MMU and Caches disabled (as they are on
> > > native) however caching is enabled in the domain running the builder and
> > > therefore we must flush the cache as we load the blobs, otherwise when the
> > > guest starts running it may not see them. The dom0 build in the hypervisor has
> > > the same requirements and already does the right thing.
> > >
> > > The mechanism for performing a cache flush from userspace is OS specific, so
> > > implement this as a new osdep hook:
> > >
> > >  - On 32-bit ARM Linux provides a system call to flush the cache.
> > >  - On 64-bit ARM Linux the processor is configured to allow cache flushes
> > >    directly from userspace.
> > >  - Non-Linux platforms will need to provide their own implementation. If
> > >    similar mechanisms are not available then a new privcmd ioctl should be a
> > >    suitable alternative.
> > >
> > > No cache maintenance is required on x86, so provide a stub for all non-Linux
> > > platforms which returns success on x86 only and logs an error otherwise.
> > >
> > > This fixes guest building on Xgene which has a very large L3 cache and so is
> > > particularly susceptible to this problem. It has also been observed
> > > sporadically on midway.
> >
> > This patch doesn't solve the issue on Midway.
>
> That's a shame. I think we should go ahead with this patch regardless,
> since it does fix arm64 and introduces the infrastructure for arm32. I
> think there is no harm in adding the syscall on arm32 for now.

I agree.
I wonder if QEMU (qdisk) is going to need similar cache flushes.

> > The cacheflush syscall on ARM32 calls DCCMVAU (Data Cache Clean by MVA
> > to PoU), which is not enough. As I understand the ARM ARM B2.2.6 (page
> > B2-1275):
> >  - PoC means the data will be written to the RAM
> >  - PoU means that, within the same inner shareable domain, the
> >    instruction cache, data cache and translation table walks will see
> >    the same value for a specific MVA. It doesn't mean that the data
> >    will reach the RAM.
>
> This is essentially my understanding as well.
>
> > I did some tests and indeed DCCMVAC (Data Cache Clean by MVA to PoC)
> > resolves the problem on Midway (and generally on ARMv7).
>
> Good.
>
> > Unfortunately Linux doesn't provide any syscall to call this function
> > for ARMv7 and it's not possible to call cache instructions from
> > userspace. What we could do is:
> >  - Use the "flags" parameter of the cacheflush syscall and call a
> >    function which does DCCMVAC (for instance __cpuc_flush_dcache_area)
> >  - Extend privcmd to have a flush cache ioctl
>
> Personally I think the first is nicer, but ultimately we need input
> from l-a-k on this one and would be happy with either.

I agree. Can you try to come up with such a patch?
Ian Campbell
2013-Dec-13 12:53 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On Fri, 2013-12-13 at 12:01 +0000, Stefano Stabellini wrote:
> On Fri, 13 Dec 2013, Ian Campbell wrote:
> > On Fri, 2013-12-13 at 00:49 +0000, Julien Grall wrote:
> > >
> > > On 12/12/2013 02:23 PM, Ian Campbell wrote:
> > > > On ARM guest OSes are started with MMU and Caches disabled (as they are on
> > > > native) however caching is enabled in the domain running the builder and
> > > > therefore we must flush the cache as we load the blobs, otherwise when the
> > > > guest starts running it may not see them. The dom0 build in the hypervisor has
> > > > the same requirements and already does the right thing.
> > > >
> > > > The mechanism for performing a cache flush from userspace is OS specific, so
> > > > implement this as a new osdep hook:
> > > >
> > > >  - On 32-bit ARM Linux provides a system call to flush the cache.
> > > >  - On 64-bit ARM Linux the processor is configured to allow cache flushes
> > > >    directly from userspace.
> > > >  - Non-Linux platforms will need to provide their own implementation. If
> > > >    similar mechanisms are not available then a new privcmd ioctl should be a
> > > >    suitable alternative.
> > > >
> > > > No cache maintenance is required on x86, so provide a stub for all non-Linux
> > > > platforms which returns success on x86 only and logs an error otherwise.
> > > >
> > > > This fixes guest building on Xgene which has a very large L3 cache and so is
> > > > particularly susceptible to this problem. It has also been observed
> > > > sporadically on midway.
> > >
> > > This patch doesn't solve the issue on Midway.
> >
> > That's a shame. I think we should go ahead with this patch regardless,
> > since it does fix arm64 and introduces the infrastructure for arm32. I
> > think there is no harm in adding the syscall on arm32 for now.
>
> I agree.
> I wonder if QEMU (qdisk) is going to need similar cache flushes.

I think for the PV driver case we are entitled to require that the
rings and the memory under I/O be held in cacheable RAM. The
alternative is that both the frontend and backend have to do cache
maintenance operations, which seems like a bit of a waste of everyone's
time when we know everything is RAM based rather than real DMA.

Obviously for a qemu-dm style emulation we would have to do something,
but we don't support that today.

> > > The cacheflush syscall on ARM32 calls DCCMVAU (Data Cache Clean by MVA
> > > to PoU), which is not enough. As I understand the ARM ARM B2.2.6 (page
> > > B2-1275):
> > >  - PoC means the data will be written to the RAM
> > >  - PoU means that, within the same inner shareable domain, the
> > >    instruction cache, data cache and translation table walks will see
> > >    the same value for a specific MVA. It doesn't mean that the data
> > >    will reach the RAM.
> >
> > This is essentially my understanding as well.
> >
> > > I did some tests and indeed DCCMVAC (Data Cache Clean by MVA to PoC)
> > > resolves the problem on Midway (and generally on ARMv7).
> >
> > Good.
> >
> > > Unfortunately Linux doesn't provide any syscall to call this function
> > > for ARMv7 and it's not possible to call cache instructions from
> > > userspace. What we could do is:
> > >  - Use the "flags" parameter of the cacheflush syscall and call a
> > >    function which does DCCMVAC (for instance __cpuc_flush_dcache_area)
> > >  - Extend privcmd to have a flush cache ioctl
> >
> > Personally I think the first is nicer, but ultimately we need input
> > from l-a-k on this one and would be happy with either.
>
> I agree.
> Can you try to come up with such a patch?

I think Julien was going to investigate, but if he says not I'll take a
stab at it.

Ian.
Julien Grall
2013-Dec-16 00:49 UTC
Re: [PATCH v2] tools: libxc: flush data cache after loading images into guest memory
On 12/13/2013 12:53 PM, Ian Campbell wrote:
>>>
>>>> Unfortunately Linux doesn't provide any syscall to call this function
>>>> for ARMv7 and it's not possible to call cache instructions from
>>>> userspace. What we could do is:
>>>>  - Use the "flags" parameter of the cacheflush syscall and call a
>>>>    function which does DCCMVAC (for instance __cpuc_flush_dcache_area)
>>>>  - Extend privcmd to have a flush cache ioctl
>>>
>>> Personally I think the first is nicer, but ultimately we need input
>>> from l-a-k on this one and would be happy with either.
>>
>> I agree. Can you try to come up with such a patch?

There are 2 functions in kernel space to flush the cache to PoC:
 - flush_kern_dcache_area
 - dma_flush_range

I'm not sure if either of these is suitable to expose to user space.

> I think Julien was going to investigate, but if he says not I'll take a
> stab at it.

I will try to do so and send a patch tomorrow.

-- 
Julien Grall
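A rough sketch of option 1, extending the ARM cacheflush syscall's
do_cache_op() helper (arch/arm/kernel/traps.c) to honour a new flag.
The flag name and value are invented here for illustration, the
user-address validation the real syscall performs is omitted, and any
real patch would need linux-arm-kernel review:

    /* Hypothetical flag value; the real name/number would be decided
     * on linux-arm-kernel. */
    #define CACHEFLUSH_CLEAN_TO_POC  (1 << 0)

    static inline void do_cache_op(unsigned long start, unsigned long end,
                                   int flags)
    {
        if (end < start)
            return;

        if (flags & CACHEFLUSH_CLEAN_TO_POC)
            /* Clean+invalidate to the Point of Coherency (DCCMVAC-based)
             * so a CPU running with caches disabled sees the data. */
            __cpuc_flush_dcache_area((void *)start, end - start);
        else
            /* Existing behaviour: clean to the Point of Unification. */
            flush_cache_user_range(start, end);
    }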