Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 00/10] XSA-{25, 27, 33, 36}: Backports for 4.0 (for Debian update)
I had need of these for a Debian squeeze update (the others are all in the tree already) so I thought I would share. These are based on the 4.1 backports already in tree. If anyone wants to check my working (or just compare notes with their own backports) I'd appreciate it.

The only one I'm a bit uncertain about is XSA-36, not because the backport was hard but because I'm not sure what, if anything, may be needed as an antecedent to those patches. I have already included a backport of "AMD IOMMU: Fix an interrupt remapping issue" (unstable 23200:995a0c01a076). I'm also intending to wait for the "AMD IOMMU: XSA-36 follow ups" patches to go into mainline before actually sending these anywhere.

NB I also included "libxc: Do not use dom0 physmem as parameter to lzma decoder" as a precursor to "libxc: builder: limit maximum size of kernel/ramdisk." since it simplified the backport but also had a vague whiff of being useful too.

Ian.
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 01/10] libxc: Do not use dom0 physmem as parameter to lzma decoder
From: Ian Jackson <Ian.Jackson@eu.citrix.com>

It's not clear why a userspace lzma decode would want to use that
particular value, what bearing it has on anything or why it would
assume it could use 1/3 of the total RAM in the system (potentially
quite a large amount of RAM) as opposed to any other limit number.

Instead, hardcode 32Mby.

This reverts 22830:c80960244942, removes the xc_get_physmem/physmem
function entirely, and replaces the expression at the call site with a
fixed constant.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 tools/libxc/xc_dom_bzimageloader.c |   23 +----------------------
 1 files changed, 1 insertions(+), 22 deletions(-)

diff --git a/tools/libxc/xc_dom_bzimageloader.c b/tools/libxc/xc_dom_bzimageloader.c
index 398dbff..aa5abc7 100644
--- a/tools/libxc/xc_dom_bzimageloader.c
+++ b/tools/libxc/xc_dom_bzimageloader.c
@@ -149,27 +149,6 @@ static int xc_try_bzip2_decode(

 #include <lzma.h>

-static uint64_t physmem(void)
-{
-    uint64_t ret = 0;
-    const long pagesize = sysconf(_SC_PAGESIZE);
-    const long pages = sysconf(_SC_PHYS_PAGES);
-
-    if ( (pagesize != -1) || (pages != -1) )
-    {
-        /*
-         * According to docs, pagesize * pages can overflow.
-         * Simple case is 32-bit box with 4 GiB or more RAM,
-         * which may report exactly 4 GiB of RAM, and "long"
-         * being 32-bit will overflow. Casting to uint64_t
-         * hopefully avoids overflows in the near future.
-         */
-        ret = (uint64_t)(pagesize) * (uint64_t)(pages);
-    }
-
-    return ret;
-}
-
 static int xc_try_lzma_decode(
     struct xc_dom_image *dom, void **blob, size_t *size)
 {
@@ -182,7 +161,7 @@ static int xc_try_lzma_decode(
     int outsize;
     const char *msg;

-    ret = lzma_alone_decoder(&stream, physmem() / 3);
+    ret = lzma_alone_decoder(&stream, 32*1024*1024);
     if ( ret != LZMA_OK )
     {
         xc_dom_printf("LZMA: Failed to init stream decoder\n");
--
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 02/10] libxc: builder: limit maximum size of kernel/ramdisk.
From: Ian Jackson <Ian.Jackson@eu.citrix.com> Allowing user supplied kernels of arbitrary sizes, especially during decompression, can swallow up dom0 memory leading to either virtual address space exhaustion in the builder process or allocation failures/OOM killing of both toolstack and unrelated processes. We disable these checks when building in a stub domain for pvgrub since this uses the guest''s own memory and is isolated. Decompression of gzip compressed kernels and ramdisks has been safe since 14954:58205257517d (Xen 3.1.0 onwards). This is XSA-25 / CVE-2012-4544. Also make explicit checks for buffer overflows in various decompression routines. These were already ruled out due to other properties of the code but check them as a belt-and-braces measure. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> [ Includes 25589:60f09d1ab1fe for CVE-2012-2625 ] --- stubdom/grub/kexec.c | 4 ++ tools/libxc/xc_dom.h | 23 +++++++++- tools/libxc/xc_dom_bzimageloader.c | 44 ++++++++++++++++-- tools/libxc/xc_dom_core.c | 78 +++++++++++++++++++++++++++++-- tools/pygrub/src/pygrub | 55 +++++++++++++++++---- tools/python/xen/xm/messages/xen-xm.pot | 3 +- 6 files changed, 186 insertions(+), 21 deletions(-) diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c index 821c71d..f3f1b7b 100644 --- a/stubdom/grub/kexec.c +++ b/stubdom/grub/kexec.c @@ -137,6 +137,10 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char dom = xc_dom_allocate(cmdline, features); dom->allocate = kexec_allocate; + /* We are using guest owned memory, therefore no limits. 
*/ + xc_dom_kernel_max_size(dom, 0); + xc_dom_ramdisk_max_size(dom, 0); + dom->kernel_blob = kernel; dom->kernel_size = kernel_size; diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h index 58d3f49..71ffa1e 100644 --- a/tools/libxc/xc_dom.h +++ b/tools/libxc/xc_dom.h @@ -36,6 +36,9 @@ struct xc_dom_image { void *ramdisk_blob; size_t ramdisk_size; + size_t max_kernel_size; + size_t max_ramdisk_size; + /* arguments and parameters */ char *cmdline; uint32_t f_requested[XENFEAT_NR_SUBMAPS]; @@ -158,6 +161,23 @@ void xc_dom_release_phys(struct xc_dom_image *dom); void xc_dom_release(struct xc_dom_image *dom); int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb); +/* Set this larger if you have enormous ramdisks/kernels. Note that + * you should trust all kernels not to be maliciously large (e.g. to + * exhaust all dom0 memory) if you do this (see CVE-2012-4544 / + * XSA-25). You can also set the default independently for + * ramdisks/kernels in xc_dom_allocate() or call + * xc_dom_{kernel,ramdisk}_max_size. 
+ */ +#ifndef XC_DOM_DECOMPRESS_MAX +#define XC_DOM_DECOMPRESS_MAX (1024*1024*1024) /* 1GB */ +#endif + +int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz); +int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz); + +int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz); +int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz); + size_t xc_dom_check_gzip(void *blob, size_t ziplen); int xc_dom_do_gunzip(void *src, size_t srclen, void *dst, size_t dstlen); int xc_dom_try_gunzip(struct xc_dom_image *dom, void **blob, size_t * size); @@ -202,7 +222,8 @@ void xc_dom_log_memory_footprint(struct xc_dom_image *dom); void *xc_dom_malloc(struct xc_dom_image *dom, size_t size); void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size); void *xc_dom_malloc_filemap(struct xc_dom_image *dom, - const char *filename, size_t * size); + const char *filename, size_t * size, + const size_t max_size); char *xc_dom_strdup(struct xc_dom_image *dom, const char *str); /* --- alloc memory pool ------------------------------------------- */ diff --git a/tools/libxc/xc_dom_bzimageloader.c b/tools/libxc/xc_dom_bzimageloader.c index aa5abc7..3aaf827 100644 --- a/tools/libxc/xc_dom_bzimageloader.c +++ b/tools/libxc/xc_dom_bzimageloader.c @@ -33,13 +33,19 @@ static int xc_try_bzip2_decode( char *out_buf; char *tmp_buf; int retval = -1; - int outsize; + unsigned int outsize; uint64_t total; stream.bzalloc = NULL; stream.bzfree = NULL; stream.opaque = NULL; + if ( dom->kernel_size == 0) + { + xc_dom_printf("BZIP2: Input is 0 size"); + return -1; + } + ret = BZ2_bzDecompressInit(&stream, 0, 0); if ( ret != BZ_OK ) { @@ -52,6 +58,17 @@ static int xc_try_bzip2_decode( * the input buffer to start, and we''ll realloc as needed. */ outsize = dom->kernel_size; + + /* + * stream.avail_in and outsize are unsigned int, while kernel_size + * is a size_t. Check we aren''t overflowing. 
+ */ + if ( outsize != dom->kernel_size ) + { + xc_dom_printf("BZIP2: Input too large"); + goto bzip2_cleanup; + } + out_buf = malloc(outsize); if ( out_buf == NULL ) { @@ -84,13 +101,20 @@ static int xc_try_bzip2_decode( if ( stream.avail_out == 0 ) { /* Protect against output buffer overflow */ - if ( outsize > INT_MAX / 2 ) + if ( outsize > UINT_MAX / 2 ) { xc_dom_printf("BZIP2: output buffer overflow\n"); free(out_buf); goto bzip2_cleanup; } + if ( xc_dom_kernel_check_size(dom, outsize * 2) ) + { + xc_dom_printf("BZIP2: output too large"); + free(out_buf); + goto bzip2_cleanup; + } + tmp_buf = realloc(out_buf, outsize * 2); if ( tmp_buf == NULL ) { @@ -158,10 +182,15 @@ static int xc_try_lzma_decode( unsigned char *out_buf; unsigned char *tmp_buf; int retval = -1; - int outsize; + size_t outsize; const char *msg; ret = lzma_alone_decoder(&stream, 32*1024*1024); + if ( dom->kernel_size == 0) + { + xc_dom_printf("LZMA: Input is 0 size"); + return -1; + } if ( ret != LZMA_OK ) { xc_dom_printf("LZMA: Failed to init stream decoder\n"); @@ -237,13 +266,20 @@ static int xc_try_lzma_decode( if ( stream.avail_out == 0 ) { /* Protect against output buffer overflow */ - if ( outsize > INT_MAX / 2 ) + if ( outsize > SIZE_MAX / 2 ) { xc_dom_printf("LZMA: output buffer overflow\n"); free(out_buf); goto lzma_cleanup; } + if ( xc_dom_kernel_check_size(dom, outsize * 2) ) + { + xc_dom_printf("LZMA: output too large"); + free(out_buf); + goto lzma_cleanup; + } + tmp_buf = realloc(out_buf, outsize * 2); if ( tmp_buf == NULL ) { diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index df8e83b..6770302 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -139,7 +139,8 @@ void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size) } void *xc_dom_malloc_filemap(struct xc_dom_image *dom, - const char *filename, size_t * size) + const char *filename, size_t * size, + const size_t max_size) { struct xc_dom_mem *block = NULL; int fd = -1; 
@@ -151,6 +152,13 @@ void *xc_dom_malloc_filemap(struct xc_dom_image *dom, lseek(fd, 0, SEEK_SET); *size = lseek(fd, 0, SEEK_END); + if ( max_size && *size > max_size ) + { + xc_dom_panic(XC_OUT_OF_MEMORY, + "tried to map file which is too large"); + goto err; + } + block = malloc(sizeof(*block)); if ( block == NULL ) goto err; @@ -202,6 +210,40 @@ char *xc_dom_strdup(struct xc_dom_image *dom, const char *str) } /* ------------------------------------------------------------------------ */ +/* decompression buffer sizing */ +int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz) +{ + /* No limit */ + if ( !dom->max_kernel_size ) + return 0; + + if ( sz > dom->max_kernel_size ) + { + xc_dom_panic(XC_INVALID_KERNEL, + "kernel image too large"); + return 1; + } + + return 0; +} + +int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz) +{ + /* No limit */ + if ( !dom->max_ramdisk_size ) + return 0; + + if ( sz > dom->max_ramdisk_size ) + { + xc_dom_panic(XC_INVALID_KERNEL, + "ramdisk image too large"); + return 1; + } + + return 0; +} + +/* ------------------------------------------------------------------------ */ /* read files, copy memory blocks, with transparent gunzip */ size_t xc_dom_check_gzip(void *blob, size_t ziplen) @@ -215,7 +257,7 @@ size_t xc_dom_check_gzip(void *blob, size_t ziplen) gzlen = blob + ziplen - 4; unziplen = gzlen[3] << 24 | gzlen[2] << 16 | gzlen[1] << 8 | gzlen[0]; - if ( (unziplen < 0) || (unziplen > (1024*1024*1024)) ) /* 1GB limit */ + if ( (unziplen < 0) || (unziplen > XC_DOM_DECOMPRESS_MAX) ) { xc_dom_printf ("%s: size (zip %zd, unzip %zd) looks insane, skip gunzip\n", @@ -266,6 +308,9 @@ int xc_dom_try_gunzip(struct xc_dom_image *dom, void **blob, size_t * size) if ( unziplen == 0 ) return 0; + if ( xc_dom_kernel_check_size(dom, unziplen) ) + return 0; + unzip = xc_dom_malloc(dom, unziplen); if ( unzip == NULL ) return -1; @@ -562,6 +607,10 @@ struct xc_dom_image *xc_dom_allocate(const char *cmdline, const char 
*features) goto err; memset(dom, 0, sizeof(*dom)); + + dom->max_kernel_size = XC_DOM_DECOMPRESS_MAX; + dom->max_ramdisk_size = XC_DOM_DECOMPRESS_MAX; + if ( cmdline ) dom->cmdline = xc_dom_strdup(dom, cmdline); if ( features ) @@ -582,10 +631,25 @@ struct xc_dom_image *xc_dom_allocate(const char *cmdline, const char *features) return NULL; } +int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz) +{ + xc_dom_printf("%s: kernel_max_size=%zx", __FUNCTION__, sz); + dom->max_kernel_size = sz; + return 0; +} + +int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz) +{ + xc_dom_printf("%s: ramdisk_max_size=%zx", __FUNCTION__, sz); + dom->max_ramdisk_size = sz; + return 0; +} + int xc_dom_kernel_file(struct xc_dom_image *dom, const char *filename) { xc_dom_printf("%s: filename=\"%s\"\n", __FUNCTION__, filename); - dom->kernel_blob = xc_dom_malloc_filemap(dom, filename, &dom->kernel_size); + dom->kernel_blob = xc_dom_malloc_filemap(dom, filename, &dom->kernel_size, + dom->max_kernel_size); if ( dom->kernel_blob == NULL ) return -1; return xc_dom_try_gunzip(dom, &dom->kernel_blob, &dom->kernel_size); @@ -595,7 +659,9 @@ int xc_dom_ramdisk_file(struct xc_dom_image *dom, const char *filename) { xc_dom_printf("%s: filename=\"%s\"\n", __FUNCTION__, filename); dom->ramdisk_blob - xc_dom_malloc_filemap(dom, filename, &dom->ramdisk_size); + xc_dom_malloc_filemap(dom, filename, &dom->ramdisk_size, + dom->max_ramdisk_size); + if ( dom->ramdisk_blob == NULL ) return -1; // return xc_dom_try_gunzip(dom, &dom->ramdisk_blob, &dom->ramdisk_size); @@ -755,7 +821,11 @@ int xc_dom_build_image(struct xc_dom_image *dom) void *ramdiskmap; unziplen = xc_dom_check_gzip(dom->ramdisk_blob, dom->ramdisk_size); + if ( xc_dom_ramdisk_check_size(dom, unziplen) != 0 ) + unziplen = 0; + ramdisklen = unziplen ? 
unziplen : dom->ramdisk_size; + if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk", 0, ramdisklen) != 0 ) goto err; diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub index e52df7b..93b0cac 100644 --- a/tools/pygrub/src/pygrub +++ b/tools/pygrub/src/pygrub @@ -28,6 +28,7 @@ import grub.LiloConf import grub.ExtLinuxConf PYGRUB_VER = 0.6 +FS_READ_MAX = 1024 * 1024 def enable_cursor(ison): if ison: @@ -407,7 +408,8 @@ class Grub: if self.__dict__.get(''cf'', None) is None: raise RuntimeError, "couldn''t find bootloader config file in the image provided." f = fs.open_file(self.cf.filename) - buf = f.read() + # limit read size to avoid pathological cases + buf = f.read(FS_READ_MAX) del f self.cf.parse(buf) @@ -633,6 +635,37 @@ if __name__ == "__main__": def usage(): print >> sys.stderr, "Usage: %s [-q|--quiet] [-i|--interactive] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] <image>" %(sys.argv[0],) + def copy_from_image(fs, file_to_read, file_type, output_directory, + not_really): + if not_really: + if fs.file_exists(file_to_read): + return "<%s:%s>" % (file_type, file_to_read) + else: + sys.exit("The requested %s file does not exist" % file_type) + try: + datafile = fs.open_file(file_to_read) + except Exception, e: + print >>sys.stderr, e + sys.exit("Error opening %s in guest" % file_to_read) + (tfd, ret) = tempfile.mkstemp(prefix="boot_"+file_type+".", + dir=output_directory) + dataoff = 0 + while True: + data = datafile.read(FS_READ_MAX, dataoff) + if len(data) == 0: + os.close(tfd) + del datafile + return ret + try: + os.write(tfd, data) + except Exception, e: + print >>sys.stderr, e + os.close(tfd) + os.unlink(ret) + del datafile + sys.exit("Error writing temporary copy of "+file_type) + dataoff += len(data) + try: opts, args = getopt.gnu_getopt(sys.argv[1:], ''qih::'', ["quiet", "interactive", "help", "output=", @@ -712,18 +745,18 @@ if __name__ == "__main__": if not chosencfg["kernel"]: chosencfg = run_grub(file, entry, fs, 
incfg["args"]) - data = fs.open_file(chosencfg["kernel"]).read() - (tfd, bootcfg["kernel"]) = tempfile.mkstemp(prefix="boot_kernel.", - dir="/var/run/xend/boot") - os.write(tfd, data) - os.close(tfd) + bootcfg["kernel"] = copy_from_image(fs, chosencfg["kernel"], "kernel", + output_directory, not_really) if chosencfg["ramdisk"]: - data = fs.open_file(chosencfg["ramdisk"],).read() - (tfd, bootcfg["ramdisk"]) = tempfile.mkstemp(prefix="boot_ramdisk.", - dir="/var/run/xend/boot") - os.write(tfd, data) - os.close(tfd) + try: + bootcfg["ramdisk"] = copy_from_image(fs, chosencfg["ramdisk"], + "ramdisk", output_directory, + not_really) + except: + if not not_really: + os.unlink(bootcfg["kernel"]) + raise else: initrd = None diff --git a/tools/python/xen/xm/messages/xen-xm.pot b/tools/python/xen/xm/messages/xen-xm.pot index a600a69..3d381f1 100644 --- a/tools/python/xen/xm/messages/xen-xm.pot +++ b/tools/python/xen/xm/messages/xen-xm.pot @@ -8,10 +8,11 @@ msgid "" msgstr "" "Project-Id-Version: PACKAGE VERSION\n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2008-03-31 17:40+0100\n" +"POT-Creation-Date: 2013-02-07 10:25+0000\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n" "Language-Team: LANGUAGE <LL@li.org>\n" +"Language: \n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=CHARSET\n" "Content-Transfer-Encoding: 8bit\n" -- 1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 03/10] hvm: Limit the size of large HVM op batches
From: Tim Deegan <tim@xen.org> Doing large p2m updates for HVMOP_track_dirty_vram without preemption ties up the physical processor. Integrating preemption into the p2m updates is hard so simply limit to 1GB which is sufficient for a 15000 * 15000 * 32bpp framebuffer. For HVMOP_modified_memory and HVMOP_set_mem_type preemptible add the necessary machinery to handle preemption. This is CVE-2012-5511 / XSA-27. Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com> x86/paging: Don''t allocate user-controlled amounts of stack memory. This is XSA-27 / CVE-2012-5511. Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> v2: Provide definition of GB to fix x86-32 compile. Signed-off-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> --- xen/arch/x86/hvm/hvm.c | 37 +++++++++++++++++++++++++++++++++---- xen/arch/x86/mm/paging.c | 17 +++++++++++------ xen/include/asm-x86/config.h | 4 +++- 3 files changed, 47 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 640969b..1ab0ec6 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2943,6 +2943,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) if ( !is_hvm_domain(d) ) goto param_fail2; + if ( a.nr > GB(1) >> PAGE_SHIFT ) + goto param_fail2; + rc = xsm_hvm_param(d, op); if ( rc ) goto param_fail2; @@ -2969,7 +2972,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) { struct xen_hvm_modified_memory a; struct domain *d; - unsigned long pfn; if ( copy_from_guest(&a, arg, 1) ) return -EFAULT; @@ -2996,8 +2998,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) if ( !paging_mode_log_dirty(d) ) goto param_fail3; - for ( pfn = a.first_pfn; pfn < a.first_pfn + a.nr; pfn++ ) + while ( a.nr > 0 ) { + unsigned long pfn = a.first_pfn; 
p2m_type_t t; mfn_t mfn = gfn_to_mfn(d, pfn, &t); if ( p2m_is_paging(t) ) @@ -3018,6 +3021,19 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) /* don''t take a long time and don''t die either */ sh_remove_shadows(d->vcpu[0], mfn, 1, 0); } + + a.first_pfn++; + a.nr--; + + /* Check for continuation if it''s not the last interation */ + if ( a.nr > 0 && hypercall_preempt_check() ) + { + if ( copy_to_guest(arg, &a, 1) ) + rc = -EFAULT; + else + rc = -EAGAIN; + break; + } } param_fail3: @@ -3029,7 +3045,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) { struct xen_hvm_set_mem_type a; struct domain *d; - unsigned long pfn; /* Interface types to internal p2m types */ p2m_type_t memtype[] = { @@ -3062,8 +3077,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) if ( a.hvmmem_type >= ARRAY_SIZE(memtype) ) goto param_fail4; - for ( pfn = a.first_pfn; pfn < a.first_pfn + a.nr; pfn++ ) + while ( a.nr > 0 ) { + unsigned long pfn = a.first_pfn; p2m_type_t t; p2m_type_t nt; mfn_t mfn; @@ -3099,6 +3115,19 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) goto param_fail4; } } + + a.first_pfn++; + a.nr--; + + /* Check for continuation if it''s not the last interation */ + if ( a.nr > 0 && hypercall_preempt_check() ) + { + if ( copy_to_guest(arg, &a, 1) ) + rc = -EFAULT; + else + rc = -EAGAIN; + goto param_fail4; + } } rc = 0; diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c index b11577a..bba747e 100644 --- a/xen/arch/x86/mm/paging.c +++ b/xen/arch/x86/mm/paging.c @@ -486,13 +486,18 @@ int paging_log_dirty_range(struct domain *d, if ( !d->arch.paging.log_dirty.fault_count && !d->arch.paging.log_dirty.dirty_count ) { - int size = (nr + BITS_PER_LONG - 1) / BITS_PER_LONG; - unsigned long zeroes[size]; - memset(zeroes, 0x00, size * BYTES_PER_LONG); + static uint8_t zeroes[PAGE_SIZE]; + int off, size; + + size = ((nr + BITS_PER_LONG - 1) / BITS_PER_LONG) * sizeof (long); rv = 0; - if ( 
copy_to_guest_offset(dirty_bitmap, 0, (uint8_t *) zeroes, - size * BYTES_PER_LONG) != 0 ) - rv = -EFAULT; + for ( off = 0; !rv && off < size; off += sizeof zeroes ) + { + int todo = min(size - off, (int) PAGE_SIZE); + if ( copy_to_guest_offset(dirty_bitmap, off, zeroes, todo) ) + rv = -EFAULT; + off += todo; + } goto out; } d->arch.paging.log_dirty.fault_count = 0; diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h index c9e6057..462df51 100644 --- a/xen/include/asm-x86/config.h +++ b/xen/include/asm-x86/config.h @@ -108,6 +108,9 @@ extern unsigned int trampoline_xen_phys_start; extern unsigned char trampoline_cpu_started; extern char wakeup_start[]; extern unsigned int video_mode, video_flags; + +#define GB(_gb) (_gb ## UL << 30) + #endif #define asmlinkage @@ -123,7 +126,6 @@ extern unsigned int video_mode, video_flags; #define PML4_ADDR(_slot) \ ((((_slot ## UL) >> 8) * 0xffff000000000000UL) | \ (_slot ## UL << PML4_ENTRY_BITS)) -#define GB(_gb) (_gb ## UL << 30) #else #define PML4_ENTRY_BYTES (1 << PML4_ENTRY_BITS) #define PML4_ADDR(_slot) \ -- 1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 04/10] x86/mm: Fix loop increment in paging_log_dirty_range()
From: Tim Deegan <tim@xen.org>

In 23417:53ef1f35a0f8 (the fix for XSA-27 / CVE-2012-5511), the loop
variable gets incremented twice, so the loop only clears every second
page of the bitmap. This might cause the tools to think that pages
are dirty when they are not.

Reported-by: Steven Noonan <snoonan@amazon.com>
Reported-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm/paging.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index bba747e..0caebe0 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -491,7 +491,8 @@ int paging_log_dirty_range(struct domain *d,
         size = ((nr + BITS_PER_LONG - 1) / BITS_PER_LONG) * sizeof (long);

         rv = 0;
-        for ( off = 0; !rv && off < size; off += sizeof zeroes )
+        off = 0;
+        while ( !rv && off < size )
         {
             int todo = min(size - off, (int) PAGE_SIZE);
             if ( copy_to_guest_offset(dirty_bitmap, off, zeroes, todo) )
--
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 05/10] VT-d: fix interrupt remapping source validation for devices behind legacy bridges
From: Jan Beulich <jbeulich@suse.com>

Using SVT_VERIFY_BUS here doesn't make sense; native Linux also uses
SVT_VERIFY_SID_SQ here instead.

This is XSA-33 / CVE-2012-5634.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26340:19fd1237ff0d
xen-unstable date: Wed Jan 9 16:13:26 UTC 2013
---
 xen/drivers/passthrough/vtd/intremap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 006fdd9..663c1d0 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -502,7 +502,7 @@ static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
             set_ire_sid(ire, SVT_VERIFY_BUS, SQ_ALL_16,
                         (bus << 8) | pdev->bus);
         else if ( pdev_type(bus, devfn) == DEV_TYPE_LEGACY_PCI_BRIDGE )
-            set_ire_sid(ire, SVT_VERIFY_BUS, SQ_ALL_16,
+            set_ire_sid(ire, SVT_VERIFY_SID_SQ, SQ_ALL_16,
                         PCI_BDF2(bus, devfn));
     }
     break;
--
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 06/10] AMD IOMMU: Fix an interrupt remapping issue
From: Wei Wang <wei.wang2@amd.com> Some device could generate bogus interrupts if an IO-APIC RTE and an iommu interrupt remapping entry are not consistent during 2 adjacent 64bits IO-APIC RTE updates. For example, if the 2nd operation updates destination bits in RTE for SATA device and unmask it, in some case, SATA device will assert ioapic pin to generate interrupt immediately using new destination but iommu could still translate it into the old destination, then dom0 would be confused. To fix that, we sync up interrupt remapping entry with IO-APIC IRE on every 32 bits operation and forward IOAPIC RTE updates after interrupt. Signed-off-by: Wei Wang <wei.wang2@amd.com> Acked-by: Jan Beulich <jbeulich@novell.com> xen-unstable changeset: 23200:995a0c01a076 xen-unstable date: Tue Apr 12 13:26:19 2011 +0100 --- xen/drivers/passthrough/amd/iommu_intr.c | 104 +++++++++++++++++------------ 1 files changed, 61 insertions(+), 43 deletions(-) diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c index 8c284d3..ceb6f47 100644 --- a/xen/drivers/passthrough/amd/iommu_intr.c +++ b/xen/drivers/passthrough/amd/iommu_intr.c @@ -116,8 +116,7 @@ void invalidate_interrupt_table(struct amd_iommu *iommu, u16 device_id) static void update_intremap_entry_from_ioapic( int bdf, struct amd_iommu *iommu, - struct IO_APIC_route_entry *ioapic_rte, - unsigned int rte_upper, unsigned int value) + struct IO_APIC_route_entry *ioapic_rte) { unsigned long flags; u32* entry; @@ -129,28 +128,26 @@ static void update_intremap_entry_from_ioapic( req_id = get_intremap_requestor_id(bdf); lock = get_intremap_lock(req_id); - /* only remap interrupt vector when lower 32 bits in ioapic ire changed */ - if ( likely(!rte_upper) ) - { - delivery_mode = rte->delivery_mode; - vector = rte->vector; - dest_mode = rte->dest_mode; - dest = rte->dest.logical.logical_dest; - spin_lock_irqsave(lock, flags); - offset = get_intremap_offset(vector, delivery_mode); - entry = 
(u32*)get_intremap_entry(req_id, offset); + delivery_mode = rte->delivery_mode; + vector = rte->vector; + dest_mode = rte->dest_mode; + dest = rte->dest.logical.logical_dest; - update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest); - spin_unlock_irqrestore(lock, flags); + spin_lock_irqsave(lock, flags); - if ( iommu->enabled ) - { - spin_lock_irqsave(&iommu->lock, flags); - invalidate_interrupt_table(iommu, req_id); - flush_command_buffer(iommu); - spin_unlock_irqrestore(&iommu->lock, flags); - } + offset = get_intremap_offset(vector, delivery_mode); + entry = (u32*)get_intremap_entry(req_id, offset); + update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest); + + spin_unlock_irqrestore(lock, flags); + + if ( iommu->enabled ) + { + spin_lock_irqsave(&iommu->lock, flags); + invalidate_interrupt_table(iommu, req_id); + flush_command_buffer(iommu); + spin_unlock_irqrestore(&iommu->lock, flags); } } @@ -198,7 +195,8 @@ int __init amd_iommu_setup_ioapic_remapping(void) spin_lock_irqsave(lock, flags); offset = get_intremap_offset(vector, delivery_mode); entry = (u32*)get_intremap_entry(req_id, offset); - update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest); + update_intremap_entry(entry, vector, + delivery_mode, dest_mode, dest); spin_unlock_irqrestore(lock, flags); if ( iommu->enabled ) @@ -216,16 +214,17 @@ int __init amd_iommu_setup_ioapic_remapping(void) void amd_iommu_ioapic_update_ire( unsigned int apic, unsigned int reg, unsigned int value) { - struct IO_APIC_route_entry ioapic_rte = { 0 }; - unsigned int rte_upper = (reg & 1) ? 1 : 0; + struct IO_APIC_route_entry old_rte = { 0 }; + struct IO_APIC_route_entry new_rte = { 0 }; + unsigned int rte_lo = (reg & 1) ? 
reg - 1 : reg; int saved_mask, bdf; struct amd_iommu *iommu; - *IO_APIC_BASE(apic) = reg; - *(IO_APIC_BASE(apic)+4) = value; - if ( !iommu_intremap ) + { + __io_apic_write(apic, reg, value); return; + } /* get device id of ioapic devices */ bdf = ioapic_bdf[IO_APIC_ID(apic)]; @@ -234,30 +233,49 @@ void amd_iommu_ioapic_update_ire( { AMD_IOMMU_DEBUG( "Fail to find iommu for ioapic device id = 0x%x\n", bdf); + __io_apic_write(apic, reg, value); return; } - if ( rte_upper ) - return; - /* read both lower and upper 32-bits of rte entry */ - *IO_APIC_BASE(apic) = reg; - *(((u32 *)&ioapic_rte) + 0) = *(IO_APIC_BASE(apic)+4); - *IO_APIC_BASE(apic) = reg + 1; - *(((u32 *)&ioapic_rte) + 1) = *(IO_APIC_BASE(apic)+4); + /* save io-apic rte lower 32 bits */ + *((u32 *)&old_rte) = __io_apic_read(apic, rte_lo); + saved_mask = old_rte.mask; + + if ( reg == rte_lo ) + { + *((u32 *)&new_rte) = value; + /* read upper 32 bits from io-apic rte */ + *(((u32 *)&new_rte) + 1) = __io_apic_read(apic, reg + 1); + } + else + { + *((u32 *)&new_rte) = *((u32 *)&old_rte); + *(((u32 *)&new_rte) + 1) = value; + } /* mask the interrupt while we change the intremap table */ - saved_mask = ioapic_rte.mask; - ioapic_rte.mask = 1; - *IO_APIC_BASE(apic) = reg; - *(IO_APIC_BASE(apic)+4) = *(((int *)&ioapic_rte)+0); - ioapic_rte.mask = saved_mask; + if ( !saved_mask ) + { + old_rte.mask = 1; + __io_apic_write(apic, rte_lo, *((u32 *)&old_rte)); + } - update_intremap_entry_from_ioapic( - bdf, iommu, &ioapic_rte, rte_upper, value); + /* Update interrupt remapping entry */ + update_intremap_entry_from_ioapic(bdf, iommu, &new_rte); + + /* Forward write access to IO-APIC RTE */ + __io_apic_write(apic, reg, value); + + /* For lower bits access, return directly to avoid double writes */ + if ( reg == rte_lo ) + return; /* unmask the interrupt after we have updated the intremap table */ - *IO_APIC_BASE(apic) = reg; - *(IO_APIC_BASE(apic)+4) = *(((u32 *)&ioapic_rte)+0); + if ( !saved_mask ) + { + old_rte.mask = 
saved_mask; + __io_apic_write(apic, rte_lo, *((u32 *)&old_rte)); + } } static void update_intremap_entry_from_msi_msg( -- 1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 07/10] ACPI: acpi_table_parse() should return handler's error code
From: Boris Ostrovsky <boris.ostrovsky@amd.com>

Currently, the error code returned by acpi_table_parse()'s handler is
ignored. This patch will propagate handler's return value to
acpi_table_parse()'s caller.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26516:32d4516a97f0
xen-unstable date: Tue Feb 5 14:18:18 UTC 2013
---
 xen/drivers/acpi/tables.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/acpi/tables.c b/xen/drivers/acpi/tables.c
index a25d9b2..151bda6 100644
--- a/xen/drivers/acpi/tables.c
+++ b/xen/drivers/acpi/tables.c
@@ -237,7 +237,7 @@ acpi_table_parse_madt(enum acpi_madt_type id,
 * @handler: handler to run
 *
 * Scan the ACPI System Descriptor Table (STD) for a table matching @id,
- * run @handler on it. Return 0 if table found, return on if not.
+ * run @handler on it.
 */
int acpi_table_parse(char *id, acpi_table_handler handler)
{
@@ -252,8 +252,7 @@ int acpi_table_parse(char *id, acpi_table_handler handler)
	acpi_get_table(id, 0, &table);

	if (table) {
-		handler(table);
-		return 0;
+		return handler(table);
	} else
		return 1;
}
--
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 08/10] AMD, IOMMU: Clean up old entries in remapping tables when creating new one
From: Jan Beulich <jbeulich@suse.com>

When changing the affinity of an IRQ associated with a passed
through PCI device, clear previous mapping.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

In addition, because some BIOSes may incorrectly program IVRS
entries for IOAPIC try to check for entry's consistency. Specifically,
if conflicting entries are found disable IOMMU if per-device
remapping table is used. If entries refer to bogus IOAPIC IDs
disable IOMMU unconditionally.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
xen-unstable changeset: 26517:601139e2b0db
xen-unstable date: Tue Feb 5 14:20:47 UTC 2013
---
 xen/drivers/passthrough/amd/iommu_acpi.c      |   59 +++++++++++++++++++++++--
 xen/drivers/passthrough/amd/iommu_intr.c      |   40 ++++++++++++++---
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    5 ++
 3 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 7905e12..0d6d2a6 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -20,6 +20,8 @@
 
 #include <xen/config.h>
 #include <xen/errno.h>
+#include <asm/apicdef.h>
+#include <asm/io_apic.h>
 #include <asm/amd-iommu.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <asm/hvm/svm/amd-iommu-acpi.h>
@@ -28,7 +30,6 @@ extern unsigned long amd_iommu_page_entries;
 extern unsigned short ivrs_bdf_entries;
 extern struct ivrs_mappings *ivrs_mappings;
 extern unsigned short last_bdf;
-extern int ioapic_bdf[MAX_IO_APICS];
 extern void *shared_intremap_table;
 
 static void add_ivrs_mapping_entry(
@@ -635,6 +636,7 @@ static u16 __init parse_ivhd_device_special(
     u16 header_length, u16 block_length, struct amd_iommu *iommu)
 {
     u16 dev_length, bdf;
+    int apic;
 
     dev_length = sizeof(struct acpi_ivhd_device_special);
     if ( header_length < (block_length + dev_length) )
@@ -651,9 +653,58 @@ static u16 __init parse_ivhd_device_special(
     }
 
     add_ivrs_mapping_entry(bdf, bdf, ivhd_device->header.flags, iommu);
-    /* set device id of ioapic */
-    ioapic_bdf[ivhd_device->special.handle] = bdf;
-    return dev_length;
+
+    if ( ivhd_device->special.variety != 1 /* ACPI_IVHD_IOAPIC */ )
+    {
+        if ( ivhd_device->special.variety != 2 /* ACPI_IVHD_HPET */ )
+            printk(XENLOG_ERR "Unrecognized IVHD special variety %#x\n",
+                   ivhd_device->special.variety);
+        return dev_length;
+    }
+
+    /*
+     * Some BIOSes have IOAPIC broken entries so we check for IVRS
+     * consistency here --- whether entry's IOAPIC ID is valid and
+     * whether there are conflicting/duplicated entries.
+     */
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        if ( IO_APIC_ID(apic) != ivhd_device->special.handle )
+            continue;
+
+        if ( ioapic_bdf[ivhd_device->special.handle].pin_setup )
+        {
+            if ( ioapic_bdf[ivhd_device->special.handle].bdf == bdf )
+                AMD_IOMMU_DEBUG("IVHD Warning: Duplicate IO-APIC %#x entries\n",
+                                ivhd_device->special.handle);
+            else
+            {
+                printk(XENLOG_ERR "IVHD Error: Conflicting IO-APIC %#x entries\n",
+                       ivhd_device->special.handle);
+                if ( amd_iommu_perdev_intremap )
+                    return 0;
+            }
+        }
+        else
+        {
+            /* set device id of ioapic */
+            ioapic_bdf[ivhd_device->special.handle].bdf = bdf;
+
+            ioapic_bdf[ivhd_device->special.handle].pin_setup = xzalloc_array(
+                unsigned long, BITS_TO_LONGS(nr_ioapic_registers[apic]));
+            if ( nr_ioapic_registers[apic] &&
+                 !ioapic_bdf[IO_APIC_ID(apic)].pin_setup )
+            {
+                printk(XENLOG_ERR "IVHD Error: Out of memory\n");
+                return 0;
+            }
+        }
+        return dev_length;
+    }
+
+    printk(XENLOG_ERR "IVHD Error: Invalid IO-APIC %#x\n",
+           ivhd_device->special.handle);
+    return 0;
 }
 
 static int __init parse_ivhd_block(struct acpi_ivhd_block_header *ivhd_block)
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index ceb6f47..f8b1bb0 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -26,7 +26,7 @@
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
 
-int ioapic_bdf[MAX_IO_APICS];
+struct ioapic_bdf ioapic_bdf[MAX_IO_APICS];
 extern struct ivrs_mappings *ivrs_mappings;
 extern unsigned short ivrs_bdf_entries;
 void *shared_intremap_table;
@@ -116,12 +116,12 @@ void invalidate_interrupt_table(struct amd_iommu *iommu, u16 device_id)
 
 static void update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
-    struct IO_APIC_route_entry *ioapic_rte)
+    const struct IO_APIC_route_entry *rte,
+    const struct IO_APIC_route_entry *old_rte)
 {
     unsigned long flags;
     u32* entry;
     u8 delivery_mode, dest, vector, dest_mode;
-    struct IO_APIC_route_entry *rte = ioapic_rte;
     int req_id;
     spinlock_t *lock;
     int offset;
@@ -137,6 +137,14 @@ static void update_intremap_entry_from_ioapic(
 
     spin_lock_irqsave(lock, flags);
 
     offset = get_intremap_offset(vector, delivery_mode);
+    if ( old_rte )
+    {
+        int old_offset = get_intremap_offset(old_rte->vector,
+                                             old_rte->delivery_mode);
+
+        if ( offset != old_offset )
+            free_intremap_entry(bdf, old_offset);
+    }
     entry = (u32*)get_intremap_entry(req_id, offset);
 
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
@@ -175,7 +183,7 @@ int __init amd_iommu_setup_ioapic_remapping(void)
                 continue;
 
             /* get device id of ioapic devices */
-            bdf = ioapic_bdf[IO_APIC_ID(apic)];
+            bdf = ioapic_bdf[IO_APIC_ID(apic)].bdf;
             iommu = find_iommu_for_device(bdf);
             if ( !iommu )
             {
@@ -206,6 +214,7 @@ int __init amd_iommu_setup_ioapic_remapping(void)
                 flush_command_buffer(iommu);
                 spin_unlock_irqrestore(&iommu->lock, flags);
             }
+            set_bit(pin, ioapic_bdf[IO_APIC_ID(apic)].pin_setup);
         }
     }
     return 0;
@@ -217,6 +226,7 @@ void amd_iommu_ioapic_update_ire(
     struct IO_APIC_route_entry old_rte = { 0 };
     struct IO_APIC_route_entry new_rte = { 0 };
     unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
+    unsigned int pin = (reg - 0x10) / 2;
     int saved_mask, bdf;
     struct amd_iommu *iommu;
 
@@ -227,7 +237,7 @@ void amd_iommu_ioapic_update_ire(
     }
 
     /* get device id of ioapic devices */
-    bdf = ioapic_bdf[IO_APIC_ID(apic)];
+    bdf = ioapic_bdf[IO_APIC_ID(apic)].bdf;
     iommu = find_iommu_for_device(bdf);
     if ( !iommu )
     {
@@ -253,6 +263,14 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
+    if ( new_rte.mask &&
+         !test_bit(pin, ioapic_bdf[IO_APIC_ID(apic)].pin_setup) )
+    {
+        ASSERT(saved_mask);
+        __io_apic_write(apic, reg, value);
+        return;
+    }
+
     /* mask the interrupt while we change the intremap table */
     if ( !saved_mask )
     {
@@ -261,7 +279,11 @@ void amd_iommu_ioapic_update_ire(
     }
 
     /* Update interrupt remapping entry */
-    update_intremap_entry_from_ioapic(bdf, iommu, &new_rte);
+    update_intremap_entry_from_ioapic(
+        bdf, iommu, &new_rte,
+        test_and_set_bit(pin,
+                         ioapic_bdf[IO_APIC_ID(apic)].pin_setup) ? &old_rte
+                                                                 : NULL);
 
     /* Forward write access to IO-APIC RTE */
     __io_apic_write(apic, reg, value);
@@ -373,6 +395,12 @@ void amd_iommu_msi_msg_update_ire(
         return;
     }
 
+    if ( msi_desc->remap_index >= 0 )
+        update_intremap_entry_from_msi_msg(iommu, pdev, msi_desc, NULL);
+
+    if ( !msg )
+        return;
+
     update_intremap_entry_from_msi_msg(iommu, pdev, msi_desc, msg);
 }
 
diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
index 5539636..bbe763d 100644
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -89,6 +89,11 @@ void amd_iommu_read_msi_from_ire(
 unsigned int amd_iommu_read_ioapic_from_ire(
     unsigned int apic, unsigned int reg);
 
+extern struct ioapic_bdf {
+    u16 bdf;
+    unsigned long *pin_setup;
+} ioapic_bdf[];
+
 /* power management support */
 void amd_iommu_resume(void);
 void amd_iommu_suspend(void);
-- 
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 09/10] AMD, IOMMU: Disable IOMMU if SATA Combined mode is on
From: Boris Ostrovsky <boris.ostrovsky@amd.com>

AMD's SP5100 chipset can be placed into SATA Combined mode that may
prevent dom0 from booting when IOMMU is enabled and per-device
interrupt remapping table is used. While SP5100 erratum 28 requires
BIOSes to disable this mode, some may still use it.

This patch checks whether this mode is on and, if per-device table
is in use, disables IOMMU.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>

Flipped operands of && in amd_iommu_init() to make the message issued
by amd_sp5100_erratum28() match reality (when amd_iommu_perdev_intremap
is zero, there's really no point in calling the function).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26518:e379a23b0465
xen-unstable date: Tue Feb 5 14:21:25 UTC 2013
---
 xen/drivers/passthrough/amd/iommu_init.c |   33 ++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index c0ceb2e..170f61f 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -823,12 +823,45 @@ static int __init amd_iommu_setup_device_table(void)
     return 0;
 }
 
+/* Check whether SP5100 SATA Combined mode is on */
+static bool_t __init amd_sp5100_erratum28(void)
+{
+    u32 bus, id;
+    u16 vendor_id, dev_id;
+    u8 byte;
+
+    for (bus = 0; bus < 256; bus++)
+    {
+        id = pci_conf_read32(bus, 0x14, 0, PCI_VENDOR_ID);
+
+        vendor_id = id & 0xffff;
+        dev_id = (id >> 16) & 0xffff;
+
+        /* SP5100 SMBus module sets Combined mode on */
+        if (vendor_id != 0x1002 || dev_id != 0x4385)
+            continue;
+
+        byte = pci_conf_read8(bus, 0x14, 0, 0xad);
+        if ( (byte >> 3) & 1 )
+        {
+            printk(XENLOG_WARNING "AMD-Vi: SP5100 erratum 28 detected, disabling IOMMU.\n"
+                   "If possible, disable SATA Combined mode in BIOS or contact your vendor for BIOS update.\n");
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
 int __init amd_iommu_init(void)
 {
     struct amd_iommu *iommu;
 
     BUG_ON( !iommu_found() );
 
+    if ( amd_iommu_perdev_intremap && amd_sp5100_erratum28() )
+        goto error_out;
+
     irq_to_iommu = xmalloc_array(struct amd_iommu *, nr_irqs);
     if ( irq_to_iommu == NULL )
         goto error_out;
-- 
1.7.2.5
Ian Campbell
2013-Feb-11 13:12 UTC
[PATCH 4.0-testing 10/10] AMD, IOMMU: Make per-device interrupt remapping table default
From: Boris Ostrovsky <boris.ostrovsky@amd.com>

Using global interrupt remapping table may be insecure, as described
by XSA-36. This patch makes per-device mode default.

This is XSA-36 / CVE-2013-0153.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>

Moved warning in amd_iov_detect() to location covering all cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26519:1af531e7bc2f
xen-unstable date: Tue Feb 5 14:22:11 UTC 2013
---
 xen/drivers/passthrough/amd/iommu_acpi.c    |    5 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |    2 ++
 xen/drivers/passthrough/iommu.c             |    4 +++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 0d6d2a6..bf4a691 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -20,7 +20,6 @@
 
 #include <xen/config.h>
 #include <xen/errno.h>
-#include <asm/apicdef.h>
 #include <asm/io_apic.h>
 #include <asm/amd-iommu.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
@@ -690,7 +689,7 @@ static u16 __init parse_ivhd_device_special(
             /* set device id of ioapic */
             ioapic_bdf[ivhd_device->special.handle].bdf = bdf;
 
-            ioapic_bdf[ivhd_device->special.handle].pin_setup = xzalloc_array(
+            ioapic_bdf[ivhd_device->special.handle].pin_setup = xmalloc_array(
                 unsigned long, BITS_TO_LONGS(nr_ioapic_registers[apic]));
             if ( nr_ioapic_registers[apic] &&
                  !ioapic_bdf[IO_APIC_ID(apic)].pin_setup )
@@ -698,6 +697,8 @@ static u16 __init parse_ivhd_device_special(
                 printk(XENLOG_ERR "IVHD Error: Out of memory\n");
                 return 0;
             }
+            memset(ioapic_bdf[ivhd_device->special.handle].pin_setup, 0,
+                   sizeof(unsigned long) * BITS_TO_LONGS(nr_ioapic_registers[apic]));
         }
         return dev_length;
     }
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index fb29e20..597a06a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -173,6 +173,8 @@ int amd_iov_detect(void)
         printk("Error initialization\n");
         return -ENODEV;
     }
+    if ( !amd_iommu_perdev_intremap )
+        printk(XENLOG_WARNING "AMD-Vi: Using global interrupt remap table is not recommended (see XSA-36)!\n");
     return 0;
 }
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 0dad6ef..5b3c66b 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -48,7 +48,7 @@ bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
 bool_t __read_mostly amd_iommu_debug;
-bool_t __read_mostly amd_iommu_perdev_intremap;
+bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 
 static void __init parse_iommu_param(char *s)
 {
@@ -78,6 +78,8 @@ static void __init parse_iommu_param(char *s)
         amd_iommu_debug = 1;
     else if ( !strcmp(s, "amd-iommu-perdev-intremap") )
         amd_iommu_perdev_intremap = 1;
+    else if ( !strcmp(s, "amd-iommu-global-intremap") )
+        amd_iommu_perdev_intremap = 0;
     else if ( !strcmp(s, "dom0-passthrough") )
         iommu_passthrough = 1;
     else if ( !strcmp(s, "dom0-strict") )
-- 
1.7.2.5
Jan Beulich
2013-Feb-12 09:44 UTC
Re: [PATCH 4.0-testing 00/10] XSA-{25, 27, 33, 36}: Backports for 4.0 (for Debian update)
>>> On 11.02.13 at 14:12, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> The only one which I'm a bit uncertain about is XSA-36, not because the
> backport was hard but because I'm not sure what if anything may be needed
> as an antecedent to those patches. I already included a backport of "AMD
> IOMMU: Fix an interrupt remapping issue" (unstable 23200:995a0c01a076).

Since I also had to bring the XSA-36 fixes over to 4.0.x the other day,
here's what I included as prereqs:

23752:ef9ed3d2aa87
23753:2e0cf9428554
23754:fa4e2ca9ecff
23765:68b903bb1b01
23786:3a05da2dc7c0
23812:32814ad7458d
23899:a99d75671a91 (non-essential, but allowing eases other things)
24155:0d50e704834f
24156:f29b5bd6e25f

There may be others (like 23200 that you mentioned) which we already
had in our tree, and hence I didn't directly notice as prereq.

Jan