David Vrabel
2013-Jul-09 13:38 UTC
[PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
From: David Vrabel <david.vrabel@citrix.com> If there are UNUSABLE regions in the machine memory map, dom0 will attempt to map them 1:1 which is not permitted by Xen and the kernel will crash. There isn''t anything interesting in the UNUSABLE region that the dom0 kernel needs access to so we can avoid making the 1:1 mapping and leave the region as RAM. Since the obtaining the memory map for dom0 and domU are now more different, refactor each into separate functions. This fixes a dom0 boot failure if tboot is used (because tboot was marking a memory region as UNUSABLE). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: stable@vger.kernel.org --- arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++---------- 1 files changed, 91 insertions(+), 23 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 94eac5c..3fdb6bd 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -55,6 +55,91 @@ unsigned long xen_released_pages; */ #define EXTRA_MEM_RATIO (10) +static int __init xen_get_memory_map_dom0(struct e820entry *map, + unsigned *nr_entries) +{ + struct xen_memory_map memmap; + unsigned i; + int ret; + + /* + * Dom0 requires access to machine addresses for BIOS data and + * MMIO (e.g. PCI) devices. The reset of the kernel expects + * to be able to access these through a 1:1 p2m mapping. + * + * We need to take the pseudo physical memory map and set up + * 1:1 mappings corresponding to the RESERVED regions and + * holes in the /machine/ memory map, adding/expanding the RAM + * region at the end of the map for the relocated RAM. + * + * This is more easily done if we just start with the machine + * memory map. + * + * UNUSABLE regions are awkward, they are not interesting to + * dom0 and Xen won''t allow them to be mapped so we want to + * leave these as RAM in the pseudo physical map. + * + * Again, this is easiest if we take the machine memory map + * and change the UNUSABLE regions to RAM. + */ + + memmap.nr_entries = *nr_entries; + set_xen_guest_handle(memmap.buffer, map); + + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); + if (ret < 0) + return ret; + + for (i = 0; i < memmap.nr_entries; i++) { + if (map[i].type == E820_UNUSABLE) + map[i].type = E820_RAM; + } + + *nr_entries = memmap.nr_entries; + return 0; +} + +static int __init xen_get_memory_map_domu(struct e820entry *map, + unsigned *nr_entries) +{ + struct xen_memory_map memmap; + int ret; + + /* + * For domUs, use the psuedo-physical memory map. + * + * If this is not available, fake up a memory map with a + * single region containing the initial number of pages. + */ + + memmap.nr_entries = *nr_entries; + set_xen_guest_handle(memmap.buffer, map); + + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); + if (ret == -ENOSYS) { + unsigned long max_pfn = min(MAX_DOMAIN_PAGES, + xen_start_info->nr_pages); + memmap.nr_entries = 1; + map[0].addr = 0ULL; + map[0].size = PFN_PHYS(max_pfn); + /* 8MB slack (to balance backend allocations). */ + map[0].size += 8ULL << 20; + map[0].type = E820_RAM; + } else if (ret < 0) + return ret; + + *nr_entries = memmap.nr_entries; + return 0; +} + +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries) +{ + if (xen_initial_domain()) + return xen_get_memory_map_dom0(map, nr_entries); + else + return xen_get_memory_map_domu(map, nr_entries); +} + static void __init xen_add_extra_mem(u64 start, u64 size) { unsigned long pfn; @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type) char * __init xen_memory_setup(void) { static struct e820entry map[E820MAX] __initdata; + unsigned map_nr_entries = E820MAX; unsigned long max_pfn = xen_start_info->nr_pages; unsigned long long mem_end; int rc; - struct xen_memory_map memmap; unsigned long max_pages; unsigned long last_pfn = 0; unsigned long extra_pages = 0; unsigned long populated; int i; - int op; max_pfn = min(MAX_DOMAIN_PAGES, max_pfn); mem_end = PFN_PHYS(max_pfn); - memmap.nr_entries = E820MAX; - set_xen_guest_handle(memmap.buffer, map); - - op = xen_initial_domain() ? - XENMEM_machine_memory_map : - XENMEM_memory_map; - rc = HYPERVISOR_memory_op(op, &memmap); - if (rc == -ENOSYS) { - BUG_ON(xen_initial_domain()); - memmap.nr_entries = 1; - map[0].addr = 0ULL; - map[0].size = mem_end; - /* 8MB slack (to balance backend allocations). */ - map[0].size += 8ULL << 20; - map[0].type = E820_RAM; - rc = 0; - } + rc = xen_get_memory_map(map, &map_nr_entries); BUG_ON(rc); /* Make sure the Xen-supplied memory map is well-ordered. */ - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries); + sanitize_e820_map(map, map_nr_entries, &map_nr_entries); max_pages = xen_get_max_pages(); if (max_pages > max_pfn) @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void) * this are first released. */ xen_released_pages = xen_set_identity_and_release( - map, memmap.nr_entries, max_pfn); + map, map_nr_entries, max_pfn); /* * Populate back the non-RAM pages and E820 gaps that had been * released. */ - populated = xen_populate_chunk(map, memmap.nr_entries, + populated = xen_populate_chunk(map, map_nr_entries, max_pfn, &last_pfn, xen_released_pages); xen_released_pages -= populated; @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void) extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), extra_pages); i = 0; - while (i < memmap.nr_entries) { + while (i < map_nr_entries) { u64 addr = map[i].addr; u64 size = map[i].size; u32 type = map[i].type; -- 1.7.2.5
Konrad Rzeszutek Wilk
2013-Jul-09 14:13 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > If there are UNUSABLE regions in the machine memory map, dom0 will > attempt to map them 1:1 which is not permitted by Xen and the kernel > will crash. > > There isn''t anything interesting in the UNUSABLE region that the dom0 > kernel needs access to so we can avoid making the 1:1 mapping and > leave the region as RAM. > > Since the obtaining the memory map for dom0 and domU are now more > different, refactor each into separate functions. > > This fixes a dom0 boot failure if tboot is used (because tboot was > marking a memory region as UNUSABLE).Please also include the serial log that shows the crash.> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: stable@vger.kernel.org > --- > arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++---------- > 1 files changed, 91 insertions(+), 23 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index 94eac5c..3fdb6bd 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -55,6 +55,91 @@ unsigned long xen_released_pages; > */ > #define EXTRA_MEM_RATIO (10) > > +static int __init xen_get_memory_map_dom0(struct e820entry *map, > + unsigned *nr_entries) > +{ > + struct xen_memory_map memmap; > + unsigned i; > + int ret; > + > + /* > + * Dom0 requires access to machine addresses for BIOS data and > + * MMIO (e.g. PCI) devices. The reset of the kernel expects > + * to be able to access these through a 1:1 p2m mapping. > + * > + * We need to take the pseudo physical memory map and set up > + * 1:1 mappings corresponding to the RESERVED regions and > + * holes in the /machine/ memory map, adding/expanding the RAM > + * region at the end of the map for the relocated RAM. > + * > + * This is more easily done if we just start with the machine > + * memory map. > + * > + * UNUSABLE regions are awkward, they are not interesting to > + * dom0 and Xen won''t allow them to be mapped so we want to > + * leave these as RAM in the pseudo physical map.We just want them as such in the P2M but not do any PTE creation for it? Why do we care about it? We are not creating any page tables for E820_UNUSABLE regions.> + * > + * Again, this is easiest if we take the machine memory map > + * and change the UNUSABLE regions to RAM.Won''t then Linux try to map them then? In 3.9 (or was it 3.8?) and further the page table creation starts ignoring any E820 entries that are not RAM. See init_range_memory_mapping and its comment: /* * We need to iterate through the E820 memory map and create direct mappings * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply * create direct mappings for all pfns from [0 to max_low_pfn) and * [4GB to max_pfn) because of possible memory holes in high addresses * that cannot be marked as UC by fixed/variable range MTRRs. * Depending on the alignment of E820 ranges, this may possibly result * in using smaller size (i.e. 4K instead of 2M or 1G) page tables. * So in effect you are now altering them.> + */ > + > + memmap.nr_entries = *nr_entries; > + set_xen_guest_handle(memmap.buffer, map); > + > + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); > + if (ret < 0) > + return ret; > + > + for (i = 0; i < memmap.nr_entries; i++) { > + if (map[i].type == E820_UNUSABLE)What if the E820_UNUSABLE regions were manufactured by the BIOS? Or somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB box.> + map[i].type = E820_RAM; > + } > + > + *nr_entries = memmap.nr_entries; > + return 0; > +} > + > +static int __init xen_get_memory_map_domu(struct e820entry *map, > + unsigned *nr_entries) > +{ > + struct xen_memory_map memmap; > + int ret; > + > + /* > + * For domUs, use the psuedo-physical memory map. > + * > + * If this is not available, fake up a memory map with a > + * single region containing the initial number of pages. > + */ > + > + memmap.nr_entries = *nr_entries; > + set_xen_guest_handle(memmap.buffer, map); > + > + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); > + if (ret == -ENOSYS) { > + unsigned long max_pfn = min(MAX_DOMAIN_PAGES, > + xen_start_info->nr_pages); > + memmap.nr_entries = 1; > + map[0].addr = 0ULL; > + map[0].size = PFN_PHYS(max_pfn); > + /* 8MB slack (to balance backend allocations). */ > + map[0].size += 8ULL << 20; > + map[0].type = E820_RAM; > + } else if (ret < 0) > + return ret; > + > + *nr_entries = memmap.nr_entries; > + return 0; > +} > + > +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries) > +{ > + if (xen_initial_domain()) > + return xen_get_memory_map_dom0(map, nr_entries); > + else > + return xen_get_memory_map_domu(map, nr_entries); > +} > + > static void __init xen_add_extra_mem(u64 start, u64 size) > { > unsigned long pfn; > @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type) > char * __init xen_memory_setup(void) > { > static struct e820entry map[E820MAX] __initdata; > + unsigned map_nr_entries = E820MAX; > > unsigned long max_pfn = xen_start_info->nr_pages; > unsigned long long mem_end; > int rc; > - struct xen_memory_map memmap; > unsigned long max_pages; > unsigned long last_pfn = 0; > unsigned long extra_pages = 0; > unsigned long populated; > int i; > - int op; > > max_pfn = min(MAX_DOMAIN_PAGES, max_pfn); > mem_end = PFN_PHYS(max_pfn); > > - memmap.nr_entries = E820MAX; > - set_xen_guest_handle(memmap.buffer, map); > - > - op = xen_initial_domain() ? > - XENMEM_machine_memory_map : > - XENMEM_memory_map; > - rc = HYPERVISOR_memory_op(op, &memmap); > - if (rc == -ENOSYS) { > - BUG_ON(xen_initial_domain()); > - memmap.nr_entries = 1; > - map[0].addr = 0ULL; > - map[0].size = mem_end; > - /* 8MB slack (to balance backend allocations). */ > - map[0].size += 8ULL << 20; > - map[0].type = E820_RAM; > - rc = 0; > - } > + rc = xen_get_memory_map(map, &map_nr_entries); > BUG_ON(rc); > > /* Make sure the Xen-supplied memory map is well-ordered. */ > - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries); > + sanitize_e820_map(map, map_nr_entries, &map_nr_entries); > > max_pages = xen_get_max_pages(); > if (max_pages > max_pfn) > @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void) > * this are first released. > */ > xen_released_pages = xen_set_identity_and_release( > - map, memmap.nr_entries, max_pfn); > + map, map_nr_entries, max_pfn); > > /* > * Populate back the non-RAM pages and E820 gaps that had been > * released. */ > - populated = xen_populate_chunk(map, memmap.nr_entries, > + populated = xen_populate_chunk(map, map_nr_entries, > max_pfn, &last_pfn, xen_released_pages); > > xen_released_pages -= populated; > @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void) > extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), > extra_pages); > i = 0; > - while (i < memmap.nr_entries) { > + while (i < map_nr_entries) { > u64 addr = map[i].addr; > u64 size = map[i].size; > u32 type = map[i].type; > -- > 1.7.2.5 >
David Vrabel
2013-Jul-09 14:44 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> If there are UNUSABLE regions in the machine memory map, dom0 will >> attempt to map them 1:1 which is not permitted by Xen and the kernel >> will crash. >> >> There isn''t anything interesting in the UNUSABLE region that the dom0 >> kernel needs access to so we can avoid making the 1:1 mapping and >> leave the region as RAM. >> >> Since the obtaining the memory map for dom0 and domU are now more >> different, refactor each into separate functions. >> >> This fixes a dom0 boot failure if tboot is used (because tboot was >> marking a memory region as UNUSABLE). > > Please also include the serial log that shows the crash.It''s a domain crash by Xen and it isn''t useful as none of the stack is decoded.>> +static int __init xen_get_memory_map_dom0(struct e820entry *map, >> + unsigned *nr_entries) >> +{ >> + struct xen_memory_map memmap; >> + unsigned i; >> + int ret; >> + >> + /* >> + * Dom0 requires access to machine addresses for BIOS data and >> + * MMIO (e.g. PCI) devices. The reset of the kernel expects >> + * to be able to access these through a 1:1 p2m mapping. >> + * >> + * We need to take the pseudo physical memory map and set up >> + * 1:1 mappings corresponding to the RESERVED regions and >> + * holes in the /machine/ memory map, adding/expanding the RAM >> + * region at the end of the map for the relocated RAM.This is the key paragraph. The apparent use of the machine memory map for dom0 is a confusing fiction.>> + * >> + * This is more easily done if we just start with the machine >> + * memory map. >> + * >> + * UNUSABLE regions are awkward, they are not interesting to >> + * dom0 and Xen won''t allow them to be mapped so we want to >> + * leave these as RAM in the pseudo physical map. > > We just want them as such in the P2M but not do any PTE creation for it? > Why do we care about it? We are not creating any page tables for > E820_UNUSABLE regions.I don''t follow what you''re asking here. In dom0, UNUSABLE regions in the machine memory map are RAM regions on the pseudo-physical memory map. So instead of playing games and making these regions special in the pseudo-physical map we just leave them as RAM.>> + * >> + * Again, this is easiest if we take the machine memory map >> + * and change the UNUSABLE regions to RAM. > > Won''t then Linux try to map them then? In 3.9 (or was it 3.8?) and further > the page table creation starts ignoring any E820 entries that are not RAM. > See init_range_memory_mapping and its comment:Yes. They are just regular RAM in the pseudo-physical map.> /* > * We need to iterate through the E820 memory map and create direct mappings > * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply > * create direct mappings for all pfns from [0 to max_low_pfn) and > * [4GB to max_pfn) because of possible memory holes in high addresses > * that cannot be marked as UC by fixed/variable range MTRRs. > * Depending on the alignment of E820 ranges, this may possibly result > * in using smaller size (i.e. 4K instead of 2M or 1G) page tables. > * > > > So in effect you are now altering them.No.>> + */ >> + >> + memmap.nr_entries = *nr_entries; >> + set_xen_guest_handle(memmap.buffer, map); >> + >> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); >> + if (ret < 0) >> + return ret; >> + >> + for (i = 0; i < memmap.nr_entries; i++) { >> + if (map[i].type == E820_UNUSABLE) > > What if the E820_UNUSABLE regions were manufactured by the BIOS? Or > somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB > box.The resulting memory map should be clipped by the result of the call to xen_get_max_pages(). David
Aurelien Chartier
2013-Jul-09 17:00 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On 09/07/13 14:38, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > If there are UNUSABLE regions in the machine memory map, dom0 will > attempt to map them 1:1 which is not permitted by Xen and the kernel > will crash. > > There isn''t anything interesting in the UNUSABLE region that the dom0 > kernel needs access to so we can avoid making the 1:1 mapping and > leave the region as RAM. > > Since the obtaining the memory map for dom0 and domU are now more > different, refactor each into separate functions. > > This fixes a dom0 boot failure if tboot is used (because tboot was > marking a memory region as UNUSABLE). > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: stable@vger.kernel.orgI tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the early boot crash I reported. Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com>> --- > arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++---------- > 1 files changed, 91 insertions(+), 23 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index 94eac5c..3fdb6bd 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -55,6 +55,91 @@ unsigned long xen_released_pages; > */ > #define EXTRA_MEM_RATIO (10) > > +static int __init xen_get_memory_map_dom0(struct e820entry *map, > + unsigned *nr_entries) > +{ > + struct xen_memory_map memmap; > + unsigned i; > + int ret; > + > + /* > + * Dom0 requires access to machine addresses for BIOS data and > + * MMIO (e.g. PCI) devices. The reset of the kernel expects > + * to be able to access these through a 1:1 p2m mapping. > + * > + * We need to take the pseudo physical memory map and set up > + * 1:1 mappings corresponding to the RESERVED regions and > + * holes in the /machine/ memory map, adding/expanding the RAM > + * region at the end of the map for the relocated RAM. > + * > + * This is more easily done if we just start with the machine > + * memory map. > + * > + * UNUSABLE regions are awkward, they are not interesting to > + * dom0 and Xen won''t allow them to be mapped so we want to > + * leave these as RAM in the pseudo physical map. > + * > + * Again, this is easiest if we take the machine memory map > + * and change the UNUSABLE regions to RAM. > + */ > + > + memmap.nr_entries = *nr_entries; > + set_xen_guest_handle(memmap.buffer, map); > + > + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); > + if (ret < 0) > + return ret; > + > + for (i = 0; i < memmap.nr_entries; i++) { > + if (map[i].type == E820_UNUSABLE) > + map[i].type = E820_RAM; > + } > + > + *nr_entries = memmap.nr_entries; > + return 0; > +} > + > +static int __init xen_get_memory_map_domu(struct e820entry *map, > + unsigned *nr_entries) > +{ > + struct xen_memory_map memmap; > + int ret; > + > + /* > + * For domUs, use the psuedo-physical memory map.Small typo here.> + * > + * If this is not available, fake up a memory map with a > + * single region containing the initial number of pages. > + */ > + > + memmap.nr_entries = *nr_entries; > + set_xen_guest_handle(memmap.buffer, map); > + > + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); > + if (ret == -ENOSYS) { > + unsigned long max_pfn = min(MAX_DOMAIN_PAGES, > + xen_start_info->nr_pages); > + memmap.nr_entries = 1; > + map[0].addr = 0ULL; > + map[0].size = PFN_PHYS(max_pfn); > + /* 8MB slack (to balance backend allocations). */ > + map[0].size += 8ULL << 20; > + map[0].type = E820_RAM; > + } else if (ret < 0) > + return ret; > + > + *nr_entries = memmap.nr_entries; > + return 0; > +} > + > +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries) > +{ > + if (xen_initial_domain()) > + return xen_get_memory_map_dom0(map, nr_entries); > + else > + return xen_get_memory_map_domu(map, nr_entries); > +} > + > static void __init xen_add_extra_mem(u64 start, u64 size) > { > unsigned long pfn; > @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type) > char * __init xen_memory_setup(void) > { > static struct e820entry map[E820MAX] __initdata; > + unsigned map_nr_entries = E820MAX; > > unsigned long max_pfn = xen_start_info->nr_pages; > unsigned long long mem_end; > int rc; > - struct xen_memory_map memmap; > unsigned long max_pages; > unsigned long last_pfn = 0; > unsigned long extra_pages = 0; > unsigned long populated; > int i; > - int op; > > max_pfn = min(MAX_DOMAIN_PAGES, max_pfn); > mem_end = PFN_PHYS(max_pfn); > > - memmap.nr_entries = E820MAX; > - set_xen_guest_handle(memmap.buffer, map); > - > - op = xen_initial_domain() ? > - XENMEM_machine_memory_map : > - XENMEM_memory_map; > - rc = HYPERVISOR_memory_op(op, &memmap); > - if (rc == -ENOSYS) { > - BUG_ON(xen_initial_domain()); > - memmap.nr_entries = 1; > - map[0].addr = 0ULL; > - map[0].size = mem_end; > - /* 8MB slack (to balance backend allocations). */ > - map[0].size += 8ULL << 20; > - map[0].type = E820_RAM; > - rc = 0; > - } > + rc = xen_get_memory_map(map, &map_nr_entries); > BUG_ON(rc); > > /* Make sure the Xen-supplied memory map is well-ordered. */ > - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries); > + sanitize_e820_map(map, map_nr_entries, &map_nr_entries); > > max_pages = xen_get_max_pages(); > if (max_pages > max_pfn) > @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void) > * this are first released. > */ > xen_released_pages = xen_set_identity_and_release( > - map, memmap.nr_entries, max_pfn); > + map, map_nr_entries, max_pfn); > > /* > * Populate back the non-RAM pages and E820 gaps that had been > * released. */ > - populated = xen_populate_chunk(map, memmap.nr_entries, > + populated = xen_populate_chunk(map, map_nr_entries, > max_pfn, &last_pfn, xen_released_pages); > > xen_released_pages -= populated; > @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void) > extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), > extra_pages); > i = 0; > - while (i < memmap.nr_entries) { > + while (i < map_nr_entries) { > u64 addr = map[i].addr; > u64 size = map[i].size; > u32 type = map[i].type;
Konrad Rzeszutek Wilk
2013-Jul-09 18:36 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On Tue, Jul 09, 2013 at 06:00:17PM +0100, Aurelien Chartier wrote:> On 09/07/13 14:38, David Vrabel wrote: > > From: David Vrabel <david.vrabel@citrix.com> > > > > If there are UNUSABLE regions in the machine memory map, dom0 will > > attempt to map them 1:1 which is not permitted by Xen and the kernel > > will crash. > > > > There isn''t anything interesting in the UNUSABLE region that the dom0 > > kernel needs access to so we can avoid making the 1:1 mapping and > > leave the region as RAM. > > > > Since the obtaining the memory map for dom0 and domU are now more > > different, refactor each into separate functions. > > > > This fixes a dom0 boot failure if tboot is used (because tboot was > > marking a memory region as UNUSABLE). > > > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > > Cc: stable@vger.kernel.org > > I tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the > early boot crash I reported.How did you test it? How can one reproduce this crash? Do you see this failure with v3.10 or just with v3.8 (and earlier)?> > Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com> > > > --- > > arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++---------- > > 1 files changed, 91 insertions(+), 23 deletions(-) > > > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > > index 94eac5c..3fdb6bd 100644 > > --- a/arch/x86/xen/setup.c > > +++ b/arch/x86/xen/setup.c > > @@ -55,6 +55,91 @@ unsigned long xen_released_pages; > > */ > > #define EXTRA_MEM_RATIO (10) > > > > +static int __init xen_get_memory_map_dom0(struct e820entry *map, > > + unsigned *nr_entries) > > +{ > > + struct xen_memory_map memmap; > > + unsigned i; > > + int ret; > > + > > + /* > > + * Dom0 requires access to machine addresses for BIOS data and > > + * MMIO (e.g. PCI) devices. The reset of the kernel expects > > + * to be able to access these through a 1:1 p2m mapping. > > + * > > + * We need to take the pseudo physical memory map and set up > > + * 1:1 mappings corresponding to the RESERVED regions and > > + * holes in the /machine/ memory map, adding/expanding the RAM > > + * region at the end of the map for the relocated RAM. > > + * > > + * This is more easily done if we just start with the machine > > + * memory map. > > + * > > + * UNUSABLE regions are awkward, they are not interesting to > > + * dom0 and Xen won''t allow them to be mapped so we want to > > + * leave these as RAM in the pseudo physical map. > > + * > > + * Again, this is easiest if we take the machine memory map > > + * and change the UNUSABLE regions to RAM. > > + */ > > + > > + memmap.nr_entries = *nr_entries; > > + set_xen_guest_handle(memmap.buffer, map); > > + > > + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); > > + if (ret < 0) > > + return ret; > > + > > + for (i = 0; i < memmap.nr_entries; i++) { > > + if (map[i].type == E820_UNUSABLE) > > + map[i].type = E820_RAM; > > + } > > + > > + *nr_entries = memmap.nr_entries; > > + return 0; > > +} > > + > > +static int __init xen_get_memory_map_domu(struct e820entry *map, > > + unsigned *nr_entries) > > +{ > > + struct xen_memory_map memmap; > > + int ret; > > + > > + /* > > + * For domUs, use the psuedo-physical memory map. > > Small typo here. > > > + * > > + * If this is not available, fake up a memory map with a > > + * single region containing the initial number of pages. > > + */ > > + > > + memmap.nr_entries = *nr_entries; > > + set_xen_guest_handle(memmap.buffer, map); > > + > > + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); > > + if (ret == -ENOSYS) { > > + unsigned long max_pfn = min(MAX_DOMAIN_PAGES, > > + xen_start_info->nr_pages); > > + memmap.nr_entries = 1; > > + map[0].addr = 0ULL; > > + map[0].size = PFN_PHYS(max_pfn); > > + /* 8MB slack (to balance backend allocations). */ > > + map[0].size += 8ULL << 20; > > + map[0].type = E820_RAM; > > + } else if (ret < 0) > > + return ret; > > + > > + *nr_entries = memmap.nr_entries; > > + return 0; > > +} > > + > > +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries) > > +{ > > + if (xen_initial_domain()) > > + return xen_get_memory_map_dom0(map, nr_entries); > > + else > > + return xen_get_memory_map_domu(map, nr_entries); > > +} > > + > > static void __init xen_add_extra_mem(u64 start, u64 size) > > { > > unsigned long pfn; > > @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type) > > char * __init xen_memory_setup(void) > > { > > static struct e820entry map[E820MAX] __initdata; > > + unsigned map_nr_entries = E820MAX; > > > > unsigned long max_pfn = xen_start_info->nr_pages; > > unsigned long long mem_end; > > int rc; > > - struct xen_memory_map memmap; > > unsigned long max_pages; > > unsigned long last_pfn = 0; > > unsigned long extra_pages = 0; > > unsigned long populated; > > int i; > > - int op; > > > > max_pfn = min(MAX_DOMAIN_PAGES, max_pfn); > > mem_end = PFN_PHYS(max_pfn); > > > > - memmap.nr_entries = E820MAX; > > - set_xen_guest_handle(memmap.buffer, map); > > - > > - op = xen_initial_domain() ? > > - XENMEM_machine_memory_map : > > - XENMEM_memory_map; > > - rc = HYPERVISOR_memory_op(op, &memmap); > > - if (rc == -ENOSYS) { > > - BUG_ON(xen_initial_domain()); > > - memmap.nr_entries = 1; > > - map[0].addr = 0ULL; > > - map[0].size = mem_end; > > - /* 8MB slack (to balance backend allocations). */ > > - map[0].size += 8ULL << 20; > > - map[0].type = E820_RAM; > > - rc = 0; > > - } > > + rc = xen_get_memory_map(map, &map_nr_entries); > > BUG_ON(rc); > > > > /* Make sure the Xen-supplied memory map is well-ordered. */ > > - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries); > > + sanitize_e820_map(map, map_nr_entries, &map_nr_entries); > > > > max_pages = xen_get_max_pages(); > > if (max_pages > max_pfn) > > @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void) > > * this are first released. > > */ > > xen_released_pages = xen_set_identity_and_release( > > - map, memmap.nr_entries, max_pfn); > > + map, map_nr_entries, max_pfn); > > > > /* > > * Populate back the non-RAM pages and E820 gaps that had been > > * released. */ > > - populated = xen_populate_chunk(map, memmap.nr_entries, > > + populated = xen_populate_chunk(map, map_nr_entries, > > max_pfn, &last_pfn, xen_released_pages); > > > > xen_released_pages -= populated; > > @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void) > > extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), > > extra_pages); > > i = 0; > > - while (i < memmap.nr_entries) { > > + while (i < map_nr_entries) { > > u64 addr = map[i].addr; > > u64 size = map[i].size; > > u32 type = map[i].type; > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2013-Jul-09 18:45 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote:> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote: > > On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote: > >> From: David Vrabel <david.vrabel@citrix.com> > >> > >> If there are UNUSABLE regions in the machine memory map, dom0 will > >> attempt to map them 1:1 which is not permitted by Xen and the kernel > >> will crash. > >> > >> There isn''t anything interesting in the UNUSABLE region that the dom0 > >> kernel needs access to so we can avoid making the 1:1 mapping and > >> leave the region as RAM. > >> > >> Since the obtaining the memory map for dom0 and domU are now more > >> different, refactor each into separate functions. > >> > >> This fixes a dom0 boot failure if tboot is used (because tboot was > >> marking a memory region as UNUSABLE). > > > > Please also include the serial log that shows the crash. > > It''s a domain crash by Xen and it isn''t useful as none of the stack is > decoded.Could you include the E820 at least to get a sense of where and how this looks? As in - without tboot and then with tboot?> > >> +static int __init xen_get_memory_map_dom0(struct e820entry *map, > >> + unsigned *nr_entries) > >> +{ > >> + struct xen_memory_map memmap; > >> + unsigned i; > >> + int ret; > >> + > >> + /* > >> + * Dom0 requires access to machine addresses for BIOS data and > >> + * MMIO (e.g. PCI) devices. The reset of the kernel expects > >> + * to be able to access these through a 1:1 p2m mapping. > >> + * > >> + * We need to take the pseudo physical memory map and set up > >> + * 1:1 mappings corresponding to the RESERVED regions and > >> + * holes in the /machine/ memory map, adding/expanding the RAM > >> + * region at the end of the map for the relocated RAM. > > This is the key paragraph. The apparent use of the machine memory map > for dom0 is a confusing fiction.OK, but I don''t follow when dom0 would be using the E820_UNUSED regions. Is it the xen_do_chunk that is failing on said PFNs? Or is it in this code xen_set_identity_and_release_chunk: "217 /* 218 * If the PFNs are currently mapped, the VA mapping also needs 219 * to be updated to be 1:1. 220 */ 221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) 222 (void)HYPERVISOR_update_va_mapping( 223 (unsigned long)__va(pfn << PAGE_SHIFT), 224 mfn_pte(pfn, PAGE_KERNEL_IO), 0); 225 " which updates the initial PTE''s with the 1-1 PFN and the E820_UNUSABLE is somehow in between two E820_RAM regions?> > >> + * > >> + * This is more easily done if we just start with the machine > >> + * memory map. > >> + * > >> + * UNUSABLE regions are awkward, they are not interesting to > >> + * dom0 and Xen won''t allow them to be mapped so we want to > >> + * leave these as RAM in the pseudo physical map. > > > > We just want them as such in the P2M but not do any PTE creation for it? > > Why do we care about it? We are not creating any page tables for > > E820_UNUSABLE regions. > > I don''t follow what you''re asking here.What code maps said PFNs.> > In dom0, UNUSABLE regions in the machine memory map are RAM regions on > the pseudo-physical memory map. So instead of playing games and making > these regions special in the pseudo-physical map we just leave them as RAM... And then exposing them to the kernel to be used as normal RAM?> > >> + * > >> + * Again, this is easiest if we take the machine memory map > >> + * and change the UNUSABLE regions to RAM. > > > > Won''t then Linux try to map them then? In 3.9 (or was it 3.8?) and further > > the page table creation starts ignoring any E820 entries that are not RAM. > > See init_range_memory_mapping and its comment: > > Yes. They are just regular RAM in the pseudo-physical map.With your change it is. But without your change it would not map it.> > > /* > > * We need to iterate through the E820 memory map and create direct mappings > > * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply > > * create direct mappings for all pfns from [0 to max_low_pfn) and > > * [4GB to max_pfn) because of possible memory holes in high addresses > > * that cannot be marked as UC by fixed/variable range MTRRs. > > * Depending on the alignment of E820 ranges, this may possibly result > > * in using smaller size (i.e. 4K instead of 2M or 1G) page tables. > > * > > > > > > So in effect you are now altering them. > > No. > > >> + */ > >> + > >> + memmap.nr_entries = *nr_entries; > >> + set_xen_guest_handle(memmap.buffer, map); > >> + > >> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); > >> + if (ret < 0) > >> + return ret; > >> + > >> + for (i = 0; i < memmap.nr_entries; i++) { > >> + if (map[i].type == E820_UNUSABLE) > > > > What if the E820_UNUSABLE regions were manufactured by the BIOS? Or > > somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB > > box. > > The resulting memory map should be clipped by the result of the call to > xen_get_max_pages().OK. What about the BIOS manufacturing it?
Aurelien Chartier
2013-Jul-10 14:24 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On 09/07/13 19:36, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 09, 2013 at 06:00:17PM +0100, Aurelien Chartier wrote: >> On 09/07/13 14:38, David Vrabel wrote: >>> From: David Vrabel <david.vrabel@citrix.com> >>> >>> If there are UNUSABLE regions in the machine memory map, dom0 will >>> attempt to map them 1:1 which is not permitted by Xen and the kernel >>> will crash. >>> >>> There isn''t anything interesting in the UNUSABLE region that the dom0 >>> kernel needs access to so we can avoid making the 1:1 mapping and >>> leave the region as RAM. >>> >>> Since the obtaining the memory map for dom0 and domU are now more >>> different, refactor each into separate functions. >>> >>> This fixes a dom0 boot failure if tboot is used (because tboot was >>> marking a memory region as UNUSABLE). >>> >>> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >>> Cc: stable@vger.kernel.org >> I tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the >> early boot crash I reported. > How did you test it? How can one reproduce this crash? Do you see > this failure with v3.10 or just with v3.8 (and earlier)?I tested it by booting Xen and Linux over tboot (1.7.0) with TXT/TPM enabled. The issue is always reproducible. I posted a log of the kernel panic on xen-devel in May if than can help: http://lists.xen.org/archives/html/xen-devel/2013-05/msg01534.html I did not test with v3.10 yet, but I did not see any commit that could have fixed it. I''ll give it a try though.>> Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com> >> >>> --- >>> arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++---------- >>> 1 files changed, 91 insertions(+), 23 deletions(-) >>> >>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c >>> index 94eac5c..3fdb6bd 100644 >>> --- a/arch/x86/xen/setup.c >>> +++ b/arch/x86/xen/setup.c >>> @@ -55,6 +55,91 @@ unsigned long xen_released_pages; >>> */ >>> #define EXTRA_MEM_RATIO (10) >>> >>> +static int __init xen_get_memory_map_dom0(struct e820entry *map, >>> + unsigned *nr_entries) >>> +{ >>> + struct xen_memory_map memmap; >>> + unsigned i; >>> + int ret; >>> + >>> + /* >>> + * Dom0 requires access to machine addresses for BIOS data and >>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects >>> + * to be able to access these through a 1:1 p2m mapping. >>> + * >>> + * We need to take the pseudo physical memory map and set up >>> + * 1:1 mappings corresponding to the RESERVED regions and >>> + * holes in the /machine/ memory map, adding/expanding the RAM >>> + * region at the end of the map for the relocated RAM. >>> + * >>> + * This is more easily done if we just start with the machine >>> + * memory map. >>> + * >>> + * UNUSABLE regions are awkward, they are not interesting to >>> + * dom0 and Xen won''t allow them to be mapped so we want to >>> + * leave these as RAM in the pseudo physical map. >>> + * >>> + * Again, this is easiest if we take the machine memory map >>> + * and change the UNUSABLE regions to RAM. >>> + */ >>> + >>> + memmap.nr_entries = *nr_entries; >>> + set_xen_guest_handle(memmap.buffer, map); >>> + >>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); >>> + if (ret < 0) >>> + return ret; >>> + >>> + for (i = 0; i < memmap.nr_entries; i++) { >>> + if (map[i].type == E820_UNUSABLE) >>> + map[i].type = E820_RAM; >>> + } >>> + >>> + *nr_entries = memmap.nr_entries; >>> + return 0; >>> +} >>> + >>> +static int __init xen_get_memory_map_domu(struct e820entry *map, >>> + unsigned *nr_entries) >>> +{ >>> + struct xen_memory_map memmap; >>> + int ret; >>> + >>> + /* >>> + * For domUs, use the psuedo-physical memory map. >> Small typo here. >> >>> + * >>> + * If this is not available, fake up a memory map with a >>> + * single region containing the initial number of pages. >>> + */ >>> + >>> + memmap.nr_entries = *nr_entries; >>> + set_xen_guest_handle(memmap.buffer, map); >>> + >>> + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap); >>> + if (ret == -ENOSYS) { >>> + unsigned long max_pfn = min(MAX_DOMAIN_PAGES, >>> + xen_start_info->nr_pages); >>> + memmap.nr_entries = 1; >>> + map[0].addr = 0ULL; >>> + map[0].size = PFN_PHYS(max_pfn); >>> + /* 8MB slack (to balance backend allocations). */ >>> + map[0].size += 8ULL << 20; >>> + map[0].type = E820_RAM; >>> + } else if (ret < 0) >>> + return ret; >>> + >>> + *nr_entries = memmap.nr_entries; >>> + return 0; >>> +} >>> + >>> +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries) >>> +{ >>> + if (xen_initial_domain()) >>> + return xen_get_memory_map_dom0(map, nr_entries); >>> + else >>> + return xen_get_memory_map_domu(map, nr_entries); >>> +} >>> + >>> static void __init xen_add_extra_mem(u64 start, u64 size) >>> { >>> unsigned long pfn; >>> @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type) >>> char * __init xen_memory_setup(void) >>> { >>> static struct e820entry map[E820MAX] __initdata; >>> + unsigned map_nr_entries = E820MAX; >>> >>> unsigned long max_pfn = xen_start_info->nr_pages; >>> unsigned long long mem_end; >>> int rc; >>> - struct xen_memory_map memmap; >>> unsigned long max_pages; >>> unsigned long last_pfn = 0; >>> unsigned long extra_pages = 0; >>> unsigned long populated; >>> int i; >>> - int op; >>> >>> max_pfn = min(MAX_DOMAIN_PAGES, max_pfn); >>> mem_end = PFN_PHYS(max_pfn); >>> >>> - memmap.nr_entries = E820MAX; >>> - set_xen_guest_handle(memmap.buffer, map); >>> - >>> - op = xen_initial_domain() ? >>> - XENMEM_machine_memory_map : >>> - XENMEM_memory_map; >>> - rc = HYPERVISOR_memory_op(op, &memmap); >>> - if (rc == -ENOSYS) { >>> - BUG_ON(xen_initial_domain()); >>> - memmap.nr_entries = 1; >>> - map[0].addr = 0ULL; >>> - map[0].size = mem_end; >>> - /* 8MB slack (to balance backend allocations). */ >>> - map[0].size += 8ULL << 20; >>> - map[0].type = E820_RAM; >>> - rc = 0; >>> - } >>> + rc = xen_get_memory_map(map, &map_nr_entries); >>> BUG_ON(rc); >>> >>> /* Make sure the Xen-supplied memory map is well-ordered. */ >>> - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries); >>> + sanitize_e820_map(map, map_nr_entries, &map_nr_entries); >>> >>> max_pages = xen_get_max_pages(); >>> if (max_pages > max_pfn) >>> @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void) >>> * this are first released. >>> */ >>> xen_released_pages = xen_set_identity_and_release( >>> - map, memmap.nr_entries, max_pfn); >>> + map, map_nr_entries, max_pfn); >>> >>> /* >>> * Populate back the non-RAM pages and E820 gaps that had been >>> * released. */ >>> - populated = xen_populate_chunk(map, memmap.nr_entries, >>> + populated = xen_populate_chunk(map, map_nr_entries, >>> max_pfn, &last_pfn, xen_released_pages); >>> >>> xen_released_pages -= populated; >>> @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void) >>> extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), >>> extra_pages); >>> i = 0; >>> - while (i < memmap.nr_entries) { >>> + while (i < map_nr_entries) { >>> u64 addr = map[i].addr; >>> u64 size = map[i].size; >>> u32 type = map[i].type; >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel
David Vrabel
2013-Jul-11 11:21 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On 09/07/13 19:45, Konrad Rzeszutek Wilk wrote:> On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote: >> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote: >>> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote: >>>> From: David Vrabel <david.vrabel@citrix.com> >>>> >>>> If there are UNUSABLE regions in the machine memory map, dom0 will >>>> attempt to map them 1:1 which is not permitted by Xen and the kernel >>>> will crash. >>>> >>>> There isn''t anything interesting in the UNUSABLE region that the dom0 >>>> kernel needs access to so we can avoid making the 1:1 mapping and >>>> leave the region as RAM. >>>> >>>> Since the obtaining the memory map for dom0 and domU are now more >>>> different, refactor each into separate functions. >>>> >>>> This fixes a dom0 boot failure if tboot is used (because tboot was >>>> marking a memory region as UNUSABLE). >>> >>> Please also include the serial log that shows the crash. >> >> It''s a domain crash by Xen and it isn''t useful as none of the stack is >> decoded. > > Could you include the E820 at least to get a sense of where and how > this looks? As in - without tboot and then with tboot?This would take time to set up the host again and I don''t think including a specific example of an E820 map with an UNUSABLE region really adds anything useful to the commit log. You can look at some of the previous threads for examples.>>>> +static int __init xen_get_memory_map_dom0(struct e820entry *map, >>>> + unsigned *nr_entries) >>>> +{ >>>> + struct xen_memory_map memmap; >>>> + unsigned i; >>>> + int ret; >>>> + >>>> + /* >>>> + * Dom0 requires access to machine addresses for BIOS data and >>>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects >>>> + * to be able to access these through a 1:1 p2m mapping. >>>> + * >>>> + * We need to take the pseudo physical memory map and set up >>>> + * 1:1 mappings corresponding to the RESERVED regions and >>>> + * holes in the /machine/ memory map, adding/expanding the RAM >>>> + * region at the end of the map for the relocated RAM. >> >> This is the key paragraph. The apparent use of the machine memory map >> for dom0 is a confusing fiction. > > OK, but I don''t follow when dom0 would be using the E820_UNUSED regions. > Is it the xen_do_chunk that is failing on said PFNs? Or is it in this > code xen_set_identity_and_release_chunk: > > "217 /* > 218 * If the PFNs are currently mapped, the VA mapping also needs > 219 * to be updated to be 1:1. > 220 */ > 221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) > 222 (void)HYPERVISOR_update_va_mapping( > 223 (unsigned long)__va(pfn << PAGE_SHIFT), > 224 mfn_pte(pfn, PAGE_KERNEL_IO), 0); > 225 " > > which updates the initial PTE''s with the 1-1 PFN and the E820_UNUSABLE is > somehow in between two E820_RAM regions?It''s here, yes.>>>> + * >>>> + * This is more easily done if we just start with the machine >>>> + * memory map. >>>> + * >>>> + * UNUSABLE regions are awkward, they are not interesting to >>>> + * dom0 and Xen won''t allow them to be mapped so we want to >>>> + * leave these as RAM in the pseudo physical map. >>> >>> We just want them as such in the P2M but not do any PTE creation for it? >>> Why do we care about it? We are not creating any page tables for >>> E820_UNUSABLE regions. >> >> I don''t follow what you''re asking here. > > What code maps said PFNs.See above.>> In dom0, UNUSABLE regions in the machine memory map are RAM regions on >> the pseudo-physical memory map. So instead of playing games and making >> these regions special in the pseudo-physical map we just leave them as RAM. > > .. And then exposing them to the kernel to be used as normal RAM?Yes.> With your change it is. But without your change it would not map it.Incorrect. See above.>>>> + */ >>>> + >>>> + memmap.nr_entries = *nr_entries; >>>> + set_xen_guest_handle(memmap.buffer, map); >>>> + >>>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); >>>> + if (ret < 0) >>>> + return ret; >>>> + >>>> + for (i = 0; i < memmap.nr_entries; i++) { >>>> + if (map[i].type == E820_UNUSABLE) >>> >>> What if the E820_UNUSABLE regions were manufactured by the BIOS? Or >>> somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB >>> box. >> >> The resulting memory map should be clipped by the result of the call to >> xen_get_max_pages(). > > OK. What about the BIOS manufacturing it?What about it? As a PV guest we don''t care what the machine memory map looks like, /except/ as a means to find interesting bits of hardware that we want 1:1 mappings for. David
Konrad Rzeszutek Wilk
2013-Jul-12 19:33 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On Thu, Jul 11, 2013 at 12:21:42PM +0100, David Vrabel wrote:> On 09/07/13 19:45, Konrad Rzeszutek Wilk wrote: > > On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote: > >> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote: > >>> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote: > >>>> From: David Vrabel <david.vrabel@citrix.com> > >>>> > >>>> If there are UNUSABLE regions in the machine memory map, dom0 will > >>>> attempt to map them 1:1 which is not permitted by Xen and the kernel > >>>> will crash. > >>>> > >>>> There isn''t anything interesting in the UNUSABLE region that the dom0 > >>>> kernel needs access to so we can avoid making the 1:1 mapping and > >>>> leave the region as RAM. > >>>> > >>>> Since the obtaining the memory map for dom0 and domU are now more > >>>> different, refactor each into separate functions. > >>>> > >>>> This fixes a dom0 boot failure if tboot is used (because tboot was > >>>> marking a memory region as UNUSABLE). > >>> > >>> Please also include the serial log that shows the crash. > >> > >> It''s a domain crash by Xen and it isn''t useful as none of the stack is > >> decoded. > > > > Could you include the E820 at least to get a sense of where and how > > this looks? As in - without tboot and then with tboot? > > This would take time to set up the host again and I don''t think > including a specific example of an E820 map with an UNUSABLE region > really adds anything useful to the commit log. > > You can look at some of the previous threads for examples. > > >>>> +static int __init xen_get_memory_map_dom0(struct e820entry *map, > >>>> + unsigned *nr_entries) > >>>> +{ > >>>> + struct xen_memory_map memmap; > >>>> + unsigned i; > >>>> + int ret; > >>>> + > >>>> + /* > >>>> + * Dom0 requires access to machine addresses for BIOS data and > >>>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects > >>>> + * to be able to access these through a 1:1 p2m mapping. > >>>> + * > >>>> + * We need to take the pseudo physical memory map and set up > >>>> + * 1:1 mappings corresponding to the RESERVED regions and > >>>> + * holes in the /machine/ memory map, adding/expanding the RAM > >>>> + * region at the end of the map for the relocated RAM. > >> > >> This is the key paragraph. The apparent use of the machine memory map > >> for dom0 is a confusing fiction. > > > > OK, but I don''t follow when dom0 would be using the E820_UNUSED regions. > > Is it the xen_do_chunk that is failing on said PFNs? Or is it in this > > code xen_set_identity_and_release_chunk: > > > > "217 /* > > 218 * If the PFNs are currently mapped, the VA mapping also needs > > 219 * to be updated to be 1:1. > > 220 */ > > 221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) > > 222 (void)HYPERVISOR_update_va_mapping( > > 223 (unsigned long)__va(pfn << PAGE_SHIFT), > > 224 mfn_pte(pfn, PAGE_KERNEL_IO), 0); > > 225 " > > > > which updates the initial PTE''s with the 1-1 PFN and the E820_UNUSABLE is > > somehow in between two E820_RAM regions? > > It''s here, yes. > > >>>> + * > >>>> + * This is more easily done if we just start with the machine > >>>> + * memory map. > >>>> + * > >>>> + * UNUSABLE regions are awkward, they are not interesting to > >>>> + * dom0 and Xen won''t allow them to be mapped so we want to > >>>> + * leave these as RAM in the pseudo physical map. > >>> > >>> We just want them as such in the P2M but not do any PTE creation for it? > >>> Why do we care about it? We are not creating any page tables for > >>> E820_UNUSABLE regions. > >> > >> I don''t follow what you''re asking here. > > > > What code maps said PFNs. > > See above.So the ''HYPERVISOR_update_va_mapping'' fails b/c we include it in the xen_set_identify_and_release_chunk. Why not make the logic that sets "gaps" and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve it as well - and we won''t be messing with the E820.> > >> In dom0, UNUSABLE regions in the machine memory map are RAM regions on > >> the pseudo-physical memory map. So instead of playing games and making > >> these regions special in the pseudo-physical map we just leave them as RAM. > > > > .. And then exposing them to the kernel to be used as normal RAM? > > Yes. > > > With your change it is. But without your change it would not map it. > > Incorrect. See above.It would map it using the hypercall. But it would not create pagetables for it (the generic code that is it).> > >>>> + */ > >>>> + > >>>> + memmap.nr_entries = *nr_entries; > >>>> + set_xen_guest_handle(memmap.buffer, map); > >>>> + > >>>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap); > >>>> + if (ret < 0) > >>>> + return ret; > >>>> + > >>>> + for (i = 0; i < memmap.nr_entries; i++) { > >>>> + if (map[i].type == E820_UNUSABLE) > >>> > >>> What if the E820_UNUSABLE regions were manufactured by the BIOS? Or > >>> somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB > >>> box. > >> > >> The resulting memory map should be clipped by the result of the call to > >> xen_get_max_pages(). > > > > OK. What about the BIOS manufacturing it? > > What about it? As a PV guest we don''t care what the machine memory map > looks like, /except/ as a means to find interesting bits of hardware > that we want 1:1 mappings for.Right but now you are converting it from 1:1 to a RAM region - where we don''t do 1:1.> > David
David Vrabel
2013-Jul-12 21:38 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On 12/07/2013 20:33, Konrad Rzeszutek Wilk wrote:> > So the ''HYPERVISOR_update_va_mapping'' fails b/c we include it in the > xen_set_identify_and_release_chunk.That''s what I understand.> Why not make the logic that sets "gaps" > and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve > it as well - and we won''t be messing with the E820.I suppose we could do what you suggest but there would have to be a xen_release_chunk() function added, otherwise you would waste the frames in that region. I remain unconvinced that adding pointless unusable regions into the dom0''s memory map makes any sense.>>> OK. What about the BIOS manufacturing [unusable regions? >> >> What about it? As a PV guest we don''t care what the machine memory map >> looks like, /except/ as a means to find interesting bits of hardware >> that we want 1:1 mappings for. > > Right but now you are converting it from 1:1 to a RAM region - where we > don''t do 1:1.No, it''s leaving it as a RAM region, as setup by Xen (and as marked as such in the pseudo-physical map). I guess I still haven''t explained very well what the (confusing) setup code is trying to do. 1. Get psuedo-physical memory map. 2. Get machine memory map. 3. For each "interesting" (reserved or MMIO hole) region in the machine memory map: a. Add reserved region or hole to pseudo-physical memory map. b. Release memory. c. Set as 1:1 in p2m. d. Update any existing mappings. The code as-is doesn''t work like that but the end result should be the same. David
Konrad Rzeszutek Wilk
2013-Jul-15 12:48 UTC
Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
On Fri, Jul 12, 2013 at 10:38:13PM +0100, David Vrabel wrote:> On 12/07/2013 20:33, Konrad Rzeszutek Wilk wrote: > > > > So the ''HYPERVISOR_update_va_mapping'' fails b/c we include it in the > > xen_set_identify_and_release_chunk. > > That''s what I understand. > > > Why not make the logic that sets "gaps" > > and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve > > it as well - and we won''t be messing with the E820. > > I suppose we could do what you suggest but there would have to be a > xen_release_chunk() function added, otherwise you would waste the frames > in that region.That is a bit different logic - but yes that would still have to work as before and release the frames.> > I remain unconvinced that adding pointless unusable regions into the > dom0''s memory map makes any sense.Please also keep in mind that domU with PCI passthrough can have an E820 that looks like the host. Meaning this is not just for dom0.> > >>> OK. What about the BIOS manufacturing [unusable regions? > >> > >> What about it? As a PV guest we don''t care what the machine memory map > >> looks like, /except/ as a means to find interesting bits of hardware > >> that we want 1:1 mappings for. > > > > Right but now you are converting it from 1:1 to a RAM region - where we > > don''t do 1:1. > > No, it''s leaving it as a RAM region, as setup by Xen (and as marked as > such in the pseudo-physical map). > > I guess I still haven''t explained very well what the (confusing) setup > code is trying to do. > > 1. Get psuedo-physical memory map. > > 2. Get machine memory map. > > 3. For each "interesting" (reserved or MMIO hole) region in the machine > memory map: > > a. Add reserved region or hole to pseudo-physical memory map. > b. Release memory. > c. Set as 1:1 in p2m. > d. Update any existing mappings. > > The code as-is doesn''t work like that but the end result should be the same.It does right now (thought the operations are in a different order - b, d, a, c). But your point about the E820_UNUSED throws a wrench in here. The amount of pages that a guest has should (after re-organizing the P2M to represent the E820) equal the amount of E820_RAM regions. I am ignoring the balloon case here. That is still the case with ''tboot'' as ''tboot'' consumes some of the E820_RAM regions and converts them in E820_UNUSED. The amount of pages that the guest gets is the total_pages - <pages consumed for E820_UNUSED>. With your patch, you convert the E820_UNUSED to E820_RAM. I believe that what you end up is that the amount of E820_RAM PFNS >= nr_pages. In this case I am ignoring the usage of ''dom0_memory_max'' parameter. Since the only code (since v3.9) that maps E820_UNUSED is the special code you added - the HYPERVISOR_update_va_mapping in xen_set_identity_and_release_chunk, perhaps a better mechanism is for said code to consult the E820 and not call said update_va_mapping for the E820_UNUSED regions. After all, it is OK _not_ to call that for 1-1 regions. It worked before - albeit it was buggy with special (shared) 1-1 regions. Heck, we could make that loop consult the M2P and _only_ call the update_va_mapping on the M2P[pfn] = 0x55555555555. Either way I think that there are two easy solutions that are also candidates for stable tree and also make it possible to release the frames: - Either add some extra logic in xen_set_identity_and_release_chunk to not do the hypercall for E820_UNUSED region - Or do a check in the M2P[pfn] and skip if M2P[pfn]==shared_mfn entry value (thought I have no idea how hard-coded that is in the ABI) in the xen_set_identity_and_release_chunk.