On 12/14/2011 04:53 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Dec 08, 2011 at 09:39:21AM -0500, Daniel De Graaf wrote:
>> I have a system with several reserved ranges low in the e820 map which
>> cause problems when starting PV domains with PCI devices. The machine
>> memory map looks like:
>>
>> (XEN) 0000000000000000 - 0000000000060000 (usable)
>> (XEN) 0000000000060000 - 0000000000068000 (reserved)
>> (XEN) 0000000000068000 - 000000000009ac00 (usable)
>> (XEN) 000000000009ac00 - 00000000000a0000 (reserved)
>> (XEN) 00000000000e0000 - 0000000000100000 (reserved)
>> (XEN) 0000000000100000 - 0000000000800000 (usable)
>> (XEN) 0000000000800000 - 000000000087d000 (unusable)
>> (XEN) 000000000087d000 - 0000000000f00000 (usable)
>> (XEN) 0000000000f00000 - 0000000001000000 (reserved)
>> (XEN) 0000000001000000 - 0000000020000000 (usable)
>> (XEN) 0000000020000000 - 0000000020200000 (reserved)
>> (XEN) 0000000020200000 - 0000000040000000 (usable)
>> (XEN) 0000000040000000 - 0000000040200000 (reserved)
>> (XEN) 0000000040200000 - 00000000c95d6000 (usable)
>> (XEN) 00000000c95d6000 - 00000000c961a000 (reserved)
>> (XEN) 00000000c961a000 - 00000000c99b7000 (usable)
>> (XEN) 00000000c99b7000 - 00000000c99e7000 (reserved)
>> (XEN) 00000000c99e7000 - 00000000c9be7000 (ACPI NVS)
>> (XEN) 00000000c9be7000 - 00000000c9bff000 (ACPI data)
>> (XEN) 00000000c9bff000 - 00000000c9c00000 (usable)
>> (XEN) 00000000c9f00000 - 00000000ca000000 (reserved)
>> (XEN) 00000000cb000000 - 00000000cf200000 (reserved)
>> (XEN) 00000000fed1c000 - 00000000fed30000 (reserved)
>> (XEN) 00000000ffc00000 - 00000000ffc20000 (reserved)
>> (XEN) 0000000100000000 - 000000042e000000 (usable)
>>
>> When e820_sanitize is called on this memory map to create a PV domain, the
>> resulting map has only one usable region (0-0xf00000) below 4GB, and Linux
>> will not boot with this memory map.
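A minimal model of how that single low region could arise. It assumes (as the `Contiguous PFN: 0xf00` debug line further down suggests) that e820_sanitize ends the contiguous low RAM at the first host E820_RESERVED entry at or above 1MB, and that the rest of the guest's memory goes above 4GB. `contiguous_ram_end()` is a hypothetical helper, not the libxl code. The type codes are the standard E820 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_UNUSABLE 5

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical model: the contiguous low RAM ends at the start of the
 * first E820_RESERVED entry at or above 1MB. Entries below 1MB and
 * E820_UNUSABLE entries are ignored. Assumes `map` is sorted by address.
 * Returns `limit` if no such entry starts below it. */
static uint64_t contiguous_ram_end(const struct e820entry *map, size_t nr,
                                   uint64_t limit)
{
    for (size_t i = 0; i < nr; i++) {
        assert(i == 0 || map[i - 1].addr <= map[i].addr);
        if (map[i].type == E820_RESERVED && map[i].addr >= 0x100000 &&
            map[i].addr < limit)
            return map[i].addr;
    }
    return limit;
}
```

Fed the first entries of the host map above (in bytes), this returns 0xf00000, i.e. PFN 0xf00: the 0xf00000-0x1000000 reserved entry caps the low RAM at 15MB.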
>
> OK, that looks like a bug. We could modify e820_sanitize in libxl
> to use the host's E820 and fill in the (usable) regions with the memory
> that is allocated for it, instead of allocating a big chunk at the start
> and then working out the other regions.
>
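A sketch of that suggestion under stated assumptions: host entries are sorted and non-overlapping, sizes are in bytes, and the guest's memory budget is handed out to host RAM entries in address order. `fill_ram_from_host()` is a hypothetical name, not a libxl function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_UNUSABLE 5

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical helper: give `budget` bytes of guest RAM to the host's RAM
 * entries in address order. A RAM entry that only partly fits is split;
 * its tail, like every RAM entry past the budget, becomes E820_UNUSABLE.
 * Non-RAM entries are copied unchanged. `out` needs room for nr + 1
 * entries, since at most one entry is split. Returns the entry count. */
static size_t fill_ram_from_host(const struct e820entry *host, size_t nr,
                                 uint64_t budget, struct e820entry *out)
{
    size_t n = 0;

    assert(nr == 0 || (host != NULL && out != NULL));
    for (size_t i = 0; i < nr; i++) {
        struct e820entry e = host[i];

        if (e.type != E820_RAM) {
            out[n++] = e;
            continue;
        }
        /* Take as much of this RAM entry as the remaining budget allows. */
        uint64_t take = e.size < budget ? e.size : budget;
        budget -= take;
        if (take > 0)
            out[n++] = (struct e820entry){ e.addr, take, E820_RAM };
        if (take < e.size)
            out[n++] = (struct e820entry){ e.addr + take, e.size - take,
                                           E820_UNUSABLE };
    }
    return n;
}
```

Any budget left over after the last host RAM entry is silently dropped in this sketch; a real version would have to place it somewhere.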
>>
>> I have a patch that reworks e820_sanitize to include later RAM regions as
>> valid RAM, which works as long as the domain being booted has permission
>> to map the PFNs from 0x20000-0x20200 and 0x40000-0x40200. If the domain is
>> not given this permission (the default, since these regions are not part of
>> the PCI device being passed to the guest) then the hypervisor crashes the
>> domain when it attempts to map these regions (during init_memory_mapping).
>
> Hmm, so it tries to map (reserved) regions? That seems rather odd.
> I am pretty sure it worked for me. What Linux kernel did you use and
> can you send the guest config file as well please?
Kernel 3.2-rc3; the code doing the mapping is init_memory_mapping in
arch/x86/mm/init.c, which maps all memory up to the end of the highest usable
region below 4G. On this system, with my e820 patch, that may be as high as
0xc9c00000 depending on how much memory you give the guest.
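To make this concrete, a sketch that computes, from a guest e820 (sorted, byte addresses), the end of RAM below 4G, modelled loosely on how Linux derives the `last_pfn` it prints, and counts the E820_RESERVED holes inside that first mapping range. For the patched map in the failing boot below it gives 0x9c900 (matching `last_pfn = 0x9c900`), so the 0x20000-0x20200 and 0x40000-0x40200 holes are inside the range; with the hard-coded limit it gives 0x20000 and neither is. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define PAGE_SHIFT    12
#define FOUR_GB       0x100000000ULL

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* End of the highest E820_RAM entry below 4GB, as a PFN (an entry that
 * crosses 4GB is clipped). A model of the end of the first
 * init_memory_mapping() range seen in the logs. */
static uint64_t low_ram_end_pfn(const struct e820entry *map, size_t nr)
{
    uint64_t end = 0;

    assert(nr == 0 || map != NULL);
    for (size_t i = 0; i < nr; i++) {
        uint64_t e = map[i].addr + map[i].size;

        if (map[i].type != E820_RAM || map[i].addr >= FOUR_GB)
            continue;
        if (e > FOUR_GB)
            e = FOUR_GB;
        if (e > end)
            end = e;
    }
    return end >> PAGE_SHIFT;
}

/* Number of E820_RESERVED entries starting below end_pfn, i.e. holes
 * that lie inside the range the guest maps linearly. */
static size_t reserved_below(const struct e820entry *map, size_t nr,
                             uint64_t end_pfn)
{
    size_t count = 0;

    for (size_t i = 0; i < nr; i++)
        if (map[i].type == E820_RESERVED &&
            (map[i].addr >> PAGE_SHIFT) < end_pfn)
            count++;
    return count;
}
```

Only the 0x20000 and 0x40000 holes distinguish the two maps here; the low 0xa0-0x100 and 0xf00-0x1000 holes are inside the range in both boots.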
Guest config:
kernel='/home/daniel/linux/build/drdom/arch/x86/boot/bzImage'
ramdisk='/boot/initramfs-generic.img'
root='/dev/xvda1'
extra='earlyprintk=xen console=hvc0'
memory=2500
name='vm-2'
vcpus=2
vif = [ 'mac=00:fe:64:01:01:01,bridge=vmbr' ]
disk = [ 'backendtype=phy,vdev=xvda,access=w,target=/dev/clam/vm2' ]
seclabel='v:vm:domPV'
pci=['0000:03:02.0']
PCI device 03:02.0 is an extra NIC that I'm using to test.
>>
>> The domain will boot when these regions are not marked as reserved in the
>> e820 map or when the PFNs 0x20200-0x40000 and 0x40200-0xc95d6 are marked
>> as unusable. However, it is difficult to make this happen in any general
>> case without knowing what reserved regions actually need to be marked as
>> reserved in the guest.
>>
>> If PCI hot-add is not needed, the problem becomes simpler: the PCI regions
>> for assigned devices can be included in the e820 map and other regions can
>> be ignored (marking as RAM so that the guest does not attempt direct map).
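A sketch of that simpler no-hot-add policy, assuming the assigned devices' MMIO ranges are already known as byte ranges (obtaining them, e.g. from the device's BARs, is left out). `keep_assigned_holes()` and `struct mmio_range` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical: an MMIO range of an assigned PCI device, [start, end). */
struct mmio_range {
    uint64_t start;
    uint64_t end;
};

/* Keep an E820_RESERVED entry only if it overlaps one of the assigned
 * devices' MMIO ranges; relabel every other reserved entry as E820_RAM so
 * the guest does not try to map it 1:1. Modifies `map` in place and
 * returns how many entries stay reserved. */
static size_t keep_assigned_holes(struct e820entry *map, size_t nr,
                                  const struct mmio_range *bars, size_t nbars)
{
    size_t kept = 0;

    assert(nbars == 0 || bars != NULL);
    for (size_t i = 0; i < nr; i++) {
        uint64_t start = map[i].addr, end = map[i].addr + map[i].size;
        int used = 0;

        if (map[i].type != E820_RESERVED)
            continue;
        /* Half-open interval overlap test against each device range. */
        for (size_t j = 0; j < nbars && !used; j++)
            used = bars[j].start < end && start < bars[j].end;
        if (used)
            kept++;
        else
            map[i].type = E820_RAM;
    }
    return kept;
}
```

Because it drops every hole not owned by a device present at boot, this only fits the case where no device is hot-added later.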
>>
>> Any suggestions on the best way to resolve this?
>
> I think we could rework e820_allocate to just use the E820 from the host
> and convert the RAM regions that are above map_limitkb to (unusable),
> and then apply the other logic in e820_allocate to convert gaps to
> (unusable).
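A minimal sketch of that rework, assuming sorted, non-overlapping host entries in bytes and that map_limitkb has already been converted to a byte limit (map_limitkb * 1024). `clamp_and_fill()` is a hypothetical name; it does not claim to be what e820_allocate does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_UNUSABLE 5

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Copy the host map, turning RAM at or above `limit` into E820_UNUSABLE
 * (splitting the one RAM entry that may straddle it) and filling every gap
 * between consecutive entries with an E820_UNUSABLE entry. With sorted,
 * non-overlapping input, `out` needs room for 2 * nr entries. */
static size_t clamp_and_fill(const struct e820entry *host, size_t nr,
                             uint64_t limit, struct e820entry *out)
{
    size_t n = 0;
    uint64_t prev_end = 0;

    for (size_t i = 0; i < nr; i++) {
        struct e820entry e = host[i];
        uint64_t end = e.addr + e.size;

        assert(i == 0 || e.addr >= prev_end);
        if (i > 0 && e.addr > prev_end)
            out[n++] = (struct e820entry){ prev_end, e.addr - prev_end,
                                           E820_UNUSABLE };
        if (e.type == E820_RAM && end > limit) {
            if (e.addr < limit) {
                /* Keep the part below the limit as RAM. */
                out[n++] = (struct e820entry){ e.addr, limit - e.addr,
                                               E820_RAM };
                e.addr = limit;
                e.size = end - limit;
            }
            e.type = E820_UNUSABLE;
        }
        out[n++] = e;
        prev_end = end;
    }
    return n;
}
```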
>
> But I would think that "libxl: Convert E820_UNUSABLE and E820_RAM to
> E820_UNUSABLE as appropriate" already takes care of this?
It does in part. The issue is that my memory map has a reserved region
(0xf00000 - 0x1000000) that is too low in the memory map, so almost all
of the RAM below 4G is marked as unusable. I'm pretty sure this includes
the address where the kernel itself is loaded.
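The effect is easy to quantify from the e820_sanitize debug output below, whose `[start -> end]` lines are treated here as half-open PFN ranges. A small sketch with a hypothetical `pfn_range` type that totals RAM pages below PFN 0x100000 (4GB): the unpatched map gives only 0xf00 pages (15MB), the patched one 0x9c400. In both, the total including RAM above 4G is 0x9cc00 pages, which equals the logged Memory plus Balloon (2560000kB + 8192kB) in 4kB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pfn_type { PFN_RAM, PFN_RESERVED, PFN_UNUSABLE };

/* Hypothetical: one "[start -> end] Type" debug line as [start, end). */
struct pfn_range {
    uint64_t start;
    uint64_t end;
    enum pfn_type type;
};

/* Total RAM pages below `limit_pfn` (ranges crossing it are clipped). */
static uint64_t ram_pages_below(const struct pfn_range *map, size_t nr,
                                uint64_t limit_pfn)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < nr; i++) {
        uint64_t end = map[i].end < limit_pfn ? map[i].end : limit_pfn;

        assert(map[i].start <= map[i].end);
        if (map[i].type == PFN_RAM && map[i].start < end)
            pages += end - map[i].start;
    }
    return pages;
}
```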
Boot with my e820 patch adding usable regions where possible:
libxl: debug: libxl_pci.c:237:libxl__create_pci_backend: Creating pci backend
libxl: debug: libxl_pci.c:1240:e820_sanitize: Memory: 2560000kB Balloon: 8192kB Contiguous PFN: 0xf00 PCI region PFNs: 0xf00-0x100000
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [0 -> f00] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [1000 -> 20000] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20200 -> 40000] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40200 -> 9c900] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> f00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> 1000] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20000 -> 20200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40000 -> 40200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [9c900 -> c95d6] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c95d6 -> c961a] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c961a -> c99b7] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99b7 -> c99e7] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99e7 -> c9be7] ACPI NVS
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9be7 -> c9bff] ACPI
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9bff -> c9c00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [cb000 -> cf200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [fed1c -> fed20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [ffc00 -> ffc20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [100000 -> 100800] RAM
Daemon running with PID 11104
(early) [ 0.000000] Initializing cgroup subsys cpuset
(early) [ 0.000000] Initializing cgroup subsys cpu
(early) [ 0.000000] Linux version 3.2.0-rc3-00022-g8462101 (daniel@moss-clam.epoch.ncsc.mil) (gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) ) #35 SMP Wed Dec 7 10:00:40 EST 2011
(early) [ 0.000000] Command line: root=/dev/xvda1 earlyprintk=xen console=hvc0
(early) [ 0.000000] ACPI in unprivileged domain disabled
(early) [ 0.000000] Freeing f00-1000 pfn range: 256 pages freed
(early) [ 0.000000] Freeing 20000-20200 pfn range: 512 pages freed
(early) [ 0.000000] Freeing 40000-40200 pfn range: 512 pages freed
(early) [ 0.000000] Released 1280 pages of unused memory
(early) [ 0.000000] Set 408576 page(s) to 1-1 mapping
(early) [ 0.000000] BIOS-provided physical RAM map:
(early) [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
(early) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
(early) [ 0.000000] Xen: 0000000000100000 - 0000000000f00000 (usable)
(early) [ 0.000000] Xen: 0000000000f00000 - 0000000001000000 (reserved)
(early) [ 0.000000] Xen: 0000000001000000 - 0000000020000000 (usable)
(early) [ 0.000000] Xen: 0000000020000000 - 0000000020200000 (reserved)
(early) [ 0.000000] Xen: 0000000020200000 - 0000000040000000 (usable)
(early) [ 0.000000] Xen: 0000000040000000 - 0000000040200000 (reserved)
(early) [ 0.000000] Xen: 0000000040200000 - 000000009c900000 (usable)
(early) [ 0.000000] Xen: 000000009c900000 - 00000000c95d6000 (unusable)
(early) [ 0.000000] Xen: 00000000c95d6000 - 00000000c961a000 (reserved)
(early) [ 0.000000] Xen: 00000000c961a000 - 00000000c99b7000 (unusable)
(early) [ 0.000000] Xen: 00000000c99b7000 - 00000000c99e7000 (reserved)
(early) [ 0.000000] Xen: 00000000c99e7000 - 00000000c9be7000 (ACPI NVS)
(early) [ 0.000000] Xen: 00000000c9be7000 - 00000000c9bff000 (ACPI data)
(early) [ 0.000000] Xen: 00000000c9bff000 - 00000000c9c00000 (unusable)
(early) [ 0.000000] Xen: 00000000cb000000 - 00000000cf200000 (reserved)
(early) [ 0.000000] Xen: 00000000fed1c000 - 00000000fed20000 (reserved)
(early) [ 0.000000] Xen: 00000000ffc00000 - 00000000ffc20000 (reserved)
(early) [ 0.000000] Xen: 0000000100000000 - 0000000100100000 (usable)
(early) [ 0.000000] Xen: 0000000100100000 - 0000000100800000 (unusable)
(early) [ 0.000000] bootconsole [xenboot0] enabled
(early) [ 0.000000] NX (Execute Disable) protection: active
(early) [ 0.000000] DMI not present or invalid.
(early) [ 0.000000] No AGP bridge found
(early) [ 0.000000] last_pfn = 0x100100 max_arch_pfn = 0x400000000
(early) [ 0.000000] last_pfn = 0x9c900 max_arch_pfn = 0x400000000
(early) [ 0.000000] init_memory_mapping: 0000000000000000-000000009c900000
(XEN) mm.c:866:d7 Non-privileged (7) attempt to map I/O space 00020000
(XEN) mm.c:1270:d7 Failure in alloc_l1_table: entry 0
(XEN) mm.c:2239:d7 Error while validating mfn 2fabf8 (pfn af3) for type 1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:3049:d7 Error while pinning mfn 2fabf8
(XEN) traps.c:486:d7 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
Boot without my patch adjusting the e820 map:
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [0 -> f00] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> f00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> 1000] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [1000 -> 20000] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20000 -> 20200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20200 -> 40000] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40000 -> 40200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40200 -> c95d6] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c95d6 -> c961a] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c961a -> c99b7] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99b7 -> c99e7] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99e7 -> c9be7] ACPI NVS
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9be7 -> c9bff] ACPI
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9bff -> c9c00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [cb000 -> cf200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [fed1c -> fed20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [ffc00 -> ffc20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [100000 -> 19bd00] RAM
Daemon running with PID 18113
--- xl dies at this point due to domain crashing ---
Serial console from that boot:
(XEN) [VT-D]iommu.c:1464: d8:PCI: map 0000:03:02.0
mapping kernel into physical memory
about to get started...
[ 5770.151645] xen-pciback: vpci: 0000:03:02.0: assign to virtual slot 0
[ 5770.153834] ADDRCONF(NETDEV_UP): vif8.0: link is not ready
[ 5770.168648] pciback 0000:03:02.0: device has been assigned to another domain! Over-writting the ownership, but beware.
[ 5770.240228] device vif8.0 entered promiscuous mode
[ 5770.248595] ADDRCONF(NETDEV_UP): vif8.0: link is not ready
[ 0.000000] Initializing cgroup subsys cpuset
(XEN) [VT-D]iommu.c:1594: d8:PCI: unmap 0000:03:02.0
(Ignore the pciback message; the result is the same if I reboot to get rid
of it.)
Working boot with hard-coded lower-RAM limit PFN=0x20000:
libxl: debug: libxl_pci.c:237:libxl__create_pci_backend: Creating pci backend
libxl: debug: libxl_pci.c:1240:e820_sanitize: Memory: 2560000kB Balloon: 8192kB Contiguous PFN: 0xf00 PCI region PFNs: 0xf00-0x100000
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [0 -> f00] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [1000 -> 20000] RAM
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> f00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [f00 -> 1000] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20000 -> 20200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [20200 -> 40000] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40000 -> 40200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [40200 -> c95d6] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c95d6 -> c961a] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c961a -> c99b7] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99b7 -> c99e7] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c99e7 -> c9be7] ACPI NVS
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9be7 -> c9bff] ACPI
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [c9bff -> c9c00] Unusable
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [cb000 -> cf200] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [fed1c -> fed20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [ffc00 -> ffc20] Reserved
libxl: debug: libxl_pci.c:1326:e820_sanitize: : [100000 -> 17cd00] RAM
Daemon running with PID 24900
(early) [ 0.000000] Initializing cgroup subsys cpuset
(early) [ 0.000000] Initializing cgroup subsys cpu
(early) [ 0.000000] Linux version 3.2.0-rc3-00022-g8462101 (daniel@moss-clam.epoch.ncsc.mil) (gcc version 4.6.1 20110908 (Red Hat 4.6.1-9) (GCC) ) #35 SMP Wed Dec 7 10:00:40 EST 2011
(early) [ 0.000000] Command line: root=/dev/xvda1 earlyprintk=xen console=hvc0
(early) [ 0.000000] ACPI in unprivileged domain disabled
(early) [ 0.000000] Freeing f00-1000 pfn range: 256 pages freed
(early) [ 0.000000] Freeing 20000-9c400 pfn range: 508928 pages freed
(early) [ 0.000000] Released 509184 pages of unused memory
(early) [ 0.000000] Set 917760 page(s) to 1-1 mapping
(early) [ 0.000000] BIOS-provided physical RAM map:
(early) [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
(early) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
(early) [ 0.000000] Xen: 0000000000100000 - 0000000000f00000 (usable)
(early) [ 0.000000] Xen: 0000000000f00000 - 0000000001000000 (reserved)
(early) [ 0.000000] Xen: 0000000001000000 - 0000000020000000 (usable)
(early) [ 0.000000] Xen: 0000000020000000 - 0000000020200000 (reserved)
(early) [ 0.000000] Xen: 0000000020200000 - 0000000040000000 (unusable)
(early) [ 0.000000] Xen: 0000000040000000 - 0000000040200000 (reserved)
(early) [ 0.000000] Xen: 0000000040200000 - 00000000c95d6000 (unusable)
(early) [ 0.000000] Xen: 00000000c95d6000 - 00000000c961a000 (reserved)
(early) [ 0.000000] Xen: 00000000c961a000 - 00000000c99b7000 (unusable)
(early) [ 0.000000] Xen: 00000000c99b7000 - 00000000c99e7000 (reserved)
(early) [ 0.000000] Xen: 00000000c99e7000 - 00000000c9be7000 (ACPI NVS)
(early) [ 0.000000] Xen: 00000000c9be7000 - 00000000c9bff000 (ACPI data)
(early) [ 0.000000] Xen: 00000000c9bff000 - 00000000c9c00000 (unusable)
(early) [ 0.000000] Xen: 00000000cb000000 - 00000000cf200000 (reserved)
(early) [ 0.000000] Xen: 00000000fed1c000 - 00000000fed20000 (reserved)
(early) [ 0.000000] Xen: 00000000ffc00000 - 00000000ffc20000 (reserved)
(early) [ 0.000000] Xen: 0000000100000000 - 000000017c600000 (usable)
(early) [ 0.000000] Xen: 000000017c600000 - 000000017cd00000 (unusable)
(early) [ 0.000000] bootconsole [xenboot0] enabled
(early) [ 0.000000] NX (Execute Disable) protection: active
(early) [ 0.000000] DMI not present or invalid.
(early) [ 0.000000] No AGP bridge found
(early) [ 0.000000] last_pfn = 0x17c600 max_arch_pfn = 0x400000000
(early) [ 0.000000] last_pfn = 0x20000 max_arch_pfn = 0x400000000
(early) [ 0.000000] init_memory_mapping: 0000000000000000-0000000020000000
(early) [ 0.000000] init_memory_mapping: 0000000100000000-000000017c600000
(early) [ 0.000000] RAMDISK: 02a5a000 - 03a62000
...
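For comparison between the two boots: the page counts in the guest's `Freeing ... pfn range` and `Released ... pages` lines are just the sizes of the listed PFN ranges, 1280 pages with the patched map versus 509184 pages (about 1.94GB) with the hard-coded limit. A trivial check, using a hypothetical `freed_range` type:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a "Freeing <start>-<end> pfn range" line as [start, end). */
struct freed_range {
    uint64_t start;
    uint64_t end;
};

/* Sum of the pages in each freed range, i.e. the "Released N pages" total. */
static uint64_t released_pages(const struct freed_range *r, size_t nr)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nr; i++) {
        assert(r[i].start <= r[i].end);
        total += r[i].end - r[i].start;
    }
    return total;
}
```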