Konrad Rzeszutek Wilk
2011-Jul-25 15:54 UTC
[Xen-devel] git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
Hey Andy,

I just started testing linus/master and found out that I get this bootup error:

mapping kernel into physical memory
about to get started...
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.0.0-rc1-00169-gae7bd11 (konrad@phenom) (gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) (GCC) ) #1 SMP PREEMPT Mon Jul 25 10:55:02 EDT 2011
[ 0.000000] Command line: console=hvc0 debug earlyprintk=xenboot
[ 0.000000] ACPI in unprivileged domain disabled
[ 0.000000] released 0 pages of unused memory
[ 0.000000] Set 0 page(s) to 1-1 mapping.
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
[ 0.000000] Xen: 0000000000100000 - 0000000080000000 (usable)
[ 0.000000] Xen: 0000000100000000 - 0000000100800000 (usable)
[ 0.000000] bootconsole [xenboot0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable) [ 0.000000] No AGP bridge found [ 0.000000] last_pfn = 0x100800 max_arch_pfn = 0x400000000 [ 0.000000] last_pfn = 0x80000 max_arch_pfn = 0x400000000 [ 0.000000] initial memory mapped : 0 - 100e2000 [ 0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480 [ 0.000000] init_memory_mapping: 0000000000000000-0000000080000000 [ 0.000000] 0000000000 - 0080000000 page 4k [ 0.000000] kernel direct mapping tables up to 80000000 @ 7fbfd000-80000000 [ 0.000000] xen: setting RW the range 7ff76000 - 80000000 [ 0.000000] init_memory_mapping: 0000000100000000-0000000100800000 [ 0.000000] 0100000000 - 0100800000 page 4k [ 0.000000] kernel direct mapping tables up to 100800000 @ 7f3f3000-7fbfd000 [ 0.000000] xen: setting RW the range 7f3f8000 - 7fbfd000 [ 0.000000] RAMDISK: 01b6f000 - 100e2000 [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at 0000000000000000-0000000100800000 [ 0.000000] Initmem setup node 0 0000000000000000-0000000100800000 [ 0.000000] NODE_DATA [000000007fffb000 - 000000007fffffff] [ 0.000000] Zone PFN ranges: [ 0.000000] DMA 0x00000010 -> 0x00001000 [ 0.000000] DMA32 0x00001000 -> 0x00100000 [ 0.000000] Normal 0x00100000 -> 0x00100800 [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[3] active PFN ranges [ 0.000000] 0: 0x00000010 -> 0x000000a0 [ 0.000000] 0: 0x00000100 -> 0x00080000 [ 0.000000] 0: 0x00100000 -> 0x00100800 [ 0.000000] On node 0 totalpages: 526224 [ 0.000000] DMA zone: 56 pages used for memmap [ 0.000000] DMA zone: 5 pages reserved [ 0.000000] DMA zone: 3923 pages, LIFO batch:0 [ 0.000000] DMA32 zone: 14280 pages used for memmap [ 0.000000] DMA32 zone: 505912 pages, LIFO batch:31 [ 0.000000] Normal zone: 28 pages used for memmap [ 0.000000] Normal zone: 2020 pages, LIFO batch:0 (XEN) mm.c:940:d10 Error getting mfn 1888 (pfn 
1e3e48) from L1 entry 8000000001888465 for l1e_owner=10, pg_owner=10 (XEN) mm.c:5049:d10 ptwr_emulate: could not get_page_from_l1e() [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) [ 0.000000] IP: [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 [ 0.000000] PGD 0 [ 0.000000] Oops: 0003 [#1] PREEMPT SMP [ 0.000000] CPU 0 [ 0.000000] Modules linked in: [ 0.000000] [ 0.000000] Pid: 0, comm: swapper Not tainted 3.0.0-rc1-00169-gae7bd11 #1 [ 0.000000] RIP: e030:[<ffffffff8103a930>] [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 [ 0.000000] RSP: e02b:ffffffff81801df8 EFLAGS: 00010097 [ 0.000000] RAX: 0000000000000000 RBX: ffff88000193dff8 RCX: ffffffffff5ff000 [ 0.000000] RDX: 0000000010000001 RSI: 8000000001888465 RDI: ffff88000193dff8 [ 0.000000] RBP: ffffffff81801e18 R08: 0000000000000000 R09: 0000000000007ff0 [ 0.000000] R10: aaaaaaaaaaaaaaaa R11: aaaaaaaaaaaaaaaa R12: 8000000001888465 [ 0.000000] R13: 000000000e573000 R14: 0000000080000000 R15: 0000000000000000 [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81889000(0000) knlGS:0000000000000000 [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.000000] CR2: 0000000000000000 CR3: 0000000001803000 CR4: 0000000000000660 [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020) [ 0.000000] Stack: [ 0.000000] ffffffffff5ff000 8000000001888465 ffffffffff5ff000 8000000001888465 [ 0.000000] ffffffff81801e38 ffffffff8106db53 0000000000000800 8000000001888465 [ 0.000000] ffffffff81801e48 ffffffff8106dbc0 ffffffff81801e58 ffffffff810720f6 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff8106db53>] set_pte_vaddr_pud+0x43/0x60 [ 0.000000] [<ffffffff8106dbc0>] set_pte_vaddr+0x50/0x70 [ 0.000000] [<ffffffff810720f6>] __native_set_fixmap+0x26/0x30 [ 0.000000] [<ffffffff810387e1>] xen_set_fixmap+0xa1/0x160 [ 0.000000] 
[<ffffffff818a3fa4>] map_vsyscall+0x50/0x55 [ 0.000000] [<ffffffff818a355a>] setup_arch+0xab1/0xb5d [ 0.000000] [<ffffffff8103aa3f>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e [ 0.000000] [<ffffffff815a8fc5>] ? printk+0x3c/0x3e [ 0.000000] [<ffffffff8189da0c>] start_kernel+0xd8/0x3c7 [ 0.000000] [<ffffffff8189d346>] x86_64_start_reservations+0x131/0x135 [ 0.000000] [<ffffffff818a096f>] xen_start_kernel+0x5cf/0x5d6 [ 0.000000] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d f0 4c 89 65 f8 48 89 fb 49 89 f4 e8 55 ab 02 00 83 f8 01 74 10 <4c> 89 23 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 00 ff 14 25 80 5a [ 0.000000] RIP [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 [ 0.000000] RSP <ffffffff81801df8> [ 0.000000] CR2: 0000000000000000 [ 0.000000] ---[ end trace a7919e7f17c0a725 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] Pid: 0, comm: swapper Tainted: G D 3.0.0-rc1-00169-gae7bd11 #1 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff815a8e72>] panic+0x96/0x1ad [ 0.000000] [<ffffffff8108e9a1>] do_exit+0x7e1/0x960 [ 0.000000] [<ffffffff8108ac4a>] ? kmsg_dump+0xca/0x110 [ 0.000000] [<ffffffff815ad4cb>] oops_end+0xab/0xf0 [ 0.000000] [<ffffffff8106e343>] no_context+0xf3/0x260 [ 0.000000] [<ffffffff8106e5d5>] __bad_area_nosemaphore+0x125/0x1e0 [ 0.000000] [<ffffffff8103ab8e>] ? xen_restore_fl+0x3e/0x80 [ 0.000000] [<ffffffff8106e69e>] bad_area_nosemaphore+0xe/0x10 [ 0.000000] [<ffffffff815af426>] do_page_fault+0x306/0x4e0 [ 0.000000] [<ffffffff818bedde>] ? memblock_find_region+0x45/0x7b [ 0.000000] [<ffffffff818bedde>] ? memblock_find_region+0x45/0x7b [ 0.000000] [<ffffffff818bf406>] ? memblock_add_region+0x7f/0x3ef [ 0.000000] [<ffffffff818bf101>] ? memblock_init+0x79/0xbf [ 0.000000] [<ffffffff8103ab8e>] ? xen_restore_fl+0x3e/0x80 [ 0.000000] [<ffffffff815ac885>] page_fault+0x25/0x30 [ 0.000000] [<ffffffff8103a930>] ? xen_set_pte+0x20/0xe0 [ 0.000000] [<ffffffff8103a92b>] ? 
xen_set_pte+0x1b/0xe0
[ 0.000000] [<ffffffff8106db53>] set_pte_vaddr_pud+0x43/0x60
[ 0.000000] [<ffffffff8106dbc0>] set_pte_vaddr+0x50/0x70
[ 0.000000] [<ffffffff810720f6>] __native_set_fixmap+0x26/0x30
[ 0.000000] [<ffffffff810387e1>] xen_set_fixmap+0xa1/0x160
[ 0.000000] [<ffffffff818a3fa4>] map_vsyscall+0x50/0x55
[ 0.000000] [<ffffffff818a355a>] setup_arch+0xab1/0xb5d
[ 0.000000] [<ffffffff8103aa3f>] ? __raw_callee_save_xen_restore_fl+0x11/0x1e
[ 0.000000] [<ffffffff815a8fc5>] ? printk+0x3c/0x3e
[ 0.000000] [<ffffffff8189da0c>] start_kernel+0xd8/0x3c7
[ 0.000000] [<ffffffff8189d346>] x86_64_start_reservations+0x131/0x135
[ 0.000000] [<ffffffff818a096f>] xen_start_kernel+0x5cf/0x5d6

Using git bisect (see attached bisection log) I've narrowed it down to this commit:

commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7
Author: Andy Lutomirski <luto@MIT.EDU>
Date: Sun Jun 5 13:50:19 2011 -0400

    x86-64: Give vvars their own page

    Move vvars out of the vsyscall page into their own page and mark it NX.

Please see the attached .config file.

The guest config is as follows:

kernel="/home/konrad/ssd/xtt/dist/common/vmlinuz"
ramdisk="/home/konrad/ssd/xtt/dist/common/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
vcpus=4
name="latest"
on_crash="preserve"
vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']

And I am using the Xen 4.1.1 hypervisor.

This wiki page: http://wiki.xensource.com/xenwiki/XenParavirtOps has details on how to compile Xen, pvops, etc.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
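As a reading aid for the "(XEN) mm.c:940" message in the log above, here is a minimal sketch in C that splits an L1 entry such as 8000000001888465 into its frame number and flag bits. It uses the architectural x86-64 page-table entry layout only. The helper names pte_frame() and pte_describe() are made up for illustration, and treating bit 10 as the kernel's _PAGE_IOMAP flag of that era is an assumption; architecturally it is just a software-available bit.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural x86-64 page-table entry bits for a 4 KiB mapping. */
#define PTE_PRESENT    (1ULL << 0)
#define PTE_RW         (1ULL << 1)
#define PTE_USER       (1ULL << 2)
#define PTE_ACCESSED   (1ULL << 5)
#define PTE_DIRTY      (1ULL << 6)
#define PTE_GLOBAL     (1ULL << 8)
/* Software-available bit; ASSUMED here to be what the guest used as _PAGE_IOMAP. */
#define PTE_SW10       (1ULL << 10)
#define PTE_NX         (1ULL << 63)
#define PTE_FRAME_MASK 0x000ffffffffff000ULL	/* bits 12..51 */

/* Frame number stored in the entry (Xen reports it as the mfn). */
static uint64_t pte_frame(uint64_t pte)
{
	return (pte & PTE_FRAME_MASK) >> 12;
}

/* Write a short description such as "frame=0x1 P RW" into buf. */
static void pte_describe(uint64_t pte, char *buf, size_t len)
{
	snprintf(buf, len, "frame=0x%llx%s%s%s%s%s%s%s%s",
		 (unsigned long long)pte_frame(pte),
		 (pte & PTE_PRESENT)  ? " P"    : "",
		 (pte & PTE_RW)       ? " RW"   : "",
		 (pte & PTE_USER)     ? " US"   : "",
		 (pte & PTE_ACCESSED) ? " A"    : "",
		 (pte & PTE_DIRTY)    ? " D"    : "",
		 (pte & PTE_GLOBAL)   ? " G"    : "",
		 (pte & PTE_SW10)     ? " SW10" : "",
		 (pte & PTE_NX)       ? " NX"   : "");
}
```

For the entry in the log, pte_frame(0x8000000001888465ULL) gives 0x1888, which matches the "mfn 1888" Xen complains about, and the flags decode as present, user, accessed, dirty, bit 10 and NX, with RW clear. That is consistent with a read-only, non-executable, user-visible page, but what bit 10 means to the guest kernel is the assumption noted above.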
Konrad Rzeszutek Wilk
2011-Jul-25 16:10 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Mon, Jul 25, 2011 at 11:54:42AM -0400, Konrad Rzeszutek Wilk wrote:> Hey Andy, > > I just started testing linus/master and found out that I get this bootup error: > > mapping kernel into physical memory > about to get started... > (XEN) mm.c:940:d10 Error getting mfn 1888 (pfn 1e3e48) from L1 entry 8000000001888465 for l1e_owner=10, pg_owner=10 > (XEN) mm.c:5049:d10 ptwr_emulate: could not get_page_from_l1e() > [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) > [ 0.000000] IP: [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 > [ 0.000000] PGD 0 > [ 0.000000] Oops: 0003 [#1] PREEMPT SMP > [ 0.000000] CPU 0 > [ 0.000000] Modules linked in: > [ 0.000000] > [ 0.000000] Pid: 0, comm: swapper Not tainted 3.0.0-rc1-00169-gae7bd11 #1 > [ 0.000000] RIP: e030:[<ffffffff8103a930>] [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 > [ 0.000000] RSP: e02b:ffffffff81801df8 EFLAGS: 00010097 > [ 0.000000] RAX: 0000000000000000 RBX: ffff88000193dff8 RCX: ffffffffff5ff000 > [ 0.000000] RDX: 0000000010000001 RSI: 8000000001888465 RDI: ffff88000193dff8 > [ 0.000000] RBP: ffffffff81801e18 R08: 0000000000000000 R09: 0000000000007ff0 > [ 0.000000] R10: aaaaaaaaaaaaaaaa R11: aaaaaaaaaaaaaaaa R12: 8000000001888465 > [ 0.000000] R13: 000000000e573000 R14: 0000000080000000 R15: 0000000000000000 > [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81889000(0000) knlGS:0000000000000000 > [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.000000] CR2: 0000000000000000 CR3: 0000000001803000 CR4: 0000000000000660 > [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020) > [ 0.000000] Stack: > [ 0.000000] ffffffffff5ff000 8000000001888465 ffffffffff5ff000 8000000001888465 > [ 0.000000] ffffffff81801e38 ffffffff8106db53 0000000000000800 8000000001888465 > [ 0.000000] 
ffffffff81801e48 ffffffff8106dbc0 ffffffff81801e58 ffffffff810720f6
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff8106db53>] set_pte_vaddr_pud+0x43/0x60
> [ 0.000000] [<ffffffff8106dbc0>] set_pte_vaddr+0x50/0x70

This tiny patch fixes the bootup:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f987bde..0e4c13c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 # endif
 #else
 	case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
+	case VVAR_PAGE:
 #endif
 	case FIX_TEXT_POKE0:
 	case FIX_TEXT_POKE1:

However, this is what I get later on, any ideas?

(early) [ 0.000000] Initializing cgroup subsys cpuset
(early) [ 0.000000] Initializing cgroup subsys cpu
(early) [ 0.000000] Linux version 3.0.0-03370-gb6844e8-dirty (konrad@phenom) (gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) (GCC) ) #1 SMP PREEMPT Mon Jul 25 12:01:00 EDT 2011
(early) [ 0.000000] Command line: console=hvc0 debug earlyprintk=xenboot
(early) [ 0.000000] ACPI in unprivileged domain disabled
(early) [ 0.000000] released 0 pages of unused memory
(early) [ 0.000000] Set 0 page(s) to 1-1 mapping.
(early) [ 0.000000] BIOS-provided physical RAM map:
(early) [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
(early) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
(early) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable)
(early) [ 0.000000] bootconsole [xenboot0] enabled
(early) [ 0.000000] NX (Execute Disable) protection: active
(early) [ 0.000000] DMI not present or invalid.
(early) [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (early) (usable)(early) ==> (early) (reserved)(early) (early) [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (early) (usable)(early) (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] last_pfn = 0x80800 max_arch_pfn = 0x400000000 (early) [ 0.000000] initial memory mapped : 0 - 102ea000 (early) [ 0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480 (early) [ 0.000000] init_memory_mapping: 0000000000000000-0000000080800000 (early) [ 0.000000] 0000000000 - 0080800000 page 4k (early) [ 0.000000] kernel direct mapping tables up to 80800000 @ 7fbf8000-80000000 (early) [ 0.000000] xen: setting RW the range 7ff76000 - 80000000 (early) [ 0.000000] RAMDISK: 01b76000 - 102ea000 (early) [ 0.000000] No NUMA configuration found (early) [ 0.000000] Faking a node at 0000000000000000-0000000080800000 (early) [ 0.000000] Initmem setup node 0 0000000000000000-0000000080800000 (early) [ 0.000000] NODE_DATA [000000007fffb000 - 000000007fffffff] (early) [ 0.000000] Zone PFN ranges: (early) [ 0.000000] DMA (early) 0x00000010 -> 0x00001000 (early) [ 0.000000] DMA32 (early) 0x00001000 -> 0x00100000 (early) [ 0.000000] Normal (early) empty (early) [ 0.000000] Movable zone start PFN for each node (early) [ 0.000000] early_node_map[2] active PFN ranges (early) [ 0.000000] 0: 0x00000010 -> 0x000000a0 (early) [ 0.000000] 0: 0x00000100 -> 0x00080800 (early) [ 0.000000] On node 0 totalpages: 526224 (early) [ 0.000000] DMA zone: 56 pages used for memmap (early) [ 0.000000] DMA zone: 5 pages reserved (early) [ 0.000000] DMA zone: 3923 pages, LIFO batch:0 (early) [ 0.000000] DMA32 zone: 7140 pages used for memmap (early) [ 0.000000] DMA32 zone: 515100 pages, LIFO batch:31 (early) [ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs (early) [ 0.000000] No local APIC present (early) [ 0.000000] APIC: disable apic facility (early) [ 0.000000] APIC: switched to apic NOOP (early) [ 
0.000000] nr_irqs_gsi: 16 (early) [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000 (early) [ 0.000000] Allocating PCI resources starting at 80800000 (gap: 80800000:7f800000) (early) [ 0.000000] Booting paravirtualized kernel on Xen (early) [ 0.000000] Xen version: 4.2-unstable (preserve-AD) (early) [ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4 nr_node_ids:1 (early) [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fb88000 s81984 r8192 d24512 u114688 (early) [ 0.000000] pcpu-alloc: s81984 r8192 d24512 u114688 alloc=28*4096(early) (early) [ 0.000000] pcpu-alloc: (early) [0] (early) 0 (early) [0] (early) 1 (early) [0] (early) 2 (early) [0] (early) 3 (early) (early) [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 519023 (early) [ 0.000000] Policy zone: DMA32 (early) [ 0.000000] Kernel command line: console=hvc0 debug earlyprintk=xenboot (early) [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes) (early) [ 0.000000] Checking aperture... (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] Calgary: detecting Calgary via BIOS EBDA area (early) [ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing! (early) [ 0.000000] Memory: 1809780k/2105344k available (5937k kernel code, 448k absent, 295116k reserved, 2812k data, 692k init) (early) [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 (early) [ 0.000000] Preemptible hierarchical RCU implementation. (early) [ 0.000000] NR_IRQS:16640 nr_irqs:304 16 (early) [ 0.000000] Console: colour dummy device 80x25 (early) [ 0.000000] console [tty0] enabled [ 0.000000] console [hvc0] enabled, bootconsole disabled (early) [ 0.000000] console [hvc0] enabled, bootconsole disabled [ 0.000000] Xen: using vcpuop timer interface [ 0.000000] installing Xen timer for CPU 0 [ 0.000000] Detected 3000.206 MHz processor. 
[ 0.000000] Marking TSC unstable due to TSCs unsynchronized [ 0.000999] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.41 BogoMIPS (lpj=3000206) [ 0.000999] pid_max: default: 32768 minimum: 301 [ 0.000999] Security Framework initialized [ 0.000999] SELinux: Initializing. [ 0.000999] SELinux: Starting in permissive mode [ 0.000999] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.001401] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.001684] Mount-cache hash table entries: 256 [ 0.001854] Initializing cgroup subsys cpuacct [ 0.001865] Initializing cgroup subsys freezer [ 0.001907] tseg: 0000000000 [ 0.001919] CPU: Physical Processor ID: 0 [ 0.001924] CPU: Processor Core ID: 1 [ 0.001978] SMP alternatives: switching to UP code [ 0.002100] cpu 0 spinlock event irq 17 [ 0.002156] Performance Events: [ 0.002162] no APIC, boot with the "lapic" boot parameter to force-enable it. [ 0.002168] no hardware sampling interrupt available. [ 0.002192] Broken PMU hardware detected, using software events only. [ 0.008054] MCE: In-kernel MCE decoding enabled. 
[ 0.008079] NMI watchdog disabled (cpu0): hardware events not enabled [ 0.014023] installing Xen timer for CPU 1 [ 0.014061] cpu 1 spinlock event irq 23 [ 0.014135] SMP alternatives: switching to SMP code [ 0.015693] NMI watchdog disabled (cpu1): hardware events not enabled [ 0.021060] installing Xen timer for CPU 2 [ 0.021128] cpu 2 spinlock event irq 29 [ 0.021434] NMI watchdog disabled (cpu2): hardware events not enabled [ 0.027063] installing Xen timer for CPU 3 [ 0.027108] cpu 3 spinlock event irq 35 [ 0.027359] NMI watchdog disabled (cpu3): hardware events not enabled [ 0.029054] Brought up 4 CPUs [ 0.029163] kworker/u:0 used greatest stack depth: 5512 bytes left [ 0.029177] Grant table initialized [ 0.048829] RTC time: 165:165:165, date: 165/165/65 [ 0.048893] NET: Registered protocol family 16 [ 0.049057] Extended Config Space enabled on 0 nodes [ 0.050337] PCI: setting up Xen PCI frontend stub [ 0.050344] PCI: pci_cache_line_size set to 64 bytes [ 0.059043] bio: create slab <bio-0> at 0 [ 0.060045] ACPI: Interpreter disabled. [ 0.060045] xen/balloon: Initialising balloon driver. [ 0.060045] last_pfn = 0x80800 max_arch_pfn = 0x400000000 [ 0.063052] xen-balloon: Initialising balloon driver. 
[ 0.063079] vgaarb: loaded [ 0.064036] usbcore: registered new interface driver usbfs [ 0.064062] usbcore: registered new interface driver hub [ 0.064062] usbcore: registered new device driver usb [ 0.064062] PCI: System does not support PCI [ 0.064062] PCI: System does not support PCI [ 0.064062] NetLabel: Initializing [ 0.064062] NetLabel: domain hash size = 128 [ 0.064062] NetLabel: protocols = UNLABELED CIPSOv4 [ 0.064062] NetLabel: unlabeled traffic allowed by default [ 0.065036] Switching to clocksource xen [ 0.065243] Switched to NOHz mode on CPU #2 [ 0.065423] Switched to NOHz mode on CPU #3 [ 0.065942] Switched to NOHz mode on CPU #0 [ 0.065993] Switched to NOHz mode on CPU #1 [ 0.067462] pnp: PnP ACPI: disabled [ 0.072379] PCI: max bus depth: 0 pci_try_num: 1 [ 0.072421] NET: Registered protocol family 2 [ 0.072609] IP route cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.073921] TCP established hash table entries: 262144 (order: 10, 4194304 bytes) [ 0.075229] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) [ 0.075493] TCP: Hash tables configured (established 262144 bind 65536) [ 0.075506] TCP reno registered [ 0.075526] UDP hash table entries: 1024 (order: 3, 32768 bytes) [ 0.075550] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) [ 0.075642] NET: Registered protocol family 1 [ 0.075869] RPC: Registered named UNIX socket transport module. [ 0.075877] RPC: Registered udp transport module. [ 0.075882] RPC: Registered tcp transport module. [ 0.075887] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 0.075895] PCI: CLS 0 bytes, default 64 [ 0.076004] Trying to unpack rootfs image as initramfs... 
[ 0.328726] Freeing initrd memory: 237008k freed [ 0.390049] platform rtc_cmos: registered platform RTC device (no PNP device found) [ 0.391356] Machine check injector initialized [ 0.392159] microcode: CPU0: patch_level=0x010000bf [ 0.392193] microcode: CPU1: patch_level=0x010000bf [ 0.392278] microcode: CPU2: patch_level=0x010000bf [ 0.392373] microcode: CPU3: patch_level=0x010000bf [ 0.392489] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 0.392997] audit: initializing netlink socket (disabled) [ 0.393027] type=2000 audit(1311610075.824:1): initialized [ 0.407315] HugeTLB registered 2 MB page size, pre-allocated 0 pages [ 0.412966] VFS: Disk quotas dquot_6.5.2 [ 0.413113] Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.413840] NTFS driver 2.1.30 [Flags: R/W]. [ 0.414131] msgmni has been set to 3997 [ 0.414372] SELinux: Registering netfilter hooks [ 0.415310] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) [ 0.415327] io scheduler noop registered [ 0.415332] io scheduler deadline registered [ 0.415434] io scheduler cfq registered (default) [ 0.415794] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 [ 0.464230] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.526788] Non-volatile memory driver v1.3 [ 0.526798] Linux agpgart interface v0.103 [ 0.527367] [drm] Initialized drm 1.1.0 20060810 [ 0.530271] brd: module loaded [ 0.531723] loop: module loaded [ 0.532369] Fixed MDIO Bus: probed [ 0.532902] tun: Universal TUN/TAP device driver, 1.6 [ 0.532913] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> [ 0.533296] ehci_hcd: USB 2.0 ''Enhanced'' Host Controller (EHCI) Driver [ 0.533305] ehci_hcd: block sizes: qh 104 qtd 96 itd 192 sitd 96 [ 0.533436] ohci_hcd: USB 1.1 ''Open'' Host Controller (OHCI) Driver [ 0.533448] ohci_hcd: block sizes: ed 80 td 96 [ 0.533574] uhci_hcd: USB Universal Host Controller Interface driver [ 0.533768] usbcore: registered new interface driver usblp [ 
0.533851] usbcore: registered new interface driver libusual [ 0.534155] i8042: PNP: No PS/2 controller found. Probing ports directly. [ 0.534978] i8042: No controller found [ 0.535254] mousedev: PS/2 mouse device common for all mice [ 0.575759] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0 [ 0.575892] rtc_cmos: probe of rtc_cmos failed with error -38 [ 0.576261] cpuidle: using governor ladder [ 0.576279] cpuidle: using governor menu [ 0.576284] EFI Variables Facility v0.08 2004-May-17 [ 0.576395] zram: num_devices not specified. Using default: 1 [ 0.576403] zram: Creating 1 devices ... [ 0.576806] Netfilter messages via NETLINK v0.30. [ 0.576828] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 0.577133] ctnetlink v0.93: registering with nfnetlink. [ 0.577590] ip_tables: (C) 2000-2006 Netfilter Core Team [ 0.577641] TCP cubic registered [ 0.577646] Initializing XFRM netlink socket [ 0.578159] NET: Registered protocol family 10 [ 0.578734] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 0.578788] IPv6 over IPv4 tunneling driver [ 0.579680] NET: Registered protocol family 17 [ 0.579717] Registering the dns_resolver key type [ 0.580246] PM: Hibernation image not present or could not be loaded. [ 0.580268] registered taskstats version 1 [ 0.580306] XENBUS: Device with no driver: device/vif/0 [ 0.580311] XENBUS: Device with no driver: device/vfb/0 [ 0.580324] XENBUS: Device with no driver: device/vkbd/0 [ 0.580338] Magic number: 1:252:3141 [ 0.580459] powernow-k8: Found 1 AMD Phenom(tm) II X6 1075T Processor (4 cpu cores) (version 2.20.00) [ 0.580510] powernow-k8: Core Performance Boosting: on. [ 0.580526] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found. [ 0.580528] [Firmware Bug]: powernow-k8: Try again with latest BIOS. 
[ 0.581017] Freeing unused kernel memory: 692k freed
[ 0.581208] Write protecting the kernel read-only data: 8192k
[ 0.584393] Freeing unused kernel memory: 184k freed
[ 0.584595] Freeing unused kernel memory: 328k freed
[ 0.585880] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff230ca088 ax:ffffffffff600400 si:7faee3e822bf di:7fff230ca158
[ 0.586105] init used greatest stack depth: 5064 bytes left
[ 0.586118] Kernel panic - not syncing: Attempted to kill init!
[ 0.586126] Pid: 1, comm: init Not tainted 3.0.0-03370-gb6844e8-dirty #1
[ 0.586132] Call Trace:
[ 0.586144] [<ffffffff815bd12d>] panic+0x96/0x1ad
[ 0.586152] [<ffffffff810845d1>] ? get_parent_ip+0x11/0x50
[ 0.586160] [<ffffffff810966e8>] do_exit+0x968/0x970
[ 0.586167] [<ffffffff8109673c>] do_group_exit+0x4c/0xc0
[ 0.586175] [<ffffffff810a850f>] get_signal_to_deliver+0x20f/0x5c0
[ 0.586184] [<ffffffff810492f3>] do_signal+0x63/0x710
[ 0.586191] [<ffffffff810402b2>] ? check_events+0x12/0x20
[ 0.586198] [<ffffffff810845d1>] ? get_parent_ip+0x11/0x50
[ 0.586206] [<ffffffff815c392d>] ? sub_preempt_count+0x9d/0xd0
[ 0.586217] [<ffffffff815c03e7>] ? _raw_spin_unlock_irqrestore+0x27/0x50
[ 0.586227] [<ffffffff810a706d>] ? force_sig_info+0x9d/0x110
[ 0.586235] [<ffffffff81049a05>] do_notify_resume+0x65/0x80
[ 0.586242] [<ffffffff8104dd6e>] ? do_emulate_vsyscall+0x5e/0x190
[ 0.586249] [<ffffffff815c093c>] retint_signal+0x48/0x8c
Parsing config file /root/pv.xm
Daemon running with PID 9155
Andrew Lutomirski
2011-Jul-25 18:10 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Mon, Jul 25, 2011 at 12:10 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:> On Mon, Jul 25, 2011 at 11:54:42AM -0400, Konrad Rzeszutek Wilk wrote: >> Hey Andy, >> >> I just started testing linus/master and found out that I get this bootup error: >> >> mapping kernel into physical memory >> about to get started... >> (XEN) mm.c:940:d10 Error getting mfn 1888 (pfn 1e3e48) from L1 entry 8000000001888465 for l1e_owner=10, pg_owner=10 >> (XEN) mm.c:5049:d10 ptwr_emulate: could not get_page_from_l1e() >> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null) >> [ 0.000000] IP: [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 >> [ 0.000000] PGD 0 >> [ 0.000000] Oops: 0003 [#1] PREEMPT SMP >> [ 0.000000] CPU 0 >> [ 0.000000] Modules linked in: >> [ 0.000000] >> [ 0.000000] Pid: 0, comm: swapper Not tainted 3.0.0-rc1-00169-gae7bd11 #1 >> [ 0.000000] RIP: e030:[<ffffffff8103a930>] [<ffffffff8103a930>] xen_set_pte+0x20/0xe0 >> [ 0.000000] RSP: e02b:ffffffff81801df8 EFLAGS: 00010097 >> [ 0.000000] RAX: 0000000000000000 RBX: ffff88000193dff8 RCX: ffffffffff5ff000 >> [ 0.000000] RDX: 0000000010000001 RSI: 8000000001888465 RDI: ffff88000193dff8 >> [ 0.000000] RBP: ffffffff81801e18 R08: 0000000000000000 R09: 0000000000007ff0 >> [ 0.000000] R10: aaaaaaaaaaaaaaaa R11: aaaaaaaaaaaaaaaa R12: 8000000001888465 >> [ 0.000000] R13: 000000000e573000 R14: 0000000080000000 R15: 0000000000000000 >> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81889000(0000) knlGS:0000000000000000 >> [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001803000 CR4: 0000000000000660 >> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180b020) >> [ 0.000000] Stack: >> [ 0.000000] ffffffffff5ff000 8000000001888465 ffffffffff5ff000 
8000000001888465
>> [ 0.000000] ffffffff81801e38 ffffffff8106db53 0000000000000800 8000000001888465
>> [ 0.000000] ffffffff81801e48 ffffffff8106dbc0 ffffffff81801e58 ffffffff810720f6
>> [ 0.000000] Call Trace:
>> [ 0.000000] [<ffffffff8106db53>] set_pte_vaddr_pud+0x43/0x60
>> [ 0.000000] [<ffffffff8106dbc0>] set_pte_vaddr+0x50/0x70
>
> This tiny patch fixes the bootup:
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index f987bde..0e4c13c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  # endif
>  #else
>  	case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
> +	case VVAR_PAGE:
>  #endif
>  	case FIX_TEXT_POKE0:
>  	case FIX_TEXT_POKE1:

Looks sane by analogy to the other code there, but I don't know how this stuff works in Xen. Jeremy?

> However, this is what I get later on, any ideas?
>
> [ 0.585880] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff230ca088 ax:ffffffffff600400 si:7faee3e822bf di:7fff230ca158

That will, indeed, crash your system.

0xe033 is FLAT_RING3_CS64

Jeremy / other Xen people: I'm trying to implement a lightweight check to distinguish a trap from a sane (i.e. allowable for syscalls) 64-bit user context from anything else. There seems to be precedent for using ->cs == __USER_CS to detect 64-bitness; for example, step.c contains:

#ifdef CONFIG_X86_64
	case 0x40 ... 0x4f:
		if (regs->cs != __USER_CS)
			/* 32-bit mode: register increment */
			return 0;
		/* 64-bit mode: REX prefix */
		continue;
#endif

The prefetch opcode checker in mm/fault.c does something similar.

Even the sysret code in xen/xen-asm_64.S does:

	pushq %r11
	pushq $__USER_CS
	pushq %rcx

So I'm at a bit of a loss.

You could probably hack it up and get your kernel to boot by allowing __USER_CS and 0xe033 in that check, but I'd rather understand it before submitting a patch.
--Andy
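The selector check being discussed can be modelled in user space roughly as follows. This is a minimal sketch, not the kernel's code: it assumes __USER_CS evaluates to 0x33 on x86-64 Linux, takes the FLAT_RING3_CS64 value 0xe033 from the log line quoted above, and the function name cs_is_64bit_user() and its accept_xen_flat parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED value of __USER_CS on x86-64 Linux (GDT entry 6, RPL 3). */
#define USER_CS_NATIVE      0x33u
/* Xen's flat 64-bit ring-3 code selector, as seen in "cs:e033" above. */
#define FLAT_RING3_CS64_XEN 0xe033u

/*
 * Return true if cs looks like a 64-bit user code segment.
 * accept_xen_flat models the "hack" of also allowing the Xen selector;
 * with it set to false only the native selector passes.
 */
static bool cs_is_64bit_user(uint16_t cs, bool accept_xen_flat)
{
	if (cs == USER_CS_NATIVE)
		return true;
	return accept_xen_flat && cs == FLAT_RING3_CS64_XEN;
}
```

With accept_xen_flat false, a trap arriving with cs == 0xe033 is rejected, which is the behaviour behind the "illegal int 0xcc from 32-bit mode" message; with it true, both selectors are treated as 64-bit user mode. Whether accepting both is the right long-term fix is exactly the open question in this thread.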
Konrad Rzeszutek Wilk
2011-Jul-26 16:18 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
> > However, this is what I get later on, any ideas?
>
> > [ 0.585880] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff230ca088 ax:ffffffffff600400 si:7faee3e822bf di:7fff230ca158
>
> That will, indeed, crash your system.
>
> 0xe033 is FLAT_RING3_CS64
>
> Jeremy / other Xen people: I'm trying to implement a lightweight
> check to distinguish a trap from a sane (i.e. allowable for syscalls)
> 64-bit user context from anything else. There seems to be precedent
> for using ->cs == __USER_CS to detect 64-bitness; for example, step.c
> contains:
>
> #ifdef CONFIG_X86_64
> 	case 0x40 ... 0x4f:
> 		if (regs->cs != __USER_CS)
> 			/* 32-bit mode: register increment */
> 			return 0;
> 		/* 64-bit mode: REX prefix */
> 		continue;
> #endif
>
> The prefetch opcode checker in mm/fault.c does something similar.
>
> Even the sysret code in xen/xen-asm_64.S does:
>
> 	pushq %r11
> 	pushq $__USER_CS
> 	pushq %rcx
>
> So I'm at a bit of a loss.
>
> You could probably hack it up and get your kernel to boot by allowing
> __USER_CS and 0xe033 in that check, but I'd rather understand it

Did this little hack:

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index dda7dff..5d0cf37 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -131,7 +131,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
 	 * Real 64-bit user mode code has cs == __USER_CS. Anything else
 	 * is bogus.
 	 */
-	if (regs->cs != __USER_CS) {
+	if ((regs->cs != __USER_CS) && (regs->cs != FLAT_RING3_CS64)) {
 		/*
 		 * If we trapped from kernel mode, we might as well OOPS now
 		 * instead of returning to some random address and OOPSing
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f987bde..0e4c13c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 # endif
 #else
 	case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
+	case VVAR_PAGE:
 #endif
 	case FIX_TEXT_POKE0:
 	case FIX_TEXT_POKE1:

And getting this on 64-bit:

started: BusyBox v1.14.3 (2011-07-26 11:43:49 EDT)
[ 0.578603] rcS[1128]: segfault at ffffffffff5ff0a0 ip 00007fff40b7380a sp 00007fff40b5c0f0 error 4
[ 0.578847] rcS used greatest stack depth: 5024 bytes left
[ 0.581897] sh[1131]: segfault at ffffffffff5ff0a0 ip 00007fffb93ff80a sp 00007fffb92bbd70 error 4
[ 1.587637] sh[1137]: segfault at ffffffffff5ff0a0 ip 00007ffffa5ff80a sp 00007ffffa522560 error 4
[ 2.592295] sh[1141]: segfault at ffffffffff5ff0a0 ip 00007ffffcb3f80a sp 00007ffffca98af0 error 4
[ 3.596344] sh[1145]: segfault at ffffffffff5ff0a0 ip 00007fff2e3ff80a sp 00007fff2e3e3370 error 4
[ 4.599812] sh[1149]: segfault at ffffffffff5ff0a0 ip 00007fff62dff80a sp 00007fff62ca9f10 error 4
[ 5.605835] sh[1153]: segfault at ffffffffff5ff0a0 ip 00007fff117ff80a sp 00007fff1175e7f0 error 4
[ 6.609438] sh[1157]: segfault at ffffffffff5ff0a0 ip 00007fff91bff80a sp 00007fff91bd71c0 error 4
[ 7.614714] sh[1161]: segfault at ffffffffff5ff0a0 ip 00007fff396b280a sp 00007fff3968ede0 error 4
[ 8.620374] sh[1165]: segfault at ffffffffff5ff0a0 ip 00007fffd398b80a sp 00007fffd38ecd70 error 4
[ 9.625512] sh[1169]: segfault at ffffffffff5ff0a0 ip 00007fff617d980a sp 00007fff61776070 error 4
[ 10.630246] sh[1173]: segfault at ffffffffff5ff0a0 ip 00007fff89fff80a sp 00007fff89f7f3b0 error 4
[ 11.635588] sh[1177]: segfault at ffffffffff5ff0a0 ip 00007fffa95ff80a sp 00007fffa95ea7c0 error 4
[ 12.640491] sh[1181]: segfault at ffffffffff5ff0a0 ip 00007fff28cd180a sp 00007fff28c524f0 error 4
..
Ian Campbell
2011-Jul-26 16:46 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, 2011-07-26 at 12:18 -0400, Konrad Rzeszutek Wilk wrote:
> > > However, this is what I get later on, any ideas?
> >
> > > [ 0.585880] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff230ca088 ax:ffffffffff600400 si:7faee3e822bf di:7fff230ca158
> >
> > That will, indeed, crash your system.
> >
> > 0xe033 is FLAT_RING3_CS64
> >
> > Jeremy / other Xen people: I'm trying to implement a lightweight
> > check to distinguish a trap from a sane (i.e. allowable for syscalls)
> > 64-bit user context from anything else. There seems to be precedent
> > for using ->cs == __USER_CS to detect 64-bitness; for example, step.c
> > contains:
> >
> > #ifdef CONFIG_X86_64
> >         case 0x40 ... 0x4f:
> >                 if (regs->cs != __USER_CS)
> >                         /* 32-bit mode: register increment */
> >                         return 0;
> >                 /* 64-bit mode: REX prefix */
> >                 continue;
> > #endif
> >
> > The prefetch opcode checker in mm/fault.c does something similar.
> >
> > Even the sysret code in xen/xen-asm_64.S does:
> >
> >         pushq %r11
> >         pushq $__USER_CS
> >         pushq %rcx
> >
> > So I'm at a bit of a loss.
> >
> > You could probably hack it up and get your kernel to boot by allowing
> > __USER_CS and 0xe033 in that check, but I'd rather understand it
>
> Did this little hack:
>
>
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index dda7dff..5d0cf37 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -131,7 +131,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
>           * Real 64-bit user mode code has cs == __USER_CS. Anything else
>           * is bogus.
>           */
> -        if (regs->cs != __USER_CS) {
> +        if ((regs->cs != __USER_CS) && (regs->cs != FLAT_RING3_CS64)) {

While it is possible to run on the Xen provided convenience flat
segments, is there any reason not to just switch to using the Linux
selector values as early as possible on boot?

(I expect the reason for your seg faults is that the kernel also runs in
ring3 for 64 bit PV Xen, i.e. FLAT_KERNEL_CS64 == FLAT_RING3_CS64,
although I thought we ensured that the on-stack representations of the
selectors were correct for the actual privilege level (to allow for
simple checks of kernel vs non-kernel segments e.g. with seg & ~3 type
constructs). The error doesn't print the CS so it's hard to tell for
sure, so I'm guessing.)

Ian.

>                  /*
>                   * If we trapped from kernel mode, we might as well OOPS now
>                   * instead of returning to some random address and OOPSing
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index f987bde..0e4c13c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  # endif
>  #else
>          case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
> +        case VVAR_PAGE:
>  #endif
>          case FIX_TEXT_POKE0:
>          case FIX_TEXT_POKE1:
>
> And getting this on 64-bit:
>
> started: BusyBox v1.14.3 (2011-07-26 11:43:49 EDT)
> [ 0.578603] rcS[1128]: segfault at ffffffffff5ff0a0 ip 00007fff40b7380a sp 00007fff40b5c0f0 error 4
> [ 0.578847] rcS used greatest stack depth: 5024 bytes left
> [ 0.581897] sh[1131]: segfault at ffffffffff5ff0a0 ip 00007fffb93ff80a sp 00007fffb92bbd70 error 4
> [ 1.587637] sh[1137]: segfault at ffffffffff5ff0a0 ip 00007ffffa5ff80a sp 00007ffffa522560 error 4
> [ 2.592295] sh[1141]: segfault at ffffffffff5ff0a0 ip 00007ffffcb3f80a sp 00007ffffca98af0 error 4
> [ 3.596344] sh[1145]: segfault at ffffffffff5ff0a0 ip 00007fff2e3ff80a sp 00007fff2e3e3370 error 4
> [ 4.599812] sh[1149]: segfault at ffffffffff5ff0a0 ip 00007fff62dff80a sp 00007fff62ca9f10 error 4
> [ 5.605835] sh[1153]: segfault at ffffffffff5ff0a0 ip 00007fff117ff80a sp 00007fff1175e7f0 error 4
> [ 6.609438] sh[1157]: segfault at ffffffffff5ff0a0 ip 00007fff91bff80a sp 00007fff91bd71c0 error 4
> [ 7.614714] sh[1161]: segfault at ffffffffff5ff0a0 ip 00007fff396b280a sp 00007fff3968ede0 error 4
> [ 8.620374] sh[1165]: segfault at ffffffffff5ff0a0 ip 00007fffd398b80a sp 00007fffd38ecd70 error 4
> [ 9.625512] sh[1169]: segfault at ffffffffff5ff0a0 ip 00007fff617d980a sp 00007fff61776070 error 4
> [ 10.630246] sh[1173]: segfault at ffffffffff5ff0a0 ip 00007fff89fff80a sp 00007fff89f7f3b0 error 4
> [ 11.635588] sh[1177]: segfault at ffffffffff5ff0a0 ip 00007fffa95ff80a sp 00007fffa95ea7c0 error 4
> [ 12.640491] sh[1181]: segfault at ffffffffff5ff0a0 ip 00007fff28cd180a sp 00007fff28c524f0 error 4
>
> ..
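To make Ian's point about selector checks concrete, here is a minimal sketch of how an x86 segment selector splits into its fields, which is what a mask of the low two bits relies on. The helper names are hypothetical and not taken from the kernel or Xen; only the bit layout (RPL in bits 0-1, table indicator in bit 2, descriptor index above that) is architectural.

```c
#include <stdint.h>

/* Hypothetical helpers; only the bit layout is architectural. */

/* Requested privilege level: the low two bits of the selector. */
static unsigned sel_rpl(uint16_t sel)
{
    return sel & 3u;
}

/* Table indicator: 0 selects the GDT, 1 selects the LDT. */
static unsigned sel_ti(uint16_t sel)
{
    return (sel >> 2) & 1u;
}

/* Index of the descriptor within the selected table. */
static unsigned sel_index(uint16_t sel)
{
    return sel >> 3;
}

/* Selector with the RPL bits cleared, i.e. the "seg & ~3" form. */
static uint16_t sel_strip_rpl(uint16_t sel)
{
    return (uint16_t)(sel & ~3u);
}
```

With these, both 0x33 and 0xe033 report an RPL of 3, so the privilege bits alone cannot tell the two apart; they differ only in descriptor index (6 versus 0x1c06).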
Andrew Lutomirski
2011-Jul-26 19:01 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 12:18 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > However, this is what I get later on, any ideas?
>>
>> > [ 0.585880] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff230ca088 ax:ffffffffff600400 si:7faee3e822bf di:7fff230ca158
>>
>> That will, indeed, crash your system.
>>
>> 0xe033 is FLAT_RING3_CS64
>>
>> Jeremy / other Xen people: I'm trying to implement a lightweight
>> check to distinguish a trap from a sane (i.e. allowable for syscalls)
>> 64-bit user context from anything else. There seems to be precedent
>> for using ->cs == __USER_CS to detect 64-bitness; for example, step.c
>> contains:
>>
>> #ifdef CONFIG_X86_64
>>         case 0x40 ... 0x4f:
>>                 if (regs->cs != __USER_CS)
>>                         /* 32-bit mode: register increment */
>>                         return 0;
>>                 /* 64-bit mode: REX prefix */
>>                 continue;
>> #endif
>>
>> The prefetch opcode checker in mm/fault.c does something similar.
>>
>> Even the sysret code in xen/xen-asm_64.S does:
>>
>>         pushq %r11
>>         pushq $__USER_CS
>>         pushq %rcx
>>
>> So I'm at a bit of a loss.
>>
>> You could probably hack it up and get your kernel to boot by allowing
>> __USER_CS and 0xe033 in that check, but I'd rather understand it
>
> Did this little hack:
>
>
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index dda7dff..5d0cf37 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -131,7 +131,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
>           * Real 64-bit user mode code has cs == __USER_CS. Anything else
>           * is bogus.
>           */
> -        if (regs->cs != __USER_CS) {
> +        if ((regs->cs != __USER_CS) && (regs->cs != FLAT_RING3_CS64)) {
>                  /*
>                   * If we trapped from kernel mode, we might as well OOPS now
>                   * instead of returning to some random address and OOPSing
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index f987bde..0e4c13c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  # endif
>  #else
>          case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
> +        case VVAR_PAGE:
>  #endif
>          case FIX_TEXT_POKE0:
>          case FIX_TEXT_POKE1:
>
> And getting this on 64-bit:
>
> started: BusyBox v1.14.3 (2011-07-26 11:43:49 EDT)
> [ 0.578603] rcS[1128]: segfault at ffffffffff5ff0a0 ip 00007fff40b7380a sp 00007fff40b5c0f0 error 4
> [ 0.578847] rcS used greatest stack depth: 5024 bytes left
> [ 0.581897] sh[1131]: segfault at ffffffffff5ff0a0 ip 00007fffb93ff80a sp 00007fffb92bbd70 error 4
> [ 1.587637] sh[1137]: segfault at ffffffffff5ff0a0 ip 00007ffffa5ff80a sp 00007ffffa522560 error 4
> [ 2.592295] sh[1141]: segfault at ffffffffff5ff0a0 ip 00007ffffcb3f80a sp 00007ffffca98af0 error 4
> [ 3.596344] sh[1145]: segfault at ffffffffff5ff0a0 ip 00007fff2e3ff80a sp 00007fff2e3e3370 error 4
> [ 4.599812] sh[1149]: segfault at ffffffffff5ff0a0 ip 00007fff62dff80a sp 00007fff62ca9f10 error 4
> [ 5.605835] sh[1153]: segfault at ffffffffff5ff0a0 ip 00007fff117ff80a sp 00007fff1175e7f0 error 4
> [ 6.609438] sh[1157]: segfault at ffffffffff5ff0a0 ip 00007fff91bff80a sp 00007fff91bd71c0 error 4
> [ 7.614714] sh[1161]: segfault at ffffffffff5ff0a0 ip 00007fff396b280a sp 00007fff3968ede0 error 4
> [ 8.620374] sh[1165]: segfault at ffffffffff5ff0a0 ip 00007fffd398b80a sp 00007fffd38ecd70 error 4
> [ 9.625512] sh[1169]: segfault at ffffffffff5ff0a0 ip 00007fff617d980a sp 00007fff61776070 error 4
> [ 10.630246] sh[1173]: segfault at ffffffffff5ff0a0 ip 00007fff89fff80a sp 00007fff89f7f3b0 error 4
> [ 11.635588] sh[1177]: segfault at ffffffffff5ff0a0 ip 00007fffa95ff80a sp 00007fffa95ea7c0 error 4
> [ 12.640491] sh[1181]: segfault at ffffffffff5ff0a0 ip 00007fff28cd180a sp 00007fff28c524f0 error 4

That one means that the vvar fixmap isn't working. Can you try the
attached patch?

--Andy

>
> ..
>
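One way to see why the repeated fault at ffffffffff5ff0a0 points at the vvar mapping is to round the address down to its page and compare it with the fixed addresses involved. The sketch below assumes the layout this commit is understood to use: the vsyscall page at 0xffffffffff600000 and the vvar page one 4 KiB page below it. Those two constants and the helper name are assumptions for illustration; the thread itself only supplies the fault address.

```c
#include <stdint.h>

#define PAGE_SIZE      4096ULL
/* Assumed: start of the legacy vsyscall page (-10 MiB). */
#define VSYSCALL_START 0xffffffffff600000ULL
/* Assumed: the vvar page sits immediately below the vsyscall page. */
#define VVAR_ADDRESS   (VSYSCALL_START - PAGE_SIZE)

enum fix_region { REGION_OTHER, REGION_VVAR, REGION_VSYSCALL };

/* Hypothetical helper: classify a faulting address by the page it lies in. */
static enum fix_region classify_fault(uint64_t addr)
{
    uint64_t page = addr & ~(PAGE_SIZE - 1);  /* round down to page start */

    if (page == VVAR_ADDRESS)
        return REGION_VVAR;
    if (page == VSYSCALL_START)
        return REGION_VSYSCALL;
    return REGION_OTHER;
}
```

Under these assumptions the address from the log lands in the vvar page, while the earlier trap address ffffffffff600400 lands in the vsyscall page. The "error 4" code in the log is a user-mode read of a not-present page, which fits a mapping that is missing from the user page tables.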
Andrew Lutomirski
2011-Jul-26 19:08 UTC
[Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>> After staring at the Xen assembly code with vague comprehension, I
>> think I can sort of understand what's going on.
>
> Ok.
>>
>> Can you run this little program on a working kernel and tell me what
>> it says (built as 64-bit and as 32-bit (with -m32)):
>
> 32-bit:
> [konrad@f13-x86-build ~]$ ./check
> cs = 73
> [konrad@f13-x86-build ~]$ uname -a
> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT 2011 i686 i686 i386 GNU/Linux
>
>
> 64-bit:
>
> [konrad@f13-amd64-build ~]$ ./check
> cs = e033

My best guess is that each task starts out with standard __USER_CS,
but the code in write_stack_trampoline (in the hypervisor) tells the
kernel that CS is 0xe033 and then the next return to userspace makes
it true.

I'll hack up a patch to avoid the crash. I'll feel better about it if
you or any of the Xen gurus can confirm that explanation. If I'm
right, I need to check for both __USER_CS and FLAT_RING3_CS.

--Andy
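The test program Andy asked Konrad to run is not reproduced in the archive. A minimal sketch of something that would print a line in the same "cs = ..." form, assuming GCC-style inline assembly on x86, might look like this; the function names are hypothetical, and the non-x86 branch exists only so the sketch compiles elsewhere.

```c
#include <stdint.h>
#include <stdio.h>

/* Read the current code-segment selector (x86 only, GCC/Clang inline asm). */
static uint16_t read_cs(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint16_t cs;
    __asm__ volatile ("mov %%cs, %0" : "=r" (cs));
    return cs;
#else
    return 0;  /* not x86: there is no CS register to read */
#endif
}

/* Print the selector in hex, matching the "cs = e033" style of output. */
static void print_cs(void)
{
    printf("cs = %x\n", (unsigned)read_cs());
}
```

Building it once normally and once with -m32 would give the two variants Konrad ran. The values reported in the thread are 73 for the 32-bit build and e033 for the 64-bit build under Xen PV.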
Keir Fraser
2011-Jul-26 20:48 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 26/07/2011 20:08, "Andrew Lutomirski" <luto@mit.edu> wrote:
> On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>>> After staring at the Xen assembly code with vague comprehension, I
>>> think I can sort of understand what's going on.
>>
>> Ok.
>>>
>>> Can you run this little program on a working kernel and tell me what
>>> it says (built as 64-bit and as 32-bit (with -m32)):
>>
>> 32-bit:
>> [konrad@f13-x86-build ~]$ ./check
>> cs = 73
>> [konrad@f13-x86-build ~]$ uname -a
>> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT
>> 2011 i686 i686 i386 GNU/Linux
>>
>>
>> 64-bit:
>>
>> [konrad@f13-amd64-build ~]$ ./check
>> cs = e033
>
> My best guess is that each task starts out with standard __USER_CS,
> but the code in write_stack_trampoline (in the hypervisor) tells the
> kernel that CS is 0xe033 and then the next return to userspace makes
> it true.

Yes, that's right.

> I'll hack up a patch to avoid the crash. I'll feel better about it if
> you or any of the Xen gurus can confirm that explanation. If I'm
> right, I need to check for both __USER_CS and FLAT_RING3_CS.

Either that, or Linux needs to poke its preferred 32- or 64-bit user CS
value into the return stackframe when it receives a syscall notification
from Xen.

 -- Keir

> --Andy
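The two options discussed here (accept both selectors in the check, or rewrite the saved CS to the Linux value on entry) can be sketched side by side. This is an illustration under stated assumptions, not the patch that was eventually merged: the struct and function names are hypothetical, and the numeric values assume __USER_CS is 0x33 on x86-64 and that FLAT_RING3_CS64 is 0xe033 as stated earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_CS         0x33u    /* assumed value of Linux __USER_CS on x86-64 */
#define FLAT_RING3_CS64 0xe033u  /* Xen flat 64-bit ring-3 code selector */

/* Hypothetical stand-in for the saved return frame. */
struct ret_frame {
    uint64_t ip;
    uint64_t cs;
    uint64_t flags;
    uint64_t sp;
    uint64_t ss;
};

/* Option 1: treat either selector as a sane 64-bit user context. */
static bool is_64bit_user_cs(uint64_t cs)
{
    return cs == USER_CS || cs == FLAT_RING3_CS64;
}

/* Option 2: rewrite the Xen selector to the Linux one on entry, so that
 * existing "cs == __USER_CS" checks keep working unchanged. */
static void normalize_user_cs(struct ret_frame *frame)
{
    if (frame->cs == FLAT_RING3_CS64)
        frame->cs = USER_CS;
}
```

Option 1 touches every place that compares against __USER_CS, while option 2 keeps the change at the entry point. Ian's remark that FLAT_KERNEL_CS64 equals FLAT_RING3_CS64 on 64-bit PV suggests that either approach would still need a separate way to tell kernel frames from user frames.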
Konrad Rzeszutek Wilk
2011-Jul-26 20:51 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
> That one means that the vvar fixmap isn't working. Can you try the
> attached patch?

Sure. Albeit it looks to be missing a check for the 0xe033 cs?

This is what I get from launching a guest:

(early) [ 0.000000] Initializing cgroup subsys cpuset
(early) [ 0.000000] Initializing cgroup subsys cpu
(early) [ 0.000000] Linux version 3.0.0-05046-ge08dc13-dirty (konrad@phenom) (gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) (GCC) ) #1 SMP PREEMPT Tue Jul 26 16:24:01 EDT 2011
(early) [ 0.000000] Command line: console=hvc0 debug earlyprintk=xenboot test=test
(early) [ 0.000000] ACPI in unprivileged domain disabled
(early) [ 0.000000] released 0 pages of unused memory
(early) [ 0.000000] Set 0 page(s) to 1-1 mapping.
(early) [ 0.000000] BIOS-provided physical RAM map:
(early) [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
(early) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
(early) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable)
(early) [ 0.000000] bootconsole [xenboot0] enabled
(early) [ 0.000000] NX (Execute Disable) protection: active
(early) [ 0.000000] DMI not present or invalid.
(early) [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (early) (usable)(early) ==> (early) (reserved)(early) (early) [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (early) (usable)(early) (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] last_pfn = 0x80800 max_arch_pfn = 0x400000000 (early) [ 0.000000] initial memory mapped : 0 - 1028f000 (early) [ 0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480 (early) [ 0.000000] init_memory_mapping: 0000000000000000-0000000080800000 (early) [ 0.000000] 0000000000 - 0080800000 page 4k (early) [ 0.000000] kernel direct mapping tables up to 80800000 @ 7fbf8000-80000000 (early) [ 0.000000] xen: setting RW the range 7ff76000 - 80000000 (early) [ 0.000000] RAMDISK: 01b76000 - 1028f000 (early) [ 0.000000] No NUMA configuration found (early) [ 0.000000] Faking a node at 0000000000000000-0000000080800000 (early) [ 0.000000] Initmem setup node 0 0000000000000000-0000000080800000 (early) [ 0.000000] NODE_DATA [000000007fffb000 - 000000007fffffff] (early) [ 0.000000] Zone PFN ranges: (early) [ 0.000000] DMA (early) 0x00000010 -> 0x00001000 (early) [ 0.000000] DMA32 (early) 0x00001000 -> 0x00100000 (early) [ 0.000000] Normal (early) empty (early) [ 0.000000] Movable zone start PFN for each node (early) [ 0.000000] early_node_map[2] active PFN ranges (early) [ 0.000000] 0: 0x00000010 -> 0x000000a0 (early) [ 0.000000] 0: 0x00000100 -> 0x00080800 (early) [ 0.000000] On node 0 totalpages: 526224 (early) [ 0.000000] DMA zone: 56 pages used for memmap (early) [ 0.000000] DMA zone: 5 pages reserved (early) [ 0.000000] DMA zone: 3923 pages, LIFO batch:0 (early) [ 0.000000] DMA32 zone: 7140 pages used for memmap (early) [ 0.000000] DMA32 zone: 515100 pages, LIFO batch:31 (early) [ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs (early) [ 0.000000] No local APIC present (early) [ 0.000000] APIC: disable apic facility (early) [ 0.000000] APIC: switched to apic NOOP (early) [ 
0.000000] nr_irqs_gsi: 16 (early) [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000 (early) [ 0.000000] Allocating PCI resources starting at 80800000 (gap: 80800000:7f800000) (early) [ 0.000000] Booting paravirtualized kernel on Xen (early) [ 0.000000] Xen version: 4.2-unstable (preserve-AD) (early) [ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4 nr_node_ids:1 (early) [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fb88000 s82048 r8192 d24448 u114688 (early) [ 0.000000] pcpu-alloc: s82048 r8192 d24448 u114688 alloc=28*4096(early) (early) [ 0.000000] pcpu-alloc: (early) [0] (early) 0 (early) [0] (early) 1 (early) [0] (early) 2 (early) [0] (early) 3 (early) (early) [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 519023 (early) [ 0.000000] Policy zone: DMA32 (early) [ 0.000000] Kernel command line: console=hvc0 debug earlyprintk=xenboot test=test (early) [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes) (early) [ 0.000000] Checking aperture... (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] Calgary: detecting Calgary via BIOS EBDA area (early) [ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing! (early) [ 0.000000] Memory: 1810140k/2105344k available (5942k kernel code, 448k absent, 294756k reserved, 2808k data, 692k init) (early) [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 (early) [ 0.000000] Preemptible hierarchical RCU implementation. (early) [ 0.000000] NR_IRQS:16640 nr_irqs:304 16 (early) [ 0.000000] Console: colour dummy device 80x25 (early) [ 0.000000] console [tty0] enabled [ 0.000000] console [hvc0] enabled, bootconsole disabled (early) [ 0.000000] console [hvc0] enabled, bootconsole disabled [ 0.000000] Xen: using vcpuop timer interface [ 0.000000] installing Xen timer for CPU 0 [ 0.000000] Detected 3000.212 MHz processor. 
[ 0.000000] Marking TSC unstable due to TSCs unsynchronized [ 0.000999] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.42 BogoMIPS (lpj=3000212) [ 0.000999] pid_max: default: 32768 minimum: 301 [ 0.000999] Security Framework initialized [ 0.000999] SELinux: Initializing. [ 0.000999] SELinux: Starting in permissive mode [ 0.000999] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.001391] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.001664] Mount-cache hash table entries: 256 [ 0.001828] Initializing cgroup subsys cpuacct [ 0.001838] Initializing cgroup subsys freezer [ 0.001879] tseg: 0000000000 [ 0.001891] CPU: Physical Processor ID: 0 [ 0.001896] CPU: Processor Core ID: 3 [ 0.001952] SMP alternatives: switching to UP code [ 0.002067] cpu 0 spinlock event irq 17 [ 0.002113] Performance Events: [ 0.002118] no APIC, boot with the "lapic" boot parameter to force-enable it. [ 0.002124] no hardware sampling interrupt available. [ 0.002145] Broken PMU hardware detected, using software events only. [ 0.008116] MCE: In-kernel MCE decoding enabled. 
[ 0.008136] NMI watchdog disabled (cpu0): hardware events not enabled [ 0.014060] installing Xen timer for CPU 1 [ 0.014116] cpu 1 spinlock event irq 23 [ 0.014192] SMP alternatives: switching to SMP code [ 0.015097] NMI watchdog disabled (cpu1): hardware events not enabled [ 0.021064] installing Xen timer for CPU 2 [ 0.021119] cpu 2 spinlock event irq 29 [ 0.021369] NMI watchdog disabled (cpu2): hardware events not enabled [ 0.027065] installing Xen timer for CPU 3 [ 0.027119] cpu 3 spinlock event irq 35 [ 0.027356] NMI watchdog disabled (cpu3): hardware events not enabled [ 0.029021] Brought up 4 CPUs [ 0.029137] kworker/u:0 used greatest stack depth: 5496 bytes left [ 0.029174] Grant table initialized [ 0.048869] RTC time: 165:165:165, date: 165/165/65 [ 0.048909] NET: Registered protocol family 16 [ 0.049143] Extended Config Space enabled on 0 nodes [ 0.050326] PCI: setting up Xen PCI frontend stub [ 0.050341] PCI: pci_cache_line_size set to 64 bytes [ 0.059045] bio: create slab <bio-0> at 0 [ 0.060044] ACPI: Interpreter disabled. [ 0.060044] xen/balloon: Initialising balloon driver. [ 0.060044] last_pfn = 0x80800 max_arch_pfn = 0x400000000 [ 0.062138] xen-balloon: Initialising balloon driver. 
[ 0.063082] vgaarb: loaded [ 0.063082] usbcore: registered new interface driver usbfs [ 0.063082] usbcore: registered new interface driver hub [ 0.063082] usbcore: registered new device driver usb [ 0.064035] PCI: System does not support PCI [ 0.064035] PCI: System does not support PCI [ 0.064054] NetLabel: Initializing [ 0.064054] NetLabel: domain hash size = 128 [ 0.064054] NetLabel: protocols = UNLABELED CIPSOv4 [ 0.064054] NetLabel: unlabeled traffic allowed by default [ 0.064085] Switching to clocksource xen [ 0.064406] Switched to NOHz mode on CPU #0 [ 0.064962] Switched to NOHz mode on CPU #2 [ 0.064995] Switched to NOHz mode on CPU #1 [ 0.065071] Switched to NOHz mode on CPU #3 [ 0.066298] pnp: PnP ACPI: disabled [ 0.071088] PCI: max bus depth: 0 pci_try_num: 1 [ 0.071143] NET: Registered protocol family 2 [ 0.071324] IP route cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.072659] TCP established hash table entries: 262144 (order: 10, 4194304 bytes) [ 0.073918] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) [ 0.074217] TCP: Hash tables configured (established 262144 bind 65536) [ 0.074228] TCP reno registered [ 0.074249] UDP hash table entries: 1024 (order: 3, 32768 bytes) [ 0.074276] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) [ 0.074364] NET: Registered protocol family 1 [ 0.074542] RPC: Registered named UNIX socket transport module. [ 0.074550] RPC: Registered udp transport module. [ 0.074555] RPC: Registered tcp transport module. [ 0.074559] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 0.074566] PCI: CLS 0 bytes, default 64 [ 0.074682] Trying to unpack rootfs image as initramfs... 
[ 0.318696] Freeing initrd memory: 236644k freed [ 0.378844] platform rtc_cmos: registered platform RTC device (no PNP device found) [ 0.380209] Machine check injector initialized [ 0.380975] microcode: CPU0: patch_level=0x010000bf [ 0.381010] microcode: CPU1: patch_level=0x010000bf [ 0.381083] microcode: CPU2: patch_level=0x010000bf [ 0.381179] microcode: CPU3: patch_level=0x010000bf [ 0.381256] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 0.381648] audit: initializing netlink socket (disabled) [ 0.381670] type=2000 audit(1311713301.894:1): initialized [ 0.395872] HugeTLB registered 2 MB page size, pre-allocated 0 pages [ 0.401251] VFS: Disk quotas dquot_6.5.2 [ 0.401400] Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.402131] NTFS driver 2.1.30 [Flags: R/W]. [ 0.402462] msgmni has been set to 3997 [ 0.402606] SELinux: Registering netfilter hooks [ 0.403425] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) [ 0.403434] io scheduler noop registered [ 0.403452] io scheduler deadline registered [ 0.403554] io scheduler cfq registered (default) [ 0.403897] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 [ 0.449804] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.515715] Non-volatile memory driver v1.3 [ 0.515740] Linux agpgart interface v0.103 [ 0.516231] [drm] Initialized drm 1.1.0 20060810 [ 0.518833] brd: module loaded [ 0.520221] loop: module loaded [ 0.520875] Fixed MDIO Bus: probed [ 0.521447] tun: Universal TUN/TAP device driver, 1.6 [ 0.521461] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> [ 0.521835] ehci_hcd: USB 2.0 ''Enhanced'' Host Controller (EHCI) Driver [ 0.521844] ehci_hcd: block sizes: qh 112 qtd 96 itd 192 sitd 96 [ 0.521931] ohci_hcd: USB 1.1 ''Open'' Host Controller (OHCI) Driver [ 0.521937] ohci_hcd: block sizes: ed 80 td 96 [ 0.522023] uhci_hcd: USB Universal Host Controller Interface driver [ 0.522222] usbcore: registered new interface driver usblp [ 
0.522307] usbcore: registered new interface driver libusual [ 0.522613] i8042: PNP: No PS/2 controller found. Probing ports directly. [ 0.523442] i8042: No controller found [ 0.523595] mousedev: PS/2 mouse device common for all mice [ 0.564242] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0 [ 0.564478] rtc_cmos: probe of rtc_cmos failed with error -38 [ 0.564858] cpuidle: using governor ladder [ 0.564866] cpuidle: using governor menu [ 0.564872] EFI Variables Facility v0.08 2004-May-17 [ 0.564965] zram: num_devices not specified. Using default: 1 [ 0.564973] zram: Creating 1 devices ... [ 0.565379] Netfilter messages via NETLINK v0.30. [ 0.565401] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 0.565729] ctnetlink v0.93: registering with nfnetlink. [ 0.566215] ip_tables: (C) 2000-2006 Netfilter Core Team [ 0.566278] TCP cubic registered [ 0.566282] Initializing XFRM netlink socket [ 0.566843] NET: Registered protocol family 10 [ 0.567453] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 0.567512] IPv6 over IPv4 tunneling driver [ 0.568324] NET: Registered protocol family 17 [ 0.568355] Registering the dns_resolver key type [ 0.568867] PM: Hibernation image not present or could not be loaded. [ 0.568891] registered taskstats version 1 [ 0.568926] XENBUS: Device with no driver: device/vif/0 [ 0.568932] XENBUS: Device with no driver: device/vfb/0 [ 0.568937] XENBUS: Device with no driver: device/vkbd/0 [ 0.568951] Magic number: 1:252:3141 [ 0.569157] powernow-k8: Found 1 AMD Phenom(tm) II X6 1075T Processor (4 cpu cores) (version 2.20.00) [ 0.569202] powernow-k8: Core Performance Boosting: on. [ 0.569220] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found. [ 0.569221] [Firmware Bug]: powernow-k8: Try again with latest BIOS. 
[ 0.569692] Freeing unused kernel memory: 692k freed
[ 0.569888] Write protecting the kernel read-only data: 8192k
[ 0.572708] Freeing unused kernel memory: 180k freed
[ 0.572901] Freeing unused kernel memory: 328k freed
[ 0.574137] init[1] illegal int 0xcc from 32-bit mode ip:ffffffffff600400 cs:e033 sp:7fff741c01a8 ax:ffffffffff600400 si:7fd5c2af52bf di:7fff741c0278
[ 0.574370] init used greatest stack depth: 5104 bytes left
[ 0.574382] Kernel panic - not syncing: Attempted to kill init!
[ 0.574389] Pid: 1, comm: init Not tainted 3.0.0-05046-ge08dc13-dirty #1
[ 0.574395] Call Trace:
[ 0.574414] [<ffffffff815be353>] panic+0x96/0x1ad
[ 0.574422] [<ffffffff810845d1>] ? get_parent_ip+0x11/0x50
[ 0.574430] [<ffffffff810966ac>] do_exit+0x93c/0x940
[ 0.574437] [<ffffffff810966fc>] do_group_exit+0x4c/0xc0
[ 0.574445] [<ffffffff810a84cf>] get_signal_to_deliver+0x20f/0x5c0
[ 0.574453] [<ffffffff81049473>] do_signal+0x63/0x710
[ 0.574461] [<ffffffff810402b2>] ? check_events+0x12/0x20
[ 0.574467] [<ffffffff810845d1>] ? get_parent_ip+0x11/0x50
[ 0.574475] [<ffffffff815c4b2d>] ? sub_preempt_count+0x9d/0xd0
[ 0.574482] [<ffffffff815c1607>] ? _raw_spin_unlock_irqrestore+0x27/0x50
[ 0.574489] [<ffffffff810a702d>] ? force_sig_info+0x9d/0x110
[ 0.574496] [<ffffffff81049b85>] do_notify_resume+0x65/0x80
[ 0.574504] [<ffffffff8104deee>] ? do_emulate_vsyscall+0x5e/0x190
[ 0.574511] [<ffffffff815c1b3c>] retint_signal+0x48/0x8c
Parsing config file /root/pv.xm
Daemon running with PID 29568
Andrew Lutomirski
2011-Jul-26 20:55 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 4:51 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> That one means that the vvar fixmap isn't working. Can you try the
>> attached patch?
>
> Sure. Albeit it looks to be missing a check for the 0xe033 cs?

Sorry -- I meant can you try that on top of the 0xe033 hack?

--Andy
Konrad Rzeszutek Wilk
2011-Jul-26 21:06 UTC
[Xen-devel] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 04:55:57PM -0400, Andrew Lutomirski wrote:
> On Tue, Jul 26, 2011 at 4:51 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >> That one means that the vvar fixmap isn't working. Can you try the
> >> attached patch?
> >
> > Sure. Albeit it looks to be missing a check for the 0xe033 cs?
>
> Sorry -- I meant can you try that on top of the 0xe033 hack?

Yeah, and with this patch:

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca6f7ab..b1f3f53 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -638,6 +638,25 @@ long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
                 break;
         }

+        case 1000: {
+                kernel_fpu_begin();
+                kernel_fpu_end();
+                ret = 0;
+                break;
+        }
+
+        case 1001: {
+                int i;
+                kernel_fpu_begin();
+                for (i = 0; i < 999; i++) {
+                        stts();
+                        clts();
+                }
+                kernel_fpu_end();
+                ret = 0;
+                break;
+        }
+
         default:
                 ret = -EINVAL;
                 break;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index dda7dff..5d0cf37 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -131,7 +131,7 @@ void dotraplinkage do_emulate_vsyscall(struct pt_regs *regs, long error_code)
          * Real 64-bit user mode code has cs == __USER_CS. Anything else
          * is bogus.
          */
-        if (regs->cs != __USER_CS) {
+        if ((regs->cs != __USER_CS) && (regs->cs != FLAT_RING3_CS64)) {
                 /*
                  * If we trapped from kernel mode, we might as well OOPS now
                  * instead of returning to some random address and OOPSing
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f987bde..1668deb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1916,6 +1916,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 # endif
 #else
         case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
+        case VVAR_PAGE:
 #endif
         case FIX_TEXT_POKE0:
         case FIX_TEXT_POKE1:
@@ -1956,7 +1957,8 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #ifdef CONFIG_X86_64
         /* Replicate changes to map the vsyscall page into the user
            pagetable vsyscall mapping. */
-        if (idx >= VSYSCALL_LAST_PAGE && idx <= VSYSCALL_FIRST_PAGE) {
+        if (idx >= VSYSCALL_LAST_PAGE && idx <= VSYSCALL_FIRST_PAGE ||
+            idx == VVAR_PAGE) {
                 unsigned long vaddr = __fix_to_virt(idx);
                 set_pte_vaddr_pud(level3_user_vsyscall, vaddr, pte);
         }

It boots up fine:

(early) [ 0.000000] Initializing cgroup subsys cpuset
(early) [ 0.000000] Initializing cgroup subsys cpu
(early) [ 0.000000] Linux version 3.0.0-05046-ge08dc13-dirty (konrad@phenom) (gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) (GCC) ) #1 SMP PREEMPT Tue Jul 26 16:54:34 EDT 2011
(early) [ 0.000000] Command line: console=hvc0 debug earlyprintk=xenboot test=test
(early) [ 0.000000] ACPI in unprivileged domain disabled
(early) [ 0.000000] released 0 pages of unused memory
(early) [ 0.000000] Set 0 page(s) to 1-1 mapping.
(early) [ 0.000000] BIOS-provided physical RAM map:
(early) [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
(early) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
(early) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable)
(early) [ 0.000000] bootconsole [xenboot0] enabled
(early) [ 0.000000] NX (Execute Disable) protection: active
(early) [ 0.000000] DMI not present or invalid.
(early) [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (early) (usable)(early) ==> (early) (reserved)(early) (early) [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (early) (usable)(early) (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] last_pfn = 0x80800 max_arch_pfn = 0x400000000 (early) [ 0.000000] initial memory mapped : 0 - 1028f000 (early) [ 0.000000] Base memory trampoline at [ffff88000009b000] 9b000 size 20480 (early) [ 0.000000] init_memory_mapping: 0000000000000000-0000000080800000 (early) [ 0.000000] 0000000000 - 0080800000 page 4k (early) [ 0.000000] kernel direct mapping tables up to 80800000 @ 7fbf8000-80000000 (early) [ 0.000000] xen: setting RW the range 7ff76000 - 80000000 (early) [ 0.000000] RAMDISK: 01b76000 - 1028f000 (early) [ 0.000000] No NUMA configuration found (early) [ 0.000000] Faking a node at 0000000000000000-0000000080800000 (early) [ 0.000000] Initmem setup node 0 0000000000000000-0000000080800000 (early) [ 0.000000] NODE_DATA [000000007fffb000 - 000000007fffffff] (early) [ 0.000000] Zone PFN ranges: (early) [ 0.000000] DMA (early) 0x00000010 -> 0x00001000 (early) [ 0.000000] DMA32 (early) 0x00001000 -> 0x00100000 (early) [ 0.000000] Normal (early) empty (early) [ 0.000000] Movable zone start PFN for each node (early) [ 0.000000] early_node_map[2] active PFN ranges (early) [ 0.000000] 0: 0x00000010 -> 0x000000a0 (early) [ 0.000000] 0: 0x00000100 -> 0x00080800 (early) [ 0.000000] On node 0 totalpages: 526224 (early) [ 0.000000] DMA zone: 56 pages used for memmap (early) [ 0.000000] DMA zone: 5 pages reserved (early) [ 0.000000] DMA zone: 3923 pages, LIFO batch:0 (early) [ 0.000000] DMA32 zone: 7140 pages used for memmap (early) [ 0.000000] DMA32 zone: 515100 pages, LIFO batch:31 (early) [ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs (early) [ 0.000000] No local APIC present (early) [ 0.000000] APIC: disable apic facility (early) [ 0.000000] APIC: switched to apic NOOP (early) [ 
0.000000] nr_irqs_gsi: 16 (early) [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000 (early) [ 0.000000] Allocating PCI resources starting at 80800000 (gap: 80800000:7f800000) (early) [ 0.000000] Booting paravirtualized kernel on Xen (early) [ 0.000000] Xen version: 4.2-unstable (preserve-AD) (early) [ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4 nr_node_ids:1 (early) [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88007fb88000 s82048 r8192 d24448 u114688 (early) [ 0.000000] pcpu-alloc: s82048 r8192 d24448 u114688 alloc=28*4096(early) (early) [ 0.000000] pcpu-alloc: (early) [0] (early) 0 (early) [0] (early) 1 (early) [0] (early) 2 (early) [0] (early) 3 (early) (early) [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 519023 (early) [ 0.000000] Policy zone: DMA32 (early) [ 0.000000] Kernel command line: console=hvc0 debug earlyprintk=xenboot test=test (early) [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes) (early) [ 0.000000] Checking aperture... (early) [ 0.000000] No AGP bridge found (early) [ 0.000000] Calgary: detecting Calgary via BIOS EBDA area (early) [ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing! (early) [ 0.000000] Memory: 1810140k/2105344k available (5942k kernel code, 448k absent, 294756k reserved, 2808k data, 692k init) (early) [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 (early) [ 0.000000] Preemptible hierarchical RCU implementation. (early) [ 0.000000] NR_IRQS:16640 nr_irqs:304 16 (early) [ 0.000000] Console: colour dummy device 80x25 (early) [ 0.000000] console [tty0] enabled [ 0.000000] console [hvc0] enabled, bootconsole disabled (early) [ 0.000000] console [hvc0] enabled, bootconsole disabled [ 0.000000] Xen: using vcpuop timer interface [ 0.000000] installing Xen timer for CPU 0 [ 0.000000] Detected 3000.212 MHz processor. 
[ 0.000000] Marking TSC unstable due to TSCs unsynchronized [ 0.000999] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.42 BogoMIPS (lpj=3000212) [ 0.000999] pid_max: default: 32768 minimum: 301 [ 0.000999] Security Framework initialized [ 0.000999] SELinux: Initializing. [ 0.000999] SELinux: Starting in permissive mode [ 0.000999] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.001199] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.001446] Mount-cache hash table entries: 256 [ 0.001596] Initializing cgroup subsys cpuacct [ 0.001604] Initializing cgroup subsys freezer [ 0.001642] tseg: 0000000000 [ 0.001653] CPU: Physical Processor ID: 0 [ 0.001657] CPU: Processor Core ID: 1 [ 0.001707] SMP alternatives: switching to UP code [ 0.002069] cpu 0 spinlock event irq 17 [ 0.002115] Performance Events: [ 0.002119] no APIC, boot with the "lapic" boot parameter to force-enable it. [ 0.002125] no hardware sampling interrupt available. [ 0.002147] Broken PMU hardware detected, using software events only. [ 0.008049] MCE: In-kernel MCE decoding enabled. 
[ 0.008074] NMI watchdog disabled (cpu0): hardware events not enabled [ 0.014060] installing Xen timer for CPU 1 [ 0.014117] cpu 1 spinlock event irq 23 [ 0.014194] SMP alternatives: switching to SMP code [ 0.015186] NMI watchdog disabled (cpu1): hardware events not enabled [ 0.021063] installing Xen timer for CPU 2 [ 0.021118] cpu 2 spinlock event irq 29 [ 0.021368] NMI watchdog disabled (cpu2): hardware events not enabled [ 0.027068] installing Xen timer for CPU 3 [ 0.027120] cpu 3 spinlock event irq 35 [ 0.027357] NMI watchdog disabled (cpu3): hardware events not enabled [ 0.029063] Brought up 4 CPUs [ 0.029307] kworker/u:0 used greatest stack depth: 5496 bytes left [ 0.029307] Grant table initialized [ 0.048886] RTC time: 165:165:165, date: 165/165/65 [ 0.048951] NET: Registered protocol family 16 [ 0.049047] Extended Config Space enabled on 0 nodes [ 0.050350] PCI: setting up Xen PCI frontend stub [ 0.050358] PCI: pci_cache_line_size set to 64 bytes [ 0.059241] bio: create slab <bio-0> at 0 [ 0.060017] ACPI: Interpreter disabled. [ 0.060036] xen/balloon: Initialising balloon driver. [ 0.060036] last_pfn = 0x80800 max_arch_pfn = 0x400000000 [ 0.062260] xen-balloon: Initialising balloon driver. 
[ 0.063073] vgaarb: loaded [ 0.063073] usbcore: registered new interface driver usbfs [ 0.063073] usbcore: registered new interface driver hub [ 0.063073] usbcore: registered new device driver usb [ 0.064030] PCI: System does not support PCI [ 0.064030] PCI: System does not support PCI [ 0.064053] NetLabel: Initializing [ 0.064053] NetLabel: domain hash size = 128 [ 0.064053] NetLabel: protocols = UNLABELED CIPSOv4 [ 0.064053] NetLabel: unlabeled traffic allowed by default [ 0.064076] Switching to clocksource xen [ 0.064076] Switched to NOHz mode on CPU #3 [ 0.064414] Switched to NOHz mode on CPU #0 [ 0.064996] Switched to NOHz mode on CPU #2 [ 0.065113] Switched to NOHz mode on CPU #1 [ 0.066290] pnp: PnP ACPI: disabled [ 0.071025] PCI: max bus depth: 0 pci_try_num: 1 [ 0.071084] NET: Registered protocol family 2 [ 0.071274] IP route cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.072620] TCP established hash table entries: 262144 (order: 10, 4194304 bytes) [ 0.073896] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) [ 0.074185] TCP: Hash tables configured (established 262144 bind 65536) [ 0.074194] TCP reno registered [ 0.074215] UDP hash table entries: 1024 (order: 3, 32768 bytes) [ 0.074239] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) [ 0.074332] NET: Registered protocol family 1 [ 0.074583] RPC: Registered named UNIX socket transport module. [ 0.074592] RPC: Registered udp transport module. [ 0.074597] RPC: Registered tcp transport module. [ 0.074601] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 0.074609] PCI: CLS 0 bytes, default 64 [ 0.074726] Trying to unpack rootfs image as initramfs... 
[ 0.316784] Freeing initrd memory: 236644k freed [ 0.377411] platform rtc_cmos: registered platform RTC device (no PNP device found) [ 0.378778] Machine check injector initialized [ 0.379554] microcode: CPU0: patch_level=0x010000bf [ 0.379574] microcode: CPU1: patch_level=0x010000bf [ 0.379621] microcode: CPU2: patch_level=0x010000bf [ 0.379671] microcode: CPU3: patch_level=0x010000bf [ 0.379748] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 0.380129] audit: initializing netlink socket (disabled) [ 0.380147] type=2000 audit(1311714191.306:1): initialized [ 0.394045] HugeTLB registered 2 MB page size, pre-allocated 0 pages [ 0.399695] VFS: Disk quotas dquot_6.5.2 [ 0.399844] Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.400589] NTFS driver 2.1.30 [Flags: R/W]. [ 0.400895] msgmni has been set to 3997 [ 0.401041] SELinux: Registering netfilter hooks [ 0.401781] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) [ 0.401790] io scheduler noop registered [ 0.401794] io scheduler deadline registered [ 0.401908] io scheduler cfq registered (default) [ 0.402285] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 [ 0.450645] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.514817] Non-volatile memory driver v1.3 [ 0.514826] Linux agpgart interface v0.103 [ 0.515298] [drm] Initialized drm 1.1.0 20060810 [ 0.518170] brd: module loaded [ 0.519592] loop: module loaded [ 0.520265] Fixed MDIO Bus: probed [ 0.520797] tun: Universal TUN/TAP device driver, 1.6 [ 0.520804] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com> [ 0.521202] ehci_hcd: USB 2.0 ''Enhanced'' Host Controller (EHCI) Driver [ 0.521211] ehci_hcd: block sizes: qh 112 qtd 96 itd 192 sitd 96 [ 0.521303] ohci_hcd: USB 1.1 ''Open'' Host Controller (OHCI) Driver [ 0.521310] ohci_hcd: block sizes: ed 80 td 96 [ 0.521400] uhci_hcd: USB Universal Host Controller Interface driver [ 0.521570] usbcore: registered new interface driver usblp [ 
0.521649] usbcore: registered new interface driver libusual [ 0.521944] i8042: PNP: No PS/2 controller found. Probing ports directly. [ 0.522766] i8042: No controller found [ 0.523049] mousedev: PS/2 mouse device common for all mice [ 0.563573] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0 [ 0.563697] rtc_cmos: probe of rtc_cmos failed with error -38 [ 0.564012] cpuidle: using governor ladder [ 0.564018] cpuidle: using governor menu [ 0.564023] EFI Variables Facility v0.08 2004-May-17 [ 0.564118] zram: num_devices not specified. Using default: 1 [ 0.564125] zram: Creating 1 devices ... [ 0.564504] Netfilter messages via NETLINK v0.30. [ 0.564525] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 0.564828] ctnetlink v0.93: registering with nfnetlink. [ 0.565311] ip_tables: (C) 2000-2006 Netfilter Core Team [ 0.565346] TCP cubic registered [ 0.565351] Initializing XFRM netlink socket [ 0.565760] NET: Registered protocol family 10 [ 0.566372] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 0.566433] IPv6 over IPv4 tunneling driver [ 0.567372] NET: Registered protocol family 17 [ 0.567408] Registering the dns_resolver key type [ 0.567985] PM: Hibernation image not present or could not be loaded. [ 0.568005] registered taskstats version 1 [ 0.568041] XENBUS: Device with no driver: device/vif/0 [ 0.568046] XENBUS: Device with no driver: device/vfb/0 [ 0.568052] XENBUS: Device with no driver: device/vkbd/0 [ 0.568065] Magic number: 1:252:3141 [ 0.568211] powernow-k8: Found 1 AMD Phenom(tm) II X6 1075T Processor (4 cpu cores) (version 2.20.00) [ 0.568237] powernow-k8: Core Performance Boosting: on. [ 0.568253] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found. [ 0.568254] [Firmware Bug]: powernow-k8: Try again with latest BIOS. 
[ 0.568769] Freeing unused kernel memory: 692k freed [ 0.568966] Write protecting the kernel read-only data: 8192k [ 0.572048] Freeing unused kernel memory: 180k freed [ 0.572251] Freeing unused kernel memory: 328k freed init started: BusyBox v1.14.3 (2011-07-26 16:55:54 EDT) [ 0.579282] consoletype used greatest stack depth: 5376 bytes left Mounting directories [ OK ] [ 0.799316] modprobe used greatest stack depth: 5136 bytes left mount: mount point /sys/kernel/config does not exist [ 0.806063] core_filesystem used greatest stack depth: 5024 bytes left [ 0.818750] input: Xen Virtual Keyboard as /devices/virtual/input/input0 [ 0.819044] input: Xen Virtual Pointer as /devices/virtual/input/input1 [ 1.041570] Initialising Xen virtual ethernet driver. [ 1.158627] ------------[ cut here ]------------ [ 1.158669] WARNING: at /home/konrad/ssd/linux/fs/proc/base.c:1123 oom_adjust_write+0x294/0x2b0() [ 1.158677] udevd (1192): /proc/1192/oom_adj is deprecated, please use /proc/1192/oom_score_adj instead. [ 1.158685] Modules linked in: xen_blkfront xen_netfront xen_fbfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_kbdfront xenfs [ 1.158711] Pid: 1192, comm: udevd Not tainted 3.0.0-05046-ge08dc13-dirty #1 [ 1.158717] Call Trace: [ 1.158725] [<ffffffff810921da>] warn_slowpath_common+0x7a/0xb0 [ 1.158733] [<ffffffff810922b1>] warn_slowpath_fmt+0x41/0x50 [ 1.158740] [<ffffffff8109de85>] ? ns_capable+0x25/0x60 [ 1.158747] [<ffffffff811d4d44>] oom_adjust_write+0x294/0x2b0 [ 1.158755] [<ffffffff81175698>] vfs_write+0xc8/0x190 [ 1.158761] [<ffffffff8117584c>] sys_write+0x4c/0x90 [ 1.158769] [<ffffffff815c8512>] system_call_fastpath+0x16/0x1b [ 1.158775] ---[ end trace 53a836e564b32553 ]--- [ 1.289040] ip used greatest stack depth: 3936 bytes left Waiting for devices [ OK ] Waiting for fb [ OK ] Starting..[/dev/fb0] /dev/fb0: len:0 /dev/fb0: bits/pixel32 (7fdbfe873000): Writting .. [800:600] Done! FATAL: Module agpgart_intel not found. 
[ 1.420253] Console: switching to colour frame buffer device 100x37 [ 1.462964] [drm] radeon kernel modesetting enabled. WARNING: Error inserting wmi (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/platform/x86/wmi.ko): No such device WARNING: Error inserting mxm_wmi (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/platform/x86/mxm-wmi.ko): No such device WARNING: Error inserting drm_kms_helper (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/gpu/drm/drm_kms_helper.ko): No such device WARNING: Error inserting ttm (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/gpu/drm/ttm/ttm.ko): No such device FATAL: Error inserting nouveau (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/gpu/drm/nouveau/nouveau.ko): No such device WARNING: Error inserting drm_kms_helper (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/gpu/drm/drm_kms_helper.ko): No such device FATAL: Error inserting i915 (/lib/modules/3.0.0-05046-ge08dc13-dirty/kernel/drivers/gpu/drm/i915/i915.ko): No such device Starting..[/dev/fb0] /dev/fb0: len:0 /dev/fb0: bits/pixel32 (7fecfee0e000): Writting .. [800:600] Done! VGA: 0000: Waiting for network [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth0: [ 1.785668] device eth0 entered promiscuous mode [ OK ] Bringing up interface switch: Determining IP information for switch...[ 1.843220] switch: port 1(eth0) entering forwarding state [ 1.843271] switch: port 1(eth0) entering forwarding state done. [ OK ] Waiting for init.custom [ OK ] Start sshd PING master.dumpdata.com (192.168.101.1) 56(84) bytes of data. --- master.dumpdata.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 1ms rtt min/avg/max/mdev = 0.365/0.365/0.365/0.000 ms mount.nfs: rpc.statd is not running but is required for remote locking. mount.nfs: Either use ''-o nolock'' to keep locks local, or start statd. mount.nfs: an incorrect mount option was specified Starting SSHd ... 
[ 3.190254] mount.nfs used greatest stack depth: 3648 bytes left [ 3.193509] [drm] Module unloaded ERROR: Module nouveau does not exist in /proc/modules libxl: error: libxl.c:61:libxl_ctx_alloc: Is xenstore daemon running? failed to stat /var/run/xenstored.pid: No such file or directory cannot init xl context Waiting for SSHd [ OK ] WARNING: ssh currently running [2319] ignoring start request [ 3.359346] SCSI subsystem initialized [ 3.361222] Loading iSCSI transport class v2.0-870. [ 3.364431] iscsi: registered transport (tcp) iscsistart: transport class version 2.0-870. iscsid version 2.0-872 Could not get list of targets from firmware. Jul 26 21:03:14 g-pvops syslogd 1.5.0: restart. FATAL: Module evtchn not found. [ 3.400467] Event-channel device installed. xencommons should be started first. CPU0 CPU1 CPU2 CPU3 16: 1720 0 0 0 xen-percpu-virq timer0 17: 8 0 0 0 xen-percpu-ipi spinlock0 18: 2432 0 0 0 xen-percpu-ipi resched0 19: 162 0 0 0 xen-percpu-ipi callfunc0 20: 0 0 0 0 xen-percpu-virq debug0 21: 105 0 0 0 xen-percpu-ipi callfuncsingle0 22: 0 1724 0 0 xen-percpu-virq timer1 23: 0 11 0 0 xen-percpu-ipi spinlock1 24: 0 2050 0 0 xen-percpu-ipi resched1 25: 0 139 0 0 xen-percpu-ipi callfunc1 26: 0 0 0 0 xen-percpu-virq debug1 27: 0 84 0 0 xen-percpu-ipi callfuncsingle1 28: 0 0 1345 0 xen-percpu-virq timer2 29: 0 0 23 0 xen-percpu-ipi spinlock2 30: 0 0 671 0 xen-percpu-ipi resched2 31: 0 0 161 0 xen-percpu-ipi callfunc2 32: 0 0 0 0 xen-percpu-virq debug2 33: 0 0 101 0 xen-percpu-ipi callfuncsingle2 34: 0 0 0 1684 xen-percpu-virq timer3 35: 0 0 0 15 xen-percpu-ipi spinlock3 36: 0 0 0 1437 xen-percpu-ipi resched3 37: 0 0 0 156 xen-percpu-ipi callfunc3 38: 0 0 0 0 xen-percpu-virq debug3 39: 0 0 0 118 xen-percpu-ipi callfuncsingle3 40: 404 0 0 0 xen-dyn-event xenbus 41: 71 0 0 0 xen-dyn-event hvc_console 42: 0 0 0 0 xen-dyn-event vkbd 43: 70 0 0 0 xen-dyn-event vfb 44: 107 0 0 0 xen-dyn-event eth0 NMI: 0 0 0 0 Non-maskable interrupts LOC: 0 0 0 0 Local timer 
interrupts SPU: 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 Performance monitoring interrupts IWI: 0 0 0 0 IRQ work interrupts RES: 2432 2050 671 1437 Rescheduling interrupts CAL: 267 223 262 274 Function call interrupts TLB: 0 0 0 0 TLB shootdowns TRM: 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 Machine check exceptions MCP: 0 0 0 0 Machine check polls ERR: 0 MIS: 0 00000000-0000ffff : reserved 00010000-0009ffff : System RAM 000a0000-000fffff : reserved 000f0000-000fffff : System ROM 00100000-807fffff : System RAM 01000000-015cd9bb : Kernel code 015cd9bc-0188bbff : Kernel data 01941000-01a3ffff : Kernel bss Starting test testcase.. Jul 26 21:03:14 g-pvops init: starting pid 2436, tty ''/dev/tty0'': ''/bin/sh'' Jul 26 21:03:14 g-pvops init: starting pid 2437, tty ''/dev/tty1'': ''/bin/sh'' Jul 26 21:03:14 g-pvops init: starting pid 2438, tty ''/dev/ttyS0'': ''/bin/sh'' ~~~~~~~~~~~~~~~~~~~~~~~~~~| DirectFB 1.4.9 |~~~~~~~~~~~~~~~~~~~~~~~~~~ (c) 2001-2010 The world wide DirectFB Open Source Community (c) 2000-2004 Convergence (integrated media) GmbH ---------------------------------------------------------------- (*) DirectFB/Core: Single Application Core. (2011-07-26 20:56) Jul 26 21:03:14 g-pvops init: starting pid 2439, tty ''/dev/hvc0'': ''/bin/sh'' (*) Direct/Memcpy: Using libc memcpy() sh-4.1# (*) Direct/Thread: Started ''VT Switcher'' (-1) [CRITICAL OTHER/OTHER 0/0] <8388608>... (*) Direct/Thread: Started ''VT Flusher'' (-1) [DEFAULT OTHER/OTHER 0/0] <8388608>... (*) DirectFB/FBDev: Found ''xen'' (ID 0) with frame buffer at 0x00000000, 2048k (MMIO 0x00000000, 0k) (*) Direct/Thread: Started ''Keyboard Input'' (-1) [INPUT OTHER/OTHER 0/0] <8388608>... (*) DirectFB/Input: Keyboard 0.9 (directfb.org) (*) Direct/Thread: Started ''PS/2 Input'' (-1) [INPUT OTHER/OTHER 0/0] <8388608>... (*) DirectFB/Input: IMPS/2 Mouse 1.0 (directfb.org) (*) Direct/Thread: Started ''Linux Input'' (-1) [INPUT OTHER/OTHER 0/0] <8388608>... 
(*) DirectFB/Input: Xen Virtual Keyboard (1) 0.1 (directfb.org) (*) Direct/Thread: Started ''Linux Input'' (-1) [INPUT OTHER/OTHER 0/0] <8388608>... (*) DirectFB/Input: Xen Virtual Pointer (2) 0.1 (directfb.org) (*) Direct/Thread: Started ''Hotplug with Linux Input'' (-1) [INPUT OTHER/OTHER 0/0] <8388608>... (*) DirectFB/Input: Hot-plug detection enabled with Linux Input Driver (*) DirectFB/Genefx: MMX detected and enabled (*) DirectFB/Graphics: MMX Software Rasterizer 0.6 (directfb.org) (*) DirectFB/Core/WM: Default 0.3 (directfb.org) (*) FBDev/Mode: Setting 800x600 RGB32 (*) FBDev/Mode: Switched to 800x600 (virtual 800x600) at 32 bit (RGB32), pitch 3200 SSH started [2319] Jul 26 21:03:15 g-pvops iscsid: transport class version 2.0-870. iscsid version 2.0-872 Jul 26 21:03:15 g-pvops iscsid: iSCSI daemon with pid=2400 started! Jul 26 21:03:15 g-pvops init: process ''/bin/sh'' (pid 2438) exited. Scheduling for restart. Jul 26 21:03:15 g-pvops init: starting pid 2452, tty ''/dev/ttyS0'': ''/bin/sh'' Jul 26 21:03:16 g-pvops init: process ''/bin/sh'' (pid 2452) exited. Scheduling for restart. Jul 26 21:03:16 g-pvops init: starting pid 2453, tty ''/dev/ttyS0'': ''/bin/sh'' Jul 26 21:03:17 g-pvops init: process ''/bin/sh'' (pid 2453) exited. Scheduling for restart. Jul 26 21:03:17 g-pvops init: starting pid 2454, tty ''/dev/ttyS0'': ''/bin/sh'' Jul 26 21:03:18 g-pvops init: process ''/bin/sh'' (pid 2454) exited. Scheduling for restart. Jul 26 21:03:18 g-pvops init: starting pid 2455, tty ''/dev/ttyS0'': ''/bin/sh'' poweroJul 26 21:03:19 g-pvops init: process ''/bin/sh'' (pid 2455) exited. Scheduling for restart. Jul 26 21:03:19 g-pvops init: starting pid 2457, tty ''/dev/ttyS0'': ''/bin/sh'' ff Jul 26 21:03:19 g-pvops init: starting pid 2460, tty '''': ''/etc/init.d/halt'' sh-4.1# Usage: /etc/init.d/halt {start} The system is going down NOW! Jul 26 21:03:19Jul 26 21:03:19 g-pvops Sent SIGTERM to all processes (!) 
[ 2435: 0.000] --> Caught signal 15 (sent by pid 1, uid 0) <-- (!!!) *** WARNING [still objects in ''Window Pool''] *** [object.c:241 in fusion_object_pool_destroy()] (!!!) *** WARNING [still objects in ''Layer Region Pool''] *** [object.c:241 in fusion_object_pool_destroy()] (!!!) *** WARNING [still objects in ''Layer Context Pool''] *** [object.c:241 in fusion_object_pool_destroy()] (!!!) *** WARNING [still objects in ''Surface Pool''] *** [object.c:241 in fusion_object_pool_destroy()] Sent SIGKILL to all processes Requesting system poweroff [ 10.925177] System halted. Parsing config file /root/pv.xm Daemon running with PID 18759 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andrew Lutomirski
2011-Jul-26 21:10 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 4:48 PM, Keir Fraser <keir.xen@gmail.com> wrote:
> On 26/07/2011 20:08, "Andrew Lutomirski" <luto@mit.edu> wrote:
>
>> On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>>> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>>>> After staring at the Xen assembly code with vague comprehension, I
>>>> think I can sort of understand what's going on.
>>>
>>> Ok.
>>>>
>>>> Can you run this little program on a working kernel and tell me what
>>>> it says (built as 64-bit and as 32-bit (with -m32)):
>>>
>>> 32-bit:
>>> [konrad@f13-x86-build ~]$ ./check
>>> cs = 73
>>> [konrad@f13-x86-build ~]$ uname -a
>>> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT
>>> 2011 i686 i686 i386 GNU/Linux
>>>
>>>
>>> 64-bit:
>>>
>>> [konrad@f13-amd64-build ~]$ ./check
>>> cs = e033
>>
>> My best guess is that each task starts out with standard __USER_CS,
>> but the code in write_stack_trampoline (in the hypervisor) tells the
>> kernel that CS is 0xe033 and then the next return to userspace makes
>> it true.
>
> Yes, that's right.

But it's still weird, because AFAICT xen_sysret64 already does the
right thing.  So presumably the failure case only happens when
something prevents sysret from working, like CONFIG_AUDITSYSCALL.

>
>> I'll hack up a patch to avoid the crash.  I'll feel better about it if
>> you or any of the Xen gurus can confirm that explanation.  If I'm
>> right, I need to check for both __USER_CS and FLAT_RING3_CS.
>
> Either that, or Linux needs to poke its preferred 32- or 64-bit user CS
> value into the return stackframe when it receives a syscall notification
> from Xen.

That sounds simpler.  It will also make Xen userspace look more like
native userspace.

--Andy

>
>  -- Keir
>
>> --Andy
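The "little program" referred to in the quoted text is not included in this part of the thread. As a rough illustration only, a program of that kind could read the CS register with inline assembly. The sketch below assumes GCC or Clang on x86; the names `read_cs` and `print_cs` are made up here and are not taken from the thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the program from the thread: returns the
 * current CS selector, or -1 when the build target is not x86.
 */
static int read_cs(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned short cs;

	/* Copy the code segment register into a general-purpose register. */
	__asm__ volatile("mov %%cs, %0" : "=r"(cs));
	return cs;
#else
	return -1;
#endif
}

/* Print the selector in the same "cs = ..." form as the quoted output. */
static void print_cs(void)
{
	printf("cs = %x\n", (unsigned)read_cs());
}
```

Calling `print_cs()` from `main` in a 64-bit build, and again in a build made with `-m32`, should give the kind of per-mode selector values Konrad reports above; the exact values depend on the kernel and on whether it runs as a Xen PV guest.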
Andrew Lutomirski
2011-Jul-26 21:40 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 5:10 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Tue, Jul 26, 2011 at 4:48 PM, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 26/07/2011 20:08, "Andrew Lutomirski" <luto@mit.edu> wrote:
>>
>>> On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@oracle.com> wrote:
>>>> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>>>>> After staring at the Xen assembly code with vague comprehension, I
>>>>> think I can sort of understand what's going on.
>>>>
>>>> Ok.
>>>>>
>>>>> Can you run this little program on a working kernel and tell me what
>>>>> it says (built as 64-bit and as 32-bit (with -m32)):
>>>>
>>>> 32-bit:
>>>> [konrad@f13-x86-build ~]$ ./check
>>>> cs = 73
>>>> [konrad@f13-x86-build ~]$ uname -a
>>>> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT
>>>> 2011 i686 i686 i386 GNU/Linux
>>>>
>>>>
>>>> 64-bit:
>>>>
>>>> [konrad@f13-amd64-build ~]$ ./check
>>>> cs = e033
>>>
>>> My best guess is that each task starts out with standard __USER_CS,
>>> but the code in write_stack_trampoline (in the hypervisor) tells the
>>> kernel that CS is 0xe033 and then the next return to userspace makes
>>> it true.
>>
>> Yes, that's right.
>
> But it's still weird, because AFAICT xen_sysret64 already does the
> right thing.  So presumably the failure case only happens when
> something prevents sysret from working, like CONFIG_AUDITSYSCALL.

I lied.  I still don't see what's going on.

Xen, in enlighten.c, registers xen_syscall_target as the 64-bit
syscall target (or at least I assume that's what CALLBACKTYPE_syscall
does).  xen_syscall_target does this:

.macro undo_xen_syscall
	mov 0*8(%rsp), %rcx
	mov 1*8(%rsp), %r11
	mov 5*8(%rsp), %rsp
.endm

/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
	undo_xen_syscall
	jmp system_call_after_swapgs
ENDPROC(xen_syscall_target)

So the 0xe033 that Xen writes is popped back off the kernel stack and ignored.

xen_sysret64 explicitly pushes __USER_CS as its CS value, so that path
looks OK.

If we go into the iret path (via auditing, for example), then the
FIXUP_TOP_OF_STACK macro does movq $__USER_CS,CS+\offset(%rsp), which
(unless it's buggy) writes __USER_CS into the appropriate spot.

So I don't see what part of the entry path needs patching.

--Andy
Keir Fraser
2011-Jul-26 22:20 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 26/07/2011 22:40, "Andrew Lutomirski" <luto@mit.edu> wrote:

> If we go into the iret path (via auditing, for example), then the
> FIXUP_TOP_OF_STACK macro does movq $__USER_CS,CS+\offset(%rsp), which
> (unless it's buggy) writes __USER_CS into the appropriate spot.
>
> So I don't see what part of the entry path needs patching.

You'll get Xen's flat CS values loaded if Xen uses SYSRET to return to guest
context. This will happen on return to guest userspace if the guest kernel
calls the iret hypercall specifying the VGCF_in_syscall flag. And that would
typically happen when returning to userspace after a syscall. So I guess the
typical user process will quickly end up using the Xen code selector rather
than Linux's own.

 -- Keir
j.fitz.inge@gmail.com
2011-Jul-26 23:37 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
The correct fix is to just look at the cpl in cs and ignore the rest of the
selector.

J
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Andrew Lutomirski <luto@mit.edu> wrote:

On Tue, Jul 26, 2011 at 5:10 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Tue, Jul 26, 2011 at 4:48 PM, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 26/07/2011 20:08, "Andrew Lutomirski" <luto@mit.edu> wrote:
>>
>>> On Tue, Jul 26, 2011 at 11:32 AM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@oracle.com> wrote:
>>>> On Mon, Jul 25, 2011 at 09:50:30PM -0400, Andrew Lutomirski wrote:
>>>>> After staring at the Xen assembly code with vague comprehension, I
>>>>> think I can sort of understand what's going on.
>>>>
>>>> Ok.
>>>>>
>>>>> Can you run this little program on a working kernel and tell me what
>>>>> it says (built as 64-bit and as 32-bit (with -m32)):
>>>>
>>>> 32-bit:
>>>> [konrad@f13-x86-build ~]$ ./check
>>>> cs = 73
>>>> [konrad@f13-x86-build ~]$ uname -a
>>>> Linux f13-x86-build.dumpdata.com 3.0.0 #1 SMP PREEMPT Tue Jul 26 09:56:38 EDT
>>>> 2011 i686 i686 i386 GNU/Linux
>>>>
>>>>
>>>> 64-bit:
>>>>
>>>> [konrad@f13-amd64-build ~]$ ./check
>>>> cs = e033
>>>
>>> My best guess is that each task starts out with standard __USER_CS,
>>> but the code in write_stack_trampoline (in the hypervisor) tells the
>>> kernel that CS is 0xe033 and then the next return to userspace makes
>>> it true.
>>
>> Yes, that's right.
>
> But it's still weird, because AFAICT xen_sysret64 already does the
> right thing.  So presumably the failure case only happens when
> something prevents sysret from working, like CONFIG_AUDITSYSCALL.

I lied.  I still don't see what's going on.

Xen, in enlighten.c, registers xen_syscall_target as the 64-bit
syscall target (or at least I assume that's what CALLBACKTYPE_syscall
does).  xen_syscall_target does this:

.macro undo_xen_syscall
	mov 0*8(%rsp), %rcx
	mov 1*8(%rsp), %r11
	mov 5*8(%rsp), %rsp
.endm

/* Normal 64-bit system call target */
ENTRY(xen_syscall_target)
	undo_xen_syscall
	jmp system_call_after_swapgs
ENDPROC(xen_syscall_target)

So the 0xe033 that Xen writes is popped back off the kernel stack and ignored.

xen_sysret64 explicitly pushes __USER_CS as its CS value, so that path
looks OK.

If we go into the iret path (via auditing, for example), then the
FIXUP_TOP_OF_STACK macro does movq $__USER_CS,CS+\offset(%rsp), which
(unless it's buggy) writes __USER_CS into the appropriate spot.

So I don't see what part of the entry path needs patching.

--Andy
Andrew Lutomirski
2011-Jul-27 02:17 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Tue, Jul 26, 2011 at 7:37 PM, j.fitz.inge@gmail.com <jeremy@goop.org> wrote:
> The correct fix is to just look at the cpl in cs and ignore the rest of the
> selector.

No.  All three of these code paths are trap handlers that are trying
to distinguish between 64-bit and 32-bit segments.  The CPL is 3 in
either case.

It looks like the reason I didn't find the code is that it references
TRAP_syscall, not VGCF_in_syscall.  Yay for grep-unfriendly code.

Barring a better idea, I'll implement a new paravirt op.

--Andy
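To make the point about the CPL concrete: an x86 segment selector packs a descriptor table index (bits 15..3), a table indicator (bit 2) and a requested privilege level (bits 1..0). The sketch below decodes those fields. The helper and struct names are invented for illustration, and of the selector values mentioned afterwards only 0xe033 appears in the thread; 0x33, 0x23 and 0xe023 are assumed to be the native 64-bit and 32-bit user selectors and the Xen 32-bit flat selector of that era.

```c
#include <stdint.h>

/* Fields of an x86 segment selector (illustrative names). */
struct selector_fields {
	unsigned index; /* descriptor table index, bits 15..3 */
	unsigned ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
	unsigned rpl;   /* requested privilege level, bits 1..0 */
};

/* Split a 16-bit selector into its three architectural fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
	struct selector_fields f;

	f.index = sel >> 3;
	f.ti = (sel >> 2) & 1;
	f.rpl = sel & 3;
	return f;
}
```

Decoding 0x33, 0x23, 0xe033 and 0xe023 with this helper gives a privilege level of 3 for all four, so the low bits alone cannot tell a 64-bit user segment from a 32-bit one; only the index differs.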
Konrad Rzeszutek Wilk
2011-Jul-27 12:57 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
> > Yes, that's right.
>
> But it's still weird, because AFAICT xen_sysret64 already does the
> right thing. So presumably the failure case only happens when
> something prevents sysret from working, like CONFIG_AUDITSYSCALL.

Oh, which I do seem to have had turned on so that SELinux can work.
Jeremy Fitzhardinge
2011-Jul-27 15:40 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 07/26/2011 07:17 PM, Andrew Lutomirski wrote:
> On Tue, Jul 26, 2011 at 7:37 PM, j.fitz.inge@gmail.com <jeremy@goop.org> wrote:
>> The correct fix is to just look at the cpl in cs and ignore the rest of the
>> selector.
> No. All three of these code paths are trap handlers that are trying
> to distinguish between 64-bit and 32-bit segments. The CPL is 3 in
> either case.

Oh, hm.

> It looks like the reason I didn't find the code is that it references
> TRAP_syscall, not VCGF_in_syscall. Yay for grep-unfriendly code.
>
> Barring a better idea, I'll implement a new paravirt op.
>

Ugh. I'd really like to avoid that.

J
Andrew Lutomirski
2011-Jul-27 16:02 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Wed, Jul 27, 2011 at 11:40 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 07/26/2011 07:17 PM, Andrew Lutomirski wrote:
>> On Tue, Jul 26, 2011 at 7:37 PM, j.fitz.inge@gmail.com <jeremy@goop.org> wrote:
>>> The correct fix is to just look at the cpl in cs and ignore the rest of the
>>> selector.
>> No. All three of these code paths are trap handlers that are trying
>> to distinguish between 64-bit and 32-bit segments. The CPL is 3 in
>> either case.
>
> Oh, hm.
>
>> It looks like the reason I didn't find the code is that it references
>> TRAP_syscall, not VCGF_in_syscall. Yay for grep-unfriendly code.
>>
>> Barring a better idea, I'll implement a new paravirt op.
>>
>
> Ugh. I'd really like to avoid that.

My current patch adds a field to pv_info. I agree it's ugly.

How terrible would it be to stop using VCGF_in_syscall so we can keep __USER_CS? Is there a real performance advantage to VCGF_in_syscall?

--Andy
Jeremy Fitzhardinge
2011-Jul-27 17:19 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 07/27/2011 09:02 AM, Andrew Lutomirski wrote:
> My current patch adds a field to pv_info. I agree it's ugly.

Hm, that's not so bad as actually adding a new op though.

> How terrible would it be to stop using VCGF_in_syscall so we can keep
> __USER_CS? Is there a real performance advantage to VCGF_in_syscall?

I don't know. 64-bit PV guests are already pretty horrid because of all the pagetable switching, so it may be that iret vs sysret disappears in the wash. It's certainly a cleaner fix, but I would want to measure it before committing to it.

J
Andrew Lutomirski
2011-Jul-28 04:33 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Wed, Jul 27, 2011 at 1:19 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 07/27/2011 09:02 AM, Andrew Lutomirski wrote:
>> My current patch adds a field to pv_info. I agree it's ugly.
>
> Hm, that's not so bad as actually adding a new op though.
>
>> How terrible would it be to stop using VCGF_in_syscall so we can keep
>> __USER_CS? Is there a real performance advantage to VCGF_in_syscall?
>
> I don't know. 64-bit PV guests are already pretty horrid because of all
> the pagetable switching, so it may be that iret vs sysret disappears in
> the wash. It's certainly a cleaner fix, but I would want to measure it
> before committing to it.

On Sandy Bridge, a null vsyscall takes 373 ns. Without VCGF_in_syscall, it's 457 ns. The change causes my little test app to get cs == __USER_CS.

I suspect that Sandy Bridge is just about the worst case. syscall and sysret are amazingly fast on Sandy Bridge.

--Andy
Jeremy Fitzhardinge
2011-Jul-28 06:07 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 07/27/2011 09:33 PM, Andrew Lutomirski wrote:
> On Sandy Bridge, a null vsyscall takes 373 ns. Without
> VCGF_in_syscall, it's 457 ns. The change causes my little test app to
> get cs == __USER_CS.

Hm, 20% is more noticeable than I would hope. What about a regular syscall?

> I suspect that Sandy Bridge is just about the worst case. syscall and
> sysret are amazingly fast on Sandy Bridge.
>

Yes, and one presumes it would only get worse.

J
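As a quick check of the "20%" figure, the relative slowdown can be worked out from the two vsyscall timings quoted above. The helper below is only illustrative arithmetic; its name and its choice of the faster path as the baseline are assumptions, not something stated in the thread.

```c
/*
 * Relative slowdown of the slow path, as a percentage of the fast path.
 * fast_ns and slow_ns are per-call latencies in nanoseconds.
 */
static double overhead_percent(double fast_ns, double slow_ns)
{
    return (slow_ns - fast_ns) / fast_ns * 100.0;
}
```

With the quoted numbers, overhead_percent(373.0, 457.0) comes to roughly 22.5%: the difference is 84 ns on a 373 ns baseline. Measured against the slower figure instead, the same 84 ns is about 18.4%, so "20%" is a fair round number either way.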
Andrew Lutomirski
2011-Jul-29 12:51 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Thu, Jul 28, 2011 at 2:07 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 07/27/2011 09:33 PM, Andrew Lutomirski wrote:
>> On Sandy Bridge, a null vsyscall takes 373 ns. Without
>> VCGF_in_syscall, it's 457 ns. The change causes my little test app to
>> get cs == __USER_CS.
>
> Hm, 20% is more noticeable than I would hope. What about a regular syscall?

With VCGF_in_syscall, gettimeofday() (the syscall version) takes 593 ns. Without VCGF_in_syscall, it's 712 ns.

I'd argue for using my original approach of adding a user_64bit_mode function -- I think it's a legitimate cleanup and Xen, for better or worse, really does have two long mode CPL 3 selectors. If we removed selector 6 from the GDT, that would be a different story, but that would probably be a more intrusive change.

--Andy
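The patch itself is not included in this part of the thread, so the following is only a guess at the shape of the idea: a helper that accepts the native 64-bit user selector plus one extra selector supplied by the paravirt layer. The struct names, the field name extra_user_64bit_cs, and the convention that 0 means "no extra selector" are all assumptions made for this sketch.

```c
#include <stdbool.h>

#define USER_CS       0x33    /* native 64-bit user code selector (__USER_CS) */
#define XEN_USER_CS64 0xe033  /* second long mode CPL 3 selector seen under Xen PV */

/* Hypothetical stand-in for the field added to pv_info. */
struct pv_info_sketch {
    unsigned long extra_user_64bit_cs;  /* 0 when there is no extra selector */
};

/* Hypothetical stand-in for the saved register frame. */
struct regs_sketch {
    unsigned long cs;
};

/* True if the saved CS names a 64-bit user code segment. */
static bool user_64bit_mode_sketch(const struct regs_sketch *regs,
                                   const struct pv_info_sketch *pv)
{
    if (regs->cs == USER_CS)
        return true;
    return pv->extra_user_64bit_cs != 0 &&
           regs->cs == pv->extra_user_64bit_cs;
}
```

On bare hardware the extra field would stay 0 and only 0x33 would match. Under Xen PV it would be set to 0xe033, so both long mode selectors are treated as 64-bit while any 32-bit compat selector is still rejected. That keeps the sysret fast path, and with it the timings measured above, without a new paravirt op.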
Jeremy Fitzhardinge
2011-Jul-29 15:31 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On 07/29/2011 05:51 AM, Andrew Lutomirski wrote:
> With VCGF_in_syscall, gettimeofday() (the syscall version) takes 593 ns.
> Without VCGF_in_syscall, it's 712 ns.
>
> I'd argue for using my original approach of adding a user_64bit_mode
> function -- I think it's a legitimate cleanup and Xen, for better or
> worse, really does have two long mode CPL 3 selectors. If we removed
> selector 6 from the GDT, that would be a different story, but that
> would probably be a more intrusive change.

Sigh. Yeah, let's see what happens.

J
Konrad Rzeszutek Wilk
2011-Jul-31 18:56 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Fri, Jul 29, 2011 at 08:31:35AM -0700, Jeremy Fitzhardinge wrote:
> On 07/29/2011 05:51 AM, Andrew Lutomirski wrote:
> > With VCGF_in_syscall, gettimeofday() (the syscall version) takes 593 ns.
> > Without VCGF_in_syscall, it's 712 ns.
> >
> > I'd argue for using my original approach of adding a user_64bit_mode
> > function -- I think it's a legitimate cleanup and Xen, for better or
> > worse, really does have two long mode CPL 3 selectors. If we removed
> > selector 6 from the GDT, that would be a different story, but that
> > would probably be a more intrusive change.
>
> Sigh. Yeah, let's see what happens.

So.. roll with Andrew's patches? I think Andrew is just waiting for the word from you whether to repost the patches so that x86 maintainers can take a look at them..
Andrew Lutomirski
2011-Jul-31 19:14 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
Actually I'm waiting b/c I'm defending my thesis tomorrow. I might get v2 out before then, but no guarantee.

On Jul 31, 2011 2:57 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
> On Fri, Jul 29, 2011 at 08:31:35AM -0700, Jeremy Fitzhardinge wrote:
>> On 07/29/2011 05:51 AM, Andrew Lutomirski wrote:
>> > With VCGF_in_syscall, gettimeofday() (the syscall version) takes 593 ns.
>> > Without VCGF_in_syscall, it's 712 ns.
>> >
>> > I'd argue for using my original approach of adding a user_64bit_mode
>> > function -- I think it's a legitimate cleanup and Xen, for better or
>> > worse, really does have two long mode CPL 3 selectors. If we removed
>> > selector 6 from the GDT, that would be a different story, but that
>> > would probably be a more intrusive change.
>>
>> Sigh. Yeah, let's see what happens.
>
> So.. roll with Andrew's patches? I think Andrew is just waiting for the word
> from you whether to repost the patches so that x86 maintainers can take a look
> at them..
Konrad Rzeszutek Wilk
2011-Aug-02 14:10 UTC
Re: [Xen-devel] [semi-urgent Xen CS question] Re: git commit 9fd67b4ed0714ab718f1f9bd14c344af336a6df7 (x86-64: Give vvars their own page) breaks Xen PV guests (64-bit).
On Sun, Jul 31, 2011 at 03:14:59PM -0400, Andrew Lutomirski wrote:
> Actually I'm waiting b/c I'm defending my thesis tomorrow. I might get v2
> out before then, but no guarantee.

Oooh, yeah take care of your thesis first. And then celebrate by writing more patches :-)

> On Jul 31, 2011 2:57 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
> wrote:
> > On Fri, Jul 29, 2011 at 08:31:35AM -0700, Jeremy Fitzhardinge wrote:
> >> On 07/29/2011 05:51 AM, Andrew Lutomirski wrote:
> >> > With VCGF_in_syscall, gettimeofday() (the syscall version) takes 593 ns.
> >> > Without VCGF_in_syscall, it's 712 ns.
> >> >
> >> > I'd argue for using my original approach of adding a user_64bit_mode
> >> > function -- I think it's a legitimate cleanup and Xen, for better or
> >> > worse, really does have two long mode CPL 3 selectors. If we removed
> >> > selector 6 from the GDT, that would be a different story, but that
> >> > would probably be a more intrusive change.
> >>
> >> Sigh. Yeah, let's see what happens.
> >
> > So.. roll with Andrew's patches? I think Andrew is just waiting for the word
> > from you whether to repost the patches so that x86 maintainers can take a look
> > at them..