Steve Dobbelstein
2006-Sep-06 00:56 UTC
[Xen-devel] Oops when loading xen_platform_pci module in HVM domain on CS 11429
I'm running 64-bit SLES 10 beta 10 (yes, we have to upgrade to the official release) on a machine with four Xeon 7020s. I got xen-unstable changeset 11429:66dd34f2f439 and built 64-bit uniprocessor kernels for dom0 and the HVM domain (a 2.6.16.13 baremetal kernel and its initrd). The HVM domain is also running SLES 10 beta 10. I followed the instructions to build the paravirtualized drivers for an HVM domain. When I run "modprobe xen_platform_pci" in the HVM domain, I get a kernel oops. Here is the output in dmesg:

PCI: Found IRQ 10 for device 0000:00:03.0
Xen version 3.0.
Hypercall area is 1 pages (order 0 allocation)
Unable to handle kernel paging request at ffff81002aca5220 RIP:
[<ffff81002aca5220>]
PGD 8063 PUD 9063 PMD 800000002ac001e3 PTE 31e031e031e031e
Oops: 0011 [1]
CPU 0
Modules linked in: xen_platform_pci ext3 mbcache jbd edd processor lpfc mptspi mptscsih mptbase ata_piix libata
Pid: 4000, comm: modprobe Not tainted 2.6.16.13-baremetal-up #1
RIP: 0010:[<ffff81002aca5220>] [<ffff81002aca5220>]
RSP: 0018:ffff8100265b5b60  EFLAGS: 00010282
RAX: ffff81002aca5220 RBX: 000000002aca5000 RCX: 0000000040000000
RDX: 0000000000000000 RSI: ffff8100265b5b68 RDI: 0000000000000006
RBP: ffff8100265b5b78 R08: ffff81002aca5000 R09: ffffffff7fffffff
R10: 00007f0000000000 R11: 0000000080000000 R12: ffff81002fea8000
R13: 00000000f3000000 R14: 000000000000c100 R15: 0000000000000001
FS:  00002b443d7726d0(0000) GS:ffffffff80533000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff81002aca5220 CR3: 0000000026f89000 CR4: 00000000000006e0
Process modprobe (pid: 4000, threadinfo ffff8100265b4000, task ffff81002fba0380)
Stack: ffffffff88086c5c ffff810000000000 ffffffff80146693 ffff8100265b5c08
       ffffffff88086635 0000000300000000 ffff8100265b5bb8 0000000000000000
       0000000000000100 0000000001000000
Call Trace: <ffffffff88086c5c>{:xen_platform_pci:setup_xen_features+40}
       <ffffffff80146693>{__get_free_pages+49}
       <ffffffff88086635>{:xen_platform_pci:platform_pci_init+832}
       <ffffffff80207ef2>{pci_device_probe+77}
       <ffffffff8024d32a>{driver_probe_device+92}
       <ffffffff8024d3f2>{__driver_attach+0}
       <ffffffff8024d449>{__driver_attach+87}
       <ffffffff8024cd16>{bus_for_each_dev+79}
       <ffffffff8024d25a>{driver_attach+28}
       <ffffffff8024c913>{bus_add_driver+122}
       <ffffffff8024d6d4>{driver_register+143}
       <ffffffff802080b1>{__pci_register_driver+111}
       <ffffffff8808e01c>{:xen_platform_pci:platform_pci_module_init+28}
       <ffffffff8013daa5>{sys_init_module+5606}
       <ffffffff8013731f>{autoremove_wake_function+0}
       <ffffffff8015efaa>{vfs_read+173}
       <ffffffff8010a8ba>{system_call+126}

Code: b8 11 00 00 00 0f 01 c1 c3 00 00 00 00 00 00 00 00 00 00 00
RIP [<ffff81002aca5220>] RSP <ffff8100265b5b60>
CR2: ffff81002aca5220

It is oopsing on line 25 in unmodified_drivers/linux-2.6/platform-pci/features.c (which is a symlink to ../../linux-2.6-xen-sparse/drivers/xen/core/features.c):

if (HYPERVISOR_xen_version(XENVER_get_features, &fi) < 0)

Looks like something went wrong with the hypercall. I crawled through the code to see how the hypercall stubs are set up but got lost in the MSR stuff. I'll take a look at it again tomorrow. Thought I should post it to the list in case anyone else can reproduce the problem and either find a fix or explain why it's a user error.

Let me know if you need more info on my setup.

Steve D.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Sep-06 01:54 UTC
RE: [Xen-devel] Oops when loading xen_platform_pci module in HVM domain on CS 11429
> I'm running 64-bit SLES 10 beta 10 (yes, we have to upgrade to the official
> release) on a machine with four Xeon 7020s. I got xen-unstable changeset
> 11429:66dd34f2f439 and built 64-bit uniprocessor kernels for dom0 and the
> HVM domain (a 2.6.16.13 baremetal kernel and its initrd). The HVM domain
> is also running SLES 10 beta 10. I followed the instructions to build the
> paravirtualized drivers for an HVM domain. When I run "modprobe
> xen_platform_pci" in the HVM domain I get a kernel oops. Here is the
> output in dmesg.

Did you install the xen, tools and dom0 kernel that came with that xen changeset? You'll need all three for PV drivers in HVM domains to work.

Ian
Steve Dobbelstein
2006-Sep-06 14:44 UTC
RE: [Xen-devel] Oops when loading xen_platform_pci module in HVM domain on CS 11429
"Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote on 09/05/2006 08:54:47 PM:

> Did you install the xen, tools and dom0 kernel that came with that xen
> changeset? You'll need all three for PV drivers in HVM domains to work.

Yes, I built and installed the Xen hypervisor, the dom0 kernel, and the tools from xen-unstable changeset 11429.

Steve D.
Steve Dobbelstein
2006-Sep-07 20:04 UTC
[Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
steved@us.ibm.com wrote on 09/05/2006 07:56:00 PM:

> It is oopsing on line 25 in unmodified_drivers/linux-2.6/platform-pci/features.c
> (which is a sym link to ../../linux-2.6-xen-sparse/drivers/xen/core/features.c):
> if (HYPERVISOR_xen_version(XENVER_get_features, &fi) < 0)
>
> Looks like something went wrong with the hypercall. I crawled
> through the code to see how the hypercall stubs are set up but got
> lost in the MSR stuff.

Digging into this further, I found that the problem is that the hypercall mechanism is trying to execute the hypercall instructions, which reside in the hypercall stubs page. However, the page table entry for that page has the _PAGE_NX (no-execute) bit set. (I'm running a 64-bit OS with PAE in the HVM domain.) The error code in the oops (0x11) indicates that the page fault was caused by the _PAGE_NX bit:

0x01 -> access rights violation
0x10 -> the fault was caused by an instruction fetch

I tried hacking some code to turn off the NX bit in the PTE for the hypercall stubs page, but I still get the oops. I'm thinking it's because the NX bit is set in the PMD.

I'm quite new to the paging mechanism, so I'm not sure how to fix this at the moment. I'll keep poking around. Thought I'd share my findings so far.

Steve D.
Keir Fraser
2006-Sep-07 21:11 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
On 7/9/06 21:04, "Steve Dobbelstein" <steved@us.ibm.com> wrote:

> I tried hacking some code to turn off the NX bit in the PTE for the
> hypercall stubs page, but I still get the oops. I'm thinking it's because
> the NX bit is set in the PMD.
>
> I'm quite new to the paging mechanism, so I'm not sure how to fix this at
> the moment. I'll keep poking around. thought I'd share my findings so
> far.

Page directory entries use permissions _PAGE_TABLE, which does not include _PAGE_NX. So clearing _PAGE_NX from the PTEs, using change_page_attr(PAGE_KERNEL_EXEC), should suffice.

 -- Keir
Steve Dobbelstein
2006-Sep-08 00:41 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote on 09/07/2006 04:11:37 PM:

> On 7/9/06 21:04, "Steve Dobbelstein" <steved@us.ibm.com> wrote:
>
> > I tried hacking some code to turn off the NX bit in the PTE for the
> > hypercall stubs page, but I still get the oops. I'm thinking it's because
> > the NX bit is set in the PMD.
> >
> > I'm quite new to the paging mechanism, so I'm not sure how to fix this at
> > the moment. I'll keep poking around. thought I'd share my findings so
> > far.
>
> Page directory entries use permissions _PAGE_TABLE, which does not include
> _PAGE_NX. So clearing _PAGE_NX from the PTEs, using
> change_page_attr(PAGE_KERNEL_EXEC), should suffice.
>
> -- Keir

Yes, it should suffice, but it doesn't. What happens is that the PV driver calls __get_free_page() and gets a page that is mapped by a large page, i.e. the _PAGE_PSE bit is set in its PTE. change_page_attr() sees that the pgprot is being changed for only one 4KB page and splits the large page. It creates a PMD for the 4KB pages that made up the large page, and that PMD is given the pgprot of the original large page, which in this case includes the _PAGE_NX bit. So while the new PTE for the 4KB hypercall stubs page has the _PAGE_NX bit turned off, the PMD above that PTE still has _PAGE_NX set, which effectively sets it for all the PTEs the PMD points to. :(

Thanks for any tips.

Steve D.
Steven Smith
2006-Sep-08 17:03 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
> > I tried hacking some code to turn off the NX bit in the PTE for the
> > hypercall stubs page, but I still get the oops. I'm thinking it's because
> > the NX bit is set in the PMD.
> >
> > I'm quite new to the paging mechanism, so I'm not sure how to fix this at
> > the moment. I'll keep poking around. thought I'd share my findings so
> > far.
> Page directory entries use permissions _PAGE_TABLE, which does not include
> _PAGE_NX. So clearing _PAGE_NX from the PTEs, using
> change_page_attr(PAGE_KERNEL_EXEC), should suffice.

The oops message is fairly clear that _PAGE_NX is set on the PMD, and I'd guess it probably got set from phys_pmd_init.

I think vmalloc_exec is probably the right answer here. I'll have a go at this over the weekend.

Steven.
Keir Fraser
2006-Sep-08 18:19 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
On 8/9/06 18:03, "Steven Smith" <sos22-xen@srcf.ucam.org> wrote:

>> Page directory entries use permissions _PAGE_TABLE, which does not include
>> _PAGE_NX. So clearing _PAGE_NX from the PTEs, using
>> change_page_attr(PAGE_KERNEL_EXEC), should suffice.
> The oops message is fairly clear that _PAGE_NX is set on the PMD, and
> I'd guess it probably got set from phys_pmd_init.
>
> I think vmalloc_exec is probably the right answer here. I'll have a
> go at this over the weekend.

I've had a go (c/s 11435). It's complicated by the fact that vmalloc_exec() and __PAGE_KERNEL_EXEC are not exported to modules.

 -- Keir
Steven Smith
2006-Sep-11 08:34 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
> >> Page directory entries use permissions _PAGE_TABLE, which does not include
> >> _PAGE_NX. So clearing _PAGE_NX from the PTEs, using
> >> change_page_attr(PAGE_KERNEL_EXEC), should suffice.
> > The oops message is fairly clear that _PAGE_NX is set on the PMD, and
> > I'd guess it probably got set from phys_pmd_init.
> >
> > I think vmalloc_exec is probably the right answer here. I'll have a
> > go at this over the weekend.
> I've had a go (c/s 11435). It's complicated by the fact that vmalloc_exec()
> and __PAGE_KERNEL_EXEC are not exported to modules.

Thanks.

Steven.
Steve Dobbelstein
2006-Sep-11 14:48 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote on 09/08/2006 01:19:33 PM:

> On 8/9/06 18:03, "Steven Smith" <sos22-xen@srcf.ucam.org> wrote:
>
> >> Page directory entries use permissions _PAGE_TABLE, which does not include
> >> _PAGE_NX. So clearing _PAGE_NX from the PTEs, using
> >> change_page_attr(PAGE_KERNEL_EXEC), should suffice.
> > The oops message is fairly clear that _PAGE_NX is set on the PMD, and
> > I'd guess it probably got set from phys_pmd_init.
> >
> > I think vmalloc_exec is probably the right answer here. I'll have a
> > go at this over the weekend.
>
> I've had a go (c/s 11435). It's complicated by the fact that vmalloc_exec()
> and __PAGE_KERNEL_EXEC are not exported to modules.
>
> -- Keir

Is this something that should be fixed in the mainline kernel? Basically, a change_page_attr() to make a page executable doesn't work. It seems to me that split_large_page() in arch/x86_64/mm/pageattr.c should be changed to not propagate the old pgprot to the new PMD (at least not the _PAGE_NX bit), but rather to propagate it into the new sub-PTEs that are created when the large PTE is split.

Thoughts?

Steve D.
Keir Fraser
2006-Sep-11 15:30 UTC
Re: [Xen-devel] Re: Oops when loading xen_platform_pci module in HVM domain on CS 11429
On 11/9/06 3:48 pm, "Steve Dobbelstein" <steved@us.ibm.com> wrote:

> Is this something that should be fixed in the mainline kernel? Basically,
> a change_page_attr() to make a page executable doesn't work. It seems to
> me that split_large_page() in arch/x86_64/mm/pageattr.c should be changed
> to not propagate the old pgprot to the new PMD (at least not the _PAGE_NX
> bit) but rather propagate it into the new sub-PTEs that are created when
> the large PTE is split.
>
> Thoughts?

Even just access to vmalloc_exec() from modules would be nice. I really had to hack around the fact that vmalloc_exec() and even PAGE_KERNEL_EXEC are not exported to modules. It almost seems deliberate, except that doesn't really make sense since it's quite easy to work/hack around.

 -- Keir