Hello:

I work in the same group as Dave Quigley and George Coker. I'm working
on a graphical switcher application for Xen which uses the DirectFB
library on top of Linux VESA fbdev. This runs in dom0 at the moment.
I'm using the latest xen/next pvops dom0 and xen-unstable hypervisor
compiled from source, with vga=ask so I can boot dom0 in a graphical mode.

The problem I'm having is illustrated by the attached test program that
displays a green background with a white square for 10 seconds when run
as root. It doesn't work on the xen/next / xen-unstable combo. The
program runs and exits normally but all I see is a black screen.

The program *does* work on xen/next running on the bare metal. It also
works using the xen-unstable hypervisor with an older dom0, the 2.6.31.4
kernel with Novell patches. So I think the issue is in the xen/next
kernel. I've run the test program on different machines and observed
the same behavior.

The xen-unstable / 2.6.31.4 dom0 combination works and I'm using that
for the moment but I'd like to be using pvops. I would be happy to run
more tests / provide more data if needed.

--
Eamon Walsh
National Security Agency

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Mar-12 21:42 UTC
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0
On 03/12/2010 12:24 PM, Eamon Walsh wrote:
> I work in the same group as Dave Quigley and George Coker. I'm working
> on a graphical switcher application for Xen which uses the DirectFB
> library on top of Linux VESA fbdev. This runs in dom0 at the moment.
> I'm using the latest xen/next pvops dom0 and xen-unstable hypervisor
> compiled from source, with vga=ask so I can boot dom0 in a graphical mode.
>
> The problem I'm having is illustrated by the attached test program that
> displays a green background with a white square for 10 seconds when run
> as root. It doesn't work on the xen/next / xen-unstable combo. The
> program runs and exits normally but all I see is a black screen.
>
> The program *does* work on xen/next running on the bare metal. It also
> works using the xen-unstable hypervisor with an older dom0, the 2.6.31.4
> kernel with Novell patches. So I think the issue is in the xen/next
> kernel. I've run the test program on different machines and observed
> the same behavior.
>
> The xen-unstable / 2.6.31.4 dom0 combination works and I'm using that
> for the moment but I'd like to be using pvops. I would be happy to run
> more tests / provide more data if needed.
>

What's the hardware? Do any messages appear either on the dom0 console
or the Xen console? Does booting with a vga console help?

Thanks,
    J
On 03/12/2010 04:42 PM, Jeremy Fitzhardinge wrote:
> On 03/12/2010 12:24 PM, Eamon Walsh wrote:
>
>> I work in the same group as Dave Quigley and George Coker. I'm working
>> on a graphical switcher application for Xen which uses the DirectFB
>> library on top of Linux VESA fbdev. This runs in dom0 at the moment.
>> I'm using the latest xen/next pvops dom0 and xen-unstable hypervisor
>> compiled from source, with vga=ask so I can boot dom0 in a graphical mode.
>>
>> The problem I'm having is illustrated by the attached test program that
>> displays a green background with a white square for 10 seconds when run
>> as root. It doesn't work on the xen/next / xen-unstable combo. The
>> program runs and exits normally but all I see is a black screen.
>>
>> The program *does* work on xen/next running on the bare metal. It also
>> works using the xen-unstable hypervisor with an older dom0, the 2.6.31.4
>> kernel with Novell patches. So I think the issue is in the xen/next
>> kernel. I've run the test program on different machines and observed
>> the same behavior.
>>
>> The xen-unstable / 2.6.31.4 dom0 combination works and I'm using that
>> for the moment but I'd like to be using pvops. I would be happy to run
>> more tests / provide more data if needed.
>>
>>
> What's the hardware? Do any messages appear either on the dom0 console
> or the Xen console? Does booting with a vga console help?
>

The hardware is a Dell Latitude E6500 with nvidia graphics. I also see
the issue on a Dell Optiplex 960 desktop with Intel graphics. No
obvious messages on the consoles. I am booting in VGA mode.

I have narrowed the problem down: it has something to do with mmap of
/dev/fb0 not syncing. The attached C code mmaps /dev/fb0 and writes
some random bits. On a configuration that does work (2.6.31.4 on
4.0-rc6, or xen/next on bare metal) the random bits are visible on the
screen. With xen/next on 4.0-rc6, nothing is visible. Calling msync()
before the sleep has no effect. Also, using write() on /dev/fb0 always
works so it appears to be mmap related.

--
Eamon Walsh
National Security Agency
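For readers without the attachment, the comparison described above (painting through mmap() versus write() on /dev/fb0) can be sketched roughly as follows. This is not the attached program. The names fill_pattern, fb_size, paint_via_mmap and paint_via_write are made up for illustration, and the sketch assumes a Linux fbdev device that answers the FBIOGET_FSCREENINFO ioctl.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fill len bytes with a repeating 32-bit pattern, least significant byte first. */
void fill_pattern(uint8_t *buf, size_t len, uint32_t pattern)
{
	assert(buf != NULL);
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(pattern >> (8 * (i % 4)));
}

/* Ask the driver how many bytes of framebuffer memory it exposes (0 on error). */
static size_t fb_size(int fd)
{
	struct fb_fix_screeninfo fix;

	if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0)
		return 0;
	return fix.smem_len;
}

/* Paint through a shared mapping; returns 0 on success, -1 on failure. */
int paint_via_mmap(const char *path, uint32_t pattern, unsigned int secs)
{
	int fd = open(path, O_RDWR);
	size_t len;
	uint8_t *fb;

	if (fd < 0)
		return -1;
	len = fb_size(fd);
	fb = len ? mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
		 : MAP_FAILED;
	if (fb == MAP_FAILED) {
		close(fd);
		return -1;
	}
	printf("Mapped %s at %p\n", path, (void *)fb);
	fill_pattern(fb, len, pattern);
	msync(fb, len, MS_SYNC);	/* reported above to make no difference */
	sleep(secs);
	munmap(fb, len);
	close(fd);
	return 0;
}

/* Paint one 4 KiB chunk at offset 0 with write(); returns bytes written or -1. */
long paint_via_write(const char *path, uint32_t pattern, unsigned int secs)
{
	uint8_t chunk[4096];
	int fd = open(path, O_WRONLY);
	long n;

	if (fd < 0)
		return -1;
	fill_pattern(chunk, sizeof(chunk), pattern);
	n = (long)write(fd, chunk, sizeof(chunk));
	sleep(secs);
	close(fd);
	return n;
}
```

Running paint_via_mmap("/dev/fb0", 0x00ffffff, 10) and then paint_via_write("/dev/fb0", 0x00ffffff, 10) as root would be one way to check whether only the mmap path fails to reach the screen.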
Jeremy Fitzhardinge
2010-Mar-13 00:51 UTC
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0
On 03/12/2010 04:44 PM, Eamon Walsh wrote:
> I have narrowed the problem down: it has something to do with mmap of
> /dev/fb0 not syncing. The attached C code mmaps /dev/fb0 and writes
> some random bits. On a configuration that does work (2.6.31.4 on
> 4.0-rc6, or xen/next on bare metal) the random bits are visible on the
> screen. With xen/next on 4.0-rc6, nothing is visible. Calling msync()
> before the sleep has no effect. Also, using write() on /dev/fb0 always
> works so it appears to be mmap related.
>

Yes. I suspect there's a missing VM_IO in there, and so the mmap is
mapping the wrong pages (if you're lucky you might be able to crash the
machine to see something juicy).

    J
Konrad Rzeszutek Wilk
2010-Mar-16 00:46 UTC
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0
On Fri, Mar 12, 2010 at 04:51:30PM -0800, Jeremy Fitzhardinge wrote:
> On 03/12/2010 04:44 PM, Eamon Walsh wrote:
>> I have narrowed the problem down: it has something to do with mmap of
>> /dev/fb0 not syncing. The attached C code mmaps /dev/fb0 and writes

Is the machine spinning? Meaning if you start writing to the mmap
region the machine looks to be stuck?

>> some random bits. On a configuration that does work (2.6.31.4 on
>> 4.0-rc6, or xen/next on bare metal) the random bits are visible on the
>> screen. With xen/next on 4.0-rc6, nothing is visible. Calling msync()
>> before the sleep has no effect. Also, using write() on /dev/fb0 always
>> works so it appears to be mmap related.
>>
>
> Yes. I suspect there's a missing VM_IO in there, and so the mmap is
> mapping the wrong pages (if you're lucky you might be able to crash the
> machine to see something juicy).

<scratches his head>

The nvidia framebuffer (drivers/video/nvidia/nvidia.c) does this:

1369         info->screen_base = ioremap(nvidiafb_fix.smem_start, par->FbMapSize);

where the start of memory is obtained via
1328         nvidiafb_fix.smem_start = pci_resource_start(pd, 1);

I believe the 'ioremap' works pretty well, otherwise we would have other
PCI devices having trouble.

... and elsewhere (fbmem.c):

1321 static int
1322 fb_mmap(struct file *file, struct vm_area_struct * vma)
..
1345         start = info->fix.smem_start;
..
1363         /* This is an IO map - tell maydump to skip this VMA */
1364         vma->vm_flags |= VM_IO | VM_RESERVED;

.. it _does_ set the VM_IO, but that is OK since the memory is actually
backed by the PCI device.

Eamon, can you provide more detailed serial output? That could shed some
light on this. Another thing you could try, to make sure you are actually
hitting the right mmap, is to instrument fb_mmap. I would recommend
printing out vma->vm_start, vma->vm_end, and start to see if they look
reasonable.
On 03/15/2010 08:46 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 12, 2010 at 04:51:30PM -0800, Jeremy Fitzhardinge wrote:
>
>> On 03/12/2010 04:44 PM, Eamon Walsh wrote:
>>
>>> I have narrowed the problem down: it has something to do with mmap of
>>> /dev/fb0 not syncing. The attached C code mmaps /dev/fb0 and writes
>>>
> Is the machine spinning? Meaning if you start writing to the mmap
> region the machine looks to be stuck?
>

No, the machine keeps running just fine. However, I tried reading from
the mmap region and it is definitely not framebuffer memory: on one of
my machines it is filled with some kind of binary data which is not
there if I just read() from /dev/fb0.

>>> some random bits. On a configuration that does work (2.6.31.4 on
>>> 4.0-rc6, or xen/next on bare metal) the random bits are visible on the
>>> screen. With xen/next on 4.0-rc6, nothing is visible. Calling msync()
>>> before the sleep has no effect. Also, using write() on /dev/fb0 always
>>> works so it appears to be mmap related.
>>>
>>>
>> Yes. I suspect there's a missing VM_IO in there, and so the mmap is
>> mapping the wrong pages (if you're lucky you might be able to crash the
>> machine to see something juicy).
>>
> <scratches his head>
>
> The nvidia framebuffer (drivers/video/nvidia/nvidia.c) does this:
>
> 1369         info->screen_base = ioremap(nvidiafb_fix.smem_start, par->FbMapSize);
>
> where the start of memory is obtained via
> 1328         nvidiafb_fix.smem_start = pci_resource_start(pd, 1);
>
> I believe the 'ioremap' works pretty well, otherwise we would have other
> PCI devices having trouble.
>
> ... and elsewhere (fbmem.c):
>
> 1321 static int
> 1322 fb_mmap(struct file *file, struct vm_area_struct * vma)
> ..
> 1345         start = info->fix.smem_start;
> ..
> 1363         /* This is an IO map - tell maydump to skip this VMA */
> 1364         vma->vm_flags |= VM_IO | VM_RESERVED;
>
> .. it _does_ set the VM_IO, but that is OK since the memory is actually
> backed by the PCI device.
>
> Eamon, can you provide more detailed serial output? That could shed some
> light on this. Another thing you could try, to make sure you are actually
> hitting the right mmap, is to instrument fb_mmap. I would recommend
> printing out vma->vm_start, vma->vm_end, and start to see if they look
> reasonable.
>
>
>

The serial output is attached.

The patch I used to instrument the fb_mmap function and the output it
produced for a couple of runs are also attached.

And I tossed in my kernel .config for good measure.

What else is needed?

--
Eamon Walsh
National Security Agency
Konrad Rzeszutek Wilk
2010-Mar-16 22:19 UTC
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0
> The serial output is attached.
>
> The patch I used to instrument the fb_mmap function and the output it
> produced for a couple of runs are also attached.
>
> And I tossed in my kernel .config for good measure.
>
> What else is needed?

It looks like I confused your email with another person's. You don't seem
to run the nvidia fb, but rather the radeon one.

.. snip ..
> Non-volatile memory driver v1.3
> Linux agpgart interface v0.103
> agpgart-intel 0000:00:00.0: Intel Q45/Q43 Chipset
> agpgart-intel 0000:00:00.0: detected 32764K stolen memory
> agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
> tpm_tis 00:08: 1.2 TPM (device-id 0x4A10, rev-id 78)
> [drm] Initialized drm 1.1.0 20060810
> [drm] radeon defaulting to kernel modesetting.
> [drm] radeon kernel modesetting enabled.
> xen_allocate_pirq: returning irq 16 for gsi 16
> Already setup the GSI :16
> i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [drm] set up 31M of stolen space
> [drm] TMDS-8: set mode 1280x1024 17
> Console: switching to colour frame buffer device 160x64
> fb0: inteldrmfb frame buffer device
> registered panic notifier
> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

You look to have an i915 framebuffer on your box.

I *think* that the i915 is not using KMS and the TTM stuff, so the
patch that Arvind posted would probably not help you.
http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html

So, let's boot your kernel with these command line parameters to get more
data: debug initcall_debug drm.debug=255

That should spew out some more details.

Next thing I would suggest is to instrument i915_gem_fault. Attached is
a patch that does it (though it is not compile tested nor actually
booted so it might need some hand crafting - sorry).

And the other thing is to read through the steps that Arvind took in the
e-mail thread titled: "Nouveau on dom0". It covers the gamut of things
to troubleshoot this.

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fba37e9..cfcaafd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -33,6 +33,8 @@
 #include "intel_drv.h"
 #include <linux/swap.h>
 #include <linux/pci.h>
+#include <xen/xen.h>
+#include <asm/xen/page.h>
 
 #define I915_GEM_GPU_DOMAINS	(~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))
 
@@ -1145,6 +1147,143 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+void print_pte(struct vm_area_struct *vma, char *what, struct page *page, unsigned int pfn, unsigned long address)
+{
+	static const char * const level_name[] =
+		{ "NONE", "4K", "2M", "1G", "NUM" };
+	unsigned long addr = 0;
+	pte_t *pte = NULL;
+	pteval_t val = (pteval_t)0;
+	unsigned int level = 0;
+	unsigned offset;
+	unsigned long phys;
+	pgprotval_t prot;
+	char buf[90];
+	char *str;
+
+	str = buf;
+	// Figure out if the address is pagetable.
+	if (address == 0 && !page && pfn>0) {
+		page = pfn_to_page(pfn);
+	}
+	if (address == 0 && page)
+		addr = (u64)page_address(page);
+
+	if (address && !page)
+		addr = address;
+
+	if (address && page) {
+		addr = (u64)page_address(page);
+		if (address != addr) {
+			if (addr == 0) {
+				str += sprintf(str, "addr(page)==0");
+				addr = address;
+			}
+		}
+	}
+
+	if (pfn != 0 && page) {
+		if (pfn != page_to_pfn(page)) // Gosh!?
+			str += sprintf(str, "pfn!=pfn(page)");
+	}
+	if (pfn != 0 && addr != 0) {
+		if (pfn != virt_to_pfn(addr))
+			str += sprintf(str,"pfn(addr)!=pfn");
+	}
+	pte = lookup_address(addr, &level);
+	if (!pte) {
+		str += sprintf(str,"!pte(addr)");
+		goto print;
+	}
+	offset = addr & ~PAGE_MASK;
+
+	if (xen_domain()) {
+		phys = (pte_mfn(*pte) << PAGE_SHIFT) + offset;
+		val = pte_val_ma(*pte);
+
+		if (pfn > 0) {
+			if (pte_mfn(*pte) == pfn) {
+				if (vma->vm_flags & VM_IO)
+					str += sprintf(str,"PHYS");
+				else
+					str += sprintf(str,"BUG: VM_IO not set!");
+			}
+			/* It is a pseudo page ... and the VM_IO flag is set */
+			if (pte_mfn(*pte) != pfn) {
+				if (vma->vm_flags & VM_IO)
+					str += sprintf(str,"BUG: VM_IO flag set!");
+				else
+					str += sprintf(str, "PSEUDO");
+			}
+		} else {
+			str += sprintf(str,"pfn==0");
+		}
+
+	} else {
+		phys = (pte_pfn(*pte) << PAGE_SHIFT) + offset;
+		val = pte_val(*pte);
+	}
+	prot = pgprot_val(pte_pgprot(*pte));
+
+	if (!prot)
+		str += sprintf(str, "Not present.");
+	else {
+		if (prot & _PAGE_USER)
+			str += sprintf(str, "USR ");
+		else
+			str += sprintf(str, " ");
+		if (prot & _PAGE_RW)
+			str += sprintf(str, "RW ");
+		else
+			str += sprintf(str, "ro ");
+		if (prot & _PAGE_PWT)
+			str += sprintf(str, "PWT ");
+		else
+			str += sprintf(str, " ");
+		if (prot & _PAGE_PCD)
+			str += sprintf(str, "PCD ");
+		else
+			str += sprintf(str, " ");
+
+		/* Bit 9 has a different meaning on level 3 vs 4 */
+		if (level <= 3) {
+			if (prot & _PAGE_PSE)
+				str += sprintf(str, "PSE ");
+			else
+				str += sprintf(str, " ");
+		} else {
+			if (prot & _PAGE_PAT)
+				str += sprintf(str, "pat ");
+			else
+				str += sprintf(str, " ");
+		}
+		if (prot & _PAGE_GLOBAL)
+			str += sprintf(str, "GLB ");
+		else
+			str += sprintf(str, " ");
+		if (prot & _PAGE_NX)
+			str += sprintf(str, "NX ");
+		else
+			str += sprintf(str, "x ");
+#ifdef _PAGE_IOMEM
+		if (prot & _PAGE_IOMEM)
+			str += sprintf(str, "IO ");
+		else
+			str += sprintf(str, " ");
+#endif
+
+	}
+
+print:
+	printk(KERN_INFO "[%16s]PFN: 0x%lx PTE: 0x%lx (val:%lx): [%s] [%s]\n",
+		what,
+		(unsigned long)pfn,
+		(pte) ? (unsigned long)(pte->pte) : 0,
+		(unsigned long)val,
+		buf,
+		level_name[level]);
+}
+
 /**
  * i915_gem_fault - fault a page into the GTT
  * vma: VMA in question
@@ -1200,8 +1339,10 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pfn = ((dev->agp->base + obj_priv->gtt_offset) >> PAGE_SHIFT) +
 		page_offset;
 
+	print_pte(vma,"before", NULL, pfn, 0);
 	/* Finally, remap it using the new GTT offset */
 	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
+	print_pte(vma, "after", NULL, pfn, (unsigned long) vmf->virtual_address);
 unlock:
 	mutex_unlock(&dev->struct_mutex);
 
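The flag decoding that print_pte does in the kernel can also be tried in user space on raw entry values copied from a log. The following is only a sketch under stated assumptions: the bit positions are the architectural x86 page-table flag bits, while decode_pte_flags and the PTE_* macro names are made up here and are not kernel identifiers. Each test uses a bitwise AND against a mask, since a logical AND such as `flags && MASK` is true for any nonzero flags value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural x86 page-table entry flag bits (names are illustrative). */
#define PTE_P	(1ULL << 0)	/* present */
#define PTE_RW	(1ULL << 1)	/* writable */
#define PTE_US	(1ULL << 2)	/* user accessible */
#define PTE_PWT	(1ULL << 3)	/* write-through */
#define PTE_PCD	(1ULL << 4)	/* cache disable */
#define PTE_A	(1ULL << 5)	/* accessed */
#define PTE_D	(1ULL << 6)	/* dirty */
#define PTE_G	(1ULL << 8)	/* global */
#define PTE_NX	(1ULL << 63)	/* no-execute */

struct pte_flag {
	uint64_t bit;
	const char *name;
};

static const struct pte_flag pte_flags[] = {
	{ PTE_P, "P" },     { PTE_RW, "RW" },   { PTE_US, "US" },
	{ PTE_PWT, "PWT" }, { PTE_PCD, "PCD" }, { PTE_A, "A" },
	{ PTE_D, "D" },     { PTE_G, "G" },     { PTE_NX, "NX" },
};

/*
 * Write the names of all set flags into out, separated by single spaces.
 * snprintf() bounds every append, so out can never overflow; decoding
 * simply stops early if the buffer is too small.
 */
const char *decode_pte_flags(uint64_t pte, char *out, size_t outlen)
{
	size_t used = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (size_t i = 0; i < sizeof(pte_flags) / sizeof(pte_flags[0]); i++) {
		int n;

		if (!(pte & pte_flags[i].bit))
			continue;
		n = snprintf(out + used, outlen - used, "%s%s",
			     used ? " " : "", pte_flags[i].name);
		if (n < 0 || (size_t)n >= outlen - used)
			break;	/* output truncated */
		used += (size_t)n;
	}
	return out;
}
```

Bounding each append with snprintf is a deliberate choice for this sketch: a fixed buffer filled with chained sprintf calls relies on the worst-case string length being counted correctly by hand.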
On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>> The serial output is attached.
>>
>> The patch I used to instrument the fb_mmap function and the output it
>> produced for a couple of runs are also attached.
>>
>> And I tossed in my kernel .config for good measure.
>>
>> What else is needed?
>>
> It looks like I confused your email with another person's. You don't seem
> to run the nvidia fb, but rather the radeon one.
>

The current machine I am using has Intel integrated graphics but I can
also reproduce the problem on a laptop with nvidia graphics (it runs the
vesafb framebuffer). After I send this mail I'll recompile on that
machine and see what happens.

I recompiled Xen and pvops/next today. I included your instrumentation
patch for i915_gem_fault, but it doesn't trigger. No instrumentation
messages appear. I even put a print statement at the top of the function
but it never prints.

I have attached the serial console output and dmesg output. The
initcall and drm debug stuff is present.

Also, I get something new when I run the test program. It prints out:

# ./silly
Mapped /dev/fb0 at 0x7f3237175000
Killed

Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
 kernel:Bad pagetable: 000f [#1] SMP

And I get the following on the serial console (the deadbeef stuff is the
buffer I just wrote into the mmap):

moss-flapper login: (XEN) d0:v1: reserved bit in page table (ec=000F)
(XEN) Pagetable walk from 00007f3237175000:
(XEN)  L4[0x0fe] = 000000001154a067 00000000001deaec
(XEN)  L3[0x0c8] = 000000001492f067 00000000001db6d1
(XEN)  L2[0x1b8] = 0000000015bc7067 00000000001da569
(XEN)  L1[0x175] = fffff7fffffff22f ffffffffffffffff
(XEN) ----[ Xen-4.0.0-rc8-pre  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e033:[<0000003002e8305b>]
(XEN) RFLAGS: 0000000000010206   EM: 0   CONTEXT: pv guest
(XEN) rax: 00007f3237175000   rbx: 0000000000000000   rcx: 0000000000000200
(XEN) rdx: 0000000000001000   rsi: 00007fff42cc42e0   rdi: 00007f3237175000
(XEN) rbp: 00007fff42cc52f0   rsp: 00007fff42cc42c8   r8:  0000000000000001
(XEN) r9:  0000000000000001   r10: 00000000ffffffff   r11: 0000000000001000
(XEN) r12: 00000000004005d0   r13: 00007fff42cc53d0   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 00000000116da000   cr2: 00007f3237175000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=00007fff42cc42c8:
(XEN)    00000000004007e0 cafeababdeadbeef 0000000000000000 cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
(XEN)    cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef cafeababdeadbeef
silly: Corrupted page table at address 7f3237175000
PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
Bad pagetable: 000f [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss export]
Pid: 1775, comm: silly Not tainted 2.6.32-pvops-dom0 #23 OptiPlex 960
RIP: e033:[<0000003002e8305b>]  [<0000003002e8305b>] 0x3002e8305b
RSP: e02b:00007fff42cc42c8  EFLAGS: 00010206
RAX: 00007f3237175000 RBX: 0000000000000000 RCX: 0000000000000200
RDX: 0000000000001000 RSI: 00007fff42cc42e0 RDI: 00007f3237175000
RBP: 00007fff42cc52f0 R08: 0000000000000001 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000001000 R12: 00000000004005d0
R13: 00007fff42cc53d0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f3237162700(0000) GS:ffff880028054000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f3237175000 CR3: 00000001df03a000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process silly (pid: 1775, threadinfo ffff8801df02a000, task ffff8801db68ae60)
RIP  [<0000003002e8305b>] 0x3002e8305b
 RSP <00007fff42cc42c8>
---[ end trace e07c6ddec4199123 ]---

--
Eamon Walsh
National Security Agency
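The numbers in the report above can be cross-checked by hand. The sketch below, assuming 4 KiB pages with 4-level x86-64 paging, splits the faulting address into the table indices shown in the Xen pagetable walk, tests the reserved-bit flag in the page-fault error code, and checks an entry for address bits set above the physical address width. The helper names are made up for illustration, and the 36-bit width used in the example is a placeholder, since the real MAXPHYADDR of the machine is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Indices into the four x86-64 page-table levels for one virtual address. */
struct pt_index {
	unsigned int l4, l3, l2, l1, offset;
};

/* Split a virtual address: 9 index bits per level, 12 offset bits. */
struct pt_index split_vaddr(uint64_t va)
{
	struct pt_index ix;

	ix.l4 = (unsigned int)((va >> 39) & 0x1ff);
	ix.l3 = (unsigned int)((va >> 30) & 0x1ff);
	ix.l2 = (unsigned int)((va >> 21) & 0x1ff);
	ix.l1 = (unsigned int)((va >> 12) & 0x1ff);
	ix.offset = (unsigned int)(va & 0xfff);
	return ix;
}

/* Architectural page-fault error code bits. */
#define PF_PRESENT	0x01u	/* fault on a present page */
#define PF_WRITE	0x02u	/* access was a write */
#define PF_USER		0x04u	/* access came from user mode */
#define PF_RSVD		0x08u	/* reserved bit set in a paging entry */
#define PF_INSTR	0x10u	/* instruction fetch */

int pf_is_reserved_bit_fault(unsigned int ec)
{
	return (ec & PF_RSVD) != 0;
}

/*
 * A present entry must have bits 51:maxphyaddr clear. maxphyaddr is the
 * CPU's physical address width; callers have to supply the real value.
 */
int pte_has_reserved_bits(uint64_t pte, unsigned int maxphyaddr)
{
	uint64_t frame_field = ((1ULL << 52) - 1) & ~0xfffULL;	/* bits 51:12 */
	uint64_t implemented;

	assert(maxphyaddr >= 32 && maxphyaddr <= 52);
	implemented = (1ULL << maxphyaddr) - 1;
	return (pte & 1) && ((pte & frame_field & ~implemented) != 0);
}
```

With these helpers, split_vaddr(0x00007f3237175000) yields the indices 0x0fe, 0x0c8, 0x1b8 and 0x175 from the walk, error code 000f has the reserved-bit flag set along with present, write and user, and the L1 value fffff7fffffff22f has high address bits set while the L2 value 0000000015bc7067 does not.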
On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh <ewalsh@tycho.nsa.gov> wrote:
> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:

< --- snip --- >

> I have attached the serial console output and dmesg output. The
> initcall and drm debug stuff is present.
>
> Also, I get something new when I run the test program. It prints out:
>
> # ./silly
> Mapped /dev/fb0 at 0x7f3237175000
> Killed
>
> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
> kernel:Bad pagetable: 000f [#1] SMP
>

< --- snip --- >

> silly: Corrupted page table at address 7f3237175000
> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
> Bad pagetable: 000f [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> CPU 1
> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss export]

< --- snip --- >

>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>
>> You look to have a i915 framebuffer on your box.
>>
>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>> patch that Arvind posted would probably not help you.
>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>
>> So, lets boot your kernel with these command line parameters to get more
>> data: debug initcall_debug drm.debug=255

< --- snip --- >

>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>> to troubleshoot this.

This is related and most probably due to the same bit. xf86-video-fbdev works
on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
Have upgraded the whole chain to tip except xen, which is 3.4.3rc3.
Here is the syslog trace:

kernel: ------------[ cut here ]------------
kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
kernel: Hardware name: System Product Name
kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
kernel: Call Trace:
kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel: [<ffffffff8103ce54>] ? warn_slowpath_common+0x77/0xa3
kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
kernel: [<ffffffff8100c436>] ? xen_leave_lazy_mmu+0x25/0x43
kernel: [<ffffffff81090c49>] ? copy_page_range+0x76/0x7f8
kernel: [<ffffffff8100ddc9>] ? xen_force_evtchn_callback+0x9/0xa
kernel: [<ffffffff8100e572>] ? check_events+0x12/0x20
kernel: [<ffffffff8100e55f>] ? xen_restore_fl_direct_end+0x0/0x1
kernel: [<ffffffff8103b1f2>] ? dup_mm+0x276/0x409
kernel: [<ffffffff8103bd82>] ? copy_process+0x9c8/0x10ff
kernel: [<ffffffff8103c5ff>] ? do_fork+0x146/0x2c0
kernel: [<ffffffff810110a3>] ? stub_clone+0x13/0x20
kernel: [<ffffffff81010d82>] ? system_call_fastpath+0x16/0x1b
kernel: ---[ end trace c58bf004d15b0c42 ]---

Xorg.log ends with the same message as originally seen when trying
accelerated nouveau, with the misleading
XKB: Failed to compile keymap

fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
which does a mmap as in silly.c. As far as X is concerned, everything
is fine, but there is obviously a page-fault problem. Will have to set up
debug options and trace :-(

The 'corrupted page table' syndrome is also present in the accelerated
nouveau with AGP cards - so it may be linked to this problem. At least
this problem can be repeated on many platforms :-)
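
For readers without the attachment, here is a rough sketch of what a test like silly.c does: map the framebuffer device shared and read/write, then write pixels through the mapping. It is not the original silly.c, which was only attached to the first mail. The names `pixel_offset`, `draw_test_pattern` and `map_framebuffer` are made up for this illustration, and the 32 bpp B,G,R,X byte order is an assumption that will not hold for every VESA mode.

```python
import mmap
import os

BYTES_PER_PIXEL = 4  # assumption: 32 bpp mode


def pixel_offset(x, y, stride):
    """Byte offset of pixel (x, y) in a linear framebuffer with `stride` bytes per row."""
    return y * stride + x * BYTES_PER_PIXEL


def draw_test_pattern(buf, width, height, stride):
    """Fill `buf` with green and draw a centred white square.

    `buf` is any writable buffer (an mmap object or a bytearray).
    Assumes bytes are stored in B, G, R, X order.
    """
    green = bytes([0x00, 0xFF, 0x00, 0x00])
    white = bytes([0xFF, 0xFF, 0xFF, 0x00])
    side = min(width, height) // 4
    x0, y0 = (width - side) // 2, (height - side) // 2
    for y in range(height):
        row = bytearray(green * width)
        if y0 <= y < y0 + side:
            # Overwrite the square's span in this row with white pixels.
            row[x0 * BYTES_PER_PIXEL:(x0 + side) * BYTES_PER_PIXEL] = white * side
        start = pixel_offset(0, y, stride)
        buf[start:start + len(row)] = row  # row padding beyond `width` is left alone


def map_framebuffer(path, length):
    """Map `length` bytes of a framebuffer device, shared and read/write."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping keeps its own duplicate of the descriptor
```

On a real system one would call `map_framebuffer("/dev/fb0", stride * height)` as root and pass the result to `draw_test_pattern`, with width, height and stride taken from the fbdev sysfs attributes under /sys/class/graphics/fb0/ or from the FBIOGET_VSCREENINFO and FBIOGET_FSCREENINFO ioctls. On the failing dom0 it is the write through the mapping that would be expected to trigger the bad-pagetable kill shown above.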
Jeremy Fitzhardinge
2010-Mar-27 21:52 UTC
Re: [Xen-devel] Fbdev graphics broken in xen/next dom0
On 03/27/2010 02:14 AM, Arvind R wrote:
> On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh<ewalsh@tycho.nsa.gov> wrote:
>
>> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>>
> < --- snip --->
>
>> I have attached the serial console output and dmesg output. The
>> initcall and drm debug stuff is present.
>>
>> Also, I get something new when I run the test program. It prints out:
>>
>> # ./silly
>> Mapped /dev/fb0 at 0x7f3237175000
>> Killed
>>
>> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
>> kernel:Bad pagetable: 000f [#1] SMP
>>
> < --- snip --->
>
>> silly: Corrupted page table at address 7f3237175000
>> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
>> Bad pagetable: 000f [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>> CPU 1
>> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss export]
>>
> < --- snip --->
>
>>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>
>>> You look to have a i915 framebuffer on your box.
>>>
>>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>>> patch that Arvind posted would probably not help you.
>>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>>
>>> So, lets boot your kernel with these command line parameters to get more
>>> data: debug initcall_debug drm.debug=255
>>>
> < --- snip --->
>
>>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>>> to troubleshoot this.
>>>
> This is related and most probably due to the same bit. xf86-video-fbdev works
> on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
> Have upgraded whole chain to tip except xen which is 3.4.3rc3
> Here is the syslog trace:
> kernel: ------------[ cut here ]------------
> kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
> kernel: Hardware name: System Product Name
> kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
> drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
> ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
> kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
> kernel: Call Trace:
> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [<ffffffff8103ce54>] ? warn_slowpath_common+0x77/0xa3
> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [<ffffffff8100c436>] ? xen_leave_lazy_mmu+0x25/0x43
> kernel: [<ffffffff81090c49>] ? copy_page_range+0x76/0x7f8
> kernel: [<ffffffff8100ddc9>] ? xen_force_evtchn_callback+0x9/0xa
> kernel: [<ffffffff8100e572>] ? check_events+0x12/0x20
> kernel: [<ffffffff8100e55f>] ? xen_restore_fl_direct_end+0x0/0x1
> kernel: [<ffffffff8103b1f2>] ? dup_mm+0x276/0x409
> kernel: [<ffffffff8103bd82>] ? copy_process+0x9c8/0x10ff
> kernel: [<ffffffff8103c5ff>] ? do_fork+0x146/0x2c0
> kernel: [<ffffffff810110a3>] ? stub_clone+0x13/0x20
> kernel: [<ffffffff81010d82>] ? system_call_fastpath+0x16/0x1b
> kernel: ---[ end trace c58bf004d15b0c42 ]---
>
> Xorg.log ends with the same message as originally with trying
> accelerated nouveau with misleading
> XKB: Failed to compile keymap
>
> fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
> which does a mmap as in silly.c. As far as X is concerned, everything
> is fine, but there is obviously a page-fault problem. Will have to setup
> debug options and trace :-(
>
> The 'corrupted page table' syndrome is also present in the accelerated
> nouveau with AGP cards - so it may be linked to this problem. At least
> this problem can be repeated on many platforms :-)
>

The "corrupt pagetable" comes from the pte having invalid reserved bits set
in it. I think the failure path is this:

The bad bits get set because someone is doing a pfn->mfn conversion on a
page which is already an mfn, and doesn't have a valid pfn->mfn mapping, and
the result of the conversion is either 0xff... or 0x7f... (I forget right
now). But either way, a whole lot of bits get set, but nothing useful. I'm
not quite sure why Xen isn't complaining about this at set-pte time, but
perhaps it looks vaguely valid to it (perhaps it sees the invalid flags,
knows the pte can't be used to access anything, and allows it to be set?).

But this fault is happening because usermode gets a tlb miss, and the CPU
finds a pte with reserved bits set, and raises the fault.

I'm not sure about the mm/pat.c warning though. I had a quick look at that
code, but it wasn't obvious to me what's going on there. Something about
handling the IO mapping during a fork(). Not sure if it's related or not.

    J
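
The reserved-bit reading can be checked against the numbers in Eamon's oops with a little bit arithmetic. The sketch below decodes the page-fault error code 000f and the PTE value fffffffffffff22f using the architectural x86-64 bit layout. The helper names are invented for this illustration, and the MAXPHYADDR of 36 is only an assumed value; the real one is reported by CPUID on the machine in question.

```python
# x86-64 page-fault error code bits (architectural).
PF_ERROR_BITS = {0: "present", 1: "write", 2: "user",
                 3: "reserved-bit", 4: "instruction-fetch"}

# Low PTE flag bits (architectural names); bits 9-11 are left to software.
PTE_FLAG_BITS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD",
                 5: "A", 6: "D", 7: "PAT", 8: "G"}


def decode_bits(value, names):
    """Return the names of the bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


def pte_frame(pte):
    """Frame number field of a 4 KiB PTE: bits 12..51."""
    return (pte >> 12) & ((1 << 40) - 1)


def reserved_bits_set(pte, maxphyaddr):
    """Address bits from MAXPHYADDR up to bit 51 must be zero; return those that are set."""
    mask = ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)
    return pte & mask


pte = 0xfffffffffffff22f          # PTE value from the oops
print(decode_bits(0x000f, PF_ERROR_BITS))
print(decode_bits(pte & 0xfff, PTE_FLAG_BITS))
print(hex(pte_frame(pte)), hex(reserved_bits_set(pte, 36)))  # 36 is an assumption
```

The error code has the reserved-bit flag set together with present, write and user, which fits a usermode write hitting a present PTE with reserved bits set. The frame field of the PTE is all ones, so every address bit above any realistic MAXPHYADDR is set, while the PMD entry 1da569067 from the same oops has none of them set.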
On Sun, Mar 28, 2010 at 3:22 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 03/27/2010 02:14 AM, Arvind R wrote:
>>
>> On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh<ewalsh@tycho.nsa.gov> wrote:
>>
>>>
>>> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>>>
>>
>> < --- snip --->
>>
>>>
>>> I have attached the serial console output and dmesg output. The
>>> initcall and drm debug stuff is present.
>>>
>>> Also, I get something new when I run the test program. It prints out:
>>>
>>> # ./silly
>>> Mapped /dev/fb0 at 0x7f3237175000
>>> Killed
>>>
>>> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
>>> kernel:Bad pagetable: 000f [#1] SMP
>>>
>>
>> < --- snip --->
>>
>>>
>>> silly: Corrupted page table at address 7f3237175000
>>> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
>>> Bad pagetable: 000f [#1] SMP
>>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>>> CPU 1
>>> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat
>>> nf_nat nfsd lockd nfs_acl auth_rpcgss export]
>>>
>>
>> < --- snip --->
>>
>>>>>
>>>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>>
>>>>
>>>> You look to have a i915 framebuffer on your box.
>>>>
>>>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>>>> patch that Arvind posted would probably not help you.
>>>>
>>>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>>>
>>>> So, lets boot your kernel with these command line parameters to get more
>>>> data: debug initcall_debug drm.debug=255
>>>>
>>
>> < --- snip --->
>>
>>>>
>>>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>>>> to troubleshoot this.
>>>>
>>
>> This is related and most probably due to the same bit. xf86-video-fbdev
>> works
>> on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
>> Have upgraded whole chain to tip except xen which is 3.4.3rc3
>> Here is the syslog trace:
>> kernel: ------------[ cut here ]------------
>> kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
>> kernel: Hardware name: System Product Name
>> kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
>> drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
>> ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
>> kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd
>> #1
>> kernel: Call Trace:
>> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel: [<ffffffff8103ce54>] ? warn_slowpath_common+0x77/0xa3
>> kernel: [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel: [<ffffffff8100c436>] ? xen_leave_lazy_mmu+0x25/0x43
>> kernel: [<ffffffff81090c49>] ? copy_page_range+0x76/0x7f8
>> kernel: [<ffffffff8100ddc9>] ? xen_force_evtchn_callback+0x9/0xa
>> kernel: [<ffffffff8100e572>] ? check_events+0x12/0x20
>> kernel: [<ffffffff8100e55f>] ? xen_restore_fl_direct_end+0x0/0x1
>> kernel: [<ffffffff8103b1f2>] ? dup_mm+0x276/0x409
>> kernel: [<ffffffff8103bd82>] ? copy_process+0x9c8/0x10ff
>> kernel: [<ffffffff8103c5ff>] ? do_fork+0x146/0x2c0
>> kernel: [<ffffffff810110a3>] ? stub_clone+0x13/0x20
>> kernel: [<ffffffff81010d82>] ? system_call_fastpath+0x16/0x1b
>> kernel: ---[ end trace c58bf004d15b0c42 ]---
>>
>> Xorg.log ends with the same message as originally with trying
>> accelerated nouveau with misleading
>> XKB: Failed to compile keymap
>>
>> fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
>> which does a mmap as in silly.c. As far as X is concerned, everything
>> is fine, but there is obviously a page-fault problem. Will have to setup
>> debug options and trace :-(
>>
>> The 'corrupted page table' syndrome is also present in the accelerated
>> nouveau with AGP cards - so it may be linked to this problem. At least
>> this problem can be repeated on many platforms :-)
>>
>
> The "corrupt pagetable" comes from the pte having invalid reserved bits set
> in it. I think the failure path is this:
>
> The bad bits get set because someone is doing a pfn->mfn conversion on a
> page which is already an mfn, and doesn't have a valid pfn->mfn mapping, and
> the result of the conversion is either 0xff... or 0x7f... (I forget right
> now). But either way, a whole lot of bits get set, but nothing useful. I'm
> not quite sure why Xen isn't complaining about this at set-pte time, but
> perhaps it looks vaguely valid to it (perhaps it sees the invalid flags,
> knows the pte can't be used to access anything, and allows it to be set?).

OK

> But this fault is happening because usermode gets a tlb miss, and the CPU
> finds a pte with reserved bits set, and raises the fault.

Sorry, no faults!

> I'm not sure about the mm/pat.c warning though. I had a quick look at that
> code, but it wasn't obvious to me what's going on there. Something about
> handling the IO mapping during a fork(). Not sure if it's related or not.
>
> J
>

Was mistaken in assuming a fault. My guess is that Jeremy's failure-path
train is right, minus the fault. The hang occurs after the kernel-mode
setting has completed - but usermode (which thinks all is hunky-dory) is
somehow unable to create/write to its map of the framebuffer. System is
responsive, but there are no consoles.

The FBDev DDX driver mmaps the framebuffer device once, during
initialization, in fbdevHWMapVidmem. Subsequent calls return the previously
mapped address. But unfortunately, the first mmap of the device finds it
already mapped by the console drivers (I presume) - with VM_IO set in the
shareable mapping. Is this the first case where the mapped area is iomem
(backed by the graphics card memory) and is already mapped?

In mm/mmap.c mmap_region I see the vma created for the mmap - and it does
not have VM_IO set initially. The driver f_ops->mmap should be able to
select it. But the common drm_mmap entry-point is not being entered at all
in both the bare-boot (working) and xen-boot (not working) cases!

What am I missing?