Hi Stefano,

I recently tried to play some 3D games on my Linux guest. The game starts without problems, but it freezes the entire system after some time (a minute or so?). By this I mean that both the host and the domU become unresponsive. SSH freezes and I had to shut down the machine with the power button.

I did not find anything obvious in the host log. But in the guest log, I can find this:

Dec 18 20:28:38 debvm kernel: [ 0.899860] resource map sanity check conflict: 0xfeff5018 0xfeff7017 0xfeff7000 0xffffffff reserved
Dec 18 20:28:38 debvm kernel: [ 0.899862] ------------[ cut here ]------------
Dec 18 20:28:38 debvm kernel: [ 0.899869] WARNING: at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2c4/0x33c()
Dec 18 20:28:38 debvm kernel: [ 0.899870] Hardware name: HVM domU
Dec 18 20:28:38 debvm kernel: [ 0.899872] Info: mapping multiple BARs. Your kernel is fine.
Dec 18 20:28:38 debvm kernel: [ 0.899873] Modules linked in:
Dec 18 20:28:38 debvm kernel: [ 0.899878] Pid: 1, comm: swapper/0 Not tainted 3.6.9 #4
Dec 18 20:28:38 debvm kernel: [ 0.899892] Call Trace:
Dec 18 20:28:38 debvm kernel: [ 0.899896] [<ffffffff8103d194>] ? warn_slowpath_common+0x76/0x8a
Dec 18 20:28:38 debvm kernel: [ 0.899898] [<ffffffff8103d240>] ? warn_slowpath_fmt+0x45/0x4a
Dec 18 20:28:38 debvm kernel: [ 0.899900] [<ffffffff81032a6c>] ? __ioremap_caller+0x2c4/0x33c
Dec 18 20:28:38 debvm kernel: [ 0.899902] [<ffffffff812c3be3>] ? intel_opregion_setup+0x9c/0x201
Dec 18 20:28:38 debvm kernel: [ 0.899904] [<ffffffff812bcb75>] ? intel_setup_gmbus+0x175/0x19d
Dec 18 20:28:38 debvm kernel: [ 0.899907] [<ffffffff8128a37a>] ? i915_driver_load+0x548/0x90d
Dec 18 20:28:38 debvm kernel: [ 0.899910] [<ffffffff812ff804>] ? setup_hpet_msi_remapped+0x20/0x20
Dec 18 20:28:38 debvm kernel: [ 0.899912] [<ffffffff81272706>] ? drm_get_pci_dev+0x152/0x259
Dec 18 20:28:38 debvm kernel: [ 0.899915] [<ffffffff813d4883>] ? _raw_spin_lock_irqsave+0x21/0x45
Dec 18 20:28:38 debvm kernel: [ 0.899918] [<ffffffff811d9ecc>] ? local_pci_probe+0x5a/0xa0
Dec 18 20:28:38 debvm kernel: [ 0.899920] [<ffffffff811d9fcf>] ? pci_device_probe+0xbd/0xe7
Dec 18 20:28:38 debvm kernel: [ 0.899922] [<ffffffff812cd887>] ? driver_probe_device+0x1b0/0x1b0
Dec 18 20:28:38 debvm kernel: [ 0.899923] [<ffffffff812cd887>] ? driver_probe_device+0x1b0/0x1b0
Dec 18 20:28:38 debvm kernel: [ 0.899925] [<ffffffff812cd769>] ? driver_probe_device+0x92/0x1b0
Dec 18 20:28:38 debvm kernel: [ 0.899926] [<ffffffff812cd8da>] ? __driver_attach+0x53/0x73
Dec 18 20:28:38 debvm kernel: [ 0.899928] [<ffffffff812cc06f>] ? bus_for_each_dev+0x46/0x77
Dec 18 20:28:38 debvm kernel: [ 0.899930] [<ffffffff812ccf8f>] ? bus_add_driver+0xd5/0x1f4
Dec 18 20:28:38 debvm kernel: [ 0.899931] [<ffffffff812cde14>] ? driver_register+0x89/0x101
Dec 18 20:28:38 debvm kernel: [ 0.899933] [<ffffffff811d9336>] ? __pci_register_driver+0x49/0xa3
Dec 18 20:28:38 debvm kernel: [ 0.899935] [<ffffffff816d55c7>] ? ttm_init+0x63/0x63
Dec 18 20:28:38 debvm kernel: [ 0.899937] [<ffffffff81002085>] ? do_one_initcall+0x75/0x12c
Dec 18 20:28:38 debvm kernel: [ 0.899940] [<ffffffff816a6cc2>] ? kernel_init+0x13c/0x1c0
Dec 18 20:28:38 debvm kernel: [ 0.899941] [<ffffffff816a6565>] ? do_early_param+0x83/0x83
Dec 18 20:28:38 debvm kernel: [ 0.899943] [<ffffffff813d9f44>] ? kernel_thread_helper+0x4/0x10
Dec 18 20:28:38 debvm kernel: [ 0.899945] [<ffffffff816a6b86>] ? start_kernel+0x3e1/0x3e1
Dec 18 20:28:38 debvm kernel: [ 0.899947] [<ffffffff813d9f40>] ? gs_change+0x13/0x13
Dec 18 20:28:38 debvm kernel: [ 0.899950] ---[ end trace db461543ce599b44 ]---

I'm not sure if this has anything to do with the freeze. This seems to show up on every boot after I upgraded to Xen 4.2.1-rc2. Both Debian kernels, 3.2.32 and 3.6.9, produce the same log. But the whole-system freeze happens only during gaming, which is much less frequent. So I'm not sure if the two are related. But anyway, could you comment on what this log means?

I can find one of the mentioned addresses in the qemu_dm log:

pt_pci_write_config: [00:02:0] address=00fc val=0xfeff5000 len=4
igd_write_opregion: Map OpRegion: cd996018 -> feff5018
igd_write_opregion: [00:02:0] addr=fc len=2 val=feff5000

PS: I also run XBMC on a domU and it plays back video under HW acceleration (VAAPI) without any problem. XBMC by itself is also a graphics-intensive program. But this runs on a pure HVM guest, while the failing case is on PVHVM.

PS2: I also suffered another instability yesterday. It happened when I was compiling a kernel inside the domU. The host rebooted suddenly. Since I was not using graphics at that time (the Xorg session was idle, I was connected through SSH), this may be a different issue.

Thanks,
Timothy
Adding Jean, the author of the OpRegion patch.

Jean, I believe the warning is due to the offset within the page. To accommodate the offset, you would need to reserve another page for it. Will the extra page cause any unexpected problems?

The original thread is about an instability issue that directly freezes the host. I believe the warning above should not have such an effect. What do you think? Any suggestions?

Thanks,
Timothy
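The page-offset reasoning in the message above can be checked with a little arithmetic. The following is a minimal sketch, assuming 4 KiB pages and an 8 KiB OpRegion (the size implied by the 0xfeff5018-0xfeff7017 range in the warning); the function name is hypothetical and is not part of Xen or qemu.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def pages_spanned(start, size, page_size=PAGE_SIZE):
    """Return the base addresses of all pages touched by [start, start + size)."""
    first = start & ~(page_size - 1)
    last = (start + size - 1) & ~(page_size - 1)
    return list(range(first, last + page_size, page_size))


# Guest address from the qemu_dm log; size assumed from the warning's range.
opregion_start = 0xFEFF5018
opregion_size = 0xFEFF7017 - 0xFEFF5018 + 1  # 0x2000 bytes

pages = pages_spanned(opregion_start, opregion_size)
print([hex(p) for p in pages])
# Compare with a page-aligned region of the same size.
print(len(pages), len(pages_spanned(0xFEFF5000, opregion_size)))
```

With the 0x18 offset the region touches three pages (0xfeff5000, 0xfeff6000, 0xfeff7000), whereas a page-aligned region of the same size would touch two, which is where the extra page comes from.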
On Wed, Dec 19, 2012 at 2:20 PM, G.R. <firemeteor@users.sourceforge.net> wrote:
> Adding Jean, the author to the opregion patch.
>
> Jean, I believe the warning is due to the offset within the page.
> To accommodate the offset, you would need to reserve another page for it.
> Will the extra page cause any unexpected problem?
>
> The original thread is about an instability issue that directly freeze the host.
> I believe this warning above should not has such effect.
> What do you think? And any suggestion?

Jean appears to be no longer reachable. The warning I found turns out not to be relevant. According to the OpRegion spec, the tail part is reserved and should never be touched by the guest. Anyway, I have a local fix that gets rid of the warning by reserving one more page and mapping it when the host OpRegion is not page aligned. I'll send it in a separate thread.

Back to the topic. I updated to Xen 4.2.1 and tried three times tonight. Two of the attempts led to a total freeze with no error log available, after a couple of minutes of game play. The last try ended with a GPU hang after 10+ minutes of game play. This is a guest-only hang. But I still have no way to check the GPU error state even though it has been collected:

[ 1553.588076] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1553.592112] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1582.004075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1597.220075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1613.220074] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

I'm wondering if the two symptoms are due to the same underlying cause. But I guess a GPU hang caused by a guest driver issue should not freeze the host. Is that true?

I'm going to try more configurations: different kernel versions, with / without PVOPS, native run vs. VM, etc. But this is rather blind since I have no clue at all. If you have anything to suspect, it would be highly appreciated.

Thanks,
Timothy
Hi Timothy,

Could you send /proc/iomem, lspci -vvvv and the e820 from dmesg for this VM?

Thanks,
Jean
On Thu, Dec 20, 2012 at 2:18 AM, Jean Guyader <jean.guyader@gmail.com> wrote:
> Hi Timothy,
>
> Could you send /proc/iomem, lspci -vvvv and the e820 from dmesg for this VM?

Thanks Jean, here is the info you asked for. Could I ask what it is about? The warning in the kernel log or the host freezing issue? If it's about the former, I should mention that in the log I posted, I have already applied a local patch (sent in a separate thread with you involved) that reserves one more page in e820.

/proc/iomem:

00000000-0000ffff : reserved
00010000-0009dfff : System RAM
0009e000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000ce3ff : Video ROM
000ce800-000cf1ff : Adapter ROM
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-dfffffff : System RAM
  01000000-013dcc77 : Kernel code
  013dcc78-0168f03f : Kernel data
  01727000-01804fff : Kernel bss
e0000000-fbffffff : PCI Bus 0000:00
  e0000000-efffffff : 0000:00:02.0
  f0000000-f0ffffff : 0000:00:03.0
    f0000000-f0ffffff : xen-platform-pci
  f1000000-f13fffff : 0000:00:02.0
  f1400000-f1403fff : i915 MCHBAR
  f1620000-f1623fff : 0000:00:05.0
    f1620000-f1623fff : ICH HD audio
  f1624000-f1624fff : 0000:00:06.0
    f1624000-f1624fff : ehci_hcd
fc000000-feff3fff : reserved
  fec00000-fec003ff : IOAPIC 0
  fed00000-fed003ff : HPET 0
  fee00000-fee00fff : Local APIC
feff4000-feff6fff : ACPI Non-volatile Storage
feff7000-ffffffff : reserved
100000000-11c7fffff : System RAM
11c800000-11fffffff : RAM buffer

dmesg lines with 'e820' in them:

[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000dfffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000feff3fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feff4000-0x00000000feff6fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000feff7000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000011c7fffff] usable
[ 0.000000] e820: update [mem 0x00000000-0x0000ffff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0x11c800 max_arch_pfn = 0x400000000
[ 0.000000] e820: last_pfn = 0xe0000 max_arch_pfn = 0x400000000
[ 0.000000] e820: [mem 0xe0000000-0xfbffffff] available for PCI devices
[ 0.439596] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
[ 0.439597] e820: reserve RAM buffer [mem 0x11c800000-0x11fffffff]

Please find the lspci -vvv log in the attachment; it's a bit lengthy.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
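To see which regions a given physical range touches, a listing like the /proc/iomem dump in the message above can be checked mechanically. This is a minimal sketch, assuming the "start-end : name" line format with inclusive hexadecimal ranges; the helper names are hypothetical, and the sample holds only a few entries copied from that message.

```python
import re

# A few entries copied from the /proc/iomem listing in the message above.
IOMEM_SAMPLE = """\
fc000000-feff3fff : reserved
  fec00000-fec003ff : IOAPIC 0
feff4000-feff6fff : ACPI Non-volatile Storage
feff7000-ffffffff : reserved
100000000-11c7fffff : System RAM
"""

LINE_RE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_iomem(text):
    """Parse /proc/iomem-style lines into (start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            start, end, name = match.groups()
            entries.append((int(start, 16), int(end, 16), name.strip()))
    return entries


def overlapping(entries, start, end):
    """Return the entries whose inclusive range intersects [start, end]."""
    return [e for e in entries if e[0] <= end and start <= e[1]]


# Range reported by the "resource map sanity check conflict" warning.
for lo, hi, name in overlapping(parse_iomem(IOMEM_SAMPLE), 0xFEFF5018, 0xFEFF7017):
    print(f"{lo:08x}-{hi:08x} : {name}")
```

For the range in the warning, the sample entries that intersect it are the ACPI NVS region at feff4000-feff6fff and the reserved region starting at feff7000. The same helpers could be fed the full contents of /proc/iomem instead of the sample.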
On Thu, Dec 20, 2012 at 12:04 AM, G.R. <firemeteor@users.sourceforge.net> wrote:
>>> PS2: I also suffered another instability yesterday. It happens when I
>>> was compiling kernel in side the domU. The host reboots suddenly.
>>> Since I'm not using graphics at that time (Xorg session is idle, I
>>> connected through SSH), this may be a different issue.

I tried once more to rebuild the kernel in the Debian VM. It was a total mess this time. The whole system (including dom0) unexpectedly rebooted several times during the compilation. This corrupted the kernel tree and I failed to build the kernel. I suspect this has something to do with the disk driver, since the reboots tend to happen under high disk load (like linking vmlinux). I will run iozone to check tomorrow.

It seems that this issue has little to do with IGD passthrough. I'm not sure if it's the same issue as the host freeze during game play. Maybe I should track them separately.

Thanks,
Timothy
On Thu, Dec 20, 2012 at 12:04:01AM +0800, G.R. wrote:
> Back to the topic. I updated to xen 4.2.1 and tried three times tonight.
> Two of them lead to total freeze with no error log available, after
> game playing for a couple of minutes.
> And the last try ended up with GPU hang after 10+ minutes of game playing.
> This is a guest only hang. But I still have no way to check GPU error
> state even it has been collected:
>
> [ 1553.588076] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 1553.592112] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> [ 1582.004075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 1597.220075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> [ 1613.220074] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Those also appear on bare metal (Linus actually mentioned this).

> I'm wondering if the two syndromes are due to the same underlying cause.
> But I guess a GPU hang caused by guest driver issue should not freeze
> the host. Is it true?

It shouldn't. Is the machine usable while this guest is frozen?
On Fri, Dec 21, 2012 at 02:20:17AM +0800, G.R. wrote:
> On Thu, Dec 20, 2012 at 12:04 AM, G.R. <firemeteor@users.sourceforge.net> wrote:
> >>> PS2: I also suffered another instability yesterday. It happens when I
> >>> was compiling kernel in side the domU. The host reboots suddenly.
> >>> Since I'm not using graphics at that time (Xorg session is idle, I
> >>> connected through SSH), this may be a different issue.
>
> I tried once more to rebuild kernel in the debian VM. It's a total
> mess this time.
> The whole system (including dom0) unexpectedly reboots several times
> during the compilation.
> This destroyed the kernel tree and I failed to build the kernel.
> I suspect this has something to do with disk driver, since the reboot
> tend to happen during high disk load (like linking vmlinux).

Is the AHCI controller sharing the same interrupt line as the IGD?

> Will run iozone to check tomorrow.
>
> It seems that this issue has little to do with IGD passthrough.
> I'm not sure if it's the same issue for the host freezing during game play.
> Maybe I should track them separately.
>
> Thanks,
> Timothy
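One way to answer the shared-interrupt question is to look for IRQ lines in /proc/interrupts that list more than one handler. The sketch below is not from the thread: the function name `shared_irqs` and the sample text (including its device names and counts) are hypothetical, and it assumes the usual Linux layout in which handlers sharing a line appear comma-separated at the end of the row.

```python
import re

# Hypothetical sample in the usual /proc/interrupts layout (not taken from the thread).
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:       5120        310   IO-APIC-fasteoi   ahci, i915
 19:        220          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0   Non-maskable interrupts
"""


def shared_irqs(interrupts_text):
    """Return {irq_number: [handler names]} for IRQ rows with more than one handler."""
    shared = {}
    lines = interrupts_text.splitlines()
    if not lines:
        return shared
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        m = re.match(r"\s*(\d+):\s*(.*)", line)
        if not m:
            continue  # skip non-numeric rows such as NMI: or LOC:
        fields = m.group(2).split(None, ncpus)  # per-CPU counts, then the rest
        if len(fields) <= ncpus:
            continue
        pieces = fields[ncpus].split(",")
        tokens = pieces[0].split()
        if not tokens:
            continue
        # The last token before the first comma is the first handler name;
        # anything before it is the interrupt chip description.
        handlers = [tokens[-1]] + [p.strip() for p in pieces[1:] if p.strip()]
        if len(handlers) > 1:
            shared[int(m.group(1))] = handlers
    return shared


print(shared_irqs(SAMPLE))  # {16: ['ahci', 'i915']}
```

On a real system the text would come from reading /proc/interrupts in dom0 rather than from the sample; a row listing both the disk controller driver and the graphics driver would indicate a shared line.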
> Is the AHCI controller sharing the same interrupt line as the IGD?

Thanks for your help, Konrad.
I did some more experiments, and this turns out to be due to my own stupidity, again.

So basically the instability comes directly from the HW: it panics once heavy load is present, either gaming or kernel compilation.
The direct cause of this HW instability is that I applied under-voltage to my processor, which I had almost forgotten about.
That config works fine on a native build -- it passes stress testing with prime95.
However, virtualization seems to be more demanding and does not work well at that voltage setting.
After removing the under-voltage trick, the virtualized system works just fine.

So all known functionality issues with the Linux build have been solved.
Thank you all, and I apologize for wasting your time.

Thanks,
Timothy

PS: The bad news is that this instability fix does not help the win7 guest in any way. It's as broken as before.