Recently I pulled in new changesets from xen- and qemu-unstable, and
when creating a PV guest I'm getting errors like the stack trace
below. Is this likely to be caused by QEMU using AIO? Is this a bug
in Xen or in the Debian kernel? Is there an easy way to turn off aio
using a config file so I can see if it is qemu's aio?

The config file is attached, for reference.

 -George

[ 408.127439] BUG: unable to handle kernel paging request at af00003e
[ 408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
[ 408.138726] *pdpt = 0000000033232027 *pde = 0000000000000000
[ 408.144532] Oops: 0000 [#1] SMP
[ 408.147825] last sysfs file: /sys/devices/vif-1-0/uevent
[ 408.153200] Modules linked in: xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs bridge stp loop snd_p]
[ 408.194797]
[ 408.196359] Pid: 1942, comm: qemu-system-i38 Not tainted (2.6.32-5-xen-686 #1) PowerEdge R710
[ 408.204938] EIP: 0061:[<c10941f8>] EFLAGS: 00010286 CPU: 0
[ 408.210485] EIP is at set_page_dirty+0x1e/0x4a
[ 408.214991] EAX: af000006 EBX: 00000000 ECX: c4ad7680 EDX: 41000001
[ 408.221317] ESI: c4ad7680 EDI: f4f0c54c EBP: f3353200 ESP: f33bfdb8
[ 408.227644] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[ 408.233104] Process qemu-system-i38 (pid: 1942, ti=f33be000 task=f3de50c0 task.ti=f33be000)
[ 408.241508] Stack:
[ 408.243588] c10944f0 00000000 f4f0c500 c10d991a f33533c8 f3353200 f4f0c500 c10dc048
[ 408.251128] <0> 00000001 00001000 00000000 c10dcc44 00000001 c1006767 00000000 00000000
[ 408.259189] <0> d4646070 00000000 f2f0869c f3ff4900 00000000 0000000c 00001000 00000000
[ 408.267509] Call Trace:
[ 408.270025] [<c10944f0>] ? set_page_dirty_lock+0x22/0x30
[ 408.275486] [<c10d991a>] ? bio_set_pages_dirty+0x22/0x2f
[ 408.280944] [<c10dc048>] ? dio_bio_submit+0x3c/0x57
[ 408.285970] [<c10dcc44>] ? __blockdev_direct_IO+0x903/0xaed
[ 408.291691] [<c1006767>] ? xen_restore_fl_direct_end+0x0/0x1
[ 408.297500] [<f62a2494>] ? ext3_direct_IO+0xed/0x18d [ext3]
[ 408.303219] [<f62a2e2b>] ? ext3_get_block+0x0/0xd1 [ext3]
[ 408.308764] [<c1090687>] ? generic_file_aio_read+0xf9/0x57b
[ 408.314483] [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
[ 408.320376] [<c1006770>] ? check_events+0x8/0xc
[ 408.325056] [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
[ 408.330949] [<c109058e>] ? generic_file_aio_read+0x0/0x57b
[ 408.336584] [<c10e3725>] ? aio_rw_vect_retry+0x61/0x122
[ 408.341955] [<c10e45fa>] ? aio_run_iocb+0x61/0xef
[ 408.346809] [<c10e4ec9>] ? sys_io_submit+0x409/0x49c
[ 408.351923] [<c1008f9c>] ? syscall_call+0x7/0xb
[ 408.356600] Code: c3 f6 00 10 75 04 f0 80 08 10 31 c0 c3 89 c1 8b 40 10 8b 11 f7 c2 00 00 01 00 74 07 b8 ec 71 3d
[ 408.375492] EIP: [<c10941f8>] set_page_dirty+0x1e/0x4a SS:ESP 0069:f33bfdb8
[ 408.382512] CR2: 00000000af00003e
[ 408.385894] ---[ end trace 9ce48eb2f06897bf ]---
On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> Recently I pulled in new changesets from xen- and qemu-unstable, and
> when creating a PV guest I'm getting errors like the stack trace
> below. Is this likely to be caused by QEMU using AIO? Is this a bug
> in Xen or in the Debian kernel? Is there an easy way to turn off aio
> using a config file so I can see if it is qemu's aio?
>
> The config file is attached, for reference.

Which revision of the Debian kernel is this?

It looks like Squeeze, which was a fairly old snapshot of Jeremy's
Xen.git -- it's certainly not impossible that there were latent AIO bugs
in there and Stefano has been fixing these sort of things in recent
kernels too. So it's very possible we need to backport some fix.

Ian.
On Thu, Apr 26, 2012 at 11:59 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
>> Recently I pulled in new changesets from xen- and qemu-unstable, and
>> when creating a PV guest I'm getting errors like the stack trace
>> below. Is this likely to be caused by QEMU using AIO? Is this a bug
>> in Xen or in the Debian kernel? Is there an easy way to turn off aio
>> using a config file so I can see if it is qemu's aio?
>>
>> The config file is attached, for reference.
>
> Which revision of the Debian kernel is this?
>
> It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> Xen.git -- it's certainly not impossible that there were latent AIO bugs
> in there and Stefano has been fixing these sort of things in recent
> kernels too. So it's very possible we need to backport some fix.

The package info is below. It is from squeeze, since (AFAIK) that's
the latest "stable" release (and thus what people are likely to be
using).

 -George

# dpkg -s linux-image-2.6.32-5-xen-686
Package: linux-image-2.6.32-5-xen-686
Status: install ok installed
Priority: optional
Section: kernel
Installed-Size: 78524
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: i386
Source: linux-2.6
Version: 2.6.32-41
Provides: linux-image, linux-image-2.6, linux-modules-2.6.32-5-xen-686
Depends: module-init-tools, linux-base (>= 2.6.32-41), initramfs-tools (>= 0.55)
Pre-Depends: debconf | debconf-2.0
Recommends: firmware-linux-free (>= 2.6.32), libc6-xen
Suggests: linux-doc-2.6.32, grub
Breaks: initramfs-tools (<< 0.55), lilo (<< 22.8-8.2~)
Description: Linux 2.6.32 for modern PCs, Xen dom0 support
 The Linux kernel 2.6.32 and modules for use on PCs with Intel Pentium
 Pro/II/III/4/4M/D/M, Xeon, Celeron, Core or Atom; AMD Geode NX, Athlon
 (K7), Duron, Opteron, Sempron, Turion or Phenom; Transmeta Efficeon; or
 VIA C7 processors.
 .
 This kernel also runs on a Xen hypervisor. It supports both privileged
 (dom0) and unprivileged (domU) operation.
On Thu, 2012-04-26 at 12:08 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 11:59 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> >> Recently I pulled in new changesets from xen- and qemu-unstable, and
> >> when creating a PV guest I'm getting errors like the stack trace
> >> below. Is this likely to be caused by QEMU using AIO? Is this a bug
> >> in Xen or in the Debian kernel? Is there an easy way to turn off aio
> >> using a config file so I can see if it is qemu's aio?
> >>
> >> The config file is attached, for reference.
> >
> > Which revision of the Debian kernel is this?
> >
> > It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> > Xen.git -- it's certainly not impossible that there were latent AIO bugs
> > in there and Stefano has been fixing these sort of things in recent
> > kernels too. So it's very possible we need to backport some fix.
>
> The package info is below. It is from squeeze, since (AFAIK) that's
> the latest "stable" release (and thus what people are likely to be
> using)

Right, it's also the latest available kernel package for Squeeze, which
is what I wanted to check.

Not that I've been aware of any AIO fixes recently anyway.
On Thu, 2012-04-26 at 12:23 +0100, Stefano Stabellini wrote:
> On Thu, 26 Apr 2012, Ian Campbell wrote:
> > On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> > > Recently I pulled in new changesets from xen- and qemu-unstable, and
> > > when creating a PV guest I'm getting errors like the stack trace
> > > below. Is this likely to be caused by QEMU using AIO? Is this a bug
> > > in Xen or in the Debian kernel? Is there an easy way to turn off aio
> > > using a config file so I can see if it is qemu's aio?
> > >
> > > The config file is attached, for reference.
> >
> > Which revision of the Debian kernel is this?
> >
> > It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> > Xen.git -- it's certainly not impossible that there were latent AIO bugs
> > in there and Stefano has been fixing these sort of things in recent
> > kernels too. So it's very possible we need to backport some fix.
>
> Right.
>
> > > [ 408.127439] BUG: unable to handle kernel paging request at af00003e
> > > [ 408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
> > > [...]
>
> This looks like a classic direct_IO/AIO not working bug: it could be
> because the m2p_override is not working correctly or it might not even
> be present at all in this kernel (it went upstream in 2.6.38).
> It only started showing now because qemu-xen-traditional switched to
> O_DIRECT.

This kernel had VM_FOREIGN and PageForeign etc rather than the
m2p_override. Could be that we need to extend VM_FOREIGN to cover grant
mapped pages?

That's actually a fair chunk of dev work, not just a simple backport.

However this kernel does have blktap so why is qemu based AIO being used
at all?

Ian.
Stefano Stabellini
2012-Apr-26 11:23 UTC
Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?
On Thu, 26 Apr 2012, Ian Campbell wrote:
> On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> > Recently I pulled in new changesets from xen- and qemu-unstable, and
> > when creating a PV guest I'm getting errors like the stack trace
> > below. Is this likely to be caused by QEMU using AIO? Is this a bug
> > in Xen or in the Debian kernel? Is there an easy way to turn off aio
> > using a config file so I can see if it is qemu's aio?
> >
> > The config file is attached, for reference.
>
> Which revision of the Debian kernel is this?
>
> It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> Xen.git -- it's certainly not impossible that there were latent AIO bugs
> in there and Stefano has been fixing these sort of things in recent
> kernels too. So it's very possible we need to backport some fix.

Right.

> > [ 408.127439] BUG: unable to handle kernel paging request at af00003e
> > [ 408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
> > [...]

This looks like a classic direct_IO/AIO not working bug: it could be
because the m2p_override is not working correctly or it might not even
be present at all in this kernel (it went upstream in 2.6.38).
It only started showing now because qemu-xen-traditional switched to
O_DIRECT.
Stefano Stabellini
2012-Apr-26 12:07 UTC
Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?
On Thu, 26 Apr 2012, Ian Campbell wrote:
> > > > [ 408.127439] BUG: unable to handle kernel paging request at af00003e
> > > > [ 408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
> > > > [...]
> >
> > This looks like a classic direct_IO/AIO not working bug: it could be
> > because the m2p_override is not working correctly or it might not even
> > be present at all in this kernel (it went upstream in 2.6.38).
> > It only started showing now because qemu-xen-traditional switched to
> > O_DIRECT.
>
> This kernel had VM_FOREIGN and PageForeign etc rather than the
> m2p_override. Could be that we need to extend VM_FOREIGN to cover grant
> mapped pages?
>
> That's actually a fair chunk of dev work, not just a simple backport.
>
> However this kernel does have blktap so why is qemu based AIO being used
> at all?

If blktap is present and working then libxl only uses QEMU for
qcow/qcow2 disk images.
On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
>> However this kernel does have blktap so why is qemu based AIO being used
>> at all?
>
> If blktap is present and working then libxl only uses QEMU for
> qcow/qcow2 disk images.

Hmm -- except that the process that's dying is clearly QEMU, and the
disk images are definitely not qcow*, and Ian seems to think this
kernel has blktap (how could I tell?), so something's not right.

Is there a command-line way to disable aio?

 -George
On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> >> However this kernel does have blktap so why is qemu based AIO being used
> >> at all?
> >
> > If blktap is present and working then libxl only uses QEMU for
> > qcow/qcow2 disk images.
>
> Hmm -- except that the process that's dying is clearly QEMU, and the
> disk images are definitely not qcow*, and Ian seems to think this
> kernel has blktap (how could I tell?), so something's not right.

It looks like it is a module -- lsmod should confirm, maybe it's as
simple as loading it?

(if so let me know and I'll be sure to include that when I write up
"installing a Debian Dom0")
On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
>> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> >> However this kernel does have blktap so why is qemu based AIO being used
>> >> at all?
>> >
>> > If blktap is present and working then libxl only uses QEMU for
>> > qcow/qcow2 disk images.
>>
>> Hmm -- except that the process that's dying is clearly QEMU, and the
>> disk images are definitely not qcow*, and Ian seems to think this
>> kernel has blktap (how could I tell?), so something's not right.
>
> It looks like it is a module -- lsmod should confirm, maybe it's a
> simple as loading it?
>
> (if so let me know and I'll be sure to include that when I write up
> "installing a Debian Dom0")

Indeed, blktap was *not* loaded, and "modprobe blktap" seems to make
things work.

Should this be done in one of the initscripts? Or perhaps by xl?

It would still be good to get the AIO stuff fixed in some way, as I'm
sure I'm not the only one who's going to run into this problem.

 -George
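
The lsmod check and the modprobe fix discussed above can be sketched as a small shell helper. This is only an illustration, not something from the thread: the function name module_loaded is made up here, and it reads /proc/modules (the listing lsmod formats) instead of parsing lsmod output. Actually loading the module still has to be done as root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module name appears in a
# /proc/modules-style listing (the first column is the module name).
# Note: the kernel lists names with underscores, so a module loaded
# as "xen-blktap" would appear as "xen_blktap".
module_loaded() {
    # $1 = module name, $2 = listing file (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded blktap; then
    echo "blktap is loaded"
else
    echo "blktap is not loaded; as root, try: modprobe blktap"
fi
```

The optional second argument exists only so the check can be tried against a saved listing from another machine.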
On Thu, 2012-04-26 at 14:43 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
> >> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> However this kernel does have blktap so why is qemu based AIO being used
> >> >> at all?
> >> >
> >> > If blktap is present and working then libxl only uses QEMU for
> >> > qcow/qcow2 disk images.
> >>
> >> Hmm -- except that the process that's dying is clearly QEMU, and the
> >> disk images are definitely not qcow*, and Ian seems to think this
> >> kernel has blktap (how could I tell?), so something's not right.
> >
> > It looks like it is a module -- lsmod should confirm, maybe it's a
> > simple as loading it?
> >
> > (if so let me know and I'll be sure to include that when I write up
> > "installing a Debian Dom0")
>
> Indeed, blktap was *not* loaded, and "modprobe blktap" seems make things work.
>
> Should this be done in one of the initscripts? Or perhaps by xl?

xencommons should do it, IMHO.

> It would still be good to get the AIO stuff fixed in some way, as I'm
> sure I'm not the only one who's going to run into this problem.

Stefano has fixed it in the upstream kernel. I'm afraid there is no
realistic chance of it being fixed in the squeeze kernel at this stage.

Ian.
On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> It would still be good to get the AIO stuff fixed in some way, as I'm
>> sure I'm not the only one who's going to run into this problem.
>
> Stefano has fixed it in the upstream kernel. I'm afraid there is no
> realistic chance of it being fixed in the squeeze kernel at this stage.

Any chance we could get AIO disabled in the squeeze kernel then?

 -George
On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> It would still be good to get the AIO stuff fixed in some way, as I'm
> >> sure I'm not the only one who's going to run into this problem.
> >
> > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > realistic chance of it being fixed in the squeeze kernel at this stage.
>
> Any chance we could get AIO disabled in the squeeze kernel then?

I don't think so, that would break legitimate uses of AIO.

We could potentially ensure that Xen/qemu doesn't try to use AIO on
older kernels but AIUI you were running a newer version than provided in
Squeeze so that isn't something we can fix in Debian?

Ian.
Stefano Stabellini
2012-Apr-26 14:10 UTC
Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?
On Thu, 26 Apr 2012, Ian Campbell wrote:
> On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> > On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > >> It would still be good to get the AIO stuff fixed in some way, as I'm
> > >> sure I'm not the only one who's going to run into this problem.
> > >
> > > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > > realistic chance of it being fixed in the squeeze kernel at this stage.
> >
> > Any chance we could get AIO disabled in the squeeze kernel then?
>
> I don't think so, that would break legitimate uses of AIO.
>
> We could potentially ensure that Xen/qemu doesn't try to use AIO on
> older kernels but AIUI you were running a newer version than provided in
> Squeeze so that isn't something we can fix in Debian?

We could add a patch in Debian to disable AIO and O_DIRECT in QEMU.
Otherwise I don't really know what we could do upstream to detect
whether a kernel has a buggy O_DIRECT/AIO implementation or not.
On Thu, 2012-04-26 at 15:10 +0100, Stefano Stabellini wrote:
> On Thu, 26 Apr 2012, Ian Campbell wrote:
> > On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> > > On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > >> It would still be good to get the AIO stuff fixed in some way, as I'm
> > > >> sure I'm not the only one who's going to run into this problem.
> > > >
> > > > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > > > realistic chance of it being fixed in the squeeze kernel at this stage.
> > >
> > > Any chance we could get AIO disabled in the squeeze kernel then?
> >
> > I don't think so, that would break legitimate uses of AIO.
> >
> > We could potentially ensure that Xen/qemu doesn't try to use AIO on
> > older kernels but AIUI you were running a newer version than provided in
> > Squeeze so that isn't something we can fix in Debian?
>
> We could add a patch in Debian to disable AIO and O_DIRECT in QEMU.

AIUI QEMU in this case is not the qemu in Debian, it's the one from
xen-unstable.

> Otherwise I don't really know what we could do upstream to detect
> whether a kernel has a buggy O_DIRECT/AIO implementation or not.

Me neither.

Ian.
On Thu, Apr 26, George Dunlap wrote:
> Recently I pulled in new changesets from xen- and qemu-unstable, and
> when creating a PV guest I'm getting errors like the stack trace
> below. Is this likely to be caused by QEMU using AIO? Is this a bug
> in Xen or in the Debian kernel? Is there an easy way to turn off aio
> using a config file so I can see if it is qemu's aio?

I also hit bugs in the nfs code paths with the SuSE kernels.
Try 'device_model_version="qemu-xen-traditional"' in your .cfg file.
See changeset 25222:a095e157f280.

Olaf
On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2012-04-26 at 14:43 +0100, George Dunlap wrote:
>> On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
>> >> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
>> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> However this kernel does have blktap so why is qemu based AIO being used
>> >> >> at all?
>> >> >
>> >> > If blktap is present and working then libxl only uses QEMU for
>> >> > qcow/qcow2 disk images.
>> >>
>> >> Hmm -- except that the process that's dying is clearly QEMU, and the
>> >> disk images are definitely not qcow*, and Ian seems to think this
>> >> kernel has blktap (how could I tell?), so something's not right.
>> >
>> > It looks like it is a module -- lsmod should confirm, maybe it's a
>> > simple as loading it?
>> >
>> > (if so let me know and I'll be sure to include that when I write up
>> > "installing a Debian Dom0")
>>
>> Indeed, blktap was *not* loaded, and "modprobe blktap" seems make things work.
>>
>> Should this be done in one of the initscripts? Or perhaps by xl?
>
> xencommons should do it, IMHO.

Just re-ran into this problem. Is the preferred solution to just add
"modprobe blktap" (without error checking) to the xencommons
initscript?

 -George
On Tue, 2012-05-15 at 11:01 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2012-04-26 at 14:43 +0100, George Dunlap wrote:
> >> On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> > On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
> >> >> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
> >> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> However this kernel does have blktap so why is qemu based AIO being used
> >> >> >> at all?
> >> >> >
> >> >> > If blktap is present and working then libxl only uses QEMU for
> >> >> > qcow/qcow2 disk images.
> >> >>
> >> >> Hmm -- except that the process that's dying is clearly QEMU, and the
> >> >> disk images are definitely not qcow*, and Ian seems to think this
> >> >> kernel has blktap (how could I tell?), so something's not right.
> >> >
> >> > It looks like it is a module -- lsmod should confirm, maybe it's a
> >> > simple as loading it?
> >> >
> >> > (if so let me know and I'll be sure to include that when I write up
> >> > "installing a Debian Dom0")
> >>
> >> Indeed, blktap was *not* loaded, and "modprobe blktap" seems make things work.
> >>
> >> Should this be done in one of the initscripts? Or perhaps by xl?
> >
> > xencommons should do it, IMHO.
>
> Just re-ran into this problem. Is the preferred solution to just add
> "modprobe blktap" (without error checking) to the xencommons
> initscript?

Right, or maybe xen-blktap depending on the kernel, or maybe both. (not
sure if this is one of the ones which got renamed, being out of tree I
suppose it is less likely...)

There's a bunch of modprobes in there already which you can just copy
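
For illustration, a minimal sketch of what trying both module names from an initscript could look like. This is not the actual xencommons code: the function name load_blktap and the overridable MODPROBE variable are hypothetical, added here only so the logic can be exercised without root, and the real script may well just use plain modprobe lines.

```shell
#!/bin/sh
# MODPROBE can be overridden for experiments; by default the real
# modprobe is used (an assumption made for this sketch only).
MODPROBE="${MODPROBE:-modprobe}"

# Try each candidate module name in turn and print the one that loaded.
# Returns non-zero if neither loads, so the caller decides whether to care.
load_blktap() {
    for mod in blktap xen-blktap; do
        if $MODPROBE "$mod" 2>/dev/null; then
            echo "$mod"
            return 0
        fi
    done
    return 1
}

# In an initscript this might be called without error checking, e.g.:
#   load_blktap >/dev/null || true
```

Ignoring the failure matches the "without error checking" idea above: a kernel with neither module should not stop the rest of the script from running.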