tsk
2010-Jun-28 04:41 UTC
[Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
Hi folks,

Yesterday I ran a test case in which 6 VMs were restarted every 10 minutes, from 11:00 to 16:30, until Dom0 crashed.

The failed assertion 'list_empty(&vreq->next)' led to a segfault in the tapdisk2 process, and then Dom0 crashed:

Jun 27 16:31:05 r02k05014 kernel: device tap775.0 entered promiscuous mode
Jun 27 16:31:05 r02k05014 kernel: eth0: port 13(tap775.0) entering forwarding state
Jun 27 16:31:15 r02k05014 tapdisk2[4341]: Assertion 'list_empty(&vreq->next)' failed, line 1822, file tapdisk-vbd.c
Jun 27 16:31:15 r02k05014 kernel: tapdisk2[4341]: segfault at 0 ip 000000000040a24f sp 00007fff0d8acb70 error 6 in tapdisk2[400000+39000]
Jun 28 18:58:09 r02k05014 syslogd 1.4.1: restart.

Can anyone give me some tips? What can cause the assertion 'list_empty(&vreq->next)' to fail?

xm info:
release                : 2.6.31.13
version                : #3 SMP Fri Apr 30 15:10:24 CST 2010
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
cpu_mhz                : 2266
hw_caps                : bfebfbff:28100800:00000000:00001b40:009ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 24544
free_memory            : 19693
node_to_cpu            : node0:0,2,4,6,8,10,12,14
                         node1:1,3,5,7,9,11,13,15
node_to_memory         : node0:7633
                         node1:12059
node_to_dma32_mem      : node0:2996
                         node1:0
max_node_id            : 1
xen_major              : 4
xen_minor              : 0
xen_extra              : .0
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : console=com1,vga com1=115200,8n1 msi=1 dom0_mem=6144M dom0_max_vcpus=4 dom0_vcpus_pin iommu=off x2apic=off hap=0
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
cc_compile_by          : root
cc_compile_date        : Wed May 12 19:09:47 CST 2010
xend_config_format     : 4

tsk

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
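For readers unfamiliar with the check that failed, here is a minimal sketch of what list_empty() on a request's link field tests. It assumes tapdisk2's list helpers follow the Linux kernel's list_head semantics; struct request and request_is_unlinked() are hypothetical names for illustration, not tapdisk2's actual types. Under that assumption, the assertion fails whenever a request is still linked on some queue at a point where the code expects it to be unlinked. The sketch does not identify which code path caused that in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal Linux-style intrusive list; names mirror the kernel's list.h. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized, unlinked node points at itself. */
void INIT_LIST_HEAD(struct list_head *h)
{
    assert(h != NULL);
    h->next = h;
    h->prev = h;
}

/* True for an empty list head, and equally for a node linked nowhere. */
bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink and re-initialize, so list_empty(entry) holds again afterwards. */
void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/* Hypothetical request type; only the link field matters here. */
struct request {
    int id;
    struct list_head next;
};

/* The condition the failed assertion checks, in this toy model. */
bool request_is_unlinked(const struct request *req)
{
    return list_empty(&req->next);
}
```

A request that was queued with list_add_tail() and never removed with list_del_init() would make request_is_unlinked() return false, which is the shape of failure the log reports.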
tsk
2010-Jun-28 04:48 UTC
[Xen-devel] Re: blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
If a tapdisk2 process segfaults, Dom0 crashes. So the robustness of the tapdisk2 process and the blktap2 driver is very important! Is there a version that fixes this problem?

2010/6/28 tsk <aixt2006@gmail.com>
> Assertion 'list_empty(&vreq->next)' failed will lead to tapdisk2 process
> segfault, then Dom0 crashed!
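The kernel log line reported in this thread reads "segfault at 0 ... error 6". On x86, the value after "error" is the page-fault error code, and its low three bits can be decoded as sketched below. The helper names are made up for illustration. A code of 6 means a user-mode write to a page that is not present, which at address 0 is what a NULL-pointer write looks like. Whether tapdisk2's assertion macro deliberately performs such a write after logging is an assumption, not something the thread confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed after "error" in the
 * kernel's segfault message. */
#define PF_PROT  0x1UL /* set: protection violation; clear: page not present */
#define PF_WRITE 0x2UL /* set: write access; clear: read access */
#define PF_USER  0x4UL /* set: fault raised in user mode */

struct pf_info {
    bool not_present;
    bool write;
    bool user;
};

struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;
    info.not_present = !(code & PF_PROT);
    info.write = (code & PF_WRITE) != 0;
    info.user = (code & PF_USER) != 0;
    return info;
}

/* Formats a one-line description into buf and returns buf. */
const char *describe_fault(unsigned long addr, unsigned long code,
                           char *buf, size_t len)
{
    struct pf_info i = decode_pf_error(code);

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s-mode %s at 0x%lx, page %s",
             i.user ? "user" : "kernel",
             i.write ? "write" : "read",
             addr,
             i.not_present ? "not present" : "protection violation");
    return buf;
}
```

Calling describe_fault(0, 6, buf, sizeof buf) with the values from the log describes the fault as a user-mode write to a non-present page at address 0.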
Daniel Stodden
2010-Jun-28 04:49 UTC
[Xen-users] Re: [Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
On Mon, 2010-06-28 at 00:41 -0400, tsk wrote:
> Assertion 'list_empty(&vreq->next)' failed will lead to tapdisk2
> process segfault, then Dom0 crashed!
>
> Any one who can give me some tips?

Semi-yes. Not entirely sure about the list_empty userland crasher, but the kernel stuff was upstreamed with some minor issues exposed while unmapping pending I/O.

Which I didn't care yet to send patches for...

... Soon, real soon now...

Daniel

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
tsk
2010-Jun-28 04:58 UTC
Re: [Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
I found that the tapdisk2 source contains a number of assertions; if any one of them fails, the tapdisk2 process exits. They are very dangerous...

How can the kernel handle it if the tapdisk2 process exits?

2010/6/28 Daniel Stodden <daniel.stodden@citrix.com>
> Semi-yes. Not entirely sure about the list_empty userland crasher, but
> the kernel stuff was upstreamed with some minor exposed while unmapping
> pending I/O.
>
> Which I didn't care yet to send patches for...
>
> ... Soon, real soon now...
>
> Daniel
Daniel Stodden
2010-Jul-05 21:25 UTC
Re: [Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
On Thu, 2010-07-01 at 08:55 -0400, tsk wrote:
> Hi, will the patches be added to Xen-4.1?

About to prepare the next bunch of patches for xen.git.

Jeremy, is it okay to pull some of the changes made for 2.6.3x into the xen/dom0/backend/blktap2 topic branch before moving on?

I'm mainly talking about this one:

From ab77527f9a63a5c657c6d6a50e70a66adceaa3a0 Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Thu, 18 Feb 2010 17:22:09 -0800
Subject: [PATCH] xen/blktap2: make compile

and a couple related deltas which came later.

Except moving forward to newer blk queue interfaces, they don't make much of a functional difference.

Would simplify merging quite a bit.

Daniel

> 2010/6/28 Daniel Stodden <daniel.stodden@citrix.com>
> On Mon, 2010-06-28 at 00:58 -0400, tsk wrote:
> > It is found that some assertion was in the tapdisk2 source, if anyone
> > failed, the tapdisk2 process will exit,
> > They are very dangerous...
> > How can the kernel handle this if the tapdisk2 process exit?
>
> There would be multiple ways.
>
> One is to recover in-flight I/O and wait for a new tapdisk to start,
> then reissue. There's not a real usecase for this, but think watchdog.
> Maybe someday.
>
> The present strategy is to:
> - Fail pending I/O.
> - Restart the disk I/O queue, thereby flushing everything submitted
>   to the block device with EIO as well.
> - Release the block device after the last opener finally gave up.
>
> Works well. At least from a dom0 perspective ;)
>
> Daniel
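The teardown strategy Daniel describes (fail pending I/O, reject everything submitted afterwards with EIO, release the device once the last opener is gone) can be pictured with a small toy model. This is only a sketch: struct toy_device and the toy_* functions are hypothetical and do not correspond to the blktap2 kernel driver's actual data structures or entry points.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING 8

/* One in-flight request slot in the toy model. */
struct pending_io {
    int in_use;
    int result; /* 0 while in flight, -EIO once failed */
};

/* Hypothetical stand-in for a tap-backed block device. */
struct toy_device {
    struct pending_io ring[MAX_PENDING];
    int openers;  /* number of processes holding the device open */
    int dead;     /* set once the userspace backend has gone away */
    int released; /* set once the device has been torn down */
};

/* Submit a request: returns the slot index, -EIO if the backend is
 * dead, or -EBUSY if every slot is taken. */
int toy_submit(struct toy_device *dev)
{
    assert(dev != NULL);
    if (dev->dead)
        return -EIO;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!dev->ring[i].in_use) {
            dev->ring[i].in_use = 1;
            dev->ring[i].result = 0;
            return i;
        }
    }
    return -EBUSY;
}

/* Backend died: fail every pending request with -EIO and mark the
 * device dead so later submissions fail too. Returns the number of
 * requests that were failed. */
int toy_fail_pending(struct toy_device *dev)
{
    int failed = 0;

    assert(dev != NULL);
    dev->dead = 1;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (dev->ring[i].in_use) {
            dev->ring[i].result = -EIO;
            dev->ring[i].in_use = 0;
            failed++;
        }
    }
    return failed;
}

/* An opener closes the device; a dead device is released only after
 * the last opener has given up. */
void toy_close(struct toy_device *dev)
{
    assert(dev != NULL);
    if (dev->openers > 0)
        dev->openers--;
    if (dev->openers == 0 && dev->dead)
        dev->released = 1;
}
```

In this model, the guest-visible effect of a dead backend is a run of I/O errors instead of a hang, and the device stays around until nothing holds it open.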
Jeremy Fitzhardinge
2010-Jul-05 22:26 UTC
Re: [Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
On 07/05/2010 02:25 PM, Daniel Stodden wrote:
> Jeremy, is it okay to pull some of the changes made for 2.6.3x into the
> xen/dom0/backend/blktap2 topic branch before moving on?
>
> I'm mainly talking about this one:
>
> From ab77527f9a63a5c657c6d6a50e70a66adceaa3a0 Mon Sep 17 00:00:00 2001
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date: Thu, 18 Feb 2010 17:22:09 -0800
> Subject: [PATCH] xen/blktap2: make compile
>
> and a couple related deltas which came later.
>
> Except moving forward to newer blk queue interfaces, they don't make
> much of a functional difference.
>
> Would simplify merging quite a bit.

Which kernel version has these interface updates? Do they predate the version the blktap2 branch is rooted on, or do they come later? If they're later, then merge the mainline kernel (ideally a tagged release like v2.6.3[12]) into the blktap2 one and cherrypick the "make compile" changes on top of the merge (and if they're already there, then just cherrypick).

J
Daniel Stodden
2010-Jul-05 23:25 UTC
Re: [Xen-devel] blktap2 bug: Assertion 'list_empty(&vreq->next)' failed
On Mon, 2010-07-05 at 18:26 -0400, Jeremy Fitzhardinge wrote:
> Which kernel version has these interface updates? Do they predate the
> version the blktap2 branch is rooted on, or do the come later? If
> they're later, then merge the mainline kernel (ideally a tagged release
> like v2.6.3[12]) into the blktap2 one and cherrypick the "make compile"
> changes on top of the merge (and if they're already there, then just
> cherrypick).

Okay, the HG follower in me didn't quite follow yet the cherry-related part of that procedure. But altogether it sounds quite promising. Followup later.

Thanks! :)

Daniel