Dale Bewley
2007-Nov-15 18:55 UTC
[Fedora-xen] dom0 F7 crashes tcp_tso_segment oops - new kernel lag time
Running 2.6.20-2936.fc7xen x86_64 quad CPU, 16G.

We have a dom0 with bridges riding on top of a VLAN interface.

# brctl show
bridge name     bridge id               STP enabled     interfaces
br101           8000.00093d139ae9       no              eth0
br6             8000.00093d139ae9       no              vif2.0
                                                        vif1.0
                                                        eth0.6
...

Inside a F7 domU on br6 we are running tc (via shorewall) to limit the bandwidth of a mirror server. On Monday I throttled the bandwidth down far below the demand. Since then we are starting to see dom0 crash and reboot with nothing in the log.

Crashes happened on Tues around 9am and Thursday (today) around 4am and 9am. I caught the console during the most recent crash:

Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
 [<ffffffff803fc79c>] tcp_tso_segment+0x1d8/0x285
PGD 3db8f1067 PUD 3db8f2067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/xen-backend/vbd-2-51712/statistics/wr_sect
CPU 1
Modules linked in: loop netbk xenblktap blkbk autofs4 8021q bridge nf_conntrack_netbios_ns ipt_LdPid: 0, comm: swapper Not tainted 2.6.20-2936.fc7xen #1
RIP: e030:[<ffffffff803fc79c>] [<ffffffff803fc79c>] tcp_tso_segment+0x1d8/0x285
RSP: e02b:ffff880002f77950 EFLAGS: 00010216
RAX: 0000000000007a21 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000010130000 RSI: 0000000000000000 RDI: 0000000087e8b626
RBP: 000000000000fa87 R08: 0000000187e8b625 R09: 0000000000000000
R10: 000000009b9a7b25 R11: 0000000000000003 R12: ffff88034ecab034
R13: 0000000017acfe00 R14: 0000000000000020 R15: 00000000ffff0000
FS: 00002aaaab0ff230(0000) GS:ffffffff80580080(0000) knlGS:0000000000000000

We are up to date with patches. After the first 2 crashes and before the 3rd I upgraded from xen-3.1.0-6.fc7 to xen-3.1.0-8.fc7 and rebooted dom0 for good measure.

I did some googling and found this:
http://lists.openwall.net/netdev/2007/02/09/16
which seems pretty similar.

Assuming this is the problem and there is a fix I'm left to ask what is the ETA for a new xen kernel and what is the typical lag time?

Current state is 2.6.23.1-21.fc7 vs. 2.6.20-2936.fc7xen

--
Dale Bewley - Unix Administrator - Shields Library - UC Davis
GPG: 0xB098A0F3 0D5A 9AEB 43F4 F84C 7EFD 1753 064D 2583 B098 A0F3
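
When comparing a console capture like the one above against other reports, it helps to pull out just the faulting symbol and offset. The following is a minimal sketch, not something posted in the thread: the function name oops_rip is hypothetical, and it only handles the "[<addr>] symbol+0xOFF/0xSIZE" layout seen in this oops.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read an oops log on stdin and
# print "symbol offset size" for the first line that carries a
# "[<addr>] symbol+0xOFF/0xSIZE" entry.
oops_rip() {
    sed -n 's#.*\[<[0-9a-f]*>] *\([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/\(0x[0-9a-f]*\).*#\1 \2 \3#p' |
        head -n 1
}

# Example input: the RIP line copied from the console capture above.
printf '%s\n' ' [<ffffffff803fc79c>] tcp_tso_segment+0x1d8/0x285' | oops_rip
# prints: tcp_tso_segment 0x1d8 0x285
```

With the capture saved to a file (the file name is up to you), `oops_rip < oops.txt` gives a short string that is easy to search for in list archives or a bug tracker.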
Richard W.M. Jones
2007-Nov-19 11:43 UTC
Re: [Fedora-xen] dom0 F7 crashes tcp_tso_segment oops - new kernel lag time
Dale Bewley wrote:
> Running 2.6.20-2936.fc7xen x86_64 quad CPU, 16G.
>
> We have a dom0 with bridges riding on top of a VLAN interface.
>
> # brctl show
> bridge name     bridge id               STP enabled     interfaces
> br101           8000.00093d139ae9       no              eth0
> br6             8000.00093d139ae9       no              vif2.0
>                                                         vif1.0
>                                                         eth0.6
> ...
>
> Inside a F7 domU on br6 we are running tc (via shorewall) to limit the bandwidth of a mirror server. On Monday I throttled the bandwidth down far below the demand. Since then we are starting to see dom0 crash and reboot with nothing in the log.
>
> Crashes happened on Tues around 9am and Thursday (today) around 4am and 9am. I caught the console during the most recent crash:
>
> Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP:
>  [<ffffffff803fc79c>] tcp_tso_segment+0x1d8/0x285
> PGD 3db8f1067 PUD 3db8f2067 PMD 0
> Oops: 0000 [1] SMP
> last sysfs file: /devices/xen-backend/vbd-2-51712/statistics/wr_sect
> CPU 1
> Modules linked in: loop netbk xenblktap blkbk autofs4 8021q bridge nf_conntrack_netbios_ns ipt_LdPid: 0, comm: swapper Not tainted 2.6.20-2936.fc7xen #1
> RIP: e030:[<ffffffff803fc79c>] [<ffffffff803fc79c>] tcp_tso_segment+0x1d8/0x285
> RSP: e02b:ffff880002f77950 EFLAGS: 00010216
> RAX: 0000000000007a21 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000010130000 RSI: 0000000000000000 RDI: 0000000087e8b626
> RBP: 000000000000fa87 R08: 0000000187e8b625 R09: 0000000000000000
> R10: 000000009b9a7b25 R11: 0000000000000003 R12: ffff88034ecab034
> R13: 0000000017acfe00 R14: 0000000000000020 R15: 00000000ffff0000
> FS: 00002aaaab0ff230(0000) GS:ffffffff80580080(0000) knlGS:0000000000000000
>
> We are up to date with patches. After the first 2 crashes and before the 3rd I upgraded from xen-3.1.0-6.fc7 to xen-3.1.0-8.fc7 and rebooted dom0 for good measure.
>
> I did some googling and found this:
> http://lists.openwall.net/netdev/2007/02/09/16
> which seems pretty similar.
>
> Assuming this is the problem and there is a fix I'm left to ask what is the ETA for a new xen kernel and what is the typical lag time?
>
> Current state is 2.6.23.1-21.fc7 vs. 2.6.20-2936.fc7xen

I wonder if it is worth your while trying the Xen stack & kernel from F8? Currently kernel-xen 2.6.21-2950 and xen 3.1.0-13.

Rich.

--
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903
Dale Bewley
2007-Nov-19 18:20 UTC
Re: [Fedora-xen] dom0 F7 crashes tcp_tso_segment oops - new kernel lag time
----- "Richard W.M. Jones" <rjones@redhat.com> wrote:
> Dale Bewley wrote:
> > Current state is 2.6.23.1-21.fc7 vs. 2.6.20-2936.fc7xen
>
> I wonder if it is worth your while trying the Xen stack & kernel from
> F8? Currently kernel-xen 2.6.21-2950 and xen 3.1.0-13.

If you think that is safe-ish, I can try it. However, I have not had a crash since I posted. Maybe thanks to a slight tapering off of F8 mirror traffic on this box.

What's the process of the above kernel and xen releases making it into F7-Updates? I assume the user population is a bit smaller in this realm, so focus is always on the dev release and updates are more ad hoc?

--
Dale Bewley - Unix Administrator - Shields Library - UC Davis
GPG: 0xB098A0F3 0D5A 9AEB 43F4 F84C 7EFD 1753 064D 2583 B098 A0F3
Dale Bewley
2007-Nov-27 22:20 UTC
Re: [Fedora-xen] dom0 F7 crashes tcp_tso_segment oops - new kernel lag time
----- "Dale Bewley" <dlbewley@lib.ucdavis.edu> wrote:
> ----- "Richard W.M. Jones" <rjones@redhat.com> wrote:
> > Dale Bewley wrote:
> > > Current state is 2.6.23.1-21.fc7 vs. 2.6.20-2936.fc7xen
> >
> > I wonder if it is worth your while trying the Xen stack & kernel from
> > F8? Currently kernel-xen 2.6.21-2950 and xen 3.1.0-13.
>
> If you think that is safe-ish, I can try it. However, I have not had a
> crash since I posted. Maybe thanks to a slight tapering off of F8
> mirror traffic on this box.
>
> What's the process of the above kernel and xen releases making it into
> F7-Updates? I assume the user population is a bit smaller in this
> realm, so focus is always on the dev release and updates are more ad
> hoc?

After a period of maybe 11 days of stable uptime we have had 4 crashes in the last 24 hours due to this tcp_tso_segment bug.

Since there aren't many domUs on here *yet*, I think we're just going to go ahead and upgrade to F8 now. We'll gain the 2.6.21 kernel and hopefully somewhat more developer attention in general.

I'm not sure why the 2.6.21 kernel-xen isn't backported from f8 to f7. Doesn't the most recent release get pretty much all the same updates?

--
Dale Bewley - Unix Administrator - Shields Library - UC Davis
GPG: 0xB098A0F3 0D5A 9AEB 43F4 F84C 7EFD 1753 064D 2583 B098 A0F3
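
Nobody in the thread proposes a stopgap short of upgrading. Since the oops is in tcp_tso_segment, one thing that could be tried in the meantime is switching TCP segmentation offload off with ethtool on the interfaces involved. This is an assumption, not a confirmed fix: whether it avoids this particular crash is untested, and whether each device accepts the setting depends on its driver. The sketch below only prints the commands so they can be reviewed first; the function name tso_off_cmds is hypothetical and the interface names are taken from the brctl output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): emit, but do not run, the
# ethtool commands that would switch TSO off for each named interface.
# Nothing on the host is changed by calling this function.
tso_off_cmds() {
    for dev in "$@"; do
        printf 'ethtool -K %s tso off\n' "$dev"
    done
}

# Interfaces named in the brctl output; adjust to the real topology.
tso_off_cmds eth0 vif1.0 vif2.0
```

`ethtool -k eth0` shows the current offload settings for a device. After reviewing the printed commands, they can be run as root, for example by piping the output to `sh`. The setting does not persist across reboots.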