Christopher S. Aker
2010-Oct-11 21:44 UTC
[Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
In an effort to fix the problem described in my previous xen-devel post ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've come across another problem. 3ware 9690SA cards do not behave under Xen 4.1 (as of cs 22155).

We have a simple Xen thrash test suite which fires up domUs that do different workloads (some swap thrash, some kernel build, some spin CPUs, some cycle rebooting, etc). Almost immediately after launching the suite we can get the 3ware 9690SA card to fail with something like the following:

sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device

Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:

http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt

Results matrix:

+---------------------------------------------------------------+
| Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
+---------------------------------------------------------------+
| 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
| 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
| 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
| 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
| 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
+---------------------------------------------------------------+

The failures were verified on at least 2 machines of identical specification.

The same dom0 kernels that produce a stable 9690SA under Xen 3.4 bomb under Xen 4.1.

-Chris
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
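For anyone trying to reproduce this, a captured dom0 console log can be scanned for the failure signature quoted in the report. The following is only a rough sketch and not part of the original test suite: the function name `summarize` and the `SAMPLE` text are assumptions made for illustration, and the patterns match the exact message wording quoted in the report. The opcode names are the standard SCSI ones (0x28 is READ(10), 0x00 is TEST UNIT READY).

```python
import re

# Standard SCSI opcode names; only the two opcodes seen in the report are listed.
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)"}

# Patterns follow the message wording quoted in the report. `search` is used
# so that a leading timestamp or log-level prefix does not prevent a match.
TIMEOUT_RE = re.compile(
    r"sd (?P<dev>\d+:\d+:\d+:\d+): WARNING: \(0x06:0x002C\): "
    r"Command \((?P<op>0x[0-9a-fA-F]+)\) timed out, resetting card\."
)
OFFLINE_RE = re.compile(
    r"sd (?P<dev>\d+:\d+:\d+:\d+): rejecting I/O to offline device"
)


def summarize(log_text):
    """Return timed-out commands and offlined devices found in log_text."""
    timeouts = []      # (device, opcode name) in order of appearance
    offline = set()    # devices the SCSI layer has marked offline
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            opcode = int(m.group("op"), 16)
            name = SCSI_OPCODES.get(opcode, hex(opcode))
            timeouts.append((m.group("dev"), name))
            continue
        m = OFFLINE_RE.search(line)
        if m:
            offline.add(m.group("dev"))
    return {"timeouts": timeouts, "offline": sorted(offline)}


# Sample input: the four lines quoted in the report.
SAMPLE = """\
sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Decoding the opcodes makes the sequence easier to read: a READ(10) times out, then a TEST UNIT READY times out as well, after which the device is taken offline.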
gianfi
2010-Nov-21 16:55 UTC
[Xen-devel] Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
Hello,

I can confirm the same behaviour on Xen 4.0.1 with a 3ware 9690SA card, triggered by heavy I/O load. Does anybody know a possible workaround for the issue?

Thank you very much.
Konrad Rzeszutek Wilk
2010-Nov-22 16:37 UTC
Re: [Xen-devel] Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
On Sun, Nov 21, 2010 at 08:55:18AM -0800, gianfi wrote:
> Hello,
> i can confirm the same behaviour on xen 4.0.1, with a 3ware 9690SA card,

Uhh, can you refer to the thread that explains "same behaviour"?

> triggered by heavy I/O load. Does anybody know a possible workaround for the
> issue?

It might be related to "pci-passthrough in pvops causing offline raid". Look in that e-mail thread, where I posted a list of things I would like you to do so that we can get to the bottom of this.
Christopher S. Aker
2011-Sep-27 18:13 UTC
Re: [Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
On 10/11/10 5:44 PM, Christopher S. Aker wrote:
> [...]
> The same dom0 kernels that produce a stable 9690SA under Xen 3.4, bomb
> under Xen 4.1.

I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.

Konrad, in the "offline raid" thread you asked for the following debug information:

http://www.theshore.net/~caker/xen/BUGS/offline-raid/

The sysrq-t.txt and triple-a-star.txt outputs were captured after I got the raid card to hang up (but before it timed out and started spewing to the console).

Oddly, lspci shows three devices assigned IRQ 16, however /proc/interrupts only lists two of them. Side effect of MSI?

Also, the problem still happens even with MSI disabled (pci=nomsi).

Thanks,
-Chris
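The lspci versus /proc/interrupts comparison mentioned above can be done mechanically. Below is a minimal sketch, assuming the older /proc/interrupts layout in which the per-CPU counters are followed by a single chip/type token and then a comma-separated list of handler names. The function name `handlers_by_irq` and the `SAMPLE` text are invented for illustration; the sample is not the output from the affected machine. One thing to keep in mind when reading the result: /proc/interrupts only lists handlers that a driver has actually registered, so a device that lspci reports on IRQ 16 but that has no driver bound, or whose driver switched to MSI and was given a different IRQ number, would not appear on that line.

```python
def handlers_by_irq(proc_interrupts_text):
    """Map each numeric IRQ to the handler names registered on it."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    # The header row names the CPUs (CPU0 CPU1 ...), one counter column each.
    ncpus = len(lines[0].split())
    result = {}
    for line in lines[1:]:
        # Split at the first colon only; handler names may contain colons.
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip non-numeric rows such as NMI, LOC, ERR
        tail = rest.split()[ncpus:]  # drop the per-CPU counters
        # Assumption: tail[0] is the chip/type token, the rest are names.
        names = " ".join(tail[1:])
        result[int(label)] = [n.strip() for n in names.split(",") if n.strip()]
    return result


# Hypothetical sample in the assumed layout, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  1:          8          0   IO-APIC-edge      i8042
 16:      48211          0   IO-APIC-fasteoi   uhci_hcd:usb3, 3w-9xxx
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq, names in sorted(handlers_by_irq(SAMPLE).items()):
        print(irq, names)
```

Comparing the list for IRQ 16 against the devices lspci assigns to IRQ 16 shows which device has no handler registered on the shared line.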
Andrew Cooper
2011-Sep-27 18:22 UTC
Re: [Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
On 27/09/2011 19:13, Christopher S. Aker wrote:
> I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.
>
> [...]
>
> Oddly, lspci shows three devices assigned IRQ 16, however
> /proc/interrupts only lists two of them. Side effect of MSI?
>
> Also, the problem still happens even with MSI disabled (pci=nomsi).
>
> Thanks,
> -Chris

This is almost certainly the bug to do with not ack'ing a migrating line level interrupt, which I fixed in c/s 23145:1092a143ef9d. Try applying that patch, or just running from the tip of http://xenbits.xen.org/hg/xen-4.1-testing.hg/

~Andrew
Christopher S. Aker
2011-Sep-27 19:33 UTC
Re: [Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
On 9/27/11 2:22 PM, Andrew Cooper wrote:
> This is almost certainly the bug to do with not ack'ing a migrating line
> level interrupt which I fixed in c/s 23145:1092a143ef9d. Try applying
> that patch, or just running from the tip of
> http://xenbits.xen.org/hg/xen-4.1-testing.hg/

That was it! You're a champion.

Thanks,
-Chris