I'm now seeing this network hang a lot, to the point where it makes my iscsi testing unusable. I believe this has more to do with the sort of testing I'm doing now than with a bug that has suddenly appeared.

My setup is this:

Dom0:
2.6.8.1
Iscsitarget 0.3.3 + 2.6 patches + my own 2.6 patches
No conntrack or other netfilter related modules
Bridged eth0 to Dom1
/usr/src exported via nfs

Dom1:
2.6.8.1
Linux-iscsi 4.0.1.8
No conntrack or other netfilter related modules
/usr/src mounted from Dom0

Iscsi works for a while, normally crashing in Dom0 due to another non-xen related bug before it hits this bug, but if I try to do a compile on Dom1 in the nfs-mounted /usr/src, the network locks up almost instantly, then clears up shortly after if I kill the compile.

The logs show absolutely nothing of any use.

I've just tried a few netperf tests. A quick hammering goes off without a hitch, but afterwards I see random dropped packets. I'll keep testing.

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
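For reference, the Dom0 side of the setup described above ("/usr/src exported via nfs", mounted from Dom1) could be sketched as an /etc/exports entry. The subnet and export options here are assumptions for illustration, not taken from the original mail:

```shell
# /etc/exports on Dom0: export /usr/src to the bridged guest network
# (the 10.11.7.0/24 subnet and rw,sync,no_root_squash are assumed values)
/usr/src  10.11.7.0/24(rw,sync,no_root_squash)

# then re-export:
#   exportfs -ra
# and on Dom1:
#   mount -t nfs <dom0-ip>:/usr/src /usr/src
```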
Curiously, a normal ping from dom0 to dom1 works fine, but a ping from dom1 to dom0 drops between 6% and 9% of packets. Netperf always appears to work flawlessly. I'm beginning to get baffled!

James

> -----Original Message-----
> From: xen-devel-admin@lists.sourceforge.net On Behalf Of James Harper
> Sent: Tuesday, 14 September 2004 22:38
> To: xen-devel@lists.sourceforge.net
> Subject: [Xen-devel] network hang again
>
> I'm now seeing this network hang a lot, to the point where it makes my
> iscsi testing unusable.
> [snip]
I have been running IET 0.3.3 on 2.4.27 on one machine, and Cisco's linux-iscsi on 2.6.8.1 on a second physical machine for a couple of days now. So far the only thing that I have run into is a dump message concerning OOM on the linux-iscsi machine.

Sep 13 00:20:11 vhost1 kernel: iSCSI: 4.0.1 ( 9-Feb-2004) built for Linux 2.6.8-tbc-vhost-Xen0
Sep 13 00:20:11 vhost1 kernel: iSCSI: will translate deferred sense to current sense on disk command responses
Sep 13 00:20:11 vhost1 kernel: iSCSI: control device major number 254
Sep 13 00:20:11 vhost1 kernel: scsi_proc_hostdir_add: proc_mkdir failed for <NULL>
Sep 13 00:20:11 vhost1 kernel: scsi17 : Cisco iSCSI driver
Sep 13 00:20:11 vhost1 kernel: iSCSI: detected HBA host #17
Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 iqn.2001-04.dmz.iscsi1:wnhttp
Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 portal 0 = address 10.11.7.1 port 3260 group 1
Sep 13 00:20:11 vhost1 kernel: iSCSI: starting timer thread at 21835751
Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 trying to establish session to portal 0, address 10.11.7.1 port 3260 group 1
Sep 13 00:20:12 vhost1 kernel: iSCSI: session c1478000 authenticated by target iqn.2001-04.dmz.iscsi1:wnhttp
Sep 13 00:20:12 vhost1 kernel: iSCSI: bus 0 target 0 established session #1, portal 0, address 10.11.7.1 port 3260 group 1
Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX  Model: ISCSI  Rev: 0
Sep 13 00:20:12 vhost1 kernel: Type: Direct-Access  ANSI SCSI revision: 03
Sep 13 00:20:12 vhost1 kernel: SCSI device sda: 16777212 512-byte hdwr sectors (8590 MB)
Sep 13 00:20:12 vhost1 kernel: SCSI device sda: drive cache: write back
Sep 13 00:20:12 vhost1 kernel: sda: unknown partition table
Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sda at scsi17, channel 0, id 0, lun 0
Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX  Model: ISCSI  Rev: 0
Sep 13 00:20:12 vhost1 kernel: Type: Direct-Access  ANSI SCSI revision: 03
Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: 65536 512-byte hdwr sectors (34 MB)
Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: drive cache: write back
Sep 13 00:20:12 vhost1 kernel: sdb: unknown partition table
Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sdb at scsi17, channel 0, id 0, lun 1
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: found reiserfs format "3.6" with standard journal
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: using ordered data mode
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: journal params: device sda, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: checking transaction log (sda)
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: replayed 1 transactions in 0 seconds
Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: Using r5 hash to sort names
Sep 13 00:28:51 vhost1 kernel: iscsi-tx: page allocation failure. order:1, mode:0x20
Sep 13 00:28:51 vhost1 kernel: [__alloc_pages+728/848] __alloc_pages+0x2d8/0x350
Sep 13 00:28:51 vhost1 kernel: [__get_free_pages+31/64] __get_free_pages+0x1f/0x40
Sep 13 00:28:51 vhost1 kernel: [kmem_getpages+30/224] kmem_getpages+0x1e/0xe0
Sep 13 00:28:51 vhost1 kernel: [cache_grow+159/336] cache_grow+0x9f/0x150
Sep 13 00:28:51 vhost1 kernel: [cache_alloc_refill+318/512] cache_alloc_refill+0x13e/0x200
Sep 13 00:28:51 vhost1 kernel: [__kmalloc+139/160] __kmalloc+0x8b/0xa0
Sep 13 00:28:51 vhost1 kernel: [alloc_skb+71/224] alloc_skb+0x47/0xe0
Sep 13 00:28:51 vhost1 kernel: [pg0+38296326/1002676224] rhine_rx+0x156/0x460 [via_rhine]
Sep 13 00:28:51 vhost1 kernel: [pg0+38295340/1002676224] rhine_interrupt+0x1ac/0x1d0 [via_rhine]
Sep 13 00:28:51 vhost1 kernel: [handle_IRQ_event+73/144] handle_IRQ_event+0x49/0x90
Sep 13 00:28:51 vhost1 kernel: [do_IRQ+109/240] do_IRQ+0x6d/0xf0
Sep 13 00:28:51 vhost1 kernel: [evtchn_do_upcall+156/256] evtchn_do_upcall+0x9c/0x100
Sep 13 00:28:51 vhost1 kernel: [hypervisor_callback+51/73] hypervisor_callback+0x33/0x49
Sep 13 00:28:51 vhost1 kernel: [csum_partial_copy_generic+63/248] csum_partial_copy_generic+0x3f/0xf8
Sep 13 00:28:51 vhost1 kernel: [tcp_sendmsg+578/4176] tcp_sendmsg+0x242/0x1050
Sep 13 00:28:51 vhost1 kernel: [inet_sendmsg+77/96] inet_sendmsg+0x4d/0x60
Sep 13 00:28:51 vhost1 kernel: [sock_sendmsg+165/192] sock_sendmsg+0xa5/0xc0
Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] __do_softirq+0x95/0xa0
Sep 13 00:28:51 vhost1 kernel: [do_softirq+69/80] do_softirq+0x45/0x50
Sep 13 00:28:51 vhost1 kernel: [do_IRQ+194/240] do_IRQ+0xc2/0xf0
Sep 13 00:28:51 vhost1 kernel: [pg0+39270168/1002676224] iscsi_xmit_queued_cmnds+0x188/0x3c0 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [pg0+39254271/1002676224] iscsi_sendmsg+0x4f/0x70 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [pg0+39271874/1002676224] iscsi_xmit_data+0x472/0x8d0 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] __do_softirq+0x95/0xa0
Sep 13 00:28:51 vhost1 kernel: [pg0+39273273/1002676224] iscsi_xmit_r2t_data+0x119/0x1f0 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [pg0+39165617/1002676224] iscsi_tx_thread+0x711/0x8d0 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 13 00:28:51 vhost1 kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
Sep 13 00:28:51 vhost1 kernel: [pg0+39163808/1002676224] iscsi_tx_thread+0x0/0x8d0 [iscsi]
Sep 13 00:28:51 vhost1 kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10

The only reason I'm posting the "trace" from linux-iscsi is because it contains the hypervisor_callback function and it's in the rx phase of the via_rhine driver.

What iscsi are you running on each machine? (Sorry if I missed it, been offline for a few days now. 8-( ) I'd be interested to know if this is in any way similar to your issue.

Brian

On Tue, 2004-09-14 at 07:38, James Harper wrote:
> I'm now seeing this network hang a lot, to the point where it makes my
> iscsi testing unusable.
> [snip]
When I explained about the patch on the iet list, I was asked if I was getting frequent disconnections :)

It sounds like the network issues I'm seeing in xen are probably triggering the crash in iscsi.

I'm running iet 0.3.3 + 2.6 patch + my additional 2.6 patch on dom0, and linux-iscsi 4.0.1.8 on dom1.

James

> -----Original Message-----
> From: Brian Wolfe [mailto:ahzz@comcast.net]
> Sent: Wednesday, 15 September 2004 02:22
> To: James Harper
> Cc: xen-devel@lists.sourceforge.net
> Subject: Re: [Xen-devel] network hang again
>
> I have been running IET 0.3.3 on 2.4.27 on one machine, and cisco's
> linux-iscsi on 2.6.8.1 on a second physical machine for a couple days
> now.
> [snip]
I tracked the glitch back to the 2.4.27 domain-1 (unpriv, uses evms blocks from dom0 to serve out as iscsi targets via file-io) with this error message being the trigger point of the colapse. Sep 15 00:16:55 localhost kernel: fileio_make_request(85) Bad things happened 40 96, -5 from kernel/file-io.c:lines 76 to 85 seems to be the error point. if (rw == READ) ret = generic_file_read(filp, buf, count, &ppos); else ret = generic_file_write(filp, buf, count, &ppos); if (ret != count) printk("%s(%d) Bad things happened %lld, %d\n", __FUNCTION__, __LINE__, count, ret); -5 is -EIO in linux-2.4.27/include/asm-i386/errno.h:8 #define EIO 5 /* I/O error */ I do NOT get any errors from domain0, so I can''t trace through to dom0 right now. 8-( This error coincides perfectly time wise with the linux-iscsi initiator errors I got earlier this week, so I believe that this is what''s triggering the iscsi-initiator error. Any advice on how to figure out what is causing the I/O error would be greatly appreciated. Right now it is the ONLY thing that is holding me back from using the IET iSCSI target. Thanks! Brian Wolfe On Tue, 2004-09-14 at 21:50, James Harper wrote:> When I explained about the patch on the iet list, I was asked if I was > getting frequent disconnections :) > > It sounds like the network issues I''m seeing in xen are probably > triggering the crash in iscsi. > > I''m running iet 0.3.3 + 2.6 patch + my additional 2.6 patch on dom0, and > linux-iscsi 4.0.1.8 on dom1. > > James > > > -----Original Message----- > > From: Brian Wolfe [mailto:ahzz@comcast.net] > > Sent: Wednesday, 15 September 2004 02:22 > > To: James Harper > > Cc: xen-devel@lists.sourceforge.net > > Subject: Re: [Xen-devel] network hang again > > > > I have been running IET 0.3.3 on 2.4.27 on one machine, and cisco''s > > linux-iscsi on 2.6.8.1 on a second physical machine for a couple days > > now. 
So far the only thing that I have run into is a dump message > > concerning OOM on the linux-iscsi machine. > > > > > > Sep 13 00:20:11 vhost1 kernel: iSCSI: 4.0.1 ( 9-Feb-2004) built for > > Linux 2.6.8-tbc-vhost-Xen0 > > Sep 13 00:20:11 vhost1 kernel: iSCSI: will translate deferred sense to > > current sense on disk command responses > > Sep 13 00:20:11 vhost1 kernel: iSCSI: control device major number 254 > > Sep 13 00:20:11 vhost1 kernel: scsi_proc_hostdir_add: proc_mkdir > failed > > for <NULL> > > Sep 13 00:20:11 vhost1 kernel: scsi17 : Cisco iSCSI driver > > Sep 13 00:20:11 vhost1 kernel: iSCSI:detected HBA host #17 > > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 > > iqn.2001-04.dmz.iscsi1:wnhttp > > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 portal 0 > address > > 10.11.7.1 port 3260 group 1 > > Sep 13 00:20:11 vhost1 kernel: iSCSI: starting timer thread at > 21835751 > > Sep 13 00:20:11 vhost1 kernel: iSCSI: bus 0 target 0 trying to > establish > > session to portal 0, address 10.11.7.1 port 32 > > 60 group 1 > > Sep 13 00:20:12 vhost1 kernel: iSCSI: session c1478000 authenticated > by > > target iqn.2001-04.dmz.iscsi1:wnhttp > > Sep 13 00:20:12 vhost1 kernel: iSCSI: bus 0 target 0 established > session > > #1, portal 0, address 10.11.7.1 port 3260 grou > > p 1 > > Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX Model: > > ISCSI Rev: 0 > > Sep 13 00:20:12 vhost1 kernel: Type: > > Direct-Access ANSI SCSI revision: 03 > > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: 16777212 512-byte hdwr > > sectors (8590 MB) > > Sep 13 00:20:12 vhost1 kernel: SCSI device sda: drive cache: write > back > > Sep 13 00:20:12 vhost1 kernel: sda: unknown partition table > > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sda at scsi17, > channel > > 0, id 0, lun 0 > > Sep 13 00:20:12 vhost1 kernel: Vendor: LINUX Model: > > ISCSI Rev: 0 > > Sep 13 00:20:12 vhost1 kernel: Type: > > Direct-Access ANSI SCSI revision: 03 > > Sep 13 00:20:12 vhost1 kernel: SCSI 
device sdb: 65536 512-byte hdwr > > sectors (34 MB) > > Sep 13 00:20:12 vhost1 kernel: SCSI device sdb: drive cache: write > back > > Sep 13 00:20:12 vhost1 kernel: sdb: unknown partition table > > Sep 13 00:20:12 vhost1 kernel: Attached scsi disk sdb at scsi17, > channel > > 0, id 0, lun 1 > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: found reiserfs format > > "3.6" with standard journal > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: using ordered data mode > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: journal params: device > > sda, size 8192, journal first block 18, max trans > > len 1024, max batch 900, max commit age 30, max trans age 30 > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: checking transaction log > > (sda) > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: replayed 1 transactions > in > > 0 seconds > > Sep 13 00:21:55 vhost1 kernel: ReiserFS: sda: Using r5 hash to sort > > names > > Sep 13 00:28:51 vhost1 kernel: iscsi-tx: page allocation failure. > > order:1, mode:0x20 > > Sep 13 00:28:51 vhost1 kernel: [__alloc_pages+728/848] > > __alloc_pages+0x2d8/0x350 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [__get_free_pages+31/64] > > __get_free_pages+0x1f/0x40 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [kmem_getpages+30/224] > > kmem_getpages+0x1e/0xe0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [cache_grow+159/336] > > cache_grow+0x9f/0x150 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [cache_alloc_refill+318/512] > > cache_alloc_refill+0x13e/0x200 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [__kmalloc+139/160] > __kmalloc+0x8b/0xa0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [alloc_skb+71/224] alloc_skb+0x47/0xe0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+38296326/1002676224] > > rhine_rx+0x156/0x460 [via_rhine] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 
13 00:28:51 vhost1 kernel: [pg0+38295340/1002676224] > > rhine_interrupt+0x1ac/0x1d0 [via_rhine] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [handle_IRQ_event+73/144] > > handle_IRQ_event+0x49/0x90 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [do_IRQ+109/240] do_IRQ+0x6d/0xf0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [evtchn_do_upcall+156/256] > > evtchn_do_upcall+0x9c/0x100 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [hypervisor_callback+51/73] > > hypervisor_callback+0x33/0x49 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [csum_partial_copy_generic+63/248] > > csum_partial_copy_generic+0x3f/0xf8 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [tcp_sendmsg+578/4176] > > tcp_sendmsg+0x242/0x1050 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [inet_sendmsg+77/96] > > inet_sendmsg+0x4d/0x60 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [sock_sendmsg+165/192] > > sock_sendmsg+0xa5/0xc0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] > > __do_softirq+0x95/0xa0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [do_softirq+69/80] > do_softirq+0x45/0x50 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [do_IRQ+194/240] do_IRQ+0xc2/0xf0 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+39270168/1002676224] > > iscsi_xmit_queued_cmnds+0x188/0x3c0 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+39254271/1002676224] > > iscsi_sendmsg+0x4f/0x70 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+39271874/1002676224] > > iscsi_xmit_data+0x472/0x8d0 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [__do_softirq+149/160] > > __do_softirq+0x95/0xa0 > > Sep 13 00:28:51 vhost1 kernel: > 
> Sep 13 00:28:51 vhost1 kernel: [pg0+39273273/1002676224] > > iscsi_xmit_r2t_data+0x119/0x1f0 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+39165617/1002676224] > > iscsi_tx_thread+0x711/0x8d0 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] > > autoremove_wake_function+0x0/0x60 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [autoremove_wake_function+0/96] > > autoremove_wake_function+0x0/0x60 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [default_wake_function+0/32] > > default_wake_function+0x0/0x20 > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [pg0+39163808/1002676224] > > iscsi_tx_thread+0x0/0x8d0 [iscsi] > > Sep 13 00:28:51 vhost1 kernel: > > Sep 13 00:28:51 vhost1 kernel: [kernel_thread_helper+5/16] > > kernel_thread_helper+0x5/0x10 > > Sep 13 00:28:51 vhost1 kernel: > > > > The only reason I''m posting the "trace" from linux-iscsi is because it > > contains the hypervisor_callback function in it and it''s in the rx > phase > > of the via_rhine driver. > > > > What iscsi are you running on each machine? (Sorry if I missed it, > been > > offline for a few deays now. 8-( ) I''d be interested to know if this > is > > in any way similar to your issue. > > > > Brian > > > > > > On Tue, 2004-09-14 at 07:38, James Harper wrote: > > > I''m now seeing this network hang a lot, to the point where it makes > my > > > iscsi testing unusable. I believe this is more to do with the sort > of > > > testing I''m doing now more so than a bug that has suddenly appeared. > > > > > > My setup is this: > > > Dom0: > > > 2.6.8.1 > > > Iscsitarget 0.3.3 + 2.6 patches + my own 2.6 patches. 
> > > No conntrack or other netfilter related modules
> > > Bridged eth0 to Dom1
> > > /usr/src exported via nfs
> > >
> > > Dom1:
> > > 2.6.8.1
> > > Linux-iscsi 4.0.1.8
> > > No conntrack or other netfilter related modules
> > > /usr/src mounted from Dom0
> > >
> > > Iscsi works for a while, normally crashing in Dom0 due to another
> > > non-xen related bug before it hits this bug, but if I try to do a
> > > compile on Dom1 in the nfs mounted /usr/src, the network locks up
> > > almost instantly, but then clears up shortly after if I kill the
> > > compile.
> > >
> > > The logs show absolutely nothing of any use.
> > >
> > > I've just tried a few netperf tests. A quick hammering goes off
> > > without a hitch, but afterwards I see random dropped packets. I'll
> > > keep testing.
> > >
> > > James
> I do NOT get any errors from domain0, so I can't trace through to
> dom0 right now. 8-(
>
> This error coincides perfectly time-wise with the linux-iscsi
> initiator errors I got earlier this week, so I believe that this is
> what's triggering the iscsi-initiator error.
>
> Any advice on how to figure out what is causing the I/O error would
> be greatly appreciated. Right now it is the ONLY thing that is
> holding me back from using the IET iSCSI target.

You'll have to add tracing within generic_file_{read,write}. Those are
the interface into the buffer cache (pretty much), and below that you
enter the generic blkdev layer. Unfortunately the layers are pretty
thick, but hopefully you can drill back down the error paths fairly
quickly.

Perhaps this might be related to the networking weirdnesses that
people are seeing? I can't explain the weird DOM0<->DOM1 ping
behaviour.

 -- Keir
Hmm, I can give it a try, but I've never delved this deep into the
kernel block layers before.

On Wed, 2004-09-15 at 13:04, Keir Fraser wrote:
> > I do NOT get any errors from domain0, so I can't trace through to
> > dom0 right now. 8-(
> >
> > This error coincides perfectly time-wise with the linux-iscsi
> > initiator errors I got earlier this week, so I believe that this is
> > what's triggering the iscsi-initiator error.
> >
> > Any advice on how to figure out what is causing the I/O error would
> > be greatly appreciated. Right now it is the ONLY thing that is
> > holding me back from using the IET iSCSI target.
>
> You'll have to add tracing within generic_file_{read,write}. Those
> are the interface into the buffer cache (pretty much), and below that
> you enter the generic blkdev layer. Unfortunately the layers are
> pretty thick, but hopefully you can drill back down the error paths
> fairly quickly.
>
> Perhaps this might be related to the networking weirdnesses that
> people are seeing? I can't explain the weird DOM0<->DOM1 ping
> behaviour.
>
> -- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel