I need help in trying to understand why the ethernet driver has locked up and how I can go about outputting debug messages. I see in /proc/interrupts that the interrupt count of eth0 just stops incrementing. I've tried different network cards (3Com, Realtek 8169, Realtek 8139) and this happens with all of them.

I'm not sure what is so unique about my system that might be causing this lockup; it's a regular nForce3-based system with 512MB RAM. This problem does not happen if I'm not using a Xen-enabled kernel. This is happening entirely in dom0 and there aren't any user domains, so it's not a bridging issue. I've also disabled the Xen backend drivers (netbk.ko and netloop.ko) so it's talking to the network chip directly.

Any help? Please?

Adnan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
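The stall described above (the eth0 count in /proc/interrupts no longer incrementing) can be checked for by comparing two snapshots of that file. The following is a minimal Python sketch and not something posted in this thread: the function names `parse_interrupts`, `irq_total` and `stalled` are made up for illustration, and it assumes the usual /proc/interrupts layout of a CPU header row followed by rows of the form `label: per-CPU counts, then a description`.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row looks like: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as NMI/ERR may carry fewer count columns
            counts.append(int(field))
        if not counts:
            continue
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def irq_total(table, device):
    """Sum the counts of every IRQ row whose description names `device`."""
    return sum(
        total
        for total, desc in table.values()
        if device in desc.replace(",", " ").split()
    )


def stalled(before, after, device="eth0"):
    """True if the device's interrupt count did not move between two snapshots."""
    return irq_total(parse_interrupts(before), device) == irq_total(
        parse_interrupts(after), device
    )
```

In use, one would read /proc/interrupts twice a few seconds apart while traffic is flowing (for example during a ping or a download) and pass both snapshots to `stalled`. An idle link also produces no new interrupts, so an unchanged count only means something when traffic is expected.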
> I need help in trying to understand why the ethernet driver has locked up
> and how I can go about outputting debug messages. I see in
> /proc/interrupts that the interrupt count of eth0 just stops incrementing.
> I've tried different (3Com, Realtek 8169, Realtek 8139) based network cards
> and this happens with all. I'm not sure what is so unique about my system
> that might be causing this to lockup, its a regular Nforce3 based system
> with 512MB ram. This problem does not happen if I'm not using a Xen enabled
> kernel. This is entirely happening in dom0 and there aren't any user
> domains so its not a bridging issue. I've also disabled the Xen backend
> drivers (netbk.ko and netloop.ko) so its talking to the network chip
> directly.

Which Xen version? Can you repro on unstable? Is there anything interesting in 'dmesg'?

Best,
Ian
Adnan Khaleel
2006-Jul-25 22:15 UTC
[Xen-devel] RE: Network stops responding after some time
Hello Ian,

Thanks for your response. I haven't had much luck with unstable. I cannot even boot into dom0; I get the following message:

"waiting for /dev/hda2 to appear .... exiting to /bin/sh"

(the OS is installed on /dev/hda2). It won't let me mount /dev/hda2 and I do not see /dev/hda2 either; however, when I exit I see the message "/dev/hda2 - Busy". Strange! I'm not sure why this is happening, as the boot messages that appear earlier indicate all IDE devices are properly identified. Any guesses?

Coming back to the original problem: as for dmesg, I had posted it in an earlier post on xen-devel, "RE: [Xen-devel] Network stops responding after some time", in which you had mentioned that the IRQ routing may be causing the problem. How can I enable debug messages for the eth IRQ? Are there any debug switches or other debug monitors I can enable to give me a better picture of what's happening?

I am in desperate need of assistance as I am at my wits' end :)

Sincerely,
Adnan
Keir Fraser
2006-Jul-26 10:07 UTC
Re: [Xen-devel] Network stops responding after some time
On 24 Jul 2006, at 19:49, Adnan Khaleel wrote:

> I need help in trying to understand why the ethernet driver has locked
> up and how I can go about outputting debug messages. I see in
> /proc/interrupts that the interrupt count of eth0 just stops
> incrementing. I've tried different (3Com, Realtek 8169, Realtek 8139)
> based network cards and this happens with all. I'm not sure what is so
> unique about my system that might be causing this to lockup, its a
> regular Nforce3 based system with 512MB ram. This problem does not
> happen if I'm not using a Xen enabled kernel. This is entirely
> happening in dom0 and there aren't any user domains so its not a
> bridging issue. I've also disabled the Xen backend drivers (netbk.ko
> and netloop.ko) so its talking to the network chip directly.
>
> Any help? Please?

Firstly, you need to repro on the kernel from our xen-unstable repository, which is based on kernel.org Linux 2.6.16. Then build the same kernel for native i386 and get boot output. Send the unified diff (diff -u) of the two boot outputs. You may need to tweak the configuration of the native build to get the output similar to that of the Xen-based kernel -- we can tell you how to do that when we see your initial diff.

Another thing worth trying is 'ioapic_ack=old' as a Xen boot parameter in your bootloader configuration file. It probably won't help, but worth a try.

 -- Keir
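The comparison requested above is a plain `diff -u` of the two boot outputs. Where the two captures differ only cosmetically (for instance, one saved with kernel log-level prefixes such as `<6>` and the other without), it may help to normalize them first. The following is a minimal Python sketch of that preprocessing using the standard `difflib` module; the helper names `normalize` and `boot_diff` and the file names `native.log` and `xen.log` are hypothetical, chosen only for illustration.

```python
import difflib
import re

# Kernel log-level prefix such as "<6>" at the start of a saved boot message.
_LEVEL_PREFIX = re.compile(r"^<\d>")


def normalize(lines):
    """Drop log-level prefixes and collapse whitespace so cosmetic noise stays out of the diff."""
    return [" ".join(_LEVEL_PREFIX.sub("", line).split()) for line in lines]


def boot_diff(native_lines, xen_lines, native_name="native.log", xen_name="xen.log"):
    """Return the unified-diff lines between two boot logs after normalization."""
    return list(
        difflib.unified_diff(
            normalize(native_lines),
            normalize(xen_lines),
            fromfile=native_name,
            tofile=xen_name,
            lineterm="",
        )
    )
```

Calling `boot_diff(open("native.log").readlines(), open("xen.log").readlines())` and printing the result line by line gives output in the same format as `diff -u`. Normalizing hides whitespace and prefix differences, so the raw logs are still worth keeping in case those details turn out to matter.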
Adnan Khaleel
2006-Jul-31 22:53 UTC
Re: [Xen-devel] Network stops responding after some time
Keir,

Thanks for your response. I'm attempting to do what you have asked and have run into a few problems, which I've outlined below. Please let me know how I can proceed.

Thanks
Adnan

Before going to the next step, using the same setup as in my original email (Xen included with SuSE), I noticed that xend does not launch successfully here. Here is the xend output from /var/xend.log:

    [2006-07-28 16:13:02 xend 3911] INFO (SrvDaemon:283) Xend Daemon started
    [2006-07-28 16:13:02 xend 3911] ERROR (SrvDaemon:297) Exception starting xend ((38, 'Function not implemented'))
    Traceback (most recent call last):
      File "//usr/lib64/python/xen/xend/server/SrvDaemon.py", line 286, in run
        xinfo = xc.xeninfo()
    Error: (38, 'Function not implemented')
    [2006-07-28 16:13:02 xend 3910] INFO (SrvDaemon:183) Xend exited with status 1.

Any idea why this might be?

As per your instructions, in order to reproduce the problem on the kernel.org-based Linux 2.6.16 that xen-unstable uses, I obtained the latest version of xen-unstable.hg and built it. I'm now faced with a different problem: I cannot get the kernel to boot. Instead I get the following messages:

    :
    :
    NFORCE3-150: IDE controller at PCI slot 0000:00:08.0
    NFORCE3-150: chipset revision 165
    NFORCE3-150: not 100% native mode: will probe irqs later
    NFORCE3-150: BIOS didn't set cable bits correctly. Enabling workaround.
    NFORCE3-150: 0000:00:08.0 (rev a5) UDMA133 controller
        ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
        ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
    input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
    hda: IC35L040AVER07-0, ATA DISK drive
    hdb: IBM-DTLA-307030, ATA DISK drive
    ide0 at 0x1f0 - 0x1f7, 0x3f6 on irq 14
    hdc: SAMSUNG CDRW/DVD SM-352B, ATAPI CD/DVD-ROM drive
    ide1 at 0x170 - 0x177, 0x376 on irq 15
    Loading processor
    Loading thermal
    Loading fan
    Loading resierfs
    Waiting for /dev/hda2 to appear .................... not found -- Exiting to /bin/sh

Actually this is the final screen, so that's all I can record; I do not know what messages were printed before it. Is there any way I can record or slow down the boot messages and have them saved even though the kernel never finishes booting? Since it never sees hda2, I guess I would have to save them to a floppy or something. I do not know what is different between my compiled version of xen-unstable and the version supplied with SuSE, but since I can't diff the boot messages from my xen-unstable, I'm kinda lost. As you can see, all the devices are seen correctly.

For comparison's sake, using the SuSE Xen, the similar portion of the boot messages looks quite a bit different. There is no "Loading processor" etc.

    <6>NFORCE3-150: IDE controller at PCI slot 0000:00:08.0
    <6>NFORCE3-150: chipset revision 165
    <6>NFORCE3-150: not 100% native mode: will probe irqs later
    <4>NFORCE3-150: BIOS didn't set cable bits correctly. Enabling workaround.
    <6>NFORCE3-150: 0000:00:08.0 (rev a5) UDMA133 controller
    <6>    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    <6>    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
    <7>Probing IDE interface ide0...
    <4>hda: IC35L040AVER07-0, ATA DISK drive
    <4>hdb: IBM-DTLA-307030, ATA DISK drive
    <4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    <6>hda: max request size: 128KiB
    <6>hda: 80418240 sectors (41174 MB) w/1916KiB Cache, CHS=65535/16/63, UDMA(100)
    <6>hda: cache flushes not supported
    <6> hda: hda1 hda2
    <6>hdb: max request size: 128KiB
    <6>hdb: 60036480 sectors (30738 MB) w/1916KiB Cache, CHS=59560/16/63, UDMA(100)
    <6>hdb: cache flushes not supported
    <6> hdb: hdb1
    <7>Probing IDE interface ide1...
    <4>hdc: SAMSUNG CDRW/DVD SM-352B, ATAPI CD/DVD-ROM drive
    <4>ide1 at 0x170-0x177,0x376 on irq 15
    <5>ReiserFS: hda2: found reiserfs format "3.6" with standard journal
    <5>ReiserFS: hda2: using ordered data mode
    <4>reiserfs: using flush barriers
    <5>ReiserFS: hda2: journal params: device hda2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
    <5>ReiserFS: hda2: checking transaction log (hda2)

So at this point I'm thinking that I must be doing something really wrong. I have another dual Opteron box based on the AMD chipset, so I built Xen from xen-unstable and tried to boot that. Granted, there are several things different between the two boxes, but it booted the xen-unstable kernel on the first attempt with no problems. I've been told that nForce chipsets can be problematic and am wondering if that is the case here.
I am seeing the same dom0 problem on a box that also has an integrated nForce3 Ethernet adapter. The box started out with SLES 10 but I removed the SUSE supplied Xen components and am using Xen-unstable change set 10982.

I am not starting xend, so bridging is not involved. To eliminate the LAN driver issue, I used an Intel E100 adapter.

It seems that networking goes along OK as long as traffic is light. To trigger the problem, I download a CD ISO image. Downloading an ISO causes the failure every time. The download progress varies from not even getting started to 426 MB out of the 676 MB. When the download progress stops, checking /proc/interrupts shows that interrupts for the LAN adapter have stopped.

As suggested, I tried the 'ioapic_ack=old' Xen boot parameter. It didn't help.

I don't believe it is a LAN driver problem because doing a 'rmmod e100' and an 'insmod e100.ko' does not get things going again.

Attached are various log files that will hopefully be of some use.

xendebug - contains the serial output as Xen is booting up. It also contains the dump_irqs from the 'i' and the print_IO_APIC_keyhandler from the 'z' after doing the <ctrl a> <ctrl a> <ctrl a> from the debug terminal. I added counters to mask_and_ack_level_ioapic_vector and end_level_ioapic_vector. The interesting thing here is that the e100 is using int 19, and mask_and_ack_level_ioapic_vector, end_level_ioapic_vector, and int 19 from /proc/interrupts all have the same value (317059) when interrupts stop. Also the irr is set to 1 when interrupts stop.

cpuinfo - contents of /proc/cpuinfo.

hwinfo - results of doing 'hwinfo'.

dmesg_native - dmesg from a native kernel boot.

dmesg - dmesg from booting the Xen kernel. After interrupts stop, there were no new messages.

messages - the /var/log/messages for the Xen-booted kernel. It also contains the 'rmmod e100' and the 'insmod e100' messages after interrupts stopped for the e100.

ifconfig - the results of doing ifconfig after interrupts stop, after 'rmmod e100', and after 'insmod e100', all catted together.

ints - the results of /proc/interrupts on a native boot, after a Xen kernel boot, and after interrupts stop on the Xen kernel. Native uses int 201 for the e100 and Xen uses int 19.

Any help on this issue is greatly appreciated.

Thanks,
Kirk
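The reproduction described above (a large download that stops at an unpredictable point) can be made a little more repeatable by recording how many bytes arrived before the transfer stalled. The following is a minimal Python sketch under stated assumptions: the helper name `read_until_stall` is made up for illustration, the URL in the usage comment is a placeholder, and a read timeout can have many causes, so /proc/interrupts still needs to be checked to confirm that the adapter's interrupts have stopped.

```python
import socket


def read_until_stall(stream, chunk_size=64 * 1024):
    """Read `stream` to the end and return (bytes_received, stalled).

    `stalled` is True when a read timed out before the end of the data.
    """
    received = 0
    while True:
        try:
            chunk = stream.read(chunk_size)
        except socket.timeout:
            return received, True
        if not chunk:
            return received, False
        received += len(chunk)


# Hypothetical usage (placeholder URL, 30 second read timeout):
#
#   import urllib.request
#   with urllib.request.urlopen("http://example.com/some.iso", timeout=30) as resp:
#       print(read_until_stall(resp))
```

Reading in fixed-size chunks keeps memory use flat regardless of the ISO size, and the byte count at the stall gives a number to compare between runs.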
Keir Fraser
2006-Aug-09 09:51 UTC
Re: [Xen-devel] Network stops responding after some time
On 9/8/06 12:04 am, "Kirk Allan" <kallan@novell.com> wrote:

> I am seeing the same dom0 problem on a box that also has an integrated nForce3
> Ethernet adapter. The box started out with SLES 10 but I removed the SUSE
> supplied Xen components and am using Xen-unstable change set 10982.

Thanks for the detailed bug report. I'll have a look at the attachments and see if anything looks obvious. It's probably an IRQ routing issue.

 -- Keir
Adnan Khaleel
2006-Aug-09 19:49 UTC
Re: [Xen-devel] Network stops responding after some time
Hi Kirk,

Your problem seems to be identical to mine. Is your kernel x86-64 as well? I was wondering whether this problem also manifests in 32-bit; I never got around to trying it myself.

Hope we can find a fix for this.

Adnan
Hi Adnan,

The box is currently running an x86-64 kernel. The box belongs to another team and I am not able to re-install it in 32-bit mode right now. But that would be a good test.

Thanks,
Kirk