xc_dmesg.py was very helpful.
I can semi-reproduce the faults at this point. Some hosts seem more
prone than others to faulting, even though they are the same hardware.
Domain-0 has crashed rarely but normally the domain running the app
crashes. The app is a big multithreaded overlay network project
program.
I turned off -NODEBUG in arch/i386/Rules
but didn't notice any change in the console output.
I see:
DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)
That message doesn't always reboot the domain; sometimes it prints just once,
and sometimes it prints in long batches.
DOM1: Weird failure in hard_start_xmit
This pops up occasionally and may prevent some TCP connections. Not
sure though.
DOM3: Unable to handle kernel paging request at virtual address c3f77820
...
DOM3: Oops: 0002
The application domain has a 1GB virtual disk swap space. It is
certainly not running out of swap. As long as domain-0 doesn't crash I
can get the oops info out of the xen console.
The setup is xen-1.2 without -NODEBUG on uni-P4 cpu with hyperthreading.
The NIC is a tg3. Running two domains, each with 1GB of swap. I'm not doing
anything at all with the balloon system.
I have not tried this with 1.3 yet.
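
For readers decoding the allocation-failure line above, here is a minimal sketch. It assumes the guest kernel uses the stock Linux 2.4 GFP bit values (quoted from memory of include/linux/mm.h, so check them against the tree actually in use). Under that assumption gfp=0x20 is __GFP_HIGH alone, which is what GFP_ATOMIC expands to in 2.4: an allocation that cannot sleep, such as one made from interrupt context. Order 0 means a single page. The function name and the returned dictionary are illustrative only.

```python
import re

# Assumed Linux 2.4 bit values (include/linux/mm.h); verify against your tree.
GFP_FLAGS_24 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_HIGHIO",
    0x100: "__GFP_FS",
}

ALLOC_FAIL_RE = re.compile(
    r"__alloc_pages: (\d+)-order allocation failed "
    r"\(gfp=0x([0-9a-fA-F]+)/(\d+)\)"
)


def decode_alloc_failure(line):
    """Decode one '__alloc_pages: N-order allocation failed' console line.

    Returns None if the line does not contain the message.
    """
    match = ALLOC_FAIL_RE.search(line)
    if match is None:
        return None
    order = int(match.group(1))
    gfp = int(match.group(2), 16)
    flags = [name for bit, name in sorted(GFP_FLAGS_24.items()) if gfp & bit]
    return {
        "pages": 1 << order,        # an order-N request asks for 2**N pages
        "gfp": gfp,
        "flags": flags,
        "trailing_field": int(match.group(3)),  # meaning not decoded here
    }


if __name__ == "__main__":
    print(decode_alloc_failure(
        "DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)"
    ))
```

If the bit values hold, a single-page atomic allocation failing points at memory pressure in the guest rather than at fragmentation.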
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
> I can semi-reproduce the faults at this point. Some hosts seem more
> prone than others to faulting, even though they are the same hardware.
> Domain-0 has crashed rarely but normally the domain running the app
> crashes. The app is a big multithreaded overlay network project
> program.
> I see:
> DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)
> That message doesn't always reboot domain, sometimes prints just once,
> and sometimes prints in long batches.
It's pretty unlikely this is anything to do with Xen -- I bet you
could reproduce this on a stock Linux compiled without CONFIG_HIGHMEM
> DOM1: Weird failure in hard_start_xmit
> This pops up occasionally and may prevent some TCP connections. Not
> sure though.
I haven't seen this message before, and I'm struggling to find
the string in the source. Please can you check you've transcribed
it correctly?
> DOM3: Unable to handle kernel paging request at virtual address c3f77820
> ...
> DOM3: Oops: 0002
> The application domain has a 1GB virtual disk swap space. It is
> certainly not running out of swap. As long as domain-0 doesn't crash I
> can get the oops info out of the xen console.
Armed with the Oops message and vmlinux it should be possible to
debug this. Please could you have a go looking up the EIP in
System.map.
Thanks,
Ian
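
The System.map lookup requested above amounts to finding the last symbol whose address is at or below the EIP. A minimal sketch, assuming the usual three-column "address type name" layout of System.map; the function names are illustrative, and the sample symbols in the usage section are made up, not taken from any real build.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, eip):
    """Return (name, offset) for the nearest symbol at or below eip."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None  # eip lies below the first symbol
    addr, name = symbols[index]
    return name, eip - addr


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = [
        "c0100000 T _stext",
        "c0105000 T example_function",
        "c0106000 t example_helper",
    ]
    symbols = load_system_map(sample)
    print(lookup(symbols, 0xC0105010))
```

With a real map, open System.map and pass the file object to load_system_map. The offset tells you how far into the function the faulting instruction sits.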
> > DOM1: Weird failure in hard_start_xmit
> > This pops up occasionally and may prevent some TCP connections. Not
> > sure though.
>
> I haven't seen this message before, and I'm struggling to find
> the string in the source. Please can you check you've transcribed
> it correctly?
It's a message from Xen. It only gets printed if the network device
driver's transmit function returned an error. This should never happen
(if the driver isn't ready to accept packets there is a way to signal
that).
Maybe it's a driver or hardware bug. Maybe Xen is running low on
memory (unlikely).
-- Keir

" > DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)
" It's pretty unlikely this is anything to do with Xen -- I bet you
" could reproduce this on a stock Linux compiled without CONFIG_HIGHMEM
You are correct. This message pops up on stock Linux as well if
memory is constrained as tightly as in our Xen config.
" > DOM3: Unable to handle kernel paging request at virtual address c3f77820
The EIP is in arch/xeno/drivers/network/network.c:_network_interrupt()
I no longer have the oops messages unfortunately. We had to get
the hosts going again for that project and the oops got lost.
" > DOM1: Weird failure in hard_start_xmit
Xen prints this message here:
xeno-1.2.bk/xen/net/dev.c:816: printk("Weird failure in hard_start_xmit!\n");
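
To make the contract behind this message concrete, here is a toy model. It is not the code in xen/net/dev.c. It only illustrates the behaviour Keir describes: a driver that cannot accept more packets is supposed to signal that before its transmit function is called again, so a non-zero return from the transmit function is unexpected. ToyNic, transmit and the ring size are all hypothetical names and values.

```python
class ToyNic:
    """Hypothetical driver with a fixed-size transmit ring."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.in_flight = 0
        self.queue_stopped = False

    def hard_start_xmit(self, packet):
        """Return 0 on success, non-zero if the ring is already full."""
        if self.in_flight >= self.ring_size:
            return 1  # the caller should never have been allowed to get here
        self.in_flight += 1
        if self.in_flight == self.ring_size:
            # Signal "not ready" as the ring fills, before it can overflow.
            self.queue_stopped = True
        return 0

    def tx_complete(self):
        """Called when the hardware has finished sending one packet."""
        if self.in_flight > 0:
            self.in_flight -= 1
        self.queue_stopped = False


def transmit(nic, packet, log):
    """Caller side: respect the stop signal, complain on a bad return."""
    if nic.queue_stopped:
        return False  # packet stays queued; the driver is not called
    if nic.hard_start_xmit(packet) != 0:
        log.append("Weird failure in hard_start_xmit!")
        return False
    return True
```

In this model the message can only appear if the stop signal and the ring state disagree, which is why a driver or hardware bug is the first suspect.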
Last night a user sent me a detailed report on NIC trouble:
" When the machines freeze up running bbsend, bbrecv, or netgen, they _also_
" freeze up on incoming SSH connections.
" If I'm already logged into rack217 via SSH when I start a netgen, then my
" interactive session gets laggy or freezes completely.
"
" At any time, killing the netgen process makes whatever was frozen resume
" almost immediately.
"
" We're not talking about large amounts of traffic here: 12KB/s causes all
" of the above symptoms. netgen and bbsend both do some busy-waiting, but
" not that much of it.
"
" For some reason, the system load goes sky-high, even with just one netgen
" process. netgen is single-threaded and spends less than half of its time
" busy-waiting, yet system load often ends up above 3.
"
" End of symptoms, beginning of theory: all the bad systems are P4s running
" Xeno and using Broadcom ethernet cards. (At least, they used to be
" Broadcoms. With Xeno running, I can no longer check.) The working
" systems are a mix of P4 and P3, Xeno is running on two of them (but only
" on P3s), and they're all eepro100 cards.
"
" My guess is that Xeno is interacting badly with either the bcm5700 or the
" P4. I'm leaning toward the former. Is there any way to boot the machines
That "hard_start_xmit" message showed up on the hosts with
Broadcom BCM5703 NICs.
We'll set up a test cluster to isolate what is going on with these
network apps.
> " > DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)
>
> " It's pretty unlikely this is anything to do with Xen -- I bet you
> " could reproduce this on a stock Linux compiled without CONFIG_HIGHMEM
>
> You are correct. This message pops up on stock linux as well if
> memory is constrained as tight as in our Xen config.
Good (at least for us!).
> " > DOM3: Unable to handle kernel paging request at virtual address c3f77820
>
> The EIP is in arch/xeno/drivers/network/network.c:_network_interrupt()
> I no longer have the oops messages unfortunately. We had to get
> the hosts going again for that project and the oops got lost.
Interesting. I'd certainly like to see an Oops for this one.
> " > DOM1: Weird failure in hard_start_xmit
>
> Xen prints this message here:
> xeno-1.2.bk/xen/net/dev.c:816: printk("Weird failure in hard_start_xmit!\n");
[I was confused by the 'DOM1' prefix and was looking in xenolinux]
> That "hard_start_xmit" message showed up on the hosts with
> Broadcom BCM5703 NICs.
I'm afraid the tg3 chipset is pretty flaky, and the driver needs to be
pretty careful. Our version of the tg3 driver is a few revs behind the
one in stock Linux, but it could quite easily be brought up to date.
(I'd use linux.bkbits.net to extract a patch and apply -- I think our
version is:
http://linux.bkbits.net:8080/linux-2.4/diffs/drivers/net/tg3.c@1.103?nav=index.html|src/.|src/drivers|src/drivers/net|hist/drivers/net/tg3.c).
Would you be able to have a go at this?
Cheers,
Ian

" It's a message from Xen. It only gets printed if the network device
" driver's transmit function returned an error. This should never happen
" (if the driver isn't ready to accept packets there is a way to signal
" that).
"
" Maybe it's a driver or hardware bug. Maybe Xen is running low on
" memory (unlikely).
"
" -- Keir
"
I am seeing xen crash again with the 'Weird' log message.
This time it is xen-1.3-devel using an eepro100 NIC:
(XEN) Weird failure in hard_start_xmit!
I got a couple oops messages pasted below, and the xen startup messages.
My build of xen.gz and xenolinux.gz are here:
http://www.cs.duke.edu/~becker/xen/
The hardware is an IBM x330 PIII with 256MB phys mem. DOM0 has 40MB and DOM1
has almost 190MB. Both domains have 1GB of swap space. DOM0 runs off
of real partitions and DOM1 off of virtual disks.
# xc_dom_control.py list
Dom Name Mem(kb) CPU State Time(s)
0 Domain-0 40000 0 r- 87
1 athos07 189440 0 -- 111
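
As a quick check on the memory budget, the listing above can be summed and compared with the 255MB that the Xen boot log in this message reports ("Initialised 255MB memory on a 255MB machine"). A minimal sketch, assuming the whitespace-separated column layout shown; parse_domain_list is an illustrative helper, not part of the xc_* tools.

```python
def parse_domain_list(text):
    """Parse 'xc_dom_control.py list' style output into a list of dicts."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric domain id; this skips the header.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        domains.append({
            "dom": int(fields[0]),
            "name": fields[1],
            "mem_kb": int(fields[2]),
        })
    return domains


LISTING = """\
Dom Name Mem(kb) CPU State Time(s)
0 Domain-0 40000 0 r- 87
1 athos07 189440 0 -- 111
"""

domains = parse_domain_list(LISTING)
total_kb = sum(d["mem_kb"] for d in domains)
machine_kb = 255 * 1024  # figure taken from the Xen boot log
print("domains:", total_kb, "KB; left over:", machine_kb - total_kb, "KB")
```

The two domains together hold 229440 KB, leaving 31680 KB of the 255MB for Xen itself, of which the boot log reports a 13870KB Xen heap.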
--
(XEN) Weird failure in hard_start_xmit!
(XEN) CPU: 0
(XEN) EIP: 0808:[<fc547e9a>]
(XEN) EFLAGS: 00010206
(XEN) eax: fc77b2c0 ebx: 0162f012 ecx: fc680160 edx: 00000010
(XEN) esi: fc77b2c0 edi: fc680140 ebp: fc780f00 esp: fc503e3c
(XEN) ds: 0810 es: 0810 fs: 0810 gs: 0810 ss: 0810
(XEN) Stack trace from ESP=fc503e3c:
(XEN) ff923012 00000020 00000206 ffc00000 fc680000 00000040 fc680140 [fc5476c4]
(XEN) fc680140 fc680000 fc629a18 00000000 fc77b380 fc680160 fc680168 00000090
(XEN) 00000040 fc680000 00000040 0000f048 ffc00000 fc680140 0000f048 [fc54734e]
(XEN) fc680140 00010810 fc660810 fc77b380 00000000 0000001b fc6264c0 [fc5e14c4]
(XEN) 0000001b fc680000 fc503eec fc784d20 965a495b fc77b380 00000000 00000001
(XEN) 00000001 fc652b82 00000000 [fc5de530] 00000001 00000000 00000000 00000001
(XEN) fc652b82 00000000 00000000 fc7a0810 0a780810 00000810 00000810 ffffff1b
(XEN) [fc51dced] 00000808 00000296 00000008 fc6007f5 fc5f5f42 fc503f5c 00000000
(XEN) 00000000 00000296 00000000 fc780180 fc780180 fc680000 [fc51b47b] fc5f5f20
(XEN) fc680000 fc652700 fc619628 00000000 00000000 cae3d160 [fc5171bb] 00000000
(XEN) 00000000 00000001 00000001 00000001 fc650f58 00000000 [fc516df5] fc650f58
(XEN) fc77b380 00000000 fc784d20 fc784d20 0001fc0f 00000000 [fc5dd9c6] c017fce8
(XEN) cae3d000 c1d5a6c0 0001fc0f 00000000 cae3d160 00000000 00000821 00000821
(XEN) 00000821 00000821 00000006 c00b7b0f 00000819 00000246 c017fcdc 00000821
(XEN) fc784d20
****************************************
CPU0 FATAL PAGE FAULT
[error_code=00000000]
Faulting linear address might be 0162f012
Aieee! CPU0 is toast...
****************************************
--
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) Weird failure in hard_start_xmit!
(XEN) CPU: 0
(XEN) EIP: 0808:[<fc547e9a>]
(XEN) EFLAGS: 00010206
(XEN) eax: fc7862e0 ebx: 01793012 ecx: fc680160 edx: 00000010
(XEN) esi: fc7862e0 edi: fc680140 ebp: fc787b40 esp: fc503e3c
(XEN) ds: 0810 es: 0810 fs: 0810 gs: 0810 ss: 0810
(XEN) Stack trace from ESP=fc503e3c:
(XEN) ffa77012 00000020 00000202 ffc00000 fc680000 00000040 fc680140 [fc5476c4]
(XEN) fc680140 fc680000 fc629a18 00000000 fc786700 fc680160 fc680168 00000044
(XEN) 00000040 fc680000 00000040 0000f048 ffc00000 fc680140 0000f048 [fc54734e]
(XEN) fc680140 00010810 fc660810 fc786700 00000000 0000001b fc6264c0 [fc5e14c4]
(XEN) 0000001b fc680000 fc503eec fc78cd20 aadc5da6 fc786700 00000000 00000001
(XEN) 00000001 fc652b82 00000000 [fc5de530] 00000001 00000000 00000000 00000001
(XEN) fc652b82 00000000 00000000 fc7a0810 0a780810 00000810 00000810 ffffff1b
(XEN) [fc51dced] 00000808 00000296 00000008 fc6007f5 fc5f5f42 fc503f5c 00000000
(XEN) 00000000 00000296 00000000 fc787600 fc787600 fc680000 [fc51b47b] fc5f5f20
(XEN) fc680000 fc652700 fc619628 00000000 00000000 cb822160 [fc5171bb] 00000000
(XEN) 00000000 00000001 00000001 00000001 fc650f58 00000000 [fc516df5] fc650f58
(XEN) fc786700 00000000 fc78cd20 fc78cd20 000383c4 00000000 [fc5dd9c6] c2437d08
(XEN) cb822000 cb02f900 000383c4 00000000 cb822160 00000000 00000821 00000821
(XEN) 00000000 00000000 00000006 c00b7b0f 00000819 00000246 c2437cfc 00000821
(XEN) fc78cd20
****************************************
CPU0 FATAL PAGE FAULT
[error_code=00000000]
Faulting linear address might be 01793012
Aieee! CPU0 is toast...
****************************************
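
Both dumps above share the same EIP (fc547e9a), and in both the faulting linear address equals the value in ebx. A minimal sketch for pulling those facts out of a pasted dump so the addresses can be fed to a symbol lookup against the Xen image. It assumes the bracketed stack words are the ones the trace printer considers likely return addresses, which is an interpretation rather than something stated in the thread. parse_xen_crash is an illustrative name.

```python
import re

REG_RE = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp): ([0-9a-f]{8})")
EIP_RE = re.compile(r"EIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{8})>\]")
FAULT_RE = re.compile(r"Faulting linear address might be ([0-9a-f]{8})")
BRACKET_RE = re.compile(r"\[([0-9a-f]{8})\]")


def parse_xen_crash(text):
    """Extract EIP, registers and bracketed stack words from one dump."""
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    eip = EIP_RE.search(text)
    fault = FAULT_RE.search(text)
    fault_addr = int(fault.group(1), 16) if fault else None
    return {
        "eip": int(eip.group(1), 16) if eip else None,
        "fault_addr": fault_addr,
        "regs": regs,
        # Registers holding exactly the faulting address hint at which
        # pointer was being dereferenced when the fault happened.
        "regs_matching_fault": sorted(
            name for name, value in regs.items() if value == fault_addr
        ),
        # Assumed to be candidate return addresses (see lead-in).
        "return_candidates": [int(a, 16) for a in BRACKET_RE.findall(text)],
    }
```

Run on either dump, regs_matching_fault comes back as ["ebx"], which suggests the code at the EIP was reading through a bad pointer held in ebx.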
--
__ __ _ _____ _ _
\ \/ /___ _ __ / | |___ / __| | _____ _____| |
\ // _ \ '_ \ | | |_ \ __ / _` |/ _ \ \ / / _ \ |
/ \ __/ | | | | |_ ___) |__| (_| | __/\ V / __/ |
/_/\_\___|_| |_| |_(_)____/ \__,_|\___| \_/ \___|_|
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory
Xen version 1.3-devel (becker@) (gcc version 3.3.3 (Debian)) Tue Apr 13 10:29:07 EDT 2004
(XEN) **WARNING**: Xen option 'ser_baud=' is deprecated! Use 'com1=' instead.
(XEN) Initialised 255MB memory on a 255MB machine
(XEN) Xen heap size is 13870KB
(XEN) Initialising Xen allocator with 13MB memory
(XEN) Reading BIOS drive-info tables at 0x00000 and 0x00000
(XEN) CPU0: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0
(XEN) CPU caps: 0383fbff 00000000 00000000 00000000
(XEN) found SMP MP-table at 0009e1d0
(XEN) Memory Reservation 0x9e1d0, 4096 bytes
(XEN) Memory Reservation 0x9e1e0, 4096 bytes
(XEN) ACPI: Searched entire block, no RSDP was found.
(XEN) ACPI: RSDP located at physical address fc4fdfd0
(XEN) RSD PTR v0 [IBM ]
(XEN) __va_range(0xffeff80, 0x68): idx=8 mapped at ffff6000
(XEN) ACPI table found: RSDT v1 [IBM SERSPIDR 0.4096]
(XEN) __va_range(0xffeff00, 0x24): idx=8 mapped at ffff6000
(XEN) __va_range(0xffeff00, 0x74): idx=8 mapped at ffff6000
(XEN) ACPI table found: FACP v1 [IBM SERSPIDR 0.4096]
(XEN) __va_range(0xffefe80, 0x24): idx=8 mapped at ffff6000
(XEN) __va_range(0xffefe80, 0x60): idx=8 mapped at ffff6000
(XEN) ACPI table found: APIC v1 [IBM SERSPIDR 0.4096]
(XEN) __va_range(0xffefe80, 0x60): idx=8 mapped at ffff6000
(XEN) LAPIC (acpi_id[0x0000] id[0x0] enabled[1])
(XEN) CPU 0 (0x0000) enabledProcessor #0 Pentium(tm) Pro APIC version 16
(XEN)
(XEN) IOAPIC (id[0xe] address[0xfec00000] global_irq_base[0x0])
(XEN) IOAPIC (id[0xd] address[0xfec01000] global_irq_base[0x10])
(XEN) INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
(XEN) INT_SRC_OVR (bus[0] irq[0x3] global_irq[0x1e] polarity[0x0] trigger[0x0])
(XEN) 1 CPUs total
(XEN) Local APIC address fee00000
(XEN) Enabling the CPU's according to the ACPI table
(XEN) Intel MultiProcessor Specification v1.4
(XEN) Virtual Wire compatibility mode.
(XEN) OEM ID: IBM ENSW Product ID: NF 4100R SMP APIC at: 0xFEE00000
(XEN) Processor #0 Pentium(tm) Pro APIC version 17
(XEN) I/O APIC #14 Version 17 at 0xFEC00000.
(XEN) I/O APIC #13 Version 17 at 0xFEC01000.
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) Processors: 2
(XEN) Initialising domains
(XEN) Initialising schedulers
(XEN) Using scheduler: Borrowed Virtual Time (bvt)
(XEN) Initializing CPU#0
(XEN) Detected 996.914 MHz processor.
(XEN) CPU0: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0
(XEN) CPU caps: 0383fbff 00000000 00000000 00000000
(XEN) CPU0 booted
(XEN) enabled ExtINT on CPU#0
(XEN) ESR value before enabling vector: 00000000
(XEN) ESR value after enabling vector: 00000000
(XEN) Error: only one processor found.
(XEN) ENABLING IO-APIC IRQs
(XEN) Setting 14 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 14 ... ok.
(XEN) Setting 13 in the phys_id_present_map
(XEN) ...changing IO-APIC physical APIC ID to 13 ... ok.
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 14-0, 14-3, 14-9, 14-10, 14-11, 13-0, 13-1, 13-2, 13-3, 13-4, 13-5, 13-6, 13-7, 13-8, 13-10, 13-13, 13-14, 13-15 not connected.
(XEN) ..TIMER: vector=0x41 pin1=2 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ... failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... works.
(XEN) number of MP IRQ sources: 14.
(XEN) number of IO-APIC #14 registers: 16.
(XEN) number of IO-APIC #13 registers: 16.
(XEN) testing the IO APIC.......................
(XEN)
(XEN) IO APIC #14......
(XEN) .... register #00: 0E000000
(XEN) ....... : physical APIC id: 0E
(XEN) .... register #01: 000F0011
(XEN) ....... : max redirection entries: 000F
(XEN) ....... : PRQ implemented: 0
(XEN) ....... : IO APIC version: 0011
(XEN) .... register #02: 0E000000
(XEN) ....... : arbitration: 0E
(XEN) .... IRQ redirection table:
(XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN) 00 000 00 1 0 0 0 0 0 0 00
(XEN) 01 0FF 0F 0 0 0 0 0 1 1 49
(XEN) 02 000 00 1 0 0 0 0 0 0 00
(XEN) 03 000 00 1 0 0 0 0 0 0 00
(XEN) 04 0FF 0F 0 0 0 0 0 1 1 51
(XEN) 05 0FF 0F 1 1 0 1 0 1 1 59
(XEN) 06 0FF 0F 0 0 0 0 0 1 1 61
(XEN) 07 0FF 0F 1 1 0 1 0 1 1 69
(XEN) 08 0FF 0F 0 0 0 0 0 1 1 71
(XEN) 09 000 00 1 0 0 0 0 0 0 00
(XEN) 0a 000 00 1 0 0 0 0 0 0 00
(XEN) 0b 000 00 1 0 0 0 0 0 0 00
(XEN) 0c 0FF 0F 0 0 0 0 0 1 1 79
(XEN) 0d 0FF 0F 0 0 0 0 0 1 1 81
(XEN) 0e 0FF 0F 0 0 0 0 0 1 1 89
(XEN) 0f 0FF 0F 0 0 0 0 0 1 1 91
(XEN)
(XEN) IO APIC #13......
(XEN) .... register #00: 0D000000
(XEN) ....... : physical APIC id: 0D
(XEN) .... register #01: 000F0011
(XEN) ....... : max redirection entries: 000F
(XEN) ....... : PRQ implemented: 0
(XEN) ....... : IO APIC version: 0011
(XEN) .... register #02: 0D000000
(XEN) ....... : arbitration: 0D
(XEN) .... IRQ redirection table:
(XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN) 00 000 00 1 0 0 0 0 0 0 00
(XEN) 01 000 00 1 0 0 0 0 0 0 00
(XEN) 02 000 00 1 0 0 0 0 0 0 00
(XEN) 03 000 00 1 0 0 0 0 0 0 00
(XEN) 04 000 00 1 0 0 0 0 0 0 00
(XEN) 05 000 00 1 0 0 0 0 0 0 00
(XEN) 06 000 00 1 0 0 0 0 0 0 00
(XEN) 07 000 00 1 0 0 0 0 0 0 00
(XEN) 08 000 00 1 0 0 0 0 0 0 00
(XEN) 09 0FF 0F 1 1 0 1 0 1 1 99
(XEN) 0a 000 00 1 0 0 0 0 0 0 00
(XEN) 0b 0FF 0F 1 1 0 1 0 1 1 A1
(XEN) 0c 0FF 0F 1 1 0 1 0 1 1 A9
(XEN) 0d 000 00 1 0 0 0 0 0 0 00
(XEN) 0e 000 00 1 0 0 0 0 0 0 00
(XEN) 0f 000 00 1 0 0 0 0 0 0 00
(XEN) IRQ to pin mappings:
(XEN) IRQ0 -> 0:2
(XEN) IRQ1 -> 0:1
(XEN) IRQ4 -> 0:4
(XEN) IRQ5 -> 0:5
(XEN) IRQ6 -> 0:6
(XEN) IRQ7 -> 0:7
(XEN) IRQ8 -> 0:8
(XEN) IRQ12 -> 0:12
(XEN) IRQ13 -> 0:13
(XEN) IRQ14 -> 0:14
(XEN) IRQ15 -> 0:15
(XEN) IRQ25 -> 1:9
(XEN) IRQ27 -> 1:11
(XEN) IRQ28 -> 1:12
(XEN) .................................... done.
(XEN) Using local APIC timer interrupts.
(XEN) Calibrating APIC timer for CPU0...
(XEN) ..... CPU speed is 996.8097 MHz.
(XEN) ..... Bus speed is 132.9078 MHz.
(XEN) ..... bus_scale = 0x00008819
(XEN) ACT: Initialising Accurate timers
(XEN) Time init:
(XEN) .... System Time: 20000448ns
(XEN) .... cpu_freq: 00000000:3B6BB418
(XEN) .... scale: 00000001:00CADB62
(XEN) .... Wall Clock: 1082735518s 0us
(XEN) Start schedulers
(XEN) PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
(XEN) PCI: Using configuration type 1
(XEN) PCI: Probing PCI hardware
(XEN) PCI: Discovered peer bus 01
(XEN) PCI->APIC IRQ transform: (B0,I2,P0) -> 27
(XEN) PCI->APIC IRQ transform: (B0,I10,P0) -> 25
(XEN) PCI->APIC IRQ transform: (B0,I15,P0) -> 7
(XEN) PCI->APIC IRQ transform: (B1,I3,P0) -> 28
(XEN) Intel(R) PRO/100 Network Driver - version 2.2.21-k1
(XEN) Copyright (c) 2003 Intel Corporation
(XEN)
(XEN) e100: selftest OK.
(XEN) e100: eth0: Intel(R) PRO/100 Network Connection
(XEN) Hardware receive checksums enabled
(XEN) cpu cycle saver enabled
(XEN)
(XEN) ***************************
(XEN) * WARNING FOR NET DEVICE eth0 (NIC type 'Intel(R) PRO/100 Network Driver'):
(XEN) * This NIC cannot support fully efficient networking in Xen.
(XEN) * In particular, extra packet copies will be incurred!
(XEN) * See documentation for a list of recommended NIC types
(XEN) ***************************
(XEN) e100: selftest OK.
(XEN) e100: eth1: Intel(R) PRO/100 Network Connection
(XEN) Hardware receive checksums enabled
(XEN) cpu cycle saver enabled
(XEN)
(XEN) ***************************
(XEN) * WARNING FOR NET DEVICE eth1 (NIC type 'Intel(R) PRO/100 Network Driver'):
(XEN) * This NIC cannot support fully efficient networking in Xen.
(XEN) * In particular, extra packet copies will be incurred!
(XEN) * See documentation for a list of recommended NIC types
(XEN) ***************************
(XEN) Uniform Multi-Platform E-IDE driver Revision: 6.31
(XEN) ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
(XEN) ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
(XEN) ServerWorks OSB4: detected chipset, but driver not compiled in!
(XEN) ServerWorks OSB4: chipset revision 0
(XEN) ServerWorks OSB4: not 100% native mode: will probe irqs later
(XEN) ide0: BM-DMA at 0x0700-0x0707, BIOS settings: hda:DMA, hdb:DMA
(XEN) ide1: BM-DMA at 0x0708-0x070f, BIOS settings: hdc:pio, hdd:pio
(XEN) hda: CRN-8241B, ATAPI CD/DVD-ROM drive
(XEN) ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
(XEN) hda: ATAPI 24X CD-ROM drive, 128kB Cache
(XEN) Uniform CD-ROM driver Revision: 3.12
(XEN) SCSI subsystem driver Revision: 1.00
(XEN) Red Hat/Adaptec aacraid driver (1.1.2 Apr 2 2004 16:39:14)
(XEN) scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
(XEN) <Adaptec aic7892 Ultra160 SCSI adapter>
(XEN) aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
(XEN)
(XEN) Vendor: IBM-PSG Model: DDYS-T18350M M Rev: S9HA
(XEN) Type: Direct-Access ANSI SCSI revision: 03
(XEN) (scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
(XEN) Vendor: IBM Model: FTlV1 S2 Rev: 0
(XEN) Type: Processor ANSI SCSI revision: 02
(XEN) scsi0:A:0:0: Tagged Queuing enabled. Depth 253
(XEN) Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
(XEN) SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
(XEN) Device eth0 opened and ready for use.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen-ELF header found: 'GUEST_OS=linux,GUEST_VER=2.4,XEN_VER=1.3'
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Kernel image: 02800000->02991848
(XEN) Initrd image: 00000000->00000000
(XEN) Dom0 alloc.: 02c00000->05310000
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: c0000000->c01c5ea8
(XEN) Init. ramdisk: c01c6000->c01c6000
(XEN) Phys-Mach map: c01c6000->c01cfc40
(XEN) Page tables: c01d0000->c01d2000
(XEN) Start info: c01d2000->c01d3000
(XEN) Boot stack: c01d3000->c01d4000
(XEN) TOTAL: c0000000->c0400000
(XEN) ENTRY ADDRESS: c0000000
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) Give DOM0 read access to all PCI devices
(XEN) e100: eth0 NIC Link is Up 100 Mbps Full duplex
> " It's a message from Xen. It only gets printed if the network device
> " driver's transmit function returned an error. This should never happen
> " (if the driver isn't ready to accept packets there is a way to signal
> " that).
> "
> " Maybe it's a driver or hardware bug. Maybe Xen is running low on
> " memory (unlikely).
>
> I am seeing xen crash again with the 'Weird' log message.
> This time it is xen-1.3-devel using an eepro100 NIC:
> (XEN) Weird failure in hard_start_xmit!
Couple of questions that I think might be a big help to us in tracking
this down:
Can you repeat with hardware other than eepro100? (Our test
machines have tg3 and e1000 cards).
Can you reproduce reliably using the ttcp test program? What about in
UDP mode "ttcp -s -u -fm", or does it require data to be flowing in
both directions to trigger it? (I presume it's the transmitter that
blows up?)
Does reducing the packet size (and hence taking the packet rate up)
cause the bug to trigger more frequently?
Can you cause the bug to trigger with just one domain running on each
machine?
Can you cause the bug to trigger with the machine booted "nosmp"?
Of course, the new IO stuff should be ready soon and the bug might
just disappear ;-)
Thanks,
Ian

" Can you repeat with hardware other than eepro100? (Our test
" machines have tg3 and e1000 cards).
Yes we have tg3 hosts that crash but they do not have heads or serial
lines so that's not much help. The cluster with serial access had to
be reset to stock Linux so the project could proceed.
" Can you reproduce reliably using the ttcp test program?
No. At some point with xen-1.2 I could trigger a crash with traffic
generators. The generators didn't crash 1.3, so it was put into service.
" the transmitter that blows up?)
The app involved is an overlay network streaming data so flows are
bi-directional. The app is a memory pig with lots of threads and sockets.
" Can you cause the bug to trigger with just one domain running on
" each machine?
Possibly. That will require some work to make the dom-0 image on that
cluster run on another network.
" Can you cause the bug to trigger with the machine booted "nosmp"?
I'll try that when I have a chance. These oops were with uni's on
dual-capable motherboards.

The "weird failure" message comes out only occasionally? I guess if
that is the case then it is harmless, but I added some extra tracing
to the e100 driver so we can take a closer look if we like.
I guess the driver gets into an occasional exceptional state such that
this message appears and then it executes a rare function in the
receive path that causes a blow-up. I've now checked in a fix for the
bad receive function so that should get rid of the crash, but not the
tx warning.
-- Keir
> I am seeing xen crash again with the 'Weird' log message.
> This time it is xen-1.3-devel using an eepro100 NIC:
> (XEN) Weird failure in hard_start_xmit!
>
> I got a couple oops messages pasted below, and the xen startup messages.
> My build of xen.gz and xenolinux.gz are here:
> http://www.cs.duke.edu/~becker/xen/
>
> The hardware is an IBM x330 PIII with 256MB phys mem. DOM0 has 40MB and DOM1
> has almost 190MB. Both domains have 1GB of swap space. DOM0 runs off
> of real partitions and DOM1 off of virtual disks.
> # xc_dom_control.py list
> Dom Name Mem(kb) CPU State Time(s)
> 0 Domain-0 40000 0 r- 87
> 1 athos07 189440 0 -- 111
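
For anyone following up on the packet-size question raised earlier in the thread, a small UDP sender makes it easy to vary payload size at a fixed byte rate. This is a stand-in sketch, not ttcp, and it does not reproduce ttcp's options. send_udp and packets_per_second are illustrative names, and the host and port in the usage comment are placeholders.

```python
import socket
import time


def packets_per_second(bytes_per_second, payload_size):
    """Packet rate needed to sustain a byte rate at a given payload size."""
    return bytes_per_second / payload_size


def send_udp(host, port, payload_size, count, interval=0.0):
    """Send `count` UDP datagrams of `payload_size` bytes each.

    Returns the total number of payload bytes handed to the socket.
    """
    payload = b"x" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sent += sock.sendto(payload, (host, port))
            if interval:
                time.sleep(interval)
    return sent


# Example (placeholder address): hold roughly 12KB/s with 64-byte payloads.
#   rate = packets_per_second(12 * 1024, 64)
#   send_udp("192.0.2.1", 5001, 64, count=1000, interval=1.0 / rate)
```

At the 12KB/s figure from the user report, 64-byte payloads work out to 192 packets per second, while 1024-byte payloads need only 12. Sweeping the payload size at a fixed byte rate therefore separates packet-rate effects from bandwidth effects.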