Hi,

We are trying to validate Asterisk as a PRI <-> SIP media gateway with two T400P cards (8 T1s) per box. The first experience, with BOX1 (Compaq, 2.53 GHz, 1 GB RAM) and just one T400P, was encouraging: on a load test with 3 T1s' worth of calls we had 75% idle CPU on average.

Not so with BOX2 (Dell, single 2.6 GHz Xeon, 1 GB RAM, 2 T400P) and BOX3 (Dell, dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, asterisk/zaptel built with SMP support).

On a similar load test (as with BOX1), BOX2 was showing 0% idle CPU 70% of the time, with just 3 T1s out of 8.

On a load test with just 2 T1s, BOX3 was very close to 0% idle on CPU0, while CPU1 was at 95% idle. The process ksoftirqd_CPU0 was close to the top of 'top', with /proc/interrupts showing the tor2-related counts growing very fast. We had 2 T1s plugged into the first T400P board and nothing going into the second, but the number of interrupts for both boards was growing at the same pace. Here are the interrupts (taken after a reboot, so they are not as big as they were). Do they look OK?

           CPU0       CPU1       CPU2       CPU3
  0:     122556          0          0          0    IO-APIC-edge  timer
  1:          4          0          0          0    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 12:         20          0          0          0    IO-APIC-edge  PS/2 Mouse
 14:         23          0          2          0    IO-APIC-edge  ide0
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
 28:      10600          0          0          0   IO-APIC-level  eth0
 29:       4837          0          0          0   IO-APIC-level  eth1
 30:      24831          0          0          0   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     122430     122429     122429     122428
ERR:          0
MIS:          0

Not sure what went wrong. Any suggestions on how to run 2 T400Ps in one box (without hurting performance) and how to take advantage of SMP for Asterisk would be appreciated.

Are there any known Linux kernel issues (2.4.20-13.7smp #1 SMP on BOX3)?

Thank you.

Alex Zarubin
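For readers comparing their own boxes, here is a minimal Python sketch of reading a /proc/interrupts listing and summing the counts per CPU for one device. It is not part of the original post: `parse_interrupts`, `per_cpu_totals` and `SAMPLE` are illustrative names, and the sample rows are copied from the listing above.

```python
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 14:         23          0          2          0    IO-APIC-edge  ide0
 20:     516930          0          0          0   IO-APIC-level  tor2
 24:     516524          0          0          0   IO-APIC-level  tor2
ERR:          0
"""


def parse_interrupts(text):
    """Map each row label ("20", "ERR", ...) to (per-CPU counts, description)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = {}
    for ln in lines[1:]:
        label, sep, rest = ln.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device columns
            counts.append(int(field))
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return rows


def per_cpu_totals(rows, device):
    """Sum the per-CPU counts over every IRQ line whose last column is `device`."""
    totals = None
    for counts, desc in rows.values():
        if desc.split()[-1:] == [device]:
            if totals is None:
                totals = [0] * len(counts)
            totals = [a + b for a, b in zip(totals, counts)]
    return totals or []


if __name__ == "__main__":
    print(per_cpu_totals(parse_interrupts(SAMPLE), "tor2"))
```

On the numbers posted above, both tor2 lines land entirely on CPU0, which matches the ksoftirqd_CPU0 load described in the message.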
Are you sure that you compiled zaptel for __SMP__? Edit your zaptel/Makefile.

  0:   75283844   75241320   75286285   75247088    IO-APIC-edge  timer
  1:          1          0          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  3:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 15:          1          0          0          1    IO-APIC-edge  ide1
 16:   22134870   22120997   22135905   22122829   IO-APIC-level  eth0
 25:       4670       4548       4614       4518   IO-APIC-level  tor2

All four CPUs should be handling IRQs, as in the example above.

Martin
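One way to put a number on "all four CPUs should be handling IRQs" is the share of an IRQ's count taken by its busiest CPU. The sketch below is only an illustration; `busiest_share` is a hypothetical helper, and the two rows are the tor2 lines quoted in this thread (Martin's IRQ 25 and Alex's IRQ 20).

```python
def busiest_share(counts):
    """Fraction of an IRQ's total count handled by its busiest CPU.

    About 1/ncpu means an even spread; 1.0 means one CPU took everything.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return max(counts) / total


martin_tor2 = [4670, 4548, 4614, 4518]  # IRQ 25 in Martin's listing
alex_tor2 = [516930, 0, 0, 0]           # IRQ 20 in Alex's first listing

if __name__ == "__main__":
    print(round(busiest_share(martin_tor2), 3))
    print(busiest_share(alex_tor2))
```

With four CPUs an even spread sits near 0.25, which is where Martin's row falls, while Alex's row is at 1.0.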
Zaptel was compiled with -D__SMP__.

We've installed irqbalance and the picture improved a lot (thanks to Jared Smith). Do you still see problems in our /proc/interrupts?

The big issue for us now is that after 24+ hours of PRI -> SIP test load, our Dell PE2650 (dual 2.6 GHz Xeon, 2 GB RAM, 2 T400P, 2.4.20-18.7smp #1 SMP) stops responding to anything.

So the questions are:
- Are there known issues with the PE2650, and ways to fix them?
- Can someone recommend a 'stable' 2.4 SMP kernel for this kind of load?
- Any expertise in this area will be appreciated.

           CPU0       CPU1       CPU2       CPU3
  0:     230710      30030      50050          0    IO-APIC-edge  timer
  1:          5          0          0        233    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  5:          0          0          0          0   IO-APIC-level  usb-ohci
  8:          1          0          0          0    IO-APIC-edge  rtc
 14:         27          0          2          0    IO-APIC-edge  ide0
 20:    2085442     400221          0     230232   IO-APIC-level  tor2
 24:     293848    1841658      10010     570568   IO-APIC-level  tor2
 28:          5      25643          0          0   IO-APIC-level  eth0
 29:          5          0    5165040          0   IO-APIC-level  eth1
 30:      43720      35467       1291       3296   IO-APIC-level  aacraid
NMI:          0          0          0          0
LOC:     310618     310616     310616     310616
ERR:          0
MIS:          0

Thank you.
Alex Zarubin
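As an alternative to letting irqbalance move interrupts around, Linux also exposes a per-IRQ CPU bitmask in /proc/irq/<irq>/smp_affinity. The sketch below only computes the hex mask and the path; it is an illustration, not something the poster reported doing. `affinity_mask` and `affinity_path` are hypothetical helper names, and the IRQ numbers 20 and 24 are the tor2 lines from the listing above. Whether a given kernel and chipset honor the mask, and how logical CPUs map to physical ones with hyperthreading, are assumptions to check on the actual box.

```python
def affinity_mask(cpus):
    """Hex bitmask string with one bit set per CPU number (CPU0 -> bit 0)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_path(irq):
    """Path of the affinity file for one IRQ number."""
    return f"/proc/irq/{irq}/smp_affinity"


if __name__ == "__main__":
    # Example plan: first T400P (IRQ 20) on CPU0, second (IRQ 24) on CPU1.
    for irq, cpus in ((20, [0]), (24, [1])):
        # Writing the value requires root; here it is only printed.
        print(f"echo {affinity_mask(cpus)} > {affinity_path(irq)}")
```

If irqbalance is left running it may rewrite these masks, so manual pinning and irqbalance would normally not be combined.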
Mark,

As far as pings go: there are cases when we could ping the box on both interfaces and cases when we could not (we tried 3-4 sets of NICs and drivers). All telnet, X, ssh etc. sessions are definitely dead. No core dumps (asterisk was started with the -g option), no kernel panics. The console is black and Alt-SysRq combinations don't work. Pretty much no option but rebooting the box.

As far as SMP with a single T400P: we'll try it and report the results, but the idea was to go with as high a density as possible ...

What do you think of hyperthreading: should we enable or disable it on the box running asterisk?

What about -DCONFIG_ZAPTEL_WATCHDOG? Can it help, and how is it used?

Thank you.
Alex Zarubin

-----Original Message-----
From: Mark Spencer [mailto:markster@digium.com]
Sent: Saturday, June 14, 2003 10:23 AM
To: 'asterisk-users@lists.digium.com'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues

When you say "stops responding" do you mean no more pings, telnet dead, etc? Or do you mean asterisk stops responding? Is there a segfault or kernel panic, or any other failure diagnostic?

Mark
Mark & Oliver,

It is too early to say, but the picture is different now. Our dual-CPU, dual-T400P box has been up for 4 days under a load of 10 - 100 simultaneous PRI -> SIP calls. We installed 2.4.21 #2 SMP (it was still freezing after that) and, what I think made the difference, recompiled zaptel, libpri and asterisk with gcc 3.3.

The problem, along the way, was that asterisk wouldn't start after that. It was crashing while loading the mp3 and lpc10 codecs. We put 'noload' for these two into modules.conf, a temporary solution, of course.

There are still problems with multiple connections at the same time: windows to the box freeze for a second, and there are D-channel error messages. The following messages are dumped into /var/log/messages. What do you think?

Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: wait_on_irq, CPU 1:
Jun 24 18:23:25 mspgate03 kernel: irq: 1 [ 0 0 1 0 ]
Jun 24 18:23:25 mspgate03 kernel: bh: 0 [ 0 0 0 0 ]
Jun 24 18:23:25 mspgate03 kernel: Stack dumps:
Jun 24 18:23:25 mspgate03 kernel: CPU 0:02000000 0000036f 00e14603 18020000 03000010 00006647 008e0200 48030000
Jun 24 18:23:25 mspgate03 kernel: 00000078 001ffa02 5b490300 06000000 000001c7 074e0308 00001afe 01c74d03
Jun 24 18:23:25 mspgate03 kernel: 23020000 d7080000 e1000001 09000000 000001d7 f5030001 04000023 09300207
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [<f89bd281>] [<f89bb132>] [<f89bbb47>] [<f89bd281>] [<f89bd281>]
Jun 24 18:23:25 mspgate03 kernel: [<f89bb132>] [<f89bd281>] [<f89bd281>] [<f89bb132>] [<f89bbb47>] [<f89e7737>]
Jun 24 18:23:25 mspgate03 kernel: [<f89aa80a>] [<f89aa80a>] [<c01feee4>] [<f89e7737>] [<c01f4eae>] [<c010a98e>]
Jun 24 18:23:25 mspgate03 kernel: [<c020d122>] [<c010abe3>] [<c020d122>] [<c020d550>] [<c010a98e>] [<c020d550>]
Jun 24 18:23:25 mspgate03 kernel: [<c010abfe>] [<c01f0919>] [<c01f0919>] [<c022a1ef>] [<c022a1ef>] [<c022a5f5>]
Jun 24 18:23:25 mspgate03 kernel: [<f89bd281>] [<f89bd281>] [<f89bd281>] [<f89bb132>] [<f89bd510>] [<f89e7737>]
Jun 24 18:23:25 mspgate03 kernel: [<c022a5f5>] [<c01f0ffd>] [<c01f112e>] [<c01f53c2>] [<c012005b>] [<c010abfe>]
Jun 24 18:23:25 mspgate03 kernel: [<c015147a>] [<c01509dc>] [<c0147460>] [<c0147fb8>] [<f89e7737>] [<f89e7737>]
Jun 24 18:23:25 mspgate03 kernel: [<c01f0998>] [<c01f0fac>] [<c01f112e>] [<c01f53c2>] [<c0117fce>] [<c0117ef0>]
Jun 24 18:23:25 mspgate03 kernel: [<c0144a64>] [<c01246db>] [<c0109023>]
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 2:00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 24 18:23:25 mspgate03 kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 24 18:23:25 mspgate03 kernel: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 3:00000070 cce30002 0cd80000 08fa0000 69530000 656c706d 6c616e41 73697379
Jun 24 18:23:25 mspgate03 kernel: 0009a700 46534c00 65746e69 6c6f7072 32657461 6e655f61 0a810063 69530000
Jun 24 18:23:25 mspgate03 kernel: 656c706d 65746e49 6c6f7072 4c657461 39004653 5300000b 6c706d69 66736c65
Jun 24 18:23:25 mspgate03 kernel: Call Trace:
Jun 24 18:23:25 mspgate03 kernel:
Jun 24 18:23:25 mspgate03 kernel: CPU 1:e14d5eac c025c896 00000001 00000001 ffffffff 00000001 c010a7c2 c025c8ab
Jun 24 18:23:25 mspgate03 kernel: 00000000 f2d92124 e14d5f00 c0191104 00000500 00001805 000000bf 00008a01
Jun 24 18:23:25 mspgate03 kernel: 7f1c0300 01000415 1a131100 170f1200 00000000 e14d4000 00000000 00000000
Jun 24 18:23:25 mspgate03 kernel: Call Trace: [<c010a7c2>] [<c0191104>] [<c01913d4>] [<c018e1e2>] [<c014c2c7>]
Jun 24 18:23:25 mspgate03 kernel: [<c0109023>]
Jun 24 18:23:25 mspgate03 kernel:

Thank you.
Alex Zarubin

-----Original Message-----
From: The Traveller [mailto:traveler@xs4all.nl]
Sent: Tuesday, June 17, 2003 3:10 PM
To: asterisk-users@lists.digium.com
Subject: Re: [Asterisk-Users] Dual T400P, SMP, performance issues

On Tue, Jun 17, 2003 at 20:54:39 +0200, The Traveller wrote:
> BTW: As I reported in my previous mail to the list, I've now installed kernel
> 2.4.21-rc2 with ACPI-patch on the box with the E100P. I've been trying
> very hard to reproduce a freeze with this kernel, but haven't succeeded yet.
[...]

Ok, it crashed again, so that wasn't it either. What I did to trigger it was using the auto-dialer to loop as many calls to app_datetime out and then back over the same E-1 as it would take, queueing the calls to "/var/spool/asterisk/outgoing/" 14 at a time. It froze at the first attempt. The "good" news is that it produced a visible kernel panic this time. My guess is that you only don't see it if the console screensaver has already come on when it happens.

It read something like "Unable to handle kernel paging request" and happened in the swapper task. As usual, it dumped a lot of numbers on the screen, which I didn't want to write down.

Mark: If you want my help in debugging this, I'll hook it up to a serial console, trigger the crash and provide you with the exact panic, together with the ksyms and modules info to trace it.

Grtz,

Oliver
Mark,

Here is the info you requested. As for the multiple T400P boards question, I believe that is the most probable reason for this behavior (we haven't seen it on single-board machines). But in order to prove it we need 4-5 days of load testing. Hopefully we'll be able to do it next week.

ksymoops 2.4.4 on i686 2.4.21. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21 (specified)
     -m /boot/System.map-2.4.21 (default)
     -i

Warning (Oops_read): Code line not seen, dumping what data is available

Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bb132 <[zaptel]zt_process_getaudio_chunk+f2/910>
Trace; f89bbb47 <[zaptel]zt_getbuf_chunk+1f7/4b0>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bb132 <[zaptel]zt_process_getaudio_chunk+f2/910>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bb132 <[zaptel]zt_process_getaudio_chunk+f2/910>
Trace; f89bbb47 <[zaptel]zt_getbuf_chunk+1f7/4b0>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; f89aa80a <[eepro100]speedo_start_xmit+17a/210>
Trace; f89aa80a <[eepro100]speedo_start_xmit+17a/210>
Trace; c01feee4 <qdisc_restart+14/170>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; c01f4eae <dev_queue_xmit+14e/320>
Trace; c010a98e <handle_IRQ_event+5e/90>
Trace; c020d122 <ip_output+102/170>
Trace; c010abe3 <do_IRQ+e3/110>
Trace; c020d122 <ip_output+102/170>
Trace; c020d550 <ip_queue_xmit+3c0/520>
Trace; c010a98e <handle_IRQ_event+5e/90>
Trace; c020d550 <ip_queue_xmit+3c0/520>
Trace; c010abfe <do_IRQ+fe/110>
Trace; c01f0919 <sock_def_readable+39/70>
Trace; c01f0919 <sock_def_readable+39/70>
Trace; c022a1ef <udp_queue_rcv_skb+18f/200>
Trace; c022a1ef <udp_queue_rcv_skb+18f/200>
Trace; c022a5f5 <udp_rcv+165/340>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bb132 <[zaptel]zt_process_getaudio_chunk+f2/910>
Trace; f89bd510 <[zaptel]zt_putbuf_chunk+c0/730>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; c022a5f5 <udp_rcv+165/340>
Trace; c01f0ffd <kfree_skbmem+5d/70>
Trace; c01f112e <__kfree_skb+11e/130>
Trace; c01f53c2 <net_tx_action+62/140>
Trace; c012005b <do_softirq+6b/d0>
Trace; c010abfe <do_IRQ+fe/110>
Trace; c015147a <d_lookup+ba/120>
Trace; c01509dc <dput+1c/160>
Trace; c0147460 <cached_lookup+10/50>
Trace; c0147fb8 <link_path_walk+8f8/a10>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; c01f0998 <sock_def_write_space+48/a0>
Trace; c01f0fac <kfree_skbmem+c/70>
Trace; c01f112e <__kfree_skb+11e/130>
Trace; c01f53c2 <net_tx_action+62/140>
Trace; c0117fce <schedule_timeout+7e/a0>
Trace; c0117ef0 <process_timeout+0/60>
Trace; c0144a64 <sys_stat64+64/70>
Trace; c01246db <sys_nanosleep+11b/18c>
Trace; c0109023 <system_call+33/38>
Trace; c010a7c2 <__global_cli+e2/170>
Trace; c0191104 <change_termios+24/190>
Trace; c01913d4 <set_termios+164/170>
Trace; c018e1e2 <tty_ioctl+372/390>
Trace; c014c2c7 <sys_ioctl+1c7/1fe>
Trace; c0109023 <system_call+33/38>

1 warning issued. Results may not be reliable.

-----Original Message-----
From: Mark Spencer [mailto:markster@digium.com]
Sent: Wednesday, June 25, 2003 11:11 AM
To: 'asterisk-users@lists.digium.com'
Subject: RE: [Asterisk-Users] Dual T400P, SMP, performance issues

Oooh, how neat! I wonder if there is some sort of race and that the kernel is detecting and defeating it somehow. Will ksymoops on your machine handle that output? Maybe we can track it down!

Again, does the problem occur with only one board? i.e. is the problem tied to having multiple boards in the machine?

Mark
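A long ksymoops listing like the one in this message is easier to scan once the frames are tallied by module and symbol. The Python sketch below is an illustration only: `tally_trace`, `TRACE_RE` and `SAMPLE` are made-up names, and the sample lines are copied from the "Trace;" output above. Keep in mind that, as far as I know, 2.4 call traces list every stack word that looks like a kernel text address, so a repeated symbol may be a stale value and not proof of nested calls.

```python
import re
from collections import Counter

# Matches e.g. "Trace; f89e7737 <[tor2]tor2_intr+847/cb0>"
# and "Trace; c01feee4 <qdisc_restart+14/170>" (no module prefix).
TRACE_RE = re.compile(r"Trace;\s+([0-9a-f]{8})\s+<(?:\[([^\]]+)\])?([^+>]+)\+")

SAMPLE = """\
Trace; f89bd281 <[zaptel]zt_process_putaudio_chunk+9a1/b70>
Trace; f89bb132 <[zaptel]zt_process_getaudio_chunk+f2/910>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
Trace; f89aa80a <[eepro100]speedo_start_xmit+17a/210>
Trace; c01feee4 <qdisc_restart+14/170>
Trace; f89e7737 <[tor2]tor2_intr+847/cb0>
"""


def tally_trace(text):
    """Return (frames per module, frames per symbol) for ksymoops Trace lines."""
    by_module = Counter()
    by_symbol = Counter()
    for match in TRACE_RE.finditer(text):
        module = match.group(2) or "kernel"  # no [module] prefix: core kernel
        by_module[module] += 1
        by_symbol[match.group(3)] += 1
    return by_module, by_symbol


if __name__ == "__main__":
    modules, symbols = tally_trace(SAMPLE)
    print(modules.most_common())
    print(symbols.most_common(3))
```

Because the pattern is applied with `finditer`, it also works on a listing that has been flattened onto one line.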
Here is info on the kernel panic with a high volume (110+) of calls. Same configuration as before. Comments would be appreciated.

ksymoops 2.4.4 on i686 2.4.21. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21 (specified)
     -m /boot/System.map-2.4.21 (default)
     -i

eax: 00000100 ebx: 00000000 ecx: 00000000 edx: f71b5a14
esi: 00000002 edi: f71b4000 ebp: f71b4000 esp: f71b59ec
ds: 0018 es: 0018 ss: 0018
Process irqbalance (pid: 713, stackpage=f71b5000)
Stack: 6e6d6c6b 7271706f 76757473 7a797877 00000001 c0115ef4 f71b4000 c02578fd
       f71b5a14 00000001 00000000 00000003 c0115ef4 f71b4000 f71b4000 00000000
       f71b0018 c0110018 ffffffef c0114546 00000010 00000286 c0114470
Call Trace: [<c0115ef4>] [<c0115ef4>] [<c0110018>] [<c0114546>] [<c0114470>] [<c011bc88>]
  [<c01144c0>] [<c0114470>] [<c011b2d5>] [<c011eae2>] [<c011badb>] [<c011bc88>]
  [<c0116ff0>] [<c010960a>] [<c0115ef4>] [<c01173a8>] [<f89e7737>] [<f89fb1e0>]
  [<f89fb1e0>] [<c0117000>] [<c0109114>] [<c0115ef4>] [<c010abe3>] [<f897a8c0>]
  [<f897a8c0>] [<c0110018>] [<c0124345>] [<c012042b>] [<c01202d1>] [<c012005b>]
  [<c010abfe>] [<c015e751>] [<c0147513>] [<c01479f1>] [<f89e7737>] [<f89fb1e0>]
  [<c010e1b6>] [<c0123fc0>] [<c01482ab>] [<c01487c4>] [<c012042b>] [<c01202d1>]
  [<c012005b>] [<c010abfe>] [<c013c606>] [<c01471ae>] [<c013c953>] [<c0109023>]
Code: 89 1d b0 e0 ff ff ff 80 04 48 33 c0 eb 02 f3 90 a1 88 f3 30
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; c0115ef4 <end_level_ioapic_irq+24/f0>
Trace; c0115ef4 <end_level_ioapic_irq+24/f0>
Trace; c0110018 <pci_conf2_write+88/f0>
Trace; c0114546 <.text.lock.smp+19/23>
Trace; c0114470 <stop_this_cpu+0/40>
Trace; c011bc88 <printk+128/140>
Trace; c01144c0 <smp_send_stop+10/30>
Trace; c0114470 <stop_this_cpu+0/40>
Trace; c011b2d5 <panic+85/180>
Trace; c011eae2 <do_exit+32/2d0>
Trace; c011badb <call_console_drivers+eb/100>
Trace; c011bc88 <printk+128/140>
Trace; c0116ff0 <bust_spinlocks+50/60>
Trace; c010960a <die+5a/80>
Trace; c0115ef4 <end_level_ioapic_irq+24/f0>
Trace; c01173a8 <do_page_fault+3a8/4db>
Trace; f89e7737 <END_OF_CODE+309d4/????>
Trace; f89fb1e0 <END_OF_CODE+4447d/????>
Trace; f89fb1e0 <END_OF_CODE+4447d/????>
Trace; c0117000 <do_page_fault+0/4db>
Trace; c0109114 <error_code+34/3c>
Trace; c0115ef4 <end_level_ioapic_irq+24/f0>
Trace; c010abe3 <do_IRQ+e3/110>
Trace; f897a8c0 <[usb-ohci]rh_int_timer_do+0/70>
Trace; f897a8c0 <[usb-ohci]rh_int_timer_do+0/70>
Trace; c0110018 <pci_conf2_write+88/f0>
Trace; c0124345 <timer_bh+2b5/3f0>
Trace; c012042b <bh_action+4b/80>
Trace; c01202d1 <tasklet_hi_action+61/a0>
Trace; c012005b <do_softirq+6b/d0>
Trace; c010abfe <do_IRQ+fe/110>
Trace; c015e751 <proc_lookup+51/c0>
Trace; c0147513 <real_lookup+73/100>
Trace; c01479f1 <link_path_walk+331/a10>
Trace; f89e7737 <END_OF_CODE+309d4/????>
Trace; f89fb1e0 <END_OF_CODE+4447d/????>
Trace; c010e1b6 <timer_interrupt+e6/170>
Trace; c0123fc0 <update_process_times+20/a0>
Trace; c01482ab <path_lookup+1b/30>
Trace; c01487c4 <open_namei+94/650>
Trace; c012042b <bh_action+4b/80>
Trace; c01202d1 <tasklet_hi_action+61/a0>
Trace; c012005b <do_softirq+6b/d0>
Trace; c010abfe <do_IRQ+fe/110>
Trace; c013c606 <filp_open+36/60>
Trace; c01471ae <getname+5e/a0>
Trace; c013c953 <sys_open+33/a0>
Trace; c0109023 <system_call+33/38>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
   0: 89 1d b0 e0 ff ff    mov %ebx,0xffffe0b0
Code; 00000006 Before first symbol
   6: ff 80 04 48 33 c0    incl 0xc0334804(%eax)
Code; 0000000c Before first symbol
   c: eb 02    jmp 10 <_EIP+0x10> 00000010 Before first symbol
Code; 0000000e Before first symbol
   e: f3 90    repz nop
Code; 00000010 Before first symbol
  10: a1 88 f3 30 00    mov 0x30f388,%eax

Thank you.
Alex Zarubin
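One detail in the dump above may be worth a second look: read as little-endian bytes, the first four words on the "Stack:" line are all printable ASCII. The Python sketch below shows the decoding; `decode_words` is a hypothetical helper, and the input is copied from the Stack line in this message. What the text means (for example, whether data was written over the stack) is a guess that the dump alone does not settle.

```python
def decode_words(words):
    """Decode space-separated 32-bit hex words as little-endian ASCII.

    On i386 the lowest-order byte of each word comes first in memory, so
    each word is byte-reversed before decoding. Bytes outside the
    printable ASCII range are shown as '.'.
    """
    chars = []
    for word in words.split():
        raw = bytes.fromhex(word)[::-1]
        chars.extend(chr(b) if 32 <= b < 127 else "." for b in raw)
    return "".join(chars)


if __name__ == "__main__":
    # First four words of the "Stack:" line in the panic above.
    print(decode_words("6e6d6c6b 7271706f 76757473 7a797877"))
```

The same helper can be pointed at any other run of words in these dumps that looks like text when byte-reversed.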