Karsten Wemheuer
2010-Oct-15 09:00 UTC
[asterisk-users] Kernel panic (asterisk 1.8.0-rc3, dahdi-linux-2.4)
Hi,

I set up an asterisk system (asterisk 1.8-rc3, dahdi-linux-2.4.0 with dahdi-extra from Tzafrir's git, kernel 2.6.35.4). The hardware is an older PC system with a Celeron CPU (2.5 GHz) and a Beronet BN4S0 ISDN card. The system starts without any errors.

I discovered a severe issue. The kernel panics under a very small load. The first call normally gets through. If I start the second or third call, and sometimes when I terminate the first call, the system panics (Oops text on console).

After solving some difficulties (the relevant part of the Oops text scrolls off the monitor, and there is no serial interface), I got the text via netconsole. It seems to me that the panic occurred in oslec (function "oslec_update"). But maybe I am wrong about this. In the oslec code there is a patch to enable MMX. After switching this off, the problem disappeared. AFAIK the CPU supports MMX.

Where should I address this issue to? Is it a known issue?

Here comes one example for the oops:

/-----
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c0103dd6>] __math_state_restore+0x56/0x90
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/module/configfs/initstate
Modules linked in: netconsole configfs dahdi_echocan_oslec echo capifs loop wcb4xxp rtc_cmos i2c_i801 rtc_core dahdi 8250_pnp 8139too floppy 8250 rtc_lib mii serial_core i2c_core processor pcspkr rng_core button ide_pci_generic ide_core sd_mod crc_t10dif thermal [last unloaded: netconsole]

Pid: 1268, comm: clip.agi Not tainted 2.6.35.4 #1 P4Dual-915GL/P4Dual-915GL
EIP: 0060:[<c0103dd6>] EFLAGS: 00010046 CPU: 0
EIP is at __math_state_restore+0x56/0x90
EAX: 00000000 EBX: c5b20000 ECX: cd461960 EDX: ffffffff
ESI: cd461960 EDI: c01045a0 EBP: 00000080 ESP: c5b21cb0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process clip.agi (pid: 1268, ti=c5b20000 task=cd461960 task.ti=c5b20000)
Stack:
c5b21cd0 00000027 c01045a0 c01045e5 00000200 00000000 cfadd500 c0432273
<0> cfadd500 cfadd200 00000008 00000027 00000080 00000080 cf33fa00 0000007b
<0> 0000007b c02d00d8 000000e0 ffffffff d0ae2153 00000060 00010002 0000005a
Call Trace:
[<c01045a0>] ? do_device_not_available+0x0/0x60
[<c01045e5>] ? do_device_not_available+0x45/0x60
[<c0432273>] ? error_code+0x73/0x80
[<c02d00d8>] ? DAC960_V1_ProcessCompletedCommand+0x1108/0x1510
[<d0ae2153>] ? oslec_update+0xe3/0x5c0 [echo]
[<d0aeb038>] ? echo_can_process+0x28/0x40 [dahdi_echocan_oslec]
[<d0aeb010>] ? echo_can_process+0x0/0x40 [dahdi_echocan_oslec]
[<d0a08a18>] ? dahdi_ec_span+0x268/0x2a0 [dahdi]
[<d0a9136c>] ? b4xxp_interrupt+0x11c/0x358 [wcb4xxp]
[<c0175ded>] ? handle_IRQ_event+0x2d/0xc0
[<c02dd71d>] ? scsi_decide_disposition+0x16d/0x180
[<c0177b85>] ? handle_fasteoi_irq+0x65/0xd0
[<c0105a55>] ? handle_irq+0x15/0x30
[<c01050a7>] ? do_IRQ+0x47/0xc0
[<c0103d30>] ? common_interrupt+0x30/0x40
[<c01300e0>] ? load_balance+0x550/0x7d0
[<c0431614>] ? _raw_spin_unlock_irq+0x4/0x20
[<c012d9ba>] ? finish_task_switch+0x3a/0x90
[<c042f5c9>] ? schedule+0x1c9/0x520
[<c0103d30>] ? common_interrupt+0x30/0x40
[<c042facf>] ? preempt_schedule+0x2f/0x50
[<c0198a60>] ? do_wp_page+0x160/0x960
[<c0199c02>] ? handle_mm_fault+0x5d2/0xaa0
[<c01244b0>] ? do_page_fault+0x0/0x370
[<c01245f0>] ? do_page_fault+0x140/0x370
[<c01b7b2f>] ? copy_strings+0x17f/0x1a0
[<c01b935e>] ? do_execve+0x2be/0x310
[<c01b935e>] ? do_execve+0x2be/0x310
[<c010aa80>] ? sys_execve+0x40/0x70
[<c01244b0>] ? do_page_fault+0x0/0x370
[<c0432273>] ? error_code+0x73/0x80
Code: 89 c2 0f ae 2f 85 c9 75 27 83 4b 0c 01 80 86 98 00 00 00 01 8b 1c 24 8b 74 24 04 8b 7c 24 08 83 c4 0c c3 66 90 8b 86 50 02 00 00 <0f> ae 08 eb d9 e8 c0 ed 01 00 90 83 c8 08 e8 c7 ed 01 00 90 b8
EIP: [<c0103dd6>] __math_state_restore+0x56/0x90 SS:ESP 0068:c5b21cb0
CR2: 0000000000000000
---[ end trace 65c27cd3a6b7bd8a ]---
\-----

Thanks,

Karsten
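To see at a glance which loadable modules appear in a trace like the one above, the bracketed module tags can be pulled out mechanically. The following is only a rough sketch: the regular expression and the helper name `module_frames` are illustrative assumptions, not part of any kernel or DAHDI tooling, and entries marked with "?" are addresses found on the stack that may not be real callers.

```python
import re

# Matches entries such as "[<d0ae2153>] ? oslec_update+0xe3/0x5c0 [echo]".
# The trailing "[module]" tag is only present for code in loadable modules.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(oops_text):
    """Return (symbol, module) pairs for trace entries that carry a module tag."""
    return [
        (m.group("symbol"), m.group("module"))
        for m in FRAME_RE.finditer(oops_text)
        if m.group("module")
    ]


# A few entries copied from the call trace in the post above.
sample = (
    "[<c0432273>] ? error_code+0x73/0x80 "
    "[<d0ae2153>] ? oslec_update+0xe3/0x5c0 [echo] "
    "[<d0aeb038>] ? echo_can_process+0x28/0x40 [dahdi_echocan_oslec] "
    "[<d0a08a18>] ? dahdi_ec_span+0x268/0x2a0 [dahdi] "
    "[<d0a9136c>] ? b4xxp_interrupt+0x11c/0x358 [wcb4xxp] "
    "[<c0175ded>] ? handle_IRQ_event+0x2d/0xc0"
)

for symbol, module in module_frames(sample):
    print(f"{module}: {symbol}")
```

For this excerpt the sketch lists frames from echo, dahdi_echocan_oslec, dahdi and wcb4xxp, which matches the poster's reading that oslec_update is on the faulting path.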
Alex
2010-Oct-15 10:37 UTC
[asterisk-users] Kernel panic (asterisk 1.8.0-rc3, dahdi-linux-2.4)
Hello,

I'm having a very similar issue with dahdi 2.3.0.1 / 2.6.32 (and others have confirmed the occurrence with the same software revisions and the same kind of old hardware: P3, P4, different HFC hardware).

You can look at my last report on the loosely related Debian bug #598886:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=598886
(see messages #20 and #25)

My quick fix was to disable echo cancellation, which is a bit heavy-handed but also worked for others.

A number of my crashes also pointed to the math_state_restore function (see message #25), but I didn't know what to do with this hint.

I'll test whether things are better with MMX disabled in dahdi during the weekend... it looks promising.

Also, have a look at kernel/Documentation/preempt-locking.txt...

Alex

Karsten Wemheuer wrote:
> Hi,
>
> I setup an asterisk system (asterisk 1.8-rc3, dahdi-linux-2.4.0 with
> dahdi-extra from Tzafrirs git, kernel 2.6.35.4). The hardware is an
> older pc system with Celeron CPU (2.5 GHz) with a Beronet BN4S0 ISDN
> card. The system starts without any errors.
>
> I discovered a severe issue. The kernel panics on a very small load. The
> first call normally gets through. If I start the second or third call
> and sometimes when I terminate the first call, the system panics (Oops
> text on console).
>
> After solving some difficulties (the relevant part of the Oops text
> scrolls out of the monitor, no serial interface), I get the text via
> netconsole. It seems to me, that the panic occurred in oslec (function
> "oslec_update"). But maybe I am wrong with this. In the oslec code there
> is a patch to enable MMX. After switching this off, the problem
> disappeared. AFAIK the cpu supports mmx.
>
> Where should I address this issue to? Is it a known issue?
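Both reports involve older x86 CPUs, and the original post assumes the CPU supports MMX. One way to confirm that is to look at the "flags" line of /proc/cpuinfo. The snippet below is a minimal sketch that parses cpuinfo-style text; the helper name `cpu_flags` and the sample text are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical excerpt; on a real system, read the text from /proc/cpuinfo instead.
sample = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "flags\t\t: fpu vme de pse tsc msr mmx fxsr sse sse2\n"
)

print("mmx" in cpu_flags(sample))
```

This only shows whether the CPU advertises MMX. It says nothing about whether a driver uses MMX safely, which is the open question in this thread.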
Shaun Ruffell
2010-Oct-15 19:34 UTC
[asterisk-users] Kernel panic (asterisk 1.8.0-rc3, dahdi-linux-2.4)
On 10/15/2010 04:00 AM, Karsten Wemheuer wrote:
> I setup an asterisk system (asterisk 1.8-rc3, dahdi-linux-2.4.0 with
> dahdi-extra from Tzafrirs git, kernel 2.6.35.4). The hardware is an
> older pc system with Celeron CPU (2.5 GHz) with a Beronet BN4S0 ISDN
> card. The system starts without any errors.
>
> I discovered a severe issue. The kernel panics on a very small load. The
> first call normally gets through. If I start the second or third call
> and sometimes when I terminate the first call, the system panics (Oops
> text on console).
>
> After solving some difficulties (the relevant part of the Oops text
> scrolls out of the monitor, no serial interface), I get the text via
> netconsole. It seems to me, that the panic occurred in oslec (function
> "oslec_update"). But maybe I am wrong with this. In the oslec code there
> is a patch to enable MMX. After switching this off, the problem
> disappeared. AFAIK the cpu supports mmx.
>
> Where should I address this issue to? Is it a known issue?
>

Do you have CONFIG_DAHDI_MMX defined in include/dahdi/dahdi_config.h?

--
Shaun Ruffell
Digium, Inc. | Linux Kernel Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org
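One way to answer that question without reading the header by eye is to check for an active (not commented-out) define. This is only a sketch: the helper name `macro_is_defined` is made up, the two sample strings are illustrative rather than the real contents of dahdi_config.h, and conditional compilation (#if/#ifdef blocks) is not evaluated.

```python
import re


def macro_is_defined(header_text, macro):
    """Return True if `macro` has an active #define in the given C header text.

    C comments are stripped first, so a commented-out define does not count.
    Preprocessor conditionals are NOT evaluated.
    """
    no_block = re.sub(r"/\*.*?\*/", "", header_text, flags=re.DOTALL)
    no_comments = re.sub(r"//[^\n]*", "", no_block)
    pattern = rf"^[ \t]*#[ \t]*define[ \t]+{re.escape(macro)}\b"
    return re.search(pattern, no_comments, flags=re.MULTILINE) is not None


# Illustrative inputs, not the real header contents.
enabled = "#define CONFIG_DAHDI_MMX\n"
disabled = "/* #define CONFIG_DAHDI_MMX */\n"

print(macro_is_defined(enabled, "CONFIG_DAHDI_MMX"))
print(macro_is_defined(disabled, "CONFIG_DAHDI_MMX"))
```

To run it against an actual source tree, read include/dahdi/dahdi_config.h into a string and pass that as `header_text`.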