Lee, John (Sydney)
2009-Aug-15 12:56 UTC
[asterisk-users] BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
I have a Dell PE2950 running Asterisk 1.4.21.2 on RHEL 5, and it has run with no problems since December last year. We use a Digium TE412P to connect to an E1 ISDN line. Since December we have not added or removed any software or hardware, and we have not run "yum update". The Linux kernel is 2.6.18-92.1.22.el5.

Last week, users reported that people from outside could not dial in, although internal users could still dial out. We rebooted the box and everything was fine.

Then, starting this week, the box has frozen several times a day with a "BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]" error message on the console. Before it freezes, I can see a continuous stream of this error message coming up on the machine:

...
timing source auto card 0!
timing source auto card 0!
timing source auto card 0!
timing source auto card 0!
...

After a reboot it is okay for a few hours, and then we have to reboot again to clear the problem.

BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
Pid: 0, comm: swapper
EIP: 0060:[<c0417911>] CPU: 1
EIP is at smp_call_function+0x99/0xc3
EFLAGS: 00000297 Tainted: G (2.6.18-92.1.22.el5 #1)
EAX: 00000002 EBX: 00000000 ECX: 00000001 EDX: 000000fb
ESI: 00000003 EDI: 00000000 EBP: c0417ae0
DS: 007B ES: 007b
CR0: 8005003b CR2: b7fec780 CR3: 324B2000 CR4: 000006d0
[<c0417ae0>] stop_this_cpu+0x0/0x33
[<c041794e>] smp_send_stop+0x13/0x1c
[<c0425bcf>] panic+0x4c/0x16d
[<c040da17>] intel_machine_check+0xf9/0x146
[<c040d91e>] intel_machine_check+0x0/0x146
[<c0403ccf>] error_code+0x39/0x40
[<c0403ccf>] mwait_idle+0x25/0x38
[<c0522200>] acpi_processor_idle+0x154/0x3b4
[<c0403c90>] cpu_idle+0x9f/0xb9
======================

Q1. Strangely, I could not find this error message in /var/log/messages or in dmesg. The soft lockup message appears only on the console of the machine itself.

Q2. Could it be a kernel incompatibility problem? We have not changed anything since the system was installed, though.

Q3. From the error message, how can I tell whether it is a software (kernel?) or a hardware problem?

I would appreciate any suggestions.
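For Q3, one rough starting point is to look at which functions appear in the call trace. The sketch below assumes the trace has been pasted in as plain text; it pulls out the symbol names and flags any that contain "machine_check". The helper names (trace_symbols, hardware_hints) and the marker list are made up for illustration and are not part of any kernel tool. A match only hints that the CPU reported a machine check, which is usually a hardware-reported error; it does not prove a hardware fault.

```python
import re

# Matches frames such as "[<c040da17>] intel_machine_check+0xf9/0x146".
FRAME_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)

# Hypothetical marker list: substrings of symbol names treated as a hint
# that the CPU reported a hardware error. Extend it as needed.
HARDWARE_HINTS = ("machine_check",)


def trace_symbols(text):
    """Return the symbol names of all call-trace frames, in order."""
    return [match.group(2) for match in FRAME_RE.finditer(text)]


def hardware_hints(symbols):
    """Return the distinct symbols that match a hardware-hint marker."""
    found = []
    for symbol in symbols:
        if any(hint in symbol for hint in HARDWARE_HINTS) and symbol not in found:
            found.append(symbol)
    return found


# A few frames copied from the trace in this post.
TRACE = """
[<c0425bcf>] panic+0x4c/0x16d
[<c040da17>] intel_machine_check+0xf9/0x146
[<c040d91e>] intel_machine_check+0x0/0x146
[<c0403ccf>] error_code+0x39/0x40
"""

if __name__ == "__main__":
    symbols = trace_symbols(TRACE)
    print("frames:", symbols)
    print("possible hardware hints:", hardware_hints(symbols))
```

With the frames from this post, the flagged symbol is intel_machine_check, which sits directly under panic in the trace.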