Urs Rau
2007-Jun-16 14:35 UTC
[Samba] had 3 kernel panics since upgrade from 3.0.21a to 3.0.25 and 3.0.25a on CentOS 4.4
Does anybody have any ideas on this? On our server that has been running 'rock-solid' with no crashes we have now had 3 kernel panics that each appear to have been triggered by the newly upgraded samba daemon.

We used to run samba 3.0.21a for 'years' with no crashes.

On May 26 we upgraded to 3.0.25
June 9 10:58:28 first crash, kernel panic, process involved according to log file: smbd (see below)
June 14 16:32:23 second crash, kernel panic, process involved according to log file: smbd (see below)
In the morning of June 15 we upgraded to 3.0.25a
June 15 17:26:36 third crash, kernel panic, process involved according to log file: smbd (see below)

Some specs on our server.

OS: CentOS 4.4
kernel: 2.6.9-22.0.1.EL.1smp
CPU: SMP Dual AMD Opteron(tm) Processor 246 2GHz (about 4000 bogomips)
RAM: 2GB
SWAP: 5.8GB
users: peak at ~ 50 - 60 (varies - usually or on average closer to 30 or so)

Here are the log files of the kernel panics. Is this a kernel bug triggered by a samba daemon, or a samba daemon bug that crashed the kernel?

******************** first crash ************************
Jun 9 10:58:03 uk smbd[21513]: [2007/06/09 10:58:03, 0] smbd/service.c:make_connection_snum(928)
Jun 9 10:58:03 uk smbd[21513]: Can't become connected user!
Jun 9 10:58:05 10.37.2.139 SecurityCenter: N/A: The Security Center service has been stopped. It was prevented from running by a software group policy.
Jun 9 10:58:10 10.37.2.139 W32Time: N/A: Time Provider NtpClient: This machine is configured to use the domain hierarchy to determine its time source, but the computer is joined to a Windows NT 4.0 domain. Windows NT 4.0 domain controllers do not have a time service and do not support domain hierarchy as a time source. NtpClient will attempt to use an alternate configured external time source if available. If an external time source is not configured or used for this computer, you may choose to disable the NtpClient.
Jun 9 10:58:10 10.37.2.139 W32Time: N/A: The time provider NtpClient is configured to acquire time from one or more time sources, however none of the sources are accessible. NtpClient has no source of accurate time.
Jun 9 10:58:20 10.37.2.139 E100B: N/A: Intel(R) PRO/100 VM Network Connection driver has been started
Jun 9 10:58:28 uk kernel: ------------[ cut here ]------------
Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: invalid operand: 0000 [#1]
Jun 9 10:58:28 uk kernel: SMP
Jun 9 10:58:28 uk kernel: Modules linked in: nls_utf8 usb_storage vfat fat md5 ipv6 parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS ipt_LOG iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables button battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd dm_mod gdth aic79xx sata_sil libata sd_mod scsi_mod
Jun 9 10:58:28 uk kernel: CPU: 0
Jun 9 10:58:28 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 9 10:58:28 uk kernel: EFLAGS: 00010216 (2.6.9-22.0.1.EL.1omsmp)
Jun 9 10:58:28 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 9 10:58:28 uk kernel: eax: 00000009 ebx: e721c17c ecx: 00000000 edx: 000000b3
Jun 9 10:58:28 uk kernel: esi: f47ab85c edi: c293ba88 ebp: d0f28250 esp: db80ef3c
Jun 9 10:58:28 uk kernel: ds: 007b es: 007b ss: 0068
Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000 task=c269eef0)
Jun 9 10:58:28 uk kernel: Stack: e721c17c f76b4c40 c014e1ee e721c17c 000000fb 00000000 eaae6640 c014ed2e
Jun 9 10:58:28 uk kernel: d0f28250 d0f28248 00000000 00000001 00000000 c293b9d8 f76b4c40 000b4000
Jun 9 10:58:28 uk kernel: b74e8000 d0f2822c d0f28250 d0f28248 f76b4c40 f76b4c70 db80e000 eaae6640
Jun 9 10:58:28 uk kernel: Call Trace:
Jun 9 10:58:28 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 9 10:58:28 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 9 10:58:28 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 9 10:58:28 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 9 10:58:28 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0 8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44 01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e 34 00 c7 43
Jun 9 10:58:28 uk kernel: <0>Fatal exception: panic in 5 seconds
Jun 9 13:04:57 uk syslogd 1.4.1: restart (remote reception).
Jun 9 13:04:57 uk syslog: syslogd startup succeeded
Jun 9 13:04:57 uk kernel: klogd 1.4.1, log source = /proc/kmsg started.

******************** second crash ************************
Jun 14 16:25:02 uk nmbd[14947]: [2007/06/14 16:25:02, 0] libsmb/nmblib.c:send_udp(791)
Jun 14 16:25:02 uk nmbd[14947]: Packet send failed to 10.37.2.70(138) ERRNO=Operation not permitted
Jun 14 16:25:16 uk crond(pam_unix)[15907]: session closed for user root
Jun 14 16:25:33 uk clamd[10155]: SelfCheck: Database status OK.
Jun 14 16:26:44 uk -- MARK --
Jun 14 16:27:44 uk -- MARK --
Jun 14 16:28:01 uk crond(pam_unix)[16009]: session opened for user root by (uid=0)
Jun 14 16:28:01 uk crond[16010]: (root) CMD (ping -c 1 uucp.cid.net > /dev/null 2>&1;sleep 8;/usr/sbin/uucico -S mailhost)
Jun 14 16:28:44 uk -- MARK --
Jun 14 16:28:45 uk nmbd[14947]: [2007/06/14 16:28:45, 0] libsmb/nmblib.c:send_udp(791)
Jun 14 16:28:45 uk nmbd[14947]: Packet send failed to 10.37.2.35(138) ERRNO=Operation not permitted
Jun 14 16:29:44 uk -- MARK --
Jun 14 16:30:01 uk crond(pam_unix)[16031]: session opened for user root by (uid=0)
Jun 14 16:30:01 uk crond[16032]: (root) CMD (/usr/lib/sa/sa1 1 1)
Jun 14 16:30:01 uk crond(pam_unix)[16033]: session opened for user root by (uid=0)
Jun 14 16:30:01 uk crond[16035]: (root) CMD (/opt/sarcheck/bin/prst1)
Jun 14 16:30:01 uk crond(pam_unix)[16034]: session opened for user root by (uid=0)
Jun 14 16:30:01 uk crond[16037]: (root) CMD (ping -c 1 uucp.cid.net > /dev/null 2>&1;sleep 8;/usr/sbin/uucico -S mailhost)
Jun 14 16:30:01 uk crond(pam_unix)[16031]: session closed for user root
Jun 14 16:30:02 uk crond(pam_unix)[16033]: session closed for user root
Jun 14 16:30:10 uk crond(pam_unix)[16034]: session closed for user root
Jun 14 16:30:44 uk -- MARK --
Jun 14 16:31:32 uk crond(pam_unix)[16009]: session closed for user root
Jun 14 16:32:23 uk kernel: ------------[ cut here ]------------
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: invalid operand: 0000 [#1]
Jun 14 16:32:23 uk kernel: SMP
Jun 14 16:32:23 uk kernel: Modules linked in: vfat fat md5 ipv6 parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS ipt_LOG iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables usb_storage button battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd dm_mod gdth aic79xx sata_sil libata sd_mod scsi_mod
Jun 14 16:32:23 uk kernel: CPU: 0
Jun 14 16:32:23 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 14 16:32:23 uk kernel: EFLAGS: 00010212 (2.6.9-22.0.1.EL.1omsmp)
Jun 14 16:32:23 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 14 16:32:23 uk kernel: eax: 00000009 ebx: c8a05804 ecx: 00000000 edx: 00000041
Jun 14 16:32:23 uk kernel: esi: f76587ac edi: ec80e450 ebp: e3a1b358 esp: cb136f3c
Jun 14 16:32:23 uk kernel: ds: 007b es: 007b ss: 0068
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000 task=d22a85b0)
Jun 14 16:32:23 uk kernel: Stack: c8a05804 e9ae8300 c014e1ee c8a05804 000000fb 00000000 caa99480 c014ed2e
Jun 14 16:32:23 uk kernel: e3a1b358 e3a1b350 00000000 00000001 00000000 ec80e3a0 e9ae8300 00042000
Jun 14 16:32:23 uk kernel: b7867000 e3a1b334 e3a1b358 e3a1b350 e9ae8300 e9ae8330 cb136000 caa99480
Jun 14 16:32:23 uk kernel: Call Trace:
Jun 14 16:32:23 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 14 16:32:23 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 14 16:32:23 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 14 16:32:23 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 14 16:32:23 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0 8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44 01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e 34 00 c7 43
Jun 14 16:32:23 uk kernel: <0>Fatal exception: panic in 5 seconds
Jun 14 17:11:46 uk syslogd 1.4.1: restart (remote reception).
Jun 14 17:11:46 uk syslog: syslogd startup succeeded
Jun 14 17:11:46 uk kernel: klogd 1.4.1, log source = /proc/kmsg started.

******************** third crash ************************
Jun 15 17:26:36 uk kernel: ------------[ cut here ]------------
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: invalid operand: 0000 [#1]
Jun 15 17:26:36 uk kernel: SMP
Jun 15 17:26:36 uk kernel: Modules linked in: vfat fat usb_storage md5 ipv6 parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS ipt_LOG iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables button battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd dm_mod gdth aic79xx sata_sil libata sd_mod scsi_mod
Jun 15 17:26:36 uk kernel: CPU: 0
Jun 15 17:26:36 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 15 17:26:36 uk kernel: EFLAGS: 00010216 (2.6.9-22.0.1.EL.1omsmp)
Jun 15 17:26:36 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 15 17:26:36 uk kernel: eax: 00000009 ebx: f2cf2754 ecx: 00000000 edx: 00000031
Jun 15 17:26:36 uk kernel: esi: f649b124 edi: f6616cb0 ebp: daf1b3b0 esp: c4093f3c
Jun 15 17:26:36 uk kernel: ds: 007b es: 007b ss: 0068
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000 task=f72bf1f0)
Jun 15 17:26:36 uk kernel: Stack: f2cf2754 f07e2600 c014e1ee f2cf2754 000000fb 00000000 f374ab00 c014ed2e
Jun 15 17:26:36 uk kernel: daf1b3b0 daf1b3a8 00000000 00000001 00000000 f6616c00 f07e2600 00032000
Jun 15 17:26:36 uk kernel: b7bf6000 daf1b38c daf1b3b0 daf1b3a8 f07e2600 f07e2630 c4093000 f374ab00
Jun 15 17:26:36 uk kernel: Call Trace:
Jun 15 17:26:36 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 15 17:26:36 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 15 17:26:36 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 15 17:26:36 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 15 17:26:36 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0 8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44 01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e 34 00 c7 43
Jun 15 17:26:36 uk kernel: <0>Fatal exception: panic in 5 seconds

Am I reading this right? The Process involved on each of these kernel panics is "Process smbd"?

Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000 task=c269eef0)
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000 task=d22a85b0)
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000 task=f72bf1f0)

I am sorry if I point the finger at the wrong thing here. But it seems strange that a server starts kernel panicking in this 'consistent' way always showing the same process 'smbd' involved and combined with the fact that the samba rpm upgrade is the only thing that recently changed on this server.

Or is the fault really a kernel bug as the log file entry suggests with "kernel BUG at mm/prio_tree.c:528!"

Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: invalid operand: 0000 [#1]
Jun 9 10:58:28 uk kernel: SMP
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: invalid operand: 0000 [#1]
Jun 14 16:32:23 uk kernel: SMP
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: invalid operand: 0000 [#1]
Jun 15 17:26:36 uk kernel: SMP

Any clever ideas? I will explore the redhat kernel list and see if there is a newer one maybe one from CentOS 4.5?

Google gives me a number of hits dating back many months where the kernel BUG "kernel BUG at mm/prio_tree.c:528!" has been triggered with a variety of processes (some smbds - but also a few others).

Many thanks for any pointers.
It would be really great if I could tell people on Monday morning, when they come back to work, that we have found the culprit, or better still, that we have managed to fix it. Here's hoping.

Regards,
--
Urs Rau
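The comparison made by hand in the message above (same BUG location, same faulting function, same process name in every crash) could also be scripted. The following is only a rough sketch, not something from the original post: the function name `summarize_oopses` and the regular expressions are assumptions written to match the 2.6.9-era oops layout shown in these logs, and other kernels format their oops output differently.

```python
import re

# Sample lines copied from the three crash reports in this thread.
SAMPLE = """\
Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000 task=c269eef0)
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000 task=d22a85b0)
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000 task=f72bf1f0)
"""

# Hypothetical patterns, tuned only to the log format seen in this thread.
BUG_RE = re.compile(r"kernel: kernel BUG at (\S+?):(\d+)!")
EIP_RE = re.compile(r"kernel: EIP is at (\S+)")
PROC_RE = re.compile(r"kernel: Process (\S+) \(pid: (\d+)")


def summarize_oopses(text):
    """Return one dict per 'kernel BUG at' report found in text."""
    oopses = []
    current = None
    for line in text.splitlines():
        match = BUG_RE.search(line)
        if match:
            # A new BUG line starts a new report.
            current = {
                "bug_at": f"{match.group(1)}:{match.group(2)}",
                "eip": None,
                "process": None,
                "pid": None,
            }
            oopses.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first BUG report
        match = EIP_RE.search(line)
        if match:
            current["eip"] = match.group(1)
            continue
        match = PROC_RE.search(line)
        if match:
            current["process"] = match.group(1)
            current["pid"] = int(match.group(2))
    return oopses


if __name__ == "__main__":
    for oops in summarize_oopses(SAMPLE):
        print(oops)
```

Identical `bug_at` and `eip` values across all reports show that the same kernel assertion fires each time. That the process is always smbd only says which process was making the mmap call when the assertion fired; on its own it does not show where the fault lies.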
Urs Rau
2007-Jun-16 16:51 UTC
[Samba] had 3 kernel panics since upgrade from 3.0.21a to 3.0.25 and 3.0.25a on CentOS 4.4
Urs Rau wrote:
> Any clever ideas? I will explore the redhat kernel list and see if there
> is a newer one maybe one from CentOS 4.5?
>

I should have spent some more time on this in the first place. I have found an entry in the Red Hat bugzilla that looks like it might fit.

There are two entries in the Red Hat bugzilla for the RHEL 4 error "kernel BUG at mm/prio_tree.c":

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185472
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173981

Bug 173981 was closed with an erratum issued in mid 2006, which upgrades the kernel to 2.6.9-34.EL (we are still at 2.6.9-22.0.1):

http://rhn.redhat.com/errata/RHSA-2006-0132.html

I have now temporarily upgraded our kernel to the latest CentOS 4.5 one, 2.6.9-55.EL. We will monitor this and report back. Hopefully the crashes really are a kernel bug and not a samba bug, and will now have been fixed by this upgrade.

Sorry, but it seemed to point in the direction of smbd, at least at first glance. I will keep you posted if this changes again.

--
Urs Rau
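To check whether a given kernel predates the 2.6.9-34.EL erratum kernel mentioned above, the release strings can be compared numerically. This is a deliberately simplified sketch: `kernel_key` is a hypothetical helper that looks only at the leading numeric fields of the version and release parts, and it is not RPM's actual version comparison algorithm. It is only meant for strings shaped like the ones in this thread.

```python
import re


def kernel_key(release):
    """Turn e.g. '2.6.9-22.0.1.EL.1smp' into ((2, 6, 9), (22, 0, 1)).

    Simplifying assumption: only the leading dotted numbers after the
    first '-' matter; suffixes such as '.EL' or 'smp' are ignored.
    """
    version, _, rest = release.partition("-")
    version_nums = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+(?:\.\d+)*", rest)
    release_nums = (
        tuple(int(part) for part in match.group(0).split(".")) if match else ()
    )
    return (version_nums, release_nums)


FIXED_IN = "2.6.9-34.EL"  # kernel named in the erratum, per the message above

if __name__ == "__main__":
    for running in ("2.6.9-22.0.1.EL.1smp", "2.6.9-55.EL"):
        older = kernel_key(running) < kernel_key(FIXED_IN)
        print(running, "predates" if older else "is at or after", FIXED_IN)
```

On a live system the running kernel string would come from `uname -r`. A kernel that sorts at or after the erratum release is only expected to include that fix, so monitoring after the upgrade is still needed.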