Rossoni Fabio
2008-Jul-16 12:12 UTC
aborted journal and kernel bug on RHEL AP 5.1 on SUN AMD 64bit (X4200M2)
Hi,

I've run into a strange situation on my SUN X4200M2 servers running Linux Advanced Platform 5.1:

Linux fea.localdomain 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

This happens on both internal and external disks (Hitachi AMS 200 storage, Emulex HBA, and Hitachi HDLM software for multipath).

After the problem occurs I'm not able to use the server because files on the root filesystem are corrupted:

-rwxr-xr-x 1 root root 14096 Sep 5 2007 rmmod
-rwxr-xr-x 1 root root 521552 Aug 7 2006 rmt
-rwxr-xr-x 1 root root 14648 Jul 13 2006 rngd
-rwxr-xr-x 1 root root 57920 Aug 7 2006 route
-rwxr-xr-x 1 root root 5904 Sep 25 2007 rpc.lockd
-rwxr-xr-x 1 root root 49352 Sep 25 2007 rpc.statd
?--------- ? ? ? ? ? rrestore
?--------- ? ? ? ? ? rrestore.static
-rwxr-xr-x 1 root root 29976 Jan 9 2007 rtmon
-rwxr-xr-x 1 root root 7736 Oct 13 2006 runlevel
-rwxr-xr-x 1 root root 30840 Nov 27 2006 runuser
-rwxr-xr-x 1 root root 10376 Aug 17 2007 salsa
[root@fea sbin]#

The file systems are also mounted read-only.

The following is part of the messages file:

Jul 11 16:11:15 fea clurgmgrd[4739]: <notice> Service service:appl-dfdd is disabled
Jul 11 16:29:56 fea clurgmgrd[4739]: <notice> Stopping service service:db-dfdd
Jul 11 16:30:00 fea avahi-daemon[4622]: Withdrawing address record for 10.40.3.40 on eth1.
Jul 11 16:30:11 fea dlm_controld[4281]: uevent message has 3 args
Jul 11 16:30:11 fea clurgmgrd[4739]: <notice> Service service:db-dfdd is disabled
Jul 11 16:31:44 fea clurgmgrd[4739]: <notice> Starting disabled service service:db-dfdd
Jul 11 16:31:44 fea kernel: kjournald starting. Commit interval 5 seconds
Jul 11 16:31:44 fea kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jul 11 16:31:44 fea kernel: EXT3 FS on sddlmab, internal journal
Jul 11 16:31:44 fea kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 11 16:31:44 fea dlm_controld[4281]: uevent message has 3 args
Jul 11 16:31:44 fea avahi-daemon[4622]: Registering new address record for 10.40.3.40 on eth1.
Jul 11 16:31:48 fea clurgmgrd[4739]: <notice> Service service:db-dfdd started
Jul 11 16:40:23 fea clurgmgrd[4739]: <notice> Stopping service service:db-dfdd
Jul 11 16:40:25 fea avahi-daemon[4622]: Withdrawing address record for 10.40.3.40 on eth1.
Jul 11 16:40:35 fea dlm_controld[4281]: uevent message has 3 args
Jul 11 16:40:35 fea clurgmgrd[4739]: <notice> Service service:db-dfdd is disabled
Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382976
Jul 11 17:13:01 fea kernel: Aborting journal on device dm-0.
Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382977
Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382978
Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382979
Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382980
Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
Jul 11 17:13:02 fea kernel: ext3_abort called.
Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Jul 11 17:13:02 fea kernel: Remounting filesystem read-only
Jul 11 17:27:30 fea clurgmgrd[4739]: <info> State change: feb.iride DOWN
Jul 11 17:27:30 fea clurgmgrd[4739]: <info> State change: /dev/sddlmac UP
Jul 11 17:27:30 fea clurgmgrd[4739]: <info> Waiting for node #2 to be fenced
Jul 11 17:28:50 fea qdiskd[4191]: <info> Node 2 shutdown

There is also a kernel bug:

Jul 9 16:57:13 fea syslogd 1.4.1: restart. /trace
Jul 10 17:41:09 fea kernel: EXT3-fs warning (device sddlmaa): ext3_unlink: Deleting nonexistent file (13353077), 0
Jul 10 18:20:04 fea dlm_controld[4260]: uevent message has 3 args
Jul 10 18:20:04 fea kernel: sb orphan head is 13353077
Jul 10 18:20:04 fea kernel: sb_info orphan list:
Jul 10 18:20:04 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 59479 times
Jul 10 18:20:13 fea kernel: BUG: soft lockup detected on CPU#1!
Jul 10 18:20:13 fea kernel:
Jul 10 18:20:13 fea kernel: Call Trace:
Jul 10 18:20:13 fea kernel: <IRQ> [<ffffffff800b50fa>] softlockup_tick+0xd5/0xe7
Jul 10 18:20:13 fea kernel: [<ffffffff800930e2>] update_process_times+0x42/0x68
Jul 10 18:20:13 fea kernel: [<ffffffff800746e3>] smp_local_timer_interrupt+0x23/0x47
Jul 10 18:20:13 fea kernel: [<ffffffff80074da5>] smp_apic_timer_interrupt+0x41/0x47
Jul 10 18:20:13 fea kernel: [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Jul 10 18:20:13 fea kernel: <EOI> [<ffffffff8008d4b6>] vprintk+0x29e/0x2ea
Jul 10 18:20:13 fea kernel: [<ffffffff8008d554>] printk+0x52/0xbd
Jul 10 18:20:13 fea kernel: [<ffffffff80061a3f>] out_of_line_wait_on_bit+0x6c/0x78
Jul 10 18:20:13 fea kernel: [<ffffffff880564f4>] :ext3:ext3_put_super+0x13e/0x1e0
Jul 10 18:20:13 fea kernel: [<ffffffff800d8e1e>] generic_shutdown_super+0x79/0xfb
Jul 10 18:20:13 fea kernel: [<ffffffff800d8ec6>] kill_block_super+0x26/0x3a
Jul 10 18:20:13 fea kernel: [<ffffffff800d8f94>] deactivate_super+0x6a/0x82
Jul 10 18:20:13 fea kernel: [<ffffffff800e1d13>] sys_umount+0x245/0x27b
Jul 10 18:20:13 fea kernel: [<ffffffff800b27ae>] audit_syscall_entry+0x14d/0x180
Jul 10 18:20:13 fea kernel: [<ffffffff8005b28d>] tracesys+0xd5/0xe0
Jul 10 18:20:13 fea kernel:
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 50 times
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink , nlink 1, next 0
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 54 times
Jul 10 18:20:13 fea kernel: in, nlink 1, next 0
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 54 times
Jul 10 18:20:13 fea kernel: in, nlink 1, next 0
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 54 times
Jul 10 18:20:13 fea kernel: in, nlink 1, next 0
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 54 times
Jul 10 18:20:13 fea kernel: in, nlink 1, next 0
Jul 10 18:20:13 fea kernel: inode dm-0:1010899 at ffff8100df1f3448: mode 100555, nlink 1, next 0
Jul 10 18:20:13 fea last message repeated 54 times

I'm planning to reinstall the server ...

Can somebody help me?

Thanks a lot,
Fabio

--------------------------------------------
INFORMATIVA SULLA PRIVACY
Ai sensi del D.Lgs. 196/2003 si precisa che le informazioni contenute in questo messaggio e nei suoi eventuali allegati sono riservate e per uso esclusivo del destinatario. Nessuno, all'infuori dello stesso, può copiare o distribuire il messaggio, o parte di esso, a terzi. Chiunque riceva questo messaggio per errore è pregato di distruggerlo e di informare il mittente.

PRIVACY NOTICE
According to the D.Lgs. 196/2003 this document and its attachments are confidential and intended for the named addressee(s) only. If you are not the intended recipient of this message, any use or dissemination of this message is prohibited. If you have received this document by mistake, please notify the sender and destroy all physical and/or electronic copies.
Christian Kujau
2008-Jul-17 11:56 UTC
aborted journal and kernel bug on RHEL AP 5.1 on SUN AMD 64bit (X4200M2)
On Wed, July 16, 2008 14:12, Rossoni Fabio wrote:
> i'm reached a strange situation over my servers SUN X4200M2 running with
> Linux Advanced Platform 5.1 Linux fea.localdomain 2.6.18-53.el5 #1 SMP
--------------------------------------------------------^
so, a rather old kernel, patched to hell probably :-)

> Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux.. This
> happen on both internal and external disks (Hitachi AMS 200 storage ,
> emulex HBA , and HDLM sw Hitachi for multipath)

I don't have any multipath experience with ext3, so I hope that's not an issue here.

> Jul 11 16:31:44 fea kernel: EXT3-fs warning: maximal mount count
> reached, running e2fsck is recommended

Well, did you run e2fsck on the filesystem?

> Jul 10 18:20:13 fea kernel: BUG: soft lockup detected on CPU#1!

Is this reproducible or did this only occur once?

Christian.
--
make bzImage, not war
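
One way to approach the "reproducible or only once?" question from the logs alone is to tally the EXT3-fs error lines per device and kernel function. The following is a minimal sketch in Python, assuming the syslog line format seen in the excerpts quoted in this thread; the function name summarize_ext3_errors and the regular expression are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches the two error shapes seen in the quoted log, for example:
#   ... kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared ...
#   ... kernel: EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
# The pattern is an assumption based on those excerpts only.
EXT3_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"EXT3-fs error \(device (?P<device>[^)]+)\)(?::| in) (?P<func>\w+):"
)


def summarize_ext3_errors(lines):
    """Count EXT3-fs errors per (device, function) and record the first timestamp."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = EXT3_ERROR.match(line)
        if match is None:
            continue  # not an EXT3-fs error line
        key = (match["device"], match["func"])
        counts[key] += 1
        first_seen.setdefault(key, match["stamp"])
    return counts, first_seen


# A few lines copied from the log excerpt in the original post.
SAMPLE = [
    "Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382976",
    "Jul 11 17:13:01 fea kernel: Aborting journal on device dm-0.",
    "Jul 11 17:13:01 fea kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 382977",
    "Jul 11 17:13:02 fea kernel: EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted",
]

if __name__ == "__main__":
    counts, first_seen = summarize_ext3_errors(SAMPLE)
    for (device, func), n in sorted(counts.items()):
        print(f"{device} {func}: {n} error(s), first at {first_seen[(device, func)]}")
```

To run it over a whole log instead of the sample, pass an open file object (for example the messages file mentioned in the original post) to summarize_ext3_errors; errors spread over several dates or devices would suggest a recurring problem rather than a one-off event.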