thr3ads.net - CentOS - [CentOS] unstable kernel after update to CentOS 4.5 [Dec 2007]

Kai Schaetzl

2007-Dec-10 12:23 UTC

[CentOS] unstable kernel after update to CentOS 4.5

On Saturday I finally upgraded a machine from CentOS 4.3 (I think)
to 4.5 via yum. It seemed to go fine. However, during the following
night /home got mounted read-only because of an EXT3-fs error. The
same happened the next night. Also, today, I saw the first-ever
kernel crash on this machine.
The machine is about three years old, went into production
two years ago with CentOS 4.1 or so and has been rock stable since
then: no fs errors, no kernel crashes, no other "weird" occurrences.
As the problems started right after upgrading to a new
kernel, I suspect a bug in the kernel (or some module) rather than
a hardware problem. No RAID, no LVM, just a few partitions on an IDE disk.
I haven't filed it as a bug yet. I want to gather some more
information or get some help first.
Here are some details.

Kernel was updated from 2.6.9-34.0.2.EL to 2.6.9-55.0.12.EL.
There is not a single package update missing now.

Dec  9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): htree_dirblock_to_tree: bad entry in directory #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775, rec_len=13621, name_len=100
Dec  9 04:30:35 nx10 kernel: Aborting journal on device hda3.
Dec  9 04:30:35 nx10 kernel: ext3_abort called.
Dec  9 04:30:35 nx10 kernel: EXT3-fs error (device hda3): ext3_journal_start_sb: Detected aborted journal
Dec  9 04:30:35 nx10 kernel: Remounting filesystem read-only
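
For readers unfamiliar with the message, here is a rough sketch of the kind of sanity checks ext3 applies to a directory entry before trusting it. It is loosely modeled on the kernel's checks, not copied from them; the function names are hypothetical, the 4096-byte block size is an assumption, and the inode count is the figure that df -i reports for this partition later in the thread.

```python
def dir_rec_len(name_len: int) -> int:
    """Smallest valid record length: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(rec_len, name_len, inode, offset_in_block,
                    block_size, inodes_count):
    """Return a message for the first failed check, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset_in_block + rec_len > block_size:
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


# Values taken from the log line; block_size is an assumption for illustration.
result = check_dir_entry(
    rec_len=13621, name_len=100, inode=808542775,
    offset_in_block=10264 % 4096, block_size=4096, inodes_count=3145728,
)
print(result)
```

With the logged values the entry is rejected at the alignment check, since 13621 is not a multiple of 4. Several of the later checks would reject it as well, so the specific message matters less than the fact that the entry as a whole is not plausible.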

The second error, last night, happened about 5 minutes earlier, with
exactly the same directory inode.
http://www.google.de/search?as_q=centos+rec_len+4+0&hl=de&num=30&btnG=Google-Suche&as_epq=bad+entry+in+directory&as_oq=&as_eq=&lr=&cr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=&as_rights=&safe=images
shows this error is very rare (I also tried it with fedora and got a few
more).
It seems to be related to heavy disk i/o, but only under certain (hardware?)
circumstances, and may be a bug that was introduced in some Fedora kernel and
crept into RHEL/CentOS 4.4/4.5.
Once this happens, that filesystem (in my case /home) is read-only and
the machine just hangs when one tries to shut down (probably when
unmounting) or remount ro (for a file check). After a hard reset the
automatic fsck in dmesg lists only a few orphan inode cleanups.
Also, I found that before the problem is cleared with a reset, dmesg
gives me the output of the iptables logging (which is on kern.=debug).
Can I use debugfs safely on that system while the filesystem is mounted?
I'm not familiar with it and just stumbled over a mention of it. I tried it
on a machine here on a mounted device and there was no problem. The affected
machine is in a remote data center, so options are a bit limited.

The kernel crash from today starts like this:
Dec 10 10:30:01 nx10 kernel: Unable to handle kernel paging request at virtual address 8f38df23
Dec 10 10:30:01 nx10 kernel:  printing eip:
Dec 10 10:30:01 nx10 kernel: c019190b
Dec 10 10:30:01 nx10 kernel: *pde = 00000000
Dec 10 10:30:01 nx10 kernel: Oops: 0000 [#1]
Dec 10 10:30:01 nx10 kernel: Modules linked in: ipt_REJECT ipt_limit ipt_state ipt_LOG iptable_filter ip_tables ip_conntrack_ftp ip_conntrack md5 ipv6 autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac 8139too mii ext3 jbd ata_piix libata sd_mod scsi_mod
Dec 10 10:30:01 nx10 kernel: CPU:    0
Dec 10 10:30:01 nx10 kernel: EIP:    0060:[<c019190b>]    Not tainted VLI
Dec 10 10:30:01 nx10 kernel: EFLAGS: 00010282   (2.6.9-55.0.12.EL)
Dec 10 10:30:01 nx10 kernel: EIP is at seq_escape+0x21/0xaa
Dec 10 10:30:01 nx10 kernel: eax: 8f38df23   ebx: c0370260   ecx: d35a9151   edx: d35aa000
Dec 10 10:30:01 nx10 kernel: esi: c518c200   edi: c518c200   ebp: c032f9d9   esp: c63d9f28
Dec 10 10:30:01 nx10 kernel: ds: 007b   es: 007b   ss: 0068
Dec 10 10:30:01 nx10 kernel: Process mv (pid: 16585, threadinfo=c63d9000 task=cee5a1b0)
Dec 10 10:30:01 nx10 kernel: Stack: d35aa000 8f38df23 c0370260 c518c200 dfe08982 00000000 c018e0e3 c03702c0
Dec 10 10:30:01 nx10 kernel:        c518c200 00000000 dfe08982 c019157f 00000151 00000000 00000400 b7fd5000
Dec 10 10:30:01 nx10 kernel:        0000000c 00000000 0000000b 00000000 c0371300 cea00b80 00000400 c63d9fac
Dec 10 10:30:01 nx10 kernel: Call Trace:
Dec 10 10:30:01 nx10 kernel:  [<c018e0e3>] show_vfsmnt+0x28/0xf5
Dec 10 10:30:01 nx10 kernel:  [<c019157f>] seq_read+0x1c3/0x2bd
Dec 10 10:30:01 nx10 kernel:  [<c016c91b>] vfs_read+0xb6/0xe2
Dec 10 10:30:01 nx10 kernel:  [<c016cb30>] sys_read+0x3c/0x62
Dec 10 10:30:01 nx10 kernel:  [<c031b777>] syscall_call+0x7/0xb
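
A call trace like this reads from bottom to top: a read() system call enters vfs_read, then seq_read, then show_vfsmnt, and the fault itself is in seq_escape. The small sketch below pulls the frames out of the syslog lines so they can be listed in call order. The regular expression and the helper name parse_frames are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Call-trace lines copied from the oops above.
TRACE = """\
Dec 10 10:30:01 nx10 kernel:  [<c018e0e3>] show_vfsmnt+0x28/0xf5
Dec 10 10:30:01 nx10 kernel:  [<c019157f>] seq_read+0x1c3/0x2bd
Dec 10 10:30:01 nx10 kernel:  [<c016c91b>] vfs_read+0xb6/0xe2
Dec 10 10:30:01 nx10 kernel:  [<c016cb30>] sys_read+0x3c/0x62
Dec 10 10:30:01 nx10 kernel:  [<c031b777>] syscall_call+0x7/0xb
"""

# Matches "[<address>] symbol+0xoffset/0xsize".
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text):
    """Return (symbol, address, offset, size) tuples in the order they appear."""
    frames = []
    for match in FRAME.finditer(text):
        addr, sym, off, size = match.groups()
        frames.append((sym, int(addr, 16), int(off, 16), int(size, 16)))
    return frames


frames = parse_frames(TRACE)
# Print outermost caller first; address minus offset is the function's start.
for sym, addr, off, size in reversed(frames):
    print(f"{sym:<14} starts at 0x{addr - off:08x}, offset {off} of {size} bytes")
```

Listed this way, the path is a plain read of a seq_file whose entries are produced by show_vfsmnt, the routine that formats mount entries, by the process mv.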


I wonder if I can go back to 2.6.9-34.0.2.EL. Should I expect problems
with other updated packages?

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com

Alfred von Campe

2007-Dec-10 12:40 UTC

[CentOS] unstable kernel after update to CentOS 4.5

Kai:
> Dec  9 04:30:35 nx10 kernel: EXT3-fs error (device hda3):  
> htree_dirblock_to_tree: bad entry in directory
> #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775,  
> rec_len=13621, name_len=100
> Dec  9 04:30:35 nx10 kernel: Aborting journal on device hda3.
> Dec  9 04:30:35 nx10 kernel: ext3_abort called.
> Dec  9 04:30:35 nx10 kernel: EXT3-fs error (device hda3):  
> ext3_journal_start_sb: Detected aborted journal
> Dec  9 04:30:35 nx10 kernel: Remounting filesystem read-only
Updating to 4.5 was just a coincidence.  I believe you have a disk
that's going bad.  I've seen this error three times and it has always
been a bad disk.  Back up what you can and replace the disk.

Alfred

Kai Schaetzl

2007-Dec-10 22:59 UTC

[CentOS] unstable kernel after update to CentOS 4.5

Kai Schaetzl wrote on Mon, 10 Dec 2007 13:23:26 +0100:
> Dec  9 04:30:35 nx10 kernel: EXT3-fs error (device hda3):
> htree_dirblock_to_tree: bad entry in directory
> #1330023: rec_len % 4 != 0 - offset=10264, inode=808542775, rec_len=13621,
> name_len=100
I checked the filesystem in the evening and it's clean. I really doubt
there's anything wrong with the disk. What makes me wonder is that high inode
number. According to df -i that partition has 3145728 inodes in total,
and debugfs stat on inode 808542775 tells me that it doesn't exist.
I don't know how I could access directory #1330023, but I assume it
doesn't exist, just like inode 808542775.
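
One way to look at those numbers, offered as a hedged sketch and not a diagnosis: an ext3 directory entry starts with a 32-bit inode number, a 16-bit record length and an 8-bit name length, all stored little-endian. Packing the three logged values back into that layout shows which bytes the kernel was actually reading. The variable names here are illustrative only.

```python
import struct

# Values from the EXT3-fs error message.
inode, rec_len, name_len = 808542775, 13621, 100
inodes_count = 3145728  # total inodes on the partition, per df -i

# "<IHB" = little-endian u32, u16, u8 with no padding: the first 7 bytes
# of an on-disk directory entry.
raw = struct.pack("<IHB", inode, rec_len, name_len)

print(raw)                      # the bytes as they would sit in the block
print(inode > inodes_count)     # inode number beyond what the filesystem has
```

The seven bytes come out as the printable ASCII string "7b1055d", which hints that the block being parsed held text-like data and not directory entries at that moment. That observation alone does not say whether the wrong data came from the disk, from memory, or from a kernel bug.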

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
