Allen Ziegenfus
2005-Apr-05 16:13 UTC
e2fsck running for hours, printing out lists of numbers -- should I stop it?
I recently rebooted one of my machines and was greeted with a message about one of my ext3 partitions having some errors. I dutifully started e2fsck on the partition, but after a few hours of some useful messages about fixing inodes, it then started printing out lists of ascending numbers which I'm not sure how to interpret.

The first time I noticed this, these numbers were in the 4000000 range. Checking back later I've seen them as high as 16000000, but at some point the counting restarts and I again see 4000000 numbers. The whole screen is filled with these numbers, and there is no other text.

So far this has been going on for about 4 days. I was busy with some other things so I figured I'd just let it keep running. The machine in question is very low end (something like a 120 MHz Pentium) and the partition is fairly large (200 GB). Running top shows me that e2fsck is gobbling up almost all CPU and memory resources.

I figured e2fsck just needed some time to work, but now I'm really curious if it will ever stop or if it would be better to just kill the thing. If I were to stop e2fsck, should I just accept that my data is lost and start over?

As far as the problem itself, my best guess is that it was a mistake on my part. I'm running Debian woody on this box but I have been running a backported kernel to get USB working better.
The backported kernel has this ext3 driver:

Linux version 2.4.26-1-386 (tretkowski at rollcage) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Fri Aug 20 16:36:09 CEST 2004
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide3(34,65), internal journal

However, at one point I forgot to pick the correct kernel at boot time and I ran the standard woody kernel instead which has this ext3 driver:

Linux version 2.4.18-bf2.4 (root at zombie) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Son Apr 14 09:53:28 CEST 2002
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide3(34,65), internal journal

I'm guessing that running the older driver after having used the newer one somehow screwed things up.

The partition in question is my backup drive for the whole network and at night I copy newly archived files to it (using scp) and delete older ones. The night after I booted with the older kernel, the machine had the following in its logs. It must have been when trying to delete some older archives.

Mar 30 02:37:11 musicien kernel: invalid operand: 0000
Mar 30 02:37:11 musicien kernel: CPU: 0
Mar 30 02:37:11 musicien kernel: EIP: 0010:[journal_forget+170/400] Not tainted
Mar 30 02:37:11 musicien kernel: EFLAGS: 00010286
Mar 30 02:37:11 musicien kernel: eax: 00000058 ebx: c3f29ab0 ecx: c1760000 edx: c5e96a20
Mar 30 02:37:11 musicien kernel: esi: c5fc7e00 edi: c3e55a40 ebp: c41be6a0 esp: c1761d2c
Mar 30 02:37:11 musicien kernel: ds: 0018 es: 0018 ss: 0018
Mar 30 02:37:11 musicien kernel: Process rm (pid: 10451, stackpage=c1761000)
Mar 30 02:37:11 musicien kernel: Stack: c02cbcc0 c02cc241 c02cbca0 000004c1 c02cc250 01000000 c3496ec0 c57e1060
Mar 30 02:37:11 musicien kernel: c57e1060 c5fc7e94 c015015b c3496ec0 c3e55a40 01000000 c52be274 c3496ec0
Mar 30 02:37:11 musicien kernel: c42b8e50 01000000 c52be274 c3496ec0 c0151cde c3496ec0 00000000 c57e1060
Mar 30 02:37:11 musicien kernel: Call Trace: [ext3_forget+91/216] [ext3_clear_blocks+250/288] [journal_get_write_access+63/88] [ext3_free_data+200/356] [ext3_free_branches+510/528]
Mar 30 02:37:11 musicien kernel: [ext3_free_branches+200/528] [ext3_free_branches+200/528] [ext3_truncate+199/972] [ext3_truncate+751/972] [journal_start+165/204] [start_transaction+85/128]
Mar 30 02:37:11 musicien kernel: [ext3_delete_inode+0/284] [ext3_delete_inode+159/284] [ext3_delete_inode+0/284] [iput+243/456] [d_delete+76/108] [vfs_unlink+300/348]
Mar 30 02:37:11 musicien kernel: [sys_unlink+165/284] [arp_rcv+1116/1140] [system_call+51/56]
Mar 30 02:37:11 musicien kernel:
Mar 30 02:37:11 musicien kernel: Code: 0f 0b 83 c4 14 90 8d 74 26 00 53 e8 ae 02 00 00 c7 43 14 00

So are the ext3 drivers not backward compatible?

If anyone could give me any ideas, it would be most appreciated.

Allen
Stephen C. Tweedie
2005-Apr-07 15:32 UTC
e2fsck running for hours, printing out lists of numbers -- should I stop it?
Hi,

On Tue, 2005-04-05 at 17:13, Allen Ziegenfus wrote:

> However, at one point I forgot to pick the correct kernel at boot time
> and I ran the standard woody kernel instead which has this ext3 driver:
>
> Linux version 2.4.18-bf2.4 (root at zombie) (gcc version 2.95.4 20011002
> (Debian prerelease)) #1 Son Apr 14 09:53:28 CEST 2002
> EXT3 FS 2.4-0.9.17, 10 Jan 2002 on ide3(34,65), internal journal

Well, that _is_ old --- there have been a fair number of bugfixes over the past 3 years, but nothing in ext3 itself that you'd expect to cause instant data corruption just because you ran it once instead of a later kernel. And I certainly don't know of any incompatibility issues save some involving features only present on later kernels, such as extended attributes --- and for the most part ext3 handles that transparently, anyway.

> The night after I booted with
> the older kernel, the machine had the following in its logs. It must
> have been when trying to delete some older archives.
>
> Mar 30 02:37:11 musicien kernel: invalid operand: 0000
> Mar 30 02:37:11 musicien kernel: CPU: 0
> Mar 30 02:37:11 musicien kernel: EIP: 0010:[journal_forget+170/400]

Anything else in the logs? You just hit a BUG(), and a bug or assert failure message should have been emitted just prior to this.

Such an error in journal_forget() sounds like one of the situations I fixed a couple of years ago, where certain on-disk corruption could cause ext3 to oops internally rather than fail gracefully. But that's not a *cause* of the problem, rather just a less-than-ideal way of responding to it.

Also, with a kernel that old, data corruption problems could be due to something as basic as the old IDE driver not dealing properly with new hardware in your system.

> So are the ext3 drivers not backward compatible?

They should be fully compatible all the way back to the 2.2 kernel (2.0 if you force version 1 superblocks on disk.)

--Stephen
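
[Editor's sketch, not part of the original reply.] The request above is to look at whatever the kernel logged just before the "invalid operand" line. One way to pull that context out of a saved log might look like the following Python. The function name, the marker pattern, and the number of context lines are all assumptions made for illustration; the pattern is based only on the excerpt quoted in this thread, and other kernels may word the message differently.

```python
import re
from collections import deque

# Assumed marker, taken from the log excerpt in this thread
# ("kernel: invalid operand: 0000"); "Oops" is included as a guess
# at another common wording and may not match every kernel.
OOPS_MARKER = re.compile(r"kernel: (invalid operand|Oops):")


def lines_before_oops(lines, context=5):
    """Return (oops_line, preceding_lines) for each oops marker found.

    `lines` is any iterable of log lines; `context` is how many
    lines before each marker to keep.
    """
    recent = deque(maxlen=context)  # rolling window of earlier lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if OOPS_MARKER.search(line):
            hits.append((line, list(recent)))
        recent.append(line)
    return hits
```

To use it, pass an open log file (which file holds kernel messages depends on the syslog configuration of the machine) and print the preceding lines for each hit; any assertion or BUG message should show up there if it was logged at all.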