Hello,

I have a 1.3TB ext3 filesystem that has been in service for about 3 months. About 6 days ago the Emulex fibrechannel controller logged a SCSI error and the filesystem changed to RO.

It appears that the filesystem instantly changes to RO and prevents the journal from working, therefore invalidating the filesystem.

The filesystem was unmounted and a remount was attempted. The mount failed due to errors and an fsck came up with errors.

Top output looks like this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4562 root      25   0  780m 214m  236 R 99.9 42.6  6211:44 fsck.ext3

The fsck has been running for 6 days without printing anything to the screen. It seems to be working, as an strace produces the following:

  Process 4562 attached - interrupt to quit
  _llseek(5, 5979127808, [5979127808], SEEK_SET) = 0
  read(5, "\377\276\340oY\\i\17\346N\231\370\216\v\276\361\255\245"..., 4096) = 4096
  _llseek(5, 299281825792, [299281825792], SEEK_SET) = 0
  write(5, "\323\265-Q\33<\331\216\325\304U\3V\221\213\301e\32Q\220"..., 4096) = 4096
  _llseek(5, 5979131904, [5979131904], SEEK_SET) = 0
  read(5, "\327\347\2435\210\253^\222H\253\302\331\360\245\323\352"..., 4096) = 4096
  _llseek(5, 299281829888, [299281829888], SEEK_SET) = 0
  write(5, "\242\355\370A\2759Q\251\31>\254\240\301\34\320\226J5\22"..., 4096) = 4096
  _llseek(5, 5979136000, [5979136000], SEEK_SET) = 0
  read(5, "X\220ik\266\312\306\\ \266\32\220A\362\3319\250\27&\f\357"..., 4096) = 4096
  _llseek(5, 299281833984, [299281833984], SEEK_SET) = 0
  write(5, "U\352\255\303`\262\372h\242\275\312\333_\352\3\322\313"..., 4096) = 4096
  _llseek(5, 5979140096, [5979140096], SEEK_SET) = 0
  read(5, "\33\265#\367\332{\250Bj\215\277[\313\201\23\340\223\216"..., 4096) = 4096
  _llseek(5, 299281838080, [299281838080], SEEK_SET) = 0
  write(5, "\313-\234z\236\253/\3\360\232\222\237p\t5L\353\v\363t%"..., 4096) = 4096
  Process 4562 detached

How long should I let the fsck run?
Regards
Darryl Bond
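As a rough aid to reading the strace output in the message above, the seek offsets can be converted to block numbers. This is only a sketch: the 4096-byte block size is an assumption taken from the size of the reads and writes in the trace, and the helper name `seek_blocks` is made up for illustration. It shows that the reads and the writes each advance by exactly one block per step, which suggests steady forward progress but does not say which e2fsck pass is running or how much work remains.

```python
import re

# Assumption: 4096 matches the size of each read()/write() in the trace.
BLOCK_SIZE = 4096

# The first four _llseek calls, copied from the strace output.
TRACE = """\
_llseek(5, 5979127808, [5979127808], SEEK_SET) = 0
_llseek(5, 299281825792, [299281825792], SEEK_SET) = 0
_llseek(5, 5979131904, [5979131904], SEEK_SET) = 0
_llseek(5, 299281829888, [299281829888], SEEK_SET) = 0
"""


def seek_blocks(trace, block_size=BLOCK_SIZE):
    """Return (block_number, remainder) for every _llseek offset in the trace."""
    offsets = [int(m) for m in re.findall(r"_llseek\(\d+, (\d+),", trace)]
    return [divmod(offset, block_size) for offset in offsets]


blocks = seek_blocks(TRACE)
for block, remainder in blocks:
    # A remainder of 0 means the offset is aligned to the assumed block size.
    print(block, remainder)
```

In the trace the calls alternate, so the first and third entries are the offsets that precede a read and the second and fourth are the offsets that precede a write.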
On Mon, May 23, 2005 at 08:53:28AM +1000, Darryl Bond wrote:
> Hello,
> I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
> About 6 days ago the Emulex fibrechannel controller logged a SCSI error
> and the filesystem changed to RO.
> It appears that the filesystem instantly changes to RO and prevents the
> journal from working, therefore invalidating the filesystem.
> The filesystem was unmounted and a remount was attempted. The mount
> failed due to errors and an fsck came up with errors.

What version of e2fsck are you using, and what kernel messages were displayed when the filesystem was remounted read-only? What version of the kernel/distribution are you using? What messages were printed by e2fsck?

- Ted
Darryl Bond <dbond at nrggos.com.au> writes:

> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4562 root 25 0 780m 214m 236 R 99.9 42.6 6211:44 fsck.ext3

It looks like fsck.ext3 has eaten all of your memory. Your system is probably thrashing. Buy more memory.

--
Per Andreas Buer
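The memory claim can be sanity-checked from the quoted top line alone. The sketch below assumes that %MEM is the resident size (RES) as a share of physical RAM, and the helper `parse_mib` is a made-up name that only handles the `m` suffix seen here. Under that assumption the machine has roughly 500 MB of RAM, so the 780m virtual size would not fit in memory. The part that is not resident may be swapped out or may never have been touched, so this on its own does not prove the system is thrashing.

```python
def parse_mib(field):
    """Convert a top size field such as '780m' to a number of MiB.

    Only the 'm' suffix is handled, since that is all the quoted line uses.
    """
    if not field.endswith("m"):
        raise ValueError("unsupported size field: " + field)
    return float(field[:-1])


# Figures copied from the quoted top output for fsck.ext3.
virt = parse_mib("780m")   # total virtual size
res = parse_mib("214m")    # resident (in-RAM) size
mem_percent = 42.6         # %MEM column

# Assumption: %MEM = RES / physical RAM * 100.
estimated_total_ram = res / (mem_percent / 100.0)

# Virtual memory that is not currently resident in RAM.
not_resident = virt - res

print(round(estimated_total_ram), not_resident)
```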
Perhaps, but should I stop it? It doesn't seem to be thrashing. The box is still quite responsive. After 8 days it is still working.

If I stop it, will I have a mountable filesystem that I can get as much as possible off? I have ordered some 400G disks to try to recover as much as possible.

Regards

Per Andreas Buer wrote:
>Darryl Bond <dbond at nrggos.com.au> writes:
>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 4562 root 25 0 780m 214m 236 R 99.9 42.6 6211:44 fsck.ext3
>
>It looks like fsck.ext3 has eaten all of your memory. Your system is
>probably thrashing. Buy more memory.
On May 23 Darryl Bond wrote:
> I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
> About 6 days ago the Emulex fibrechannel controller logged a SCSI error and
> the filesystem changed to RO.
> It appears that the filesystem instantly changes to RO and prevents the
> journal from working, therefore invalidating the filesystem.
> The filesystem was unmounted and a remount was attempted. The mount failed due
> to errors and an fsck came up with errors.
>
> Top output looks like this:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4562 root 25 0 780m 214m 236 R 99.9 42.6 6211:44 fsck.ext3

I'm seeing something rather similar, and not for the first time :-\

The MD layer failed a drive (on a 3ware Escalade card), but somehow the fs got wind of this and aborted the journal.

My fsck is on an Opteron. It's entirely CPU-bound, occupying about 1.4G of my 2G RAM, and stuck in pass 2 six days in. My strace isn't picking up any calls.

My question is basically the same as Darryl's. How long do I give it? (I did SIGKILL an earlier invocation as I hadn't passed the "-y" option.)

As my volume is all backup data, I'm willing to poke at it with debugfs if people on this list think it's worth a try. Maybe I can mark it as not having errors, and try to mount it? Or maybe there's a way of making fsck less thorough?

I don't like the idea of not having backups for more than a week. What I did last time this happened was to run mke2fs and start again from scratch. Can I do better this time?

Matt