I appear to have a broken filesystem on this box, to the point where I am probably going to rebuild the system since I have lost trust in it. I was wondering if there was any reasonable postmortem work I could do on it before it gets rebuilt.

History is that the laptop has a much mauled partition set on it. Initially it came with 100% win2k. I then shrunk that and added linux (a RH71 install onto 2 partitions, /boot and /), then decided to get rid of the win2k, and did a set of operations with parted 1.4.14 (running from a boot floppy) to move and grow the root partition (which was ext2 through these operations).

I then installed the ext3 version from sourceforge onto a 2.4.5-ac[78] kernel.

Immediately after boot, a perl process failed to write a file:-

    my $fh = FileHandle->new('>/etc/resolv.conf') || die $!;

gave the following error

    Value too large for defined data type

After that the target file was a zero length file with normal permissions. I removed it and the process ran OK.

A few other oddities were seen: files not accessible from gnuclient when they should be, ie

    ls >/tmp/x
    gnuclient /tmp/x

did not give you what was expected... but it all settled down and seemed to be working...

Then TSM (backup program) started bombing out (it had done some successful backups). The error message was

    ANS1028S Internal program error. Please see your service representative.

which helps little :-)

I decided to force a fsck, so did

    touch /forcefsck

and rebooted.

The fsck threw up many thousands of errors, all of the form

    INODE nnnn i_blocks is 32, should be 16

with the i_blocks & should-be values various multiples of 8.

I suspect that parted has screwed up, and so ext3 has been running on a losing wicket ever since... however even after the fsck (which obviously won't be able to fix everything if parted has broken things), I am still seeing the previous symptoms with TSM and perl.

        Nigel.
-- 
[ Nigel Metheringham           Nigel.Metheringham@InTechnology.co.uk ]
[ Phone: +44 1423 850000                         Fax +44 1423 858866 ]
[ - Comments in this message are my own and not ITO opinion/policy - ]
[ ----- Security is not an add-on -- security is a way of life ----- ]

Hi,

On Wed, Jun 06, 2001 at 11:37:49AM +0100, Nigel Metheringham wrote:

> I decided to force a fsck, so did touch /forcefsck and rebooted
>
> The fsck threw up many thousands of errors, all of the form
> INODE nnnn i_blocks is 32, should be 16
>
> with the i_blocks & should-be values various multiples of 8.

Hmm, that doesn't ring any bells here.

> I suspect that parted has screwed up, and so ext3 has been running on a
> losing wicket ever since... however even after the fsck (which obviously
> won't be able to fix everything if parted has broken things), I am still
> seeing the previous symptoms with TSM and perl

Can you do another forced fsck and see if there is still corruption being caused? That would help a great deal. Also, knowing exactly which checkout from cvs you are using would be helpful.

Finally, can you strace the perl script and send me the breaking bits? A reproducible error like that ought to be easy to find and fix.

What does TSM do for its backups? If it accesses the block device directly (as dump does), that may cause problems.

Cheers,
 Stephen
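
As a lighter-weight complement to a second forced fsck, the following is a minimal sketch (not something posted in the thread) of one way to see whether freshly written files still get a wrong block count. It writes a small file, syncs it, and compares the `st_blocks` value from stat with what the file size implies. The helper names `expected_sectors` and `check_new_file` are hypothetical, and the check assumes a small non-sparse file with no indirect blocks and a filesystem that reports its block size in `f_bsize`.

```python
import os
import tempfile

SECTOR = 512  # st_blocks is reported in 512-byte units on Linux


def expected_sectors(size, fs_block_size):
    """Sectors a small non-sparse file of `size` bytes should occupy.

    Ignores indirect blocks, so it only holds for small files.
    """
    blocks = -(-size // fs_block_size)  # ceiling division
    return blocks * fs_block_size // SECTOR


def check_new_file(directory, size=8192):
    """Write `size` bytes to a fresh file and return (reported, expected)."""
    fs_block_size = os.statvfs(directory).f_bsize
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x" * size)
        os.fsync(fd)  # force the blocks to be allocated before stat
        reported = os.fstat(fd).st_blocks
    finally:
        os.close(fd)
        os.unlink(path)
    return reported, expected_sectors(size, fs_block_size)


if __name__ == "__main__":
    reported, expected = check_new_file(tempfile.gettempdir())
    print(f"st_blocks reported {reported}, expected about {expected}")
```

If the reported value is consistently double the expected one on the affected filesystem, that would suggest the running kernel is still miscounting, rather than the damage being left over from the parted operations.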

Nigel Metheringham wrote:
>
> I then installed the ext3 version from sourceforge onto a 2.4.5-ac[78]
> kernel.

That's dangerous. The -ac kernels have a radically different set of quota code.

One may ask "who cares about quotas?" Well, in -ac, the i_blocks accounting (which is unrelated to quotas) is hidden inside the quota calls DQUOT_ALLOC_BLOCK, etc. However in Linus' kernel, the i_blocks accounting is open-coded in ext[23].

So a simple merge of the 2.4.5 ext3 onto a -ac kernel will double-account i_blocks increments and decrements.

Merging the 2.4.5 ext3 onto -ac is tricky: you need to look at the diff between the two kernels' ext2s and apply those by hand as well.
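
The double accounting can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `quota_alloc_block` stands in for the -ac quota hook, `fs_alloc_open_coded` stands in for the open-coded update in filesystem code written for Linus' tree, and both names are hypothetical. It also assumes 4096-byte filesystem blocks, i.e. 8 sectors of 512 bytes per block, which is consistent with the "multiples of 8" in the fsck output but is not stated in the thread.

```python
SECTORS_PER_BLOCK = 8  # assumed: 4096-byte block / 512-byte sector


class Inode:
    def __init__(self):
        self.i_blocks = 0  # counted in 512-byte sectors


def quota_alloc_block(inode, nblocks):
    """Stand-in for the -ac quota hook, which also bumps i_blocks."""
    inode.i_blocks += nblocks * SECTORS_PER_BLOCK


def fs_alloc_open_coded(inode, nblocks):
    """Stand-in for filesystem code that updates i_blocks itself."""
    inode.i_blocks += nblocks * SECTORS_PER_BLOCK


def allocate(inode, nblocks, naive_merge):
    quota_alloc_block(inode, nblocks)  # -ac: accounting happens in here
    if naive_merge:
        # Unmodified ext3 from Linus' tree counts the same blocks again.
        fs_alloc_open_coded(inode, nblocks)


correct = Inode()
allocate(correct, 2, naive_merge=False)

broken = Inode()
allocate(broken, 2, naive_merge=True)

print(f"i_blocks is {broken.i_blocks}, should be {correct.i_blocks}")
# prints "i_blocks is 32, should be 16"
```

For a two-block file the model gives the same pair of numbers as the fsck message quoted earlier, with the on-disk count twice the true one.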