Over the last two days we've been seeing a fair bit of this: ---- # ls -laR > /dev/null ... ls: ./server2/b/user/bxyz/392.: Input/output error ---- This is with the latest htree patches applied to 2.4.19, and latest e2fsprogs-test, on a dual AMD system, with 5x73GB SCSI drives on a MegaRAID controller. We're using the gcc 2.96 that comes with RH7.3. esfsck shows "Inodes that were part of a corrupted orphan linked list found." We've been hitting this computer pretty hard, migrating data across to it from 4 servers simultaneously using rsync. Of around a million files or so, 250 developed this problem. Here's some more diagnostics: ---- [root@server5 data]# ls -laR > /dev/null ... ls: ./server2/b/user/bxyz/392.: Input/output error ... [root@server5 all]# cd /var/data/server2/b/user/bxyz/ [root@server5 bxyz]# ls -l 391. -rw------- 1 cyrus cyrus 66274 Mar 22 2002 391. [root@server5 bxyz]# ls -l 392. ls: 392.: Input/output error [root@server5 bxyz]# debugfs /dev/sda2 debugfs 1.30-WIP (30-Sep-2002) debugfs: cd /var/data/server2/b/user/bxyz/ debugfs: ls -l 14991599 40700 (2) 504 0 20480 4-Oct-2002 21:00 . 34226198 40755 (2) 504 12 32768 6-Oct-2002 20:33 .. ... 14992584 100600 (1) 504 505 66274 22-Mar-2002 09:32 391. ... 14992585 100600 (1) 504 505 0 6-Oct-2002 17:52 392. ... debugfs: stat 392. Inode: 14992585 Type: regular Mode: 0600 Flags: 0x0 Generation: 2449155561 User: 504 Group: 505 Size: 0 File ACL: 0 Directory ACL: 0 Links: 0 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 atime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 mtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 dtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 BLOCKS: debugfs: stat 391. Inode: 14992584 Type: regular Mode: 0600 Flags: 0x0 Generation: 2449155559 User: 504 Group: 505 Size: 66274 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 144 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 atime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 mtime: 0x3c9b4e7c -- Fri Mar 22 09:32:12 2002 BLOCKS: (0-11):29991107-29991118, (IND):29991119, (12-16):29991120-29991124 TOTAL: 18 debugfs: ncheck 14992584 Inode Pathname 14992584 /var/data/server2/b/user/bxyz/391. debugfs: ncheck 14992585 Inode Pathname 14992585 /var/data/server2/b/user/bxyz/392. debugfs: stat 392. Inode: 14992585 Type: regular Mode: 0600 Flags: 0x0 Generation: 2449155561 User: 504 Group: 505 Size: 0 File ACL: 0 Directory ACL: 0 Links: 0 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 atime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 mtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 dtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 BLOCKS: debugfs: testi 392. Inode 14992585 is not in use debugfs: testi 391. Inode 14992584 is marked in use debugfs: quit [root@server5 bxyz]# ls -l 391. -rw------- 1 cyrus cyrus 66274 Mar 22 2002 391. [root@server5 bxyz]# ls -l 392. ls: 392.: Input/output error [root@server5 bxyz]# strace ls -l 392. ... lstat64("392.", 0x8053674) = -1 EIO (Input/output error) write(2, "ls: ", 4ls: ) = 4 write(2, "392.", 4392.) = 4 write(2, ": Input/output error", 20: Input/output error) = 20 write(2, "\n", 1 ) = 1 _exit(1) = ? ---- Any ideas on what's causing this? e2fsck causes the problem files to be removed. For now we've disabled directory indexing--if the problem continues after doing this, I'll update the list with the details. BTW, now that I've disabled directory indexing, will folders with the relevent flag already set still use hashed indexes?
On Oct 07, 2002 05:07 +0000, JP Howard wrote:> ---- > # ls -laR > /dev/null > ... > ls: ./server2/b/user/bxyz/392.: Input/output error > ---- > > esfsck shows "Inodes that were part of a corrupted orphan linked list > found." > > We've been hitting this computer pretty hard, migrating data across to it > from 4 servers simultaneously using rsync. Of around a million files or > so, 250 developed this problem.It appears that the inode was marked as deleted, but it was not unlinked from the directory tree. It would be important to know whether this file should or should not exist at this time (i.e. was it ever deleted)?> debugfs: stat 392. > Inode: 14992585 Type: regular Mode: 0600 Flags: 0x0 Generation: > 2449155561 > User: 504 Group: 505 Size: 0 > File ACL: 0 Directory ACL: 0 > Links: 0 Blockcount: 0 > Fragment: Address: 0 Number: 0 Size: 0 > ctime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > atime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 > mtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > dtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > BLOCKS:The dtime is consistent with a file that was deleted, and not one that is on an orphan list.> debugfs: ncheck 14992585 > Inode Pathname > 14992585 /var/data/server2/b/user/bxyz/392.This shows the inode is still in the directory tree. If it was supposed to have been deleted (7 hours after it was created, it appears), then it would definitely point to a hashing or other lookup problem in the htree code (either it was inserted into the wrong hash block or subsequent block splits placed it into the wrong block, or it could not be found in the leaf block at delete time). If it was never supposed to have been deleted it would likely be a different sort of problem. I can see from the size of the parent directory (not quoted, but 20kB) that this would be an htree directory.> debugfs: testi 392. > Inode 14992585 is not in useSo, the inode was also marked free in the inode bitmap, further indication that it was supposed to have been deleted.> [root@server5 bxyz]# ls -l 392. > ls: 392.: Input/output errorThe -EIO error is most likely caused by the fact that any operations on this deleted inode are from the "is_bad_inode()" VFS ops, which all return -EIO for every kind of operation.> BTW, now that I've disabled directory indexing, will folders with the > relevent flag already set still use hashed indexes?No, they will be accessible via the normal linear-search methods in ext3 though. Depending on the size of your directories this may or may not be "acceptible performance", but it is usually better than not working at all - another reason I'm glad we made the effort for the htree code to be compatible with non-htree kernels. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
On Sun, 6 Oct 2002 23:37:27 -0600, "Andreas Dilger" <adilger@clusterfs.com> said:> On Oct 07, 2002 05:07 +0000, JP Howard wrote: > > ---- > > # ls -laR > /dev/null > > ... > > ls: ./server2/b/user/bxyz/392.: Input/output error > > ---- > > > > esfsck shows "Inodes that were part of a corrupted orphan linked list > > found." > > > > We've been hitting this computer pretty hard, migrating data across to it > > from 4 servers simultaneously using rsync. Of around a million files or > > so, 250 developed this problem. > > It appears that the inode was marked as deleted, but it was not unlinked > from the directory tree. It would be important to know whether this > file should or should not exist at this time (i.e. was it ever deleted)? >It was not intentionally deleted--it should still be in the file system.> I can see from the size of the parent directory (not quoted, but 20kB) > that this would be an htree directory. >Yes, it was.> No, they will be accessible via the normal linear-search methods in ext3 > though. Depending on the size of your directories this may or may not > be "acceptible performance", but it is usually better than not working > at all - another reason I'm glad we made the effort for the htree code > to be compatible with non-htree kernels. >It's not really acceptable performance--that's why we've been trying to get htree working. Large Cyrus mailboxes are far too slow currently. But yes, I'm glad we've at least got something that works! Anyway, here's an alternate hypothesis:... Some file in the directory was supposed to be deleted. It looked by the directory entry, but got it wrong somehow, but in such a way that it removed the wrong file, but removed the right directory entry. So I can see the problem being produced like this: 1. I have a directory with 2 files, 'a' and 'b' 2. I delete 'a' 3. It removes the 'a' from the directory list, but for some reason, deletes the inode for 'b' Would this result in what we're seeing, namely that 'b' appears in the directory, but it's inode says that it's deleted and free? So, what can we do from here to get htree working? Is there anything we could add to the htree patch or to the e2sfprogs code to make it safer? Thanks for your help!
Hi, On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote:> Over the last two days we've been seeing a fair bit of this:> ---- > # ls -laR > /dev/null > ... > ls: ./server2/b/user/bxyz/392.: Input/output error > ----What else is in the kernel logs? That is a sign of a disk failure, usually (though it doesn't appear to be in this case.)> This is with the latest htree patches applied to 2.4.19, and latest > e2fsprogs-test, on a dual AMD system, with 5x73GB SCSI drives on a > MegaRAID controller. We're using the gcc 2.96 that comes with RH7.3. > > esfsck shows "Inodes that were part of a corrupted orphan linked list > found."That can be a harmless side-effect of an older version of e2fsck. If e2fsck finds a partial truncate which needs to be completed, it used to leave the dtime field of the inode intact rather than clearing it. That shows up on the *next* forced fsck as the error you saw. If you used an older fsck on the last reboot, that would explain this message.> [root@server5 data]# ls -laR > /dev/null > ls: ./server2/b/user/bxyz/392.: Input/output error> debugfs: stat 392. > Inode: 14992585 Type: regular Mode: 0600 Flags: 0x0 Generation: > 2449155561 > User: 504 Group: 505 Size: 0 > File ACL: 0 Directory ACL: 0 > Links: 0 Blockcount: 0 > Fragment: Address: 0 Number: 0 Size: 0 > dtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002Empty file, dtime set, links==0 --- it's definitely a deleted file.> debugfs: testi 392. > Inode 14992585 is not in useDeleted file confirmed.> Any ideas on what's causing this? e2fsck causes the problem files to be > removed. For now we've disabled directory indexing--if the problem > continues after doing this, I'll update the list with the details.Thanks, we need that to work out where the problem might be.> BTW, now that I've disabled directory indexing, will folders with the > relevent flag already set still use hashed indexes?How did you disable it? If you switched to an older, non-htree kernel, then any directories you modify will get their dir index bit cleared, and you'll lose the index on those dirs. But directories you don't modify retain the index and if you reboot into an htree kernel, you can use those indexes again. If you just cleared the dir index bit with htree, then the htree data structures will be completely ignored, but you can recreate them with fsck. --Stephen
On Mon, 7 Oct 2002 09:56:27 +0100, "Stephen C. Tweedie" <sct@redhat.com> said:> On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote: > > esfsck shows "Inodes that were part of a corrupted orphan linked list > > found." > > That can be a harmless side-effect of an older version of e2fsck. If > e2fsck finds a partial truncate which needs to be completed, it used > to leave the dtime field of the inode intact rather than clearing it. > That shows up on the *next* forced fsck as the error you saw. If you > used an older fsck on the last reboot, that would explain this > message. >OK, I had a feeling that was unrelated.> > debugfs: testi 392. > > Inode 14992585 is not in use > > Deleted file confirmed. >However, this file shouldn't have shown as deleted. It looks a lot like the wrong inode got changed when some *other* file was deleted...> > Any ideas on what's causing this? e2fsck causes the problem files to be > > removed. For now we've disabled directory indexing--if the problem > > continues after doing this, I'll update the list with the details. > > Thanks, we need that to work out where the problem might be. >Redoing the same thing with dir indexing off has not replicated the problem. I've run it twice as long as yesterday, with no errors--certainly seems like a htree issue. Regarding the kernel messages, we saw exactly the same as this gentleman: ---- On Sun, 6 Oct 2002 11:00:49 -0400, "Douglas J Hunley" <doug@hunley.homeip.net> said:> In additin to the 'brelse' issue I mentioned a few days back, I'm seeing > new > messages in my logs about ext3 trying to unlink non-existant files. any > ideas? log mesages are below: > Oct 5 19:34:40 linux-sxs kernel: EXT3-fs warning (device sd(8,19)): > ext3_unlink: Deleting nonexistent file (1433981), 0 > Oct 5 19:38:32 linux-sxs kernel: VFS: brelse: Trying to free free buffer----> If you just cleared the dir index bit with htree, then the htree data > structures will be completely ignored, but you can recreate them with > fsck. >That's what I did. I'm now completing our production server migration without dir indexing, but am creating a loopback filesystem to try to replicate this problem so we can fix it and use htree soon. I think we'll need it to get the performance we've budgeted for on our new servers. If we can replicate the process, what other diagnostics should we provide? Is there any additional telemetry we can create? When we turn on dir indexing again, we'll need to run fsck -D because the filesystem will already be full of data. Do we need to do that on an unmounted filesystem? How long might we expect it to take on around ten million files in a 5-disk RAID 5 array--minutes, hours, or days? We'll do some timing on our loopback filesystem, but it won't be a very true guess because the machine will be pretty loaded...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 JP Howard spewed electrons into the ether that resembled:> Over the last two days we've been seeing a fair bit of this: > > ---- > # ls -laR > /dev/null > ... > ls: ./server2/b/user/bxyz/392.: Input/output errorI'm also seeing this. I'm also seeing entries in /dev that are being labelled as "duplicates" whenever tripwire runs. this is in addition to the 'brelse' and unlik_fail messages I posted earlier. kernel 2.4.19 with htre patch #4. gcc 2.95.3 - -- Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778 Admin: Linux StepByStep - http://www.linux-sxs.org and http://jobs.linux-sxs.org panic("aha1740.c"); /* Goodbye */ 2.2.16 /usr/src/linux/drivers/scsi/aha1740.c -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) iD8DBQE9oZmYSrrWWknCnMIRAoLMAJ4uTit+EKAzyAi7MbJygm/2ECxg8QCfWXc7 SnBz0P2OJ0C16wqBPOizLgI=iaBc -----END PGP SIGNATURE-----
On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote:> Over the last two days we've been seeing a fair bit of this: > > ---- > # ls -laR > /dev/null > ... > ls: ./server2/b/user/bxyz/392.: Input/output error > ----You said you were using the latest versions, but just to confirm, you *are* using patch-ext3-dxdir-2.4.19-4 and e2fsprogs version 1.30-WIP-0930, correct? - Ted
On Mon, 7 Oct 2002 10:40:35 -0400, "Theodore Ts'o" <tytso@mit.edu> said:> On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote: > > Over the last two days we've been seeing a fair bit of this: > > > > ---- > > # ls -laR > /dev/null > > ... > > ls: ./server2/b/user/bxyz/392.: Input/output error > > ---- > > You said you were using the latest versions, but just to confirm, you > *are* using patch-ext3-dxdir-2.4.19-4 and e2fsprogs version > 1.30-WIP-0930, correct? >Confirmed.
On Mon, 7 Oct 2002 10:51:25 -0400, "Theodore Ts'o" <tytso@mit.edu> said:> That's a possible theory; but in order for that to be true, not only > would you be left with a directory entry pointing at a deleted inode, > you would also have an inode with either no directory entry pointed at > it, or if the inode was hard-linked, an incorrect reference count. > > So the e2fsck transcript should also have showed an unreferenced inode > which it would offer to move to the lost+found directory, or an > indication of some inode with an incorrect reference count. (i.e., > the inode refcount indcates that there should be 3 directory entries > pointing at it, but there are only two). >Yes, I've just looked at our fsck output, and I see this: ---- Inode 25723211 ref count is 1, should be 2. Inode 25723438 ref count is 1, should be 2. Unattached inode 25821969 Unattached inode 25821974 ---- Quite a few of those occured. Unfortunately I can't provide more info because the filesystem is now clean. Also, we reran the same commands on a loopback filesystem overnight and failed to replicate the problem. However, this was just one rsync running, whereas when we had the problem there were 4 simultaneously, so if the cause was a race condition or load problem, we may not have triggered it.
On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote:> Over the last two days we've been seeing a fair bit of this: > > ---- > # ls -laR > /dev/null > ... > ls: ./server2/b/user/bxyz/392.: Input/output error > ----Today, I saw a similar thing: ls shows a file ls -l doesn't show it, and ls -l filename gives an I/o error.> debugfs: stat 392. > Inode: 14992585 Type: regular Mode: 0600 Flags: 0x0 Generation: > 2449155561 > User: 504 Group: 505 Size: 0 > File ACL: 0 Directory ACL: 0 > Links: 0 Blockcount: 0 > Fragment: Address: 0 Number: 0 Size: 0 > ctime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > atime: 0x3da05967 -- Sun Oct 6 10:40:23 2002 > mtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > dtime: 0x3da0be92 -- Sun Oct 6 17:52:02 2002 > BLOCKS:Our debugfs output is suspiciously similar: Inode: 28332642 Type: regular Mode: 0000 Flags: 0x0 Generation: 0 User: 503 Group: 100 Size: 0 File ACL: 0 Directory ACL: 0 Links: 0 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x3eff64e1 -- Mon Jun 30 00:14:57 2003 atime: 0x3efffe54 -- Mon Jun 30 11:09:40 2003 mtime: 0x3eff64e1 -- Mon Jun 30 00:14:57 2003 dtime: 0x3ef38040 -- Fri Jun 20 23:44:32 2003 BLOCKS:> Any ideas on what's causing this? e2fsck causes the problem files to be > removed. For now we've disabled directory indexing--if the problem > continues after doing this, I'll update the list with the details.No idea, but on our server it happens without HTREE. It's a plain 2.4.21, but until yesterday, we had 2.4.18, so the corruption could have been caused by the older kernel. It's a 540GB ext3 partition on top of LVM, and has been resized a few times with e2fsadm. Jan