I recently set up a new system to run BackupPC on CentOS 5 with the archive stored on a RAID1 of 750 GB SATA drives, created with 3 members with one specified as "missing". Once a week I add the 3rd partition, let it sync, then remove it. I've had a similar system working for a long time using a firewire drive as the 3rd member, so I don't think the RAID setup is the cause of the problem. I may have had problems with the drive power connectors initially, but I think that is fixed now and I can't see any hardware errors being logged (the system/log files are on different drives).

About once a week, I get an error like this, and the partition switches to read-only.

---
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #869973: directory entry across blocks - offset=0, inode=3915132787, rec_len=42464, name_len=11
Feb 24 04:48:20 linbackup1 kernel: Aborting journal on device md3.
Feb 24 04:48:20 linbackup1 kernel: ext3_abort called.
Feb 24 04:48:20 linbackup1 kernel: EXT3-fs error (device md3): ext3_journal_start_sb: Detected aborted journal
Feb 24 04:48:20 linbackup1 kernel: Remounting filesystem read-only
Feb 24 04:48:33 linbackup1 kernel: EXT3-fs error (device md3): htree_dirblock_to_tree: bad entry in directory #4212181: rec_len % 4 != 0 - offset=0, inode=4054525677, rec_len=1183, name_len=121
----

'fsck -y' seems to fix it up, but it keeps happening. Is this likely to be leftover cruft from the hardware issues, or are there problems in the ext3/raid1/sata drivers? Because of the way BackupPC stores data, with millions of hardlinks in the archive, it isn't really practical to copy it off, reformat, and start over.

--
Les Mikesell
lesmikesell at gmail.com

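For anyone reading the log entries above: the two "bad entry" messages name consistency checks on a directory entry's record length. A minimal sketch of just those two checks, applied to the logged values, might look like the following. The helper name check_dirent is hypothetical, the 4096-byte block size is an assumption (the post does not state it), and the kernel performs further checks that are not reproduced here.

```shell
# check_dirent is a hypothetical helper. It mirrors only the two checks named
# in the logged messages, not the kernel's full directory-entry validation.
# BLOCK_SIZE=4096 is an assumption; the post does not give the block size.
BLOCK_SIZE=${BLOCK_SIZE:-4096}

check_dirent() {
    offset=$1
    rec_len=$2
    if [ $(( rec_len % 4 )) -ne 0 ]; then
        # Record lengths are expected to be a multiple of 4 bytes.
        echo "rec_len % 4 != 0"
    elif [ $(( offset + rec_len )) -gt "$BLOCK_SIZE" ]; then
        # An entry must end inside the block it starts in.
        echo "directory entry across blocks"
    else
        echo "ok"
    fi
}

check_dirent 0 42464   # offset and rec_len from the first logged error
check_dirent 0 1183    # offset and rec_len from the second logged error
```

Under that block-size assumption, the first entry fails the block-boundary check and the second fails the alignment check, matching the messages in the log.
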
On Mon, 2008-02-25 at 14:04 -0600, Les Mikesell wrote:
> 'fsck -y' seems to fix it up, but it keeps happening. Is this likely to
> be leftover cruft from the hardware issues or are there problems in
> ext3/raid1/sata drivers? The way backuppc stores data with millions of
> hardlinks in the archive it isn't really practical to copy it off,
> reformat, and start over.

If you use cpio, it can handle the hard links intelligently, IIRC. That may make this more feasible.

Plus you can specify such things as -depth to the find command feeding cpio, so that even directories end up with good dates. You can also suppress atime updates, making it both faster and non-intrusive.

HTH
--
Bill

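A minimal sketch of the find-plus-cpio copy described above, assuming GNU find and GNU cpio. The function name copy_tree and its two arguments are placeholders for illustration, not anything from the original setup.

```shell
# copy_tree is a hypothetical helper: copy the tree under $1 into $2 using
# cpio pass-through mode, which keeps hard-linked files linked in the copy.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Resolve the target to an absolute path, since we cd into the source.
    dst_abs=$(cd "$dst" && pwd) || return 1
    (
        cd "$src" || exit 1
        # find -depth lists a directory's contents before the directory
        # itself, so cpio sets each directory's mtime after filling it.
        # cpio: -p pass-through, -d create directories as needed,
        #       -m preserve modification times, -0 NUL-terminated names.
        find . -depth -print0 | cpio -0 -pdm --quiet "$dst_abs"
    )
}
```

It would be called as `copy_tree /old/archive /new/archive`, with both paths being placeholders. GNU cpio also has `-a` to reset access times on the files it reads, and mounting the source with noatime is another way to avoid atime updates; which of these was meant above is not stated.
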
Les Mikesell <lesmikesell at gmail.com> writes:
> 'fsck -y' seems to fix it up, but it keeps happening. Is this likely
> to be leftover cruft from the hardware issues or are there problems
> in ext3/raid1/sata drivers? The way backuppc stores data with
> millions of hardlinks in the archive it isn't really practical to
> copy it off, reformat, and start over.

Maybe a memory problem:
http://thread.gmane.org/gmane.comp.file-systems.ext3.user/3457/focus=3459

--
Nicolas