Can you save an o2image of the volume while it is in that state?
We'll need it for analysis.
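
In case it helps, here is a rough sketch of how the capture could be scripted so the image is taken as soon as the error appears and before fsck.ocfs2 changes anything. It assumes the basic "o2image <device> <image-file>" form of the tool, which copies the file system metadata only. The helper names, the output directory and the file naming scheme are made up for illustration and are not part of ocfs2-tools.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_o2image_cmd(device, image_file):
    """Return the argv for a metadata-only image of `device`.

    Assumes the plain `o2image <device> <image-file>` invocation.
    """
    return ["o2image", str(device), str(image_file)]


def image_name(device, out_dir, now=None):
    """Build a timestamped image path (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(out_dir) / f"{Path(device).name}-{stamp}.o2image"


def save_metadata_image(device, out_dir):
    """Run o2image and return the path of the image it wrote.

    Raises subprocess.CalledProcessError if o2image exits non-zero.
    """
    target = image_name(device, out_dir)
    subprocess.run(build_o2image_cmd(device, target), check=True)
    return target
```

For example, save_metadata_image("/dev/drbd5", "/var/tmp") would run o2image against the device from your report and write the image under /var/tmp; pick an output directory on a different file system than the affected volume.
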
On 09/16/2011 05:41 AM, Andre Nathan wrote:
> Hello
>
> For a while I had seen errors like this in the kernel logs:
>
> OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group
> descriptor #69084874 has bad chain 126
> File system is now read-only due to the potential of on-disk
> corruption. Please run fsck.ocfs2 once the file system is unmounted.
>
> This always happened on the same device, and whenever it happened I ran
> fsck.ocfs2 -fy /dev/drbd5, which showed messages like these:
>
> [GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have
> 9893 free bits which is more than 9886 bits indicated by the bitmap.
> Drop its free bit count down to the total? y
> [CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits
> marked free out of 1516032 total bits but the block groups in the
> chain have 1264706 free out of 1516032 total. Fix this by updating
> the chain record? y
> [CHAIN_GROUP_BITS] Allocator inode 11 has 79407510 bits marked used
> out of 365955414 total bits but the chains have 79407911 used out of
> 365955414 total. Fix this by updating the inode counts? y
> [INODE_COUNT] Inode 69085510 has a link count of 0 on disk but
> directory entry references come to 1. Update the count on disk to
> match? y
>
> As time passed, the frequency of these issues started to increase, and
> the last time it happened, I decided to run fsck twice in a row, and was
> surprised to see it showed the same messages in both runs. It seems it
> was unable to fix the problem.
>
> I identified the files corresponding to the inodes using debugfs.ocfs2
> and copied them to a new place, and then moved the copy over the
> original file, in order to recreate the inodes. Whenever I did that for
> one inode, the error above happened and the filesystem became read-only,
> so I had to umount/mount the volume again in order to be able to write
> to it again.
>
> After doing this, I ran fsck.ocfs2 -fy again twice, and no errors were
> reported. Since then I haven't seen this problem again.
>
> I'm running kernel 2.6.35 and ocfs2-tools 1.6.4.
>
> Has anyone else seen an issue like that?
>
> Thanks
> Andre
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users