khaije rock
2009-May-25 12:06 UTC
[Ocfs2-users] fsck fails & volume mount fails, is my data lost?
Hi, I hope it's appropriate for me to post my issue to this list. Thanks in advance for any help!

I don't know exactly what the underlying cause is, but here is what it looks like:

- mount the filesystem
- cd into the directory with no errors, however
- the shell hangs when I attempt to 'ls' or interact with the data in any way.

I've found that when running fsck.ocfs2 against the block device (it's a logical volume using LVM) it completes successfully and reports the following:

khaije at chronovore:~$ sudo fsck /dev/vg.chronovore/lv.medea.share._multimedia_store
fsck 1.41.3 (12-Oct-2008)
Checking OCFS2 filesystem in /dev/vg.chronovore/lv.medea.share._multimedia_store:
  label:              lv.medea.share._multimedia_store
  uuid:               28 f3 65 1c 1d 04 4e 28 af f0 37 7f 30 13 fc 38
  number of blocks:   65536000
  bytes per block:    4096
  number of clusters: 65536000
  bytes per cluster:  4096
  max slots:          4
o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 1
o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0
/dev/vg.chronovore/lv.medea.share._multimedia_store is clean.  It will be checked after 20 additional mounts.

The command prints this output and returns control to the shell. As you can see, it indicates that the 'journal dirty' flag is set for slot 0, which is the host machine. You'll notice that immediately after stating that the journal is dirty, it says the filesystem is clean.

In order to try to make the filesystem usable I ran fsck.ocfs2 with the -fvv flags. This process never fully completes: after several minutes of happily chugging along it hangs. One of the last blocks of output it generates has this to say:

o2fsck_verify_inode_fields:435 | checking inode 14119181's fields
check_el:249 | depth 0 count 243 next_free 1
check_er:164 | cpos 0 clusters 1 blkno 14677109
verify_block:705 | adding dir block 14677109
update_inode_alloc:157 | updated inode 14119181 alloc to 1 from 1 in slot 0
o2fsck_verify_inode_fields:435 | checking inode 14119182's fields
check_el:249 | depth 0 count 243 next_free 1
check_er:164 | cpos 0 clusters 1 blkno 14677110
o2fsck_mark_cluster_allocated: Internal logic failure !! duplicate cluster 14677110
verify_block:705 | adding dir block 14677110

This 'Internal logic failure' seems significant, so I googled it and found the following passage (http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs), which seems to have some bearing on my case:

-=-=-=-=-=-
Duplicate groups or missing groups

When we relink the groups in extent_alloc and inode_alloc, there are two steps: deleting from the old inode and relinking to the new inode. So which should be carried out first, given that we may panic between the two steps?

Deleting from the old inode first
If deletion is carried out first and tunefs panics: since fsck.ocfs2 doesn't know that the inode and extent blocks are allocated (it decides this by reading inode_alloc and extent_alloc), all of that space will be freed. This is too bad.

Relinking to the new inode first
If relinking is carried out first and tunefs panics: since the two alloc inodes now contain some duplicated chains, the error "GROUP_PARENT" is reported every time, along with many internal errors of the form "o2fsck_mark_cluster_allocated: Internal logic failure !! duplicate cluster". Although this is also unpleasant, we at least have the chain information in hand, so I'd like to revise fsck.ocfs2 to cope with this scenario.
There is also one thing that has to be mentioned: fsck.ocfs2 will loop forever in o2fsck_add_dir_block since it doesn't handle the condition dbe->e_blkno == tmp_dbe->e_blkno, so we have to handle this as well.
=-=-=-=-=-

Later on that page the author suggests that fsck.ocfs2 would need to be modified to handle this case (which I gather hasn't happened yet). However, there must be some other way to remedy this situation and recover the nearly 250 GB of data I have on this share? Can anyone help? I've tried copying to a new partition using debugfs.ocfs2, but I'm not sure if I'm doing it right or if there is a more sensible approach to try.

Thanks all,
Nick
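For concreteness, a minimal sketch of the debugfs.ocfs2 copy-out approach mentioned above might look like the following. The source directory name (/some_directory) and the destination mount point (/mnt/recovery) are placeholders rather than paths from this thread, and the rdump command is only available if the installed ocfs2-tools ship it:

  DEV=/dev/vg.chronovore/lv.medea.share._multimedia_store

  # list the root of the damaged volume without mounting it
  sudo debugfs.ocfs2 -R "ls /" "$DEV"

  # recursively dump one directory tree onto an already-mounted, healthy filesystem
  sudo debugfs.ocfs2 -R "rdump /some_directory /mnt/recovery" "$DEV"

Since debugfs.ocfs2 only reads from the device here, this avoids touching the damaged volume while the journal/allocator problem is sorted out.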
khaije rock
2009-May-29 09:23 UTC
[Ocfs2-users] Fwd: fsck fails & volume mount fails, is my data lost?
I can simplify this question: what can I do to try to recover data from a problematic ocfs2 filesystem? For example, would I get any traction if I built the tools from upstream sources?

Thanks all!
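In case it helps frame the question, building ocfs2-tools into a scratch prefix and pointing the freshly built fsck.ocfs2 at the volume read-only might look roughly like this. The tarball name, version, prefix, and in-tree binary path are assumptions, not details from this thread; -fn forces a check while answering "no" to every repair prompt, so it should not write to the device:

  tar xzf ocfs2-tools-1.4.2.tar.gz          # assumed tarball name/version
  cd ocfs2-tools-1.4.2
  ./configure --prefix=/opt/ocfs2-tools && make

  # read-only forced check with the newly built binary (in-tree path is an assumption)
  sudo ./fsck.ocfs2/fsck.ocfs2 -fn /dev/vg.chronovore/lv.medea.share._multimedia_store

If a newer fsck.ocfs2 handles the duplicate-cluster case better, a forced repair run (-fy) could follow, but only after the read-only pass looks sane and, ideally, after a block-level copy of the volume has been taken.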