khaije rock
2009-May-25 12:06 UTC
[Ocfs2-users] fsck fails & volume mount fails, is my data lost?
Hi, I hope it's appropriate for me to post my issue to this list. Thanks in advance for any help!

I don't know exactly what the underlying cause is, but here is what it looks like:

- mount the filesystem
- cd into the directory with no errors, however
- the shell hangs when I attempt to 'ls' or interact with the data in any way.

I've found that when running fsck.ocfs2 against the block device (it's a logical volume using LVM) it completes successfully and reports the following:

khaije at chronovore:~$ sudo fsck /dev/vg.chronovore/lv.medea.share._multimedia_store
fsck 1.41.3 (12-Oct-2008)
Checking OCFS2 filesystem in /dev/vg.chronovore/lv.medea.share._multimedia_store:
  label:              lv.medea.share._multimedia_store
  uuid:               28 f3 65 1c 1d 04 4e 28 af f0 37 7f 30 13 fc 38
  number of blocks:   65536000
  bytes per block:    4096
  number of clusters: 65536000
  bytes per cluster:  4096
  max slots:          4
o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 1
o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0
o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0
/dev/vg.chronovore/lv.medea.share._multimedia_store is clean.  It will be checked after 20 additional mounts.

The command prints this output and returns control to the shell. As you can see, it indicates that the 'journal dirty' flag is set for slot 0, which is the host machine. You'll notice that immediately after stating that the journal is dirty, it says the filesystem is clean.

In order to try to make the filesystem usable I ran fsck.ocfs2 with the -fvv flags. This process never fully completes: after several minutes of happily chugging along it hangs. One of the last blocks of output it generates has this to say:

o2fsck_verify_inode_fields:435 | checking inode 14119181's fields
check_el:249 | depth 0 count 243 next_free 1
check_er:164 | cpos 0 clusters 1 blkno 14677109
verify_block:705 | adding dir block 14677109
update_inode_alloc:157 | updated inode 14119181 alloc to 1 from 1 in slot 0
o2fsck_verify_inode_fields:435 | checking inode 14119182's fields
check_el:249 | depth 0 count 243 next_free 1
check_er:164 | cpos 0 clusters 1 blkno 14677110
o2fsck_mark_cluster_allocated: Internal logic failure !! duplicate cluster 14677110
verify_block:705 | adding dir block 14677110

This 'Internal logic failure' seems significant, so I googled it and found the following passage (http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs), which seems to have some bearing on my case:

-=-=-=-=-=-
Duplicate groups or missing groups

When we relink the groups in extent_alloc and inode_alloc, there are two steps: deleting from the old inode and relinking to the new inode. So which should be carried out first, given that we may panic between the two steps?

Deleting from the old inode first
If deletion is carried out first and tunefs panics: since fsck.ocfs2 doesn't know that the inode and extent blocks are allocated (it decides this by reading inode_alloc and extent_alloc), all of that space will be freed. This is too bad.

Relinking to the new inode first
If relinking is carried out first and tunefs panics: since the two alloc inodes now contain some duplicated chains, the error "GROUP_PARENT" is reported every time, along with many internal errors of the form "o2fsck_mark_cluster_allocated: Internal logic failure !! duplicate cluster". Although this is also unpleasant, we at least have the chain information in hand, so I'd like to revise fsck.ocfs2 to cope with this scenario.
There is also one thing that has to be mentioned: fsck.ocfs2 will loop forever in o2fsck_add_dir_block since it doesn't handle the condition dbe->e_blkno == tmp_dbe->e_blkno, so we have to handle this as well.
=-=-=-=-=-

Later on that page the author suggests that fsck.ocfs2 would need to be modified to handle this case (which I gather hasn't happened yet). However, there must be some other way to remedy this situation and recover the nearly 250 GB of data I have on this share? Can anyone help? I've tried copying to a new partition using debugfs.ocfs2, but I'm not sure if I'm doing it right or if there is a more sensible approach to try.

Thanks all,
Nick
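For concreteness, a minimal sketch of the debugfs.ocfs2 copy-out approach mentioned above might look like the following. The source directory name (/some_directory) and the destination mount point (/mnt/recovery) are placeholders rather than paths from this thread, and the rdump command is only available if the installed ocfs2-tools ship it:

  DEV=/dev/vg.chronovore/lv.medea.share._multimedia_store

  # list the root of the damaged volume without mounting it
  sudo debugfs.ocfs2 -R "ls /" "$DEV"

  # recursively dump one directory tree onto an already-mounted, healthy filesystem
  sudo debugfs.ocfs2 -R "rdump /some_directory /mnt/recovery" "$DEV"

Since debugfs.ocfs2 only reads from the device here, this avoids touching the damaged volume while the journal/allocator problem is sorted out.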
khaije rock
2009-May-29 09:23 UTC
[Ocfs2-users] Fwd: fsck fails & volume mount fails, is my data lost?
I can simplify this question: what can I do to try to recover data from a problematic ocfs2 filesystem? For example, would I get any traction if I built the tools from upstream sources?

Thanks all!
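In case it helps frame the question, building ocfs2-tools into a scratch prefix and pointing the freshly built fsck.ocfs2 at the volume read-only might look roughly like this. The tarball name, version, prefix, and in-tree binary path are assumptions, not details from this thread; -fn forces a check while answering "no" to every repair prompt, so it should not write to the device:

  tar xzf ocfs2-tools-1.4.2.tar.gz          # assumed tarball name/version
  cd ocfs2-tools-1.4.2
  ./configure --prefix=/opt/ocfs2-tools && make

  # read-only forced check with the newly built binary (in-tree path is an assumption)
  sudo ./fsck.ocfs2/fsck.ocfs2 -fn /dev/vg.chronovore/lv.medea.share._multimedia_store

If a newer fsck.ocfs2 handles the duplicate-cluster case better, a forced repair run (-fy) could follow, but only after the read-only pass looks sane and, ideally, after a block-level copy of the volume has been taken.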