thr3ads.net - Ocfs2 users - [Ocfs2-users] merge request for patchset / bug #1324 / issue: ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx [Dec 2012]


Leen Besselink

2012-Dec-20 14:05 UTC

[Ocfs2-users] merge request for patchset / bug #1324 / issue: ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx

Hi,

Some time ago I had the following error:

Dec 10 14:02:50 xxxx kernel: [11099.666180] (31655,6):ocfs2_prepare_dir_for_insert:4415 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.666208] (31655,6):ocfs2_rename:1266 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.692901] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.692952] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.693045] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.693093] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
Dec 10 14:02:50 xxxx kernel: [11099.693186] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
Dec 10 14:02:50 xxxx kernel: [11099.693233] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
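
For anyone who needs to pull the affected inode out of a long kernel log, here is a minimal Python sketch. It is not part of ocfs2-tools. It assumes each log entry sits on a single line (unwrap it first if your mail client or pager folded it), and the group names `pid` and `cpu` are only my reading of the "(31655,6)" prefix. The helper names `parse_entry` and `holed_inodes` are made up for this illustration.

```python
import re

# Message part of an ocfs2 kernel log entry, for example:
# "(31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx"
# Assumption: the two numbers in parentheses are the PID and the CPU.
ENTRY = re.compile(
    r"\((?P<pid>\d+),(?P<cpu>\d+)\):"
    r"(?P<func>\w+):(?P<line>\d+)\s+ERROR:\s+(?P<msg>.*)"
)
# \w+ is used so that redacted values such as "xxxx" also match.
HOLE = re.compile(
    r"Inode #(?P<inode>\w+) contains a hole at offset (?P<offset>\w+)"
)


def parse_entry(text):
    """Return a dict describing one ocfs2 ERROR entry, or None if no match."""
    match = ENTRY.search(text)
    if match is None:
        return None
    entry = match.groupdict()
    hole = HOLE.search(entry["msg"])
    entry["inode"] = hole.group("inode") if hole else None
    entry["offset"] = hole.group("offset") if hole else None
    return entry


def holed_inodes(lines):
    """Return the sorted, distinct inode ids reported as containing a hole."""
    found = set()
    for text in lines:
        entry = parse_entry(text)
        if entry and entry["inode"] is not None:
            found.add(entry["inode"])
    return sorted(found)
```

Feeding the log above through `holed_inodes` gives the inode number(s) to look up with debugfs.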

It took me a while to figure out what was wrong and what to do, and the system
was offline the whole time.

That was an annoying situation to be in, to say the least.

The reason I diagnosed the problem wrongly at first is that only a few days
before we had hit the other well-known problem:

"No space left on device" because of a wrongly chosen number of node
slots; we reduced it from 8 to 4 on a 2-node cluster.

I think this was the right solution; we've not seen the issue since.

Obviously upgrading and enabling discontig-bg is the only long-term solution.

So I had assumed they were related. They were not. As I understand it, the holes
are in the directory index and the cause is related to failover and the use of
DRBD. I guess it must have been related to a STONITH we had tripped when working
on the previous issue.

Because I didn't know what to do or how to solve it at first, I hoped a fsck
would fix it.

But it didn't. It didn't even find the problem.

This is not only because fsck was too old, but also because the following
patches were never merged:

https://oss.oracle.com/pipermail/ocfs2-tools-devel/2011-August/003931.html

Are these patches ever going to be merged ?

If I read the mailing lists correctly, I guess it is already fixed in newer
kernels ? They will just disable the directory index on the fly ?

But if the patch were merged, it would allow people to upgrade or compile
ocfs2-tools instead of the kernel.

So I merged the patch by hand and it did recognise the problem; I just didn't
want to use a handcrafted fsck to fix a problem if I didn't have to.

Another problem which caused a lot of delay was that I had never used debugfs
extensively before; I had only ever looked at 'stats'.

The problem I had with debugfs is that its help output says:

"locate <block#> ...                     List all pathnames of the inode(s)/lockname(s)"

Which wasn't very clear to me the first time I looked at it.

I thought it meant:

locate 12345

instead of the correct command, where the angle brackets are typed literally:

locate <12345>
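
To make the bracket detail hard to get wrong when scripting this, here is a small Python sketch that only builds the command line and does not run anything. It relies on debugfs.ocfs2's `-R` option for passing a single request non-interactively. The device path `/dev/drbd0` and the helper name `locate_command` are placeholders I made up, so substitute your own.

```python
import shlex


def locate_command(inode, device):
    """Build the argv for a non-interactive debugfs.ocfs2 'locate' request.

    The inode number is wrapped in literal angle brackets, which is the
    detail that the help text does not make obvious.
    """
    return ["debugfs.ocfs2", "-R", f"locate <{inode}>", device]


if __name__ == "__main__":
    # /dev/drbd0 is a hypothetical device path used only for illustration.
    argv = locate_command(12345, "/dev/drbd0")
    # shlex.join quotes the request so the shell does not treat < and > as
    # redirections when the line is pasted into a terminal.
    print(shlex.join(argv))
```

Printing the command rather than executing it leaves the decision to run it against a possibly damaged filesystem with the administrator.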

Obviously, once I found the debugging FAQ, I knew what to do and could find out
which directory it was. I moved everything to a newly created directory, renamed
them both, and removed the corrupted, not empty directory. I assume that would
solve it, even though it was never mentioned explicitly on the mailing list as a
solution.

So the question remains, are those patches ever going to be merged ?

Or is my account of the problem now clear enough that people should be able to
find this post in the mailing list archive and fix it themselves ?

Have a nice day,
	Leen.

PS Sorry for not mailing this to an ocfs2-tools mailing list; I only noticed
later that I had subscribed to the wrong one. I assume the same developers read
this list ?

Leen Besselink

2012-Dec-20 14:08 UTC


[Ocfs2-users] merge request for patchset / bug #1324 / issue: ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx

On Thu, Dec 20, 2012 at 03:05:01PM +0100, Leen Besselink wrote:
> Hi,
> 
> Obviously when I found the debugging FAQ, I knew what to do and I could
> find out which directory it was. I moved everything to a newly created
> directory renamed them both and removed the corrupted, not empty directory.
> I assume that would solve it, even though it was never mentioned explicitly
> on the mailinglist as a solution.

That should have read:

now empty directory
