Dan Bretherton
2012-Feb-22 02:52 UTC
[Gluster-users] "mismatching layouts" errors after expanding volume
Dear All-
There are a lot of errors like the following in my client and NFS logs following a recent volume expansion.

[2012-02-16 22:59:42.504907] I [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol: atmos-replicate-0; inode layout - 0 - 0; disk layout - 920350134 - 1227133511
[2012-02-16 22:59:42.534399] I [dht-common.c:524:dht_revalidate_cbk] 0-atmos-dht: mismatching layouts for /users/rle/TRACKTEMP/TRACKS
[2012-02-16 22:59:42.534521] I [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol: atmos-replicate-1; inode layout - 0 - 0; disk layout - 1227133512 - 1533916889

I have expanded the volume successfully many times in the past. I can think of several possible reasons why this one might have gone wrong, but without expert advice I am just guessing.

1) I did precautionary ext4 filesystem checks on all the bricks and found errors on some of them, mostly things like this:

Pass 1: Checking inodes, blocks, and sizes
Inode 104386076, i_blocks is 3317792, should be 3317800.  Fix? yes

2) I always use hostname.domain for new GlusterFS servers when doing "gluster peer probe HOSTNAME" (e.g. gluster peer probe bdan14.nerc-essc.ac.uk). I normally use hostname.domain (e.g. bdan14.nerc-essc.ac.uk) when creating volumes or adding bricks as well, but for the last brick I added I just used the hostname (bdan14). I can do "ping bdan14" from all the servers and clients, and the only access to the volume from outside my subnetwork is via NFS.

3) I found some old GlusterFS client processes still running, probably left over from previous occasions when the volume was auto-mounted. I have seen this before and I don't know why it happens, but normally I just kill unwanted glusterfs processes without affecting the mount.

4) I recently started using more than one server to export the volume via NFS in order to spread the load. In other words, two NFS clients may mount the same volume exported from two different servers. I don't remember reading anywhere that this is not allowed, but as this is a recent change I thought it would be worth checking.

5) I normally let people carry on using a volume while a fix-layout process is going on in the background. I don't remember reading that this is not allowed either, but again I thought it worth checking. I don't do migrate-data after fix-layout because it doesn't work on my cluster. Normally the fix-layout completes without error and no "mismatching layout" errors are observed. However, the volume is now so large that fix-layout usually takes several days to complete, which means that many more files are created and modified during fix-layout than before. Could the continued use of the volume during the lengthy fix-layout be causing the layout errors?

I have run fix-layout three times now, and the second attempt crashed. All I can think of doing is to try again now that several back-end filesystems have been repaired. Could any of the above factors have caused the layout errors, and can anyone suggest a better way to remove them?

All comments and suggestions would be much appreciated.

Regards
Dan.
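For anyone wanting to compare what the bricks themselves think the layout is, the on-disk layout of a directory can be read directly and the layout fix re-run from the CLI; a minimal sketch, assuming a hypothetical brick path (the volume name "atmos" and the directory path are taken from the log messages above):

    # As root on a brick server, read the DHT layout xattr for the affected
    # directory on the brick's local filesystem (not via the client mount).
    getfattr -n trusted.glusterfs.dht -e hex /path/to/brick/users/rle/TRACKTEMP/TRACKS

    # Kick off another layout fix and check on its progress.
    gluster volume rebalance atmos fix-layout start
    gluster volume rebalance atmos status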
Dan Bretherton
2012-Feb-22 12:22 UTC
[Gluster-users] "mismatching layouts" errors after expanding volume
Hello All-
I would really appreciate a quick Yes/No answer to the most important question - is it safe to create, modify and delete files in a volume during a fix-layout operation after an expansion? The users are champing at the bit waiting for me to let them have write access, but fix-layout is likely to take several days based on previous experience.

-Dan

On 02/22/2012 02:52 AM, Dan Bretherton wrote:
> Dear All-
> There are a lot of errors like the following in my client and NFS
> logs following a recent volume expansion.
>
> [2012-02-16 22:59:42.504907] I
> [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol:
> atmos-replicate-0; inode layout - 0 - 0; disk layout - 920350134 - 1227133511
> [2012-02-16 22:59:42.534399] I [dht-common.c:524:dht_revalidate_cbk]
> 0-atmos-dht: mismatching layouts for /users/rle/TRACKTEMP/TRACKS
> [2012-02-16 22:59:42.534521] I
> [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol:
> atmos-replicate-1; inode layout - 0 - 0; disk layout - 1227133512 - 1533916889
>
> I have expanded the volume successfully many times in the past. I can
> think of several possible reasons why this one might have gone wrong,
> but without expert advice I am just guessing.
>
> 1) I did precautionary ext4 filesystem checks on all the bricks and
> found errors on some of them, mostly things like this:
>
> Pass 1: Checking inodes, blocks, and sizes
> Inode 104386076, i_blocks is 3317792, should be 3317800.  Fix? yes
>
> 2) I always use hostname.domain for new GlusterFS servers when doing
> "gluster peer probe HOSTNAME" (e.g. gluster peer probe
> bdan14.nerc-essc.ac.uk). I normally use hostname.domain (e.g.
> bdan14.nerc-essc.ac.uk) when creating volumes or adding bricks as
> well, but for the last brick I added I just used the hostname
> (bdan14). I can do "ping bdan14" from all the servers and clients,
> and the only access to the volume from outside my subnetwork is via NFS.
>
> 3) I found some old GlusterFS client processes still running, probably
> left over from previous occasions when the volume was auto-mounted. I
> have seen this before and I don't know why it happens, but normally I
> just kill unwanted glusterfs processes without affecting the mount.
>
> 4) I recently started using more than one server to export the volume
> via NFS in order to spread the load. In other words, two NFS clients
> may mount the same volume exported from two different servers. I don't
> remember reading anywhere that this is not allowed, but as this is a
> recent change I thought it would be worth checking.
>
> 5) I normally let people carry on using a volume while a fix-layout
> process is going on in the background. I don't remember reading that
> this is not allowed either, but again I thought it worth checking. I
> don't do migrate-data after fix-layout because it doesn't work on my
> cluster. Normally the fix-layout completes without error and no
> "mismatching layout" errors are observed. However, the volume is now so
> large that fix-layout usually takes several days to complete, which
> means that many more files are created and modified during fix-layout
> than before. Could the continued use of the volume during the lengthy
> fix-layout be causing the layout errors?
>
> I have run fix-layout three times now, and the second attempt crashed.
> All I can think of doing is to try again now that several back-end
> filesystems have been repaired. Could any of the above factors have
> caused the layout errors, and can anyone suggest a better way to
> remove them?
>
> All comments and suggestions would be much appreciated.
>
> Regards
> Dan.
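In the meantime, the progress of the running fix-layout can at least be watched from the CLI while waiting for an answer; a one-line sketch, using the volume name "atmos" from the logs above:

    # Show how far the fix-layout operation has progressed.
    gluster volume rebalance atmos status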
Dan Bretherton
2012-Feb-23 13:58 UTC
[Gluster-users] "mismatching layouts" errors after expanding volume
Thanks Jeff, that's interesting. It is reassuring to know that these errors are self-repairing. That does appear to be happening, but only when I run "find -print0 | xargs --null stat >/dev/null" in affected directories. I will run that self-heal on the whole volume as well, but I have had to start with specific directories that people want to work in today. Does repeating the fix-layout operation have any effect, or are the xattr repairs all done by the self-heal mechanism?

I have found the cause of the transient brick failure; it happened again this morning on a replicated pair of bricks. Suddenly the etc-glusterfs-glusterd.vol.log file was flooded with these messages every few seconds.

E [socket.c:2080:socket_connect] 0-management: connection attempt failed (Connection refused)

One of the clients then reported errors like the following.

[2012-02-23 11:19:22.922785] E [afr-common.c:3164:afr_notify] 2-atmos-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-02-23 11:19:22.923682] I [dht-layout.c:581:dht_layout_normalize] 0-atmos-dht: found anomalies in /. holes=1 overlaps=0
[2012-02-23 11:19:22.923714] I [dht-selfheal.c:569:dht_selfheal_directory] 0-atmos-dht: 1 subvolumes down -- not fixing
[2012-02-23 11:19:22.941468] W [socket.c:1494:__socket_proto_state_machine] 1-atmos-client-7: reading from socket failed. Error (Transport endpoint is not connected), peer (192.171.166.89:24019)
[2012-02-23 11:19:22.972307] I [client.c:1883:client_rpc_notify] 1-atmos-client-7: disconnected
[2012-02-23 11:19:22.972352] E [afr-common.c:3164:afr_notify] 1-atmos-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.

The servers causing trouble were still showing as Connected in "gluster peer status", and nothing appeared to be wrong except for glusterd misbehaving. Restarting glusterd solved the problem, but given that this has happened twice this week already, I am worried that it could happen again at any time. Do you know what might be causing glusterd to stop responding like this?

Regards
Dan.

On 02/22/2012 08:00 PM, gluster-users-request at gluster.org wrote:
> Date: Wed, 22 Feb 2012 10:32:31 -0500
> From: Jeff Darcy <jdarcy at redhat.com>
> Subject: Re: [Gluster-users] "mismatching layouts" errors after
>          expanding volume
> To: gluster-users at gluster.org
> Message-ID: <4F450A8F.6070809 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Following up on the previous reply...
>
> On 02/22/2012 02:52 AM, Dan Bretherton wrote:
>> [2012-02-16 22:59:42.504907] I
>> [dht-layout.c:682:dht_layout_dir_mismatch] 0-atmos-dht: subvol:
>> atmos-replicate-0; inode layout - 0 - 0; disk layout - 920350134 - 1227133511
>> [2012-02-16 22:59:42.534399] I [dht-common.c:524:dht_revalidate_cbk]
>> 0-atmos-dht: mismatching layouts for /users/rle/TRACKTEMP/TRACKS
>
> On 02/22/2012 09:19 AM, Jeff Darcy wrote:
>> OTOH, the log entries below do seem to indicate that there's something
>> going on that I don't understand. I'll dig a bit, and let you know if I
>> find anything to change my mind wrt the safety of restoring write access.
>
> The two messages above are paired, in the sense that the second is
> inevitable after the first. The "disk layout" range shown in the first is
> exactly what I would expect for subvolume 3 out of 0-13. That means the
> trusted.glusterfs.dht value on disk seems reasonable. The corresponding
> in-memory "inode layout" entry has the less reasonable value of all zero.
> That probably means we failed to fetch the xattr at some point in the
> past. There might be something earlier in your logs - perhaps a message
> about "holes" and/or one specifically mentioning that subvolume - to
> explain why.
>
> The good news is that this should be self-repairing. Once we get these
> messages, we try to re-fetch the layout information from all subvolumes.
> If *that* failed, we'd see more messages than those above. Since the
> on-disk values seem OK and revalidation seems to be succeeding, I would
> say these messages probably represent successful attempts to recover from
> a transient brick failure, and that does *not* change what I said
> previously.
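As a sanity check on the numbers above, assuming the 32-bit hash space is divided evenly over 14 subvolumes (which is what Jeff's "subvolume 3 out of 0-13" suggests), the expected chunk boundaries work out as follows:

    chunk size      = 2^32 / 14        ~= 306783378
    subvol 3 start  = 3 * 306783378     = 920350134
    subvol 3 end    = 4 * 306783378 - 1 = 1227133511
    subvol 4 start  = 4 * 306783378     = 1227133512
    subvol 4 end    = 5 * 306783378 - 1 = 1533916889

These match the "disk layout" ranges reported for atmos-replicate-0 and atmos-replicate-1 in the original log messages, which is consistent with Jeff's point that the on-disk trusted.glusterfs.dht values look reasonable and only the in-memory layouts were zeroed.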