thr3ads.net - Gluster users - [Gluster-users] timestamps getting updated during self-heal after primary brick rebuild [Feb 2013]

Todd Stansell

2013-Feb-28 23:23 UTC

[Gluster-users] timestamps getting updated during self-heal after primary brick rebuild

We're looking at using glusterfs to provide a shared filesystem between two
nodes, using just local disk.  They are both gluster servers as well as
clients.  This is on CentOS 5.9 64-bit.  The bricks are simply ext3
filesystems on top of LVM:

    /dev/mapper/VolGroup00-LogVol0 on /gfs0 type ext3 (rw,user_xattr)

We set up a test volume with:

    host14# gluster volume create gv0 replica 2 transport tcp host14:/gfs0 host13:/gfs0
    host14# gluster volume set gv0 nfs.disable on
    host14# gluster volume start gv0

This works just fine.  The issue is simulating hardware failure where we need
to rebuild an entire node.  In this case, we kickstart our server which creates
all fresh new filesystems.  We have a kickstart postinstall script that sets
the glusterd UUID of the server so that it never changes.  It then does a probe
of the other server, looks for existing volumes, sets up fstab entries for them
(to also act as a client) and also sets up an init script to force a full heal
every time the server boots just to ensure all data is replicated to both
nodes.  All of this works great when I'm rebuilding the second brick.

The issue I have is when we rebuild the server that hosts the primary brick
(host14:/gfs0).  It will come online and start copying data from host13:/gfs0,
but as it does so, it sets the timestamps of the files on host13:/gfs0 to the
time it healed the data on host14:/gfs0.  As a result, all files in the
filesystem end up with timestamps of when the first brick was healed.

I enabled client debug logs and the following indicates that it *thinks* it is
doing the right thing:

    after rebuilding gv0-client-1:
    [2013-02-28 00:01:37.264018] D [afr-self-heal-metadata.c:329:afr_sh_metadata_sync] 0-gv0-replicate-0: self-healing metadata of /data/bin/sync-data from gv0-client-0 to gv0-client-1

    after rebuilding gv0-client-0:
    [2013-02-28 00:17:03.578377] D [afr-self-heal-metadata.c:329:afr_sh_metadata_sync] 0-gv0-replicate-0: self-healing metadata of /data/bin/sync-data from gv0-client-1 to gv0-client-0
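To check the heal direction across a whole client log rather than one file at a time, the messages above can be parsed mechanically. This is only a sketch: the regular expression is inferred from the two log lines quoted here, not from any documented log format, and the name `heal_direction` is made up for illustration. It also assumes paths contain no whitespace.

```python
import re

# Pattern inferred from the afr_sh_metadata_sync debug lines quoted above.
# Assumption: the path contains no spaces.
HEAL_RE = re.compile(
    r"self-healing metadata of (?P<path>\S+) "
    r"from (?P<source>\S+) to (?P<sink>\S+)"
)


def heal_direction(line):
    """Return (path, source, sink) for a metadata self-heal line, else None."""
    match = HEAL_RE.search(line)
    if match is None:
        return None
    return match.group("path"), match.group("source"), match.group("sink")


if __name__ == "__main__":
    sample = (
        "[2013-02-28 00:17:03.578377] D "
        "[afr-self-heal-metadata.c:329:afr_sh_metadata_sync] "
        "0-gv0-replicate-0: self-healing metadata of /data/bin/sync-data "
        "from gv0-client-1 to gv0-client-0"
    )
    print(heal_direction(sample))
```

Any line where the rebuilt brick shows up as the source rather than the sink would be worth a closer look.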

Unfortunately, in the second case, the timestamp of the files changed from:

    -r-xr-xr-x 1 root root 2717 Feb 27 23:32 /data/data/bin/sync-data*

 to:

    -r-xr-xr-x 1 root root 2717 Feb 28 00:17 /data/data/bin/sync-data*

And remember, there's nothing accessing any data in this volume so there's no
"client" access going on anywhere.  No changes happening on the filesystem,
other than self-heal screwing things up.
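For anyone trying to reproduce this, one way to see which files are affected is to record every file's mtime on the surviving brick (or the client mount) before the rebuild and compare after the heal finishes. The sketch below is a generic illustration using only the Python standard library. The function names are hypothetical, and the demo runs against a temporary directory with illustrative epoch values rather than a real gluster volume.

```python
import os
import tempfile


def snapshot_mtimes(root):
    """Map each file path (relative to root) to its mtime in whole seconds."""
    times = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            times[os.path.relpath(full, root)] = int(os.lstat(full).st_mtime)
    return times


def changed_mtimes(before, after):
    """Return {path: (old, new)} for files in both snapshots whose mtime differs."""
    return {
        path: (before[path], after[path])
        for path in before
        if path in after and before[path] != after[path]
    }


if __name__ == "__main__":
    # Demo on a throwaway directory; point snapshot_mtimes() at a brick or
    # mount point (e.g. before and after a heal) for a real check.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "sync-data")
        with open(target, "w") as handle:
            handle.write("example\n")
        os.utime(target, (1362007920, 1362007920))  # illustrative "old" mtime
        before = snapshot_mtimes(root)
        os.utime(target, (1362010620, 1362010620))  # simulate the unwanted update
        after = snapshot_mtimes(root)
        print(changed_mtimes(before, after))
```

An empty result after a heal would mean timestamps were preserved; in the case described here, every file would be expected to show up.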

The only thing I could find in any logs that would indicate a problem was this
in the brick log:

    [2013-02-28 00:17:03.583063] D [posix.c:323:posix_do_utimes] 0-gv0-posix: /gfs0/data/bin/sync-data (Function not implemented)

I've also now built a CentOS 6 host and verified that the same behavior
happens there, though I get a slightly different brick debug log (which makes
me think this has nothing to do with what I'm seeing):

    [2013-02-28 23:07:41.879440] D [posix.c:262:posix_do_chmod] 0-gv0-posix: /gfs0/data/bin/sync-data (Function not implemented)

Here's any basic info that might help folks know what's going on:

# rpm -qa | grep gluster
glusterfs-server-3.3.1-1.el5
glusterfs-3.3.1-1.el5
glusterfs-fuse-3.3.1-1.el5

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 7cec2ba3-f69c-409a-a259-0d055792b11a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: host14:/gfs0
Brick2: host13:/gfs0
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG
nfs.disable: on

Todd

Todd Stansell

2013-Mar-06 07:43 UTC

[Gluster-users] timestamps getting updated during self-heal after primary brick rebuild

In the interest of pinging the community for *any* sort of feedback, I'd like
to note that we rebuilt things on CentOS 6 with btrfs as the filesystem, to
try something entirely different.  We see the same behavior.  After rebuilding
the first brick in the 2-brick replicate cluster, all file timestamps get
updated to the time self-heal copies the data back to that brick.

This is obviously a bug in 3.3.1.  We basically did what's described here:

 
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server

and timestamps get updated on all files.  Can someone acknowledge that this
sounds like a bug?  Does anyone care?

Being relatively new to glusterfs, it's painful to watch the mailing list and
even the IRC channel and see many folks ask questions with nothing but
silence.  I honestly wasn't sure if glusterfs was actively being supported
anymore.  Given the recent flurry of mail about lack of documentation I see
that's not really true.  Unfortunately, given that what I'm seeing is a form
of data corruption (yes, timestamps do matter), I'm surprised nobody's
interested in helping figure out what's going wrong.  Hopefully it's something
about the way I've built our cluster (though it seems less and less likely
given we are able to replicate the problem so easily).

Todd

