That seems correct, with one change: not only do I get the old file in step 5, but
that old file also overwrites the newer file on the node that did not go down.
> 1) What versions are you using ?
glusterfs 3.0.2 built on Feb 7 2010 00:15:44
Repository revision: v3.0.2
> 2) Can you share your volume files ? Are they generated using volgen ?
I did generate them via volgen, but then modified them because I have 3 shares;
the modifications were only renames.
(vol files at end of e-mail)
> 3) Did you notice any patterns for the files where the wrong copy was picked?
>    Like, were they open when the node was brought down?
I was not monitoring this.
> 4) Any other way to reproduce the problem ?
See my NFS issue below, although I don't think they are related.
> 5) Any other patterns you observed when you see the problem ?
See my NFS issue below, although I don't think they are related.
> 6) Would you have listings of problem file(s) from the replica nodes ?
No.
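If it happens again, this is roughly what I plan to collect from each replica node
(a sketch; the file path is a placeholder, and using the trusted.afr attributes to
see replicate's pending-change state is my assumption based on the client vol file
at the end of this e-mail):

# run as root on 10.0.0.24 and 10.0.0.25, against the backend export, not the mount
ls -l /mnt/tcb_data/path/to/problem_file
getfattr -d -m trusted.afr -e hex /mnt/tcb_data/path/to/problem_file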
Also, I did something today that works on NFS but does not work in Gluster.
I have a share mounted on /cs_data.
I have directories in that share: /cs_data/web and /cs_data/home.
I moved /cs_data/web into /cs_data/home (so I get /cs_data/home/web), then
symlinked /cs_data/web to /cs_data/home/web, like this:
cd /cs_data;
mv web home;
ln -s home/web
On all the clients, /cs_data/web no longer works.
If I unmount and remount, it works again.
Unfortunately, for the unmount/remount to work I have to kill things like httpd.
So to do a simple directory move (because I had it in the wrong place) on a
read-only directory, I have to kill my service.
I have done exactly this with an NFS mount and it did not fail at all; I did not
have to kill httpd, and I did not have to unmount/remount the share.
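For clarity, the remount workaround on each client looks roughly like this (a
sketch; it assumes the client vol file is saved as /etc/glusterfs/tcb_client.vol
and that httpd is the service holding the mount open):

/etc/init.d/httpd stop     # anything with open files under /cs_data has to go
umount /cs_data
glusterfs -f /etc/glusterfs/tcb_client.vol /cs_data
/etc/init.d/httpd start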
------------------
--- server.vol ---
------------------
# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data
######################################
# Start tcb share
######################################
volume tcb_posix
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks
type features/locks
subvolumes tcb_posix
end-volume
volume tcb_brick
type performance/io-threads
option thread-count 8
subvolumes tcb_locks
end-volume
volume tcb_server
type protocol/server
option transport-type tcp
option auth.addr.tcb_brick.allow *
option transport.socket.listen-port 50001
option transport.socket.nodelay on
subvolumes tcb_brick
end-volume
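(A sketch of how the server side is started from this file, assuming it is saved
as /etc/glusterfs/server.vol on both 10.0.0.24 and 10.0.0.25:)

glusterfsd -f /etc/glusterfs/server.vol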
------------------
--- tcb client.vol ---
------------------
volume tcb_remote_glust1
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.24
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_remote_glust2
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.25
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_mirror
type cluster/replicate
subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume
volume tcb_writebehind
type performance/write-behind
option cache-size 4MB
subvolumes tcb_mirror
end-volume
volume tcb_readahead
type performance/read-ahead
option page-count 4
subvolumes tcb_writebehind
end-volume
volume tcb_iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes tcb_readahead
end-volume
volume tcb_quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes tcb_iocache
end-volume
volume tcb_statprefetch
type performance/stat-prefetch
subvolumes tcb_quickread
end-volume
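(A note on the tcb_iocache cache-size line above: the backtick part is just a shell
snippet that works out to roughly 20% of RAM, in whole megabytes; run by hand it is
simply the following:)

# roughly 20% of MemTotal in MB, e.g. somewhere around 800 on a 4 GB box
grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.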
^C
Tejas N. Bhise wrote:
> Chad, Stephan - thank you for your feedback.
>
> Just to clarify on what you wrote, do you mean to say that -
>
> 1) The setup is a replicate setup with the file being written to multiple nodes.
> 2) One of these nodes is brought down.
> 3) A replicated file with a copy on the node brought down is written to.
> 4) The other copies are updated as writes happen while this node is still down.
> 5) After this node is brought up, the client sometimes sees the old file on the
>    node brought up instead of picking the file from a node that has the latest copy.
>
> If the above is correct, quick questions -
>
> 1) What versions are you using ?
> 2) Can you share your volume files ? Are they generated using volgen ?
> 3) Did you notice any patterns for the files where the wrong copy was picked?
>    Like, were they open when the node was brought down?
> 4) Any other way to reproduce the problem ?
> 5) Any other patterns you observed when you see the problem ?
> 6) Would you have listings of problem file(s) from the replica nodes ?
>
> If however my understanding was not correct, then please let me know with
> some examples.
>
> Regards,
> Tejas.
>
> ----- Original Message -----
> From: "Chad" <ccolumbu at hotmail.com>
> To: "Stephan von Krawczynski" <skraw at ithnet.com>
> Cc: gluster-users at gluster.org
> Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: [Gluster-users] How to re-sync
>
> I actually do prefer top post.
>
> Well, this "overwritten" behavior is what I saw as well, and that is a
> REALLY REALLY bad thing.
> Which is why I asked my question in the first place.
>
> Is there a gluster developer out there working on this problem specifically?
> Could we add some kind of "sync done" command that has to be run manually,
> and until it is run the failed node is not used?
> The bottom line for me is that I would much rather run on a performance-degraded
> array until a sysadmin intervenes than lose any data.
>
> ^C
>
>
>
> Stephan von Krawczynski wrote:
>> I love top-post ;-)
>>
>> Generally, you are right. But in real life you cannot trust this
>> "smartness". We tried exactly this point and had to find out that the
>> clients do not always select the correct file version (i.e. the latest)
>> automatically.
>> Our idea in the test case was to bring down a node, update its kernel and
>> revive it - just as you would like to do in the real world for a kernel update.
>> We found out that some files were taken from the downed node afterwards and
>> the new contents on the other node were in fact overwritten.
>> This does not happen generally, of course. But it does happen. We could only
>> stop this behaviour by setting "favorite-child". But that does not really
>> help a lot, since we want to take down all nodes some other day.
>> This is in fact one of our show-stoppers.
>>
>>
>> On Sun, 7 Mar 2010 01:33:14 -0800
>> Liam Slusser <lslusser at gmail.com> wrote:
>>
>>> Assuming you used raid1 (replicate), you DO bring up the new machine
>>> and start gluster. On one of your gluster mounts you run an ls -alR
>>> and it will resync the new node. The gluster clients are smart enough
>>> to get the files from the first node.
>>>
>>> liam
>>>
>>> On Sat, Mar 6, 2010 at 11:48 PM, Chad <ccolumbu at hotmail.com> wrote:
>>>> Ok, so assuming you have N glusterfsd servers (say 2, because it does not
>>>> really matter).
>>>> Now one of the servers dies.
>>>> You repair the machine and bring it back up.
>>>>
>>>> I think 2 things:
>>>> 1. You should not start glusterfsd on boot (you need to sync the HD first).
>>>> 2. When it is up, how do you re-sync it?
>>>>
>>>> Do you rsync the underlying mount points?
>>>> If it is a busy gluster cluster, it will be getting new files all the time.
>>>> So how do you sync and bring it back up safely so that clients don't connect
>>>> to an incomplete server?
>>>>
>>>> ^C
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>