That seems correct, with one change: not only do I get the old file in step 5, but
that old file also overwrites the newer file on the node that did not go down.
> 1) What versions are you using ?
glusterfs 3.0.2 built on Feb 7 2010 00:15:44
Repository revision: v3.0.2
> 2) Can you share your volume files ? Are they generated using volgen ?
I did generate them via volgen, but then modified them because I have 3 shares;
the modifications were only renames.
(vol files at end of e-mail)
> 3) Did you notice any patterns for the files where the wrong copy was picked?
>    Like, were they open when the node was brought down?
I was not monitoring this.
> 4) Any other way to reproduce the problem ?
See my NFS issue below, although I don't think they are related.
> 5) Any other patterns you observed when you see the problem ?
See my NFS issue below, although I don't think they are related.
> 6) Would you have listings of problem file(s) from the replica nodes ?
No.
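If it happens again, this is roughly what I plan to collect from each replica node
(a sketch; the file path is a placeholder, and using the trusted.afr attributes to
see replicate's pending-change state is my assumption based on the client vol file
at the end of this e-mail):

# run as root on 10.0.0.24 and 10.0.0.25, against the backend export, not the mount
ls -l /mnt/tcb_data/path/to/problem_file
getfattr -d -m trusted.afr -e hex /mnt/tcb_data/path/to/problem_file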
Also, I did something today that works on NFS but does not work in Gluster.
I have a share mounted on /cs_data.
I have directories in that share: /cs_data/web and /cs_data/home.
I moved /cs_data/web into /cs_data/home (so I get /cs_data/home/web), then
symlinked /cs_data/web to /cs_data/home/web, like this:
cd /cs_data;
mv web home;
ln -s home/web
On all the clients, /cs_data/web no longer works.
If I unmount and remount, it works again.
Unfortunately, for the unmount/remount to work I have to kill things like httpd.
So to do a simple directory move (because I had it in the wrong place) on a
read-only directory, I have to kill my service.
I have done exactly this with an NFS mount and it did not fail at all; I did not
have to kill httpd, and I did not have to unmount/remount the share.
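For clarity, the remount workaround on each client looks roughly like this (a
sketch; it assumes the client vol file is saved as /etc/glusterfs/tcb_client.vol
and that httpd is the service holding the mount open):

/etc/init.d/httpd stop     # anything with open files under /cs_data has to go
umount /cs_data
glusterfs -f /etc/glusterfs/tcb_client.vol /cs_data
/etc/init.d/httpd start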
------------------
--- server.vol ---
------------------
# $ /usr/bin/glusterfs-volgen -n tcb_data -p 50001 -r 1 -c /etc/glusterfs 10.0.0.24:/mnt/tcb_data 10.0.0.25:/mnt/tcb_data
######################################
# Start tcb share
######################################
volume tcb_posix
type storage/posix
option directory /mnt/tcb_data
end-volume
volume tcb_locks
type features/locks
subvolumes tcb_posix
end-volume
volume tcb_brick
type performance/io-threads
option thread-count 8
subvolumes tcb_locks
end-volume
volume tcb_server
type protocol/server
option transport-type tcp
option auth.addr.tcb_brick.allow *
option transport.socket.listen-port 50001
option transport.socket.nodelay on
subvolumes tcb_brick
end-volume
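(A sketch of how the server side is started from this file, assuming it is saved
as /etc/glusterfs/server.vol on both 10.0.0.24 and 10.0.0.25:)

glusterfsd -f /etc/glusterfs/server.vol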
------------------
--- tcb client.vol ---
------------------
volume tcb_remote_glust1
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.24
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_remote_glust2
type protocol/client
option transport-type tcp
option ping-timeout 5
option remote-host 10.0.0.25
option transport.socket.nodelay on
option transport.remote-port 50001
option remote-subvolume tcb_brick
end-volume
volume tcb_mirror
type cluster/replicate
subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume
volume tcb_writebehind
type performance/write-behind
option cache-size 4MB
subvolumes tcb_mirror
end-volume
volume tcb_readahead
type performance/read-ahead
option page-count 4
subvolumes tcb_writebehind
end-volume
volume tcb_iocache
type performance/io-cache
option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
option cache-timeout 1
subvolumes tcb_readahead
end-volume
volume tcb_quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes tcb_iocache
end-volume
volume tcb_statprefetch
type performance/stat-prefetch
subvolumes tcb_quickread
end-volume
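(A note on the tcb_iocache cache-size line above: the backtick part is just a shell
snippet that works out to roughly 20% of RAM, in whole megabytes; run by hand it is
simply the following:)

# roughly 20% of MemTotal in MB, e.g. somewhere around 800 on a 4 GB box
grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.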
^C
Tejas N. Bhise wrote:
> Chad, Stephan - thank you for your feedback.
>
> Just to clarify on what you wrote, do you mean to say that -
>
> 1) The setup is a replicate setup with the file being written to multiple nodes.
> 2) One of these nodes is brought down.
> 3) A replicated file with a copy on the node brought down is written to.
> 4) The other copies are updated as writes happen while this node is still down.
> 5) After this node is brought up, the client sometimes sees the old file on the
>    node brought up instead of picking the file from a node that has the latest copy.
>
> If the above is correct, quick questions -
>
> 1) What versions are you using ?
> 2) Can you share your volume files ? Are they generated using volgen ?
> 3) Did you notice any patterns for the files where the wrong copy was picked?
>    Like, were they open when the node was brought down?
> 4) Any other way to reproduce the problem ?
> 5) Any other patterns you observed when you see the problem ?
> 6) Would you have listings of problem file(s) from the replica nodes ?
>
> If however my understanding was not correct, then please let me know with
> some examples.
>
> Regards,
> Tejas.
>
> ----- Original Message -----
> From: "Chad" <ccolumbu at hotmail.com>
> To: "Stephan von Krawczynski" <skraw at ithnet.com>
> Cc: gluster-users at gluster.org
> Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi
> Subject: Re: [Gluster-users] How to re-sync
>
> I actually do prefer top post.
>
> Well, this "overwritten" behavior is what I saw as well, and that is a
> REALLY REALLY bad thing.
> Which is why I asked my question in the first place.
>
> Is there a gluster developer out there working on this problem specifically?
> Could we add some kind of "sync done" command that has to be run manually,
> and until it is run the failed node is not used?
> The bottom line for me is that I would much rather run on a performance-degraded
> array until a sysadmin intervenes than lose any data.
>
> ^C
>
>
>
> Stephan von Krawczynski wrote:
>> I love top-post ;-)
>>
>> Generally, you are right. But in real life you cannot trust this
>> "smartness". We tried exactly this point and had to find out that the
>> clients do not always select the correct file version (i.e. the latest)
>> automatically.
>> Our idea in the test case was to bring down a node, update its kernel and
>> revive it - just as you would like to do in the real world for a kernel update.
>> We found out that some files were taken from the downed node afterwards and
>> the new contents on the other node were in fact overwritten.
>> This does not happen generally, of course. But it does happen. We could only
>> stop this behaviour by setting "favorite-child". But that does not really
>> help a lot, since we want to take down all nodes some other day.
>> This is in fact one of our show-stoppers.
>>
>>
>> On Sun, 7 Mar 2010 01:33:14 -0800
>> Liam Slusser <lslusser at gmail.com> wrote:
>>
>>> Assuming you used raid1 (replicate), you DO bring up the new machine
>>> and start gluster. On one of your gluster mounts you run an ls -alR
>>> and it will resync the new node. The gluster clients are smart enough
>>> to get the files from the first node.
>>>
>>> liam
>>>
>>> On Sat, Mar 6, 2010 at 11:48 PM, Chad <ccolumbu at hotmail.com> wrote:
>>>> Ok, so assuming you have N glusterfsd servers (say 2, because it does not
>>>> really matter).
>>>> Now one of the servers dies.
>>>> You repair the machine and bring it back up.
>>>>
>>>> I think 2 things:
>>>> 1. You should not start glusterfsd on boot (you need to sync the HD first).
>>>> 2. When it is up, how do you re-sync it?
>>>>
>>>> Do you rsync the underlying mount points?
>>>> If it is a busy gluster cluster, it will be getting new files all the time.
>>>> So how do you sync and bring it back up safely so that clients don't connect
>>>> to an incomplete server?
>>>>
>>>> ^C
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>