Ondrej Jombik
2009-Apr-01 04:21 UTC
[Gluster-users] RAID-1 over network scenario - incredible problems
I'm trying to configure a GlusterFS setup with two replicating servers, for now without any client. It worked well so far; however, after I rebooted the second server I started to have a difficult time. (Note: the first server was not rebooted.)

1. Are all changes made on the non-rebooted server during the second server's reboot lost? They are not replicated after the rebooted server is online again. Is there an OFFICIAL way to achieve this? Does it have some binary log of non-performed write operations?

2. On the rebooted server I tried to configure glusterfs to get the missing files. Here is my basic configuration (afr part only):

   volume afr
     type cluster/afr
     subvolumes local remote
   end-volume

4. This shows on the second server only the files which were there before the reboot; files created during the reboot are not there, but they still remain on the non-rebooted server.

5. option read-subvolume remote
   This actually does nothing. Is this implemented? I was expecting to read all the data from the remote volume.

6. option favorite-child remote
   This does nothing as well, but at least it prints a warning into the log files. However, what is written in the warning does not actually happen. I tried to access all files on the remote/local device/mountpoint (4 ways), with no change at all.

7. If I define "subvolumes remote" (so kicking local out of subvolumes), then I finally get the right file contents, but only at the mountpoint, not in the actual device; I need to get the files into the actual device (local disk) of the rebooted server.

8. And finally I deleted all the files from the device of the rebooted server, hoping for the replication to do the rest; and voilà, I have them replicated, so all files created during the reboot are there, but they are all filled with zeros! (And no, this is not that known XFS bug; it is actually on EXT3.)

I know this all is pretty incredible and looks like a horror story, but I have read tons of documentation and I'm still not able to figure it out. I hope the problem is between keyboard and chair and not in the software itself.

I'm only trying to have RAID-1 over network with automatic recovery after a reboot/outage. Is this that complicated??

(I need to mention that I have not started with clients yet; there I'm expecting even bigger troubles like this.)

I will much appreciate any kind of help (even confirming this behaviour will help me a lot).

Thank you
Ondrej

--
  /\   Ondrej Jombik - nepto at platon.sk - http://nepto.sk - ICQ #122428216
 //\\  Platon Group - open source software development - http://platon.sk
 //\\  10 types of people: those who understand binary & those who do not
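For item 8, one way to confirm which files in the rebooted server's backend directory came back zero-filled is to scan for non-empty files that contain only NUL bytes. The following is a minimal sketch, not a GlusterFS tool: the function name `list_zero_filled` and the example path are hypothetical, and it assumes a POSIX shell with `find`, `tr`, and `wc`.

```shell
#!/bin/sh
# Print every non-empty regular file under a directory whose content
# consists only of zero (NUL) bytes.
# list_zero_filled is a hypothetical helper name, not part of GlusterFS.
list_zero_filled() {
    dir=$1
    # -size +0c skips empty files, which are trivially "all zeros".
    find "$dir" -type f -size +0c | while IFS= read -r f; do
        # tr removes every NUL byte; if no bytes remain, the file was all zeros.
        if [ "$(tr -d '\000' < "$f" | wc -c)" -eq 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_zero_filled /data/export` (a hypothetical export path) prints one path per zero-filled file. File names containing newlines are not handled.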
Stas Oskin
2009-Apr-01 08:55 UTC
[Gluster-users] RAID-1 over network scenario - incredible problems
Hi.

No need to do rsync - running ls -lR on the mount *should* synchronize all the files.

That said, it doesn't always work - I'm currently battling with several issues on the subject.

Regards.

2009/4/1 Ondrej Jombik <nepto at platon.sk>

> Thank you so much for this reply. We used to have client-side replication
> and failover worked very nicely, but we were unable to do auto-restore
> (servers do not know about each other), thus I thought that moving
> towards server-side replication would be a good idea.
>
> We will probably move back to client-side and do the rsync-like
> auto-restore after storage node failure.
>
> Again thanks, that helped me a lot, since now I have at least a starting
> point for what to expect and what not (regarding GlusterFS).
>
> PS: I like the GlusterFS architecture concept, and config files as well.
>
> On Wed, 1 Apr 2009, Stas Oskin wrote:
>
>> Hi.
>>
>> 2009/4/1 Ondrej Jombik <nepto at platon.sk>
>>     I'm trying to configure GlusterFS setup with two replicating servers.
>>     For now just without any client. Worked well so far, however after
>>     I rebooted the second server I started to have difficult times...
>>     (note: first server remains unrebooted)
>>
>> Advice - use client side replication, it's considered more reliable and
>> supports fail-over.
>>
>>     1. are all changes made on non-rebooted server during the second
>>     server reboot lost? they are not replicated after rebooted server
>>     is online again... is there OFFICIAL way how to achieve this? Does
>>     it have some binary log of non-performed write operations?
>>
>> You only need to run ls -lR when the server comes up to sync between the
>> files. Now, only if it always worked...
>>
>>     6. option favorite-child remote
>>     This does nothing as well, but at least print some warning into the
>>     log files. However what is written in the warning actually does not
>>     happen. I tried to access all files on remote/local
>>     device/mountpoint (4 ways), no change at all.
>>
>> Happens to me too.
>>
>>     7. if I define "subvolumes remote" (so kicking local from subvolumes)
>>     then I finally get the right file contents, but only at mountpoint,
>>     not in actual device; I need to get files into the actual device
>>     (local disk) of rebooted server
>>
>> Happens to me too.
>>
>>     8. and finally I deleted all the files from device of rebooted server
>>     and I was hoping for the replication to do the rest; and voilà,
>>     I have them replicated, so all files created during reboot are
>>     there, but they are all filled with zeros!
>>     (and no this is not that known XFS bug, it is actually on EXT3)
>>
>> Welcome to the club :).
>>
>>     I know this all is pretty incredible and looks like a horror story,
>>     but I have read tons of documentation and still I'm not able to
>>     figure that out. I wish that it is problem between keyboard and
>>     chair and not in the software itself.
>>
>>     I'm only trying to have RAID-1 over network with automatic recovery
>>     after reboot/outage. Is this that complicated??
>>
>>     (I need to mention that I did not started with clients yet, there
>>     I'm expecting even bigger troubles like this)
>>
>> As I said, perhaps you should try client-side AFR first?
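The `ls -lR` suggestion above can be wrapped in a small helper to run once the rebooted server is back up. This is only a sketch of that suggestion: the function name `trigger_self_heal` and the example mount path are hypothetical, and, as noted in the reply, the traversal *should* cause the files to synchronize but is reported not to work every time.

```shell
#!/bin/sh
# Recursively list a GlusterFS mount point so that every file and
# directory under it is looked up, as `ls -lR` on the mount does.
# trigger_self_heal is a hypothetical helper name, not a GlusterFS command.
trigger_self_heal() {
    mount_point=$1
    if [ ! -d "$mount_point" ]; then
        echo "not a directory: $mount_point" >&2
        return 1
    fi
    # The listing itself is discarded; only the traversal matters.
    ls -lR "$mount_point" > /dev/null
}
```

It would be called as `trigger_self_heal /mnt/glusterfs`, where `/mnt/glusterfs` is an assumed mount point. Run it against the GlusterFS mount, not against the backend export directory.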