

Ondrej Jombik

2009-Apr-01 04:21 UTC

[Gluster-users] RAID-1 over network scenario - incredible problems

I'm trying to configure a GlusterFS setup with two replicating servers,
for now without any client. It worked well so far; however, after
I rebooted the second server I started having a difficult time...
(note: the first server was not rebooted)

1. Are all changes made on the non-rebooted server during the second
    server's reboot lost? They are not replicated after the rebooted
    server is online again... Is there an OFFICIAL way to achieve this?
    Does it keep some binary log of write operations that were not
    performed?

2. On the rebooted server I tried to configure GlusterFS to fetch the
    missing files; here is my basic configuration (AFR part only):

     volume afr
       type cluster/afr
       subvolumes local remote
     end-volume

4. With this, the second server shows only the files that were there
    before the reboot; files created during the reboot are not there,
    although they still remain on the non-rebooted server.

5. option read-subvolume remote
    This actually does nothing. Is it implemented?
    I was expecting all data to be read from the remote volume.

6. option favorite-child remote
    This does nothing either, but at least it prints a warning to the
    log files. However, what the warning says does not actually
    happen. I tried to access all files on the remote/local
    device/mountpoint (4 ways), with no change at all.

7. If I define "subvolumes remote" (i.e. removing local from subvolumes),
    then I finally get the right file contents, but only at the mountpoint,
    not on the actual device; I need to get the files onto the actual
    device (local disk) of the rebooted server.

8. Finally, I deleted all the files from the device of the rebooted
    server and hoped replication would do the rest; and voilà,
    they were replicated, so all files created during the reboot are
    there, but they are all filled with zeros!
    (and no, this is not the known XFS bug; it is actually on EXT3)
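A small Python sketch for spotting files like these: it walks a directory and reports non-empty regular files whose contents are entirely zero bytes. The default path /data/export is a hypothetical backend (export) directory on the rebooted server, since the thread does not name one. Files that are legitimately all zeros, or fully sparse, would be reported as well:

```python
import os
import sys


def is_zero_filled(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file is non-empty and every byte in it is 0x00."""
    if os.path.getsize(path) == 0:
        return False  # empty files are not "filled with zeros"
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return True  # reached EOF without seeing a non-zero byte
            if chunk.strip(b"\x00"):
                return False  # something other than NUL bytes remains


def find_zero_filled(root: str) -> list[str]:
    """List regular files (not symlinks) under root that contain only zeros."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if is_zero_filled(path):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical backend directory; pass the real export path instead.
    root = sys.argv[1] if len(sys.argv) > 1 else "/data/export"
    for path in find_zero_filled(root):
        print(path)
```

Reading the backend directory directly, rather than the GlusterFS mount, avoids triggering any further lookups through the replication layer while checking.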

I know this all sounds pretty incredible, like a horror story, but
I have read tons of documentation and still cannot figure it
out. I hope the problem is between keyboard and chair and not in the
software itself.

I'm only trying to get RAID-1 over the network with automatic recovery
after a reboot/outage. Is that really so complicated??

(I should mention that I have not started on the clients yet; there I
expect even bigger troubles like these.)

I would much appreciate any kind of help (even just confirming this
behaviour would help me a lot).

Thank you

Ondrej

--
   /\   Ondrej Jombik - nepto at platon.sk - http://nepto.sk - ICQ #122428216
  //\\  Platon Group - open source software development - http://platon.sk
  //\\  10 types of people: those who understand binary & those who do not

Stas Oskin

2009-Apr-01 08:55 UTC


[Gluster-users] RAID-1 over network scenario - incredible problems

Hi.

No need to use rsync; running ls -lR on the mount *should* synchronize all
the files.
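A minimal Python sketch of that crawl, assuming (as described here) that a lookup on every path through the client mount gives AFR the chance to self-heal stale copies. The default mount point /mnt/glusterfs is hypothetical, and the optional open-and-read step is an extra assumption for versions where data self-heal may only happen when a file is opened:

```python
import os
import stat
import sys


def crawl(mount_point: str, open_files: bool = False) -> int:
    """Stat every entry under mount_point, roughly what `ls -lR` does.

    Assumption: each lookup lets the replication layer notice and heal
    out-of-date copies. With open_files=True, every regular file is also
    opened and one byte read (assumed to help where healing needs an open).
    Returns the number of entries visited (the root itself is not counted).
    """
    visited = 0
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # the per-entry lookup, like `ls -l`
            except OSError as err:
                print(f"stat failed: {path}: {err}", file=sys.stderr)
                continue
            visited += 1
            if open_files and stat.S_ISREG(st.st_mode):
                try:
                    with open(path, "rb") as fh:
                        fh.read(1)
                except OSError as err:
                    print(f"open failed: {path}: {err}", file=sys.stderr)
    return visited


if __name__ == "__main__":
    # Hypothetical client mount point; pass the real one as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/glusterfs"
    print(crawl(target, open_files=True), "entries visited")
```

The crawl has to go through the GlusterFS client mount; walking a server's local backend directory bypasses the replication translator entirely.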

That said, it doesn't always work - I'm currently battling with several
issues on the subject.

Regards.

2009/4/1 Ondrej Jombik <nepto at platon.sk>
> Thank you so much for this reply. We used to have client-side replication
> and failover worked very nicely, but we were unable to do auto-restore
> (the servers do not know about each other), so I thought that moving
> towards server-side replication would be a good idea.
>
> We will probably move back to client-side replication and do an
> rsync-like auto-restore after a storage node failure.
>
> Again, thanks; that helped me a lot, since now I at least have a starting
> point for what to expect and what not to expect (regarding GlusterFS).
>
> PS: I like the GlusterFS architecture concept, and the config files as well.
>
>
> On Wed, 1 Apr 2009, Stas Oskin wrote:
>
>> Hi.
>>
>> 2009/4/1 Ondrej Jombik <nepto at platon.sk>
>>      I'm trying to configure a GlusterFS setup with two replicating servers,
>>      for now without any client. It worked well so far; however, after
>>      I rebooted the second server I started having a difficult time...
>>      (note: the first server was not rebooted)
>>
>>
>> Advice: use client-side replication; it's considered more reliable and
>> supports fail-over.
>>
>>
>>      1. Are all changes made on the non-rebooted server during the second
>>        server's reboot lost? They are not replicated after the rebooted
>>        server is online again... Is there an OFFICIAL way to achieve this?
>>        Does it keep some binary log of write operations that were not
>>        performed?
>>
>>
>> You only need to run ls -lR when the server comes up to sync the
>> files. Now, if only it always worked...
>>
>>
>>      6. option favorite-child remote
>>        This does nothing either, but at least it prints a warning to the
>>        log files. However, what the warning says does not actually
>>        happen. I tried to access all files on the remote/local
>>        device/mountpoint (4 ways), with no change at all.
>>
>>
>> Happens to me too.
>>
>>
>>      7. If I define "subvolumes remote" (i.e. removing local from subvolumes),
>>        then I finally get the right file contents, but only at the mountpoint,
>>        not on the actual device; I need to get the files onto the actual
>>        device (local disk) of the rebooted server.
>>
>>
>> Happens to me too.
>>
>>
>>      8. Finally, I deleted all the files from the device of the rebooted
>>        server and hoped replication would do the rest; and voilà,
>>        they were replicated, so all files created during the reboot are
>>        there, but they are all filled with zeros!
>>        (and no, this is not the known XFS bug; it is actually on EXT3)
>>
>>
>> Welcome to the club :).
>>
>>
>>      I know this all sounds pretty incredible, like a horror story, but
>>      I have read tons of documentation and still cannot figure it
>>      out. I hope the problem is between keyboard and chair and not in
>>      the software itself.
>>
>>      I'm only trying to get RAID-1 over the network with automatic
>>      recovery after a reboot/outage. Is that really so complicated??
>>
>>      (I should mention that I have not started on the clients yet; there I
>>      expect even bigger troubles like these.)
>>
>>
>> As I said, perhaps you should try client-side AFR first?
>>
>
> --
>  /\   Ondrej Jombik - nepto at platon.sk - http://nepto.sk - ICQ #122428216
>  //\\  Platon Group - open source software development - http://platon.sk
>  //\\  10 types of people: those who understand binary & those who do not
