Hi. I have some AFR-related questions which I decided to put in a separate email:

1) How can I specify that AFR stores files on separate servers (in order to prevent data loss when a server goes down)?

2) How can I specify how many copies of the files to store?

3) Is it true that no rsync-like functionality is supported - meaning the whole file needs to be replicated? Or is there some delta-based replication mechanism?

4) How well would unify/DHT work with several disks per server at once (as discussed in another email, where the disks in a server are kept separate from one another and not combined by any RAID, JBOD or LVM)? What do the developers think about this and the question above?

5) How could this whole setup be managed in a NUFA(?) approach, where each server is a client as well? I mean, what is the advised approach to control all the settings until the central configuration arrives in version 1.5?

Thanks in advance.

Regards.
For example:

volume ns-afr0
  type cluster/afr
  subvolumes remote-ns1 remote-ns2 remote-ns3 remote-ns4
end-volume

Anything written to ns-afr0 will be AFRed to all 4 subvolumes. However many copies you want, that is how many subvolumes you should set.

But I failed to activate the auto-healing function:

Step1: I create a client-AFR based unify, with both ns and storage AFRed. I name the 2 nodes node1 and node2.
Step2: glusterfs -s node1 -n unify0 /mnt
Step3: cp something /mnt/xxx
Step4: Check node1's and node2's storage; found 2 copies of the file xxx.
Step5: Stop node2's glusterfsd
Step6: cat something else >> /mnt/xxx
Step7: Start node2's glusterfsd
Step8: Sleep 100
Step9: Check node2's storage; found the file xxx unchanged since Step3.
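(For completeness: the remote-ns* subvolumes in an example like the above would themselves usually be protocol/client volumes defined on the client side. A minimal two-copy sketch follows; the hostnames and the exported volume name "brick-ns" are assumptions for illustration, not Kirby's actual setup.)

# one protocol/client volume per server export
volume remote-ns1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1      # assumed address of server 1
  option remote-subvolume brick-ns    # assumed name of the volume exported by the server
end-volume

volume remote-ns2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.2      # assumed address of server 2
  option remote-subvolume brick-ns
end-volume

# replicate across both servers
volume ns-afr0
  type cluster/afr
  subvolumes remote-ns1 remote-ns2
end-volume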
At 09:07 PM 12/5/2008, Kirby Zhou wrote:

>For example:
>
>volume ns-afr0
>  type cluster/afr
>  subvolumes remote-ns1 remote-ns2 remote-ns3 remote-ns4
>end-volume
>
>Anything written to ns-afr0 will be AFRed to all 4 subvolumes.
>However many copies you want, that is how many subvolumes you should set.
>
>But I failed to activate the auto-healing function.
>
>Step1: I create a client-AFR based unify, with both ns and storage AFRed. I
>name the 2 nodes node1 and node2.
>Step2: glusterfs -s node1 -n unify0 /mnt
>Step3: cp something /mnt/xxx
>Step4: Check node1's and node2's storage; found 2 copies of the file xxx.
>Step5: Stop node2's glusterfsd
>Step6: cat something else >> /mnt/xxx
>Step7: Start node2's glusterfsd
>Step8: Sleep 100
>Step9: Check node2's storage; found the file xxx unchanged since Step3.

Did you cat the file through the gluster mount point or on the underlying filesystem?

The auto-heal is automatically "activated", but it only heals on file access, so if you access the file through the gluster mountpoint it should find that it's out of date and update it from one of the other servers.

Check your gluster log. grep for your filename and see what it might say (on both servers).
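(In other words, for this one file the heal should be triggerable from the client side with something as simple as the following, assuming /mnt is the glusterfs mountpoint from Step 2:)

# read the file through the glusterfs mountpoint, not the backend export;
# the access lets AFR notice the stale/missing copy on node2 and sync it
cat /mnt/xxx > /dev/null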
At 11:00 AM 12/6/2008, Stas Oskin wrote:

>Hi.
>
>>Did you cat the file through the gluster mount point or on the
>>underlying filesystem?
>>The auto-heal is automatically "activated", but it only heals on
>>file access, so if you access the file through the gluster
>>mountpoint it should find that it's out of date and update it from one
>>of the other servers.
>>
>>Check your gluster log. grep for your filename and see what it might
>>say (on both servers).
>
>So actually, if the server goes down, the file will not be
>replicated until it's accessed?
>
>AFAIK it's not a very good approach, because it means that you need
>to somehow re-access all the files on the lost server in order to
>replicate them. Otherwise, the single copy of the file could be
>entirely lost when the other server holding it goes down as well.
>
>Any idea?

This has been Gluster's approach from the beginning. If you read the wiki, there are recommendations for how to force auto-healing - look for some find commands that will do what you want.

While, from a raw high-availability standpoint, the on-access healing doesn't really help when faced with the potential of multiple failures, from a performance standpoint it's ideal. Otherwise, each AFR server would have to retain a transaction log. This would likely have to be stuck on the underlying filesystem, as some weird file or something, or just consume vast amounts of memory.

So while it may not be ideal in your circumstance, knowing how it behaves and knowing how to force auto-healing should make it work well for you.
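(One commonly cited variant of the wiki's find recipe looks roughly like this; /mnt is the mountpoint used earlier in the thread, and the exact form given on the wiki may differ:)

# walk the gluster mountpoint (not the backend export) and read the first
# byte of every file, forcing AFR to examine - and, where needed, heal - each one
find /mnt -type f -print0 | xargs -0 head -c1 > /dev/null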
Mm, of course I had done that, between Step7 and Step8, on the client side:

[@123.21 ~]# ll /exports/disk2/xxx
-rw-r--r-- 1 root root 268435456 Dec 7 00:18 /exports/disk2/xxx

[@123.22 ~]# ll /exports/disk1/xxx
ls: /exports/disk1/xxx: No such file or directory

[@123.25 ~]# md5sum /mnt/xxx
1f5039e50bd66b290c56684d8550c6c2  /mnt/xxx

[@123.22 ~]# ll /exports/disk1/xxx
ls: /exports/disk1/xxx: No such file or directory

[@123.25 ~]# tail -f /var/log/glusterfs/glusterfs.log
!!! nothing more shows up !!!
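(A way to dig further here would be to compare the metadata AFR keeps on the backend copies directly. A rough sketch - the trusted.* attribute names AFR uses vary between glusterfs releases, so treat them as something to look up rather than as exact names:)

# on node1: dump the trusted.* extended attributes stored on its copy
getfattr -d -m trusted -e hex /exports/disk2/xxx

# on node2: the same, once a copy of the file exists there at all
getfattr -d -m trusted -e hex /exports/disk1/xxx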
Hi.

> This has been Gluster's approach from the beginning. If you read the wiki,
> there are recommendations for how to force auto-healing - look for some find
> commands that will do what you want.
>
> While, from a raw high-availability standpoint, the on-access healing doesn't
> really help when faced with the potential of multiple failures, from a
> performance standpoint it's ideal. Otherwise, each AFR server would have to
> retain a transaction log. This would likely have to be stuck on the underlying
> filesystem, as some weird file or something, or just consume vast amounts of memory.
>
> So while it may not be ideal in your circumstance, knowing how it behaves
> and knowing how to force auto-healing should make it work well for you.

So, basically the recommended approach is to run "find" every time a server goes down or comes up?

How does one then ensure files are re-balanced onto the remaining servers? I mean, would AFR just look for the next free server and put them there?

And after the problematic server comes back up, would AFR erase the redundant file copies from it?

Regards.
---------- Forwarded message ----------
From: Stas Oskin <stas.oskin at gmail.com>
Date: 2008/12/7
Subject: Re: [Gluster-users] AFR questions
To: Kirby Zhou <kirbyzhou at sohu-rd.com>

Hi.

> For example:
>
> volume ns-afr0
>   type cluster/afr
>   subvolumes remote-ns1 remote-ns2 remote-ns3 remote-ns4
> end-volume
>
> Anything written to ns-afr0 will be AFRed to all 4 subvolumes.
> However many copies you want, that is how many subvolumes you should set.

Thanks for the example.

By this approach, you mean that basically the only way to define, for example, 2-way replication is to put 2 bricks inside a volume?

What if I have, say, 10 bricks and still need only 2 copies? Will I need to define such sub-volumes for each pair? This sounds a bit strange, because then you are basically limited to 2 servers instead of the whole server pool.

Is it possible just to add all available bricks to a list, and then specify how many copies to store?

Regards.
Hi Stas,

please find the answers inlined.

On Mon, Dec 8, 2008 at 1:55 PM, Stas Oskin <stas.oskin at gmail.com> wrote:

> Hi.
>
> Thanks for your answer, it clarifies the matter a bit for me (+ several
> hours I spent on the Gluster wiki :) )
>
>> you can have two unify volumes of 5 bricks each and have these two as
>> children of afr. something like,
>>
>> volume unify-0
>>   type cluster/unify
>>   subvolumes n1 n2 n3 n4 n5
>> end-volume
>>
>> volume unify-1
>>   type cluster/unify
>>   subvolumes n6 n7 n8 n9 n10
>> end-volume
>>
>> volume afr
>>   type cluster/afr
>>   subvolumes unify-0 unify-1
>> end-volume
>
> Several questions, if I may:
>
> 1) In this setup, anything written to volume afr would actually be
> duplicated to unify-0 and unify-1, correct?

yes

> 2) I will not be able to track which server the file copies go to - from my
> point of view it's 2 pools of storage unified by a single space?

2 pools of storage replicating each other. Each pool is a unify of 5 storage nodes.

> 3) This setup actually means that I will need to add 2 servers every time,
> correct?

you mean, every time you want to add new storage capacity? For the normal functioning of glusterfs, it's not a requirement. glusterfs can continue to function even if you add a node to only one of the pools. But since the data is always replicated between the two unify volumes, it's practical to add a node to each of the unify pools.

> 4) What if one of the disks in any volume breaks - unify/AFR/client would
> overcome it and supply the data from another disk, correct?

afr identifies it and serves the data from another pool.

> 5) When I bring the disk back, using the "find" approach would re-sync it
> to the current state?

yes. When you open the file, afr self-heal is triggered and the file is updated to the latest state.

> 6) What if I run "find" BEFORE bringing the disk back - would it put the
> files on some other disk, or would it still require the disk to come back?

Running find beforehand has no effect in terms of healing the file. The file is healed only when the other node is up and glusterfs finds it to be not of the latest version when compared to the other copy.

> 7) Would this kind of setup function in a NUFA environment?

I don't understand the question. Unify has a nufa scheduler which prefers the local node over other nodes for file creation. Also, afr has an option read-subvolume, where you can specify the preferred node for reads. Using both options, one can have a nufa kind of environment.

> 8) Finally, would it ever be possible to make GlusterFS a completely
> transparent space? Meaning, just have one large space, which accepts new
> volumes automatically, provides this space to clients, and always ensures
> there are at least 2 copies present?

The current approach of having two unify volumes of storage nodes will prevail. However, there is a distributed hash translator which does not require a namespace cache and hence scales better than unify. Also, "hot-add" functionality is scheduled for future releases, which will enable automatic addition of nodes without requiring a restart of glusterfs.

> Thanks in advance for your time.

regards,
--
Raghavendra G
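(To make the answer to 7) concrete, the two options Raghavendra mentions would sit in the volume spec roughly as sketched below. The option spellings - "scheduler nufa", "nufa.local-volume-name" and "read-subvolume" - should be checked against the release in use; this is an illustrative assumption, not a tested configuration:)

# unify with the nufa scheduler: prefer the brick local to this machine
# when creating new files (the namespace volume name is an assumption)
volume unify-0
  type cluster/unify
  option namespace afr-ns
  option scheduler nufa
  option nufa.local-volume-name n1   # the subvolume that is local to this server/client
  subvolumes n1 n2 n3 n4 n5
end-volume

# afr preferring the nearby pool for reads
volume afr
  type cluster/afr
  option read-subvolume unify-0      # preferred subvolume for read operations
  subvolumes unify-0 unify-1
end-volume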