Hi,

I'm in the process of setting up server-side AFR with 2 servers in separate data centres, connected over a WAN. Writes will be relatively few, so we can live with the performance limitations of the WAN.

I noticed unexpectedly poor performance, though, when listing directories of around 1k files with ls -al. It looks like server1 is sending traffic to server2 in the other data centre for this operation, which I wasn't expecting for a read-only operation.

tshark shows a reasonable amount of traffic that looks related to xattrs: lots of mentions of filenames and 'trusted.glusterfs.afr.metadata-pending'.

I'm using "option read-subvolume local" to point read operations at the volume local to each server.

I have tried both with and without the client-side performance translators, to no avail. We're using 2.0.0rc1.

Apologies if this is an obvious question - can someone spot what I'm doing wrong?

cheers,
Barnaby
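For anyone wanting to look at the same bookkeeping outside of tshark, here is a minimal Python sketch that lists the AFR-related extended attributes on one file. It rests on several assumptions: it is meant to be pointed at a file in the backend export directory (not the GlusterFS mount), it needs root because the trusted.* namespace is restricted, it is Linux-only, and the prefix is simply derived from the attribute name seen in the capture. The helper name afr_xattrs and the example path are hypothetical.

```python
import os

# Prefix derived from the attribute name seen in the tshark capture
# ('trusted.glusterfs.afr.metadata-pending'); treating it as a common
# prefix for related attributes is an assumption.
AFR_PREFIX = "trusted.glusterfs.afr."


def afr_xattrs(path, listxattr=None, getxattr=None):
    """Return {attribute name: raw bytes} for AFR xattrs on one file.

    listxattr/getxattr default to os.listxattr/os.getxattr (Linux only);
    they can be replaced with stand-ins for testing.
    """
    listxattr = listxattr or os.listxattr
    getxattr = getxattr or os.getxattr
    result = {}
    for name in listxattr(path):
        if name.startswith(AFR_PREFIX):
            result[name] = getxattr(path, name)
    return result


if __name__ == "__main__":
    # Hypothetical backend path; replace with a real file under the export.
    for name, value in afr_xattrs("/data/export/somefile").items():
        print(name, value.hex())
```

The values are printed as raw hex because the on-disk encoding of these attributes is not described in this thread.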
At 04:20 AM 1/30/2009, Barnaby Gray wrote:
>I'm in the process of setting up server-side AFR with 2 servers in
>separate data centres, separated by a WAN. Writes will be relatively
>few, so we can live with the performance limitations of the WAN.
>
>I noticed unexpected performance though when listing directories of
>around 1k files with ls -al. It looks like for this operation server1 is
>sending traffic to server2 in the other data centre, which for a
>read-only operation I wasn't expecting.

Any time a directory is accessed, gluster/replicate checks with the other server to see whether the information it has is current. It does this because something may have changed on the other machine without it knowing. If something has changed, it auto-heals. Since gluster doesn't cache information about the other machines in a replicate group, it has to do this every time.

>tshark shows a reasonable amount of traffic that looks related to xattr:
>lots of mentions of filenames and 'trusted.glusterfs.afr.metadata-pending'.
>
>I'm using the "option read-subvolume local" to point read operations to
>the volume local to either server.

This means: once it's determined that my version of the file is the most up to date, serve it from my disk (or my favorite server in a client-server model), which is faster than streaming it over the network.

>Have tried both with and without the performance translators client-side
>to no avail. We're using 2.0.0rc1.

I don't think any performance translator can help with this particular situation. Gluster HAS to ensure that it's delivering the most up-to-date version of a file. To do that, on any file request it has to check with the other replicate servers.

>Apologies if this is an obvious question - can someone spot what I'm
>doing wrong?

One might think, "well, the servers haven't lost their connection to each other, so they should be able to assume they're in sync," but this isn't necessarily the case, because you can't know the configuration on the other end. There may be a situation where Server A decided Server B was down because of network latency, so it wrote and updated a file but didn't replicate it to Server B. Server B then goes to read that file; if it assumes that all has been well with Server A and doesn't bother checking, it will serve the wrong version of the file.

The only way to resolve this would be to make Server A responsible for notifying Server B when it re-establishes a connection to it. While this seems logical and would improve performance for your case, it would require some sort of journaling on Server A. This would be terribly inefficient and would require an additional journal filesystem, or modifying the underlying filesystem in some way.

Then there's the case of a changing architecture. If you have 10 servers in your replicate group, you have to run a journal for all 10. Let's say you just shut 5 of them off forever: you'd then need a way to clear out the journal for those so that space isn't wasted. So given that gluster wants to be non-intrusive

>cheers,
>
>Barnaby
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
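The stale-read scenario in the reply above, and the cost of avoiding it over a WAN, can be sketched as a toy model in Python. This is not GlusterFS code: the per-file version counter is a hypothetical stand-in for the bookkeeping GlusterFS keeps in xattrs, the 50 ms round-trip time is an assumed figure, and charging exactly one sequential round trip per lookup is a simplification. The names Replica and ReplicatePair are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server in the replicate pair (toy model)."""
    name: str
    # filename -> (content, version); the version counter is a hypothetical
    # stand-in for GlusterFS's xattr-based change tracking.
    files: dict = field(default_factory=dict)

    def write(self, filename, content):
        _, version = self.files.get(filename, (None, 0))
        self.files[filename] = (content, version + 1)


class ReplicatePair:
    """View of the pair from one server, with the other across a WAN."""

    def __init__(self, local, remote, rtt_ms):
        self.local = local
        self.remote = remote
        self.rtt_ms = rtt_ms      # assumed WAN round-trip time
        self.round_trips = 0

    def lookup(self, filename):
        """Ask the remote for its version first; heal the local copy if stale."""
        self.round_trips += 1     # simplification: one round trip per lookup
        local_entry = self.local.files.get(filename, (None, 0))
        remote_entry = self.remote.files.get(filename, (None, 0))
        if remote_entry[1] > local_entry[1]:
            self.local.files[filename] = remote_entry  # self-heal
        return self.local.files.get(filename, (None, 0))[0]

    def lookup_without_check(self, filename):
        """Trust the local copy blindly: fast, but possibly stale."""
        return self.local.files.get(filename, (None, 0))[0]

    def wan_ms(self):
        """Total WAN time spent on checks, assuming sequential lookups."""
        return self.round_trips * self.rtt_ms


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    for replica in (a, b):
        replica.write("report.txt", "v1")
    a.write("report.txt", "v2")  # A wrote while it believed B was down

    on_b = ReplicatePair(local=b, remote=a, rtt_ms=50)
    print(on_b.lookup_without_check("report.txt"))  # v1 (stale)
    print(on_b.lookup("report.txt"))                # v2 (healed from A)

    listing = ReplicatePair(local=Replica("B2"), remote=Replica("A2"), rtt_ms=50)
    for i in range(1000):
        listing.lookup(f"file{i}")
    print(listing.wan_ms())  # 50000
```

Under these assumptions, a listing of 1,000 files spends about 50 seconds on round trips alone, which is the kind of slowdown described in the original question. The real figure depends on the actual round-trip time and on how many requests GlusterFS issues per file.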