hi there.

i've 6 load-balanced webservers running apache 2.0.
right now we use unison to upload file changes from the developers to server 1
and then 'sync' those changes to servers 2, 3 and so on.

additionally, if a file is created on one of the servers (like a
temporary download created with php), we 'sync' that file with scp.
for safety, unison runs every 10 minutes on some servers
to ensure all files are available on every server.

right now it's not a problem with only 6 servers, but i'm pretty sure
it will become a big problem with more servers...

so i started some tests with GlusterFS 2.0.0rc2.
right now i think two setups may suit us:

layout 1:
  node1-replicate-node2 \
  node3-replicate-node4 - --> distribute ( like raid0 over 3 x raid1 )
  node5-replicate-node6 /

or

layout 2:
  node1-replicate-node2-replicate-node3 \
  node4-replicate-node5-replicate-node6 - --> distribute ( like raid0 over 2 x raid5 :) )

i think layout 1 should be ok, because until now i've never seen 2
servers crash at the same time.

my problem is: what happens if we get additional servers?

i've tested layout 1 with 4 nodes, shut down GLFS, added
node5-replicate-node6 to the distribute translator and started again.
files were ok, but i got a lot of GLFS errors in the log files.
can i safely ignore them?

i know the manual says: use distribute for fresh installations, else unify.
still, i'm curious whether it would work with distribute.

btw. for GlusterFS version 2.0, is AFR = replicate?

thanks for any suggestions :)
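for readers who want to see layout 1 spelled out: below is a rough sketch of a small Python helper that prints a client volume file with one replicate volume per node pair and a single distribute volume on top. the helper name `client_volfile`, the volume names `replN` and `dist`, and the export name `brick` are made up for illustration, and the option lines follow the GlusterFS 2.0-style volume file syntax as far as i remember it, so check them against the manual for your release before using any of it.

```python
def client_volfile(nodes, export="brick"):
    """Build a client volume file: replicate pairs under one distribute volume.

    `export` is a hypothetical name for the volume each server exports.
    """
    if len(nodes) % 2 != 0:
        raise ValueError("layout 1 needs an even number of nodes")

    lines = []
    # One protocol/client volume per server.
    for node in nodes:
        lines += [
            f"volume {node}",
            "  type protocol/client",
            "  option transport-type tcp",
            f"  option remote-host {node}",
            f"  option remote-subvolume {export}",
            "end-volume",
            "",
        ]

    # One replicate volume per consecutive pair of servers.
    pairs = []
    for i in range(0, len(nodes), 2):
        name = f"repl{i // 2 + 1}"
        pairs.append(name)
        lines += [
            f"volume {name}",
            "  type cluster/replicate",
            f"  subvolumes {nodes[i]} {nodes[i + 1]}",
            "end-volume",
            "",
        ]

    # Distribute across all replicate pairs.
    lines += [
        "volume dist",
        "  type cluster/distribute",
        f"  subvolumes {' '.join(pairs)}",
        "end-volume",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(client_volfile([f"node{i}" for i in range(1, 7)]))
```

adding node7 and node8 to the list would simply append a `repl4` volume to the distribute subvolumes. whether distribute copes well with a pair added to an existing setup is exactly the open question in this post, so the sketch only shows the file layout, not a safe migration.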
At 03:11 AM 3/10/2009, Christian Meisinger wrote:
>hi there.
>
>i've 6 load-balanced webservers running apache 2.0.
>right now we use unison to upload file changes from the developers to server 1
>and then 'sync' those changes to servers 2, 3 and so on.
>
>additionally, if a file is created on one of the servers (like a
>temporary download created with php), we 'sync' that file with scp.
>for safety, unison runs every 10 minutes on some servers
>to ensure all files are available on every server.
>
>right now it's not a problem with only 6 servers, but i'm pretty sure
>it will become a big problem with more servers...
>
>so i started some tests with GlusterFS 2.0.0rc2.
>right now i think two setups may suit us:
>
>layout 1:
>  node1-replicate-node2 \
>  node3-replicate-node4 - --> distribute ( like raid0 over 3 x raid1 )
>  node5-replicate-node6 /
>
>or
>
>layout 2:
>  node1-replicate-node2-replicate-node3 \
>  node4-replicate-node5-replicate-node6 - --> distribute ( like raid0 over 2 x raid5 :) )
>
>i think layout 1 should be ok, because until now i've never seen 2
>servers crash at the same time.

I'd definitely recommend layout 1 over layout 2. the more nodes in an AFR brick, the more performance issues you may have.

>my problem is: what happens if we get additional servers?

if you add them in pairs, you're in good shape. otherwise, you need to do a 3-node afr brick until you have an even number of servers, then do something different.

>i've tested layout 1 with 4 nodes, shut down GLFS, added
>node5-replicate-node6 to the distribute translator and started
>again. files were ok, but i got a lot of GLFS errors in the log files.
>can i safely ignore them?

it depends on which errors they are.. you'd have to use your judgement.
whenever I know a node has failed in an AFR pair, I typically turn on "favorite-child" in the other one, remount, then bring up the down server. then run ls -lR on the server that didn't crash. once done, remount without favorite child.
this way you don't get the i/o errors you might be seeing (although I think those are fixed in rc3 or rc4).

>i know the manual says: use distribute for fresh installations, else unify.
>still, i'm curious whether it would work with distribute.
>
>btw. for GlusterFS version 2.0, is AFR = replicate?

yes
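the "ls -lR" step above can also be scripted. this is only a sketch, assuming (as i understand it) that the listing helps because it looks up every entry through the glusterfs mount, which gives replicate a chance to heal each one. the function name `touch_tree` and the mount point in the usage line are hypothetical.

```python
import os


def touch_tree(root):
    """lstat every entry under root, much like `ls -lR`; return how many were visited."""
    visited = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # the lookup itself is the point; the result is unused
                visited += 1
            except OSError as err:
                # Report entries that cannot be looked up instead of aborting the walk.
                print(f"skipped {path}: {err}")
    return visited
```

usage would be something like `touch_tree("/mnt/glusterfs")` on the server that stayed up, after remounting with favorite-child set. any entries printed as skipped are the ones worth checking against the glusterfs log.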
>> i've tested layout 1 with 4 nodes, shut down GLFS, added
>> node5-replicate-node6 to the distribute translator and started
>> again. files were ok, but i got a lot of GLFS errors in the log files.
>> can i safely ignore them?
>
> it depends on which errors they are.. you'd have to use your judgement.
> whenever I know a node has failed in an AFR pair, I typically turn on
> "favorite-child" in the other one, remount, then bring up the down
> server. then run ls -lR on the server that didn't crash.
> once done, remount without favorite child. this way you don't get
> the i/o errors you might be seeing (although I think those are fixed
> in rc3 or rc4).

ah ok... i will try this.

another question about my testing:

- i start GLFS with 4 nodes (dist - repl+repl)
- on node5 and node6 i change the volume file
- then i start node5 and node6 (now the layout is dist - repl+repl+repl)

i guess that's not recommended :) although it seems to work somehow.

btw. the manual says that with GLFS 1.3.x and higher the server and client binary are the same file.
does that mean i can write everything in one volume file and start only one glusterfs binary?

thanks