Hello everyone,

We want to build a cluster of 4 web-servers. ftp and http will be load-balanced, so we will never know which node will serve ftp/http traffic. Since we don't want to lose any functionality when one of the servers goes out of order, we have come up with the following architecture:
 - each server will have 2 data bricks and 1 namespace brick
 - the second data brick of each server is AFRed with the first data brick of the next server
 - all namespace bricks are AFRed

We've tried to follow the recommendations from the wiki, and the following configs have been created:

------------------------------- begin server config -------------------------------------------

#
# Object Storage Brick 1
#

# low-level brick pointing to a physical folder
volume posix1
  type storage/posix
  option directory /mnt/os1/export
end-volume

# add fcntl locking support on top of the brick
volume locks1
  type features/posix-locks
  subvolumes posix1
  option mandatory on
end-volume

# add extra io threads for this brick
volume brick1
  type performance/io-threads
  option thread-count 4
  option cache-size 32MB
  subvolumes locks1
end-volume

#
# Object Storage Brick 2
#

# low-level brick pointing to a physical folder
volume posix2
  type storage/posix
  option directory /mnt/os2/export
end-volume

# add fcntl locking support on top of the brick
volume locks2
  type features/posix-locks
  subvolumes posix2
  option mandatory on
end-volume

# add extra io threads for this brick
volume brick2
  type performance/io-threads
  option thread-count 4
  option cache-size 32MB
  subvolumes locks2
end-volume

#
# Metadata Storage
#

volume brick1ns
  type storage/posix
  option directory /mnt/ms1
end-volume

#
# Volume to export
#

volume server
  type protocol/server
  subvolumes brick1 brick2 brick1ns
  option transport-type tcp/server
  option auth.ip.brick1.allow *
  option auth.ip.brick2.allow *
  option auth.ip.brick1ns.allow *
end-volume

------------------------------- end server config -------------------------------------------

and the client config from one of the nodes:

------------------------------- begin client config -------------------------------------------

### begin x-346-01 ###

volume brick01
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brick1
end-volume

volume brick02
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brick2
end-volume

volume brick01ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brick1ns
end-volume

### end x-346-01 ###

### begin x-346-02 ###

volume brick03
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick1
end-volume

volume brick04
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick2
end-volume

volume brick03ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick1ns
end-volume

### end x-346-02 ###

### begin x-346-03 ###

volume brick05
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brick1
end-volume

volume brick06
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brick2
end-volume

volume brick05ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brick1ns
end-volume

### end x-346-03 ###

### begin x-346-04 ###

volume brick07
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brick1
end-volume

volume brick08
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brick2
end-volume

volume brick07ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brick1ns
end-volume

### end x-346-04 ###

### afr bricks ###

volume afr01
  type cluster/afr
  subvolumes brick02 brick03
end-volume

volume afr02
  type cluster/afr
  subvolumes brick04 brick05
end-volume

volume afr03
  type cluster/afr
  subvolumes brick06 brick07
end-volume

volume afr04
  type cluster/afr
  subvolumes brick08 brick01
end-volume

volume afrns
  type cluster/afr
  subvolumes brick01ns brick03ns brick05ns brick07ns
end-volume

### unify ###

volume unify
  type cluster/unify
  option namespace afrns
  option scheduler nufa
  option nufa.local-volume-name brick03
  option nufa.local-volume-name brick04
  option nufa.limits.min-free-disk 5%
  subvolumes afr01 afr02 afr03 afr04
end-volume

------------------------------- end client config -------------------------------------------

Everything seems to be working fine, but we want to know whether there are alternatives to this configuration, and whether additional optimizations could be applied. Is there any mechanism to split one file over more than 2 nodes? Do we need the read-ahead translator if we use nufa with the local-volume options? What about write-behind? Did we miss anything else?

--
...WBR, Roman Hlynovskiy
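On the question of splitting one file across more than two nodes: unify places each whole file on exactly one subvolume, so splitting files needs the cluster/stripe translator instead. A minimal client-side sketch, assuming 1.3-era option names (the block-size pattern syntax may differ in your release, so check the translator docs for your version):

```
# hypothetical alternative to the unify volume above: stripe every
# file in 1MB blocks across all four AFR pairs
volume stripe0
  type cluster/stripe
  option block-size *:1MB
  subvolumes afr01 afr02 afr03 afr04
end-volume
```

Note that stripe would replace unify here, so the nufa scheduling (and its local-volume preference) would be lost; striping is generally only a win for large files with concurrent access.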
I'll let one of the devs respond to your specific config. There are a couple of cautions, though: if you're running PHP, you'll want to modify your php.ini to put session_save_path on shared storage. If someone's session starts on server 1 and the load balancer then directs their browser to server 2, their session is missing. (Either that, or use DB-based sessions.)

I've noticed some problems with this configuration, in that PHP likes to create semaphores all the time, and these get created in session_save_path. There seem to be cases where processes block on the semaphore from the other server. I haven't been able to figure out exactly why, and it may be exclusive to my configuration, but it's something to watch out for. You might end up with unkillable PHP processes blocked in iowait; the only solution has been to kill gluster and remount the filesystem. That only takes a second, but it's inconvenient, and until you realize it's happening, any process that tries to access the same files will block as well, eventually consuming all your spare httpd processes.

Keith

At 10:51 PM 8/10/2008, Roman Hlynovskiy wrote:
>Hello everyone,
>
>We want to build a cluster of 4 web-servers. ftp and http will be
>load-balanced, so we will never know which node will serve ftp/http
>traffic.
>[...]
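Keith's session_save_path advice amounts to a one-line php.ini change; the mount point below is an assumption — substitute wherever the GlusterFS volume is mounted on your web nodes:

```
; keep PHP sessions on the shared GlusterFS mount so a session started
; on one node is visible to every other node (example path)
session.save_handler = files
session.save_path = "/mnt/glusterfs/php-sessions"
```

The DB-based alternative is to register a database-backed handler via session_set_save_handler(), which keeps sessions out of the shared filesystem and sidesteps the semaphore issue entirely.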
Hello Keith,

ok, thanks - we will run stress tests with PHP and check whether the
same situation applies to our configuration.
Did this semaphore issue occur only with some specific number of
simultaneous connections, or was it a matter of "luck" :) ?

2008/8/11 Keith Freedman <freedman at freeformit.com>:
> I'll let one of the devs respond to your specific config.
>
> There are a couple of cautions...
> If you're running PHP, you'll want to modify your php.ini to put
> session_save_path on shared storage. If someone's session starts on
> server one and the browser directs them to server two, their session
> is missing (either that, or use DB-based sessions).
>
> I've noticed some problems with this configuration, in that PHP seems
> to create semaphores all the time. These get created in
> session_save_path. There seem to be some cases where processes
> block on the semaphore from the other server.
>
> I haven't been able to figure out exactly why, and it may be exclusive
> to my configuration, but it's something to watch out for.
> You might end up with non-killable PHP processes blocked in iowait. The
> only solution has been to kill gluster and remount the filesystem. This
> only takes a second, but it's inconvenient, and until you realize it's
> happening, any process which tries to access the same files will block
> too, eventually consuming all your spare httpd processes.
>
> Keith
>
> At 10:51 PM 8/10/2008, Roman Hlynovskiy wrote:
> [original message, with the full server and client configs, quoted
> in full - trimmed]

--
...WBR, Roman Hlynovskiy
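For reference, Keith's session_save_path caution comes down to a one-line php.ini change on every web node. The mount point below is hypothetical; use wherever the unify volume is actually mounted:

```ini
; php.ini - point PHP sessions at the shared GlusterFS mount
; (example path; must be the same on all load-balanced nodes)
session.save_path = "/mnt/glusterfs/php-sessions"
```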
I haven't had time to do any thorough testing... I won't for a couple
of weeks, unfortunately. Here's what "seems" to be going on.

I have 4 systems which monitor each other: every 2 minutes they pull a
page from each of the other 3 servers (a PHP script which returns the
hostname & timestamp).

When things get stuck, most of the websites keep working fine, with the
following exceptions. Since it takes a few minutes for my pager to go
off, there is usually a stack of PHP processes hung accessing the
status.php script. Sometimes other virtualhosts' scripts linger too,
but usually only one or two if so. These are generally the index.php
file for some of the busy hosts, or the cart.php file for a busy
shopping site.

gluster seems to be pretty happy during all this, so I'm not sure
whether the problem is in the underlying filesystem or in fuse. (I'm
not using the gluster-optimized fuse at the moment--I don't have all
the kernel sources to build it.)

I did realize, after reading your original email, that I didn't have
"option mandatory on" in the locks brick. I've enabled that and am
thinking it might solve the problem.

Since I have the io-threads brick enabled, I'm now wondering if there
was some strange interaction with the semaphores, where they were
getting removed while something was trying to take a lock on them: the
file goes away, the lock request doesn't know what to do with itself
and just sits there waiting forever? I'm speculating, but that is the
behavior I've been able to observe.

Make sure that when you do your tests you have some scripts that take a
while to process and some that are really super fast. I think the super
fast ones cause most of the problem. If my suspicion about the
semaphores and the locks is true, that is likely where you'll get
tripped up.

Keep me posted - I would love to hear any results of your testing.

Keith

At 02:40 AM 8/11/2008, Roman Hlynovskiy wrote:
> [previous messages in the thread quoted in full - trimmed]
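The hang described above is consistent with a blocking POSIX lock request that never returns once its target is in a bad state. A minimal, gluster-independent sketch of the difference between a blocking and a non-blocking lock request (using flock as a stand-in for whatever locking primitive PHP's session handler uses) looks like this:

```python
import fcntl
import tempfile

def second_lock_blocks(path):
    """Return True if a second exclusive flock on `path` would block."""
    f1 = open(path, "w")
    fcntl.flock(f1, fcntl.LOCK_EX)  # first holder: succeeds immediately
    f2 = open(path, "w")
    try:
        # LOCK_NB turns the would-block case into an immediate error
        # instead of an indefinite wait in the kernel.
        fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
        blocked = False
    except BlockingIOError:
        blocked = True
    finally:
        fcntl.flock(f1, fcntl.LOCK_UN)
        f1.close()
        f2.close()
    return blocked

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tf:
        print("second lock would block:", second_lock_blocks(tf.name))
```

A LOCK_NB-style probe fails fast with EWOULDBLOCK; a plain blocking request in the same situation sits in the kernel until the holder releases, which matches the never-returning processes described in the thread.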
applied? > >> is there any mechanisms to split one file over more than 2 nodes? > >> Do we need readahead translators if we use nufa with local-volume > >> options? what about write-ahead? did we miss something else? > >> > >> > >> -- > >> ...WBR, Roman Hlynovskiy > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users > > > > > > > >-- >...WBR, Roman Hlynovskiy > >_______________________________________________ >Gluster-users mailing list >Gluster-users at gluster.org >http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
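The fault-tolerance claim behind the chained AFR layout (server i's second data brick mirrored with server i+1's first, wrapping around) can be checked with a short sketch. This is an illustration only; the numeric server indices are stand-ins for the x-346-0N hosts, not names from the configs:

```python
# Sketch of the chained AFR layout: server i's second data brick is
# mirrored with server (i + 1)'s first data brick, wrapping around.
def chained_pairs(num_servers):
    # Each pair is (server holding its brick2, next server holding its brick1).
    return [(i, (i + 1) % num_servers) for i in range(num_servers)]

def survives_single_failure(num_servers):
    # Every AFR pair must keep at least one live brick when any one server dies.
    pairs = chained_pairs(num_servers)
    return all(
        any(server != dead for server in pair)
        for dead in range(num_servers)
        for pair in pairs
    )

print(chained_pairs(4))            # [(0, 1), (1, 2), (2, 3), (3, 0)]
print(survives_single_failure(4))  # True
```

Since every pair spans two distinct servers, no single-server failure can take out both replicas of a data brick, which matches the stated goal of the architecture.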
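Typos in hand-written vol-files (a misspelled or never-defined brick name in a subvolumes line) fail only at runtime, so a quick cross-reference check is worth running before deploying. A minimal sketch, assuming only the `volume <name>` and `subvolumes <names...>` constructs used in the configs above (the sample vol-file inside it is illustrative):

```python
# Minimal vol-file sanity check: every name listed under "subvolumes"
# must be defined by some "volume <name>" block in the same file.
def check_volfile(text):
    defined, referenced = set(), set()
    for line in text.splitlines():
        words = line.split()
        if words[:1] == ["volume"]:
            defined.add(words[1])
        elif words[:1] == ["subvolumes"]:
            referenced.update(words[1:])
    # Names that are referenced but never defined.
    return sorted(referenced - defined)

sample = """
volume brick1
type performance/io-threads
end-volume

volume brick2
type performance/io-threads
end-volume

volume brick1ns
type storage/posix
end-volume

volume server
type protocol/server
subvolumes brick1 brick2 brick1ns brick2ns
end-volume
"""
print(check_volfile(sample))  # ['brick2ns']
```

The same check applied to a client config would flag a `remote-subvolume` name that no server exports, though matching those requires looking at both files together.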