Hello everyone,
We want to build a cluster of 4 web servers. FTP and HTTP will be
load-balanced, so we will never know which node will serve a given
ftp/http request.
Since we don't want to lose any functionality when one of the servers
goes out of order, we have come up with the following architecture:
- each server will have 2 data bricks and 1 namespace brick
- the second data brick of each server is AFRed with the first data brick of the next server
- all namespace bricks are AFRed
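To make the pairing concrete, the layout we are aiming for (using the brick
names from the client config below) is roughly:

  x-346-01:brick2 <-AFR-> x-346-02:brick1   (afr01)
  x-346-02:brick2 <-AFR-> x-346-03:brick1   (afr02)
  x-346-03:brick2 <-AFR-> x-346-04:brick1   (afr03)
  x-346-04:brick2 <-AFR-> x-346-01:brick1   (afr04)
  namespace bricks of all four servers      (afrns)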
We have tried to follow the recommendations from the wiki, and the following
configs have been created:
------------------------------- begin server config -------------------------------------------
#
# Object Storage Brick 1
#
# low-level brick pointing to physical folder
volume posix1
type storage/posix
option directory /mnt/os1/export
end-volume
# put support for fcntl over brick
volume locks1
type features/posix-locks
subvolumes posix1
option mandatory on
end-volume
# put additional io threads for this brick
volume brick1
type performance/io-threads
option thread-count 4
option cache-size 32MB
subvolumes locks1
end-volume
#
# Object Storage Brick 2
#
# low-level brick pointing to physical folder
volume posix2
type storage/posix
option directory /mnt/os2/export
end-volume
# put support for fcntl over brick
volume locks2
type features/posix-locks
subvolumes posix2
option mandatory on
end-volume
# put additional io threads for this brick
volume brick2
type performance/io-threads
option thread-count 4
option cache-size 32MB
subvolumes locks2
end-volume
#
# Metadata Storage
#
volume brick1ns
type storage/posix
option directory /mnt/ms1
end-volume
#
# Volume to export
#
volume server
type protocol/server
subvolumes brick1 brick2 brick1ns
option transport-type tcp/server
option auth.ip.brick1.allow *
option auth.ip.brick2.allow *
option auth.ip.brick1ns.allow *
end-volume
------------------------------- end server config -------------------------------------------
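Note: "option auth.ip.<brick>.allow *" accepts connections from any host. If
the bricks should only be reachable from the cluster's own subnet, the allow
patterns can be narrowed, for example (wildcard pattern per the GlusterFS
docs; adjust to your network):

  option auth.ip.brick1.allow 192.168.252.*
  option auth.ip.brick2.allow 192.168.252.*
  option auth.ip.brick1ns.allow 192.168.252.*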
And here is the client config from one of the nodes:
------------------------------- begin client config -------------------------------------------
### begin x-346-01 ###
volume brick01
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.11
option remote-subvolume brick1
end-volume
volume brick02
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.11
option remote-subvolume brick2
end-volume
volume brick01ns
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.11
option remote-subvolume brick1ns
end-volume
### end x-346-01 ###
### begin x-346-02 ###
volume brick03
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.21
option remote-subvolume brick1
end-volume
volume brick04
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.21
option remote-subvolume brick2
end-volume
volume brick03ns
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.21
option remote-subvolume brick1ns
end-volume
### end x-346-02 ###
### begin x-346-03 ###
volume brick05
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.31
option remote-subvolume brick1
end-volume
volume brick06
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.31
option remote-subvolume brick2
end-volume
volume brick05ns
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.31
option remote-subvolume brick1ns
end-volume
### end x-346-03 ###
### begin x-346-04 ###
volume brick07
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.41
option remote-subvolume brick1
end-volume
volume brick08
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.41
option remote-subvolume brick2
end-volume
volume brick07ns
type protocol/client
option transport-type tcp/client
option remote-host 192.168.252.41
option remote-subvolume brick1ns
end-volume
### end x-346-04 ###
### afr bricks ###
volume afr01
type cluster/afr
subvolumes brick02 brick03
end-volume
volume afr02
type cluster/afr
subvolumes brick04 brick05
end-volume
volume afr03
type cluster/afr
subvolumes brick06 brick07
end-volume
volume afr04
type cluster/afr
subvolumes brick08 brick01
end-volume
volume afrns
type cluster/afr
subvolumes brick01ns brick03ns brick05ns brick07ns
end-volume
### unify ###
volume unify
type cluster/unify
option namespace afrns
option scheduler nufa
option nufa.local-volume-name brick03
option nufa.local-volume-name brick04
option nufa.limits.min-free-disk 5%
subvolumes afr01 afr02 afr03 afr04
end-volume
------------------------------- end client config -------------------------------------------
Everything seems to be working fine, but we would like to know whether there
are alternatives to this configuration, and whether any additional
optimizations could be applied.
Is there a mechanism to split one file over more than 2 nodes?
Do we need read-ahead translators if we use nufa with the local-volume
options? What about write-behind? Did we miss anything else?
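For reference, if we do add them, this is roughly how we understand read-ahead
and write-behind would be stacked on top of unify on the client side (option
names from the GlusterFS docs for this version; values are only illustrative):

volume readahead
  type performance/read-ahead
  # page-size / page-count values below are only illustrative
  option page-size 128KB
  option page-count 4
  subvolumes unify
end-volume

volume writebehind
  type performance/write-behind
  # aggregate-size value below is only illustrative
  option aggregate-size 1MB
  option flush-behind on
  subvolumes readahead
end-volume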
--
...WBR, Roman Hlynovskiy
I'll let one of the devs respond to your specific config. There are a couple
of cautions, though.

If you're running PHP, you'll want to modify your php.ini to put
session_save_path on shared storage. If someone's session starts on server
one and the load balancer then directs their browser to server two, their
session is missing. (Either that, or use DB-based sessions.)

I've noticed some problems with this kind of configuration, in that PHP seems
to create semaphore files all the time. These get created in
session_save_path, and there seem to be cases where processes block on the
semaphore from the other server. I haven't been able to figure out exactly
why, and it may be exclusive to my configuration, but it's something to watch
out for. You might end up with unkillable PHP processes blocked in iowait;
the only solution has been to kill gluster and remount the filesystem. That
only takes a second, but it's inconvenient, and until you realize it's
happening, any process which tries to access the same files will block as
well, thus eventually consuming all your spare httpd processes.

Keith
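A minimal sketch of the php.ini change Keith describes (the mount point and
directory are illustrative; use whatever path the GlusterFS volume is mounted
on):

  ; keep sessions on the shared GlusterFS mount so any node can resume them
  session.save_path = "/mnt/glusterfs/php_sessions"
  ; or keep sessions out of the filesystem entirely with a DB/memcache-backed
  ; handler registered via session_set_save_handler():
  ; session.save_handler = user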
Hello Keith,

OK, thanks. We will try to run stress tests with PHP and check whether the
same situation applies to our configuration. Did this semaphore issue occur
only with some specific number of simultaneous connections, or was it a
matter of "luck"? :)

--
...WBR, Roman Hlynovskiy
I haven't had time to do any thorough testing, and I won't for a couple of
weeks, unfortunately. Here's what "seems" to be going on.

I have 4 systems which monitor each other: every 2 minutes each one pulls a
page from each of the other 3 servers (a PHP script which returns the
hostname and a timestamp). When things get stuck, most of the websites keep
working fine, with the following exceptions. Since it takes a few minutes for
my pager to go off, there is usually a stack of PHP processes hung accessing
the status.php script. Sometimes other virtual hosts' scripts are lingering
too, but usually only one or two; these are generally the index.php of some
of the busy hosts, or the cart.php of a busy shopping site.

gluster seems to be pretty happy during all this, so I'm not sure whether the
problem is in the underlying filesystem or in FUSE. (I'm not using the
Gluster-patched FUSE at the moment; I don't have all the kernel sources to
build it.)

I did realize, after reading your original email, that I didn't have "option
mandatory on" in the locks brick. I have enabled that and am thinking it
might solve the problem. Since I have the io-threads brick enabled, I'm now
wondering whether there is some strange interaction with the semaphore files:
they get removed while something is trying to take a lock on them, the file
goes away, and the lock request doesn't know what to do with itself and just
sits there waiting forever? I'm speculating, but that's the behaviour I've
been able to observe.

Make sure that when you run your tests you have some scripts that take a
while to process and some that are really fast; I think the fast ones cause
most of the problem. If my suspicion about the semaphores and the locks is
true, that is likely where you'll get tripped up.

Keep me posted; I'd love to hear any results from your testing.

Keith
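The probe script itself can be as small as the following sketch (Keith's
actual status.php is not shown in the thread, so this is only illustrative):

<?php
// status.php: report which node served the request, and when
header('Content-Type: text/plain');
echo php_uname('n') . ' ' . date('Y-m-d H:i:s') . "\n";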
I haven''t had time to do any thorough testing.. I wont for a couple weeks unfortunately. Here''s what "seems" to be going on. I have 4 systems, which monitor eachother. every 2 minutes they pull a page from each of the other 3 servers. (this is a php script which returns the hostname & timestamp). When things seem to get stuck, it seems that most of the websites are working fine with the following exceptions. since it takes a few minutes for my pager to go off, there are usually a stack of php proceses hung accessing the status.php script. Sometimes there will be other virtualhosts'' scripts lingering, but usually only one or 2 if so. These are generally the index.php file for some of the busy hosts, or the cart.php file for a busy shopping site. gluster seems to be pretty happy during all this, so I''m not sure if the problem is on the underlying filesystem or fuse. (I''m not using the gluster optimized fuse at the moment--I don''t have all the kernel sources to build it). I did realize, after I read your original email that I didn''t have "option mandatory on" in the locks brick. I enabled that and amd thinking that might solve the problem. since I have threads brick enabled, I''m now wondering if there was some strange thing related to the semaphores where they were getting removed while something was trying to get a lock on them. the file goes away, the lock request doesn''t know what to do with itself and just sits there waiting forever??? I''m speculating, but there''s the behavior I''ve been able to observe. Make sure when you do your tests, you have some scripts that take a while to process and some that are really super fast. I think the super fast ones cause most of the problem. If my suspicions about the semaphores and the locks is true, that is likely where you''ll get tripped up. keep me posted, would love to hear any results of your testing. Keith At 02:40 AM 8/11/2008, Roman Hlynovskiy wrote:>Hello Keith, > >ok thanks, we will try to make stress tests with php and check if the >same situation apply to our configuration. >did this semaphore issue occurred only with some specific number of >simultaneous connections or it was matter of "luck" :) ? > > >2008/8/11 Keith Freedman <freedman at freeformit.com>: > > I''ll let one of the devs respond to your specific config. > > > > There are a couple cautions ... > > if you''re running PHP, you''ll want to modify your php.ini to have > > session_save_path on shared storage.. If someones session starts on server > > one and the browser directs them to server2 , their session is missing > > (Either that or use DB based sessions). > > > > I''ve noticed some problems with this configuration, in that it seems PHP > > likes to create semaphores all the time. These get created in > > session_save_path. There seems to be some cases where processes sometimes > > block on the semaphore form the other server. > > > > I haven''t been able to figure out exactly why, and it may be > exclusive to my > > configuration, but it''s something to watch out for. > > You might end up with non-killable php processes out iowait blocked. the > > only solution has been to kill gluster and remount the filesystem. This > > only takes a second but it''s inconvenient, and until you realize it''s > > happening, any process which tries to access the same files will > block also, > > thus eventually consuming all your spare httpd processes. 
> > Keith
>
>
>--
>...WBR, Roman Hlynovskiy
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
I haven't had time to do any thorough testing, and I won't for a couple of weeks, unfortunately. Here's what "seems" to be going on.

I have 4 systems which monitor each other: every 2 minutes each one pulls a page from each of the other 3 servers (a PHP script which returns the hostname and a timestamp). When things get stuck, most of the websites are still working fine, with the following exceptions. Since it takes a few minutes for my pager to go off, there is usually a stack of PHP processes hung accessing the status.php script. Sometimes other virtual hosts' scripts are lingering as well, but usually only one or two; these are generally the index.php file for some of the busy hosts, or the cart.php file for a busy shopping site.

Gluster seems to be pretty happy during all of this, so I'm not sure whether the problem is in the underlying filesystem or in FUSE (I'm not using the Gluster-optimized FUSE at the moment, since I don't have all the kernel sources to build it).

I did realize, after reading your original email, that I didn't have "option mandatory on" in the locks brick. I have enabled that and am thinking it might solve the problem. Since I have the io-threads brick enabled, I'm now wondering whether there was some strange interaction with the semaphore files: they get removed while something is trying to take a lock on them, the file goes away, and the lock request doesn't know what to do with itself and just sits there waiting forever. I'm speculating, but that is the behavior I've been able to observe.

Make sure that when you do your tests you have some scripts that take a while to process and some that are really fast. I think the fast ones cause most of the problem. If my suspicion about the semaphores and the locks is correct, that is likely where you'll get tripped up.

Keep me posted; I would love to hear any results of your testing.

Keith

At 02:40 AM 8/11/2008, Roman Hlynovskiy wrote:
>Hello Keith,
>
>OK, thanks, we will try to run stress tests with PHP and check whether the
>same situation applies to our configuration.
>Did this semaphore issue occur only with some specific number of
>simultaneous connections, or was it a matter of "luck" :) ?
>
>
>2008/8/11 Keith Freedman <freedman at freeformit.com>:
> > I'll let one of the devs respond to your specific config.
> >
> > There are a couple of cautions...
> > If you're running PHP, you'll want to modify your php.ini so that
> > session_save_path points at shared storage. If someone's session starts on
> > server one and the browser directs them to server two, their session is
> > missing (either that, or use DB-based sessions).
> >
> > I've noticed some problems with this configuration, in that PHP seems to
> > create semaphores all the time. These get created in session_save_path.
> > There seem to be some cases where processes sometimes block on the
> > semaphore from the other server.
> >
> > I haven't been able to figure out exactly why, and it may be exclusive to
> > my configuration, but it's something to watch out for.
> > You might end up with unkillable PHP processes blocked in iowait. The only
> > solution has been to kill Gluster and remount the filesystem. This only
> > takes a second, but it's inconvenient, and until you realize it's
> > happening, any process which tries to access the same files will block as
> > well, thus eventually consuming all your spare httpd processes.
> >
> > Keith
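P.S. For reference, the php.ini change discussed above might look roughly like the sketch below. The path /mnt/glusterfs/sessions is only an assumed example; it should point at a directory that lives on the unify volume (wherever the client config is mounted) and is writable by the web server user on every node:

; minimal sketch, assuming the GlusterFS client is mounted at /mnt/glusterfs
; (example path only; adjust to your actual mount point)
session.save_handler = files
session.save_path = "/mnt/glusterfs/sessions"

The DB-based alternative Keith mentions would instead keep session.save_handler pointed at a database-backed handler (for example, a user-level handler registered with session_set_save_handler()), which keeps the session and semaphore files off the cluster filesystem entirely.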