Roman Hlynovskiy
2008-Sep-08 08:45 UTC
[Gluster-users] gluster(1.3.10) becomes unstable after some time
Hello all,

I have a setup of 4 identical servers. Each of them exports 2 data bricks and 1 namespace brick. The first brick of each server is AFR'ed with the second brick of the previous server, so this configuration gives some service redundancy in case one of the servers fails. All the namespace bricks are also AFR'ed into one.

Below you can find my configuration from the first server. As can be seen, in the client configuration I used the local bricks of this server (brick1, brick2, brickns) instead of the network-exported brick01, brick02, brick01ns, to improve read I/O. Likewise, the second server uses its local brick1, brick2, brickns instead of brick03, brick04, brick02ns, and so on.

The first problem I saw: after 20 minutes of some basic tests with file copying, the gluster mount on all servers became unavailable. I see the following errors in the log:

2008-09-08 14:26:36 W [client-protocol.c:205:call_bail] brick03ns: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:36 C [client-protocol.c:212:call_bail] brick03ns: bailing transport
2008-09-08 14:26:36 E [tcp.c:124:tcp_except] brick03ns: shutdown () - error: Transport endpoint is not connected
2008-09-08 14:26:36 W [client-protocol.c:205:call_bail] brick05: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:36 C [client-protocol.c:212:call_bail] brick05: bailing transport
2008-09-08 14:26:36 E [tcp.c:124:tcp_except] brick05: shutdown () - error: Transport endpoint is not connected
2008-09-08 14:26:36 W [client-protocol.c:205:call_bail] brick06: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:36 C [client-protocol.c:212:call_bail] brick06: bailing transport
2008-09-08 14:26:36 E [tcp.c:124:tcp_except] brick06: shutdown () - error: Transport endpoint is not connected
2008-09-08 14:26:41 W [client-protocol.c:205:call_bail] brick08: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:41 C [client-protocol.c:212:call_bail] brick08: bailing transport
2008-09-08 14:26:41 E [tcp.c:124:tcp_except] brick08: shutdown () - error: Transport endpoint is not connected
2008-09-08 14:26:41 W [client-protocol.c:205:call_bail] brick04ns: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:41 C [client-protocol.c:212:call_bail] brick04ns: bailing transport
2008-09-08 14:26:41 E [tcp.c:124:tcp_except] brick04ns: shutdown () - error: Transport endpoint is not connected
2008-09-08 14:26:41 W [client-protocol.c:205:call_bail] brick07: activating bail-out. pending frames = 1. last sent = 2008-09-08 14:19:43. last received = 2008-09-08 14:19:43 transport-timeout = 42
2008-09-08 14:26:41 C [client-protocol.c:212:call_bail] brick07: bailing transport
2008-09-08 14:26:41 E [tcp.c:124:tcp_except] brick07: shutdown () - error: Transport endpoint is not connected

The second problem I see: even with 'option alu.read-only-subvolumes', gluster keeps writing to the volumes specified as read-only. What could be the reason for this?
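In short, the replication pairs the config below defines are (numbering the servers 1-4 by their addresses 192.168.252.11 through 192.168.252.41):

  afr01 = server1:brick2 + server2:brick1   (local brick2 + brick03)
  afr02 = server2:brick2 + server3:brick1   (brick04 + brick05)
  afr03 = server3:brick2 + server4:brick1   (brick06 + brick07)
  afr04 = server4:brick2 + server1:brick1   (brick08 + local brick1)
  afrns = all four namespace bricks         (brickns + brick02ns + brick03ns + brick04ns)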
----------------------
volume posix1
  type storage/posix
  option directory /mnt/os1/export
end-volume

volume locks1
  type features/posix-locks
  subvolumes posix1
  option mandatory on
end-volume

volume brick1
  type performance/io-threads
  option thread-count 4
  option cache-size 32MB
  subvolumes locks1
end-volume

volume posix2
  type storage/posix
  option directory /mnt/os2/export
end-volume

volume locks2
  type features/posix-locks
  subvolumes posix2
  option mandatory on
end-volume

volume brick2
  type performance/io-threads
  option thread-count 4
  option cache-size 32MB
  subvolumes locks2
end-volume

volume brickns
  type storage/posix
  option directory /mnt/ms
end-volume

volume server
  type protocol/server
  subvolumes brick1 brick2 brickns
  option transport-type tcp/server
  option auth.ip.brick1.allow *
  option auth.ip.brick2.allow *
  option auth.ip.brickns.allow *
end-volume

volume brick01
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brick1
end-volume

volume brick02
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brick2
end-volume

volume brick01ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.11
  option remote-subvolume brickns
end-volume

volume brick03
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick1
end-volume

volume brick04
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick2
end-volume

volume brick02ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brickns
end-volume

volume brick05
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brick1
end-volume

volume brick06
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brick2
end-volume

volume brick03ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.31
  option remote-subvolume brickns
end-volume

volume brick07
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brick1
end-volume

volume brick08
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brick2
end-volume

volume brick04ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.41
  option remote-subvolume brickns
end-volume

volume afr01
  type cluster/afr
  subvolumes brick2 brick03
  option read-subvolume brick2
end-volume

volume afr02
  type cluster/afr
  subvolumes brick04 brick05
end-volume

volume afr03
  type cluster/afr
  subvolumes brick06 brick07
end-volume

volume afr04
  type cluster/afr
  subvolumes brick08 brick1
  option read-subvolume brick1
end-volume

volume afrns
  type cluster/afr
  subvolumes brickns brick02ns brick03ns brick04ns
  option read-subvolume brickns
end-volume

volume unify
  type cluster/unify
  subvolumes afr01 afr02 afr03 afr04
  option namespace afrns
  option scheduler alu
  option alu.read-only-subvolumes afr02,afr03
  option alu.limits.min-free-disk 5%
  option alu.stat-refresh.interval 10sec
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 1024M
  option alu.disk-usage.exit-threshold 32M
end-volume
---------------------

--
...WBR, Roman Hlynovskiy
Amar S. Tumballi
2008-Sep-22 22:39 UTC
[Gluster-users] gluster(1.3.10) becomes unstable after some time
Hi Roman,

Sorry for the delay in response.

> The first problem I saw: after 20 minutes of some basic tests with
> file copying, the gluster mount on all servers became unavailable.

Do you see any '/core*' files? The log messages mean the calls are bailing out; there are three possible reasons:

i) Because of heavy disk I/O, responses are getting delayed, so the default 'transport-timeout' is not enough. Try a higher value like 120.

ii) A glusterfs process died, so the clients couldn't connect to the corresponding server process (unlikely in your case, since a new connection is made again after a call bail).

iii) A bug in glusterfs itself. In this case, we would like you to try 1.3.12 (the latest 1.3.x release), or wait another 10 days for the next pre-release of the 1.4 branch, which should work fine IMO.

> The second problem I see: even with 'option
> alu.read-only-subvolumes', gluster keeps writing to the volumes
> specified as read-only. What could be the reason for this?

The reason is that the 'read-only-subvolumes' option only makes sure new files are not created on those two subvolumes. If a file already exists on them, it continues to grow. If you don't want any writes to happen at all, you need to use the filter translator.

Regards,
Amar

--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!
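For illustration, a minimal sketch of suggestion (i) applied to one of Roman's client volumes; the same option would be repeated in every protocol/client volume, and 120 is only the suggested starting value:

volume brick03
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.252.21
  option remote-subvolume brick1
  # raise from the 42 seconds seen in the call-bail log messages
  option transport-timeout 120
end-volume

And a rough, untested sketch of the filter idea: wrap each read-only subvolume in a features/filter volume and point unify at the wrapper. The volume name afr02-ro and the 'read-only' option are assumptions here; check the filter translator's documentation for your release, as its option names have varied between versions:

volume afr02-ro
  type features/filter
  # reject writes before they reach afr02 (assumed option name)
  option read-only on
  subvolumes afr02
end-volume

In the unify volume, afr02-ro (and a similar afr03-ro) would then be listed in 'subvolumes' in place of afr02 and afr03.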