Francis GASCHET
2008-Dec-16 07:44 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Hi all,
I installed GlusterFS on two computers running Mandriva 2008.
Connection type: Ethernet 100 Mbit/s full duplex (mii-tool result: "negotiated 100baseTx-FD flow-control").
Here is my configuration (the same on both PCs):
*glusterfs-server.vol:*
volume dir_main
  type storage/posix                               # POSIX FS translator
  option directory /main                           # Export this directory
end-volume

volume locks_main
  type features/posix-locks
  subvolumes dir_main
end-volume

volume main
  type protocol/server
  option transport-type tcp/server                 # For TCP/IP transport
  subvolumes locks_main
  option auth.ip.main.allow 127.0.0.1,172.16.1.*   # Allow access to "brick" volume
end-volume
*client.vol:*
volume main_loc
  type protocol/client
  option transport-type tcp/client
  option remote-host localhost
  option remote-subvolume main
end-volume

volume main_dist
  type protocol/client
  option transport-type tcp/client
  option remote-host other
  option remote-subvolume main
end-volume

volume raid_main_afr
  type cluster/afr
  subvolumes main_loc main_dist
  option read-subvolume main_loc
end-volume

volume raid_main_ra
  type performance/read-ahead
  option page-size 128kB
  option page-count 4
  option force-atime-update off
  subvolumes raid_main_afr
end-volume

volume raid_main_wb
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind on
  subvolumes raid_main_ra
end-volume

volume raid_main
  type performance/io-cache
  option cache-size 512MB
  option page-size 1MB
  option priority *:0                              # *.html:2,*:1
  option force-revalidate-timeout 2                # default is 1
  subvolumes raid_main_wb
end-volume
It works, but slowly!
I'm new to GlusterFS, so maybe some option isn't adequate. Please advise.

Because of the option "read-subvolume main_loc", I didn't expect network
traffic when I merely list or read files, but in practice even a simple
ls generates a lot of network traffic.
An "ls -R" over fewer than 8000 files takes 7 to 8 seconds; run locally
on the backend directory, it returns instantly.
*Question 1:* Is this traffic normal for read-only operations?
*Question 2:* The documentation says there are two protocols: an ASCII
protocol and a binary protocol. According to what I see with tcpdump, my
GlusterFS currently uses the ASCII protocol. I guess that's not the best
for performance! How can I force it to use the binary protocol?
Thanks for any help.
Best regards,
--
Francis GASCHET / NUMLOG
http://www.numlog.fr
Tel.: +33 (0) 130 791 616
Fax.: +33 (0) 130 819 286
NUMLOG recrute sur LOLIX :
http://fr.lolix.org/
Keith Freedman
2008-Dec-16 09:11 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
I believe 1.3 uses the ASCII protocol and 1.4 uses the binary one; I noticed a HUGE performance boost when I went up to 1.4.

Also, I'd remove your read-ahead, write-behind, and io-cache translators and test again, then add them back one at a time, then two at a time, then all three, and see what your results are. Personally, I don't trust using them with AFR. It's probably safe, but I fear that write-behind will hold on to data that gets written later, while it's busy being changed on another server. Read-ahead might be OK with AFR, but even so, you risk pre-fetching data that changes a second later, leaving you out of date... If I'm wrong, I'm sure one of the devs will chime in.

Beyond that, I believe an ls will involve some network traffic: AFR has to check with the other server to validate the serial number. Check your logs and see whether, for some reason, it's continually auto-healing the directory entries. If it is, that accounts for the delays; if not, something else is in the way.
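A concrete way to run the first experiment above: a stripped-down client.vol with all three performance translators removed, so the mount sits directly on the AFR volume. This is only a sketch derived from Francis's configuration (same volume names and options; nothing beyond that config is assumed):

```
volume main_loc
  type protocol/client
  option transport-type tcp/client
  option remote-host localhost
  option remote-subvolume main
end-volume

volume main_dist
  type protocol/client
  option transport-type tcp/client
  option remote-host other
  option remote-subvolume main
end-volume

# The AFR volume is renamed from raid_main_afr to raid_main so the
# top-level volume name (and thus the mount command) stays the same.
volume raid_main
  type cluster/afr
  subvolumes main_loc main_dist
  option read-subvolume main_loc
end-volume
```

Each performance translator can then be reinserted between the AFR volume and the top of the stack, one at a time, re-running the ls -R timing after each change.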
Daniel Maher
2008-Dec-16 09:25 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Keith Freedman wrote:
> Personally, I don't trust using them with AFR. It's probably safe,
> but I fear that write-behind will hold on to data that gets written
> later, while it's busy being changed on another server.

I can confirm that under 1.3, using the write-behind and read-ahead translators in an AFR environment, especially with lots of changes to many small files, can result in a total nightmare. We've seen files that appear empty, files with out-of-date content, and files containing the content of /other/ files. When we disabled the translators, all of these problems ceased to occur.

Unfortunately, removing the performance translators has the side effect of (you guessed it) hurting performance. Although we no longer see occasionally inconsistent data, we also took a severe performance hit on the rest of the Gluster-hosted files, which normally worked without a hitch.

Mr. Avati has suggested that under 1.4 this would no longer be the case, as writes are now atomic. We're actually going ahead with the upgrade to 1.4 today - hopefully we'll be able to re-enable the translators...

--
Daniel Maher <dma+gluster AT witbe DOT net>
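If the 1.4 upgrade holds up, the translators can be reintroduced incrementally, as Keith suggested, retesting after each step. As a sketch using the volume names and option values from Francis's configuration (whether these options behave identically under 1.4 is an assumption), re-enabling only write-behind on top of the AFR volume would look like:

```
# Appended after the raid_main_afr definition in client.vol;
# this becomes the new top-level volume to mount.
volume raid_main
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind on
  subvolumes raid_main_afr
end-volume
```

If small-file consistency problems reappear at this step, write-behind is the culprit; otherwise the same one-step test can be repeated with read-ahead and io-cache.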