Francis GASCHET
2008-Dec-16 07:44 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Hi all,

I installed GlusterFS on 2 computers under Mandriva 2008.
Connection type: Ethernet 100 Mbit/s (mii-tool result: negotiated 100baseTx-FD flow-control)

Hereafter is my configuration (same on both PCs):

*glusterfs-server.vol:*

volume dir_main
  type storage/posix                # POSIX FS translator
  option directory /main            # Export this directory
end-volume

volume locks_main
  type features/posix-locks
  subvolumes dir_main
end-volume

volume main
  type protocol/server
  option transport-type tcp/server                   # For TCP/IP transport
  subvolumes locks_main
  option auth.ip.main.allow 127.0.0.1,172.16.1.*     # Allow access to "brick" volume
end-volume

*client.vol:*

volume main_loc
  type protocol/client
  option transport-type tcp/client
  option remote-host localhost
  option remote-subvolume main
end-volume

volume main_dist
  type protocol/client
  option transport-type tcp/client
  option remote-host other
  option remote-subvolume main
end-volume

volume raid_main_afr
  type cluster/afr
  subvolumes main_loc main_dist
  option read-subvolume main_loc
end-volume

volume raid_main_ra
  type performance/read-ahead
  option page-size 128kB
  option page-count 4
  option force-atime-update off
  subvolumes raid_main_afr
end-volume

volume raid_main_wb
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind on
  subvolumes raid_main_ra
end-volume

volume raid_main
  type performance/io-cache
  option cache-size 512MB
  option page-size 1MB
  option priority *:0                   # *.html:2,*:1
  option force-revalidate-timeout 2     # default is 1
  subvolumes raid_main_wb
end-volume

It works fine, but slowly! I'm a newbie with GlusterFS, so maybe some option isn't adequate. Please advise.

Because of the option "read-subvolume main_loc", I didn't expect network traffic when I just list or read files, but actually, even with a simple ls, I see a lot of network traffic. An "ls -R" takes 7 to 8 seconds for fewer than 8000 files. If I do it locally, I get the result instantly.

*Question 1:* Is this traffic normal for read-only operations?

*Question 2:* In the documentation, I read that there are two protocols: an ASCII protocol and a binary protocol. Currently, according to what I see with tcpdump, my GlusterFS uses the ASCII protocol. I guess that's not the best for performance! How can I force it to use the binary protocol?

Thanks for any help.

Best regards,

--
Francis GASCHET / NUMLOG
http://www.numlog.fr
Tel.: +33 (0) 130 791 616
Fax.: +33 (0) 130 819 286

NUMLOG is recruiting on LOLIX: http://fr.lolix.org/
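A quick way to quantify the slowdown described above is to time the same recursive listing against the GlusterFS mount and against the backing export directory. This is only a rough sketch; the directory argument (e.g. a mount point such as /mnt/main, or the export directory /main from the config above) is an assumption about the local setup:

```shell
#!/bin/sh
# Time a recursive directory listing. Run once against the GlusterFS
# mount point and once against the local backing directory, then
# compare the two numbers.
# DIR defaults to /tmp so the sketch runs anywhere; pass your mount
# point or export directory as the first argument.
DIR=${1:-/tmp}
start=$(date +%s)
ls -R "$DIR" > /dev/null 2>&1
end=$(date +%s)
echo "ls -R $DIR took $((end - start)) seconds"
```

If the mounted listing is far slower than the local one, the extra time is going into per-entry network round trips rather than disk I/O.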
Keith Freedman
2008-Dec-16 09:11 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
I believe 1.3 uses the ASCII protocol and 1.4 uses the binary protocol. I noticed a HUGE performance boost when I went up to 1.4.

Also, I'd remove your read-ahead, write-behind, and io-cache and test again, then add them back one at a time, then two at a time, then all three, and see what your results are. Personally, I don't trust using them with AFR. It's probably safe, but I have fears that write-behind will capture data which is then written later, while it's busy being changed on another server. Read-ahead might be OK with AFR, but even so, you risk pre-fetching data that changes a second later, so you're out of date. If I'm wrong, I'm sure one of the devs will chime in.

Beyond that, I believe an ls will generate some network traffic: AFR has to check with the other server to validate the serial number. Check your logs and see if it's for some reason continually auto-healing the directory entry. If it is, that accounts for the delays; if not, then something else is in the way.

At 11:44 PM 12/15/2008, Francis GASCHET wrote:
> [...]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
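For reference, the test Keith suggests amounts to mounting the AFR volume directly, with no performance translators stacked on top. A minimal client.vol along those lines, reusing the volume names from Francis's posted config (a sketch of the test configuration, not a recommended production setup), would be:

```
# client.vol -- AFR only, performance translators removed for testing
volume main_loc
  type protocol/client
  option transport-type tcp/client
  option remote-host localhost
  option remote-subvolume main
end-volume

volume main_dist
  type protocol/client
  option transport-type tcp/client
  option remote-host other
  option remote-subvolume main
end-volume

volume raid_main_afr
  type cluster/afr
  subvolumes main_loc main_dist
  option read-subvolume main_loc
end-volume
```

The read-ahead, write-behind, and io-cache volumes can then be layered back in one at a time, re-running the same benchmark after each addition to see which translator (if any) accounts for the difference.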
Daniel Maher
2008-Dec-16 09:25 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Keith Freedman wrote:
> Personally, I dont trust using them with AFR.. it's probably safe,
> but I have fears that the write-behind will capture data which is
> written later while it's busy being changed on another server. [...]

I can confirm that under 1.3, using the write-behind and read-ahead caches in an AFR environment, especially if you're dealing with lots of changes to many small files, can result in a total nightmare. We've seen files appear to be empty, files with outdated content, and files with the content of /other/ files in them. When we disabled the translators, all of these problems ceased to occur.

Unfortunately, removing the performance translators has the side effect of (you guessed it) hurting performance. Even though we no longer have occasionally inconsistent data, we also took a severe performance hit for the rest of the Gluster-hosted files, which normally worked without a hitch.

Mr. Avati has suggested that under 1.4 this would no longer be the case, as writes are now atomic. We're actually going ahead with the upgrade to 1.4 today - hopefully we'll be able to re-enable the translators...

--
Daniel Maher <dma+gluster AT witbe DOT net>