Francis GASCHET
2008-Dec-16  07:44 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Hi all,
I installed glusterFS on 2 computers under Mandriva 2008.
Connection type: Ethernet 100 Mbit/s (mii-tool result: negotiated
100baseTx-FD, flow-control)
Hereafter is my configuration (same on both PCs):
*glusterfs-server.vol:*
volume dir_main
     type storage/posix                                       # POSIX FS translator
     option directory /main                                   # Export this directory
end-volume

volume locks_main
     type features/posix-locks
     subvolumes dir_main
end-volume

volume main
     type protocol/server
     option transport-type tcp/server                         # For TCP/IP transport
     subvolumes locks_main
     option auth.ip.main.allow 127.0.0.1,172.16.1.*           # Allow access to "brick" volume
end-volume
*client.vol:*
volume main_loc
     type protocol/client
     option transport-type tcp/client
     option remote-host localhost
     option remote-subvolume main
end-volume
volume main_dist
     type protocol/client
     option transport-type tcp/client
     option remote-host other
     option remote-subvolume main
end-volume
volume raid_main_afr
     type cluster/afr
     subvolumes main_loc main_dist
     option read-subvolume main_loc
end-volume
volume raid_main_ra
     type performance/read-ahead
     option page-size 128kB
     option page-count 4
     option force-atime-update off
     subvolumes raid_main_afr
end-volume
volume raid_main_wb
     type performance/write-behind
     option aggregate-size 1MB
     option flush-behind on
     subvolumes raid_main_ra
end-volume
volume raid_main
     type performance/io-cache
     option cache-size 512MB
     option page-size 1MB
     option priority *:0                 # *.html:2,*:1
     option force-revalidate-timeout 2   # default is 1
     subvolumes raid_main_wb
end-volume
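For context, volfiles like these are typically loaded as follows in the 1.3/1.4 era; the file paths and the mount point /mnt/main here are assumptions, not taken from the post:

```
# On each server (illustrative paths):
glusterfsd -f /etc/glusterfs/glusterfs-server.vol

# On each client, mount the volume described by client.vol:
glusterfs -f /etc/glusterfs/client.vol /mnt/main
```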
It works, but slowly!
I'm a newbie with GlusterFS, so maybe some option isn't set adequately.
Please advise.
Because of the option "read-subvolume main_loc" I didn't expect network
traffic when I merely list or read files, but in fact, even with a
simple ls, I see a lot of network traffic.
An "ls -R" takes 7 to 8 seconds for fewer than 8000 files. If I run it
locally, I get the result instantly.
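For anyone reproducing this, a quick sketch of how to time the listing and correlate it with network traffic. The mount point and interface name are assumptions; the fallback scratch directory just makes the commands runnable as-is for a dry run:

```shell
# Assumed mount point; falls back to a scratch dir so this runs anywhere.
MOUNT="${MOUNT:-/tmp/glusterfs-demo}"
mkdir -p "$MOUNT/dir1/dir2"
touch "$MOUNT/dir1/a" "$MOUNT/dir1/dir2/b"

# The operation that is slow over GlusterFS:
time ls -R "$MOUNT" > /dev/null

# In a second terminal, watch NIC byte counters while it runs, e.g.:
#   watch -n1 'cat /sys/class/net/eth0/statistics/rx_bytes'

echo "files listed: $(find "$MOUNT" -type f | wc -l | tr -d ' ')"
```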
*Question 1:* Is this traffic normal for read-only operations?
*Question 2:* In the documentation I read that there are two protocols:
an ASCII protocol and a binary protocol. According to what I see with
tcpdump, my GlusterFS currently uses the ASCII protocol. I guess that's
not the best for performance! How can I force it to use the binary
protocol?
Thanks for any help.
Best regards,
-- 
Francis GASCHET / NUMLOG
http://www.numlog.fr
Tel.: +33 (0) 130 791 616
Fax.: +33 (0) 130 819 286
NUMLOG recrute sur LOLIX :
http://fr.lolix.org/
Keith Freedman
2008-Dec-16  09:11 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
I believe 1.3 uses the ASCII protocol and 1.4 uses the binary protocol;
I noticed a HUGE performance boost when I went up to 1.4.

Also, I'd remove your read-ahead, write-behind, and io-cache translators
and test again, then add them back one at a time, then two at a time,
then all three, and see what your results are. Personally, I don't trust
using them with AFR. It's probably safe, but I have fears that
write-behind will capture data which is written later, while it's busy
being changed on another server. Read-ahead might be OK with AFR, but
even so you risk pre-fetching data that changes a second later, so
you're out of date... If I'm wrong, I'm sure one of the devs will chime in.

Beyond that, I believe an ls will cause some network traffic: AFR has to
check with the other server to validate the serial number. Check your
logs and see if for some reason it's continually auto-healing the
directory entries. If it is, that accounts for the delays; if not, then
something else is in the way.

At 11:44 PM 12/15/2008, Francis GASCHET wrote:
> [original message quoted in full; trimmed]
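As a concrete version of that suggestion, a stripped-down client.vol for the test might look like this. It keeps only the AFR layer, reusing the volume names from the original post; the performance translators would then be re-added one by one on top of raid_main_afr:

```
volume main_loc
     type protocol/client
     option transport-type tcp/client
     option remote-host localhost
     option remote-subvolume main
end-volume

volume main_dist
     type protocol/client
     option transport-type tcp/client
     option remote-host other
     option remote-subvolume main
end-volume

# AFR only -- read-ahead, write-behind and io-cache removed for the test
volume raid_main_afr
     type cluster/afr
     subvolumes main_loc main_dist
     option read-subvolume main_loc
end-volume
```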
Daniel Maher
2008-Dec-16  09:25 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Keith Freedman wrote:
> Also, I'd remove your read-ahead, write-behind, and io-cache, and
> test again, then add back only one at a time, then 2 at a time, then
> all 3 and see what your results are.
> Personally, I dont trust using them with AFR.. it's probably safe,
> but I have fears that the write-behind will capture data which is
> written later while it's busy being changed on another server.

I can confirm that under 1.3, using the write-behind and read-ahead
caches in an AFR environment, especially if you're dealing with lots of
changes to many small files, can result in a total nightmare. We've seen
files appear to be empty, files with outdated content, and files with
the content of /other/ files in them. When we disabled the translators,
all of these problems ceased to occur.

Unfortunately, removing performance translators has the side effect of
(you guessed it) hurting performance. Even though we no longer have
occasionally inconsistent data, we also took a severe performance hit
for the rest of the Gluster-hosted files, which normally worked without
a hitch.

Mr. Avati has suggested that under 1.4 this would no longer be the case,
as writes are now atomic. We're actually going ahead with the upgrade to
1.4 today; hopefully we'll be able to re-enable the translators...

-- 
Daniel Maher <dma+gluster AT witbe DOT net>