Francis GASCHET
2008-Dec-16  07:44 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Hi all,
I installed glusterFS on 2 computers under Mandriva 2008.
Connection type: Ethernet 100 Mbit/s (mii-tool result: negotiated
100baseTx-FD, flow-control)
Hereafter is my configuration (same on both PCs):
*glusterfs-server.vol:*
volume dir_main
     type storage/posix                                       # POSIX FS translator
     option directory /main                                   # Export this directory
end-volume

volume locks_main
     type features/posix-locks
     subvolumes dir_main
end-volume

volume main
     type protocol/server
     option transport-type tcp/server                         # For TCP/IP transport
     subvolumes locks_main
     option auth.ip.main.allow 127.0.0.1,172.16.1.*           # Allow access to "brick" volume
end-volume
*client.vol:*
volume main_loc
     type protocol/client
     option transport-type tcp/client
     option remote-host localhost
     option remote-subvolume main
end-volume
volume main_dist
     type protocol/client
     option transport-type tcp/client
     option remote-host other
     option remote-subvolume main
end-volume
volume raid_main_afr
     type cluster/afr
     subvolumes main_loc main_dist
     option read-subvolume main_loc
end-volume
volume raid_main_ra
     type performance/read-ahead
     option page-size 128kB
     option page-count 4
     option force-atime-update off
     subvolumes raid_main_afr
end-volume
volume raid_main_wb
     type performance/write-behind
     option aggregate-size 1MB
     option flush-behind on
     subvolumes raid_main_ra
end-volume
volume raid_main
     type performance/io-cache
     option cache-size 512MB
     option page-size 1MB
     option priority *:0                 # *.html:2,*:1
     option force-revalidate-timeout 2   # default is 1
     subvolumes raid_main_wb
end-volume
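For context, volfiles like these are typically loaded as follows in the 1.3/1.4 era; the file paths and the mount point /mnt/main here are assumptions, not taken from the post:

```
# On each server (illustrative paths):
glusterfsd -f /etc/glusterfs/glusterfs-server.vol

# On each client, mount the volume described by client.vol:
glusterfs -f /etc/glusterfs/client.vol /mnt/main
```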
It works, but slowly!
I'm a newbie with GlusterFS, so maybe some option isn't set adequately.
Please advise.
Because of the option "read-subvolume main_loc" I didn't expect network
traffic when I merely list or read files, but in fact, even with a
simple ls, I see a lot of network traffic.
An "ls -R" takes 7 to 8 seconds for fewer than 8000 files. If I run it
locally, I get the result instantly.
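For anyone reproducing this, a quick sketch of how to time the listing and correlate it with network traffic. The mount point and interface name are assumptions; the fallback scratch directory just makes the commands runnable as-is for a dry run:

```shell
# Assumed mount point; falls back to a scratch dir so this runs anywhere.
MOUNT="${MOUNT:-/tmp/glusterfs-demo}"
mkdir -p "$MOUNT/dir1/dir2"
touch "$MOUNT/dir1/a" "$MOUNT/dir1/dir2/b"

# The operation that is slow over GlusterFS:
time ls -R "$MOUNT" > /dev/null

# In a second terminal, watch NIC byte counters while it runs, e.g.:
#   watch -n1 'cat /sys/class/net/eth0/statistics/rx_bytes'

echo "files listed: $(find "$MOUNT" -type f | wc -l | tr -d ' ')"
```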
*Question 1:* Is this traffic normal for read-only operations?
*Question 2:* In the documentation I read that there are two protocols:
an ASCII protocol and a binary protocol. According to what I see with
tcpdump, my GlusterFS currently uses the ASCII protocol. I guess that's
not the best for performance! How can I force it to use the binary
protocol?
Thanks for any help.
Best regards,
-- 
Francis GASCHET / NUMLOG
http://www.numlog.fr
Tel.: +33 (0) 130 791 616
Fax.: +33 (0) 130 819 286
NUMLOG recrute sur LOLIX :
http://fr.lolix.org/
Keith Freedman
2008-Dec-16  09:11 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
I believe 1.3 uses the ASCII protocol and 1.4 uses the binary protocol;
I noticed a HUGE performance boost when I went up to 1.4.

Also, I'd remove your read-ahead, write-behind, and io-cache translators
and test again, then add them back one at a time, then two at a time,
then all three, and see what your results are. Personally, I don't trust
using them with AFR. It's probably safe, but I have fears that
write-behind will capture data which is written later, while it's busy
being changed on another server. Read-ahead might be OK with AFR, but
even so you risk pre-fetching data that changes a second later, so
you're out of date... If I'm wrong, I'm sure one of the devs will chime in.

Beyond that, I believe an ls will cause some network traffic: AFR has to
check with the other server to validate the serial number. Check your
logs and see if for some reason it's continually auto-healing the
directory entries. If it is, that accounts for the delays; if not, then
something else is in the way.

At 11:44 PM 12/15/2008, Francis GASCHET wrote:
> [original message quoted in full; trimmed]
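As a concrete version of that suggestion, a stripped-down client.vol for the test might look like this. It keeps only the AFR layer, reusing the volume names from the original post; the performance translators would then be re-added one by one on top of raid_main_afr:

```
volume main_loc
     type protocol/client
     option transport-type tcp/client
     option remote-host localhost
     option remote-subvolume main
end-volume

volume main_dist
     type protocol/client
     option transport-type tcp/client
     option remote-host other
     option remote-subvolume main
end-volume

# AFR only -- read-ahead, write-behind and io-cache removed for the test
volume raid_main_afr
     type cluster/afr
     subvolumes main_loc main_dist
     option read-subvolume main_loc
end-volume
```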
Daniel Maher
2008-Dec-16  09:25 UTC
[Gluster-users] Performance problem, Network traffic, ASCII/BINARY protocol
Keith Freedman wrote:
> Also, I'd remove your read-ahead, write-behind, and io-cache, and
> test again, then add back only one at a time, then 2 at a time, then
> all 3 and see what your results are.
> Personally, I dont trust using them with AFR.. it's probably safe,
> but I have fears that the write-behind will capture data which is
> written later while it's busy being changed on another server.

I can confirm that under 1.3, using the write-behind and read-ahead
caches in an AFR environment, especially if you're dealing with lots of
changes to many small files, can result in a total nightmare. We've seen
files appear to be empty, files with outdated content, and files with
the content of /other/ files in them. When we disabled the translators,
all of these problems ceased to occur.

Unfortunately, removing performance translators has the side effect of
(you guessed it) hurting performance. Even though we no longer have
occasionally inconsistent data, we also took a severe performance hit
for the rest of the Gluster-hosted files, which normally worked without
a hitch.

Mr. Avati has suggested that under 1.4 this would no longer be the case,
as writes are now atomic. We're actually going ahead with the upgrade to
1.4 today; hopefully we'll be able to re-enable the translators...

-- 
Daniel Maher <dma+gluster AT witbe DOT net>