Diego Remolina
2019-Jan-15 13:28 UTC
[Gluster-users] [External] Too good to be true speed improvements?
Hi Davide,

The options information is already provided in the prior e-mail; see the
termbin.com link for the options of the volume after the 4.1.6 upgrade.

The gluster options set on the volume are:
https://termbin.com/yxtd

This is the other piece:

# gluster v info export

Volume Name: export
Type: Replicate
Volume ID: b4353b3f-6ef6-4813-819a-8e85e5a95cff
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.7:/bricks/hdds/brick
Brick2: 10.0.1.6:/bricks/hdds/brick
Options Reconfigured:
performance.stat-prefetch: on
performance.cache-min-file-size: 0
network.inode-lru-limit: 65536
performance.cache-invalidation: on
features.cache-invalidation: on
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
transport.address-family: inet
server.allow-insecure: on
performance.cache-size: 10GB
cluster.server-quorum-type: server
nfs.disable: on
performance.io-thread-count: 64
performance.io-cache: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
server.event-threads: 5
client.event-threads: 5
performance.cache-max-file-size: 256MB
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
cluster.server-quorum-ratio: 51%

Now, I did create a backup of /var/lib/glusterd, so if you tell me how to
pull information from there to compare, I can do it.

I compared the file /var/lib/glusterd/vols/export/info and it is the same
in both, though the entries are in a different order.

Diego
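For reference, a minimal sketch of one way to capture the full effective
option set for such a comparison, along the lines of the "gluster volume
get" suggestion quoted further below (the output file name here is only an
illustration, not something used in the thread):

# gluster volume get export all > /root/export-options-after-upgrade.txt
# gluster volume info export   >> /root/export-options-after-upgrade.txt
  (the file name is a placeholder; any path works)

A matching dump captured while still on 3.10.12 would then let the two
option sets be diffed to spot defaults that changed between releases.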
On Tue, Jan 15, 2019 at 5:03 AM Davide Obbi <davide.obbi at booking.com> wrote:
>
> On Tue, Jan 15, 2019 at 2:18 AM Diego Remolina <dijuremo at gmail.com> wrote:
>
>> Dear all,
>>
>> I was running gluster 3.10.12 on a pair of servers and recently upgraded
>> to 4.1.6. There is a cron job that runs nightly on one machine, which
>> rsyncs the data on the servers over to another machine for backup
>> purposes. The rsync operation runs on one of the gluster servers, which
>> mounts the gluster volume via fuse on /export.
>>
>> When using 3.10.12, this process would start at 8:00 PM nightly and
>> usually end at around 4:30 AM when the servers had been freshly rebooted.
>> From that point, things would start taking a bit longer and stabilize,
>> ending at around 7-9 AM depending on actual file changes, and at some
>> point the servers would start eating up so much RAM (up to 30GB) that I
>> would have to reboot them to bring things back to normal, as the file
>> system would become extremely slow (perhaps the memory leak I have read
>> about was present in 3.10.x).
>>
>> After upgrading to 4.1.6 over the weekend, I was shocked to see the rsync
>> process finish in about 1 hour and 26 minutes, compared to 8 hours 30
>> minutes with the older version. This is a nice speed-up; however, I can
>> only ask myself what has changed so drastically that this process is now
>> so fast. Have there really been improvements in 4.1.6 that could speed
>> this up so dramatically? In both of my test cases there would not really
>> have been a lot to copy via rsync, given that the fresh reboots are done
>> on Saturday after the sync has finished from the day before.
>>
>> In general, the servers (which are accessed via Samba for Windows
>> clients) are much faster and more responsive since the update to 4.1.6.
>> Tonight I will have the first rsync run which will actually have to copy
>> the day's changes, and I will have another point of comparison.
>>
>> I am still using fuse mounts for Samba, due to prior problems with
>> vfs = gluster, which are currently present in Samba 4.8.3-4 and already
>> documented in bugs, for which patches exist, but no official updated
>> Samba packages have been released yet. Since I was going from 3.10.12 to
>> 4.1.6, I also did not want to change other things, to make sure I could
>> track any issues related only to the change in gluster versions and
>> eliminate other complexity.
>>
>> The file system currently has about 16TB of data in
>> 5142816 files and 696544 directories.
>>
>> I've just run the following code to count files and dirs, and it took
>> 67 mins 38.957 secs to complete on this gluster volume:
>> https://github.com/ChristopherSchultz/fast-file-count
>>
>> # time ( /root/sbin/dircnt /export )
>> /export contains 5142816 files and 696544 directories
>>
>> real    67m38.957s
>> user    0m6.225s
>> sys     0m48.939s
>>
>> The gluster options set on the volume are:
>> https://termbin.com/yxtd
>>
>> # gluster v status export
>> Status of volume: export
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.0.1.7:/bricks/hdds/brick           49157     0          Y       13986
>> Brick 10.0.1.6:/bricks/hdds/brick           49153     0          Y       9953
>> Self-heal Daemon on localhost               N/A       N/A        Y       21934
>> Self-heal Daemon on 10.0.1.5                N/A       N/A        Y       4598
>> Self-heal Daemon on 10.0.1.6                N/A       N/A        Y       14485
>>
>> Task Status of Volume export
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> Truth be told, there is a 3rd server here, but no bricks on it.
>>
>> Thoughts?
>>
>> Diego
>
> Hi Diego,
>
> Besides the actual improvements made in the code, I think new releases
> might set volume options by default that previously had different
> settings. It would have been interesting to diff "gluster volume get
> <volname> all" before and after the upgrade. Just out of curiosity, and
> because I am trying to figure out volume options for rsync-type workloads,
> can you share the command output anyway, along with
> "gluster volume info <volname>"?
>
> thanks
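For context, a minimal sketch of the kind of nightly cron + rsync job
described above; the 8:00 PM start time and the /export fuse mount are from
the thread, while the destination host, target path, and rsync flags are
placeholders only:

# /etc/cron.d/gluster-backup -- hypothetical example; "backuphost" and /backup/export are placeholders
0 20 * * * root rsync -aHAX --delete /export/ backuphost:/backup/export/

Because rsync stats every file and directory on the fuse mount to build its
file list, metadata-related options such as performance.md-cache-timeout and
the cache-invalidation settings listed above are typically what matter most
for this kind of workload.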
Davide Obbi
2019-Jan-15 19:03 UTC
[Gluster-users] [External] Too good to be true speed improvements?
I think you can find the volume options by doing a grep -R option
/var/lib/glusterd/vols/, and the .vol files show the options.
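Expanding on that suggestion, a sketch of how the grep output could be
compared against the backup of /var/lib/glusterd mentioned earlier in the
thread, using bash process substitution (the /root/glusterd-backup path is a
placeholder, since the actual backup location is not given):

# diff <(grep -R "option " /root/glusterd-backup/vols/export/ | sort) \
       <(grep -R "option " /var/lib/glusterd/vols/export/ | sort)
  (replace /root/glusterd-backup with wherever the pre-upgrade copy lives)

Any option lines that glusterd writes differently into the generated .vol
files after the upgrade should show up in that diff.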
On Tue, Jan 15, 2019 at 2:28 PM Diego Remolina <dijuremo at gmail.com> wrote:
> [...]

--
Davide Obbi
Senior System Administrator
Booking.com B.V.