Krist van Besien
2017-Aug-24 13:01 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
Hi,

This is gluster 3.8.4. Volume options are out of the box. Sharding is off
(and I don't think enabling it would matter). I haven't done much
performance tuning. For one thing, using a simple script that just creates
files I can easily flood the network, so I don't expect a performance issue.

The problem we see is that after a certain time the FUSE clients completely
stop accepting writes. Something is preventing the application from writing
after a while. We see this on the FUSE client, but not when we use NFS. So
the question I am interested in seeing an answer to is: in what way is NFS
different from FUSE that could cause this?

My suspicion is that it is locking related.

Krist

On 24 August 2017 at 14:36, Everton Brogliatto <brogliatto at gmail.com> wrote:
> Hi Krist,
>
> What are your volume options on that setup? Have you tried tuning it for
> the kind of workload and file sizes you have?
>
> I would definitely do some tests with features.shard=on/off first. If
> shard is on, try playing with features.shard-block-size. Do you have
> jumbo frames (MTU=9000) enabled across the switch and nodes?
> If you have concurrent clients writing/reading, it could be beneficial to
> increase the number of client and server threads as well; try setting
> higher values for client.event-threads and server.event-threads.
>
> Best regards,
> Everton Brogliatto
>
> On Thu, Aug 24, 2017 at 7:48 PM, Krist van Besien <krist at redhat.com> wrote:
>> Hi all,
>>
>> I usually advise clients to use the native client if at all possible, as
>> it is very robust. But I am running into problems here.
>>
>> In this case the Gluster system is used to store video streams.
>> Basically the setup is the following:
>> - A Gluster cluster of 3 nodes, with ample storage. They export several
>>   volumes.
>> - The network is 10 GbE, switched.
>> - A "recording server" which subscribes to multicast video streams and
>>   records them to disk. The recorder writes the streams in 10 s blocks,
>>   so when it is for example recording 50 streams it is creating 5 files
>>   a second, each about 5 MB. It uses a write-then-rename process.
>>
>> I simulated that with a small script that wrote 5 MB files and renamed
>> them as fast as it could, and could easily create around 100 files/s
>> (which about saturates the network). So I think the cluster is up to
>> the task.
>>
>> However, if we try the actual workload we run into trouble. Running the
>> recorder software we can gradually ramp up the number of streams it
>> records (and thus the number of files it creates), and at around 50
>> streams the recorder eventually stops writing files. According to the
>> programmers that wrote it, it appears that it can no longer get the
>> needed locks, and as a result just stops writing.
>>
>> We decided to test using the NFS client as well, and there the problem
>> does not exist. But again, I (and the customer) would prefer not to use
>> NFS, but use the native client instead.
>>
>> So if the problem is file locking, and the problem exists with the
>> native client and not with NFS, what could be the cause?
>>
>> In what way does locking differ between the two, NFS and FUSE, and how
>> can the programmers work around any issues the FUSE client might be
>> causing?
>>
>> This video stream software is a bespoke solution, developed in house,
>> and it is thus possible to change the way it handles files so it works
>> with the native client, but the programmers are looking at me for
>> guidance.
>>
>> Any suggestions?
>>
>> Krist
>>
>> --
>> Vriendelijke Groet | Best Regards | Freundliche Grüße | Cordialement
>> Krist van Besien
>> senior architect, RHCE, RHCSA Open Stack
>> Red Hat Switzerland S.A. <https://www.redhat.com>
>> krist at redhat.com  M: +41-79-5936260

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

--
Vriendelijke Groet | Best Regards | Freundliche Grüße | Cordialement
Krist van Besien
senior architect, RHCE, RHCSA Open Stack
Red Hat Switzerland S.A. <https://www.redhat.com>
krist at redhat.com  M: +41-79-5936260
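For reference, the tuning knobs Everton mentions can all be set per volume from the gluster CLI. A sketch only: `myvol` is a placeholder volume name, and the values shown are starting points that would need testing against this workload, not recommendations.

```shell
# Sharding: off by default; shard block size only matters when shard is on.
gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB

# More event threads on both sides for concurrent clients (defaults are low).
gluster volume set myvol client.event-threads 4
gluster volume set myvol server.event-threads 4

# Check what is currently in effect:
gluster volume get myvol all | grep -E 'shard|event-threads'
```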
Everton Brogliatto
2017-Aug-25 02:28 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
Hi Krist,

In my setup, if I mount the Gluster storage using NFS, I have an
improvement of 3x in write speed. I believe the answer to your questions
is here:
http://lists.gluster.org/pipermail/gluster-users/2015-July/022703.html
https://joejulian.name/blog/nfs-mount-for-glusterfs-gives-better-read-performance-for-small-files/

In my case, as I run VMs and have shard enabled, changing the shard block
size made a significant difference. Does changing the number of Gluster
threads make any difference in your setup, as you have multiple clients
accessing it simultaneously?

Best regards,
Everton Brogliatto

On Thu, Aug 24, 2017 at 9:01 PM, Krist van Besien <krist at redhat.com> wrote:
> Hi,
>
> This is gluster 3.8.4. Volume options are out of the box. Sharding is
> off (and I don't think enabling it would matter).
>
> I haven't done much performance tuning. For one thing, using a simple
> script that just creates files I can easily flood the network, so I
> don't expect a performance issue.
>
> The problem we see is that after a certain time the FUSE clients
> completely stop accepting writes. Something is preventing the
> application from writing after a while. We see this on the FUSE client,
> but not when we use NFS. So the question I am interested in seeing an
> answer to is: in what way is NFS different from FUSE that could cause
> this?
>
> My suspicion is that it is locking related.
>
> Krist
>
> [earlier quoted thread snipped]
Vijay Bellur
2017-Aug-25 02:47 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
On Thu, Aug 24, 2017 at 9:01 AM, Krist van Besien <krist at redhat.com> wrote:
> The problem we see is that after a certain time the fuse clients
> completely stop accepting writes. Something is preventing the
> application from writing after a while. We see this on the fuse client,
> but not when we use nfs.
>
> My suspicion is it is locking related.

Would it be possible to obtain a statedump of the native client when the
application becomes completely unresponsive? A statedump can help in
understanding operations within the gluster stack. The log file of the
native client might also offer some clues.

Regards,
Vijay
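For anyone following along, a client statedump can be triggered roughly as below. A sketch: `myvol` is a placeholder volume name, and the `pgrep` pattern assumes a single glusterfs mount process for that volume on the client.

```shell
# FUSE client: sending SIGUSR1 to the glusterfs mount process makes it
# write a statedump, by default under /var/run/gluster/.
kill -USR1 "$(pgrep -f 'glusterfs.*myvol')"

# Brick side: dump the state of all brick processes of the volume.
gluster volume statedump myvol

# Pending/granted POSIX locks show up as posixlk entries in the dumps:
grep -A2 'posixlk' /var/run/gluster/*.dump.*
```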
Krist van Besien
2017-Aug-25 07:22 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
On 25 August 2017 at 04:47, Vijay Bellur <vbellur at redhat.com> wrote:
> Would it be possible to obtain a statedump of the native client when
> the application becomes completely unresponsive? A statedump can help
> in understanding operations within the gluster stack. The log file of
> the native client might also offer some clues.

I've increased logging to debug on both client and bricks, but didn't see
anything that hinted at problems.

Maybe we have to go for Ganesha after all. But currently we are stuck at
the customer having trouble actually generating enough load to test the
server with...

When I try to simulate the workload with a script that writes and renames
files at the same rate the video recorders do, I can run it without any
issue, and can ramp up to the point where I am hitting the network
ceiling. So the Gluster cluster is up to the task. But the recorder
software itself is running into issues. Which makes me suspect that it may
have to do with the way some aspects of it are coded. And it is there that
I am looking for answers. Any hints, like "if you call fopen() you should
pass these flags and not those flags, or you get into trouble"...

Krist

--
Vriendelijke Groet | Best Regards | Freundliche Grüße | Cordialement
Krist van Besien
senior architect, RHCE, RHCSA Open Stack
Red Hat Switzerland S.A. <https://www.redhat.com>
krist at redhat.com  M: +41-79-5936260