Krist van Besien
2017-Aug-24 11:48 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
Hi all,

I usually advise clients to use the native client if at all possible, as it is very robust. But I am running into problems here.

In this case the gluster system is used to store video streams. Basically the setup is the following:
- A gluster cluster of 3 nodes, with ample storage. They export several volumes.
- The network is 10 Gb, switched.
- A "recording server" which subscribes to multicast video streams and records them to disk. The recorder writes the streams in 10-second blocks, so when it is, for example, recording 50 streams it is creating 5 files a second, each about 5 MB. It uses a write-then-rename process.

I simulated that with a small script that wrote 5 MB files and renamed them as fast as it could, and could easily create around 100 files/s (which roughly saturates the network). So I think the cluster is up to the task. (A sketch of such a test loop follows after this message.)

However, when we try the actual workload we run into trouble. Running the recorder software we can gradually ramp up the number of streams it records (and thus the number of files it creates), and at around 50 streams the recorder eventually stops writing files. According to the programmers who wrote it, it appears that it can no longer get the needed locks, and as a result it just stops writing.

We decided to test using the NFS client as well, and there the problem does not exist. But again, I (and the customer) would prefer not to use NFS, but to use the native client instead.

So if the problem is file locking, and the problem exists with the native client but not with NFS, what could be the cause?

In what way does locking differ between the two, NFS and FUSE, and how can the programmers work around any issues the FUSE client might be causing?

This video stream software is a bespoke solution, developed in house, so it is possible to change the way it handles files so that it works with the native client, but the programmers are looking at me for guidance.

Any suggestions?

Krist

--
Vriendelijke Groet | Best Regards | Freundliche Grüße | Cordialement
------------------------------
Krist van Besien
senior architect, RHCE, RHCSA Open Stack
Red Hat Switzerland S.A. <https://www.redhat.com>
krist at redhat.com  M: +41-79-5936260
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
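The write-then-rename test described above can be approximated with a short script. The following is only a minimal sketch of that idea; the mount point, file size, count, and naming scheme are assumptions, not the script actually used in the test.

    #!/usr/bin/env python3
    # Minimal sketch of a write-then-rename load test against a Gluster FUSE mount.
    # MOUNT, FILE_SIZE and COUNT are assumptions; adjust to the real environment.
    import os
    import time

    MOUNT = "/mnt/gluster/video"   # hypothetical FUSE mount point
    FILE_SIZE = 5 * 1024 * 1024    # ~5 MB per file, as in the recorder
    COUNT = 1000                   # number of files to create

    payload = os.urandom(FILE_SIZE)
    start = time.time()
    for i in range(COUNT):
        tmp = os.path.join(MOUNT, f"stream-{i:06d}.tmp")
        final = os.path.join(MOUNT, f"stream-{i:06d}.ts")
        with open(tmp, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())   # make sure the data is on disk before the rename
        os.rename(tmp, final)      # rename the finished file into place
    elapsed = time.time() - start
    print(f"{COUNT} files in {elapsed:.1f}s -> {COUNT / elapsed:.1f} files/s")

Note that this only measures create/write/rename throughput; it does not exercise the locking calls the recorder apparently makes, which is where the actual failure is reported.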
Everton Brogliatto
2017-Aug-24 12:36 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
Hi Krist,

What are your volume options on that setup? Have you tried tuning it for the kind of workload and file sizes you have?

I would definitely do some tests with features.shard on/off first. If shard is on, try playing with features.shard-block-size.

Do you have jumbo frames (MTU=9000) enabled across the switch and nodes?

If you have concurrent clients writing/reading, it could be beneficial to increase the number of client and server threads as well; try setting higher values for client.event-threads and server.event-threads. (Example settings follow after this message.)

Best regards,
Everton Brogliatto
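The volume options named above are set with the gluster CLI ("gluster volume set"). The sketch below only illustrates those options; the volume name and the chosen values are assumptions, not recommendations verified for this workload.

    #!/usr/bin/env python3
    # Apply the volume options discussed above via the gluster CLI.
    # The volume name and the values are assumptions; test and verify before
    # using them in production.
    import subprocess

    VOLUME = "video"   # hypothetical volume name

    options = {
        "features.shard": "on",             # or "off" -- test both, as suggested
        "features.shard-block-size": "64MB",
        "client.event-threads": "4",
        "server.event-threads": "4",
    }

    for key, value in options.items():
        subprocess.run(["gluster", "volume", "set", VOLUME, key, value], check=True)

    # Show the resulting configuration.
    subprocess.run(["gluster", "volume", "info", VOLUME], check=True)

Jumbo frames (MTU=9000) are a NIC/switch setting rather than a volume option, so they are not part of this sketch.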
Krist van Besien
2017-Aug-24 13:01 UTC
[Gluster-users] NFS versus Fuse file locking problem (NFS works, fuse doesn't...)
Hi,

This is gluster 3.8.4. Volume options are out of the box. Sharding is off (and I don't think enabling it would matter).

I haven't done much performance tuning. For one thing, using a simple script that just creates files I can easily flood the network, so I don't expect a performance issue.

The problem we see is that after a certain time the FUSE clients completely stop accepting writes. Something is preventing the application from writing after a while. We see this on the FUSE client, but not when we use NFS. So the question I am interested in seeing an answer to is: in what way is NFS different from FUSE that could cause this? My suspicion is that it is locking related. (A small locking test sketch follows below.)

Krist
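One way to check whether POSIX locks are the bottleneck on the FUSE mount, independently of the recorder, is to stress byte-range locks directly and compare against an NFS mount of the same volume. The sketch below is only an illustration of that idea; the mount path and cycle count are assumptions, and it does not reproduce the recorder's actual I/O pattern.

    #!/usr/bin/env python3
    # Crude check of POSIX (fcntl) byte-range locking on a Gluster FUSE mount.
    # MOUNT and COUNT are assumptions; run it on the same mount the recorder uses.
    import fcntl
    import os
    import time

    MOUNT = "/mnt/gluster/video"   # hypothetical FUSE mount point
    COUNT = 500                    # number of lock/unlock cycles

    path = os.path.join(MOUNT, "lock-test.dat")
    with open(path, "wb") as f:
        f.write(b"\0" * 4096)

    start = time.time()
    with open(path, "r+b") as f:
        for i in range(COUNT):
            fcntl.lockf(f, fcntl.LOCK_EX)   # blocks if the lock cannot be granted
            f.seek(0)
            f.write(b"x")
            fcntl.lockf(f, fcntl.LOCK_UN)
    elapsed = time.time() - start
    print(f"{COUNT} lock/unlock cycles in {elapsed:.2f}s")

If the cycles slow down dramatically or hang on the FUSE mount but not on an NFS mount of the same volume, that would point at lock handling rather than raw write throughput.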