Karli Sjöberg
2018-Aug-14 13:10 UTC
[Gluster-users] ganesha.nfsd process dies when copying files
On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > Hi Karli,
> > >
> > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > >
> > > I just installed them last weekend ... they are working very well
> > > :)
> >
> > Okay, awesome!
> >
> > Is there any documentation on how to do that?
>
> https://github.com/gluster/storhaug/wiki

Thanks Kaleb and Edy!

I have now redone the cluster using the latest and greatest, following
the above guide, and repeated the same test I was doing before (the
rsync while loop) with success. I let (forgot) it run for about a day
and it was still chugging along nicely when I aborted it, so success
there!

The next test, the catastrophic failure test where one of the servers
dies, is the one I'm having a more difficult time with.

1) I start by mounting the share over NFS 4.1 and then write an 8 GiB
random data file with 'dd'. When I "hard-cut" the power to the server
I'm writing to, the transfer just stops indefinitely, until the server
comes back again. Is that supposed to happen? Like this:

# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s

(here I cut the power and let it be for almost two hours before
turning it on again)

dd: error writing '/mnt/test.bin': Remote I/O error
2325+0 records in
2324+0 records out
2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
# umount /mnt

Here the unmount command hung and I had to hard-reset the client.

2) Another question I have is why some files "change" as you copy them
out to the Gluster storage. Is that the way it should be? This time, I
deleted everything in the destination directory to start over:

# mount -o vers=4.1 hv03v.localdomain:/data /mnt/
# rm -f /mnt/test.bin
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
8192+0 records in
8192+0 records out
8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
# md5sum /var/tmp/test.bin
073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
# md5sum /mnt/test.bin
634187d367f856f3f5fb31846f796397  /mnt/test.bin
# umount /mnt

Thanks in advance!

/K
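P.S. To narrow down question 2, a minimal sketch of how the mismatch
could be localized (assuming GNU coreutils; same paths as above):

# cmp /var/tmp/test.bin /mnt/test.bin
# split --bytes=1M --filter='md5sum' /var/tmp/test.bin > /tmp/local.md5
# split --bytes=1M --filter='md5sum' /mnt/test.bin > /tmp/remote.md5
# diff /tmp/local.md5 /tmp/remote.md5 | head

'cmp' prints the offset of the first differing byte, and diffing the
per-MiB checksum lists shows whether one contiguous region or many
scattered chunks differ.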
Alex Chekholko
2018-Aug-14 16:25 UTC
[Gluster-users] ganesha.nfsd process dies when copying files
Hi Karli,

I'm not 100% sure this is related, but when I set up my ZFS NFS HA per
https://github.com/ewwhite/zfs-ha/wiki I was not able to get the
failover to work with NFS v4, only with NFS v3.

From the client point of view, it really looked like with NFS v4 there
is an open file handle that just goes stale and hangs, or something
like that, whereas with NFSv3 the client retries, recovers, and
continues. I did not investigate further; I just use v3. I think it
has something to do with NFSv4 being "stateful" and NFSv3 being
"stateless".

Can you re-run your test but using NFSv3 on the client mount? Or do
you need to use v4.x?

Regards,
Alex

On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg <karli at inparadise.se>
wrote:

> [...]
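For concreteness, a minimal sketch of the v3 re-run Alex suggests
(server name and paths taken from Karli's test; 'soft,timeo=600,retrans=5'
is an optional set of standard NFS mount options that makes the client
return an error instead of hanging while the server is down):

# umount /mnt
# mount -t nfs -o vers=3,soft,timeo=600,retrans=5 hv03v.localdomain:/data /mnt/
# dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
# md5sum /var/tmp/test.bin /mnt/test.bin

Note that 'soft' trades the hang for possible silent data loss on
timeout; the default 'hard' keeps retrying forever, which is what makes
the transfer stall until the server returns.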