thr3ads.net - Gluster users - [Gluster-users] Gluster 3.7.13 with nfs-ganesha 2.3.0.1 [Aug 2016]


ML Wong

2016-Aug-02 18:18 UTC

[Gluster-users] Gluster 3.7.13 with nfs-ganesha 2.3.0.1

When I use the packages from "centos-gluster37" to set up Gluster with a
ZFS backend, ganesha-nfsd aborts with an ABRT signal whenever I copy, or
simply rsync, a directory to the share exported from nfs-ganesha.

Environment:
CentOS 7 - kernel 3.10.0-327.22.2.el7.x86_64
ZFS Version: 0.6.5.7 Release : 1.el7.centos
Gluster 3.7.13.1.el7 from centos-gluster37
nfs-ganesha 2.3.0.1.el7 from centos-gluster37

This is the only line I got from strace when attached to the PID of ganesha-nfsd:

futex(0x7f623ffff9d0, FUTEX_WAIT, 38303, NULL <detached ...>

In ganesha-gfapi.log, whenever I try to copy files, the following entries
appear repeatedly, complaining about split-brain and stale-file-handle
issues in the Gluster volume:

[2016-08-01 23:06:09.423901] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-nfsvol1-replicate-0: Unreadable
subvolume -1 found with event generation 2 for gfid
3f713211-7573-45b1-aed8-503c8e17714b. (Possible split-brain)

[2016-08-01 23:06:09.425664] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task] 0-nfsvol1-dht:
<gfid:3f713211-7573-45b1-aed8-503c8e17714b>: failed to lookup the file on
nfsvol1-dht [Stale file handle]
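
To see whether these warnings all point at the same file, the entries can be grouped by gfid and MSGID. The following is only a minimal sketch in Python: the regular expressions are inferred from the two entries quoted above rather than from any Gluster log-format specification, and the names `parse_entries` and `count_by_gfid` are made up for illustration. It also assumes each log entry sits on a single line, as it would in the log file itself (the entries above are wrapped by the mail client).

```python
import re
from collections import Counter

# The two entries quoted above, rejoined onto single lines.
SAMPLE_LOG = """\
[2016-08-01 23:06:09.423901] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-nfsvol1-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 3f713211-7573-45b1-aed8-503c8e17714b. (Possible split-brain)
[2016-08-01 23:06:09.425664] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 0-nfsvol1-dht: <gfid:3f713211-7573-45b1-aed8-503c8e17714b>: failed to lookup the file on nfsvol1-dht [Stale file handle]
"""

# Assumed layout: [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
ENTRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<msg>.*)$"
)
# A gfid is printed as a lowercase UUID.
GFID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def parse_entries(text):
    """Return one dict per log line that matches the assumed layout."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like a log entry
        entry = match.groupdict()
        gfid = GFID_RE.search(entry["msg"])
        entry["gfid"] = gfid.group(0) if gfid else None
        entries.append(entry)
    return entries


def count_by_gfid(entries):
    """Count entries per (gfid, MSGID) pair."""
    return Counter((e["gfid"], e["msgid"]) for e in entries)


if __name__ == "__main__":
    for (gfid, msgid), n in count_by_gfid(parse_entries(SAMPLE_LOG)).items():
        print(gfid, msgid, n)
```

Run against the full ganesha-gfapi.log instead of `SAMPLE_LOG`, this would show whether the split-brain and stale-file-handle messages are concentrated on a few gfids or spread across every file being copied.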

I tried both Gluster 3.7.13 and 3.7.12, and both versions give me the same
problem. Only after I downgraded Gluster to 3.7.11 did nfs-ganesha play
nicely with Gluster. I wondered whether this was related to my ZFS backend
setup, so I quickly set up 3 nodes on my laptop running Gluster 3.7.13 and
nfs-ganesha 2.3.0 with XFS as the backend, and I got the same result:
rsync/copy of files to the NFS shares exported from Gluster+Ganesha aborted
after a few files were copied. For your reference, pacemaker and corosync
both keep running in the background when this happens.

I am wondering whether something introduced in 3.7.12 breaks the interface
between nfs-ganesha and Gluster. Any pointers would be appreciated.

Soumya Koduri

2016-Aug-02 18:31 UTC



Hi,

Does your test involve multiple ganesha servers or removing files?

http://review.gluster.org/#/c/14522/ (merged in 3.7.12)  caused a 
regression in upcall processing of nfs-ganesha. It is being fixed as 
part of http://review.gluster.org/14701 .

Could you please turn off upcalls using the command below and retry the tests?

cmd: gluster v set <volname> features.cache-invalidation off
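
For anyone who wants to script this, here is a minimal sketch in Python of the same step. The helper names are hypothetical, and the volume name `nfsvol1` is only inferred from the log prefix `0-nfsvol1-replicate-0` in the original report. It assumes `gluster volume get` is available on the installed release for reading the option back, which should be checked before relying on it. By default it only prints the commands; nothing runs unless `dry_run=False`.

```python
import subprocess


def upcall_toggle_commands(volname, enable=False):
    """Build the gluster CLI calls that set and then read back the option."""
    value = "on" if enable else "off"
    return [
        # Long form of `gluster v set <volname> features.cache-invalidation off`.
        ["gluster", "volume", "set", volname, "features.cache-invalidation", value],
        # Read the option back (assumes `gluster volume get` exists on this release).
        ["gluster", "volume", "get", volname, "features.cache-invalidation"],
    ]


def apply_upcall_setting(volname, enable=False, dry_run=True):
    """Print the commands, or run them on a Gluster node when dry_run is False."""
    commands = upcall_toggle_commands(volname, enable)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # "nfsvol1" is an assumption taken from the log prefix; substitute your volume.
    apply_upcall_setting("nfsvol1")
```

After retesting, calling `apply_upcall_setting("nfsvol1", enable=True, dry_run=False)` on a Gluster node would switch the option back on.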

Thanks,
Soumya

