thr3ads.net - Gluster users - [Gluster-users] nfs-ganesha locking problems [Oct 2017]

If this information is useful, please help other people find it:
Share via:

Bernhard Dübi

2017-Sep-29 15:39 UTC

[Gluster-users] nfs-ganesha locking problems

Hi,

I have a problem with nfs-ganesha serving gluster volumes

I can read and write files but then one of the DBAs tried to dump an
Oracle DB onto the NFS share and got the following errors:


Export: Release 11.2.0.4.0 - Production on Wed Sep 27 23:27:48 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.??All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release
11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file
"/u00/app/oracle/DB_BACKUPS/FPESSP11/riskdw_prod_tabs_28092017_01.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


the file exists and is accessible.


Details:
There are 2 gluster clusters involved
the first cluster hosts a number of "replica 3 arbiter 1" volumes
the second cluster only hosts the cluster.enable-shared-storage volume
across 3 nodes. it also runs nfs-ganesha in cluster configuration
(pacemaker, corosync). nfs-ganesha serves the volumes from the first
cluster.

Any idea what's wrong?

Kind Regards
Bernhard


CLUSTER 1 info
=============
root at chglbcvtprd04:/etc# cat os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
root at chglbcvtprd04:/etc# cat lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
root at chglbcvtprd04:/etc# dpkg -l | grep gluster | sort
ii  glusterfs-client                    3.8.15-ubuntu1~xenial1
              amd64        clustered file-system (client package)
ii  glusterfs-common                    3.8.15-ubuntu1~xenial1
              amd64        GlusterFS common libraries and translator
modules
ii  glusterfs-server                    3.8.15-ubuntu1~xenial1
              amd64        clustered file-system (server package)

root at chglbcvtprd04:~# gluster volume status ora_dump
Status of volume: ora_dump
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick chastcvtprd04:/data/glusterfs/ora_dum
p/2I-1-39/brick                             49772     0          Y       11048
Brick chglbcvtprd04:/data/glusterfs/ora_dum
p/2I-1-39/brick                             50108     0          Y       9990
Brick chealglaprd01:/data/glusterfs/arbiter
/vol01/ora_dump.2I-1-39                     49200     0          Y       3114
Brick chastcvtprd04:/data/glusterfs/ora_dum
p/1I-1-18/brick                             49773     0          Y       11085
Brick chglbcvtprd04:/data/glusterfs/ora_dum
p/1I-1-18/brick                             50109     0          Y       10000
Brick chealglaprd01:/data/glusterfs/arbiter
/vol02/ora_dump.1I-1-18                     49201     0          Y       3080
Brick chastcvtprd04:/data/glusterfs/ora_dum
p/2I-1-48/brick                             49774     0          Y       11091
Brick chglbcvtprd04:/data/glusterfs/ora_dum
p/2I-1-48/brick                             50110     0          Y       10007
Brick chealglaprd01:/data/glusterfs/arbiter
/vol03/ora_dump.2I-1-48                     49202     0          Y       3070
Brick chastcvtprd04:/data/glusterfs/ora_dum
p/1I-1-25/brick                             49775     0          Y       11152
Brick chglbcvtprd04:/data/glusterfs/ora_dum
p/1I-1-25/brick                             50111     0          Y       10012
Brick chealglaprd01:/data/glusterfs/arbiter
/vol04/ora_dump.1I-1-25                     49203     0          Y       3090
Self-heal Daemon on localhost               N/A       N/A        Y       27438
Self-heal Daemon on chealglaprd01           N/A       N/A        Y       32209
Self-heal Daemon on chastcvtprd04.fpprod.co
rp                                          N/A       N/A        Y       27378

root at chglbcvtprd04:~# gluster volume info ora_dump

Volume Name: ora_dump
Type: Distributed-Replicate
Volume ID: b26e649d-d1fe-4ebc-aa03-b196c8925466
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: chastcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick2: chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-39/brick
Brick3: chealglaprd01:/data/glusterfs/arbiter/vol01/ora_dump.2I-1-39 (arbiter)
Brick4: chastcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick5: chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-18/brick
Brick6: chealglaprd01:/data/glusterfs/arbiter/vol02/ora_dump.1I-1-18 (arbiter)
Brick7: chastcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick
Brick8: chglbcvtprd04:/data/glusterfs/ora_dump/2I-1-48/brick
Brick9: chealglaprd01:/data/glusterfs/arbiter/vol03/ora_dump.2I-1-48 (arbiter)
Brick10: chastcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick
Brick11: chglbcvtprd04:/data/glusterfs/ora_dump/1I-1-25/brick
Brick12: chealglaprd01:/data/glusterfs/arbiter/vol04/ora_dump.1I-1-25 (arbiter)
Options Reconfigured:
auth.allow:
127.0.0.1,10.30.28.43,10.30.28.44,10.8.13.132,10.30.28.36,10.30.28.37,10.30.201.30,10.30.201.31,10.30.201.32,10.30.201.39,10.30.201.43,10.30.201.44
nfs.rpc-auth-allow: all
performance.readdir-ahead: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
features.bitrot: off
features.scrub: Inactive
nfs.disable: on
features.cache-invalidation: on

root at chglbcvtprd04:~# gluster volume get ora_dump all
Option                                  Value
------                                  -----
cluster.lookup-unhashed                 on
cluster.lookup-optimize                 off
cluster.min-free-disk                   10%
cluster.min-free-inodes                 5%
cluster.rebalance-stats                 off
cluster.subvols-per-directory           (null)
cluster.readdir-optimize                off
cluster.rsync-hash-regex                (null)
cluster.extra-hash-regex                (null)
cluster.dht-xattr-name                  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid    off
cluster.rebal-throttle                  normal
cluster.lock-migration                  off
cluster.local-volume-name               (null)
cluster.weighted-rebalance              on
cluster.switch-pattern                  (null)
cluster.entry-change-log                on
cluster.read-subvolume                  (null)
cluster.read-subvolume-index            -1
cluster.read-hash-mode                  1
cluster.background-self-heal-count      8
cluster.metadata-self-heal              on
cluster.data-self-heal                  on
cluster.entry-self-heal                 on
cluster.self-heal-daemon                on
cluster.heal-timeout                    600
cluster.self-heal-window-size           1
cluster.data-change-log                 on
cluster.metadata-change-log             on
cluster.data-self-heal-algorithm        (null)
cluster.eager-lock                      on
disperse.eager-lock                     on
cluster.quorum-type                     none
cluster.quorum-count                    (null)
cluster.choose-local                    true
cluster.self-heal-readdir-size          1KB
cluster.post-op-delay-secs              1
cluster.ensure-durability               on
cluster.consistent-metadata             no
cluster.heal-wait-queue-length          128
cluster.favorite-child-policy           none
cluster.stripe-block-size               128KB
cluster.stripe-coalesce                 true
diagnostics.latency-measurement         on
diagnostics.dump-fd-stats               off
diagnostics.count-fop-hits              on
diagnostics.brick-log-level             INFO
diagnostics.client-log-level            INFO
diagnostics.brick-sys-log-level         CRITICAL
diagnostics.client-sys-log-level        CRITICAL
diagnostics.brick-logger                (null)
diagnostics.client-logger               (null)
diagnostics.brick-log-format            (null)
diagnostics.client-log-format           (null)
diagnostics.brick-log-buf-size          5
diagnostics.client-log-buf-size         5
diagnostics.brick-log-flush-timeout     120
diagnostics.client-log-flush-timeout    120
diagnostics.stats-dump-interval         0
diagnostics.fop-sample-interval         0
diagnostics.fop-sample-buf-size         65535
diagnostics.stats-dnscache-ttl-sec      86400
performance.cache-max-file-size         0
performance.cache-min-file-size         0
performance.cache-refresh-timeout       1
performance.cache-priority
performance.cache-size                  32MB
performance.io-thread-count             16
performance.high-prio-threads           16
performance.normal-prio-threads         16
performance.low-prio-threads            16
performance.least-prio-threads          1
performance.enable-least-priority       on
performance.least-rate-limit            0
performance.cache-size                  128MB
performance.flush-behind                on
performance.nfs.flush-behind            on
performance.write-behind-window-size    1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct             off
performance.nfs.strict-o-direct         off
performance.strict-write-ordering       off
performance.nfs.strict-write-ordering   off
performance.lazy-open                   yes
performance.read-after-open             no
performance.read-ahead-page-count       4
performance.md-cache-timeout            1
performance.cache-swift-metadata        true
features.encryption                     off
encryption.master-key                   (null)
encryption.data-key-size                256
encryption.block-size                   4096
network.frame-timeout                   1800
network.ping-timeout                    42
network.tcp-window-size                 (null)
features.lock-heal                      off
features.grace-timeout                  10
network.remote-dio                      disable
client.event-threads                    2
network.ping-timeout                    42
network.tcp-window-size                 (null)
network.inode-lru-limit                 16384
auth.allow
127.0.0.1,10.30.28.43,10.30.28.44,10.8.13.132,10.30.28.36,10.30.28.37,10.30.201.30,10.30.201.31,10.30.201.32,10.30.201.39,10.30.201.43,10.30.201.44
auth.reject                             (null)
transport.keepalive                     (null)
server.allow-insecure                   (null)
server.root-squash                      off
server.anonuid                          65534
server.anongid                          65534
server.statedump-path                   /var/run/gluster
server.outstanding-rpc-limit            64
features.lock-heal                      off
features.grace-timeout                  10
server.ssl                              (null)
auth.ssl-allow                          *
server.manage-gids                      off
server.dynamic-auth                     on
client.send-gids                        on
server.gid-timeout                      300
server.own-thread                       (null)
server.event-threads                    2
ssl.own-cert                            (null)
ssl.private-key                         (null)
ssl.ca-list                             (null)
ssl.crl-path                            (null)
ssl.certificate-depth                   (null)
ssl.cipher-list                         (null)
ssl.dh-param                            (null)
ssl.ec-curve                            (null)
performance.write-behind                on
performance.read-ahead                  on
performance.readdir-ahead               on
performance.io-cache                    on
performance.quick-read                  on
performance.open-behind                 on
performance.stat-prefetch               on
performance.client-io-threads           off
performance.nfs.write-behind            on
performance.nfs.read-ahead              off
performance.nfs.io-cache                off
performance.nfs.quick-read              off
performance.nfs.stat-prefetch           off
performance.nfs.io-threads              off
performance.force-readdirp              true
features.uss                            off
features.snapshot-directory             .snaps
features.show-snapshot-directory        off
network.compression                     off
network.compression.window-size         -15
network.compression.mem-level           8
network.compression.min-size            0
network.compression.compression-level   -1
network.compression.debug               false
features.limit-usage                    (null)
features.quota-timeout                  0
features.default-soft-limit             80%
features.soft-timeout                   60
features.hard-timeout                   5
features.alert-time                     86400
features.quota-deem-statfs              off
geo-replication.indexing                off
geo-replication.indexing                off
geo-replication.ignore-pid-check        off
geo-replication.ignore-pid-check        off
features.quota                          off
features.inode-quota                    off
features.bitrot                         off
debug.trace                             off
debug.log-history                       no
debug.log-file                          no
debug.exclude-ops                       (null)
debug.include-ops                       (null)
debug.error-gen                         off
debug.error-failure                     (null)
debug.error-number                      (null)
debug.random-failure                    off
debug.error-fops                        (null)
nfs.enable-ino32                        no
nfs.mem-factor                          15
nfs.export-dirs                         on
nfs.export-volumes                      on
nfs.addr-namelookup                     off
nfs.dynamic-volumes                     off
nfs.register-with-portmap               on
nfs.outstanding-rpc-limit               16
nfs.port                                2049
nfs.rpc-auth-unix                       on
nfs.rpc-auth-null                       on
nfs.rpc-auth-allow                      all
nfs.rpc-auth-reject                     none
nfs.ports-insecure                      off
nfs.trusted-sync                        off
nfs.trusted-write                       off
nfs.volume-access                       read-write
nfs.export-dir
nfs.disable                             on
nfs.nlm                                 on
nfs.acl                                 on
nfs.mount-udp                           off
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                           /sbin/rpc.statd
nfs.server-aux-gids                     off
nfs.drc                                 off
nfs.drc-size                            0x20000
nfs.read-size                           (1 * 1048576ULL)
nfs.write-size                          (1 * 1048576ULL)
nfs.readdir-size                        (1 * 1048576ULL)
nfs.rdirplus                            on
nfs.exports-auth-enable                 (null)
nfs.auth-refresh-interval-sec           (null)
nfs.auth-cache-ttl-sec                  (null)
features.read-only                      off
features.worm                           off
features.worm-file-level                off
features.default-retention-period       120
features.retention-mode                 relax
features.auto-commit-period             180
storage.linux-aio                       off
storage.batch-fsync-mode                reverse-fsync
storage.batch-fsync-delay-usec          0
storage.owner-uid                       -1
storage.owner-gid                       -1
storage.node-uuid-pathinfo              off
storage.health-check-interval           30
storage.build-pgfid                     off
storage.bd-aio                          off
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
changelog.changelog                     off
changelog.changelog-dir                 (null)
changelog.encoding                      ascii
changelog.rollover-time                 15
changelog.fsync-interval                5
changelog.changelog-barrier-timeout     120
changelog.capture-del-path              off
features.barrier                        disable
features.barrier-timeout                120
features.trash                          off
features.trash-dir                      .trashcan
features.trash-eliminate-path           (null)
features.trash-max-filesize             5MB
features.trash-internal-op              off
cluster.enable-shared-storage           disable
cluster.write-freq-threshold            0
cluster.read-freq-threshold             0
cluster.tier-pause                      off
cluster.tier-promote-frequency          120
cluster.tier-demote-frequency           3600
cluster.watermark-hi                    90
cluster.watermark-low                   75
cluster.tier-mode                       cache
cluster.tier-max-promote-file-size      0
cluster.tier-max-mb                     4000
cluster.tier-max-files                  10000
features.ctr-enabled                    off
features.record-counters                off
features.ctr-record-metadata-heat       off
features.ctr_link_consistency           off
features.ctr_lookupheal_link_timeout    300
features.ctr_lookupheal_inode_timeout   300
features.ctr-sql-db-cachesize           1000
features.ctr-sql-db-wal-autocheckpoint  1000
locks.trace                             off
locks.mandatory-locking                 off
cluster.disperse-self-heal-daemon       enable
cluster.quorum-reads                    no
client.bind-insecure                    (null)
ganesha.enable                          off
features.shard                          off
features.shard-block-size               4MB
features.scrub-throttle                 lazy
features.scrub-freq                     biweekly
features.scrub                          Inactive
features.expiry-time                    120
features.cache-invalidation             on
features.cache-invalidation-timeout     60
features.leases                         off
features.lease-lock-recall-timeout      60
disperse.background-heals               8
disperse.heal-wait-qlength              128
cluster.heal-timeout                    600
dht.force-readdirp                      on
disperse.read-policy                    round-robin
cluster.shd-max-threads                 1
cluster.shd-wait-qlength                1024
cluster.locking-scheme                  full
cluster.granular-entry-heal             no




CLUSTER 2 info
=============
[root at chvirnfsprd10 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root at chvirnfsprd10 etc]# cat centos-release
CentOS Linux release 7.3.1611 (Core)
[root at chvirnfsprd10 ~]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
glusterfs-api-3.8.15-2.el7.x86_64
glusterfs-cli-3.8.15-2.el7.x86_64
glusterfs-client-xlators-3.8.15-2.el7.x86_64
glusterfs-fuse-3.8.15-2.el7.x86_64
glusterfs-ganesha-3.8.15-2.el7.x86_64
glusterfs-libs-3.8.15-2.el7.x86_64
glusterfs-resource-agents-3.8.15-2.el7.noarch
glusterfs-server-3.8.15-2.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root at chvirnfsprd10 sssd]# rpm -qa | grep ganesha | sort
glusterfs-ganesha-3.8.15-2.el7.x86_64
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64

[root at chvirnfsprd10 ~]# gluster volume status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick chvirnfsprd11:/var/lib/glusterd/ss_br
ick                                         49155     0          Y       1054
Brick chvirnfsprd12:/var/lib/glusterd/ss_br
ick                                         49155     0          Y       1434
Brick chvirnfsprd10.fpprod.corp:/var/lib/gl
usterd/ss_brick                             49155     0          Y       1474
Self-heal Daemon on localhost               N/A       N/A        Y       12196
Self-heal Daemon on chvirnfsprd11           N/A       N/A        Y       32110
Self-heal Daemon on chvirnfsprd12           N/A       N/A        Y       2877

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

[root at chvirnfsprd10 ~]# cat /etc/ganesha/ganesha.conf
NFS_Core_Param {
        #Use supplied name other than IP In NSM operations
        NSM_Use_Caller_Name = true;
        #Copy lock states into "/var/lib/nfs/ganesha" dir
       Clustered = true;
        #Use a non-privileged port for RQuota
        Rquota_Port = 875;
}

%include /etc/ganesha/exports/ora_dump.conf
%include /etc/ganesha/exports/chzrhcvtprd04.conf

[root at chvirnfsprd10 ~]# cat /etc/ganesha/exports/ora_dump.conf
EXPORT
{
        # Export Id (mandatory, each EXPORT must have a unique Export_Id)
        Export_Id = 77;

        # Exported path (mandatory)
        Path = /ora_dump;

        # Pseudo Path (required for NFS v4)
        Pseudo = /ora_dump;

        # Exporting FSAL
        FSAL {
                Name = GLUSTER;
                Hostname = 10.30.28.43;
                Volume = ora_dump;
        }

        CLIENT {
                # Oracle Servers
                Clients
10.30.29.125,10.30.28.25,10.30.28.64,10.30.29.123,10.30.28.21,10.30.28.81,10.30.29.124,10.30.28.82,10.30.29.111;
                Access_Type = RW;
        }
}

[root at chvirnfsprd10 ~]# cat /etc/ganesha/ganesha-ha.conf
HA_NAME="ltq-prd-nfs"
HA_VOL_SERVER="chvirnfsprd10"
HA_CLUSTER_NODES="chvirnfsprd10,chvirnfsprd11,chvirnfsprd12"
VIP_chvirnfsprd10="10.30.201.39"
VIP_chvirnfsprd11="10.30.201.43"
VIP_chvirnfsprd12="10.30.201.44"

[root at chvirnfsprd10 ~]# pcs status
Cluster name: ltq-prd-nfs
Stack: corosync
Current DC: chvirnfsprd11 (version 1.1.15-11.el7_3.5-e174ec8) -
partition with quorum
Last updated: Fri Sep 29 15:01:26 2017          Last change: Mon Sep
18 11:40:45 2017 by root via crm_attribute on chvirnfsprd12

3 nodes and 12 resources configured

Online: [ chvirnfsprd10 chvirnfsprd11 chvirnfsprd12 ]

Full list of resources:

Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ chvirnfsprd10 chvirnfsprd11 chvirnfsprd12 ]
Clone Set: nfs-mon-clone [nfs-mon]
    Started: [ chvirnfsprd10 chvirnfsprd11 chvirnfsprd12 ]
Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ chvirnfsprd10 chvirnfsprd11 chvirnfsprd12 ]
chvirnfsprd10-cluster_ip-1     (ocf::heartbeat:IPaddr):        Started
chvirnfsprd10
chvirnfsprd11-cluster_ip-1     (ocf::heartbeat:IPaddr):        Started
chvirnfsprd11
chvirnfsprd12-cluster_ip-1     (ocf::heartbeat:IPaddr):        Started
chvirnfsprd12

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Soumya Koduri

2017-Oct-02 13:44 UTC

head link

[Gluster-users] nfs-ganesha locking problems

Hi

On 09/29/2017 09:09 PM, Bernhard D?bi wrote:> Hi,
> 
> I have a problem with nfs-ganesha serving gluster volumes
> 
> I can read and write files but then one of the DBAs tried to dump an
> Oracle DB onto the NFS share and got the following errors:
> 
> 
> Export: Release 11.2.0.4.0 - Production on Wed Sep 27 23:27:48 2017
> 
> Copyright (c) 1982, 2011, Oracle and/or its affiliates.??All rights
reserved.
> 
> Connected to: Oracle Database 11g Enterprise Edition Release
> 11.2.0.4.0 - 64bit Production
> With the Partitioning, Automatic Storage Management, OLAP, Data Mining
> and Real Application Testing options
> ORA-39001: invalid argument value
> ORA-39000: bad dump file specification
> ORA-31641: unable to create dump file
>
"/u00/app/oracle/DB_BACKUPS/FPESSP11/riskdw_prod_tabs_28092017_01.dmp"
> ORA-27086: unable to lock file - already in use
> Linux-x86_64 Error: 37: No locks available
> Additional information: 10
> ORA-27037: unable to obtain file status
> Linux-x86_64 Error: 2: No such file or directory
> Additional information: 3
> 
Do you see any errors/warnings in any of the logs - ganesha.log, 
ganesha-gfapi.log and brick logs? Also if the issue is reproducible, 
please collect tcpdump for that duration on the node where nfs-ganesha 
server is running.

Thanks,
Soumya

Bernhard Dübi

2017-Oct-02 20:45 UTC

head link

[Gluster-users] nfs-ganesha locking problems

Hi Soumya,

what I can say so far:

it is working on a standalone system but not on the clustered system

from reading the ganesha wiki I have the impression that it is
possible to change the log level without restarting ganesha. I was
playing with dbus-send but so far was unsuccessful. if you can help me
with that, this would be great.

here some details about the tested machines. the nfs client was always the same


THIS SYSTEM IS WORKING


root at chvirnfstst01 ~]# uname -a
Linux chvirnfstst01 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12
22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at chvirnfstst01 ~]# cd /etc/
[root at chvirnfstst01 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Aug 30 17:53 centos-release
-rw-r--r--. 1 root root  51 Aug 30 17:53 centos-release-upstream
-rw-r--r--. 1 root root 393 Aug 30 17:53 os-release
drwxr-xr-x. 2 root root  78 Oct  1 15:52 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Oct  1 15:51 system-release -> centos-release
-rw-r--r--. 1 root root  23 Aug 30 17:53 system-release-cpe
[root at chvirnfstst01 etc]# cat centos-release
CentOS Linux release 7.4.1708 (Core)
[root at chvirnfstst01 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root at chvirnfstst01 etc]#
[root at chvirnfstst01 etc]# rpm -qa | grep ganesha | sort
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root at chvirnfstst01 etc]#
[root at chvirnfstst01 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
glusterfs-api-3.8.15-2.el7.x86_64
glusterfs-client-xlators-3.8.15-2.el7.x86_64
glusterfs-libs-3.8.15-2.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root at chvirnfstst01 etc]#
[root at chvirnfstst01 etc]# cat /etc/ganesha/ganesha.conf
EXPORT
{
        # Export Id (mandatory, each EXPORT must have a unique Export_Id)
        Export_Id = 77;

        # Exported path (mandatory)
        Path = /ora_dump;

        # Pseudo Path (required for NFS v4)
        Pseudo = /ora_dump;

        # Exporting FSAL
        FSAL {
                Name = GLUSTER;
                Hostname = 10.30.28.43;
                Volume = ora_dump;
        }

        CLIENT {
                # Oracle Servers
                Clients
10.30.29.125,10.30.28.25,10.30.28.64,10.30.29.123,10.30.28.21,10.30.28.81,10.30.29.124,10.30.28.82,10.30.29.111;
                Access_Type = RW;
        }
}

EXPORT
{
        # Export Id (mandatory, each EXPORT must have a unique Export_Id)
        Export_Id = 88;

        # Exported path (mandatory)
        Path = /chzrhcvtprd04;

        # Pseudo Path (required for NFS v4)
        Pseudo = /chzrhcvtprd04;

        # Exporting FSAL
        FSAL {
                Name = GLUSTER;
                Hostname = 10.30.28.43;
                Volume = chzrhcvtprd04;
        }

        CLIENT {
                # everybody
                Clients = 10.30.0.0/16,10.40.0.0/16,10.50.0.0/16;
                Access_Type = RW;
        }
}
[root at chvirnfstst01 etc]#



THIS SYSTEM IS NOT WORKING

you can find the details about the shared volume in my previous mail

[root at chvirnfsprd12 ~]# uname -a
Linux chvirnfsprd12 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4
15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root at chvirnfsprd12 ~]# cd /etc/
[root at chvirnfsprd12 etc]# ls -ld *rel*
-rw-r--r--. 1 root root  38 Nov 29  2016 centos-release
-rw-r--r--. 1 root root  51 Nov 29  2016 centos-release-upstream
-rw-r--r--. 1 root root 393 Nov 29  2016 os-release
drwxr-xr-x. 2 root root  78 Sep  2 08:54 prelink.conf.d
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 redhat-release -> centos-release
lrwxrwxrwx. 1 root root  14 Sep  2 08:53 system-release -> centos-release
-rw-r--r--. 1 root root  23 Nov 29  2016 system-release-cpe
[root at chvirnfsprd12 etc]# cat centos-release
CentOS Linux release 7.3.1611 (Core)
[root at chvirnfsprd12 etc]# cat os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root at chvirnfsprd12 etc]# rpm -qa | grep ganesha | sort
glusterfs-ganesha-3.8.15-2.el7.x86_64
nfs-ganesha-2.3.3-1.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root at chvirnfsprd12 etc]# rpm -qa | grep gluster | sort
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-3.8.15-2.el7.x86_64
glusterfs-api-3.8.15-2.el7.x86_64
glusterfs-cli-3.8.15-2.el7.x86_64
glusterfs-client-xlators-3.8.15-2.el7.x86_64
glusterfs-fuse-3.8.15-2.el7.x86_64
glusterfs-ganesha-3.8.15-2.el7.x86_64
glusterfs-libs-3.8.15-2.el7.x86_64
glusterfs-server-3.8.15-2.el7.x86_64
nfs-ganesha-gluster-2.3.3-1.el7.x86_64
[root at chvirnfsprd12 etc]# cat /etc/ganesha/ganesha.conf
NFS_Core_Param {
        #Use supplied name other than IP In NSM operations
        NSM_Use_Caller_Name = true;
        #Copy lock states into "/var/lib/nfs/ganesha" dir
        Clustered = true;
        #Use a non-privileged port for RQuota
        Rquota_Port = 875;
}

%include /etc/ganesha/exports/ora_dump.conf
%include /etc/ganesha/exports/chzrhcvtprd04.conf
[root at chvirnfsprd12 etc]# cat /etc/ganesha/exports/ora_dump.conf
EXPORT
{
        # Export Id (mandatory, each EXPORT must have a unique Export_Id)
        Export_Id = 77;

        # Exported path (mandatory)
        Path = /ora_dump;

        # Pseudo Path (required for NFS v4)
        Pseudo = /ora_dump;

        # Exporting FSAL
        FSAL {
                Name = GLUSTER;
                Hostname = 10.30.28.43;
                Volume = ora_dump;
        }

        CLIENT {
                # Oracle Servers
                Clients
10.30.29.125,10.30.28.25,10.30.28.64,10.30.29.123,10.30.28.21,10.30.28.81,10.30.29.124,10.30.28.82,10.30.29.111;
                Access_Type = RW;
        }
}
[root at chvirnfsprd12 etc]#
[root at chvirnfsprd12 log]# grep '^\[2017-10-02 [12]' ganesha-gfapi.log
[2017-10-02 18:49:12.855174] I [MSGID: 104043]
[glfs-mgmt.c:565:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile,
continuing
[2017-10-02 18:49:12.862051] I [MSGID: 104043]
[glfs-mgmt.c:565:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile,
continuing
[2017-10-02 18:50:05.789064] E [socket.c:2309:socket_connect_finish]
0-gfapi: connection to 10.30.28.43:24007 failed (Connection refused)
[2017-10-02 18:50:06.519516] E [socket.c:2309:socket_connect_finish]
0-gfapi: connection to 10.30.28.43:24007 failed (Connection refused)
[2017-10-02 18:49:13.076308] I [MSGID: 104043]
[glfs-mgmt.c:565:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile,
continuing
[2017-10-02 18:50:22.694873] I [MSGID: 104043]
[glfs-mgmt.c:565:glfs_mgmt_getspec_cbk] 0-gfapi: No change in volfile,
continuing
[2017-10-02 19:17:54.733967] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:6d0de5f7-293c-4075-b3e2-f85f7902456e>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:17:54.774694] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:6b8da639-5a48-42b5-aa9d-a25850eee8bd>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:17:54.825167] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:a5ab5eb4-f6fa-419a-87fc-1180f2ef324a>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:17:54.877278] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:9d12bd00-75ba-47d5-b513-5cfd9926b061>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:17:54.938806] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:2ca21df0-d8bb-4484-b386-438d99f9be0d>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:00.865158] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:32634781-2a61-45d1-83b9-909fe96ec0ad>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:05.619545] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:ed9dec37-081a-4062-b626-779f9ce494de>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:08.176730] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:fb38de03-11c7-4d77-b262-933f5a6e9289>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:09.456716] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:b252ffdd-b62e-4470-a885-75df67ba4bf5>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:10.730761] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:10ee40e0-4807-40fe-abdc-590a7ee86de5>: failed to lookup the file
on ora_dump-dht [Stale file handle]
[2017-10-02 19:19:11.425541] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-ora_dump-dht:
<gfid:d6f4cffd-eda7-4188-8af1-60eaa67a3724>: failed to lookup the file
on ora_dump-dht [Stale file handle]





[root at chvirnfsprd12 log]# grep '^02/10/2017 21:1[7-9]' ganesha.log
02/10/2017 21:17:43 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-6] mnt_Mnt :NFS3 :EVENT :MOUNT: Performance
warning: Export entry is not cached
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-13] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-16] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-1] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-2] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-12] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:17:54 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-16] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:05 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-2] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:08 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-6] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:09 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-5] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:10 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-2] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:11 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-11] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1
02/10/2017 21:19:15 : epoch 59cbce8c : chvirnfsprd12 :
ganesha.nfsd-8516[work-14] nsm_monitor :NLM :CRIT :Can not monitor
chglboraprd10.fpprod.corp SM_MON status 1

Seemingly Similar Threads

Search for more reasonably related threads

Gluster users - Oct 2017 - nfs-ganesha locking problems

[Gluster-users] nfs-ganesha locking problems

[Gluster-users] nfs-ganesha locking problems

[Gluster-users] nfs-ganesha locking problems

Seemingly Similar Threads