Kolasinski, Brent D.
2014-Jun-09 20:59 UTC
[Gluster-users] Gluster NFS fails to start when replica brick is down
Hi all,

I have noticed some interesting behavior from my gluster setup regarding NFS on Gluster 3.5.0.

My problem: I have 2 bricks in a replica volume (named gvol0). This volume is accessed through NFS. If I fail one of the servers, everything works as expected; gluster NFS continues to export the volume from the remaining brick. However, if I restart the glusterd, glusterfsd, and rpcbind services, or reboot the remaining host while the other brick is down, gluster NFS no longer exports the volume from the remaining brick. The volume does still appear to be available to gluster FUSE clients, though. Is this intended behavior, or is this a possible bug?

Here is a ps just after a brick fails, with 1 brick remaining to export the volume over gluster NFS:

[root@nfs0 ~]# ps aux | grep gluster
root      2145  0.0  0.1 518444 24972 ?  Ssl  19:24  0:00 /usr/sbin/glusterfsd -s nfs0g --volfile-id gvol0.nfs0g.data-brick0-gvol0 -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log --xlator-option *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
root      2494  0.1  0.1 414208 19204 ?  Ssl  19:46  0:02 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root      2511  0.0  0.4 471324 77868 ?  Ssl  19:47  0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/b0f1e836c0c9f168518e0adba7187c10.socket
root      2515  0.0  0.1 334968 25408 ?  Ssl  19:47  0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/173a6cd55e36ea8e0ce0896d27533355.socket --xlator-option *replicate*.node-uuid=49f53699-babd-4731-9c56-582b2b90b27c

Here is a ps after restarting the remaining host, with the other brick still down:

[root@nfs0 ~]# ps aux | grep gluster
root      2134  0.1  0.0 280908 14684 ?  Ssl  20:36  0:00 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root      2144  0.0  0.1 513192 17300 ?  Ssl  20:36  0:00 /usr/sbin/glusterfsd -s nfs0g --volfile-id gvol0.nfs0g.data-brick0-gvol0 -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log --xlator-option *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c --brick-port 49152 --xlator-option gvol0-server.listen-port=49152

It appears the gluster NFS server (the glusterfs process with --volfile-id gluster/nfs) is not being started back up after the reboot of the remaining host. Restarting glusterfsd on the remaining host does not bring NFS back either. However, if I start the gluster service on the host that serves the down brick, NFS starts up again without me restarting any services.

Here is the volume information:

Volume Name: gvol0
Type: Replicate
Volume ID: e88afc1c-50d3-4e2e-b540-4c2979219d12
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs0g:/data/brick0/gvol0
Brick2: nfs1g:/data/brick0/gvol0
Options Reconfigured:
nfs.disable: 0
network.ping-timeout: 3

Is this a bug, or intended functionality?

----------
Brent Kolasinski
Computer Systems Engineer

Argonne National Laboratory
Decision and Information Sciences
ARM Climate Research Facility
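For anyone reproducing this, a quick way to confirm whether the gluster NFS server is actually exporting the volume on the surviving node is to query the volume status and the local NFS export list. This is a minimal sketch, not from the thread itself; it assumes the gluster CLI and rpcbind are available on the surviving node (nfs0g in the setup above), and the exact output format varies by Gluster version:

    # Ask glusterd whether the NFS server process for gvol0 is online on each node
    gluster volume status gvol0 nfs

    # Ask the local NFS server (via rpcbind/mountd) what it is currently exporting
    showmount -e localhost

In the failure case described above, the first command would be expected to show the NFS server as not online on the surviving node, even though the brick process is running and a FUSE mount of gvol0 still works.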
Santosh Pradhan
2014-Jun-10 11:27 UTC
[Gluster-users] Gluster NFS fails to start when replica brick is down
Hi Brent,

Please go ahead and file a bug.

Thanks,
Santosh

On 06/10/2014 02:29 AM, Kolasinski, Brent D. wrote:
> Hi all,
>
> I have noticed some interesting behavior from my gluster setup regarding
> NFS on Gluster 3.5.0:
> [...]
> Is this a bug, or intended functionality?
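Until the bug is resolved, a possible workaround worth trying (not confirmed in this thread) is to force-start the volume on the surviving node, which asks glusterd to respawn any of the volume's processes that it finds not running and may also bring the gluster NFS server back:

    # On the surviving node (nfs0g); then re-check whether the NFS server came online
    gluster volume start gvol0 force
    gluster volume status gvol0 nfs

If that does not restore the export, /var/log/glusterfs/nfs.log on the surviving node (the log file shown in the ps output above) is the place to look for why the NFS server fails to start.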