Pat Haley
2013-Nov-27 21:42 UTC
[Gluster-users] After reboot, one brick is not being seen by clients
Hi,

We are currently using gluster with 3 bricks. We just rebooted one of the bricks (mseas-data, also identified as gluster-data), which is actually the main server. After rebooting this brick, our client machine (mseas) only sees the files on the other 2 bricks. Note that if I mount the gluster filespace (/gdata) on the brick I rebooted, it sees the entire space.

The last time I had this problem, there was an error in one of our /etc/hosts files. This does not seem to be the case now.

What else can I look at to debug this problem?

Some information I have from the gluster server:

[root at mseas-data ~]# gluster --version
glusterfs 3.3.1 built on Oct 11 2012 22:01:05

[root at mseas-data ~]# gluster volume info

Volume Name: gdata
Type: Distribute
Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster-0-0:/mseas-data-0-0
Brick2: gluster-0-1:/mseas-data-0-1
Brick3: gluster-data:/data

[root at mseas-data ~]# ps -ef | grep gluster
root      2781     1  0 15:16 ?        00:00:00 /usr/sbin/glusterd -p
/var/run/glusterd.pid
root      2897     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfsd
-s localhost --volfile-id gdata.gluster-data.data -p
/var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
/tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
/var/log/glusterfs/bricks/data.log --xlator-option
*-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
--brick-port 24009 --xlator-option gdata-server.listen-port=24009
root      2903     1  0 15:16 ?        00:00:00 /usr/sbin/glusterfs -s
localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/tmp/d5c892de43c28a1ee7481b780245b789.socket
root      4258     1  0 15:52 ?        00:00:00 /usr/sbin/glusterfs
--volfile-id=/gdata --volfile-server=mseas-data /gdata
root      4475  4033  0 16:35 pts/0    00:00:00 grep gluster

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
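[Since a bad /etc/hosts entry caused this same symptom once before, one quick way to rule that out again is to confirm that every server and the client resolve the brick hostnames identically and can reach the rebooted node. A minimal sketch, using the hostnames from the volume info above, run on each node:

    getent hosts gluster-data gluster-0-0 gluster-0-1   # resolver's view of each brick host; compare output across nodes
    ping -c 1 gluster-data                              # basic reachability of the rebooted brick
]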
Ravishankar N
2013-Nov-28 04:21 UTC
[Gluster-users] After reboot, one brick is not being seen by clients
On 11/28/2013 03:12 AM, Pat Haley wrote:
> We are currently using gluster with 3 bricks. We just rebooted one of
> the bricks (mseas-data, also identified as gluster-data), which is
> actually the main server. After rebooting this brick, our client
> machine (mseas) only sees the files on the other 2 bricks.
> [...]
> What else can I look at to debug this problem?

From the ps output, the brick process (glusterfsd) doesn't seem to be running on the gluster-data server. Run `gluster volume status` and check if that is indeed the case. If yes, you could either restart glusterd on the brick node (`service glusterd restart`) or restart the entire volume (`gluster volume start gdata force`), which should bring the brick process back online.

I'm not sure why glusterd did not start the brick process when you rebooted the machine in the first place. You could perhaps check the glusterd log for clues.

Hope this helps,
Ravi
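[For reference, the recovery sequence suggested above, as it would be run on the rebooted node. The glusterd log path shown is the usual default for glusterfs 3.3-era packages and may differ on this system:

    gluster volume status               # is Brick gluster-data:/data listed as Online?
    service glusterd restart            # restart the management daemon, or:
    gluster volume start gdata force    # force-(re)start the volume's brick processes
    less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log   # look for errors from around boot time
]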
Ravishankar N
2013-Nov-28 07:48 UTC
[Gluster-users] After reboot, one brick is not being seen by clients
On 11/28/2013 12:52 PM, Patrick Haley wrote:
> Hi Ravi,
>
> Thanks for the reply. If I interpret the output of gluster volume status
> correctly, glusterfsd was running:
>
> [root at mseas-data ~]# gluster volume status
> Status of volume: gdata
> Gluster process                              Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster-0-0:/mseas-data-0-0            24009   Y       27006
> Brick gluster-0-1:/mseas-data-0-1            24009   Y       7063
> Brick gluster-data:/data                     24009   Y       2897
> NFS Server on localhost                      38467   Y       2903
> NFS Server on gluster-0-1                    38467   Y       7069
> NFS Server on gluster-0-0                    38467   Y       27012
>
> For completeness, I tried both "service glusterd restart" and
> "gluster volume start gdata force". Neither solved the problem.
> Note that after "gluster volume start gdata force", gluster volume status
> failed:
>
> [root at mseas-data ~]# gluster volume status
> operation failed
>
> Failed to get names of volumes
>
> Doing another "service glusterd restart" let the "gluster volume status"
> command work, but the clients still don't see the files on mseas-data.

Are your clients using fuse mounts or NFS mounts?

> A second piece of data: on the other bricks, "gluster volume status" does not
> show gluster-data:/data:

Hmm, could you check if all 3 bricks are connected? `gluster peer status` on each brick should show the others as connected.

> [root at nas-0-0 ~]# gluster volume status
> Status of volume: gdata
> Gluster process                              Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster-0-0:/mseas-data-0-0            24009   Y       27006
> Brick gluster-0-1:/mseas-data-0-1            24009   Y       7063
> NFS Server on localhost                      38467   Y       27012
> NFS Server on gluster-0-1                    38467   Y       8051
>
> Any thoughts on what I should look at next?

Also noticed that the NFS server process on gluster-0-1 (on which I guess no commands were run) seems to have changed its pid from 7069 to 8051. FWIW, I am able to observe a similar bug (https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be investigated.

Thanks,
Ravi
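[To settle the fuse-versus-NFS question from the running system rather than from /etc/fstab, the client's mount table shows the filesystem type. A minimal check, run on the client (mseas):

    mount | grep gdata    # a fuse client shows type fuse.glusterfs; an NFS client shows type nfs
    df -T /gdata          # prints the filesystem type for the mount point
]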
Patrick Haley
2013-Nov-28 16:06 UTC
[Gluster-users] After reboot, one brick is not being seen by clients
________________________________________
From: Patrick Haley
Sent: Thursday, November 28, 2013 11:00 AM
To: Ravishankar N
Subject: RE: [Gluster-users] After reboot, one brick is not being seen by clients

Hi Ravi,

I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is:

mseas-data:/gdata /gdata glusterfs defaults,_netdev 0 0

gluster-data sees the other bricks as connected. The other bricks see each other as connected, but see gluster-data as disconnected:

---------------
gluster-data:
---------------
[root at mseas-data ~]# gluster peer status
Number of Peers: 2

Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)

Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)

-------------
gluster-0-0:
-------------
[root at nas-0-0 ~]# gluster peer status
Number of Peers: 2

Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)

Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)

-------------
gluster-0-1:
-------------
[root at nas-0-1 ~]# gluster peer status
Number of Peers: 2

Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)

Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)

Does any of this suggest what I need to look at next?

Thanks.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
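[The asymmetry above (gluster-data sees both peers as Connected, while both peers see gluster-data as Disconnected) often points at a one-way connectivity problem toward the rebooted node, e.g. a firewall rule restored at boot. One way to test, from gluster-0-0 and gluster-0-1, assuming glusterd is listening on its default management port 24007:

    ping -c 1 gluster-data       # basic reachability
    telnet gluster-data 24007    # (or nc -z) can glusterd's management port be reached?
]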
Ravishankar N
2013-Nov-28 17:32 UTC
[Gluster-users] After reboot, one brick is not being seen by clients
On 11/28/2013 09:30 PM, Patrick Haley wrote:
> gluster-data sees the other bricks as connected. The other bricks see each
> other as connected, but see gluster-data as disconnected:
> [...]
> Does any of this suggest what I need to look at next?

Hi Patrick,

If gluster-data is pingable from the other bricks, you could try detaching and reattaching it from gluster-0-0 or 0-1:

1) On gluster-0-0: `gluster peer detach gluster-data`; if that fails, `gluster peer detach gluster-data force`
2) On gluster-data: `rm -rf /var/lib/glusterd`, then `service glusterd restart`
3) Again on gluster-0-0: `gluster peer probe gluster-data`

Now check if things work.

PS: You should really do a 'reply-to-all' so that your queries reach a wider audience, getting you faster responses from the community. It also serves as a double-check in case I goof up :) I'm off to sleep now.
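[A consolidated sketch of the sequence above, with a verification step added at the end. Note that step 2 deletes gluster-data's local glusterd state (peer and volume configuration, including the node UUID), which glusterd rebuilds when the node is probed again; this assumes nothing else under /var/lib/glusterd needs preserving:

    # on gluster-0-0 (or gluster-0-1):
    gluster peer detach gluster-data      # add 'force' if the plain detach fails

    # on gluster-data:
    rm -rf /var/lib/glusterd              # wipe local peer/volume state (rebuilt on re-probe)
    service glusterd restart

    # on gluster-0-0 again:
    gluster peer probe gluster-data

    # verify from any node:
    gluster peer status                   # all peers should now show Connected
    gluster volume status gdata           # gluster-data:/data should be back Online
]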