Kingsley
2016-Jul-15 16:25 UTC
[Gluster-users] lingering <gfid:*> entries in volume heal, gluster 3.6.3
On Fri, 2016-07-15 at 21:41 +0530, Ravishankar N wrote:
> On 07/15/2016 09:32 PM, Kingsley wrote:
> > On Fri, 2016-07-15 at 21:06 +0530, Ravishankar N wrote:
> >> On 07/15/2016 08:48 PM, Kingsley wrote:
> >>> I don't have star installed so I used ls,
> >> Oops typo. I meant `stat`.
> >>> but yes they all have 2 links
> >>> to them (see below).
> >>>
> >> Everything seems to be in place for the heal to happen. Can you tailf
> >> the output of the shd logs on all nodes and manually launch gluster
> >> vol heal volname?
> >> Use DEBUG log level if you have to and examine the output for clues.
> >
> > I presume I can do that with this command:
> >
> > gluster volume set callrec diagnostics.brick-log-level DEBUG
> shd is a client process, so it is diagnostics.client-log-level. This
> would affect your mounts too.
> >
> > How can I find out what the log level is at the moment, so that I can
> > put it back afterwards?
> INFO. You can also use `gluster volume reset`.

Thanks.

> >> Also, some dumb things to check: are all the bricks really up, and
> >> is the shd connected to them?
> > All bricks are definitely up. I just created a file on a client and
> > it appeared on all 4 bricks.
> >
> > I don't know how to tell whether the shd is connected to all of them,
> > though.
> Latest messages like "connected to client-xxx" and "disconnected from
> client-xxx" in the shd logs. Just like in the mount logs.

This has revealed something. I'm now seeing lots of lines like this in
the shd log:

[2016-07-15 16:20:51.098152] D [afr-self-heald.c:516:afr_shd_index_sweep] 0-callrec-replicate-0: got entry: eaa43674-b1a3-4833-a946-de7b7121bb88
[2016-07-15 16:20:51.099346] D [client-rpc-fops.c:1523:client3_3_inodelk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.100683] D [client-rpc-fops.c:2686:client3_3_opendir_cbk] 0-callrec-client-2: remote operation failed: Stale file handle. Path: <gfid:eaa43674-b1a3-4833-a946-de7b7121bb88> (eaa43674-b1a3-4833-a946-de7b7121bb88)
[2016-07-15 16:20:51.101180] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.101663] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.102056] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle

These lines continued to be written to the log even after I manually
launched the self heal (which reported that it had been launched
successfully). I also tried repeating that command on one of the bricks
that was giving those messages, but that made no difference.

Client 2 would correspond to the brick that had been offline, so how do
I get the shd to reconnect to that brick? I did a ps but I couldn't see
any processes with glustershd in the name, else I'd have tried sending
that one a HUP.

Cheers,
Kingsley.
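[For anyone retracing these steps, the log-level round trip discussed above would look roughly like this. A sketch only: the volume name callrec is taken from the thread, and the shd log path is assumed to be the default /var/log/glusterfs/glustershd.log.]

    # raise the client-side log level (note: this affects mounts too)
    gluster volume set callrec diagnostics.client-log-level DEBUG

    # watch the shd log on each node while re-triggering the heal
    tail -f /var/log/glusterfs/glustershd.log &
    gluster volume heal callrec

    # check shd connectivity: find the most recent connect/disconnect lines
    grep -E 'connected to|disconnected from' /var/log/glusterfs/glustershd.log | tail

    # afterwards, put the option back to its default (INFO)
    gluster volume reset callrec diagnostics.client-log-level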
Ravishankar N
2016-Jul-15 16:54 UTC
[Gluster-users] lingering <gfid:*> entries in volume heal, gluster 3.6.3
On 07/15/2016 09:55 PM, Kingsley wrote:
> This has revealed something. I'm now seeing lots of lines like this in
> the shd log:
>
> [2016-07-15 16:20:51.098152] D [afr-self-heald.c:516:afr_shd_index_sweep] 0-callrec-replicate-0: got entry: eaa43674-b1a3-4833-a946-de7b7121bb88
> [2016-07-15 16:20:51.099346] D [client-rpc-fops.c:1523:client3_3_inodelk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
> [2016-07-15 16:20:51.100683] D [client-rpc-fops.c:2686:client3_3_opendir_cbk] 0-callrec-client-2: remote operation failed: Stale file handle. Path: <gfid:eaa43674-b1a3-4833-a946-de7b7121bb88> (eaa43674-b1a3-4833-a946-de7b7121bb88)

Looks like the files are not present at all on client-2, which is why
you see these messages. Find out the file/directory names corresponding
to these gfids from one of the healthy bricks and see if they are
present on client-2 as well. If not, try accessing them from the mount.
That should create any missing entries on client-2. Then launch heal
again.

Hope this helps.
Ravi

> [2016-07-15 16:20:51.101180] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
> [2016-07-15 16:20:51.101663] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
> [2016-07-15 16:20:51.102056] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
>
> These lines continued to be written to the log even after I manually
> launched the self heal (which reported that it had been launched
> successfully). I also tried repeating that command on one of the bricks
> that was giving those messages, but that made no difference.
>
> Client 2 would correspond to the brick that had been offline, so how do
> I get the shd to reconnect to that brick? I did a ps but I couldn't see
> any processes with glustershd in the name, else I'd have tried sending
> that one a HUP.
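[For reference, a sketch of the gfid-to-path lookup Ravi suggests. The brick path /export/brick1/callrec is hypothetical; substitute your own. On a brick, each regular file's gfid appears as a hard link under .glusterfs (directories get a symlink instead), so the real name can be recovered with find -samefile:]

    # the gfid file lives at .glusterfs/<first 2 hex chars>/<next 2>/<gfid>
    BRICK=/export/brick1/callrec
    GFID=eaa43674-b1a3-4833-a946-de7b7121bb88
    ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # regular file (link count > 1): find its real path on the brick
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" \
        -not -path '*/.glusterfs/*'

    # directory: the .glusterfs entry is a symlink into the parent's gfid dir
    readlink "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

[Once the name is known, stat-ing that path through a client mount should recreate the missing entry on the brick that had been offline, after which `gluster volume heal callrec` can be run again, as described above.]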