Christos Theodosiou
2010-Apr-16 08:27 UTC
[Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino
Hi all,

Our Lustre installation uses two failover MDSes, which serve 10 file systems. We recently upgraded from version 1.8.1.1 to 1.8.2.

While monitoring the MDSes I noticed that we get frequent error messages (1-4 times/hour) which look like this:

Apr 16 10:36:10 lustre01 kernel: LustreError: 31386:0:(mds_open.c:1666:mds_close()) @@@ no handle for file close ino 4011004: cookie 0x699184f2b7f715d6 req at ffff8100621d2c00 x1332632126967249/t0 o35->60822f67-b798-f355-d5b1-54d7d7fa9f15@:0/0 lens 408/472 e 0 to 0 dl 1271403376 ref 1 fl Interpret:/0/0 rc 0/0
Apr 16 10:36:10 lustre01 kernel: LustreError: 31386:0:(mds_open.c:1666:mds_close()) Skipped 1 previous similar message
Apr 16 10:36:10 lustre01 kernel: LustreError: 31386:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-116) req at ffff8100621d2c00 x1332632126967249/t0 o35->60822f67-b798-f355-d5b1-54d7d7fa9f15@:0/0 lens 408/400 e 0 to 0 dl 1271403376 ref 1 fl Interpret:/0/0 rc -116/0
Apr 16 10:36:10 lustre01 kernel: LustreError: 31386:0:(ldlm_lib.c:1848:target_send_reply_msg()) Skipped 1 previous similar message

and

Apr 16 11:07:51 lustre02 kernel: LustreError: 6305:0:(mds_open.c:1666:mds_close()) @@@ no handle for file close ino 2264949: cookie 0xc4e85523602d2686 req at ffff81006787c000 x1332654888280600/t0 o35->52c78d24-f0da-a56e-9bc5-2e16e6e78790@:0/0 lens 408/976 e 0 to 0 dl 1271405277 ref 1 fl Interpret:/0/0 rc 0/0
Apr 16 11:07:51 lustre02 kernel: LustreError: 6305:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-116) req at ffff81006787c000 x1332654888280600/t0 o35->52c78d24-f0da-a56e-9bc5-2e16e6e78790@:0/0 lens 408/688 e 0 to 0 dl 1271405277 ref 1 fl Interpret:/0/0 rc -116/0

I would like to ask:
a) Is this a critical error message?
b) Is there any way to find out more information about it, e.g. the filesystem, filename and Lustre client related to this error?
c) Is there any way to resolve these errors?

Thank you in advance
Andreas Dilger
2010-Apr-16 17:49 UTC
[Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino
On 2010-04-16, at 01:27, Christos Theodosiou wrote:
> Apr 16 10:36:10 lustre01 kernel: LustreError:
> 31386:0:(mds_open.c:1666:mds_close()) @@@ no handle for file close ino
> 4011004: cookie 0x699184f2b7f715d6 req at ffff8100621d2c00
> x1332632126967249/t0 o35->60822f67-b798-f355-d5b1-54d7d7fa9f15@:0/0
> lens 408/472 e 0 to 0 dl 1271403376 ref 1 fl Interpret:/0/0 rc 0/0

This likely means some client was evicted in the past, but later on it is closing a file that was previously opened. This is not an error to be concerned about, and we should probably just turn it off.

> o35->60822f67-b798-f355-d5b1-54d7d7fa9f15@:0/0 lens 408/400 e 0 to 0 dl

You can find which client this is via (something like, I'm not logged on my system):

    lfs get_param mds.*.exports.*.uuid | grep 60822f67-b798-f355-d5b1-54d7d7fa9f15

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.
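As a side note on the numbers in these log lines: the reply code -116 corresponds to ESTALE (stale file handle) in the Linux errno table, which fits the explanation of a close arriving for a handle the MDS no longer knows. The `dl` field looks like a request deadline expressed as a Unix timestamp, although that reading is an assumption and not something stated in the thread. A minimal Python sketch for decoding both, with the helper names `describe_rc` and `deadline_utc` made up for illustration:

```python
from datetime import datetime, timezone

# Small subset of the Linux errno table, hardcoded so the sketch gives the
# same answer on any OS (errno numbers differ between platforms).
LINUX_ERRNO = {2: "ENOENT", 116: "ESTALE"}


def describe_rc(rc):
    """Translate a kernel-style return code such as -116 into an errno name."""
    if rc == 0:
        return "success"
    return LINUX_ERRNO.get(abs(rc), "unknown")


def deadline_utc(dl):
    """Interpret the 'dl' value as seconds since the epoch (an assumption)."""
    return datetime.fromtimestamp(dl, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(describe_rc(-116))         # rc from the target_send_reply_msg() line
print(deadline_utc(1271403376))  # dl from the lustre01 messages
```

Under that reading, the `dl` value of the lustre01 messages falls a few seconds after the logged time, if the syslog timestamps are in local time three hours ahead of UTC.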
Christos Theodosiou
2010-Apr-19 08:09 UTC
[Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino
On Fri, 2010-04-16 at 10:49 -0700, Andreas Dilger wrote:
> You can find which client this is via (something like, I'm not logged on
> my system):
>
> lfs get_param mds.*.exports.*.uuid | grep 60822f67-b798-f355-d5b1-54d7d7fa9f15

    lctl get_param mds.*.exports.*.uuid | grep 60822f67-b798-f355-d5b1-54d7d7fa9f15

worked fine. This command provides information about the server and the client.

Is there any way to get information about the file in the error message?

Thanks, Christos
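For anyone who wants to script the lookup above, here is a rough Python sketch that filters saved `lctl get_param mds.*.exports.*.uuid` output for one client UUID. It assumes the output is one `name=value` pair per line; the target name and NIDs in `SAMPLE` are invented for illustration and do not come from the thread, and `find_exports` is a hypothetical helper name.

```python
def find_exports(get_param_output, client_uuid):
    """Return the parameter names whose value equals client_uuid."""
    matches = []
    for line in get_param_output.splitlines():
        # Split on the first '=' only; parameter names may contain '@' and dots.
        name, sep, value = line.partition("=")
        if sep and value.strip() == client_uuid:
            matches.append(name.strip())
    return matches


# Hypothetical sample output: the filesystem name and NIDs are made up.
SAMPLE = """\
mds.fs1-MDT0000.exports.10.0.0.5@tcp.uuid=60822f67-b798-f355-d5b1-54d7d7fa9f15
mds.fs1-MDT0000.exports.10.0.0.6@tcp.uuid=52c78d24-f0da-a56e-9bc5-2e16e6e78790
"""

for name in find_exports(SAMPLE, "60822f67-b798-f355-d5b1-54d7d7fa9f15"):
    print(name)
```

The matching parameter name carries both the MDS target and the client's network identifier, which is why the command reveals the server as well as the client.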
Dmitry Zogin
2010-Apr-20 03:05 UTC
[Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino
Christos Theodosiou wrote:
> lctl get_param mds.*.exports.*.uuid | grep 60822f67-b798-f355-d5b1-54d7d7fa9f15
>
> worked fine. This command provides info about the server and the client.
> Is there any way to get information about the file of the error message?

You can actually see the inode number in the message:

31386:0:(mds_open.c:1666:mds_close()) @@@ no handle for file close ino 4011004: cookie 0x699184f2b7f715d6 req at ffff8100621d2c00

ino=4011004 is the inode number.
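To pull the inode number and the other identifiers out of many such messages at once, something along these lines may help. It is only a sketch: the regular expression is fitted to the two sample messages in this thread, the field names in the returned dictionary are my own, and reading `o35` as the request opcode is an assumption.

```python
import re

# Pattern fitted to the "no handle for file close" messages quoted in this thread.
LOG_RE = re.compile(
    r"no handle for file close ino (?P<ino>\d+): "
    r"cookie (?P<cookie>0x[0-9a-f]+).*?"
    r"o(?P<opc>\d+)->(?P<uuid>[0-9a-f-]{36})@"
)


def parse_close_error(line):
    """Return inode, cookie, opcode and client UUID, or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "ino": int(m["ino"]),
        "cookie": m["cookie"],
        "opcode": int(m["opc"]),  # assumed to be the request opcode
        "client_uuid": m["uuid"],
    }


LINE = (
    "Apr 16 10:36:10 lustre01 kernel: LustreError: "
    "31386:0:(mds_open.c:1666:mds_close()) @@@ no handle for file close ino "
    "4011004: cookie 0x699184f2b7f715d6 req at ffff8100621d2c00 "
    "x1332632126967249/t0 o35->60822f67-b798-f355-d5b1-54d7d7fa9f15@:0/0 "
    "lens 408/472 e 0 to 0 dl 1271403376 ref 1 fl Interpret:/0/0 rc 0/0"
)

print(parse_close_error(LINE))
```

The client UUID extracted this way is the same string that the `lctl get_param` lookup earlier in the thread greps for.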