Arne Brutschy
2009-Jul-03 10:41 UTC
[Lustre-discuss] Strange bug: no handle for file close
Hi all,

I recently created a test installation of Lustre on our cluster (Rocks 4.2.1, CentOS 4.7, Lustre 1.6.7.2). The setup is quite simple: 1 MGS/MDS and 4 OSS, each with a single target. I migrated each user's home directory from our RAID5, NFS-shared head node to the Lustre mount. I didn't have any problems; everything went fine (I used the automounter for an easy transition). The setup has been running fine for a week.

Now I added a new user, still with the old scripts, so I added the user and tried to migrate it afterwards. The result: the user cannot log in. Bash reports something like "identifier removed"; apparently the user cannot read any file in his home directory. Strangely, I can read and write all files fine when I'm root. I can revert the migration and the data is fine (the user can log in).

On the MDS, I found the following messages in the log:

> LustreError: 3584:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 30863131: cookie 0xafec72814d4ff48a req@f5f41400 x1257636/t0 o35->bb2a441c-fe74-2223-8d98-c2e40170b718@:0/0 lens 296/560 e 0 to 0 dl 1246615645 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3583:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 34451538: cookie 0xafec728167e99feb req@f3caee00 x1319105/t0 o35->98f257a6-4c8a-7f0b-25fc-d02a17efc2a6@:0/0 lens 296/560 e 0 to 0 dl 1246615646 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3583:0:(mds_open.c:1567:mds_close()) Skipped 12 previous similar messages
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 30862821: cookie 0xafec728146cb237c req@f35de600 x433424/t0 o35->3e8da5ff-20e6-19dd-f975-4837dc86654a@NET_0x200000affffd3_UUID:0/0 lens 296/560 e 0 to 0 dl 1246615648 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) Skipped 170 previous similar messages
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 30863197: cookie 0xafec728150bfaa7b req@f6f6aa00 x1207616/t0 o35->b058f8e1-dc62-9e9d-f480-a38c8fe5f36d@:0/0 lens 296/560 e 0 to 0 dl 1246615651 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) Skipped 18 previous similar messages
> LustreError: 3583:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 34451538: cookie 0xafec728168149d03 req@f3779c00 x1072035/t0 o35->afec1fd6-045f-7d1e-50e5-bbd95a94f117@NET_0x200000affffc4_UUID:0/0 lens 296/560 e 0 to 0 dl 1246615656 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3583:0:(mds_open.c:1567:mds_close()) Skipped 548 previous similar messages
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 34451538: cookie 0xafec728168b5e2b6 req@f7a4662c x318943/t0 o35->8b9a4c7c-d0ca-5a07-2fed-76b2c29a3953@:0/0 lens 296/560 e 0 to 0 dl 1246615664 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3584:0:(mds_open.c:1567:mds_close()) Skipped 174 previous similar messages
> LustreError: 5072:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116) req@f5da4600 x458268/t0 o35->e99372df-e6db-3adb-0884-d238b1ef8a4e@:0/0 lens 296/560 e 0 to 0 dl 1246615672 ref 1 fl Interpret:/0/0 rc -116/0
> LustreError: 5072:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 1291 previous similar messages
> LustreError: 5126:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 30863061: cookie 0xafec72814a9a70e8 req@f37cbe00 x3264550/t0 o35->90ff7cc4-8f14-5c0e-90ce-1e7fb80533ce@:0/0 lens 296/560 e 0 to 0 dl 1246615680 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 5126:0:(mds_open.c:1567:mds_close()) Skipped 435 previous similar messages
> LustreError: 5148:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 34524483: cookie 0xafec728168c287d4 req@f3690c00 x383668/t0 o35->cc70305b-4ca6-dc64-6f55-97299fc52fd5@:0/0 lens 296/560 e 0 to 0 dl 1246615720 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 5148:0:(mds_open.c:1567:mds_close()) Skipped 51 previous similar messages
> LustreError: 3546:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-43) req@f5f46c00 x3532/t0 o36->25465654-6df2-739c-a30a-e215b53e324e@:0/0 lens 344/296 e 0 to 0 dl 1246616320 ref 1 fl Interpret:/0/0 rc 0/0
> LustreError: 3546:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 440 previous similar messages

The list grows with each attempted access. The inodes are those of the files in the user's home directory. The OSS logs show no errors.

Does anyone have a clue why this happens, and why only with this user? All other users are working fine.

Cheers
Arne

--
Arne Brutschy
Ph.D. Student
IRIDIA CP 194/6
Université Libre de Bruxelles
Avenue Franklin Roosevelt 50
1050 Bruxelles, Belgium
Email: arne.brutschy(AT)ulb.ac.be
Web: iridia.ulb.ac.be/~abrutschy
Tel: +32 2 650 3168
Fax: +32 2 650 2715 (Fax at IRIDIA secretary)
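The negative numbers in the "processing error" lines look like negated Linux errno values, and -43 would then be EIDRM, whose standard message is "Identifier removed", the same text Bash reports. The following is a minimal Python sketch for decoding such codes from a log excerpt. The function name `decode_errors`, the small errno table, and the abbreviated sample lines are illustrative assumptions, not part of Lustre.

```python
import re

# Linux errno numbers (asm-generic), hardcoded so the sketch behaves the same
# on any platform. Only the two codes seen in the log above are listed.
LINUX_ERRNO = {
    43: ("EIDRM", "Identifier removed"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches the "processing error (-NN)" fragment of a LustreError line.
ERROR_RE = re.compile(r"processing error \((-\d+)\)")


def decode_errors(log_text):
    """Return (code, name, message) for each distinct error code in log_text."""
    codes = {int(match.group(1)) for match in ERROR_RE.finditer(log_text)}
    decoded = []
    # Codes are negative, so reverse order lists the smallest magnitude first.
    for code in sorted(codes, reverse=True):
        name, message = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        decoded.append((code, name, message))
    return decoded


# Abbreviated sample lines (hypothetical shortening of the log quoted above).
sample = (
    "LustreError: 5072:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116)\n"
    "LustreError: 3546:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-43)\n"
)

for code, name, message in decode_errors(sample):
    print(f"{code}: {name} ({message})")
```

Codes missing from the table are reported as UNKNOWN rather than guessed; the table can be extended from the system's errno headers if needed.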
Brian J. Murrell
2009-Jul-03 11:13 UTC
[Lustre-discuss] Strange bug: no handle for file close
On Fri, 2009-07-03 at 12:41 +0200, Arne Brutschy wrote:
> The result: the user cannot log in. Bash reports something like
> "identifier removed",

This has been discussed many times, in detail, on this list. Check the archives. It's also covered in the operations manual.

The summary is that the clients and the MDS must share a common /etc/passwd and /etc/group database.

b.
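One way to check this requirement is to compare the account entries on a client with those on the MDS. The following is a minimal Python sketch, assuming the contents of both /etc/passwd files have already been collected as text. The function names and the sample accounts (`alice`, `newuser`) are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed or non-numeric entries (for example NIS "+" lines).
        if len(fields) < 4 or not fields[2].isdigit() or not fields[3].isdigit():
            continue
        accounts[fields[0]] = (int(fields[2]), int(fields[3]))
    return accounts


def find_mismatches(client_text, mds_text):
    """List (user, client_ids, mds_ids) for client accounts that are
    missing on the MDS (mds_ids is None) or have a different UID/GID."""
    client = parse_passwd(client_text)
    mds = parse_passwd(mds_text)
    return [
        (user, ids, mds.get(user))
        for user, ids in sorted(client.items())
        if mds.get(user) != ids
    ]


# Hypothetical sample data: "newuser" exists on the client but not on the MDS.
CLIENT_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:500:500::/home/alice:/bin/bash
newuser:x:512:512::/home/newuser:/bin/bash
"""
MDS_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
alice:x:500:500::/home/alice:/bin/bash
"""

for user, client_ids, mds_ids in find_mismatches(CLIENT_PASSWD, MDS_PASSWD):
    print(user, client_ids, mds_ids)
```

The same comparison could be applied to /etc/group with a parser that reads the group name and GID fields instead.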
Arne Brutschy
2009-Jul-03 12:28 UTC
[Lustre-discuss] Strange bug: no handle for file close
Aw, worked like a charm. Sorry for asking a FAQ, but I guess I was searching using the wrong terms.

To add something to the list: if you are using Rocks, you need to have the 'greceptor' service running on the Lustre servers as well. This way the servers' user databases will be updated automatically when the head node executes 'rocks-user-sync'.

Thanks a lot!
Arne
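After a sync, it may be worth confirming on each Lustre server that the new account actually resolves. The following is a minimal Python sketch using the standard `pwd` and `grp` modules (Unix only), assuming Python is available on the servers. The helper name `resolve_account` is hypothetical.

```python
import grp
import pwd


def resolve_account(name):
    """Return (uid, gid, primary group name) if the account resolves on
    this node, or None if the user is unknown here."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    try:
        group = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        group = None  # UID is known but the primary group is not
    return entry.pw_uid, entry.pw_gid, group


# Run on the MDS with the name of the freshly added user in place of "root".
print(resolve_account("root"))
```

A result of None on the MDS for a user that resolves on the clients would point to the account databases being out of sync.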