I'm attempting my first-ever Lustre install on a small test cluster. I have one MDS and 5 OSSes, all with identical hardware. They are all on the same network segment and have a single ethernet interface. I'm running SLES9 SP3 with the Lustre RPMs for the kernel, modules, etc.

I've configured the systems and mounted everything, and everything seems fine.

As a first test, I've tried to mount the filesystem on the MDS (and on more than one OSS) as a client. The filesystem seems to mount fine, but once it is mounted, whichever system has it mounted will hang for long periods of time (often permanently). However, I can log into the system from another shell, and things act OK there. The hang usually seems to be triggered by doing anything related to the client-mounted filesystem, but not always.

I had an installation of 1.6 beta7 working that didn't seem to have the problem, but 1.6.0 and 1.6.0.1 have both done it.

I currently have the filesystem mounted using the MDS as a client. It has created a nearly 1MB lustre-log in /tmp (available upon request), and I've included a snippet from /var/log/messages below.

Any help would be appreciated!

May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to 0x000001011f228000/2
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 1 previous similar message
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@00000100dfd5b000 x670/t0 o38->d09242ed-b4f7-806f-bc12-912f7cfac1a9@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:13:46 Lustre-01-01 kernel: LustreError: 7256:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 1 previous similar message
May 9 15:14:54 Lustre-01-01 automount[6101]: attempting to mount entry /.autofs/var.mail
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to 0x000001011f228000/2
May 9 15:15:01 Lustre-01-01 kernel: Lustre: 7262:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 2 previous similar messages
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001011e904a00 x712/t0 o38->d09242ed-b4f7-806f-bc12-912f7cfac1a9@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:15:01 Lustre-01-01 kernel: LustreError: 7262:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 2 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to 0x000001011f228000/2
May 9 15:17:31 Lustre-01-01 kernel: Lustre: 7242:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 5 previous similar messages
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001021b13ec00 x796/t0 o38->d09242ed-b4f7-806f-bc12-912f7cfac1a9@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:17:31 Lustre-01-01 kernel: LustreError: 7242:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 5 previous similar messages
May 9 15:20:17 Lustre-01-01 automount[7452]: expired /.autofs/var.mail
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@00000102196f8c00 x891/t0 o8->lustre1-OST0003_UUID@172.19.0.14@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 9 15:20:26 Lustre-01-01 kernel: LustreError: 7143:0:(client.c:574:ptlrpc_check_status()) Skipped 72 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: d09242ed-b4f7-806f-bc12-912f7cfac1a9 reconnecting
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:497:target_handle_reconnect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to 0x000001011f228000/2
May 9 15:22:05 Lustre-01-01 kernel: Lustre: 7233:0:(ldlm_lib.c:709:target_handle_connect()) Skipped 10 previous similar messages
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@00000102140d7400 x950/t0 o38->d09242ed-b4f7-806f-bc12-912f7cfac1a9@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 9 15:22:05 Lustre-01-01 kernel: LustreError: 7233:0:(ldlm_lib.c:1363:target_send_reply_msg()) Skipped 10 previous similar messages
May 9 15:25:11 Lustre-01-01 kernel: LustreError: 7406:0:(mdc_locks.c:414:mdc_enqueue()) ldlm_cli_enqueue: -4

--
Roger L. Smith
Senior Systems Administrator
Mississippi State University High Performance Computing Collaboratory
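For reference, the client-mount test described above would look roughly like this on the MDS. This is only a sketch: the mount point /mnt/lustre1 is an assumption, while the fsname (lustre1) and the MGS NID (172.19.0.10@tcp) come from the log snippet above.

  # On the MDS (172.19.0.10), mount the filesystem as a client for testing.
  # /mnt/lustre1 is an assumed mount point.
  mkdir -p /mnt/lustre1
  mount -t lustre 172.19.0.10@tcp:/lustre1 /mnt/lustre1

  # Simple accesses like these are what trigger the hang being described.
  ls /mnt/lustre1
  touch /mnt/lustre1/testfile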
May 9 15:13:46 Lustre-01-01 kernel: Lustre: 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to 0x000001011f228000/2

The MDT for some reason thinks the export is in use. I have no idea why, but try this:

Stop all your clients, then on the MDT run:

# cat /proc/fs/lustre/devices
# ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/

just to prove there are no clients. Now try mounting a client on the MDT.

Roger L. Smith wrote:
> I'm attempting my first-ever Lustre install on a small test cluster. I have one MDS
> and 5 OSSes, all with identical hardware.
> [...]
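Spelled out as a command sequence, the check above looks roughly like this. The /proc paths are the ones Nathan gives; the client mount point /mnt/lustre1 is assumed.

  # On every node that currently has the filesystem mounted as a client (here, the MDS
  # and any OSSes used for the test):
  umount /mnt/lustre1

  # On the MDT, verify that no client exports remain:
  cat /proc/fs/lustre/devices
  ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/

  # Then retry the client mount on the MDT:
  mount -t lustre 172.19.0.10@tcp:/lustre1 /mnt/lustre1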
Nathan,

Thanks for the help. That solved one problem, but after booting all of the servers (no clients at all), I'm getting this in the syslog on the MDS:

May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@0000010218691800 x1450/t0 o8->lustre1-OST0003_UUID@172.19.0.14@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) Skipped 24 previous similar messages

If I then attempt to mount the filesystem on the MDS, it mounts, and I can see the contents. I then tried removing a file that I had earlier created using the "touch" command, and it removes fine. However, if I then try to touch a new file, the command hangs, and after a minute or so I get the following set of errors (I'll be happy to provide the lustre-log if that's helpful):

May 11 17:01:38 Lustre-01-01 kernel: Lustre: Client lustre1-client has started
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 6617: it was inactive for 100s
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 6617
May 11 17:03:32 Lustre-01-01 kernel: ll_mdt_10 S 00000102179ac4a8 0 6617 1 6618 6616 (L-TLB)
May 11 17:03:32 Lustre-01-01 kernel: 000001021770b018 0000000000000046 0000003000000030 000001021770b0b0
May 11 17:03:32 Lustre-01-01 kernel: 000001021770af98 0000010215f28380 000001020000007b 0000000000000000
May 11 17:03:32 Lustre-01-01 kernel: 0000000000000000 000001000c001160
May 11 17:03:32 Lustre-01-01 kernel: Call Trace:<ffffffff80147456>{schedule_timeout+246} <ffffffff801468b0>{process_timeout+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a776e>{:ptlrpc:ptlrpc_set_wait+974} <ffffffffa06040d4>{:osc:osc_statfs_async+372}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff80136120>{default_wake_function+0} <ffffffffa04734fa>{:lov:lov_statfs_async+1098}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_expired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_expired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa047af8e>{:lov:lov_create+7006} <ffffffff80190bdf>{__getblk+31}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0525e1f>{:ldiskfs:ldiskfs_get_inode_loc+351}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533486>{:ldiskfs:ldiskfs_xattr_ibody_get+454}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_xattr_get+120}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05aa4da>{:mds:mds_get_md+106} <ffffffffa05ccd7e>{:mds:mds_create_objects+7214}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_xattr_get+120}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa058cb6d>{:fsfilt_ldiskfs:fsfilt_ldiskfs_get_md+269}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d0b9d>{:mds:mds_finish_open+701} <ffffffffa05d3617>{:mds:mds_open+8359}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8014ca1b>{groups_alloc+59} <ffffffffa029d653>{:lvfs:entry_set_group_info+211}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa029cb71>{:lvfs:alloc_entry+241} <ffffffffa028570d>{:libcfs:libcfs_debug_vmsg2+1677}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff801a9361>{dput+33} <ffffffffa03b1e60>{:ptlrpc:lustre_swab_mds_rec_create+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b35fc>{:mds:mds_reint_rec+460} <ffffffffa05d4fe4>{:mds:mds_open_unpack+820}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d42d4>{:mds:mds_update_unpack+484} <ffffffffa05aa391>{:mds:mds_reint+817}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa037d887>{:ptlrpc:_ldlm_lock_debug+1319} <ffffffffa05a86a1>{:mds:fixup_handle_for_resent_req+81}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05ae771>{:mds:mds_intent_policy+1089} <ffffffff801168e5>{do_gettimeofday+101}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0382e63>{:ptlrpc:ldlm_lock_enqueue+243} <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039c972>{:ptlrpc:ldlm_handle_enqueue+2722}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039e100>{:ptlrpc:ldlm_server_blocking_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b291a>{:mds:mds_handle+14938} <ffffffffa0306e4f>{:obdclass:class_handle2object+207}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03aec30>{:ptlrpc:lustre_swab_ptlrpc_body+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b3300>{:ptlrpc:lustre_swab_buf+208} <ffffffffa028247d>{:libcfs:libcfs_nid2str+189}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b9e27>{:ptlrpc:ptlrpc_server_handle_request+2951}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03bc4a8>{:ptlrpc:ptlrpc_main+2232} <ffffffff80136120>{default_wake_function+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b7590>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03b7590>{:ptlrpc:ptlrpc_retry_rqbds+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8011126f>{child_rip+8} <ffffffffa03bbbf0>{:ptlrpc:ptlrpc_main+0}
May 11 17:03:32 Lustre-01-01 kernel: <ffffffff80111267>{child_rip+0}
May 11 17:03:32 Lustre-01-01 kernel: LustreError: dumping log to /tmp/lustre-log.1178921012.6617
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 7460:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1178920912, 100s ago) req@000001011dea6400 x1548/t0 o101->lustre1-MDT0000_UUID@172.19.0.10@tcp:12 lens 512/864 ref 1 fl Rpc:P/0/0 rc 0/-22
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 7460:0:(client.c:950:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
May 11 17:03:32 Lustre-01-01 kernel: Lustre: lustre1-MDT0000-mdc-000001000c224400: Connection to service lustre1-MDT0000 via nid 0@lo was lost; in progress operations using this service will wait for recovery to complete.
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 6621:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:03:32 Lustre-01-01 kernel: Lustre: 6621:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to 0x000001011df5b000/2
May 11 17:03:32 Lustre-01-01 kernel: LustreError: 6621:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001011d8f9800 x1607/t0 o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:03:57 Lustre-01-01 kernel: Lustre: 6622:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:03:57 Lustre-01-01 kernel: Lustre: 6622:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to 0x000001011df5b000/2
May 11 17:03:57 Lustre-01-01 kernel: LustreError: 6622:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001011d877800 x1615/t0 o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:04:22 Lustre-01-01 kernel: Lustre: 6623:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:04:22 Lustre-01-01 kernel: Lustre: 6623:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to 0x000001011df5b000/2
May 11 17:04:22 Lustre-01-01 kernel: LustreError: 6623:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001011d8dd000 x1629/t0 o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
May 11 17:04:47 Lustre-01-01 kernel: Lustre: 6624:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting
May 11 17:04:47 Lustre-01-01 kernel: Lustre: 6624:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to 0x000001011df5b000/2
May 11 17:04:47 Lustre-01-01 kernel: LustreError: 6624:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error (-16) req@000001011d877e00 x1643/t0 o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0

Nathaniel Rutman wrote:
> The MDT for some reason thinks the export is in use. I have no idea why, but try this:
> Stop all your clients, then on the MDT run:
> # cat /proc/fs/lustre/devices
> # ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/
> just to prove there are no clients. Now try mounting a client on the MDT.
--
Roger L. Smith
Senior Systems Administrator
Mississippi State University High Performance Computing Collaboratory
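As an aside, the binary /tmp/lustre-log.* dump referenced in the watchdog message above can be converted to readable text with lctl before it is shared; a sketch, using the file name from that log line (the output file name is arbitrary):

  # Convert the binary Lustre debug dump to ASCII text:
  lctl debug_file /tmp/lustre-log.1178921012.6617 /tmp/lustre-log.1178921012.6617.txt
  less /tmp/lustre-log.1178921012.6617.txt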
errno 19 = ENODEV -- did the server lustre-OST0003 successfully start?

Roger L. Smith wrote:
> Nathan,
> Thanks for the help. That solved one problem, but after booting all of the servers
> (no clients at all), I'm getting this in the syslog on the MDS:
> [...]
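One quick way to answer that on the OSS that is supposed to serve lustre1-OST0003 (172.19.0.14, according to the -19 errors above) is to use the same /proc interface used elsewhere in this thread; a sketch:

  # On the OSS at 172.19.0.14:
  cat /proc/fs/lustre/devices      # should list an "obdfilter lustre1-OST0003" device
  grep lustre /proc/mounts         # confirm the OST backing device is actually mounted
  dmesg | grep -i lustre           # look for mount or startup errors from the OST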
Hmmm, all of the systems are running, but something isn't right. /proc/fs/lustre/devices on the MDS shows 6 OSTs, but there are only 5 of them. Furthermore, the ordering of the OSTs is out of whack. Any ideas on how to correct this?

On the MDS:

Lustre-01-01$ cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 15
  1 UP mgc MGC172.19.0.10@tcp 45b90bd7-b51f-fcc9-9610-4640debeaf74 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre1-mdtlov lustre1-mdtlov_UUID 4
  4 UP mds lustre1-MDT0000 lustre1-MDT0000_UUID 6
  5 UP osc lustre1-OST0000-osc lustre1-mdtlov_UUID 5
  6 UP osc lustre1-OST0001-osc lustre1-mdtlov_UUID 5
  7 UP osc lustre1-OST0002-osc lustre1-mdtlov_UUID 5
  8 UP osc lustre1-OST0003-osc lustre1-mdtlov_UUID 5
  9 UP osc lustre1-OST0004-osc lustre1-mdtlov_UUID 5
 10 UP osc lustre1-OST0005-osc lustre1-mdtlov_UUID 5
 11 UP lov lustre1-clilov-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 4
 12 UP mdc lustre1-MDT0000-mdc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 13 UP osc lustre1-OST0000-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 14 UP osc lustre1-OST0001-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 15 UP osc lustre1-OST0002-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 16 UP osc lustre1-OST0003-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 17 UP osc lustre1-OST0004-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 18 UP osc lustre1-OST0005-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5

On the OSSes:

Lustre-01-02$ cat /proc/fs/lustre/devices
  0 UP mgc MGC172.19.0.10@tcp 788e3733-c847-fc92-bd8b-4f75891520b0 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0000 lustre1-OST0000_UUID 7

Lustre-01-03$ cat /proc/fs/lustre/devices
  0 UP mgc MGC172.19.0.10@tcp e324df13-4042-f0e0-a0b1-aefd20cbfcb2 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0001 lustre1-OST0001_UUID 7

Lustre-01-04$ cat /proc/fs/lustre/devices
  0 UP mgc MGC172.19.0.10@tcp 496fbabb-0c40-f069-8e35-fe4bf54ca2bf 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0002 lustre1-OST0002_UUID 7

Lustre-01-05$ cat /proc/fs/lustre/devices
  0 UP mgc MGC172.19.0.10@tcp 8b44355c-c5d7-f1f8-be11-dcc7a620e4d9 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0005 lustre1-OST0005_UUID 7

Lustre-01-06$ cat /proc/fs/lustre/devices
  0 UP mgc MGC172.19.0.10@tcp 59201360-8fe6-f7da-ba49-81a4dc56a705 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0004 lustre1-OST0004_UUID 7

Lustre-01-05 should be lustre1-OST0003 and Lustre-01-06 should be lustre1-OST0004. Instead, I don't have an 0003, I've got an 0005, and the MDS sees more OSTs than machines exist.

Nathaniel Rutman wrote:
> errno 19 = ENODEV -- did the server lustre-OST0003 successfully start?
I then tried removing a file that I had earlier >> created using the "touch" command, and it removes fine, however, if I >> then try to touch a new file, the command hangs and after a minute or >> so I get the following set of errors: (I''ll be happy to provide the >> lustre-log if that''s helpful). >> >> May 11 17:01:38 Lustre-01-01 kernel: Lustre: Client lustre1-client has >> started >> May 11 17:03:32 Lustre-01-01 kernel: Lustre: >> 0:0:(watchdog.c:130:lcw_cb()) Watch >> dog triggered for pid 6617: it was inactive for 100s >> May 11 17:03:32 Lustre-01-01 kernel: Lustre: >> 0:0:(linux-debug.c:166:libcfs_debug >> _dumpstack()) showing stack for process 6617 >> May 11 17:03:32 Lustre-01-01 kernel: ll_mdt_10 S 00000102179ac4a8 >> 0 661 >> 7 1 6618 6616 (L-TLB) >> May 11 17:03:32 Lustre-01-01 kernel: 000001021770b018 0000000000000046 >> 000000300 >> 0000030 000001021770b0b0 >> May 11 17:03:32 Lustre-01-01 kernel: 000001021770af98 >> 0000010215f28380 00 >> 0001020000007b 0000000000000000 >> May 11 17:03:32 Lustre-01-01 kernel: 0000000000000000 >> 000001000c001160 >> May 11 17:03:32 Lustre-01-01 kernel: Call >> Trace:<ffffffff80147456>{schedule_time >> out+246} <ffffffff801468b0>{process_timeout+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a776e>{:ptlrpc:ptlrpc_se >> t_wait+974} <ffffffffa06040d4>{:osc:osc_statfs_async+372} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffff80136120>{default_wake_func >> tion+0} <ffffffffa04734fa>{:lov:lov_statfs_async+1098} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_ex >> pired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03a67e0>{:ptlrpc:ptlrpc_ex >> pired_set+0} <ffffffffa03a1680>{:ptlrpc:ptlrpc_interrupted_set+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa047af8e>{:lov:lov_create+7 >> 006} <ffffffff80190bdf>{__getblk+31} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0525e1f>{:ldiskfs:ldiskfs_ >> get_inode_loc+351} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533486>{:ldiskfs:ldiskfs_ >> xattr_ibody_get+454} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_ >> xattr_get+120} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05aa4da>{:mds:mds_get_md+1 >> 06} <ffffffffa05ccd7e>{:mds:mds_create_objects+7214} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0533e18>{:ldiskfs:ldiskfs_ >> xattr_get+120} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa058cb6d>{:fsfilt_ldiskfs:f >> sfilt_ldiskfs_get_md+269} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d0b9d>{:mds:mds_finish_o >> pen+701} <ffffffffa05d3617>{:mds:mds_open+8359} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8014ca1b>{groups_alloc+59} >> <ffffffffa029d653>{:lvfs:entry_set_group_info+211} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa029cb71>{:lvfs:alloc_entry >> +241} <ffffffffa028570d>{:libcfs:libcfs_debug_vmsg2+1677} >> May 11 17:03:32 Lustre-01-01 kernel: >> <ffffffff801a9361>{dput+33} <fffffff >> fa03b1e60>{:ptlrpc:lustre_swab_mds_rec_create+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b35fc>{:mds:mds_reint_re >> c+460} <ffffffffa05d4fe4>{:mds:mds_open_unpack+820} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05d42d4>{:mds:mds_update_u >> npack+484} <ffffffffa05aa391>{:mds:mds_reint+817} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa037d887>{:ptlrpc:_ldlm_loc >> k_debug+1319} <ffffffffa05a86a1>{:mds:fixup_handle_for_resent_req+81} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05ae771>{:mds:mds_intent_p >> 
olicy+1089} <ffffffff801168e5>{do_gettimeofday+101} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa0382e63>{:ptlrpc:ldlm_lock >> _enqueue+243} <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039c972>{:ptlrpc:ldlm_hand >> le_enqueue+2722} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa039e100>{:ptlrpc:ldlm_serv >> er_blocking_ast+0} >> May 11 17:03:32 Lustre-01-01 kernel: >> <ffffffffa039dc70>{:ptlrpc:ldlm_server_completion_ast+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa05b291a>{:mds:mds_handle+1 >> 4938} <ffffffffa0306e4f>{:obdclass:class_handle2object+207} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03aec30>{:ptlrpc:lustre_sw >> ab_ptlrpc_body+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b3300>{:ptlrpc:lustre_sw >> ab_buf+208} <ffffffffa028247d>{:libcfs:libcfs_nid2str+189} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b9e27>{:ptlrpc:ptlrpc_se >> rver_handle_request+2951} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03bc4a8>{:ptlrpc:ptlrpc_ma >> in+2232} <ffffffff80136120>{default_wake_function+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffffa03b7590>{:ptlrpc:ptlrpc_re >> try_rqbds+0} <ffffffffa03b7590>{:ptlrpc:ptlrpc_retry_rqbds+0} >> May 11 17:03:32 Lustre-01-01 kernel: <ffffffff8011126f>{child_rip+8} <fff >> fffffa03bbbf0>{:ptlrpc:ptlrpc_main+0} >> May 11 17:03:32 Lustre-01-01 kernel: >> <ffffffff80111267>{child_rip+0} >> May 11 17:03:32 Lustre-01-01 kernel: LustreError: dumping log to >> /tmp/lustre-log >> .1178921012.6617 >> May 11 17:03:32 Lustre-01-01 kernel: LustreError: >> 7460:0:(client.c:950:ptlrpc_ex >> pire_one_request()) @@@ timeout (sent at 1178920912, 100s ago) >> req@000001011dea6 >> 400 x1548/t0 o101->lustre1-MDT0000_UUID@172.19.0.10@tcp:12 lens >> 512/864 ref 1 fl >> Rpc:P/0/0 rc 0/-22 >> May 11 17:03:32 Lustre-01-01 kernel: LustreError: >> 7460:0:(client.c:950:ptlrpc_ex >> pire_one_request()) Skipped 17 previous similar messages >> May 11 17:03:32 Lustre-01-01 kernel: Lustre: >> lustre1-MDT0000-mdc-000001000c22440 >> 0: Connection to service lustre1-MDT0000 via nid 0@lo was lost; in >> progress oper >> ations using this service will wait for recovery to complete. 
>> May 11 17:03:32 Lustre-01-01 kernel: Lustre: >> 6621:0:(ldlm_lib.c:497:target_handl >> e_reconnect()) lustre1-MDT0000: a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 >> reconnectin >> g >> May 11 17:03:32 Lustre-01-01 kernel: Lustre: >> 6621:0:(ldlm_lib.c:709:target_handl >> e_connect()) lustre1-MDT0000: refuse reconnection from >> a5a32bfb-05ba-d0cc-f4dd-6 >> afbc5d45224@0@lo to 0x000001011df5b000/2 >> May 11 17:03:32 Lustre-01-01 kernel: LustreError: >> 6621:0:(ldlm_lib.c:1363:target >> _send_reply_msg()) @@@ processing error (-16) req@000001011d8f9800 >> x1607/t0 o38- >> >a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens 304/200 >> ref 0 fl I >> nterpret:/0/0 rc -16/0 >> May 11 17:03:57 Lustre-01-01 kernel: Lustre: >> 6622:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: >> a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting >> May 11 17:03:57 Lustre-01-01 kernel: Lustre: >> 6622:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: >> refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to >> 0x000001011df5b000/2 >> May 11 17:03:57 Lustre-01-01 kernel: LustreError: >> 6622:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error >> (-16) req@000001011d877800 x1615/t0 >> o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens >> 304/200 ref 0 fl Interpret:/0/0 rc -16/0 >> May 11 17:04:22 Lustre-01-01 kernel: Lustre: >> 6623:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: >> a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting >> May 11 17:04:22 Lustre-01-01 kernel: Lustre: >> 6623:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: >> refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to >> 0x000001011df5b000/2 >> May 11 17:04:22 Lustre-01-01 kernel: LustreError: >> 6623:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error >> (-16) req@000001011d8dd000 x1629/t0 >> o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens >> 304/200 ref 0 fl Interpret:/0/0 rc -16/0 >> May 11 17:04:47 Lustre-01-01 kernel: Lustre: >> 6624:0:(ldlm_lib.c:497:target_handle_reconnect()) lustre1-MDT0000: >> a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 reconnecting >> May 11 17:04:47 Lustre-01-01 kernel: Lustre: >> 6624:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: >> refuse reconnection from a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@0@lo to >> 0x000001011df5b000/2 >> May 11 17:04:47 Lustre-01-01 kernel: LustreError: >> 6624:0:(ldlm_lib.c:1363:target_send_reply_msg()) @@@ processing error >> (-16) req@000001011d877e00 x1643/t0 >> o38->a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224@172.19.0.10@tcp:-1 lens >> 304/200 ref 0 fl Interpret:/0/0 rc -16/0 >> >> >> Nathaniel Rutman wrote: >>> May 9 15:13:46 Lustre-01-01 kernel: Lustre: >>> 7256:0:(ldlm_lib.c:709:target_handle_connect()) lustre1-MDT0000: >>> refuse reconnection from d09242ed-b4f7-806f-bc12-912f7cfac1a9@0@lo to >>> 0x000001011f228000/2 >>> The MDT for some reason thinks the export is in use. I have no idea >>> why, but try this. >>> >>> stop all your clients. >>> on the MDT: >>> # cat /proc/fs/lustre/devices >>> # ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/ >>> just to prove there are no clients. >>> Now try mounting a client on the MDT >>> >>> >>> Roger L. Smith wrote: >>>> >>>> I''m attempting my first-ever Lustre install on a small test >>>> cluster. I have one MDS and 5 OSS''s, all with identical hardware. >>>> They are all on the same network segment, and have a single ethernet >>>> interface. 
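The check Nathan describes above can be run as a short sequence on the MDS. A minimal sketch, assuming the MGS NID 172.19.0.10@tcp and fsname "lustre1" from the logs; the client mount point /mnt/lustre1 is a placeholder, not something given in the thread:

    # On the MDS (Lustre-01-01), with no clients mounted anywhere:
    cat /proc/fs/lustre/devices
    ls /proc/fs/lustre/mds/lustre1-MDT0000/exports/   # should be empty if no clients are connected

    # Then try mounting the filesystem locally as a client:
    mkdir -p /mnt/lustre1
    mount -t lustre 172.19.0.10@tcp:/lustre1 /mnt/lustre1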
When an OST first starts, it registers with the MGS and is assigned an index number. The initial startup order determines the OST indices. I'm guessing you started an OST that was assigned number 3, but then lost/reformatted/something bad happened to that disk, and it re-registered, getting a new index (OST0005). There are a few things you can do:

1. You can tell Lustre to forever ignore the "missing" OST0003 by doing this on the MGS:
     > lctl conf_param lustre1-OST0003.osc.active=0
2. If you don't care about your data, you can just reformat everybody (including the MDT) and start over. This is the only way you'll be able to get your index numbers back to what you want them to be - you cannot change an index number once assigned, because Lustre expects to find certain file objects on certain OSTs. You can use the --index flag to mkfs.lustre to force a particular index if you want.
3. I lied. If you are sure no files are on OST0005 (use 'lfs find'), you can reformat just that disk, and use "tunefs.lustre --writeconf" on the MDT to force regeneration of the configuration files. (See the docs.) A rough command sketch follows the quoted listings below.

Roger L. Smith wrote:
> Hmmm, all of the systems are running, but something isn't right. /proc/fs/lustre/devices on the MDS shows 6 OST's, but there are only 5 of them. Furthermore, the ordering on the OST's is out of whack.
>
> Any ideas on how to correct this?
>
> On MDS:
>
> Lustre-01-01$ cat /proc/fs/lustre/devices
> 0 UP mgs MGS MGS 15
> 1 UP mgc MGC172.19.0.10@tcp 45b90bd7-b51f-fcc9-9610-4640debeaf74 5
> 2 UP mdt MDS MDS_uuid 3
> 3 UP lov lustre1-mdtlov lustre1-mdtlov_UUID 4
> 4 UP mds lustre1-MDT0000 lustre1-MDT0000_UUID 6
> 5 UP osc lustre1-OST0000-osc lustre1-mdtlov_UUID 5
> 6 UP osc lustre1-OST0001-osc lustre1-mdtlov_UUID 5
> 7 UP osc lustre1-OST0002-osc lustre1-mdtlov_UUID 5
> 8 UP osc lustre1-OST0003-osc lustre1-mdtlov_UUID 5
> 9 UP osc lustre1-OST0004-osc lustre1-mdtlov_UUID 5
> 10 UP osc lustre1-OST0005-osc lustre1-mdtlov_UUID 5
> 11 UP lov lustre1-clilov-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 4
> 12 UP mdc lustre1-MDT0000-mdc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 13 UP osc lustre1-OST0000-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 14 UP osc lustre1-OST0001-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 15 UP osc lustre1-OST0002-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 16 UP osc lustre1-OST0003-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 17 UP osc lustre1-OST0004-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
> 18 UP osc lustre1-OST0005-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
>
> On OSS's:
>
> Lustre-01-02$ cat /proc/fs/lustre/devices
> 0 UP mgc MGC172.19.0.10@tcp 788e3733-c847-fc92-bd8b-4f75891520b0 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter lustre1-OST0000 lustre1-OST0000_UUID 7
>
> Lustre-01-03$ cat /proc/fs/lustre/devices
> 0 UP mgc MGC172.19.0.10@tcp e324df13-4042-f0e0-a0b1-aefd20cbfcb2 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter lustre1-OST0001 lustre1-OST0001_UUID 7
>
> Lustre-01-04$ cat /proc/fs/lustre/devices
> 0 UP mgc MGC172.19.0.10@tcp 496fbabb-0c40-f069-8e35-fe4bf54ca2bf 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter lustre1-OST0002 lustre1-OST0002_UUID 7
>
> Lustre-01-05$ cat /proc/fs/lustre/devices
> 0 UP mgc MGC172.19.0.10@tcp 8b44355c-c5d7-f1f8-be11-dcc7a620e4d9 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter lustre1-OST0005 lustre1-OST0005_UUID 7
>
> Lustre-01-06$ cat /proc/fs/lustre/devices
> 0 UP mgc MGC172.19.0.10@tcp 59201360-8fe6-f7da-ba49-81a4dc56a705 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter lustre1-OST0004 lustre1-OST0004_UUID 7
>
> Lustre-01-05 should be lustre1-OST0003 and Lustre-01-06 should be lustre1-OST0004. Instead, I don't have an 0003, and I've got an 0005, and the MDS sees more machines than exist.
>
> Nathaniel Rutman wrote:
>> errno 19 = ENODEV -- did the server lustre-OST0003 successfully start?
>>
>> Roger L. Smith wrote:
>>> Nathan,
>>>
>>> Thanks for the help. That solved one problem, but after booting all of the servers (no clients at all), I'm getting this in the syslog on the MDS:
>>>
>>> May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 req@0000010218691800 x1450/t0 o8->lustre1-OST0003_UUID@172.19.0.14@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
>>> May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) Skipped 24 previous similar messages
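A minimal sketch of option 3 above, combining it with the --index hint from option 2 and using the names from this thread. The block devices are placeholders (set the variables to the real disks) and the exact syntax should be checked against the 1.6 docs before running anything:

    # 1. From a client mount, confirm no files have objects on the misregistered OST:
    lfs find --obd lustre1-OST0005_UUID /mnt/lustre1

    # 2. On that OSS only, reformat the disk, forcing the index you actually want:
    OSTDEV=/dev/sdb   # placeholder - the real OST block device
    mkfs.lustre --reformat --ost --fsname=lustre1 --index=3 --mgsnode=172.19.0.10@tcp $OSTDEV

    # 3. With the filesystem stopped, regenerate the configuration logs from the MDT:
    MDTDEV=/dev/sda3  # placeholder - the real MDT block device
    tunefs.lustre --writeconf $MDTDEV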
Since I didn't have any data on the filesystem yet, I just reformatted everything, and that seems to have fixed it.

During one of the upgrades (1.6b7 to 1.6.0, or 1.6.0 to 1.6.0.1), one of the nodes wouldn't boot after I installed the new kernel RPMs. I had to rebuild the OS on the node, and I reformatted the Lustre disk in that node as well. That's probably what caused this problem.

Thanks for your help!
--
Roger L. Smith
Senior Systems Administrator
Mississippi State University High Performance Computing Collaboratory
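For what it's worth, the index drift described earlier can be avoided when a node is rebuilt by pinning each target's index at format time, using the --index flag Nathan mentions. A rough sketch under the layout shown in the device listings (combined MGS/MDT on Lustre-01-01); the device variables are placeholders, not values from the thread:

    # On the MDS (Lustre-01-01), combined MGS + MDT:
    MDTDEV=/dev/sda3  # placeholder
    mkfs.lustre --reformat --fsname=lustre1 --mgs --mdt $MDTDEV

    # On each OSS, pin the OST index explicitly, e.g. on Lustre-01-05 (intended to be OST0003):
    OSTDEV=/dev/sdb   # placeholder
    mkfs.lustre --reformat --fsname=lustre1 --ost --index=3 --mgsnode=172.19.0.10@tcp $OSTDEV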