Somsak Sriprayoonsakul
2006-Nov-07 05:29 UTC
[Lustre-discuss] Can't mount lustre on some nodes
Dear List,
I'm trying to set up a Lustre 1.6b5 cluster where every node except
the frontend serves an OST, the frontend serves the MGS+MDT, and every
node (including the frontend) mounts and uses Lustre. Somehow there is a
weird problem where some nodes can't mount Lustre but others can.
My configuration:
OS: Rocks 4.2.1 Cluster (CentOS 4.4) using the stock Lustre
2.6.9-42.EL_lustre.1.5.95smp kernel. The frontend has two IPs (public +
private) and every compute node uses only a private IP.
Lustre: 1.6b5.
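For reference, the formatting and mounting follow the standard 1.6
mountconf steps, roughly as in the sketch below (the device names and
mount points are only placeholders, not the exact ones used here):

    # frontend: combined MGS + MDT
    mkfs.lustre --fsname=lustre --mdt --mgs /dev/sda3
    mount -t lustre /dev/sda3 /mnt/mdt

    # each compute node: one OST, registered with the frontend's private NID
    mkfs.lustre --fsname=lustre --ost --mgsnode=10.1.1.1@tcp /dev/sdb1
    mount -t lustre /dev/sdb1 /mnt/ost

    # every node (including the frontend): mount the client
    mount -t lustre 10.1.1.1@tcp:/lustre /mnt/lustre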
Here are the logs from the frontend (MGS+MDT) and a failed client node:
Failed client node:
Lustre: mount data:
Lustre: profile: lustre-client
Lustre: device: 10.1.1.1@tcp:/lustre
Lustre: flags: 2
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -107
LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) Skipped 3 previous similar messages
LustreError: 22040:0:(mgc_request.c:964:mgc_process_log()) Can't get cfg lock: -107
LustreError: 3099:0:(mgc_request.c:493:mgc_blocking_ast()) original grant failed, won't requeue
LustreError: 22040:0:(mgc_request.c:1014:mgc_process_log()) MGC10.1.1.1@tcp: the configuration 'lustre-client' could not be read (-107) from the MGS.
LustreError: MGC10.1.1.1@tcp: The configuration 'lustre-client' could not be read from the MGS (-107). This may be the result of communication errors between this node and the MGS, or the MGS may not be running.
Lustre: 0 UP mgc MGC10.1.1.1@tcp f19e61f7-623f-55a2-6332-ea987600d10d 5
Lustre: 1 UP ost OSS OSS_uuid 3
Lustre: 2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 9
LustreError: 22040:0:(llite_lib.c:909:ll_fill_super()) Unable to process log: -107
Lustre: client 0000010118688000 umount complete
LustreError: 22040:0:(obd_mount.c:1857:lustre_fill_super()) Unable to mount (-107)
Frontend:
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) Skipped 1 previous similar message
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) @@@ processing error (-107)
LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) Skipped 3 previous similar messages
I think I followed the guide at
https://mail.clusterfs.com/wikis/lustre/MountConf quite strictly. I
suspect the problem is caused by IP confusion on the frontend, which has
both a public and a private interface. But then some compute nodes do
successfully mount the Lustre file system from the frontend.
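If it really is IP confusion, one thing that might help is pinning LNET
on the frontend to the private interface via /etc/modprobe.conf; a
minimal sketch, assuming eth1 is the private interface (the interface
name is only an example):

    # /etc/modprobe.conf on the frontend
    # restrict LNET to the private interface, so the MGS is advertised
    # only as 10.1.1.1@tcp and never via the public IP
    options lnet networks=tcp0(eth1)

The Lustre modules have to be unloaded and reloaded (or the node
rebooted) before the new NID list takes effect.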
Regards,
--
-----------------------------------------------------------------------------------
Somsak Sriprayoonsakul
Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
somsak_sr@thaigrid.or.th
-----------------------------------------------------------------------------------
Use "lctl list_nids" and "lctl ping <remote_nid>" on the clients and servers to help see where the problem is. Somsak Sriprayoonsakul wrote:> Dear List, > > I''m trying to set up a Lustre 1.6b5 cluster where every nodes > except frontend serve OST, frontend serve MGS+MDT, and every nodes > (including frontend) mount and use Lustre. Somehow there''s a weird > problem where some nodes can''t mount lustre but some nodes can. > > My configuration: > > OS: Rocks 4.2.1 Cluster (CentOS 4.4) using stock lustre > 2.6.9-42.EL_lustre.1.5.95smp kernel. Frontend has 2 IP (real + > private) and ever compute nodes using private IP. > Lustre: 1.6b5. > Here''s log from frontend (MGS+MDT) and the failed client node > > Failed client node: > > Lustre: mount data: > Lustre: profile: lustre-client > Lustre: device: 10.1.1.1@tcp:/lustre > Lustre: flags: 2 > LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) @@@ type == > PTL_RPC_MSG_ERR, err == -107 > LustreError: 22040:0:(client.c:579:ptlrpc_check_status()) Skipped 3 > previous similar messages > LustreError: 22040:0:(mgc_request.c:964:mgc_process_log()) Can''t get > cfg lock: -107 > LustreError: 3099:0:(mgc_request.c:493:mgc_blocking_ast()) original > grant failed, won''t requeue > LustreError: 22040:0:(mgc_request.c:1014:mgc_process_log()) > MGC10.1.1.1@tcp: the configuration ''lustre-client'' could not be read > (-107) from the MGS. > LustreError: MGC10.1.1.1@tcp: The configuration ''lustre-client'' could > not be read from the MGS (-107). This may be the result of > communication errors between this node and the MGS, or the MGS may not > be running. > Lustre: 0 UP mgc MGC10.1.1.1@tcp f19e61f7-623f-55a2-6332-ea987600d10d 5 > Lustre: 1 UP ost OSS OSS_uuid 3 > Lustre: 2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 9 > LustreError: 22040:0:(llite_lib.c:909:ll_fill_super()) Unable to > process log: -107 > Lustre: client 0000010118688000 umount complete > LustreError: 22040:0:(obd_mount.c:1857:lustre_fill_super()) Unable to > mount (-107) > > > Frontend: > LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) lustre_mgs: > operation 101 on unconnected MGS > LustreError: 10490:0:(mgs_handler.c:468:mgs_handle()) Skipped 1 > previous similar message > LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) @@@ > processing error (-107) > LustreError: 10490:0:(ldlm_lib.c:1317:target_send_reply_msg()) Skipped > 3 previous similar messages > > I think I strictly follow the guide at > https://mail.clusterfs.com/wikis/lustre/MountConf. I suppose that the > problem occurred because IP confusion on Frontend. But some compute > nodes successfully mount lustre file system frontend. > > Regards, >
Somsak Sriprayoonsakul
2006-Nov-09 01:49 UTC
[Lustre-discuss] Can't mount lustre on some nodes
Nathaniel Rutman wrote:
> Use "lctl list_nids" and "lctl ping <remote_nid>" on the clients and
> servers to help see where the problem is.

The problem has been solved in a very weird way. I found that when I
unmounted the OST temporarily and remounted it, the client mount just
came back to work. Now every node can see the file system without
problems. lctl ping seems to work fine on every node (I didn't test
every combination, but the few tests I ran all succeeded).
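Concretely, on the affected OSS node it was roughly the following (the
OST device and mount point are placeholders for whatever that node
really uses):

    # on the node whose OST was unmounted and remounted
    umount /mnt/ost
    mount -t lustre /dev/sdb1 /mnt/ost

    # afterwards, the client mount on the previously failing nodes works
    mount -t lustre 10.1.1.1@tcp:/lustre /mnt/lustre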