Scott Atchley
2007-Apr-18 08:41 UTC
[Lustre-discuss] Multiple clients creating/deleting files in the same directory
Hi all,

We are testing a small cluster with 1 MDS, 2 OSS, and 5 clients. When all clients are writing to independent directories, all is well. When one client tries to list the contents of a directory that another client is creating/deleting files in, Lustre hangs and /var/log/messages shows a lot of "printk suppressed" messages.

Is this normal behavior, or can we do something to minimize it (besides not having two clients work in the same directory)?

Scott
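To reproduce the workload described above in a controlled way, here is a minimal Python sketch. All function names, the file-name pattern and the round counts are hypothetical, not taken from the report. The idea is to call `churn()` on one client and `list_repeatedly()` on another, both pointed at the same directory on the Lustre mount. Run on a single machine, it only exercises the logic. The multi-client behavior needs two separate client nodes.

```python
import os


def churn(dirpath: str, rounds: int) -> int:
    """Create and then delete one file per round in dirpath; return rounds completed."""
    for i in range(rounds):
        # Hypothetical naming scheme: unique per process and per round.
        path = os.path.join(dirpath, f"churn-{os.getpid()}-{i}")
        with open(path, "w") as f:
            f.write("x")
        os.unlink(path)
    return rounds


def list_repeatedly(dirpath: str, rounds: int) -> int:
    """List dirpath repeatedly; return how many listings completed."""
    done = 0
    for _ in range(rounds):
        # os.listdir() issues readdir(), the operation reported to hang.
        os.listdir(dirpath)
        done += 1
    return done
```

If a listing blocks, the lister simply stops making progress, so comparing the returned counts (or watching for a call that never returns) is enough to see the symptom.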
Scott Atchley
2007-Apr-18 09:44 UTC
This may or may not be related, but four of the clients can list a directory while the fifth client cannot. On the fifth client, dmesg shows:

Lustre: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection restored to service mds1 using nid 192.168.1.250@tcp.
Lustre: Skipped 1 previous similar message
LustreError: 9634:0:(mdc_request.c:684:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.
LustreError: 23673:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -107 req@0000010006925800 x201696/t0 o400->mds1_UUID@nas-0-0-m_UUID:12 lens 64/64 ref 1 fl Rpc:RN/0/0 rc 0/-107
LustreError: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection to service mds1 via nid 192.168.1.250@tcp was lost; in progress operations using this service will wait for recovery to complete.
LustreError: This client was evicted by mds1; in progress operations using this service will fail.
LustreError: 9645:0:(client.c:548:ptlrpc_check_reply()) @@@ ABORTED: req@00000100cfee7e00 x201693/t0 o37->mds1_UUID@nas-0-0-m_UUID:12 lens 240/240 ref 1 fl Rpc:E/0/0 rc 0/0
LustreError: 9645:0:(dir.c:329:ll_readdir()) error reading dir 480862/408283751 page 0: rc -5
LustreError: 9645:0:(dir.c:329:ll_readdir()) Skipped 89 previous similar messages
LustreError: 9645:0:(client.c:511:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@00000100cfee7e00 x201700/t0 o37->mds1_UUID@nas-0-0-m_UUID:12 lens 240/240 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 9645:0:(client.c:511:ptlrpc_import_delay_req()) Skipped 88 previous similar messages
Lustre: MDC_nas-0-0.local_mds1_MNT_client-0000010037e37800: Connection restored to service mds1 using nid 192.168.1.250@tcp.
LustreError: 9645:0:(mdc_request.c:684:mdc_close()) Unexpected: can't find mdc_open_data, but the close succeeded. Please tell CFS.

I like the "Please tell CFS." note. :-)

Any suggestions?

Scott
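The negative numbers in these messages are Linux errno values. `err == -107` is ENOTCONN, consistent with the eviction reported a few lines later, and `rc -5` from `ll_readdir()` is EIO. A small sketch that pulls such codes out of a dmesg dump and names them follows. The regular expression only covers the `err == -N`, `rc -N` and `rc 0/-N` forms seen in this log. It is an assumption for illustration, not a complete Lustre log grammar.

```python
import errno
import re

# Matches the three forms of negative return codes that appear in the log above:
# "err == -107", "rc -5", and the "rc 0/-107" status pair.
_CODE_RE = re.compile(r"(?:err == |rc |rc \d+/)-(\d+)")


def decode_errnos(log_text: str) -> list[tuple[int, str]]:
    """Return (errno, symbolic name) for each negative code found, in order."""
    found = []
    for m in _CODE_RE.finditer(log_text):
        code = int(m.group(1))
        # errno.errorcode maps numbers to names on the machine running this script.
        found.append((code, errno.errorcode.get(code, "unknown")))
    return found
```

Run over the full dmesg output above, it should report 107 twice (both from the `ptlrpc_check_status()` line) and 5 once (from `ll_readdir()`). The `rc 0/0` status pairs are skipped because they carry no error.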
Scott Atchley
2007-Apr-18 11:15 UTC
Interestingly, although we cannot list the directory, we can list a file if we supply its filename. If I iterate through all 7,000 filenames, I can list each one individually. Even after that, I still cannot list the directory.

Scott
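The asymmetry described above, where a full directory read fails while lookups of known names succeed, can be checked mechanically. The following is a sketch, assuming the expected filenames are available from some other source (for example, a record kept by the writer). The `probe_directory` function and its `names` argument are hypothetical. It attempts one directory listing and then stats each name individually.

```python
import os


def probe_directory(dirpath: str, names: list[str]) -> dict:
    """Compare a full listing of dirpath with per-name lookups.

    Returns a dict with:
      listing_ok:    whether os.listdir() succeeded
      listing_error: errno of the listdir failure, or None
      stat_ok / stat_failed: counts of names that could / could not be stat()ed
    """
    result = {"listing_ok": True, "listing_error": None, "stat_ok": 0, "stat_failed": 0}
    try:
        os.listdir(dirpath)  # exercises readdir
    except OSError as e:
        result["listing_ok"] = False
        # Could be EIO if the client-side readdir fails with rc -5 as in the log.
        result["listing_error"] = e.errno
    for name in names:
        try:
            os.stat(os.path.join(dirpath, name))  # exercises a single-name lookup
            result["stat_ok"] += 1
        except OSError:
            result["stat_failed"] += 1
    return result
```

Going by the report, the affected client would be expected to show `listing_ok` as False while `stat_ok` equals the number of names. On a healthy client, both checks succeed.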
Scott Atchley
2007-Apr-18 17:57 UTC
Solved: iptables was blocking the Lustre port. Grrr.

Scott

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
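Because the culprit was a firewall rule, a quick reachability test between nodes can rule this out early. LNET's TCP transport listens on port 988 by default. The sketch below only checks whether a TCP connection to a given host and port can be opened; it does not speak any Lustre protocol. The address in the example comment is the MDS NID from the log above. The thread does not say which direction was blocked (client to MDS, or MDS back to the client), so it is worth testing from both ends.

```python
import socket

LNET_TCP_PORT = 988  # default LNET acceptor port for the TCP (socklnd) transport


def tcp_port_open(host: str, port: int = LNET_TCP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and unreachable hosts. An iptables DROP rule typically
        # shows up as a timeout, a REJECT rule as a refused connection.
        return False

# Example (run from a client): tcp_port_open("192.168.1.250")
```

A False result does not by itself prove that a firewall is at fault. However, combined with the rules shown by `iptables -L -n` on each node, it narrows the search quickly.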