Dear all,

Recently we found a problem on our OSS nodes: some service threads may hang when the server is under heavy I/O load. When this happens, some clients are evicted or refused by the OSTs, with error messages like the following:

Server side:

May 30 11:06:31 boss07 kernel: Lustre: Service thread pid 8011 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
May 30 11:06:31 boss07 kernel: Lustre: Skipped 1 previous similar message
May 30 11:06:31 boss07 kernel: Pid: 8011, comm: ll_ost_71
May 30 11:06:31 boss07 kernel:
May 30 11:06:31 boss07 kernel: Call Trace:
May 30 11:06:31 boss07 kernel: [<ffffffff886f5d0e>] start_this_handle+0x301/0x3cb [jbd2]
May 30 11:06:31 boss07 kernel: [<ffffffff800a09ca>] autoremove_wake_function+0x0/0x2e
May 30 11:06:31 boss07 kernel: [<ffffffff886f5e83>] jbd2_journal_start+0xab/0xdf [jbd2]
May 30 11:06:31 boss07 kernel: [<ffffffff888ce9b2>] fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]
May 30 11:06:31 boss07 kernel: [<ffffffff88920551>] filter_version_get_check+0x91/0x2a0 [obdfilter]
May 30 11:06:31 boss07 kernel: [<ffffffff80036cf4>] __lookup_hash+0x61/0x12f
May 30 11:06:31 boss07 kernel: [<ffffffff8893108d>] filter_setattr_internal+0x90d/0x1de0 [obdfilter]
May 30 11:06:31 boss07 kernel: [<ffffffff800e859b>] lookup_one_len+0x53/0x61
May 30 11:06:31 boss07 kernel: [<ffffffff88925452>] filter_fid2dentry+0x512/0x740 [obdfilter]
May 30 11:06:31 boss07 kernel: [<ffffffff88924e27>] filter_fmd_get+0x2b7/0x320 [obdfilter]
May 30 11:06:31 boss07 kernel: [<ffffffff8003027b>] __up_write+0x27/0xf2
May 30 11:06:31 boss07 kernel: [<ffffffff88932721>] filter_setattr+0x1c1/0x3b0 [obdfilter]
May 30 11:06:31 boss07 kernel: [<ffffffff8882677a>] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff8881e658>] ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff88822b05>] lustre_msg_get_version+0x35/0xf0 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff888b0abb>] ost_handle+0x25db/0x55b0 [ost]
May 30 11:06:31 boss07 kernel: [<ffffffff80150d56>] __next_cpu+0x19/0x28
May 30 11:06:31 boss07 kernel: [<ffffffff800767ae>] smp_send_reschedule+0x4e/0x53
May 30 11:06:31 boss07 kernel: [<ffffffff8883215a>] ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff888328a8>] ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff8008b3bd>] __wake_up_common+0x3e/0x68
May 30 11:06:31 boss07 kernel: [<ffffffff88833817>] ptlrpc_main+0xf37/0x10f0 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
May 30 11:06:31 boss07 kernel: [<ffffffff888328e0>] ptlrpc_main+0x0/0x10f0 [ptlrpc]
May 30 11:06:31 boss07 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
May 30 11:06:31 boss07 kernel:
May 30 11:06:31 boss07 kernel: LustreError: dumping log to /tmp/lustre-log.1338347191.8011

Client side:

May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.123@tcp. The ost_connect operation failed with -16

While this error is present, commands such as "ls", "df", "vi" and "touch" hang, which prevents us from doing anything in the file system. I think an ost_connect failure should report an error to users instead of leaving interactive commands stuck.

Could someone give us some advice or suggestions to solve this problem?

Thank you very much in advance.
Best Regards
Qiulan Huang
2012-05-30
===================================================================
Computing Center, the Institute of High Energy Physics, China
Huang, Qiulan              Tel: (+86) 10 8823 6010-105
P.O. Box 918-7             Fax: (+86) 10 8823 6839
Beijing 100049 P.R. China  Email: huangql at ihep.ac.cn
===================================================================
Adrian Ulrich
2012-May-30 05:51 UTC
[Lustre-discuss] [wc-discuss] The ost_connect operation failed with -16
Hello,

> May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.123@tcp. The ost_connect operation failed with -16

Error -16 stands for -EBUSY.

> While this error is present, commands such as "ls", "df", "vi" and "touch" hang, which prevents us from doing anything in the file system.

That's to be expected in such a situation. I suppose that 'lfs check servers' returned 'temporarily unavailable' for some OSTs?

> I think an ost_connect failure should report an error to users instead of leaving interactive commands stuck.

No: users shouldn't get an error in such a situation. The filesystem will just hang until the situation has recovered (i.e. the client was able to re-connect to the OST).

Regards,
Adrian
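P.S. For quick triage from a client, something along these lines usually tells you which targets are unreachable (a minimal sketch; the mount point /lustre is an example for your site):

  # Per-target health check; unreachable OSTs report "temporarily unavailable"
  lfs check servers

  # Per-OST space report; a target that is down is typically shown as inactive
  lfs df -h /lustre

  # Low-level device list and status as seen by this client
  lctl dl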
HUANG Qiulan
2012-May-30 10:47 UTC
[Lustre-discuss] [SPAM] Re: [wc-discuss] The ost_connect operation failed with -16
Hi Adrian,

> -----Original Message-----
> From: "Adrian Ulrich" <adrian at blinkenlights.ch>
> Sent: Wednesday, May 30, 2012
> To: huangql <huangql at ihep.ac.cn>
> Cc: lustre-discuss <lustre-discuss at lists.lustre.org>, wc-discuss <wc-discuss at whamcloud.com>
> Subject: [SPAM] Re: [wc-discuss] The ost_connect operation failed with -16
>
> Hello,
>
> > May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.123@tcp. The ost_connect operation failed with -16
>
> Error -16 stands for -EBUSY.
>
> > While this error is present, commands such as "ls", "df", "vi" and "touch" hang, which prevents us from doing anything in the file system.
>
> That's to be expected in such a situation. I suppose that 'lfs check servers' returned 'temporarily unavailable' for some OSTs?

Yes, we do get the 'temporarily unavailable' message on the client, and it can take minutes or even hours before it reconnects to the OST. During that time users cannot do anything interactive in the file system, which they find unacceptable. Do you have any other suggestions for solving this problem?

> > I think an ost_connect failure should report an error to users instead of leaving interactive commands stuck.
>
> No: users shouldn't get an error in such a situation. The filesystem will just hang until the situation has recovered (i.e. the client was able to re-connect to the OST).
>
> Regards,
> Adrian
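P.S. In the meantime we just watch the logs for the reconnection, roughly like this (a sketch; the syslog path assumes a RHEL-style setup, and the parameter name assumes Lustre 1.8-era naming):

  # On the client: watch for Lustre reconnection/eviction messages
  tail -f /var/log/messages | grep --line-buffered Lustre

  # On the OSS: check whether targets are in recovery and how far along they are
  lctl get_param obdfilter.*.recovery_status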
Liang Zhen
2012-May-30 11:04 UTC
[Lustre-discuss] [wc-discuss] The ost_connect operation failed with -16
Hi, I think you might hit this: http://jira.whamcloud.com/browse/LU-952 ; you can find the patch in that ticket.

Regards
Liang

On May 30, 2012, at 11:21 AM, huangql wrote:

> Dear all,
>
> Recently we found a problem on our OSS nodes: some service threads may hang when the server is under heavy I/O load. When this happens, some clients are evicted or refused by the OSTs, with error messages like the following:
>
> [server-side stack trace and client-side error snipped; quoted in full above]
>
> Could someone give us some advice or suggestions to solve this problem?
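P.S. To check whether a given build already carries the fix, compare the running version against the ticket's fix versions (a sketch; package names vary by distribution):

  # Lustre version as reported by the loaded modules
  lctl get_param version

  # Installed Lustre packages on RPM-based systems
  rpm -qa | grep -i lustre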
Niu
2012-May-31 09:23 UTC
[Lustre-discuss] [SPAM] Re: [wc-discuss] The ost_connect operation failed with -16
Hi, Qiulan

LU-952 is about a deadlock issue. Was quota enabled? You could try to disable quota and see if the problem goes away (see the command sketch at the end of this message).

Thanks
- Niu

> Hi Zhen,
>
> Many thanks for your prompt reply. I have disabled the writethrough_cache and read_cache to see whether that helps, but threads still hang under heavy I/O:
>
> May 31 08:38:43 boss33 kernel: LustreError: dumping log to /tmp/lustre-log.1338424722.5303
> May 31 08:38:43 boss33 kernel: Pid: 5262, comm: ll_ost_io_48
> May 31 08:38:43 boss33 kernel:
> May 31 08:38:43 boss33 kernel: Call Trace:
> May 31 08:38:43 boss33 kernel: [<ffffffff887bd451>] ksocknal_queue_tx_locked+0x451/0x490 [ksocklnd]
> May 31 08:38:43 boss33 kernel: [<ffffffff800646ac>] __down_read+0x7a/0x92
> May 31 08:38:43 boss33 kernel: [<ffffffff889843df>] ldiskfs_get_blocks+0x5f/0x2e0 [ldiskfs]
> May 31 08:38:43 boss33 kernel: [<ffffffff889851a0>] ldiskfs_get_block+0xc0/0x120 [ldiskfs]
> May 31 08:38:43 boss33 kernel: [<ffffffff88981f60>] ldiskfs_bmap+0x0/0xf0 [ldiskfs]
> May 31 08:38:43 boss33 kernel: [<ffffffff80033615>] generic_block_bmap+0x37/0x41
> May 31 08:38:43 boss33 kernel: [<ffffffff800341ad>] mapping_tagged+0x3c/0x47
> May 31 08:38:43 boss33 kernel: [<ffffffff88981f88>] ldiskfs_bmap+0x28/0xf0 [ldiskfs]
> May 31 08:38:43 boss33 kernel: [<ffffffff88981f60>] ldiskfs_bmap+0x0/0xf0 [ldiskfs]
> May 31 08:38:43 boss33 kernel: [<ffffffff88a4a288>] filter_commitrw_write+0x398/0x2be0 [obdfilter]
> May 31 08:38:43 boss33 kernel: [<ffffffff889e6e5c>] ost_checksum_bulk+0x30c/0x5b0 [ost]
> May 31 08:38:43 boss33 kernel: [<ffffffff889e6c38>] ost_checksum_bulk+0xe8/0x5b0 [ost]
> May 31 08:38:43 boss33 kernel: [<ffffffff889edcf9>] ost_brw_write+0x1c99/0x2480 [ost]
> May 31 08:38:43 boss33 kernel: [<ffffffff8872e658>] ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff886f98b0>] target_committed_to_req+0x40/0x120 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff8008cf93>] default_wake_function+0x0/0xe
> May 31 08:38:43 boss33 kernel: [<ffffffff88732bc8>] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff889f108e>] ost_handle+0x2bae/0x55b0 [ost]
> May 31 08:38:43 boss33 kernel: [<ffffffff80150d56>] __next_cpu+0x19/0x28
> May 31 08:38:43 boss33 kernel: [<ffffffff800767ae>] smp_send_reschedule+0x4e/0x53
> May 31 08:38:43 boss33 kernel: [<ffffffff8874215a>] ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff887428a8>] ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff8008b3bd>] __wake_up_common+0x3e/0x68
> May 31 08:38:43 boss33 kernel: [<ffffffff88743817>] ptlrpc_main+0xf37/0x10f0 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
> May 31 08:38:43 boss33 kernel: [<ffffffff887428e0>] ptlrpc_main+0x0/0x10f0 [ptlrpc]
> May 31 08:38:43 boss33 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
> May 31 08:38:43 boss33 kernel:
> May 31 08:38:43 boss33 kernel: LustreError: dumping log to /tmp/lustre-log.1338424722.5262
> May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 48s due to heavy IO load
> May 31 08:38:43 boss33 kernel: Lustre: Skipped 1 previous similar message
> May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow brw_start 48s due to heavy IO load
> May 31 08:38:43 boss33 kernel: Lustre: Skipped 1 previous similar message
> May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 187s due to heavy IO load
> May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow brw_start 187s due to heavy IO load
>
> I have not applied the patch yet because all the servers are online. Do you know how to deal with this without affecting users?
>
> Thank you very much.
>
> Cheers,
> Qiulan
> ===================================================================
> Computing Center, the Institute of High Energy Physics, China
> Huang, Qiulan              Tel: (+86) 10 8823 6010-105
> P.O. Box 918-7             Fax: (+86) 10 8823 6839
> Beijing 100049 P.R. China  Email: huangql at ihep.ac.cn
> ===================================================================
> 2012-05-31
>
> ------------------------------------------------------------------------
> huangql
> ------------------------------------------------------------------------
> *From:* Liang Zhen
> *Sent:* 2012-05-30 19:12:15
> *To:* huangql
> *Cc:* lustre-discuss; wc-discuss
> *Subject:* [SPAM] Re: [wc-discuss] The ost_connect operation failed with -16
>
> Hi, I think you might hit this: http://jira.whamcloud.com/browse/LU-952 ; you can find the patch in that ticket.
>
> Regards
> Liang
>
> On May 30, 2012, at 11:21 AM, huangql wrote:
>
>> Dear all,
>> [original problem report and stack trace snipped; quoted in full earlier in the thread]
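P.S. Concretely, the two mitigations discussed above map to commands along these lines (a sketch assuming Lustre 1.8-era tool and parameter names; run lctl on the OSS and lfs on a client, and verify against your version's documentation):

  # On each OSS: disable the OSS read cache and writethrough cache
  lctl set_param obdfilter.*.read_cache_enable=0
  lctl set_param obdfilter.*.writethrough_cache_enable=0

  # On a client: turn off user/group quota enforcement for the filesystem
  # (the mount point /lustre is an example)
  lfs quotaoff -ug /lustre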