Dear Dusty Marks,
Is the user with which you are trying these commands present on the MDS
with the same UID/GID?
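
You could verify this with something like the commands below ('<username>'
is a placeholder for the account you are testing with):

    # run on both the client (192.168.0.6) and the MDS (192.168.0.2);
    # the UID and GID reported should be identical on both nodes
    getent passwd <username>
    id <username>

If the IDs differ, ownership and quota accounting on the MDS will not line
up with what the client expects. There is also a note on re-running
quotacheck below the quoted log.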
On Sun, Feb 14, 2010 at 12:03 PM, Dusty Marks <dustynmarks@gmail.com>
wrote:
> I'm using Lustre 1.8.2
>
> Everything seemed to be working quite nicely until I enabled user quotas.
>
> I am able to mount the file system on the client, but whenever I cd
> into it, ls it, or try anything else on it, it hangs. Then when I
> type "lfs df -h", the MDS server no longer appears in the list.
>
> 192.168.0.2 is the MDS/MGS server
> 192.168.0.3 is the OST server (/dev/hdc is the OST device)
> 192.168.0.6 is the patchless client
>
>
> Thanks for the help all
> -Dusty
>
>
> This shows up in /var/log/messages on the client (sorry, the time is
> wrong on this machine)
>
>
> ------------------------------------------------------------------------
> Feb 13 18:15:49 mainframe2 kernel: Lustre: MGC192.168.0.2@tcp:
> Reactivating import
> Feb 13 18:15:49 mainframe2 kernel: Lustre: Client cluster-client has
> started
> Feb 13 18:16:21 mainframe2 kernel: Lustre:
> 6386:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request
> x1327583245569516 sent from cluster-MDT0000-mdc-ffff810009154c00 to
> NID 192.168.0.2@tcp 7s ago has timed out (7s prior to deadline).
> Feb 13 18:16:21 mainframe2 kernel: req@ffff810036502c00
> x1327583245569516/t0 o101->cluster-MDT0000_UUID@192.168.0.2@tcp:12/10
> lens 544/1064 e 0 to 1 dl 1266106581 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 13 18:16:21 mainframe2 kernel: Lustre:
> cluster-MDT0000-mdc-ffff810009154c00: Connection to service
> cluster-MDT0000 via nid 192.168.0.2@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 13 18:16:27 mainframe2 kernel: LustreError:
> 6386:0:(mdc_locks.c:625:mdc_enqueue()) ldlm_cli_enqueue: -4
>
>
> ------------------------------------------------------------------------
>
>
> This shows up in /var/log/messages on the MDS server
>
>
> ------------------------------------------------------------------------
> Feb 13 23:43:07 MDS kernel: Lustre: MGS: haven't heard from client
> d9029b94-c905-383b-b046-df9c7d7be59d (at 0@lo) in 248 seconds. I think
> it's dead, and I am evicting it.
> Feb 13 23:53:08 MDS kernel: LustreError:
> 4121:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error
> (-43) req@f6553600 x1327583245569136/t0
> o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0
> lens 424/360 e 0 to 0 dl 1266126794 ref 1 fl Interpret:/0/0 rc 0/0
> Feb 14 00:03:15 MDS kernel: LustreError:
> 2581:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error
> (-43) req@f5fb4800 x1327583245569252/t0
> o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0
> lens 424/360 e 0 to 0 dl 1266127401 ref 1 fl Interpret:/0/0 rc 0/0
> Feb 14 00:04:49 MDS kernel: LustreError: 11-0: an error occurred while
> communicating with 192.168.0.3@tcp. The ost_statfs operation failed
> with -107
> Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection to
> service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 14 00:04:49 MDS kernel: LustreError: 167-0: This client was
> evicted by cluster-OST0000; in progress operations using this service
> will fail.
> Feb 14 00:04:49 MDS kernel: Lustre:
> 4352:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are
> active, abort quota recovery
> Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection
> restored to service cluster-OST0000 using nid 192.168.0.3@tcp.
> Feb 14 00:04:49 MDS kernel: Lustre: MDS cluster-MDT0000:
> cluster-OST0000_UUID now active, resetting orphans
> Feb 14 00:04:56 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000:
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting
> Feb 14 00:04:56 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000:
> refuse reconnection from
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600;
> still busy with 1 active RPCs
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000:
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) Skipped 1 previous
> similar message
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000:
> refuse reconnection from
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600;
> still busy with 1 active RPCs
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 1 previous
> similar message
> Feb 14 00:12:11 MDS kernel: Lustre: cluster-MDT0000: haven't heard
> from client 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf (at 192.168.0.6@tcp)
> in 258 seconds. I think it's dead, and I am evicting it.
> Feb 14 00:15:37 MDS kernel: LustreError: 11-0: an error occurred while
> communicating with 192.168.0.3@tcp. The ost_quotactl operation failed
> with -107
> Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection to
> service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 14 00:15:37 MDS kernel: LustreError:
> 4357:0:(quota_ctl.c:379:client_quota_ctl()) ptlrpc_queue_wait failed,
> rc: -107
> Feb 14 00:15:37 MDS kernel: LustreError: 167-0: This client was
> evicted by cluster-OST0000; in progress operations using this service
> will fail.
> Feb 14 00:15:37 MDS kernel: Lustre:
> 4358:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are
> active, abort quota recovery
> Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection
> restored to service cluster-OST0000 using nid 192.168.0.3@tcp.
> Feb 14 00:15:37 MDS kernel: Lustre: MDS cluster-MDT0000:
> cluster-OST0000_UUID now active, resetting orphans
>
>
> ------------------------------------------------------------------------
>
> --
> The graduate with a Science degree asks, "Why does it work?" The
> graduate with an Engineering degree asks, "How does it work?" The
> graduate with an Accounting degree asks, "How much will it cost?" The
> graduate with an Arts degree asks, "Do you want fries with that?"
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
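Also, the MDS log shows the ost_statfs/ost_quotactl RPCs to 192.168.0.3@tcp
failing with -107 and quota recovery being aborted ("Only 0/1 OSTs are
active"). Once the MDS, the OST and the client can all see each other
again, it may be worth re-running quotacheck from a client. A rough sketch
of the Lustre 1.8 commands, assuming the client mount point is /mnt/cluster
(a placeholder):

    # regenerate user and group quota files on all targets (can take a while)
    lfs quotacheck -ug /mnt/cluster

    # then verify the quota state for the user you are testing with
    lfs quota -u <username> /mnt/cluster
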
--
Regards,
Rishi Pathak
National PARAM Supercomputing Facility
Centre for Development of Advanced Computing (C-DAC)
Pune University Campus, Ganesh Khind Road
Pune, Maharashtra