Patricia Santos Marco
2009-Aug-21 12:32 UTC
[Lustre-discuss] qmaster dead when an ost is umounted
Hi,

We have upgraded to Lustre 1.8.1 successfully. However, we have found that when an OST is unmounted, quotas fail, and when the OST is mounted again, the qmaster recovery fails and quotas remain off:

Aug 21 11:56:20 lxsrv4 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.1.249@tcp. The obd_ping operation failed with -107
Aug 21 11:56:20 lxsrv4 kernel: Lustre: luster-OST0000-osc: Connection to service luster-OST0000 via nid 192.168.1.249@tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 21 11:56:45 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 6s
Aug 21 11:56:45 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) Skipped 11 previous similar messages
Aug 21 11:57:10 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 11s
Aug 21 11:58:00 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 21s
Aug 21 11:58:00 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) Skipped 1 previous similar message
Aug 21 11:58:06 lxsrv4 kernel: Lustre: luster-OST0000-osc: Connection restored to service luster-OST0000 using nid 192.168.1.249@tcp.
Aug 21 11:58:06 lxsrv4 kernel: Lustre: Skipped 5 previous similar messages
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_ctl.c:373:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_ctl.c:373:client_quota_ctl()) Skipped 5 previous similar messages
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_master.c:1686:qmaster_recovery_main()) qmaster recovery failed! (id:1047 type:0 rc:-3)

After that, the command "lfs quotaon" fails:

terminus:~ # lfs quotaon -ug /lustre
quotaon failed: Device or resource busy

So we have to run "lfs quotacheck", which takes a long time and fails as well:

terminus:~ # lfs quotacheck -ug /lustre
quotacheck failed: Device or resource busy

Is there another command to reactivate quotas without disconnecting the clients? What is the reason for this failure?
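
For reference, the return codes in the kernel messages above appear to be negated Linux errno values. The following is only a minimal sketch for decoding them; the table is limited to the two codes seen in this log and assumes standard Linux errno numbering, and the names `LINUX_ERRNO`, `RC_PATTERN` and `decode_return_codes` are hypothetical helpers, not part of Lustre.

```python
import re

# Assumption: standard Linux errno numbering. Only the two codes that
# appear in the log above are listed here.
LINUX_ERRNO = {
    3: ("ESRCH", "No such process"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches "failed with -107", "rc: -3" and "rc:-3".
RC_PATTERN = re.compile(r"(?:failed with|rc:)\s*-(\d+)")


def decode_return_codes(log_text):
    """Return (code, name, description) for each distinct code in log_text."""
    codes = {int(value) for value in RC_PATTERN.findall(log_text)}
    return [
        (-code, *LINUX_ERRNO.get(code, ("UNKNOWN", "not in table")))
        for code in sorted(codes)
    ]


if __name__ == "__main__":
    sample = (
        "The obd_ping operation failed with -107\n"
        "ptlrpc_queue_wait failed, rc: -3\n"
        "qmaster recovery failed! (id:1047 type:0 rc:-3)\n"
    )
    for code, name, description in decode_return_codes(sample):
        print(f"{code}: {name} ({description})")
```

Read this way, -107 on obd_ping would correspond to the OST being unreachable while it was unmounted, and -3 is the code returned to the qmaster recovery thread once the connection came back. The sketch only names the codes; it does not explain why recovery fails.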