Philippe Weill
2010-May-10 07:39 UTC
[Lustre-discuss] migrate lustre filesystem from 1.6.5.1 to 1.8.3
Hi,

Is there anything special to consider before migrating from 1.6.5.1 to 1.8.3?

1 MGS
2 filesystems
3 OSS
12 OST, 80T (we now need 16T OSTs)

We migrated just one client as a test to see how it behaves, and I am seeing some strange issues.

1. On this client, users can no longer read their quota
---------------------------------------------------------------

lfs quota -v -u weill /home
Disk quotas for user weill (uid 1001):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home     [0]     [0]     [0]             [0]     [0]     [0]
quotactl failed: Operation not permitted
homefs-OST0000_UUID
quotactl failed: Operation not permitted
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.

As root it works:

lfs quota -u weill /home
Disk quotas for user weill (uid 1001):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /home 1646536 5000000 5100000           16628       0       0

2. Recurring error -16, only on the migrated node
---------------------------------------------

May 10 07:15:34 ciclad12 kernel: Lustre: 4202:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1333645653270608 sent from datafs-OST000a-osc-ffff810c1b6a0c00 to NID 172.20.176.131@tcp 7s ago has timed out (7s prior to deadline).
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:34 ciclad12 kernel: Lustre: datafs-OST000a-osc-ffff810c1b6a0c00: Connection to service datafs-OST000a via nid 172.20.176.131@tcp was lost; in progress operations using this service will wait for recovery to complete.
May 10 07:15:34 ciclad12 kernel: Lustre: 4202:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1333645653270610 sent from datafs-OST000a-osc-ffff810c1b6a0c00 to NID 172.20.176.131@tcp 7s ago has timed out (7s prior to deadline).
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 6776 previous similar messages
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 5 active RPCs
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 6775 previous similar messages
May 10 07:15:34 ciclad-io2 kernel: LustreError: 7178:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101bab43400 x1333645653270613/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468734 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:34 ciclad-io2 kernel: LustreError: 7178:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 6775 previous similar messages
May 10 07:15:34 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:34 ciclad12 kernel: LustreError: Skipped 778 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7253:0:(service.c:1064:ptlrpc_server_handle_request()) @@@ Request x1333645653270608 took longer than estimated (6+2s); client may timeout. req@ffff81022f851000 x1333645653270608/t54502752 o4->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 448/352 e 0 to 0 dl 1273468533 ref 1 fl Complete:/0/0 rc 0/0
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7253:0:(service.c:1064:ptlrpc_server_handle_request()) Skipped 1 previous similar message
May 10 07:15:35 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:35 ciclad12 kernel: LustreError: Skipped 596 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 4 active RPCs
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7172:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff81022f851200 x1333645653271978/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468735 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7172:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:35 ciclad12 kernel: LustreError: Skipped 2918 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2607 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 4 active RPCs
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 2607 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7160:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff81022e05c400 x1333645653274586/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468735 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7160:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 2607 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7231:0:(service.c:1064:ptlrpc_server_handle_request()) @@@ Request x1333645653270612 took longer than estimated (6+3s); client may timeout. req@ffff810139315600 x1333645653270612/t54502757 o4->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 448/352 e 0 to 0 dl 1273468533 ref 1 fl Complete:/0/0 rc 0/0
May 10 07:15:36 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:36 ciclad12 kernel: LustreError: Skipped 4443 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 2 active RPCs
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: LustreError: 7180:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff810068c24a00 x1333645653279214/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468736 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:36 ciclad-io2 kernel: LustreError: 7180:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad12 kernel: Lustre: datafs-OST000a-osc-ffff810c1b6a0c00: Connection restored to service datafs-OST000a using nid 172.20.176.131@tcp.

-- 
Weill Philippe - System and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)
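
A note on reading the log above: -16 is the negated errno value EBUSY, which is consistent with the server-side "still busy with N active RPCs" refusals, and "Operation not permitted" in the quota output is the standard text for EPERM. The following is only a minimal Python sketch for decoding such return codes and adding up the "Skipped N previous similar messages" counters per host. The names describe_rc and summarize, and the abridged SAMPLE lines (with "[...]" marking text cut from the original messages), are illustrative assumptions and not part of Lustre; errno names are taken from the machine running the script (Linux assumed).

```python
import errno
import os
import re
from collections import Counter

# Patterns for the three message fragments of interest in the syslog lines.
BUSY_RE = re.compile(r"still busy with (\d+) active RPCs")
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages?")
RC_RE = re.compile(r"failed with (-\d+)")
# Syslog prefix: "May 10 07:15:34 <host> kernel:"
HOST_RE = re.compile(r"^\w{3} +\d+ [\d:]+ (\S+) kernel:")

# Abridged lines from the log; "[...]" marks text removed for brevity.
SAMPLE = [
    "May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection [...] still busy with 5 active RPCs",
    "May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 6775 previous similar messages",
    "May 10 07:15:34 ciclad12 kernel: LustreError: 11-0: [...] The ost_connect operation failed with -16",
    "May 10 07:15:34 ciclad12 kernel: LustreError: Skipped 778 previous similar messages",
    "May 10 07:15:35 ciclad-io2 kernel: Lustre: 7253:0:(service.c:1064:ptlrpc_server_handle_request()) Skipped 1 previous similar message",
]


def describe_rc(rc):
    """Map a negative kernel-style return code to its errno name and text."""
    code = abs(rc)
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


def summarize(lines):
    """Return (skipped-per-host, return-code counts, busy-RPC counts)."""
    skipped = Counter()
    rcs = Counter()
    busy = []
    for line in lines:
        host_match = HOST_RE.match(line)
        host = host_match.group(1) if host_match else "?"
        m = SKIPPED_RE.search(line)
        if m:
            skipped[host] += int(m.group(1))
        m = RC_RE.search(line)
        if m:
            rcs[int(m.group(1))] += 1
        m = BUSY_RE.search(line)
        if m:
            busy.append(int(m.group(1)))
    return skipped, rcs, busy


if __name__ == "__main__":
    skipped, rcs, busy = summarize(SAMPLE)
    for host, count in sorted(skipped.items()):
        print(f"{host}: {count} suppressed messages")
    for rc, count in sorted(rcs.items()):
        print(f"rc {rc} = {describe_rc(rc)}, seen {count} time(s)")
    print("active RPC counts reported by the server:", busy)
```

To run it against a real log, pass the lines of the file instead of SAMPLE, for example summarize(open(path)) with path pointing at a saved copy of the messages. The suppressed-message totals give a rough idea of how tight the reconnect loop was within each second.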