Good Morning Folks,

We're (seemingly suddenly) getting some fairly odd IO pauses of about 20-30 seconds during client writes into one of our file systems (specifically an rsync from an NFS to a Lustre). On the client, we're seeing blocks similar to the following when the pause occurs:

Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff880080ec4000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1819:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff880034c72000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -113, desc ffff8803c6658000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff8805a283e000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff8805b1b0e000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff8805ca086000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff88054b762000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff8805ae49c000
Nov 8 09:19:50 lrc-xfer.scs00 lrc-xfer kernel: LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff88045cb74000

On the OSS, we can see (note: 10.0.2.8 is the client in question):

Nov 8 09:21:18 n0002.lustre LustreError: 8731:0:(socklnd.c:1671:ksocknal_destroy_conn()) Completing partial receive from 12345-10.0.2.8 at tcp[2], ip 10.0.2.8:1021, with error, wanted: 8192, left: 8192, last alive is 1 secs ago
Nov 8 09:21:18 n0002.lustre kernel: LustreError: 8731:0:(socklnd.c:1671:ksocknal_destroy_conn()) Completing partial receive from 12345-10.0.2.8 at tcp[2], ip 10.0.2.8:1021, with error, wanted: 8192, left: 8192, last alive is 1 secs ago
Nov 8 09:21:18 n0002.lustre kernel: LustreError: 8731:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103be200000
Nov 8 09:21:18 n0002.lustre LustreError: 8731:0:(events.c:381:server_bulk_callback()) event type 2, status -5, desc ffff8103be200000
Nov 8 09:21:18 n0002.lustre LustreError: 9141:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ffff8104178a6c00 x1412852387822649/t0 o4->81cf6d57-d07f-6bef-2fef-ca8a980c718e@:0/0 lens 448/416 e 1 to 0 dl 1352395330 ref 1 fl Interpret:/0/0 rc 0/0
Nov 8 09:21:18 n0002.lustre kernel: LustreError: 9141:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576) req at ffff8104178a6c00 x1412852387822649/t0 o4->81cf6d57-d07f-6bef-2fef-ca8a980c718e@:0/0 lens 448/416 e 1 to 0 dl 1352395330 ref 1 fl Interpret:/0/0 rc 0/0
Nov 8 09:21:18 n0002.lustre Lustre: 9141:0:(ost_handler.c:1224:ost_brw_write()) lrc-OST0009: ignoring bulk IO comm error with 81cf6d57-d07f-6bef-2fef-ca8a980c718e@ id 12345-10.0.2.8 at tcp - client will retry
Nov 8 09:21:18 n0002.lustre kernel: Lustre: 9141:0:(ost_handler.c:1224:ost_brw_write()) lrc-OST0009: ignoring bulk IO comm error with 81cf6d57-d07f-6bef-2fef-ca8a980c718e@ id 12345-10.0.2.8 at tcp - client will retry
Nov 8 09:21:24 n0002.lustre Lustre: 8978:0:(ldlm_lib.c:574:target_handle_reconnect()) lrc-OST0004: 81cf6d57-d07f-6bef-2fef-ca8a980c718e reconnecting
Nov 8 09:21:24 n0002.lustre Lustre: 8978:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 5 previous similar messages
Nov 8 09:21:24 n0002.lustre kernel: Lustre: 8978:0:(ldlm_lib.c:574:target_handle_reconnect()) lrc-OST0004: 81cf6d57-d07f-6bef-2fef-ca8a980c718e reconnecting
Nov 8 09:21:24 n0002.lustre kernel: Lustre: 8978:0:(ldlm_lib.c:574:target_handle_reconnect()) Skipped 5 previous similar messages

Any ideas as to a cause? Is this network loss?

----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
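
For reference when reading the client log, the "status" values in the client_bulk_callback() lines look like negated Linux errno codes. The sketch below is only an illustration under that assumption: the table is hard-coded from the standard Linux errno numbering (5 is EIO, 113 is EHOSTUNREACH), and the helper name decode_status is hypothetical, not part of Lustre.

```python
import re

# Assumption: statuses are negated Linux errno values. The table is hard-coded
# (standard Linux numbering) so the result does not depend on the local platform.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    113: ("EHOSTUNREACH", "No route to host"),
}

STATUS_RE = re.compile(r"status (-?\d+)")


def decode_status(line):
    """Return (code, name, description) for the first 'status N' in line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in table"))
    return code, name, text


if __name__ == "__main__":
    # Two lines taken from the client log in this message.
    sample = [
        "LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -5, desc ffff880080ec4000",
        "LustreError: 1809:0:(events.c:198:client_bulk_callback()) event type 0, status -113, desc ffff8803c6658000",
    ]
    for line in sample:
        print(decode_status(line))
```

Under that assumption, most of the client entries decode to a generic I/O error and the single -113 entry decodes to "No route to host".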