I get these errors, any ideas? Running Lustre 1.8.4. This client is also the server where we nfs export the filesystem.

LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110
LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107
LustreError: 28410:0:(dir.c:384:ll_readdir_18()) error reading dir 579577179/4015460576 page 0: rc -110
LustreError: Skipped 12 previous similar messages
Lustre: lustre-MDT0000-mdc-ffff810338e81400: Connection to service lustre-MDT0000 via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: 25118:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8101f87d8c00 x1383759180968916/t0 o35->lustre-MDT0000_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -108
LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108
LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) inode 579577179 mdc close failed: rc = -108
LustreError: 9199:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message
Lustre: lustre-MDT0000-mdc-ffff810338e81400: Connection restored to service lustre-MDT0000 using nid 192.168.5.104@tcp.
nfsd: non-standard errno: -43
nfsd: non-standard errno: -43
LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir 575283686/935610515 page 0: rc -110
LustreError: 4994:0:(dir.c:384:ll_readdir_18()) Skipped 29 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_readpage operation failed with -107
Lustre: lustre-MDT0000-mdc-ffff810338e81400: Connection to service lustre-MDT0000 via nid 192.168.5.104@tcp was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff8102a576c000 x1383759180969003/t0 o37->lustre-MDT0000_UUID@192.168.5.104@tcp:23/10 lens 408/600 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 4994:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 34 previous similar messages
nfsd: non-standard errno: -108
nfsd: non-standard errno: -4
nfsd: non-standard errno: -4
nfsd: non-standard errno: -108
LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) inode 17928860 mdc close failed: rc = -4
LustreError: 25118:0:(file.c:116:ll_close_inode_openhandle()) Skipped 1 previous similar message
LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108
LustreError: 25118:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 4 previous similar messages
LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 558497795
LustreError: 28407:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 3 previous similar messages
nfsd: non-standard errno: -108
Lustre: lustre-MDT0000-mdc-ffff810338e81400: Connection restored to service lustre-MDT0000 using nid 192.168.5.104@tcp.
LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116
LustreError: Skipped 1 previous similar message
LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) inode 558497794 mdc close failed: rc = -116
LustreError: 28407:0:(file.c:116:ll_close_inode_openhandle()) Skipped 4 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116

-- 
Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
Hi,
Just quickly looking at the log you've posted, it looks like you're timing out with an overloaded network.

-cf

On 10/27/2011 10:08 AM, David Noriega wrote:
> I get these errors, any ideas? Running Lustre 1.8.4. This client is
> also the server where we nfs export the filesystem.
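For readers following the log: the negative numbers after `rc`, `failed with` and `errno:` are Linux errno values, and `rc -110` is ETIMEDOUT, which is consistent with the timeout reading above. Below is a minimal sketch in Python that decodes and tallies them. It assumes the standard Linux errno numbering (hardcoded so it does not depend on the machine it runs on); `decode_rc` and `summarize` are hypothetical helper names and are not part of Lustre.

```python
import re
from collections import Counter

# Linux errno values that appear in the posted log. Hardcoded on purpose:
# Python's errno module reflects the local platform, which may not be Linux.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    43: ("EIDRM", "Identifier removed"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    110: ("ETIMEDOUT", "Connection timed out"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches the ways a negative code is reported in the log:
# "rc -110", "rc = -108", "failed with -107", "errno: -43",
# "failure -108", "ldlm_cli_enqueue: -108".
# The trailing "rc 0/0" of an RPC dump has no minus sign, so it is skipped.
RC_PATTERN = re.compile(
    r"(?:\brc\s*=?|failed with|errno:|failure|ldlm_cli_enqueue:)\s*-(\d+)"
)


def decode_rc(rc):
    """Return 'NAME (description)' for a (negative or positive) return code."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


def summarize(log_text):
    """Count how often each errno name is reported in a block of log text."""
    counts = Counter()
    for match in RC_PATTERN.finditer(log_text):
        name, _ = LINUX_ERRNO.get(int(match.group(1)), ("UNKNOWN", ""))
        counts[name] += 1
    return counts
```

To use it, pass the saved log text to `summarize()` and print the resulting counter. A log dominated by ETIMEDOUT, ENOTCONN and ESHUTDOWN points at lost connectivity to the MDS rather than at a problem with individual files.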
Overloaded on the client or the MDS? All the Lustre nodes use NIC bonding, so I suppose that, since we have a lot of I/O traffic on this client, we should bump up the number of NICs in use?

On Thu, Oct 27, 2011 at 3:28 PM, Colin Faber <Colin_Faber at xyratex.com> wrote:
> Hi,
> Just quickly looking at the log you've posted, it looks like you're
> timing out with overloaded network.
>
> -cf
On the client, yes. You can check your MDS logs to verify whether other clients are experiencing connectivity problems to the MDS as well. Increasing overall I/O bandwidth to the client will help, though remember that you will eventually hit a point at which you've fully saturated the client's bus.

-cf

On 10/27/2011 03:34 PM, David Noriega wrote:
> Overloaded on the client or mds? All the lustre nodes use nic bonding,
> so I suppose since we have a lot of io traffic on this client, should
> bump up the number of nics in use?
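The suggestion to check the MDS logs for other affected clients could be approached along these lines. This is only a rough sketch: the thread does not show what the MDS-side messages look like, so the code assumes nothing about their wording beyond the substring "evict", and it assumes client NIDs are written in the `address@network` form (for example `192.168.5.104@tcp`). `count_evictions_by_nid` is a hypothetical helper name, and the log path in the comment is an assumption about where syslog lands on the MDS.

```python
import re
from collections import Counter

# An IPv4-style NID such as 192.168.5.104@tcp (assumed format).
NID_PATTERN = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})@(\w+)")


def count_evictions_by_nid(lines):
    """Count, per NID, the log lines that mention an eviction.

    `lines` is any iterable of strings, e.g. an open file object.
    Only lines containing "evict" (case-insensitive) are considered.
    """
    counts = Counter()
    for line in lines:
        if "evict" not in line.lower():
            continue
        for addr, net in NID_PATTERN.findall(line):
            counts[f"{addr}@{net}"] += 1
    return counts


# Example call on the MDS (path is an assumption, adjust to your syslog setup):
#   with open("/var/log/messages", errors="replace") as fh:
#       for nid, n in count_evictions_by_nid(fh).most_common():
#           print(n, nid)
```

If only the NFS-exporting client shows up, the bottleneck is more likely on that client's side. If many NIDs appear, the MDS or the network around it deserves a closer look.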