I don't know if this is a bad thing. I was doing a stress test of our new Lustre install and managed to have a client kicked out, with the following message on the OST that kicked it out:

Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST0000: refuse reconnection from 749b3c01-4ac0-cc80-157c-86c845d6d60a at 10.164.1.93@tcp to 0x00000102f7cdc000; still busy with 6 active RPCs

Was this just a result of hammering the filesystem really hard? Both OSSes became CPU bound, so I would not be surprised if it was just too much. Any other common causes of this message (I never saw it with our old setup) would be great to know.

Thanks. The new install is working great, nice product.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Thu, 2008-08-21 at 22:23 -0400, Brock Palen wrote:
> I don't know if this is a bad thing. I was doing a stress test of our new
> Lustre install and managed to have a client kicked out, with the
> following message on the OST that kicked it out:

To be clear, the message below is not a client being evicted but rather a client trying to reconnect after it has been evicted.

> Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup-OST0000:
> refuse reconnection from 749b3c01-4ac0-cc80-157c-86c845d6d60a at
> 10.164.1.93@tcp to 0x00000102f7cdc000; still busy with 6 active RPCs

The OSS is refusing to let the client reconnect, however, because it is still trying to finish the transactions the client had in progress when it was evicted.

> Was this just a result of hammering the filesystem really hard?

Could be, if the load was atypical and you have tuned your obd_timeout for a more typical load. Typically, until AT (adaptive timeouts) is in full swing, you need to tune for your worst-case scenario.

b.
On Aug 21, 2008, at 11:17 PM, Brian J. Murrell wrote:
> To be clear, the message below is not a client being evicted but
> rather a client trying to reconnect after it has been evicted.

Thanks, yes. This message appeared after the eviction notice.

> The OSS is refusing to let the client reconnect, however, because it
> is still trying to finish the transactions the client had in progress
> when it was evicted.

Good to know that it's just for 'that' client.
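
For anyone who wants to track how often an OSS logs this refusal during a stress run, here is a minimal Python sketch that pulls the fields out of the message quoted above. The regular expression is inferred from that single log line, so the field names (`target`, `client_uuid`, `client_nid`, `export`, `active_rpcs`) and the helper `parse_refused_reconnect` are assumptions for illustration, not anything defined by Lustre. It accepts either `@` or the list archive's ` at ` between the client UUID and the NID.

```python
import re

# Pattern inferred from the one message in this thread; other Lustre
# versions may word the message differently.
REFUSE_RE = re.compile(
    r"(?P<target>\S+): refuse reconnection from "
    r"(?P<client_uuid>[0-9a-f-]+)(?:@| at )(?P<client_nid>\S+) "
    r"to (?P<export>0x[0-9a-fA-F]+); "
    r"still busy with (?P<active_rpcs>\d+) active RPCs"
)


def parse_refused_reconnect(line):
    """Return a dict of fields if the line is a refused reconnect, else None."""
    match = REFUSE_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["active_rpcs"] = int(info["active_rpcs"])
    return info


if __name__ == "__main__":
    sample = (
        "Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) "
        "nobackup-OST0000: refuse reconnection from "
        "749b3c01-4ac0-cc80-157c-86c845d6d60a at 10.164.1.93@tcp "
        "to 0x00000102f7cdc000; still busy with 6 active RPCs"
    )
    print(parse_refused_reconnect(sample))
```

Counting matches per `client_nid` over a syslog file would show whether the refusals are confined to the one evicted client, as described above, or spread across many clients.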