Ms. Megan Larko
2008-Sep-22 20:17 UTC
[Lustre-discuss] o2ib possible network problems -- solved
Hello All,

I honestly do not know how it happened, but the value in /proc/sys/lustre/timeout on the OSS box was set to 100. All other systems were set to 1000. I changed the value on the OSS to 1000 and every error message on all of the related systems stopped.

I got the idea to re-check from an e-mail message sent by Brian Murrell, archived on os-dir, referring to bug 16237. Brian listed the above as another thing to check.

Interestingly enough, the readahead (blockdev --report /dev/sdX) on the same OSS was set to 672. I have no idea where that came from either. All of the other systems have a reported readahead value of 256. I had changed the readahead value on the OSS box first (blockdev --setra 256 /dev/sdX). The error messages did not stop until I fixed the value in /proc/sys/lustre/timeout.

How could my /proc have such odd values in it? I will see if the change holds for now. I may have to do something to make it persistent for future reboots.

Cheers!
megan
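The fix above came down to spotting one node whose settings differed from the rest. A minimal sketch of that comparison follows, assuming the values have already been collected from each node (for example by reading /proc/sys/lustre/timeout and running blockdev --report on every box). The host names and the `find_outliers` helper are hypothetical, made up for illustration; only the numbers come from the message.

```python
from collections import Counter


def find_outliers(values_by_host):
    """Return {host: value} for hosts whose value differs from the most common one."""
    if not values_by_host:
        return {}
    # Treat the value reported by the largest number of hosts as the expected one.
    expected, _count = Counter(values_by_host.values()).most_common(1)[0]
    return {host: value for host, value in values_by_host.items() if value != expected}


# Hypothetical host names; the values mirror those reported in the message.
timeouts = {"mds1": 1000, "oss1": 100, "client1": 1000, "client2": 1000}
readahead = {"mds1": 256, "oss1": 672, "client1": 256, "client2": 256}

print(find_outliers(timeouts))   # {'oss1': 100}
print(find_outliers(readahead))  # {'oss1': 672}
```

Using the majority value as the reference is only a convenience; it says nothing about whether that value is a good choice.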
Brian J. Murrell
2008-Sep-22 20:27 UTC
[Lustre-discuss] o2ib possible network problems -- solved
On Mon, 2008-09-22 at 16:17 -0400, Ms. Megan Larko wrote:
> Hello All,
>
> I honestly do not know how it happened, but the value in
> /proc/sys/lustre/timeout on the OSS box was set to 100. All other
> systems were set to 1000.

FWIW, 1000 is waaaaay high. Our biggest production systems (thousands if not 10s of thousands of nodes) don't use values higher than 300 seconds.

You might want to try lowering that value to 300 seconds (on all nodes of course!) and see if you experience stability. You might want to experiment with even lower values (100s is the default) and see where you can maintain stability.

The downside of high obd_timeouts is long recovery times.

b.
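For experimenting with lower values as suggested, a small helper along these lines could read and set the timeout on one node. This is a sketch under assumptions: it treats /proc/sys/lustre/timeout (the path named in this thread) as a plain writable integer file, it needs root on a real node, and the function names are hypothetical. A value written this way is not expected to survive a reboot. The demonstration runs against a temporary file so it can be tried on a machine without Lustre.

```python
import os
import tempfile

LUSTRE_TIMEOUT_PATH = "/proc/sys/lustre/timeout"  # path named in the thread


def read_timeout(path=LUSTRE_TIMEOUT_PATH):
    """Return the timeout (in seconds) currently stored at `path`."""
    with open(path) as f:
        return int(f.read().strip())


def set_timeout(seconds, path=LUSTRE_TIMEOUT_PATH):
    """Write a new timeout to `path` and return the value read back."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    with open(path, "w") as f:
        f.write(f"{seconds}\n")
    return read_timeout(path)


# Demonstration against a stand-in file rather than the real /proc entry.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    set_timeout(1000, demo_path)            # the value in use on most nodes
    new_value = set_timeout(300, demo_path)  # the lower value suggested above
    print(new_value)                         # 300
finally:
    os.remove(demo_path)
```

The same change has to be made on every node, so in practice this would be run on each machine in turn.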
Brian Behlendorf
2008-Sep-22 23:34 UTC
[Lustre-discuss] o2ib possible network problems -- solved
> FWIW, 1000 is waaaaay high. Our biggest production systems (thousands
> if not 10s of thousands) nodes don't use values higher than 300 seconds.

Since I'm here at LLNL and we happen to have a few of the large systems, maybe I should chime in. While it is true our large systems (many thousands of nodes) use a timeout value of 300s, it is not true that this value prevents all of our timeouts. The 300s value has just shown itself through actual usage to prevent 99% of our timeouts and still allow reasonable-length recovery times. To get to the point of preventing all of them, I feel the only viable solution is to validate the new adaptive timeout feature for our production use.

--
Thanks,
Brian