Thomas Roth
2010-Sep-18 10:34 UTC
[Lustre-discuss] Question about adaptive timeouts, not sending early reply
Hi all,

I'm trying to understand MDT logs and adaptive timeouts. After the upgrade to 1.8.4, and while users still believed Lustre to be in maintenance (no activity), the MDT log just shows

Lustre: 19823:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (42/30), not sending early reply

Now, for historical reasons (we used to run on a very shaky network), we load the lustre module with

options ptlrpc at_max=6000
options ptlrpc at_history=6000
options ptlrpc at_early_margin=50

Right now, however, the MDT reports:

lxmds:~# lctl get_param -n mdt.MDS.mds.timeouts
service : cur 30 worst 76 (at 1284734311, 0d19h33m39s ago) 30 30 30 30

Rereading the manual on adaptive timeouts, I conclude that if the current timeout estimate is 30 s, the MDT is indeed hard pressed to send an early reply 50 s before that timeout occurs. The log message says something like that: (42/30).

So, is my assessment correct? Are these log messages just due to the stupid at_early_margin setting?

Regards,
Thomas
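Thomas's arithmetic can be sketched as a toy calculation. This is a simplified model, not Lustre's implementation: it assumes a request's deadline is its arrival time plus the current service estimate, and that the server starts considering an early reply at_early_margin seconds before that deadline. The function name is hypothetical.

```python
def early_reply_check_offset(service_estimate: int, at_early_margin: int) -> int:
    """Seconds after request arrival at which the (hypothetical, simplified)
    server would start considering an early reply.

    Assumes deadline = arrival + service_estimate and that the check fires
    at_early_margin seconds before that deadline. A negative result means
    the trigger point lies before arrival, so it is clamped to 0 (immediately).
    """
    return max(0, service_estimate - at_early_margin)


# Values from the post: cur 30 and at_early_margin=50.
print(early_reply_check_offset(30, 50))  # trigger point is already in the past
```

Under this model, with an estimate of 30 s and a margin of 50 s the trigger point falls before the request even arrives, so every request is immediately a candidate for an early reply; with a small margin such as 5 s it would only fire 25 s after arrival.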
Kevin Van Maren
2010-Sep-18 13:58 UTC
I believe this message says that the request timeout on this transaction is 42 s, but when Lustre went to ask for more time based on the current AT service estimate, it came up with 30 s. Since 30 s < 42 s, it could not ask for more time.

Kevin

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
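Kevin's reading of the (42/30) pair amounts to a comparison: an early reply is only worth sending if the deadline derived from the current service estimate lies later than the deadline the request already has. The sketch below models that comparison in seconds remaining; it is a hypothetical simplification, not the code in service.c, and the function name and parameters are illustrative.

```python
def early_reply_decision(seconds_left_old: int, seconds_left_new: int) -> str:
    """Hypothetical model of the early-reply decision.

    seconds_left_old: time remaining until the request's current deadline.
    seconds_left_new: time remaining if the deadline were recomputed from
                      the current adaptive-timeout service estimate.
    """
    if seconds_left_new <= seconds_left_old:
        # The new estimate would not push the deadline out, so there is
        # no extra time to offer the client.
        return (f"Couldn't add any time ({seconds_left_old}/{seconds_left_new}), "
                "not sending early reply")
    return f"early reply extends deadline by {seconds_left_new - seconds_left_old}s"


print(early_reply_decision(42, 30))
```

With the numbers from the log, 30 s is not later than 42 s, so no early reply is sent; only an estimate that had grown (for example toward the "worst 76" value) would yield extra time.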