Theodoros Stylianos Kondylis
2013-Feb-04 17:06 UTC
[Lustre-discuss] How to debug a client's eviction.
Hello everyone,

We are facing a problem in our production system. A user's application concurrently creates 12,000 files (containing the solution), but for some reason one of the user's computational nodes is evicted because of a timeout before the writing procedure completes, so the files are not properly written.

To debug this situation I did the following:

>> echo 1 > /proc/sys/lustre/dump_on_eviction
>> echo 1 > /proc/sys/lustre/dump_on_timeout

The /proc/sys/lnet/debug file contains:

ioctl neterror warning error emerg ha config console

Is there any other flag I can enable that would help me debug this situation?

Thank you in advance for any reply,
Stelios.
Hello!

On Feb 4, 2013, at 12:06 PM, Theodoros Stylianos Kondylis wrote:
> I try to debug this situation so I did the following ::
>
> >> echo 1 > /proc/sys/lustre/dump_on_eviction
> >> echo 1 > /proc/sys/lustre/dump_on_timeout
>
> And in the /proc/sys/lnet/debug file there is ::
>
> ioctl neterror warning error emerg ha config console

rpctrace and dlmtrace seem to be the two important ones for seeing what was sent and received where. After you check those and narrow the problem down to something more specific, you might want to enable some more debug flags and retry.

Bye,
Oleg
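As a rough illustration of the suggestion above, the following sketch adds rpctrace and dlmtrace to the debug mask only if they are not already present. The missing_flags helper is hypothetical (it is not a Lustre tool), and the script assumes the mask is exposed at /proc/sys/lnet/debug as in this thread and that writing "+name" to that file adds a flag without clearing the existing ones; check both against your Lustre version before relying on it.

```shell
#!/bin/sh
# Sketch only: missing_flags is a hypothetical helper, not a Lustre tool.
# It prints "+flag" for every wanted flag absent from a space-separated mask.
missing_flags() {
    current="$1"
    shift
    for flag in "$@"; do
        case " $current " in
            *" $flag "*) ;;                  # already enabled, nothing to do
            *) printf '+%s\n' "$flag" ;;     # not in the mask, report it
        esac
    done
}

# Assumption: the debug mask lives at the path used in this thread.
DEBUG_FILE="${DEBUG_FILE:-/proc/sys/lnet/debug}"

if [ -w "$DEBUG_FILE" ]; then
    for f in $(missing_flags "$(cat "$DEBUG_FILE")" rpctrace dlmtrace); do
        # Assumed behaviour: "+name" adds one flag and keeps the others.
        echo "$f" > "$DEBUG_FILE"
    done
    cat "$DEBUG_FILE"    # show the resulting mask
fi
```

Run it as root on the client that gets evicted, then reproduce the problem. With dump_on_eviction set as above, a debug dump should be written when the eviction happens; the kernel debug buffer can also be dumped by hand with lctl dk followed by an output file name of your choice.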
Theodoros Stylianos Kondylis
2013-Feb-05 16:19 UTC
[Lustre-discuss] How to debug a client's eviction.
Thank you very much, I shall enable them.

Stelios.

On Tue, Feb 5, 2013 at 5:11 PM, Drokin, Oleg <oleg.drokin at intel.com> wrote:
> Hello!
>
> On Feb 4, 2013, at 12:06 PM, Theodoros Stylianos Kondylis wrote:
> > I try to debug this situation so I did the following ::
> >
> > >> echo 1 > /proc/sys/lustre/dump_on_eviction
> > >> echo 1 > /proc/sys/lustre/dump_on_timeout
> >
> > And in the /proc/sys/lnet/debug file there is ::
> >
> > ioctl neterror warning error emerg ha config console
>
> rpctrace and dlmtrace seem to be the two important ones for seeing what
> was sent and received where. After you check those and narrow the problem
> down to something more specific, you might want to enable some more debug
> flags and retry.
>
> Bye,
> Oleg