Patrick Winnertz
2009-Mar-16 10:17 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hey,

After a test on my freshly installed test cluster with Lustre 1.6.7 I saw some errors in our log files.

I basically created plenty of files with:

    i=1; while true; do touch $i; echo $i > $i; i=$(($i+1)); done

and tried to delete them later with:

    lfs find . | xargs rm

Many files were deleted properly, but after a while lfs find reported:

-------------------------------
[...]
warning: cb_find_init: ./3933 does not exist: No such file or directory (2)
warning: cb_find_init: ./2873 does not exist: No such file or directory (2)
warning: cb_find_init: ./4126 does not exist: No such file or directory (2)
[...]
-------------------------------

At the same time, this showed up in dmesg on the mgs/mdt server:

---------------------------
LustreError: 2493:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 143 previous similar messages
LustreError: 2444:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req at dcde5200 x21558/t0 o34->50bef30a-7a07-d9da-81c5-8fb613d6b8d6 at NET_0x20000c0a80103_UUID:0/0 lens 312/128 e 0 to 0 dl 1237193648 ref 1 fl Interpret:/0/0 rc -2/0
LustreError: 2444:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 2216 previous similar messages
LustreError: 2386:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req at df8c3600 x32012/t0 o34->50bef30a-7a07-d9da-81c5-8fb613d6b8d6 at NET_0x20000c0a80103_UUID:0/0 lens 312/128 e 0 to 0 dl 1237193799 ref 1 fl Interpret:/0/0 rc -2/0
LustreError: 2386:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 542 previous similar messages
LustreError: 2386:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-2) req at d0c53000 x126145/t0 o34->50bef30a-7a07-d9da-81c5-8fb613d6b8d6 at NET_0x20000c0a80103_UUID:0/0 lens 312/128 e 0 to 0 dl 1237194056 ref 1 fl Interpret:/0/0 rc -2/0
LustreError: 2386:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 3436 previous similar messages
------------------------------

Any hints as to what went wrong here, and why there were no errors when creating these files?

Greetings
Patrick

--
Patrick Winnertz Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
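For anyone who wants to retry this, the create-then-delete test described above can be sketched as a small bounded script. This is only a minimal sketch, not Patrick's exact procedure: the function names create_files and delete_files and the FINDER variable are assumptions introduced here so the script can also be tried on an ordinary directory. On a Lustre client, FINDER would be set to "lfs find". The grep step only drops the "." entry that a bare find prints, so that rm is never asked to remove the directory itself.

```shell
#!/bin/sh
# Bounded variant of the test from the message above: the original loop ran
# `while true`, here the number of files is an explicit argument.

# create_files DIR COUNT
# Create files named 1..COUNT in DIR, each containing its own number.
create_files() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        touch "$dir/$i"
        echo "$i" > "$dir/$i"
        i=$((i + 1))
    done
}

# delete_files DIR
# List everything under DIR with the finder and pipe the names to rm,
# mirroring the `lfs find . | xargs rm` pipeline from the message.
# FINDER is a hypothetical knob (default: plain find); it is left unquoted
# on purpose so that FINDER="lfs find" splits into two words.
delete_files() {
    dir=$1
    ( cd "$dir" && ${FINDER:-find} . | grep -v '^\.$' | xargs rm )
}
```

A possible invocation, with a hypothetical mount point: create_files /mnt/lustre/testdir 5000, followed by FINDER="lfs find" delete_files /mnt/lustre/testdir. Any warnings printed by the finder are left visible on purpose, since they are the symptom under investigation.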
Alex Lyashkov
2009-Mar-16 12:15 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hi Patrick,

On Mon, 2009-03-16 at 11:17 +0100, Patrick Winnertz wrote:
> Hey,
>
> After a test on my freshly installed testcluster with lustre 1.6.7 I saw some
> errors in our logfiles.
>
> I've basically created plenty of files with: i=1; while true; do touch $i; echo
> $i > $i; i=$(($i+1)); done
> and tried to delete them later with: lfs find . | xargs rm
> Many files are deleted properly, but after a while lfs find stated:
> -------------------------------
> [...]
> warning: cb_find_init: ./3933 does not exist: No such file or directory (2)
> warning: cb_find_init: ./2873 does not exist: No such file or directory (2)
> warning: cb_find_init: ./4126 does not exist: No such file or directory (2)
> [...]
> -------------------------------

Is this error reproducible? Can you reproduce it with the debug daemon running (lctl debug_daemon ....) and with lnet.debug=-1 / lnet.subsystem_debug=-1 set?

I had one similar report before, but no debug logs to investigate it.

Thanks.

--
Alex Lyashkov <alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems
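The request above can be sketched roughly as follows. This is a hedged sketch rather than the exact commands Alex had in mind, since the message elides the debug_daemon arguments: the dump path, the 1024 MB size cap, and the helper names run, start_capture and stop_capture are all assumptions made for illustration. The thread does not say on which node to run it; presumably the capture would be done on both the client and the mgs/mdt server. By default the script only prints the commands (DRY_RUN=1), because lctl and the lnet sysctls exist only on a node with the Lustre modules loaded.

```shell
#!/bin/sh
# Capture full Lustre debug logs around a reproduction run.
# DEBUG_FILE and DEBUG_MB are hypothetical placeholders, not values
# taken from the thread.
DEBUG_FILE=${DEBUG_FILE:-/tmp/lustre-debug.bin}
DEBUG_MB=${DEBUG_MB:-1024}

# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable all debug flags and subsystems, then start the debug daemon.
start_capture() {
    run sysctl -w lnet.debug=-1
    run sysctl -w lnet.subsystem_debug=-1
    run lctl debug_daemon start "$DEBUG_FILE" "$DEBUG_MB"
}

# Stop the daemon and convert the binary dump to readable text.
stop_capture() {
    run lctl debug_daemon stop
    run lctl debug_file "$DEBUG_FILE" "$DEBUG_FILE.txt"
}
```

The intended order would be start_capture, then the failing lfs find / rm run, then stop_capture. Setting DRY_RUN=0 as root executes the commands instead of printing them. Full debugging is verbose, which matches the roughly 250MB client log mentioned later in the thread.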
Patrick Winnertz
2009-Mar-16 13:19 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hey,

> Is this error replicated? can you replicate this with start debug daemon
> (lctl debug_daemon ....) and set lnet.debug=-1 /
> lnet.subsystem_debug=-1 ?

As I was not sure where you wanted me to set this, I've uploaded two debug logs (one from the client and one from the mgs/mdt server).

http://www.credativ.com/~pwi/lustre-debug-client # from client
http://www.credativ.com/~pwi/lustre-debug-mgs # from server

Please wait a bit before downloading the client log file, it's quite huge (~250MB); the server log file is complete.

> I have one similar report before - but not have debug logs for
> investigate.

I hope this helps to sort this out. If you need more information, please ask.

Greetings
Patrick

--
Patrick Winnertz Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
Alex Lyashkov
2009-Mar-17 05:53 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hi,

On Mon, 2009-03-16 at 14:19 +0100, Patrick Winnertz wrote:
> Hey,
>
> > Is this error replicated? can you replicate this with start debug daemon
> > (lctl debug_daemon ....) and set lnet.debug=-1 /
> > lnet.subsystem_debug=-1 ?
> As I was not sure where you want me to set this I've uploaded two debug logs
> (one from the client and one from the mgs/mdt server).
>
> http://www.credativ.com/~pwi/lustre-debug-client # from client
> http://www.credativ.com/~pwi/lustre-debug-mgs # from server
>
> Please wait a bit for downloading the client logfile it's quite huge (~250MB)
> the server logfile is complete.

It looks like something is wrong with the permissions:

$ wget http://www.credativ.com/~pwi/lustre-debug-client
--2009-03-17 07:51:24-- http://www.credativ.com/~pwi/lustre-debug-client
Resolving www.credativ.com... 88.198.32.163
Connecting to www.credativ.com|88.198.32.163|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.credativ.com/404.html [following]

--
Alex Lyashkov <alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems
Patrick Winnertz
2009-Mar-17 07:51 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hey,

> looks something wrong with permission:
> $ wget http://www.credativ.com/~pwi/lustre-debug-client
> --2009-03-17 07:51:24--
> http://www.credativ.com/~pwi/lustre-debug-client
> Resolving www.credativ.com... 88.198.32.163
> Connecting to www.credativ.com|88.198.32.163|:80... connected.
> HTTP request sent, awaiting response... 302 Found
> Location: http://www.credativ.com/404.html [following]

Sorry for this. It is fixed now.

Greetings
Patrick

--
Patrick Winnertz Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz
Alex Lyashkov
2009-Mar-17 09:41 UTC
[Lustre-discuss] LustreErrors on mgs/mdt when accessing files
Hi Patrick,

On Tue, 2009-03-17 at 08:51 +0100, Patrick Winnertz wrote:
> Hey,
> > looks something wrong with permission:
> > $ wget http://www.credativ.com/~pwi/lustre-debug-client
> > --2009-03-17 07:51:24--
> > http://www.credativ.com/~pwi/lustre-debug-client
> > Resolving www.credativ.com... 88.198.32.163
> > Connecting to www.credativ.com|88.198.32.163|:80... connected.
> > HTTP request sent, awaiting response... 302 Found
> > Location: http://www.credativ.com/404.html [following]
> Sorry for this, This is fixed now.
>

The logs are downloading now. I will look at them later today or tomorrow.

--
Alex Lyashkov <alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems