Timh Bergström
2009-Jun-03 09:08 UTC
[Lustre-discuss] LustreError: 3429:0:(llog_obd.c:226:llog_add()) No ctxt
Hi all,

After an MDT server crash we decided to upgrade from 2.6.18+1.6.6.1 to 2.6.22+1.6.7 (to solve some other problems we've had before). After the upgrade we got these errors in dmesg on the MDT:

LustreError: 3429:0:(llog_obd.c:226:llog_add()) No ctxt
LustreError: 3429:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0

We ran fsck and found some errors in the journal(s) on some of the OSTs, which got fixed. We haven't been able to run fsck on the MDT resource yet (DRBD-related). The cluster is running and operating as usual despite the messages in kern.log on the MDT. Is this dangerous? What do we do to fix it?

We are running Debian Etch with kernel, utils and Lustre packages from http://pkg-lustre.alioth.debian.org/backports/. The cluster has been running fine except for some occurrences of LBUG associated with the ext3-unlink-race bug in the 2.6.18-6 kernel, and assertion bugs before upgrading to 1.6.6.1.

--
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq
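Editorial aside: the `rc = -19` in the second error line looks like a negated Linux errno value, which is the usual convention for kernel return codes. A minimal Python sketch, assuming that convention applies here, maps the number to its symbolic name. The helper name `decode_rc` is hypothetical and not part of any Lustre tool.

```python
import errno
import os


def decode_rc(rc):
    """Map a negative kernel-style return code to (errno name, description).

    Assumes rc follows the Linux convention of returning -errno on failure.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The value reported by lov_llog_origin_add() in the message above.
name, text = decode_rc(-19)
print(name, "-", text)
```

On Linux, errno 19 is ENODEV. That alone does not say which device or log context is missing, so it narrows the question rather than answering it.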
Kevin Van Maren
2009-Jun-03 10:35 UTC
[Lustre-discuss] LustreError: 3429:0:(llog_obd.c:226:llog_add()) No ctxt
1.6.7 is known to corrupt the MDT and was pulled from the download site. Please make sure you are using 1.6.7.1 and not 1.6.7.

Kevin
Timh Bergström
2009-Jun-03 12:04 UTC
[Lustre-discuss] LustreError: 3429:0:(llog_obd.c:226:llog_add()) No ctxt
Hello and thanks for the reply,

I'm 99% sure we are running 1.6.7.1; when was it released, by the way? I've mailed the package maintainer to be sure.

Provided we run 1.6.7.1 and still get these errors, what should we do to get rid of them? Do they indicate some serious error(s)? Or would a simple fsck on the MDT data solve this?

Regards,
Timh
Timh Bergström
2009-Jun-05 09:20 UTC
[Lustre-discuss] LustreError: 3429:0:(llog_obd.c:226:llog_add()) No ctxt
I've verified that we run 1.6.7.1. We still get errors similar to the ones I posted:

Jun 5 07:55:11 mdt1 kernel: LustreError: 3420:0:(llog_obd.c:226:llog_add()) Skipped 261 previous similar messages
Jun 5 07:55:11 mdt1 kernel: LustreError: 3420:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0
Jun 5 07:55:11 mdt1 kernel: LustreError: 3420:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 261 previous similar messages
Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(llog_obd.c:226:llog_add()) No ctxt
Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(llog_obd.c:226:llog_add()) Skipped 68 previous similar messages
Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0
Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 68 previous similar messages

Has anyone else seen this? What can we do to "fix" it? Obviously there is something wrong with the MDT. From what I find when searching for this error, it seems to be a corrupted CATALOGS file/folder. According to bug #16002 I should "simply" mount the MDT drive and delete the CATALOGS file/folder. So what happens when I do this: does Lustre rebuild the file/folder? Will the filesystem remain intact?

BR,
Timh
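Editorial aside: because the kernel rate-limits these messages, the "Skipped N previous similar messages" lines hide most of the volume. The sketch below is one way to tally how often each call site actually fired, using the 09:15:20 lines from the message above as sample input. The regular expressions and the `tally` helper are assumptions based only on the line format shown in this thread, not on any Lustre tooling.

```python
import re
from collections import Counter

# Matches e.g. "LustreError: 3451:0:(llog_obd.c:226:llog_add()) No ctxt"
LINE_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<src>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages")


def tally(lines):
    """Count occurrences per call site, adding the rate-limited 'Skipped N' totals."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.search(raw)
        if not m:
            continue  # not a LustreError line in the expected format
        key = f"{m['src']}:{m['line']}:{m['func']}()"
        skipped = SKIPPED_RE.search(m["msg"])
        counts[key] += int(skipped.group(1)) if skipped else 1
    return counts


sample = [
    "Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(llog_obd.c:226:llog_add()) No ctxt",
    "Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(llog_obd.c:226:llog_add()) Skipped 68 previous similar messages",
    "Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0",
    "Jun 5 09:15:20 mdt1 kernel: LustreError: 3451:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 68 previous similar messages",
]

for site, count in tally(sample).items():
    print(site, count)
```

To run it over a whole log, pass an open file object (for example the MDT's kern.log) to `tally` instead of the sample list.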