I have had two 'kernel bug' issues today, both referencing file-item.c. The first oops happened when I was cp'ing from an external HD (ext3) to an ext3 partition. The second happened during boot-up. I have attached them both.

I'm using btrfs that was merged into my kernel yesterday.

--
Marc R. O'Connor
Information Systems
Camden County Board of Social Services
600 Market St.
Camden, NJ 08102
mroconnor@oel.state.nj.us
856-225-8915 Ofc.
609-206-3458 Blackberry
319D8AF1 BB PIN
On Tue, 2009-04-28 at 13:39 -0400, Marc R. O'Connor wrote:
> I have had two 'kernel bug' issues today both referencing file-item.c.
> [...]
> plain text document attachment (btrfs_bug_1)
> Apr 28 10:55:10 cosmo2 ------------[ cut here ]------------
> Apr 28 10:55:10 cosmo2 kernel BUG at fs/btrfs/file-item.c:494!

Well, I think I see the bug. It looks like we want to do

- if (key->offset < bytenr && csum_end <= end_byte) {
+ if (key->offset <= bytenr && csum_end <= end_byte) {

in truncate_one_csum. But I need to test that here for a bit and send you a patch.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason wrote:
> Well, I think I see the bug. It looks like we want to do
>
> - if (key->offset < bytenr && csum_end <= end_byte) {
> + if (key->offset <= bytenr && csum_end <= end_byte) {
>
> in truncate_one_csum. But I need to test that here for a bit and send
> you a patch.

Thanks, I am a patient kind of guy ;)

--
Marc R. O'Connor
Full file-item.c attached.

Chris Mason wrote:
> Well, I think I see the bug. It looks like we want to do
>
> - if (key->offset < bytenr && csum_end <= end_byte) {
> + if (key->offset <= bytenr && csum_end <= end_byte) {
>
> in truncate_one_csum. But I need to test that here for a bit and send
> you a patch.

--
Marc R. O'Connor
On Wed, 2009-04-29 at 12:04 -0400, Marc R. O'Connor wrote:
> full file-item.c attached
>
> Chris Mason wrote:
> > Well, I think I see the bug. It looks like we want to do
> >
> > - if (key->offset < bytenr && csum_end <= end_byte) {
> > + if (key->offset <= bytenr && csum_end <= end_byte) {
> >
> > in truncate_one_csum. But I need to test that here for a bit and send
> > you a patch.

Ok, line 494 is actually this one ;)

	key->offset = end_byte;
	ret = btrfs_set_item_key_safe(trans, root, path, key);
	BUG_ON(ret);    <---- 494

Which means we're finding things out of order in the btree leaf.

Could you please run btrfsck on this filesystem?

-chris
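For readers unfamiliar with why a key update can fail at all, here is a minimal userspace sketch of the ordering invariant being described. It is a simplified model, not the kernel's code: `demo_key`, `demo_comp_keys` and `demo_set_key_safe` are hypothetical names, and the leaf is reduced to a plain sorted array of keys. The idea is that a key may only be rewritten in place if it still sorts strictly between its neighbours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a btree key: (objectid, type, offset). */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way comparison, field by field in order of significance. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Replace the key in `slot` only if the array stays strictly sorted.
 * Returns 0 on success, -1 if the new key would not sort after its
 * left neighbour or before its right neighbour. On failure the array
 * is left unchanged.
 */
static int demo_set_key_safe(struct demo_key *keys, size_t nritems,
			     size_t slot, const struct demo_key *new_key)
{
	assert(slot < nritems);

	if (slot > 0 && demo_comp_keys(&keys[slot - 1], new_key) >= 0)
		return -1;
	if (slot + 1 < nritems && demo_comp_keys(&keys[slot + 1], new_key) <= 0)
		return -1;

	keys[slot] = *new_key;
	return 0;
}
```

In this model a nonzero return means the requested key does not fit between the keys already in the leaf. That can happen because the caller computed a bad offset, or because a neighbouring key was altered in memory after it was read from disk.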
Chris Mason wrote:
> Ok, line 494 is actually this one ;)
>
> key->offset = end_byte;
> ret = btrfs_set_item_key_safe(trans, root, path, key);
> BUG_ON(ret); <---- 494
>
> Which means we're finding things out of order in the btree leaf.
>
> Could you please run btrfsck on this filesystem?

I have done that on all btrfs partitions I have and btrfsck did not return anything odd.

--
Marc R. O'Connor
On Wed, 2009-04-29 at 14:21 -0400, Marc R. O'Connor wrote:
> Chris Mason wrote:
> > Could you please run btrfsck on this filesystem?
>
> I have done that on all btrfs partitions I have and btrfsck did not
> return anything odd.

In that case, the bad ordering is being introduced at run time. Could you please run memtest86 on the box?

-chris
Chris Mason wrote:
> In that case, the bad ordering is being introduced at run time. Could
> you please run memtest86 on the box?

memtest comes back with two errors very early on, then reboots the sysrescueCD. :(

--
Marc R. O'Connor
On Wed, 2009-04-29 at 14:38 -0400, Marc R. O'Connor wrote:
> memtest comes back with two errors very early on then reboots the
> sysrescueCD. :(

Well, then I'm surprised btrfs doesn't crash more violently and more often ;)

Do you think you're hitting a memtest bug or is the HW really bad?

-chris
To be honest I have no idea. Let me see if I can find a different version to test with and see. I think I'm still under warranty. ;)

Chris Mason wrote:
> Well, then I'm surprised btrfs doesn't crash more violently and more
> often ;)
>
> Do you think you're hitting a memtest bug or is the HW really bad?

--
Marc R. O'Connor
> Do you think you're hitting a memtest bug or is the HW really bad?

If you can stomach it, you can get a second opinion from the bootable Windows memory testing ISO:

http://oca.microsoft.com/en/windiag.asp

- z
Zach Brown wrote:
> If you can stomach it, you can get a second opinion from the bootable
> windows memory testing iso:
>
> http://oca.microsoft.com/en/windiag.asp

It will be hard but I might just try it. Two versions of memtest86+ die in the middle of the scan. Ugh.

--
Marc R. O'Connor
On Wed, Apr 29, 2009 at 02:40:19PM -0400, Chris Mason spake thusly:
> Well, then I'm surprised btrfs doesn't crash more violently and more
> often ;)

Note that this will be a problem that btrfs must properly manage. And it must be done MUCH better than a certain previously semi-popular filesystem did. The expectation needs to be set that, because modern filesystems use much more complicated in-memory structures, your hardware must be rock solid or you will get filesystem corruption. I ran the other filesystem on hundreds of machines with no problem (all solid hardware), but I regularly run into people (just this morning, for example) who swear that *every*single*time* they created a filesystem with that other fs it was corrupted in a short amount of time. It just doesn't add up. First impressions and early rumors can doom a filesystem (although clearly other factors such as personalities and politics can be at play as well).

> Do you think you're hitting a memtest bug or is the HW really bad?

A bug in memtest? It's been rock solid for years, hasn't it? Maybe some new memory configuration or MMU might freak it out. That seems quite unlikely compared to the chances of actually having bad RAM.

--
Tracy Reed
http://tracyreed.org
On Thu, Apr 30, 2009 at 5:04 AM, Marc R. O'Connor <mroconnor@oel.state.nj.us> wrote:
> It will be hard but I might just try it. Two versions of memtest86+ die
> in the middle of the scan. Ugh.

Also try memtester in Linux. Boot up as you normally do and give it a fair chunk of your RAM to run on. Unlike memtest, it leaves all the actual low-level stuff to the kernel. It doesn't even need root, let alone a reboot. At least then you can thoroughly rule out a memtest86 bug.

--
Dmitri Nikulin
Centre for Synchrotron Science
Monash University
Victoria 3800, Australia
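As a rough illustration of what testing memory from userspace means, the sketch below allocates an ordinary heap buffer, fills it with a few bit patterns and reads them back. It is not memtester's implementation, and the real tool does considerably more than this. The names `demo_check_pattern` and `demo_memtest` are hypothetical, and the pattern set is only a small, commonly used selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Write `pattern` to every word, then read every word back.
 * Returns the number of words that did not hold the pattern.
 * The buffer is accessed through a volatile pointer so the compiler
 * cannot optimise the stores and loads away.
 */
static size_t demo_check_pattern(volatile uint64_t *buf, size_t words,
				 uint64_t pattern)
{
	size_t i, bad = 0;

	for (i = 0; i < words; i++)
		buf[i] = pattern;
	for (i = 0; i < words; i++)
		if (buf[i] != pattern)
			bad++;
	return bad;
}

/*
 * Test `bytes` of ordinary heap memory with a few fixed patterns.
 * Returns the total mismatch count, or (size_t)-1 if allocation failed.
 */
static size_t demo_memtest(size_t bytes)
{
	static const uint64_t patterns[] = {
		0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
		0xAAAAAAAAAAAAAAAAULL, 0x5555555555555555ULL,
	};
	size_t words = bytes / sizeof(uint64_t);
	size_t p, bad = 0;
	uint64_t *buf;

	assert(words > 0);

	buf = malloc(words * sizeof(uint64_t));
	if (!buf)
		return (size_t)-1;

	for (p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++)
		bad += demo_check_pattern(buf, words, patterns[p]);

	free(buf);
	return bad;
}
```

A caller would invoke something like `demo_memtest(64 * 1024 * 1024)` and treat any nonzero result as suspicious. Because the kernel decides which physical pages back the buffer, and CPU caches may satisfy the read-back, a clean result from a sketch like this is much weaker evidence than a clean run of a dedicated tester.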
On Wed, Apr 29, 2009 at 8:32 PM, Tracy Reed <treed@ultraviolet.org> wrote:
> Note that this will be a problem that btrfs must properly manage. And
> it must be done MUCH better than a certain previously semi-popular
> filesystem did. The expectation needs to be set that due to the much

I don't think it's workable or feasible or whatever. As soon as you have faulty memory or CPU, all bets are off.

--
This message represents the official view of the voices in my head
On Thu, Apr 30, 2009 at 10:08:04AM +0100, Paul Komkoff spake thusly:
> I don't think it's workable or feasible or whatever.
> As soon as you have faulty memory or CPU, all bets are off.

Sorry, I was unclear: I meant manage in a public-relations sort of way, not in a technical way. You are absolutely right that bad RAM or CPU means you are hosed.

--
Tracy Reed
http://tracyreed.org
On Fri, May 1, 2009 at 2:54 AM, Tracy Reed <treed@ultraviolet.org> wrote:
> Sorry, I was unclear: I meant manage in a public-relations sort of
> way. Not in a technical way. You are absolutely right that bad RAM or
> CPU means you are hosed.

Even so, it's a perfect opportunity to not make things worse by trying to write data after fundamental assertions have already failed. My most recent data loss scenario with ext3 involved a little kernel/hardware/whatever glitch that would have been harmless on its own, but which ext3 took as a cue to completely mangle metadata. I found XML config files with PNG blocks in them, etc.

--
Dmitri Nikulin
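The "stop writing once an assertion has failed" idea can be sketched as a fail-stop flag. This is a toy model under stated assumptions, not how btrfs or ext3 implement error handling: `demo_fs`, `demo_fs_check` and `demo_fs_write` are hypothetical names, and a "write" is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy filesystem state. */
struct demo_fs {
	bool aborted;         /* set once an invariant check has failed */
	unsigned long writes; /* number of writes actually issued */
};

/*
 * Check an invariant. If it does not hold, mark the filesystem as
 * aborted so later writes are refused, and report -EIO to the caller.
 */
static int demo_fs_check(struct demo_fs *fs, bool invariant_holds)
{
	assert(fs != NULL);

	if (invariant_holds)
		return 0;
	fs->aborted = true;
	return -EIO;
}

/*
 * Issue one write. After any failed check the write is refused with
 * -EROFS instead of touching (possibly already inconsistent) state.
 */
static int demo_fs_write(struct demo_fs *fs)
{
	assert(fs != NULL);

	if (fs->aborted)
		return -EROFS;
	fs->writes++;
	return 0;
}
```

The design choice is that the first failed check is sticky: nothing clears `aborted` in this model, so data written before the failure is left as it was rather than being overwritten by code that is running on bad assumptions.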