thr3ads.net - Btrfs devel - btrfs stability [Jan 2013]

Andrew McNabb

2013-Jan-25 20:05 UTC

btrfs stability

I tried creating a multi-device btrfs filesystem for the first time (on
Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems.  I
had heard that btrfs is now reasonably stable, and though I expected to
possibly see a problem here or there, I was a little surprised at just
how many problems I encountered in such a short period of time.  I now
have about a thousand error messages in my kernel logs related to
several different problems.  Is this roughly the expected level of
stability for btrfs with multiple devices, or am I just particularly
lucky? :)

Am I correct in assuming that I'll need to switch to md for a few months
and try btrfs again later, or are there known problems in the specific
kernel I'm running that I could avoid by trying a different version?

For the sake of being specific, I'll detail a few of the problems I've
hit:

These two may have been caused by a possibly faulty disk (I'm still
trying to determine whether it was faulty or whether the bug was purely
in btrfs):

https://bugzilla.redhat.com/show_bug.cgi?id=903794
https://bugzilla.redhat.com/show_bug.cgi?id=904143

This one was triggered when I tried to remove a possibly faulty disk:

https://bugzilla.redhat.com/show_bug.cgi?id=904197

With a freshly created filesystem, I got a kernel bug, associated with a
hang in most filesystem operations.  This occurred in the middle of
ordinary operation and without any sort of hardware-related errors in
the kernel logs.

https://bugzilla.redhat.com/show_bug.cgi?id=904223

I've noticed that a lot of the reports in the Fedora bugzilla and kernel
bugzilla don't seem to include much discussion; is there any specific
type of information that bug submitters should try to include to make
the reports more helpful?  Thanks.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2013-Jan-25 20:37 UTC

Re: btrfs stability

On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote:
> I tried creating a multi-device btrfs filesystem for the first time (on
> Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems.  I
> had heard that btrfs is now reasonably stable, and though I expected to
> possibly see a problem here or there, I was a little surprised at just
> how many problems I encountered in such a short period of time.  I now
> have about a thousand error messages in my kernel logs related to
> several different problems.  Is this roughly the expected level of
> stability for btrfs with multiple devices, or am I just particularly
> lucky? :)
> 
> Am I correct in assuming that I'll need to switch to md for a few months
> and try btrfs again later, or are there known problems in the specific
> kernel I'm running that I could avoid by trying a different version?
> 
> For the sake of being specific, I'll detail a few of the problems I've
> hit:
> 
> These two may have been caused by a possibly faulty disk (I'm still
> trying to determine whether it was faulty or whether the bug was purely
> in btrfs):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=903794
This one is just an allocator warning because the relocator doesn't do the
right accounting for relocation.  It's just complaining; we need to fix it,
but it won't keep the filesystem from working.
> https://bugzilla.redhat.com/show_bug.cgi?id=904143
This one, I'm almost certain (I have to check), was just a result of me
making fsync faster and forgetting to remove this WARN_ON.  It's fixed
upstream.  Again, nothing to worry about, but annoying.
> 
> This one was triggered when I tried to remove a possibly faulty disk:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=904197
> 
OK, this is a bug, and I can fix it.  Basically we tried to read from the
faulty disk, it failed, we read from the other copy, and then tried to write
the good copy back to the failed disk.  When we saw that the IO wasn't
actually going to go to the bad disk, we panicked.  Silly, but easy enough to
understand and fix.
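
The sequence described above (read fails on one copy, read succeeds on the
other, good copy is written back) can be sketched as a toy model.  This is
not btrfs code and not the actual fix: Mirror, DeviceGone and
read_with_repair are hypothetical names, and skipping the write-back to a
vanished device is only an assumption about what a non-panicking version
might do.

```python
from dataclasses import dataclass, field


class DeviceGone(Exception):
    """Raised when I/O is attempted on a failed or removed device."""


@dataclass
class Mirror:
    # Hypothetical in-memory stand-in for one device holding block copies.
    name: str
    blocks: dict = field(default_factory=dict)
    failed: bool = False

    def read(self, blockno):
        if self.failed:
            raise DeviceGone(self.name)
        return self.blocks[blockno]  # KeyError models a missing/bad copy

    def write(self, blockno, data):
        if self.failed:
            raise DeviceGone(self.name)
        self.blocks[blockno] = data


def read_with_repair(mirrors, blockno):
    """Read a block from the first good copy, then repair the bad copies.

    Returns (data, skipped), where skipped names the mirrors that could not
    be repaired because the device itself is gone.
    """
    bad = []
    found = False
    data = None
    for mirror in mirrors:
        try:
            data = mirror.read(blockno)
            found = True
            break
        except (DeviceGone, KeyError):
            bad.append(mirror)
    if not found:
        raise IOError(f"no readable copy of block {blockno}")

    skipped = []
    for mirror in bad:
        try:
            mirror.write(blockno, data)  # write the good copy back
        except DeviceGone:
            # Assumption: record and move on instead of treating this as fatal.
            skipped.append(mirror.name)
    return data, skipped
```

With one failed mirror and one healthy mirror, the read still succeeds and
the failed device is simply reported back to the caller as unrepaired.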
> With a freshly created filesystem, I got a kernel bug, associated with a
> hang in most filesystem operations.  This occurred in the middle of
> ordinary operation and without any sort of hardware-related errors in
> the kernel logs.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=904223
> 
So this is from the fsync stuff, and I'm sure I fixed this somewhere, but I
can't account for where I did it.  Can you give btrfs-next a try and see if
you can still reproduce it?  Thanks,

Josef

Josef Bacik

2013-Jan-25 20:53 UTC

Re: btrfs stability

On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote:
> I tried creating a multi-device btrfs filesystem for the first time (on
> Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems.  I
> had heard that btrfs is now reasonably stable, and though I expected to
> possibly see a problem here or there, I was a little surprised at just
> how many problems I encountered in such a short period of time.  I now
> have about a thousand error messages in my kernel logs related to
> several different problems.  Is this roughly the expected level of
> stability for btrfs with multiple devices, or am I just particularly
> lucky? :)
> 
> Am I correct in assuming that I'll need to switch to md for a few months
> and try btrfs again later, or are there known problems in the specific
> kernel I'm running that I could avoid by trying a different version?
> 
> For the sake of being specific, I'll detail a few of the problems I've
> hit:
> 
> These two may have been caused by a possibly faulty disk (I'm still
> trying to determine whether it was faulty or whether the bug was purely
> in btrfs):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=903794
> https://bugzilla.redhat.com/show_bug.cgi?id=904143
> 
> This one was triggered when I tried to remove a possibly faulty disk:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=904197
Actually for this one, how did you remove the disk?  Did you just yank it out
while the box was running?  Did you mount -o degraded and then delete the device
and then remove it?  How exactly did you get to this situation?  Thanks,

Josef

Andrew McNabb

2013-Jan-25 21:22 UTC

Re: btrfs stability

On Fri, Jan 25, 2013 at 03:37:17PM -0500, Josef Bacik wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=903794
> 
> This one is just an allocator warning because the relocator doesn't do the
> right accounting for relocation.  It's just complaining, we need to fix it
> but it won't keep it from working.
I won''t worry about this one, then.
> > https://bugzilla.redhat.com/show_bug.cgi?id=904143
> 
> This I'm almost certain (I have to check) was just a result of me making
> fsync faster and forgetting to remove this warn on.  It's fixed upstream.
> Again, nothing to worry about, but annoying.
Sounds good.
> > This one was triggered when I tried to remove a possibly faulty disk:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=904197
> > 
> 
> Ok this is a bug, I can fix this.  Basically we tried to read from the faulty
> disk, it failed, we read from the other copy, and then tried to write the good
> copy back to the failed disk and when we saw that the IO wasn't actually going
> to go to the bad disk we panic'ed.  Silly but easy enough to understand/fix.
I was a little surprised that this happened after I had already done a
"btrfs dev delete"--is there a way to tell btrfs that a disk really is
gone?
> > With a freshly created filesystem, I got a kernel bug, associated with a
> > hang in most filesystem operations.  This occurred in the middle of
> > ordinary operation and without any sort of hardware-related errors in
> > the kernel logs.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=904223
> > 
> 
> So this is from the fsync stuff, and I'm sure I fixed this somewhere but I can't
> account for where I did it.
Would this also be the cause of the hangs that I'm seeing?  In the end,
a hang with the load rising to 260.10 is the most serious problem.  It's
happened a few times, and it gets temporarily fixed by a reboot, but
then tends to recur fairly soon.
> Can you give btrfs-next a try and see if you can
> still reproduce.  Thanks,
Is there a pre-built RPM for btrfs-next, or what's the best way to try
it out in Fedora without breaking other things?

Thanks for your quick response, and sorry for not responding sooner
(I've been interrupted by a few phone calls).

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

Andrew McNabb

2013-Jan-25 21:39 UTC

Re: btrfs stability

On Fri, Jan 25, 2013 at 03:53:22PM -0500, Josef Bacik wrote:
> 
> Actually for this one, how did you remove the disk?  Did you just yank it out
> while the box was running?  Did you mount -o degraded and then delete the device
> and then remove it?  How exactly did you get to this situation.  Thanks,
I've moved my answer over to IRC to reduce the latency in the
conversation.  Thanks again for all the help.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

Andrew McNabb

2013-Jan-26 20:27 UTC

Re: btrfs stability

Here's an update.  I tried the new kernel, and I seem to be having some
new (possibly worse) problems.  In my ssh session, I'm seeing many errors
of this sort:

Message from syslogd@guru at Jan 26 13:13:14 ...
 kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s! [btrfs-endio-wri:2073]

Message from syslogd@guru at Jan 26 13:13:14 ...
 kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s! [btrfs-delalloc-:594]

In the logs, I'm seeing several warnings and bugs, including:

WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
BUG: unable to handle kernel NULL pointer dereference at     (null)
BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]

Kernel logs (across a few reboots) are at:

http://students.cs.byu.edu/~amcnabb/messages2
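
When a log holds hundreds of such messages, a per-signature tally can make
it easier to see how many distinct problems are involved.  The following is
a minimal sketch, assuming the messages look like the WARNING and BUG lines
quoted above; the summarize function and its regular expressions are
illustrative and would likely need adjusting for other kernels or log
formats.

```python
import re
from collections import Counter

# Patterns modelled on the lines quoted in this message (an assumption
# about the log format, not a complete parser for kernel output).
WARNING_RE = re.compile(r"WARNING: at (\S+:\d+) (\w+)")
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#\d+ stuck for \d+s! \[([^:\]]+):\d+\]"
)
BUG_RE = re.compile(r"BUG: (.+)")


def summarize(lines):
    """Count kernel warnings and bugs, grouped by a normalized signature."""
    counts = Counter()
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            # Group warnings by source location and function name.
            counts[f"WARNING {m.group(1)} {m.group(2)}"] += 1
            continue
        m = LOCKUP_RE.search(line)
        if m:
            # Ignore CPU number, duration and PID; keep the thread name.
            counts[f"soft lockup in {m.group(1)}"] += 1
            continue
        m = BUG_RE.search(line)
        if m:
            # Collapse runs of whitespace so identical bugs group together.
            counts["BUG " + " ".join(m.group(1).split())] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < /var/log/messages
    for signature, n in summarize(sys.stdin).most_common():
        print(f"{n:6d}  {signature}")
```

Soft lockups are grouped by thread name only, so repeated lockups in the
same btrfs worker show up as one line with a count rather than many
near-identical entries.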

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

Josef Bacik

2013-Jan-28 14:17 UTC

Re: btrfs stability

On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote:
> Here's an update.  I tried the new kernel, and I seem to be having some
> new (possibly worse) problems.  In my ssh session, I'm seeing many errors
> of this sort:
> 
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s! [btrfs-endio-wri:2073]
> 
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s! [btrfs-delalloc-:594]
> 
> In the logs, I'm seeing several warnings and bugs, including:
> 
> WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
> WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> BUG: unable to handle kernel NULL pointer dereference at     (null)
> BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
> BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]
> 
> Kernel logs (across a few reboots) are at:
> 
> http://students.cs.byu.edu/~amcnabb/messages2
> 
Hrm, well, I didn't expect that.  I will look into this and see what I can
come up with.  Thanks,

Josef

Josef Bacik

2013-Jan-28 15:10 UTC

Re: btrfs stability

On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote:
> Here's an update.  I tried the new kernel, and I seem to be having some
> new (possibly worse) problems.  In my ssh session, I'm seeing many errors
> of this sort:
> 
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s! [btrfs-endio-wri:2073]
> 
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s! [btrfs-delalloc-:594]
> 
> In the logs, I'm seeing several warnings and bugs, including:
> 
> WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
> WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> BUG: unable to handle kernel NULL pointer dereference at     (null)
> BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
> BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]
> 
> Kernel logs (across a few reboots) are at:
> 
> http://students.cs.byu.edu/~amcnabb/messages2
> 
OK, I think I figured it out.  Can you give this a whirl?  Let me know when
you get tester's fatigue ;)

http://koji.fedoraproject.org/koji/taskinfo?taskID=4908932

Thanks,

Josef
