thr3ads.net - Btrfs devel - Btrfs lockup during defrag and removing device [Jun 2013]

Ondřej Kunc

2013-Jun-24 10:25 UTC

Btrfs lockup during defrag and removing device

Hi developers,

I have been playing with btrfs on our test server. I have stressed it
heavily, and I can say its throughput and features are very nice
and usable. However, I hit one problem during testing: btrfs
triggered a lockup of 3 of the 8 CPU cores of the test server.

What was I doing?

Simultaneously:
- copying a 130GB file on one subvolume to another file and measuring the speed with pv
- removing one device (/dev/md3) from btrfs
- running btrfs defrag on the whole fs (via xargs)
- rsyncing files from another server to a subvolume
- untarring the first 130GB tar into a subdirectory
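
For anyone who wants to retry this, the workload above can be sketched roughly as follows. This is only a sketch under assumptions: the mount point /btrfs and the device /dev/md3 come from this report, but every other path, the rsync source, and the exact command-line options are hypothetical placeholders, not the commands that were actually run. By default it only prints the commands and starts nothing.

```python
import shlex
import subprocess

MOUNT = "/btrfs"        # mount point used in this report
DEVICE = "/dev/md3"     # device being removed in this report

# Everything below is a hypothetical placeholder, not taken from the report.
SRC_FILE = f"{MOUNT}/sub1/big.tar"
DST_FILE = f"{MOUNT}/sub2/big.copy"
RSYNC_SRC = "otherhost:/data/"
RSYNC_DST = f"{MOUNT}/sub3/"
UNTAR_DIR = f"{MOUNT}/sub1/untar"


def build_workload(include_defrag=True):
    """Return the shell command lines of the concurrent workload."""
    q = shlex.quote
    cmds = [
        f"pv {q(SRC_FILE)} > {q(DST_FILE)}",
        f"btrfs device delete {q(DEVICE)} {q(MOUNT)}",
        f"rsync -a {q(RSYNC_SRC)} {q(RSYNC_DST)}",
        f"tar -xf {q(SRC_FILE)} -C {q(UNTAR_DIR)}",
    ]
    if include_defrag:
        cmds.append(
            f"find {q(MOUNT)} -type f -print0 | xargs -0 btrfs filesystem defragment"
        )
    return cmds


def run_workload(cmds, dry_run=True):
    """Print the commands; only if dry_run is False, start them in parallel and wait."""
    for c in cmds:
        print(c)
    if dry_run:
        return []
    procs = [subprocess.Popen(c, shell=True) for c in cmds]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    run_workload(build_workload(), dry_run=True)
```

Calling build_workload(include_defrag=False) gives the same workload without the defrag step, which makes it easy to compare the two cases on a machine that is safe to crash.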

Our server is an HP DL380 with 12*146GB SAS HDDs, 72GB RAM, and an Intel Xeon 5620,
running up-to-date Debian wheezy with the kernel and btrfs-tools from testing:
3.9-1-amd64 #1 SMP Debian 3.9.6-1 x86_64 GNU/Linux
# btrfs version
Btrfs Btrfs v0.19

btrfs was created on top of 3 software RAID6 devices, each built
from 4 SAS drives.

About one hour after starting this (see dmesg), the server became
inaccessible, so I had to power-cycle it.

After the reboot there was a problem with the free space cache, but it was
fixed automatically. I have one suspicion. I next tried this:

btrfs balance start /btrfs
and then btrfs resize 4:max /btrfs (the device was previously smaller)

It failed with this dmesg output:
btrfs: dev add/delete/balance/replace/resize operation in progress.

So is it possible that these operations are mutually exclusive, and that
the reverse should not be permitted either, i.e. starting a balance or
defrag while a device is being removed? Is this true?
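
The question can be pictured with a small model. This is a simplified sketch in Python, not the kernel code: it assumes that the operations named in the dmesg message (dev add, delete, balance, replace, resize) share one "operation in progress" flag, and that defrag, which the message does not mention, is not covered by it. The class and method names are hypothetical.

```python
import threading

# Operations named in the dmesg message; assumed to share one exclusion flag.
EXCLUSIVE_OPS = {"add", "delete", "balance", "replace", "resize"}


class FsModel:
    """Toy model of a filesystem with one 'exclusive operation running' flag."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = None  # name of the exclusive operation in progress

    def start(self, op):
        """Try to start op; return True if it may run, False if refused."""
        if op not in EXCLUSIVE_OPS:
            return True  # e.g. defrag: assumed not guarded by the flag
        with self._lock:
            if self._running is not None:
                print("btrfs: dev add/delete/balance/replace/resize "
                      "operation in progress")
                return False
            self._running = op
            return True

    def finish(self, op):
        """Mark the exclusive operation op as finished."""
        with self._lock:
            if self._running == op:
                self._running = None


fs = FsModel()
print(fs.start("balance"))  # balance takes the flag
print(fs.start("resize"))   # refused while balance is running
fs.finish("balance")
print(fs.start("delete"))   # device removal takes the flag
print(fs.start("balance"))  # refused under this model
print(fs.start("defrag"))   # allowed under this model
```

Under these assumptions a balance started during a device removal would be refused, while a defrag would still be allowed to run alongside it.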

Thank you all for your good work!

Ondrej Kunc

dmesg output: http://pastebin.com/Ndxypkxa

P.S. I'm sorry if this message is a duplicate; I was not able to
post from our company email.

--
Ondřej Kunc
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josef Bacik

2013-Jun-24 15:25 UTC

Re: Btrfs lockup during defrag and removing device

On Mon, Jun 24, 2013 at 12:25:07PM +0200, Ondřej Kunc wrote:
> Hi developers,
>
> I have been playing with btrfs on our test server. I have stressed it
> heavily, and I can say its throughput and features are very nice
> and usable. However, I hit one problem during testing: btrfs
> triggered a lockup of 3 of the 8 CPU cores of the test server.
>
> What was I doing?
>
> Simultaneously:
> - copying a 130GB file on one subvolume to another file and measuring the speed with pv
> - removing one device (/dev/md3) from btrfs
> - running btrfs defrag on the whole fs (via xargs)
> - rsyncing files from another server to a subvolume
> - untarring the first 130GB tar into a subdirectory
>
> Our server is an HP DL380 with 12*146GB SAS HDDs, 72GB RAM, and an Intel Xeon 5620,
> running up-to-date Debian wheezy with the kernel and btrfs-tools from testing:
> 3.9-1-amd64 #1 SMP Debian 3.9.6-1 x86_64 GNU/Linux
> # btrfs version
> Btrfs Btrfs v0.19
>
> btrfs was created on top of 3 software RAID6 devices, each built
> from 4 SAS drives.
>
> About one hour after starting this (see dmesg), the server became
> inaccessible, so I had to power-cycle it.
>
> After the reboot there was a problem with the free space cache, but it was
> fixed automatically. I have one suspicion. I next tried this:
>
> btrfs balance start /btrfs
> and then btrfs resize 4:max /btrfs (the device was previously smaller)
>
> It failed with this dmesg output:
> btrfs: dev add/delete/balance/replace/resize operation in progress.
>
> So is it possible that these operations are mutually exclusive, and that
> the reverse should not be permitted either, i.e. starting a balance or
> defrag while a device is being removed? Is this true?
>
> Thank you all for your good work!
>
> Ondrej Kunc
>
> dmesg output: http://pastebin.com/Ndxypkxa
>
So it seems like you hit some bug higher up that just made the system devolve
into this chain of panics.  I think you are probably hitting this

https://bugzilla.kernel.org/show_bug.cgi?id=59451

which the strato guys are working on.  If you take the "btrfs defrag" step out
of that test, do you still have the same problem?  If yes, then it may be
something new; could you file a new bugzilla if that's the case?  If it
doesn't reproduce without the defrag step, then just attach yourself to
that bugzilla with a "me too" so you can test whatever patch we come up with.
Thanks,

Josef

Ondřej Kunc

2013-Jun-26 11:02 UTC

Re: Btrfs lockup during defrag and removing device

Hi Josef,

I can confirm that I'm not able to crash it when I don't run defrag
during the other operations. So, as you wish, I will add a "me too" to the
bugzilla. I see there is a patch available, so I will test it ASAP
(I just need to patch and compile a kernel, because I was testing the Debian
package before).

Thank you all for the nice work

Ondrej

--
Ondřej Kunc