thr3ads.net - Btrfs devel - Device delete returns "unable to go below four devices on raid10" on 5 drive setup [Aug 2013]

Steven Post

2013-Aug-31 10:12 UTC

Device delete returns "unable to go below four devices on raid10" on 5 drive setup

Hello list,

I have a 5 drive raid10 setup (6th sata port malfunctions, all drives
are 3TB in size).
I want to remove a single drive, yet the 'btrfs device delete' command
gives me the "unable to go below four devices on raid10" error.

This happened after I first deleted a device. On checking, it didn't
seem to have been removed, and issuing the command again results in the
error.

Before:
# btrfs filesystem show /dev/sda3
Label: 'maindrivearray'  uuid: f58976ab-2ce1-4a1c-bc82-22df7d3393b4
	Total devices 5 FS bytes used 2.67TB
	devid    4 size 2.73TB used 1.09TB path /dev/sde3
	devid    3 size 2.73TB used 1.09TB path /dev/sdd3
	devid    2 size 2.73TB used 1.09TB path /dev/sdc3
	devid    6 size 2.73TB used 1.09TB path /dev/sdb3
	devid    5 size 2.73TB used 1.09TB path /dev/sda3

Btrfs Btrfs v0.19

After issuing the command
# btrfs device delete /dev/sde3 /mnt

I get this:
# btrfs filesystem show /dev/sda3
Label: 'maindrivearray'  uuid: f58976ab-2ce1-4a1c-bc82-22df7d3393b4
	Total devices 5 FS bytes used 2.67TB
	devid    4 size 2.73TB used 1.09TB path /dev/sde3
	devid    3 size 2.73TB used 1.24TB path /dev/sdd3
	devid    2 size 2.73TB used 1.24TB path /dev/sdc3
	devid    6 size 2.73TB used 1.24TB path /dev/sdb3
	devid    5 size 2.73TB used 1.24TB path /dev/sda3

Btrfs Btrfs v0.19

When issuing the delete command again, the error pops up, also after a
reboot. The first remove did take a long time to complete, and according
to syslog and the 'filesystem show' command a lot of data was moved to
the other drives (as expected).

The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
3.2.46-1 x86_64).

Is this something known (and possibly resolved in a later version), or
should I open a bug report about it? Could it be that the device removal
was completed, but still shows as part of the array for some reason?
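
The error text suggests a simple rule: a raid10 volume may not shrink below four devices, so removing one device from five should be allowed, while removing one from four should not. A minimal sketch of that rule, meant to be fed the 'filesystem show' output above. The function names are made up for illustration, and the exact condition the kernel tests is an assumption inferred from the message, not taken from the kernel source:

```shell
#!/bin/sh
# Hypothetical helpers; not part of btrfs-progs.

# Print the device count from the "Total devices N ..." line of
# 'btrfs filesystem show' output read on stdin.
total_devices() {
    awk '$1 == "Total" && $2 == "devices" { print $3; exit }'
}

# Succeed if one device may be removed from a raid10 volume that
# currently has $1 devices (assumed rule: at least four must remain).
can_remove_one_raid10() {
    [ "$(( $1 - 1 ))" -ge 4 ]
}

# Usage:
#   n=$(btrfs filesystem show /dev/sda3 | total_devices)
#   can_remove_one_raid10 "$n" && echo "removal should be allowed"
```

If the rule holds and the listing still reports five devices, a refused second delete is at least consistent with the first removal having already taken effect.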

The reason for the remove is actually that I want to (gradually) replace
the 3TB drives with 1 TB ones, and somewhere in the middle move some of
the data of the array, to another machine, that currently has the 1 TB
drives which I intend to replace with the 3TB ones.

Best regards,
Steven

Duncan

2013-Aug-31 11:41 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

Steven Post posted on Sat, 31 Aug 2013 12:12:55 +0200 as excerpted:
> Btrfs Btrfs v0.19
> The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
> 3.2.46-1 x86_64).
> 
> Is this something known (and possibly resolved in a later version), or
> should I open a bug report about it? Could it be that the device removal
> was completed, but still shows as part of the array for some reason?
As a sysadmin running btrfs, not a dev, I don't know about your specific
issue, but in general, be aware that btrfs (both the kernelspace
filesystem and the userspace btrfs-tools) is still experimental and under
heavy development.  As such, it's strongly recommended to run the latest
stable series kernel at the oldest, if not the rcs (here, I run a git
kernel, but don't normally update to the new development series until rc2
or 3, which will hopefully avoid any serious data-destroying bugs, but of
course on a development filesystem backups are even more important than
normal in any case, so...), unless you have a specific known problem
preventing you from doing so.

That would be 3.10.x, with 3.11 due out soon.  Some people do
run one behind that, so the 3.9 series, but for anything older than that,
trying something that isn't as ancient as the hills is generally a strong
first recommendation, both because many known bugs have been fixed, so
you're actually taking a bigger risk with older kernels, and because bug
reports simply aren't as useful that far back.
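
To check a machine against the advice above, the running kernel version can be compared with a minimum. A minimal sketch, assuming GNU sort with its -V (version sort) option; the function name is made up, and the 3.9 floor is only the "one behind" series mentioned above, used here as an illustrative threshold:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on 'sort -V' (version sort) from GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# 'uname -r' prints e.g. 3.2.0-4-amd64; keep only the numeric prefix.
running=$(uname -r | sed 's/[^0-9.].*$//')

if version_at_least "$running" 3.9; then
    echo "kernel $running: at or above the 3.9 rule of thumb"
else
    echo "kernel $running: older than 3.9, consider upgrading first"
fi
```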

Similarly, the git master branch of btrfs-tools is deliberately kept
stable and usable -- development happens on other branches and is merged
-- so a live-git build not older than a couple months (that being about
the length of a kernel cycle so they'll be of similar age) is
recommended.  0.19 is a very old release, tho it's the latest actual
release, but there's a much newer 0.20-rc1 tagged if you still don't feel
comfortable with a live-git build.

All this and more is covered on the btrfs wiki, found at

https://btrfs.wiki.kernel.org/

If you wish to continue testing btrfs I'd suggest you read up a bit
there, as if you didn't know the above, there's surely a lot else covered
there that you're not aware of -- and on a development filesystem that
lack of knowledge could well bite you!

Alternatively, testing a development filesystem certainly isn't for
everybody, and the fact that you're running an old 3.2 kernel could be a
hint that you're looking for something a bit more conservative and stable
than a development filesystem, making it a poor fit for your needs at
best.  But that's for you to decide.  If you're happy being a tester and
either have your data well backed up or otherwise consider it losable in
testing if things go wrong, go for it, but do it right, with current
kernel and tools so your tests at least have some value if things /do/ go
wrong! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Murphy

2013-Aug-31 17:42 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Aug 31, 2013, at 4:12 AM, Steven Post <redalert.commander@gmail.com> wrote:
>
> The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
> 3.2.46-1 x86_64).
> 
> Is this something known (and possibly resolved in a later version), or
> should I open a bug report about it?
Try 3.10 or 3.11 before filing a bug on it.
> Could it be that the device removal
> was completed, but still shows as part of the array for some reason?
Yes. It might take a few minutes after the chunks are reallocated for the device
to be removed from the volume. I've had some cases where even a reboot
was needed for the information in fi sh to refresh.

> The reason for the remove is actually that I want to (gradually) replace
> the 3TB drives with 1 TB ones, and somewhere in the middle move some of
> the data of the array, to another machine, that currently has the 1 TB
> drives which I intend to replace with the 3TB ones.
Use a newer kernel for sure. What you suggest should work. If you're
testing to see if it does work, and you're prepared for it not working
(i.e. totally losing the entire file system) and prepared to find a consistent
reproducer if it doesn't work, then have at it.

Otherwise, create a whole new btrfs volume with recent kernel and btrfs-progs on
the other machine; and then rsync everything from old to new. Rsync has a
checksum option, it will take longer, but you can then be reasonably assured of
file integrity.
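
A minimal sketch of the copy described above. The paths and host name are placeholders, not from the thread. The options -a (archive) and --checksum are standard rsync options; --checksum makes rsync compare file contents rather than only size and modification time. The function prints the command unless DRY_RUN is set to 0, so nothing is transferred by accident:

```shell
#!/bin/sh
# Placeholder source and destination; adjust for the real machines.
SRC="${SRC:-/mnt/old/}"
DEST="${DEST:-newhost:/mnt/new/}"

# Print (default) or run an rsync that compares files by checksum.
copy_volume() {
    # Build the command in the function's own positional parameters.
    set -- rsync -a --checksum "$SRC" "$DEST"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$@"
    fi
}

copy_volume
```

Running the same rsync command a second time with -n (dry run) and -i (itemize changes) added should list few or no changes if both sides match.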


Chris Murphy

Hugo Mills

2013-Aug-31 22:20 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Sat, Aug 31, 2013 at 11:42:28AM -0600, Chris Murphy wrote:
>
> On Aug 31, 2013, at 4:12 AM, Steven Post <redalert.commander@gmail.com> wrote:
> >
> > The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
> > 3.2.46-1 x86_64).
> > 
> > Is this something known (and possibly resolved in a later version), or
> > should I open a bug report about it?
> 
> Try 3.10 or 3.11 before filing a bug on it.
   If you want a Debian-packaged kernel, they're available from the
experimental distribution.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
     --- Great oxymorons of the world, no.  5: Manifesto Promise ---

Steven Post

2013-Aug-31 23:55 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Sat, 2013-08-31 at 11:42 -0600, Chris Murphy wrote:
> On Aug 31, 2013, at 4:12 AM, Steven Post <redalert.commander@gmail.com> wrote:
> >
> > The system is running Debian Wheezy (kernel 3.2.0-4-amd64 #1 SMP Debian
> > 3.2.46-1 x86_64).
> > 
> > Is this something known (and possibly resolved in a later version), or
> > should I open a bug report about it?
> 
> Try 3.10 or 3.11 before filing a bug on it.
I don't intend on upgrading the first machine at this point, but I'll
see if I can reproduce this on the second machine, which is running
Debian Testing (Jessie); that one has a 3.10.7 kernel. Hugo Mills
suggested using a kernel from experimental, but I don't feel comfortable
running that at this point, as that would be a 3.11-rc4 kernel. I
might consider it if the 3.11 release became available in 'unstable' (I
understand that Linus might release 3.11 this weekend).
I might also consider running the 3.10 kernel from backports on the
first machine if that would be necessary for some reason, but we'll see.
> 
> > Could it be that the device removal
> > was completed, but still shows as part of the array for some reason?
> 
> Yes. It might take a few minutes after the chunks are reallocated for the
> device to be removed from the volume. I've had some cases where even a
> reboot was needed for the information in fi sh to refresh.
I see, so that might be normal behaviour. Although we're several hours
later now and there has been a reboot after the first time the "unable
to go below four drives" error appeared. I did start a balance operation
after the reboot, we'll see what that gives. Once that completes, I intend
to try removing the device again with the 'device delete' command; if that
still gives the error I'll just remove the drive from the machine and go
from there.
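
The sequence described above, as a minimal sketch. The device and mount point are the ones from the first message. 'btrfs filesystem balance' is the older spelling of the balance command; which spelling a given btrfs-progs version accepts is an assumption to check against its own help output. The helper names are made up, and commands are only printed unless RUN=1 is set:

```shell
#!/bin/sh
DEV="${DEV:-/dev/sde3}"
MNT="${MNT:-/mnt}"

# Print a command, and execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Balance first, then retry the removal, then look at the listing.
rebalance_then_delete() {
    step btrfs filesystem balance "$MNT" &&
    step btrfs device delete "$DEV" "$MNT" &&
    step btrfs filesystem show
}

rebalance_then_delete
```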
> 
> 
> > The reason for the remove is actually that I want to (gradually) replace
> > the 3TB drives with 1 TB ones, and somewhere in the middle move some of
> > the data of the array, to another machine, that currently has the 1 TB
> > drives which I intend to replace with the 3TB ones.
>
> Use a newer kernel for sure. What you suggest should work. If you're
> testing to see if it does work, and you're prepared for it not working
> (i.e. totally losing the entire file system) and prepared to find a
> consistent reproducer if it doesn't work, then have at it.
>
> Otherwise, create a whole new btrfs volume with recent kernel and
> btrfs-progs on the other machine; and then rsync everything from old to new.
> Rsync has a checksum option, it will take longer, but you can then be
> reasonably assured of file integrity.
The plan was to switch 2 or 3 of the 3TB drives with 1TB drives, then move
data using sftp (scp), and then switch the remaining drives, all this time
keeping the raid10 configuration. The exception was the first switch on
machine 2: I didn't have the capacity to remove a single drive, so I had to
mount degraded.

As I was handling the second machine (3.10.7 kernel), the filesystem
suddenly became read-only during a 'device delete missing' operation, with
a warning in /var/log/syslog (after already adding a new 3TB
device). I'll add the Call Trace from the log at the end of this message
for reference. After remounting (with -o degraded again) I issued a
balance which completed successfully, then the device delete command
immediately returned and the filesystem seemed alright, with no sign
of data loss or corruption.
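
The steps on the second machine, as far as they can be read from the paragraphs above, as a minimal sketch. The device names and mount point are placeholders, not the ones actually used, and the helper names are made up. Commands are only printed unless RUN=1 is set:

```shell
#!/bin/sh
# Placeholders: a surviving array member, the new disk, the mount point.
MEMBER="${MEMBER:-/dev/sdb}"
NEWDEV="${NEWDEV:-/dev/sdX}"
MNT="${MNT:-/mnt}"

# Print a command, and execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Mount without the pulled drive, add the replacement, then drop the
# record of the absent device so its data is rebuilt elsewhere.
replace_missing_device() {
    step mount -o degraded "$MEMBER" "$MNT" &&
    step btrfs device add "$NEWDEV" "$MNT" &&
    step btrfs device delete missing "$MNT"
}

replace_missing_device
```

When the filesystem went read-only part-way through, the workaround reported here was to remount with -o degraded and run a balance before retrying the delete.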

As an aside, I'd rather not recreate the arrays if it can be done
without recreating. On the other hand we're not talking about a mission
critical system, I wouldn't use btrfs for such a system at this point,
but for home use (with backups) or testing, things seem to be in good
shape.
> 
> 
> Chris Murphy
Thanks to all who replied for your responses.

Best regards,
Steven

PS: I forgot to mention it in my first mail, but please CC me, I'm not
subscribed to the list. I'll try to check the archives to see if I
missed anything though. I see I missed 1 reply on the list, while 1
reply was sent to me directly, and a third didn't even hit the list
archives (yet?) at spinics.net.

PPS: sorry if I seem to be rambling on a bit about everything in a
non-structured e-mail message.

/var/log/syslog (3.10-2-amd64 #1 SMP Debian 3.10.7-1 (2013-08-17) x86_64):
[16431.789463] btrfs: relocating block group 1573890818048 flags 65
[16456.635819] btrfs: found 3392 extents
[16459.691201] BTRFS error (device sdb) in btrfs_commit_transaction:1809: errno=-5 IO failure (Error while writing out transaction)
[16459.691207] BTRFS info (device sdb): forced readonly
[16459.691210] BTRFS warning (device sdb): Skipping commit of aborted transaction.
[16459.691212] ------------[ cut here ]------------
[16459.691252] WARNING: at /build/linux-kDQkfE/linux-3.10.7/fs/btrfs/super.c:254 __btrfs_abort_transaction+0x4a/0xbe [btrfs]()
[16459.691253] btrfs: Transaction aborted (error -5)
[16459.691254] Modules linked in: rpcsec_gss_krb5 nfsv4 nfnetlink_queue
nfnetlink nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack xt_tcpudp
ip6table_filter ip6_tables ebtable_nat ebtables iptable_filter ip_tables
xt_iprange xt_state nf_conntrack ipt_REJECT xt_mark xt_NFQUEUE x_tables
parport_pc ppdev lp parport bnep rfcomm bluetooth snd_hrtimer pci_stub
vboxpci(O) vboxnetadp(O) cpufreq_userspace cpufreq_conservative
cpufreq_powersave cpufreq_stats vboxnetflt(O) vboxdrv(O) binfmt_misc
nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache
sunrpc loop fuse joydev adt7475 hwmon_vid snd_hda_codec_realtek
snd_hda_intel coretemp snd_hda_codec snd_hwdep kvm_intel snd_pcm_oss
snd_mixer_oss kvm snd_pcm snd_page_alloc crc32c_intel snd_seq_midi
snd_seq_midi_event ghash_clmulni_intel snd_rawmidi snd_seq eeepc_wmi
iTCO_wdt asus_wmi iTCO_vendor_support sparse_keymap rfkill evdev
aesni_intel snd_seq_device snd_timer aes_x86_64 ablk_helper cryptd lrw
gf128mul glue_helper microcode pcspkr snd nouveau psmouse serio_raw
i2c_i801 mxm_wmi lpc_ich video mfd_core ttm drm_kms_helper drm mperf
i2c_algo_bit i2c_core soundcore wmi mei_me processor button mei
thermal_sys ext4 crc16 jbd2 mbcache btrfs xor zlib_deflate raid6_pq
crc32c libcrc32c dm_mod hid_generic md_mod usbhid hid sg sd_mod
crc_t10dif ata_generic xhci_hcd ehci_pci ehci_hcd pata_via ata_piix ahci
libahci usbcore usb_common libata r8169 mii scsi_mod
[16459.691308] CPU: 0 PID: 5381 Comm: btrfs Tainted: G           O 3.10-2-amd64 #1 Debian 3.10.7-1
[16459.691309] Hardware name: System manufacturer System Product Name/P8H67, BIOS 1103 08/12/2011
[16459.691311]  0000000000000000 ffffffff8103bb5f ffff8801f75039f0 00000000fffffffb
[16459.691313]  ffff8801f7503a40 ffff88005ed243b0 ffffffffa01f5500 ffffffff8103bc0a
[16459.691315]  ffffffffa01f7288 0000000000000020 ffff8801f7503a50 ffff8801f7503a10
[16459.691317] Call Trace:
[16459.691323]  [<ffffffff8103bb5f>] ? warn_slowpath_common+0x5b/0x70
[16459.691326]  [<ffffffff8103bc0a>] ? warn_slowpath_fmt+0x47/0x49
[16459.691334]  [<ffffffffa017d657>] ? __btrfs_abort_transaction+0x4a/0xbe [btrfs]
[16459.691344]  [<ffffffffa019dbbe>] ? cleanup_transaction+0x84/0x24f [btrfs]
[16459.691347]  [<ffffffff81057c67>] ? abort_exclusive_wait+0x79/0x79
[16459.691357]  [<ffffffffa019d870>] ? btrfs_commit_transaction+0x866/0x878 [btrfs]
[16459.691359]  [<ffffffff81057c67>] ? abort_exclusive_wait+0x79/0x79
[16459.691368]  [<ffffffffa019e0ae>] ? start_transaction+0x325/0x448 [btrfs]
[16459.691371]  [<ffffffff8105f669>] ? should_resched+0x5/0x23
[16459.691374]  [<ffffffff81384167>] ? mutex_lock+0xa/0x27
[16459.691384]  [<ffffffffa01d3988>] ? prepare_to_relocate+0xc2/0xd0 [btrfs]
[16459.691395]  [<ffffffffa01d7d45>] ? relocate_block_group+0x3d/0x4db [btrfs]
[16459.691404]  [<ffffffffa01d8327>] ? btrfs_relocate_block_group+0x144/0x268 [btrfs]
[16459.691415]  [<ffffffffa01b9c23>] ? btrfs_relocate_chunk.isra.59+0x50/0x3f6 [btrfs]
[16459.691421]  [<ffffffffa017e0eb>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[16459.691432]  [<ffffffffa01af0fc>] ? btrfs_get_token_64+0x76/0xc6 [btrfs]
[16459.691442]  [<ffffffffa01b19a1>] ? release_extent_buffer+0x90/0x97 [btrfs]
[16459.691452]  [<ffffffffa01bbea0>] ? btrfs_shrink_device+0x1f8/0x35e [btrfs]
[16459.691462]  [<ffffffffa01be84b>] ? btrfs_rm_device+0x2b8/0x690 [btrfs]
[16459.691472]  [<ffffffffa01c49ed>] ? btrfs_ioctl+0x8ee/0x197d [btrfs]
[16459.691474]  [<ffffffff810dee28>] ? handle_mm_fault+0x1f1/0x238
[16459.691476]  [<ffffffff81388c33>] ? __do_page_fault+0x32d/0x3cb
[16459.691479]  [<ffffffff81115f74>] ? vfs_ioctl+0x1b/0x25
[16459.691480]  [<ffffffff81116795>] ? do_vfs_ioctl+0x3e8/0x42a
[16459.691482]  [<ffffffff81116825>] ? SyS_ioctl+0x4e/0x79
[16459.691484]  [<ffffffff8138ade9>] ? system_call_fastpath+0x16/0x1b
[16459.691485] ---[ end trace 92cca53f6fe2bc37 ]---
[16459.691487] BTRFS error (device sdb) in cleanup_transaction:1449: errno=-5 IO failure
[16459.691488] delayed_refs has NO entry

Chris Murphy

2013-Sep-01 00:03 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Aug 31, 2013, at 5:55 PM, Steven Post <redalert.commander@gmail.com> wrote:
> On Sat, 2013-08-31 at 11:42 -0600, Chris Murphy wrote:
>>
>> Yes. It might take a few minutes after the chunks are reallocated for
>> the device to be removed from the volume. I've had some cases where
>> even a reboot was needed for the information in fi sh to refresh.
> 
> I see, so that might be normal behaviour.
No, I think it's expected for deleted devices to not appear in the
volume listing anymore, but with older kernels I had that experience. I
haven't tried it recently with newer kernels.
> 
> As an aside, I'd rather not recreate the arrays if it can be done
> without recreating.
It should work. But it's an experimental file system. I'd at
least make a backup if you're going to do this with the device
add/remove method.


Chris Murphy

Steven Post

2013-Sep-01 12:08 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Sat, 2013-08-31 at 18:03 -0600, Chris Murphy wrote:
> On Aug 31, 2013, at 5:55 PM, Steven Post <redalert.commander@gmail.com> wrote:
>
> > On Sat, 2013-08-31 at 11:42 -0600, Chris Murphy wrote:
> >>
> >> Yes. It might take a few minutes after the chunks are reallocated
> >> for the device to be removed from the volume. I've had some cases where
> >> even a reboot was needed for the information in fi sh to refresh.
> >
> > I see, so that might be normal behaviour.
>
> No, I think it's expected for deleted devices to not appear in the
> volume listing anymore, but with older kernels I had that experience. I
> haven't tried it recently with newer kernels.
I might have phrased that a bit incorrectly, with 'normal behaviour' I
meant it was known to do that and not cause major problems. Of course I
would expect the entry to just disappear.
I'll let you know the result of the 'device delete' operation on the
second machine (3.10.7 kernel).

Back on the 3.2 kernel, the "filesystem show" command still showed the
removed device, but still with less used space than the others after the
balance. Shutdown + physical removal + boot didn't produce any error;
since this array is used as the root filesystem I think I would have
noticed a serious problem by now. I successfully added the 1 TB drive to
the array (after partitioning).
So it seems it's just the output of the 'filesystem show' command that
is lagging behind, even after a reboot and a 20 hour idle period.
> 
> > 
> > As an aside, I'd rather not recreate the arrays if it can be done
> > without recreating.
>
> It should work. But it's an experimental file system. I'd
> at least make a backup if you're going to do this with the device
> add/remove method.
Naturally, all important files have a backup, but for the rest of the
volume I can live with the fact that it would be lost. Also if everyone
created a new filesystem instead of using device add/delete, this code
wouldn't get much testing ;)

Best regards,
Steven

Steven Post

2013-Sep-01 21:43 UTC

Re: Device delete returns "unable to go below four devices on raid10" on 5 drive setup

On Sun, 2013-09-01 at 14:08 +0200, Steven Post wrote:
> On Sat, 2013-08-31 at 18:03 -0600, Chris Murphy wrote:
> > On Aug 31, 2013, at 5:55 PM, Steven Post <redalert.commander@gmail.com> wrote:
> >
> > > On Sat, 2013-08-31 at 11:42 -0600, Chris Murphy wrote:
> > >>
> > >> Yes. It might take a few minutes after the chunks are reallocated
> > >> for the device to be removed from the volume. I've had some cases
> > >> where even a reboot was needed for the information in fi sh to refresh.
> > >
> > > I see, so that might be normal behaviour.
> >
> > No, I think it's expected for deleted devices to not appear in the
> > volume listing anymore, but with older kernels I had that experience. I
> > haven't tried it recently with newer kernels.
>
> I might have phrased that a bit incorrectly, with 'normal behaviour' I
> meant it was known to do that and not cause major problems. Of course I
> would expect the entry to just disappear.
> I'll let you know the result of the 'device delete' operation on the
> second machine (3.10.7 kernel).
The 3.10.7 kernel doesn't seem to have this issue; the removed device is
not listed anymore immediately after the 'device delete' command
returns. In that instance, though, the array still had 6 drives, not 5
as with the old one.
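
A minimal sketch of how the "is it still listed?" check could be scripted against 'btrfs filesystem show' output in the format shown in the first message. The function name is made up for illustration, and the output layout of other btrfs-progs versions may differ:

```shell
#!/bin/sh
# Hypothetical helper: succeed if device path $1 appears on a "devid"
# line of 'btrfs filesystem show' output read on stdin.
device_listed() {
    awk -v dev="$1" '
        $1 == "devid" && $NF == dev { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   if btrfs filesystem show | device_listed /dev/sde3; then
#       echo "/dev/sde3 is still listed"
#   fi
```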

[...]

Best regards,
Steven
