Summary: 3-drive raid0 btrfs volume, in a VM. There is no data at risk at all; purely a test. The volume is pretty much full. I added a larger drive /dev/sde to the existing btrfs volume; then tried to remove one of the smaller drives. I'm getting an error removing the device. It seems this should be possible, e.g. moving extents on a failing device to a replacement prior to failure.

This is the stock Fedora 18 beta kernel, 3.6.1-1.fc18.x86_64 #1 SMP Mon Oct 8 17:19:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@f18v ~]# btrfs device add /dev/sde /mnt
[root@f18v ~]# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
        Total devices 4 FS bytes used 7.52GB
        devid    4 size 12.00GB used 0.00 path /dev/sde
        devid    3 size 3.00GB used 3.00GB path /dev/sdd
        devid    2 size 3.00GB used 2.55GB path /dev/sdc
        devid    1 size 3.00GB used 3.00GB path /dev/sdb

[root@f18v ~]# btrfs fi df /mnt
Data, RAID0: total=7.61GB, used=7.51GB
System, RAID1: total=8.00MB, used=4.00KB
Metadata, RAID1: total=460.75MB, used=10.91MB

[root@f18v ~]# btrfs device delete /dev/sdb /mnt
ERROR: error removing the device '/dev/sdb' - No space left on device

[  570.379652] btrfs: relocating block group 29360128 flags 20
[  571.739126] btrfs: found 2717 extents
[  571.761061] btrfs: relocating block group 12582912 flags 1
[  571.787233] btrfs: relocating block group 4194304 flags 4
[  571.806878] btrfs: relocating block group 0 flags 2

Some time later (hours), I unmounted, remounted, and see this:

[root@f18v ~]# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
        Total devices 4 FS bytes used 7.52GB
        devid    4 size 12.00GB used 460.75MB path /dev/sde
        devid    3 size 3.00GB used 2.55GB path /dev/sdd
        devid    2 size 3.00GB used 3.00GB path /dev/sdc
        devid    1 size 3.00GB used 2.53GB path /dev/sdb

Subsequent attempts cause the same failure, but no new entries in dmesg.

If I try with a different device, there is activity, but after a few seconds there is another failure, with similar errors in dmesg:

[root@f18v ~]# btrfs device delete /dev/sdc /mnt
ERROR: error removing the device '/dev/sdc' - No space left on device

[ 2478.017828] btrfs: relocating block group 8955363328 flags 20
[ 2478.059607] btrfs: relocating block group 8686927872 flags 20
[ 2479.405142] btrfs: found 2735 extents
[ 2479.436190] btrfs: relocating block group 8663859200 flags 9
[ 2479.458680] btrfs: relocating block group 8243118080 flags 9
[ 2481.949991] btrfs: found 4 extents
[ 2482.700146] btrfs: found 4 extents
[ 2482.717422] btrfs: relocating block group 20971520 flags 18
[ 2482.733835] btrfs: found 1 extents

User error? Known bug?

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/22/2012 01:32 PM, Chris Murphy wrote:
> Summary: 3 drive raid0 btrfs volume, in a VM. There is no data at risk
> at all. Purely a test. The volume is pretty much full. I added a larger
> drive /dev/sde to the existing btrfs volume; then tried to remove one
> of the smaller drives. I'm getting an error removing the device.
[...]
> User error? Known bug?

Might be you need to balance first after adding a new disk:

  btrfs filesystem balance /mnt
On Oct 21, 2012, at 11:04 PM, dima <dolenin@parallels.com> wrote:
> Might be you need to balance first after adding a new disk
> btrfs filesystem balance /mnt

Good idea, although I'd argue this is potentially a huge problem if a large file system needed to be rebalanced (which could take days or weeks) before removing a device. It'd likely fail in a real scenario by then. But here is the attempt:

[root@f18v ~]# btrfs fi balance /mnt
ERROR: error during balancing '/mnt' - No space left on device
There may be more info in syslog - try dmesg | tail

[ 5860.532696] btrfs: relocating block group 11070865408 flags 18
[ 5860.557268] btrfs: found 1 extents
[ 5860.573295] btrfs: relocating block group 10996416512 flags 20
[ 5860.590275] btrfs: relocating block group 10694426624 flags 20
[ 5860.616739] btrfs: relocating block group 10576199680 flags 20
[ 5860.901766] btrfs: found 2690 extents
[ 5860.923053] btrfs: relocating block group 9706930176 flags 9
[ 5863.197821] btrfs: found 4 extents
[ 5864.066177] btrfs: found 4 extents
[ 5864.086322] btrfs: 8 enospc errors during balance

Chris Murphy
On Oct 21, 2012, at 10:32 PM, Chris Murphy <lists@colorremedies.com> wrote:
> This is stock Fedora 18 beta kernel, 3.6.1-1.fc18.x86_64 #1 SMP Mon Oct 8 17:19:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Probably not a good idea to omit that this is a beta *test candidate*, not a beta.

Two things that make this possibly not realistic:

1. The virtual disks are obviously very small: 3GB each, with the 4th one only 12GB.

2. The original 3-device volume was ~97% full with a single large file prior to adding the 4th device. Approximately 313MB of free space remained on the volume.

Chris Murphy
On Mon, Oct 22, 2012 at 12:02:08AM -0600, Chris Murphy wrote:
> 1. The virtual disks are obviously very small, 3GB each with the 4th one only 12GB.
>
> 2. The original 3 device volume was ~97% full with a single large file prior to adding the 4th device. Approximately 313MB free space remained on the volume.

I'm not entirely sure what's going on here(*), but it looks like an awkward interaction between the unequal sizes of the devices, the fact that three of them are very small, and the RAID-0/RAID-1 on data/metadata respectively.

You can't relocate any of the data chunks, because RAID-0 requires stripes on at least two devices, and all your data chunks are more than 50% full, so it can't put one 0.55 GiB stripe on the big disk and one 0.55 GiB stripe on the remaining space on the small disk, which is the only way it could proceed.

You _may_ be able to get some more success by changing the data to "single":

   # btrfs balance start -dconvert=single /mountpoint

You may also possibly be able to reclaim some metadata space with:

   # btrfs balance start -m /mountpoint

but I think that's unlikely.

   Hugo.

(*) It may be an as-yet-undiscovered reservation problem, in which case you get to see Josef scream loudly and hide under his desk, gibbering.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
          --- If it ain't broke, hit it again. ---
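Hugo's constraint can be sketched as a toy allocation model (plain Python, not btrfs code; the per-device free-space figures are rough numbers from this thread, and the ~1.1 GiB chunk with two 0.55 GiB stripes is taken from his example):

```python
# Toy model of why the relocation fails with ENOSPC: a RAID-0 chunk needs a
# stripe of free space on at least two different devices. With three full
# 3 GB disks plus one big empty disk, only one device has room, so every
# relocation attempt fails.

def can_relocate_raid0(free_per_device, chunk_size):
    """True if a RAID-0 chunk of chunk_size can be allocated, i.e. at
    least two devices can each hold one stripe (chunk_size / 2)."""
    stripe = chunk_size / 2
    donors = [f for f in free_per_device if f >= stripe]
    return len(donors) >= 2

# Approximate unallocated GB per device in the thread: sdb/sdd are full,
# sdc has ~0.45 GB free, sde (12 GB) is empty.
before = [0.0, 0.45, 0.0, 12.0]        # only sde has room for a stripe
after = [0.0, 0.45, 0.0, 12.0, 12.0]   # plus a second 12 GB disk (sdf)

print(can_relocate_raid0(before, 1.1))  # False -> "No space left on device"
print(can_relocate_raid0(after, 1.1))   # True  -> delete succeeds
```

This also matches what happens later in the thread: adding a second 12 GB disk (sdf) gives the allocator two devices with free space, and the device delete then succeeds.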
Thanks for the response Hugo,

On Oct 22, 2012, at 3:19 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> I'm not entirely sure what's going on here(*), but it looks like an
> awkward interaction between the unequal sizes of the devices, the fact
> that three of them are very small, and the RAID-0/RAID-1 on
> data/metadata respectively.

I'm fine accepting that the devices are very small and the original file system was packed completely full, to the point that this is effectively sabotage. The idea was merely to test how a full (I was aiming more for 90%, not 97%, oops) volume handles being migrated to a replacement disk, which I think for a typical user would be larger, not the same size, knowing in advance that not all of the space on the new disk is usable. And I was doing it at a one-order-of-magnitude reduced scale for space considerations.

> You can't relocate any of the data chunks, because RAID-0 requires
> at least two chunks, and all your data chunks are more than 50% full,
> so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
> chunk on the remaining space on the small disk, which is the only way
> it could proceed.

Interesting. So the way "device delete" moves extents is not at all similar to how LVM pvmove moves extents, which is unidirectional (away from the device being demoted). My, seemingly flawed, expectation was that "device delete" would cause extents on the deleted device to be moved to the newly added disk.

If I add yet another 12GB virtual disk, sdf, and then attempt a delete, it works, no errors. Result:

[root@f18v ~]# btrfs device delete /dev/sdb /mnt
[root@f18v ~]# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
        Total devices 5 FS bytes used 7.52GB
        devid    5 size 12.00GB used 4.17GB path /dev/sdf
        devid    4 size 12.00GB used 4.62GB path /dev/sde
        devid    3 size 3.00GB used 2.68GB path /dev/sdd
        devid    2 size 3.00GB used 2.68GB path /dev/sdc
        *** Some devices missing

However, I think that last line is a bug. When I

[root@f18v ~]# btrfs device delete missing /mnt

I get

[ 2152.257163] btrfs: no missing devices found to remove

So they're missing but not missing?

> btrfs balance start -dconvert=single /mountpoint

Yeah, that's perhaps a better starting point for many regular Joe users setting up a multiple-device btrfs volume, in particular where different-sized disks can be anticipated.

Chris Murphy
On 2012-10-22 18:42, Chris Murphy wrote:
> [root@f18v ~]# btrfs fi show
> failed to read /dev/sr0
> Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
>         Total devices 5 FS bytes used 7.52GB
>         devid    5 size 12.00GB used 4.17GB path /dev/sdf
>         devid    4 size 12.00GB used 4.62GB path /dev/sde
>         devid    3 size 3.00GB used 2.68GB path /dev/sdd
>         devid    2 size 3.00GB used 2.68GB path /dev/sdc
>         *** Some devices missing
>
> However, I think that last line is a bug.

Which version of the "btrfs" tool are you using? There was a bug on this. Try the latest.

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
On Mon, Oct 22, 2012 at 10:42:18AM -0600, Chris Murphy wrote:
> Interesting. So the way "device delete" moves extents is not at all
> similar to how LVM pvmove moves extents, which is unidirectional (away
> from the device being demoted). My, seemingly flawed, expectation was
> that "device delete" would cause extents on the deleted device to be
> moved to the newly added disk.

It's more like a balance, which moves everything that has some (part of its) existence on a device. So when you have RAID-0 or RAID-1 data, all of the related chunks on other disks get moved too (so in RAID-1, it's the mirror chunk as well as the chunk on the removed disk that gets rewritten).

> If I add yet another 12GB virtual disk, sdf, and then attempt a
> delete, it works, no errors.
[...]
> [root@f18v ~]# btrfs device delete missing /mnt
>
> I get
>
> [ 2152.257163] btrfs: no missing devices found to remove
>
> So they're missing but not missing?

If you run sync, or wait for 30 seconds, you'll find that fi show shows the correct information again -- btrfs fi show reads the superblocks directly, and if you run it immediately after the dev del, they've not been flushed back to disk yet.

> > btrfs balance start -dconvert=single /mountpoint
>
> Yeah that's perhaps a better starting point for many regular Joe
> users setting up a multiple device btrfs volume, in particular where
> different sized disks can be anticipated.

I think we should probably default to single on multi-device filesystems, not RAID-0, as this kind of problem bites a lot of people, particularly when trying to drop the second disk in a pair.

In a similar vein, I'd suggest that an automatic downgrade from RAID-1 to DUP metadata on removing one device from a 2-device array should also be done, but I suspect there are some good reasons for not doing that that I've not thought of. This has also bitten a lot of people in the past.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- "There's more than one way to do it" is not a commandment. It ---
                        is a dire warning.
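Hugo's "it's more like a balance" point, contrasted with pvmove, can be illustrated with a toy model (plain Python, not btrfs internals; the chunk layout below is invented purely for illustration):

```python
# Toy illustration: removing a device rewrites every chunk that has any
# stripe or mirror on it -- unlike LVM pvmove, which copies only the
# extents that physically live on the outgoing PV.

def chunks_to_rewrite(chunks, removed_dev):
    """chunks: list of (chunk_id, [devices holding a stripe/mirror]).
    A chunk is rewritten in full if any part of it is on removed_dev."""
    return [cid for cid, devs in chunks if removed_dev in devs]

chunks = [
    ("data-raid0-A", ["sdb", "sdc"]),  # one stripe on sdb: whole chunk moves
    ("data-raid0-B", ["sdc", "sdd"]),  # no part on sdb: untouched
    ("meta-raid1-C", ["sdb", "sdd"]),  # mirror pair: both copies rewritten
]
print(chunks_to_rewrite(chunks, "sdb"))  # ['data-raid0-A', 'meta-raid1-C']
```

So the data moved is not unidirectional: chunks that merely share a stripe or mirror with the outgoing device get rewritten wherever the allocator finds room.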
On Oct 22, 2012, at 11:04 AM, Goffredo Baroncelli <kreijack@gmail.com> wrote:
> Which version of "btrfs" tool are you using? There was a bug on this.
> Try the latest.

No idea.

On Oct 22, 2012, at 11:18 AM, Hugo Mills <hugo@carfax.org.uk> wrote:
> It's more like a balance which moves everything that has some (part
> of its) existence on a device. So when you have RAID-0 or RAID-1 data,
> all of the related chunks on other disks get moved too (so in RAID-1,
> it's the mirror chunk as well as the chunk on the removed disk that
> gets rewritten).

Does this mean "device delete" depends on an ability to make writes to the device being removed? I immediately think of SSD failures, which seem to fail writing while still being able to reliably read. Would that behavior inhibit the ability to remove the device from the volume?

>> [ 2152.257163] btrfs: no missing devices found to remove
>>
>> So they're missing but not missing?
>
> If you run sync, or wait for 30 seconds, you'll find that fi show
> shows the correct information again -- btrfs fi show reads the
> superblocks directly, and if you run it immediately after the dev del,
> they've not been flushed back to disk yet.

Even after an hour, btrfs fi show says there are missing devices. After mkfs.btrfs on that "missing" device, 'btrfs fi show' no longer shows the missing device message.

> I think we should probably default to single on multi-device
> filesystems, not RAID-0, as this kind of problem bites a lot of
> people, particularly when trying to drop the second disk in a pair.

I'm not thinking of an obvious advantage raid0 has over single other than performance. It seems the more common general-purpose use case is better served by single, especially given the likelihood of volumes being grown with arbitrary drive capacities.

I found this [1] thread discussing a case where a -d single volume is upgraded to the raid0 profile. I'm not finding this to be the case when trying it today. mkfs.btrfs on 1 drive, then adding a 2nd drive, produces:

Data: total=8.00MB, used=128.00KB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=409.56MB, used=24.00KB
Metadata: total=8.00MB, used=0.00

This appears to retain the single profile. Is this expected at this point? What I find a bit problematic is that the metadata is still DUP rather than being automatically upgraded to raid1.

What is the likelihood of a mkfs.btrfs 2+ device change in the default data profile from raid0 to single?

[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16278
On Mon, Oct 22, 2012 at 01:36:31PM -0600, Chris Murphy wrote:
> Does this mean "device delete" depends on an ability to make writes
> to the device being removed? I immediately think of SSD failures,
> which seem to fail writing, while still being able to reliably read.
> Would that behavior inhibit the ability to remove the device from
> the volume?

No, the device being removed isn't modified at all. (Which causes its own set of weird problemettes, but I think most of those have gone away.)

> Even after an hour, btrfs fi show says there are missing devices.
> After mkfs.btrfs on that "missing" device, 'btrfs fi show' no longer
> shows the missing device message.

Hmm. Someone had this on IRC yesterday. It sounds like something's not properly destroying the superblock(s) on the removed device.

> I'm not thinking of an obvious advantage raid0 has over single other
> than performance. It seems the more common general purpose use case
> is better served by single, especially the likelihood of volumes
> being grown with arbitrary drive capacities.

Indeed.

> I found this [1] thread discussing a case where a -d single volume
> is upgraded to the raid0 profile. I'm not finding this to be the
> case when trying it today. mkfs.btrfs on 1 drive, then adding a 2nd
> drive, produces:
>
> Data: total=8.00MB, used=128.00KB
> System, DUP: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=409.56MB, used=24.00KB
> Metadata: total=8.00MB, used=0.00
>
> This appears to retain the single profile. This is expected at this
> point? What I find a bit problematic is that metadata is still DUP
> rather than being automatically upgraded to raid1.

Yes, the automatic single -> RAID-0 upgrade was fixed. If you haven't run a balance on (at least) the metadata after adding the new device, then you won't get the DUP -> RAID-1 upgrade on metadata. (I can tell you haven't run the balance, because you still have the empty single metadata chunk.)

> What is the likelihood of a mkfs.btrfs 2+ device change in the
> default data profile from raid0 to single?

Non-zero. I think it mostly just wants someone to write the patch, and then beat off any resulting bikeshedding. :)

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I spent most of my money on drink, women and fast cars. The ---
                        rest I wasted. -- James Hunt
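Hugo's point (an existing chunk keeps the profile it was allocated with; adding a device changes nothing until a balance rewrites the chunks) can be sketched as a toy model (plain Python, not btrfs code; the convert mapping only loosely mirrors the balance convert filters mentioned earlier in the thread):

```python
# Toy model: chunk profiles are fixed at allocation time. "device add"
# leaves them alone; only a balance rewrites existing chunks, and only a
# convert target changes their profile.

def balance(chunks, convert=None):
    """Rewrite every (kind, profile) chunk; chunks whose kind appears in
    the convert mapping take the new profile, others keep their own."""
    return [(kind, (convert or {}).get(kind, profile))
            for kind, profile in chunks]

# After mkfs on one disk, then 'device add' of a second disk, the
# chunks still carry their single-disk profiles:
chunks = [("data", "single"), ("metadata", "DUP")]

# No balance: nothing changes.
print(balance(chunks))                               # profiles unchanged

# Balance with a metadata convert target upgrades DUP to RAID1:
print(balance(chunks, convert={"metadata": "RAID1"}))
```

This is why the fi df output above still shows DUP metadata and an empty leftover single metadata chunk: nothing has rewritten the old chunks yet.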
On 2012-10-22 21:50, Hugo Mills wrote:
>> Does this mean "device delete" depends on an ability to make
>> writes to the device being removed? I immediately think of SSD
>> failures, which seem to fail writing, while still being able to
>> reliably read. Would that behavior inhibit the ability to
>> remove the device from the volume?
>
> No, the device being removed isn't modified at all. (Which causes
> its own set of weird problemettes, but I think most of those have
> gone away.)

IIRC, when a device is deleted, the 1st superblock is zeroed. Moreover, btrfs needs to be able to read the device in order to delete it. Of course these rules aren't applied when a device is classified as "missing".

See the function btrfs_rm_device() in fs/btrfs/volumes.c for the details.

BR
G.Baroncelli
On Oct 22, 2012, at 1:50 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> Yes, the automatic single -> RAID-0 upgrade was fixed. If you
> haven't run a balance on (at least) the metadata after adding the new
> device, then you won't get the DUP -> RAID-1 upgrade on metadata. (I
> can tell you haven't run the balance, because you still have the empty
> single metadata chunk.)

I think it will be common for a balance of either type not to occur to most users. So if there's a way to automatically get a metadata-only balance, such that metadata goes from dup to raid1, that would be nice.

>> What is the likelihood of a mkfs.btrfs 2+ device change in the
>> default data profile from raid0 to single?
>
> Non-zero. I think it mostly just wants someone to write the patch,
> and then beat off any resulting bikeshedding. :)

I'm curious about the negatives of -d single by default. Maybe this warrants a separate thread for discussion? As for the ensuing input for additional features once going down that path, the only one I'm coming up with is already mentioned above. Simple make options or additional steps go away for "most" people - whoever they are.

Chris Murphy
On 22 Oct 2012 18:18 +0100, from hugo@carfax.org.uk (Hugo Mills):
>> [root@f18v ~]# btrfs device delete missing /mnt
>>
>> I get
>>
>> [ 2152.257163] btrfs: no missing devices found to remove
>>
>> So they're missing but not missing?
>
> If you run sync, or wait for 30 seconds, you'll find that fi show
> shows the correct information again -- btrfs fi show reads the
> superblocks directly, and if you run it immediately after the dev del,
> they've not been flushed back to disk yet.

That sounds like it has the potential to bite a lot of people in the rear. Yes, 30 seconds or a sync is trivial, but only if you know about it.

Considering that a device delete is a pretty rare but potentially important operation, would it not be better for a sync to be done automatically after a "device delete" command? And potentially after others in a similar vein. With an option --no-sync or similar to disable the behavior (in the relatively unlikely situation that multiple devices are unavailable and need to be deleted, for example).

I can definitely see the described behavior qualifying as a "WTF?" moment.

--
Michael Kjörling • http://michael.kjorling.se • michael@kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)
On 2012-10-23 09:57, Michael Kjörling wrote:
> Considering that a device delete is a pretty rare but potentially
> important operation, would it not be better for a sync to be done
> automatically after a "device delete" command?
[...]
> I can definitely see the described behavior qualifying as a "WTF?"
> moment.

IIRC Chris [Mason] said that when a disk is removed, the super-block (signature) is zeroed and a sync is issued. Looking at the code confirms it; from fs/btrfs/volumes.c:

int btrfs_rm_device(struct btrfs_root *root, char *device_path)
{
	[...]
	/*
	 * at this point, the device is zero sized. We want to
	 * remove it from the devices list and zero out the old super
	 */
	if (clear_super) {
		/* make sure this device isn't detected as part of
		 * the FS anymore */
		memset(&disk_super->magic, 0, sizeof(disk_super->magic));
		set_buffer_dirty(bh);
		sync_dirty_buffer(bh);
	}
	[...]

I think that what Chris [Murphy] reported is a bug of the btrfs user-space program (which is corrected in the latest git). Unfortunately we don't know which version Chris is using, so we cannot reach a conclusion (whether the bug was corrected, or it is a new bug).

BR
G.Baroncelli
On Oct 23, 2012, at 12:10 PM, Goffredo Baroncelli <kreijack@gmail.com> wrote:
> I think that what Chris [Murphy] reported is a bug of the btrfs
> user-space program (which is corrected in the latest git).
> Unfortunately we don't know which version Chris is using, so we cannot
> reach a conclusion (whether the bug was corrected, or it is a new bug).

btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18.x86_64

Chris Murphy
On 2012-10-23 20:17, Chris Murphy wrote:
> btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18.x86_64

Definitely this version contains my patch.

Are you able to reproduce this bug (the missing device warning)? If so, could you be so kind as to test Stefan's tool [*] to dump the super-blocks of the involved disks?

Thanks
G.Baroncelli

[*] See http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg19572.html
On Oct 23, 2012, at 1:02 PM, Goffredo Baroncelli <kreijack@inwind.it> wrote:

> On 2012-10-23 20:17, Chris Murphy wrote:
>>
>> btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18.x86_64
>
> This version definitely contains my patch.
>
> Are you able to reproduce this bug (the missing device warning)?

Yes. 100%. It's there hours later.

> If so, could you please be so kind as to test Stefan's tool [*] to dump
> the super-blocks of the involved disks?

I would, but I haven't the first clue what to do with this:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg19572.html

This happens even with no data, so I can dd the entire missing device into
a sparse file: dd to img, cp --sparse, tar with -Sj. The 80GB /dev/sdb
becomes 2.5K. It's here now:
https://dl.dropbox.com/u/3253801/devsdb_image_sparse.img.tar.bz2

And by the time I've done all that, the "*** Some devices missing" message
is still present with 'btrfs fi show'.

Or, since it's a VM that I don't care about at all, I can give you login info.

Chris Murphy
Hi Chris,

I was able to reproduce (partially) your behaviour.

First I created three 3GB disks. I formatted them, then I filled them with

  $ dd if=/dev/zero of=/mnt/btrfs1/bigfile bs=1M count=$((7*1024))

Then I got

  $ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi sh
  Label: 'test1'  uuid: 7ba72d6f-d226-4e8c-9a9c-92a7fd89cd99
          Total devices 3 FS bytes used 7.01GB
          devid    3 size 3.00GB used 3.00GB path /dev/vdd
          devid    2 size 3.00GB used 2.55GB path /dev/vdc
          devid    1 size 3.00GB used 3.00GB path /dev/vdb

Then I added a 12GB device, and then I did

  $ sudo btrfs device del /dev/vdb /mnt/btrfs1

After a bit of time I *succeeded* in removing the disk. Then I did:

  $ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi sh
  Label: 'test1'  uuid: 7ba72d6f-d226-4e8c-9a9c-92a7fd89cd99
          Total devices 4 FS bytes used 7.01GB
          devid    4 size 12.00GB used 3.21GB path /dev/vdf
          devid    3 size 3.00GB used 2.53GB path /dev/vdd
          devid    2 size 3.00GB used 2.95GB path /dev/vdc
          *** Some devices missing

Which is what you are reporting.

Further analysis revealed that:

1) The /dev/vdb superblock signature really is zeroed by btrfs after the
   removal. In fact the disk /dev/vdb disappeared from the list above
   because it is considered not valid.
2) The other superblocks aren't reset to the new setup after the removal.
   Every superblock still stores the (old) value of 4 as the number of
   disks.
3) *If I don't touch* the filesystem, the situation above doesn't change.
4) If I touch the filesystem (e.g. by creating a file, or by unmounting
   it) then the superblocks are updated and I get

  $ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi show
  Label: 'test1'  uuid: 7ba72d6f-d226-4e8c-9a9c-92a7fd89cd99
          Total devices 3 FS bytes used 7.01GB
          devid    4 size 12.00GB used 3.21GB path /dev/vdf
          devid    3 size 3.00GB used 2.53GB path /dev/vdd
          devid    2 size 3.00GB used 2.95GB path /dev/vdc

Conclusion:

- I was not able to reproduce your problem about removing the device. I
  was able to remove the device after filling the filesystem and adding a
  new device.
- After removing a device, its superblock is zeroed. However the other
  superblocks aren't updated.
- To update the other superblocks, you need to either a) write something
  to the filesystem [and wait for the 30s commit timeout] or b) unmount
  the filesystem. If you don't do that, the superblocks still contain the
  old value.

If this is confirmed, it may be considered a bug, because after a power
failure the filesystem would store incorrect information about the number
of disks which compose it.

BR
G.Baroncelli

On 2012-10-23 22:28, Chris Murphy wrote:

> On Oct 23, 2012, at 1:02 PM, Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
>> On 2012-10-23 20:17, Chris Murphy wrote:
>>>
>>> btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18.x86_64
>>
>> This version definitely contains my patch.
>>
>> Are you able to reproduce this bug (the missing device warning)?
>
> Yes. 100%. It's there hours later.
>
>> If so, could you please be so kind as to test Stefan's tool [*] to
>> dump the super-blocks of the involved disks?
>
> I would, but I haven't the first clue what to do with this:
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg19572.html
>
> This happens even with no data, so I can dd the entire missing device
> into a sparse file: dd to img, cp --sparse, tar with -Sj. The 80GB
> /dev/sdb becomes 2.5K. It's here now:
> https://dl.dropbox.com/u/3253801/devsdb_image_sparse.img.tar.bz2
>
> And by the time I've done all that, the "*** Some devices missing"
> message is still present with 'btrfs fi show'.
>
> Or, since it's a VM that I don't care about at all, I can give you
> login info.
> Chris Murphy
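The "touch the filesystem or unmount it" finding above suggests forcing a commit right after the delete. The following is only a sketch of that idea, not an endorsed procedure: the device name and mount point are just the examples from this thread, and with the default `DRY_RUN=1` the commands are printed rather than executed, since the real ones need root and a mounted btrfs.

```shell
#!/bin/sh
# Sketch: force a commit immediately after `btrfs device delete`, so the
# remaining superblocks are rewritten now instead of at the next write or
# unmount. /dev/sdb and /mnt are example names from this thread.
# With DRY_RUN=1 (the default here) each command is only printed.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

run btrfs device delete /dev/sdb /mnt
run btrfs filesystem sync /mnt   # forces the commit; a plain `sync` or an
                                 # unmount also works, per the thread
run btrfs filesystem show        # "Some devices missing" should be gone
```

Set `DRY_RUN=0` only on a throwaway filesystem; whether `btrfs filesystem sync` is sufficient on this kernel is exactly what the thread is probing.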
On Oct 23, 2012, at 4:16 PM, Goffredo Baroncelli <kreijack@inwind.it> wrote:

> $ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi sh
> Label: 'test1'  uuid: 7ba72d6f-d226-4e8c-9a9c-92a7fd89cd99
>         Total devices 4 FS bytes used 7.01GB
>         devid    4 size 12.00GB used 3.21GB path /dev/vdf
>         devid    3 size 3.00GB used 2.53GB path /dev/vdd
>         devid    2 size 3.00GB used 2.95GB path /dev/vdc
>         *** Some devices missing
>
> Which is what you are reporting.

Yes. However I can reproduce this much more easily as:

[root@f18v ~]# mkfs.btrfs /dev/sd[bc]
[root@f18v ~]# mount /dev/sdb /mnt
[root@f18v ~]# btrfs device add /dev/sdd /mnt
[root@f18v ~]# btrfs device delete /dev/sdb /mnt
[root@f18v ~]# btrfs fi show
Label: none  uuid: 0daeada5-98c0-4a9a-8d0c-5a9dcfde2972
        Total devices 3 FS bytes used 796.00KB
        devid    3 size 80.00GB used 6.06GB path /dev/sdd
        devid    2 size 80.00GB used 6.06GB path /dev/sdc
        *** Some devices missing

> 4) If I touch the filesystem (e.g. by creating a file, or by unmounting
>    it) then the superblocks are updated

Confirmed.

> Conclusion:
> - I was not able to reproduce your problem about removing the device.
>   I was able to remove the device after filling the filesystem and
>   adding a new device.
>
> dd if=/dev/zero of=/mnt/btrfs1/bigfile bs=1M count=$((7*1024))

I think it needs to be bigger. I was at a bit over 8GB file size for a 9GB
file system (3x 3GB drives). There was about 300MB of free space left
according to df -h, which was for the whole volume, i.e. maybe around
100MB of free space per device, and hence possibly not enough room to
budge unless I added yet another drive. Then it was able to back out.

Chris Murphy
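As a side note, the stale-superblock state is easy to spot mechanically: `btrfs fi show` still reports the old "Total devices" count while listing fewer `devid` lines. This is only a sketch of my own (scraping `btrfs fi show` text is an assumption, not a supported interface); the sample fed in below is the output just shown.

```shell
#!/bin/sh
# Sketch: detect the stale-superblock state by comparing the "Total
# devices" count from `btrfs fi show` with the number of devid lines
# actually listed. In real use: btrfs fi show | check_device_count
check_device_count() {
    input=$(cat)
    total=$(printf '%s\n' "$input" | awk '/Total devices/ { print $3 }')
    listed=$(printf '%s\n' "$input" | grep -c devid)
    if [ "$total" -ne "$listed" ]; then
        echo "MISMATCH: superblock says $total devices, $listed listed"
    else
        echo "OK: $total devices"
    fi
}

# Sample output captured from this message (device already deleted,
# remaining superblocks not yet rewritten):
check_device_count <<'EOF'
Label: none  uuid: 0daeada5-98c0-4a9a-8d0c-5a9dcfde2972
        Total devices 3 FS bytes used 796.00KB
        devid    3 size 80.00GB used 6.06GB path /dev/sdd
        devid    2 size 80.00GB used 6.06GB path /dev/sdc
        *** Some devices missing
EOF
# prints: MISMATCH: superblock says 3 devices, 2 listed
```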
Goffredo Baroncelli
2012-Oct-24 18:06 UTC
Re: device delete, error removing device [SOLVED]
On 2012-10-24 00:29, Chris Murphy wrote:

> I think it needs to be bigger. I was at a bit over 8GB file size for a
> 9GB file system (3x 3GB drives). There was about 300MB of free space
> left according to df -h, which was for the whole volume, i.e. maybe
> around 100MB of free space per device, and hence possibly not enough
> room to budge unless I added yet another drive. Then it was able to
> back out.

I was able to reproduce it:

- I filled the filesystem until I got "No space left on device".
- Then I added a new device -> success
- I balanced -> success

The status is:

  ghigo@emulato:~$ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi show
  Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
          Total devices 4 FS bytes used 8.29GB
          devid    4 size 12.00GB used 3.35GB path /dev/vdf
          devid    3 size 3.00GB used 2.53GB path /dev/vdd
          devid    2 size 3.00GB used 2.56GB path /dev/vdc
          devid    1 size 3.00GB used 2.55GB path /dev/vdb

Note the used space. Then I tried to remove /dev/vdb, but I got

  $ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs dev del /dev/vdb /mnt/btrfs1/
  ERROR: error removing the device '/dev/vdb' - No space left on device

The interesting thing is:

  ghigo@emulato:~$ sudo /mnt/home-ghigo/btrfs/btrfs-progs/btrfs fi show
  Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
          Total devices 4 FS bytes used 7.63GB
          devid    4 size 12.00GB used 3.48GB path /dev/vdf
          devid    3 size 3.00GB used 3.00GB path /dev/vdd
          devid    2 size 3.00GB used 3.00GB path /dev/vdc
          devid    1 size 3.00GB used 2.55GB path /dev/vdb

So it seems that it spread the data to the other disks, filling up the
smaller ones, and then got stuck with "No space left on device".
Now I rebalanced with -dconvert=single, as suggested by Hugo, and then I
was able to remove the disk:

  Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
          Total devices 3 FS bytes used 7.63GB
          devid    4 size 12.00GB used 9.48GB path /dev/vdf
          devid    3 size 3.00GB used 492.94MB path /dev/vdd
          devid    2 size 3.00GB used 64.00MB path /dev/vdc

GB
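For what it's worth, a naive free-space sum shows why the ENOSPC here is surprising. The sketch below is entirely my own (parsing `btrfs fi show` text is an assumption, not a stable interface, and `slack_check` is a made-up name); it adds up the slack on the surviving devices and compares it with what the victim's extents need, using the numbers from the failed delete above.

```shell
#!/bin/sh
# Sketch (plain POSIX awk assumed): compare the space the victim device's
# extents need against the slack left on the other devices, from
# `btrfs fi show` text. This naive sum ignores raid stripe constraints,
# which is exactly the blind spot the failure above exposes.
slack_check() {  # usage: slack_check /dev/vdb < fi-show-output
    awk -v victim="$1" '
        # gb(): numeric prefix of "3.00GB"/"492.94MB", normalised to GB
        function gb(s,  v) { v = s + 0; if (s ~ /MB$/) v /= 1024; return v }
        $1 == "devid" {
            if ($NF == victim) vused = gb($6)          # space to relocate
            else               slack += gb($4) - gb($6) # room elsewhere
        }
        END { printf "need %.2fGB, slack %.2fGB -> %s\n",
              vused, slack, (slack >= vused ? "may fit" : "no space") }'
}

# Numbers copied from the failed delete in this message:
slack_check /dev/vdb <<'EOF'
Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
        Total devices 4 FS bytes used 7.63GB
        devid    4 size 12.00GB used 3.48GB path /dev/vdf
        devid    3 size 3.00GB used 3.00GB path /dev/vdd
        devid    2 size 3.00GB used 3.00GB path /dev/vdc
        devid    1 size 3.00GB used 2.55GB path /dev/vdb
EOF
# prints: need 2.55GB, slack 8.52GB -> may fit
```

The heuristic says "may fit", yet the delete fails: presumably the relocated raid0 block groups must stripe across more than one device, and with the 3GB drives completely full there is no second device with room, which would also explain why converting to single makes the delete succeed.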
On Oct 24, 2012, at 12:06 PM, Goffredo Baroncelli <kreijack@gmail.com> wrote:

> I was able to reproduce it:
>
> - I filled the filesystem until I got "No space left on device".

I didn't even need to get that far.

> So it seems that it spread the data to the other disks, filling up the
> smaller ones, and then got stuck with "No space left on device".
>
> Now I rebalanced with -dconvert=single, as suggested by Hugo, and then
> I was able to remove the disk:
>
> Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
>         Total devices 3 FS bytes used 7.63GB
>         devid    4 size 12.00GB used 9.48GB path /dev/vdf
>         devid    3 size 3.00GB used 492.94MB path /dev/vdd
>         devid    2 size 3.00GB used 64.00MB path /dev/vdc

It's an interesting solution, but difficult for a larger file system. Or
at least, it could be very time consuming.

Aside from the "no space left" problem, the 'device delete' behavior
itself has kind of a high penalty: on a successful 'device delete' on a
five-disk raid10 (one disk was added in advance of the delete), all disks
are significantly written to, not merely a reconstruction of the replaced
disk. It means a lot of writing to do disk removals in the face of an
impending disk failure.

Chris Murphy
Goffredo Baroncelli
2012-Oct-24 21:30 UTC
Re: device delete, error removing device [SOLVED]
On 2012-10-24 21:13, Chris Murphy wrote:

> On Oct 24, 2012, at 12:06 PM, Goffredo Baroncelli <kreijack@gmail.com> wrote:
>>
>> I was able to reproduce it:
>>
>> - I filled the filesystem until I got "No space left on device".
>
> I didn't even need to get that far.
>
>> So it seems that it spread the data to the other disks, filling up the
>> smaller ones, and then got stuck with "No space left on device".
>>
>> Now I rebalanced with -dconvert=single, as suggested by Hugo, and then
>> I was able to remove the disk:
>>
>> Label: 'test2'  uuid: 11d0f1a8-2770-4ff2-8df5-f772f1056edc
>>         Total devices 3 FS bytes used 7.63GB
>>         devid    4 size 12.00GB used 9.48GB path /dev/vdf
>>         devid    3 size 3.00GB used 492.94MB path /dev/vdd
>>         devid    2 size 3.00GB used 64.00MB path /dev/vdc
>
> It's an interesting solution, but difficult for a larger file system.
> Or at least, it could be very time consuming.

It is not a solution but a workaround.

> Aside from the "no space left" problem, the 'device delete' behavior
> itself has kind of a high penalty: on a successful 'device delete' on a
> five-disk raid10 (one disk was added in advance of the delete), all
> disks are significantly written to, not merely a reconstruction of the
> replaced disk. It means a lot of writing to do disk removals in the
> face of an impending disk failure.

I am not saying that this is the right solution; I am saying that it is
the only solution available now :-(

However, this page

  https://btrfs.wiki.kernel.org/index.php/Project_ideas#Drive_swapping

states that someone is working on this kind of issue.
G.Baroncelli

> Chris Murphy
On Oct 24, 2012, at 3:30 PM, Goffredo Baroncelli <kreijack@gmail.com> wrote:

> On 2012-10-24 21:13, Chris Murphy wrote:
>>
>> It's an interesting solution, but difficult for a larger file system.
>> Or at least, it could be very time consuming.
>
> It is not a solution but a workaround.

Understood.

>> Aside from the "no space left" problem, the 'device delete' behavior
>> itself has kind of a high penalty: on a successful 'device delete' on
>> a five-disk raid10 (one disk was added in advance of the delete), all
>> disks are significantly written to, not merely a reconstruction of the
>> replaced disk. It means a lot of writing to do disk removals in the
>> face of an impending disk failure.
>
> I am not saying that this is the right solution; I am saying that it is
> the only solution available now :-(

Fair enough.

> However, this page
>
>   https://btrfs.wiki.kernel.org/index.php/Project_ideas#Drive_swapping
>
> states that someone is working on this kind of issue.

Yep, I see I'm merely stating what is already known, sorry about that.

Chris Murphy
Goffredo Baroncelli
2012-Oct-25 19:26 UTC
Re: device delete, error removing device [SOLVED]
On 2012-10-24 23:43, Chris Murphy wrote:

>> However, this page
>>
>>   https://btrfs.wiki.kernel.org/index.php/Project_ideas#Drive_swapping
>>
>> states that someone is working on this kind of issue.
>
> Yep, I see I'm merely stating what is already known, sorry about that.

No, please; you are really drawing attention to a hot point of btrfs. Its
ability to keep working when a disk has failed is not good enough. For
example, it is impossible to remove a disk which has disappeared (I
unmounted the filesystem, then mounted it in degraded mode)...

Btrfs is a filesystem in development, so it is OK that not everything is
working. The "complaints" raise the attention level.
3.6.3-3.fc18.x86_64.debug
btrfs-progs-0.20.rc1.20121017git91d9eec-1.fc18.x86_64

I'm getting a very different result with this kernel compared to 3.6.2
when I do the same thing. I fill the btrfs volume to 97% full again, no
errors. I add a device of the *same* size, and then device delete. In
this case, the 'device delete' command hangs with no recovery, and dmesg
from another shell reports that the file system is forced read-only. The
debug kernel produces quite a bit more information, so I'll post that
here: http://pastebin.com/8d1b6eCn

Label: 'filltest'  uuid: c0a4c7d7-7a23-4ce3-bafe-20cb92156562
        Total devices 3 FS bytes used 13.84GB
        devid    3 size 8.00GB used 19.00MB path /dev/sdd
        devid    2 size 8.00GB used 8.00GB path /dev/sdc
        devid    1 size 8.00GB used 8.00GB path /dev/sdb

[root@f18v ~]# btrfs fi df /mnt
Data, RAID0: total=13.95GB, used=13.82GB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.02GB, used=19.09MB
Metadata: total=8.00MB, used=0.00

Two minutes later I get more from dmesg, since btrfs is blocked:
http://pastebin.com/BznS3dF0

The volume can't be unmounted and the stuck process can't be killed, so I
reboot. Mounting it produces:

[   45.540143] device label filltest devid 1 transid 17 /dev/sdb
[   45.545417] btrfs: disk space caching is enabled
[   45.566326] btrfs: free space inode generation (0) did not match free space cache generation (1858)
[   45.598677] btrfs: free space inode generation (0) did not match free space cache generation (1832)
[   45.794886] btrfs: unlinked 1 orphans

Otherwise the file system seems fine. And

  btrfs balance start -dconvert=single /mnt

does eventually unwind the problem. If the scenario allows adding a 4th
device to this situation, it's faster because the balance isn't required.
Thus deleting the (hypothetically troublesome) device occurs more quickly
while also not requiring significant write capability to it.
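The `btrfs fi df` numbers above can be turned into a quick fill-ratio report, which makes the trigger condition obvious: the RAID0 data profile is roughly 99% full. This is my own sketch (`fill_report` is a made-up name, and scraping the "Profile: total=…, used=…" text is an assumption, not a stable interface):

```shell
#!/bin/sh
# Sketch: report how full each profile is, from `btrfs fi df` text of the
# "Profile: total=…, used=…" form. Units are normalised to MB so mixed
# GB/MB/KB lines compare correctly.
fill_report() {
    awk -F'[=,]' '
        # mb(): numeric prefix of "13.95GB"/"8.00MB"/"4.00KB", in MB
        function mb(s,  v) {
            v = s + 0
            if (s ~ /GB$/) v *= 1024
            else if (s ~ /KB$/) v /= 1024
            return v
        }
        /, .*total=/ {
            total = mb($3); used = mb($5)
            if (total > 0)
                printf "%-9s %5.1f%% full\n", $1 ":", 100 * used / total
        }'
}

# Numbers from the `btrfs fi df /mnt` output above:
fill_report <<'EOF'
Data, RAID0: total=13.95GB, used=13.82GB
System, RAID1: total=8.00MB, used=4.00KB
Metadata, RAID1: total=1.02GB, used=19.09MB
EOF
```

With these inputs, Data comes out around 99.1% full, Metadata around 1.8%, System near 0% — close enough to full that a raid0 device delete, which must relocate extents into new chunks, has almost nowhere to put them.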
Chris Murphy