Hello all
I'm a somewhat new btrfs user, and have recently finalized the conversion of my
desktop to btrfs with the 1TB backup partition on my 3TB USB3 drive.
The concrete issue is that I have seen two enospc errors this week while running
a full balance on the backup partition.
A little background on the system first: it consists, as of now, of 3 btrfs
file systems (for more details look towards the bottom of this email):
- / on a single 120GB Crucial M500 SSD (data single, metadata DUP)
- /home on 4x320GB disks (data raid10, metadata raid1)
- the aforementioned backup partition on the external USB3 drive (data single,
metadata DUP)
This is the end result of a conversion process I had started about two to three
months ago. My system started off with mdraid + LVM2 (/ and /boot on RAID1,
LVM on RAID10), and no SSD, and no backups (nasty, I know). Both the SSD and
the backup partition were originally ext4, and were converted using
btrfs-convert.
After converting the backup partition about a week ago, following the wiki entry
on ext4 conversion, I eventually ran a full balance (after first converting
metadata to DUP and running a "lighter" balance with -dusage=50 or so, although
that was probably a waste of time).
The full balance was still running the same night, but the morning after I found
that it aborted with ENOSPC (after about 12-13 hours, I think). My syslog said
the following:
kernel: BTRFS info (device sdg2): 4 enospc errors during balance
Some additional information: a balance was running on /home (which finished
successfully), and fcron started a weekly scrub on the backup partition, which
finished (also without errors) shortly after the ENOSPC error.
Then, since I wasn't thinking properly, I started a new balance (which ran
fine) before collecting the information below. From memory I can say this:
the total size of metadata and data was nowhere *near* full disk capacity. In
fact, it was very close to the "btrfs fi df" output below. Furthermore, the
output of "btrfs balance status" reported "(-nan)" where "(N considered)"
normally appears (I think the other two numbers were incorrect, too, like "0
out of about 0", but don't remember their exact values).
Now, when I first set up my backups, I decided on rsnapshot. After converting
the backup partition to btrfs, rsnapshot got *really* slow, so this week I
switched to plain rsync + btrfs-snap, wrapped in two custom shell scripts (I
will switch to btrfs-send/recv once I think that they are stable). This is
consistently faster than rsnapshot was on btrfs, but still more erratic and
slower than rsnapshot with an ext4 target file system (about 8-23 minutes vs.
5-10 minutes).
Finally we arrive at today, where, after deleting the rest of my old rsnapshot
backups, I did a full rebalance (because it freed up a *lot* of space, both
data and metadata), which also aborted with ENOSPC:
kernel: BTRFS info (device sdg2): 2 enospc errors during balance
This time, after about 3 hours, it was almost done (241 of 244 chunks), and
"balance status" didn't show the -nan I had seen the previous time. The only
thing I noticed was that "total" space jumped by several GB (from 229 to 236 or
so), while "used" only increased by 1-2 GB a few minutes before the balance
aborted. Starting and aborting another balance freed the space again.
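In case it helps anyone scan their own logs for the same symptom, here is a
minimal Python sketch that picks the balance ENOSPC summaries out of kernel log
lines. The function name enospc_events and the regular expression are my own
and are based only on the two messages quoted above, so other kernel versions
may word the message differently.

```python
import re

# Pattern modelled on the two syslog lines quoted in this mail (an assumption;
# other kernels may format the message differently).
ENOSPC = re.compile(
    r"BTRFS info \(device ([^)\s]+)\): (\d+) enospc errors during balance"
)

def enospc_events(log_lines):
    """Return a (device, error_count) tuple for each matching log line."""
    events = []
    for line in log_lines:
        match = ENOSPC.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events

SAMPLE = [
    "kernel: BTRFS info (device sdg2): 4 enospc errors during balance",
    "kernel: BTRFS info (device sdg2): 2 enospc errors during balance",
]

for device, count in enospc_events(SAMPLE):
    print(f"{device}: {count} enospc errors")
```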
Looking through my local ML archive, I found some problem reports related to
balance. The one most similar to mine (AFAICT) is "3.14.2 Debian kernel BTRFS
corruption after balance" from Russell Coker, although in my case the file
system has yet to end up corrupted.
And finally, I'd like to make clear that up to now I have been very happy with
btrfs and that this is the first real issue I have encountered with it
(although I don't use a lot of its features yet). For my usage I definitely
like it a lot more than mdraid + LVM2 :-) .
The requested output as per wiki, from shortly after I got the first error:
marcec marcec # uname -a
Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
marcec marcec # btrfs --version
Btrfs v3.14.2
marcec marcec # btrfs fi show
Label: none uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
Total devices 1 FS bytes used 42.19GiB
devid 1 size 107.79GiB used 50.06GiB path /dev/sdf1
Label: 'MARCEC_STORAGE' uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
Total devices 4 FS bytes used 476.02GiB
devid 1 size 298.09GiB used 238.03GiB path /dev/sda
devid 2 size 298.09GiB used 239.03GiB path /dev/sdb
devid 3 size 298.09GiB used 240.00GiB path /dev/sdc
devid 4 size 298.09GiB used 239.00GiB path /dev/sdd
Label: 'MARCEC_BACKUP' uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
Total devices 1 FS bytes used 167.81GiB
devid 1 size 976.56GiB used 180.06GiB path /dev/sdg2
Btrfs v3.14.2
marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP
Data, single: total=160.00GiB, used=159.06GiB
System, DUP: total=32.00MiB, used=28.00KiB
Metadata, DUP: total=10.00GiB, used=8.80GiB
unknown, single: total=512.00MiB, used=40.95MiB
And now from today:
marcec marcec # uname -a
Linux marcec 3.15.5-gentoo #1 SMP PREEMPT Fri Jul 11 00:18:11 CEST 2014
x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
marcec marcec # btrfs --version
Btrfs v3.14.2
marcec marcec # btrfs fi show
Label: none uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
Total devices 1 FS bytes used 42.16GiB
devid 1 size 107.79GiB used 50.06GiB path /dev/sdf1
Label: 'MARCEC_STORAGE' uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
Total devices 4 FS bytes used 474.84GiB
devid 1 size 298.09GiB used 238.03GiB path /dev/sda
devid 2 size 298.09GiB used 239.03GiB path /dev/sdb
devid 3 size 298.09GiB used 240.00GiB path /dev/sdc
devid 4 size 298.09GiB used 239.00GiB path /dev/sdd
Label: 'MARCEC_BACKUP' uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
Total devices 1 FS bytes used 231.97GiB
devid 1 size 976.56GiB used 237.06GiB path /dev/sdg2
Btrfs v3.14.2
marcec marcec # btrfs fi df /run/media/marcec/MARCEC_BACKUP
Data, single: total=229.00GiB, used=228.77GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.19GiB
unknown, single: total=512.00MiB, used=0.00
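As a rough sanity check on the "nowhere near full" claim, the following Python
sketch adds up the "total" (allocated) figures from the two "btrfs fi df"
outputs above, counting DUP chunks twice, and compares the sum with the device
size from "btrfs fi show". The helper name raw_allocation_gib and the regular
expression are my own, not part of btrfs-progs. I am also assuming that the
"unknown" line is a reserve carved out of metadata and should not be counted
separately.

```python
import re

# Conversion factors to GiB for the unit suffixes seen in "btrfs fi df".
UNITS = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(KiB|MiB|GiB|TiB)")

def raw_allocation_gib(fi_df_text):
    """Sum the 'total' values in GiB, counting DUP profiles twice."""
    allocated = 0.0
    for line in fi_df_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        kind, profile, value, unit = match.groups()
        if kind == "unknown":
            # Assumption: this reserve lives inside metadata, so skip it.
            continue
        copies = 2 if profile == "DUP" else 1
        allocated += copies * float(value) * UNITS[unit]
    return allocated

FIRST_ERROR = """\
Data, single: total=160.00GiB, used=159.06GiB
System, DUP: total=32.00MiB, used=28.00KiB
Metadata, DUP: total=10.00GiB, used=8.80GiB
unknown, single: total=512.00MiB, used=40.95MiB
"""

TODAY = """\
Data, single: total=229.00GiB, used=228.77GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.00GiB, used=3.19GiB
unknown, single: total=512.00MiB, used=0.00
"""

DEVICE_SIZE_GIB = 976.56  # size of devid 1 on /dev/sdg2 per "btrfs fi show"

for label, text in (("first error", FIRST_ERROR), ("today", TODAY)):
    allocated = raw_allocation_gib(text)
    free = DEVICE_SIZE_GIB - allocated
    print(f"{label}: {allocated:.2f} GiB allocated, {free:.2f} GiB unallocated")
```

Under these assumptions the sums come out at about 180.06 GiB and 237.06 GiB,
which agrees with the "used" values that "btrfs fi show" reports for devid 1,
and leaves well over 700 GiB unallocated in both cases.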
The output of dmesg from both times is attached; however, to avoid exceeding
100KiB, I compressed them with xz first.
Greetings,
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup