I have a 4-disk RAID1 setup that fails to {mount,btrfsck} when disk 4 is connected.

With disk 4 attached, btrfsck errors with:

btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion `!(path->slots[0] == 0)' failed

(I'd have to reboot into a non-functioning state to get the full output.)

I can mount the filesystem in a degraded state with the 4th drive removed. I believe there is some data corruption, as I see lines like this in /var/log/messages from the degraded,ro filesystem:

BTRFS info (device sdd1): csum failed ino 4433 off 3254538240 csum 1033749897 private 2248083221

I'm at the point where all I can think to do is wipe disk 4 and then add it back in. Is there anything else I should try first? I have booted btrfs-next with the latest btrfs-progs.

Thanks.

-- 
Sandy McArthur
"He who dares not offend cannot be honest." - Thomas Paine
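A rough sketch of the degraded mount and diagnostics described above; /dev/sdd1 comes from the csum log line, while /mnt/pool is a placeholder mount point rather than the poster's actual one:

# Mount the remaining three devices read-only in degraded mode
# (disk 4 disconnected).
mount -o degraded,ro /dev/sdd1 /mnt/pool

# Show which devices btrfs thinks belong to the filesystem and
# which one is reported missing.
btrfs filesystem show

# Watch for csum and mount errors as the filesystem is accessed.
dmesg | grep -i btrfs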
Sandy McArthur <sandymac@gmail.com> schrieb:

> I have a 4 disk RAID1 setup that fails to {mount,btrfsck} when disk 4
> is connected.
>
> With disk 4 attached btrfsck errors with:
> btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion
> `!(path->slots[0] == 0)' failed
> (I'd have to reboot in a non-functioning state to get the full output.)
>
> I can mount the filesystem in a degraded state with the 4th drive
> removed. I believe there is some data corruption as I see lines in
> /var/log/messages from the degraded,ro filesystem like this:
>
> BTRFS info (device sdd1): csum failed ino 4433 off 3254538240 csum
> 1033749897 private 2248083221
>
> I'm at the point where all I can think to do is wipe disk 4 and then
> add it back in. Is there anything else I should try first? I have
> booted btrfs-next with the latest btrfs-progs.

It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
back in, then run a btrfs balance... There should be no data loss
because all data is stored twice (two-way mirroring).

Regards,
Kai
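A sketch of the wipe-and-rebalance sequence Kai describes, assuming the filesystem is mounted read-write in degraded mode at /mnt/pool and the returning disk is /dev/sde1; both names are placeholders, and whether the "delete missing" step is needed depends on how the filesystem still records the old device:

# Blow away the btrfs signature on the problem disk so it is no
# longer recognized as part of the old filesystem.
wipefs -a /dev/sde1

# Add it back to the degraded-mounted filesystem as a fresh device.
btrfs device add /dev/sde1 /mnt/pool

# Drop the record of the old, now-missing copy of that device.
btrfs device delete missing /mnt/pool

# Rebalance so the RAID1 copies are re-created across the new device.
btrfs balance start /mnt/pool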
Kai Krakow posted on Sun, 04 Aug 2013 14:41:54 +0200 as excerpted:

> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
> back in, then run a btrfs balance... There should be no data loss
> because all data is stored twice (two-way mirroring).

The caveat would be if it didn't start as btrfs raid1, and there's still
some data (or possibly metadata, if it was a single drive at one point or
they're SSDs, as btrfs defaults to single metadata in ssd mode) that
hasn't been duplicated elsewhere.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
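One way to check for Duncan's caveat before wiping anything is to look at the block group profiles; a minimal sketch, with /mnt/pool again standing in for the real mount point:

# Show how data and metadata chunks are allocated.  Everything should
# report RAID1; any "single" lines mean some chunks exist in only one
# copy and would be lost along with the drive they live on.
btrfs filesystem df /mnt/pool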
Duncan <1i5t5.duncan@cox.net> schrieb:

>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>> back in, then run a btrfs balance... There should be no data loss
>> because all data is stored twice (two-way mirroring).
>
> The caveat would be if it didn't start as btrfs raid1, and there's still
> some data (or possibly metadata if it was the single drive at one point
> or they're ssds, as btrfs defaults to metadata single in ssd mode) that
> hasn't been duped elsewhere.

Oh... That's actually a pitfall... :-\

Note to myself: Ensure balance has been run successfully and completely.
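A sketch of how one might verify that, assuming a mounted filesystem at /mnt/pool and a kernel/progs recent enough for the balance filters; the convert filters are only needed if some chunks are not already raid1:

# Check whether a balance is still running, paused, or finished.
btrfs balance status /mnt/pool

# If any chunks are still in the "single" profile, convert them so
# both data and metadata end up mirrored.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool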
On Aug 4, 2013, at 4:19 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> Kai Krakow posted on Sun, 04 Aug 2013 14:41:54 +0200 as excerpted:
>
>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>> back in, then run a btrfs balance... There should be no data loss
>> because all data is stored twice (two-way mirroring).
>
> The caveat would be if it didn't start as btrfs raid1, and there's still
> some data (or possibly metadata if it was the single drive at one point
> or they're ssds, as btrfs defaults to metadata single in ssd mode) that
> hasn't been duped elsewhere.

I agree. I think tossing the data on the problematic device is a bit of a
hammer. It may be necessary, but I don't think enough information has been
provided to conclusively determine all other options have been explored.

What kernel versions have been used?
What does dmesg record beginning at the time of a normal mount attempt with
all four devices available?
What does btrfsck (without repair) report?
Are there any prior kernel messages related to the controller or libata
messages related to the suspect drive?
What's the smartctl -x output for the suspect drive?
Has mounting with -o recovery been attempted, and if so what were the
messages recorded?

Chris Murphy
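The information Chris asks for might be gathered roughly like this; /dev/sde and /dev/sde1 stand in for the suspect drive, /dev/sdd1 and /mnt/pool for a known-good member and the mount point:

uname -r                        # kernel version in use
dmesg | grep -iE 'btrfs|ata'    # mount-time btrfs errors plus any libata/controller messages
btrfsck /dev/sde1               # read-only check of the filesystem; no repair options
smartctl -x /dev/sde            # extended SMART and device statistics for the suspect drive
mount -o recovery /dev/sdd1 /mnt/pool   # try older tree roots; then check dmesg for what happened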
FYI: I ended up wipefs'ing the drive and adding it back in. I also had to
abort the residual balance process to get the filesystem back to a state
where I could add the disk. I didn't realize this until after wiping the
drive; maybe if I'd known to look for that, I could have recovered the
drive before the wipe. Anyway, all seems fine now, and I'm no longer mixing
and matching connection types.

More history: The filesystem came to a failed state during a balance just
after adding the problem disk. That disk had also been installed inside the
case on SATA instead of inside an external multi-drive enclosure; my thought
at the time (now known to be semi-faulty) was that it would be faster to
push data onto the disk that way. When the machine hard-locked, this one
drive was different enough from the other three that I simply could not get
btrfs to work with all four disks at once.

On Sun, Aug 4, 2013 at 7:05 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:
> Duncan <1i5t5.duncan@cox.net> schrieb:
>
>>> It is a RAID-1 so why bother with the faulty drive? Just wipe it, put it
>>> back in, then run a btrfs balance... There should be no data loss
>>> because all data is stored twice (two-way mirroring).
>>
>> The caveat would be if it didn't start as btrfs raid1, and there's still
>> some data (or possibly metadata if it was the single drive at one point
>> or they're ssds, as btrfs defaults to metadata single in ssd mode) that
>> hasn't been duped elsewhere.
>
> Oh... That's actually a pitfall... :-\
>
> Note to myself: Ensure balance has been run successfully and completely.

-- 
Sandy McArthur
"He who dares not offend cannot be honest." - Thomas Paine
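For anyone hitting the same "residual balance" problem, the interrupted balance can usually be dealt with before resorting to a wipe; a sketch with the same placeholder names, assuming a kernel new enough to have the skip_balance mount option:

# Keep the interrupted balance from resuming automatically at mount time.
mount -o degraded,skip_balance /dev/sdd1 /mnt/pool

# Then cancel the leftover balance so device add/delete operations
# can proceed again.
btrfs balance cancel /mnt/pool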