Marco L. Crociani
2012-Apr-14 16:39 UTC
Errors in rebalancing RAID1 array after disk failure.
Hi,
do you remember my btrfs array?
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14482.html
http://btrfs.ipv5.de/index.php?title=UseCases#Resize_on_multi_devices_filesystem

One disk of my two-disk btrfs RAID1 array failed. Initially there were bad sectors that seemed to be OK after rewriting them, so I started a scrub. During the scrub the faulty disk got worse, and I found the first disk completely full. I sent the hard disk back to the vendor (it was under warranty) and now I have a new drive, which I added to the btrfs file system.

mount -o degraded /dev/sda3 /mnt/sda3

With kernel 3.2.7 I got errors after mounting it. With 3.3.2 I was able to add the new disk.

btrfs dev add /dev/sdb3 /mnt/sda3

./btrfs fi sh
Label: 'RootFS'  uuid: c87975a0-a575-405e-9890-d3f7f25bbd96
	Total devices 3 FS bytes used 1015.83GB
	devid    3 size 1.75TB used 357.00GB path /dev/sdb3
	devid    1 size 1.75TB used 1.34TB path /dev/sda3
	*** Some devices missing

Btrfs Btrfs v0.19

Starting the balance:

# ./btrfs filesystem balance start /mnt/sda3
ERROR: error during balancing '/mnt/sda3' - Input/output error
There may be more info in syslog - try dmesg | tail

Apr 14 18:02:36 evo kernel: [ 115.294006] device label RootFS devid 1 transid 45583 /dev/sda3
Apr 14 18:03:20 evo kernel: [ 159.027945] device label RootFS devid 1 transid 45583 /dev/sda3
Apr 14 18:03:20 evo kernel: [ 159.028397] btrfs: allowing degraded mounts
Apr 14 18:03:20 evo kernel: [ 159.028404] btrfs: disk space caching is enabled
Apr 14 18:06:16 evo kernel: [ 335.740763] btrfs: relocating block group 1513745285120 flags 20
Apr 14 18:06:17 evo kernel: [ 336.613986] btrfs: relocating block group 1512671543296 flags 20
Apr 14 18:06:19 evo kernel: [ 338.727980] btrfs: relocating block group 1511597801472 flags 20
Apr 14 18:06:20 evo kernel: [ 339.230115] btrfs: relocating block group 1510524059648 flags 20
Apr 14 18:06:20 evo kernel: [ 339.711998] btrfs: relocating block group 1509450317824 flags 20
Apr 14 18:06:21 evo kernel: [ 340.192085]
btrfs: relocating block group 1508376576000 flags 20 Apr 14 18:06:21 evo kernel: [ 340.648110] btrfs: relocating block group 1507302834176 flags 20 Apr 14 18:06:22 evo kernel: [ 341.342495] btrfs: relocating block group 1506229092352 flags 20 Apr 14 18:06:23 evo kernel: [ 341.812211] btrfs: relocating block group 1505155350528 flags 20 Apr 14 18:06:23 evo kernel: [ 342.315954] btrfs: relocating block group 1504081608704 flags 20 Apr 14 18:06:24 evo kernel: [ 342.880293] btrfs: relocating block group 1503007866880 flags 20 Apr 14 18:06:24 evo kernel: [ 343.502431] btrfs: relocating block group 1501934125056 flags 20 Apr 14 18:06:25 evo kernel: [ 344.020142] btrfs: relocating block group 1500860383232 flags 20 Apr 14 18:06:25 evo kernel: [ 344.500399] btrfs: relocating block group 1499786641408 flags 20 Apr 14 18:06:26 evo kernel: [ 345.004203] btrfs: relocating block group 1498712899584 flags 20 Apr 14 18:06:26 evo kernel: [ 345.676401] btrfs: relocating block group 1497639157760 flags 20 Apr 14 18:06:27 evo kernel: [ 346.432498] btrfs: relocating block group 1496565415936 flags 20 Apr 14 18:06:28 evo kernel: [ 346.898677] btrfs: relocating block group 1495491674112 flags 20 Apr 14 18:06:28 evo kernel: [ 347.358779] btrfs: relocating block group 1494417932288 flags 20 Apr 14 18:06:29 evo kernel: [ 347.812423] btrfs: relocating block group 1493344190464 flags 20 Apr 14 18:06:29 evo kernel: [ 348.254785] btrfs: relocating block group 1492270448640 flags 20 Apr 14 18:06:29 evo kernel: [ 348.712486] btrfs: relocating block group 1491196706816 flags 20 Apr 14 18:06:30 evo kernel: [ 349.180724] btrfs: relocating block group 1490122964992 flags 20 Apr 14 18:06:30 evo kernel: [ 349.660531] btrfs: relocating block group 1489049223168 flags 20 Apr 14 18:06:31 evo kernel: [ 350.247621] btrfs: relocating block group 1487975481344 flags 20 Apr 14 18:06:32 evo kernel: [ 351.640810] btrfs: relocating block group 1486901739520 flags 20 Apr 14 18:06:34 evo kernel: [ 353.045191] 
btrfs: relocating block group 1485827997696 flags 20 Apr 14 18:06:35 evo kernel: [ 354.353076] btrfs: relocating block group 1484754255872 flags 20 Apr 14 18:06:36 evo kernel: [ 355.649374] btrfs: relocating block group 1483680514048 flags 20 Apr 14 18:06:38 evo kernel: [ 356.789416] btrfs: relocating block group 1482606772224 flags 20 Apr 14 18:06:39 evo kernel: [ 358.049273] btrfs: relocating block group 1481533030400 flags 20 Apr 14 18:06:40 evo kernel: [ 358.997303] btrfs: relocating block group 1480459288576 flags 20 Apr 14 18:06:41 evo kernel: [ 359.907791] btrfs: relocating block group 1479385546752 flags 20 Apr 14 18:06:42 evo kernel: [ 360.808096] btrfs: relocating block group 1478311804928 flags 20 Apr 14 18:06:42 evo kernel: [ 361.673508] btrfs: relocating block group 1477238063104 flags 20 Apr 14 18:06:43 evo kernel: [ 362.561198] btrfs: relocating block group 1476164321280 flags 20 Apr 14 18:06:44 evo kernel: [ 363.737713] btrfs: relocating block group 1475090579456 flags 20 Apr 14 18:06:45 evo kernel: [ 364.638036] btrfs: relocating block group 1474016837632 flags 20 Apr 14 18:06:46 evo kernel: [ 365.430090] btrfs: relocating block group 1472943095808 flags 20 Apr 14 18:06:47 evo kernel: [ 366.304545] btrfs: relocating block group 1471869353984 flags 20 Apr 14 18:06:48 evo kernel: [ 367.134223] btrfs: relocating block group 1470795612160 flags 20 Apr 14 18:06:49 evo kernel: [ 368.015084] btrfs: relocating block group 1469721870336 flags 20 Apr 14 18:06:50 evo kernel: [ 368.981808] btrfs: relocating block group 1468648128512 flags 20 Apr 14 18:06:51 evo kernel: [ 369.954430] btrfs: relocating block group 1467574386688 flags 20 Apr 14 18:06:52 evo kernel: [ 370.811505] btrfs: relocating block group 1466500644864 flags 20 Apr 14 18:06:53 evo kernel: [ 371.802372] btrfs: relocating block group 1465426903040 flags 20 Apr 14 18:06:53 evo kernel: [ 372.690405] btrfs: relocating block group 1464353161216 flags 20 Apr 14 18:06:54 evo kernel: [ 373.542724] 
btrfs: relocating block group 1463279419392 flags 20 Apr 14 18:06:55 evo kernel: [ 374.705285] btrfs: relocating block group 1462205677568 flags 20 Apr 14 18:06:56 evo kernel: [ 375.738935] btrfs: relocating block group 1461131935744 flags 20 Apr 14 18:06:58 evo kernel: [ 376.818765] btrfs: relocating block group 1460058193920 flags 20 Apr 14 18:06:59 evo kernel: [ 377.911016] btrfs: relocating block group 1458984452096 flags 20 Apr 14 18:07:00 evo kernel: [ 378.785504] btrfs: relocating block group 1457910710272 flags 20 Apr 14 18:07:01 evo kernel: [ 380.155285] btrfs: relocating block group 1456836968448 flags 20 Apr 14 18:07:02 evo kernel: [ 380.995338] btrfs: relocating block group 1455763226624 flags 20 Apr 14 18:07:03 evo kernel: [ 381.979403] btrfs: relocating block group 1454689484800 flags 20 Apr 14 18:07:04 evo kernel: [ 382.891477] btrfs: relocating block group 1453615742976 flags 20 Apr 14 18:07:04 evo kernel: [ 383.743286] btrfs: relocating block group 1452542001152 flags 20 Apr 14 18:07:06 evo kernel: [ 385.395922] btrfs: relocating block group 1451468259328 flags 20 Apr 14 18:07:07 evo kernel: [ 386.237748] btrfs: relocating block group 1450394517504 flags 20 Apr 14 18:07:08 evo kernel: [ 386.947701] btrfs: relocating block group 1449320775680 flags 20 Apr 14 18:07:08 evo kernel: [ 387.715735] btrfs: relocating block group 1448247033856 flags 20 Apr 14 18:07:09 evo kernel: [ 388.495807] btrfs: relocating block group 1447173292032 flags 20 Apr 14 18:07:10 evo kernel: [ 389.310221] btrfs: relocating block group 1446099550208 flags 20 Apr 14 18:07:11 evo kernel: [ 389.995904] btrfs: relocating block group 1445025808384 flags 20 Apr 14 18:07:11 evo kernel: [ 390.643700] btrfs: relocating block group 1443952066560 flags 20 Apr 14 18:07:12 evo kernel: [ 391.639758] btrfs: relocating block group 1442878324736 flags 20 Apr 14 18:07:13 evo kernel: [ 392.202374] btrfs: relocating block group 1441804582912 flags 20 Apr 14 18:07:14 evo kernel: [ 392.802422] 
btrfs: relocating block group 1440730841088 flags 20 Apr 14 18:07:14 evo kernel: [ 393.428116] btrfs: relocating block group 1439657099264 flags 20 Apr 14 18:07:15 evo kernel: [ 394.184194] btrfs: relocating block group 1438583357440 flags 20 Apr 14 18:07:15 evo kernel: [ 394.722289] btrfs: relocating block group 1437509615616 flags 20 Apr 14 18:07:16 evo kernel: [ 395.275988] btrfs: relocating block group 1436435873792 flags 20 Apr 14 18:07:17 evo kernel: [ 395.905274] btrfs: relocating block group 1435362131968 flags 20 Apr 14 18:07:17 evo kernel: [ 396.584115] btrfs: relocating block group 1434288390144 flags 20 Apr 14 18:07:18 evo kernel: [ 397.194476] btrfs: relocating block group 1433214648320 flags 20 Apr 14 18:07:19 evo kernel: [ 397.758507] btrfs: relocating block group 1432140906496 flags 20 Apr 14 18:07:19 evo kernel: [ 398.382795] btrfs: relocating block group 1431067164672 flags 20 Apr 14 18:07:20 evo kernel: [ 399.152527] btrfs: relocating block group 1429993422848 flags 20 Apr 14 18:07:20 evo kernel: [ 399.726684] btrfs: relocating block group 1428919681024 flags 20 Apr 14 18:07:21 evo kernel: [ 400.352578] btrfs: relocating block group 1427845939200 flags 20 Apr 14 18:07:22 evo kernel: [ 401.010780] btrfs: relocating block group 1426772197376 flags 20 Apr 14 18:07:23 evo kernel: [ 401.972700] btrfs: relocating block group 1425698455552 flags 20 Apr 14 18:07:23 evo kernel: [ 402.691132] btrfs: relocating block group 1424624713728 flags 20 Apr 14 18:07:24 evo kernel: [ 403.291128] btrfs: relocating block group 1423550971904 flags 20 Apr 14 18:07:25 evo kernel: [ 403.868841] btrfs: relocating block group 1422477230080 flags 20 Apr 14 18:07:25 evo kernel: [ 404.611037] btrfs: relocating block group 1421403488256 flags 20 Apr 14 18:07:26 evo kernel: [ 405.163300] btrfs: relocating block group 1420329746432 flags 20 Apr 14 18:07:27 evo kernel: [ 405.896774] btrfs: relocating block group 1419256004608 flags 20 Apr 14 18:07:27 evo kernel: [ 406.435405] 
btrfs: relocating block group 1418182262784 flags 20 Apr 14 18:07:28 evo kernel: [ 407.385118] btrfs: relocating block group 1417108520960 flags 20 Apr 14 18:07:29 evo kernel: [ 407.935516] btrfs: relocating block group 1416034779136 flags 20 Apr 14 18:07:29 evo kernel: [ 408.512968] btrfs: relocating block group 1414961037312 flags 20 Apr 14 18:07:30 evo kernel: [ 409.053245] btrfs: relocating block group 1413887295488 flags 20 Apr 14 18:07:30 evo kernel: [ 409.531385] btrfs: relocating block group 1412813553664 flags 20 Apr 14 18:07:31 evo kernel: [ 410.025036] btrfs: relocating block group 1411739811840 flags 20 Apr 14 18:07:31 evo kernel: [ 410.503666] btrfs: relocating block group 1410666070016 flags 20 Apr 14 18:07:32 evo kernel: [ 411.033124] btrfs: relocating block group 1409592328192 flags 20 Apr 14 18:07:33 evo kernel: [ 411.753059] btrfs: relocating block group 1408518586368 flags 20 Apr 14 18:07:33 evo kernel: [ 412.401433] btrfs: relocating block group 1407444844544 flags 20 Apr 14 18:07:34 evo kernel: [ 412.951844] btrfs: relocating block group 1406371102720 flags 20 Apr 14 18:07:34 evo kernel: [ 413.385523] btrfs: relocating block group 1405297360896 flags 20 Apr 14 18:07:35 evo kernel: [ 413.817546] btrfs: relocating block group 1404223619072 flags 20 Apr 14 18:07:35 evo kernel: [ 414.235729] btrfs: relocating block group 1403149877248 flags 20 Apr 14 18:07:35 evo kernel: [ 414.703857] btrfs: relocating block group 1402076135424 flags 17 Apr 14 18:07:47 evo kernel: [ 425.878246] btrfs: found 119 extents Apr 14 18:07:51 evo kernel: [ 430.715144] btrfs: found 119 extents Apr 14 18:07:52 evo kernel: [ 431.054709] btrfs: relocating block group 1401002393600 flags 17 Apr 14 18:08:14 evo kernel: [ 453.506541] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154 Apr 14 18:08:14 evo kernel: [ 453.536473] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154 Apr 14 18:08:14 evo kernel: [ 453.536804] btrfs csum failed 
ino 362 off 910946304 csum 432355644 private 175165154
Apr 14 18:08:14 evo kernel: [ 453.607397] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154
Apr 14 18:08:14 evo kernel: [ 453.607649] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154

./btrfs device delete missing /mnt/sda3/
ERROR: error removing the device 'missing' - Input/output error

Apr 14 18:13:49 evo kernel: [ 788.432854] btrfs: relocating block group 1401002393600 flags 17
Apr 14 18:14:07 evo kernel: [ 806.727391] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154
Apr 14 18:14:08 evo kernel: [ 806.765441] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154
Apr 14 18:14:08 evo kernel: [ 806.765799] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154
Apr 14 18:14:08 evo kernel: [ 806.766290] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154
Apr 14 18:14:08 evo kernel: [ 806.766650] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154

What can I do?

Best regards,

--
Marco Lorenzo Crociani,
marco.crociani@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
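[Editorial note] The "flags" value in the "relocating block group" messages is a bit mask. The sketch below decodes it, assuming the block group flag bits defined in the kernel's btrfs headers (DATA = 1, SYSTEM = 2, METADATA = 4, RAID0 = 8, RAID1 = 16, DUP = 32, RAID10 = 64). The helper name decode_flags is made up for illustration and is not part of any btrfs tool. Under that assumption, flags 20 would be metadata on RAID1 and flags 17 data on RAID1, which suggests the balance got through the metadata block groups and first failed on a data block group.

```python
# Bit values assumed from the kernel's BTRFS_BLOCK_GROUP_* definitions.
# decode_flags is a hypothetical helper written for this note.
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_flags(flags):
    """Return a 'NAME|NAME' string for the bits set in flags."""
    names = [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items()) if flags & bit]
    known_mask = sum(BLOCK_GROUP_FLAGS)  # sum of the dict keys, i.e. all known bits
    unknown = flags & ~known_mask
    if unknown:
        # Keep any bit this table does not know about visible in the output.
        names.append(hex(unknown))
    return "|".join(names)


# The two flag values that appear in the log above.
for value in (20, 17):
    print(value, decode_flags(value))
```

Bits the table does not know are printed in hex instead of being dropped, so a newer kernel's flags would still show up.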
David Sterba
2012-Apr-16 13:46 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Sat, Apr 14, 2012 at 06:39:12PM +0200, Marco L. Crociani wrote:
> Apr 14 18:07:52 evo kernel: [ 431.054709] btrfs: relocating block group 1401002393600 flags 17
> Apr 14 18:08:14 evo kernel: [ 453.506541] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154

The failed checksums prevent balance from relocating the block group, which is a necessary step during 'dev delete'. Unless the csum is fixable by using another copy, I think the only option left is to delete the file (not counting the unsafe way of resetting the block's checksum).

david
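[Editorial note] Acting on this advice requires knowing which inode numbers and offsets are failing. The kernel repeats the same "btrfs csum failed" message several times, so a minimal sketch like the one below can collapse a log excerpt into unique (inode, offset) pairs. The message layout is taken from the logs in this thread; the function name summarize_csum_failures is hypothetical, and the sketch only reads text, it does not touch the file system.

```python
import re
from collections import Counter

# Pattern for the message format seen in this thread's kernel log.
CSUM_RE = re.compile(
    r"btrfs csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) private (?P<private>\d+)"
)


def summarize_csum_failures(lines):
    """Count csum failure messages per (inode, offset) pair."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            counts[(int(match["ino"]), int(match["off"]))] += 1
    return counts


# Sample lines copied from the logs quoted in this thread.
sample = [
    "Apr 14 18:07:52 evo kernel: [ 431.054709] btrfs: relocating block group 1401002393600 flags 17",
    "Apr 14 18:08:14 evo kernel: [ 453.506541] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154",
    "Apr 14 18:08:14 evo kernel: [ 453.536473] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154",
    "Apr 14 18:14:07 evo kernel: [ 806.727391] btrfs csum failed ino 363 off 910946304 csum 432355644 private 175165154",
]

for (ino, off), count in sorted(summarize_csum_failures(sample).items()):
    print(f"inode {ino} offset {off}: {count} message(s)")
```

In practice the input would be the lines of a saved dmesg or syslog excerpt rather than a hard-coded list.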
Marco L. Crociani
2012-Apr-16 22:56 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Mon, Apr 16, 2012 at 3:46 PM, David Sterba <dave@jikos.cz> wrote:
> On Sat, Apr 14, 2012 at 06:39:12PM +0200, Marco L. Crociani wrote:
>> Apr 14 18:07:52 evo kernel: [ 431.054709] btrfs: relocating block group 1401002393600 flags 17
>> Apr 14 18:08:14 evo kernel: [ 453.506541] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154
>
> The failed checksums prevent balance from relocating the block group, which
> is a necessary step during 'dev delete'. Unless the csum is fixable by
> using another copy, I think the only option left is to delete the file
> (not counting the unsafe way of resetting the block's checksum).
>

I deleted the files.
(Is "find /mnt/sda3 -inum 362 -ls" the correct way to find them?)

Now it gives me errors on inode 257.
I deleted a file, but it still gives me errors on inode 257, and "find /mnt/sda3 -inum 257 -ls" returns nothing now.

Apr 17 00:41:49 evo kernel: [ 156.530441] device label RootFS devid 1 transid 47037 /dev/sda3
Apr 17 00:41:49 evo kernel: [ 156.734993] device label RootFS devid 3 transid 47037 /dev/sdb3
Apr 17 00:42:12 evo kernel: [ 179.496155] device label RootFS devid 1 transid 47037 /dev/sda3
Apr 17 00:42:12 evo kernel: [ 179.496881] btrfs: allowing degraded mounts
Apr 17 00:42:12 evo kernel: [ 179.496888] btrfs: disk space caching is enabled
Apr 17 00:42:24 evo kernel: [ 191.290093] btrfs: relocating block group 1401002393600 flags 17
Apr 17 00:42:53 evo kernel: [ 220.417535] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 17 00:42:53 evo kernel: [ 220.480570] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 17 00:42:53 evo kernel: [ 220.480868] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 17 00:42:53 evo kernel: [ 220.505168] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 17 00:42:53 evo kernel: [ 220.528368] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154

--
Marco Lorenzo Crociani,
Marco L. Crociani
2012-Apr-17 00:25 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Tue, Apr 17, 2012 at 12:56 AM, Marco L. Crociani <marco.crociani@gmail.com> wrote:
> On Mon, Apr 16, 2012 at 3:46 PM, David Sterba <dave@jikos.cz> wrote:
>> On Sat, Apr 14, 2012 at 06:39:12PM +0200, Marco L. Crociani wrote:
>>> Apr 14 18:07:52 evo kernel: [ 431.054709] btrfs: relocating block group 1401002393600 flags 17
>>> Apr 14 18:08:14 evo kernel: [ 453.506541] btrfs csum failed ino 362 off 910946304 csum 432355644 private 175165154
>>
>> The failed checksums prevent balance from relocating the block group, which
>> is a necessary step during 'dev delete'. Unless the csum is fixable by
>> using another copy, I think the only option left is to delete the file
>> (not counting the unsafe way of resetting the block's checksum).
>>
>
> I deleted the files.
> (Is "find /mnt/sda3 -inum 362 -ls" the correct way to find them?)
>
> Now it gives me errors on inode 257.
> I deleted a file, but it still gives me errors on inode 257, and "find
> /mnt/sda3 -inum 257 -ls" returns nothing now.
>
> Apr 17 00:41:49 evo kernel: [ 156.530441] device label RootFS devid 1 transid 47037 /dev/sda3
> Apr 17 00:41:49 evo kernel: [ 156.734993] device label RootFS devid 3 transid 47037 /dev/sdb3
> Apr 17 00:42:12 evo kernel: [ 179.496155] device label RootFS devid 1 transid 47037 /dev/sda3
> Apr 17 00:42:12 evo kernel: [ 179.496881] btrfs: allowing degraded mounts
> Apr 17 00:42:12 evo kernel: [ 179.496888] btrfs: disk space caching is enabled
> Apr 17 00:42:24 evo kernel: [ 191.290093] btrfs: relocating block group 1401002393600 flags 17
> Apr 17 00:42:53 evo kernel: [ 220.417535] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 17 00:42:53 evo kernel: [ 220.480570] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 17 00:42:53 evo kernel: [ 220.480868] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 17 00:42:53 evo kernel: [ 220.505168] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 17 00:42:53 evo kernel: [ 220.528368] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>
>
> --
> Marco Lorenzo Crociani,

Running btrfs dev delete missing another time returns a different error (something like invalid argument), with no log activity.
Then umount completely freezes the system. The keyboard's LEDs start blinking.
Even Alt Gr + Print Screen + REISUB doesn't work.

--
Marco Lorenzo Crociani,
marco.crociani@gmail.com
Phone: +39 02320622509
Fax: +39 02700540121
Marco L. Crociani
2012-Apr-19 15:42 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
Today I tried scrub... Apr 19 17:36:01 evo kernel: [ 187.932297] device label RootFS devid 1 transid 47046 /dev/sda3 Apr 19 17:36:02 evo kernel: [ 188.145858] device label RootFS devid 3 transid 47046 /dev/sdb3 Apr 19 17:36:19 evo kernel: [ 205.483044] device label RootFS devid 1 transid 47046 /dev/sda3 Apr 19 17:36:19 evo kernel: [ 205.483730] btrfs: allowing degraded mounts Apr 19 17:36:19 evo kernel: [ 205.483737] btrfs: disk space caching is enabled Apr 19 17:38:41 evo kernel: [ 347.661603] BUG: unable to handle kernel NULL pointer dereference at (null) Apr 19 17:38:41 evo kernel: [ 347.661617] IP: [<ffffffff8131ff94>] strncpy+0x14/0x30 Apr 19 17:38:41 evo kernel: [ 347.661633] PGD 17b672067 PUD 17b5ed067 PMD 0 Apr 19 17:38:41 evo kernel: [ 347.661643] Oops: 0000 [#1] SMP Apr 19 17:38:41 evo kernel: [ 347.661650] CPU 3 Apr 19 17:38:41 evo kernel: [ 347.661654] Modules linked in: ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_amd kvm rfcomm bnep bluetooth parport_pc ppdev dm_crypt snd_hda_codec_realtek snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event uvcvideo snd_seq snd_timer snd_seq_device snd videobuf2_core videodev v4l2_compat_ioctl32 videobuf2_vmalloc soundcore videobuf2_memops dm_multipath eeepc_wmi mac_hid asus_wmi binfmt_misc snd_page_alloc fglrx(PO) i2c_piix4 k10temp sparse_keymap lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid0 multipath linear btrfs zlib_deflate libcrc32c raid1 usbhid hid wmi r8169 Apr 19 17:38:41 evo kernel: [ 347.661780] Apr 19 17:38:41 evo kernel: [ 347.661787] Pid: 3218, comm: btrfs Tainted: P O 3.3.2-030302-generic #201204131335 System manufacturer System Product Name/F1A75-V EVO Apr 19 17:38:41 evo kernel: [ 347.661799] 
RIP: 0010:[<ffffffff8131ff94>] [<ffffffff8131ff94>] strncpy+0x14/0x30 Apr 19 17:38:41 evo kernel: [ 347.661810] RSP: 0018:ffff880182559e08 EFLAGS: 00010206 Apr 19 17:38:41 evo kernel: [ 347.661816] RAX: ffff8801b14eac00 RBX: ffff8801b14ea000 RCX: 0000000000000000 Apr 19 17:38:41 evo kernel: [ 347.661822] RDX: 0000000000000400 RSI: 0000000000000000 RDI: ffff8801b14eac00 Apr 19 17:38:41 evo kernel: [ 347.661827] RBP: ffff880182559e08 R08: ffff8801b048b8b8 R09: 0000000000000002 Apr 19 17:38:41 evo kernel: [ 347.661833] R10: 0000000000000010 R11: 0000000000000206 R12: ffff8801b1741800 Apr 19 17:38:41 evo kernel: [ 347.661839] R13: 0000000000d55040 R14: ffff8801b14ea008 R15: ffff8801b048b898 Apr 19 17:38:41 evo kernel: [ 347.661846] FS: 00007f73c9f34760(0000) GS:ffff8801bed80000(0000) knlGS:0000000000000000 Apr 19 17:38:41 evo kernel: [ 347.661852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Apr 19 17:38:41 evo kernel: [ 347.661857] CR2: 0000000000000000 CR3: 00000001827db000 CR4: 00000000000006e0 Apr 19 17:38:41 evo kernel: [ 347.661863] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 19 17:38:41 evo kernel: [ 347.661869] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 19 17:38:41 evo kernel: [ 347.661875] Process btrfs (pid: 3218, threadinfo ffff880182558000, task ffff88017b5e44d0) Apr 19 17:38:41 evo kernel: [ 347.661880] Stack: Apr 19 17:38:41 evo kernel: [ 347.661884] ffff880182559e78 ffffffffa00b76ac ffff8801b1504e00 0000000000000000 Apr 19 17:38:41 evo kernel: [ 347.661895] 0000000000000000 0000000000000000 ffff880182559f48 000000005bfc4f67 Apr 19 17:38:41 evo kernel: [ 347.661905] 0000000100002c2c ffff8801824a2600 0000000000d55040 ffff88018c7df800 Apr 19 17:38:41 evo kernel: [ 347.661915] Call Trace: Apr 19 17:38:41 evo kernel: [ 347.661964] [<ffffffffa00b76ac>] btrfs_ioctl_dev_info+0x15c/0x1a0 [btrfs] Apr 19 17:38:41 evo kernel: [ 347.662013] [<ffffffffa00ba9b1>] btrfs_ioctl+0x571/0x6c0 [btrfs] Apr 19 17:38:41 
evo kernel: [ 347.662024] [<ffffffff81193839>] do_vfs_ioctl+0x99/0x330
Apr 19 17:38:41 evo kernel: [ 347.662032] [<ffffffff8118d345>] ? putname+0x35/0x50
Apr 19 17:38:41 evo kernel: [ 347.662040] [<ffffffff81193b71>] sys_ioctl+0xa1/0xb0
Apr 19 17:38:41 evo kernel: [ 347.662049] [<ffffffff816691a9>] system_call_fastpath+0x16/0x1b
Apr 19 17:38:41 evo kernel: [ 347.662054] Code: 48 83 c2 01 84 c9 75 ef c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 d2 48 89 f8 48 89 e5 75 08 eb 18 66 90 48 83 c7 01 <0f> b6 0e 80 f9 01 88 0f 48 83 de ff 48 83 ea 01 75 ea c9 c3 0f
Apr 19 17:38:41 evo kernel: [ 347.662128] RIP [<ffffffff8131ff94>] strncpy+0x14/0x30
Apr 19 17:38:41 evo kernel: [ 347.662137] RSP <ffff880182559e08>
Apr 19 17:38:41 evo kernel: [ 347.662141] CR2: 0000000000000000
Apr 19 17:38:41 evo kernel: [ 347.662147] ---[ end trace 9a8c295d04917ed2 ]---

--
Marco Lorenzo Crociani,
marco.crociani@gmail.com
Marco L. Crociani
2012-Apr-30 13:01 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
Hi all,
today another episode...

I have compiled and tried kernel 3.4-rc5.

./btrfs fi sh
Label: 'RootFS'  uuid: c87975a0-a575-405e-9890-d3f7f25bbd96
	Total devices 3 FS bytes used 1006.67GB
	devid    3 size 1.75TB used 357.00GB path /dev/sdb3
	devid    1 size 1.75TB used 1.34TB path /dev/sda3
	*** Some devices missing

Btrfs Btrfs v0.19

./btrfs device delete missing /mnt/sda3
ERROR: error removing the device 'missing' - Input/output error

Apr 30 13:17:51 evo kernel: [ 103.074835] device label RootFS devid 1 transid 47082 /dev/sda3
Apr 30 13:17:52 evo kernel: [ 103.281796] device label RootFS devid 3 transid 47082 /dev/sdb3
Apr 30 13:17:57 evo kernel: [ 108.865001] device label RootFS devid 1 transid 47082 /dev/sda3
Apr 30 13:17:57 evo kernel: [ 108.866205] btrfs: allowing degraded mounts
Apr 30 13:17:57 evo kernel: [ 108.866214] btrfs: disk space caching is enabled
Apr 30 13:18:32 evo kernel: [ 143.274899] btrfs: relocating block group 1401002393600 flags 17
Apr 30 13:19:25 evo kernel: [ 196.888248] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 30 13:19:25 evo kernel: [ 196.889900] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 30 13:19:25 evo kernel: [ 196.890429] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 30 13:19:25 evo kernel: [ 197.087419] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
Apr 30 13:19:25 evo kernel: [ 197.087681] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154

./btrfs inspect-internal inode-resolve -v 257 /mnt/sda3/
ioctl ret=-1, error: No such file or directory

./btrfs scrub status /mnt/sda3/
scrub status for c87975a0-a575-405e-9890-d3f7f25bbd96
	scrub started at Mon Apr 30 13:26:26 2012 and was aborted after 4367 seconds
	total bytes scrubbed: 406.64GB with 2 errors
	error details: csum=2
	corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

Apr 30 14:37:24 evo kernel: [ 4875.275776] btrfs: checksum
error at logical 752871157760 on dev /dev/sda3, sector 873795352, root 259, inode 1580389, offset 612610048, length 4096, links 1 (path: .ecryptfs/[ .... ] Apr 30 14:37:24 evo kernel: [ 4875.275838] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 Apr 30 14:37:24 evo kernel: [ 4875.275848] IP: [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:24 evo kernel: [ 4875.275862] PGD 0 Apr 30 14:37:24 evo kernel: [ 4875.275868] Oops: 0000 [#1] SMP Apr 30 14:37:24 evo kernel: [ 4875.275875] CPU 2 Apr 30 14:37:24 evo kernel: [ 4875.275878] Modules linked in: ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_amd kvm rfcomm bnep dm_crypt parport_pc bluetooth ppdev snd_hda_codec_realtek snd_hda_codec_hdmi uvcvideo videobuf2_core snd_hda_intel snd_hda_codec videodev videobuf2_vmalloc snd_usb_audio videobuf2_memops snd_hwdep snd_pcm snd_usbmidi_lib snd_seq_midi snd_rawmidi eeepc_wmi asus_wmi snd_seq_midi_event snd_seq snd_timer snd_seq_device mac_hid sparse_keymap snd binfmt_misc soundcore snd_page_alloc dm_multipath k10temp i2c_piix4 microcode lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid0 multipath linear btrfs zlib_deflate libcrc32c raid1 usbhid hid wmi r8169 Apr 30 14:37:24 evo kernel: [ 4875.276004] Apr 30 14:37:24 evo kernel: [ 4875.276010] Pid: 3401, comm: btrfs-scrub-1 Not tainted 3.4.0-rc5-mio01 #1 System manufacturer System Product Name/F1A75-V EVO Apr 30 14:37:24 evo kernel: [ 4875.276022] RIP: 0010:[<ffffffff811ae841>] [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:24 evo kernel: [ 4875.276033] RSP: 0018:ffff88017135bba0 EFLAGS: 00010246 Apr 30 14:37:24 evo kernel: [ 4875.276038] RAX: 0000000000000000 RBX: ffff8801710ac000 RCX: 0000000000000000 Apr 30 14:37:24 evo kernel: [ 4875.276044] RDX: 0000000000001000 
RSI: ffffea0004c2b8c0 RDI: ffff88017775b900 Apr 30 14:37:24 evo kernel: [ 4875.276050] RBP: ffff88017135bba0 R08: ffff8801bed16590 R09: 0000000000000001 Apr 30 14:37:24 evo kernel: [ 4875.276056] R10: 00000000710d1001 R11: 0000000000000007 R12: ffff88017775b900 Apr 30 14:37:24 evo kernel: [ 4875.276061] R13: ffff8801710ac000 R14: 0000000000000000 R15: ffff88017135bbf8 Apr 30 14:37:24 evo kernel: [ 4875.276068] FS: 00007f33e7e239c0(0000) GS:ffff8801bed00000(0000) knlGS:00000000f66a2b70 Apr 30 14:37:24 evo kernel: [ 4875.276074] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 30 14:37:24 evo kernel: [ 4875.276080] CR2: 0000000000000090 CR3: 000000017b6e4000 CR4: 00000000000007e0 Apr 30 14:37:24 evo kernel: [ 4875.276086] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 30 14:37:24 evo kernel: [ 4875.276092] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 30 14:37:24 evo kernel: [ 4875.276098] Process btrfs-scrub-1 (pid: 3401, threadinfo ffff88017135a000, task ffff88016de916e0) Apr 30 14:37:24 evo kernel: [ 4875.276103] Stack: Apr 30 14:37:24 evo kernel: [ 4875.276107] ffff88017135bc60 ffffffffa00dbc4a ffff8801beffcd00 0000000000006090 Apr 30 14:37:24 evo kernel: [ 4875.276118] 000000010f953000 ffff8801b1084000 ffff88017825a83c 000000000004e96c Apr 30 14:37:24 evo kernel: [ 4875.276129] 0000000034150f18 000000af00000000 ffff880100000000 ffff88017135bbf8 Apr 30 14:37:24 evo kernel: [ 4875.276138] Call Trace: Apr 30 14:37:24 evo kernel: [ 4875.276193] [<ffffffffa00dbc4a>] scrub_recheck_block+0xea/0x410 [btrfs] Apr 30 14:37:24 evo kernel: [ 4875.276238] [<ffffffffa00de24f>] scrub_handle_errored_block+0x33f/0x990 [btrfs] Apr 30 14:37:24 evo kernel: [ 4875.276279] [<ffffffffa00defa9>] scrub_bio_end_io_worker+0x709/0x770 [btrfs] Apr 30 14:37:24 evo kernel: [ 4875.276291] [<ffffffff8105f360>] ? 
usleep_range+0x50/0x50 Apr 30 14:37:24 evo kernel: [ 4875.276332] [<ffffffffa00bc30f>] worker_loop+0x16f/0x5d0 [btrfs] Apr 30 14:37:24 evo kernel: [ 4875.276374] [<ffffffffa00bc1a0>] ? btrfs_queue_worker+0x310/0x310 [btrfs] Apr 30 14:37:24 evo kernel: [ 4875.276383] [<ffffffff81072d83>] kthread+0x93/0xa0 Apr 30 14:37:24 evo kernel: [ 4875.276393] [<ffffffff81661fa4>] kernel_thread_helper+0x4/0x10 Apr 30 14:37:24 evo kernel: [ 4875.276401] [<ffffffff81072cf0>] ? kthread_freezable_should_stop+0x70/0x70 Apr 30 14:37:24 evo kernel: [ 4875.276410] [<ffffffff81661fa0>] ? gs_change+0x13/0x13 Apr 30 14:37:24 evo kernel: [ 4875.276414] Code: e8 f5 65 20 00 84 c0 0f b7 43 28 0f 85 47 ff ff ff e9 d8 fe ff ff 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 48 8b 47 10 f6 47 18 10 <48> 8b 80 90 00 00 00 4c 8b 88 50 03 00 00 41 8b 81 0c 05 00 00 Apr 30 14:37:24 evo kernel: [ 4875.276492] RIP [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:24 evo kernel: [ 4875.276500] RSP <ffff88017135bba0> Apr 30 14:37:24 evo kernel: [ 4875.276504] CR2: 0000000000000090 Apr 30 14:37:24 evo kernel: [ 4875.276510] ---[ end trace 3ddfc561d71fac95 ]--- Apr 30 14:37:30 evo kernel: [ 4881.497569] btrfs: checksum error at logical 753578258432 on dev /dev/sda3, sector 875176408, root 259, inode 1580396, offset 586055680, length 4096, links 1 (path: .ecryptfs/[ ... 
] Apr 30 14:37:30 evo kernel: [ 4881.497636] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 Apr 30 14:37:30 evo kernel: [ 4881.497645] IP: [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:30 evo kernel: [ 4881.497660] PGD 17b6d8067 PUD 171101067 PMD 0 Apr 30 14:37:30 evo kernel: [ 4881.497670] Oops: 0000 [#2] SMP Apr 30 14:37:30 evo kernel: [ 4881.497677] CPU 2 Apr 30 14:37:30 evo kernel: [ 4881.497680] Modules linked in: ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_amd kvm rfcomm bnep dm_crypt parport_pc bluetooth ppdev snd_hda_codec_realtek snd_hda_codec_hdmi uvcvideo videobuf2_core snd_hda_intel snd_hda_codec videodev videobuf2_vmalloc snd_usb_audio videobuf2_memops snd_hwdep snd_pcm snd_usbmidi_lib snd_seq_midi snd_rawmidi eeepc_wmi asus_wmi snd_seq_midi_event snd_seq snd_timer snd_seq_device mac_hid sparse_keymap snd binfmt_misc soundcore snd_page_alloc dm_multipath k10temp i2c_piix4 microcode lp parport raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid0 multipath linear btrfs zlib_deflate libcrc32c raid1 usbhid hid wmi r8169 Apr 30 14:37:30 evo kernel: [ 4881.497806] Apr 30 14:37:30 evo kernel: [ 4881.497813] Pid: 3412, comm: btrfs-scrub-2 Tainted: G D 3.4.0-rc5-mio01 #1 System manufacturer System Product Name/F1A75-V EVO Apr 30 14:37:30 evo kernel: [ 4881.497826] RIP: 0010:[<ffffffff811ae841>] [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:30 evo kernel: [ 4881.497837] RSP: 0018:ffff8801776adba0 EFLAGS: 00010246 Apr 30 14:37:30 evo kernel: [ 4881.497842] RAX: 0000000000000000 RBX: ffff8801710ae000 RCX: 0000000000000000 Apr 30 14:37:30 evo kernel: [ 4881.497848] RDX: 0000000000001000 RSI: ffffea0004c4be40 RDI: ffff88017775bc00 Apr 30 14:37:30 evo kernel: [ 4881.497854] RBP: ffff8801776adba0 R08: 
ffff8801bed16590 R09: 0000000000000001 Apr 30 14:37:30 evo kernel: [ 4881.497859] R10: ffffffffa00db723 R11: 0000000000000000 R12: ffff88017775bc00 Apr 30 14:37:30 evo kernel: [ 4881.497865] R13: ffff8801710ae000 R14: 0000000000000000 R15: ffff8801776adbf8 Apr 30 14:37:30 evo kernel: [ 4881.497872] FS: 00007fcf45503700(0000) GS:ffff8801bed00000(0000) knlGS:00000000f66a2b70 Apr 30 14:37:30 evo kernel: [ 4881.497878] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 30 14:37:30 evo kernel: [ 4881.497884] CR2: 0000000000000090 CR3: 000000018174c000 CR4: 00000000000007e0 Apr 30 14:37:30 evo kernel: [ 4881.497890] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 30 14:37:30 evo kernel: [ 4881.497896] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 30 14:37:30 evo kernel: [ 4881.497902] Process btrfs-scrub-2 (pid: 3412, threadinfo ffff8801776ac000, task ffff88016de92dc0) Apr 30 14:37:30 evo kernel: [ 4881.497907] Stack: Apr 30 14:37:30 evo kernel: [ 4881.497911] ffff8801776adc60 ffffffffa00dbc4a ffff8801beffcd00 0000000000006090 Apr 30 14:37:30 evo kernel: [ 4881.497923] 000000010a000000 ffff8801b1084000 ffff88017132d03c 000000000004e96c Apr 30 14:37:30 evo kernel: [ 4881.497933] 00000000342a21d8 000000af00000000 ffff880100000000 ffff8801776adbf8 Apr 30 14:37:30 evo kernel: [ 4881.497943] Call Trace: Apr 30 14:37:30 evo kernel: [ 4881.497998] [<ffffffffa00dbc4a>] scrub_recheck_block+0xea/0x410 [btrfs] Apr 30 14:37:30 evo kernel: [ 4881.498044] [<ffffffffa00de24f>] scrub_handle_errored_block+0x33f/0x990 [btrfs] Apr 30 14:37:30 evo kernel: [ 4881.498085] [<ffffffffa00defa9>] scrub_bio_end_io_worker+0x709/0x770 [btrfs] Apr 30 14:37:30 evo kernel: [ 4881.498128] [<ffffffffa00bc30f>] worker_loop+0x16f/0x5d0 [btrfs] Apr 30 14:37:30 evo kernel: [ 4881.498171] [<ffffffffa00bc1a0>] ? 
btrfs_queue_worker+0x310/0x310 [btrfs] Apr 30 14:37:30 evo kernel: [ 4881.498180] [<ffffffff81072d83>] kthread+0x93/0xa0 Apr 30 14:37:30 evo kernel: [ 4881.498190] [<ffffffff81661fa4>] kernel_thread_helper+0x4/0x10 Apr 30 14:37:30 evo kernel: [ 4881.498199] [<ffffffff81072cf0>] ? kthread_freezable_should_stop+0x70/0x70 Apr 30 14:37:30 evo kernel: [ 4881.498208] [<ffffffff81661fa0>] ? gs_change+0x13/0x13 Apr 30 14:37:30 evo kernel: [ 4881.498212] Code: e8 f5 65 20 00 84 c0 0f b7 43 28 0f 85 47 ff ff ff e9 d8 fe ff ff 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 48 8b 47 10 f6 47 18 10 <48> 8b 80 90 00 00 00 4c 8b 88 50 03 00 00 41 8b 81 0c 05 00 00 Apr 30 14:37:30 evo kernel: [ 4881.498290] RIP [<ffffffff811ae841>] bio_add_page+0x11/0x60 Apr 30 14:37:30 evo kernel: [ 4881.498298] RSP <ffff8801776adba0> Apr 30 14:37:30 evo kernel: [ 4881.498302] CR2: 0000000000000090 Apr 30 14:37:30 evo kernel: [ 4881.498308] ---[ end trace 3ddfc561d71fac96 ]--- -- Marco Lorenzo Crociani, marco.crociani@gmail.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2012-May-02 14:54 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Thu, Apr 19, 2012 at 05:42:05PM +0200, Marco L. Crociani wrote:
> Apr 19 17:38:41 evo kernel: [ 347.661915] Call Trace:
> Apr 19 17:38:41 evo kernel: [ 347.661964] [<ffffffffa00b76ac>] btrfs_ioctl_dev_info+0x15c/0x1a0 [btrfs]
> Apr 19 17:38:41 evo kernel: [ 347.662013] [<ffffffffa00ba9b1>] btrfs_ioctl+0x571/0x6c0 [btrfs]
> Apr 19 17:38:41 evo kernel: [ 347.662024] [<ffffffff81193839>] do_vfs_ioctl+0x99/0x330
> Apr 19 17:38:41 evo kernel: [ 347.662032] [<ffffffff8118d345>] ? putname+0x35/0x50
> Apr 19 17:38:41 evo kernel: [ 347.662040] [<ffffffff81193b71>] sys_ioctl+0xa1/0xb0
> Apr 19 17:38:41 evo kernel: [ 347.662049] [<ffffffff816691a9>] system_call_fastpath+0x16/0x1b

Fixed by
http://comments.gmane.org/gmane.comp.file-systems.btrfs/16302

reported earlier
http://article.gmane.org/gmane.comp.file-systems.btrfs/16796

and it's part of 3.4-rc5.

david
Marco L. Crociani
2012-May-02 14:59 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Wed, May 2, 2012 at 4:54 PM, David Sterba <dave@jikos.cz> wrote:
> On Thu, Apr 19, 2012 at 05:42:05PM +0200, Marco L. Crociani wrote:
> > Apr 19 17:38:41 evo kernel: [ 347.661915] Call Trace:
> > Apr 19 17:38:41 evo kernel: [ 347.661964] [<ffffffffa00b76ac>] btrfs_ioctl_dev_info+0x15c/0x1a0 [btrfs]
> > Apr 19 17:38:41 evo kernel: [ 347.662013] [<ffffffffa00ba9b1>] btrfs_ioctl+0x571/0x6c0 [btrfs]
> > Apr 19 17:38:41 evo kernel: [ 347.662024] [<ffffffff81193839>] do_vfs_ioctl+0x99/0x330
> > Apr 19 17:38:41 evo kernel: [ 347.662032] [<ffffffff8118d345>] ? putname+0x35/0x50
> > Apr 19 17:38:41 evo kernel: [ 347.662040] [<ffffffff81193b71>] sys_ioctl+0xa1/0xb0
> > Apr 19 17:38:41 evo kernel: [ 347.662049] [<ffffffff816691a9>] system_call_fastpath+0x16/0x1b
>
> Fixed by
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/16302
>
> reported earlier
> http://article.gmane.org/gmane.comp.file-systems.btrfs/16796
>
> and it's part of 3.4-rc5.

I was on 3.4-rc5!

--
Marco Lorenzo Crociani, marco.crociani@gmail.com
David Sterba
2012-May-02 15:22 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Mon, Apr 30, 2012 at 03:01:04PM +0200, Marco L. Crociani wrote:
> ./btrfs device delete missing /mnt/sda3
> ERROR: error removing the device 'missing' - Input/output error
>
> Apr 30 13:17:57 evo kernel: [ 108.866205] btrfs: allowing degraded mounts
> Apr 30 13:17:57 evo kernel: [ 108.866214] btrfs: disk space caching is enabled
> Apr 30 13:18:32 evo kernel: [ 143.274899] btrfs: relocating block group 1401002393600 flags 17
> Apr 30 13:19:25 evo kernel: [ 196.888248] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 30 13:19:25 evo kernel: [ 196.889900] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 30 13:19:25 evo kernel: [ 196.890429] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 30 13:19:25 evo kernel: [ 197.087419] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
> Apr 30 13:19:25 evo kernel: [ 197.087681] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154

The failed checksums prevent the data from being moved off the device, and then the removal fails with the above error.

> ./btrfs inspect-internal inode-resolve -v 257 /mnt/sda3/
> ioctl ret=-1, error: No such file or directory

So it's not a visible file, possibly a deleted yet uncleaned snapshot or the space_cache (guessing from the inode number). But AFAICS the checksums are turned off for the free space inode so ...

> ./btrfs scrub status /mnt/sda3/
> scrub status for c87975a0-a575-405e-9890-d3f7f25bbd96
> scrub started at Mon Apr 30 13:26:26 2012 and was aborted after 4367 seconds
> total bytes scrubbed: 406.64GB with 2 errors
> error details: csum=2
> corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

Shouldn't the csum errors be included under uncorrectable?

> Apr 30 14:37:24 evo kernel: [ 4875.275776] btrfs: checksum error at
> logical 752871157760 on dev /dev/sda3, sector 873795352, root 259,
> inode 1580389, offset 612610048, length 4096, links 1 (path:
        ^^^^^^^

So the scrub catches different checksum errors than appeared during balance (inode 257).

> Apr 30 14:37:24 evo kernel: [ 4875.275838] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
> Apr 30 14:37:24 evo kernel: [ 4875.275848] IP: [<ffffffff811ae841>] bio_add_page+0x11/0x60
> Apr 30 14:37:24 evo kernel: [ 4875.276022] RIP: 0010:[<ffffffff811ae841>] [<ffffffff811ae841>] bio_add_page+0x11/0x60

This looks like something disappeared from under scrub's hands:

    BUG_ON(!page->page);
    bio = bio_alloc(GFP_NOFS, 1);
    if (!bio)
        return -EIO;
    bio->bi_bdev = page->bdev;
    bio->bi_sector = page->physical >> 9;
    bio->bi_end_io = scrub_complete_bio_end_io;
    bio->bi_private = &complete;

    ret = bio_add_page(bio, page->page, PAGE_SIZE, 0);
    if (PAGE_SIZE != ret) {
        bio_put(bio);
        return -EIO;
    }

Everything is initialized before use here, so it's hidden behind the pointers; my bet is at page->bdev->something. Thinking again about how things got here:

* unsuccessful device remove 'missing', due to csum errors in a non-regular file
* crashed scrub, after indirect access of a null pointer

Is there anything I missed for steps to reproduce it?

david
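A note on reading the oops: in the second trace quoted above, RAX is 0 and CR2 is 0x90, which is the pattern one would expect when code reads a structure member through a NULL pointer, with the member sitting 0x90 bytes into the structure. The sketch below only illustrates that arithmetic. `mock_bdev`, its `disk` member and `member_fault_address` are hypothetical names, and the padding is chosen to match the CR2 value; the real kernel structure layout is not taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel structure; only the layout matters.
 * The padding puts `disk` at offset 0x90 to mirror the CR2 value in the
 * oops. This is an assumption for illustration, not the real layout. */
struct mock_bdev {
    unsigned char pad[0x90];
    void *disk;
};

/* Check the assumed layout at compile time (C11). */
static_assert(offsetof(struct mock_bdev, disk) == 0x90,
              "disk is expected at offset 0x90 in this mock");

/* Address the CPU would touch when reading `disk` through a pointer whose
 * numeric value is `base`. With base == 0 (a NULL pointer) the result is
 * simply the member offset, which is what ends up in CR2. */
static uintptr_t member_fault_address(uintptr_t base)
{
    return base + offsetof(struct mock_bdev, disk);
}
```

Calling `member_fault_address(0)` gives 0x90, so a small non-zero fault address like this one usually points at "NULL base pointer plus member offset" rather than at a wild pointer.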
David Sterba
2012-May-02 15:27 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On Wed, May 02, 2012 at 04:59:03PM +0200, Marco L. Crociani wrote:
> > On Thu, Apr 19, 2012 at 05:42:05PM +0200, Marco L. Crociani wrote:
> > > Apr 19 17:38:41 evo kernel: [ 347.661964] [<ffffffffa00b76ac>] btrfs_ioctl_dev_info+0x15c/0x1a0 [btrfs]
[...]
> I was on 3.4-rc5!

You really saw this crash with 3.4-rc5? The patch should be there. Anyway, your follow-up report was on top of 3.4-rc5, with a different error.

david
Marco L. Crociani
2012-May-02 17:10 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
> Is there anything I missed for steps to reproduce it?

The whole story is in the previous mails.
http://thread.gmane.org/gmane.comp.file-systems.btrfs/16829
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg15949.html
The first mail is missing from mail-archive...

Summary:
Some damaged sectors on one device. They seemed to be OK after rewriting, so I started a scrub. During the scrub (kernel 3.2.x) the device completely broke down with A LOT of damaged sectors ---> the other device filled up ---> out of space ---> unclean shutdown.
With 3.3 kernels I was able to mount it and add a new device.
I tried 3.4-rc4 but the patch wasn't there. I had problems compiling from git (first I tried DKMS, then the whole kernel; is setting CONCURRENCY = 5 on a quad-core wrong?), so I waited for rc5. With the tarball from kernel.org I successfully compiled 3.4-rc5 (with CONCURRENCY = 4).
Errors with scrub.
Here we are.

On Wed, May 2, 2012 at 5:27 PM, David Sterba <dave@jikos.cz> wrote:
> On Wed, May 02, 2012 at 04:59:03PM +0200, Marco L. Crociani wrote:
>> > On Thu, Apr 19, 2012 at 05:42:05PM +0200, Marco L. Crociani wrote:
>> > > Apr 19 17:38:41 evo kernel: [ 347.661964] [<ffffffffa00b76ac>] btrfs_ioctl_dev_info+0x15c/0x1a0 [btrfs]
> [...]
>> I was on 3.4-rc5!
>
> You really saw this crash with 3.4-rc5 ?

Yes.

Here is what I did before your response today.
From this point:

btrfs fi sh
Label: 'RootFS' uuid: c87975a0-a575-405e-9890-d3f7f25bbd96
    Total devices 3 FS bytes used 1015.83GB
    devid 3 size 1.75TB used 357.00GB path /dev/sdb3
    devid 1 size 1.75TB used 1.34TB path /dev/sda3
    *** Some devices missing

I reached:

btrfs fi show
Label: 'RootFS' uuid: c87975a0-a575-405e-9890-d3f7f25bbd96
    Total devices 3 FS bytes used 1004.23GB
    devid 3 size 1.75TB used 1.25TB path /dev/sdb3
    devid 1 size 1.75TB used 1.33TB path /dev/sda3
    *** Some devices missing

using "btrfs balance start -dvrange=1..[group where it fails minus 1] " a number of times (I started writing some notes on http://btrfs.ipv5.de/index.php?title=User:Tyrael ).

These should be all the errors (sorry for the confusion):
---------------------------------------------------
Apr 30 19:53:13 evo kernel: [ 3163.927548] btrfs csum failed ino 510 off 910946304 csum 432355644 private 175165154

May 1 23:15:12 evo kernel: [101661.681997] btrfs: relocating block group 1742452293632 flags 17
May 1 23:15:39 evo kernel: [101688.412777] btrfs: found 328 extents
May 1 23:15:47 evo kernel: [101696.543742] btrfs: found 328 extents
May 1 23:15:48 evo kernel: [101697.575754] btrfs: relocating block group 1741378551808 flags 17
May 1 23:16:16 evo kernel: [101724.754908] btrfs: found 137 extents
May 1 23:16:24 evo kernel: [101732.915791] btrfs: found 137 extents
May 1 23:16:24 evo kernel: [101733.275939] btrfs: relocating block group 1401002393600 flags 17
May 1 23:16:45 evo kernel: [101753.889479] btrfs csum failed ino 2876 off 910946304 csum 432355644 private 175165154

Apr 30 20:55:09 evo kernel: [ 6879.601004] btrfs: relocating block group 1738157326336 flags 17
Apr 30 20:55:10 evo kernel: [ 6879.995377] btrfs: relocating block group 1401002393600 flags 17
Apr 30 20:55:29 evo kernel: [ 6898.819546] btrfs csum failed ino 636 off 910946304 csum 432355644 private 175165154
Apr 30 20:55:29 evo kernel: [ 6898.849422] btrfs csum failed ino 636 off 910946304 csum 432355644 private 175165154
Apr 30 20:55:29 evo kernel: [ 6898.849689] btrfs csum failed ino 636 off 910946304 csum 432355644 private 175165154
Apr 30 20:55:29 evo kernel: [ 6898.878413] btrfs csum failed ino 636 off 910946304 csum 432355644 private 175165154
Apr 30 20:55:29 evo kernel: [ 6898.878668] btrfs csum failed ino 636 off 910946304 csum 432355644 private 175165154

May 1 15:26:26 evo kernel: [73542.827058] btrfs: relocating block group 1394559942656 flags 17
May 1 15:26:38 evo kernel: [73555.038433] btrfs csum failed ino 1581 off 648593408 csum 283516648 private 3975454589

Apr 30 20:58:26 evo kernel: [ 7076.525087] btrfs: relocating block group 1394559942656 flags 17
Apr 30 20:58:38 evo kernel: [ 7088.082493] btrfs csum failed ino 642 off 648593408 csum 283516648 private 3975454589
Apr 30 20:58:38 evo kernel: [ 7088.108851] btrfs csum failed ino 642 off 648593408 csum 283516648 private 3975454589

May 1 15:28:41 evo kernel: [73677.797363] btrfs: relocating block group 1385970008064 flags 17
May 1 15:28:45 evo kernel: [73681.242643] btrfs csum failed ino 1582 off 229765120 csum 3096851068 private 993448323

Apr 30 21:30:46 evo kernel: [ 9016.216885] btrfs: found 223 extents
Apr 30 21:30:46 evo kernel: [ 9016.533470] btrfs: relocating block group 1385970008064 flags 17
Apr 30 21:30:49 evo kernel: [ 9019.630665] btrfs csum failed ino 650 off 229765120 csum 3096851068 private 993448323

Apr 30 21:56:29 evo kernel: [10558.769597] btrfs: relocating block group 1378453815296 flags 17
Apr 30 21:56:31 evo kernel: [10561.185029] btrfs csum failed ino 657 off 190976000 csum 3234929648 private 3669891009

May 1 14:07:30 evo kernel: [68808.355851] btrfs: relocating block group 1283964534784 flags 17
May 1 14:07:32 evo kernel: [68809.636406] btrfs csum failed ino 1580 off 76992512 csum 2845512790 private 1793157788

Apr 30 21:58:01 evo kernel: [10650.588154] btrfs: relocating block group 1283964534784 flags 17
Apr 30 21:58:02 evo kernel: [10651.659749] btrfs csum failed ino 660 off 76992512 csum 2845512790 private 1793157788

May 1 01:41:51 evo kernel: [24077.073607] btrfs: relocating block group 755951992832 flags 17
May 1 01:42:01 evo kernel: [24087.429383] btrfs csum failed ino 1078 off 685268992 csum 397158032 private 511106431
----------------------------------------------

Is it "normal" that ino changes from one balance run to the next?

before:
Apr 30 21:58:01 evo kernel: [10650.588154] btrfs: relocating block group 1283964534784 flags 17
Apr 30 21:58:02 evo kernel: [10651.659749] btrfs csum failed ino 660 off 76992512 csum 2845512790 private 1793157788

after:
May 1 14:07:30 evo kernel: [68808.355851] btrfs: relocating block group 1283964534784 flags 17
May 1 14:07:32 evo kernel: [68809.636406] btrfs csum failed ino 1580 off 76992512 csum 2845512790 private 1793157788

Sincerely, thanks for the help. It is much appreciated. I do not know where to turn.

--
Marco Lorenzo Crociani, marco.crociani@gmail.com
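In the "before" and "after" lines above, the inode number differs between balance runs while the off, csum and private values stay the same. A small parser makes that comparison mechanical across a longer log. What follows is a minimal sketch assuming the exact "btrfs csum failed ino ... off ... csum ... private ..." wording seen in this thread; `csum_failure`, `parse_csum_failure` and `same_block` are hypothetical helper names, not part of btrfs-progs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "btrfs csum failed" kernel log line (hypothetical struct). */
struct csum_failure {
    unsigned long long ino;
    unsigned long long off;
    unsigned long long csum;
    unsigned long long private_csum;
};

/* Parse one syslog line. Returns 1 and fills *out if the line is a csum
 * failure in the format seen in this thread, otherwise returns 0. */
static int parse_csum_failure(const char *line, struct csum_failure *out)
{
    const char *p;

    assert(line != NULL && out != NULL);
    p = strstr(line, "btrfs csum failed ino ");
    if (p == NULL)
        return 0;
    return sscanf(p, "btrfs csum failed ino %llu off %llu csum %llu private %llu",
                  &out->ino, &out->off, &out->csum, &out->private_csum) == 4;
}

/* Two failures are treated as the same block when everything except the
 * inode number matches. */
static int same_block(const struct csum_failure *a, const struct csum_failure *b)
{
    return a->off == b->off && a->csum == b->csum &&
           a->private_csum == b->private_csum;
}
```

Feeding it the two lines for block group 1283964534784 yields ino 660 and ino 1580 with identical off/csum/private values, so `same_block` reports a match. This only shows that the same data fails each time; it does not by itself explain why the inode number changes.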
Stefan Behrens
2012-May-02 17:15 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
On 5/2/2012 5:22 PM, David Sterba wrote:
> On Mon, Apr 30, 2012 at 03:01:04PM +0200, Marco L. Crociani wrote:
>> ./btrfs device delete missing /mnt/sda3
>> ERROR: error removing the device 'missing' - Input/output error
>>
>> Apr 30 13:17:57 evo kernel: [ 108.866205] btrfs: allowing degraded mounts
>> Apr 30 13:17:57 evo kernel: [ 108.866214] btrfs: disk space caching is enabled
>> Apr 30 13:18:32 evo kernel: [ 143.274899] btrfs: relocating block group 1401002393600 flags 17
>> Apr 30 13:19:25 evo kernel: [ 196.888248] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>> Apr 30 13:19:25 evo kernel: [ 196.889900] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>> Apr 30 13:19:25 evo kernel: [ 196.890429] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>> Apr 30 13:19:25 evo kernel: [ 197.087419] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>> Apr 30 13:19:25 evo kernel: [ 197.087681] btrfs csum failed ino 257 off 910946304 csum 432355644 private 175165154
>
> the failed checksums prevent to remove the data from the device and then
> removing fails with the above error.
>
>> ./btrfs inspect-internal inode-resolve -v 257 /mnt/sda3/
>> ioctl ret=-1, error: No such file or directory
>
> So it's not a visible file, possibly a deleted yet uncleaned snapshot or
> the space_cache (guessing from the inode number). But AFAICS the
> checksums are turned off for the free space inode so ...
>
>> ./btrfs scrub status /mnt/sda3/
>> scrub status for c87975a0-a575-405e-9890-d3f7f25bbd96
>> scrub started at Mon Apr 30 13:26:26 2012 and was aborted after 4367 seconds
>> total bytes scrubbed: 406.64GB with 2 errors
>> error details: csum=2
>> corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
>
> Shouldn't the csum errors be included under uncorrectable?

"uncorrectable errors" would have been set to 2 if no crash had happened.

>> Apr 30 14:37:24 evo kernel: [ 4875.275776] btrfs: checksum error at
>> logical 752871157760 on dev /dev/sda3, sector 873795352, root 259,
>> inode 1580389, offset 612610048, length 4096, links 1 (path:
>        ^^^^^^^
>
> so the scrub catches different checksum errors than appeared during
> balance (inode 257).
>
>> Apr 30 14:37:24 evo kernel: [ 4875.275838] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
>> Apr 30 14:37:24 evo kernel: [ 4875.275848] IP: [<ffffffff811ae841>] bio_add_page+0x11/0x60
>> Apr 30 14:37:24 evo kernel: [ 4875.276022] RIP: 0010:[<ffffffff811ae841>] [<ffffffff811ae841>] bio_add_page+0x11/0x60
>
> this looks like something disappeared under hands of scrub
>
>     BUG_ON(!page->page);
>     bio = bio_alloc(GFP_NOFS, 1);
>     if (!bio)
>         return -EIO;
>     bio->bi_bdev = page->bdev;
>     bio->bi_sector = page->physical >> 9;
>     bio->bi_end_io = scrub_complete_bio_end_io;
>     bio->bi_private = &complete;
>
>     ret = bio_add_page(bio, page->page, PAGE_SIZE, 0);
>     if (PAGE_SIZE != ret) {
>         bio_put(bio);
>         return -EIO;
>     }
>
> everything is initialized before use here, so it's hidden behind the
> pointers, my bet is at page->bdev->something . Thinking again how things
> got here:
>
> * unsuccesful device remove 'missing', due to csum errors in a
> non-regular file
> * crashed scrub, after inidirect access of a null pointer
>
> Is there anything I missed for steps to reproduce it?

Right. bdev is a NULL pointer for missing devices. Scrub tries to repair the checksum error by accessing the mirrors, and that device is missing and NULL.

I'll send a patch tomorrow to prevent the scrub crash in this situation.

Thanks!
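The failure mode described here (a mirror whose device pointer is NULL being handed to the I/O path) can be illustrated outside the kernel. The following is a userspace sketch under stated assumptions, not the patch mentioned above, which does not appear in this thread. Every name in it (`mock_bdev`, `mock_mirror_page`, `mock_recheck_page`, `mock_count_usable_mirrors`, `MOCK_EIO`) is hypothetical; the only idea it borrows from the thread is checking for a missing device before building the read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects discussed in the thread. */
struct mock_bdev {
    int id;
};

struct mock_mirror_page {
    struct mock_bdev *bdev; /* NULL when the mirror's device is missing */
    int io_error;           /* set when the page could not be re-read */
};

enum { MOCK_EIO = 5 };

/* Re-check one mirror page. A missing device is reported as an I/O error
 * up front, so nothing downstream ever dereferences the NULL pointer. */
static int mock_recheck_page(struct mock_mirror_page *page)
{
    assert(page != NULL);
    if (page->bdev == NULL) {
        page->io_error = 1;
        return -MOCK_EIO;
    }
    /* A real implementation would submit the read here. */
    page->io_error = 0;
    return 0;
}

/* Count how many mirrors could actually be used for a repair attempt. */
static size_t mock_count_usable_mirrors(struct mock_mirror_page *pages, size_t n)
{
    size_t usable = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (mock_recheck_page(&pages[i]) == 0)
            usable++;
    }
    return usable;
}
```

On a degraded two-device RAID1 like the one in this report, such a check would leave one usable mirror, and the block would be counted as uncorrectable instead of triggering an oops.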
Stefan Behrens
2012-May-02 17:18 UTC
Re: Errors in rebalancing RAID1 array after disk failure.
Oops, please scratch the attachment in the previous mail; that patch is not yet finished. I forgot to remove it before hitting the send button :( Sorry.

> I'll send a patch tomorrow to prevent the scrub crash in this situation.