Marc MERLIN
2012-Jun-27 01:38 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Tue, Jun 26, 2012 at 12:36:37PM -0700, Marc MERLIN wrote:> I was fine with btrfs until 3.2.16, but when going to 3.4 or 3.4.4, I''m > having my system randomly not wanting to suspend to RAM. > > My suspend light started flashing but the system wasn''t suspending. > I was able to unlock X, run top, saw that btrfs-delalloc-??? was taking > 98% CPU, and any command that touched disk hung > (so I could log in, but would not get a shell). > > I did the sysrq commands, which took a long time to work (they stacked > up with nothing happening and maybe 10mn later, the system unhung itself > and they ran, maybe too late, not sure). > > Interestingly the system did go to sleep and resumed ok after that, it > just took 20mn to get there. > > It''s all pasted below. > > gandalfthegreat:~# btrfs fi show > Label: ''btrfs_pool1'' uuid: 873d526c-e911-4234-af1b-239889cd143d > Total devices 1 FS bytes used 214.44GB > devid 1 size 231.02GB used 231.02GB path /dev/dm-0 > > Btrfs Btrfs v0.19 > gandalfthegreat:~#Now, I''m also seeing these below and I have this again (86% CPU): 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc- How bad is it, doctor? I think I''ll be going back to 3.2.16 for now though. [100415.369301] BTRFS warning (device dm-0): Aborting unused transaction. [100415.739337] BTRFS warning (device dm-0): Aborting unused transaction. [100416.093839] BTRFS warning (device dm-0): Aborting unused transaction. [100416.433245] BTRFS warning (device dm-0): Aborting unused transaction. [100709.352445] usb 1-5: USB disconnect, device number 8 [100710.197602] thinkpad_acpi: EC reports that Thermal Table has changed [100710.199954] ACPI: \_SB_.GDCK - undocking [100711.358679] wlan0: deauthenticating from 00:24:6c:67:03:d1 by local choice (reason=3) [100713.044214] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X [101447.801190] btrfs no csum found for inode 3910588 start 5144576 [101447.802277] btrfs csum failed ino 3910588 off 5144576 csum 2096800889 private 0 [101447.957161] btrfs no csum found for inode 3910588 start 5996544 [101447.957990] btrfs csum failed ino 3910588 off 5996544 csum 2096800889 private 0 [101447.960459] btrfs no csum found for inode 3910588 start 6062080 [101447.961351] btrfs csum failed ino 3910588 off 6062080 csum 2096800889 private 0 [101447.962459] btrfs no csum found for inode 3910588 start 6225920 [101447.963543] btrfs csum failed ino 3910588 off 6225920 csum 2096800889 private 0 [101447.964369] btrfs no csum found for inode 3910588 start 6291456 [101447.965045] btrfs csum failed ino 3910588 off 6291456 csum 2096800889 private 0 [101447.966406] btrfs no csum found for inode 3910588 start 6356992 [101447.967392] btrfs csum failed ino 3910588 off 6356992 csum 2123070294 private 0 [101448.088912] btrfs no csum found for inode 3910588 start 6553600 [101448.088971] btrfs no csum found for inode 3910588 start 6586368 [101448.090058] btrfs csum failed ino 3910588 off 6586368 csum 2096800889 private 0 [101448.090305] btrfs csum failed ino 3910588 off 6553600 csum 4253301504 private 0 [101448.093231] btrfs no csum found for inode 3910588 start 6717440 [101448.094269] btrfs csum failed ino 3910588 off 6717440 csum 2096800889 private 0 [101448.096104] btrfs no csum found for inode 3910588 start 6848512 [101448.097002] btrfs csum failed ino 3910588 off 6848512 csum 3282939717 private 0 [101448.098249] btrfs no csum found for inode 3910588 start 6946816 [101448.101561] btrfs no csum found for inode 3910588 start 7045120 [101448.106352] btrfs no csum found for inode 3910588 start 7176192 [101448.108224] btrfs no csum found for inode 3910588 start 7241728 [101448.111133] btrfs no csum found for inode 3910588 start 7307264 [101448.127081] btrfs no csum found for inode 3910588 start 7569408 [101448.144351] btrfs no csum found for inode 3910588 start 7864320 [101448.155857] btrfs no csum found for inode 3910588 start 8060928 [101448.165868] btrfs no csum found for inode 3910588 start 8257536 [101448.175275] btrfs no csum found for inode 3910588 start 8454144 [101448.176087] btrfs no csum found for inode 3910588 start 8552448 [101448.180927] btrfs no csum found for inode 3910588 start 8847360 [101448.183489] btrfs no csum found for inode 3910588 start 8978432 [101448.192909] btrfs no csum found for inode 3910588 start 9371648 [101448.198959] btrfs no csum found for inode 3910588 start 9568256 [101448.413716] btrfs no csum found for inode 3910588 start 5439488 [101448.413996] btrfs no csum found for inode 3910588 start 5537792 [101448.414379] btrfs no csum found for inode 3910588 start 5603328 [101448.416386] btrfs no csum found for inode 3910588 start 5734400 [101448.424108] btrfs no csum found for inode 3910588 start 5799936 [101448.721649] btrfs no csum found for inode 3910588 start 9699328 [101448.737377] btrfs no csum found for inode 3910588 start 9863168 [101448.743929] btrfs no csum found for inode 3910588 start 10027008 [101448.753186] btrfs no csum found for inode 3910588 start 10158080 [101448.762406] btrfs no csum found for inode 3910588 start 10256384 [101448.765143] btrfs no csum found for inode 3910588 start 10354688 [102230.473794] BTRFS warning (device dm-0): Aborting unused transaction. [102231.154304] BTRFS warning (device dm-0): Aborting unused transaction. [102231.816380] BTRFS warning (device dm-0): Aborting unused transaction. [102231.845170] btrfs no csum found for inode 3910562 start 0 [102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed [102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0 [102231.999555] btrfs no csum found for inode 3910588 start 294912 [102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0 [102232.142602] btrfs no csum found for inode 3910588 start 360448 [102232.142661] btrfs no csum found for inode 3910588 start 393216 [102232.157527] btrfs csum failed ino 3910588 off 360448 csum 66868660 private 0 [102232.158415] btrfs csum failed ino 3910588 off 393216 csum 230387419 private 0 [102232.968939] BTRFS warning (device dm-0): Aborting unused transaction. [102234.107914] BTRFS warning (device dm-0): Aborting unused transaction. [102235.218139] BTRFS warning (device dm-0): Aborting unused transaction. Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jun-27 05:20 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:> Now, I''m also seeing these below and I have this again (86% CPU): > 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc- > > How bad is it, doctor? I think I''ll be going back to 3.2.16 for now though.Back to 3.2.16, I''m now seeing this: [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds. [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080 [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100 [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080 [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08 [ 840.516752] Call Trace: [ 840.516755] [<ffffffff810b5c67>] ? __lock_page+0x66/0x66 [ 840.516758] [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f [ 840.516761] [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa [ 840.516764] [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85 [ 840.516767] [<ffffffff810b5c62>] ? __lock_page+0x61/0x66 [ 840.516770] [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a [ 840.516785] [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs] [ 840.516789] [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7 [ 840.516792] [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94 [ 840.516795] [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30 [ 840.516811] [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs] [ 840.516826] [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs] [ 840.516841] [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs] [ 840.516844] [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50 [ 840.516847] [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d [ 840.516863] [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs] [ 840.516866] [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b [ 840.516882] [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs] [ 840.516886] [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e [ 840.516888] [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22 [ 840.516891] [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c [ 840.516894] [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9 [ 840.516898] [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b [ 1187.553635] btrfs: unlinked 8 orphans [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff]) [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff]) [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io 0x1840-0x185f] (PCI address [0x1840-0x185f]) [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b) [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107) [ 3810.200147] e1000e 0000:00:19.0: PME# disabled [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1 [ 4799.384107] btrfs: unlinked 8 orphans [ 8436.512513] btrfs: unlinked 7 orphans [11350.749850] btrfs no csum found for inode 3909426 start 0 [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0 [11652.088805] btrfs no csum found for inode 3910848 start 0 [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0 My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot. I can''t prove it, but it looks like my corruption happened right at the same time than I rebooted to 3.4.4. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jun-29 12:36 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote: > > Now, I''m also seeing these below and I have this again (86% CPU): > > 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc- > > > > How bad is it, doctor? I think I''ll be going back to 3.2.16 for now though.I reverted to 3.2.16 and haven''t had further problems after dropping the current snapshot that was corrupted in various ways. Now, I''m not sure when I should upgrade anymore since I haven''t heard of any fixes for what I saw. Assuming I go forward again, is there something else I could have provided to help debug? Marc> Back to 3.2.16, I''m now seeing this: > [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds. > [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080 > [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100 > [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080 > [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08 > [ 840.516752] Call Trace: > [ 840.516755] [<ffffffff810b5c67>] ? __lock_page+0x66/0x66 > [ 840.516758] [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f > [ 840.516761] [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa > [ 840.516764] [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85 > [ 840.516767] [<ffffffff810b5c62>] ? __lock_page+0x61/0x66 > [ 840.516770] [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a > [ 840.516785] [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs] > [ 840.516789] [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7 > [ 840.516792] [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94 > [ 840.516795] [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30 > [ 840.516811] [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs] > [ 840.516826] [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs] > [ 840.516841] [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs] > [ 840.516844] [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50 > [ 840.516847] [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d > [ 840.516863] [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs] > [ 840.516866] [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b > [ 840.516882] [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs] > [ 840.516886] [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e > [ 840.516888] [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22 > [ 840.516891] [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c > [ 840.516894] [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9 > [ 840.516898] [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b > [ 1187.553635] btrfs: unlinked 8 orphans > [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff]) > [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff]) > [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io 0x1840-0x185f] (PCI address [0x1840-0x185f]) > [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b) > [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107) > [ 3810.200147] e1000e 0000:00:19.0: PME# disabled > [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X > [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1 > [ 4799.384107] btrfs: unlinked 8 orphans > [ 8436.512513] btrfs: unlinked 7 orphans > [11350.749850] btrfs no csum found for inode 3909426 start 0 > [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0 > [11652.088805] btrfs no csum found for inode 3910848 start 0 > [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0 > > My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot. > > I can''t prove it, but it looks like my corruption happened right at the same > time than I rebooted to 3.4.4. > > Marc > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > Microsoft is to operating systems .... > .... what McDonalds is to gourmet cooking > Home page: http://marc.merlins.org/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jul-02 19:58 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote: > > On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote: > > > Now, I''m also seeing these below and I have this again (86% CPU): > > > 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc- > > > > > > How bad is it, doctor? I think I''ll be going back to 3.2.16 for now though. > > I reverted to 3.2.16 and haven''t had further problems after dropping the > current snapshot that was corrupted in various ways. > > Now, I''m not sure when I should upgrade anymore since I haven''t heard of > any fixes for what I saw. > Assuming I go forward again, is there something else I could have > provided to help debug?Mmmh, ok. I understand that this code comes with no guarantees, and I have backups, but I''m reporting a problem that lead to corruption (I had multiple files that were corrupted in my latest snapshot and I had to drop it and revert to an older snapshot and then out of fear for 3.4.4, went back to 3.2.16). I didn''t see any problems with 3.2.16 (doesn''t mean there weren''t any, just that I didn''t see any). Since my filesystem was a bit full, and that triggers problems with btrfs, I freed up 70GB gandalfthegreat:~# btrfs fi show Label: ''btrfs_pool1'' uuid: 873d526c-e911-4234-af1b-239889cd143d Total devices 1 FS bytes used 163.01GB devid 1 size 231.02GB used 231.02GB path /dev/dm-0 I rebooted with 3.4.4 and started copying data, and for now I''ve gotten this: kernel: [ 832.108558] btrfs no csum found for inode 3896855 start 0 kernel: [ 832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0 How bad is this? More generally, what was missing from my previous report (I gave all the sysrq I could output) that no one seemed to be able to use it? Thanks, Marc> > Back to 3.2.16, I''m now seeing this: > > [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds. > > [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080 > > [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100 > > [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080 > > [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08 > > [ 840.516752] Call Trace: > > [ 840.516755] [<ffffffff810b5c67>] ? __lock_page+0x66/0x66 > > [ 840.516758] [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f > > [ 840.516761] [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa > > [ 840.516764] [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85 > > [ 840.516767] [<ffffffff810b5c62>] ? __lock_page+0x61/0x66 > > [ 840.516770] [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a > > [ 840.516785] [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs] > > [ 840.516789] [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7 > > [ 840.516792] [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94 > > [ 840.516795] [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30 > > [ 840.516811] [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs] > > [ 840.516826] [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs] > > [ 840.516841] [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs] > > [ 840.516844] [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50 > > [ 840.516847] [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d > > [ 840.516863] [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs] > > [ 840.516866] [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b > > [ 840.516882] [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs] > > [ 840.516886] [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e > > [ 840.516888] [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22 > > [ 840.516891] [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c > > [ 840.516894] [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9 > > [ 840.516898] [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b > > [ 1187.553635] btrfs: unlinked 8 orphans > > [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff]) > > [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff]) > > [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io 0x1840-0x185f] (PCI address [0x1840-0x185f]) > > [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b) > > [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107) > > [ 3810.200147] e1000e 0000:00:19.0: PME# disabled > > [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X > > [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1 > > [ 4799.384107] btrfs: unlinked 8 orphans > > [ 8436.512513] btrfs: unlinked 7 orphans > > [11350.749850] btrfs no csum found for inode 3909426 start 0 > > [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0 > > [11652.088805] btrfs no csum found for inode 3910848 start 0 > > [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0 > > > > My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot. > > > > I can''t prove it, but it looks like my corruption happened right at the same > > time than I rebooted to 3.4.4. > > > > Marc > > -- > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > > Microsoft is to operating systems .... > > .... what McDonalds is to gourmet cooking > > Home page: http://marc.merlins.org/ > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > Microsoft is to operating systems .... > .... what McDonalds is to gourmet cooking > Home page: http://marc.merlins.org/ > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html-- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Jul-04 05:58 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On 07/03/2012 03:58 AM, Marc MERLIN wrote:> On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote: >> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote: >>> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote: >>>> Now, I''m also seeing these below and I have this again (86% CPU): >>>> 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc- >>>> >>>> How bad is it, doctor? I think I''ll be going back to 3.2.16 for now though. >> >> I reverted to 3.2.16 and haven''t had further problems after dropping the >> current snapshot that was corrupted in various ways. >> >> Now, I''m not sure when I should upgrade anymore since I haven''t heard of >> any fixes for what I saw. >> Assuming I go forward again, is there something else I could have >> provided to help debug? > > Mmmh, ok. I understand that this code comes with no guarantees, and I have > backups, but I''m reporting a problem that lead to corruption (I had multiple > files that were corrupted in my latest snapshot and I had to drop it and > revert to an older snapshot and then out of fear for 3.4.4, went back to > 3.2.16). >Hi Marc, Sorry for not replying this earlier. The dmesg log, sysrq log and stack dump info can usually be very helpful. From your report, we can see the csum error and hang on log, ''no csum'' is not that bad while hanging-on is serious and dangerous. so can you please get any ''sysrq + w'' log in the hanging-on case and paste them here, and the log may tell us who blocks other threads.> I didn''t see any problems with 3.2.16 (doesn''t mean there weren''t any, just > that I didn''t see any).Feel free to use the latest btrfs upstream, it always contains some fixes. thanks, liubo> Since my filesystem was a bit full, and that triggers problems with btrfs, I > freed up 70GB > gandalfthegreat:~# btrfs fi show > Label: ''btrfs_pool1'' uuid: 873d526c-e911-4234-af1b-239889cd143d > Total devices 1 FS bytes used 163.01GB > devid 1 size 231.02GB used 231.02GB path /dev/dm-0 > > I rebooted with 3.4.4 and started copying data, and for now I''ve gotten this: > kernel: [ 832.108558] btrfs no csum found for inode 3896855 start 0 > kernel: [ 832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0 > > How bad is this? > > More generally, what was missing from my previous report (I gave all the > sysrq I could output) that no one seemed to be able to use it? >> Thanks, > Marc > >>> Back to 3.2.16, I''m now seeing this: >>> [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds. >>> [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080 >>> [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100 >>> [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080 >>> [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08 >>> [ 840.516752] Call Trace: >>> [ 840.516755] [<ffffffff810b5c67>] ? __lock_page+0x66/0x66 >>> [ 840.516758] [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f >>> [ 840.516761] [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa >>> [ 840.516764] [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85 >>> [ 840.516767] [<ffffffff810b5c62>] ? __lock_page+0x61/0x66 >>> [ 840.516770] [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a >>> [ 840.516785] [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs] >>> [ 840.516789] [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7 >>> [ 840.516792] [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94 >>> [ 840.516795] [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30 >>> [ 840.516811] [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs] >>> [ 840.516826] [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs] >>> [ 840.516841] [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs] >>> [ 840.516844] [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50 >>> [ 840.516847] [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d >>> [ 840.516863] [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs] >>> [ 840.516866] [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b >>> [ 840.516882] [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs] >>> [ 840.516886] [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e >>> [ 840.516888] [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22 >>> [ 840.516891] [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c >>> [ 840.516894] [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9 >>> [ 840.516898] [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b >>> [ 1187.553635] btrfs: unlinked 8 orphans >>> [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff]) >>> [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff]) >>> [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io 0x1840-0x185f] (PCI address [0x1840-0x185f]) >>> [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b) >>> [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107) >>> [ 3810.200147] e1000e 0000:00:19.0: PME# disabled >>> [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X >>> [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1 >>> [ 4799.384107] btrfs: unlinked 8 orphans >>> [ 8436.512513] btrfs: unlinked 7 orphans >>> [11350.749850] btrfs no csum found for inode 3909426 start 0 >>> [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0 >>> [11652.088805] btrfs no csum found for inode 3910848 start 0 >>> [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0 >>> >>> My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot. >>> >>> I can''t prove it, but it looks like my corruption happened right at the same >>> time than I rebooted to 3.4.4. >>> >>> Marc >>> -- >>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R. >>> Microsoft is to operating systems .... >>> .... what McDonalds is to gourmet cooking >>> Home page: http://marc.merlins.org/ >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> "A mouse is a device used to point at the xterm you want to type in" - A.S.R. >> Microsoft is to operating systems .... >> .... what McDonalds is to gourmet cooking >> Home page: http://marc.merlins.org/ >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jul-04 15:15 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote:> The dmesg log, sysrq log and stack dump info can usually be very helpful. > > From your report, we can see the csum error and hang on log, > ''no csum'' is not that bad while hanging-on is serious and dangerous. > > so can you please get any ''sysrq + w'' log in the hanging-on case and paste them here, > and the log may tell us who blocks other threads.Hi, thanks for the answer. I dumped all sysrq data, that was in my original Email. Here are two different sysrq+w runs, as well as aborted transaction messages from that Email. Sorry that the original was a bit long and contained a bunch of sysrq output. From doing further testing since then, it does seem that the code just start doing bad things, including the file corruption I saw, when I''m running low on free space. Anything else that would help? Thanks, Marc> [84951.680847] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1 > [84951.680850] ktime : 84951680.847904 > [84951.680851] sched_clk : 55820767.832790 > [84951.680853] cpu_clk : 84951680.846676 > [84951.680855] jiffies : 4316130216 > [84951.680857] sched_clock_stable : 0 > [84951.680858] > [84951.680859] sysctl_sched > [84951.680860] .sysctl_sched_latency : 12.000000 > [84951.680862] .sysctl_sched_min_granularity : 1.500000 > [84951.680864] .sysctl_sched_wakeup_granularity : 2.000000 > [84951.680865] .sysctl_sched_child_runs_first : 0 > [84951.680867] .sysctl_sched_features : 24119 > [84951.680869] .sysctl_sched_tunable_scaling : 1 (logaritmic) > [84951.680871] > [84951.680872] cpu#0, 2527.110 MHz > [84951.680873] .nr_running : 4 > [84951.680875] .load : 1542 > [84951.680876] .nr_switches : 298015357 > [84951.680878] .nr_load_updates : 16858906 > [84951.680879] .nr_uninterruptible : -7812 > [84951.680881] .next_balance : 4316.130229 > [84951.680883] .curr->pid : 6859 > [84951.680884] .clock : 84951679.104864 > [84951.680886] .cpu_load[0] : 1277 > [84951.680887] .cpu_load[1] : 885 > [84951.680889] .cpu_load[2] : 728 > [84951.680890] .cpu_load[3] : 693 > [84951.680892] .cpu_load[4] : 731 > [84951.680894] > [84951.680894] cfs_rq[0]:/autogroup-69 > [84951.680896] .exec_clock : 0.000000 > [84951.680898] .MIN_vruntime : 0.000001 > [84951.680899] .min_vruntime : 194572.976125 > [84951.680901] .max_vruntime : 0.000001 > [84951.680903] .spread : 0.000000 > [84951.680904] .spread0 : -52160826.593685 > [84951.680906] .nr_spread_over : 0 > [84951.680907] .nr_running : 0 > [84951.680909] .load : 0 > [84951.680910] .load_avg : 5179.920896 > [84951.680912] .load_period : 8.310703 > [84951.680913] .load_contrib : 623 > [84951.680915] .load_tg : 623 > [84951.680917] .se->exec_start : 84951675.234336 > [84951.680918] .se->vruntime : 52355393.270692 > [84951.680920] .se->sum_exec_runtime : 241715.485721 > [84951.680922] .se->load.weight : 2 > [84951.680923] > [84951.680924] cfs_rq[0]:/autogroup-20 > [84951.680925] .exec_clock : 0.000000 > [84951.680927] .MIN_vruntime : 0.000001 > [84951.680929] .min_vruntime : 13027.823103 > [84951.680930] .max_vruntime : 0.000001 > [84951.680932] .spread : 0.000000 > [84951.680934] .spread0 : -52342371.746707 > [84951.680935] .nr_spread_over : 0 > [84951.680937] .nr_running : 0 > [84951.680938] .load : 0 > [84951.680940] .load_avg : 1279.999872 > [84951.680941] .load_period : 6.444962 > [84951.680943] .load_contrib : 198 > [84951.680944] .load_tg : 257 > [84951.680946] .se->exec_start : 84951664.094034 > [84951.680948] .se->vruntime : 52355389.621584 > [84951.680950] .se->sum_exec_runtime : 13082.655453 > [84951.680951] .se->load.weight : 2 > [84951.680953] > [84951.680954] cfs_rq[0]:/autogroup-74 > [84951.680955] .exec_clock : 0.000000 > [84951.680957] .MIN_vruntime : 1323610.934982 > [84951.680959] .min_vruntime : 1323616.934982 > [84951.680960] .max_vruntime : 1323610.934982 > [84951.680962] .spread : 0.000000 > [84951.680964] .spread0 : -51031782.634828 > [84951.680965] .nr_spread_over : 0 > [84951.680967] .nr_running : 1 > [84951.680968] .load : 1024 > [84951.680970] .load_avg : 252.855291 > [84951.680971] .load_period : 6.216878 > [84951.680973] .load_contrib : 44 > [84951.680974] .load_tg : 991 > [84951.680976] .se->exec_start : 84951670.049973 > [84951.680978] .se->vruntime : 52355393.569810 > [84951.680980] .se->sum_exec_runtime : 2397898.918671 > [84951.680981] .se->load.weight : 532 > [84951.680983] > [84951.680984] cfs_rq[0]:/autogroup-28 > [84951.680985] .exec_clock : 0.000000 > [84951.680987] .MIN_vruntime : 0.000001 > [84951.680989] .min_vruntime : 3527635.809340 > [84951.680990] .max_vruntime : 0.000001 > [84951.680992] .spread : 0.000000 > [84951.680993] .spread0 : -48827763.760470 > [84951.680995] .nr_spread_over : 0 > [84951.680996] .nr_running : 0 > [84951.680998] .load : 0 > [84951.680999] .load_avg : 1058.675911 > [84951.681001] .load_period : 6.328132 > [84951.681002] .load_contrib : 155 > [84951.681004] .load_tg : 1169 > [84951.681006] .se->exec_start : 84951678.610201 > [84951.681007] .se->vruntime : 52355394.509141 > [84951.681009] .se->sum_exec_runtime : 3528010.809431 > [84951.681011] .se->load.weight : 2 > [84951.681013] > [84951.681013] cfs_rq[0]:/autogroup-39 > [84951.681015] .exec_clock : 0.000000 > [84951.681016] .MIN_vruntime : 0.000001 > [84951.681018] .min_vruntime : 4931529.000259 > [84951.681020] .max_vruntime : 0.000001 > [84951.681021] .spread : 0.000000 > [84951.681023] .spread0 : -47423870.569551 > [84951.681025] .nr_spread_over : 0 > [84951.681026] .nr_running : 0 > [84951.681028] .load : 0 > [84951.681029] .load_avg : 4701.742763 > [84951.681031] .load_period : 9.302606 > [84951.681032] .load_contrib : 505 > [84951.681034] .load_tg : 1033 > [84951.681036] .se->exec_start : 84951675.470082 > [84951.681037] .se->vruntime : 52355399.569810 > [84951.681039] .se->sum_exec_runtime : 4256492.512746 > [84951.681041] .se->load.weight : 2 > [84951.681043] > [84951.681043] cfs_rq[0]:/autogroup-84 > [84951.681045] .exec_clock : 0.000000 > [84951.681047] .MIN_vruntime : 12773469.982548 > [84951.681048] .min_vruntime : 12773475.929218 > [84951.681050] .max_vruntime : 12773475.929218 > [84951.681052] .spread : 5.946670 > [84951.681053] .spread0 : -39581923.640592 > [84951.681055] .nr_spread_over : 0 > [84951.681056] .nr_running : 3 > [84951.681058] .load : 1578 > [84951.681059] .load_avg : 2389.940390 > [84951.681061] .load_period : 6.650024 > [84951.681063] .load_contrib : 289 > [84951.681064] .load_tg : 310 > [84951.681066] .se->exec_start : 84951679.104864 > [84951.681068] .se->vruntime : 52355396.091688 > [84951.681069] .se->sum_exec_runtime : 11567512.453793 > [84951.681071] .se->load.weight : 1010 > [84951.681073] > [84951.681073] cfs_rq[0]:/ > [84951.681075] .exec_clock : 0.000000 > [84951.681076] .MIN_vruntime : 52355393.569810 > [84951.681078] .min_vruntime : 52355399.569810 > [84951.681080] .max_vruntime : 52355393.569810 > [84951.681082] .spread : 0.000000 > [84951.681083] .spread0 : 0.000000 > [84951.681085] .nr_spread_over : 0 > [84951.681086] .nr_running : 2 > [84951.681087] .load : 1542 > [84951.681089] .load_avg : 0.000000 > [84951.681091] .load_period : 0.000000 > [84951.681092] .load_contrib : 0 > [84951.681093] .load_tg : 0 > [84951.681095] > [84951.681095] rt_rq[0]: > [84951.681097] .rt_nr_running : 0 > [84951.681098] .rt_throttled : 0 > [84951.681099] .rt_time : 0.000000 > [84951.681101] .rt_runtime : 950.000000 > [84951.681103] > [84951.681104] runnable tasks: > [84951.681104] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [84951.681105] ---------------------------------------------------------------------------------------------------------- > [84951.681119] R chrome 6859 12773469.929218 31318750 123 0 0 0.000000 0.000000 0.000000 /autogroup-84 > [84951.681129] chrome 7268 12773475.929218 72627834 123 0 0 0.000000 0.000000 0.000000 /autogroup-84 > [84951.681135] Chrome_ChildIOT 7270 12773469.982548 20491354 123 0 0 0.000000 0.000000 0.000000 /autogroup-84 > [84951.681143] chromium-browse 28255 1323610.934982 2145602 120 0 0 0.000000 0.000000 0.000000 /autogroup-74 > [84951.681152] > [84951.681153] cpu#1, 2527.110 MHz > [84951.681154] .nr_running : 1 > [84951.681156] .load : 1024 > [84951.681157] .nr_switches : 193625464 > [84951.681159] .nr_load_updates : 19156541 > [84951.681160] .nr_uninterruptible : 7826 > [84951.681162] .next_balance : 4316.130228 > [84951.681163] .curr->pid : 5589 > [84951.681165] .clock : 84951680.008137 > [84951.681167] .cpu_load[0] : 1024 > [84951.681168] .cpu_load[1] : 1024 > [84951.681169] .cpu_load[2] : 1024 > [84951.681171] .cpu_load[3] : 1024 > [84951.681172] .cpu_load[4] : 1029 > [84951.681174] > [84951.681175] cfs_rq[1]:/autogroup-74 > [84951.681176] .exec_clock : 0.000000 > [84951.681178] .MIN_vruntime : 0.000001 > [84951.681179] .min_vruntime : 1424197.278711 > [84951.681181] .max_vruntime : 0.000001 > [84951.681183] .spread : 0.000000 > [84951.681184] .spread0 : -50931202.291099 > [84951.681186] .nr_spread_over : 0 > [84951.681187] .nr_running : 0 > [84951.681189] .load : 0 > [84951.681190] .load_avg : 5245.213184 > [84951.681192] .load_period : 5.523241 > [84951.681194] .load_contrib : 947 > [84951.681195] .load_tg : 991 > [84951.681197] .se->exec_start : 84951600.488928 > [84951.681199] .se->vruntime : 71918776.133351 > [84951.681201] .se->sum_exec_runtime : 2565479.414558 > [84951.681202] .se->load.weight : 2 > [84951.681204] > [84951.681205] cfs_rq[1]:/autogroup-20 > [84951.681206] .exec_clock : 0.000000 > [84951.681208] .MIN_vruntime : 0.000001 > [84951.681210] .min_vruntime : 16579.934530 > [84951.681211] .max_vruntime : 0.000001 > [84951.681213] .spread : 0.000000 > [84951.681214] .spread0 : -52338819.635280 > [84951.681216] .nr_spread_over : 0 > [84951.681217] .nr_running : 0 > [84951.681219] .load : 0 > [84951.681220] .load_avg : 440.283872 > [84951.681222] .load_period : 6.022332 > [84951.681224] .load_contrib : 59 > [84951.681225] .load_tg : 257 > [84951.681227] .se->exec_start : 84951604.151757 > [84951.681229] .se->vruntime : 71918779.019518 > [84951.681230] .se->sum_exec_runtime : 16600.817243 > [84951.681232] .se->load.weight : 2 > [84951.681234] > [84951.681234] cfs_rq[1]:/autogroup-39 > [84951.681236] .exec_clock : 0.000000 > [84951.681238] .MIN_vruntime : 0.000001 > [84951.681239] .min_vruntime : 4238682.043047 > [84951.681241] .max_vruntime : 0.000001 > [84951.681242] .spread : 0.000000 > [84951.681244] .spread0 : -48116717.526763 > [84951.681246] .nr_spread_over : 0 > [84951.681247] .nr_running : 0 > [84951.681248] .load : 0 > [84951.681250] .load_avg : 5176.058368 > [84951.681252] .load_period : 9.746619 > [84951.681253] .load_contrib : 528 > [84951.681255] .load_tg : 1033 > [84951.681257] .se->exec_start : 84951600.594975 > [84951.681258] .se->vruntime : 71918775.930447 > [84951.681260] .se->sum_exec_runtime : 4311876.124090 > [84951.681262] .se->load.weight : 2 > [84951.681264] > [84951.681264] cfs_rq[1]:/autogroup-84 > [84951.681266] .exec_clock : 0.000000 > [84951.681267] .MIN_vruntime : 0.000001 > [84951.681269] .min_vruntime : 7070943.272798 > [84951.681271] .max_vruntime : 0.000001 > [84951.681272] .spread : 0.000000 > [84951.681274] .spread0 : -45284456.297012 > [84951.681275] .nr_spread_over : 0 > [84951.681277] .nr_running : 0 > [84951.681278] .load : 0 > [84951.681280] .load_avg : 5259.999474 > [84951.681281] .load_period : 9.999999 > [84951.681283] .load_contrib : 21 > [84951.681284] .load_tg : 310 > [84951.681286] .se->exec_start : 84951649.364685 > [84951.681288] .se->vruntime : 71918824.034522 > [84951.681290] .se->sum_exec_runtime : 6502960.496865 > [84951.681292] .se->load.weight : 2 > [84951.681293] > [84951.681294] cfs_rq[1]:/autogroup-28 > [84951.681295] .exec_clock : 0.000000 > [84951.681297] .MIN_vruntime : 0.000001 > [84951.681299] .min_vruntime : 1961436.099473 > [84951.681301] .max_vruntime : 0.000001 > [84951.681302] .spread : 0.000000 > [84951.681304] .spread0 : -50393963.470337 > [84951.681305] .nr_spread_over : 0 > [84951.681307] .nr_running : 0 > [84951.681308] .load : 0 > [84951.681310] .load_avg : 5174.934016 > [84951.681311] .load_period : 5.101167 > [84951.681313] .load_contrib : 1014 > [84951.681314] .load_tg : 1169 > [84951.681316] .se->exec_start : 84951649.415319 > [84951.681318] .se->vruntime : 71918824.084299 > [84951.681320] .se->sum_exec_runtime : 1961721.167843 > [84951.681321] .se->load.weight : 2 > [84951.681323] > [84951.681323] cfs_rq[1]:/ > [84951.681325] .exec_clock : 0.000000 > [84951.681326] .MIN_vruntime : 0.000001 > [84951.681328] .min_vruntime : 71918860.488271 > [84951.681330] .max_vruntime : 0.000001 > [84951.681331] .spread : 0.000000 > [84951.681333] .spread0 : 19563460.918461 > [84951.681334] .nr_spread_over : 0 > [84951.681336] .nr_running : 1 > [84951.681337] .load : 1024 > [84951.681339] .load_avg : 0.000000 > [84951.681340] .load_period : 0.000000 > [84951.681342] .load_contrib : 0 > [84951.681343] .load_tg : 0 > [84951.681345] > [84951.681345] rt_rq[1]: > [84951.681346] .rt_nr_running : 0 > [84951.681348] .rt_throttled : 0 > [84951.681349] .rt_time : 0.000000 > [84951.681351] .rt_runtime : 950.000000 > [84951.681353] > [84951.681353] runnable tasks: > [84951.681354] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [84951.681355] ---------------------------------------------------------------------------------------------------------- > [84951.681370] Rbtrfs-delalloc- 5589 71918860.488271 173262 120 0 0 0.000000 0.000000 0.000000 /> [84959.028005] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1 > [84959.028005] ktime : 84959073.124471 > [84959.028005] sched_clk : 55828156.022356 > [84959.028005] cpu_clk : 84959028.005869 > [84959.028005] jiffies : 4316132062 > [84959.028005] sched_clock_stable : 0 > [84959.028005] > [84959.028005] sysctl_sched > [84959.028005] .sysctl_sched_latency : 12.000000 > [84959.028005] .sysctl_sched_min_granularity : 1.500000 > [84959.028005] .sysctl_sched_wakeup_granularity : 2.000000 > [84959.028005] .sysctl_sched_child_runs_first : 0 > [84959.028005] .sysctl_sched_features : 24119 > [84959.028005] .sysctl_sched_tunable_scaling : 1 (logaritmic) > [84959.028005] > [84959.028005] cpu#0, 2527.110 MHz > [84959.028005] .nr_running : 1 > [84959.028005] .load : 1024 > [84959.028005] .nr_switches : 298019849 > [84959.028005] .nr_load_updates : 16860702 > [84959.028005] .nr_uninterruptible : -7812 > [84959.028005] .next_balance : 4316.132064 > [84959.028005] .curr->pid : 5589 > [84959.028005] .clock : 84959024.005728 > [84959.028005] .cpu_load[0] : 1024 > [84959.028005] .cpu_load[1] : 1024 > [84959.028005] .cpu_load[2] : 1024 > [84959.028005] .cpu_load[3] : 1024 > [84959.028005] .cpu_load[4] : 1024 > [84959.028005] > [84959.028005] cfs_rq[0]:/ > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 52362263.842363 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : 0.000000 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 1 > [84959.028005] .load : 1024 > [84959.028005] .load_avg : 0.000000 > [84959.028005] .load_period : 0.000000 > [84959.028005] .load_contrib : 0 > [84959.028005] .load_tg : 0 > [84959.028005] > [84959.028005] rt_rq[0]: > [84959.028005] .rt_nr_running : 0 > [84959.028005] .rt_throttled : 0 > [84959.028005] .rt_time : 0.000000 > [84959.028005] .rt_runtime : 950.000000 > [84959.028005] > [84959.028005] runnable tasks: > [84959.028005] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [84959.028005] ---------------------------------------------------------------------------------------------------------- > [84959.028005] Rbtrfs-delalloc- 5589 52362263.842363 173591 120 0 0 0.000000 0.000000 0.000000 / > [84959.028005] > [84959.028005] cpu#1, 2527.110 MHz > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .nr_switches : 193660011 > [84959.028005] .nr_load_updates : 19157733 > [84959.028005] .nr_uninterruptible : 7826 > [84959.028005] .next_balance : 4316.132092 > [84959.028005] .curr->pid : 0 > [84959.028005] .clock : 84959064.238592 > [84959.028005] .cpu_load[0] : 4096 > [84959.028005] .cpu_load[1] : 2072 > [84959.028005] .cpu_load[2] : 1194 > [84959.028005] .cpu_load[3] : 869 > [84959.028005] .cpu_load[4] : 762 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-69 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 223361.691158 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -52138902.151205 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 2590.643456 > [84959.028005] .load_period : 6.494722 > [84959.028005] .load_contrib : 398 > [84959.028005] .load_tg : 398 > [84959.028005] .se->exec_start : 84959056.335435 > [84959.028005] .se->vruntime : 71921556.083864 > [84959.028005] .se->sum_exec_runtime : 271360.995455 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-20 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 16585.443509 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -52345678.398854 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 347.126752 > [84959.028005] .load_period : 6.020438 > [84959.028005] .load_contrib : 57 > [84959.028005] .load_tg : 57 > [84959.028005] .se->exec_start : 84959060.122289 > [84959.028005] .se->vruntime : 71921556.040907 > [84959.028005] .se->sum_exec_runtime : 16606.326222 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-84 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 7072493.859899 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -45289769.982464 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 6650.423655 > [84959.028005] .load_period : 8.469275 > [84959.028005] .load_contrib : 785 > [84959.028005] .load_tg : 785 > [84959.028005] .se->exec_start : 84959064.185687 > [84959.028005] .se->vruntime : 71921564.865429 > [84959.028005] .se->sum_exec_runtime : 6504348.221165 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-28 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 1961906.582524 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -50400357.259839 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 994.827639 > [84959.028005] .load_period : 7.072831 > [84959.028005] .load_contrib : 140 > [84959.028005] .load_tg : 140 > [84959.028005] .se->exec_start : 84959064.180949 > [84959.028005] .se->vruntime : 71921559.003595 > [84959.028005] .se->sum_exec_runtime : 1962191.771814 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-39 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 4239411.104825 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -48122852.737538 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 7224.556007 > [84959.028005] .load_period : 8.698458 > [84959.028005] .load_contrib : 830 > [84959.028005] .load_tg : 830 > [84959.028005] .se->exec_start : 84959060.995273 > [84959.028005] .se->vruntime : 71921558.657276 > [84959.028005] .se->sum_exec_runtime : 4312867.647890 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/autogroup-74 > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 1424529.029769 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : -50937734.812594 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 6439.299117 > [84959.028005] .load_period : 8.633038 > [84959.028005] .load_contrib : 745 > [84959.028005] .load_tg : 745 > [84959.028005] .se->exec_start : 84959061.053899 > [84959.028005] .se->vruntime : 71921559.096778 > [84959.028005] .se->sum_exec_runtime : 2566083.226405 > [84959.028005] .se->load.weight : 2 > [84959.028005] > [84959.028005] cfs_rq[1]:/ > [84959.028005] .exec_clock : 0.000000 > [84959.028005] .MIN_vruntime : 0.000001 > [84959.028005] .min_vruntime : 71921564.865429 > [84959.028005] .max_vruntime : 0.000001 > [84959.028005] .spread : 0.000000 > [84959.028005] .spread0 : 19559301.023066 > [84959.028005] .nr_spread_over : 0 > [84959.028005] .nr_running : 0 > [84959.028005] .load : 0 > [84959.028005] .load_avg : 0.000000 > [84959.028005] .load_period : 0.000000 > [84959.028005] .load_contrib : 0 > [84959.028005] .load_tg : 0 > [84959.028005] > [84959.028005] rt_rq[1]: > [84959.028005] .rt_nr_running : 0 > [84959.028005] .rt_throttled : 0 > [84959.028005] .rt_time : 0.000000 > [84959.028005] .rt_runtime : 950.000000 > [84959.028005] > [84959.028005] runnable tasks: > [84959.028005] task PID tree-key switches prio exec-runtime sum-exec sum-sleep > [84959.028005] ---------------------------------------------------------------------------------------------------------- > [84959.028005] > [84960.376059] INFO: task flush-btrfs-1:399 blocked for more than 120 seconds.> [85748.849847] BTRFS warning (device dm-0): Aborting unused transaction. > [85754.690251] BTRFS warning (device dm-0): Aborting unused transaction. > [85754.787953] BTRFS warning (device dm-0): Aborting unused transaction. > [85754.871973] BTRFS warning (device dm-0): Aborting unused transaction. > [85754.971097] BTRFS warning (device dm-0): Aborting unused transaction. > [85755.242875] BTRFS warning (device dm-0): Aborting unused transaction. > [85755.522664] BTRFS warning (device dm-0): Aborting unused transaction. > [85755.670110] BTRFS warning (device dm-0): Aborting unused transaction. > [85755.811342] BTRFS warning (device dm-0): Aborting unused transaction. > [85755.971524] BTRFS warning (device dm-0): Aborting unused transaction. > [85756.175633] BTRFS warning (device dm-0): Aborting unused transaction. > [85756.329593] BTRFS warning (device dm-0): Aborting unused transaction. > [85757.064995] BTRFS warning (device dm-0): Aborting unused transaction. > [85757.210976] BTRFS warning (device dm-0): Aborting unused transaction. > [85758.058191] BTRFS warning (device dm-0): Aborting unused transaction. > [85758.454009] BTRFS warning (device dm-0): Aborting unused transaction. > [85758.742629] BTRFS warning (device dm-0): Aborting unused transaction. > [85759.082905] BTRFS warning (device dm-0): Aborting unused transaction. > [85760.355436] BTRFS warning (device dm-0): Aborting unused transaction. > [85760.795476] BTRFS warning (device dm-0): Aborting unused transaction. > [85760.921292] BTRFS warning (device dm-0): Aborting unused transaction. > [85761.041269] BTRFS warning (device dm-0): Aborting unused transaction. > [85761.160903] BTRFS warning (device dm-0): Aborting unused transaction. > [85761.265970] BTRFS warning (device dm-0): Aborting unused transaction. > [85761.376858] BTRFS warning (device dm-0): Aborting unused transaction. > [85762.271769] BTRFS warning (device dm-0): Aborting unused transaction. > [85762.888725] BTRFS warning (device dm-0): Aborting unused transaction.-- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Jul-05 13:25 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On 07/04/2012 11:15 PM, Marc MERLIN wrote:> On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote: >> > The dmesg log, sysrq log and stack dump info can usually be very helpful. >> > >> > From your report, we can see the csum error and hang on log, >> > ''no csum'' is not that bad while hanging-on is serious and dangerous. >> > >> > so can you please get any ''sysrq + w'' log in the hanging-on case and paste them here, >> > and the log may tell us who blocks other threads. > > Hi, thanks for the answer. > > I dumped all sysrq data, that was in my original Email. Here are two > different sysrq+w runs, as well as aborted transaction messages from that > Email. > Sorry that the original was a bit long and contained a bunch of sysrq output. > >>From doing further testing since then, it does seem that the code just start > doing bad things, including the file corruption I saw, when I''m running low > on free space. > > Anything else that would help? > > Thanks, > Marc >I''d expect to get some info from the following one, but I fails. Is it reproducible on your box? thanks, liubo> > Back to 3.2.16, I''m now seeing this: > [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds. > [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080 > [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100 > [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080 > [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08 > [ 840.516752] Call Trace: > [ 840.516755] [<ffffffff810b5c67>] ? __lock_page+0x66/0x66 > [ 840.516758] [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f > [ 840.516761] [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa > [ 840.516764] [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85 > [ 840.516767] [<ffffffff810b5c62>] ? __lock_page+0x61/0x66 > [ 840.516770] [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a > [ 840.516785] [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs] > [ 840.516789] [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7 > [ 840.516792] [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94 > [ 840.516795] [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30 > [ 840.516811] [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs] > [ 840.516826] [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs] > [ 840.516841] [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs] > [ 840.516844] [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50 > [ 840.516847] [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d > [ 840.516863] [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs] > [ 840.516866] [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b > [ 840.516882] [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs] > [ 840.516886] [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e > [ 840.516888] [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22 > [ 840.516891] [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c > [ 840.516894] [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9 > [ 840.516898] [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b > [ 1187.553635] btrfs: unlinked 8 orphans >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jul-05 14:34 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
On Thu, Jul 05, 2012 at 09:25:44PM +0800, Liu Bo wrote:> On 07/04/2012 11:15 PM, Marc MERLIN wrote: > I''d expect to get some info from the following one, but I fails. > > Is it reproducible on your box?It was reproducible when I reported it, but not anymore now. I couldn''t stay in a state where I was getting hangs and corruption. I had problems with btrfs-delalloc- taking most of the CPU and deadlocking the machine for minutes or even over half an hour. I''m pretty sure it was related to the fact that I only had 10GB free on my partition (including lots of snapshots). Now that I''m back to >40GB free, I haven''t had the problem come back. How about these I reported: [102231.845170] btrfs no csum found for inode 3910562 start 0 [102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed [102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0 [102231.999555] btrfs no csum found for inode 3910588 start 294912 [102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0 [102232.142602] btrfs no csum found for inode 3910588 start 360448 [102232.142661] btrfs no csum found for inode 3910588 start 393216 [102232.968939] BTRFS warning (device dm-0): Aborting unused transaction. [102234.107914] BTRFS warning (device dm-0): Aborting unused transaction. [102235.218139] BTRFS warning (device dm-0): Aborting unused transaction. I know I got corruption since I saw some of my files being damaged, but are they useful in any way, or not so much? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Marc MERLIN
2012-Jul-18 18:01 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device
On Thu, Jul 05, 2012 at 07:34:17AM -0700, Marc MERLIN wrote:> On Thu, Jul 05, 2012 at 09:25:44PM +0800, Liu Bo wrote: > > On 07/04/2012 11:15 PM, Marc MERLIN wrote: > > I''d expect to get some info from the following one, but I fails. > > > > Is it reproducible on your box? > > It was reproducible when I reported it, but not anymore now. I couldn''t stay > in a state where I was getting hangs and corruption. > > I had problems with btrfs-delalloc- taking most of the CPU and deadlocking > the machine for minutes or even over half an hour. I''m pretty sure it was > related to the fact that I only had 10GB free on my partition (including > lots of snapshots). > Now that I''m back to >40GB free, I haven''t had the problem come back. > > How about these I reported: > [102231.845170] btrfs no csum found for inode 3910562 start 0 > [102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed > [102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0 > [102231.999555] btrfs no csum found for inode 3910588 start 294912 > [102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0 > [102232.142602] btrfs no csum found for inode 3910588 start 360448 > [102232.142661] btrfs no csum found for inode 3910588 start 393216 > [102232.968939] BTRFS warning (device dm-0): Aborting unused transaction. > [102234.107914] BTRFS warning (device dm-0): Aborting unused transaction. > [102235.218139] BTRFS warning (device dm-0): Aborting unused transaction. > > I know I got corruption since I saw some of my files being damaged, but > are they useful in any way, or not so much?Just to put a lid (maybe) on that thread, I now think that it''s very possible my btrfs hangs when the drive was almost full (10GB remaining though) were due to my Crucial RealSSD C300, which has just mostly died yesterday after (I think) the firmware got too many problems trying to remap blocks (garbage collection) since I wasn''t using TRIM. (yes, I had the latest 007 firmware on the SSD). I don''t know that the SSD caused the corruption I saw in some files and that went away when I dropped the latest btrfs snapshot, but at this point I''m pretty sure that the SSD went into emergency garbage collection mode when its space was running low, and in turn was causing btrfs to hang. Sorry for the most likely false alarm (except for snapshot corruption which is likely btrfs related). I''ll post another message about no TRIM with dmcrypt and possible ensuing SSD problems. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Liu Bo
2012-Jul-19 01:00 UTC
Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device
On 07/19/2012 02:01 AM, Marc MERLIN wrote:> > Just to put a lid (maybe) on that thread, I now think that it''s very > possible my btrfs hangs when the drive was almost full (10GB remaining > though) were due to my Crucial RealSSD C300, which has just mostly died > yesterday after (I think) the firmware got too many problems trying to remap > blocks (garbage collection) since I wasn''t using TRIM. > (yes, I had the latest 007 firmware on the SSD). > > I don''t know that the SSD caused the corruption I saw in some files and that > went away when I dropped the latest btrfs snapshot, but at this point I''m > pretty sure that the SSD went into emergency garbage collection mode when > its space was running low, and in turn was causing btrfs to hang. > > Sorry for the most likely false alarm (except for snapshot corruption which > is likely btrfs related). > > I''ll post another message about no TRIM with dmcrypt and possible ensuing > SSD problems. > > MarcHi Marc, It''s ok, any user report is WELCOME. :) thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html