After running ceph on XFS for some time, I decided to try btrfs again.
Performance with the current "for-linux-min" branch and big metadata
is much better. The only problem (?) I'm still seeing is a warning
that seems to occur from time to time:

[87703.784552] ------------[ cut here ]------------
[87703.789759] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[87703.799070] Hardware name: ProLiant DL180 G6
[87703.804024] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[87703.828166] Pid: 929, comm: kworker/1:2 Tainted: P O 3.3.2-1.fits.1.el6.x86_64 #1
[87703.837513] Call Trace:
[87703.840280] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0
[87703.847016] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20
[87703.853533] [<ffffffffa0355686>] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[87703.861541] [<ffffffffa0350a06>] commit_fs_roots+0xc6/0x1c0 [btrfs]
[87703.868674] [<ffffffffa0351bcb>] btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[87703.876745] [<ffffffff810127a3>] ? __switch_to+0x153/0x440
[87703.882966] [<ffffffff81070a90>] ? wake_up_bit+0x40/0x40
[87703.888997] [<ffffffffa0352040>] ? btrfs_commit_transaction+0xa50/0xa50 [btrfs]
[87703.897271] [<ffffffffa035205f>] do_async_commit+0x1f/0x30 [btrfs]
[87703.904262] [<ffffffff81068949>] process_one_work+0x129/0x450
[87703.910777] [<ffffffff8106b7eb>] worker_thread+0x17b/0x3c0
[87703.916991] [<ffffffff8106b670>] ? manage_workers+0x220/0x220
[87703.923504] [<ffffffff810703fe>] kthread+0x9e/0xb0
[87703.928952] [<ffffffff8158c224>] kernel_thread_helper+0x4/0x10
[87703.935555] [<ffffffff81070360>] ? kthread_freezable_should_stop+0x70/0x70
[87703.943323] [<ffffffff8158c220>] ? gs_change+0x13/0x13
[87703.949149] ---[ end trace b8c31966cca731fa ]---
[91128.812399] ------------[ cut here ]------------
[91128.817576] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[91128.826930] Hardware name: ProLiant DL180 G6
[91128.831897] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[91128.856086] Pid: 6806, comm: btrfs-transacti Tainted: P W O 3.3.2-1.fits.1.el6.x86_64 #1
[91128.865912] Call Trace:
[91128.868670] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0
[91128.875379] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20
[91128.881900] [<ffffffffa0355686>] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[91128.889894] [<ffffffffa0350a06>] commit_fs_roots+0xc6/0x1c0 [btrfs]
[91128.897019] [<ffffffffa03a2b61>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[91128.905075] [<ffffffffa0351bcb>] btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[91128.913156] [<ffffffffa03524b2>] ? start_transaction+0x92/0x310 [btrfs]
[91128.920643] [<ffffffff81070a90>] ? wake_up_bit+0x40/0x40
[91128.926667] [<ffffffffa034cfcb>] transaction_kthread+0x26b/0x2e0 [btrfs]
[91128.934254] [<ffffffffa034cd60>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.943671] [<ffffffffa034cd60>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.953079] [<ffffffff810703fe>] kthread+0x9e/0xb0
[91128.958532] [<ffffffff8158c224>] kernel_thread_helper+0x4/0x10
[91128.965133] [<ffffffff81070360>] ? kthread_freezable_should_stop+0x70/0x70
[91128.972913] [<ffffffff8158c220>] ? gs_change+0x13/0x13
[91128.978826] ---[ end trace b8c31966cca731fb ]---

I'm able to reproduce this with ceph on a single server with 4 disks
(4 filesystems/osds) and a small test program based on librbd. It
simply writes random bytes to an rbd volume (see attachment).

Is this something I should care about? Any hints on solving this
would be appreciated.

Thanks,
Christian
I decided to run the test over the weekend. The good news is that the
system is still running without performance degradation. But in the
meantime I've got over 5000 WARNINGs of this kind:

[330700.043557] btrfs: block rsv returned -28
[330700.043559] ------------[ cut here ]------------
[330700.048898] WARNING: at fs/btrfs/extent-tree.c:6220 btrfs_alloc_free_block+0x357/0x370 [btrfs]()
[330700.058880] Hardware name: ProLiant DL180 G6
[330700.064044] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[330700.090361] Pid: 7954, comm: btrfs-endio-wri Tainted: P W O 3.3.2-1.fits.1.el6.x86_64 #1
[330700.100393] Call Trace:
[330700.103263] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0
[330700.110201] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20
[330700.116905] [<ffffffffa03436f7>] btrfs_alloc_free_block+0x357/0x370 [btrfs]
[330700.124988] [<ffffffffa0330eb0>] ? __btrfs_cow_block+0x330/0x530 [btrfs]
[330700.132787] [<ffffffffa0398174>] ? btrfs_add_delayed_data_ref+0x64/0x1c0 [btrfs]
[330700.141369] [<ffffffffa0372d8b>] ? read_extent_buffer+0xbb/0x120 [btrfs]
[330700.149194] [<ffffffffa0365d6d>] ? btrfs_token_item_offset+0x5d/0xe0 [btrfs]
[330700.157373] [<ffffffffa0330cb3>] __btrfs_cow_block+0x133/0x530 [btrfs]
[330700.165023] [<ffffffffa032f2ed>] ? read_block_for_search+0x14d/0x3d0 [btrfs]
[330700.173183] [<ffffffffa0331684>] btrfs_cow_block+0xf4/0x1f0 [btrfs]
[330700.180552] [<ffffffffa03344b8>] btrfs_search_slot+0x3e8/0x8e0 [btrfs]
[330700.188128] [<ffffffffa03469f4>] btrfs_lookup_csum+0x74/0x170 [btrfs]
[330700.195634] [<ffffffff811589e5>] ? kmem_cache_alloc+0x105/0x130
[330700.202551] [<ffffffffa03477e0>] btrfs_csum_file_blocks+0xd0/0x6d0 [btrfs]
[330700.210542] [<ffffffffa03768b1>] ? clear_extent_bit+0x161/0x420 [btrfs]
[330700.218237] [<ffffffffa0354109>] add_pending_csums+0x49/0x70 [btrfs]
[330700.225706] [<ffffffffa0357de6>] btrfs_finish_ordered_io+0x276/0x3d0 [btrfs]
[330700.233940] [<ffffffffa0357f8c>] btrfs_writepage_end_io_hook+0x4c/0xa0 [btrfs]
[330700.242345] [<ffffffffa0376cb9>] end_extent_writepage+0x69/0x100 [btrfs]
[330700.250192] [<ffffffffa0376db6>] end_bio_extent_writepage+0x66/0xa0 [btrfs]
[330700.258327] [<ffffffff8119959d>] bio_endio+0x1d/0x40
[330700.264214] [<ffffffffa034b135>] end_workqueue_fn+0x45/0x50 [btrfs]
[330700.271612] [<ffffffffa03831df>] worker_loop+0x14f/0x5a0 [btrfs]
[330700.278672] [<ffffffffa0383090>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
[330700.286582] [<ffffffffa0383090>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
[330700.294535] [<ffffffff810703fe>] kthread+0x9e/0xb0
[330700.300244] [<ffffffff8158c224>] kernel_thread_helper+0x4/0x10
[330700.307031] [<ffffffff81070360>] ? kthread_freezable_should_stop+0x70/0x70
[330700.315061] [<ffffffff8158c220>] ? gs_change+0x13/0x13
[330700.321167] ---[ end trace b8c31966cca74ca0 ]---

The filesystems have plenty of free space:

/dev/sda  1.9T  16G  1.8T  1%  /ceph/osd.000
/dev/sdb  1.9T  15G  1.8T  1%  /ceph/osd.001
/dev/sdc  1.9T  13G  1.8T  1%  /ceph/osd.002
/dev/sdd  1.9T  14G  1.8T  1%  /ceph/osd.003

# btrfs fi df /ceph/osd.000
Data: total=38.01GB, used=15.53GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=37.50GB, used=82.19MB
Metadata: total=8.00MB, used=0.00

A few more btrfs_orphan_commit_root WARNINGs are present too. If
needed I could upload the messages file.

Regards,
Christian

On 20 April 2012 17:09, Christian Brunner <christian@brunner-muc.de> wrote:
> After running ceph on XFS for some time, I decided to try btrfs again.
> Performance with the current "for-linux-min" branch and big metadata
> is much better. The only problem (?) I'm still seeing is a warning
> that seems to occur from time to time:
> [...]
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
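With several thousand warnings in the log, a per-location count can make it easier to see which call sites dominate. The following is only a minimal sketch, assuming the kernel messages were saved to a file; the function name count_warnings is made up for illustration and is not a tool used in this thread. It relies on the "WARNING: at <file>:<line>" format shown in the traces above and on grep supporting -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Count kernel WARNINGs per source location in a saved log file.
# Assumption: $1 is a file holding dmesg or /var/log/messages output.
count_warnings() {
    # Keep only the "WARNING: at <file>:<line>" part of each matching line,
    # tally the third field (the location), and print the busiest first.
    grep -o 'WARNING: at [^ ]*' "$1" |
        awk '{ n[$3]++ } END { for (loc in n) print n[loc], loc }' |
        sort -rn
}
```

A call such as `count_warnings /var/log/messages` (the path is an example) prints one line per location, with the count in front.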
On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
> After running ceph on XFS for some time, I decided to try btrfs again.
> Performance with the current "for-linux-min" branch and big metadata
> is much better. The only problem (?) I'm still seeing is a warning
> that seems to occur from time to time:
>
> [...]
>
> I'm able to reproduce this with ceph on a single server with 4 disks
> (4 filesystems/osds) and a small test program based on librbd. It is
> simply writing random bytes on a rbd volume (see attachment).
>
> Is this something I should care about? Any hints on solving this
> would be appreciated.
>

Can you send me a config or some basic steps for me to set up ceph on my box so I
can run this program and finally track down this problem? Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 24 Apr 2012, Josef Bacik wrote:
> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
> > After running ceph on XFS for some time, I decided to try btrfs again.
> > Performance with the current "for-linux-min" branch and big metadata
> > is much better. The only problem (?) I'm still seeing is a warning
> > that seems to occur from time to time:

Actually, before you do that... we have a new tool,
test_filestore_workloadgen, that generates a ceph-osd-like workload on the
local file system. It's a subset of what a full OSD might do, but if
we're lucky it will be sufficient to reproduce this issue. Something like

 test_filestore_workloadgen --osd-data /foo --osd-journal /bar

will hopefully do the trick.

Christian, maybe you can see if that is able to trigger this warning?
You'll need to pull it from the current master branch; it wasn't in the
last release.

Thanks!
sage

> > [...]
> >
> > Is this something I should care about? Any hint's on solving this
> > would be appreciated.
> >
>
> Can you send me a config or some basic steps for me to setup ceph on my box so I
> can run this program and finally track down this problem? Thanks,
>
> Josef
On Tue, Apr 24, 2012 at 09:26:15AM -0700, Sage Weil wrote:
> On Tue, 24 Apr 2012, Josef Bacik wrote:
> > On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
> > > After running ceph on XFS for some time, I decided to try btrfs again.
> > > Performance with the current "for-linux-min" branch and big metadata
> > > is much better. The only problem (?) I'm still seeing is a warning
> > > that seems to occur from time to time:
>
> Actually, before you do that... we have a new tool,
> test_filestore_workloadgen, that generates a ceph-osd-like workload on the
> local file system. It's a subset of what a full OSD might do, but if
> we're lucky it will be sufficient to reproduce this issue. Something like
>
> test_filestore_workloadgen --osd-data /foo --osd-journal /bar
>
> will hopefully do the trick.
>
> Christian, maybe you can see if that is able to trigger this warning?
> You'll need to pull it from the current master branch; it wasn't in the
> last release.
>

Keep up the good work Sage, at this rate I'll never have to set up ceph for
myself :),

Josef
On Tue, Apr 24, 2012 at 01:33:44PM -0400, Josef Bacik wrote:
> On Tue, Apr 24, 2012 at 09:26:15AM -0700, Sage Weil wrote:
> > [...]
> > Christian, maybe you can see if that is able to trigger this warning?
> > You'll need to pull it from the current master branch; it wasn't in the
> > last release.
> >
>
> Keep up the good work Sage, at this rate I'll never have to setup ceph for
> myself :),
>

You can set up another OSD on daedalus if you're looking for something to do
Josef :)

Neil

> Josef
On 24 April 2012 18:26, Sage Weil <sage@newdream.net> wrote:
> On Tue, 24 Apr 2012, Josef Bacik wrote:
>> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
>> > After running ceph on XFS for some time, I decided to try btrfs again.
>> > Performance with the current "for-linux-min" branch and big metadata
>> > is much better. The only problem (?) I'm still seeing is a warning
>> > that seems to occur from time to time:
>
> Actually, before you do that... we have a new tool,
> test_filestore_workloadgen, that generates a ceph-osd-like workload on the
> local file system. It's a subset of what a full OSD might do, but if
> we're lucky it will be sufficient to reproduce this issue. Something like
>
> test_filestore_workloadgen --osd-data /foo --osd-journal /bar
>
> will hopefully do the trick.
>
> Christian, maybe you can see if that is able to trigger this warning?
> You'll need to pull it from the current master branch; it wasn't in the
> last release.

Trying to reproduce with test_filestore_workloadgen didn't work for
me. So here are some instructions on how to reproduce with a minimal
ceph setup.

You will need a single system with two disks and a bit of memory.

- Compile and install ceph (detailed instructions:
  http://ceph.newdream.net/docs/master/ops/install/mkcephfs/)

- For the test setup I've used two tmpfs files as journal devices. To
  create these, do the following:

  # mkdir -p /ceph/temp
  # mount -t tmpfs tmpfs /ceph/temp
  # dd if=/dev/zero of=/ceph/temp/journal0 count=500 bs=1024k
  # dd if=/dev/zero of=/ceph/temp/journal1 count=500 bs=1024k

- Now you should create and mount btrfs. Here is what I did:

  # mkfs.btrfs -l 64k -n 64k /dev/sda
  # mkfs.btrfs -l 64k -n 64k /dev/sdb
  # mkdir /ceph/osd.000
  # mkdir /ceph/osd.001
  # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sda /ceph/osd.000
  # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sdb /ceph/osd.001

- Create /etc/ceph/ceph.conf similar to the attached ceph.conf. You
  will probably have to change the btrfs devices and the hostname
  (os39).

- Create the ceph filesystems:

  # mkdir /ceph/mon
  # mkcephfs -a -c /etc/ceph/ceph.conf

- Start ceph (e.g. "service ceph start")

- Now you should be able to use ceph - "ceph -s" will tell you about
  the state of the ceph cluster.

- "rbd create -size 100 testimg" will create an rbd image on the ceph cluster.

- Compile my test with "gcc -o rbdtest rbdtest.c -lrbd" and run it
  with "./rbdtest testimg".

I can see the first btrfs_orphan_commit_root warning after an hour or so...

I hope that I've described all necessary steps. If there is a
problem, just send me a note.

Thanks,
Christian
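The filesystem and journal steps above can be collected into a small script. This is only a sketch under stated assumptions: the DRY_RUN switch, the run helper, the function names and the DISK0/DISK1 variables are illustrative and were not part of the original mail. The commands themselves are the ones listed above. In the default dry-run mode the script only prints what it would do, which seems prudent because mkfs.btrfs destroys whatever is on the given devices.

```shell
#!/bin/sh
# Sketch of the reproduction setup described in this mail.
# Assumptions (not from the mail): DRY_RUN, run(), DISK0/DISK1 and the
# function names. The device defaults are the examples used in the mail.
DRY_RUN="${DRY_RUN:-1}"
DISK0="${DISK0:-/dev/sda}"
DISK1="${DISK1:-/dev/sdb}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Two 500 MB journal files on tmpfs.
setup_journals() {
    run mkdir -p /ceph/temp
    run mount -t tmpfs tmpfs /ceph/temp
    run dd if=/dev/zero of=/ceph/temp/journal0 count=500 bs=1024k
    run dd if=/dev/zero of=/ceph/temp/journal1 count=500 bs=1024k
}

# One btrfs filesystem per disk, with 64k leaf and node size.
setup_btrfs() {
    i=0
    for disk in "$DISK0" "$DISK1"; do
        run mkfs.btrfs -l 64k -n 64k "$disk"
        run mkdir "/ceph/osd.00$i"
        run mount -o noatime,space_cache,inode_cache,autodefrag "$disk" "/ceph/osd.00$i"
        i=$((i + 1))
    done
}

# Create the ceph filesystems and start the daemons.
# /etc/ceph/ceph.conf has to exist before this step.
setup_ceph() {
    run mkdir /ceph/mon
    run mkcephfs -a -c /etc/ceph/ceph.conf
    run service ceph start
}
```

Calling setup_journals, setup_btrfs and setup_ceph in that order follows the sequence of the mail; setting DRY_RUN=0 would actually execute the commands and needs root.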
On Fri, Apr 20, 2012 at 8:09 AM, Christian Brunner
<christian@brunner-muc.de> wrote:
> After running ceph on XFS for some time, I decided to try btrfs again.
> Performance with the current "for-linux-min" branch and big metadata
> is much better.

I've heard that although performance from btrfs is better at first, it
degrades over time due to metadata fragmentation, whereas XFS'
performance starts off a little worse, but remains stable even after
weeks of heavy utilization. Would be curious to hear your (or
others') feedback on that topic.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
2012/4/29 tsuna <tsunanet@gmail.com>:
> On Fri, Apr 20, 2012 at 8:09 AM, Christian Brunner
> <christian@brunner-muc.de> wrote:
>> After running ceph on XFS for some time, I decided to try btrfs again.
>> Performance with the current "for-linux-min" branch and big metadata
>> is much better.
>
> I've heard that although performance from btrfs is better at first, it
> degrades over time due to metadata fragmentation, whereas XFS'
> performance starts off a little worse, but remains stable even after
> weeks of heavy utilization. Would be curious to hear your (or
> others') feedback on that topic.

Metadata fragmentation was a big problem (for us) in the past. With
the "big metadata" feature (mkfs.btrfs -l 64k -n 64k) these problems
seem to be solved. We do not use it in production yet, but my stress
test didn't show any degradation. The only remaining issues I've seen
are these warnings.

Regards,
Christian
On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
> Trying to reproduce with test_filestore_workloadgen didn't work for
> me. So here are some instructions on how to reproduce with a minimal
> ceph setup.
>
> [...]
>
> - Now you should be able to use ceph - "ceph -s" will tell you about
> the state of the ceph cluster.
>
> - "rbd create -size 100 testimg" will create an rbd image on the ceph cluster.
>

It's failing here

http://fpaste.org/e3BG/

Thanks,

Josef

On Thu, 3 May 2012 10:13:55 -0400, Josef Bacik <josef@redhat.com> wrote:
> On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
>> - "rbd create --size 100 testimg" will create an rbd image on the ceph cluster.
>>
>
> It's failing here
>
> http://fpaste.org/e3BG/

2012-05-03 10:11:28.818308 7fcb5a0ee700 -- 127.0.0.1:0/1003269 <= osd.1 127.0.0.1:6803/2379 3 ==== osd_op_reply(3 rbd_info [call] = -5 (Input/output error)) v4 ==== 107+0+0 (3948821281 0 0) 0x7fcb380009a0 con 0x1cad3e0

This is probably because the osd isn't finding the rbd class.
Do you have 'rbd_cls.so' in /usr/lib64/rados-classes? Wherever rbd_cls.so is,
try adding 'osd class dir = /path/to/rados-classes' to the [osd] section
in your ceph.conf, and restarting the osds.

If you set 'debug osd = 10' you should see '_load_class rbd' in the osd log
when you try to create an rbd image.

Autotools should be setting the default location correctly, but if you're
running the osds in a chroot or something the path would be wrong.

Josh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
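
The two checks Josh describes (is an rbd class file present, and does the osd log show it being loaded) could be scripted roughly as below. This is only a sketch: the helper names `class_present` and `class_load_logged` are made up for illustration and are not part of ceph, and the paths in the usage comments (in particular the osd log file name) are assumptions that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; neither is a ceph tool.

# Succeeds if the given class directory contains a file with "rbd" in its
# name (rbd_cls.so, or whatever name your build actually installs).
class_present() {
    ls "$1" 2>/dev/null | grep -q 'rbd'
}

# Succeeds if the given osd log contains the '_load_class rbd' message.
# This only shows up with 'debug osd = 10' (or higher) in the [osd] section.
class_load_logged() {
    grep -q '_load_class rbd' "$1"
}

# Usage (paths are assumptions):
#   class_present /usr/lib64/rados-classes || echo "no rbd class file found"
#   class_load_logged /var/log/ceph/osd.0.log || echo "no _load_class rbd in log"
```

Matching on "rbd" rather than an exact file name is deliberate, since the exact name of the class library depends on how ceph was built.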

On Thu, May 03, 2012 at 08:17:43AM -0700, Josh Durgin wrote:
> This is probably because the osd isn't finding the rbd class.
> Do you have 'rbd_cls.so' in /usr/lib64/rados-classes? Wherever rbd_cls.so is,
> try adding 'osd class dir = /path/to/rados-classes' to the [osd] section
> in your ceph.conf, and restarting the osds.
>
> If you set 'debug osd = 10' you should see '_load_class rbd' in the osd log
> when you try to create an rbd image.
>
> Autotools should be setting the default location correctly, but if you're
> running the osds in a chroot or something the path would be wrong.
>

Yeah, all that was in the right place. I rebooted and magically stopped getting
that error, but now I'm getting this

http://fpaste.org/OE92/

with that ping thing repeating over and over. Thanks,

Josef

On Thu, 3 May 2012 11:20:53 -0400, Josef Bacik <josef@redhat.com> wrote:
> Yeah all that was in the right place, I rebooted and I magically
> stopped getting that error, but now I'm getting this
>
> http://fpaste.org/OE92/
>
> with that ping thing repeating over and over. Thanks,

That just looks like the osd isn't running. If you restart the
osd with 'debug osd = 20' the osd log should tell us what's going on.

On Thu, May 03, 2012 at 09:38:27AM -0700, Josh Durgin wrote:
> That just looks like the osd isn't running. If you restart the
> osd with 'debug osd = 20' the osd log should tell us what's going on.

OK, that part was my fault. Duh, I need to redo the tmpfs and mkcephfs stuff
after a reboot. But now I'm back to my original problem

http://fpaste.org/PfwO/

I have the osd class dir = /usr/lib64/rados-classes thing set and libcls_rbd is
in there, so I'm not sure what is wrong. Thanks,

Josef
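
Because the journals live on tmpfs, they disappear on every reboot and have to be recreated before the cluster will start. A rough sketch of that step follows. The function name `recreate_journals` is hypothetical, and the root-only commands in the trailing comments are simply the ones from Christian's reproduction steps, not a tested procedure.

```shell
#!/bin/sh
# Hypothetical helper: recreate the two journal files used in the test setup.
#   $1 = directory holding the journals (the tmpfs mount point)
#   $2 = size of each journal in MiB (default 500, as in the original steps)
recreate_journals() {
    dir=$1
    size_mb=${2:-500}
    mkdir -p "$dir" || return 1
    for i in 0 1; do
        dd if=/dev/zero of="$dir/journal$i" count="$size_mb" bs=1024k \
            2>/dev/null || return 1
    done
}

# After a reboot, as root (commands taken from the reproduction steps):
#   mkdir -p /ceph/temp
#   mount -t tmpfs tmpfs /ceph/temp
#   recreate_journals /ceph/temp 500
#   mkcephfs -a -c /etc/ceph/ceph.conf
#   service ceph start
```

Keeping the mount and mkcephfs calls out of the function means it can be tried against any scratch directory without root.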

2012/5/3 Josef Bacik <josef@redhat.com>:
> Ok that part was my fault, Duh I need to redo the tmpfs and mkcephfs stuff after
> reboot. But now I'm back to my original problem
>
> http://fpaste.org/PfwO/
>
> I have the osd class dir = /usr/lib64/rados-classes thing set and libcls_rbd is
> in there, so I'm not sure what is wrong. Thanks,

That's really strange. Do you have the osd logs in /var/log/ceph? If
so, can you check whether there is anything about "rbd" or "class" loading
in there?

Another thing you should try is whether you can access ceph with rados:

# rados -p rbd ls
# rados -p rbd -i /proc/cpuinfo put testobj
# rados -p rbd -o - get testobj

Regards,
Christian

On Fri, May 04, 2012 at 10:24:16PM +0200, Christian Brunner wrote:
> That's really strange. Do you have the osd logs in /var/log/ceph? If
> so, can you look if you find anything about "rbd" or "class" loading
> in there?
>
> Another thing you should try is, whether you can access ceph with rados:
>
> # rados -p rbd ls
> # rados -p rbd -i /proc/cpuinfo put testobj
> # rados -p rbd -o - get testobj
>

OK, weirdly, ceph is trying to dlopen /usr/lib64/rados-classes/libcls_rbd.so, but
all I had was libcls_rbd.so.1 and libcls_rbd.so.1.0.0. A symlink fixed that part;
I'll see if I can reproduce now. Thanks,

Josef
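
The symlink workaround can be sketched as follows. Josef does not say which of the two versioned files he linked to, so the choice here (the first `libcls_rbd.so.*` match, which is `libcls_rbd.so.1` in his case) is an assumption, and `ensure_cls_symlink` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: if a class directory only has versioned
# libcls_rbd.so.* files, add the unversioned libcls_rbd.so name that the
# osd tries to dlopen.
#   $1 = class directory (e.g. /usr/lib64/rados-classes)
ensure_cls_symlink() {
    dir=$1
    link="$dir/libcls_rbd.so"
    # Nothing to do if the unversioned name already exists.
    [ -e "$link" ] && return 0
    for candidate in "$dir"/libcls_rbd.so.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$candidate" ] || continue
        # Relative link, so it stays valid if the directory is moved.
        ln -s "$(basename "$candidate")" "$link"
        return $?
    done
    echo "no libcls_rbd.so.* found in $dir" >&2
    return 1
}

# Usage (as root; path taken from the message above):
#   ensure_cls_symlink /usr/lib64/rados-classes
```

Restarting the osds afterwards is presumably still needed so that the class is looked up again.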

On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
> - "rbd create --size 100 testimg" will create an rbd image on the ceph cluster.
>
> - Compile my test with "gcc -o rbdtest rbdtest.c -lrbd" and run it
>   with "./rbdtest testimg".
>
> I can see the first btrfs_orphan_commit_root warning after an hour or
> so... I hope that I've described all necessary steps. If there is a
> problem just send me a note.
>

Well, it's only taken me 2 weeks, but I've finally got it all up and running;
hopefully I'll reproduce. Thanks,

Josef

On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
> I can see the first btrfs_orphan_commit_root warning after an hour or
> so... I hope that I've described all necessary steps. If there is a
> problem just send me a note.
>

Well, I feel like an idiot. I finally get it to reproduce, go look at where I
want to put my printks, and there's the problem staring me right in the face.
I've looked seriously at this problem 2 or 3 times and have missed this every
single freaking time. Here is the patch I'm trying; please try it on yours to
make sure it fixes the problem. It takes like 2 hours for it to reproduce for
me, so I won't be able to fully test it until tomorrow, but so far it hasn't
broken anything so it should be good. Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index eefe573..4ad628d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
 	/* used to order data wrt metadata */
 	struct btrfs_ordered_inode_tree ordered_tree;
 
-	/* for keeping track of orphaned inodes */
-	struct list_head i_orphan;
-
 	/* list of all the delalloc inodes in the FS.  There are times we need
 	 * to write all the delalloc pages to disk, and this list is used
 	 * to walk them all.
@@ -164,6 +161,7 @@ struct btrfs_inode {
 	unsigned dummy_inode:1;
 	unsigned in_defrag:1;
 	unsigned delalloc_meta_reserved:1;
+	unsigned has_orphan_item:1;
 
 	/*
 	 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8a89888..6dd20f3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
 	struct list_head root_list;
 
 	spinlock_t orphan_lock;
-	struct list_head orphan_list;
+	atomic_t orphan_inodes;
 	struct btrfs_block_rsv *orphan_block_rsv;
 	int orphan_item_inserted;
 	int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7f849b3..8bbe8c4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1148,7 +1148,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	root->orphan_block_rsv = NULL;
 
 	INIT_LIST_HEAD(&root->dirty_list);
-	INIT_LIST_HEAD(&root->orphan_list);
 	INIT_LIST_HEAD(&root->root_list);
 	spin_lock_init(&root->orphan_lock);
 	spin_lock_init(&root->inode_lock);
@@ -1161,6 +1160,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	atomic_set(&root->log_commit[0], 0);
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
+	atomic_set(&root->orphan_inodes, 0);
 	root->log_batch = 0;
 	root->log_transid = 0;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0218a4e..0265d40 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2138,12 +2138,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 	struct btrfs_block_rsv *block_rsv;
 	int ret;
 
-	if (!list_empty(&root->orphan_list) ||
+	if (atomic_read(&root->orphan_inodes) ||
 	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
 		return;
 
 	spin_lock(&root->orphan_lock);
-	if (!list_empty(&root->orphan_list)) {
+	if (atomic_read(&root->orphan_inodes)) {
 		spin_unlock(&root->orphan_lock);
 		return;
 	}
@@ -2200,8 +2200,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 		block_rsv = NULL;
 	}
 
-	if (list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+	if (!BTRFS_I(inode)->has_orphan_item) {
+		BTRFS_I(inode)->has_orphan_item = 1;
 #if 0
 		/*
 		 * For proper ENOSPC handling, we should do orphan
@@ -2214,6 +2214,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 			insert = 1;
 #endif
 		insert = 1;
+		atomic_inc(&root->orphan_inodes);
 	}
 
 	if (!BTRFS_I(inode)->orphan_meta_reserved) {
@@ -2261,9 +2262,8 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 	int release_rsv = 0;
 	int ret = 0;
 
-	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_del_init(&BTRFS_I(inode)->i_orphan);
+	if (BTRFS_I(inode)->has_orphan_item) {
+		BTRFS_I(inode)->has_orphan_item = 0;
 		delete_item = 1;
 	}
 
@@ -2271,7 +2271,6 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 		BTRFS_I(inode)->orphan_meta_reserved = 0;
 		release_rsv = 1;
 	}
-	spin_unlock(&root->orphan_lock);
 
 	if (trans && delete_item) {
 		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
@@ -2281,6 +2280,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 	if (release_rsv)
 		btrfs_orphan_release_metadata(inode);
 
+	if (trans && delete_item)
+		atomic_dec(&root->orphan_inodes);
+
 	return 0;
 }
 
@@ -2418,9 +2420,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		 * add this inode to the orphan list so btrfs_orphan_del does
 		 * the proper thing when we hit it
 		 */
-		spin_lock(&root->orphan_lock);
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
-		spin_unlock(&root->orphan_lock);
+		atomic_inc(&root->orphan_inodes);
+		BTRFS_I(inode)->has_orphan_item = 1;
 
 		/* if we have links, this was a truncate, lets do that */
 		if (inode->i_nlink) {
@@ -3741,7 +3742,7 @@ void btrfs_evict_inode(struct inode *inode)
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
 	if (root->fs_info->log_root_recovering) {
-		BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan));
+		BUG_ON(!BTRFS_I(inode)->has_orphan_item);
 		goto no_delete;
 	}
 
@@ -6921,6 +6922,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->in_defrag = 0;
 	ei->delalloc_meta_reserved = 0;
 	ei->complete_ordered = 0;
+	ei->has_orphan_item = 0;
 	ei->force_compress = BTRFS_COMPRESS_NONE;
 
 	ei->delayed_node = NULL;
@@ -6934,7 +6936,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	mutex_init(&ei->log_mutex);
 	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
-	INIT_LIST_HEAD(&ei->i_orphan);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->ordered_operations);
 	INIT_LIST_HEAD(&ei->ordered_finished);
@@ -6980,13 +6981,11 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_extent_lock);
 	}
 
-	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
+	if (BTRFS_I(inode)->has_orphan_item) {
 		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
-		list_del_init(&BTRFS_I(inode)->i_orphan);
+		atomic_dec(&root->orphan_inodes);
 	}
-	spin_unlock(&root->orphan_lock);
 
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
On Thu, May 10, 2012 at 04:35:23PM -0400, Josef Bacik wrote:
> On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
> > On 24 April 2012 at 18:26, Sage Weil <sage@newdream.net> wrote:
> > > On Tue, 24 Apr 2012, Josef Bacik wrote:
> > >> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
> > >> > After running ceph on XFS for some time, I decided to try btrfs again.
> > >> > Performance with the current "for-linux-min" branch and big metadata
> > >> > is much better. The only problem (?) I'm still seeing is a warning
> > >> > that seems to occur from time to time:
> > >
> > > Actually, before you do that... we have a new tool,
> > > test_filestore_workloadgen, that generates a ceph-osd-like workload on the
> > > local file system. It's a subset of what a full OSD might do, but if
> > > we're lucky it will be sufficient to reproduce this issue. Something like
> > >
> > > test_filestore_workloadgen --osd-data /foo --osd-journal /bar
> > >
> > > will hopefully do the trick.
> > >
> > > Christian, maybe you can see if that is able to trigger this warning?
> > > You'll need to pull it from the current master branch; it wasn't in the
> > > last release.
> >
> > Trying to reproduce with test_filestore_workloadgen didn't work for
> > me. So here are some instructions on how to reproduce with a minimal
> > ceph setup.
> >
> > You will need a single system with two disks and a bit of memory.
> >
> > - Compile and install ceph (detailed instructions:
> > http://ceph.newdream.net/docs/master/ops/install/mkcephfs/)
> >
> > - For the test setup I've used two tmpfs files as journal devices. To
> > create these, do the following:
> >
> > # mkdir -p /ceph/temp
> > # mount -t tmpfs tmpfs /ceph/temp
> > # dd if=/dev/zero of=/ceph/temp/journal0 count=500 bs=1024k
> > # dd if=/dev/zero of=/ceph/temp/journal1 count=500 bs=1024k
> >
> > - Now you should create and mount btrfs. Here is what I did:
> >
> > # mkfs.btrfs -l 64k -n 64k /dev/sda
> > # mkfs.btrfs -l 64k -n 64k /dev/sdb
> > # mkdir /ceph/osd.000
> > # mkdir /ceph/osd.001
> > # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sda /ceph/osd.000
> > # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sdb /ceph/osd.001
> >
> > - Create /etc/ceph/ceph.conf similar to the attached ceph.conf. You
> > will probably have to change the btrfs devices and the hostname
> > (os39).
> >
> > - Create the ceph filesystems:
> >
> > # mkdir /ceph/mon
> > # mkcephfs -a -c /etc/ceph/ceph.conf
> >
> > - Start ceph (e.g. "service ceph start")
> >
> > - Now you should be able to use ceph - "ceph -s" will tell you about
> > the state of the ceph cluster.
> >
> > - "rbd create -size 100 testimg" will create an rbd image on the ceph cluster.
> >
> > - Compile my test with "gcc -o rbdtest rbdtest.c -lrbd" and run it
> > with "./rbdtest testimg".
> >
> > I can see the first btrfs_orphan_commit_root warning after an hour or
> > so... I hope that I've described all necessary steps. If there is a
> > problem just send me a note.
> >
>
> Well I feel like an idiot, I finally get it to reproduce, go look at where I
> want to put my printks and there's the problem staring me right in the face.
> I've looked seriously at this problem 2 or 3 times and have missed this every
> single freaking time. Here is the patch I'm trying, please try it on yours to
> make sure it fixes the problem. It takes like 2 hours for it to reproduce for
> me so I won't be able to fully test it until tomorrow, but so far it hasn't
> broken anything so it should be good. Thanks,
>

That previous patch was against btrfs-next; this patch is against 3.4-rc6 if you are on mainline.
Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..54af1fa 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. @@ -156,6 +153,7 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8fd7233..aad2600 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..ff3bf4b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 61b16c6..78ce750 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2072,12 +2072,12 @@ void 
btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if (list_empty(&BTRFS_I(inode)->i_orphan)) { - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + if (!BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 1; #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } if (!BTRFS_I(inode)->orphan_meta_reserved) { @@ -2195,9 +2196,8 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); + if (BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 0; delete_item = 1; } @@ -2205,7 +2205,6 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) BTRFS_I(inode)->orphan_meta_reserved = 0; release_rsv = 1; } - spin_unlock(&root->orphan_lock); if (trans && delete_item) { ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); @@ -2215,6 +2214,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2352,9 +2354,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * add this inode to the orphan list so btrfs_orphan_del does * the proper 
thing when we hit it */ - spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); - spin_unlock(&root->orphan_lock); + atomic_inc(&root->orphan_inodes); + BTRFS_I(inode)->has_orphan_item = 1; /* if we have links, this was a truncate, lets do that */ if (inode->i_nlink) { @@ -3671,7 +3672,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); + BUG_ON(!BTRFS_I(inode)->has_orphan_item); goto no_delete; } @@ -6914,6 +6915,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->dummy_inode = 0; ei->in_defrag = 0; ei->delalloc_meta_reserved = 0; + ei->has_orphan_item = 0; ei->force_compress = BTRFS_COMPRESS_NONE; ei->delayed_node = NULL; @@ -6927,7 +6929,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) mutex_init(&ei->log_mutex); mutex_init(&ei->delalloc_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); - INIT_LIST_HEAD(&ei->i_orphan); INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); @@ -6972,13 +6973,11 @@ void btrfs_destroy_inode(struct inode *inode) spin_unlock(&root->fs_info->ordered_extent_lock); } - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { + if (BTRFS_I(inode)->has_orphan_item) { printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n", (unsigned long long)btrfs_ino(inode)); - list_del_init(&BTRFS_I(inode)->i_orphan); + atomic_dec(&root->orphan_inodes); } - spin_unlock(&root->orphan_lock); while (1) { ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1); -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
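Editorial sketch of the bookkeeping this patch introduces: a per-inode flag replaces membership in the orphan list, and a per-root atomic counter replaces the list-empty test in btrfs_orphan_commit_root(). The following is a minimal user-space model, not the kernel code. All model_* names are hypothetical, C11 <stdatomic.h> stands in for the kernel's atomic_t, and the model decrements the counter on every successful delete, whereas the patch only does so when a transaction handle is present.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-root state touched by the patch. */
struct model_root {
	atomic_int orphan_inodes;	/* replaces the old orphan_list */
};

/* Hypothetical stand-in for the per-inode state touched by the patch. */
struct model_inode {
	struct model_root *root;
	unsigned has_orphan_item:1;	/* replaces membership in i_orphan */
};

static void model_root_init(struct model_root *root)
{
	atomic_init(&root->orphan_inodes, 0);
}

static void model_inode_init(struct model_inode *inode, struct model_root *root)
{
	inode->root = root;
	inode->has_orphan_item = 0;
}

/* Returns true if the caller would have to insert an on-disk orphan item. */
static bool model_orphan_add(struct model_inode *inode)
{
	if (inode->has_orphan_item)
		return false;	/* already tracked, nothing to do */
	inode->has_orphan_item = 1;
	atomic_fetch_add(&inode->root->orphan_inodes, 1);
	return true;
}

/* Returns true if the caller would have to delete the on-disk orphan item. */
static bool model_orphan_del(struct model_inode *inode)
{
	int old;

	if (!inode->has_orphan_item)
		return false;	/* not tracked, nothing to delete */
	inode->has_orphan_item = 0;
	old = atomic_fetch_sub(&inode->root->orphan_inodes, 1);
	assert(old > 0);	/* the counter must never go negative */
	return true;
}

/* Mirrors the early-return test at the top of btrfs_orphan_commit_root(). */
static bool model_root_has_orphans(struct model_root *root)
{
	return atomic_load(&root->orphan_inodes) != 0;
}
```

The point of the design is that the commit path only needs to know whether any orphan is outstanding, so a counter is enough and the list (with its per-inode list_head) can go away. Note that this model does no locking at all.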
2012/5/10 Josef Bacik <josef@redhat.com>:
> On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
>> On 24 April 2012 at 18:26, Sage Weil <sage@newdream.net> wrote:
>> > On Tue, 24 Apr 2012, Josef Bacik wrote:
>> >> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
>> >> > After running ceph on XFS for some time, I decided to try btrfs again.
>> >> > Performance with the current "for-linux-min" branch and big metadata
>> >> > is much better. The only problem (?) I'm still seeing is a warning
>> >> > that seems to occur from time to time:
>> >
>> > Actually, before you do that... we have a new tool,
>> > test_filestore_workloadgen, that generates a ceph-osd-like workload on the
>> > local file system. It's a subset of what a full OSD might do, but if
>> > we're lucky it will be sufficient to reproduce this issue. Something like
>> >
>> > test_filestore_workloadgen --osd-data /foo --osd-journal /bar
>> >
>> > will hopefully do the trick.
>> >
>> > Christian, maybe you can see if that is able to trigger this warning?
>> > You'll need to pull it from the current master branch; it wasn't in the
>> > last release.
>>
>> Trying to reproduce with test_filestore_workloadgen didn't work for
>> me. So here are some instructions on how to reproduce with a minimal
>> ceph setup.
>> [...]
>
> Well I feel like an idiot, I finally get it to reproduce, go look at where I
> want to put my printks and there's the problem staring me right in the face.
> I've looked seriously at this problem 2 or 3 times and have missed this every
> single freaking time. Here is the patch I'm trying, please try it on yours to
> make sure it fixes the problem. It takes like 2 hours for it to reproduce for
> me so I won't be able to fully test it until tomorrow, but so far it hasn't
> broken anything so it should be good. Thanks,

Great! I've put your patch on my testbox and will run a test over the weekend. I'll report back on Monday.
Thanks,
Christian
Hi Josef,

On 11.05.2012 15:31, Josef Bacik wrote:
> That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you
> are on mainline. Thanks,

I tried your patch against mainline; after a few minutes I hit this bug.

[ 1078.523655] ------------[ cut here ]------------
[ 1078.523667] kernel BUG at fs/btrfs/inode.c:2211!
[ 1078.523676] invalid opcode: 0000 [#1] SMP
[ 1078.523692] CPU 5
[ 1078.523696] Modules linked in: btrfs zlib_deflate libcrc32c mlx4_en bonding ext2 coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core mei(C) joydev ses ioatdma enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid igb megaraid_sas mlx4_core dca
[ 1078.523813]
[ 1078.523818] Pid: 4108, comm: ceph-osd Tainted: G C 3.4.0-rc6+ #5 Supermicro X9SRi/X9SRi
[ 1078.523841] RIP: 0010:[<ffffffffa022b2a2>] [<ffffffffa022b2a2>] btrfs_orphan_del+0xb2/0xc0 [btrfs]
[ 1078.523867] RSP: 0018:ffff880ff14a5d38 EFLAGS: 00010282
[ 1078.523877] RAX: 00000000fffffffe RBX: ffff880ff004d6f0 RCX: 0000000000117400
[ 1078.523891] RDX: 00000000001173ff RSI: ffff8810279f6ea0 RDI: ffffea00409e7d80
[ 1078.523905] RBP: ffff880ff14a5d58 R08: 000060ef80001400 R09: ffffffffa0202c6a
[ 1078.523918] R10: 0000000000000000 R11: 00000000000000ba R12: 0000000000000001
[ 1078.523932] R13: ffff881017663c00 R14: 0000000000000001 R15: ffff88101776f5a0
[ 1078.523946] FS: 00007f1d2c03c700(0000) GS:ffff88107fca0000(0000) knlGS:0000000000000000
[ 1078.523961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1078.523990] CR2: 00000000050f4000 CR3: 0000000ff2a57000 CR4: 00000000000407e0
[ 1078.524019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1078.524048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1078.524077] Process ceph-osd (pid: 4108, threadinfo ffff880ff14a4000, task ffff880ff2aa44a0)
[ 1078.524121] Stack:
[ 1078.524141] ffff8810279f7460 0000000000000000 ffff881017663c00 ffff880ff004d6f0
[ 1078.524190] ffff880ff14a5e08 ffffffffa022f5d8 ffff880ff004d6f0 0000000000000000
[ 1078.524240] ffff880ff14a5e18 ffffffff81188afd 0000800000000000 0000800000001000
[ 1078.524289] Call Trace:
[ 1078.524317] [<ffffffffa022f5d8>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1078.524348] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 1078.524380] [<ffffffffa0230f91>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1078.524408] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 1078.524435] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 1078.524461] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 1078.524489] [<ffffffff8165fd69>] system_call_fastpath+0x16/0x1b
[ 1078.524516] Code: 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 0f 1f 40 00 80 bb 60 fe ff ff 84 75 c1 eb bb 0f 1f 44 00 00 48 89 df e8 a0 73 fe ff eb c1 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 1078.524710] RIP [<ffffffffa022b2a2>] btrfs_orphan_del+0xb2/0xc0 [btrfs]
[ 1078.524744] RSP <ffff880ff14a5d38>
[ 1078.525013] ---[ end trace 88c92720204f7aa4 ]---

That's the drive with the broken btrfs.

[ 212.843776] device fsid 28492275-01d3-4e89-9f1c-bd86057194bf devid 1 transid 4 /dev/sdc
[ 212.844630] btrfs: setting nodatacow
[ 212.844637] btrfs: enabling auto defrag
[ 212.844640] btrfs: disk space caching is enabled
[ 212.844643] btrfs flagging fs with big metadata feature

-martin
On Fri, May 11, 2012 at 08:33:34PM +0200, Martin Mailand wrote:> Hi Josef, > > Am 11.05.2012 15:31, schrieb Josef Bacik: > >That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you > >are on mainline. Thanks, > > I tried your patch against mainline, after a few minutes I hit this bug. >Heh duh, sorry, try this one instead. Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..54af1fa 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. @@ -156,6 +153,7 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8fd7233..aad2600 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..ff3bf4b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); 
atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 61b16c6..5ba68d0 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2072,12 +2072,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if (list_empty(&BTRFS_I(inode)->i_orphan)) { - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + if (!BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 1; #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } if (!BTRFS_I(inode)->orphan_meta_reserved) { @@ -2195,9 +2196,13 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; + /* + * evict_inode gets called without holding the i_mutex so we need to + * take the orphan lock to make sure we are safe in messing with these. 
+ */ spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); + if (BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 0; delete_item = 1; } @@ -2215,6 +2220,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2352,9 +2360,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * add this inode to the orphan list so btrfs_orphan_del does * the proper thing when we hit it */ - spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); - spin_unlock(&root->orphan_lock); + atomic_inc(&root->orphan_inodes); + BTRFS_I(inode)->has_orphan_item = 1; /* if we have links, this was a truncate, lets do that */ if (inode->i_nlink) { @@ -3671,7 +3678,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); + BUG_ON(!BTRFS_I(inode)->has_orphan_item); goto no_delete; } @@ -6914,6 +6921,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->dummy_inode = 0; ei->in_defrag = 0; ei->delalloc_meta_reserved = 0; + ei->has_orphan_item = 0; ei->force_compress = BTRFS_COMPRESS_NONE; ei->delayed_node = NULL; @@ -6927,7 +6935,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) mutex_init(&ei->log_mutex); mutex_init(&ei->delalloc_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); - INIT_LIST_HEAD(&ei->i_orphan); INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); @@ -6972,13 +6979,11 @@ void btrfs_destroy_inode(struct inode *inode) spin_unlock(&root->fs_info->ordered_extent_lock); } - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { + if (BTRFS_I(inode)->has_orphan_item) { 
printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n", (unsigned long long)btrfs_ino(inode)); - list_del_init(&BTRFS_I(inode)->i_orphan); + atomic_dec(&root->orphan_inodes); } - spin_unlock(&root->orphan_lock); while (1) { ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1); -- 1.7.7.6 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
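Editorial sketch of what the retained orphan_lock in btrfs_orphan_del() is meant to guarantee: the test-and-clear of has_orphan_item (and of orphan_meta_reserved) happens as one step, so only one caller ever decides to delete the on-disk item. This is a user-space model under stated assumptions, not the kernel code. The model_* names are hypothetical, and a small C11 atomic_flag spinlock stands in for the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal test-and-set spinlock standing in for the kernel's spinlock_t. */
struct model_spinlock {
	atomic_flag locked;
};

static void model_spin_lock_init(struct model_spinlock *lock)
{
	atomic_flag_clear(&lock->locked);
}

static void model_spin_lock(struct model_spinlock *lock)
{
	while (atomic_flag_test_and_set_explicit(&lock->locked,
						 memory_order_acquire))
		;	/* busy-wait until the holder releases the lock */
}

static void model_spin_unlock(struct model_spinlock *lock)
{
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Hypothetical stand-in for the inode state examined by btrfs_orphan_del(). */
struct model_inode {
	struct model_spinlock *orphan_lock;	/* per-root lock in the patch */
	bool has_orphan_item;
	bool orphan_meta_reserved;
};

/* What the caller has to do after dropping the lock. */
struct model_del_work {
	bool delete_item;	/* delete the on-disk orphan item */
	bool release_rsv;	/* release the metadata reservation */
};

/* Claim the pending work atomically; a second caller gets nothing to do. */
static struct model_del_work model_orphan_del_claim(struct model_inode *inode)
{
	struct model_del_work work = { false, false };

	assert(inode->orphan_lock != NULL);

	model_spin_lock(inode->orphan_lock);
	if (inode->has_orphan_item) {
		inode->has_orphan_item = false;
		work.delete_item = true;
	}
	if (inode->orphan_meta_reserved) {
		inode->orphan_meta_reserved = false;
		work.release_rsv = true;
	}
	model_spin_unlock(inode->orphan_lock);

	return work;
}
```

Whichever path claims the work first (for example truncate or evict) performs the delete; a later caller sees both flags cleared and skips it. The model only covers the delete side; it says nothing about where the flag is set.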
Hi Josef,

On 11.05.2012 21:16, Josef Bacik wrote:
> Heh duh, sorry, try this one instead. Thanks,

With this patch I got this bug:

[ 8233.828722] ------------[ cut here ]------------
[ 8233.828737] kernel BUG at fs/btrfs/inode.c:2217!
[ 8233.828746] invalid opcode: 0000 [#1] SMP
[ 8233.828761] CPU 1
[ 8233.828766] Modules linked in: btrfs zlib_deflate libcrc32c ses enclosure bonding coretemp ghash_clmulni_intel psmouse aesni_intel sb_edac cryptd aes_x86_64 ext2 microcode serio_raw edac_core mei(C) joydev ioatdma mac_hid lp parport usbhid hid isci libsas ixgbe scsi_transport_sas megaraid_sas igb dca mdio
[ 8233.828885]
[ 8233.828891] Pid: 4444, comm: ceph-osd Tainted: G WC 3.4.0-rc6+ #6 Supermicro X9SRi/X9SRi
[ 8233.828915] RIP: 0010:[<ffffffffa02492d2>] [<ffffffffa02492d2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 8233.828947] RSP: 0018:ffff88101ce53d18 EFLAGS: 00010282
[ 8233.828957] RAX: 00000000fffffffe RBX: ffff880d194e2c50 RCX: 0000000000d0a3be
[ 8233.828971] RDX: 0000000000d0a3bd RSI: ffff88101de2a000 RDI: ffffea0040778a80
[ 8233.828985] RBP: ffff88101ce53d58 R08: 000060ef80000f00 R09: ffffffffa0220c6a
[ 8233.828999] R10: 0000000000000000 R11: 00000000000000f0 R12: ffff88071bb1e790
[ 8233.829029] R13: ffff88071bb1e400 R14: 0000000000000001 R15: 0000000000000001
[ 8233.829059] FS: 00007fdfa179b700(0000) GS:ffff88107fc20000(0000) knlGS:0000000000000000
[ 8233.829104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8233.829131] CR2: 000000000c614000 CR3: 00000001df9d2000 CR4: 00000000000407e0
[ 8233.829160] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8233.829190] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8233.829220] Process ceph-osd (pid: 4444, threadinfo ffff88101ce52000, task ffff88101b7b96e0)
[ 8233.829265] Stack:
[ 8233.829286] 0c00000000000002 ffff88101de14cd0 ffff88101ce53d38 ffff88101de14cd0
[ 8233.829336] 0000000000000000 ffff88071bb1e400 ffff880d194e2c50 ffff881024680620
[ 8233.829386] ffff88101ce53e08 ffffffffa024d608 ffff880d194e2c50 0000000000000000
[ 8233.829436] Call Trace:
[ 8233.829472] [<ffffffffa024d608>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 8233.829503] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 8233.829537] [<ffffffffa024efc1>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 8233.829567] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 8233.829595] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 8233.829621] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 8233.829649] [<ffffffff8165fd69>] system_call_fastpath+0x16/0x1b
[ 8233.829676] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 70 73 fe ff eb b8 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 8233.829875] RIP [<ffffffffa02492d2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 8233.829914] RSP <ffff88101ce53d18>
[ 8233.830187] ---[ end trace 46dd4a711bf2979d ]---

-martin
On Mon, May 14, 2012 at 04:19:37PM +0200, Martin Mailand wrote:
> Hi Josef,
>
> On 11.05.2012 21:16, Josef Bacik wrote:
> > Heh duh, sorry, try this one instead. Thanks,
>
> With this patch I got this Bug:

Yeah, Christian reported the same thing on Friday. I'm going to work on a patch and actually run it here to make sure it doesn't blow up, and then send it to the list when I think I've got something that works. Thanks,

Josef
On Mon, May 14, 2012 at 10:20:48AM -0400, Josef Bacik wrote:
> On Mon, May 14, 2012 at 04:19:37PM +0200, Martin Mailand wrote:
> > Hi Josef,
> >
> > On 11.05.2012 21:16, Josef Bacik wrote:
> > > Heh duh, sorry, try this one instead. Thanks,
> >
> > With this patch I got this Bug:
>
> Yeah Christian reported the same thing on Friday. I'm going to work on a patch
> and actually run it here to make sure it doesn't blow up and then send it to the
> list when I think I've got something that works. Thanks,
>

Hrm, OK, so I finally got some time to try and debug it and let the test run a good long while (almost 5 hours), and I couldn't hit either the original bug or the one you guys were hitting. So either my extra little bit of locking did the trick or I get to keep my "Worst reproducer ever" award. Can you guys give this one a whirl and, if it panics, send the entire dmesg? It should spit out a WARN_ON() to let me know whether what I thought was the problem really was it. Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 3771b85..559e716 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all.
@@ -153,6 +150,7 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ba8743b..72cdf98 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 19f5b45..25dba7a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 54ae3df..c0cff20 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2104,12 +2104,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2166,8 +2166,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if 
(list_empty(&BTRFS_I(inode)->i_orphan)) { - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + if (!BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 1; #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2180,6 +2180,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } if (!BTRFS_I(inode)->orphan_meta_reserved) { @@ -2198,6 +2199,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) if (insert >= 1) { ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode)); if (ret && ret != -EEXIST) { + spin_lock(&root->orphan_lock); + BTRFS_I(inode)->has_orphan_item = 0; + spin_unlock(&root->orphan_lock); btrfs_abort_transaction(trans, root, ret); return ret; } @@ -2227,9 +2231,13 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; + /* + * evict_inode gets called without holding the i_mutex so we need to + * take the orphan lock to make sure we are safe in messing with these. 
+ */ spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); + if (BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 0; delete_item = 1; } @@ -2247,6 +2255,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2385,7 +2396,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * the proper thing when we hit it */ spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + atomic_inc(&root->orphan_inodes); + WARN_ON(BTRFS_I(inode)->has_orphan_item); + BTRFS_I(inode)->has_orphan_item = 1; spin_unlock(&root->orphan_lock); /* if we have links, this was a truncate, lets do that */ @@ -3707,7 +3720,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); + BUG_ON(!BTRFS_I(inode)->has_orphan_item); goto no_delete; } @@ -6866,6 +6879,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->dummy_inode = 0; ei->in_defrag = 0; ei->delalloc_meta_reserved = 0; + ei->has_orphan_item = 0; ei->force_compress = BTRFS_COMPRESS_NONE; ei->delayed_node = NULL; @@ -6879,7 +6893,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) mutex_init(&ei->log_mutex); mutex_init(&ei->delalloc_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); - INIT_LIST_HEAD(&ei->i_orphan); INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); @@ -6924,13 +6937,11 @@ void btrfs_destroy_inode(struct inode *inode) spin_unlock(&root->fs_info->ordered_extent_lock); } - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { + if (BTRFS_I(inode)->has_orphan_item) { printk(KERN_INFO "BTRFS: inode %llu still on the 
orphan list\n", (unsigned long long)btrfs_ino(inode)); - list_del_init(&BTRFS_I(inode)->i_orphan); + atomic_dec(&root->orphan_inodes); } - spin_unlock(&root->orphan_lock); while (1) { ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1); -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
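Editorial sketch of the error path this version adds to btrfs_orphan_add(): if inserting the on-disk orphan item fails with anything other than -EEXIST, the in-memory state is undone so a later btrfs_orphan_del() does not try to delete an item that was never written. This is a user-space model, not the kernel code. The model_* names and the insert callback are hypothetical, locking is left out, and the model also undoes the counter increment, which is this sketch's own assumption: the posted hunk only clears has_orphan_item.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the root and inode state used by the patch. */
struct model_root {
	atomic_int orphan_inodes;
};

struct model_inode {
	struct model_root *root;
	bool has_orphan_item;
};

/* Hypothetical callback standing in for btrfs_insert_orphan_item(). */
typedef int (*model_insert_fn)(struct model_inode *inode);

/* Returns 0 on success or a negative errno if the insert really failed. */
static int model_orphan_add(struct model_inode *inode,
			    model_insert_fn insert_item)
{
	int ret;
	int old;

	if (inode->has_orphan_item)
		return 0;	/* already tracked, nothing to insert */

	inode->has_orphan_item = true;
	atomic_fetch_add(&inode->root->orphan_inodes, 1);

	ret = insert_item(inode);
	if (ret && ret != -EEXIST) {
		/* Roll back: the item is not on disk, so forget about it. */
		inode->has_orphan_item = false;
		/* Assumption of this sketch: also undo the counter. */
		old = atomic_fetch_sub(&inode->root->orphan_inodes, 1);
		assert(old > 0);
		return ret;
	}

	return 0;	/* -EEXIST counts as success: the item is on disk */
}

/* Example callbacks for the three outcomes of interest. */
static int model_insert_ok(struct model_inode *inode)
{
	(void)inode;
	return 0;
}

static int model_insert_exists(struct model_inode *inode)
{
	(void)inode;
	return -EEXIST;
}

static int model_insert_fail(struct model_inode *inode)
{
	(void)inode;
	return -ENOSPC;
}
```

After a failed insert the inode is back in its untracked state, so a retry goes through the full add path again instead of being skipped as "already tracked".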
Hi Josef,

somehow I still get the kernel BUG messages. I used your patch from
the 16th against rc7.

-martin

On 16.05.2012 21:20, Josef Bacik wrote:
> Hrm ok so I finally got some time to try and debug it and let the test run a
> good long while (5 hours almost) and I couldn't hit either the original bug or
> the one you guys were hitting. So either my extra little bit of locking did the
> trick or I get to keep my "Worst reproducer ever" award. Can you guys give this
> one a whirl and if it panics send the entire dmesg since it should spit out a
> WARN_ON() to let me know what I thought was the problem was it. Thanks,

[ 2868.813236] ------------[ cut here ]------------
[ 2868.813297] kernel BUG at fs/btrfs/inode.c:2220!
[ 2868.813355] invalid opcode: 0000 [#2] SMP
[ 2868.813479] CPU 2
[ 2868.813516] Modules linked in: btrfs zlib_deflate libcrc32c ext2 bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid ixgbe igb megaraid_sas dca mdio
[ 2868.814871]
[ 2868.814925] Pid: 5325, comm: ceph-osd Tainted: G D C 3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 2868.815108] RIP: 0010:[<ffffffffa02212f2>] [<ffffffffa02212f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 2868.815236] RSP: 0018:ffff880296e89d18 EFLAGS: 00010282
[ 2868.815294] RAX: 00000000fffffffe RBX: ffff88101ef3c390 RCX: 0000000000562497
[ 2868.815355] RDX: 0000000000562496 RSI: ffff88101ef10000 RDI: ffffea00407bc400
[ 2868.815416] RBP: ffff880296e89d58 R08: 000060ef80000fd0 R09: ffffffffa01f8c6a
[ 2868.815476] R10: 0000000000000000 R11: 000000000000011d R12: ffff880fdf602790
[ 2868.815537] R13: ffff880fdf602400 R14: 0000000000000001 R15: 0000000000000001
[ 2868.815598] FS: 00007f07d5512700(0000) GS:ffff88107fc40000(0000) knlGS:0000000000000000
[ 2868.815675] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2868.815734] CR2: 000000000ab16000 CR3: 000000082a6b2000 CR4: 00000000000407e0
[ 2868.815796] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2868.815858] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2868.815920] Process ceph-osd (pid: 5325, threadinfo ffff880296e88000, task ffff8810170616e0)
[ 2868.815997] Stack:
[ 2868.816049] 0c00000000000007 ffff88101ef12960 ffff880296e89d38 ffff88101ef12960
[ 2868.816262] 0000000000000000 ffff880fdf602400 ffff88101ef3c390 ffff880b4ce2f260
[ 2868.816485] ffff880296e89e08 ffffffffa0225628 ffff88101ef3c390 0000000000000000
[ 2868.816694] Call Trace:
[ 2868.816755] [<ffffffffa0225628>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 2868.816817] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 2868.816880] [<ffffffffa0227021>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 2868.816940] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 2868.816998] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 2868.817056] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 2868.817115] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b
[ 2868.817173] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff eb b8 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 2868.819501] RIP [<ffffffffa02212f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 2868.819602] RSP <ffff880296e89d18>
[ 2868.819703] ---[ end trace 94d17b770b376c84 ]---
[ 3249.857453] ------------[ cut here ]------------
[ 3249.857481] kernel BUG at fs/btrfs/inode.c:2220!
[ 3249.857506] invalid opcode: 0000 [#3] SMP
[ 3249.857534] CPU 0
[ 3249.857538] Modules linked in: btrfs zlib_deflate libcrc32c ext2 bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid ixgbe igb megaraid_sas dca mdio
[ 3249.857721]
[ 3249.857740] Pid: 5384, comm: ceph-osd Tainted: G D C 3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 3249.857791] RIP: 0010:[<ffffffffa02212f2>] [<ffffffffa02212f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 3249.857847] RSP: 0018:ffff880abe8b5d18 EFLAGS: 00010282
[ 3249.857873] RAX: 00000000fffffffe RBX: ffff8807eb8b6670 RCX: 000000000077a084
[ 3249.857902] RDX: 000000000077a083 RSI: ffff88101ee497e0 RDI: ffffea00407b9240
[ 3249.857931] RBP: ffff880abe8b5d58 R08: 000060ef80000fd0 R09: ffffffffa01f8c6a
[ 3249.857959] R10: 0000000000000000 R11: 0000000000000153 R12: ffff880d56825390
[ 3249.857988] R13: ffff880d56825000 R14: 0000000000000001 R15: 0000000000000001
[ 3249.858017] FS: 00007f06bd13b700(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
[ 3249.858062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3249.858088] CR2: 00000000043d2000 CR3: 0000000e7ebe5000 CR4: 00000000000407f0
[ 3249.858117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3249.858146] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3249.858175] Process ceph-osd (pid: 5384, threadinfo ffff880abe8b4000, task ffff880eb7a596e0)
[ 3249.858219] Stack:
[ 3249.858239] 0c00000000000002 ffff88101ede4d70 ffff880abe8b5d38 ffff88101ede4d70
[ 3249.858288] 0000000000000000 ffff880d56825000 ffff8807eb8b6670 ffff880546925e00
[ 3249.858338] ffff880abe8b5e08 ffffffffa0225628 ffff8807eb8b6670 0000000000000000
[ 3249.858387] Call Trace:
[ 3249.858415] [<ffffffffa0225628>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 3249.858445] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 3249.858477] [<ffffffffa0227021>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 3249.858505] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 3249.858533] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 3249.858559] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 3249.858587] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b
[ 3249.858614] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff eb b8 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 3249.858813] RIP [<ffffffffa02212f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 3249.858852] RSP <ffff880abe8b5d18>
[ 3249.859140] ---[ end trace 94d17b770b376c85 ]---
On Thu, May 17, 2012 at 12:29:32PM +0200, Martin Mailand wrote:
> Hi Josef,
>
> somehow I still get the kernel BUG messages. I used your patch from
> the 16th against rc7.
>

Was there anything above those messages? There should have been a WARN_ON() or
something. If not, that's fine; I just need to know one way or the other so I can
figure out what to do next. Thanks,

Josef
Hi Josef,

no, there was nothing above. Here is another dmesg output.

> Was there anything above those messages? There should have been a WARN_ON() or
> something. If not, that's fine; I just need to know one way or the other so I can
> figure out what to do next. Thanks,
>
> Josef

-martin

[ 63.027277] Btrfs loaded
[ 63.027485] device fsid 266726e1-439f-4d89-a374-7ef92d355daf devid 1 transid 4 /dev/sdc
[ 63.027750] btrfs: setting nodatacow
[ 63.027752] btrfs: enabling auto defrag
[ 63.027753] btrfs: disk space caching is enabled
[ 63.027754] btrfs flagging fs with big metadata feature
[ 63.036347] device fsid 070e2c6c-2ea5-478d-bc07-7ce3a954e2e4 devid 1 transid 4 /dev/sdd
[ 63.036624] btrfs: setting nodatacow
[ 63.036626] btrfs: enabling auto defrag
[ 63.036627] btrfs: disk space caching is enabled
[ 63.036628] btrfs flagging fs with big metadata feature
[ 63.045628] device fsid 6f7b82a9-a1b7-40c6-8b00-2c2a44481066 devid 1 transid 4 /dev/sde
[ 63.045910] btrfs: setting nodatacow
[ 63.045912] btrfs: enabling auto defrag
[ 63.045913] btrfs: disk space caching is enabled
[ 63.045914] btrfs flagging fs with big metadata feature
[ 63.831278] device fsid 46890b76-45c2-4ea2-96ee-2ea88e29628b devid 1 transid 4 /dev/sdf
[ 63.831577] btrfs: setting nodatacow
[ 63.831579] btrfs: enabling auto defrag
[ 63.831579] btrfs: disk space caching is enabled
[ 63.831580] btrfs flagging fs with big metadata feature
[ 1521.820412] ------------[ cut here ]------------
[ 1521.820424] kernel BUG at fs/btrfs/inode.c:2220!
[ 1521.820433] invalid opcode: 0000 [#1] SMP
[ 1521.820448] CPU 4
[ 1521.820452] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid ixgbe igb dca megaraid_sas mdio
[ 1521.820562]
[ 1521.820567] Pid: 3095, comm: ceph-osd Tainted: G C 3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 1521.820591] RIP: 0010:[<ffffffffa02532f2>] [<ffffffffa02532f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 1521.820616] RSP: 0018:ffff881013da9d18 EFLAGS: 00010282
[ 1521.820626] RAX: 00000000fffffffe RBX: ffff881013a3b7f0 RCX: 0000000000395dcf
[ 1521.820640] RDX: 0000000000395dce RSI: ffff88101df77480 RDI: ffffea004077ddc0
[ 1521.820654] RBP: ffff881013da9d58 R08: 000060ef800010d0 R09: ffffffffa022ac6a
[ 1521.820667] R10: 0000000000000000 R11: 000000000000010a R12: ffff88101e378790
[ 1521.820681] R13: ffff88101e378400 R14: 0000000000000001 R15: 0000000000000001
[ 1521.820695] FS: 00007faa45d30700(0000) GS:ffff88107fc80000(0000) knlGS:0000000000000000
[ 1521.820710] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1521.820738] CR2: 00007fe0efba6010 CR3: 0000001016fec000 CR4: 00000000000407e0
[ 1521.820767] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1521.820796] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1521.820825] Process ceph-osd (pid: 3095, threadinfo ffff881013da8000, task ffff881013da44a0)
[ 1521.820870] Stack:
[ 1521.820889] 0c00000000000005 ffff88101df9c230 ffff881013da9d38 ffff88101df9c230
[ 1521.820939] 0000000000000000 ffff88101e378400 ffff881013a3b7f0 ffff880c6880f840
[ 1521.820988] ffff881013da9e08 ffffffffa0257628 ffff881013a3b7f0 0000000000000000
[ 1521.821038] Call Trace:
[ 1521.821066] [<ffffffffa0257628>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1521.821096] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 1521.821128] [<ffffffffa0259021>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1521.821156] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 1521.821183] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 1521.821209] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 1521.821237] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b
[ 1521.821265] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff eb b8 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 1521.821458] RIP [<ffffffffa02532f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 1521.821492] RSP <ffff881013da9d18>
[ 1521.821758] ---[ end trace aee4c5fe92ee2a67 ]---
[ 6888.637508] btrfs: truncated 1 orphans
[ 7641.701736] ------------[ cut here ]------------
[ 7641.701764] kernel BUG at fs/btrfs/inode.c:2220!
[ 7641.701789] invalid opcode: 0000 [#2] SMP
[ 7641.701816] CPU 3
[ 7641.701819] Modules linked in: btrfs zlib_deflate libcrc32c ext2 ses enclosure bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 psmouse microcode serio_raw sb_edac edac_core mei(C) joydev ioatdma mac_hid lp parport isci libsas scsi_transport_sas usbhid hid ixgbe igb dca megaraid_sas mdio
[ 7641.702000]
[ 7641.702030] Pid: 3064, comm: ceph-osd Tainted: G D C 3.4.0-rc7+ #10 Supermicro X9SRi/X9SRi
[ 7641.702081] RIP: 0010:[<ffffffffa02532f2>] [<ffffffffa02532f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 7641.702140] RSP: 0018:ffff881013c51d18 EFLAGS: 00010282
[ 7641.702166] RAX: 00000000fffffffe RBX: ffff881010871130 RCX: 00000000013df293
[ 7641.702195] RDX: 00000000013df292 RSI: ffff88101701c1b0 RDI: ffffea00405c0700
[ 7641.702224] RBP: ffff881013c51d58 R08: 000060ef800010d0 R09: ffffffffa022ac6a
[ 7641.702253] R10: 0000000000000000 R11: 000000000000013f R12: ffff88101e379390
[ 7641.702282] R13: ffff88101e379000 R14: 0000000000000001 R15: 0000000000000001
[ 7641.702311] FS: 00007fcb27307700(0000) GS:ffff88107fc60000(0000) knlGS:0000000000000000
[ 7641.702368] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7641.702395] CR2: 0000000010713018 CR3: 0000001016e95000 CR4: 00000000000407e0
[ 7641.702425] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7641.702454] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7641.702484] Process ceph-osd (pid: 3064, threadinfo ffff881013c50000, task ffff881015e35b80)
[ 7641.702529] Stack:
[ 7641.702557] 0c00000000000004 ffff88101701d820 ffff881013c51d38 ffff88101701d820
[ 7641.702618] 0000000000000000 ffff88101e379000 ffff881010871130 ffff880503e70b80
[ 7641.702678] ffff881013c51e08 ffffffffa0257628 ffff881010871130 0000000000000000
[ 7641.702729] Call Trace:
[ 7641.702761] [<ffffffffa0257628>] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 7641.702792] [<ffffffff81188afd>] ? path_lookupat+0x6d/0x750
[ 7641.702828] [<ffffffffa0259021>] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 7641.702858] [<ffffffff811955c3>] notify_change+0x183/0x320
[ 7641.702886] [<ffffffff8117889e>] do_truncate+0x5e/0xa0
[ 7641.702913] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0
[ 7641.702942] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b
[ 7641.702969] Code: e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 66 0f 1f 44 00 00 80 bb 60 fe ff ff 84 75 b4 eb ae 0f 1f 44 00 00 48 89 df e8 50 73 fe ff eb b8 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 7641.703185] RIP [<ffffffffa02532f2>] btrfs_orphan_del+0xe2/0xf0 [btrfs]
[ 7641.703224] RSP <ffff881013c51d18>
[ 7641.703591] ---[ end trace aee4c5fe92ee2a68 ]---
On Thu, May 17, 2012 at 05:12:55PM +0200, Martin Mailand wrote:
> Hi Josef,
> no, there was nothing above. Here is another dmesg output.
>

Hrm ok give this a try and hopefully this is it, still couldn't reproduce.
Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 3771b85..559e716 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
 	/* used to order data wrt metadata */
 	struct btrfs_ordered_inode_tree ordered_tree;

-	/* for keeping track of orphaned inodes */
-	struct list_head i_orphan;
-
 	/* list of all the delalloc inodes in the FS. There are times we need
 	 * to write all the delalloc pages to disk, and this list is used
 	 * to walk them all.
@@ -153,6 +150,7 @@ struct btrfs_inode {
 	unsigned dummy_inode:1;
 	unsigned in_defrag:1;
 	unsigned delalloc_meta_reserved:1;
+	unsigned has_orphan_item:1;

 	/*
 	 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ba8743b..72cdf98 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
 	struct list_head root_list;

 	spinlock_t orphan_lock;
-	struct list_head orphan_list;
+	atomic_t orphan_inodes;
 	struct btrfs_block_rsv *orphan_block_rsv;
 	int orphan_item_inserted;
 	int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 19f5b45..25dba7a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	root->orphan_block_rsv = NULL;

 	INIT_LIST_HEAD(&root->dirty_list);
-	INIT_LIST_HEAD(&root->orphan_list);
 	INIT_LIST_HEAD(&root->root_list);
 	spin_lock_init(&root->orphan_lock);
 	spin_lock_init(&root->inode_lock);
@@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	atomic_set(&root->log_commit[0], 0);
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
+	atomic_set(&root->orphan_inodes, 0);
 	root->log_batch = 0;
 	root->log_transid = 0;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 54ae3df..7cc1c96 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2104,12 +2104,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 	struct btrfs_block_rsv *block_rsv;
 	int ret;

-	if (!list_empty(&root->orphan_list) ||
+	if (atomic_read(&root->orphan_inodes) ||
 	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
 		return;

 	spin_lock(&root->orphan_lock);
-	if (!list_empty(&root->orphan_list)) {
+	if (atomic_read(&root->orphan_inodes)) {
 		spin_unlock(&root->orphan_lock);
 		return;
 	}
@@ -2166,8 +2166,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 		block_rsv = NULL;
 	}

-	if (list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+	if (!BTRFS_I(inode)->has_orphan_item) {
+		BTRFS_I(inode)->has_orphan_item = 1;
 #if 0
 		/*
 		 * For proper ENOSPC handling, we should do orphan
@@ -2180,6 +2180,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 			insert = 1;
 #endif
 		insert = 1;
+		atomic_inc(&root->orphan_inodes);
 	}

 	if (!BTRFS_I(inode)->orphan_meta_reserved) {
@@ -2198,6 +2199,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 	if (insert >= 1) {
 		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
 		if (ret && ret != -EEXIST) {
+			spin_lock(&root->orphan_lock);
+			BTRFS_I(inode)->has_orphan_item = 0;
+			spin_unlock(&root->orphan_lock);
 			btrfs_abort_transaction(trans, root, ret);
 			return ret;
 		}
@@ -2227,13 +2231,21 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 	int release_rsv = 0;
 	int ret = 0;

+	/*
+	 * evict_inode gets called without holding the i_mutex so we need to
+	 * take the orphan lock to make sure we are safe in messing with these.
+	 */
 	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_del_init(&BTRFS_I(inode)->i_orphan);
-		delete_item = 1;
+	if (BTRFS_I(inode)->has_orphan_item) {
+		if (trans) {
+			BTRFS_I(inode)->has_orphan_item = 0;
+			delete_item = 1;
+		} else {
+			WARN_ON(1);
+		}
 	}

-	if (BTRFS_I(inode)->orphan_meta_reserved) {
+	if (trans && BTRFS_I(inode)->orphan_meta_reserved) {
 		BTRFS_I(inode)->orphan_meta_reserved = 0;
 		release_rsv = 1;
 	}
@@ -2247,6 +2259,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 	if (release_rsv)
 		btrfs_orphan_release_metadata(inode);

+	if (trans && delete_item)
+		atomic_dec(&root->orphan_inodes);
+
 	return 0;
 }

@@ -2385,7 +2400,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		 * the proper thing when we hit it
 		 */
 		spin_lock(&root->orphan_lock);
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+		atomic_inc(&root->orphan_inodes);
+		WARN_ON(BTRFS_I(inode)->has_orphan_item);
+		BTRFS_I(inode)->has_orphan_item = 1;
 		spin_unlock(&root->orphan_lock);

 		/* if we have links, this was a truncate, lets do that */
@@ -3707,7 +3724,7 @@ void btrfs_evict_inode(struct inode *inode)
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);

 	if (root->fs_info->log_root_recovering) {
-		BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan));
+		BUG_ON(!BTRFS_I(inode)->has_orphan_item);
 		goto no_delete;
 	}

@@ -6866,6 +6883,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dummy_inode = 0;
 	ei->in_defrag = 0;
 	ei->delalloc_meta_reserved = 0;
+	ei->has_orphan_item = 0;
 	ei->force_compress = BTRFS_COMPRESS_NONE;

 	ei->delayed_node = NULL;
@@ -6879,7 +6897,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	mutex_init(&ei->log_mutex);
 	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
-	INIT_LIST_HEAD(&ei->i_orphan);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->ordered_operations);
 	RB_CLEAR_NODE(&ei->rb_node);
@@ -6924,13 +6941,11 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_extent_lock);
 	}

-	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
+	if (BTRFS_I(inode)->has_orphan_item) {
 		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
-		list_del_init(&BTRFS_I(inode)->i_orphan);
+		atomic_dec(&root->orphan_inodes);
 	}
-	spin_unlock(&root->orphan_lock);

 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
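The patch above swaps the per-root orphan list for two pieces of state: a per-inode has_orphan_item flag and a per-root orphan_inodes counter. A minimal userspace sketch of that bookkeeping follows, assuming a pthread mutex in place of orphan_lock and C11 atomics in place of atomic_t. Every toy_* name is hypothetical and not part of btrfs, and the sketch leaves out the transaction handling (the "trans &&" checks) that the real btrfs_orphan_del performs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the orphan fields of struct btrfs_root. */
struct toy_root {
	pthread_mutex_t orphan_lock;	/* plays the role of root->orphan_lock */
	atomic_int orphan_inodes;	/* plays the role of root->orphan_inodes */
};

/* Hypothetical stand-in for the orphan fields of struct btrfs_inode. */
struct toy_inode {
	struct toy_root *root;
	bool has_orphan_item;		/* only read or written under orphan_lock */
};

static void toy_root_init(struct toy_root *root)
{
	pthread_mutex_init(&root->orphan_lock, NULL);
	atomic_init(&root->orphan_inodes, 0);
}

/* Mark the inode as orphaned. Returns true only for the call that set the flag. */
static bool toy_orphan_add(struct toy_inode *inode)
{
	struct toy_root *root = inode->root;
	bool inserted = false;

	pthread_mutex_lock(&root->orphan_lock);
	if (!inode->has_orphan_item) {
		inode->has_orphan_item = true;
		atomic_fetch_add(&root->orphan_inodes, 1);
		inserted = true;
	}
	pthread_mutex_unlock(&root->orphan_lock);
	return inserted;
}

/* Clear the orphan mark. Returns true only for the call that cleared the flag. */
static bool toy_orphan_del(struct toy_inode *inode)
{
	struct toy_root *root = inode->root;
	bool deleted = false;

	pthread_mutex_lock(&root->orphan_lock);
	if (inode->has_orphan_item) {
		inode->has_orphan_item = false;
		atomic_fetch_sub(&root->orphan_inodes, 1);
		deleted = true;
	}
	pthread_mutex_unlock(&root->orphan_lock);
	return deleted;
}

/* Cheap "any orphans at all?" check, as the commit path needs. */
static bool toy_root_has_orphans(struct toy_root *root)
{
	return atomic_load(&root->orphan_inodes) != 0;
}
```

The flag answers "is this inode already tracked?" and the counter answers "does this root have any orphans?" without walking a list. The price is that flag and counter must be updated together, which is why the sketch changes both under the same lock.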
2012/5/17 Josef Bacik <josef@redhat.com>:
> On Thu, May 17, 2012 at 05:12:55PM +0200, Martin Mailand wrote:
>> Hi Josef,
>> no, there was nothing above. Here is another dmesg output.
>>
>
> Hrm ok give this a try and hopefully this is it, still couldn't reproduce.
> Thanks,
>
> Josef

Well, I hate to say it, but the new patch doesn't seem to change much...

Regards,
Christian

[ 123.507444] Btrfs loaded
[ 202.683630] device fsid 2aa7531c-0e3c-4955-8542-6aed7ab8c1a2 devid 1 transid 4 /dev/sda
[ 202.693704] btrfs: use lzo compression
[ 202.697999] btrfs: enabling inode map caching
[ 202.702989] btrfs: enabling auto defrag
[ 202.707190] btrfs: disk space caching is enabled
[ 202.712721] btrfs flagging fs with big metadata feature
[ 207.839761] device fsid f81ff6a1-c333-4daf-989f-a28139f15f08 devid 1 transid 4 /dev/sdb
[ 207.849681] btrfs: use lzo compression
[ 207.853987] btrfs: enabling inode map caching
[ 207.858970] btrfs: enabling auto defrag
[ 207.863173] btrfs: disk space caching is enabled
[ 207.868635] btrfs flagging fs with big metadata feature
[ 210.857328] device fsid 9b905faa-f4fa-4626-9cae-2cd0287b30f7 devid 1 transid 4 /dev/sdc
[ 210.867265] btrfs: use lzo compression
[ 210.871560] btrfs: enabling inode map caching
[ 210.876550] btrfs: enabling auto defrag
[ 210.880757] btrfs: disk space caching is enabled
[ 210.886228] btrfs flagging fs with big metadata feature
[ 214.296287] device fsid f7990e4c-90b0-4691-9502-92b60538574a devid 1 transid 4 /dev/sdd
[ 214.306510] btrfs: use lzo compression
[ 214.310855] btrfs: enabling inode map caching
[ 214.315905] btrfs: enabling auto defrag
[ 214.320174] btrfs: disk space caching is enabled
[ 214.325706] btrfs flagging fs with big metadata feature
[ 1337.937379] ------------[ cut here ]------------
[ 1337.942526] kernel BUG at fs/btrfs/inode.c:2224!
[ 1337.947671] invalid opcode: 0000 [#1] SMP
[ 1337.952255] CPU 5
[ 1337.954300] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg pcspkr serio_raw iTCO_wdt iTCO_vendor_support iomemory_vsl(PO) ixgbe dca mdio i7core_edac edac_core hpsa squashfs [last unloaded: scsi_wait_scan]
[ 1337.978570]
[ 1337.980230] Pid: 6812, comm: ceph-osd Tainted: P O 3.3.5-1.fits.1.el6.x86_64 #1 HP ProLiant DL180 G6
[ 1337.991592] RIP: 0010:[<ffffffffa035675c>] [<ffffffffa035675c>] btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.001897] RSP: 0018:ffff8805e1171d38 EFLAGS: 00010282
[ 1338.007815] RAX: 00000000fffffffe RBX: ffff88061c3c8400 RCX: 0000000000b37f48
[ 1338.015768] RDX: 0000000000b37f47 RSI: ffff8805ec2a1cf0 RDI: ffffea0017b0a840
[ 1338.023724] RBP: ffff8805e1171d68 R08: 000060f9d88028a0 R09: ffffffffa033016a
[ 1338.031675] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8805de7f57a0
[ 1338.039629] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8805ec2a5280
[ 1338.047584] FS: 00007f4bffc6e700(0000) GS:ffff8806272a0000(0000) knlGS:0000000000000000
[ 1338.056600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1338.063003] CR2: ffffffffff600400 CR3: 00000005e34c3000 CR4: 00000000000006e0
[ 1338.070954] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1338.078909] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1338.086865] Process ceph-osd (pid: 6812, threadinfo ffff8805e1170000, task ffff88060fa81940)
[ 1338.096268] Stack:
[ 1338.098509] ffff8805e1171d68 ffff8805ec2a5280 ffff88051235b920 0000000000000000
[ 1338.106795] ffff88051235b920 0000000000080000 ffff8805e1171e08 ffffffffa036043c
[ 1338.115082] 0000000000000000 0000000000000000 0000000000000000 0001000000001000
[ 1338.123367] Call Trace:
[ 1338.126111] [<ffffffffa036043c>] btrfs_truncate+0x5bc/0x640 [btrfs]
[ 1338.133213] [<ffffffffa03605b6>] btrfs_setattr+0xf6/0x1a0 [btrfs]
[ 1338.140105] [<ffffffff811816fb>] notify_change+0x18b/0x2b0
[ 1338.146320] [<ffffffff81276541>] ? selinux_inode_permission+0xd1/0x130
[ 1338.153699] [<ffffffff81165f44>] do_truncate+0x64/0xa0
[ 1338.159527] [<ffffffff81172669>] ? inode_permission+0x49/0x100
[ 1338.166128] [<ffffffff81166197>] sys_truncate+0x137/0x150
[ 1338.172244] [<ffffffff8158b1e9>] system_call_fastpath+0x16/0x1b
[ 1338.178936] Code: 89 e7 e8 88 7d fe ff eb 89 66 0f 1f 44 00 00 be a4 08 00 00 48 c7 c7 59 49 3b a0 45 31 ed e8 5c 78 cf e0 45 31 f6 e9 30 ff ff ff <0f> 0b eb fe 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65 e0 4c
[ 1338.200623] RIP [<ffffffffa035675c>] btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.208317] RSP <ffff8805e1171d38>
[ 1338.212681] ---[ end trace 86be14f0f863ea79 ]---
Hi Josef,

I hit exactly the same bug as Christian with your last patch.

-martin
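A side note on reading these reports: every BUG dump so far shows RAX: 00000000fffffffe at the point of failure in btrfs_orphan_del. Assuming RAX still holds the int return value of the failing call when the BUG fires (an assumption, the traces do not say so directly), the low 32 bits decode to -2, which is -ENOENT on Linux. A small sketch of that decoding, where low32_as_int is a hypothetical helper written only for this illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Interpret the low 32 bits of a 64-bit register dump as a signed int,
 * the way a C function returning int would have left it in RAX on x86_64.
 * The arithmetic avoids implementation-defined narrowing conversions.
 */
static int low32_as_int(uint64_t reg)
{
	uint32_t low = (uint32_t)(reg & 0xffffffffu);

	if (low >= 0x80000000u)
		return (int)((int64_t)low - 4294967296LL);
	return (int)low;
}
```

For the value in the dumps, low32_as_int(0x00000000fffffffeULL) yields -2. If the assumption holds, that would be consistent with a lookup that did not find the item it was asked to delete.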
On Thu, May 17, 2012 at 11:18:25PM +0200, Martin Mailand wrote:
> Hi Josef,
>
> I hit exactly the same bug as Christian with your last patch.
>

Ok hopefully this will print something out that makes sense. Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9b9b15f..492c74f 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
 	/* used to order data wrt metadata */
 	struct btrfs_ordered_inode_tree ordered_tree;

-	/* for keeping track of orphaned inodes */
-	struct list_head i_orphan;
-
 	/* list of all the delalloc inodes in the FS. There are times we need
 	 * to write all the delalloc pages to disk, and this list is used
 	 * to walk them all.
@@ -156,6 +153,8 @@ struct btrfs_inode {
 	unsigned dummy_inode:1;
 	unsigned in_defrag:1;
 	unsigned delalloc_meta_reserved:1;
+	unsigned has_orphan_item:1;
+	unsigned doing_truncate:1;

 	/*
 	 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8fd7233..aad2600 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
 	struct list_head root_list;

 	spinlock_t orphan_lock;
-	struct list_head orphan_list;
+	atomic_t orphan_inodes;
 	struct btrfs_block_rsv *orphan_block_rsv;
 	int orphan_item_inserted;
 	int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..ff3bf4b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	root->orphan_block_rsv = NULL;

 	INIT_LIST_HEAD(&root->dirty_list);
-	INIT_LIST_HEAD(&root->orphan_list);
 	INIT_LIST_HEAD(&root->root_list);
 	spin_lock_init(&root->orphan_lock);
 	spin_lock_init(&root->inode_lock);
@@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	atomic_set(&root->log_commit[0], 0);
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
+	atomic_set(&root->orphan_inodes, 0);
 	root->log_batch = 0;
 	root->log_transid = 0;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61b16c6..7de7f6f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2072,12 +2072,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 	struct btrfs_block_rsv *block_rsv;
 	int ret;

-	if (!list_empty(&root->orphan_list) ||
+	if (atomic_read(&root->orphan_inodes) ||
 	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
 		return;

 	spin_lock(&root->orphan_lock);
-	if (!list_empty(&root->orphan_list)) {
+	if (atomic_read(&root->orphan_inodes)) {
 		spin_unlock(&root->orphan_lock);
 		return;
 	}
@@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 		block_rsv = NULL;
 	}

-	if (list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+	if (!BTRFS_I(inode)->has_orphan_item) {
+		BTRFS_I(inode)->has_orphan_item = 1;
 #if 0
 		/*
 		 * For proper ENOSPC handling, we should do orphan
@@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 			insert = 1;
 #endif
 		insert = 1;
+		atomic_inc(&root->orphan_inodes);
 	}

 	if (!BTRFS_I(inode)->orphan_meta_reserved) {
@@ -2166,6 +2167,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 	if (insert >= 1) {
 		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
 		if (ret && ret != -EEXIST) {
+			spin_lock(&root->orphan_lock);
+			BTRFS_I(inode)->has_orphan_item = 0;
+			spin_unlock(&root->orphan_lock);
 			btrfs_abort_transaction(trans, root, ret);
 			return ret;
 		}
@@ -2195,13 +2199,21 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
 	int release_rsv = 0;
 	int ret = 0;

+	/*
+	 * evict_inode gets called without holding the i_mutex so we need to
+	 * take the orphan lock to make sure we are safe in messing with these.
+	 */
 	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-		list_del_init(&BTRFS_I(inode)->i_orphan);
-		delete_item = 1;
+	if (BTRFS_I(inode)->has_orphan_item) {
+		if (trans) {
+			BTRFS_I(inode)->has_orphan_item = 0;
+			delete_item = 1;
+		} else {
+			WARN_ON(1);
+		}
 	}

-	if (BTRFS_I(inode)->orphan_meta_reserved) {
+	if (trans && BTRFS_I(inode)->orphan_meta_reserved) {
 		BTRFS_I(inode)->orphan_meta_reserved = 0;
 		release_rsv = 1;
 	}
@@ -2209,12 +2221,18 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)

 	if (trans && delete_item) {
 		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
+		if (ret)
+			printk(KERN_ERR "couldn't find orphan item for %Lu\n",
+			       btrfs_ino(inode));
 		BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */
 	}

 	if (release_rsv)
 		btrfs_orphan_release_metadata(inode);

+	if (trans && delete_item)
+		atomic_dec(&root->orphan_inodes);
+
 	return 0;
 }

@@ -2341,6 +2359,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 				ret = PTR_ERR(trans);
 				goto out;
 			}
+			printk(KERN_ERR "auto deleting %Lu\n",
+			       found_key.objectid);
 			ret = btrfs_del_orphan_item(trans, root,
 						    found_key.objectid);
 			BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */
@@ -2353,7 +2373,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		 * the proper thing when we hit it
 		 */
 		spin_lock(&root->orphan_lock);
-		list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+		atomic_inc(&root->orphan_inodes);
+		WARN_ON(BTRFS_I(inode)->has_orphan_item);
+		BTRFS_I(inode)->has_orphan_item = 1;
 		spin_unlock(&root->orphan_lock);

 		/* if we have links, this was a truncate, lets do that */
@@ -3671,7 +3693,7 @@ void btrfs_evict_inode(struct inode *inode)
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);

 	if (root->fs_info->log_root_recovering) {
-		BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan));
+		BUG_ON(!BTRFS_I(inode)->has_orphan_item);
 		goto no_delete;
 	}

@@ -6683,9 +6705,13 @@ static int btrfs_truncate(struct inode *inode)
 	u64 mask = root->sectorsize - 1;
 	u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);

+	spin_lock(&BTRFS_I(inode)->lock);
+	BUG_ON(BTRFS_I(inode)->doing_truncate);
+	BTRFS_I(inode)->doing_truncate = 0;
+	spin_unlock(&BTRFS_I(inode)->lock);
 	ret = btrfs_truncate_page(inode->i_mapping, inode->i_size);
 	if (ret)
-		return ret;
+		goto real_out;

 	btrfs_wait_ordered_range(inode, inode->i_size & (~mask), (u64)-1);
 	btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
@@ -6727,8 +6753,10 @@ static int btrfs_truncate(struct inode *inode)
 	 * updating the inode.
 	 */
 	rsv = btrfs_alloc_block_rsv(root);
-	if (!rsv)
-		return -ENOMEM;
+	if (!rsv) {
+		ret = -ENOMEM;
+		goto real_out;
+	}
 	rsv->size = min_size;

 	/*
@@ -6847,7 +6875,10 @@ end_trans:

 out:
 	btrfs_free_block_rsv(root, rsv);
-
+real_out:
+	spin_lock(&BTRFS_I(inode)->lock);
+	BTRFS_I(inode)->doing_truncate = 0;
+	spin_unlock(&BTRFS_I(inode)->lock);
 	if (ret && !err)
 		err = ret;

@@ -6914,6 +6945,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->dummy_inode = 0;
 	ei->in_defrag = 0;
 	ei->delalloc_meta_reserved = 0;
+	ei->has_orphan_item = 0;
+	ei->doing_truncate = 0;
 	ei->force_compress = BTRFS_COMPRESS_NONE;

 	ei->delayed_node = NULL;
@@ -6927,7 +6960,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	mutex_init(&ei->log_mutex);
 	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
-	INIT_LIST_HEAD(&ei->i_orphan);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->ordered_operations);
 	RB_CLEAR_NODE(&ei->rb_node);
@@ -6972,13 +7004,11 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_extent_lock);
 	}

-	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
+	if (BTRFS_I(inode)->has_orphan_item) {
 		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
-		list_del_init(&BTRFS_I(inode)->i_orphan);
+		atomic_dec(&root->orphan_inodes);
 	}
-	spin_unlock(&root->orphan_lock);

 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Josef,

there was one line before the bug.

[ 995.725105] couldn't find orphan item for 524

On 18.05.2012 16:48, Josef Bacik wrote:
> Ok hopefully this will print something out that makes sense. Thanks,

-martin

[ 241.754693] Btrfs loaded
[ 241.755148] device fsid 43c4ebd9-3824-4b07-a710-3ec39b012759 devid 1 transid 4 /dev/sdc
[ 241.755750] btrfs: setting nodatacow
[ 241.755753] btrfs: enabling auto defrag
[ 241.755754] btrfs: disk space caching is enabled
[ 241.755755] btrfs flagging fs with big metadata feature
[ 241.768683] device fsid e7e7f2df-6a4e-45b1-85cc-860cda849953 devid 1 transid 4 /dev/sdd
[ 241.769028] btrfs: setting nodatacow
[ 241.769030] btrfs: enabling auto defrag
[ 241.769031] btrfs: disk space caching is enabled
[ 241.769032] btrfs flagging fs with big metadata feature
[ 241.781360] device fsid 203fdd4c-baac-49f8-bfdb-08486c937989 devid 1 transid 4 /dev/sde
[ 241.781854] btrfs: setting nodatacow
[ 241.781859] btrfs: enabling auto defrag
[ 241.781861] btrfs: disk space caching is enabled
[ 241.781864] btrfs flagging fs with big metadata feature
[ 242.713741] device fsid 95c36e12-0098-48d7-a08d-9d54a299206b devid 1 transid 4 /dev/sdf
[ 242.714110] btrfs: setting nodatacow
[ 242.714118] btrfs: enabling auto defrag
[ 242.714121] btrfs: disk space caching is enabled
[ 242.714125] btrfs flagging fs with big metadata feature
[ 995.725105] couldn't find orphan item for 524
[ 995.725126] ------------[ cut here ]------------
[ 995.725134] kernel BUG at fs/btrfs/inode.c:2227!
[ 995.725143] invalid opcode: 0000 [#1] SMP [ 995.725158] CPU 0 [ 995.725162] Modules linked in: btrfs zlib_deflate libcrc32c ext2 coretemp ghash_clmulni_intel aesni_intel bonding cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma enclosure mac_hid lp parport ixgbe usbhid hid isci libsas megaraid_sas scsi_transport_sas igb dca mdio [ 995.725285] [ 995.725290] Pid: 2972, comm: ceph-osd Tainted: G C 3.4.0-rc7.2012051800+ #14 Supermicro X9SRi/X9SRi [ 995.725324] RIP: 0010:[<ffffffffa028535f>] [<ffffffffa028535f>] btrfs_orphan_del+0x14f/0x160 [btrfs] [ 995.725354] RSP: 0018:ffff881016ed9d18 EFLAGS: 00010292 [ 995.725364] RAX: 0000000000000037 RBX: ffff88101485fdb0 RCX: 00000000ffffffff [ 995.725378] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246 [ 995.725392] RBP: ffff881016ed9d58 R08: 0000000000000000 R09: 0000000000000000 [ 995.725405] R10: 0000000000000000 R11: 00000000000000b6 R12: ffff88101efe9f90 [ 995.725419] R13: ffff88101efe9c00 R14: 0000000000000001 R15: 0000000000000001 [ 995.725433] FS: 00007f58e5dbc700(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000 [ 995.725466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 995.725492] CR2: 0000000003f28000 CR3: 000000101acac000 CR4: 00000000000407f0 [ 995.725522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 995.725551] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 995.725581] Process ceph-osd (pid: 2972, threadinfo ffff881016ed8000, task ffff881016180000) [ 995.725626] Stack: [ 995.725646] 0c00000000000002 ffff88101deaf550 ffff881016ed9d38 ffff88101deaf550 [ 995.725700] 0000000000000000 ffff88101efe9c00 ffff88101485fdb0 ffff880be890c1e0 [ 995.725757] ffff881016ed9e08 ffffffffa02897a8 ffff88101485fdb0 0000000000000000 [ 995.725807] Call Trace: [ 995.725835] [<ffffffffa02897a8>] btrfs_truncate+0x5e8/0x6d0 [btrfs] [ 995.725869] [<ffffffffa028b121>] btrfs_setattr+0xc1/0x1b0 [btrfs] [ 995.725898] 
[<ffffffff811955c3>] notify_change+0x183/0x320 [ 995.725925] [<ffffffff8117889e>] do_truncate+0x5e/0xa0 [ 995.725951] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0 [ 995.725979] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b [ 995.726006] Code: 45 31 ff e9 3c ff ff ff 48 8b b3 58 fe ff ff 48 85 f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 08 48 2e a0 31 c0 e8 09 7c 3c e1 <0f> 0b 48 8b 73 40 eb ea 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 [ 995.726221] RIP [<ffffffffa028535f>] btrfs_orphan_del+0x14f/0x160 [btrfs] [ 995.726258] RSP <ffff881016ed9d18> [ 995.726574] ---[ end trace 4bde8f513a6d106d ]---
On Fri, May 18, 2012 at 07:24:25PM +0200, Martin Mailand wrote:> Hi Josef, > there was one line before the bug. > > [ 995.725105] couldn''t find orphan item for 524 > >*sigh* ok try this, hopefully it will point me in the right direction. Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..492c74f 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. @@ -156,6 +153,8 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; + unsigned doing_truncate:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8fd7233..aad2600 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..ff3bf4b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + 
atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 61b16c6..572da13 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2072,12 +2072,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if (list_empty(&BTRFS_I(inode)->i_orphan)) { - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + if (!BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 1; #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } if (!BTRFS_I(inode)->orphan_meta_reserved) { @@ -2166,6 +2167,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) if (insert >= 1) { ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode)); if (ret && ret != -EEXIST) { + spin_lock(&root->orphan_lock); + BTRFS_I(inode)->has_orphan_item = 0; + spin_unlock(&root->orphan_lock); btrfs_abort_transaction(trans, root, ret); return ret; } @@ -2195,13 +2199,21 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; + /* + * evict_inode gets called without holding the i_mutex so we need to + * take the orphan lock to make sure we are safe in messing with these. 
+ */ spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); - delete_item = 1; + if (BTRFS_I(inode)->has_orphan_item) { + if (trans) { + BTRFS_I(inode)->has_orphan_item = 0; + delete_item = 1; + } else { + WARN_ON(1); + } } - if (BTRFS_I(inode)->orphan_meta_reserved) { + if (trans && BTRFS_I(inode)->orphan_meta_reserved) { BTRFS_I(inode)->orphan_meta_reserved = 0; release_rsv = 1; } @@ -2209,12 +2221,19 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) if (trans && delete_item) { ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); + if (ret) + printk(KERN_ERR "couldn''t find orphan item for %Lu, nlink %d, root %Lu, root being deleted %s\n", + btrfs_ino(inode), inode->i_nlink, root->objectid, + root->orphan_item_inserted ? "yes" : "no"); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ } if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2341,6 +2360,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) ret = PTR_ERR(trans); goto out; } + printk(KERN_ERR "auto deleting %Lu\n", + found_key.objectid); ret = btrfs_del_orphan_item(trans, root, found_key.objectid); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ @@ -2353,7 +2374,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * the proper thing when we hit it */ spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + atomic_inc(&root->orphan_inodes); + WARN_ON(BTRFS_I(inode)->has_orphan_item); + BTRFS_I(inode)->has_orphan_item = 1; spin_unlock(&root->orphan_lock); /* if we have links, this was a truncate, lets do that */ @@ -3671,7 +3694,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); + BUG_ON(!BTRFS_I(inode)->has_orphan_item); goto 
no_delete; } @@ -6683,9 +6706,13 @@ static int btrfs_truncate(struct inode *inode) u64 mask = root->sectorsize - 1; u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); + spin_lock(&BTRFS_I(inode)->lock); + BUG_ON(BTRFS_I(inode)->doing_truncate); + BTRFS_I(inode)->doing_truncate = 0; + spin_unlock(&BTRFS_I(inode)->lock); ret = btrfs_truncate_page(inode->i_mapping, inode->i_size); if (ret) - return ret; + goto real_out; btrfs_wait_ordered_range(inode, inode->i_size & (~mask), (u64)-1); btrfs_ordered_update_i_size(inode, inode->i_size, NULL); @@ -6727,8 +6754,10 @@ static int btrfs_truncate(struct inode *inode) * updating the inode. */ rsv = btrfs_alloc_block_rsv(root); - if (!rsv) - return -ENOMEM; + if (!rsv) { + ret = -ENOMEM; + goto real_out; + } rsv->size = min_size; /* @@ -6847,7 +6876,10 @@ end_trans: out: btrfs_free_block_rsv(root, rsv); - +real_out: + spin_lock(&BTRFS_I(inode)->lock); + BTRFS_I(inode)->doing_truncate = 0; + spin_unlock(&BTRFS_I(inode)->lock); if (ret && !err) err = ret; @@ -6914,6 +6946,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->dummy_inode = 0; ei->in_defrag = 0; ei->delalloc_meta_reserved = 0; + ei->has_orphan_item = 0; + ei->doing_truncate = 0; ei->force_compress = BTRFS_COMPRESS_NONE; ei->delayed_node = NULL; @@ -6927,7 +6961,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) mutex_init(&ei->log_mutex); mutex_init(&ei->delalloc_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); - INIT_LIST_HEAD(&ei->i_orphan); INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); @@ -6972,13 +7005,11 @@ void btrfs_destroy_inode(struct inode *inode) spin_unlock(&root->fs_info->ordered_extent_lock); } - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { + if (BTRFS_I(inode)->has_orphan_item) { printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n", (unsigned long long)btrfs_ino(inode)); - 
list_del_init(&BTRFS_I(inode)->i_orphan); + atomic_dec(&root->orphan_inodes); } - spin_unlock(&root->orphan_lock); while (1) { ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
Hi Josef,

now I get

[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, root being deleted no

-martin

On 18.05.2012 21:01, Josef Bacik wrote:
> *sigh* ok try this, hopefully it will point me in the right direction. Thanks,

[ 126.389847] Btrfs loaded
[ 126.390284] device fsid 0c9d8c6d-2982-4604-b32a-fc443c4e2c50 devid 1 transid 4 /dev/sdc
[ 126.391246] btrfs: setting nodatacow
[ 126.391252] btrfs: enabling auto defrag
[ 126.391254] btrfs: disk space caching is enabled
[ 126.391257] btrfs flagging fs with big metadata feature
[ 126.405700] device fsid e8a0dc27-8714-49bd-a14f-ac37525febb1 devid 1 transid 4 /dev/sdd
[ 126.406162] btrfs: setting nodatacow
[ 126.406167] btrfs: enabling auto defrag
[ 126.406170] btrfs: disk space caching is enabled
[ 126.406172] btrfs flagging fs with big metadata feature
[ 126.419819] device fsid f67cd977-ebf4-41f2-9821-f2989e985954 devid 1 transid 4 /dev/sde
[ 126.420198] btrfs: setting nodatacow
[ 126.420206] btrfs: enabling auto defrag
[ 126.420210] btrfs: disk space caching is enabled
[ 126.420214] btrfs flagging fs with big metadata feature
[ 127.274555] device fsid 3001355e-c2e2-46c7-9eba-dfecb441d6a6 devid 1 transid 4 /dev/sdf
[ 127.274980] btrfs: setting nodatacow
[ 127.274986] btrfs: enabling auto defrag
[ 127.274989] btrfs: disk space caching is enabled
[ 127.274992] btrfs flagging fs with big metadata feature
[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, root being deleted no
[ 2081.142735] ------------[ cut here ]------------
[ 2081.142750] kernel BUG at fs/btrfs/inode.c:2228!
[ 2081.142766] invalid opcode: 0000 [#1] SMP [ 2081.142786] CPU 10 [ 2081.142794] Modules linked in: btrfs zlib_deflate libcrc32c ext2 bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ioatdma ses enclosure mac_hid lp parport usbhid hid megaraid_sas isci libsas scsi_transport_sas igb ixgbe dca mdio [ 2081.142974] [ 2081.142985] Pid: 2966, comm: ceph-osd Tainted: G C 3.4.0-rc7.2012051802+ #16 Supermicro X9SRi/X9SRi [ 2081.143020] RIP: 0010:[<ffffffffa0269383>] [<ffffffffa0269383>] btrfs_orphan_del+0x173/0x180 [btrfs] [ 2081.143080] RSP: 0018:ffff881016d83d18 EFLAGS: 00010292 [ 2081.143096] RAX: 0000000000000062 RBX: ffff881017ad4770 RCX: 00000000ffffffff [ 2081.143115] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246 [ 2081.143134] RBP: ffff881016d83d58 R08: 0000000000000000 R09: 0000000000000000 [ 2081.143154] R10: 0000000000000000 R11: 0000000000000116 R12: ffff88101e7baf90 [ 2081.143173] R13: ffff88101e7bac00 R14: 0000000000000001 R15: 0000000000000001 [ 2081.143193] FS: 00007fcc1e736700(0000) GS:ffff88107fd40000(0000) knlGS:0000000000000000 [ 2081.143243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2081.143274] CR2: 0000000009269000 CR3: 000000101ba87000 CR4: 00000000000407e0 [ 2081.143308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2081.143341] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2081.143376] Process ceph-osd (pid: 2966, threadinfo ffff881016d82000, task ffff881023c744a0) [ 2081.143424] Stack: [ 2081.143447] 0c00000000000007 ffff88101e1dac30 ffff881016d83d38 ffff88101e1dac30 [ 2081.143510] 0000000000000000 ffff88101e7bac00 ffff881017ad4770 ffff88101f0f7d60 [ 2081.143572] ffff881016d83e08 ffffffffa026d7c8 ffff881017ad4770 0000000000000000 [ 2081.143634] Call Trace: [ 2081.143684] [<ffffffffa026d7c8>] btrfs_truncate+0x5e8/0x6d0 [btrfs] [ 2081.143737] [<ffffffffa026f141>] btrfs_setattr+0xc1/0x1b0 [btrfs] [ 
2081.143773] [<ffffffff811955c3>] notify_change+0x183/0x320 [ 2081.143807] [<ffffffff8117889e>] do_truncate+0x5e/0xa0 [ 2081.143839] [<ffffffff81178a24>] sys_truncate+0x144/0x1b0 [ 2081.143873] [<ffffffff8165fd29>] system_call_fastpath+0x16/0x1b [ 2081.143903] Code: a0 49 8b 8d f0 02 00 00 8b 53 48 4c 0f 44 c0 48 85 f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 10 88 2c a0 31 c0 e8 e5 3b 3e e1 <0f> 0b 48 8b 73 40 eb ea 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 [ 2081.144199] RIP [<ffffffffa0269383>] btrfs_orphan_del+0x173/0x180 [btrfs] [ 2081.144258] RSP <ffff881016d83d18> [ 2081.144614] ---[ end trace 8d0829d100639242 ]---
Hi Josef,

On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote:
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 9b9b15f..492c74f 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -57,9 +57,6 @@ struct btrfs_inode {
>  	/* used to order data wrt metadata */
>  	struct btrfs_ordered_inode_tree ordered_tree;
>
> -	/* for keeping track of orphaned inodes */
> -	struct list_head i_orphan;
> -
>  	/* list of all the delalloc inodes in the FS. There are times we need
>  	 * to write all the delalloc pages to disk, and this list is used
>  	 * to walk them all.
> @@ -156,6 +153,8 @@ struct btrfs_inode {
>  	unsigned dummy_inode:1;
>  	unsigned in_defrag:1;
>  	unsigned delalloc_meta_reserved:1;
> +	unsigned has_orphan_item:1;
> +	unsigned doing_truncate:1;

I think the problem is we should not use the different lock to protect the bit fields which
are stored in the same machine word. Or some bit fields may be covered by the others when
someone change those fields. Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item
as a integer?

Thanks Miao> > /* > * always compress this one file > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 8fd7233..aad2600 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1375,7 +1375,7 @@ struct btrfs_root { > struct list_head root_list; > > spinlock_t orphan_lock; > - struct list_head orphan_list; > + atomic_t orphan_inodes; > struct btrfs_block_rsv *orphan_block_rsv; > int orphan_item_inserted; > int orphan_cleanup_state; > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index a7ffc88..ff3bf4b 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, > root->orphan_block_rsv = NULL; > > INIT_LIST_HEAD(&root->dirty_list); > - INIT_LIST_HEAD(&root->orphan_list); > INIT_LIST_HEAD(&root->root_list); > spin_lock_init(&root->orphan_lock); > spin_lock_init(&root->inode_lock); > @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, > atomic_set(&root->log_commit[0], 0); > atomic_set(&root->log_commit[1], 0); > atomic_set(&root->log_writers, 0); > + atomic_set(&root->orphan_inodes, 0); > root->log_batch = 0; > root->log_transid = 0; > root->last_log_commit = 0; > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 61b16c6..572da13 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -2072,12 +2072,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, > struct btrfs_block_rsv *block_rsv; > int ret; > > - if (!list_empty(&root->orphan_list) || > + if (atomic_read(&root->orphan_inodes) || > root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) > return; > > spin_lock(&root->orphan_lock); > - if (!list_empty(&root->orphan_list)) { > + if (atomic_read(&root->orphan_inodes)) { > spin_unlock(&root->orphan_lock); > return; > } > @@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) > block_rsv = NULL; > } > > - if 
(list_empty(&BTRFS_I(inode)->i_orphan)) { > - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); > + if (!BTRFS_I(inode)->has_orphan_item) { > + BTRFS_I(inode)->has_orphan_item = 1; > #if 0 > /* > * For proper ENOSPC handling, we should do orphan > @@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) > insert = 1; > #endif > insert = 1; > + atomic_inc(&root->orphan_inodes); > } > > if (!BTRFS_I(inode)->orphan_meta_reserved) { > @@ -2166,6 +2167,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) > if (insert >= 1) { > ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode)); > if (ret && ret != -EEXIST) { > + spin_lock(&root->orphan_lock); > + BTRFS_I(inode)->has_orphan_item = 0; > + spin_unlock(&root->orphan_lock); > btrfs_abort_transaction(trans, root, ret); > return ret; > } > @@ -2195,13 +2199,21 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) > int release_rsv = 0; > int ret = 0; > > + /* > + * evict_inode gets called without holding the i_mutex so we need to > + * take the orphan lock to make sure we are safe in messing with these. 
> + */ > spin_lock(&root->orphan_lock); > - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { > - list_del_init(&BTRFS_I(inode)->i_orphan); > - delete_item = 1; > + if (BTRFS_I(inode)->has_orphan_item) { > + if (trans) { > + BTRFS_I(inode)->has_orphan_item = 0; > + delete_item = 1; > + } else { > + WARN_ON(1); > + } > } > > - if (BTRFS_I(inode)->orphan_meta_reserved) { > + if (trans && BTRFS_I(inode)->orphan_meta_reserved) { > BTRFS_I(inode)->orphan_meta_reserved = 0; > release_rsv = 1; > } > @@ -2209,12 +2221,19 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) > > if (trans && delete_item) { > ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); > + if (ret) > + printk(KERN_ERR "couldn''t find orphan item for %Lu, nlink %d, root %Lu, root being deleted %s\n", > + btrfs_ino(inode), inode->i_nlink, root->objectid, > + root->orphan_item_inserted ? "yes" : "no"); > BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ > } > > if (release_rsv) > btrfs_orphan_release_metadata(inode); > > + if (trans && delete_item) > + atomic_dec(&root->orphan_inodes); > + > return 0; > } > > @@ -2341,6 +2360,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) > ret = PTR_ERR(trans); > goto out; > } > + printk(KERN_ERR "auto deleting %Lu\n", > + found_key.objectid); > ret = btrfs_del_orphan_item(trans, root, > found_key.objectid); > BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ > @@ -2353,7 +2374,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) > * the proper thing when we hit it > */ > spin_lock(&root->orphan_lock); > - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); > + atomic_inc(&root->orphan_inodes); > + WARN_ON(BTRFS_I(inode)->has_orphan_item); > + BTRFS_I(inode)->has_orphan_item = 1; > spin_unlock(&root->orphan_lock); > > /* if we have links, this was a truncate, lets do that */ > @@ -3671,7 +3694,7 @@ void btrfs_evict_inode(struct inode *inode) > btrfs_wait_ordered_range(inode, 0, (u64)-1); > > if 
(root->fs_info->log_root_recovering) { > - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); > + BUG_ON(!BTRFS_I(inode)->has_orphan_item); > goto no_delete; > } > > @@ -6683,9 +6706,13 @@ static int btrfs_truncate(struct inode *inode) > u64 mask = root->sectorsize - 1; > u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); > > + spin_lock(&BTRFS_I(inode)->lock); > + BUG_ON(BTRFS_I(inode)->doing_truncate); > + BTRFS_I(inode)->doing_truncate = 0; > + spin_unlock(&BTRFS_I(inode)->lock); > ret = btrfs_truncate_page(inode->i_mapping, inode->i_size); > if (ret) > - return ret; > + goto real_out; > > btrfs_wait_ordered_range(inode, inode->i_size & (~mask), (u64)-1); > btrfs_ordered_update_i_size(inode, inode->i_size, NULL); > @@ -6727,8 +6754,10 @@ static int btrfs_truncate(struct inode *inode) > * updating the inode. > */ > rsv = btrfs_alloc_block_rsv(root); > - if (!rsv) > - return -ENOMEM; > + if (!rsv) { > + ret = -ENOMEM; > + goto real_out; > + } > rsv->size = min_size; > > /* > @@ -6847,7 +6876,10 @@ end_trans: > > out: > btrfs_free_block_rsv(root, rsv); > - > +real_out: > + spin_lock(&BTRFS_I(inode)->lock); > + BTRFS_I(inode)->doing_truncate = 0; > + spin_unlock(&BTRFS_I(inode)->lock); > if (ret && !err) > err = ret; > > @@ -6914,6 +6946,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) > ei->dummy_inode = 0; > ei->in_defrag = 0; > ei->delalloc_meta_reserved = 0; > + ei->has_orphan_item = 0; > + ei->doing_truncate = 0; > ei->force_compress = BTRFS_COMPRESS_NONE; > > ei->delayed_node = NULL; > @@ -6927,7 +6961,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) > mutex_init(&ei->log_mutex); > mutex_init(&ei->delalloc_mutex); > btrfs_ordered_inode_tree_init(&ei->ordered_tree); > - INIT_LIST_HEAD(&ei->i_orphan); > INIT_LIST_HEAD(&ei->delalloc_inodes); > INIT_LIST_HEAD(&ei->ordered_operations); > RB_CLEAR_NODE(&ei->rb_node); > @@ -6972,13 +7005,11 @@ void btrfs_destroy_inode(struct inode *inode) > spin_unlock(&root->fs_info->ordered_extent_lock); > 
} > > - spin_lock(&root->orphan_lock); > - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { > + if (BTRFS_I(inode)->has_orphan_item) { > printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n", > (unsigned long long)btrfs_ino(inode)); > - list_del_init(&BTRFS_I(inode)->i_orphan); > + atomic_dec(&root->orphan_inodes); > } > - spin_unlock(&root->orphan_lock); > > while (1) { > ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
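
Miao's point about bit fields can be made concrete with a small user-space sketch. Adjacent bit-fields share one storage unit, so setting a single flag is typically compiled as a read-modify-write of the containing word; when two code paths update different flags under different locks, one store can overwrite the other. The code below is not btrfs code: struct inode_flags and replay_lost_update() are hypothetical names chosen for illustration, and the race is replayed by hand (a stale copy of the word is written back) instead of with real threads, so the outcome is deterministic.

```c
#include <string.h>

/* Hypothetical stand-in for the flag block in struct btrfs_inode. */
struct inode_flags {
	unsigned orphan_meta_reserved:1;
	unsigned dummy_inode:1;
	unsigned in_defrag:1;
	unsigned delalloc_meta_reserved:1;
	unsigned has_orphan_item:1;
};

/*
 * Replay one bad interleaving of two writers that hold different locks.
 * On return, *shared holds the state both "CPUs" left behind.
 */
void replay_lost_update(struct inode_flags *shared)
{
	struct inode_flags cpu_a_copy;

	memset(shared, 0, sizeof(*shared));

	/* CPU A (under lock X) loads the word that holds all the flags. */
	memcpy(&cpu_a_copy, shared, sizeof(cpu_a_copy));

	/* CPU B (under lock Y) sets its own flag in the shared word. */
	shared->has_orphan_item = 1;

	/* CPU A sets its flag in the stale copy and stores the whole word. */
	cpu_a_copy.orphan_meta_reserved = 1;
	memcpy(shared, &cpu_a_copy, sizeof(*shared));
}
```

After the call, orphan_meta_reserved is 1 and has_orphan_item is 0, although both writers completed their update without error. Neither lock helps, because neither one covers the whole word.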
2012/5/21 Miao Xie <miaox@cn.fujitsu.com>:> Hi Josef, > > On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote: >> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h >> index 9b9b15f..492c74f 100644 >> --- a/fs/btrfs/btrfs_inode.h >> +++ b/fs/btrfs/btrfs_inode.h >> @@ -57,9 +57,6 @@ struct btrfs_inode { >> /* used to order data wrt metadata */ >> struct btrfs_ordered_inode_tree ordered_tree; >> >> - /* for keeping track of orphaned inodes */ >> - struct list_head i_orphan; >> - >> /* list of all the delalloc inodes in the FS. There are times we need >> * to write all the delalloc pages to disk, and this list is used >> * to walk them all. >> @@ -156,6 +153,8 @@ struct btrfs_inode { >> unsigned dummy_inode:1; >> unsigned in_defrag:1; >> unsigned delalloc_meta_reserved:1; >> + unsigned has_orphan_item:1; >> + unsigned doing_truncate:1; > > I think the problem is we should not use the different lock to protect the bit fields which > are stored in the same machine word. Or some bit fields may be covered by the others when > someone change those fields. 
Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item > as a integer?I have tried changing it to: struct btrfs_inode { unsigned orphan_meta_reserved:1; unsigned dummy_inode:1; unsigned in_defrag:1; - unsigned delalloc_meta_reserved:1; + int delalloc_meta_reserved; + int has_orphan_item; + int doing_truncate; The strange thing is, that I''m no longer hitting the BUG_ON, but the old WARNING (no additional messages): [351021.157124] ------------[ cut here ]------------ [351021.162400] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf7/0x100 [btrfs]() [351021.171812] Hardware name: ProLiant DL180 G6 [351021.176867] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs] [351021.200236] Pid: 9837, comm: btrfs-transacti Tainted: P W O 3.3.5-1.fits.1.el6.x86_64 #1 [351021.210126] Call Trace: [351021.212957] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0 [351021.219758] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20 [351021.226385] [<ffffffffa03eb627>] btrfs_orphan_commit_root+0xf7/0x100 [btrfs] [351021.234461] [<ffffffffa03e6976>] commit_fs_roots+0xc6/0x1c0 [btrfs] [351021.241669] [<ffffffffa0438c61>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs] [351021.249841] [<ffffffffa03e7ae4>] btrfs_commit_transaction+0x584/0xa50 [btrfs] [351021.258006] [<ffffffffa03e8432>] ? start_transaction+0x92/0x310 [btrfs] [351021.265580] [<ffffffff81070aa0>] ? wake_up_bit+0x40/0x40 [351021.271719] [<ffffffffa03e2f3b>] transaction_kthread+0x26b/0x2e0 [btrfs] [351021.279405] [<ffffffffa03e2cd0>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs] [351021.288934] [<ffffffffa03e2cd0>] ? 
btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs] [351021.298449] [<ffffffff8107040e>] kthread+0x9e/0xb0 [351021.303989] [<ffffffff8158c5a4>] kernel_thread_helper+0x4/0x10 [351021.310691] [<ffffffff81070370>] ? kthread_freezable_should_stop+0x70/0x70 [351021.318555] [<ffffffff8158c5a0>] ? gs_change+0x13/0x13 [351021.324479] ---[ end trace 9adc7b36a3e66833 ]--- [351710.339482] ------------[ cut here ]------------ [351710.344754] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf7/0x100 [btrfs]() [351710.354165] Hardware name: ProLiant DL180 G6 [351710.359222] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs] [351710.382569] Pid: 9797, comm: kworker/5:0 Tainted: P W O 3.3.5-1.fits.1.el6.x86_64 #1 [351710.392075] Call Trace: [351710.394901] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0 [351710.401750] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20 [351710.408414] [<ffffffffa03eb627>] btrfs_orphan_commit_root+0xf7/0x100 [btrfs] [351710.416528] [<ffffffffa03e6976>] commit_fs_roots+0xc6/0x1c0 [btrfs] [351710.423775] [<ffffffffa03e7ae4>] btrfs_commit_transaction+0x584/0xa50 [btrfs] [351710.431983] [<ffffffff810127a3>] ? __switch_to+0x153/0x440 [351710.438352] [<ffffffff81070aa0>] ? wake_up_bit+0x40/0x40 [351710.444529] [<ffffffffa03e7fb0>] ? btrfs_commit_transaction+0xa50/0xa50 [btrfs] [351710.452894] [<ffffffffa03e7fcf>] do_async_commit+0x1f/0x30 [btrfs] [351710.459979] [<ffffffff81068959>] process_one_work+0x129/0x450 [351710.466576] [<ffffffff8106b7fb>] worker_thread+0x17b/0x3c0 [351710.472884] [<ffffffff8106b680>] ? manage_workers+0x220/0x220 [351710.479472] [<ffffffff8107040e>] kthread+0x9e/0xb0 [351710.485029] [<ffffffff8158c5a4>] kernel_thread_helper+0x4/0x10 [351710.491731] [<ffffffff81070370>] ? 
kthread_freezable_should_stop+0x70/0x70 [351710.499640] [<ffffffff8158c5a0>] ? gs_change+0x13/0x13 [351710.505590] ---[ end trace 9adc7b36a3e66834 ]--- Regards, Christian
On Mon, May 21, 2012 at 11:59:54AM +0800, Miao Xie wrote:
> Hi Josef,
>
> On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote:
> > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> > index 9b9b15f..492c74f 100644
> > --- a/fs/btrfs/btrfs_inode.h
> > +++ b/fs/btrfs/btrfs_inode.h
> > @@ -57,9 +57,6 @@ struct btrfs_inode {
> >  	/* used to order data wrt metadata */
> >  	struct btrfs_ordered_inode_tree ordered_tree;
> >
> > -	/* for keeping track of orphaned inodes */
> > -	struct list_head i_orphan;
> > -
> >  	/* list of all the delalloc inodes in the FS. There are times we need
> >  	 * to write all the delalloc pages to disk, and this list is used
> >  	 * to walk them all.
> > @@ -156,6 +153,8 @@ struct btrfs_inode {
> >  	unsigned dummy_inode:1;
> >  	unsigned in_defrag:1;
> >  	unsigned delalloc_meta_reserved:1;
> > +	unsigned has_orphan_item:1;
> > +	unsigned doing_truncate:1;
>
> I think the problem is we should not use the different lock to protect the bit fields which
> are stored in the same machine word. Or some bit fields may be covered by the others when
> someone change those fields. Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item
> as a integer?
>

Oh freaking duh, thank you Miao, I'm an idiot.

Josef
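
The change Miao suggests can be sketched in user space. In the C11 memory model, distinct non-bit-field members are separate memory locations, so a store to one int does not rewrite its neighbour, whereas adjacent bit-fields count as a single location. The layout below is an assumption for illustration only (struct inode_flags_split and replay_split() are hypothetical names, not the btrfs definitions). It replays, by hand and without threads, an interleaving in which CPU B sets has_orphan_item between CPU A picking up its own field and storing to it.

```c
#include <string.h>

/* Hypothetical layout: flags written under different locks get their own int. */
struct inode_flags_split {
	unsigned dummy_inode:1;      /* remaining bit-fields still share a word */
	unsigned in_defrag:1;
	int orphan_meta_reserved;    /* written under lock X */
	int has_orphan_item;         /* written under lock Y */
};

/*
 * Replay the interleaving with per-field storage. On return, *shared
 * holds the state both "CPUs" left behind.
 */
void replay_split(struct inode_flags_split *shared)
{
	int *cpu_a_field;

	memset(shared, 0, sizeof(*shared));

	/* CPU A (under lock X) only needs the address of its own field. */
	cpu_a_field = &shared->orphan_meta_reserved;

	/* CPU B (under lock Y) sets its flag in between. */
	shared->has_orphan_item = 1;

	/* CPU A stores just its own int; the neighbour is not touched. */
	*cpu_a_field = 1;
}
```

Both flags end up set. The cost is one int per flag in every inode, and any flag left in the bit-field still shares a word with its neighbours, so this only helps if every flag that is written under a different lock is moved out.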
On Tue, May 22, 2012 at 12:29:59PM +0200, Christian Brunner wrote:
> 2012/5/21 Miao Xie <miaox@cn.fujitsu.com>:
> > Hi Josef,
> >
> > On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote:
> >> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> >> index 9b9b15f..492c74f 100644
> >> --- a/fs/btrfs/btrfs_inode.h
> >> +++ b/fs/btrfs/btrfs_inode.h
> >> @@ -57,9 +57,6 @@ struct btrfs_inode {
> >>          /* used to order data wrt metadata */
> >>          struct btrfs_ordered_inode_tree ordered_tree;
> >>
> >> -        /* for keeping track of orphaned inodes */
> >> -        struct list_head i_orphan;
> >> -
> >>          /* list of all the delalloc inodes in the FS.  There are times we need
> >>           * to write all the delalloc pages to disk, and this list is used
> >>           * to walk them all.
> >> @@ -156,6 +153,8 @@ struct btrfs_inode {
> >>          unsigned dummy_inode:1;
> >>          unsigned in_defrag:1;
> >>          unsigned delalloc_meta_reserved:1;
> >> +        unsigned has_orphan_item:1;
> >> +        unsigned doing_truncate:1;
> >
> > I think the problem is we should not use the different lock to protect the bit fields which
> > are stored in the same machine word. Or some bit fields may be covered by the others when
> > someone change those fields. Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item
> > as a integer?
>
> I have tried changing it to:
>
> struct btrfs_inode {
>          unsigned orphan_meta_reserved:1;
>          unsigned dummy_inode:1;
>          unsigned in_defrag:1;
> -        unsigned delalloc_meta_reserved:1;
> +        int delalloc_meta_reserved;
> +        int has_orphan_item;
> +        int doing_truncate;
>
> The strange thing is, that I'm no longer hitting the BUG_ON, but the
> old WARNING (no additional messages):
>

Yeah you would also need to change orphan_meta_reserved.  I fixed this by just
taking the BTRFS_I(inode)->lock when messing with these since we don't want to
take up all that space in the inode just for a marker.  I ran this patch for 3
hours with no issues, let me know if it works for you.
Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 3771b85..559e716 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. @@ -153,6 +150,7 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ba8743b..72cdf98 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 19f5b45..25dba7a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 54ae3df..54f1b30 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2104,12 +2104,12 @@ void 
btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2166,8 +2166,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if (list_empty(&BTRFS_I(inode)->i_orphan)) { - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + spin_lock(&BTRFS_I(inode)->lock); + if (!BTRFS_I(inode)->has_orphan_item) { + BTRFS_I(inode)->has_orphan_item = 1; #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2180,12 +2181,14 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } if (!BTRFS_I(inode)->orphan_meta_reserved) { BTRFS_I(inode)->orphan_meta_reserved = 1; reserve = 1; } + spin_unlock(&BTRFS_I(inode)->lock); spin_unlock(&root->orphan_lock); /* grab metadata reservation from transaction handle */ @@ -2198,6 +2201,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) if (insert >= 1) { ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode)); if (ret && ret != -EEXIST) { + spin_lock(&BTRFS_I(inode)->lock); + BTRFS_I(inode)->has_orphan_item = 0; + spin_unlock(&BTRFS_I(inode)->lock); btrfs_abort_transaction(trans, root, ret); return ret; } @@ -2227,26 +2233,41 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); - delete_item = 1; + /* + * evict_inode gets called without holding the i_mutex so we need to + * take the orphan lock to make sure we are safe in messing with 
these. + */ + spin_lock(&BTRFS_I(inode)->lock); + if (BTRFS_I(inode)->has_orphan_item) { + if (trans) { + BTRFS_I(inode)->has_orphan_item = 0; + delete_item = 1; + } else { + WARN_ON(1); + } } - if (BTRFS_I(inode)->orphan_meta_reserved) { + if (trans && BTRFS_I(inode)->orphan_meta_reserved) { BTRFS_I(inode)->orphan_meta_reserved = 0; release_rsv = 1; } - spin_unlock(&root->orphan_lock); + spin_unlock(&BTRFS_I(inode)->lock); if (trans && delete_item) { ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); + if (ret) + printk(KERN_ERR "couldn''t find orphan item for %Lu, nlink %d, root %Lu, root being deleted %s\n", + btrfs_ino(inode), inode->i_nlink, root->objectid, + root->orphan_item_inserted ? "yes" : "no"); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ } if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2373,6 +2394,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) ret = PTR_ERR(trans); goto out; } + printk(KERN_ERR "auto deleting %Lu\n", + found_key.objectid); ret = btrfs_del_orphan_item(trans, root, found_key.objectid); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ @@ -2384,9 +2407,11 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * add this inode to the orphan list so btrfs_orphan_del does * the proper thing when we hit it */ - spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); - spin_unlock(&root->orphan_lock); + spin_lock(&BTRFS_I(inode)->lock); + atomic_inc(&root->orphan_inodes); + WARN_ON(BTRFS_I(inode)->has_orphan_item); + BTRFS_I(inode)->has_orphan_item = 1; + spin_unlock(&BTRFS_I(inode)->lock); /* if we have links, this was a truncate, lets do that */ if (inode->i_nlink) { @@ -3707,7 +3732,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_wait_ordered_range(inode, 0, (u64)-1); if (root->fs_info->log_root_recovering) { - BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan)); + 
BUG_ON(!BTRFS_I(inode)->has_orphan_item); goto no_delete; } @@ -6638,7 +6663,7 @@ static int btrfs_truncate(struct inode *inode) ret = btrfs_truncate_page(inode->i_mapping, inode->i_size); if (ret) - return ret; + goto real_out; btrfs_wait_ordered_range(inode, inode->i_size & (~mask), (u64)-1); btrfs_ordered_update_i_size(inode, inode->i_size, NULL); @@ -6680,8 +6705,10 @@ static int btrfs_truncate(struct inode *inode) * updating the inode. */ rsv = btrfs_alloc_block_rsv(root); - if (!rsv) - return -ENOMEM; + if (!rsv) { + ret = -ENOMEM; + goto real_out; + } rsv->size = min_size; /* @@ -6800,7 +6827,7 @@ end_trans: out: btrfs_free_block_rsv(root, rsv); - +real_out: if (ret && !err) err = ret; @@ -6866,6 +6893,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->dummy_inode = 0; ei->in_defrag = 0; ei->delalloc_meta_reserved = 0; + ei->has_orphan_item = 0; ei->force_compress = BTRFS_COMPRESS_NONE; ei->delayed_node = NULL; @@ -6879,7 +6907,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) mutex_init(&ei->log_mutex); mutex_init(&ei->delalloc_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); - INIT_LIST_HEAD(&ei->i_orphan); INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->ordered_operations); RB_CLEAR_NODE(&ei->rb_node); @@ -6924,13 +6951,11 @@ void btrfs_destroy_inode(struct inode *inode) spin_unlock(&root->fs_info->ordered_extent_lock); } - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { + if (BTRFS_I(inode)->has_orphan_item) { printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n", (unsigned long long)btrfs_ino(inode)); - list_del_init(&BTRFS_I(inode)->i_orphan); + atomic_dec(&root->orphan_inodes); } - spin_unlock(&root->orphan_lock); while (1) { ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1); -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
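The bookkeeping idea in the patch above (a per-inode has_orphan_item flag plus a per-root orphan_inodes counter instead of list membership) can be sketched in userspace roughly as follows. This is an illustration under assumptions, not the kernel code: all `toy_*` names are hypothetical, a C11 `atomic_flag` spin loop stands in for BTRFS_I(inode)->lock, and the on-disk orphan item handling and error paths are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct btrfs_root and struct btrfs_inode. */
struct toy_root {
        atomic_int orphan_inodes;   /* replaces root->orphan_list */
};

struct toy_inode {
        atomic_flag lock;           /* stand-in for BTRFS_I(inode)->lock */
        bool has_orphan_item;       /* replaces the i_orphan list entry */
};

static void toy_lock(struct toy_inode *inode)
{
        while (atomic_flag_test_and_set(&inode->lock))
                ;                   /* spin until the lock is free */
}

static void toy_unlock(struct toy_inode *inode)
{
        atomic_flag_clear(&inode->lock);
}

/* Returns true if the caller would have to insert the orphan item. */
bool toy_orphan_add(struct toy_root *root, struct toy_inode *inode)
{
        bool insert = false;

        toy_lock(inode);
        if (!inode->has_orphan_item) {
                inode->has_orphan_item = true;
                atomic_fetch_add(&root->orphan_inodes, 1);
                insert = true;
        }
        toy_unlock(inode);
        return insert;
}

/* Returns true if the caller would have to delete the orphan item. */
bool toy_orphan_del(struct toy_root *root, struct toy_inode *inode)
{
        bool delete_item = false;

        toy_lock(inode);
        if (inode->has_orphan_item) {
                inode->has_orphan_item = false;
                delete_item = true;
        }
        toy_unlock(inode);

        /* Simplification: the patch decrements only after the item is gone. */
        if (delete_item)
                atomic_fetch_sub(&root->orphan_inodes, 1);
        return delete_item;
}
```

The counter changes only when the flag actually flips, so repeated add or del calls on the same inode leave it balanced. The check in btrfs_orphan_commit_root then becomes a plain atomic_read of the counter.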
2012/5/22 Josef Bacik <josef@redhat.com>:
>
> Yeah you would also need to change orphan_meta_reserved.  I fixed this by just
> taking the BTRFS_I(inode)->lock when messing with these since we don't want to
> take up all that space in the inode just for a marker.  I ran this patch for 3
> hours with no issues, let me know if it works for you.  Thanks,

Compared to the last runs, I had to run it much longer, but somehow I
managed to hit a BUG_ON again:

[448281.002087] couldn't find orphan item for 2027, nlink 1, root 308, root being deleted no
[448281.011339] ------------[ cut here ]------------
[448281.016590] kernel BUG at fs/btrfs/inode.c:2230!
[448281.021837] invalid opcode: 0000 [#1] SMP
[448281.026525] CPU 4
[448281.028670] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs]
[448281.052215]
[448281.053977] Pid: 16018, comm: ceph-osd Tainted: P W O 3.3.5-1.fits.1.el6.x86_64 #1 HP ProLiant DL180 G6
[448281.065555] RIP: 0010:[<ffffffffa04a17ab>] [<ffffffffa04a17ab>] btrfs_orphan_del+0x19b/0x1b0 [btrfs]
[448281.075965] RSP: 0018:ffff880458257d18 EFLAGS: 00010292
[448281.081987] RAX: 0000000000000063 RBX: ffff8803a28ebc48 RCX: 0000000000002fdb
[448281.090042] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
[448281.098093] RBP: ffff880458257d58 R08: ffffffff81af6100 R09: 0000000000000000
[448281.106146] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000001
[448281.114202] R13: ffff88052e130400 R14: 0000000000000001 R15: ffff8805beae9e10
[448281.122262] FS: 00007fa2e772f700(0000) GS:ffff880627280000(0000) knlGS:0000000000000000
[448281.131386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[448281.137879] CR2: ffffffffff600400 CR3: 00000005015a5000 CR4: 00000000000006e0
[448281.145929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[448281.153974] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[448281.162043] Process ceph-osd (pid: 16018, threadinfo ffff880458256000, task ffff88055b711940)
[448281.171646] Stack:
[448281.173987] ffff880458257dff ffff8803a28eba98 ffff880458257d58 ffff8805beae9e10
[448281.182377] 0000000000000000 ffff88052e130400 ffff88029ff33380 ffff8803a28ebc48
[448281.190766] ffff880458257e08 ffffffffa04ab4e6 0000000000000000 ffff8803a28ebc48
[448281.199155] Call Trace:
[448281.202005] [<ffffffffa04ab4e6>] btrfs_truncate+0x5f6/0x660 [btrfs]
[448281.209203] [<ffffffffa04ab646>] btrfs_setattr+0xf6/0x1a0 [btrfs]
[448281.216202] [<ffffffff811816fb>] notify_change+0x18b/0x2b0
[448281.222517] [<ffffffff81276541>] ? selinux_inode_permission+0xd1/0x130
[448281.229990] [<ffffffff81165f44>] do_truncate+0x64/0xa0
[448281.235919] [<ffffffff81172669>] ? inode_permission+0x49/0x100
[448281.242617] [<ffffffff81166197>] sys_truncate+0x137/0x150
[448281.248838] [<ffffffff8158b1e9>] system_call_fastpath+0x16/0x1b
[448281.255631] Code: a0 49 8b 8d f0 02 00 00 8b 53 48 4c 0f 45 c0 48 85 f6 74 1b 80 bb 60 fe ff ff 84 74 12 48 c7 c7 e8 1d 50 a0 31 c0 e8 9d ea 0d e1 <0f> 0b eb fe 48 8b 73 40 eb e8 66 66 2e 0f 1f 84 00 00 00 00 00
[448281.277435] RIP [<ffffffffa04a17ab>] btrfs_orphan_del+0x19b/0x1b0 [btrfs]
[448281.285229] RSP <ffff880458257d18>
[448281.289667] ---[ end trace 9adc7b36a3e66872 ]---

Sorry,
Christian
On Wed, May 23, 2012 at 02:34:43PM +0200, Christian Brunner wrote:
> 2012/5/22 Josef Bacik <josef@redhat.com>:
> >
> > Yeah you would also need to change orphan_meta_reserved.  I fixed this by just
> > taking the BTRFS_I(inode)->lock when messing with these since we don't want to
> > take up all that space in the inode just for a marker.  I ran this patch for 3
> > hours with no issues, let me know if it works for you.  Thanks,
>
> Compared to the last runs, I had to run it much longer, but somehow I
> managed to hit a BUG_ON again:
>

Yeah it's because we access other parts of that bitfield with no lock at all
which is what is likely screwing us.  I'm going to have to redo that part and
then do the orphan fix, I'll have a patch shortly.  Thanks,

Josef
On Wed, May 23, 2012 at 02:34:43PM +0200, Christian Brunner wrote:> 2012/5/22 Josef Bacik <josef@redhat.com>: > >> > > > > Yeah you would also need to change orphan_meta_reserved. I fixed this by just > > taking the BTRFS_I(inode)->lock when messing with these since we don''t want to > > take up all that space in the inode just for a marker. I ran this patch for 3 > > hours with no issues, let me know if it works for you. Thanks, > > Compared to the last runs, I had to run it much longer, but somehow I > managed to hit a BUG_ON again: >Ok give this a shot, it should do it. Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..41ddec8 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -24,6 +24,22 @@ #include "ordered-data.h" #include "delayed-inode.h" +/* + * ordered_data_close is set by truncate when a file that used + * to have good data has been truncated to zero. When it is set + * the btrfs file release call will add this inode to the + * ordered operations list so that we make sure to flush out any + * new data the application may have written before commit. + */ +#define BTRFS_INODE_ORDERED_DATA_CLOSE 0 +#define BTRFS_INODE_ORPHAN_META_RESERVED 1 +#define BTRFS_INODE_DUMMY 2 +#define BTRFS_INODE_IN_DEFRAG 3 +#define BTRFS_INODE_DELALLOC_META_RESERVED 4 +#define BTRFS_INODE_HAS_ORPHAN_ITEM 5 +#define BTRFS_INODE_FORCE_ZLIB 6 +#define BTRFS_INODE_FORCE_LZO 7 + /* in memory btrfs inode */ struct btrfs_inode { /* which subvolume this inode belongs to */ @@ -57,9 +73,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. 
@@ -143,24 +156,7 @@ struct btrfs_inode { */ unsigned outstanding_extents; unsigned reserved_extents; - - /* - * ordered_data_close is set by truncate when a file that used - * to have good data has been truncated to zero. When it is set - * the btrfs file release call will add this inode to the - * ordered operations list so that we make sure to flush out any - * new data the application may have written before commit. - */ - unsigned ordered_data_close:1; - unsigned orphan_meta_reserved:1; - unsigned dummy_inode:1; - unsigned in_defrag:1; - unsigned delalloc_meta_reserved:1; - - /* - * always compress this one file - */ - unsigned force_compress:4; + unsigned long runtime_flags; struct btrfs_delayed_node *delayed_node; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8fd7233..aad2600 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 03e3748..5190861 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -669,8 +669,8 @@ static int btrfs_delayed_inode_reserve_metadata( return ret; } else if (src_rsv == &root->fs_info->delalloc_block_rsv) { spin_lock(&BTRFS_I(inode)->lock); - if (BTRFS_I(inode)->delalloc_meta_reserved) { - BTRFS_I(inode)->delalloc_meta_reserved = 0; + if (test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED, + &BTRFS_I(inode)->runtime_flags)) { spin_unlock(&BTRFS_I(inode)->lock); release = true; goto migrate; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..0ddeb0d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - 
INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; @@ -2001,7 +2001,8 @@ int open_ctree(struct super_block *sb, BTRFS_I(fs_info->btree_inode)->root = tree_root; memset(&BTRFS_I(fs_info->btree_inode)->location, 0, sizeof(struct btrfs_key)); - BTRFS_I(fs_info->btree_inode)->dummy_inode = 1; + set_bit(BTRFS_INODE_DUMMY, + &BTRFS_I(fs_info->btree_inode)->runtime_flags); insert_inode_hash(fs_info->btree_inode); spin_lock_init(&fs_info->block_group_cache_lock); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 49fd7b6..b372040 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4355,10 +4355,9 @@ static unsigned drop_outstanding_extent(struct inode *inode) BTRFS_I(inode)->outstanding_extents--; if (BTRFS_I(inode)->outstanding_extents == 0 && - BTRFS_I(inode)->delalloc_meta_reserved) { + test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED, + &BTRFS_I(inode)->runtime_flags)) drop_inode_space = 1; - BTRFS_I(inode)->delalloc_meta_reserved = 0; - } /* * If we have more or the same amount of outsanding extents than we have @@ -4465,7 +4464,8 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) * Add an item to reserve for updating the inode when we complete the * delalloc io. 
*/ - if (!BTRFS_I(inode)->delalloc_meta_reserved) { + if (!test_bit(BTRFS_INODE_DELALLOC_META_RESERVED, + &BTRFS_I(inode)->runtime_flags)) { nr_extents++; extra_reserve = 1; } @@ -4511,7 +4511,8 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) spin_lock(&BTRFS_I(inode)->lock); if (extra_reserve) { - BTRFS_I(inode)->delalloc_meta_reserved = 1; + set_bit(BTRFS_INODE_DELALLOC_META_RESERVED, + &BTRFS_I(inode)->runtime_flags); nr_extents--; } BTRFS_I(inode)->reserved_extents += nr_extents; diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 53bf2d7..2f19fe9 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -103,7 +103,7 @@ static void __btrfs_add_inode_defrag(struct inode *inode, goto exists; } } - BTRFS_I(inode)->in_defrag = 1; + set_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags); rb_link_node(&defrag->rb_node, parent, p); rb_insert_color(&defrag->rb_node, &root->fs_info->defrag_inodes); return; @@ -131,7 +131,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans, if (btrfs_fs_closing(root->fs_info)) return 0; - if (BTRFS_I(inode)->in_defrag) + if (test_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags)) return 0; if (trans) @@ -148,7 +148,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans, defrag->root = root->root_key.objectid; spin_lock(&root->fs_info->defrag_inodes_lock); - if (!BTRFS_I(inode)->in_defrag) + if (!test_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags)) __btrfs_add_inode_defrag(inode, defrag); else kfree(defrag); @@ -252,7 +252,7 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info) goto next; /* do a chunk of defrag */ - BTRFS_I(inode)->in_defrag = 0; + clear_bit(BTRFS_INODE_IN_DEFRAG, &BTRFS_I(inode)->runtime_flags); range.start = defrag->last_offset; num_defrag = btrfs_defrag_file(inode, NULL, &range, defrag->transid, defrag_batch); @@ -1466,8 +1466,8 @@ int btrfs_release_file(struct inode *inode, struct file *filp) * flush down new bytes that may have been 
written if the * application were using truncate to replace a file in place. */ - if (BTRFS_I(inode)->ordered_data_close) { - BTRFS_I(inode)->ordered_data_close = 0; + if (test_and_clear_bit(BTRFS_INODE_ORDERED_DATA_CLOSE, + &BTRFS_I(inode)->runtime_flags)) { btrfs_add_ordered_operation(NULL, BTRFS_I(inode)->root, inode); if (inode->i_size > BTRFS_ORDERED_OPERATIONS_FLUSH_LIMIT) filemap_flush(inode->i_mapping); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 61b16c6..1d42dba 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -109,6 +109,15 @@ static int btrfs_init_inode_security(struct btrfs_trans_handle *trans, return err; } +static int btrfs_inode_force_compress(struct inode *inode) +{ + if (test_bit(BTRFS_INODE_FORCE_ZLIB, &BTRFS_I(inode)->runtime_flags)) + return BTRFS_COMPRESS_ZLIB; + if (test_bit(BTRFS_INODE_FORCE_LZO, &BTRFS_I(inode)->runtime_flags)) + return BTRFS_COMPRESS_LZO; + return BTRFS_COMPRESS_NONE; +} + /* * this does all the hard work for inserting an inline extent into * the btree. 
The caller should have done a btrfs_drop_extents so that @@ -396,7 +405,7 @@ again: */ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) && (btrfs_test_opt(root, COMPRESS) || - (BTRFS_I(inode)->force_compress) || + btrfs_inode_force_compress(inode) || (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) { WARN_ON(pages); pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS); @@ -405,8 +414,8 @@ again: goto cont; } - if (BTRFS_I(inode)->force_compress) - compress_type = BTRFS_I(inode)->force_compress; + if (btrfs_inode_force_compress(inode)) + compress_type = btrfs_inode_force_compress(inode); ret = btrfs_compress_pages(compress_type, inode->i_mapping, start, @@ -514,7 +523,7 @@ cont: /* flag the file so we don''t compress in the future */ if (!btrfs_test_opt(root, FORCE_COMPRESS) && - !(BTRFS_I(inode)->force_compress)) { + !btrfs_inode_force_compress(inode)) { BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS; } } @@ -1365,7 +1374,7 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); else if (!btrfs_test_opt(root, COMPRESS) && - !(BTRFS_I(inode)->force_compress) && + !btrfs_inode_force_compress(inode) && !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS)) ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); @@ -2072,12 +2081,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2134,8 +2143,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if (list_empty(&BTRFS_I(inode)->i_orphan)) { - 
list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); + if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, + &BTRFS_I(inode)->runtime_flags)) { #if 0 /* * For proper ENOSPC handling, we should do orphan @@ -2148,12 +2157,12 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) insert = 1; #endif insert = 1; + atomic_inc(&root->orphan_inodes); } - if (!BTRFS_I(inode)->orphan_meta_reserved) { - BTRFS_I(inode)->orphan_meta_reserved = 1; + if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED, + &BTRFS_I(inode)->runtime_flags)) reserve = 1; - } spin_unlock(&root->orphan_lock); /* grab metadata reservation from transaction handle */ @@ -2166,6 +2175,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) if (insert >= 1) { ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode)); if (ret && ret != -EEXIST) { + clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, + &BTRFS_I(inode)->runtime_flags); btrfs_abort_transaction(trans, root, ret); return ret; } @@ -2195,26 +2206,33 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode) int release_rsv = 0; int ret = 0; - spin_lock(&root->orphan_lock); - if (!list_empty(&BTRFS_I(inode)->i_orphan)) { - list_del_init(&BTRFS_I(inode)->i_orphan); + /* + * evict_inode gets called without holding the i_mutex so we need to + * take the orphan lock to make sure we are safe in messing with these. 
+ */ + if (trans && test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, + &BTRFS_I(inode)->runtime_flags)) delete_item = 1; - } - if (BTRFS_I(inode)->orphan_meta_reserved) { - BTRFS_I(inode)->orphan_meta_reserved = 0; + if (trans && test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED, + &BTRFS_I(inode)->runtime_flags)) release_rsv = 1; - } - spin_unlock(&root->orphan_lock); if (trans && delete_item) { ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode)); + if (ret) + printk(KERN_ERR "couldn''t find orphan item for %Lu, nlink %d, root %Lu, root being deleted %s\n", + btrfs_ino(inode), inode->i_nlink, root->objectid, + root->orphan_item_inserted ? "yes" : "no"); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ } if (release_rsv) btrfs_orphan_release_metadata(inode); + if (trans && delete_item) + atomic_dec(&root->orphan_inodes); + return 0; } @@ -2341,6 +2359,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) ret = PTR_ERR(trans); goto out; } + printk(KERN_ERR "auto deleting %Lu\n", + found_key.objectid); ret = btrfs_del_orphan_item(trans, root, found_key.objectid); BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */ @@ -2352,9 +2372,9 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) * add this inode to the orphan list so btrfs_orphan_del does * the proper thing when we hit it */ - spin_lock(&root->orphan_lock); - list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list); - spin_unlock(&root->orphan_lock); + set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, + &BTRFS_I(inode)->runtime_flags); + atomic_inc(&root->orphan_inodes); /* if we have links, this was a truncate, lets do that */ if (inode->i_nlink) { @@ -3607,7 +3627,8 @@ static int btrfs_setsize(struct inode *inode, loff_t newsize) * any new writes get down to disk quickly. 
 		 */
 		if (newsize == 0)
-			BTRFS_I(inode)->ordered_data_close = 1;
+			set_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
+				&BTRFS_I(inode)->runtime_flags);
 
 		/* we don't support swapfiles, so vmtruncate shouldn't fail */
 		truncate_setsize(inode, newsize);
@@ -3671,7 +3692,8 @@ void btrfs_evict_inode(struct inode *inode)
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
 	if (root->fs_info->log_root_recovering) {
-		BUG_ON(!list_empty(&BTRFS_I(inode)->i_orphan));
+		BUG_ON(!test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
+				 &BTRFS_I(inode)->runtime_flags));
 		goto no_delete;
 	}
 
@@ -4066,7 +4088,7 @@ static struct inode *new_simple_dir(struct super_block *s,
 
 	BTRFS_I(inode)->root = root;
 	memcpy(&BTRFS_I(inode)->location, key, sizeof(*key));
-	BTRFS_I(inode)->dummy_inode = 1;
+	set_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags);
 
 	inode->i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID;
 	inode->i_op = &btrfs_dir_ro_inode_operations;
@@ -4370,7 +4392,7 @@ int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 	int ret = 0;
 	bool nolock = false;
 
-	if (BTRFS_I(inode)->dummy_inode)
+	if (test_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags))
 		return 0;
 
 	if (btrfs_fs_closing(root->fs_info) && btrfs_is_free_space_inode(root, inode))
@@ -4403,7 +4425,7 @@ int btrfs_dirty_inode(struct inode *inode)
 	struct btrfs_trans_handle *trans;
 	int ret;
 
-	if (BTRFS_I(inode)->dummy_inode)
+	if (test_bit(BTRFS_INODE_DUMMY, &BTRFS_I(inode)->runtime_flags))
 		return 0;
 
 	trans = btrfs_join_transaction(root);
@@ -6685,7 +6707,7 @@ static int btrfs_truncate(struct inode *inode)
 
 	ret = btrfs_truncate_page(inode->i_mapping, inode->i_size);
 	if (ret)
-		return ret;
+		goto real_out;
 
 	btrfs_wait_ordered_range(inode, inode->i_size & (~mask), (u64)-1);
 	btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
@@ -6727,8 +6749,10 @@ static int btrfs_truncate(struct inode *inode)
 	 * updating the inode.
 	 */
 	rsv = btrfs_alloc_block_rsv(root);
-	if (!rsv)
-		return -ENOMEM;
+	if (!rsv) {
+		ret = -ENOMEM;
+		goto real_out;
+	}
 	rsv->size = min_size;
 
 	/*
@@ -6771,7 +6795,8 @@ static int btrfs_truncate(struct inode *inode)
 	 * using truncate to replace the contents of the file will
 	 * end up with a zero length file after a crash.
 	 */
-	if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
+	if (inode->i_size == 0 && test_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
+					   &BTRFS_I(inode)->runtime_flags))
 		btrfs_add_ordered_operation(trans, root, inode);
 
 	while (1) {
@@ -6847,7 +6872,7 @@ end_trans:
 
 out:
 	btrfs_free_block_rsv(root, rsv);
-
+real_out:
 	if (ret && !err)
 		err = ret;
 
@@ -6909,12 +6934,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->outstanding_extents = 0;
 	ei->reserved_extents = 0;
 
-	ei->ordered_data_close = 0;
-	ei->orphan_meta_reserved = 0;
-	ei->dummy_inode = 0;
-	ei->in_defrag = 0;
-	ei->delalloc_meta_reserved = 0;
-	ei->force_compress = BTRFS_COMPRESS_NONE;
+	ei->runtime_flags = 0;
 
 	ei->delayed_node = NULL;
 
@@ -6927,7 +6947,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	mutex_init(&ei->log_mutex);
 	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
-	INIT_LIST_HEAD(&ei->i_orphan);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->ordered_operations);
 	RB_CLEAR_NODE(&ei->rb_node);
@@ -6972,13 +6991,12 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_extent_lock);
 	}
 
-	spin_lock(&root->orphan_lock);
-	if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
+	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
+		     &BTRFS_I(inode)->runtime_flags)) {
 		printk(KERN_INFO "BTRFS: inode %llu still on the orphan list\n",
 		       (unsigned long long)btrfs_ino(inode));
-		list_del_init(&BTRFS_I(inode)->i_orphan);
+		atomic_dec(&root->orphan_inodes);
 	}
-	spin_unlock(&root->orphan_lock);
 
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14f8e1f..a901654 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1039,6 +1039,21 @@ out:
 
 }
 
+static void btrfs_set_inode_force_compress(struct inode *inode,
+					   int compress_type)
+{
+	if (compress_type == BTRFS_COMPRESS_ZLIB) {
+		set_bit(BTRFS_INODE_FORCE_ZLIB, &BTRFS_I(inode)->runtime_flags);
+	} else if (compress_type == BTRFS_COMPRESS_LZO) {
+		set_bit(BTRFS_INODE_FORCE_LZO, &BTRFS_I(inode)->runtime_flags);
+	} else if (compress_type == BTRFS_COMPRESS_NONE) {
+		clear_bit(BTRFS_INODE_FORCE_ZLIB,
+			  &BTRFS_I(inode)->runtime_flags);
+		clear_bit(BTRFS_INODE_FORCE_LZO,
+			  &BTRFS_I(inode)->runtime_flags);
+	}
+}
+
 int btrfs_defrag_file(struct inode *inode, struct file *file,
 		      struct btrfs_ioctl_defrag_range_args *range,
 		      u64 newer_than, unsigned long max_to_defrag)
@@ -1162,7 +1177,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		}
 
 		if (range->flags & BTRFS_DEFRAG_RANGE_COMPRESS)
-			BTRFS_I(inode)->force_compress = compress_type;
+			btrfs_set_inode_force_compress(inode, compress_type);
 
 		if (i + cluster > ra_index) {
 			ra_index = max(i, ra_index);
@@ -1230,7 +1245,7 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		atomic_dec(&root->fs_info->async_submit_draining);
 
 		mutex_lock(&inode->i_mutex);
-		BTRFS_I(inode)->force_compress = BTRFS_COMPRESS_NONE;
+		btrfs_set_inode_force_compress(inode, BTRFS_COMPRESS_NONE);
 		mutex_unlock(&inode->i_mutex);
 	}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Josef,

this patch has been running for 3 hours without a BUG and without the warning.
I will let it run overnight and report tomorrow.
It looks very good ;-)

-martin

On 23.05.2012 17:02, Josef Bacik wrote:
> Ok give this a shot, it should do it. Thanks,
Hi,

the ceph cluster has been running under heavy load for the last 13 hours without a problem; dmesg is empty and the performance is good.

-martin

On 23.05.2012 21:12, Martin Mailand wrote:
> this patch has been running for 3 hours without a BUG and without the warning.
> I will let it run overnight and report tomorrow.
> It looks very good ;-)
Same thing here.

I've tried really hard, but even after 12 hours I wasn't able to get a single warning from btrfs.

I think you cracked it!

Thanks,
Christian

2012/5/24 Martin Mailand <martin@tuxadero.com>:
> Hi,
> the ceph cluster has been running under heavy load for the last 13 hours
> without a problem; dmesg is empty and the performance is good.
>
> -martin
>
> On 23.05.2012 21:12, Martin Mailand wrote:
>
>> this patch has been running for 3 hours without a BUG and without the warning.
>> I will let it run overnight and report tomorrow.
>> It looks very good ;-)