As I type this, I have an ssh process running that''s dumping data into a fifo at high speed (maybe 500Mbps) and a tar process that''s untarring from the same fifo onto btrfs. The btrfs fs is mounted -o space_cache,compress. This machine has 8GB ram, 8 logical cores, and a fast (i7-2600) CPU, so it''s not an issue with the machine struggling under load. Every few tens of seconds, my system stalls for several seconds. These stalls cause keyboard input to be lost, firefox to hang, etc. Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes no difference. max_sectors_kb = 64 in addition to the above doesn''t help either. latencytop shows regular instances of 2-7 *second* latency, variously in sync_page, start_transaction, btrfs_start_ordered_extent, and do_get_write_access (from jbd2 on my ext4 root partition). echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when 4-5 GB were still free (so it shouldn''t be a problem with important pages being evicted). In case it matters, all of my partitions are on LVM on dm-crypt, but this machine has AES-NI so the overhead from that should be minimal. In fact, overall CPU usage is only about 10%. What gives? I thought this stuff was supposed to be better on modern kernels. --Andy
Excerpts from Andrew Lutomirski''s message of 2011-02-11 10:08:52 -0500:> As I type this, I have an ssh process running that''s dumping data into > a fifo at high speed (maybe 500Mbps) and a tar process that''s > untarring from the same fifo onto btrfs. The btrfs fs is mounted -o > space_cache,compress. This machine has 8GB ram, 8 logical cores, and > a fast (i7-2600) CPU, so it''s not an issue with the machine struggling > under load. > > Every few tens of seconds, my system stalls for several seconds. > These stalls cause keyboard input to be lost, firefox to hang, etc. > > Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. > > ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes > no difference. > > max_sectors_kb = 64 in addition to the above doesn''t help either. > > latencytop shows regular instances of 2-7 *second* latency, variously > in sync_page, start_transaction, btrfs_start_ordered_extent, and > do_get_write_access (from jbd2 on my ext4 root partition). > > echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when > 4-5 GB were still free (so it shouldn''t be a problem with important > pages being evicted). > > In case it matters, all of my partitions are on LVM on dm-crypt, but > this machine has AES-NI so the overhead from that should be minimal. > In fact, overall CPU usage is only about 10%. > > What gives? I thought this stuff was supposed to be better on modern kernels.We can tell more if you post the full traces from latencytop. I have a patch here for latencytop that adds a -c mode, which dumps the traces out to a text files. http://oss.oracle.com/~mason/latencytop.patch Based on what you have here, I think it''s probably a latency problem between btrfs and the dm-crypt stuff. How easily can setup a test partition without dm-crypt? -chris
On Fri, Feb 11, 2011 at 3:08 PM, Andrew Lutomirski <andy@luto.us> wrote:> As I type this, I have an ssh process running that''s dumping data into > a fifo at high speed (maybe 500Mbps) and a tar process that''s > untarring from the same fifo onto btrfs. The btrfs fs is mounted -o > space_cache,compress. This machine has 8GB ram, 8 logical cores, and > a fast (i7-2600) CPU, so it''s not an issue with the machine struggling > under load. > > Every few tens of seconds, my system stalls for several seconds. > These stalls cause keyboard input to be lost, firefox to hang, etc. > > Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. > > ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes > no difference. > > max_sectors_kb = 64 in addition to the above doesn''t help either. > > latencytop shows regular instances of 2-7 *second* latency, variously > in sync_page, start_transaction, btrfs_start_ordered_extent, and > do_get_write_access (from jbd2 on my ext4 root partition). > > echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when > 4-5 GB were still free (so it shouldn''t be a problem with important > pages being evicted). > > In case it matters, all of my partitions are on LVM on dm-crypt, but > this machine has AES-NI so the overhead from that should be minimal. > In fact, overall CPU usage is only about 10%. > > What gives? I thought this stuff was supposed to be better on modern kernels. > > --Andy > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >Hi Andrew, you could try the following patch to speed up dm-crypt: https://patchwork.kernel.org/patch/365542/ I''m using it on top of a highly-patched 2.6.37 kernel not sure if exactly that version was included in 2.6.38 there are some additional handles to speed up dm: e.g. PCRYCONFIG_CRYPTO_PCRYPT=y Regards Matt
Andrew Lutomirski
2011-Feb-11 19:56 UTC
Re: 2.6.37: Multi-second I/O latency while untarring
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason <chris.mason@oracle.com> wrote:> Excerpts from Andrew Lutomirski''s message of 2011-02-11 10:08:52 -0500: >> As I type this, I have an ssh process running that''s dumping data into >> a fifo at high speed (maybe 500Mbps) and a tar process that''s >> untarring from the same fifo onto btrfs. The btrfs fs is mounted -o >> space_cache,compress. This machine has 8GB ram, 8 logical cores, and >> a fast (i7-2600) CPU, so it''s not an issue with the machine struggling >> under load. >> >> Every few tens of seconds, my system stalls for several seconds. >> These stalls cause keyboard input to be lost, firefox to hang, etc. >> >> Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. >> >> ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes >> no difference. >> >> max_sectors_kb = 64 in addition to the above doesn''t help either. >> >> latencytop shows regular instances of 2-7 *second* latency, variously >> in sync_page, start_transaction, btrfs_start_ordered_extent, and >> do_get_write_access (from jbd2 on my ext4 root partition). >> >> echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when >> 4-5 GB were still free (so it shouldn''t be a problem with important >> pages being evicted). >> >> In case it matters, all of my partitions are on LVM on dm-crypt, but >> this machine has AES-NI so the overhead from that should be minimal. >> In fact, overall CPU usage is only about 10%. >> >> What gives? I thought this stuff was supposed to be better on modern kernels. > > We can tell more if you post the full traces from latencytop. I have a > patch here for latencytop that adds a -c mode, which dumps the traces > out to a text files. > > http://oss.oracle.com/~mason/latencytop.patchBig dump at end of email from latencytop git + your patch.> > Based on what you have here, I think it''s probably a latency problem > between btrfs and the dm-crypt stuff. How easily can setup a test > partition without dm-crypt?Not so easily on that disk. I left some space inside the LVM to play with but none outside. I''ll try hooking up another disk over eSATA l (on a Cougar Point 3Gbps controller, so it might blow up). And here''s the dump: =============== Fri Feb 11 14:44:07 2011 Globals: Cause Maximum Percentage synchronous write 4249.1 msec 35.5 % Writing to a pipe 4248.5 msec 35.5 % Writing a page to disk 105.9 msec 2.1 % Page fault 23.7 msec 0.2 % Reading from a pipe 4.7 msec 19.8 % Waiting for event (select) 4.6 msec 6.4 % Waiting for event (poll) 1.3 msec 0.0 % Executing raw SCSI command 1.3 msec 0.2 % opening cdrom device 1.3 msec 0.3 % Process details: Process ksoftirqd/1 (10) Total: 50.0 msec [run_ksoftirqd] 4.8 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process ksoftirqd/2 (15) Total: 8.7 msec [run_ksoftirqd] 4.9 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process ksoftirqd/3 (19) Total: 2.9 msec [run_ksoftirqd] 2.9 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process ksoftirqd/5 (27) Total: 80.6 msec [run_ksoftirqd] 5.0 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process scsi_eh_1 (62) Total: 45.0 msec Executing internal ATA command 0.7 msec 62.3 % ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler ahci_error_handler ata_scsi_error scsi_error_handler kthread kernel_thread_helper SCSI error handler 0.6 msec 37.7 % scsi_error_handler kthread kernel_thread_helper Process kworker/7:1 (76) Total: 8.7 msec . 3.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/4:1 (139) Total: 124.0 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/6:1 (140) Total: 11.7 msec . 3.8 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/5:1 (141) Total: 12.5 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/2:1 (142) Total: 26.1 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/1:1 (143) Total: 47.1 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/3:1 (150) Total: 4.6 msec . 3.1 msec 100.0 % worker_thread kthread kernel_thread_helper Process jbd2/dm-1-8 (376) Total: 66.7 msec Writing buffer to disk (synchronous) 66.7 msec 100.0 % sync_buffer __wait_on_buffer wait_on_buffer jbd2_journal_commit_transaction kjournald2 kthread kernel_thread_helper Process btrfs-submit-0 (829) Total: 1719.4 msec [worker_loop] 2.2 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (833) Total: 540.5 msec [worker_loop] 4.8 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-freespace (836) Total: 31.7 msec [worker_loop] 1.8 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-transacti (838) Total: 5567.3 msec Writing buffer to disk (synchronous) 352.5 msec 6.8 % sync_buffer __wait_on_buffer write_dev_supers write_all_supers write_ctree_super btrfs_commit_transaction transaction_kthread kthread kernel_thread_helper Writing a page to disk 91.4 msec 93.2 % sync_page wait_on_page_bit read_extent_buffer_pages btree_read_extent_buffer_pages.clone.56 read_tree_block read_block_for_search.clone.38 btrfs_search_slot lookup_inline_extent_backref __btrfs_free_extent run_clustered_refs btrfs_run_delayed_refs btrfs_commit_transaction Process hald-addon-stor (1349) Total: 39.5 msec Executing raw SCSI command 1.3 msec 49.1 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_drive_status cdrom_ioctl sr_block_ioctl __blkdev_driver_ioctl blkdev_ioctl block_ioctl do_vfs_ioctl sys_ioctl opening cdrom device 1.3 msec 43.4 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_media_change media_changed cdrom_media_changed sr_block_media_changed check_disk_change cdrom_open sr_block_open __blkdev_get SCSI cdrom ioctl 0.6 msec 7.5 % blk_execute_rq scsi_execute sr_do_ioctl sr_packet cdrom_get_media_event sr_drive_status cdrom_ioctl sr_block_ioctl __blkdev_driver_ioctl blkdev_ioctl block_ioctl do_vfs_ioctl Process gdm-binary (1486) Total: 0.3 msec Waiting for event (poll) 0.3 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process Xorg (1520) Total: 0.9 msec [i915_do_wait_request] 0.3 msec 70.7 % i915_do_wait_request i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_set_domain_ioctl drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath Process upowerd (1645) Total: 0.2 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process gnome-session (1752) Total: 0.1 msec Waiting for event (poll) 0.1 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process udisks-daemon (1880) Total: 0.1 msec Waiting for event (poll) 0.1 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process udisks-daemon (1881) Total: 19.2 msec opening cdrom device 1.2 msec 88.1 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_media_change media_changed cdrom_media_changed sr_block_media_changed check_disk_change cdrom_open sr_block_open __blkdev_get Executing raw SCSI command 0.5 msec 11.9 % blk_execute_rq scsi_execute scsi_execute_req ioctl_internal_command.clone.3 scsi_set_medium_removal sr_lock_door cdrom_release sr_block_release __blkdev_put blkdev_put blkdev_close fput Process bluetooth-apple (1915) Total: 0.2 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process gpk-update-icon (1942) Total: 0.2 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process gnome-screensav (2040) Total: 0.3 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process flush-btrfs-1 (2120) Total: 173.3 msec Writing a page to disk 161.4 msec 100.0 % sync_page __lock_page lock_page extent_write_cache_pages.clone.11.clone.18 extent_writepages btrfs_writepages do_writepages writeback_single_inode writeback_sb_inodes writeback_inodes_wb wb_writeback wb_do_writeback Process btrfs-delalloc- (2402) Total: 0.2 msec [worker_loop] 0.2 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (2408) Total: 52.1 msec [worker_loop] 2.5 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-wri (2768) Total: 1561.5 msec [worker_loop] 4.1 msec 100.0 % worker_loop kthread kernel_thread_helper Process compiz (6845) Total: 0.8 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process clock-applet (6928) Total: 1.4 msec Waiting for event (poll) 1.3 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process kworker/0:0 (20743) Total: 1867.9 msec . 5.0 msec 100.0 % worker_thread kthread kernel_thread_helper Process tar (22372) Total: 6614.1 msec synchronous write 4249.1 msec 64.2 % btrfs_throttle btrfs_file_aio_write do_sync_write vfs_write sys_write system_call_fastpath Reading from a pipe 4.7 msec 35.8 % pipe_wait pipe_read do_sync_read vfs_read sys_read system_call_fastpath Process ssh (22533) Total: 4974.4 msec Writing to a pipe 4248.5 msec 85.4 % start_transaction btrfs_join_transaction btrfs_dirty_inode __mark_inode_dirty file_update_time pipe_write do_sync_write vfs_write sys_write system_call_fastpath Waiting for event (select) 4.6 msec 14.6 % poll_schedule_timeout do_select core_sys_select sys_select system_call_fastpath Process kworker/u:3 (23092) Total: 1901.4 msec Creating block layer request 14.7 msec 84.7 % get_request_wait __make_request generic_make_request kcryptd_crypt_write_io_submit kcryptd_crypt process_one_work worker_thread kthread kernel_thread_helper . 4.7 msec 15.3 % worker_thread kthread kernel_thread_helper Process kworker/u:0 (23763) Total: 2108.1 msec . 5.0 msec 100.0 % worker_thread kthread kernel_thread_helper Process packagekitd (24691) Total: 0.2 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process sshd (24718) Total: 35.8 msec Waiting for event (select) 3.1 msec 99.9 % poll_schedule_timeout do_select core_sys_select sys_select system_call_fastpath Process bash (24719) Total: 5.1 msec Reading from a pipe 1.0 msec 82.5 % pipe_wait pipe_read do_sync_read vfs_read sys_read system_call_fastpath Waiting for a process to die 0.9 msec 17.5 % do_wait sys_wait4 system_call_fastpath It seems a little sad that ssh took a big latency hit just marking the fifo dirty. --Andy
Andrew Lutomirski
2011-Feb-12 00:35 UTC
Re: 2.6.37: Multi-second I/O latency while untarring
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason <chris.mason@oracle.com> wrote:> Excerpts from Andrew Lutomirski''s message of 2011-02-11 10:08:52 -0500: >> As I type this, I have an ssh process running that''s dumping data into >> a fifo at high speed (maybe 500Mbps) and a tar process that''s >> untarring from the same fifo onto btrfs. The btrfs fs is mounted -o >> space_cache,compress. This machine has 8GB ram, 8 logical cores, and >> a fast (i7-2600) CPU, so it''s not an issue with the machine struggling >> under load. >> >> Every few tens of seconds, my system stalls for several seconds. >> These stalls cause keyboard input to be lost, firefox to hang, etc. >> >> Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. >> >> ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes >> no difference. >> >> max_sectors_kb = 64 in addition to the above doesn''t help either. >> >> latencytop shows regular instances of 2-7 *second* latency, variously >> in sync_page, start_transaction, btrfs_start_ordered_extent, and >> do_get_write_access (from jbd2 on my ext4 root partition). >> >> echo 3 >drop_caches gave me 7 GB free RAM. I still had stalls when >> 4-5 GB were still free (so it shouldn''t be a problem with important >> pages being evicted). >> >> In case it matters, all of my partitions are on LVM on dm-crypt, but >> this machine has AES-NI so the overhead from that should be minimal. >> In fact, overall CPU usage is only about 10%. >> >> What gives? I thought this stuff was supposed to be better on modern kernels. > > We can tell more if you post the full traces from latencytop. I have a > patch here for latencytop that adds a -c mode, which dumps the traces > out to a text files. > > http://oss.oracle.com/~mason/latencytop.patch > > Based on what you have here, I think it''s probably a latency problem > between btrfs and the dm-crypt stuff. How easily can setup a test > partition without dm-crypt?Done, on the same physical disk as before. The latency is just as bad. On this test, I wrote a total of 3.1G, which is under half of my RAM. That should rule out lots of VM issues. latencytop trace below. The impression I get (from watching the disk activity light) is that the disk is mostly idle but every now and then writes out a ton of data. While it''s writing, the system often becomes unusable. P.S. How bad is this? I got it on both disks. btrfs: free space inode generation (0) did not match free space cache generation (11070) for block group 1103101952 =============== Fri Feb 11 19:30:57 2011 Globals: Cause Maximum Percentage Writing a page to disk 2009.0 msec 19.7 % fsync() on a file (type ''F'' for details) 612.2 msec 5.0 % synchronous write 573.6 msec 1.8 % Page fault 57.3 msec 0.7 % Writing buffer to disk (synchronous) 45.2 msec 0.1 % Unlinking file 12.6 msec 0.0 % Waiting for event (select) 5.0 msec 22.3 % Reading from a pipe 5.0 msec 29.9 % Waiting for event (poll) 5.0 msec 17.8 % Process details: Process kthreadd (2) Total: 1.9 msec kthreadd kernel thread 1.9 msec 100.0 % kthreadd kernel_thread_helper Process ksoftirqd/0 (3) Total: 18.5 msec [run_ksoftirqd] 4.0 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process ksoftirqd/1 (10) Total: 19.6 msec [run_ksoftirqd] 4.9 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process kworker/0:1 (11) Total: 556.3 msec . 5.0 msec 100.0 % worker_thread kthread kernel_thread_helper Process ksoftirqd/2 (15) Total: 8.1 msec [run_ksoftirqd] 2.9 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process ksoftirqd/4 (23) Total: 11.2 msec [run_ksoftirqd] 4.3 msec 100.0 % run_ksoftirqd kthread kernel_thread_helper Process scsi_eh_1 (62) Total: 38.8 msec SCSI error handler 0.9 msec 39.9 % scsi_error_handler kthread kernel_thread_helper Executing internal ATA command 0.7 msec 60.1 % ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler ahci_error_handler ata_scsi_error scsi_error_handler kthread kernel_thread_helper Process kworker/u:4 (69) Total: 616.5 msec Creating block layer request 54.9 msec 77.8 % get_request_wait __make_request generic_make_request kcryptd_crypt_write_io_submit kcryptd_crypt process_one_work worker_thread kthread kernel_thread_helper . 5.0 msec 22.2 % worker_thread kthread kernel_thread_helper Process kworker/u:5 (70) Total: 1712.3 msec Creating block layer request 492.8 msec 94.3 % get_request_wait __make_request generic_make_request kcryptd_io process_one_work worker_thread kthread kernel_thread_helper . 4.9 msec 5.7 % worker_thread kthread kernel_thread_helper Process kworker/7:1 (76) Total: 5.7 msec . 4.8 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/5:1 (117) Total: 31.9 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/6:1 (118) Total: 13.6 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/1:1 (119) Total: 126.3 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/4:1 (120) Total: 377.5 msec . 5.0 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/2:1 (131) Total: 73.1 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process kworker/3:1 (132) Total: 30.7 msec . 4.9 msec 100.0 % worker_thread kthread kernel_thread_helper Process jbd2/dm-1-8 (372) Total: 137.4 msec Writing buffer to disk (synchronous) 127.3 msec 92.8 % sync_buffer __wait_on_buffer wait_on_buffer jbd2_journal_commit_transaction kjournald2 kthread kernel_thread_helper Writing a page to disk 9.9 msec 7.2 % sync_page wait_on_page_bit filemap_fdatawait_range filemap_fdatawait jbd2_journal_commit_transaction kjournald2 kthread kernel_thread_helper Process btrfs-worker-0 (828) Total: 4.6 msec [worker_loop] 2.2 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-submit-0 (830) Total: 5.8 msec [worker_loop] 2.3 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-delalloc- (831) Total: 1.1 msec [worker_loop] 1.1 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-0 (833) Total: 69.1 msec [worker_loop] 4.7 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (834) Total: 56.0 msec [worker_loop] 4.5 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (835) Total: 19.8 msec [worker_loop] 4.8 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-freespace (837) Total: 10.0 msec [worker_loop] 1.4 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-transacti (841) Total: 3271.2 msec Writing a page to disk 507.8 msec 89.5 % sync_page __lock_page lock_page extent_write_cache_pages.clone.11.clone.18 extent_writepages btrfs_writepages do_writepages __filemap_fdatawrite_range filemap_flush btrfs_run_ordered_operations btrfs_commit_transaction transaction_kthread Writing buffer to disk (synchronous) 305.5 msec 10.5 % sync_buffer __wait_on_buffer write_dev_supers write_all_supers write_ctree_super btrfs_commit_transaction transaction_kthread kthread kernel_thread_helper Process hald-addon-inpu (1347) Total: 5.4 msec Waiting for event (poll) 5.0 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process hald-addon-stor (1362) Total: 48.2 msec Opening file 3.5 msec 21.7 % bd_prepare_to_claim bd_start_claiming blkdev_open __dentry_open nameidata_to_filp do_last do_filp_open do_sys_open sys_open system_call_fastpath Executing raw SCSI command 1.2 msec 38.7 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_drive_status cdrom_ioctl sr_block_ioctl __blkdev_driver_ioctl blkdev_ioctl block_ioctl do_vfs_ioctl sys_ioctl opening cdrom device 1.2 msec 33.4 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_media_change media_changed cdrom_media_changed sr_block_media_changed check_disk_change cdrom_open sr_block_open __blkdev_get SCSI cdrom ioctl 0.6 msec 6.2 % blk_execute_rq scsi_execute sr_do_ioctl sr_packet cdrom_get_media_event sr_drive_status cdrom_ioctl sr_block_ioctl __blkdev_driver_ioctl blkdev_ioctl block_ioctl do_vfs_ioctl Process Xorg (1533) Total: 4718.6 msec Waiting for event (select) 5.0 msec 93.8 % poll_schedule_timeout do_select core_sys_select sys_select system_call_fastpath [i915_do_wait_request] 4.3 msec 3.5 % i915_do_wait_request i915_gem_object_wait_rendering i915_gem_do_execbuffer.clone.29 i915_gem_execbuffer2 drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath [__mutex_fastpath_lock_retval] 3.4 msec 2.7 % __mutex_fastpath_lock_retval i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath Waiting for TTY data 0.3 msec 0.0 % flush_work flush_delayed_work tty_flush_to_ldisc input_available_p n_tty_poll tty_poll do_select core_sys_select sys_select system_call_fastpath Process gnome-settings- (1857) Total: 18.1 msec Waiting for event (poll) 3.9 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process flush-btrfs-1 (1865) Total: 0.7 msec Writing a page to disk 0.1 msec 100.0 % sync_page __lock_page lock_page extent_write_cache_pages.clone.11.clone.18 extent_writepages btrfs_writepages do_writepages writeback_single_inode writeback_sb_inodes writeback_inodes_wb wb_writeback wb_do_writeback Process nautilus (1884) Total: 0.5 msec Waiting for event (poll) 0.2 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process compiz (1893) Total: 650.6 msec Waiting for event (poll) 5.0 msec 79.4 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath [i915_do_wait_request] 2.5 msec 20.2 % i915_do_wait_request i915_gem_object_wait_rendering i915_gem_do_execbuffer.clone.29 i915_gem_execbuffer2 drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath [__mutex_fastpath_lock_retval] 1.0 msec 0.4 % __mutex_fastpath_lock_retval i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath Process wnck-applet (1927) Total: 2.6 msec Waiting for event (poll) 0.7 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process udisks-daemon (2005) Total: 24.1 msec Opening file 3.3 msec 53.0 % bd_prepare_to_claim bd_start_claiming blkdev_open __dentry_open nameidata_to_filp do_last do_filp_open do_sys_open sys_open system_call_fastpath opening cdrom device 1.2 msec 41.6 % blk_execute_rq scsi_execute scsi_execute_req sr_test_unit_ready sr_media_change media_changed cdrom_media_changed sr_block_media_changed check_disk_change cdrom_open sr_block_open __blkdev_get Executing raw SCSI command 0.4 msec 5.4 % blk_execute_rq scsi_execute scsi_execute_req ioctl_internal_command.clone.3 scsi_set_medium_removal sr_lock_door cdrom_release sr_block_release __blkdev_put blkdev_put blkdev_close fput Process btrfs-worker-3 (2097) Total: 1.1 msec [worker_loop] 0.6 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-worker-4 (2098) Total: 0.4 msec [worker_loop] 0.2 msec 100.0 % worker_loop kthread kernel_thread_helper Process gtk-window-deco (2133) Total: 69.0 msec Page fault 57.3 msec 83.1 % sync_page wait_on_page_bit wait_on_page_locked __lock_page_or_retry filemap_fault __do_fault handle_mm_fault do_page_fault page_fault Waiting for event (poll) 3.8 msec 16.9 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process gvfsd-trash (2152) Total: 495.5 msec Waiting for event (poll) 5.0 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process gnome-terminal (2185) Total: 483.3 msec Waiting for event (poll) 4.0 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process firefox (2426) Total: 5028.3 msec Writing a page to disk 867.7 msec 48.5 % sync_page wait_on_page_bit read_extent_buffer_pages btree_read_extent_buffer_pages.clone.56 read_tree_block read_block_for_search.clone.38 btrfs_search_slot btrfs_lookup_csum __btrfs_lookup_bio_sums btrfs_lookup_bio_sums btrfs_submit_bio_hook submit_one_bio fsync() on a file (type ''F'' for details) 612.2 msec 24.4 % sync_page __lock_page lock_page extent_write_cache_pages.clone.11.clone.18 extent_writepages btrfs_writepages do_writepages __filemap_fdatawrite_range filemap_write_and_wait_range vfs_fsync_range vfs_fsync do_fsync synchronous write 573.6 msec 11.4 % start_transaction btrfs_join_transaction btrfs_dirty_inode __mark_inode_dirty file_update_time btrfs_file_aio_write do_sync_write vfs_write sys_write system_call_fastpath Page fault 57.0 msec 2.9 % sync_page wait_on_page_bit wait_on_page_locked __lock_page_or_retry filemap_fault __do_fault handle_mm_fault do_page_fault page_fault Writing buffer to disk (synchronous) 45.2 msec 0.9 % sync_buffer __wait_on_buffer ext4_find_entry ext4_lookup d_alloc_and_lookup do_lookup do_last do_filp_open do_sys_open sys_open system_call_fastpath Unlinking file 12.6 msec 0.3 % btrfs_tree_lock btrfs_search_slot btrfs_insert_empty_items run_clustered_refs btrfs_run_delayed_refs __btrfs_end_transaction btrfs_end_transaction_throttle __unlink_end_trans btrfs_unlink vfs_unlink do_unlinkat sys_unlink Waiting for event (poll) 4.9 msec 5.9 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Userspace lock contention 4.9 msec 5.7 % futex_wait_queue_me futex_wait do_futex sys_futex system_call_fastpath Process btrfs-delalloc- (2453) Total: 1.8 msec [worker_loop] 1.1 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-wri (2454) Total: 44.7 msec Writing a page to disk 31.1 msec 98.1 % sync_page wait_on_page_bit read_extent_buffer_pages btree_read_extent_buffer_pages.clone.56 read_tree_block read_block_for_search.clone.38 btrfs_search_slot btrfs_insert_empty_items alloc_reserved_file_extent.clone.57 run_clustered_refs btrfs_run_delayed_refs __btrfs_end_transaction [worker_loop] 0.7 msec 1.9 % worker_loop kthread kernel_thread_helper Process btrfs-endio-wri (2460) Total: 819.2 msec Writing a page to disk 532.4 msec 99.8 % sync_page __lock_page lock_page read_extent_buffer_pages btree_read_extent_buffer_pages.clone.56 read_tree_block reada_for_balance btrfs_search_slot btrfs_csum_file_blocks add_pending_csums.clone.29 btrfs_finish_ordered_io btrfs_writepage_end_io_hook [worker_loop] 0.8 msec 0.2 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (2464) Total: 146.5 msec [worker_loop] 4.7 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-genwork-0 (2573) Total: 2.1 msec [worker_loop] 1.8 msec 87.6 % worker_loop kthread kernel_thread_helper [kthread_create] 0.1 msec 12.4 % kthread_create __btrfs_start_workers btrfs_start_workers start_new_worker_func worker_loop kthread kernel_thread_helper Process btrfs-submit-0 (2574) Total: 2797.6 msec Creating block layer request 484.7 msec 99.9 % get_request_wait __make_request generic_make_request submit_bio run_scheduled_bios pending_bios_fn worker_loop kthread kernel_thread_helper [worker_loop] 0.1 msec 0.1 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (2579) Total: 5.1 msec [worker_loop] 1.9 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-endio-wri (2580) Total: 146.3 msec [worker_loop] 5.0 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-freespace (2581) Total: 0.4 msec [worker_loop] 0.4 msec 100.0 % worker_loop kthread kernel_thread_helper Process btrfs-transacti (2583) Total: 479.7 msec Writing buffer to disk (synchronous) 466.0 msec 97.2 % sync_buffer __wait_on_buffer write_dev_supers write_all_supers write_ctree_super btrfs_commit_transaction transaction_kthread kthread kernel_thread_helper Writing a page to disk 6.4 msec 2.8 % sync_page wait_on_page_bit filemap_fdatawait_range filemap_fdatawait filemap_write_and_wait btrfs_write_out_cache btrfs_write_dirty_block_groups commit_cowonly_roots btrfs_commit_transaction transaction_kthread kthread kernel_thread_helper Process ssh (2659) Total: 2559.8 msec Waiting for event (select) 4.9 msec 100.0 % poll_schedule_timeout do_select core_sys_select sys_select system_call_fastpath Process tar (2662) Total: 9364.3 msec Reading from a pipe 5.0 msec 100.0 % pipe_wait pipe_read do_sync_read vfs_read sys_read system_call_fastpath Process btrfs-endio-wri (2741) Total: 1.1 msec [worker_loop] 1.0 msec 97.6 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (2742) Total: 2.4 msec [worker_loop] 1.8 msec 99.1 % worker_loop kthread kernel_thread_helper Process btrfs-endio-met (2743) Total: 2.3 msec [worker_loop] 1.8 msec 99.9 % worker_loop kthread kernel_thread_helper Process bash (2753) Total: 483.3 msec Waiting for event (poll) 4.0 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process btrfs-endio-wri (2790) Total: 1.9 msec kthreadd kernel thread 1.9 msec 100.0 % kthreadd kernel_thread_helper Process sudo (2792) Total: 483.3 msec Waiting for event (poll) 4.0 msec 100.0 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Process latencytop (2798) Total: 486.0 msec Waiting for event (poll) 4.0 msec 99.4 % poll_schedule_timeout do_sys_poll sys_poll system_call_fastpath Waiting for a process to die 2.1 msec 0.6 % do_wait sys_wait4 system_call_fastpath
Excerpts from Andrew Lutomirski''s message of 2011-02-11 19:35:02 -0500:> On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason <chris.mason@oracle.com> wrote: > > Excerpts from Andrew Lutomirski''s message of 2011-02-11 10:08:52 -0500: > >> As I type this, I have an ssh process running that''s dumping data into > >> a fifo at high speed (maybe 500Mbps) and a tar process that''s > >> untarring from the same fifo onto btrfs. Â The btrfs fs is mounted -o > >> space_cache,compress. Â This machine has 8GB ram, 8 logical cores, and > >> a fast (i7-2600) CPU, so it''s not an issue with the machine struggling > >> under load. > >> > >> Every few tens of seconds, my system stalls for several seconds. > >> These stalls cause keyboard input to be lost, firefox to hang, etc. > >> > >> Setting tar''s ionice priority to best effort / 7 or to idle makes no difference. > >> > >> ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes > >> no difference. > >> > >> max_sectors_kb = 64 in addition to the above doesn''t help either. > >> > >> latencytop shows regular instances of 2-7 *second* latency, variously > >> in sync_page, start_transaction, btrfs_start_ordered_extent, and > >> do_get_write_access (from jbd2 on my ext4 root partition). > >> > >> echo 3 >drop_caches gave me 7 GB free RAM. Â I still had stalls when > >> 4-5 GB were still free (so it shouldn''t be a problem with important > >> pages being evicted). > >> > >> In case it matters, all of my partitions are on LVM on dm-crypt, but > >> this machine has AES-NI so the overhead from that should be minimal. > >> In fact, overall CPU usage is only about 10%. > >> > >> What gives? Â I thought this stuff was supposed to be better on modern kernels. > > > > We can tell more if you post the full traces from latencytop. Â I have a > > patch here for latencytop that adds a -c mode, which dumps the traces > > out to a text files. > > > > http://oss.oracle.com/~mason/latencytop.patch > > > > Based on what you have here, I think it''s probably a latency problem > > between btrfs and the dm-crypt stuff. Â How easily can setup a test > > partition without dm-crypt? > > Done, on the same physical disk as before. The latency is just as > bad. On this test, I wrote a total of 3.1G, which is under half of my > RAM. That should rule out lots of VM issues. latencytop trace below.Just to confirm, you say on a physical disk you mean without dm-crypt?> > The impression I get (from watching the disk activity light) is that > the disk is mostly idle but every now and then writes out a ton of > data. While it''s writing, the system often becomes unusable.Could you please btrfs fi df /mnt (where /mnt is your test filesystem)> > P.S. How bad is this? I got it on both disks. > btrfs: free space inode generation (0) did not match free space cache > generation (11070) for block group 1103101952We got rid of these in later kernels, they are fine. The latencytop data shows us basically waiting for the disk. We''re either waiting for synchronous reads or writes, and we''re heavily waiting for supers to be sent down to the disk as part of committing transactions. There are a few things I''d like you to try: 1) Try deadline instead of cfq, unless you''re using deadline in which case you could try cfq. 2) Try increasing the number of io requests we allow in flight: echo 2048 > /sys/block/xxx/queue/nr_requests Here xxx is your physical disk (like sda) 3) Try without firefox running. Firefox is generating a lot of synchronous IO here. The btrfs log tries really hard to manage this without making the box stall, but somehow we might not be doing well. One place we don''t do well is if your disk was freshly formatted and you''re still growing chunks to cover new writes. In this case the fsyncs done by firefox will lead to more expensive transaction commits. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Andrew Lutomirski
2011-Mar-29 03:50 UTC
Re: 2.6.37: Multi-second I/O latency while untarring
On Mon, Feb 14, 2011 at 10:22 AM, Chris Mason <chris.mason@oracle.com> wrote:> Excerpts from Andrew Lutomirski''s message of 2011-02-11 19:35:02 -0500: >> On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason <chris.mason@oracle.com> wrote:>> > >> > We can tell more if you post the full traces from latencytop. I have a >> > patch here for latencytop that adds a -c mode, which dumps the traces >> > out to a text files. >> > >> > http://oss.oracle.com/~mason/latencytop.patch >> > >> > Based on what you have here, I think it''s probably a latency problem >> > between btrfs and the dm-crypt stuff. How easily can setup a test >> > partition without dm-crypt? >> >> Done, on the same physical disk as before. The latency is just as >> bad. On this test, I wrote a total of 3.1G, which is under half of my >> RAM. That should rule out lots of VM issues. latencytop trace below. > > Just to confirm, you say on a physical disk you mean without dm-crypt?Sorry for the exceedingly slow reply. This problem is really bad with 2.6.38.1. To make it a little easier to demonstrate, I wrote a tool that shows off the problem. I made a test btrfs partition on a plain disk partition (same disk as my dm-crypt but an unencrypted partition). Now clone a kernel tree there and run make -j8. Wait until the disk starts to write data out in earnest (takes awhile to dirty enough pages). Watch crap like this happen (with nr_requests = 2048, scheduler = deadline). io_latency_watch read <1M file on test partition> read took 0.000 seconds (worst = 0.963s) read took 0.000 seconds (worst = 0.963s) read took 0.022 seconds (worst = 0.963s) read took 0.000 seconds (worst = 0.963s) read took 0.028 seconds (worst = 0.963s) read took 1.430 seconds (worst = 1.430s) read took 0.270 seconds (worst = 1.430s) read took 1.237 seconds (worst = 1.430s) read took 0.282 seconds (worst = 1.430s) read took 0.131 seconds (worst = 1.430s) io_latency_watch read <1M file on other partition on same disk> is similar, and io_latency_test write <dir on other partition> is even worse. The cfq scheduler is similar. --Andy