Hello,
When I run postmark on btrfs v0.12, the system should be busy doing
I/O, but there are periods where the I/O goes completely idle while the
btrfs workqueue eats up most of the CPU time.
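The workload is driven through postmark's command interface, roughly
like this (the numbers below are only an example, and /mnt/btrfs stands
in for the actual btrfs mount point):

# postmark <<EOF
set location /mnt/btrfs
set number 20000
set transactions 50000
run
quit
EOF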
For example,
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 44 13804 500 927040 0 0 1376 2004 249 3966 0 49 49 2 0
1 0 44 12968 496 927772 0 0 1140 1352 216 2095 0 49 49 2 0
1 0 44 13224 496 927412 0 0 1260 2148 230 2625 0 49 50 2 0
1 0 44 14260 460 926164 0 0 1408 1640 239 2712 0 49 49 2 0
1 0 44 14592 460 925920 0 0 1228 1660 229 2675 0 49 49 2 0
1 0 44 13740 460 926728 0 0 1208 1844 216 2367 0 50 50 1 0
1 0 44 13492 460 926612 0 0 1944 2648 330 4081 0 49 49 3 0
2 0 44 13288 460 926756 0 0 1356 1960 278 3243 0 49 49 1 0
1 0 44 13668 460 926436 0 0 1080 1428 201 2258 0 49 49 2 0
1 0 44 13752 468 926024 0 0 1000 10060 213 2008 0 50 49 1 0
1 0 44 14064 428 925580 0 0 952 1336 199 2934 0 50 49 1 0
1 0 44 13208 332 926116 0 0 352 364 71 673 0 50 49 0 0
1 0 44 14236 320 925016 0 0 0 0 4 121 0 50 50 0 0  <= something happens from here
1 0 44 14024 320 924984 0 0 0 0 5 115 0 50 50 0 0
2 0 44 13924 320 925080 0 0 0 0 4 112 0 50 50 0 0
1 0 44 14108 320 925128 0 0 0 0 4 887 0 50 50 0 0
1 0 44 13476 320 925720 0 0 0 0 10 107 0 50 50 0 0
1 0 44 13296 320 925824 0 0 0 0 8 107 0 50 50 0 0
1 0 44 14040 320 924964 0 0 0 0 4 105 0 51 50 0 0
1 0 44 15036 320 924120 0 0 0 0 7 103 0 50 50 0 0
1 0 44 14028 312 924896 0 0 0 0 2 885 0 50 50 0 0
# top
top - 07:15:11 up 30 min, 2 users, load average: 1.67, 1.67, 1.29
Tasks: 68 total, 2 running, 66 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 50.2%sy, 0.0%ni, 49.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1025740k total, 1011212k used, 14528k free, 756k buffers
Swap: 2031608k total, 44k used, 2031564k free, 921176k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3548 root 15 -5 0 0 0 R 100 0.0 6:56.34 btrfs/1  <= this consumes 100% cpu time
234 root 15 -5 0 0 0 S 0 0.0 0:02.85 kswapd0
4092 djshin 20 0 2200 1000 800 R 0 0.1 0:00.12 top
1 root 20 0 2136 664 576 S 0 0.1 0:00.47 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:00.05 ksoftirqd/0
It seems that this happens on both the 2.6.23 and 2.6.24 kernels.
Any idea?
--
Dongjun
On Friday 15 February 2008, Dongjun Shin wrote:
> Hello,
>
> When I run postmark on btrfs v0.12, the system should be busy doing
> I/O, but there are periods where the I/O goes completely idle while the
> btrfs workqueue eats up most of the CPU time.
>
> It seems that this happens on both the 2.6.23 and 2.6.24 kernels.
>
> Any idea?

It is probably the defrag process or snapshot deletion holding the
FS-wide mutex and excluding other writeback. This big mutex is really
sub-optimal, and it shows up on a number of benchmarks. (sysrq-p might
catch it)

Once the multiple device code is ready, fine-grained locking is my top
priority.

For runs like postmark, a smaller blocksize (mkfs.btrfs -l 4096 -n 4096)
will lower the amount of work that needs to be done during snapshot
deletion. Random IOs cause more btree churn on the larger blocksizes.

You could also try the patch below to disable defrag in ssd mode.

-chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: no-defrag
Type: text/x-diff
Size: 363 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/btrfs-devel/attachments/20080215/fdc9c581/no-defrag.bin
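In concrete terms, the sysrq check and the smaller-blocksize run suggested
above look roughly like the following (a sketch only; /dev/sdb1 and
/mnt/btrfs are placeholders for the actual device and mount point):

# echo 1 > /proc/sys/kernel/sysrq        (make sure sysrq is enabled)
# echo p > /proc/sysrq-trigger           (backtrace of the currently running CPU)
# echo t > /proc/sysrq-trigger           (or dump every task, in case p misses the busy thread)
# dmesg | tail -n 50                     (the traces land in the kernel log)

# mkfs.btrfs -l 4096 -n 4096 /dev/sdb1   (4k leaf and node size, as suggested above)
# mount /dev/sdb1 /mnt/btrfs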
On Friday 15 February 2008, Dongjun Shin wrote:
> Hello,
>
> When I run postmark on btrfs v0.12, the system should be busy doing
> I/O, but there are periods where the I/O goes completely idle while the
> btrfs workqueue eats up most of the CPU time.

I wasn't able to reproduce this on my small ssd, but I could trigger it
on my larger sata drive. Most of the time we seem to be stuck in
btrfs_realloc_node, which is part of the defrag.

The attached patch disables defrag in ssd mode, or you can grab the
latest from btrfs-unstable:

http://oss.oracle.com/mercurial/mason/btrfs-unstable/archive/1cc5025e42bb.tar.gz

I had left defrag on in ssd mode because earlier tests showed it still
helped in some read workloads. This doesn't seem to be the case anymore,
but if you see read regressions, please let me know.

(updated no-defrag patch below)

-chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: no-defrag
Type: text/x-diff
Size: 1023 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/btrfs-devel/attachments/20080215/c223f5fb/no-defrag.bin
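Since the attachment itself was scrubbed by the archive, pulling the
btrfs-unstable snapshot above and rebuilding the module is another way to
pick up the same change. The steps below are only a sketch (the unpacked
directory name, the standalone-module build, and /dev/sdb1 / /mnt/btrfs
are assumptions):

# wget http://oss.oracle.com/mercurial/mason/btrfs-unstable/archive/1cc5025e42bb.tar.gz
# tar xzf 1cc5025e42bb.tar.gz
# cd btrfs-unstable-1cc5025e42bb         (directory name may differ)
# make                                   (builds btrfs.ko against the running kernel)
# insmod btrfs.ko
# mount -o ssd /dev/sdb1 /mnt/btrfs      (with the change, defrag is skipped in ssd mode)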