Hello,
When I run postmark on btrfs v0.12, the system should be busy doing
I/O, but in some cases the I/O goes completely idle while the btrfs
workqueue eats up most of the CPU time.
For example,
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0     44  13804    500 927040    0    0  1376  2004  249 3966  0 49 49  2  0
 1  0     44  12968    496 927772    0    0  1140  1352  216 2095  0 49 49  2  0
 1  0     44  13224    496 927412    0    0  1260  2148  230 2625  0 49 50  2  0
 1  0     44  14260    460 926164    0    0  1408  1640  239 2712  0 49 49  2  0
 1  0     44  14592    460 925920    0    0  1228  1660  229 2675  0 49 49  2  0
 1  0     44  13740    460 926728    0    0  1208  1844  216 2367  0 50 50  1  0
 1  0     44  13492    460 926612    0    0  1944  2648  330 4081  0 49 49  3  0
 2  0     44  13288    460 926756    0    0  1356  1960  278 3243  0 49 49  1  0
 1  0     44  13668    460 926436    0    0  1080  1428  201 2258  0 49 49  2  0
 1  0     44  13752    468 926024    0    0  1000 10060  213 2008  0 50 49  1  0
 1  0     44  14064    428 925580    0    0   952  1336  199 2934  0 50 49  1  0
 1  0     44  13208    332 926116    0    0   352   364   71  673  0 50 49  0  0
 1  0     44  14236    320 925016    0    0     0     0    4  121  0 50 50  0  0   <= something happens from here
 1  0     44  14024    320 924984    0    0     0     0    5  115  0 50 50  0  0
 2  0     44  13924    320 925080    0    0     0     0    4  112  0 50 50  0  0
 1  0     44  14108    320 925128    0    0     0     0    4  887  0 50 50  0  0
 1  0     44  13476    320 925720    0    0     0     0   10  107  0 50 50  0  0
 1  0     44  13296    320 925824    0    0     0     0    8  107  0 50 50  0  0
 1  0     44  14040    320 924964    0    0     0     0    4  105  0 51 50  0  0
 1  0     44  15036    320 924120    0    0     0     0    7  103  0 50 50  0  0
 1  0     44  14028    312 924896    0    0     0     0    2  885  0 50 50  0  0
# top
top - 07:15:11 up 30 min,  2 users,  load average: 1.67, 1.67, 1.29
Tasks:  68 total,   2 running,  66 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 50.2%sy,  0.0%ni, 49.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1025740k total,  1011212k used,    14528k free,      756k buffers
Swap:  2031608k total,       44k used,  2031564k free,   921176k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3548 root      15  -5     0    0    0 R  100  0.0   6:56.34 btrfs/1   <= this consumes 100% cpu time
  234 root      15  -5     0    0    0 S    0  0.0   0:02.85 kswapd0
 4092 djshin    20   0  2200 1000  800 R    0  0.1   0:00.12 top
    1 root      20   0  2136  664  576 S    0  0.1   0:00.47 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.05 ksoftirqd/0
This seems to happen with both the 2.6.23 and 2.6.24 kernels.
Any idea?
--
Dongjun
On Friday 15 February 2008, Dongjun Shin wrote:
> Hello,
>
> When I'm running postmark on btrfs v0.12, although the system
> must be busy doing the I/O, there are some cases where the I/O is idle
> while btrfs workqueue eats up most of the CPU time.
>
> It seems that this happens for both 2.6.23 and 2.6.24 kernel.
>
> Any idea?

It is probably the defrag process or snapshot deletion holding the FS-wide
mutex and excluding other writeback. This big mutex is really sub-optimal,
and it shows up on a number of benchmarks. (sysrq-p might catch it)

Once the multiple device code is ready, fine-grained locking is my top
priority.

For runs like postmark, a smaller blocksize (mkfs.btrfs -l 4096 -n 4096)
will lower the amount of work that needs to be done during snapshot
deletion. Random IOs cause more btree churn on the larger blocksizes.

You could also try the patch below to disable defrag in ssd mode.

-chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: no-defrag
Type: text/x-diff
Size: 363 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/btrfs-devel/attachments/20080215/fdc9c581/no-defrag.bin
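The smaller-blocksize suggestion and the sysrq-p hint above could be tried
roughly as follows; this is a sketch, not from the original message, and
the device and mount point names are placeholders (use a scratch disk only):

```shell
# Reformat a scratch device with 4K leaf and node sizes to reduce
# btree churn during snapshot deletion (/dev/sdb1 is a placeholder).
mkfs.btrfs -l 4096 -n 4096 /dev/sdb1
mount /dev/sdb1 /mnt/scratch

# While btrfs/1 is spinning at 100% CPU, ask the kernel where it is:
echo p > /proc/sysrq-trigger    # sysrq-p: dump the current CPU's state
dmesg | tail -n 40              # look for the btrfs worker's backtrace
```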
On Friday 15 February 2008, Dongjun Shin wrote:
> Hello,
>
> When I'm running postmark on btrfs v0.12, although the system
> must be busy doing the I/O, there are some cases where the I/O is idle
> while btrfs workqueue eats up most of the CPU time.

I wasn't able to reproduce this on my small ssd, but I could trigger it on
my larger sata drive. Most of the time we seem to be stuck in
btrfs_realloc_node, which is part of the defrag.

The attached patch disables defrag in ssd mode, or you can grab the latest
from btrfs-unstable:

http://oss.oracle.com/mercurial/mason/btrfs-unstable/archive/1cc5025e42bb.tar.gz

I had left defrag on in ssd mode because earlier tests showed it still
helped in some read workloads. This doesn't seem to be the case anymore,
but if you see read regressions, please let me know.

(updated no-defrag patch below)

-chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: no-defrag
Type: text/x-diff
Size: 1023 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/btrfs-devel/attachments/20080215/c223f5fb/no-defrag.bin
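One way to try the btrfs-unstable snapshot linked above, assuming the usual
standalone-module build of that era; the unpacked directory name and build
steps are guesses, only the URL comes from the message:

```shell
# Fetch and build the btrfs-unstable snapshot as an out-of-tree module.
wget http://oss.oracle.com/mercurial/mason/btrfs-unstable/archive/1cc5025e42bb.tar.gz
tar xzf 1cc5025e42bb.tar.gz
cd btrfs-unstable-1cc5025e42bb   # unpacked directory name is a guess
make                             # builds btrfs.ko against the running kernel
insmod btrfs.ko
mount -o ssd /dev/sdb1 /mnt/scratch   # ssd mode, now without defrag
```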