Tuomas Leikola
2010-Oct-11 20:45 UTC
[zfs-discuss] free space fragmentation causing slow write speeds
Hello everybody.

I am experiencing terribly slow writes on my home server. This is from zpool iostat:

                                capacity     operations    bandwidth
    pool                       alloc   free   read  write   read  write
    -------------------------  -----  -----  -----  -----  -----  -----
    tank                       3.69T   812G    148    255   755K  1.61M
      raidz1                   2.72T   192G     86    112   554K   654K
      raidz1                    995G   621G     61    143   201K   962K

The situation is that one vdev is almost full, while the other one has plenty of space.

I remember that, at least at one point, it was known that writes slow down as the file system fills up, due to CPU time spent looking for free space. I am seeing "zpool-tank" use half a CPU (on a quad Opteron setup) all the time while this write load is running. That seems odd, as I'd expect almost a full CPU to be consumed if the CPU were the bottleneck. The disks aren't the bottleneck either, according to iostat -xcn, and since there are multiple writers writing large files, I would not expect a bottleneck there.

History of the pool: I added a second vdev when the first vdev was about 70% full. There are large files as well as small files, virtual machines and a heavily loaded database, probably making all free space very fragmented. After adding the second vdev, writes didn't seem to be biased towards the new device enough; the older one filled up anyway, and now speed has slowed to a crawl. If I delete old snapshots or some data, write speeds bump up to a healthier 70MB/s, but after a while the problem comes back. I know I could rewrite most of the old data to move half of it to the new device, but that seems a rather inelegant solution to the problem.

I had a look with zdb, and there are many metaslabs that have several hundred megabytes of free space, the best ones almost a gigabyte (of 4 gigabytes), or in other words they are something like 75-90% full. Is that too heavy for the allocator? Maybe the space map could be reformatted into a more optimal structure when a metaslab is opened for writing. Or maybe that is exactly what causes the high CPU usage; I don't know.
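For reference, the iostat and zdb figures above translate into the fill ratios below. This small sketch only does the arithmetic. It assumes the T and G suffixes in the iostat output are binary units (1T = 1024G), and it reads "several hundred megabytes" of free space per 4 GiB metaslab as roughly 0.4 GiB, which is an illustrative assumption rather than a measured value.

```python
# Figures copied from the zpool iostat output above; sizes in GiB.
# Assumption: the "T" suffix means TiB, i.e. 1T = 1024G.
TIB = 1024

vdevs = {
    "tank (pool)": (3.69 * TIB, 812),  # (alloc, free)
    "raidz1 #1": (2.72 * TIB, 192),
    "raidz1 #2": (995, 621),
}


def percent_full(alloc_gib: float, free_gib: float) -> float:
    """Share of total capacity that is already allocated, in percent."""
    return 100 * alloc_gib / (alloc_gib + free_gib)


# zdb observation: 4 GiB metaslabs with roughly 0.4 to 1 GiB still free
# (0.4 GiB is an assumed stand-in for "several hundred megabytes").
METASLAB_GIB = 4.0


def metaslab_percent_full(free_gib: float) -> float:
    """Fill level of a single metaslab given its free space, in percent."""
    return 100 * (METASLAB_GIB - free_gib) / METASLAB_GIB


if __name__ == "__main__":
    for name, (alloc, free) in vdevs.items():
        print(f"{name:12s} {percent_full(alloc, free):5.1f}% full")
    for free in (1.0, 0.4):
        print(f"metaslab with {free} GiB free: {metaslab_percent_full(free):.0f}% full")
```

This puts the first raidz1 at roughly 94% full against roughly 62% for the second, with the pool as a whole at about 82%, which matches the imbalance described.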
Meanwhile, there are still perfectly empty metaslabs on the other device.

Last time this occurred, I devised some synthetic tests to recreate this condition repeatedly, and noticed that at some point ZFS appeared to stop allocating space on the fuller device, except for ditto metadata blocks. This time around that doesn't seem to happen; maybe the trigger is less obvious than simple % free space. If I recall the source code for choosing which vdev to allocate from correctly, such an 'emergency bias' seems simple enough to implement, aside from the triggering condition perhaps being complicated to set accurately. Is there such a trigger, and can it be adjusted to kick in earlier? Any other remedies? Is there a way to confirm that finding free space is indeed the cause of the slow writes, or whether there is possibly another reason?

I wonder if the write-balancing code should bias more aggressively. This condition should be expected if, say, one has a system that is 80% full, adds another rack of disks, and does not touch the existing data. Having speed slow to a crawl a month later is a bit unexpected.

Thanks,
Tuomas
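The 'emergency bias' proposed above can be sketched roughly as follows. This is a hypothetical illustration of the idea, not the actual ZFS allocator: the function name `choose_vdev`, the free-space weighting and the 90% `full_threshold` default are all assumptions made for the example.

```python
import random
from typing import Optional, Sequence

# Hypothetical sketch of an "emergency bias" for vdev selection.
# This is NOT the real ZFS logic; names and the threshold are assumptions.


def choose_vdev(free: Sequence[float], capacity: Sequence[float],
                full_threshold: float = 0.90,
                rng: Optional[random.Random] = None) -> int:
    """Pick a vdev index for the next write.

    Normal case: weight each vdev by its free space, so emptier vdevs
    receive proportionally more writes. Emergency case: vdevs whose fill
    fraction has reached full_threshold are skipped entirely, unless
    every vdev has reached it.
    """
    rng = rng or random.Random()
    candidates = [
        i for i in range(len(free))
        if 1 - free[i] / capacity[i] < full_threshold
    ]
    if not candidates:
        # Every vdev is past the threshold; fall back to all of them.
        candidates = list(range(len(free)))
    weights = [free[i] for i in candidates]
    if sum(weights) <= 0:
        raise ValueError("no free space left on any candidate vdev")
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Free space and total size (GiB) per raidz1, from the iostat output.
    free = [192, 621]
    capacity = [2.72 * 1024 + 192, 995 + 621]
    rng = random.Random(1)
    picks = [choose_vdev(free, capacity, rng=rng) for _ in range(10_000)]
    # The first raidz1 is about 94% full, so with the default 90%
    # threshold it is never chosen and this prints 0.
    print("writes to the nearly full vdev:", picks.count(0))
```

With the iostat figures (192G and 621G free), plain free-space weighting would still send about 24% of writes (192 / 813) to the nearly full vdev; the threshold removes it from consideration altogether. Lowering `full_threshold` corresponds to the trigger occurring earlier, at the cost of leaving more space on the older vdev unused for new data.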