Radu Radutiu
2016-Dec-05 10:28 UTC
[CentOS] Huge write amplification with thin provisioned logical volumes
Hi,

I've noticed a huge write amplification problem with thinly provisioned logical volumes, and I wonder if anyone can explain why it happens and whether and how it can be fixed. The behavior is the same on CentOS 6.8 and CentOS 7.2.

I have an NVMe card (Intel DC P3600, 2 TB) on which I create a thinly provisioned logical volume:

    pvcreate /dev/nvme0n1
    vgcreate vgg /dev/nvme0n1
    lvcreate -l100%FREE -T vgg/thinpool
    lvcreate -V40000M -T vgg/thinpool -n brick1
    mkfs.xfs /dev/vgg/brick1

If I run a write test

    dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync

I see in iotop that the actual disk write is 30 times the amount of data that I'm actually writing to disk:

    Total DISK READ: 0.00 B/s | Total DISK WRITE: 1001.23 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:53 34453 be/4 root   0.00 B/s   30.34 M/s  0.00 %  12.10 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 991.92 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:54 34453 be/4 root   0.00 B/s   30.05 M/s  0.00 %  12.63 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 1024.52 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:55 34453 be/4 root   0.00 B/s   31.05 M/s  0.00 %  12.49 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    10:59:55  1057 be/3 root   0.00 B/s   15.39 K/s  0.00 %   0.01 %  [jbd2/sda1-8]
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 967.60 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:56 34453 be/4 root   0.00 B/s   29.32 M/s  0.00 %  12.75 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 943.66 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:58 34453 be/4 root   0.00 B/s   28.60 M/s  0.00 %  11.79 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    10:59:58 34448 be/4 root   0.00 B/s    3.84 K/s  0.00 %   0.00 %  python /usr/sbin/iotop -o -b -t
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 959.40 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    10:59:59 34453 be/4 root   0.00 B/s   29.07 M/s  0.00 %  11.81 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    Total DISK READ: 0.00 B/s | Total DISK WRITE: 948.38 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    11:00:00 34453 be/4 root   0.00 B/s   28.73 M/s  0.00 %  11.57 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync

For a 30 MB/s write at the application level I get around 1000 MB/s of writes at the device level, i.e. a 33x amplification.

On CentOS 6, if I try to align the data using the values from
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/
I get only a 7x amplification. On CentOS 7 I see the same 7x amplification using the default lvcreate options. This is the CentOS 7 iotop output:

    12:48:29 Total DISK READ : 0.00 B/s | Total DISK WRITE : 32.24 M/s
    12:48:29 Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 226.63 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    12:48:29 15234 be/3 root   0.00 B/s    3.80 K/s  0.00 %  35.20 %  [jbd2/dm-8-8]
    12:48:29 15258 be/4 root   0.00 B/s   32.24 M/s  0.00 %  10.64 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    12:48:29 14870 be/4 root   0.00 B/s    0.00 B/s  0.00 %   0.05 %  [kworker/u80:1]
    12:48:29 15240 be/4 root   0.00 B/s    0.00 B/s  0.00 %   0.03 %  [kworker/u80:2]
    12:48:29 15255 be/4 root   0.00 B/s    3.80 K/s  0.00 %   0.00 %  python /usr/sbin/iotop -o -b -t
    12:48:30 Total DISK READ : 0.00 B/s | Total DISK WRITE : 31.97 M/s
    12:48:30 Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 224.85 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    12:48:30 15234 be/3 root   0.00 B/s    0.00 B/s  0.00 %  35.14 %  [jbd2/dm-8-8]
    12:48:30 15258 be/4 root   0.00 B/s   31.97 M/s  0.00 %  10.61 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync
    12:48:30 14870 be/4 root   0.00 B/s    0.00 B/s  0.00 %   0.05 %  [kworker/u80:1]
    12:48:30 15240 be/4 root   0.00 B/s    0.00 B/s  0.00 %   0.03 %  [kworker/u80:2]
    12:48:31 Total DISK READ : 0.00 B/s | Total DISK WRITE : 32.50 M/s
    12:48:31 Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 228.94 M/s
        TIME   TID PRIO USER  DISK READ  DISK WRITE  SWAPIN       IO  COMMAND
    12:48:31 15234 be/3 root   0.00 B/s    0.00 B/s  0.00 %  35.28 %  [jbd2/dm-8-8]
    12:48:31 15258 be/4 root   0.00 B/s   32.48 M/s  0.00 %  10.72 %  dd if=/dev/zero of=./zero.img bs=4k count=100000 oflag=dsync

Even a 7x write amplification still seems too much. Has anyone seen this, or does anyone have an explanation for it? I am rewriting the same file with dd multiple times, so the filesystem and thin LVM should be using already provisioned space.

Best regards,
Radu
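
The 33x and 7x figures quoted above can be reproduced from the iotop samples with a few lines of Python. This is only a sketch of the arithmetic: the helper names `parse_rate` and `amplification` are made up for illustration, and it assumes iotop's K/M/G suffixes are powers of 1024 (the ratio does not depend on that choice as long as both rates use the same unit). For the first run it follows the reading in the message, that "Total DISK WRITE" is the device-level rate; for the CentOS 7 run it uses "Actual DISK WRITE".

```python
import re

# Assumption: iotop's B/K/M/G suffixes are powers of 1024.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_rate(text):
    """Convert an iotop rate string such as '30.34 M/s' to bytes per second."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([BKMG])/s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised rate: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def amplification(app_rate, device_rate):
    """Ratio of the device-level write rate to the application write rate."""
    return parse_rate(device_rate) / parse_rate(app_rate)


# (rate reported for dd, device-level rate), taken from the first sample
# of each iotop run quoted in the message.
samples = {
    "first run (Total DISK WRITE vs. dd)": ("30.34 M/s", "1001.23 M/s"),
    "CentOS 7 run (Actual DISK WRITE vs. dd)": ("32.24 M/s", "226.63 M/s"),
}

for label, (app, device) in samples.items():
    print(f"{label}: {amplification(app, device):.1f}x")
```

With the first sample of each run this prints roughly 33.0x and 7.0x, which matches the ratios stated in the message.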