Hello,

I'm using a sort of FreeBSD ZFS appliance with a custom API, and I'm suffering from huge timeouts when many (dozens, actually) concurrent zfs/zpool commands are issued (get/create/destroy/snapshot/clone mostly).

Are there any tunables that could help mitigate this?

Once I took part in reporting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203906 , but that time the issue got resolved somehow. Now I have another set of FreeBSD SANs and it's back. I've read https://wiki.freebsd.org/AndriyGapon/AvgZFSLocking and I realize this probably doesn't have a quick solution, but still...

Thanks.
Eugene.
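As a starting point for hunting tunables (just a generic sketch, not a recommendation of any particular knob), the ZFS-related sysctls and their descriptions can be listed with:

sysctl -d vfs.zfs | less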
On 09.09.2020 17:29, Eugene M. Zheganin wrote:
> Hello,
>
> I'm using a sort of FreeBSD ZFS appliance with a custom API, and I'm
> suffering from huge timeouts when many (dozens, actually)
> concurrent zfs/zpool commands are issued
> (get/create/destroy/snapshot/clone mostly).
>
> Are there any tunables that could help mitigate this?
>
> Once I took part in reporting
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203906 , but that
> time the issue got resolved somehow. Now I have another set of
> FreeBSD SANs and it's back. I've read
> https://wiki.freebsd.org/AndriyGapon/AvgZFSLocking and I realize this
> probably doesn't have a quick solution, but still...

This actually looks like this (sometimes it takes several [dozens of] minutes):

root@cg-mr-prod-stg09:/usr/ports/sysutils/smartmontools # zfs get volmode
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 16.38r 0.00u 0.00s 0% 3872k
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 16.59r 0.00u 0.00s 0% 3872k
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 16.76r 0.00u 0.00s 0% 3872k
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 16.90r 0.00u 0.00s 0% 3872k
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 17.04r 0.00u 0.00s 0% 3872k
load: 3.58  cmd: zfs 70231 [spa_namespace_lock] 17.17r 0.00u 0.00s 0% 3872k

root@cg-mr-prod-stg09:~ # ps ax | grep volmode
70231  5  D+      0:00.00 zfs get volmode
70233  6  S+      0:00.00 grep volmode

root@cg-mr-prod-stg09:~ # procstat -kk 70231
  PID    TID COMM                TDNAME              KSTACK
70231 101598 zfs                 -                   mi_switch+0xe2 sleepq_wait+0x2c _sx_xlock_hard+0x459 spa_all_configs+0x1aa zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x72e devfs_ioctl+0xad VOP_IOCTL_APV+0x7c vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x2be sys_ioctl+0x15d amd64_syscall+0x364 fast_syscall_common+0x101
root@cg-mr-prod-stg09:~ #

Thanks.
Eugene.
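For completeness, a minimal sketch for grabbing the kernel stacks of every zfs/zpool process in one go, to confirm they are all sleeping on spa_namespace_lock (the process names are just the commands from the report above):

# collect kernel stacks of all running zfs and zpool processes
for pid in $(pgrep zfs; pgrep zpool); do
    echo "=== pid $pid ==="
    procstat -kk "$pid"
done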
09.09.2020 19:29, Eugene M. Zheganin wrote:
> I'm using a sort of FreeBSD ZFS appliance with a custom API, and I'm suffering from huge timeouts when many (dozens, actually) concurrent zfs/zpool commands are issued (get/create/destroy/snapshot/clone mostly).
>
> Are there any tunables that could help mitigate this?
>
> Once I took part in reporting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203906 , but that time the issue got resolved somehow. Now I have another set of FreeBSD SANs and it's back. I've read https://wiki.freebsd.org/AndriyGapon/AvgZFSLocking and I realize this probably doesn't have a quick solution, but still...

I think this is some kind of bug/misfeature. As a work-around, try using "zfs destroy -d" instead of plain "zfs destroy".

I suffered from this too, when I used a ZFS pool over SSD only instead of HDD plus SSD for L2ARC, and the SSDs in use handled BIO_DELETE (trim) really poorly, with looong delays. Take a look at "gstat -adI3s" output to monitor the amount of delete operations and their timings.
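To make the suggestion concrete, a minimal sketch (the dataset name tank/vol@snap is only a placeholder): defer the destroy so the command returns quickly, then watch the delete load on the providers:

# -d marks the snapshot for deferred destruction instead of blocking
# until its blocks are actually freed
zfs destroy -d tank/vol@snap

# -a: only providers with activity, -d: show BIO_DELETE (trim) statistics,
# -I 3s: refresh every 3 seconds
gstat -adI3s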