bugzilla-noreply at freebsd.org
2018-Jul-11 11:10 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
Bug ID: 229694
Summary: [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
Product: Base System
Version: 11.2-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: stable at FreeBSD.org
Reporter: eugen at freebsd.org
CC: fs at FreeBSD.org
Hi!
"zpool scrub" may hang in an uninterruptible disk I/O state when pool data is
damaged, on 11.2-STABLE/amd64 r335757. This is easily reproducible using a
file-backed ZFS pool whose backing files reside on another ("real") pool:
cd dir # resides on ZFS
size=100
rm -f vdev1 vdev2
truncate -s ${size}m vdev1 vdev2
zpool create ztest $(realpath vdev1)
zpool add ztest $(realpath vdev2)
# simulate data corruption
dd if=/dev/urandom of=vdev2 bs=1m count=${size}
zpool scrub ztest
The last command "zpool scrub" always hangs here:
load: 0.53 cmd: zpool 2130 [tx->tx_sync_done_cv] 34.59r 0.00u 0.00s 0% 3692k
"kill -9" cannot kill it.
--
You are receiving this mail because:
You are the assignee for the bug.
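Since the last step of the reproduction above never returns and cannot be killed, it may be convenient to start it in the background with a time limit, so the invoking shell stays usable for diagnostics. The following is only a sketch: run_bounded is a hypothetical helper, not part of the report, and the exit code 124 for "still running" is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not from the report).
# Usage: run_bounded SECONDS COMMAND [ARGS...]
# Starts COMMAND in the background and polls it once per second.
# Returns COMMAND's exit status if it finishes within SECONDS,
# or 124 if it is still running. The PID is left in $pid.
run_bounded() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # kill -0 only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            echo "pid $pid still running after ${limit}s" >&2
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}
```

With the pool from the report, this could be used as `run_bounded 30 zpool scrub ztest`, followed by `procstat -kk "$pid"` if it returns 124. The helper does not try to kill the command, since the report shows that a stuck scrub ignores "kill -9" anyway.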
bugzilla-noreply at freebsd.org
2018-Jul-11 12:08 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
--- Comment #1 from Andriy Gapon <avg at FreeBSD.org> ---
I am not too surprised. The pool configuration is not redundant and the whole
top-level vdev is corrupted. I suspect that the scrub command needs to write
something to the pool to record the initial scrub state, and it quite likely
needs to perform a Read-Modify-Write to do so. The read fails and the pool gets
suspended. The zpool scrub command is then stuck waiting for confirmation that
the scrub has actually started.
procstat -kk -a would paint a fuller picture. Maybe there is something reported
in dmesg too, but I am not sure.
--
You are receiving this mail because:
You are the assignee for the bug.
bugzilla-noreply at freebsd.org
2018-Jul-11 12:41 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
Mark Linimon <linimon at FreeBSD.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|stable at FreeBSD.org       |fs at FreeBSD.org
                 CC|fs at FreeBSD.org           |
--
You are receiving this mail because:
You are the assignee for the bug.
bugzilla-noreply at freebsd.org
2018-Jul-11 13:58 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
Eugene Grosbein <eugen at freebsd.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stable at FreeBSD.org
--- Comment #2 from Eugene Grosbein <eugen at freebsd.org> ---
(In reply to Andriy Gapon from comment #1)
Nothing in the dmesg output. Procstat output is huge, so I compressed it, see
attachment.
--
You are receiving this mail because:
You are on the CC list for the bug.
bugzilla-noreply at freebsd.org
2018-Jul-11 13:58 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
--- Comment #3 from Eugene Grosbein <eugen at freebsd.org> ---
Created attachment 195052
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=195052&action=edit
procstat -kk -a output
--
You are receiving this mail because:
You are on the CC list for the bug.
bugzilla-noreply at freebsd.org
2019-Feb-13 02:00 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
--- Comment #4 from Rodney W. Grimes <rgrimes at FreeBSD.org> ---
Please do not put bugs on stable@, current@, hackers@, etc.
--
You are receiving this mail because:
You are on the CC list for the bug.
bugzilla-noreply at freebsd.org
2019-Feb-13 10:10 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694
--- Comment #5 from Andriy Gapon <avg at FreeBSD.org> ---
(In reply to Eugene Grosbein from comment #3)
5 101937 zfskern txg_thread_enter mi_switch+0xc5
    sleepq_wait+0x2c _cv_wait+0x160 zio_resume_wait+0x4b spa_sync+0xd46
    txg_sync_thread+0x25e fork_exit+0x75 fork_trampoline+0xe

3249 101681 zpool - mi_switch+0xc5
    sleepq_wait+0x2c _cv_wait+0x160 txg_wait_synced+0xa5 dsl_sync_task_common+0x219
    dsl_sync_task+0x14 dsl_scan+0x9e zfs_ioc_pool_scan+0x5a zfsdev_ioctl+0x6c2
    devfs_ioctl_f+0x12d kern_ioctl+0x212 sys_ioctl+0x15c amd64_syscall+0x25c
    fast_syscall_common+0x101
So, unfortunately, this is how ZFS works now.
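
The two stacks quoted above appear to be the relevant ones: the sync thread is sleeping in zio_resume_wait and the zpool process in txg_wait_synced. Because the full "procstat -kk -a" output is large, a small filter can pull such threads out of a saved copy. This is only a sketch: find_blocked and the file name procstat.txt are hypothetical, and it assumes the raw procstat output keeps each thread on a single line (the wrapping above comes from the mail), with PID, TID and command name in the first three columns.

```shell
#!/bin/sh
# Hypothetical helper (not from the report).
# Usage: find_blocked FILE
# Prints "PID TID COMM" for every thread in a saved "procstat -kk -a"
# output whose kernel stack contains one of the two wait functions
# seen in comment #5.
find_blocked() {
    awk '/txg_wait_synced|zio_resume_wait/ { print $1, $2, $3 }' "$1"
}
```

For example, after `procstat -kk -a > procstat.txt`, running `find_blocked procstat.txt` would list only the threads waiting in those two functions.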
--
You are receiving this mail because:
You are on the CC list for the bug.