bugzilla-noreply at freebsd.org
2018-Jul-11 11:10 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229694

            Bug ID: 229694
           Summary: [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
           Product: Base System
           Version: 11.2-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: stable at FreeBSD.org
          Reporter: eugen at freebsd.org
                CC: fs at FreeBSD.org

Hi!

"zpool scrub" may hang in an uninterruptible disk I/O state when pool data is damaged, on 11.2-STABLE/amd64 r335757. This is easily reproducible using a file-backed ZFS pool whose backing files reside on another ("real") pool:

cd dir # resides on ZFS
size=100
rm -f vdev1 vdev2
truncate -s ${size}m vdev1 vdev2
zpool create ztest $(realpath vdev1)
zpool add ztest $(realpath vdev2)
# simulate data corruption
dd if=/dev/urandom of=vdev2 bs=1m count=${size}
zpool scrub ztest

The last command, "zpool scrub", always hangs here:

load: 0.53 cmd: zpool 2130 [tx->tx_sync_done_cv] 34.59r 0.00u 0.00s 0% 3692k

"kill -9" cannot kill it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
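The bracketed field in the status line quoted in the report is the wait channel the process is sleeping on, and it is what identifies this particular hang. A minimal sketch of pulling it out of such a line follows. The helper name wchan_from_siginfo is hypothetical, and the line layout is assumed to match the one quoted in the report.

```sh
# wchan_from_siginfo: print the bracketed wait channel from a status line.
# Hypothetical helper; assumes the layout "load: ... cmd: NAME PID [WCHAN] ...".
wchan_from_siginfo() {
    # Keep only the text between the first "[" and the next "]".
    # Prints nothing when the line has no bracketed field.
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*$/\1/p'
}
```

For example, passing the line from the report as the single argument should print the tx->tx_sync_done_cv channel named in the bug summary.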
bugzilla-noreply at freebsd.org
2018-Jul-11 12:08 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
--- Comment #1 from Andriy Gapon <avg at FreeBSD.org> ---
I am not too surprised. The pool configuration is not redundant and the whole top-level vdev is corrupted.

I suspect that the scrub command needs to write something to the pool to record the initial scrub state, and quite likely that write requires a read-modify-write. The read fails and the pool gets suspended, so the zpool scrub command is stuck waiting for confirmation that the scrub has actually started.

procstat -kk -a would paint a fuller picture. Maybe there is something reported in dmesg too, but I am not sure.
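Since "procstat -kk -a" dumps the kernel stack of every thread on the system, it can help to narrow the output to the commands of interest before reading it. The following is only a sketch: the function name threads_of is hypothetical, and it assumes the column order PID, TID, COMM, TDNAME, KSTACK with a COMM value that contains no spaces.

```sh
# threads_of: read "procstat -kk -a" output on stdin and print the header
# line (if present) plus every thread whose COMM column equals one of the
# names given as arguments.
# Assumption: columns are PID TID COMM TDNAME KSTACK, and COMM has no spaces.
threads_of() {
    awk -v names="$*" '
        BEGIN { n = split(names, want, " ") }
        NR == 1 && $1 == "PID" { print; next }
        { for (i = 1; i <= n; i++) if ($3 == want[i]) { print; break } }
    '
}
```

A possible invocation would be "procstat -kk -a | threads_of zpool zfskern", which keeps the hung command and the ZFS kernel threads and drops everything else.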
bugzilla-noreply at freebsd.org
2018-Jul-11 12:41 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
Mark Linimon <linimon at FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|stable at FreeBSD.org       |fs at FreeBSD.org
                 CC|fs at FreeBSD.org           |
bugzilla-noreply at freebsd.org
2018-Jul-11 13:58 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
Eugene Grosbein <eugen at freebsd.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stable at FreeBSD.org

--- Comment #2 from Eugene Grosbein <eugen at freebsd.org> ---
(In reply to Andriy Gapon from comment #1)

Nothing in the dmesg output. The procstat output is huge, so I compressed it; see the attachment.
bugzilla-noreply at freebsd.org
2018-Jul-11 13:58 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
--- Comment #3 from Eugene Grosbein <eugen at freebsd.org> ---
Created attachment 195052
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=195052&action=edit
procstat -kk -a output
bugzilla-noreply at freebsd.org
2019-Feb-13 02:00 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
--- Comment #4 from Rodney W. Grimes <rgrimes at FreeBSD.org> ---
Please do not put bugs on stable@, current@, hackers@, etc.
bugzilla-noreply at freebsd.org
2019-Feb-13 10:10 UTC
[Bug 229694] [zfs] unkillable "zpool scrub" in [tx->tx_sync_done_cv] state for damaged data
--- Comment #5 from Andriy Gapon <avg at FreeBSD.org> ---
(In reply to Eugene Grosbein from comment #3)

    5 101937 zfskern txg_thread_enter mi_switch+0xc5 sleepq_wait+0x2c _cv_wait+0x160 zio_resume_wait+0x4b spa_sync+0xd46 txg_sync_thread+0x25e fork_exit+0x75 fork_trampoline+0xe

 3249 101681 zpool - mi_switch+0xc5 sleepq_wait+0x2c _cv_wait+0x160 txg_wait_synced+0xa5 dsl_sync_task_common+0x219 dsl_sync_task+0x14 dsl_scan+0x9e zfs_ioc_pool_scan+0x5a zfsdev_ioctl+0x6c2 devfs_ioctl_f+0x12d kern_ioctl+0x212 sys_ioctl+0x15c amd64_syscall+0x25c fast_syscall_common+0x101

So, unfortunately, this is how ZFS works now.
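The two stacks in comment #5 show the pattern: the sync thread sits in zio_resume_wait under spa_sync, while the zpool process sits in txg_wait_synced waiting for that sync to finish. A rough way to look for the same pair in saved procstat output is sketched below. The function name pool_sync_stalled is hypothetical, and matching on these two symbol names is only a heuristic drawn from the stacks quoted in this thread, not an official diagnostic.

```sh
# pool_sync_stalled: read "procstat -kk -a" output on stdin and succeed
# (exit status 0) only if one line contains zio_resume_wait and some line
# contains txg_wait_synced.
# Heuristic: both symbol names are taken from the stacks quoted in comment #5.
pool_sync_stalled() {
    awk '
        /zio_resume_wait/ { sync_blocked = 1 }
        /txg_wait_synced/ { waiter = 1 }
        END { exit !(sync_blocked && waiter) }
    '
}
```

It could be used as "pool_sync_stalled < procstat.txt && echo 'sync thread and waiter both blocked'", where procstat.txt is an assumed file name for the saved output.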