thr3ads.net - freebsd stable - zfs related panic [Jun 2009]

Andriy Gapon

2009-Jun-12 13:56 UTC

zfs related panic

This is on a recent stable/7 amd64, with zpool and filesystems upgraded to the
latest version.
I did zfs rollback xxx@yyy
And then did ls on a directory in the rolled-back fs.

I have the core file if it can be of any help.

Sleeping thread (tid 100263, pid 2432) owns a non-sleepable lock
sched_switch() at 0xffffffff8031d0ef = sched_switch+0x47d
mi_switch() at 0xffffffff80302a59 = mi_switch+0x1bf
sleepq_switch() at 0xffffffff8032f645 = sleepq_switch+0xd8
sleepq_catch_signals() at 0xffffffff8032f925 = sleepq_catch_signals+0x2db
sleepq_wait_sig() at 0xffffffff80330219 = sleepq_wait_sig+0xc
_sleep() at 0xffffffff80302eba = _sleep+0x2b5
kern_sigsuspend() at 0xffffffff802fc567 = kern_sigsuspend+0xeb
sigsuspend() at 0xffffffff802fc5e9 = sigsuspend+0x34
syscall() at 0xffffffff80491d2d = syscall+0x347
Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab
--- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80092ce3c, rsp = 0x7fffffffdee8, rbp = 0x8011e5a60 ---
panic: sleeping thread
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff80192dd5 = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff80327ea7 = kdb_backtrace+0x32
panic() at 0xffffffff802fb70c = panic+0x1b0
propagate_priority() at 0xffffffff80332e92 = propagate_priority+0x122
turnstile_wait() at 0xffffffff80333e29 = turnstile_wait+0x358
_mtx_lock_sleep() at 0xffffffff802ed64a = _mtx_lock_sleep+0x117
cache_lookup() at 0xffffffff8036a52a = cache_lookup+0x632
vfs_cache_lookup() at 0xffffffff8036a69f = vfs_cache_lookup+0xab
VOP_LOOKUP_APV() at 0xffffffff804c86f3 = VOP_LOOKUP_APV+0x51
lookup() at 0xffffffff80370a71 = lookup+0x5d8
namei() at 0xffffffff8037168f = namei+0x320
kern_lstat() at 0xffffffff8037f6ca = kern_lstat+0x5e
lstat() at 0xffffffff8037f8c9 = lstat+0x25
syscall() at 0xffffffff80491d2d = syscall+0x347
Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab
--- syscall (190, FreeBSD ELF64, lstat), rip = 0x80095afbc, rsp = 0x7fffffffdde8, rbp = 0x800b50270 ---
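
To compare the two stacks in a trace like this one (the sleeping lock owner and the panicking thread), the frame lines can be pulled apart mechanically. The following is only a minimal sketch: it assumes the DDB frame layout seen above (`func() at 0xADDR = func+0xOFF`), and the names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of any FreeBSD tool.

```python
import re

# Matches DDB frame lines of the form seen in the trace, e.g.
#   cache_lookup() at 0xffffffff8036a52a = cache_lookup+0x632
# An optional leading "> " is tolerated so quoted mail text also parses.
FRAME_RE = re.compile(
    r"^(?:>\s*)?(?P<func>\w+)\(\) at 0x(?P<addr>[0-9a-f]+)"
    r" = (?P=func)\+0x(?P<off>[0-9a-f]+)$"
)


def parse_frames(text):
    """Return a list of (function, address, offset) tuples, one per frame line.

    Lines that are not frame lines (panic messages, syscall summaries,
    blank lines) are skipped.
    """
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (
                    match.group("func"),
                    int(match.group("addr"), 16),
                    int(match.group("off"), 16),
                )
            )
    return frames
```

Subtracting the offset from the address of a frame gives the start address of that function, which can help when matching frames against a kernel built with debugging symbols.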

-- 
Andriy Gapon

Kip Macy

2009-Jun-12 20:54 UTC

zfs related panic

show sleepchain
show thread 100263

On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapon<avg@icyb.net.ua> wrote:
>
> This is on a recent stable/7 amd64, with zpool and filesystems upgraded to the
> latest version.
> I did zfs rollback xxx@yyy
> And then did ls on a directory in the rolled-back fs.
>
> I have the core file if it can be of any help.
>
> Sleeping thread (tid 100263, pid 2432) owns a non-sleepable lock
> sched_switch() at 0xffffffff8031d0ef = sched_switch+0x47d
> mi_switch() at 0xffffffff80302a59 = mi_switch+0x1bf
> sleepq_switch() at 0xffffffff8032f645 = sleepq_switch+0xd8
> sleepq_catch_signals() at 0xffffffff8032f925 = sleepq_catch_signals+0x2db
> sleepq_wait_sig() at 0xffffffff80330219 = sleepq_wait_sig+0xc
> _sleep() at 0xffffffff80302eba = _sleep+0x2b5
> kern_sigsuspend() at 0xffffffff802fc567 = kern_sigsuspend+0xeb
> sigsuspend() at 0xffffffff802fc5e9 = sigsuspend+0x34
> syscall() at 0xffffffff80491d2d = syscall+0x347
> Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab
> --- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80092ce3c, rsp = 0x7fffffffdee8, rbp = 0x8011e5a60 ---
> panic: sleeping thread
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff80192dd5 = db_trace_self_wrapper+0x2a
> kdb_backtrace() at 0xffffffff80327ea7 = kdb_backtrace+0x32
> panic() at 0xffffffff802fb70c = panic+0x1b0
> propagate_priority() at 0xffffffff80332e92 = propagate_priority+0x122
> turnstile_wait() at 0xffffffff80333e29 = turnstile_wait+0x358
> _mtx_lock_sleep() at 0xffffffff802ed64a = _mtx_lock_sleep+0x117
> cache_lookup() at 0xffffffff8036a52a = cache_lookup+0x632
> vfs_cache_lookup() at 0xffffffff8036a69f = vfs_cache_lookup+0xab
> VOP_LOOKUP_APV() at 0xffffffff804c86f3 = VOP_LOOKUP_APV+0x51
> lookup() at 0xffffffff80370a71 = lookup+0x5d8
> namei() at 0xffffffff8037168f = namei+0x320
> kern_lstat() at 0xffffffff8037f6ca = kern_lstat+0x5e
> lstat() at 0xffffffff8037f8c9 = lstat+0x25
> syscall() at 0xffffffff80491d2d = syscall+0x347
> Xfast_syscall() at 0xffffffff8047d00b = Xfast_syscall+0xab
> --- syscall (190, FreeBSD ELF64, lstat), rip = 0x80095afbc, rsp = 0x7fffffffdde8, rbp = 0x800b50270 ---
>
> --
> Andriy Gapon
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke

Peter Much

2009-Aug-11 21:13 UTC

zfs/panic: short after rollback

<kmacy@freebsd.org> aka Kip Macy wrote
on Fri, 12 Jun 2009 13:54:40 -0700 in m2n.fbsd.stable:

|show sleepchain
|show thread 100263
|
|On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapon<avg@icyb.net.ua> wrote:
|>
|> I did zfs rollback xxx@yyy
|> And then did ls on a directory in the rolled-back fs.

|> panic: sleeping thread

This is quite likely the same problem that I am experiencing,
and possibly also the same problem as in kern/137037 and kern/129148.

It seems to show up in a few different flavours, but the bottom line
is this:
do a rollback, and soon after (usually at the next filesystem-related
action) the kernel has gone fishing.

I first experienced it when doing a rollback of a mounted filesystem.
It crashed right after the first try, and did so reproducibly.
(Well, more or less reproducibly - another day under similar
circumstances it did not crash.)

Then I started thinking, and came to the conclusion that a rollback
of a mounted filesystem (with possibly open files) could easily bring 
a lot of things into an undefined state, and should not be something 
one wants to do normally. So maybe it is not supposed to work at all.

Anyway, when trying this, I get either the "sleeping thread"
message (as above), or a panic from _sx_xlock() (as shown in
my addendum to kern/137037, and in the addendum to kern/129148).

So I started to do rollbacks on unmounted filesystems (quite an
excessive number of them), and while this seemed to work at first,
later on the system failures reappeared.
These system failures took various shapes - I experienced immediate
resets without a dump, and system hangs.
When deliberately trying to reproduce that (after installing a
kernel with debugging info and watching the console), I also
captured a panic coming from _sx_xlock() - so it seems to be the
same problem as without unmounting, only that it takes a couple
of rollbacks (a dozen or more) to hit.
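
The repeated unmount / rollback / mount sequence described above could be scripted roughly as follows, for use on a scratch machine only. This is a sketch under assumptions: the dataset, snapshot and mountpoint names are placeholders, the helper names `rollback_cycle` and `run_cycles` are hypothetical, and the final `ls` stands in for "the next filesystem-related action". By default nothing is executed; the function only returns the commands it would run.

```python
import subprocess


def rollback_cycle(dataset, snapshot, mountpoint):
    """Return the commands for one unmount / rollback / mount / list cycle."""
    return [
        ["zfs", "unmount", dataset],
        ["zfs", "rollback", f"{dataset}@{snapshot}"],
        ["zfs", "mount", dataset],
        # Assumption: a directory listing serves as the follow-up
        # filesystem access after the rollback.
        ["ls", "-l", mountpoint],
    ]


def run_cycles(dataset, snapshot, mountpoint, count=12, dry_run=True):
    """Build (and, if dry_run is False, execute) `count` rollback cycles.

    Returns the full list of commands in the order they were issued.
    """
    issued = []
    for _ in range(count):
        for cmd in rollback_cycle(dataset, snapshot, mountpoint):
            issued.append(cmd)
            if not dry_run:
                # check=True stops the loop at the first failing command.
                subprocess.run(cmd, check=True)
    return issued
```

Calling `run_cycles("tank/test", "snap", "/tank/test")` with the default `dry_run=True` returns the command list for a dozen cycles without touching any pool; the names in that call are examples only.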

Overall, there was never any data loss or persistent damage.
So I consider rollback still functional and safe to use, but
I no longer consider a system production stable after it has
done a rollback.

rgds,
PMc
