Thomas Maier-Komor
2006-Oct-24 21:58 UTC
[zfs-discuss] zpool snapshot fails on unmounted filesystem
Is this a known problem/bug?

$ zfs snapshot zpool/loud@now
internal error: unexpected error 16 at line 2302 of ../common/libzfs_dataset.c

This occurred on:

$ uname -a
SunOS azalin 5.10 Generic_118833-24 sun4u sparc SUNW,Sun-Blade-2500

This message posted from opensolaris.org
Frank Cusack
2006-Oct-25 01:00 UTC
[zfs-discuss] zpool snapshot fails on unmounted filesystem
On October 24, 2006 2:58:58 PM -0700 Thomas Maier-Komor <thomas at maier-komor.de> wrote:
> Is this a known problem/bug?
>
> $ zfs snapshot zpool/loud@now
> internal error: unexpected error 16 at line 2302 of
> ../common/libzfs_dataset.c

I had this problem also. I think the answer was to unmount the filesystem.

-frank
Tim Foster
2006-Oct-25 13:38 UTC
[zfs-discuss] zpool snapshot fails on unmounted filesystem
hi Thomas,

On Tue, 2006-10-24 at 14:58 -0700, Thomas Maier-Komor wrote:
> Is this a known problem/bug?

Yep, that sounds a bit like 6482985.

> $ zfs snapshot zpool/loud@now
> internal error: unexpected error 16 at line 2302 of ../common/libzfs_dataset.c
>
> this occurred on:
> $ uname -a
> SunOS azalin 5.10 Generic_118833-24 sun4u sparc SUNW,Sun-Blade-2500

I wasn't able to reproduce this on similar bits, nor on recent s10 bits
(ultimately destined for s10_u3) or nevada bits.

Do you have a consistently reproducible test case?

cheers,
tim

--
Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops
http://blogs.sun.com/timf
Thomas Maier-Komor
2006-Oct-26 22:08 UTC
[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem
Hi Tim,

I just retried to reproduce it to generate a reliable test case. Unfortunately, I cannot reproduce the error message, so I really have no idea what might have caused it.

Sorry,
Tom
Jürgen Keil
2006-Oct-27 08:40 UTC
[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem
> I just retried to reproduce it to generate a reliable
> test case. Unfortunately, I cannot reproduce the
> error message. So I really have no idea what might
> have caused it....

I also had this problem 2-3 times in the past, but I cannot reproduce it.

===================================================================

Using dtrace against the kernel, I found out that the source of the EBUSY error 16 is the kernel function zil_suspend():

    ...
    0          <- dnode_cons                   0
    0          -> dnode_setdblksz
    0          <- dnode_setdblksz              14
    0          -> dmu_zfetch_init
    0            -> list_create
    0            <- list_create                3734548404
    0            -> rw_init
    0            <- rw_init                    3734548400
    0          <- dmu_zfetch_init              3734548400
    0          -> list_insert_head
    0          <- list_insert_head             3734548052
    0        <- dnode_create                   3734548048
    0      <- dnode_special_open               3734548048
    0      -> dsl_dataset_set_user_ptr
    0      <- dsl_dataset_set_user_ptr         0
    0    <- dmu_objset_open_impl               0
    0  <- dmu_objset_open                      0
    0  -> dmu_objset_zil
    0  <- dmu_objset_zil                       3700903200
    0  -> zil_suspend
    0   | zil_suspend:entry                    zh_claim_txg: 83432
    0  <- zil_suspend                          16
    0  -> dmu_objset_close
    0    -> dsl_dataset_close
    0      -> dbuf_rele
    0        -> dbuf_evict_user
    0          -> dsl_dataset_evict
    0            -> unique_remove
    ...

    /*
     * Suspend an intent log.  While in suspended mode, we still honor
     * synchronous semantics, but we rely on txg_wait_synced() to do it.
     * We suspend the log briefly when taking a snapshot so that the snapshot
     * contains all the data it's supposed to, and has an empty intent log.
     */
    int
    zil_suspend(zilog_t *zilog)
    {
            const zil_header_t *zh = zilog->zl_header;
            lwb_t *lwb;

            mutex_enter(&zilog->zl_lock);
            if (zh->zh_claim_txg != 0) {            /* unplayed log */
                    mutex_exit(&zilog->zl_lock);
                    return (EBUSY);
            }
    ...

===================================================================

It seems that you can identify zfs filesystems that fail zfs snapshot with error 16 EBUSY using

    zdb -iv {your_zpool_here} | grep claim_txg

If there are any ZIL headers listed with a claim_txg != 0, the dataset that uses this ZIL should fail zfs snapshot with error 16, EBUSY.
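
The zdb check described above can be wrapped in a small filter so that only entries with a non-zero claim_txg are shown. This is a minimal sketch and was not posted in the thread: the function name nonzero_claim_txg and the pool name "tank" are placeholders, and it assumes that each ZIL header line printed by zdb -iv contains the token claim_txg followed by a decimal value, and that a POSIX awk is available.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read `zdb -iv <pool>` output on
# stdin and print only the lines whose claim_txg value is non-zero.
# Assumption: the value is the first run of digits after the token
# "claim_txg" on the same line.
nonzero_claim_txg() {
    awk 'match($0, /claim_txg[^0-9]*[0-9]+/) {
        # Isolate "claim_txg<separator><digits>" and strip everything
        # up to the digits.
        txg = substr($0, RSTART, RLENGTH)
        sub(/^claim_txg[^0-9]*/, "", txg)
        # A non-zero claim_txg is the condition under which
        # zil_suspend() returns EBUSY in the code quoted above.
        if (txg + 0 != 0) print
    }'
}

# Usage ("tank" is a placeholder pool name):
#   zdb -iv tank | nonzero_claim_txg
```

Any line this prints belongs to a ZIL that, by the reasoning above, would be expected to make zfs snapshot fail with error 16. No output suggests no dataset in the pool is in that state, provided the assumption about the output format holds.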
Tim Foster
2006-Oct-27 09:15 UTC
[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem
On Fri, 2006-10-27 at 01:40 -0700, Jürgen Keil wrote:
> Using dtrace against the kernel, I found out that the source
> of the EBUSY error 16 is the kernel function zil_suspend():
. .
> It seems that you can identify zfs filesystems that fail
> zfs snapshot with error 16 EBUSY using
>
> zdb -iv {your_zpool_here} | grep claim_txg
>
> If there are any ZIL headers listed with a claim_txg != 0, the
> dataset that uses this ZIL should fail zfs snapshot with
> error 16, EBUSY.

Thanks Jürgen, I'll add your comments to 6482985 in case they help with
the evaluation. I'll also keep an eye out for those pools during testing.

cheers,
tim

--
Tim Foster, Sun Microsystems Inc, Operating Platforms Group
Engineering Operations
http://blogs.sun.com/timf