Hi All,

I believe this has been asked before, but I wasn't able to find much information on the subject. Long story short: I was moving data around on a storage zpool of mine and a zfs destroy <filesystem> hung (or so I thought). This pool has had dedup turned on at times while imported as well; it's running on a Nexenta Core 3.0.1 box (snv_134f).

The first time the machine was rebooted, it hung at the "Loading ZFS filesystems" line after loading the kernel; I booted the box with all drives unplugged and exported the pool. The machine was rebooted again, and now the pool is hanging on import (zpool import -Fn Nalgene). I'm using "0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try to view what the import process is doing. I'm not a hard-core ZFS/Solaris dev, so I don't know if I'm reading the output correctly, but it appears that ZFS is continuing to delete a snapshot/FS from before (reading from the top down):

stack pointer for thread ffffff01ce408e00: ffffff0008f2b1f0
[ ffffff0008f2b1f0 _resume_from_idle+0xf1() ]
  ffffff0008f2b220 swtch+0x145()
  ffffff0008f2b250 cv_wait+0x61()
  ffffff0008f2b2a0 txg_wait_open+0x7a()
  ffffff0008f2b2e0 dmu_tx_wait+0xb3()
  ffffff0008f2b320 dmu_tx_assign+0x4b()
  ffffff0008f2b3b0 dmu_free_long_range_impl+0x12b()
  ffffff0008f2b400 dmu_free_object+0xe6()
  ffffff0008f2b710 dsl_dataset_destroy+0x122()
  ffffff0008f2b740 dsl_destroy_inconsistent+0x5f()
  ffffff0008f2b770 findfunc+0x23()
  ffffff0008f2b850 dmu_objset_find_spa+0x38c()
  ffffff0008f2b930 dmu_objset_find_spa+0x153()
  ffffff0008f2b970 dmu_objset_find+0x40()
  ffffff0008f2ba40 spa_load_impl+0xb23()
  ffffff0008f2bad0 spa_load+0x117()
  ffffff0008f2bb50 spa_load_best+0x78()
  ffffff0008f2bbf0 spa_import+0xee()
  ffffff0008f2bc40 zfs_ioc_pool_import+0xc0()
  ffffff0008f2bcc0 zfsdev_ioctl+0x177()
  ffffff0008f2bd00 cdev_ioctl+0x45()
  ffffff0008f2bd40 spec_ioctl+0x5a()
  ffffff0008f2bdc0 fop_ioctl+0x7b()
  ffffff0008f2bec0 ioctl+0x18e()
  ffffff0008f2bf10 sys_syscall32+0xff()

I have this in a loop running every 15 secs, and I'll occasionally see some ddt_* lines as well (current dedup ratio is 1.05). The ratio was about 1.09 when I started the import (from zdb -e Nalgene); is the system doing something special, or is this just ZFS destroying the pending-deletion data and causing the ratio to change?

As for the import, is there any estimate I can make of how long the process will take? I've had it running since Saturday morning (~36 hours now) through a couple of system lockups. The zpool is a 7-disk raidz2 (5 TB usable, 2 TB used) with 4 GB of RAM (8 GB arriving tomorrow, which I'll put to use) running on an AMD Phenom II X4 processor.

Thanks in advance!

--
Stephen Repetski
BS Applied Networking and Systems Administration, 2013
Rochester Institute of Technology, Thomas Jefferson HS S&T
skr3394 at rit.edu | srepetsk at srepetsk.net
http://srepetsk.net
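For reference, a minimal sketch of the 15-second stack-sampling loop described above, assuming the hanging zpool import is PID 2761 as in the quoted mdb pipeline (mdb -k reads the live kernel, so this must run as root):

    #!/bin/sh
    # Print the kernel stacks of the import process every 15 seconds.
    # The PID (2761) is taken from the message above; substitute your own.
    while true; do
        echo "0t2761::pid2proc | ::walk thread | ::findstack" | mdb -k
        sleep 15
    done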
On 01/18/11 04:00 PM, Repetski, Stephen wrote:
> Hi All,
>
> I believe this has been asked before, but I wasn't able to find too
> much information about the subject. Long story short, I was moving
> data around on a storage zpool of mine and a zfs destroy <filesystem>
> hung (or so I thought). This pool had dedup turned on at times while
> imported as well; it's running on a Nexenta Core 3.0.1 box (snv_134f).
>
> The first time the machine was rebooted, it hung at the "Loading ZFS
> filesystems" line after loading the kernel; I booted the box with all
> drives unplugged and exported the pool. The machine was rebooted, and
> now the pool is hanging on import (zpool import -Fn Nalgene). I'm
> using "0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try
> and view what the import process is doing, but I'm not a hard-core
> ZFS/Solaris dev so I don't know if I'm reading the output correctly,
> but it appears that ZFS is continuing to delete a snapshot/FS from
> before (reading from the top down):

What does "zpool iostat <pool> 10" show?

If you have a lot of deduped data and not a lot of RAM (or a cache
device), it can take a very long time to destroy a filesystem. You will
see a lot of reads and not many writes if this is happening.

--
Ian.
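A minimal sketch of the check Ian suggests, using the pool name from this thread (Nalgene); while a deduped dataset is being destroyed, the read columns would stay high and the write columns would stay near zero:

    # Print pool-wide bandwidth and operations every 10 seconds.
    zpool iostat Nalgene 10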
On Mon, Jan 17, 2011 at 22:08, Ian Collins <ian at ianshome.com> wrote:
> On 01/18/11 04:00 PM, Repetski, Stephen wrote:
>> [original question snipped]
>
> What does "zpool iostat <pool> 10" show?
>
> If you have a lot of deduped data and not a lot of RAM (or a cache
> device), it can take a very long time to destroy a filesystem. You will
> see a lot of reads and not many writes if this is happening.
>
> --
> Ian.

Zpool iostat itself hangs, but iostat does show me one drive in
particular causing some issues - http://pastebin.com/6rJG3qV9 - %w and %b
drop to ~50 and ~90, respectively, when mdb shows ZFS doing some
deduplication work (http://pastebin.com/EMPYy5Rr). As you said, the pool
is mostly reading data and not writing much. I should be able to switch
that drive to another controller (it's currently on a PCI SATA adapter)
and see what iostat reports then. Until then, I'll keep the zpool import
running and see what the box does...

Thanks,
Trey

--
Stephen Repetski
BS Applied Networking and Systems Administration, 2013
Rochester Institute of Technology, Thomas Jefferson HS S&T
skr3394 at rit.edu | srepetsk at srepetsk.net
http://srepetsk.net
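The per-device numbers referenced above (the pastebin output) come from iostat's extended statistics; a sketch of how to watch %w and %b per drive, assuming a 10-second interval:

    # -x: extended per-device statistics (including %w and %b)
    # -n: show logical device names (c0t0d0 style)
    iostat -xn 10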
On 01/18/11 05:22 PM, Repetski, Stephen wrote:
> On Mon, Jan 17, 2011 at 22:08, Ian Collins <ian at ianshome.com
> <mailto:ian at ianshome.com>> wrote:
>> [earlier discussion snipped]
>
> Zpool iostat itself hangs,

If you are running it as root, try another user. I don't know about
recent builds, but zpool commands are way slower as root on Solaris 10.

--
Ian.
On Jan 17, 2011, at 8:22 PM, Repetski, Stephen wrote:
> On Mon, Jan 17, 2011 at 22:08, Ian Collins <ian at ianshome.com> wrote:
>> [earlier discussion snipped]
>
> Zpool iostat itself hangs, but iostat does show me one drive in
> particular causing some issues - http://pastebin.com/6rJG3qV9 - %w and
> %b drop to ~50 and ~90, respectively, when mdb shows ZFS doing some
> deduplication work (http://pastebin.com/EMPYy5Rr). As you said, the
> pool is mostly reading data and not writing much. I should be able to
> switch that drive to another controller (currently on a PCI SATA
> adapter) and see what iostat reports then.

%w should be near 0 for most cases. Until you solve that problem,
everything will be slow.
 -- richard
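Following Richard's point, one way to confirm the wait queue has cleared after moving the drive off the PCI SATA adapter; the device name below is hypothetical, so substitute the drive flagged in the pastebin output:

    # Watch only the header line and the suspect drive; a healthy device
    # should show %w near 0 once the controller problem is resolved.
    iostat -xn 10 | egrep 'device|c4t0d0'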