In the last few days my performance has gone to hell.  I'm running:

# uname -a
SunOS nissan 5.11 snv_150 i86pc i386 i86pc

(I'll upgrade as soon as the desktop hang bug is fixed.)

The performance problems seem to be due to excessive I/O on the main
disk/pool.

The only things I've changed recently are that I've created and destroyed
a snapshot, and I used "zpool upgrade".

Here's what I'm seeing:

# zpool iostat rpool 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       13.3G   807M      7     85  15.9K   548K
rpool       13.3G   807M      3     89  1.60K   723K
rpool       13.3G   810M      5     91  5.19K   741K
rpool       13.3G   810M      3     94  2.59K   756K

Using iofileb.d from the DTrace Toolkit shows:

# iofileb.d
Tracing... Hit Ctrl-C to end.
^C
   PID CMD               KB FILE
     0 sched              6 <none>
     5 zpool-rpool     7770 <none>

zpool status doesn't show any problems:

# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3d0s0    ONLINE       0     0     0

Perhaps related to this or perhaps not, I discovered recently that
time-sliderd was doing a ton of "close" requests.  I disabled time-sliderd
while trying to solve my performance problem.

I was also getting these error messages in the time-sliderd log file:

Warning: Cleanup failed to destroy:
rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01
Details:
['/usr/bin/pfexec', '/usr/sbin/zfs', 'destroy', '-d',
'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01'] failed with exit code 1
cannot destroy 'rpool/ROOT@zfs-auto-snap_hourly-2010-11-10-15h01':
unsupported version

That was the reason I did the zpool upgrade.

I discovered that I had a *ton* of snapshots from time-slider that hadn't
been destroyed, over 6500 of them, presumably all because of this version
problem?

I manually removed all the snapshots and my performance returned to normal.

I don't quite understand what the "-d" option to "zfs destroy" does.
Why does time-sliderd use it, and why does it prevent these snapshots
from being destroyed?

Shouldn't time-sliderd detect that it can't destroy any of the snapshots
it's created and stop creating snapshots?

And since I don't quite understand why time-sliderd was failing to begin
with, I'm nervous about re-enabling it.  Do I need to do a "zpool upgrade"
on all my pools to make it work?
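
(For reference, a minimal sketch of how that bulk snapshot removal could be
scripted, assuming every leftover snapshot has "zfs-auto-snap" in its name
and that none of them are still wanted:)

    #!/bin/sh
    # Sketch only: bulk-remove stale time-slider auto-snapshots.
    # "zfs list -H -t snapshot -o name" prints one snapshot name per line
    # with no header; grep keeps only the time-slider snapshots; xargs runs
    # "zfs destroy" on each one.  Review the grep output before destroying.
    zfs list -H -t snapshot -o name | grep zfs-auto-snap | xargs -n 1 zfs destroy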
Cindy Swearingen
2011-Feb-18 20:07 UTC
[zfs-discuss] time-sliderd doesn't remove snapshots
Hi Bill,

I think the root cause of this problem is that time-slider implemented the
"zfs destroy -d" feature, but that feature is only available in later pool
versions.  This means that the routine removal of time-slider-generated
snapshots fails on older pool versions.

The "zfs destroy -d" feature (snapshot user holds) was introduced in pool
version 18.

I think this bug describes some or all of the problem:

https://defect.opensolaris.org/bz/show_bug.cgi?id=16361

Thanks,

Cindy

On 02/18/11 12:34, Bill Shannon wrote:
> I don't quite understand what the "-d" option to "zfs destroy" does.
> Why does time-sliderd use it, and why does it prevent these snapshots
> from being destroyed?
> [...]
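
(A small illustration of the user-hold behavior described above; the dataset
and snapshot names are placeholders, and a pool at version 18 or later is
assumed:)

    #!/bin/sh
    # Sketch: how "zfs destroy -d" interacts with snapshot user holds.
    # Requires pool version 18 or later; rpool/export is a placeholder dataset.
    zfs snapshot rpool/export@demo        # create a throwaway snapshot
    zfs hold keep rpool/export@demo       # place a user hold named "keep"
    zfs destroy -d rpool/export@demo      # succeeds: the held snapshot is only
                                          # marked for deferred destruction
    zfs holds rpool/export@demo           # the "keep" hold is still listed
    zfs release keep rpool/export@demo    # releasing the last hold lets the
                                          # deferred destroy finally complete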
One of my old pools was version 10, another was version 13.  I guess that
explains the problem.

Seems like time-sliderd should refuse to run on pools that aren't of a
sufficient version.

Cindy Swearingen wrote on 02/18/11 12:07 PM:
> The "zfs destroy -d" feature (snapshot user holds) was introduced in pool
> version 18.
> [...]
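
(Checking whether every pool is new enough before re-enabling the service
might look like the sketch below; note that a pool upgrade is one-way, so
older software can no longer import an upgraded pool:)

    #!/bin/sh
    # Sketch: confirm every pool supports user holds (version 18 or later)
    # before turning time-slider back on.
    zpool upgrade         # with no arguments, lists pools below the current version
    zpool get version     # shows the exact version of each imported pool
    # zpool upgrade -a    # would upgrade every pool; this is one-way, so older
    #                     # releases can no longer import the upgraded pools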