I have a couple of questions:

1.) I am working with a V240 IMAP server that is currently set up with 3 ZFS
pools: one (conf-pool) on the internal disks, and two (email-pool and
email1-pool) that are spread across 12 disks in an attached JBOD, like so:

  pool: email-pool
 state: ONLINE
config:

        NAME           STATE     READ WRITE CKSUM
        email-pool     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t1d0     ONLINE       0     0     0
            c0t9d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t2d0     ONLINE       0     0     0
            c0t10d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t3d0     ONLINE       0     0     0
            c0t11d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t4d0     ONLINE       0     0     0
            c0t12d0    ONLINE       0     0     0

  pool: email1-pool
 state: ONLINE
config:

        NAME           STATE     READ WRITE CKSUM
        email1-pool    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t5d0     ONLINE       0     0     0
            c0t13d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t6d0     ONLINE       0     0     0
            c0t14d0    ONLINE       0     0     0

We are planning on migrating everything over to email-pool eventually, but
for now that's what I'm working with.

I currently have a script that runs every night at midnight to create
full/normal snapshots of the various filesystems on the two pools so we can
do backups. The script also renames and deletes snapshots so that I always
have the last two weeks' worth of snapshots. Also, email-pool has 26
filesystems (one per letter), and email1-pool has just 1.

So, normally, when the script runs, all snapshots finish in maybe a minute
total. However, on Sundays, it takes longer and longer. On 2/25 it took 30
minutes, and this last Sunday it took 2 hours 11 minutes. The only special
thing about Sunday's snapshots is that they are the first ones created since
the full backup (using NetBackup) on Saturday; all other backups are
incrementals. I've tried scrubbing both pools, but that didn't help.

So, any thoughts or further questions on this one?

2.) Possibly related to the above, I've found that my ZFS tools have been
patched/upgraded, and now zpool status says I can/should upgrade. zpool
upgrade shows the following:

-bash-3.00$ zpool upgrade
This system is currently running ZFS version 3.

The following pools are out of date, and can be upgraded.  After being
upgraded, these pools will no longer be accessible by older software
versions.

VER  POOL
---  ------------
 2   conf-pool
 2   email-pool
 2   email1-pool

I've looked a bit, and can't find anything that tells me whether there will
be any sort of performance hit if I upgrade the pools. I'm especially
cautious as this is our production email server. Will an upgrade cause
load/performance problems? Will I need to take down the IMAP server while I
do it? Also, about the 'older software versions' mentioned above: what is
the older software I'd have to worry about? Just the ZFS tools? If the array
is staying on this server, do I even have to worry about this? I'm running a
patched 2006/06 OS.

-- 
Joe Barbey                      IT Services/Network Services
office: (715) 425-4357          Davee Library room 166C
cell: (715) 821-0008            UW - River Falls
Hello Joseph,

Monday, April 2, 2007, 9:42:24 PM, you wrote:

[pool layout snipped]

JB> So, normally, when the script runs, all snapshots finish in maybe a
JB> minute total. However, on Sundays, it takes longer and longer. On 2/25
JB> it took 30 minutes, and this last Sunday it took 2 hours 11 minutes.
JB> The only special thing about Sunday's snapshots is that they are the
JB> first ones created since the full backup (using NetBackup) on Saturday.
JB> All other backups are incrementals.

Hmmm, do you have the atime property set to off? Maybe you spend most of
the time destroying snapshots, due to a much larger delta caused by atime
updates? You could possibly also gain some performance by setting atime to
off.
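For reference, checking and changing it is quick (a hedged example; atime
is inherited, so setting it once on the pool's top-level dataset covers all
26 filesystems unless a child overrides it):

   zfs get -r atime email-pool     # confirm the current setting everywhere
   zfs set atime=off email-pool    # children inherit the new value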
JB> I've looked a bit, and can't find anything that tells me whether there
JB> will be any sort of performance hit if I upgrade the pools. I'm
JB> especially cautious as this is our production email server. Will an
JB> upgrade cause load/performance problems? Will I need to take down the
JB> IMAP server while I do it?

It's about the ZFS on-disk format. The upgrade won't issue any significant
number of I/Os (it should complete in a second or so). It won't hurt
performance, but it will add new features such as hot-spare support (in
version 3) and proper disk-space accounting for raidz file systems (though
those have to be re-created to benefit).

There's nothing to worry about with the upgrade, and of course you can do
it online without stopping applications. However, once you upgrade, you
won't be able to import a version 3 pool on older systems that only support
version 2 (so basically you won't be able to import it on S10U2). So once
you're sure the system has been working correctly for some time after
applying the last patches, you can safely upgrade.

-- 
Best regards,
Robert                          mailto:rmilkowski@task.gda.pl
                                http://milek.blogspot.com
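For reference, the upgrade Robert describes is a one-line operation (a
hedged sketch of the standard zpool subcommands):

   zpool upgrade -v            # list the versions this ZFS software supports
   zpool upgrade -a            # upgrade every pool to the latest version
   zpool upgrade email-pool    # or upgrade one pool at a time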
Robert Milkowski wrote:

> Hmmm, do you have the atime property set to off? Maybe you spend most of
> the time destroying snapshots, due to a much larger delta caused by atime
> updates? You could possibly also gain some performance by setting atime
> to off.

Yep, atime is set to off for all pools and filesystems. I looked through
the other possible properties, and nothing else looked like it would affect
this.

One additional weird thing: my script hits each filesystem
(email-pool/A..Z) individually, so I can run zfs list -t snapshot and find
out how long each snapshot actually takes. Everything runs fine until I get
to around V or (normally) W. Then it can take a couple of hours on that one
FS. After that, the rest go quickly.

> It's about the ZFS on-disk format. The upgrade won't issue any
> significant number of I/Os (it should complete in a second or so). It
> won't hurt performance [...] There's nothing to worry about with the
> upgrade, and of course you can do it online without stopping
> applications. However, once you upgrade, you won't be able to import a
> version 3 pool on older systems that only support version 2. [...]

OK. Good to know. I figured it was something along those lines, but wasn't
sure. I'm not overly concerned about the import part, so nothing to worry
about there, then.

Thanks for your help on this!

-- 
Joe Barbey                      IT Services/Network Services
office: (715) 425-4357          Davee Library room 166C
cell: (715) 821-0008            UW - River Falls
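A hedged example of the timing check Joe describes: the creation property
records when each snapshot was taken, so gaps between consecutive creation
times reveal which filesystem the midnight run stalled on:

   zfs list -t snapshot -o name,creation | grep '\.snap$'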
Joseph Barbey wrote:

> Yep, atime is set to off for all pools and filesystems. [...]
>
> One additional weird thing: my script hits each filesystem
> (email-pool/A..Z) individually, so I can run zfs list -t snapshot and
> find out how long each snapshot actually takes. Everything runs fine
> until I get to around V or (normally) W. Then it can take a couple of
> hours on that one FS. After that, the rest go quickly.

So, what operation exactly is taking "a couple of hours on the one FS"? The
only one I can imagine taking more than a minute would be 'zfs destroy',
but even that should be very rare on a snapshot. Is it always the same FS
that takes longer than the rest? Is the pool busy when you do the slow
operation?

You should be able to improve performance considerably (~26x) by just doing
one 'zfs snapshot -r', 'zfs destroy -r', and 'zfs rename -r'. (rename -r is
in progress and should be available in OpenSolaris soon; the others are in
s10u3.)

--matt
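The recursive form Matt suggests would look like this (hedged: -r forces a
single snapshot name across every filesystem in the pool, so the per-FS
prefixes like V. in the current naming scheme would have to be dropped):

   zfs snapshot -r email-pool@snap        # snapshot every FS in one call
   zfs destroy  -r email-pool@Sunday-2    # drop that name across all FSes
   zfs rename   -r email-pool@snap email-pool@Sunday-1   # once rename -r ships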
Matthew Ahrens wrote:

> So, what operation exactly is taking "a couple of hours on the one FS"?
> [...] Is it always the same FS that takes longer than the rest? Is the
> pool busy when you do the slow operation?

I'll need to look at this. I expect the next wait will again be on Sunday.
My first-blush guess is that creating the snapshot itself is actually the
problem, but I'll likely have to wait until next Sunday to be sure. I've
added some debugging to my script, so I can see how long each part takes.
Also, it is always the 'W' filesystem's snapshot that seems to take a long
time to create. Recently it has shown up a little in the 'V' filesystem as
well.

> You should be able to improve performance considerably (~26x) by just
> doing one 'zfs snapshot -r', 'zfs destroy -r', and 'zfs rename -r'. [...]

When I first set this stuff up, we could not use the -r option anywhere, so
I didn't use it in my script. A quick test verifies that I now CAN use it
as indicated above. Once my test on Sunday is done, I'll be using -r in my
script as well.

Also, all 3 pools are still 'formatted' as v2. I'll try upgrading all 3
before Sunday, and see if that helps as well.

-- 
Joe Barbey                      IT Services/Network Services
office: (715) 425-4357          Davee Library room 166C
cell: (715) 821-0008            UW - River Falls
Joseph Barbey wrote:

> Also, all 3 pools are still 'formatted' as v2. I'll try upgrading all 3
> before Sunday, and see if that helps as well.

That won't change any performance; upgrading to v3 just enables new
features (hot spares and double-parity raidz).

--matt
Matthew Ahrens wrote:

> So, what operation exactly is taking "a couple of hours on the one FS"?
> The only one I can imagine taking more than a minute would be 'zfs
> destroy', but even that should be very rare on a snapshot. Is it always
> the same FS that takes longer than the rest? Is the pool busy when you
> do the slow operation?

I've now determined that renaming the previous snapshot seems to be the
problem in certain instances.

What we are currently doing through the script is keeping 2 weeks of daily
snapshots of the various pool/filesystems. These snapshots are named
{fs}.$Day-1, {fs}.$Day-2, and {fs}.snap. Specifically, for our 'V'
filesystem, which is created under email-pool, I will have the following
snapshots:

email-pool/V@V.Tuesday-2
email-pool/V@V.Wednesday-2
email-pool/V@V.Thursday-2
email-pool/V@V.Friday-2
email-pool/V@V.Saturday-2
email-pool/V@V.Sunday-2
email-pool/V@V.Monday-2
email-pool/V@V.Tuesday-1
email-pool/V@V.Wednesday-1
email-pool/V@V.Thursday-1
email-pool/V@V.Friday-1
email-pool/V@V.Saturday-1
email-pool/V@V.Sunday-1
email-pool/V@V.snap

So, my script does the following for each FS:

1. Check for FS.$Day-2. If it exists, destroy it.
2. Check for FS.$Day-1. If it exists, rename it to FS.$Day-2.
3. Check for FS.snap. If it exists, rename it to FS.$Yesterday-1 (the day
   it was created).
4. Create FS.snap.

I added logging to a file, recording each action and the time it completed:

Destroy email-pool/V@V.Sunday-2                          Sun Apr  8 00:01:04 CDT 2007
Rename  email-pool/V@V.Sunday-1 email-pool/V@V.Sunday-2  Sun Apr  8 00:01:05 CDT 2007
Rename  email-pool/V@V.snap email-pool/V@V.Sunday-1      Sun Apr  8 00:54:52 CDT 2007
Create  email-pool/V@V.snap                              Sun Apr  8 00:54:53 CDT 2007

Looking at the above, the rename ran from 00:01:05 until 00:54:52, so
almost 54 minutes.

So, any ideas on why a rename should take so long? And again, why is this
only happening on Sunday? Any other information I can provide that might
help diagnose this?

Thanks again for any help on this.

-- 
Joe Barbey                      IT Services/Network Services
office: (715) 425-4357          Davee Library room 166C
cell: (715) 821-0008            UW - River Falls
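Reconstructed as a shell sketch for clarity (hedged: the log path, the loop
structure, and the use of $DAY-1 for the old .snap — which matches the log
above rather than the $Yesterday wording — are illustrative assumptions,
not Joe's actual script):

   #!/bin/sh
   # Illustrative reconstruction of the nightly per-FS rotation.
   LOG=/var/tmp/snap-rotate.log
   DAY=`date +%A`                          # weekday of this midnight run

   for FS in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
       BASE=email-pool/$FS
       # 1. drop the two-week-old snapshot
       if zfs list "$BASE@$FS.$DAY-2" >/dev/null 2>&1; then
           zfs destroy "$BASE@$FS.$DAY-2"
           echo "Destroy $BASE@$FS.$DAY-2  `date`" >> $LOG
       fi
       # 2. age last week's snapshot from -1 to -2
       if zfs list "$BASE@$FS.$DAY-1" >/dev/null 2>&1; then
           zfs rename "$BASE@$FS.$DAY-1" "$BASE@$FS.$DAY-2"
           echo "Rename $BASE@$FS.$DAY-1 $BASE@$FS.$DAY-2  `date`" >> $LOG
       fi
       # 3. give the previous .snap its day name (the step that stalls)
       if zfs list "$BASE@$FS.snap" >/dev/null 2>&1; then
           zfs rename "$BASE@$FS.snap" "$BASE@$FS.$DAY-1"
           echo "Rename $BASE@$FS.snap $BASE@$FS.$DAY-1  `date`" >> $LOG
       fi
       # 4. take tonight's snapshot
       zfs snapshot "$BASE@$FS.snap"
       echo "Create $BASE@$FS.snap  `date`" >> $LOG
   done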
Joseph Barbey wrote:

> I've now determined that renaming the previous snapshot seems to be the
> problem in certain instances. [...]
>
> Destroy email-pool/V@V.Sunday-2                          Sun Apr  8 00:01:04 CDT 2007
> Rename  email-pool/V@V.Sunday-1 email-pool/V@V.Sunday-2  Sun Apr  8 00:01:05 CDT 2007
> Rename  email-pool/V@V.snap email-pool/V@V.Sunday-1      Sun Apr  8 00:54:52 CDT 2007
> Create  email-pool/V@V.snap                              Sun Apr  8 00:54:53 CDT 2007
>
> Looking at the above, the rename ran from 00:01:05 until 00:54:52, so
> almost 54 minutes.
>
> So, any ideas on why a rename should take so long? And again, why is this
> only happening on Sunday? Any other information I can provide that might
> help diagnose this?
This could be an instance of:

  6509628 unmount of a snapshot (from 'zfs destroy') is slow

The fact that this bug comes from a destroy op is not relevant; what is
relevant is the required unmount (which a rename op also requires). Has
there been recent activity in the Sunday-1 snapshot (like a backup or
'find', perhaps)? That will cause the unmount to proceed very slowly.

-Mark
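A quick way to check whether Mark's scenario applies (hedged: the paths
assume the default mountpoint layout, and the snapshot only shows up in
mnttab if something has actually traversed it):

   # Snapshots are auto-mounted under .zfs/snapshot when walked (e.g. by
   # NetBackup's Saturday full); a rename must unmount them first.
   grep '\.zfs/snapshot' /etc/mnttab

   # Is anything still holding files open inside the snapshot?
   fuser -c /email-pool/V/.zfs/snapshot/V.Sunday-1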