Hello everybody,

just to let you know what happened in the meantime:

I was able to open a Service Request at Oracle.

The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).

The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).

There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.

At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

cu

Carsten

--
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key: http://www.mpi-bremen.de/Carsten_John.html
Enda O'Connor
2012-Apr-17 15:49 UTC
[zfs-discuss] kernel panic during zfs import [UPDATE]
On 17/04/2012 16:40, Carsten John wrote:
> Hello everybody,
>
> just to let you know what happened in the meantime:
>
> I was able to open a Service Request at Oracle.
>
> The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).
>
> The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).
>
> There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.
>
> At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

So are you on S11? Can I see the output of "pkg info entire"? This bug is fixed in the FCS S11 release, as that is build 175b, and it got fixed in build 164. So if you have Solaris 11, that CR is fixed. In Solaris 10 it is fixed in 147440-14/147441-14 (SPARC/x86).

Enda

> cu
>
> Carsten
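For anyone following along who wants to confirm their own build or patch level, a quick check might look like the minimal sketch below (the patch IDs are the ones Enda mentions; pick the line that matches your platform):

    # Solaris 11 (IPS): show which version of the "entire" incorporation is installed
    pkg info entire

    # Solaris 10: check whether the kernel patch carrying the fix is present
    showrev -p | grep 147440    # SPARC
    showrev -p | grep 147441    # x86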
Stephan Budach
2012-Apr-17 19:55 UTC
[zfs-discuss] kernel panic during zfs import [UPDATE]
Hi Carsten,

On 17.04.12 17:40, Carsten John wrote:
> Hello everybody,
>
> just to let you know what happened in the meantime:
>
> I was able to open a Service Request at Oracle.
>
> The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).
>
> The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).
>
> There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.
>
> At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

Afaik, bug 6742788 is fixed in S11 FCS (release), but you might be hitting this bug: 7098658. This bug, according to MOS, is still unresolved. My solution is to mount the affected zfs filesystem in read-only mode upon importing the zpool and to set it back to rw afterwards.

Cheers,
budy
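For reference, the read-only-import workaround budy describes might be arranged roughly as in the sketch below. This is only a sketch under assumptions: "tank" and "tank/data" are placeholder names, and it assumes the readonly property can still be set on the affected dataset before the next import (the property is persistent, so the filesystem then mounts read-only when the pool comes back in):

    # Flag the affected filesystem so it mounts read-only on the next import
    zfs set readonly=on tank/data

    # Import the pool; the flagged dataset comes up read-only
    zpool import tank

    # Once the import has succeeded, switch the dataset back to read/write
    zfs set readonly=off tank/data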
Hello,

I'd like a sanity check from people more knowledgeable than myself. I'm managing backups on a production system. Previously I was using another volume manager and filesystem on Solaris, and I've just switched to using ZFS.

My model is -
Production Server A
Test Server B
Mirrored storage arrays (HDS TruCopy if it matters)
Backup software (TSM)

Production server A sees the live volumes.
Test Server B sees the TruCopy mirrors of the live volumes. (it sees the second storage array, the production server sees the primary array)

Production server A shuts down zone C, and exports the zpools for zone C.
Production server A splits the mirror to the secondary storage array, leaving the mirror writable.
Production server A re-imports the pools for zone C, and boots zone C.
Test Server B imports the ZFS pools using -R /backup.
Backup software backs up the mounted mirror volumes on Test Server B.

Later in the day after the backups finish, a script exports the ZFS pools on test server B, and re-establishes the TruCopy mirror between the storage arrays.

So.. I had this working fine with one zone on server A for a couple of months. This week I've added 6 more zones, each with two ZFS pools. The first night went okay. Last night, the test server B kernel panicked well after the zpools on the mirrored volumes were imported, just after the TSM backup started reading all the ZFS pools to push it all to the enterprise backup environment.

Here's the kernel panic message -

Jul 6 03:04:55 riggs ^Mpanic[cpu22]/thread=2a10e81bca0:
Jul 6 03:04:55 riggs unix: [ID 403854 kern.notice] assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 759
Jul 6 03:04:55 riggs unix: [ID 100000 kern.notice]
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b4f0 genunix:assfail+74 (7af0f8c0, 7af0f910, 2f7, 190d000, 12a1800, 0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000001 0000000000000001 00000300f20fdf81
Jul 6 03:04:55 riggs   %l4-7: 00000000012a1800 0000000000000000 0000000001959400 0000000000000000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b5a0 zfs:dmu_write+54 (300cbfd5c40, ad, a70, 20, 300b8c02800, 300f83414d0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000038 0000000000000007 000000000194bd40 000000000194bc00
Jul 6 03:04:55 riggs   %l4-7: 0000000000000001 0000030071bcb701 0000000000003006 0000000000003000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b670 zfs:space_map_sync+278 (3009babd130, b, 3009babcfe0, 20, 4, 58)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000020 00000300b8c02800 00000300b8c02820 00000300b8c02858
Jul 6 03:04:55 riggs   %l4-7: 00007fffffffffff 0000000000007fff 00000000000022d9 0000000000000020
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b760 zfs:metaslab_sync+2b0 (3009babcfc0, 1db7, 300f83414d0, 3009babd408, 300c9724000, 6003e24acc0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 00000300cbfd5c40 000003009babcff8 000003009babd130 000003009babd2d0
Jul 6 03:04:55 riggs   %l4-7: 000003009babcfe0 0000000000000000 000003009babd268 000000000000001a
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b820 zfs:vdev_sync+b8 (6003e24acc0, 1db7, 1db6, 3009babcfc0, 6003e24b000, 17)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000090 0000000000000012 000006003e24acc0 00000300c9724000
Jul 6 03:04:55 riggs   %l4-7: 0000000000000000 0000000000000000 0000000000000000 00000009041ea000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b8d0 zfs:spa_sync+484 (300c9724000, 1db7, 3005fec09a8, 300c9724428, 1, 300cbfd5c40)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 00000300c9724280 0000030087c3e940 00000300c6aae700
Jul 6 03:04:55 riggs   %l4-7: 0000030080073520 00000300c9724378 00000300c9724300 00000300c9724330
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b9a0 zfs:txg_sync_thread+1b8 (30087c3e940, 183f9f0, 707a3130, 0, 2a10e81ba70, 0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000030087c3eb0e 0000030087c3eb08 0000030087c3eb0c
Jul 6 03:04:55 riggs   %l4-7: 000000001230fa07 0000030087c3eac8 0000030087c3ead0 0000000000001db7
Jul 6 03:04:55 riggs unix: [ID 100000 kern.notice]

So, I guess my question is - is what I'm doing sane? Or is there something inherent to ZFS that I'm missing that's going to cause this kernel panic to repeat? Best I can guess, it got upset when the pools were being read. I'm wondering if exporting the pools later in the day before re-syncing the SAN volumes to the mirrors is causing weirdness. (because the re-sync makes the mirrored volumes visible to Test Server B read-only until the next split). I wouldn't think so, because they're exported before the luns go read-only, but I could be wrong.

Anyway, am I off my rocker? This should work with ZFS, right?

Thanks!
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
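For concreteness, the morning half of the sequence described above might be scripted roughly as follows. This is a sketch only: the zone and pool names are placeholders, and the TruCopy split itself is left as a comment because the array-side commands depend on the local TruCopy/CCI configuration:

    ## On production server A: quiesce the zone and release its pools
    zoneadm -z zoneC halt
    zpool export zoneC-pool1
    zpool export zoneC-pool2

    ## Split the TruCopy pair here, leaving the secondary writable
    ## (array-side step via the site's TruCopy tooling - omitted)

    ## Back on production server A: resume service
    zpool import zoneC-pool1
    zpool import zoneC-pool2
    zoneadm -z zoneC boot

    ## On test server B: import the mirror copies under an alternate root
    zpool import -R /backup zoneC-pool1
    zpool import -R /backup zoneC-pool2
    ## ... TSM then backs up the filesystems mounted under /backup ...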
On 07/ 7/12 08:34 AM, Brian Wilson wrote:
> Hello,
>
> I'd like a sanity check from people more knowledgeable than myself.
> I'm managing backups on a production system. Previously I was using
> another volume manager and filesystem on Solaris, and I've just switched
> to using ZFS.
>
> My model is -
> Production Server A
> Test Server B
> Mirrored storage arrays (HDS TruCopy if it matters)
> Backup software (TSM)
>
> Production server A sees the live volumes.
> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
> the second storage array, the production server sees the primary array)
>
> Production server A shuts down zone C, and exports the zpools for zone C.
> Production server A splits the mirror to secondary storage array,
> leaving the mirror writable.
> Production server A re-imports the pools for zone C, and boots zone C.
> Test Server B imports the ZFS pool using -R /backup.
> Backup software backs up the mounted mirror volumes on Test Server B.
>
> Later in the day after the backups finish, a script exports the ZFS
> pools on test server B, and re-establishes the TruCopy mirror between
> the storage arrays.

That looks awfully complicated. Why don't you just clone a snapshot and back up the clone?

--
Ian.
On 07/ 6/12 04:17 PM, Ian Collins wrote:
> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>> Hello,
>>
>> I'd like a sanity check from people more knowledgeable than myself.
>> I'm managing backups on a production system. Previously I was using
>> another volume manager and filesystem on Solaris, and I've just switched
>> to using ZFS.
>>
>> My model is -
>> Production Server A
>> Test Server B
>> Mirrored storage arrays (HDS TruCopy if it matters)
>> Backup software (TSM)
>>
>> Production server A sees the live volumes.
>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>> the second storage array, the production server sees the primary array)
>>
>> Production server A shuts down zone C, and exports the zpools for zone C.
>> Production server A splits the mirror to secondary storage array,
>> leaving the mirror writable.
>> Production server A re-imports the pools for zone C, and boots zone C.
>> Test Server B imports the ZFS pool using -R /backup.
>> Backup software backs up the mounted mirror volumes on Test Server B.
>>
>> Later in the day after the backups finish, a script exports the ZFS
>> pools on test server B, and re-establishes the TruCopy mirror between
>> the storage arrays.
>
> That looks awfully complicated. Why don't you just clone a snapshot
> and back up the clone?
>

Taking a snapshot and cloning incurs IO. Backing up the clone incurs a lot more IO reading off the disks and going over the network. These aren't acceptable costs in my situation.

The solution is complicated if you're starting from scratch. I'm working in an environment that already had all the pieces in place (offsite synchronous mirroring, a test server to mount stuff up on, scripts that automated the storage array mirror management, etc). It was set up that way specifically to accomplish short downtime outages for cold backups with minimal or no IO hit to production. So while it's complicated, when it was put together it was also the most obvious thing to do to drop my backup window to almost nothing, and keep all the IO from the backup from impacting production. And like I said, with a different volume manager, it's been rock solid for years.

So, to ask the sanity check more specifically -
Is it reasonable to expect ZFS pools to be exported, have their luns change underneath, then later import the same pool on those changed drives again?

Thanks!
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
On 07/ 7/12 11:29 AM, Brian Wilson wrote:
> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>> Hello,
>>>
>>> I'd like a sanity check from people more knowledgeable than myself.
>>> I'm managing backups on a production system. Previously I was using
>>> another volume manager and filesystem on Solaris, and I've just switched
>>> to using ZFS.
>>>
>>> My model is -
>>> Production Server A
>>> Test Server B
>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>> Backup software (TSM)
>>>
>>> Production server A sees the live volumes.
>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>> the second storage array, the production server sees the primary array)
>>>
>>> Production server A shuts down zone C, and exports the zpools for
>>> zone C.
>>> Production server A splits the mirror to secondary storage array,
>>> leaving the mirror writable.
>>> Production server A re-imports the pools for zone C, and boots zone C.
>>> Test Server B imports the ZFS pool using -R /backup.
>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>
>>> Later in the day after the backups finish, a script exports the ZFS
>>> pools on test server B, and re-establishes the TruCopy mirror between
>>> the storage arrays.
>> That looks awfully complicated. Why don't you just clone a snapshot
>> and back up the clone?
>>
> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
> lot more IO reading off the disks and going over the network. These
> aren't acceptable costs in my situation.

So splitting a mirror and reconnecting it doesn't incur I/O?

> The solution is complicated if you're starting from scratch. I'm
> working in an environment that already had all the pieces in place
> (offsite synchronous mirroring, a test server to mount stuff up on,
> scripts that automated the storage array mirror management, etc). It
> was set up that way specifically to accomplish short downtime outages for
> cold backups with minimal or no IO hit to production. So while it's
> complicated, when it was put together it was also the most obvious thing
> to do to drop my backup window to almost nothing, and keep all the IO
> from the backup from impacting production. And like I said, with a
> different volume manager, it's been rock solid for years.
>
> So, to ask the sanity check more specifically -
> Is it reasonable to expect ZFS pools to be exported, have their luns
> change underneath, then later import the same pool on those changed
> drives again?

If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?

--
Ian.
First things first, the panic is a bug. Please file one with your OS supplier.
More below...

On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:
> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
>> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>>> Hello,
>>>>
>>>> I'd like a sanity check from people more knowledgeable than myself.
>>>> I'm managing backups on a production system. Previously I was using
>>>> another volume manager and filesystem on Solaris, and I've just switched
>>>> to using ZFS.
>>>>
>>>> My model is -
>>>> Production Server A
>>>> Test Server B
>>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>>> Backup software (TSM)
>>>>
>>>> Production server A sees the live volumes.
>>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>>> the second storage array, the production server sees the primary array)
>>>>
>>>> Production server A shuts down zone C, and exports the zpools for
>>>> zone C.
>>>> Production server A splits the mirror to secondary storage array,
>>>> leaving the mirror writable.
>>>> Production server A re-imports the pools for zone C, and boots zone C.
>>>> Test Server B imports the ZFS pool using -R /backup.
>>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>>
>>>> Later in the day after the backups finish, a script exports the ZFS
>>>> pools on test server B, and re-establishes the TruCopy mirror between
>>>> the storage arrays.
>>> That looks awfully complicated. Why don't you just clone a snapshot
>>> and back up the clone?
>>>
>> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
>> lot more IO reading off the disks and going over the network. These
>> aren't acceptable costs in my situation.

Yet it is acceptable to shut down the zones and export the pools?
I'm interested to understand how a service outage is preferred over I/O?

> So splitting a mirror and reconnecting it doesn't incur I/O?

It does.

>> The solution is complicated if you're starting from scratch. I'm
>> working in an environment that already had all the pieces in place
>> (offsite synchronous mirroring, a test server to mount stuff up on,
>> scripts that automated the storage array mirror management, etc). It
>> was set up that way specifically to accomplish short downtime outages for
>> cold backups with minimal or no IO hit to production. So while it's
>> complicated, when it was put together it was also the most obvious thing
>> to do to drop my backup window to almost nothing, and keep all the IO
>> from the backup from impacting production. And like I said, with a
>> different volume manager, it's been rock solid for years.

... where data corruption is blissfully ignored? I'm not sure what volume manager you were using, but SVM has absolutely zero data integrity checking :-( And no, we do not miss using SVM :-)

>> So, to ask the sanity check more specifically -
>> Is it reasonable to expect ZFS pools to be exported, have their luns
>> change underneath, then later import the same pool on those changed
>> drives again?

Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is simply a bug, perhaps one that is already fixed.

> If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?

Yes, this is one of the use cases for zpool split. However, zpool split creates a new pool, which is not what Brian wants, because to reattach the disks requires a full resilver. Using TrueCopy as he does, is a reasonable approach for Brian's use case.
 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
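For readers not familiar with it, the pure-ZFS alternative Ian and Richard are discussing looks roughly like the sketch below ("tank" is a placeholder pool name; as Richard notes, the split-off half becomes a separate pool, and rejoining the disks later means reattaching them and waiting for a full resilver):

    # Detach one side of each mirror in "tank" into a new, separately importable pool
    zpool split tank tank-split

    # Import the split-off copy, e.g. on a backup host, under an alternate root
    zpool import -R /backup tank-split

    # Returning the disks to the original pool later requires zpool attach
    # on each device and a full resilver.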
On 07/06/12, Richard Elling wrote:
> First things first, the panic is a bug. Please file one with your OS supplier.
> More below...

Thanks! It helps that it recurred a second night in a row.

> On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:
>> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
>>> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>>>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>>>> Hello,
>>>>>
>>>>> I'd like a sanity check from people more knowledgeable than myself.
>>>>> I'm managing backups on a production system. Previously I was using
>>>>> another volume manager and filesystem on Solaris, and I've just switched
>>>>> to using ZFS.
>>>>>
>>>>> My model is -
>>>>> Production Server A
>>>>> Test Server B
>>>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>>>> Backup software (TSM)
>>>>>
>>>>> Production server A sees the live volumes.
>>>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>>>> the second storage array, the production server sees the primary array)
>>>>>
>>>>> Production server A shuts down zone C, and exports the zpools for
>>>>> zone C.
>>>>> Production server A splits the mirror to secondary storage array,
>>>>> leaving the mirror writable.
>>>>> Production server A re-imports the pools for zone C, and boots zone C.
>>>>> Test Server B imports the ZFS pool using -R /backup.
>>>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>>>
>>>>> Later in the day after the backups finish, a script exports the ZFS
>>>>> pools on test server B, and re-establishes the TruCopy mirror between
>>>>> the storage arrays.
>>>> That looks awfully complicated. Why don't you just clone a snapshot
>>>> and back up the clone?
>>>>
>>> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
>>> lot more IO reading off the disks and going over the network. These
>>> aren't acceptable costs in my situation.
>
> Yet it is acceptable to shut down the zones and export the pools?
> I'm interested to understand how a service outage is preferred over I/O?

>> So splitting a mirror and reconnecting it doesn't incur I/O?
>
> It does.

>>> The solution is complicated if you're starting from scratch. I'm
>>> working in an environment that already had all the pieces in place
>>> (offsite synchronous mirroring, a test server to mount stuff up on,
>>> scripts that automated the storage array mirror management, etc). It
>>> was set up that way specifically to accomplish short downtime outages for
>>> cold backups with minimal or no IO hit to production. So while it's
>>> complicated, when it was put together it was also the most obvious thing
>>> to do to drop my backup window to almost nothing, and keep all the IO
>>> from the backup from impacting production. And like I said, with a
>>> different volume manager, it's been rock solid for years.
>
> ... where data corruption is blissfully ignored? I'm not sure what volume
> manager you were using, but SVM has absolutely zero data integrity
> checking :-( And no, we do not miss using SVM :-)

I was trying to avoid sounding like a brand snob ('my old volume manager did X, why doesn't ZFS?'), because that's truly not my attitude; I prefer ZFS. I was using VxVM and VxFS - still no integrity checking, I agree :-)

>>> So, to ask the sanity check more specifically -
>>> Is it reasonable to expect ZFS pools to be exported, have their luns
>>> change underneath, then later import the same pool on those changed
>>> drives again?
>
> Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is
> simply a bug, perhaps one that is already fixed.

Excellent, that's exactly what I was hoping to hear. Thank you!

>> If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?
>
> Yes, this is one of the use cases for zpool split. However, zpool split creates a new
> pool, which is not what Brian wants, because to reattach the disks requires a full resilver.
> Using TrueCopy as he does, is a reasonable approach for Brian's use case.
> -- richard

Yep, thanks, and to answer Ian with more detail on what TruCopy does. TruCopy mirrors between the two storage arrays, with software running on the arrays, and keeps a list of dirty/changed 'tracks' while the mirror is split. I think they call it something other than 'tracks' for HDS, but, whatever. When it resyncs the mirrors it sets the target luns read-only (which is why I export the zpools first), and the source array reads the changed tracks, and writes them across dedicated mirror ports and fibre links to the target array's dedicated mirror ports, which then brings the target luns up to synchronized. So, yes, like Richard says, there is IO, but it's isolated to the arrays, and it's scheduled as lower priority on the source array than production traffic. For example it can take an hour or more to re-synchronize a particularly busy 250 GB lun. (though you can do more than one at a time without it taking longer or impacting production any more unless you choke the mirror links, which we do our best not to do) That lower priority, dedicated ports on the arrays, etc, all makes the noticeable impact on the production storage luns from the production server as unnoticeable as I can make it in my environment.

Thanks again! Off to file a bug...

Brian

> --
> ZFS Performance and Training
> Richard.Elling at RichardElling.com
> +1-760-896-4422

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
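The evening half of the cycle Brian describes - exporting the pools on test server B before the target luns go read-only for the resync - might look roughly like this sketch (placeholder pool names again, with the array-side resync left as a comment since it is driven by the TruCopy tooling, not by ZFS):

    ## On test server B: release the mirror copies before the resync starts
    zpool export zoneC-pool1
    zpool export zoneC-pool2

    ## Re-establish the TruCopy pair; the target luns become read-only
    ## until the next split (array-side step - omitted)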
On 07/10/12 05:26 AM, Brian Wilson wrote:
> Yep, thanks, and to answer Ian with more detail on what TruCopy does.
> TruCopy mirrors between the two storage arrays, with software running on
> the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
> is split. I think they call it something other than 'tracks' for HDS,
> but, whatever. When it resyncs the mirrors it sets the target luns
> read-only (which is why I export the zpools first), and the source array
> reads the changed tracks, and writes them across dedicated mirror ports
> and fibre links to the target array's dedicated mirror ports, which then
> brings the target luns up to synchronized. So, yes, like Richard says,
> there is IO, but it's isolated to the arrays, and it's scheduled as
> lower priority on the source array than production traffic. For example
> it can take an hour or more to re-synchronize a particularly busy 250 GB
> lun. (though you can do more than one at a time without it taking longer
> or impacting production any more unless you choke the mirror links,
> which we do our best not to do) That lower priority, dedicated ports on
> the arrays, etc, all makes the noticeable impact on the production
> storage luns from the production server as unnoticeable as I can make it
> in my environment.

Thank you for the background on TruCopy. Reading the above, it looks like you can have a pretty long time without a true copy! I guess my view on replication is you are always going to have X number of I/O operations, and how dense they are depends on how up to date you want your copy to be.

What I still don't understand is why a service interruption is preferable to a wee bit more I/O?

--
Ian.
On 07/ 9/12 04:36 PM, Ian Collins wrote:
> On 07/10/12 05:26 AM, Brian Wilson wrote:
>> Yep, thanks, and to answer Ian with more detail on what TruCopy does.
>> TruCopy mirrors between the two storage arrays, with software running on
>> the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
>> is split. I think they call it something other than 'tracks' for HDS,
>> but, whatever. When it resyncs the mirrors it sets the target luns
>> read-only (which is why I export the zpools first), and the source array
>> reads the changed tracks, and writes them across dedicated mirror ports
>> and fibre links to the target array's dedicated mirror ports, which then
>> brings the target luns up to synchronized. So, yes, like Richard says,
>> there is IO, but it's isolated to the arrays, and it's scheduled as
>> lower priority on the source array than production traffic. For example
>> it can take an hour or more to re-synchronize a particularly busy 250 GB
>> lun. (though you can do more than one at a time without it taking longer
>> or impacting production any more unless you choke the mirror links,
>> which we do our best not to do) That lower priority, dedicated ports on
>> the arrays, etc, all makes the noticeable impact on the production
>> storage luns from the production server as unnoticeable as I can make it
>> in my environment.
>
> Thank you for the background on TruCopy. Reading the above, it looks
> like you can have a pretty long time without a true copy! I guess my
> view on replication is you are always going to have X number of I/O
> operations, and how dense they are depends on how up to date you want
> your copy to be.
>
> What I still don't understand is why a service interruption is
> preferable to a wee bit more I/O?
>

Sorry for the delayed answer. In this case it's less a matter of how much IO than of where the IO is. One thing I should mention is that during normal operations of TruCopy, the mirroring is synchronous - meaning the remote mirror array acknowledges every write before it's acknowledged to the host (battery-backed cache keeps it from slowing down performance).

First, in this case the amount of nightly IO unfortunately isn't a 'wee bit', because the large database files that end up having to get backed up every night via TSM tie up a network connection for several hours. Secondly, the application doesn't support hot backup. The Oracle database does, sure; however, the application itself extensively uses and maintains 'keyword index' files external to the database that require a full application shutdown for a consistent backup.

So, this is where taking a snapshot (in my case using array-to-array mirroring to do so) takes the nightly backup outage from the duration of hours for the backup to complete over the network, to a matter of minutes. So, while it is an outage, it's a very short one compared to the options that are available with the application. (FYI - the application is Exlibris Group's Voyager software for libraries - I'm the primary admin for it for almost all campus libraries in Wisconsin).

Cheers,
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------