Hello everybody,

just to let you know what happened in the meantime:

I was able to open a Service Request at Oracle.

The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).

The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).

There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.

At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

cu

Carsten

--
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key: http://www.mpi-bremen.de/Carsten_John.html
Enda O'Connor
2012-Apr-17 15:49 UTC
[zfs-discuss] kernel panic during zfs import [UPDATE]
On 17/04/2012 16:40, Carsten John wrote:
> Hello everybody,
>
> just to let you know what happened in the meantime:
>
> I was able to open a Service Request at Oracle.
>
> The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).
>
> The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).
>
> There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.
>
> At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

So are you on S11? Can I see the output of "pkg info entire"? This bug is fixed in the FCS S11 release, as that is build 175b, and it got fixed in build 164. So if you have Solaris 11, that CR is fixed. In Solaris 10 it is fixed in 147440-14/147441-14 (SPARC/x86).

Enda

> cu
>
> Carsten
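For anyone following along who wants to confirm their own build or patch level, a quick check might look like the minimal sketch below (the patch IDs are the ones Enda mentions; pick the line that matches your platform):

    # Solaris 11 (IPS): show which version of the "entire" incorporation is installed
    pkg info entire

    # Solaris 10: check whether the kernel patch carrying the fix is present
    showrev -p | grep 147440    # SPARC
    showrev -p | grep 147441    # x86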
Stephan Budach
2012-Apr-17 19:55 UTC
[zfs-discuss] kernel panic during zfs import [UPDATE]
Hi Carsten,

On 17.04.12 17:40, Carsten John wrote:
> Hello everybody,
>
> just to let you know what happened in the meantime:
>
> I was able to open a Service Request at Oracle.
>
> The issue is a known bug (Bug 6742788: assertion panic at: zfs:zap_deref_leaf).
>
> The bug has been fixed (according to Oracle support) since build 164, but there is no fix for Solaris 11 available so far (will be fixed in S11U7?).
>
> There is a workaround available that works (partly), but my system crashed again when trying to rebuild the offending zfs within the affected zpool.
>
> At the moment I'm waiting for a so-called "interim diagnostic relief" patch...

Afaik, bug 6742788 is fixed in S11 FCS (release), but you might be hitting this bug: 7098658. This bug, according to MOS, is still unresolved. My solution is to mount the affected zfs filesystem in read-only mode upon importing the zpool and to set it back to rw afterwards.

Cheers,
budy
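For reference, the read-only-import workaround budy describes might be arranged roughly as in the sketch below. This is only a sketch under assumptions: "tank" and "tank/data" are placeholder names, and it assumes the readonly property can still be set on the affected dataset before the next import (the property is persistent, so the filesystem then mounts read-only when the pool comes back in):

    # Flag the affected filesystem so it mounts read-only on the next import
    zfs set readonly=on tank/data

    # Import the pool; the flagged dataset comes up read-only
    zpool import tank

    # Once the import has succeeded, switch the dataset back to read/write
    zfs set readonly=off tank/data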
Hello,

I'd like a sanity check from people more knowledgeable than myself. I'm managing backups on a production system. Previously I was using another volume manager and filesystem on Solaris, and I've just switched to using ZFS.

My model is -
Production Server A
Test Server B
Mirrored storage arrays (HDS TruCopy if it matters)
Backup software (TSM)

Production server A sees the live volumes.
Test Server B sees the TruCopy mirrors of the live volumes. (it sees the second storage array, the production server sees the primary array)

Production server A shuts down zone C, and exports the zpools for zone C.
Production server A splits the mirror to the secondary storage array, leaving the mirror writable.
Production server A re-imports the pools for zone C, and boots zone C.
Test Server B imports the ZFS pools using -R /backup.
Backup software backs up the mounted mirror volumes on Test Server B.

Later in the day after the backups finish, a script exports the ZFS pools on test server B, and re-establishes the TruCopy mirror between the storage arrays.

So.. I had this working fine with one zone on server A for a couple of months. This week I've added 6 more zones, each with two ZFS pools. The first night went okay. Last night, the test server B kernel panicked well after the zpools on the mirrored volumes were imported, just after the TSM backup started reading all the ZFS pools to push it all to the enterprise backup environment.

Here's the kernel panic message -

Jul 6 03:04:55 riggs ^Mpanic[cpu22]/thread=2a10e81bca0:
Jul 6 03:04:55 riggs unix: [ID 403854 kern.notice] assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 759
Jul 6 03:04:55 riggs unix: [ID 100000 kern.notice]
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b4f0 genunix:assfail+74 (7af0f8c0, 7af0f910, 2f7, 190d000, 12a1800, 0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000001 0000000000000001 00000300f20fdf81
Jul 6 03:04:55 riggs   %l4-7: 00000000012a1800 0000000000000000 0000000001959400 0000000000000000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b5a0 zfs:dmu_write+54 (300cbfd5c40, ad, a70, 20, 300b8c02800, 300f83414d0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000038 0000000000000007 000000000194bd40 000000000194bc00
Jul 6 03:04:55 riggs   %l4-7: 0000000000000001 0000030071bcb701 0000000000003006 0000000000003000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b670 zfs:space_map_sync+278 (3009babd130, b, 3009babcfe0, 20, 4, 58)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000020 00000300b8c02800 00000300b8c02820 00000300b8c02858
Jul 6 03:04:55 riggs   %l4-7: 00007fffffffffff 0000000000007fff 00000000000022d9 0000000000000020
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b760 zfs:metaslab_sync+2b0 (3009babcfc0, 1db7, 300f83414d0, 3009babd408, 300c9724000, 6003e24acc0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 00000300cbfd5c40 000003009babcff8 000003009babd130 000003009babd2d0
Jul 6 03:04:55 riggs   %l4-7: 000003009babcfe0 0000000000000000 000003009babd268 000000000000001a
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b820 zfs:vdev_sync+b8 (6003e24acc0, 1db7, 1db6, 3009babcfc0, 6003e24b000, 17)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000090 0000000000000012 000006003e24acc0 00000300c9724000
Jul 6 03:04:55 riggs   %l4-7: 0000000000000000 0000000000000000 0000000000000000 00000009041ea000
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b8d0 zfs:spa_sync+484 (300c9724000, 1db7, 3005fec09a8, 300c9724428, 1, 300cbfd5c40)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 00000300c9724280 0000030087c3e940 00000300c6aae700
Jul 6 03:04:55 riggs   %l4-7: 0000030080073520 00000300c9724378 00000300c9724300 00000300c9724330
Jul 6 03:04:55 riggs genunix: [ID 723222 kern.notice] 000002a10e81b9a0 zfs:txg_sync_thread+1b8 (30087c3e940, 183f9f0, 707a3130, 0, 2a10e81ba70, 0)
Jul 6 03:04:55 riggs genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000030087c3eb0e 0000030087c3eb08 0000030087c3eb0c
Jul 6 03:04:55 riggs   %l4-7: 000000001230fa07 0000030087c3eac8 0000030087c3ead0 0000000000001db7
Jul 6 03:04:55 riggs unix: [ID 100000 kern.notice]

So, I guess my question is - is what I'm doing sane? Or is there something inherent to ZFS that I'm missing that's going to cause this kernel panic to repeat? Best I can guess, it got upset when the pools were being read. I'm wondering if exporting the pools later in the day before re-syncing the SAN volumes to the mirrors is causing weirdness. (because the re-sync makes the mirrored volumes visible to Test Server B read-only until the next split). I wouldn't think so, because they're exported before the luns go read-only, but I could be wrong.

Anyway, am I off my rocker? This should work with ZFS, right?

Thanks!
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
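For concreteness, the morning half of the sequence described above might be scripted roughly as follows. This is a sketch only: the zone and pool names are placeholders, and the TruCopy split itself is left as a comment because the array-side commands depend on the local TruCopy/CCI configuration:

    ## On production server A: quiesce the zone and release its pools
    zoneadm -z zoneC halt
    zpool export zoneC-pool1
    zpool export zoneC-pool2

    ## Split the TruCopy pair here, leaving the secondary writable
    ## (array-side step via the site's TruCopy tooling - omitted)

    ## Back on production server A: resume service
    zpool import zoneC-pool1
    zpool import zoneC-pool2
    zoneadm -z zoneC boot

    ## On test server B: import the mirror copies under an alternate root
    zpool import -R /backup zoneC-pool1
    zpool import -R /backup zoneC-pool2
    ## ... TSM then backs up the filesystems mounted under /backup ...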
On 07/ 7/12 08:34 AM, Brian Wilson wrote:
> Hello,
>
> I'd like a sanity check from people more knowledgeable than myself.
> I'm managing backups on a production system. Previously I was using
> another volume manager and filesystem on Solaris, and I've just switched
> to using ZFS.
>
> My model is -
> Production Server A
> Test Server B
> Mirrored storage arrays (HDS TruCopy if it matters)
> Backup software (TSM)
>
> Production server A sees the live volumes.
> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
> the second storage array, the production server sees the primary array)
>
> Production server A shuts down zone C, and exports the zpools for zone C.
> Production server A splits the mirror to secondary storage array,
> leaving the mirror writable.
> Production server A re-imports the pools for zone C, and boots zone C.
> Test Server B imports the ZFS pool using -R /backup.
> Backup software backs up the mounted mirror volumes on Test Server B.
>
> Later in the day after the backups finish, a script exports the ZFS
> pools on test server B, and re-establishes the TruCopy mirror between
> the storage arrays.

That looks awfully complicated. Why don't you just clone a snapshot and back up the clone?

--
Ian.
On 07/ 6/12 04:17 PM, Ian Collins wrote:
> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>> Hello,
>>
>> I'd like a sanity check from people more knowledgeable than myself.
>> I'm managing backups on a production system. Previously I was using
>> another volume manager and filesystem on Solaris, and I've just switched
>> to using ZFS.
>>
>> My model is -
>> Production Server A
>> Test Server B
>> Mirrored storage arrays (HDS TruCopy if it matters)
>> Backup software (TSM)
>>
>> Production server A sees the live volumes.
>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>> the second storage array, the production server sees the primary array)
>>
>> Production server A shuts down zone C, and exports the zpools for zone C.
>> Production server A splits the mirror to secondary storage array,
>> leaving the mirror writable.
>> Production server A re-imports the pools for zone C, and boots zone C.
>> Test Server B imports the ZFS pool using -R /backup.
>> Backup software backs up the mounted mirror volumes on Test Server B.
>>
>> Later in the day after the backups finish, a script exports the ZFS
>> pools on test server B, and re-establishes the TruCopy mirror between
>> the storage arrays.
>
> That looks awfully complicated. Why don't you just clone a snapshot
> and back up the clone?
>

Taking a snapshot and cloning incurs IO. Backing up the clone incurs a lot more IO reading off the disks and going over the network. These aren't acceptable costs in my situation.

The solution is complicated if you're starting from scratch. I'm working in an environment that already had all the pieces in place (offsite synchronous mirroring, a test server to mount stuff up on, scripts that automated the storage array mirror management, etc). It was set up that way specifically to accomplish short downtime outages for cold backups with minimal or no IO hit to production. So while it's complicated, when it was put together it was also the most obvious thing to do to drop my backup window to almost nothing, and keep all the IO from the backup from impacting production. And like I said, with a different volume manager, it's been rock solid for years.

So, to ask the sanity check more specifically -
Is it reasonable to expect ZFS pools to be exported, have their luns change underneath, then later import the same pool on those changed drives again?

Thanks!
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
On 07/ 7/12 11:29 AM, Brian Wilson wrote:
> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>> Hello,
>>>
>>> I'd like a sanity check from people more knowledgeable than myself.
>>> I'm managing backups on a production system. Previously I was using
>>> another volume manager and filesystem on Solaris, and I've just switched
>>> to using ZFS.
>>>
>>> My model is -
>>> Production Server A
>>> Test Server B
>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>> Backup software (TSM)
>>>
>>> Production server A sees the live volumes.
>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>> the second storage array, the production server sees the primary array)
>>>
>>> Production server A shuts down zone C, and exports the zpools for
>>> zone C.
>>> Production server A splits the mirror to secondary storage array,
>>> leaving the mirror writable.
>>> Production server A re-imports the pools for zone C, and boots zone C.
>>> Test Server B imports the ZFS pool using -R /backup.
>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>
>>> Later in the day after the backups finish, a script exports the ZFS
>>> pools on test server B, and re-establishes the TruCopy mirror between
>>> the storage arrays.
>> That looks awfully complicated. Why don't you just clone a snapshot
>> and back up the clone?
>>
> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
> lot more IO reading off the disks and going over the network. These
> aren't acceptable costs in my situation.

So splitting a mirror and reconnecting it doesn't incur I/O?

> The solution is complicated if you're starting from scratch. I'm
> working in an environment that already had all the pieces in place
> (offsite synchronous mirroring, a test server to mount stuff up on,
> scripts that automated the storage array mirror management, etc). It
> was set up that way specifically to accomplish short downtime outages for
> cold backups with minimal or no IO hit to production. So while it's
> complicated, when it was put together it was also the most obvious thing
> to do to drop my backup window to almost nothing, and keep all the IO
> from the backup from impacting production. And like I said, with a
> different volume manager, it's been rock solid for years.
>
> So, to ask the sanity check more specifically -
> Is it reasonable to expect ZFS pools to be exported, have their luns
> change underneath, then later import the same pool on those changed
> drives again?

If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?

--
Ian.
First things first, the panic is a bug. Please file one with your OS supplier.
More below...

On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:
> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
>> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>>> Hello,
>>>>
>>>> I'd like a sanity check from people more knowledgeable than myself.
>>>> I'm managing backups on a production system. Previously I was using
>>>> another volume manager and filesystem on Solaris, and I've just switched
>>>> to using ZFS.
>>>>
>>>> My model is -
>>>> Production Server A
>>>> Test Server B
>>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>>> Backup software (TSM)
>>>>
>>>> Production server A sees the live volumes.
>>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>>> the second storage array, the production server sees the primary array)
>>>>
>>>> Production server A shuts down zone C, and exports the zpools for
>>>> zone C.
>>>> Production server A splits the mirror to secondary storage array,
>>>> leaving the mirror writable.
>>>> Production server A re-imports the pools for zone C, and boots zone C.
>>>> Test Server B imports the ZFS pool using -R /backup.
>>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>>
>>>> Later in the day after the backups finish, a script exports the ZFS
>>>> pools on test server B, and re-establishes the TruCopy mirror between
>>>> the storage arrays.
>>> That looks awfully complicated. Why don't you just clone a snapshot
>>> and back up the clone?
>>>
>> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
>> lot more IO reading off the disks and going over the network. These
>> aren't acceptable costs in my situation.

Yet it is acceptable to shut down the zones and export the pools?
I'm interested to understand how a service outage is preferred over I/O?

> So splitting a mirror and reconnecting it doesn't incur I/O?

It does.

>> The solution is complicated if you're starting from scratch. I'm
>> working in an environment that already had all the pieces in place
>> (offsite synchronous mirroring, a test server to mount stuff up on,
>> scripts that automated the storage array mirror management, etc). It
>> was set up that way specifically to accomplish short downtime outages for
>> cold backups with minimal or no IO hit to production. So while it's
>> complicated, when it was put together it was also the most obvious thing
>> to do to drop my backup window to almost nothing, and keep all the IO
>> from the backup from impacting production. And like I said, with a
>> different volume manager, it's been rock solid for years.

... where data corruption is blissfully ignored? I'm not sure what volume manager you were using, but SVM has absolutely zero data integrity checking :-( And no, we do not miss using SVM :-)

>> So, to ask the sanity check more specifically -
>> Is it reasonable to expect ZFS pools to be exported, have their luns
>> change underneath, then later import the same pool on those changed
>> drives again?

Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is simply a bug, perhaps one that is already fixed.

> If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?

Yes, this is one of the use cases for zpool split. However, zpool split creates a new pool, which is not what Brian wants, because to reattach the disks requires a full resilver. Using TrueCopy as he does, is a reasonable approach for Brian's use case.
 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
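For readers not familiar with it, the pure-ZFS alternative Ian and Richard are discussing looks roughly like the sketch below ("tank" is a placeholder pool name; as Richard notes, the split-off half becomes a separate pool, and rejoining the disks later means reattaching them and waiting for a full resilver):

    # Detach one side of each mirror in "tank" into a new, separately importable pool
    zpool split tank tank-split

    # Import the split-off copy, e.g. on a backup host, under an alternate root
    zpool import -R /backup tank-split

    # Returning the disks to the original pool later requires zpool attach
    # on each device and a full resilver.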
On 07/06/12, Richard Elling wrote:
> First things first, the panic is a bug. Please file one with your OS supplier.
> More below...

Thanks! It helps that it recurred a second night in a row.

> On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:
>> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
>>> On 07/ 6/12 04:17 PM, Ian Collins wrote:
>>>> On 07/ 7/12 08:34 AM, Brian Wilson wrote:
>>>>> Hello,
>>>>>
>>>>> I'd like a sanity check from people more knowledgeable than myself.
>>>>> I'm managing backups on a production system. Previously I was using
>>>>> another volume manager and filesystem on Solaris, and I've just switched
>>>>> to using ZFS.
>>>>>
>>>>> My model is -
>>>>> Production Server A
>>>>> Test Server B
>>>>> Mirrored storage arrays (HDS TruCopy if it matters)
>>>>> Backup software (TSM)
>>>>>
>>>>> Production server A sees the live volumes.
>>>>> Test Server B sees the TruCopy mirrors of the live volumes. (it sees
>>>>> the second storage array, the production server sees the primary array)
>>>>>
>>>>> Production server A shuts down zone C, and exports the zpools for
>>>>> zone C.
>>>>> Production server A splits the mirror to secondary storage array,
>>>>> leaving the mirror writable.
>>>>> Production server A re-imports the pools for zone C, and boots zone C.
>>>>> Test Server B imports the ZFS pool using -R /backup.
>>>>> Backup software backs up the mounted mirror volumes on Test Server B.
>>>>>
>>>>> Later in the day after the backups finish, a script exports the ZFS
>>>>> pools on test server B, and re-establishes the TruCopy mirror between
>>>>> the storage arrays.
>>>> That looks awfully complicated. Why don't you just clone a snapshot
>>>> and back up the clone?
>>>>
>>> Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
>>> lot more IO reading off the disks and going over the network. These
>>> aren't acceptable costs in my situation.
>
> Yet it is acceptable to shut down the zones and export the pools?
> I'm interested to understand how a service outage is preferred over I/O?

>> So splitting a mirror and reconnecting it doesn't incur I/O?
>
> It does.

>>> The solution is complicated if you're starting from scratch. I'm
>>> working in an environment that already had all the pieces in place
>>> (offsite synchronous mirroring, a test server to mount stuff up on,
>>> scripts that automated the storage array mirror management, etc). It
>>> was set up that way specifically to accomplish short downtime outages for
>>> cold backups with minimal or no IO hit to production. So while it's
>>> complicated, when it was put together it was also the most obvious thing
>>> to do to drop my backup window to almost nothing, and keep all the IO
>>> from the backup from impacting production. And like I said, with a
>>> different volume manager, it's been rock solid for years.
>
> ... where data corruption is blissfully ignored? I'm not sure what volume
> manager you were using, but SVM has absolutely zero data integrity
> checking :-( And no, we do not miss using SVM :-)

I was trying to avoid sounding like a brand snob ('my old volume manager did X, why doesn't ZFS?'), because that's truly not my attitude; I prefer ZFS. I was using VxVM and VxFS - still no integrity checking, I agree :-)

>>> So, to ask the sanity check more specifically -
>>> Is it reasonable to expect ZFS pools to be exported, have their luns
>>> change underneath, then later import the same pool on those changed
>>> drives again?
>
> Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is
> simply a bug, perhaps one that is already fixed.

Excellent, that's exactly what I was hoping to hear. Thank you!

>> If you were splitting ZFS mirrors to read data from one half all would be sweet (and you wouldn't have to export the pool). I guess the question here is what does TruCopy do under the hood when you re-connect the mirror?
>
> Yes, this is one of the use cases for zpool split. However, zpool split creates a new
> pool, which is not what Brian wants, because to reattach the disks requires a full resilver.
> Using TrueCopy as he does, is a reasonable approach for Brian's use case.
> -- richard

Yep, thanks, and to answer Ian with more detail on what TruCopy does. TruCopy mirrors between the two storage arrays, with software running on the arrays, and keeps a list of dirty/changed 'tracks' while the mirror is split. I think they call it something other than 'tracks' for HDS, but, whatever. When it resyncs the mirrors it sets the target luns read-only (which is why I export the zpools first), and the source array reads the changed tracks, and writes them across dedicated mirror ports and fibre links to the target array's dedicated mirror ports, which then brings the target luns up to synchronized. So, yes, like Richard says, there is IO, but it's isolated to the arrays, and it's scheduled as lower priority on the source array than production traffic. For example it can take an hour or more to re-synchronize a particularly busy 250 GB lun. (though you can do more than one at a time without it taking longer or impacting production any more unless you choke the mirror links, which we do our best not to do) That lower priority, dedicated ports on the arrays, etc, all makes the noticeable impact on the production storage luns from the production server as unnoticeable as I can make it in my environment.

Thanks again! Off to file a bug...

Brian

> --
> ZFS Performance and Training
> Richard.Elling at RichardElling.com
> +1-760-896-4422

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
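The evening half of the cycle Brian describes - exporting the pools on test server B before the target luns go read-only for the resync - might look roughly like this sketch (placeholder pool names again, with the array-side resync left as a comment since it is driven by the TruCopy tooling, not by ZFS):

    ## On test server B: release the mirror copies before the resync starts
    zpool export zoneC-pool1
    zpool export zoneC-pool2

    ## Re-establish the TruCopy pair; the target luns become read-only
    ## until the next split (array-side step - omitted)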
On 07/10/12 05:26 AM, Brian Wilson wrote:
> Yep, thanks, and to answer Ian with more detail on what TruCopy does.
> TruCopy mirrors between the two storage arrays, with software running on
> the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
> is split. I think they call it something other than 'tracks' for HDS,
> but, whatever. When it resyncs the mirrors it sets the target luns
> read-only (which is why I export the zpools first), and the source array
> reads the changed tracks, and writes them across dedicated mirror ports
> and fibre links to the target array's dedicated mirror ports, which then
> brings the target luns up to synchronized. So, yes, like Richard says,
> there is IO, but it's isolated to the arrays, and it's scheduled as
> lower priority on the source array than production traffic. For example
> it can take an hour or more to re-synchronize a particularly busy 250 GB
> lun. (though you can do more than one at a time without it taking longer
> or impacting production any more unless you choke the mirror links,
> which we do our best not to do) That lower priority, dedicated ports on
> the arrays, etc, all makes the noticeable impact on the production
> storage luns from the production server as unnoticeable as I can make it
> in my environment.

Thank you for the background on TruCopy. Reading the above, it looks like you can have a pretty long time without a true copy! I guess my view on replication is you are always going to have X number of I/O operations, and how dense they are depends on how up to date you want your copy to be.

What I still don't understand is why a service interruption is preferable to a wee bit more I/O?

--
Ian.
On 07/ 9/12 04:36 PM, Ian Collins wrote:
> On 07/10/12 05:26 AM, Brian Wilson wrote:
>> Yep, thanks, and to answer Ian with more detail on what TruCopy does.
>> TruCopy mirrors between the two storage arrays, with software running on
>> the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
>> is split. I think they call it something other than 'tracks' for HDS,
>> but, whatever. When it resyncs the mirrors it sets the target luns
>> read-only (which is why I export the zpools first), and the source array
>> reads the changed tracks, and writes them across dedicated mirror ports
>> and fibre links to the target array's dedicated mirror ports, which then
>> brings the target luns up to synchronized. So, yes, like Richard says,
>> there is IO, but it's isolated to the arrays, and it's scheduled as
>> lower priority on the source array than production traffic. For example
>> it can take an hour or more to re-synchronize a particularly busy 250 GB
>> lun. (though you can do more than one at a time without it taking longer
>> or impacting production any more unless you choke the mirror links,
>> which we do our best not to do) That lower priority, dedicated ports on
>> the arrays, etc, all makes the noticeable impact on the production
>> storage luns from the production server as unnoticeable as I can make it
>> in my environment.
>
> Thank you for the background on TruCopy. Reading the above, it looks
> like you can have a pretty long time without a true copy! I guess my
> view on replication is you are always going to have X number of I/O
> operations, and how dense they are depends on how up to date you want
> your copy to be.
>
> What I still don't understand is why a service interruption is
> preferable to a wee bit more I/O?
>

Sorry for the delayed answer. In this case it's less a matter of how much IO than of where the IO is. One thing I should mention is that during normal operations of TruCopy, the mirroring is synchronous - meaning the remote mirror array acknowledges every write before it's acknowledged to the host (battery-backed cache keeps it from slowing down performance).

First, in this case the amount of nightly IO unfortunately isn't a 'wee bit', because the large database files that end up having to get backed up every night via TSM tie up a network connection for several hours. Secondly, the application doesn't support hot backup. The Oracle database does, sure; however, the application itself extensively uses and maintains 'keyword index' files external to the database that require a full application shutdown for a consistent backup.

So, this is where taking a snapshot (in my case using array-to-array mirroring to do so) takes the nightly backup outage from the duration of hours for the backup to complete over the network, to a matter of minutes. So, while it is an outage, it's a very short one compared to the options that are available with the application. (FYI - the application is Exlibris Group's Voyager software for libraries - I'm the primary admin for it for almost all campus libraries in Wisconsin).

Cheers,
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S
608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------