jason
2007-Feb-02 17:16 UTC
[zfs-discuss] ZFS and question on repetitive data migrating to it efficiently...
Hi all,

Longtime reader, first time poster... Sorry for the lengthy intro, and I'm not really sure the title matches what I'm trying to get at. I am trying to find a way in which a zfs filesystem can shorten our backup window. Currently, our backup solution takes data from ufs or vxfs filesystems on a split mirrored disk, mounts it off-host, and then writes it directly to tape from the off-host backup system. This amounts to a "full" backup every time and keeps the split mirror attached to that system for a significant amount of time before it can be returned. I'd like to fit a zfs filesystem into the mix, hopefully make use of its space-saving snapshot capabilities, and find out whether there is a known way to turn this into an "incremental" backup using freeware or OS-level tools.

I get that if the source data storage were originally zfs instead of ufs|vxfs, I'd be able to take snapshots of the storage before the mirror split, mount that storage off-host, and then take deltas between the different snapshots, either turning them into files or applying them directly to another zfs filesystem on the off-host system that was originally created from the detached mirror. We could also skip the mirror split entirely and just do a zfs send, piping the data to a remote host where the snapshot would be recreated. It will take a while before I can have zfs running in production, as it might involve some brainwashing of some DBAs, so in the meantime, what are some thoughts on how to do this without the data sitting on a zfs source?

Some of my questions are about keeping this management of data at a "file" level. I would like to use a zfs filesystem as the repository of data, and have that repository house the data as efficiently as possible. If I sent binary database files that were sourced on a ufs|vxfs filesystem over to the zfs filesystem and took a snapshot of that data, how could I then update the data on that zfs filesystem with more current files and have zfs recognize that the files are mostly the same, with only some differing bits? Can a file on the live zfs filesystem be completely overwritten by a file of the same name and still share the "block" space that is referenced by the snapshot? I don't know if I'm stating that all clearly.

I don't know how to recreate data on a zfs filesystem in such a way that a zfs snapshot shares the data that is identical. I know that if I tar cf - | tar xf - or find | cpio data onto a zfs filesystem, take a snapshot of that zfs fs, then do the operation again on the same set of files and take another snapshot, both snapshots report consuming space equal to the total size of the files copied. So they are not sharing the same blocks on disk. Do I understand that correctly?

What I'm really looking for is a way to shrink our backup window by using some "tool" that can look at a binary file at two different points in time, say one on a zfs snapshot and one on a different filesystem, i.e. a current split of a mirror housing a zfs|ufs|vxfs filesystem, mounted on a host that can see both filesystems. Is there a way to compare the two files and write only the portions that differ to the copy on the zfs filesystem, so that after a new snapshot is taken, the snapshot only holds the delta of bits inside the file that changed? I thought rsync could deal with this, yet I think that if the timestamp changes on the source file, it considers the whole file changed and copies the whole thing over. I'm really not that versed in rsync and could be completely wrong.

I guess I'm really after that "tool". I know there are agents that can poll Oracle database files, find out which bits changed, and write those off somewhere. RMAN can do that, yet that keeps things at the DBA level, and I need to keep this backup processing at the SA level. I'm just trying to find out how to migrate our data in a way that is fast, reliable, and optimal.

Was checking out these threads:
http://www.opensolaris.org/jive/thread.jspa?threadID=20276&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=22724&tstart=0

And I just saw an update to http://blogs.sun.com/AVS/. Maybe all my answers lie there... Will dig around there for more, but would welcome feedback and ideas on this.

TIA
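P.S. For the zfs-source case above, this is roughly the send/receive flow I have in mind. It is only an untested sketch; the pool, filesystem, snapshot, and host names are made up:

    # initial full copy of the dataset to the backup host
    zfs snapshot dbpool/oradata@sun
    zfs send dbpool/oradata@sun | ssh backuphost zfs receive backpool/oradata

    # later runs send only the blocks that changed between the two snapshots;
    # the receiving filesystem has to stay unmodified between incrementals
    zfs snapshot dbpool/oradata@mon
    zfs send -i dbpool/oradata@sun dbpool/oradata@mon | ssh backuphost zfs receive backpool/oradata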
Wade Stuart
2007-Feb-02 17:40 UTC
[zfs-discuss] ZFS and question on repetitive data migrating to it efficiently...
zfs-discuss-bounces at opensolaris.org wrote on 02/02/2007 11:16:32 AM:

> Hi all,
>
> Longtime reader, first time poster... I am trying to find a way in which a zfs filesystem can
> shorten our backup window. Currently, our backup solution takes data from ufs or vxfs
> filesystems on a split mirrored disk, mounts it off-host, and then writes it directly to tape
> from the off-host backup system. This amounts to a "full" backup every time and keeps the split
> mirror attached to that system for a significant amount of time before it can be returned. I'd
> like to fit a zfs filesystem into the mix, hopefully make use of its space-saving snapshot
> capabilities, and find out whether there is a known way to turn this into an "incremental"
> backup using freeware or OS-level tools.

What we are playing with here is rsync --inplace diffs from vxfs/ufs (large systems, 7+ TB, millions of files), run multiple times per day to a thumper, and having the thumper then spool to tape via NetBackup. In our situation this has shortened the full backup window on our largest systems from 2 days to under 1 hour. On the thumper side we snap after each rsync, and because of --inplace the differential space requirement of each snap is very close to the actual data delta on the primary server. This also allows (in our case) 1 to 8 snaps per day to be kept live nearline over extended periods with very little overhead -- and it reduces the number of production-side snaps holding delta data. Most restore requests are served from the snaps. A simplified sketch of the cycle is at the end of this mail.

> I get that if the source data storage were originally zfs instead of ufs|vxfs, I'd be able to
> take snapshots of the storage before the mirror split, mount that storage off-host, and then
> take deltas between the different snapshots, either turning them into files or applying them
> directly to another zfs filesystem on the off-host system that was originally created from the
> detached mirror. We could also skip the mirror split entirely and just do a zfs send, piping the
> data to a remote host where the snapshot would be recreated. It will take a while before I can
> have zfs running in production, as it might involve some brainwashing of some DBAs, so in the
> meantime, what are some thoughts on how to do this without the data sitting on a zfs source?

This was the same issue we had; zfs is missing some features and has some performance issues in certain workflows that do not (yet) allow us to migrate most of our production systems. rsync is working well for us in lieu of zfs send/receive.

> Some of my questions are about keeping this management of data at a "file" level. I would like
> to use a zfs filesystem as the repository of data, and have that repository house the data as
> efficiently as possible. If I sent binary database files that were sourced on a ufs|vxfs
> filesystem over to the zfs filesystem and took a snapshot of that data, how could I then update
> the data on that zfs filesystem with more current files and have zfs recognize that the files
> are mostly the same, with only some differing bits? Can a file on the live zfs filesystem be
> completely overwritten by a file of the same name and still share the "block" space that is
> referenced by the snapshot? I don't know if I'm stating that all clearly.
>
> I don't know how to recreate data on a zfs filesystem in such a way that a zfs snapshot shares
> the data that is identical. I know that if I tar cf - | tar xf - or find | cpio data onto a zfs
> filesystem, take a snapshot of that zfs fs, then do the operation again on the same set of files
> and take another snapshot, both snapshots report consuming space equal to the total size of the
> files copied. So they are not sharing the same blocks on disk. Do I understand that correctly?

Again, rsync --inplace overwrites only the changed files in place, so only the blocks that actually change are rewritten -- minimizing the snap delta cost.

> What I'm really looking for is a way to shrink our backup window by using some "tool" that can
> look at a binary file at two different points in time, say one on a zfs snapshot and one on a
> different filesystem, i.e. a current split of a mirror housing a zfs|ufs|vxfs filesystem,
> mounted on a host that can see both filesystems. Is there a way to compare the two files and
> write only the portions that differ to the copy on the zfs filesystem, so that after a new
> snapshot is taken, the snapshot only holds the delta of bits inside the file that changed? I
> thought rsync could deal with this, yet I think that if the timestamp changes on the source
> file, it considers the whole file changed and copies the whole thing over. I'm really not that
> versed in rsync and could be completely wrong.

rsync uses timestamp/size differences to quickly flag a file as a suspect, and then goes further: it checksums blocks and transfers only the changed blocks. It is pretty efficient.
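As promised above, a simplified sketch of the cycle we run. The hostnames, paths, and pool names are placeholders, and the real scripts have more error handling:

    #!/bin/sh
    # one backup cycle: pull the changed data, then snapshot the result
    STAMP=`date +%Y%m%d-%H%M`

    # --inplace rewrites changed blocks inside the existing destination files
    # instead of building new copies, so the snapshot below only holds the delta
    rsync -a --inplace --delete prodhost:/oradata/ /backpool/oradata/

    zfs snapshot backpool/oradata@$STAMP

    # netbackup then spools /backpool/oradata/.zfs/snapshot/$STAMP to tape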
Anton B. Rang
2007-Feb-03 02:54 UTC
[zfs-discuss] Re: ZFS and question on repetitive data migrating to it efficiently...
In general, your backup software should handle making incremental dumps, even from a split mirror. What are you using to write data to tape? Are you simply dumping the whole file system, rather than using standard backup software?

ZFS snapshots use a pure copy-on-write model. If you have a block containing some data, and you write exactly the same data to that block, ZFS will allocate a new block for it. (It would be possible to change this, but I can't think of many environments where detecting duplicate blocks would be advantageous, since most synchronization tools won't copy duplicate blocks.)

rsync does actually detect unchanged portions of files and avoids copying them. However, I'm not sure whether it also avoids *rewriting* them, so it may not help you.

You also wrote:

> RMAN can [collect changes at the block level from Oracle files], yet that keeps things at the
> DBA level, and I need to keep this backup processing at the SA level.

This sounds like you have a political problem that really should be fixed. Splitting a mirror is not sufficient to have an Oracle backup from which you can safely restore, so the DBAs must already be cooperating with the SAs on backups. Proper use of the database backup tools can make the backup window shorter, and zfs send/receive can be used to back up only changed blocks; vxfs also has incremental block-based backup available, but the licensing fees may be high.
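If you want to see the copy-on-write behavior for yourself, something along these lines will show it (hypothetical pool and file names; the snapshot's space cost shows up in the USED column of zfs list):

    zfs create tank/test
    cp /var/tmp/big.dbf /tank/test/big.dbf
    zfs snapshot tank/test@first

    # overwrite the file with byte-identical contents
    cp /var/tmp/big.dbf /tank/test/big.dbf
    zfs snapshot tank/test@second

    # @first now holds roughly the whole old copy of the file, because the
    # rewrite allocated all-new blocks even though the data was identical
    zfs list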
jason
2007-Feb-03 06:23 UTC
[zfs-discuss] Re: ZFS and question on repetitive data migrating to it efficiently...
> In general, your backup software should handle making incremental dumps, even from a split
> mirror. What are you using to write data to tape? Are you simply dumping the whole file system,
> rather than using standard backup software?

We are using Veritas NetBackup 5 MP4. It backs up a vxfs filesystem mounted from a BCV (business continuance volume) split off a set of mirrors on an EMC DMX. Prior to the split, the dbf files are placed into hot backup mode so that they are in a consistent state. VxFS checkpoints can record the blocks that differ between checkpoints and keep them, so the checkpoints can be mounted read-only or read-write; the no-data variant of checkpoints records only which blocks changed, and that information can be piped to the NetBackup master server, provided you have the right NetBackup agents. Our issue is that the supported method of taking no-data checkpoints on a vxfs filesystem requires doing it on the database host itself. We would like, if possible, to keep the backups off-host and not steal any CPU cycles from the database host.

AFAIK, NetBackup has no method that can track differences in files on a filesystem that is presented to a host and backed up there, when the BCV disks are then reattached and resynced to their source (hidden from the backup host while they resync), then resplit and remounted on the backup host for the next backup cycle. NetBackup does not know which blocks have changed in the meantime, so it treats every dbf file as different from the previous backup and backs up the whole file -- a full backup. I'd love to implement an incremental backup for these files, yet don't know which agents can do this when the storage is not present on the off-host backup server all the time. Maybe that is not an issue, but I would need to research it more.

> ZFS snapshots use a pure copy-on-write model. If you have a block containing some data, and you
> write exactly the same data to that block, ZFS will allocate a new block for it. (It would be
> possible to change this, but I can't think of many environments where detecting duplicate blocks
> would be advantageous, since most synchronization tools won't copy duplicate blocks.)

I guess I understand this. So any time a file is written, even with identical data, it takes new blocks. There would be no point in copying the same file on top of itself; what I wanted was an application that could see the differences between two files and, where the bits are identical, leave them alone, changing only the differing bits to make the two files equivalent. But if it goes as far as rewriting the whole file, then no snapshot space saving is accomplished.

> rsync does actually detect unchanged portions of files and avoids copying them. However, I'm not
> sure whether it also avoids *rewriting* them, so it may not help you.
>
> You also wrote:
>
> > RMAN can [collect changes at the block level from Oracle files], yet that keeps things at the
> > DBA level, and I need to keep this backup processing at the SA level.
>
> This sounds like you have a political problem that really should be fixed. Splitting a mirror is
> not sufficient to have an Oracle backup from which you can safely restore, so the DBAs must
> already be cooperating with the SAs on backups. Proper use of the database backup tools can make
> the backup window shorter, and zfs send/receive can be used to back up only changed blocks; vxfs
> also has incremental block-based backup available, but the licensing fees may be high.

It is true that our DBAs support our existing configuration, yet I feel that a full backup in every backup window is not the fastest method. And if you back up fully, you have to restore fully to get a database back to a previous state. That is a consequence of our method, since we do not back up directly from the database host: to restore, we would pull the data from tapes back to a BCV disk and then reverse-sync (BCV restore sync) the data back to the original volumes. That would be a time-consuming process, though it could be quicker on machines with a vxfs filesystem, since a checkpoint could be remounted as the live filesystem. I failed to mention earlier that we also have some databases running on ufs filesystems that rely on this BCV sync/split process for their off-host backups. So any time we need to restore data from tapes, it will be a long process.

I'll keep looking into Veritas NetBackup agents and what solutions are available for off-host backup of Oracle database files. Maybe I can start an Oracle instance to read the content of the BCV that gets mounted on the NetBackup master server and have it run a NetBackup Oracle agent that scans the database for the changed blocks and writes those off, yet I believe this would not work for BCV volumes that are not on vxfs. I'm just trying to find out what options we have that can get us to an incremental way of backing up our data while still performing it off-host.

We have some thumpers on the way in, to try housing backed-up data on them instead of going BCV to tape, since any media failure with tapes stretches the backup window. A VTL might be a better solution, yet I'm hoping a thumper with some zfs filesystems can act as a pseudo-VTL, so that it becomes the preferred option for recovery instead of relying directly on tapes.

thanx for your insight
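P.S. Based on the rsync --inplace idea earlier in this thread, what I'm picturing for the thumpers is roughly the following. Just a rough sketch; hostnames, paths, and pool names are placeholders:

    # on the off-host backup server, after the BCV split is mounted at /bcv/oradata
    rsync -a --inplace /bcv/oradata/ thumper:/backpool/oradata/

    # then on the thumper
    zfs snapshot backpool/oradata@`date +%Y%m%d`

Since the BCV is resynced and resplit between runs, the timestamps on most files will change and rsync will examine them all, but with --inplace only the blocks that actually differ get rewritten on the zfs side, which should keep each snapshot's space cost close to the real data delta.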