Hi,

We need to archive a huge amount of data/images, say 100 TB, and keep a replica at a remote site. I thought ZFS would be a good choice. Can someone comment and advise whether it's practical?

Plan A. Mirror on iSCSI devices: keep one server with a set of ZFS file systems, each with 2 (sub)mirrors, where one side of each mirror uses devices physically at the remote site, accessed as iSCSI LUNs.

How does ZFS handle remote replication? If the Internet link is down for hours or days, can the file systems still be written? Will the submirrors be resynced efficiently?

Plan B. Use ZFS incremental snapshot backup/restore on a pair of servers to sync 2 copies of the same data over the Internet, say once every 10 or 60 minutes.

Will I still get decent performance when I have 50 or 100 TB (say, using a V490/V890 with SATA drives on 3511 arrays)?

More options?

Much thanks.

Max
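P.S. For Plan A, what I have in mind is roughly the following, once the remote LUNs show up as local devices through the iSCSI initiator (pool and device names below are only placeholders, not tested):

  # mirror each local disk against the corresponding remote iSCSI LUN
  zpool create archive \
      mirror c1t0d0 c4t0d0 \
      mirror c1t1d0 c4t1d0

  # verify the layout and the state of both sides of each mirror
  zpool status archive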
Jeff Bonwick
2006-May-12 00:01 UTC
[zfs-discuss] remote replication with huge data using zfs?
> Plan A. Mirror on iSCSI devices: keep one server with a set of
> ZFS file systems, each with 2 (sub)mirrors, where one side of each
> mirror uses devices physically at the remote site, accessed as
> iSCSI LUNs.
>
> How does ZFS handle remote replication? If the Internet link is
> down for hours or days, can the file systems still be written?
> Will the submirrors be resynced efficiently?

This would work. If the link goes down, it's no different than if someone trips over the cable for a local disk. All writes (and reads) will go to the local disk until the remote one returns. When the remote disk returns, we'll resilver it. The cool thing here is that ZFS resilvering is logical, not physical, so it'll only copy the blocks that changed during the outage (i.e., it'll be fast). For a bit more detail on how ZFS mirroring works, see:

http://blogs.sun.com/roller/page/bonwick?entry=smokin_mirrors

The one hesitation I'd have about Plan A is that ZFS doesn't yet support the notion of two sides of a mirror being very different in performance. With a local/remote pair, you really want different semantics than a pair of local disks. You want to send all reads to the local disk, and you want to consider a write complete when the local disk is done (and let the remote write be asynchronous). We're planning to do this soon, but it's not there yet.

> Plan B. Use ZFS incremental snapshot backup/restore on a pair of
> servers to sync 2 copies of the same data over the Internet, say
> once every 10 or 60 minutes.

This is a better approach, for several reasons. It will generally be faster than remote mirroring because most 'churn' (creation and deletion of short-lived files) will never be sent over the wire. It allows you to have fault tolerance like RAID-Z on the local disks. It allows you to have arbitrarily different hardware at the local and remote sites (e.g. you could have a SPARC system with a pool of RAID-Z disks locally, and an Opteron system with mirrored disks at the remote site).

Plan B is also more flexible because it's acting as a file server rather than as a dumb LUN. This means you can do things like have several different sites pushing changes to a single remote server, or arrange to have several different sites back each other up (e.g. the LA office sends incrementals to NY, and the NY office sends incrementals to LA).

Generating incrementals is *very* fast in ZFS. The time it takes to send an incremental is proportional to the amount of data changed, no matter how much *unchanged* data there is. Note that this is very different than most incremental backup tools, which have to traverse *all* of the metadata to find what's changed. It can take hours to discover that a single block changed. For ZFS it's instant.

Jeff
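P.S. For concreteness, each periodic update in Plan B would look something like this on the sending side (dataset, snapshot, and host names are only examples):

  # take a new snapshot, then ship only the delta since the previous one
  zfs snapshot archive/images@1200
  zfs send -i archive/images@1100 archive/images@1200 | \
      ssh remotehost zfs receive backup/images

Run that from cron every 10 or 60 minutes, rolling the snapshot names forward each time, and the remote copy trails the local one by at most one interval. The receiving file system must not be modified between receives, or the next incremental will be rejected.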
Richard Elling
2006-May-12 20:35 UTC
[zfs-discuss] remote replication with huge data using zfs?
On Thu, 2006-05-11 at 17:01 -0700, Jeff Bonwick wrote:
> > Plan A. Mirror on iSCSI devices: keep one server with a set of
> > ZFS file systems, each with 2 (sub)mirrors, where one side of each
> > mirror uses devices physically at the remote site, accessed as
> > iSCSI LUNs.
> >
> > How does ZFS handle remote replication? If the Internet link is
> > down for hours or days, can the file systems still be written?
> > Will the submirrors be resynced efficiently?
>
> This would work. If the link goes down, it's no different than
> if someone trips over the cable for a local disk. All writes
> (and reads) will go to the local disk until the remote one returns.

By mirroring remotely, you create a dependency between the two hosts and their interconnects. In general, this will be noticeable as performance bottlenecks, since the system is only as fast as the slowest component (a chain is only as strong as its weakest link).

From a practical perspective, this tends to increase the fault detection time. In the local disk case, if you unplug the disk it is either immediately detected (SAS, SATA) or quickly detected (direct FC, parallel SCSI). For IP networks, the detection may take some number of minutes before the various timeouts expire. The timeouts exist to provide stability, so you can't just tune them short without risking instability. However, it is rare that people do such analysis. As to what ZFS does when one side of the mirror starts having very long delays, I'm not sure. Most other LVMs just slow down.

Architecturally, it is usually not in your best interest to have performance in a primary datacenter depend directly on the performance of the secondary datacenter and the link between them. Using one-way replication such as ZFS send/receive will allow you to break this dependency.

> When the remote disk returns, we'll resilver it. The cool thing here
> is that ZFS resilvering is logical, not physical, so it'll only copy
> the blocks that changed during the outage (i.e., it'll be fast).
> For a bit more detail on how ZFS mirroring works, see:
>
> http://blogs.sun.com/roller/page/bonwick?entry=smokin_mirrors
>
> The one hesitation I'd have about Plan A is that ZFS doesn't yet
> support the notion of two sides of a mirror being very different in
> performance. With a local/remote pair, you really want different
> semantics than a pair of local disks. You want to send all reads
> to the local disk, and you want to consider a write complete when
> the local disk is done (and let the remote write be asynchronous).
> We're planning to do this soon, but it's not there yet.

Yes, a preferred-device policy will be appreciated. This would be useful in the case where I have local devices which are different, such as [NV]RAM on one side and spinning rust on the other.
--
richard
Hi,

> By mirroring remotely, you create a dependency between the two
> hosts and their interconnects. In general, this will be noticeable
> as performance bottlenecks, since the system is only as fast as
> the slowest component (a chain is only as strong as its weakest
> link).

Well, what about copying snapshots? Would that be an option, giving you a less tightly coupled, yet still redundant solution?

Patrick
Max Holm
2006-May-13 01:19 UTC
[zfs-discuss] Re: remote replication with huge data using zfs?
Hi,

Much thanks for the advice from Jeff & Richard (and maybe others).

More questions on this topic: if ZFS doesn't do the mirroring job, the users have to take care of it with scripts and handle the errors themselves. Suppose you use a pair of servers to replicate a big archive, 100 TB (which can grow to PB scale), with incremental snapshots. After some host/array failures on either end of the pair, you cannot be sure the 2 copies of the archive on the 2 ZFS pools still contain exactly the same files/contents. How do you resync them or verify their status efficiently?

Thanks.

-Max
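P.S. Would something like this be the right idea for resyncing, assuming both pools still hold at least one common snapshot (names below are made up)?

  # on the remote side, discard anything newer than the last snapshot
  # both pools are known to share
  zfs rollback -r backup/images@last-common

  # then resume incremental replication from that point
  zfs send -i archive/images@last-common archive/images@latest | \
      ssh remotehost zfs receive backup/images

Or is there a cheaper way to verify that the two copies really match?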