Dear All, I''ve just joined this list, and am trying to understand the state of play with using free backup solutions for ZFS, specifically on a Sun x4500. The x4500 we have is used as a file store, serving clients using NFS only. I''m handling the issue of recovery of accidentally deleted files by using daily rolling snapshots. I''m also looking for a solution for making complete backups (from snapshot) to tape. Optimistically, I''m hoping to find something that is both reliable and free :-) From what I''ve read on the net, it seems that amanda may not be up to it, and I''ve not really found anything about whether bacula is up to it either. Another option is to see if it is possible to stream "zfs send" to tape in a way that is reliable enough to depend upon. Does anyone here have experience of this with multi-TB filesystems and any of these solutions that they''d be willing to share with me please? Thanks, Anna
I had a brief look into this too. I''m a solaris newbie, but the best solution looked to be tar, or something called Star. Our plan is to use ZFS send/receive to back the data up onto live server storage. But for tape archives I actually want to use a completely different filesystem. If something were to go badly wrong and corrupt the ZFS pool for example in a way that wasn''t discovered for some time, I would always have access to the raw files on the backup. This message posted from opensolaris.org
On Wed, Apr 16, 2008 at 2:12 PM, Anna Langley <jal58 at cam.ac.uk> wrote:> > I''ve just joined this list, and am trying to understand the state of > play with using free backup solutions for ZFS, specifically on a Sun > x4500....> Does anyone here have experience of this with multi-TB filesystems and > any of these solutions that they''d be willing to share with me please?My experience so far is that anything past a terabyte and 10 million files, and any backup software struggles. (I''ve largely been involved with commercial solutions, as we already have them. They struggle as well.) Generally, handling data volumes on this scale seems to require some way of partitioning it into more easily digestible chunks. Either into separate filesystems (zfs makes this easy) or, if that isn''t possible, to structure the data on a large filesystem into some sort of hierarchy so that it has top-level directories that break it up into smaller chunks. (Some sort of hashing scheme appears to be indicated. Unfortunately our applications fall into two classes: everything in one huge directory, or a hashing scheme that results in many thousands of top-level directories.) -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Sun, 20 Apr 2008, Peter Tribble wrote:>> Does anyone here have experience of this with multi-TB filesystems and >> any of these solutions that they''d be willing to share with me please? > > My experience so far is that anything past a terabyte and 10 million files, > and any backup software struggles.What is the cause of the "struggling"? Does the backup host run short of RAM or CPU? If backups are incremental, is a large portion of time spent determining the changes to be backed up? What is the relative cost of many small files vs large files? How does ''zfs send'' performance compare with a traditional incremental backup system? Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Sun, Apr 20, 2008 at 4:39 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> On Sun, 20 Apr 2008, Peter Tribble wrote: > > > > My experience so far is that anything past a terabyte and 10 million > files, > > and any backup software struggles. > > > > What is the cause of the "struggling"? Does the backup host run short of > RAM or CPU? If backups are incremental, is a large portion of time spent > determining the changes to be backed up? What is the relative cost of many > small files vs large files?It''s just the fact that, while the backup completes, it can take over 24 hours. Clearly this takes you well over any backup window. It''s not so much that the backup software is defective; it''s an indication that traditional notions of backup need to be rethought. I have one small (200G) filesystem that takes an hour to do an incremental with no changes. (After a while, it was obvious we don''t need to do that every night.) The real killer, I think, is sheer number of files. For us, 10 million files isn''t excessive. I have one filesystem that''s likely to have getting on for 200 million files by the time the project finishes. (Gulp!)> How does ''zfs send'' performance compare with a traditional incremental > backup system?I haven''t done that particular comparison. (zfs send isn''t useful for backup - doesn''t span tapes, doesn''t hold an index of the files.) But I have compared it against various varieties of tar for moving data between machines, and the performance of ''zfs send'' wasn''t particularly good - I ended up using tar instead. (Maybe lots of smallish files again.) For incrementals, it may be useful. But that presumes a replicated configuration (preferably with the other node at a DR site), rather than use in backups. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Sun, 20 Apr 2008, Peter Tribble wrote:>> >> What is the cause of the "struggling"? Does the backup host run short of >> RAM or CPU? If backups are incremental, is a large portion of time spent >> determining the changes to be backed up? What is the relative cost of many >> small files vs large files? > > It''s just the fact that, while the backup completes, it can take over 24 hours. > Clearly this takes you well over any backup window. It''s not so much that the > backup software is defective; it''s an indication that traditional notions of > backup need to be rethought.There is no doubt about that. However, there are organizations with hundreds of terrabytes online and they manage to survive somehow. I receive bug reports from people with 600K files in a single subdirectory. Terrabyte-sized USB drives are available now. When you say that the backup can take over 24 hours, are you talking only about the initial backup, or incrementals as well?> I have one small (200G) filesystem that takes an hour to do an incremental > with no changes. (After a while, it was obvious we don''t need to do that > every night.)That is pretty outrageous. It seems that your backup software is suspect since it must be severely assaulting the filesystem. I am using ''rsync'' (version 3.0) to do disk-to-disk network backups (with differencing) to a large Firewire type drive and have not noticed any performance issues. I do not have 10 million files though (I have about half of that). Since zfs supports really efficient snapshots, a backup system which is aware of snapshots can take snapshots and then backup safely even if the initial dump takes several days. Really smart software could perform both initial dump and incremental dump simultaneously. The minimum useful incremental backup interval would still be be limited to the time required to do one incremental backup. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Hello Peter, Sunday, April 20, 2008, 7:47:31 PM, you wrote:>> How does ''zfs send'' performance compare with a traditional incremental >> backup system?PT> I haven''t done that particular comparison. (zfs send isn''t useful for backup PT> - doesn''t span tapes, doesn''t hold an index of the files.) But I have compared PT> it against various varieties of tar for moving data between machines, and PT> the performance of ''zfs send'' wasn''t particularly good - I ended up using PT> tar instead. (Maybe lots of smallish files again.) PT> For incrementals, it may be useful. But that presumes a replicated PT> configuration (preferably with the other node at a DR site), rather than PT> use in backups. Over a year ago I compared Legato incremental with zfs send incremental on x4500 with a lot of small files. zfs send (incremental) was dramatically quicker. -- Best regards, Robert Milkowski mailto:milek at task.gda.pl http://milek.blogspot.com
> I haven''t done that particular comparison. (zfs send isn''t useful for backup > - doesn''t span tapes, doesn''t hold an index of the files.) But I have compared > it against various varieties of tar for moving data between machines, and > the performance of ''zfs send'' wasn''t particularly good - I ended up using > tar instead. (Maybe lots of smallish files again.)FWIW, we''re using a small ksh script with zfs snapshotting and incremental send/recv to keep a rolling backup of our fileserver (0.9Tb in total and growing) to another machine. They''re both running quad-core Intels with SATA disks, nothing fancy, and they cope very well. Whilst this isn''t directly comparable to backing up your 24Tb on a Thumper, it shows that zfs send/recv *can* be used effectively in a backup strategy. In comparison to our old strategy with rsync, we have equivalent hardware redundancy, 4x the number of backups, near-zero file restore time and backups that complete in 1/12 of the time. Which makes for happy systems administrators. Chris