ankush grover
2008-Feb-06 04:58 UTC
[CentOS] strategy/technology to back up 20TB or more of users' data
Hi Friends,

I am currently using Samba on CentOS 4.4 as a domain member of AD 2003, with each user having a quota of 2GB (the number of users is around 2,000). Now the management wants to increase the quota to 10GB; with this there will be more than 20TB of data to back up weekly, which will take many hours. Currently Veritas backup software is used to back up data to tapes.

There is a concept of taking snapshots of the Samba volume with LVM at a given interval, but so far I haven't found any good article or how-to on that. I would also like to hear what the experience of users with this technology has been, and what other technologies are being used to handle TBs of data.

Kindly let me know if you need any further inputs.

Thanks & Regards

Ankush
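P.S. For reference, the basic "snapshot the Samba volume with LVM" idea I have been reading about looks roughly like the sketch below. This is only a sketch: the vg0/samba names, the 20G copy-on-write size, and the /backup target are placeholders for whatever the real setup uses.

    # Sketch: assumes the Samba share lives on LV "samba" in VG "vg0"
    # and that vg0 has free extents for the snapshot's copy-on-write store.

    # 1. Create a snapshot; --size only has to cover blocks that change
    #    on the origin while the snapshot exists (the backup window).
    lvcreate --snapshot --size 20G --name samba-snap /dev/vg0/samba

    # 2. Mount the snapshot read-only and point the backup job at it.
    mkdir -p /mnt/samba-snap
    mount -o ro /dev/vg0/samba-snap /mnt/samba-snap

    # 3. Back up from the frozen view while users keep writing to the
    #    origin (tar here; Veritas could read the same mount).
    tar -czf /backup/samba-$(date +%Y%m%d).tar.gz -C /mnt/samba-snap .

    # 4. Tear the snapshot down so its copy-on-write space is released.
    umount /mnt/samba-snap
    lvremove -f /dev/vg0/samba-snap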
nate
2008-Feb-06 05:16 UTC
[CentOS] strategy/technology to back up 20TB or more of users' data
ankush grover wrote:
> There is a concept of taking snapshots of the Samba volume with LVM
> at a given interval, but so far I haven't found any good article or
> how-to on that. I would also like to hear what the experience of
> users with this technology has been, and what other technologies are
> being used to handle TBs of data.

Save yourself a bunch of trouble and buy a real storage system; if you have 20TB of data, that's a serious amount of stuff to back up. Network Appliance is pretty popular in that space. I've been using 3PAR for my storage and really like its built-in virtualization. Dell recently purchased EqualLogic, which looks to have some solid technology as well. I attended a little event where they pitched their pooled-storage iSCSI system. Looked pretty cool.

With these sorts of systems, snapshotting is really easy and scalable. In the 3PAR world, for example (not sure who else might have this ability), they have thin copy-on-write technology. Say you take a snapshot of a volume once a day for 30 days. In a traditional snapshot environment you use a lot of space on the array, as it keeps track of (up to) 30 different snapshots and the changes from the original volume. In the 3PAR world it only writes the changes once for the source volume. So if you change 10GB of data on the base volume and you have 20 snapshots, only 10GB of data is written as changed to the array, not 20*10GB. I also love the ability to snapshot multiple volumes from multiple systems at the same time, with the array ensuring they are all taken at the same instant.

Add on top of that the most advanced thin provisioning (dedicate-on-write) technology around, the ability to dynamically grow the array, change RAID levels on the fly with no downtime, etc., and you've got yourself a nice system :)

I'd personally steer clear of any of the "old fashioned" arrays (e.g. traditional EMC, HDS, IBM, though some of them are getting thin provisioning as well). I can only speak from personal experience with 3PAR, but I believe NetApp, and it looks like EqualLogic, are very similar in ease of use. No need to be a storage engineer or have fancy training to use them. No spending days/weeks planning the layout of your storage system.

With my storage array I currently have 8.5TB raw space and am using about 6TB, but my servers think I have nearly 30TB. As I get closer to 8TB I can add more space on the fly, re-balance the I/O for maximum performance, and continue to grow as needed.

nate
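P.S. Stock LVM on a plain CentOS box behaves like the "traditional" case above: every snapshot carries its own copy-on-write store, so a block changed on the origin gets copied into each live snapshot separately. A rough sketch of a daily rotation (the vg0/samba names, 7-day retention, and 20G CoW size are all assumptions, nothing 3PAR-specific):

    #!/bin/bash
    # Sketch: keep the last 7 daily LVM snapshots of vg0/samba.
    VG=vg0
    LV=samba
    KEEP=7

    # List existing snapshots, oldest first (date-stamped names sort
    # chronologically).
    snaps=$(lvs --noheadings -o lv_name "$VG" | tr -d ' ' \
            | grep "^${LV}-snap-" | sort)

    # Drop the oldest one if we are at the retention limit.
    if [ "$(echo "$snaps" | grep -c .)" -ge "$KEEP" ]; then
        lvremove -f "/dev/$VG/$(echo "$snaps" | head -n 1)"
    fi

    # Create today's snapshot. Each snapshot has its own CoW store, so
    # the space cost grows with the number of live snapshots -- the
    # 20*10GB case, not the shared-write scheme described above.
    lvcreate --snapshot --size 20G \
        --name "${LV}-snap-$(date +%Y%m%d)" "/dev/$VG/$LV"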
Peter Kjellstrom
2008-Feb-06 07:06 UTC
[CentOS] strategy/technology to back up 20TB or more of users' data
On Wednesday 06 February 2008, ankush grover wrote:
> Hi Friends,
>
> I am currently using Samba on CentOS 4.4 as a domain member of AD 2003,
> with each user having a quota of 2GB (the number of users is around
> 2,000).

This means you can run several smaller filesystems and (among other things) run backups in parallel.

> Now the management wants to increase the quota to 10GB; with this there
> will be more than 20TB of data to back up weekly, which will take many
> hours. Currently Veritas backup software is used to back up data to
> tapes.

I think you'll want backup software that does fully incremental backups. We use IBM TSM for this, and I'm quite sure it would easily back up, say, 4x 5TB (assuming four nice filesystems on decent hardware). ...of course this depends on how much your users modify their data each day.

A second alternative is to not do traditional backups at all. Go with either multiple replicas (probably not what you want) or a storage system that can do cheap snapshots, and keep a week's worth of daily snapshots alive.

Good luck,
Peter
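P.S. A minimal sketch of the "several smaller filesystems, backed up in parallel" idea, assuming four hypothetical mount points under /srv and a local /backup staging area:

    #!/bin/bash
    # Sketch: back up four smaller filesystems concurrently instead of
    # one 20TB monster. Mount points and target paths are assumptions.
    for fs in /srv/samba1 /srv/samba2 /srv/samba3 /srv/samba4; do
        name=$(basename "$fs")
        tar -czf "/backup/${name}-$(date +%Y%m%d).tar.gz" -C "$fs" . &
    done
    wait    # block until all four background jobs have finished

For the fully incremental style, rsync with --link-dest against the previous day's tree gives a similar effect on plain Linux: unchanged files cost one hard link each, and only changed files are copied.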