Following my previous post across several mailing lists regarding multi-tera volumes with small files on them, I'd be glad if people could share real-life numbers on large filesystems and their experience with them. I'm slowly coming to the realization that, regardless of theoretical filesystem capabilities (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise filesystem arena recommend keeping practical filesystems at no more than about 1 TB, for manageability and recoverability.

What's the maximum filesystem size you've used in a production environment? How did the experience come out?

Thanks,
 - Yaniv

(p.s. I think that for ZFS, "large zpools" is the closer analogue to the classic FS notion of "large filesystems". Would you agree?)
Yaniv Aknin <the.aknin at gmail.com> wrote:

> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.

UFS is limited to 2**31 inodes, and this also limits the filesystem size. On Berlios we have a mixture of small and large files, and the average file size is 100 kB. That would still give you a limit of 200 TB, which is more than UFS allows.

I would guess that the recommendations are rather oriented towards backup: backup speed and the size of the backup media.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
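A quick back-of-the-envelope check of the 200 TB figure above, as a small Python sketch; the 2**31 inode limit and the 100 kB average file size are the numbers quoted in the post:

    # Maximum data a UFS filesystem could hold at this average file size:
    # one file per inode, 2**31 inodes, ~100 kB per file.
    max_inodes = 2 ** 31                  # UFS inode limit, per the post
    avg_file_size = 100 * 1000            # 100 kB average, decimal units
    print(f"{max_inodes * avg_file_size / 1e12:.0f} TB")   # prints "215 TB", i.e. roughly 200 TB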
> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.
>
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD I was limited to 2 TB, but using UFS2/FreeBSD was impractical for several reasons. With vxfs, 4 TB is a practical limit: while files are being stored on the volume we take an incremental backup every night, and this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we perform a complete backup, which requires approx. 12 LTO-3 tapes. Our tape station is a Dell PV136T with 3x18 slots. Increasing a partition to 5 TB would require more tapes, and I don't have any plans on becoming a tape DJ :-)

If I did use zfs I would probably make the partitions the same size but still make the (z)pool rather large.

regards
Claus

"Claus Guttesen" <kometen at gmail.com> wrote:

> I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> several reasons. With vxfs, 4 TB is a practical limit: while files are

Could you please give some hints on these reasons? I only know that FreeBSD's UFS2 is not fully 64-bit aware.

> being stored on the volume we take an incremental backup every night, and
> this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> to 5 TB would require more tapes, and I don't have any plans on
> becoming a tape DJ :-)

What kind of backup software do you use, and how much time do a full and an incremental backup take?

Could you please explain why an incremental backup takes 16-17 tapes while a full backup only takes 12 tapes? 12 tapes of 400 GB should be OK for a level 0 backup of a 5 TB volume.

I see that the size of an incremental backup is less than 4% of the total data size on the NFS server for berlios.de. In your case, this should be less than 180 GB, which is less than one single tape. Do you use GNU tar for your incrementals?

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
> > I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> > I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> > several reasons. With vxfs, 4 TB is a practical limit: while files are
>
> Could you please give some hints on these reasons? I only know that
> FreeBSD's UFS2 is not fully 64-bit aware.

FreeBSD 5/6 can, under normal circumstances, perform a fsck in the background if the server had an unplanned restart. But before it can do so it has to make a snapshot. A snapshot would take approx. 20 min. on a 400 GB partition, and during that time the webservers could not access the nfs-server, so our visitors would see an error message. As my partitions grew, the time taken to perform a snapshot would also increase.

At that time FreeBSD did not have a volume manager (it may have now), and juggling with the number of disks in a raid-set, so I could make partitions reach the 2 TB size and still keep them almost the same size without wasting too much disk space, was a bit of a pain. Growfs would not always work, and juggling with sizes in bytes and manually editing the partition can result in errors if one is not careful. Using the veritas volume manager is more a matter of adding the LUNs to a diskgroup and then creating the partitions within that dg. Much easier. Zpool and zfs create are easier than veritas.

> > being stored on the volume we take an incremental backup every night, and
> > this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> > perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> > tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> > to 5 TB would require more tapes, and I don't have any plans on
> > becoming a tape DJ :-)
>
> What kind of backup software do you use, and how much time do a full
> and an incremental backup take?

I use Legato Networker. They have drivers for Solaris 9 on sparc for LTO-3 tapes.

> Could you please explain why an incremental backup takes 16-17 tapes while
> a full backup only takes 12 tapes?

It's because some of our users make minor changes to their files, and all these changes to individual files will be added to the incremental backups. So a file can be backed up more than once.

> 12 tapes of 400 GB should be OK for a level 0 backup of a 5 TB volume.
>
> I see that the size of an incremental backup is less than 4% of the total data
> size on the NFS server for berlios.de. In your case, this should be less than
> 180 GB, which is less than one single tape. Do you use GNU tar for your
> incrementals?

I use Legato.

-- 
regards
Claus

"Claus Guttesen" <kometen at gmail.com> wrote:

> > > I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> > > I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> > > several reasons. With vxfs, 4 TB is a practical limit: while files are
> >
> > Could you please give some hints on these reasons? I only know that
> > FreeBSD's UFS2 is not fully 64-bit aware.
>
> FreeBSD 5/6 can, under normal circumstances, perform a fsck in the
> background if the server had an unplanned restart. But before it can
> do so it has to make a snapshot. A snapshot would take approx. 20 min.
> on a 400 GB partition, and during that time the webservers could not
> access the nfs-server, so our visitors would see an error message.
> As my partitions grew, the time taken to perform a snapshot would
> also increase.

This leads to an interesting question. I observe a time of 2-3 minutes to create a snapshot on a 500 GB Solaris UFS partition. It would be of interest to me to see other data on snapshot times on UFS or ZFS.

> > > being stored on the volume we take an incremental backup every night, and
> > > this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> > > perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> > > tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> > > to 5 TB would require more tapes, and I don't have any plans on
> > > becoming a tape DJ :-)
> >
> > What kind of backup software do you use, and how much time do a full
> > and an incremental backup take?
>
> I use Legato Networker. They have drivers for Solaris 9 on sparc for
> LTO-3 tapes.
>
> > Could you please explain why an incremental backup takes 16-17 tapes while
> > a full backup only takes 12 tapes?
>
> It's because some of our users make minor changes to their files, and all
> these changes to individual files will be added to the incremental backups.
> So a file can be backed up more than once.

I don't understand this: if you only create one incremental per day, I would assume that every changed file appears exactly once in an incremental.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On 4/28/07, Yaniv Aknin <the.aknin at gmail.com> wrote:
> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.
>
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

Works fine. As an example:

Filesystem             size   used  avail capacity  Mounted on
xx                      14T   116K   2.1T     1%    /xx
xx/peter                14T    34G   2.1T     2%    /xx/peter
xx/aa                   14T   1.2T   2.1T    37%    /xx/aa
xx/tank                 14T   1.5T   2.1T    41%    /xx/tank
xx/tank-archives        14T   3.5T   2.1T    63%    /xx/tank-archives
xx/rework               14T   6.8G   2.1T     1%    /xx/rework
xx/bb                   14T    61G   2.1T     3%    /xx/bb
xx/foo                  14T    73K   2.1T     1%    /xx/foo
xx/foo/aug06            14T   771K   2.1T     1%    /xx/foo/aug06
xx/foo/cc               14T    55G   2.1T     3%    /xx/foo/cc
xx/foo/mm               14T   2.2G   2.1T     1%    /xx/foo/mm
xx/foo/dd               14T    47M   2.1T     1%    /xx/foo/dd
xx/foo/rr               14T   1.4G   2.1T     1%    /xx/foo/rr
xx/foo/tf               14T   1.6G   2.1T     1%    /xx/foo/tf
xx/ee                   14T   1.3T   2.1T    38%    /xx/ee
xx/aa-fe                14T   274G   2.1T    12%    /xx/aa-fe
xx/vv                   14T    68G   2.1T     4%    /xx/vv
xx/nn                   14T    28G   2.1T     2%    /xx/nn
xx/mm                   14T   4.2G   2.1T     1%    /xx/mm
xx/rr                   14T   3.1G   2.1T     1%    /xx/rr
xx/ss                   14T    48G   2.1T     3%    /xx/ss
xx/ff                   14T   305G   2.1T    13%    /xx/ff
xx/gg-jj                14T   570G   2.1T    21%    /xx/gg-jj
xx/gg                   14T   882G   2.1T    29%    /xx/gg
xx/aa-tn                14T    35G   2.1T     2%    /xx/aa-tn
xx/pp                   14T   234K   2.1T     1%    /xx/pp
xx/ee-tt                14T   256K   2.1T     1%    /xx/ee-tt
xx/tank-r4              14T   2.0T   2.1T    50%    /xx/tank-r4
xx/tank-r1-clone        14T   3.3T   2.1T    61%    /xx/tank-r1-clone
xx/rdce                 14T    91G   2.1T     5%    /xx/rdce

That's a fair spread of sizes. Each filesystem in this case represents a single dataset, so it's hard to make them any smaller. (Until recently, some of the larger datasets were spread across multiple ufs filesystems and merged back together using an automount map. At least zfs has saved me from that nightmare.) Many of these filesystems have millions of files - the most is over 11 million at an average of 115k each, although one has 8 million at 4k each.

In practical terms, backing up much over a terabyte in a single chunk isn't ideal. What I would like to see here is more flexibility from something like Legato in terms of defining schedules that would allow us to back this up sensibly. (Basically, the changes are relatively small, so it would be nice to use quarterly schedules - Legato only really does weekly or monthly.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
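A rough cross-check of those file-count figures against the listing, as a Python sketch ("115k" and "4k" are read as average file sizes in kilobytes):

    datasets = [
        ("11M files at ~115k each", 11_000_000, 115_000),
        ("8M files at ~4k each",     8_000_000,   4_000),
    ]
    for label, count, avg_bytes in datasets:
        print(f"{label}: ~{count * avg_bytes / 1e9:.0f} GB")
    # ~1265 GB and ~32 GB; the first is in line with the 1.2-1.5 TB datasets
    # shown in the df output above.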
On Sat, Apr 28, 2007 at 05:02:47PM +0100, Peter Tribble wrote:
>
> In practical terms, backing up much over a terabyte
> in a single chunk isn't ideal. What I would like to see
> here is more flexibility from something like Legato
> in terms of defining schedules that would allow us to
> back this up sensibly. (Basically, the changes are
> relatively small, so it would be nice to use quarterly
> schedules - Legato only really does weekly or monthly.)

So what you *really* want is TSM. I wonder if IBM would ever consider supporting ZFS.

-brian
-- 
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke
On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

Just wondering, does Sun/STK have something similar to TSM??

Rayson
Sun's equivalent is SAMFS, especially the latest version (4.6), which can be used entirely for backup/restore/archive. SAMFS will be open source very soon.

s.

On 4/28/07, Rayson Ho <rayrayson at gmail.com> wrote:
> On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> > So what you *really* want is TSM. I wonder if IBM would ever
> > consider supporting ZFS.
>
> Just wondering, does Sun/STK have something similar to TSM??
>
> Rayson
Erblichs <erblichs at earthlink.net> wrote:

> Jorg,
>
> Do you really think that ANY FS actually needs to support
> more FS objects? If that were an issue, why not create
> more FSs?
>
> A multi-TB FS SHOULD support 100 MB+ to GB-sized FS objects, which
> IMO is the more common use. I have seen this a lot in video
> environments. The largest that I have personally seen is in
> excess of 64 TB.

About 12 years ago, many people were afraid of creating filesystems greater than 2 GB. Today we cannot believe this.

> I would assume that normal FS ops that search or display
> an extremely large number of FS objects are going to be
> difficult to use. Just try placing 10k+ FS objects/files within
> a directory and then listing that directory.
>
> As for backup/restore type ops, I would assume that a
> smaller granularity of specified paths/directories would be
> more common, due to user error and not disturbing other
> directories.

I am sure that in 10-15 years people will think differently than they do today. Note that in 15 years a single 2.5" disk will have a capacity of approx. 100 TB. As people will use striping and RAID technologies to optimize speed and reliability, a single data pool will then most likely be around 1000 TB. Backup media will also increase in size...

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
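A small Python sketch of what that prediction implies, assuming a 2007-era 2.5" disk of roughly 200 GB (the 200 GB starting point is an assumption, not a figure from the post):

    # Annual capacity growth needed to reach 100 TB from 200 GB in 15 years.
    start_gb, target_gb, years = 200, 100_000, 15
    rate = (target_gb / start_gb) ** (1 / years) - 1
    print(f"implied sustained capacity growth: ~{rate:.0%} per year")   # ~51%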
Jorg,

Do you really think that ANY FS actually needs to support more FS objects? If that were an issue, why not create more FSs?

A multi-TB FS SHOULD support 100 MB+ to GB-sized FS objects, which IMO is the more common use. I have seen this a lot in video environments. The largest that I have personally seen is in excess of 64 TB.

I would assume that normal FS ops that search or display an extremely large number of FS objects are going to be difficult to use. Just try placing 10k+ FS objects/files within a directory and then listing that directory.

As for backup/restore type ops, I would assume that a smaller granularity of specified paths/directories would be more common, due to user error and not disturbing other directories.

Mitchell Erblich
-----------------

Joerg Schilling wrote:
>
> Yaniv Aknin <the.aknin at gmail.com> wrote:
>
> > Following my previous post across several mailing lists regarding multi-tera
> > volumes with small files on them, I'd be glad if people could share real-life
> > numbers on large filesystems and their experience with them. I'm slowly coming
> > to the realization that, regardless of theoretical filesystem capabilities
> > (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> > filesystem arena recommend keeping practical filesystems at no more than about
> > 1 TB, for manageability and recoverability.
>
> UFS is limited to 2**31 inodes, and this also limits the filesystem size.
> On Berlios we have a mixture of small and large files, and the average file
> size is 100 kB. That would still give you a limit of 200 TB, which is more
> than UFS allows.
>
> I would guess that the recommendations are rather oriented towards backup:
> backup speed and the size of the backup media.
>
> Jörg
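A quick way to try the "10k+ files in one directory" experiment suggested above, as a Python sketch; it builds the files in a temporary directory and times a single sorted listing:

    import os, shutil, tempfile, time

    d = tempfile.mkdtemp()
    for i in range(10_000):
        open(os.path.join(d, f"file{i:05d}"), "w").close()   # 10,000 empty files

    start = time.time()
    names = sorted(os.listdir(d))        # roughly the work 'ls' has to do
    print(f"listed {len(names)} entries in {time.time() - start:.3f} s")

    shutil.rmtree(d)                     # clean up the test directory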
On 2007-04-28, Brian Hechinger <wonko at 4amlunch.net> wrote:
>
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

TSM can do file-level backup of any normal POSIX file system by specifying "VirtualMountPoint /directory". You just won't get the extra features like extended attributes and ACLs covered.

-jf
On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> On Sat, Apr 28, 2007 at 05:02:47PM +0100, Peter Tribble wrote:
> >
> > In practical terms, backing up much over a terabyte
> > in a single chunk isn't ideal. What I would like to see
> > here is more flexibility from something like Legato
> > in terms of defining schedules that would allow us to
> > back this up sensibly. (Basically, the changes are
> > relatively small, so it would be nice to use quarterly
> > schedules - Legato only really does weekly or monthly.)
>
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

Educate me. In what way would TSM help? The only real issue I have with Legato (or NetBackup) is the inability to define schedules just the way I want. (The workaround is to set up a monthly schedule and then manually override some of the months.)

Thanks,

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

I have a 26 TB pool that will be upgraded to 39 TB in the next couple of months. This is the back end for backup images. The ease of managing this sort of expanding storage is a little bit of wonderful. I remember the pain of managing 10 TB of A5200s back in 2000, and this is a welcome sight.