Following my previous post across several mailing lists regarding multi-tera volumes with small files on them, I'd be glad if people could share real-life numbers on large filesystems and their experience with them. I'm slowly coming to the realization that, regardless of theoretical filesystem capabilities (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise filesystem arena recommend keeping practical filesystems at no more than about 1 TB, for manageability and recoverability.

What's the maximum filesystem size you've used in a production environment? How did the experience come out?

Thanks,
 - Yaniv

(p.s. I think that for ZFS, "large zpools" is the closer analogue to the classic FS notion of "large filesystems". Would you agree?)
Yaniv Aknin <the.aknin at gmail.com> wrote:

> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.

UFS is limited to 2**31 inodes, and this also limits the filesystem size. On Berlios we have a mixture of small and large files, and the average file size is 100 kB. That would still give you a limit of 200 TB, which is more than UFS allows.

I would guess that the recommendations are rather oriented towards backup: backup speed and the size of the backup media.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
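A quick back-of-the-envelope check of the 200 TB figure above, as a small Python sketch; the 2**31 inode limit and the 100 kB average file size are the numbers quoted in the post:

    # Maximum data a UFS filesystem could hold at this average file size:
    # one file per inode, 2**31 inodes, ~100 kB per file.
    max_inodes = 2 ** 31                  # UFS inode limit, per the post
    avg_file_size = 100 * 1000            # 100 kB average, decimal units
    print(f"{max_inodes * avg_file_size / 1e12:.0f} TB")   # prints "215 TB", i.e. roughly 200 TB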
> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.
>
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD I was limited to 2 TB, but using UFS2/FreeBSD was impractical for several reasons. With vxfs, 4 TB is a practical limit: while files are being stored on the volume we take an incremental backup every night, and this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we perform a complete backup, which requires approx. 12 LTO-3 tapes. Our tape station is a Dell PV136T with 3x18 slots. Increasing a partition to 5 TB would require more tapes, and I don't have any plans on becoming a tape DJ :-)

If I did use zfs I would probably make the partitions the same size but still make the (z)pool rather large.

regards
Claus

"Claus Guttesen" <kometen at gmail.com> wrote:

> I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> several reasons. With vxfs, 4 TB is a practical limit: while files are

Could you please give some hints on these reasons? I only know that FreeBSD's UFS2 is not fully 64-bit aware.

> being stored on the volume we take an incremental backup every night, and
> this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> to 5 TB would require more tapes, and I don't have any plans on
> becoming a tape DJ :-)

What kind of backup software do you use, and how much time do a full and an incremental backup take?

Could you please explain why an incremental backup takes 16-17 tapes while a full backup only takes 12 tapes? 12 tapes of 400 GB should be OK for a level 0 backup of a 5 TB volume.

I see that the size of an incremental backup is less than 4% of the total data size on the NFS server for berlios.de. In your case, this should be less than 180 GB, which is less than one single tape. Do you use GNU tar for your incrementals?

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
> > I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> > I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> > several reasons. With vxfs, 4 TB is a practical limit: while files are
>
> Could you please give some hints on these reasons? I only know that
> FreeBSD's UFS2 is not fully 64-bit aware.

FreeBSD 5/6 can, under normal circumstances, perform a fsck in the background if the server had an unplanned restart. But before it can do so it has to make a snapshot. A snapshot would take approx. 20 min. on a 400 GB partition, and during that time the webservers could not access the nfs-server, so our visitors would see an error message. As my partitions grew, the time taken to perform a snapshot would also increase.

At that time FreeBSD did not have a volume manager (it may have now), and juggling with the number of disks in a raid-set, so I could make partitions reach the 2 TB size and still keep them almost the same size without wasting too much disk space, was a bit of a pain. Growfs would not always work, and juggling with sizes in bytes and manually editing the partition can result in errors if one is not careful. Using the veritas volume manager is more a matter of adding the LUNs to a diskgroup and then creating the partitions within that dg. Much easier. Zpool and zfs create are easier than veritas.

> > being stored on the volume we take an incremental backup every night, and
> > this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> > perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> > tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> > to 5 TB would require more tapes, and I don't have any plans on
> > becoming a tape DJ :-)
>
> What kind of backup software do you use, and how much time do a full
> and an incremental backup take?

I use Legato Networker. They have drivers for Solaris 9 on sparc for LTO-3 tapes.

> Could you please explain why an incremental backup takes 16-17 tapes while
> a full backup only takes 12 tapes?

It's because some of our users make minor changes to their files, and all these changes to individual files will be added to the incremental backups. So a file can be backed up more than once.

> 12 tapes of 400 GB should be OK for a level 0 backup of a 5 TB volume.
>
> I see that the size of an incremental backup is less than 4% of the total data
> size on the NFS server for berlios.de. In your case, this should be less than
> 180 GB, which is less than one single tape. Do you use GNU tar for your
> incrementals?

I use Legato.

-- 
regards
Claus

"Claus Guttesen" <kometen at gmail.com> wrote:

> > > I'm currently using 4 TB partitions with vxfs. When hosted on FreeBSD
> > > I was limited to 2 TB, but using UFS2/FreeBSD was impractical for
> > > several reasons. With vxfs, 4 TB is a practical limit: while files are
> >
> > Could you please give some hints on these reasons? I only know that
> > FreeBSD's UFS2 is not fully 64-bit aware.
>
> FreeBSD 5/6 can, under normal circumstances, perform a fsck in the
> background if the server had an unplanned restart. But before it can
> do so it has to make a snapshot. A snapshot would take approx. 20 min.
> on a 400 GB partition, and during that time the webservers could not
> access the nfs-server, so our visitors would see an error message.
> As my partitions grew, the time taken to perform a snapshot would
> also increase.

This leads to an interesting question. I observe a time of 2-3 minutes to create a snapshot on a 500 GB Solaris UFS partition. It would be of interest to me to see other data on snapshot times on UFS or ZFS.

> > > being stored on the volume we take an incremental backup every night, and
> > > this requires approx. 16-17 LTO-3 tapes. When a partition is filled up we
> > > perform a complete backup, which requires approx. 12 LTO-3 tapes. Our
> > > tape station is a Dell PV136T with 3x18 slots. Increasing a partition
> > > to 5 TB would require more tapes, and I don't have any plans on
> > > becoming a tape DJ :-)
> >
> > What kind of backup software do you use, and how much time do a full
> > and an incremental backup take?
>
> I use Legato Networker. They have drivers for Solaris 9 on sparc for
> LTO-3 tapes.
>
> > Could you please explain why an incremental backup takes 16-17 tapes while
> > a full backup only takes 12 tapes?
>
> It's because some of our users make minor changes to their files, and all
> these changes to individual files will be added to the incremental backups.
> So a file can be backed up more than once.

I don't understand this: if you only create one incremental per day, I would assume that every changed file appears exactly once in an incremental.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On 4/28/07, Yaniv Aknin <the.aknin at gmail.com> wrote:
> Following my previous post across several mailing lists regarding multi-tera
> volumes with small files on them, I'd be glad if people could share real-life
> numbers on large filesystems and their experience with them. I'm slowly coming
> to the realization that, regardless of theoretical filesystem capabilities
> (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> filesystem arena recommend keeping practical filesystems at no more than about
> 1 TB, for manageability and recoverability.
>
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

Works fine. As an example:

Filesystem             size   used  avail capacity  Mounted on
xx                      14T   116K   2.1T     1%    /xx
xx/peter                14T    34G   2.1T     2%    /xx/peter
xx/aa                   14T   1.2T   2.1T    37%    /xx/aa
xx/tank                 14T   1.5T   2.1T    41%    /xx/tank
xx/tank-archives        14T   3.5T   2.1T    63%    /xx/tank-archives
xx/rework               14T   6.8G   2.1T     1%    /xx/rework
xx/bb                   14T    61G   2.1T     3%    /xx/bb
xx/foo                  14T    73K   2.1T     1%    /xx/foo
xx/foo/aug06            14T   771K   2.1T     1%    /xx/foo/aug06
xx/foo/cc               14T    55G   2.1T     3%    /xx/foo/cc
xx/foo/mm               14T   2.2G   2.1T     1%    /xx/foo/mm
xx/foo/dd               14T    47M   2.1T     1%    /xx/foo/dd
xx/foo/rr               14T   1.4G   2.1T     1%    /xx/foo/rr
xx/foo/tf               14T   1.6G   2.1T     1%    /xx/foo/tf
xx/ee                   14T   1.3T   2.1T    38%    /xx/ee
xx/aa-fe                14T   274G   2.1T    12%    /xx/aa-fe
xx/vv                   14T    68G   2.1T     4%    /xx/vv
xx/nn                   14T    28G   2.1T     2%    /xx/nn
xx/mm                   14T   4.2G   2.1T     1%    /xx/mm
xx/rr                   14T   3.1G   2.1T     1%    /xx/rr
xx/ss                   14T    48G   2.1T     3%    /xx/ss
xx/ff                   14T   305G   2.1T    13%    /xx/ff
xx/gg-jj                14T   570G   2.1T    21%    /xx/gg-jj
xx/gg                   14T   882G   2.1T    29%    /xx/gg
xx/aa-tn                14T    35G   2.1T     2%    /xx/aa-tn
xx/pp                   14T   234K   2.1T     1%    /xx/pp
xx/ee-tt                14T   256K   2.1T     1%    /xx/ee-tt
xx/tank-r4              14T   2.0T   2.1T    50%    /xx/tank-r4
xx/tank-r1-clone        14T   3.3T   2.1T    61%    /xx/tank-r1-clone
xx/rdce                 14T    91G   2.1T     5%    /xx/rdce

That's a fair spread of sizes. Each filesystem in this case represents a single dataset, so it's hard to make them any smaller. (Until recently, some of the larger datasets were spread across multiple ufs filesystems and merged back together using an automount map. At least zfs has saved me from that nightmare.) Many of these filesystems have millions of files - the most is over 11 million at an average of 115k each, although one has 8 million at 4k each.

In practical terms, backing up much over a terabyte in a single chunk isn't ideal. What I would like to see here is more flexibility from something like Legato in terms of defining schedules that would allow us to back this up sensibly. (Basically, the changes are relatively small, so it would be nice to use quarterly schedules - Legato only really does weekly or monthly.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
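A rough cross-check of those file-count figures against the listing, as a Python sketch ("115k" and "4k" are read as average file sizes in kilobytes):

    datasets = [
        ("11M files at ~115k each", 11_000_000, 115_000),
        ("8M files at ~4k each",     8_000_000,   4_000),
    ]
    for label, count, avg_bytes in datasets:
        print(f"{label}: ~{count * avg_bytes / 1e9:.0f} GB")
    # ~1265 GB and ~32 GB; the first is in line with the 1.2-1.5 TB datasets
    # shown in the df output above.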
On Sat, Apr 28, 2007 at 05:02:47PM +0100, Peter Tribble wrote:
>
> In practical terms, backing up much over a terabyte
> in a single chunk isn't ideal. What I would like to see
> here is more flexibility from something like Legato
> in terms of defining schedules that would allow us to
> back this up sensibly. (Basically, the changes are
> relatively small, so it would be nice to use quarterly
> schedules - Legato only really does weekly or monthly.)

So what you *really* want is TSM. I wonder if IBM would ever consider supporting ZFS.

-brian
-- 
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke
On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

Just wondering, does Sun/STK have something similar to TSM??

Rayson
Sun's equivalent is SAMFS, especially the latest version (4.6), which can be used entirely for backup/restore/archive. SAMFS will be open source very soon.

s.

On 4/28/07, Rayson Ho <rayrayson at gmail.com> wrote:
> On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> > So what you *really* want is TSM. I wonder if IBM would ever
> > consider supporting ZFS.
>
> Just wondering, does Sun/STK have something similar to TSM??
>
> Rayson
Erblichs <erblichs at earthlink.net> wrote:

> Jorg,
>
> Do you really think that ANY FS actually needs to support
> more FS objects? If that were an issue, why not create
> more FSs?
>
> A multi-TB FS SHOULD support 100 MB+ to GB-sized FS objects, which
> IMO is the more common use. I have seen this a lot in video
> environments. The largest that I have personally seen is in
> excess of 64 TB.

About 12 years ago, many people were afraid of creating filesystems greater than 2 GB. Today we cannot believe this.

> I would assume that normal FS ops that search or display
> an extremely large number of FS objects are going to be
> difficult to use. Just try placing 10k+ FS objects/files within
> a directory and then listing that directory.
>
> As for backup/restore type ops, I would assume that a
> smaller granularity of specified paths/directories would be
> more common, due to user error and not disturbing other
> directories.

I am sure that in 10-15 years people will think differently than they do today. Note that in 15 years a single 2.5" disk will have a capacity of approx. 100 TB. As people will use striping and RAID technologies to optimize speed and reliability, a single data pool will then most likely be around 1000 TB. Backup media will also increase in size...

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
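A small Python sketch of what that prediction implies, assuming a 2007-era 2.5" disk of roughly 200 GB (the 200 GB starting point is an assumption, not a figure from the post):

    # Annual capacity growth needed to reach 100 TB from 200 GB in 15 years.
    start_gb, target_gb, years = 200, 100_000, 15
    rate = (target_gb / start_gb) ** (1 / years) - 1
    print(f"implied sustained capacity growth: ~{rate:.0%} per year")   # ~51%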
Jorg,

Do you really think that ANY FS actually needs to support more FS objects? If that were an issue, why not create more FSs?

A multi-TB FS SHOULD support 100 MB+ to GB-sized FS objects, which IMO is the more common use. I have seen this a lot in video environments. The largest that I have personally seen is in excess of 64 TB.

I would assume that normal FS ops that search or display an extremely large number of FS objects are going to be difficult to use. Just try placing 10k+ FS objects/files within a directory and then listing that directory.

As for backup/restore type ops, I would assume that a smaller granularity of specified paths/directories would be more common, due to user error and not disturbing other directories.

Mitchell Erblich
-----------------

Joerg Schilling wrote:
>
> Yaniv Aknin <the.aknin at gmail.com> wrote:
>
> > Following my previous post across several mailing lists regarding multi-tera
> > volumes with small files on them, I'd be glad if people could share real-life
> > numbers on large filesystems and their experience with them. I'm slowly coming
> > to the realization that, regardless of theoretical filesystem capabilities
> > (1 TB, 32 TB, 256 TB or more), people more or less across the enterprise
> > filesystem arena recommend keeping practical filesystems at no more than about
> > 1 TB, for manageability and recoverability.
>
> UFS is limited to 2**31 inodes, and this also limits the filesystem size.
> On Berlios we have a mixture of small and large files, and the average file
> size is 100 kB. That would still give you a limit of 200 TB, which is more
> than UFS allows.
>
> I would guess that the recommendations are rather oriented towards backup:
> backup speed and the size of the backup media.
>
> Jörg
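A quick way to try the "10k+ files in one directory" experiment suggested above, as a Python sketch; it builds the files in a temporary directory and times a single sorted listing:

    import os, shutil, tempfile, time

    d = tempfile.mkdtemp()
    for i in range(10_000):
        open(os.path.join(d, f"file{i:05d}"), "w").close()   # 10,000 empty files

    start = time.time()
    names = sorted(os.listdir(d))        # roughly the work 'ls' has to do
    print(f"listed {len(names)} entries in {time.time() - start:.3f} s")

    shutil.rmtree(d)                     # clean up the test directory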
On 2007-04-28, Brian Hechinger <wonko at 4amlunch.net> wrote:
>
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

TSM can do file-level backup of any normal POSIX file system by specifying "VirtualMountPoint /directory". You just won't get the extra features like extended attributes and ACLs covered.

-jf
On 4/28/07, Brian Hechinger <wonko at 4amlunch.net> wrote:
> On Sat, Apr 28, 2007 at 05:02:47PM +0100, Peter Tribble wrote:
> >
> > In practical terms, backing up much over a terabyte
> > in a single chunk isn't ideal. What I would like to see
> > here is more flexibility from something like Legato
> > in terms of defining schedules that would allow us to
> > back this up sensibly. (Basically, the changes are
> > relatively small, so it would be nice to use quarterly
> > schedules - Legato only really does weekly or monthly.)
>
> So what you *really* want is TSM. I wonder if IBM would ever
> consider supporting ZFS.

Educate me. In what way would TSM help? The only real issue I have with Legato (or NetBackup) is the inability to define schedules just the way I want. (The workaround is to set up a monthly schedule and then manually override some of the months.)

Thanks,

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
> What's the maximum filesystem size you've used in a production environment?
> How did the experience come out?

I have a 26 TB pool that will be upgraded to 39 TB in the next couple of months. This is the back end for backup images. The ease of managing this sort of expanding storage is a little bit of wonderful. I remember the pain of managing 10 TB of A5200s back in 2000, and this is a welcome sight.