Gertjan Oude Lohuis
2011-May-31 10:08 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
"Filesystem are cheap" is one of ZFS''s mottos. I''m wondering how far this goes. Does anyone have experience with having more than 10.000 ZFS filesystems? I know that mounting this many filesystems during boot while take considerable time. Are there any other disadvantages that I should be aware of? Are zfs-tools still usable, like ''zfs list'', ''zfs get/set''. Would I run into any problems when snapshots are taken (almost) simultaneously from multiple filesystems at once? Regards, Gertjan Oude Lohuis
The adage that I adhere to with ZFS features is "just because you can
doesn't mean you should!". I would suspect that with that many filesystems
the normal zfs tools would also take an inordinate length of time to
complete their operations, scaling with the number of filesystems.
Generally snapshots are quick operations, but 10,000 of them would, I
believe, take long enough to complete to present operational issues;
would breaking these into sets alleviate some of that? Perhaps if you are
starting to run into many thousands of filesystems you need to re-examine
your rationale for creating so many.

My 2c. YMMV.

--
Khush

On Tuesday, 31 May 2011 at 11:08, Gertjan Oude Lohuis wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?
> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?
>
> Regards,
> Gertjan Oude Lohuis
On Tue, May 31, 2011 at 6:08 AM, Gertjan Oude Lohuis
<gertjan at oudelohuis.nl> wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?

When we initially configured a large (20 TB) file server about 5 years
ago, we went with multiple zpools and multiple datasets (zfs) in each
zpool. Currently we have 17 zpools and about 280 datasets, nowhere near
the 10,000+ you intend. We are moving _away_ from the many-dataset model
to one zpool and one dataset, for the following reasons:

1. manageability
2. space management (we have wasted space in some pools while others
   are starved)
3. tool speed

I do not have good numbers for the time some of these operations take, as
we are down to under 200 datasets (1/3 of the way through the migration
to the new layout), but I do have log entries that point to about a
minute to complete a `zfs list` operation.

> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?

Our logs show snapshot creation times of 2 seconds or less, but we do not
try to do them all at once: we walk the list of datasets and process
(snapshot and replicate) each in turn.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
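A minimal sketch of the serial "walk the list" approach described above.
The pool name "pool01", the snapshot prefix and the host "backuphost" are
placeholders, and full sends are shown only for brevity; a real
replication job would normally send incrementals.

  STAMP=$(date +%Y%m%d%H%M)
  for DS in $(zfs list -H -o name -r pool01)
  do
      zfs snapshot "$DS@backup-$STAMP"
      # replicate the fresh snapshot to the backup host, one dataset at a time
      zfs send "$DS@backup-$STAMP" | ssh backuphost zfs receive -dF backup
  done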
On 31 May, 2011 - Khushil Dep sent me these 4,5K bytes:

> The adage that I adhere to with ZFS features is "just because you can
> doesn't mean you should!". I would suspect that with that many
> filesystems the normal zfs tools would also take an inordinate length
> of time to complete their operations.

I've done a not too scientific test on reboot times for Solaris 10 vs 11
with regard to many filesystems: quad Xeon machines with a single raid10
and one boot environment. Using more BEs with Live Upgrade on Solaris 10
makes the situation even worse, as it's LU that takes the time,
(re)mounting all filesystems over and over and over and over again.

http://www8.cs.umu.se/~stric/tmp/zfs-many.png

As the picture shows, don't try 10000 filesystems with NFS on Solaris 10.
Creating more filesystems also gets slower and slower the more you
already have.

> Generally snapshots are quick operations, but 10,000 of them would, I
> believe, take long enough to complete to present operational issues;
> would breaking these into sets alleviate some of that? Perhaps if you
> are starting to run into many thousands of filesystems you need to
> re-examine your rationale for creating so many.

On a different setup, we have about 750 datasets where we would like to
use a single recursive snapshot, but when doing that, all file access is
frozen for varying amounts of time (sometimes half an hour or way more).
Splitting it up into ~30 subsets and doing recursive snapshots over those
instead has decreased the total snapshot time greatly and cut the "frozen
time" down to single-digit seconds instead of minutes or hours.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
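A sketch of the subset approach: one recursive snapshot per direct child
of the pool instead of a single recursive snapshot of everything. The
pool name "pond" is a placeholder, and 'zfs list -d' may be missing on
older releases, in which case a hard-coded list of subsets works just
as well.

  STAMP=$(date +%Y%m%d%H%M)
  # direct children of the pool only; the grep drops the pool itself
  for SUBSET in $(zfs list -H -o name -d 1 pond | grep /)
  do
      zfs snapshot -r "$SUBSET@auto-$STAMP"    # recursive, but only over this subtree
  done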
Gertjan,

In addition to the comments directly responding to your post, we have had
similar discussions on the zfs-discuss list before. If you care to review
the list archives, there were similar threads in at least the following
periods:

  March 2006
  May 2008
  January 2010
  February 2010

There may be (and probably are) more in the archives, but I know from my
personal archives that these are good dates.

Hope this helps,

Jerry

On 05/31/11 05:08, Gertjan Oude Lohuis wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?
> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?
>
> Regards,
> Gertjan Oude Lohuis
On Tue, May 31 at 8:52, Paul Kraus wrote:

> When we initially configured a large (20 TB) file server about 5 years
> ago, we went with multiple zpools and multiple datasets (zfs) in each
> zpool. Currently we have 17 zpools and about 280 datasets, nowhere near
> the 10,000+ you intend. We are moving _away_ from the many-dataset
> model to one zpool and one dataset, for the following reasons:
>
> 1. manageability
> 2. space management (we have wasted space in some pools while others
>    are starved)
> 3. tool speed
>
> I do not have good numbers for the time some of these operations take,
> as we are down to under 200 datasets (1/3 of the way through the
> migration to the new layout), but I do have log entries that point to
> about a minute to complete a `zfs list` operation.

It would be interesting to see if you still had issues (#3) with 1 pool
and your 280 datasets. It would definitely eliminate #2.

--
Eric D. Mudama
edmudama at bounceswoosh.org
In general, you may need to keep data in one dataset if it is somehow
related (i.e. the backup of a specific machine or program, a user's home,
etc.) and if you plan to manage it in a consistent manner. For example,
CIFS shares cannot be nested, so for a unitary share (like "distribs")
you would probably want one dataset. Also, you can only have hardlinks
within one filesystem dataset, so if you manage different views into a
distribution set (i.e. sorted by vendor or sorted by software type) and
you do it with hardlinks, you need one dataset as well. If you often move
(link and unlink) files around, i.e. from an "incoming" directory to
final storage, you may or may not want that "incoming" in the same
dataset; this depends on other considerations too.

You want to split datasets when you need them to have different features
and perhaps different uses, i.e. to have them as separate shares, to
enforce separate quotas and reservations, perhaps to delegate
administration to particular OS users (i.e. let a user manage snapshots
of his own homedir) and/or local zones. Don't forget about individual
dataset properties (i.e. you may want compression for source code files
but not for a multimedia collection), snapshots and clones, etc.

> 2. space management (we have wasted space in some pools while others
>    are starved)

Well, that's a reason to decrease the number of pools, but not datasets ;)

> 3. tool speed
>
> I do not have good numbers for the time some of these operations take,
> as we are down to under 200 datasets (1/3 of the way through the
> migration to the new layout), but I do have log entries that point to
> about a minute to complete a `zfs list` operation.
>
> > Would I run into any problems when snapshots are taken (almost)
> > simultaneously from multiple filesystems at once?
>
> Our logs show snapshot creation times of 2 seconds or less, but we do
> not try to do them all at once: we walk the list of datasets and
> process (snapshot and replicate) each in turn.

I can partially relate to that. We have a Thumper system running
OpenSolaris SXCE snv_177, with a separate dataset for each user's home
directory, for backups of each individual remote machine, for each VM
image, each local zone, etc., in particular so as to have a separate
history of snapshots and the possibility to clone what we need to.

Its relatively many filesystems (about 350) are or are not a problem
depending on the tool used. For example, a typical import of the main
pool may take up to 8 minutes when in safe mode, but many of the delays
seem to be related to attempts to share_nfs and share_cifs while the
network is down ;)

Auto-snapshots are on, and listing them is indeed rather long:

  [root@thumper ~]# time zfs list -tall -r pond | wc -l
     56528

  real    0m18.146s
  user    0m7.360s
  sys     0m10.084s

  [root@thumper ~]# time zfs list -tvolume -r pond | wc -l
         5

  real    0m0.096s
  user    0m0.025s
  sys     0m0.073s

  [root@thumper ~]# time zfs list -tfilesystem -r pond | wc -l
       353

  real    0m0.123s
  user    0m0.052s
  sys     0m0.073s

Some operations, like listing the filesystems, SEEM slow due to the
terminal, but in fact are rather quick:

  [root@thumper ~]# time df -k | wc -l
       363

  real    0m2.104s
  user    0m0.094s
  sys     0m0.183s

However, low-level system programs may have problems with multiple
filesystems; one known troublemaker is LiveUpgrade.
Jens Elkner published a wonderful set of patches for Solaris 10 and
OpenSolaris to limit LU's interests to just the filesystems that the
admin knows to be interesting for the OS upgrade (they also fix mount
order and other known bugs of that LU software release):

* http://iws.cs.uni-magdeburg.de/~elkner/luc/lutrouble.html

True, 10000 filesystems is not something I have seen myself, so some
tools (especially legacy ones) may break at the sheer number of
mountpoints :)

One of my own tricks for cleaning snapshots, i.e. to relieve pool space
starvation quickly, is to use parallel "zfs destroy" invocations like
this (note the ampersand):

  # zfs list -t snapshot -r pond/export/home/user | grep @zfs-auto-snap | \
    awk '{print $1}' | while read Z ; do zfs destroy "$Z" & done

This may spawn several thousand processes (if called for the root
dataset), but they often complete in just 1-2 minutes instead of hours
for a one-by-one series of calls; I guess because this way many ZFS
metadata operations are requested in a small timeframe and get coalesced
into a few big writes.
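If spawning several thousand concurrent destroys is a concern, the same
trick can be batched. This is a sketch in plain POSIX shell with an
arbitrary batch size of 50 and the same placeholder dataset name; the
word splitting assumes snapshot names contain no whitespace.

  i=0
  for Z in $(zfs list -H -o name -t snapshot -r pond/export/home/user | grep '@zfs-auto-snap')
  do
      zfs destroy "$Z" &                # still parallel, so metadata writes can coalesce
      i=$((i + 1))
      [ $((i % 50)) -eq 0 ] && wait     # let each batch of 50 finish before starting more
  done
  wait                                  # catch the final partial batch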
On Tue, May 31, 2011 at 6:52 AM, Tomas Ögren <stric at acc.umu.se> wrote:

> On a different setup, we have about 750 datasets where we would like to
> use a single recursive snapshot, but when doing that, all file access
> is frozen for varying amounts of time (sometimes half an hour or way
> more). Splitting it up into ~30 subsets and doing recursive snapshots
> over those instead has decreased the total snapshot time greatly and
> cut the "frozen time" down to single-digit seconds instead of minutes
> or hours.

If you can upgrade to zpool version 27 or later, you should see much,
much less "frozen time" when doing a "zfs snapshot -r" of thousands of
filesystems.

--matt
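For reference, the pool version can be checked and raised with the stock
tools; the pool name "pond" is a placeholder here, and since an upgrade
is one-way it is worth confirming first that every system that needs to
import the pool supports the newer version.

  # zpool upgrade -v          # list the pool versions this system supports
  # zpool get version pond    # show the pool's current version
  # zpool upgrade pond        # raise the pool to the newest supported version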
Gertjan Oude Lohuis
2011-May-31 21:29 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
On 05/31/2011 03:52 PM, Tomas Ögren wrote:

> I've done a not too scientific test on reboot times for Solaris 10 vs
> 11 with regard to many filesystems...
>
> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>
> As the picture shows, don't try 10000 filesystems with NFS on
> Solaris 10. Creating more filesystems also gets slower and slower the
> more you already have.

Since all filesystems would be shared via NFS, this clearly is a no-go :).
Thanks!

> On a different setup, we have about 750 datasets where we would like to
> use a single recursive snapshot, but when doing that, all file access
> is frozen for varying amounts of time.

What version of ZFS are you using? As Matthew Ahrens said, version 27
has a fix for this.
Gertjan Oude Lohuis
2011-May-31 21:37 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
On 05/31/2011 12:26 PM, Khushil Dep wrote:

> Generally snapshots are quick operations, but 10,000 of them would, I
> believe, take long enough to complete to present operational issues;
> would breaking these into sets alleviate some of that? Perhaps if you
> are starting to run into many thousands of filesystems you need to
> re-examine your rationale for creating so many.

Thanks for your feedback! My rationale is this: I have a lot of hosting
accounts which have databases. These databases need to be backed up,
preferably with mysqldump, and there needs to be historic data. I would
like to use ZFS snapshots for this. However, I have some variables that
need to be taken into account:

* Different hosting plans offer different backup schedules: every 3
  hours or every 24 hours. Backups might be kept 3 days, 14 days or
  30 days. These schedules thus need to be on separate storage,
  otherwise I can't create a matching snapshot schedule to create and
  rotate snapshots.
* Databases are hosted on multiple database servers, and are frequently
  migrated between them. I could create a ZFS filesystem for each
  server, but if a hosting account is migrated, all its backups would
  be 'lost'.

Having one filesystem for each hosting account would have solved nearly
all the disadvantages I could think of, but I don't think it is going to
work, sadly. I'll have to make some choices :).

Regards,
Gertjan Oude Lohuis
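A rough sketch of the per-account flow Gertjan describes, assuming one
dataset per hosting account. The account name, dataset layout and
snapshot prefix are all hypothetical; the prefix only exists so that a
rotation job for the matching backup plan can find and destroy expired
snapshots.

  ACCOUNT=example_account                     # hypothetical account/database name
  DATASET=backup/db/$ACCOUNT                  # hypothetical one-dataset-per-account layout
  STAMP=$(date +%Y%m%d-%H%M)

  mysqldump --single-transaction "$ACCOUNT" > "/$DATASET/dump.sql" && \
      zfs snapshot "$DATASET@plan3h-$STAMP"   # prefix encodes the account's backup plan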
On May 31, 2011, at 2:29 PM, Gertjan Oude Lohuis wrote:

> On 05/31/2011 03:52 PM, Tomas Ögren wrote:
>> I've done a not too scientific test on reboot times for Solaris 10 vs
>> 11 with regard to many filesystems...
>>
>> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>>
>> As the picture shows, don't try 10000 filesystems with NFS on
>> Solaris 10. Creating more filesystems also gets slower and slower the
>> more you already have.
>
> Since all filesystems would be shared via NFS, this clearly is a
> no-go :). Thanks!

If you search the archives, you will find that the people who tried to
do this in the past were more successful with legacy NFS export methods
than with the sharenfs property in ZFS.

-- richard
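"Legacy" export here means letting the OS share the filesystems through
dfstab instead of having ZFS share each dataset itself; a minimal sketch
with placeholder names, assuming Solaris 10 style tools:

  # zfs set sharenfs=off pond/export            # children inherit this setting
  # echo 'share -F nfs -o rw /export/home/user1' >> /etc/dfs/dfstab   # one line per exported filesystem
  # shareall                                    # (re)share everything listed in dfstab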
On 31 May, 2011 - Gertjan Oude Lohuis sent me these 0,9K bytes:

> On 05/31/2011 03:52 PM, Tomas Ögren wrote:
>> I've done a not too scientific test on reboot times for Solaris 10 vs
>> 11 with regard to many filesystems...
>>
>> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>>
>> As the picture shows, don't try 10000 filesystems with NFS on
>> Solaris 10. Creating more filesystems also gets slower and slower the
>> more you already have.
>
> Since all filesystems would be shared via NFS, this clearly is a
> no-go :). Thanks!
>
>> On a different setup, we have about 750 datasets where we would like
>> to use a single recursive snapshot, but when doing that, all file
>> access is frozen for varying amounts of time.
>
> What version of ZFS are you using? As Matthew Ahrens said, version 27
> has a fix for this.

Version 22, on Solaris 10.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se