Gertjan Oude Lohuis
2011-May-31 10:08 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
"Filesystem are cheap" is one of ZFS''s mottos. I''m wondering how far this goes. Does anyone have experience with having more than 10.000 ZFS filesystems? I know that mounting this many filesystems during boot while take considerable time. Are there any other disadvantages that I should be aware of? Are zfs-tools still usable, like ''zfs list'', ''zfs get/set''. Would I run into any problems when snapshots are taken (almost) simultaneously from multiple filesystems at once? Regards, Gertjan Oude Lohuis
The adage that I adhere to with ZFS features is "just because you can
doesn't mean you should!". I would suspect that with that many filesystems
the normal zfs tools would also take an inordinate length of time to
complete their operations, scaling with the number of filesystems.
Generally snapshots are quick operations, but 10,000 of them would, I
believe, take long enough to complete to present operational issues;
would breaking these into sets alleviate some of that? Perhaps if you are
starting to run into many thousands of filesystems you need to re-examine
your rationale for creating so many.

My 2c. YMMV.

--
Khush

On Tuesday, 31 May 2011 at 11:08, Gertjan Oude Lohuis wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?
> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?
>
> Regards,
> Gertjan Oude Lohuis
On Tue, May 31, 2011 at 6:08 AM, Gertjan Oude Lohuis
<gertjan at oudelohuis.nl> wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?

When we initially configured a large (20 TB) file server about 5 years
ago, we went with multiple zpools and multiple datasets (zfs) in each
zpool. Currently we have 17 zpools and about 280 datasets, nowhere near
the 10,000+ you intend. We are moving _away_ from the many-dataset model
to one zpool and one dataset, for the following reasons:

1. manageability
2. space management (we have wasted space in some pools while others
   are starved)
3. tool speed

I do not have good numbers for the time some of these operations take, as
we are down to under 200 datasets (1/3 of the way through the migration
to the new layout), but I do have log entries that point to about a
minute to complete a `zfs list` operation.

> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?

Our logs show snapshot creation times of 2 seconds or less, but we do not
try to do them all at once: we walk the list of datasets and process
(snapshot and replicate) each in turn.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
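A minimal sketch of the serial "walk the list" approach described above.
The pool name "pool01", the snapshot prefix and the host "backuphost" are
placeholders, and full sends are shown only for brevity; a real
replication job would normally send incrementals.

  STAMP=$(date +%Y%m%d%H%M)
  for DS in $(zfs list -H -o name -r pool01)
  do
      zfs snapshot "$DS@backup-$STAMP"
      # replicate the fresh snapshot to the backup host, one dataset at a time
      zfs send "$DS@backup-$STAMP" | ssh backuphost zfs receive -dF backup
  done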
On 31 May, 2011 - Khushil Dep sent me these 4,5K bytes:

> The adage that I adhere to with ZFS features is "just because you can
> doesn't mean you should!". I would suspect that with that many
> filesystems the normal zfs tools would also take an inordinate length
> of time to complete their operations.

I've done a not too scientific test on reboot times for Solaris 10 vs 11
with regard to many filesystems: quad Xeon machines with a single raid10
and one boot environment. Using more BEs with Live Upgrade on Solaris 10
makes the situation even worse, as it's LU that takes the time,
(re)mounting all filesystems over and over and over and over again.

http://www8.cs.umu.se/~stric/tmp/zfs-many.png

As the picture shows, don't try 10000 filesystems with NFS on Solaris 10.
Creating more filesystems also gets slower and slower the more you
already have.

> Generally snapshots are quick operations, but 10,000 of them would, I
> believe, take long enough to complete to present operational issues;
> would breaking these into sets alleviate some of that? Perhaps if you
> are starting to run into many thousands of filesystems you need to
> re-examine your rationale for creating so many.

On a different setup, we have about 750 datasets where we would like to
use a single recursive snapshot, but when doing that, all file access is
frozen for varying amounts of time (sometimes half an hour or way more).
Splitting it up into ~30 subsets and doing recursive snapshots over those
instead has decreased the total snapshot time greatly and cut the "frozen
time" down to single-digit seconds instead of minutes or hours.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
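A sketch of the subset approach: one recursive snapshot per direct child
of the pool instead of a single recursive snapshot of everything. The
pool name "pond" is a placeholder, and 'zfs list -d' may be missing on
older releases, in which case a hard-coded list of subsets works just
as well.

  STAMP=$(date +%Y%m%d%H%M)
  # direct children of the pool only; the grep drops the pool itself
  for SUBSET in $(zfs list -H -o name -d 1 pond | grep /)
  do
      zfs snapshot -r "$SUBSET@auto-$STAMP"    # recursive, but only over this subtree
  done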
Gertjan,

In addition to the comments directly responding to your post, we have had
similar discussions on the zfs-discuss list before. If you care to review
the list archives, there were similar threads in at least the following
periods:

  March 2006
  May 2008
  January 2010
  February 2010

There may be (and probably are) more in the archives, but I know from my
personal archives that these are good dates.

Hope this helps,

Jerry

On 05/31/11 05:08, Gertjan Oude Lohuis wrote:

> "Filesystems are cheap" is one of ZFS's mottos. I'm wondering how far
> this goes. Does anyone have experience with more than 10.000 ZFS
> filesystems? I know that mounting this many filesystems during boot
> will take considerable time. Are there any other disadvantages that I
> should be aware of? Are the zfs tools, like 'zfs list' and
> 'zfs get/set', still usable?
> Would I run into any problems when snapshots are taken (almost)
> simultaneously from multiple filesystems at once?
>
> Regards,
> Gertjan Oude Lohuis
On Tue, May 31 at 8:52, Paul Kraus wrote:

> When we initially configured a large (20 TB) file server about 5 years
> ago, we went with multiple zpools and multiple datasets (zfs) in each
> zpool. Currently we have 17 zpools and about 280 datasets, nowhere near
> the 10,000+ you intend. We are moving _away_ from the many-dataset
> model to one zpool and one dataset, for the following reasons:
>
> 1. manageability
> 2. space management (we have wasted space in some pools while others
>    are starved)
> 3. tool speed
>
> I do not have good numbers for the time some of these operations take,
> as we are down to under 200 datasets (1/3 of the way through the
> migration to the new layout), but I do have log entries that point to
> about a minute to complete a `zfs list` operation.

It would be interesting to see if you still had issues (#3) with 1 pool
and your 280 datasets. It would definitely eliminate #2.

--
Eric D. Mudama
edmudama at bounceswoosh.org
In general, you may need to keep data in one dataset if it is somehow
related (i.e. the backup of a specific machine or program, a user's home,
etc.) and if you plan to manage it in a consistent manner. For example,
CIFS shares cannot be nested, so for a unitary share (like "distribs")
you would probably want one dataset. Also, you can only have hardlinks
within one filesystem dataset, so if you manage different views into a
distribution set (i.e. sorted by vendor or sorted by software type) and
you do it with hardlinks, you need one dataset as well. If you often move
(link and unlink) files around, i.e. from an "incoming" directory to
final storage, you may or may not want that "incoming" in the same
dataset; this depends on other considerations too.

You want to split datasets when you need them to have different features
and perhaps different uses, i.e. to have them as separate shares, to
enforce separate quotas and reservations, perhaps to delegate
administration to particular OS users (i.e. let a user manage snapshots
of his own homedir) and/or local zones. Don't forget about individual
dataset properties (i.e. you may want compression for source code files
but not for a multimedia collection), snapshots and clones, etc.

> 2. space management (we have wasted space in some pools while others
>    are starved)

Well, that's a reason to decrease the number of pools, but not datasets ;)

> 3. tool speed
>
> I do not have good numbers for the time some of these operations take,
> as we are down to under 200 datasets (1/3 of the way through the
> migration to the new layout), but I do have log entries that point to
> about a minute to complete a `zfs list` operation.
>
> > Would I run into any problems when snapshots are taken (almost)
> > simultaneously from multiple filesystems at once?
>
> Our logs show snapshot creation times of 2 seconds or less, but we do
> not try to do them all at once: we walk the list of datasets and
> process (snapshot and replicate) each in turn.

I can partially relate to that. We have a Thumper system running
OpenSolaris SXCE snv_177, with a separate dataset for each user's home
directory, for backups of each individual remote machine, for each VM
image, each local zone, etc., in particular so as to have a separate
history of snapshots and the possibility to clone what we need to.

Its relatively many filesystems (about 350) are or are not a problem
depending on the tool used. For example, a typical import of the main
pool may take up to 8 minutes when in safe mode, but many of the delays
seem to be related to attempts to share_nfs and share_cifs while the
network is down ;)

Auto-snapshots are on, and listing them is indeed rather long:

  [root@thumper ~]# time zfs list -tall -r pond | wc -l
     56528

  real    0m18.146s
  user    0m7.360s
  sys     0m10.084s

  [root@thumper ~]# time zfs list -tvolume -r pond | wc -l
         5

  real    0m0.096s
  user    0m0.025s
  sys     0m0.073s

  [root@thumper ~]# time zfs list -tfilesystem -r pond | wc -l
       353

  real    0m0.123s
  user    0m0.052s
  sys     0m0.073s

Some operations, like listing the filesystems, SEEM slow due to the
terminal, but in fact are rather quick:

  [root@thumper ~]# time df -k | wc -l
       363

  real    0m2.104s
  user    0m0.094s
  sys     0m0.183s

However, low-level system programs may have problems with multiple
filesystems; one known troublemaker is LiveUpgrade.
Jens Elkner published a wonderful set of patches for Solaris 10 and
OpenSolaris to limit LU's interests to just the filesystems that the
admin knows to be interesting for the OS upgrade (they also fix mount
order and other known bugs of that LU software release):

* http://iws.cs.uni-magdeburg.de/~elkner/luc/lutrouble.html

True, 10000 filesystems is not something I have seen myself, so some
tools (especially legacy ones) may break at the sheer number of
mountpoints :)

One of my own tricks for cleaning snapshots, i.e. to relieve pool space
starvation quickly, is to use parallel "zfs destroy" invocations like
this (note the ampersand):

  # zfs list -t snapshot -r pond/export/home/user | grep @zfs-auto-snap | \
    awk '{print $1}' | while read Z ; do zfs destroy "$Z" & done

This may spawn several thousand processes (if called for the root
dataset), but they often complete in just 1-2 minutes instead of hours
for a one-by-one series of calls; I guess because this way many ZFS
metadata operations are requested in a small timeframe and get coalesced
into a few big writes.
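If spawning several thousand concurrent destroys is a concern, the same
trick can be batched. This is a sketch in plain POSIX shell with an
arbitrary batch size of 50 and the same placeholder dataset name; the
word splitting assumes snapshot names contain no whitespace.

  i=0
  for Z in $(zfs list -H -o name -t snapshot -r pond/export/home/user | grep '@zfs-auto-snap')
  do
      zfs destroy "$Z" &                # still parallel, so metadata writes can coalesce
      i=$((i + 1))
      [ $((i % 50)) -eq 0 ] && wait     # let each batch of 50 finish before starting more
  done
  wait                                  # catch the final partial batch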
On Tue, May 31, 2011 at 6:52 AM, Tomas Ögren <stric at acc.umu.se> wrote:

> On a different setup, we have about 750 datasets where we would like to
> use a single recursive snapshot, but when doing that, all file access
> is frozen for varying amounts of time (sometimes half an hour or way
> more). Splitting it up into ~30 subsets and doing recursive snapshots
> over those instead has decreased the total snapshot time greatly and
> cut the "frozen time" down to single-digit seconds instead of minutes
> or hours.

If you can upgrade to zpool version 27 or later, you should see much,
much less "frozen time" when doing a "zfs snapshot -r" of thousands of
filesystems.

--matt
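For reference, the pool version can be checked and raised with the stock
tools; the pool name "pond" is a placeholder here, and since an upgrade
is one-way it is worth confirming first that every system that needs to
import the pool supports the newer version.

  # zpool upgrade -v          # list the pool versions this system supports
  # zpool get version pond    # show the pool's current version
  # zpool upgrade pond        # raise the pool to the newest supported version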
Gertjan Oude Lohuis
2011-May-31 21:29 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
On 05/31/2011 03:52 PM, Tomas Ögren wrote:

> I've done a not too scientific test on reboot times for Solaris 10 vs
> 11 with regard to many filesystems...
>
> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>
> As the picture shows, don't try 10000 filesystems with NFS on
> Solaris 10. Creating more filesystems also gets slower and slower the
> more you already have.

Since all filesystems would be shared via NFS, this clearly is a no-go :).
Thanks!

> On a different setup, we have about 750 datasets where we would like to
> use a single recursive snapshot, but when doing that, all file access
> is frozen for varying amounts of time.

What version of ZFS are you using? As Matthew Ahrens said, version 27
has a fix for this.
Gertjan Oude Lohuis
2011-May-31 21:37 UTC
[zfs-discuss] Experiences with 10.000+ filesystems
On 05/31/2011 12:26 PM, Khushil Dep wrote:

> Generally snapshots are quick operations, but 10,000 of them would, I
> believe, take long enough to complete to present operational issues;
> would breaking these into sets alleviate some of that? Perhaps if you
> are starting to run into many thousands of filesystems you need to
> re-examine your rationale for creating so many.

Thanks for your feedback! My rationale is this: I have a lot of hosting
accounts which have databases. These databases need to be backed up,
preferably with mysqldump, and there needs to be historic data. I would
like to use ZFS snapshots for this. However, I have some variables that
need to be taken into account:

* Different hosting plans offer different backup schedules: every 3
  hours or every 24 hours. Backups might be kept 3 days, 14 days or
  30 days. These schedules thus need to be on separate storage,
  otherwise I can't create a matching snapshot schedule to create and
  rotate snapshots.
* Databases are hosted on multiple database servers, and are frequently
  migrated between them. I could create a ZFS filesystem for each
  server, but if a hosting account is migrated, all its backups would
  be 'lost'.

Having one filesystem for each hosting account would have solved nearly
all the disadvantages I could think of, but I don't think it is going to
work, sadly. I'll have to make some choices :).

Regards,
Gertjan Oude Lohuis
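A rough sketch of the per-account flow Gertjan describes, assuming one
dataset per hosting account. The account name, dataset layout and
snapshot prefix are all hypothetical; the prefix only exists so that a
rotation job for the matching backup plan can find and destroy expired
snapshots.

  ACCOUNT=example_account                     # hypothetical account/database name
  DATASET=backup/db/$ACCOUNT                  # hypothetical one-dataset-per-account layout
  STAMP=$(date +%Y%m%d-%H%M)

  mysqldump --single-transaction "$ACCOUNT" > "/$DATASET/dump.sql" && \
      zfs snapshot "$DATASET@plan3h-$STAMP"   # prefix encodes the account's backup plan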
On May 31, 2011, at 2:29 PM, Gertjan Oude Lohuis wrote:

> On 05/31/2011 03:52 PM, Tomas Ögren wrote:
>> I've done a not too scientific test on reboot times for Solaris 10 vs
>> 11 with regard to many filesystems...
>>
>> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>>
>> As the picture shows, don't try 10000 filesystems with NFS on
>> Solaris 10. Creating more filesystems also gets slower and slower the
>> more you already have.
>
> Since all filesystems would be shared via NFS, this clearly is a
> no-go :). Thanks!

If you search the archives, you will find that the people who tried to
do this in the past were more successful with legacy NFS export methods
than with the sharenfs property in ZFS.

-- richard
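"Legacy" export here means letting the OS share the filesystems through
dfstab instead of having ZFS share each dataset itself; a minimal sketch
with placeholder names, assuming Solaris 10 style tools:

  # zfs set sharenfs=off pond/export            # children inherit this setting
  # echo 'share -F nfs -o rw /export/home/user1' >> /etc/dfs/dfstab   # one line per exported filesystem
  # shareall                                    # (re)share everything listed in dfstab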
On 31 May, 2011 - Gertjan Oude Lohuis sent me these 0,9K bytes:

> On 05/31/2011 03:52 PM, Tomas Ögren wrote:
>> I've done a not too scientific test on reboot times for Solaris 10 vs
>> 11 with regard to many filesystems...
>>
>> http://www8.cs.umu.se/~stric/tmp/zfs-many.png
>>
>> As the picture shows, don't try 10000 filesystems with NFS on
>> Solaris 10. Creating more filesystems also gets slower and slower the
>> more you already have.
>
> Since all filesystems would be shared via NFS, this clearly is a
> no-go :). Thanks!
>
>> On a different setup, we have about 750 datasets where we would like
>> to use a single recursive snapshot, but when doing that, all file
>> access is frozen for varying amounts of time.
>
> What version of ZFS are you using? As Matthew Ahrens said, version 27
> has a fix for this.

Version 22, on Solaris 10.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se