eric kustarz
2007-Apr-23 23:43 UTC
[zfs-discuss] Re: [nfs-discuss] Multi-tera, small-file filesystems
On Apr 18, 2007, at 6:44 AM, Yaniv Aknin wrote:

> Hello,
>
> I'd like to plan a storage solution for a system currently in
> production.
>
> The system's storage is based on code which writes many files to
> the file system, with overall storage needs currently around 40TB
> and expected to reach hundreds of TBs. The average file size of the
> system is ~100K, which translates to ~500 million files today, and
> billions of files in the future. This storage is accessed over NFS
> by a rack of 40 Linux blades, and is mostly read-only (99% of the
> activity is reads). While I realize calling this sub-optimal system
> design is probably an understatement, the design of the system is
> beyond my control and isn't likely to change in the near future.
>
> The system's current storage is based on 4 VxFS filesystems,
> created on SVM meta-devices, each ~10TB in size. A 2-node Sun
> Cluster serves the filesystems, 2 filesystems per node. Each of the
> filesystems undergoes growfs as more storage is made available.
> We're looking for an alternative solution, in an attempt to improve
> performance and the ability to recover from disasters (fsck on 2^42
> files isn't practical, and I'm getting pretty worried due to this
> fact - even the smallest filesystem inconsistency will leave me
> lots of useless bits).
>
> The question is - does anyone here have experience with large ZFS
> filesystems with many small files? Is it practical to base such a
> solution on a few (8) zpools, each with a single large filesystem in it?

Hey Yaniv,

Why not 1 pool? That's what we usually recommend (you can have 8
filesystems on top of the 1 pool if you need to).

eric
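For concreteness, a minimal sketch of the one-pool layout eric
describes - the device names and filesystem names here are hypothetical,
not from the original thread:

    # One pool spanning the available disks (hypothetical devices)
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

    # Eight filesystems on top of the single pool, each NFS-shared;
    # they can be mounted and tuned independently, but all draw from
    # the same pool of free space
    for i in 1 2 3 4 5 6 7 8; do
        zfs create tank/fs$i
        zfs set sharenfs=on tank/fs$i
    done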
Leon Koll
2007-Apr-24 00:24 UTC
[zfs-discuss] Re: [nfs-discuss] Multi-tera, small-file filesystems
> [...]
>
> Hey Yaniv,
>
> Why not 1 pool? That's what we usually recommend (you can have 8
> filesystems on top of the 1 pool if you need to).
>
> eric

My guess is that Yaniv assumes that 8 pools with 62.5 million files
each have a significantly smaller chance of being corrupted/causing
data loss than 1 pool with 500 million files in it.
Do you agree with this?

TIA,
-- leon
Richard Elling
2007-Apr-24 00:37 UTC
[zfs-discuss] Re: [nfs-discuss] Multi-tera, small-file filesystems
Leon Koll wrote:

> My guess is that Yaniv assumes that 8 pools with 62.5 million files
> each have a significantly smaller chance of being corrupted/causing
> data loss than 1 pool with 500 million files in it.
> Do you agree with this?

I do not agree with this statement. The probability is the same,
regardless of the number of files. By analogy, if I have 100 people
and the risk of heart attack is 0.1%/year/person, then dividing those
people into groups does not change their risk of heart attack.
 -- richard
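A back-of-the-envelope check of that, with made-up numbers: assume each
pool, independently of its size, has a 0.1%/year chance of fatal
corruption. One pool of 500 million files then loses an expected
0.001 x 500M = 500K files/year; eight pools of 62.5 million files lose
an expected 8 x 0.001 x 62.5M = 500K files/year - the same. Splitting
changes how a loss is distributed, not its expected size; it only helps
if the per-pool failure probability actually drops for smaller pools.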
Leon Koll
2007-Apr-24 00:51 UTC
[zfs-discuss] Re: Re: [nfs-discuss] Multi-tera, small-file filesystems
> Leon Koll wrote:
>
> > My guess is that Yaniv assumes that 8 pools with 62.5 million files
> > each have a significantly smaller chance of being corrupted/causing
> > data loss than 1 pool with 500 million files in it.
> > Do you agree with this?
>
> I do not agree with this statement. The probability is the same,
> regardless of the number of files. By analogy, if I have 100 people
> and the risk of heart attack is 0.1%/year/person, then dividing those
> people into groups does not change their risk of heart attack.
>  -- richard

My analogy was to put those 100 people into 8 elevators instead of one,
especially when one elevator can carry only 13 people. But if you tell
me that the risk of dealing with 500 million files in 1 pool is the
same as with 500 million files in 8 pools, I agree that my analogy is
not relevant.
Anton B. Rang
2007-Apr-24 01:27 UTC
[zfs-discuss] Re: Re: [nfs-discuss] Multi-tera, small-file filesystems
However, the MTTR is likely to be 1/8 the time....
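Presumably because each pool holds only 1/8 of the data, so a
consistency check or a restore after a failure touches 1/8 as much. A
hedged sketch of what that looks like operationally, with made-up pool
names - zpool scrub returns immediately and runs in the background, so
eight small pools can be verified concurrently:

    # Kick off a background scrub of each pool; each scrub covers
    # only ~1/8 of the total data
    for p in tank1 tank2 tank3 tank4 tank5 tank6 tank7 tank8; do
        zpool scrub $p
    done

    # Check progress and any errors found
    zpool status -v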
Gavin Maltby
2007-Apr-26 09:37 UTC
[zfs-discuss] Re: [nfs-discuss] Multi-tera, small-file filesystems
On 04/24/07 01:37, Richard Elling wrote:

> Leon Koll wrote:
>> My guess is that Yaniv assumes that 8 pools with 62.5 million files
>> each have a significantly smaller chance of being corrupted/causing
>> data loss than 1 pool with 500 million files in it.
>> Do you agree with this?
>
> I do not agree with this statement. The probability is the same,
> regardless of the number of files. By analogy, if I have 100 people
> and the risk of heart attack is 0.1%/year/person, then dividing those
> people into groups does not change their risk of heart attack.

Is that not because heart attacks in different people are (under normal
circumstances!) independent events? 8 filesystems backed by a single
pool are not independent; 8 filesystems from 8 distinct pools are a lot
more independent.

Gavin
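To make the failure domains concrete - a hypothetical sketch (device
names invented), in contrast to the single-pool layout earlier in the
thread: eight pools, each built on its own disks, say one per shelf or
controller, so losing one pool leaves the other seven untouched:

    zpool create tank1 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0
    zpool create tank2 raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0
    # ... and so on through tank8; one filesystem in each pool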