Joe Armstrong
2009-May-19 16:01 UTC
ext3 efficiency, larger vs smaller file system, lots of inodes...
(... to Nabble Ext3:Users - reposted by me after I joined the ext3-users mailing list - sorry for the dup...)

A bit of a rambling subject there, but I am trying to figure out if it is more efficient at runtime to have a few very large file systems (8 TB) vs a larger number of smaller file systems. The file systems will hold many small files.

My preference is to have a larger number of smaller file systems for faster recovery and less impact if a problem does occur, but I was wondering if anybody had information from a runtime performance perspective - is there a difference between few large and many small file systems? Is memory consumption higher for the inode tables if there are more small ones vs one really large one?

Also, does anybody have a reasonable formula for calculating memory requirements of a given file system?

Thanks. Joe
Eric Sandeen
2009-May-19 16:21 UTC
ext3 efficiency, larger vs smaller file system, lots of inodes...
Joe Armstrong wrote:
> A bit of a rambling subject there but I am trying to figure out if it
> is more efficient at runtime to have few very large file systems (8
> TB) vs a larger number of smaller file systems. The file systems
> will hold many small files.
>
> My preference is to have a larger number of smaller file systems for
> faster recovery and less impact if a problem does occur, but I was
> wondering if anybody had information from a runtime performance
> perspective - is there a difference between few large and many small
> file systems? Is memory consumption higher for the inode tables if
> there are more small ones vs one really large one?

It's the VFS that caches dentries & inodes; whether they come from multiple filesystems or one should not change matters significantly.

The other downside to multiple smaller filesystems is space management: when you wind up with half of them full and half of them empty, it may be hard to rearrange. But the extra granularity for better availability and fsck/recovery time may be well worth it. It probably depends on what your application is doing and how it can manage the space. You might want to test filling an 8T filesystem and see for yourself how long fsck will take... it'll be a while. Perhaps a very long while. :)

> Also, does anybody have a reasonable formula for calculating memory
> requirements of a given file system?

Probably the largest memory footprint will be the cached dentries & inodes, though this is a "soft" requirement since it's mostly just cached. Each journal probably has a bit of memory overhead, but I doubt it'll be a significant factor in your decision unless every byte is at a premium...

-Eric
Joe Armstrong
2009-May-19 16:28 UTC
ext3 efficiency, larger vs smaller file system, lots of inodes...
-----Original Message-----
From: Eric Sandeen [mailto:sandeen at redhat.com]
Sent: Tuesday, May 19, 2009 9:21 AM
To: Joe Armstrong
Cc: ext3-users at redhat.com
Subject: Re: ext3 efficiency, larger vs smaller file system, lots of inodes...

> It's the VFS that caches dentries & inodes; whether they come from
> multiple filesystems or one should not change matters significantly.
>
> The other downside to multiple smaller filesystems is space management:
> when you wind up with half of them full and half of them empty, it may
> be hard to rearrange. But the extra granularity for better availability
> and fsck/recovery time may be well worth it.

OK, it sounds like it is mostly a space management issue rather than a performance issue. FWIW, we were planning to handle the space management issue via LVM: allocate some medium-size volumes to start with, leave lots of spare extents unallocated, and then just grow the volume/fs as needed.

Thanks. Joe
Joe Armstrong
2009-May-19 17:08 UTC
ext3 efficiency, larger vs smaller file system, lots of inodes...
> -----Original Message-----
> From: Ric Wheeler [mailto:rwheeler at redhat.com]
> Sent: Tuesday, May 19, 2009 9:54 AM
> To: Joe Armstrong
> Cc: ext3-users at redhat.com
> Subject: Re: ext3 efficiency, larger vs smaller file system, lots of inodes...
>
> How you do this also depends on the type of storage you use. If you
> have multiple file systems on one physical disk (say 2 1TB partitions
> on a 2TB S-ATA disk), you need to be careful not to bash on both file
> systems at once since you will thrash the disk heads.
>
> In general, it is less of an issue with arrays, but still can have a
> performance impact.
>
> Ric

Just for completeness, we will be using striped LUNs (RAID-6 underneath), so I hope that the striping will distribute the I/Os while the RAID-6 device will provide the HA/recovery capabilities.

Joe
Theodore Tso
2009-May-19 17:47 UTC
ext3 efficiency, larger vs smaller file system, lots of inodes...
On Tue, May 19, 2009 at 09:01:47AM -0700, Joe Armstrong wrote:
> A bit of a rambling subject there but I am trying to figure out if
> it is more efficient at runtime to have few very large file systems
> (8 TB) vs a larger number of smaller file systems. The file systems
> will hold many small files.

No, it's not really more efficient to have large filesystems --- efficiency at least in terms of performance, that is. In fact, depending on your workload, it sometimes can be more efficient to have smaller filesystems, since the journal is a single choke-point if you have a fsync-heavy workload. Another advantage of smaller filesystems is that it's faster to fsck a particular filesystem. The disadvantages of breaking up a large filesystem are the obvious ones: you have less flexibility about space allocation, and you can't hard link across different filesystems, which can be a big deal for some folks.

> Is memory consumption higher for the inode tables if
> there are more small ones vs one really large one?

No, because we don't keep an entire filesystem inode table in memory; pieces of it are brought in as needed, and when they aren't needed they are released from memory. About the only thing which is permanently pinned into memory is the block group descriptors, which take up 32 bytes per block group descriptor, where a block group descriptor represents 32 megabytes of storage on disk. So 1 GB of filesystem will require 1k of space, and a 1TB filesystem will require 1 megabyte of memory in terms of block group descriptors.

There are some other overheads, but most of them are fixed overheads, and normally not a problem. The struct superblock data structure takes a kilobyte or so, for example. The buffer heads for the block group descriptors are 56 bytes per 4k of block group descriptors, so 1 megabyte of block group descriptors also requires 14k of buffer heads. Unless you're creating some kind of embedded NAS system, I doubt memory consumption will be a major problem for you.
- Ted
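Ted's figures can be turned into a rough back-of-the-envelope calculator. The following is a sketch using only the numbers from his post (32 bytes per block group descriptor, one descriptor per 32 MB of disk, 56 bytes of buffer heads per 4 KB of descriptor table); actual values depend on block size and filesystem features, so treat it as an order-of-magnitude estimate:

```python
def bgd_memory_bytes(fs_bytes,
                     bytes_per_group=32 * 1024**2,  # storage covered per block group (per Ted's post)
                     bgd_size=32,                   # bytes per block group descriptor
                     bh_per_4k=56):                 # buffer-head overhead per 4 KB of descriptors
    """Rough estimate of memory pinned for ext3 block group descriptors.

    Figures taken from Ted Ts'o's post above; real numbers vary with
    block size and filesystem features.
    """
    groups = -(-fs_bytes // bytes_per_group)        # ceil division: number of block groups
    bgd_bytes = groups * bgd_size                   # descriptor table itself
    bh_bytes = -(-bgd_bytes // 4096) * bh_per_4k    # buffer heads for the descriptor blocks
    return bgd_bytes + bh_bytes

for size, label in [(1 * 1024**3, "1 GB"), (1 * 1024**4, "1 TB"), (8 * 1024**4, "8 TB")]:
    print(f"{label:>5}: ~{bgd_memory_bytes(size) / 1024:.0f} KiB pinned for descriptors")
```

This reproduces Ted's numbers: 1k of descriptors for a 1 GB filesystem, and 1 MB of descriptors plus 14k of buffer heads for a 1 TB one, so even an 8 TB filesystem pins only on the order of 8 MB.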