Gary Mills
2010-Jan-12 16:46 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
I'm working with a Cyrus IMAP server running on a T2000 box under Solaris 10 10/09 with current patches. Mailboxes reside on six ZFS filesystems, each containing about 200 gigabytes of data. These are part of a single zpool built on four iSCSI devices from our NetApp filer.

One of these ZFS filesystems contains a number of global and per-user databases in addition to one sixth of the mailboxes. I'm thinking of moving these databases to a separate ZFS filesystem. Access to these databases must be quick to ensure responsiveness of the server. We are currently experiencing a slowdown in performance when the number of simultaneous IMAP sessions rises above 3000. These databases are opened and memory-mapped by all processes. They have the usual requirement for locking and synchronous writes whenever they are updated.

Is moving the databases (IMAP metadata) to a separate ZFS filesystem likely to improve performance? I've heard that this is important, but I'm not clear why this is. Does each filesystem have its own queue in the ARC or ZIL?

Here are some statistics taken while the server was busy and access was slow:

    # /usr/local/sbin/zilstat 5 5
       N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
       1126664     225332     515872   11485184    2297036    3469312    292    163     51     79
        740536     148107     250896    9535488    1907097    4005888    198    106     24     68
        758344     151668     179104   12546048    2509209    2682880    227     93     45     89
        603304     120660     204344    9179136    1835827    2084864    179     89     23     67
        948896     189779     346520   15880192    3176038    4173824    262    108     32    123

    # /usr/local/sbin/arcstat 5 5
        Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
    10:50:16  191M   31M     16   14M    8   17M   48   18M   12    30G  32G
    10:50:21    1K   148     10    76    5    72   58    78   15    30G  32G
    10:50:26    1K   154     12    88    7    65   72    96   18    30G  32G
    10:50:31   796    61      7    54    7     6   35    25    8    30G  32G
    10:50:36    1K   117      9   105    8    12   53    44   10    30G  32G

--
-Gary Mills-    -Unix Group-    -Computer and Network Services-
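As a rough aid for reading the zilstat samples above, the per-interval operation counts can be totalled by size bucket. This is only a sketch: the column positions are taken from the header shown in the post and may differ in other zilstat versions, and `zil_summary` is a made-up helper name. On Solaris 10, `nawk` or `/usr/xpg4/bin/awk` may be a safer choice than plain `awk`.

```shell
#!/bin/sh
# Sum the operation counts from zilstat output read on stdin.
# Assumed columns (from the header in the post):
#   7 = ops, 8 = <=4kB, 9 = 4-32kB, 10 = >=32kB
zil_summary() {
    awk '$1 ~ /^[0-9]+$/ {   # only data rows start with a number
             ops += $7; small += $8; medium += $9; large += $10
         }
         END {
             printf "ops=%d small=%d medium=%d large=%d\n", ops, small, medium, large
         }'
}

# Usage on the server (not run here):
#   /usr/local/sbin/zilstat 5 5 | zil_summary
```

For the five samples posted, this gives 1158 logged operations in total, of which 559 were 4 kB or smaller, 175 were between 4 and 32 kB, and 426 were 32 kB or larger.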
Bob Friesenhahn
2010-Jan-12 17:11 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Tue, 12 Jan 2010, Gary Mills wrote:
>
> Is moving the databases (IMAP metadata) to a separate ZFS filesystem
> likely to improve performance? I've heard that this is important, but
> I'm not clear why this is.

There is an obvious potential benefit in that you are then able to tune filesystem parameters to best fit the needs of the application which updates the data. For example, if the database uses a small block size, then you can set the filesystem blocksize to match. If the database uses memory mapped files, then using a filesystem blocksize which is closest to the MMU page size may improve performance.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
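A minimal sketch of what Bob's suggestion could look like in practice. In ZFS the per-filesystem block size he refers to is the `recordsize` property. The pool name `mailpool`, the dataset name `imapdb`, and the 8k value are all assumptions for illustration (the Solaris `pagesize` command reports the actual page size on a given machine). The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Print (do not run) the commands for a dedicated database dataset.
# "mailpool", "imapdb" and "8k" are hypothetical example values.
plan_db_dataset() {
    pool=$1; name=$2; recsize=$3
    echo "zfs create -o recordsize=$recsize $pool/$name"
    echo "zfs get recordsize $pool/$name"
}

plan_db_dataset mailpool imapdb 8k
```

Note that `recordsize` applies to files written after it is set, so existing database files would need to be copied into the new filesystem to pick up the new value.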
Gary Mills
2010-Jan-12 20:37 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
> On Tue, 12 Jan 2010, Gary Mills wrote:
> >
> > Is moving the databases (IMAP metadata) to a separate ZFS filesystem
> > likely to improve performance? I've heard that this is important, but
> > I'm not clear why this is.
>
> There is an obvious potential benefit in that you are then able to
> tune filesystem parameters to best fit the needs of the application
> which updates the data. For example, if the database uses a small
> block size, then you can set the filesystem blocksize to match. If
> the database uses memory mapped files, then using a filesystem
> blocksize which is closest to the MMU page size may improve
> performance.

I found a couple of references that suggest just putting the databases on their own ZFS filesystem has a great benefit. One is an e-mail message to a mailing list from Vincent Fox at UC Davis. They run a similar system to ours at that site. He says:

    Particularly the database is important to get it's own filesystem so
    that it's queue/cache are separated.

The second one is from:

    http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

He says:

    For file modification that come with some immediate data integrity
    constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
    log or ZIL.

This sounds like the ZIL queue mentioned above. Is I/O for each of those handled separately?

--
-Gary Mills-    -Unix Group-    -Computer and Network Services-
Ray Van Dolson
2010-Jan-12 20:39 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Tue, Jan 12, 2010 at 12:37:30PM -0800, Gary Mills wrote:
> On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
> > On Tue, 12 Jan 2010, Gary Mills wrote:
> > >
> > > Is moving the databases (IMAP metadata) to a separate ZFS filesystem
> > > likely to improve performance? I've heard that this is important, but
> > > I'm not clear why this is.
> >
> > There is an obvious potential benefit in that you are then able to
> > tune filesystem parameters to best fit the needs of the application
> > which updates the data. For example, if the database uses a small
> > block size, then you can set the filesystem blocksize to match. If
> > the database uses memory mapped files, then using a filesystem
> > blocksize which is closest to the MMU page size may improve
> > performance.
>
> I found a couple of references that suggest just putting the databases
> on their own ZFS filesystem has a great benefit. One is an e-mail
> message to a mailing list from Vincent Fox at UC Davis. They run a
> similar system to ours at that site. He says:
>
>     Particularly the database is important to get it's own filesystem so
>     that it's queue/cache are separated.
>
> The second one is from:
>
>     http://blogs.sun.com/roch/entry/the_dynamics_of_zfs
>
> He says:
>
>     For file modification that come with some immediate data integrity
>     constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
>     log or ZIL.
>
> This sounds like the ZIL queue mentioned above. Is I/O for each of
> those handled separately?

That's interesting... and if so, is there a way to designate a log device for a specific filesystem?

Ray
Richard Elling
2010-Jan-12 21:56 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Jan 12, 2010, at 12:37 PM, Gary Mills wrote:
> On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
>> On Tue, 12 Jan 2010, Gary Mills wrote:
>>>
>>> Is moving the databases (IMAP metadata) to a separate ZFS filesystem
>>> likely to improve performance? I've heard that this is important, but
>>> I'm not clear why this is.
>>
>> There is an obvious potential benefit in that you are then able to
>> tune filesystem parameters to best fit the needs of the application
>> which updates the data. For example, if the database uses a small
>> block size, then you can set the filesystem blocksize to match. If
>> the database uses memory mapped files, then using a filesystem
>> blocksize which is closest to the MMU page size may improve
>> performance.
>
> I found a couple of references that suggest just putting the databases
> on their own ZFS filesystem has a great benefit. One is an e-mail
> message to a mailing list from Vincent Fox at UC Davis. They run a
> similar system to ours at that site. He says:
>
>     Particularly the database is important to get it's own filesystem so
>     that it's queue/cache are separated.

Another policy you might consider is the recordsize for the database vs the message store. In general, databases like the recordsize to match. Of course, recordsize is a per-dataset parameter.

> The second one is from:
>
>     http://blogs.sun.com/roch/entry/the_dynamics_of_zfs
>
> He says:
>
>     For file modification that come with some immediate data integrity
>     constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
>     log or ZIL.
>
> This sounds like the ZIL queue mentioned above. Is I/O for each of
> those handled separately?

ZIL is for the pool.

We did some experiments with the messaging server and a RAID array with separate logs. As expected, it didn't make much difference because of the nice, large nonvolatile write cache on the array. This reinforces the notion that Dan Carosone also recently noted: performance gains for separate logs are possible when the latency of the separate log device is much lower than the latency of the devices in the main pool, and, of course, the workload uses sync writes.
 -- richard
Gary Mills
2010-Jan-13 14:21 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Tue, Jan 12, 2010 at 01:56:57PM -0800, Richard Elling wrote:
> On Jan 12, 2010, at 12:37 PM, Gary Mills wrote:
>
> > On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
> >> On Tue, 12 Jan 2010, Gary Mills wrote:
> >>>
> >>> Is moving the databases (IMAP metadata) to a separate ZFS filesystem
> >>> likely to improve performance? I've heard that this is important, but
> >>> I'm not clear why this is.
> >
> > I found a couple of references that suggest just putting the databases
> > on their own ZFS filesystem has a great benefit. One is an e-mail
> > message to a mailing list from Vincent Fox at UC Davis. They run a
> > similar system to ours at that site. He says:
> >
> >     Particularly the database is important to get it's own filesystem so
> >     that it's queue/cache are separated.
>
> Another policy you might consider is the recordsize for the
> database vs the message store. In general, databases like the
> recordsize to match. Of course, recordsize is a per-dataset
> parameter.

Unfortunately, it's not a single database. There are many of them, of different types. One is a Berkeley DB, others are something specific to the IMAP server (called skiplist), and some are small flat files that are just rewritten. All they have in common is activity and frequent locking. They can be relocated as a whole.

> > The second one is from:
> >
> >     http://blogs.sun.com/roch/entry/the_dynamics_of_zfs
> >
> > He says:
> >
> >     For file modification that come with some immediate data integrity
> >     constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
> >     log or ZIL.
> >
> > This sounds like the ZIL queue mentioned above. Is I/O for each of
> > those handled separately?
>
> ZIL is for the pool.

Yes, I understand that, but do filesystems have separate queues of any sort within the ZIL? If not, would it help to put the database filesystems into a separate zpool?

> We did some experiments with the messaging server and a RAID
> array with separate logs. As expected, it didn't make much difference
> because of the nice, large nonvolatile write cache on the array. This
> reinforces the notion that Dan Carosone also recently noted: performance
> gains for separate logs are possible when the latency of the separate
> log device is much lower than the latency of the devices in the main pool,
> and, of course, the workload uses sync writes.

It certainly sounds as if latency is the key for synchronous writes.

--
-Gary Mills-    -Unix Group-    -Computer and Network Services-
Daniel Carosone
2010-Jan-13 23:58 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Wed, Jan 13, 2010 at 08:21:13AM -0600, Gary Mills wrote:
> Yes, I understand that, but do filesystems have separate queues of any
> sort within the ZIL?

I'm not sure. If you can experiment and measure a benefit, understanding the reasons is helpful but secondary. If you can't experiment so easily, you're stuck asking questions, as now, to see whether the effort of experimenting is potentially worthwhile.

Some other things to note (not necessarily arguments for or against):

 * you can have multiple slog devices, in case you're creating
   so much ZIL traffic that ZIL queueing is a real problem, however
   shared or structured between filesystems.

 * separate filesystems can have different properties which might help
   tuning and experiments (logbias, copies, compress, *cache), as well
   as the recordsize. Maybe you will find that compress on mailboxes
   helps, as long as you're not also compressing the db's?

 * separate filesystems may have different recovery requirements
   (snapshot cycles). Note that taking snapshots is ~free, but
   keeping them and deleting them have costs over time. Perhaps you
   can save some of these costs if the db's are throwaway/rebuildable.

> If not, would it help to put the database
> filesystems into a separate zpool?

Maybe, if you have the extra devices - but you need to compare with the potential benefit of adding those devices (and their IOPS) to benefit all users of the existing pool.

For example, if the databases are a distinctly different enough load, you could compare putting them on a dedicated pool on ssd, vs using those ssd's as additional slog/l2arc. Unless you can make quite categorical separations between the workloads, such that an unbalanced configuration matches an unbalanced workload, you may still be better with consolidated IO capacity in the one pool.

Note, also, you can only take recursive atomic snapshots within the one pool - this might be important if the db's have to match the mailbox state exactly, for recovery.

--
Dan.
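The options Dan lists could be tried with commands along the following lines. This is a sketch under stated assumptions, not a recommended configuration: the pool name `mailpool`, the dataset names, the device name `c9t9d9`, and the snapshot name `nightly` are all hypothetical. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: print commands for the per-dataset options above.
# Pool, dataset, device and snapshot names are hypothetical.
POOL=mailpool

plan_tuning() {
    echo "zfs set compression=on $POOL/mailboxes"   # compress the mail store
    echo "zfs set compression=off $POOL/imapdb"     # leave the databases uncompressed
    echo "zpool add $POOL log c9t9d9"               # optional separate log (slog) device
    echo "zfs snapshot -r $POOL@nightly"            # recursive snapshot, atomic within one pool
}

plan_tuning
```

Keeping the databases and mailboxes in the same pool is what allows the single recursive snapshot on the last line to capture both in a consistent state.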
Roch
2010-Jan-14 09:47 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
Gary Mills writes:
 > On Tue, Jan 12, 2010 at 01:56:57PM -0800, Richard Elling wrote:
 > > On Jan 12, 2010, at 12:37 PM, Gary Mills wrote:
 > >
 > > > On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
 > > >> On Tue, 12 Jan 2010, Gary Mills wrote:
 > > >>>
 > > >>> Is moving the databases (IMAP metadata) to a separate ZFS filesystem
 > > >>> likely to improve performance? I've heard that this is important, but
 > > >>> I'm not clear why this is.
 > > >
 > > > I found a couple of references that suggest just putting the databases
 > > > on their own ZFS filesystem has a great benefit. One is an e-mail
 > > > message to a mailing list from Vincent Fox at UC Davis. They run a
 > > > similar system to ours at that site. He says:
 > > >
 > > >     Particularly the database is important to get it's own filesystem so
 > > >     that it's queue/cache are separated.
 > >
 > > Another policy you might consider is the recordsize for the
 > > database vs the message store. In general, databases like the
 > > recordsize to match. Of course, recordsize is a per-dataset
 > > parameter.
 >
 > Unfortunately, it's not a single database. There are many of them, of
 > different types. One is a Berkeley DB, others are something specific
 > to the IMAP server (called skiplist), and some are small flat files
 > that are just rewritten. All they have in common is activity and
 > frequent locking. They can be relocated as a whole.
 >
 > > > The second one is from:
 > > >
 > > >     http://blogs.sun.com/roch/entry/the_dynamics_of_zfs
 > > >
 > > > He says:
 > > >
 > > >     For file modification that come with some immediate data integrity
 > > >     constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
 > > >     log or ZIL.
 > > >
 > > > This sounds like the ZIL queue mentioned above. Is I/O for each of
 > > > those handled separately?
 > >
 > > ZIL is for the pool.
 >
 > Yes, I understand that, but do filesystems have separate queues of any
 > sort within the ZIL? If not, would it help to put the database
 > filesystems into a separate zpool?
 >

The slog device is for the pool but the ZIL is per filesystem/dataset. The logbias property can be used on a dataset to prevent that set from consuming the slog device resource:

    http://blogs.sun.com/roch/entry/synchronous_write_bias_property

-r

 > > We did some experiments with the messaging server and a RAID
 > > array with separate logs. As expected, it didn't make much difference
 > > because of the nice, large nonvolatile write cache on the array. This
 > > reinforces the notion that Dan Carosone also recently noted: performance
 > > gains for separate logs are possible when the latency of the separate
 > > log device is much lower than the latency of the devices in the main pool,
 > > and, of course, the workload uses sync writes.
 >
 > It certainly sounds as if latency is the key for synchronous writes.
 >
 > --
 > -Gary Mills-    -Unix Group-    -Computer and Network Services-
Gary Mills
2010-Jan-14 14:36 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Thu, Jan 14, 2010 at 10:58:48AM +1100, Daniel Carosone wrote:
> On Wed, Jan 13, 2010 at 08:21:13AM -0600, Gary Mills wrote:
> > Yes, I understand that, but do filesystems have separate queues of any
> > sort within the ZIL?
>
> I'm not sure. If you can experiment and measure a benefit,
> understanding the reasons is helpful but secondary. If you can't
> experiment so easily, you're stuck asking questions, as now, to see
> whether the effort of experimenting is potentially worthwhile.

Yes, we're stuck asking questions. I appreciate your responses.

> Some other things to note (not necessarily arguments for or against):
>
>  * you can have multiple slog devices, in case you're creating
>    so much ZIL traffic that ZIL queueing is a real problem, however
>    shared or structured between filesystems.

For the time being, I'd like to stay with the ZIL that's internal to the zpool.

>  * separate filesystems can have different properties which might help
>    tuning and experiments (logbias, copies, compress, *cache), as well
>    as the recordsize. Maybe you will find that compress on mailboxes
>    helps, as long as you're not also compressing the db's?

Yes, that's a good point in favour of a separate filesystem.

>  * separate filesystems may have different recovery requirements
>    (snapshot cycles). Note that taking snapshots is ~free, but
>    keeping them and deleting them have costs over time. Perhaps you
>    can save some of these costs if the db's are throwaway/rebuildable.

Also a good point.

> > If not, would it help to put the database
> > filesystems into a separate zpool?
>
> Maybe, if you have the extra devices - but you need to compare with
> the potential benefit of adding those devices (and their IOPS) to
> benefit all users of the existing pool.
>
> For example, if the databases are a distinctly different enough load,
> you could compare putting them on a dedicated pool on ssd, vs using
> those ssd's as additional slog/l2arc. Unless you can make quite
> categorical separations between the workloads, such that an unbalanced
> configuration matches an unbalanced workload, you may still be better
> with consolidated IO capacity in the one pool.

As well, I'd like to keep all of the ZFS pools on the same external storage device. This makes migrating to a different server quite easy.

> Note, also, you can only take recursive atomic snapshots within the
> one pool - this might be important if the db's have to match the
> mailbox state exactly, for recovery.

That's another good point. It's certainly better to have synchronized snapshots.

--
-Gary Mills-    -Unix Group-    -Computer and Network Services-
Gary Mills
2010-Jan-14 14:41 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Thu, Jan 14, 2010 at 01:47:46AM -0800, Roch wrote:
>
> Gary Mills writes:
> >
> > Yes, I understand that, but do filesystems have separate queues of any
> > sort within the ZIL? If not, would it help to put the database
> > filesystems into a separate zpool?
> >
>
> The slog device is for the pool but the ZIL is per
> filesystem/dataset. The logbias property can be used on a dataset to
> prevent that set from consuming the slog device resource:
>
>     http://blogs.sun.com/roch/entry/synchronous_write_bias_property

Ah, that's what I wanted to know. Thanks for the response.

--
-Gary Mills-    -Unix Group-    -Computer and Network Services-
Richard Elling
2010-Jan-14 16:49 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
On Jan 14, 2010, at 6:41 AM, Gary Mills wrote:
> On Thu, Jan 14, 2010 at 01:47:46AM -0800, Roch wrote:
>>
>> Gary Mills writes:
>>>
>>> Yes, I understand that, but do filesystems have separate queues of any
>>> sort within the ZIL? If not, would it help to put the database
>>> filesystems into a separate zpool?
>>>
>>
>> The slog device is for the pool but the ZIL is per
>> filesystem/dataset. The logbias property can be used on a dataset to
>> prevent that set from consuming the slog device resource:
>>
>>     http://blogs.sun.com/roch/entry/synchronous_write_bias_property
>
> Ah, that's what I wanted to know. Thanks for the response.

Roch, I think this can be misinterpreted, so perhaps more clarity is needed.

If you have sync writes, they will be written to persistent storage before they are acknowledged.

The only question is where they will be written: to the ZIL or pool?

By default, this preference is based on the size of each I/O, with small I/Os written to the ZIL and large I/Os written to the pool.

The dataset parameter logbias is used to set the ZIL vs pool preference.

Thus, one could force all datasets, save one, to use the pool and permit the one, lucky dataset to use the ZIL (or vice versa).

Separate log devices is an orthogonal issue.
 -- richard
Richard Elling
2010-Jan-14 18:27 UTC
[zfs-discuss] How do separate ZFS filesystems affect performance?
additional clarification ...

On Jan 14, 2010, at 8:49 AM, Richard Elling wrote:
> On Jan 14, 2010, at 6:41 AM, Gary Mills wrote:
>
>> On Thu, Jan 14, 2010 at 01:47:46AM -0800, Roch wrote:
>>>
>>> Gary Mills writes:
>>>>
>>>> Yes, I understand that, but do filesystems have separate queues of any
>>>> sort within the ZIL? If not, would it help to put the database
>>>> filesystems into a separate zpool?
>>>>
>>>
>>> The slog device is for the pool but the ZIL is per
>>> filesystem/dataset. The logbias property can be used on a dataset to
>>> prevent that set from consuming the slog device resource:
>>>
>>>     http://blogs.sun.com/roch/entry/synchronous_write_bias_property
>>
>> Ah, that's what I wanted to know. Thanks for the response.
>
> Roch, I think this can be misinterpreted, so perhaps more clarity is needed.
>
> If you have sync writes, they will be written to persistent storage before
> they are acknowledged.
>
> The only question is where they will be written: to the ZIL or pool?
>
> By default, this preference is based on the size of each I/O, with small
> I/Os written to the ZIL and large I/Os written to the pool.
>
> The dataset parameter logbias is used to set the ZIL vs pool preference.
>
> Thus, one could force all datasets, save one, to use the pool and permit
> the one, lucky dataset to use the ZIL (or vice versa)

Should read:

Thus, one could force all datasets, save one, to use the pool and permit the one, lucky dataset to use the ZIL (or vice versa) for large I/Os.
 -- richard

> Separate log devices is an orthogonal issue.
> -- richard
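The "all datasets, save one" arrangement described above might be set up as sketched below, assuming a ZFS release that has the `logbias` property (its two values are `latency`, the default, and `throughput`). The dataset names are hypothetical and `plan_logbias` is a made-up helper that only prints the commands.

```shell
#!/bin/sh
# Print logbias settings: the favoured dataset keeps the default
# (latency); every other dataset listed is set to throughput.
# Dataset names are hypothetical; nothing is executed.
plan_logbias() {
    favoured=$1; shift
    for ds in "$@"; do
        if [ "$ds" = "$favoured" ]; then
            echo "zfs set logbias=latency $ds"
        else
            echo "zfs set logbias=throughput $ds"
        fi
    done
}

plan_logbias mailpool/imapdb mailpool/imapdb mailpool/mail1 mailpool/mail2
```

The first argument names the favoured dataset and the remaining arguments list every dataset to configure, so the favoured one appears twice in the call above.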