Adam N. Copeland
2008-Oct-21 13:00 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
We're using a rather large (3.8TB) ZFS volume for our mailstores on a JMS setup. Does anybody have any tips for tuning ZFS for JMS? I'm looking for even the most obvious tips, as I am a bit of a novice.

Thanks,
Adam
Robert Milkowski
2008-Oct-21 23:28 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
Hello Adam,

Tuesday, October 21, 2008, 2:00:46 PM, you wrote:

ANC> We're using a rather large (3.8TB) ZFS volume for our mailstores on a
ANC> JMS setup. Does anybody have any tips for tuning ZFS for JMS? I'm
ANC> looking for even the most obvious tips, as I am a bit of a novice. Thanks,

Well, it's kind of a broad topic and it depends on the specific environment. So do not tune for the sake of tuning - try to understand your problem first. Nevertheless, you should consider things like (in random order):

1. RAID level - you will probably end up with relatively small random I/Os, so generally avoid RAID-Z. Of course it could be that RAID-Z in your environment is perfectly fine.

2. Depending on your workload and disk subsystem, a ZFS slog on SSD could help to improve performance.

3. Disable atime updates on the ZFS file system.

4. Enabling compression like lzjb could in theory help - it depends on how well your data compresses, how much CPU you have left, and whether you are mostly I/O bound.

5. ZFS recordsize - probably not, as in most cases when you read anything from an email you will probably read the entire mail anyway. Nevertheless, it could easily be checked with dtrace.

6. IIRC JMS keeps an index/db file per mailbox - so just maybe an L2ARC on a large SSD would help, assuming it would nicely cache these files. This would need to be simulated/tested.

7. Disabling vdev pre-fetching in ZFS could help - see the ZFS Evil Tuning Guide.

Except for #3 and maybe #7, first identify what your problem is and what you are trying to fix.

--
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
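For reference, items #2 through #6 above correspond to standard zfs(1M) properties and zpool(1M) vdev types. A minimal sketch of the relevant commands, assuming a hypothetical pool and filesystem named mailpool/store and hypothetical device names (c1t9d0, c1t10d0); adjust to the real layout and verify that the installed zpool version supports log and cache vdevs:

    # 3. disable atime updates on the mailstore filesystem
    zfs set atime=off mailpool/store

    # 4. enable lightweight lzjb compression (only worth it if the data compresses and CPU is spare)
    zfs set compression=lzjb mailpool/store

    # 5. inspect recordsize; only change it if measurement suggests a better value
    zfs get recordsize mailpool/store

    # 2. add a separate intent log (slog) on an SSD
    zpool add mailpool log c1t9d0

    # 6. add an L2ARC cache device on a large SSD
    zpool add mailpool cache c1t10d0

Note that atime=off takes effect immediately, while compression and recordsize changes only apply to data written after the property is set.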
Richard Elling
2008-Oct-22 20:56 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
As it happens, I'm currently involved with a project doing some performance analysis for this... but it is currently a WIP. Comments below.

Robert Milkowski wrote:
> 1. RAID level - you will probably end up with relatively small random
> I/Os, so generally avoid RAID-Z. Of course it could be that RAID-Z in
> your environment is perfectly fine.

There are some write latency-sensitive areas that will begin to cause consternation for large loads. Storage tuning is very important in this space. In our case, we're using an ST6540 array which has a decent write cache and fast back-end.

> 2. Depending on your workload and disk subsystem, a ZFS slog on SSD
> could help to improve performance.

My experiments show that this is not the main performance issue for large message volumes.

> 3. Disable atime updates on the ZFS file system.

Agree. JMS doesn't use it, so it just means extra work.

> 4. Enabling compression like lzjb could in theory help - it depends on
> how well your data compresses, how much CPU you have left, and whether
> you are mostly I/O bound.

We have not experimented with this yet, but know that some of the latency-sensitive writes are files with a small number of bytes, which will not compress to be less than one disk block. [opportunities for cleverness are here :-)]

There may be a benefit for the message body, but in my tests we are not concentrating on that at this time.

> 5. ZFS recordsize - probably not, as in most cases when you read
> anything from an email you will probably read the entire mail anyway.
> Nevertheless, it could easily be checked with dtrace.

This does not seem to be an issue.

> 6. IIRC JMS keeps an index/db file per mailbox - so just maybe an L2ARC
> on a large SSD would help, assuming it would nicely cache these files.
> This would need to be simulated/tested.

This does not seem to be an issue, but in our testing the message stores have plenty of memory, and hence, ARC size is on the order of tens of GBytes.

> 7. Disabling vdev pre-fetching in ZFS could help - see the ZFS Evil
> Tuning Guide.

My experiments showed no benefit from disabling pre-fetch. However, there are multiple layers of pre-fetching at play when you are using an array, and we haven't done a complete analysis on this yet. It is clear that we are not bandwidth limited, so prefetching may not hurt.

> Except for #3 and maybe #7, first identify what your problem is and
> what you are trying to fix.

Yep.
-- richard
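On the dtrace point (#5): one quick, non-authoritative way to see whether the workload really issues small random I/Os is to aggregate the size of physical I/Os with the io provider (run as root; press Ctrl-C to print the histogram):

    dtrace -n 'io:::start { @["I/O size (bytes)"] = quantize(args[0]->b_bcount); }'

If the distribution is dominated by small transfers, RAID-Z and a large recordsize are more likely to hurt; if whole messages are read in one or two large I/Os, the defaults are probably fine.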
Adam N. Copeland
2008-Oct-24 18:54 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
Thanks for the replies.

It appears the problem is that we are I/O bound. We have our SAN guy looking into possibly moving us to faster spindles. In the meantime, I wanted to implement whatever was possible to give us breathing room. Turning off atime certainly helped, but we are definitely not completely out of the drink yet.

I also found that disabling the ZFS cache flush as per the Evil Tuning Guide was a huge boon, considering we're on a battery-backed (non-Sun) SAN.

Thanks,
Adam
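For concreteness, the cache flush tuning Adam refers to is the zfs_nocacheflush tunable from the Evil Tuning Guide. A sketch of both forms, on the assumption that the array cache really is non-volatile; on plain disks this risks losing the last transactions on power failure:

    # live, until the next reboot
    echo zfs_nocacheflush/W0t1 | mdb -kw

    # persistent, in /etc/system (takes effect at the next boot)
    set zfs:zfs_nocacheflush = 1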
Torrey McMahon
2008-Oct-24 18:57 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
You may want to ask your SAN vendor if they have a setting you can make to no-op the cache flush. That way you don't have to worry about the flush behavior if you change/add different arrays.

Adam N. Copeland wrote:
> I also found that disabling the ZFS cache flush as per the Evil Tuning
> Guide was a huge boon, considering we're on a battery-backed (non-Sun) SAN.
Richard Elling
2008-Oct-24 19:10 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
Adam N. Copeland wrote:
> I also found that disabling the ZFS cache flush as per the Evil Tuning
> Guide was a huge boon, considering we're on a battery-backed (non-Sun) SAN.

Really? Which OS version are you on? This should have been fixed in Solaris 10 5/08 (it is a fix in the [s]sd driver). Caveat: there may be some devices which do not properly negotiate the SYNC_NV bit. In my tests, using Solaris 10 5/08, disabling the cache flush made zero difference.
-- richard
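For anyone checking whether they already have that fix, the installed update is visible in the release file:

    cat /etc/release

Solaris 10 5/08 is update 5, the release Richard mentions as carrying the [s]sd change.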
Torrey McMahon
2008-Oct-24 19:34 UTC
[zfs-discuss] Tuning ZFS for Sun Java Messaging Server
Richard Elling wrote:
> Really? Which OS version are you on? This should have been fixed in
> Solaris 10 5/08 (it is a fix in the [s]sd driver). Caveat: there may be
> some devices which do not properly negotiate the SYNC_NV bit. In my
> tests, using Solaris 10 5/08, disabling the cache flush made zero
> difference.

PSARC 2007/053

If I read through the code correctly... if the array doesn't respond to the device inquiry, you haven't made an entry in sd.conf for the array, or it isn't hard-coded in the sd.c table - I think there are only two in that state - then you'd have to disable the cache flush.
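Purely as an illustration of the sd.conf route: the vendor/product string below is made up and must match the array's INQUIRY data exactly (padded to 8 and 16 characters), and the property syntax should be checked against the sd(7D) documentation for the Solaris update actually installed:

    # /kernel/drv/sd.conf - illustrative entry only
    sd-config-list = "VENDOR  PRODUCT-ID      ", "cache-nonvolatile:true";

The intent of such an entry is that the sd driver treats the LUN's write cache as non-volatile, so ZFS's cache-flush requests stop hurting on that device without resorting to the pool-wide zfs_nocacheflush setting.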