Hello,

I have several ~12TB storage servers using Solaris with ZFS. Two of them
have recently developed performance issues where the majority of time in an
spa_sync() will be spent in the space_map_*() functions. During this time,
"zpool iostat" will show 0 writes to disk, while it does hundreds or
thousands of small (~3KB) reads each second, presumably reading space map
data from disk to find places to put the new blocks. The result is that it
can take several minutes for an spa_sync() to complete, even if I'm only
writing a single 128KB block.

Using DTrace, I can see that space_map_alloc() frequently returns -1 for
128KB blocks. From my understanding of the ZFS code, that means that one or
more metaslabs has no 128KB blocks available. Because of that, it seems to
be spending a lot of time going through different space maps which aren't
able to all be cached in RAM at the same time, thus causing bad performance
as it has to read from the disks. The on-disk space map size seems to be
about 500MB.

I assume the simple solution is to leave enough free space available so
that the space map functions don't have to hunt around so much. This
problem starts happening when there's about 1TB free out of the 12TB. It
seems like such a shame to waste that much space, so if anyone has any
suggestions, I'd be glad to hear them.

1) Is there anything I can do to temporarily fix the servers that are
having this problem? They are production servers, and I have customers
complaining, so a temporary fix is needed.

2) Is there any sort of tuning I can do with future servers to prevent this
from becoming a problem? Perhaps a way to make sure all the space maps are
always in RAM?

3) I set recordsize=32K and turned off compression, thinking that should
fix the performance problem for now. However, using a DTrace script to
watch calls to space_map_alloc(), I see that it's still looking for 128KB
blocks (!!!) for reasons that are unclear to me, thus it hasn't helped the
problem.

Thanks,
Scott


This message posted from opensolaris.org
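A minimal DTrace sketch of the kind of probe described above (not Scott's
actual script): it tallies space_map_alloc() requests by size and counts
how many of them fail by returning -1. It assumes the fbt provider can see
space_map_alloc() in the zfs module on this build.

    #!/usr/sbin/dtrace -qs

    /*
     * Sketch only, not the script from this thread: count
     * space_map_alloc() requests by size and how many of them fail
     * (return -1).  Assumes the fbt provider can probe
     * space_map_alloc() in the zfs module on this build.
     */

    fbt:zfs:space_map_alloc:entry
    {
            self->sz = arg1;               /* requested allocation size */
    }

    fbt:zfs:space_map_alloc:return
    /self->sz/
    {
            @reqs[self->sz] = count();     /* all requests of this size */
    }

    fbt:zfs:space_map_alloc:return
    /self->sz && (int64_t)arg1 == -1/
    {
            @fail[self->sz] = count();     /* requests that found no segment */
    }

    fbt:zfs:space_map_alloc:return
    {
            self->sz = 0;
    }

    END
    {
            printf("space_map_alloc() requests by size:\n");
            printa("  %10d bytes: %@d\n", @reqs);
            printf("failed (returned -1) by size:\n");
            printa("  %10d bytes: %@d\n", @fail);
    }

Run while the pool is under write load and stop with Ctrl-C after a few
sync passes; a large failure count for 131072-byte requests would match the
behaviour described above.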
Sanjeev Bagewadi
2008-Jun-10 04:13 UTC
[zfs-discuss] ZFS space map causing slow performance
Scott,

This looks more like bug 6596237 "Stop looking and start ganging"
<http://monaco.sfbay/detail.jsf?cr=6596237>.

What version of Solaris are the production servers running (S10 or
OpenSolaris)?

Thanks and regards,
Sanjeev.

Scott wrote:
> Hello,
>
> I have several ~12TB storage servers using Solaris with ZFS. Two of them
> have recently developed performance issues where the majority of time in
> an spa_sync() will be spent in the space_map_*() functions.
> [...]

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
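While waiting for the build details, the multi-minute sync passes being
diagnosed here could be measured directly. The sketch below is an
illustration, not something posted in the thread; it assumes the fbt
provider can probe spa_sync() in the zfs module.

    #!/usr/sbin/dtrace -qs

    /*
     * Sketch only: histogram of spa_sync() duration in milliseconds,
     * printed once a minute.  Assumes fbt can probe spa_sync() in the
     * zfs module on this build.
     */

    fbt:zfs:spa_sync:entry
    {
            self->ts = timestamp;
    }

    fbt:zfs:spa_sync:return
    /self->ts/
    {
            @ms["spa_sync duration (ms)"] =
                quantize((timestamp - self->ts) / 1000000);
            self->ts = 0;
    }

    tick-60s
    {
            printa(@ms);
            clear(@ms);
    }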
> Scott,
>
> This looks more like bug 6596237 "Stop looking and start ganging"
> <http://monaco.sfbay/detail.jsf?cr=6596237>.
>
> What version of Solaris are the production servers running (S10 or
> OpenSolaris)?
>
> Thanks and regards,
> Sanjeev.

Hi Sanjeev,

Thanks for the reply. These servers are running SXCE b86. The same problem
happens with b87, but I downgraded to take the new write throttling code
out of the equation.

Interestingly enough, I never saw this problem when these servers were
running b70. It may just be a coincidence, but I noticed the problem
starting within days of upgrading from b70, and it was already too late to
downgrade due to ZFS versioning.

-Scott


This message posted from opensolaris.org
Victor Latushkin
2008-Jun-10 20:56 UTC
[zfs-discuss] ZFS space map causing slow performance
Scott wrote:
> Hello,
>
> I have several ~12TB storage servers using Solaris with ZFS. Two of
> them have recently developed performance issues where the majority of
> time in an spa_sync() will be spent in the space_map_*() functions.
> During this time, "zpool iostat" will show 0 writes to disk, while it
> does hundreds or thousands of small (~3KB) reads each second,
> presumably reading space map data from disk to find places to put the
> new blocks. The result is that it can take several minutes for an
> spa_sync() to complete, even if I'm only writing a single 128KB
> block.
>
> Using DTrace, I can see that space_map_alloc() frequently returns -1
> for 128KB blocks. From my understanding of the ZFS code, that means
> that one or more metaslabs has no 128KB blocks available. Because of
> that, it seems to be spending a lot of time going through different
> space maps which aren't able to all be cached in RAM at the same
> time, thus causing bad performance as it has to read from the disks.
> The on-disk space map size seems to be about 500MB.

This indeed sounds like ZFS is trying to find bigger chunks of properly
aligned free space and failing to find them.

> I assume the simple solution is to leave enough free space available
> so that the space map functions don't have to hunt around so much.
> This problem starts happening when there's about 1TB free out of the
> 12TB. It seems like such a shame to waste that much space, so if
> anyone has any suggestions, I'd be glad to hear them.

Although the fix for "6596237 Stop looking and start ganging" suggested by
Sanjeev will provide some relief here, you are running your pool at about
92% capacity, so it may be time to consider expanding it.

> 1) Is there anything I can do to temporarily fix the servers that are
> having this problem? They are production servers, and I have
> customers complaining, so a temporary fix is needed.

Setting the ZFS recordsize to something smaller than the default 128K may
help, but only temporarily.

> 2) Is there any sort of tuning I can do with future servers to
> prevent this from becoming a problem? Perhaps a way to make sure all
> the space maps are always in RAM?

The fix for 6596237 will improve performance in such cases, so you will
probably want to make sure it is installed once it becomes available. The
ability to defragment a pool could be useful as well.

> 3) I set recordsize=32K and turned off compression, thinking that
> should fix the performance problem for now. However, using a DTrace
> script to watch calls to space_map_alloc(), I see that it's still
> looking for 128KB blocks (!!!) for reasons that are unclear to me,
> thus it hasn't helped the problem.

Changing the recordsize only affects the block sizes ZFS uses for data
blocks; it may still need bigger blocks for metadata. DTrace may help you
understand what is causing ZFS to try to allocate bigger blocks; for
example, larger blocks may still be used for the ZIL.

Wbr,
victor
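Along the lines of Victor's last suggestion, a sketch like the following
(an illustration, not a script from the thread) could show which code paths
are still asking for 128KB allocations; it assumes fbt can probe
space_map_alloc() in the zfs module on this build.

    #!/usr/sbin/dtrace -qs

    /*
     * Sketch only: aggregate kernel stacks for 128KB (0x20000-byte)
     * space_map_alloc() requests, to see whether they come from data,
     * metadata, or ZIL code paths.  Assumes fbt can probe
     * space_map_alloc() in the zfs module on this build.
     */

    fbt:zfs:space_map_alloc:entry
    /arg1 == 0x20000/
    {
            @paths[stack()] = count();
    }

    END
    {
            trunc(@paths, 10);      /* keep the ten most common call paths */
            printa(@paths);
    }

Stacks that pass through ZIL or metadata allocation routines rather than
the regular file-write path would explain why 128KB requests persist even
with recordsize=32K.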