Josh Simon
2011-Aug-01 14:29 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
Hello,

One of my coworkers was sent the following explanation from Oracle as to why one of our backup systems was scrubbing so slowly. I figured I would share it with the group.

http://wildness.espix.org/index.php?post/2011/06/09/ZFS-Fragmentation-issue-examining-the-ZIL

PS: I thought it was kind of odd that Oracle would direct us to a blog, but the post is very thorough.

Thanks,

Josh Simon
Neil Perrin
2011-Aug-01 21:16 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
In general the blog's conclusion is correct. When file systems get full there is fragmentation (this happens to all file systems), and in ZFS the pool falls back to gang blocks, built from smaller blocks, when there are not enough large free blocks.

However, the ZIL never allocates or uses gang blocks. It allocates blocks directly (outside of the zio pipeline) using zio_alloc_zil() -> metaslab_alloc(). Gang blocks are only used by the main pool when the pool transaction group (txg) commit occurs.

Solutions to the problem include:
- add a separate intent log
- add more top level devices (hopefully replicated)
- delete unused files/snapshots etc. within the pool...

Neil.
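As a rough illustration of the gang-block fallback described in the message above, here is a toy Python model. It is only a sketch under stated assumptions: the function name `allocate` and the list of `(offset, length)` free extents are hypothetical, and none of this is the real metaslab allocator. It also ignores on-disk details such as the gang header. The point is only that a request which cannot be satisfied by one contiguous free region is split into several smaller pieces, so one logical write becomes several small I/Os.

```python
def allocate(free_extents, size):
    """Toy allocator (hypothetical, not ZFS code).

    free_extents: list of (offset, length) free regions, updated in place.
    Returns a list of (offset, length) pieces covering `size`.
    One piece models a normal allocation; several pieces model a gang block.
    """
    # First choice: a single contiguous free extent that is large enough.
    for i, (off, length) in enumerate(free_extents):
        if length >= size:
            free_extents[i] = (off + size, length - size)
            return [(off, size)]

    # No extent is big enough. Give up if the pool is simply out of space.
    if sum(length for _, length in free_extents) < size:
        raise MemoryError("not enough free space")

    # Fallback: gather smaller extents until the request is covered.
    pieces = []
    remaining = size
    for i, (off, length) in enumerate(free_extents):
        if remaining == 0:
            break
        take = min(length, remaining)
        if take == 0:
            continue  # this extent is already used up
        pieces.append((off, take))
        free_extents[i] = (off + take, length - take)
        remaining -= take
    return pieces
```

With free space of `[(0, 128), (1000, 16), (2000, 16), (3000, 16)]`, a first request for 128 units is served from the single large extent, while a following request for 32 units has to be assembled from two 16-unit fragments.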
Brandon High
2011-Aug-01 22:04 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
On Mon, Aug 1, 2011 at 2:16 PM, Neil Perrin <neil.perrin at oracle.com> wrote:
> In general the blogs conclusion is correct . When file systems get full
> there is fragmentation (happens to all file systems) and for ZFS the pool
> uses gang blocks of smaller blocks when there are insufficient large blocks.

The blog doesn't mention how full the pool was. It's pretty well documented that performance takes a nosedive at a certain point.

A slow scrub is actually not related to the problems in the blog post, since there aren't many writes during (or at least caused by) a scrub.

Fragmentation is a real issue with pools that are (or have been) very full. The data gets written out in fragments and has to be read back in the same order.

If the mythical bp_rewrite code ever shows up, it will be possible to defrag a pool. But not yet.

-B

--
Brandon High : bhigh at freaks.com
Richard Elling
2011-Aug-01 22:10 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
On Aug 1, 2011, at 2:16 PM, Neil Perrin wrote:
> Solutions to the problem include:
> - add a separate intent log

Yes, I thought it was odd that someone who is familiar with Oracle databases, and their redo logs, didn't use separate intent logs.

> - add more top level devices (hopefully replicated)
> - delete unused files/snapshots etc. within the pool...

If gang activity is the root cause of the performance problem, then they must be at the edge of effective space utilization.

-- richard
Daniel Carosone
2011-Aug-01 23:27 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
On Mon, Aug 01, 2011 at 03:10:28PM -0700, Richard Elling wrote:
> If gang activity is the root cause of the performance, then they must be at the
> edge of effective space utilization.

The other thing that can cause a storm of tiny IOs is dedup, and this effect can last long after space has been freed and/or dedup turned off, until all the blocks corresponding to DDT entries are rewritten. I wonder if this was involved here.

--
Dan.
Brandon High
2011-Aug-03 19:32 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
On Mon, Aug 1, 2011 at 4:27 PM, Daniel Carosone <dan at geek.com.au> wrote:
> The other thing that can cause a storm of tiny IOs is dedup, and this
> effect can last long after space has been freed and/or dedup turned
> off, until all the blocks corresponding to DDT entries are rewritten.
> I wonder if this was involved here.

Using dedup on a pool that houses an Oracle DB is Doing It Wrong in so many ways...

-B

--
Brandon High : bhigh at freaks.com
Daniel Carosone
2011-Aug-03 22:59 UTC
[zfs-discuss] ZFS Fragmentation issue - examining the ZIL
On Wed, Aug 03, 2011 at 12:32:56PM -0700, Brandon High wrote:
> Using dedup on a pool that houses an Oracle DB is Doing It Wrong in so
> many ways...

Indeed, but alas people still Do It Wrong. In particular, when a pool is approaching full, turning on dedup might seem like an attractive proposition to someone who doesn't understand the cost. So I just wonder if they have, or had at some time in the past, enabled it.

--
Dan.