thr3ads.net - zfs discuss - [zfs-discuss] ZFS Fragmentation issue

Josh Simon

2011-Aug-01 14:29 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

Hello,

One of my coworkers was sent the following explanation from Oracle as to
why one of our backup systems was conducting a scrub so slowly. I figured I
would share it with the group.

http://wildness.espix.org/index.php?post/2011/06/09/ZFS-Fragmentation-issue-examining-the-ZIL

PS: Thought it was kind of odd that Oracle would direct us to a blog, 
but the post is very thorough.

Thanks,

Josh Simon

Neil Perrin

2011-Aug-01 21:16 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

In general the blog's conclusion is correct. When file systems get full
there is fragmentation (this happens to all file systems), and for ZFS the
pool uses gang blocks made up of smaller blocks when there are not enough
large free blocks. However, the ZIL never allocates or uses gang blocks.
It directly allocates blocks (outside of the zio pipeline) using
zio_alloc_zil() -> metaslab_alloc(). Gang blocks are only used by the
main pool when the pool transaction group (txg) commit occurs. Solutions
to the problem include:
    - add a separate intent log
    - add more top-level devices (hopefully replicated)
    - delete unused files/snapshots, etc., within the pool

Neil.
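To make the gang-block behaviour described above concrete, here is a toy Python sketch. It is a hypothetical model, not ZFS code: free space is a plain list of (offset, length) segments, allocation is simple first-fit, and the names alloc_contiguous, alloc_block, min_piece and allow_gang are made up for illustration. Real metaslab allocation (space maps, weighting, gang headers) is considerably more involved.

```python
from typing import List, Optional, Tuple

# Hypothetical toy model, not ZFS code.
Segment = Tuple[int, int]  # (offset, length) of a free extent, arbitrary units


def alloc_contiguous(free: List[Segment], size: int) -> Optional[int]:
    """First-fit: carve `size` units out of the first free segment big enough."""
    for i, (off, length) in enumerate(free):
        if length >= size:
            if length == size:
                free.pop(i)
            else:
                free[i] = (off + size, length - size)
            return off
    return None


def alloc_block(free: List[Segment], size: int, min_piece: int = 1,
                allow_gang: bool = True) -> Optional[List[Segment]]:
    """Return the extents backing one logical block of `size` units, or None.

    One extent is an ordinary block. Several extents stand in for a gang
    block: the request was stitched together from smaller pieces because no
    single free segment was large enough. allow_gang=False stands in for an
    allocation path that never gangs (as described above for the ZIL).
    """
    if size < 1 or min_piece < 1:
        raise ValueError("size and min_piece must be positive")
    off = alloc_contiguous(free, size)
    if off is not None:
        return [(off, size)]
    if not allow_gang:
        return None
    pieces: List[Segment] = []
    remaining = size
    while remaining > 0:
        largest = max((length for _, length in free), default=0)
        piece = min(remaining, largest)
        if piece < min(min_piece, remaining):
            # Too little usable space even for a gang: hand back the pieces
            # taken so far (without coalescing) and give up.
            free.extend(pieces)
            free.sort()
            return None
        p_off = alloc_contiguous(free, piece)
        assert p_off is not None  # largest >= piece, so first-fit succeeds
        pieces.append((p_off, piece))
        remaining -= piece
    return pieces


if __name__ == "__main__":
    fragmented = [(0, 40), (100, 40), (200, 40)]  # only small holes left
    print(alloc_block(fragmented, 100))           # [(0, 40), (100, 40), (200, 20)]
    print(alloc_block([(0, 40), (100, 40), (200, 40)], 100, allow_gang=False))  # None
```

With only three 40-unit holes left, a 100-unit request comes back as three extents (the gang case), while the same request on a path that is not allowed to gang simply fails.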


> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Brandon High

2011-Aug-01 22:04 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

On Mon, Aug 1, 2011 at 2:16 PM, Neil Perrin <neil.perrin at oracle.com> wrote:
> In general the blog's conclusion is correct. When file systems get full
> there is fragmentation (this happens to all file systems), and for ZFS the
> pool uses gang blocks made up of smaller blocks when there are not enough
> large free blocks.
The blog doesn't mention how full the pool was. It's pretty well
documented that performance takes a nosedive once a pool gets past a
certain fill level.

A slow scrub is actually not related to the problems in the blog post,
since there aren't many writes during (or at least caused by) a scrub.
Fragmentation is a real issue with pools that are (or have been) very
full. The data gets written out in fragments and has to be read back in
the same order.
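The cost of reading fragmented data back in order can be illustrated with a rough seek-count model. This is a sketch under stated assumptions, not a measurement: a single spinning disk, one seek per discontinuity between consecutive fragments, and placeholder defaults (8 ms per seek, 100 MB/s sequential transfer, 4 KiB units). The function names count_seeks and estimated_read_ms are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder model of reading fragmented data from one disk.
Extent = Tuple[int, int]  # (disk offset, length) in 4 KiB units


def count_seeks(extents: List[Extent]) -> int:
    """Head repositionings needed to read extents in the given (logical) order.

    The first extent always costs a seek; each later extent that does not
    start exactly where the previous one ended costs another.
    """
    seeks = 0
    expected_next: Optional[int] = None
    for off, length in extents:
        if off != expected_next:
            seeks += 1
        expected_next = off + length
    return seeks


def estimated_read_ms(extents: List[Extent], seek_ms: float = 8.0,
                      mb_per_s: float = 100.0, unit_bytes: int = 4096) -> float:
    """Seek time plus sequential transfer time; defaults are placeholders."""
    total_bytes = sum(length for _, length in extents) * unit_bytes
    transfer_ms = total_bytes / (mb_per_s * 1_000_000) * 1000
    return count_seeks(extents) * seek_ms + transfer_ms


if __name__ == "__main__":
    contiguous = [(0, 256)]                          # 1 MiB in one run
    scattered = [(i * 1000, 1) for i in range(256)]  # same 1 MiB in 256 pieces
    print(count_seeks(contiguous), round(estimated_read_ms(contiguous), 1))
    print(count_seeks(scattered), round(estimated_read_ms(scattered), 1))
```

Under those placeholder numbers, 1 MiB stored contiguously reads in roughly 18 ms, while the same 1 MiB scattered into 256 separate 4 KiB fragments needs 256 seeks and roughly two seconds; the transfer time is identical, so nearly all the difference is seeking.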

If the mythical bp_rewrite code ever shows up, it will be possible to
defrag a pool. But not yet.

-B

-- 
Brandon High : bhigh at freaks.com

Richard Elling

2011-Aug-01 22:10 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

On Aug 1, 2011, at 2:16 PM, Neil Perrin wrote:
> In general the blog's conclusion is correct. When file systems get full
> there is fragmentation (this happens to all file systems), and for ZFS the
> pool uses gang blocks made up of smaller blocks when there are not enough
> large free blocks. However, the ZIL never allocates or uses gang blocks.
> It directly allocates blocks (outside of the zio pipeline) using
> zio_alloc_zil() -> metaslab_alloc(). Gang blocks are only used by the
> main pool when the pool transaction group (txg) commit occurs. Solutions
> to the problem include:
>   - add a separate intent log
Yes, I thought it was odd that someone who is familiar with Oracle
databases and their redo logs didn't use separate intent logs.
>   - add more top-level devices (hopefully replicated)
>   - delete unused files/snapshots, etc., within the pool
If gang activity is the root cause of the poor performance, then they must
be at the edge of effective space utilization.
 -- richard

Daniel Carosone

2011-Aug-01 23:27 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

On Mon, Aug 01, 2011 at 03:10:28PM -0700, Richard Elling wrote:
> On Aug 1, 2011, at 2:16 PM, Neil Perrin wrote:
> 
> > In general the blog's conclusion is correct. When file systems get full
> > there is fragmentation (this happens to all file systems), and for ZFS the
> > pool uses gang blocks made up of smaller blocks when there are not enough
> > large free blocks. However, the ZIL never allocates or uses gang blocks.
> > It directly allocates blocks (outside of the zio pipeline) using
> > zio_alloc_zil() -> metaslab_alloc(). Gang blocks are only used by the
> > main pool when the pool transaction group (txg) commit occurs. Solutions
> > to the problem include:
> >   - add a separate intent log
> 
> Yes, I thought it was odd that someone who is familiar with Oracle
> databases and their redo logs didn't use separate intent logs.
> 
> >   - add more top-level devices (hopefully replicated)
> >   - delete unused files/snapshots, etc., within the pool
> 
> If gang activity is the root cause of the poor performance, then they must
> be at the edge of effective space utilization.
>  -- richard
The other thing that can cause a storm of tiny IOs is dedup, and this
effect can last long after space has been freed and/or dedup turned
off, until all the blocks corresponding to DDT entries are rewritten.
I wonder if this was involved here.  
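The lingering cost of dedup described above can be sketched as a toy model that only counts which operations have to touch the dedup table (DDT). The ToyPool class and its bookkeeping are hypothetical: it assumes each live block remembers whether it was written while dedup was on, and that freeing or overwriting such a block needs a DDT update even after dedup has been switched off. It says nothing about how expensive each DDT I/O actually is.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical toy model: counts operations that must touch the dedup table.


@dataclass
class ToyPool:
    dedup: bool = False
    # block id -> True if the live copy was written while dedup was on
    blocks: Dict[int, bool] = field(default_factory=dict)
    ddt_ios: int = 0  # small, scattered I/Os against the DDT

    def free(self, block_id: int) -> None:
        """Free a block; a deduplicated copy needs a DDT refcount update."""
        if self.blocks.pop(block_id, False):
            self.ddt_ios += 1

    def write(self, block_id: int) -> None:
        """Write or overwrite a block (copy-on-write frees the old copy)."""
        self.free(block_id)
        if self.dedup:
            self.ddt_ios += 1  # look up / insert the new block's checksum
        self.blocks[block_id] = self.dedup

    def deduped_blocks(self) -> int:
        return sum(self.blocks.values())


if __name__ == "__main__":
    pool = ToyPool(dedup=True)
    for i in range(1000):
        pool.write(i)
    pool.dedup, pool.ddt_ios = False, 0  # dedup "turned off"
    for i in range(1000):
        pool.write(i)                     # rewrite everything once
    print(pool.ddt_ios, pool.deduped_blocks())  # 1000 0
    for i in range(1000):
        pool.write(i)                     # no deduped copies left
    print(pool.ddt_ios)                   # still 1000
```

In this model, turning dedup off does not stop DDT traffic: every free or overwrite of an old deduplicated copy still costs a DDT update, and the count only stops growing once no deduplicated blocks remain.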

--
Dan.

Brandon High

2011-Aug-03 19:32 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

On Mon, Aug 1, 2011 at 4:27 PM, Daniel Carosone <dan at geek.com.au> wrote:
> The other thing that can cause a storm of tiny IOs is dedup, and this
> effect can last long after space has been freed and/or dedup turned
> off, until all the blocks corresponding to DDT entries are rewritten.
> I wonder if this was involved here.
Using dedup on a pool that houses an Oracle DB is Doing It Wrong in so
many ways...

-B

-- 
Brandon High : bhigh at freaks.com

Daniel Carosone

2011-Aug-03 22:59 UTC

[zfs-discuss] ZFS Fragmentation issue - examining the ZIL

On Wed, Aug 03, 2011 at 12:32:56PM -0700, Brandon High wrote:
> On Mon, Aug 1, 2011 at 4:27 PM, Daniel Carosone <dan at geek.com.au> wrote:
> > The other thing that can cause a storm of tiny IOs is dedup, and this
> > effect can last long after space has been freed and/or dedup turned
> > off, until all the blocks corresponding to DDT entries are rewritten.
> > I wonder if this was involved here.
> 
> Using dedup on a pool that houses an Oracle DB is Doing It Wrong in so
> many ways...
Indeed, but alas people still Do It Wrong. In particular, when a pool
is approaching full, turning on dedup might seem like an attractive
proposition to someone who doesn't understand the cost.

So I just wonder whether they have it enabled, or had it enabled at some point in the past.

--
Dan.
