Ware Adams
2011-Jan-28 15:13 UTC
[zfs-discuss] dedup experience with sufficient RAM/l2arc/cpu
There's a lot of discussion of dedup performance issues (including problems backing out of using it, which concerns me), but many/most of those involve relatively limited RAM and CPU configurations. I wanted to see if there is experience people could share using it with higher RAM levels and l2arc.

We have built a backup storage server nearly identical to this:

http://www.natecarlson.com/2010/05/07/review-supermicros-sc847a-4u-chassis-with-36-drive-bays/

briefly:

SuperMicro 36 bay case
48 GB RAM
2x 5620 CPU
Hitachi A7K2000 drives for storage
X25-M for l2arc (160 GB)
4x LSI SAS9211-8i
Solaris 11 Express

The main storage pool is mirrored and uses gzip compression. Our use consists of backing up daily snapshots of multiple MySQL hosts from a Sun 7410 appliance. We rsync the snapshot to the backup server (ZFS send to a non-appliance host isn't supported on the 7000, unfortunately), snapshot (so now we have a snapshot that matches the original on the 7410), clone, start MySQL on the clone to verify the backup, and shut down MySQL. We do this daily across 10 hosts which have significant overlap in data.

I would guess that dedup would provide good space savings, but before I turn it on I wanted to see if people with larger configurations had found it workable. My greatest concern is the stories of not only poor performance but, worse, complete non-responsiveness when trying to zfs destroy a filesystem with dedup turned on.

We are somewhat flexible here. We are not terribly pressed for space, and we do not need massive performance out of this. Because of that I probably won't use dedup without hearing it is workable on a similar configuration, but if people have had success it would give us more cushion for inevitable data growth.

Thanks for any help,
Ware
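P.S. For concreteness, the daily cycle per host looks roughly like the sketch below. The pool, dataset, and NFS path names are just placeholders, and the MySQL start/stop is simplified:

#!/bin/sh
# sketch of the daily cycle for one host (names and paths are placeholders)
HOST=db01
POOL=backup
TODAY=`date +%Y%m%d`

# pull the day's changes from the 7410's NFS-exported snapshot
rsync -a --delete /net/7410/export/${HOST}/.zfs/snapshot/daily-${TODAY}/ /${POOL}/${HOST}/

# snapshot the backup copy so it matches the original on the 7410
zfs snapshot ${POOL}/${HOST}@daily-${TODAY}

# clone it and bring MySQL up against the clone to verify the backup
zfs clone ${POOL}/${HOST}@daily-${TODAY} ${POOL}/${HOST}-verify-${TODAY}
mysqld_safe --datadir=/${POOL}/${HOST}-verify-${TODAY} --socket=/tmp/verify.sock &
# ... run whatever checks are needed against the clone ...
mysqladmin --socket=/tmp/verify.sock shutdown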
Richard Elling
2011-Jan-28 17:21 UTC
[zfs-discuss] dedup experience with sufficient RAM/l2arc/cpu
comment below...

On Jan 28, 2011, at 7:13 AM, Ware Adams wrote:

> There's a lot of discussion of dedup performance issues (including problems backing out of using it which concerns me), but many/most of those involve relatively limited RAM and CPU configurations. I wanted to see if there is experience that people could share using it with higher RAM levels and l2arc.
>
> We have built a backup storage server nearly identical to this:
>
> http://www.natecarlson.com/2010/05/07/review-supermicros-sc847a-4u-chassis-with-36-drive-bays/
>
> briefly:
>
> SuperMicro 36 bay case
> 48 GB RAM
> 2x 5620 CPU
> Hitachi A7K2000 drives for storage
> X25-M for l2arc (160 GB)
> 4x LSI SAS9211-8i
> Solaris 11 Express
>
> The main storage pool is mirrored and uses gzip compression. Our use consists of backing up daily snapshots of multiple MySQL hosts from a Sun 7410 appliance. We rsync the snapshot to the backup server (ZFS send to non-appliance host isn't supported on the 7000 unfortunately), snapshot (so now we have a snapshot that matches the original on the 7410), clone, start MySQL on the clone to verify the backup, shut down MySQL. We do this daily across 10 hosts which have significant overlap in data.
>
> I might guess that dedup would provide good space savings, but before I turn it on I wanted to see if people with larger configurations had found it workable. My greatest concern is stories of not only poor performance but, worse, complete non-responsiveness when trying to zfs destroy a filesystem with dedup turned on.
>
> We are somewhat flexible here. We are not terribly pressed for space, and we do not need massive performance out of this. Because of that I probably won't use dedup without hearing it is workable on a similar configuration, but if people have had success it would give us more cushion for inevitable data growth.

I apologize for the shortness, but since you have such large, slow drives, rather than making a single huge pool and deduping, create a pool per month/week/quarter. Send the snaps over that you need, destroy the old pool. KISS & fast destroy.
 -- richard
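P.S. In rough strokes, something like this -- the pool and device names below are invented, and since the 7410 can't send to you, substitute your rsync for the receive step:

# per-period pool rotation, very roughly
# start of a new period: build a fresh pool and filesystem
zpool create backup-2011q1 mirror c0t10d0 c0t11d0
zfs create -o compression=gzip backup-2011q1/db01

# ... land that period's backups/snapshots in it as they arrive ...

# when the period ages out, drop the whole pool in one fast step
zpool destroy backup-2010q1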
Ware Adams
2011-Jan-28 18:37 UTC
[zfs-discuss] dedup experience with sufficient RAM/l2arc/cpu
On Jan 28, 2011, at 12:21 PM, Richard Elling wrote:
>
> On Jan 28, 2011, at 7:13 AM, Ware Adams wrote:
>
>> SuperMicro 36 bay case
>> 48 GB RAM
>> 2x 5620 CPU
>> Hitachi A7K2000 drives for storage
>> X25-M for l2arc (160 GB)
>> 4x LSI SAS9211-8i
>> Solaris 11 Express
>
> I apologize for the shortness, but since you have such large, slow drives, rather than making
> a single huge pool and deduping, create a pool per month/week/quarter. Send the snaps over
> that you need, destroy the old pool. KISS & fast destroy.

I hadn't thought about that, but I think it might add its own complexity. Some more detail on what we are doing:

This host is a backup storage server for (currently) six MySQL hosts (whose data sets reside on NFS shares exported from the 7410). Each data set is ~1.5 TB uncompressed. Of this about 30 GB changes per day (that's the rsync'd amount; zfs send -i would be less, but I can't do that from the 7410). We are getting about 3.6:1 compression using gzip.

Then we are keeping daily backups for a month, weeklies for 6 months and then monthlies for a year. By far our most frequent use of backups is an accidentally dropped table, but we also need with some frequency to recover from a situation where a user's code error was writing garbage to a field for, say, a month and they need to recover as of a certain date several months ago. So all in all we would like to keep quite a number of backups, say 6 hosts * (30 dailies + 20 weeklies + 6 monthlies) = 336. The dailies and weeklies get pruned as they age into later time periods and aren't needed (and all are pruned after a year).

With the above I'd be able to have 18 pools with mirrors or 36 pools with just single drives. So there are two things that would seem to add complexity. First, I'd have to assign each incoming snapshot from the 7410 to one of those pools based on whether it is going to expire or not. I assume you could live with 18 or 36 "slots", but I haven't done the logic to find out exactly. Still, it would be some added complexity vs. today's process, which is basically:

rsync from 7410
snapshot
clone

The other issue is the rsync step. With only one pool I just rsync the 30 GB of changed data to that MySQL host's share. In the multiple pool scenarios I guess I would have a base copy of the full data set per pool? That would eat up ~400 GB on each 2 TB pool, so I wouldn't be able to fit all 6 hosts onto a given pool.

We haven't done a lot of zfs destroy yet (though some in testing), so I can't say the current setup is workable. But unless it is horribly slow there does seem to be some simplicity benefit from having a single pool. I'll keep this in mind though. We could probably have a larger pool for the 6 dailies per week that will be destroyed. I'd still have to zfs send the base directory prior to rsync, but that would simplify some.

Thanks for the suggestion.

--Ware
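P.S. For what it's worth, in the single-pool setup the pruning as backups age out would amount to something like the sketch below; the snapshot naming and retention count are illustrative, not our exact script:

#!/bin/sh
# prune dailies beyond the retention window for one host (illustrative only)
POOL=backup
HOST=db01
KEEP_DAILIES=30

# daily snapshots for this host, oldest first
SNAPS=`zfs list -H -t snapshot -o name -s creation | grep "^${POOL}/${HOST}@daily-"`
TOTAL=`echo "${SNAPS}" | wc -l`
EXCESS=`expr ${TOTAL} - ${KEEP_DAILIES}`

if [ ${EXCESS} -gt 0 ]; then
    # destroy the oldest snapshots that have aged out of the window
    echo "${SNAPS}" | head -n ${EXCESS} | while read SNAP; do
        zfs destroy "${SNAP}"
    done
fi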