Hi all,

I have built out an 8TB SAN at home using OpenSolaris + ZFS. I have yet to put it into 'production', as a lot of the issues raised on this mailing list are putting me off trusting my data to the platform right now.

Over the years I have stored my personal data on NetWare and now NT, and that solution has been 100% reliable for the last 12 years. Never a single problem (nor have I had any issues with NTFS across the tens of thousands of spindles I've worked with over the years).

I appreciate that 99% of the time people only comment when they have a problem, which is why I think it'd be nice if some people who have successfully implemented ZFS, including making use of its various features (recovery, replacing disks, etc.), could reply to this post with a sentence or paragraph describing how well it has worked for them. I'm not particularly interested in very small implementations of one or two disks that haven't changed config since the day they were installed, but rather in setups that are 'organic' and have changed and been administered over time (to show the functionality of the tools, the resilience of the platform, etc.)..

.. Of course, I guess a lot of people who have never had a problem wouldn't even be signed up to this list! :-)

Thanks!
On Mon, Oct 20, 2008 at 03:10, gm_sjo <saqmaster at gmail.com> wrote:
> I appreciate 99% of the time people only comment if they have a
> problem, which is why I think it'd be nice for some people who have
> successfully implemented ZFS, including making various use of the
> features (recovery, replacing disks, etc), could just reply to this
> post with a sentence or paragraph detailing how great it is for them.

My initial test of ZFS was with a few IDE disks which I had found flaky on other platforms (MD5 mismatches, that kind of thing). I put them all in a non-redundant pool and loaded some data onto it. Then I let it sit in the corner serving NFS for a couple of weeks, scrubbed the pool every once in a while, and watched the error counters. It confirmed what I'd seen: these disks gave off errors spontaneously. This was a good start: it was the first time I'd seen a storage stack that had the audacity to complain about problems with its hardware.

So I upgraded and put in "known-good" disks. I started with a mirrored pair of 750s, then added another pair, then added a pair of log disks. At each step, things moved smoothly and speed increased.

I've also helped my brother set up a Solaris/ZFS system, on a somewhat larger scale but with a more static configuration. He started with Linux, md raid, and XFS, using RAID 5 on eight 320GB disks and a Supermicro AOC-SAT2-MV8 Marvell controller. Unfortunately, he lost basically the entire array to corruption in some layer of the stack, so I suggested ZFS as an alternative. This was around build 67 of Nevada. He put his 8 disks in a raidz pool. About a year ago, he bought six 500GB disks and another Marvell controller, made a new raidz vdev (in a new pool) out of them, and added six of the 320GB disks in another vdev. A month or so ago, he bought six 1TB disks, made a new pool out of them, and moved all his data over to it.

At each step of the way, he upgraded to solve a problem. Moving from Linux to Solaris was because it had better drivers for the Marvell-based card. Adding the 500GB disks was because he was out of space, and the reason we didn't just add another vdev to the existing pool is that his case only has room for 13 disks. Finally, the 320GB disks have started returning checksum errors, so he wanted to get them out of the pool. The system as a whole has been very reliable, but due to some ZFS limitations (no vdev removal, no stripe-width changing) a new pool has been needed at each stage.

My experiences with ZFS at home have been very positive, but I also use it at work. I'm concerned about the speed of "zfs send" and about being able to remove vdevs before I will recommend it unilaterally for work purposes, but despite these issues I have a couple of pools in production: one serving mail, one serving user home directories, and one serving data for research groups. We have had no problems with these pools, but I keep an eye on the backup logs for them. I hope that eventually such careful watching will not be necessary.

Will
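For anyone who wants to see what that incremental growth looks like at the command line, a rough sketch follows (pool and device names here are made up for illustration, not my actual layout):

    # start with a mirrored pair, then grow the pool one vdev at a time
    zpool create tank mirror c1t0d0 c1t1d0
    zpool add tank mirror c1t2d0 c1t3d0
    # add a mirrored pair of dedicated log devices (slog)
    zpool add tank log mirror c2t0d0 c2t1d0
    # scrub periodically and watch the error counters
    zpool scrub tank
    zpool status -v tank

Each "zpool add" stripes another top-level vdev into the pool; since there is currently no way to remove one again, it pays to get the layout right up front, which is exactly why my brother ended up creating a fresh pool at each expansion.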
We have 135 TB of capacity, with about 75 TB in use, on ZFS-based
storage. Our ZFS use started about 2 years ago and has grown from there.
This spans 9 SAN appliances with 5 "head nodes", plus 2 more recent
servers running ZFS on JBOD with vdevs made up of raidz2.
So far, the experience has been very positive. Never lost a bit of
data. We scrub weekly, and I've started sleeping better at night. I
have also read the horror stories, but we aren't seeing them here.
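For what it's worth, the weekly scrub is just a root cron job along
these lines (the pool name "tank" is a placeholder, not our actual pool):

    # run a scrub every Sunday at 03:00; results show up in 'zpool status'
    0 3 * * 0 /usr/sbin/zpool scrub tank

zpool scrub returns immediately and runs in the background, so the
follow-up "zpool status" check is where any checksum errors actually
appear.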
We did have some performance issues, especially involving the SAN
storage on more heavily used systems, but enabling the cache on the SAN
devices without pushing fsync through to disk basically fixed that.
Your ZFS layout can profoundly affect performance, which is a downside.
It's best to test your setup under an approximately realistic workload,
to balance capacity against performance, before deploying.
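As a rough sketch of what that testing looks like (pool and device names
below are placeholders): build a candidate layout, drive it with
something resembling your real workload, and watch per-vdev behaviour,
then tear it down and try the next layout.

    # one candidate layout: a single 6-disk raidz2 vdev
    zpool create testpool raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0
    # ... generate a workload against /testpool here ...
    # watch per-vdev bandwidth and IOPS while it runs
    zpool iostat -v testpool 5
    # destroy and rebuild as mirrors or multiple raidz vdevs, then compare
    zpool destroy testpool

The same disks arranged as mirrors will usually give more random IOPS
but less usable capacity than raidz2, which is the capacity-versus-
performance trade-off I mean.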
BTW, most of our ZFS deployment is on Solaris 10 {u4, u5}, but two large
servers are on OpenSolaris snv_86. The OpenSolaris servers seem to be
considerably faster and more feature-rich, without any reliability
issues so far.
Jon
--
Jonathan Loran  -  IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
AST:7731^29u18e3
About 2 years ago I used to run snv_55b with a raidz on top of 5 500GB SATA drives. After 10 months I ran out of space and added a mirror of 2 250GB drives to my pool with "zpool add". No problems. I scrubbed it weekly. I only ever saw 1 CKSUM error (ZFS self-healed itself automatically, of course). Never had any problems with that server.

After running out of space again, I replaced it with a new system running snv_82, configured with a raidz on top of 7 750GB drives. To burn in the machine, I wrote a Python script that read random sectors from the drives. I let it run for 48 hours to subject each disk to 10+ million I/O operations. After it passed this test, I created the pool and ran some more scripts to create/delete files on it continuously. To test disk failures (and SATA hotplug), I disconnected and reconnected a drive at random while the scripts were running. The system was always able to redetect the drive immediately after it was plugged back in (you need "set sata:sata_auto_online=1" for this to work). Depending on how long the drive had been disconnected, I either needed to do a "zpool replace" or nothing at all for the system to re-add the disk to the pool and initiate a resilver. After these tests, I trusted the system enough to move all my data to it, so I rsync'd everything and double-checked it with MD5 sums.

I have another ZFS server, at work, on which one disk one day started acting weirdly (timeouts). I physically replaced it and ran "zpool replace". The resilver completed successfully. On this server, we have seen 2 CKSUM errors over the last 18 months or so. We read about 3 TB of data from it every day (daily rsync), which amounts to about 1.5 PB over 18 months. I guess 2 silent data corruptions while reading that quantity of data is about the expected error rate for modern SATA drives. (Again, ZFS self-healed itself, so this was completely transparent to us.)

-marc
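In case it saves anyone a search, the relevant bits look roughly like this (device and pool names are placeholders, not my actual setup). The auto-online tunable goes in /etc/system and takes effect after a reboot; a physically swapped disk is brought back with "zpool replace":

    # /etc/system: re-online SATA devices automatically when they reappear
    set sata:sata_auto_online=1

    # after physically replacing the failed disk, tell ZFS to resilver onto it
    zpool replace tank c2t4d0
    # watch resilver progress and the per-device error counters
    zpool status tank

If the same disk was only briefly disconnected and comes back clean, ZFS may simply resync it on its own, which is the "or nothing at all" case I mentioned.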
Hello Marc,
Tuesday, October 21, 2008, 8:14:17 AM, you wrote:
MB> [...] On this server, we have seen 2 CKSUM errors over the last 18 months
MB> or so. We read about 3 TB of data every day from it (daily rsync), that
MB> amounts to about 1.5 PB over 18 months. I guess 2 silent data corruptions
MB> while reading that quantity of data is about the expected error rate of
MB> modern SATA drives. (Again ZFS self-healed itself, so this was completely
MB> transparent to us.)
Which means you haven't experienced silent data corruption, thanks to
ZFS. :)
--
Best regards,
Robert mailto:milek at task.gda.pl
http://milek.blogspot.com