I have a zpool which has grown "organically". I had a 60 GB disk, I added a 120, then a 500, and then I got a 750, sliced it, and mirrored the other pieces. The 60 and the 120 are internal PATA drives; the 500 and the 750 are Maxtor OneTouch USB drives.

The original system I created the 60+120+500 pool on was Solaris 10 update 3, patched to use ZFS sometime last fall (November, I believe). In early June, a storm blew out my root drive. Thinking it was an opportunity to upgrade, I re-installed with OpenSolaris, completed the mirroring which I had intended for some time, and upgraded ZFS from v4 to v10. The system was not stable. Reading around, I realized that 512 MB of RAM and a 32-bit CPU was probably a poor choice for an OpenSolaris, ZFS-based web and file server for my home. So I purchased an ASUS AMD64x2 system with 4 GB of RAM, and this weekend I was able to get that set up.

However, my pool is not behaving well. I have had "insufficient replicas" for the pool and "corrupted data" for the mirror piece that is on both the USB drives. This confuses me because I'm also seeing "no known data errors", which leads me to wonder where this corrupted data might be.

I did a zpool scrub, thinking I could shake out what the problem was; earlier, when the system was unstable, doing this pointed out a couple of MP3 files that were incorrect, and as they were easily replaced I just removed them and was able to get a clean filesystem. My most recent attempt to clear this involved removing the 750 GB drive and then trying to bring it online; this had no effect, but now the 750 is on c0 rather than c7 at the OS device level.

I've googled for some guidance and found advice to export/import, and while this cleared the original "insufficient replicas" problem, it has done nothing for the alleged corrupted data.

I have a couple thousand family photos (many of which are backed up elsewhere, but would be a huge problem to re-import) and several thousand MP3s and AACs (iTunes songs, many of which are backed up, but many are not, because they were recently purchased). I've been hearing how ZFS is the way I should go, which is why I made this change last fall, but at this point I am only having confusion and frustration. Any advice for other steps I could take to recover would be great.
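(For reference, the export/import cycle people suggest is nothing exotic; as the history below shows, it is just:

    # zpool export local
    # zpool import local

and if the pool doesn't show up after a device shuffle, a bare "zpool import" with no arguments lists whatever pools it can find to import.)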
here is some data directly from the system (yes, I know, somewhere along the line I set the date one day ahead of the real date, I will be fixing that later :) ):

    -bash-3.2# zpool status local
      pool: local
     state: DEGRADED
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            local         DEGRADED     0     0     0
              mirror      ONLINE       0     0     0
                c6d1p0    ONLINE       0     0     0
                c0t0d0s3  ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c6d0p0    ONLINE       0     0     0
                c0t0d0s4  ONLINE       0     0     0
              mirror      UNAVAIL      0     0     0  corrupted data
                c8t0d0p0  ONLINE       0     0     0
                c0t0d0s5  ONLINE       0     0     0

    errors: No known data errors

    -bash-3.2# zpool history local
    History for 'local':
    2007-11-19.11:45:11 zpool create -m /local2 local c1d0p0
    2007-11-19.13:38:44 zfs recv local/main
    2007-11-19.13:52:51 zfs set mountpoint=/local-pool local
    2007-11-19.13:53:09 zfs set mountpoint=/local local/main
    2007-11-19.14:00:48 zpool add local c1d1p0
    2007-11-19.14:26:35 zfs destroy local/main@now
    2007-11-28.18:38:26 zpool add local /dev/dsk/c3t0d0p0
    2008-05-12.10:20:48 zfs set canmount=off local
    2008-05-12.10:21:24 zfs set mountpoint=/ local
    2008-06-16.15:56:29 zpool import -f local
    2008-06-16.15:58:04 zpool export local
    2008-06-27.21:41:35 zpool import local
    2008-06-27.22:42:09 zpool attach -f local c5d0p0 c7t0d0s3
    2008-06-28.09:06:51 zpool clear local c5d0p0
    2008-06-28.09:07:00 zpool clear local c7t0d0s3
    2008-06-28.09:07:11 zpool clear local
    2008-06-28.09:35:39 zpool attach -f local c5d1p0 c7t0d0s4
    2008-06-28.09:36:23 zpool attach -f local c6t0d0p0 c7t0d0s5
    2008-06-28.13:15:26 zpool clear local
    2008-06-28.13:16:48 zpool scrub local
    2008-06-28.18:30:19 zpool clear local
    2008-06-28.18:30:37 zpool upgrade local
    2008-06-28.18:53:33 zfs create -o mountpoint=/opt/csw local/csw
    2008-06-28.21:59:38 zpool export local
    2008-07-06.23:25:41 zpool import local
    2008-07-06.23:26:19 zpool scrub local
    2008-07-07.08:40:13 zpool clear local
    2008-07-07.08:43:39 zpool export local
    2008-07-07.08:43:54 zpool import local
    2008-07-07.08:44:20 zpool clear local
    2008-07-07.08:47:20 zpool export local
    2008-07-07.08:56:49 zpool import local
    2008-07-07.08:58:57 zpool export local
    2008-07-07.09:00:26 zpool import local
    2008-07-07.09:18:16 zpool export local
    2008-07-07.09:18:26 zpool import local
I'm doing another scrub after clearing "insufficient replicas", only to find that I'm back to the report of insufficient replicas, which basically leads me to expect this scrub (due to complete in about 5 hours from now) won't have any benefit either.

    -bash-3.2# zpool status local
      pool: local
     state: FAULTED
     scrub: scrub in progress for 0h32m, 9.51% done, 5h11m to go
    config:

            NAME          STATE     READ WRITE CKSUM
            local         FAULTED      0     0     0  insufficient replicas
              mirror      ONLINE       0     0     0
                c6d1p0    ONLINE       0     0     0
                c0t0d0s3  ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c6d0p0    ONLINE       0     0     0
                c0t0d0s4  ONLINE       0     0     0
              mirror      UNAVAIL      0     0     0  corrupted data
                c8t0d0p0  ONLINE       0     0     0
                c0t0d0s5  ONLINE       0     0     0

    errors: No known data errors
As a first step, 'fmdump -ev' should indicate why it's complaining about the mirror.

Jeff

On Sun, Jul 06, 2008 at 07:55:22AM -0700, Pete Hartman wrote:
> I'm doing another scrub after clearing "insufficient replicas", only to
> find that I'm back to the report of insufficient replicas, which
> basically leads me to expect this scrub (due to complete in about 5
> hours from now) won't have any benefit either.
>
> -bash-3.2# zpool status local
>   pool: local
>  state: FAULTED
>  scrub: scrub in progress for 0h32m, 9.51% done, 5h11m to go
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         local         FAULTED      0     0     0  insufficient replicas
>           mirror      ONLINE       0     0     0
>             c6d1p0    ONLINE       0     0     0
>             c0t0d0s3  ONLINE       0     0     0
>           mirror      ONLINE       0     0     0
>             c6d0p0    ONLINE       0     0     0
>             c0t0d0s4  ONLINE       0     0     0
>           mirror      UNAVAIL      0     0     0  corrupted data
>             c8t0d0p0  ONLINE       0     0     0
>             c0t0d0s5  ONLINE       0     0     0
>
> errors: No known data errors
I'm not sure how to interpret the output of fmdump:

    -bash-3.2# fmdump -ev
    TIME                 CLASS                          ENA
    Jul 06 23:25:39.3184 ereport.fs.zfs.vdev.bad_label  0x03b3e4e8b1900401
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.checksum        0xdaffb466a7e00001
    Jul 07 03:32:14.3561 ereport.fs.zfs.data            0xdaffb466a7e00001
    Jul 07 08:43:51.9399 ereport.fs.zfs.vdev.bad_label  0xeb15a1de01f00401
    Jul 07 08:56:46.8978 ereport.fs.zfs.vdev.bad_label  0xf66406a7f9f00401
    Jul 07 09:00:25.6136 ereport.fs.zfs.vdev.bad_label  0xf992ce4b4c100001
    Jul 07 09:00:25.6136 ereport.fs.zfs.io              0xf992ce4b4c100001
    Jul 07 09:00:25.6136 ereport.fs.zfs.io              0xf992ce4b4c100001
    Jul 07 09:00:27.1258 ereport.fs.zfs.io              0xf99870686ff00401
    Jul 07 09:00:27.1258 ereport.fs.zfs.io              0xf99870686ff00401
    Jul 07 09:00:27.6452 ereport.fs.zfs.io              0xf99a5fd3be900401
    Jul 07 09:00:27.6452 ereport.fs.zfs.io              0xf99a5fd3be900401
    Jul 07 09:12:58.8672 ereport.fs.zfs.vdev.bad_label  0x0488e4f3f2b00001
    Jul 07 09:13:04.2748 ereport.fs.zfs.vdev.bad_label  0x049d0a0437a00401
    Jul 07 09:18:23.3689 ereport.fs.zfs.vdev.bad_label  0x0941c1d9ae900001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.checksum        0xe6fa55a373b00001
    Jul 07 13:32:19.9203 ereport.fs.zfs.data            0xe6fa55a373b00001
    Jul 07 20:03:41.6315 ereport.fs.zfs.vdev.bad_label  0x3cb5f9c64ac00001
    Jul 07 20:03:42.5642 ereport.fs.zfs.vdev.bad_label  0x3cb97354d3100001
    Jul 07 20:03:43.3098 ereport.fs.zfs.vdev.bad_label  0x3cbc3a681b300001
    Jul 07 20:03:58.6815 ereport.fs.zfs.vdev.bad_label  0x3cf57dee80000401
    Jul 07 20:04:01.0846 ereport.fs.zfs.vdev.bad_label  0x3cfe71b9f5800401
    Jul 07 20:04:03.2627 ereport.fs.zfs.vdev.bad_label  0x3d068ee974a00401
    Jul 07 20:04:06.2904 ereport.fs.zfs.vdev.bad_label  0x3d11d65e58300001

So, the current sequence of events: the scrub from this morning completed, and it is now calling out a specific file with problems. Based on the "bad_label" messages above, I went to my USB devices to double check their labels; format shows them without problems. So does fdisk. Just to be sure, I went to the format partition menu and re-ran label without changing anything. I then ran a zpool clear, and now it looks like everything is online except that one file:

    -bash-3.2# zpool status -v
      pool: local
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: scrub completed after 4h22m with 1 errors on Mon Jul 7 13:44:31 2008
    config:

            NAME          STATE     READ WRITE CKSUM
            local         ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c6d1p0    ONLINE       0     0     0
                c0t0d0s3  ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c6d0p0    ONLINE       0     0     0
                c0t0d0s4  ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c8t0d0p0  ONLINE       0     0     0
                c0t0d0s5  ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            /local/share/music/Petes-itunes/Scientist/Scientific Dub/Satta Dread Dub.mp3
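One more diagnostic I may try if the bad_label reports come back (something I've seen mentioned but haven't run yet, so take it as a sketch): dump the vdev labels that ZFS itself sees on the suspect device, e.g. for the 500 GB USB drive:

    # zdb -l /dev/dsk/c8t0d0p0

That should print labels 0 through 3; if format and fdisk look clean but one of those labels is missing or stale, that would at least say the problem is in ZFS's own label rather than the partition table.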
HOWEVER, it does not appear that things are good:

    -bash-3.2# zpool list
    NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
    local   630G   228G   403G  36%  ONLINE  -
    rpool    55G  2.63G  52.4G   4%  ONLINE  -
    -bash-3.2# df -k /local
    Filesystem            kbytes    used   avail capacity  Mounted on
    local/main           238581865 238567908       0   100%    /local
    -bash-3.2# cd '/local/share/music/Petes-itunes/Scientist/Scientific Dub/'
    -bash-3.2# ls -l
    total 131460
    -rwxr--r--   1 elmegil  other   8374348 Jun 10 18:51 Bad Days Dub.mp3
    -rwxr--r--   1 elmegil  other   5355853 Jun 10 18:51 Blacka Shade of Dub.mp3
    -rwxr--r--   1 elmegil  other   7260905 Jun 10 18:50 Drum Song Dub.mp3
    -rwxr--r--   1 elmegil  other   6058878 Jun 10 18:51 East of Scientist Corner (II Pieces).mp3
    -rwxr--r--   1 elmegil  other   7244195 Jun 10 18:51 Every Dub Shall Scrub.mp3
    -rwxr--r--   1 elmegil  other   6878897 Jun 10 18:52 Just say Dub... Who.mp3
    -rwxr--r--   1 elmegil  other   8197144 Jun 10 18:51 Keep a good Dub Rubbing.mp3
    -rwxr--r--   1 elmegil  other   4929531 Jun 10 18:51 Satta Dread Dub.mp3
    -rwxr--r--   1 elmegil  other   7873642 Jun 10 18:51 Taxi to Baltimore Dub.mp3
    -rwxr--r--   1 elmegil  other   4438008 Jun 10 18:52 Words of Dub.mp3
    -bash-3.2# rm 'Satta Dread Dub.mp3'
    rm: Satta Dread Dub.mp3 not removed: No space left on device

Running export/import again shows data corruption again, but otherwise has the same symptom. This is strange to me because previously the other files that were corrupted didn't object to being removed.
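One workaround I've seen suggested for the rm-on-a-full-pool problem (I haven't tried it here, so treat it as a sketch): because ZFS is copy-on-write, even an unlink has to allocate a little new metadata, so truncating the file in place first can free its data blocks where rm alone fails:

    # cp /dev/null 'Satta Dread Dub.mp3'    # truncate in place, freeing the data blocks
    # rm 'Satta Dread Dub.mp3'              # then the unlink has room to proceed

(Though if a snapshot still references the file, nothing actually gets freed and this won't help.)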
Someone else wrote me directly and suggested this could be the fault of the new hardware... but the old hardware was panicking in ZFS, so it wasn't any more reliable (read: not any help in recovering my data), and I half expect that the panics could be related to some of this problem too. I definitely am not seeing any other symptoms of bad hardware: no transport or other disk errors aside from the ZFS complaints (i.e. none of the USB or disk drivers are reporting any issues, as far as I can see), no ECC or other memory issues, no panicking from bit flips... which doesn't rule out bad hardware, of course, but I think I'd expect to see more than just the ZFS problems.

Just as a point of information, the motherboard is an ASUS M2A-VM, and I've updated to the latest available BIOS (1705 I believe it was, from March this year). I did that before the first import of the local pool on the new hardware, in fact.

Part of me is thinking what I ought to do is lop off the 750 GB drive, make it its own pool, physically copy as much of the data as I can save into that pool, scrub it to be sure it's OK beyond that, and then re-create the original pool from scratch and copy the data back before mirroring to the 750 again. Very drastic; seems risky. If there is anything more intelligible than I can discern from the fmdump above (fmdump -eV gives even more cryptic hex strings :) ) that could save me from this radical approach, any advice is appreciated. Unfortunately there aren't any other available media big enough to store 230 GB in a reasonable amount of time / individual media count (60 DVDs! 8 GB DVDs would be half that, but I have yet to find a DL drive that works reliably for me....).

Thanks Jeff. I hope my frustration in all this doesn't sound directed at anyone in particular, and definitely not you. I appreciate your time looking and giving advice.

Pete

Jeff Bonwick wrote:
> As a first step, 'fmdump -ev' should indicate why it's complaining
> about the mirror.
>
> Jeff
> I got a 750 and sliced it and mirrored the other pieces.

Maybe you ran into a bug, because that situation would not be tested much in the wild... or maybe you just got unlucky and your computer toasted some data.

> Thanks Jeff. I hope my frustration in all this doesn't sound directed
> at anyone in particular, and definitely not you.

You are taking this frustrating problem better than I would be, so right on. I feel confident for you now just because Bonwick Is On The Case!
> However, my pool is not behaving well. I have had
> "insufficient replicas" for the pool and "corrupted
> data" for the mirror piece that is on both the USB
> drives.

I'm learning about ZFS for the same reason: I want a reliable home server. So I've been reading the archives. In March 2007 there was a thread titled

    ZFS and Firewire/USB enclosures

and the conclusion was that USB had problems because of the following open bug:

    6424510 usb ignores DKIOCFLUSHWRITECACHE

From what I can tell, this bug is still not fixed:

http://bugs.opensolaris.org/view_bug.do?bug_id=6424510

So, help me out here. Can USB be relied on in Solaris? Maybe the original poster is hitting this bug?
Bohdan Tashchuk wrote:
>> However, my pool is not behaving well. I have had
>> "insufficient replicas" for the pool and "corrupted
>> data" for the mirror piece that is on both the USB
>> drives.
>
> I'm learning about ZFS for the same reason: I want a reliable home
> server. So I've been reading the archives. In March 2007 there was a
> thread titled "ZFS and Firewire/USB enclosures"; the conclusion was
> that USB had problems because of the following open bug:
>
>     6424510 usb ignores DKIOCFLUSHWRITECACHE
>
> From what I can tell, this bug is still not fixed:
> http://bugs.opensolaris.org/view_bug.do?bug_id=6424510
>
> So, help me out here. Can USB be relied on in Solaris? Maybe the
> original poster is hitting this bug?

Hi Bohdan,
that bug was fixed in build 54 of Solaris Nevada; I'm not sure why bugs.opensolaris.org is showing it as still being unfixed.

My only problems with USB-attached storage and Solaris have been due to the actual disks in my enclosures dying. (Most recently, just as I was about to deliver a ZFS demo to a large customer. *very annoying*!)

If I didn't have SAS or eSATA available to me, I'd be going to USB-attached storage without any qualms. As long as the disk inside the enclosure was 3.5" - those laptop hard disks still aren't quite there, imnsho.

cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
James,

May I ask what kind of USB enclosures and hubs you are using? I've had some very bad experiences over the past month with not-so-cheap enclosures.

With regard to eSATA, I found the following chipsets on the SHCL (the Solaris Hardware Compatibility List). Any others you can recommend?

    Silicon Image 3112A
    Intel S5400
    Intel S5100
    Silicon Image Sil3114

Thanks
justin
I'm curious which enclosures you've had problems with?

Mine are both Maxtor OneTouch; the 750 is slightly different in that it has a FireWire port as well as USB.
Pete Hartman wrote:
> I'm curious which enclosures you've had problems with?
>
> Mine are both Maxtor OneTouch; the 750 is slightly different in that it
> has a FireWire port as well as USB.

I've had VERY bad experiences with the Maxtor OneTouch and ZFS. To the point that we gave up trying to use them. We last tried on snv_79, though.

--
Darren J Moffat
On Tue, Jul 8, 2008 at 8:56 AM, Darren J Moffat <darrenm@opensolaris.org> wrote:
> Pete Hartman wrote:
>> I'm curious which enclosures you've had problems with?
>>
>> Mine are both Maxtor OneTouch; the 750 is slightly different in that
>> it has a FireWire port as well as USB.
>
> I've had VERY bad experiences with the Maxtor OneTouch and ZFS. To the
> point that we gave up trying to use them. We last tried on snv_79, though.

I've had bad experiences with the Seagate products. Last time, I read a bunch of customer reviews on newegg.com, and it seemed to be split between those with no issues and those with failures. My guess is that it's related to duty cycle: casual users who really don't "beat up" on the drive will have no problems, while power users will probably kill the drive. If my guess is correct, it's simply physics: lack of airflow over the HDA (head disk assembly).

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al@logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
>>>>> "ah" == Al Hopper <al@logical-approach.com> writes:

    ah> I've had bad experiences with the Seagate products.

I've had bad experiences with all of them (Maxtor, HGST, Seagate, WD).

    ah> My guess is that it's related to duty cycle -

Recently I've been getting a lot of drives from companies like newegg and zipzoomfly that fail within the first month. The rate is high enough that I would not trust a two-way mirror of drives less than a month old.

Then I have drives with a few unreadable sectors 2-5 years into their life, from all manufacturers. I test them with 'smartctl -t long', and either send them for warranty repair or abandon them. I suspect 'dd if=/dev/zero of=<drive>' would usually fix such a disk unless the "reallocated sector count" is too high, but I just pretend every drive is on lease for its warranty period. The PATA/SATA/SATA2-NCQ interfaces and capacity-per-watt change about that often anyway.

I send so many drives back for repair that it only makes financial sense to buy 5-year-warranty drives. I don't think they can make any money on me at the rate I send them back, but if more people did this, maybe they would learn to make disks that don't suck. Maybe they are giving me all their marginal ones or something, by using "sales channels" -- we pour our shit down THIS channel. In that case they could still make money.
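(The smartctl incantation, roughly -- smartmontools flags from memory, the device path is illustrative, and on Solaris you may need a -d option to match your transport:

    # smartctl -t long /dev/rdsk/c1t0d0s0       # kick off the drive's long self-test
    # smartctl -l selftest /dev/rdsk/c1t0d0s0   # read the result once it finishes
    # smartctl -A /dev/rdsk/c1t0d0s0            # check Reallocated_Sector_Ct and friends

A failed long test, or a climbing reallocated-sector count, is what sends a drive back under warranty.)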
On Jul 9, 2008, at 11:12 AM, Miles Nordin wrote:
>     ah> My guess is that it's related to duty cycle -
>
> Recently I've been getting a lot of drives from companies like newegg
> and zipzoomfly that fail within the first month. The rate is high
> enough that I would not trust a two-way mirror of drives less than a
> month old.

While I've always had good luck with zipzoomfly, "infant mortality" is a well-known feature of many devices. Your advice to do some "burn-in" testing of drives before putting them into full production is probably very sound for sites large enough to maintain a bit of "inventory" ;>

--
Keith H. Bierman   khbkhb@gmail.com | AIM kbiermank
5430 Nassau Circle East | Cherry Hills Village, CO 80113 | 303-997-2749
<speaking for myself*> Copyright 2008
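A minimal burn-in along those lines (just a sketch -- the device name is made up, and the first pass destroys any data on the disk): write the whole surface, read it all back, then run a SMART long self-test before trusting the drive with real data:

    # dd if=/dev/zero of=/dev/rdsk/c9t0d0p0 bs=1024k    # full write pass -- WIPES the disk
    # dd if=/dev/rdsk/c9t0d0p0 of=/dev/null bs=1024k    # full read pass
    # smartctl -t long /dev/rdsk/c9t0d0p0               # finish with a long self-test

A drive that survives a few days of this has at least made it past the steep part of the bathtub curve.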
Also worth noting is that the "enterprise-class" drives have protection from heavy load that the "consumer-class" drives don't. In particular, there's no temperature sensor on the voice coil in the consumer drives, which means that under heavy seek load (constant I/O), the drive will eventually overheat. [There are plenty of other differences, but this one is important if you plan to put a drive into 24/7 use.]
Just to close the loop on this, for some other poor soul having similar problems and googling away... I believe I have resolved it.

The problem was somewhere on the 750 GB drive, and it was fixed by detaching that drive from my mirrors and re-attaching it. I actually took the extra step of creating a UFS filesystem on the largest slice of the 750 and copying the data over, with the thought that I might not be able to get my data back from ZFS; but after detaching the disk and doing a scrub, there was only one more data error, another MP3 that's easily replaced, and the filesystem was clean. Re-attaching the 750's slices to the clean pool has not resulted in any further problems in something over a week so far.
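For anyone who finds this thread later, the detach/re-attach is just the standard zpool commands. Roughly, with my device names (the c0t0d0 slices are the 750's, per the status output earlier in the thread -- I'm reconstructing this from memory and my zpool history, so treat it as a sketch):

    # zpool detach local c0t0d0s3          # pull the 750's slice off each mirror
    # zpool detach local c0t0d0s4
    # zpool detach local c0t0d0s5
    # zpool scrub local                    # verify the remaining halves are clean
    # zpool attach local c6d1p0 c0t0d0s3   # re-attach; each attach starts a resilver
    # zpool attach local c6d0p0 c0t0d0s4
    # zpool attach local c8t0d0p0 c0t0d0s5

"zpool status" shows the resilver progress after each attach.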