ok, two weeks ago I did notice one of my disks in the zpool got problems. I was getting "Corrupt label; wrong magic number" messages, then when I looked in format it did not see that disk... (last disk) I had that setup running for a few months now and all of a sudden the last disk failed. So I ordered another disk, had it replaced like a week ago, I did issue the replace command after the disk replacement, it was resilvering disks since forever, then I got hints from this group that snaps could be causing it, so yesterday I did disable snaps and this morning I did notice the same disk that I replaced is gone... Does it seem weird that this disk would fail? It's a new disk... I have Solaris 10 U2, 4 internal drives and then 7 external drives which are in single enclosures connected via a SCSI chain to each other... So it seems like the last disk is failing. Those unipacks from Sun have self-termination so there is no terminator at the end... Any ideas what should I do? Do I need to order another drive and replace that one too? Or will it happen again? What do you think could be the problem? Ah, when I look at that enclosure I do see a green light on it, so it seems like it did not fail...

format
Searching for disks...
efi_alloc_and_init failed.
done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <drive type unknown>
          /pci@1e,600000/scsi@3/sd@6,0


zpool status -v
  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: DEGRADED
 scrub: resilver completed with 0 errors on Mon Dec  4 22:34:57 2006
config:

        NAME              STATE     READ WRITE CKSUM
        mypool2           DEGRADED     0     0     0
          raidz           DEGRADED     0     0     0
            c3t0d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            replacing     UNAVAIL      0   775     0  insufficient replicas
              c3t6d0s0/o  UNAVAIL      0     0     0  cannot open
              c3t6d0      UNAVAIL      0   940     0  cannot open

errors: No known data errors
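For reference, two generic checks that can help separate "the drive died" from "the bus or enclosure is flaky"; both are stock Solaris commands, and the device name and log path below are simply the ones from this setup:

  # per-device error counters and inquiry data; lots of transport errors with
  # few media errors tends to point at cabling/termination rather than the disk
  iostat -En c3t6d0

  # see what the HBA driver has been complaining about for that bus lately
  grep 'scsi@3' /var/adm/messages | tail -50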
The only time that I have seen format return "drive type unknown" is when the drive has failed. You may just have another bad drive and want to try replacing it again. If that does not work, you may have another problem such as a bad backplane or a bad SCSI cable, assuming the drive is an external drive. Hope that helps.

On 12/5/06, Krzys <krzys at perfekt.net> wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
Thanks, ah another weeeeird thing is that when I run format on that drive I get a coredump :(

format
Searching for disks...
efi_alloc_and_init failed.
done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <drive type unknown>
          /pci@1e,600000/scsi@3/sd@6,0
Specify disk (enter its number): 10
Segmentation Fault (core dumped)

:( Can't even get to the format menu on that drive...

Chris

On Tue, 5 Dec 2006, Nicholas Senedzuk wrote:

> The only time that I have seen format return "drive type unknown" is when
> the drive has failed. You may just have another bad drive and want to try
> replacing it again.
> ... snip ...
Krzys wrote:
> Thanks, ah another weeeeird thing is that when I run format on that
> drive I get a coredump :(

Run pstack /path/to/core and send the output.
[12:00:40] root at chrysek: /d/d3/nb1 > pstack core
core 'core' of 29506:   format -e
-----------------  lwp# 1 / thread# 1  --------------------
 000239b8 c_disk   (51800, 52000, 4bde4, 525f4, 54e78, 0) + 4e0
 00020fb4 main     (2, 0, ffbff8e8, 0, 52000, 29000) + 46c
 000141a8 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
 ff241818 _door_return (0, 0, 0, 0, fef92400, ff26cbc0) + 10
 ff0c0c30 door_create_func (0, feefc000, 0, 0, ff0c0c10, 0) + 20
 ff2400b0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
 ff240154 __lwp_park (75e78, 75e88, 0, 0, 0, 0) + 14
 ff23a1e4 cond_wait_queue (75e78, 75e88, 0, 0, 0, 0) + 28
 ff23a764 cond_wait (75e78, 75e88, 1, 0, 0, ff26cbc0) + 10
 ff142a60 subscriber_event_handler (551d8, fedfc000, 0, 0, ff142a2c, 0) + 34
 ff2400b0 _lwp_start (0, 0, 0, 0, 0, 0)

On Tue, 5 Dec 2006, Torrey McMahon wrote:

> Krzys wrote:
>> Thanks, ah another weeeeird thing is that when I run format on that drive
>> I get a coredump :(
>
> Run pstack /path/to/core and send the output.
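As an aside, if more detail than the pstack trace is ever needed, the same core can be opened with mdb; the commands below are standard mdb dcmds and nothing specific to this particular core is assumed:

  # mdb /usr/sbin/format core
  > ::status        # shows which signal killed the process
  > $C              # stack trace with frame pointers and arguments
  > ::quit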
On Tue, 5 Dec 2006, Krzys wrote:

> Thanks, ah another weeeeird thing is that when I run format on that drive
> I get a coredump :(
... snip ....

Try zeroing out the disk label with something like:

  dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
                 OpenSolaris Governing Board (OGB) Member - Feb 2006
Al Hopper <al at logical-approach.com> wrote:

> On Tue, 5 Dec 2006, Krzys wrote:
>
>> Thanks, ah another weeeeird thing is that when I run format on that drive
>> I get a coredump :(
> ... snip ....
>
> Try zeroing out the disk label with something like:
>
>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024

Do you expect a 1 GB disk label?

Jörg

--
 EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
        js at cs.tu-berlin.de                (uni)
        schilling at fokus.fraunhofer.de     (work)  Blog: http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On Tue, 5 Dec 2006, Joerg Schilling wrote:

> Al Hopper <al at logical-approach.com> wrote:
>
>> Try zeroing out the disk label with something like:
>>
>>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024
>
> Do you expect a 1 GB disk label?

No. :)

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
                 OpenSolaris Governing Board (OGB) Member - Feb 2006
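Worth noting why a 1 GB dd is overkill: ZFS keeps four 256 KB labels per vdev, two at the very front and two in the last 512 KB of the device, so clearing them only takes the first and last half-megabyte. A rough sketch, assuming the slice is still reachable, that NSECT (the slice size in 512-byte sectors, taken from format's verify output) has been filled in by hand, and that a ksh/bash-style shell is doing the arithmetic:

  # first 512 KB: ZFS labels L0/L1, plus any VTOC/EFI label at sector 0
  dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=512k count=1
  # last 512 KB: ZFS labels L2/L3; 1024 sectors of 512 bytes = 512 KB
  dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=512 seek=$((NSECT - 1024)) count=1024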
Does not work :(

dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=1024k count=1024
dd: opening `/dev/rdsk/c3t6d0s0': I/O error

That is so strange... it seems like I lost another disk... I will try to reboot and see what I get, but I guess I need to order another disk then and give it a try...

Chris

On Tue, 5 Dec 2006, Al Hopper wrote:

> Try zeroing out the disk label with something like:
>
>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024
> ... snip ...
Given your description of the physical installation, I'd initially suspect a problem with the SCSI bus itself before proceeding. What is the bus type, and what length are the cables? If you've got 7 devices, and hence 7 individual enclosures with associated wiring between them, you may have exceeded the working length of the SCSI bus, or have an issue with one of the later devices due to sync negotiation.

Have you tried moving the same drive to a different position in the chain (ZFS will identify the disk irrespective of its Solaris path)?

What card (or onboard controller) and platform are you running?

Craig

On 5 Dec 2006, at 16:01, Krzys wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
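On the point that ZFS identifies the disk independently of its Solaris path: the vdev labels carry the pool and device GUIDs, and they can be dumped with zdb. A small sketch; the device name is just one of the healthy disks from this pool:

  # print the ZFS labels on the slice; the guid and pool_guid fields are what
  # ZFS matches on, so the same disk moved to another target keeps them
  zdb -l /dev/dsk/c3t5d0s0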
Ok, so here is an update. I did restart my system; I powered it off and powered it on. Here is a screen capture of my boot. I certainly do have some hard drive issues and will need to take a look at them... But I got my disk back visible to the system and zfs is doing resilvering again.

Rebooting with command: boot
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a  File and args:
SunOS Release 5.10 Version Generic_118833-24 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: chrysek
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 6 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@6,0 (sd5):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 286732066     Error Block: 286732066
        Vendor: SEAGATE                Serial Number: 3HY14PVS
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 3 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@3,0 (sd23):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 283623842     Error Block: 283623842
        Vendor: SEAGATE                Serial Number: 3HY8HS7L
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 5 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@5,0 (sd25):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 283623458     Error Block: 283623458
        Vendor: SEAGATE                Serial Number: 3HY0LF18
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
/kernel/drv/sparcv9/zpool symbol avl_add multiply defined
/kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
mypool2/d3 uncorrectable error
checking ufs filesystems
/dev/rdsk/c1t0d0s7: is logging.

chrysek console login: VERITAS SCSA Generic Revision: 3.5c
Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
Dec  5 13:01:46 chrysek VERITAS: No proxy found.
Dec  5 13:01:52 chrysek vmd[546]: ready for connections
Dec  5 13:01:53 chrysek VERITAS: No proxy found.
Dec  5 13:01:54 chrysek VERITAS: No proxy found.
Dec  5 13:02:00 chrysek VERITAS: No proxy found.
Dec  5 13:02:01 chrysek VERITAS: No proxy found.
Dec  5 13:02:03 chrysek VERITAS: No proxy found.
starting NetWorker daemons:
 nsrexecd
 lgtolmd
Dec  5 13:02:20 chrysek CNS Transport[841]: cctransport started
Dec  5 13:02:48 chrysek webmin[1353]: Webmin starting
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3 (glm2):
Dec  5 13:19:07 chrysek         Target 6 disabled wide SCSI mode
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3 (glm2):
Dec  5 13:19:07 chrysek         Target 6 reverting to async. mode
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3/sd@6,0 (sd5):
Dec  5 13:19:07 chrysek         Error for Command: write(10)    Error Level: Retryable
Dec  5 13:19:07 chrysek scsi:   Requested Block: 137163259      Error Block: 137163259
Dec  5 13:19:07 chrysek scsi:   Vendor: SEAGATE                 Serial Number: 3HY14PVS
Dec  5 13:19:07 chrysek scsi:   Sense Key: Aborted Command
Dec  5 13:19:07 chrysek scsi:   ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3

but now when I do zpool status -v:

  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 4.40% done, 11h40m to go
config:

        NAME              STATE     READ WRITE CKSUM
        mypool2           DEGRADED     0     0     0
          raidz           DEGRADED     0     0     0
            c3t0d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            replacing     DEGRADED     0     0    12
              c3t6d0s0/o  UNAVAIL      0     0     0  cannot open
              c3t6d0      ONLINE       0     0     0

errors: No known data errors

I do see that drive... and it is doing resilvering.

format works too and I don't get a coredump:

format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@6,0
Specify disk (enter its number): 10
selecting c3t6d0
[disk formatted]
/dev/dsk/c3t6d0s0 is part of active ZFS pool mypool2. Please see zpool(1M).


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> verify

Volume name = <        >
ascii name  = <SEAGATE-ST3146807LC-0007-136.73GB>
bytes/sector       = 512
sectors            = 286749487
accessible sectors = 286749454
Part      Tag    Flag     First Sector         Size         Last Sector
  0        usr    wm                34      136.72GB          286733070
  1 unassigned    wm                 0            0                   0
  2 unassigned    wm                 0            0                   0
  3 unassigned    wm                 0            0                   0
  4 unassigned    wm                 0            0                   0
  5 unassigned    wm                 0            0                   0
  6 unassigned    wm                 0            0                   0
  8   reserved    wm         286733071       8.00MB           286749454

format> q

On Tue, 5 Dec 2006, Krzys wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
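For reference, a low-tech way to keep an eye on both the resilver and any fresh SCSI noise while it runs; plain shell, with the pool name and log path from this thread:

  # re-check resilver progress every five minutes
  while true; do
      date
      zpool status mypool2 | egrep 'scrub:|resilver'
      sleep 300
  done

  # in another window, watch for new parity/negotiation warnings from glm2
  tail -f /var/adm/messages | egrep 'glm2|parity|transfer rate'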
What OS is this? What is the hardware?

Can you try running format with efi_debug set? You have to run format under a debugger and patch the variable. Here is how, using mdb (set a breakpoint in main so that the dynamic linker has done its stuff, then update the value of efi_debug to 1, then continue):

# mdb /usr/sbin/format
> main:b
> :r
mdb: stop at main
mdb: target stopped at:
main:           pushl  %ebp
> efi_debug/x
libefi.so.1`efi_debug:
libefi.so.1`efi_debug:  0
> efi_debug/w 1
libefi.so.1`efi_debug:  0       =       0x1
> :c
Searching for disks...done

--chris

This message posted from opensolaris.org
This looks more like a cabling or connector problem. When that happens you should see parity errors and transfer rate negotiations.
 -- richard

Krzys wrote:

> Ok, so here is an update.
>
> I did restart my system; I powered it off and powered it on. Here is a
> screen capture of my boot. I certainly do have some hard drive issues and
> will need to take a look at them... But I got my disk back visible to the
> system and zfs is doing resilvering again.
> ... snip ...
BTW, there is a way to check what the SCSI negotiations resolved to. I wrote about it once in a BluePrint:

  http://www.sun.com/blueprints/0500/sysperfnc.pdf

See page 11.
 -- richard

Richard Elling wrote:

> This looks more like a cabling or connector problem. When that happens
> you should see parity errors and transfer rate negotiations.
> ... snip ...
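Not claiming this is the blueprint's method, but a quick generic check is to grep the system log for the glm driver's renegotiation warnings, since every downgrade gets logged per target (the patterns below match the messages already seen earlier in this thread):

  # list every time a target dropped its sync rate, went async, or lost wide mode
  egrep 'reducing sync|reverting to async|disabled wide SCSI' /var/adm/messages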
Hm. If the disk has no label, why would it have an s0? Or did you mean p0?

Nathan.

On Wed, 2006-12-06 at 04:45, Krzys wrote:

> Does not work :(
>
> dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=1024k count=1024
> dd: opening `/dev/rdsk/c3t6d0s0': I/O error
> ... snip ...
Thanks so much.. anyway resilvering worked its way, I got everything resolved:

zpool status -v
  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Dec  5 13:48:31 2006
config:

        NAME        STATE     READ WRITE CKSUM
        mypool2     ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0

errors: No known data errors

I did not change any cables nor anything, just rebooted... I will look into replacing the cables (those are the short SCSI cables). Anyway this is so weird, and the original disk that I replaced seems to be good as well.. it must be a connectivity problem... but what's weird is that I had it running for months without problems...

Regards and thanks to all for help.

Chris

On Tue, 5 Dec 2006, Richard Elling wrote:

> BTW, there is a way to check what the SCSI negotiations resolved to.
> I wrote about it once in a BluePrint
>   http://www.sun.com/blueprints/0500/sysperfnc.pdf
> See page 11
> ... snip ...
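A reasonable follow-up once a resilver like this completes is a full scrub, which reads back and checksums every block in the pool and would also shake out any remaining bus trouble; a small sketch using the pool name from the thread:

  # read and verify every block, then check the result once it finishes
  zpool scrub mypool2
  zpool status -v mypool2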
>>> WARNING: /pci at 1e,600000/scsi at 3/sd at 5,0 (sd25):
>>>         Error for Command: read(10)    Error Level: Retryable
>>>         Requested Block: 283623458     Error Block: 283623458
>>>         Vendor: SEAGATE                Serial Number: 3HY0LF18
>>>         Sense Key: Aborted Command
>>>         ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
>>> /kernel/drv/sparcv9/zpool symbol avl_add multiply defined
>>> /kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
>>> WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
>>> mypool2/d3 uncorrectable error
>>> checking ufs filesystems
>>> /dev/rdsk/c1t0d0s7: is logging.
>>>
>>> chrysek console login: VERITAS SCSA Generic Revision: 3.5c
>>> Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
>>> Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
>>> Dec  5 13:01:46 chrysek VERITAS: No proxy found.
>>> Dec  5 13:01:52 chrysek vmd[546]: ready for connections
>>> Dec  5 13:01:53 chrysek VERITAS: No proxy found.
>>> Dec  5 13:01:54 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:00 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:01 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:03 chrysek VERITAS: No proxy found.
>>> starting NetWorker daemons:
>>>         nsrexecd
>>>         lgtolmd
>>> Dec  5 13:02:20 chrysek CNS Transport[841]: cctransport started
>>> Dec  5 13:02:48 chrysek webmin[1353]: Webmin starting
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3 (glm2):
>>> Dec  5 13:19:07 chrysek         Target 6 disabled wide SCSI mode
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3 (glm2):
>>> Dec  5 13:19:07 chrysek         Target 6 reverting to async. mode
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3/sd at 6,0 (sd5):
>>> Dec  5 13:19:07 chrysek         Error for Command: write(10)    Error Level: Retryable
>>> Dec  5 13:19:07 chrysek scsi:   Requested Block: 137163259     Error Block: 137163259
>>> Dec  5 13:19:07 chrysek scsi:   Vendor: SEAGATE        Serial Number: 3HY14PVS
>>> Dec  5 13:19:07 chrysek scsi:   Sense Key: Aborted Command
>>> Dec  5 13:19:07 chrysek scsi:   ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3
>>>
>>> but now when I do zpool status -v:
>>>   pool: mypool
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         mypool      ONLINE       0     0     0
>>>           mirror    ONLINE       0     0     0
>>>             c1t2d0  ONLINE       0     0     0
>>>             c1t3d0  ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>>   pool: mypool2
>>>  state: DEGRADED
>>> status: One or more devices is currently being resilvered.  The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>  scrub: resilver in progress, 4.40% done, 11h40m to go
>>> config:
>>>
>>>         NAME             STATE     READ WRITE CKSUM
>>>         mypool2          DEGRADED     0     0     0
>>>           raidz          DEGRADED     0     0     0
>>>             c3t0d0       ONLINE       0     0     0
>>>             c3t1d0       ONLINE       0     0     0
>>>             c3t2d0       ONLINE       0     0     0
>>>             c3t3d0       ONLINE       0     0     0
>>>             c3t4d0       ONLINE       0     0     0
>>>             c3t5d0       ONLINE       0     0     0
>>>             replacing    DEGRADED     0     0    12
>>>               c3t6d0s0/o UNAVAIL      0     0     0  cannot open
>>>               c3t6d0     ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> I do see that drive... and it is doing resilvering.
>>>
>>> format works too and I don't get a coredump:
>>>
>>> format
>>> Searching for disks...done
>>>
>>>
>>> AVAILABLE DISK SELECTIONS:
>>>        0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
>>>           /pci at 1c,600000/scsi at 2/sd at 0,0
>>>        1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
>>>           /pci at 1c,600000/scsi at 2/sd at 1,0
>>>        2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
>>>           /pci at 1c,600000/scsi at 2/sd at 2,0
>>>        3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
>>>           /pci at 1c,600000/scsi at 2/sd at 3,0
>>>        4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 0,0
>>>        5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 1,0
>>>        6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 2,0
>>>        7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 3,0
>>>        8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 4,0
>>>        9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 5,0
>>>       10. c3t6d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 6,0
>>> Specify disk (enter its number): 10
>>> selecting c3t6d0
>>> [disk formatted]
>>> /dev/dsk/c3t6d0s0 is part of active ZFS pool mypool2. Please see zpool(1M).
>>>
>>>
>>> FORMAT MENU:
>>>         disk       - select a disk
>>>         type       - select (define) a disk type
>>>         partition  - select (define) a partition table
>>>         current    - describe the current disk
>>>         format     - format and analyze the disk
>>>         repair     - repair a defective sector
>>>         label      - write label to the disk
>>>         analyze    - surface analysis
>>>         defect     - defect list management
>>>         backup     - search for backup labels
>>>         verify     - read and display labels
>>>         inquiry    - show vendor, product and revision
>>>         volname    - set 8-character volume name
>>>         !<cmd>     - execute <cmd>, then return
>>>         quit
>>> format> verify
>>>
>>> Volume name = <        >
>>> ascii name  = <SEAGATE-ST3146807LC-0007-136.73GB>
>>> bytes/sector       = 512
>>> sectors            = 286749487
>>> accessible sectors = 286749454
>>> Part      Tag    Flag     First Sector        Size        Last Sector
>>>   0        usr    wm                34     136.72GB         286733070
>>>   1 unassigned    wm                 0           0                  0
>>>   2 unassigned    wm                 0           0                  0
>>>   3 unassigned    wm                 0           0                  0
>>>   4 unassigned    wm                 0           0                  0
>>>   5 unassigned    wm                 0           0                  0
>>>   6 unassigned    wm                 0           0                  0
>>>   8   reserved    wm         286733071      8.00MB          286749454
>>>
>>> format> q
>>>
>>> On Tue, 5 Dec 2006, Krzys wrote:
>>>
>>>> ... snip ... (original message quoted in full above)
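[Editor's closing note for anyone who finds this thread with the same symptoms: after a resilver that ran while the bus was throwing parity errors, it is cheap insurance to scrub the pool so ZFS re-reads and verifies every block. A minimal sketch, using the pool name from this thread; the status command is only there to watch progress and spot any checksum errors:]

    # Walk every allocated block in the pool and verify checksums; ZFS
    # repairs anything it can reconstruct from the raidz parity.
    zpool scrub mypool2

    # Check scrub progress and per-device error counters.
    zpool status -v mypool2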