ok, two weeks ago I did notice one of my disks in the zpool got problems. I was getting "Corrupt label; wrong magic number" messages, then when I looked in format it did not see that disk... (last disk) I had that setup running for a few months now and all of a sudden the last disk failed. So I ordered another disk, had it replaced like a week ago, I did issue the replace command after the disk replacement, it was resilvering disks since forever, then I got hints from this group that snaps could be causing it, so yesterday I did disable snaps and this morning I did notice the same disk that I replaced is gone... Does it seem weird that this disk would fail? It's a new disk... I have Solaris 10 U2, 4 internal drives and then 7 external drives which are in single enclosures connected via a SCSI chain to each other... So it seems like the last disk is failing. Those unipacks from Sun have self-termination so there is no terminator at the end... Any ideas what should I do? Do I need to order another drive and replace that one too? Or will it happen again? What do you think could be the problem? Ah, when I look at that enclosure I do see a green light on it, so it seems like it did not fail...

format
Searching for disks...
efi_alloc_and_init failed.
done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <drive type unknown>
          /pci@1e,600000/scsi@3/sd@6,0


zpool status -v
  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: DEGRADED
 scrub: resilver completed with 0 errors on Mon Dec  4 22:34:57 2006
config:

        NAME              STATE     READ WRITE CKSUM
        mypool2           DEGRADED     0     0     0
          raidz           DEGRADED     0     0     0
            c3t0d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            replacing     UNAVAIL      0   775     0  insufficient replicas
              c3t6d0s0/o  UNAVAIL      0     0     0  cannot open
              c3t6d0      UNAVAIL      0   940     0  cannot open

errors: No known data errors
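For reference, two generic checks that can help separate "the drive died" from "the bus or enclosure is flaky"; both are stock Solaris commands, and the device name and log path below are simply the ones from this setup:

  # per-device error counters and inquiry data; lots of transport errors with
  # few media errors tends to point at cabling/termination rather than the disk
  iostat -En c3t6d0

  # see what the HBA driver has been complaining about for that bus lately
  grep 'scsi@3' /var/adm/messages | tail -50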
The only time that I have seen format return "drive type unknown" is when the drive has failed. You may just have another bad drive and want to try replacing it again. If that does not work, you may have another problem such as a bad backplane or a bad SCSI cable, assuming the drive is an external drive. Hope that helps.

On 12/5/06, Krzys <krzys at perfekt.net> wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
Thanks, ah another weeeeird thing is that when I run format on that drive I get a coredump :(

format
Searching for disks...
efi_alloc_and_init failed.
done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <drive type unknown>
          /pci@1e,600000/scsi@3/sd@6,0
Specify disk (enter its number): 10
Segmentation Fault (core dumped)

:( Can't even get to the format menu on that drive...

Chris

On Tue, 5 Dec 2006, Nicholas Senedzuk wrote:

> The only time that I have seen format return "drive type unknown" is when
> the drive has failed. You may just have another bad drive and want to try
> replacing it again.
> ... snip ...
Krzys wrote:
> Thanks, ah another weeeeird thing is that when I run format on that
> drive I get a coredump :(

Run pstack /path/to/core and send the output.
[12:00:40] root at chrysek: /d/d3/nb1 > pstack core
core 'core' of 29506:   format -e
-----------------  lwp# 1 / thread# 1  --------------------
 000239b8 c_disk   (51800, 52000, 4bde4, 525f4, 54e78, 0) + 4e0
 00020fb4 main     (2, 0, ffbff8e8, 0, 52000, 29000) + 46c
 000141a8 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
 ff241818 _door_return (0, 0, 0, 0, fef92400, ff26cbc0) + 10
 ff0c0c30 door_create_func (0, feefc000, 0, 0, ff0c0c10, 0) + 20
 ff2400b0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
 ff240154 __lwp_park (75e78, 75e88, 0, 0, 0, 0) + 14
 ff23a1e4 cond_wait_queue (75e78, 75e88, 0, 0, 0, 0) + 28
 ff23a764 cond_wait (75e78, 75e88, 1, 0, 0, ff26cbc0) + 10
 ff142a60 subscriber_event_handler (551d8, fedfc000, 0, 0, ff142a2c, 0) + 34
 ff2400b0 _lwp_start (0, 0, 0, 0, 0, 0)

On Tue, 5 Dec 2006, Torrey McMahon wrote:

> Krzys wrote:
>> Thanks, ah another weeeeird thing is that when I run format on that drive
>> I get a coredump :(
>
> Run pstack /path/to/core and send the output.
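As an aside, if more detail than the pstack trace is ever needed, the same core can be opened with mdb; the commands below are standard mdb dcmds and nothing specific to this particular core is assumed:

  # mdb /usr/sbin/format core
  > ::status        # shows which signal killed the process
  > $C              # stack trace with frame pointers and arguments
  > ::quit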
On Tue, 5 Dec 2006, Krzys wrote:

> Thanks, ah another weeeeird thing is that when I run format on that drive
> I get a coredump :(
... snip ....

Try zeroing out the disk label with something like:

  dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
                 OpenSolaris Governing Board (OGB) Member - Feb 2006
Al Hopper <al at logical-approach.com> wrote:

> On Tue, 5 Dec 2006, Krzys wrote:
>
>> Thanks, ah another weeeeird thing is that when I run format on that drive
>> I get a coredump :(
> ... snip ....
>
> Try zeroing out the disk label with something like:
>
>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024

Do you expect a 1 GB disk label?

Jörg

--
 EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
        js at cs.tu-berlin.de                (uni)
        schilling at fokus.fraunhofer.de     (work)  Blog: http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
On Tue, 5 Dec 2006, Joerg Schilling wrote:

> Al Hopper <al at logical-approach.com> wrote:
>
>> Try zeroing out the disk label with something like:
>>
>>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024
>
> Do you expect a 1 GB disk label?

No. :)

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
                 OpenSolaris Governing Board (OGB) Member - Feb 2006
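Worth noting why a 1 GB dd is overkill: ZFS keeps four 256 KB labels per vdev, two at the very front and two in the last 512 KB of the device, so clearing them only takes the first and last half-megabyte. A rough sketch, assuming the slice is still reachable, that NSECT (the slice size in 512-byte sectors, taken from format's verify output) has been filled in by hand, and that a ksh/bash-style shell is doing the arithmetic:

  # first 512 KB: ZFS labels L0/L1, plus any VTOC/EFI label at sector 0
  dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=512k count=1
  # last 512 KB: ZFS labels L2/L3; 1024 sectors of 512 bytes = 512 KB
  dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=512 seek=$((NSECT - 1024)) count=1024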
Does not work :(

dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=1024k count=1024
dd: opening `/dev/rdsk/c3t6d0s0': I/O error

That is so strange... it seems like I lost another disk... I will try to reboot and see what I get, but I guess I need to order another disk then and give it a try...

Chris

On Tue, 5 Dec 2006, Al Hopper wrote:

> Try zeroing out the disk label with something like:
>
>   dd if=/dev/zero of=/dev/rdsk/c?t?d?p0 bs=1024k count=1024
> ... snip ...
Given your description of the physical installation, I'd initially suspect a problem with the SCSI bus itself before proceeding. What is the bus type, and what length are the cables? If you've got 7 devices, and hence 7 individual enclosures with associated wiring between them, you may have exceeded the working length of the SCSI bus, or have an issue with one of the later devices due to sync negotiation.

Have you tried moving the same drive to a different position in the chain (ZFS will identify the disk irrespective of its Solaris path)?

What card (or onboard controller) and platform are you running?

Craig

On 5 Dec 2006, at 16:01, Krzys wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
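On the point that ZFS identifies the disk independently of its Solaris path: the vdev labels carry the pool and device GUIDs, and they can be dumped with zdb. A small sketch; the device name is just one of the healthy disks from this pool:

  # print the ZFS labels on the slice; the guid and pool_guid fields are what
  # ZFS matches on, so the same disk moved to another target keeps them
  zdb -l /dev/dsk/c3t5d0s0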
Ok, so here is an update. I did restart my system; I powered it off and powered it on. Here is a screen capture of my boot. I certainly do have some hard drive issues and will need to take a look at them... But I got my disk back visible to the system and zfs is doing resilvering again.

Rebooting with command: boot
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a  File and args:
SunOS Release 5.10 Version Generic_118833-24 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: chrysek
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 6 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@6,0 (sd5):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 286732066     Error Block: 286732066
        Vendor: SEAGATE                Serial Number: 3HY14PVS
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 3 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@3,0 (sd23):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 283623842     Error Block: 283623842
        Vendor: SEAGATE                Serial Number: 3HY8HS7L
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
WARNING: /pci@1e,600000/scsi@3 (glm2):
        SCSI bus DATA IN phase parity error
WARNING: /pci@1e,600000/scsi@3 (glm2):
        Target 5 reducing sync. transfer rate
WARNING: /pci@1e,600000/scsi@3/sd@5,0 (sd25):
        Error for Command: read(10)    Error Level: Retryable
        Requested Block: 283623458     Error Block: 283623458
        Vendor: SEAGATE                Serial Number: 3HY0LF18
        Sense Key: Aborted Command
        ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
/kernel/drv/sparcv9/zpool symbol avl_add multiply defined
/kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
mypool2/d3 uncorrectable error
checking ufs filesystems
/dev/rdsk/c1t0d0s7: is logging.

chrysek console login: VERITAS SCSA Generic Revision: 3.5c
Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
Dec  5 13:01:46 chrysek VERITAS: No proxy found.
Dec  5 13:01:52 chrysek vmd[546]: ready for connections
Dec  5 13:01:53 chrysek VERITAS: No proxy found.
Dec  5 13:01:54 chrysek VERITAS: No proxy found.
Dec  5 13:02:00 chrysek VERITAS: No proxy found.
Dec  5 13:02:01 chrysek VERITAS: No proxy found.
Dec  5 13:02:03 chrysek VERITAS: No proxy found.
starting NetWorker daemons:
 nsrexecd
 lgtolmd
Dec  5 13:02:20 chrysek CNS Transport[841]: cctransport started
Dec  5 13:02:48 chrysek webmin[1353]: Webmin starting
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3 (glm2):
Dec  5 13:19:07 chrysek         Target 6 disabled wide SCSI mode
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3 (glm2):
Dec  5 13:19:07 chrysek         Target 6 reverting to async. mode
Dec  5 13:19:07 chrysek scsi: WARNING: /pci@1e,600000/scsi@3/sd@6,0 (sd5):
Dec  5 13:19:07 chrysek         Error for Command: write(10)    Error Level: Retryable
Dec  5 13:19:07 chrysek scsi:   Requested Block: 137163259      Error Block: 137163259
Dec  5 13:19:07 chrysek scsi:   Vendor: SEAGATE                 Serial Number: 3HY14PVS
Dec  5 13:19:07 chrysek scsi:   Sense Key: Aborted Command
Dec  5 13:19:07 chrysek scsi:   ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3

but now when I do zpool status -v:

  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 4.40% done, 11h40m to go
config:

        NAME              STATE     READ WRITE CKSUM
        mypool2           DEGRADED     0     0     0
          raidz           DEGRADED     0     0     0
            c3t0d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            replacing     DEGRADED     0     0    12
              c3t6d0s0/o  UNAVAIL      0     0     0  cannot open
              c3t6d0      ONLINE       0     0     0

errors: No known data errors

I do see that drive... and it is doing resilvering.

format works too and I don't get a coredump:

format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
          /pci@1c,600000/scsi@2/sd@1,0
       2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@2,0
       3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
          /pci@1c,600000/scsi@2/sd@3,0
       4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@0,0
       5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@1,0
       6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@2,0
       7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@3,0
       8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@4,0
       9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@5,0
      10. c3t6d0 <SEAGATE-ST3146807LC-0007-136.73GB>
          /pci@1e,600000/scsi@3/sd@6,0
Specify disk (enter its number): 10
selecting c3t6d0
[disk formatted]
/dev/dsk/c3t6d0s0 is part of active ZFS pool mypool2. Please see zpool(1M).


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> verify

Volume name = <        >
ascii name  = <SEAGATE-ST3146807LC-0007-136.73GB>
bytes/sector       = 512
sectors            = 286749487
accessible sectors = 286749454
Part      Tag    Flag     First Sector         Size         Last Sector
  0        usr    wm                34      136.72GB          286733070
  1 unassigned    wm                 0            0                   0
  2 unassigned    wm                 0            0                   0
  3 unassigned    wm                 0            0                   0
  4 unassigned    wm                 0            0                   0
  5 unassigned    wm                 0            0                   0
  6 unassigned    wm                 0            0                   0
  8   reserved    wm         286733071       8.00MB           286749454

format> q

On Tue, 5 Dec 2006, Krzys wrote:

> ok, two weeks ago I did notice one of my disks in the zpool got problems.
> ... snip ...
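For reference, a low-tech way to keep an eye on both the resilver and any fresh SCSI noise while it runs; plain shell, with the pool name and log path from this thread:

  # re-check resilver progress every five minutes
  while true; do
      date
      zpool status mypool2 | egrep 'scrub:|resilver'
      sleep 300
  done

  # in another window, watch for new parity/negotiation warnings from glm2
  tail -f /var/adm/messages | egrep 'glm2|parity|transfer rate'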
What OS is this? What is the hardware?

Can you try running format with efi_debug set? You have to run format under a debugger and patch the variable. Here is how, using mdb (set a breakpoint in main so that the dynamic linker has done its stuff, then update the value of efi_debug to 1, then continue):

# mdb /usr/sbin/format
> main:b
> :r
mdb: stop at main
mdb: target stopped at:
main:           pushl  %ebp
> efi_debug/x
libefi.so.1`efi_debug:
libefi.so.1`efi_debug:  0
> efi_debug/w 1
libefi.so.1`efi_debug:  0       =       0x1
> :c
Searching for disks...done

--chris

This message posted from opensolaris.org
This looks more like a cabling or connector problem. When that happens you should see parity errors and transfer rate negotiations.
 -- richard

Krzys wrote:

> Ok, so here is an update.
>
> I did restart my system; I powered it off and powered it on. Here is a
> screen capture of my boot. I certainly do have some hard drive issues and
> will need to take a look at them... But I got my disk back visible to the
> system and zfs is doing resilvering again.
> ... snip ...
BTW, there is a way to check what the SCSI negotiations resolved to. I wrote about it once in a BluePrint:

  http://www.sun.com/blueprints/0500/sysperfnc.pdf

See page 11.
 -- richard

Richard Elling wrote:

> This looks more like a cabling or connector problem. When that happens
> you should see parity errors and transfer rate negotiations.
> ... snip ...
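Not claiming this is the blueprint's method, but a quick generic check is to grep the system log for the glm driver's renegotiation warnings, since every downgrade gets logged per target (the patterns below match the messages already seen earlier in this thread):

  # list every time a target dropped its sync rate, went async, or lost wide mode
  egrep 'reducing sync|reverting to async|disabled wide SCSI' /var/adm/messages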
Hm. If the disk has no label, why would it have an s0? Or did you mean p0?

Nathan.

On Wed, 2006-12-06 at 04:45, Krzys wrote:

> Does not work :(
>
> dd if=/dev/zero of=/dev/rdsk/c3t6d0s0 bs=1024k count=1024
> dd: opening `/dev/rdsk/c3t6d0s0': I/O error
> ... snip ...
Thanks so much.. anyway resilvering worked its way, I got everything resolved:

zpool status -v
  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

  pool: mypool2
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Dec  5 13:48:31 2006
config:

        NAME        STATE     READ WRITE CKSUM
        mypool2     ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0

errors: No known data errors

I did not change any cables nor anything, just rebooted... I will look into replacing the cables (those are the short SCSI cables). Anyway this is so weird, and the original disk that I replaced seems to be good as well.. it must be a connectivity problem... but what's weird is that I had it running for months without problems...

Regards and thanks to all for help.

Chris

On Tue, 5 Dec 2006, Richard Elling wrote:

> BTW, there is a way to check what the SCSI negotiations resolved to.
> I wrote about it once in a BluePrint
>   http://www.sun.com/blueprints/0500/sysperfnc.pdf
> See page 11
> ... snip ...
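A reasonable follow-up once a resilver like this completes is a full scrub, which reads back and checksums every block in the pool and would also shake out any remaining bus trouble; a small sketch using the pool name from the thread:

  # read and verify every block, then check the result once it finishes
  zpool scrub mypool2
  zpool status -v mypool2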
>>> WARNING: /pci at 1e,600000/scsi at 3/sd at 5,0 (sd25):
>>>         Error for Command: read(10)    Error Level: Retryable
>>>         Requested Block: 283623458     Error Block: 283623458
>>>         Vendor: SEAGATE                Serial Number: 3HY0LF18
>>>         Sense Key: Aborted Command
>>>         ASC: 0x48 (initiator detected error message received), ASCQ: 0x0, FRU: 0x2
>>> /kernel/drv/sparcv9/zpool symbol avl_add multiply defined
>>> /kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
>>> WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
>>> mypool2/d3 uncorrectable error
>>> checking ufs filesystems
>>> /dev/rdsk/c1t0d0s7: is logging.
>>>
>>> chrysek console login: VERITAS SCSA Generic Revision: 3.5c
>>> Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
>>> Dec  5 13:01:38 chrysek root: CAPTURE_UPTIME ERROR: /var/opt/SUNWsrsrp missing
>>> Dec  5 13:01:46 chrysek VERITAS: No proxy found.
>>> Dec  5 13:01:52 chrysek vmd[546]: ready for connections
>>> Dec  5 13:01:53 chrysek VERITAS: No proxy found.
>>> Dec  5 13:01:54 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:00 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:01 chrysek VERITAS: No proxy found.
>>> Dec  5 13:02:03 chrysek VERITAS: No proxy found.
>>> starting NetWorker daemons:
>>>         nsrexecd
>>>         lgtolmd
>>> Dec  5 13:02:20 chrysek CNS Transport[841]: cctransport started
>>> Dec  5 13:02:48 chrysek webmin[1353]: Webmin starting
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3 (glm2):
>>> Dec  5 13:19:07 chrysek         Target 6 disabled wide SCSI mode
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3 (glm2):
>>> Dec  5 13:19:07 chrysek         Target 6 reverting to async. mode
>>> Dec  5 13:19:07 chrysek scsi: WARNING: /pci at 1e,600000/scsi at 3/sd at 6,0 (sd5):
>>> Dec  5 13:19:07 chrysek         Error for Command: write(10)    Error Level: Retryable
>>> Dec  5 13:19:07 chrysek scsi:   Requested Block: 137163259     Error Block: 137163259
>>> Dec  5 13:19:07 chrysek scsi:   Vendor: SEAGATE        Serial Number: 3HY14PVS
>>> Dec  5 13:19:07 chrysek scsi:   Sense Key: Aborted Command
>>> Dec  5 13:19:07 chrysek scsi:   ASC: 0x47 (scsi parity error), ASCQ: 0x0, FRU: 0x3
>>>
>>> but now when I do zpool status -v:
>>>   pool: mypool
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         mypool      ONLINE       0     0     0
>>>           mirror    ONLINE       0     0     0
>>>             c1t2d0  ONLINE       0     0     0
>>>             c1t3d0  ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>>   pool: mypool2
>>>  state: DEGRADED
>>> status: One or more devices is currently being resilvered.  The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>  scrub: resilver in progress, 4.40% done, 11h40m to go
>>> config:
>>>
>>>         NAME             STATE     READ WRITE CKSUM
>>>         mypool2          DEGRADED     0     0     0
>>>           raidz          DEGRADED     0     0     0
>>>             c3t0d0       ONLINE       0     0     0
>>>             c3t1d0       ONLINE       0     0     0
>>>             c3t2d0       ONLINE       0     0     0
>>>             c3t3d0       ONLINE       0     0     0
>>>             c3t4d0       ONLINE       0     0     0
>>>             c3t5d0       ONLINE       0     0     0
>>>             replacing    DEGRADED     0     0    12
>>>               c3t6d0s0/o UNAVAIL      0     0     0  cannot open
>>>               c3t6d0     ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> I do see that drive... and it is doing resilvering.
>>>
>>> format works too and I don't get a coredump:
>>>
>>> format
>>> Searching for disks...done
>>>
>>>
>>> AVAILABLE DISK SELECTIONS:
>>>        0. c1t0d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
>>>           /pci at 1c,600000/scsi at 2/sd at 0,0
>>>        1. c1t1d0 <SEAGATE-ST3300007LC-D703 cyl 45265 alt 2 hd 16 sec 809>
>>>           /pci at 1c,600000/scsi at 2/sd at 1,0
>>>        2. c1t2d0 <SEAGATE-ST3300007LC-D703-279.40GB>
>>>           /pci at 1c,600000/scsi at 2/sd at 2,0
>>>        3. c1t3d0 <SEAGATE-ST3300007LC-D703-279.40GB>
>>>           /pci at 1c,600000/scsi at 2/sd at 3,0
>>>        4. c3t0d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 0,0
>>>        5. c3t1d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 1,0
>>>        6. c3t2d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 2,0
>>>        7. c3t3d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 3,0
>>>        8. c3t4d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 4,0
>>>        9. c3t5d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 5,0
>>>       10. c3t6d0 <SEAGATE-ST3146807LC-0007-136.73GB>
>>>           /pci at 1e,600000/scsi at 3/sd at 6,0
>>> Specify disk (enter its number): 10
>>> selecting c3t6d0
>>> [disk formatted]
>>> /dev/dsk/c3t6d0s0 is part of active ZFS pool mypool2. Please see zpool(1M).
>>>
>>>
>>> FORMAT MENU:
>>>         disk       - select a disk
>>>         type       - select (define) a disk type
>>>         partition  - select (define) a partition table
>>>         current    - describe the current disk
>>>         format     - format and analyze the disk
>>>         repair     - repair a defective sector
>>>         label      - write label to the disk
>>>         analyze    - surface analysis
>>>         defect     - defect list management
>>>         backup     - search for backup labels
>>>         verify     - read and display labels
>>>         inquiry    - show vendor, product and revision
>>>         volname    - set 8-character volume name
>>>         !<cmd>     - execute <cmd>, then return
>>>         quit
>>> format> verify
>>>
>>> Volume name = <        >
>>> ascii name  = <SEAGATE-ST3146807LC-0007-136.73GB>
>>> bytes/sector       = 512
>>> sectors            = 286749487
>>> accessible sectors = 286749454
>>> Part      Tag    Flag     First Sector        Size        Last Sector
>>>   0        usr    wm                34     136.72GB         286733070
>>>   1 unassigned    wm                 0           0                  0
>>>   2 unassigned    wm                 0           0                  0
>>>   3 unassigned    wm                 0           0                  0
>>>   4 unassigned    wm                 0           0                  0
>>>   5 unassigned    wm                 0           0                  0
>>>   6 unassigned    wm                 0           0                  0
>>>   8   reserved    wm         286733071      8.00MB          286749454
>>>
>>> format> q
>>>
>>> On Tue, 5 Dec 2006, Krzys wrote:
>>>
>>>> ... snip ... (original message quoted in full above)
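[Editor's closing note for anyone who finds this thread with the same symptoms: after a resilver that ran while the bus was throwing parity errors, it is cheap insurance to scrub the pool so ZFS re-reads and verifies every block. A minimal sketch, using the pool name from this thread; the status command is only there to watch progress and spot any checksum errors:]

    # Walk every allocated block in the pool and verify checksums; ZFS
    # repairs anything it can reconstruct from the raidz parity.
    zpool scrub mypool2

    # Check scrub progress and per-device error counters.
    zpool status -v mypool2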