thr3ads.net - zfs discuss - [zfs-discuss] help zfs pool with duplicated and missing entry of hdd [Jan 2013]

Jason

2013-Jan-10 07:51 UTC

[zfs-discuss] help zfs pool with duplicated and missing entry of hdd

Hi,

One of my server's ZFS pools faulted, and it shows the following:
        NAME        STATE     READ WRITE CKSUM
        backup      UNAVAIL      0     0     0  insufficient replicas
          raidz2-0  UNAVAIL      0     0     0  insufficient replicas
            c4t0d0  ONLINE       0     0     0
            c4t0d1  ONLINE       0     0     0
            c4t0d0  FAULTED      0     0     0  corrupted data
            c4t0d3  FAULTED      0     0     0  too many errors
            c4t0d4  FAULTED      0     0     0  too many errors
...(omit the rest).

My question is why c4t0d0 appears twice and c4t0d2 is missing.

I have checked the controller card and the hard disks; they are all working fine.

Please help me troubleshoot this: what is the main cause, and how can I
recover the pool?

Thank you.

Jim Klimov

2013-Jan-10 12:25 UTC

On 2013-01-10 08:51, Jason wrote:
> Hi,
>
> One of my server's zfs faulted and it shows following:
> NAME        STATE     READ WRITE CKSUM
>          backup      UNAVAIL      0     0     0  insufficient replicas
>            raidz2-0  UNAVAIL      0     0     0  insufficient replicas
>              c4t0d0  ONLINE       0     0     0
>              c4t0d1  ONLINE       0     0     0
>              c4t0d0  FAULTED      0     0     0  corrupted data
>              c4t0d3  FAULTED      0     0     0  too many errors
>              c4t0d4  FAULTED      0     0     0  too many errors
> ...(omit the rest).
>
> My question is why c4t0d0 appeared twice, and c4t0d2 is missing.
>
> Have check the controller card and hard disk, they are all working fine.

This renaming does look like an error in detecting (and then naming)
the disks: for example, if a connector came loose and one of the disks
is not seen by the system, the numbering can shift in this manner. It is
indeed strange, however, that only "d2" got shifted or went missing and
not all the numbers after it.
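
To spot this kind of inconsistency without reading the listing by eye, the check can be sketched in Python. This is only an illustration, not a ZFS tool: the `audit` helper and the `STATUS` sample are hypothetical names, the sample is the partial listing quoted in this thread, and the gap check assumes all devices sit on the same controller and target so that only the trailing dN number varies.

```python
import re
from collections import Counter

# Partial listing copied from the original post (the rest was omitted there).
STATUS = """\
NAME        STATE     READ WRITE CKSUM
backup      UNAVAIL      0     0     0  insufficient replicas
  raidz2-0  UNAVAIL      0     0     0  insufficient replicas
    c4t0d0  ONLINE       0     0     0
    c4t0d1  ONLINE       0     0     0
    c4t0d0  FAULTED      0     0     0  corrupted data
    c4t0d3  FAULTED      0     0     0  too many errors
    c4t0d4  FAULTED      0     0     0  too many errors
"""

# A device row starts with optional indentation and a cNtNdN name.
CTD_ROW = re.compile(r"^\s*(c\d+t\d+d(\d+))\s", re.MULTILINE)


def audit(status_text):
    """Return (duplicated device names, dN numbers missing from the range seen)."""
    rows = CTD_ROW.findall(status_text)
    names = [name for name, _ in rows]
    luns = sorted({int(lun) for _, lun in rows})
    dupes = sorted(n for n, count in Counter(names).items() if count > 1)
    # Assumption: one controller and one target, so a gap in dN is meaningful.
    missing = [n for n in range(luns[0], luns[-1] + 1) if n not in luns] if luns else []
    return dupes, missing


dupes, missing = audit(STATUS)
print("duplicated names:", dupes)
print("missing dN numbers:", missing)
```

Run against the quoted listing, it reports c4t0d0 as duplicated and d2 as the missing number, which is exactly the symptom described in the original post.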

So, you did verify that the controller sees all the disks in the "format"
command (and perhaps, after a cold reboot, in the BIOS)? Just in case, try
to unplug and replug all cables (power, data) in case their pins got
oxidized over time.

HTH,
//Jim

Michael Hase

2013-Jan-10 14:03 UTC

On Thu, 10 Jan 2013, Jim Klimov wrote:
> On 2013-01-10 08:51, Jason wrote:
>> Hi,
>> 
>> One of my server's zfs faulted and it shows following:
>> NAME        STATE     READ WRITE CKSUM
>>          backup      UNAVAIL      0     0     0  insufficient replicas
>>            raidz2-0  UNAVAIL      0     0     0  insufficient replicas
>>              c4t0d0  ONLINE       0     0     0
>>              c4t0d1  ONLINE       0     0     0
>>              c4t0d0  FAULTED      0     0     0  corrupted data
>>              c4t0d3  FAULTED      0     0     0  too many errors
>>              c4t0d4  FAULTED      0     0     0  too many errors
>> ...(omit the rest).
>> 
>> My question is why c4t0d0 appeared twice, and c4t0d2 is missing.
>> 
>> Have check the controller card and hard disk, they are all working fine.
>
> This renaming does seem like an error in detecting (and further naming)
> of the disks - i.e. if a connector got loose, and one of the disks is
> not seen by the system, the numbering can shift in such manner. It is
> indeed strange however that only "d2" got shifted or missing and not
> all those numbers after it.
>
> So, you did verify that the controller sees all the disks in "format"
> command (and perhaps after a cold reboot - in BIOS)? Just in case, try
> to unplug and replug all cables (power, data) in case their pins got
> oxydized over time.

Usually the disk numbering in any Solaris-based OS stays the same if one
disk is offline or missing; it's fixed to the controller port, SCSI
target, or WWN. IMHO that is a huge advantage of the c0t0d0 pattern over
the Linux or FreeBSD numbering. I once had an old Sun 5200 hooked up to a
Linux box, and when one of the 22 disks failed, every disk after the bad
one shifted. What a mess.

To me the c4t0d0, c4t0d1, ... numbering looks like either a hardware RAID
controller not in JBOD mode, or even an external SAN. JBODs normally show
up as LUN 0 (d0) with different target numbers (t1, t2, ...). Maybe
something is wrong with the LUN numbering on your box?
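
To make the target-versus-LUN distinction concrete, here is a rough Python sketch that splits such names into their parts. `CtdName` and `parse_ctd` are made-up names for illustration, and the pattern covers only the plain numeric cNtNdN form (with an optional sN or pN suffix), not every device-name variant a system may show.

```python
import re
from typing import NamedTuple


class CtdName(NamedTuple):
    controller: int  # cN
    target: int      # tN
    lun: int         # dN


# Plain numeric form only; other naming variants are deliberately not handled.
_CTD = re.compile(r"c(\d+)t(\d+)d(\d+)(?:[sp]\d+)?")


def parse_ctd(name):
    """Split a name like 'c4t0d3' into controller, target and LUN numbers."""
    m = _CTD.fullmatch(name)
    if m is None:
        raise ValueError(f"not a plain cNtNdN name: {name!r}")
    return CtdName(*(int(part) for part in m.groups()))


# Distinct device names from the listing quoted in this thread.
names = ["c4t0d0", "c4t0d1", "c4t0d3", "c4t0d4"]
parsed = [parse_ctd(n) for n in names]

targets = {p.target for p in parsed}
luns = sorted({p.lun for p in parsed})
print("targets seen:", targets)
print("LUNs seen:", luns)
```

In this listing every disk shares target 0 and differs only in the LUN number, which is the opposite of the JBOD layout described above (LUN 0 everywhere, with varying targets).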

-- Michael
