Hi,

I have a machine running 2009.06 with 8 SATA drives in a SCSI-connected enclosure. I had a drive fail and accidentally replaced the wrong one, which unsurprisingly caused the rebuild to fail. The status of the zpool then ended up as:

  pool: storage2
 state: FAULTED
status: An intent log record could not be read.
        Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        storage2       FAULTED       0     0     1  bad intent log
          raidz1       ONLINE        0     0     0
            c9t4d2     ONLINE        0     0     0
            c9t4d3     ONLINE        0     0     0
            c10t4d2    ONLINE        0     0     0
            c10t4d4    ONLINE        0     0     0
          raidz1       DEGRADED      0     0     6
            c10t4d0    UNAVAIL       0     0     0  cannot open
            replacing  ONLINE        0     0     0
              c9t4d0   ONLINE        0     0     0
              c10t4d3  ONLINE        0     0     0
            c10t4d1    ONLINE        0     0     0
            c9t4d1     ONLINE        0     0     0

Running "zpool clear storage2" caused the machine to dump and reboot.

I've tried removing the spare and putting back the faulty drive to give:

  pool: storage2
 state: FAULTED
status: An intent log record could not be read.
        Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        storage2       FAULTED       0     0     1  bad intent log
          raidz1       ONLINE        0     0     0
            c9t4d2     ONLINE        0     0     0
            c9t4d3     ONLINE        0     0     0
            c10t4d2    ONLINE        0     0     0
            c10t4d4    ONLINE        0     0     0
          raidz1       DEGRADED      0     0     6
            c10t4d0    FAULTED       0     0     0  corrupted data
            replacing  DEGRADED      0     0     0
              c9t4d0   ONLINE        0     0     0
              c9t4d4   UNAVAIL       0     0     0  cannot open
            c10t4d1    ONLINE        0     0     0
            c9t4d1     ONLINE        0     0     0

Again this core dumps when I try to do "zpool clear storage2".

Does anyone have any suggestions about what would be the best course of action now?
On Jun 28, 2010, at 11:27 PM, George wrote:

> I've tried removing the spare and putting back the faulty drive to give:
>
> [zpool status output as above]
>
> Again this core dumps when I try to do "zpool clear storage2".
>
> Does anyone have any suggestions about what would be the best course of action now?

I think first we need to understand why it does not like 'zpool clear', as that may provide a better understanding of what is wrong. For that you need to create a directory for saving crash dumps, e.g. like this:

    mkdir -p /var/crash/`uname -n`

then run savecore and see if it saves a crash dump into that directory. If the crash dump is there, then you need to perform some basic investigation:

    cd /var/crash/`uname -n`
    mdb <dump number>
    ::status
    ::stack
    ::spa -c
    ::spa -v
    ::spa -ve
    $q

for a start.
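If it is easier to capture everything in one go, something along these lines should also work (an untested sketch - substitute whatever dump number savecore actually reports, shown here as 0):

    # save the pending crash dump into the per-host directory
    mkdir -p /var/crash/`uname -n`
    savecore

    # run the basic dcmds non-interactively against unix.0/vmcore.0
    # and keep a copy of the output to post to the list
    cd /var/crash/`uname -n`
    printf '::status\n::stack\n::spa -c\n::spa -v\n::spa -ve\n' | mdb 0 > debug.txt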
I've attached the output of those commands. The machine is a v20z if that makes any difference.

Thanks,

George

[attached debug.txt:]

mdb: logging to "debug.txt"
> ::status
debugging crash dump vmcore.0 (64-bit) from crypt
operating system: 5.11 snv_111b (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff00084fc660 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
mutex_enter+0xb()
metaslab_free+0x12e(ffffff01c9fb3800, ffffff01cce64668, 1b9528, 0)
zio_dva_free+0x26(ffffff01cce64608)
zio_execute+0xa0(ffffff01cce64608)
zio_nowait+0x5a(ffffff01cce64608)
arc_free+0x197(ffffff01cf0c80c0, ffffff01c9fb3800, 1b9528, ffffff01d389bcf0, 0, 0)
dsl_free+0x30(ffffff01cf0c80c0, ffffff01d389bcc0, 1b9528, ffffff01d389bcf0, 0, 0)
dsl_dataset_block_kill+0x293(0, ffffff01d389bcf0, ffffff01cf0c80c0, ffffff01d18cfd80)
dmu_objset_sync+0xc4(ffffff01cffe0080, ffffff01cf0c80c0, ffffff01d18cfd80)
dsl_pool_sync+0x1ee(ffffff01d389bcc0, 1b9528)
spa_sync+0x32a(ffffff01c9fb3800, 1b9528)
txg_sync_thread+0x265(ffffff01d389bcc0)
thread_start+8()
> ::spa -c
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    version=000000000000000e
    name='rpool'
    state=0000000000000000
    txg=00000000056a6ad1
    pool_guid=53825ef3c58abc97
    hostid=0000000000820b9b
    hostname='crypt'
    vdev_tree
        type='root'
        id=0000000000000000
        guid=53825ef3c58abc97
        children[0]
            type='mirror'
            id=0000000000000000
            guid=e9b8daed37492cfe
            whole_disk=0000000000000000
            metaslab_array=0000000000000017
            metaslab_shift=000000000000001d
            ashift=0000000000000009
            asize=0000001114e00000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=ad7e5022f804365a
                path='/dev/dsk/c8t0d0s0'
                devid='id1,sd@SSEAGATE_ST373307LC______3HZ76YYD0000743809WM/a'
                phys_path='/pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@0,0:a'
                whole_disk=0000000000000000
                DTL=0000000000000052
            children[1]
                type='disk'
                id=0000000000000001
                guid=2f7a03c75a4931ac
                path='/dev/dsk/c8t1d0s0'
                devid='id1,sd@SSEAGATE_ST373307LC______3HZ80BDP0000743793PA/a'
                phys_path='/pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0:a'
                whole_disk=0000000000000000
                DTL=0000000000000050
ffffff01c9fb3800    ACTIVE storage2

    version=000000000000000e
    name='storage2'
    state=0000000000000000
    txg=00000000001b9406
    pool_guid=cc049c0f1321fc28
    hostid=0000000000820b9b
    hostname='crypt'
    vdev_tree
        type='root'
        id=0000000000000000
        guid=cc049c0f1321fc28
        children[0]
            type='raidz'
            id=0000000000000000
            guid=dc1ecf18721028c1
            nparity=0000000000000001
            metaslab_array=000000000000000e
            metaslab_shift=0000000000000023
            ashift=0000000000000009
            asize=000003a33f100000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=c7b64596709ebdef
                path='/dev/dsk/c9t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd863ea736d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,2:a'
                whole_disk=0000000000000001
                DTL=000000000000012d
            children[1]
                type='disk'
                id=0000000000000001
                guid=cd7ba5d38162fe0d
                path='/dev/dsk/c9t4d3s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8514ed8d900/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,3:a'
                whole_disk=0000000000000001
                DTL=000000000000012c
            children[2]
                type='disk'
                id=0000000000000002
                guid=3b499fb48e06460b
                path='/dev/dsk/c10t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd84312aa6d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,2:a'
                whole_disk=0000000000000001
                DTL=000000000000012b
            children[3]
                type='disk'
                id=0000000000000003
                guid=e205849496e5e447
                path='/dev/dsk/c10t4d4s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8415c62ae00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,4:a'
                whole_disk=0000000000000001
                DTL=0000000000000128
        children[1]
            type='raidz'
            id=0000000000000001
            guid=aee16872dbfc7c57
            nparity=0000000000000001
            metaslab_array=00000000000000ac
            metaslab_shift=0000000000000023
            ashift=0000000000000009
            asize=000003a33f100000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=61b419ff9ec3a9be
                path='/dev/dsk/c10t4d0s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd83eda0a4a00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,0:a'
                whole_disk=0000000000000001
                DTL=0000000000000131
            children[1]
                type='replacing'
                id=0000000000000001
                guid=eaedce68dff419e7
                whole_disk=0000000000000000
                children[0]
                    type='disk'
                    id=0000000000000000
                    guid=7e516b0508d6d9ad
                    path='/dev/dsk/c9t4d0s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd86eee69a300/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,0:a'
                    whole_disk=0000000000000001
                    DTL=0000000000000130
                children[1]
                    type='disk'
                    id=0000000000000001
                    guid=ea6066eef4fa119e
                    path='/dev/dsk/c9t4d4s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd8612edc7d00/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,4:a'
                    whole_disk=0000000000000001
                    DTL=0000000000000141
            children[2]
                type='disk'
                id=0000000000000002
                guid=37dbb4cce114392a
                path='/dev/dsk/c10t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8609d147700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,1:a'
                whole_disk=0000000000000001
                DTL=000000000000012f
            children[3]
                type='disk'
                id=0000000000000003
                guid=e942d5e14333bca5
                path='/dev/dsk/c9t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd86cbc020700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,1:a'
                whole_disk=0000000000000001
                DTL=000000000000012e
> ::spa -v
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    ADDR             STATE     AUX          DESCRIPTION
    ffffff01c73de680 HEALTHY   -            root
    ffffff01c73de040 HEALTHY   -              mirror
    ffffff01c9cca340 HEALTHY   -                /dev/dsk/c8t0d0s0
    ffffff01c9cca980 HEALTHY   -                /dev/dsk/c8t1d0s0
ffffff01c9fb3800    ACTIVE storage2

    ffffff01c9d32900 DEGRADED  -            root
    ffffff01c9cb3640 HEALTHY   -              raidz
    ffffff01d3874300 HEALTHY   -                /dev/dsk/c9t4d2s0
    ffffff01d3874940 HEALTHY   -                /dev/dsk/c9t4d3s0
    ffffff01cae76d40 HEALTHY   -                /dev/dsk/c10t4d2s0
    ffffff01c9da5040 HEALTHY   -                /dev/dsk/c10t4d4s0
    ffffff01c9cb3000 DEGRADED  -              raidz
    ffffff01c9d322c0 CANT_OPEN CORRUPT_DATA     /dev/dsk/c10t4d0s0
    ffffff01c9da6300 DEGRADED  -                replacing
    ffffff01c9d31000 HEALTHY   -                  /dev/dsk/c9t4d0s0
    ffffff01c9cb3c80 CANT_OPEN OPEN_FAILED        /dev/dsk/c9t4d4s0
    ffffff01c9da5cc0 HEALTHY   -                /dev/dsk/c10t4d1s0
    ffffff01cae779c0 HEALTHY   -                /dev/dsk/c9t4d1s0
> ::spa -ve
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    ADDR             STATE     AUX          DESCRIPTION
    ffffff01c73de680 HEALTHY   -            root
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c73de040 HEALTHY   -              mirror
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x146d      0x717       0      0      0
        BYTES     0x75a8a00   0x1718600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cca340 HEALTHY   -                /dev/dsk/c8t0d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x5b9       0x399       0      0      0x76
        BYTES     0x56ae000   0x1808600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cca980 HEALTHY   -                /dev/dsk/c8t1d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x626       0x388       0      0      0x76
        BYTES     0x59ff600   0x1808600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
ffffff01c9fb3800    ACTIVE storage2

    ffffff01c9d32900 DEGRADED  -            root
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0x4
    ffffff01c9cb3640 HEALTHY   -              raidz
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x15        0           0      0      0
        BYTES     0x1c000     0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01d3874300 HEALTHY   -                /dev/dsk/c9t4d2s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x15        0x3         0      0      0
        BYTES     0x152000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01d3874940 HEALTHY   -                /dev/dsk/c9t4d3s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x11        0x3         0      0      0
        BYTES     0x112000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01cae76d40 HEALTHY   -                /dev/dsk/c10t4d2s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x17        0x3         0      0      0
        BYTES     0x172000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da5040 HEALTHY   -                /dev/dsk/c10t4d4s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x16        0x3         0      0      0
        BYTES     0x162000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cb3000 DEGRADED  -              raidz
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x3         0x2         0      0      0
        BYTES     0x1000      0x1000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0x19
    ffffff01c9d322c0 CANT_OPEN CORRUPT_DATA     /dev/dsk/c10t4d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x4         0x3         0      0      0
        BYTES     0x22000     0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da6300 DEGRADED  -                replacing
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x22        0x2         0      0      0
        BYTES     0xc000      0x600       0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9d31000 HEALTHY   -                  /dev/dsk/c9t4d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1f        0x5         0      0      0
        BYTES     0x107a00    0x6600      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cb3c80 CANT_OPEN OPEN_FAILED        /dev/dsk/c9t4d4s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da5cc0 HEALTHY   -                /dev/dsk/c10t4d1s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1e        0x5         0      0      0
        BYTES     0xf7a00     0x6600      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01cae779c0 HEALTHY   -                /dev/dsk/c9t4d1s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1e        0x5         0      0      0
        BYTES     0xf5c00     0x6400      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
Another related question - I have a second enclosure with blank disks which I would like to use to take a copy of the existing zpool as a precaution before attempting any fixes. The disks in this enclosure are larger than those in the enclosure with the problem. What would be the best way to do this? If I were to clone the disks 1:1, would the difference in size cause any problems? I also had the idea that I might be able to dd the original disks into files on a ZFS filesystem on the second enclosure and mount the files, but the few results I've turned up on the subject seem to say this is a bad idea.
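For reference, what I had in mind for the dd-into-files idea was roughly the following - completely untested, and the target directory is just an example:

    # image each member disk of the faulted pool into a file on a
    # filesystem living on the new enclosure's (larger) disks
    for d in c9t4d0 c9t4d1 c9t4d2 c9t4d3 c10t4d0 c10t4d1 c10t4d2 c10t4d4; do
        dd if=/dev/rdsk/${d}s0 of=/backup/images/${d}.img bs=1024k
    done

    # then, if it came to it, point the import at the directory of
    # image files instead of at /dev/dsk
    zpool import -d /backup/images storage2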
On Jun 29, 2010, at 1:30 AM, George wrote:

> I've attached the output of those commands. The machine is a v20z if that makes any difference.

The stack trace is similar to one from a bug that I do not recall right now, and it indicates that there is likely corruption in the ZFS metadata. I suggest you try running 'zdb -bcsv storage2' and show the result.

victor
> I suggest you try running 'zdb -bcsv storage2' and
> show the result.

root@crypt:/tmp# zdb -bcsv storage2
zdb: can't open storage2: No such device or address

then I tried

root@crypt:/tmp# zdb -ebcsv storage2
zdb: can't open storage2: File exists

George
On Jun 30, 2010, at 10:48 AM, George wrote:

>> I suggest you try running 'zdb -bcsv storage2' and
>> show the result.
>
> root@crypt:/tmp# zdb -bcsv storage2
> zdb: can't open storage2: No such device or address
>
> then I tried
>
> root@crypt:/tmp# zdb -ebcsv storage2
> zdb: can't open storage2: File exists

Please try

    zdb -U /dev/null -ebcsv storage2
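The intent, roughly, is to keep zdb from consulting the stale pool cache:

    # -e            treat the pool as exported and discover it from the device labels
    # -U /dev/null  use an empty alternate cache file instead of /etc/zfs/zpool.cache
    # -bcsv         traverse the pool, verifying checksums, and print block statistics
    zdb -U /dev/null -ebcsv storage2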
> Please try
>
> zdb -U /dev/null -ebcsv storage2

root@crypt:~# zdb -U /dev/null -ebcsv storage2
zdb: can't open storage2: No such device or address

If I try

root@crypt:~# zdb -C storage2

then it prints what appears to be a valid configuration, but then the same error message about being unable to find the device (output attached).

George

[attached output:]

root@crypt:~# zdb -C storage2
    version=14
    name='storage2'
    state=0
    txg=1807366
    pool_guid=14701046672203578408
    hostid=8522651
    hostname='crypt'
    vdev_tree
        type='root'
        id=0
        guid=14701046672203578408
        children[0]
            type='raidz'
            id=0
            guid=15861342641545291969
            nparity=1
            metaslab_array=14
            metaslab_shift=35
            ashift=9
            asize=3999672565760
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=14390766171745861103
                path='/dev/dsk/c9t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd863ea736d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,2:a'
                whole_disk=1
                DTL=301
            children[1]
                type='disk'
                id=1
                guid=14806610527738068493
                path='/dev/dsk/c9t4d3s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8514ed8d900/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,3:a'
                whole_disk=1
                DTL=300
            children[2]
                type='disk'
                id=2
                guid=4272121319363331595
                path='/dev/dsk/c10t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd84312aa6d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,2:a'
                whole_disk=1
                DTL=299
            children[3]
                type='disk'
                id=3
                guid=16286569401176941639
                path='/dev/dsk/c10t4d4s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8415c62ae00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,4:a'
                whole_disk=1
                DTL=296
        children[1]
            type='raidz'
            id=1
            guid=12601468074885676119
            nparity=1
            metaslab_array=172
            metaslab_shift=35
            ashift=9
            asize=3999672565760
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=7040280703157905854
                path='/dev/dsk/c10t4d0s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd83eda0a4a00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,0:a'
                whole_disk=1
                DTL=305
            children[1]
                type='replacing'
                id=1
                guid=16928413524184799719
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=9102173991259789741
                    path='/dev/dsk/c9t4d0s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd86eee69a300/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,0:a'
                    whole_disk=1
                    DTL=304
                children[1]
                    type='disk'
                    id=1
                    guid=16888611779137638814
                    path='/dev/dsk/c9t4d4s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd8612edc7d00/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,4:a'
                    whole_disk=1
                    DTL=321
            children[2]
                type='disk'
                id=2
                guid=4025009484028197162
                path='/dev/dsk/c10t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8609d147700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,1:a'
                whole_disk=1
                DTL=303
            children[3]
                type='disk'
                id=3
                guid=16808231922771934373
                path='/dev/dsk/c9t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd86cbc020700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,1:a'
                whole_disk=1
                DTL=302
zdb: can't open storage2: No such device or address
Aha: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136

I think I'll try booting from a b134 Live CD and see if that will let me fix things.
> I think I'll try booting from a b134 Live CD and see
> if that will let me fix things.

Sadly it appears not - at least not straight away. Running "zpool import" now gives:

  pool: storage2
    id: 14701046672203578408
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        storage2         FAULTED  corrupted data
          raidz1-0       FAULTED  corrupted data
            c6t4d2       ONLINE
            c6t4d3       ONLINE
            c7t4d2       ONLINE
            c7t4d3       ONLINE
          raidz1-1       FAULTED  corrupted data
            c7t4d0       ONLINE
            replacing-1  UNAVAIL  insufficient replicas
              c6t4d0     FAULTED  corrupted data
              c9t4d4     UNAVAIL  cannot open
            c7t4d1       ONLINE
            c6t4d1       ONLINE

If I do "zpool import -f storage2" it complains about devices being faulted and suggests destroying the pool.

If I do "zpool clear storage2" or "zpool clear storage2 c9t4d4" these say that storage2 does not exist.

If I do "zpool import -nF storage2" this says that the pool was last run on another system and prompts for "-f".

If I do "zpool import -fnF storage2" this appears to quit silently.

I don't really understand why the installed system is very specific about the problem being with the intent log (and suggesting it just needs clearing) but booting from the b134 CD doesn't pick up on that, unless it's being masked by the hostid mismatch error. Because of that I'm thinking that I should try to change the hostid when booted from the CD to be the same as that of the previously installed system to see if that helps - unless that's likely to confuse it at all...?
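For reference, the sequence I have been running from the Live CD boils down to roughly this:

    # plain import, then forced import (the pool was last written by the installed system)
    zpool import storage2
    zpool import -f storage2

    # clear attempts - both report that storage2 does not exist
    zpool clear storage2
    zpool clear storage2 c9t4d4

    # dry-run recovery rewind, then the same combined with -f (quits silently)
    zpool import -nF storage2
    zpool import -fnF storage2

    # presumably the next step would be a real recovery attempt without -n, which
    # would discard the last few transactions if a usable txg can be found
    zpool import -fF storage2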
> Because of that I'm thinking that I should try
> to change the hostid when booted from the CD to be
> the same as that of the previously installed system
> to see if that helps - unless that's likely to
> confuse it at all...?

I've now tried changing the hostid using the code from http://forums.sun.com/thread.jspa?threadID=5075254 (NB: you need to leave this running in a separate terminal). This changes the start of "zpool import" to:

  pool: storage2
    id: 14701046672203578408
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72

but otherwise nothing has changed with respect to trying to import or clear the pool. The pool is 8TB and the machine has 4GB of memory, but as far as I can see via top the commands aren't failing due to a lack of memory.

I'm a bit stumped now. The only other thing I can think to try is inserting c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The problem with this, though, is that it relies on c7t4d0 (which is faulty), and so it assumes that the errors can be cleared, the replace stopped, and the drives swapped back before further errors happen.
On Jul 3, 2010, at 1:20 PM, George wrote:

> but otherwise nothing has changed with respect to trying to import or clear the pool.
> [...]
> I'm a bit stumped now.

I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD?

regards
victor
> I think it is quite likely to be possible to get readonly access to
> your data, but this requires modified ZFS binaries. What is your pool
> version? What build do you have installed on your system disk or
> available as LiveCD?

Sorry, but does this mean that if ZFS can't write to the drives, access to the pool won't be possible? If so, that's rather scary...

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
On Jun 28, 2010, at 11:27 PM, George wrote:

> Again this core dumps when I try to do "zpool clear storage2".
>
> Does anyone have any suggestions about what would be the best course of action now?

Do you have any crash dumps saved? The first one is the most interesting...
> I think it is quite likely to be possible to get
> readonly access to your data, but this requires
> modified ZFS binaries. What is your pool version?
> What build do you have installed on your system disk
> or available as LiveCD?

[Prompted by an off-list e-mail from Victor asking if I was still having problems]

Thanks for your reply, and apologies for not having replied here sooner - I was going to try something myself (which I'll explain shortly) but have been hampered by a flaky cdrom drive - something I won't have a chance to sort out until the weekend.

In answer to your question, the installed system is running 2009.06 (b111b) and the LiveCD I've been using is b134.

The problem with the installed system crashing when I tried to run "zpool clear" I believe is being caused by http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which makes me think that the same command run from a later version should work fine.

I haven't had any success doing this though, and I believe the reason is that several of the ZFS commands won't work if the hostid of the machine that last accessed the pool is different from that of the current system (and the pool is exported/faulted), as happens when using a LiveCD. Where I was getting errors that "storage2 does not exist", I found it was writing errors to the syslog saying the pool "could not be loaded as it was last accessed by another system". I tried to get round this using the DTrace hostid-changing script I mentioned in one of my earlier messages, but this seemed not to be able to fool system processes.

I also tried exporting the pool from the installed system to see if that would help, but unfortunately it didn't. After the export, "zpool import" run on the installed system reported "The pool can be imported despite missing or damaged devices." However, when trying to import it (with or without -f) it refused, saying "one or more devices is currently unavailable". When booting the LiveCD after having exported the pool, it still gave errors about the pool having been last accessed by another system.

I couldn't spot any method of modifying the LiveCD image to have a particular hostid, so my plan has been to try installing b134 onto the system, setting the hostid under /etc, and seeing if things then behave in a more straightforward fashion - which I haven't managed yet due to the cdrom problems.

I also mentioned in one of my earlier e-mails that I was confused that the installed system talks about an unreadable intent log while the LiveCD says the problem is corrupted metadata. This seems to be caused by the functions print_import_config and print_status_config having slightly different case statements, and not by a difference in the pool itself.

Hopefully I'll be able to complete the reinstall soon and see whether that fixes things or there's a deeper problem.

Thanks again for your help,

George
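PS: For what it's worth, the hostid mismatch itself is easy to confirm by comparing the hostid of the running (LiveCD) environment with the one recorded in the ZFS labels on the disks, e.g.:

    # hostid of the currently running system
    hostid

    # hostid= and hostname= fields recorded in the ZFS label of one of the pool's disks
    zdb -l /dev/dsk/c6t4d0s0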
On Jul 9, 2010, at 4:27 AM, George wrote:

> Hopefully I'll be able to complete the reinstall soon and see whether that
> fixes things or there's a deeper problem.

For the record - using ZFS readonly import code backported to build 134 and slightly modified to account for the specific corruptions of this case, we've been able to import the pool in readonly mode, and George is now backing up his data.

As soon as that completes I hope to have a chance to have another look into it to see what else we can learn from this case.

regards
victor