Hi,

I have a machine running 2009.06 with 8 SATA drives in a SCSI-connected enclosure. I had a drive fail and accidentally replaced the wrong one, which unsurprisingly caused the rebuild to fail. The status of the zpool then ended up as:

  pool: storage2
 state: FAULTED
status: An intent log record could not be read.
        Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        storage2       FAULTED       0     0     1  bad intent log
          raidz1       ONLINE        0     0     0
            c9t4d2     ONLINE        0     0     0
            c9t4d3     ONLINE        0     0     0
            c10t4d2    ONLINE        0     0     0
            c10t4d4    ONLINE        0     0     0
          raidz1       DEGRADED      0     0     6
            c10t4d0    UNAVAIL       0     0     0  cannot open
            replacing  ONLINE        0     0     0
              c9t4d0   ONLINE        0     0     0
              c10t4d3  ONLINE        0     0     0
            c10t4d1    ONLINE        0     0     0
            c9t4d1     ONLINE        0     0     0

Running "zpool clear storage2" caused the machine to dump and reboot.

I've tried removing the spare and putting back the faulty drive to give:

  pool: storage2
 state: FAULTED
status: An intent log record could not be read.
        Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
        or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        storage2       FAULTED       0     0     1  bad intent log
          raidz1       ONLINE        0     0     0
            c9t4d2     ONLINE        0     0     0
            c9t4d3     ONLINE        0     0     0
            c10t4d2    ONLINE        0     0     0
            c10t4d4    ONLINE        0     0     0
          raidz1       DEGRADED      0     0     6
            c10t4d0    FAULTED       0     0     0  corrupted data
            replacing  DEGRADED      0     0     0
              c9t4d0   ONLINE        0     0     0
              c9t4d4   UNAVAIL       0     0     0  cannot open
            c10t4d1    ONLINE        0     0     0
            c9t4d1     ONLINE        0     0     0

Again this core dumps when I try to do "zpool clear storage2".

Does anyone have any suggestions about what would be the best course of action now?
On Jun 28, 2010, at 11:27 PM, George wrote:

> I've tried removing the spare and putting back the faulty drive to give:
>
> [zpool status output as above]
>
> Again this core dumps when I try to do "zpool clear storage2".
>
> Does anyone have any suggestions about what would be the best course of action now?

I think first we need to understand why it does not like 'zpool clear', as that may provide a better understanding of what is wrong. For that you need to create a directory for saving crash dumps, e.g. like this:

    mkdir -p /var/crash/`uname -n`

then run savecore and see if it saves a crash dump into that directory. If the crash dump is there, then you need to perform some basic investigation:

    cd /var/crash/`uname -n`
    mdb <dump number>
    ::status
    ::stack
    ::spa -c
    ::spa -v
    ::spa -ve
    $q

for a start.
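If it is easier to capture everything in one go, something along these lines should also work (an untested sketch - substitute whatever dump number savecore actually reports, shown here as 0):

    # save the pending crash dump into the per-host directory
    mkdir -p /var/crash/`uname -n`
    savecore

    # run the basic dcmds non-interactively against unix.0/vmcore.0
    # and keep a copy of the output to post to the list
    cd /var/crash/`uname -n`
    printf '::status\n::stack\n::spa -c\n::spa -v\n::spa -ve\n' | mdb 0 > debug.txt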
I've attached the output of those commands. The machine is a v20z if that makes any difference.

Thanks,

George

[attached debug.txt:]

mdb: logging to "debug.txt"
> ::status
debugging crash dump vmcore.0 (64-bit) from crypt
operating system: 5.11 snv_111b (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff00084fc660 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
mutex_enter+0xb()
metaslab_free+0x12e(ffffff01c9fb3800, ffffff01cce64668, 1b9528, 0)
zio_dva_free+0x26(ffffff01cce64608)
zio_execute+0xa0(ffffff01cce64608)
zio_nowait+0x5a(ffffff01cce64608)
arc_free+0x197(ffffff01cf0c80c0, ffffff01c9fb3800, 1b9528, ffffff01d389bcf0, 0, 0)
dsl_free+0x30(ffffff01cf0c80c0, ffffff01d389bcc0, 1b9528, ffffff01d389bcf0, 0, 0)
dsl_dataset_block_kill+0x293(0, ffffff01d389bcf0, ffffff01cf0c80c0, ffffff01d18cfd80)
dmu_objset_sync+0xc4(ffffff01cffe0080, ffffff01cf0c80c0, ffffff01d18cfd80)
dsl_pool_sync+0x1ee(ffffff01d389bcc0, 1b9528)
spa_sync+0x32a(ffffff01c9fb3800, 1b9528)
txg_sync_thread+0x265(ffffff01d389bcc0)
thread_start+8()
> ::spa -c
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    version=000000000000000e
    name='rpool'
    state=0000000000000000
    txg=00000000056a6ad1
    pool_guid=53825ef3c58abc97
    hostid=0000000000820b9b
    hostname='crypt'
    vdev_tree
        type='root'
        id=0000000000000000
        guid=53825ef3c58abc97
        children[0]
            type='mirror'
            id=0000000000000000
            guid=e9b8daed37492cfe
            whole_disk=0000000000000000
            metaslab_array=0000000000000017
            metaslab_shift=000000000000001d
            ashift=0000000000000009
            asize=0000001114e00000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=ad7e5022f804365a
                path='/dev/dsk/c8t0d0s0'
                devid='id1,sd@SSEAGATE_ST373307LC______3HZ76YYD0000743809WM/a'
                phys_path='/pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@0,0:a'
                whole_disk=0000000000000000
                DTL=0000000000000052
            children[1]
                type='disk'
                id=0000000000000001
                guid=2f7a03c75a4931ac
                path='/dev/dsk/c8t1d0s0'
                devid='id1,sd@SSEAGATE_ST373307LC______3HZ80BDP0000743793PA/a'
                phys_path='/pci@0,0/pci1022,7450@a/pci17c2,10@4/sd@1,0:a'
                whole_disk=0000000000000000
                DTL=0000000000000050
ffffff01c9fb3800    ACTIVE storage2

    version=000000000000000e
    name='storage2'
    state=0000000000000000
    txg=00000000001b9406
    pool_guid=cc049c0f1321fc28
    hostid=0000000000820b9b
    hostname='crypt'
    vdev_tree
        type='root'
        id=0000000000000000
        guid=cc049c0f1321fc28
        children[0]
            type='raidz'
            id=0000000000000000
            guid=dc1ecf18721028c1
            nparity=0000000000000001
            metaslab_array=000000000000000e
            metaslab_shift=0000000000000023
            ashift=0000000000000009
            asize=000003a33f100000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=c7b64596709ebdef
                path='/dev/dsk/c9t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd863ea736d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,2:a'
                whole_disk=0000000000000001
                DTL=000000000000012d
            children[1]
                type='disk'
                id=0000000000000001
                guid=cd7ba5d38162fe0d
                path='/dev/dsk/c9t4d3s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8514ed8d900/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,3:a'
                whole_disk=0000000000000001
                DTL=000000000000012c
            children[2]
                type='disk'
                id=0000000000000002
                guid=3b499fb48e06460b
                path='/dev/dsk/c10t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd84312aa6d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,2:a'
                whole_disk=0000000000000001
                DTL=000000000000012b
            children[3]
                type='disk'
                id=0000000000000003
                guid=e205849496e5e447
                path='/dev/dsk/c10t4d4s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8415c62ae00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,4:a'
                whole_disk=0000000000000001
                DTL=0000000000000128
        children[1]
            type='raidz'
            id=0000000000000001
            guid=aee16872dbfc7c57
            nparity=0000000000000001
            metaslab_array=00000000000000ac
            metaslab_shift=0000000000000023
            ashift=0000000000000009
            asize=000003a33f100000
            is_log=0000000000000000
            children[0]
                type='disk'
                id=0000000000000000
                guid=61b419ff9ec3a9be
                path='/dev/dsk/c10t4d0s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd83eda0a4a00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,0:a'
                whole_disk=0000000000000001
                DTL=0000000000000131
            children[1]
                type='replacing'
                id=0000000000000001
                guid=eaedce68dff419e7
                whole_disk=0000000000000000
                children[0]
                    type='disk'
                    id=0000000000000000
                    guid=7e516b0508d6d9ad
                    path='/dev/dsk/c9t4d0s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd86eee69a300/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,0:a'
                    whole_disk=0000000000000001
                    DTL=0000000000000130
                children[1]
                    type='disk'
                    id=0000000000000001
                    guid=ea6066eef4fa119e
                    path='/dev/dsk/c9t4d4s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd8612edc7d00/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,4:a'
                    whole_disk=0000000000000001
                    DTL=0000000000000141
            children[2]
                type='disk'
                id=0000000000000002
                guid=37dbb4cce114392a
                path='/dev/dsk/c10t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8609d147700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,1:a'
                whole_disk=0000000000000001
                DTL=000000000000012f
            children[3]
                type='disk'
                id=0000000000000003
                guid=e942d5e14333bca5
                path='/dev/dsk/c9t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd86cbc020700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,1:a'
                whole_disk=0000000000000001
                DTL=000000000000012e
> ::spa -v
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    ADDR             STATE     AUX          DESCRIPTION
    ffffff01c73de680 HEALTHY   -            root
    ffffff01c73de040 HEALTHY   -              mirror
    ffffff01c9cca340 HEALTHY   -                /dev/dsk/c8t0d0s0
    ffffff01c9cca980 HEALTHY   -                /dev/dsk/c8t1d0s0
ffffff01c9fb3800    ACTIVE storage2

    ffffff01c9d32900 DEGRADED  -            root
    ffffff01c9cb3640 HEALTHY   -              raidz
    ffffff01d3874300 HEALTHY   -                /dev/dsk/c9t4d2s0
    ffffff01d3874940 HEALTHY   -                /dev/dsk/c9t4d3s0
    ffffff01cae76d40 HEALTHY   -                /dev/dsk/c10t4d2s0
    ffffff01c9da5040 HEALTHY   -                /dev/dsk/c10t4d4s0
    ffffff01c9cb3000 DEGRADED  -              raidz
    ffffff01c9d322c0 CANT_OPEN CORRUPT_DATA     /dev/dsk/c10t4d0s0
    ffffff01c9da6300 DEGRADED  -                replacing
    ffffff01c9d31000 HEALTHY   -                  /dev/dsk/c9t4d0s0
    ffffff01c9cb3c80 CANT_OPEN OPEN_FAILED        /dev/dsk/c9t4d4s0
    ffffff01c9da5cc0 HEALTHY   -                /dev/dsk/c10t4d1s0
    ffffff01cae779c0 HEALTHY   -                /dev/dsk/c9t4d1s0
> ::spa -ve
ADDR                 STATE NAME
ffffff01c8df3000    ACTIVE rpool

    ADDR             STATE     AUX          DESCRIPTION
    ffffff01c73de680 HEALTHY   -            root
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c73de040 HEALTHY   -              mirror
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x146d      0x717       0      0      0
        BYTES     0x75a8a00   0x1718600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cca340 HEALTHY   -                /dev/dsk/c8t0d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x5b9       0x399       0      0      0x76
        BYTES     0x56ae000   0x1808600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cca980 HEALTHY   -                /dev/dsk/c8t1d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x626       0x388       0      0      0x76
        BYTES     0x59ff600   0x1808600   0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
ffffff01c9fb3800    ACTIVE storage2

    ffffff01c9d32900 DEGRADED  -            root
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0x4
    ffffff01c9cb3640 HEALTHY   -              raidz
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x15        0           0      0      0
        BYTES     0x1c000     0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01d3874300 HEALTHY   -                /dev/dsk/c9t4d2s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x15        0x3         0      0      0
        BYTES     0x152000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01d3874940 HEALTHY   -                /dev/dsk/c9t4d3s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x11        0x3         0      0      0
        BYTES     0x112000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01cae76d40 HEALTHY   -                /dev/dsk/c10t4d2s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x17        0x3         0      0      0
        BYTES     0x172000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da5040 HEALTHY   -                /dev/dsk/c10t4d4s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x16        0x3         0      0      0
        BYTES     0x162000    0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cb3000 DEGRADED  -              raidz
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x3         0x2         0      0      0
        BYTES     0x1000      0x1000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0x19
    ffffff01c9d322c0 CANT_OPEN CORRUPT_DATA     /dev/dsk/c10t4d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x4         0x3         0      0      0
        BYTES     0x22000     0x6000      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da6300 DEGRADED  -                replacing
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x22        0x2         0      0      0
        BYTES     0xc000      0x600       0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9d31000 HEALTHY   -                  /dev/dsk/c9t4d0s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1f        0x5         0      0      0
        BYTES     0x107a00    0x6600      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9cb3c80 CANT_OPEN OPEN_FAILED        /dev/dsk/c9t4d4s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0           0           0      0      0
        BYTES     0           0           0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01c9da5cc0 HEALTHY   -                /dev/dsk/c10t4d1s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1e        0x5         0      0      0
        BYTES     0xf7a00     0x6600      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
    ffffff01cae779c0 HEALTHY   -                /dev/dsk/c9t4d1s0
                  READ        WRITE       FREE   CLAIM  IOCTL
        OPS       0x1e        0x5         0      0      0
        BYTES     0xf5c00     0x6400      0      0      0
        EREAD 0  EWRITE 0  ECKSUM 0
Another related question - I have a second enclosure with blank disks which I would like to use to take a copy of the existing zpool as a precaution before attempting any fixes. The disks in this enclosure are larger than those in the enclosure with the problem. What would be the best way to do this? If I were to clone the disks 1:1, would the difference in size cause any problems? I also had the idea that I might be able to dd the original disks into files on a ZFS filesystem on the second enclosure and mount the files, but the few results I've turned up on the subject seem to say this is a bad idea.
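For reference, what I had in mind for the dd-into-files idea was roughly the following - completely untested, and the target directory is just an example:

    # image each member disk of the faulted pool into a file on a
    # filesystem living on the new enclosure's (larger) disks
    for d in c9t4d0 c9t4d1 c9t4d2 c9t4d3 c10t4d0 c10t4d1 c10t4d2 c10t4d4; do
        dd if=/dev/rdsk/${d}s0 of=/backup/images/${d}.img bs=1024k
    done

    # then, if it came to it, point the import at the directory of
    # image files instead of at /dev/dsk
    zpool import -d /backup/images storage2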
On Jun 29, 2010, at 1:30 AM, George wrote:

> I've attached the output of those commands. The machine is a v20z if that makes any difference.

The stack trace is similar to one from a bug that I do not recall right now, and it indicates that there is likely corruption in the ZFS metadata. I suggest you try running 'zdb -bcsv storage2' and show the result.

victor
> I suggest you try running 'zdb -bcsv storage2' and
> show the result.

root@crypt:/tmp# zdb -bcsv storage2
zdb: can't open storage2: No such device or address

then I tried

root@crypt:/tmp# zdb -ebcsv storage2
zdb: can't open storage2: File exists

George
On Jun 30, 2010, at 10:48 AM, George wrote:

>> I suggest you try running 'zdb -bcsv storage2' and
>> show the result.
>
> root@crypt:/tmp# zdb -bcsv storage2
> zdb: can't open storage2: No such device or address
>
> then I tried
>
> root@crypt:/tmp# zdb -ebcsv storage2
> zdb: can't open storage2: File exists

Please try

    zdb -U /dev/null -ebcsv storage2
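The intent, roughly, is to keep zdb from consulting the stale pool cache:

    # -e            treat the pool as exported and discover it from the device labels
    # -U /dev/null  use an empty alternate cache file instead of /etc/zfs/zpool.cache
    # -bcsv         traverse the pool, verifying checksums, and print block statistics
    zdb -U /dev/null -ebcsv storage2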
> Please try
>
> zdb -U /dev/null -ebcsv storage2

root@crypt:~# zdb -U /dev/null -ebcsv storage2
zdb: can't open storage2: No such device or address

If I try

root@crypt:~# zdb -C storage2

then it prints what appears to be a valid configuration, but then the same error message about being unable to find the device (output attached).

George

[attached output:]

root@crypt:~# zdb -C storage2
    version=14
    name='storage2'
    state=0
    txg=1807366
    pool_guid=14701046672203578408
    hostid=8522651
    hostname='crypt'
    vdev_tree
        type='root'
        id=0
        guid=14701046672203578408
        children[0]
            type='raidz'
            id=0
            guid=15861342641545291969
            nparity=1
            metaslab_array=14
            metaslab_shift=35
            ashift=9
            asize=3999672565760
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=14390766171745861103
                path='/dev/dsk/c9t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd863ea736d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,2:a'
                whole_disk=1
                DTL=301
            children[1]
                type='disk'
                id=1
                guid=14806610527738068493
                path='/dev/dsk/c9t4d3s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8514ed8d900/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,3:a'
                whole_disk=1
                DTL=300
            children[2]
                type='disk'
                id=2
                guid=4272121319363331595
                path='/dev/dsk/c10t4d2s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd84312aa6d00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,2:a'
                whole_disk=1
                DTL=299
            children[3]
                type='disk'
                id=3
                guid=16286569401176941639
                path='/dev/dsk/c10t4d4s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8415c62ae00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,4:a'
                whole_disk=1
                DTL=296
        children[1]
            type='raidz'
            id=1
            guid=12601468074885676119
            nparity=1
            metaslab_array=172
            metaslab_shift=35
            ashift=9
            asize=3999672565760
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=7040280703157905854
                path='/dev/dsk/c10t4d0s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd83eda0a4a00/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,0:a'
                whole_disk=1
                DTL=305
            children[1]
                type='replacing'
                id=1
                guid=16928413524184799719
                whole_disk=0
                children[0]
                    type='disk'
                    id=0
                    guid=9102173991259789741
                    path='/dev/dsk/c9t4d0s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd86eee69a300/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,0:a'
                    whole_disk=1
                    DTL=304
                children[1]
                    type='disk'
                    id=1
                    guid=16888611779137638814
                    path='/dev/dsk/c9t4d4s0'
                    devid='id1,sd@n600d0230006c8a5f0c3fd8612edc7d00/a'
                    phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,4:a'
                    whole_disk=1
                    DTL=321
            children[2]
                type='disk'
                id=2
                guid=4025009484028197162
                path='/dev/dsk/c10t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd8609d147700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1,1/sd@4,1:a'
                whole_disk=1
                DTL=303
            children[3]
                type='disk'
                id=3
                guid=16808231922771934373
                path='/dev/dsk/c9t4d1s0'
                devid='id1,sd@n600d0230006c8a5f0c3fd86cbc020700/a'
                phys_path='/pci@0,0/pci1022,7450@b/pci9005,40@1/sd@4,1:a'
                whole_disk=1
                DTL=302
zdb: can't open storage2: No such device or address
Aha: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136

I think I'll try booting from a b134 Live CD and see if that will let me fix things.
> I think I'll try booting from a b134 Live CD and see
> if that will let me fix things.

Sadly it appears not - at least not straight away. Running "zpool import" now gives:

  pool: storage2
    id: 14701046672203578408
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        storage2         FAULTED  corrupted data
          raidz1-0       FAULTED  corrupted data
            c6t4d2       ONLINE
            c6t4d3       ONLINE
            c7t4d2       ONLINE
            c7t4d3       ONLINE
          raidz1-1       FAULTED  corrupted data
            c7t4d0       ONLINE
            replacing-1  UNAVAIL  insufficient replicas
              c6t4d0     FAULTED  corrupted data
              c9t4d4     UNAVAIL  cannot open
            c7t4d1       ONLINE
            c6t4d1       ONLINE

If I do "zpool import -f storage2" it complains about devices being faulted and suggests destroying the pool.

If I do "zpool clear storage2" or "zpool clear storage2 c9t4d4" these say that storage2 does not exist.

If I do "zpool import -nF storage2" this says that the pool was last run on another system and prompts for "-f".

If I do "zpool import -fnF storage2" this appears to quit silently.

I don't really understand why the installed system is very specific about the problem being with the intent log (and suggesting it just needs clearing) but booting from the b134 CD doesn't pick up on that, unless it's being masked by the hostid mismatch error. Because of that I'm thinking that I should try to change the hostid when booted from the CD to be the same as that of the previously installed system to see if that helps - unless that's likely to confuse it at all...?
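For reference, the sequence I have been running from the Live CD boils down to roughly this:

    # plain import, then forced import (the pool was last written by the installed system)
    zpool import storage2
    zpool import -f storage2

    # clear attempts - both report that storage2 does not exist
    zpool clear storage2
    zpool clear storage2 c9t4d4

    # dry-run recovery rewind, then the same combined with -f (quits silently)
    zpool import -nF storage2
    zpool import -fnF storage2

    # presumably the next step would be a real recovery attempt without -n, which
    # would discard the last few transactions if a usable txg can be found
    zpool import -fF storage2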
> Because of that I'm thinking that I should try
> to change the hostid when booted from the CD to be
> the same as that of the previously installed system
> to see if that helps - unless that's likely to
> confuse it at all...?

I've now tried changing the hostid using the code from http://forums.sun.com/thread.jspa?threadID=5075254 (NB: you need to leave this running in a separate terminal). This changes the start of "zpool import" to:

  pool: storage2
    id: 14701046672203578408
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72

but otherwise nothing has changed with respect to trying to import or clear the pool. The pool is 8TB and the machine has 4GB of memory, but as far as I can see via top the commands aren't failing due to a lack of memory.

I'm a bit stumped now. The only other thing I can think to try is inserting c9t4d4 (the new drive) and removing c6t4d0 (which should be fine). The problem with this, though, is that it relies on c7t4d0 (which is faulty), and so it assumes that the errors can be cleared, the replace stopped, and the drives swapped back before further errors happen.
On Jul 3, 2010, at 1:20 PM, George wrote:

> but otherwise nothing has changed with respect to trying to import or clear the pool.
> [...]
> I'm a bit stumped now.

I think it is quite likely to be possible to get readonly access to your data, but this requires modified ZFS binaries. What is your pool version? What build do you have installed on your system disk or available as LiveCD?

regards
victor
> I think it is quite likely to be possible to get readonly access to
> your data, but this requires modified ZFS binaries. What is your pool
> version? What build do you have installed on your system disk or
> available as LiveCD?

Sorry, but does this mean that if ZFS can't write to the drives, access to the pool won't be possible? If so, that's rather scary...

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
On Jun 28, 2010, at 11:27 PM, George wrote:

> Again this core dumps when I try to do "zpool clear storage2".
>
> Does anyone have any suggestions about what would be the best course of action now?

Do you have any crash dumps saved? The first one is the most interesting...
> I think it is quite likely to be possible to get
> readonly access to your data, but this requires
> modified ZFS binaries. What is your pool version?
> What build do you have installed on your system disk
> or available as LiveCD?

[Prompted by an off-list e-mail from Victor asking if I was still having problems]

Thanks for your reply, and apologies for not having replied here sooner - I was going to try something myself (which I'll explain shortly) but have been hampered by a flaky cdrom drive - something I won't have a chance to sort out until the weekend.

In answer to your question, the installed system is running 2009.06 (b111b) and the LiveCD I've been using is b134.

The problem with the installed system crashing when I tried to run "zpool clear" I believe is being caused by http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794136 which makes me think that the same command run from a later version should work fine.

I haven't had any success doing this though, and I believe the reason is that several of the ZFS commands won't work if the hostid of the machine that last accessed the pool is different from that of the current system (and the pool is exported/faulted), as happens when using a LiveCD. Where I was getting errors that "storage2 does not exist", I found it was writing errors to the syslog saying the pool "could not be loaded as it was last accessed by another system". I tried to get round this using the DTrace hostid-changing script I mentioned in one of my earlier messages, but this seemed not to be able to fool system processes.

I also tried exporting the pool from the installed system to see if that would help, but unfortunately it didn't. After the export, "zpool import" run on the installed system reported "The pool can be imported despite missing or damaged devices." However, when trying to import it (with or without -f) it refused, saying "one or more devices is currently unavailable". When booting the LiveCD after having exported the pool, it still gave errors about the pool having been last accessed by another system.

I couldn't spot any method of modifying the LiveCD image to have a particular hostid, so my plan has been to try installing b134 onto the system, setting the hostid under /etc, and seeing if things then behave in a more straightforward fashion - which I haven't managed yet due to the cdrom problems.

I also mentioned in one of my earlier e-mails that I was confused that the installed system talks about an unreadable intent log while the LiveCD says the problem is corrupted metadata. This seems to be caused by the functions print_import_config and print_status_config having slightly different case statements, and not by a difference in the pool itself.

Hopefully I'll be able to complete the reinstall soon and see whether that fixes things or there's a deeper problem.

Thanks again for your help,

George
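PS: For what it's worth, the hostid mismatch itself is easy to confirm by comparing the hostid of the running (LiveCD) environment with the one recorded in the ZFS labels on the disks, e.g.:

    # hostid of the currently running system
    hostid

    # hostid= and hostname= fields recorded in the ZFS label of one of the pool's disks
    zdb -l /dev/dsk/c6t4d0s0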
On Jul 9, 2010, at 4:27 AM, George wrote:

> Hopefully I'll be able to complete the reinstall soon and see whether that
> fixes things or there's a deeper problem.

For the record - using ZFS readonly import code backported to build 134 and slightly modified to account for the specific corruptions of this case, we've been able to import the pool in readonly mode, and George is now backing up his data.

As soon as that completes I hope to have a chance to have another look into it to see what else we can learn from this case.

regards
victor