Matt.Ingenthron at Sun.COM
2008-Nov-07 04:25 UTC
[zfs-discuss] ZFS problems which scrub can't find?
Hi,

After a recent pkg image-update to OpenSolaris build 100, my system
booted once and now will no longer boot. After exhausting other
options, I am left wondering if there is some kind of ZFS issue a
scrub won't find.

The current behavior is that it will load GRUB, but trying to boot the
most recent boot environment (b100 based) I get "Error 16: Inconsistent
filesystem structure". The pool has gone through two scrubs from a
livecd based on b101a without finding anything wrong. If I select the
previous boot environment (b99 based), I get a kernel panic.

I've tried replacing the /etc/hostid based on a hunch from one of the
engineers working on Indiana and ZFS boot. I also tried rebuilding
the boot_archive and reloading the GRUB based on build 100. I then
tried reloading the build 99 grub to hopefully get to where I could
boot build 99. No luck with any of these thus far.

More below, and some comments in this bug:
http://defect.opensolaris.org/bz/show_bug.cgi?id=3965, though may need
to be a separate bug.

I'd appreciate any suggestions and be glad to gather any data to
diagnose this if possible.


== Screen when trying to boot b100 after boot menu ==

Booting 'opensolaris-15'

bootfs rpool/ROOT/opensolaris-15
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
loading '/platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS' ...
cpu: 'GenuineIntel' family 6 model 15 step 11
[BIOS accepted mixed-mode target setting!]
[Multiboot-kludge, loadaddr=0xbffe38, text-and-data=0x1931a8, bss=0x0,
entry=0xc00000]
'/platform/i86pc/kernel/amd64/unix -B
zfs-bootfs=rpool/391,bootpath="pci@0,0/pci1179,1@1f,2/disk@0,0:a",diskdevid="id1,sd@f3205106b484eb804000447c80000/a"'
is loaded
module$ /platform/i86pc/$ISADIR/boot_archive
loading '/platform/i86pc/$ISADIR/boot_archive' ...

Error 16: Inconsistent filesystem structure

Press any key to continue...


== Booting b99 ==

(by selecting the grub entry from the GRUB menu and adding -kd then
doing a :c to continue I get the following stack trace)

debug_enter+37 ()
panicsys+40b ()
vpanic+15d ()
panic+9c ()
(lines above typed in from ::stack, lines below typed in from when it
dropped into the debugger)
unix:die+ea ()
unix:trap+3d0 ()
unix:cmntrap+e9 ()
unix:mutex_owner_running+d ()
genunix:lokuppnat+bc ()
genunix:vn_removeat+7c ()
genunix:vn_remove_28 ()
zfs:spa_config_write+18d ()
zfs:spa_config_sync+102 ()
zfs:spa_open_common+24b ()
zfs:spa_open+1c ()
zfs:dsl_dsobj_to_dsname+37 ()
zfs:zfs_parse_bootfs+68 ()
zfs:zfs_mountroot+10a ()
genunxi:fsop_mountroot+1a ()
genunix:rootconf+d5 ()
genunix:vfs_mountroot+65 ()
genunix:main+e6 ()
unix:_locore_start+92 ()

panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci uppc sd zfs specfs pcplusmp cpu.generic ]
kmdb: target stopped at:
kmdb_enter+0xb: movq %rax,%rdi


== Output from zdb ==

--------------------------------------------
LABEL 0
--------------------------------------------
    version=10
    name='rpool'
    state=1
    txg=327816
    pool_guid=6981480028020800083
    hostid=95693
    hostname='opensolaris'
    top_guid=5199095267524632419
    guid=5199095267524632419
    vdev_tree
        type='disk'
        id=0
        guid=5199095267524632419
        path='/dev/dsk/c4t0d0s0'
        devid='id1,sd@f3205106b484eb804000447c80000/a'
        phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:a'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=29
        ashift=9
        asize=90374406144
        is_log=0
        DTL=161
--------------------------------------------
LABEL 1
--------------------------------------------
    version=10
    name='rpool'
    state=1
    txg=327816
    pool_guid=6981480028020800083
    hostid=95693
    hostname='opensolaris'
    top_guid=5199095267524632419
    guid=5199095267524632419
    vdev_tree
        type='disk'
        id=0
        guid=5199095267524632419
        path='/dev/dsk/c4t0d0s0'
        devid='id1,sd@f3205106b484eb804000447c80000/a'
        phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:a'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=29
        ashift=9
        asize=90374406144
        is_log=0
        DTL=161
--------------------------------------------
LABEL 2
--------------------------------------------
    version=10
    name='rpool'
    state=1
    txg=327816
    pool_guid=6981480028020800083
    hostid=95693
    hostname='opensolaris'
    top_guid=5199095267524632419
    guid=5199095267524632419
    vdev_tree
        type='disk'
        id=0
        guid=5199095267524632419
        path='/dev/dsk/c4t0d0s0'
        devid='id1,sd@f3205106b484eb804000447c80000/a'
        phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:a'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=29
        ashift=9
        asize=90374406144
        is_log=0
        DTL=161
--------------------------------------------
LABEL 3
--------------------------------------------
    version=10
    name='rpool'
    state=1
    txg=327816
    pool_guid=6981480028020800083
    hostid=95693
    hostname='opensolaris'
    top_guid=5199095267524632419
    guid=5199095267524632419
    vdev_tree
        type='disk'
        id=0
        guid=5199095267524632419
        path='/dev/dsk/c4t0d0s0'
        devid='id1,sd@f3205106b484eb804000447c80000/a'
        phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:a'
        whole_disk=0
        metaslab_array=14
        metaslab_shift=29
        ashift=9
        asize=90374406144
        is_log=0
        DTL=161

--
Matt Ingenthron
http://blogs.sun.com/mingenthron/
email: matt.ingenthron at sun.com

Off the lists, someone suggested to me that the "Inconsistent
filesystem" may be the boot archive and not the ZFS filesystem (though
I still don't know what's wrong with booting b99).

Regardless, I tried rebuilding the boot_archive with bootadm
update-archive -vf and verified it by mounting it and peeking inside.
I also tried both with and without /etc/hostid. I still get the same
behavior.

Any thoughts?

Thanks in advance,

- Matt

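One thing the zdb output in the first message does show is that all four labels carry the same txg and GUIDs. For anyone who wants to check that on their own pool without reading the labels by eye, here is a minimal sketch in Python. It assumes the text has the same shape as the `zdb` label dump above; the function names `parse_labels` and `disagreements` are made up for this example, and the sample is a shortened copy of two labels from this thread. Matching labels only mean the labels are consistent with each other; they say nothing about whether the boot archive or GRUB can read the filesystem.

```python
import re

def parse_labels(text):
    """Split zdb label output into {label_number: {key: value}}."""
    labels = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.fullmatch(r"LABEL (\d+)", line)
        if m:
            current = labels.setdefault(int(m.group(1)), {})
            continue
        if current is not None and "=" in line:
            key, _, value = line.partition("=")
            # Keep the first occurrence of each key, so the top-level guid
            # is not overwritten by the guid inside vdev_tree.
            current.setdefault(key.strip(), value.strip().strip("'"))
    return labels

def disagreements(labels, keys=("version", "txg", "pool_guid", "guid")):
    """Return {key: set_of_values} for keys that differ between labels."""
    out = {}
    for key in keys:
        values = {fields.get(key) for fields in labels.values()}
        if len(values) > 1:
            out[key] = values
    return out

# Shortened copy of LABEL 0 and LABEL 1 from the message above.
SAMPLE = """\
--------------------------------------------
LABEL 0
--------------------------------------------
    version=10
    name='rpool'
    txg=327816
    pool_guid=6981480028020800083
    guid=5199095267524632419
--------------------------------------------
LABEL 1
--------------------------------------------
    version=10
    name='rpool'
    txg=327816
    pool_guid=6981480028020800083
    guid=5199095267524632419
"""

labels = parse_labels(SAMPLE)
print(sorted(labels), disagreements(labels))  # prints: [0, 1] {}
```

An empty dict from `disagreements` means the checked fields agree across every label that was parsed, which is the case for the output posted here.
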
Marcin Szychowski
2008-Dec-18 12:44 UTC
[zfs-discuss] ZFS problems which scrub can't find?
Do you use any form of compression?

I changed compression from none to gzip-9, got some message about
changing properties of the boot pool (or fs), copied and moved all
files under /usr and /etc to force them to be compressed, rebooted,
and guess which message I got.

--
This message posted from opensolaris.org
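
A quick way to answer the compression question for a whole pool is to scan the compression property of every dataset. The sketch below is Python and makes two assumptions that are not stated in this thread: that the input is tab-separated `name<TAB>value` text such as `zfs get -H -o name,value compression` produces, and that the boot loader of this era can only read uncompressed or lzjb-compressed boot filesystems. The function name `risky_datasets` and the sample values are hypothetical.

```python
# Assumption: compression settings the boot loader of that era could read.
# "on" is included because it selected lzjb at the time.
BOOT_SAFE = {"off", "on", "lzjb"}

def risky_datasets(zfs_get_output):
    """Return (dataset, compression) pairs whose setting is not in BOOT_SAFE.

    Expects one `name<TAB>value` pair per line.
    """
    risky = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t")
        if value not in BOOT_SAFE:
            risky.append((name, value))
    return risky

# Illustrative values only; not taken from either poster's system.
SAMPLE = "rpool\toff\nrpool/ROOT\tlzjb\nrpool/ROOT/opensolaris-15\tgzip-9\n"
print(risky_datasets(SAMPLE))
```

Note that setting the property back only affects blocks written afterwards, so files already rewritten under gzip-9 stay gzip-compressed until they are copied again.
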