jan damborsky
2008-Nov-07 17:29 UTC
[zfs-discuss] Ended up in GRUB prompt after the installation on ZFS
Hi ZFS team,

while testing installation with recent OpenSolaris builds, we have found that in some cases people end up at the GRUB prompt after the installation. It seems that menu.lst can't be accessed for some reason. At least the following two bugs seem to describe the same manifestation of the problem, whose root cause has not been identified yet:

4051 opensolaris b99b/b100a does not install on 1.5 TB disk or boot fails after install
4591 Install failure on a Sun Fire X4240 with Opensolaris 200811

Nothing in those bug reports indicates that they are ZFS related, and there are probably more scenarios in which this can happen.

But when I hit the problem today while testing the Automated Installer (it is part of the Caiman project and will replace the current jumpstart install technology), I was able to make GRUB find 'menu.lst' just by using the 'zpool import' command. Please see below for the detailed procedure.

Based on this, could you please take a look at these observations and, if possible, help me understand whether anything is obviously wrong and whether you think this is related to ZFS?
Thank you very much for your help,
Jan

configuration:
--------------
HW: Ultra 20, 1GB RAM, 1 250GB SATA drive
SW: Opensolaris build 100, 64bit mode

steps used:
-----------
[1] OpenSolaris 100 installed using Automated Installer
    - Solaris 2 partition created during installation

    * partition configuration before installation:

    # fdisk -W - c2t0d0p0
    ...
    * Id   Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
      192  0    0      1      1     254    63     1023  16065     22491000

    * partition configuration after installation:

    # fdisk -W - c2t0d0p0
    ...
    * Id   Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
      192  0    0      1      1     254    63     1023  16065     22491000
      191  128  254    63     1023  254    63     1023  22507065  30000000

[2] When I rebooted the system after the installation, I ended up at the GRUB prompt:

    grub> root
     (hd0,1,a): Filesystem type unknown, partition type 0xbf

    grub> cat /rpool/boot/grub/menu.lst

    Error 17: Cannot mount selected partition

    grub>

[3] I rebooted into AI and did 'zpool import':

    # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
    # zpool import -f rpool
    # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
    # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
    7c7
    < txg=21
    ---
    > txg=2675
    9c9
    < hostid=4741222
    ---
    > hostid=4247690
    17a18
    > devid='id1,sd@f00c778e247ac7bd0000238460000/a'
    31c32
    ...
    # reboot

[4] Now GRUB can access menu.lst and Solaris boots.

hypothesis
----------
It seems that for some reason, when the ZFS pool was created, the 'devid' information was not added to the ZFS label.

When 'zpool import' was called, 'devid' got populated.

Looking at the GRUB ZFS plug-in, it seems that 'devid' (ZPOOL_CONFIG_DEVID attribute) is required in order to be able to access a ZFS filesystem:

In grub/grub-0.95/stage2/fsys_zfs.c:

vdev_get_bootpath()
{
...
    if (strcmp(type, VDEV_TYPE_DISK) == 0) {
        if (vdev_validate(nv) != 0 ||
            (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
            bootpath, DATA_TYPE_STRING, NULL) != 0) ||
            (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
            devid, DATA_TYPE_STRING, NULL) != 0))
                return (ERR_NO_BOOTPATH);
...
}

additional observations:
------------------------
[1] If 'devid' is populated during installation after the 'zpool create' operation, the problem doesn't occur.

[2] Following the described procedure, the problem is reproducible at will on the system where it was initially reproduced (please see above for the configuration).

[3] I was not able to reproduce it using exactly the same procedure on the following configurations:
    * Ferrari 4000 with 160GB IDE disk
    * vmware - installation done on IDE disk

[4] When installing into an existing Solaris2 partition containing a Solaris instance, 'devid' is always populated and the problem doesn't occur (it doesn't matter whether the partition is marked 'active' or not).

Attachments:
Name: zdb_before_import.txt
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081107/283953b5/attachment.txt>
Name: zdb_after_import.txt
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081107/283953b5/attachment-0001.txt>
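The condition that vdev_get_bootpath() tests can be approximated from the installer environment before rebooting. The following is only a sketch: `label_boot_check` is a hypothetical helper, not part of zdb or GRUB, and it assumes that `zdb -l` prints label entries one per line as `key=value` (as the diff output above suggests) and that the physical path appears under the key `phys_path`. It checks the dump as a whole and does not distinguish between the individual labels.

```shell
#!/bin/sh
# Hypothetical helper (not part of zdb or GRUB).
# Reads `zdb -l` output on stdin and reports whether the dump contains
# both attributes that vdev_get_bootpath() looks up: phys_path and devid.
# Assumption: label entries are printed one per line as key=value,
# optionally indented with spaces or tabs.
label_boot_check() {
    awk '
        /^[ \t]*devid=/     { devid = 1 }
        /^[ \t]*phys_path=/ { phys = 1 }
        END {
            if (!phys)  print "missing: phys_path"
            if (!devid) print "missing: devid"
            if (phys && devid) { print "ok"; exit 0 }
            exit 1
        }
    '
}
```

It could be used as `zdb -l /dev/rdsk/c2t0d0s0 | label_boot_check`, or fed one of the saved dumps with `label_boot_check < /tmp/zdb_before_import.txt`. A non-zero exit status means at least one of the two attributes was not found.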
Rob at Logan.com
2008-Nov-09 22:04 UTC
[zfs-discuss] Ended up in GRUB prompt after the installation on ZFS
> It seems that for some reason, when ZFS pool was
> created, 'devid' information was not added to the ZFS label.

I hit this installing osol-0811-101a-rc1b.iso on a recycled raidz disk. (Thanks for the import tip.)

We know that a secondary pool imports on reboot even if the disks change paths, i.e. the disk label devid and /etc/zfs/zpool.cache are not needed for that. Both will remain wrong until a scrub.

So perhaps the issue is with an EFI-labeled disk with old pool info getting converted to a VTOC label for the zfs root install.

Rob
jan damborsky
2008-Nov-10 10:35 UTC
[zfs-discuss] [caiman-discuss] Ended up in GRUB prompt after the installation on ZFS
I have filed the following bug in the 'solaris/kernel/zfs' category for tracking this issue:

6769487 Ended up in 'grub>' prompt after installation of OpenSolaris 2008.11 (build 101a)

Thank you,
Jan