jan damborsky
2008-Nov-07 17:29 UTC
[zfs-discuss] Ended up in GRUB prompt after the installation on ZFS
Hi ZFS team,
when testing installation with recent OpenSolaris builds,
we have seen that in some cases people end up at the GRUB
prompt after the installation - it seems that menu.lst
can't be accessed for some reason. At least the following two
bugs seem to describe the same manifestation of the problem,
whose root cause has not been identified yet:
4051 opensolaris b99b/b100a does not install on 1.5 TB disk or boot
fails after install
4591 Install failure on a Sun Fire X4240 with Opensolaris 200811
Nothing in those bug reports indicates that they are ZFS related,
and there are probably more scenarios in which this can happen.
But when I hit the problem today while testing the Automated Installer
(it is part of the Caiman project and will replace the current JumpStart
install technology), I was able to make GRUB find 'menu.lst' just by
using the 'zpool import' command - please see below for the detailed
procedure.
Based on this, could you please take a look at these observations
and, if possible, help me understand whether anything is obviously
wrong and whether you think this is somehow related to ZFS?
Thank you very much for your help,
Jan
configuration:
--------------
HW: Ultra 20, 1GB RAM, 1 250GB SATA drive
SW: OpenSolaris build 100, 64-bit mode
steps used:
-----------
[1] OpenSolaris 100 installed using Automated Installer
- Solaris 2 partition created during installation
* partition configuration before installation:
# fdisk -W - c2t0d0p0
...
* Id  Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
  192 0    0      1      1     254    63     1023  16065     22491000
* partition configuration after installation:
# fdisk -W - c2t0d0p0
...
* Id  Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl  Rsect     Numsect
  192 0    0      1      1     254    63     1023  16065     22491000
  191 128  254    63     1023  254    63     1023  22507065  30000000
[2] When I rebooted the system after the installation, I ended up at the GRUB
prompt:
grub> root
(hd0,1,a): Filesystem type unknown, partition type 0xbf
grub> cat /rpool/boot/grub/menu.lst
Error 17: Cannot mount selected partition
grub>
[3] I rebooted into AI and ran 'zpool import'
# zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached)
# zpool import -f rpool
# zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached)
# diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt
7c7
< txg=21
---
> txg=2675
9c9
< hostid=4741222
---
> hostid=4247690
17a18
> devid='id1,sd@f00c778e247ac7bd0000238460000/a'
31c32
...
# reboot
[4] Now GRUB can access menu.lst and Solaris is booted
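The label comparison from step [3] can also be done programmatically. What follows is only a minimal Python sketch, assuming the 'zdb -l' dump consists of 'key=value' lines as the diff above suggests. The helper names parse_label and label_changes are hypothetical, and the sample text holds only the three fields visible in the diff, not a complete label.

```python
def parse_label(text):
    """Parse 'key=value' lines of a label dump into a dict (first occurrence wins)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop the surrounding single quotes that string values carry.
        fields.setdefault(key.strip(), value.strip().strip("'"))
    return fields


def label_changes(before, after):
    """Return (added, changed) key lists between two parsed labels."""
    added = sorted(k for k in after if k not in before)
    changed = sorted(k for k in after if k in before and before[k] != after[k])
    return added, changed


# Sample data: only the fields shown in the diff above.
BEFORE = """
txg=21
hostid=4741222
"""

# devid is written with '@' here (assumed; the list archive may show ' at ').
AFTER = """
txg=2675
hostid=4247690
devid='id1,sd@f00c778e247ac7bd0000238460000/a'
"""

if __name__ == "__main__":
    added, changed = label_changes(parse_label(BEFORE), parse_label(AFTER))
    print("added:", added)
    print("changed:", changed)
```

With the sample data, 'devid' is the only added key, while 'txg' and 'hostid' are reported as changed, which matches the diff output in step [3].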
hypothesis
----------
It seems that for some reason, when the ZFS pool was
created, the 'devid' information was not added to the
ZFS label.
When 'zpool import' was called, 'devid' got populated.
Looking at the GRUB ZFS plug-in, it seems that 'devid'
(the ZPOOL_CONFIG_DEVID attribute) is required in order to
be able to access the ZFS filesystem:
In grub/grub-0.95/stage2/fsys_zfs.c:

    vdev_get_bootpath()
    {
        ...
        if (strcmp(type, VDEV_TYPE_DISK) == 0) {
            if (vdev_validate(nv) != 0 ||
                (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH,
                bootpath, DATA_TYPE_STRING, NULL) != 0) ||
                (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID,
                devid, DATA_TYPE_STRING, NULL) != 0))
                return (ERR_NO_BOOTPATH);
        ...
    }
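
To make the condition in the excerpt above easier to follow, here is a small Python paraphrase of it. This is a sketch, not GRUB code: get_bootpath and the validate callback are hypothetical stand-ins (the body of vdev_validate() is not shown in the excerpt), the vdev is modelled as a plain dict, and the key strings "disk", "phys_path" and "devid" are the values ZFS defines for VDEV_TYPE_DISK, ZPOOL_CONFIG_PHYS_PATH and ZPOOL_CONFIG_DEVID.

```python
ERR_NO_BOOTPATH = "ERR_NO_BOOTPATH"  # stand-in for GRUB's error code


def get_bootpath(vdev, validate=lambda v: True):
    """Mimic the quoted condition: a disk vdev needs both phys_path and devid.

    Returns (error, bootpath_info); exactly one of the two is None.
    """
    if vdev.get("type") != "disk":
        raise ValueError("only the 'disk' branch of the excerpt is modelled")
    # Any one of the three failures is enough to give up on the boot path.
    if not validate(vdev) or "phys_path" not in vdev or "devid" not in vdev:
        return ERR_NO_BOOTPATH, None
    return None, (vdev["phys_path"], vdev["devid"])


if __name__ == "__main__":
    # Placeholder values, not taken from a real label.
    label_without_devid = {"type": "disk", "phys_path": "/example/path"}
    label_with_devid = dict(label_without_devid, devid="id1,example")
    print(get_bootpath(label_without_devid))
    print(get_bootpath(label_with_devid))
```

Under this reading, a label that carries 'phys_path' but no 'devid' fails the lookup, which would be consistent with GRUB being unable to mount the pool before 'zpool import' populated 'devid'.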
additional observations:
------------------------
[1] If 'devid' is populated during installation after the
'zpool create' operation, the problem doesn't occur.
[2] When following the described procedure, the problem is reproducible
at will on the system where it was initially reproduced
(please see above for the configuration).
[3] I was not able to reproduce it using exactly the same
procedure on the following configurations:
* Ferrari 4000 with 160GB IDE disk
* VMware - installation done on IDE disk
[4] When the installation is done into an existing Solaris2 partition
containing a Solaris instance, 'devid' is always populated and the
problem doesn't occur
(it doesn't matter whether the partition is marked 'active' or not).
-------------- next part --------------
Name: zdb_before_import.txt
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081107/283953b5/attachment.txt>
-------------- next part --------------
Name: zdb_after_import.txt
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081107/283953b5/attachment-0001.txt>
Rob at Logan.com
2008-Nov-09 22:04 UTC
[zfs-discuss] Ended up in GRUB prompt after the installation on ZFS
> It seems that for some reason, when ZFS pool was
> created, 'devid' information was not added to the ZFS label.

I hit this installing osol-0811-101a-rc1b.iso on a recycled raidz disk
(thanks for the import tip).

I know that a secondary pool imports on reboot even if the disks change
paths, i.e. the disk label devid and /etc/zfs/zpool.cache are unnecessary
there. Both will remain wrong until a scrub.

So perhaps the issue is with an EFI-labeled disk with old pool info
getting converted to a VTOC label for the ZFS root install.

Rob
jan damborsky
2008-Nov-10 10:35 UTC
[zfs-discuss] [caiman-discuss] Ended up in GRUB prompt after the installation on ZFS
I have filed the following bug in the 'solaris/kernel/zfs' category for tracking this issue:
6769487 Ended up in 'grub>' prompt after installation of OpenSolaris 2008.11 (build 101a)
Thank you,
Jan