Hello *,

we're running a local zone located on an iSCSI device and see zpools faulting at each reboot of the server.

[b]
$ zpool list
NAME     SIZE  ALLOC   FREE  CAP  HEALTH   ALTROOT
data     168G   127G  40.3G  75%  ONLINE   -
iscsi1      -      -      -    -  FAULTED  -

$ zpool status iscsi1
  pool: iscsi1
 state: UNAVAIL
status: One or more devices could not be opened.
        There are insufficient replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        iscsi1      UNAVAIL      0     0     0  insufficient replicas
          c8t1d0    UNAVAIL      0     0     0  cannot open
[/b]

It seems that the zpool is being accessed before the iSCSI device is online. Currently we cure the problem temporarily by exporting and re-importing the faulted zpool:

[b]
$ zpool export iscsi1
$ zpool import iscsi1
$ zpool list
NAME     SIZE  ALLOC   FREE  CAP  HEALTH  ALTROOT
iscsi1  21.4G  5.66G  15.7G  26%  ONLINE  -
[/b]

Upgrading the zpool from version 15 to version 22 didn't help.

Is this a known problem? Any hints available?

The server is running pre-update-9 with the kernel patched to 142909-17.

- Andreas
After a while I was able to track down the problem.

During the boot process the service filesystem/local gets enabled long before iscsi/initiator. The start method of filesystem/local mounts ufs, swap and everything else from /etc/vfstab. Some recent patch added a "zfs mount -a" to the filesystem/local start method. Of course "zfs mount -a" will not find our iSCSI zpools and should do nothing at all. But in our case it finds and imports a zpool "data" located on a locally attached disk. I suspect that the zpool management information stored within zpool "data" contains a pointer to the (at that time) inaccessible zpool "iscsi1", which gets marked as "FAULTED". Therefore "zpool list" shows that missing zpool as "FAULTED", since all devices of iscsi1 are inaccessible.

Some time later in the boot process iscsi/initiator gets enabled. After all iscsi targets come online, one would expect the zpool to change state, because all of its devices are available now. But that does not happen.

As a workaround I've written a service manifest and start method. The service is fired up after iscsi/initiator. The start method scans the output of "zpool list" for faulted pools with a size of "-". For each faulted zpool it does a "zpool export pool-name" followed by a re-import with "zpool import pool-name". After that the previously faulted iscsi zpool will be online again (unless you have other problems).

But be aware, there is a race condition: you have to wait some time (at least 10 secs) between export and re-import of a zpool. Without a wait in between, the fault condition will not get cleared.

Again: you may encounter this case only if...
... you're running some recent kernel patch level (in our case 142909-17)
... *and* you have placed zpools on both iscsi and non-iscsi devices.

- Andreas
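Andreas did not post his actual script, but a minimal sketch of such an SMF start method could look like the one below. It assumes the accompanying manifest declares a dependency on svc:/network/iscsi/initiator so the method runs after the targets are reachable, and that a faulted iscsi pool can be recognised purely by the "-" size column of "zpool list"; the column selection via "-H -o name,size,health" and the 15-second sleep are illustrative choices, not his exact code.

#!/bin/sh
#
# Sketch of a start method that re-imports zpools which faulted because
# their iSCSI devices were not yet online at boot.  Assumes the service
# manifest makes this run after svc:/network/iscsi/initiator.

case "$1" in
'start')
        # -H: omit headers, tab-separated output; print name, size and health.
        zpool list -H -o name,size,health | while read name size health
        do
                # A pool imported while all of its devices were missing
                # reports "-" as its size and a FAULTED/UNAVAIL state.
                if [ "$size" = "-" ]; then
                        echo "Re-importing faulted pool: $name"
                        zpool export "$name"
                        sleep 15   # wait at least 10 secs or the fault will not clear
                        zpool import "$name"
                fi
        done
        ;;
*)
        echo "Usage: $0 start"
        exit 1
        ;;
esac
exit 0

Pools that are merely degraded still report a real size, so the test on "-" limits the export/import cycle to pools whose devices were all missing at boot time.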
> Again: you may encounter this case only if...
> ... you're running some recent kernel patch level (in our case 142909-17)
> ... *and* you have placed zpools on both iscsi and non-iscsi devices.

Witnessed the same behaviour with osol_134, but it seems to be fixed in build 147 at least. No idea about Solaris proper, though.

Yours
Markus Kovero
From Oracle Support we got the following info:

Bug ID: 6992124  reboot of Sol10 u9 host makes zpool FAULTED when zpool uses iscsi LUNs
This is a duplicate of:
Bug ID: 6907687  zfs pool is not automatically fixed when disk are brought back online or after boot

An IDR patch already exists, but no official patch yet.

- Andreas
On Tue, Nov 09, 2010 at 04:18:17AM -0800, Andreas Koppenhoefer wrote:
> From Oracle Support we got the following info:
>
> Bug ID: 6992124  reboot of Sol10 u9 host makes zpool FAULTED when zpool uses iscsi LUNs
> This is a duplicate of:
> Bug ID: 6907687  zfs pool is not automatically fixed when disk are brought back online or after boot
>
> An IDR patch already exists, but no official patch yet.

Do you know if these bugs are fixed in Solaris 11 Express?

-- Pasi
On Tue, November 30, 2010 14:09, Pasi Kärkkäinen wrote:
>> Bug ID: 6907687  zfs pool is not automatically fixed when disk are
>> brought back online or after boot
>>
>> An IDR patch already exists, but no official patch yet.
>
> Do you know if these bugs are fixed in Solaris 11 Express?

It says it was fixed in snv_140, and S11E is based on snv_151a, so it should be in:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6907687
>> Do you know if these bugs are fixed in Solaris 11 Express?
>
> It says it was fixed in snv_140, and S11E is based on snv_151a, so it
> should be in:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6907687

I can confirm it works; iscsi zpools seem to work very happily now.

Yours
Markus Kovero