Hello everyone,

I have a ZFS FreeBSD 8.2-STABLE (Aug 30, 2011) machine that I am having
issues with and could use some help.

In a nutshell, this machine has been running fine for about a year and a
half, but after a recent power outage (complete colo blackout) I can't
boot off the ZFS pool any more. Here's the error I get (screenshot
attached as well):

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type 0
ZFS: unexpected object set type 0

FreeBSD/x86 boot
Default: zroot:/boot/kernel/kernel
boot: ZFS: unexpected object set type 0

I've been searching the net high and low for an actual solution, but all
the threads end up nowhere. I hope I can get some clue here. Thanks in
advance.

Here's the relevant hardware configuration of this box (it serves as a
backup box):

- SuperMicro 4U + another 4U, totalling 48 x 2TB disks
- Hardware RAID LSI 9261-8i holding both shelves, presenting a single
  mfid0 device to the OS
- Hardware RAID 60 -- 6 x 8-disk RAID6 groups
- ZFS with gptzfsboot installed on the "single" mfid0 device. Partition
  table is:

[root@mfsbsd /zroot/etc]# gpart show -l
=>          34  140554616765  mfid0  GPT     (65T)
            34           128      1  (null)  (64k)
           162      33554432      2  swap    (16G)
      33554594  140521062205      3  zroot   (65T)

- boot device is: vfs.root.mountfrom="zfs:zroot" (as per loader.conf)
- zpool status is:

[root@mfsbsd /zroot/etc]# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub canceled on Mon Apr  9 09:48:14 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mfid0p3   ONLINE       0     0     0

errors: No known data errors

- zpool get all:

[root@mfsbsd /zroot/etc]# zpool get all zroot
NAME   PROPERTY       VALUE                SOURCE
zroot  size           65T                  -
zroot  capacity       36%                  -
zroot  altroot        -                    default
zroot  health         ONLINE               -
zroot  guid           3339338746696340707  default
zroot  version        28                   default
zroot  bootfs         zroot                local
zroot  delegation     on                   default
zroot  autoreplace    off                  default
zroot  cachefile      -                    default
zroot  failmode       wait                 default
zroot  listsnapshots  on                   local
zroot  autoexpand     off                  default
zroot  dedupditto     0                    default
zroot  dedupratio     1.00x                -
zroot  free           41.2T                -
zroot  allocated      23.8T                -
zroot  readonly       off                  -

Here's what happened, chronologically:

- Savvis Toronto blacked out completely for 31 minutes.
- After power was restored, this machine came up with the above error.
- I managed to PXE boot into mfsbsd successfully and was able to import
  the pool and access the actual data/snapshots -- no problem.
- Shortly after another reboot, the hardware RAID controller complained
  that it had lost its configuration and now saw only half of the disks
  as foreign good and the rest as foreign bad. The BIOS didn't see any
  boot device.
- Spent some time on the phone with LSI and managed to restore the
  hardware RAID by removing any and all configuration, marking the disks
  unconfigured good, and recreating the array exactly the way I created
  it in the beginning, BUT with the important exception that I did NOT
  initialize the array.
- After this I was back to square one: I can see all the data without
  any loss (via mfsbsd) but cannot boot off the volume any more.
- First thing I tried was to restore the boot loader, without any luck:
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 mfid0p1
- Then, out of desperation, I took zfsboot, zfsloader and gptzfsboot from
  9.0-RELEASE, replaced them in /boot and reinstalled again -- no luck.
- Currently running zdb -ccv zroot to check for any corruption. I am
  afraid this will take forever since I have 23.8T of used space. No
  errors yet.
- One thing I did notice is that zdb zroot returned the metaslab
  information line by line very slowly (10-15 seconds per line). I don't
  know if it's related.
- Another thing I tried (saw it in a thread), without any difference
  whatsoever, was:

# cd src/sys/boot/i386/zfsboot
# make clean; make cleandir
# make obj ; make depend ; make
# cd i386/loader
# make install
# cd /usr/src/sys/boot/i386/zfsboot
# make install
# sysctl kern.geom.debugflags=16
# dd if=/boot/zfsboot of=/dev/da0 count=1
# dd if=/boot/zfsboot of=/dev/da0 skip=1 seek=1024
# reboot

At this point I am contemplating how to evacuate all the data from there,
or better yet put in some USB flash to boot from.

I could provide further details/execute commands if needed. Any help
would be appreciated.

Thank you,

-- 
Rumen Telbizov
http://telbizov.com
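On the "USB flash to boot from" idea above: a minimal sketch of a stick
that carries only /boot on UFS, so the kernel is loaded from the stick and
then mounts the ZFS root, bypassing gptzfsboot's reading of the pool
entirely. The device name da1 and the label usbboot are placeholders, and
it assumes the pool is still mounted under /zroot as in the prompts above:

# Partition the stick: a small freebsd-boot partition plus one UFS partition.
gpart create -s gpt da1
gpart add -t freebsd-boot -s 128 da1                     # 128 sectors = 64 kB
gpart add -t freebsd-ufs -l usbboot da1
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da1   # UFS boot blocks, not gptzfsboot

# Copy the pool's /boot (loader, kernel, modules, zpool.cache) onto the stick.
newfs -U /dev/gpt/usbboot
mount /dev/gpt/usbboot /mnt
cp -Rp /zroot/boot /mnt/

# /mnt/boot/loader.conf on the stick must load ZFS and point at the pool:
#   zfs_load="YES"
#   vfs.root.mountfrom="zfs:zroot"
umount /mnt

The stick only carries the boot chain; everything else keeps living on
zroot, so this can work even while the pool itself refuses to boot.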
It seems your RAID controller goofed something up. I wonder why you used
the hardware controller for RAID60 instead of ZFS (using each drive as a
single-drive array, or JBOD).

About the only way I can think of out of this situation, short of having a
second box or a huge tape backup, is to convert it to proper ZFS... in
place :) It helps that you have used less than half of your capacity. You
could do this:

1. Remove as many drives from the RAID60 as you can without breaking
   redundancy.
2. Create single-drive volumes out of these, or just JBOD the drives if
   this controller can.
3. Create a new zpool with these drives, again "RAID60" if you like, that
   is, vdevs of raidz2. You need enough drives to hold at least 24TB of
   data. You may 'trick' ZFS by creating raidz2 groups of, say, 8 drives
   (I guess this is what you mean by having 6x8 RAID60), then remove two
   drives from each vdev and use them for the next vdev. I guess two
   raidz2 groups of 8 drives will be OK, as that gives you 24TB -- in
   fact slightly less, because the drives aren't really 2TB, so you may
   need 3x8 groups... You can create 3x8 groups without redundancy
   (actually 3x6) with only 18 drives.
4. Copy your old zpool over to the new one with zfs send and zfs receive
   (a rough sketch follows below).
5. Rename both pools so that the new pool becomes zroot.
6. Boot off your new pool.
7. If everything is OK, destroy the remains of the RAID60, put the two
   missing drives back into each non-redundant vdev, and add the other
   drives as 8-drive vdevs to the zpool.
8. Now you have a true ZFS pool, and ZFS will be able to detect and
   recover from any data errors on your drives.

With a pool this huge and this much data it is very risky to depend on
'hardware RAID', especially when ZFS is available.

Thinking about this again, you may not be able to extract enough 'spare'
drives out of the RAID60 pool. You may get up to 12 drives, and that is
not enough to hold all your data. One option is to replace those with
larger-capacity drives, 3TB or 4TB -- ZFS has no problem with
different-size drives in a zpool, but you would have to check whether this
LSI controller can handle drives larger than 2TB. Or you might be able to
remove some data from the pool. Another option is to attach a spare drive
chassis to handle the transition.

In any case, my advice is to stay away from ZFS on top of 'hardware RAID'.
ZFS will be able to detect data corruption, but it cannot do anything
about it except inform you, even if you have lots of redundant drives. Use
ZFS as designed.

Daniel
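Steps 4-6 of the plan above, as a minimal command sketch. The pool name
newpool, the snapshot name @migrate and the temporary name oldzroot are
placeholders, and it assumes the copy is done from a rescue environment
such as mfsbsd (importing the pools with an altroot, e.g.
'zpool import -R /mnt <pool>', keeps the received mountpoints from
colliding with the running system):

# Snapshot everything on the old pool, then replicate the full hierarchy
# (datasets, snapshots and properties) into the new pool.
zfs snapshot -r zroot@migrate
zfs send -R zroot@migrate | zfs receive -F -d newpool

# Swap the names: export both pools, re-import the old one under a
# temporary name and the new one as zroot, then mark it bootable.
zpool export zroot
zpool export newpool
zpool import zroot oldzroot
zpool import newpool zroot
zpool set bootfs=zroot zroot

# The boot code (gptzfsboot on GPT, as in the original setup) still has to
# be installed on whatever devices the new pool lives on before rebooting.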
on 09/04/2012 21:50 Rumen Telbizov said the following:
> In a nutshell, this machine has been running fine for about a year and
> a half, but after a recent power outage (complete colo blackout) I
> can't boot off the ZFS pool any more.
> [...]
> I've been searching the net high and low for an actual solution, but
> all the threads end up nowhere. I hope I can get some clue here.
> Thanks in advance.

Not sure if the following could be of any help to you, but the
${SRC}/tools/tools/zfsboottest utility can help with diagnosing and
debugging such issues from userland (without requiring a reboot).

See also a small nitpick below.
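Roughly how one might get at that tool from the mfsbsd environment -- this
assumes a /usr/src tree matching the installed world is available, and the
exact arguments it expects are best taken from the usage line it prints,
since they have changed over time:

# Build the userland ZFS boot-code tester from the source tree.
cd /usr/src/tools/tools/zfsboottest
make

# Running the binary prints its usage; it is then pointed at the pool's
# vdev (mfid0p3 in this setup) to exercise the same read paths the boot
# blocks use, without rebooting.
./zfsboottest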
> [...]
> - Another thing I tried (saw it in a thread), without any difference
>   whatsoever, was:
>
> # cd src/sys/boot/i386/zfsboot
> # make clean; make cleandir
> # make obj ; make depend ; make
> # cd i386/loader

You probably wanted to do this in i386/zfsloader.

> # make install
> # cd /usr/src/sys/boot/i386/zfsboot
> # make install
> # sysctl kern.geom.debugflags=16
> # dd if=/boot/zfsboot of=/dev/da0 count=1
> # dd if=/boot/zfsboot of=/dev/da0 skip=1 seek=1024
> # reboot

-- 
Andriy Gapon
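Following up on the nitpick, a sketch of one way to rebuild and reinstall
the boot bits: building all of sys/boot sidesteps the zfsboot/zfsloader
directory mix-up, and gpart bootcode is then pointed at the disk that
carries the partition table (mfid0 here), with -i 1 selecting the
freebsd-boot partition from the gpart output earlier in the thread:

# Rebuild every boot component (zfsboot, gptzfsboot, zfsloader, loader, ...)
# and install them into /boot.
cd /usr/src/sys/boot
make obj && make depend && make && make install

# Rewrite the protective MBR and the ZFS-aware GPT boot code on the disk.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 mfid0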