Mykola Ivanets
2018-Jan-12 14:07 UTC
[Libguestfs] [PATCH 1/1] appliance: init: Avoid running degraded md devices
The '--no-degraded' flag in the first mdadm call inhibits the startup of an
array unless all expected drives are present. This prevents starting arrays
in a degraded state. The second mdadm call (after LVM is scanned) scans the
devices that are still unused and attempts to run all found arrays, even if
they are in a degraded state.

Two new tests are added.

This fixes rhbz1527852.

Here is boot-benchmark before and after the patch (no performance issues):

              : libvirt backend : direct backend
------------------------------------------------
master        : 835.2ms ±1.1ms  : 670.4ms ±0.3ms
master+patch  : 837.7ms ±2.4ms  : 671.8ms ±0.2ms
---
 appliance/init                      |   7 +-
 tests/md/Makefile.am                |   2 +
 tests/md/test-lvm-on-md-device.sh   |  84 ++++++++++++++++++++++
 tests/md/test-md-and-lvm-devices.sh | 138 ++++++++++++++++++++++++++++++++++++
 4 files changed, 230 insertions(+), 1 deletion(-)
 create mode 100755 tests/md/test-lvm-on-md-device.sh
 create mode 100755 tests/md/test-md-and-lvm-devices.sh

diff --git a/appliance/init b/appliance/init
index c04ee45..4284be0 100755
--- a/appliance/init
+++ b/appliance/init
@@ -131,13 +131,17 @@ if test "$guestfs_network" = 1; then
 fi
 
 # Scan for MDs.
-mdadm -As --auto=yes --run
+# Inhibits the startup of array unless all expected drives are present.
+mdadm -As --no-degraded
 
 # Scan for LVM.
 modprobe dm_mod ||:
 
 lvm vgchange -aay --sysinit
 
+# Scan for MDs once again and finally run them.
+mdadm -As --auto=yes --run
+
 # Scan for Windows dynamic disks.
 ldmtool create all
 
@@ -146,6 +150,7 @@ if test "$guestfs_verbose" = 1 && test "$guestfs_boot_analysis" != 1; then
     uname -a
     ls -lR /dev
     cat /proc/mounts
+    cat /proc/mdstat
     lvm pvs
     lvm vgs
     lvm lvs
diff --git a/tests/md/Makefile.am b/tests/md/Makefile.am
index 5e18dca..6a5f4d6 100644
--- a/tests/md/Makefile.am
+++ b/tests/md/Makefile.am
@@ -22,6 +22,8 @@ TESTS = \
 	test-inspect-fstab-md.sh \
 	test-list-filesystems.sh \
 	test-list-md-devices.sh \
+	test-lvm-on-md-device.sh \
+	test-md-and-lvm-devices.sh \
 	test-mdadm.sh
 
 TESTS_ENVIRONMENT = $(top_builddir)/run --test
diff --git a/tests/md/test-lvm-on-md-device.sh b/tests/md/test-lvm-on-md-device.sh
new file mode 100755
index 0000000..ab00f89
--- /dev/null
+++ b/tests/md/test-lvm-on-md-device.sh
@@ -0,0 +1,84 @@
+#!/bin/bash -
+# libguestfs
+# Copyright (C) 2011 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+# Test guestfish finds:
+# 1. md device created from physical block device and LV(s),
+# 2. md device created from LVs
+# 3. VG created from md device(s)
+#
+# raid0 is used for md device because it is inoperable if one of its component is inaccessible.
+# This covers rhbz1527852.
+
+set -e
+
+$TEST_FUNCTIONS
+skip_if_skipped
+
+disk1=lvm-on-md-devices-1.img
+disk2=lvm-on-md-devices-2.img
+
+rm -f $disk1 $disk2
+
+# Clean up if the script is killed or exits early
+cleanup ()
+{
+    status=$?
+    set +e
+
+    # Don't delete the output files if non-zero exit
+    if [ "$status" -eq 0 ]; then rm -f $disk1 $disk2; fi
+
+    exit $status
+}
+trap cleanup INT QUIT TERM EXIT
+
+guestfish <<EOF
+# Add 2 empty disks
+sparse $disk1 100M
+sparse $disk2 100M
+run
+
+# Create a raid1 based on the 2 disks
+md-create test "/dev/sda /dev/sdb" level:raid1
+
+# Create volume group and logical volume on md device
+pvcreate /dev/md127
+vgcreate vg0 /dev/md127
+lvcreate-free lv0 vg0 100
+EOF
+
+# Ensure list-md-devices now returns the newly created md device
+output=$(
+guestfish --format=raw -a $disk1 --format=raw -a $disk2 <<EOF
+run
+list-md-devices
+lvs
+EOF
+)
+
+expected="/dev/md127
+/dev/vg0/lv0"
+
+if [ "$output" != "$expected" ]; then
+    echo "$0: error: actual output did not match expected output"
+    echo "$output"
+    exit 1
+fi
+
+# cleanup() is called implicitly which cleans up everything
+exit 0
\ No newline at end of file
diff --git a/tests/md/test-md-and-lvm-devices.sh b/tests/md/test-md-and-lvm-devices.sh
new file mode 100755
index 0000000..46bed6d
--- /dev/null
+++ b/tests/md/test-md-and-lvm-devices.sh
@@ -0,0 +1,138 @@
+#!/bin/bash -
+# libguestfs
+# Copyright (C) 2011 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+# Test guestfish finds:
+# 1. md device created from physical block device and LV(s),
+# 2. md device created from LVs
+# 3. VG created from md device(s)
+#
+# raid0 is used for md device because it is inoperable if one of its component is inaccessible.
+# This covers rhbz1527852.
+
+set -e
+
+$TEST_FUNCTIONS
+skip_if_skipped
+
+disk1=md-and-lvm-devices-1.img
+disk2=md-and-lvm-devices-2.img
+
+rm -f $disk1 $disk2
+
+# Clean up if the script is killed or exits early
+cleanup ()
+{
+    status=$?
+    set +e
+
+    # Don't delete the output files if non-zero exit
+    if [ "$status" -eq 0 ]; then rm -f $disk1 $disk2; fi
+
+    exit $status
+}
+trap cleanup INT QUIT TERM EXIT
+
+# Create 2 disks partitioned as:
+# sda1: 20M MD (md127)
+# sda2: 20M PV (vg1)
+# sda3: 20M MD (md125)
+#
+# sdb1: 20M PV (vg0)
+# sdb2: 20M PV (vg2)
+# sdb3: 20M MD (md125)
+#
+# lv0 : LV (vg0)
+# lv1 : LV (vg1)
+# lv2 : LV (vg2)
+# md127 : md (sda1, lv0)
+# md126 : md (lv1, lv2)
+# md125 : md (sda3, sdb3)
+# vg3 : VG (md125)
+# lv3 : LV (vg3)
+
+guestfish <<EOF
+# Add 2 empty disks
+sparse $disk1 100M
+sparse $disk2 100M
+run
+
+# Partition disks
+part-init /dev/sda mbr
+part-add /dev/sda p 64 41023
+part-add /dev/sda p 41024 81983
+part-add /dev/sda p 81984 122943
+part-init /dev/sdb mbr
+part-add /dev/sdb p 64 41023
+part-add /dev/sdb p 41024 81983
+part-add /dev/sdb p 81984 122943
+
+# Create volume group and logical volume on sdb1
+pvcreate /dev/sdb1
+vgcreate vg0 /dev/sdb1
+lvcreate-free lv0 vg0 100
+
+# Create md from sda1 and vg0/lv0
+md-create md-sda1-lv0 "/dev/sda1 /dev/vg0/lv0" level:raid0
+
+# Create volume group and logical volume on sda2
+pvcreate /dev/sda2
+vgcreate vg1 /dev/sda2
+lvcreate-free lv1 vg1 100
+
+# Create volume group and logical volume on sdb2
+pvcreate /dev/sdb2
+vgcreate vg2 /dev/sdb2
+lvcreate-free lv2 vg2 100
+
+# Create md from vg1/lv1 and vg2/lv2
+md-create md-lv1-lv2 "/dev/vg1/lv1 /dev/vg2/lv2" level:raid0
+
+# Create md from sda3 and sdb3
+md-create md-sda3-sdb3 "/dev/sda3 /dev/sdb3" level:raid0
+
+# Create volume group and logical volume on md125 (last created md)
+pvcreate /dev/md125
+vgcreate vg3 /dev/md125
+lvcreate-free lv3 vg3 100
+EOF
+
+# Ensure list-md-devices now returns the newly created md device
+output=$(
+guestfish --format=raw -a $disk1 --format=raw -a $disk2 <<EOF
+run
+list-md-devices
+lvs
+EOF
+)
+
+expected="/dev/md125
+/dev/md126
+/dev/md127
+/dev/vg0/lv0
+/dev/vg1/lv1
+/dev/vg2/lv2
+/dev/vg3/lv3"
+
+if [ "$output" != "$expected" ]; then
+    echo "$0: error: actual output did not match expected output"
+    echo "$output"
+    exit 1
+fi
+
+# cleanup() is called implicitly which cleans up everything
+exit 0
\ No newline at end of file
-- 
2.9.5
Mykola Ivanets
2018-Jan-14 22:28 UTC
[Libguestfs] [PATCH v2 0/1] appliance: init: Avoid running degraded md devices
This is the second version of the previously sent patch. I've added a clearer explanation of what it fixes and why the issue happened in the first place. I've also split the patch into the fix itself and two independent tests:

[PATCH v2 1/3] appliance: init: Avoid running degraded md devices
[PATCH v2 2/3] tests: md: test guestfish finds logical volume on md
[PATCH v2 3/3] tests: md: test guestfish finds md and LV devices in
Mykola Ivanets
2018-Jan-14 22:28 UTC
[Libguestfs] [PATCH v2 1/3] appliance: init: Avoid running degraded md devices
The issue:
- raid1 will be in a degraded state if one of its components is a logical
  volume (LV)
- raid0 will be inoperable (inaccessible from within the appliance) if one
  of its components is an LV
- raidN: you can expect the same issue for any raid level, depending on how
  many components are inaccessible at the time mdadm is running and on the
  raid redundancy.

It happens because mdadm is launched prior to lvm AND it is instructed to
run found arrays immediately (--run flag) regardless of the completeness of
their components. Later (when lvm activates the found LVs) the md signature
on an LV might be recognized, BUT the newly found raid components can't be
inserted into arrays that are already running (in a degraded state) or that
are marked as inoperable.

The patch fixes the issue in the following way:

1. Found arrays won't be run immediately unless ALL expected drives
   (components) are present. This is where the '--no-degraded' flag comes
   into play. See mdadm(8).
2. The second mdadm call (after LVM is scanned) will scan the devices that
   are still UNUSED and attempt to run all found arrays (even if they will
   be in a degraded state).

There is no performance penalty because the second pass scans only the
devices that are still UNUSED. Here is 'boot-benchmark' before and after
the patch:

              : libvirt backend : direct backend
------------------------------------------------
master        : 835.2ms ±1.1ms  : 670.4ms ±0.3ms
master+patch  : 837.7ms ±2.4ms  : 671.8ms ±0.2ms
---
 appliance/init | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/appliance/init b/appliance/init
index c04ee45..6a61a8d 100755
--- a/appliance/init
+++ b/appliance/init
@@ -130,14 +130,16 @@ if test "$guestfs_network" = 1; then
     fi
 fi
 
-# Scan for MDs.
-mdadm -As --auto=yes --run
+# Scan for MDs but don't run arrays unless all expected drives are present
+mdadm -As --auto=yes --no-degraded
 
 # Scan for LVM.
 modprobe dm_mod ||:
-
 lvm vgchange -aay --sysinit
 
+# Scan for MDs and run all found arrays even they are in degraded state
+mdadm -As --auto=yes --run
+
 # Scan for Windows dynamic disks.
 ldmtool create all
 
@@ -146,6 +148,7 @@ if test "$guestfs_verbose" = 1 && test "$guestfs_boot_analysis" != 1; then
     uname -a
     ls -lR /dev
     cat /proc/mounts
+    cat /proc/mdstat
     lvm pvs
     lvm vgs
     lvm lvs
-- 
2.9.5
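As an aside for readers who want to see the symptom described in the commit message, the following is a minimal sketch and not part of the patch. It assumes the usual /proc/mdstat layout, in which a redundant array's status line ends with a member map such as [UU] or [U_]; the helper name degraded_arrays is hypothetical. It prints the names of arrays whose member map contains an underscore, i.e. arrays running with a missing component.

```shell
#!/bin/bash
# Hypothetical helper (not from the patch): read /proc/mdstat-style text on
# stdin and print the names of md arrays that are running degraded.
# Assumption: the status line of a redundant array ends with a member map
# such as [UU] (healthy) or [U_] (one component missing).
degraded_arrays ()
{
    awk '
        # Remember the array name from lines like "md127 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The member map is the last field of the following status line
        /\[[U_]+\]/ && name != "" {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}

# Typical use inside a running appliance or on a host:
#   degraded_arrays < /proc/mdstat
```

With the old ordering, a raid1 that has an LV as one of its components would be expected to show up in this output, because the array was started before LVM activated the LV. A raid0 array has no member map in /proc/mdstat, so an incomplete raid0 is simply not listed as active rather than reported as degraded.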
Mykola Ivanets
2018-Jan-14 22:28 UTC
[Libguestfs] [PATCH v2 2/3] tests: md: test guestfish finds logical volume on md device
Test guestfish finds logical volume (LV) created on md device
---
 tests/md/Makefile.am              |  1 +
 tests/md/test-lvm-on-md-device.sh | 80 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100755 tests/md/test-lvm-on-md-device.sh

diff --git a/tests/md/Makefile.am b/tests/md/Makefile.am
index 5e18dca..fe94947 100644
--- a/tests/md/Makefile.am
+++ b/tests/md/Makefile.am
@@ -22,6 +22,7 @@ TESTS = \
 	test-inspect-fstab-md.sh \
 	test-list-filesystems.sh \
 	test-list-md-devices.sh \
+	test-lvm-on-md-device.sh \
 	test-mdadm.sh
 
 TESTS_ENVIRONMENT = $(top_builddir)/run --test
diff --git a/tests/md/test-lvm-on-md-device.sh b/tests/md/test-lvm-on-md-device.sh
new file mode 100755
index 0000000..4c341de
--- /dev/null
+++ b/tests/md/test-lvm-on-md-device.sh
@@ -0,0 +1,80 @@
+#!/bin/bash -
+# libguestfs
+# Copyright (C) 2018 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+# Test guestfish finds logical volume (LV) created on md device
+
+set -e
+
+$TEST_FUNCTIONS
+skip_if_skipped
+
+disk1=lvm-on-md-devices-1.img
+disk2=lvm-on-md-devices-2.img
+
+rm -f $disk1 $disk2
+
+# Clean up if the script is killed or exits early
+cleanup ()
+{
+    status=$?
+    set +e
+
+    # Don't delete the output files if non-zero exit
+    if [ "$status" -eq 0 ]; then rm -f $disk1 $disk2; fi
+
+    exit $status
+}
+trap cleanup INT QUIT TERM EXIT
+
+guestfish <<EOF
+# Add 2 empty disks
+sparse $disk1 100M
+sparse $disk2 100M
+run
+
+# Create a raid0 based on the 2 disks
+md-create test "/dev/sda /dev/sdb" level:raid0
+
+# Create volume group and logical volume on md device
+pvcreate /dev/md127
+vgcreate vg0 /dev/md127
+lvcreate-free lv0 vg0 100
+EOF
+
+# Ensure list-md-devices now returns the newly created md device
+# and lvs returns newly created logical volume.
+output=$(
+guestfish --format=raw -a $disk1 --format=raw -a $disk2 <<EOF
+run
+list-md-devices
+lvs
+EOF
+)
+
+expected="/dev/md127
+/dev/vg0/lv0"
+
+if [ "$output" != "$expected" ]; then
+    echo "$0: error: actual output did not match expected output"
+    echo -e "actual:\n$output"
+    echo -e "expected:\n$expected"
+    exit 1
+fi
+
+# cleanup() is called implicitly which cleans up everything
+exit 0
\ No newline at end of file
-- 
2.9.5
Mykola Ivanets
2018-Jan-14 22:28 UTC
[Libguestfs] [PATCH v2 3/3] tests: md: test guestfish finds md and LV devices in different combinations
Test guestfish finds:
1. md device created from physical block device and LV,
2. md device created from LVs
3. LV created on md device

raid0 is used for the md device because it is inoperable if one of its
components is inaccessible, so it is easily observable that the md device
is missing (raid1 in this case will be operable but in a degraded state).
---
 tests/md/Makefile.am                |   1 +
 tests/md/test-md-and-lvm-devices.sh | 142 ++++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
 create mode 100755 tests/md/test-md-and-lvm-devices.sh

diff --git a/tests/md/Makefile.am b/tests/md/Makefile.am
index fe94947..6a5f4d6 100644
--- a/tests/md/Makefile.am
+++ b/tests/md/Makefile.am
@@ -23,6 +23,7 @@ TESTS = \
 	test-list-filesystems.sh \
 	test-list-md-devices.sh \
 	test-lvm-on-md-device.sh \
+	test-md-and-lvm-devices.sh \
 	test-mdadm.sh
 
 TESTS_ENVIRONMENT = $(top_builddir)/run --test
diff --git a/tests/md/test-md-and-lvm-devices.sh b/tests/md/test-md-and-lvm-devices.sh
new file mode 100755
index 0000000..0711d28
--- /dev/null
+++ b/tests/md/test-md-and-lvm-devices.sh
@@ -0,0 +1,142 @@
+#!/bin/bash -
+# libguestfs
+# Copyright (C) 2018 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+# Test guestfish finds:
+# 1. md device created from physical block device and LV,
+# 2. md device created from LVs
+# 3. LV created on md device
+#
+# raid0 is used for md device because it is inoperable if one of its components
+# is inaccessible so it is easy observable that md device is missing (raid1 in
+# this case will be operable but in degraded state).
+
+set -e
+
+$TEST_FUNCTIONS
+skip_if_skipped
+
+disk1=md-and-lvm-devices-1.img
+disk2=md-and-lvm-devices-2.img
+
+rm -f $disk1 $disk2
+
+# Clean up if the script is killed or exits early
+cleanup ()
+{
+    status=$?
+    set +e
+
+    # Don't delete the output files if non-zero exit
+#    if [ "$status" -eq 0 ]; then rm -f $disk1 $disk2; fi
+
+    exit $status
+}
+trap cleanup INT QUIT TERM EXIT
+
+# Create 2 disks partitioned as:
+# sda1: 20M MD (md127)
+# sda2: 20M PV (vg1)
+# sda3: 20M MD (md125)
+#
+# sdb1: 20M PV (vg0)
+# sdb2: 20M PV (vg2)
+# sdb3: 20M MD (md125)
+#
+# lv0 : LV (vg0)
+# lv1 : LV (vg1)
+# lv2 : LV (vg2)
+# md127 : md (sda1, lv0)
+# md126 : md (lv1, lv2)
+# md125 : md (sda3, sdb3)
+# vg3 : VG (md125)
+# lv3 : LV (vg3)
+#
+
+guestfish <<EOF
+# Add 2 empty disks
+sparse $disk1 100M
+sparse $disk2 100M
+run
+
+# Partition disks
+part-init /dev/sda mbr
+part-add /dev/sda p 64 41023
+part-add /dev/sda p 41024 81983
+part-add /dev/sda p 81984 122943
+part-init /dev/sdb mbr
+part-add /dev/sdb p 64 41023
+part-add /dev/sdb p 41024 81983
+part-add /dev/sdb p 81984 122943
+
+# Create volume group and logical volume on sdb1
+pvcreate /dev/sdb1
+vgcreate vg0 /dev/sdb1
+lvcreate-free lv0 vg0 100
+
+# Create md from sda1 and vg0/lv0
+md-create md-sda1-lv0 "/dev/sda1 /dev/vg0/lv0" level:raid0
+
+# Create volume group and logical volume on sda2
+pvcreate /dev/sda2
+vgcreate vg1 /dev/sda2
+lvcreate-free lv1 vg1 100
+
+# Create volume group and logical volume on sdb2
+pvcreate /dev/sdb2
+vgcreate vg2 /dev/sdb2
+lvcreate-free lv2 vg2 100
+
+# Create md from vg1/lv1 and vg2/lv2
+md-create md-lv1-lv2 "/dev/vg1/lv1 /dev/vg2/lv2" level:raid0
+
+# Create md from sda3 and sdb3
+md-create md-sda3-sdb3 "/dev/sda3 /dev/sdb3" level:raid0
+
+# Create volume group and logical volume on md125 (last created md)
+pvcreate /dev/md125
+vgcreate vg3 /dev/md125
+lvcreate-free lv3 vg3 100
+EOF
+
+# Ensure list-md-devices now returns all created md devices
+# and lvs returns all created logical volumes.
+output=$(
+guestfish --format=raw -a $disk1 --format=raw -a $disk2 <<EOF
+run
+list-md-devices
+lvs
+EOF
+)
+
+expected="/dev/md125
+/dev/md126
+/dev/md127
+/dev/vg0/lv0
+/dev/vg1/lv1
+/dev/vg2/lv2
+/dev/vg3/lv3"
+
+if [ "$output" != "$expected" ]; then
+    echo "$0: error: actual output did not match expected output"
+    echo -e "actual:\n$output"
+    echo -e "expected:\n$expected"
+    exit 1
+fi
+
+# cleanup() is called implicitly which cleans up everything
+exit 0
\ No newline at end of file
-- 
2.9.5
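The sector ranges passed to part-add in the test above follow from the 20M partition size in the layout comment. The sketch below is not part of the patch; it assumes 512-byte sectors and a first partition starting at sector 64, and the function name partition_ranges is hypothetical. It reproduces the three start/end pairs used for each disk.

```shell
#!/bin/bash
# Hypothetical helper (not from the patch): print "start end" sector ranges
# for consecutive, equally sized partitions.
# Assumptions: 512-byte sectors; the first partition starts at sector 64
# unless a third argument overrides it.
partition_ranges ()
{
    local count=$1 size_mib=$2 first=${3:-64}
    # 20 MiB = 20 * 1024 * 1024 / 512 = 40960 sectors
    local sectors=$(( size_mib * 1024 * 1024 / 512 ))
    local start=$first i
    for (( i = 0; i < count; i++ )); do
        echo "$start $(( start + sectors - 1 ))"
        start=$(( start + sectors ))
    done
}

# Three 20M partitions, as in the test layout
partition_ranges 3 20
# prints:
#   64 41023
#   41024 81983
#   81984 122943
```

The last partition ends at sector 122943, which is about 60 MiB into the 100M sparse image, so all three partitions fit on each disk.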
Richard W.M. Jones
2018-Jan-16 08:40 UTC
Re: [Libguestfs] [PATCH v2 0/1] appliance: init: Avoid running degraded md devices
This all looks good to me. I'm just running the tests locally, but if everything is fine after that I'll push it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html