Sahina Bose
2017-Aug-16 12:22 UTC
[Gluster-users] [ovirt-users] Recovering from a multi-node failure
On Sun, Aug 6, 2017 at 4:42 AM, Jim Kusznir <jim at palousetech.com> wrote:

> Well, after a very stressful weekend, I think I have things largely working. It turns out that most of the above issues were caused by the Linux permissions of the exports for all three volumes (they had been reset to 600; setting them to 774 or 770 fixed many of the issues). Of course, I didn't find that until a much more harrowing outage and hours and hours of work, including beginning to look at rebuilding my cluster....
>
> So, now my cluster is operating again, and everything looks good EXCEPT for one major Gluster issue/question that I haven't found any references or info on.
>
> My host ovirt2, one of the replica gluster servers, is the one that lost its storage and had to reinitialize it from the cluster. The iso volume is perfectly fine and complete, but the engine and data volumes are smaller on disk on this node than on the other node (and than on this node before the crash). On the engine store, the entire cluster reports the smaller utilization on the mounted gluster filesystems; on the data partition, it reports the larger size (rest of cluster). Here are some df statements to help clarify:
>
> (brick1 = engine; brick2 = data; brick4 = iso):
> Filesystem                   Size  Used Avail Use% Mounted on
> /dev/mapper/gluster-engine    25G   12G   14G  47% /gluster/brick1
> /dev/mapper/gluster-data     136G  125G   12G  92% /gluster/brick2
> /dev/mapper/gluster-iso       25G  7.3G   18G  29% /gluster/brick4
> 192.168.8.11:/engine          15G  9.7G  5.4G  65% /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
> 192.168.8.11:/data           136G  125G   12G  92% /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
> 192.168.8.11:/iso             13G  7.3G  5.8G  56% /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
>
> View from ovirt2:
> Filesystem                   Size  Used Avail Use% Mounted on
> /dev/mapper/gluster-engine    15G  9.7G  5.4G  65% /gluster/brick1
> /dev/mapper/gluster-data     174G  119G   56G  69% /gluster/brick2
> /dev/mapper/gluster-iso       13G  7.3G  5.8G  56% /gluster/brick4
> 192.168.8.11:/engine          15G  9.7G  5.4G  65% /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
> 192.168.8.11:/data           136G  125G   12G  92% /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
> 192.168.8.11:/iso             13G  7.3G  5.8G  56% /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
>
> As you can see, in the process of rebuilding the hard drive for ovirt2 I did resize some things to give more space to data, where I desperately need it. If this goes well and the storage is given a clean bill of health at this time, then I will take ovirt1 down and resize it to match ovirt2, and thus score a decent increase in storage for data. I fully realize that right now the gluster-mounted volumes should take their total size from the smallest replica brick (the least common denominator).
>
> So, is this size reduction appropriate? A big part of me thinks data is missing, but I even went through and shut down ovirt2's gluster daemons, wiped all the gluster data, and restarted gluster to allow it a fresh heal attempt, and it again came back to the exact same size. This cluster was originally built about the time oVirt 4.0 came out and has been upgraded to 'current', so perhaps some new gluster features are making more efficient use of space (dedupe or something)?

The used capacity should be consistent on all nodes - I see you have a discrepancy with the data volume brick. What does "gluster vol heal data info" tell you? Are there entries to be healed?

Can you provide the glustershd logs?
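For reference, the information being requested here can be gathered with commands along these lines (a sketch only; the volume names match those in the thread, and the log path assumes the default /var/log/glusterfs layout):

# List entries still pending heal on each volume (run on any node)
gluster volume heal data info
gluster volume heal engine info

# Check whether anything is stuck in split-brain rather than merely pending
gluster volume heal data info split-brain

# Collect this node's self-heal daemon log to send to the list
tar czf glustershd-$(hostname).tar.gz /var/log/glusterfs/glustershd.log*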
> Thank you for your assistance!
> --Jim
>
> On Fri, Aug 4, 2017 at 7:49 PM, Jim Kusznir <jim at palousetech.com> wrote:
>
>> Hi all:
>>
>> Today has been rough. Two of my three nodes went down today, and self-heal has not been healing well. Four hours later, VMs are running, but the engine is not happy. It claims the storage domain is down (even though it is up on all hosts and VMs are running). I'm getting a ton of these messages logging:
>>
>> VDSM engine3 command HSMGetAllTasksStatusesVDS failed: Not SPM
>> Aug 4, 2017 7:23:00 PM
>> VDSM engine3 command SpmStatusVDS failed: Error validating master storage domain: ('MD read error',)
>> Aug 4, 2017 7:22:49 PM
>> VDSM engine3 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5868392a-0148-02cf-014d-000000000121, msdUUID=cdaf180c-fde6-4cb3-b6e5-b6bd869c8770'
>> Aug 4, 2017 7:22:47 PM
>> VDSM engine1 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5868392a-0148-02cf-014d-000000000121, msdUUID=cdaf180c-fde6-4cb3-b6e5-b6bd869c8770'
>> Aug 4, 2017 7:22:46 PM
>> VDSM engine2 command SpmStatusVDS failed: Error validating master storage domain: ('MD read error',)
>> Aug 4, 2017 7:22:44 PM
>> VDSM engine2 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5868392a-0148-02cf-014d-000000000121, msdUUID=cdaf180c-fde6-4cb3-b6e5-b6bd869c8770'
>> Aug 4, 2017 7:22:42 PM
>> VDSM engine1 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
>>
>> ------------
>> I cannot set an SPM as it claims the storage domain is down; I cannot bring the storage domain up.
>>
>> Also in the storage realm, one of my exports shows substantially less data than is actually there.
>>
>> Here's what happened, as best as I understand it:
>> I went to do maintenance on ovirt2 (needed to replace a faulty RAM stick and rework the disk). I put it in maintenance mode, then shut it down and did my work. In the process, much of the disk contents were lost (all the gluster data). I figured: no big deal, the gluster data is redundant across the network; it will heal when it comes back up.
>>
>> While I was doing maintenance, all but one of the VMs were running on engine1. When I turned on engine2, all of a sudden all VMs, including the main engine, stopped and went non-responsive. As far as I can tell, this should not have happened, as I turned ON one host, but nonetheless I waited for recovery to occur (while customers started calling asking why everything stopped working....). As I waited, I was checking, and gluster volume status only showed ovirt1 and ovirt2.... Apparently gluster had stopped/failed at some point on ovirt3. I assume that was the cause of the outage; still, if everything was working fine with ovirt1's gluster, and ovirt2 powers on with a very broken gluster (the volume status was showing N/A for the port fields of the gluster volumes), I would not expect a working gluster to go stupid like that.
>>
>> After starting ovirt3's glusterd and checking the status, all three showed ovirt1 and ovirt3 as operational, and ovirt2 as N/A. Unfortunately, recovery was still not happening, so I did some googling and found the commands to inquire about the hosted-engine status. It appeared to be stuck "paused" and I couldn't find a way to unpause it, so I powered it off, then started it manually on engine1, and the cluster came back up. It showed all VMs paused. I was able to unpause them and they worked again.
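The hosted-engine status commands alluded to above are not quoted in the thread; they were presumably along the lines of the standard hosted-engine CLI sketched below (an illustration, not a record of what was actually run):

# Show the engine VM state as reported by the HA agents on each host
hosted-engine --vm-status

# If the engine VM is wedged, power it off and start it on the current host
hosted-engine --vm-poweroff
hosted-engine --vm-start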
>> So now I began to work the ovirt2 gluster healing problem. It didn't appear to be self-healing, but eventually I found this document:
>> https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
>> and from that found the magic xattr commands. After setting them, the gluster volumes on ovirt2 came online. I told iso to heal, and it did, but it only came up with about half as much data as it should have. I told it to heal full, and it did finish off the remaining data and came up to full. I then told engine to do a full heal (gluster volume heal engine full), and it transferred its data from the other gluster hosts too. However, it said it was done when it hit 9.7GB while there was 15GB on disk! It is still stuck that way; the oVirt GUI and gluster volume heal engine info both show the volume fully healed, but it is not:
>>
>> [root at ovirt1 ~]# df -h
>> Filesystem                     Size  Used Avail Use% Mounted on
>> /dev/mapper/centos_ovirt-root   20G  4.2G   16G  21% /
>> devtmpfs                        16G     0   16G   0% /dev
>> tmpfs                           16G   16K   16G   1% /dev/shm
>> tmpfs                           16G   26M   16G   1% /run
>> tmpfs                           16G     0   16G   0% /sys/fs/cgroup
>> /dev/mapper/gluster-engine      25G   12G   14G  47% /gluster/brick1
>> /dev/sda1                      497M  315M  183M  64% /boot
>> /dev/mapper/gluster-data       136G  124G   13G  92% /gluster/brick2
>> /dev/mapper/gluster-iso         25G  7.3G   18G  29% /gluster/brick4
>> tmpfs                          3.2G     0  3.2G   0% /run/user/0
>> 192.168.8.11:/engine            15G  9.7G  5.4G  65% /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
>> 192.168.8.11:/data             136G  124G   13G  92% /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
>> 192.168.8.11:/iso               13G  7.3G  5.8G  56% /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
>>
>> This is from ovirt1, and before the work, both ovirt1's and ovirt2's bricks had the same usage. ovirt2's bricks and the gluster mountpoints agree on iso and engine, but as you can see, not here. If I do a du -sh on /rhev/data-center/mnt/glusterSD/..../_engine, it comes back with the 12GB number (brick1 is engine, brick2 is data, and brick4 is iso). However, gluster still says it's only 9.7G. I haven't figured out how to get it to finish "healing".
>>
>> data is in the process of healing currently.
>>
>> So, I think I have two main things to solve right now:
>>
>> 1) How do I get ovirt to see the data center/storage domain as online again?
>> 2) How do I get engine to finish healing to ovirt2?
>>
>> Thanks all for reading this very long message!
>> --Jim
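The "magic xattr commands" are likewise not quoted; the linked Rackspace article describes re-stamping the volume ID onto a rebuilt brick so glusterd will serve it again, roughly as sketched below (the brick path /gluster/brick1/engine is illustrative, and the actual ID must be read from a healthy replica):

# On a healthy node: read the volume ID off an existing brick directory
getfattr -n trusted.glusterfs.volume-id -e hex /gluster/brick1/engine

# On the rebuilt node: recreate the empty brick directory and apply the same ID
mkdir -p /gluster/brick1/engine
setfattr -n trusted.glusterfs.volume-id -v 0x<id-from-healthy-node> /gluster/brick1/engine

# Restart glusterd so the brick process comes up, then trigger a full heal
systemctl restart glusterd
gluster volume heal engine full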
Jim Kusznir
2017-Aug-19 05:02 UTC
[Gluster-users] [ovirt-users] Recovering from a multi-node failure
The heal info command shows perfect consistency between nodes; that's what confused me. At the moment, the physical partitions (LVM partitions) that gluster is using are different sizes, but I expected to see the "least common denominator" for the total size, and I expected to see it consistent across the cluster.

As this issue was from a couple of weeks ago, I don't know what logs to give you anymore. Since the original issue, the entire cluster has been rebooted: not all nodes were down at the same time, but every node has been rebooted. Now things look a bit different:

[root at ovirt1 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/centos_ovirt-root   20G  5.1G   15G  26% /
devtmpfs                        16G     0   16G   0% /dev
tmpfs                           16G     0   16G   0% /dev/shm
tmpfs                           16G   34M   16G   1% /run
tmpfs                           16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/gluster-iso         25G  7.3G   18G  29% /gluster/brick4
/dev/sda1                      497M  315M  183M  64% /boot
/dev/mapper/gluster-engine      25G   13G   13G  49% /gluster/brick1
/dev/mapper/gluster-data       136G  126G   11G  93% /gluster/brick2
192.168.8.11:/engine            15G   10G  5.1G  67% /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
192.168.8.11:/data             136G  126G   11G  93% /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
192.168.8.11:/iso               13G  7.3G  5.8G  56% /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
tmpfs                          3.2G     0  3.2G   0% /run/user/0

[root at ovirt2 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/centos_ovirt-root  8.0G  3.1G  5.0G  39% /
devtmpfs                        16G     0   16G   0% /dev
tmpfs                           16G   16K   16G   1% /dev/shm
tmpfs                           16G   90M   16G   1% /run
tmpfs                           16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/gluster-engine      15G   10G  5.1G  67% /gluster/brick1
/dev/sda1                      497M  307M  191M  62% /boot
/dev/mapper/gluster-iso         13G  7.3G  5.8G  56% /gluster/brick4
/dev/mapper/gluster-data       174G  121G   54G  70% /gluster/brick2
192.168.8.11:/engine            15G   10G  5.1G  67% /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine
192.168.8.11:/data             136G  126G   11G  93% /rhev/data-center/mnt/glusterSD/192.168.8.11:_data
192.168.8.11:/iso               13G  7.3G  5.8G  56% /rhev/data-center/mnt/glusterSD/192.168.8.11:_iso
tmpfs                          3.2G     0  3.2G   0% /run/user/0

------------

The thing that still bothers me is that for engine (brick1), ovirt1's physical disk space used is still higher than ovirt2's, but the smaller number is reported on the gluster filesystem. For data (brick2), ovirt1's and ovirt2's physical usage are still different, but the larger number is reported by glusterfs.

The main question is still: is there cause for concern that physical usage for the bricks is not consistent between replicas that heal info shows as completely healed? (Again, I was concerned enough that on ovirt2 I re-deleted everything and let gluster re-heal the volume, and it came back to the exact same amount of (lower) disk usage and claimed to be fully healed.)

--Jim

On Wed, Aug 16, 2017 at 5:22 AM, Sahina Bose <sabose at redhat.com> wrote:

> The used capacity should be consistent on all nodes - I see you have a discrepancy with the data volume brick. What does "gluster vol heal data info" tell you? Are there entries to be healed?
>
> Can you provide the glustershd logs?
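One way to check whether data is actually missing, as opposed to simply being allocated differently (for example, sparse VM images being rewritten with holes during the heal), is to compare allocated and apparent sizes on each brick. A sketch, assuming the bricks sit under the mount points shown above:

# Allocated size (what df counts) vs. apparent/logical size of the engine brick;
# apparent sizes should agree across replicas even when allocation does not
du -sh /gluster/brick1
du -sh --apparent-size /gluster/brick1

# Same comparison, ignoring gluster's internal .glusterfs hardlink namespace
du -sh --exclude=.glusterfs /gluster/brick1
du -sh --apparent-size --exclude=.glusterfs /gluster/brick1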