Dr. Volker Jaenisch
2008-Oct-17 13:25 UTC
[Xen-users] Preventing DomU corruption in case of Split-Brain of heartbeat
Hi Xen-Users!

We run a large HA Xen system based on heartbeat2. The storage back end is an InfiniBand storage cluster that exports iSCSI devices to the front-end HA Xen machines. The iSCSI devices are used as physical devices for the domUs via the block-iscsi mechanism (by the way, thanks for this cool script).

Recently we had a split brain in our heartbeat system. It caused both of our Xen servers to attach the same iSCSI device and run the domU on it, which severely damaged the domU's filesystem.

Is there a way to limit the number of iSCSI sessions per iSCSI target, so that an iSCSI device cannot be acquired twice? Or does anybody here have an alternative solution to this problem?

Thanks in advance.

Best regards,
Volker

--
inqbus it-consulting    +49 ( 341 ) 5643800
Dr. Volker Jaenisch     http://www.inqbus.de
Herloßsohnstr. 12, 04155 Leipzig
Notfälle (emergencies)  +49 ( 170 ) 3113748

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Florian Manschwetus
2008-Oct-17 15:06 UTC
Re: [Xen-users] Preventing DomU corruption in case of Split-Brain of heartbeat
Dr. Volker Jaenisch wrote:
> Hi Xen-Users!
>
> We run an large HA XEN system based on heartbeat2.
>
> Storage base is an infiniband storage cluster exporting iSCSI devices
> to the frontend HA XEN Machines. The iSCSI devices are used as pysical
> devices
> for the domUs using the block-iscsi mechanism (by the way thanks for
> this cool script).
>
> Recently we had a split brain in our heartbeat system. This causes both
> of our XEN servers to
> fetch the iSCSI-Device and run the domU on it. This resulted in severe
> damage of the filesystem of the domU.
>
> Is there a method to limit the number of iscsi-sessions per iSCSI
> target, to prohibit the double aquisition
> of a iSCSI device.

AFAIK, you should use a quorum disk (a disk with a cluster filesystem that allows concurrent access), so that each server can touch a file on it to leave a timestamp. A server can then be assumed to be down (disconnected from storage, for example) when it fails to update its timestamp three times in a row or so, and the other servers can take over. When it comes back online, a rejoin can be arranged in the same way.

Florian

> Or does anybody here has a alternative solution to this problem?
>
> Thanks in advance
>
> Best regards,
>
> Volker
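
The timestamp scheme suggested above could be sketched roughly as follows. This is only an illustration, not a heartbeat2 or Xen feature: the function names, the `.stamp` file naming, and the 10-second interval are all hypothetical, and the sketch assumes that the quorum directory lives on a cluster filesystem mounted on every node, that renames on it are atomic, and that the nodes' clocks are synchronized (e.g. via NTP).

```python
import os
import time

HEARTBEAT_INTERVAL = 10.0  # seconds between updates (assumed value)
MAX_MISSED = 3             # "three times in a row or so", as suggested above


def stamp_path(directory, node):
    """Path of the timestamp file for one node (hypothetical naming scheme)."""
    return os.path.join(directory, node + ".stamp")


def write_heartbeat(directory, node, now=None):
    """Record the current time in this node's timestamp file."""
    now = time.time() if now is None else now
    path = stamp_path(directory, node)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(repr(now))
        f.flush()
        os.fsync(f.fileno())  # push the data to the shared disk
    os.replace(tmp, path)     # swap in the new stamp in one step
    return path


def read_heartbeat(directory, node):
    """Return the node's last timestamp, or None if it has no usable stamp."""
    try:
        with open(stamp_path(directory, node)) as f:
            return float(f.read())
    except (FileNotFoundError, ValueError):
        return None


def peer_is_down(directory, node, now=None,
                 interval=HEARTBEAT_INTERVAL, max_missed=MAX_MISSED):
    """True if the peer has missed more than max_missed update intervals."""
    now = time.time() if now is None else now
    last = read_heartbeat(directory, node)
    if last is None:
        return True  # no stamp at all is treated as "down" in this sketch
    return (now - last) > interval * max_missed
```

Each node would call `write_heartbeat()` once per interval and check `peer_is_down()` for the other node before starting a domU. A node that cannot write its own stamp has lost access to the storage and should not take over anything. Note that this check only reduces the risk of double acquisition; it does not replace proper fencing of the failed node.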