Hi,

Of those on the list that have implemented High Availability with Xen, what
configurations are being used? And what degree of fault tolerance can be
expected? Ultimately I would like to see fault tolerance and scalability at
the disk level and also at the VM (node) level, where 3 or more nodes can be
utilized for automatic switchover. I have looked at some of the docs and they
all look like there is an active and an inactive node which switch if there
is trouble (heartbeat); it doesn't look like clusters of nodes are
implemented. Please let me know, thank you in advance,

Randy
Randy Katz wrote:
> Of those on the list that have implemented High Availability with Xen,
> what configurations are being used? And what degree of fault tolerance
> can be expected? [snip]

I run a fairly simple 2-node setup. Configured roughly as follows, from the
hardware up:

- lots of fault tolerance provided by the computer center (at the
  intersection of two power grids, plus generator, plus battery backup in
  each rack; multiple backbone network connections, etc.)

- network: I'm only using one network drop, but more are available; right
  now only a simple 1G switch to send that to the two servers

- 2 1U rack-mounted servers: 4 drives in each, dual NICs (only using 1)
-- software RAID1 across all 4 drives for boot, swap, root for Dom0
-- software RAID10 (the md-provided variant) across all 4 drives for one
   large physical volume for LVM

- Xen/HA setup:
-- running the version of Xen 3 supported by Debian Lenny (will soon
   migrate to either Squeeze/Xen4 or OpenSUSE/Xen4)
-- DRBD to mirror VMs across both nodes - for each VM: boot/root and swap
   volumes
-- pacemaker/corosync
-- haven't tried a 3rd node - DRBD only supports 2 nodes (I think it may in
   later versions)
-- 4 VMs - set for automatic failover - I load level by having 2 primary on
   one node, 2 on the other

Experience:
-- DRBD ensures that disks are consistent if a node fails
-- a node failure leads to the affected VMs booting on the other node -
   performance, of course, drops
-- depending on how complicated a particular VM is (and how large the
   drive) it can take up to about 5 minutes for a reboot
-- when the failed node is brought back up, it can take a LONG time for
   DRBD, RAID10, or both to resync (note: configuring with a bitmap speeds
   things up a lot)
-- I expect there's a way to mirror working memory so that node failure
   doesn't require a reboot, but never really dug into it

The most surprising thing is that, so far, all my failures have been a
result of Xen-induced kernel panics. The Lenny version of Xen 3 has a nasty
little bug in the code that allocates physical CPUs to virtual CPUs - every
once in a while, when a CPU is released for re-allocation, there's a Dom0
(or perhaps hypervisor) kernel panic and reboot. For me, "every once in a
while" translates to up to twice a day - alternating with days of running
smoothly. Supposedly that's been fixed upstream, but as Squeeze became
imminent, nobody put any attention into updating the Lenny package.

The only work-around for this bug is to pin CPUs. Since I've done that, I
haven't had ANY failures of any sort - things just keep humming along
(fingers crossed here).

One other thing to note: RAID, particularly software RAID, has its own
nasty surprises. If a disk starts degrading, its internal failure recovery
mechanisms will often try to re-read sectors and such - so you get your
data, but it takes longer and longer. md does NOT take note of this - so
your machine will just get slower, and slower, and slower, and slower....
I learned this the hard way - in a way that led to rebuilding my entire
software environment rather than just swapping out one bad disk.
Lessons learned from that:
- use SMART tools to keep an eye on the Raw_Read_Error_Rate - anything
  other than 0 indicates looming trouble
- if all your disks are the same age, and they're RAIDed, they're likely to
  fail around the same time - if one starts going, replace them all

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In <fnord> practice, there is. .... Yogi Berra
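For anyone wanting to reproduce the storage layout described above (software
RAID1 for the dom0 system, md RAID10 as one large LVM physical volume), a
minimal sketch might look like the following. All device, array, volume-group
and LV names (sda-sdd, md0, md1, vg0, vm1-*) are assumptions for
illustration, not taken from the original post, and the actual layout may
well use separate small RAID1 arrays for /boot, swap and /:

    # RAID1 across small partitions on all 4 drives for the dom0 system:
    mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1

    # md's RAID10 ("near" layout) across the large partitions:
    mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=4 /dev/sd[abcd]2

    # Write-intent bitmap, so a briefly absent member only resyncs dirty
    # chunks (the "configuring with a bitmap speeds things up" note above):
    mdadm --grow /dev/md1 --bitmap=internal

    # One big physical volume / volume group; per-VM logical volumes are
    # then carved out of it and mirrored to the peer node with DRBD:
    pvcreate /dev/md1
    vgcreate vg0 /dev/md1
    lvcreate -L 20G -n vm1-disk vg0
    lvcreate -L 2G  -n vm1-swap vg0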
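The per-VM DRBD mirroring would then be one small resource stanza per
logical volume, roughly like this in DRBD 8.0.x syntax. Hostnames (node-a,
node-b), the 10.0.0.x addresses on the crossover link, the port, and the
resource name are placeholders, not values from the post:

    # /etc/drbd.conf -- one resource per VM disk (a second, similar stanza
    # on drbd1/port 7789 would cover the VM's swap volume)
    resource vm1-disk {
      protocol C;                      # fully synchronous replication

      syncer {
        rate 30M;                      # cap resync bandwidth so running VMs stay usable
      }

      on node-a {
        device    /dev/drbd0;
        disk      /dev/vg0/vm1-disk;   # the LV sitting on the RAID10 PV
        address   10.0.0.1:7788;       # dedicated cross-connect NIC
        meta-disk internal;
      }
      on node-b {
        device    /dev/drbd0;
        disk      /dev/vg0/vm1-disk;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

The domU config then points its disk lines at /dev/drbd0 and /dev/drbd1 (or
at the drbd: disk prefix, if the block-drbd helper script that ships with
DRBD is installed).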
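The pacemaker/corosync side isn't spelled out in the post; under the usual
pattern for DRBD-backed Xen guests it would look something along these lines
in crm shell syntax (resource IDs, timeouts and the config file path are
invented for the example):

    # DRBD resource as a master/slave (Primary/Secondary) pair:
    primitive p_drbd_vm1 ocf:linbit:drbd \
            params drbd_resource="vm1-disk" \
            op monitor interval="20s"
    ms ms_drbd_vm1 p_drbd_vm1 \
            meta master-max="1" clone-max="2" notify="true"

    # The Xen guest itself:
    primitive p_xen_vm1 ocf:heartbeat:Xen \
            params xmfile="/etc/xen/vm1.cfg" \
            op monitor interval="30s" \
            op start timeout="60s" \
            op stop timeout="300s"

    # Only start the guest where its DRBD device is Primary, and only after
    # promotion has happened:
    colocation c_xen_on_drbd inf: p_xen_vm1 ms_drbd_vm1:Master
    order o_drbd_before_xen inf: ms_drbd_vm1:promote p_xen_vm1:start

With four VMs there would be four such sets, and location preferences
(scores) on two of them per node would give the 2-and-2 load leveling
mentioned above.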
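As for the CPU-pinning workaround for the Lenny Xen 3 panic, the mechanics
with the xm toolstack are roughly as below. The domain name "vm1" and the CPU
numbers are just examples; the right split depends on how many cores the
boxes actually have:

    # In the guest's /etc/xen/vm1.cfg, give it a fixed set of physical CPUs:
    vcpus = 2
    cpus  = "2,3"

    # Dom0's vCPUs can likewise be pinned at runtime:
    xm vcpu-pin Domain-0 0 0
    xm vcpu-pin Domain-0 1 1

    # Or pin a running guest's vCPUs by hand, then verify the affinity column:
    xm vcpu-pin vm1 0 2
    xm vcpu-pin vm1 1 3
    xm vcpu-list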
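And the SMART monitoring from the lessons-learned list can be as simple as a
periodic check plus smartd; the device names below are examples:

    # One-off check of the attribute mentioned above:
    for d in /dev/sd[abcd]; do
        echo "== $d =="
        smartctl -A "$d" | grep -i raw_read_error_rate
    done

    # Continuous monitoring: enable smartd and use something like this in
    # /etc/smartd.conf (scan all disks, short self-test nightly, mail root):
    #   DEVICESCAN -a -o on -S on -s (S/../.././02) -m root

One caveat: some drive families report large, essentially meaningless raw
values for Raw_Read_Error_Rate, so the trend on a given drive matters more
than the absolute number.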
On 2/26/2011 6:56 AM, Miles Fidelman wrote:
> I run a fairly simple 2-node setup. Configured roughly as follows, from
> the hardware up: [snip]
Miles, excellent, thanks. What version of DRBD are you currently on?

You say you're only using 1 NIC on each machine, so you did NOT use a
cross-cable on the second NIC between the machines, as is recommended?

Why are you using RAID10 with LVM? Couldn't you just have one large VG that
included all the drives? Have you had any drives fail within the RAID10 yet?

Thanks,
Randy
Randy Katz wrote:
> Miles, excellent, thanks. What version of DRBD are you currently on?

8.0.14 - I think this is the Lenny package. A lot of new features came in
with 8.2, but it's easier to stay with the packages.

> You say you're only using 1 NIC on each machine, so you did NOT use a
> cross-cable on the second NIC between the machines, as is recommended?

Ooops - I am using the 2nd port for cross-connect. Didn't have my morning
coffee when I wrote things up. What I'm not doing is redundant external
connectivity.

> Why are you using RAID10 with LVM? Couldn't you just have one large VG
> that included all the drives? Have you had any drives fail within the
> RAID10 yet?

LVM does nothing to protect against disk failures - it's just for managing
blocks of disk space. The RAID10 gives me protection against drive
failures, while maximizing use of disk space and protecting against the
RAID5/6 "hole." Yes, I've had a drive fail. The RAID10 kept me going and
made replacement easy.

Miles

--
In theory, there is no difference between theory and practice.
In <fnord> practice, there is. .... Yogi Berra
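For reference, replacing a failed member in that kind of md setup is
typically just a few commands while the array keeps serving I/O, which
matches the "kept me going and made replacement easy" remark. Device names
below are illustrative (assume the dying disk is sdc and the arrays are
md0/md1 as sketched earlier):

    # Mark the failing disk's members as failed and pull them out:
    mdadm /dev/md0 --fail /dev/sdc1
    mdadm /dev/md0 --remove /dev/sdc1
    mdadm /dev/md1 --fail /dev/sdc2
    mdadm /dev/md1 --remove /dev/sdc2

    # Swap the physical drive, copy the partition layout from a good disk,
    # then add the new members back and watch the rebuild:
    sfdisk -d /dev/sda | sfdisk /dev/sdc
    mdadm /dev/md0 --add /dev/sdc1
    mdadm /dev/md1 --add /dev/sdc2
    cat /proc/mdstat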