Hi Folks,

I've been running a 2-node, high-availability cluster for a while. I've
just acquired 2 more servers, and I've been trying to figure out my
options for organizing my storage configuration.

Basic goal: provide a robust, high-availability platform for multiple
Xen VMs.

Current configuration (2 nodes):
- 4 drives each (1TB/drive)
- md software RAID10 across the 4 drives on each machine
-- md devices for Dom0 /boot, /, swap + one big device
-- 2 logical volumes per VM (/ and swap)
-- VM volumes replicated across both nodes, using DRBD
-- pacemaker, heartbeat, etc. to migrate production VMs if a node fails

I now have 2 new servers - each with a lot more memory, faster CPUs (and
more cores), also 4 drives each. So I'm wondering what's my best option
for wiring the 4 machines together as a platform to run VMs on.

Seems like my first consideration is how to wire together the storage,
within the following constraints:

- want to use each node for both processing and storage (only have 4U of
rackspace to play with; made the choice to buy 4 general-purpose
servers, with 4 drives each, rather than using some of the space for a
storage server)

- 4 gigE ports per server - 2 reserved for primary/secondary external
links, 2 reserved for storage & heartbeat comms

- total of 16 drives, in groups of 4 (if a node goes down, it takes 4
drives with it) - so I can't simply treat this as 16 drives in one big
array (I don't think)

- want to make things just a bit easier to manage than manually setting
up pairs of DRBD volumes per VM

- would really like to make it easier to migrate a VM from any node to
any other (for both load leveling and n-way fallback) - but DRBD seems
to put a serious crimp in this

- sort of been keeping my eyes on some of the emerging cloud
technologies, but they all seem to be aimed at larger clusters

- sheepdog seems like the closest thing to what I'm looking for, but it
seems married at the hip to KVM (unless someone has ported it to support
Xen while I wasn't looking)

So... just wondering - anybody able to share some thoughts and/or
experiences?

Thanks very much,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
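For readers not familiar with this kind of setup: each VM volume in a
configuration like the above typically gets its own DRBD resource,
roughly along these lines. This is a minimal sketch only - hostnames,
addresses, volume group, device minor, and port are invented
placeholders, not the poster's actual config.

# /etc/drbd.d/vm1-disk.res (one resource per VM volume; names are examples)
cat > /etc/drbd.d/vm1-disk.res <<'EOF'
resource vm1-disk {
    protocol C;
    device    /dev/drbd10;
    disk      /dev/vg0/vm1-disk;   # LV carved out of the big md device
    meta-disk internal;
    on node1 { address 192.168.10.1:7710; }
    on node2 { address 192.168.10.2:7710; }
}
EOF

# then, on both nodes:
drbdadm create-md vm1-disk && drbdadm up vm1-disk
# and on whichever node should start out as primary
# (DRBD 8.4 syntax; 8.3 uses: drbdadm -- --overwrite-data-of-peer primary)
drbdadm primary --force vm1-disk

Multiply that by two volumes per VM and it is easy to see why the
per-VM bookkeeping becomes the pain point described above.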
On 10/10/11 00:26, Miles Fidelman wrote:
> Hi Folks,
>
> I've been running a 2-node, high-availability cluster for a while. I've
> just acquired 2 more servers, and I've been trying to figure out my
> options for organizing my storage configuration.
> [...]
> So... just wondering - anybody able to share some thoughts and/or
> experiences?

DRBD does not do 4 nodes. If you split in two clusters you cannot cross
migrate, unless you set up the storage on each node in some way, but how
are you going to replicate?

Personally, I would create a storage cluster and a VM cluster, allowing
for expansion of the virtual nodes later if time/budget permits.

B.
> DRBD does not do 4 nodes. If you split in two clusters you cannot cross
> migrate, unless you set up the storage on each node in some way, but how
> are you going to replicate?
>
> Personally, I would create a storage cluster and a VM cluster, allowing
> for expansion of the virtual nodes later if time/budget permits.
>

That's what I've done. Overall throughput is quite good, but seems to
suffer a bit on small writes.

James
On Sun, Oct 09, 2011 at 06:26:18PM -0400, Miles Fidelman wrote:
> Hi Folks,
>
> I've been running a 2-node, high-availability cluster for a while.
> I've just acquired 2 more servers, and I've been trying to figure
> out my options for organizing my storage configuration.
>
> Basic goal: provide a robust, high-availability platform for
> multiple Xen VMs.
> [...]
> So... just wondering - anybody able to share some thoughts and/or
> experiences?

Have you tried Ganeti (http://code.google.com/p/ganeti)? It uses DRBD
under the hood but it manages moving instances around without you having
to reconfigure instances. I think it matches what you're looking for,
and we support clusters from 1 physical machine up to hundreds.

Disclaimer: I'm one of the authors.

regards,
iustin
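To give a feel for what that looks like in practice, here is a rough
sketch of the day-to-day commands. Hostnames, the OS definition, and
sizes are placeholders, and the option names are taken from the Ganeti
2.x documentation, so double-check them against your installed version.

# one-time cluster setup on the first node, then join the others
gnt-cluster init --enabled-hypervisors=xen-pvm --vg-name xenvg cluster1.example.org
gnt-node add node2.example.org

# create an instance with DRBD-backed disks, replicated node1 -> node2
gnt-instance add -t drbd -n node1.example.org:node2.example.org \
    -o debootstrap+default -s 20g -B memory=2048 vm1.example.org

# live-migrate to the secondary, or fail over if the primary is dead
gnt-instance migrate vm1.example.org
gnt-instance failover vm1.example.org

The point is that the per-instance DRBD resources are created and
tracked by Ganeti itself rather than configured by hand.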
> I now have 2 new servers - each with a lot more memory, faster CPUs (and
> more cores), also 4 drives each. So I'm wondering what's my best option
> for wiring the 4 machines together as a platform to run VMs on.

A few things come to mind:

- Since these two new boxes are so much better, do you really need all
  four systems for hosting VMs? If you can do everything on the two
  old systems, you can do it all on the new systems. If you just keep
  it to two nodes, you eliminate some complexity.

- Ever worked with GlusterFS? It'll allow you to stripe and replicate
  across multiple nodes.

- If I/O isn't too much of an issue, you could use one pair with DRBD
  as the storage node and export to the other two over NFS/etc.

- Export everything over iSCSI, use md to mirror, name everything
  carefully so you know which nodes you can take down based on where
  VMs are. Complicated, but workable. I suppose you could set this up
  with DRBD too.

John

--
John Madden / Sr UNIX Systems Engineer
Office of Technology / Ivy Tech Community College of Indiana

Free Software is a matter of liberty, not price. To understand
the concept, you should think of Free as in 'free speech,' not
as in 'free beer.' -- Richard Stallman
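A minimal sketch of the DRBD-pair-plus-NFS option above, for
concreteness. The addresses, paths, and export options are illustrative
guesses, not a tested recipe.

# on the storage pair: the replicated DRBD volume carries a filesystem of
# VM image files, mounted on whichever node is currently primary
echo '/srv/vmstore 192.168.10.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra

# on each compute node: mount it and point the domU configs at image files
mount -t nfs 192.168.10.10:/srv/vmstore /var/lib/xen/images
# e.g. in the domU config:  disk = ['file:/var/lib/xen/images/vm1/disk.img,xvda,w']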
James Harper wrote:
> Bart Coninckx wrote:
>>
>> DRBD does not do 4 nodes. If you split in two clusters you cannot
>> cross migrate, unless you set up the storage on each node in some
>> way, but how are you going to replicate?
>>

Well, yes, ... that's sort of why I'm asking the question :-)

>> Personally, I would create a storage cluster and a VM cluster,
>> allowing for expansion of the virtual nodes later if time/budget
>> permits.
>>
> That's what I've done. Overall throughput is quite good, but seems to
> suffer a bit on small writes.
>

Which leads to two follow-up questions:

1. Can I assume that you're both suggesting a 4-node storage cluster,
and a 4-node VM cluster - running on the same 4 computers? If so, that's
sort of what I'm aiming for.

2. What software are you running for your storage cluster?

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
Hi Iustin,

Iustin Pop wrote:
> Have you tried Ganeti (http://code.google.com/p/ganeti)? It uses DRBD
> under the hood but it manages moving instances around without you having
> to reconfigure instances. I think it matches what you're looking for,
> and we support clusters from 1 physical machine up to hundreds.
>
> Disclaimer: I'm one of the authors.
>

I've experimented with Ganeti a couple of times, but not for about 6
months (and I know it's been evolving quickly). I was really impressed
with Ganeti as a tool for managing lots of VMs - but it seemed to be
lacking when it comes to high-availability capabilities. In particular:

Last time I looked, there was some documentation and list discussion
that made explicit statements that Ganeti is NOT a high-availability
solution - migration and failover require manual intervention, or
customization/extension by a user. When I realized that some of the
things that one might do with Pacemaker would interact with some of
Ganeti's control wiring, I decided that I really didn't want to wrestle
with that level of complexity (at least not when there were easier ways
to do a 2-node configuration). A quick look at the current documentation
suggests this hasn't changed.

Re. DRBD: It looks like one still is limited to a primary instance, with
failover to a pre-configured secondary instance - and one has to
manually recreate a new secondary after a node failure. Hence my
interest in some kind of cluster filesystem that provides redundancy
across multiple nodes. (Sheepdog really has my attention in this regard
- but it seems to only work with KVM, not Xen.)

Thanks!

Miles

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
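For concreteness, the manual "recreate a new secondary" step referred to
above looks something like this in Ganeti 2.x. The commands are taken
from the documentation and the node/instance names are placeholders, so
treat the exact options as approximate.

# if the primary node died, bring the instance up on its secondary ...
gnt-instance failover vm1.example.org
# ... then rebuild redundancy by pointing the DRBD secondary at a
# surviving node (this is the manual, non-automatic step)
gnt-instance replace-disks -n node3.example.org vm1.example.org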
John Madden wrote:
>> I now have 2 new servers - each with a lot more memory, faster CPUs (and
>> more cores), also 4 drives each. So I'm wondering what's my best option
>> for wiring the 4 machines together as a platform to run VMs on.
>
> A few things come to mind:
>
> - Since these two new boxes are so much better, do you really need all
>   four systems for hosting VMs? If you can do everything on the two
>   old systems, you can do it all on the new systems. If you just keep
>   it to two nodes, you eliminate some complexity.

I'm mixing development with production, and working on some server-side
software intended to work across large numbers of nodes - so I figure 4
machines gives us some more flexibility. I'm also thinking of keeping
some of our current production stuff (mostly mailing lists) on the old
systems - but setting things up to make it easier to migrate later.
(But, yes, I have thought of it :-)

> - Ever worked with GlusterFS? It'll allow you to stripe and replicate
>   across multiple nodes.

That's pretty much the only thing that's jumping out, as I peruse the
net looking for relatively mature solutions. The one thing that looks a
little closer is Sheepdog - but it's KVM-only and not all that mature.

One thing I've been wondering about - can't find in the documentation,
and guess I might just have to start experimenting - is what GlusterFS
does regarding disks on the same node vs. disks on different nodes.
Things like:

- whether or not to run RAID on each node, as well as configuring
GlusterFS to stripe/replicate across nodes (i.e., with a total of 16
drives, but split 4-per-node, will Gluster replicate/stripe so that a
node failure won't kill you)

- what happens to performance if you stripe/replicate across nodes?

Have you (or anybody) had much experience with GlusterFS in practice?
Particularly on a relatively small cluster? Comments? Suggestions?

Thought of these....

> - If I/O isn't too much of an issue, you could use one pair with DRBD
>   as the storage node and export to the other two over NFS/etc.
>

Thought about this, but it would leave half my disk space idle.

> - Export everything over iSCSI, use md to mirror, name everything
>   carefully so you know which nodes you can take down based on where
>   VMs are. Complicated, but workable. I suppose you could set this up
>   with DRBD too.

Been thinking about this one too. The complexity really scares me. Any
thoughts re. tools that might simplify things, and/or performance
implications of using md to mirror across iSCSI mounts on different
machines?

Thanks again to all,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
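Mechanically, the md-mirror-over-iSCSI idea discussed above looks
roughly like the sketch below (open-iscsi on the initiator side). Target
names, addresses, and device names are invented for illustration.

# import the remote copy of this VM's disk from another node
iscsiadm -m discovery -t sendtargets -p 192.168.10.2
iscsiadm -m node -T iqn.2011-10.cluster.local:vm1-disk -p 192.168.10.2 --login
# suppose the imported LUN shows up as /dev/sdx

# mirror the local LV against the imported remote disk; marking the remote
# leg write-mostly keeps reads on the local copy, which helps over gigE
mdadm --create /dev/md/vm1 --level=1 --raid-devices=2 \
      /dev/vg0/vm1-disk --write-mostly /dev/sdx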
On 10/10/11 15:18, Miles Fidelman wrote:
> James Harper wrote:
>> Bart Coninckx wrote:
>>>
>>> DRBD does not do 4 nodes. If you split in two clusters you cannot
>>> cross migrate, unless you set up the storage on each node in some
>>> way, but how are you going to replicate?
>>>
>
> Well, yes, ... that's sort of why I'm asking the question :-)

The question implies that I don't see a possibility for this.

>>> Personally, I would create a storage cluster and a VM cluster,
>>> allowing for expansion of the virtual nodes later if time/budget
>>> permits.
>>>
>> That's what I've done. Overall throughput is quite good, but seems to
>> suffer a bit on small writes.
>>
> Which leads to two follow-up questions:
>
> 1. Can I assume that you're both suggesting a 4-node storage cluster,
> and a 4-node VM cluster - running on the same 4 computers? If so, that's
> sort of what I'm aiming for.

No, we're suggesting two 2-node clusters, one for storage, one for
virtualization.

> 2. What software are you running for your storage cluster?

I'm running IET (iSCSI Enterprise Target). Next project I would try AoE
though.

B.
Bart Coninckx wrote:
> On 10/10/11 15:18, Miles Fidelman wrote:
>> James Harper wrote:
>>> Bart Coninckx wrote:
>>>>
>>>> DRBD does not do 4 nodes. If you split in two clusters you cannot
>>>> cross migrate, unless you set up the storage on each node in some
>>>> way, but how are you going to replicate?
>>>>
>>
>> Well, yes, ... that's sort of why I'm asking the question :-)
>
> The question implies that I don't see a possibility for this.

That's why I'm trying to avoid using DRBD - looking for an alternative
that will replicate data across all four nodes, and allow for continued
operation if one (or possibly two) node(s) fail. It looks like
GlusterFS, VastSky, and Sheepdog would do this, but development on
VastSky seems to have stalled, and Sheepdog is KVM-only.

Alternate might be to mount everything via iSCSI or AoE, run md RAID10
across the mess of drives, or some such.

>> Which leads to two follow-up questions:
>>
>> 1. Can I assume that you're both suggesting a 4-node storage cluster,
>> and a 4-node VM cluster - running on the same 4 computers? If so, that's
>> sort of what I'm aiming for.
>
> No, we're suggesting two 2-node clusters, one for storage, one for
> virtualization.

Ok... that's what I'm trying to avoid - mostly because that would make
half my drives unavailable.

>> 2. What software are you running for your storage cluster?
>
> I'm running IET. Next project I would try AoE though.
>

Running anything on top of that in the way of a cluster file system?

Thanks,

Miles

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
On 10/10/11 22:35, Miles Fidelman wrote:
> That's why I'm trying to avoid using DRBD - looking for an alternative
> that will replicate data across all four nodes, and allow for continued
> operation if one (or possibly two) node(s) fail. It looks like
> GlusterFS, VastSky, and Sheepdog would do this, but development on
> VastSky seems to have stalled, and Sheepdog is KVM-only.

I just looked at GlusterFS and VastSky - these seem to be aimed at
aggregating storage located across different servers. Correct me if I'm
wrong, but this seems to me the opposite of what you are looking for:
you want HA, so data needs to be replicated to be available via
different servers, right?

>> I'm running IET. Next project I would try AoE though.
>>
> Running anything on top of that in the way of a cluster file system?

IET is running on DRBD - no cluster file system necessary. Pacemaker
sees to it that only one node is using the iSCSI connection actively for
a particular DomU. You could think about using a cluster file system to
offer the Xen config files to all nodes, but I simply fixed this by
syncing these files with csync.

B.
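Roughly what the IET-on-DRBD arrangement above looks like, for anyone
following along. Device names, IQNs, and hostnames are invented, the
Pacemaker resource configuration is left out entirely, and "csync" is
assumed here to mean csync2.

# /etc/ietd.conf on the storage nodes: one target per DomU, backed by DRBD
cat >> /etc/ietd.conf <<'EOF'
Target iqn.2011-10.local.cluster:vm1
    Lun 0 Path=/dev/drbd10,Type=blockio
EOF

# keeping /etc/xen in sync across the Xen nodes with csync2 (one way to do it)
cat > /etc/csync2.cfg <<'EOF'
group xencfg {
    host node1 node2;
    key /etc/csync2.key_xencfg;
    include /etc/xen;
}
EOF
# generate the shared key first with: csync2 -k /etc/csync2.key_xencfg
csync2 -xv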
On 10/10/11 22:53, Bart Coninckx wrote:
> On 10/10/11 22:35, Miles Fidelman wrote:
>
>> That's why I'm trying to avoid using DRBD - looking for an alternative
>> that will replicate data across all four nodes, and allow for continued
>> operation if one (or possibly two) node(s) fail. It looks like
>> GlusterFS, VastSky, and Sheepdog would do this, but development on
>> VastSky seems to have stalled, and Sheepdog is KVM-only.
>
> I just looked at GlusterFS and VastSky - these seem to be aimed at
> aggregating storage located across different servers. Correct me if I'm
> wrong, but this seems to me the opposite of what you are looking for:
> you want HA, so data needs to be replicated to be available via
> different servers, right?

Continued reading some more on VastSky. This seems to offer redundancy
too, by means of mirroring. 4 nodes might not be enough for that,
though. Also, I wonder if it is suitable for things like live migration,
which iSCSI and AoE can do.

B.
Bart Coninckx wrote:
> On 10/10/11 22:35, Miles Fidelman wrote:
>
> I just looked at GlusterFS and VastSky - these seem to be aimed at
> aggregating storage located across different servers. Correct me if
> I'm wrong, but this seems to me the opposite of what you are looking
> for: you want HA, so data needs to be replicated to be available via
> different servers, right?

Goal:
- 16 disks spread across 4 servers -> one storage pool
- files, logical volumes, VM images, etc. available from any server
(e.g., via a cluster FS)
- replicate data across servers/drives to insulate from disk/server
failures (ideally, survive a 2-node outage)
- auto-migrate VMs (and associated IP addresses) on node failure

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
Bart Coninckx wrote:
> On 10/10/11 22:53, Bart Coninckx wrote:
>
> Continued reading some more on VastSky. This seems to offer redundancy
> too, by means of mirroring. 4 nodes might not be enough for that,
> though. Also, I wonder if it is suitable for things like live
> migration, which iSCSI and AoE can do.

VastSky seems to suffer from two problems:
- the storage manager is a single point of failure
- development seems to have stopped in Oct. 2010

GlusterFS also does replication - the question is whether its
performance is up to supporting VMs. Mixed responses so far, with some
suggestions that this is supposed to get a lot better in version 3.3
(currently in beta). Not sure of the impact of Red Hat's acquisition of
Gluster.

Starting to think that one approach would be to publish all 16 drives
via AoE, then build one big md RAID10 array across them (Linux RAID10 is
interesting vis-a-vis standard RAID1+0 - it does mirroring and striping
as a single operation, which uses disk space more efficiently). Trying
to work through how things would respond in the event of node failures
(4 drives go out at once).

--
In theory, there is no difference between theory and practice.
In<fnord> practice, there is.  .... Yogi Berra
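A sketch of what the AoE-plus-md-RAID10 idea above could look like with
vblade and mdadm. Interface names, shelf/slot numbering, and partition
names are all placeholders, and the member ordering shown is just one
way to arrange it.

# on node 1: export the four data partitions over the storage NIC
# (shelf = node number, slot = disk number; repeat on nodes 2-4 with
#  shelf numbers 2, 3, 4)
vbladed 1 0 eth2 /dev/sda3
vbladed 1 1 eth2 /dev/sdb3
vbladed 1 2 eth2 /dev/sdc3
vbladed 1 3 eth2 /dev/sdd3

# the exports appear on the other nodes as /dev/etherd/e<shelf>.<slot>.
# with the near-2 layout, adjacent members form mirror pairs, so list
# them so each pair spans two different nodes - then losing one node
# costs one half of several mirrors rather than both halves of any
mdadm --create /dev/md10 --level=10 --layout=n2 --raid-devices=16 \
    /dev/etherd/e1.0 /dev/etherd/e2.0 /dev/etherd/e3.0 /dev/etherd/e4.0 \
    /dev/etherd/e1.1 /dev/etherd/e2.1 /dev/etherd/e3.1 /dev/etherd/e4.1 \
    /dev/etherd/e1.2 /dev/etherd/e2.2 /dev/etherd/e3.2 /dev/etherd/e4.2 \
    /dev/etherd/e1.3 /dev/etherd/e2.3 /dev/etherd/e3.3 /dev/etherd/e4.3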
On 10/10/11 23:17, Miles Fidelman wrote:
> [...]
> Starting to think that one approach would be to publish all 16 drives
> via AoE, then build one big md RAID10 array across them (Linux RAID10 is
> interesting vis-a-vis standard RAID1+0 - it does mirroring and striping
> as a single operation, which uses disk space more efficiently). Trying
> to work through how things would respond in the event of node failures
> (4 drives go out at once).

If the RAID happens on the AoE client side, how will this fit into HA
across all nodes? The RAID only becomes active on one node at a time,
right? Two cannot activate it simultaneously, for sure... I think the
redundancy for the storage needs to happen below the used technology.

Just my thoughts.

B.
> GlusterFS also does replication - the question is whether its
> performance is up to supporting VMs. Mixed responses so far, with some
> suggestions that this is supposed to get a lot better in version 3.3
> (currently in beta). Not sure of the impact of Red Hat's acquisition of
> Gluster.

I used GlusterFS a couple of years ago for a storage project and it
worked very well until we put load on it. There were multiple problems
with our implementation though:

- It was an older version of Gluster.
- I only had two nodes. No striping, just mirroring.
- The backend storage was 7200RPM SATA SAN space and thus slow. Both
  nodes shared the same spindles.
- I was stingy on RAM for the storage nodes.
- The primary application was a PHP session store that handled in the
  neighborhood of 600 requests per second. That's a lot of read and
  write for this sort of system. Keep in mind the FUSE layer...

We moved to a single node of NFS to replace this and even then had to
move the underlying storage to an FC zone of the SAN to patch up
performance.

My understanding of GlusterFS is that it's better designed to handle
larger files and this may make it a better candidate for hosting VMs.
I'd still be concerned about only having 16 drives, but I think it's
worth a shot. RAID them all at the host level, hand the (ext4 or better
-- I'd skip ext2/3) filesystem to Gluster, then stripe+replicate. Do a
whole bunch of testing. Live migration won't be a problem but
performance might be.

John

--
John Madden / Sr UNIX Systems Engineer
Office of Technology / Ivy Tech Community College of Indiana

Free Software is a matter of liberty, not price. To understand
the concept, you should think of Free as in 'free speech,' not
as in 'free beer.' -- Richard Stallman
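The stripe+replicate layout suggested above would be created roughly
like this, going from the Gluster 3.3-era documentation. Hostnames and
brick paths are placeholders; with replica 2, consecutive bricks form
the mirror pairs, so the ordering below pairs node1 with node2 and node3
with node4.

# each brick is a directory on that node's local RAID + ext4 filesystem
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

gluster volume create vmstore stripe 2 replica 2 transport tcp \
    node1:/export/vmstore node2:/export/vmstore \
    node3:/export/vmstore node4:/export/vmstore
gluster volume start vmstore

# mount on every Xen node (FUSE client) and keep the domU images there
mount -t glusterfs node1:/vmstore /var/lib/xen/images

Whether the FUSE-based client keeps up with VM I/O is exactly the open
question, so the "do a whole bunch of testing" advice stands.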