We're working on setting up a test box for an eventual iSCSI SAN. (I've dug through the archives for the last 6 months, but don't see a good thread that discusses our situation.) Both technologies (Xen, iSCSI) are new to us, so there's going to be a bit of a learning curve.

The test unit is an AMD Athlon64 X2 4200+ AM2 socket, 2GB RAM, and multiple SATA drives. Since it has the AM2 socket, I believe it supports AMD-V (AMD's virtualization technology), so we should be able to run unmodified guest OSes on it. Any guest OSes besides the iSCSI/SAN-specific tasks will eventually be migrated off to an external box (either new Socket F Opteron systems or AM2 Athlon64 systems).

My plan for the disks is to lay software RAID over the disks (RAID1 for the first pair, RAID10 for the second set of 6 disks), then lay LVM on top of mdadm's software RAID before handing it off to iscsitarget to be divided up for use by the iSCSI initiators (a rough sketch of that stack is below).

Specific questions:

1) Do the mdadm and LVM layers need to happen in Dom0?

2) Should iscsitarget also run in Dom0, or should it be moved into a DomU?

3) I assume that ethernet card bonding should happen in Dom0, with the bonded interface presented as a virtual interface to the DomUs?
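For concreteness, a minimal sketch of that stack, assuming eight example drives (/dev/sda through /dev/sdh) and the iSCSI Enterprise Target's ietd.conf; all device names, sizes, and the IQN are placeholders, and md0 (the RAID1 pair) is assumed to hold dom0's own OS:

    # RAID1 over the first pair, RAID10 over the next six disks (mdadm):
    mdadm --create /dev/md0 --level=1  --raid-devices=2 /dev/sda /dev/sdb
    mdadm --create /dev/md1 --level=10 --raid-devices=6 /dev/sd[c-h]

    # LVM on top of the md array that will back the SAN volumes:
    pvcreate /dev/md1
    vgcreate sanvg /dev/md1
    lvcreate -L 20G -n guest01 sanvg

    # /etc/ietd.conf -- hand one LV to iscsitarget as a LUN:
    Target iqn.2006-08.com.example:san.guest01
        Lun 0 Path=/dev/sanvg/guest01,Type=fileio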
Thomas Harold wrote:
> The test unit is an AMD Athlon64 X2 4200+ AM2 socket, 2GB RAM, and
> multiple SATA drives. Since it has the AM2 socket, I believe that it
> supports AMD-V (AMD's virtual technology) so we should be able to run
> unmodified guest OSs on it. Any guest OSs besides the iSCSI/SAN
> specific tasks will eventually be migrated off to an external box
> (either new SocketF Opteron systems or AM2 Athlon64 systems).

Very nice hardware!

> My plan for the disks is to lay software RAID over the disks (RAID1 for
> the first pair, RAID10 for the second set of 6 disks). Then lay LVM on
> top of mdadm's software RAID before handing it off to iscsitarget to be
> divided up for use by the iSCSI initiators.

Can't you do RAID inside the disk controllers? It's much better than software RAID (with real RAID controllers).

> Specific questions:
>
> 1) The mdadm and LVM will need to happen in Dom0?

Yes, it should be in the dom0.

> 2) Should iscsitarget also run in Dom0 or should it be moved into a DomU?

You will use iSCSI to export volumes to be used by domUs, right? So it has to be done by dom0.

> 3) I assume that ethernet card bonding should happen in Dom0 with the
> bonded interface presented as a virtual interface to the DomUs?

I tried doing bonding in dom0 some time ago, but it didn't work. It was probably my fault.

Good luck.
Keep in mind that unless you are exporting hardware ethernet cards to the DomU, you will have a significant network performance hit inside a DomU (tech details here: http://www.cs.aau.dk/~kleist/Courses/nds-e05/presentations/2609-preformance_overheads_xen.pdf#search=%22xen%20network%20performance%22). But if you run it in Dom0 and busy up the CPU, all VMs will suffer. I would run it in a DomU with an exported PCI NIC instead of using virtual devices.

Thomas Harold wrote:
> We're working on setting up a test box for an eventual iSCSI SAN.
> (I've dug through the archives for the last 6 months, but don't see a
> good thread that discusses our situation.) Both technologies (Xen,
> iSCSI) are new to us, so there's going to be a bit of a learning curve.
> [...]
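A rough sketch of the "exported PCI NIC" approach for the Xen 3.x tools of the time; the PCI address 0000:02:01.0 is only an example (find yours with lspci), and the exact pciback syntax depends on whether pciback is compiled into the dom0 kernel or built as a module:

    # dom0 kernel command line (pciback compiled in): hide the NIC from dom0
    pciback.hide=(0000:02:01.0)

    # or, with pciback built as a module:
    modprobe pciback hide="(0000:02:01.0)"

    # domU config file: hand the hidden NIC to the guest
    pci = [ '02:01.0' ]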
> Can't you do RAID inside the disk controllers? It's much better than
> software RAID (with real RAID controllers).

I'm not sure I agree. Years ago I used some really nice Compaq SmartArray cards which were simply excellent, but ever since then I have been disappointed by hardware RAID. Slow and clunky... I bought a couple of 3ware cards recently and haven't been impressed with their RAID 5 performance at all: 20Mb/s max read/write rates. Software RAID under Linux is pretty good and will probably outperform anything but a really top-end hardware controller (think decent SCSI cards).

Ed W
Lucas Santos wrote:
> Thomas Harold wrote:
>> The test unit is an AMD Athlon64 X2 4200+ AM2 socket, 2GB RAM, and
>> multiple SATA drives. [...]
>
> Very nice hardware!

Thanks, it's a start for testing. :) Still waiting to see what Socket F Opteron prices are going to be. We may go with AM2 motherboards that take ECC memory in order to get more Xen head units (spread the load and risk over multiple physical units).

>> My plan for the disks is to lay software RAID over the disks (RAID1 for
>> the first pair, RAID10 for the second set of 6 disks). Then lay LVM on
>> top of mdadm's software RAID before handing it off to iscsitarget to be
>> divided up for use by the iSCSI initiators.
>
> Can't you do RAID inside the disk controllers? It's much better than
> software RAID (with real RAID controllers).

I'm actually much more comfortable with software RAID (and mdadm). I find it to be flexible, and performance is good enough for most cases. For instance, since this test unit is going to have 14 HDs, there's a good chance that the hot spare for the arrays will have to be on another controller. We're doing RAID1 on 2 drives at the start of the drive set, then eventually a 4-drive RAID10 and a 6-drive RAID10 with 2 hot spares sitting at the end of the drive set. But that's a topic for a SAN mailing list (grin).

>> Specific questions:
>>
>> 1) The mdadm and LVM will need to happen in Dom0?
>
> Yes, it should be in the dom0.
>
>> 2) Should iscsitarget also run in Dom0 or should it be moved into a DomU?
>
> You will use iSCSI to export volumes to be used by domUs, right? So it
> has to be done by dom0.

Thanks for the info.

>> 3) I assume that ethernet card bonding should happen in Dom0 with the
>> bonded interface presented as a virtual interface to the DomUs?
>
> I tried doing bonding in dom0 some time ago, but it didn't work. It was
> probably my fault.

We'll see how bonding goes... first time trying it in Linux. But we picked up an inexpensive SMC 16/24-port switch that allows link aggregation (along with jumbo frames). The SMC switch was about $250 and will work well enough for testing.

...

Worst comes to worst, I take Xen back off the box. But since it's an AM2 box, it's worth trying to see how AMD-V performs.

(Also thanks to Ed and Jason for their replies.)
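For what it's worth, a minimal sketch of Linux bonding in dom0 with the 2.6-era bonding driver; mode 802.3ad matches switch-side link aggregation like the SMC's, and the interface names, address, and the network-bridge invocation are assumptions that vary by distribution and Xen version:

    # /etc/modprobe.conf (or a file under /etc/modprobe.d/)
    alias bond0 bonding
    options bond0 mode=802.3ad miimon=100

    # bring the bond up and enslave two physical NICs (ifenslave)
    ifconfig bond0 192.168.1.5 netmask 255.255.255.0 up
    ifenslave bond0 eth1 eth2

    # Xen: point the bridge at the bond instead of eth0 (example invocation)
    /etc/xen/scripts/network-bridge start netdev=bond0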
On Thursday 24 August 2006 7:10 am, Thomas Harold wrote:
> My plan for the disks is to lay software RAID over the disks (RAID1 for
> the first pair, RAID10 for the second set of 6 disks). Then lay LVM on
> top of mdadm's software RAID before handing it off to iscsitarget to be
> divided up for use by the iSCSI initiators.

Do you plan to export the LVs over iSCSI? I would advise exporting the PVs (/dev/mdX) instead, and doing CLVM on the dom0s of the iSCSI initiators. That way, you could add more storage boxes (with more PVs) to the same VG.

-- 
Javier
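A sketch of that split, with ietd.conf exporting the whole md device and the clustered VG created on the initiator side; the IQN, device names, and sizes are placeholders, and CLVM needs clvmd (and its cluster stack) running on every dom0 that shares the VG:

    # On the storage box, /etc/ietd.conf -- export the PV, not individual LVs:
    Target iqn.2006-08.com.example:san.pv0
        Lun 0 Path=/dev/md1,Type=fileio

    # On one initiator dom0 (assuming the imported disk shows up as /dev/sdc):
    pvcreate /dev/sdc
    vgcreate --clustered y xenvg /dev/sdc
    lvcreate -L 10G -n domu1-disk xenvg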
On Aug 24, 2006, at 5:10 AM, Thomas Harold wrote:
> We're working on setting up a test box for an eventual iSCSI SAN.
> (I've dug through the archives for the last 6 months, but don't see
> a good thread that discusses our situation.) Both technologies
> (Xen, iSCSI) are new to us, so there's going to be a bit of a
> learning curve.

Do you need routable disk access? If not, you should seriously consider AoE instead of iSCSI.

http://www.coraid.com

The drivers are in the Linux kernel, and their aoe-tools contain vblade, so you can build your own AoE targets if you choose not to buy their hardware.

-- 
-- Tom Mornini
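A minimal example of a do-it-yourself AoE target with vblade from aoe-tools; the shelf/slot numbers, interface, and backing device are arbitrary here:

    # On the storage box: export a block device as AoE shelf 0, slot 1 on eth1
    vbladed 0 1 eth1 /dev/sanvg/guest01

    # On the client: load the aoe driver; the disk appears as /dev/etherd/e0.1
    modprobe aoe
    fdisk -l /dev/etherd/e0.1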
Javier Guerra wrote:
> On Thursday 24 August 2006 7:10 am, Thomas Harold wrote:
>> My plan for the disks is to lay software RAID over the disks (RAID1 for
>> the first pair, RAID10 for the second set of 6 disks). Then lay LVM on
>> top of mdadm's software RAID before handing it off to iscsitarget to be
>> divided up for use by the iSCSI initiators.
>
> Do you plan to export the LVs over iSCSI? I would advise exporting the
> PVs (/dev/mdX) instead, and doing CLVM on the dom0s of the iSCSI
> initiators. That way, you could add more storage boxes (with more PVs)
> to the same VG.

Not sure yet. The goal is to have a Xen setup where we can move DomUs between multiple head boxes on the fly. Having a SAN should make this easier to do (if I've read correctly).

And phase 2 of the test project would be to have two identically configured (roughly) SAN units that are either mirrored or fault-tolerant, so that the Xen DomUs can keep running even if one of the two SAN units is down. That also includes having 2 physical switches, multiple NICs bonded together (probably 2 bonded pairs, one for each switch), and multiple cables going to the switches (both for fault tolerance and expanded bandwidth).

At least, that's the plan... the devil is in the details. (Needless to say, I have a good bit of reading left to do. We're just trying to move away from the situation where if box X is down, services A/B/C are offline until it comes back up. Xen + SAN seems to make the most sense given our low-end requirements.)
On Friday 25 August 2006 13:50, Thomas Harold wrote:
> Javier Guerra wrote:
>> Do you plan to export the LVs over iSCSI? I would advise exporting the
>> PVs (/dev/mdX) instead [...]
>
> Not sure yet. The goal is to have a Xen setup where we can move DomUs
> between multiple head boxes on the fly. Having a SAN should make this
> easier to do (if I've read correctly).
>
> And phase 2 of the test project would be to have two identically
> configured (roughly) SAN units that are either mirrored or
> fault-tolerant, so that the Xen DomUs can keep running even if one of
> the two SAN units is down. That also includes having 2 physical
> switches, multiple NICs bonded together (probably 2 bonded pairs, one
> for each switch), and multiple cables going to the switches (both for
> fault tolerance and expanded bandwidth).

What I've been building is pretty much the same as this. We have 2 storage servers with 5TB usable storage each, replicating through drbd. These then run iscsitarget to provide LVM-based iSCSI disks to a set of Xen servers using open-iscsi. The virtual machines are then set up using these physical disks. Because the iSCSI devices can have the same /dev/disk/by-id or /dev/disk/by-path labels on each Xen dom0, you can create generic config files that will work across all the servers. Also, even though drbd is a primary/secondary replication agent at the moment, everything is quite happy for multiple Xen dom0s to connect to the disks, allowing for very quick live migration.

I haven't gone quite so far with multiple switches etc., but we are using VLANs to separate the dom0 traffic (eth0), the domUs (eth1), and iSCSI (eth2), all on Gb networking. We are also thinking of putting 10Gb links between the storage servers to keep drbd happy.

Matthew
-- 
Matthew Wild   Tel.: +44 (0)1235 445173   M.Wild@rl.ac.uk
URL http://www.ukssdc.ac.uk/
UK Solar System Data Centre and World Data Centre - Solar-Terrestrial Physics, Chilton
Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, OX11 0QX
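To illustrate the "generic config files" point, a hypothetical fragment of a domU config that should look identical on every dom0 logged into the same target; the by-path name below is only an example, and the exact string depends on the target's IP, port, and IQN as udev renders them:

    # /etc/xen/guest01 (fragment) -- identical on every dom0
    name   = 'guest01'
    memory = 256
    disk   = [ 'phy:/dev/disk/by-path/ip-192.168.2.10:3260-iscsi-iqn.2006-08.com.example:san.guest01-lun-0,xvda,w' ]
    vif    = [ 'bridge=xenbr0' ]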
I wonder: instead of drbd, what would happen if you exported both storage servers' iSCSI targets to your Xen machines and then used Linux software RAID1 to mount them both and keep them in sync?

Matthew Wild wrote:
> What I've been building is pretty much the same as this. We have 2 storage
> servers with 5TB usable storage each, replicating through drbd. These then
> run iscsitarget to provide LVM-based iSCSI disks to a set of Xen servers
> using open-iscsi. [...]
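For the record, a sketch of what that would look like with open-iscsi and mdadm; the portal addresses, IQN, and resulting device names are assumptions (in practice you would want persistent /dev/disk/by-path names rather than /dev/sdb and /dev/sdc):

    # Log into the same-named target on each storage server:
    iscsiadm -m discovery -t sendtargets -p 192.168.2.10
    iscsiadm -m discovery -t sendtargets -p 192.168.2.11
    iscsiadm -m node -T iqn.2006-08.com.example:san.guest01 -p 192.168.2.10 --login
    iscsiadm -m node -T iqn.2006-08.com.example:san.guest01 -p 192.168.2.11 --login

    # Mirror the two imported disks with software RAID1:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc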
On Friday 25 August 2006 14:49, Jason wrote:
> I wonder: instead of drbd, what would happen if you exported both
> storage servers' iSCSI targets to your Xen machines and then used Linux
> software RAID1 to mount them both and keep them in sync?

You'd end up with a management nightmare. Which dom0 would be maintaining the RAID set? And you'd have to keep an eye on all the RAID configurations for each virtual machine's disk(s) on every dom0. You would also be trying to write to both storage servers at the same time, and that's not possible with drbd.

Part of the point is to make the Xen servers as simple as possible, and therefore interchangeable/replaceable, with little extra configuration to support. Eventually I would expect to use drbd 8.0 on the storage servers, giving primary/primary access, and use multipath-tools, allowing me to dispense with heartbeat on the storage servers.

Matthew
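A sketch of the drbd 8.x resource that such a primary/primary setup implies; the hostnames, backing devices, and addresses are placeholders, and something above drbd still has to coordinate concurrent writes to the shared resource:

    # /etc/drbd.conf (fragment, drbd 8.x) -- example names and addresses
    resource r0 {
        protocol C;
        net {
            allow-two-primaries;
        }
        on storage1 {
            device    /dev/drbd0;
            disk      /dev/md1;
            address   192.168.3.1:7788;
            meta-disk internal;
        }
        on storage2 {
            device    /dev/drbd0;
            disk      /dev/md1;
            address   192.168.3.2:7788;
            meta-disk internal;
        }
    }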
Matthew Wild wrote:
> What I've been building is pretty much the same as this. We have 2 storage
> servers with 5TB usable storage each, replicating through drbd. These then
> run iscsitarget to provide LVM-based iSCSI disks to a set of Xen servers
> using open-iscsi. The virtual machines are then set up using these physical
> disks. Because the iSCSI devices can have the same /dev/disk/by-id
> or /dev/disk/by-path labels on each Xen dom0, you can create generic config
> files that will work across all the servers. Also, even though drbd is a
> primary/secondary replication agent at the moment, everything is quite happy
> for multiple Xen dom0s to connect to the disks, allowing for very quick live
> migration.
>
> I haven't gone quite so far with multiple switches etc., but we are using
> VLANs to separate the dom0 traffic (eth0), the domUs (eth1), and iSCSI
> (eth2), all on Gb networking. We are also thinking of putting 10Gb links
> between the storage servers to keep drbd happy.

Excellent news. Did you document your setup anywhere public?

This all started because we're getting ready to add 2 new servers to our motley mix of individual servers with DAS, and I had a bit of a brainflash where I finally saw how Xen + SAN could play together. Eventually, we should be able to pack our 6-8 servers down into 2-4 Xen "head" units and a pair of redundant storage units. The firewall / border boxes will remain as separate units (although possibly running Xen + Shorewall). For a modest amount of complexity, we'll gain an enormous amount of flexibility. Which will hopefully result in a lot less stress for me. No more lying awake at night worrying about what happens when server hardware fails. (And doing this myself forces me to learn the technology, which is worthwhile.)

...

My initial thought was also to use software RAID across the two SAN units: export a block device from each SAN unit via iSCSI, then have the DomU manage its own RAID1 array. But I'll look closer into DRBD since that seems to be the preferred method.

...

The proposed topology for us (full build-out) was:

(2) SAN units with 5 NICs
(3) gigabit switches
(4) Xen units with 3 NICs (maybe 5 NICs)

Switch A is for normal LAN traffic. Switches B & C are for iSCSI traffic, with the two switches connected, possibly via 4 bonded/trunked ports for a 4-gigabit backbone, or using one of the expansion ports to link the two switches.

Each SAN unit will have a single link to the LAN switch for management (via ssh) and monitoring. The other 4 NICs would be bonded into two pairs and attached to switches B & C for fault tolerance and for serving up the iSCSI volumes.

Xen head units would have 3 or 5 NICs: one for connecting to the LAN switch to provide client services to users, and the other NICs to connect to SAN switches B & C for fault tolerance (with the possibility of bonding for more performance if we go with 5 NICs).

One change that I'm going to make, since you're talking about DRBD wanting a faster inter-link, is to add 2 more NICs to the SAN units (for a total of 7 NICs). The additional 2 NICs could then be bonded together and linked directly to the other SAN unit via crossover cables.

But for the start, I'll be running all iSCSI traffic over an existing gigabit switch. We're going to install a 48-port gigabit switch soon, which will free up the existing 24-port gigabit switch for use with the SAN. (Our LAN is still mostly 10/100 hubs with a gigabit core.) ETA for installing the 2nd switch and 2nd SAN unit is probably 6-12 months.
The MD nightmare is predicated on a poor naming convention. I have a 30-node Xen grid up now running off of 4x 2TB SANs, but I set them up logically, similar to an Excel sheet, each node having 2 NICs, the second being private gig-e on its own switch used only for SAN and migration traffic.

Nodes (dom-0 hosts) are sorted into rows and columns representing their positions on the racks (we're using blades). I.e. 1-a is the top left-hand corner, 1-b is next to it, 2-b would be just under it. A third identifier is added to signify its role: i.e., 1-a-0 is dom-0 on node 1-a, 1-a-3 would be dom-U #3 on node 1-a. This makes it really easy to keep apples to apples and oranges to oranges. If properly scripted, MD does do the job well. I first thought about drbd, but I kept KISS in mind and wanted to plan the grid in such a way that any slightly-better-than-average admin could handle it, because I hate 3AM wake-up calls.

So the end result is, once you set up key pairing (or your interconnect of choice between the nodes), you get the simplicity of OpenSSI, i.e.:

    onnode 1-b-0 do-this-cmd
    onclass someclass do-this-cmd
    onall do-this-cmd

...with a few hours' worth of bash scripting. (Actually I use Herbert's dash port from BSD.) I keep a file similar to /etc/hosts, which will soon be on a GFS filesystem shared between dom-0 hosts, that maps which iSCSI target lands where -- really helps. Just create a lockfile named .1b1-migration or whatever and toss it on that shared fs, list in it the 2 SAN partitions being synced for migration, and remove it when done. Have a cron job check the age of .*-migration if it's unsupervised, as you indicated.

That can obviously work with drbd too, but its point was to keep things organized enough that a simpler, better-known-to-all technology could be used for the SANs, and with a little more hammering it's not too difficult to set up isolated single-system-image load-balanced arrays.

You may also want to check out pound, http://www.apsis.ch/pound/ -- I've had some success on a smaller scale (pushing about 250-400 meg) using pound, but have yet to use it on anything heavier. If your network can afford to try it out, it would be very meaningful data. Pound has a very basic heartbeat that needs improvement; don't rely on pound to know if a node isn't reachable, especially under stress.

Just remember to give dom-0 adequate RAM for the bridges and initiators needed to pull it all off. I allow 128MB per initiator and 32MB per bridge... some people say allow more, some less, but that seems to work as a rule of thumb for me.

This is easy and straightforward to pull off, but does involve quite a bit of work... but well, you're talking about a "smart" auto-scaling, auto-migrating system, so I'm guessing you expected that :)

Or you could just use OpenSSI as an HVM... or use 2.0.7 and the xen-ssi kernel. THAT would be the easiest route... but who likes that? :P

HTH
-Tim

On Fri, 2006-08-25 at 15:17 +0100, Matthew Wild wrote:
> On Friday 25 August 2006 14:49, Jason wrote:
>> I wonder: instead of drbd, what would happen if you exported both
>> storage servers' iSCSI targets to your Xen machines and then used Linux
>> software RAID1 to mount them both and keep them in sync?
>
> You'd end up with a management nightmare. Which dom0 would be maintaining
> the RAID set? [...]
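A minimal sketch of the kind of helper scripting described above; the nodes file, shared mount point, and naming are all assumptions:

    #!/bin/sh
    # onall: run a command on every dom-0 listed in /etc/grid/nodes
    # (one hostname per line), relying on ssh key pairing.
    while read node; do
        ssh -n "root@$node" "$@"    # -n keeps ssh from eating the node list
    done < /etc/grid/nodes

    # cron job: flag migration lockfiles on the shared fs older than 60 minutes
    find /shared -maxdepth 1 -name '.*-migration' -mmin +60 \
        -exec echo "stale migration lock: {}" \;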
Christopher G. Stach II wrote (2006-Aug-30 03:47 UTC), in "Re: [Xen-users] iSCSI target - run in Dom0 or DomU?":
Jason Clark wrote:
> Keep in mind that unless you are exporting hardware ethernet cards to
> the DomU, you will have a significant network performance hit inside a
> DomU (tech details here:
> http://www.cs.aau.dk/~kleist/Courses/nds-e05/presentations/2609-preformance_overheads_xen.pdf#search=%22xen%20network%20performance%22).
> But if you run it in Dom0 and busy up the CPU, all VMs will suffer. I
> would run it in a DomU with an exported PCI NIC instead of using
> virtual devices.

I second that observation! :)

-- 
Christopher G. Stach II