Hi all,

I used to host the disk images of my Xen VMs on an NFS server and am considering moving to iSCSI for performance reasons. Here is the problem I encountered.

With iSCSI, there are two ways to export the virtual disks to the physical machines hosting the VMs:

1. Export each virtual disk (on the target side it is either an img file or an LVM volume) as a physical device, e.g. sdc, then boot the VM using "phy:/dev/sdc".

2. Export the partition containing the virtual disks (in this case the virtual disks are img files) to each physical machine as a physical device, and then on each physical machine mount the new device into the file system. This way the img files are accessible from every physical machine (similar to NFS), and the VMs are booted via tapdisk, "tap:aio:/PATH_TO_IMG_FILE".

I prefer the second approach because I need tapdisk (each virtual disk is a process on the host machine) to control the I/O priority among VMs.

However, there is a problem when I share the LUN containing all the VM img files among multiple hosts. Any modification to the LUN (writing data into the folder where the LUN is mounted) is not immediately visible on the other hosts sharing the LUN (with NFS, changes are immediately synchronized to all the NFS clients). The changes only become visible when I umount the LUN and remount it on the other physical hosts.

From what I found on the Internet, it seems that iSCSI is not intended for sharing a single LUN between multiple hosts. Is this true, or do I need some specific configuration of the target or initiator to make changes immediately visible on all initiators?

Thanks in advance,
Jia
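To make the two approaches concrete, the disk lines in a domU config file would look roughly like this (the device path and image file name are made up for illustration):

    # Approach 1: one iSCSI LUN per virtual disk, passed straight through
    disk = [ 'phy:/dev/sdc,xvda,w' ]

    # Approach 2: one big shared LUN mounted on the host, VM disks kept as
    # image files on it and served through the tapdisk (tap:aio) handler
    disk = [ 'tap:aio:/mnt/vmstore/guest1.img,xvda,w' ]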
On Fri, Aug 7, 2009 at 8:48 AM, Jia Rao <rickenrao@gmail.com> wrote:
> From what I found on the Internet, it seems that iSCSI is not intended for
> sharing a single LUN between multiple hosts.

The point isn't iSCSI, it's the filesystem on it. A 'normal' filesystem (ext3, XFS, JFS, etc.) isn't safe to mount on multiple hosts simultaneously. A cluster filesystem (GFS, OCFS, clusterXFS, etc.) is specifically designed for it.

In short, you have three options for sharing storage between Xen boxes:

1: NFS with file-based images
2: share a LUN, put a cluster filesystem on it, and put file-based images inside
3: multiple LUNs, each one a blockdevice-based image

--
Javier
Jia:

You're partially correct. iSCSI as a protocol has no problem allowing multiple initiators access to the same block device, but you're almost certain to run into corruption if you don't set up a higher-level locking mechanism to make sure your access is consistent across all devices.

To state it again: iSCSI is not in itself a protocol that provides all the features necessary for a shared filesystem. If you want that, you need to look into the shared filesystem space (OCFS2, GFS, etc.).

The other option is to set up individual logical volumes on the shared LUN for each VM. Note that this still requires an inter-machine locking protocol--in my case, clustered LVM. There are quite a few of us who have gone ahead and used clustered LVM with the phy handler--this gives us consistency of the LVM data across the multiple machines, while we administratively restrict access to each logical volume to one machine at a time (unless we're doing a live migration).

I hope this helps.

Cheers

cc

On Fri, Aug 7, 2009 at 6:48 AM, Jia Rao <rickenrao@gmail.com> wrote:
> However, there is a problem when I share the LUN containing all the VM img
> files among multiple hosts.
[snip]

--
Chris Chen <muffaleta@gmail.com>
"The fact that yours is better than anyone else's is not a guarantee that it's any good." -- Seen on a wall
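A rough sketch of what the clustered-LVM-plus-phy approach looks like on the command line, assuming the shared iSCSI LUN appears as /dev/sdb on every host and that the cluster stack and clvmd are already running (all names are illustrative):

    # once, from any host: put a clustered volume group on the shared LUN
    pvcreate /dev/sdb
    vgcreate -cy vg_xen /dev/sdb        # -cy marks the VG as clustered

    # one logical volume per VM
    lvcreate -L 20G -n guest1-disk vg_xen

    # the domU config then points the phy handler at the LV:
    # disk = [ 'phy:/dev/vg_xen/guest1-disk,xvda,w' ]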
Thank you very much for the prompt replies.

My motivation for moving to iSCSI is purely performance. The physical hosts and the storage server are connected through a 1G switch. The storage server uses a RAID-5 disk array. My testing so far with iozone inside the VMs produced similar results for sequential and random reads on both iSCSI and NFS.

I was told it makes a big difference when there are 10-15 VMs sharing the storage server. In my case, I have 8-10 VMs.

Any experience with a larger number of VMs on both NFS and iSCSI?

Jia.
Hi,

Quite funny. We have been using Xen over iSCSI for years, and we are turning to a NetApp NAS to migrate to NFS, for the following reasons:

- Easier setup, no matter how many Xen hosts you have.
- Avoids dealing with a cluster filesystem. (We use OCFS2 quite successfully anyway.)
- Some 3 years ago we were using nbd + drbd + clvm2 + fenced + ... to get direct LVM in the VMs working, but it's too complicated and too complex to maintain, especially if you have an issue or more than 2 dom0s.
- According to the benchmark I received, there is no difference in performance between NFS and iSCSI. There are some differences in some cases due to the different levels of cache.

Regards,

François.

----- Original Message -----
From: "Christopher Chen" <muffaleta@gmail.com>
Sent: Friday, 7 August, 2009 16:00:20
Subject: Re: [Xen-users] iscsi vs nfs for xen VMs
[snip]
We are using iSCSI with CLVM and GFS2 very successfully with 10 physical Xen servers and 80 to 100 VMs running across them. We use file-based disk images, all stored on a single GFS2 file system on a single iSCSI LUN accessible by all 10 Xen servers.

On Fri, Aug 7, 2009 at 10:11 AM, Jia Rao <rickenrao@gmail.com> wrote:
> I was told it makes a big difference when there are 10-15 VMs sharing the
> storage server. In my case, I have 8-10 VMs.
>
> Any experience with a larger number of VMs on both NFS and iSCSI?
2009/8/7 François Delpierre <xensource@pivert.org>:
> - According to the benchmark I received, there is no difference in
> performance between NFS and iSCSI. There are some differences in some
> cases due to the different levels of cache.

Hi, was this benchmark specific to Xen? Is it public?

Regards,

--
Ciro Iriarte
http://cyruspy.wordpress.com
Hi!

We went the opposite way. We used NFS, but when the I/O load increased the performance went down and we had to change to iSCSI. The system has been working OK for several months. (We tested NFS with both file and tap.)

Regards,
Agustin

Ciro Iriarte wrote:
> Hi, was this benchmark specific to Xen? Is it public?
Hi,

This was a study of VMware over NFS on NetApp (vs iSCSI and vs FC). I just checked, and the document is public:
http://www.vmug.be/VMUG/Upload//meetings/VMUGBE06_20081010nfs.pdf

Regards,

François.

----- Original Message -----
From: "Ciro Iriarte" <cyruspy@gmail.com>
Subject: Re: [Xen-users] iscsi vs nfs for xen VMs
> Hi, was this benchmark specific to Xen? Is it public?
On Fri, Aug 7, 2009 at 8:48 PM, Jia Rao <rickenrao@gmail.com> wrote:
> I prefer the second approach because I need tapdisk (each virtual disk is a
> process on the host machine) to control the I/O priority among VMs.

You can still control I/O priority for block devices using dm-ioband.
http://sourceforge.net/apps/trac/ioband/

As for iSCSI vs NFS, the general rule is that NFS is easier, but you need a high-performance NFS server to get decent performance (NetApp comes to mind).

If you've already tried NFS and are not satisfied with its performance, then you should try iSCSI: each domU's storage exported as a block device, imported on dom0, and referenced as /dev/disk/by-path/* in the domU config. That gives a simple-enough setup (no cluster FS required), decent performance, and it's still possible to control disk I/O priority and bandwidth.

--
Fajar
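As a sketch of what Fajar describes, on each dom0 you would log in to the target that backs a given guest and then point the domU at the stable by-path device (the target IQN and IP below are invented):

    # discover and log in to the target for this guest
    iscsiadm -m discovery -t sendtargets -p 192.168.0.10
    iscsiadm -m node -T iqn.2009-08.com.example:guest1 -p 192.168.0.10 --login

    # domU config: use the persistent by-path name, not sdX, so it survives reboots
    # disk = [ 'phy:/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2009-08.com.example:guest1-lun-0,xvda,w' ]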
dm-ioband seems to work with the new kernels (2.6.31). We are still using 2.6.18.8 and are reluctant to upgrade to a new kernel. Can dm-ioband work with older kernels?

Jia

On Sun, Aug 9, 2009 at 11:11 PM, Fajar A. Nugraha <fajar@fajar.net> wrote:
> You can still control I/O priority for block devices using dm-ioband.
> http://sourceforge.net/apps/trac/ioband/
On Mon, Aug 10, 2009 at 9:28 PM, Jia Rao <rickenrao@gmail.com> wrote:
> dm-ioband seems to work with the new kernels (2.6.31). We are still using
> 2.6.18.8 and are reluctant to upgrade to a new kernel.
> Can dm-ioband work with older kernels?

On the project page you'll see a binary for RHEL/CentOS 5, which is a 2.6.18. So yes, it should work.

Personally I just use Red Hat's kernel-xen (which works even for Xen >= 3.3, as long as you don't need SCSI passthrough or other newer stuff) and the provided dm-ioband binary.

--
Fajar
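For reference, dm-ioband sets up weighted bandwidth groups with dmsetup along these lines; the table format below is my recollection of the examples in the dm-ioband documentation, so double-check it against the docs shipped with the package before relying on it (device name and weight are illustrative):

    # wrap /dev/sdb in a bandwidth-control device with weight 40
    SIZE=$(blockdev --getsize /dev/sdb)
    echo "0 $SIZE ioband /dev/sdb 1 0 0 none weight 0 :40" | dmsetup create ioband1

    # guests then use /dev/mapper/ioband1 instead of /dev/sdb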
Hi Jia,

Was there any specific reason you chose GFS2 + clvmd over NFS? In my experience, about a year ago, GFS2 performed very poorly compared to ext3 under high load and we had to roll back to ext3. I am pretty sure NFS would outperform GFS2 if the VMs do heavy I/O to the disk images.

Could you give any IOPS / throughput figures you achieve on the GFS2 file system with the current configuration?

Thanks,
OZ

On Fri, Aug 7, 2009 at 10:26 AM, Dustin Black <vantage@redshade.net> wrote:
> We are using iSCSI with CLVM and GFS2 very successfully with 10 physical
> Xen servers and 80 to 100 VMs running across them. We use file-based disk
> images, all stored on a single GFS2 file system on a single iSCSI LUN
> accessible by all 10 Xen servers.
Thanks. I am researching this solution but have no documentation. Would you share your experience with me, ideally a deployment document? Thanks.

My MSN: yue-luck@hotmail.com.

iSCSI + CLVM solves a lot of things, like migration and snapshots.
iSCSI + CLVM + GFS2? iSCSI + GFS2?

Please share, thanks.
On Fri, Aug 7, 2009 at 4:26 PM, Dustin Black <vantage@redshade.net> wrote:
> We are using iSCSI with CLVM and GFS2 very successfully with 10 physical Xen
> servers and 80 to 100 VMs running across them. We use file-based disk
> images, all stored on a single GFS2 file system on a single iSCSI LUN
> accessible by all 10 Xen servers.

What do you do if your iSCSI SAN breaks?

--
Kind Regards
Rudi Ahlers
SoftDux
Redundant switches, NIC bonding, and you are fine...

On 26.01.11 07:26, Rudi Ahlers wrote:
> What do you do if your iSCSI SAN breaks?
On Wed, Jan 26, 2011 at 10:20 AM, Juergen Gotteswinter <jg@internetx.de> wrote:
> Redundant switches, NIC bonding, and you are fine...

How, exactly, will redundant switches & NIC bonding help if the NAS device fails, i.e. it's totally dead? Redundant switches & NIC bonding only help you if the network fails - which is much less likely than the NAS failing.

--
Kind Regards
Rudi Ahlers
SoftDux
It depends on the quality of the NAS/SAN device. Some of them are more reliable and robust than the rest of the infrastructure (dual controllers, RAID-6, multipathing, etc.); obviously they cost an arm and a leg. So they SHOULD not totally fail (firmware issues are another thing, though). And in that case, even if one owns enterprise-grade storage, backups (tape, another storage box, a remote site) are always a must.

Yes, if the storage fails there will be downtime. You can still have local disks on the Xen hosts, so you can, for example, restore the most important Xen guests onto the local disks from backups and live without live migration until the NAS/SAN issues are solved.

Matej
________________________________________
From: Rudi Ahlers [Rudi@SoftDux.com]
Sent: 26 January 2011 09:29
> How, exactly, will redundant switches & NIC bonding help if the NAS
> device fails, i.e. it's totally dead?
On Wed, Jan 26, 2011 at 10:44 AM, Matej Zary <matej.zary@cvtisr.sk> wrote:
> It depends on the quality of the NAS/SAN device. Some of them are more
> reliable and robust than the rest of the infrastructure (dual controllers,
> RAID-6, multipathing, etc.); obviously they cost an arm and a leg. So they
> SHOULD not totally fail (firmware issues are another thing, though).

Well, that's the problem. We have (had, soon to be returned) a so-called "enterprise SAN" with dual everything, but it failed miserably during December and we ended up migrating everyone to a few older NAS devices just to get the clients' websites up again (VPS hosting). So, just because a SAN has dual PSUs, dual controllers, dual NICs, dual heads, etc. doesn't mean it can't fail.

I'm thinking of setting up 2 independent SANs, or for that matter even NAS clusters, and then doing something like RAID1 (mirror) on the client nodes with the iSCSI mounts. But I don't know if it's feasible or worth the effort. Has anyone done something like this?

--
Kind Regards
Rudi Ahlers
SoftDux
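A minimal sketch of that mirror-two-SANs idea, assuming the same-sized LUN is exported from two independent targets and shows up as /dev/sdb and /dev/sdc on the Xen host (device and VG names are made up):

    # log in to both targets with iscsiadm first, then mirror them in software
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # carve the mirrored device up with LVM as usual and point the domUs at the LVs
    pvcreate /dev/md0
    vgcreate vg_mirror /dev/md0

Writes then go at the speed of the slower SAN, and a failed target has to be re-added and resynced by hand, which is part of the "worth the effort" question.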
We use storage with redundant controllers, network, etc. - for example EqualLogic, EMC or HP. In really critical scenarios you can stack EqualLogic, for example. No problems for years... even when a power supply went dead or a storage controller died, everything kept working.

On 26.01.11 09:29, Rudi Ahlers wrote:
> How, exactly, will redundant switches & NIC bonding help if the NAS
> device fails, i.e. it's totally dead?
More at the bottom of the message; Exchange and OWA suck. :(
________________________________________
From: Rudi Ahlers [Rudi@SoftDux.com]
Sent: 26 January 2011 09:55
> Well, that's the problem. We have (had, soon to be returned) a so-called
> "enterprise SAN" with dual everything, but it failed miserably during December
> [snip]
> I'm thinking of setting up 2 independent SANs, or for that matter even NAS
> clusters, and then doing something like RAID1 (mirror) on the client nodes
> with the iSCSI mounts. But I don't know if it's feasible or worth the effort.
> Has anyone done something like this?
------------------------------------------------------

Well, that sucks. We are in a similar situation storage-wise: we have an HP XP24000 (made by Hitachi) with midrange IBM D-series extending the capacity. If the XP fails we are screwed, even though we have all the needed backups on tape - we run VMware ESX (not my choice :D) in this datacenter, and the hardware resources (blade servers) don't offer enough local disk capacity to run the VMs locally in a catastrophic scenario.

A solution can be "enterprise" NAS/SAN redundancy. Some of the enterprise storage devices can be synced online with another one, even in a distant geographical location (e.g. NetApp; our XP offers something a bit similar IIRC). Sure it has a performance hit, but worse, it costs a shitload of money, so it's out of the question in most cases. :/

Matej
Hello guys,

Presently I am playing with http://www.gluster.com/ and its volumes. It will not offer you block-level volume access, since you export the volumes with NFS, but the design allows you to create highly available, geographically distant storage for files.

If you can live with file-based VMs, give it a try.

r.

Matej Zary wrote:
> A solution can be "enterprise" NAS/SAN redundancy. Some of the enterprise
> storage devices can be synced online with another one, even in a distant
> geographical location (e.g. NetApp). Sure it has a performance hit, but
> worse, it costs a shitload of money, so it's out of the question in most cases. :/
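For reference, a replicated Gluster volume for file-based VM images is created along these lines with the 3.x CLI (host and path names are invented; check the docs for the version you run):

    # on one of the storage nodes
    gluster volume create vmstore replica 2 stor1:/export/brick1 stor2:/export/brick1
    gluster volume start vmstore

    # on each Xen host, mount the volume where the image files will live
    mount -t glusterfs stor1:/vmstore /var/lib/xen/images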
2011/1/26 riki <phobie@axfr.org>:
> Presently I am playing with http://www.gluster.com/ and its volumes. It
> will not offer you block-level volume access, since you export the volumes
> with NFS, but the design allows you to create highly available,
> geographically distant storage for files.

Can you use LVM on it?

And is it not possible to install the Linux iSCSI Enterprise Target on top of it to export it as a block device?

--
Kind Regards
Rudi Ahlers
SoftDux
Hi,

If you want to push performance, the best option is CLVM over iSCSI.

If you need full redundancy you have to double everything:
- switches
- network ports
- PSUs

and

- storages.

If you want a completely redundant storage solution you can use DRBD in active-active mode.

Just some notes:
- VMs as files over NFS are slow (only some vendors have a relatively fast NFS appliance).
- VMs as files over a cluster FS are slow.

Every time you add a layer (in particular a clustered FS layer) your performance drops ... so keep it simple.

Best regards,
Christian

P.S. Another interesting approach would be NFS over RDMA (InfiniBand) ... most of the advantages of NFS with fewer of the disadvantages of NFS over TCP/IP.
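A stripped-down sketch of a dual-primary DRBD resource of the kind mentioned above (hostnames, devices and addresses are placeholders; a real setup also needs fencing and, on top, CLVM or a cluster filesystem):

    resource r0 {
      protocol C;
      net {
        allow-two-primaries;       # active-active: both nodes may be primary
      }
      on xen1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on xen2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }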
On Wed, Jan 26, 2011 at 12:38 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:
> If you want to push performance, the best option is CLVM over iSCSI.
>
> If you need full redundancy you have to double everything:
> - switches
> - network ports
> - PSUs
>
> and
>
> - storages.

That's what I'm trying to get to.

> Just some notes:
> - VMs as files over NFS are slow (only some vendors have a relatively fast
> NFS appliance).
> - VMs as files over a cluster FS are slow.

So, what other options do we have that are not slow?

> P.S. Another interesting approach would be NFS over RDMA (InfiniBand) ...
> most of the advantages of NFS with fewer of the disadvantages of NFS over TCP/IP.

But this soon becomes very expensive, often beyond the point of making a decent profit out of the setup without charging the clients so much that they'd rather go elsewhere. And in many cases it means new NAS and SAN equipment as well, since not all of it supports InfiniBand.

--
Kind Regards
Rudi Ahlers
SoftDux
Hello,

No, you do not have block-level access, so you cannot use LVM on it.

r.

Rudi Ahlers wrote:
> Can you use LVM on it?
>
> And is it not possible to install the Linux iSCSI Enterprise Target on top
> of it to export it as a block device?
Rudi Ahlers wrote:
> I'm thinking of setting up 2 independent SANs, or for that matter even NAS
> clusters, and then doing something like RAID1 (mirror) on the client nodes
> with the iSCSI mounts. But I don't know if it's feasible or worth the effort.
> Has anyone done something like this?

There are plenty of recipes for DRBD + pacemaker/heartbeat + iSCSI. With appropriate redundancy in place and plenty of testing you should be able to build something that's pretty much bulletproof.

James
On 26/01/11 09:36, riki wrote:
> Presently I am playing with http://www.gluster.com/ and its volumes. It
> will not offer you block-level volume access, since you export the volumes
> with NFS, but the design allows you to create highly available,
> geographically distant storage for files.

Is glusterfs not really slow? What performance are you getting with the Xen DomUs? I take it you are using gluster in the Dom0?
On Fri, Aug 7, 2009 at 4:26 PM, Dustin Black <vantage@redshade.net> wrote:
> We are using iSCSI with CLVM and GFS2 very successfully with 10 physical
> Xen servers and 80 to 100 VMs running across them. We use file-based disk
> images, all stored on a single GFS2 file system on a single iSCSI LUN
> accessible by all 10 Xen servers.

So you have the same LUN exported to 10 physical Xen hosts? I take it you are using img files for your DomUs then? How do you find performance? What configuration is your SAN?
On 26/01/2011 11:44, Rudi Ahlers wrote:
[cut]
> So, what other options do we have that are not slow?

To increase performance you can skip iSCSI and try a SAS SAN (with an LSI SAS switch if you need more than 4 servers linked to a single SAN).

Actually, you can have a large amount of compute power inside a single server chassis, so ... the problem is only I/O, and sometimes one storage box cannot handle more than 2 or 3 servers.

[cut]
> But this soon becomes very expensive, often beyond the point of making a
> decent profit out of the setup without charging the clients so much that
> they'd rather go elsewhere. And in many cases it means new NAS and SAN
> equipment as well, since not all of it supports InfiniBand.

InfiniBand is not that expensive ... if you move ~everything to InfiniBand it is very cost effective, but you have to invest a large amount of time.

Christian
Hi all,

I'm following this thread with great interest. Recently I performed some tests on DomUs backed by LVM partitions (also snapshotted ones). I'm currently trying to figure out why I get some strange performance results; after that, I'm planning to move my tests onto network storage such as iSCSI, RAIDed iSCSI and so on.

Maybe we can start an effort to collect data on VM performance over various storage configurations? It should be enough to share the performance tests we run on our infrastructures and then organize them in a common place (this ML, a website, the Xen wiki, etc.).

Considering the wide use of Xen in enterprises, it could help a lot of us in designing the right infrastructure!

Cheers
RB

--
Roberto Bifulco, Ph.D. Student
robertobifulco.it
COMICS Lab - www.comics.unina.it
On Wed, Jan 26, 2011 at 4:42 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:
> To increase performance you can skip iSCSI and try a SAS SAN (with an LSI
> SAS switch if you need more than 4 servers linked to a single SAN).

Do you get different types of SAN? SAN = iSCSI, last time I checked.

> InfiniBand is not that expensive ... if you move ~everything to InfiniBand
> it is very cost effective, but you have to invest a large amount of time.

It depends where you are; in our country it's exorbitantly expensive.

--
Kind Regards
Rudi Ahlers
SoftDux
On 26/01/2011 15:50, Rudi Ahlers wrote:
> Do you get different types of SAN? SAN = iSCSI, last time I checked.

SAS SANs are SANs with miniSAS 4x links to the hosts, and a SAN is not necessarily iSCSI. It could be:

FC SAN
SAS SAN
iSCSI SAN
AoE SAN
IB SAN

Best regards,
Christian
On 26/01/2011 15:49, Roberto Bifulco wrote:
[cut]
> Considering the wide use of Xen in enterprises, it could help a lot of us
> in designing the right infrastructure!

fio is the way: http://git.kernel.dk/?p=fio.git;a=summary

You can easily test any storage / RAID level / setup and compare them all to choose the right solution.

But please check every block size, all the disk areas, and so on. If you test only some areas you can get overestimated or underestimated results because of caches, better mechanical performance on some areas of the disk, and so on. If you test everything you get an average result that is close to reality.

Sequential transfers typically look pretty ... but they are not what you are looking for. Random IOPS (read / write / combined) are what you are looking for. I agree that real loads are a mix, but random IOPS are closer to reality.

What you have to look for are the worst-case values: they are what you need to find out how big your I/O cake is.

Christian
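A small fio job along the lines Christian describes, measuring a mixed 70/30 random workload directly against a block device (the writes are destructive, so only point it at a disposable LUN; the values are just a starting point):

    ; save as randrw.fio and run with: fio randrw.fio
    [global]
    ioengine=libaio
    direct=1
    bs=4k
    rw=randrw
    rwmixread=70
    iodepth=32
    runtime=300
    time_based

    [whole-disk]
    filename=/dev/sdb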
Do you have successful experience deploying CLVM + iSCSI? Sometimes images are more convenient: migration, easy copying. Thanks.

At 2011-01-26 18:38:52, "Christian Zoffoli" <czoffoli@xmerlin.org> wrote:
> If you want to push performance, the best option is CLVM over iSCSI.
Yes, there is no single good solution:

1. SAN + GFS2 (or OCFS2)
2. SAN + CLVM
3. SAN + CLVM + GFS2 (or OCFS2)
4. SAN + a normal filesystem, ext3, ...

Which has the better performance?

At 2011-01-26 22:49:49, "Roberto Bifulco" <roberto.bifulco2@unina.it> wrote:
> Maybe we can start an effort to collect data on VM performance over
> various storage configurations?
Could you tell us how you deployed CLVM and GFS2? More details would be appreciated.

At 2011-01-26 19:38:05, "Jonathan Tripathy" <jonnyt@abpni.co.uk> wrote:
> So you have the same LUN exported to 10 physical Xen hosts? I take it
> you are using img files for your DomUs then? How do you find
> performance? What configuration is your SAN?
On Wed, Jan 26, 2011 at 12:55 AM, Rudi Ahlers <Rudi@softdux.com> wrote:
> I'm thinking of setting up 2 independent SANs, or for that matter even NAS
> clusters, and then doing something like RAID1 (mirror) on the client nodes
> with the iSCSI mounts. But I don't know if it's feasible or worth the effort.
> Has anyone done something like this?

Our plan is to use FreeBSD + HAST + ZFS + CARP to create a redundant/fail-over storage setup, using NFS. VM hosts will boot off the network and mount / via NFS, start up libvirtd, pick up their VM configs, and start the VMs. The guest OSes will also boot off the network using NFS, with separate ZFS filesystems for each guest.

If the master storage node fails for any reason (network, power, storage, etc.), CARP/HAST will fail over to the slave node, and everything carries on as before. NFS clients will notice the link is down, try again, try again, try again, notice the slave node is up (same IP/hostname), and carry on.

The beauty of using NFS is that backups can be done from the storage box without touching the VMs (snapshot, backup from snapshot). And provisioning a new server is as simple as cloning a ZFS filesystem (takes a few seconds). There's also no need to worry about sizing the storage (NFS can grow/shrink without the client caring), and even less to worry about thanks to the pooled storage setup of ZFS (if there are blocks available in the pool, any filesystem can use them). Add in dedupe and compression across the entire pool ... and storage needs go way down. It's also a lot easier to configure live migration with NFS than with iSCSI.

To increase performance, just add a couple of fast SSDs (one for write logging, one for read caching) and let ZFS handle it.

Internally, the storage boxes have multiple CPUs, multiple cores, multiple PSUs, multiple NICs bonded together, multiple drive controllers, etc. And then there are two of them (one physically across town, connected via fibre). VM hosts are basically throw-away appliances with gobs of CPU, RAM, and NICs, and no local storage to worry about. One fails, just swap it with another and add it to the VM pool.

Can't get much more redundant than that. If there's anything that we've missed, let me know. :)

--
Freddie Cash
fjwcash@gmail.com
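The provisioning and backup steps Freddie describes boil down to something like this on the storage node (pool, filesystem and host names are invented):

    # per-guest filesystem, cloned from a template snapshot in seconds
    zfs snapshot tank/templates/debian6@gold
    zfs clone tank/templates/debian6@gold tank/vm/guest1

    # nightly backup: snapshot the live filesystem, then send it off-box
    # (the incremental send assumes yesterday's snapshot already exists)
    zfs snapshot tank/vm/guest1@2011-01-26
    zfs send -i tank/vm/guest1@2011-01-25 tank/vm/guest1@2011-01-26 | ssh backuphost zfs receive backup/vm/guest1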
Christian,

For sure, the first things to define are the tools and methods used to perform the tests... and yes, we have to test block sizes, disk areas, caching effects, just to cite some of the variables involved, but we also need to test VM co-location effects and the overall storage system overhead (we are not aiming at testing disk performance; our purpose is to test software/hardware storage systems in virtualized environments).

Anyway, when I talk about sharing test results, I'm thinking about tests that stress one hardware configuration using different approaches, e.g. LVM over iSCSI PVs compared to VM images over NFS. That's because I think "absolute" tests of a single configuration are not so useful: from comparisons on the same hardware we can be more confident that the results we get are still valid on a similar (clearly not exactly the same!) configuration.

Roberto B.

2011/1/26 Christian Zoffoli <czoffoli@xmerlin.org>
> If you test only some areas you can get overestimated or underestimated
> results because of caches, better mechanical performance on some areas of
> the disk, and so on.

--
Roberto Bifulco, Ph.D. Student
robertobifulco.it
COMICS Lab - www.comics.unina.it
On 26/01/2011 18:58, Roberto Bifulco wrote:
[cut]
> from comparisons on the same hardware we can be more confident that the
> results we get are still valid on a similar (clearly not exactly the same!)
> configuration.

Typically, tests are barely comparable. If you change disks (type, brand, size, number, RAID level) or some settings or hardware, you can obtain very different results.

IMHO the right way is to find how many IOPS you need to serve your load, and then you can choose disk type, RAID type, rpm, etc.

Typically, the SAN type (iSCSI, FC, etc.) doesn't affect IOPS ... so if you need 4000 IOPS of a mixed 70/30 R/W workload you can simply calculate the iron you need to achieve it.

Nevertheless, the connection type affects the bandwidth between servers and storage(s), the latency, and how many VMs you can put on a single piece of hardware. In other words, if you have good iron on the disk/controller side you can host, for example, 100 VMs, but if the bottleneck is your connection you probably have to reduce the overbooking level.

iSCSI typically has quite a big overhead due to the protocol; FC, SAS, native InfiniBand and AoE have very low overhead.

Best regards,
Christian
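As a back-of-the-envelope example of that sizing exercise (rule-of-thumb figures, not vendor numbers): 4000 IOPS at 70/30 R/W on RAID 10, where each write costs two back-end I/Os, needs roughly 0.7*4000 + 0.3*4000*2 = 5200 back-end IOPS; at ~175 IOPS per 15k rpm drive that is about 30 spindles before any cache helps. The same load on RAID 5 (write penalty of 4) works out to 0.7*4000 + 0.3*4000*4 = 7600 back-end IOPS, or roughly 44 spindles.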
On 26/01/2011 17:07, yue wrote:
> 1. SAN + GFS2 (or OCFS2)
> 2. SAN + CLVM
> 3. SAN + CLVM + GFS2 (or OCFS2)
> 4. SAN + a normal filesystem, ext3, ...
> Which has the better performance?

Option 4, if your SAN exports as many LUNs as you have VM disks. Otherwise option 2 is better IMHO ... more flexible, and the overhead is not so high.

Christian
2011/1/26 Christian Zoffoli <czoffoli@xmerlin.org>
> Typically, tests are barely comparable. If you change disks (type, brand,
> size, number, RAID level) or some settings or hardware, you can obtain very
> different results.

That's why I said I'm interested in comparisons on the same hardware. Then the results can be generalized if you keep some variables (such as the "architecture") unchanged. To be clearer: if I find that NFS is slower than LVM over iSCSI, this is likely to be true on fast disks as well as on slow ones, assuming the network isn't a bottleneck.

> IMHO the right way is to find how many IOPS you need to serve your load,
> and then you can choose disk type, RAID type, rpm, etc.

I'm actually not interested in the raw numbers. I was just saying: each of us performs some tests to define the storage architecture that best fits his needs; let's just share the results, so that others can decide in terms of "this one is better for performance, but worse for flexibility" and so on...

> iSCSI typically has quite a big overhead due to the protocol; FC, SAS,
> native InfiniBand and AoE have very low overhead.

Things like bandwidth consumption, latency, CPU cost and so on should be included in the evaluation of a storage architecture for virtualized systems. Again, I'm talking about a high-level view of the performance of the system as a whole, not solely of the disks, RAID controller, etc.

Do you think such an approach is useless? I'm not an expert in storage devices, but I'm quite interested in the flexibility you can get by abstracting and combining them. That's why I'm asking about "architecture" performance.

Regards,
Roberto

--
Roberto Bifulco, Ph.D. Student
robertobifulco.it
COMICS Lab - www.comics.unina.it
On Wed, Jan 26, 2011 at 12:11 PM, Freddie Cash <fjwcash@gmail.com> wrote:> Our plan is to use FreeBSD + HAST + ZFS + CARP to create a > redundant/fail-over storage setup, using NFS. VM hosts will boot off > the network and mount / via NFS, start up libvirtd, pick up their VM > configs, and start the VMs. The guest OSes will also boot off the > network using NFS, with separate ZFS filesystems for each guest.that''s a thing I''ve been thinking also: it used to be common to deploy no-storage servers that NFS-mounted their systems. a good NFS server (or server farm) works really good for that. why would VMs need block storage? Of course, the counterpoint are database servers, I wouldn''t store database tables on NFS. But a smaller SAN just for DB tablespaces might still be more manageable than filesystem images. -- Javier _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 26/01/2011 18:11, Freddie Cash wrote:
[cut]
> If there's anything that we've missed, let me know. :)

The proposed setup is very interesting, but:

a) ZFS on FreeBSD is not as stable as on Solaris
b) OpenSolaris is dead (Oracle killed it)
c) we have no guarantee that in the future Oracle will release updated code
d) NFS is slow ...NFS over RDMA is fast, but FreeBSD has no open/official InfiniBand stack
e) consistent snapshots are very different from backing up only files. For example, if you back up a DB server, copying files is not enough: you also have to dump what you have in memory at the same time (the key phrase is "at the same time")

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 26/01/2011 17:09, yue wrote:
> CLVM and GFS2, could you tell how you deploy them?
> more details are expected.

Install pacemaker, corosync, clvm compiled for the new cluster stack, and gfs2 ...but IMHO gfs2 is not stable enough, and gfs is stable but too old.

OCFS2 is more interesting for Xen VM hosting ...check the "OCFS2 reflink" keywords.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
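For anyone wanting a starting point, a minimal sketch of bringing up clustered LVM on such a stack could look like the following. The VG name, LUN path and init-script names are assumptions (packaging differs between distros), and corosync/pacemaker must already be configured and running on every node:

    # /etc/lvm/lvm.conf on every node: switch LVM to cluster locking via clvmd
    #     locking_type = 3
    /etc/init.d/clvmd start              # or manage clvmd as a pacemaker resource

    # create a clustered VG on the shared LUN (-cy marks it as clustered)
    vgcreate -cy vg_san /dev/mapper/shared-lun0

    # LVs created on one node become visible on all nodes
    lvcreate -L 20G -n vm01-disk vg_san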
On Wed, Jan 26, 2011 at 12:05 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:> Il 26/01/2011 18:11, Freddie Cash ha scritto: > [cut] >> If there''s anything that we''ve missed, let me know. :) > > the exposed setup is very intesting but: > > a) ZFS on freebsd is not as stable as on solarisBut it''s plenty stable enough for our uses. Been using it for over 2 years now, since ZFSv6 hit FreeBSD 7.0. Once we got over the initial tuning glitches and upgraded to ZFSv14, things have been rock solid, even when swapping drives.> b) opensolaris is dead (oracle killed it) > c) we have no guarantee that in the future oracle will release updated codeFor ZFS? No, there are no guarantees. But the Illumos, Nexenta, and FreeBSD devs won''t be sitting still just waiting for Oracle to release something (look at the removal of the python dependency ZFS delegations in Illumos, for example). This may lead to a split in the future (Oracle ZFS vs OpenZFS). But that''s the future. ZFSv28 is available for FreeBSD right now, which supports all the features we''re looking for in ZFS.> d) NFS is slow ...NFS over RDMA is fast but freebsd has no open/official > infiniband stackNFS doesn''t have to be slow.> e) consistent snapshots are very different from backuping only files. > For example if you backup a DB server copying files is not enought you > have to dump also what you have in memory at the same time (the key word > is "at the same time")Yes, true. But having a cronjob in the guest (or having the backups server execute the command remotely) that does a dump of the database before the backup snapshot is created is pretty darn close to atomic, and hasn''t failed us yet in our restores. It''s not perfect, but so far, so good. Compared to the hassle of getting iSCSI live-migration working, and all the hassles of getting a cluster-aware LVM or FS setup, I''ll take a little drop in raw disk I/O. :) Ease of management trumps raw performance for us (we''re only 5 people managing servers for an entire school district of ~2100 staff and 50 schools). -- Freddie Cash fjwcash@gmail.com _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
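One way to implement the "dump before snapshot" idea above is a small job on the storage/backup box along these lines. The hostname, dataset name and the use of MySQL are placeholders for illustration only:

    #!/bin/sh
    # quiesce the database inside the guest, then snapshot its ZFS filesystem
    ssh db-guest "mysqldump --single-transaction --all-databases \
        > /var/backups/nightly.sql"
    zfs snapshot tank/vm/db-guest@nightly-$(date +%Y%m%d)

The snapshot then captures a filesystem state that already contains a self-consistent dump, which is what makes the restore "pretty darn close to atomic".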
On 26/01/2011 21:15, Freddie Cash wrote:
[cut]
> For ZFS? No, there are no guarantees. But the Illumos, Nexenta, and
> FreeBSD devs won't be sitting still just waiting for Oracle to release
> something (look at the removal of the python dependency ZFS
> delegations in Illumos, for example). This may lead to a split in the
> future (Oracle ZFS vs OpenZFS). But that's the future. ZFSv28 is
> available for FreeBSD right now, which supports all the features we're
> looking for in ZFS.

I'm hoping the same ...because ZFS is a great FS, a bit memory hungry but great.

>> d) NFS is slow ...NFS over RDMA is fast but freebsd has no open/official
>> infiniband stack
>
> NFS doesn't have to be slow.

In other words, if your car can go at 300 km/h, with NFS you have a maximum of 180/200...

[cut]
> Yes, true. But having a cronjob in the guest (or having the backups
> server execute the command remotely) that does a dump of the database
> before the backup snapshot is created is pretty darn close to atomic,
> and hasn't failed us yet in our restores. It's not perfect, but so
> far, so good.

Of course.

> Compared to the hassle of getting iSCSI live-migration working, and
> all the hassles of getting a cluster-aware LVM or FS setup, I'll take
> a little drop in raw disk I/O. :) Ease of management trumps raw
> performance for us (we're only 5 people managing servers for an entire
> school district of ~2100 staff and 50 schools).

It's not so difficult to set up ...and I prefer not to waste money on additional storage; I also prefer to consolidate more VMs on a single storage box.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 26/01/2011 20:04, Roberto Bifulco wrote:
[cut]
> That's why I said I'm interested in comparison over the same hardware.

So ...as before, fio is *the* way ...you need something reliable to generate numbers that are useful for comparison.

[cut]
> I'm actually not interested in numbers. I was just saying: each of us
> perform some tests to define the storage architecture that best fits his
> needs, just share results, so that other ones can decide in terms of
> "this one is better for performance, but worse for flexibility" and so on...

You can only have a scientific approach by using numbers ...so you have to set everything up and you have to test it.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
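As an illustration, a fio job that maps onto the 70/30 mixed workload mentioned earlier might look like this. It is only a sketch: the device path is a placeholder, and writing to it is destructive, so point it at a scratch LUN or a test file:

    ; randrw-70-30.fio -- 4k random I/O, 70% reads / 30% writes
    [global]
    ioengine=libaio
    direct=1
    runtime=120
    time_based
    size=4g

    [vm-mix]
    rw=randrw
    rwmixread=70
    bs=4k
    iodepth=32
    filename=/dev/mapper/scratch-lun    ; placeholder -- data will be overwritten

Run it with "fio randrw-70-30.fio" on each candidate setup and compare the reported IOPS and latency figures rather than relying on datasheet numbers.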
On Wed, Jan 26, 2011 at 3:05 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:> d) NFS is slow ...NFS over RDMA is fast but freebsd has no open/official > infiniband stackZFS is slow for image files (unless you get a big honking NFS appliance); but in this case it''s serving files.> e) consistent snapshots are very different from backuping only files.in most cases, inconsistencies arise from long-open files and non-flushed caches/buffers on the guest. when serving images, restoring from such a backup is equivalent to a hard crash of the filesystem; at least would require a journal replay, possibly with lost data. But, this is not the case here, because NFS isn''t serving images. the other issue i know is about databases. just restoring the stored data won''t get you a consistent system. Definitely, backup databases ''from inside'', using database tools. -- Javier _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> iSCSI typically has quite a big overhead due to the protocol; FC, SAS,
> native infiniband, AoE have very low overhead.

For iSCSI vs AoE, that isn't as true as you might think. TCP offload can take care of a lot of the overhead. Any server-class network adapter these days should allow you to send 60kb packets to the network adapter and it will take care of the segmentation, while AoE would be limited to MTU-sized packets. With AoE you need to checksum every packet yourself, while with iSCSI it is taken care of by the network adapter.

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
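For reference, whether a given NIC is actually doing that work can be checked on the initiator with ethtool (the interface name is an example; feature names vary slightly between drivers and ethtool versions):

    # show current offload settings
    ethtool -k eth0 | egrep -i 'segmentation|checksum'

    # enable TSO and TX/RX checksum offload if the driver supports them
    ethtool -K eth0 tso on
    ethtool -K eth0 tx on
    ethtool -K eth0 rx on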
On 26/01/2011 22:24, James Harper wrote:
>> iSCSI typically has quite a big overhead due to the protocol; FC, SAS,
>> native infiniband, AoE have very low overhead.
>
> For iSCSI vs AoE, that isn't as true as you might think. TCP offload can
> take care of a lot of the overhead. Any server class network adapter
> these days should allow you to send 60kb packets to the network adapter
> and it will take care of the segmentation, while AoE would be limited to
> MTU sized packets. With AoE you need to checksum every packet yourself
> while with iSCSI it is taken care of by the network adapter.

The overhead is about 10% on a gigabit link, and when you speak about resource overhead you also have to mention the CPU overhead on the storage side.

If you check the datasheets of brands like EMC you can see that the same storage platform is sold in an iSCSI and an FC version ...on the first you can attach less than half the servers you can attach with the second.

Every new entry-level storage is based on standard hardware without any hardware acceleration ...for example EMC AX storages are simply Xeon servers.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Wed, Jan 26, 2011 at 5:10 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:> Every new entry level storage is based on std hardware without any hw > acceleration ...for example EMC AX storages are simply xeon servers.TCP offloading is quite standard on most GbE chips. even iSCSI offloading is pretty common. I''d be surprised if EMC wouldn''t take advantage of that. -- Javier _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
2011/1/26 Freddie Cash <fjwcash@gmail.com>:> On Wed, Jan 26, 2011 at 12:55 AM, Rudi Ahlers <Rudi@softdux.com> wrote: >> Well, that''s the problem. We have (had, soon to be returned) a so >> called "enterprise SAN" with dual everything, but it failed miserably >> during December and we ended up migrating everyone to a few older NAS >> devices just to get the client''s websites up again (VPS hosting). So, >> just cause a SAN has dual PSU''s, dual controllers, dual NIC''s, dual >> HEAD''s, etc doesn''t mean it''s non-redundant. >> >> I''m thinking of setting up 2 independent SAN''s, of for that matter >> even NAS clusters, and then doing something like RAID1 (mirror) on the >> client nodes with the iSCSI mounts. But, I don''t know if it''s feasible >> or worth the effort. Has anyone done something like this ? > > Our plan is to use FreeBSD + HAST + ZFS + CARP to create a > redundant/fail-over storage setup, using NFS. VM hosts will boot off > the network and mount / via NFS, start up libvirtd, pick up their VM > configs, and start the VMs. The guest OSes will also boot off the > network using NFS, with separate ZFS filesystems for each guest. > > If the master storage node fails for any reason (network, power, > storage, etc), CARP/HAST will fail-over to the slave node, and > everything carries on as before. NFS clients will notice the link is > down, try again, try again, try again, notice the slave node is up > (same IP/hostname), and carry on. > > The beauty of using NFS is that backups can be done from the storage > box without touching the VMs (snapshot, backup from snapshot). And > provisioning a new server is as simple as cloning a ZFS filesystem > (takes a few seconds). There''s also no need to worry about sizing the > storage (NFS can grow/shrink without the client caring); and even less > to worry about due to the pooled storage setup of ZFS (if there''s > blocks available in the pool, any filesystem can use it). Add in > dedupe and compression across the entire pool ... and storage needs go > way down. > > It''s also a lot easier to configure live-migration using NFS than iSCSI. > > To increase performance, just add a couple of fast SSDs (one for write > logging, one for read caching) and let ZFS handle it. > > Internally, the storage boxes have multiple CPUs, multiple cores, > multiple PSUs, multiple NICs bonded together, multiple drive > controllers etc. And then there''s two of them (one physically across > town connected via fibre). > > VM hosts are basically throw-away appliances with gobs of CPU, RAM, > and NICs, and no local storage to worry about. One fails, just swap > it with another and add it to the VM pool. > > Can''t get much more redundant than that. > > If there''s anything that we''ve missed, let me know. :)Yes. NFS can handle only 16 first groups. If user belong to more than 16 users - you are close to have troubles. Regards, Marcin Kuk _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
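To illustrate the "provisioning by cloning" step described above, the ZFS side boils down to a couple of commands (dataset names are invented, and the exact sharenfs options differ between FreeBSD and Solaris):

    # keep a golden image, snapshot it once, then clone per guest
    zfs snapshot tank/vm/golden-debian@template
    zfs clone tank/vm/golden-debian@template tank/vm/web07
    zfs set sharenfs=on tank/vm/web07      # export the new guest root over NFS

Clones share blocks with the snapshot, so a new guest costs a few seconds and almost no space until it diverges from the template.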
On Wed, Jan 26, 2011 at 2:59 PM, Marcin Kuk <marcin.kuk@gmail.com> wrote:> 2011/1/26 Freddie Cash <fjwcash@gmail.com>: >> If there''s anything that we''ve missed, let me know. :) > > Yes. NFS can handle only 16 first groups. If user belong to more than > 16 users - you are close to have troubles.We run our school computers by pxebooting off a single NFS server in each school. 50 schools times 9 years in use gives 0 issues with having users in more than 16 groups. Few are in more than 2 groups. :) Running mail, web, moodle, etc servers won''t have those issues. But, yes, that can be an issue in some situations. -- Freddie Cash fjwcash@gmail.com _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 26/01/2011 23:33, Javier Guerra Giraldez wrote:
> On Wed, Jan 26, 2011 at 5:10 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:
>> Every new entry level storage is based on std hardware without any hw
>> acceleration ...for example EMC AX storages are simply xeon servers.
>
> TCP offloading is quite standard on most GbE chips. even iSCSI
> offloading is pretty common. I'd be surprised if EMC wouldn't take
> advantage of that.

Also, hardware RAID6 is very common today, but in entry-level storages you can find only software RAID6, and the performance is terrible. In other words, don't trust the entry-level storages of many vendors too much.

To test whether your storage is good or not, you can simply compare the IOPS predicted by the formulas with the real results ...and many times you will find that the numbers are very different.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> On 26/01/2011 22:24, James Harper wrote:
>>> iSCSI typically has quite a big overhead due to the protocol; FC, SAS,
>>> native infiniband, AoE have very low overhead.
>>
>> For iSCSI vs AoE, that isn't as true as you might think. TCP offload can
>> take care of a lot of the overhead. Any server class network adapter
>> these days should allow you to send 60kb packets to the network adapter
>> and it will take care of the segmentation, while AoE would be limited to
>> MTU sized packets. With AoE you need to checksum every packet yourself
>> while with iSCSI it is taken care of by the network adapter.
>
> the overhead is 10% on a gigabit link and when you speak about resource
> overhead you have to mention also the CPU overhead on the storage side.

I don't know the exact size of the iSCSI header, but to be 10% of a gigabit link it would have to be 900 bytes, and I'm pretty sure it's much less. If you weren't using jumbo frames then maybe 10% might be realistic, but that's hardly an enterprise scenario.

> If you check the datasheets of brands like emc you can see that the same
> storage platform is sold in iSCSI and FC version ...on the first one you
> can use less than half the servers you can use with the last one.
>
> Every new entry level storage is based on std hardware without any hw
> acceleration ...for example EMC AX storages are simply xeon servers.

Well, if EMC are selling workstation-grade cards with no TCP offload at all, then I'm not surprised that the performance is so poor.

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi! On Wed, Jan 26, 2011 at 11:59:40PM +0100, Marcin Kuk wrote: [SNIP]> Yes. NFS can handle only 16 first groups. If user belong to more than > 16 users - you are close to have troubles.Actually this is a server issue. Patches to raise this limit to NGROUPS_MAX (65536 in Linux) exist for a long time. And they work well and are still supported: http://www.frankvm.com/nfs-ngroups/ Permission checking is a server task. So, as long as you''re using Linux NFS servers only you may raise that limit to 64K groups, no matter what client you are using. -- Adi _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Thu, Jan 27, 2011 at 09:03:31AM +0100, Adi Kriegisch wrote:> Actually this is a server issue. Patches to raise this limit to NGROUPS_MAX > (65536 in Linux) exist for a long time. And they work well and are still > supported: > http://www.frankvm.com/nfs-ngroups/ > > Permission checking is a server task. So, as long as you''re using Linux NFS > servers only you may raise that limit to 64K groups, no matter what client > you are using.Sorry, this is a client issue... my bad. In Debian there even exists a package with the kernel patch: "linux-patch-nfs-ngroups"... -- Adi _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi!

>> iSCSI typically has quite a big overhead due to the protocol; FC, SAS,
>> native infiniband, AoE have very low overhead.
>
> For iSCSI vs AoE, that isn't as true as you might think. TCP offload can
> take care of a lot of the overhead. Any server class network adapter
> these days should allow you to send 60kb packets to the network adapter
> and it will take care of the segmentation, while AoE would be limited to
> MTU sized packets. With AoE you need to checksum every packet yourself
> while with iSCSI it is taken care of by the network adapter.

What AoE actually does is send a frame per block. The block size is 4K -- so there is no need for fragmentation. The overhead is pretty low, because we're talking about Ethernet frames.

Most iSCSI issues I have seen are with reordering of packets due to transmission across several interfaces. So what most people recommend is to keep the number of interfaces to two. To keep performance up this means you have to use 10G, FC or similar, which is quite expensive -- especially if you'd like to have an HA SAN network (HSRP and stuff like that is required).

AoE does not suffer from those issues: using 6 GBit interfaces is no problem at all, and load balancing will happen automatically, as the load is distributed equally across all available interfaces. HA is very simple: just use two switches and connect one half of the interfaces to one switch and the other half to the other switch. (It is recommended to use switches that can do jumbo frames and flow control.)

IMHO most of the current recommendations and practices surrounding iSCSI are there to overcome the shortcomings of the protocol. AoE is way more robust and easier to handle.

-- Adi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
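On the Linux initiator side, a rough sketch of what this looks like with the stock aoe driver and the aoetools package (the interface names are assumptions for the example):

    # restrict AoE traffic to the storage-facing NICs
    modprobe aoe aoe_iflist="eth2 eth3 eth4 eth5"

    aoe-discover     # broadcast a discovery query on those interfaces
    aoe-stat         # list the discovered e<shelf>.<slot> targets and their state

With several interfaces listed, the driver spreads requests across all of them, which is the automatic load balancing described above.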
Hi!

> On 26/01/2011 17:07, yue wrote:
>> yes, there has no a good solution.
>> 1. san+gfs2(ocfs2)
>> 2. san+clvm
>> 3. san+clvm+gfs2(ocfs2)
>> 4. san+normal filesystem, ext3.....
>> which has the better performance?
>
> 4 if your SAN exports as many luns as your VM disks
>
> 2 is better IMHO ...more flexible, not so high overhead

100% ACK. The best thing about this: there is essentially no overhead in using CLVM -- the cluster locking is only required when modifying LVs. The rest of the time, performance is (most probably) slightly better than when using LUNs directly, because LVM will take care of readahead dynamically.

-- Adi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
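If you want to inspect or pin that readahead behaviour, the relevant knobs are visible per LV (the VG and LV names below are examples):

    # show the configured readahead of each LV in the shared VG
    lvs -o lv_name,lv_read_ahead vg_san

    # pin readahead explicitly (in sectors), or hand it back to the heuristic
    lvchange -r 1024 vg_san/vm01-disk
    lvchange -r auto vg_san/vm01-disk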
No, it is not slow: it is basically an NFS-exported local directory which is mirrored or striped over the network across several servers. The performance difference is block-level access vs image-file storage, which will cost you about 30%, but as long as you are fine with an NFS root for the DomU you are OK.

r.

> Is glusterfs not really slow? What performance are you getting with the
> Xen DomUs? I take it you're using gluster in the Dom0?

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Thu, Jan 27, 2011 at 10:38 AM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:> Hi! > >> Il 26/01/2011 17:07, yue ha scritto: >> > yes, there has no a good silution. >> > 1.san+gfs2(ocfs2) >> > 2.san+clvm >> > 3san+clvm+gfs2(ocfs2) >> > 4san+normal filesystem, ext3..... >> > which has the better performance? >> >> 4 if your SAN exports as many luns as your VM disks >> >> 2 is better IMHO ...more flexible, not so high overhead > 100% ACK. The best thing about this: There is no overhead in using CLVM: > The cluster locking is only required when modifying LVs. For the rest of > the time performance is (most probably) slightly better than when using > LUNs directly because LVM will take care of readahead dynamically. > > -- Adi > > _______________________________________________How would you do this? Export LUN1 from SAN1 & LUN1 from SAN2 to the same client PC, and then setup cLVM on top of the 2 LUN''s? What do you then do if you want redundancy, between 2 client PC''s, i.e similar to RAID1 ? -- Kind Regards Rudi Ahlers SoftDux Website: http://www.SoftDux.com Technical Blog: http://Blog.SoftDux.com Office: 087 805 9573 Cell: 082 554 7532 _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 27/01/2011 11:09, Rudi Ahlers wrote:
[cut]
> How would you do this?
>
> Export LUN1 from SAN1 & LUN1 from SAN2 to the same client PC, and then
> setup cLVM on top of the 2 LUNs?
>
> What do you then do if you want redundancy between 2 client PCs, i.e.
> similar to RAID1?

DRBD is the way.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
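For reference, mirroring a volume between two storage nodes with DRBD boils down to a resource definition roughly like this (hostnames, backing devices and IP addresses are made up for the example):

    # /etc/drbd.d/vmstore.res -- minimal sketch
    resource vmstore {
      protocol C;                       # synchronous replication
      on san-a {
        device    /dev/drbd0;
        disk      /dev/vg_local/vmstore;
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on san-b {
        device    /dev/drbd0;
        disk      /dev/vg_local/vmstore;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }

/dev/drbd0 on the primary node is then exported (iSCSI, AoE, NFS, ...) or used as a clustered-LVM PV exactly as a plain LUN would be.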
Hi!

On Thu, Jan 27, 2011 at 12:09:54PM +0200, Rudi Ahlers wrote:
> On Thu, Jan 27, 2011 at 10:38 AM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
[SNIP]
>>> 2 is better IMHO ...more flexible, not so high overhead
>> 100% ACK. The best thing about this: There is no overhead in using CLVM:
>> The cluster locking is only required when modifying LVs. For the rest of
>> the time performance is (most probably) slightly better than when using
>> LUNs directly because LVM will take care of readahead dynamically.
[SNAP]
> How would you do this?
>
> Export LUN1 from SAN1 & LUN1 from SAN2 to the same client PC, and then
> setup cLVM on top of the 2 LUNs?

Yes, exactly.

> What do you then do if you want redundancy, between 2 client PCs, i.e.
> similar to RAID1?

Oh well, there are several ways to achieve this, I guess:

* Use dm mirroring on top of cLVM (I tested this once personally but did not
  need it for production then -- will probably look into it some time again).
  I think this is the way to go, although it might be a little slower than
  running a RAID in the domU.

* Give two LVs to the virtual machines and let them do the mirroring with
  software RAID (see the sketch below). I think this option offers the
  greatest performance while being robust. The only disadvantage I see is
  that in case of failure you have to recreate all the software RAIDs in
  your domUs. In some hosting environments this might be an issue.

* Use glusterfs/drbd/... Performance-wise and in terms of reliability and
  stability I do not see any issues here. But to use those you actually do
  not need a SAN as a backend. A SAN always adds a performance penalty due
  to an increase in latency; local storage always has an advantage over SAN
  in this respect. So in case you plan to use glusterfs, drbd or something
  like that, you should reconsider the SAN issue. This might save a lot of
  money as well... ;-)

-- Adi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
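A minimal sketch of the second option, run inside the guest once dom0 has handed it one LV from each SAN (the xvdb/xvdc names are whatever your domU config assigns; the config file path varies by distro):

    # mirror the two SAN-backed disks inside the domU
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvdb /dev/xvdc
    mkfs.ext3 /dev/md0
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # so the array assembles at boot

Because the mirroring happens entirely inside the guest, live migration of the domU is unaffected: each dom0 only needs to see the same two clustered LVs.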
On Thu, Jan 27, 2011 at 1:04 PM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
>> What do you then do if you want redundancy, between 2 client PCs, i.e.
>> similar to RAID1?
>
> Oh well, there are several ways to achieve this, I guess:
> * use dm mirroring on top of clvm (I tested this once personally but did
>   not need it for production then -- will probably look into it some time
>   again). I think this is just the way to go although it might be a little
>   slower than running a raid in domU.
> * Giving two LVs to the virtual machines and let them do the mirroring with
>   software raid. I think this option offers greatest performance while
>   being robust. The only disadvantage I see is that in case of failure you
>   have to recreate all the software raids in your domUs. In some hosting
>   environments this might be an issue.

Why not just give the 2 LVs to the dom0, and RAID them on the dom0 instead? Then the domUs still use the "local storage" as before and they won't know about it.

> * Use glusterfs/drbd/... Performancewise and in terms of reliability and
>   stability I do not see any issues here. But to use those you actually do
>   not need a SAN as a backend. A SAN always adds a performance penalty due
>   to an increase of latency. Local storage always has an advantage over SAN
>   in this respect. So in case you plan to use glusterfs, drbd or something
>   like that, you should reconsider the SAN issue. This might save a lot of
>   money as well... ;-)

I would prefer not to use DRBD. Every layer you add adds more complication at the end of the day.

And we already have this expensive EMC SAN, so I would like to utilize it somehow, but with better redundancy.

> -- Adi

--
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Rudi Ahlers <Rudi@SoftDux.com> a écrit :> On Thu, Jan 27, 2011 at 1:04 PM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote: > >> >>> What do you then do if you want redundancy, between 2 client PC''s, i.e >>> similar to RAID1 ? >> Oh well, there are several ways to achieve this, I guess: >> * use dm mirroring on top of clvm (I tested this once personally but did >> not need it for production then -- will probably look into it some time >> again). >> I think this is just the way to go although it might be a little slower >> than running a raid in domU. >> * Giving two LVs to the virtual machines and let them do the mirroring with >> software raid. >> I think this option offers greatest performance while being robust. The >> only disadvantage I see is that in case of failure you have to recreate >> all the software raids in your domUs. In some hosting environments this >> might be an issue. > > Why not just give the 2 LV''s to the dom0, and raid it on the dom0 > instead? The the domU''s still use the "local storage" as before and > they won''t know about it. >With that setup, are you able to do live migration ?> >> * Use glusterfs/drbd/... Performancewise and in terms of reliability and >> stability I do not see any issues here. But to use those you actually do >> not need a SAN as a backend. A SAN always adds a performance penalty due >> to an increase of latency. Local storage always has an advantage over SAN >> in this respect. So in case you plan to use glusterfs, drbd or something >> like that, you should reconsider the SAN issue. This might save alot of >> money as well... ;-) > > I would prefer not to use DRBD. Every layer you add, adds more > complication at the end of the day. > > And we already have this expensive EMC SAN, so I would like to utilize > it somehow, but with better redundancy. >-- Pierre _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 27/01/2011 13:48, Pierre wrote:
[cut]
>> Why not just give the 2 LVs to the dom0, and RAID them on the dom0
>> instead? Then the domUs still use the "local storage" as before and
>> they won't know about it.
>
> With that setup, are you able to do live migration?

No, he cannot.

He can do live migration if he makes the RAID1 inside the domUs.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 27/01/2011 14:00, Christian Zoffoli wrote:
[cut]
> he can do live migration if he makes the RAID1 inside the domUs

Only cmirror is cluster-aware, but I don't think it's a good idea to put it in production.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Thu, Jan 27, 2011 at 3:00 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:> Il 27/01/2011 13:48, Pierre ha scritto: > [cut] >>> Why not just give the 2 LV''s to the dom0, and raid it on the dom0 >>> instead? The the domU''s still use the "local storage" as before and >>> they won''t know about it. >>> >> >> With that setup, are you able to do live migration ? > > no he cannot > > he can do live migration if he make the raid1 inside domus > > > Christian > > _______________________________________________Why not? Live migration only relies on shared storage on the dom0, not on the domU -- Kind Regards Rudi Ahlers SoftDux Website: http://www.SoftDux.com Technical Blog: http://Blog.SoftDux.com Office: 087 805 9573 Cell: 082 554 7532 _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Rudi Ahlers <Rudi@SoftDux.com> a écrit :> On Thu, Jan 27, 2011 at 3:00 PM, Christian Zoffoli > <czoffoli@xmerlin.org> wrote: >> Il 27/01/2011 13:48, Pierre ha scritto: >> [cut] >>>> Why not just give the 2 LV''s to the dom0, and raid it on the dom0 >>>> instead? The the domU''s still use the "local storage" as before and >>>> they won''t know about it. >>>> >>> >>> With that setup, are you able to do live migration ? >> >> no he cannot >> >> he can do live migration if he make the raid1 inside domus >>That''s what I thought...>> >> Christian >> >> _______________________________________________ > > > Why not? > > Live migration only relies on shared storage on the dom0, not on the domU >Then, how do you achieve shared raid1 ? How may the first dom0 be aware of the mirroring the other dom0(s) is(are) doing ? I must be missing something here. -- Pierre _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Adi Kriegisch wrote:>What AoE actually does is sending a frame per block. Block size is 4K -- so >no need for fragmentation. The overhead is pretty low, because we''re >talking about Ethernet frames. >Most iSCSI issues I have seen are with reordering of packages due to >transmission across several interfaces. So what most people recommend is to >keep the number of interfaces to two. To keep performance up this means you >have to use 10G, FC or similar which is quite expensive -- especially if >you''d like to have a HA SAN network (HSRP and stuff like that is required). > >AoE does not suffer from those issues: Using 6 GBit interfaces is no >problem at all, load balancing will happen automatically, as the load is >distributed equally across all available interfaces. HA is very simple: >just use two switches and connect one half of the interfaces to one switch >and the other half to the other switch. (It is recommended to use switches >that can do jumbo frames and flow control)Getting somewhat off-topic, but I''m interested to know how AoE handles network errors ? I assume there is some handshake to make sure packets were delivered, rather than just "fire and forget" ! -- Simon Hobson Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books. _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi!> >> With that setup, are you able to do live migration ? > > > > no he cannot > > > > he can do live migration if he make the raid1 inside domus[SNIP]> > Why not? > > Live migration only relies on shared storage on the dom0, not on the domURight... The part of "storage" you need to share is whatever you export to your domU. So if you use something like: (...) ''phy:/dev/my-cool-volume-group/server-disk,xvda1,w'' (...) you need to take care that the very same disk exists on every single dom0 participating in this cluster. In case you want to use software RAID your DomU config looks like this: (...) ''phy:/dev/md10,xvda1,w'' (...) So /dev/md10 needs to be available on all dom0 -- which isn''t, of course... As a side note: I never tried this and I would NOT recommend to ever use that, but it could be possible to do such crazy stuff: If you can ensure that only one domU has access to a device and you somehow manage to make RAID device names unique within your cluster and you can ensure that no dom0s have any of those RAID monitoring jobs running then it could work (at least for some time) -- but it is pointless, as you have no control whatsoever over your raid devices, there is no monitoring at all and in case you need to rebuild, you need to carefully check on which host you''ll do it... Ah, and upgrades of <your-favorite-distro> will be pretty hard to do as you need to stop [pre|post]-upgrade scripts from snapping in and touching your RAID devices. -- Adi _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
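Putting that together, a domU description that live-migrates cleanly over a shared clustered VG looks roughly like this (a sketch only; the names, sizes and bridge are examples, not a recommendation):

    # /etc/xen/server01.cfg -- minimal sketch
    name       = "server01"
    memory     = 1024
    vcpus      = 2
    bootloader = "/usr/bin/pygrub"
    disk       = [ 'phy:/dev/vg_san/server01-disk,xvda1,w',
                   'phy:/dev/vg_san/server01-swap,xvda2,w' ]
    vif        = [ 'bridge=xenbr0' ]

Because /dev/vg_san/... resolves to the same clustered LV on every dom0, "xm migrate --live server01 other-host" only has to move memory, not storage.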
Javier Guerra Giraldez, 2011-Jan-27 16:09 UTC: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)
On Thu, Jan 27, 2011 at 11:02 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> Getting somewhat off-topic, but I'm interested to know how AoE handles
> network errors? I assume there is some handshake to make sure packets were
> delivered, rather than just "fire and forget"!

AFAIK, it's handled by the disk drivers. Surprisingly, I see fewer errors on my AoE disks than on some SATA drives.

-- Javier

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi!

>> to increase performances you can skip iscsi and you can try a SAS SAN
>> (with some LSI SAS switch if you need more than 4 servers linked to a
>> single SAN)
>
> Do you get different types of SAN? SAN = iSCSI, last time I checked.

From http://en.wikipedia.org/wiki/Storage_area_network:

Most storage networks use the SCSI protocol for communication between servers and disk drive devices. A mapping layer to other protocols is used to form a network:

* ATA over Ethernet (AoE), mapping of ATA over Ethernet
* Fibre Channel Protocol (FCP), the most prominent one, a mapping of SCSI over Fibre Channel
* Fibre Channel over Ethernet (FCoE)
* ESCON over Fibre Channel (FICON), used by mainframe computers
* HyperSCSI, mapping of SCSI over Ethernet
* iFCP[2] or SANoIP[3], mapping of FCP over IP
* iSCSI, mapping of SCSI over TCP/IP
* iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand

Storage networks may also be built using SAS and SATA technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from IDE direct-attached storage. SAS and SATA devices can be networked using SAS expanders.

So SAN = iSCSI isn't quite it.

-- Adi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi!> Il 27/01/2011 14:00, Christian Zoffoli ha scritto: > [cut] > > he can do live migration if he make the raid1 inside domus > > only cmirror is clusterwise but I don''t think it''s a good idea to put it > in productionI am very interested in that one: Do you have any experience with that in production systems? Based on what cluster stack did you test it? -- Adi _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 27/01/2011 14:21, Pierre wrote:
[cut]
> Then, how do you achieve shared raid1? How may the first dom0 be aware
> of the mirroring the other dom0(s) is(are) doing?
> I must be missing something here.

You are right: the only way to use RAID1 in such a scenario is to make sure RAID1 is enabled on only one dom0 at a time ...and of course that is not the right way to design a reliable cluster.

Best regards,
Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Javier Guerra Giraldez wrote:> > Getting somewhat off-topic, but I''m interested to know how AoE handles >> network errors ? I assume there is some handshake to make sure packets were >> delivered, rather than just "fire and forget" ! > >AFAIK, it''s handled by the disk drivers. surprisingly, i see less >errors on my AoE disks than on some SATA drivesAhh yes, that would make sense. I''d completely forgotten about error recovery in the normal disk drivers ! -- Simon Hobson Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books. _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> Getting somewhat off-topic, but I''m interested to know how AoE > handles network errors ? I assume there is some handshake to make > sure packets were delivered, rather than just "fire and forget" !Ethernet itself has checksumming and error correction, so all of this is handled in layer 2. -- John Madden Sr UNIX Systems Engineer / Office of Technology Ivy Tech Community College of Indiana jmadden@ivytech.edu _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Javier Guerra Giraldez, 2011-Jan-27 18:29 UTC: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)
On Thu, Jan 27, 2011 at 1:08 PM, John Madden <jmadden@ivytech.edu> wrote:>> Getting somewhat off-topic, but I''m interested to know how AoE >> handles network errors ? I assume there is some handshake to make >> sure packets were delivered, rather than just "fire and forget" ! > > Ethernet itself has checksumming and error correction, so all of this is > handled in layer 2.Ethernet hardware+drivers do indeed checksum and verify every frame; but in case of invalid checksum, it simply discards the frame. it doesn''t retransmit, so it''s not error correction; just error detection (could it be called ''error elimination''?) -- Javier _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
John Madden wrote:
>> Getting somewhat off-topic, but I'm interested to know how AoE
>> handles network errors? I assume there is some handshake to make
>> sure packets were delivered, rather than just "fire and forget"!
>
> Ethernet itself has checksumming and error correction, so all of this is
> handled in layer 2.

As Javier points out, it only drops frames detected to have errors. To have an error-corrected link you either have to do it yourself or use TCP. Also, in the context of paralleling multiple links across dual switches to provide redundancy, you could also lose frames in transit if a switch failed.

I must admit, AoE does seem to have its upsides - in past threads (here and elsewhere) I've only ever seen it being criticised.

--
Simon Hobson

Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
2011/1/27 Adi Kriegisch <adi@cg.tuwien.ac.at>:> Hi! > >> > to increase performances you can skip iscsi and you can try a SAS SAN >> > (with some LSI SAS switch if you need more than 4 servers linked to a >> > single SAN) >> >> Do you get different types of SAN? SAN = iSCSI, last time I checked. > from http://en.wikipedia.org/wiki/Storage_area_network: > Most storage networks use the SCSI protocol for communication between > servers and disk drive devices. A mapping layer to other protocols is used > to form a network: > > * ATA over Ethernet (AoE), mapping of ATA over Ethernet > * Fibre Channel Protocol (FCP), the most prominent one, is a mapping of > * SCSI over Fibre Channel > * Fibre Channel over Ethernet (FCoE) > * ESCON over Fibre Channel (FICON), used by mainframe computers > * HyperSCSI, mapping of SCSI over Ethernet > * iFCP[2] or SANoIP[3] mapping of FCP over IP > * iSCSI, mapping of SCSI over TCP/IP > * iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand > > Storage networks may also be built using SAS and SATA technologies. SAS > evolved from SCSI direct-attached storage. SATA evolved from IDE > direct-attached storage. SAS and SATA devices can be networked using SAS > Expanders. > > So SAN = iSCSI isn''t quite it.Never ending story... This is only terminology - you can talk about it all the time, but the question is... who cares? ;) Any of you will win - all this is named by people, and tomorow all can have different meanings. Regards, Marcin Kuk _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> -----Original Message-----
> From: xen-users-bounces@lists.xensource.com
> [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Simon Hobson
> Subject: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs)
>
> Getting somewhat off-topic, but I'm interested to know how AoE handles
> network errors? I assume there is some handshake to make sure packets were
> delivered, rather than just "fire and forget"!

The Linux aoe open-source driver from Coraid (with which I am the most familiar) implements a congestion avoidance and control algorithm, similar to TCP/IP. If a response exceeds twice the average round-trip time plus 8 times the average deviation, the request is retransmitted (based on aoe6-75 sources; earlier sources may differ).

What's interesting about aoe vs. TCP is that a round trip measures both network and disk latency, not just network latency. A normal read request will send a request packet, after which the target performs a disk read and returns a response packet with the disk sector contents. A normal write request will send a request with the sector contents, upon which the target performs a disk write and returns a status packet. Disk latency is orders of magnitude greater than network latency, and more variable. We see an RTT of 5-10ms typically under light usage.

Upon heavy disk I/O, this time can vary upwards, possibly to tenths of seconds, leading to apparent packet loss and an RTT adjustment by the driver. So it's not uncommon for a target to receive and process a duplicate request, which is okay because each request is idempotent.

Lossage of 0.1% to 0.2% is common in our environment, but this does not have a significant impact overall on aoe performance.

That said, the aoe protocol also supports an asynchronous write operation, which I suppose really is "fire and forget", unlike normal reads and writes. I haven't used an aoe driver that implements asynchronous writes however, and I'm not sure I would if I had the option, since you have no guarantee that the writes succeed.

-Jeff

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> > > -----Original Message----- > > From: xen-users-bounces@lists.xensource.com [mailto:xen-users- > > bounces@lists.xensource.com] On Behalf Of Simon Hobson > > Subject: Re: [Xen-users] AoE (Was: iscsi vs nfs for xen VMs) > > > > Getting somewhat off-topic, but I''m interested to know how AoEhandles> network > > errors ? I assume there is some handshake to make sure packets were > delivered, > > rather than just "fire and forget" ! > > The Linux aoe open-source driver from Coraid (with which I am the most > familiar) implements a congestion avoidance and control algorithm, > similar to TCP/IP. If a response exceeds twice the average round-trip > time plus 8 times the average deviation, the request is retransmitted > (based on aoe6-75 sources, earlier sources may differ). > > What''s interesting about aoe vs. TCP is that a round-trip measuresboth> network and disk latency, not just network latency. A request request > will send a request packet, after which the target performs a diskread,> and returns a response packet with the disk sector contents. A normal > write request will send a request with the sector contents, upon which > the target performs a disk write, and returns a status packet. Disk > latency is orders of magnitude greater than network, and morevariable.> We see a RTT of 5-10ms typically under light usage. > > Upon heavy disk I/O, this time can vary upwards, possibly tenths of > seconds, leading to apparent packet loss and an RTT adjustment by the > driver. So it''s not uncommon for a target to receive and process a > duplicate request, which is okay because each request is idempotent. > > Lossage of 0.1% to 0.2% is common in our environment, but this doesnot> have a significant impact overall on aoe performance. > > That said, the aoe protocol also supports an asynchronous write > operation, which I suppose really is "fire and forget", unlike normal > reads and writes. I haven''t used an aoe driver that implements > asynchronous writes however, and I''m not sure I would if I had the > option since you have no guarantee that the writes succeed. >Interesting stuff. I use DRBD locally and used to regularly see messages about concurrent outstanding requests to the same sector. DRBD logs this because it can''t guarantee the serialising of requests so two write requests to the same sector might be reordered at any layer different between the two servers. It sounds like AoE would make this even worse if the ''first'' write was lost resulting in the ''second'' write being performed first followed by the ''first'' write. Now sensibly, you''d think that a barrier would be placed between the first and second writes guaranteeing that nothing would be reordered across the barrier, but if you run Windows on Xen on AoE on DRBD (eg to a HA DRBD SAN), you might see non-sensible things happen. To be fair, in my testing the writes that Windows performed were always the same data so there were no adverse consequences but it''s still annoying. I modified GPLPV to check the pipeline and to stall if an overlapping write request would be sent (it happens very rarely so there is no measurable performance impact), but it''s a lot of mucking around just to get rid of one little benign message. James _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> -----Original Message----- > From: xen-users-bounces@lists.xensource.com [mailto:xen-users- > bounces@lists.xensource.com] On Behalf Of Simon Hobson > > I must admit, AoE does seem to have it''s upsides - in past threads(here and> elsewhere) I''ve only ever seen it being criticised.Many of those threads seem to delve into performance claims, which isn''t very helpful in order to objectively compare the protocols. I don''t frankly care that either of iSCSI or AoE is more efficient than the other by a few percent on the wire--if your storage implementation depends on such a small margin to determine success or failure, think very carefully about your tolerances. You''d better give yourself more headroom than that. Although the reality is complex, the basic truth is that networks are fast and (non-SSD) disks are slow. On sequential performance, a good disk will have more bandwidth than a single GigE link, but under any sort of random I/O the disk latency dominates all others and network performance is marginalized. And you can forget about relying on the performance of sequential I/O in any large application cluster with e.g. tens of nodes and central storage. The real benefit of AoE that seems to get lost on its detractors is its simplicity. The protocol specification is brief and the drivers are easy to install and manage. The protocol supports self-discovery (via broadcast) so that once you connect your initiator to your targets and bring your Ethernet interface up, device nodes just appear and you can immediately use them exactly as you would local devices. Multipath over AoE can be as easy as connecting two or more Ethernet interfaces rather than one--the new transports will be discovered and utilized with zero incremental configuration provided your targets and initiators support it, as the commercial ones I use do. The supposed benefits of iSCSI, which include security and routeability, and meaningless to me. Whether I use iSCSI or not I would never let my storage network touch any of our general networks. I want my storage connected to my hosts over the shortest path possible, if not with crossover cables, then with a dedicated switch. AoE is not inherently more or less secure than a SAS cable, and shouldn''t be, since you need to physically secure your storage regardless of interconnects. For me the security features of iSCSI only add to the complexity and overhead inherent in the protocol. -Jeff _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
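To make the "devices just appear" point concrete: with the stock Linux aoe driver the discovered targets show up as /dev/etherd/e<shelf>.<slot> block nodes, which can be handed to a guest like any other device (the shelf/slot numbers below are invented for the example):

    # after "modprobe aoe" and aoe-discover, target e1.1 appears as a block device
    ls -l /dev/etherd/e1.1

    # and can back a domU directly in its config file
    disk = [ 'phy:/dev/etherd/e1.1,xvda,w' ]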
If I use LUNs directly, will the VMs be difficult to manage -- live migration, backup, ....., changing a VM to a template and a template back to a VM??

Thanks, adi

At 2011-01-27 18:09:54, "Rudi Ahlers" <Rudi@SoftDux.com> wrote:
> On Thu, Jan 27, 2011 at 10:38 AM, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
>> Hi!
>>
>>> Il 26/01/2011 17:07, yue ha scritto:
>>>> yes, there has no a good silution.
>>>> 1.san+gfs2(ocfs2)
>>>> 2.san+clvm
>>>> 3.san+clvm+gfs2(ocfs2)
>>>> 4.san+normal filesystem, ext3.....
>>>> which has the better performance?
>>>
>>> 4 if your SAN exports as many luns as your VM disks
>>>
>>> 2 is better IMHO ...more flexible, not so high overhead
>> 100% ACK. The best thing about this: There is no overhead in using CLVM:
>> The cluster locking is only required when modifying LVs. For the rest of
>> the time performance is (most probably) slightly better than when using
>> LUNs directly because LVM will take care of readahead dynamically.
>>
>> -- Adi
>
> How would you do this?
>
> Export LUN1 from SAN1 & LUN1 from SAN2 to the same client PC, and then
> setup cLVM on top of the 2 LUN's?
>
> What do you then do if you want redundancy, between 2 client PC's, i.e
> similar to RAID1 ?
>
> --
> Kind Regards
> Rudi Ahlers
> SoftDux
>
> Website: http://www.SoftDux.com
> Technical Blog: http://Blog.SoftDux.com
> Office: 087 805 9573
> Cell: 082 554 7532

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
What is the performance of clvm+ocfs2 like? And its stability?

At 2011-01-27 04:13:29, "Christian Zoffoli" <czoffoli@xmerlin.org> wrote:
> On 26/01/2011 17:09, yue wrote:
>> CLVM and GFS2, could you tell how you deploy them?
>> more details are expected.
>
> Install pacemaker, corosync, clvm compiled for the new cluster stack, and
> gfs2 ...but IMHO gfs2 is not stable enough, and gfs is stable but too old.
>
> OCFS2 is more interesting for Xen VM hosting ...check the "OCFS2 reflink"
> keywords.
>
> Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Jeff Sturm wrote:
>That said, the aoe protocol also supports an asynchronous write operation, which I suppose really is "fire and forget", unlike normal reads and writes. I haven't used an aoe driver that implements asynchronous writes however, and I'm not sure I would if I had the option, since you have no guarantee that the writes succeed.

Agreed, I can't see many uses. However there are some applications (such as video capture, for instance) where it may be better to miss an occasional block than to suffer the overhead of error correction. I know someone who works for a surveying outfit where they drive around with vans (a bit like the Google camera cars) recording video of roads etc. - the video is analysed later by the client to spot things like broken street lights, potholes, or whatever it is they are looking for. In a situation like this, it's better to have a glitch in the video than potentially lose a big chunk because the system pauses to correct an error. As it is, they use SDLT tape (I think) because it's cheaper* than spinning disk and more suitable for the streaming data they are writing.

* Presumably at the time the decision was made; I suspect that may have changed now.

James Harper wrote:
>I use DRBD locally and used to regularly see messages about concurrent outstanding requests to the same sector. DRBD logs this because it can't guarantee the serialising of requests, so two write requests to the same sector might be reordered at any layer different between the two servers. It sounds like AoE would make this even worse if the 'first' write was lost, resulting in the 'second' write being performed first followed by the 'first' write.

Bear in mind that with modern disks it is normal for them to have command queuing and reordering built in. So unless you specifically turn it off, your carefully ordered writes may be re-ordered by the drive itself.

Jeff Sturm wrote:
>> I must admit, AoE does seem to have its upsides - in past threads (here and elsewhere) I've only ever seen it being criticised.
>
>Many of those threads seem to delve into performance claims, which isn't very helpful in order to objectively compare the protocols. I don't frankly care that either of iSCSI or AoE is more efficient than the other by a few percent on the wire--if your storage implementation depends on such a small margin to determine success or failure, think very carefully about your tolerances. You'd better give yourself more headroom than that.
>
>Although the reality is complex, the basic truth is that networks are fast and (non-SSD) disks are slow. On sequential performance, a good disk will have more bandwidth than a single GigE link, but under any sort of random I/O the disk latency dominates all others and network performance is marginalized. And you can forget about relying on the performance of sequential I/O in any large application cluster with e.g. tens of nodes and central storage.
>
>The real benefit of AoE that seems to get lost on its detractors is its simplicity. The protocol specification is brief and the drivers are easy to install and manage. The protocol supports self-discovery (via broadcast) so that once you connect your initiator to your targets and bring your Ethernet interface up, device nodes just appear and you can immediately use them exactly as you would local devices. Multipath over AoE can be as easy as connecting two or more Ethernet interfaces rather than one--the new transports will be discovered and utilized with zero incremental configuration, provided your targets and initiators support it, as the commercial ones I use do.
>
>The supposed benefits of iSCSI, which include security and routability, are meaningless to me. Whether I use iSCSI or not, I would never let my storage network touch any of our general networks. I want my storage connected to my hosts over the shortest path possible, if not with crossover cables, then with a dedicated switch. AoE is not inherently more or less secure than a SAS cable, and shouldn't be, since you need to physically secure your storage regardless of interconnects. For me the security features of iSCSI only add to the complexity and overhead inherent in the protocol.

Thanks, that's a useful insight.

--
Simon Hobson
Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Marcin Kuk wrote:
>> So SAN = iSCSI isn't quite it.
>
> Never ending story... This is only terminology - you can talk about it all the time, but the question is... who cares? ;) Any of you will win - all this is named by people, and tomorrow all of it can have different meanings.

Indeed, but the main thing people overlook is that they are different things - as in the analogous statement that "car != Ford".

So for anyone who hasn't realised it yet:

SAN = Storage Area Network, really just a name for any technique that makes your storage 'remote' (as in not 'built in') from the server/device that's using it. It really does not require any specific technology, though some things are assumed (such as the ability to connect more than one device to the storage and share the capacity).

iSCSI = one specific technology for achieving that.

Analogy: Car - wheeled and powered machine for moving people and goods about. Ford - just one of many brands you can buy.

--
Simon Hobson
Visit http://www.magpiesnestpublishing.co.uk/ for books by acclaimed author Gladys Hobson. Novels - poetry - short stories - ideal as Christmas stocking fillers. Some available as e-books.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Fri, Jan 28, 2011 at 10:54 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> Indeed, but the main thing people overlook is that they are different things - as in the analogous statement that "car != Ford".
>
> So for anyone who hasn't realised it yet:
>
> SAN = Storage Area Network, really just a name for any technique that makes your storage 'remote' (as in not 'built in') from the server/device that's using it. It really does not require any specific technology, though some things are assumed (such as the ability to connect more than one device to the storage and share the capacity).
>
> iSCSI = one specific technology for achieving that.
>
> Analogy: Car - wheeled and powered machine for moving people and goods about. Ford - just one of many brands you can buy.
>
> --
> Simon Hobson

True, but for quite some time SAN has been associated with iSCSI, and NAS with NFS.

And I agree with you that SAN = Storage Area Network, so in fact 2 NAS boxes would be a SAN, since it's storage on the network. I think a lot of the confusion is with the marketing, and with what the corporates would like us to use. It's not always what we actually wanted, but we can't say much because it costs so damn much. And, sadly, clients associate expensive equipment with quality, which is not always the case either.

--
Kind Regards
Rudi Ahlers
SoftDux
Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> James Harper wrote:
> > I use DRBD locally and used to regularly see messages about concurrent outstanding requests to the same sector. DRBD logs this because it can't guarantee the serialising of requests, so two write requests to the same sector might be reordered at any layer different between the two servers. It sounds like AoE would make this even worse if the 'first' write was lost, resulting in the 'second' write being performed first followed by the 'first' write.
>
> Bear in mind that with modern disks it is normal for them to have command queuing and reordering built in. So unless you specifically turn it off, your carefully ordered writes may be re-ordered by the drive itself.

That's why barriers were invented. I don't see how AoE could possibly support that if it just processes requests in the order it receives them, e.g. if the command sequence went:

Write1
Write2 w/barrier
Write3

If Write2 was lost by the network and AoE just writes the writes as it receives them, then we have a problem. It must implement some sort of sequencing and ordering, or data loss is inevitable unless you can guarantee that nothing ever crashes or fails (in which case why are you using RAID, journaling filesystems, etc.).

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
2011/1/28 Rudi Ahlers <Rudi@softdux.com>:
> On Fri, Jan 28, 2011 at 10:54 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
>> Analogy: Car - wheeled and powered machine for moving people and goods about. Ford - just one of many brands you can buy.

I think you are right... today :).

> True, but for quite some time SAN has been associated with iSCSI, and NAS with NFS.
>
> And I agree with you that SAN = Storage Area Network, so in fact 2 NAS boxes would be a SAN, since it's storage on the network. I think a lot of the confusion is with the marketing, and with what the corporates would like us to use. It's not always what we actually wanted, but we can't say much because it costs so damn much. And, sadly, clients associate expensive equipment with quality, which is not always the case either.

Bingo!

Regards,
Marcin Kuk

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Il 28/01/2011 08:08, yue ha scritto:
> What is the performance of clvm+ocfs2? And its stability?

It's very reliable, but not as fast as using clvm directly.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
2011/1/28 Christian Zoffoli <czoffoli@xmerlin.org>:
> Il 28/01/2011 08:08, yue ha scritto:
>> What is the performance of clvm+ocfs2? And its stability?
>
> It's very reliable, but not as fast as using clvm directly.

To expand a little:

ocfs2: it's a cluster filesystem, so it has the overheads of being a filesystem (as opposed to 'naked' block devices) and of the clustering requirements: in effect, having to check shared locks at critical instants.

clvm: it's the clustering version of LVM. Since the whole LVM metadata is quite small, it's shared entirely, so all accesses are exactly the same on CLVM as on LVM.

The only impact is when modifying the LVM metadata (creating/modifying/deleting/migrating/etc. volumes), since _all_ access is suspended until every node has a local copy of the new LVM metadata.

Of course, a pause of a few tens or hundreds of milliseconds for an operation done less than once a day (less than once a month in many cases) is totally imperceptible.

-- Javier

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
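To make the CLVM side of that concrete, here is a minimal sketch of creating a clustered volume group and a per-VM logical volume. This is a sketch only: the device path, VG and LV names are invented, and clvmd/corosync are assumed to already be running on every node.

    pvcreate /dev/mapper/shared-lun
    vgcreate -cy vg_xen /dev/mapper/shared-lun   # -c y marks the VG as clustered
    lvcreate -L 20G -n vm01-disk vg_xen          # this metadata change is pushed to every node
    lvs vg_xen                                   # every node now sees the same volume list

Only the lvcreate pays the cluster-wide metadata suspension described above; reads and writes to the LV itself behave exactly as with plain LVM.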
> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
>
> It sounds like AoE would make this even worse if the 'first' write was lost, resulting in the 'second' write being performed first followed by the 'first' write.

Keep in mind that AoE is a synchronous protocol (although the upper layers can make it appear to be asynchronous).

If you issue two write requests concurrently, they can complete in any order. If you issue one request, wait for the response, then issue a 2nd request (i.e. with fsync), they *should* happen sequentially.

> Now sensibly, you'd think that a barrier would be placed between the first and second writes, guaranteeing that nothing would be reordered across the barrier,

Yup. I understand the point about disks that reorder requests internally. I wonder if Coraid plans to implement barriers in a future version of the protocol. There are flag bits marked "reserved for future use", so it should be straightforward to implement in a backward-compatible fashion.

> but if you run Windows on Xen on AoE on DRBD (eg to a HA DRBD SAN), you might see non-sensible things happen.

Definitely. There are plenty of layers that can get things wrong. I've mentioned in a past post here on this list that we experienced infrequent corruption of a cluster filesystem until we used "tap:sync" for our shared domU storage volumes.

-Jeff

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
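For reference, the "tap:sync" handler mentioned above is selected in the domU disk line. A hypothetical config fragment (the image path and device name are made up, and the exact tap: syntax may differ between Xen versions):

    # forces synchronous writes through blktap instead of the default aio path
    disk = [ 'tap:sync:/srv/xen/images/vm01.img,xvda,w' ]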
> 2011/1/28 Christian Zoffoli <czoffoli@xmerlin.org>:
> > Il 28/01/2011 08:08, yue ha scritto:
> >> What is the performance of clvm+ocfs2? And its stability?
> >
> > It's very reliable, but not as fast as using clvm directly.
>
> To expand a little:
>
> ocfs2: it's a cluster filesystem, so it has the overheads of being a filesystem (as opposed to 'naked' block devices) and of the clustering requirements: in effect, having to check shared locks at critical instants.

Microsoft achieve high performance with their cluster filesystem. In fact the docs clearly state it's only reliable for Hyper-V virtual disks, and any other use could cause problems, so I assume they get around the metadata locking problem by isolating each disk file so that there are no (or minimal) shared resources.

> clvm: it's the clustering version of LVM. Since the whole LVM metadata is quite small, it's shared entirely, so all accesses are exactly the same on CLVM as on LVM.
>
> The only impact is when modifying the LVM metadata (creating/modifying/deleting/migrating/etc. volumes), since _all_ access is suspended until every node has a local copy of the new LVM metadata.
>
> Of course, a pause of a few tens or hundreds of milliseconds for an operation done less than once a day (less than once a month in many cases) is totally imperceptible.

The dealbreaker for me with clvm was that snapshots aren't supported. I assume this hasn't changed, and even if it has, every write to a snapshotted volume potentially involves a metadata lock, so the performance drops right down unless you can optimise for that 'original + snapshot only accessed on the same node' case, which may be a limitation I could tolerate.

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Fri, Jan 28, 2011 at 4:31 PM, James Harper <james.harper@bendigoit.com.au> wrote:
> Microsoft achieve high performance with their cluster filesystem. In fact the docs clearly state it's only reliable for Hyper-V virtual disks, and any other use could cause problems, so I assume they get around the metadata locking problem by isolating each disk file so that there are no (or minimal) shared resources.

Sounds like the original OCFS (not ocfs2), which was only usable for Oracle RAC.

-- Javier

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
> > -----Original Message-----
> > From: James Harper [mailto:james.harper@bendigoit.com.au]
> >
> > It sounds like AoE would make this even worse if the 'first' write was lost, resulting in the 'second' write being performed first followed by the 'first' write.
>
> Keep in mind that AoE is a synchronous protocol (although the upper layers can make it appear to be asynchronous).
>
> If you issue two write requests concurrently, they can complete in any order. If you issue one request, wait for the response, then issue a 2nd request (i.e. with fsync), they *should* happen sequentially.

That's the poor man's barrier. Not great for performance if you have to do it too often though, as you can't send the next requests until the previous ones are complete.

> > Now sensibly, you'd think that a barrier would be placed between the first and second writes, guaranteeing that nothing would be reordered across the barrier,
>
> Yup. I understand the point about disks that reorder requests internally. I wonder if Coraid plans to implement barriers in a future version of the protocol. There are flag bits marked "reserved for future use", so it should be straightforward to implement in a backward-compatible fashion.

It would require a semi-reliable protocol, which I thought they already had.

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
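The cost of that poor man's barrier is easy to see from a host: forcing each write to complete before the next one is issued serialises the requests at a large throughput penalty. A rough illustration (the device path is a placeholder, and writing to it is destructive):

    # each 4 KiB write must reach the target before the next is issued - ordered but slow
    dd if=/dev/zero of=/dev/mapper/test-lun bs=4k count=10000 oflag=dsync
    # buffered writes may be merged and reordered by the block layer and the drive - fast, unordered
    dd if=/dev/zero of=/dev/mapper/test-lun bs=4k count=10000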
Christian Zoffoli, do you have a document on how to deploy clvm+ocfs2? Thanks.

At 2011-01-28 18:57:11, "Christian Zoffoli" <czoffoli@xmerlin.org> wrote:
>Il 28/01/2011 08:08, yue ha scritto:
>> What is the performance of clvm+ocfs2? And its stability?
>
>it's very reliable but not as fast as using clvm directly.
>
>Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Wed, Jan 26, 2011 at 12:15:40PM -0800, Freddie Cash wrote:
> On Wed, Jan 26, 2011 at 12:05 PM, Christian Zoffoli <czoffoli@xmerlin.org> wrote:
> > Il 26/01/2011 18:11, Freddie Cash ha scritto:
> > [cut]
> >> If there's anything that we've missed, let me know. :)
> >
> > the exposed setup is very interesting but:
> >
> > a) ZFS on freebsd is not as stable as on solaris
>
> But it's plenty stable enough for our uses. Been using it for over 2 years now, since ZFSv6 hit FreeBSD 7.0. Once we got over the initial tuning glitches and upgraded to ZFSv14, things have been rock solid, even when swapping drives.
>
> > b) opensolaris is dead (oracle killed it)
> > c) we have no guarantee that in the future oracle will release updated code
>
> For ZFS? No, there are no guarantees. But the Illumos, Nexenta, and FreeBSD devs won't be sitting still just waiting for Oracle to release something (look at the removal of the python dependency for ZFS delegations in Illumos, for example). This may lead to a split in the future (Oracle ZFS vs OpenZFS). But that's the future. ZFSv28 is available for FreeBSD right now, which supports all the features we're looking for in ZFS.
>
> > d) NFS is slow ...NFS over RDMA is fast but freebsd has no open/official infiniband stack
>
> NFS doesn't have to be slow.
>
> > e) consistent snapshots are very different from backing up only files. For example if you back up a DB server, copying files is not enough; you also have to dump what you have in memory at the same time (the key word is "at the same time")
>
> Yes, true. But having a cronjob in the guest (or having the backup server execute the command remotely) that does a dump of the database before the backup snapshot is created is pretty darn close to atomic, and hasn't failed us yet in our restores. It's not perfect, but so far, so good.
>
> Compared to the hassle of getting iSCSI live-migration working, and all the hassles of getting a cluster-aware LVM or FS setup, I'll take a little drop in raw disk I/O. :) Ease of management trumps raw performance for us (we're only 5 people managing servers for an entire school district of ~2100 staff and 50 schools).

You don't need CLVM for live migration on shared iSCSI+LVM.. See for example Citrix XenServer (or XCP): they use just the normal LVM on top of shared iSCSI, and live migrations work perfectly OK.

The trick is that the XAPI toolstack takes care of the locking, so there's no need for CLVM.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
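For anyone doing this without a toolstack like XAPI, the discipline boils down to something like the following on whichever node runs the guest. This is a sketch only: VG/LV names are placeholders, and the admin must guarantee that metadata changes and LV activation happen on one node at a time.

    vgscan                           # pick up metadata another node may have written
    lvchange -ay vg_xen/vm01-disk    # activate the volume on this node only
    xm create vm01.cfg
    # ... later, before another node takes over the guest:
    xm shutdown vm01
    lvchange -an vg_xen/vm01-disk    # deactivate so the LV is never active on two hosts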
On Sat, Jan 29, 2011 at 08:31:36AM +1100, James Harper wrote:
> > clvm: it's the clustering version of LVM. Since the whole LVM metadata is quite small, it's shared entirely, so all accesses are exactly the same on CLVM as on LVM.
>
> The dealbreaker for me with clvm was that snapshots aren't supported. I assume this hasn't changed, and even if it has, every write to a snapshotted volume potentially involves a metadata lock, so the performance drops right down unless you can optimise for that 'original + snapshot only accessed on the same node' case, which may be a limitation I could tolerate.

You don't really need CLVM for live migrations. Normal LVM works well on a shared iSCSI LUN. Assuming you know what you're doing..

It also allows you to use snapshots.

See the other mail I sent on this thread..

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Wed, Jan 26, 2011 at 11:38:52AM +0100, Christian Zoffoli wrote:
> Hi,
>
> if you want to push on performance the best one is:
>
> CLVM over iSCSI

And if you need snapshots, just use normal LVM over a shared iSCSI LUN. (In this case your management toolstack needs to take care of LVM locking.)

-- Pasi

> if you need full redundancy you have to double everything:
> - switches
> - network ports
> - PSUs
>
> and
>
> - storages.
>
> If you want a completely redundant storage solution you can use DRBD in active-active.
>
> Just some notes:
> - VMs over files over NFS is slow (only some vendors have a relatively fast NFS appliance).
> - VMs over files over a cluster FS is slow.
>
> Every time you add a layer (in particular a clustered FS layer) your performance drops ... so keep it simple.
>
> Best regards,
> Christian
>
> P.S. another interesting approach would be NFS over RDMA (infiniband) ...most of the advantages of NFS with fewer disadvantages compared to NFS over TCP/IP

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Wed, Jan 26, 2011 at 07:15:41PM +0100, Christian Zoffoli wrote:
> Il 26/01/2011 18:58, Roberto Bifulco ha scritto:
> [cut]
> > from comparisons over the same hardware we can be more confident that the results we get are still valid over a similar (clearly not exactly the same!!) configuration.
>
> Typically tests are quite incomparable. If you change disks (type, brand, size, number, raid level) or some settings or hw, you can obtain very different results.
>
> IMHO the right way is to find how many IOPS you need to handle your load, and then you can choose disk type, raid type, rpm etc.
>
> Typically, the SAN type (iSCSI, FC, etc) doesn't affect IOPS ...so if you need 4000 IOPS of a mixed 70/30 RW workload, you can simply calculate the iron you need to achieve this.
>
> Nevertheless, the connection type affects the bandwidth between servers and storage(s), latency, and how many VMs you can put on a single piece of hw.
>
> In other words, if you have good iron on the disk/controller side you can host for example 100 VMs, but if the bottleneck is your connection you probably have to reduce the overbooking level.
>
> iSCSI typically has a quite big overhead due to the protocol; FC, SAS, native infiniband and AoE have very low overhead.

Not true today. TCP/IP is hardware offloaded nowadays, and many NICs also have hardware iSCSI offloading.

Also AoE is not really faster than iSCSI, since TCP/IP is hardware offloaded these days..

iSCSI is more flexible and more widely supported than AoE.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Wed, Jan 26, 2011 at 11:10:12PM +0100, Christian Zoffoli wrote:
> Il 26/01/2011 22:24, James Harper ha scritto:
> >> iSCSI typically has a quite big overhead due to the protocol; FC, SAS, native infiniband and AoE have very low overhead.
> >
> > For iSCSI vs AoE, that isn't as true as you might think. TCP offload can take care of a lot of the overhead. Any server class network adapter these days should allow you to send 60kb packets to the network adapter and it will take care of the segmentation, while AoE would be limited to MTU sized packets. With AoE you need to checksum every packet yourself, while with iSCSI it is taken care of by the network adapter.
>
> the overhead is 10% on a gigabit link, and when you speak about resource overhead you have to mention also the CPU overhead on the storage side.
>
> If you check the datasheets of brands like EMC you can see that the same storage platform is sold in iSCSI and FC versions ...on the former you can use less than half the servers you can use with the latter.

This is mostly because of:
- EMC's crappy iSCSI implementation.
- EMC wants to sell you legacy FC stuff they've invested a lot in.

See dedicated iSCSI enterprise storage like Equallogic.. the way it's meant to be.

Microsoft and Intel had some press releases around one year ago demonstrating over one *million* IOPS using a single 10gbit Intel NIC, on a *single* x86 box, using *software* iSCSI.

> Every new entry level storage is based on std hardware without any hw acceleration ...for example EMC AX storages are simply xeon servers.

Many of the high-end enterprise storage boxes are just normal (x86) hardware. Check for example NetApp. The magic is all in *software*.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Thu, Jan 27, 2011 at 09:35:38AM +0100, Adi Kriegisch wrote:
> Hi!
>
> > > iSCSI typically has a quite big overhead due to the protocol; FC, SAS, native infiniband and AoE have very low overhead.
> >
> > For iSCSI vs AoE, that isn't as true as you might think. TCP offload can take care of a lot of the overhead. Any server class network adapter these days should allow you to send 60kb packets to the network adapter and it will take care of the segmentation, while AoE would be limited to MTU sized packets. With AoE you need to checksum every packet yourself, while with iSCSI it is taken care of by the network adapter.
>
> What AoE actually does is send a frame per block. Block size is 4K -- so no need for fragmentation. The overhead is pretty low, because we're talking about Ethernet frames.
>
> Most iSCSI issues I have seen are with reordering of packets due to transmission across several interfaces. So what most people recommend is to keep the number of interfaces to two. To keep performance up this means you have to use 10G, FC or similar, which is quite expensive -- especially if you'd like to have a HA SAN network (HSRP and stuff like that is required).
>
> AoE does not suffer from those issues: using 6 GBit interfaces is no problem at all, and load balancing will happen automatically, as the load is distributed equally across all available interfaces. HA is very simple: just use two switches and connect one half of the interfaces to one switch and the other half to the other switch. (It is recommended to use switches that can do jumbo frames and flow control.)
>
> IMHO most of the current recommendations and practices surrounding iSCSI are there to overcome the shortcomings of the protocol. AoE is way more robust and easier to handle.

iSCSI does not have problems using multiple gige interfaces. Just set up multipathing properly.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
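For completeness, "setting up multipathing properly" with open-iscsi usually means binding one session per NIC and letting dm-multipath aggregate the paths. A sketch with made-up interface names and portal address:

    # bind one iSCSI interface definition to each physical NIC
    iscsiadm -m iface -I iface-eth2 --op new
    iscsiadm -m iface -I iface-eth2 --op update -n iface.net_ifacename -v eth2
    iscsiadm -m iface -I iface-eth3 --op new
    iscsiadm -m iface -I iface-eth3 --op update -n iface.net_ifacename -v eth3
    # discover and log in over both interfaces; multipathd then sees two paths per LUN
    iscsiadm -m discovery -t st -p 192.168.10.1 -I iface-eth2 -I iface-eth3
    iscsiadm -m node -L all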
On 01/29/11 16:09, Pasi Kärkkäinen wrote:
> iSCSI does not have problems using multiple gige interfaces. Just set up multipathing properly.
>
> -- Pasi

On this subject: I am using multipathing to iSCSI too, hoping to get aggregated speed on top of path redundancy, but the speed does not seem to surpass that of a single interface.

Is anyone successful at doing this?

Cheers,
B.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 04:27:52PM +0100, Bart Coninckx wrote:
> On this subject: I am using multipathing to iSCSI too, hoping to get aggregated speed on top of path redundancy, but the speed does not seem to surpass that of a single interface.
>
> Is anyone successful at doing this?

You're benchmarking sequential/linear IO, using big blocksizes, right?

Some questions:
- Are you using the multipath round-robin path policy?
- After how many IOs do you switch paths? You might need to lower the min_ios.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Il 29/01/2011 15:56, Pasi Kärkkäinen ha scritto:
[cut]
> The trick is that the XAPI toolstack takes care of the locking, so there's no need for CLVM.

It was the same on the now-dead VirtualIron.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Il 29/01/2011 16:08, Pasi Kärkkäinen ha scritto:
[cut]
> Microsoft and Intel had some press releases around one year ago demonstrating over one *million* IOPS using a single 10gbit Intel NIC, on a *single* x86 box, using *software* iSCSI.

There is a big difference between marketing numbers and real numbers. The test you have pointed to is only smoke in the eyes.

No one published the hardware list they used to reach such performance.

First of all, they aggregated the performance of *10* targets (if the math has not changed, 1 aggregator + 10 targets == 11), and they did not say what kind of hard disks, or how many, they used to reach these performances.

The best standard hard disk (2.5in 15k SAS) can do ~190 IOPS, so it's quite impossible to achieve such IOPS; the only way is to use SSDs, or better, PCIe SSDs ...but as everyone knows you have to pay 2 arms, 2 legs and so on.

In real life it is very hard to reach high performance levels. For example, 48x 2.5in 15k disks in raid0 give you ~8700 RW IOPS (in raid0 the % of reads doesn't impact the results).

If you have enough money you can choose products from Texas Memory or Fusion-io, but typically the costs are too high. For example a Fusion ioDrive Duo 604GB MLC costs ~15k $ ...if you want 1M IOPS you can choose the ioDrive Octal ...but if we scale the price proportionally it should be over 120k $.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 05:24:30PM +0100, Christian Zoffoli wrote:
> There is a big difference between marketing numbers and real numbers. The test you have pointed to is only smoke in the eyes.

No, it's not just smoke in the eyes. It clearly shows ethernet and iSCSI can match and beat legacy FC.

> No one published the hardware list they used to reach such performance.

Hardware configuration was published.

> First of all, they aggregated the performance of *10* targets (if the math has not changed, 1 aggregator + 10 targets == 11), and they did not say what kind of hard disks, or how many, they used to reach these performances.

Targets weren't the point of that test.

The point was to show that a single host *initiator* (=iSCSI client) can handle one million IOPS.

> The best standard hard disk (2.5in 15k SAS) can do ~190 IOPS, so it's quite impossible to achieve such IOPS; the only way is to use SSDs, or better, PCIe SSDs ...but as everyone knows you have to pay 2 arms, 2 legs and so on.

In that test they used 10 targets, ie. 10 separate servers as targets, and each had a big RAM disk shared as an iSCSI LUN.

> In real life it is very hard to reach high performance levels. For example, 48x 2.5in 15k disks in raid0 give you ~8700 RW IOPS (in raid0 the % of reads doesn't impact the results).

The point of that test was to show that the iSCSI protocol is NOT the bottleneck, Ethernet is NOT the bottleneck, and the iSCSI initiator (client) is NOT the bottleneck.

The bottleneck is the storage server. And that's the reason they used many *RAM disks* as the storage servers.

> If you have enough money you can choose products from Texas Memory or Fusion-io, but typically the costs are too high. For example a Fusion ioDrive Duo 604GB MLC costs ~15k $ ...if you want 1M IOPS you can choose the ioDrive Octal ...but if we scale the price proportionally it should be over 120k $.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 04:46:36PM +0100, Christian Zoffoli wrote:
> Il 29/01/2011 15:56, Pasi Kärkkäinen ha scritto:
> [cut]
> > The trick is that the XAPI toolstack takes care of the locking, so there's no need for CLVM.
>
> It was the same on the now-dead VirtualIron.

Indeed, VirtualIron used the same normal-LVM method.

I wouldn't be surprised if OracleVM (Oracle's current Xen-based virtualization product) uses it as well :)

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 01/29/11 16:30, Pasi Kärkkäinen wrote:
> You're benchmarking sequential/linear IO, using big blocksizes, right?
>
> Some questions:
> - Are you using the multipath round-robin path policy?
> - After how many IOs do you switch paths? You might need to lower the min_ios.

Hi Pasi,

the benchmarking was intuitively done, with just dd and bonnie++.

It is indeed rr; this is part of my multipath.conf:

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        prio                    const
        path_checker            directio
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           5
        user_friendly_names     no
}

Should the "100" go down a bit?

thx,
bart

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 06:12:34PM +0100, Bart Coninckx wrote:
> the benchmarking was intuitively done, with just dd and bonnie++.

With dd, use "bs=1024k" or so to force big blocks.

> It is indeed rr; this is part of my multipath.conf:
> [...]
>         rr_min_io               100
> [...]
>
> Should the "100" go down a bit?

Yeah, for example the Equallogic enterprise iSCSI storage manuals recommend an rr_min_io of "3".

That should allow utilizing both paths.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
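Putting the two suggestions together, a quick sequential-read test over the multipath device might look like this, after lowering rr_min_io and reloading multipathd. The device name is a placeholder, and iflag=direct bypasses the page cache so the paths rather than the cache are measured:

    # /etc/multipath.conf:  rr_min_io 3
    dd if=/dev/mapper/mpatha of=/dev/null bs=1024k count=4096 iflag=direct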
On 01/29/11 18:15, Pasi Kärkkäinen wrote:
> Yeah, for example the Equallogic enterprise iSCSI storage manuals recommend an rr_min_io of "3".
>
> That should allow utilizing both paths.

Excellent, will try this and start measuring again ...

cheers,
B.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Il 29/01/2011 17:37, Pasi Kärkkäinen ha scritto:
[cut]
> No, it's not just smoke in the eyes. It clearly shows ethernet and iSCSI can match and beat legacy FC.

SAS storages and also infiniband storages can beat legacy FC as well, and they cost less than a full 10G infrastructure.

>> No one published the hardware list they used to reach such performance.
>
> Hardware configuration was published.

Please provide a link to the full hw configuration. I cannot see anything about what you are saying when looking, for example, at:

http://download.intel.com/support/network/sb/inteliscsiwp.pdf

>> First of all, they aggregated the performance of *10* targets (if the math has not changed, 1 aggregator + 10 targets == 11), and they did not say what kind of hard disks, or how many, they used to reach these performances.
>
> Targets weren't the point of that test.
>
> The point was to show that a single host *initiator* (=iSCSI client) can handle one million IOPS.

That's meaningless in this thread ...we are discussing how to choose the right storage infrastructure for a xen cluster.

When someone releases something real that everyone can adopt in his infrastructure with 1M IOPS, I will be delighted to buy it.

[cut]
> In that test they used 10 targets, ie. 10 separate servers as targets, and each had a big RAM disk shared as an iSCSI LUN.

See above ...it's meaningless in this thread.

>> In real life it is very hard to reach high performance levels. For example, 48x 2.5in 15k disks in raid0 give you ~8700 RW IOPS (in raid0 the % of reads doesn't impact the results).
>
> The point of that test was to show that the iSCSI protocol is NOT the bottleneck, Ethernet is NOT the bottleneck, and the iSCSI initiator (client) is NOT the bottleneck.
>
> The bottleneck is the storage server. And that's the reason they used many *RAM disks* as the storage servers.

No one said anything different ...we are discussing how to create the best clustered xen setup, and in particular we are evaluating the differences between all the technologies.

Nevertheless, no one in the test results pointed out how much CPU etc. was wasted using this approach.

Christian

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 07:26:59PM +0100, Christian Zoffoli wrote:
> SAS storages and also infiniband storages can beat legacy FC as well, and they cost less than a full 10G infrastructure.

Yep.

> Please provide a link to the full hw configuration.

1.25 Million IOPS benchmark:
http://communities.intel.com/community/wired/blog/2010/04/22/1-million-iops-how-about-125-million
http://blog.fosketts.net/2010/03/19/microsoft-intel-starwind-iscsi/

> I cannot see anything about what you are saying when looking, for example, at:
>
> http://download.intel.com/support/network/sb/inteliscsiwp.pdf

That pdf is just generic marketing stuff.

The hardware setup is described here:
http://communities.intel.com/community/wired/blog/2010/04/20/1-million-iop-article-explained
and: http://gestaltit.com/featured/top/stephen/wirespeed-10-gb-iscsi/

Somewhere there was also a PDF about that benchmark setup.

Microsoft presentation about iSCSI optimizations in 2008r2:
http://download.microsoft.com/download/5/E/6/5E66B27B-988B-4F50-AF3A-C2FF1E62180F/COR-T586_WH08.pptx

> That's meaningless in this thread ...we are discussing how to choose the right storage infrastructure for a xen cluster.

This discussion started from the iSCSI vs. AoE performance differences.. So I just wanted to point out that iSCSI performance is definitely OK.

> When someone releases something real that everyone can adopt in his infrastructure with 1M IOPS, I will be delighted to buy it.

That was very real, and you can buy the equipment and do the same benchmark yourself.

> See above ...it's meaningless in this thread.

Actually it just tells us the StarWind iSCSI target they used is crap, since they had to use 10x more targets than initiators to achieve the results ;)

> No one said anything different ...we are discussing how to create the best clustered xen setup, and in particular we are evaluating the differences between all the technologies.
>
> Nevertheless, no one in the test results pointed out how much CPU etc. was wasted using this approach.

In that benchmark 100% of the CPU was used (when at 1.3 million IOPS).

So when you scale the IOPS down to common workload numbers, you'll notice iSCSI doesn't cause much CPU usage.. Say, 12500 IOPS will cause about 1% CPU usage, when scaling linearly from the Intel+Microsoft results.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 08:46:52PM +0200, Pasi Kärkkäinen wrote:> > > > please provide a link of the full hw configuration > > > > 1.25 Million IOPS benchmark: > http://communities.intel.com/community/wired/blog/2010/04/22/1-million-iops-how-about-125-million > > http://blog.fosketts.net/2010/03/19/microsoft-intel-starwind-iscsi/ > > > > I cannot see anything about what you are saying having a look for > > example to: > > > > http://download.intel.com/support/network/sb/inteliscsiwp.pdf > > > > That pdf is just generic marketing stuff. > > The hardware setup is described here: > http://communities.intel.com/community/wired/blog/2010/04/20/1-million-iop-article-explained > and: http://gestaltit.com/featured/top/stephen/wirespeed-10-gb-iscsi/ > > Somewhere there was also PDF about that benchmark setup. >Found it, it''s here: http://dlbmodigital.microsoft.com/ppt/TN-100114-JSchwartz_SMorgan_JPlawner-1032432956-FINAL.pdf -- Pasi> Microsoft presentation about iSCSI optimizations in 2008r2: > http://download.microsoft.com/download/5/E/6/5E66B27B-988B-4F50-AF3A-C2FF1E62180F/COR-T586_WH08.pptx > > > > >> First of all they have aggregated the perfomances of *10* targets (if > > >> the math is not changed 1 aggregator+10 targets == 11) and they have not > > >> said what kind of hard disk and how many hard disks they used to reach > > >> these performances. > > >> > > > > > > Targets weren''t the point of that test. > > > > > > The point was to show single host *initiator* (=iSCSI client) > > > can handle one million IOPS. > > > > that''s meaningless in this thread ...where are discussing about choosing > > the right storage infrastructure for a xen cluster > > > > This discussion started from the iSCSI vs. AoE performance differences.. > So I just wanted to point out that iSCSI performance is definitely OK. > > > when someone will release something real that everyone can adopt in his > > infrastructure with 1M IOPS I would be delighted to buy it > > > > That was very real, and you can buy the equipment and do the > same benchmark yourself. > > > [cut] > > > In that test they used 10 targets, ie. 10 separate servers as targets, > > > and each had big RAM disk shared as iSCSI LUN. > > > > see above ...it''s meaningless in this thread > > > > Actually it just tells the StarWind iSCSI target they used is crap, > since they had to use 10x more targets than initiators to achieve > the results ;) > > > > > >> In real life is very hard to reach high performance levels, for example: > > >> - 48x 2.5IN 15k disks in raid0 gives you ~8700 RW IOPS (in raid 0 the % > > >> of read doesn''t impact on the results) > > >> > > > > > > The point of that test was to show iSCSI protocol is NOT the bottleneck, > > > Ethernet is NOT the bottleneck, and iSCSI initiator (client) > > > is NOT the bottleneck. > > > > > > The bottleneck is the storage server. And that''s the reason > > > they used many *RAM disks* as the storage servers. > > > > noone said something different ..we are discussing how to create the > > best clustered xen setup and in particular we are evaluating also the > > differences between all the technologies. > > > > Nevertheless noone in the test results pointed how much CPU & co was > > wasted using this approach. > > > > In that benchmark 100% of the CPU was used (when at 1.3 million IOPS). > > So when you scale IOPS to common workload numbers you''ll notice > iSCSI doesn''t cause much CPU usage.. > > Say, 12500 IOPS, will cause 1% cpu usage, when scaling linearly > from Intel+Microsoft results. 
> > -- Pasi
_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 29/01/2011 19:53, Pasi Kärkkäinen wrote: [cut] thanks for the links [cut]
>> Say, 12500 IOPS, will cause 1% cpu usage, when scaling linearly >> from Intel+Microsoft results.

I've seen many iSCSI implementations and no one gives me 12500 IOPS with 1% CPU usage ... I would be happy to see a real setup with such performance.

Christian

P.S. real means = something I could put in production without storing VMs in memory

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Jan 29, 2011 at 08:39:12PM +0100, Christian Zoffoli wrote:
> On 29/01/2011 19:53, Pasi Kärkkäinen wrote: > [cut] > > thanks for the links > > [cut] > >> Say, 12500 IOPS, will cause 1% cpu usage, when scaling linearly > >> from Intel+Microsoft results. > > I've seen many iSCSI implementations and no one gives me 12500 IOPS with > 1% CPU usage ... I would be happy to see a real setup with such performance. >

Well, on Windows you can get that. If we can't get that on Linux, we should optimize. Does someone want to run some benchmarks against ramdisk targets using the Linux initiator?

And yeah, to get 12500 IOPS from a disk-based target you need to have a LOT of disks.. assuming a single 7200 rpm SATA disk can deliver around 100 random IOPS max, 12500 IOPS would require around 125 disks in raid-0. So for benchmarking initiator performance it's best to use ramdisk targets.

> Christian > > P.S. real means = something I could put in production without storing > VMs in memory

Yep.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
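[For anyone who wants to try that, a rough sketch of a ramdisk-target benchmark with stock Linux tools follows (tgt on the target box, open-iscsi plus fio on the initiator; the IQN, IP address and /dev/sdX device name are placeholders, not from this thread):

# on the target box: export a 1 GiB ramdisk over iSCSI with tgt
modprobe brd rd_size=1048576                                   # creates /dev/ram0 (size in KiB)
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2011-01.example:ramdisk
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/ram0
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL      # no ACLs, lab use only

# on the initiator: log in and hammer it with small random reads
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node -T iqn.2011-01.example:ramdisk -p 192.168.0.10 --login
fio --name=randread --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --group_reporting

Watching %sys in top or mpstat on the initiator during the fio run gives the IOPS-per-CPU figure being argued about here.]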
> > On Wed, Jan 26, 2011 at 11:38:52AM +0100, Christian Zoffoli wrote: > > Hi, > > > > if you want to push on performance the best one is: > > > > CLVM over iSCSI > > > > And if you need snapshots, just over normal LVM over shared iSCSI LUN. > (in this case your management toolstack needs to take care of locking LVM > control). >

How sure are you of this? Every write to the original or the snapshot involves a potential metadata update. If the original is mounted on one node and the snapshot on another, I'm pretty sure that it would all go to crap really really quickly.

Have you actually tried this???

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sun, Jan 30, 2011 at 10:41:23AM +1100, James Harper wrote:
> > > > On Wed, Jan 26, 2011 at 11:38:52AM +0100, Christian Zoffoli wrote: > > > Hi, > > > > > > if you want to push on performance the best one is: > > > > > > CLVM over iSCSI > > > > > > > And if you need snapshots, just over normal LVM over shared iSCSI LUN. > > (in this case your management toolstack needs to take care of locking LVM > > control). > > > > How sure are you of this? Every write to the original or the snapshot involves a potential metadata update. If the original is mounted on one node and the snapshot on another, I'm pretty sure that it would all go to crap really really quickly. > > Have you actually tried this??? >

It's widely used.

Citrix XenServer (and XCP) do it like that, and also the Xen based VirtualIron did it like that.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sun, Jan 30, 2011 at 01:44:06AM +0200, Pasi Kärkkäinen wrote:
> On Sun, Jan 30, 2011 at 10:41:23AM +1100, James Harper wrote: > > > > > > On Wed, Jan 26, 2011 at 11:38:52AM +0100, Christian Zoffoli wrote: > > > > Hi, > > > > > > > > if you want to push on performance the best one is: > > > > > > > > CLVM over iSCSI > > > > > > > > > > And if you need snapshots, just over normal LVM over shared iSCSI LUN. > > > (in this case your management toolstack needs to take care of locking LVM > > > control). > > > > > > > How sure are you of this? Every write to the original or the snapshot involves a potential metadata update. If the original is mounted on one node and the snapshot on another, I'm pretty sure that it would all go to crap really really quickly. > > > > Have you actually tried this??? > > > > It's widely used. > > Citrix XenServer (and XCP) do it like that, and also the > Xen based VirtualIron did it like that. >

Forgot to add this:

In Citrix XenServer it's the XAPI toolstack that's taking care of LVM locking, so only the "pool master" is executing LVM commands.

As long as you know you're executing LVM commands only from a single node, you're good. No need for CLVM.

I guess you also need to refresh all the other nodes after executing LVM commands on the primary/master node.

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
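[As a concrete illustration of that "one node runs the LVM commands, the others just refresh" pattern, here is a minimal sketch; the VG name, LV name and node roles are my own examples, not taken from XenServer's actual implementation:

# on the master node only, the one host allowed to change LVM metadata:
lvcreate -L 10G -n vm42-disk vmvg           # carve a new volume out of the shared iSCSI VG

# on the host that will actually run the VM:
vgscan                                      # re-read VG metadata from the shared LUN
lvchange -ay vmvg/vm42-disk                 # activate the LV locally before starting the domU

# if the LV was already active on that host when the master changed it:
lvchange --refresh vmvg/vm42-disk

The important part is purely administrative: nothing in LVM itself stops two hosts from running lvcreate or lvextend at the same time, which is exactly the corruption CLVM exists to prevent.]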
> > It's widely used. > > > > Citrix XenServer (and XCP) do it like that, and also the > > Xen based VirtualIron did it like that. > > > > Forgot to add this: > > In Citrix XenServer it's the XAPI toolstack that's > taking care of LVM locking, so only the "pool master" > is executing LVM commands. > > As long as you know you're executing LVM commands > only from a single node, you're good. No need for CLVM.

So it still does have locking then.

> > I guess you also need to refresh all the other nodes > after executing LVM commands on the primary/master node. >

That was my point though. Snapshot works by copy-on-write. Every time a block in the primary volume is written to for the first time since the snapshot was taken, the data needs to be copied to the snapshot. The same happens when the snapshot is written to. That involves a metadata update, so I don't understand how it can work without a major performance hit as you lock and unlock everything with (potentially) every write.

James

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 01/30/11 00:50, Pasi Kärkkäinen wrote:
> On Sun, Jan 30, 2011 at 01:44:06AM +0200, Pasi Kärkkäinen wrote: > >> On Sun, Jan 30, 2011 at 10:41:23AM +1100, James Harper wrote: >> >>>> On Wed, Jan 26, 2011 at 11:38:52AM +0100, Christian Zoffoli wrote: >>>> >>>>> Hi, >>>>> >>>>> if you want to push on performance the best one is: >>>>> >>>>> CLVM over iSCSI >>>>> >>>>> >>>> And if you need snapshots, just over normal LVM over shared iSCSI LUN. >>>> (in this case your management toolstack needs to take care of locking LVM >>>> control). >>>> >>>> >>> How sure are you of this? Every write to the original or the snapshot involves a potential metadata update. If the original is mounted on one node and the snapshot on another, I'm pretty sure that it would all go to crap really really quickly. >>> >>> Have you actually tried this??? >>> >>> >> It's widely used. >> >> Citrix XenServer (and XCP) do it like that, and also the >> Xen based VirtualIron did it like that. >> >> > Forgot to add this: > > In Citrix XenServer it's the XAPI toolstack that's > taking care of LVM locking, so only the "pool master" > is executing LVM commands. > > As long as you know you're executing LVM commands > only from a single node, you're good. No need for CLVM. > > I guess you also need to refresh all the other nodes > after executing LVM commands on the primary/master node. > > -- Pasi >

I "fixed" this in a way I would like to have your thoughts on: what I did was build the iSCSI LUNs on an IETD implementation. Each LUN points to an LVM LV on this iSCSI server. To avoid worrying about which hypervisor will change the LUN, I save the DomU using that particular LUN over iSCSI and then snapshot the LV on the iSCSI server. Afterwards I restore the guest and start dd-ing off the snapshot LV.

Rgds, B.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
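[For what it's worth, the workflow B. describes might look roughly like this; a sketch with invented names, and xm save/restore is just one way to quiesce the guest:

# on the Xen host: quiesce the guest so nothing is writing to the LUN
xm save vm42 /var/lib/xen/save/vm42.chk
# on the IETD storage server: snapshot the LV backing that LUN
lvcreate -s -L 5G -n vm42-snap /dev/vg_iscsi/vm42
# on the Xen host: bring the guest back
xm restore /var/lib/xen/save/vm42.chk
# later, on the storage server: copy the snapshot off and drop it
dd if=/dev/vg_iscsi/vm42-snap of=/backup/vm42.img bs=1M
lvremove -f /dev/vg_iscsi/vm42-snap

The nice property is that all LVM operations happen on the storage server itself, so the shared-metadata problem discussed above never arises.]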
On Sun, Jan 30, 2011 at 11:14:02AM +1100, James Harper wrote:
> > > It's widely used. > > > > > > Citrix XenServer (and XCP) do it like that, and also the > > > Xen based VirtualIron did it like that. > > > > > > > Forgot to add this: > > > > In Citrix XenServer it's the XAPI toolstack that's > > taking care of LVM locking, so only the "pool master" > > is executing LVM commands. > > > > As long as you know you're executing LVM commands > > only from a single node, you're good. No need for CLVM. > > So it still does have locking then. >

Yep.

> > > > I guess you also need to refresh all the other nodes > > after executing LVM commands on the primary/master node. > > > > That was my point though. Snapshot works by copy-on-write. Every time a block in primary volume is written to for the first time since the snapshot was taken, the data needs to be copied to the snapshot. Same when the snapshot is written to. That involves a metadata update so I don't understand how it can work without a major performance hit as you lock and unlock everything with (potentially) every write. >

Hmm.. if the toolstack makes sure each LV is only used from a single node at a time (which it does), isn't it enough to just have locking when you *create* the snapshot from the same node? The snapshot reserve gets allocated then, etc. I.e. the toolstack makes sure the node that is actually accessing the volume/snapshot is always the same node, and it's in sync?

-- Pasi

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
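[If I read Pasi right, the pattern is roughly this (sketch only; the VG and LV names are made up):

# everything below runs on the single host that currently owns vmvg/vm42-disk
lvcreate -s -L 5G -n vm42-snap vmvg/vm42-disk   # snapshot reserve allocated here, once, under locking
# both the origin and the snapshot stay attached to this same host afterwards,
# so the copy-on-write exception table is only ever written by one node

The per-write copy-on-write bookkeeping then never crosses hosts; only the one-off metadata change at snapshot creation needs to be coordinated.]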
On Thu, Jan 27, 2011 at 11:04 AM, Simon Hobson <linux@thehobsons.co.uk> wrote:
> I must admit, AoE does seem to have its upsides - in past threads (here and > elsewhere) I've only ever seen it being criticised.

One upside to AoE is that, as an Ethernet protocol, you can get a lot of visibility into SAN activity using standard sFlow instrumentation in physical and virtual switches: http://blog.sflow.com/2011/03/aoe.html

FYI, sFlow is supported by the Open vSwitch and is easy to enable on XCP 1.0: http://blog.sflow.com/2010/12/xcp-10-beta.html

Peter

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
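[If anyone wants to try the sFlow route on a host running Open vSwitch, enabling it is a one-liner; this is just a sketch with the collector address, agent interface, bridge name and sampling rate made up (the XCP post linked above has the authoritative steps):

ovs-vsctl -- --id=@s create sflow agent=eth0 target=\"10.0.0.50:6343\" \
    header=128 sampling=64 polling=10 \
    -- set bridge xenbr0 sflow=@s
# and to turn it off again:
ovs-vsctl clear bridge xenbr0 sflow
]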
I thought the license issues had been flushed out of XCP 1.1 RC. From Jonathan's XCP 1.1 RC announcement email:

--------------------------------------------------------------------
XCP 1.1 Release Candidate is ready to be tested. Changes since Beta:

* License expiry crept back in. It has now crept out again.
--------------------------------------------------------------------

Yet, I'm still seeing the annoying license expiry warning in XenCenter 6.0 from the XCP 1.1 RC boxes:

cat /etc/redhat-release
XCP release 1.1.0-50674c (xcp)

Kevin

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Friday, September 30, 2011, wrote:
> > I thought the license issues had been flushed out of XCP 1.1 RC. From > Jonathan's XCP 1.1 RC announcement email: > > -------------------------------------------------------------------- > XCP 1.1 Release Candidate is ready to be tested. Changes since Beta: > > * License expiry crept back in. It has now crept out again. > -------------------------------------------------------------------- > > Yet, I'm still seeing the annoying license expiry warning in XenCenter 6.0 > from the XCP 1.1 RC boxes: >

The warning is still there with XenCenter, but that is a bug with XenCenter. The warning is just that: a warning. XCP has the license code removed, so nothing will happen when the expiry date passes.

Thanks, Todd

-- Todd Deshane http://www.linkedin.com/in/deshantm http://www.xen.org/products/cloudxen.html http://runningxen.com/

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
brooks@netgate.net | 2011-Oct-03 04:41 UTC | Re: [Xen-users] XCP 1.1 License Expiry Issue Remains
Todd,

Doesn't this seem like a worse situation than what we had in XCP 1.0? At least in 1.0 we could install a "Free Edition" license and make the warnings disappear. As it is now, customers with both commercial XenServer and XCP pools managed via XenCenter are constantly reminded that their license has expired, making XenCenter more difficult (annoying) to use.

With no real alternative, and with CloudStack being the obvious future solution, an XCP-aware XenCenter would be a (really) nice bridge.

Kevin

On Sat, 1 Oct 2011, Todd Deshane wrote:
> On Friday, September 30, 2011, wrote: > >> >> I thought the license issues had been flushed out of XCP 1.1 RC. From >> Jonathan's XCP 1.1 RC announcement email: >> >> -------------------------------------------------------------------- >> XCP 1.1 Release Candidate is ready to be tested. Changes since Beta: >> >> * License expiry crept back in. It has now crept out again. >> -------------------------------------------------------------------- >> >> Yet, I'm still seeing the annoying license expiry warning in XenCenter 6.0 >> from the XCP 1.1 RC boxes: >> > > The warning is still there with XenCenter, but that is a bug with XenCenter. > The warning is just that: a warning. XCP has the license code removed, so > nothing will happen when the expiry date passes. > > Thanks, > Todd

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Mon, Oct 3, 2011 at 12:41 AM, <brooks@netgate.net> wrote:
> Todd, > > Doesn't this seem like a worse situation than what we had in XCP 1.0? At > least in 1.0 we could install a "Free Edition" license and make the warnings > disappear. As it is now, customers with both commercial XenServer and XCP > pools managed via XenCenter are constantly reminded that their license has > expired, making XenCenter more difficult (annoying) to use. >

Adding Mike and Jon to the CC just to make sure I didn't misunderstand how things are working...

> With no real alternative, and with CloudStack being the obvious future > solution, an XCP-aware XenCenter would be a (really) nice bridge. > > Kevin > > On Sat, 1 Oct 2011, Todd Deshane wrote: > >> On Friday, September 30, 2011, wrote: >> >>> >>> I thought the license issues had been flushed out of XCP 1.1 RC. From >>> Jonathan's XCP 1.1 RC announcement email: >>> >>> -------------------------------------------------------------------- >>> XCP 1.1 Release Candidate is ready to be tested. Changes since Beta: >>> >>> * License expiry crept back in. It has now crept out again. >>> -------------------------------------------------------------------- >>> >>> Yet, I'm still seeing the annoying license expiry warning in XenCenter >>> 6.0 >>> from the XCP 1.1 RC boxes: >>> >> >> The warning is still there with XenCenter, but that is a bug with >> XenCenter. >> The warning is just that: a warning. XCP has the license code removed, so >> nothing will happen when the expiry date passes. >> >> Thanks, >> Todd >

-- Todd Deshane http://www.linkedin.com/in/deshantm http://www.xen.org/products/cloudxen.html http://runningxen.com/

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On 10/03/2011 06:21 AM, Todd Deshane wrote:
> On Mon, Oct 3, 2011 at 12:41 AM, <brooks@netgate.net> wrote: >> Todd, >> >> Doesn't this seem like a worse situation than what we had in XCP 1.0? At >> least in 1.0 we could install a "Free Edition" license and make the warnings >> disappear. As it is now, customers with both commercial XenServer and XCP >> pools managed via XenCenter are constantly reminded that their license has >> expired, making XenCenter more difficult (annoying) to use. >> > Adding Mike and Jon to the CC just to make sure I didn't misunderstand > how things are working...

Todd's right, the license "nag-dialog" is annoying but harmless. I'm sorry that these two programs don't play well with each other, but there isn't anything that we can do in the XCP 1.1 timeline.

> >> With no real alternative, and with CloudStack being the obvious future >> solution, an XCP-aware XenCenter would be a (really) nice bridge. >>

Agreed. Besides the fact that I have to use it from inside a Windows VM, XenCenter is a nice product. We'll see what we can do about XenCenter/XCP integration in the future.

Mike

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Are there any immediate plans to make the http://updates.vmd.citrix.com/XCP/1.1.0 repo live?

more /etc/yum.repos.d/Citrix.repo
[citrix]
name=XCP 1.1.0 updates
mirrorlist=http://updates.vmd.citrix.com/XCP/1.1.0/domain0/mirrorlist
#baseurl=http://updates.vmd.citrix.com/XCP/1.1.0/domain0/
gpgcheck=1
gpgkey=http://updates.vmd.citrix.com/XCP/RPM-GPG-KEY-1.1.0
enabled=1

[root@xenc1n4 ~]# yum list ocfs2-tools
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://updates.vmd.citrix.com/XCP/1.1.0/domain0/mirrorlist error was [Errno 14] HTTP Error 404: Not Found

At this point the /XCP structure doesn't exist for 1.0 or for 1.1:

http://updates.vmd.citrix.com/XCP/
404 - Not Found

Kevin

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
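[Until that repo goes live, a possible stopgap (my own suggestion, nothing official from Citrix) is simply to stop yum from hitting the dead mirrorlist:

# disable the XCP repo in place
sed -i 's/^enabled=1/enabled=0/' /etc/yum.repos.d/Citrix.repo
# or skip it for a single command (the repo id is "citrix", per the [citrix] section above)
yum --disablerepo=citrix list ocfs2-tools
]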
Hi,

The default kernel of XCP 1.1 does not contain the ocfs2 module; you can find here a kernel that I recompiled and repackaged with ocfs2 support:

wget http://xendev.swisscenter.com/xcp/1.1/rpm/kernel-kdump-2.6.32.12-0.7.1.xs1.sc.1.0.327.170596.i686.rpm
wget http://xendev.swisscenter.com/xcp/1.1/rpm/kernel-xen-2.6.32.12-0.7.1.xs1.sc.1.0.327.170596.i686.rpm
rpm --force -Uvh kernel*rpm

(force because otherwise it will say that a more recent kernel version is already installed)

Then for the ocfs2-tools:

wget http://xendev.swisscenter.com/xcp/1.1/rpm/ocfs2-tools-1.4.4-1.el5.i386.rpm
rpm --nodeps -Uvh http://xendev.swisscenter.com/xcp/1.1/rpm/ocfs2-tools-1.4.4-1.el5.i386.rpm

(no deps because it depends on the redhat-lsb package, which is installed as xenserver-lsb on XCP)

Reboot the box, then run /etc/init.d/o2cb configure to set up the cluster.

You might also want this one:

wget http://xendev.swisscenter.com/xcp/1.1/rpm/parted-1.8.1-28.el5.i386.rpm

That can be useful to GPT-partition an OCFS2 drive larger than 2 TB.

All this is provided without guarantee; although I use it successfully, don't try it on a production box at first :)

Cheers, Sébastien

On 04.10.2011 19:02, brooks@netgate.net wrote:
> > Are there any immediate plans to make the > http://updates.vmd.citrix.com/XCP/1.1.0 repo live? > > more /etc/yum.repos.d/Citrix.repo > [citrix] > name=XCP 1.1.0 updates > mirrorlist=http://updates.vmd.citrix.com/XCP/1.1.0/domain0/mirrorlist > #baseurl=http://updates.vmd.citrix.com/XCP/1.1.0/domain0/ > gpgcheck=1 > gpgkey=http://updates.vmd.citrix.com/XCP/RPM-GPG-KEY-1.1.0 > enabled=1 > > [root@xenc1n4 ~]# yum list ocfs2-tools > Loaded plugins: fastestmirror > Loading mirror speeds from cached hostfile > Could not retrieve mirrorlist > http://updates.vmd.citrix.com/XCP/1.1.0/domain0/mirrorlist error was > [Errno 14] HTTP Error 404: Not Found > > At this point the /XCP structure doesn't exist for 1.0 or for 1.1: > > http://updates.vmd.citrix.com/XCP/ > 404 - Not Found > > Kevin >

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
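[To round that off, the remaining steps on each host would look something like this; a sketch that assumes /etc/ocfs2/cluster.conf already lists all the pool members, with the label, slot count, device and mount point being examples:

/etc/init.d/o2cb configure                 # enable o2cb and point it at the cluster name
mkfs.ocfs2 -L xcp-sr -N 4 /dev/sdb         # format the shared LUN with 4 node slots (run once, on one host only)
mkdir -p /mnt/ocfs2-sr
mount -t ocfs2 /dev/sdb /mnt/ocfs2-sr      # repeat the mount on every host in the pool
]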
The version override feature allows XenCenter to function nicely with XCP 1.x. Unfortunately, xs-tools doesn't share the love. XCP 1.1 includes xs-tools-1.1, but that's not what XenServer 5.6 FP1/SP2 expects. It expects xs-tools-5.x and is unhappy about the version mismatch. So unhappy that I can't even manage to do a live migration:

[root@xenc2n1 ~]# xe vm-migrate vm=2b038b1d-62bc-b4de-68c2-8ef1cfa8f9d3 live=true host-uuid=2394d5b3-5b47-4a9b-9493-678120d7a576
You attempted an operation on a VM which requires a more recent version of the PV drivers. Please upgrade your PV drivers.
vm: 2b038b1d-62bc-b4de-68c2-8ef1cfa8f9d3 (i-3-12-VM)

I don't suppose there's an xs-tools version override hack we can apply :-). Can't we just get XenCenter support for XCP built-in? XenCenter has its flaws (Windows app), but it's the best configuration and monitoring app we have for XenServer and XCP.

_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
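[For what it's worth, you can see what the toolstack thinks of the guest's tools from the CLI; something like the following (the UUID is the one from the failing migrate above, and the grep is just to avoid guessing exact parameter names):

xe vm-param-list uuid=2b038b1d-62bc-b4de-68c2-8ef1cfa8f9d3 | grep -i pv-drivers
# typically shows PV-drivers-version and PV-drivers-up-to-date, which is presumably
# what the "more recent version of the PV drivers" check is keying off
]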