Hello all,

I hope this makes sense. I have two OpenSolaris machines with a bunch of hard disks. One acts as an iSCSI SAN, and the other is identical apart from its hard disk configuration. The only things being served are VMware ESXi raw disks, which hold either virtual machines or the data a particular virtual machine uses. For example, we have Exchange 2007 virtualized, and through its iSCSI initiator we mount two LUNs, one for the database and another for the logs, all on different arrays of course.

We then replicate this data across the SAN network to the other box using snapshot send/recv, so that if the SAN box fails, the second box can immediately serve all of the iSCSI LUNs.

The problem, and I don't really know if it is a problem, is this: when I snapshot a running VM, will it come up alive in ESXi, or do I have to accomplish this in a different way? These snapshots will then be written to tape with Bacula.

I hope I am posting this in the correct place.

Thanks,
Greg
--
This message posted from opensolaris.org
On Mon, Jan 11, 2010 at 6:17 PM, Greg <gregory.durham at gmail.com> wrote:
> The problem, I don't really know if it's a problem... Is when I snapshot a
> running vm will it come up alive in esxi or do I have to accomplish this
> in a different way.

What you've got are crash-consistent snapshots. The disks are in the same state they would be in if you pulled the power plug. They may come up just fine, or they may be in a corrupt state. If you take snapshots frequently enough, you should have at least one good snapshot.

Your other option is scripting. You can build custom scripts to leverage the VSS providers in Windows, but it won't be easy.

Any reason in particular you're using iSCSI? I've found NFS to be much simpler to manage, and performance to be equivalent if not better (in large clusters).

--
--Tim
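One way to read "take snapshots frequently enough" is a schedule of timestamped snapshots with pruning of the oldest ones. The following is only a rough sketch of that idea and is not something posted in the thread: the dataset name `tank/exchange-db`, the `auto-` prefix, and the retention count are made-up assumptions, and the code merely builds `zfs snapshot` and `zfs destroy` command lines without running them.

```python
from datetime import datetime

PREFIX = "auto-"  # hypothetical naming convention, not from the thread


def snapshot_name(dataset, when):
    """Build a name such as tank/exchange-db@auto-20100111-180000."""
    return f"{dataset}@{PREFIX}{when.strftime('%Y%m%d-%H%M%S')}"


def snapshot_command(dataset, when):
    """Argument list that would create one timestamped snapshot."""
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def prune_commands(dataset, existing, keep):
    """Destroy commands for all but the newest `keep` automatic snapshots.

    Only names carrying our prefix on this dataset are considered, so
    manually created snapshots are never selected. The timestamp format
    sorts lexicographically, so a plain sort puts the oldest first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ours = sorted(s for s in existing if s.startswith(f"{dataset}@{PREFIX}"))
    return [["zfs", "destroy", name] for name in ours[:-keep]]
```

Keeping several generations matters here because any single crash-consistent snapshot might turn out to be unusable; how many to keep and how often to snapshot depends on the workload.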
Your machines won't come up running; they'll start up from scratch (as if you had hit the reset button).

If you want your machines to come up running, you have to take VMware snapshots, which capture the state of the running VM (memory, etc.). Typically this is automated with solutions like VCB (VMware Consolidated Backup), but I've just found http://communities.vmware.com/docs/DOC-8760 (not tested, though, since we are running ESX and have bought VCB licenses).

Bear in mind that VMware won't be able to take a consistent snapshot if some disks in the VM come from VMDK files while other disks are raw LUNs (or are otherwise mounted directly in the VM, that is, outside ESX's control). In that case you'll have to restart the machine from scratch, with a strong potential for discrepancies between the VMDK and the raw LUNs.

On the other hand, I understand that you want the Exchange 2007 logs and DB to live their own life, so that when you "revert to snapshot" you don't lose all the mail that was sent or delivered in between. So this can be a perfectly valid design, depending on how you have set it up.

I don't think snapshots (be they VMware or ZFS) are a good tool for failover or redundancy here. Basically, if your storage is not accessible from your ESXi hosts, your VMs are toast and you have to restart them from scratch. Please note that I don't know the specifics of ESXi's iSCSI retry policies. For ESX we use an SVC cluster (a 2-node FC cluster), so our ESX hosts can always access the storage.

You could try to set up an iSCSI cluster like this one: http://docs.sun.com/app/docs/doc/820-7821/z40000f557a?a=view (look for the figure at the bottom). You would get a mirrored pool where you could place the VMware zvols, and you could then share those zvols over iSCSI. I'm not sure if or how OpenHA would fail over if one of your nodes fails, though (I've always wanted to play with OpenHA but have neither the time nor the hardware at hand to try it).
This setup of course doesn't prevent you from taking VMware snapshots and ZFS snapshots; you'll just achieve some level of fault tolerance.

Please note that I don't know anything about using NFS with ESX/ESXi. Maybe there are setups that are easier to achieve using NFS and provide the same (or a better) level of fault tolerance.

Hope this helps,
Arnaud

From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On behalf of Tim Cook
Sent: Tuesday, January 12, 2010 04:36
To: Greg
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] opensolaris-vmware

What you've got are crash consistent snapshots. The disks are in the same state they would be in if you pulled the power plug.
Tim,

iSCSI was a design decision at the time. Performance was key, and I wanted to be able to hand a LUN on the SAN to ESXi and use it as a raw disk in physical compatibility mode. The consequence, however, is that I can no longer take snapshots on the ESXi server and must rely on ZFS snapshots. I also have multiple *nix virtual machines that I need to back up, and I need to make sure their file systems are consistent if all else fails.

Thanks,
Greg

On Mon, Jan 11, 2010 at 7:36 PM, Tim Cook <tim at cook.ms> wrote:
> Any reason in particular you're using iSCSI? I've found NFS to be much
> more simple to manage, and performance to be equivalent if not better (in
> large clusters).
Arnaud,

Whether the virtual machines come up as if they had never been off is the least of my worries. My biggest worry is keeping the VMs' filesystems intact, i.e. not corrupt. I have all of my virtual machines set up with raw LUNs in physical compatibility mode. This has increased performance, but sadly at the cost of VMware snapshots. Is there anything I can do within the virtual machine itself to keep the filesystem intact?

In the case of Exchange, I have Exchange itself on a raw LUN in physical compatibility mode, and two LUNs mounted with the Server 2008 iSCSI initiator for the logs and the Exchange DB. This setup is similar to that of several *nix VMs I have residing on this SAN, which I am also worried about.

Any other ideas?

Thanks,
Greg

On Tue, Jan 12, 2010 at 1:11 AM, Arnaud Brand <ABrand at esca.fr> wrote:
> Your machines won't come up running, they'll start up from scratch (like
> if you had hit the reset button).
On Thu, Jan 14, 2010 at 6:40 AM, Gregory Durham <gregory.durham at gmail.com> wrote:
> Arnaud,
> The virtual machines coming up as if they were on is the least of my
> worries, my biggest worry is keeping the filesystems of the vms alive i.e.
> not corrupt.

As Tim said, the snapshotted disks are in the same state they would be in if you pulled the power plug. This is, by the way, the same thing you get with LVM snapshots (on Linux) or SAN/NAS-based snapshots (like NetApp).

> In the case of exchange, I have exchange itself on a raw lun in physical
> compatibility mode, and I have 2 LUNs mounted with the Server 2008 iSCSI
> initiator for logs and the exchange DB.

Most modern filesystems and databases have journaling that can recover from power-failure scenarios, so they should be able to use the snapshot and provide consistent, non-corrupt information.

So the question now is: have you tried restoring from a snapshot?

--
Fajar
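A restore test of the kind asked about above could start from a writable clone of a snapshot, so the original dataset is never touched. This is a minimal sketch under assumptions that do not come from the thread: the names `tank/testvm` and `-restoretest` are hypothetical, and the code only builds the `zfs clone` and `zfs destroy` command lines. Presenting the clone to ESXi over iSCSI and booting the test VM from it is deliberately left out, since that depends on the target configuration.

```python
def restore_drill_commands(snapshot, clone_suffix="-restoretest"):
    """Return the commands for a throwaway clone of `snapshot`.

    `snapshot` must look like pool/dataset@name. The clone is created
    next to the source dataset, which keeps it in the same pool (a ZFS
    clone cannot live in a different pool than its origin snapshot).
    """
    dataset, sep, snapname = snapshot.partition("@")
    if not sep or not dataset or not snapname:
        raise ValueError(f"not a snapshot name: {snapshot!r}")
    clone = dataset + clone_suffix
    return {
        # Writable copy of the snapshot to point a test VM at.
        "create": ["zfs", "clone", snapshot, clone],
        # Remove the clone once the test VM has been checked.
        "cleanup": ["zfs", "destroy", clone],
    }
```

If the guest filesystem and the application recover cleanly on the clone, that particular snapshot was usable; a failure here says nothing about the live data, only about that point-in-time copy.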
Haha, yeah, that's tomorrow. I have a test VM I will be testing on. I shall report back! Thank you all!

On Wed, Jan 13, 2010 at 8:26 PM, Fajar A. Nugraha <fajar at fajar.net> wrote:
> So the question now is, have you tried restoring from snapshot?
Several other users on this mailing list have recommended that I use snapshots inside the VM, VMware snapshots, and then ZFS snapshots. I believe I understand the difference between filesystem snapshots and block-level snapshots. However, I cannot use VMware snapshots, because all LUNs on the SAN are mapped to ESXi as raw disks in physical compatibility mode, which disables VMware snapshots. Does this leave me with a weaker backup strategy? What else can I do? Should I convert the virtual machines from physical compatibility to virtual compatibility in order to get snapshotting on the ESXi server?

Thanks for all the helpful information!
Greg

On Wed, Jan 13, 2010 at 9:12 PM, Gregory Durham <gregory.durham at gmail.com> wrote:
> Haha, Yeah that's tomorrow, I have a test vm I will be testing on. I shall
> report back! Thank you all!
On Fri, Jan 15, 2010 at 12:33 AM, Gregory Durham <gregory.durham at gmail.com> wrote:
> I have been recommended by several other users on this mailing list to use
> inside the vm snapshots, vmware snapshots, and then use zfs snapshots. I
> believe I understand the difference between filesystem snapshots vs block
> level snapshots, however since I cannot use vmware snapshots (all LUNs on
> the SAN are mapped to ESXi using RAW disk in physical compatibility mode,
> which then disables vmware snapshots) does this cause me to have a weaker
> backup strategy? What else can I do? Should I convert the virtual machines
> from physical compatibility to virtual compatibility in order to get
> snapshotting on the ESXi server?

IMHO using all three is too much. You can pick one and combine it with another (non-snapshot) backup strategy. A VMware snapshot is good because it also stores memory state, but it also uses more space.

What I recommend for your current setup:
- Check whether your application can survive an unclean shutdown or power outage (it should). If not, you have to do application-specific backups.
- Do ZFS snapshots plus send/receive.
- Add regular tape backups if necessary, although they might not need to be as frequent (you already plan this).
- Regularly exercise restoring from backups, to make sure your backup system works.

--
Fajar
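The "ZFS snapshots plus send/receive" item in the list above might be sketched roughly as follows. This is an illustration under stated assumptions, not a description of Greg's actual setup: the dataset `tank/exchange-db`, the snapshot names, and the host `backup-host` are invented, and transport over `ssh` is only one possible choice. The `-i` flag sends just the changes since the previous snapshot, and `zfs receive -F` rolls the target back to its latest snapshot before applying the stream, which discards any changes made on the receiving side.

```python
import subprocess


def replication_commands(dataset, new_snap, remote_host, previous=None):
    """Build the sending and receiving command lines for one replication run.

    With `previous` set, an incremental stream (zfs send -i) is produced;
    without it, a full stream of `new_snap` is sent.
    """
    send = ["zfs", "send"]
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{new_snap}")
    recv = ["ssh", remote_host, "zfs", "receive", "-F", dataset]
    return send, recv


def run_pipeline(send_cmd, recv_cmd):
    """Pipe the sender into the receiver; True only if both exit with 0."""
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    # Close our copy so the sender sees a broken pipe if the receiver dies.
    sender.stdout.close()
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc == 0 and recv_rc == 0
```

A typical run would take a new snapshot, call `replication_commands` with the previously replicated snapshot as `previous`, and pass the two command lines to `run_pipeline`. The previous snapshot has to exist on both machines for the incremental stream to apply.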
Thank you so much, Fajar. You have been incredibly helpful! I will do as you said. I am just glad I have not been going down the wrong path!

Thanks,
Greg

On Thu, Jan 14, 2010 at 4:45 PM, Fajar A. Nugraha <fajar at fajar.net> wrote:
> IMHO using all three is too much. you can pick one, and combine that
> with other (non-snapshot) backup strategy.