Hi,

We are having some serious issues trying to configure an OCFS2 cluster on three SLES 10 SP2 boxes running in VMware ESX 3.0.1. Before I go into any of the detailed errors we are experiencing, I first wanted to ask whether anyone has successfully configured this solution. We would be interested to find out what needs to be set at the VMware level (RDM, VMFS, NICs etc.) and what needs to be configured at the O/S level. We have a LUN on our SAN, presented to our VMware hosts, that we are using for this.

Any help would be greatly appreciated!
What I did a few days ago was to create a VMware disk for each OCFS2 filesystem and store it with one of the VM nodes. Then add that disk to each additional VM. When you add it, use a separate SCSI host number. In other words, if the OS is on SCSI 0:0, make the disk SCSI 1:0, or some other arbitrary HBA number. Then you can go to each host's second VM SCSI device and modify it to be shared, and of type Physical (if I remember correctly). At that point, it works fine.

-- Kent Rankin

-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com on behalf of Haydn Cahir
Sent: Mon 7/28/2008 9:48 PM
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] OCFS2 and VMware ESX

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
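Kent's layout can be sketched as the handful of .vmx entries each VM would end up with for the shared disk. This is only an illustration under assumptions, not something taken from the thread: the controller number, the `lsilogic` device type, and the datastore path are placeholders, and `sharedBus = "physical"` is one reading of "shared, and of type Physical". The helper name `shared_disk_vmx` is hypothetical.

```python
from typing import List


def shared_disk_vmx(controller: int, unit: int, vmdk_path: str) -> List[str]:
    """Return .vmx entries for a shared disk on its own SCSI controller.

    Assumption: the OS disk lives on controller 0, so the shared disk
    must go on any other controller number.
    """
    if controller == 0:
        raise ValueError("keep the shared disk off the OS controller (SCSI 0)")
    bus = "scsi{}".format(controller)
    dev = "{}:{}".format(bus, unit)
    return [
        '{}.present = "TRUE"'.format(bus),
        '{}.virtualDev = "lsilogic"'.format(bus),   # assumed controller type
        '{}.sharedBus = "physical"'.format(bus),    # bus sharing across hosts
        '{}.present = "TRUE"'.format(dev),
        '{}.fileName = "{}"'.format(dev, vmdk_path),
    ]


if __name__ == "__main__":
    # Hypothetical datastore path; every VM in the cluster points at the same file.
    for line in shared_disk_vmx(1, 0, "/vmfs/volumes/datastore1/shared/ocfs2-data.vmdk"):
        print(line)
```

The same entries would be added to every VM in the cluster, all pointing at the one disk file.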
We run a similar setup: SLES 10 SP1, originally on ESX 3.0.x and now on 3.5. We're running the version of ocfs2 that shipped with SLES 10 SP1. Four nodes access raw mapped LUNs via ESX from an HP SAN on HP blade servers. QLogic HBAs, standard NICs; nothing special.

The biggest hurdle we ran into was time sync on the individual hosts (VMware ESX and some variants of Linux have an interesting clock tick relationship which I still don't understand), which was causing some ugly fencing.

It's been running well for about 8 months. Overall we're pretty happy with it thus far. That said, we don't let ESX VMotion the cluster nodes via DRS, but that's more because we haven't tested it. The cluster is used for Apache web hosting.

web7:~:%1003#rpm -qa | grep -i ocfs
ocfs2-tools-1.2.3-0.7
ocfs2console-1.2.3-0.7
ocfs2-tools-devel-1.2.3-0.7
web7:~:%1004#uname -a
Linux web7 2.6.16.53-0.16-smp #1 SMP Tue Oct 2 16:57:49 UTC 2007 i686 athlon i386 GNU/Linux
web7:~:%1005#cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (i586)
VERSION = 10
PATCHLEVEL = 1
web7:~:%1006#

--mark
Mark Sedlock
Network and System Services
Rowan University
Hi Mark,

Thanks for your reply. How did you configure your RDM mappings? We have tried a few combinations already. We have three nodes and are trying to use a single OCFS2 volume. We are encountering a range of errors:

- VMs not starting when another node is already started (which we think goes back to the RDM configuration).
- Two of the nodes are able to edit files in the OCFS2 volume, but the third doesn't see any changes made by the other nodes.
- The OCFS2 volume switching to read-only due to errors on the volume.

We have tried running just two nodes and still get the problem where the volume switches over to read-only.

I will look into the time differences on the servers; we normally have to make changes in the grub config and NTP settings to keep the time in sync.

FYI, the version of OCFS2 on SLES 10 SP2 is completely different:

HIT-TCN1:~ # rpm -qa | grep -i ocfs
ocfs2-tools-1.4.0-0.3
ocfs2console-1.4.0-0.3

I can't even find a reference to this version on the Oracle web site.

Cheers

>>> "Sedlock, Mark A." <Sedlock at rowan.edu> 07/29/08 12:42 PM >>>
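On the fencing Mark mentions: as far as I know, O2CB treats a node as dead once it has missed its disk heartbeat for (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, with the threshold set in /etc/sysconfig/o2cb. A guest whose clock or scheduling stalls could plausibly miss that window, though the thread does not confirm that this was the mechanism or that anyone changed the setting. The sketch below is just that arithmetic; the function names are hypothetical.

```python
def heartbeat_timeout_seconds(threshold: int) -> int:
    """Seconds of missed disk heartbeat before a node is considered dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * 2


def threshold_for_timeout(seconds: int) -> int:
    """Smallest threshold whose timeout is at least `seconds`."""
    if seconds < 2:
        raise ValueError("timeout must be at least 2 seconds")
    # Round odd values up so the resulting timeout is never shorter than asked.
    return (seconds + 1) // 2 + 1


if __name__ == "__main__":
    for t in (7, 16, 31):
        print("O2CB_HEARTBEAT_THRESHOLD={} -> {} s".format(t, heartbeat_timeout_seconds(t)))
```

Whatever value is chosen has to be identical on every node in the cluster.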
Thanks everyone for your replies. We have started fresh using raw mapped LUNs on two nodes with a fresh format, and everything is working well. Thanks again.

>>> "Sedlock, Mark A." <Sedlock at rowan.edu> 07/30/08 12:27 AM >>>
The device mappings were nothing out of the ordinary:

- LSI Logic SCSI controller (only one for the whole VM)
- two raw mapped LUNs to each VM, both OCFS2
- no redundant SAN uplinks (so there are no managed paths)
- Physical mappings (not virtual)

We had some problems when we first started, before we figured out that we needed to keep the VMs on different physical ESX nodes, since multiple VMs on the same host didn't play well with raw mapped physical LUNs (which seems obvious in retrospect). In this setup we didn't have to adjust the SCSI host number (as Kent mentioned).

We've run heartbeats on both a private network (second virtual NIC, dedicated virtual switch in ESX) and the primary network interface; both have worked fine.

--mark
Mark Sedlock
Network and System Services
Rowan University

-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Haydn Cahir
Sent: Monday, July 28, 2008 11:08 PM
To: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 and VMware ESX
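For the O/S side of the original question: the network each node uses for cluster traffic (private or primary, as Mark describes) is whatever `ip_address` is listed for it in /etc/ocfs2/cluster.conf, and that file is expected to be identical on every node. The thread never establishes that a mismatched file caused the three-node symptoms, so treat the following only as a sanity check one might run. The node names and addresses in the sample are made up, and `check_cluster_conf` is a hypothetical helper, not part of ocfs2-tools.

```python
SAMPLE = """\
cluster:
	node_count = 3
	name = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.10.1
	number = 0
	name = node1
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.10.2
	number = 1
	name = node2
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.10.3
	number = 2
	name = node3
	cluster = ocfs2
"""


def parse_cluster_conf(text):
    """Split cluster.conf text into a list of stanza dicts."""
    stanzas, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            current = {"_type": line[:-1]}       # start of a new stanza
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return a list of human-readable problems (empty if none found)."""
    stanzas = parse_cluster_conf(text)
    clusters = [s for s in stanzas if s["_type"] == "cluster"]
    nodes = [s for s in stanzas if s["_type"] == "node"]
    problems = []
    if len(clusters) != 1:
        problems.append("expected exactly one cluster stanza")
    else:
        declared = int(clusters[0].get("node_count", -1))
        if declared != len(nodes):
            problems.append("node_count is {} but {} node stanzas found".format(declared, len(nodes)))
    for field in ("number", "name", "ip_address"):
        values = [n.get(field) for n in nodes]
        if len(set(values)) != len(values):
            problems.append("duplicate node {}".format(field))
    return problems


if __name__ == "__main__":
    print(check_cluster_conf(SAMPLE) or "no problems found")
```

Running the same check against the file from each node, and comparing the files to each other, is a cheap way to rule out configuration drift before looking at the storage layer.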