Thanks for all the replies in the previous usage poll.

One of the chief concerns expressed was the (very) low default disk
heartbeat timeout setting. Well, we want to bump it up, but to what?
Here are some questions the answers to which will help us determine
that value.

1. What is your disk heartbeat timeout? If you are unsure,
"cat /etc/sysconfig/o2cb".

2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
Provide as much detail as you can.

3. Are you using some sort of multipathing? If so, provide details.

4. What is the cluster used for? Oracle database, mailserver, etc.

5. How many nodes in your cluster?

6. Any other relevant information?

Again, feel free to mail me directly.

Thanks
Sunil
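[Editor's note: a minimal sketch of how the threshold value in
/etc/sysconfig/o2cb maps to a wall-clock timeout, assuming the commonly
documented o2cb rule of this era that the heartbeat region is written
every 2 seconds, so timeout_seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2.
The values 7, 31, and 61 below are the early default and the two
thresholds reported by respondents in this thread.]

```python
# Sketch: convert an O2CB_HEARTBEAT_THRESHOLD value (as found in
# /etc/sysconfig/o2cb) into an approximate self-fence timeout.
# Assumes the rule: timeout_seconds = (threshold - 1) * 2.
def heartbeat_timeout_seconds(threshold: int) -> int:
    return (threshold - 1) * 2

# The early default of 7 gives roughly 12 seconds; thresholds of 31
# and 61 give roughly 60 and 120 seconds respectively.
for t in (7, 31, 61):
    print(t, "->", heartbeat_timeout_seconds(t), "seconds")
```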
> 1. What is your disk heartbeat timeout? If you are unsure,
> "cat /etc/sysconfig/o2cb".

31

> 2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
> Provide as much detail as you can.

iSCSI on a NetApp cluster, software initiator. Tested on Fibre Channel
as well. The system is SLES9 SP3.

> 3. Are you using some sort of multipathing? If so, provide details.

Embedded iSCSI multi-port support. Can test on FC and system multipath.

> 4. What is the cluster used for? Oracle database, mailserver, etc.

Oracle - archive logs and backups ONLY. The other cluster (testing) -
application binaries and configurations.

> 5. How many nodes in your cluster?

3 (2 RAC + 1 backup server); the other (testing) cluster has 2.

> 6. Any other relevant information?

SAN convergence times:
- on NetApp - 1 minute
- on Ethernet - 50 seconds
- on a Fibre Channel network - 1 minute (timeouts on HDS Solaris
  multipath, for example)

Network switch reboot time: about 40 seconds.

Events:
- Rebooting one server - no problems.
- A power outage (10 seconds) on the network switches caused both
  interfaces to go down; all servers in all clusters rebooted (by
  OCFSv2, 1 by Oracle CSS).
- Problems noticed:
  * When I used the cluster for document storage (I tested it), high
    CPU during heavy IO operations; I tested and then decided to use a
    heartbeat cluster + ReiserFS.
  * When my Oracle server locked up memory (on a spinlock) so that the
    system froze for 30 seconds, it resulted in a damaged OCFS (1 time
    fatal, and 1 time repairable).
  * Since we began to use OCFSv2 for low-IO file systems only, no big
    problems except fencing, even if the system has no pending IO on it.

Wishes:
- clustered LVM2 (not EVMS - EVMS is too complicated and is really
  heavy overhead for 90% of tasks);
- online resize (at least if we have 1 node left in the system);
- multi-interface heartbeat;
- self-fencing ONLY if the system has pending IO (configurable);
- if the OCFSv2 cluster sees that ALL the servers around cannot run
  heartbeat (disk IO delay), there is no need to self-fence any of them
  until at least one can run heartbeat on disk again. For now, if all
  servers lose access to the disk, they all (except 1) reboot; in
  reality, if they can see each other, they don't need to reboot,
  because they can classify the failure as GLOBAL;
- an emergency local-mount mode.

> Again, feel free to mail me directly.
>
> Thanks
> Sunil
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
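[Editor's note: the "GLOBAL failure" heuristic wished for above can be
expressed as a small decision rule. This is purely a sketch of the
poster's proposal, not anything o2cb actually implements; the function
and parameter names are hypothetical.]

```python
# Sketch of the wished-for fencing rule: only self-fence when there is
# pending IO, and treat "nobody can heartbeat" as a GLOBAL failure that
# should not trigger fencing while the nodes can still see each other.
def should_self_fence(has_pending_io: bool,
                      can_heartbeat: bool,
                      peers_visible: bool,
                      any_peer_heartbeating: bool) -> bool:
    if can_heartbeat:
        return False   # disk heartbeat works: nothing to do
    if not has_pending_io:
        return False   # wish: fence only if IO is actually outstanding
    if peers_visible and not any_peer_heartbeating:
        return False   # every node lost the disk: GLOBAL failure,
                       # wait until at least one can heartbeat again
    return True        # isolated node with pending IO: self-fence
```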
On Thursday 12 October 2006 03:31, Sunil Mushran wrote:
> Thanks for all the replies in the previous usage poll.
>
> One of the chief concerns expressed was the (very) low default disk
> heartbeat timeout setting. Well, we want to bump it up but to what?
>
> Here are some questions the answers to which will help us determine
> that value.
>
> 1. What is your disk heartbeat timeout? If you are unsure,
> "cat /etc/sysconfig/o2cb".

31

> 2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
> Provide as much detail as you can.

FC - Dell/EMC CX300

> 3. Are you using some sort of multipathing? If so, provide details.

Not yet using any multipathing.

> 4. What is the cluster used for? Oracle database, mailserver, etc.

Shared Oracle RAC DB home and datafiles, indexes, logs, etc.

> 5. How many nodes in your cluster?

2
-----Original Message-----
From: ocfs2-users-bounces@oss.oracle.com
[mailto:ocfs2-users-bounces@oss.oracle.com] On Behalf Of Sunil Mushran
Sent: Wednesday, October 11, 2006 8:31 PM
To: ocfs2-users
Subject: [Ocfs2-users] disk heartbeat timeout poll

Thanks for all the replies in the previous usage poll.

One of the chief concerns expressed was the (very) low default disk
heartbeat timeout setting. Well, we want to bump it up but to what?
Here are some questions the answers to which will help us determine
that value.

1. What is your disk heartbeat timeout? If you are unsure,
"cat /etc/sysconfig/o2cb".

61

2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
Provide as much detail as you can.

HP MSA500 4-port SCSI SAN connected to an SA532 SCSI controller.
RHEL AS 4 Update 3, kernel 2.6.9-42.0.2.ELsmp.

Installed RPMs:
oracleasmlib-2.0.2-1                       Mon 28 Aug 2006 06:26:24 PM EDT
oracleasm-2.6.9-42.0.2.ELhugemem-2.0.3-1   Mon 28 Aug 2006 06:26:22 PM EDT
oracleasm-2.6.9-42.0.2.EL-2.0.3-1          Mon 28 Aug 2006 06:26:18 PM EDT
ocfs2console-1.2.1-1                       Mon 28 Aug 2006 06:26:17 PM EDT
ocfs2-2.6.9-42.0.2.ELsmp-1.2.3-1           Mon 28 Aug 2006 06:26:16 PM EDT
ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1       Mon 28 Aug 2006 06:26:15 PM EDT
ocfs2-2.6.9-42.0.2.EL-1.2.3-1              Mon 28 Aug 2006 06:26:14 PM EDT
oracleasm-2.6.9-42.0.2.ELsmp-2.0.3-1       Mon 28 Aug 2006 06:26:12 PM EDT
oracleasm-support-2.0.3-1                  Mon 28 Aug 2006 06:26:11 PM EDT
ocfs2-tools-1.2.1-1                        Mon 28 Aug 2006 06:26:10 PM EDT
kernel-smp-2.6.9-42.0.2.EL                 Mon 28 Aug 2006 06:19:19 PM EDT
kernel-smp-devel-2.6.9-42.0.2.EL           Mon 28 Aug 2006 06:18:22 PM EDT
kernel-utils-2.4-13.1.83                   Mon 28 Aug 2006 06:05:28 PM EDT
kernel-hugemem-devel-2.6.9-42.0.2.EL       Mon 28 Aug 2006 06:05:03 PM EDT
kernel-hugemem-2.6.9-42.0.2.EL             Mon 28 Aug 2006 06:04:43 PM EDT
kernel-devel-2.6.9-42.0.2.EL               Mon 28 Aug 2006 06:04:21 PM EDT
kernel-2.6.9-42.0.2.EL                     Mon 28 Aug 2006 06:04:05 PM EDT

3. Are you using some sort of multipathing? If so, provide details.

No.

4. What is the cluster used for? Oracle database, mailserver, etc.

Oracle 10g RAC (10.2.0.2.0)

5. How many nodes in your cluster?

2

6. Any other relevant information?

Crashes during the full database backup each morning: the MSA 500
controller locks up. We must power off all systems connected to the
MSA 500, restart the MSA 500, and then restart the systems.

Again, feel free to mail me directly.

Thanks
Sunil