Thanks for all the replies in the previous usage poll.

One of the chief concerns expressed was the (very) low default disk
heartbeat timeout setting. Well, we want to bump it up, but to what?
Here are some questions the answers to which will help us determine
that value.

1. What is your disk heartbeat timeout? If you are unsure,
"cat /etc/sysconfig/o2cb".

2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
Provide as much detail as you can.

3. Are you using some sort of multipathing? If so, provide details.

4. What is the cluster used for? Oracle database, mailserver, etc.

5. How many nodes in your cluster?

6. Any other relevant information?

Again, feel free to mail me directly.

Thanks
Sunil
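[Editor's note: a minimal sketch of how the threshold value in
/etc/sysconfig/o2cb maps to a wall-clock timeout, assuming the commonly
documented o2cb rule of this era that the heartbeat region is written
every 2 seconds, so timeout_seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2.
The values 7, 31, and 61 below are the early default and the two
thresholds reported by respondents in this thread.]

```python
# Sketch: convert an O2CB_HEARTBEAT_THRESHOLD value (as found in
# /etc/sysconfig/o2cb) into an approximate self-fence timeout.
# Assumes the rule: timeout_seconds = (threshold - 1) * 2.
def heartbeat_timeout_seconds(threshold: int) -> int:
    return (threshold - 1) * 2

# The early default of 7 gives roughly 12 seconds; thresholds of 31
# and 61 give roughly 60 and 120 seconds respectively.
for t in (7, 31, 61):
    print(t, "->", heartbeat_timeout_seconds(t), "seconds")
```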
> 1. What is your disk heartbeat timeout? If you are unsure,
> "cat /etc/sysconfig/o2cb".

31

> 2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
> Provide as much detail as you can.

iSCSI on a NetApp cluster, software initiator. Tested on Fibre Channel
as well. The system is SLES9 SP3.

> 3. Are you using some sort of multipathing? If so, provide details.

Embedded iSCSI multi-port support. Can test on FC and system multipath.

> 4. What is the cluster used for? Oracle database, mailserver, etc.

Oracle - archive logs and backups ONLY. The other cluster (testing) -
application binaries and configurations.

> 5. How many nodes in your cluster?

3 (2 RAC + 1 backup server); the other (testing) cluster has 2.

> 6. Any other relevant information?

SAN convergence times:
- on NetApp - 1 minute
- on Ethernet - 50 seconds
- on a Fibre Channel network - 1 minute (timeouts on HDS Solaris
  multipath, for example)

Network switch reboot time: about 40 seconds.

Events:
- Rebooting one server - no problems.
- A power outage (10 seconds) on the network switches caused both
  interfaces to go down; all servers in all clusters rebooted (by
  OCFSv2, 1 by Oracle CSS).
- Problems noticed:
  * When I used the cluster for document storage (I tested it), high
    CPU during heavy IO operations; I tested and then decided to use a
    heartbeat cluster + ReiserFS.
  * When my Oracle server locked up memory (on a spinlock) so that the
    system froze for 30 seconds, it resulted in a damaged OCFS (1 time
    fatal, and 1 time repairable).
  * Since we began to use OCFSv2 for low-IO file systems only, no big
    problems except fencing, even if the system has no pending IO on it.

Wishes:
- clustered LVM2 (not EVMS - EVMS is too complicated and is really
  heavy overhead for 90% of tasks);
- online resize (at least if we have 1 node left in the system);
- multi-interface heartbeat;
- self-fencing ONLY if the system has pending IO (configurable);
- if the OCFSv2 cluster sees that ALL the servers around cannot run
  heartbeat (disk IO delay), there is no need to self-fence any of them
  until at least one can run heartbeat on disk again. For now, if all
  servers lose access to the disk, they all (except 1) reboot; in
  reality, if they can see each other, they don't need to reboot,
  because they can classify the failure as GLOBAL;
- an emergency local-mount mode.

> Again, feel free to mail me directly.
>
> Thanks
> Sunil
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
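[Editor's note: the "GLOBAL failure" heuristic wished for above can be
expressed as a small decision rule. This is purely a sketch of the
poster's proposal, not anything o2cb actually implements; the function
and parameter names are hypothetical.]

```python
# Sketch of the wished-for fencing rule: only self-fence when there is
# pending IO, and treat "nobody can heartbeat" as a GLOBAL failure that
# should not trigger fencing while the nodes can still see each other.
def should_self_fence(has_pending_io: bool,
                      can_heartbeat: bool,
                      peers_visible: bool,
                      any_peer_heartbeating: bool) -> bool:
    if can_heartbeat:
        return False   # disk heartbeat works: nothing to do
    if not has_pending_io:
        return False   # wish: fence only if IO is actually outstanding
    if peers_visible and not any_peer_heartbeating:
        return False   # every node lost the disk: GLOBAL failure,
                       # wait until at least one can heartbeat again
    return True        # isolated node with pending IO: self-fence
```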
On Thursday 12 October 2006 03:31, Sunil Mushran wrote:
> Thanks for all the replies in the previous usage poll.
>
> One of the chief concerns expressed was the (very) low default disk
> heartbeat timeout setting. Well, we want to bump it up but to what?
>
> Here are some questions the answers to which will help us determine
> that value.
>
> 1. What is your disk heartbeat timeout? If you are unsure,
> "cat /etc/sysconfig/o2cb".

31

> 2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
> Provide as much detail as you can.

FC - Dell/EMC CX300

> 3. Are you using some sort of multipathing? If so, provide details.

Not yet using any multipathing.

> 4. What is the cluster used for? Oracle database, mailserver, etc.

Shared Oracle RAC DB home and datafiles, indexes, logs, etc.

> 5. How many nodes in your cluster?

2
-----Original Message-----
From: ocfs2-users-bounces@oss.oracle.com
[mailto:ocfs2-users-bounces@oss.oracle.com] On Behalf Of Sunil Mushran
Sent: Wednesday, October 11, 2006 8:31 PM
To: ocfs2-users
Subject: [Ocfs2-users] disk heartbeat timeout poll

Thanks for all the replies in the previous usage poll.

One of the chief concerns expressed was the (very) low default disk
heartbeat timeout setting. Well, we want to bump it up but to what?
Here are some questions the answers to which will help us determine
that value.

1. What is your disk heartbeat timeout? If you are unsure,
"cat /etc/sysconfig/o2cb".

61

2. What is your shared disk setup like? Fibre Channel, iSCSI, AoE, etc.
Provide as much detail as you can.

HP MSA500 4-port SCSI SAN connected to an SA532 SCSI controller.
RHEL AS 4 Update 3, kernel 2.6.9-42.0.2.ELsmp.

Installed RPMs:
oracleasmlib-2.0.2-1                       Mon 28 Aug 2006 06:26:24 PM EDT
oracleasm-2.6.9-42.0.2.ELhugemem-2.0.3-1   Mon 28 Aug 2006 06:26:22 PM EDT
oracleasm-2.6.9-42.0.2.EL-2.0.3-1          Mon 28 Aug 2006 06:26:18 PM EDT
ocfs2console-1.2.1-1                       Mon 28 Aug 2006 06:26:17 PM EDT
ocfs2-2.6.9-42.0.2.ELsmp-1.2.3-1           Mon 28 Aug 2006 06:26:16 PM EDT
ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1       Mon 28 Aug 2006 06:26:15 PM EDT
ocfs2-2.6.9-42.0.2.EL-1.2.3-1              Mon 28 Aug 2006 06:26:14 PM EDT
oracleasm-2.6.9-42.0.2.ELsmp-2.0.3-1       Mon 28 Aug 2006 06:26:12 PM EDT
oracleasm-support-2.0.3-1                  Mon 28 Aug 2006 06:26:11 PM EDT
ocfs2-tools-1.2.1-1                        Mon 28 Aug 2006 06:26:10 PM EDT
kernel-smp-2.6.9-42.0.2.EL                 Mon 28 Aug 2006 06:19:19 PM EDT
kernel-smp-devel-2.6.9-42.0.2.EL           Mon 28 Aug 2006 06:18:22 PM EDT
kernel-utils-2.4-13.1.83                   Mon 28 Aug 2006 06:05:28 PM EDT
kernel-hugemem-devel-2.6.9-42.0.2.EL       Mon 28 Aug 2006 06:05:03 PM EDT
kernel-hugemem-2.6.9-42.0.2.EL             Mon 28 Aug 2006 06:04:43 PM EDT
kernel-devel-2.6.9-42.0.2.EL               Mon 28 Aug 2006 06:04:21 PM EDT
kernel-2.6.9-42.0.2.EL                     Mon 28 Aug 2006 06:04:05 PM EDT

3. Are you using some sort of multipathing? If so, provide details.

No.

4. What is the cluster used for? Oracle database, mailserver, etc.

Oracle 10g RAC (10.2.0.2.0)

5. How many nodes in your cluster?

2

6. Any other relevant information?

Crashes during the full database backup each morning: the MSA 500
controller locks up. We must power off all systems connected to the
MSA 500, restart the MSA 500, and then restart the systems.

Again, feel free to mail me directly.

Thanks
Sunil