SCOTT, Gavin
2005-Nov-03 19:29 UTC
[Ocfs2-users] Heartbeat threshold, misscount & hangcheck co-ordination
Hi,

Having configured 2-node RAC on OCFS2, I'm trying to establish the best figure for the heartbeat threshold. I've seen a few threads on this, but they don't address the other timeout values in the system.

When configuring OCFS 1, I didn't notice a heartbeat threshold: either I missed it completely or it is a new function. With the heartbeat threshold at its default of 7 (2-second ticks), a node self-fences after (threshold - 1) * 2 = 12 seconds. However, CRS has a misscount parameter with a default of 60 seconds (crsctl get css misscount), which is the time a node waits before evicting another node from the cluster when it fails to respond across the interconnect. Additionally, the hangcheck-timer module must be co-ordinated with this misscount parameter to ensure that if a hung node revives itself after being evicted from the cluster, it reboots to avoid corrupting the database during an attempt to resume previous transactions.

It strikes me that the heartbeat threshold should probably also be co-ordinated with these other 2 parameters: i.e., if a node self-fences, the rest of the cluster should evict it at the same time, not wait roughly another 48 seconds (60 - 12).

None of the Oracle documentation for 9i RAC and CM on OCFS (1) recommended a misscount value lower than 60 seconds, but I found that a bit high in terms of cluster timeouts for a TAF scenario, especially as it also affects TCP timeouts (particularly in 9i RAC, where there were no virtual IPs). After some conversation via a TAR, Oracle did state that a lower value was acceptable, as long as premature cluster evictions were not occurring.

So the question is: am I right in setting the heartbeat threshold to match the misscount parameter? The value I am thinking of is about 30 seconds.

Thanks,
Gavin.
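For reference, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the (threshold - 1) * 2 relationship quoted in this post; the function names are hypothetical, and the 2-second tick is the figure assumed above rather than something read from a live system.

```python
# Seconds per heartbeat tick, as assumed in the post (not read from a live system).
HEARTBEAT_INTERVAL_S = 2


def fence_timeout_seconds(threshold: int) -> int:
    """Time before a node self-fences, using (threshold - 1) * tick."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def threshold_for_timeout(timeout_s: int) -> int:
    """Smallest threshold whose self-fence time is at least timeout_s."""
    ticks = -(-timeout_s // HEARTBEAT_INTERVAL_S)  # ceiling division
    return ticks + 1


if __name__ == "__main__":
    print(fence_timeout_seconds(7))    # default threshold -> 12 seconds
    print(threshold_for_timeout(30))   # target of 30 seconds -> threshold 16
    print(threshold_for_timeout(60))   # matching a 60-second misscount -> threshold 31
```

On these assumptions, a 30-second target corresponds to a threshold of 16, and matching the default misscount of 60 seconds would need a threshold of 31.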