The question is why the I/O is not completing within 60 secs. You could
try increasing the heartbeat threshold to, say, 46 (90 secs).
It could be that, while that system is not doing any heavy I/O,
some other system using the same storage is.
Has that been ruled out?
In the end, we first have to determine whether it is a setup issue
or an environment issue. Only once we have ruled those out can
we explore software bugs in the fs and/or kernel.
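
As a rough sketch of the arithmetic behind those numbers: this assumes the
commonly documented O2CB relation, timeout = (threshold - 1) * 2 secs, i.e. a
2-second heartbeat interval. The function names are made up for illustration
and are not part of any OCFS2 tool.

```python
# Sketch only: assumes timeout_secs = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2,
# i.e. one heartbeat every 2 seconds. Function names are hypothetical.

HEARTBEAT_INTERVAL_SECS = 2  # assumed heartbeat interval


def threshold_to_timeout_secs(threshold: int) -> int:
    """Return how long a node may miss heartbeats before it fences."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_secs_to_threshold(timeout_secs: int) -> int:
    """Return the smallest threshold that gives at least timeout_secs."""
    if timeout_secs <= 0:
        raise ValueError("timeout_secs must be positive")
    # Ceiling division, so an odd timeout rounds up to the next heartbeat.
    intervals = -(-timeout_secs // HEARTBEAT_INTERVAL_SECS)
    return intervals + 1


if __name__ == "__main__":
    # 31 is the value currently in use; 46 is the value suggested above.
    for threshold in (31, 46):
        secs = threshold_to_timeout_secs(threshold)
        print(f"O2CB_HEARTBEAT_THRESHOLD = {threshold} -> {secs} secs")
```

With the current threshold of 31 this gives 60 secs, which is why a write
that took 58007ms is right at the edge of fencing.
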
Byron Albert wrote:
> Hello group,
>
> I have setup a 3 node cluster running SLES 10sp1. These nodes are all
> running in vmware sharing a physical LUN from our fiber storage backend.
> About twice a day the systems are all fencing and panic'ing locking up. The
> message is took 58007ms to do waiting for write completion then fences. We
> have tried upping all the time outs to the ones found on the site.
>
> O2CB_HEARTBEAT_THRESHOLD = 31
> O2CB_IDLE_TIMEOUT_MS = 30000
> O2CB_KEEPALIVE_DELAY_MS = 2000
> O2CB_RECONNECT_DELAY_MS = 2000
>
> Today I added elevator=deadline to see if that fixes it.
> Here are the versions we are running
>
> pbywebadmin1:~ # cat /proc/fs/ocfs2/version
> OCFS2 1.2.5-2-SLES-r3027 Tue Mar 27 16:33:19 EDT 2007 (build sles)
> pbywebadmin1:~ # uname -a
> Linux pbywebadmin1 2.6.16.53-0.16-default #1 Tue Oct 2 16:57:49 UTC 2007
> i686 i686 i386 GNU/Linux
> pbywebadmin1:~ # rpm -qa | grep ocfs
> ocfs2-tools-1.2.3-0.7
>
>
> Any suggestions on what is needed to get this to work without issues. There
> is no heavy IO on this system the crashes happen with just random light IO.
>
> Byron