Jeffery P. Humes
2006-Jul-28 05:28 UTC
[Ocfs2-users] Private Interconnect and self fencing
I have an OCFS2 filesystem on a Coraid AoE device. It mounts fine, but under heavy I/O the server self-fences, claiming a write timeout:

(16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 12000 milliseconds
(16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing

It is my understanding that OCFS2 expects the heartbeat to live on disk, on the same disk that I am writing to. Is there any way, as with other clustering setups, to set up a different heartbeat or even multiple heartbeats, say on a crossover cable between servers or on a private interface? Putting the heartbeat only on a disk that may see heavy I/O seems likely to cause problems.

Any advice on setting up the heartbeats would be greatly appreciated.

Thanks,

-JPH
The 12 sec default is low. Bump it up to 30 secs or even higher; the FAQ has the details. The higher you set it, the longer the brown-out time.

Jeffery P. Humes wrote:
> I have an OCFS2 filesystem on a Coraid AoE device. It mounts fine,
> but under heavy I/O the server self-fences, claiming a write
> timeout [...]
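For reference, the threshold is set in the o2cb configuration. A minimal sketch, assuming the stock init-script setup of that era (the formula below is the one documented in the OCFS2 FAQ; file paths vary by distro):

    # /etc/sysconfig/o2cb  (or /etc/default/o2cb on Debian-based systems)
    # Timeout in ms = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000;
    # the default of 7 gives 12000 ms, 31 gives 60000 ms.
    O2CB_HEARTBEAT_THRESHOLD=31

    # Restart the cluster stack so the new value reaches the kernel:
    /etc/init.d/o2cb restart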
Jeffery P. Humes
2006-Jul-28 18:37 UTC
[Ocfs2-users] Private Interconnect and self fencing
I have set it to 30 seconds, and the same thing still happens.

(15,1):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 30000 milliseconds
(15,1):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
 [<c01233de>] panic+0x3e/0x174
 [<f8cc826a>] o2quo_disk_timeout+0x0/0x2 [ocfs2_nodemanager]
 [<c01313f8>] run_workqueue+0x7f/0xba
 [<f8cc6b15>] o2hb_write_timeout+0x0/0x65 [ocfs2_nodemanager]
 [<c0131be5>] worker_thread+0x0/0x117
 [<c0131ccb>] worker_thread+0xe6/0x117
 [<c011daa9>] default_wake_function+0x0/0xc
 [<c01344fd>] kthread+0x9d/0xc9
 [<c0134460>] kthread+0x0/0xc9
 [<c0102005>] kernel_thread_helper+0x5/0xb

-JPH

Sunil Mushran wrote:
> The 12 sec default is low. Bump it up to 30 secs or even higher; the
> FAQ has the details. The higher you set it, the longer the brown-out
> time. [...]
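Since the timeout fires even at 30 seconds, it is worth measuring how long I/O to the heartbeat device actually stalls under load. A rough sketch, read-only so it is safe against the live device (assumes a GNU dd new enough for iflag=direct, and GNU time installed at /usr/bin/time):

    # While the heavy workload runs, time small direct reads from the
    # heartbeat device; stalls longer than the threshold mean a
    # heartbeat write would have missed its deadline too.
    while true; do
        /usr/bin/time -f "%e s" \
            dd if=/dev/etherd/e0.1p1 of=/dev/null bs=4k count=1 iflag=direct \
            2>&1 | tail -1
        sleep 2
    done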
Can a better I/O policy help in such cases?

----- Original Message -----
From: "Sunil Mushran" <Sunil.Mushran@oracle.com>
To: "Jeffery P. Humes" <jeff@bofus.org>
Cc: <ocfs2-users@oss.oracle.com>
Sent: Friday, July 28, 2006 9:43 AM
Subject: Re: [Ocfs2-users] Private Interconnect and self fencing

> The 12 sec default is low. Bump it up to 30 secs or even higher; the
> FAQ has the details.
> The higher you set it, the longer the brown-out time. [...]
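One knob on that front is shrinking the writeback bursts, so a flush of dirty pages cannot monopolize the AoE link for tens of seconds at a stretch. A sketch using the standard 2.6 VM sysctls (the values are illustrative, not tuned):

    # Start background writeback earlier and cap dirty memory lower,
    # so flushes come in smaller, shorter bursts:
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10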