Gabriele Di Giambelardini
2008-Jun-30 11:01 UTC
[Ocfs2-users] Fence abnormal and with not apparent reason
Hi all,

I have a serious and puzzling problem. Here is the situation: I have 5 Linux servers and 1 IBM SAN. Every server runs ocfs2, and I can see all of them in ocfs2-console. The servers are connected to each other over a dedicated network.

The problem is that sometimes I get this message on one of the servers:

kernel: o2net: connection to node test.test.it (num 1) at 10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down.

That server is then fenced, but when it comes back up it cannot start ocfs2 or mount the partition. To fix this I have to fence all the servers, after which everything works again. I have noticed that if I am not quick to fence all the servers, other nodes also go into "shutting it down".

Can somebody help me? This is really important for me.

My servers:

- Red Hat Enterprise Linux Server release 5
  2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
- ocfs2-2.6.18-8.el5-1.2.8-2.el5
  ocfs2-tools-1.2.7-1.el5
  ocfs2console-1.2.7-1.el5
  ocfs2-tools-debuginfo-1.2.6-1.el5
  ocfs2-2.6.18-92.1.1.el5-1.2.9-1.el5
- OCFS2 1.2.8 Tue Jan 22 11:58:16 PST 2008 (build 9c7ae8bb50ef6d8791df2912775adcc5)

Thanks in advance for any suggestions.
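To see which nodes hit the idle timeout, and when, the o2net message quoted above can be pulled out of the syslog files of each server. The following is only a minimal sketch: the regular expression is modeled on the single message in this post, and the names IDLE_RE and parse_idle_events are made up for illustration. Other kernel or OCFS2 versions may word the message differently.

```python
import re

# Pattern modeled on the one o2net message quoted in this thread
# (assumption: other versions may phrase it differently).
IDLE_RE = re.compile(
    r"o2net: connection to node (?P<node>\S+) \(num (?P<num>\d+)\) "
    r"at (?P<addr>[\d.]+):(?P<port>\d+) has been idle for "
    r"(?P<idle>[\d.]+) seconds"
)


def parse_idle_events(lines):
    """Return one dict per o2net idle-timeout message found in lines."""
    events = []
    for line in lines:
        m = IDLE_RE.search(line)
        if m:
            events.append({
                "node": m.group("node"),
                "num": int(m.group("num")),
                "addr": m.group("addr"),
                "port": int(m.group("port")),
                "idle_seconds": float(m.group("idle")),
            })
    return events


# The message from this post, used as sample input.
sample = [
    "kernel: o2net: connection to node test.test.it (num 1) at "
    "10.10.10.1:7777 has been idle for 60.0 seconds, shutting it down.",
]

for event in parse_idle_events(sample):
    print(event)
```

Running the same function over the log lines of all 5 servers would show whether the timeouts always involve the same node number or address, which may help narrow the problem down to one host or one network path.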
Gabriele Di Giambelardini
2008-Jun-30 13:56 UTC
[Ocfs2-users] Fence abnormal and with not apparent reason
Hi,

this is the output on all 5 servers:

Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
  Heartbeat dead threshold: 61
  Network idle timeout: 60000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

Thanks

----- Original message -----
From: V Srinivas <vaungasrinu at gmail.com>
To: Gabriele Di Giambelardini <gabriele_d_g at yahoo.it>
Sent: Monday, 30 June 2008, 13:07:31
Subject: Re: [Ocfs2-users] Fence abnormal and with not apparent reason

Please send me the "service o2cb status" output for those servers.
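The values in the status output are easier to compare once they are all in seconds. The sketch below is hedged: it assumes that the network values are in milliseconds and that the disk heartbeat timeout is (threshold - 1) * 2 seconds, which is the rule as I understand it from the OCFS2 1.2 documentation. The function names parse_status and timeouts_in_seconds are hypothetical.

```python
import re

# Timeout lines copied from the o2cb status output in this message.
STATUS = """\
Heartbeat dead threshold: 61
Network idle timeout: 60000
Network keepalive delay: 2000
Network reconnect delay: 2000
"""


def parse_status(text):
    """Collect 'name: integer' lines from o2cb status output."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values


def timeouts_in_seconds(values):
    """Convert the parsed values to seconds.

    Assumptions: disk heartbeat timeout = (threshold - 1) * 2 seconds,
    and the three network values are given in milliseconds.
    """
    return {
        "disk_heartbeat": (values["Heartbeat dead threshold"] - 1) * 2,
        "network_idle": values["Network idle timeout"] / 1000,
        "network_keepalive": values["Network keepalive delay"] / 1000,
        "network_reconnect": values["Network reconnect delay"] / 1000,
    }


for name, seconds in timeouts_in_seconds(parse_status(STATUS)).items():
    print(name, seconds)
```

Under these assumptions the network idle timeout comes out at 60 seconds, which matches the "idle for 60.0 seconds" in the o2net message, and the disk heartbeat timeout at 120 seconds.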
Gabriele Di Giambelardini
2008-Jul-11 09:11 UTC
[Ocfs2-users] Fence abnormal and with not apparent reason
Hi all,

looking at the logs more carefully, at the moment a node goes down I find this information from the kernel about o2net:

Jul 10 16:52:02 be1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [o2net:6814]
Jul 10 16:52:02 be1 kernel: CPU 0:
Jul 10 16:52:02 be1 kernel: Modules linked in: ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm parport shpchp ide_cd cdrom i2c_i801 i5000_edac i2c_core serio_raw edac_mc bnx2
Jul 10 16:52:02 be1 kernel: Pid: 6814, comm: o2net Tainted: G 2.6.18-92.el5
Jul 10 16:52:02 be1 kernel: RIP: 0010:[<ffffffff80064b57>] [<ffffffff80064b57>]
Jul 10 16:52:02 be1 kernel: RSP: 0018:ffff81043f281d28 EFLAGS: 00000246
Jul 10 16:52:02 be1 kernel: RAX: ffff810316b02828 RBX: ffff810440656018 RCX: 000
Jul 10 16:52:02 be1 kernel: RDX: 0000000000000001 RSI: 0000000000000286 RDI: fff
Jul 10 16:52:02 be1 kernel: RBP: ffff810367456c20 R08: ffff810316b02838 R09: fff
Jul 10 16:52:02 be1 kernel: R10: ffff810316b02858 R11: 000000000000fa55 R12: fff
Jul 10 16:52:02 be1 kernel: R13: 0000000000000044 R14: 000000000000001f R15: 000
Jul 10 16:52:02 be1 kernel: FS: 0000000000000000(0000) GS:ffffffff8039e000(0000
Jul 10 16:52:02 be1 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul 10 16:52:02 be1 kernel: CR2: 000000001c1b6ec8 CR3: 0000000449592000 CR4: 000
Jul 10 16:52:02 be1 kernel:
Jul 10 16:52:02 be1 kernel: Call Trace:
Jul 10 16:52:02 be1 kernel: [<ffffffff884e7b0b>] :ocfs2_dlm:dlm_assert_master_h
Jul 10 16:52:02 be1 kernel: [<ffffffff884ab15e>] :ocfs2_nodemanager:o2net_proce
Jul 10 16:52:02 be1 kernel: [<ffffffff884ace20>] :ocfs2_nodemanager:o2net_rx_un
Jul 10 16:52:02 be1 kernel: [<ffffffff884ac5d2>] :ocfs2_nodemanager:o2net_rx_un
Jul 10 16:52:02 be1 kernel: [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
Jul 10 16:52:02 be1 kernel: [<ffffffff800497be>] worker_thread+0x0/0x122
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff800498ae>] worker_thread+0xf0/0x122
Jul 10 16:52:02 be1 kernel: [<ffffffff8008ac03>] default_wake_function+0x0/0xe
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:02 be1 kernel: [<ffffffff8003253d>] kthread+0xfe/0x132
Jul 10 16:52:02 be1 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul 10 16:52:03 be1 kernel: [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc
Jul 10 16:52:03 be1 kernel: [<ffffffff8002881b>] sync_page+0x0/0x42
Jul 10 16:52:03 be1 kernel: [<ffffffff8003243f>] kthread+0x0/0x132
Jul 10 16:52:03 be1 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

Can somebody help me understand what this means?

Thanks
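To check whether these soft lockups line up in time with the o2net disconnects, the "BUG: soft lockup" lines can be extracted from the logs of each node. This is only an illustrative sketch: the pattern is taken from the one log line in this message, and LOCKUP_RE and find_soft_lockups are hypothetical names.

```python
import re

# Pattern modeled on the soft lockup line from this message
# (assumption: other kernels may format it differently).
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return (comm, pid, cpu, seconds) for each soft lockup line."""
    hits = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            hits.append((
                m.group("comm"),
                int(m.group("pid")),
                int(m.group("cpu")),
                int(m.group("secs")),
            ))
    return hits


# The first line of the trace above, used as sample input.
log = [
    "Jul 10 16:52:02 be1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! "
    "[o2net:6814]",
    "Jul 10 16:52:02 be1 kernel: CPU 0:",
]

for comm, pid, cpu, secs in find_soft_lockups(log):
    print(comm, pid, cpu, secs)
```

If the thread reported in these lines is always o2net, that would be consistent with the network thread being stuck while the other nodes count its connection as idle, but the trace alone does not prove it.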