Hello,

We have a 9-node OCFS2 cluster used for HTTP serving.

Sometimes when we reboot one of the nodes, the kernel panics during the ocfs2 unmount. Afterwards, some httpd processes on all the other nodes go into the "D" (uninterruptible sleep) state, and the load on those systems climbs.

ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

10187 D httpd ocfs2_wait_for_mask
10241 D httpd ocfs2_wait_for_mask
10254 D httpd ocfs2_wait_for_mask
10255 D httpd ocfs2_wait_for_mask
10272 D httpd ocfs2_wait_for_mask
10273 D httpd ocfs2_wait_for_mask
10274 D httpd ocfs2_wait_for_mask
10398 D httpd ocfs2_wait_for_mask
10441 D httpd ocfs2_wait_for_mask
10452 D httpd ocfs2_wait_for_mask

Please tell me what information you need in order to help us.

All nodes run CentOS 5.2 and OCFS2 1.4.1. Storage is an IBM DS4700 array, connected through QLogic HBAs with the MPP driver.

--
Cristian Gae
Director IT
Netbridge Services
cristian.gae at netbridge.ro
0749 018 817

--
This message, together with any attached files, constitutes confidential information and is addressed solely to the natural or legal person(s) named as recipient(s). If you are not the recipient of this message and have received this e-mail by mistake, please notify the system administrator. Please be advised that the opinions expressed in this e-mail represent the views of the author and not those of the company as a whole. The recipient should check this e-mail and the contents of any attached files for viruses. Netbridge Services SRL is not responsible for any improper transmission of information caused by a virus.
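The ps invocation above identifies the blocked httpd tasks by their wait channel. When many processes are stuck, the same filtering can be done directly from /proc, keeping only tasks in state D. The following is a minimal sketch, assuming Python 3 (CentOS 5.2 itself ships an older Python, so it may need adapting there); parse_stat and d_state_processes are hypothetical helper names, not part of any OCFS2 tool. It relies on the /proc/<pid>/stat layout, where the state is the first field after the parenthesised command name, and on /proc/<pid>/wchan, which holds the name of the kernel function a sleeping task is waiting in.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split on the *last* ')' instead of on
    whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def d_state_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every task currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            # The process exited, or is unreadable, between listdir() and open().
            continue
        yield int(entry), comm, wchan
```

For example, running `for pid, comm, wchan in d_state_processes(): print(pid, comm, wchan)` on an affected node should list the same ocfs2_wait_for_mask entries as the ps output.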
Set up a netconsole server to catch the oops trace. Have you set /proc/sys/kernel/panic_on_oops to 1?

On Tue, Mar 24, 2009 at 06:09:22PM +0200, Cristian Gae wrote:
> Sometimes when we reboot one of the nodes, the kernel panics during the
> ocfs2 unmount. Afterwards, some httpd processes on all the other nodes
> go into the "D" state.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
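The netconsole suggestion works because netconsole sends kernel log messages, including an oops trace, as plain-text UDP datagrams to a remote host, so the trace survives even though the panicking node cannot write it to its own disks. Any UDP listener can act as the receiving server (the thread goes on to use syslog-ng). Purely as an illustration, here is a minimal Python 3 receiver; open_listener and receive are hypothetical names, 6666 is netconsole's default target port, and the log path in the usage note is an assumption.

```python
import socket

NETCONSOLE_DEFAULT_PORT = 6666  # netconsole's default target UDP port


def open_listener(host="0.0.0.0", port=NETCONSOLE_DEFAULT_PORT):
    """Bind a UDP socket that netconsole senders can deliver to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive(sock, log_path, max_messages=None):
    """Append each received line to log_path, prefixed with the sender's IP.

    netconsole payloads are plain kernel log text, possibly several lines
    per datagram. Runs forever unless max_messages is given; returns the
    number of datagrams handled.
    """
    handled = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_messages is None or handled < max_messages:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            for line in data.decode("utf-8", errors="replace").splitlines():
                log.write(f"{sender_ip} {line}\n")
            # Flush per datagram so a trace is on disk even if this script dies.
            log.flush()
            handled += 1
    return handled
```

A possible usage: run `receive(open_listener(), "/var/log/netconsole.log")` on the log host, and on each cluster node load the module with something like `modprobe netconsole netconsole=@/eth0,6666@192.0.2.10/`, where eth0 and 192.0.2.10 are placeholders for the node's interface and the log host's address. Prefixing each line with the sender's IP keeps traces from the nine nodes apart in a single file.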
/proc/sys/kernel/panic_on_oops was already set to 1.

We have configured netconsole on all nodes, with a syslog-ng server to log the messages. I will come back with the traces the next time the problem occurs.

Thank you

Sunil Mushran wrote:
> Set up a netconsole server to catch the oops trace. Have you set
> /proc/sys/kernel/panic_on_oops to 1?

--
Cristian Gae
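Cristian's reply confirms that kernel.panic_on_oops was already 1. When a setting like this has to be verified on every node of the cluster, a small check that compares /proc/sys values against expected ones can help. This is a minimal Python 3 sketch; read_sysctl, check_sysctls and EXPECTED are hypothetical names, and only kernel.panic_on_oops comes from the thread. The standard `sysctl kernel.panic_on_oops` command reports the same value.

```python
import os


def read_sysctl(name, proc_root="/proc/sys"):
    """Read a sysctl, e.g. 'kernel.panic_on_oops' -> /proc/sys/kernel/panic_on_oops."""
    path = os.path.join(proc_root, *name.split("."))
    with open(path) as f:
        return f.read().strip()


def check_sysctls(expected, proc_root="/proc/sys"):
    """Return {name: actual} for every sysctl whose value differs from expected.

    A sysctl that cannot be read is reported with the value None.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            actual = read_sysctl(name, proc_root)
        except OSError:
            actual = None
        if actual != want:
            mismatches[name] = actual
    return mismatches


# Only kernel.panic_on_oops is taken from the thread; add other keys as needed.
EXPECTED = {"kernel.panic_on_oops": "1"}
```

On a correctly configured node, `check_sysctls(EXPECTED)` should return an empty dict; any other result names the settings that differ.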