Hi,

I hope all of you are in good spirits.

We have four OSS servers, clustered in pairs: OSS1 with OSS2, and OSS3 with OSS4. The cluster was configured six months ago, and from the beginning it has had an issue where one of the nodes fences the other, which then goes into a shutdown state. This happens roughly every two to three weeks. /var/log/messages continuously shows errors like:

    " slow start_page_write 57s due to heavy IO load "

Can anybody help me with this issue?

Thanks & Regards
VIJESH E K
Hi Vijesh,

You are probably facing what is called a "split brain" issue. It can happen due to heartbeat communication problems. What sort of heartbeat are you using?

Some time ago we had a problem where the OSSes were overloaded and the heartbeat became unresponsive. This caused a "false split brain" scenario: both nodes within the HA pair STONITH each other, since there is no answer from the heartbeat device.

I suggest you start monitoring your OSS nodes to understand whether the logged message makes sense (very likely it does). What is the memory configuration of your OSS nodes? What OS? What does your zone_reclaim_mode setting look like?

Regards,
Carlos

--
Carlos Thomaz | HPC Systems Architect
Mobile: +1 (303) 519-0578
cthomaz at ddn.com | Skype ID: carlosthomaz
DataDirect Networks, Inc.
9960 Federal Dr., Ste 100
Colorado Springs, CO 80921
ddn.com | Twitter: @ddn_limitless | 1.800.TERABYTE

From: VIJESH EK <ekvijesh at gmail.com>
Date: Sun, 22 Jan 2012 22:33:20 -0800
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] OSS Nodes Fencing issue in HPC " slow start_page_write 57s due to heavy IO load "
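For reference, a quick way to gather the information Carlos asks about (the zone reclaim setting, NUMA layout, and memory size) on most Linux systems is something like the following; numactl may need to be installed separately:

    # current zone reclaim setting (0 = disabled)
    cat /proc/sys/vm/zone_reclaim_mode

    # NUMA topology and per-node memory
    numactl --hardware

    # total and used memory in MB
    free -m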
Well, it sounds like an issue with your HA package configuration. Likely one node is not responsive enough to heartbeat/are-you-alive messages, so the other node assumes it has died. This is likely fixed by increasing the deadtime parameter in your HA configuration (try 180 seconds if it is currently smaller than that). It is hard to say for certain, as you omitted any logs and did not say which HA package you are using. You also did not indicate which Lustre version you are running.

One of the likely causes of those messages is the kernel having difficulty allocating memory. On many kernels, if /proc/sys/vm/zone_reclaim_mode is not 0, memory allocations can take a long time, as the kernel keeps looking for the best pages to free until pages in the local NUMA node are available. With the Lustre 1.8.x write cache the memory pressure is substantial (in 1.6.x and earlier the service threads had statically-allocated buffers, but starting with 1.8.x each incoming request allocates new pages and frees them back to the page cache).

Kevin
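A minimal sketch of the two changes suggested above, assuming the Linux-HA Heartbeat package (the config file location and appropriate values may differ on your system):

    # /etc/ha.d/ha.cf -- give a stalled node more time to answer
    # are-you-alive messages before it is declared dead and fenced
    deadtime 180

    # disable NUMA zone reclaim at runtime, so allocations fall back to
    # remote NUMA nodes instead of stalling while reclaiming local pages
    echo 0 > /proc/sys/vm/zone_reclaim_mode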
You have already received great advice from Carlos and Kevin. One more point I would like to add: quite often people configure their HA software to send the heartbeat over a single network, thus creating a single point of failure, and the heartbeat (keep-alive) pings are sent over the same network that carries the main I/O traffic. In my experience it is very important to send the HA pings over at least two networks, or better still over two different communication methods, such as Ethernet and serial.

Regards,
Wojciech
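With the Linux-HA Heartbeat package, for example, redundant heartbeat paths can be declared in ha.cf roughly as follows (the interface name and serial device are placeholders for your own hardware):

    # /etc/ha.d/ha.cf -- two independent heartbeat paths
    bcast eth1           # dedicated heartbeat Ethernet, separate from the I/O network
    serial /dev/ttyS0    # null-modem serial cable as a second path
    baud 19200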
Dear Sir,

I have attached the /var/log/messages files from the OSS node. Please go through the logs and kindly give me a solution for this issue.

Thanks & Regards
VIJESH E K
HCL Infosystems Ltd.
Chennai-6
Mob: +91 99400 96543

[Attachments scrubbed by the list archive: messages, messages.1, messages.2, messages.3]
As I replied earlier, those "slow" messages are often the result of memory allocations taking a long time. Since zone_reclaim shows up in many of the stack traces, that still appears to be a good candidate. Did you check /proc/sys/vm/zone_reclaim_mode, and was it 0? Did you change it to 0 and still have problems?

The same situation that causes the Lustre threads to be slow can also stall the heartbeat processes. Did you increase the heartbeat deadtime timeout value?

Kevin
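For reference, a quick way to confirm this pattern in logs like the ones attached above is something along these lines:

    # how often the slow-IO warning appears
    grep -c 'slow start_page_write' /var/log/messages

    # stack traces that mention zone reclaim
    grep -A 10 'zone_reclaim' /var/log/messages | less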
Dear Sir,

I have checked the file /proc/sys/vm/zone_reclaim_mode and found that its value is 1 on all four OSS servers (OSS1 to OSS4). Should I change it to 0 on all nodes? I would like to know one thing: how will this resolve the current issue? Can you please explain? What is the main function of this file?

Have you looked at the log files I sent earlier? If I change the value to 0, will it affect currently running processes or jobs?

I am waiting for your reply.

Thanks & Regards
VIJESH E K
Yes, change it to 0. This will make it easier to allocate memory. Although the kernel will sometimes allocate memory attached to the wrong CPU, it shouldn't get stuck for long periods in the memory allocator. Because of the Lustre OSS cache (introduced in 1.8.0), service threads have to allocate new memory for every request, and your Lustre server threads are getting stuck allocating memory. I expect you will see far fewer "slow" messages on the servers after making that change.

Kevin
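A sketch of applying the change on each OSS; the second step assumes a sysctl.conf-based setup so the setting survives reboots:

    # apply immediately to the running kernel
    echo 0 > /proc/sys/vm/zone_reclaim_mode

    # persist across reboots
    echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf
    sysctl -p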