Hi,

I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box. This is a 3-node cluster that's been running for 2 years with just about zero modification. The storage is a high-end SAN and the transport is iSCSI. We went two years without an issue, and all of a sudden node 1 in the cluster keeps crashing. I have never had to troubleshoot OCFS2, so I started with what I could control.

I checked /var/log/messages and nothing there suggests a problem. I replaced hardware, going as far as popping the SCSI drives out, putting them in another server, and trying it with all new hardware. The problem still persists.

I had the network team check the iSCSI port on the private iSCSI network and they are not seeing errors.

I've checked the few OCFS2 settings in play and they all look good.

My question to the group is: how do I continue troubleshooting this issue? I'm not aware of any native logs etc. to reference. I would appreciate any help that gets this diagnosis moving to a solution.

Thanks,
Bruce
1.2.1? That's 5 years old. We've had a few fixes since then. ;)

You have to catch the oops trace to figure out the reason. One way to get it is by using netconsole. Check the SLES 10 docs to see how to configure netconsole, or use whatever is recommended for capturing the oops log in that release.

On 06/29/2011 11:28 AM, B Leggett wrote:
> We went two years without an issue and all of a sudden node 1 in the cluster keeps crashing.
> [...]
> My question to the group is how do I continue troubleshooting this issue? I'm not aware of any native logs etc. to reference.
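For the receiving side of the netconsole suggestion above, here is a minimal sketch of a listener to run on a second machine. It assumes netconsole on the crashing node is pointed at this machine's IP and at UDP port 6666 (the port commonly used for netconsole; adjust it to match your setup). The function names `open_listener` and `capture` and the log file name are made up for illustration. Configuring the sender is not shown; follow the SLES 10 docs for that.

```python
import socket


def open_listener(bind_addr="0.0.0.0", port=6666):
    """Open a UDP socket for netconsole traffic (port 6666 is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def capture(sock, log_path, max_datagrams=None):
    """Append each received datagram to log_path, prefixed with the sender IP.

    Runs until max_datagrams have been written, or forever if it is None.
    Returns the number of datagrams written.
    """
    received = 0
    with open(log_path, "ab") as log:
        while max_datagrams is None or received < max_datagrams:
            data, (src_ip, _src_port) = sock.recvfrom(65535)
            log.write(src_ip.encode("ascii") + b" " + data)
            if not data.endswith(b"\n"):
                log.write(b"\n")
            # Flush after every message so the tail of an oops trace
            # is already on disk when the sending node goes down.
            log.flush()
            received += 1
    return received
```

Calling `capture(open_listener(), "netconsole.log")` on the receiving machine keeps appending until interrupted. After the next crash, the end of that file should hold whatever the kernel managed to print before the node went down.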
For the list: I accidentally sent this directly to Sunil. My apologies for that.

Bruce

----- Original Message -----
From: "B Leggett" <bleggett at ngent.com>
To: "Sunil Mushran" <sunil.mushran at oracle.com>
Sent: Wednesday, June 29, 2011 3:40:52 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash

Sunil,

I did as you requested and got one line of output.

o2net: accepted connection from node node-05 (num 4) at 192.168.1.62:7777

Bruce

----- Original Message -----
From: "Sunil Mushran" <sunil.mushran at oracle.com>
To: "B Leggett" <bleggett at ngent.com>
Cc: ocfs2-users at oss.oracle.com
Sent: Wednesday, June 29, 2011 2:42:08 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash

1.2.1? That's 5 years old. We've had a few fixes since then. ;)

You have to catch the oops trace to figure out the reason. One way to get it is by using netconsole. Check the SLES 10 docs to see how to configure netconsole, or use whatever is recommended for capturing the oops log in that release.