Hi all,

I have a problem with my OCFS2. Three machines are using it over iSCSI with a SAN.

- When one user tries to list one of his folders, the console just freezes.
- The node2 server has a load of 8.00. If I reboot node2, the load slowly climbs back up to 8.00.

I can't use ocfs2console since those machines do not have a GUI.

This is with the 2.6.23 kernel, Gentoo, and these tools:
http://bugs.gentoo.org/show_bug.cgi?id=193249#c25

Thanks for any help.

Alexandre Racine
alexandre.racine at mhicc.org
514-461-1300 ext. 3303
Are there any errors in /var/log/messages?

Are there any busy locks? This script lists them:
http://oss.oracle.com/~smushran/.debug/scripts/scanlocks
$ scanlocks

If so, dump them on all nodes using:
$ echo R <domain> <lockname>

The list of domains can be obtained with this script:
http://oss.oracle.com/~smushran/.debug/scripts/listdomains

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
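As a rough illustration of the first question, the system log can be narrowed down to cluster-related lines before reading it. This is only a sketch: the function name `ocfs2_log_lines` and the keyword list are assumptions made for the example, not an official OCFS2 filter.

```shell
# Sketch: list cluster-related lines from a syslog file.
# The keyword list is an assumption chosen for illustration.
ocfs2_log_lines() {
    # $1: log file to scan (defaults to /var/log/messages)
    grep -E 'o2net|o2hb|o2cb|ocfs2|dlm_' "${1:-/var/log/messages}"
}

# Usage (typically as root):
#   ocfs2_log_lines /var/log/messages | tail -n 50
```

Running it on every node makes it easier to compare what each one logged around the time of the freeze.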
Hi Sunil,

OK, scanlocks2 does not produce any output at the moment. So I guess that if some locks were found, I would have to run this command:

> If so, dump them on all nodes using:
> $ echo R <domain> <lockname>
> List of domains can be gotten from this:
> http://oss.oracle.com/~smushran/.debug/scripts/listdomains

But the listdomains script tells me: "which: no debugfs.ocfs2 in (/home... " (my $PATH, actually).

Is this the correct way?

Thanks.

Alexandre Racine

-----Original Message-----
From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
Sent: 23 May 2008 16:45
To: Alexandre Racine
Subject: Re: [Ocfs2-users] huge "something" problem

Sorry, that script is for older kernels. Use this one instead:
http://oss.oracle.com/~smushran/.debug/scripts/scanlocks2

Alexandre Racine wrote:
> Hi Sunil,
>
> Thanks for your quick answer. In the meantime the server was rebooted, and the load is now normal and the folder listable.
>
> I started scanlocks, but it says that debugfs is not loaded. Where is this debugfs?
>
> Thanks.
>
> Alexandre Racine
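On the "debugfs is not loaded" message: debugfs is a kernel filesystem that has to be mounted before tools can read from it. The sketch below checks the mount table for it. The function name `debugfs_mounted` is made up for this example, and the mount point in the usage comment is only the common convention; the thread does not say which one the scripts expect.

```shell
# Sketch: report whether a debugfs filesystem is mounted.
# $1 is the mount table to read; /proc/mounts is the Linux default.
debugfs_mounted() {
    # Field 3 of each mount-table line is the filesystem type.
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Usage (as root). The mount point is an assumption, not taken from the thread:
#   debugfs_mounted || mount -t debugfs debugfs /sys/kernel/debug
```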
Hi,

> > But the listdomains script tells me: "which: no debugfs.ocfs2 in (/home... " ($PATH actually)
>
> Means /sbin is not in your path. Also means the script can be improved. Redownload both and re-run.

Excellent, that works great!

Now that I have the locks and the domain name, what should I do to unlock them (or fix the problem)?

Thanks.

> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
> Sent: 27 May 2008 13:25
> To: Alexandre Racine
> Subject: Re: [Ocfs2-users] huge "something" problem
>
> Alexandre Racine wrote:
> > (I just noticed that I was replying directly to you. Should I do this or cc the mailing list?)
>
> The mailing list is best.
>
> > OK, scanlocks2 does not produce any output at the moment. So I guess that if some locks were found, I would have to run this command:
>
> scanlocks(2) prints the busy locks. If there are none, it will not print anything.
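The "/sbin is not in your path" diagnosis can be handled in the shell session before re-running the scripts. A minimal sketch, where the helper name `add_to_path` is invented for the example:

```shell
# Sketch: append a directory to PATH only when it is not already there,
# so that `which debugfs.ocfs2` can find the tool.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /sbin
export PATH

# Prints the resolved location, or nothing if the tool is not installed.
command -v debugfs.ocfs2 || true
```

The change only lasts for the current session; making it permanent depends on the shell's startup files.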
Yes. I am assuming this was from the time you had to force a shutdown of the servers. If so, ignore it.

Alexandre Racine wrote:
> Hi all,
>
> I had a drive freeze, and after shutting down all servers and starting them one by one I had this in the logs. What does this tell you? Network problems? Thanks.
>
> -Alexandre
>
> Jun 17 11:12:57 SRV1 o2net: connection to node SRV2 (num 1) at 192.168.60.6:7777 has been idle for 30.0 seconds, shutting it down.
> Jun 17 11:12:57 SRV1 (6913,4):o2net_idle_timer:1422 here are some times that might help debug the situation: (tmr 1213715547.195522 now 1213715577.197597 dr 1213715577.161345 adv 1213715547.195522:1213715547.195523 func (c91ec65b:505) 1213715529.22827:1213715529.24199)
> Jun 17 11:12:59 SRV1 (6913,4):o2net_send_tcp_msg:837 ERROR: sendmsg returned -32 instead of 24
> Jun 17 11:12:59 SRV1 o2net: no longer connected to node SRV2 (num 1) at 192.168.60.6:7777
> Jun 17 11:12:59 SRV1 (6913,4):o2net_sendpage:873 ERROR: sendpage of size 24 to node SRV2 (num 1) at 192.168.60.6:7777 failed with -32
> Jun 17 11:12:59 SRV1 (6968,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> Jun 17 11:12:59 SRV1 (6968,1):dlm_wait_for_node_death:370 41535574BDEB4720B2CE7819A631DF10: waiting 5000ms for notification of death of node 1
>
> Alexandre Racine
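The timestamps in the o2net_idle_timer line can be checked against the "idle for 30.0 seconds" message. The sketch below assumes that `tmr` is when the idle timer was last armed and `now` is when it fired, and it reads the wrapped `tmr` value as 1213715547.195522 (which matches the `adv` timestamp). The function name `elapsed_seconds` is made up for the example.

```shell
# Sketch: seconds elapsed between two "seconds.microseconds" timestamps.
elapsed_seconds() {
    awk -v start="$1" -v end="$2" 'BEGIN { printf "%.3f\n", end - start }'
}

# tmr and now from the log line; prints 30.002
elapsed_seconds 1213715547.195522 1213715577.197597
```

Under those assumptions the gap is just over 30 seconds, which agrees with the idle timeout reported in the first log line.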
Well, this was today, but yes, it was before I had to force a shutdown of the machines.

Alexandre Racine
The only relevant message is the first one, which indicates that SRV1 had not heard from SRV2 for 30 seconds. The rest of the messages appear because the link broke and are informational rather than errors.
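The negative numbers in the follow-on messages read like negated Linux errno values, which fits the "link broke" explanation. A small sketch covering only the two codes that appear in the log; the function name `errno_name` is hypothetical and the mapping is the standard Linux numbering, not something stated in the thread.

```shell
# Sketch: name the two negative return codes seen in the log.
# Values are Linux errno numbers; only these two are covered.
errno_name() {
    case "${1#-}" in              # strip the leading minus sign
        32)  echo "EPIPE (broken pipe)" ;;
        107) echo "ENOTCONN (transport endpoint is not connected)" ;;
        *)   echo "unknown" ;;
    esac
}

errno_name -32     # sendmsg / sendpage failures
errno_name -107    # dlm_send_remote_convert_request status
```

Both codes describe a connection that is already gone, so they are consequences of the idle timeout rather than separate faults.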