Hello list,

There is a 14-node OCFS2 cluster. When I reboot all 14 nodes at once, some nodes fail to mount the OCFS2 filesystem during boot. The mount is supposed to be done by /etc/fstab. The symptom appears on random nodes.

I would like to know whether there is a rule that OCFS2 nodes need to be rebooted one by one instead of all at once.

The error message I saw was:

mount.ocfs2: Transport endpoint is not connected while mounting /dev/EXTDISK/OCFS2

iptables is not configured on any of the nodes.

Regards,
Masanari Iida
Hi,

Masanari Iida wrote:
> There is a 14-node OCFS2 cluster.
> When I reboot all 14 nodes at once, some nodes fail to
> mount the OCFS2 filesystem during boot.
> The mount is supposed to be done by /etc/fstab.
> The symptom appears on random nodes.
> I would like to know whether there is a rule that OCFS2 nodes
> need to be rebooted one by one instead of all at once.

There is no such rule for rebooting ocfs2.

> The error message I saw was:
> mount.ocfs2: Transport endpoint is not connected while mounting
> /dev/EXTDISK/OCFS2

Interesting. Have you updated ocfs2 on some of the nodes? Normally this happens when there is a protocol mismatch among nodes. Is there any helpful information in "dmesg"? Also, please provide the version info of ocfs2.

Regards,
Tao
On Tue, Aug 5, 2008 at 5:43 PM, Tao Ma <tao.ma at oracle.com> wrote:
> Interesting. Have you updated ocfs2 on some of the nodes? Normally this
> happens when there is a protocol mismatch among nodes.
> Is there any helpful information in "dmesg"?

The boxes are all SLES10 + ocfs2-tools-1.2.5-SLES-r2997. I found no other messages at the time of failure.

I would like to capture some useful information for troubleshooting. Do you think tcpdump might catch something? Or do I need to use debug.ocfs2? If the latter, what is the right option?

Thank you,
Masanari Iida
Masanari Iida wrote:
> The boxes are all SLES10 + ocfs2-tools-1.2.5-SLES-r2997.
> I found no other messages at the time of failure.

Which version of ocfs2?

> I would like to capture some useful information for troubleshooting.
> Do you think tcpdump might catch something?
> Or do I need to use debug.ocfs2?
> If the latter, what is the right option?

debugfs.ocfs2 -l CONN DLM_DOMAIN TCP allow

Then mount and check "dmesg".

Regards,
Tao
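The manual procedure Tao describes could be sketched as the shell function below. It is only a sketch under stated assumptions: the names `run`, `trace_mount`, and `DRY_RUN` are hypothetical, the device path is the one from the original report, and `off` is assumed to be the log-mask value that switches the trace bits back off. It is meant to be run as root on a node where the boot-time mount failed.

```shell
#!/bin/sh
# Sketch: enable o2net/DLM tracing, retry the mount, then show the kernel log.
# run, trace_mount and DRY_RUN are hypothetical names used for illustration.

run() {
    # With DRY_RUN=1 the command is only printed, not executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

trace_mount() {
    dev=$1
    # Trace bits suggested in the thread.
    run debugfs.ocfs2 -l CONN DLM_DOMAIN TCP allow
    # Mount using the /etc/fstab entry for this device.
    run mount "$dev"
    status=$?
    # The extra trace output ends up in the kernel log.
    run dmesg
    # Assumption: "off" disables the trace bits again.
    run debugfs.ocfs2 -l CONN DLM_DOMAIN TCP off
    return $status
}
```

With `DRY_RUN=1` set, calling `trace_mount /dev/EXTDISK/OCFS2` only prints the four commands, which is a cheap way to review them before running anything on a cluster node.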
I believe 1.2.5-SLES-r2997 is the version of the fs and not the tools. Meaning, an upgrade is required to the latest kernel, which ships 1.2.9.

As for the failure to mount, one reason could be that the default timeout (10 secs) is too low. See if increasing it to the new default of 30 secs helps.
Hello Tao and Sunil,

On Wed, Aug 6, 2008 at 1:07 AM, Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> I believe 1.2.5-SLES-r2997 is the version of the fs and not the
> tools. Meaning, an upgrade is required to the latest kernel,
> which ships 1.2.9.

OK.

> As for the failure to mount, one reason could be that the
> default timeout (10 secs) is too low. See if increasing it to the
> new default of 30 secs helps.

I think you are talking about the following timeout setting, which has already been extended to 30 sec:

O2CB_IDLE_TIMEOUT_MS=30000

>> debugfs.ocfs2 -l CONN DLM_DOMAIN TCP allow
>> Then mount and check "dmesg".

In my case, the symptom (ocfs2 fails to mount a volume listed in /etc/fstab) happens when I reboot the system. Even when the fstab mount fails, I can mount the volume later, after I log in to the system. So it could be some kind of timing issue.

Your advice (mount and check "dmesg") seems to be a manual procedure. I would like to know how and where I can set up "debugfs.ocfs2" so that it runs just before the ocfs2 mount at boot.

Regards,
Masanari Iida
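Since the idle timeout came up, the following is a small sketch for reading the configured value on a node so that it can be compared across all 14 nodes; the o2cb timeouts are generally expected to match on every node. The assumptions here are that the setting lives in /etc/sysconfig/o2cb (the usual location on SLES, adjust if yours differs) and that the function name `o2cb_idle_timeout` is made up for this example.

```shell
#!/bin/sh
# Sketch: print the configured O2CB_IDLE_TIMEOUT_MS value from an o2cb
# config file. o2cb_idle_timeout is a hypothetical helper name.

o2cb_idle_timeout() {
    # Assumption: /etc/sysconfig/o2cb is where the setting is stored.
    conf=${1:-/etc/sysconfig/o2cb}
    # Take the last uncommented assignment and drop optional double quotes.
    sed -n 's/^[[:space:]]*O2CB_IDLE_TIMEOUT_MS=//p' "$conf" | tail -n 1 | tr -d '"'
}
```

Running `o2cb_idle_timeout` on each node and comparing the results would show quickly whether one node still carries the old 10-second value.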
Hi,

Masanari Iida wrote:
> In my case, the symptom (ocfs2 fails to mount a volume listed in
> /etc/fstab) happens when I reboot the system.
> Even when the fstab mount fails, I can mount the volume later,
> after I log in to the system. So it could be some kind of timing issue.
>
> Your advice (mount and check "dmesg") seems to be a manual procedure.
> I would like to know how and where I can set up "debugfs.ocfs2"
> so that it runs just before the ocfs2 mount at boot.

Are you sure your network device has been started before ocfs2? Have you added _netdev in your fstab? See question 41 of the FAQ:

http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html
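The _netdev question can be checked mechanically. The sketch below lists every ocfs2 entry in an fstab file whose options column lacks _netdev. The function name `ocfs2_missing_netdev` is hypothetical, and the mount point in the example comment is invented for illustration.

```shell
#!/bin/sh
# Sketch: report ocfs2 fstab entries that do not carry the _netdev option.
# ocfs2_missing_netdev is a hypothetical helper name.
#
# An entry that would pass the check (mount point is an assumption):
#   /dev/EXTDISK/OCFS2  /mnt/ocfs2  ocfs2  _netdev  0 0

ocfs2_missing_netdev() {
    fstab=${1:-/etc/fstab}
    # Skip comment lines, keep ocfs2 entries, and print device and mount
    # point when the comma-separated option list has no _netdev item.
    awk '$1 !~ /^#/ && $3 == "ocfs2" && ("," $4 ",") !~ /,_netdev,/ { print $1, $2 }' "$fstab"
}
```

No output means every ocfs2 entry already has _netdev; any line printed names a volume that may be mounted before the network is up.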
Check out this bugzilla:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=838