Hello,

I'm running OCFS2 on 27 nodes with 2 devices (2 Fibre Channel disk array storage units) on a Debian system:

  vanilla kernel 2.6.38.2
  ocfs2-tools 1.6.3-1

Sometimes, when I want to mount device1 after a reboot, I can't:

(mount.ocfs2,9543,2):dlm_join_domain:1857 Timed out joining dlm domain
EA9679D689F64044BFBCDF0D2F7BCDF0 after 94000 msecs

The other nodes have already mounted device1 and have heavy I/O on it.
The node that wants to mount device1 has already mounted device2.

Any help is welcome.
Thank you.

See the o2cb file:

cat /etc/default/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running 'dpkg-reconfigure ocfs2-tools'.
# Please use that method to modify this file.
#

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=bigstock

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=60000

# O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
O2CB_KEEPALIVE_DELAY_MS=4000

# O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
O2CB_RECONNECT_DELAY_MS=4000

-- 
Christophe Bouder.
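To make the timeouts in the /etc/default/o2cb above easier to compare with the 94000 msecs in the error message, here is a minimal sketch that converts them to seconds. It assumes the commonly documented O2CB relationship in which the disk heartbeat runs every 2 seconds and a node is declared dead after (threshold - 1) iterations; that interval is an assumption, not something stated in this thread. The function names `parse_o2cb`, `heartbeat_dead_seconds` and `summarize` are made up for illustration and are not part of ocfs2-tools.

```python
# Values copied from the /etc/default/o2cb shown in the message above.
O2CB_DEFAULTS = """\
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=bigstock
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=60000
O2CB_KEEPALIVE_DELAY_MS=4000
O2CB_RECONNECT_DELAY_MS=4000
"""


def parse_o2cb(text):
    """Return a dict of KEY -> value for each KEY=value line, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def heartbeat_dead_seconds(threshold, interval_s=2):
    """Disk heartbeat timeout in seconds.

    Assumption: one heartbeat iteration every interval_s seconds, and a node
    is considered dead after (threshold - 1) missed iterations.
    """
    return (threshold - 1) * interval_s


def summarize(settings):
    """Convert the O2CB timeouts to seconds for easy comparison."""
    return {
        "disk_heartbeat_timeout_s": heartbeat_dead_seconds(
            int(settings["O2CB_HEARTBEAT_THRESHOLD"])
        ),
        "network_idle_timeout_s": int(settings["O2CB_IDLE_TIMEOUT_MS"]) / 1000,
        "keepalive_delay_s": int(settings["O2CB_KEEPALIVE_DELAY_MS"]) / 1000,
        "reconnect_delay_s": int(settings["O2CB_RECONNECT_DELAY_MS"]) / 1000,
    }


settings = parse_o2cb(O2CB_DEFAULTS)
summary = summarize(settings)
for name, seconds in summary.items():
    print(f"{name}: {seconds}")
```

Under that assumption, the settings above work out to a 120 second disk heartbeat timeout and a 60 second network idle timeout. The same file is normally expected to be identical on every node, so running this on each node and comparing the results is one cheap check.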
Is this during boot, or is the mount manual?
Does it succeed on a second attempt?

On 04/22/2011 06:33 AM, Christophe BOUDER wrote:
> sometimes when i want to mount the device1
> after a reboot i can't :
>
> (mount.ocfs2,9543,2):dlm_join_domain:1857 Timed out joining dlm domain
> EA9679D689F64044BFBCDF0D2F7BCDF0 after 94000 msecs
>
> the other nodes have already mounted device1
> and have heavy I/O access on it.
> The node which want to mount device1 have already mounted device2.
> hi,how about ocfs2? i have setup a 25 nodes cluster, but often some nodes
> dead, and panic.

Yes, sometimes, but my environment has a heavy load on each node, using huge amounts of data.

Now I have:

 mount.ocfs2: Unknown code B 0 while mounting /dev/sda1 on /home.

I think I must reboot one of the live nodes, but which one?

> thanks
>
> At 2011-04-23 13:16:35 "Christophe BOUDER" <Christophe.Bouder at lip6.fr>
> wrote:
>
>>> Is this during boot or is the mount manual?
>>
>>during boot and on manual mount.
>>
>>> Does it succeed on second attempt?
>>
>>no, it does not succeed .

-- 
Christophe .
> give more info, such as dmesg,messages. returned info when mount.

I have only:

(mount.ocfs2,9543,2):dlm_join_domain:1857 Timed out joining dlm domain
EA9679D689F64044BFBCDF0D2F7BCDF0 after 94000 msecs

But now it has crashed: I have a kernel panic on 2 nodes.
See the two attached PNG files.

-- 
Christophe Bouder,
UPMC - LIP6 - CNRS
Bureau 2526-4-04 BP 169
4 place Jussieu, 75252 Paris Cedex 05, France
tel: 0144273718 fax: 0144277000

-------------- next part --------------
A non-text attachment was scrubbed...
Name: crash2.png
Type: image/png
Size: 28525 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110504/d1f1c09c/attachment-0002.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crash1.png
Type: image/png
Size: 31850 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110504/d1f1c09c/attachment-0003.png
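Since this message asks for dmesg and messages output, here is a small sketch of one way to cut a kernel log down to the lines that concern the cluster stack before posting it. The keyword list (`ocfs2`, `o2net`, `o2hb`, `o2cb`, `dlm`) is only a guess at useful substrings, not an official list of log prefixes, and `cluster_lines` is a made-up helper name. In the sample, only the first line comes from this thread; the second is invented filler to show the filtering.

```python
import re

# Assumption: these substrings cover most messages from the OCFS2 cluster
# stack. Extend or narrow the list to match what the nodes actually log.
CLUSTER_PATTERN = re.compile(r"ocfs2|o2net|o2hb|o2cb|dlm", re.IGNORECASE)


def cluster_lines(log_text):
    """Return only the log lines that mention the OCFS2 cluster stack."""
    return [line for line in log_text.splitlines() if CLUSTER_PATTERN.search(line)]


# First line: the error quoted in this thread.
# Second line: invented filler, not from any real log.
sample_log = (
    "(mount.ocfs2,9543,2):dlm_join_domain:1857 Timed out joining dlm domain "
    "EA9679D689F64044BFBCDF0D2F7BCDF0 after 94000 msecs\n"
    "unrelated filler line about something else\n"
)

matches = cluster_lines(sample_log)
for line in matches:
    print(line)
```

In practice the text passed to `cluster_lines` would be the saved output of dmesg or the contents of the system log from both the node that fails to mount and a node that already has the volume mounted.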
> i have the same error msg.(fedora12,2.6.32.23,ocfs2-1.5, 25 nodes)
> 1.unknown code B 0
> 2.fsck.ocfs2
> fsck.ocfs2 1.4.3
> fsck.ocfs2: Could not create domain while initializing the DLM
> 3.i have fsck..ocfs2 -b 2 /dev/sdb . if i am wrong. i do not know
> which is superblock?
> debugfs.ocfs2 -R slotmap /dev/sdb , it just show 3 nodes? if i destroy
> slotmap block?

I was luckier: I halted all the nodes, then rebooted them one by one, and it's OK.
Now I'm running with 9 nodes to see what happens.
But it's true that I have a very heavy load on each node.

-- 
Christophe Bouder