Jürgen Herrmann
2010-Apr-12 12:53 UTC
[Ocfs2-users] ocfs2/o2cb problem with openais/pacemaker
hi! i'm on debian lenny and trying to run ocfs2 on a dual primary drbd device. the drbd device is already set up as msDRBD0. to get dlm_controld.pcmk i installed it from source (from cluster-suite-3.0.10) now i configured a resource "resDLM" with 2 clones: primitive resDLM ocf:pacemaker:controld op monitor interval="120s" clone cloneDLM resDLM meta globally-unique="false" interleave="true" colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start -> seems to work. to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source. after adding the resource: primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s" clone cloneO2CB resO2CB meta globally-unique="false" interleave="true" colocation colO2CB_DLM inf: cloneO2CB cloneDLM order ordDLM_O2CB inf: cloneDLM cloneO2CB i get the following errors in crm_mon: =====================================Failed actions: resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1, status=complete): unknown error resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1, status=complete): unknown error the relevant syslog entries: ===========================Apr 12 13:15:18 app1a corosync[4638]: [pcmk ] info: pcmk_notify: Enabling node notifications for child 8311 (0xd83090) Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device: Unable to access cluster service if i start "ocfs2_controld.pcmk -D" i get: =========================================ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: Creating connection to our AIS plugin ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS connection established ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765 ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de 1271072439 setup_stack at 168: Cluster connection established. Local node id: 569559765 1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5 1271072439 setup_ckpt at 609: Initializing CKPT service (try 1) 1271072439 setup_ckpt at 615: Connected to CKPT service with handle 0x327b23c600000000 1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:21f2cad5" with handle 0x6633487300000000 1271072439 call_section_write at 340: Writing to section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_section_create at 292: Creating section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_section_create at 300: Created section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" 1271072439 call_section_write at 340: Writing to section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_section_create at 292: Creating section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_section_create at 300: Created section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" 1271072439 start_join at 588: Starting join for group "ocfs2:controld" 1271072439 start_join at 592: cpg_join succeeded 1271072439 loop at 975: setup done ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch: Membership 156: quorum acquired ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new) ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981 ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312 1271072439 confchg_cb at 495: confchg called 1271072439 daemon_change at 398: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 0, joined 1 1271072439 cpg_joined at 909: CPG is live, we are the first daemon 1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld" (try 1) 1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld" with handle 0x2ae8944a00000001 1271072439 call_section_write at 340: Writing to section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1) 1271072439 call_section_create at 292: Creating section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1) 1271072439 call_section_create at 300: Created section "daemon_protocol" on checkpoint "ocfs2:controld" 1271072439 call_section_write at 340: Writing to section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1) 1271072439 call_section_create at 292: Creating section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1) 1271072439 call_section_create at 300: Created section "ocfs2_protocol" on checkpoint "ocfs2:controld" 1271072439 cpg_joined at 923: Daemon protocol is 1.0 1271072439 cpg_joined at 925: fs protocol is 1.0 1271072439 cpg_joined at 927: Connecting to dlm_controld 1271072439 cpg_joined at 934: Opening control device 1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service 1271072439 exit_dlmcontrol at 363: Closing dlm_controld connection 1271072439 start_leave at 613: leaving group "ocfs2:controld" 1271072439 start_leave at 626: cpg_leave succeeded 1271072439 exit_cpg at 760: closing cpg connection 1271072439 call_ckpt_close at 240: Closing checkpoint "ocfs2:controld:21f2cad5" (try 1) 1271072439 call_ckpt_close at 246: Closed checkpoint "ocfs2:controld:21f2cad5" 1271072439 exit_ckpt at 643: Disconnecting from CKPT service (try 1) 1271072439 exit_ckpt at 647: Disconnected from CKPT service 1271072439 exit_stack at 144: closing pacemaker connection ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: terminate_ais_connection: Disconnected from AIS obviously ocfs2_controld.pcmk can connect to the openais CKPT service and to dlm_controld.pcmk, which then terminates the connection. here's the output from dlm_controld.pcmk -q 0 -D: (the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!) ======================================================================1271072755 dlm_controld 3.0.10 started cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: Creating connection to our AIS plugin cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS connection established cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765 cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de 1271072755 found /dev/misc/dlm-control minor 58 1271072755 found /dev/misc/dlm-monitor minor 57 1271072755 found /dev/misc/dlm_plock minor 56 1271072755 /dev/misc/dlm-monitor fd 9 1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2 1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2 1271072755 confdb_key_get error 11 1271072755 group_mode 3 compat 0 1271072755 setup_cpg_daemon 11 1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765 left 1271072755 run protocol from nodeid 586336981 1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1 1271072755 plocks 13 1271072755 plock cpg message size: 104 bytes cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership 156: quorum acquired cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161) (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new) cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981 cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162) votes=1 born=148 seen=156 proc=00000000000000000000000000013312 1271072755 Processing membership 156 1271072755 Adding address ip(213.202.242.161) to configfs for node 569559765 1271072755 set_configfs_node 569559765 213.202.242.161 local 1 1271072755 Added active node 569559765: born-on=156, last-seen=156, this-event=156, last-event=0 1271072755 Adding address ip(213.202.242.162) to configfs for node 586336981 1271072755 set_configfs_node 586336981 213.202.242.162 local 0 1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0 1271072763 client connection 5 fd 14 1271072763 connection 5 read error -1 1271072776 client connection 5 fd 14 1271072776 connection 5 read error -1 1271072779 client connection 5 fd 14 1271072779 connection 5 read error -1 i'm pretty lost at the moment, as there's nothing i can find via google regarding the "core" problem: 1271072439 cpg_joined at 934: Opening control device 1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service any help would be greatly appreciated. best regards, j?rgen herrmann -->> XLhost.de - eXperts in Linux hosting ? <<XLhost.de GmbH J?rgen Herrmann, Gesch?ftsf?hrer Boelckestrasse 21, 93051 Regensburg, Germany Gesch?ftsf?hrer: Volker Geith, J?rgen Herrmann Registriert unter: HRB9918 Umsatzsteuer-Identifikationsnummer: DE245931218 Fon: +49 (0)800 XLHOSTDE [0800 95467833] Fax: +49 (0)800 95467830 WEB: http://www.XLhost.de IRC: #XLhost at irc.quakenet.org
Reasonably Related Threads
- failed: Permission denied missing +w perm: /home/mail_virtual/2001
- ocfs2_controld.cman
- mount.ocfs2: Invalid argument while mounting /dev/mapper/xenconfig_part1 on /etc/xen/vm/. Check 'dmesg' for more information on this error.
- Asterisk Meetme & MeetMeAdmin cmd info-use
- Patch to Pacemaker hooks in ocfs2_controld