Hi.
I'm pretty new to ocfs2 and clusters.
I'm trying to get ocfs2 running on top of a drbd device.
I know it's not the best solution, but for now I have to deal with it.
I set up drbd and it works perfectly.
I set up ocfs2, but I'm not able to get it to work.

/etc/init.d/o2cb status:

Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster mail: Online
Checking heartbeat: Not active

strace -f mount -t ocfs2 /dev/drbd1 /mnt/mail:

[...]
[pid 3875] open("/dev/drbd1", O_RDWR|O_LARGEFILE) = 3
[pid 3875] open("/sys/kernel/config/cluster/mail/heartbeat/609B5AA47550431CB3EB010C41D14312/dev", O_WRONLY) = 4
[pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)
[pid 3875] close(4) = 0
[pid 3875] close(3) = 0
[pid 3875] rmdir("/sys/kernel/config/cluster/mail/heartbeat/609B5AA47550431CB3EB010C41D14312") = 0
[pid 3875] semop(98306, 0xbfd4ec9e, 1) = 0
[pid 3875] write(2, "ocfs2_hb_ctl", 12ocfs2_hb_ctl) = 12
[pid 3875] write(2, ": ", 2: ) = 2
[pid 3875] write(2, "I/O error on channel", 20I/O error on channel) = 20
[pid 3875] write(2, " ", 1 ) = 1
[pid 3875] write(2, "while starting heartbeat", 24while starting heartbeat) = 24
[pid 3875] write(2, "\r\n", 2
) = 2
[pid 3875] rt_sigprocmask(SIG_UNBLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
[pid 3875] exit_group(1) = ?
Process 3874 resumed
Process 3875 detached
[pid 3874] <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 3875
[pid 3874] rt_sigprocmask(SIG_UNBLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
[pid 3874] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 3874] write(2, "mount.ocfs2", 11mount.ocfs2) = 11
[pid 3874] write(2, ": ", 2: ) = 2
[pid 3874] write(2, "Error when attempting to run /sb"..., 74Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted") = 74
[pid 3874] write(2, "\r\n", 2
) = 2
[pid 3874] exit_group(1) = ?
Process 3873 resumed
Process 3874 detached
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 3874
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(1) = ?
Process 3873 detached

I think the problem is here ([pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)), but my previous system had no such problem. The only thing I have changed since then is heartbeat2, which I think has nothing to do with ocfs2... or does it?
I tried different kernels and different drbd versions as well.
How can I get it to work?
Thanks

Pier
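
For anyone reading a long trace like the one above, the failing calls can be picked out mechanically. The following is only a rough sketch, assuming strace's usual "= -1 ERRNO (message)" suffix on failed calls; the helper name failed_syscalls and the regular expression are made up for this illustration and are not part of strace or ocfs2-tools.

```python
import re

# Assumed format of a failed call in strace output:
#   name(args) = -1 ERRNO (human readable message)
FAILED_CALL = re.compile(
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)"
)


def failed_syscalls(trace_lines):
    """Return (syscall, errno_name, message) for every failed call found."""
    results = []
    for line in trace_lines:
        match = FAILED_CALL.search(line)
        if match:
            results.append(
                (match.group("call"), match.group("errno"), match.group("msg"))
            )
    return results


# Three lines copied from the trace in this message.
sample = [
    '[pid 3875] open("/dev/drbd1", O_RDWR|O_LARGEFILE) = 3',
    '[pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)',
    '[pid 3875] close(4) = 0',
]

for call, err, msg in failed_syscalls(sample):
    print(f"{call} failed with {err}: {msg}")
```

On these three lines only the write() to the heartbeat region's dev file is reported, which is the call suspected in the message.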
What version of the kernel?

Are there any relevant error messages in /var/log/messages?

The error in question is cropping up from o2hb_region_dev_write() in fs/ocfs2/cluster/heartbeat.c. The messages may give us more information.

Pierguido wrote:
> Hi.
> I'm pretty new to ocfs2 and clusters.
> I'm trying to make ocfs2 running over a drbd device.
> [...]
> How can i make it to work?
> Thanks
>
> Pier
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
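
To make the failing step concrete: in the trace, ocfs2_hb_ctl opens the block device (fd 3), opens the region's dev file under configfs (fd 4), and writes the device's fd number as text ("3") into it. That write is what reaches o2hb_region_dev_write() in the kernel. Below is a minimal sketch of just those three visible calls, not the actual ocfs2_hb_ctl source; the function name write_device_fd_to_region is hypothetical, and whatever the tool does in the elided part of the trace is not reproduced here.

```python
import os


def write_device_fd_to_region(device_path, region_dev_attr):
    """Open the device, then pass its fd number to the region's 'dev' file.

    Mirrors only the open/open/write sequence visible in the strace output.
    An OSError (for example errno EIO) propagates if the write is rejected.
    Returns the fd number that was written.
    """
    dev_fd = os.open(device_path, os.O_RDWR)
    try:
        attr_fd = os.open(region_dev_attr, os.O_WRONLY)
        try:
            # The fd number is sent as ASCII text, e.g. b"3" in the trace.
            os.write(attr_fd, str(dev_fd).encode("ascii"))
        finally:
            os.close(attr_fd)
    finally:
        os.close(dev_fd)
    return dev_fd
```

The function is deliberately not called here. Pointing it at a real /sys/kernel/config/cluster/... path would try to start a heartbeat region, so it is meant for reading, or for trying against scratch files.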
Sunil Mushran wrote:
> What version of the kernel?

I tried the standard Debian kernel 2.6.18-5-686 and vanilla 2.6.20.17. Both give the same result.

> Are there any relevant error messages in /var/log/messages.

In syslog I get:

Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_setup_one_bio:290 ERROR: Error adding page to bio i = 7, vec_len = 4096, len = 0
Sep 5 10:54:57 srv-cluster-1 kernel: , start = 0
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_read_slots:385 ERROR: status = -5
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_populate_slot_data:1299 ERROR: status = -5
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_region_dev_write:1399 ERROR: status = -5

> The error in question is cropping up from o2hb_region_dev_write()
> in fs/ocfs2/cluster/heartbeat.c. The messages may give us more
> information.

Is there any specific part that I can debug (with debug.ocfs2)?

Thank you
Pier
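
The "status = -5" lines can be read as a call chain: kernel functions conventionally return negative errno values, and 5 is EIO, the same error the write() in the strace output received. A small sketch for decoding such lines follows, assuming the "(pid,cpu):function:line ERROR: status = N" layout seen in the log above; the helper name decode_status_lines is invented for this example.

```python
import errno
import re

# Assumed layout, taken from the log lines quoted in this thread:
#   (pid,cpu):function:line ERROR: status = N
STATUS_LINE = re.compile(
    r"\(\d+,\d+\):(?P<func>\w+):(?P<line>\d+) ERROR: status = (?P<status>-?\d+)"
)


def decode_status_lines(log_lines):
    """Return (function, source_line, status, errno_name) per status line."""
    results = []
    for text in log_lines:
        match = STATUS_LINE.search(text)
        if not match:
            continue  # e.g. the free-form "Error adding page to bio" line
        status = int(match.group("status"))
        name = errno.errorcode.get(abs(status), "unknown")
        results.append(
            (match.group("func"), int(match.group("line")), status, name)
        )
    return results


sample = [
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_setup_one_bio:290 ERROR: Error adding page to bio i = 7, vec_len = 4096, len = 0",
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_read_slots:385 ERROR: status = -5",
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_populate_slot_data:1299 ERROR: status = -5",
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_region_dev_write:1399 ERROR: status = -5",
]

for func, line, status, name in decode_status_lines(sample):
    print(f"{func}:{line} returned {status} ({name})")
```

Read in log order, the first failure is in o2hb_setup_one_bio ("Error adding page to bio"), and the -5 then propagates through o2hb_read_slots and o2hb_populate_slot_data up to o2hb_region_dev_write.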