Hi.
I'm pretty new to ocfs2 and clusters.
I'm trying to get ocfs2 running on top of a drbd device.
I know it's not the best solution, but for now I have to deal with it.
I set up drbd and it works perfectly.
I set up ocfs2 but I'm not able to get it to work.
/etc/init.d/o2cb status:
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking cluster mail: Online
Checking heartbeat: Not active
strace -f mount -t ocfs2 /dev/drbd1 /mnt/mail:
[...]
[pid 3875] open("/dev/drbd1", O_RDWR|O_LARGEFILE) = 3
[pid 3875] open("/sys/kernel/config/cluster/mail/heartbeat/609B5AA47550431CB3EB010C41D14312/dev", O_WRONLY) = 4
[pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)
[pid 3875] close(4) = 0
[pid 3875] close(3) = 0
[pid 3875] rmdir("/sys/kernel/config/cluster/mail/heartbeat/609B5AA47550431CB3EB010C41D14312") = 0
[pid 3875] semop(98306, 0xbfd4ec9e, 1) = 0
[pid 3875] write(2, "ocfs2_hb_ctl", 12ocfs2_hb_ctl) = 12
[pid 3875] write(2, ": ", 2: ) = 2
[pid 3875] write(2, "I/O error on channel", 20I/O error on channel) = 20
[pid 3875] write(2, " ", 1 ) = 1
[pid 3875] write(2, "while starting heartbeat", 24while starting heartbeat) = 24
[pid 3875] write(2, "\r\n", 2
) = 2
[pid 3875] rt_sigprocmask(SIG_UNBLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
[pid 3875] exit_group(1) = ?
Process 3874 resumed
Process 3875 detached
[pid 3874] <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 3875
[pid 3874] rt_sigprocmask(SIG_UNBLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
[pid 3874] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 3874] write(2, "mount.ocfs2", 11mount.ocfs2) = 11
[pid 3874] write(2, ": ", 2: ) = 2
[pid 3874] write(2, "Error when attempting to run /sb"..., 74Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted") = 74
[pid 3874] write(2, "\r\n", 2
) = 2
[pid 3874] exit_group(1) = ?
Process 3873 resumed
Process 3874 detached
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 3874
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(1) = ?
Process 3873 detached
I think the problem is here ([pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)), but my previous system had no problem. The only thing I changed since then was heartbeat2, but I think that has nothing to do with ocfs2... or does it?
I tried different kernels and different drbd versions as well.
How can I get it to work?
Thanks
Pier
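For readers following along, here is a minimal sketch of how the failing call could be picked out of a long strace log automatically rather than by eye. It assumes the log has one syscall per line in the usual `name(args) = -1 ERRNO (message)` form; the regular expression, the function name `failed_syscalls` and the `sample` list are illustrative and not part of any ocfs2 or strace tooling.

```python
import re

# Matches strace lines such as:
#   [pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)
# The "[pid N]" prefix is optional because strace omits it for the
# top-level process.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\)\s*$"
)


def failed_syscalls(lines):
    """Yield (pid, syscall, errno_name, message) for each failed call."""
    for line in lines:
        m = FAILED_CALL.match(line.strip())
        if m:
            yield m.group("pid"), m.group("call"), m.group("errno"), m.group("msg")


# Three lines taken from the trace above.
sample = [
    '[pid 3875] open("/dev/drbd1", O_RDWR|O_LARGEFILE) = 3',
    '[pid 3875] write(4, "3", 1) = -1 EIO (Input/output error)',
    '[pid 3875] close(4) = 0',
]

for pid, call, err, msg in failed_syscalls(sample):
    print(f"pid {pid}: {call}() failed with {err} ({msg})")
```

On the sample lines only the write() to the heartbeat region's dev file is reported, which is the same call singled out in the message above.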
What version of the kernel? Are there any relevant error messages in /var/log/messages. The error in question is cropping up from o2hb_region_dev_write() in fs/ocfs2/cluster/heartbeat.c. The messages may give us more information.
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
Sunil Mushran wrote:
> What version of the kernel?
I tried with the standard Debian kernel 2.6.18-5-686 and with vanilla 2.6.20.17. Both give the same result.
> Are there any relevant error messages in /var/log/messages.
In syslog I get:
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_setup_one_bio:290 ERROR: Error adding page to bio i = 7, vec_len = 4096, len = 0
Sep 5 10:54:57 srv-cluster-1 kernel: , start = 0
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_read_slots:385 ERROR: status = -5
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_populate_slot_data:1299 ERROR: status = -5
Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_region_dev_write:1399 ERROR: status = -5
> The error in question is cropping up from o2hb_region_dev_write()
> in fs/ocfs2/cluster/heartbeat.c. The messages may give us more
> information.
Is there any specific part that I can debug (with debug.ocfs2)?
Thank you
Pier
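As a side note on reading those syslog lines: kernel functions conventionally return negative errno values, so "status = -5" is errno 5, which is EIO on Linux and matches the EIO that strace reported for the write(). The sketch below decodes the status lines and lists the functions the error passed through. The regular expression and the names `STATUS_LINE` and `decode_status` are illustrative only, and the pattern assumes the log format shown in this message.

```python
import errno
import re

# Matches the o2hb status lines quoted in this message, e.g.
#   ... kernel: (8129,0):o2hb_read_slots:385 ERROR: status = -5
STATUS_LINE = re.compile(
    r"\(\d+,\d+\):(?P<func>\w+):(?P<line>\d+) ERROR: status = (?P<status>-?\d+)"
)


def decode_status(log_lines):
    """Return (function, source_line, errno_name) for each status line."""
    decoded = []
    for text in log_lines:
        m = STATUS_LINE.search(text)
        if m:
            # The kernel reports errors as negative errno values.
            code = -int(m.group("status"))
            name = errno.errorcode.get(code, "unknown")
            decoded.append((m.group("func"), int(m.group("line")), name))
    return decoded


log = [
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_read_slots:385 ERROR: status = -5",
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_populate_slot_data:1299 ERROR: status = -5",
    "Sep 5 10:54:57 srv-cluster-1 kernel: (8129,0):o2hb_region_dev_write:1399 ERROR: status = -5",
]

for func, line, name in decode_status(log):
    print(f"{func} (line {line}): {name}")
```

Read top to bottom, the output follows the error from o2hb_read_slots up to o2hb_region_dev_write, the function named in the earlier reply. The first message, from o2hb_setup_one_bio, has no status field and is skipped by this pattern.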