张俊
2014-Feb-26 09:11 UTC
[Ocfs2-users] Large syslog created after cluster node logout iSCSI LUN
Hi everyone,

I have run into an OCFS2 issue. The OS is Oracle Linux 6.5, using the latest Oracle UEK kernel, 3.8.13-26.1.1.el6uek.x86_64. There are two nodes in the OCFS2 cluster, and both nodes use an iSCSI SAN as shared storage. The OCFS2 cluster uses global heartbeat mode. There are three iSCSI LUNs: one is used as the heartbeat device, and the other two are formatted as OCFS2 volumes with mkfs.ocfs2 and mounted on each node.

The problem occurs when I intentionally log out of one iSCSI LUN (an OCFS2 volume) with the command:

    iscsiadm -m node -T xxx -u

After 5 minutes or more, a large number of identical messages start being written to the syslog (/var/log/messages). The contents are as below:

    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    Feb 26 16:06:44 tony kernel: (kworker/u:0,5141,0):ocfs2_dir_foreach_blk_id:1778 ERROR: Unable to read inode block for dir 520
    ...

The syslog file grows quickly and eventually fills all the remaining space on the / filesystem, which leaves the host blocked and unresponsive.

According to the error logs, the message is logged by the function ocfs2_dir_foreach_blk_id in the source file fs/ocfs2/dir.c:

    static int ocfs2_dir_foreach_blk_id(struct inode *inode,
                                        u64 *f_version,
                                        loff_t *f_pos, void *priv,
                                        filldir_t filldir, int *filldir_err)
    {
            int ret, i, filldir_ret;
            unsigned long offset = *f_pos;
            struct buffer_head *di_bh = NULL;
            struct ocfs2_dinode *di;
            struct ocfs2_inline_data *data;
            struct ocfs2_dir_entry *de;

            ret = ocfs2_read_inode_block(inode, &di_bh);
            if (ret) {
                    mlog(ML_ERROR, "Unable to read inode block for dir %llu\n",
                         (unsigned long long)OCFS2_I(inode)->ip_blkno);
                    goto out;
            }

            di = (struct ocfs2_dinode *)di_bh->b_data;
            data = &di->id2.i_data;
            ...

I can disable mlog(ML_ERROR) logging with the command:

    debugfs.ocfs2 -l ERROR off

but then a kernel thread occupies a large share of the CPU, and it cannot be killed:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     5141 root      20   0     0    0    0 R 97.2  0.0  33:03.89 kworker/u:0
     2464 root      20   0  193m  28m 6212 S  1.0  2.8   0:19.48 Xorg
     3331 root      20   0  289m 8972 4944 S  0.7  0.9   0:06.58 gnome-terminal
     2941 root      20   0  130m 4804 1512 S  0.3  0.5   0:00.29 gconfd-2
     2990 root      20   0  299m 7268 5136 S  0.3  0.7   0:03.71 wnck-applet
     3056 root      20   0  272m 6572 4092 S  0.3  0.6   0:00.21 notification-da
     6073 root      20   0 15088 1196  852 R  0.3  0.1   0:00.36 top

If I umount the mounted OCFS2 volume within 5 minutes of the logout, the problem does not happen, and the volume can be re-mounted successfully. After 5 minutes or more, however, the OCFS2 volume can no longer be unmounted: the umount process hangs.
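
To put a number on the flood described above, the identical kernel messages can be counted per timestamp. The following is only a rough sketch in Python, not part of any OCFS2 tooling: the function names summarize_kernel_messages and top_repeats are made up for illustration, and the regular expression assumes the classic "Mon DD HH:MM:SS host kernel: ..." syslog prefix seen in the excerpt.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpt): "Feb 26 16:06:44 tony kernel: <message>"
SYSLOG_KERNEL_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s*(.*)$"
)


def summarize_kernel_messages(lines):
    """Count how often each (timestamp, message) pair occurs in the given lines."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_KERNEL_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a kernel line in the assumed format; skip it
        timestamp, _host, message = match.groups()
        counts[(timestamp, message)] += 1
    return counts


def top_repeats(lines, limit=5):
    """Return the most frequently repeated (timestamp, message) pairs with their counts."""
    return summarize_kernel_messages(lines).most_common(limit)
```

Passing an open file object for /var/log/messages (or a copy of it) to top_repeats would list the most repeated kernel messages per second, which gives a rough idea of how quickly the / filesystem will fill.
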
Even if I reconnect the iSCSI LUN, the mount operation also hangs, and the OCFS2 volume cannot be mounted anymore.

This may be a bug in OCFS2. For now I have to reboot the host to recover. Has this issue already been fixed, or is there any other way to avoid it?

Thanks a lot!

Tony Zhang