Gang He
2016-May-13 08:36 UTC
[Ocfs2-devel] inconsistent dlm_new_lockspace LVB_LEN size from ocfs2 user-space tool and ocfs2 kernel module
Hello Guys,

There is an inconsistent LVB_LEN size problem when a new lockspace is created from a user-space tool (e.g. fsck.ocfs2) versus the kernel module (e.g. ocfs2/stack_user.c).
From the user-space tool, the LVB size is DLM_USER_LVB_LEN (32 bytes, defined in /include/linux/dlm_device.h).
From the kernel module, the LVB size is DLM_LVB_LEN (64 bytes).

Why was it designed like this? The GFS2 kernel module code uses 32 bytes as its LVB_LEN size, the same size as the DLM_USER_LVB_LEN macro definition.

We have now encountered a customer issue: the user ran fsck on an ocfs2 file system from one node, but it was aborted without releasing its lockspace (32 bytes), and then the user mounted the file system. The kernel module reused the existing lockspace instead of creating a new lockspace with a 64-byte LVB_LEN. As a result, the user could no longer mount the file system from the other nodes.

The error messages look like this:

Apr 26 16:29:16 mapkhpch1bl02 kernel: [ 3730.430947] dlm: 032F55597DEA4A61AB065568F964174D: config mismatch: 64,0 nodeid 177127961: 32,0
Apr 26 16:29:16 mapkhpch1bl02 kernel: [ 3730.433267] (mount.ocfs2,26981,46):ocfs2_dlm_init:2995 ERROR: status = -71
Apr 26 16:29:16 mapkhpch1bl02 kernel: [ 3730.433325] (mount.ocfs2,26981,46):ocfs2_mount_volume:1881 ERROR: status = -71
Apr 26 16:29:16 mapkhpch1bl02 kernel: [ 3730.433376] (mount.ocfs2,26981,46):ocfs2_fill_super:1236 ERROR: status = -71
Apr 26 16:29:16 mapkhpch1bl02 Filesystem(MITC_Pool1)[26912]: ERROR: Couldn't mount filesystem /dev/disk/by-id/scsi-3600507640081010d5000000000000082 on /MITC_Pool1

Of course, the urgent fix is easy: we can reboot all the nodes and then mount the file system again. But I want to know whether there was a reason for this design; otherwise, I want to see if we can use the same size in user space and in the kernel module.

Thanks
Gang
David Teigland
2016-May-13 16:07 UTC
[Ocfs2-devel] [Cluster-devel] inconsistent dlm_new_lockspace LVB_LEN size from ocfs2 user-space tool and ocfs2 kernel module
On Fri, May 13, 2016 at 02:36:25AM -0600, Gang He wrote:
> Here is a inconsistent LVB_LEN size problem when create a new lockspace
> from user-space tool (e.g. fsck.ocfs2) and kernel module (e.g.
> ocfs2/stack_user.c).
> From the userspace tool, the LVB size is DLM_USER_LVB_LEN (32 bytes,
> defined in /include/linux/dlm_device.h) From the kernel module, the LVB
> size is DLM_LVB_LEN (64 bytes).

Yes.

> Why did we design like this? Look at GFS2 kernel module code, it uses 32
> bytes as LVB_LEN size, it is the same size with DLM_USER_LVB_LEN macro
> definition.

The lvb length was originally a constant 32 bytes, and was made variable after the dlm user interface existed. The variable length lvb could not be added to the existing user interface. (The dlm user interface is terrible and a new version has been needed for many years, but it's not used much, so it's not been worth the effort.)

> Now, We encountered a customer issue, the user did a fsck
> on a ocfs2 file system from one node, but aborted without release this
> lockspace (32bytes), then the user mounted this file system. The kernel
> module would use the existing same lockspace, without creating the new
> lockspace with 64 bytes LVB_LEN. Next, the bad result was that the user
> could not mount this file system from the other nodes no longer.

> The error messages likes,
> config mismatch: 64,0 nodeid 177127961: 32,0

> Of course, the urgent fix is easy, we can reboot all the nodes, then
> mount the file system again. But, I want to if there were some reasons
> about this design, otherwise, I want to see if we can use the same size
> between user space and kernel module.

Sorry, I think the only way around this is to ensure that lockspaces are created from the kernel.

Dave
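
For readers following the thread, the failure can be modeled with a small stand-alone sketch. This is not the dlm source: the struct, the function name check_config, and the macro names USER_LVB_LEN and KERNEL_LVB_LEN are hypothetical stand-ins for the 32-byte and 64-byte sizes quoted above. It assumes that a joining node compares its lvb length and flags with those reported by an existing member and fails with EPROTO on a difference, which would be consistent with the "config mismatch: 64,0 nodeid 177127961: 32,0" line and with status = -71 (EPROTO is 71 on Linux).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the sizes quoted in the thread. */
#define USER_LVB_LEN   32  /* lockspace created through the user interface */
#define KERNEL_LVB_LEN 64  /* lockspace created by the ocfs2 kernel module */

/* Simplified, assumed view of the per-lockspace settings being compared. */
struct ls_config {
    int lvblen;            /* lock value block length in bytes */
    unsigned int exflags;  /* extra lockspace flags */
};

/*
 * Compare the local settings with those reported by a remote node.
 * Returns 0 when they agree and -EPROTO when they differ, printing a
 * message in the same shape as the log line quoted in the thread.
 */
int check_config(const struct ls_config *local,
                 const struct ls_config *remote, int remote_nodeid)
{
    if (local->lvblen != remote->lvblen ||
        local->exflags != remote->exflags) {
        fprintf(stderr, "config mismatch: %d,%x nodeid %d: %d,%x\n",
                local->lvblen, local->exflags, remote_nodeid,
                remote->lvblen, remote->exflags);
        return -EPROTO;
    }
    return 0;
}
```

Under these assumptions, a node joining with { KERNEL_LVB_LEN, 0 } while node 177127961 still holds a lockspace created with { USER_LVB_LEN, 0 } gets -EPROTO, whereas two nodes that both created the lockspace from the kernel agree and get 0.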