Hello --

We're running a handful of OCFS2 clusters on Novell SuSE SLES 10 SP2. We are in front of IBM SVC storage, on HP blade hardware, via QLA 2xxx HBAs.

We have an application from IBM that makes use of files in this space in a grid-style environment. We are in the process of debugging some I/O issues and crashes, but in the meantime I'm wondering whether there is a good reference on what constitutes a solid starting point for tuning how many concurrent accesses to a directory are allowed, or whether there are specific tunables we need to change from their defaults.

There are some strange errors that I can't decipher:

Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock_common:128 ERROR: lockres F00000000000000003b1341b545b16f: Someone is calling dlmunlock while waiting for an ast!
<3>(4438,4):dlmunlock:685 ERROR: dlm status DLM_BADPARAM

Sep 25 12:11:30 host02 kernel: (4438,4):ocfs2_cancel_convert:3092 ERROR: Dlm error "DLM_BADPARAM" while calling dlmunlock on resource F00000000000000003b1341b545b16f: invalid lock mode specified

The symptom is that file access under the OCFS2 mountpoint gets gradually slower and slower until the system crashes or becomes unresponsive whenever anything touches files there (cd into the directory, etc.).

What we've done so far:

- Checked our multipath configuration -- all paths to our disks are shown, none offline, none failed.
- Checked our LVM configuration -- looks good as well.
- Checked our HBA configuration -- made some changes to retry and failover settings, but this has not improved the behavior.

Can anyone point me in the right direction, or help me know what questions to even start asking here? The problem seems related to multiple/concurrent access to directories within an OCFS2 filesystem, and to how the DLM is behaving.

Our OS/kernel is 2.6.16.60-0.42.5-smp (Novell SLES 10 SP2 + patches).

Thanks in advance...

Angelo
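As a rough sketch of the checks described above, something like the following could be run on each node. This is only an illustration: /dev/mapper/ocfs2vol is a placeholder device name, and the `run_check` helper is not a standard tool, just a wrapper so missing utilities are reported instead of erroring out.

```shell
#!/bin/sh
# Sketch of the storage-stack / DLM checks described above (SLES 10-era tools).
# /dev/mapper/ocfs2vol is a placeholder -- substitute your actual OCFS2 device.

run_check() {
    # Run a command if the binary exists; otherwise note that it is missing.
    label=$1; shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1 not installed on this host"
    fi
}

# Multipath: every path should be active; none should be failed or offline.
run_check "multipath paths" multipath -ll

# Per-volume DLM lock state: lock resources stuck Busy, or long convert
# queues, point at the DLM rather than at the storage path.
run_check "DLM lock state" debugfs.ocfs2 -R fs_locks /dev/mapper/ocfs2vol

# The o2cb cluster stack on SLES 10 is driven by an init script, not a binary:
echo "== o2cb cluster stack =="
[ -x /etc/init.d/o2cb ] && /etc/init.d/o2cb status || echo "o2cb init script not found"
```

Comparing the `fs_locks` output across nodes while the slowdown is building can show which lock resource (and which node) everything is queued behind.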
Ping Novell for issues on SLES 10. The error suggests that you are hitting Novell bz#524683, which has been addressed in ocfs2 1.4.4. Ask Novell for a PTF kernel with the fix.
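To see whether a node already carries a fixed build, one way (assuming the standard `modinfo` and `uname` tools; the exact version string format depends on how the kernel package was built) is:

```shell
# The ocfs2 kernel module version indicates whether the 1.4.4 fix is present;
# the running kernel release identifies the SLES patch level.
modinfo ocfs2 2>/dev/null | grep -i '^version' || echo "ocfs2 module not present on this host"
uname -r
```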
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users