Hello All,

I wrote earlier about our OCFS2 crash issue in KVM due to a bug in the SMP code.

For this we came up with a solution. Instead of using multiple vcpus:

<vcpu placement='static'>8</vcpu>

we use a single one with multiple cores:

<topology sockets='8' cores='8' threads='1'/>

and apply these key tuning options in sysctl.conf:

vm.min_free_kbytes=131072
vm.zone_reclaim_mode=1

This seemed to help: the fs did not crash right away when we were hammering it with Apache benchmarks of 10000 requests. However, last night I started a large rsync operation from a 5TB OCFS2 FS mounted in the VM to another OCFS2 FS mounted in the same VM and ended up with:

https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=

After trying a lot of different kernels, starting from the 3.x series, we are now using the latest kernel, 4.13.2, with the default configuration, but these issues are still present.

Is the OCFS2 project still being developed? With this crashing and unreliability it cannot be used in production unless you put in place a bunch of safeguards to reset the whole virtual machine when it crashes.

Thanks
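A note on the two libvirt fragments quoted above: a topology element describes sockets x cores x threads logical CPUs, so it may be worth checking that the product matches the vcpu count before comparing test runs. The sketch below is only an illustration using the two fragments from the message; the helper names `vcpu_count` and `topology_cpus` are hypothetical, and it does not claim anything about how libvirt or QEMU handles a mismatch.

```python
import xml.etree.ElementTree as ET

# The two fragments exactly as quoted in the message.
VCPU_XML = "<vcpu placement='static'>8</vcpu>"
TOPOLOGY_XML = "<topology sockets='8' cores='8' threads='1'/>"


def vcpu_count(vcpu_xml: str) -> int:
    """Return the vCPU count declared in a <vcpu> element."""
    return int(ET.fromstring(vcpu_xml).text)


def topology_cpus(topology_xml: str) -> int:
    """Return sockets * cores * threads for a <topology> element."""
    topo = ET.fromstring(topology_xml)
    return (
        int(topo.get("sockets"))
        * int(topo.get("cores"))
        * int(topo.get("threads"))
    )


if __name__ == "__main__":
    declared = vcpu_count(VCPU_XML)
    implied = topology_cpus(TOPOLOGY_XML)
    print(f"declared vCPUs: {declared}, implied by topology: {implied}")
    if declared != implied:
        print("mismatch: the topology does not describe the declared vCPU count")
```

With the values quoted above, the topology describes 8 x 8 x 1 logical CPUs, while a single socket with eight cores would be written as sockets='1' cores='8' threads='1'.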
Hi,

Could you please paste the crash backtrace?

On 2017/9/27 16:15, netbsd at tango.lu wrote:
> Hello All,
>
> I wrote earlier about our OCFS2 crash issue in KVM due to bug in the SMP
> code.
>
> For this we come up with a solution:
>
> Instead of using multiple vcpus
> <vcpu placement='static'>8</vcpu>
>
> using a single one and multiple cores instead:
> <topology sockets='8' cores='8' threads='1'/>
>
> And applying key tune options to sysctl.conf:
>
> vm.min_free_kbytes=131072
> vm.zone_reclaim_mode=1
>
> Seemed to be helped, the fs did not crash right away when we were
> hammering it with apache benchmarks with 10000 requests however last
> night I started a large rsync operation from a 5TB OCFS2 FS mounted in
> the VM to another OCFS2 mounted in the same VM and ended up with:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=
>
> After trying a lot of different kernels starting from the 3.x series,
> now we are using 4.13.2 latest kernel with default configuration but
> these issues still present. Is this OCFS2 project still being developed?

I admit that the development group has not been very active recently.

> With this crashing and unreliability it cannot be used in production
> unless you put in place bunch of safeguards to reset out the whole
> virtualmachine when it crashes.
>
> Thanks
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>
Hello netbsd,

The ocfs2 project is still being developed by us (from SUSE, Huawei, Oracle, H3C, etc.).
If you encounter a problem, please send mail to the ocfs2-devel mailing list; we usually watch that list for ocfs2 kernel-related issues.

> Hello All,
>
> I wrote earlier about our OCFS2 crash issue in KVM due to bug in the SMP
> code.
>
> For this we come up with a solution:
>
> Instead of using multiple vcpus
> <vcpu placement='static'>8</vcpu>
>
> using a single one and multiple cores instead:
> <topology sockets='8' cores='8' threads='1'/>
>
> And applying key tune options to sysctl.conf:
>
> vm.min_free_kbytes=131072
> vm.zone_reclaim_mode=1
>
> Seemed to be helped, the fs did not crash right away when we were
> hammering it with apache benchmarks with 10000 requests however last
> night I started a large rsync operation from a 5TB OCFS2 FS mounted in
> the VM to another OCFS2 mounted in the same VM and ended up with:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=

From the kernel crash backtrace, the problem appears to be that taking too long to acquire a spin_lock triggers an NMI.
Could you give detailed steps to reproduce it? We want to reproduce this issue locally and then try to fix it.

Thanks
Gang

> After trying a lot of different kernels starting from the 3.x series,
> now we are using 4.13.2 latest kernel with default configuration but
> these issues still present. Is this OCFS2 project still being developed?
> With this crashing and unreliability it cannot be used in production
> unless you put in place bunch of safeguards to reset out the whole
> virtualmachine when it crashes.