Hello,
Find the full log below:
https://paste.ubuntu.com/25625787/
The VM was restarted at 9:27 and there has been no problem since then.
We are rsyncing about 2 TB of data (a lot of small files) between two
OCFS2 shares on the same VM:
/dev/vdc 4.8T 2.8T 2.1T 58% /mnt/s1
/dev/vdf 4.8T 985G 3.9T 21% /mnt/s2
rsync -av --numeric-ids --delete /mnt/s1/ /mnt/s2/
On 2017-09-27 10:53, Gang He wrote:
> Hello netbsd,
>
> The ocfs2 project is still being developed by us (SUSE, Huawei,
> Oracle, H3C, etc.).
> If you encounter a problem, please send mail to the ocfs2-devel
> mailing list; we usually watch that list for ocfs2 kernel-related
> issues.
>
>> Hello All,
>>
>> I wrote earlier about our OCFS2 crash issue in KVM, caused by a bug
>> in the SMP code.
>>
>> For this we came up with a workaround:
>>
>> Instead of using multiple vCPUs:
>> <vcpu placement='static'>8</vcpu>
>>
>> we use a single one with multiple cores instead:
>> <topology sockets='8' cores='8' threads='1'/>
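As a sketch of where these settings sit in the libvirt domain XML (not
from the original mail; a single-socket layout is assumed here for the
"one socket, many cores" case, and the socket/core counts should be
chosen so their product matches the guest's vCPU count):

```xml
<!-- before: 8 vCPUs with the default topology
     (each vCPU presented as its own socket) -->
<vcpu placement='static'>8</vcpu>

<!-- after (sketch, assuming 1 socket x 8 cores x 1 thread): -->
<vcpu placement='static'>8</vcpu>
<cpu>
  <topology sockets='1' cores='8' threads='1'/>
</cpu>
```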
>>
>> And applying key tune options to sysctl.conf:
>>
>> vm.min_free_kbytes=131072
>> vm.zone_reclaim_mode=1
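For completeness, the two tunables above can be persisted and applied
without a reboot roughly like this (a sketch; the drop-in filename
under /etc/sysctl.d/ is an assumption, and root is required):

```shell
# Write the two tunables from the thread into a sysctl drop-in file
# (the filename is an assumption), then load them immediately.
printf 'vm.min_free_kbytes=131072\nvm.zone_reclaim_mode=1\n' \
    | sudo tee /etc/sysctl.d/90-ocfs2-vm.conf
sudo sysctl -p /etc/sysctl.d/90-ocfs2-vm.conf
```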
>>
>> This seemed to help: the FS did not crash right away when we were
>> hammering it with Apache benchmarks (10000 requests). However, last
>> night I started a large rsync operation from a 5TB OCFS2 FS mounted
>> in the VM to another OCFS2 FS mounted in the same VM, and ended up
>> with:
>>
>>
>> https://ibb.co/gFeGg5

> From the kernel crash backtrace, this problem appears to be that a
> long wait to acquire a spin_lock triggers an NMI.
> Could you give detailed reproduction steps? We want to reproduce
> this issue locally, then try to fix it.
>
>
> Thanks
> Gang
>
>>
>> After trying a lot of different kernels starting from the 3.x
>> series, we are now using the latest 4.13.2 kernel with the default
>> configuration, but these issues are still present. Is the OCFS2
>> project still being developed?
>> With this crashing and unreliability it cannot be used in production
>> unless you put a bunch of safeguards in place to reset the whole
>> virtual machine when it crashes.
>>
>> Thanks
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users