netbsd at tango.lu
2017-Aug-28 13:30 UTC
[Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs
Hello list,

We were experimenting with the newer version of OCFS2 on Debian 9 (Stretch) inside KVM guests.

#1 SMP Debian 4.11.6-1~bpo9+1 (2017-07-09) x86_64 GNU/Linux

Kernels we have tried:
vmlinuz-4.11.0-0.bpo.1-amd64
vmlinuz-4.1.1 custom
vmlinuz-4.9.0-3-amd64

We have 3 nodes, but the same thing happens with a single node. When we run an Apache benchmark on the VM, it crashes: it becomes unpingable and unreachable, and a kernel crash log appears on the virtual console. It stays that way until we destroy and restart it. Part of the crash dump referred to SMP, so we tried reconfiguring the VM with 1 CPU and, guess what, it worked. There is no crash with 1 CPU, but the performance is far too slow. Does anybody have a clue what could be going wrong here?

ii ocfs2-tools 1.8.4-4 amd64 tools for managing OCFS2 cluster filesystems

The kernel crash log:

Aug 23 11:11:37 webserver3 kernel: [414697.513953] ------------[ cut here ]------------
Aug 23 11:11:37 webserver3 kernel: [414697.515526] kernel BUG at /build/linux-9uDFZV/linux-4.9.30/fs/ocfs2/dlmglue.c:780!
Aug 23 11:11:37 webserver3 kernel: [414697.516010] invalid opcode: 0000 [#1] SMP
Aug 23 11:11:37 webserver3 kernel: [414697.516010] Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue quota_tree ppdev cirrus edac_mce_amd ttm edac_core drm_kms_helper evdev pcspkr joydev sg serio_raw drm parport_pc pvpanic parport virtio_balloon button acpi_cpufreq ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache hid_generic usbhid hid dm_mod sr_mod cdrom ata_generic virtio_blk virtio_net ata_piix virtio_pci psmouse libata uhci_hcd virtio_ring virtio ehci_hcd scsi_mod i2c_piix4 usbcore usb_common floppy
Aug 23 11:11:37 webserver3 kernel: [414697.516010] CPU: 2 PID: 21763 Comm: apache2 Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
Aug 23 11:11:37 webserver3 kernel: [414697.516010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
Aug 23 11:11:37 webserver3 kernel: [414697.516010] task: ffff8a9540e9a440 task.stack: fffface70312c000
Aug 23 11:11:37 webserver3 kernel: [414697.516010] RIP: 0010:[<ffffffffc05de5ad>] [<ffffffffc05de5ad>] __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
Aug 23 11:11:37 webserver3 kernel: [414697.516010] RSP: 0018:fffface70312fbb8 EFLAGS: 00010046
Aug 23 11:11:37 webserver3 kernel: [414697.516010] RAX: 0000000000000296 RBX: ffff8a946cc2ce18 RCX: 000000000022ccaa
Aug 23 11:11:37 webserver3 kernel: [414697.516010] RDX: 0000000000000000 RSI: ffff8a946cc2ce18 RDI: ffff8a946cc2ce84
Aug 23 11:11:37 webserver3 kernel: [414697.516010] RBP: 0000000000000003 R08: 0001b1bfad90f0c0 R09: 0000000000000000
Aug 23 11:11:37 webserver3 kernel: [414697.516010] R10: ffff8a956d9a5180 R11: 00000000f5257d14 R12: ffff8a946cc2ce84
Aug 23 11:11:37 webserver3 kernel: [414697.516010] R13: ffff8a95704ef000 R14: 0000000000000000 R15: ffffffffc0671080
Aug 23 11:11:37 webserver3 kernel: [414697.516010] FS: 00007f68b84004c0(0000) GS:ffff8a9576f00000(0000) knlGS:0000000000000000
Aug 23 11:11:37 webserver3 kernel: [414697.516010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 23 11:11:37 webserver3 kernel: [414697.516010] CR2: 00007f689be75028 CR3: 000000012faca000 CR4: 00000000000006e0
Aug 23 11:11:37 webserver3 kernel: [414697.516010] Stack:
Aug 23 11:11:37 webserver3 kernel: [414697.516010] ffffffffc05df6e5 0000000000000000 ffff8a94e4c7b240 ffff8a947ed7f498
Aug 23 11:11:37 webserver3 kernel: [414697.516010] ffff8a94c47ee800 ffffffffc05cf0c0 ffff8a94e4c7bf00 ffff8a957014e498
Aug 23 11:11:37 webserver3 kernel: [414697.516010] 0000000000000000 ffff8a94e4c7b240 844a7733fa02ee88 ffff8a957014e498
Aug 23 11:11:37 webserver3 kernel: [414697.516010] Call Trace:
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffc05df6e5>] ? ocfs2_dentry_unlock+0x35/0x80 [ocfs2]
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffc05cf0c0>] ? ocfs2_dentry_attach_lock+0x2d0/0x430 [ocfs2]
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa201ac56>] ? d_invalidate+0xb6/0x120
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:11:37 webserver3 kernel: [414697.559622] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b
Aug 23 11:11:37 webserver3 kernel: [414697.559622] Code: 89 c6 5b 5d 41 5c 41 5d e9 a1 78 e2 e1 0f 0b 8b 53 58 85 d2 74 15 83 ea 01 89 53 58 eb af 8b 53 5c 85 d2 74 c3 eb d1 0f 0b 0f 0b <0f> 0b 0f 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
Aug 23 11:11:37 webserver3 kernel: [414697.559622] RIP [<ffffffffc05de5ad>] __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
Aug 23 11:11:37 webserver3 kernel: [414697.559622] RSP <fffface70312fbb8>
Aug 23 11:11:37 webserver3 kernel: [414697.559622] ---[ end trace 44a164f27bc6b279 ]---

Aug 23 11:11:58 webserver3 kernel: [414718.512051] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:11:58 webserver3 kernel: [414718.514155] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=2102
Aug 23 11:11:58 webserver3 kernel: [414718.516044] (detected by 1, t=5252 jiffies, g=1801366, c=1801365, q=8046)
Aug 23 11:11:58 webserver3 kernel: [414718.516044] Task dump for CPU 0:
Aug 23 11:11:58 webserver3 kernel: [414718.516044] apache2 R running task 0 22317 15345 0x00000008
Aug 23 11:11:58 webserver3 kernel: [414718.516044] ffff8a956d9a5100 ffff8a9576f18240 ffff8a94828d21c0 0000000000000000
Aug 23 11:11:58 webserver3 kernel: [414718.516044] ffff8a9454302000 fffface7031ef9a0 ffffffffa24015db 00000001719dd000
Aug 23 11:11:58 webserver3 kernel: [414718.516044] 00ff8a95719dd260 ffff8a9576e18240 0000000000000003 ffff8a94828d21c0
Aug 23 11:11:58 webserver3 kernel: [414718.516044] Call Trace:
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa24015db>] ? __schedule+0x23b/0x6d0
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2401aa2>] ? schedule+0x32/0x80
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2404e73>] ? schedule_timeout+0x243/0x310
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2147059>] ? list_del+0x9/0x30
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa24024ce>] ? wait_for_completion+0x10e/0x130
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa1ea1800>] ? wake_up_q+0x70/0x70
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa24061e2>] ? _raw_spin_lock_irqsave+0x32/0x39
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffc05dd68d>] ? __ocfs2_cluster_lock.isra.35+0xcd/0x730 [ocfs2]
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa21349d9>] ? snprintf+0x49/0x60
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffc05de3bb>] ? ocfs2_dentry_lock+0xbb/0x110 [ocfs2]
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffc05cef3f>] ? ocfs2_dentry_attach_lock+0x14f/0x430 [ocfs2]
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:11:58 webserver3 kernel: [414718.516044] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b

Aug 23 11:13:01 webserver3 kernel: [414781.532054] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:13:01 webserver3 kernel: [414781.533564] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=9974
Aug 23 11:13:01 webserver3 kernel: [414781.535001] (detected by 1, t=21007 jiffies, g=1801366, c=1801365, q=8534)
Aug 23 11:13:01 webserver3 kernel: [414781.536042] Task dump for CPU 0:
Aug 23 11:13:01 webserver3 kernel: [414781.536042] apache2 R running task 0 22317 15345 0x00000008
Aug 23 11:13:01 webserver3 kernel: [414781.536042] ffff8a956d9a5100 ffff8a9576f18240 ffff8a94828d21c0 0000000000000000
Aug 23 11:13:01 webserver3 kernel: [414781.536042] ffff8a9454302000 fffface7031ef9a0 ffffffffa24015db 00000001719dd000
Aug 23 11:13:01 webserver3 kernel: [414781.536042] 00ff8a95719dd260 ffff8a9576e18240 0000000000000003 ffff8a94828d21c0
Aug 23 11:13:01 webserver3 kernel: [414781.536042] Call Trace:
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa24015db>] ? __schedule+0x23b/0x6d0
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2401aa2>] ? schedule+0x32/0x80
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2404e73>] ? schedule_timeout+0x243/0x310
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2147059>] ? list_del+0x9/0x30
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa24024ce>] ? wait_for_completion+0x10e/0x130
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa1ea1800>] ? wake_up_q+0x70/0x70
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa24061e2>] ? _raw_spin_lock_irqsave+0x32/0x39
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffc05dd68d>] ? __ocfs2_cluster_lock.isra.35+0xcd/0x730 [ocfs2]
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa21349d9>] ? snprintf+0x49/0x60
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffc05de3bb>] ? ocfs2_dentry_lock+0xbb/0x110 [ocfs2]
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffc05cef3f>] ? ocfs2_dentry_attach_lock+0x14f/0x430 [ocfs2]
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:13:01 webserver3 kernel: [414781.536042] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b

Aug 23 11:14:04 webserver3 kernel: [414844.552023] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:14:04 webserver3 kernel: [414844.553416] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=17847
Aug 23 11:14:04 webserver3 kernel: [414844.554734] (detected by 1, t=36762 jiffies, g=1801366, c=1801365, q=9002)
Aug 23 11:14:04 webserver3 kernel: [414844.556014] Task dump for CPU 0:
Aug 23 11:14:04 webserver3 kernel: [414844.556014] apache2 R running task 0 22317 15345 0x00000008
Aug 23 11:14:04 webserver3 kernel: [414844.556014] ffff8a956d9a5100 ffff8a9576f18240 ffff8a94828d21c0 0000000000000000
Aug 23 11:14:04 webserver3 kernel: [414844.556014] ffff8a9454302000 fffface7031ef9a0 ffffffffa24015db 00000001719dd000
Aug 23 11:14:04 webserver3 kernel: [414844.556014] 00ff8a95719dd260 ffff8a9576e18240 0000000000000003 ffff8a94828d21c0
Aug 23 11:14:04 webserver3 kernel: [414844.556014] Call Trace:
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa24015db>] ? __schedule+0x23b/0x6d0
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2401aa2>] ? schedule+0x32/0x80
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2404e73>] ? schedule_timeout+0x243/0x310
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2147059>] ? list_del+0x9/0x30
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa24024ce>] ? wait_for_completion+0x10e/0x130
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa1ea1800>] ? wake_up_q+0x70/0x70
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa24061e2>] ? _raw_spin_lock_irqsave+0x32/0x39
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffc05dd68d>] ? __ocfs2_cluster_lock.isra.35+0xcd/0x730 [ocfs2]
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa21349d9>] ? snprintf+0x49/0x60
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffc05de3bb>] ? ocfs2_dentry_lock+0xbb/0x110 [ocfs2]
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffc05cef3f>] ? ocfs2_dentry_attach_lock+0x14f/0x430 [ocfs2]
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:14:04 webserver3 kernel: [414844.556014] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b

Aug 23 11:15:07 webserver3 kernel: [414907.572050] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:15:07 webserver3 kernel: [414907.573461] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=25720
Aug 23 11:15:07 webserver3 kernel: [414907.574780] (detected by 1, t=52517 jiffies, g=1801366, c=1801365, q=9472)
Aug 23 11:15:07 webserver3 kernel: [414907.576040] Task dump for CPU 0:
Aug 23 11:15:07 webserver3 kernel: [414907.576040] apache2 R running task 0 22317 15345 0x00000008
Aug 23 11:15:07 webserver3 kernel: [414907.576040] ffff8a956d9a5100 ffff8a9576f18240 ffff8a94828d21c0 0000000000000000
Aug 23 11:15:07 webserver3 kernel: [414907.576040] ffff8a9454302000 fffface7031ef9a0 ffffffffa24015db 00000001719dd000
Aug 23 11:15:07 webserver3 kernel: [414907.576040] 00ff8a95719dd260 ffff8a9576e18240 0000000000000003 ffff8a94828d21c0
Aug 23 11:15:07 webserver3 kernel: [414907.576040] Call Trace:
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa24015db>] ? __schedule+0x23b/0x6d0
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2401aa2>] ? schedule+0x32/0x80
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2404e73>] ? schedule_timeout+0x243/0x310
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2147059>] ? list_del+0x9/0x30
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa24024ce>] ? wait_for_completion+0x10e/0x130
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa1ea1800>] ? wake_up_q+0x70/0x70
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa24061e2>] ? _raw_spin_lock_irqsave+0x32/0x39
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffc05dd68d>] ? __ocfs2_cluster_lock.isra.35+0xcd/0x730 [ocfs2]
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa21349d9>] ? snprintf+0x49/0x60
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffc05de3bb>] ? ocfs2_dentry_lock+0xbb/0x110 [ocfs2]
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffc05cef3f>] ? ocfs2_dentry_attach_lock+0x14f/0x430 [ocfs2]
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:15:07 webserver3 kernel: [414907.576040] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b

Aug 23 11:16:10 webserver3 kernel: [414970.592038] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:16:10 webserver3 kernel: [414970.592038] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:16:10 webserver3 kernel: [414970.593438] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=33592
Aug 23 11:16:10 webserver3 kernel: [414970.594756] (detected by 1, t=68272 jiffies, g=1801366, c=1801365, q=12181)
Aug 23 11:16:10 webserver3 kernel: [414970.596028] Task dump for CPU 0:
Aug 23 11:16:10 webserver3 kernel: [414970.596028] apache2 R running task 0 22317 15345 0x00000008
Aug 23 11:16:10 webserver3 kernel: [414970.596028] ffff8a956d9a5100 ffff8a9576f18240 ffff8a94828d21c0 0000000000000000
Aug 23 11:16:10 webserver3 kernel: [414970.596028] ffff8a9454302000 fffface7031ef9a0 ffffffffa24015db 00000001719dd000
Aug 23 11:16:10 webserver3 kernel: [414970.596028] 00ff8a95719dd260 ffff8a9576e18240 0000000000000003 ffff8a94828d21c0
Aug 23 11:16:10 webserver3 kernel: [414970.601874] Call Trace:
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa24015db>] ? __schedule+0x23b/0x6d0
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2401aa2>] ? schedule+0x32/0x80
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2404e73>] ? schedule_timeout+0x243/0x310
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2147059>] ? list_del+0x9/0x30
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa24024ce>] ? wait_for_completion+0x10e/0x130
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa1ea1800>] ? wake_up_q+0x70/0x70
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa24061e2>] ? _raw_spin_lock_irqsave+0x32/0x39
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffc05dd68d>] ? __ocfs2_cluster_lock.isra.35+0xcd/0x730 [ocfs2]
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa21349d9>] ? snprintf+0x49/0x60
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffc05de3bb>] ? ocfs2_dentry_lock+0xbb/0x110 [ocfs2]
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffc05cef3f>] ? ocfs2_dentry_attach_lock+0x14f/0x430 [ocfs2]
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffc05f811e>] ? ocfs2_lookup+0x19e/0x2e0 [ocfs2]
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa200de53>] ? lookup_slow+0xa3/0x170
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa200e553>] ? walk_component+0x1f3/0x320
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa200f0d2>] ? link_path_walk+0x1b2/0x650
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa200f676>] ? path_lookupat+0x86/0x120
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2011fc1>] ? filename_lookup+0xb1/0x180
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa1ffdefa>] ? __check_object_size+0xfa/0x1d8
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2156768>] ? strncpy_from_user+0x48/0x160
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa2011bfa>] ? getname_flags+0x6a/0x1e0
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa1fffa0d>] ? SyS_access+0xad/0x220
Aug 23 11:16:10 webserver3 kernel: [414970.601874] [<ffffffffa240627b>] ? system_call_fast_compare_end+0xc/0x9b

Aug 23 11:17:13 webserver3 kernel: [415033.612029] INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 23 11:17:13 webserver3 kernel: [415033.613441] 0-...: (62 GPs behind) idle=b5f/140000000000000/0 softirq=632992/632992 fqs=41465
Aug 23 11:17:13 webserver3 kernel: [415033.614788] (detected by 1, t=84027 jiffies, g=1801366, c=1801365, q=14338)
Aug 23 11:17:13 webserver3 kernel: [415033.616018] Task dump for CPU 0:

Thanks!
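Both traces in the log enter OCFS2 the same way: apache2 issues access() (SyS_access), the path walk misses the dentry cache (lookup_slow), and ocfs2_lookup takes or drops the dentry cluster lock via ocfs2_dentry_attach_lock. As a hedged illustration only, the Python 3 sketch below generates that kind of load without Apache: several threads call os.access() concurrently on many files under a directory. The tree layout, thread and round counts, and the function names (make_tree, hammer, run) are hypothetical, and nothing here is known to reproduce the BUG. It is meant for a test VM with the OCFS2 volume mounted. CPython releases the GIL around the underlying access() call, so the threads do overlap inside the kernel.

```python
import os
import sys
import tempfile
import threading

def make_tree(root, n_dirs=20, files_per_dir=50):
    """Create n_dirs directories with files_per_dir tiny files each; return the file paths."""
    paths = []
    for d in range(n_dirs):
        dpath = os.path.join(root, f"dir{d:03d}")
        os.makedirs(dpath, exist_ok=True)
        for f in range(files_per_dir):
            fpath = os.path.join(dpath, f"file{f:04d}.html")
            with open(fpath, "w") as fh:
                fh.write("x")
            paths.append(fpath)
    return paths

def hammer(paths, rounds, barrier, results, idx):
    """Wait for all threads, then call access() on every path `rounds` times."""
    barrier.wait()  # release all threads together so their lookups overlap
    hits = 0
    for _ in range(rounds):
        for p in paths:
            if os.access(p, os.R_OK):
                hits += 1
    results[idx] = hits

def run(root, n_threads=8, rounds=10):
    """Build the tree under root, hammer it from n_threads threads, return total successes."""
    paths = make_tree(root)
    barrier = threading.Barrier(n_threads)
    results = [0] * n_threads
    threads = [
        threading.Thread(target=hammer, args=(paths, rounds, barrier, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

if __name__ == "__main__":
    # Hypothetical usage: python3 access_stress.py /mnt/ocfs2/stress-test
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    print("successful access() calls:", run(target))
```

Only names that are not yet in the dentry cache go through lookup_slow and ocfs2_lookup as in the traces, so later rounds mostly hit cached dentries. Dropping caches between runs (as root, `echo 2 > /proc/sys/vm/drop_caches`) is one way to force cold lookups again.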
Hi!

> We were experimenting with the newer version of OCFS2 on Debian 9
> Stretch inside KVM GUESTS.
[...]
> We have 3 nodes, but it is the same with 1 single node when we do apache
> benchmark on the vm it crashes (becomes unpingable, unreachable, kernel
> crashlog on virtual console) until destroy and restart. Some point of
> the crashdump referred to SMP so we have tried to reconfigure the VM
> with 1 cpu and guess what it worked. No crash in case of 1 cpu but the
> performance is way too slow. Anybody has a clue what can go wrong here?

We see similar behavior on a file server under load (our favorite test case is extracting a zip file containing a huge number of small files).

Do you also use the built-in cluster stack? Have you considered testing with an external cluster stack? I'd be very interested to know whether that makes a difference...

best regards,
    Adi Kriegisch
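The "extract a zip with a huge number of small files" test case can be approximated with a short Python 3 sketch, assuming the OCFS2 volume is mounted and a directory on it is passed as the first argument. It builds an archive of many tiny files in memory with the standard zipfile module and extracts it into the target directory. The file counts and sizes, the directory layout, and the helper names (build_zip, extract_to, count_files) are hypothetical; this is not the archive used in the report.

```python
import io
import os
import sys
import tempfile
import zipfile

def build_zip(n_files=5000, n_dirs=50, size=64):
    """Build an in-memory zip holding n_files small files spread over n_dirs directories."""
    buf = io.BytesIO()
    payload = b"x" * size
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_files):
            zf.writestr(f"d{i % n_dirs:03d}/f{i:06d}.txt", payload)
    buf.seek(0)
    return buf

def extract_to(target, **kwargs):
    """Extract a freshly generated archive into target; return the number of members."""
    with zipfile.ZipFile(build_zip(**kwargs)) as zf:
        zf.extractall(target)
        return len(zf.namelist())

def count_files(root):
    """Count regular files below root, to check that the extraction completed."""
    return sum(len(files) for _, _, files in os.walk(root))

if __name__ == "__main__":
    # Hypothetical usage: python3 zip_stress.py /mnt/ocfs2/zip-test
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    extract_to(target)
    print("files on disk:", count_files(target))
```

Running several copies at once against different subdirectories is one way to add the concurrency that seems to matter in the original report (no crash with 1 CPU). That is a guess, not something established in this thread.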