What is "loop0" it seems it's having some issue. Does it point to a Gluster file ? I also see that there's an io_uring thread in D state. If that one belongs to Gluster, it may explain why systemd was unable to generate a core dump (all threads need to be stopped to generate a core dump, but a thread blocked inside the kernel cannot be stopped). If you are using io_uring in Gluster, maybe you can disable it to see if it's related. Xavi On Fri, Nov 25, 2022 at 11:39 AM Angel Docampo <angel.docampo at eoniantec.com> wrote:> Well, just happened again, the same server, the same mountpoint. > > I'm unable to get the core dumps, coredumpctl says there are no core > dumps, it would be funny if I wasn't the one suffering it, but > systemd-coredump service crashed as well > ? systemd-coredump at 0-3199871-0.service - Process Core Dump (PID > 3199871/UID 0) > Loaded: loaded (/lib/systemd/system/systemd-coredump at .service; > static) > Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET; > 39min ago > TriggeredBy: ? systemd-coredump.socket > Docs: man:systemd-coredump(8) > Process: 3199873 ExecStart=/lib/systemd/systemd-coredump (code=killed, > signal=TERM) > Main PID: 3199873 (code=killed, signal=TERM) > CPU: 15ms > > Nov 25 10:49:59 pve02 systemd[1]: Started Process Core Dump (PID > 3199871/UID 0). > Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump at 0-3199871-0.service: > Service reached runtime time limit. Stopping. > Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump at 0-3199871-0.service: > Failed with result 'timeout'. > > > I just saw the exception on dmesg, > [2022-11-25 10:50:08] INFO: task kmmpd-loop0:681644 blocked for more than > 120 seconds. > [2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1 > [2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [2022-11-25 10:50:08] task:kmmpd-loop0 state:D stack: 0 pid:681644 > ppid: 2 flags:0x00004000 > [2022-11-25 10:50:08] Call Trace: > [2022-11-25 10:50:08] <TASK> > [2022-11-25 10:50:08] __schedule+0x33d/0x1750 > [2022-11-25 10:50:08] ? bit_wait+0x70/0x70 > [2022-11-25 10:50:08] schedule+0x4e/0xc0 > [2022-11-25 10:50:08] io_schedule+0x46/0x80 > [2022-11-25 10:50:08] bit_wait_io+0x11/0x70 > [2022-11-25 10:50:08] __wait_on_bit+0x31/0xa0 > [2022-11-25 10:50:08] out_of_line_wait_on_bit+0x8d/0xb0 > [2022-11-25 10:50:08] ? var_wake_function+0x30/0x30 > [2022-11-25 10:50:08] __wait_on_buffer+0x34/0x40 > [2022-11-25 10:50:08] write_mmp_block+0x127/0x180 > [2022-11-25 10:50:08] kmmpd+0x1b9/0x430 > [2022-11-25 10:50:08] ? write_mmp_block+0x180/0x180 > [2022-11-25 10:50:08] kthread+0x127/0x150 > [2022-11-25 10:50:08] ? set_kthread_struct+0x50/0x50 > [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 > [2022-11-25 10:50:08] </TASK> > [2022-11-25 10:50:08] INFO: task iou-wrk-1511979:3200401 blocked for > more than 120 seconds. > [2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1 > [2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [2022-11-25 10:50:08] task:iou-wrk-1511979 state:D stack: 0 > pid:3200401 ppid: 1 flags:0x00004000 > [2022-11-25 10:50:08] Call Trace: > [2022-11-25 10:50:08] <TASK> > [2022-11-25 10:50:08] __schedule+0x33d/0x1750 > [2022-11-25 10:50:08] schedule+0x4e/0xc0 > [2022-11-25 10:50:08] rwsem_down_write_slowpath+0x231/0x4f0 > [2022-11-25 10:50:08] down_write+0x47/0x60 > [2022-11-25 10:50:08] fuse_file_write_iter+0x1a3/0x430 > [2022-11-25 10:50:08] ? 
apparmor_file_permission+0x70/0x170 > [2022-11-25 10:50:08] io_write+0xfb/0x320 > [2022-11-25 10:50:08] ? put_dec+0x1c/0xa0 > [2022-11-25 10:50:08] io_issue_sqe+0x401/0x1fc0 > [2022-11-25 10:50:08] io_wq_submit_work+0x76/0xd0 > [2022-11-25 10:50:08] io_worker_handle_work+0x1a7/0x5f0 > [2022-11-25 10:50:08] io_wqe_worker+0x2c0/0x360 > [2022-11-25 10:50:08] ? finish_task_switch.isra.0+0x7e/0x2b0 > [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 > [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 > [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 > [2022-11-25 10:50:08] RIP: 0033:0x0 > [2022-11-25 10:50:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 > ORIG_RAX: 00000000000001aa > [2022-11-25 10:50:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: > 00007fdd59f872e9 > [2022-11-25 10:50:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: > 0000000000000011 > [2022-11-25 10:50:08] RBP: 0000000000000000 R08: 0000000000000000 R09: > 0000000000000008 > [2022-11-25 10:50:08] R10: 0000000000000000 R11: 0000000000000216 R12: > 000055662e5bd268 > [2022-11-25 10:50:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: > 0000000000000000 > [2022-11-25 10:50:08] </TASK> > [2022-11-25 10:52:08] INFO: task kmmpd-loop0:681644 blocked for more than > 241 seconds. > [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve #1 > [2022-11-25 10:52:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [2022-11-25 10:52:08] task:kmmpd-loop0 state:D stack: 0 pid:681644 > ppid: 2 flags:0x00004000 > [2022-11-25 10:52:08] Call Trace: > [2022-11-25 10:52:08] <TASK> > [2022-11-25 10:52:08] __schedule+0x33d/0x1750 > [2022-11-25 10:52:08] ? bit_wait+0x70/0x70 > [2022-11-25 10:52:08] schedule+0x4e/0xc0 > [2022-11-25 10:52:08] io_schedule+0x46/0x80 > [2022-11-25 10:52:08] bit_wait_io+0x11/0x70 > [2022-11-25 10:52:08] __wait_on_bit+0x31/0xa0 > [2022-11-25 10:52:08] out_of_line_wait_on_bit+0x8d/0xb0 > [2022-11-25 10:52:08] ? var_wake_function+0x30/0x30 > [2022-11-25 10:52:08] __wait_on_buffer+0x34/0x40 > [2022-11-25 10:52:08] write_mmp_block+0x127/0x180 > [2022-11-25 10:52:08] kmmpd+0x1b9/0x430 > [2022-11-25 10:52:08] ? write_mmp_block+0x180/0x180 > [2022-11-25 10:52:08] kthread+0x127/0x150 > [2022-11-25 10:52:08] ? set_kthread_struct+0x50/0x50 > [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 > [2022-11-25 10:52:08] </TASK> > [2022-11-25 10:52:08] INFO: task iou-wrk-1511979:3200401 blocked for > more than 241 seconds. > [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve #1 > [2022-11-25 10:52:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [2022-11-25 10:52:08] task:iou-wrk-1511979 state:D stack: 0 > pid:3200401 ppid: 1 flags:0x00004000 > [2022-11-25 10:52:08] Call Trace: > [2022-11-25 10:52:08] <TASK> > [2022-11-25 10:52:08] __schedule+0x33d/0x1750 > [2022-11-25 10:52:08] schedule+0x4e/0xc0 > [2022-11-25 10:52:08] rwsem_down_write_slowpath+0x231/0x4f0 > [2022-11-25 10:52:08] down_write+0x47/0x60 > [2022-11-25 10:52:08] fuse_file_write_iter+0x1a3/0x430 > [2022-11-25 10:52:08] ? apparmor_file_permission+0x70/0x170 > [2022-11-25 10:52:08] io_write+0xfb/0x320 > [2022-11-25 10:52:08] ? put_dec+0x1c/0xa0 > [2022-11-25 10:52:08] io_issue_sqe+0x401/0x1fc0 > [2022-11-25 10:52:08] io_wq_submit_work+0x76/0xd0 > [2022-11-25 10:52:08] io_worker_handle_work+0x1a7/0x5f0 > [2022-11-25 10:52:08] io_wqe_worker+0x2c0/0x360 > [2022-11-25 10:52:08] ? finish_task_switch.isra.0+0x7e/0x2b0 > [2022-11-25 10:52:08] ? io_worker_handle_work+0x5f0/0x5f0 > [2022-11-25 10:52:08] ? 
io_worker_handle_work+0x5f0/0x5f0 > [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 > [2022-11-25 10:52:08] RIP: 0033:0x0 > [2022-11-25 10:52:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 > ORIG_RAX: 00000000000001aa > [2022-11-25 10:52:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: > 00007fdd59f872e9 > [2022-11-25 10:52:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: > 0000000000000011 > [2022-11-25 10:52:08] RBP: 0000000000000000 R08: 0000000000000000 R09: > 0000000000000008 > [2022-11-25 10:52:08] R10: 0000000000000000 R11: 0000000000000216 R12: > 000055662e5bd268 > [2022-11-25 10:52:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: > 0000000000000000 > [2022-11-25 10:52:08] </TASK> > [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:12] print_req_error: 7 callbacks suppressed > [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:12] EXT4-fs error (device loop0): kmmpd:179: comm > kmmpd-loop0: Error writing to MMP block > [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:18] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:18] loop: Write error at byte offset 4490452992, length > 4096. > [2022-11-25 10:52:18] loop: Write error at byte offset 4490457088, length > 4096. > [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector > 8770416 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 > [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector > 8770424 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 > [2022-11-25 10:52:18] Aborting journal on device loop0-8. > [2022-11-25 10:52:18] loop: Write error at byte offset 4429185024, length > 4096. > [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector > 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 > [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector > 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 > [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block > 1081344, lost sync page write > [2022-11-25 10:52:18] JBD2: Error -5 detected when updating journal > superblock for loop0-8. > [2022-11-25 10:52:23] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:23] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:23] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:28] loop: Write error at byte offset 37908480, length > 4096. 
> [2022-11-25 10:52:28] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:28] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:33] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:33] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:33] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:38] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:38] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:38] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:43] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:43] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:43] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:48] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:48] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:48] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:53] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:53] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:53] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:52:59] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:52:59] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:52:59] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:04] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:04] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:04] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:09] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:09] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:09] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:14] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:14] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:14] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:19] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:19] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:19] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:24] loop: Write error at byte offset 37908480, length > 4096. 
> [2022-11-25 10:53:24] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:24] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:29] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:29] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:29] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:34] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:34] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:34] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:40] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:40] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:40] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:45] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:45] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:45] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:50] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:50] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:50] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:53:55] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:53:55] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:53:55] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:00] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:00] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:00] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:05] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:05] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:05] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:10] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:10] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:10] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:15] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:15] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:15] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:21] loop: Write error at byte offset 37908480, length > 4096. 
> [2022-11-25 10:54:21] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:21] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:26] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:26] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:26] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:31] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:31] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:31] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:36] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:36] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:36] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:41] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:41] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:41] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:46] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:46] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:46] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:51] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:51] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:51] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:54:56] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:54:56] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:54:56] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:55:01] loop: Write error at byte offset 37908480, length > 4096. > [2022-11-25 10:55:01] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:55:01] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:55:04] EXT4-fs error (device loop0): > ext4_journal_check_start:83: comm burp: Detected aborted journal > [2022-11-25 10:55:04] loop: Write error at byte offset 0, length 4096. > [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 > op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 > op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:55:04] Buffer I/O error on dev loop0, logical block 0, > lost sync page write > [2022-11-25 10:55:04] EXT4-fs (loop0): I/O error while writing superblock > [2022-11-25 10:55:04] EXT4-fs (loop0): Remounting filesystem read-only > [2022-11-25 10:55:07] loop: Write error at byte offset 37908480, length > 4096. 
> [2022-11-25 10:55:07] blk_update_request: I/O error, dev loop0, sector > 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 > [2022-11-25 10:55:07] Buffer I/O error on dev loop0, logical block 9255, > lost sync page write > [2022-11-25 10:57:14] blk_update_request: I/O error, dev loop0, sector > 16390368 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0 > [2022-11-25 11:03:45] device tap136i0 entered promiscuous mode > > I don't know if it is relevant somehow or it is unrelated to glusterfs, > but the consequences are the mountpoint crashes, I'm forced to lazy unmount > it and remount it back. Then restart all the VMs on there, unfortunately, > this time several have the hard disk corrupted and now I'm restoring them > from the backup. > > Any tip? > > *Angel Docampo* > > <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021> > <angel.docampo at eoniantec.com> <+34-93-1592929> > > > El mar, 22 nov 2022 a las 12:31, Angel Docampo (< > angel.docampo at eoniantec.com>) escribi?: > >> I've taken a look into all possible places they should be, and I couldn't >> find it anywhere. Some people say the dump file is generated where the >> application is running... well, I don't know where to look then, and I hope >> they hadn't been generated on the failed mountpoint. >> >> As Debian 11 has systemd, I've installed systemd-coredump, so in the case >> a new crash happens, at least I will have the exact location and tool >> (coredumpctl) to find them and will install then the debug symbols, which >> is particularly tricky on debian. But I need to wait to happen again, now >> the tool says there isn't any core dump on the system. >> >> Thank you, Xavi, if this happens again (let's hope it won't), I will >> report back. >> >> Best regards! >> >> *Angel Docampo* >> >> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021> >> <angel.docampo at eoniantec.com> <+34-93-1592929> >> >> >> El mar, 22 nov 2022 a las 10:45, Xavi Hernandez (<jahernan at redhat.com>) >> escribi?: >> >>> The crash seems related to some problem in ec xlator, but I don't have >>> enough information to determine what it is. The crash should have generated >>> a core dump somewhere in the system (I don't know where Debian keeps the >>> core dumps). If you find it, you should be able to open it using this >>> command (make sure debug symbols package is also installed before running >>> it): >>> >>> # gdb /usr/sbin/glusterfs <path to core dump> >>> >>> And then run this command: >>> >>> # bt -full >>> >>> Regards, >>> >>> Xavi >>> >>> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo < >>> angel.docampo at eoniantec.com> wrote: >>> >>>> Hi Xavi, >>>> >>>> The OS is Debian 11 with the proxmox kernel. Gluster packages are the >>>> official from gluster.org ( >>>> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/ >>>> ) >>>> >>>> The system logs showed no other issues by the time of the crash, no OOM >>>> kill or whatsoever, and no other process was interacting with the gluster >>>> mountpoint besides proxmox. 
>>>>
>>>> I wasn't running gdb when it crashed, so I don't really know if I can
>>>> obtain a more detailed trace from the logs, whether there is a simple
>>>> way to leave it running in the background in case it happens again,
>>>> or whether there is a flag to start the systemd daemon in debug mode.
>>>>
>>>> Best,
>>>>
>>>> *Angel Docampo*
>>>>
>>>> On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez
>>>> (<jahernan at redhat.com>) wrote:
>>>>
>>>>> Hi Angel,
>>>>>
>>>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo
>>>>> <angel.docampo at eoniantec.com> wrote:
>>>>>
>>>>>> Sorry for necrobumping this, but this morning I suffered this on my
>>>>>> Proxmox + GlusterFS cluster. In the log I can see this:
>>>>>>
>>>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>>>> pending frames:
>>>>>> frame : type(1) op(WRITE)
>>>>>> frame : type(0) op(0)
>>>>>> [the previous frame repeats 16 times]
>>>>>> ...
>>>>>> frame : type(1) op(FSYNC)
>>>>>> [the previous frame repeats 20 times]
>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>> signal received: 11
>>>>>> time of crash:
>>>>>> 2022-11-21 07:38:00 +0000
>>>>>> configuration details:
>>>>>> argp 1
>>>>>> backtrace 1
>>>>>> dlfcn 1
>>>>>> libpthread 1
>>>>>> llistxattr 1
>>>>>> setfsid 1
>>>>>> epoll.h 1
>>>>>> xattr.h 1
>>>>>> st_atim.tv_nsec 1
>>>>>> package-string: glusterfs 10.3
>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>>>> ---------
>>>>>>
>>>>>> The mount point wasn't accessible, giving the "Transport endpoint is
>>>>>> not connected" message, and it was shown like this:
>>>>>>
>>>>>> d????????? ? ? ? ? ? vmdata
>>>>>>
>>>>>> I had to stop all the VMs on that Proxmox node, then stop the
>>>>>> gluster daemon to unmount the directory, and after starting the
>>>>>> daemon and re-mounting, all was working again.
>>>>>>
>>>>>> My gluster volume info returns this:
>>>>>>
>>>>>> Volume Name: vmdata
>>>>>> Type: Distributed-Disperse
>>>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: g01:/data/brick1/brick
>>>>>> Brick2: g02:/data/brick2/brick
>>>>>> Brick3: g03:/data/brick1/brick
>>>>>> Brick4: g01:/data/brick2/brick
>>>>>> Brick5: g02:/data/brick1/brick
>>>>>> Brick6: g03:/data/brick2/brick
>>>>>> Options Reconfigured:
>>>>>> nfs.disable: on
>>>>>> transport.address-family: inet
>>>>>> storage.fips-mode-rchecksum: on
>>>>>> features.shard: enable
>>>>>> features.shard-block-size: 256MB
>>>>>> performance.read-ahead: off
>>>>>> performance.quick-read: off
>>>>>> performance.io-cache: off
>>>>>> server.event-threads: 2
>>>>>> client.event-threads: 3
>>>>>> performance.client-io-threads: on
>>>>>> performance.stat-prefetch: off
>>>>>> dht.force-readdirp: off
>>>>>> performance.force-readdirp: off
>>>>>> network.remote-dio: on
>>>>>> features.cache-invalidation: on
>>>>>> performance.parallel-readdir: on
>>>>>> performance.readdir-ahead: on
>>>>>>
>>>>>> Xavi, do you think the open-behind off setting can help somehow? I
>>>>>> did try to understand what it does (with no luck), and whether it
>>>>>> could impact the performance of my VMs (I have the setup you know
>>>>>> so well ;)). I would like to avoid more crashes like this; gluster
>>>>>> 10.3 had been working quite well since two weeks ago, until this
>>>>>> morning.
>>>>>
>>>>> I don't think disabling open-behind will have any visible effect on
>>>>> performance. Open-behind is only useful for small files when the
>>>>> workload is mostly open + read + close, and quick-read is also
>>>>> enabled (which is not your case). The only effect it will have is
>>>>> that the latency "saved" during open is "paid" on the next operation
>>>>> sent to the file, so the total overall latency should be the same.
>>>>> Additionally, a VM workload doesn't open files frequently, so it
>>>>> shouldn't matter much in any case.
>>>>>
>>>>> That said, I'm not sure the problem is the same in your case. Based
>>>>> on the stack of the crash, it seems to be an issue inside the
>>>>> disperse module.
>>>>>
>>>>> What OS are you using? Are you using official packages? If so, which
>>>>> ones?
>>>>>
>>>>> Is it possible to provide a backtrace from gdb?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Xavi
>>>>>
>>>>>> *Angel Docampo*
>>>>>>
>>>>>> On Fri, Mar 19, 2021 at 2:10, David Cunningham
>>>>>> (<dcunningham at voisonics.com>) wrote:
>>>>>>
>>>>>>> Hi Xavi,
>>>>>>>
>>>>>>> Thank you for that information. We'll look at upgrading it.
>>>>>>>
>>>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> with so little information it's hard to tell, but given that
>>>>>>>> there are several OPEN and UNLINK operations, it could be related
>>>>>>>> to an already fixed bug (in recent versions) in open-behind.
>>>>>>>>
>>>>>>>> You can try disabling open-behind with this command:
>>>>>>>>
>>>>>>>> # gluster volume set <volname> open-behind off
>>>>>>>>
>>>>>>>> But given that the version you are using is very old and
>>>>>>>> unmaintained, I would recommend upgrading to at least 8.x.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Xavi
>>>>>>>>
>>>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham
>>>>>>>> <dcunningham at voisonics.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with
>>>>>>>>> the native FUSE client. Recently the FUSE mount crashed and we
>>>>>>>>> found the following in the syslog. There isn't anything logged
>>>>>>>>> in mnt-glusterfs.log for that time. After killing all processes
>>>>>>>>> with a file handle open on the filesystem, we were able to
>>>>>>>>> unmount and then remount the filesystem successfully.
>>>>>>>>>
>>>>>>>>> Would anyone have advice on how to debug this crash? Thank you
>>>>>>>>> in advance!
>>>>>>>>>
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>>>>>> Mar  9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>>>>> ...
>>>>>>>>> Mar  9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>>>>>> Mar  9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>>>>>> ...
>>>>>>>>> Mar  9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>>>>>> Mar  9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>>>>>> Mar  9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>>>>>> Mar  9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>>>>>> Mar  9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>>>>>> Mar  9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>>>>>> Mar  9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>>>>> Mar  9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> David Cunningham, Voisonics Limited
>>>>>>>>> http://voisonics.com/
>>>>>>>>> USA: +1 213 221 1092
>>>>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>>>
>>>>>>> --
>>>>>>> David Cunningham, Voisonics Limited
>>>>>>> http://voisonics.com/
>>>>>>> USA: +1 213 221 1092
>>>>>>> New Zealand: +64 (0)28 2558 3782

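[A minimal sketch of what "disable io_uring to see if it's related" could
look like in this setup. The option and property names below are
assumptions to verify against the installed Gluster and Proxmox versions,
not something confirmed in the thread. On the brick side, recent Gluster
releases expose io_uring through the posix xlator:

# gluster volume set vmdata storage.linux-io_uring off

On the client side, Proxmox 7.x defaults VM disk AIO to io_uring, which is
likely what creates the iou-wrk-* threads writing to the FUSE mount; a
hypothetical disk of VM 136 could be switched back to native AIO (the VM
needs a full stop/start for the change to take effect):

# qm set 136 --scsi0 vmdata:136/vm-136-disk-0.qcow2,aio=native
]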
I did also notice that loop0... AFAIK, I wasn't using any loop device, at
least not consciously. After looking for the same messages on the other
gluster/proxmox nodes, I saw no trace of it. Then I saw that on that node
there is a single LXC container whose disk lives on the glusterfs, and it
is indeed using ext4. After today's crash I was unable to boot it up
again, and the logs went silent. I just tried to boot it up, and this
immediately appeared on dmesg:

[2022-11-25 18:04:18] loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:04:18] EXT4-fs (loop0): error loading journal
[2022-11-25 18:05:26] loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:05:26] EXT4-fs (loop0): INFO: recovery required on readonly filesystem
[2022-11-25 18:05:26] EXT4-fs (loop0): write access unavailable, cannot proceed (try mounting with noload)

And the LXC container didn't boot up. I manually moved the LXC container
to the underlying ZFS where gluster lives, the LXC booted up, and the
dmesg log shows:

[2022-11-25 18:24:06] loop0: detected capacity change from 0 to 16777216
[2022-11-25 18:24:06] EXT4-fs warning (device loop0): ext4_multi_mount_protect:326: MMP interval 42 higher than expected, please wait.
[2022-11-25 18:24:50] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

So, to recapitulate:
- The loop device on the host comes from the LXC container. Not
surprising, but I didn't know it.
- The LXC container had a lot of I/O issues just before the two crashes:
the one today and the one four days ago, this Monday.
- As a side note, this gluster has been in production since last Thursday,
so the first crash came exactly four days after this LXC was started with
its storage on the gluster, and exactly four days after that it crashed
again.
- These crashes began with the upgrade to gluster 10.3; it was working
just fine with earlier versions of gluster (from 3.x to 9.x) and from
Proxmox 5.x to Proxmox 7.1, when all the issues began. I'm now on Proxmox
7.2.
- The underlying ZFS where gluster lives has no ZIL or SLOG devices (it
had them before the upgrade to gluster 10.3, but as I had to re-create the
gluster, I decided not to add them back because all my disks are SSD, so
there is no need for them). I added them again to test whether the LXC
container caused the same issues; it did, so they don't seem to make any
difference.
- There are more loop0 I/O errors in dmesg besides the days of the
crashes, but only about one error per day, and not every day. On the days
the gluster mountpoint became inaccessible, there are tens of errors per
millisecond just before the crash.

I'm going to get rid of that LXC. As I'm migrating from VMs to K8s (living
in a VM cluster inside Proxmox), I was ready to convert this one as well;
now it's a must.

I don't know if anyone at gluster can replicate this scenario (Proxmox +
gluster distributed-disperse + LXC on a gluster directory) to see if it is
reproducible. I know this must be a corner case; I'm just wondering why it
stopped working, and whether it is a bug in GlusterFS 10.3, a bug in LXC,
or in Proxmox 7.1 upwards (where I'm going to post this now as well,
though Proxmox probably won't be interested, as they explicitly suggest
mounting glusterfs with the gluster client rather than mapping a directory
where gluster is mounted via fstab).

Thank you a lot, Xavi. I will monitor dmesg to make sure all those loop
errors disappear, and hopefully I won't have a crash next Tuesday. :)
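[For anyone hitting the same thing, a quick way to answer "does loop0
point to a Gluster file?" — these are standard util-linux and sysfs
interfaces; if the backing file is under the Gluster mountpoint, the loop
device (and the container's ext4 with its MMP writes) sits directly on top
of FUSE:

# losetup --list
# cat /sys/block/loop0/loop/backing_file
]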
:) *Angel Docampo* <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021> <angel.docampo at eoniantec.com> <+34-93-1592929> El vie, 25 nov 2022 a las 13:25, Xavi Hernandez (<jahernan at redhat.com>) escribi?:> What is "loop0" it seems it's having some issue. Does it point to a > Gluster file ? > > I also see that there's an io_uring thread in D state. If that one belongs > to Gluster, it may explain why systemd was unable to generate a core dump > (all threads need to be stopped to generate a core dump, but a thread > blocked inside the kernel cannot be stopped). > > If you are using io_uring in Gluster, maybe you can disable it to see if > it's related. > > Xavi > > On Fri, Nov 25, 2022 at 11:39 AM Angel Docampo < > angel.docampo at eoniantec.com> wrote: > >> Well, just happened again, the same server, the same mountpoint. >> >> I'm unable to get the core dumps, coredumpctl says there are no core >> dumps, it would be funny if I wasn't the one suffering it, but >> systemd-coredump service crashed as well >> ? systemd-coredump at 0-3199871-0.service - Process Core Dump (PID >> 3199871/UID 0) >> Loaded: loaded (/lib/systemd/system/systemd-coredump at .service; >> static) >> Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET; >> 39min ago >> TriggeredBy: ? systemd-coredump.socket >> Docs: man:systemd-coredump(8) >> Process: 3199873 ExecStart=/lib/systemd/systemd-coredump (code=killed, >> signal=TERM) >> Main PID: 3199873 (code=killed, signal=TERM) >> CPU: 15ms >> >> Nov 25 10:49:59 pve02 systemd[1]: Started Process Core Dump (PID >> 3199871/UID 0). >> Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump at 0-3199871-0.service: >> Service reached runtime time limit. Stopping. >> Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump at 0-3199871-0.service: >> Failed with result 'timeout'. >> >> >> I just saw the exception on dmesg, >> [2022-11-25 10:50:08] INFO: task kmmpd-loop0:681644 blocked for more >> than 120 seconds. >> [2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1 >> [2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [2022-11-25 10:50:08] task:kmmpd-loop0 state:D stack: 0 >> pid:681644 ppid: 2 flags:0x00004000 >> [2022-11-25 10:50:08] Call Trace: >> [2022-11-25 10:50:08] <TASK> >> [2022-11-25 10:50:08] __schedule+0x33d/0x1750 >> [2022-11-25 10:50:08] ? bit_wait+0x70/0x70 >> [2022-11-25 10:50:08] schedule+0x4e/0xc0 >> [2022-11-25 10:50:08] io_schedule+0x46/0x80 >> [2022-11-25 10:50:08] bit_wait_io+0x11/0x70 >> [2022-11-25 10:50:08] __wait_on_bit+0x31/0xa0 >> [2022-11-25 10:50:08] out_of_line_wait_on_bit+0x8d/0xb0 >> [2022-11-25 10:50:08] ? var_wake_function+0x30/0x30 >> [2022-11-25 10:50:08] __wait_on_buffer+0x34/0x40 >> [2022-11-25 10:50:08] write_mmp_block+0x127/0x180 >> [2022-11-25 10:50:08] kmmpd+0x1b9/0x430 >> [2022-11-25 10:50:08] ? write_mmp_block+0x180/0x180 >> [2022-11-25 10:50:08] kthread+0x127/0x150 >> [2022-11-25 10:50:08] ? set_kthread_struct+0x50/0x50 >> [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 >> [2022-11-25 10:50:08] </TASK> >> [2022-11-25 10:50:08] INFO: task iou-wrk-1511979:3200401 blocked for >> more than 120 seconds. >> [2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1 >> [2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. 
>> [2022-11-25 10:50:08] task:iou-wrk-1511979 state:D stack: 0 >> pid:3200401 ppid: 1 flags:0x00004000 >> [2022-11-25 10:50:08] Call Trace: >> [2022-11-25 10:50:08] <TASK> >> [2022-11-25 10:50:08] __schedule+0x33d/0x1750 >> [2022-11-25 10:50:08] schedule+0x4e/0xc0 >> [2022-11-25 10:50:08] rwsem_down_write_slowpath+0x231/0x4f0 >> [2022-11-25 10:50:08] down_write+0x47/0x60 >> [2022-11-25 10:50:08] fuse_file_write_iter+0x1a3/0x430 >> [2022-11-25 10:50:08] ? apparmor_file_permission+0x70/0x170 >> [2022-11-25 10:50:08] io_write+0xfb/0x320 >> [2022-11-25 10:50:08] ? put_dec+0x1c/0xa0 >> [2022-11-25 10:50:08] io_issue_sqe+0x401/0x1fc0 >> [2022-11-25 10:50:08] io_wq_submit_work+0x76/0xd0 >> [2022-11-25 10:50:08] io_worker_handle_work+0x1a7/0x5f0 >> [2022-11-25 10:50:08] io_wqe_worker+0x2c0/0x360 >> [2022-11-25 10:50:08] ? finish_task_switch.isra.0+0x7e/0x2b0 >> [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 >> [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 >> [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 >> [2022-11-25 10:50:08] RIP: 0033:0x0 >> [2022-11-25 10:50:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 >> ORIG_RAX: 00000000000001aa >> [2022-11-25 10:50:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: >> 00007fdd59f872e9 >> [2022-11-25 10:50:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >> 0000000000000011 >> [2022-11-25 10:50:08] RBP: 0000000000000000 R08: 0000000000000000 R09: >> 0000000000000008 >> [2022-11-25 10:50:08] R10: 0000000000000000 R11: 0000000000000216 R12: >> 000055662e5bd268 >> [2022-11-25 10:50:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: >> 0000000000000000 >> [2022-11-25 10:50:08] </TASK> >> [2022-11-25 10:52:08] INFO: task kmmpd-loop0:681644 blocked for more >> than 241 seconds. >> [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve #1 >> [2022-11-25 10:52:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [2022-11-25 10:52:08] task:kmmpd-loop0 state:D stack: 0 >> pid:681644 ppid: 2 flags:0x00004000 >> [2022-11-25 10:52:08] Call Trace: >> [2022-11-25 10:52:08] <TASK> >> [2022-11-25 10:52:08] __schedule+0x33d/0x1750 >> [2022-11-25 10:52:08] ? bit_wait+0x70/0x70 >> [2022-11-25 10:52:08] schedule+0x4e/0xc0 >> [2022-11-25 10:52:08] io_schedule+0x46/0x80 >> [2022-11-25 10:52:08] bit_wait_io+0x11/0x70 >> [2022-11-25 10:52:08] __wait_on_bit+0x31/0xa0 >> [2022-11-25 10:52:08] out_of_line_wait_on_bit+0x8d/0xb0 >> [2022-11-25 10:52:08] ? var_wake_function+0x30/0x30 >> [2022-11-25 10:52:08] __wait_on_buffer+0x34/0x40 >> [2022-11-25 10:52:08] write_mmp_block+0x127/0x180 >> [2022-11-25 10:52:08] kmmpd+0x1b9/0x430 >> [2022-11-25 10:52:08] ? write_mmp_block+0x180/0x180 >> [2022-11-25 10:52:08] kthread+0x127/0x150 >> [2022-11-25 10:52:08] ? set_kthread_struct+0x50/0x50 >> [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 >> [2022-11-25 10:52:08] </TASK> >> [2022-11-25 10:52:08] INFO: task iou-wrk-1511979:3200401 blocked for >> more than 241 seconds. >> [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve #1 >> [2022-11-25 10:52:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. 
>> [2022-11-25 10:52:08] task:iou-wrk-1511979 state:D stack: 0 >> pid:3200401 ppid: 1 flags:0x00004000 >> [2022-11-25 10:52:08] Call Trace: >> [2022-11-25 10:52:08] <TASK> >> [2022-11-25 10:52:08] __schedule+0x33d/0x1750 >> [2022-11-25 10:52:08] schedule+0x4e/0xc0 >> [2022-11-25 10:52:08] rwsem_down_write_slowpath+0x231/0x4f0 >> [2022-11-25 10:52:08] down_write+0x47/0x60 >> [2022-11-25 10:52:08] fuse_file_write_iter+0x1a3/0x430 >> [2022-11-25 10:52:08] ? apparmor_file_permission+0x70/0x170 >> [2022-11-25 10:52:08] io_write+0xfb/0x320 >> [2022-11-25 10:52:08] ? put_dec+0x1c/0xa0 >> [2022-11-25 10:52:08] io_issue_sqe+0x401/0x1fc0 >> [2022-11-25 10:52:08] io_wq_submit_work+0x76/0xd0 >> [2022-11-25 10:52:08] io_worker_handle_work+0x1a7/0x5f0 >> [2022-11-25 10:52:08] io_wqe_worker+0x2c0/0x360 >> [2022-11-25 10:52:08] ? finish_task_switch.isra.0+0x7e/0x2b0 >> [2022-11-25 10:52:08] ? io_worker_handle_work+0x5f0/0x5f0 >> [2022-11-25 10:52:08] ? io_worker_handle_work+0x5f0/0x5f0 >> [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 >> [2022-11-25 10:52:08] RIP: 0033:0x0 >> [2022-11-25 10:52:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 >> ORIG_RAX: 00000000000001aa >> [2022-11-25 10:52:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: >> 00007fdd59f872e9 >> [2022-11-25 10:52:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >> 0000000000000011 >> [2022-11-25 10:52:08] RBP: 0000000000000000 R08: 0000000000000000 R09: >> 0000000000000008 >> [2022-11-25 10:52:08] R10: 0000000000000000 R11: 0000000000000216 R12: >> 000055662e5bd268 >> [2022-11-25 10:52:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: >> 0000000000000000 >> [2022-11-25 10:52:08] </TASK> >> [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length >> 4096. >> [2022-11-25 10:52:12] print_req_error: 7 callbacks suppressed >> [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector >> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >> [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, >> lost sync page write >> [2022-11-25 10:52:12] EXT4-fs error (device loop0): kmmpd:179: comm >> kmmpd-loop0: Error writing to MMP block >> [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length >> 4096. >> [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector >> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >> [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, >> lost sync page write >> [2022-11-25 10:52:18] loop: Write error at byte offset 37908480, length >> 4096. >> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >> [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block 9255, >> lost sync page write >> [2022-11-25 10:52:18] loop: Write error at byte offset 4490452992, >> length 4096. >> [2022-11-25 10:52:18] loop: Write error at byte offset 4490457088, >> length 4096. >> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >> 8770416 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >> 8770424 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >> [2022-11-25 10:52:18] Aborting journal on device loop0-8. >> [2022-11-25 10:52:18] loop: Write error at byte offset 4429185024, >> length 4096. 
>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
>> [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block 1081344, lost sync page write
>> [2022-11-25 10:52:18] JBD2: Error -5 detected when updating journal superblock for loop0-8.
>> [2022-11-25 10:52:23] loop: Write error at byte offset 37908480, length 4096.
>> [2022-11-25 10:52:23] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
>> [2022-11-25 10:52:23] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
>> [... the same three lines repeat every ~5 seconds from 10:52:28 through 10:54:56 ...]
>> [2022-11-25 10:55:01] loop: Write error at byte offset 37908480, length 4096.
>> [2022-11-25 10:55:01] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
>> [2022-11-25 10:55:01] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
>> [2022-11-25 10:55:04] EXT4-fs error (device loop0): ext4_journal_check_start:83: comm burp: Detected aborted journal
>> [2022-11-25 10:55:04] loop: Write error at byte offset 0, length 4096.
>> [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
>> [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
>> [2022-11-25 10:55:04] Buffer I/O error on dev loop0, logical block 0, lost sync page write
>> [2022-11-25 10:55:04] EXT4-fs (loop0): I/O error while writing superblock
>> [2022-11-25 10:55:04] EXT4-fs (loop0): Remounting filesystem read-only
>> [2022-11-25 10:55:07] loop: Write error at byte offset 37908480, length 4096.
>> [2022-11-25 10:55:07] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
>> [2022-11-25 10:55:07] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
>> [2022-11-25 10:57:14] blk_update_request: I/O error, dev loop0, sector 16390368 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0
>> [2022-11-25 11:03:45] device tap136i0 entered promiscuous mode
>>
>> I don't know if this is relevant or unrelated to glusterfs, but the consequence is that the mountpoint crashes and I'm forced to lazily unmount it and remount it, then restart all the VMs on it. Unfortunately, this time several of them have corrupted hard disks, and now I'm restoring them from backup.
>>
>> Any tips?
>>
>> *Angel Docampo*
>>
>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>
>> On Tue, Nov 22, 2022 at 12:31, Angel Docampo (<angel.docampo at eoniantec.com>) wrote:
>>
>>> I've taken a look into all the possible places where they should be, and I couldn't find them anywhere. Some people say the dump file is generated where the application is running... well, I don't know where to look then, and I hope they weren't generated on the failed mountpoint.
>>>
>>> As Debian 11 has systemd, I've installed systemd-coredump, so in case a new crash happens, at least I will have the exact location and a tool (coredumpctl) to find them, and will then install the debug symbols, which is particularly tricky on Debian. But I need to wait for it to happen again; for now the tool says there isn't any core dump on the system.
>>>
>>> Thank you, Xavi. If this happens again (let's hope it won't), I will report back.
>>>
>>> Best regards!
>>>
>>> *Angel Docampo*
>>>
>>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>>
>>> On Tue, Nov 22, 2022 at 10:45, Xavi Hernandez (<jahernan at redhat.com>) wrote:
>>>
>>>> The crash seems related to some problem in the ec xlator, but I don't have enough information to determine what it is.
>>>> The crash should have generated a core dump somewhere in the system (I don't know where Debian keeps the core dumps). If you find it, you should be able to open it using this command (make sure the debug symbols package is also installed before running it):
>>>>
>>>> # gdb /usr/sbin/glusterfs <path to core dump>
>>>>
>>>> And then run this command:
>>>>
>>>> # bt -full
>>>>
>>>> Regards,
>>>>
>>>> Xavi
>>>>
>>>> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo <angel.docampo at eoniantec.com> wrote:
>>>>
>>>>> Hi Xavi,
>>>>>
>>>>> The OS is Debian 11 with the proxmox kernel. The gluster packages are the official ones from gluster.org (https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/)
>>>>>
>>>>> The system logs showed no other issues at the time of the crash, no OOM kill or anything of the sort, and no other process was interacting with the gluster mountpoint besides proxmox.
>>>>>
>>>>> I wasn't running gdb when it crashed, so I don't really know if I can obtain a more detailed trace from the logs, or if there is a simple way to leave it running in the background to see if it happens again (or a flag to start the daemon in debug mode under systemd).
>>>>>
>>>>> Best,
>>>>>
>>>>> *Angel Docampo*
>>>>>
>>>>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>>>>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>>>>
>>>>> On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez (<jahernan at redhat.com>) wrote:
>>>>>
>>>>>> Hi Angel,
>>>>>>
>>>>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <angel.docampo at eoniantec.com> wrote:
>>>>>>
>>>>>>> Sorry for necrobumping this, but this morning I suffered this on my Proxmox + GlusterFS cluster. In the log I can see this
>>>>>>>
>>>>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>>>>> pending frames:
>>>>>>> frame : type(1) op(WRITE)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> frame : type(0) op(0)
>>>>>>> ...
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> frame : type(1) op(FSYNC)
>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>> signal received: 11
>>>>>>> time of crash:
>>>>>>> 2022-11-21 07:38:00 +0000
>>>>>>> configuration details:
>>>>>>> argp 1
>>>>>>> backtrace 1
>>>>>>> dlfcn 1
>>>>>>> libpthread 1
>>>>>>> llistxattr 1
>>>>>>> setfsid 1
>>>>>>> epoll.h 1
>>>>>>> xattr.h 1
>>>>>>> st_atim.tv_nsec 1
>>>>>>> package-string: glusterfs 10.3
>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>>>>> ---------
>>>>>>> The mount point wasn't accessible, with the "Transport endpoint is not connected" message, and it was shown like this:
>>>>>>> d????????? ? ? ? ? ? vmdata
>>>>>>>
>>>>>>> I had to stop all the VMs on that proxmox node, then stop the gluster daemon to unmount the directory, and after starting the daemon and re-mounting, everything was working again.
>>>>>>>
>>>>>>> My gluster volume info returns this
>>>>>>>
>>>>>>> Volume Name: vmdata
>>>>>>> Type: Distributed-Disperse
>>>>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: g01:/data/brick1/brick
>>>>>>> Brick2: g02:/data/brick2/brick
>>>>>>> Brick3: g03:/data/brick1/brick
>>>>>>> Brick4: g01:/data/brick2/brick
>>>>>>> Brick5: g02:/data/brick1/brick
>>>>>>> Brick6: g03:/data/brick2/brick
>>>>>>> Options Reconfigured:
>>>>>>> nfs.disable: on
>>>>>>> transport.address-family: inet
>>>>>>> storage.fips-mode-rchecksum: on
>>>>>>> features.shard: enable
>>>>>>> features.shard-block-size: 256MB
>>>>>>> performance.read-ahead: off
>>>>>>> performance.quick-read: off
>>>>>>> performance.io-cache: off
>>>>>>> server.event-threads: 2
>>>>>>> client.event-threads: 3
>>>>>>> performance.client-io-threads: on
>>>>>>> performance.stat-prefetch: off
>>>>>>> dht.force-readdirp: off
>>>>>>> performance.force-readdirp: off
>>>>>>> network.remote-dio: on
>>>>>>> features.cache-invalidation: on
>>>>>>> performance.parallel-readdir: on
>>>>>>> performance.readdir-ahead: on
>>>>>>>
>>>>>>> Xavi, do you think the open-behind off setting can help somehow? I tried to understand what it does (with no luck), and whether it could impact the performance of my VMs (I have the setup you know so well ;)).
>>>>>>> I would like to avoid more crashes like this; gluster 10.3 had been working quite well for the last two weeks, until this morning.
>>>>>>
>>>>>> I don't think disabling open-behind will have any visible effect on performance. Open-behind is only useful for small files when the workload is mostly open + read + close, and quick-read is also enabled (which is not your case). The only effect it will have is that the latency "saved" during open is "paid" on the next operation sent to the file, so the total overall latency should be the same. Additionally, a VM workload doesn't open files frequently, so it shouldn't matter much in any case.
>>>>>>
>>>>>> That said, I'm not sure if the problem is the same in your case. Based on the stack of the crash, it seems to be an issue inside the disperse module.
>>>>>>
>>>>>> What OS are you using? Are you using official packages? If so, which ones?
>>>>>>
>>>>>> Is it possible to provide a backtrace from gdb?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>>> *Angel Docampo*
>>>>>>>
>>>>>>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
>>>>>>> <angel.docampo at eoniantec.com> <+34-93-1592929>
>>>>>>>
>>>>>>> On Fri, Mar 19, 2021 at 2:10, David Cunningham (<dcunningham at voisonics.com>) wrote:
>>>>>>>
>>>>>>>> Hi Xavi,
>>>>>>>>
>>>>>>>> Thank you for that information. We'll look at upgrading it.
>>>>>>>>
>>>>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> with so little information it's hard to tell, but given that there are several OPEN and UNLINK operations, it could be related to an already fixed bug (in recent versions) in open-behind.
>>>>>>>>>
>>>>>>>>> You can try disabling open-behind with this command:
>>>>>>>>>
>>>>>>>>> # gluster volume set <volname> open-behind off
>>>>>>>>>
>>>>>>>>> But given that the version you are using is very old and unmaintained, I would recommend upgrading to at least 8.x.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Xavi
>>>>>>>>>
>>>>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <dcunningham at voisonics.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with the native FUSE client. Recently the FUSE mount crashed and we found the following in the syslog. There isn't anything logged in mnt-glusterfs.log for that time. After killing all processes with a file handle open on the filesystem, we were able to unmount and then remount the filesystem successfully.
>>>>>>>>>>
>>>>>>>>>> Would anyone have advice on how to debug this crash? Thank you in advance!
>>>>>>>>>>
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>>>>>> ...
>>>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>>>>>>> ...
>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> David Cunningham, Voisonics Limited
>>>>>>>>>> http://voisonics.com/
>>>>>>>>>> USA: +1 213 221 1092
>>>>>>>>>> New Zealand: +64 (0)28 2558 3782
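Since the missing core dumps come up repeatedly in this thread, here is a minimal sketch of how one might capture one on Debian 11 with systemd-coredump, following the gdb suggestion quoted above. It assumes the crashing process is the glusterfs FUSE client; the 600-second and 32G limits are only suggested values, and a thread blocked in D state inside the kernel can still prevent the dump from completing regardless of these settings.

# apt install systemd-coredump
# mkdir -p /etc/systemd/system/systemd-coredump@.service.d /etc/systemd/coredump.conf.d
# printf '[Service]\nRuntimeMaxSec=600\n' > /etc/systemd/system/systemd-coredump@.service.d/override.conf
# printf '[Coredump]\nProcessSizeMax=32G\nExternalSizeMax=32G\n' > /etc/systemd/coredump.conf.d/bigcore.conf
# systemctl daemon-reload

The drop-in raises the default five-minute limit on the dumping service, which is what produced the "Result: timeout" failure seen later in this thread. After the next crash, with the matching debug symbols installed, the dump can be located and opened with:

# coredumpctl list /usr/sbin/glusterfs
# coredumpctl gdb <PID>
(gdb) thread apply all bt full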
Well, that lasted longer, but it crashed once again: same node, same mountpoint... Fortunately, I had preventively moved all the VMs to the underlying ZFS filesystem these past days, so none of them have been affected this time... dmesg shows this

[2022-12-01 15:49:54] INFO: task iou-wrk-637144:946532 blocked for more than 120 seconds.
[2022-12-01 15:49:54] Tainted: P IO 5.15.74-1-pve #1
[2022-12-01 15:49:54] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2022-12-01 15:49:54] task:iou-wrk-637144 state:D stack: 0 pid:946532 ppid: 1 flags:0x00004000
[2022-12-01 15:49:54] Call Trace:
[2022-12-01 15:49:54] <TASK>
[2022-12-01 15:49:54] __schedule+0x34e/0x1740
[2022-12-01 15:49:54] ? kmem_cache_free+0x271/0x290
[2022-12-01 15:49:54] ? mempool_free_slab+0x17/0x20
[2022-12-01 15:49:54] schedule+0x69/0x110
[2022-12-01 15:49:54] rwsem_down_write_slowpath+0x231/0x4f0
[2022-12-01 15:49:54] ? ttwu_queue_wakelist+0x40/0x1c0
[2022-12-01 15:49:54] down_write+0x47/0x60
[2022-12-01 15:49:54] fuse_file_write_iter+0x1a3/0x430
[2022-12-01 15:49:54] ? apparmor_file_permission+0x70/0x170
[2022-12-01 15:49:54] io_write+0xf6/0x330
[2022-12-01 15:49:54] ? update_cfs_group+0x9c/0xc0
[2022-12-01 15:49:54] ? dequeue_entity+0xd8/0x490
[2022-12-01 15:49:54] io_issue_sqe+0x401/0x1fc0
[2022-12-01 15:49:54] ? lock_timer_base+0x3b/0xd0
[2022-12-01 15:49:54] io_wq_submit_work+0x76/0xd0
[2022-12-01 15:49:54] io_worker_handle_work+0x1a7/0x5f0
[2022-12-01 15:49:54] io_wqe_worker+0x2c0/0x360
[2022-12-01 15:49:54] ? finish_task_switch.isra.0+0x7e/0x2b0
[2022-12-01 15:49:54] ? io_worker_handle_work+0x5f0/0x5f0
[2022-12-01 15:49:54] ? io_worker_handle_work+0x5f0/0x5f0
[2022-12-01 15:49:54] ret_from_fork+0x1f/0x30
[2022-12-01 15:49:54] RIP: 0033:0x0
[2022-12-01 15:49:54] RSP: 002b:0000000000000000 EFLAGS: 00000207
[2022-12-01 15:49:54] RAX: 0000000000000000 RBX: 0000000000000011 RCX: 0000000000000000
[2022-12-01 15:49:54] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000120
[2022-12-01 15:49:54] RBP: 0000000000000120 R08: 0000000000000001 R09: 00000000000000f0
[2022-12-01 15:49:54] R10: 00000000000000f8 R11: 00000001239a4128 R12: ffffffffffffdb90
[2022-12-01 15:49:54] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000100
[2022-12-01 15:49:54] </TASK>

My gluster volume log shows plenty of errors like this

The message "I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not supported]" repeated 1564 times between [2022-12-01 00:20:09.578233 +0000] and [2022-12-01 00:22:09.436927 +0000]
[2022-12-01 00:22:09.516269 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on 73f0ad95-f7e3-4a68-8d08-9f7e03182baa. [Operation not supported]

and of this

[2022-12-01 09:05:08.525867 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 0-vmdata-shard: seek called on 3ed993c4-bbb5-4938-86e9-6d22b8541e8e. [Operation not supported]
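For context on those shard_seek entries: they are INFO-level messages emitted when something issues lseek(SEEK_DATA) or lseek(SEEK_HOLE) on a sharded file, an operation the shard xlator refuses; QEMU is a plausible issuer, since it probes image sparseness this way. Being informational, they are likely noise rather than the crash itself. As a hedged illustration (the image path is hypothetical), the same refusal can be provoked by hand with xfs_io, which despite its name works on any filesystem:

# xfs_io -c "seek -d 0" /mnt/pve/vmdata/images/100/vm-100-disk-0.raw

If the volume refuses the seek, another "seek called on ... [Operation not supported]" line should appear in the client log.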
Then, simply, the same:

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2022-12-01 14:45:14 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.3
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f1e23db3a54]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f1e23dbbfc0]
/lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f1e23b76d60]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f1e200e9a14]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f1e200cb414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0xd072)[0x7f1e200bf072]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/readdir-ahead.so(+0x316d)[0x7f1e200a316d]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/distribute.so(+0x5bdd4)[0x7f1e197aadd4]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x1e69c)[0x7f1e2008b69c]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x16551)[0x7f1e20083551]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x25abf)[0x7f1e20092abf]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x25d21)[0x7f1e20092d21]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x167be)[0x7f1e200837be]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x1c178)[0x7f1e20089178]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/utime.so(+0x7804)[0x7f1e20064804]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/write-behind.so(+0x8164)[0x7f1e2004e164]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/write-behind.so(+0x9228)[0x7f1e2004f228]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/performance/write-behind.so(+0x9a4d)[0x7f1e2004fa4d]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/utime.so(+0x29e5)[0x7f1e2005f9e5]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0x12e59)[0x7f1e2007fe59]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/features/shard.so(+0xc2c6)[0x7f1e200792c6]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/distribute.so(+0x69e90)[0x7f1e197b8e90]
/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fxattrop_cbk+0x125)[0x7f1e23e27515]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x2421c)[0x7f1e200d621c]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f1e200cb414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f1e200c8373]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f1e200c90f9]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x1f929)[0x7f1e200d1929]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x469c2)[0x7f1e201859c2]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f1e23d5eccb]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f1e23d5a646]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f1e202784c8]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f1e2027f38c]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f1e23e0471d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f1e23d1aea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f1e23c3aa2f]

I'm still unable to gather any core dump. I can barely read anything intelligible in all of this, but clearly something is going on with sharding here. So I'm going to empty the volume, destroy it completely, re-create another volume without sharding, and see what happens.

*Angel Docampo*

<https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
<angel.docampo at eoniantec.com> <+34-93-1592929>

On Fri, Nov 25, 2022 at 19:08, Angel Docampo (<angel.docampo at eoniantec.com>) wrote:

> I also noticed that loop0... AFAIK, I wasn't using any loop device, at least not consciously.
> After looking for the same messages on the other gluster/proxmox nodes, I saw no trace of it.
> Then I saw that on that node there is a single LXC container whose disk lives on the glusterfs and which, indeed, uses ext4.
> After today's crash I was unable to boot it up again and the logs went silent. I just tried to boot it up, and this immediately appeared in dmesg
> [2022-11-25 18:04:18] loop0: detected capacity change from 0 to 16777216
> [2022-11-25 18:04:18] EXT4-fs (loop0): error loading journal
> [2022-11-25 18:05:26] loop0: detected capacity change from 0 to 16777216
> [2022-11-25 18:05:26] EXT4-fs (loop0): INFO: recovery required on readonly filesystem
> [2022-11-25 18:05:26] EXT4-fs (loop0): write access unavailable, cannot proceed (try mounting with noload)
>
> And the LXC container didn't boot up. I manually moved the LXC container to the underlying ZFS where gluster lives, the LXC booted up, and the dmesg log shows
> [2022-11-25 18:24:06] loop0: detected capacity change from 0 to 16777216
> [2022-11-25 18:24:06] EXT4-fs warning (device loop0): ext4_multi_mount_protect:326: MMP interval 42 higher than expected, please wait.
> [2022-11-25 18:24:50] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
>
> So, to recapitulate:
> - the loop device on the host comes from the LXC; that's not surprising, but I didn't know it.
> - the LXC container had a lot of I/O issues just before the two crashes, today's and the one 4 days ago, this Monday
> - as a side note, this gluster has been in production since last Thursday, so the first crash came exactly 4 days after this LXC was started with its storage on the gluster, and exactly 4 days later it crashed again.
> - these crashes began after the upgrade to gluster 10.3; it was working just fine with former versions of gluster (from 3.X to 9.X) and from proxmox 5.X to proxmox 7.1, when all the issues began. Now I'm on proxmox 7.2.
> - the underlying ZFS where gluster sits has no ZIL or SLOG (it had them before the upgrade to gluster 10.3, but as I had to re-create the gluster I decided not to add them, since all my disks are SSD and there is no need for them); I added them back to test whether the LXC container caused the same issues, it did, so they don't seem to make any difference.
> - there are more loop0 I/O errors in dmesg besides the days of the crashes, but just "one" error per day, and not every day; on the days the gluster mountpoint became inaccessible, there are tens of errors per millisecond just before the crash
>
> I'm going to get rid of that LXC; as I'm now migrating from VMs to K8s (living in a VM cluster inside proxmox), I was ready to convert this one as well, and now it's a must.
>
> I don't know if anyone at gluster can replicate this scenario (proxmox + gluster distributed-disperse + LXC on a gluster directory) to see if it is reproducible. I know this must be a corner case; I'm just wondering why it stopped working, and whether it is a bug in GlusterFS 10.3, in LXC, or in Proxmox 7.1 upwards (where I'm going to post this now, though Proxmox probably won't be interested, as they explicitly suggest mounting glusterfs with the gluster client rather than mapping a directory where gluster is mounted via fstab).
>
> Thanks a lot, Xavi. I will monitor dmesg to make sure all those loop errors disappear, and hopefully I won't have a crash next Tuesday. :)
>
> *Angel Docampo*
>
> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
> <angel.docampo at eoniantec.com> <+34-93-1592929>
>
> On Fri, Nov 25, 2022 at 13:25, Xavi Hernandez (<jahernan at redhat.com>) wrote:
>
>> [quoted copies of the earlier messages in this thread trimmed]
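For anyone who wants to try the mitigations discussed in this thread before tearing the volume down, a hedged sketch follows. The mount point is hypothetical (Proxmox normally mounts gluster storage under /mnt/pve/<storage>), and storage.linux-io_uring assumes bricks running Gluster 9 or later, where the posix xlator gained optional io_uring support.

Check whether the bricks use io_uring, and disable it if so (a volume restart may be needed for the bricks to pick up the change):

# gluster volume get vmdata storage.linux-io_uring
# gluster volume set vmdata storage.linux-io_uring off

The lazy-unmount recovery described earlier in the thread amounts to:

# umount -l /mnt/pve/vmdata
# mount -t glusterfs g01:/vmdata /mnt/pve/vmdata

As for sharding, the setting can be confirmed with "gluster volume get vmdata features.shard", but never simply switch features.shard off on a volume that already holds sharded files: existing files would appear truncated or corrupted. Migrating the data to a freshly created volume, as planned above, is the safe route.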
>>> [2022-11-25 10:50:08] task:kmmpd-loop0 state:D stack: 0 >>> pid:681644 ppid: 2 flags:0x00004000 >>> [2022-11-25 10:50:08] Call Trace: >>> [2022-11-25 10:50:08] <TASK> >>> [2022-11-25 10:50:08] __schedule+0x33d/0x1750 >>> [2022-11-25 10:50:08] ? bit_wait+0x70/0x70 >>> [2022-11-25 10:50:08] schedule+0x4e/0xc0 >>> [2022-11-25 10:50:08] io_schedule+0x46/0x80 >>> [2022-11-25 10:50:08] bit_wait_io+0x11/0x70 >>> [2022-11-25 10:50:08] __wait_on_bit+0x31/0xa0 >>> [2022-11-25 10:50:08] out_of_line_wait_on_bit+0x8d/0xb0 >>> [2022-11-25 10:50:08] ? var_wake_function+0x30/0x30 >>> [2022-11-25 10:50:08] __wait_on_buffer+0x34/0x40 >>> [2022-11-25 10:50:08] write_mmp_block+0x127/0x180 >>> [2022-11-25 10:50:08] kmmpd+0x1b9/0x430 >>> [2022-11-25 10:50:08] ? write_mmp_block+0x180/0x180 >>> [2022-11-25 10:50:08] kthread+0x127/0x150 >>> [2022-11-25 10:50:08] ? set_kthread_struct+0x50/0x50 >>> [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 >>> [2022-11-25 10:50:08] </TASK> >>> [2022-11-25 10:50:08] INFO: task iou-wrk-1511979:3200401 blocked for >>> more than 120 seconds. >>> [2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve >>> #1 >>> [2022-11-25 10:50:08] "echo 0 > >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> [2022-11-25 10:50:08] task:iou-wrk-1511979 state:D stack: 0 >>> pid:3200401 ppid: 1 flags:0x00004000 >>> [2022-11-25 10:50:08] Call Trace: >>> [2022-11-25 10:50:08] <TASK> >>> [2022-11-25 10:50:08] __schedule+0x33d/0x1750 >>> [2022-11-25 10:50:08] schedule+0x4e/0xc0 >>> [2022-11-25 10:50:08] rwsem_down_write_slowpath+0x231/0x4f0 >>> [2022-11-25 10:50:08] down_write+0x47/0x60 >>> [2022-11-25 10:50:08] fuse_file_write_iter+0x1a3/0x430 >>> [2022-11-25 10:50:08] ? apparmor_file_permission+0x70/0x170 >>> [2022-11-25 10:50:08] io_write+0xfb/0x320 >>> [2022-11-25 10:50:08] ? put_dec+0x1c/0xa0 >>> [2022-11-25 10:50:08] io_issue_sqe+0x401/0x1fc0 >>> [2022-11-25 10:50:08] io_wq_submit_work+0x76/0xd0 >>> [2022-11-25 10:50:08] io_worker_handle_work+0x1a7/0x5f0 >>> [2022-11-25 10:50:08] io_wqe_worker+0x2c0/0x360 >>> [2022-11-25 10:50:08] ? finish_task_switch.isra.0+0x7e/0x2b0 >>> [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 >>> [2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0 >>> [2022-11-25 10:50:08] ret_from_fork+0x1f/0x30 >>> [2022-11-25 10:50:08] RIP: 0033:0x0 >>> [2022-11-25 10:50:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 >>> ORIG_RAX: 00000000000001aa >>> [2022-11-25 10:50:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: >>> 00007fdd59f872e9 >>> [2022-11-25 10:50:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >>> 0000000000000011 >>> [2022-11-25 10:50:08] RBP: 0000000000000000 R08: 0000000000000000 R09: >>> 0000000000000008 >>> [2022-11-25 10:50:08] R10: 0000000000000000 R11: 0000000000000216 R12: >>> 000055662e5bd268 >>> [2022-11-25 10:50:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: >>> 0000000000000000 >>> [2022-11-25 10:50:08] </TASK> >>> [2022-11-25 10:52:08] INFO: task kmmpd-loop0:681644 blocked for more >>> than 241 seconds. >>> [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve >>> #1 >>> [2022-11-25 10:52:08] "echo 0 > >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> [2022-11-25 10:52:08] task:kmmpd-loop0 state:D stack: 0 >>> pid:681644 ppid: 2 flags:0x00004000 >>> [2022-11-25 10:52:08] Call Trace: >>> [2022-11-25 10:52:08] <TASK> >>> [2022-11-25 10:52:08] __schedule+0x33d/0x1750 >>> [2022-11-25 10:52:08] ? 
bit_wait+0x70/0x70 >>> [2022-11-25 10:52:08] schedule+0x4e/0xc0 >>> [2022-11-25 10:52:08] io_schedule+0x46/0x80 >>> [2022-11-25 10:52:08] bit_wait_io+0x11/0x70 >>> [2022-11-25 10:52:08] __wait_on_bit+0x31/0xa0 >>> [2022-11-25 10:52:08] out_of_line_wait_on_bit+0x8d/0xb0 >>> [2022-11-25 10:52:08] ? var_wake_function+0x30/0x30 >>> [2022-11-25 10:52:08] __wait_on_buffer+0x34/0x40 >>> [2022-11-25 10:52:08] write_mmp_block+0x127/0x180 >>> [2022-11-25 10:52:08] kmmpd+0x1b9/0x430 >>> [2022-11-25 10:52:08] ? write_mmp_block+0x180/0x180 >>> [2022-11-25 10:52:08] kthread+0x127/0x150 >>> [2022-11-25 10:52:08] ? set_kthread_struct+0x50/0x50 >>> [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 >>> [2022-11-25 10:52:08] </TASK> >>> [2022-11-25 10:52:08] INFO: task iou-wrk-1511979:3200401 blocked for >>> more than 241 seconds. >>> [2022-11-25 10:52:08] Tainted: P IO 5.15.60-2-pve >>> #1 >>> [2022-11-25 10:52:08] "echo 0 > >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>> [2022-11-25 10:52:08] task:iou-wrk-1511979 state:D stack: 0 >>> pid:3200401 ppid: 1 flags:0x00004000 >>> [2022-11-25 10:52:08] Call Trace: >>> [2022-11-25 10:52:08] <TASK> >>> [2022-11-25 10:52:08] __schedule+0x33d/0x1750 >>> [2022-11-25 10:52:08] schedule+0x4e/0xc0 >>> [2022-11-25 10:52:08] rwsem_down_write_slowpath+0x231/0x4f0 >>> [2022-11-25 10:52:08] down_write+0x47/0x60 >>> [2022-11-25 10:52:08] fuse_file_write_iter+0x1a3/0x430 >>> [2022-11-25 10:52:08] ? apparmor_file_permission+0x70/0x170 >>> [2022-11-25 10:52:08] io_write+0xfb/0x320 >>> [2022-11-25 10:52:08] ? put_dec+0x1c/0xa0 >>> [2022-11-25 10:52:08] io_issue_sqe+0x401/0x1fc0 >>> [2022-11-25 10:52:08] io_wq_submit_work+0x76/0xd0 >>> [2022-11-25 10:52:08] io_worker_handle_work+0x1a7/0x5f0 >>> [2022-11-25 10:52:08] io_wqe_worker+0x2c0/0x360 >>> [2022-11-25 10:52:08] ? finish_task_switch.isra.0+0x7e/0x2b0 >>> [2022-11-25 10:52:08] ? io_worker_handle_work+0x5f0/0x5f0 >>> [2022-11-25 10:52:08] ? io_worker_handle_work+0x5f0/0x5f0 >>> [2022-11-25 10:52:08] ret_from_fork+0x1f/0x30 >>> [2022-11-25 10:52:08] RIP: 0033:0x0 >>> [2022-11-25 10:52:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 >>> ORIG_RAX: 00000000000001aa >>> [2022-11-25 10:52:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: >>> 00007fdd59f872e9 >>> [2022-11-25 10:52:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: >>> 0000000000000011 >>> [2022-11-25 10:52:08] RBP: 0000000000000000 R08: 0000000000000000 R09: >>> 0000000000000008 >>> [2022-11-25 10:52:08] R10: 0000000000000000 R11: 0000000000000216 R12: >>> 000055662e5bd268 >>> [2022-11-25 10:52:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: >>> 0000000000000000 >>> [2022-11-25 10:52:08] </TASK> >>> [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:12] print_req_error: 7 callbacks suppressed >>> [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:12] EXT4-fs error (device loop0): kmmpd:179: comm >>> kmmpd-loop0: Error writing to MMP block >>> [2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length >>> 4096. 
>>> [2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:18] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:18] loop: Write error at byte offset 4490452992, >>> length 4096. >>> [2022-11-25 10:52:18] loop: Write error at byte offset 4490457088, >>> length 4096. >>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >>> 8770416 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >>> 8770424 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:18] Aborting journal on device loop0-8. >>> [2022-11-25 10:52:18] loop: Write error at byte offset 4429185024, >>> length 4096. >>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >>> 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector >>> 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block >>> 1081344, lost sync page write >>> [2022-11-25 10:52:18] JBD2: Error -5 detected when updating journal >>> superblock for loop0-8. >>> [2022-11-25 10:52:23] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:23] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:23] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:28] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:28] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:28] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:33] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:33] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:33] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:38] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:38] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:38] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:43] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:43] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:43] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:48] loop: Write error at byte offset 37908480, length >>> 4096. 
>>> [2022-11-25 10:52:48] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:48] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:53] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:53] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:53] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:52:59] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:52:59] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:52:59] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:04] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:04] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:04] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:09] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:09] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:09] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:14] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:14] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:14] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:19] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:19] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:19] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:24] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:24] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:24] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:29] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:29] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:29] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:34] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:34] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:34] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:40] loop: Write error at byte offset 37908480, length >>> 4096. 
>>> [2022-11-25 10:53:40] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:40] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:45] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:45] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:45] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:50] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:50] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:50] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:53:55] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:53:55] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:53:55] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:00] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:00] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:00] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:05] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:05] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:05] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:10] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:10] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:10] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:15] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:15] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:15] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:21] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:21] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:21] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:26] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:26] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:26] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:31] loop: Write error at byte offset 37908480, length >>> 4096. 
>>> [2022-11-25 10:54:31] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:31] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:36] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:36] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:36] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:41] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:41] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:41] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:46] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:46] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:46] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:51] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:51] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:51] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:54:56] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:54:56] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:54:56] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:55:01] loop: Write error at byte offset 37908480, length >>> 4096. >>> [2022-11-25 10:55:01] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:55:01] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:55:04] EXT4-fs error (device loop0): >>> ext4_journal_check_start:83: comm burp: Detected aborted journal >>> [2022-11-25 10:55:04] loop: Write error at byte offset 0, length 4096. >>> [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector >>> 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector >>> 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:55:04] Buffer I/O error on dev loop0, logical block 0, >>> lost sync page write >>> [2022-11-25 10:55:04] EXT4-fs (loop0): I/O error while writing >>> superblock >>> [2022-11-25 10:55:04] EXT4-fs (loop0): Remounting filesystem read-only >>> [2022-11-25 10:55:07] loop: Write error at byte offset 37908480, length >>> 4096. 
>>> [2022-11-25 10:55:07] blk_update_request: I/O error, dev loop0, sector >>> 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0 >>> [2022-11-25 10:55:07] Buffer I/O error on dev loop0, logical block >>> 9255, lost sync page write >>> [2022-11-25 10:57:14] blk_update_request: I/O error, dev loop0, sector >>> 16390368 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0 >>> [2022-11-25 11:03:45] device tap136i0 entered promiscuous mode >>> >>> I don't know if it is relevant somehow or it is unrelated to glusterfs, >>> but the consequences are the mountpoint crashes, I'm forced to lazy unmount >>> it and remount it back. Then restart all the VMs on there, unfortunately, >>> this time several have the hard disk corrupted and now I'm restoring them >>> from the backup. >>> >>> Any tip? >>> >>> *Angel Docampo* >>> >>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021> >>> <angel.docampo at eoniantec.com> <+34-93-1592929> >>> >>> >>> El mar, 22 nov 2022 a las 12:31, Angel Docampo (< >>> angel.docampo at eoniantec.com>) escribi?: >>> >>>> I've taken a look into all possible places they should be, and I >>>> couldn't find it anywhere. Some people say the dump file is generated where >>>> the application is running... well, I don't know where to look then, and I >>>> hope they hadn't been generated on the failed mountpoint. >>>> >>>> As Debian 11 has systemd, I've installed systemd-coredump, so in the >>>> case a new crash happens, at least I will have the exact location and tool >>>> (coredumpctl) to find them and will install then the debug symbols, which >>>> is particularly tricky on debian. But I need to wait to happen again, now >>>> the tool says there isn't any core dump on the system. >>>> >>>> Thank you, Xavi, if this happens again (let's hope it won't), I will >>>> report back. >>>> >>>> Best regards! >>>> >>>> *Angel Docampo* >>>> >>>> <https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021> >>>> <angel.docampo at eoniantec.com> <+34-93-1592929> >>>> >>>> >>>> El mar, 22 nov 2022 a las 10:45, Xavi Hernandez (<jahernan at redhat.com>) >>>> escribi?: >>>> >>>>> The crash seems related to some problem in ec xlator, but I don't have >>>>> enough information to determine what it is. The crash should have generated >>>>> a core dump somewhere in the system (I don't know where Debian keeps the >>>>> core dumps). If you find it, you should be able to open it using this >>>>> command (make sure debug symbols package is also installed before running >>>>> it): >>>>> >>>>> # gdb /usr/sbin/glusterfs <path to core dump> >>>>> >>>>> And then run this command: >>>>> >>>>> # bt -full >>>>> >>>>> Regards, >>>>> >>>>> Xavi >>>>> >>>>> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo < >>>>> angel.docampo at eoniantec.com> wrote: >>>>> >>>>>> Hi Xavi, >>>>>> >>>>>> The OS is Debian 11 with the proxmox kernel. Gluster packages are the >>>>>> official from gluster.org ( >>>>>> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/ >>>>>> ) >>>>>> >>>>>> The system logs showed no other issues by the time of the crash, no >>>>>> OOM kill or whatsoever, and no other process was interacting with the >>>>>> gluster mountpoint besides proxmox. 
>>>>>> I wasn't running gdb when it crashed, so I don't really know if I can obtain a more detailed trace from the logs, or if there is a simple way to leave it running in the background in case it happens again (or a flag to start the systemd daemon in debug mode).
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> *Angel Docampo*
>>>>>>
>>>>>> On Mon, 21 Nov 2022 at 15:16, Xavi Hernandez (<jahernan at redhat.com>) wrote:
>>>>>>
>>>>>>> Hi Angel,
>>>>>>>
>>>>>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <angel.docampo at eoniantec.com> wrote:
>>>>>>>
>>>>>>>> Sorry for necrobumping this, but this morning I've suffered this on my Proxmox + GlusterFS cluster. In the log I can see this:
>>>>>>>>
>>>>>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>>>>>> pending frames:
>>>>>>>> frame : type(1) op(WRITE)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> frame : type(0) op(0)
>>>>>>>> ...
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> frame : type(1) op(FSYNC)
>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>> signal received: 11
>>>>>>>> time of crash:
>>>>>>>> 2022-11-21 07:38:00 +0000
>>>>>>>> configuration details:
>>>>>>>> argp 1
>>>>>>>> backtrace 1
>>>>>>>> dlfcn 1
>>>>>>>> libpthread 1
>>>>>>>> llistxattr 1
>>>>>>>> setfsid 1
>>>>>>>> epoll.h 1
>>>>>>>> xattr.h 1
>>>>>>>> st_atim.tv_nsec 1
>>>>>>>> package-string: glusterfs 10.3
>>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>>>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>>>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>>>>>> ---------
>>>>>>>> The mount point wasn't accessible, returning a "Transport endpoint is not connected" error, and it was shown like this:
>>>>>>>> d????????? ? ? ? ? ? vmdata
>>>>>>>>
>>>>>>>> I had to stop all the VMs on that Proxmox node, then stop the gluster daemon to unmount the directory, and after starting the daemon and re-mounting, everything was working again.
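>>>>>>>>
>>>>>>>> In commands, the recovery was roughly this (a sketch; the mountpoint path is illustrative, since Proxmox mounts gluster storages under /mnt/pve/<storage-id>):
>>>>>>>>
>>>>>>>> # systemctl stop glusterd
>>>>>>>> # umount -l /mnt/pve/vmdata
>>>>>>>> # systemctl start glusterd
>>>>>>>> # mount -t glusterfs g01:/vmdata /mnt/pve/vmdata
>>>>>>>>
>>>>>>>> (stop the gluster management daemon, lazily unmount the dead FUSE mountpoint, start the daemon again, and remount the volume from one of the nodes)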
>>>>>>>>
>>>>>>>> My gluster volume info returns this:
>>>>>>>>
>>>>>>>> Volume Name: vmdata
>>>>>>>> Type: Distributed-Disperse
>>>>>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>>>>>> Status: Started
>>>>>>>> Snapshot Count: 0
>>>>>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: g01:/data/brick1/brick
>>>>>>>> Brick2: g02:/data/brick2/brick
>>>>>>>> Brick3: g03:/data/brick1/brick
>>>>>>>> Brick4: g01:/data/brick2/brick
>>>>>>>> Brick5: g02:/data/brick1/brick
>>>>>>>> Brick6: g03:/data/brick2/brick
>>>>>>>> Options Reconfigured:
>>>>>>>> nfs.disable: on
>>>>>>>> transport.address-family: inet
>>>>>>>> storage.fips-mode-rchecksum: on
>>>>>>>> features.shard: enable
>>>>>>>> features.shard-block-size: 256MB
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.quick-read: off
>>>>>>>> performance.io-cache: off
>>>>>>>> server.event-threads: 2
>>>>>>>> client.event-threads: 3
>>>>>>>> performance.client-io-threads: on
>>>>>>>> performance.stat-prefetch: off
>>>>>>>> dht.force-readdirp: off
>>>>>>>> performance.force-readdirp: off
>>>>>>>> network.remote-dio: on
>>>>>>>> features.cache-invalidation: on
>>>>>>>> performance.parallel-readdir: on
>>>>>>>> performance.readdir-ahead: on
>>>>>>>>
>>>>>>>> Xavi, do you think the open-behind off setting could help somehow? I did try to understand what it does (with no luck), and whether it could impact the performance of my VMs (I have the setup you know so well ;)).
>>>>>>>> I would like to avoid more crashes like this; Gluster 10.3 had been working quite well for the last two weeks, until this morning.
>>>>>>>
>>>>>>> I don't think disabling open-behind will have any visible effect on performance. Open-behind is only useful for small files when the workload is mostly open + read + close, and quick-read is also enabled (which is not your case). The only effect it will have is that the latency "saved" during open is "paid" on the next operation sent to the file, so the total overall latency should be the same. Additionally, a VM workload doesn't open files frequently, so it shouldn't matter much in any case.
>>>>>>>
>>>>>>> That said, I'm not sure if the problem is the same in your case. Based on the stack of the crash, it seems to be an issue inside the disperse module.
>>>>>>>
>>>>>>> What OS are you using? Are you using official packages? If so, which ones?
>>>>>>>
>>>>>>> Is it possible to provide a backtrace from gdb?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Xavi
>>>>>>>
>>>>>>>> *Angel Docampo*
>>>>>>>>
>>>>>>>> On Fri, 19 Mar 2021 at 2:10, David Cunningham (<dcunningham at voisonics.com>) wrote:
>>>>>>>>
>>>>>>>>> Hi Xavi,
>>>>>>>>>
>>>>>>>>> Thank you for that information. We'll look at upgrading it.
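>>>>>>>>>
>>>>>>>>> For the record, my understanding is that checking and changing that option would look roughly like this ("gvol0" is a placeholder for our volume name; a sketch, not something we've run yet):
>>>>>>>>>
>>>>>>>>> # gluster volume get gvol0 performance.open-behind
>>>>>>>>> # gluster volume set gvol0 open-behind off
>>>>>>>>> # gluster volume get gvol0 performance.open-behind
>>>>>>>>>
>>>>>>>>> (the first "get" shows the current value, the "set" disables the translator, and the second "get" confirms the change took effect)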
>>>>>>>>>
>>>>>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> with so little information it's hard to tell, but given that there are several OPEN and UNLINK operations, it could be related to an already-fixed bug (in recent versions) in open-behind.
>>>>>>>>>>
>>>>>>>>>> You can try disabling open-behind with this command:
>>>>>>>>>>
>>>>>>>>>> # gluster volume set <volname> open-behind off
>>>>>>>>>>
>>>>>>>>>> But given that the version you are using is very old and unmaintained, I would recommend upgrading to at least 8.x.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Xavi
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <dcunningham at voisonics.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with the native FUSE client. Recently the FUSE mount crashed and we found the following in the syslog. There isn't anything logged in mnt-glusterfs.log for that time. After killing all processes with a file handle open on the filesystem, we were able to unmount and then remount the filesystem successfully.
>>>>>>>>>>>
>>>>>>>>>>> Would anyone have advice on how to debug this crash? Thank you in advance!
>>>>>>>>>>>
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
>>>>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>>>>>>> ...
>>>>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
>>>>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
>>>>>>>>>>> ...
>>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
>>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
>>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
>>>>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
>>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
>>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
>>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> David Cunningham, Voisonics Limited
>>>>>>>>>>> http://voisonics.com/
>>>>>>>>>>> USA: +1 213 221 1092
>>>>>>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> David Cunningham, Voisonics Limited
>>>>>>>>> http://voisonics.com/
>>>>>>>>> USA: +1 213 221 1092
>>>>>>>>> New Zealand: +64 (0)28 2558 3782