Well, it just happened again: the same server, the same mountpoint.
I'm unable to get the core dumps; coredumpctl says there are none.
It would be funny if I weren't the one suffering it, but the
systemd-coredump service crashed as well:
● systemd-coredump@0-3199871-0.service - Process Core Dump (PID 3199871/UID 0)
Loaded: loaded (/lib/systemd/system/systemd-coredump@.service; static)
Active: failed (Result: timeout) since Fri 2022-11-25 10:54:59 CET; 39min ago
TriggeredBy: ● systemd-coredump.socket
Docs: man:systemd-coredump(8)
Process: 3199873 ExecStart=/lib/systemd/systemd-coredump (code=killed, signal=TERM)
Main PID: 3199873 (code=killed, signal=TERM)
CPU: 15ms
Nov 25 10:49:59 pve02 systemd[1]: Started Process Core Dump (PID 3199871/UID 0).
Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service: Service reached runtime time limit. Stopping.
Nov 25 10:54:59 pve02 systemd[1]: systemd-coredump@0-3199871-0.service: Failed with result 'timeout'.
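The "Result: timeout" above means systemd-coredump hit its RuntimeMaxSec limit (the unit ships with 5 minutes, which matches the 10:49:59 start and 10:54:59 kill) while still writing the dump, probably because the glusterfs core is large and the disk was busy. If it helps, a drop-in along these lines should extend the limit; the path and value here are my assumption, not something from this thread:

```
# /etc/systemd/system/systemd-coredump@.service.d/override.conf (assumed path)
[Service]
RuntimeMaxSec=1200
```

After creating it, run `systemctl daemon-reload`. It may also be worth raising ProcessSizeMax and ExternalSizeMax in /etc/systemd/coredump.conf if the core is larger than the defaults allow.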
I just saw the exception in dmesg:
[2022-11-25 10:50:08] INFO: task kmmpd-loop0:681644 blocked for more than 120 seconds.
[2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1
[2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2022-11-25 10:50:08] task:kmmpd-loop0 state:D stack: 0 pid:681644 ppid: 2 flags:0x00004000
[2022-11-25 10:50:08] Call Trace:
[2022-11-25 10:50:08] <TASK>
[2022-11-25 10:50:08] __schedule+0x33d/0x1750
[2022-11-25 10:50:08] ? bit_wait+0x70/0x70
[2022-11-25 10:50:08] schedule+0x4e/0xc0
[2022-11-25 10:50:08] io_schedule+0x46/0x80
[2022-11-25 10:50:08] bit_wait_io+0x11/0x70
[2022-11-25 10:50:08] __wait_on_bit+0x31/0xa0
[2022-11-25 10:50:08] out_of_line_wait_on_bit+0x8d/0xb0
[2022-11-25 10:50:08] ? var_wake_function+0x30/0x30
[2022-11-25 10:50:08] __wait_on_buffer+0x34/0x40
[2022-11-25 10:50:08] write_mmp_block+0x127/0x180
[2022-11-25 10:50:08] kmmpd+0x1b9/0x430
[2022-11-25 10:50:08] ? write_mmp_block+0x180/0x180
[2022-11-25 10:50:08] kthread+0x127/0x150
[2022-11-25 10:50:08] ? set_kthread_struct+0x50/0x50
[2022-11-25 10:50:08] ret_from_fork+0x1f/0x30
[2022-11-25 10:50:08] </TASK>
[2022-11-25 10:50:08] INFO: task iou-wrk-1511979:3200401 blocked for more than 120 seconds.
[2022-11-25 10:50:08] Tainted: P IO 5.15.60-2-pve #1
[2022-11-25 10:50:08] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2022-11-25 10:50:08] task:iou-wrk-1511979 state:D stack: 0 pid:3200401 ppid: 1 flags:0x00004000
[2022-11-25 10:50:08] Call Trace:
[2022-11-25 10:50:08] <TASK>
[2022-11-25 10:50:08] __schedule+0x33d/0x1750
[2022-11-25 10:50:08] schedule+0x4e/0xc0
[2022-11-25 10:50:08] rwsem_down_write_slowpath+0x231/0x4f0
[2022-11-25 10:50:08] down_write+0x47/0x60
[2022-11-25 10:50:08] fuse_file_write_iter+0x1a3/0x430
[2022-11-25 10:50:08] ? apparmor_file_permission+0x70/0x170
[2022-11-25 10:50:08] io_write+0xfb/0x320
[2022-11-25 10:50:08] ? put_dec+0x1c/0xa0
[2022-11-25 10:50:08] io_issue_sqe+0x401/0x1fc0
[2022-11-25 10:50:08] io_wq_submit_work+0x76/0xd0
[2022-11-25 10:50:08] io_worker_handle_work+0x1a7/0x5f0
[2022-11-25 10:50:08] io_wqe_worker+0x2c0/0x360
[2022-11-25 10:50:08] ? finish_task_switch.isra.0+0x7e/0x2b0
[2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0
[2022-11-25 10:50:08] ? io_worker_handle_work+0x5f0/0x5f0
[2022-11-25 10:50:08] ret_from_fork+0x1f/0x30
[2022-11-25 10:50:08] RIP: 0033:0x0
[2022-11-25 10:50:08] RSP: 002b:0000000000000000 EFLAGS: 00000216 ORIG_RAX: 00000000000001aa
[2022-11-25 10:50:08] RAX: 0000000000000000 RBX: 00007fdb1efef640 RCX: 00007fdd59f872e9
[2022-11-25 10:50:08] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000011
[2022-11-25 10:50:08] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008
[2022-11-25 10:50:08] R10: 0000000000000000 R11: 0000000000000216 R12: 000055662e5bd268
[2022-11-25 10:50:08] R13: 000055662e5bd320 R14: 000055662e5bd260 R15: 0000000000000000
[2022-11-25 10:50:08] </TASK>
[2022-11-25 10:52:08] INFO: task kmmpd-loop0:681644 blocked for more than 241 seconds.
[2022-11-25 10:52:08] INFO: task iou-wrk-1511979:3200401 blocked for more than 241 seconds.
[... identical call traces and registers as above, re-reported at 241 seconds ...]
[2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length 4096.
[2022-11-25 10:52:12] print_req_error: 7 callbacks suppressed
[2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
[2022-11-25 10:52:12] EXT4-fs error (device loop0): kmmpd:179: comm kmmpd-loop0: Error writing to MMP block
[2022-11-25 10:52:12] loop: Write error at byte offset 37908480, length 4096.
[2022-11-25 10:52:12] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:52:12] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
[2022-11-25 10:52:18] loop: Write error at byte offset 37908480, length 4096.
[2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
[2022-11-25 10:52:18] loop: Write error at byte offset 4490452992, length 4096.
[2022-11-25 10:52:18] loop: Write error at byte offset 4490457088, length 4096.
[2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8770416 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8770424 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[2022-11-25 10:52:18] Aborting journal on device loop0-8.
[2022-11-25 10:52:18] loop: Write error at byte offset 4429185024, length 4096.
[2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[2022-11-25 10:52:18] blk_update_request: I/O error, dev loop0, sector 8650752 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[2022-11-25 10:52:18] Buffer I/O error on dev loop0, logical block 1081344, lost sync page write
[2022-11-25 10:52:18] JBD2: Error -5 detected when updating journal superblock for loop0-8.
[2022-11-25 10:52:23] loop: Write error at byte offset 37908480, length 4096.
[2022-11-25 10:52:23] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:52:23] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
[... the same three lines repeat roughly every 5 seconds until 10:55:01 ...]
[2022-11-25 10:55:04] EXT4-fs error (device loop0): ext4_journal_check_start:83: comm burp: Detected aborted journal
[2022-11-25 10:55:04] loop: Write error at byte offset 0, length 4096.
[2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:55:04] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:55:04] Buffer I/O error on dev loop0, logical block 0, lost sync page write
[2022-11-25 10:55:04] EXT4-fs (loop0): I/O error while writing superblock
[2022-11-25 10:55:04] EXT4-fs (loop0): Remounting filesystem read-only
[2022-11-25 10:55:07] loop: Write error at byte offset 37908480, length 4096.
[2022-11-25 10:55:07] blk_update_request: I/O error, dev loop0, sector 74040 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[2022-11-25 10:55:07] Buffer I/O error on dev loop0, logical block 9255, lost sync page write
[2022-11-25 10:57:14] blk_update_request: I/O error, dev loop0, sector 16390368 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 0
[2022-11-25 11:03:45] device tap136i0 entered promiscuous mode
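As a sanity check on the log above (not new information, just the arithmetic): the byte offset, 512-byte sector, and 4 KiB logical block the kernel reports all point at the same failed write, the ext4 MMP block that kmmpd keeps retrying:

```python
# Verify the three ways the kernel reports the same failed write agree.
byte_offset = 37908480               # "loop: Write error at byte offset 37908480"
sector = byte_offset // 512          # blk_update_request counts 512-byte sectors
logical_block = byte_offset // 4096  # ext4 uses 4 KiB logical blocks here
print(sector, logical_block)         # → 74040 9255, matching the log
assert sector == 74040
assert logical_block == 9255
```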
I don't know if this is relevant or unrelated to GlusterFS, but the
consequence is that the mountpoint crashes: I'm forced to lazily
unmount it and remount it, then restart all the VMs on it.
Unfortunately, this time several of them ended up with corrupted hard
disks, and I'm now restoring them from backup.
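For reference, the recovery procedure amounts to something like this (mountpoint and volume names are placeholders for my setup, adjust to yours):

```
# umount -l /mnt/pve/vmdata         # lazy unmount of the dead FUSE mount (path assumed)
# systemctl restart glusterd
# mount -t glusterfs g01:/vmdata /mnt/pve/vmdata
```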
Any tip?
*Angel Docampo*
<https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
<angel.docampo at eoniantec.com> <+34-93-1592929>
On Tue, Nov 22, 2022 at 12:31, Angel Docampo
(<angel.docampo at eoniantec.com>) wrote:
> I've taken a look at all the places they should be, and I couldn't
> find them anywhere. Some people say the dump file is generated where
> the application is running... well, I don't know where to look then,
> and I hope they weren't generated on the failed mountpoint.
>
> As Debian 11 has systemd, I've installed systemd-coredump, so if a new
> crash happens, at least I will have the exact location and the tool
> (coredumpctl) to find them, and will then install the debug symbols,
> which is particularly tricky on Debian. But I need to wait for it to
> happen again; for now the tool says there isn't any core dump on the
> system.
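Once a dump does show up, the usual workflow with coredumpctl looks like this (the PID filter and debug-symbol package names depend on the distribution, so treat this as a sketch):

```
# coredumpctl list /usr/sbin/glusterfs
# coredumpctl info <PID>     # signal, executable, and a short backtrace
# coredumpctl gdb <PID>      # opens gdb directly on the stored dump
(gdb) thread apply all bt full
```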
>
> Thank you, Xavi. If this happens again (let's hope it won't), I will
> report back.
>
> Best regards!
>
> *Angel Docampo*
>
>
> On Tue, Nov 22, 2022 at 10:45, Xavi Hernandez (<jahernan at
> redhat.com>) wrote:
>
>> The crash seems related to some problem in the ec xlator, but I don't
>> have enough information to determine what it is. The crash should have
>> generated a core dump somewhere in the system (I don't know where
>> Debian keeps the core dumps). If you find it, you should be able to
>> open it using this command (make sure the debug symbols package is
>> also installed before running it):
>>
>> # gdb /usr/sbin/glusterfs <path to core dump>
>>
>> And then run this command:
>>
>> (gdb) bt full
>>
>> Regards,
>>
>> Xavi
>>
>> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo
>> <angel.docampo at eoniantec.com> wrote:
>>
>>> Hi Xavi,
>>>
>>> The OS is Debian 11 with the Proxmox kernel. Gluster packages are
>>> the official ones from gluster.org
>>> (https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/)
>>>
>>> The system logs showed no other issues at the time of the crash, no
>>> OOM kill or anything like that, and no other process was interacting
>>> with the gluster mountpoint besides Proxmox.
>>>
>>> I wasn't running gdb when it crashed, so I don't really know if I can
>>> obtain a more detailed trace from the logs, or if there is a simple
>>> way to leave it running in the background in case it happens again
>>> (or a flag to start the systemd daemon in debug mode).
>>>
>>> Best,
>>>
>>> *Angel Docampo*
>>>
>>>
>>> On Mon, Nov 21, 2022 at 15:16, Xavi Hernandez (<jahernan at
>>> redhat.com>) wrote:
>>>
>>>> Hi Angel,
>>>>
>>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>>>> angel.docampo at eoniantec.com> wrote:
>>>>
>>>>> Sorry for necrobumping this, but this morning I've suffered this
>>>>> on my Proxmox + GlusterFS cluster. In the log I can see this:
>>>>>
>>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
>>>>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>>>>> fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>>> pending frames:
>>>>> frame : type(1) op(WRITE)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> frame : type(0) op(0)
>>>>> ...
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> frame : type(1) op(FSYNC)
>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>> signal received: 11
>>>>> time of crash:
>>>>> 2022-11-21 07:38:00 +0000
>>>>> configuration details:
>>>>> argp 1
>>>>> backtrace 1
>>>>> dlfcn 1
>>>>> libpthread 1
>>>>> llistxattr 1
>>>>> setfsid 1
>>>>> epoll.h 1
>>>>> xattr.h 1
>>>>> st_atim.tv_nsec 1
>>>>> package-string: glusterfs 10.3
>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>>> ---------
>>>>> The mount point wasn't accessible, with the "Transport endpoint is
>>>>> not connected" message, and it was shown like this:
>>>>> d????????? ? ? ? ? ? vmdata
>>>>>
>>>>> I had to stop all the VMs on that Proxmox node, then stop the
>>>>> gluster daemon to unmount the directory, and after starting the
>>>>> daemon and re-mounting, everything was working again.
>>>>>
>>>>> My gluster volume info returns this
>>>>>
>>>>> Volume Name: vmdata
>>>>> Type: Distributed-Disperse
>>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: g01:/data/brick1/brick
>>>>> Brick2: g02:/data/brick2/brick
>>>>> Brick3: g03:/data/brick1/brick
>>>>> Brick4: g01:/data/brick2/brick
>>>>> Brick5: g02:/data/brick1/brick
>>>>> Brick6: g03:/data/brick2/brick
>>>>> Options Reconfigured:
>>>>> nfs.disable: on
>>>>> transport.address-family: inet
>>>>> storage.fips-mode-rchecksum: on
>>>>> features.shard: enable
>>>>> features.shard-block-size: 256MB
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> performance.io-cache: off
>>>>> server.event-threads: 2
>>>>> client.event-threads: 3
>>>>> performance.client-io-threads: on
>>>>> performance.stat-prefetch: off
>>>>> dht.force-readdirp: off
>>>>> performance.force-readdirp: off
>>>>> network.remote-dio: on
>>>>> features.cache-invalidation: on
>>>>> performance.parallel-readdir: on
>>>>> performance.readdir-ahead: on
>>>>>
>>>>> Xavi, do you think the open-behind off setting can help somehow? I
>>>>> tried to understand what it does (with no luck), and whether it
>>>>> could impact the performance of my VMs (I have the setup you know
>>>>> so well ;)). I would like to avoid more crashes like this; gluster
>>>>> 10.3 had been working quite well since two weeks ago, until this
>>>>> morning.
>>>>>
>>>>
>>>> I don't think disabling open-behind will have any visible effect on
>>>> performance. Open-behind is only useful for small files when the
>>>> workload is mostly open + read + close, and quick-read is also
>>>> enabled (which is not your case). The only effect it will have is
>>>> that the latency "saved" during open is "paid" on the next operation
>>>> sent to the file, so the total overall latency should be the same.
>>>> Additionally, VM workloads don't open files frequently, so it
>>>> shouldn't matter much in any case.
>>>>
>>>> That said, I'm not sure if the problem is the same in your case.
>>>> Based on the stack of the crash, it seems to be an issue inside the
>>>> disperse module.
>>>>
>>>> What OS are you using? Are you using official packages? If so,
>>>> which ones?
>>>>
>>>> Is it possible to provide a backtrace from gdb?
>>>>
>>>> Regards,
>>>>
>>>> Xavi
>>>>
>>>>
>>>>> *Angel Docampo*
>>>>>
>>>>>
>>>>> On Fri, Mar 19, 2021 at 2:10, David Cunningham
>>>>> (<dcunningham at voisonics.com>) wrote:
>>>>>
>>>>>> Hi Xavi,
>>>>>>
>>>>>> Thank you for that information. We'll look at upgrading it.
>>>>>>
>>>>>>
>>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez
<jahernan at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> with so little information it's hard to tell, but given that
>>>>>>> there are several OPEN and UNLINK operations, it could be related
>>>>>>> to an already fixed bug (in recent versions) in open-behind.
>>>>>>>
>>>>>>> You can try disabling open-behind with this command:
>>>>>>>
>>>>>>> # gluster volume set <volname> open-behind off
>>>>>>>
>>>>>>> But given that the version you are using is very old and
>>>>>>> unmaintained, I would recommend upgrading to at least 8.x.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Xavi
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham
>>>>>>> <dcunningham at voisonics.com> wrote:
>>>>>>> dcunningham at voisonics.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with
>>>>>>>> the native FUSE client. Recently the FUSE mount crashed and we
>>>>>>>> found the following in the syslog. There isn't anything logged
>>>>>>>> in mnt-glusterfs.log for that time. After killing all processes
>>>>>>>> with a file handle open on the filesystem, we were able to
>>>>>>>> unmount and then remount the filesystem successfully.
>>>>>>>>
>>>>>>>> Would anyone have advice on how to debug this crash? Thank you
>>>>>>>> in advance!
>>>>>>>>
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]:
package-string: glusterfs
>>>>>>>> 5.13
>>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]:
---------
>>>>>>>> ...
>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]:
glusterfssharedstorage.service:
>>>>>>>> Main process exited, code=killed,
status=11/SEGV
>>>>>>>> Mar 9 05:13:50 voip1 systemd[1]:
glusterfssharedstorage.service:
>>>>>>>> Failed with result 'signal'.
>>>>>>>> ...
>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]:
glusterfssharedstorage.service:
>>>>>>>> Service hold-off time over, scheduling restart.
>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]:
glusterfssharedstorage.service:
>>>>>>>> Scheduled restart job, restart counter is at 2.
>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount
glusterfs
>>>>>>>> sharedstorage.
>>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount
glusterfs
>>>>>>>> sharedstorage...
>>>>>>>> Mar 9 05:13:54 voip1
mount-shared-storage.sh[20520]: ERROR: Mount
>>>>>>>> point does not exist
>>>>>>>> Mar 9 05:13:54 voip1
mount-shared-storage.sh[20520]: Please specify
>>>>>>>> a mount point
>>>>>>>> Mar 9 05:13:54 voip1
mount-shared-storage.sh[20520]: Usage:
>>>>>>>> Mar 9 05:13:54 voip1
mount-shared-storage.sh[20520]: man 8
>>>>>>>> /sbin/mount.glusterfs
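[The tail of that log suggests the systemd restart failed simply because the mount point directory was gone after the crash. A defensive pre-mount check could look like the sketch below; the mount point path and volume specification are assumptions, not the poster's actual configuration.]

```shell
#!/bin/sh
# Sketch: recreate the mount point before mounting, so a service restart
# after a crash does not fail with "Mount point does not exist".
MNT="${1:-/mnt/glusterfs}"        # assumed mount point
VOLSPEC="localhost:/gv0"          # assumed volume specification
mkdir -p "$MNT"
# Only attempt the mount when the glusterfs mount helper is installed
# and the directory is not already a mount point.
if command -v mount.glusterfs >/dev/null 2>&1 && ! mountpoint -q "$MNT"; then
    mount -t glusterfs "$VOLSPEC" "$MNT"
fi
```

Hooking something like this in as an ExecStartPre= step (or at the top of mount-shared-storage.sh) would at least stop the restart counter from burning through its retries on a missing directory.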
>>>>>>>>
>>>>>>>> --
>>>>>>>> David Cunningham, Voisonics Limited
>>>>>>>> http://voisonics.com/
>>>>>>>> USA: +1 213 221 1092
>>>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>>>> ________
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Community Meeting Calendar:
>>>>>>>>
>>>>>>>> Schedule -
>>>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>