Peter Schmidt
2022-Mar-24 14:47 UTC
[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"
Strahil Nikolov
2022-Mar-24 18:02 UTC
[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"
In order to troubleshoot such issues, you should start with the brick logs. Do you see any issues there? As a workaround, try restarting glusterd.service on storage2 or, even better, set the node into maintenance (with the tick to stop glusterd) and then reactivate it. Gluster v8 and below are currently not supported, so the chance of someone root-causing this is very, very low. Upgrade oVirt to 4.4.
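For example, a first pass could look like the following (the brick log file name follows the default convention in which "/" in the brick path becomes "-"; adjust if your layout differs):

    # on storage2: check the affected brick's log around the time of the disconnects
    less /var/log/glusterfs/bricks/data-glusterfs-hdd-brick3-brick.log

    # workaround: restart the management daemon (this does not kill running brick processes)
    systemctl restart glusterd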
Best Regards,
Strahil Nikolov

On Thu, Mar 24, 2022 at 16:54, Peter Schmidt <peterschmidt18351 at yandex.com> wrote:

Hello everyone,

I'm running an oVirt cluster on top of a distributed-replicate gluster volume, and one of the bricks cannot be mounted anymore from my oVirt hosts. This morning I also noticed a stack trace and a spike in TCP connections on one of the three gluster nodes (storage2), which I have attached at the end of this mail. Only this particular brick on storage2 seems to be causing trouble:

Brick storage2:/data/glusterfs/hdd/brick3/brick
Status: Transport endpoint is not connected

I don't know what's causing this or how to resolve it. I would appreciate it if someone could take a look at my logs and point me in the right direction. If any additional logs are required, please let me know. Thank you in advance!

Operating system on all hosts: CentOS 7.9.2009
oVirt version: 4.3.10.4-1
Gluster versions:
- storage1: 6.10-1
- storage2: 6.7-1
- storage3: 6.7-1

####################################
# brick is not connected/mounted on the oVirt hosts

[xlator.protocol.client.hdd-client-7.priv]
fd.0.remote_fd = -1
------ = ------
granted-posix-lock[0] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type = F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK, l_start = 100, l_len = 1
granted-posix-lock[1] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type = F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK, l_start = 101, l_len = 1
------ = ------
connected = 0
total_bytes_read = 11383136800
ping_timeout = 10
total_bytes_written = 16699851552
ping_msgs_sent = 1
msgs_sent = 2
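(For anyone who wants to pull the same information: the block above comes from a client statedump. One can typically be generated by sending SIGUSR1 to the glusterfs fuse mount process on the oVirt host; the dump file then appears under /var/run/gluster. The pid below is a placeholder.)

    kill -USR1 <pid-of-glusterfs-fuse-mount>
    ls -lrt /var/run/gluster/    # the newest *.dump.* file is the statedump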
####################################
# mount log from one of the oVirt hosts
# the IP 172.22.102.142 corresponds to my gluster node "storage2"
# the port 49154 corresponds to the brick storage2:/data/glusterfs/hdd/brick3/brick

[2022-03-24 10:59:28.138178] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
[2022-03-24 10:59:38.142698] I [rpc-clnt.c:2028:rpc_clnt_reconfig] 0-hdd-client-7: changing port to 49154 (from 0)
The message "I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available" repeated 4 times between [2022-03-24 10:58:04.114741] and [2022-03-24 10:59:28.137380]
The message "W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]" repeated 4 times between [2022-03-24 10:58:04.115169] and [2022-03-24 10:59:28.138052]
[2022-03-24 10:59:49.143217] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
[2022-03-24 10:59:49.143838] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2022-03-24 10:59:49.144540] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:59:38.145208 (xid=0x861)
[2022-03-24 10:59:49.144557] W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
[2022-03-24 10:59:49.144653] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:59:38.145218 (xid=0x862)
[2022-03-24 10:59:49.144665] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected

####################################
# netcat/telnet to the brick's port of storage2 are working

[root@storage1 ~]# netcat -z -v 172.22.102.142 49154
Connection to 172.22.102.142 49154 port [tcp/*] succeeded!

[root@storage3 ~]# netcat -z -v 172.22.102.142 49154
Connection to 172.22.102.142 49154 port [tcp/*] succeeded!

[root@ovirthost1 /var/log/glusterfs]# netcat -z -v 172.22.102.142 49154
Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
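(Note that a successful netcat connect only shows the kernel accepting TCP connections on that port; it does not prove the brick process is still answering Gluster RPC, which is what the ping timeouts above suggest. One way to see which process owns the listening socket on storage2:)

    [root@storage2 ~]# ss -tlnp | grep 49154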
####################################
# gluster peer status - all gluster peers are connected
[root@storage3 ~]# gluster peer status
Number of Peers: 2

Hostname: storage1
Uuid: 055e79c2-b1ff-4a82-9296-205d6877904e
State: Peer in Cluster (Connected)

Hostname: storage2
Uuid: d7adcb92-2e71-41a9-80d4-13180ee673cf
State: Peer in Cluster (Connected)

####################################
# Configuration of the volume
Volume Name: hdd
Type: Distributed-Replicate
Volume ID: 1b47c2f8-5024-4b85-aa7f-a3f767bb076c
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: storage1:/data/glusterfs/hdd/brick1/brick
Brick2: storage2:/data/glusterfs/hdd/brick1/brick
Brick3: storage3:/data/glusterfs/hdd/brick1/brick
Brick4: storage1:/data/glusterfs/hdd/brick2/brick
Brick5: storage2:/data/glusterfs/hdd/brick2/brick
Brick6: storage3:/data/glusterfs/hdd/brick2/brick
Brick7: storage1:/data/glusterfs/hdd/brick3/brick
Brick8: storage2:/data/glusterfs/hdd/brick3/brick
Brick9: storage3:/data/glusterfs/hdd/brick3/brick
Brick10: storage1:/data/glusterfs/hdd/brick4/brick
Brick11: storage2:/data/glusterfs/hdd/brick4/brick
Brick12: storage3:/data/glusterfs/hdd/brick4/brick
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
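(Aside: network.ping-timeout is set to 10 here, which is why the clients declare the brick dead after the 10 seconds seen in the logs above; the gluster default is 42. The current value can be inspected, and raised if desired, with:)

    gluster volume get hdd network.ping-timeout
    gluster volume set hdd network.ping-timeout 42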
####################################
# gluster volume status. The brick running on port 49154 is supposedly online

Status of volume: hdd
Gluster process                                  TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage1:/data/glusterfs/hdd/brick1/brick  49158     0          Y       9142
Brick storage2:/data/glusterfs/hdd/brick1/brick  49152     0          Y       115896
Brick storage3:/data/glusterfs/hdd/brick1/brick  49158     0          Y       131775
Brick storage1:/data/glusterfs/hdd/brick2/brick  49159     0          Y       9151
Brick storage2:/data/glusterfs/hdd/brick2/brick  49153     0          Y       115904
Brick storage3:/data/glusterfs/hdd/brick2/brick  49159     0          Y       131783
Brick storage1:/data/glusterfs/hdd/brick3/brick  49160     0          Y       9163
Brick storage2:/data/glusterfs/hdd/brick3/brick  49154     0          Y       115913
Brick storage3:/data/glusterfs/hdd/brick3/brick  49160     0          Y       131792
Brick storage1:/data/glusterfs/hdd/brick4/brick  49161     0          Y       9170
Brick storage2:/data/glusterfs/hdd/brick4/brick  49155     0          Y       115923
Brick storage3:/data/glusterfs/hdd/brick4/brick  49161     0          Y       131800
Self-heal Daemon on localhost                    N/A       N/A        Y       170468
Self-heal Daemon on storage3                     N/A       N/A        Y       132263
Self-heal Daemon on storage1                     N/A       N/A        Y       9512

Task Status of Volume hdd
------------------------------------------------------------------------------
There are no active volume tasks
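(Since the table reports the brick online with pid 115913, it may be worth checking on storage2 whether that process is actually healthy or stuck in uninterruptible sleep, e.g.:)

    [root@storage2 ~]# ps -o pid,stat,wchan:32,cmd -p 115913
    [root@storage2 ~]# gluster volume status hdd detail    # a wedged brick often fails to report here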
####################################
# gluster volume heal hdd info split-brain. All bricks are connected and showing no entries (0), except for brick3 on storage2
Brick storage2:/data/glusterfs/hdd/brick3/brick
Status: Transport endpoint is not connected
Number of entries in split-brain: -

####################################
# gluster volume heal hdd info. Only brick3 seems to be affected and it has lots of entries. brick3 on storage2 is not connected

Brick storage1:/data/glusterfs/hdd/brick3/brick
/538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
<gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
/.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
/.shard/c7f5f88f-dc85-4645-9178-c7df8e46a99d.83
/538befbf-ffa7-4a8c-8827-cee679d589f4/images/bc4362e6-cd43-4ab8-b8fa-0ea72405b7da/ea9c0e7c-d2c7-43c8-b19f-7a3076cc6743
/.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10872
/.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.1901
/538befbf-ffa7-4a8c-8827-cee679d589f4/images/e48e80fb-d42f-47a4-9a56-07fd7ad868b3/31fd839f-85bf-4c42-ac0e-7055d903df40
/.shard/82700f9b-c7e0-4568-a565-64c9a770449f.223
/.shard/82700f9b-c7e0-4568-a565-64c9a770449f.243
/.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10696
/.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10902
..
Status: Connected
Number of entries: 664

Brick storage2:/data/glusterfs/hdd/brick3/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick storage3:/data/glusterfs/hdd/brick3/brick
/538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
<gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
/.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
..
Status: Connected
Number of entries: 664
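(A more compact view of the same heal data is available with the summary variant, which prints per-brick counts instead of listing every entry:)

    gluster volume heal hdd info summary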
####################################
# /data/glusterfs/hdd/brick3 on storage2 is running inside of a software RAID

md6 : active raid6 sdac1[6] sdz1[3] sdx1[1] sdad1[7] sdaa1[4] sdy1[2] sdw1[0] sdab1[5] sdae1[8]
      68364119040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
      [============>........]  check = 64.4% (6290736128/9766302720) finish=3220.5min speed=17985K/sec
      bitmap: 10/73 pages [40KB], 65536KB chunk
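(The md6 status shows a running "check", i.e. a scrub, proceeding at roughly 18 MB/s with an estimated 3220 minutes left. If the disks turn out to be saturated, the scrub can be inspected, throttled, or aborted with the standard md knobs; the values below are only examples:)

    mdadm --detail /dev/md6
    echo 20000 > /proc/sys/dev/raid/speed_limit_max   # cap check/resync speed in KB/s
    echo idle > /sys/block/md6/md/sync_action         # abort the running check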
####################################
# glfsheal-hdd.log on storage2

[2022-03-24 10:15:33.238884] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-10: Connected to hdd-client-10, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
[2022-03-24 10:15:33.238931] I [MSGID: 108002] [afr-common.c:5607:afr_notify] 0-hdd-replicate-3: Client-quorum is met
[2022-03-24 10:15:33.241616] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-11: Connected to hdd-client-11, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
[2022-03-24 10:15:44.078651] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
[2022-03-24 10:15:44.078891] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2022-03-24 10:15:44.079954] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:15:33.209640 (xid=0x5)
[2022-03-24 10:15:44.080008] W [MSGID: 114032] [client-handshake.c:1547:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
[2022-03-24 10:15:44.080526] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:15:33.209655 (xid=0x6)
[2022-03-24 10:15:44.080574] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected

####################################
# stack trace on storage2 that happened this morning

Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr000:115974 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr000   D ffff9b91b8951070     0 115974      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr001:121353 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr001   D ffff9b9b7d4dac80     0 121353      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr002:121354 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr002   D ffff9b9b7d75ac80     0 121354      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr003:121355 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr003   D ffff9b9b7d51ac80     0 121355      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr004:121356 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr004   D ffff9b9b7d75ac80     0 121356      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr005:153774 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr005   D ffff9b9b7d61ac80     0 153774      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr006:153775 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr006   D ffff9b9b7d49ac80     0 153775      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr007:153776 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr007   D ffff9b9958c962a0     0 153776      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d7782>] ? check_preempt_curr+0x92/0xa0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr008:153777 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr008   D ffff9b9b7d61ac80     0 153777      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr009:153778 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 06:24:06 storage2 kernel: glfs_iotwr009   D ffff9b9958c920e0     0 153778      1 0x00000080
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
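(All of the hung tasks above are brick I/O worker threads blocked in fdatasync()/XFS log forcing, which usually points at the underlying block device being extremely slow rather than at gluster itself. While the RAID check is running, device latency on storage2 can be watched with, for example:)

    iostat -x md6 1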
Olaf Buitelaar
2022-Mar-25 10:36 UTC
[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"
Hi Peter,

I see your raid array is rebuilding. Could it be that your XFS needs a repair, using xfs_repair? Did you try running "gluster volume start hdd force"?
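To spell that out carefully: xfs_repair must only be run against an unmounted filesystem, and a no-modify dry run first is safer. Assuming /dev/md6 is the device behind brick3 and it is mounted at /data/glusterfs/hdd/brick3 (adjust both to your setup), the sequence would look roughly like:

    umount /data/glusterfs/hdd/brick3
    xfs_repair -n /dev/md6           # no-modify mode: only report what would be fixed
    xfs_repair /dev/md6              # only if the dry run reports problems
    mount /data/glusterfs/hdd/brick3
    gluster volume start hdd force   # respawn any brick process that is not running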
Kind regards,

Olaf

On Thu, 24 Mar 2022 at 15:54, Peter Schmidt <peterschmidt18351 at yandex.com> wrote:

> Hello everyone,
> [...]
> ####################################
> # /data/glusterfs/hdd/brick3 on storage2 is running inside of a software RAID
>
> md6 : active raid6 sdac1[6] sdz1[3] sdx1[1] sdad1[7] sdaa1[4] sdy1[2] sdw1[0] sdab1[5] sdae1[8]
>       68364119040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
>       [============>........]  check = 64.4% (6290736128/9766302720) finish=3220.5min speed=17985K/sec
>       bitmap: 10/73 pages [40KB], 65536KB chunk
> [...]