thr3ads.net - Gluster users - [Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected" [Mar 2022]

Olaf Buitelaar

2022-Mar-25 10:36 UTC

[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"

Hi Peter,

I see your RAID array is rebuilding. Could it be that your XFS filesystem
needs a repair, using xfs_repair?
Did you try running gluster v start hdd force?

Kind regards,

Olaf
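Whether the array is rebuilding or only being checked can be read from the progress line in /proc/mdstat. The following is a minimal Python sketch, assuming the progress-line layout quoted further down in this thread. The helper name parse_md_progress and the regular expression are illustrative assumptions, not part of any md tooling. The remaining-time arithmetic assumes the block counts and the speed are both in KiB units, which is consistent with the finish estimate that mdstat itself reports.

```python
import re

# Progress line copied from the /proc/mdstat excerpt in the original post.
SAMPLE = (
    "[============>........]  check = 64.4% (6290736128/9766302720) "
    "finish=3220.5min speed=17985K/sec"
)

# Assumed layout: "] <operation> = <pct>% (<done>/<total>) finish=<n>min speed=<n>K/sec"
PROGRESS_RE = re.compile(
    r"\]\s+(?P<op>\w+)\s*=\s*(?P<pct>[\d.]+)%\s+"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s+"
    r"finish=(?P<finish>[\d.]+)min\s+speed=(?P<speed>\d+)K/sec"
)


def parse_md_progress(line):
    """Hypothetical helper: return the md operation and a remaining-time estimate."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done, total, speed = int(m["done"]), int(m["total"]), int(m["speed"])
    return {
        "operation": m["op"],  # "check" in the sample; a rebuild shows another word
        "percent": float(m["pct"]),
        "reported_finish_min": float(m["finish"]),
        # Remaining blocks divided by blocks per second, converted to minutes.
        "remaining_min": (total - done) / speed / 60,
    }


if __name__ == "__main__":
    print(parse_md_progress(SAMPLE))
```

For the line from the post, the operation field comes back as "check", so the array was being scrubbed rather than rebuilt when the excerpt was taken.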


On Thu, 24 Mar 2022 at 15:54, Peter Schmidt <peterschmidt18351 at yandex.com> wrote:
> Hello everyone,
>
> I'm running an oVirt cluster on top of a distributed-replicate gluster
> volume and one of the bricks cannot be mounted anymore from my oVirt hosts.
> This morning I also noticed a stack trace and a spike in TCP connections on
> one of the three gluster nodes (storage2), which I have attached at the end
> of this mail. Only this particular brick on storage2 seems to be causing
> trouble:
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
>
> I don't know what's causing this or how to resolve this issue. I would
> appreciate it if someone could take a look at my logs and point me in the
> right direction. If any additional logs are required, please let me know.
> Thank you in advance!
>
> Operating system on all hosts: CentOS 7.9.2009
> oVirt version: 4.3.10.4-1
> Gluster versions:
> - storage1: 6.10-1
> - storage2: 6.7-1
> - storage3: 6.7-1
>
> ####################################
> # brick is not connected/mounted on the oVirt hosts
>
> [xlator.protocol.client.hdd-client-7.priv]
> fd.0.remote_fd = -1
> ------ = ------
> granted-posix-lock[0] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK, l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK, l_start = 101, l_len = 1
> ------ = ------
> connected = 0
> total_bytes_read = 11383136800
> ping_timeout = 10
> total_bytes_written = 16699851552
> ping_msgs_sent = 1
> msgs_sent = 2
>
> ####################################
> # mount log from one of the oVirt hosts
> # the IP 172.22.102.142 corresponds to my gluster node "storage2"
> # the port 49154 corresponds to the brick storage2:/data/glusterfs/hdd/brick3/brick
>
> [2022-03-24 10:59:28.138178] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
> [2022-03-24 10:59:38.142698] I [rpc-clnt.c:2028:rpc_clnt_reconfig] 0-hdd-client-7: changing port to 49154 (from 0)
> The message "I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available" repeated 4 times between [2022-03-24 10:58:04.114741] and [2022-03-24 10:59:28.137380]
> The message "W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]" repeated 4 times between [2022-03-24 10:58:04.115169] and [2022-03-24 10:59:28.138052]
> [2022-03-24 10:59:49.143217] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
> [2022-03-24 10:59:49.143838] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
> [2022-03-24 10:59:49.144540] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:59:38.145208 (xid=0x861)
> [2022-03-24 10:59:49.144557] W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
> [2022-03-24 10:59:49.144653] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:59:38.145218 (xid=0x862)
> [2022-03-24 10:59:49.144665] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
>
> ####################################
> # netcat/telnet to the brick's port of storage2 are working
>
> [root@storage1 ~]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> [root@storage3 ~]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> [root@ovirthost1 /var/log/glusterfs]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> ####################################
> # gluster peer status - all gluster peers are connected
> [root@storage3 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: storage1
> Uuid: 055e79c2-b1ff-4a82-9296-205d6877904e
> State: Peer in Cluster (Connected)
>
> Hostname: storage2
> Uuid: d7adcb92-2e71-41a9-80d4-13180ee673cf
> State: Peer in Cluster (Connected)
>
> ####################################
> # Configuration of the volume
> Volume Name: hdd
> Type: Distributed-Replicate
> Volume ID: 1b47c2f8-5024-4b85-aa7f-a3f767bb076c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 4 x 3 = 12
> Transport-type: tcp
> Bricks:
> Brick1: storage1:/data/glusterfs/hdd/brick1/brick
> Brick2: storage2:/data/glusterfs/hdd/brick1/brick
> Brick3: storage3:/data/glusterfs/hdd/brick1/brick
> Brick4: storage1:/data/glusterfs/hdd/brick2/brick
> Brick5: storage2:/data/glusterfs/hdd/brick2/brick
> Brick6: storage3:/data/glusterfs/hdd/brick2/brick
> Brick7: storage1:/data/glusterfs/hdd/brick3/brick
> Brick8: storage2:/data/glusterfs/hdd/brick3/brick
> Brick9: storage3:/data/glusterfs/hdd/brick3/brick
> Brick10: storage1:/data/glusterfs/hdd/brick4/brick
> Brick11: storage2:/data/glusterfs/hdd/brick4/brick
> Brick12: storage3:/data/glusterfs/hdd/brick4/brick
> Options Reconfigured:
> storage.owner-gid: 36
> storage.owner-uid: 36
> server.event-threads: 4
> client.event-threads: 4
> cluster.choose-local: off
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> network.ping-timeout: 10
> cluster.quorum-type: auto
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
>
> ####################################
> # gluster volume status. The brick running on port 49154 is supposedly
> online
>
> Status of volume: hdd
> Gluster process                                   TCP Port  RDMA Port  Online  Pid
> ----------------------------------------------------------------------------------
> Brick storage1:/data/glusterfs/hdd/brick1/brick   49158     0          Y       9142
> Brick storage2:/data/glusterfs/hdd/brick1/brick   49152     0          Y       115896
> Brick storage3:/data/glusterfs/hdd/brick1/brick   49158     0          Y       131775
> Brick storage1:/data/glusterfs/hdd/brick2/brick   49159     0          Y       9151
> Brick storage2:/data/glusterfs/hdd/brick2/brick   49153     0          Y       115904
> Brick storage3:/data/glusterfs/hdd/brick2/brick   49159     0          Y       131783
> Brick storage1:/data/glusterfs/hdd/brick3/brick   49160     0          Y       9163
> Brick storage2:/data/glusterfs/hdd/brick3/brick   49154     0          Y       115913
> Brick storage3:/data/glusterfs/hdd/brick3/brick   49160     0          Y       131792
> Brick storage1:/data/glusterfs/hdd/brick4/brick   49161     0          Y       9170
> Brick storage2:/data/glusterfs/hdd/brick4/brick   49155     0          Y       115923
> Brick storage3:/data/glusterfs/hdd/brick4/brick   49161     0          Y       131800
> Self-heal Daemon on localhost                     N/A       N/A        Y       170468
> Self-heal Daemon on storage3                      N/A       N/A        Y       132263
> Self-heal Daemon on storage1                      N/A       N/A        Y       9512
>
> Task Status of Volume hdd
> ----------------------------------------------------------------------------------
> There are no active volume tasks
>
> ####################################
> # gluster volume heal hdd info split-brain. All bricks are connected and
> showing no entries (0), except for brick3 on storage2
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
> Number of entries in split-brain: -
>
> ####################################
> # gluster volume heal hdd info. Only brick3 seems to be affected and it
> has lots of entries. brick3 on storage2 is not connected
>
> Brick storage1:/data/glusterfs/hdd/brick3/brick
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
> <gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
> /.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
> /.shard/c7f5f88f-dc85-4645-9178-c7df8e46a99d.83
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/bc4362e6-cd43-4ab8-b8fa-0ea72405b7da/ea9c0e7c-d2c7-43c8-b19f-7a3076cc6743
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10872
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.1901
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/e48e80fb-d42f-47a4-9a56-07fd7ad868b3/31fd839f-85bf-4c42-ac0e-7055d903df40
> /.shard/82700f9b-c7e0-4568-a565-64c9a770449f.223
> /.shard/82700f9b-c7e0-4568-a565-64c9a770449f.243
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10696
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10902
> ..
> Status: Connected
> Number of entries: 664
>
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick storage3:/data/glusterfs/hdd/brick3/brick
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
> <gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
> /.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
> ..
> Status: Connected
> Number of entries: 664
>
> ####################################
> # /data/glusterfs/hdd/brick3 on storage2 is running inside of a software
> RAID
>
> md6 : active raid6 sdac1[6] sdz1[3] sdx1[1] sdad1[7] sdaa1[4] sdy1[2] sdw1[0] sdab1[5] sdae1[8]
>       68364119040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
>       [============>........]  check = 64.4% (6290736128/9766302720) finish=3220.5min speed=17985K/sec
>       bitmap: 10/73 pages [40KB], 65536KB chunk
>
> ####################################
> # glfsheal-hdd.log on storage2
>
> [2022-03-24 10:15:33.238884] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-10: Connected to hdd-client-10, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
> [2022-03-24 10:15:33.238931] I [MSGID: 108002] [afr-common.c:5607:afr_notify] 0-hdd-replicate-3: Client-quorum is met
> [2022-03-24 10:15:33.241616] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-11: Connected to hdd-client-11, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
> [2022-03-24 10:15:44.078651] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
> [2022-03-24 10:15:44.078891] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
> [2022-03-24 10:15:44.079954] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:15:33.209640 (xid=0x5)
> [2022-03-24 10:15:44.080008] W [MSGID: 114032] [client-handshake.c:1547:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
> [2022-03-24 10:15:44.080526] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:15:33.209655 (xid=0x6)
> [2022-03-24 10:15:44.080574] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
>
> ####################################
> # stack trace on storage2 that happened this morning
>
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr000:115974 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr000   D ffff9b91b8951070     0 115974      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr001:121353 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr001   D ffff9b9b7d4dac80     0 121353      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr002:121354 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr002   D ffff9b9b7d75ac80     0 121354      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr003:121355 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr003   D ffff9b9b7d51ac80     0 121355      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr004:121356 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr004   D ffff9b9b7d75ac80     0 121356      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr005:153774 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr005   D ffff9b9b7d61ac80     0 153774      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr006:153775 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr006   D ffff9b9b7d49ac80     0 153775      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr007:153776 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr007   D ffff9b9958c962a0     0 153776      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d7782>] ? check_preempt_curr+0x92/0xa0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr008:153777 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr008   D ffff9b9b7d61ac80     0 153777      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr009:153778 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr009   D ffff9b9958c920e0     0 153778      1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

Strahil Nikolov

2022-Mar-25 17:43 UTC


A RAID check is read-intensive. Still, if there are XFS problems, the brick logs should show them.

Best Regards,
Strahil Nikolov
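Besides the brick logs, the kernel messages quoted in the original post can be checked for XFS involvement. Below is a minimal Python sketch that groups hung-task reports by task and records which kernel modules appear in each call trace. It assumes the log lines have been rejoined so that each message sits on one line. The function name summarize_hung_tasks and both regular expressions are hypothetical, written against the format seen in this thread only.

```python
import re

# A shortened excerpt of the kernel messages from the original post.
SAMPLE = """\
Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr000:115974 blocked for more than 120 seconds.
Mar 24 06:24:06 storage2 kernel: Call Trace:
Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
"""

# Assumed formats: the hung-task header and one stack frame per line.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Hypothetical helper: one dict per hung task with its frames and modules."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = {
                "task": hung["name"],
                "pid": int(hung["pid"]),
                "frames": [],
                "modules": set(),
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame["func"])
            if frame["module"]:  # None when the frame is in the core kernel
                current["modules"].add(frame["module"])
    return tasks


if __name__ == "__main__":
    for task in summarize_hung_tasks(SAMPLE):
        print(task["task"], task["pid"], sorted(task["modules"]))
```

In the excerpt, the blocked glfs_iotwr thread is waiting inside xfs_file_fsync, which suggests the brick process was stalled on XFS log writes at that time. This is an observation about the quoted trace, not a diagnosis of the root cause.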
 
 
On Fri, Mar 25, 2022 at 12:36, Olaf Buitelaar <olaf.buitelaar at gmail.com> wrote:




Peter Schmidt

2022-Mar-28 08:00 UTC


An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20220328/081d4f75/attachment.html>
