Peter Schmidt
2022-Mar-28 08:00 UTC
[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"
Olaf Buitelaar
2022-Mar-28 10:44 UTC
[Gluster-users] Can't mount particular brick even though the brick port is reachable, error message "Transport endpoint is not connected"
Hi Peter,

I think Strahil means running the command

hosted-engine --set-maintenance --mode=local

This is also possible from the oVirt UI, via the ribbon on the Hosts
section:

[image: image.png]
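In full, the cycle would look something like this (a sketch; verify the
flags against your oVirt version):

# stop oVirt HA actions on this host before touching its storage
hosted-engine --set-maintenance --mode=local

# confirm the state of the hosted engine and its hosts
hosted-engine --vm-status

# once the repair is done, take the host out of maintenance again
hosted-engine --set-maintenance --mode=none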
From the logs it seems gluster has difficulty finding the shards, e.g.:

.shard/e5f699e2-de11-41be-bd24-e29876928f0f.1279
(be318638-e8a0-4c6d-977d-7a937aa84806/e5f699e2-de11-41be-bd24-e29876928f0f.1279)

Do these files exist within your bricks' directories? Did you try to
repair the filesystem using xfs_repair?
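To check, something along these lines on storage2 should do (a sketch; the
brick path and the md6 device are taken from your mails, so adjust them to
what the node actually uses, and note that xfs_repair must only run on an
unmounted filesystem):

# does the shard exist on this brick?
ls -l /data/glusterfs/hdd/brick3/brick/.shard/e5f699e2-de11-41be-bd24-e29876928f0f.1279

# dry run first: report problems without changing anything
# (stop the brick process for brick3 first, otherwise umount reports busy)
umount /data/glusterfs/hdd/brick3
xfs_repair -n /dev/md6

# only if the dry run reports corruption, run the actual repair
xfs_repair /dev/md6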
Best regards,

Olaf

On Mon, Mar 28, 2022 at 10:01, Peter Schmidt <peterschmidt18351 at yandex.com> wrote:

> Hi Olaf,
>
> I tried running "gluster volume start hdd force" but sadly it did not
> change anything.
>
> The raid rebuild has finished now and everything seems to be fine:
>
> md6 : active raid6 sdu1[2] sdx1[5] sds1[0] sdt1[1] sdz1[7] sdv1[3] sdw1[4] sdaa1[8] sdy1[6]
>       68364119040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
>       bitmap: 0/73 pages [0KB], 65536KB chunk
>
> Best regards
> Peter
>
> On 25.03.2022, 12:36, "Olaf Buitelaar" <olaf.buitelaar at gmail.com> wrote:
>
> Hi Peter,
>
> I see your raid array is rebuilding; could it be your xfs needs a repair
> using xfs_repair?
> Did you try running "gluster v hdd start force"?
>
> Kind regards,
>
> Olaf
>
> On Thu, Mar 24, 2022 at 15:54, Peter Schmidt <peterschmidt18351 at yandex.com> wrote:
>
> Hello everyone,
>
> I'm running an oVirt cluster on top of a distributed-replicate gluster
> volume, and one of the bricks cannot be mounted anymore from my oVirt
> hosts. This morning I also noticed a stack trace and a spike in TCP
> connections on one of the three gluster nodes (storage2), which I have
> attached at the end of this mail. Only this particular brick on storage2
> seems to be causing trouble:
>
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
>
> I don't know what's causing this or how to resolve it. I would appreciate
> it if someone could take a look at my logs and point me in the right
> direction. If any additional logs are required, please let me know.
> Thank you in advance!
>
> Operating system on all hosts: CentOS 7.9.2009
> oVirt version: 4.3.10.4-1
> Gluster versions:
> - storage1: 6.10-1
> - storage2: 6.7-1
> - storage3: 6.7-1
>
> ####################################
> # brick is not connected/mounted on the oVirt hosts
>
> [xlator.protocol.client.hdd-client-7.priv]
> fd.0.remote_fd = -1
> ------ = ------
> granted-posix-lock[0] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type F_RDLCK, fl_start = 100, fl_end = 100, user_flock: l_type = F_RDLCK, l_start = 100, l_len = 1
> granted-posix-lock[1] = owner = 9d673ffe323e25cd, cmd = F_SETLK fl_type F_RDLCK, fl_start = 101, fl_end = 101, user_flock: l_type = F_RDLCK, l_start = 101, l_len = 1
> ------ = ------
> connected = 0
> total_bytes_read = 11383136800
> ping_timeout = 10
> total_bytes_written = 16699851552
> ping_msgs_sent = 1
> msgs_sent = 2
>
> ####################################
> # mount log from one of the oVirt hosts
> # the IP 172.22.102.142 corresponds to my gluster node "storage2"
> # the port 49154 corresponds to the brick storage2:/data/glusterfs/hdd/brick3/brick
>
> [2022-03-24 10:59:28.138178] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
> [2022-03-24 10:59:38.142698] I [rpc-clnt.c:2028:rpc_clnt_reconfig] 0-hdd-client-7: changing port to 49154 (from 0)
> The message "I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available" repeated 4 times between [2022-03-24 10:58:04.114741] and [2022-03-24 10:59:28.137380]
> The message "W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]" repeated 4 times between [2022-03-24 10:58:04.115169] and [2022-03-24 10:59:28.138052]
> [2022-03-24 10:59:49.143217] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
> [2022-03-24 10:59:49.143838] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
> [2022-03-24 10:59:49.144540] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:59:38.145208 (xid=0x861)
> [2022-03-24 10:59:49.144557] W [MSGID: 114032] [client-handshake.c:1546:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
> [2022-03-24 10:59:49.144653] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f6724643adb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7f67243ea7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7f67243ea8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7f67243eb987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7f67243ec518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:59:38.145218 (xid=0x862)
> [2022-03-24 10:59:49.144665] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
>
> ####################################
> # netcat/telnet to the brick's port of storage2 are working
>
> [root at storage1 ~]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> [root at storage3 ~]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> [root at ovirthost1 /var/log/glusterfs]# netcat -z -v 172.22.102.142 49154
> Connection to 172.22.102.142 49154 port [tcp/*] succeeded!
>
> ####################################
> # gluster peer status - all gluster peers are connected
>
> [root at storage3 ~]# gluster peer status
> Number of Peers: 2
>
> Hostname: storage1
> Uuid: 055e79c2-b1ff-4a82-9296-205d6877904e
> State: Peer in Cluster (Connected)
>
> Hostname: storage2
> Uuid: d7adcb92-2e71-41a9-80d4-13180ee673cf
> State: Peer in Cluster (Connected)
>
> ####################################
> # Configuration of the volume
>
> Volume Name: hdd
> Type: Distributed-Replicate
> Volume ID: 1b47c2f8-5024-4b85-aa7f-a3f767bb076c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 4 x 3 = 12
> Transport-type: tcp
> Bricks:
> Brick1: storage1:/data/glusterfs/hdd/brick1/brick
> Brick2: storage2:/data/glusterfs/hdd/brick1/brick
> Brick3: storage3:/data/glusterfs/hdd/brick1/brick
> Brick4: storage1:/data/glusterfs/hdd/brick2/brick
> Brick5: storage2:/data/glusterfs/hdd/brick2/brick
> Brick6: storage3:/data/glusterfs/hdd/brick2/brick
> Brick7: storage1:/data/glusterfs/hdd/brick3/brick
> Brick8: storage2:/data/glusterfs/hdd/brick3/brick
> Brick9: storage3:/data/glusterfs/hdd/brick3/brick
> Brick10: storage1:/data/glusterfs/hdd/brick4/brick
> Brick11: storage2:/data/glusterfs/hdd/brick4/brick
> Brick12: storage3:/data/glusterfs/hdd/brick4/brick
> Options Reconfigured:
> storage.owner-gid: 36
> storage.owner-uid: 36
> server.event-threads: 4
> client.event-threads: 4
> cluster.choose-local: off
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> network.ping-timeout: 10
> cluster.quorum-type: auto
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
>
> ####################################
> # gluster volume status. The brick running on port 49154 is supposedly online
>
> Status of volume: hdd
> Gluster process                                  TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick storage1:/data/glusterfs/hdd/brick1/brick  49158     0          Y       9142
> Brick storage2:/data/glusterfs/hdd/brick1/brick  49152     0          Y       115896
> Brick storage3:/data/glusterfs/hdd/brick1/brick  49158     0          Y       131775
> Brick storage1:/data/glusterfs/hdd/brick2/brick  49159     0          Y       9151
> Brick storage2:/data/glusterfs/hdd/brick2/brick  49153     0          Y       115904
> Brick storage3:/data/glusterfs/hdd/brick2/brick  49159     0          Y       131783
> Brick storage1:/data/glusterfs/hdd/brick3/brick  49160     0          Y       9163
> Brick storage2:/data/glusterfs/hdd/brick3/brick  49154     0          Y       115913
> Brick storage3:/data/glusterfs/hdd/brick3/brick  49160     0          Y       131792
> Brick storage1:/data/glusterfs/hdd/brick4/brick  49161     0          Y       9170
> Brick storage2:/data/glusterfs/hdd/brick4/brick  49155     0          Y       115923
> Brick storage3:/data/glusterfs/hdd/brick4/brick  49161     0          Y       131800
> Self-heal Daemon on localhost                    N/A       N/A        Y       170468
> Self-heal Daemon on storage3                     N/A       N/A        Y       132263
> Self-heal Daemon on storage1                     N/A       N/A        Y       9512
>
> Task Status of Volume hdd
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> ####################################
> # gluster volume heal hdd info split-brain. All bricks are connected and
> # showing no entries (0), except for brick3 on storage2
>
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
> Number of entries in split-brain: -
> ####################################
> # gluster volume heal hdd info. Only brick3 seems to be affected and it
> # has lots of entries. brick3 on storage2 is not connected
>
> Brick storage1:/data/glusterfs/hdd/brick3/brick
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
> <gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
> /.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
> /.shard/c7f5f88f-dc85-4645-9178-c7df8e46a99d.83
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/bc4362e6-cd43-4ab8-b8fa-0ea72405b7da/ea9c0e7c-d2c7-43c8-b19f-7a3076cc6743
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10872
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.1901
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/e48e80fb-d42f-47a4-9a56-07fd7ad868b3/31fd839f-85bf-4c42-ac0e-7055d903df40
> /.shard/82700f9b-c7e0-4568-a565-64c9a770449f.223
> /.shard/82700f9b-c7e0-4568-a565-64c9a770449f.243
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10696
> /.shard/dc46e963-2b68-4802-9537-42f25ea97ae2.10902
> ..
> Status: Connected
> Number of entries: 664
>
> Brick storage2:/data/glusterfs/hdd/brick3/brick
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick storage3:/data/glusterfs/hdd/brick3/brick
> /538befbf-ffa7-4a8c-8827-cee679d589f4/images/615fa020-9737-4b83-a3c1-a61e32400d59/f4917758-deae-4a62-bf4d-5b9a95a7db5b
> <gfid:f3d0b19a-2544-48c5-90b7-addd561113bc>
> /.shard/753a8a81-bd06-4c8c-9515-d54123f6fe4d.1
> ..
> Status: Connected
> Number of entries: 664
>
> ####################################
> # /data/glusterfs/hdd/brick3 on storage2 is running inside of a software RAID
>
> md6 : active raid6 sdac1[6] sdz1[3] sdx1[1] sdad1[7] sdaa1[4] sdy1[2] sdw1[0] sdab1[5] sdae1[8]
>       68364119040 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/9] [UUUUUUUUU]
>       [============>........]  check = 64.4% (6290736128/9766302720) finish=3220.5min speed=17985K/sec
>       bitmap: 10/73 pages [40KB], 65536KB chunk
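A side note on the mdstat above: the array was in the middle of a "check"
at roughly 18 MB/s when the brick stopped responding, and a resync/check
competes with the brick for the same disks. A quick sketch for watching it
and, if needed, capping its impact (the md6 device comes from your output;
the speed value is only an example, not a recommendation):

# show array state, sync progress and any failed members
mdadm --detail /dev/md6

# follow the resync/check live
watch -n 5 cat /proc/mdstat

# optionally cap the resync rate (in KB/s) so brick I/O keeps breathing
echo 50000 > /proc/sys/dev/raid/speed_limit_max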
> ####################################
> # glfsheal-hdd.log on storage2
>
> [2022-03-24 10:15:33.238884] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-10: Connected to hdd-client-10, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
> [2022-03-24 10:15:33.238931] I [MSGID: 108002] [afr-common.c:5607:afr_notify] 0-hdd-replicate-3: Client-quorum is met
> [2022-03-24 10:15:33.241616] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-hdd-client-11: Connected to hdd-client-11, attached to remote volume '/data/glusterfs/hdd/brick4/brick'.
> [2022-03-24 10:15:44.078651] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-hdd-client-7: server 172.22.102.142:49154 has not responded in the last 10 seconds, disconnecting.
> [2022-03-24 10:15:44.078891] I [MSGID: 114018] [client.c:2331:client_rpc_notify] 0-hdd-client-7: disconnected from hdd-client-7. Client process will keep trying to connect to glusterd until brick's port is available
> [2022-03-24 10:15:44.079954] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2022-03-24 10:15:33.209640 (xid=0x5)
> [2022-03-24 10:15:44.080008] W [MSGID: 114032] [client-handshake.c:1547:client_dump_version_cbk] 0-hdd-client-7: received RPC status error [Transport endpoint is not connected]
> [2022-03-24 10:15:44.080526] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fc6c0cadadb] (--> /lib64/libgfrpc.so.0(+0xd7e4)[0x7fc6c019f7e4] (--> /lib64/libgfrpc.so.0(+0xd8fe)[0x7fc6c019f8fe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x97)[0x7fc6c01a0987] (--> /lib64/libgfrpc.so.0(+0xf518)[0x7fc6c01a1518] ))))) 0-hdd-client-7: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2022-03-24 10:15:33.209655 (xid=0x6)
> [2022-03-24 10:15:44.080574] W [rpc-clnt-ping.c:210:rpc_clnt_ping_cbk] 0-hdd-client-7: socket disconnected
>
> ####################################
> # stack trace on storage2 that happened this morning
>
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr000:115974 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr000 D ffff9b91b8951070 0 115974 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr001:121353 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr001 D ffff9b9b7d4dac80 0 121353 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr002:121354 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr002 D ffff9b9b7d75ac80 0 121354 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr003:121355 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr003 D ffff9b9b7d51ac80 0 121355 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr004:121356 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr004 D ffff9b9b7d75ac80 0 121356 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa15bcb1f>] ? filemap_fdatawait_range+0x1f/0x30
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr005:153774 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr005 D ffff9b9b7d61ac80 0 153774 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr006:153775 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr006 D ffff9b9b7d49ac80 0 153775 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d78df>] ? ttwu_do_activate+0x6f/0x80
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr007:153776 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr007 D ffff9b9958c962a0 0 153776 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7e531>] schedule_timeout+0x221/0x2d0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d7782>] ? check_preempt_curr+0x92/0xa0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14d77a9>] ? ttwu_do_wakeup+0x19/0xe0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db210>] ? try_to_wake_up+0x190/0x390
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80ddd>] wait_for_completion+0xfd/0x140
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14be9aa>] flush_work+0x10a/0x1b0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14bb6c0>] ? move_linked_works+0x90/0x90
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05070ba>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167335b>] ? getxattr+0x11b/0x180
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc0505484>] _xfs_log_force_lsn+0x74/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b7fd22>] ? down_read+0x12/0x40
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr008:153777 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr008 D ffff9b9b7d61ac80 0 153777 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a
> Mar 24 06:24:06 storage2 kernel: INFO: task glfs_iotwr009:153778 blocked for more than 120 seconds.
> Mar 24 06:24:06 storage2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 24 06:24:06 storage2 kernel: glfs_iotwr009 D ffff9b9958c920e0 0 153778 1 0x00000080
> Mar 24 06:24:06 storage2 kernel: Call Trace:
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b80a29>] schedule+0x29/0x70
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc05056e1>] _xfs_log_force_lsn+0x2d1/0x310 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa14db4d0>] ? wake_up_state+0x20/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffc04e5a3d>] xfs_file_fsync+0xfd/0x1c0 [xfs]
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167fbf7>] do_fsync+0x67/0xb0
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa167ff03>] SyS_fdatasync+0x13/0x20
> Mar 24 06:24:06 storage2 kernel: [<ffffffffa1b8dede>] system_call_fastpath+0x25/0x2a