peljasz
2017-Aug-01 17:31 UTC
[Gluster-users] connection to 10.5.6.32:49155 failed (Connection refused); disconnecting socket
how critical is the above? I get plenty of these on all three peers.

hi guys

I've recently upgraded from 3.8 to 3.10 and I'm seeing weird behavior.

I see: $ gluster vol status $_vol detail; takes a long time and mostly times out.

I do:
$ gluster vol heal $_vol info
and I see:

Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
Status: Connected
Number of entries: 0

Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
Status: Transport endpoint is not connected
Number of entries: -

I begin to worry that 3.10 on CentOS 7.3 might not have been a good idea.

many thanks.
L.
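A quick first check, independent of gluster itself (a sketch; the host and port are taken from the error in the subject line, and glusterfsd is the usual brick process name):

# from any peer or client: is anything accepting connections on the failing port?
$ nc -zv 10.5.6.32 49155

# on the brick host itself: are the brick processes running at all?
$ ps -ef | grep glusterfsd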
Atin Mukherjee
2017-Aug-02 01:19 UTC
[Gluster-users] connection to 10.5.6.32:49155 failed (Connection refused); disconnecting socket
This means the shd (self-heal daemon) client is not able to establish a connection with the brick on port 49155. This can happen if glusterd has handed back a stale port which is not the one the brick is actually listening on. If you killed any brick process with SIGKILL instead of SIGTERM, this is expected: glusterd never receives the portmap signout in that case, so the old portmap entry is never wiped off.

Please restart the glusterd service. This should fix the problem.

--
- Atin (atinm)
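To confirm the stale-port theory before restarting (a sketch; CYTO-DATA is assumed to be the volume name, after its brick directory - adjust to the real name):

# the port glusterd advertises for each brick:
$ gluster volume status CYTO-DATA

# on 10.5.6.32: the port(s) the brick process is really bound to:
$ ss -tlnp | grep glusterfsd

If the two disagree, restarting the management daemon is safe - it does not touch the brick data or the brick processes:

$ systemctl restart glusterd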
lejeczek
2017-Aug-02 05:57 UTC
[Gluster-users] connection to 10.5.6.32:49155 failed (Connection refused); disconnecting socket
But I had not killed anything - unless the system did it silently for some reason, but I wouldn't think so. It seems that one brick is particularly ill about it all. I'd have to restart it, but mostly this would not do and I'd actually have to reboot the system; then for a short while it would be OK, only to show up again soon after as:

Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-GROUP-WORK
    N/A       N/A        N       N/A
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-GROUP-WORK
    49153     0          Y       2391260
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-GROUP-WORK
    49153     0          Y       9717

(columns: TCP Port / RDMA Port / Online / Pid)

and the logs show:

[2017-08-02 05:51:48.306839] E [socket.c:2316:socket_connect_finish] 0-GROUP-WORK-client-6: connection to 10.5.6.32:49153 failed (Connection refused); disconnecting socket

But systemd on that brick says the processes/daemons are fine. And all three bricks are, in general config, virtually identical. Not sure what to think.

thanks.
L
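One way to see whether glusterd's record and reality have drifted apart again (a sketch; the file layout under /var/lib/glusterd is an assumption based on the usual scheme, and GROUP-WORK stands in for the real volume name):

# on 10.5.6.32: the port glusterd has on record for this brick
$ grep listen-port /var/lib/glusterd/vols/GROUP-WORK/bricks/10.5.6.32*

# is anything actually listening on that port?
$ ss -tlnp | grep 49153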
lejeczek
2017-Aug-02 06:10 UTC
[Gluster-users] connection to 10.5.6.32:49155 failed (Connection refused); disconnecting socket
also, now after the upgrade, gluster shows on some vols a long list in heal info, with entries like these amongst them:

Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-USER-HOME
<gfid:ea647c38-004d-4f2c-a533-ba75682869d2>
Status: Connected

what are these entries?
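A <gfid:...> line is a file that self-heal is tracking by its internal gluster file id only, before the name has been resolved to a path. On the brick, every regular file is hard-linked under .glusterfs by the first two byte pairs of its gfid, so the real path can be recovered (a sketch, with the brick root taken from the output above):

$ BRICK=/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-USER-HOME
$ ls -l $BRICK/.glusterfs/ea/64/ea647c38-004d-4f2c-a533-ba75682869d2

# for a regular file the above is a hard link; locate its real name:
$ find $BRICK -samefile $BRICK/.glusterfs/ea/64/ea647c38-004d-4f2c-a533-ba75682869d2 -not -path '*/.glusterfs/*'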
lejeczek
2017-Aug-02 06:18 UTC
[Gluster-users] connection to 10.5.6.32:49155 failed (Connection refused); disconnecting socket
what I've just noticed - the brick in question shows up as:

Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-GROUP-WORK
    N/A       N/A        N       N/A

for one particular vol. Status for the other vols (so far) shows it OK.

Would this be a volume problem or a brick problem, or both? And most importantly, how do I troubleshoot it?

many thanks,
L.
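Since only one volume's brick process is down, two things worth trying (a sketch; the log file name is an assumption based on gluster's usual scheme of deriving it from the brick path, and GROUP-WORK stands in for the real volume name):

# on 10.5.6.32: the per-brick log usually records why the process died
$ less /var/log/glusterfs/bricks/__.aLocalStorages-0-0-GLUSTERs-0GLUSTER-GROUP-WORK.log

# restart only the bricks of that volume that are down, without a reboot:
$ gluster volume start GROUP-WORK force
$ gluster volume status GROUP-WORK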