lemonnierk at ulrar.net
2018-Aug-20 07:37 UTC
[Gluster-users] Disconnected peers after reboot
Hi,

To add to the problematic memory leak, I've been seeing another strange behavior on the 3.12 servers. When I reboot a node, often (but not always) the other nodes mark it as disconnected and won't accept it back until I restart them.

Sometimes I need to restart glusterd on the other nodes, sometimes on the node I rebooted too, but not always. I'm also seeing the same thing after a network outage: bricks stay down because quorum isn't met on some nodes until I restart their glusterd.

3.7 didn't have that problem at all, so it must be a new bug. It's very problematic because we end up with VMs locked, or getting I/O errors, after simple node reboots, which makes upgrades impossible to perform without the clients noticing that everything went down. Sometimes we don't even notice that a VM is getting I/O errors; it takes a while for that to show on some of them.

--
PGP Fingerprint : 0x624E42C734DAC346
On Mon, 20 Aug 2018 at 13:08, <lemonnierk at ulrar.net> wrote:

> Hi,
>
> To add to the problematic memory leak, I've been seeing another strange
> behavior on the 3.12 servers. When I reboot a node, it seems like often
> (but not always) the other nodes mark it as disconnected and won't
> accept it back until I restart them.

What does gluster peer status show? If you can pass along that output together with all the glusterd log files, it'd give us some clue about what's happening here.

> Sometimes I need to restart the glusterd on other nodes, sometimes on
> the node I rebooted too, but not always.
> I'm also seeing that after a network outage of course, I have bricks
> staying down because quorum isn't met on some nodes until I restart
> their glusterd.
>
> 3.7 didn't have that problem at all, so it must be a new bug. It's very
> problematic because we end up with VMs locked, or doing I/O errors after
> simple node reboots, making upgrades impossible to perform without the
> clients noticing everything went down. Sometimes we don't even see a VM
> gets I/O errors, it takes a while for that to show on some of them ..

--
- Atin (atinm)
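For anyone collecting the peer status asked for above on several nodes, here is a minimal sketch that lists the peers a node does not currently see as connected. It is not from the thread: the function names, the sample hostnames and UUIDs, and the assumed "Hostname:" / "State:" layout of the gluster peer status output are illustrative assumptions, so check them against what your version actually prints.

```python
import subprocess


def read_peer_status():
    """Run `gluster peer status` on this node and return its text output.

    Assumes the gluster CLI is installed and the caller may run it.
    """
    result = subprocess.run(
        ["gluster", "peer", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def disconnected_peers(status_text):
    """Return hostnames whose State line does not report "(Connected)".

    Assumes each peer is printed as a "Hostname:" line followed later
    by a "State:" line (assumed layout, see SAMPLE below).
    """
    peers = []
    hostname = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            hostname = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and hostname is not None:
            if "(Connected)" not in line:
                peers.append(hostname)
            hostname = None
    return peers


# Hypothetical sample output; hostnames and UUIDs are made up.
SAMPLE = """Number of Peers: 2

Hostname: node2
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: node3
Uuid: 00000000-0000-0000-0000-000000000003
State: Peer in Cluster (Disconnected)
"""

print(disconnected_peers(SAMPLE))
```

On a real node, disconnected_peers(read_peer_status()) would do the same against live output. Running it on every node after a reboot shows which nodes disagree about the rebooted peer, which is useful to send along with the glusterd logs.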