DUCARROZ Birgit
2020-Jan-03 14:55 UTC
[Gluster-users] Is it possible to experience data loss while rebalancing a volume?
Hi list,

Me again, sorry to bother you again.

Last week, I added 2 new servers to my existing cluster. Everything worked fine until I began to rebalance some volumes that contain a very large number of files. Rebalancing failed on some servers, and I experienced a lot of data loss, which replicated to all servers. I had no time to analyze the log files, but it happened while there was another "Transport endpoint not connected" error. I had to restore the data from the last backup.

This is the situation: from time to time I get a "Transport endpoint not connected" error. I posted this error with a lot of log files in an earlier thread ("Transport Endpoint Not Connected When Writing a Lot of Files"), started on October 11, 2019. We did not find the definitive cause of these errors, but Amar suggested that I update to Gluster version 7, which I have now done on the two additional servers.

Currently, these two servers are attached to the original cluster again. I would like to try the rebalance again and then remove the old servers that cause the transport endpoint errors, but I am hesitating, because people will begin working again on Monday and another data loss would be catastrophic.

My questions:

a) Is it really possible to experience data loss when rebalancing?
b) Does it matter from which server I start the rebalance?
c) If there is another data loss, how would it be possible to restore the files directly from a brick?

My other option would be to create a second cluster using the two new servers plus a virtual server as arbiter, and then migrate the data from backups, but I would prefer to use Gluster as it is and replicate the data.

I would be interested to hear whether other people have experienced data loss while rebalancing.

Thank you for any suggestions.

Kind regards,
Birgit