Hi gluster gurus,

I have 4 servers (g1, g2, g3 and g4) with 24T each, running gluster 3.1.5 on openSUSE 11.3. They have been running well for the last few months in a distributed+replicated setup.

I just found that the nfs log had filled up the root disk of g4 (my bad). So I removed the log file, and a couple of other large ones, and restored a load of disk space. However, gluster 3.1.5 will not restart on this machine! It errors out with http://pastebin.com/646W8zjg

I've searched this forum and the documentation, but I can't see anything that mentions this situation. Please can anyone help? I'm quite concerned about my system: this is a live server with live data, and I need to get g4 up and running and back in sync ASAP.

Many thanks in advance,

-paul

PS: the following command just hangs on g4:

    g4:~ # gluster peer status

...however, on g3 it works:

    g3:/etc/glusterd/logs # gluster peer status
    Number of Peers: 3

    Hostname: 10.0.0.12
    Uuid: 8061196e-a075-42f6-89f5-1f60281485f5
    State: Peer in Cluster (Connected)

    Hostname: g2
    Uuid: 154d5c46-f62f-4e9c-a328-443e30cadf4e
    State: Peer in Cluster (Connected)

    Hostname: g4
    Uuid: 62365589-61f8-479f-bb50-11519beba045
    State: Peer in Cluster (Disconnected)

...I've also tried rebooting the machine, and nothing changes.
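The `gluster peer status` output from g3 lists each peer as a Hostname/Uuid/State block. As an illustration only, here is a minimal Python sketch that parses that text and reports every peer whose State does not end in "(Connected)". The function names are hypothetical, and the parser assumes exactly the layout shown in this message; other gluster versions may format the output differently.

```python
# Sketch: parse `gluster peer status` text and report peers that are not
# connected. ASSUMPTION: the "Hostname:/Uuid:/State:" block layout matches
# the 3.1.5 output quoted in this thread; other versions may differ.


def parse_peer_status(text):
    """Return a list of dicts, one per peer block, keyed by field name."""
    peers = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank separators and the "Number of Peers: N" header.
        if not line or line.startswith("Number of Peers"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        # A new "Hostname" line starts the next peer block.
        if key == "Hostname" and current:
            peers.append(current)
            current = {}
        current[key] = value
    if current:
        peers.append(current)
    return peers


def disconnected_peers(text):
    """Hostnames whose State line does not end in '(Connected)'."""
    return [p.get("Hostname") for p in parse_peer_status(text)
            if not p.get("State", "").endswith("(Connected)")]


# The g3 output from the message above, used as sample input.
SAMPLE = """Number of Peers: 3

Hostname: 10.0.0.12
Uuid: 8061196e-a075-42f6-89f5-1f60281485f5
State: Peer in Cluster (Connected)

Hostname: g2
Uuid: 154d5c46-f62f-4e9c-a328-443e30cadf4e
State: Peer in Cluster (Connected)

Hostname: g4
Uuid: 62365589-61f8-479f-bb50-11519beba045
State: Peer in Cluster (Disconnected)
"""

if __name__ == "__main__":
    print(disconnected_peers(SAMPLE))  # ['g4']
```

Run on the g3 output, this reports only g4. Keying on the State field rather than on hostnames means it also handles peers that are listed by IP address, such as 10.0.0.12.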
Pranith Kumar K
2011-Aug-08 16:59 UTC
[Gluster-users] help! - cant bring 1 of 4 servers up..
zip /etc/glusterd and send across

Pranith
Pranith Kumar K
2011-Aug-08 17:35 UTC
[Gluster-users] help! - cant bring 1 of 4 servers up..
After debugging the problem with Paul on IRC, we found the cause: because his disk had no free space, subsequent writes to one of the peer files (which glusterd uses to recover run-time information) failed, and the file was left empty. Because of this, glusterd could not restore that peer and so failed to restart. We copied the contents of that file from another peer in the cluster onto the problematic server, and glusterd then started successfully.

Pranith

On 08/08/2011 10:32 PM, paul simpson wrote:
> hi pranith,
>
> many thanks for the super quick reply! i've attached the files asked
> for - be keen to hear your thoughts. i'm stumped - and scared!
>
> regards,
>
> paul
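The failure mode described here, a peer file left at zero bytes because the disk filled up, can be checked for before restarting glusterd. Below is a minimal Python sketch that flags peer files that are empty or lack a `uuid=` line. It assumes, beyond what the thread states, that the 3.1.x peer files live under `/etc/glusterd/peers/` in simple `key=value` format; the function name `suspect_peer_files` is hypothetical, so check the layout on your own install first.

```python
# Sketch: flag glusterd peer files that are empty or missing a uuid= line.
# ASSUMPTIONS (not stated in the thread): peer files live in
# /etc/glusterd/peers/, one file per peer, in key=value format
# (e.g. "uuid=...", "state=...", "hostname1=..."). Verify on your version.
import os
import sys

DEFAULT_PEERS_DIR = "/etc/glusterd/peers"  # assumed location for 3.1.x


def suspect_peer_files(peers_dir=DEFAULT_PEERS_DIR):
    """Return (path, reason) pairs for peer files that look truncated."""
    problems = []
    for name in sorted(os.listdir(peers_dir)):
        path = os.path.join(peers_dir, name)
        if not os.path.isfile(path):
            continue
        # A zero-byte file is what a failed write on a full disk left behind.
        if os.path.getsize(path) == 0:
            problems.append((path, "empty"))
            continue
        # Collect the keys present so a partially written file is caught too.
        with open(path, encoding="utf-8", errors="replace") as fh:
            keys = {line.split("=", 1)[0].strip() for line in fh if "=" in line}
        if "uuid" not in keys:
            problems.append((path, "no uuid= line"))
    return problems


def main(argv):
    """Report suspect files; only reads, never modifies anything."""
    target = argv[1] if len(argv) > 1 else DEFAULT_PEERS_DIR
    if not os.path.isdir(target):
        print(f"{target}: not a directory, nothing to check")
        return
    for path, reason in suspect_peer_files(target):
        print(f"{path}: {reason}")


if __name__ == "__main__":
    main(sys.argv)
```

The sketch only reports candidates and deliberately changes nothing. Per the description above, the actual repair was to copy the corresponding file's contents over from a healthy peer before starting glusterd again.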