符永涛
2013-Jan-04 06:00 UTC
[Gluster-users] help, glusterfs replica can't handle brick filesystem crash and shutdown
Dear gluster experts,

GlusterFS replication is supposed to handle the hardware failure of one brick (for example, a power outage). However, we recently encountered an issue related to an XFS filesystem crash and shutdown. When it happens, the whole volume doesn't work. Some files are inaccessible and, even worse, some directories become inaccessible, which makes thousands of files go missing.

To handle it we have to force a shutdown of the peer. This solves the problem, but our services are impacted and data loss happens. GlusterFS replication should be able to handle a brick filesystem shutdown smoothly. What's your opinion on how to avoid this kind of failure?

-- 符永涛
Brian Foster
2013-Jan-04 14:02 UTC
[Gluster-users] help, glusterfs replica can't handle brick filesystem crash and shutdown
On 01/04/2013 01:00 AM, 符永涛 wrote:
> Dear gluster experts,
>
> GlusterFS replication is supposed to handle the hardware failure of one
> brick (for example, a power outage). However, we recently encountered an
> issue related to an XFS filesystem crash and shutdown. When it happens,
> the whole volume doesn't work. Some files are inaccessible and, even
> worse, some directories become inaccessible, which makes thousands of
> files go missing.
> To handle it we have to force a shutdown of the peer. This solves the
> problem, but our services are impacted and data loss happens.
> GlusterFS replication should be able to handle a brick filesystem shutdown
> smoothly. What's your opinion on how to avoid this kind of failure?
>

Hi,

First, I would suggest you independently report your XFS crash to the XFS mailing list (xfs at oss.sgi.com), including the information described here:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Hopefully they can help assess the state and possible recovery of your local filesystem. How to proceed on the gluster side of things probably depends on the outcome of that analysis.

My guess is that the filesystem going into a shutdown state causes confusion for gluster, due to the runtime limitations that state imposes on the filesystem. I haven't actually tested an active gluster mount on a brick in the shutdown state, so I can't specifically characterize the behavior (at minimum, I'd expect read-only behavior), but I'll give it a try and see what happens...

Brian
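As a rough illustration of how the state of a brick's local filesystem could be checked from the brick server, here is a minimal sketch that is not taken from the thread. The function name `probe_brick` and the path in the usage note are hypothetical, and the sketch assumes that a filesystem in the shutdown state rejects new writes with an I/O error (typically EIO). It only reports whether a small write and fsync succeed, and says nothing about how gluster itself reacts.

```python
import errno
import os
import tempfile


def probe_brick(path):
    """Try a small write + fsync under `path`.

    Returns (True, None) on success, or (False, errno_name) on failure.
    `path` is assumed to be a directory on the brick's local filesystem.
    """
    try:
        # Creating the file already fails if the filesystem rejects writes.
        fd, name = tempfile.mkstemp(prefix=".probe-", dir=path)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))

    result = (True, None)
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the data to disk so I/O errors surface here
    except OSError as exc:
        result = (False, errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        # Best-effort cleanup; both calls may fail on a broken filesystem.
        for cleanup in (lambda: os.close(fd), lambda: os.unlink(name)):
            try:
                cleanup()
            except OSError:
                pass
    return result
```

It could be called as `probe_brick("/path/to/scratch-dir")`, where the directory is a placeholder for a scratch directory on the same filesystem as the brick. Probing a scratch directory rather than the exported brick directory avoids creating files behind gluster's back.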