Dear All-

I am running a fix-layout operation on a volume after seeing errors mentioning "anomalies" and "holes" in the logs. There is a particular directory that is giving trouble and I would like to be able to run the layout fix on that first. Users are experiencing various I/O errors including "invalid argument" and "Unknown error 526", but after running for a week the volume wide fix-layout doesn't seem to have reached this particular directory yet. Fix-layout takes a long time because there are millions of files in the volume and the CPU load is consistently very high on all the servers while it is running, sometimes over 20. Therefore I really need to find a way to target particular directories or speed up the volume wide fix-layout.

I have no idea what caused these errors but it could be related to the previous fix-layout operation, which I started following the addition of a new pair of bricks, not having completed successfully. The problem is that the rebalance operation on one or more servers often fails before completing and there is no way (that I know of) to restart or resume the process on one server. Every time this happens I stop the fix-layout and start it again, but it has never completed successfully on every server despite sometimes running for several weeks.

One other possible cause I can think of is my recent policy of using XFS for new bricks instead of ext4. The reason I think this might be causing the problem is that none of the other volumes have any XFS bricks yet and they aren't experiencing any I/O errors. Are there any special mount options required for XFS, and is there any reason why a volume shouldn't contain a mixture of ext4 and XFS bricks?

Regards,
Dan.

On 01/15/2013 01:10 PM, Dan Bretherton wrote:
> I am running a fix-layout operation on a volume after seeing errors mentioning
> "anomalies" and "holes" in the logs. There is a particular directory that is
> giving trouble and I would like to be able to run the layout fix on that
> first. Users are experiencing various I/O errors including "invalid argument"
> and "Unknown error 526", but after running for a week the volume wide
> fix-layout doesn't seem to have reached this particular directory yet.
> Fix-layout takes a long time because there are millions of files in the volume
> and the CPU load is consistently very high on all the servers while it is
> running, sometimes over 20. Therefore I really need to find a way to target
> particular directories or speed up the volume wide fix-layout.

You should be able to do the following command on a client to fix the layout
for just one directory (it's the same xattr used by the rebalance command).

    setfattr -n distribute.fix.layout -v "anything" /bad/directory

> I have no idea what caused these errors but it could be related to the previous
> fix-layout operation, which I started following the addition of a new pair of
> bricks, not having completed successfully. The problem is that the rebalance
> operation on one or more servers often fails before completing and there is no
> way (that I know of) to restart or resume the process on one server. Every
> time this happens I stop the fix-layout and start it again, but it has never
> completed successfully on every server despite sometimes running for several
> weeks.
>
> One other possible cause I can think of is my recent policy of using XFS for
> new bricks instead of ext4. The reason I think this might be causing the
> problem is that none of the other volumes have any XFS bricks yet and they
> aren't experiencing any I/O errors. Are there any special mount options
> required for XFS, and is there any reason why a volume shouldn't contain a
> mixture of ext4 and XFS bricks?

It doesn't seem like that should be a problem, but maybe someone else knows
something about ext4/XFS differences that could shed some light.

On 01/15/2013 08:17 PM, gluster-users-request at gluster.org wrote:
> Date: Tue, 15 Jan 2013 15:17:00 -0500
> From: Jeff Darcy <jdarcy at redhat.com>
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Targeted fix-layout?
> Message-ID: <50F5B93C.5040802 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 01/15/2013 01:10 PM, Dan Bretherton wrote:
>> I am running a fix-layout operation on a volume after seeing errors mentioning
>> "anomalies" and "holes" in the logs. There is a particular directory that is
>> giving trouble and I would like to be able to run the layout fix on that
>> first. Users are experiencing various I/O errors including "invalid argument"
>> and "Unknown error 526", but after running for a week the volume wide
>> fix-layout doesn't seem to have reached this particular directory yet.
>> Fix-layout takes a long time because there are millions of files in the volume
>> and the CPU load is consistently very high on all the servers while it is
>> running, sometimes over 20. Therefore I really need to find a way to target
>> particular directories or speed up the volume wide fix-layout.
> You should be able to do the following command on a client to fix the layout
> for just one directory (it's the same xattr used by the rebalance command).
>
> setfattr -n distribute.fix.layout -v "anything" /bad/directory
>> I have no idea what caused these errors but it could be related to the previous
>> fix-layout operation, which I started following the addition of a new pair of
>> bricks, not having completed successfully. The problem is that the rebalance
>> operation on one or more servers often fails before completing and there is no
>> way (that I know of) to restart or resume the process on one server. Every
>> time this happens I stop the fix-layout and start it again, but it has never
>> completed successfully on every server despite sometimes running for several
>> weeks.
>>
>> One other possible cause I can think of is my recent policy of using XFS for
>> new bricks instead of ext4. The reason I think this might be causing the
>> problem is that none of the other volumes have any XFS bricks yet and they
>> aren't experiencing any I/O errors. Are there any special mount options
>> required for XFS, and is there any reason why a volume shouldn't contain a
>> mixture of ext4 and XFS bricks?
> It doesn't seem like that should be a problem, but maybe someone else knows
> something about ext4/XFS differences that could shed some light.

Thanks Jeff, I'll give that a try. Should the xattr name be
trusted.distribute.fix.layout by the way? When I try with
distribute.fix.layout I get the error "Operation not supported".

-Dan.
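
For reference, the single-directory setfattr suggestion quoted above could be wrapped in a small helper along these lines. This is only a sketch under stated assumptions, not a confirmed procedure: the function name fix_layout_tree and the XATTR_NAME and DRY_RUN variables are made up for illustration, the default xattr name is simply the one given in the thread (whether a "trusted." prefix is required is still an open question here), and walking the subdirectories as well as the top-level directory is an assumption about what a targeted fix would need.

```shell
#!/bin/sh
# Sketch only: set the fix-layout xattr on every directory under a given
# path on a client mount, instead of waiting for the volume-wide run.

# ASSUMPTION: the xattr name comes from the thread; the correct name may
# need a "trusted." prefix. Override it via the environment if so.
XATTR_NAME="${XATTR_NAME:-distribute.fix.layout}"

# Hypothetical switch: with DRY_RUN=1 the commands are printed, not run.
DRY_RUN="${DRY_RUN:-0}"

fix_layout_tree() {
    # $1: top-level directory on the client mount (e.g. /bad/directory).
    # Directory names containing newlines are not handled by this loop.
    find "$1" -type d | while IFS= read -r dir; do
        if [ "$DRY_RUN" = "1" ]; then
            printf 'setfattr -n %s -v anything %s\n' "$XATTR_NAME" "$dir"
        else
            # Report failures but keep going with the remaining directories.
            setfattr -n "$XATTR_NAME" -v "anything" "$dir" \
                || echo "failed: $dir" >&2
        fi
    done
}
```

A cautious way to use it would be to set DRY_RUN=1 first, call fix_layout_tree /bad/directory to review the list of commands, and only then run it for real, optionally with XATTR_NAME set to the prefixed name if the unprefixed one is rejected.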