thr3ads.net - Gluster users - [Gluster-users] Targeted fix-layout? [Jan 2013]


Dan Bretherton

2013-Jan-15 18:10 UTC

[Gluster-users] Targeted fix-layout?

Dear All-
I am running a fix-layout operation on a volume after seeing errors
mentioning "anomalies" and "holes" in the logs.  There is a particular
directory that is giving trouble and I would like to be able to run the
layout fix on that first.  Users are experiencing various I/O errors
including "invalid argument" and "Unknown error 526", but after running
for a week the volume wide fix-layout doesn't seem to have reached this
particular directory yet. Fix-layout takes a long time because there are
millions of files in the volume and the CPU load is consistently very
high on all the servers while it is running, sometimes over 20.
Therefore I really need to find a way to target particular directories
or speed up the volume wide fix-layout.

I have no idea what caused these errors but it could be related to the 
previous fix-layout operation, which I started following the addition of 
a new pair of bricks, not having completed successfully.  The problem is 
that the rebalance operation on one or more servers often fails before 
completing and there is no way (that I know of) to restart or resume the 
process on one server.  Every time this happens I stop the fix-layout 
and start it again, but it has never completed successfully on every 
server despite sometimes running for several weeks.

One other possible cause I can think of is my recent policy of using XFS 
for new bricks instead of ext4.  The reason I think this might be 
causing the problem is that none of the other volumes have any XFS 
bricks yet and they aren't experiencing any I/O errors.  Are there any 
special mount options required for XFS, and is there any reason why a 
volume shouldn't contain a mixture of ext4 and XFS bricks?

Regards,
Dan.

Jeff Darcy

2013-Jan-15 20:17 UTC


On 01/15/2013 01:10 PM, Dan Bretherton wrote:
> I am running a fix-layout operation on a volume after seeing errors mentioning
> "anomalies" and "holes" in the logs.  There is a particular directory that is
> giving trouble and I would like to be able to run the layout fix on that
> first.  Users are experiencing various I/O errors including "invalid argument"
> and "Unknown error 526", but after running for a week the volume wide
> fix-layout doesn't seem to have reached this particular directory yet.
> Fix-layout takes a long time because there are millions of files in the volume
> and the CPU load is consistently very high on all the servers while it is
> running, sometimes over 20. Therefore I really need to find a way to target
> particular directories or speed up the volume wide fix-layout.

You should be able to do the following command on a client to fix the layout 
for just one directory (it's the same xattr used by the rebalance command).

	setfattr -n distribute.fix.layout -v "anything" /bad/directory

> I have no idea what caused these errors but it could be related to the previous
> fix-layout operation, which I started following the addition of a new pair of
> bricks, not having completed successfully.  The problem is that the rebalance
> operation on one or more servers often fails before completing and there is no
> way (that I know of) to restart or resume the process on one server.  Every
> time this happens I stop the fix-layout and start it again, but it has never
> completed successfully on every server despite sometimes running for several
> weeks.
>
> One other possible cause I can think of is my recent policy of using XFS for
> new bricks instead of ext4.  The reason I think this might be causing the
> problem is that none of the other volumes have any XFS bricks yet and they
> aren't experiencing any I/O errors.  Are there any special mount options
> required for XFS, and is there any reason why a volume shouldn't contain a
> mixture of ext4 and XFS bricks?

It doesn't seem like that should be a problem, but maybe someone else knows 
something about ext4/XFS differences that could shed some light.

Dan Bretherton

2013-Jan-16 14:56 UTC


On 01/15/2013 08:17 PM, gluster-users-request at gluster.org wrote:
> Date: Tue, 15 Jan 2013 15:17:00 -0500
> From: Jeff Darcy <jdarcy at redhat.com>
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Targeted fix-layout?
> Message-ID: <50F5B93C.5040802 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 01/15/2013 01:10 PM, Dan Bretherton wrote:
>> I am running a fix-layout operation on a volume after seeing errors mentioning
>> "anomalies" and "holes" in the logs.  There is a particular directory that is
>> giving trouble and I would like to be able to run the layout fix on that
>> first.  Users are experiencing various I/O errors including "invalid argument"
>> and "Unknown error 526", but after running for a week the volume wide
>> fix-layout doesn't seem to have reached this particular directory yet.
>> Fix-layout takes a long time because there are millions of files in the volume
>> and the CPU load is consistently very high on all the servers while it is
>> running, sometimes over 20. Therefore I really need to find a way to target
>> particular directories or speed up the volume wide fix-layout.
> You should be able to do the following command on a client to fix the layout
> for just one directory (it's the same xattr used by the rebalance command).
>
> 	setfattr -n distribute.fix.layout -v "anything" /bad/directory
>> I have no idea what caused these errors but it could be related to the previous
>> fix-layout operation, which I started following the addition of a new pair of
>> bricks, not having completed successfully.  The problem is that the rebalance
>> operation on one or more servers often fails before completing and there is no
>> way (that I know of) to restart or resume the process on one server.  Every
>> time this happens I stop the fix-layout and start it again, but it has never
>> completed successfully on every server despite sometimes running for several
>> weeks.
>>
>> One other possible cause I can think of is my recent policy of using XFS for
>> new bricks instead of ext4.  The reason I think this might be causing the
>> problem is that none of the other volumes have any XFS bricks yet and they
>> aren't experiencing any I/O errors.  Are there any special mount options
>> required for XFS, and is there any reason why a volume shouldn't contain a
>> mixture of ext4 and XFS bricks?
> It doesn't seem like that should be a problem, but maybe someone else knows
> something about ext4/XFS differences that could shed some light.

Thanks Jeff, I'll give that a try.

Should the xattr name be trusted.distribute.fix.layout by the way? When
I try with distribute.fix.layout I get the error "Operation not supported".

-Dan.
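
The thread ends before anyone confirms which attribute name works. As a minimal sketch only, the shell function below tries the name Jeff gave and then the trusted.-prefixed name Dan asks about, stopping at the first one the client mount accepts. The function name fix_layout_dir and the SETFATTR override are hypothetical helpers for illustration. Whether either xattr name is honoured by a given GlusterFS version is an assumption, not something confirmed in this thread. Setting attributes in the trusted. namespace normally requires root.

```shell
#!/bin/sh
# Sketch: request a layout fix for a single directory through a client mount.
# Assumption: the installed GlusterFS version accepts one of the two xattr
# names below; this thread does not confirm which. The value is arbitrary,
# as in Jeff's example.

# Command used to set the xattr; overridable so the logic can be dry-run.
SETFATTR=${SETFATTR:-setfattr}

# fix_layout_dir DIR  (hypothetical helper, not part of GlusterFS)
fix_layout_dir() {
    dir=$1
    # Try the name from Jeff's reply first, then the trusted.-prefixed form.
    for name in distribute.fix.layout trusted.distribute.fix.layout; do
        if "$SETFATTR" -n "$name" -v "anything" "$dir"; then
            echo "layout fix requested on $dir via $name"
            return 0
        fi
    done
    echo "neither xattr name was accepted for $dir" >&2
    return 1
}
```

It would be called on a client, with the path as seen through the volume mount rather than on a brick, for example `fix_layout_dir /bad/directory`. Any error from the first attempt is left visible so the reason for the fallback can be seen.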
