thr3ads.net - Gluster users - [Gluster-users] Input/Output Error when deleting folder [Jan 2015]


RASTELLI Alessandro

2015-Jan-08 08:51 UTC

[Gluster-users] Input/Output Error when deleting folder

Hi Xavi,
now there are some files on nodes 1-2-3 and others on nodes 4-5, so I think
I'm going to destroy and re-create the volume from scratch (I can afford it
now).

In your opinion, having 5 nodes with 10x 4TB disks each, what's the best way
to size the bricks?
Right now we have configured disperse FS, 2 bricks per node per volume (2x 4TB
RAID0 each); if I'm not wrong, we can afford to lose 2 bricks (= an entire node).
Would it be better to use distributed FS, with 1 brick per node (10x 4TB
RAID5 each)?
Or do you have other suggestions?

Thanks
A.

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez at datalab.es] 
Sent: Wednesday, 7 January 2015 18:14
To: RASTELLI Alessandro
Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI Gabriele;
TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
Subject: Re: [Gluster-users] Input/Output Error when deleting folder

If that file is missing only from gluster03-mi, and it has the same attributes
in all remaining bricks, self-heal should recover it automatically.

Are there differences in the extended attributes of the file on the bricks
that have it?

On 01/07/2015 05:22 PM, RASTELLI Alessandro wrote:
> It worked... partially :)
> Now I can access the folders again, but I can't delete them because
> there are a couple of files in them (which I don't need). The
> files exist only on nodes 1, 2, 4 and 5, but not on node 3:
>
> [root@gluster02-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218/Rec_218_1_part_14656.ts
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218/Rec_218_1_part_14656.ts
> trusted.ec.config=0x0000080a02000200
> trusted.ec.size=0x0000000034400000
> trusted.ec.version=0x0000000000001a20
> trusted.gfid=0x8d5da5a1cd1949618a5b96657857ceb6
>
> [root@gluster03-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218/Rec_218_1_part_14656.ts
> getfattr: /brick1/recorder/Rec218/Rec_218_1_part_14656.ts: No such file or directory
>
> How do I proceed?
> Thanks
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Wednesday, 7 January 2015 16:45
> To: RASTELLI Alessandro
> Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI Gabriele; 
> TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
> Subject: Re: [Gluster-users] Input/Output Error when deleting folder
>
> Sorry, the command should be:
>
>       setfattr -n trusted.ec.version -v 0x0000000000000001 <brick path>/Rec218
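
The short form 0x1 is a single hex digit, so it does not describe a whole number of bytes; that is presumably what setfattr rejected as "bad input encoding". Below is a minimal Python sketch for producing the padded form. It assumes the attribute is an 8-byte big-endian counter, which is inferred only from the 16-digit values in the getfattr output in this thread. The helper name xattr_hex is hypothetical.

```python
def xattr_hex(value: int, width_bytes: int = 8) -> str:
    """Format an integer as a zero-padded, big-endian hex string for `setfattr -v`."""
    # to_bytes() pads to the requested width; hex() then emits two digits per byte.
    return "0x" + value.to_bytes(width_bytes, "big").hex()


print(xattr_hex(1))       # 0x0000000000000001
print(xattr_hex(0x693a))  # 0x000000000000693a
```

The width is a parameter because other attributes in the thread (for example trusted.gfid at 16 bytes) use different sizes.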
>
> On 01/07/2015 04:34 PM, RASTELLI Alessandro wrote:
>> See my answers below:
>> 1.
>> [root@gluster03-mi ~]# ls -l /brick1/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
>> ls: cannot access /brick1/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4: No such file or directory
>> [root@gluster03-mi ~]# ls -l /brick1/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd
>> lrwxrwxrwx 1 root root 55 Dec 17 17:37 /brick1/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd -> ../../00/00/00000000-0000-0000-0000-000000000001/Rec218
>> [root@gluster03-mi ~]# ls -l /brick2/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
>> ls: cannot access /brick2/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4: No such file or directory
>> [root@gluster03-mi ~]# ls -l /brick2/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd
>> lrwxrwxrwx 1 root root 55 Dec 17 17:37 /brick2/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd -> ../../00/00/00000000-0000-0000-0000-000000000001/Rec218
>>
>> 2.
>> /Rec218 is supposed to be empty (or rather, I don't need to restore the
>> files). I stopped the volume, but when executing the command I get an
>> error:
>> [root@gluster01-mi ~]# setfattr -n trusted.ec.version -v 0x1 /brick1/recorder/Rec218
>> bad input encoding
>>
>> Regards
>> A.
>>
>>
>>
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Wednesday, 7 January 2015 16:08
>> To: RASTELLI Alessandro
>> Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI Gabriele;
>> TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
>> Subject: Re: [Gluster-users] Input/Output Error when deleting folder
>>
>> I see two problems here:
>>
>> 1. Something very strange has happened on gluster03-mi. It
>> contains the directory, but it's not the same one that is on the
>> other bricks (8 bricks have gfid
>> a9d904af-0d9e-4018-acb2-881bd8b3c2e4,
>> while that node has gfid bda849fc-a556-469e-ad84-ed074f2c1bcd).
>>
>> Whatever happened here has affected both bricks of that node
>> in the same way.
>>
>> What do these commands return on gluster03-mi?
>>
>> ls -l /brick1/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
>> ls -l /brick1/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd
>>
>> ls -l /brick2/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
>> ls -l /brick2/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd
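
The paths in these commands follow a pattern that is visible in the thread: under the brick's .glusterfs directory, the first two and the next two hex digits of the gfid form two directory levels, followed by the gfid in dashed UUID form. Here is a small Python sketch of that mapping, based only on the paths shown here. The helper name gfid_backend_path is hypothetical.

```python
import uuid


def gfid_backend_path(brick_root: str, gfid_hex: str) -> str:
    """Build the .glusterfs entry path for a gfid given as a hex xattr value."""
    # Accept the value as printed by `getfattr -e hex`, with or without the 0x prefix.
    digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    gfid = str(uuid.UUID(digits))  # canonical lowercase, dashed form
    return f"{brick_root}/.glusterfs/{gfid[:2]}/{gfid[2:4]}/{gfid}"


print(gfid_backend_path("/brick1/recorder", "0xa9d904af0d9e4018acb2881bd8b3c2e4"))
```

This makes it easy to go from a trusted.gfid value reported by getfattr to the entry to inspect with ls -l.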
>>
>> 2. It seems that node gluster04-mi was stopped (or rebooted, or
>> failed) while an operation that modifies the directory contents was
>> being executed, so it has missed an update and is out of sync (both bricks
>> on the same server have missed one update, so it seems clear that it's
>> not a brick problem but a server problem).
>>
>> The overall result of all this is that you have 4 failed bricks in a
>> configuration that only tolerates 2 failed bricks.
>>
>> BTW, having two or more bricks on the same server is not recommended
>> because a single server failure causes multiple bricks to be lost. In this
>> case a directory can be recovered, but if this happens to a file, it won't
>> be 100% recoverable.
>>
>> Are there any files inside /Rec218?
>>
>> If you are going to delete the directory and all its contents, and the
>> brick contents on gluster03-mi are the same as on the other servers,
>> the following commands should be safe (otherwise let me know before
>> doing anything):
>>
>> Before starting, you must be sure that nothing is creating or deleting
>> entries inside /Rec218. It would be even better if this could be done with
>> the volume stopped.
>>
>> On each brick (including gluster03-mi):
>>        setfattr -n trusted.ec.version -v 0x1 <brick path>/Rec218
>>
>> On bricks in gluster03-mi:
>>        setfattr -n trusted.gfid -v 0xa9d904af0d9e4018acb2881bd8b3c2e4 <brick path>/Rec218
>>        setfattr -n trusted.glusterfs.dht -v 0x000000010000000000000000ffffffff <brick path>/Rec218
>>
>> On client:
>>        check that the directory is accessible and its contents seem ok. If so:
>>            rm -rf <mount point>/Rec218
>>
>> If you have a way to reproduce this situation, let me know.
>>
>> Xavi
>>
>> On 01/07/2015 03:31 PM, RASTELLI Alessandro wrote:
>>> [root@gluster01-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick1/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>> [root@gluster01-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick2/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>>
>>> [root@gluster02-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick1/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>> [root@gluster02-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick2/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>>
>>> [root@gluster03-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick1/recorder/Rec218
>>> trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd
>>>
>>> [root@gluster03-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick2/recorder/Rec218
>>> trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd
>>>
>>>
>>> [root@gluster04-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick1/recorder/Rec218
>>> trusted.ec.version=0x0000000000006939
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>> [root@gluster04-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick2/recorder/Rec218
>>> trusted.ec.version=0x0000000000006939
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>>
>>> [root@gluster05-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick1/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>
>>> [root@gluster05-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: brick2/recorder/Rec218
>>> trusted.ec.version=0x000000000000693a
>>> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
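
The diagnosis in this thread comes down to spotting which servers disagree with the majority on each extended attribute. Below is a minimal Python sketch of that comparison, using the Rec218 directory values reported above (both bricks on each server showed the same values, so there is one entry per server). The helper name outliers and the dictionary layout are illustrative assumptions, not part of any Gluster tool.

```python
from collections import Counter

GOOD_GFID = "a9d904af0d9e4018acb2881bd8b3c2e4"

# Directory xattrs for Rec218 as reported in the thread, one entry per server.
xattrs = {
    "gluster01-mi": {"trusted.ec.version": 0x693A, "trusted.gfid": GOOD_GFID},
    "gluster02-mi": {"trusted.ec.version": 0x693A, "trusted.gfid": GOOD_GFID},
    "gluster03-mi": {"trusted.gfid": "bda849fca556469ead84ed074f2c1bcd"},
    "gluster04-mi": {"trusted.ec.version": 0x6939, "trusted.gfid": GOOD_GFID},
    "gluster05-mi": {"trusted.ec.version": 0x693A, "trusted.gfid": GOOD_GFID},
}


def outliers(per_server: dict, name: str) -> list:
    """Return servers whose value for `name` differs from the most common value."""
    values = {server: attrs.get(name) for server, attrs in per_server.items()}
    majority, _count = Counter(values.values()).most_common(1)[0]
    return sorted(server for server, value in values.items() if value != majority)


print(outliers(xattrs, "trusted.gfid"))        # ['gluster03-mi']
print(outliers(xattrs, "trusted.ec.version"))  # ['gluster03-mi', 'gluster04-mi']
```

A missing attribute counts as a mismatch, which is why gluster03-mi appears in both lists.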

Xavier Hernandez

2015-Jan-08 13:05 UTC


[Gluster-users] Input/Output Error when deleting folder

Hi,

On 01/08/2015 09:51 AM, RASTELLI Alessandro wrote:
> Hi Xavi,
> now there are some files on nodes 1-2-3 and others on nodes 4-5, so I think
> I'm going to destroy and re-create the volume from scratch (I can afford it
> now).

If the data is not needed, this is the best way to remove all problems.
However, if you continue testing and arrive at this situation again, I
would be very interested to know what operations you performed, with as
many details about your workload as you can give. Maybe there's a bug
causing this problem.
>
> In your opinion, having 5 nodes with 10x 4TB disks each, what's the
> best way to size the bricks?
> Right now we have configured disperse FS, 2 bricks per node per volume (2x 4TB
> RAID0 each); if I'm not wrong, we can afford to lose 2 bricks (= an entire node).
> Would it be better to use distributed FS, with 1 brick per node (10x
> 4TB RAID5 each)?
> Or do you have other suggestions?

The best configuration depends on your specific hardware characteristics 
and your needs or preferences.

The main factor is the MTBF/AFR of the disks (Mean Time Between Failures 
/ Annualized Failure Rate).

The relationship between MTBF (in hours) and AFR is given by the following
formula, assuming the disks run uninterrupted all year (8760 hours):

     AFR = 1 - exp(-8760 / MTBF)

AFR is the probability that a single disk fails in one year.
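
A small Python sketch of this conversion in both directions. The 300,000-hour MTBF is only an illustrative value, not a figure from this thread.

```python
import math

HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Probability that one disk fails within a year of continuous operation."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """Inverse of afr_from_mtbf: the MTBF (hours) that yields the given AFR."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


print(f"MTBF 300000 h -> AFR {afr_from_mtbf(300_000):.2%}")
print(f"AFR 3%        -> MTBF {mtbf_from_afr(0.03):,.0f} h")
```

The inverse is handy when a datasheet quotes AFR directly and you want to compare it with a disk that quotes MTBF.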

In your environment, each server has 10 disks. If we assume an AFR of 
3%, we can calculate some failure probabilities in different 
configurations (all probabilities are in one year):

Failure probability of a single disk:            3.00%
Failure probability of a RAID-0  with  5 disks: 14.13%
Failure probability of a RAID-0  with 10 disks: 26.26%
Failure probability of a RAID-5  with  5 disks:  0.85%
Failure probability of a RAID-5  with 10 disks:  3.45% *
Failure probability of a RAID-6  with  5 disks:  0.03%
Failure probability of a RAID-6  with 10 disks:  0.28%
Failure probability of a RAID-50 with 10 disks:  1.69%

* Note that a RAID-5 of 10 disks has a higher probability of failure than a
single disk in this case.
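
These figures can be reproduced with a simple binomial model: an array is lost when more disks fail during the year than its RAID level tolerates. The sketch below assumes independent disk failures and no disk replacement within the year, which is what the numbers imply. The helper name p_fail is illustrative.

```python
from math import comb


def p_fail(n: int, tolerated: int, p: float) -> float:
    """Probability that more than `tolerated` of `n` independent units fail."""
    ok = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(tolerated + 1))
    return 1 - ok


AFR = 0.03  # assumed annual failure rate of one disk

raid5_5 = p_fail(5, 1, AFR)
table = {
    "RAID-0  with  5 disks": p_fail(5, 0, AFR),
    "RAID-0  with 10 disks": p_fail(10, 0, AFR),
    "RAID-5  with  5 disks": raid5_5,
    "RAID-5  with 10 disks": p_fail(10, 1, AFR),
    "RAID-6  with  5 disks": p_fail(5, 2, AFR),
    "RAID-6  with 10 disks": p_fail(10, 2, AFR),
    # RAID-50 modeled as a stripe over two 5-disk RAID-5 sets: lost if either set is lost.
    "RAID-50 with 10 disks": p_fail(2, 0, raid5_5),
}

for name, prob in table.items():
    print(f"Failure probability of a {name}: {prob:6.2%}")
```

Changing AFR or the disk counts gives the equivalent table for other hardware.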

Once you have calculated the failure probability for your hardware 
configuration, this probability can be considered as the AFR of a single 
disk used as a brick for gluster.

Then you can calculate the failure probability of the gluster volume 
(assuming you have an AFR of 3.45% using a RAID-5 of 10 disks):

Failure probability of a Disperse  3:1:  0.35%
Failure probability of a Disperse  5:1:  1.11%
Failure probability of a Disperse  6:2:  0.08%
Failure probability of a Disperse 10:2:  0.41%
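
The same binomial reasoning applies one level up. Reading "Disperse N:R" as N bricks in total with redundancy R, the volume is lost when more than R bricks fail. This is a sketch under the same independence assumption, and disperse_failure is an illustrative name.

```python
from math import comb


def disperse_failure(bricks: int, redundancy: int, brick_afr: float) -> float:
    """Probability that more bricks fail than the disperse set can tolerate."""
    survives = sum(
        comb(bricks, k) * brick_afr**k * (1 - brick_afr) ** (bricks - k)
        for k in range(redundancy + 1)
    )
    return 1 - survives


BRICK_AFR = 0.0345  # one brick backed by a RAID-5 of 10 disks

for bricks, redundancy in [(3, 1), (5, 1), (6, 2), (10, 2)]:
    prob = disperse_failure(bricks, redundancy, BRICK_AFR)
    print(f"Failure probability of a Disperse {bricks:2d}:{redundancy}: {prob:.2%}")
```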

Gluster can also use Distribute. Distribute is similar to a RAID-0 (it
combines multiple subvolumes into a single one), but if one subvolume
fails, only the data stored in that subvolume is lost (in a RAID-0, if a
single disk fails, the entire RAID is lost).

This doesn't reduce the probability of failure, but it reduces the
impact of that failure (it's much harder to lose all data):

Failure prob of a Distributed-Dispersed 2x3:1: 0.6956% (1 subvol)
                                                0.0012% (2 subvol)
Failure prob of a Distributed-Dispersed 2x5:1: 2.20% (1 subvol)
                                                0.012% (2 subvol)
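
One way to read the "1 subvol" and "2 subvol" figures is as the probability that exactly one, or both, of two independent disperse subvolumes are lost. The Python sketch below checks that reading for the 2x3:1 case, using the unrounded failure probability of a 10-disk RAID-5 brick. The interpretation is inferred from the numbers, not stated in the message.

```python
from math import comb


def p_fail(n: int, tolerated: int, p: float) -> float:
    """Probability that more than `tolerated` of `n` independent units fail."""
    ok = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(tolerated + 1))
    return 1 - ok


brick_afr = p_fail(10, 1, 0.03)    # brick = RAID-5 of 10 disks, disk AFR 3%
subvol = p_fail(3, 1, brick_afr)   # one disperse 3:1 subvolume

exactly_one = 2 * subvol * (1 - subvol)  # part of the data is lost
both = subvol**2                         # all data is lost

print(f"2x3:1, one subvolume lost:   {exactly_one:.4%}")
print(f"2x3:1, both subvolumes lost: {both:.4%}")
```

Losing both subvolumes is far less likely than losing one, which is the point of the paragraph above: distribute limits how much is lost rather than how often something is lost.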

Of course all these numbers are only statistical. A batch of defective 
drives or servers can ruin any configuration.

You should also consider the time needed to rebuild a brick if a RAID 
fails. If you create RAID-5 of 10 disks, for example, gluster will need 
to recover up to 36 TB of information (if brick was full). Using smaller 
RAIDs reduces this amount of data.

If you use a single RAID to store multiple bricks, you will get multiple
brick failures in case of a RAID or server failure. In any case, it's not
recommended to have more than one brick of the same subvolume on the
same server. It's better to use distribute in this case (a 10:2
configuration where a single server failure causes 2 bricks to fail is
almost equivalent to a 5:1 configuration with respect to probabilities,
especially if the disks are configured in a RAID-0).

I wouldn't recommend using RAID-0 with gluster. Instead of creating a
RAID-0 of 2 disks, it's better to create 2 bricks belonging to two
different gluster subvolumes and use distribute.

Failure probability of one brick using RAID-0 of two disks:  5.91%
Failure probability of two bricks using two disks:  5.82% (1 subvol)
                                                     0.09% (2 subvol)
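
The arithmetic behind this comparison is short enough to write out. Here is a Python sketch assuming the same 3% disk AFR and independent failures:

```python
afr = 0.03  # assumed annual failure rate of one disk

# RAID-0 of two disks: the single brick is lost if either disk fails.
raid0_brick_lost = 1 - (1 - afr) ** 2

# Two independent one-disk bricks in two different subvolumes.
one_brick_lost = 2 * afr * (1 - afr)  # exactly one disk fails
both_bricks_lost = afr**2             # both disks fail

print(f"RAID-0 brick lost:  {raid0_brick_lost:.2%}")
print(f"one of two bricks:  {one_brick_lost:.2%}")
print(f"both of two bricks: {both_bricks_lost:.2%}")
```

The chance of losing something is nearly the same in both layouts, but with separate bricks the common case only affects one subvolume.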

RAID-5 or RAID-6 can be useful for single disk failure because the disk 
can be recovered locally in the server without having to read data from 
other servers. Only a more critical failure will require that gluster 
rebuilds brick contents. However bigger RAIDs have greater failure 
probabilities (though they waste less physical disk space).

You must also consider the cost of growing a volume. Disperse and 
Replicate need to grow in multiples of the subvolume size. This means 
that if you create a 3:1 configuration you will need to add 3 new bricks 
if you want to get more space. If you start with a 10:2 configuration 
you will need to add 10 new bricks to get more space.

In your case I would recommend using two RAID-5 of 5 disks each, or a 
single RAID-6 of 10 disks, in each server. You can also opt to not use 
any RAID and have 10 independent disks in each server. I would also 
create relatively small bricks (for example 4TB each) and use a 
distributed-dispersed 5:1, with one brick of each subvolume in each server.

With this configuration, if you lose one RAID or an entire server, you 
will only lose, at most, one brick of each subvolume.

Probability of failure using RAID-6:            0.0076% (1 subvol)
Probability of failure using RAID-5 (5 disks):  0.07% (1 subvol)
Probability of failure without RAID:            0.85% (1 subvol)
Probability of failure using RAID-5 (10 disks): 1.11% (1 subvol)
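
These per-subvolume figures come from chaining two steps: first the yearly failure probability of one brick for each RAID layout, then the probability that a 5:1 disperse subvolume loses more than one brick. Below is a Python sketch under the same assumptions (3% disk AFR, independent failures). The names are illustrative.

```python
from math import comb


def p_fail(n: int, tolerated: int, p: float) -> float:
    """Probability that more than `tolerated` of `n` independent units fail."""
    ok = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(tolerated + 1))
    return 1 - ok


DISK_AFR = 0.03

# Step 1: yearly failure probability of the storage behind one brick.
brick_afr = {
    "RAID-6 (10 disks)": p_fail(10, 2, DISK_AFR),
    "RAID-5 (5 disks)": p_fail(5, 1, DISK_AFR),
    "no RAID": DISK_AFR,
    "RAID-5 (10 disks)": p_fail(10, 1, DISK_AFR),
}

# Step 2: a 5:1 disperse subvolume is lost when more than 1 of its 5 bricks fail.
subvol_failure = {name: p_fail(5, 1, afr) for name, afr in brick_afr.items()}

for name, prob in subvol_failure.items():
    print(f"{name:18s} {prob:.4%}")
```

Passing a different redundancy to the second step gives the figures for other disperse layouts.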

Of course using RAID is safer, but you also waste more space:

Available space using RAID-6:            128 TB
Available space using RAID-5 (5 disks):  128 TB
Available space using RAID-5 (10 disks): 144 TB
Available space without RAID:            160 TB

Using RAID you will recover integrity faster when only one or two disks
fail. But it will take more time when gluster has to recover more than
one brick (all bricks contained in the failed RAID).

You can also use disperse with redundancy 2. In your case it should be a 
5:2. This configuration is not considered optimal, but it's possible 
that with your workload it performs quite well (you should test it). 
With this configuration I would recommend no RAID at all, or at most
RAID-5 with 5 disks.

Probability of failure using RAID-6:            0.00002% (1 subvol)
Probability of failure using RAID-5 (5 disks):  0.00060% (1 subvol)
Probability of failure without RAID:            0.026% (1 subvol)
Probability of failure using RAID-5 (10 disks): 0.039% (1 subvol)

Available space using RAID-6:             96 TB
Available space using RAID-5 (5 disks):   96 TB
Available space using RAID-5 (10 disks): 108 TB
Available space without RAID:            120 TB
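
Both capacity tables follow from the same calculation: subtract the RAID parity disks on each server, then keep the data fraction of the disperse layout. Below is a Python sketch for the 5 servers with 10 disks of 4 TB each described in this thread. The function and parameter names are illustrative.

```python
def usable_tb(arrays_per_server, parity_per_array, bricks, redundancy,
              servers=5, disks_per_server=10, disk_tb=4):
    """Usable volume capacity in TB after RAID parity and disperse redundancy."""
    data_disks = disks_per_server - arrays_per_server * parity_per_array
    brick_space = servers * data_disks * disk_tb
    return brick_space * (bricks - redundancy) / bricks


# (arrays per server, parity disks per array) for each layout
layouts = {
    "RAID-6": (1, 2),
    "RAID-5 (5 disks)": (2, 1),
    "RAID-5 (10 disks)": (1, 1),
    "without RAID": (0, 0),
}

for redundancy in (1, 2):
    print(f"Disperse 5:{redundancy}")
    for name, (arrays, parity) in layouts.items():
        print(f"  {name:18s} {usable_tb(arrays, parity, 5, redundancy):5.0f} TB")
```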

Hope this helps a little to decide the best configuration for you.

Xavi
