RASTELLI Alessandro
2015-Jan-07 14:31 UTC
[Gluster-users] Input/Output Error when deleting folder
Hmmm... different results on different nodes!
Below is the output for each brick on each of the 5 nodes.

[root@gluster01-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick1/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster01-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick2/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster02-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick1/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster02-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick2/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster03-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick1/recorder/Rec218
trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd

[root@gluster03-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick2/recorder/Rec218
trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd

[root@gluster04-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick1/recorder/Rec218
trusted.ec.version=0x0000000000006939
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster04-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick2/recorder/Rec218
trusted.ec.version=0x0000000000006939
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster05-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick1/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@gluster05-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
getfattr: Removing leading '/' from absolute path names
# file: brick2/recorder/Rec218
trusted.ec.version=0x000000000000693a
trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez at datalab.es]
Sent: Wednesday, 7 January 2015 15:22
To: RASTELLI Alessandro
Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI Gabriele; TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
Subject: Re: [Gluster-users] Input/Output Error when deleting folder

Can you send me the result of the following command on all bricks?

getfattr -m. -e hex -d <brick path>/Rec218

On 01/07/2015 03:15 PM, RASTELLI Alessandro wrote:
> Hi,
> we just did a clean reboot of the client.
>
> The config is as follows (2 bricks on each node):
>
> [root@gluster01-mi ~]# gluster volume info
> Volume Name: storage-recorder
> Type: Disperse
> Volume ID: 97587d68-3834-43cc-8b95-11996e013bf2
> Status: Started
> Number of Bricks: 1 x (8 + 2) = 10
> Transport-type: tcp
> Bricks:
> Brick1: gluster01-mi:/brick1/recorder
> Brick2: gluster01-mi:/brick2/recorder
> Brick3: gluster02-mi:/brick1/recorder
> Brick4: gluster02-mi:/brick2/recorder
> Brick5: gluster03-mi:/brick1/recorder
> Brick6: gluster03-mi:/brick2/recorder
> Brick7: gluster04-mi:/brick1/recorder
> Brick8: gluster04-mi:/brick2/recorder
> Brick9: gluster05-mi:/brick1/recorder
> Brick10: gluster05-mi:/brick2/recorder
>
> A.
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Wednesday, 7 January 2015 15:11
> To: RASTELLI Alessandro
> Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI Gabriele;
> TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
> Subject: Re: [Gluster-users] Input/Output Error when deleting folder
>
> Did you stop the client or unmount the volume while the application was still writing to it?
>
> And what is the volume configuration? (gluster volume info)
>
> Xavi
>
> On 01/07/2015 02:21 PM, RASTELLI Alessandro wrote:
>> Hi Xavier,
>>
>> nothing special happened to the volumes or bricks, we just restarted the
>> client that writes into the folder Rec218.
>>
>> Before rebooting, the folder was OK.
>>
>> Thank you
>>
>> Alessandro
>>
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Wednesday, 7 January 2015 13:11
>> To: RASTELLI Alessandro
>> Cc: gluster-users at gluster.org; CAZZANIGA Stefano; UBERTINI
>> Gabriele; TECHNOLOGY - Supporto Sistemi OTT e Cloud; ORLANDO Luca
>> Subject: Re: [Gluster-users] Input/Output Error when deleting
>> folder
>>
>> Hi Alessandro,
>>
>> what is the volume configuration?
>> did something special happen to the volume, bricks or that directory?
>> did you directly remove or change something in the bricks?
>> did you replace any brick?
>>
>> Xavi
>>
>> On 01/07/2015 12:09 PM, RASTELLI Alessandro wrote:
>>
>> Hi,
>>
>> we encounter this error when trying to delete a folder in a
>> Gluster 3.6.1 environment (dispersed volume):
>>
>> [2015-01-07 10:27:49.008999] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 10890: LOOKUP() /Rec218 => -1 (Input/output error)
>>
>> [2015-01-07 10:27:49.009542] W [ec-common.c:164:ec_check_status] 0-storage-recorder-disperse-0: Operation failed on some subvolumes (up=3FF, mask=3FF, remaining=0, good=3CF, bad=30)
>>
>> [2015-01-07 10:27:50.326991] E [ec-heal.c:656:ec_heal_prepare_others] 0-storage-recorder-disperse-0: Don't know how to heal error 116
>>
>> The ls command gives this output:
>>
>> ?????????? ? ? ? ? ? Rec218
>>
>> I can't do any operation on the folder (move, rename, delete).
>>
>> Can you please indicate how to fix it?
>>
>> Thanks
>>
>> Alessandro
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>> http://www.gluster.org/mailman/listinfo/gluster-users
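With ten bricks to compare, the differences in the dumps above are easier to spot when the copies are grouped by their attributes. The following is only a minimal sketch, not something posted in the thread: `summarize_xattrs` is a hypothetical helper name, and it assumes the `getfattr -m. -e hex -d` output of all bricks has been concatenated and is fed on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): read concatenated
# `getfattr -m. -e hex -d` dumps on stdin and print how many directory
# copies share each (trusted.gfid, trusted.ec.version) pair.
# A missing attribute is reported as "-".
summarize_xattrs() {
    awk '
        # Record the copy collected so far, if any.
        function emit() {
            if (have) count[gfid " " ver]++
            have = 0
        }
        # Each "# file:" line starts the dump of a new brick copy.
        /^# file: /               { emit(); have = 1; gfid = "-"; ver = "-" }
        /^trusted\.gfid=/         { gfid = substr($0, index($0, "=") + 1) }
        /^trusted\.ec\.version=/  { ver = substr($0, index($0, "=") + 1) }
        END {
            emit()
            for (k in count) print count[k], k
        }
    ' | sort -rn
}
```

Each output line has the form `<count> <gfid> <ec.version>`. Fed the ten dumps above, it would report three groups: six copies with the common gfid and version `...693a`, two copies with the common gfid and version `...6939`, and two copies with a different gfid and no version at all.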
Xavier Hernandez
2015-Jan-07 15:08 UTC
[Gluster-users] Input/Output Error when deleting folder
I see two problems here:

1. Something very strange has happened on gluster03-mi. It contains the directory, but it is not the same one that exists on the other bricks (8 bricks have gfid a9d904af-0d9e-4018-acb2-881bd8b3c2e4, while the bricks on that node have gfid bda849fc-a556-469e-ad84-ed074f2c1bcd). Whatever happened there has affected both bricks of that node in the same way.

What do these commands return on gluster03-mi?

    ls -l /brick1/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
    ls -l /brick1/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd
    ls -l /brick2/recorder/.glusterfs/a9/d9/a9d904af-0d9e-4018-acb2-881bd8b3c2e4
    ls -l /brick2/recorder/.glusterfs/bd/a8/bda849fc-a556-469e-ad84-ed074f2c1bcd

2. It seems that node gluster04-mi was stopped (or rebooted, or failed) while an operation that modifies the directory contents was being executed, so it has lost an update and is out of sync: its trusted.ec.version ends in 6939, while the other bricks have 693a. Both bricks on the same server have missed one update, so it seems clear that this is not a brick problem but a server problem.

The overall result is that you have 4 failed bricks in a configuration that only tolerates 2 failed bricks.

BTW, having two or more bricks on the same server is not recommended, because a single server failure causes multiple bricks to be lost. In this case a directory can be recovered, but if this happens to a file, it won't be 100% recoverable.

Are there any files inside /Rec218?

If you are going to delete the directory and all its contents, and the brick contents on gluster03-mi are the same as on the other servers, the following commands should be safe (otherwise let me know before doing anything).

Before starting, you must be sure that nothing is creating or deleting entries inside /Rec218. It would be even better if this could be done with the volume stopped.

On each brick (including gluster03-mi):

    setfattr -n trusted.ec.version -v 0x1 <brick path>/Rec218

On the bricks in gluster03-mi:

    setfattr -n trusted.gfid -v 0xa9d904af0d9e4018acb2881bd8b3c2e4 <brick path>/Rec218
    setfattr -n trusted.glusterfs.dht -v 0x000000010000000000000000ffffffff <brick path>/Rec218

On the client: check that the directory is accessible and its contents seem OK. If so:

    rm -rf <mount point>/Rec218

If you have a way to reproduce this situation, let me know.

Xavi

On 01/07/2015 03:31 PM, RASTELLI Alessandro wrote:
> [root@gluster01-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster01-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster02-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster02-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster03-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218
> trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd
>
> [root@gluster03-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/recorder/Rec218
> trusted.gfid=0xbda849fca556469ead84ed074f2c1bcd
>
> [root@gluster04-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218
> trusted.ec.version=0x0000000000006939
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster04-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/recorder/Rec218
> trusted.ec.version=0x0000000000006939
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster05-mi ~]# getfattr -m. -e hex -d /brick1/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@gluster05-mi ~]# getfattr -m. -e hex -d /brick2/recorder/Rec218
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/recorder/Rec218
> trusted.ec.version=0x000000000000693a
> trusted.gfid=0xa9d904af0d9e4018acb2881bd8b3c2e4
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
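The `.glusterfs` paths in the reply above can be derived mechanically from the hex `trusted.gfid` value that `getfattr` prints: the first two hex digits, then the next two, then the gfid in dashed UUID form. A minimal sketch of that conversion follows, assuming the layout is exactly what those `ls -l` commands show; `gfid_path` is a hypothetical helper name, not a GlusterFS tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS).
# Usage: gfid_path <brick path> <hex gfid as printed by getfattr -e hex>
# Prints the matching entry path under <brick path>/.glusterfs.
gfid_path() {
    brick=$1
    hex=${2#0x}                      # drop the leading "0x" if present

    # A gfid is 16 bytes, i.e. exactly 32 hex digits.
    case $hex in
        *[!0-9a-fA-F]*|"") echo "gfid_path: not a hex gfid: '$2'" >&2; return 1 ;;
    esac
    if [ "${#hex}" -ne 32 ]; then
        echo "gfid_path: expected 32 hex digits, got '$2'" >&2
        return 1
    fi

    # Insert dashes in the 8-4-4-4-12 UUID positions.
    uuid=$(printf '%s\n' "$hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')

    # The two directory levels are hex digits 1-2 and 3-4.
    d1=$(printf '%s\n' "$hex" | cut -c1-2)
    d2=$(printf '%s\n' "$hex" | cut -c3-4)

    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$uuid"
}
```

For example, `ls -l "$(gfid_path /brick1/recorder 0xbda849fca556469ead84ed074f2c1bcd)"` would inspect the same entry as the second `ls -l` command in the reply.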