Sincock, John [FLCPTY]
2015-Sep-03 01:20 UTC
[Gluster-users] Recovering badly corrupted directory
Hi Everybody,

Perhaps I asked too many questions at once in my first mail, sorry... But if anyone can provide any info on the one question below, it might help.

Q) I realise that if a file has ------T perms, zero size, and a linkto xattr, then it is a gluster linkto file. But we also have other ------T perm files, which are puzzling:

- files which DO have a linkto xattr, but also have size > 0. What are these?! If the file is a gluster linkto, then why is its size > 0?
- files which do NOT have a linkto xattr, and have size > 0. The data in these files seems to be just endless NULs (Hex 00). Does anyone have any idea what these files are and why they've ended up with ------T perms?

If anyone can explain what these files are, I'd be grateful.

Thanks again.
John

-----Original Message-----
From: Sincock, John [FLCPTY]
Sent: Wednesday, 26 August 2015 3:55 PM
To: gluster-users
Subject: Recovering badly corrupted directory

Hi Everybody,

I'm trying to recover a badly corrupted directory on our gluster, and need some advice:

It looks like we've hit this bug here, which was reported against gluster 2.1 and is unresolved:
https://bugzilla.redhat.com/show_bug.cgi?id=1034148
Bug 1034148 - "DHT : on lookup getting error ' cannot read symbolic link <dir1>: Invalid argument' or 'Input/output error' and logs says "[posix.c:737:posix_readlink] 0-flat-posix: readlink on <dir> failed Invalid argument" + it shows directory twice in output"

Some googling shows what looks like the same bug on glusterfs 3.3 here:
http://www.gluster.org/pipermail/gluster-users/2013-August/014038.html
and glusterfs 3.4.2-1 here:
http://www.gluster.org/pipermail/gluster-users/2014-February/016271.html

We are running gluster 3.4.1-3.el6.x86_64 on centos 6.4

Our data for the corrupted folder appears to exist on the bricks but is unusable via the gluster volume.
There are many files with ------T permissions, many of which have zero size; others have data.

Here is what I get when I list the original problem directory:

ls -la /gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire\ \(FieldRawData\)/
ls: cannot read symbolic link /gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire (FieldRawData)/14_11_140823_S004_DNP: Invalid argument
ls: cannot read symbolic link /gluster/vol00/archive/Online_Archive/Survey/Riegl/2014/Saudi/H11_Riegl_RiAcquire_Raw_Data/14_11_SAS18_1_RiAcquire (FieldRawData)/14_11_140902_S021: Invalid argument
<snips many more like this>
total 0
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:30 14_11_140819_S001_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140819_S001_DNP
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 66 Jul 13 16:31 14_11_140821_S002 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S002
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:31 14_11_140821_S003_DNP -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140821_S003_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 1 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 70 Jul 13 16:32 14_11_140823_S004_DNP
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
lrwxrwxrwx 0 root root 66 Jul 13 14:38 14_11_140823_S005 -> ../../a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676/14_11_140823_S005
<snips many more like this>

As you can see, the listing is a catastrophic mess, with a multitude of duplicate entries, symbolic links which give this "invalid argument" nonsense, and all the symlinks to ../../a9/65/blahblah are broken.

I think I've managed to restore access to most of the data by rsyncing from its original location on each brick to a new location on the brick (not copying xattrs), and then recursively listing the new copy via the gluster volume to pull the new data into the gluster volume.
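When sorting through a copy like this, the kinds of ------T file described in this thread can be told apart directly on a brick. The following is only a minimal Python sketch, not a gluster tool. It assumes the DHT linkto xattr is named trusted.glusterfs.dht.linkto and that it is run as root on a brick (trusted.* xattrs are not readable otherwise); the function names and labels are made up for illustration.

```python
import os
import stat

# Assumption: name of the DHT linkto xattr on brick files.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def classify(mode: int, size: int, has_linkto: bool) -> str:
    """Label a file using the cases described in this thread (labels are hypothetical)."""
    # "---------T" means a regular file with only the sticky bit set.
    if not stat.S_ISREG(mode) or stat.S_IMODE(mode) != stat.S_ISVTX:
        return "ordinary"
    if has_linkto and size == 0:
        return "linkto"                  # the expected linkto file
    if has_linkto:
        return "linkto-with-data"        # puzzling case 1
    if size > 0:
        return "no-linkto-with-data"     # puzzling case 2
    return "no-linkto-empty"


def classify_path(path: str) -> str:
    """Classify one file on a brick (Linux only, needs root for trusted.* xattrs)."""
    st = os.lstat(path)
    try:
        os.getxattr(path, LINKTO_XATTR, follow_symlinks=False)
        has_linkto = True
    except OSError:
        # Raised when the xattr is absent or cannot be read.
        has_linkto = False
    return classify(st.st_mode, st.st_size, has_linkto)
```

Keeping the decision in classify(), separate from the filesystem access, means the rules can be checked without touching a brick. The sketch only reports what it finds; it does not say which copy of a duplicate is the good one.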
The copied data we've brought back into the gluster volume looks like it is all or mostly there, but there are still quite a few duplicates of many files with the ------T permissions in the new copy, so trying to rsync from this new copy throws errors saying "structure needs to be cleaned" and that items failed verification. This is not surprising, as rsync would have no way of properly coping with these duplicates.

Eg here is a small subfolder:

ll "/gluster/vol00/gluster-recovery-2/14_11_SAS18_1_RiAcquire (FieldRawData)/14_11_141023_S091_DNP/08_RECEIVED/"
total 3165
-rwxr-xr-x 1 survey1 surveyor 2702101 Oct 23 2014 14_11_141023_S091.rhk
---------T 1 survey1 surveyor 2702101 May 27 04:49 14_11_141023_S091.rhk
-rwxr-xr-x 1 survey1 surveyor 407859 Oct 23 2014 14_11_141023_S091.rpc
---------T 1 survey1 surveyor 407859 May 27 04:49 14_11_141023_S091.rpc
-rwxr-xr-x 1 survey1 surveyor 28653 Oct 23 2014 14_11_141023_S091.rpl
-rwxr-xr-x 1 survey1 surveyor 101609 Oct 23 2014 14_11_141023_S091.rpp

Note the duplicates with ------T perms.

The questions I have are:

1) Is there a better way to clean up this corruption?

2) What are these ------T files and why are they there?!?!?!?!

3) What has caused the initial problem which corrupted our data? I was on leave when this problem was noticed, but my colleagues are pretty sure the directory looked OK after our last rebalance, so we do not think this problem occurred during rebalance. We have 3 nodes in our cluster, and one of them is having issues with occasional spontaneous reboots, so it does drop off the gluster at times and then returns. But I don't think we have modified or moved any of the corrupted data recently, so I do not think the problem has been caused by data being moved during rebalance or by the node rebooting while data was being manually moved from one place to another.

4) Can I just delete the remaining ------T files from the data we copied and re-added into the volume? If I do so, is there any chance that the other duplicate is bad, and the T-file is the good copy that I should've saved? What are the chances this cleanup has fixed everything? Are there still likely to be corrupt and/or missing files?

5) If we can get our copy of this data cleaned up, we would like to delete the original corrupted folder from the volume, by going behind gluster and deleting the data off the bricks. What is the correct procedure for doing this? I.e. if we delete the bad data off all the bricks, this will leave files or links in the .glusterfs folder, won't it? How do I find the correct files under .glusterfs to delete? Or do I just delete from the bricks and then wait until the next time we do a rebalance, and let the rebalance clean up the mess?

Any advice would be appreciated!

Thanks muchly,
John
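On question 5, the ../../a9/65/a9650fe5-... link targets in the listing are consistent with the usual .glusterfs layout, where the entry for a gfid sits under .glusterfs/<first two hex digits>/<next two hex digits>/<gfid>. A minimal Python sketch of that mapping follows. It assumes the gfid is stored in the trusted.gfid xattr as 16 raw bytes; the brick root /export/brick1 and the function names are hypothetical. It only computes a path and deletes nothing, and it is not a statement of the correct cleanup procedure.

```python
import os
import uuid

# Assumption: xattr holding a file's gfid as 16 raw bytes.
GFID_XATTR = "trusted.gfid"


def gfid_backend_path(brick_root: str, gfid_bytes: bytes) -> str:
    """Return the .glusterfs entry path for a raw 16-byte gfid."""
    gfid = str(uuid.UUID(bytes=gfid_bytes))  # canonical lowercase, hyphenated form
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def backend_path_for(brick_root: str, path_on_brick: str) -> str:
    """Look up a brick file's gfid and map it to its .glusterfs entry.

    Linux only; reading trusted.* xattrs normally requires root.
    """
    raw = os.getxattr(path_on_brick, GFID_XATTR, follow_symlinks=False)
    return gfid_backend_path(brick_root, raw)


if __name__ == "__main__":
    # Parent-directory gfid taken from the symlink targets in the listing.
    example = uuid.UUID("a9650fe5-0436-472f-8f75-4d7b5bf0e676").bytes
    print(gfid_backend_path("/export/brick1", example))
```

With the gfid from the listing, this prints /export/brick1/.glusterfs/a9/65/a9650fe5-0436-472f-8f75-4d7b5bf0e676 on a Linux host.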