I haven't heard anything back, so I wanted to update this in case
anyone else comes across it. This is an old store that we created
under 3.0.4 and that kept getting duplicate files. We ran an update
script that used wget to download any files that were on the remote
but not present on the local box. If wget is asked to fetch a file we
already have, it should do one of three things: 1) skip it, because it
sees that we already have it (no-clobber); 2) overwrite the file with
the new version (clobber); or 3) save the new copy as file.1 so as not
to touch the original. In fact none of these happened. Instead we
ended up with the bizarre result of multiple identical files, with
identical names, in the same directory. We are also using far more
space than we should (~70TB instead of ~40TB or so), thanks to
directories like this:
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x 2 www-data www-data 294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data 3846144 Dec 12 11:10 ..
-rwxr-xr-x 1 www-data www-data   1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data   1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data  12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data  12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data    271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data    271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data      2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data      2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data       720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data       720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data    546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data    546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data       256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data       256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data    257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x 1 www-data www-data    257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
Ouch, right? So I installed 3.1.1, and that went well: I got it onto
all the drives and servers we had before, and we have a total capacity
of 96TB again. Everything seemed to be working, so I mounted the old
directories, saw the same duplicate files, and let it sit overnight to
see if gluster would notice and fix things. Then we started seeing
gluster log entries like:
==> glusterfs/mnt-glusterfs.log <==
[2011-01-13 11:46:23.2762] I [afr-common.c:662:afr_lookup_done] bhl-volume-replicate-55: entries are missing in lookup of /www/t/tijdschriftvoore1951nede.
[2011-01-13 11:46:23.2817] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-55: background meta-data data entry self-heal triggered. path: /www/t/tijdschriftvoore1951nede
[2011-01-13 11:46:23.5342] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-55: background meta-data data entry self-heal completed on /www/t/tijdschriftvoore1951nede
...so we thought, hey, maybe we're all set here, it's fixing itself
and removing those duplicate files. But no such luck:
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x 2 www-data www-data 294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data 3846144 Dec 12 11:10 ..
-rwxr-xr-x 1 www-data www-data   1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data   1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data  12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data  12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data    271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data    271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data      2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data      2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data       720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data       720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data    546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data    546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data       256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data       256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data    257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x 1 www-data www-data    257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
But this allows us to do (in my opinion) scary things like this:
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# rm /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
Eek! It only removed one of the files, even though both had the same
name. At this point we're going to wipe all 70TB and re-transfer,
hoping it stops once it has all the files and doesn't start writing
files with the same names as before. Does anyone have advice or
insight into this issue? I would love to learn why it did this, and
REALLY hope it doesn't do it again.
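
For anyone who wants to check a tree for the same symptom (for example
after a re-transfer), here is a minimal sketch. It relies on ls
printing every directory entry it is given, repeats included, as in
the listings above. The function names dup_names and scan_tree are
made up for illustration, and the script assumes file names contain
no newlines.

```shell
#!/bin/sh
# Print every name that appears more than once on stdin
# (one name per line, each repeated name printed once).
dup_names() {
    LC_ALL=C sort | uniq -d
}

# Walk a tree and report each directory whose listing contains
# the same name more than once.
# Assumption: file names contain no newline characters.
scan_tree() {
    find "$1" -type d | while IFS= read -r dir; do
        dups=$(ls -1A "$dir" | dup_names)
        if [ -n "$dups" ]; then
            printf '%s:\n' "$dir"
            printf '%s\n' "$dups" | sed 's/^/    /'
        fi
    done
}

# Example call (path taken from the listings above):
# scan_tree /mnt/glusterfs/www/t
```

On a healthy filesystem scan_tree prints nothing, so any output at
all points at a directory worth inspecting by hand.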
Thanks
P
On Wed, Jan 12, 2011 at 2:37 PM, phil cryer <phil at cryer.us> wrote:
> I'm now running gluster 3.1.1 on Debian. A directory that was running
> under 3.0.4 had duplicate files, but I've remounted things now that
> we're running 3.1.1 in hopes it would fix things, but so far it has
> not:
>
> # ls -l /mnt/glusterfs/www/0/0descriptionofta581unit
> total 37992
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
>
> While running the latest, 3.1.1, I noticed some log files that said:
>
> [..]
> [2011-01-12 15:24:33.325546] I [afr-common.c:613:afr_lookup_self_heal_check] bhl-volume-replicate-69: size differs for /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.325558] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-69: background meta-data data self-heal triggered. path: /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364501] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-66: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364881] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-69: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
>
> I assumed it was fixing that, but it didn't. Here are the full logs,
> which include all the gluster.log work it did in this directory:
> http://pastebin.com/8X52Em7Y
>
> Question: how can I 'fix' this, or is the best bet to remove
> everything and start over? It's going to set us back, but I'd rather
> do it now than keep banging on this without any resolution.
>
> Thanks for the help, really like the new gluster command, very nice!
>
> P
> --
> http://philcryer.com
>
--
http://philcryer.com