phil cryer
2010-Oct-27 17:23 UTC
[Gluster-users] Seeing duplicate files, with duplicate names, inode number, etc
We're building our cluster of data, downloading book data from Internet Archive. I've come across one directory that looks like this:

http://cluster.biodiversitylibrary.org/n/naturwissenschaft19deut/

Almost all the files appear to be there twice, with the same name, timestamp and inode! What could be causing this, and how can we fix it? At issue is space: it appears that we're using far more space than we should. Both `du -h` and `ls -lsh` say this directory takes 3.9G when it should really be about half that. If this has happened in many of the directories, it could explain how we're already using 78T of our 97T of space.

P
--
http://philcryer.com
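One way to confirm the symptom from the client side is to list the directory yourself and look for names or inode numbers that come back more than once. The sketch below is a hypothetical Python helper: the script name, function names and output keys are made up for illustration and are not part of GlusterFS. It uses only `os.scandir`, and compares a size total that counts every directory entry against one that counts each `(st_dev, st_ino)` pair once. If the two totals differ, the same files are being listed more than once.

```python
import os
import sys
from collections import Counter


def find_repeated(items):
    """Return the items that occur more than once, in sorted order."""
    counts = Counter(items)
    return sorted(item for item, n in counts.items() if n > 1)


def scan_directory(path):
    """List one directory and report repeated names and repeated inodes.

    Apparent sizes (st_size) are summed twice: once per directory entry and
    once per unique (st_dev, st_ino) pair. A gap between the two totals means
    some inodes were returned under more than one directory entry.
    """
    names = []
    inodes = []
    bytes_per_entry = 0
    size_by_inode = {}
    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            key = (st.st_dev, st.st_ino)
            names.append(entry.name)
            inodes.append(key)
            bytes_per_entry += st.st_size
            size_by_inode[key] = st.st_size
    return {
        "repeated_names": find_repeated(names),
        "repeated_inodes": find_repeated(inodes),
        "bytes_counting_every_entry": bytes_per_entry,
        "bytes_counting_unique_inodes": sum(size_by_inode.values()),
    }


if __name__ == "__main__":
    # Hypothetical usage: python3 find_dupes.py /path/to/directory
    if len(sys.argv) > 1:
        for name, value in scan_directory(sys.argv[1]).items():
            print(f"{name}: {value}")
```

For example, it could be pointed at a directory on the Gluster mount such as `/mnt/glusterfs/www/n/naturwissenschaft19deut`. Note that this measures apparent size, not allocated blocks, so its totals will not match `du -h` exactly.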
phil cryer
2010-Oct-28 03:28 UTC
[Gluster-users] Seeing duplicate files, with duplicate names, inode number, etc
Following up on this: I'm copying some of the directories to external drives to transfer them to outside repositories, and the output makes it clear that Gluster has re-downloaded and doubled up files in many directories. The output shows cp copying a file to the external drive, then reaching a second entry with the same name; since that file already exists on the external drive, the copy is rejected. Again, if we run md5sum against the files they match, the inodes are the same, everything. These are not hard links, so what is happening? Here's the output:

---------------------------------------------------------------------------------------
[...]
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_raw_jp2.zip': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_pure_jp2.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_pure_jp2.zip'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_meta.mrc' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_meta.mrc'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_meta.mrc': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_marc.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_marc.xml'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_marc.xml': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_abbyy.gz' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_abbyy.gz'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_abbyy.gz': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_lib_jp2.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_lib_jp2.zip'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich.djvu' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.djvu'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.djvu': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_jp2.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_jp2.zip'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_jp2.zip': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_metasource.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_metasource.xml'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich.gif' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.gif'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.gif': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_flippy.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_flippy.zip'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.xml'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_pure_jp2.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_pure_jp2.zip'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_pure_jp2.zip': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_dc.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_dc.xml'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich.pdf' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.pdf'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_metasource.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_metasource.xml'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_metasource.xml': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.xml'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.xml': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich.pdf' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.pdf'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich.pdf': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_dc.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_dc.xml'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_dc.xml': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_flippy.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_flippy.zip'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_flippy.zip': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.txt' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.txt'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_djvu.txt': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/scandata.zip' -> `/mnt/external/n/naturalistslibra30jardrich/scandata.zip'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/scandata.zip': File exists
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_meta.xml' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_meta.xml'
`/mnt/glusterfs/www/n/naturalistslibra30jardrich/naturalistslibra30jardrich_lib_jp2.zip' -> `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_lib_jp2.zip'
cp: cannot create regular file `/mnt/external/n/naturalistslibra30jardrich/naturalistslibra30jardrich_lib_jp2.zip': File exists
`/mnt/glusterfs/www/n/newconquestofcen00andr' -> `/mnt/external/n/newconquestofcen00andr'
`/mnt/glusterfs/www/n/nederlandschtijd02arnh' -> `/mnt/external/n/nederlandschtijd02arnh'
`/mnt/glusterfs/www/n/newconceptionsin00snyduoft' -> `/mnt/external/n/newconceptionsin00snyduoft'
`/mnt/glusterfs/www/n/nestseggsofnorth00daviuoft' -> `/mnt/external/n/nestseggsofnorth00daviuoft'
`/mnt/glusterfs/www/n/nederlandschtijd02arnh' -> `/mnt/external/n/nederlandschtijd02arnh'
cp: cannot create regular file `/mnt/external/n/nederlandschtijd02arnh': File exists
`/mnt/glusterfs/www/n/naturalsciencemo02lond' -> `/mnt/external/n/naturalsciencemo02lond'
`/mnt/glusterfs/www/n/naturwissenschaf10brau' -> `/mnt/external/n/naturwissenschaf10brau'
`/mnt/glusterfs/www/n/notizenausdemgeb79weim' -> `/mnt/external/n/notizenausdemgeb79weim'
`/mnt/glusterfs/www/n/nomenclatureofco00britsm' -> `/mnt/external/n/nomenclatureofco00britsm'
`/mnt/glusterfs/www/n/noaatechnicalrep649unit' -> `/mnt/external/n/noaatechnicalrep649unit'
[16:25:57] [root at clustr-02 /mnt/external]#
---------------------------------------------------------------------------------------

From this you can pick one of the first directories that had so many issues and see the duplicate files:
http://cluster.biodiversitylibrary.org/n/naturalistslibra30jardrich/

P
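To gauge how widespread the doubling is without reading the whole copy log by eye, the `cp -v` output itself can be tallied. This is a hypothetical Python sketch (the script and function names are illustrative). It assumes the log uses exactly the quoting shown above, that is `` `SRC' -> `DST' `` lines and `` cp: cannot create regular file `DST': File exists `` errors, and that no path contains an apostrophe. It counts destinations that cp tried to write more than once, which would correspond to names appearing twice in the source directory listing, and collects the "File exists" failures separately.

```python
import re
import sys

# Line formats as they appear in the log above (assumed; other cp versions
# may quote paths differently, e.g. with plain apostrophes on both sides).
ATTEMPT = re.compile(r"^`(?P<src>[^']*)' -> `(?P<dst>[^']*)'$")
EXISTS = re.compile(r"^cp: cannot create regular file `(?P<dst>[^']*)': File exists$")


def summarize_cp_log(lines):
    """Return (destinations attempted more than once, 'File exists' paths)."""
    attempts = {}
    collisions = []
    for raw in lines:
        line = raw.strip()
        m = ATTEMPT.match(line)
        if m:
            dst = m.group("dst")
            attempts[dst] = attempts.get(dst, 0) + 1
            continue
        m = EXISTS.match(line)
        if m:
            collisions.append(m.group("dst"))
    attempted_twice = sorted(dst for dst, n in attempts.items() if n > 1)
    return attempted_twice, collisions


if __name__ == "__main__":
    # Hypothetical usage: python3 cp_dupes.py cp-output.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            twice, failed = summarize_cp_log(log)
        print(f"{len(twice)} destinations attempted more than once")
        print(f"{len(failed)} 'File exists' errors")
        for path in twice:
            print(path)
```

The two counts need not match exactly: in a truncated log such as the one above (note the `[...]`), the first attempt for some files falls outside the captured output, so only the error line is seen.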