Dear all,

this is my first message to this mailing list and I also just subscribed to it, so please forgive my inexperience. I hope this is also the correct place to ask this question. I'm not a system administrator, even if I'm asked to act as one (PhD student here). I like doing it, but sometimes I lack the required knowledge. Anyway, here's my problem which, as always, needs to be solved by me as soon as possible.

I installed Gluster 3.3.1 on Ubuntu 12.10 (from the repository) on 4 machines, all connected together via LAN; two of them also have a dedicated Infiniband link between them. On two of them I created a "scratch" volume (distributed, 8 TB total); on the other two I created a "storage" volume (distributed + replicated, 12 TB total, but because of the replica only 6 TB are available to users). All of the machines see both volumes, and for now users have to ssh to one of them to use the volumes (in the future they will be exported: do you suggest NFS or the native gluster protocol as the mount type?).

The distributed and _not_ replicated filesystem seems to work very well (at least for now) and is perfectly accessible from all machines, even though it is built on the two nodes connected by Infiniband. The replicated _and_ distributed filesystem, on the other hand, has some problems: from all nodes, some files are missing when listing a folder with commands like 'ls'. This happened from one day to the next; I'm sure that three days ago it was working perfectly. The configuration didn't change (one machine got rebooted, but even a global reboot didn't fix anything).

I tried a volume rebalance to see if it would do anything (it magically fixed a problem at the very beginning of my gluster adventure), but it never completed: it grew to a rebalance of hundreds of millions of files, but there should not be so many files in this volume; we're talking orders of magnitude fewer. I listed the single bricks and found that the files are still present on them, each one on two bricks (because of the replica), and perfectly readable when accessed directly, so it doesn't seem to be a hardware problem on a particular brick.

As another strategy, I found on the internet that a "find . > /dev/null" launched as root on the root folder of the glusterfs mount should trigger a re-hash of the files, so maybe that could help me. Unfortunately it hangs almost immediately in a folder that, as said, is missing some files when listed from the global filesystem. I tried to read the logs, but nothing strange seems to be happening (by the way, while analysing the logs I found that the rebalance also got stuck in one of these folders and just started counting millions and millions of "nonexistent" files, not present even on the single bricks; I'm sure those folders are not that big, which is why the status reported hundreds of millions of files not requiring rebalance).

Do you have any suggestion? Sorry for the long mail, I hope it's enough to explain my problem.

Thanks a lot in advance for your time and your help.
Best regards to all,

Stefano
Hi Stefano,

I'm not sure what is causing your problem, and the variables in your configuration can lead to various scenarios. In any case, based on what you wrote, I will assume that the replicated-distributed volume is contained on 2 machines only, spanning 4 bricks.

You might have done this already, but if you look at page 15 of the Gluster Administration Guide (I have the guide for 3.3.0, but I think it's the same), you will read in the "Note" that you need to be careful about the order in which you specify the bricks across the 2 servers. In particular, when creating the volume, you need to specify the first brick of each server, then the second of each server, and so on. In your case, I think it should be something like:

    gluster volume create volume_name replica 2 transport tcp server1:/exp1 server2:/exp1 server1:/exp2 server2:/exp2

(replacing the transport type according to your network characteristics, if necessary), so that exp1 on server1 is replicated to exp1 on server2, exp2 on server1 is replicated to exp2 on server2, and all 4 bricks together form a replicated-distributed volume. As I said, you might have followed these steps correctly already, but it's better to double-check.

About the mounting: the manuals claim better performance with the native gluster protocol, but I'm not sure this is the case if you have a huge amount of small files. I experienced big performance problems with a replicated volume that had many small files, and I had to give up entirely on the glusterfs technology for that particular project. To be fair, because of network topology constraints, my two nodes were using a standard 1 Gbit network link shared with other traffic, so the performance issues might be partially caused by the "slow" link; but googling around, I'm not the only one facing this kind of problem.

About your missing files, I have one last (possibly silly) question/suggestion: on the servers themselves, did you remount the glusterfs volume, using either the native or NFS protocol, on another mount point? I have just a replicated volume, but I found that if you don't do that, and above all, if you don't access/create the files through the glusterfs-mounted filesystem but use the original filesystem mount point (in my case an xfs), the files are not replicated at all.

To be clear: you have server1:/brick1 replicated to server2:/brick2 (let's forget about distribution for now). server1:/brick1 is an xfs or ext4 filesystem mounted on a mount point on your server. When you create the volume, you must mount the volume via NFS or the glusterfs protocol on another mount point on the server, for example server1:/glustermountpoint. When you create files, you must do it on server1:/glustermountpoint, and not on server1:/brick1, otherwise the files are not replicated to server2 and are stored on server1 only. I don't think the documentation is very clear on this (I don't think the glusterfs documentation is particularly clear in general), so double-check that as well.

Hope this helps...
Regards,
Davide
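P.S. In case it helps, this is the kind of check I have in mind. The volume name "storage" is taken from your mail, and the mount point below is only an example, so adapt both to your setup:

    # list the bricks in creation order; with "replica 2",
    # Brick1/Brick2, Brick3/Brick4, ... form the replica pairs
    gluster volume info storage

    # mount the volume itself on the server and always read/write through
    # this mount point, never directly inside the brick directories
    mkdir -p /mnt/storage
    mount -t glusterfs server1:/storage /mnt/storage
    # or, via NFS (the built-in Gluster NFS server speaks NFSv3):
    # mount -t nfs -o vers=3 server1:/storage /mnt/storage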
On 31/05/13 at 11:48, Stefano Sinigardi wrote:
> Dear Xavier,
> I realized that the volume was not built properly when doing the first
> analyses suggested by Davide, but I'm sure that this is not the problem,
> and so I quickly dismissed it. Also, we need a replica but not so
> strictly; maybe in the future I'll build the next volume properly.
> Anyway, yes, the volume was born on "pedrillo" with replica-2 and the
> next day was expanded onto "osmino", again with replica-2, just by adding
> bricks and doing a rebalance, which was only "tried". I'm saying "tried"
> because it got "stuck", consuming a lot of RAM (almost all, 16 GB), and
> it was counting millions of files that I think don't even exist on the
> volume, so I stopped it. Do you think it might be worth restarting it?
> The brick logs are there, just before your nice reply asking for them.
> They're all the same, except for the order of the connections; all seem
> very good.

Sorry, I don't know what I saw. Yes, the logs are there... :p

> Yes, you're right, many files are missing through the mountpoint, I just
> chose one. But then again, what do you think about the fact that ls hides
> them, yet calling them from the mount point does not give "file not
> found"? "leggi_particelle" was working when called both from the bricks
> and from the mountpoint, even if ls didn't show it in the mountpoint...

It can happen. When you list a directory, a file may not be shown if it has some problem. However, it can still be reached if you access it directly (I have seen this before).

> And here is all the other info that you were asking for: first of all a
> collection of 'ls' of the folder on all the bricks of the volume (10 in
> total):
>
> /storage/1/data/stefano/leggi_particelle:
> total 20
> drwxr-xr-x 3 stefano user 4096 May 22 18:24 ./
> drwxr-xr-x 14 stefano user 4096 May 28 04:58 ../
> -rwxr-xr-x 2 stefano user 286 Feb 25 17:24 Espec.plt*
> lrwxrwxrwx 2 stefano user 53 Feb 13 11:30 parametri.cpp -> /tamino/stefano/codice/leggi_particelle/parametri.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
>
> /storage/2/data/stefano/leggi_particelle:
> total 20
> drwxr-xr-x 3 stefano user 4096 May 22 18:24 ./
> drwxr-xr-x 14 stefano user 4096 May 28 04:58 ../
> -rwxr-xr-x 2 stefano user 286 Feb 25 17:24 Espec.plt*
> lrwxrwxrwx 2 stefano user 53 Feb 13 11:30 parametri.cpp -> /tamino/stefano/codice/leggi_particelle/parametri.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
>
> /storage/5/data/stefano/leggi_particelle:
> total 892
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> lrwxrwxrwx 2 stefano user 50 Apr 11 19:20 filtro.cpp -> /tamino/stefano/codice/leggi_particelle/filtro.cpp
> lrwxrwxrwx 2 stefano user 70 Apr 11 19:20 leggi_binario_ALaDyn_fortran.h -> /tamino/stefano/codice/leggi_particelle/leggi_binario_ALaDyn_fortran.h
> -rwxr-xr-x 2 stefano user 705045 May 22 18:24 leggi_particelle*
> -rwxr-xr-x 2 stefano user 61883 Dec 16 17:20 leggi_particelle.old01*
> -rwxr-xr-x 2 stefano user 106014 Apr 11 19:20 leggi_particelle.old03*
> ---------T 2 root root 0 May 24 17:16 parametri.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
>
> /storage/6/data/stefano/leggi_particelle:
> total 892
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> lrwxrwxrwx 2 stefano user 50 Apr 11 19:20 filtro.cpp -> /tamino/stefano/codice/leggi_particelle/filtro.cpp
> lrwxrwxrwx 2 stefano user 70 Apr 11 19:20 leggi_binario_ALaDyn_fortran.h -> /tamino/stefano/codice/leggi_particelle/leggi_binario_ALaDyn_fortran.h
> -rwxr-xr-x 2 stefano user 705045 May 22 18:24 leggi_particelle*
> -rwxr-xr-x 2 stefano user 61883 Dec 16 17:20 leggi_particelle.old01*
> -rwxr-xr-x 2 stefano user 106014 Apr 11 19:20 leggi_particelle.old03*
> ---------T 2 root root 0 May 24 17:16 parametri.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
>
> /storage/arc1/data/stefano/leggi_particelle:
> total 144
> drwxr-xr-x 3 stefano user 4096 May 22 18:24 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> lrwxrwxrwx 2 stefano user 53 Feb 22 19:40 binnaggio.cpp -> /tamino/stefano/codice/leggi_particelle/binnaggio.cpp
> -rwxr-xr-x 2 stefano user 350 Feb 25 17:24 Etheta.plt*
> lrwxrwxrwx 2 stefano user 72 Mar 22 2012 leggi_binario_ALaDyn_fortran.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_binario_ALaDyn_fortran.cpp
> lrwxrwxrwx 2 stefano user 55 Mar 22 2012 leggi_campi.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_campi.cpp
> lrwxrwxrwx 2 stefano user 60 Apr 11 19:20 leggi_particelle.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_particelle.cpp
> -rwxr-xr-x 2 stefano user 97536 Mar 25 12:46 leggi_particelle.old02*
> -rwxr-xr-x 2 stefano user 923 May 12 12:42 plot_den.plt*
> lrwxrwxrwx 2 stefano user 54 Mar 22 2012 swap_tools.cpp -> /tamino/stefano/codice/leggi_particelle/swap_tools.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
> -rwxr-xr-x 2 stefano user 309 Feb 25 17:24 xpx.plt*
>
> /storage/arc2/data/stefano/leggi_particelle:
> total 144
> drwxr-xr-x 3 stefano user 4096 May 22 18:24 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> lrwxrwxrwx 2 stefano user 53 Feb 22 19:40 binnaggio.cpp -> /tamino/stefano/codice/leggi_particelle/binnaggio.cpp
> -rwxr-xr-x 2 stefano user 350 Feb 25 17:24 Etheta.plt*
> lrwxrwxrwx 2 stefano user 72 Mar 22 2012 leggi_binario_ALaDyn_fortran.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_binario_ALaDyn_fortran.cpp
> lrwxrwxrwx 2 stefano user 55 Mar 22 2012 leggi_campi.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_campi.cpp
> lrwxrwxrwx 2 stefano user 60 Apr 11 19:20 leggi_particelle.cpp -> /tamino/stefano/codice/leggi_particelle/leggi_particelle.cpp
> -rwxr-xr-x 2 stefano user 97536 Mar 25 12:46 leggi_particelle.old02*
> -rwxr-xr-x 2 stefano user 923 May 12 12:42 plot_den.plt*
> lrwxrwxrwx 2 stefano user 54 Mar 22 2012 swap_tools.cpp -> /tamino/stefano/codice/leggi_particelle/swap_tools.cpp
> drwxr-xr-x 3 stefano user 4096 Apr 11 19:19 test/
> -rwxr-xr-x 2 stefano user 309 Feb 25 17:24 xpx.plt*
>
> /storageOsmino/1/data/stefano/leggi_particelle:
> total 24
> drwxr-xr-x 3 stefano user 4096 May 29 05:00 ./
> drwxr-xr-x 14 stefano user 4096 May 28 04:55 ../
> ---------T 2 root root 0 May 29 05:00 leggi_particelle
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 test/
>
> /storageOsmino/2/data/stefano/leggi_particelle:
> total 24
> drwxr-xr-x 3 stefano user 4096 May 29 05:00 ./
> drwxr-xr-x 14 stefano user 4096 May 28 04:55 ../
> ---------T 2 root root 0 May 29 05:00 leggi_particelle
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 test/
>
> /storageOsmino/4/data/stefano/leggi_particelle:
> total 16
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 test/
>
> /storageOsmino/5/data/stefano/leggi_particelle:
> total 16
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 ./
> drwxr-xr-x 14 stefano user 4096 May 28 11:32 ../
> drwxr-xr-x 3 stefano user 4096 May 24 17:16 test/
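> (For anyone wanting to reproduce this: the extended attributes further
> below can be dumped per brick directory with something along these
> lines — the exact invocation is only a sketch:
>
>     getfattr -d -m . -e hex /storage/1/data/stefano/leggi_particelle
>
> repeated for each of the ten brick paths.)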
> ======================================
> ======================================
> and these are the attributes of the folders on every brick:
>
> # file: storage/1/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000000000000033333332
>
> # file: storage/2/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000000000000033333332
>
> # file: storage/5/data/stefano/leggi_particelle
> trusted.afr.data-client-2=0x000000000000000000000000
> trusted.afr.data-client-3=0x000000000000000000000000
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000003333333366666665
>
> # file: storage/6/data/stefano/leggi_particelle
> trusted.afr.data-client-2=0x000000000000000000000000
> trusted.afr.data-client-3=0x000000000000000000000000
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000003333333366666665
>
> # file: storage/arc1/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x0000000100000000ccccccccffffffff
>
> # file: storage/arc2/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x0000000100000000ccccccccffffffff
>
> # file: storageOsmino/1/data/stefano/leggi_particelle
> trusted.afr.data-client-6=0x000000000000000000000000
> trusted.afr.data-client-7=0x000000000000000000000000
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000006666666699999998
>
> # file: storageOsmino/2/data/stefano/leggi_particelle
> trusted.afr.data-client-6=0x000000000000000000000000
> trusted.afr.data-client-7=0x000000000000000000000000
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x00000001000000006666666699999998
>
> # file: storageOsmino/4/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x000000010000000099999999cccccccb
>
> # file: storageOsmino/5/data/stefano/leggi_particelle
> trusted.gfid=0xb62a16f0bdb94e3f8563ccfb278c2105
> trusted.glusterfs.dht=0x000000010000000099999999cccccccb
>
> ==========================================
> Something is not consistent here, and I can see it by myself...
> I hope to be forgiven for all these details and for spamming the
> mailing list. I've just arrived and I'm already filling it up... Sorry.
> But also thanks a lot for your help!
> Stefano

I don't see anything incorrect here. It's very weird. Have you modified any file manually? Can you look at .glusterfs/b6/2a/b62a16f0-bdb9-4e3f-8563-ccfb278c2105 on each brick? Is it a valid symbolic link pointing to data/stefano/leggi_particelle? Check also .glusterfs/88/3c/883c343b-9366-478d-a660-843da8f6b87c. Also check whether the file 'leggi_particelle' is present on any other brick. Isn't there anything different in the logs of the first and second bricks? I don't see what can be happening...

Xavi
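P.S. For what it's worth, the reason the attributes look fine to me is that the gfid is identical on all ten bricks and the dht ranges of the five replica pairs are contiguous and cover the whole hash space. For the checks above, something like the following should do; the brick root paths are only guessed from your listings, so adjust them if the bricks actually point somewhere else:

    # on the server hosting the /storage/* bricks (pedrillo, I assume)
    for b in /storage/1 /storage/2 /storage/5 /storage/6 /storage/arc1 /storage/arc2; do
        echo "== $b"
        ls -l $b/.glusterfs/b6/2a/b62a16f0-bdb9-4e3f-8563-ccfb278c2105
        ls -l $b/.glusterfs/88/3c/883c343b-9366-478d-a660-843da8f6b87c
    done

    # on the server hosting the /storageOsmino/* bricks (osmino, I assume)
    for b in /storageOsmino/1 /storageOsmino/2 /storageOsmino/4 /storageOsmino/5; do
        echo "== $b"
        ls -l $b/.glusterfs/b6/2a/b62a16f0-bdb9-4e3f-8563-ccfb278c2105
        ls -l $b/.glusterfs/88/3c/883c343b-9366-478d-a660-843da8f6b87c
    done

    # check on which bricks the file 'leggi_particelle' physically exists
    ls -l /storage/*/data/stefano/leggi_particelle/leggi_particelle \
          /storageOsmino/*/data/stefano/leggi_particelle/leggi_particelle 2>/dev/null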