Hello,

we are trying out GlusterFS as the working filesystem for a compute cluster;
the cluster is comprised of 57 compute nodes (55 cores each), acting as
GlusterFS clients, and 25 data server nodes (8 cores each), serving
1 large GlusterFS brick each.

We currently have noticed a couple of issues:

1) When compute jobs run, the `glusterfs` client process on the compute nodes
goes up to 100% CPU, and filesystem operations start to slow down a lot.
Since there are many CPUs available, is it possible to make it use, e.g.,
4 CPUs instead of one to make it more responsive?

2) In addition (but possibly related to 1) we have an issue with files
disappearing and re-appearing: from a compute process we test for the existence
of a file and e.g. `test -e /glusterfs/file.txt` fails. Then we test from
a different process or shell and the file is there. As far as I can see,
the servers are basically idle, and none of the peers is disconnected.

We are running GlusterFS 3.7.17 on Ubuntu 16.04, installed from the Launchpad PPA.
(Details below for the interested.)

Can you give any hint about what's going on?

Thanks,
Riccardo


Installation details:

ubuntu@master001:~$ pdsh -a 'glusterfs --version | fgrep built' | dshbak -c
----------------
data[001-025],master001,worker[001-027,029-045,047,049-050,052-058,060,062-063]
----------------
glusterfs 3.7.17 built on Nov 4 2016 13:39:51
ubuntu@master001:~$ dpkg -S $(which glusterfs)
glusterfs-client: /usr/sbin/glusterfs
ubuntu@master001:~$ apt-cache policy glusterfs-client
glusterfs-client:
  Installed: 3.7.17-ubuntu1~xenial5
  Candidate: 3.7.17-ubuntu1~xenial5
  Version table:
 *** 3.7.17-ubuntu1~xenial5 500
        500 http://ppa.launchpad.net/gluster/glusterfs-3.7/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     3.7.6-1ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

--
Riccardo Murri, Anna-Heer-Strasse 10, CH-8057 Zürich, Switzerland

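One way to pin down issue 2 is to probe the path repeatedly from inside a compute job and timestamp every miss, so that misses can later be compared with the client-side log for the same moments. The following is only a minimal sketch: `probe_file` is a hypothetical helper name, not part of GlusterFS, and the count and interval defaults are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): probe a path repeatedly and
# report each miss with a UTC timestamp, host name, and shell PID.
# Usage: probe_file PATH [COUNT] [INTERVAL_SECONDS]
probe_file() {
    path=$1
    count=${2:-10}      # number of probes (arbitrary default)
    interval=${3:-1}    # seconds between probes (arbitrary default)
    misses=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! test -e "$path"; then
            misses=$((misses + 1))
            # Diagnostics go to stderr so stdout carries only the final tally.
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(uname -n) pid=$$ MISSING $path" >&2
        fi
        i=$((i + 1))
        # Sleep between probes, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    # Print the number of probes that did not see the file.
    echo "$misses"
}
```

For example, `probe_file /glusterfs/file.txt 60 1` run from a job script and, at the same time, from an interactive shell on the same node would show whether the two processes disagree about the file during the same seconds.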
Mohammed Rafi K C
2016-Dec-19 14:23 UTC
[Gluster-users] files disappearing and re-appearing
Hi Riccardo,

I'm sorry that your issues were not discussed here on the mailing list in
time. Sometimes community members are busy with other issues. I just
learned about this problem from a different thread I'm part of, in which
you mentioned this case.

If the problem is still bugging you, or if you have any earlier logs that
you can share with me, that will help to analyze it further.

Some inline questions below.

On 11/17/2016 07:22 PM, Riccardo Murri wrote:
> Hello,
>
> we are trying out GlusterFS as the working filesystem for a compute cluster;
> the cluster is comprised of 57 compute nodes (55 cores each), acting as
> GlusterFS clients, and 25 data server nodes (8 cores each), serving
> 1 large GlusterFS brick each.
>
> We currently have noticed a couple of issues:
>
> 1) When compute jobs run, the `glusterfs` client process on the compute nodes
> goes up to 100% CPU, and filesystem operations start to slow down a lot.
> Since there are many CPUs available, is it possible to make it use, e.g.,
> 4 CPUs instead of one to make it more responsive?

Can you briefly describe your compute jobs and workloads, so we can see
which operations are happening on the cluster?

>
> 2) In addition (but possibly related to 1) we have an issue with files
> disappearing and re-appearing: from a compute process we test for the existence
> of a file and e.g. `test -e /glusterfs/file.txt` fails. Then we test from
> a different process or shell and the file is there. As far as I can see,
> the servers are basically idle, and none of the peers is disconnected.
>
> We are running GlusterFS 3.7.17 on Ubuntu 16.04, installed from the Launchpad PPA.
> (Details below for the interested.)
>
> Can you give any hint about what's going on?

Is there any rebalance in progress? Tell me more about any ongoing
operations (internal operations like rebalance, shd, etc., or client
operations). Some insight into your volume configuration would also help:
volume info and volume status.

Regards
Rafi KC

>
> Thanks,
> Riccardo
>
>
> Installation details:
>
> ubuntu@master001:~$ pdsh -a 'glusterfs --version | fgrep built' | dshbak -c
> ----------------
> data[001-025],master001,worker[001-027,029-045,047,049-050,052-058,060,062-063]
> ----------------
> glusterfs 3.7.17 built on Nov 4 2016 13:39:51
> ubuntu@master001:~$ dpkg -S $(which glusterfs)
> glusterfs-client: /usr/sbin/glusterfs
> ubuntu@master001:~$ apt-cache policy glusterfs-client
> glusterfs-client:
>   Installed: 3.7.17-ubuntu1~xenial5
>   Candidate: 3.7.17-ubuntu1~xenial5
>   Version table:
>  *** 3.7.17-ubuntu1~xenial5 500
>         500 http://ppa.launchpad.net/gluster/glusterfs-3.7/ubuntu xenial/main amd64 Packages
>         100 /var/lib/dpkg/status
>      3.7.6-1ubuntu1 500
>         500 http://nova.clouds.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
>         500 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
>
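
The information requested in the reply (volume info, volume status, and the state of rebalance and self-heal) could be gathered into one file along these lines. This is a rough sketch, not a procedure taken from the thread: the volume name is never stated, so it is passed in as an argument, `report_commands` and `collect_report` are hypothetical helper names, and the heal query is only meaningful if the volume is replicated, which the thread does not say.

```shell
#!/bin/sh
# Print the diagnostic commands for one volume, one per line.
# The volume name is an assumption: the thread never states it.
report_commands() {
    vol=$1
    printf '%s\n' \
        "gluster volume info $vol" \
        "gluster volume status $vol" \
        "gluster volume rebalance $vol status" \
        "gluster volume heal $vol info"
}

# Run each command on a server node and write all output to a report file.
# Usage: collect_report VOLNAME [OUTPUT_FILE]
collect_report() {
    vol=$1
    out=${2:-gluster-report.txt}
    report_commands "$vol" | while IFS= read -r cmd; do
        echo "### $cmd"
        # Word splitting of $cmd is intentional. stdin is redirected so the
        # command cannot consume the remaining lines of the command list.
        # A failing command is recorded instead of aborting the report.
        $cmd 2>&1 </dev/null || echo "(command failed with status $?)"
    done > "$out"
}
```

Running `collect_report VOLNAME` as root on one of the data servers, with the real volume name substituted, would produce a single file to attach to a reply. Client-side logs from an affected compute node would need to be collected separately; on a typical installation they are found under /var/log/glusterfs/, though that location is an assumption about this setup.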