ZHANG Cheng
2012-Nov-26 09:46 UTC
[Gluster-users] Self healing in 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)
Early this morning our 2-brick replicated cluster had an outage. One of the brick servers (brick02) ran out of disk space, and by the time we responded to the disk-full alert the issue had already lasted a few hours. We reclaimed some disk space and rebooted the brick02 server, expecting it to self-heal once it came back.

It did start self-healing, but after just a couple of minutes access to the gluster filesystem froze. Tons of "nfs: server brick not responding, still trying" messages popped up in dmesg. The load average on the app servers went up to around 200 from the usual 0.10. We had to shut down the brick02 server, or stop the gluster server process on it, to get the cluster working again.

How should we deal with this issue? Thanks in advance.

Our gluster setup follows the official doc.

gluster> volume info

Volume Name: staticvol
Type: Replicate
Volume ID: fdcbf635-5faf-45d6-ab4e-be97c74d7715
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: brick01:/exports/static
Brick2: brick02:/exports/static

The underlying filesystem is xfs (on an LVM volume):
/dev/mapper/vg_node-brick on /exports/static type xfs (rw,noatime,nodiratime,nobarrier,logbufs=8)

The brick servers do not act as gluster clients. Our app servers are the gluster clients, mounting via NFS:
brick:/staticvol on /mnt/gfs-static type nfs (rw,noatime,nodiratime,vers=3,rsize=8192,wsize=8192,addr=10.10.10.51)

brick is a DNS round-robin record for brick01 and brick02.
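(For completeness, roughly how a volume and mount like this are set up. Hostnames, paths and options are the ones listed above, but this is a sketch rather than the exact commands we ran.)

    # On brick01, after peering with the other server:
    gluster peer probe brick02
    gluster volume create staticvol replica 2 transport tcp \
        brick01:/exports/static brick02:/exports/static
    gluster volume start staticvol

    # On each app server, mounting over Gluster's built-in NFSv3 server;
    # "brick" is the DNS round-robin record for brick01/brick02:
    mount -t nfs -o vers=3,noatime,nodiratime,rsize=8192,wsize=8192 \
        brick:/staticvol /mnt/gfs-static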
ZHANG Cheng
2012-Nov-29 05:24 UTC
[Gluster-users] Self healing in 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)
I dug up a gluster-users mailing list thread from June 2011 at http://gluster.org/pipermail/gluster-users/2011-June/008111.html. In that post, Marco Agostini wrote:

    Craig Carl told me, three days ago:
    "That happens because Gluster's self heal is a blocking operation. We are
    working on a non-blocking self heal; we are hoping to ship it in early
    September."

It looks like even with the release of 3.3.1, self heal is still a blocking operation. I am wondering why the official Administration Guide doesn't warn readers about something this important for production operation.

On Mon, Nov 26, 2012 at 5:46 PM, ZHANG Cheng <czhang.oss at gmail.com> wrote:
> Early this morning our 2-brick replicated cluster had an outage. One of
> the brick servers (brick02) ran out of disk space.
> [...]
Bryan Whitehead
2012-Nov-29 06:48 UTC
[Gluster-users] Self healing in 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)
When you mount xfs, also use the inode64 option; that will help xfs performance.

My offhand guess is that you are running into limited network bandwidth while the two bricks sync. As the network gets flooded, NFS response gets poor. Make sure you are getting full-duplex connections, or upgrade your network to 10G or (even better) InfiniBand.

On Mon, Nov 26, 2012 at 1:46 AM, ZHANG Cheng <czhang.oss at gmail.com> wrote:
> Early this morning our 2-brick replicated cluster had an outage. One of
> the brick servers (brick02) ran out of disk space.
> [...]
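A rough sketch of both suggestions, using the device, mount point and options from the original post (the NIC name eth0 is only an example):

    # On each brick server: add inode64 to the brick filesystem. On older
    # kernels inode64 cannot be enabled via remount, so unmount/mount:
    umount /exports/static
    mount -o noatime,nodiratime,nobarrier,logbufs=8,inode64 \
        /dev/mapper/vg_node-brick /exports/static

    # Check that the replication links are running full duplex:
    ethtool eth0 | grep -E 'Speed|Duplex'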
Jeff Darcy
2012-Nov-29 10:58 UTC
[Gluster-users] Self healing in 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)
On 11/26/12 4:46 AM, ZHANG Cheng wrote:
> It did start self-healing, but after just a couple of minutes access to
> the gluster filesystem froze. Tons of "nfs: server brick not responding,
> still trying" messages popped up in dmesg.
> [...]

Have you checked the glustershd logs (should be in /var/log/glusterfs) on the bricks? If there's nothing useful there, a statedump would also be useful; see the "gluster volume statedump" instructions in your friendly local admin guide (section 10.4 for GlusterFS 3.3).

Most helpful of all would be a bug report with any of this information plus a description of your configuration. You can either create a new one or attach the info to an existing bug if one seems to fit. The following seems like it might be related, even though it's for virtual machines:

https://bugzilla.redhat.com/show_bug.cgi?id=881685
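A minimal sketch of those two checks, using the volume name from the original post (where the statedump files land depends on the server.statedump-path setting; see section 10.4 of the 3.3 admin guide):

    # On each brick server: the self-heal daemon log.
    less /var/log/glusterfs/glustershd.log

    # Take a statedump of the brick processes for the volume.
    gluster volume statedump staticvol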
ZHANG Cheng
2013-Jan-09 08:43 UTC
[Gluster-users] Self healing in 3.3.0 causes our 2-brick replicated cluster to freeze (client read/write timeout)
We had a planned outage yesterday which required us to shut down one of the replicated brick servers (brick02) for 30 minutes. The maintenance went smoothly, but a couple of minutes after I brought brick02 back online, our app servers' load rose to a very high number (200~300+) and they ground to a halt as a REST API backend. This is the same problem described in my previous post. If I shut down brick02 and restart the jboss-as instance that powers our REST API, the app servers' load stays at a normal level. So it looks like our app servers' file access pattern leads to file operations freezing while the gluster servers are self-healing.

Our app is a REST API backend for a mobile forum/community, so the main content is threads and posts containing pictures, shown Pinterest-style in our iOS app. For each picture URL in the JSON response, our API server's Java code does a check like this:

    public static boolean checkImage(String path) {
        File file = new File(path);
        if (null != file && file.exists() && file.length() > 0) {
            return true;
        }
        return false;
    }

Usually each such response contains about 10 to 20 pictures, which means checkImage() is called that many times per request. Because most requests ask for recently uploaded pictures, these picture files are almost certainly the kind of files requiring self-heal. Even during off-peak hours we get 0-3 such thread/post API requests per second, so sooner or later we run into the same freezing problem whenever the glusterfs servers are self-healing.

I think now I have more concrete info to file a bug report.

On Mon, Nov 26, 2012 at 5:46 PM, ZHANG Cheng <czhang.oss at gmail.com> wrote:
> Early this morning our 2-brick replicated cluster had an outage. One of
> the brick servers (brick02) ran out of disk space.
> [...]
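One way to see this from the outside while brick02 is healing, as a rough sketch (the picture path below is a placeholder for one of the recently uploaded files checkImage() would stat):

    # From an app server: time the same metadata operations checkImage() does
    # (existence + size) against a recently uploaded picture.
    time stat /mnt/gfs-static/recent/picture.jpg    # placeholder path

    # On a brick server: confirm that file is still in the heal backlog.
    gluster volume heal staticvol info | head -50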