Adrian Gruntkowski
2015-Oct-21 12:25 UTC
[Gluster-users] Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)
Hello,

I'm trying to track down a problem with my setup (version 3.7.3 on Debian stable).

I have a couple of volumes set up in a 3-node configuration, with 1 brick acting as an arbiter for each. There are 4 volumes set up in a cross-over across 3 physical servers, like this:

   --------------------->[ GigabitEthernet switch ]<---------------------
   |                                  ^                                  |
   |                                  |                                  |
   V                                  V                                  V
/---------------------------\ /---------------------------\ /---------------------------\
| web-rep                   | | cluster-rep               | | mail-rep                  |
|                           | |                           | |                           |
| vols:                     | | vols:                     | | vols:                     |
|  system_www1              | |  system_www1              | |  system_www1 (arbiter)    |
|  data_www1                | |  data_www1                | |  data_www1 (arbiter)      |
|  system_mail1 (arbiter)   | |  system_mail1             | |  system_mail1             |
|  data_mail1 (arbiter)     | |  data_mail1               | |  data_mail1               |
\---------------------------/ \---------------------------/ \---------------------------/

Now, after a fresh boot-up, everything seems to be running fine. Then I start copying big files (KVM disk images) from local disk to the gluster mounts. In the beginning the transfer runs fine (although iowait goes so high that it clogs up IO operations at some moments, but that's an issue for later). After some time the transfer freezes; then, after some (long) time, it advances in a short burst only to freeze again. Another interesting thing is that I see a constant flow of network traffic on the interfaces dedicated to gluster, even during a "freeze".

I ran "gluster volume statedump" during such a transfer (the file is copied from local disk on cluster-rep onto a local mount of the "system_www1" volume). I've observed the following section in the dump for the cluster-rep node:

[xlator.features.locks.system_www1-locks.inode]
path=/images/101/vm-101-disk-1.qcow2
mandatory=0
inodelk-count=12
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=c4fd2d78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45

There seem to be multiple locks in BLOCKED state, which doesn't look normal to me. The other 2 nodes have only 2 ACTIVE locks at the same time.

Below is the "gluster volume info" output.

# gluster volume info

Volume Name: data_mail1
Type: Replicate
Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Brick3: web-rep:/GFS/data/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: data_www1
Type: Replicate
Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Brick3: mail-rep:/GFS/data/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: fixed
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_mail1
Type: Replicate
Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Brick3: web-rep:/GFS/system/mail1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

Volume Name: system_www1
Type: Replicate
Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Brick3: mail-rep:/GFS/system/www1
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: none
cluster.quorum-count: 2
cluster.server-quorum-ratio: 51%

The issue does not occur when I get rid of the 3rd arbiter brick.

If there's any additional information missing that I could provide, please let me know.

Greetings,
Adrian
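P.S. In case it helps anyone reproduce the capture: the dump above was taken roughly as follows (a sketch; it assumes the default statedump directory /var/run/gluster and the brick naming from this setup, both of which can differ per installation):

# Ask every brick process of the volume to dump its in-memory state
# (locks, open fds, memory pools) into the statedump directory:
gluster volume statedump system_www1

# Dump files are named after the brick path, the brick PID and a
# timestamp, e.g. GFS-system-www1.<pid>.dump.<timestamp>:
ls -lt /var/run/gluster/

# Quick tally of granted vs. waiting inode locks in the dumps:
grep -c '(ACTIVE)' /var/run/gluster/GFS-system-www1.*.dump.*
grep -c '(BLOCKED)' /var/run/gluster/GFS-system-www1.*.dump.*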
Ravishankar N
2015-Oct-23 04:40 UTC
[Gluster-users] Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)
On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:
> Hello,
>
> I'm trying to track down a problem with my setup (version 3.7.3 on
> Debian stable).
>
> [...]
>
> I ran "gluster volume statedump" during such a transfer (the file is
> copied from local disk on cluster-rep onto a local mount of the
> "system_www1" volume). I've observed the following section in the
> dump for the cluster-rep node:
>
> [xlator.features.locks.system_www1-locks.inode]
> path=/images/101/vm-101-disk-1.qcow2
> mandatory=0
> inodelk-count=12
> lock-dump.domain.domain=system_www1-replicate-0:self-heal
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
> lock-dump.domain.domain=system_www1-replicate-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
> inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
>
> [inodelk[2] through inodelk[10]: nine full-file (start=0, len=0)
> WRITE locks from client 0x7fbe100e1380, all BLOCKED at
> 2015-10-21 11:37:45 - trimmed]
>
> There seem to be multiple locks in BLOCKED state, which doesn't look
> normal to me. The other 2 nodes have only 2 ACTIVE locks at the same
> time.

From the statedump, it looks like the self-heal daemon has taken locks to heal the file, due to which the locks attempted by the client (mount) are in BLOCKED state. In arbiter volumes, the client (mount) takes full locks (start=0, len=0) for every write(), as opposed to normal replica volumes, which take range locks (i.e. appropriate start,len values) for that write(). This is done to avoid network split-brains. So in normal replica volumes, clients can still write to a file while a heal is going on, as long as the offsets don't overlap. This is not the case with arbiter volumes.
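If that is what is happening here, it should be visible while the copy is frozen. A quick check could look like this (a sketch, using the volume name from this thread):

# Entries the self-heal daemon considers in need of heal; the qcow2
# image showing up here during a "freeze" would fit the explanation
# above (self-heal holding the full lock, client writes waiting):
gluster volume heal system_www1 info

# Just the per-brick counts, handy for watching in a loop:
gluster volume heal system_www1 statistics heal-count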
You can look at the client or glustershd logs to see if there are messages that indicate healing of a file, something along the lines of "Completed data selfheal on xxx".
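Concretely, that check could look like the following (a sketch assuming default log locations; the exact message text varies between releases):

# Self-heal daemon log, on each server:
grep -i 'completed data selfheal' /var/log/glusterfs/glustershd.log | tail

# FUSE client log for a volume mounted at /mnt/www1 (a hypothetical
# mount point - the client log file is named after the mount path):
grep -i 'selfheal' /var/log/glusterfs/mnt-www1.log | tail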
> [volume info for data_mail1, data_www1, system_mail1 and
> system_www1 trimmed; see the original message above]
>
> The issue does not occur when I get rid of the 3rd arbiter brick.

What do you mean by "getting rid of"? Killing the 3rd brick process of the volume?

Regards,
Ravi
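P.S. By "killing the 3rd brick process" I mean something like the sketch below (hostnames and brick paths taken from this thread; the pgrep pattern is only an illustration):

# On the arbiter node for system_www1 (mail-rep in this setup),
# confirm which PID serves the arbiter brick:
gluster volume status system_www1

# Then stop just that brick process, leaving the two data bricks up
# (the brick path appears on the glusterfsd command line):
kill $(pgrep -f 'glusterfsd.*system.www1')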