Rumen Telbizov
2015-Feb-10 23:10 UTC
[Gluster-users] glusterd 100% cpu upon volume status inode
Hello everyone,

I am new to GlusterFS and I am in the process of evaluating it as a possible alternative to some other options. While playing with it I came across this problem. Please point me in the right direction if there's something wrong that I might be doing.

When I run *volume status myvolume inode* the glusterd process hits *100% cpu utilization* and no further commands work. If I restart the glusterd process the problem is "resolved" until I run the same command again. Here's some debug output:

# time gluster volume status myvolume inode
real    2m0.095s

...
[2015-02-10 22:49:38.662545] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2015-02-10 22:49:41.663081] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24) [0x7fb21d6d2e24] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb21d6d990e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x202) [0x7fb21d6d95f2]))) 0-dict: data is NULL
[2015-02-10 22:49:41.663101] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(+0x4e24) [0x7fb21d6d2e24] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb21d6d990e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.5.3/rpc-transport/socket.so(client_fill_address_family+0x20d) [0x7fb21d6d95fd]))) 0-dict: data is NULL
[2015-02-10 22:49:41.663107] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[... the same W/W/E triplet repeats at 22:49:44 and 22:49:47 ...]
[2015-02-10 22:49:47.728428] I [input.c:36:cli_batch] 0-: *Exiting with: 110*

# time gluster volume status
Another transaction is in progress. Please try again after sometime.
real    0m10.223s

[2015-02-10 22:50:29.937290] E [glusterd-utils.c:153:glusterd_lock] 0-management: Unable to get lock for uuid: c7d1e1ea-c5a5-4bcf-802c-aa04dd2e55ba, lock held by: c7d1e1ea-c5a5-4bcf-802c-aa04dd2e55ba
[2015-02-10 22:50:29.937316] E [glusterd-syncop.c:1221:gd_sync_task_begin] 0-management: Unable to acquire lock
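Restarting glusterd, as mentioned above, is what clears the stuck state. A minimal recovery sketch; the service names are assumptions for a Debian-style install, inferred from the /usr/lib/x86_64-linux-gnu paths in the logs:

# Recover a wedged glusterd by restarting it, per the workaround above.
# 'glusterfs-server' is the assumed Debian/Ubuntu init script name;
# adjust for your distribution.
/etc/init.d/glusterfs-server restart
# or, on systemd hosts (assumed unit name):
# systemctl restart glusterd

# The CLI should respond again, until 'volume status ... inode' is run once more:
gluster volume status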
The volume contains the extracted Linux kernel source, so lots of small files (48425). Here's the configuration:

# gluster volume status
Status of volume: myvolume
Gluster process                                          Port   Online  Pid
------------------------------------------------------------------------------
Brick 10.12.10.7:/var/lib/glusterfs_disks/disk01/brick   49152  Y       3321
Brick 10.12.10.8:/var/lib/glusterfs_disks/disk01/brick   49152  Y       3380
Brick 10.12.10.9:/var/lib/glusterfs_disks/disk01/brick   49152  Y       3359
Brick 10.12.10.7:/var/lib/glusterfs_disks/disk02/brick   49154  Y       18687
Brick 10.12.10.8:/var/lib/glusterfs_disks/disk02/brick   49156  Y       32699
Brick 10.12.10.9:/var/lib/glusterfs_disks/disk02/brick   49154  Y       17932
Self-heal Daemon on localhost                            N/A    Y       25005
Self-heal Daemon on 10.12.10.9                           N/A    Y       17952
Self-heal Daemon on 10.12.10.8                           N/A    Y       32724

Task Status of Volume myvolume
------------------------------------------------------------------------------
Task   : Rebalance
ID     : eec4f2c1-85f5-400d-ac42-6da63ec7434f
Status : completed

# gluster volume info

Volume Name: myvolume
Type: Distributed-Replicate
Volume ID: e513a56f-049f-4c8e-bc75-4fb789e06c37
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.12.10.7:/var/lib/glusterfs_disks/disk01/brick
Brick2: 10.12.10.8:/var/lib/glusterfs_disks/disk01/brick
Brick3: 10.12.10.9:/var/lib/glusterfs_disks/disk01/brick
Brick4: 10.12.10.7:/var/lib/glusterfs_disks/disk02/brick
Brick5: 10.12.10.8:/var/lib/glusterfs_disks/disk02/brick
Brick6: 10.12.10.9:/var/lib/glusterfs_disks/disk02/brick
Options Reconfigured:
nfs.disable: on
network.ping-timeout: 10

I run:
# glusterd -V
glusterfs 3.5.3 built on Nov 17 2014 15:48:52
Repository revision: git://git.gluster.com/glusterfs.git
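For reference, a volume with the layout shown above could be created along these lines. This is a sketch reconstructed from the volume info; the actual creation command does not appear in the thread:

# Sketch: recreate the 2 x 3 distributed-replicate layout above.
# Consecutive groups of 3 bricks form the replica sets.
gluster volume create myvolume replica 3 \
  10.12.10.7:/var/lib/glusterfs_disks/disk01/brick \
  10.12.10.8:/var/lib/glusterfs_disks/disk01/brick \
  10.12.10.9:/var/lib/glusterfs_disks/disk01/brick \
  10.12.10.7:/var/lib/glusterfs_disks/disk02/brick \
  10.12.10.8:/var/lib/glusterfs_disks/disk02/brick \
  10.12.10.9:/var/lib/glusterfs_disks/disk02/brick
# (append 'force' if gluster complains about bricks on the root filesystem)
gluster volume set myvolume nfs.disable on
gluster volume set myvolume network.ping-timeout 10
gluster volume start myvolume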
Thank you for your time.

Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
Kaushal M
2015-Feb-11 05:27 UTC
[Gluster-users] glusterd 100% cpu upon volume status inode
There is nothing wrong with your setup. This is a known issue (at least to me). The problem lies in how GlusterD collects and collates the information about the open inodes on a volume, which isn't very efficient at the moment.

The collection and collation process involves several small memory allocations for each inode open on the bricks (at least 2, and quite possibly more). This doesn't scale well when there are lots of files, and it is CPU- and memory-intensive. In your case, with a 3-way replicated volume, you'd have at least 3x as many inodes as files (~150000), which means GlusterD needs to do at least 300k small memory allocations. That takes a lot of time, CPU and memory to complete. The process will eventually finish, provided you have enough memory available. But since the gluster CLI only waits 2 minutes for a reply, you never get to see the output, as you've experienced; GlusterD itself carries on and finishes the requested operation.
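A rough back-of-the-envelope with the numbers above (48425 files is from the original report; 2 allocations per open inode is the lower bound stated here):

# Lower bound on the small allocations glusterd performs for
# 'volume status myvolume inode' on this volume (a sketch, not exact):
FILES=48425         # file count from the original report
REPLICAS=3          # 3-way replication: an inode per file on each of 3 bricks
ALLOCS_PER_INODE=2  # stated lower bound per open inode
echo $((FILES * REPLICAS * ALLOCS_PER_INODE))   # => 290550, i.e. ~300k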
Also, other CLI commands will fail until the running operation finishes. GlusterD acquires a transaction lock when it begins an operation and releases it only once the operation completes. Because GlusterD keeps going after the CLI times out, newer commands cannot obtain the lock, which is exactly the "Another transaction is in progress" error you saw.
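A hypothetical session illustrating the lock contention (volume name taken from the thread):

# The inode status query holds the transaction lock for as long as
# glusterd keeps crunching:
gluster volume status myvolume inode &
sleep 5
# Any command issued meanwhile fails to acquire the lock:
gluster volume status
# => "Another transaction is in progress. Please try again after sometime."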
~kaushal