Jonathan Lefman
2012-Nov-02 00:03 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Hi all,

I am having problems with painfully slow directory listings on a freshly created replicated volume. The configuration is as follows: 2 nodes with 3 replicated drives each. The total volume capacity is 5.6T. We would like to expand the storage capacity much more, but first we need to figure this problem out.

Soon after loading up about 100 MB of small files (about 300kb each), the drive usage is at 1.1T. I am not sure if this is to be expected. The main problem is that directory listing (ls or find) takes a very long time. CPU usage on the nodes is high for each of the glusterfsd processes (3 on each machine): 54%, 43%, and 25% per core is an example of the usage. Memory usage is very low for each process. It is incredibly difficult to diagnose this issue. We have wiped previous gluster installs, all directories, and mount points, and reformatted the disks. Each drive is formatted with ext4.

Has anyone had a similar result? Any ideas on how to debug this one?

Thank you,
Jon
Jonathan Lefman
2012-Nov-02 00:41 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
I am attaching some extra information to help diagnose this issue. Hopefully this is useful.

Volume Name: my_gluster_data
Type: Distributed-Replicate
Volume ID: e8865f37-6e22-476d-956e-29280bb07e75
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: server1:/media/data1/my_gluster_data
Brick2: server2:/media/data1/my_gluster_data
Brick3: server1:/media/data2/my_gluster_data
Brick4: server2:/media/data2/my_gluster_data
Brick5: server1:/media/data3/my_gluster_data
Brick6: hserver2:/media/data3/my_gluster_data
Jules Wang
2012-Nov-02 01:24 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Hi Jon:

GlusterFS is not designed for handling a large count of small files. Because it has no metadata server, every lookup operation costs a lot in your situation.

The disk usage is abnormal. Do your disks hold only gluster bricks?

Best Regards,
Jules Wang
Jonathan Lefman
2012-Nov-02 01:31 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Thanks for the response. Yes, only gluster bricks on the disk. I know about the large count of small files issue, but we changed how we organized the files. Each directory has about 30 files in it. Am I missing something?
Pranith Kumar Karampuri
2012-Nov-02 02:52 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Jonathan,

Could you give us the directory structure details so that we can try to re-create the issue? I am assuming each file is about 300kb. Please give us the depth of the directory structure and the number of directories at each level.

Thanks,
Pranith
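One possible way to collect the numbers asked for above (depth of the tree and directory counts per level) is a short script run against the mounted volume or a single brick. This is only a sketch: the function names `summarize_tree` and `report` are made up for illustration, and walking a large tree through the FUSE mount will itself be slow if listings are slow.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Count directories and files per depth below root (root itself is depth 0).

    Returns two Counters keyed by depth:
      - number of directories located at that depth
      - number of files directly inside directories at that depth
    """
    dirs_per_depth = Counter()
    files_per_depth = Counter()
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        dirs_per_depth[depth] += 1
        files_per_depth[depth] += len(filenames)
    return dirs_per_depth, files_per_depth


def report(root):
    """Print one line per depth level."""
    dirs, files = summarize_tree(root)
    for depth in sorted(dirs):
        print(f"depth {depth}: {dirs[depth]} directories, {files[depth]} files")
```

Calling `report()` on the top-level data directory (the path depends on where the volume is mounted, which the thread does not state) gives a per-level summary that can be pasted into a reply.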
Brian Candler
2012-Nov-02 08:44 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
On Thu, Nov 01, 2012 at 08:03:21PM -0400, Jonathan Lefman wrote:
> Soon after loading up about 100 MB of small files (about 300kb each),
> the drive usage is at 1.1T.

That is very odd. What do you get if you run du and df on the individual bricks themselves? 100MB is only ~330 files of 300KB each.

Did you specify any special options to mkfs.ext4? Maybe -I 512 would help, as the xattrs are more likely to sit within the inodes themselves.

If you start everything from scratch, it would be interesting to see df stats when the filesystem is empty. It may be that a huge amount of space has been allocated to inodes. If you expect most of your files to be larger than 16KB then you could add -i 16384 to mkfs.ext4 to reduce the space reserved for inodes.

But using xfs would be better, as it doesn't reserve any space for inodes; it allocates it dynamically.

Ignore the comment that glusterfs is not designed for handling large counts of small files: 300KB is not small.

Regards,
Brian.
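The arithmetic behind these suggestions can be checked with a rough sketch. It assumes that mkfs.ext4 creates about one inode per `-i` bytes of capacity, each `-I` bytes in size; real filesystems round per block group, so treat the results as estimates. The 2 TiB brick size is an assumption for illustration (the thread only gives the 5.6T total), and the function names are hypothetical.

```python
def file_count(total_bytes, file_bytes):
    """How many files of file_bytes fit in total_bytes (whole files only)."""
    return total_bytes // file_bytes


def inode_table_bytes(fs_bytes, bytes_per_inode, inode_size):
    """Approximate space taken by inode tables.

    bytes_per_inode corresponds to mkfs.ext4 -i, inode_size to mkfs.ext4 -I.
    """
    return (fs_bytes // bytes_per_inode) * inode_size


if __name__ == "__main__":
    GIB = 2**30
    brick = 2 * 2**40  # assumed brick size of 2 TiB, not stated in the thread

    # 100 MB of 300 KB files, using decimal units.
    print("files:", file_count(100 * 10**6, 300 * 10**3))

    # Compare a few illustrative (-i, -I) combinations.
    for bytes_per_inode, inode_size in [(16384, 256), (16384, 512), (4096, 256)]:
        used = inode_table_bytes(brick, bytes_per_inode, inode_size)
        print(f"-i {bytes_per_inode} -I {inode_size}: about {used / GIB:.0f} GiB of inode tables")
```

On these assumed numbers the inode tables come to tens of GiB per brick, which is why comparing du and df on an empty brick is a useful first check before blaming inode reservation for the whole 1.1T.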
Jonathan Lefman
2012-Nov-05 18:10 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Thanks Brian. I tried what you recommended. At first I was very encouraged when I saw things moving across the wire, but about 15 minutes into the transfer things ground to a halt. I am currently running across a GigE channel. Things were moving at about 20-40 MB/s, but since the stall the transfer rate has been down in single-digit kb/s. Even doing a top-level directory listing takes quite a while. Not in a happy state.

On Mon, Nov 5, 2012 at 10:10 AM, Brian Candler <B.Candler at pobox.com> wrote:
> If your disks are >1TB with XFS then try mount -o inode64
>
> This has the effect of sequential writes into the same directory being
> localised next to each other (within the same allocation group). When you
> skip to the next directory you will probably get a different allocation
> group.
>
> Without this, the behaviour is to (a) stick all the inodes in the first
> allocation group, and (b) to stick every file into a random allocation
> group, regardless of the parent directory.
Brian Candler
2012-Nov-05 18:26 UTC
[Gluster-users] Very slow directory listing and high CPU usage on replicated volume
Hmm. Maybe an strace on the client side (glusterfs FUSE process) and server side (glusterfsd) might give some clues?
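As a sketch of how such a trace might be collected and read: strace's `-f` (follow threads), `-tt` (timestamps), `-T` (time spent in each call), `-p` and `-o` options are standard, but the helper names and the output path below are invented for illustration. The parser only handles completed syscall lines and skips "unfinished"/"resumed" fragments, so totals are approximate.

```python
import re
from collections import defaultdict

# Matches a completed syscall line from `strace -f -tt -T -o FILE`, for example:
#   4321 10:15:02.000100 lstat("/a", {...}) = 0 <0.000200>
# The leading pid and timestamp are optional so plain `strace -T` output also works.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"        # optional pid prefix
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"  # optional -t/-tt timestamp
    r"(\w+)\(.*<(\d+\.\d+)>\s*$"            # syscall name ... <seconds>
)


def strace_command(pid, out_path):
    """Build an strace command line that attaches to a running process."""
    return ["strace", "-f", "-tt", "-T", "-p", str(pid), "-o", out_path]


def summarize(lines):
    """Return (syscall, calls, total_seconds) tuples, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])  # syscall -> [calls, seconds]
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skips "<... resumed>" fragments, signals, exit lines
        name, seconds = match.group(1), float(match.group(2))
        totals[name][0] += 1
        totals[name][1] += seconds
    rows = [(name, calls, secs) for name, (calls, secs) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Running the command from `strace_command()` against one glusterfsd pid while an `ls` is stalled, then feeding the output file's lines to `summarize()`, should show which system calls the brick process spends its time in.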