thr3ads.net - Gluster users - [Gluster-users] Very slow directory listing and high CPU usage on replicated volume [Nov 2012]

Jonathan Lefman

2012-Nov-02 00:03 UTC

[Gluster-users] Very slow directory listing and high CPU usage on replicated volume

Hi all,

I am having problems with painfully slow directory listings on a freshly
created replicated volume.  The configuration is as follows:   2 nodes with
3 replicated drives each.  The total volume capacity is 5.6T.  We would
like to expand the storage capacity much more, but first we need to figure
this problem out.

Soon after loading up about 100 MB of small files (about 300kb each), the
drive usage is at 1.1T.  I am not sure if this is to be expected.  The main
problem is that directory listing (ls or find) takes a very long time.  The
CPU usage on the nodes is high for each of the glusterfsd processes (3 on
each machine): 54%, 43%, and 25% per core is a typical example.  Memory
usage is very low for each process.  It is incredibly difficult to diagnose this
issue.  We have wiped previous gluster installs, all directories, and mount
points as well as reformatting the disks.  Each drive is formatted with
ext4.

Has anyone had a similar result?  Any ideas on how to debug this one?

Thank you,

Jon

Jonathan Lefman

2012-Nov-02 00:41 UTC

I am attaching some extra information to help diagnose this issue.
Hopefully this is useful.

Volume Name: my_gluster_data
Type: Distributed-Replicate
Volume ID: e8865f37-6e22-476d-956e-29280bb07e75
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: server1:/media/data1/my_gluster_data
Brick2: server2:/media/data1/my_gluster_data
Brick3: server1:/media/data2/my_gluster_data
Brick4: server2:/media/data2/my_gluster_data
Brick5: server1:/media/data3/my_gluster_data
Brick6: server2:/media/data3/my_gluster_data



Jules Wang

2012-Nov-02 01:24 UTC

Hi John:
GlusterFS is not designed for handling a large count of small files. Because it
has no metadata server, every lookup operation costs a lot in your situation.
The disk usage is abnormal. Does your disk hold only Gluster bricks?

Best Regards.
Jules Wang


Jonathan Lefman

2012-Nov-02 01:31 UTC

Thanks for the response.  Yes, only gluster bricks on the disk.  I know
about the large count of small files issue, but we changed how we organized
the files.  Each directory has about 30 files in it.  Am I missing
something?


Pranith Kumar Karampuri

2012-Nov-02 02:52 UTC

Jonathan,
   Could you give us the directory structure details so that we can try to
re-create the issue? I am assuming each file is about 300 KB. Please give us the
depth of the directory structure and how many directories there are at each level.
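
One way to collect the numbers asked for here is to count directories at each depth with find. This is only a sketch: the function name dirs_per_level is made up for illustration, and it assumes a find that supports -mindepth and -maxdepth and directory names without embedded newlines. Point it at whatever tree was copied onto the volume.

```shell
# Hypothetical helper: print how many directories exist at each depth
# below a root, down to a chosen maximum depth.
dirs_per_level() {
  root=$1
  max_depth=$2
  depth=1
  while [ "$depth" -le "$max_depth" ]; do
    # Select only directories that sit exactly "depth" levels below root.
    count=$(find "$root" -mindepth "$depth" -maxdepth "$depth" -type d | wc -l)
    # $((count)) strips any padding that wc adds on some systems.
    echo "depth $depth: $((count)) directories"
    depth=$((depth + 1))
  done
}

# Example call (the path is a placeholder):
# dirs_per_level /path/to/source/data 4
```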

Thanks
Pranith

Brian Candler

2012-Nov-02 08:44 UTC

On Thu, Nov 01, 2012 at 08:03:21PM -0400, Jonathan Lefman wrote:
>    Soon after loading up about 100 MB of small files (about 300kb each),
>    the drive usage is at 1.1T.

That is very odd. What do you get if you run du and df on the individual
bricks themselves? 100MB is only ~330 files of 300KB each.
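
The du and df check suggested above could be scripted roughly as follows. brick_usage is a hypothetical helper name, and the paths in the usage comment are the server1 brick paths from the volume info posted earlier in the thread. df -i is included on the assumption that inode usage is also worth seeing.

```shell
# Hypothetical helper: report space and inode usage for each brick path given.
brick_usage() {
  for brick in "$@"; do
    echo "== $brick"
    du -sh "$brick"   # space consumed by the files stored under the brick
    df -h "$brick"    # block usage of the filesystem that holds the brick
    df -i "$brick"    # inode usage of the same filesystem
  done
}

# On server1 this might be run as:
# brick_usage /media/data1/my_gluster_data \
#             /media/data2/my_gluster_data \
#             /media/data3/my_gluster_data
```

If du reports far less than df for the same brick, the space is going to something other than file data.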

Did you specify any special options to mkfs.ext4? Maybe -I 512 would help,
as the xattrs are more likely to sit within the inodes themselves.

If you start everything from scratch, it would be interesting to see df
stats when the filesystem is empty.  It may be that a huge amount of space
has been allocated to inodes.  If you expect most of your files to be >16KB then
you could add -i 16384 to mkfs.ext4 to reduce the space reserved for inodes.
But using xfs would be better, as it doesn't reserve any space for inodes;
it allocates them dynamically.
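
To get a rough feel for how much space the inode table takes under different mkfs.ext4 settings, the arithmetic can be sketched as below. It assumes the table is approximately (filesystem size / bytes-per-inode) x inode size and ignores per-block-group rounding. inode_table_bytes is a name invented for this example, and /dev/sdX1 is a placeholder device.

```shell
# Hypothetical helper: estimate the bytes an ext4 inode table would occupy.
#   $1 = filesystem size in bytes
#   $2 = bytes-per-inode ratio (the mkfs.ext4 -i value)
#   $3 = inode size in bytes  (the mkfs.ext4 -I value)
inode_table_bytes() {
  fs_bytes=$1
  bytes_per_inode=$2
  inode_size=$3
  inode_count=$((fs_bytes / bytes_per_inode))  # one inode per ratio bytes
  echo $((inode_count * inode_size))
}

# 1 TiB filesystem, -i 16384:
inode_table_bytes 1099511627776 16384 256   # 17179869184 bytes (16 GiB)
inode_table_bytes 1099511627776 16384 512   # 34359738368 bytes (32 GiB)

# The matching format command would look like this (it destroys existing data):
# mkfs.ext4 -I 512 -i 16384 /dev/sdX1
```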

Ignore the comment that glusterfs is "not designed for handling large count
small files" - 300KB is not small.

Regards,

Brian.

Jonathan Lefman

2012-Nov-05 18:10 UTC

Thanks Brian.  I tried what you recommended.  At first I was very
encouraged when I saw things moving across the wire.  But about 15 minutes
into the transfer things ground to a halt.  I am currently running across a
GigE channel.  Things were moving about 20-40 MB/s but when things stopped
moving the transfer rate is down in single digit kb/s.  Even doing a top
level directory listing takes quite a while.

Not in a happy state.

On Mon, Nov 5, 2012 at 10:10 AM, Brian Candler <B.Candler at pobox.com>
wrote:
> If your disks are >1TB with XFS then try mount -o inode64
>
> This has the effect of sequential writes into the same directory being
> localised next to each other (within the same allocation group). When you
> skip to the next directory you will probably get a different allocation
> group.
>
> Without this, the behaviour is to (a) stick all the inodes in the first
> allocation group, and (b) to stick every file into a random allocation
> group, regardless of the parent directory
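
For reference, the mount option quoted above would be applied along these lines. This is a sketch only: /dev/sdb1 is a placeholder device, and /media/data1 is taken from the brick paths earlier in the thread on the assumption that it is the brick's mount point.

```shell
# Mount an XFS brick filesystem with the inode64 option.
mount -o inode64 /dev/sdb1 /media/data1

# An /etc/fstab entry that keeps the option across reboots could read:
# /dev/sdb1  /media/data1  xfs  inode64  0 0

# Show the options the filesystem is actually mounted with.
grep ' /media/data1 ' /proc/mounts
```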

Brian Candler

2012-Nov-05 18:26 UTC

On Mon, Nov 05, 2012 at 01:10:40PM -0500, Jonathan Lefman wrote:
>    Thanks Brian.  I tried what you recommended.  At first I was very
>    encouraged when I saw things moving across the wire.  But about 15
>    minutes into the transfer things ground to a halt.  I am currently
>    running across a GigE channel.  Things were moving about 20-40 MB/s but
>    when things stopped moving the transfer rate is down in single digit
>    kb/s.  Even doing a top level directory listing takes quite a while.

Hmm. Maybe an strace on the client side (glusterfs FUSE process) and server
side (glusterfsd) might give some clues?
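
A minimal way to capture such traces might look like the following. trace_procs and the output location under /tmp are assumptions made for this example; the strace flags themselves are standard (-f follows threads, -tt adds timestamps, -T shows time spent in each call). It needs to run as root on the machine in question.

```shell
# Hypothetical helper: attach strace to every running process with the
# given exact name and write one trace file per PID.
trace_procs() {
  name=$1
  outdir=${2:-/tmp}
  for pid in $(pgrep -x "$name"); do
    strace -f -tt -T -p "$pid" -o "$outdir/$name.$pid.strace" &
  done
}

# Server side (brick daemons):   trace_procs glusterfsd
# Client side (FUSE process):    trace_procs glusterfs
# Stop tracing afterwards with:  pkill strace
```

Running strace with -c instead of -tt -T prints a per-syscall summary on exit, which may be easier to read when the processes are busy.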

John Mark Walker

2012-Nov-06 10:44 UTC

An HTML attachment was scrubbed...
URL:
<http://supercolony.gluster.org/pipermail/gluster-users/attachments/20121106/6e763116/attachment.html>
