Brian Candler
2012-Aug-06 21:43 UTC
[Gluster-users] EBADFD with large number of concurrent files
I have an application where there are 48 processes, and each one opens 1000 files (different files for each of the 48 processes). They are opened on a distributed gluster volume, distributed between two nodes. It works initially, but after a while some of the processes abort; perror prints "File descriptor in bad state" (I think this means EBADFD).

This is with glusterfs 3.3.0 under Ubuntu 12.04 (both the storage nodes and the application servers).

Looking on the two backend bricks, each has two glusterfsd processes. On both bricks, the one with the lower pid has 24168 open FDs (ls /proc/<pid>/fd | wc -l), and also 1.5-2.5GB of RSS. So it's pretty clear that glusterfsd keeps one open file handle per file opened by the client. That's pretty reasonable.

I don't think I'm hitting a system limit for this:

# cat /proc/sys/fs/file-max
808870

and it's clearly working for the first few minutes. So I wonder if anyone has any other suggestions for why EBADFD is getting returned after a while?

Thanks, Brian.
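Note that fs/file-max is the system-wide limit; the per-process limit (ulimit -n) would normally bite first, and it can be read from /proc/<pid>/limits. A minimal sketch for watching both counters on a brick (assumes glusterfsd is the process name; since the brick processes run as root, the loop needs to run as root too):

    # Report open-FD count, RSS, and the per-process open-file limit
    # for every glusterfsd process on this brick.
    for pid in $(pgrep glusterfsd); do
        fds=$(ls /proc/$pid/fd | wc -l)
        rss=$(awk '/^VmRSS/ {print $2 " " $3}' /proc/$pid/status)
        lim=$(awk '/^Max open files/ {print $4}' /proc/$pid/limits)
        echo "pid=$pid fds=$fds rss=$rss soft_nofile_limit=$lim"
    done

If fds is approaching soft_nofile_limit, the brick process itself is running out of descriptors even though fs/file-max is nowhere near exhausted.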
Shishir Gowda
2012-Aug-07 09:18 UTC
[Gluster-users] EBADFD with large number of concurrent files
Hi Brian,

Can you please provide the client (mount) log files? Also, if possible, can you take state dumps and attach them to the mail:

    gluster volume statedump <volname>

The output will be files in /tmp/<brick-path>.<pid>.dump.x.

With regards,
Shishir
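For example, assuming the volume is named "myvol" (substitute your own volume name), the dump can be taken on one of the storage nodes like this:

    # trigger a dump of the internal state of each brick process
    gluster volume statedump myvol

    # one dump file appears per brick, named /tmp/<brick-path>.<pid>.dump.x
    ls /tmp/*.dump.*

The dump files include the fd and inode tables of each brick process, which should show whether the open-fd count is what is going wrong here.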