Brian Candler
2012-Aug-06 21:43 UTC
[Gluster-users] EBADFD with large number of concurrent files
I have an application where there are 48 processes, and each one opens 1000 files (different files for each of the 48 processes). They are opened on a distributed gluster volume, distributed between two nodes. It works initially, but after a while some of the processes abort; perror prints "File descriptor in bad state" (I think this means EBADFD).

This is with glusterfs 3.3.0 under Ubuntu 12.04 (both the storage nodes and the application servers).

Looking on the two backend bricks, each has two glusterfsd processes. On both bricks, the one with the lower pid has 24168 open FDs (ls /proc/<pid>/fd | wc -l), and also 1.5-2.5GB of RSS. So it's pretty clear that glusterfsd keeps one open file handle per file opened by the client. That's pretty reasonable.

I don't think I'm hitting a system limit for this:

# cat /proc/sys/fs/file-max
808870

and it's clearly working for the first few minutes. So I wonder if anyone has any other suggestions for why EBADFD is getting returned after a while?

Thanks, Brian.
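Note that fs/file-max is the system-wide limit; the per-process limit (ulimit -n) would normally bite first, and it can be read from /proc/<pid>/limits. A minimal sketch for watching both counters on a brick (assumes glusterfsd is the process name; since the brick processes run as root, the loop needs to run as root too):

    # Report open-FD count, RSS, and the per-process open-file limit
    # for every glusterfsd process on this brick.
    for pid in $(pgrep glusterfsd); do
        fds=$(ls /proc/$pid/fd | wc -l)
        rss=$(awk '/^VmRSS/ {print $2 " " $3}' /proc/$pid/status)
        lim=$(awk '/^Max open files/ {print $4}' /proc/$pid/limits)
        echo "pid=$pid fds=$fds rss=$rss soft_nofile_limit=$lim"
    done

If fds is approaching soft_nofile_limit, the brick process itself is running out of descriptors even though fs/file-max is nowhere near exhausted.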
Shishir Gowda
2012-Aug-07 09:18 UTC
[Gluster-users] EBADFD with large number of concurrent files
Hi Brian,

Can you please provide the client (mount) log files? Also, if possible, can you take state dumps and attach them to the mail:

    gluster volume statedump <volname>

The output will be files in /tmp/<brick-path>.<pid>.dump.x.

With regards,
Shishir
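For example, assuming the volume is named "myvol" (substitute your own volume name), the dump can be taken on one of the storage nodes like this:

    # trigger a dump of the internal state of each brick process
    gluster volume statedump myvol

    # one dump file appears per brick, named /tmp/<brick-path>.<pid>.dump.x
    ls /tmp/*.dump.*

The dump files include the fd and inode tables of each brick process, which should show whether the open-fd count is what is going wrong here.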