Setup: Fedora 19 with kernel 3.11.4-201.fc19.x86_64 and
btrfs-progs-0.20.rc1.20130917git194aa4a-1.fc19.x86_64.

I am trying to do an initial btrfs send of a large (206GB, 1392221 paths)
snapshot to a single file on a different filesystem (a slow external USB
drive with btrfs):

	btrfs send /tank/backups/snapshots/test1 > /extpool/filetest/test5.btr

What happens is that I eventually get the following error in
/var/log/messages:

	VFS: file-max limit 202149 reached

I worked around this with:

	echo 900000 > /proc/sys/fs/file-max

but it then hit that limit too, so I tried 1800000 and left it running.
This produced a 377GB test5.btr file before the system ran out of memory
and crashed (OOM killer etc.). This is reproducible, but the btrfs send
takes about 6+ hours (USB 3.0 card on order...).

The reason I think btrfs send is leaking open files is that if you watch
/proc/sys/fs/file-nr you see the number of open files increasing, but if
you kill the btrfs send process the open files count drops back down. In
fact, suspending the process also reduces the open file count, and
resuming it makes the count start increasing again.

I also found Robert Buhren reporting a very similar issue back in April 2013:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/24795

If further information is needed, I'd be happy to help.

--
Phil Davis
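(Aside, for anyone wanting to reproduce the observation: the three fields
in /proc/sys/fs/file-nr are allocated file handles, allocated-but-unused
handles, and file-max. A trivial poller like the sketch below shows the
count climbing during the send; "watch cat /proc/sys/fs/file-nr" does the
same job.)

	/* Trivial poller for /proc/sys/fs/file-nr: prints
	 * "allocated free max" once a second. Illustration only. */
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[128];
		FILE *f;

		for (;;) {
			f = fopen("/proc/sys/fs/file-nr", "r");
			if (!f)
				return 1;
			if (fgets(buf, sizeof(buf), f))
				fputs(buf, stdout);
			fclose(f);
			sleep(1);
		}
	}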
On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote:
> I also found Robert Buhren reporting a very similar issue back in April 2013:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/24795
>
> If further information is needed, I'd be happy to help.

Please open a bug at bugzilla.kernel.org.

thanks
On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote:
> The reason I think btrfs send is leaking open files is that if you watch
> /proc/sys/fs/file-nr you see the number of open files increasing, but if
> you kill the btrfs send process the open files count drops back down. In
> fact, suspending the process also reduces the open file count, and
> resuming it makes the count start increasing again.

What does lsof show while you are running that? AFAICS, btrfs_ioctl_send()
should be neutral wrt file references - we do fget() on entry and fput()
of the result on exit, with nothing else looking relevant in sight...
OTOH, the number of calls of that ioctl() by btrfs-progs seems to be
bounded by the length of the argument list. So the interesting questions
are:
	a) how many btrfs send instances are running at the time?
	b) what do their arg lists look like?
	c) who (if anyone) has all those opened files in their descriptor
	   tables?

BTW, looking at do_send()...

	if (g_verbose > 0)
		fprintf(stderr, "joining genl thread\n");

	close(pipefd[1]);
	pipefd[1] = 0;

	ret = pthread_join(t_read, &t_err);
	...
	if (subvol_fd != -1)
		close(subvol_fd);
	if (pipefd[0] != -1)
		close(pipefd[0]);
	if (pipefd[1] != -1)
		close(pipefd[1]);

That pipefd[1] = 0; looks bogus. It doesn't look like it could result in
what you are seeing, but unless I'm misreading that code it ought to be
pipefd[1] = -1...
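To make the convention concrete: fd 0 is a perfectly valid descriptor, so
storing 0 after close() leaves the cleanup path thinking the fd is still
open, and it will close(0) - i.e. shut stdin. A hypothetical helper (not
actual btrfs-progs code) showing why -1 is the right sentinel:

	#include <unistd.h>

	/* Hypothetical helper, not from btrfs-progs: close an fd and
	 * mark it closed with -1. Marking with 0 instead would make a
	 * later "if (*fd != -1) close(*fd);" cleanup close stdin. */
	static void close_and_mark(int *fd)
	{
		if (*fd != -1) {
			close(*fd);
			*fd = -1;
		}
	}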
On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> What does lsof show while you are running that? AFAICS, btrfs_ioctl_send()
> should be neutral wrt file references - we do fget() on entry and fput()
> of the result on exit, with nothing else looking relevant in sight...

Argh... WTF do people keep misusing filp_close()? close_cur_inode_file()
should just use fput() and be done with it.

Anyway, ->cur_inode_filp also doesn't look like a likely candidate for the
leak - even if we missed calling close_cur_inode_file(), we'd get no more
dentry_open() done by open_cur_inode_file() for that sctx, which would
limit us to at most one per BTRFS_IOC_SEND. IOW, there still seems to be
something in userland calling that ioctl a lot... Anyway, the lsof output
ought to distinguish that case - we'd see a shitload of opened files not
in anyone's descriptor tables...
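For concreteness, a sketch of what that would look like against the
3.11-era fs/btrfs/send.c layout (the existing function returns
filp_close()'s error; with fput() there is nothing useful to return):

	/* Sketch against 3.11-era send.c: drop the per-inode file with
	 * fput() instead of filp_close(). fput() just releases the
	 * reference; there is no descriptor-table bookkeeping here. */
	static void close_cur_inode_file(struct send_ctx *sctx)
	{
		if (sctx->cur_inode_filp) {
			fput(sctx->cur_inode_filp);
			sctx->cur_inode_filp = NULL;
		}
	}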
On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> What does lsof show while you are running that? AFAICS, btrfs_ioctl_send()
> should be neutral wrt file references - we do fget() on entry and fput()
> of the result on exit, with nothing else looking relevant in sight...
> OTOH, the number of calls of that ioctl() by btrfs-progs seems to be
> bounded by the length of the argument list. So the interesting questions
> are:
> 	a) how many btrfs send instances are running at the time?
> 	b) what do their arg lists look like?
> 	c) who (if anyone) has all those opened files in their descriptor
> 	   tables?

aaaarrrgh..... OK, I see what's going on. We have a normal process doing
a normal syscall (well, inasmuch as ioctl(2) can be called normal). It's
just that this syscall happens to open and close an obscene number of
struct file as it goes, which leaves all those struct file sitting there
waiting for task_work_run() to do __fput(). In the meanwhile we keep
allocating new ones (and closing them). All without returning to userland.

Result: O(regular files in snapshot) struct file instances by the time it
ends. Of course, once we return to userland (or get killed by the OOM
killer), task_work_run() gets called and all those suckers go away.

Note that decrementing the open files counter earlier will do nothing
about the OOM - it *is* caused by the bloody huge pile of struct file /
struct dentry / struct inode. We really need to flush that shite somehow -
or avoid producing it in the first place.

The trouble is, I'm not sure that doing __fput() here is really safe - the
call chains are long and convoluted and I don't see what the locking
environment is. IOW, I'm not sure that it's really deadlock-free with
fput() done synchronously. btrfs_release_file() seems to be doing some
non-trivial work if we had the file truncated by somebody else, so...

Does using vfs_read() in send_write() really make things much simpler for
us? That's the only reason to bother with that dentry_open() at all; we
could bloody well just copy the data from the page cache without bothering
with struct file, set_fs(), etc.

Comments?
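To illustrate the alternative: a rough, untested sketch of pulling a
file's data through the page cache without a struct file (3.11-era APIs;
real code would also need to lock the extent range and handle short/EOF
pages, which this deliberately omits):

	/* Untested sketch, 3.11-era APIs: copy up to one page of data
	 * from "inode" via the page cache - no struct file, no
	 * set_fs(). Extent locking and EOF handling omitted. */
	static int read_page_from_cache(struct inode *inode, pgoff_t index,
					char *buf, size_t len)
	{
		struct page *page;
		char *addr;

		/* Reads through the mapping, going to disk if needed. */
		page = read_mapping_page(inode->i_mapping, index, NULL);
		if (IS_ERR(page))
			return PTR_ERR(page);

		addr = kmap(page);
		memcpy(buf, addr, min_t(size_t, len, PAGE_CACHE_SIZE));
		kunmap(page);
		page_cache_release(page);
		return 0;
	}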
On Tue, Oct 22, 2013 at 09:41:05PM +0100, Al Viro wrote:
> [...]
>
> Does using vfs_read() in send_write() really make things much simpler for
> us? That's the only reason to bother with that dentry_open() at all; we
> could bloody well just copy the data from the page cache without bothering
> with struct file, set_fs(), etc.
>
> Comments?

I agree, we should probably just drop the vfs_read() call and do the hard
work ourselves. I will look into doing this in the near future.

Thanks,

Josef
On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> What does lsof show while you are running that? AFAICS, btrfs_ioctl_send()
> should be neutral wrt file references - we do fget() on entry and fput()
> of the result on exit, with nothing else looking relevant in sight...

Agreed, the fget/fput of sctx->send_filp doesn't seem interesting. There's
only one possible delayed fput() for each ioctl. I haven't *really*
followed the delayed fput mechanics.. are we worried about a ton of
delayed fputs building up before the ioctl returns to user space?

The sctx->cur_inode_filp is opened and closed for each file that differs
between the two trees, such that the destination is brought up to date by
writing. That can be O(storage) files if the source snapshot is empty and
the destination snapshot is a full file system. All in one ioctl.

Sooooo, put whatever call it is that synchronizes the delayed fputs in
close_cur_inode_file() and see if the leak persists?

- z
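For concreteness, the experiment might look like the sketch below. One
caveat before anyone tries it: in 3.11, flush_delayed_fput() only drains
the global delayed list, which is used for fputs from kthreads and
interrupt context; an fput() from a normal task queues its __fput() onto
task_work instead (and __fput_sync() BUG()s outside kernel threads, and
IIRC flush_delayed_fput() isn't exported to modules). Which is exactly
Al's open question about doing the final put synchronously here.

	/* Sketch of the experiment only. For a normal task, fput()
	 * queues the real __fput() on task_work, which this call does
	 * NOT drain - flush_delayed_fput() handles the kthread/IRQ
	 * delayed list. So this tests the theory, not the fix. */
	static void close_cur_inode_file(struct send_ctx *sctx)
	{
		if (sctx->cur_inode_filp) {
			fput(sctx->cur_inode_filp);
			sctx->cur_inode_filp = NULL;
			flush_delayed_fput();
		}
	}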