thr3ads.net - Xapian devel - [Xapian-devel] Problems with /bin/cat and flintlock? [Apr 2011]

Samuel Williams

2011-Apr-07 14:56 UTC

[Xapian-devel] Problems with /bin/cat and flintlock?

Hi Guys,

I'm working on an integration project with Ruby, Rack, Apache, Phusion
Passenger and Xapian.

I've been having intermittent issues with the flintlock code - it seems that
the function FlintLock::lock is never returning and this is locking up the Ruby
process.

My guess is that Xapian is locking up in a system call and Ruby can't
schedule its green threads.

I've done some basic debugging with strace and noticed the following:

29944 30022 29942 29939 ?           -1 Sl      33   0:09  |   | \_ Passenger ApplicationSpawner: /srv/www/www.oriontransfer.co.nz
30022 30041 29942 29939 ?           -1 S       33   0:00  |   | |   \_ /bin/cat 

[Using the following source code as a reference
http://xapian.org/docs/sourcedoc/html/flint__lock_8cc_source.html]

At this point, using strace, I found that the application process seemed to be
stuck on:
00219         ssize_t n = read(fds[0], &ch, 1);

Obviously the child process was cat; nothing really interesting about that.

After I killed cat, then the process was freed up and the web application
started responding again.

I don't know why this is unreliable, but I've briefly looked at the
code and noticed a few things:

00172         // Connect pipe to stdin and stdout.
00173         dup2(fds[1], 0);
00174         dup2(fds[1], 1);

Isn't this setting stdin and stdout to the same end of an existing pipe?
Does this make sense?

Anyway, I thought I'd mention this because it is a consistent problem. If
there is anything you think I should do with strace, gdb, etc on the processes
next time it hangs, let me know.

One option to fix the bug without really understanding the real issue would be
to use select in the parent thread, rather than read. Then, use a timeout of a
few seconds so that if the child doesn't acquire the lock within x seconds,
it is as good as failed.
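
The select-with-timeout idea above could look roughly like the sketch below. This is only an illustration: read_byte_with_timeout and its return codes are hypothetical names, not part of Xapian, and the timeout value is left to the caller.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper (not Xapian code): wait up to timeout_sec seconds for
// one byte on fd.
// Returns 1 if a byte was read into *ch, 0 on EOF, -1 on error, -2 on timeout.
int read_byte_with_timeout(int fd, char* ch, int timeout_sec) {
    assert(ch != nullptr);
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        // select() may modify the timeval, so set it on every attempt.
        // Note that an interrupted wait restarts with the full timeout.
        struct timeval tv;
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int r = select(fd + 1, &readfds, nullptr, nullptr, &tv);
        if (r < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return -1;
        }
        if (r == 0) return -2;  // nothing arrived before the timeout
        break;                  // fd is readable (data or EOF)
    }

    ssize_t n;
    do {
        n = read(fd, ch, 1);
    } while (n < 0 && errno == EINTR);
    if (n < 0) return -1;
    return n == 1 ? 1 : 0;
}
```

The parent would call this on fds[0] in place of the plain read(), and treat a return of -2 as a failure to acquire the lock.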

Kind regards,
Samuel


Olly Betts

2011-Apr-08 02:41 UTC

[Xapian-devel] Problems with /bin/cat and flintlock?

On Fri, Apr 08, 2011 at 02:56:22AM +1200, Samuel Williams wrote:
> I've been having intermittent issues with the flintlock code - it
> seems that the function FlintLock::lock is never returning and this is
> locking up the Ruby process.
What OS is this on?  That's likely to be highly relevant.
> At this point, using strace I found that the application process
> seemed to be stuck on:
> 00219         ssize_t n = read(fds[0], &ch, 1);
> 
> Obviously the child process was cat; nothing really interesting about that.
The child process should send a single character before it execs
/bin/cat, which is what the parent is waiting to read there.

If the write() call in the child fails, then the child exits, so
unless the OS fails to transfer the byte across the pipe, I struggle
to see how we can end up in this situation.
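
As a rough illustration of that handshake (a simplified sketch, not the actual flint_lock.cc code): the function name wait_for_child_signal and the child_gets_lock flag are hypothetical, and the child simply exits where the real code would exec /bin/cat. The parent's read() returns either the byte or EOF, provided the parent has closed its own copy of the child's end.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical illustration (names are not from Xapian).
// Returns 1 if the parent received the child's byte, 0 if it saw EOF
// instead, and -1 on error.
int wait_for_child_signal(bool child_gets_lock) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        close(fds[0]);
        // Stand-in for "could not take the lock": exit without writing.
        if (!child_gets_lock) _exit(1);
        char ch = 'k';
        // If the write fails, the child exits and the parent sees EOF.
        if (write(fds[1], &ch, 1) != 1) _exit(1);
        _exit(0);  // the real code would exec /bin/cat at this point
    }

    // The parent must close its copy of the child's end, otherwise read()
    // below could never report EOF.
    close(fds[1]);
    char ch;
    ssize_t n = read(fds[0], &ch, 1);
    close(fds[0]);

    int status;
    waitpid(child, &status, 0);
    return n < 0 ? -1 : static_cast<int>(n);
}
```

In this sketch both outcomes unblock the parent: a successful child yields 1, and a child that exits early yields 0.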
> 00172         // Connect pipe to stdin and stdout.
> 00173         dup2(fds[1], 0);
> 00174         dup2(fds[1], 1);
> 
> Isn't this setting stdin and stdout to the same end of an existing
> pipe? Does this make sense?
It's a bidirectional socket, so that's fine.
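
A small sketch of why this works, assuming a socketpair(AF_UNIX, SOCK_STREAM) like the one described (the function name cat_echoes_over_socketpair is made up for this example): the child attaches one end to both stdin and stdout and execs /bin/cat, and the parent then writes to and reads from the other end.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstring>

// Hypothetical demo (not Xapian code): returns true if /bin/cat, with stdin
// and stdout both attached to one end of a socketpair, echoes back what the
// parent writes to the other end.
bool cat_echoes_over_socketpair() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return false;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (child == 0) {
        close(fds[0]);
        // Each end of a socketpair can be both read and written, so the
        // same descriptor can serve as stdin and stdout.
        dup2(fds[1], 0);
        dup2(fds[1], 1);
        if (fds[1] > 1) close(fds[1]);
        execl("/bin/cat", "/bin/cat", static_cast<char*>(nullptr));
        _exit(127);  // exec failed
    }

    close(fds[1]);
    const char msg[] = "ping";
    const size_t len = sizeof(msg) - 1;
    char buf[sizeof(msg)] = {0};

    bool ok = write(fds[0], msg, len) == static_cast<ssize_t>(len);
    size_t got = 0;
    while (ok && got < len) {
        ssize_t n = read(fds[0], buf + got, len - got);
        if (n <= 0) ok = false;
        else got += static_cast<size_t>(n);
    }

    // Closing the parent's end gives cat EOF on stdin, so it exits.
    close(fds[0]);
    int status;
    waitpid(child, &status, 0);
    return ok && memcmp(buf, msg, len) == 0;
}
```

With an ordinary pipe() this would not work on most systems, because each end of a pipe is one-directional.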
> Anyway, I thought I'd mention this because it is a consistent problem.
> If there is anything you think I should do with strace, gdb, etc on
> the processes next time it hangs, let me know.
It would be useful to attach gdb to the parent and child and do a
backtrace in each (bt) to see exactly where we are.
> One option to fix the bug without really understanding the real issue
> would be to use select in the parent thread, rather than read. Then,
> use a timeout of a few seconds so that if the child doesn't acquire
> the lock within x seconds, it is as good as failed.
I'd prefer to understand the issue rather than paper over it.  Locking
is rather a critical operation to get right!

Also, it's rather unclear what a suitable threshold is - you can use
fcntl locking over NFS if you run the lock daemon, so a few seconds to
get a lock is probably not impossible with a busy NFS server.

Cheers,
    Olly
