thr3ads.net - freebsd stable - Excessive delays due to syncer kthread [Feb 2005]

Peter Jeremy

2005-Feb-26 07:13 UTC

Excessive delays due to syncer kthread

I am trying to do some video capture and have been losing occasional
fields.  After adding some debugging code to the kernel, I've found
that the problem is excessive latency between the hardware interrupt
and the driver interrupt - the hardware can handle about 1.5msec of
latency.  Most of the time, the latency is less than 20 µsec but
I'm seeing up to 8 msec occasionally.  In virtually all cases where
there is a problem, curproc at the time of the hardware interrupt is
syncer.  (I had one case where there was another process, but it had
died by the time I went looking for it).  The interrupt is marked
INTR_TYPE_AV so it shouldn't be being delayed by other threads.  (I
can't easily make it INTR_FAST because it needs to call psignal(9)).

The system is an Athlon XP-1800 with 512MB RAM and 2 ATA-100 disks
running 5.3-RELEASE-p5.  It has a couple of NFS exports but doesn't
import anything.  There's nothing much running apart from ffmpeg
capturing the video and a process capturing my kernel debugging
output.  Apart from 4 files being sequentially written as part of my
capture and cron regularly waking up to go back to sleep, there
shouldn't be any filesystem activity.  I tried copying a couple of
large files and touching lots of files but that didn't cause any
problems.

Can anyone suggest why syncer would be occasionally running for
up to 8 msec at a time?  Overall, it's not clocking up a great
deal of CPU time, it just seems to grab it in large chunks.

Peter

Robert Watson

2005-Feb-26 11:26 UTC

Excessive delays due to syncer kthread

On Sat, 26 Feb 2005, Peter Jeremy wrote:
> I am trying to do some video capture and have been losing occasional
> fields.  After adding some debugging code to the kernel, I've found that
> the problem is excessive latency between the hardware interrupt and the
> driver interrupt - the hardware can handle about 1.5msec of latency. 
> Most of the time, the latency is less than 20 µsec but I'm seeing up
> to 8 msec occasionally.  In virtually all cases where there is a
> problem, curproc at the time of the hardware interrupt is syncer.  (I
> had one case where there was another process, but it had died by the
> time I went looking for it).  The interrupt is marked INTR_TYPE_AV so it
> shouldn't be being delayed by other threads.  (I can't easily make it
> INTR_FAST because it needs to call psignal(9)). 
> 
> The system is an Athlon XP-1800 with 512MB RAM and 2 ATA-100 disks
> running 5.3-RELEASE-p5.  It has a couple of NFS exports but doesn't
> import anything.  There's nothing much running apart from ffmpeg
> capturing the video and a process capturing my kernel debugging output. 
> Apart from 4 files being sequentially written as part of my capture and
> cron regularly waking up to go back to sleep, there shouldn't be any
> filesystem activity.  I tried copying a couple of large files and
> touching lots of files but that didn't cause any problems. 
> 
> Can anyone suggest why syncer would be occasionally running for up to 8
> msec at a time?  Overall, it's not clocking up a great deal of CPU time,
> it just seems to grab it in large chunks. 

I don't have too much insight into the syncer (I've CC'd phk to victimize
him with more e-mail, as this is an area he takes great interest in).  A
couple of questions:

(1) Have you tried turning on options PREEMPTION? 

(2) Does the driver code run with Giant at all? 

(3) Are you relying on callouts or taskqueues at all for processing? 

With PREEMPTION enabled and all driver code running without Giant (and not
depending on threads that also acquire Giant), and all related workers
running with adequate priority, your driver threads should preempt the
syncer.  Your user process will still have to wait for the syncer to finish
running, though.  So with preemption and Giant-free code, we should be
able to get your in-kernel driver code to run on a short deadline, but
getting the syncer to behave better will be necessary to get the user code
running on a short deadline.

Robert N M Watson

Don Lewis

2005-Feb-26 21:48 UTC

Excessive delays due to syncer kthread

On 26 Feb, Peter Jeremy wrote:
> I am trying to do some video capture and have been losing occasional
> fields.  After adding some debugging code to the kernel, I've found
> that the problem is excessive latency between the hardware interrupt
> and the driver interrupt - the hardware can handle about 1.5msec of
> latency.  Most of the time, the latency is less than 20 µsec but
> I'm seeing up to 8 msec occasionally.  In virtually all cases where
> there is a problem, curproc at the time of the hardware interrupt is
> syncer.  (I had one case where there was another process, but it had
> died by the time I went looking for it).  The interrupt is marked
> INTR_TYPE_AV so it shouldn't be being delayed by other threads.  (I
> can't easily make it INTR_FAST because it needs to call psignal(9)).
> 
> The system is an Athlon XP-1800 with 512MB RAM and 2 ATA-100 disks
> running 5.3-RELEASE-p5.  It has a couple of NFS exports but doesn't
> import anything.  There's nothing much running apart from ffmpeg
> capturing the video and a process capturing my kernel debugging
> output.  Apart from 4 files being sequentially written as part of my
> capture and cron regularly waking up to go back to sleep, there
> shouldn't be any filesystem activity.  I tried copying a couple of
> large files and touching lots of files but that didn't cause any
> problems.
> 
> Can anyone suggest why syncer would be occasionally running for
> up to 8 msec at a time?  Overall, it's not clocking up a great
> deal of CPU time, it just seems to grab it in large chunks.

You're probably running into the inode timestamp update loop.  Each
mounted file system has a special "syncer vnode" that remains
permanently on the syncer worklist.  The syncer calls VOP_FSYNC() on
each of these vnodes as it encounters them in the worklist, which it
traverses every 32 seconds.  This is done so that things like the
superblock and other file system metadata are periodically written to
disk.  In the case of ufs, the code that does this is in ffs_sync().

I suspect that the problem that you are running into is that ffs_sync()
(and ext2_sync()) also handle inode timestamp updates.  Each time they
are called, they walk the list of vnodes for the file system and call
VOP_FSYNC() for any that have unwritten timestamp updates.  As the
comment in the loop in ffs_sync() says:

                /*
                 * Depend on the mntvnode_slock to keep things stable enough
                 * for a quick test.  Since there might be hundreds of
                 * thousands of vnodes, we cannot afford even a subroutine
                 * call unless there's a good chance that we have work to do.
                 */
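
To make the shape of that loop concrete, here is a small userland sketch.
It is not the ffs_sync() source: the toy_ names, the two flag bits and
the singly linked list are stand-ins made up for illustration.  It shows
only the pattern the comment describes, a cheap inline flag test on every
vnode with the expensive call made only when the flags indicate pending
work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the real inode flags. */
#define TOY_ACCESS	0x1u	/* access time needs updating */
#define TOY_MODIFIED	0x2u	/* inode has unwritten changes */

/* Hypothetical stand-in for a vnode on a mount point's vnode list. */
struct toy_vnode {
	unsigned flags;
	struct toy_vnode *next;
};

/* Stand-in for VOP_FSYNC(): pretend to write, then clear pending flags. */
static void
toy_fsync(struct toy_vnode *vp)
{
	assert(vp != NULL);
	vp->flags = 0;
}

/*
 * One pass over every vnode on the list.  The flag test is done inline so
 * that clean vnodes cost no function call; only vnodes with pending work
 * are flushed.  Returns the number of vnodes flushed in this pass.
 */
static size_t
toy_sync_pass(struct toy_vnode *head)
{
	size_t flushed = 0;

	for (struct toy_vnode *vp = head; vp != NULL; vp = vp->next) {
		if ((vp->flags & (TOY_ACCESS | TOY_MODIFIED)) == 0)
			continue;
		toy_fsync(vp);
		flushed++;
	}
	return (flushed);
}
```

Note that even in this toy version the pass still visits every vnode, and
everything that is pending gets flushed in the same pass rather than being
spread out over time.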

I noticed a related performance problem a while back.  If you are doing
something that writes to a lot of files, like untarring the ports tree,
there will be large bursts of disk activity every 30 seconds and the
system gets very sluggish. Soft updates and the new syncer were supposed
to eliminate this behaviour by spreading out the write activity over
time, but this loop in ffs_sync() will cause a burst of writes every
time it is called.  This can also be observed by watching the length of
the syncer worklist.  When untarring the ports tree, the length of the
worklist should increase to a certain, high level, and stabilize.
Instead it ramps up over about thirty seconds and then takes a dramatic
drop.
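
The ramp-and-drop pattern can be illustrated with a toy model of the
worklist.  Everything here is an assumption made for the sketch: the
toy_wheel structure, the 32 one-second slots and the flush_all switch are
invented, and real kernel behaviour is far more involved.  The model only
contrasts servicing one slot per second with draining every slot in a
single pass.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SLOTS 32	/* assumed: one slot per second of delay */

/* Hypothetical model of the worklist as a wheel of per-second slots. */
struct toy_wheel {
	size_t slot[WHEEL_SLOTS];	/* dirty items queued in each slot */
	size_t now;			/* slot to be serviced next */
};

/* Queue n dirty items to be written `delay` seconds from now. */
static void
wheel_add(struct toy_wheel *w, size_t n, size_t delay)
{
	assert(delay < WHEEL_SLOTS);
	w->slot[(w->now + delay) % WHEEL_SLOTS] += n;
}

/* Total number of items currently queued (the "worklist length"). */
static size_t
wheel_length(const struct toy_wheel *w)
{
	size_t total = 0;

	for (size_t i = 0; i < WHEEL_SLOTS; i++)
		total += w->slot[i];
	return (total);
}

/*
 * Advance the wheel by one second and return the items written.  With
 * flush_all == 0 only the current slot is written, which spreads the work
 * out.  With flush_all != 0 every slot is drained at once, modelling a
 * pass that writes everything pending regardless of its scheduled slot.
 */
static size_t
wheel_tick(struct toy_wheel *w, int flush_all)
{
	size_t written = 0;

	if (flush_all) {
		for (size_t i = 0; i < WHEEL_SLOTS; i++) {
			written += w->slot[i];
			w->slot[i] = 0;
		}
	} else {
		written = w->slot[w->now];
		w->slot[w->now] = 0;
	}
	w->now = (w->now + 1) % WHEEL_SLOTS;
	return (written);
}
```

Feeding this model a steady stream of wheel_add() calls and ticking with
flush_all set once every thirty or so ticks gives a worklist length that
climbs and then collapses, instead of levelling off.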

In the initial softupdates implementation, some of the work inside the
loop was skipped in the MNT_LAZY case, but it was found that timestamp
updates were being deferred for too long a time.

I talked to Kirk about entirely bypassing this loop in the MNT_LAZY case
and moving the timestamp updates to the syncer worklist.  Kirk sounded
positive on the idea, but I never found the time to work on the
implementation, and phk's conversion of the syncer to use bufobjs
instead of vnodes complicated things (what do you do about fifos and
sockets?).
