While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severly degrade network performance every 30.99 seconds. Packets appear to make it as far as ether_input() then get lost. Test setup: A - ethernet_switch - B A sends UDP packets to B through an ethernet switch. The interface input packet count and output packet count on the switch match what A is sending and B should be receiving. A UDP receiver running on B sees windows of packet loss with a period of 30.99 seconds. The lost packets are counted based on an incrementing sequence number. On an isolated network the <Link#1> Ipkts counter on B matches what A is sending, but the packets never show up in any of the IP/UDP counters or the program trying to receive them. This behavior can be seen with both em and fxp interfaces. Problem is it only occurs after the receiving host has been up about a day. Reboot, problem clears. GENERIC kernel, nothing more than default daemons running. Behavior seen on three different motherboards so far. It also appears this is not just lost network interrupts. Whatever is spinning in the kernel also impacts syscall latency. An easy way to replicate what I'm seeing is to run gettimeofday() in a tight loop and note when the real time syscall delay exceeds some value (which is dependent on processor speed). As an example on an 3.20GHz CPU a small program will output when the syscall latency is > 5000 usecs. Note the periodic behavior at 30.99 seconds. These big jumps in latency correspond to when packets are being dropped. usecs (epoch) latency diff ------------------------------------------------ 1197861705805078 478199 0 1197861721012298 25926 15207220 1197861726332036 11729 5319738 1197861757331549 11691 30999513 1197861788331266 11878 30999717 1197861819330647 11708 30999381 1197861850330192 11698 30999545 1197861881329733 11667 30999541 1197861900018297 6516 18688564 1197861912329282 11684 12310985 1197861943328849 11699 30999567 1197861974328413 11692 30999564 1197862005328228 11916 30999815 1197862036327598 11684 30999370 1197862067327229 11680 30999631 1197862098326860 11667 30999631 1197862129326559 11704 30999699 1197862160326377 11844 30999818 1197862191325890 11674 30999513 (output from packet loss tester) window_start/window_end is packet counter time_start/time_end is absolute time in usecs. window_diff is # of packets missing The test is run at about 15.5Kpps / 132Mbits/second, certainly a lot less than this hardware is capable of running BSD4.X. :missing window_start=311510, time_start=1197861726332008,window_end=311638, time_end=1197861726332011, window_diff=128, time_diff=3 :missing window_start=794482, time_start=1197861757331505,window_end=794609, time_end=1197861757331509, window_diff=127, time_diff=4 :missing window_start=1277313, time_start=1197861788331245,window_end=1277444, time_end=1197861788331249, window_diff=131, time_diff=4 :missing window_start=1760104, time_start=1197861819330625,window_end=1760232, time_end=1197861819330629, window_diff=128, time_diff=4 :missing window_start=2242789, time_start=1197861850330170,window_end=2242916, time_end=1197861850330174, window_diff=127, time_diff=4 :missing window_start=2725818, time_start=1197861881329712,window_end=2725946, time_end=1197861881329715, window_diff=128, time_diff=3 :missing window_start=3208594, time_start=1197861912329261,window_end=3208722, time_end=1197861912329264, window_diff=128, time_diff=3 :missing window_start=3691395, time_start=1197861943328802,window_end=3691522, time_end=1197861943328805, window_diff=127, time_diff=3 :missing window_start=4173793, time_start=1197861974328369,window_end=4173921, time_end=1197861974328373, window_diff=128, time_diff=4 :missing window_start=4656236, time_start=1197862005328176,window_end=4656367, time_end=1197862005328179, window_diff=131, time_diff=3 :missing window_start=5139197, time_start=1197862036327576,window_end=5139325, time_end=1197862036327580, window_diff=128, time_diff=4 :missing window_start=5621958, time_start=1197862067327208,window_end=5622085, time_end=1197862067327211, window_diff=127, time_diff=3 :missing window_start=6104597, time_start=1197862098326839,window_end=6104725, time_end=1197862098326843, window_diff=128, time_diff=4 :missing window_start=6587241, time_start=1197862129326514,window_end=6587369, time_end=1197862129326534, window_diff=128, time_diff=20 :missing window_start=7070051, time_start=1197862160326368,window_end=7070183, time_end=1197862160326371, window_diff=132, time_diff=3 :missing window_start=7552828, time_start=1197862191325873,window_end=7552954, time_end=1197862191325876, window_diff=126, time_diff=3 :missing window_start=8035434, time_start=1197862222325572,window_end=8035560, time_end=1197862222325576, window_diff=126, time_diff=4 I'm building a more up to date copy of RELENG_6 to make sure I'm not chasing something that's been fixed. As a side note this appears to also be happening on a RELENG_6 build dated Mar 11 2007. Included is the gettimeofday() looper. Run as ./a.out 1 5000, where 5000 will depend on your system speed. This probably won't provide any meaningful results on a loaded system. E-mail me off list for a copy of the packet tester or more diagnostics. #include <sys/time.h> #include <sys/types.h> #include <errno.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <err.h> main(int argc, char **argv) { struct timeval tv; struct timezone tz; u_int64_t time_now, time_last, time_mark; int quiet, max; if (argc != 3) errx(1, "Usage: %s <quiet> <max>", argv[0]); quiet=atoi(argv[1]); max=atoi(argv[2]); gettimeofday(&tv, &tz); time_last = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t)tv.tv_usec; time_mark = 0LL; for (;;) { gettimeofday(&tv, &tz); time_now = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t) tv.tv_usec; if (!quiet) { printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } else { if ((time_now-time_last) > max) { if (time_mark == 0) time_mark = time_now; printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } } time_last = time_now; } } /* main */
While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severly degrade network performance every 30.99 seconds. Packets appear to make it as far as ether_input() then get lost. Test setup: A - ethernet_switch - B A sends UDP packets to B through an ethernet switch. The interface input packet count and output packet count on the switch match what A is sending and B should be receiving. A UDP receiver running on B sees windows of packet loss with a period of 30.99 seconds. The lost packets are counted based on an incrementing sequence number. On an isolated network the <Link#1> Ipkts counter on B matches what A is sending, but the packets never show up in any of the IP/UDP counters or the program trying to receive them. This behavior can be seen with both em and fxp interfaces. Problem is it only occurs after the receiving host has been up about a day. Reboot, problem clears. GENERIC kernel, nothing more than default daemons running. Behavior seen on three different motherboards so far. It also appears this is not just lost network interrupts. Whatever is spinning in the kernel also impacts syscall latency. An easy way to replicate what I'm seeing is to run gettimeofday() in a tight loop and note when the real time syscall delay exceeds some value (which is dependent on processor speed). As an example on an 3.20GHz CPU a small program will output when the syscall latency is > 5000 usecs. Note the periodic behavior at 30.99 seconds. These big jumps in latency correspond to when packets are being dropped. usecs (epoch) latency diff ------------------------------------------------ 1197861705805078 478199 0 1197861721012298 25926 15207220 1197861726332036 11729 5319738 1197861757331549 11691 30999513 1197861788331266 11878 30999717 1197861819330647 11708 30999381 1197861850330192 11698 30999545 1197861881329733 11667 30999541 1197861900018297 6516 18688564 1197861912329282 11684 12310985 1197861943328849 11699 30999567 1197861974328413 11692 30999564 1197862005328228 11916 30999815 1197862036327598 11684 30999370 1197862067327229 11680 30999631 1197862098326860 11667 30999631 1197862129326559 11704 30999699 1197862160326377 11844 30999818 1197862191325890 11674 30999513 (output from packet loss tester) window_start/window_end is packet counter time_start/time_end is absolute time in usecs. window_diff is # of packets missing The test is run at about 15.5Kpps / 132Mbits/second, certainly a lot less than this hardware is capable of running BSD4.X. :missing window_start=311510, time_start=1197861726332008,window_end=311638, time_end=1197861726332011, window_diff=128, time_diff=3 :missing window_start=794482, time_start=1197861757331505,window_end=794609, time_end=1197861757331509, window_diff=127, time_diff=4 :missing window_start=1277313, time_start=1197861788331245,window_end=1277444, time_end=1197861788331249, window_diff=131, time_diff=4 :missing window_start=1760104, time_start=1197861819330625,window_end=1760232, time_end=1197861819330629, window_diff=128, time_diff=4 :missing window_start=2242789, time_start=1197861850330170,window_end=2242916, time_end=1197861850330174, window_diff=127, time_diff=4 :missing window_start=2725818, time_start=1197861881329712,window_end=2725946, time_end=1197861881329715, window_diff=128, time_diff=3 :missing window_start=3208594, time_start=1197861912329261,window_end=3208722, time_end=1197861912329264, window_diff=128, time_diff=3 :missing window_start=3691395, time_start=1197861943328802,window_end=3691522, time_end=1197861943328805, window_diff=127, time_diff=3 :missing window_start=4173793, time_start=1197861974328369,window_end=4173921, time_end=1197861974328373, window_diff=128, time_diff=4 :missing window_start=4656236, time_start=1197862005328176,window_end=4656367, time_end=1197862005328179, window_diff=131, time_diff=3 :missing window_start=5139197, time_start=1197862036327576,window_end=5139325, time_end=1197862036327580, window_diff=128, time_diff=4 :missing window_start=5621958, time_start=1197862067327208,window_end=5622085, time_end=1197862067327211, window_diff=127, time_diff=3 :missing window_start=6104597, time_start=1197862098326839,window_end=6104725, time_end=1197862098326843, window_diff=128, time_diff=4 :missing window_start=6587241, time_start=1197862129326514,window_end=6587369, time_end=1197862129326534, window_diff=128, time_diff=20 :missing window_start=7070051, time_start=1197862160326368,window_end=7070183, time_end=1197862160326371, window_diff=132, time_diff=3 :missing window_start=7552828, time_start=1197862191325873,window_end=7552954, time_end=1197862191325876, window_diff=126, time_diff=3 :missing window_start=8035434, time_start=1197862222325572,window_end=8035560, time_end=1197862222325576, window_diff=126, time_diff=4 I'm building a more up to date copy of RELENG_6 to make sure I'm not chasing something that's been fixed. As a side note this appears to also be happening on a RELENG_6 build dated Mar 11 2007. Included is the gettimeofday() looper. Run as ./a.out 1 5000, where 5000 will depend on your system speed. This probably won't provide any meaningful results on a loaded system. E-mail me off list for a copy of the packet tester or more diagnostics. #include <sys/time.h> #include <sys/types.h> #include <errno.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <err.h> main(int argc, char **argv) { struct timeval tv; struct timezone tz; u_int64_t time_now, time_last, time_mark; int quiet, max; if (argc != 3) errx(1, "Usage: %s <quiet> <max>", argv[0]); quiet=atoi(argv[1]); max=atoi(argv[2]); gettimeofday(&tv, &tz); time_last = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t)tv.tv_usec; time_mark = 0LL; for (;;) { gettimeofday(&tv, &tz); time_now = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t) tv.tv_usec; if (!quiet) { printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } else { if ((time_now-time_last) > max) { if (time_mark == 0) time_mark = time_now; printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } } time_last = time_now; } } /* main */
While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severly degrade network performance every 30.99 seconds. Packets appear to make it as far as ether_input() then get lost. Test setup: A - ethernet_switch - B A sends UDP packets to B through an ethernet switch. The interface input packet count and output packet count on the switch match what A is sending and B should be receiving. A UDP receiver running on B sees windows of packet loss with a period of 30.99 seconds. The lost packets are counted based on an incrementing sequence number. On an isolated network the <Link#1> Ipkts counter on B matches what A is sending, but the packets never show up in any of the IP/UDP counters or the program trying to receive them. This behavior can be seen with both em and fxp interfaces. Problem is it only occurs after the receiving host has been up about a day. Reboot, problem clears. GENERIC kernel, nothing more than default daemons running. Behavior seen on three different motherboards so far. It also appears this is not just lost network interrupts. Whatever is spinning in the kernel also impacts syscall latency. An easy way to replicate what I'm seeing is to run gettimeofday() in a tight loop and note when the real time syscall delay exceeds some value (which is dependent on processor speed). As an example on an 3.20GHz CPU a small program will output when the syscall latency is > 5000 usecs. Note the periodic behavior at 30.99 seconds. These big jumps in latency correspond to when packets are being dropped. usecs (epoch) latency diff ------------------------------------------------ 1197861705805078 478199 0 1197861721012298 25926 15207220 1197861726332036 11729 5319738 1197861757331549 11691 30999513 1197861788331266 11878 30999717 1197861819330647 11708 30999381 1197861850330192 11698 30999545 1197861881329733 11667 30999541 1197861900018297 6516 18688564 1197861912329282 11684 12310985 1197861943328849 11699 30999567 1197861974328413 11692 30999564 1197862005328228 11916 30999815 1197862036327598 11684 30999370 1197862067327229 11680 30999631 1197862098326860 11667 30999631 1197862129326559 11704 30999699 1197862160326377 11844 30999818 1197862191325890 11674 30999513 (output from packet loss tester) window_start/window_end is packet counter time_start/time_end is absolute time in usecs. window_diff is # of packets missing The test is run at about 15.5Kpps / 132Mbits/second, certainly a lot less than this hardware is capable of running BSD4.X. :missing window_start=311510, time_start=1197861726332008,window_end=311638, time_end=1197861726332011, window_diff=128, time_diff=3 :missing window_start=794482, time_start=1197861757331505,window_end=794609, time_end=1197861757331509, window_diff=127, time_diff=4 :missing window_start=1277313, time_start=1197861788331245,window_end=1277444, time_end=1197861788331249, window_diff=131, time_diff=4 :missing window_start=1760104, time_start=1197861819330625,window_end=1760232, time_end=1197861819330629, window_diff=128, time_diff=4 :missing window_start=2242789, time_start=1197861850330170,window_end=2242916, time_end=1197861850330174, window_diff=127, time_diff=4 :missing window_start=2725818, time_start=1197861881329712,window_end=2725946, time_end=1197861881329715, window_diff=128, time_diff=3 :missing window_start=3208594, time_start=1197861912329261,window_end=3208722, time_end=1197861912329264, window_diff=128, time_diff=3 :missing window_start=3691395, time_start=1197861943328802,window_end=3691522, time_end=1197861943328805, window_diff=127, time_diff=3 :missing window_start=4173793, time_start=1197861974328369,window_end=4173921, time_end=1197861974328373, window_diff=128, time_diff=4 :missing window_start=4656236, time_start=1197862005328176,window_end=4656367, time_end=1197862005328179, window_diff=131, time_diff=3 :missing window_start=5139197, time_start=1197862036327576,window_end=5139325, time_end=1197862036327580, window_diff=128, time_diff=4 :missing window_start=5621958, time_start=1197862067327208,window_end=5622085, time_end=1197862067327211, window_diff=127, time_diff=3 :missing window_start=6104597, time_start=1197862098326839,window_end=6104725, time_end=1197862098326843, window_diff=128, time_diff=4 :missing window_start=6587241, time_start=1197862129326514,window_end=6587369, time_end=1197862129326534, window_diff=128, time_diff=20 :missing window_start=7070051, time_start=1197862160326368,window_end=7070183, time_end=1197862160326371, window_diff=132, time_diff=3 :missing window_start=7552828, time_start=1197862191325873,window_end=7552954, time_end=1197862191325876, window_diff=126, time_diff=3 :missing window_start=8035434, time_start=1197862222325572,window_end=8035560, time_end=1197862222325576, window_diff=126, time_diff=4 I'm building a more up to date copy of RELENG_6 to make sure I'm not chasing something that's been fixed. As a side note this appears to also be happening on a RELENG_6 build dated Mar 11 2007. Included is the gettimeofday() looper. Run as ./a.out 1 5000, where 5000 will depend on your system speed. This probably won't provide any meaningful results on a loaded system. E-mail me off list for a copy of the packet tester or more diagnostics. #include <sys/time.h> #include <sys/types.h> #include <errno.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <err.h> main(int argc, char **argv) { struct timeval tv; struct timezone tz; u_int64_t time_now, time_last, time_mark; int quiet, max; if (argc != 3) errx(1, "Usage: %s <quiet> <max>", argv[0]); quiet=atoi(argv[1]); max=atoi(argv[2]); gettimeofday(&tv, &tz); time_last = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t)tv.tv_usec; time_mark = 0LL; for (;;) { gettimeofday(&tv, &tz); time_now = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t) tv.tv_usec; if (!quiet) { printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } else { if ((time_now-time_last) > max) { if (time_mark == 0) time_mark = time_now; printf("%llu %llu %llu\n", time_now, time_now-time_last, time_now-time_mark); time_mark = time_now; } } time_last = time_now; } } /* main */
One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of them?) are getting synced (i.e. calling ffs_syncvnode()) every time. This should normally only happen for dirty vnodes. I suspect that something is broken with this check: if (vp->v_type == VNON || ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && vp->v_bufobj.bo_dirty.bv_cnt == 0)) { VI_UNLOCK(vp); continue; } ...like the i_flag flags aren't ever getting properly cleared (or bv_cnt is always non-zero). ...but I don't have the time to chase this down. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities.
> While trying to diagnose a packet loss problem in a RELENG_6 snapshot > dated > November 8, 2007 it looks like I've stumbled across a broken driver or > kernel routine which stops interrupt processing long enough to severly > degrade network performance every 30.99 seconds.I noticed this as well some time ago. The problem has to do with the processing (syncing) of vnodes. When the total number of allocated vnodes in the system grows to tens of thousands, the ~31 second periodic sync process takes a long time to run. Try this patch and let people know if it helps your problem. It will periodically wait for one tick (1ms) every 500 vnodes of processing, which will allow other things to run. Index: ufs/ffs/ffs_vfsops.c ==================================================================RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v retrieving revision 1.290.2.16 diff -c -r1.290.2.16 ffs_vfsops.c *** ufs/ffs/ffs_vfsops.c 9 Oct 2006 19:47:17 -0000 1.290.2.16 --- ufs/ffs/ffs_vfsops.c 25 Apr 2007 01:58:15 -0000 *************** *** 1109,1114 **** --- 1109,1115 ---- int softdep_deps; int softdep_accdeps; struct bufobj *bo; + int flushed_count = 0; fs = ump->um_fs; if (fs->fs_fmod != 0 && fs->fs_ronly != 0) { /* XXX */ *************** *** 1174,1179 **** --- 1175,1184 ---- allerror = error; vput(vp); MNT_ILOCK(mp); + if (flushed_count++ > 500) { + flushed_count = 0; + msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1); + } } MNT_IUNLOCK(mp); /* -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities.
David G Lawrence wrote:>> Try it with "find / -type f >/dev/null" to duplicate the problem almost >> instantly. >> > > FreeBSD used to have some code that would cause vnodes with no cached > pages to be recycled quickly (which would have made a simple find > ineffective without reading the files at least a little bit). I guess > that got removed when the size of the vnode pool was dramatically > increased. >You can decrease vfs.wantfreevnodes if caching files without cached data is not beneficial for your application.> -DG > > David G. Lawrence > President > Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 > The FreeBSD Project - http://www.freebsd.org > Pave the road of life with opportunities. > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > >
On Wed, 19 Dec 2007, David G Lawrence wrote:>> Try it with "find / -type f >/dev/null" to duplicate the problem almost >> instantly. > > FreeBSD used to have some code that would cause vnodes with no cached > pages to be recycled quickly (which would have made a simple find > ineffective without reading the files at least a little bit). I guess > that got removed when the size of the vnode pool was dramatically > increased.It might still. The data should be cached somewhere, but caching it in both the buffer cache/VMIO and the vnode/inode is wasteful. I may have been only caching vnodes for directories. I switched to using a find or a tar on /home/ncvs/ports since that has a very high density of directories. Bruce
On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote:> On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: >> The uio_yield() idea did not work. Still have the same 31 second >> interval packet loss. > What patch you have used ?This is hand applied from the diff you sent December 19, 2007 1:24:48 PM EST sr1400-ar0.eng:/usr/src/sys/ufs/ffs# diff -c ffs_vfsops.c ffs_vfsops.c.orig *** ffs_vfsops.c Fri Dec 21 21:08:39 2007 --- ffs_vfsops.c.orig Sat Dec 22 00:51:22 2007 *************** *** 1107,1113 **** struct ufsmount *ump = VFSTOUFS(mp); struct fs *fs; int error, count, wait, lockreq, allerror = 0; - int yield_count; int suspend; int suspended; int secondary_writes; --- 1107,1112 ---- *************** *** 1148,1154 **** softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps); MNT_ILOCK(mp); - yield_count = 0; MNT_VNODE_FOREACH(vp, mp, mvp) { /* * Depend on the mntvnode_slock to keep things stable enough --- 1147,1152 ---- *************** *** 1166,1177 **** (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && vp->v_bufobj.bo_dirty.bv_cnt == 0)) { VI_UNLOCK(vp); - if (yield_count++ == 100) { - MNT_IUNLOCK(mp); - yield_count = 0; - uio_yield(); - goto relock_mp; - } continue; } MNT_IUNLOCK(mp); --- 1164,1169 ---- *************** *** 1186,1192 **** if ((error = ffs_syncvnode(vp, waitfor)) != 0) allerror = error; vput(vp); - relock_mp: MNT_ILOCK(mp); } MNT_IUNLOCK(mp); --- 1178,1183 ----> > Lets check whether the syncer is the culprit for you. > Please, change the value of the syncdelay at the sys/kern/vfs_subr.c > around the line 238 from 30 to some other value, e.g., 45. After that, > check the interval of the effect you have observed.Changed it to 13. Not sure if SYNCER_MAXDELAY also needed to be increased if syncdelay was increased. static int syncdelay = 13; /* max time to delay syncing data */ Test: ; use vnodes % find / -type f -print > /dev/null ; verify % sysctl vfs.numvnodes vfs.numvnodes: 32128 ; run packet loss test now have periodic loss every 13994633us (13.99 seconds). ; reduce # of vnodes with sysctl kern.maxvnodes=1000 test now runs clean.> > It would be interesting to check whether completely disabling the > syncer > eliminates the packet loss, but such system have to be operated with > extreme caution. >
On Sat, Dec 22, 2007 at 01:28:31AM -0500, Mark Fullmer wrote:> > On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: > >Lets check whether the syncer is the culprit for you. > >Please, change the value of the syncdelay at the sys/kern/vfs_subr.c > >around the line 238 from 30 to some other value, e.g., 45. After that, > >check the interval of the effect you have observed. > > Changed it to 13. Not sure if SYNCER_MAXDELAY also needed to be > increased if syncdelay was increased. > > static int syncdelay = 13; /* max time to delay syncing > data */ > > Test: > > ; use vnodes > % find / -type f -print > /dev/null > > ; verify > % sysctl vfs.numvnodes > vfs.numvnodes: 32128 > > ; run packet loss test > now have periodic loss every 13994633us (13.99 seconds). > > ; reduce # of vnodes with sysctl kern.maxvnodes=1000 > test now runs clean.Definitely syncer.> > > >It would be interesting to check whether completely disabling the > >syncer > >eliminates the packet loss, but such system have to be operated with > >extreme caution.Ok, no need to do this. As Bruce Evans noted, there is a vfs_msync() that do almost the same traversal of the vnodes. It was missed in the previous patch. Try this one. diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 3c2e1ed..6515d6a 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -2967,7 +2967,9 @@ vfs_msync(struct mount *mp, int flags) { struct vnode *vp, *mvp; struct vm_object *obj; + int yield_count; + yield_count = 0; MNT_ILOCK(mp); MNT_VNODE_FOREACH(vp, mp, mvp) { VI_LOCK(vp); @@ -2996,6 +2998,12 @@ vfs_msync(struct mount *mp, int flags) MNT_ILOCK(mp); } else VI_UNLOCK(vp); + if (yield_count++ == 500) { + MNT_IUNLOCK(mp); + yield_count = 0; + uio_yield(); + MNT_ILOCK(mp); + } } MNT_IUNLOCK(mp); } diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index cbccc62..9e8b887 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -1182,6 +1182,7 @@ ffs_sync(mp, waitfor, td) int secondary_accwrites; int softdep_deps; int softdep_accdeps; + int yield_count; struct bufobj *bo; fs = ump->um_fs; @@ -1216,6 +1217,7 @@ loop: softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps); MNT_ILOCK(mp); + yield_count = 0; MNT_VNODE_FOREACH(vp, mp, mvp) { /* * Depend on the mntvnode_slock to keep things stable enough @@ -1233,6 +1235,12 @@ loop: (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && vp->v_bufobj.bo_dirty.bv_cnt == 0)) { VI_UNLOCK(vp); + if (yield_count++ == 500) { + MNT_IUNLOCK(mp); + yield_count = 0; + uio_yield(); + MNT_ILOCK(mp); + } continue; } MNT_IUNLOCK(mp); -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20071222/4987d98f/attachment.pgp