Дмитрий Долбнин
2015-Oct-29 01:06 UTC
Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Good day, everyone!

From my point of view, it looks like you are experiencing degraded hardware performance, and that is what causes the problems you describe. At the very least, try switching to a new power supply. Why do I think so? Because faulty power supplies are encountered much more often than faulty FreeBSD source code. Of course, I can't say that you're completely wrong.

Best regards,
Dimitry.

>Wednesday, 28 October 2015, 12:00 UTC, from freebsd-stable-request at freebsd.org:
>
>Send freebsd-stable mailing list submissions to
>freebsd-stable at freebsd.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>or, via email, send a message with subject or body 'help' to
>freebsd-stable-request at freebsd.org
>
>You can reach the person managing the list at
>freebsd-stable-owner at freebsd.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of freebsd-stable digest..."
>
>
>Today's Topics:
>
>   1. Re: Stuck processes in unkillable (STOP) state, listen queue
>      overflow (Zara Kanaeva)
>   2. Re: Stuck processes in unkillable (STOP) state, listen queue
>      overflow (Nagy, Attila)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 27 Oct 2015 14:42:42 +0100
>From: Zara Kanaeva < zara.kanaeva at ggi.uni-tuebingen.de >
>To: freebsd-stable at freebsd.org
>Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>overflow
>Message-ID:
>< 20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN at webmail.uni-tuebingen.de >
>
>Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes
>
>Hello,
>
>I have had the same experience with Apache and MapServer. It happens on
>a physical machine and ends with a spontaneous reboot. This machine was
>upgraded from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
>this machine doesn't have enough RAM (only 8 GB), but I don't think that
>should be a reason for a spontaneous reboot.
>
>I didn't see this behavior on the same machine with FreeBSD 9.0-RELEASE
>(though I am not 100% sure; I have had no chance to test it yet).
>
>Regards, Z. Kanaeva.
>
>Quoting "Nagy, Attila" < bra at fsn.hu >:
>
>> Hi,
>>
>> Recently I've started to see a lot of cases where the log is full
>> of "listen queue overflow" messages and the process behind the
>> network socket is unavailable.
>> When I open a TCP connection to it, it opens but nothing happens (for
>> example, I get no SMTP banner from postfix, nor a log entry about the
>> new connection).
>>
>> I've seen this with Java programs, postfix and redis: basically
>> everything that opens a TCP socket and listens on the machine.
>>
>> For example, I have a redis process which listens on 6381. When I
>> telnet into it, the TCP connection opens, but the program doesn't
>> respond.
>> When I kill it, nothing happens. Even kill -9 only yields this state:
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
>>   776 redis       2  20    0 24112K  2256K STOP    3  16:56   0.00% redis-
>>
>> When I tcpdrop the connections of the process, tcpdrop reports
>> success the first time and failure the second time (No such
>> process), but the connections remain:
>> # sockstat -4 | grep 776
>> redis    redis-serv 776   6  tcp4   *:6381                *:*
>> redis    redis-serv 776   9  tcp4   *:16381               *:*
>> redis    redis-serv 776   10 tcp4   127.0.0.1:16381       127.0.0.1:10460
>> redis    redis-serv 776   11 tcp4   127.0.0.1:16381       127.0.0.1:35795
>> redis    redis-serv 776   13 tcp4   127.0.0.1:30027       127.0.0.1:16379
>> redis    redis-serv 776   14 tcp4   127.0.0.1:58802       127.0.0.1:16384
>> redis    redis-serv 776   17 tcp4   127.0.0.1:16381       127.0.0.1:24354
>> redis    redis-serv 776   18 tcp4   127.0.0.1:16381       127.0.0.1:56999
>> redis    redis-serv 776   19 tcp4   127.0.0.1:16381       127.0.0.1:39488
>> redis    redis-serv 776   20 tcp4   127.0.0.1:6381        127.0.0.1:39491
>> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
>> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
>> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
>> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
>> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
>> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
>> # sockstat -4 | grep 776
>> redis    redis-serv 776   6  tcp4   *:6381                *:*
>> redis    redis-serv 776   9  tcp4   *:16381               *:*
>> redis    redis-serv 776   10 tcp4   127.0.0.1:16381       127.0.0.1:10460
>> redis    redis-serv 776   11 tcp4   127.0.0.1:16381       127.0.0.1:35795
>> redis    redis-serv 776   13 tcp4   127.0.0.1:30027       127.0.0.1:16379
>> redis    redis-serv 776   14 tcp4   127.0.0.1:58802       127.0.0.1:16384
>> redis    redis-serv 776   17 tcp4   127.0.0.1:16381       127.0.0.1:24354
>> redis    redis-serv 776   18 tcp4   127.0.0.1:16381       127.0.0.1:56999
>> redis    redis-serv 776   19 tcp4   127.0.0.1:16381       127.0.0.1:39488
>> redis    redis-serv 776   20 tcp4   127.0.0.1:6381        127.0.0.1:39491
>>
>> $ procstat -k 776
>>   PID    TID COMM             TDNAME           KSTACK
>>   776 100725 redis-server     -                mi_switch sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall Xfast_syscall
>>   776 100744 redis-server     -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
>>
>> Nothing I try gets it out of this state; only a reboot helps.
>>
>> The OS is stable/10 at r289313, but I have observed this behaviour
>> with earlier releases too.
>>
>> dmesg is full of lines like these:
>> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193 already in queue awaiting acceptance (3142 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3068 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3057 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3037 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3015 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3035 occurrences)
>>
>> I guess this is the effect of the process freeze, not the cause (the
>> listen queue fills up because the app can't handle the incoming
>> connections).
>>
>> I'm not sure it matters, but some of the machines (including the one
>> above) run on an ESX hypervisor (though as far as I remember, I have
>> seen this on physical machines too; I'm not sure about that).
>> Also, so far I have only seen this where some "exotic" software ran,
>> such as Java- or Erlang-based servers (OpenDJ, Elasticsearch and
>> RabbitMQ).
>>
>> I'm also not sure what triggers this. I've never seen it after only
>> a few hours of uptime; at least a few days or a week must pass
>> before a process gets stuck like the above.
>>
>> Any ideas about this?
>>
>> Thanks,
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to " freebsd-stable-unsubscribe at freebsd.org "
>
>
>
>--
>Dipl.-Inf. Zara Kanaeva
>Heidelberger Akademie der Wissenschaften
>Forschungsstelle "The role of culture in early expansions of humans"
>an der Universität Tübingen
>Geographisches Institut
>Universität Tübingen
>Ruemelinstr. 19-23
>72070 Tuebingen
>
>Tel.: +49-(0)7071-2972132
>e-mail: zara.kanaeva at geographie.uni-tuebingen.de
>-------
>- Theory is when you know something but it doesn't work.
>- Practice is when something works but you don't know why.
>- Usually we combine theory and practice:
>  Nothing works and we don't know why.
>
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 27 Oct 2015 17:25:01 +0100
>From: "Nagy, Attila" < bra at fsn.hu >
>To: Zara Kanaeva < zara.kanaeva at ggi.uni-tuebingen.de >,
>freebsd-stable at freebsd.org
>Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>overflow
>Message-ID: < 562FA55D.6050503 at fsn.hu >
>Content-Type: text/plain; charset=utf-8; format=flowed
>
>Hi,
>
>(following top-posting)
>I have seen this with 16 and 32 GiB of RAM, but anyway, it shouldn't
>matter.
>Do you use ZFS? Although it doesn't seem to be stuck on I/O...
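The tcpdrop loop in the report feeds every `sockstat -4` line containing "776" into tcpdrop, so the two listening sockets (foreign address `*:*`) produce the getaddrinfo errors, and `grep 776` could also match unrelated lines (for example a port 17760). A slightly stricter variant might look like the sketch below. It assumes the seven-column sockstat layout shown in the report (USER COMMAND PID FD PROTO LOCAL FOREIGN), and the function name `tcpdrop_cmds_for_pid` is hypothetical. It only prints the commands so they can be reviewed first; it will not unstick a process that is already exiting, and in the report tcpdrop returned "No such process" for these connections anyway.

```sh
#!/bin/sh
# Hypothetical helper: read `sockstat -4` output on stdin and print one
# tcpdrop(8) command per established connection of the given PID.
# Assumes columns: USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS.
tcpdrop_cmds_for_pid() {
    # $3 == pid   : match the PID column exactly instead of grepping the line.
    # $7 != "*:*" : skip listening sockets, which have no foreign endpoint.
    awk -v pid="$1" '$3 == pid && $7 != "*:*" { print "tcpdrop " $6 " " $7 }'
}
```

For example, `sockstat -4 | tcpdrop_cmds_for_pid 776` prints the commands, which can then be piped to `/bin/sh` once checked.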
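In the `procstat -k 776` output, thread 100725 is sleeping in `kern_kevent`, while thread 100744 (the one that handled the signal) is in `exit1`, which has called `thread_single` and is waiting there for the process's other threads to stop. To look for other processes stuck the same way, a small filter over `procstat -k` output could be used. This is a sketch only: the function name `exiting_pids` is hypothetical, and matching on the `exit1` and `thread_single` frame names is an assumption based solely on the stack shown in the report.

```sh
#!/bin/sh
# Hypothetical helper: read `procstat -k` (or -kk) output on stdin and print
# the PIDs that have a thread whose kernel stack contains both exit1 and
# thread_single, i.e. a thread that began exiting and is waiting for the
# other threads of its process to stop.
exiting_pids() {
    awk '
        $1 == "PID" { next }               # skip the header line
        {
            in_exit = 0; in_single = 0
            # Accept plain frame names (-k) and name+offset frames (-kk).
            for (i = 2; i <= NF; i++) {
                if ($i == "exit1" || $i ~ /^exit1\+/) in_exit = 1
                if ($i == "thread_single" || $i ~ /^thread_single\+/) in_single = 1
            }
            if (in_exit && in_single) seen[$1] = 1
        }
        END { for (pid in seen) print pid }
    ' | sort -n
}
```

For example, `procstat -k -a | exiting_pids` would scan all processes on the host.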
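The figure of 193 in the sonewconn lines is consistent with the default listen queue limit. Assuming FreeBSD 10's `sonewconn()` treats a listen queue as overflowing when its length exceeds 3 * qlimit / 2 (integer division), and that qlimit is the `listen(2)` backlog clamped to `kern.ipc.somaxconn` (default 128), the first length that triggers the message is 3 * 128 / 2 + 1 = 193. That fits a process that simply stopped calling accept(), as the report suggests; a larger requested backlog, such as Redis's default `tcp-backlog` of 511, would still be clamped to 128. The helper names below are hypothetical and only restate that arithmetic.

```sh
#!/bin/sh
# Sketch under stated assumptions (hypothetical helpers, not FreeBSD tools):
#  - listen(2) clamps the backlog to kern.ipc.somaxconn to obtain qlimit;
#  - sonewconn() reports an overflow once qlen > 3 * qlimit / 2
#    (integer division).

# $1 = backlog passed to listen(2), $2 = kern.ipc.somaxconn
effective_qlimit() {
    if [ "$1" -lt 0 ] || [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# $1 = qlimit; prints the first queue length that counts as an overflow
first_overflow_qlen() {
    echo $(( 3 * $1 / 2 + 1 ))
}
```

For example, `first_overflow_qlen "$(effective_qlimit 511 128)"` prints 193. On a live system, `sysctl kern.ipc.somaxconn` shows the limit and `netstat -Lan` shows the current queue lengths of each listening socket.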
Zara Kanaeva
2015-Oct-29 15:46 UTC
Stuck processes in unkillable (STOP) state, listen queue overflow
Hello Дмитрий,

Thank you very much for your message. First of all: I like FreeBSD (the installation logic, the good documentation, etc.); this is why I use FreeBSD as a server OS. But in my case I must disagree with your theoretical probability argument. I have one machine (7 years old) that has had 1-2 spontaneous reboots a year. On it I got a lot of "already in queue awaiting acceptance" errors, and the machine rebooted immediately after them. I will soon get a replacement for this old machine with at least 32 GB of RAM and (of course) a new power supply, so I will see whether my problem (perhaps it is only my problem) still persists.

Greetings,
Z. Kanaeva.