'K, not sure how/what to debug here ...

Doing a grep of svr1.postgresql.org in /proc/*/status shows all the
processes 'stuck' in inode ...

/proc/38750/status:inetd 38750 2072 2072 2072 -1,-1 noflags 1055120147,191009 0,0 0,592 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org
/proc/38752/status:inetd 38752 2072 2072 2072 -1,-1 noflags 1055120154,886433 0,0 0,637 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org
/proc/38753/status:inetd 38753 2072 2072 2072 -1,-1 noflags 1055120155,641964 0,0 0,610 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org
/proc/38806/status:inetd 38806 2072 2072 2072 -1,-1 noflags 1055120188,905284 0,0 0,789 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org
/proc/38863/status:inetd 38863 2072 2072 2072 -1,-1 noflags 1055120257,763084 0,0 0,656 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org

Jun  8 22:47:00 jupiter root: =============================
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:00 jupiter root: 10:47PM  up 1 day, 23:16, 15 users, load averages: 0.04, 0.14, 0.25
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:00 jupiter root: debug.numvnodes: 461971 - debug.freevnodes: 279962 - debug.vnlru_nowhere: 0 - vlruwt
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:00 jupiter root: unpath  8808   152K  1199K  204800K  8061718    0    0  16,32,64
Jun  8 22:47:00 jupiter root: temp   16340  1961K  8737K  204800K 15369938    0    0  16,32,64,128,256,512,1K,2K,4K,8K,16K,128K
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:00 jupiter root: vm.kvm_size: 2139095040
Jun  8 22:47:00 jupiter root: vm.kvm_free: 914358272
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:00 jupiter root: 374/986/16384 mbuf clusters in use (current/peak/max)
Jun  8 22:47:00 jupiter root:
Jun  8 22:47:01 jupiter root: 162 svr1.postgresql.org:inetd
Jun  8 22:47:01 jupiter root:  62 -:postgres
Jun  8 22:47:01 jupiter root:  18 svr1.postgresql.org:httpd
Jun  8 22:47:01 jupiter root:  11 -:httpd
Jun  8 22:47:01 jupiter root:  10 svr1.postgresql.org:master
Jun  8 22:47:01 jupiter root:   8 -:csh
Jun  8 22:47:01 jupiter root:   7 svr1.postgresql.org:perl
Jun  8 22:47:01 jupiter root:   7 -:sshd
Jun  8 22:47:01 jupiter root:   6 digitalevejapan.org:httpd
Jun  8 22:47:01 jupiter root:   6 -:sh
Jun  8 22:47:01 jupiter root:
Jun  8 22:47:01 jupiter root: running processes: 963

I'm going to get the server rebooted, don't know if a ctl-alt-esc will get
us a core though, but will try ... assuming that it doesn't, is there
anything that I should run to get more information when something like
this happens?

Note that when this happens, I can't do an ls of /vm ... /vm isn't a
unionfs:

Filesystem          1K-blocks      Used    Avail Capacity  Mounted on
/dev/da0s1a            516062     84446   390332    18%    /
/dev/da0s1e           1032142        24   949548     0%    /tmp
/dev/da0s1f          10322414   7972110  1524512    84%    /usr
/dev/da0s1g           1032142     91890   857682    10%    /var
/dev/da0s1h         119837208  85749532 24500700    78%    /vm
procfs                      4         4        0   100%    /proc
/vm                 119837208  85749532 24500700    78%    /du
procfs                      4         4        0   100%    /vm/1/mall.pgsql.com/proc
<below>:/vm/.t/usr  239674416 205586740 24500700    89%    /vm/1/mall.pgsql.com/usr

Doing a 'df -t ufs' hangs as well, although a straight 'df' runs through
no problem ...

/vm doesn't have softupdates enabled ...

This is running a recent -STABLE:

FreeBSD 4.8-STABLE #1: Fri Jun 6 01:22:46 ADT 2003
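As a sketch of the kind of thing that could answer the "what should I run"
question: a small cron job along these lines would log each process's wait
channel plus the vnode/KVM counters to syslog, so the state survives a
forced reboot. The script name and logger tag here are made up; it assumes
only the stock ps(1), sysctl(8), vmstat(8), and logger(1) on 4.x.

    #!/bin/sh
    # hang-snapshot -- hypothetical helper: snapshot scheduler/VFS state
    # to syslog so a post-reboot look at /var/log/messages shows what
    # every process was sleeping on.
    TAG=hang-snapshot

    # Wait channel per process; the hung ones above all show 'inode'.
    ps -axo pid,ppid,stat,wchan,command | logger -t $TAG

    # Vnode pressure and KVM headroom, as in the existing cron output.
    sysctl debug.numvnodes debug.freevnodes debug.vnlru_nowhere | logger -t $TAG
    sysctl vm.kvm_size vm.kvm_free | logger -t $TAG

    # Kernel malloc zones (source of the unpath/temp lines above).
    vmstat -m | logger -t $TAG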
'K, neither ctl-alt-esc nor ctl-alt-del works at this stage, *but* I can
do a reboot from the command line ... but it hangs while rebooting ...

On Sun, 8 Jun 2003, Marc G. Fournier wrote:

> 'K, not sure how/what to debug here ...
>
> Doing a grep of svr1.postgresql.org in /proc/*/status shows all the
> processes 'stuck' in inode ...
>
[...]
>
> This is running a recent -STABLE:
>
> FreeBSD 4.8-STABLE #1: Fri Jun 6 01:22:46 ADT 2003

Marc G. Fournier           ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
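One thing worth noting here: ctl-alt-esc only drops into the debugger if
the kernel was built with DDB, and a core only survives the reboot if
there is a dump device configured for savecore(8) to read at next boot.
A sketch of the usual 4.x setup; the swap device name is an assumption
based on the df output above:

    # kernel config -- add this, then rebuild and install the kernel
    options DDB                 # in-kernel debugger; makes ctl-alt-esc work

    # /etc/rc.conf -- give savecore(8) somewhere to find and put the dump
    dumpdev="/dev/da0s1b"       # assumption: da0s1b is the swap partition
    dumpdir="/var/crash"        # where savecore writes the recovered core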
On Sun, Jun 08, 2003, Marc G. Fournier wrote:

> 'K, not sure how/what to debug here ...
>
> Doing a grep of svr1.postgresql.org in /proc/*/status shows all the
> processes 'stuck' in inode ...
>
> /proc/38750/status:inetd 38750 2072 2072 2072 -1,-1 noflags 1055120147,191009 0,0 0,592 inode 0 0 0,0,0,2,3,4,5,20,31 svr1.postgresql.org
[...]
> I'm going to get the server rebooted, don't know if a ctl-alt-esc will get
> us a core though, but will try ... assuming that it doesn't, is there
> anything that I should run to get more information when something like
> this happens?

A backtrace on the processes stuck in kernel mode might shed some light
on the deadlock.  (To get a backtrace of the process with pid N from
DDB, just say 'trace N'.)
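For illustration, a DDB session for this hang might look roughly like the
following; the pid comes from the /proc listing above, and the
annotations in parentheses are mine rather than debugger output:

    db> ps             (list processes; find the ones sleeping on 'inode')
    db> trace 38750    (backtrace one of the stuck inetd processes)
    db> panic          (optionally force a panic so savecore(8) gets a dump)
    db> continue       (or resume the system instead)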