CentOS 4.1/bind-9.2.4-2.

I have named serving as a cache DNS server plus SOA for a local intranet zone.

The problem I am encountering - over a period of time it stops responding to queries.

nmap scan from a different host shows port 53 is visible. I can telnet to the port but all queries to server time out. So much so that "service named status" and "service named restart" hang. I have to manually kill the named process before I am able to start named again (I do remove the lock/pid files manually as well). This has occurred about 4 times since I installed CentOS 4.1 4 weeks ago. I have not encountered any problem with other services running on the same server.

I looked through /var/log/messages and did not find any errors logged by named. I'd appreciate any thoughts/suggestions to debug this problem.

Here is what I have tried so far to figure out the problem:

(from 192.168.1.150)
$ host www.yahoo.com 192.168.1.21
;; connection timed out; no servers could be reached

# nmapfe of 192.168.1.21 (from 192.168.1.150)
(The 1208 ports scanned but not shown below are in state: closed)
PORT    STATE  SERVICE
22/tcp  open   ssh
25/tcp  open   smtp
53/tcp  open   domain

(ssh'd into named server using IP# 192.168.1.21)
# service named status
rndc: recv failed: operation canceled

TIA,
--
Arun Khan
Linux is like a wigwam - no gates, no windows, apache inside
On Wed, 2005-08-24 at 10:34, Arun K. Khan wrote:
> CentOS 4.1/bind-9.2.4-2.
>
> I have named serving as a cache DNS server plus SOA for a local intranet
> zone.
>
> The problem I am encountering - over a period of time it stops
> responding to queries.

> (from 192.168.1.150)
> $ host www.yahoo.com 192.168.1.21
> ;; connection timed out; no servers could be reached
>
> # nmapfe of 192.168.1.21 (from 192.168.1.150)
> (The 1208 ports scanned but not shown below are in state: closed)
> PORT    STATE  SERVICE
> 22/tcp  open   ssh
> 25/tcp  open   smtp
> 53/tcp  open   domain
>
> (ssh'd into named server using IP# 192.168.1.21)
> # service named status
> rndc: recv failed: operation canceled

It looks like it can't reach the root servers. It has a private address - could you have a problem with your NAT gateway to the internet? How about your local firewalling on 53/udp to let the responses back?

--
Les Mikesell
lesmikesell at gmail.com
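One way to tell a 53/udp firewall problem apart from a hung named is to send the same query over UDP and over TCP. The nmap scan above only shows 53/tcp, while an ordinary "host" lookup goes out over UDP. The following is a minimal sketch, assuming dig (from bind-utils) is installed on the client; check_dns is a hypothetical helper name, not something shipped with BIND.

```shell
#!/bin/sh
# Sketch: compare DNS over UDP and over TCP against one server.
# check_dns is a hypothetical helper name used for illustration only.
check_dns() {
    server=$1
    name=$2
    if [ -z "$server" ] || [ -z "$name" ]; then
        echo "usage: check_dns <server-ip> <name>" >&2
        return 64
    fi
    if ! command -v dig >/dev/null 2>&1; then
        echo "dig not found (it is part of the bind-utils package)" >&2
        return 127
    fi
    # Ordinary lookups use UDP: one attempt, 3 second timeout.
    # dig exits 0 when any reply arrives, even an error reply.
    if dig @"$server" "$name" +time=3 +tries=1 >/dev/null 2>&1; then
        echo "udp: answered"
    else
        echo "udp: no reply"
    fi
    # Same query forced over TCP, the transport a TCP port scan exercises.
    if dig @"$server" "$name" +tcp +time=3 +tries=1 >/dev/null 2>&1; then
        echo "tcp: answered"
    else
        echo "tcp: no reply"
    fi
    return 0
}
```

With the addresses from this thread it would be called as: check_dns 192.168.1.21 www.yahoo.com. If TCP answers but UDP does not, packet filtering on 53/udp is a plausible suspect ("iptables -L -n -v" on the server lists the rules). If neither answers while the port still accepts connections, the named process itself is more likely stuck.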
Arun K. Khan wrote:
>
> I looked through /var/log/messages and did not find any errors logged by
> named. I'd appreciate any thoughts/suggestions to debug this problem.
>

Hmmm, have you used 'lsof' or 'strace' to see what the process is doing? First get the process ID of the named process (pgrep named), you'll need this for the other two.

'lsof -p {process ID}' will show you all of the file handles the process has open, including any shared libraries and network connections.

'strace -p {process ID}' will allow you to watch the system calls the process is executing as it is running. If it's just sitting there, it might be tough to figure out, but if you watch it and see that it is looping on some call that might give you a hint.

You can also start browsing through the /proc filesystem for information about that process, start by getting a directory listing of '/proc/{process ID}/'.

Just a thought!
--
Jay Leafey - Memphis, TN
jay.leafey at mindless.com
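The checks suggested above can be collected into a small script to run the next time named hangs. This is only a sketch, assuming a Linux host with /proc mounted; inspect_pid is a hypothetical helper name, and the lsof and strace steps are left as comments because they are best run interactively as root.

```shell
#!/bin/sh
# Sketch: snapshot what a (possibly hung) process is doing via /proc.
# inspect_pid is a hypothetical helper name used for illustration only.
inspect_pid() {
    pid=$1
    if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Process state: R running, S sleeping, D uninterruptible wait, ...
    grep '^State:' "/proc/$pid/status"
    # Number of open file descriptors (files, sockets, pipes).
    echo "open fds: $(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
    # Kernel function the process is sleeping in, where readable.
    if [ -r "/proc/$pid/wchan" ]; then
        echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
    fi
    return 0
}

# Interactive follow-ups for the same PID (root is needed for a
# process owned by another user):
#   lsof -p "$pid"      # open files, shared libraries, sockets
#   strace -p "$pid"    # live system calls; Ctrl-C detaches
```

For named it could be called as: inspect_pid "$(pgrep -o -x named)", where -o picks the oldest matching process in case pgrep reports more than one.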
I've encountered the same problems using the 4.1 SRPMs rebuilt for a RH9 machine I haven't had a chance to port to Centos 4. I've made about as much progress on the issue as you have. I've got pretty extensive logging in named turned on and queries just stop running at a certain point and it hangs until I kill -9 the process.

----- Original Message -----
From: "Arun K. Khan" <knura at yahoo.com>
To: "CentOS mailing list" <centos at centos.org>
Sent: Wednesday, August 24, 2005 10:34 AM
Subject: [CentOS] named is up but does not respond to queries

> CentOS 4.1/bind-9.2.4-2.
>
> I have named serving as a cache DNS server plus SOA for a local intranet
> zone.
>
> The problem I am encountering - over a period of time it stops
> responding to queries.
On Wed, 2005-08-24 at 11:55 -0500, Jay Leafey wrote:
> Hmmm, have you used 'lsof' or 'strace' to see what the process is doing?

Thanks for the suggestions Jay, I will certainly try them out the next time named decides to go south on me :(