Klaas wrote:
>> A reconnect that lasts less than one minute is counted as continuous outage.
> Do you mean 'more than one minute'?

I didn't describe it fully. If an outage is detected, the system attempts a reconnect immediately. If a reconnect is established but is lost again before a full minute of audio has played, I don't log it as a reconnect. Once a reconnect is established and holds for at least a minute, it is logged as being reconnected.

> One explanation could be that by coincidence they share some piece of networking equipment that failed on their network path to your icecast box while the Windows machines didn't.

I see that as highly unlikely. One of the Linux boxes sits next to the server, on the same switch that serves the connection to the ISP. All the other clients connect through ISPs that differ from one another, most of them commercial ISPs providing DSL service. A couple are on educational networks.

> Are the machines running a cron job to sync time (e.g. using ntpdate)?

Yes. Their clocks are updated hourly. That would make it possible for client-based interruptions to coincide, if they are clock-related.

> Could your detection procedure be fooled by a jump forward/backward in time?

Other than the logging time recorded on the client, I don't THINK that is likely. I have to admit, though, that I don't fully understand the meaning of the parameter I found for the MPG123 process (looked up by its PID) that seems to signal a disconnect. It is entirely possible that the parameter can be fooled by an adjustment to the clock. But if that were the cause, I would expect to see on-the-hour problems more frequently.

On the other hand, all the Linux clients poll the same time service for synchronization. If that service did something unexpected, that might explain something.

I appreciate your exploring this with me, but it is purely academic at this point and I'm probably boring the others.

--
Dick
dtrump1@triadav.com
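The logging rule Dick describes (an outage continues until a reconnect carries at least a minute of audio) can be sketched as below. This is not his monitor; it is a minimal illustration assuming a hypothetical event log of `(seconds, "up"/"down")` transitions per client, and the function name `log_outages` is invented for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical event format: (seconds since monitoring began, "up" or "down").
Event = Tuple[float, str]

MIN_HOLD = 60.0  # a reconnect must carry audio this long to be logged


def log_outages(events: List[Event], end_time: float) -> List[Tuple[float, Optional[float]]]:
    """Collapse raw up/down transitions into logged outages.

    Returns (outage_start, reconnected_at) pairs. A reconnect that is lost
    again in under MIN_HOLD seconds is folded into the ongoing outage;
    reconnected_at is None if no reconnect had held for MIN_HOLD by end_time.
    """
    outages: List[Tuple[float, Optional[float]]] = []
    outage_start: Optional[float] = None  # start of the outage being tracked
    pending_up: Optional[float] = None    # reconnect not yet confirmed

    for t, state in sorted(events):
        if state == "down":
            if outage_start is None:
                outage_start = t  # a fresh outage begins
            elif pending_up is not None and t - pending_up >= MIN_HOLD:
                # The last reconnect held long enough: log it, then open a new outage.
                outages.append((outage_start, pending_up))
                outage_start = t
            pending_up = None  # any unconfirmed reconnect is discarded
        elif state == "up" and outage_start is not None and pending_up is None:
            pending_up = t  # reconnect established; wait to see if it holds

    if outage_start is not None:
        held = pending_up is not None and end_time - pending_up >= MIN_HOLD
        outages.append((outage_start, pending_up if held else None))
    return outages
```

With this rule, a drop at 0 s, a 20-second reconnect at 10 s, and a lasting reconnect at 40 s are logged as a single outage from 0 s to 40 s; the short-lived reconnect never appears in the log.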
You mentioned that your systems are running Fedora Core. At 4:00 every morning, cron runs the updatedb script to update the locate database. That script scans every local file system on the machine, which puts the system under memory pressure, so the kswapd process kicks in. While kswapd is looking for pages to move to swap space, your regular user processes will see stalls, frequently on the order of 4 to 15 seconds, and this may happen many times during a single run of updatedb. When I'm up late hacking, I always end up taking a 10 to 15 minute break when 4:00 rolls around.

Move /etc/cron.daily/slocate.cron out of the daily directory and I would expect your problem to go away.

HTH,
William

On Fri, 2006-09-08 at 07:24, Dick Trump wrote:
> [...]
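One way to tell William's scheduling stalls apart from the clock jumps Klaas asked about is to sleep in a loop and compare two clocks: `time.monotonic()` is not stepped by tools like ntpdate, while `time.time()` is. The sketch below is hypothetical (the names `classify` and `watch` and both 2-second thresholds are assumptions, not anything from the thread); a stall shows up as both clocks advancing too far, a clock step as the two clocks disagreeing.

```python
import time
from typing import List, Tuple

STALL_THRESHOLD = 2.0  # assumed: seconds of lateness treated as a stall
JUMP_THRESHOLD = 2.0   # assumed: seconds of wall/monotonic disagreement treated as a clock step


def classify(interval: float, mono_delta: float, wall_delta: float) -> List[str]:
    """Classify one sampling step.

    interval   -- how long we asked to sleep
    mono_delta -- elapsed time on the monotonic clock (not affected by clock steps)
    wall_delta -- elapsed time on the wall clock (shifted when the clock is stepped)
    """
    findings = []
    if mono_delta - interval > STALL_THRESHOLD:
        findings.append("stall")       # the process was not scheduled for a while
    if abs(wall_delta - mono_delta) > JUMP_THRESHOLD:
        findings.append("clock_jump")  # the wall clock was stepped forward or back
    return findings


def watch(interval: float = 1.0, steps: int = 10) -> List[Tuple[str, float, List[str]]]:
    """Sample both clocks every `interval` seconds and return any anomalies."""
    events = []
    mono, wall = time.monotonic(), time.time()
    for _ in range(steps):
        time.sleep(interval)
        new_mono, new_wall = time.monotonic(), time.time()
        found = classify(interval, new_mono - mono, new_wall - wall)
        if found:
            # Record local wall time, monotonic elapsed time, and what was seen.
            events.append((time.strftime("%H:%M:%S"), new_mono - mono, found))
        mono, wall = new_mono, new_wall
    return events
```

Left running across 04:00 (for example `watch(1.0, 7200)` started shortly before), a Linux client that reports several "stall" entries during the updatedb run would support William's explanation, while "clock_jump" entries would point at the time sync instead.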
William wrote:
> You mentioned that your systems are running Fedora Core.
> At 4:00 every morning cron runs the updatedb script to
> update the locate database.

You are correct. Thank you for your insights. As a test, I have moved slocate.cron out of the cron.daily directory on only one machine, to see whether that removes that machine from future coincident drops.

However, I have a little new information. Although I don't check these logs exhaustively for this type of coincident drop, I really think this is a new phenomenon. But here's the new part: I had another incident this morning, in the roughly 8-minute span from 06:40 to 06:48. Once again, all the Linux machines, including the monitor next to the server, became erratic during that period, but none of the remote Windows machines experienced a problem.

I did misstate something in an earlier post. My clock-updating cron job is in the daily directory, not the hourly one. But given the strange 8-minute stretch this morning, the daily routines are looking less suspect.

--
Dick
dtrump1@triadav.com
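Spotting a window like 06:40 to 06:48, where every Linux client drops and no Windows client does, can be partly automated. The sketch below is only an illustration: it assumes each client's log can be reduced to a list of drop timestamps, and the client names, `coincident_slots`, and the bucket width are hypothetical. Drops are rounded down into fixed time slots and the slots are intersected across one group, minus any slot in which another group also dropped.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Set


def buckets(drop_times: Iterable[datetime], bucket_minutes: int = 1) -> Set[datetime]:
    """Round each drop timestamp down to the start of its bucket_minutes-wide slot."""
    if bucket_minutes <= 0 or 60 % bucket_minutes:
        raise ValueError("bucket_minutes must be a positive divisor of 60")
    return {
        t.replace(minute=t.minute - t.minute % bucket_minutes, second=0, microsecond=0)
        for t in drop_times
    }


def coincident_slots(drops: Dict[str, List[datetime]], group: List[str],
                     others: List[str], bucket_minutes: int = 1) -> List[datetime]:
    """Slots in which every client in `group` logged a drop and no client in `others` did."""
    if not group:
        return []
    common = buckets(drops.get(group[0], []), bucket_minutes)
    for name in group[1:]:
        common &= buckets(drops.get(name, []), bucket_minutes)  # all of the group dropped
    for name in others:
        common -= buckets(drops.get(name, []), bucket_minutes)  # exclude shared outages
    return sorted(common)
```

The bucket width is the main design choice: one-minute slots can miss drops that straddle a minute boundary, so a coarser value such as `bucket_minutes=5` is more forgiving at the cost of lumping nearby events together.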