Maarten poses some valid questions:

> Did you check the Linux machines for system and/or update logs at the
> given time?

I'm afraid I don't know all the places to look for logs that might tell me something. My script that keeps MPG123 running keeps a log of all reboots and outages. A reconnect that lasts less than one minute is counted as a continuous outage.

Looking more closely at my server's error.log, sorting the entries by client, during that 10 minute span (actually only 7 minutes), the actual outages were recorded as only a second long, with 2 to 4 drops per client. One of the drops occurred on 5 machines within a second of each other.

> Maybe there was a power surge and the machines rebooted?

No. The client machines were in 5 different locations, hundreds of miles apart.

> Or maybe a network switch was at fault?

The only switch in common was at the server. The three Windows clients that stayed up went through the same switch and were at 3 equally spread out locations.

> Maybe a new kernel was installed that required a reboot?

No reboot was logged in the log my script creates.

> What I know of FC is that they do have kernel updates
> every once in a while, and those updates naturally require a reboot.

I'm on FC1. I don't think that kernel is in development. I'm sure they didn't reboot. I would have a log entry of that.

> (And, if your machine has been running for months, checking hard disks
> will probably take a few minutes, 10 wouldn't be that unusual.)

But all on the same date? These machines were started on completely different dates in different cities. All are 13 GB drives with only 12% in use.

I'm still focusing on a cron job being responsible. But it is a curiosity thing only. I don't see a long term problem. Based on the fact that the actual drops were so short, I'm guessing that some process that ran from cron somehow fooled my detection procedure for a dropped connection. I know that it isn't perfect but it does work.

I'm even further convinced that there was nothing in Icecast that was responsible. It has been incredibly reliable.

Thanks for your input.

Regards
--
Dick
dtrump1@triadav.com
Hi Dick,

See my comments below:

Dick Trump wrote:
> Maarten poses some valid questions:
>
>> Did you check the Linux machines for system and/or update logs at the
>> given time?
>
> I'm afraid I don't know all the places to look for logs that might tell me something. My script that keeps MPG123 running keeps a log of all reboots and outages. A reconnect that lasts less than one minute is counted as a continuous outage.

Do you mean 'more than one minute'?

> Looking more closely at my server's error.log, sorting the entries by client, during that 10 minute span (actually only 7 minutes), the actual outages were recorded as only a second long, with 2 to 4 drops per client. One of the drops occurred on 5 machines within a second of each other.

OK, this points to a problem on the machine running Icecast. It is very unlikely that remote machines would disconnect at almost exactly the same time when they are not related or coordinated other than by being connected to the same Icecast instance.

The question of course is why only the Linux clients would be disconnected. One explanation could be that by coincidence they share some piece of networking equipment that failed on their network path to your Icecast box while the Windows machines didn't.

Are the machines running a cron job to sync time (e.g. using ntpdate)? Could your detection procedure be fooled by a jump forward/backward in time?

I think that's about all I can think of ...

Regards,
KJ
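To make the clock-jump question concrete, here is a minimal sketch of how a drop detector that measures silence with the wall clock could raise a false alarm when a time sync steps the clock, while one using a monotonic clock would not. This is not Dick's script: `SilenceDetector`, `FakeClocks`, the 2 second timeout and the 5 second step are all hypothetical, and the clocks are simulated so the example runs without touching the system time.

```python
import time


class SilenceDetector:
    """Hypothetical detector: reports a drop when no audio was seen for `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock              # any zero-argument function returning seconds
        self.last_audio = clock()

    def saw_audio(self):
        self.last_audio = self.clock()

    def dropped(self):
        return self.clock() - self.last_audio > self.timeout


class FakeClocks:
    """Simulated clocks: wall time can be stepped, monotonic time cannot."""

    def __init__(self):
        self.mono = 0.0
        self.offset = 1_000_000.0       # arbitrary wall-clock epoch offset

    def advance(self, seconds):         # real time passing
        self.mono += seconds

    def step(self, seconds):            # a time sync stepping the wall clock
        self.offset += seconds

    def monotonic(self):
        return self.mono

    def wall(self):
        return self.mono + self.offset


clocks = FakeClocks()
by_wall = SilenceDetector(2.0, clock=clocks.wall)
by_mono = SilenceDetector(2.0, clock=clocks.monotonic)

clocks.advance(0.5)   # half a second of real time, stream still healthy
clocks.step(5.0)      # hourly sync steps the wall clock forward by 5 s

wall_alarm = by_wall.dropped()   # sees 5.5 s of apparent silence
mono_alarm = by_mono.dropped()   # sees only the real 0.5 s
```

In this simulation only the wall-clock detector reports a drop. Whether anything similar applies to the real detection procedure depends on what it actually measures, which is not known here.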
Klaas wrote:

>> A reconnect that lasts less than one minute is counted as a continuous outage.
> Do you mean 'more than one minute'?

I didn't describe it fully. If an outage is detected, the system attempts a reconnect immediately. If a reconnect is established but is lost again without a full minute of audio, I don't log it as a reconnect. Once a reconnect is established and holds for at least a minute, it is logged as being reconnected.

> One explanation could be that by
> coincidence they share some piece of networking equipment that failed on
> their network path to your icecast box while the windows machines didn't.

I see that as highly unlikely. One of the Linux boxes sits next to the server and is on the same switch that serves the connection to the ISP. All other clients have completely different ISPs from each other for their connections, most on commercial ISPs providing DSL service. A couple are on educational networks.

> Are the machines running a cron job to sync time (e.g. using ntpdate)?

Yes. Their clocks are updated hourly. That makes it possible for client-based interruptions to coincide if they are clock related.

> Could your detection procedure be fooled by a jump forward/backward in time?

Other than the logging time recorded on the client, I don't THINK that is likely. But I would have to say that I don't fully understand the meaning of the parameter of the MPG123 PID that I have found and that seems to signal a disconnect. It is entirely possible that the parameter can be fooled by an adjustment to the clock. But I would guess that I would then see on-the-hour problems on a more frequent basis.

On the other hand, all Linux clients poll the same time service for synchronization. If that service did something unexpected, that might explain something.

I appreciate your exploring this with me but it is purely academic at this point and I'm probably boring the others.

--
Dick
dtrump1@triadav.com
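The logging rule described above (an outage only counts as ended once a reconnect has held for a full minute) can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: `OutageTracker`, its method names and the event times are hypothetical, and times are plain seconds supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

HOLD_SECONDS = 60.0  # a reconnect must hold this long before it is logged


@dataclass
class OutageTracker:
    """Hypothetical tracker for the 'hold for a minute' reconnect rule."""
    log: List[Tuple[float, str]] = field(default_factory=list)
    in_outage: bool = False
    connected_since: Optional[float] = None

    def on_connect(self, now: float) -> None:
        # Remember when this connection attempt succeeded; nothing is logged yet.
        self.connected_since = now

    def tick(self, now: float) -> None:
        # Log the reconnect only once the connection has held long enough.
        if (self.in_outage and self.connected_since is not None
                and now - self.connected_since >= HOLD_SECONDS):
            self.in_outage = False
            self.log.append((now, "reconnected"))

    def on_disconnect(self, now: float) -> None:
        self.tick(now)  # credit a connection that already held a full minute
        if not self.in_outage:
            self.in_outage = True
            self.log.append((now, "outage"))
        # A drop during an outage is folded into the same continuous outage.
        self.connected_since = None


tracker = OutageTracker()
tracker.on_connect(0.0)
tracker.on_disconnect(100.0)   # first drop: outage starts
tracker.on_connect(101.0)      # immediate reconnect...
tracker.on_disconnect(102.0)   # ...lost after 1 s, not logged
tracker.on_connect(103.0)
tracker.on_disconnect(104.0)   # again lost after 1 s, not logged
tracker.on_connect(105.0)
tracker.tick(164.0)            # 59 s held: still counted as outage
tracker.tick(165.0)            # 60 s held: reconnect is logged
```

With this rule, three one-second drops within a few seconds show up in the client log as a single outage from 100 s to 165 s, while a server-side log would show each short drop separately.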