I'm having some issues with workers dying after a period of several hours. Each worker runs a loop that asks Amazon SQS for work to do. If there is a message in the queue, the work is completed (image processing, etc.); if there is no message, the worker sleeps for X seconds (sleep 10, etc.). I've noticed that the workers frequently exhibit one of two bad behaviors: A) they stop asking for messages but still exist as a process, or B) they die completely (no more process) with no errors reported in either log file.

I made a simple DeathWorker last night to try to pin down exactly *when* death occurs. The worker logs when it asks for a message, when it goes to sleep, and when it wakes up. Like so:

09/27/2007 13:23:05 (7673) DeathWorker: SQSMiddleMan.next_message (:death_worker)
09/27/2007 13:23:05 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 13:23:34 (7673) DeathWorker: Done sleeping.

The above entries show the normal course of operation for the DeathWorker: look for a message, almost immediately report that there is no message, go to sleep for 10 seconds, then wake up and log that it is awake. As you can see, though, there were roughly 29 seconds between the "Going to sleep" entry and the "Done sleeping" entry, not 10. Is it possible that the log synchronization that occurs through the logging worker causes the delay?

This happened later in the night:

09/27/2007 13:23:38 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 13:27:15 (7673) DeathWorker: Done sleeping.

Almost four minutes of sleep when I call sleep 10. Interesting. Later in the night:

09/27/2007 13:50:13 (7673) DeathWorker: No message. Going to sleep.
09/27/2007 19:29:36 (7673) DeathWorker: Done sleeping.

Wow! Almost 6 hours of sleeping! After that nap the worker ran for another 10 minutes or so and then the process actually died, with no errors reported in the log.

Any idea what is going on? How can I debug this issue? Every time I try to attach to the oversleeping process with GDB, it segfaults.

Thanks in advance!
Erik
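
P.S. For reference, here is roughly what the worker loop looks like. This is a simplified sketch, not the real code: `log` and `process_message` below are stand-ins for our actual logging and image-processing calls, and only SQSMiddleMan.next_message(:death_worker) is taken verbatim from the logs above.

  def log(msg)
    # Stand-in: the real logging goes through a separate logging worker.
    puts "#{Time.now.strftime('%m/%d/%Y %H:%M:%S')} (#{Process.pid}) DeathWorker: #{msg}"
  end

  loop do
    log "SQSMiddleMan.next_message (:death_worker)"
    message = SQSMiddleMan.next_message(:death_worker)  # ask SQS for work

    if message
      process_message(message)   # the real workers do image processing, etc.
    else
      log "No message. Going to sleep."
      sleep 10                   # X seconds; the DeathWorker sleeps 10
      log "Done sleeping."
    end
  end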