Hi,
Without any update or hardware/software modification, the "hourly
restart" problem on one of my systems has started again. So far I have
counted 5 restarts, each at the 59th minute. There is no log entry, no
console error, nothing really interesting. If I had not seen the camera
recordings with my own eyes, I would suspect someone at the D.C. of
hard-resetting the box.
Now I guess /var failed to mount and ssh is not available. Maybe the
next hourly restart will work some magic...
Last time, a kernel update solved the hourly restart problem. Before
losing access, I checked and saw a new kernel with a version ending in
-21 (x86_64, forgot to mention), and I am waiting for the XFS module to
be ready (well, I wish I had left it on ext3). After that, perhaps the
update will solve the problem again, but why?
Last time I stopped all cron jobs, unneeded services, remote access,
etc. I put a man in front of the monitor and had him watch everything.
The only thing he saw was the BIOS welcome screen at the 60th minute,
without any warning beforehand. I replaced the power cords, power
supply, some disks, RAM modules, etc. Currently I have the last
recovered remote logs of the system's temperature and voltage sensors;
all seems fine, nothing suspicious.
I am out of ideas. I have many Gentoo boxes on almost the same
hardware and a few CentOS boxes. Only this one fails continuously...
I'd like to hear advice and suggestions about how to debug / repair
this situation.
Thanks.
P.S.: A complete hardware replacement plan is currently in action and
new hardware will be ready soon, but I'm not so sure this is a hardware
failure. Why did it stop last time after a simple kernel update?