On Wed, 2005-08-03 at 10:23 +0200, Ulrik S. Kofod wrote:
> Yesterday I was copying a few dump files (backups), about 1GB in size,
> from my CentOS 3.4 box, over a Samba share, to my Windows machine, when
> the CentOS box stopped responding.
> 
> The HDD LED was on, and after I connected a monitor all I could see was
> this error message over and over again:
> 
> "usb-uhci.c: host controller halted, trying to restart"
> 
> I wasn't able to log in or anything, so I saw no other solution than
> pressing the reset button.
> 
> When it rebooted it forced a disk check, but was unable to mount
> /dev/hde2 (which usually mounts on /var!) and pretty much nothing works
> without /var.
> The error message mount gave was something like "Unable to mount
> /dev/hde2: invalid argument".
Hard to see what that has to do with usb-uhci - hde is apparently on an
IDE controller and usb-uhci is USB.  Could be a memory or MB issue.  I'd
try running memtest86+ and monitoring the logs for errors.  Checking all
disks with SMART is also indicated.
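
For the log-monitoring part, a minimal sketch of one way to pull suspicious
lines out of the system log. Only the usb-uhci message is taken from the
report above; the other patterns, the function names, and the
/var/log/messages path are assumptions for illustration, so adjust them to
whatever your kernel actually logs.

```python
import re

# Only the first pattern is quoted from the original report.
# The other two are hypothetical examples of IDE / disk error messages.
PATTERNS = [
    re.compile(r"usb-uhci\.c: host controller halted"),
    re.compile(r"\bhd[a-z]: .*(error|timeout|lost interrupt)", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]


def suspicious_lines(lines):
    """Return the log lines that match any of the patterns above."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in PATTERNS)
    ]


def scan_log(path="/var/log/messages"):
    """Scan a log file (the default path is an assumption) for matches."""
    with open(path, errors="replace") as fh:
        return suspicious_lines(fh)
```

Calling scan_log() as root and printing the result gives a quick overview.
It only tells you that something was logged, not why, so it complements
memtest86+ and a SMART check rather than replacing them.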
> The files I was copying are located on /dev/md0 (RAID 0 over 4 disks)
> and that still worked fine; / is mounted on /dev/hde1 and that also
> worked as expected.
> 
> I removed /var from /etc/fstab and restored a backup of /var to the
> directory /var, and then I could boot normally again.
> 
> Trying to save what was on /dev/hde2, I ran e2fsck -p /dev/hde2 and
> that corrected a ton of errors (deleted a lot of data). I then mounted
> /dev/hde2 on another directory and restored my backups, so I only lost
> a few hours of data. After I added /var to /etc/fstab again, everything
> worked as normal.
> 
> My question is: what happened!? And what can I do to avoid this again?
> If /dev/hde2 had been a RAID 1, would it then have rebuilt? Should I
> move /var to a RAID 1?
Wouldn't hurt, but why not do the whole system on RAID 1 if you're going
to that trouble?
> I have copied large files like this before without problems. Was it
> just bad luck, or should I expect it to do this again?
Again - I'd suspect some underlying hardware problems.
> /dev/hde is attached to a cheap IDE Ultra ATA/133 PCI controller card
> (Silicon Image) that has worked flawlessly for about a year. Can that
> be broken? Right now it seems OK again. I have a replacement for it,
> but I would rather not replace it if it isn't necessary.
I'd guess disks, memory, and MB before the controller - emphasis on
GUESS.
> PS. Try backups! You won't regret it :)
Follow your own advice! :-)
Good luck,
Phil