On Fri, Nov 07, 2008 at 01:21:48PM -0800, Kevin Oberman wrote:
> I recently started getting errors on a fairly new USB-connected SATA
> drive. Aside from the errors, the system was locking up as any process
> attempting to access the drive would lock up in disk uninterruptible
> wait ("D" in ps). I could not shut down the system and had to power it
> off. (It's a laptop.) After a reboot, I tried to fsck it and that locked
> up, too. I was able to recover by telling fsck to not fix the truncated
> inode and fix everything else. Then I ran fsck again and it was
> successful in fixing the inode. This happened several times.
>
> I then bought a new drive and got the identical behavior! It was not the
> drive. I rolled my kernel back to 9/13/08 and tried again. This time it just
> worked! No errors or lock-ups.
>
> I suspect that there are two issues. One results in the lock-up when the
> disk had errors and the other caused the purported disk errors. The
> latter has been introduced since 9/13/08. The kernel that produced the
> errors was from 10/21. I also ran a kernel from 10/8 which did not cause
> me problems, but I'm not sure that I used the USB drive with this
> kernel.
>
> I'll be building a 10/8 kernel later, after I have backed up some data
> from a failing drive (PATA, not USB, and SMART confirms that this
> disk is sick). I will try to track down exactly which change triggered
> this ugly behavior, but that will take a number of kernel builds, so it
> will take a while.
>
> Has anyone else seen this? Any ideas on what changes might be the most
> likely cause? Could be USB, CAM, or something else, I guess.

Funny you should post this today -- I just spent the past few days
dealing with this problem, specifically the kernel being "stuck" when
writing to a umass/da device (in my case, USB flash drives).

When I say "stuck", I mean the kernel was still responsive: Ctrl-T would
report process status (the states shown were all different), but the
processes had essentially "hung". Ctrl-Alt-Esc on the console
dropped me to a db> prompt, so it's not as if the machine had
frozen/locked up; it was as if some part surrounding the storage
subsystem was spinning in a loop. IP traffic still worked as well, but
of course anything that accessed disks would hang. Rebooting the box
via Ctrl-Alt-Del wouldn't work, because it would get stuck waiting for a
bunch of PIDs to end.

I switched the box to CURRENT (for a lot of reasons), and one of those
was to try out the new USB4BSD (called "USB2" -- not to be confused with
the USB2.0 protocol) stack. That simply induced a random kernel panic.
However, HPS is fairly certain he found the issue, and it's with
bus_dma(9) interaction. Here's the thread:

http://lists.freebsd.org/pipermail/freebsd-current/2008-November/thread.html#235
http://lists.freebsd.org/pipermail/freebsd-current/2008-November/000220.html

I have not yet tried his patches (I just woke up), but I will in a short
while. So far I have a lot more faith in USB4BSD than I do in the old
stack, simply because there's active work going on in it.

(It's ironic that I encountered this issue while working on a document
describing how to put FreeBSD i386, amd64, and MS-DOS on a USB flash
drive, so one could install FreeBSD from it, or boot MS-DOS for BIOS
upgrades.)

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |