Jason Thomson
2004-Jul-16 12:41 UTC
Reproducible FreeBSD 4.10-STABLE (Jul 7) , 3ware 7506-4 lockup.
We can now reproduce the lockup we have been experiencing. We have not been able to get a crash dump. I'm not sure if it's something we're doing wrong, or if there's some other reason it's not saving the core to the swap device.

Next week sometime we can make the server available on the internet if there is someone willing and able to help us debug this. We can probably provide a serial console hookup from another machine if that would help. (We have to migrate the data from this production machine before we can make it available).

We are very keen to resolve this problem; we have ~20 machines running FreeBSD 4.x with 7506-4 cards, and so far three of them have exhibited this problem. (Only one is causing problems now - we replaced disks on the other two).

Recap on problem:

Hardware / OS:

+ FreeBSD 4.x (Various -STABLE versions from 21/01/04 until 07/07/04)
+ Dell 1600SC (UP and SMP).
+ 7506-4 cards
+ 300 / 320 GB Maxtor Maxline II hard drives. (Only these disks*).

* We have many machines with WD2000JB / WD2500JB that do not exhibit this problem.

To reproduce the problem on the machine in question I run this command:

# dd if=/dev/twed0s1h iseek=137510 bs=1m of=/dev/null

The card then locks up hard within 10 seconds - no further I/O succeeds, but anything that is already cached by the VM can be read / invoked.

Crash dumps are enabled. We have swap (and the dumpdev) configured on a SCSI disk in the same machine. CTRL-ALT-ESC does drop to the debugger.

ddb> panic

followed by

ddb> call boot(0)

does reboot the machine, but savecore does NOT find a kernel core dump on reboot.

It is possible that we have something configured wrongly, but I can't see what it is.

Another data point:

In one previous instance of this problem, we resolved the symptoms by checking the disks with Maxtor's PowerMax tools. One disk was found to have errors, and repairing / replacing that disk resolved the errors. (However, if the disk has errors, I would expect the RAID card to deal with it!).
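For anyone trying to narrow this down, the single large dd read could be split into one-block reads so that the last block number printed before the hang is visible on the console. This is only a rough sketch: the function name probe_blocks is made up for illustration, and it assumes the lockup is tied to a particular offset rather than to sustained sequential load, which the report does not establish.

```shell
#!/bin/sh
# probe_blocks DEVICE START COUNT  (hypothetical helper, not an existing tool)
# Reads COUNT consecutive 1 MiB blocks beginning at block START, one block per
# dd invocation, and prints each block number before the read is issued.
probe_blocks() {
    dev=$1
    start=$2
    count=$3
    bs=1048576   # 1 MiB written in bytes, accepted by both BSD and GNU dd
    i=0
    while [ "$i" -lt "$count" ]; do
        blk=$((start + i))
        echo "reading block $blk"
        # skip= is the portable spelling; FreeBSD dd treats iseek= as a synonym
        if ! dd if="$dev" of=/dev/null bs="$bs" skip="$blk" count=1 2>/dev/null; then
            echo "read failed at block $blk"
            return 1
        fi
        i=$((i + 1))
    done
    echo "done"
}
```

On the affected machine this might be invoked as probe_blocks /dev/twed0s1h 137510 20, ideally from a serial console so the last "reading block" line survives the lockup. If the card hangs, the final line printed names the block that was being read at the time.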
Alasdair Lumsden
2004-Jul-16 12:56 UTC
Reproducible FreeBSD 4.10-STABLE (Jul 7) , 3ware 7506-4 lockup.
On Fri, 2004-07-16 at 20:40, Jason Thomson wrote:
> Another data point:
>
> In one previous instance of this problem, we resolved the symptoms by
> checking the disks with Maxtor's PowerMax tools. One disk was found to
> have errors, and repairing / replacing that disk resolved the
> errors. (However, if the disk has errors, I would expect the RAID
> card to deal with it!).

Absolutely. If a disk is misbehaving, then it should be dropped from the array and the hot-spare used (if any). That way the disk can be pulled and replaced. The machine shouldn't lock up.

We've ordered some 9000 series cards to replace the 8000 series cards we are having problems with. I'll report back any progress.
Vinod Kashyap
2004-Jul-16 14:03 UTC
Reproducible FreeBSD 4.10-STABLE (Jul 7) , 3ware 7506-4 lockup.
After the system locks up, from the DDB prompt, do a 'tr, 20'. What does it say?

Please check the drive compatibility list at:
http://www.3ware.com/products/pdf/Drive_compatibility_list.pdf

If you suspect a problem with any of the 3ware components, I strongly encourage you to contact 3ware support.

> -----Original Message-----
> From: owner-freebsd-stable@freebsd.org
> [mailto:owner-freebsd-stable@freebsd.org]On Behalf Of Jason Thomson
> Sent: Friday, July 16, 2004 12:40 PM
> To: freebsd-hardware@freebsd.org; freebsd-stable@freebsd.org
> Cc: Paul Saab
> Subject: Reproducible FreeBSD 4.10-STABLE (Jul 7) , 3ware 7506-4 lockup.