David Wolfskill
2007-May-10 20:16 UTC
6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]
From a quick look in the lists, I get the impression that the Dell PERC
5/i may be a bit problematic.  Since I hadn't any plans on using that
hardware, though, I've paid more attention to other things.

Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
says the controller is:

mfi0: <Dell PERC 5/i> mem 0xd80f0000-0xd80fffff,0xfc4e0000-0xfc4fffff irq 78 at device 14.0 on pci2
mfi0: 817 (224963336s/0x0020/0) - Shutdown command received from host
mfi0: 818 (4278190080s/0x0020/0) - PCI 0x041028 0x0415 0x041028 0x041f03: Firmware initialization started (PCI ID 0015/1028/1f03/1028)
mfi0: 819 (4278190080s/0x0020/0) - Type 18: Firmware version 1.00.02-0157
mfi0: 820 (4278190096s/0x0008/0) - Battery Present
mfi0: 821 (4278190124s/0x0004/0) - PD 08(e1/s255) event: Enclosure (SES) discovered on PD 08(e1/s255)
mfi0: 822 (4278190124s/0x0002/0) - PD 08(e1/s255) event: Inserted: PD 08(e1/s255)
mfi0: 823 (4278190124s/0x0002/0) - Type 29: Inserted: PD 08(e1/s255) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=500180b04413ce00,0000000000000000
mfi0: 824 (4278190124s/0x0002/0) - PD 00(e1/s0) event: Inserted: PD 00(e1/s0)
mfi0: 825 (4278190124s/0x0002/0) - Type 29: Inserted: PD 00(e1/s0) Info: enclPd=08, scsiType=0, portMap=01, sasAddr=50010b900046038e,0000000000000000
mfi0: 826 (4278190124s/0x0002/0) - PD 01(e1/s1) event: Inserted: PD 01(e1/s1)
mfi0: 827 (4278190124s/0x0002/0) - Type 29: Inserted: PD 01(e1/s1) Info: enclPd=08, scsiType=0, portMap=02, sasAddr=50010b9000460376,0000000000000000
mfi0: 828 (4278190124s/0x0002/0) - PD 02(e1/s2) event: Inserted: PD 02(e1/s2)
mfi0: 829 (4278190124s/0x0002/0) - Type 29: Inserted: PD 02(e1/s2) Info: enclPd=08, scsiType=0, portMap=04, sasAddr=50010b900046035a,0000000000000000
mfi0: 830 (4278190124s/0x0002/0) - PD 03(e1/s3) event: Inserted: PD 03(e1/s3)
mfi0: 831 (4278190124s/0x0002/0) - Type 29: Inserted: PD 03(e1/s3) Info: enclPd=08, scsiType=0, portMap=08, sasAddr=50010b90004603be,0000000000000000
mfi0: 832 (4278190124s/0x0002/0) - PD 04(e1/s4) event: Inserted: PD 04(e1/s4)
mfi0: 833 (4278190124s/0x0002/0) - Type 29: Inserted: PD 04(e1/s4) Info: enclPd=08, scsiType=0, portMap=10, sasAddr=50010b900045f6d6,0000000000000000
mfi0: 834 (4278190124s/0x0002/0) - PD 05(e1/s5) event: Inserted: PD 05(e1/s5)
mfi0: 835 (4278190124s/0x0002/0) - Type 29: Inserted: PD 05(e1/s5) Info: enclPd=08, scsiType=0, portMap=20, sasAddr=50010b9000460246,0000000000000000
mfi0: 836 (224964238s/0x0020/0) - Adapter ticks 224964238 elapsed 45s: Time established as 02/16/07 18:03:58; (45 seconds since power on)

and the disk looks like:

mfid0: <MFI Logical Disk> on mfi0
mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal

The intended production workload involves creation and deletion of
a large number of files rather rapidly.

I recalled that for the first year or two with Soft Updates, there
were problems with that kind of workload, such that there was enough
hysteresis in making free blocks actually available for subsequent
allocation that processes that were trying to write to new blocks
on such file systems would often fail, reporting ENOSPC.  Un-mounting
and re-mounting the file system would clean things up, but that
doesn't tend to be a viable approach for keeping a long-running
application happy.  :-}

I reminded my colleague of this, since she also reported that an
un-mount/re-mount sequence caused a lot of free space to show up
on the file system in question, and she responded that she had been
aware of this, and had been turning off Soft Updates on the file
systems for the application in question, but she had forgotten that
Soft Updates was on by default when she set up this (test) system.

She then turned off Soft Updates and started the test workload again.
And instead of failing with ENOSPC after 3 days, it only took 2.

Hmmm... well; that wasn't exactly what I had expected.

Any hints, here?  The machine is running the i386 arch, with a pair of
dual-core 2.33GHz Xeons.

I have a recent dmesg.boot, but I'd rather keep list messages fairly
short.

We have a local private mirror of the FreeBSD CVS repository, so we have
some flexibility in what we can do for testing, but the objective is to
put the box in production -- and I'd rather not run CURRENT as part of a
customer-visible production workload.  :-}  [My laptop is a different
matter, of course....]

Thanks!

Peace,
david
-- 
David H. Wolfskill    david@catwhisker.org
Believe SORBS at your own risk: 63.193.123.122 has been static since Aug 1999.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

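
The create-and-delete pattern described in this message can be approximated outside the real application with a small churn test. What follows is only a minimal sketch under stated assumptions: the names `churn` and `free_blocks`, the file count, and the file size are hypothetical and are not taken from the workload in question. It creates and deletes small files in a scratch directory and reports the free-block count from statvfs before and after. On a long run against the affected file system, a free-block count that keeps falling even though every file has been deleted would be consistent with delayed block reclamation.

```python
import os
import tempfile


def free_blocks(path):
    """Return the number of blocks available to unprivileged users."""
    return os.statvfs(path).f_bavail


def churn(directory, count=1000, size=4096):
    """Create `count` files of `size` bytes each, then delete them all.

    Returns (created, deleted). The defaults are hypothetical test
    parameters, not values taken from the real workload.
    """
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, "churn-%06d" % i)
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    deleted = 0
    for path in paths:
        os.unlink(path)
        deleted += 1
    return len(paths), deleted


if __name__ == "__main__":
    # A temporary directory is used here; to exercise a specific file
    # system, point `scratch` at a directory on that file system instead.
    with tempfile.TemporaryDirectory() as scratch:
        before = free_blocks(scratch)
        created, deleted = churn(scratch)
        after = free_blocks(scratch)
        print("created=%d deleted=%d" % (created, deleted))
        print("free blocks before=%d after=%d" % (before, after))
```

Running the loop repeatedly and logging the "after" figure over time gives a rough trend line without needing the production application.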
David Wolfskill wrote:
> From a quick look in the lists, I get the impression that the Dell PERC
> 5/i may be a bit problematic.  Since I hadn't any plans on using that
> hardware, though, I've paid more attention to other things.
>

Not sure that this impression is entirely accurate.  The biggest problem
with MFI machines is online RAID management.  The storage driver itself
matured very quickly and has been very reliable.

> Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
> says the controller is:
>
> mfi0: <Dell PERC 5/i> mem 0xd80f0000-0xd80fffff,0xfc4e0000-0xfc4fffff irq 78 at device 14.0 on pci2
> mfi0: 817 (224963336s/0x0020/0) - Shutdown command received from host
> mfi0: 818 (4278190080s/0x0020/0) - PCI 0x041028 0x0415 0x041028 0x041f03: Firmware initialization started (PCI ID 0015/1028/1f03/1028)
> mfi0: 819 (4278190080s/0x0020/0) - Type 18: Firmware version 1.00.02-0157
> mfi0: 820 (4278190096s/0x0008/0) - Battery Present
> mfi0: 821 (4278190124s/0x0004/0) - PD 08(e1/s255) event: Enclosure (SES) discovered on PD 08(e1/s255)
> mfi0: 822 (4278190124s/0x0002/0) - PD 08(e1/s255) event: Inserted: PD 08(e1/s255)
> mfi0: 823 (4278190124s/0x0002/0) - Type 29: Inserted: PD 08(e1/s255) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=500180b04413ce00,0000000000000000
> mfi0: 824 (4278190124s/0x0002/0) - PD 00(e1/s0) event: Inserted: PD 00(e1/s0)
> mfi0: 825 (4278190124s/0x0002/0) - Type 29: Inserted: PD 00(e1/s0) Info: enclPd=08, scsiType=0, portMap=01, sasAddr=50010b900046038e,0000000000000000
> mfi0: 826 (4278190124s/0x0002/0) - PD 01(e1/s1) event: Inserted: PD 01(e1/s1)
> mfi0: 827 (4278190124s/0x0002/0) - Type 29: Inserted: PD 01(e1/s1) Info: enclPd=08, scsiType=0, portMap=02, sasAddr=50010b9000460376,0000000000000000
> mfi0: 828 (4278190124s/0x0002/0) - PD 02(e1/s2) event: Inserted: PD 02(e1/s2)
> mfi0: 829 (4278190124s/0x0002/0) - Type 29: Inserted: PD 02(e1/s2) Info: enclPd=08, scsiType=0, portMap=04, sasAddr=50010b900046035a,0000000000000000
> mfi0: 830 (4278190124s/0x0002/0) - PD 03(e1/s3) event: Inserted: PD 03(e1/s3)
> mfi0: 831 (4278190124s/0x0002/0) - Type 29: Inserted: PD 03(e1/s3) Info: enclPd=08, scsiType=0, portMap=08, sasAddr=50010b90004603be,0000000000000000
> mfi0: 832 (4278190124s/0x0002/0) - PD 04(e1/s4) event: Inserted: PD 04(e1/s4)
> mfi0: 833 (4278190124s/0x0002/0) - Type 29: Inserted: PD 04(e1/s4) Info: enclPd=08, scsiType=0, portMap=10, sasAddr=50010b900045f6d6,0000000000000000
> mfi0: 834 (4278190124s/0x0002/0) - PD 05(e1/s5) event: Inserted: PD 05(e1/s5)
> mfi0: 835 (4278190124s/0x0002/0) - Type 29: Inserted: PD 05(e1/s5) Info: enclPd=08, scsiType=0, portMap=20, sasAddr=50010b9000460246,0000000000000000
> mfi0: 836 (224964238s/0x0020/0) - Adapter ticks 224964238 elapsed 45s: Time established as 02/16/07 18:03:58; (45 seconds since power on)
>
> and the disk looks like:
>
> mfid0: <MFI Logical Disk> on mfi0
> mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal
>

Looks A-OK to me.

>
> The intended production workload involves creation and deletion of
> a large number of files rather rapidly.
>
> I recalled that for the first year or two with Soft Updates, there
> were problems with that kind of workload, such that there was enough
> hysteresis in making free blocks actually available for subsequent
> allocation that processes that were trying to write to new blocks
> on such file systems would often fail, reporting ENOSPC.  Un-mounting
> and re-mounting the file system would clean things up, but that
> doesn't tend to be a viable approach for keeping a long-running
> application happy.  :-}
>

sysctl vfs.ffs.doasyncfree=0 might help.
Running the syncer more frequently might also help, but I don't recall
the sysctl node for that.

> I reminded my colleague of this, since she also reported that an
> un-mount/re-mount sequence caused a lot of free space to show up
> on the file system in question, and she responded that she had been
> aware of this, and had been turning off Soft Updates on the file
> systems for the application in question, but she had forgotten that
> Soft Updates was on by default when she set up this (test) system.
>
> She then turned off Soft Updates and started the test workload again.
> And instead of failing with ENOSPC after 3 days, it only took 2.

Very strange.  No chance that it was due to files that were deleted but
still referenced by open apps?

>
> Hmmm... well; that wasn't exactly what I had expected.
>
> Any hints, here?  The machine is running the i386 arch, with a pair of
> dual-core 2.33GHz Xeons.
>
> I have a recent dmesg.boot, but I'd rather keep list messages fairly
> short.
>
> We have a local private mirror of the FreeBSD CVS repository, so we have
> some flexibility in what we can do for testing, but the objective is to
> put the box in production -- and I'd rather not run CURRENT as part of a
> customer-visible production workload.  :-}  [My laptop is a different
> matter, of course....]
>

This sounds purely like a filesystem issue, not an MFI driver issue.

Scott
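
The question above about files that were deleted but are still referenced by open applications rests on a general POSIX behaviour: unlinking a file removes its name, but its blocks stay allocated until the last open descriptor is closed. The following is a minimal sketch of that effect, not a diagnosis of the system in this thread; the function name `unlinked_but_open` and the 64 KB size are assumptions chosen for illustration.

```python
import os
import tempfile


def unlinked_but_open(size=65536):
    """Write `size` bytes, unlink the file, and inspect it via the open fd.

    Returns (path_exists, bytes_still_held).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)                # the name is gone from the directory
        exists = os.path.exists(path)
        held = os.fstat(fd).st_size    # the data is still reachable via fd
        return exists, held
    finally:
        os.close(fd)                   # only now can the blocks be released


if __name__ == "__main__":
    exists, held = unlinked_but_open()
    print("path exists: %s, bytes still held: %d" % (exists, held))
```

A rough check for this condition on a live system is to compare the usage reported by df with the total reported by du for the same file system; a large gap suggests space held by unlinked files that are still open.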