On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
> I've recently installed a new NAS at work which uses a rebranded LSI
> megaraid sas
> [root@banshee ~]# mfiutil show adapter
> mfi0 Adapter:
> Product Name: Supermicro SMC2108
> Serial Number:
> Firmware: 12.12.0-0047
> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
> Battery Backup: present
> NVRAM: 32K
> Onboard Memory: 512M
> Minimum Stripe: 8k
> Maximum Stripe: 1M
>
> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB
> drives)
>
> I'm seeing a lot of messages like
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
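
A long run of these messages is easier to read when reduced to one line per stuck command. The following is a minimal sketch, assuming the messages keep exactly the format quoted above; the function name summarise_mfi_timeouts is made up for illustration and is not part of any FreeBSD tool.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): read log text on stdin and print,
# for each mfi command address, the longest timeout reported for it.
summarise_mfi_timeouts() {
    awk '
        /mfi[0-9]+: COMMAND 0x[0-9a-f]+ TIMEOUT AFTER [0-9]+ SECONDS/ {
            # Find the fields relative to the COMMAND keyword so that a
            # leading quote marker or timestamp does not shift them.
            for (i = 1; i <= NF; i++) {
                if ($i == "COMMAND") {
                    addr = $(i + 1)
                    secs = $(i + 4) + 0
                    # Remember first-seen order of addresses.
                    if (!(addr in max)) order[++n] = addr
                    if (secs > max[addr]) max[addr] = secs
                }
            }
        }
        END {
            for (j = 1; j <= n; j++) printf "%s %d\n", order[j], max[order[j]]
        }
    '
}
```

Run as `dmesg | summarise_mfi_timeouts`. For the messages quoted above it would print `0xffffff8000b216c8 391` and `0xffffff8000b21b08 85`, which makes it clear that only two commands are involved and that the first was outstanding for over six minutes.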
>
> While these appear, I/O stalls on the array connected to the mfi
> adapter. This can continue for 20 minutes or so before resuming,
> seemingly at random (although a little more on this later on).
>
> From pciconf -lv
> mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
> vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
> class = mass storage
> subclass = RAID
>
> From dmesg
> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
> mfi0: Megaraid SAS driver Ver 3.00
> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>
> I have found this thread from a bit of googling, but it doesn't end too
> well.
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> Was this ever taken further?
>
> One thing I have noticed is that the stall (and timeout messages) seem
> to go away if I query the card using mfiutil. I currently have a cron
> job doing this every 2 minutes to see whether this is coincidence or not.
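
The periodic query described above could be wrapped in a small script so that each probe leaves a timestamped record to compare against the timeout messages. This is only a sketch under assumptions: the post does not say which mfiutil subcommand the cron job runs, so `show adapter` (the one shown earlier in the message) is assumed, and the MFI_PROBE_CMD variable is a hypothetical override added for illustration, not an mfiutil feature.

```shell
#!/bin/sh
# Hypothetical keep-alive probe for the mfi adapter.
# Assumption: querying the adapter with "mfiutil show adapter" is what
# un-sticks the I/O; the original post does not name the subcommand.
probe_mfi() {
    # MFI_PROBE_CMD is an illustrative override so the probe can be tried
    # on a machine without the controller.
    cmd=${MFI_PROBE_CMD:-"/usr/sbin/mfiutil show adapter"}
    if $cmd >/dev/null 2>&1; then
        status=ok
    else
        status=failed
    fi
    # One UTC-timestamped line per probe.
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) mfi probe $status"
}
```

A script that calls probe_mfi and appends its output to a log file could then be scheduled from cron with a `*/2 * * * *` entry. Lining up the probe timestamps with the times at which the TIMEOUT messages stop would help show whether the query really clears the stall or whether the timing is coincidental.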
>
>
> Any suggestions welcome, and I'm happy to provide more info if I can, but
> I don't have a duplicate to do too much debugging on. I'm happy to try
> patches though.
>
> Is this worth filing a PR?

Can you please provide uname -a output? The version of FreeBSD you're
using matters greatly here.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |