I had an unexpected reboot of my Dell R610 today around 2:05-06pm.
I do not know if it crashed or if it was power cycled.
This machine is running:
FreeBSD gunsight1.neutralgood.org 8.2-RELEASE FreeBSD 8.2-RELEASE #1: Thu Dec 8 21:58:59 UTC 2011 root@:/usr/obj/usr/src/sys/GENERIC amd64
It's a stock 8.2-RELEASE kernel except I had to tweak it near the top of
vfs_mountroot() to delay before attempting to mount the root filesystem.
(Without my tweak it attempts to mount root before the USB drive is finished
getting attached.)
The dmesg shows this at the reboot:
mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete
mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started
mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete
mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
Does this mean the machine did not lose power? I ask because my datacenter
had some sort of power incident and I'm not sure if the server lost power
or not. But if the kernel message buffer from before the incident is still
present then the machine never lost power, correct? The datacenter's power
incident I'm told happened somewhere around the time of the reboot so I
have to ask.
It looks like I didn't have dumps enabled. That's ... not helpful.
The machine has been stable for:
2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00
http://www.neutralgood.org/~kpn/dmesg.boot
Here's various stats I usually keep displayed. This is the last from
before the reboot:
http://www.neutralgood.org/~kpn/status.txt
I've got all the power savings features turned off in the BIOS and, like
I said, the machine has been stable for all this time. However, one thing
to note from a couple of days ago:
May 14 00:49:13 gunsight1 -- MARK --
May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 35 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 65 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 95 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 125 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 155 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 185 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 215 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 245 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 275 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 305 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 335 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 365 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 395 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 425 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 455 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 485 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 515 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 545 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 575 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 605 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 635 SECONDS
May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 665 SECONDS
May 14 01:19:36 gunsight1 -- MARK --
May 14 01:39:36 gunsight1 -- MARK --
May 14 01:59:37 gunsight1 -- MARK --
May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0x0020/info) - Patrol Read started
--
Kevin P. Neal http://www.pobox.com/~kpn/
"Not even the dumbest terrorist would choose an encryption program that
allowed the U.S. government to hold the key." -- (Fortune magazine
is smarter than the US government, Oct 29 2001, page 196.)
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to
"freebsd-stable-unsubscribe@freebsd.org"
On Sat, May 18, 2013 at 09:45:21PM -0400, kpneal at pobox.com wrote:
> [...]
> The machine has been stable for:
> 2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00
>
> http://www.neutralgood.org/~kpn/dmesg.boot
>
> Here's various stats I usually keep displayed. This is the last from
> before the reboot:
> http://www.neutralgood.org/~kpn/status.txt

Your system did not reboot nor did it crash. If it did, your uptime
would not be showing 472 days. Really, it's that simple.

> I've got all the power savings features turned off in the BIOS and, like
> I said, the machine has been stable for all this time. However, one thing
> to note from a couple of days ago:
>
> May 14 00:49:13 gunsight1 -- MARK --
> May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 35 SECONDS
> [...]
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 665 SECONDS
> May 14 01:19:36 gunsight1 -- MARK --
> May 14 01:39:36 gunsight1 -- MARK --
> May 14 01:59:37 gunsight1 -- MARK --
> May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0x0020/info) - Patrol Read started

Your mfi device timeouts are unrelated. If you want to talk about them,
please discuss them in a new/separate thread.

--
| Jeremy Chadwick jdc at koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
On Sat, May 18, 2013 at 09:45:21PM -0400, kpneal at pobox.com wrote:
> [...]
> The dmesg shows this at the reboot:
> mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete
> mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started
> mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete
> mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
> mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
> mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)
> mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
>
> Does this mean the machine did not lose power? I ask because my datacenter
> had some sort of power incident and I'm not sure if the server lost power
> or not. But if the kernel message buffer from before the incident is still
> present then the machine never lost power, correct? The datacenter's power
> incident I'm told happened somewhere around the time of the reboot so I
> have to ask.

The LSI controllers I've used will keep internal event logs which are
persistent across power cycles (so long as the BBU isn't dead,
presumably). It looks like mfi(4) has been set up to dump the entire
event log during boot. Log entries created after the last reboot are
displayed with a timestamp of "boot + Ns".

> It looks like I didn't have dumps enabled. That's ... not helpful.
>
> The machine has been stable for:
> 2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00

That's a bit confusing... did you mean "had been"? This is the exact
uptime that's in status.txt below.

> http://www.neutralgood.org/~kpn/dmesg.boot
>
> Here's various stats I usually keep displayed. This is the last from
> before the reboot:
> http://www.neutralgood.org/~kpn/status.txt
>
> I've got all the power savings features turned off in the BIOS and, like
> I said, the machine has been stable for all this time. However, one thing
> to note from a couple of days ago:

This is probably unrelated? As an aside, it'd be nice if mfi(4) dumped
info about the dcmd/io cmd at least once if it times out. At the moment,
it only does that if MFI_DEBUG is enabled... does anyone have an
objection to changing this from a compile-time option to a sysctl?

Thanks,
-Mark
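A note on reading those mfi0 event lines: going by Mark's description, entries stamped with a plain seconds counter come from the controller's persistent log, while entries stamped "boot + Ns" were created after the controller last started. The following is only a rough sketch of how one might separate the two groups in a saved dmesg. The regular expression is inferred from the lines quoted in this thread, not from the mfi(4) source, and the name `split_events` is made up for illustration.

```python
import re

# Pattern inferred from the event lines quoted in this thread, e.g.
#   mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
# It is an assumption, not a format documented by mfi(4).
EVENT_RE = re.compile(
    r"^(?P<dev>mfi\d+): (?P<seq>\d+) "
    r"\((?P<stamp>boot \+ \d+s|\d+s)/(?P<locale>0x[0-9a-fA-F]+)/(?P<level>\w+)\) - "
    r"(?P<text>.*)$"
)


def split_events(lines):
    """Return (older, since_boot) lists of (sequence number, message)."""
    older, since_boot = [], []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not a controller event line
        entry = (int(match.group("seq")), match.group("text"))
        if match.group("stamp").startswith("boot +"):
            since_boot.append(entry)  # stamped relative to controller start
        else:
            older.append(entry)  # stamped with a plain seconds counter
    return older, since_boot


sample = [
    "mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete",
    "mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/1f0c/1028)",
    "mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952",
]
older, since_boot = split_events(sample)
print("replayed from the controller log:", older)
print("logged since the controller started:", since_boot)
```

If that reading is right, older entries showing up in dmesg after a restart say nothing either way about whether the host lost power, because the controller replays them from its own log.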
On Sat, May 18, 2013 at 07:11:00PM -0700, Jeremy Chadwick wrote:
> On Sat, May 18, 2013 at 09:45:21PM -0400, kpneal@pobox.com wrote:
> > [...]
> > The machine has been stable for:
> > 2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00
> >
> > http://www.neutralgood.org/~kpn/dmesg.boot
> >
> > Here's various stats I usually keep displayed. This is the last from
> > before the reboot:
> > http://www.neutralgood.org/~kpn/status.txt
>
> Your system did not reboot nor did it crash. If it did, your uptime
> would not be showing 472 days.

It was showing 472 days before it was rebooted. Sorry I used the wrong
tense in my statement above.

> Really, it's that simple.

Here's the status at this very second:
12:42AM up 10:34, 3 users, load averages: 0.00, 0.03, 0.00

That looks an awful lot like the machine was restarted for some reason.
What I don't know is why.

> > I've got all the power savings features turned off in the BIOS and, like
> > I said, the machine has been stable for all this time. However, one thing
> > to note from a couple of days ago:
> > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 665 SECONDS
> > May 14 01:19:36 gunsight1 -- MARK --
> > May 14 01:39:36 gunsight1 -- MARK --
> > May 14 01:59:37 gunsight1 -- MARK --
> > May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0x0020/info) - Patrol Read started
>
> Your mfi device timeouts are unrelated. If you want to talk about them,
> please discuss them in a new/separate thread.

You usually mention people not giving enough information. I'm trying to
provide anything I can think of that might for whatever reason be involved.

--
Kevin P. Neal http://www.pobox.com/~kpn/
"Oh, I've heard that paradox a couple of times, but there's something
about a cat dying and I hate to think of such things." - Dr. Donald Knuth
speaking of Schrodinger's cat, December 8, 1999, MIT
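The two uptime readings in this exchange can be turned into approximate boot times with plain date arithmetic. The sketch below assumes the "12:42AM up 10:34" reading was taken on May 19, 2013 and the "2:05PM up 472 days, 21 mins" reading on May 18, 2013. Both dates are inferred from the message headers, not stated with the readings, and `boot_time` is a made-up helper name.

```python
from datetime import datetime, timedelta


def boot_time(observed_at, uptime):
    """Approximate boot time implied by an uptime reading."""
    return observed_at - uptime


# Assumed date: the "12:42AM up 10:34" reading taken on May 19, 2013.
current_boot = boot_time(
    datetime(2013, 5, 19, 0, 42), timedelta(hours=10, minutes=34)
)

# Assumed date: the status.txt reading "2:05PM up 472 days, 21 mins"
# taken on May 18, 2013.
previous_boot = boot_time(
    datetime(2013, 5, 18, 14, 5), timedelta(days=472, minutes=21)
)

print("current boot: ", current_boot)
print("previous boot:", previous_boot)
```

Under those assumptions the current boot falls a couple of minutes after the last 2:05PM status reading, which fits a restart around 2:05-06pm followed by firmware and boot time. uptime reports whole minutes, so the result is only good to about a minute.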
On Sat, May 18, 2013 at 10:36:37PM -0400, Mark Johnston wrote:
> On Sat, May 18, 2013 at 09:45:21PM -0400, kpneal@pobox.com wrote:
> > [...]
> > Does this mean the machine did not lose power? I ask because my datacenter
> > had some sort of power incident and I'm not sure if the server lost power
> > or not. But if the kernel message buffer from before the incident is still
> > present then the machine never lost power, correct? The datacenter's power
> > incident I'm told happened somewhere around the time of the reboot so I
> > have to ask.
>
> The LSI controllers I've used will keep internal event logs which are
> persistent across power cycles (so long as the BBU isn't dead,
> presumably). It looks like mfi(4) has been set up to dump the entire
> event log during boot. Log entries created after the last reboot are
> displayed with a timestamp of "boot + Ns".

Ah. This is probably the missing piece of info I needed. It sounds like
what I thought was evidence pointing away from a power outage really
isn't. Given this I guess I can assume that this was a simple power
outage. Thanks for the help!

> > It looks like I didn't have dumps enabled. That's ... not helpful.
> >
> > The machine has been stable for:
> > 2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00
>
> That's a bit confusing... did you mean "had been"? This is the exact
> uptime that's in status.txt below.

Yep. I meant "had been" or "has previously been".

> > http://www.neutralgood.org/~kpn/dmesg.boot
> >
> > Here's various stats I usually keep displayed. This is the last from
> > before the reboot:
> > http://www.neutralgood.org/~kpn/status.txt
> >
> > I've got all the power savings features turned off in the BIOS and, like
> > I said, the machine has been stable for all this time. However, one thing
> > to note from a couple of days ago:
>
> This is probably unrelated? As an aside, it'd be nice if mfi(4) dumped

Probably, but I threw it in there anyway. A machine is stable for 400+
days, has mfi issues, and a few days later unexpectedly gets restarted.
Coincidence? It sounds like it now, but I didn't know in my first post
on this to the list.

> info about the dcmd/io cmd at least once if it times out. At the moment,
> it only does that if MFI_DEBUG is enabled... does anyone have an
> objection to changing this from a compile-time option to a sysctl?
>
> Thanks,
> -Mark

--
Kevin P. Neal http://www.pobox.com/~kpn/
"14. Re-reading No. 13, I realize that it's quite possible I'm losing
my mind. I'm glad that for the most part I'm not aware it's happening."
-- from "20 things I'm thankful for": Fortune, Nov 29, 2004, page 230
Under 8.2, once MFI sees a timeout it will time out forever.
I committed major updates to MFI in r247367 & r247369 which deal with
many error condition problems that cause panics. So if you're seeing issues
with MFI I'd suggest you upgrade to stable/8.
Regards
Steve
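For anyone wanting to check their own logs for the same pattern, here is a rough sketch that groups "TIMEOUT AFTER" reports by command address. In the syslog excerpt from this thread, every report names the same command and the reported duration grows in 30-second steps. The regular expression is inferred from those lines only, and `summarize_timeouts` is a made-up name.

```python
import re

# Pattern inferred from the syslog lines quoted in this thread; it is an
# assumption, not a format guaranteed by the mfi(4) driver.
TIMEOUT_RE = re.compile(
    r"(?P<dev>mfi\d+): COMMAND (?P<cmd>0x[0-9a-f]+) "
    r"TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def summarize_timeouts(lines):
    """Map (device, command address) to the list of reported durations."""
    summary = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (match.group("dev"), match.group("cmd"))
            summary.setdefault(key, []).append(int(match.group("secs")))
    return summary


sample = [
    "May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 35 SECONDS",
    "May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 65 SECONDS",
    "May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xffffff80009d1310 TIMEOUT AFTER 95 SECONDS",
    "May 14 01:19:36 gunsight1 -- MARK --",
]
for (dev, cmd), secs in summarize_timeouts(sample).items():
    # Differences between consecutive reports for the same command.
    steps = sorted({b - a for a, b in zip(secs, secs[1:])})
    print(f"{dev} {cmd}: {len(secs)} reports, {secs[0]}s to {secs[-1]}s, steps {steps}")
```

A single address with steadily growing durations suggests one stuck command being re-reported, not many separate failures. The log alone does not show whether that command ever completed.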
----- Original Message -----
From: <kpneal at pobox.com>
To: <freebsd-stable at freebsd.org>
Sent: Sunday, May 19, 2013 2:45 AM
Subject: Unexpected reboot/crash on 8.2-RELEASE.