Hi all,

Based on comments on this list, I bought a new server with 8 SATA bays and an AOC-SAT2-MV8 SATA controller. I then fired up a JumpStart install of Solaris 10 5/08 on the server. The install runs through perfectly, with an SVM mirror of / and swap on the first two disks. But during the first boot, I get the Solaris 10 kernel banner and then the server resets.

If I take out the Marvell card and connect the boot disks to the NVidia MCP55 motherboard SATA controller, the server installs fine and completes its first reboot.

I realise my issue has nothing to do with ZFS, but I'm posting here since I heard of the controller here and I'm pretty sure some on this list are using it.

My boss is starting to think that going for a RAID card would have been better, as the server would already be up and running and in production, so I'm willing to listen to any advice.

Thanks,
Christophe Dupre
On Fri, Jun 27, 2008 at 2:47 PM, Christophe Dupre <cdupre at accovia.com> wrote:
> But during the first boot, I get the Solaris 10 kernel banner then the server resets.
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

I've got several systems running the same setup. It's definitely not the card. What motherboard are you running? What sort of PCI slot is it installed in? Sounds to me like a PCI resource issue.
Tim,

The system is a Silicon Mechanics A266; the motherboard is a SuperMicro H8DM8E-2.

I tried plugging the Marvell card into both 133MHz PCI-X slots. In one I get a lockup during install; in the other I get a reset on first boot.

Tim wrote:
> I've got several systems running the same setup. It's definitely not the card. What motherboard are you running? What sort of PCI slot is it installed in? Sounds to me like a pci resource issue.
BIOS revs? Any other PCI cards in the system?

On Sun, Jun 29, 2008 at 5:16 PM, Christophe Dupre <cdupre at accovia.com> wrote:
> I tried plugging the Marvell card in both 133MHz PCI-X slots. In one I get a lockup during install, in the other I get a reset on first boot.
Just chipping in with my 2c; it might not be a great help though.

First of all, it sounds to me like this might be a hardware fault. That motherboard and controller combination is listed on Sun's HCL, so it should work fine:
http://www.sun.com/bigadmin/hcl/data/systems/details/8454.html

The first thing I would ask is whether you need your boot drives running off that card. Since you're using mirroring, I assume those two disks won't be used for your ZFS pool, so why not just boot from disks attached to the motherboard controller, and save the AOC-SAT2-MV8 for your main zpool? That will let you boot Solaris and test the SATA hardware from within Solaris, and should also give slightly better performance when live.

Now, if you still get crashes using that configuration while trying to access the drives or create your ZFS pool, I would definitely suspect faulty hardware. It could be the SATA controller, or anything else in there if it's all new.

However, for troubleshooting, before doing that I'd personally do the following first, making use of the fact that you have a guaranteed test for this fault right now:

1. Strip down to a bare system and see if you can install on the SATA card. By bare system I mean just PSU, 1 x CPU, 2 x memory chips, SATA card, 1 x hard disk. I'd even disconnect the front panel LEDs and everything but the power switch. Even physically disconnect extra hard drives from the PSU. It sounds like overkill, but you wouldn't believe some of the crazy faults I've seen over the years.

2. If that works, begin adding components. Start with another two memory chips, then if they work fine, add the second CPU if you have one.

If it doesn't work with a bare system, maybe try another two memory chips, but at that point I'd suspect more serious problems with either the motherboard or the SATA controller. First check there's nothing shorting under the motherboard, and then speak to your supplier about getting replacement parts.
This message posted from opensolaris.org
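The strip-down procedure in the message above amounts to testing a minimal baseline and then adding one component at a time until the fault reappears. A minimal sketch of that logic follows, assuming a hypothetical `passes_test` callback that stands in for the manual test (install, then first reboot); the function name, the component labels and the fake test are illustrative only and do not come from the thread.

```python
def isolate_fault(baseline, extras, passes_test):
    """Add components one at a time to a minimal baseline.

    Returns the first added component after which the test fails,
    the string "baseline" if even the bare system fails,
    or None if every configuration passes.
    """
    installed = list(baseline)
    if not passes_test(installed):
        # The bare system already fails: suspect motherboard or SATA card.
        return "baseline"
    for part in extras:
        installed.append(part)
        if not passes_test(installed):
            return part
    return None


if __name__ == "__main__":
    # Hypothetical component labels, following the order suggested above.
    baseline = ["PSU", "CPU 1", "DIMM pair 1", "SATA card", "disk 1"]
    extras = ["DIMM pair 2", "CPU 2", "disk 2"]

    # Stand-in for the real manual test: pretend the second CPU triggers the reset.
    def fake_test(parts):
        return "CPU 2" not in parts

    print(isolate_fault(baseline, extras, fake_test))
```

In practice each call to `passes_test` is a full install and reboot cycle, so the value of the sketch is only in fixing the order of the steps: confirm the baseline first, then change exactly one thing per test.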
Christophe Dupre wrote:
> I tried plugging the Marvell card in both 133MHz PCI-X slots. In one I get a lockup during install, in the other I get a reset on first boot.

Just a shot in the dark, but is it possible that the power supply is insufficient when all the disks start up at once? Try with fewer connected disks.

Regards,
Lida
I remember a similar problem with an AOC-SAT2-MV8 controller in a system of mine: Solaris rebooted each time the marvell88sx driver tried to detect the disks attached to it. I don't remember if it happened during installation or during the first boot after a successful install. I ended up spending a night reverse engineering the controller's firmware/BIOS to find and fix the bug. The system has been running fine since I reflashed the controller with my patched firmware.

To make a long story short, a lot of these controllers in the wild use a buggy firmware, version 1.0b [1]. During POST the controller's firmware scans the PCI bus to find the device it is supposed to initialize, i.e. the controller's Marvell 88SX6081 chip. It incorrectly assumes that the *first* device with one of these PCI device IDs is the 88SX6081: 5040 5041 5080 5081 6041 6042 6081 7042 (the firmware is generic and supposed to support different chips). My system's motherboard happened to have a Marvell 88SX5041 chip onboard (device ID 5041), which was found first. So during POST the AOC-SAT2-MV8 firmware was initializing the disks connected to the 5041, leaving the 6081 disks in an uninitialized state. Then after POST, when Solaris was booting, I guess the marvell88sx driver barfed on this unexpected state and caused the kernel to reboot.

To fix the bug, I simply patched the firmware to remove 5041 from the device ID list. I used the Supermicro-provided tool to reflash the firmware [1].

You said your motherboard is a Supermicro H8DM8E-2. There is no such model; do you mean H8DM8-2 or H8DME-2? To determine whether one of your PCI devices has one of the device IDs I mentioned, run:

$ /usr/X11/bin/scanpci

I have recently had to replace this AOC-SAT2-MV8 controller with another one (we accidentally broke a SATA connector during a maintenance operation). Its firmware version uses a totally different numbering scheme (it's probably more recent) and it worked right out of the box on the same motherboard. So it looks like Marvell or Supermicro fixed the bug in at least some later revisions of the AOC-SAT2-MV8. But they don't distribute this newer firmware on their FTP site.

Do you know if yours is using firmware 1.0b (displayed during POST)?

[1] ftp://ftp.supermicro.com/Firmware/AOC-SAT2-MV8
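To make the first-match bug described above concrete, here is a minimal sketch that models it and flags devices that could trigger it. It is not the firmware's actual code. It assumes that scanpci prints one line per device containing "vendor 0x.... device 0x....", that filtering on Marvell's PCI vendor ID (0x11ab) is appropriate (the post only mentions device IDs), and the sample text is made up for illustration rather than captured from a real machine.

```python
import re

# Device IDs the 1.0b firmware reportedly accepts (list taken from the post above).
FIRMWARE_ID_LIST = {0x5040, 0x5041, 0x5080, 0x5081, 0x6041, 0x6042, 0x6081, 0x7042}
MARVELL_VENDOR_ID = 0x11AB   # Marvell's PCI vendor ID; vendor filtering is an assumption here
AOC_SAT2_MV8_CHIP = 0x6081   # the 88SX6081 on the AOC-SAT2-MV8

# Assumed scanpci line shape: "... vendor 0x11ab device 0x6081"
_LINE = re.compile(r"vendor\s+0x([0-9a-fA-F]{4})\s+device\s+0x([0-9a-fA-F]{4})")


def parse_scanpci(text):
    """Return (vendor, device) pairs in the order they appear in the output."""
    return [(int(v, 16), int(d, 16)) for v, d in _LINE.findall(text)]


def buggy_pick(devices):
    """Model of the first-match behaviour: take the first listed ID found on the bus."""
    for vendor, device in devices:
        if vendor == MARVELL_VENDOR_ID and device in FIRMWARE_ID_LIST:
            return device
    return None


def conflicting_devices(devices):
    """Marvell devices other than the 6081 that a first-match scan could pick instead."""
    return [
        device
        for vendor, device in devices
        if vendor == MARVELL_VENDOR_ID
        and device in FIRMWARE_ID_LIST
        and device != AOC_SAT2_MV8_CHIP
    ]


# Hypothetical scanpci output: an onboard 88SX5041 enumerated before the add-in card.
SAMPLE = """\
pci bus 0x0000 cardnum 0x00 function 0x00: vendor 0x10de device 0x0369
pci bus 0x0001 cardnum 0x05 function 0x00: vendor 0x11ab device 0x5041
pci bus 0x0002 cardnum 0x01 function 0x00: vendor 0x11ab device 0x6081
"""

if __name__ == "__main__":
    devices = parse_scanpci(SAMPLE)
    print("first match: %#06x" % buggy_pick(devices))
    print("conflicts:", [hex(d) for d in conflicting_devices(devices)])
```

To check a real system, the text passed to `parse_scanpci` would be the captured output of `/usr/X11/bin/scanpci`. An empty result from `conflicting_devices` suggests this particular bug is not the cause, although it says nothing about which firmware version the card carries.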
Good point about the motherboard number; I replied but never spotted that. I'd assumed it was the H8DM3-2, which is on the Sun HCL. Hadn't realised Supermicro had quite so many similar model numbers.
So what version is on your new card? Seems it'd be far easier to request from Supermicro if we knew what to ask for.

On 7/1/08, Marc Bevand <m.bevand at gmail.com> wrote:
> Its firmware version is using a totally different numbering scheme (it's probably more recent) and it worked right out-of-the-box on the same motherboard.
Marc Bevand <m.bevand at gmail.com> writes:
> I have recently had to replace this AOC-SAT2-MV8 controller with another one (we accidentally broke a SATA connector during a maintenance operation). Its firmware version is using a totally different numbering scheme (it's probably more recent) and it worked right out-of-the-box on the same motherboard.

I found the time to reboot the aforementioned system today, and the firmware version displayed during POST by the newer AOC-SAT2-MV8 is "Driver Version 3.2.1.3".

-marc