> From: Edward Ned Harvey [mailto:shill at nedharvey.com]
>
> I have a Dell R710 which has been flaky for some time. It crashes about once
> per week. I have literally replaced every piece of hardware in it, and
> reinstalled Sol 10u9 fresh and clean.

It has been over 3 weeks now, with no crashes, and me doing everything I can to get it to crash again. So I'm going to call this one resolved, while tentatively acknowledging the remote possibility that the problem could still come back.

All I did was disable the built-in Broadcom network cards and buy an add-on Intel network card (EXPI9400PT).

It is worth noting that the built-in bcom card cannot be completely disabled if you want to use IPMI. It's disabled for the OS only; the iDRAC IPMI traffic still goes across the bcom interface. So now I have two network cables running to the machine, one of which is used only for IPMI. No big deal. I had ports to spare on my switch, and the system is stable.

It's the fault of the Broadcom card. Rumor has it (from a Dell support technician) that the bcom cards have been problematic in other OSes too. It's not isolated to Solaris.
> From: Edward Ned Harvey [mailto:shill at nedharvey.com]
>
> It has been over 3 weeks now, with no crashes, and me doing everything I
> can to get it to crash again. So I'm going to call this one resolved...
>
> All I did was disable the built-in Broadcom network cards, and buy an
> add-on Intel network card (EXPI9400PT).

Wow, I can't believe this topic continues...

Yes, I am entirely confident now saying it was the fault of the bcom card. However, if you recall, people who started with bcom firmware v4.x were stable, then they upgraded to v5.x and became unstable, so they downgraded and returned to stable. Unfortunately for me, I have an R710, which shipped with v5 factory installed, and there was no option to downgrade...

But a few days ago, Dell released a new firmware upgrade, from version 5.x to 4.x. That's right. The new firmware is a downgrade to 4.

I am going to remove my Intel add-on card, and resume using my integrated Broadcom NIC. I am quite certain the system will continue to be stable, and at last we can call this issue resolved permanently.
On 10.12.10 19:13, Edward Ned Harvey wrote:
>> From: Edward Ned Harvey [mailto:shill at nedharvey.com]
>>
>> It has been over 3 weeks now, with no crashes, and me doing everything I
>> can to get it to crash again. So I'm going to call this one resolved...
>>
>> All I did was disable the built-in Broadcom network cards, and buy an
>> add-on Intel network card (EXPI9400PT).
> Wow, I can't believe this topic continues...
>
> Yes, I am entirely confident now saying it was the fault of the bcom card.
> However, if you recall, people who started with bcom firmware v4.x were
> stable, then they upgraded to v5.x and became unstable, so they downgraded
> and returned to stable. Unfortunately for me, I have an R710, which shipped
> with v5 factory installed, and there was no option to downgrade...
>
> But a few days ago, Dell released a new firmware upgrade, from version 5.x
> to 4.x. That's right. The new firmware is a downgrade to 4.
>
> I am going to remove my Intel add-on card, and resume using my integrated
> Broadcom NIC. I am quite certain the system will continue to be stable, and
> at last we can call this issue resolved permanently.

Wow, that's interesting. I will certainly "update" my current bcom fw to get to 4.x.

Thanks for the heads-up.

Cheers,
budy
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> But a few days ago, Dell released a new firmware upgrade, from version 5.x
> to 4.x. That's right. The new firmware is a downgrade to 4.
>
> I am going to remove my Intel add-on card, and resume using my integrated
> Broadcom NIC. I am quite certain the system will continue to be stable, and
> at last we can call this issue resolved permanently.

Oh well. Already, the weird crash has happened again. So we're concluding two things:
-1- The Broadcom NIC is definitely the cause of the crash.
and
-2- Even with the new "upgrade" downgrade, the problem is not solved.

So the solution is an add-on Intel NIC, with the integrated Broadcom NIC disabled.
> Oh well. Already, the weird crash has happened again. So we're concluding
> two things:
> -1- The Broadcom NIC is definitely the cause of the crash.
> and
> -2- Even with the new "upgrade" downgrade, the problem is not solved.

> So the solution is an add-on Intel NIC, with the integrated Broadcom NIC disabled.

And if I may conclude my own findings.

"Random crashes" and Broadcom issues are separate, unrelated problems afaik. We have some R710s with Broadcom NICs that seem to be stable over several months, and other R710s which cannot keep it together for even a week or so. Both have identical fw/BIOS versions.

1) There is/was a problem with Broadcom NICs losing network connectivity with every OS, including Solaris. This was fixed by software patches in sol10 and sol11 express, and an unofficial driver update was made for snv_134. The workaround for this issue was to disable C-states in the BIOS under processor configuration.
2) There is some kind of instability issue, not related to the NICs, with the latest batch of R710 series servers. Crashes occur randomly, but this seemed to get fixed in Solaris 11 Express; no idea on sol10 though. We have yet to test whether this is somehow related to the processor/memory configuration being used. Mind you that software and firmware versions are identical on the "stable" R710s and the crashing ones.
3) There were also problems with the system disk suddenly going missing (when using SAS 6/iR). I think it's somewhat related to problem 2). It happens rarely though.
4) Solaris 11 Express and the latest R710s introduced a new Broadcom problem: random network hiccups. Disabling C-states does not help. I'm planning to open an SR for this; it seems very much different from the original problem (the OS is not aware of what happens at all).

My solution for these issues would be not to use the R710 in anything more serious. It is definitely a platform that has more problems than I'm interested in debugging (:

Yours
Markus Kovero
I've just noticed that Dell has a 6.0.1 firmware upgrade available, at least for my R610s (they are about 3 months old). Oddly enough it doesn't show up on support.dell.com when I search using my service code, but if I check through "System Services / Lifecycle Controller" it does find it.

Two of the same servers are running Ubuntu 10.04.1 and RHEL 5.4; several TB of data have gone through the interfaces on those two boxes without a single glitch.

So has anyone tried 6.0.1 yet, or is it simply v4.x repackaged with a new version number?

- Lasse

On 12/12/2010, at 14.39, Markus Kovero wrote:
> [...]
On 22.12.10 18:47, Lasse Osterild wrote:
> I've just noticed that Dell has a 6.0.1 firmware upgrade available, at least for my R610s (they are about 3 months old). Oddly enough it doesn't show up on support.dell.com when I search using my service code, but if I check through "System Services / Lifecycle Controller" it does find it.
>
> Two of the same servers are running Ubuntu 10.04.1 and RHEL 5.4; several TB of data have gone through the interfaces on those two boxes without a single glitch.
>
> So has anyone tried 6.0.1 yet, or is it simply v4.x repackaged with a new version number?
>
> - Lasse
>
> On 12/12/2010, at 14.39, Markus Kovero wrote:
>> [...]

Well, a couple of weeks before Christmas, I enabled the onboard bcom NICs on my R610 again, to use them as IPMI ports. I didn't even use them in Sol11, but as of this morning, the system has again entered the state in which a successful login was no longer possible. Logging in on the local console wasn't possible either; the system didn't even prompt for the password.

So, just having the bcom NICs present on the host seems to cause these troubles, even if Sol11 doesn't have to deal with the NICs for anything.

Cheers,
budy
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Stephan Budach
>
> Well a couple of weeks before christmas, I enabled the onboard bcom nics
> on my R610 again, to use them as IPMI ports - I didn't even use them in

You don't have to enable the Broadcom NICs in order for them to do IPMI. In my R710, I went into the BIOS and disabled all the bcom NICs. The primary NIC doesn't allow you to *fully* disable it. It says something like "Disabled (OS)"... This means the OS can't see it, but it's still doing IPMI, assuming you configured IPMI in the BIOS interface (Ctrl-E).

It seems to work fine in this configuration.
On 03.01.11 19:41, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Stephan Budach
>>
>> Well a couple of weeks before christmas, I enabled the onboard bcom nics
>> on my R610 again, to use them as IPMI ports - I didn't even use them in
> You don't have to enable the Broadcom NICs in order for them to do IPMI. In
> my R710, I went into the BIOS and disabled all the bcom NICs. The primary NIC
> doesn't allow you to *fully* disable it. It says something like "Disabled
> (OS)"... This means the OS can't see it, but it's still doing IPMI, assuming
> you configured IPMI in the BIOS interface (Ctrl-E).
>
> It seems to work fine in this configuration.

That's worth a try. I will check that tomorrow.

Thanks,
budy
If you're still having issues, go into the BIOS and disable C-states, if you haven't already. They are responsible for most of the problems with 11th Gen PowerEdge.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Ben Rockwood
>
> If you're still having issues.... go into the BIOS and disable C-States, if you
> haven't already. It is responsible for most of the problems with 11th Gen
> PowerEdge.

I did that with no benefit on my R710. For me the main problem was the Broadcom NIC. I needed to disable the NIC in the BIOS and add an Intel NIC instead.
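For anyone wanting to confirm that the configuration described in this thread took effect (integrated Broadcom ports hidden from the OS, add-on Intel card in use), one option is to look at which driver owns each physical link. The following is a minimal Python sketch under stated assumptions, not a tested tool. It assumes a whitespace-separated listing with a DEVICE column, such as `dladm show-phys` is expected to print on Solaris 11 Express, and it assumes the driver prefixes bnx/bge for Broadcom and e1000g/igb for Intel. None of these names come from the thread, and the sample listing is made up for illustration.

```python
# Sketch only. The driver-name prefixes and the table layout are assumptions,
# not something stated in the thread.
BROADCOM_PREFIXES = ("bnx", "bge")    # assumed Solaris driver names for Broadcom NICs
INTEL_PREFIXES = ("e1000g", "igb")    # assumed Solaris driver names for Intel NICs

# Hypothetical sample in the style of a "dladm show-phys" listing; not real output.
SAMPLE = """\
LINK      MEDIA     STATE  SPEED  DUPLEX   DEVICE
e1000g0   Ethernet  up     1000   full     e1000g0
bnx0      Ethernet  down   0      unknown  bnx0
"""


def classify_links(table_text):
    """Group the DEVICE column of a whitespace-separated link table by vendor."""
    groups = {"broadcom": [], "intel": [], "other": []}
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    if not rows:
        return groups
    header, body = rows[0], rows[1:]
    device_col = header.index("DEVICE")  # raises ValueError if the column is absent
    for fields in body:
        if len(fields) <= device_col:
            continue  # skip short or malformed rows
        device = fields[device_col]
        if device.startswith(BROADCOM_PREFIXES):
            groups["broadcom"].append(device)
        elif device.startswith(INTEL_PREFIXES):
            groups["intel"].append(device)
        else:
            groups["other"].append(device)
    return groups


if __name__ == "__main__":
    found = classify_links(SAMPLE)
    if found["broadcom"]:
        print("OS still sees Broadcom ports:", ", ".join(found["broadcom"]))
    else:
        print("No Broadcom ports visible to the OS")
```

On a box set up the way Edward describes, the "broadcom" group would be expected to come back empty. IPMI on the "Disabled (OS)" port is handled outside the OS, so it would not show up in such a listing either way.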