I have a Dell R710 which has been flaky for some time. It crashes about once per week. I have literally replaced every piece of hardware in it, and reinstalled Sol 10u9 fresh and clean.

I am wondering if other people out there are using Dell hardware, with what degree of success, and in what configuration?

The failure seems to be related to the PERC 6/i. For some period around the time of the crash, the system still responds to ping, and anything currently in memory or running from remote storage continues to function fine. But new processes that require the local storage ... such as inbound ssh, or even physical login at the console ... those are all hosed. Eventually the system stops responding to ping. As soon as the problem starts, the only recourse is a power cycle.

I can't seem to reproduce the problem reliably, but it does happen regularly. Yesterday it happened several times in one day, but sometimes it will go 2 weeks without a problem.

Again, just wondering what other people are using and experiencing, to see if any more clues can be found to identify the cause.
> I have a Dell R710 which has been flaky for some time. It crashes about once per week. I have literally replaced every piece of hardware in it, and reinstalled Sol 10u9 fresh and clean.
> I am wondering if other people out there are using Dell hardware, with what degree of success, and in what configuration?
> The failure seems to be related to the perc 6i. As soon as the problem starts, the only recourse is power cycle.
> Again, just wondering what other people are using, and experiencing. To see if any more clues can be found to identify the cause.

Hi, we've been running OpenSolaris on Dell R710s with mixed results; some work better than others, and we've been struggling with the same issue you describe on the latest servers. I suspect some kind of power-saving issue gone wrong: the system disks go to sleep and never wake up, or something similar.

Personally, I cannot recommend using them with Solaris; the support is not even close to what it should be.

Yours
Markus Kovero
> From: Markus Kovero [mailto:Markus.Kovero at nebula.fi]
> Sent: Wednesday, October 13, 2010 10:43 AM
>
> Hi, we've been running OpenSolaris on Dell R710s with mixed results; some
> work better than others, and we've been struggling with the same issue you
> describe on the latest servers. I suspect some kind of power-saving issue
> gone wrong: the system disks go to sleep and never wake up, or something
> similar. Personally, I cannot recommend using them with Solaris; the
> support is not even close to what it should be.

How consistent are your problems? If you change something and things get better or worse, will you be able to notice?

Right now, I think I have improved matters by changing the PERC to WriteThrough instead of WriteBack. Yesterday the system crashed several times before I changed that, and afterward I can't get it to crash at all. But as I said before ... sometimes the system goes 2 weeks without a problem.

Do you have all your disks configured as individual disks?
Do you have any SSDs?
WriteBack or WriteThrough?
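For reference, here is a minimal sketch of how the controller cache policy can be inspected and switched with LSI's MegaCli utility (the same tool mentioned later in this thread). This is only a sketch: the exact flag spelling varies a little between MegaCli versions, so check the built-in help on your build before running it.

# Show the current cache policy of every logical drive on every adapter
MegaCli -LDGetProp -Cache -LAll -aAll

# Switch every logical drive from WriteBack (WB) to WriteThrough (WT)
MegaCli -LDSetProp WT -LAll -aAll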
Do you have dedup on? Remove large files, zfs destroy a snapshot or a zvol, and you'll see hangs like the ones you are describing. Turning dedup off is the best option. If you want dedup, get more RAM, and more, and more, and ... add an SSD cache device ... then it usually works OK.

Right now I'm fighting an outage due to a zfs destroy of a zvol that hung everything; we thought we had identified what "not to do", but apparently not. Dedup only works well with a lot of RAM; otherwise the dedup table is read from disk (very slowly, especially during i/o) and some operations lock the whole server - it blocks other disk i/o.

Good luck,

Steve Radich
www.BitShop.com - Business Innovative Technology Shop
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Steve Radich, BitShop, Inc.
>
> Do you have dedup on? Removing large files, zfs destroy a snapshot, or
> a zvol and you'll see hangs like you are describing.

Thank you, but no.

I'm running Sol 10u9, which does not have dedup yet, because dedup is not yet considered "stable" for reasons like the ones you mentioned.

I will admit, when dedup is available in Sol 11, I do want it. ;-)
> How consistent are your problems? If you change something and things get
> better or worse, will you be able to notice?
>
> Right now, I think I have improved matters by changing the PERC to
> WriteThrough instead of WriteBack.
>
> Do you have all your disks configured as individual disks?
> Do you have any SSDs?
> WriteBack or WriteThrough?

I believe the issues are not related to the PERC, as we use a SAS 6/iR with the system disks, and the disks show up as individual disks. The system has been crashing both with and without (i/o) load; so far it has been running best with all extra PCIe cards removed (10 Gbps NIC, SAS 5/E controllers), uptime almost two days. There's no apparent pattern to what triggers the crash; it crashed very frequently during one day and now it seems more stable. (Sunspots, anyone?) We had SSDs at the start but removed them during testing, with no effect.

Somehow, all this is starting to remind me of the Broadcom NIC issues. A different (not fully supported) hardware revision causing issues?

Yours
Markus Kovero
Hi Ed,

I have been using the Dell R710 for a while. You might try disabling C-states, as the problem you saw is identical to one I was seeing (disk i/o stops working, other things are OK). Since disabling C-states, I haven't seen the problem again.

max

On Oct 13, 2010, at 4:56 PM, Edward Ned Harvey wrote:
>> Do you have dedup on? Removing large files, zfs destroy a snapshot, or
>> a zvol and you'll see hangs like you are describing.
>
> Thank you, but no.
>
> I'm running sol 10u9, which does not have dedup yet, because dedup is not
> yet considered "stable" for reasons like you mentioned.
>
> I will admit, when dedup is available in sol 11, I do want it. ;-)
Hi,

I have some Dell R710s and R410s running OSOL (snv_130 or snv_134) attached to Supermicro chassis, and the PERC is only used for the root disks. I did get some issues with this type of server, but here's what I did that made them quite stable:

- disable virtualization support in the BIOS
- disable C-STATE in the BIOS (CPU menu, I think)

After those 2 changes the servers became quite stable, but before that they would hang without any apparent reason. I use compression, some hosts have SSDs, and I have no dedup enabled on any server.

Good luck !

Bruno

On Wed, 13 Oct 2010 10:56:48 -0400, "Edward Ned Harvey" wrote:
> I'm running sol 10u9, which does not have dedup yet, because dedup is not
> yet considered "stable" for reasons like you mentioned.
> I will admit, when dedup is available in sol 11, I do want it. ;-)
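As a complement to the BIOS switches above, deep CPU idle states can also be disabled from the Solaris side through /etc/power.conf. This is only a sketch based on the power.conf keywords; whether it is needed in addition to, or instead of, the BIOS setting depends on the platform and build.

# /etc/power.conf additions: disable CPU power management and deep idle states
cpupm disable
cpu-deep-idle disable

Then apply the new configuration with:

pmconfig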
The Broadcom NIC was also a problem for me; if you downgrade the firmware to the 4.x series, everything is fine. But I think there's a newer updated driver somewhere...

Bruno

On Wed, 13 Oct 2010 14:58:32 +0000, Markus Kovero <Markus.Kovero at nebula.fi> wrote:
> I believe the issues are not related to the PERC, as we use a SAS 6/iR with
> the system disks, and the disks show up as individual disks.
> There's no apparent pattern to what triggers the crash; it crashed very
> frequently during one day and now it seems more stable. (Sunspots, anyone?)
> We had SSDs at the start but removed them during testing, with no effect.
> Somehow, all this is starting to remind me of the Broadcom NIC issues.
> A different (not fully supported) hardware revision causing issues?
On Wed, Oct 13 at 10:13, Edward Ned Harvey wrote:
> I have a Dell R710 which has been flaky for some time. It crashes about
> once per week. I have literally replaced every piece of hardware in it,
> and reinstalled Sol 10u9 fresh and clean.
>
> I am wondering if other people out there are using Dell hardware, with
> what degree of success, and in what configuration?

A Dell T610 with the default SAS 6/iR adapter has been working fine for us for 18 months. All issues so far have been software bugs in OpenSolaris. Not much of a data point, but I have no reason not to buy another Dell server in the future.

Out of curiosity, did you run into this:
http://blogs.everycity.co.uk/alasdair/2010/06/broadcom-nics-dropping-out-on-solaris-10/

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
> From: edmudama at mail.bounceswoosh.org
> [mailto:edmudama at mail.bounceswoosh.org] On Behalf Of Eric D. Mudama
>
> Out of curiosity, did you run into this:
> http://blogs.everycity.co.uk/alasdair/2010/06/broadcom-nics-dropping-
> out-on-solaris-10/

I personally haven't had the Broadcom problem. When my system crashes, surprisingly, it continues responding to ping, answers on port 22 (but you can't ssh in), and any cron jobs that run from NFS are able to continue. For some period of time, that is; eventually the whole thing crashes.
Dell R710 ... Solaris 10u9 ... with stability problems ...

Notice that I have several CPUs whose current_cstate is higher than the supported_max_cstates. Logically, that sounds like a bad thing, but I can't seem to find documentation that defines the meaning of supported_max_cstates to verify that it is a bad thing.

I'm looking for other people out there ... with and without problems ... to try this too, and see if a current_cstate higher than supported_max_cstates might be a simple indicator of system instability.

kstat | grep current_cstate ; kstat | grep supported_max_cstate

current_cstate (16 CPUs):        1 3 3 3 1 3 3 3 0 3 3 3 1 3 3 3
supported_max_cstates (16 CPUs): 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
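For anyone who wants to gather this without eyeballing two long lists, here is a rough shell sketch that pairs the two statistics per CPU and flags any CPU whose current_cstate exceeds supported_max_cstates. It assumes both statistics live under the cpu_info kstat module, which is where they appear on the systems discussed here; if kstat -p returns nothing with that filter, fall back to plain kstat | grep as above.

#!/bin/sh
# Compare current_cstate against supported_max_cstates for every CPU.
# kstat -p prints lines of the form: module:instance:name:statistic<TAB>value
kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates |
nawk '{
    split($1, f, ":")                 # f[2] = CPU instance, f[4] = statistic name
    if (f[4] == "current_cstate")        cur[f[2]] = $2
    if (f[4] == "supported_max_cstates") max[f[2]] = $2
} END {
    for (i in cur) {
        note = (cur[i] > max[i]) ? "  <-- current exceeds supported max" : ""
        printf("cpu %s: current_cstate=%s supported_max_cstates=%s%s\n",
               i, cur[i], max[i], note)
    }
}'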
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> Dell R710 ... Solaris 10u9 ... with stability problems ...
> Notice that I have several CPUs whose current_cstate is higher than
> the supported_max_cstates.

One more data point:

Sun X4275 ... Solaris 10u6 fully updated (equivalent of 10u9??) ... no problems ... There are no current_cstates higher than supported_max_cstates.

kstat | grep current_cstate ; kstat | grep supported_max_cstate

current_cstate (16 CPUs):        2 1 1 1 1 1 0 1 1 1 1 1 1 1 2 2
supported_max_cstates (16 CPUs): 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
On 13 Oct 2010, at 18:30, Edward Ned Harvey wrote:
> I personally haven't had the Broadcom problem. When my system crashes,
> surprisingly, it continues responding to ping, answers on port 22 (but you
> can't ssh in), and if there are any cron jobs that run from NFS, they're
> able to continue. For some period of time, and eventually the whole thing
> crashes.

I had that for months! Eventually I found that there was a memory leak with idmapd. I now have a cron job that restarts it every night; problem solved.

I only diagnosed the issue by emailing myself a 'top' output every 5 minutes via cron and watching it slowly creep up. It normally happens when I have a lot of SMB traffic; there's a leak there somewhere!

- Daniel
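For anyone wanting to copy that workaround, a rough crontab sketch is below. It assumes the idmap service FMRI svc:/system/idmap, uses prstat rather than top (prstat ships with Solaris, top may not), and picks arbitrary times and a root mail recipient; adjust all of those to taste.

# Restart the idmap service every night at 03:00 to work around the leak
0 3 * * * /usr/sbin/svcadm restart svc:/system/idmap

# Every 5 minutes, mail a one-shot process listing sorted by resident set size
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/prstat -s rss 1 1 | mailx -s "prstat `hostname`" root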
'Edward Ned Harvey' wrote:
> I have a Dell R710 which has been flaky for some time. It crashes
> about once per week. I have literally replaced every piece of hardware
> in it, and reinstalled Sol 10u9 fresh and clean.
>
> I am wondering if other people out there are using Dell hardware, with
> what degree of success, and in what configuration?

We are running (Open)Solaris on lots of 10G servers (PE2900, PE1950, PE2950, R905) and some 11G (R610, and soon some R815) with both PERC and non-PERC controllers and lots of MD1000s. The 10G models are stable - especially the R905s are real workhorses.

We have had only one 11G server (R610) which caused trouble. The box froze at least once a week - after replacing almost the entire box I switched from the old iscsitgt to COMSTAR, and the box has been stable since. Go figure ...

I might add that none of these machines use the onboard Broadcom NICs.

--
Med venlig hilsen / Best Regards

Henrik Johansen
henrik at scannet.dk
ScanNet Group A/S
> From: Henrik Johansen [mailto:henrik at scannet.dk]
>
> The 10G models are stable - especially the R905s are real workhorses.

Would you generally consider all your machines stable now? Can you easily pdsh to all those machines?

kstat | grep current_cstate ; kstat | grep supported_max_cstates

I'd really love to see whether "some current_cstate is higher than supported_max_cstates" is an accurate indicator of system instability. So far the two data points I have support this theory.
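If pdsh is in place, something along these lines would pull the counts from a whole fleet in one go; the host ranges here are purely hypothetical placeholders.

# Collect cstate kstats from many hosts at once (host names are examples only)
pdsh -w 'r610-[01-10],r905-[01-04]' \
    'kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates' | sort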
'Edward Ned Harvey' wrote:
> Would you generally consider all your machines stable now?
> Can you easily pdsh to all those machines?

Yes - the only problem child has been 1 R610 (the other 2 that we have in production have not shown any signs of trouble).

> kstat | grep current_cstate ; kstat | grep supported_max_cstates
>
> I'd really love to see whether "some current_cstate is higher than
> supported_max_cstates" is an accurate indicator of system instability.

Here's a little sample from different machines:

R610 #1
current_cstate (16 CPUs):        3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0
supported_max_cstates (16 CPUs): 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

R610 #2
current_cstate (16 CPUs):        3 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3
supported_max_cstates (16 CPUs): 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

PE2900
current_cstate (8 CPUs):        1 1 0 1 1 0 1 1
supported_max_cstates (8 CPUs): 1 1 1 1 1 1 1 1

PER905
current_cstate (16 CPUs):        1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1
supported_max_cstates (16 CPUs): 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The PE2900 and R905 that I took the data from have uptimes close to 1 year and see a LOT of usage.

--
Med venlig hilsen / Best Regards

Henrik Johansen
henrik at scannet.dk
ScanNet Group A/S
We have an R710 + 3 MD1000s running ZFS, with an Intel 10GbE network card. There was a period when the R710 was freezing randomly, while we were using an osol b12x release. I checked Google and there were reports of freezes caused by a new mpt driver used in the b12x releases, which could have been the cause. We changed to Nexenta based on b134, and the issue is gone; it has been running very stably ever since. We plan to add 3 more MD1000s. All MD1000s are connected to a SAS 5/E card.

Not sure what the mpt driver status is in Sol 10u9.
On Wed, Oct 13 at 15:44, Edward Ned Harvey wrote:
> Would you generally consider all your machines stable now?
> Can you easily pdsh to all those machines?
>
> kstat | grep current_cstate ; kstat | grep supported_max_cstates

Dell T610; the machine has been stable since we got it (relative to the failure modes you've mentioned).

current_cstate (8 CPUs):        1 1 1 0 1 1 0 1
supported_max_cstates (8 CPUs): 1 1 1 1 1 1 1 1

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
Hi All,

I'm currently considering purchasing 1 or 2 Dell R515s.

With up to 14 drives and up to 64GB of RAM, it seems well suited for a low-end ZFS server. I know this box is new, but I wonder if anyone out there has any experience with it?

How about the H700 SAS controller?

Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I want to put some SSDs in a box like this, but there's no way I'm going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are they kidding?

 -Kyle
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Kyle McDonald
>
> I'm currently considering purchasing 1 or 2 Dell R515s.
>
> With up to 14 drives and up to 64GB of RAM, it seems well suited
> for a low-end ZFS server.
>
> Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
> want to put some SSDs in a box like this, but there's no way I'm
> going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
> they kidding?

You are asking for a world of hurt. You may luck out, and it may work great, thus saving you money. Take my example: I took the "safe" approach (as far as any non-Sun hardware is concerned). I bought an officially supported Dell server, with all Dell-blessed and Solaris-supported components, with support contracts on both the hardware and software, fully patched and updated on all fronts, and I am getting system failures approximately once per week. I have support tickets open with both Dell and Oracle right now ... I have no idea how it's all going to turn out. But if you have a problem like mine while using unsupported hardware, you have no alternative. You're up a tree full of bees, naked, with a hunter on the ground trying to shoot you. And IMHO, the probability of having a problem like mine is higher when you use unsupported hardware, though of course there's no definable way to quantify that belief.

My advice to you is: buy the supported hardware, and the support contracts for both the hardware and software. But of course, that's all just a calculated risk, and I doubt you're going to take my advice. ;-)
On Fri, Oct 22, 2010 at 10:53 PM, Edward Ned Harvey <shill at nedharvey.com> wrote:
> My advice to you is: buy the supported hardware, and the support contracts
> for both the hardware and software. But of course, that's all just a
> calculated risk, and I doubt you're going to take my advice. ;-)

Dell has required Dell-branded drives as of roughly 8 months ago. I don't think there was ever an H700 firmware release that didn't require this. I'd bet you're going to waste a lot of money on a drive the system refuses to recognize.

--Tim
'Tim Cook' wrote:

[... snip ...]

> Dell requires Dell-branded drives as of roughly 8 months ago. I don't
> think there was ever an H700 firmware released that didn't require
> this. I'd bet you're going to waste a lot of money to get a drive the
> system refuses to recognize.

This should no longer be an issue, as Dell has abandoned that practice because of customer pressure.

--
Med venlig hilsen / Best Regards

Henrik Johansen
henrik at scannet.dk
ScanNet Group A/S
I actually have three Dell R610 boxes running OSol snv_134, and since I switched from the internal Broadcom NICs to Intel ones, I haven't had any issue with them.

budy
Congratulations Ed, and welcome to "open systems?"

Ah, but Nexenta is open and has "no vendor lock-in." That's what you probably should have done: bank everything on Illumos and Nexenta. A winning combination by all accounts.

But then again, you could have used Linux on any hardware as well. Then your hardware and software issues would probably be multiplied even more.

Cheers,

Mike

---
Michael Sullivan
michael.p.sullivan at me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Oct 2010, at 12:53 , Edward Ned Harvey wrote:
> You are asking for a world of hurt. You may luck out, and it may work
> great, thus saving you money.
[... snip ...]
> My advice to you is: buy the supported hardware, and the support contracts
> for both the hardware and software. But of course, that's all just a
> calculated risk, and I doubt you're going to take my advice. ;-)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Stephan Budach
>
> I actually have three Dell R610 boxes running OSol snv_134, and since I
> switched from the internal Broadcom NICs to Intel ones, I haven't had
> any issue with them.

I am still using the built-in Broadcom NICs in my R710 that's having problems...

What sort of problems did you have with the Broadcom NICs in your R610?
Am 24.10.10 16:29, schrieb Edward Ned Harvey:
> I am still using the built-in Broadcom NICs in my R710 that's having
> problems...
>
> What sort of problems did you have with the Broadcom NICs in your R610?

Well, basically the boxes would hang themselves up after a week or so. And by hanging up, I mean becoming inaccessible by either the network via ssh or the local console. It seemed that, for some reason, authentication didn't work anymore.

Earlier versions of OSol also exhibited the problem that the links of the Broadcom NICs were reported as up (which they actually were, since the LED indicators were on and the switch also reported that the links were up), but no network traffic was going through. Disabling/enabling the ports didn't work and I had to reboot the host as well, but that was with 2009.06, I think.

Since I am still an OSol noob (well, kind of), I decided to try the Intel NICs, which had never caused me any trouble in any other server, and my boxes have been fine since.

--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.budach at jvm.de
Internet: http://www.jvm.com

Geschäftsführer: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380
> From: Stephan Budach [mailto:stephan.budach at jvm.de]
>
> Well, basically the boxes would hang themselves up after a week or so.
> And by hanging up, I mean becoming inaccessible by either the network
> via ssh or the local console. It seemed that, for some reason,
> authentication didn't work anymore.

That's precisely what I'm experiencing. The system still responds to ping. Anything that was already running in memory via the network stays alive (cron jobs continue to run), but remote access is impossible (ssh, vnc, even the local physical console...), and eventually the system stops completely.

There's a high correlation between the problem and doing some sort of low-level storage operation (zpool import/export, MegaCli offline, zpool status, scrub, zfs send, etc.), so I thought the problem was somehow related to the PERC or something ... maybe there's a bug where the PERC conflicts with the NIC. I don't care. Swapping the NIC is cheap enough; I'll try it now, just to see if it works.

Thanks for the suggestion...
> That's precisely what I'm experiencing. The system still responds to ping.
> Anything that was already running in memory via the network stays alive (cron
> jobs continue to run), but remote access is impossible (ssh, vnc, even the local
> physical console...), and eventually the system stops completely.

Hi, the Broadcom issues show up as loss of network connectivity, i.e. the system stops responding to ping. This is a different issue; it's more like the system runs out of memory or loses its system disks (which we have seen lately).

Yours
Markus Kovero
> You are asking for a world of hurt. You may luck out, and it may work
> great, thus saving you money. Take my example: I took the "safe" approach
> (as far as any non-Sun hardware is concerned). I bought an officially
> supported Dell server, with all Dell-blessed and Solaris-supported
> components, with support contracts on both the hardware and software,
> fully patched and updated on all fronts, and I am getting system failures
> approximately once per week.
>
> My advice to you is: buy the supported hardware, and the support contracts
> for both the hardware and software.

Any other feasible alternatives to Dell hardware? I wonder whether these issues are mostly related to Nehalem architectural problems, e.g. C-states. So is there anything to be gained by switching hardware vendor? HP, anyone?

Yours
Markus Kovero
On 10/25/10 08:39 PM, Markus Kovero wrote:
> Any other feasible alternatives to Dell hardware? I wonder whether these
> issues are mostly related to Nehalem architectural problems, e.g. C-states.
> So is there anything to be gained by switching hardware vendor? HP, anyone?

Sun hardware? Then you get all your support from one vendor.

--
Ian.
> From: Markus Kovero [mailto:Markus.Kovero at nebula.fi]
>
> Any other feasible alternatives to Dell hardware? I wonder whether these
> issues are mostly related to Nehalem architectural problems, e.g. C-states.
> So is there anything to be gained by switching hardware vendor? HP, anyone?

From googling around, many people are having this type of problem on HP as well, so it's not just Dell. Most people are able to fix or work around it by disabling C-states in the BIOS, or by fiddling with their NIC (swapping out Broadcom in favor of Intel, or downgrading the Broadcom firmware).

But I already disabled C-states (it didn't help), and my system ships with a minimum Broadcom firmware version 5, which means I can't downgrade to v4, which sometimes solved the problem for people.

So again - I have support tickets open with Dell & Oracle ... I don't know the result yet.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ian Collins
>
> Sun hardware? Then you get all your support from one vendor.

+1

Sun hardware costs more, but it's worth it if you want to simply assume your stuff will work. In my case, I'd say the Sun hardware was approximately 50% to 2x higher cost than the equivalent Dell setup.
On 10/25/2010 3:39 AM, Markus Kovero wrote:
> Any other feasible alternatives to Dell hardware? I wonder whether these
> issues are mostly related to Nehalem architectural problems, e.g. C-states.
> So is there anything to be gained by switching hardware vendor? HP, anyone?

Note that while it was a Dell I was asking about, it's an AMD Opteron system (the R515). I doubt that, with an architecture that different, the same 'c-states' corner case will appear. Aren't there too many variables changing between AMD and Intel to have the exact same problem?

Not that there won't be a different problem, though. :)

 -Kyle
On Mon, October 25, 2010 08:38, Edward Ned Harvey wrote:
> Sun hardware costs more, but it's worth it if you want to simply assume
> your stuff will work. In my case, I'd say the Sun hardware was approximately
> 50% to 2x higher cost than the equivalent Dell setup.

I agree with the general sentiment, but it can get prohibitive if you also have a sizable DEV/QA/STG environment that you want to keep the same as PRD.

I don't mind gold-plated support for PRD, but for the rest, it'd be handy budget-wise if Oracle simply had a basic parts-only warranty for hardware, and patches-only support for software. That's all we need for the majority of our environment, but it no longer seems available under Larry Ellison's watch.
I've been having the same problems, and it appears to be caused by a remote monitoring app that calls zpool status and/or zfs list.

I've also found problems with the PERC, and I'm finally replacing the PERC cards with SAS 5/E controllers (which are much cheaper anyway). Every time I reboot, the PERC tells me about a foreign import being required, so the PERC cards and ZFS just don't go together...

On Oct 24, 2010, at 8:14 PM, Edward Ned Harvey wrote:
> There's a high correlation between the problem and doing some sort of
> low-level storage operation (zpool import/export, MegaCli offline, zpool
> status, scrub, zfs send, etc.), so I thought the problem was somehow related
> to the PERC or something ... maybe there's a bug where the PERC conflicts
> with the NIC. I don't care. Swapping the NIC is cheap enough; I'll try it
> now, just to see if it works.
On 10/26/10 01:38 AM, Edward Ned Harvey wrote:
> Sun hardware costs more, but it's worth it if you want to simply assume
> your stuff will work. In my case, I'd say the Sun hardware was approximately
> 50% to 2x higher cost than the equivalent Dell setup.

I find that claim odd. Whenever we bought kit down here in NZ, Sun was the best on price. Maybe that's changed under the new order.

--
Ian.
Am 25.10.10 21:06, schrieb Ian Collins:
> I find that claim odd. Whenever we bought kit down here in NZ, Sun
> was the best on price. Maybe that's changed under the new order.

I am currently investigating buying Sun/Oracle hardware. If you take into account Oracle's license/support fees for non-Oracle hardware, buying Solaris with Oracle hardware may actually prove cheaper when calculated over three years.

We'll see.
On Tue, Oct 26, 2010 at 08:06:53AM +1300, Ian Collins wrote:
> I find that claim odd. Whenever we bought kit down here in NZ, Sun
> was the best on price. Maybe that's changed under the new order.

Add about 50% to the last price list from Sun and you will get the price it costs now ...

Have fun,
jel.
--
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
> Add about 50% to the last price list from Sun and you will get the price
> it costs now ...

It seems Oracle does not want to sell its hardware all that much: several months of delays getting a sales rep to provide prices, and pricing nowhere close to its competitors.

Yours
Markus Kovero
On Tue, Oct 26, 2010 at 12:50:16PM +0000, Markus Kovero wrote:
> It seems Oracle does not want to sell its hardware all that much: several
> months of delays getting a sales rep to provide prices, and pricing
> nowhere close to its competitors.

Yeah, no more Sun hardware for us, either. Mostly Supermicro, Dell, HP.

--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE