I have done quite a bit of research over the past few years on the best (i.e. simple, robust, inexpensive, and performant) SATA/SAS controllers for ZFS, especially in terms of throughput analysis (many of them are designed with an insufficient PCIe link width). I have seen many questions on this list about which one to buy, so I thought I would share my findings: http://blog.zorinaq.com/?e=10

Very briefly:

- The best 16-port one is probably the LSI SAS2116, 6Gbps, PCIe (gen2) x8. Because it is quite pricey, it's probably better to buy two 8-port controllers.
- The best 8-port is the LSI SAS2008 (faster, more expensive) or SAS1068E (150MB/s/port should be sufficient).
- The best 2-port is the Marvell 88SE9128, 88SE9125, or 88SE9120, because PCIe gen2 allows a throughput of at least 300MB/s on the link even with Max_Payload_Size=128, and this one is particularly cheap ($35). AFAIK it is the _only_ controller on the entire market that lets 2 drives avoid bottlenecking an x1 link.

I hope this helps ZFS users here!

-mrb
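Since the recommendations above hinge on the negotiated PCIe link width and Max_Payload_Size, here is a minimal sketch of how to check what a controller actually negotiated. It uses Linux's lspci (pciutils) purely as an illustration; the bus address shown is an example and will differ per system:

  # list storage controllers and note the bus address of yours
  lspci | grep -i -E 'sas|sata|raid'

  # dump the PCIe capability fields for that controller (example address)
  lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta|MaxPayload'

Look for the negotiated speed and width on the LnkSta line (a gen2 x8 part should report Width x8) and the effective MaxPayload value on the DevCtl line.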
On Sat, May 15, 2010 at 11:01:00AM +0000, Marc Bevand wrote:
> I have done quite a bit of research over the past few years on the best
> (i.e. simple, robust, inexpensive, and performant) SATA/SAS controllers for
> ZFS, especially in terms of throughput analysis (many of them are designed
> with an insufficient PCIe link width). I have seen many questions on this
> list about which one to buy, so I thought I would share my findings:
> http://blog.zorinaq.com/?e=10
> [...]
> I hope this helps ZFS users here!

Excellent post! It'll definitely help many. Thanks!

-- Pasi
Very helpful post. I just started to set up my system and have run into a problem where SATA ports 7 and 8 aren't really SATA ports (they are behind an unsupported RAID controller), so I am in the market for a compatible controller.
The LSI SAS1064E slipped through the cracks when I built the list. This is a 4-port PCIe x8 HBA with very good Solaris (and Linux) support. I don't remember having seen it mentioned on zfs-discuss@ before, even though many were looking for 4-port controllers. Perhaps the fact that it is priced so close to 8-port models explains why it has gone relatively unnoticed. That said, the wide x8 PCIe link makes it the *cheapest* controller able to feed 300-350MB/s to at least 4 ports concurrently. Now added to my list.

-mrb
Nice write-up, Marc. Aren't the SuperMicro cards their funny "UIO" form factor? Wouldn't want someone buying a card that won't work in a standard chassis.

-marc

On Tue, May 18, 2010 at 2:26 AM, Marc Bevand <m.bevand at gmail.com> wrote:
> The LSI SAS1064E slipped through the cracks when I built the list. This is
> a 4-port PCIe x8 HBA with very good Solaris (and Linux) support.
> [...]
Marc Nicholas <geekything <at> gmail.com> writes:
> Nice write-up, Marc. Aren't the SuperMicro cards their funny "UIO" form
> factor? Wouldn't want someone buying a card that won't work in a standard
> chassis.

Yes, 4 of the 6 Supermicro cards are UIO cards. I added a warning about it. Thanks.

-mrb
A really great alternative to the UIO cards, for those who don't want the headache of modifying the brackets or cases, is the Intel SASUC8I.

This is a rebranded LSI SAS3081E-R. It can be flashed with the LSI IT firmware from the LSI website and is physically identical to the LSI card. It is really the exact same card, and typically runs around 140-160 dollars.

These are what I went with.

On Tue, May 18, 2010 at 12:28 PM, Marc Bevand <m.bevand at gmail.com> wrote:
> Yes, 4 of the 6 Supermicro cards are UIO cards. I added a warning about it.
> Thanks.
Thomas Burgess <wonslung <at> gmail.com> writes:
> A really great alternative to the UIO cards, for those who don't want the
> headache of modifying the brackets or cases, is the Intel SASUC8I.
>
> This is a rebranded LSI SAS3081E-R. It can be flashed with the LSI IT
> firmware from the LSI website and is physically identical to the LSI card.
> It is really the exact same card, and typically runs around 140-160 dollars.

The SASUC8I is already in my list. In fact I bought one last week. I did not need to flash its firmware though - drives were used in JBOD mode by default.

-mrb
My work has bought a bunch of IBM servers recently as ESX hosts. They all come with LSI SAS1068E controllers as standard, which we remove and upgrade to a RAID 5 controller. So I had a bunch of them lying around.

We've bought a 16-bay SAS hotswap case and I've put in an AMD X4 955 BE with an ASUS M4A89GTD Pro as the mobo. In the two x16 PCI-E slots I've put in the 1068E controllers I had lying around. Everything is still being put together and I still haven't even installed OpenSolaris yet, but I'll see if I can get you some numbers on the controllers when I am done.
Deon Cui <deon.cui <at> gmail.com> writes:
> So I had a bunch of them lying around. We've bought a 16-bay SAS hotswap
> case and I've put in an AMD X4 955 BE with an ASUS M4A89GTD Pro as the mobo.
>
> In the two x16 PCI-E slots I've put in the 1068E controllers I had lying
> around. Everything is still being put together and I still haven't even
> installed OpenSolaris yet, but I'll see if I can get you some numbers on
> the controllers when I am done.

This is a well-architected config with no bottlenecks on the PCIe links to the 890GX northbridge or on the HT link to the CPU. If you run 16 concurrent "dd if=/dev/rdsk/c?t?d?p0 of=/dev/null bs=1024k" reads, and assuming your drives can do ~100MB/s sustained at the beginning of the platter, you should literally see an aggregate throughput of ~1.6GB/s...

-mrb
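For anyone who wants to reproduce that kind of measurement, here is a minimal sketch of the concurrent-read test (the device glob, count, and iostat interval are illustrative, not from the thread; adjust for your system):

  #!/bin/sh
  # Read the first ~4 GB of every whole-disk device in parallel; p0 is the
  # whole disk on x86. Read-only and safe, but it will hammer the disks.
  for disk in /dev/rdsk/c*t*d*p0; do
      dd if=$disk of=/dev/null bs=1024k count=4096 &
  done
  # While this runs, in another terminal: iostat -xnz 5
  # and sum the kr/s column across the drives for the aggregate throughput.
  wait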
> This is a well-architected config with no bottlenecks on the PCIe links to
> the 890GX northbridge or on the HT link to the CPU. If you run 16 concurrent
> "dd if=/dev/rdsk/c?t?d?p0 of=/dev/null bs=1024k" reads, and assuming your
> drives can do ~100MB/s sustained at the beginning of the platter, you should
> literally see an aggregate throughput of ~1.6GB/s...
>
> -mrb

That's the kind of stuff wet dreams are made of =]

Great post - thanks
On Sat, May 15, 2010 at 4:01 AM, Marc Bevand <m.bevand at gmail.com> wrote:
> I have done quite a bit of research over the past few years on the best
> (i.e. simple, robust, inexpensive, and performant) SATA/SAS controllers for ZFS.

I've spent some time looking at the capabilities of a few controllers based on the questions about the SiI3124 and PMP support.

According to the docs, the Marvell 88SX6081 driver doesn't support NCQ or PMP, though the card does. While I'm not really performance bound on my system, I imagine NCQ would help performance a bit, at least for scrubs or resilvers. Even more so because I'm using the slow WD10EADS drives.

This raises the question of whether a SAS controller supports NCQ for SATA drives. Would an LSI 1068E based controller? What about an LSI 2008 based card?

-B

--
Brandon High : bhigh at freaks.com
> > I've spent some time looking at the capabilities of a few controllers
> > based on the questions about the SiI3124 and PMP support.
> >
> > According to the docs, the Marvell 88SX6081 driver doesn't support NCQ
> > or PMP, though the card does. While I'm not really performance bound on
> > my system, I imagine NCQ would help performance a bit, at least for
> > scrubs or resilvers. Even more so because I'm using the slow WD10EADS
> > drives.
> >
> > This raises the question of whether a SAS controller supports NCQ for
> > SATA drives. Would an LSI 1068E based controller? What about an LSI 2008
> > based card?
> >
> > -B

I'm seeing some issues with the SiI3124 at the moment, with 100% %w blocking, leading to stuttering of streams etc. Check out the storage-discuss or osol-discuss lists, if you're a member of them.

--
iMx
imx at streamvia.net
www.slashdevslashnull.com
+storage-discuss

On Wed, May 26, 2010 at 2:47 PM, Brandon High <bhigh at freaks.com> wrote:
> I've spent some time looking at the capabilities of a few controllers
> based on the questions about the SiI3124 and PMP support.
>
> According to the docs, the Marvell 88SX6081 driver doesn't support NCQ
> or PMP, though the card does. While I'm not really performance bound
> on my system, I imagine NCQ would help performance a bit, at least for
> scrubs or resilvers. Even more so because I'm using the slow WD10EADS
> drives.
>
> This raises the question of whether a SAS controller supports NCQ for
> SATA drives. Would an LSI 1068E based controller? What about an LSI
> 2008 based card?
>
> -B

--
Brandon High : bhigh at freaks.com
On Wed, May 26, 2010 at 5:47 PM, Brandon High <bhigh at freaks.com> wrote:
> According to the docs, the Marvell 88SX6081 driver doesn't support NCQ
> or PMP, though the card does. [...]
>
> This raises the question of whether a SAS controller supports NCQ for
> SATA drives. Would an LSI 1068E based controller? What about an LSI
> 2008 based card?

If that is the chip on the AOC-SAT2-MV8 then I'm pretty sure it does support NCQ. I'm also pretty sure the LSI supports NCQ, but I'm not 100% sure.
On Thu, May 20, 2010 at 2:19 AM, Marc Bevand <m.bevand at gmail.com> wrote:
> This is a well-architected config with no bottlenecks on the PCIe links to
> the 890GX northbridge or on the HT link to the CPU. If you run 16 concurrent
> "dd if=/dev/rdsk/c?t?d?p0 of=/dev/null bs=1024k" reads, and assuming your
> drives can do ~100MB/s sustained at the beginning of the platter, you should
> literally see an aggregate throughput of ~1.6GB/s...

SuperMicro X8DTi motherboard
SuperMicro SC846E1 chassis (3Gb/s backplane)
LSI 9211-4i (PCIe x4) connected to the backplane with a SFF-8087 cable (4-lane)
18 x Seagate 1TB SATA 7200rpm

I was able to saturate the system at 800MB/s with the 18 disks in RAID-0. The same performance was achieved swapping the 9211-4i for a MegaRAID 8888ELP.

I'm guessing the backplane and cable are the bottleneck here. Any comments?

--
Giovanni
On Wed, May 26, 2010 at 4:25 PM, Thomas Burgess <wonslung at gmail.com> wrote:
> If that is the chip on the AOC-SAT2-MV8 then I'm pretty sure it does
> support NCQ.

Not according to the driver documentation:
http://docs.sun.com/app/docs/doc/819-2254/marvell88sx-7d

"In addition, the 88SX6081 device supports the SATA II Phase 1.0 specification features, including SATA II 3.0 Gbps speed, SATA II Port Multiplier functionality and SATA II Port Selector. Currently the driver does not support native command queuing, port multiplier or port selector functionality."

The driver source isn't available (or I couldn't find it), so it's not easy to confirm.

> I'm also pretty sure the LSI supports NCQ, but I'm not 100% sure.

Both the LSI chipsets implement SAS, which can support SATA. Since SAS supports TCQ, it's at least conceivable that NCQ support was added for SATA. I looked at the mpt_sas driver code a bit and didn't see anything, but I'm not sure if the TCQ -> NCQ conversion is transparent to the driver, or if it's handled in the SCSI drivers, etc.

-B

--
Brandon High : bhigh at freaks.com
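One indirect way to see whether command queuing is actually in effect, regardless of what the docs say, is to put the disks under concurrent load and watch the per-device queue depth (a rough sketch, not something from the thread):

  # run a concurrent read workload against the pool, then in another terminal:
  iostat -xnz 5
  # The actv column is the average number of commands the device is servicing
  # at once; values well above 1 under load suggest NCQ/TCQ is in use, while
  # values pinned near 1 suggest commands are being issued one at a time.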
On Wed, May 26, 2010 at 4:27 PM, Giovanni Tirloni <gtirloni at sysdroid.com> wrote:
> SuperMicro X8DTi motherboard
> SuperMicro SC846E1 chassis (3Gb/s backplane)
> LSI 9211-4i (PCIe x4) connected to the backplane with a SFF-8087 cable (4-lane)
> 18 x Seagate 1TB SATA 7200rpm
>
> I was able to saturate the system at 800MB/s with the 18 disks in
> RAID-0. The same performance was achieved swapping the 9211-4i for a
> MegaRAID 8888ELP.
>
> I'm guessing the backplane and cable are the bottleneck here.

I'd wager it's the PCIe x4. That's about 1000MB/s raw bandwidth, about 800MB/s after overhead.

-B

--
Brandon High : bhigh at freaks.com
On Wed, 2010-05-26 at 17:18 -0700, Brandon High wrote:
> Not according to the driver documentation:
> http://docs.sun.com/app/docs/doc/819-2254/marvell88sx-7d
> "In addition, the 88SX6081 device supports the SATA II Phase 1.0
> specification features, including SATA II 3.0 Gbps speed, SATA II Port
> Multiplier functionality and SATA II Port Selector. Currently the
> driver does not support native command queuing, port multiplier or
> port selector functionality."
>
> The driver source isn't available (or I couldn't find it), so it's not
> easy to confirm.

marvell88sx does support NCQ. This man page error was corrected in nevada build 138.

Marty
I thought it did... I couldn't imagine Sun using that chip in the original Thumper if it didn't support NCQ. Also, I've read where people have had to DISABLE NCQ on this driver to fix one bug or another (as a workaround).

On Wed, May 26, 2010 at 8:40 PM, Marty Faltesek <marty.faltesek at oracle.com> wrote:
> marvell88sx does support NCQ. This man page error was corrected in
> nevada build 138.
>
> Marty
On Wed, May 26, 2010 at 9:22 PM, Brandon High <bhigh at freaks.com> wrote:
> On Wed, May 26, 2010 at 4:27 PM, Giovanni Tirloni <gtirloni at sysdroid.com> wrote:
>> I was able to saturate the system at 800MB/s with the 18 disks in
>> RAID-0. The same performance was achieved swapping the 9211-4i for a
>> MegaRAID 8888ELP.
>>
>> I'm guessing the backplane and cable are the bottleneck here.
>
> I'd wager it's the PCIe x4. That's about 1000MB/s raw bandwidth, about
> 800MB/s after overhead.

Makes perfect sense. I was calculating the bottlenecks using the full-duplex bandwidth, so the one-way bottleneck wasn't apparent.

In any case the solution is limited externally by the 4 x Gigabit Ethernet NICs, unless we add more, which isn't necessary for our requirements.

Thanks!

--
Giovanni
On Wed, May 26, 2010 at 5:52 PM, Thomas Burgess <wonslung at gmail.com> wrote:
> I thought it did... I couldn't imagine Sun using that chip in the original
> Thumper if it didn't support NCQ. Also, I've read where people have had to
> DISABLE NCQ on this driver to fix one bug or another (as a workaround).

That's what I thought too, so you can imagine my surprise when I saw the man page stating otherwise.

Marty, is the source available publicly? (Or do you know if the PMP or port selector features are supported as well?) I couldn't find it on http://src.opensolaris.org/

-B

--
Brandon High : bhigh at freaks.com
On Wed, 2010-05-26 at 18:37 -0700, Brandon High wrote:
> That's what I thought too, so you can imagine my surprise when I saw
> the man page stating otherwise.
>
> Marty, is the source available publicly? (Or do you know if the PMP or
> port selector features are supported as well?) I couldn't find it on
> http://src.opensolaris.org/

The marvell88sx driver is closed source. PMP is only available on AHCI at present.

Marty
On Wed, May 26, 2010 at 6:09 PM, Giovanni Tirloni <gtirloni at sysdroid.com> wrote:
> On Wed, May 26, 2010 at 9:22 PM, Brandon High <bhigh at freaks.com> wrote:
>> I'd wager it's the PCIe x4. That's about 1000MB/s raw bandwidth, about
>> 800MB/s after overhead.
>
> Makes perfect sense. I was calculating the bottlenecks using the
> full-duplex bandwidth, so the one-way bottleneck wasn't apparent.

Actually both of you guys are wrong :-)

The Supermicro X8DTi mobo and LSISAS9211-4i HBA are both PCIe 2.0 compatible, so the max theoretical PCIe x4 throughput is 4GB/s aggregate, or 2GB/s in each direction, well above the 800MB/s bottleneck observed by Giovanni.

This bottleneck is actually caused by the backplane: Supermicro "E1" chassis like Giovanni's (SC846E1) include port multipliers that degrade performance by putting 6 disks behind a single 3Gbps link. A single 3Gbps link provides in theory 300MB/s usable after 8b/10b encoding, but practical throughput numbers are closer to 90% of this figure, or 270MB/s. 6 disks per link means that each disk gets allocated 270/6 = 45MB/s. So with 18 disks striped, this gives a max usable throughput of 18*45 = 810MB/s, which matches exactly what Giovanni observed. QED!

-mrb
On Wed, May 26, 2010 at 8:35 PM, Marc Bevand <m.bevand at gmail.com> wrote:
> The Supermicro X8DTi mobo and LSISAS9211-4i HBA are both PCIe 2.0 compatible,
> so the max theoretical PCIe x4 throughput is 4GB/s aggregate, or 2GB/s in
> each direction, well above the 800MB/s bottleneck observed by Giovanni.

I only looked at the MegaRAID 8888 that he mentioned, which has a PCIe 1.0 4x interface, or 1000MB/s. I wonder if its performance was slightly lower than the LSI 9211.

The board also has a PCIe 1.0 4x electrical slot, which is 8x physical. If the card was in the PCIe slot furthest from the CPUs, then it was only running 4x.

> A single 3Gbps link provides in theory 300MB/s usable after 8b/10b encoding,
> but practical throughput numbers are closer to 90% of this figure, or
> 270MB/s. 6 disks per link means that each disk gets allocated 270/6 = 45MB/s.

... except that a SFF-8087 connector contains four 3Gbps connections. It may depend on how the drives were connected to the expander. You're assuming that all 18 are on 3 channels, in which case moving drives around could help performance a bit. If the expander is able to use all four channels, which it should be able to do, there would be 1200MB/s theoretical, or 1080MB/s using your 90% figure of actual bandwidth available.

> So with 18 disks striped, this gives a max usable throughput of 18*45 =
> 810MB/s, which matches exactly what Giovanni observed. QED!

Giovanni will have to confirm the layout of the drives and which expansion slot he was using to be sure. Until then, I declare myself the winner in this pissing contest.

-B

--
Brandon High : bhigh at freaks.com
Hi,

Brandon High <bhigh <at> freaks.com> writes:
> I only looked at the MegaRAID 8888 that he mentioned, which has a PCIe
> 1.0 4x interface, or 1000MB/s.

You mean x8 interface (theoretically plugged into that x4 slot below...)

> The board also has a PCIe 1.0 4x electrical slot, which is 8x
> physical. If the card was in the PCIe slot furthest from the CPUs,
> then it was only running 4x.

If Giovanni had put the MegaRAID 8888 in this slot, he would have seen an even lower throughput, around 600MB/s.

This slot is provided by the ICH10R which, as you can see on
http://www.supermicro.com/manuals/motherboard/5500/MNL-1062.pdf
is connected to the northbridge through a DMI link, an Intel-proprietary PCIe 1.0 x4 link. The ICH10R supports a Max_Payload_Size of only 128 bytes on the DMI link:
http://www.intel.com/Assets/PDF/datasheet/320838.pdf
And as per my experience:
http://opensolaris.org/jive/thread.jspa?threadID=54481&tstart=45
a 128-byte MPS allows using just about 60% of the theoretical PCIe throughput, that is, for the DMI link: 250MB/s * 4 lanes * 60% = 600MB/s. Note that the PCIe x4 slot supports a larger, 256-byte MPS, but this is irrelevant as the DMI link will be the bottleneck anyway due to the smaller MPS.

> > A single 3Gbps link provides in theory 300MB/s usable after 8b/10b encoding,
> > but practical throughput numbers are closer to 90% of this figure, or
> > 270MB/s. 6 disks per link means that each disk gets allocated 270/6 = 45MB/s.
>
> ... except that a SFF-8087 connector contains four 3Gbps connections.

Yes, four 3Gbps links, but 24 disks per SFF-8087 connector. That's still 6 disks per 3Gbps link (according to Giovanni, his LSI HBA was connected to the backplane with a single SFF-8087 cable).

> It may depend on how the drives were connected to the expander. You're
> assuming that all 18 are on 3 channels, in which case moving drives
> around could help performance a bit.

True, I assumed this and, frankly, this is probably what he did by using adjacent drive bays... A more optimal solution would be to spread the 18 drives in a 5+5+4+4 config so that the 2 most congested 3Gbps links are shared by only 5 drives, instead of 6, which would boost the throughput by 6/5 = 1.2x. That would change my first overall 810MB/s estimate to 810*1.2 = 972MB/s.

PS: it was not my intention to start a pissing contest. Peace!

-mrb
On Thu, May 27, 2010 at 2:39 AM, Marc Bevand <m.bevand at gmail.com> wrote:
> Brandon High <bhigh <at> freaks.com> writes:
>> I only looked at the MegaRAID 8888 that he mentioned, which has a PCIe
>> 1.0 4x interface, or 1000MB/s.
>
> You mean x8 interface (theoretically plugged into that x4 slot below...)
>
>> The board also has a PCIe 1.0 4x electrical slot, which is 8x
>> physical. If the card was in the PCIe slot furthest from the CPUs,
>> then it was only running 4x.

The tests were done connecting both cards to the PCIe 2.0 x8 slot #6 that connects directly to the Intel 5520 chipset. I totally ignored the differences between PCIe 1.0 and 2.0. My fault.

> Yes, four 3Gbps links, but 24 disks per SFF-8087 connector. That's
> still 6 disks per 3Gbps link (according to Giovanni, his LSI HBA was
> connected to the backplane with a single SFF-8087 cable).

Correct. The backplane on the SC846E1 only has one SFF-8087 cable to the HBA.

> True, I assumed this and, frankly, this is probably what he did by
> using adjacent drive bays... A more optimal solution would be to spread
> the 18 drives in a 5+5+4+4 config so that the 2 most congested 3Gbps
> links are shared by only 5 drives, instead of 6, which would boost the
> throughput by 6/5 = 1.2x. That would change my first overall 810MB/s
> estimate to 810*1.2 = 972MB/s.

The chassis has 4 columns of 6 disks. The 18 disks I was testing were all on columns #1, #2 and #3. Column #0 still has a pair of SSDs and more disks which I haven't used in this test. I'll try to move things around to make use of the 4 port multipliers and test again.

SuperMicro is going to release a 6Gb/s backplane that uses the LSI SAS2X36 chipset in the near future, I've been told.

Good thing this is still a lab experiment. Thanks very much for the invaluable help!

--
Giovanni
Giovanni Tirloni <gtirloni <at> sysdroid.com> writes:
> The chassis has 4 columns of 6 disks. The 18 disks I was testing were
> all on columns #1, #2 and #3.

Good, so this confirms my estimations. I know you said the current ~810 MB/s are amply sufficient for your needs. Spreading the 18 drives across all 4 port multipliers could give you ~972 MB/s. But know that by *removing* 2 drives from the pool you could reach an even higher ~1080 MB/s, because the striped pool is bottlenecked by the slower drives on the congested 3Gbps links, so you are under-using the non-congested 3Gbps links. A 4+4+4+4 config (4 drives on each 3Gbps link, 16 total) would give 300MB/s * 90% / 4 = 67.5 MB/s per drive, times 16 drives = 1080 MB/s.

To sum up this thread (your current config is b):

(a) 18 drives, HBA in the PCIe 1.0 slot, regardless of PMP config -> 600 MB/s
(b) 18 drives, using only 3 PMPs (6+6+6+0 config) -> 810 MB/s
(c) 18 drives, spread across all 4 PMPs (5+5+4+4 config) -> 972 MB/s
(d) 16 drives, more efficient use of all links (4+4+4+4 config) -> 1080 MB/s

-mrb
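For readers who want to play with these numbers, here is a small sketch of the back-of-the-envelope model used above: ~270 MB/s usable per 3Gbps link (the 90% figure from earlier in the thread), with the stripe paced by the drives on the most heavily shared link. The script is illustrative and covers only the link-limited cases, not the PCIe-limited case (a):

  #!/bin/sh
  LINK_MBS=270   # ~90% of a 3Gbps link's theoretical 300MB/s
  # args: <total drives> <drives sharing the busiest link>
  estimate() {
      awk -v link=$LINK_MBS -v n=$1 -v worst=$2 \
          'BEGIN { printf "%2d drives, %d on busiest link: ~%d MB/s\n", n, worst, n * link / worst }'
  }
  estimate 18 6   # (b) 6+6+6+0 -> ~810 MB/s
  estimate 18 5   # (c) 5+5+4+4 -> ~972 MB/s
  estimate 16 4   # (d) 4+4+4+4 -> ~1080 MB/s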
On Fri, May 28, 2010 at 00:56, Marc Bevand <m.bevand at gmail.com> wrote:
> Good, so this confirms my estimations. I know you said the current
> ~810 MB/s are amply sufficient for your needs. Spreading the 18 drives
> across all 4 port multipliers [...]

The Supermicro SC846E1 cases don't contain multiple (SATA) port multipliers; they contain a single SAS expander, which shares bandwidth among the controllers and drives, so no column- or row-based limitations should be present.

That backplane has two 8087 ports, IIRC: one labeled for the host, and one for a downstream chassis. I don't think there's actually any physical or logical difference between the upstream and downstream ports, so you might consider connecting two cables (ideally from two SAS controllers, with multipath) and seeing if that goes any faster.

Giovanni: When you say you saturated the system with a RAID-0 device, what do you mean? I think the suggested benchmark (read from all the disks independently, using dd or some other sequential-transfer mechanism like vdbench) would be more interesting in terms of finding the limiting bus bandwidth than a ZFS-based or hardware-RAID-based benchmark. Inter-disk synchronization and checksums and such can put a damper on ZFS performance, so simple read-sequentially-from-disk can often deliver surprising results. Note that such results aren't always useful: after all, the goal is to run ZFS on the hardware, not dd! But they may indicate that a certain component of the system is or is not to blame.

Will