In asking about ZFS performance in streaming IO situations, discussion quite quickly turned to potential bottlenecks. By coincidence, I was wondering about the same thing.

Richard Elling said:
> We know that channels, controllers, memory, network, and CPU bottlenecks
> can and will impact actual performance, at least for large configs.
> Modeling these bottlenecks is possible, but will require more work in
> the tool. If you know the hardware topology, you can do a back-of-the-napkin
> analysis, too.

Well, I'm normally a Mac guy, so speccing server hardware is a bit of a revelation for me. I'm trying to come up with a ZFS storage server for a networked multimedia research project which hopefully has enough oomph to be a nice resource that outlasts the (2-year) project, but without breaking the bank.

Does anyone have a clue as to where the bottlenecks are going to be with this:

16x hot swap SATAII hard drives (plus an internal boot drive)
Tyan S2895 (K8WE) motherboard
Dual GigE (integral nVidia ports)
2x Areca 8-port PCIe (8-lane) RAID cards
2x AMD Opteron 275 CPUs (2.2GHz, dual core)
8 GiB RAM

The supplier is used to shipping Linux servers in this 3U chassis, but hasn't dealt with Solaris. He originally suggested 2GiB RAM, but I hear things about ZFS getting RAM hungry after a while. I dug up the RAID controllers after a quick look on Sun's HCL, but they're pricy for something that's just going to give JBOD access (but the bus interconnect looks to be quick, on the other hand).

I guess my questions are:
- Does anyone out there have a clue where the potential bottlenecks might be?
- Is there anywhere where I can save a bit of money? (For example, might the SuperMicro AOC-SAT2-MV8 hanging off the PCI-X slots provide enough bandwidth to the disks?)
- If I focused on simple streaming IO, would giving the server less RAM have an impact on performance?
- I had assumed four cores would be better than the two faster (3.0GHz) single-core processors the vendor originally suggested. Agree?

Many thanks for any thoughts,
adam
johansen-osdev at sun.com
2007-Apr-18 23:47 UTC
[zfs-discuss] Bottlenecks in building a system
Adam:

> Does anyone have a clue as to where the bottlenecks are going to be with
> this:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)
> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)
> 2x Areca 8-port PCIe (8-lane) RAID cards
> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
> 8 GiB RAM
>
> The supplier is used to shipping Linux servers in this 3U chassis, but
> hasn't dealt with Solaris. He originally suggested 2GiB RAM, but I hear
> things about ZFS getting RAM hungry after a while.

ZFS is opportunistic when it comes to using free memory for caching. I'm not sure what exactly you've heard.

> I guess my questions are:
> - Does anyone out there have a clue where the potential bottlenecks
>   might be?

What's your workload? Bart is subscribed to this list, but he has a famous saying, "One experiment is worth a thousand expert opinions."

Without knowing what you're trying to do with this box, it's going to be hard to offer any useful advice. However, you'll learn the most by getting one of these boxes and running your workload. If you have problems, Solaris has a lot of tools that we can use to diagnose the problem. Then we can improve the performance and everybody wins.

> - If I focused on simple streaming IO, would giving the server less RAM
>   have an impact on performance?

The more RAM you can give your box, the more of it ZFS will use for caching. If your workload doesn't benefit from caching, then the impact on performance won't be large. Could you be more specific about what the filesystem's consumers are doing when they're performing "simple streaming IO?"

> - I had assumed four cores would be better than the two faster (3.0GHz)
>   single-core processors the vendor originally suggested. Agree?

I suspect that this is correct. ZFS does many steps in its I/O path asynchronously and they execute in the context of different threads. Four cores are probably better than two. Of course experimentation could prove me wrong here, too. :)

-j
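A crude way to run that "one experiment", once the box exists, is a sequential write and read against the pool while watching pool throughput. This is only a sketch; the pool and filesystem names are placeholders, and the test file is sized larger than the 8 GiB of RAM so the ARC can't hide the disks:

    # sequential write of ~16 GB
    dd if=/dev/zero of=/tank/media/bigfile bs=1024k count=16384
    # sequential read back
    dd if=/tank/media/bigfile of=/dev/null bs=1024k
    # in another terminal, watch per-second pool throughput
    zpool iostat tank 1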
On 4/19/07, Adam Lindsay <atl at comp.lancs.ac.uk> wrote:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)
> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)
> 2x Areca 8-port PCIe (8-lane) RAID cards
> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
> 8 GiB RAM
>
> The supplier is used to shipping Linux servers in this 3U chassis, but
> hasn't dealt with Solaris. He originally suggested 2GiB RAM, but I hear
> things about ZFS getting RAM hungry after a while. I dug up the RAID
> controllers after a quick look on Sun's HCL, but they're pricy for
> something that's just going to give JBOD access (but the bus
> interconnect looks to be quick, on the other hand).
>
> I guess my questions are:
> - Does anyone out there have a clue where the potential bottlenecks
>   might be?

Get Intel GigE! The NVidia ones aren't as fast and the Intel drivers are very good.

If it is just storage then it might not matter whether you use Opterons or Xeons. Since you are just feeding stuff from the disk to the NIC, things like HyperTransport are probably not so much of a win.

Not sure if Tyan have Intel 3U systems; maybe look at Supermicro instead.

> - Is there anywhere where I can save a bit of money? (For example,
>   might the SuperMicro AOC-SAT2-MV8 hanging off the PCI-X slots
>   provide enough bandwidth to the disks?)

I'm using this card; it seems to work fine. I posted some numbers on this at [1]. That is a [2] 2U 12-disk system with two dual-core 2GHz Xeons and 4GB of RAM: a 2x2x500GB mirror and a 5-disk raidz2 with Seagate ES 500GB drives. Relling's reply there might be useful as well.

RAID cards are definitely a waste of money if you are using ZFS.

> - If I focused on simple streaming IO, would giving the server less RAM
>   have an impact on performance?

I imagine in a streaming situation with multiple clients, more memory is good. If Solaris can do multiple prefetches into memory for each stream and client then you should get a big boost. Maybe that is tunable. Short random reads/writes are less likely to win from that situation.

> - I had assumed four cores would be better than the two faster (3.0GHz)
>   single-core processors the vendor originally suggested. Agree?

Depends on workload, probably. How many streams, etc.

[1] http://www.opensolaris.org/jive/message.jspa?messageID=109779#109779
[2] http://www.acmemicro.com/estore/merchant.ihtml?pid=4014&step=4 - this is just a http://supermicro.com/products/chassis/2U/826/SC826TQ-R800LPV.cfm with the X7DBE m/b.

Nicholas
johansen-osdev at sun.com wrote:
> Adam:
>
>> Does anyone have a clue as to where the bottlenecks are going to be with
>> this:
>>
>> 16x hot swap SATAII hard drives (plus an internal boot drive)
>> Tyan S2895 (K8WE) motherboard
>> Dual GigE (integral nVidia ports)
>> 2x Areca 8-port PCIe (8-lane) RAID cards
>> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
>> 8 GiB RAM
>>
>> The supplier is used to shipping Linux servers in this 3U chassis, but
>> hasn't dealt with Solaris. He originally suggested 2GiB RAM, but I hear
>> things about ZFS getting RAM hungry after a while.
>
> ZFS is opportunistic when it comes to using free memory for caching.
> I'm not sure what exactly you've heard.

"Hungry" clearly had the wrong connotations. With ZFS being so opportunistic, I had the impression that the more memory thrown at it, the better, and that it was typically more RAM than an equivalent Linux/HW RAID box might ask for.

>> I guess my questions are:
>> - Does anyone out there have a clue where the potential bottlenecks
>>   might be?
>
> What's your workload? Bart is subscribed to this list, but he has a
> famous saying, "One experiment is worth a thousand expert opinions."
>
> Without knowing what you're trying to do with this box, it's going to be
> hard to offer any useful advice. However, you'll learn the most by
> getting one of these boxes and running your workload. If you have
> problems, Solaris has a lot of tools that we can use to diagnose the
> problem. Then we can improve the performance and everybody wins.

True, all. I gave some details in the other thread ("ZFS performance model for sustained, contiguous writes?") from yesterday: my most performance-sensitive requirement would be for one or two streams to saturate two aggregated GigE links while both reading and writing.

>> - If I focused on simple streaming IO, would giving the server less RAM
>>   have an impact on performance?
>
> The more RAM you can give your box, the more of it ZFS will use for
> caching. If your workload doesn't benefit from caching, then the impact
> on performance won't be large. Could you be more specific about what
> the filesystem's consumers are doing when they're performing "simple
> streaming IO?"

Right, "simple" can be anything to anyone. Let's say writing a 1.5Gbit/s uncompressed HD video stream, or streaming out several more traditional compressed video streams. Other responses in this thread suggest that prefetching will definitely help here, and so ZFS is likely to use that RAM.

>> - I had assumed four cores would be better than the two faster (3.0GHz)
>>   single-core processors the vendor originally suggested. Agree?
>
> I suspect that this is correct. ZFS does many steps in its I/O path
> asynchronously and they execute in the context of different threads.
> Four cores are probably better than two. Of course experimentation
> could prove me wrong here, too. :)

Ah, if only I had that luxury. I understand I'm not going to get terribly far in thought-experiment mode, but I want to be able to spec a box that balances cheap with utility over time.

Thanks,
adam
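Saturating "two aggregated GigE links" assumes link aggregation is actually configured on the Solaris side. On builds of that era the setup looked roughly like the following; the interface names, aggregation key, and address are placeholders, and dladm syntax has changed across releases, so treat this as a sketch rather than a recipe:

    # bundle two GigE ports into aggregation key 1
    dladm create-aggr -d e1000g0 -d e1000g1 1
    # plumb and address the aggregated interface
    ifconfig aggr1 plumb 192.168.1.10 netmask 255.255.255.0 up
    # confirm both ports joined the aggregation
    dladm show-aggr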
Nicholas Lee wrote:
> On 4/19/07, Adam Lindsay <atl at comp.lancs.ac.uk> wrote:
>
>> 16x hot swap SATAII hard drives (plus an internal boot drive)
>> Tyan S2895 (K8WE) motherboard
>> Dual GigE (integral nVidia ports)
>> 2x Areca 8-port PCIe (8-lane) RAID cards
>> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
>> 8 GiB RAM
>> ...
>> I guess my questions are:
>> - Does anyone out there have a clue where the potential bottlenecks
>>   might be?
>
> Get Intel GigE! The NVidia ones aren't as fast and the Intel drivers are
> very good.

Yes, I've been made aware of a lot of pushback on the nVidia drivers/hardware recently, and so I'll aim to put a better networking card in, as well.

> If it is just storage then it might not matter whether you use Opterons
> or Xeons. Since you are just feeding stuff from the disk to the NIC,
> things like HyperTransport are probably not so much of a win.
>
> Not sure if Tyan have Intel 3U systems; maybe look at Supermicro instead.

It's just the mobo, at the moment, from Tyan. The local vendor has his standard 3U chassis that he uses.

>> - Is there anywhere where I can save a bit of money? (For example,
>>   might the SuperMicro AOC-SAT2-MV8 hanging off the PCI-X slots
>>   provide enough bandwidth to the disks?)
>
> I'm using this card; it seems to work fine. I posted some numbers on this
> at [1].

That's good, interesting stuff. Here are a few thoughts going through my head:
- The bus interconnect is more critical with software RAID than hardware, no? For each block, more data has to go to the CPU.
- I'm very lured by the performance offered by two 8-lane PCIe cards (full duplex, mind...) over PCI-X cards running (with the selected mobo) at 133MHz and 100MHz.
- I'll be trying to wring as much IO performance as possible from 15 drives -- most likely 3x 5-disk RAIDZ sets (sketched below). While your figures are encouraging, I'm not sure they wouldn't hit a ceiling before they scale up to where I want to be. (It looks like they won't, but I'm not sure.)

> That is a [2] 2U 12-disk system with two dual-core 2GHz Xeons and 4GB of
> RAM: a 2x2x500GB mirror and a 5-disk raidz2 with Seagate ES 500GB drives.
> Relling's reply there might be useful as well.
>
> RAID cards are definitely a waste of money if you are using ZFS.

Which is exactly what I thought going into this, but I am not aware of that many choices in JBOD controllers, especially running off the PCIe bus.

Thanks for the time and pointers,
adam
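The "3x 5-disk RAIDZ sets" layout mentioned above would be created along these lines. The pool name and the cXtYdZ device names are placeholders for whatever the controllers actually enumerate; this is a sketch of the layout, not the poster's final configuration:

    zpool create tank \
      raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
      raidz c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 \
      raidz c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0
    zpool status tank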
additional comments below...

Adam Lindsay wrote:
> In asking about ZFS performance in streaming IO situations, discussion
> quite quickly turned to potential bottlenecks. By coincidence, I was
> wondering about the same thing.
>
> Richard Elling said:
>> We know that channels, controllers, memory, network, and CPU bottlenecks
>> can and will impact actual performance, at least for large configs.
>> Modeling these bottlenecks is possible, but will require more work in
>> the tool. If you know the hardware topology, you can do a
>> back-of-the-napkin analysis, too.
>
> Well, I'm normally a Mac guy, so speccing server hardware is a bit of a
> revelation for me. I'm trying to come up with a ZFS storage server for a
> networked multimedia research project which hopefully has enough oomph
> to be a nice resource that outlasts the (2-year) project, but without
> breaking the bank.
>
> Does anyone have a clue as to where the bottlenecks are going to be with
> this:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)

Be sure to check the actual bandwidth of the drives when installed in the final location. We have been doing some studies on the impact of vibration on performance and reliability. If your enclosure does not dampen vibrations, then you will see reduced performance, and it will be obvious for streaming workloads. There was a thread about this a year or so ago regarding thumpers, but since then we've seen it in a number of other systems, too. There have also been industry papers on this topic.

> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)

All I can add to the existing NIC comments in this thread is that Neptune kicks ass. The GbE version is:

http://www.sun.com/products/networking/ethernet/sunx8quadgigethernet/index.xml

... but know that I don't set pricing :-0

> 2x Areca 8-port PCIe (8-lane) RAID cards

I think this is overkill.

> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)

This should be a good choice. For high networking loads, you can burn a lot of cycles handling the NICs. For example, using Opterons to drive the dual 10GbE version of Neptune can pretty much consume a significant number of cores. I don't think your workload will come close to this, however.

> 8 GiB RAM

I recommend ECC memory, not the cheap stuff... but I'm a RAS guy.

> The supplier is used to shipping Linux servers in this 3U chassis, but
> hasn't dealt with Solaris. He originally suggested 2GiB RAM, but I hear
> things about ZFS getting RAM hungry after a while. I dug up the RAID
> controllers after a quick look on Sun's HCL, but they're pricy for
> something that's just going to give JBOD access (but the bus
> interconnect looks to be quick, on the other hand).

Pretty much any SAS/SATA controller will work OK. You'll be media-speed bound, not I/O-channel bound.

> I guess my questions are:
> - Does anyone out there have a clue where the potential bottlenecks
>   might be?

Software + cores --> handling the net and managing data integrity.

> - Is there anywhere where I can save a bit of money? (For example,
>   might the SuperMicro AOC-SAT2-MV8 hanging off the PCI-X slots
>   provide enough bandwidth to the disks?)
> - If I focused on simple streaming IO, would giving the server less RAM
>   have an impact on performance?

RAM as a cache presumes two things: prefetching and data re-use. Most likely, you won't have re-use, and prefetching only makes sense when the disk subsystem is approximately the same speed as the network. Personally, I'd start at 2-4 GBytes and expand as needed (this is easily measured).

> - I had assumed four cores would be better than the two faster (3.0GHz)
>   single-core processors the vendor originally suggested. Agree?

Yes, lacking further data.
 -- richard
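"Easily measured" here means watching how much memory the workload actually pulls into the ARC once the box is running. On a Solaris/Nevada build of that era, something along these lines gives the picture, assuming the build exports the arcstats kstat:

    # overall kernel/user memory breakdown
    echo ::memstat | mdb -k
    # ARC statistics: current size (size), target (c), and hit/miss counters
    kstat -m zfs -n arcstats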
If you're using this for multimedia, do some serious testing first. ZFS tends to have "bursty" write behaviour, and the worst-case latency can be measured in seconds. This has been improved a bit in recent builds but it still seems to "stall" periodically.

(QFS works extremely well for streaming, as evidenced in recent Sun press releases, but I'm not sure what the cost is these days.)
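That burstiness is easy to observe once a pool exists: sampled at one-second intervals, write bandwidth shows up as periodic spikes followed by near-idle seconds rather than a steady stream. A minimal sketch, assuming a pool named tank:

    # per-second pool bandwidth; bursty writes appear as spikes between idle samples
    zpool iostat tank 1
    # per-disk view, skipping idle devices
    iostat -xnz 1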
Adam Lindsay wrote:
> In asking about ZFS performance in streaming IO situations, discussion
> quite quickly turned to potential bottlenecks. By coincidence, I was
> wondering about the same thing.
>
> Richard Elling said:
>
>> We know that channels, controllers, memory, network, and CPU bottlenecks
>> can and will impact actual performance, at least for large configs.
>> Modeling these bottlenecks is possible, but will require more work in
>> the tool. If you know the hardware topology, you can do a
>> back-of-the-napkin analysis, too.
>
> Well, I'm normally a Mac guy, so speccing server hardware is a bit of
> a revelation for me. I'm trying to come up with a ZFS storage server
> for a networked multimedia research project which hopefully has enough
> oomph to be a nice resource that outlasts the (2-year) project, but
> without breaking the bank.
>
> Does anyone have a clue as to where the bottlenecks are going to be
> with this:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)
> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)
> 2x Areca 8-port PCIe (8-lane) RAID cards
> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
> 8 GiB RAM

I'm putting together a similarly specified machine (Quad FX with 8GB RAM), but with fewer drives. If there are any specific tests you want me to run on it while it's still on my bench, drop me a line.

Ian
Richard Elling wrote:
>> Does anyone have a clue as to where the bottlenecks are going to be
>> with this:
>>
>> 16x hot swap SATAII hard drives (plus an internal boot drive)
>
> Be sure to check the actual bandwidth of the drives when installed in the
> final location. We have been doing some studies on the impact of vibration
> on performance and reliability. If your enclosure does not dampen
> vibrations, then you will see reduced performance, and it will be obvious
> for streaming workloads. There was a thread about this a year or so ago
> regarding thumpers, but since then we've seen it in a number of other
> systems, too. There have also been industry papers on this topic.

Okay, we have a number of the chassis installed here from the same source, but none seem to share the high-throughput workflow, so that's one thing to quiz the integrator on.

>> Tyan S2895 (K8WE) motherboard
>> Dual GigE (integral nVidia ports)
>
> All I can add to the existing NIC comments in this thread is that
> Neptune kicks ass. The GbE version is:
>
> http://www.sun.com/products/networking/ethernet/sunx8quadgigethernet/index.xml
>
> ... but know that I don't set pricing :-0

Oh, man, I didn't need to know about that NIC. Actually, it's something to shoot for.

>> 2x Areca 8-port PCIe (8-lane) RAID cards
>
> I think this is overkill.

I'm getting convinced of that. With the additional comments in this thread, I'm now seriously considering replacing these PCIe cards with Supermicro's PCI-X cards, and switching over to a different Tyan board:
- 2x SuperMicro AOC-SAT2-MV8 PCI-X SATA2 interfaces
- Tyan S2892 (K8SE) motherboard, so that ditches nVidia for:
- Dual GigE (integral Broadcom ports)

>> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
>
> This should be a good choice. For high networking loads, you can burn a
> lot of cycles handling the NICs. For example, using Opterons to drive the
> dual 10GbE version of Neptune can pretty much consume a significant
> number of cores. I don't think your workload will come close to this,
> however.

No, but it's something to shoot for. :)

>> 8 GiB RAM
>
> I recommend ECC memory, not the cheap stuff... but I'm a RAS guy.

So noted.

> Pretty much any SAS/SATA controller will work OK. You'll be media-speed
> bound, not I/O-channel bound.

Okay, that message is coming through.

> RAM as a cache presumes two things: prefetching and data re-use. Most
> likely, you won't have re-use, and prefetching only makes sense when the
> disk subsystem is approximately the same speed as the network. Personally,
> I'd start at 2-4 GBytes and expand as needed (this is easily measured).

I'll start with 4 GBytes, because I like to deploy services in containers, and so will need some elbow room.

Many thanks to all in this thread: my spec has certainly evolved, and I hope the machine has gotten cheaper in the process, with little sacrifice in theoretical performance.

adam
Anton B. Rang wrote:
> If you're using this for multimedia, do some serious testing first. ZFS
> tends to have "bursty" write behaviour, and the worst-case latency can
> be measured in seconds. This has been improved a bit in recent builds
> but it still seems to "stall" periodically.

I had wondered about that, after reading some old threads. For the high-performance stuff, the machine is mostly to be marked as experimental and will spend most of its time being "tested". I'm watching Tony Galway's current thread most closely, as well.

adam
Hi, hope you don't mind if I make some portions of your email public in a reply--I hadn't seen it come through on the list at all, so it's no duplicate to me.

Johansen wrote:
> Adam:
>
> Sorry if this is a duplicate, I had issues sending e-mail this morning.
>
> Based upon your CPU choices, I think you shouldn't have a problem
> saturating a GigE link with a pair of Opteron 275s. Just as a point of
> comparison, Sun sells a server with 48 SATA disks and 4 GigE ports:
>
> http://www.sun.com/servers/x64/x4500/specs.xml
>
> You have fewer disks, and nearly as much CPU power as the x4500. I
> think you have plenty of CPU in your system.
>
> Your RAID controllers have as many SATA ports as the SATA cards in the
> x4500, and you seem to have the same ratio of disks to controllers.

I'm well aware of the Thumper, and it's fair to say it was an inspiration, just without two-thirds of the capacity or any of the serious redundancy. I also used the X4500 as a guide for

> I suspect that if you have a bottleneck in your system, it would be due
> to the available bandwidth on the PCI bus.

Mm, yeah, it's what I was worried about, too (mostly through ignorance of the issues), which is why I was hoping HyperTransport and PCIe were going to give that data enough room on the bus. But after others expressed the opinion that the Areca PCIe cards were overkill, I'm now looking at putting some PCI-X cards on a different (probably slower) motherboard.

> Caching isn't going to be a huge help for writes, unless there's another
> thread reading simultaneously from the same file.
>
> Prefetch will definitely use the additional RAM to try to boost the
> performance of sequential reads. However, in the interest of full
> disclosure, there is a pathology that we've seen where the number of
> sequential readers exceeds the available space in the cache. In this
> situation, sometimes the competing prefetches for the different streams
> will cause more temporally favorable data to be evicted from the cache
> and performance will drop. The workaround right now is just to disable
> prefetch. We're looking into more comprehensive solutions.

Interesting. So noted. I will expect to have to test thoroughly.

>> I understand I'm not going to get terribly far in thought experiment
>> mode, but I want to be able to spec a box that balances cheap with
>> utility over time.
>
> If that's the case, I'm sure you could get by just fine with the pair of
> 275s.

Thanks,
adam
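The "disable prefetch" workaround quoted above is a kernel tunable rather than a pool or dataset property. On the Solaris builds of that era it would be applied roughly as follows; treat the exact mechanism as build-dependent and test before relying on it:

    # persistent: add to /etc/system and reboot
    set zfs:zfs_prefetch_disable = 1

    # or flip it on the running kernel (32-bit int; 1 disables, 0 re-enables)
    echo zfs_prefetch_disable/W0t1 | mdb -kw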
johansen-osdev at sun.com
2007-Apr-20 20:28 UTC
[zfs-discuss] Bottlenecks in building a system
Adam:

> Hi, hope you don't mind if I make some portions of your email public in
> a reply--I hadn't seen it come through on the list at all, so it's no
> duplicate to me.

I don't mind at all. I had hoped to avoid sending the list a duplicate e-mail, although it looks like my first post never made it here.

>> I suspect that if you have a bottleneck in your system, it would be due
>> to the available bandwidth on the PCI bus.
>
> Mm, yeah, it's what I was worried about, too (mostly through ignorance
> of the issues), which is why I was hoping HyperTransport and PCIe were
> going to give that data enough room on the bus.
> But after others expressed the opinion that the Areca PCIe cards were
> overkill, I'm now looking at putting some PCI-X cards on a different
> (probably slower) motherboard.

I dug up a copy of the S2895 block diagram and asked Bill Moore about it. He said that you should be able to get about 700 MB/s off of each of the PCI-X channels and that you only need about 100 MB/s to saturate a GigE link. He also observed that the RAID card you were using was unnecessary and would probably hamper performance. He recommended non-RAID SATA cards based upon the Marvell chipset.

Here's the e-mail trail on this list where he discusses Marvell SATA cards in a bit more detail:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

It sounds like if getting disk -> network is the concern, you'll have plenty of bandwidth, assuming you have a reasonable controller card.

>> Caching isn't going to be a huge help for writes, unless there's another
>> thread reading simultaneously from the same file.
>>
>> Prefetch will definitely use the additional RAM to try to boost the
>> performance of sequential reads. However, in the interest of full
>> disclosure, there is a pathology that we've seen where the number of
>> sequential readers exceeds the available space in the cache. In this
>> situation, sometimes the competing prefetches for the different streams
>> will cause more temporally favorable data to be evicted from the cache
>> and performance will drop. The workaround right now is just to disable
>> prefetch. We're looking into more comprehensive solutions.
>
> Interesting. So noted. I will expect to have to test thoroughly.

If you run across this problem and are willing to let me debug on your system, shoot me an e-mail. We've only seen this in a couple of situations, and it was combined with another problem where we were seeing excessive overhead for kcopyout. It's unlikely, but possible, that you'll hit this.

-K
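Putting those figures together as the back-of-the-napkin analysis Richard suggested earlier in the thread (the per-drive streaming rate is an assumed ballpark for 7200rpm SATA drives of that era, not a measured number):

    15 data drives x ~65 MB/s    ~  975 MB/s  aggregate media speed
    2 PCI-X channels x 700 MB/s  = 1400 MB/s  bus headroom
    2 GigE links x ~100 MB/s     ~  200 MB/s  network ceiling

On those assumptions the aggregated GigE links saturate long before either the PCI-X channels or the disks do, which is consistent with the advice above.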
johansen-osdev at sun.com wrote:
>>> I suspect that if you have a bottleneck in your system, it would be due
>>> to the available bandwidth on the PCI bus.
>>
>> Mm, yeah, it's what I was worried about, too (mostly through ignorance
>> of the issues), which is why I was hoping HyperTransport and PCIe were
>> going to give that data enough room on the bus.
>> But after others expressed the opinion that the Areca PCIe cards were
>> overkill, I'm now looking at putting some PCI-X cards on a different
>> (probably slower) motherboard.
>
> I dug up a copy of the S2895 block diagram and asked Bill Moore about
> it. He said that you should be able to get about 700 MB/s off of each of
> the PCI-X channels and that you only need about 100 MB/s to saturate a
> GigE link. He also observed that the RAID card you were using was
> unnecessary and would probably hamper performance. He recommended
> non-RAID SATA cards based upon the Marvell chipset.
>
> Here's the e-mail trail on this list where he discusses Marvell SATA
> cards in a bit more detail:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html
>
> It sounds like if getting disk -> network is the concern, you'll have
> plenty of bandwidth, assuming you have a reasonable controller card.

Well, if that isn't from the horse's mouth, I don't know what is.

Elsewhere in the thread, I mention that I'm trying to go for a simpler system (well, one less dependent upon PCIe) in favour of the S2892, which has the added benefit of having a NIC that is less maligned in the community. From what I can tell of the block diagram, it looks like the PCI-X subsystem is similar enough (except that it's shared with the NIC). It's sounding like a safe compromise to me, to use the Marvell chips on the oft-cited SuperMicro cards.

>>> Caching isn't going to be a huge help for writes, unless there's another
>>> thread reading simultaneously from the same file.
>>>
>>> Prefetch will definitely use the additional RAM to try to boost the
>>> performance of sequential reads. However, in the interest of full
>>> disclosure, there is a pathology that we've seen where the number of
>>> sequential readers exceeds the available space in the cache. In this
>>> situation, sometimes the competing prefetches for the different streams
>>> will cause more temporally favorable data to be evicted from the cache
>>> and performance will drop. The workaround right now is just to disable
>>> prefetch. We're looking into more comprehensive solutions.
>>
>> Interesting. So noted. I will expect to have to test thoroughly.
>
> If you run across this problem and are willing to let me debug on your
> system, shoot me an e-mail. We've only seen this in a couple of
> situations, and it was combined with another problem where we were seeing
> excessive overhead for kcopyout. It's unlikely, but possible, that you'll
> hit this.

That's one heck of an offer. I'd have no problem with this, nor with taking requests for particular benchmarks from the community. It's essentially a research machine, and if it can help others out, I'm all for it. Now time to check on the project budget... :)

Thanks,
adam