Rob Clark
2009-Oct-13 02:04 UTC
[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing
It would be great to use the GPU for ZFS or to offload some of the OS load...
http://developer.nvidia.com/object/nexus.html

If SunStudio had some GPU support and there were a few functions added for the hotpoints...

Rob
--
This message posted from opensolaris.org
Bob Friesenhahn
2009-Oct-13 13:45 UTC
On Mon, 12 Oct 2009, Rob Clark wrote:
> It would be great to use the GPU for ZFS or to offload some of the OS load ...
> http://developer.nvidia.com/object/nexus.html

Let us know when you have a demo ready. :-)

> If SunStudio had some GPU support and there were a few functions
> added for the hotpoints ...

GPUs are great for some things but it is difficult to imagine a GPU being of assistance in the zfs implementation due to way too much latency, optimization for floating point rather than integer, and due to creating a "hotpoint". For zfs it is much better to spread the load across multiple CPU cores using many threads as is done now.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Rob Clark
2010-Jul-18 11:41 UTC
> On Mon, 12 Oct 2009, Rob Clark wrote:
> > It would be great to use the GPU for ZFS or to offload some of ...
>
> Let us know when you have a demo ready. :-)

*Other* demos show a few 'lousy speed-ups of only 2 times', with some speed-ups reaching over 1000 times. The general consensus seems to be that an 8-12 times speed-up is a reasonable expectation (with some effort) for the so-called "average problem". See: http://www.nvidia.com/object/cuda_apps_flash_new.html

> > If SunStudio had some GPU support and there were a few functions
> > added for the hotpoints ...
>
> GPUs are great for some things but it is difficult to imagine a GPU
> being of assistance in the zfs implementation due to way too much
> latency, optimization for floating point rather than integer, and due
> to creating a "hotpoint". For zfs it is much better to spread the
> load across multiple CPU cores using many threads as is done now.
>
> Bob
> --

If you wait long enough someone will 'build a Bridge to it'...

Accelerating Distributed Storage Systems with CUDA - Paper & Code
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b

Real-time Parallel Hashing on the GPU
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=d91c3c63-a2d6-4a15-a70a-87bcafdd70d8

CUDA Multiforcer - Password Recovery or Testing if Password is 'Secure Enough'
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=9e515da5-c97e-4c37-8305-f27982a02d5f

Parallelizing Hash-based Data Carving
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=f93b62b6-b6af-497e-83a8-865af31c8d7a
and Paper: http://arxiv.org/abs/0901.1307

Support for OpenCL in OpenSolaris would be a good thing.

Rob
"C. Bergström"
2010-Jul-18 12:01 UTC
Rob Clark wrote:
>> On Mon, 12 Oct 2009, Rob Clark wrote:
>>> It would be great to use the GPU for ZFS or to offload some of ...
>>
>> Let us know when you have a demo ready. :-)
>
> *Other* demos show a few 'lousy speed-ups of only 2 times', with some
> speed-ups reaching over 1000 times. The general consensus seems to be
> that an 8-12 times speed-up is a reasonable expectation (with some effort)
> for the so-called "average problem". See: http://www.nvidia.com/object/cuda_apps_flash_new.html

It all depends on the code, so I wouldn't say there's any reasonable expectation. To assume otherwise is flawed.

> ...
> If you wait long enough someone will 'build a Bridge to it'...
>
> Accelerating Distributed Storage Systems with CUDA - Paper & Code
> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b

I believe I commented on this thread before, and that paper mostly focuses on hashing and MD5 checksums. The other problem is that as long as the GPU is on the other side of the memory controller, it's unlikely to be "cost" (performance/latency) effective to do the expensive round trip for small chunks of data. Depending on the storage topology and stack, of course... *If* you had some master node doing checksums, or did it at the application layer, it could possibly make sense, but it would need to be some seriously huge workload to justify it.

> ...
> Support for OpenCL in OpenSolaris would be a good thing.

OpenCL wouldn't help solve this, and in the research papers I've seen it has lower performance and higher execution times than CUDA.

/* Disclaimer: I work for a vendor doing a GPGPU solution based on HMPP which will be ported to Solaris. Our driver and some parts of the stack are open source as well. */

./C
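The round-trip argument above can be made concrete with a simple cost model. This is a sketch under stated assumptions, not a measurement: `struct offload_params`, `offload_time`, `cpu_time` and `break_even_bytes` are hypothetical names, the model assumes copy and compute run back to back (no overlap), ignores the 32-byte result copy, and treats every latency and rate as a parameter you would have to measure on real hardware.

```c
#include <assert.h>

/* Hypothetical parameters for a discrete accelerator behind an I/O bus.
 * Units are the caller's choice (e.g. seconds and bytes per second);
 * none of these values come from the thread. */
struct offload_params {
    double fixed_latency; /* per-request setup, launch and completion cost */
    double link_rate;     /* host-to-device transfer rate */
    double device_rate;   /* checksum throughput on the accelerator */
    double cpu_rate;      /* checksum throughput on the host CPU */
};

/* Serial round trip: copy the chunk over, checksum it, then return a
 * result small enough (32 bytes) that its copy-back is ignored here. */
static double offload_time(double n, const struct offload_params *p)
{
    return p->fixed_latency + n / p->link_rate + n / p->device_rate;
}

/* Checksumming the same chunk on the host. */
static double cpu_time(double n, const struct offload_params *p)
{
    return n / p->cpu_rate;
}

/* Chunk size at which offloading breaks even, or -1.0 if the per-byte
 * cost of offloading is never lower than the CPU's (then no chunk size
 * can amortize the fixed latency). */
static double break_even_bytes(const struct offload_params *p)
{
    assert(p->link_rate > 0.0 && p->device_rate > 0.0 && p->cpu_rate > 0.0);
    double per_byte_gain = 1.0 / p->cpu_rate
                         - 1.0 / p->link_rate
                         - 1.0 / p->device_rate;
    if (per_byte_gain <= 0.0)
        return -1.0;
    return p->fixed_latency / per_byte_gain;
}
```

With toy values (fixed_latency = 1, link_rate = 4, device_rate = 4, cpu_rate = 1), `break_even_bytes` returns 2: smaller chunks finish sooner on the CPU, larger ones on the device. If the link is no faster than the CPU's own checksum rate, it returns -1 and no chunk size helps, which is the "small chunks" objection in its starkest form. Overlapping transfers with compute would change the model; this is deliberately the simple serial case.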
Erik Trimble
2010-Jul-19 00:01 UTC
On 7/18/2010 5:01 AM, "C. Bergström" wrote:
> Rob Clark wrote:
>> ...
>> If you wait long enough someone will 'build a Bridge to it'...
>>
>> Accelerating Distributed Storage Systems with CUDA - Paper & Code
>> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b
>>
> I believe I commented on this thread before, and that paper mostly
> focuses on hashing and MD5 checksums. The other problem is that as
> long as the GPU is on the other side of the memory controller, it's
> unlikely to be "cost" (performance/latency) effective to do the
> expensive round trip for small chunks of data. Depending on the
> storage topology and stack, of course... *If* you had some master node
> doing checksums, or did it at the application layer, it could possibly
> make sense, but it would need to be some seriously huge workload to
> justify it.

GPUs sitting on the PCI-E bus are going to have this problem, and it's likely insurmountable.

HOWEVER, AMD *is* finally getting around to implementing a GPU in the same package as the CPU, so we'll shortly be able to see a combined CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32 socket). AMD has even talked about the possibility of an advanced GPU living in a second CPU-style socket, with direct HT connections. This sort of design at least lends itself to being used as a co-processor, as it has a direct low-latency connection to the memory controller/bus.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
"C. Bergström"
2010-Jul-19 00:26 UTC
Erik Trimble wrote:
> On 7/18/2010 5:01 AM, "C. Bergström" wrote:
>> ...
>
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.
>
> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket). AMD has even talked about the possibility of an advanced GPU
> living in a second CPU-style socket, with direct HT connections. This
> sort of design at least lends itself to being used as a co-processor,
> as it has a direct low-latency connection to the memory controller/bus.

lalala.. hear no evil.. speak no evil... Does it *really* sound so fun to write code generation for x86_64 *AND* ATI VLIW targets? Unless everyone wants to rewrite their code in highly explicit parallel programming models, I think there's a huge amount of work before general applications can really benefit from this. I'd be happy to see a fully automatic solution for optimally offloading general application code to the GPU by 2012.

(I also don't know if Fusion, which is what I think you're referring to, is really going to initially target the high-performance/visualization market. If it doesn't, it's much less likely to solve performance problems and will be better suited to improving efficiency.)
John Martin
2010-Jul-19 14:51 UTC
> *From:* Erik Trimble <erik.trimble at oracle.com>
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.

In what context, ZFS or MD5 checksums? Over a year ago I did an experiment extracting what I believe was the 256-bit RAID-Z checksum calculation into a standalone user-space program. Compared to an i7-920, a Quadro FX 4800 was much faster, and that included the PCIe Gen2 x16 transport. Of course, the trick is to overlap the transport with compute so that data is always in flight.

> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket). AMD has even talked about the possibility of an advanced GPU
> living in a second CPU-style socket, with direct HT connections. This
> sort of design at least lends itself to being used as a co-processor,
> as it has a direct low-latency connection to the memory controller/bus.

Again, in the context of ZFS I don't believe data transport is the big problem to solve. I believe it is a kernel space API. Do we have any indication that any of the GPGPU vendors (NVIDIA/ATI/Intel) will offer an API that can be called from kernel space?
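For reference, here is a minimal sketch of the kind of checksum loop such an experiment would extract, assuming the "256-bit" checksum in question was fletcher4 (four 64-bit running sums over 32-bit words; SHA-256 is ZFS's other 256-bit option). It mirrors the structure of a native-byte-order fletcher4 loop, but `cksum256_t` and `fletcher4` are hypothetical names for illustration, not the ZFS kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-bit checksum: four 64-bit running sums (hypothetical type name). */
typedef struct {
    uint64_t word[4];
} cksum256_t;

/* Fletcher-4 style checksum over 32-bit words in host byte order.
 * Assumes buf is 4-byte aligned and size is a multiple of 4. */
static cksum256_t fletcher4(const void *buf, size_t size)
{
    assert(size % sizeof(uint32_t) == 0);
    const uint32_t *ip = buf;
    const uint32_t *end = ip + size / sizeof(uint32_t);
    uint64_t a = 0, b = 0, c = 0, d = 0;

    for (; ip < end; ip++) {
        a += *ip;  /* sum of words */
        b += a;    /* each later sum depends on the running value */
        c += b;    /* of the one before it, so the loop carries a */
        d += c;    /* serial dependency from word to word */
    }

    cksum256_t r = {{a, b, c, d}};
    return r;
}
```

For the three words 1, 2, 3 this yields sums of 6, 10, 15 and 21. Because b, c and d each accumulate the previous sum's running value, the loop cannot simply be split word by word across GPU threads; a parallel version has to process separate lanes or segments and recombine the partial sums algebraically, which is extra work on top of the transport question.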
"C. Bergström"
2010-Jul-19 15:15 UTC
John Martin wrote:
>
>> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
>> same package as the CPU, ...
>
> Again, in the context of ZFS I don't believe data transport
> is the big problem to solve. I believe it is a kernel space API.
> Do we have any indication that any of the GPGPU vendors (NVIDIA/ATI/Intel)
> will offer an API that can be called from kernel space?

Nouveau uses KCS (kernel command submission), which would allow this, but the problem is that then you have to deal with relocations. Our driver, and I believe NVIDIA's, uses UCS (user command submission) to simplify this and get better performance. Porting TTM (the memory manager implementing the GEM interface) and other things was a huge downside to Nouveau, and it would be very difficult to go back to that.

What sort of API would you specifically need, though?
John Martin
2010-Jul-19 16:23 UTC
On 07/19/10 11:15 AM, "C. Bergström" wrote:
> What sort of API would you specifically need though?

Something like CUDA or OpenCL.
"C. Bergström"
2010-Jul-19 16:50 UTC
John Martin wrote:
> On 07/19/10 11:15 AM, "C. Bergström" wrote:
>
>> What sort of API would you specifically need though?
>
> Something like CUDA or OpenCL.

Ummm... I could argue that CUDA and OpenCL are not APIs but programming languages/models. When you say API, I think of something like this:
http://github.com/pathscale/pscnv/blob/master/libpscnv/libpscnv.h
Bob Friesenhahn
2010-Jul-19 16:52 UTC
My own opinion regarding this discussion is that first it should be demonstrated that CPU consumption is an actual ZFS bottleneck (and will continue to be in the near future) before looking at ways to eliminate that bottleneck.

The current instantiation of GPUs in computer hardware is a poor design and quite wasteful of resources. That is one reason why I refuse to consider depending on GPUs in my own software. See my reasoning here:
"http://www.graphicsmagick.org/FAQ.html#are-there-any-plans-to-use-opencl-or-cuda-to-use-a-gpu"

There is every reason to believe that Intel (and perhaps AMD) will introduce updated CPUs which provide the arithmetic benefits of GPUs within their native instruction sets, with little increase in cost and power consumption. This is in addition to the explosion in the number of computing cores per socket. Except for very specific computing situations, GPU add-on hardware is a very poor architecture going forward.

Bob
John Martin
2010-Jul-19 21:28 UTC
On 07/19/10 12:52 PM, Bob Friesenhahn wrote:
> My own opinion regarding this discussion is that first it should be
> demonstrated that CPU consumption is an actual ZFS bottleneck (and will
> continue to be in the near future) before looking at ways to eliminate
> that bottleneck.

Naturally. The system load from "zpool scrub" was the original motivation for finding a way to offload and accelerate the checksum operations.
Rob Clark
2010-Jul-22 17:02 UTC
> On 7/18/2010 5:01 AM, "C. Bergström" wrote:
> > ...
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.

I have tried some programming using benchmarks and have managed to exceed the theoretical GFLOPS - pretty good, I'd say. What I am unable to do (with either my own code or other benchmarks) is saturate my "GPU FB"; I have yet to push it past 40% ;( . I did manage to get the temperature up to 104°C; boy, was the fan screaming!

> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket). AMD has even talked about the possibility of an advanced GPU
> living in a second CPU-style socket, with direct HT
> ...
> Erik Trimble

Yes, they are building my computer (I've seen REAL hardware demoed), but they say we'll be waiting until 2Q next year. If I must wait that long, I think I'll wait for the netbook; can you imagine a "Netbook SuperComputer"!

Thanks for your post,
Rob