thr3ads.net - zfs code - [zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing [Oct 2009]

Rob Clark

2009-Oct-13 02:04 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

It would be great to use the GPU for ZFS or to offload some of the OS load ...
http://developer.nvidia.com/object/nexus.html

If SunStudio had some GPU support and there were a few functions added for the
hotpoints ...


Rob
-- 
This message posted from opensolaris.org

Bob Friesenhahn

2009-Oct-13 13:45 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

On Mon, 12 Oct 2009, Rob Clark wrote:
> It would be great to use the GPU for ZFS or to offload some of the OS load ...
> http://developer.nvidia.com/object/nexus.html

Let us know when you have a demo ready. :-)

> If SunStudio had some GPU support and there were a few functions
> added for the hotpoints ...

GPUs are great for some things, but it is difficult to imagine a GPU
being of assistance in the zfs implementation due to way too much
latency, optimization for floating point rather than integer, and due
to creating a "hotpoint".  For zfs it is much better to spread the
load across multiple CPU cores using many threads, as is done now.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Rob Clark

2010-Jul-18 11:41 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

> On Mon, 12 Oct 2009, Rob Clark wrote:
> 
> > It would be great to use the GPU for ZFS or to offload some of ...
> 
> Let us know when you have a demo ready. :-)

Other demos show a few 'lousy speed-ups of only 2 times', with some
speed-ups reaching over 1000 times.
The general consensus seems to be that an 8-12 times speed-up is a
reasonable expectation (with some effort) for the so-called
"average problem". See:
http://www.nvidia.com/object/cuda_apps_flash_new.html

> > If SunStudio had some GPU support and there were a few functions 
> > added for the hotpoints ...
> 
> GPUs are great for some things but it is difficult to imagine a GPU 
> being of assistance in the zfs implementation due to way too much 
> latency, optimization for floating point rather than integer, and due 
> to creating a "hotpoint".  For zfs it is much better to spread the
> load across multiple CPU cores using many threads as is done now.
>
> Bob
> --

If you wait long enough someone will 'build a Bridge to it' ...


Accelerating Distributed Storage Systems with CUDA - Paper & Code
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b


Real-time Parallel Hashing on the GPU
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=d91c3c63-a2d6-4a15-a70a-87bcafdd70d8


CUDA Multiforcer - Password Recovery or Testing if Password is 'Secure Enough'
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=9e515da5-c97e-4c37-8305-f27982a02d5f


Parallelizing Hash-based Data Carving
http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=f93b62b6-b6af-497e-83a8-865af31c8d7a
and Paper: http://arxiv.org/abs/0901.1307


Support for OpenCL in OpenSolaris would be a good thing,
Rob
-- 
This message posted from opensolaris.org

"C. Bergström"

2010-Jul-18 12:01 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

Rob Clark wrote:
>> On Mon, 12 Oct 2009, Rob Clark wrote:
>>
>>> It would be great to use the GPU for ZFS or to offload some of ...
>>
>> Let us know when you have a demo ready. :-)
>
> Other demos show a few 'lousy speed-ups of only 2 times', with some
> speed-ups reaching over 1000 times.
> The general consensus seems to be that an 8-12 times speed-up is a
> reasonable expectation (with some effort) for the so-called
> "average problem". See:
> http://www.nvidia.com/object/cuda_apps_flash_new.html

It all depends on the code, so I wouldn't say there's any reasonable
expectation.  To assume otherwise is flawed.

>>> If SunStudio had some GPU support and there were a few functions
>>> added for the hotpoints ...
>>
>> GPUs are great for some things but it is difficult to imagine a GPU
>> being of assistance in the zfs implementation due to way too much
>> latency, optimization for floating point rather than integer, and due
>> to creating a "hotpoint".  For zfs it is much better to spread the
>> load across multiple CPU cores using many threads as is done now.
>>
>> Bob
>> --
>
> If you wait long enough someone will 'build a Bridge to it' ...
>
> Accelerating Distributed Storage Systems with CUDA - Paper & Code
> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b

I believe I commented on this thread before and that paper mostly
focuses on hashed and md5 checksums.  The other problem is that as
long as the GPU is on the other side of the memory controller, it's
unlikely to be "cost" (performance/latency) effective to do the
expensive round-trip for small chunks of data.  Depending on the storage
topology and stack of course... *If* you had some master node doing
checksums or doing it at the application layer it could possibly make
sense, but it would need to be some seriously huge workload to justify
it.

> Real-time Parallel Hashing on the GPU
> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=d91c3c63-a2d6-4a15-a70a-87bcafdd70d8
>
> CUDA Multiforcer - Password Recovery or Testing if Password is 'Secure Enough'
> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=9e515da5-c97e-4c37-8305-f27982a02d5f
>
> Parallelizing Hash-based Data Carving
> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=f93b62b6-b6af-497e-83a8-865af31c8d7a
> and Paper: http://arxiv.org/abs/0901.1307
>
> Support for OpenCL in OpenSolaris would be a good thing,

OpenCL wouldn't help solve this, and from the research papers I've seen
it has lower performance and higher execution times than CUDA.

/* Disclaimer: I work for a vendor doing a GPGPU solution based on HMPP
which will be ported to Solaris.  Our driver and some parts of the stack
are open source as well. */



./C

Erik Trimble

2010-Jul-19 00:01 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

On 7/18/2010 5:01 AM, "C. Bergström" wrote:
> Rob Clark wrote:
>>>> If SunStudio had some GPU support and there were a few functions
>>>> added for the hotpoints ...
>>> GPUs are great for some things but it is difficult to imagine a GPU
>>> being of assistance in the zfs implementation due to way too much
>>> latency, optimization for floating point rather than integer, and
>>> due to creating a "hotpoint".  For zfs it is much better to spread
>>> the load across multiple CPU cores using many threads as is done now.
>>>
>>> Bob
>>> --
>>
>> If you wait long enough someone will 'build a Bridge to it' ...
>>
>> Accelerating Distributed Storage Systems with CUDA - Paper & Code
>> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b
>>
> I believe I commented on this thread before and that paper mostly
> focuses on hashed and md5 checksums.  The other problem is that as
> long as the GPU is on the other side of the memory controller, it's
> unlikely to be "cost" (performance/latency) effective to do the
> expensive round-trip for small chunks of data.  Depending on the
> storage topology and stack of course... *If* you had some master node
> doing checksums or doing it at the application layer it could possibly
> make sense, but it would need to be some seriously huge workload to
> justify it

GPUs sitting on the PCI-E bus are going to have this problem, and it's
likely insurmountable.
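
The round-trip argument can be put into numbers with a rough cost model. This is only a sketch: the function names and every figure in it (per-request latency, bus, GPU and CPU throughput) are hypothetical placeholders, not measurements from anyone in this thread, and it assumes transfer and compute do not overlap.

```python
def cpu_time(n_bytes, cpu_bps):
    """Seconds to checksum n_bytes on the CPU at cpu_bps bytes/second."""
    return n_bytes / cpu_bps


def offload_time(n_bytes, latency_s, bus_bps, gpu_bps):
    """Seconds to ship n_bytes over the bus and checksum them on the GPU.

    Assumes no overlap between transfer and compute.
    """
    return latency_s + n_bytes / bus_bps + n_bytes / gpu_bps


def break_even_bytes(latency_s, bus_bps, gpu_bps, cpu_bps):
    """Block size at which offload and CPU cost are equal, or None."""
    gain_per_byte = 1.0 / cpu_bps - 1.0 / bus_bps - 1.0 / gpu_bps
    if gain_per_byte <= 0:
        return None  # the CPU wins at every block size
    return latency_s / gain_per_byte


# Hypothetical figures, chosen only for illustration.
LATENCY_S = 20e-6   # fixed per-request round-trip overhead, seconds
BUS_BPS = 8e9       # bus throughput, bytes/second
GPU_BPS = 40e9      # GPU checksum throughput, bytes/second
CPU_BPS = 2e9       # CPU checksum throughput, bytes/second

if __name__ == "__main__":
    for size in (4096, 128 * 1024, 1024 * 1024):
        print(size,
              cpu_time(size, CPU_BPS),
              offload_time(size, LATENCY_S, BUS_BPS, GPU_BPS))
    print("break-even bytes:",
          break_even_bytes(LATENCY_S, BUS_BPS, GPU_BPS, CPU_BPS))
```

With these made-up figures a 4 KiB block is far cheaper to checksum on the CPU, while a 1 MiB block favours the GPU. Where the crossover falls on real hardware would have to be measured.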

HOWEVER, AMD *is* finally getting around to implementing a GPU in the
same package as the CPU, so we'll shortly be able to see a combined
CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
socket).  There's even a possibility that AMD's talked about where an
advanced GPU will live in a second CPU-style socket, with direct HT
connections.  This sort of design at least lends itself to being used
as a co-processor, as it has a direct low-latency connection to the
memory controller/bus.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

"C. Bergström"

2010-Jul-19 00:26 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

Erik Trimble wrote:
> On 7/18/2010 5:01 AM, "C. Bergström" wrote:
>> Rob Clark wrote:
>>>>> If SunStudio had some GPU support and there were a few functions
>>>>> added for the hotpoints ...
>>>> GPUs are great for some things but it is difficult to imagine a GPU
>>>> being of assistance in the zfs implementation due to way too much
>>>> latency, optimization for floating point rather than integer, and
>>>> due to creating a "hotpoint".  For zfs it is much better to spread
>>>> the load across multiple CPU cores using many threads as is done now.
>>>>
>>>> Bob
>>>> --
>>>
>>> If you wait long enough someone will 'build a Bridge to it' ...
>>>
>>> Accelerating Distributed Storage Systems with CUDA - Paper & Code
>>> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b
>>>
>> I believe I commented on this thread before and that paper mostly
>> focuses on hashed and md5 checksums.  The other problem is that as
>> long as the GPU is on the other side of the memory controller, it's
>> unlikely to be "cost" (performance/latency) effective to do the
>> expensive round-trip for small chunks of data.  Depending on the
>> storage topology and stack of course... *If* you had some master node
>> doing checksums or doing it at the application layer it could
>> possibly make sense, but it would need to be some seriously huge
>> workload to justify it
>
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.
>
> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket).  There's even a possibility that AMD's talked about where an
> advanced GPU will live in a second CPU-style socket, with direct HT
> connections.  This sort of design at least lends itself to being
> used as a co-processor, as it has a direct low-latency connection to
> the memory controller/bus.

lalala.. hear no evil.. speak no evil...  Does it *really* sound so fun
to write code generation for x86_64 *AND* ATI VLIW targets...  Unless
everyone wants to rewrite their code in highly explicit parallel
programming models, I think there's a huge amount of work before general
applications can really benefit from this.  I'd be happy to see a fully
automatic solution for optimally offloading general application code to
the GPU by 2012..

(I also don't know if Fusion, which is what I think you're referring to,
is really going to initially target the high performance/visualization
market..  If this is the case it's much less likely to solve performance
problems and more likely to be suited to improving efficiency..)

John Martin

2010-Jul-19 14:51 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

> *From:* Erik Trimble <erik.trimble at oracle.com>
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.

In what context, ZFS or MD5 checksums?

Over a year ago I did an experiment extracting
what I believe was the 256-bit RAID-Z checksum
calculation into a standalone user space program.
Compared to an i7-920, a Quadro FX 4800 was much
faster, and this included the Gen2 x16 transport.
Of course, the trick is to overlap the transport
with compute so that data is always in flight.
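
Why the overlap matters can be seen from a simple two-stage pipeline model. This is a sketch under stated assumptions, not a description of the experiment: the function names are hypothetical, every chunk is assumed to take the same transfer and compute time, and the hardware is assumed to transfer one chunk while computing on another (double buffering).

```python
def serial_time(n_chunks, transfer, compute):
    """Total time when each chunk is transferred, then computed, in turn."""
    return n_chunks * (transfer + compute)


def overlapped_time(n_chunks, transfer, compute):
    """Total time when chunk i+1 is transferred while chunk i is computed.

    The first transfer and the last compute cannot be hidden; in between,
    the slower of the two stages sets the pace.
    """
    if n_chunks == 0:
        return 0
    return transfer + compute + (n_chunks - 1) * max(transfer, compute)


if __name__ == "__main__":
    # Hypothetical per-chunk costs in arbitrary time units.
    n, transfer, compute = 100, 2, 1
    print("serial:    ", serial_time(n, transfer, compute))
    print("overlapped:", overlapped_time(n, transfer, compute))
```

In this model the overlapped total approaches n times the slower stage, so once transfers are kept in flight the bus cost is hidden whenever compute is the slower stage, and vice versa.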

> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket). There's even a possibility that AMD's talked about where an
> advanced GPU will live in a second CPU-style socket, with direct HT
> connections. This sort of design at least lends itself to being used
> as a co-processor, as it has a direct low-latency connection to the
> memory controller/bus.

Again, in the context of ZFS I don't believe data transport
is the big problem to solve.  I believe it is a kernel space API.
Do we have any indication any of the GPGPU vendors (NVIDIA/ATI/Intel)
will offer an API that can be called from kernel space?

"C. Bergström"

2010-Jul-19 15:15 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

John Martin wrote:
>> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
>> same package as the CPU, so we'll shortly be able to see a combined
>> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
>> socket). There's even a possibility that AMD's talked about where an
>> advanced GPU will live in a second CPU-style socket, with direct HT
>> connections. This sort of design at least lends itself to being used
>> as a co-processor, as it has a direct low-latency connection to the
>> memory controller/bus.
>
> Again, in the context of ZFS I don't believe data transport
> is the big problem to solve.  I believe it is a kernel space API.
> Do we have any indication any of the GPGPU vendors (NVIDIA/ATI/Intel)
> will offer an API that can be called from kernel space?

Nouveau uses KCS (kernel command submission), which would allow this, but
the problem is that then you have to deal with relocations.  Our driver,
and I believe Nvidia's, use UCS (user command submission) to simplify
this and get better performance.  Porting TTM (the memory manager
implementing the GEM interface) and other things was a huge downside to
Nouveau, and it would be very difficult to go back to that.

What sort of API would you specifically need though?

John Martin

2010-Jul-19 16:23 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

On 07/19/10 11:15 AM, "C. Bergström" wrote:
> What sort of API would you specifically need though?

Something like CUDA or OpenCL.

"C. Bergström"

2010-Jul-19 16:50 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

John Martin wrote:
> On 07/19/10 11:15 AM, "C. Bergström" wrote:
>
>> What sort of API would you specifically need though?
>
> Something like CUDA or OpenCL.

Ummm.. I could argue that CUDA and OpenCL are not APIs, but programming
languages/models.. When you say API I think of something like this..

http://github.com/pathscale/pscnv/blob/master/libpscnv/libpscnv.h

Bob Friesenhahn

2010-Jul-19 16:52 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

My own opinion regarding this discussion is that first it should be 
demonstrated that CPU consumption is an actual ZFS bottleneck (and 
will continue to be in the near future) before looking at ways to 
eliminate that bottleneck.

The current instantiation of GPUs in computer hardware is a poor 
design and quite wasteful of resources.  That is a reason why I refuse 
to consider depending on GPUs in my own software.  See my reasoning 
here:

"http://www.graphicsmagick.org/FAQ.html#are-there-any-plans-to-use-opencl-or-cuda-to-use-a-gpu"

There is every reason to believe that Intel (and perhaps AMD) will 
introduce updated CPUs which provide the arithmetic benefits of GPUs 
within their native instruction sets, and with little increase in cost 
and power consumption.  This is in addition to the explosion in the 
number of computing cores per socket.

Except for very specific computing situations, GPU add-on hardware is 
a very poor architecture going forward into the future.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

John Martin

2010-Jul-19 21:28 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

On 07/19/10 12:52 PM, Bob Friesenhahn wrote:
> My own opinion regarding this discussion is that first it should be
> demonstrated that CPU consumption is an actual ZFS bottleneck (and will
> continue to be in the near future) before looking at ways to eliminate
> that bottleneck.

Naturally, and the system load from "zpool scrub" was the original
motivation for finding a way to offload and accelerate the checksum
operations.
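
For readers unfamiliar with the kind of checksum being discussed, here is a minimal sketch of a Fletcher-4 style checksum, one of the ZFS algorithms that yields a 256-bit result. It is an assumption that this is the checksum meant above (the thread does not say which one was used), the little-endian word order is chosen for illustration only, and the function name and `state` parameter are hypothetical. It is a reference for the arithmetic, not the kernel code.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap at 64 bits


def fletcher4(data, state=(0, 0, 0, 0)):
    """Fletcher-4 style checksum over 32-bit little-endian words.

    Returns four 64-bit accumulators (256 bits in total). Passing the
    result of one call as `state` to the next lets a buffer be processed
    chunk by chunk.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a, b, c, d = state
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


if __name__ == "__main__":
    block = struct.pack("<4I", 1, 2, 3, 4)
    whole = fletcher4(block)
    # Same result when the block is fed in two halves.
    halves = fletcher4(block[8:], fletcher4(block[:8]))
    print(whole, halves)
```

Each word updates all four running sums in sequence, which is cheap integer work per byte; that is part of why the cost of moving the data tends to dominate any offload decision.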

Rob Clark

2010-Jul-22 17:02 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

> On 7/18/2010 5:01 AM, "C. Bergström" wrote:
> > Rob Clark wrote:
> >>>> If SunStudio had some GPU support and there were a few functions
> ...
> >>>
> >>> Bob
> >>> --
> >>
> >> If you wait long enough someone will 'build a Bridge to it' ...
> >>
> >> Accelerating Distributed Storage Systems with CUDA - Paper & Code
> >> http://www.nvidia.com/object/cuda_apps_flash_new.html#state=detailsOpen;aid=e223fbfc-f017-498c-8174-699747dbd88b
> >>
> > I believe I commented on this thread before and that paper mostly
> > focuses on hashed and md5 checksums.  The other problem is that as
> > long as the GPU is on the other side of the memory controller, it's
> > unlikely to be "cost" (performance/latency) effective to do the
> > expensive round-trip for small chunks of data.
> > Depending on the storage topology and stack of course... *If* you had
> > some master node doing checksums or doing it at the application layer
> > it could possibly make sense, but it would need to be some seriously
> > huge workload to justify it
>
> GPUs sitting on the PCI-E bus are going to have this problem, and it's
> likely insurmountable.

I have tried some programming using benchmarks and have managed
to exceed the theoretical GFLOPS - pretty good I'd say. What I am
unable to do (with either my own code or other benchmarks) is saturate
my "GPU FB"; I have yet to kick it past 40% ;(  . I did manage to get
the temps up to 104°C, boy was the fan screaming!


> HOWEVER, AMD *is* finally getting around to implementing a GPU in the
> same package as the CPU, so we'll shortly be able to see a combined
> CPU/GPU thing that sits in an AM3 socket (or, more likely, a C32
> socket).  There's even a possibility that AMD's talked about where an
> advanced GPU will live in a second CPU-style socket, with direct HT
> ...
> Erik Trimble

Yes, they are building my computer (I've seen REAL hardware demoed),
but they say we'll be waiting until 2Q next year. If I must wait that
long I think I'll wait for the Netbook; can you imagine a "Netbook
SuperComputer"!

Thanks for your post,
Rob
-- 
This message posted from opensolaris.org

Mathew Stuart

2010-Dec-01 11:18 UTC

[zfs-code] Nexus for Solaris - Fermi GPU / CPU Coprocessing

I think it will become interesting to use.
-- 
This message posted from opensolaris.org
