[[!meta  title="General-purpose computing on graphics processing units"]]

[GPGPU][] utilizes the number-crunching speed and massive
parallelization of your [graphics card][GPU] to accelerate
general-purpose tasks.  When your algorithm is compatible with GPU
hardware, the speedup of running hundreds of concurrent threads can be
enormous.

There are a number of ways to implement GPGPU, ranging from
multi-platform frameworks such as [OpenCL][] to single-company
frameworks such as [NVIDIA][]'s [CUDA][].  I've gotten to play around
with [CUDA][] while TAing the [[parallel computing]] class, and it's
lots of fun.

With NVIDIA (other vendors are probably similar; I'll update this as I
learn more), each GPU *device* has a block of global memory serving a
number of multiprocessors, and each multiprocessor contains several
cores that execute threads concurrently.

Specs on NVIDIA's [GeForce GTX 580][]:

* 512 cores (16 MPs ⋅ 32 cores/MP)
* 1.5 GB GDDR5 RAM
* 192.4 GB/sec memory bandwidth
* 1.54 GHz processor clock rate
* 1.58 TFLOPs per second (single precision)

Zoom.

The FLOPs/s computation is `cores⋅clock⋅2`, because (from [page 94][]
of the CUDA C Programming Guide, version 3.2) each core can execute a
single multiply-add operation (2 FLOPs) per cycle.  Also take a look
at the graph of historical performance on [page 14][], the table of
device capabilities that starts on [page 111][], and the description
of *warps* on [page 93][].

[GPGPU]: http://en.wikipedia.org/wiki/GPGPU
[OpenCL]: http://en.wikipedia.org/wiki/OpenCL
[GPU]: http://en.wikipedia.org/wiki/Graphics_processing_unit
[NVIDIA]: http://www.nvidia.com/
[CUDA]: http://en.wikipedia.org/wiki/CUDA
[GeForce GTX 580]: http://www.nvidia.com/object/product-geforce-gtx-580-us.html
[page 14]: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf#page=14
[page 93]: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf#page=93
[page 94]: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf#page=94
[page 111]: http://developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/CUDA_C_Programming_Guide.pdf#page=111

[[!tag tags/hardware]]
[[!tag tags/programming]]