From: W. Trevor King
Date: Thu, 25 Nov 2010 16:04:56 +0000 (-0500)
Subject: Bring in this Fall's assignment 7 (from last year's logistic map assignment).
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=6be51bddc283ce9dbb51442c3d98cdb705da4179;p=parallel_computing.git

Bring in this Fall's assignment 7 (from last year's logistic map assignment).
---
diff --git a/assignments/archive/logistic_cuda/index.shtml.itex2MML b/assignments/archive/logistic_cuda/index.shtml.itex2MML
index 705eeb7..beaf339 100644
--- a/assignments/archive/logistic_cuda/index.shtml.itex2MML
+++ b/assignments/archive/logistic_cuda/index.shtml.itex2MML
@@ -56,9 +56,10 @@
 block size tiles. The content of $M$ and $N$ within the tiles can then
 be transferred into the shared memory for speed.

 See the Learning CUDA
-section of the course notes and the skeleton code matmult_skeleton.cu. See also the
-in-class exercise on array reversal.

+section of the course notes and the skeleton
+code matmult_skeleton.cu. See
+also the in-class exercise on array
+reversal.

Part B — Logistic Map

diff --git a/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML b/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML
new file mode 100644
index 0000000..ae7425c
--- /dev/null
+++ b/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML
@@ -0,0 +1,66 @@

Assignment 7

+

Due Friday, December 3

+ +

Purpose

+ +

Learn the CUDA language.

+ +

Note: Please identify all your work.

+ + + + +

This assignment consists of multiplying two square matrices of identical size.

+ +

\[
  P = M \times N
\]

+ +

The matrices $M$ and $N$, of size MatrixSize, can be filled with random numbers.

+ +

Step 1

+ +

Write a program to do the matrix multiplication assuming that the matrices $M$ and $N$ are small and fit in a single CUDA block. Input the matrix size via a command-line argument. Do the matrix multiplication on both the GPU and the CPU and compare the resulting matrices. Make sure that your code works for arbitrary block size (up to 512 threads) and (small) matrix size.

+ +

Use one-dimensional arrays to store the matrices $M$, $N$, and $P$ for efficiency.
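For Step 1, a minimal single-block kernel might look like the following sketch. The kernel name matmul_small and the use of float elements are illustrative assumptions, not part of the assignment; each thread computes one element of $P$, and the matrices live in one-dimensional arrays as required.

```cuda
/* Sketch for Step 1: one thread per element of P, one block total.
 * matmul_small is an assumed name; size is the matrix width. */
__global__ void matmul_small(const float *M, const float *N, float *P,
                             int size)
{
	int row = threadIdx.y;
	int col = threadIdx.x;
	float sum = 0.0;
	for (int k = 0; k < size; k++)
		sum += M[row * size + k] * N[k * size + col];
	P[row * size + col] = sum;  /* one-dimensional storage */
}

/* Host-side launch: a single block of size x size threads, so
 * size*size must stay within the 512-thread block limit:
 *   matmul_small<<<1, dim3(size, size)>>>(d_M, d_N, d_P, size);
 */
```

A plain triple-loop CPU version of the same product supplies the reference matrix for the comparison.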

+ +

Step 2

+ +

Modify the previous program to multiply matrices of arbitrary size. Make sure that your code works for arbitrary block size (up to 512 threads) and matrix size (up to the memory limit). Instrument your program with calls to gettimeofday() to time the matrix multiplication on the CPU and on the GPU. Plot these times as a function of matrix size (up to large matrices, e.g. 4096×4096) and estimate how the timing depends on matrix size.

+ +

Step 3

+ +

Optimize the previous code to take advantage of the very fast shared memory. To do this you must tile the matrices via a 2D CUDA grid of blocks (as above). All matrix elements of $P$ within a block will be computed at once. The scalar product of each row of $M$ and each column of $N$ within the block can be calculated by scanning over the matrices in block-sized tiles. The contents of $M$ and $N$ within the tiles can then be transferred into shared memory for speed.
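The tiling described above might be sketched as follows. The names matmul_tiled and TILE are assumptions, and the matrix size is assumed to be a multiple of TILE to keep the sketch short:

```cuda
#define TILE 16  /* assumed tile width; TILE*TILE <= 512 threads */

__global__ void matmul_tiled(const float *M, const float *N, float *P,
                             int size)
{
	__shared__ float Ms[TILE][TILE];  /* tile of M staged in shared memory */
	__shared__ float Ns[TILE][TILE];  /* tile of N staged in shared memory */
	int row = blockIdx.y * TILE + threadIdx.y;
	int col = blockIdx.x * TILE + threadIdx.x;
	float sum = 0.0;

	/* Scan across M and N in TILE-wide strips. */
	for (int t = 0; t < size / TILE; t++) {
		Ms[threadIdx.y][threadIdx.x] = M[row * size + t * TILE + threadIdx.x];
		Ns[threadIdx.y][threadIdx.x] = N[(t * TILE + threadIdx.y) * size + col];
		__syncthreads();  /* tiles fully loaded before use */
		for (int k = 0; k < TILE; k++)
			sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
		__syncthreads();  /* done with tiles before the next load */
	}
	P[row * size + col] = sum;
}
```

Launched on a (size/TILE) × (size/TILE) grid of TILE × TILE blocks, each block computes one tile of $P$ while reading each element of $M$ and $N$ from global memory only size/TILE times instead of size times.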

+ +

Time this code and compare the results to the code in Step 2.

See the Learning CUDA section of the course notes and the skeleton code matmult_skeleton.cu. See also the in-class exercise on array reversal.

diff --git a/assignments/current/7 b/assignments/current/7
index eb551e9..3e43c08 120000
--- a/assignments/current/7
+++ b/assignments/current/7
@@ -1 +1 @@
-../archive/game_of_life/
\ No newline at end of file
+../archive/matrix_multiplication_cuda
\ No newline at end of file
diff --git a/content/GPUs/index.shtml b/content/GPUs/index.shtml
index 6f82799..77ebf42 100644
--- a/content/GPUs/index.shtml
+++ b/content/GPUs/index.shtml
@@ -93,7 +93,7 @@
 Fortran, Java and Matlab.

 Gems 3 contains great demonstration GPU codes.

-Learning CUDA by examples

+Learning CUDA by example

Jason Sanders and Edward Kandrot's CUDA