From 6be51bddc283ce9dbb51442c3d98cdb705da4179 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Due Friday, December 3
Learn the CUDA language.
Note: Please identify all your work.
This assignment consists of multiplying two square matrices of identical size:
\[
P = M \times N
\]
The matrices $M$ and $N$, of size MatrixSize, could be filled with random numbers.
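For reference, here is a minimal CPU-side sketch of the random fill and of the product $P = M \times N$ using row-major one-dimensional storage; the helper names fill_random and matmult_cpu are illustrative, not taken from the skeleton code.

#include <stdlib.h>

/* Fill a size x size matrix (stored row-major in a 1D array) with
 * uniform random numbers in [0, 1]. */
void fill_random(float *A, int size)
{
    for (int i = 0; i < size * size; i++)
        A[i] = rand() / (float)RAND_MAX;
}

/* Reference CPU product P = M x N, used to check the GPU result. */
void matmult_cpu(const float *M, const float *N, float *P, int size)
{
    for (int row = 0; row < size; row++)
        for (int col = 0; col < size; col++) {
            float sum = 0.0f;
            for (int k = 0; k < size; k++)
                sum += M[row * size + k] * N[k * size + col];
            P[row * size + col] = sum;
        }
}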
Step 1: Write a program to do the matrix multiplication assuming that the matrices $M$ and $N$ are small and fit in a single CUDA block. Input the matrix size via a command-line argument. Do the matrix multiplication on the GPU and on the CPU and compare the resulting matrices. Make sure that your code works for arbitrary block size (up to 512 threads) and (small) matrix size.
Use one-dimensional arrays to store the matrices $M$, $N$, and $P$ for efficiency.
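A minimal sketch of what the Step 1 kernel could look like, with one thread per element of $P$ and a single block; the kernel name and launch configuration are illustrative, not taken from matmult_skeleton.cu.

/* One block computes the whole (small) matrix: thread (x, y) computes
 * P[y][x].  The bounds check lets the block be larger than the matrix,
 * so any block size up to the 512-thread limit works. */
__global__ void matmult_small(const float *M, const float *N, float *P, int size)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    if (row < size && col < size) {
        float sum = 0.0f;
        for (int k = 0; k < size; k++)
            sum += M[row * size + k] * N[k * size + col];
        P[row * size + col] = sum;
    }
}

/* Launched with a single block covering the matrix, e.g.
 *     dim3 threads(size, size);
 *     matmult_small<<<1, threads>>>(d_M, d_N, d_P, size);
 * where d_M, d_N, and d_P were allocated with cudaMalloc() and the
 * inputs copied over with cudaMemcpy(). */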
Step 2: Modify the previous program to multiply matrices of arbitrary size. Make sure that your code works for arbitrary block size (up to 512 threads) and matrix size (up to the memory limit). Instrument your program with calls to gettimeofday() to time the matrix multiplication on the CPU and GPU. Plot these times as a function of matrix size (up to large matrices, 4096) and estimate the matrix-size dependence of the timing.
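One possible shape for the Step 2 version, assuming a 2D grid of blocks for arbitrary matrix sizes and a small gettimeofday() wrapper for the timing; the kernel and helper names and the choice of block size are assumptions, not part of the skeleton code.

#include <stdio.h>
#include <sys/time.h>

/* Each thread computes one element of P; the 2D grid covers matrices of
 * any size, with the bounds check handling partial blocks at the edges. */
__global__ void matmult_global(const float *M, const float *N, float *P, int size)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < size && col < size) {
        float sum = 0.0f;
        for (int k = 0; k < size; k++)
            sum += M[row * size + k] * N[k * size + col];
        P[row * size + col] = sum;
    }
}

/* Wall-clock time in seconds, from gettimeofday(). */
static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

/* Time the GPU multiplication for a block x block thread block
 * (block * block <= 512). */
void time_gpu(const float *d_M, const float *d_N, float *d_P, int size, int block)
{
    dim3 threads(block, block);
    dim3 blocks((size + block - 1) / block, (size + block - 1) / block);
    double t0 = seconds();
    matmult_global<<<blocks, threads>>>(d_M, d_N, d_P, size);
    cudaDeviceSynchronize();   /* let the kernel finish before stopping the clock */
    printf("GPU %d x %d: %g s\n", size, size, seconds() - t0);
}

Time the CPU version the same way, bracketing the matmult_cpu() call with two gettimeofday() readings.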
Step 3: Optimize the previous code to take advantage of the very fast shared memory. To do this you must tile the matrices via a 2D CUDA grid of blocks (as above). All matrix elements in a block within $P$ will be computed at once. The scalar product of each row of $M$ and each column of $N$ within the block can be calculated by scanning over the matrices in block-size tiles. The contents of $M$ and $N$ within the tiles can then be transferred into shared memory for speed.
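A sketch of the Step 3 tiled kernel, assuming a TILE x TILE thread block (TILE * TILE <= 512) and, for brevity, a matrix size that is a multiple of TILE; a complete solution also needs boundary checks, and the names are illustrative.

#define TILE 16

__global__ void matmult_shared(const float *M, const float *N, float *P, int size)
{
    /* One tile of M and one tile of N live in fast shared memory. */
    __shared__ float Ms[TILE][TILE];
    __shared__ float Ns[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    /* Scan across M's row of tiles and down N's column of tiles. */
    for (int t = 0; t < size / TILE; t++) {
        /* Each thread loads one element of each tile. */
        Ms[threadIdx.y][threadIdx.x] = M[row * size + (t * TILE + threadIdx.x)];
        Ns[threadIdx.y][threadIdx.x] = N[(t * TILE + threadIdx.y) * size + col];
        __syncthreads();   /* both tiles fully loaded */

        for (int k = 0; k < TILE; k++)
            sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
        __syncthreads();   /* done with the tiles before they are overwritten */
    }
    P[row * size + col] = sum;
}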
Time this code and compare the results to the code in Step 2.

See the Learning CUDA section of the course notes and the skeleton code matmult_skeleton.cu. See also the in-class exercise on array reversal.
diff --git a/assignments/current/7 b/assignments/current/7
index eb551e9..3e43c08 120000
--- a/assignments/current/7
+++ b/assignments/current/7
@@ -1 +1 @@
-../archive/game_of_life/
\ No newline at end of file
+../archive/matrix_multiplication_cuda
\ No newline at end of file
diff --git a/content/GPUs/index.shtml b/content/GPUs/index.shtml
index 6f82799..77ebf42 100644
--- a/content/GPUs/index.shtml
+++ b/content/GPUs/index.shtml
@@ -93,7 +93,7 @@
 Fortran, Java and Matlab.
 Gems 3 contains great demonstration GPU codes.
-Jason Sanders and Edward Kandrot's CUDA
-- 
2.26.2