From: W. Trevor King
Due Friday, December 3
Learn the CUDA language.
Note: Please identify all your work.
This assignment consists of multiplying two square matrices of identical size.
\[
P = M \times N
\]
The matrices $M$ and $N$, of size MatrixSize, can be filled with random numbers.
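Written element by element, each entry of $P$ is the scalar product of a row of $M$ with a column of $N$:

\[
P_{ij} = \sum_{k=0}^{\mathrm{MatrixSize}-1} M_{ik} N_{kj}
\]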
Step 1: Write a program to do the matrix multiplication assuming that the matrices $M$ and $N$ are small and fit in a single CUDA block. Input the matrix size via a command line argument. Do the matrix multiplication on the GPU and on the CPU and compare the resulting matrices. Make sure that your code works for arbitrary block size (up to 512 threads) and (small) matrix size.
Use one-dimensional arrays to store the matrices $M$, $N$ and $P$ for efficiency.
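As a rough sketch (not the skeleton itself), a Step 1 kernel could assign one thread to each element of $P$ and launch a single block. The kernel name, the float element type and the device-pointer names d_M, d_N, d_P below are assumptions, not part of matmult_skeleton.cu:

    /* One thread per element of P; all threads live in a single block.
     * M, N and P are one-dimensional, row-major device arrays. */
    __global__ void matmult_small(const float *M, const float *N, float *P,
                                  int size)
    {
        int row = threadIdx.y;
        int col = threadIdx.x;
        if (row < size && col < size) {
            float sum = 0.0f;
            for (int k = 0; k < size; k++)
                sum += M[row * size + k] * N[k * size + col];
            P[row * size + col] = sum;
        }
    }

    /* Launch (host side): size*size threads must not exceed 512 here. */
    dim3 threads(size, size);
    matmult_small<<<1, threads>>>(d_M, d_N, d_P, size);

With the 512-thread limit this covers matrices up to about 22 x 22, which is why Step 1 only asks for small matrix sizes.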
Step 2: Modify the previous program to multiply matrices of arbitrary size. Make sure that your code works for arbitrary block size (up to 512 threads) and matrix size (up to the memory limit). Instrument your program with calls to gettimeofday() to time the matrix multiplication on the CPU and on the GPU. Plot these times as a function of matrix size (up to large matrices, 4096) and guess the matrix-size dependence of the timing.
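A minimal timing sketch using gettimeofday() is shown below; the kernel name matmult_kernel, the block width BLOCK_SIZE and the device pointers d_M, d_N, d_P are placeholders. The cudaDeviceSynchronize() call is needed because kernel launches return before the GPU has finished:

    #include <sys/time.h>

    /* Current wall-clock time in seconds, via gettimeofday(). */
    double seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1e-6 * tv.tv_usec;
    }

    /* ... inside main(), after copying M and N to the device ... */
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid((size + BLOCK_SIZE - 1) / BLOCK_SIZE,
              (size + BLOCK_SIZE - 1) / BLOCK_SIZE);

    double t0 = seconds();
    matmult_kernel<<<grid, threads>>>(d_M, d_N, d_P, size);
    cudaDeviceSynchronize();      /* wait for the kernel to finish */
    double t_gpu = seconds() - t0;

Inside the arbitrary-size kernel, each thread computes its global row and column from blockIdx, blockDim and threadIdx, and skips the work if either index falls outside the matrix.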
Step 3: Optimize the previous code to take advantage of the very fast shared memory. To do this you must tile the matrix via a 2D CUDA grid of blocks (as above). All matrix elements of $P$ within a block will be computed at once. The scalar product of each row of $M$ and each column of $N$ within the block can be calculated by scanning over the matrices in block-sized tiles. The content of $M$ and $N$ within the tiles can then be transferred into shared memory for speed.
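The standard tiled pattern is sketched below, again as an assumption rather than the skeleton's own code. TILE is an assumed compile-time tile width equal to the block dimensions; out-of-range elements are padded with zeros so that matrix sizes which are not multiples of TILE still work:

    #define TILE 16   /* 16x16 = 256 threads per block, within the 512 limit */

    __global__ void matmult_tiled(const float *M, const float *N, float *P,
                                  int size)
    {
        __shared__ float Ms[TILE][TILE];
        __shared__ float Ns[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float sum = 0.0f;

        /* Scan across M's row and N's column one tile at a time. */
        for (int t = 0; t < (size + TILE - 1) / TILE; t++) {
            int m_col = t * TILE + threadIdx.x;
            int n_row = t * TILE + threadIdx.y;
            Ms[threadIdx.y][threadIdx.x] =
                (row < size && m_col < size) ? M[row * size + m_col] : 0.0f;
            Ns[threadIdx.y][threadIdx.x] =
                (n_row < size && col < size) ? N[n_row * size + col] : 0.0f;
            __syncthreads();      /* tile fully loaded into shared memory */

            for (int k = 0; k < TILE; k++)
                sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
            __syncthreads();      /* finish with this tile before reloading it */
        }

        if (row < size && col < size)
            P[row * size + col] = sum;
    }

Each __syncthreads() barrier keeps threads from reading a tile before it is fully loaded, or from overwriting it while neighbouring threads are still using it.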
Time this code and compare the results to the code in Step 2.

See the Learning CUDA section of the course notes and the skeleton code matmult_skeleton.cu. See also the in-class exercise on array reversal.