Bring in this Fall's assigment 7 (from last year's logistic map assigment).

author W. Trevor King <wking@drexel.edu>

Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)

committer W. Trevor King <wking@drexel.edu>

Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)
author W. Trevor King <wking@drexel.edu>
Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)
committer W. Trevor King <wking@drexel.edu>
Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)
diff --git a/assignments/archive/logistic_cuda/index.shtml.itex2MML b/assignments/archive/logistic_cuda/index.shtml.itex2MML

index 705eeb7f67c5429b32b1c2fb27c734de7d1246e8..beaf339936fb446b973e6e8cb39b6213d2c6a23f 100644 (file)
--- a/assignments/archive/logistic_cuda/index.shtml.itex2MML
+++ b/assignments/archive/logistic_cuda/index.shtml.itex2MML
@@ -56,9 +56,10 @@ block size tiles. The content of $M$ and $N$ within the tiles can then
  be transfered into the share memory for speed.</p>
  
  <p>See the <a href="../../../content/GPUs/#learn">Learning CUDA</a>
-section of the course notes and the skeleton code <a
-href="src/matmult_skeleton.cu">matmult_skeleton.cu</a>. See also the
-in-class exercise on array reversal.</p>
+section of the course notes and the skeleton
+code <a href="../../../src/matmult_skeleton.cu">matmult_skeleton.cu</a>. See
+also the in-class exercise on <a href="../../../content/GPUs/#example">array
+reversal</a>.</p>
  
  <h2 id="B">Part B — Logistic Map</h2>
  
diff --git a/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML b/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML

new file mode 100644 (file)

index 0000000..ae7425c
--- /dev/null
+++ b/assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML
@@ -0,0 +1,66 @@
+<!--#set var="root_directory" value="../../.." --><!--#include virtual="$root_directory/shared/header.shtml"-->
+
+<h1>Assignment 7</h1>
+<p><em>Due Friday, December 3</em></p>
+
+<h2>Purpose</h2>
+
+<p>Learn the CUDA language.</p>
+
+<p>Note: Please identify all your work.</p>
+
+<!--TableOfContents:Begin-->
+<!--TableOfContents:End-->
+
+<p>This assignment consists in multiplying two square matrices of
+identical size.</p>
+
+<p class="equation">\[
+  P = M \times N
+\]</p>
+
+<p>The matrices $M$ and $N$, of size <code>MatrixSize</code>, could be
+filled with random numbers.</p>
+
+<h2 id="1">Step 1</h2>
+
+<p>Write a program to do the matrix multiplication assuming that the
+matrices $M$ and $N$ are small and fit in a CUDA block. Input the
+matrix size via a command line argument. Do the matrix multiplication
+on the GPU and on the CPU and compare the resulting matrices. Make
+sure that your code works for arbitrary block size (up to 512 threads)
+and (small) matrix size.</p>
+
+<p>Use one-dimensional arrays to store the matrices $M$, $N$ and $P$
+for efficiency.</p>
+
+<h2 id="2">Step 2</h2>
+
+<p>Modify the previous program to multiply arbitrary size
+matrices. Make sure that your code works for arbitrary block size (up
+to 512 threads) and matrix size (up to memory limitation). Instrument
+your program with calls to <code>gettimeofday()</code> to time the
+matrix multiplication on the CPU and GPU. Plot these times as a
+function of matrix size (up to large matrices, 4096) and guess the
+matrix size dependence of the timing.</p>
+
+<h2 id="2">Step 3</h2>
+
+<p>Optimize the previous code to take advantage of the very fast share
+memory. To do this you must tile the matrix via 2D CUDA grid of blocks
+(as above). All matrix elements in a block within $P$ will be computed
+at once. The scalar product of each row of $M$ and each column of $N$
+within the block can be calculated by scanning over the matrices in
+block size tiles. The content of $M$ and $N$ within the tiles can then
+be transfered into the share memory for speed.</p>
+
+<p>Time this code and compare the results to the code
+in <a href="#2">Step 2</a></p>.
+
+<p>See the <a href="../../../content/GPUs/#learn">Learning CUDA</a>
+section of the course notes and the skeleton
+code <a href="../../../src/matmult_skeleton.cu">matmult_skeleton.cu</a>. See
+also the in-class exercise on <a href="../../../content/GPUs/#example">array
+reversal</a>.</p>
+
+<!--#include virtual="$root_directory/shared/footer.shtml"-->
diff --git a/assignments/current/7 b/assignments/current/7

index eb551e96a2ad689e6ad8839268988fe1eedad9c6..3e43c08f9396af4b0599081ab0ff222a2f712d69 120000 (symlink)
--- a/assignments/current/7
+++ b/assignments/current/7
@@ -1 +1 @@
-../archive/game_of_life/
-\ No newline at end of file
+../archive/matrix_multiplication_cuda
+\ No newline at end of file
diff --git a/content/GPUs/index.shtml b/content/GPUs/index.shtml

index 6f82799d02e0431d1d219a5712021d150c29891e..77ebf42d77da2f96477421f8314193ccc64a9b18 100644 (file)
--- a/content/GPUs/index.shtml
+++ b/content/GPUs/index.shtml
@@ -93,7 +93,7 @@ Fortran, Java and Matlab.</p>
      Gems 3</a> contains great demonstration GPU codes.</li>
  </ul>
  
-<h2 id="learn">Learning CUDA by examples</h2>
+<h2 id="example">Learning CUDA by example</h2>
  
  <p>Jason Sanders and Edward
  Kandrot's <a href="http://developer.nvidia.com/object/cuda-by-example.html">CUDA
author	W. Trevor King <wking@drexel.edu>
	Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)
committer	W. Trevor King <wking@drexel.edu>
	Thu, 25 Nov 2010 16:04:56 +0000 (11:04 -0500)
assignments/archive/logistic_cuda/index.shtml.itex2MML		patch \| blob \| history
assignments/archive/matrix_multiplication_cuda/index.shtml.itex2MML	[new file with mode: 0644]	patch \| blob
assignments/current/7		patch \| blob \| history
content/GPUs/index.shtml		patch \| blob \| history