<p>The matrices $M$ and $N$, of size <code>MatrixSize</code>, could be
filled with random numbers.</p>
-<h2 id="A1">Step 1</h2>
+<h3 id="A1">Step 1</h3>
<p>Write a program to do the matrix multiplication assuming that the
matrices $M$ and $N$ are small and fit in a CUDA block. Input the
(small) matrix size. Use one-dimensional arrays to store the
matrices $M$, $N$ and $P$ for efficiency.</p>
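<p>A minimal kernel along these lines might look as follows. This is a
sketch, not the required solution: the kernel and variable names are
illustrative, and host-side allocation, copies, and error checking are
omitted.</p>

```cuda
// Naive P = M * N for a small square matrix that fits in ONE block.
// The matrices are stored in row-major one-dimensional arrays, as the
// exercise asks; element (row, col) lives at index row * size + col.
__global__ void matmult_small(const float *M, const float *N,
                              float *P, int size)
{
    int row = threadIdx.y;   // one thread per output element
    int col = threadIdx.x;

    if (row < size && col < size) {
        float sum = 0.0f;
        for (int k = 0; k < size; ++k)
            sum += M[row * size + k] * N[k * size + col];
        P[row * size + col] = sum;
    }
}

// Launched with a single block covering the whole matrix, e.g.:
//   dim3 block(MatrixSize, MatrixSize);
//   matmult_small<<<1, block>>>(d_M, d_N, d_P, MatrixSize);
```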
-<h2 id="A2">Step 2</h2>
+<h3 id="A2">Step 2</h3>
<p>Modify the previous program to multiply matrices of arbitrary
size. Make sure that your code works for arbitrary block size. Time
your code as a function of matrix size (up to large matrices, 4096)
and guess the matrix size dependence of the timing.</p>
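<p>One possible generalization is sketched below: each thread computes
one element of $P$, the grid covers the whole matrix, and a bounds
check handles sizes that are not a multiple of the block size. The
names are again illustrative.</p>

```cuda
// P = M * N for an arbitrary square size; the 2D grid tiles the
// output matrix, so the kernel works for any block size BS.
__global__ void matmult(const float *M, const float *N,
                        float *P, int size)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < size && col < size) {       // guard the ragged edge
        float sum = 0.0f;
        for (int k = 0; k < size; ++k)
            sum += M[row * size + k] * N[k * size + col];
        P[row * size + col] = sum;
    }
}

// Launch: round the matrix size up to a whole number of blocks.
//   dim3 block(BS, BS);
//   dim3 grid((size + BS - 1) / BS, (size + BS - 1) / BS);
//   matmult<<<grid, block>>>(d_M, d_N, d_P, size);
```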
-<h2 id="A2">Step 3</h2>
+<h3 id="A3">Step 3</h3>
<p>Optimize the previous code to take advantage of the very fast shared
memory. To do this you must tile the matrices via a 2D CUDA grid of
blocks of block-size tiles. The content of $M$ and $N$ within the tiles
can then be transferred into the shared memory for speed.</p>
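<p>The tiling idea can be sketched as follows, assuming (for
illustration) a fixed tile width equal to the block size. Each block
sweeps across the tiles of a row of $M$ and a column of $N$, staging
each tile in shared memory before using it; zero-padding and the
bounds check handle sizes that are not a multiple of the tile width.</p>

```cuda
#define TILE 16   // assumed tile/block width; a tunable parameter

__global__ void matmult_tiled(const float *M, const float *N,
                              float *P, int size)
{
    __shared__ float Ms[TILE][TILE];   // tile of M staged in shared memory
    __shared__ float Ns[TILE][TILE];   // tile of N staged in shared memory

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    // Sweep the tiles of M (along a row) and of N (along a column).
    for (int t = 0; t < (size + TILE - 1) / TILE; ++t) {
        int mcol = t * TILE + threadIdx.x;
        int nrow = t * TILE + threadIdx.y;

        // Each thread loads one element per tile, zero-padding the edge.
        Ms[threadIdx.y][threadIdx.x] =
            (row < size && mcol < size) ? M[row * size + mcol] : 0.0f;
        Ns[threadIdx.y][threadIdx.x] =
            (nrow < size && col < size) ? N[nrow * size + col] : 0.0f;
        __syncthreads();               // wait until the tiles are loaded

        for (int k = 0; k < TILE; ++k)
            sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
        __syncthreads();               // done with this pair of tiles
    }

    if (row < size && col < size)
        P[row * size + col] = sum;
}
```

<p>Each element of $M$ and $N$ is now read from global memory only
once per tile sweep instead of once per output element, which is the
source of the speedup.</p>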
-<p>See the <a href="<!--#echo var="root_directory"
--->/content/GPUs/#learn">Learning CUDA</a> section of the course notes
-and the skeleton code <a
+<p>See the <a href="../../../content/GPUs/#learn">Learning CUDA</a>
+section of the course notes and the skeleton code <a
href="src/matmult_skeleton.cu">matmult_skeleton.cu</a>. See also the
in-class exercise on array reversal.</p>