From: W. Trevor King Due Friday, December 11 Learn the CUDA language. Assignment 9
diff --git a/assignments/archive/logistic_cuda/index.shtml.itex2MML b/assignments/archive/logistic_cuda/index.shtml.itex2MML
new file mode 100644
index 0000000..03951bb
--- /dev/null
+++ b/assignments/archive/logistic_cuda/index.shtml.itex2MML
@@ -0,0 +1,179 @@
+
+
+Assignment 9
+Purpose
+
+
+This assignment consists of multiplying two square matrices of
+identical size.
+
+\[
+  P = M \times N
+\]
+
+The matrices $M$ and $N$, of size MatrixSize, can be
+filled with random numbers.
+Write a program to do the matrix multiplication assuming that the
+matrices $M$ and $N$ are small and fit in a CUDA block. Input the
+matrix size via a command line argument. Do the matrix multiplication
+on the GPU and on the CPU and compare the resulting matrices. Make
+sure that your code works for arbitrary block size (up to 512 threads)
+and (small) matrix size. Use one-dimensional arrays to store the
+matrices $M$, $N$ and $P$ for efficiency.
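The CPU side of the comparison can be sketched in plain C as follows. This is a minimal sketch, not the skeleton's code: the names `idx`, `matmul_cpu` and `max_abs_diff` are hypothetical, and row-major storage in the one-dimensional arrays is an assumption.

```c
#include <stddef.h>

/* Offset of element (row, col) in an n x n matrix stored row-major in 1D.
 * Row-major layout is an assumption of this sketch. */
static size_t idx(int row, int col, int n)
{
    return (size_t)row * (size_t)n + (size_t)col;
}

/* CPU reference: P = M * N for n x n matrices in 1D storage. */
void matmul_cpu(const float *M, const float *N, float *P, int n)
{
    for (int row = 0; row < n; row++) {
        for (int col = 0; col < n; col++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += M[idx(row, k, n)] * N[idx(k, col, n)];
            P[idx(row, col, n)] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two n x n matrices. */
float max_abs_diff(const float *A, const float *B, int n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < (size_t)n * (size_t)n; i++) {
        float d = A[i] - B[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

When comparing against the GPU result, it is probably safer to check `max_abs_diff` against a small tolerance than to demand exact equality, since the two devices may round single-precision sums differently.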
+
+Modify the previous program to multiply arbitrary-size
+matrices. Make sure that your code works for arbitrary block size (up
+to 512 threads) and matrix size (up to the memory limit). Instrument
+your program with calls to gettimeofday() to time the
+matrix multiplication on the CPU and GPU. Plot these times as a
+function of matrix size (up to large matrices, 4096) and guess the
+matrix size dependence of the timing.
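One way to wrap gettimeofday() for the timing measurements is sketched below. The helper names `elapsed_seconds` and `time_call` are hypothetical and not part of the skeleton code.

```c
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() samples. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) * 1.0e-6;
}

/* Wall-clock time, in seconds, of a single call work(arg). */
double time_call(void (*work)(void *), void *arg)
{
    struct timeval start, end;
    gettimeofday(&start, NULL);
    work(arg);
    gettimeofday(&end, NULL);
    return elapsed_seconds(start, end);
}
```

Kernel launches return to the host before the GPU has finished, so when timing the GPU version the device should be synchronized before the second gettimeofday() sample is taken. Otherwise only the launch overhead is measured.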
+Optimize the previous code to take advantage of the very fast shared
+memory. To do this you must tile the matrix via a 2D CUDA grid of blocks
+(as above). All matrix elements in a block within $P$ will be computed
+at once. The scalar product of each row of $M$ and each column of $N$
+within the block can be calculated by scanning over the matrices in
+block-size tiles. The contents of $M$ and $N$ within the tiles can then
+be transferred into shared memory for speed.
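The tiling scheme can be tried out serially before moving it to the GPU. The sketch below is a plain C emulation, not a CUDA kernel: the loops over `by`, `bx`, `ty` and `tx` stand in for the block and thread indices, the local arrays `Mtile` and `Ntile` stand in for shared memory, and the tile width `TILE` and the function name `matmul_tiled` are hypothetical.

```c
#include <stddef.h>

#define TILE 2  /* hypothetical tile width; a kernel would use the block width */

/* Serial emulation of tiled multiplication P = M * N (n x n, row-major 1D). */
void matmul_tiled(const float *M, const float *N, float *P, int n)
{
    int ntiles = (n + TILE - 1) / TILE;        /* tiles per side, rounded up */

    for (int by = 0; by < ntiles; by++)        /* plays the role of blockIdx.y */
    for (int bx = 0; bx < ntiles; bx++) {      /* plays the role of blockIdx.x */
        float acc[TILE][TILE] = {{0.0f}};

        for (int t = 0; t < ntiles; t++) {     /* scan tiles along the summed index */
            float Mtile[TILE][TILE], Ntile[TILE][TILE];

            /* Load phase: each (ty, tx) copies one element of M and one of N,
             * padding with zeros past the matrix edge. */
            for (int ty = 0; ty < TILE; ty++)
            for (int tx = 0; tx < TILE; tx++) {
                int row = by * TILE + ty, col = bx * TILE + tx;
                int mcol = t * TILE + tx;      /* column of M being loaded */
                int nrow = t * TILE + ty;      /* row of N being loaded */
                Mtile[ty][tx] = (row < n && mcol < n)
                              ? M[(size_t)row * n + mcol] : 0.0f;
                Ntile[ty][tx] = (nrow < n && col < n)
                              ? N[(size_t)nrow * n + col] : 0.0f;
            }

            /* Compute phase: partial scalar products from the cached tiles. */
            for (int ty = 0; ty < TILE; ty++)
            for (int tx = 0; tx < TILE; tx++)
                for (int k = 0; k < TILE; k++)
                    acc[ty][tx] += Mtile[ty][k] * Ntile[k][tx];
        }

        /* Write back the elements of P that lie inside the matrix. */
        for (int ty = 0; ty < TILE; ty++)
        for (int tx = 0; tx < TILE; tx++) {
            int row = by * TILE + ty, col = bx * TILE + tx;
            if (row < n && col < n)
                P[(size_t)row * n + col] = acc[ty][tx];
        }
    }
}
```

In a real kernel the load and compute phases run concurrently across the threads of a block, so a `__syncthreads()` barrier is needed after the load phase and again before the next tile overwrites the shared arrays.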
+
+See the Learning CUDA section (/content/GPUs/#learn) of the course notes
+and the skeleton code matmult_skeleton.cu. See also the
+in-class exercise on array reversal.
+
+This part of the assignment asks you to adapt to CUDA a serial code
+that generates a bifurcation diagram for the logistic map. The
+logistic map is a map of the real line to itself given by
+
+\[
+  x_{i+1} = a - x_i^2.
+\]
+
+This mapping is ubiquitous in many problems of practical interest
+and is arguably the simplest example of a (discrete) complex dynamical
+system (indeed, you'll note its similarity to the equation generating
+the complex Mandelbrot set).
+
+The variable $a$ is a parameter that is held constant while $x$ is
+iterated from some initial condition $x_0$. We are interested in the
+long term or asymptotic behavior as $x_0$ is iterated for various
+values of $a$. A plot of the asymptotic values of $x$ versus $a$ is
+called a bifurcation diagram.
+
+The reason for this terminology is as follows. The asymptotic
+behavior often varies smoothly with $a$. For example, for some $a$,
+$x_0$ may tend to some fixed point $x^*$ with the value of $x^*$
+varying smoothly with $a$. However, for another $a$, $x_0$ could end up
+in a period two orbit, oscillating between two values $x_1^*$ and
+$x_2^*$. The values of these two points may also vary smoothly with
+$a$, but there is some transition value $\tilde{a}$ where we jump from
+the fixed point to the period two orbit. This non-smooth process is
+called a bifurcation. The bifurcation diagram then shows all
+of these bifurcations on a single plot since we scan over all values
+of $a$.
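The two behaviors described above can be checked numerically with a few lines of C. This is a minimal sketch; `logistic_step` and `logistic_iterate` are hypothetical names, not functions from logistic.c.

```c
/* One step of the logistic map x -> a - x^2. */
double logistic_step(double a, double x)
{
    return a - x * x;
}

/* Apply the map n times starting from x0 and return the last iterate. */
double logistic_iterate(double a, double x0, int n)
{
    double x = x0;
    for (int i = 0; i < n; i++)
        x = logistic_step(a, x);
    return x;
}
```

Setting $x = a - x^2$ gives the fixed point $x^* = (-1 + \sqrt{1 + 4a})/2$, which is stable while $|{-2x^*}| < 1$, that is for $a < 3/4$. For example, with $a = 0.5$ the iterates of $x_0 = 0.1$ settle near $x^* \approx 0.366$, while with $a = 1$ they settle onto the period two orbit that alternates between 0 and 1.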
+
+The serial code loops over $a$ and iterates a random initial
+condition THRESH times. This is to let
+transients "die out" and approach the asymptotic behavior. If an
+iterate leaves the interval $[-2, 2]$ during this time it will
+eventually escape to $\infty$, so the trajectory is thrown out and
+another random initial condition is tried. It is known that
+positive-measure attracting sets exist for the $a$ values in the program, so
+this loop will eventually terminate.
+If a trajectory stays bounded after THRESH iterates,
+the next MAXITER iterates are tracked. The $x$-axis is
+binned into xRES bins and the binit routine is
+called to find which bin the current point in the trajectory is
+in. This repeats until xRES initial conditions
+have been iterated and binned. The bins are then normalized to a
+maximum value of one and are then output to the screen. The values in
+the bins are essentially the density of iterates around various points
+and plotting them shows the bifurcation structure of the map.
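The procedure for a single initial condition can be sketched as follows. This is not the code in logistic.c: `bin_index`, `track_orbit` and `normalize_bins` are hypothetical names, the bin is found by direct arithmetic rather than by the binit search, and stopping when a tracked iterate leaves $[-2, 2]$ is a guard added in this sketch.

```c
#define X_MIN (-2.0)
#define X_MAX 2.0

/* Map x in [X_MIN, X_MAX] to a bin index in [0, nbins - 1]. */
int bin_index(double x, int nbins)
{
    int b = (int)((x - X_MIN) / (X_MAX - X_MIN) * nbins);
    return b < nbins ? b : nbins - 1;   /* x == X_MAX lands in the last bin */
}

/* Iterate x -> a - x^2 from x0. The first `thresh` iterates are transients;
 * the next `maxiter` iterates are added to `bins`. Returns 0 if the orbit
 * escapes during the transient (nothing is binned), 1 otherwise. */
int track_orbit(double a, double x0, int thresh, int maxiter,
                double *bins, int nbins)
{
    double x = x0;
    for (int i = 0; i < thresh; i++) {
        x = a - x * x;
        if (x < X_MIN || x > X_MAX)
            return 0;                   /* caller tries another x0 */
    }
    for (int i = 0; i < maxiter; i++) {
        x = a - x * x;
        if (x < X_MIN || x > X_MAX)
            break;                      /* guard added here: never index out of range */
        bins[bin_index(x, nbins)] += 1.0;
    }
    return 1;
}

/* Scale the bins so that the largest one equals one. */
void normalize_bins(double *bins, int nbins)
{
    double max = 0.0;
    for (int i = 0; i < nbins; i++)
        if (bins[i] > max)
            max = bins[i];
    if (max > 0.0)
        for (int i = 0; i < nbins; i++)
            bins[i] /= max;
}
```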
+
+
+The starting source for this assignment is packaged in logistic_cuda.tar.gz. First
+run the serial code and gnuplot script so you can see what it is
+you're supposed to produce.
+
+gcc -o logistic logistic.c -lm
+./logistic > log.dat
+gnuplot -persist log.p
+
+Then adapt the serial code to run on CUDA using the skeleton file
+log_skel.cu. Note the differences from the serial
+code. Functions called from a kernel are prefixed
+with __device__ and host functions cannot be called from
+device functions. The random number generator rand() is a
+host function, so I added my own random number generator for the
+kernel to use. Finally, the original binit sorting
+algorithm was recursive, but device functions do not support
+recursion, so it has been rewritten without recursion (the while loop
+functions as the recursion step).
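To illustrate how a while loop can take the place of the recursion step, here is a sketch of a bin search written iteratively. It is not the binit routine from log_skel.cu: the name `find_bin`, the `edges` array and the bisection strategy are assumptions made for illustration.

```c
/* Return the index of the bin containing x, where bin i covers the
 * half-open interval [edges[i], edges[i + 1]) and edges holds nbins + 1
 * ascending values. Returns -1 if x lies outside all bins. */
int find_bin(const double *edges, int nbins, double x)
{
    if (x < edges[0] || x >= edges[nbins])
        return -1;

    int lo = 0, hi = nbins;        /* invariant: edges[lo] <= x < edges[hi] */
    while (hi - lo > 1) {          /* each pass replaces one recursive call */
        int mid = lo + (hi - lo) / 2;
        if (x < edges[mid])
            hi = mid;              /* recurse into the lower half */
        else
            lo = mid;              /* recurse into the upper half */
    }
    return lo;
}
```

A version meant to be called from a kernel would additionally carry the `__device__` prefix described above.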
+Parallelize over $a$, so that each thread computes the future orbit
+for a single value of $a$. Thus the block and grid need only be one
+dimensional (note that this allows a maximum of $2^9 \times 2^{16} = 2^{25}
+\sim 3 \times 10^7$ values of $a$, which should be sufficient). The
+kernel function should replace the entire main loop of the serial
+code. This includes iterating for a value of $a$, binning the
+trajectory, and normalizing the bin. The normalized bins should be
+returned to the main program for output. Finally, time the CUDA
+code.
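The index arithmetic behind this one-dimensional decomposition can be checked on the host. The sketch below is plain C, not kernel code: `global_index` mirrors the usual `blockIdx.x * blockDim.x + threadIdx.x` expression, and `a_for_index`, with its evenly spaced grid over a range `[a_min, a_max]`, is a hypothetical mapping rather than the one used in the skeleton.

```c
/* Global 1D index of a thread, as a kernel would compute it from
 * blockIdx.x * blockDim.x + threadIdx.x. */
long global_index(long block_idx, long block_dim, long thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Hypothetical mapping from a global index to a parameter value:
 * n_a values evenly spaced over [a_min, a_max]. Requires n_a >= 2. */
double a_for_index(long i, long n_a, double a_min, double a_max)
{
    return a_min + (a_max - a_min) * (double)i / (double)(n_a - 1);
}
```

With 512 threads per block and 65536 blocks, the largest index is `global_index(65535, 512, 511)`, which equals $2^{25} - 1$. When the number of $a$ values is not a multiple of the block size, threads whose index is at or beyond `n_a` should return without doing any work.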
+
+Note that you may keep the various #define statements
+intact so that these parameters need not be explicitly passed to
+functions.
+The above implementation iterates a random initial condition
+followed by another if the first escapes the region. For the parameter
+range given, every initial condition either escapes to $\infty$ or
+tends to a unique stable bounded attractor (a fixed point, periodic
+orbit, or "chaotic" Cantor set). In principle a map $x_{i+1} = f(x_i)$
+could have more than one coexisting attracting set, so that different
+initial conditions can tend to distinct bounded asymptotic behaviors,
+or (Lebesgue almost) every initial condition may escape to
+$\infty$.
+
+Modify the CUDA program using an extra dimension of block/threads
+to assign initial conditions distributed throughout the interval $[-2,
+2]$ amongst these threads. Have the various threads bin the bounded
+trajectories together. Solutions that escape the interval should not
+be binned.
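One simple way to spread the initial conditions over the interval is to place them at the midpoints of equal subintervals. This is only a sketch: `x0_for_thread` is a hypothetical helper, and the assignment does not prescribe how the points are distributed.

```c
/* Hypothetical helper: the j-th of n initial conditions, placed at the
 * midpoints of n equal subintervals of [-2, 2] (0 <= j < n). */
double x0_for_thread(int j, int n)
{
    return -2.0 + 4.0 * ((double)j + 0.5) / (double)n;
}
```

In a kernel, `j` would come from the thread index along the extra dimension. Because several threads then add to the same bins, the updates need some form of protection against concurrent writes, for example atomic increments or per-thread partial histograms that are summed afterwards.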
+
+Test this code on the map $x_{i+1} = a - (a - x_i^2)^2$ and compare
+against the original code. Are the results different? Note, this
+example is the second iterate of the logistic map, so period two
+orbits of the original become distinct period one orbits of the second
+iterate map.
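The relation between the two maps can be verified with a short C sketch. The function names `logistic_step` and `second_iterate_step` are hypothetical.

```c
/* One step of the logistic map x -> a - x^2. */
double logistic_step(double a, double x)
{
    return a - x * x;
}

/* One step of the second-iterate map x -> a - (a - x^2)^2. */
double second_iterate_step(double a, double x)
{
    double inner = a - x * x;
    return a - inner * inner;
}
```

For $a = 1$ the original map has the period two orbit $0 \to 1 \to 0$. Under `second_iterate_step` both 0 and 1 map to themselves, so each shows up as a separate fixed point.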
+
+
diff --git a/assignments/archive/logistic_cuda/src/logistic_cuda/.make_tar b/assignments/archive/logistic_cuda/src/logistic_cuda/.make_tar
new file mode 100644
index 0000000..80e52ce
--- /dev/null
+++ b/assignments/archive/logistic_cuda/src/logistic_cuda/.make_tar
@@ -0,0 +1 @@
+./
diff --git a/assignments/archive/logistic_cuda/log.p b/assignments/archive/logistic_cuda/src/logistic_cuda/log.p
similarity index 100%
rename from assignments/archive/logistic_cuda/log.p
rename to assignments/archive/logistic_cuda/src/logistic_cuda/log.p
diff --git a/assignments/archive/logistic_cuda/log_skel.cu b/assignments/archive/logistic_cuda/src/logistic_cuda/log_skel.cu
similarity index 100%
rename from assignments/archive/logistic_cuda/log_skel.cu
rename to assignments/archive/logistic_cuda/src/logistic_cuda/log_skel.cu
diff --git a/assignments/archive/logistic_cuda/logistic.c b/assignments/archive/logistic_cuda/src/logistic_cuda/logistic.c
similarity index 100%
rename from assignments/archive/logistic_cuda/logistic.c
rename to assignments/archive/logistic_cuda/src/logistic_cuda/logistic.c
diff --git a/assignments/archive/logistic_cuda/matmult_skeleton.cu b/assignments/archive/logistic_cuda/src/matmult_skeleton.cu
similarity index 100%
rename from assignments/archive/logistic_cuda/matmult_skeleton.cu
rename to assignments/archive/logistic_cuda/src/matmult_skeleton.cu