content/2D_Domain_Decomposition/index.shtml.itex2MML

   1 <!--#set var="root_directory" value="../.." --><!--#include virtual="$root_directory/shared/header.shtml"-->
   2
   3 <h1>Domain Decomposition — 2D Poisson Equation</h1>
   4
   5 <!--TableOfContents:Begin-->
   6 <!--TableOfContents:End-->
   7
   8 <h2 id="EI2D">Elliptic Equation — 2D Poisson Equation</h2>
   9
  10
  11 <h3 id="EI2Dm">The Model</h3>
  12
  13 <p>We will solve the equation</p>
  14
  15 <p class="equation">\[
  16   \nabla^2 u = \frac{\partial^2 u}{\partial x^2} +
  17                \frac{\partial^2 u}{\partial x^2} = S(x,y)
  18 \]</p>
  19
  20 <p>We solve for the function $u(x,y)$ in a 2D domain, $x \in [0,1]$
  21 and $y \in [0,1]$. The boundary conditions are assumed of Dirichlet
  22 type, namely$ u(0,y)$, $u(1,y)$, $u(x,0)$, and $u(x,1)$ are given. In
  23 addition, the function $u(x,y)$ can also be specified on specific
  24 points in the interior domain.</p>
  25
  26 <p>In the following discussion, the source $S(x,y)$ is taken to be a
  27 simple displaced Gaussian for simplicity.</p>
  28
  29
  30 <p>The domain is discretized on a 2D numerical lattice, with $x_i =
  31 (i-1) h$, $y_j = (j-1)h$, and $h = 1.0/(N-1)$.  $N$ is the number of
  32 grid points in both directions. The Poisson equation is re-written in
  33 finite difference form:</p>
  34
  35 <p class="equation">\[
  36   \frac{1}{h^2}
  37   \left( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4 u_{i,j} \right)
  38                = S_{i,j}
  39 \]</p>
  40
  41 <p>This leads to a large set of linear equations to solve for the
  42 field values $u_{i,j}$ on the grid within the domain, keeping
  43 $u_{i,j}$ fixed at the boundary.  This set of equations can be solved
  44 by an <a id="iter_2d">iterative method</a>, whereby</p>
  45
  46 <p class="equation">\[
  47   u_{i,j}^{k+1} =
  48     \frac{1}{4} \left(  u_{i+1,j}^k + u_{i-1,j}^k + u_{i,j+1}^k + u_{i,j-1}^k
  49                         - h^2  S_{i,j}
  50                 \right)
  51 \]</p>
  52
  53 <p>is iterated in place until the changes in $u_{i,j}$</p>
  54
  55 <p class="equation">\[
  56 diff = Max \| u_{i,j}^{k+1} - u_{i,j}^{k} \|
  57 \]</p>
  58
  59 <p>become less than some predefined tolerance criteria. An arbitrary
  60 guess for $u_{i,j}$ is assumed to start with. The method is stable,
  61 i.e., the solution will converge.</p>
  62
  63 <p>The old and new values of $u_{i,j}$ are mixed to accelerate the
  64 convergence, leading to the <a href="../poisson/#SOR">Successive Over
  65 Relaxation</a> scheme.</p>
  66
  67 <p class="equation">\[
  68   \begin{aligned}
  69     u_{i,j}^{k+\frac{1}{2}} =
  70       \frac{1}{4} \left( u_{i+1,j}^k + u_{i-1,j}^{k+\frac{1}{2}} +
  71                          u_{i,j+1}^k + u_{i,j-1}^{k+\frac{1}{2}}
  72                          - h^2 S_{i,j}
  73                   \right) \\
  74     u_{i,j}^{k+1} = \omega u_{i,j}^{k+\frac{1}{2}}
  75                     - ( 1-\omega) u_{i,j}^k
  76   \end{aligned}
  77 \]</p>
  78
  79 <p>The accuracy of the solution remains second order in $h$.</p>
  80
  81 <h3 id="EI2Dsi">Serial Implementation</h3>
  82
  83 <p>The code <a href="src/poisson_2d/poisson_2d.c">poisson_2d.c</a>
  84 implements the above scheme. The steps in the code are:</p>
  85 <ol>
  86   <li>Initialize the numerical grid.</li>
  87   <li>Provide an initial guess for the solution.</li>
  88   <li>Set the boundary values &amp; source term.</li>
  89   <li>Iterate the solution until convergence.</li>
  90   <li>Output the solution for plotting.</li>
  91 </ol>
  92
  93 <p>Compile and execute with</p>
  94
  95 <pre>
  96 gcc poisson_2d.c -lm -o poisson_2d
  97 ./poisson_2d N Nx Ny | ./plot_image.py -s Nx,Ny -t "2d Poisson"
  98 </pre>
  99
 100 <p>where <code>N</code>, <code>Nx</code> and <code>Ny</code> are the
 101 (arbitrary) maximum number of iterations and number of grid points
 102 (image size) respectively. Good numbers to use could be 10000, 300,
 103 300.  We introduced <a
 104 href="../../src/plot_image/plot_image.py">plot_image.py</a> in <a
 105 href="../programming_strategies/#CI">Programming Strategies: Color
 106 Images</a>.</p>
 107
 108 <p>The code stops once good convergence (parameter <code>EPS</code>)
 109 is obtained.</p>
 110
 111 <p>Note:</p>
 112 <ul>
 113   <li>The <code>chi</code> parameter which implements the
 114     over-relaxation approach is chosen arbitrarily for simplicity (see
 115     <i>Numerical Receipes</i> for an optimum choice).</li>
 116   <li>The code is written in the simplest possible way.</li>
 117   <li>The code outputs the field values to pipe
 118     into <a href="../../src/plot_image/plot_image.py">plot_image.py</a>
 119     for field display.</li>
 120 </ul>
 121
 122 <h3 id="EI2Dpi">Parallel Implementation</h3>
 123
 124 <p>Domain decomposition consists in dividing the original grid among
 125 the processes with an equal number of grid points in each process for
 126 good load balance. We use here the simplest possible division, i.e.,
 127 in simple stripes. This results in the following arrangement on each
 128 processor.</p>
 129
 130 <img border="0" src="img/2D_sub_domain.gif" width="613" height="584" /> <!-- TODO: convert to SVG -->
 131
 132 <p>If the number of grid points does not divide equally among the
 133 processes, the extra grid points are distributed one each to a subset
 134 of processors as indicated schematically in the following
 135 figure.</p>
 136
 137 <p>The domain is divided in stripes which correspond to a split of the
 138 first index in the two dimensional arrays containing the fields. This
 139 scheme is adopted to facilitate the transmission of
 140 the <code>GHOST</code> lines of points. Recall that C stores two
 141 dimensional arrays in row fashion; therefore transmitting
 142 row <code>ip</code> of the array <code>phi</code>
 143 containing <code>jj</code> numbers simply requires</p>
 144
 145 <pre>
 146 MPI_Ssend( &amp;phi[ip][0] , jj, ... )
 147 </pre>
 148
 149 <p>with no further packing. This is efficient and easy to implement.
 150 Note that a FORTRAN programmer would divide the domain in stripes
 151 along the second index of the array
 152 since <a href="http://en.wikipedia.org/wiki/Row-major_order">FORTRAN
 153 stores two dimensional arrays in columns</a>.</p>
 154
 155 <p>The code should follow very much the approach of the 1D code. Some functions, like
 156 <code>decompose_domain()</code>, could even be borrowed as is since
 157 the stripe decomposition is essentially a 1D domain
 158 decomposition. The flowchart below shows the structure that the code
 159 should have:</p>
 160
 161 <img border="0" src="img/Poisson_parallel_Flowchart_2d.gif" width="433" height="723" /> <!-- TODO: convert to SVG -->
 162
 163 <h2 id="red-black">Red and Black algorithm</h2>
 164
 165 <p>The parallel and serial implementations of the Poisson equation
 166 solver in 2D as implemented above yield the same results to within
 167 the tolerance only. Yet these two implementations are not strictly
 168 identical. This stems from the handling of the <code>GHOST</code>
 169 lines in the parallel implementation.</p>
 170
 171 <p>Consider the update of the field at the location marked 0 below.
 172 The <a href="../poisson/#Gauss-Siedel">Gauss-Siedel</a> solution calls
 173 for an <em>in-place</em> update of the field via the finite difference
 174 formula in a systematic sweep of the numerical lattice.  This is done
 175 via the <a href="#iter_2d">2D</a> finite difference formula. The
 176 values of the field at the points 2 and 3 have already been updated if
 177 the sweep originates from the upper-left corner of the lattice, while
 178 the field at the point 1 and 4 are yet to be updated.</p>
 179
 180 <img border="0" src="img/Red_Black_1.gif" width="221" height="223" /> <!-- TODO: convert to SVG -->
 181
 182 <p>The sub-domain decomposition requires the transmission of
 183 the <code>GHOST</code> line of values at each iteration. This is
 184 normally done before each Gauss-Siedel step.  The values of the field
 185 in these <code>GHOST</code> lines are those of the previous
 186 Gauss-Siedel iteration. The value of the field at point "2" below is
 187 no longer the updated one. Therefore the parallel implementation is
 188 strictly not Gauss-Siedel.</p>
 189
 190 <img border="0" src="img/Red_Black_2.gif" width="226" height="347" /> <!-- TODO: convert to SVG -->
 191
 192 <p>The solution to this problem is to modify the Gauss-Siedel
 193 algorithm into the the <em>Red and Black</em> algorithm. In this
 194 approach the grid is scanned alternatively on the "red" then on the
 195 "black" dots. The name is reminiscent of a checker board.</p>
 196
 197 <img border="0" src="img/Red_Black_3.gif" width="226" height="223" />
 198
 199
 200 <p>It is clear from the picture that the needed values of the field at
 201 points 1, 2, 3, and 4 are the old field values (black) when updating
 202 the field at point 0, and vice-versa for any red point. The algorithm
 203 then calls fro two subsequent sweep of the lattice alternately on the
 204 red and black points.  In a parallel implementation,
 205 the <code>GHOST</code> lines are exchanged twice at each iteration:</p>
 206
 207 <ul>
 208   <li>Exchange the <code>GHOST</code> lines.</li>
 209   <li>Update the red points on the lattice.</li>
 210   <li>Exchange the <code>GHOST</code> lines.</li>
 211   <li>Update the black points on the lattice.</li>
 212 </ul>
 213
 214 <p>This algorithm yields numerically equivalent solutions between the
 215 serial and parallel implementations, but might be somewhat slower than
 216 the strict Gauss-Siedel.</p>
 217
 218 <!--  TODO: resurrect this section?
 219 <p>This Red and Black scheme is implemented in the 1D problem as an
 220 illustration. The corresponding codes are
 221 <a href="Codes/1-d_Red_Black/poisson_1d_red_black.c">poisson_1d_red_and_black.c</a>
 222 and&nbsp;
 223 <a href="Codes/1-d_Red_Black/poisson_parallel_1d_red_black.c">poisson_parallel_1d_red_and_black.c</a>
 224 . Running these two codes will give
 225 "identical" answers to machine precision.</p>
 226 -->
 227
 228 <!--#include virtual="$root_directory/shared/footer.shtml"-->