src/sawsim/discussion.tex

   1 \section{Results and Discussion}
   2 \label{sec:sawsim:results}
   3
   4 \subsection{Force curves generated by simulation}
   5 \label{sec:sawsim:results:force-curves}
   6
   7 \Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
   8 pulling a polymer composed of eight identical protein molecules using
   9 parameters from typical experimental settings.  The order of the peaks
  10 in the force curves reflects the temporal sequence of the unfolding
  11 events instead of the positions of the protein molecules in the
  12 polymer\citep{li00}.  As observed experimentally
  13 (\cref{fig:expt-sawtooth}), the forces at which identical protein
  14 molecules unfold fluctuate, revealing the stochastic nature of protein
  15 unfolding since no instrumental noise is included in the simulation.
  16
  17 After aquiring a series of experimental unfolding curves, we need to
  18 fit the data to an explanatory model.  For velocity-clamp experiments
  19 (\cref{sec:procedure,sec:sawsim:velocity-clamp}), we extract unfolding
  20 forces from the sawtooth curves (\cref{sec:hooke}) and generate
  21 histograms of unfolding forces.  Then we construct a parameterized
  22 model of the experimental system (\cref{tab:sawsim:model}).  We can
  23 then run \insilico\ experiments mimicking our \invitro\ experiments
  24 (\cref{fig:sawsim:sim-hist}).  We extract the model parameters which
  25 provide the best fit using a ``fit quality'' metric and a nonlinear
  26 optimization routine (or a full parameter space sweep, for
  27 low-dimensional parameter spaces).
  28
  29 \begin{table}[btp]
  30   \begin{center}
  31     \subfloat[][]{\label{tab:sawsim:domains}
  32       \begin{tabular}{l l l l}
  33         \toprule
  34         \multicolumn{4}{c}{Domain states} \\
  35         \midrule
  36         Domain name & Initial count & Tension model & Model parameters \\
  37         \midrule
  38         AFM cantilever & 1 & Hooke (\cref{eq:sawsim:hooke}) &
  39           $k_c=0.05\U{N/m}$ \\
  40         Folded I27 & 8 & WLC (\cref{eq:sawsim:wlc}) &
  41           $p=3.9\U{\AA}$, $L=5.1\U{nm}$ \\
  42         Unfolded I27 & 0 & WLC (\cref{eq:sawsim:wlc}) &
  43           $p=3.9\U{\AA}$, $L=33.8\U{nm}$ \\
  44         \bottomrule
  45       \end{tabular}} \\
  46     \subfloat[][]{\label{tab:sawsim:transitions}
  47       \begin{tabular}{l l l l l}
  48         \toprule
  49         \multicolumn{5}{c}{Transition rates} \\
  50         \midrule
  51         Transition & Source & Target & Rate model & Model parameters \\
  52         \midrule
  53         Unfolding & Folded I27 & Unfolded I27 & Bell (\cref{eq:sawsim:bell}) &
  54           $k_{u0}=3.3\E{-4}\U{s$^{-1}$}$, $\Delta x=0.35\U{nm}$. \\
  55         \bottomrule
  56       \end{tabular}}
  57     \caption{\protect\subref{tab:sawsim:domains} Model for
  58       I27\textsubscript{8} domain states and
  59       \protect\subref{tab:sawsim:transitions} transitions between them
  60       (compare with \cref{fig:sawsim:domains}).  The models and
  61       parameters are those given by \citet{carrion-vazquez99b}.
  62       \citet{carrion-vazquez99b} don't list their cantilever spring
  63       constant (or, presumably, use it in their simulations), but we
  64       can estimate it from the rebound slope in their Figures~2.a and
  65       2.b, see
  66       \cref{fig:sawsim:kappa-sawteeth}.\label{tab:sawsim:model}}
  67   \end{center}
  68 \end{table}
  69
  70 Because the unfolding behavious of an individual sawtooth curve is
  71 stochastic (\cref{fig:sawsim:sim-sawtooth}), we cannot directly
  72 compare single curves in our fit quality metric.  Instead, we gather
  73 many experimental and simulated curves, and compare the aggregate
  74 properties.  For velocity-clamp experiments, the usual aggregate
  75 property used for comparison is a histogram of unfolding
  76 forces\citep{carrion-vazquez99b} (\cref{fig:sawsim:sim-hist}).
  77 Defining and extracting “unfolding force” is suprisingly complicated
  78 (\cref{sec:hooke:unfolding-force}), but basically it is the highest
  79 tension force achieved by the chain before an unfolding event (the
  80 drops in the sawtooth).  The final drop is not an unfolding event, it
  81 is the entire chain breaking away from the cantilever tip, severing
  82 the connection between the substrate and the cantilever.
  83
  84 \begin{figure}
  85   \begin{center}
  86     \subfloat[][]{\asyinclude{figures/sim-sawtooth/sim-sawtooth}%
  87       \label{fig:sawsim:sim-sawtooth}%
  88     }
  89     \hspace{.25in}%
  90     \subfloat[][]{\asyinclude{figures/sim-hist/sim-hist}%
  91       \label{fig:sawsim:sim-hist}%
  92     }
  93     \caption{\protect\subref{fig:sawsim:sim-sawtooth} Three simulated force
  94       curves from pulling a polymer of eight identical protein
  95       molecules.  The simulation was carried out using the parameters:
  96       pulling speed $v=1\U{$\mu$m/s}$, cantilever spring constant
  97       $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$, persistence
  98       length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
  99       x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$.  The contour
 100       length between the two linking points on a protein molecule is
 101       $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in
 102       the unfolded form.  These parameters are those of ubiquitin
 103       molecules connected through the N-C
 104       termini\citep{chyan04,carrion-vazquez03}.  Detachment from the
 105       tip or substrate is assumed to occur at a force of $400\U{pN}$.
 106       In experiments, detachments have been observed to occur at a
 107       variety of forces.  For clarity, the green and blue curves are
 108       offset by $200$ and $400\U{pN}$ respectively.
 109       \protect\subref{fig:sawsim:sim-hist} The distribution of the unfolding
 110       forces from $400$ simulated force curves ($3200$ data points)
 111       such as those shown in \protect\subref{fig:sawsim:sim-sawtooth}. The
 112       frequency is normalized by the total number of points, \ie, the
 113       height of each bin is equal to the number of data points in that
 114       bin divided by the total number of data
 115       points.\label{fig:sawsim:sim-all}}
 116   \end{center}
 117 \end{figure}
 118
 119 \subsection{The supramolecular scaffold}
 120 \label{sec:sawsim:results:scaffold}
 121
 122 Analysis of the mechanical unfolding data is complicated by the
 123 dependence of the average unfolding force on the unfolding order due
 124 to the serial linkage of the molecules.  Under an external stretching
 125 force $F$, the probability of some domain unfolding in a polymer with
 126 $N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is
 127 higher than the unfolding probability for a single molecule $P_1$.
 128 Consequently, the average unfolding force is lower for the earlier
 129 unfolding events when $N_f$ is larger, and the force should increase
 130 as more and more molecules become unfolded.  However, there is a
 131 competing factor that opposes this trend.  As the protein molecules
 132 unfold, the chain becomes softer and the force loading rate becomes
 133 lower when the pulling speed is constant.  This lower loading rate
 134 leads to a decrease in the unfolding force (in the no-loading limit,
 135 all unfolding events occur at a tension of $0\U{N}$).  The dependence
 136 of the average unfolding force on the unfolding order is the result of
 137 these two opposing effects.  \Cref{fig:sawsim:order-dep} shows the
 138 dependence of the average unfolding force on the unfolding force peak
 139 order (the temporal order of unfolding events) for four polymers with
 140 $4$, $8$, $12$, and $16$ identical protein molecules.  The effect of
 141 polymer chain softening dominates the initial unfolding events, and
 142 the average unfolding force decreases as more molecules unfold.  After
 143 several molecules have unfolded, the softening for each additional
 144 unfolding event becomes less significant, the change in unfolding
 145 probability becomes dominant, and the unfolding force increases upon
 146 each subsequent unfolding event\citep{zinober02}.
 147
 148 We validate this explanation by calculating the unfolding force
 149 probability distribution's dependence on the two competing factors.
 150 The rate of unfolding events with respect to force is
 151 \begin{align}
 152   r_{uF} &= -\deriv{F}{N_f}
 153     = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
 154     = \frac{N_f k_u}{\kappa v} \\
 155     &= \frac{N_f k_{u0}}{\kappa v}\exp\p({\frac{F\Delta x_u}{k_B T}})
 156     = \frac{1}{\rho}\exp\p({\frac{F-\alpha}{\rho}}) \;,
 157 \end{align}
 158 where $N_f$ is the number of folded domain,
 159 $\kappa=(1/\kappa_c+N_u/\kappa_\text{WLC})^{-1}$ is the spring
 160 constant of the cantilever-polymer system ($\kappa_\text{WLC}$ is the
 161 effective spring constant of one unfolded domain, assumed constant for
 162 a particular polymer/cantilever combination), $\kappa v$ is the force
 163 loading rate, and $k_u$ is the unfolding rate constant
 164 (\cref{eq:sawsim:bell}).  In the last expression, $\rho\equiv
 165 k_BT/\Delta x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$.
 166 The event probability density for events with an exponentially
 167 increasing likelihood function follows the Gumbel (minimum)
 168 probability density\citep{NIST:gumbel} with $\rho$ and $\alpha$ being
 169 the scale and location parameters, respectively\citep{hummer03}
 170 \begin{equation}
 171   \mathcal{P}(F) = \frac{1}{\rho} \exp\p[{\frac{F-\alpha}{\rho}
 172                                            -\exp\p({\frac{F-\alpha}{\rho}})
 173                                            }] \;.  \label{eq:sawsim:gumbel}
 174 \end{equation}
 175 The distribution has a mean $\avg{F}=\alpha-\gamma_e\rho$ and a
 176 variance $\sigma^2 = \pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is
 177 the Euler-Mascheroni constant.  Therefore, the unfolding force
 178 distribution has a variance
 179 \begin{equation}
 180   \sigma^2 = \frac{\p({\frac{\pi k_BT}{\Delta x_u}})^2}{6} \;,
 181   \label{eq:sawsim:variance}
 182 \end{equation}
 183 and and average\citep{brockwell02,hummer03}
 184 % TODO: verify brockwell equivalence (p465)
 185 \begin{equation}
 186   \avg{F(i)} = \frac{k_BT}{\Delta x_u}
 187                \p[{\ln\p({\frac{\kappa v\Delta x_u}{N_fk_{u0}k_BT}})
 188                    -\gamma_e}] \;,  \label{eq:sawsim:order-dep}
 189 \end{equation}
 190 where $N_f$ and $\kappa$ depend on the domain index $i=N_u$.  Curves
 191 based on this formula fit the simulated data remarkably well
 192 considering the effective WLC\index{WLC} stiffness $\kappa_\text{WLC}$
 193 is the only fitted parameter, and that the actual WLC stiffness is not
 194 constant, as we have assumed here, but a non-linear function of $F$.
 195 \citet{dudko08} derived a formula for the loading rate for a WLC, but
 196 as far as I know, nobody has found an analytical form for the
 197 unfolding force histograms produced under such a variable loading
 198 rate.
 199 \nomenclature{$r_{uF}$}{Unfolding loading rate (newtons per second)}
 200 \nomenclature{$\gamma_e$}{Euler-Macheroni constant, $\gamma_e=0.577\ldots$}
 201
 202 From \cref{fig:sawsim:order-dep}, we see that the proper way to
 203 process data from mechanical unfolding experiments is to group the
 204 curves according to the length of the polymer and to perform
 205 statistical analysis separately for peaks with the same unfolding
 206 order.  However, in most experiments, the tethering of the polymer to
 207 the AFM tip is by nonspecific adsorption; as a result, the polymers
 208 being stretched between the tip and the substrate have various
 209 lengths\citep{1li00}.  In addition, the interactions between the tip
 210 and the surface often cause irregular features in the beginning of the
 211 force curve (\cref{fig:expt-sawtooth}), making the identification of
 212 the first peak uncertain\citep{carrion-vazquez00}.  Furthermore, it is
 213 often difficult to acquire a large amount of data in single molecule
 214 experiments.  These difficulties make the aforementioned data analysis
 215 approach unfeasible for many mechanical unfolding experiments.  As a
 216 result, the values of all force peaks from polymers of different
 217 lengths are often pooled together for statistical analysis.  To assess
 218 the errors caused by such pooling, simulation data were analyzed using
 219 different pooling methods and the results were compared.
 220 \Cref{fig:sawsim:sim-hist} shows that, for a polymer with eight
 221 protein molecules, the average unfolding force is $281\U{pN}$ with a
 222 standard deviation of $25\U{pN}$ when all data is pooled.  If only the
 223 first peaks in the force curves are analyzed, the average force is
 224 $279\U{pN}$ with a standard deviation of $22\U{pN}$.  While for the
 225 fourth and eighth peaks, the average force are $275\U{pN}$ and
 226 $300\U{pN}$, respectively, and the standard deviations are $23\U{pN}$
 227 and $25\U{pN}$, respectively.  As expected from the Gumbel
 228 distribution, the width of the unfolding force distribution (insets in
 229 \cref{fig:sawsim:order-dep}) is only weakly effected by unfolding
 230 order, but the average unfolding force can be quite different for the
 231 same protein because of the differences in unfolding order and polymer
 232 length.
 233
 234 \begin{figure}
 235   \begin{center}
 236   \asyinclude{figures/order-dep/order-dep}
 237   \caption{The dependence of the unfolding force on the temporal
 238     unfolding order for four polymers with $4$, $8$, $12$, and $16$
 239     identical protein domains.  Each point in the figure is the
 240     average of $400$ data points.  The first point in each curve
 241     represents the average of only the first peak in each of the $400$
 242     simulated force curves, the second point represents the average of
 243     only the second peak, and so on.  The solid lines are fits of
 244     \cref{eq:sawsim:order-dep} to the simulated data, with best fit
 245     $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
 246     respectively, for lengths $4$ through $16$.  The insets show the
 247     force distributions of the first, fourth, and eighth peaks, left
 248     to right, for the polymer with eight protein domains.  The
 249     parameters used for generating the data were the same as those
 250     used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
 251     domains.  The histogram insets were normalized in the same way as
 252     in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
 253   \end{center}
 254 \end{figure}
 255
 256
 257 \subsection{The effect of cantilever force constant}
 258 \label{sec:sawsim:cantilever}
 259
 260 In mechanical unfolding experiments, the ability to observe the
 261 unfolding of a single protein molecule depends on the tension drop
 262 after an unfolding event such that another molecule does not unfold
 263 immediately.  The magnitude of this drop is determined by many
 264 factors, including the magnitude of the unfolding force, the contour
 265 and persistence lengths of the protein polymer, the contour length
 266 increase from unfolding, and the stiffness (force constant) of the
 267 cantilever.  Among these, the effect of the cantilever force constant
 268 is particularly interesting because cantilevers with a wide range of
 269 force constants are available.  In addition, different single molecule
 270 manipulation techniques, such as the AFM and laser tweezers, differ
 271 mainly in the range of the spring constants of their force
 272 transducers\citep{walton08}.  \Cref{fig:sawsim:kappa-sawteeth} shows
 273 the simulated force curves from pulling an octamer of protein
 274 molecules using cantilevers with different force constants, while
 275 other parameters are identical.  For this model protein, the
 276 appearance of the force curve does not change much until the force
 277 constant of the cantilever reaches a certain value
 278 ($\kappa_c=50\U{pN/nm}$).  When $\kappa_c$ is lower than this value,
 279 the individual unfolding events become less identifiable.  In order to
 280 observe individual unfolding events, the cantilever needs to have a
 281 force constant high enough so that the bending at the maximum force is
 282 small in comparison with the contour length increment from the
 283 unfolding of a single molecule.  \Cref{fig:sawsim:kappa-sawteeth} also
 284 shows that the back side of the force peaks becomes more tilted as the
 285 cantilever becomes softer.  This is due to the fact that the extension
 286 (end-to-end distance) of the protein polymer has a large sudden
 287 increase as the tension rebalances after an unfolding event.
 288
 289 It should also be mentioned that the contour length increment from
 290 each unfolding event is not equal to the distance between adjacent
 291 peaks in the force curve because the chain is never fully stretched.
 292 This contour length increase can only be obtained by fitting the curve
 293 to WLC\index{WLC} or other polymer models (\cref{fig:expt-sawtooth}).
 294
 295 \begin{figure}
 296 \begin{center}
 297 \asyinclude{figures/kappa-sawteeth/kappa-sawteeth}
 298 \caption{Simulated force curves obtained from pulling a polymer with
 299   eight protein molecules using cantilevers with different force
 300   constants $\kappa_c$.  Parameters used in generating these curves
 301   are the same as those used in \cref{fig:sawsim:sim-all}, except the
 302   cantilever force constant.  Successive force curves are offset by
 303   $300\U{pN}$ for clarity.\label{fig:sawsim:kappa-sawteeth}}
 304 \end{center}
 305 \end{figure}
 306
 307 \subsection{Determination of $\Delta x_u$ and $k_{u0}$}
 308 \label{sec:sawsim:results:fitting}
 309
 310 As mentioned in \cref{sec:sawsim:results:force-curves}, fitting
 311 experimental unfolding force histograms to simulated histograms allows
 312 you to extract best-fit parameters for your simulation model.  For
 313 example, if you have Bell model unfolding
 314 (\cref{sec:sawsim:rate:bell}), your two fitting parameters are the
 315 zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$ from
 316 the native state to the transition state.  \cref{fig:sawsim:v-dep}
 317 shows the dependence of the unfolding force on the pulling speed for
 318 different values of $k_{u0}$ and $\Delta x_u$.  As expected, the
 319 unfolding force increases linearly with the pulling speed in the
 320 linear-log plot\citep{evans99}.  While the magnitude of the unfolding
 321 forces is affected by both $k_{u0}$ and $\Delta x_u$, the slope of
 322 speed dependence is primarily determined by $\Delta x_u$
 323 (\cref{eq:sawsim:order-dep}).  \Cref{fig:sawsim:width-v-dep} shows
 324 that the width of the unfolding force distribution is very sensitive
 325 to $\Delta x_u$, as expected from the Gumbel distribution
 326 (\cref{eq:sawsim:variance}).  To obtain the values of $k_{u0}$ and
 327 $\Delta x_u$ for the protein, the pulling speed dependence and the
 328 distribution of the unfolding forces from simulation, such as those
 329 shown in \cref{fig:sawsim:v-dep} and the insets of
 330 \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
 331 measured results.  The values of $k_{u0}$ and $\Delta x_u$ that
 332 provide the best match are designated as the parameters describing the
 333 protein under study.  Since $k_{u0}$ and $\Delta x_u$ affect the
 334 unfolding forces differently, the values of both parameters can be
 335 determined simultaneously.  The data used in plotting
 336 \cref{fig:sawsim:all-v-dep} includes all force peaks from the
 337 simulated force curves because most experimental data is analyzed that
 338 way.  % TODO: all?  most data analyzed what way?
 339
 340 In most published literature, $k_{u0}$ and $\Delta x_u$ were fit by
 341 carrying out simulations using a handful of possible unfolding
 342 parameters and selected the best fit by eye%
 343 %\citep{us,carrion-vazques99b,schlierf06}
 344 .  This approach does not allow estimation of uncertainties in the
 345 fitting parameters, as shown by \citet{best02}.  A more rigorous
 346 approach involves quantifying the quality of fit between the
 347 experimental and simulated force distributions, allowing the use of a
 348 numerical minimization algorithm to pick the best fit parameters.  We
 349 use the Jensen--Shannon divergence\citep{sims09,lin91}, a measure of
 350 the similarity between two probability distributions.
 351 \begin{equation}
 352   D_\text{JS}(p_e, p_s)
 353     = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;,  \label{eq:sawsim:D_JS}
 354 \end{equation}
 355 where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
 356 bin in the experimental and simulated unfolding force histograms,
 357 respectively.  $D_\text{KL}$ is the Kullback-Leibler divergence
 358 \begin{equation}
 359   D_\text{KL}(p_p,p_q)
 360     = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;,  \label{eq:sawsim:D_KL}
 361 \end{equation}
 362 where the sum is over all unfolding force histogram bins.  $p_m$ is
 363 the symmetrized probability distribution
 364 \begin{equation}
 365   p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.
 366 \end{equation}
 367 % DONE: Mention inter-histogram normalization? no.
 368 %  For experiments carried out over a series of pulling velocities, we
 369 %  simply sum residuals computed for each velocity, although it would
 370 %  also be reasonable to weight this sum according to the number of
 371 %  experimental unfolding events recorded for each velocity.
 372
 373 The major advantage of the Jensen--Shannon divergence is that
 374 $D_\text{JS}$ is bounded ($0\le D_\text{JS}\le 1$) regardless of the
 375 experimental and simulated histograms.  For comparison, Pearson's
 376 $\chi^2$ test,
 377 \begin{equation}
 378   D_{χ^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)}) \;,  \label{eq:sawsim:X2}
 379 \end{equation}
 380 is infinite if there is a bin for which $p_e(i)>0$ but $p_s(i)=0$.
 381
 382 \Cref{fig:sawsim:fit-space} shows the Jensen--Shannon divergence
 383 calculated using \cref{eq:sawsim:D_JS} between an experimental data
 384 set and simulation results obtained using a range of values of
 385 $k_{u0}$ and $\Delta x_u$.  There is an order of magnitude range of
 386 $k_{u0}$ that produce reasonable fits to experimental data
 387 (\cref{fig:sawsim:fit-space}), which is consistent with the results
 388 \citet{best02} obtained using a chi-square test.  The values of
 389 $k_{u0}$ and $\Delta x_u$ can be determined to higher precision by
 390 using both the pulling speed dependent data and the unfolding force
 391 distribution, as well as any relevant information about the protein
 392 from other sources.
 393
 394 \begin{figure}
 395   \begin{center}
 396   \subfloat[][]{\asyinclude{figures/v-dep/v-dep}%
 397     \label{fig:sawsim:v-dep}%
 398   } \\
 399   \subfloat[][]{\asyinclude{figures/v-dep/v-dep-sd}%
 400     \label{fig:sawsim:width-v-dep}%
 401   }
 402   \caption{\protect\subref{fig:sawsim:v-dep} The dependence of the
 403     unfolding forces on the pulling speed for three different model
 404     protein molecules characterized by the parameters $k_{u0}$ and
 405     $\Delta x_u$.  The polymer length is eight molecules, and each
 406     symbol is the average of $3200$ data points.
 407     \protect\subref{fig:sawsim:width-v-dep} The dependence of standard
 408     deviation of the unfolding force distribution on the pulling speed
 409     for the simulation data shown in
 410     \protect\subref{fig:sawsim:v-dep}, using the same symbols.  The
 411     insets show the force distribution histograms for the three
 412     proteins at the pulling speed of $1\U{$\mu$m/s}$.  The left,
 413     middle and right histograms are for the proteins represented by
 414     the top, middle, and bottom lines in
 415     \protect\subref{fig:sawsim:v-dep},
 416     respectively.\label{fig:sawsim:all-v-dep}}
 417   \end{center}
 418 \end{figure}
 419
 420 \begin{figure}
 421   \begin{center}
 422   \asyinclude{figures/fit-space/fit-valley}
 423   \caption{Fit quality between an experimental data set and simulated
 424     data sets obtained using various values of unfolding rate
 425     parameters $k_{u0}$ and $\Delta x_u$.  The experimental data are
 426     from octameric ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04},
 427     and the other model parameters are the same as those in
 428     \cref{fig:sawsim:sim-all}.  The best fit parameters are $\Delta
 429     x_u=0.17\U{nm}$ and $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$.  The
 430     simulation histograms were built from $400$ pulls at for each
 431     parameter pair.\label{fig:sawsim:fit-space}}
 432   \end{center}
 433 \end{figure}
 434
 435 \subsection{Features}
 436 \label{sec:sawsim:features}
 437
 438 \sawsim\ is a great improvement over existing work in this field.
 439 \citet{best02} are the only authors to mention such automatic
 440 simulation comparisons, and their $\chi^2$ fit only compares mean
 441 unfolding forces over a range of speeds.  They calculate $\avg{F}$
 442 through an iterative method, and assume a standard deviation of
 443 $20\U{pN}$ on their simulated $\avg{F}$.  \sawsim, by comparison,
 444 makes full use of your experimental histograms, which you specify in a
 445 plain-text histogram file:
 446 \begin{center}
 447 \begin{spacing}{1}
 448 \begin{Verbatim}
 449 #HISTOGRAM: -v 6e-7
 450 #Force (N)      Unfolding events
 451 1.4e-10 1
 452 1.5e-10 0
 453 \ldots
 454 3e-10   116
 455 3.1e-10 18
 456 3.2e-10 1
 457
 458 #HISTOGRAM: -v 8e-7
 459 #Force (N)      Unfolding events
 460 1.4e-10 0
 461 1.5e-10 3
 462 \ldots
 463 3.2e-10 50
 464 3.3e-10 13
 465
 466 #HISTOGRAM: -v 1e-6
 467 #Force (N)      Unfolding events
 468 1.5e-10 2
 469 1.6e-10 3
 470 \ldots
 471 3.3e-10 24
 472 3.4e-10 2
 473 \end{Verbatim}
 474 \end{spacing}
 475 \end{center}
 476
 477 Each \sawsim\ run simulates a single sawtooth curve, so you need to
 478 run many \sawsim\ instances to generate your simulated histograms.  To
 479 automate this task, \sawsim\ comes with a \citet{Python} wrapping
 480 library (\pysawsim), which provides convenient programmatic and
 481 command line interfaces for generating and manipulating \sawsim\ runs.
 482 For example, to compare the experimental histograms listed above with
 483 simulated data over a 50-by-50 grid of $k_{u0}$ and $\Delta x$, you
 484 would use something like
 485 \begin{Verbatim}[samepage]
 486 $ sawsim_hist_scan.py -f '-s cantilever,hooke,0.05 -N1 -s folded,null -N8
 487      -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
 488      -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 489 \end{Verbatim}
 490 That's a bit of a mouthful, so let's break it down.  Without the
 491 \sawsim\ template (\Verb+-f ...+), we can focus on the comparison
 492 options:
 493 \begin{Verbatim}[samepage]
 494 $ sawsim_hist_scan.py \ldots -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 495 \end{Verbatim}
 496 This sets up a two-parameter sweep, with the first parameter going
 497 from $1\E{-5}$ to $1\E{-3}$ in 50 logarithmic steps, and the second
 498 going from $0.1\E{-9}$ to $1\E{-9}$ in 50 linear steps.  The
 499 \sawsim\ template defines the simulation model
 500 (\cref{fig:sawsim:domains,tab:sawsim:model}), and \Verb+%g+ marks the
 501 location where the swept parameters will be inserted.
 502
 503 Behind the scenes, \pysawsim\ is spawning several concurrent
 504 \sawsim\ processes to take advantage of any parallel processing
 505 facilities you may have access to (e.g.~multiple cores, MPI, PBS,
 506 \ldots).  A 50-by-50 grid with 400 runs per pixel at about one second
 507 per \sawsim\ pull would take arount 12 days of serial execution.
 508 Moving the simulation to the departments' 16 core file server cuts
 509 that execution time down to 18 hours, which will easily complete over
 510 a quiet weekend.  Using MPI on the departments' 15 box, dual core
 511 computer lab, the simulation would finish overnight.
 512 \nomenclature{MPI}{Message passing interface, a parallel computing
 513   infrastructure}
 514 \nomenclature{PBS}{Portable batch system, a parallel computing
 515   infrastructure.  You should be able to distinguish this from the
 516   other PBS (phosphate buffered saline) based on the context}
 517
 518 \sawsim\ also takes advantage of a number of optimizations for faster
 519 execution.  One of the bottlenecks in the \sawsim\ code is the TODO:
 520 interpolating tree.
 521
 522 \subsection{Testing}
 523 \label{sec:sawsim:testing}
 524
 525 Once a body of code reaches a certain level of complication, it
 526 becomes difficult to convince others (or yourself) that it's actually
 527 working correctly.  In order to test \sawsim, I've developed a test
 528 suite that compares simulated unfolding force histograms with
 529 analytical histograms for a number of situations where solving for the
 530 analytical histogram is possible.
 531
 532 \section{Review of current research}
 533
 534 There are two main approaches to modeling protein domain unfolding
 535 under tension: Bell's and Kramers'\citep{schlierf06,hummer03,dudko06}.
 536 Bell introduced his model in the context of cell
 537 adhesion\citep{bell78}, but it has been widely used to model
 538 mechanical unfolding in
 539 proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to it's
 540 simplicity and ease of use\citep{hummer03}.  Kramers introduced his
 541 theory in the context of thermally activated barrier crossings, which
 542 is how we use it here.
 543
 544
 545 \subsection{Who's who}
 546
 547 The field of mechanical protein unfolding is developing along three main branches.
 548 Some groups are predominantly theoretical,
 549 \begin{itemize}
 550   \item Evans, University of British Columbia (Emeritus) \\
 551     \url{http://www.physics.ubc.ca/php/directory/research/fac-1p.phtml?entnum=55}
 552   \item Thirumalai, University of Maryland \\
 553     \url{http://www.marylandbiophysics.umd.edu/}
 554   \item Onuchic, University of California, San Diego \\
 555     \url{http://guara.ucsd.edu/}
 556   \item Hyeon, Chung-Ang University (Onuchic postdoc, Thirumalai postdoc?) \\
 557     \url{http://physics.chem.cau.ac.kr/} \\
 558   \item Dietz (Rief grad) \\
 559     \url{http://www.hd-web.de/}
 560   \item Hummer and Szabo, National Institute of Diabetes and Digestive and Kidney Diseases \\
 561     \url{http://intramural.niddk.nih.gov/research/faculty.asp?People_ID=1615}
 562     \url{http://intramural.niddk.nih.gov/research/faculty.asp?People_ID=1559}
 563 \end{itemize}
 564 and the experimentalists are usually either AFM based
 565 \begin{itemize}
 566   \item Rief, Technischen Universität München \\
 567     \url{http://cell.e22.physik.tu-muenchen.de/gruppematthias/index.html}
 568   \item Fernandez, Columbia University \\
 569     \url{http://www.columbia.edu/cu/biology/faculty/fernandez/FernandezLabWebsite/}
 570   \item Oberhauser, University of Texas Medical Branch (Fernandez postdoc) \\
 571     \url{http://www.utmb.edu/ncb/Faculty/OberhauserAndres.html}
 572   \item Marszalek, Duke University (Fernandez postdoc) \\
 573     \url{http://smfs.pratt.duke.edu/homepage/lab.htm}
 574   \item Guoliang Yang, Drexel University \\
 575     \url{http://www.physics.drexel.edu/~gyang/}
 576   \item Wojcikiewicz, University of Miami \\
 577     \url{http://chroma.med.miami.edu/physiol/faculty-wojcikiewicz_e.htm}
 578 \end{itemize}
 579 or laser-tweezers based
 580 \begin{itemize}
 581   \item Bustamante, University of California, Berkley \\
 582     \url{http://alice.berkeley.edu/}
 583   \item Forde, Simon Fraser University \\
 584     \url{http://www.sfu.ca/fordelab/index.html}
 585 \end{itemize}
 586
 587 \subsection{Evolution of unfolding modeling}
 588
 589 Evans introduced the saddle-point Kramers' approximation in a protein unfolding context 1997 (\citet{evans97} Eqn.~3).
 590 However, early work on mechanical unfolding focused on the simper Bell model\citep{rief97a}.%TODO
 591 In the early `00's, the saddle-point/steepest-descent approximation to Kramer's model (\citet{hanggi90} Eqn.~4.56c) was introduced into our field\citep{dudko03,hyeon03}.%TODO
 592 By the mid `00's, the full-blown double-integral form of Kramer's model (\citet{hanggi90} Eqn.~4.56b) was in use\citep{schlierf06}.%TODO
 593
 594 There have been some tangential attempts towards even fancier models.
 595 \citet{dudko03} attempted to reduce the restrictions of the single-unfolding-path model.
 596 \citet{hyeon03} attempted to measure the local roughness using temperature dependent unfolding.
 597
 598 \subsection{History of simulations}
 599
 600 Early molecular dynamics (MD) work on receptor-ligand breakage by Grubmuller 1996 and Izrailev 1997 (according to Evans 1997).
 601 \citet{evans97} introduce a smart Monte Carlo (SMC) Kramers' simulation.
 602
 603 \subsection{History of experimental AFM unfolding experiments}
 604
 605 \begin{itemize}
 606   \item \citet{rief97a}:
 607 \end{itemize}
 608
 609 \subsection{History of experimental laser tweezer unfolding experiments}
 610
 611 \begin{itemize}
 612   \item \citet{izrailev97}:
 613 \end{itemize}
 614
 615 \section{Single-domain proteins under constant loading}
 616
 617 eq:sawsim:order-dep
 618
 619 Let $x$ be the end to end distance of the protein, $t$ be the time since loading began, $F$ be tension applied to the protein, $P$ be the surviving population of folded proteins.
 620 Make the definitions
 621 \begin{align}
 622   v &\equiv \deriv{t}{x} && \text{the pulling velocity} \\
 623   k &\equiv \deriv{x}{F} && \text{the loading spring constant} \\
 624   P_0 &\equiv P(t=0) && \text{the initial number of folded proteins} \\
 625   D &\equiv P_0 - P && \text{the number of dead (unfolded) proteins} \\
 626   \kappa &\equiv -\frac{1}{P} \deriv{t}{P} && \text{the unfolding rate}
 627 \end{align}
 628 \nomenclature{$\equiv$}{Defined as (\ie equivalent to)}
 629 The proteins are under constant loading because
 630 \begin{equation}
 631   \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = kv\;,
 632 \end{equation}
 633 a constant, since both $k$ and $v$ are constant (\citet{evans97} in the text on the first page, \citet{dudko06} in the text just before Eqn.~4).
 634
 635 The instantaneous likelyhood of a protein unfolding is given by $\deriv{F}{D}$, and the unfolding histogram is merely this function discretized over a bin of width $W$(This is similar to \citet{dudko06} Eqn.~2, remembering that $\dot{F}=kv$, that their probability density is not a histogram ($W=1$), and that their pdf is normalized to $N=1$).
 636 \begin{equation}
 637   h(F) \equiv \deriv{\text{bin}}{F}
 638     = \deriv{F}{D} \cdot \deriv{\text{bin}}{F}
 639     = W \deriv{F}{D}
 640     = -W \deriv{F}{P}
 641     = -W \deriv{t}{P} \deriv{F}{t}
 642     = \frac{W}{vk} P\kappa \label{eq:unfold:hist}
 643 \end{equation}
 644 Solving for theoretical histograms is merely a question of taking your chosen $\kappa$, solving for $P(f)$, and plugging into Eqn. \ref{eq:unfold:hist}.
 645 We can also make a bit of progress solving for $P$ in terms of $\kappa$ as follows:
 646 \begin{align}
 647   \kappa &\equiv -\frac{1}{P} \deriv{t}{P} \\
 648   -\kappa \dd t \cdot \deriv{t}{F} &= \frac{\dd P}{P} \\
 649   \frac{-1}{kv} \int \kappa \dd F &= \ln(P) + c \\
 650   P &= C\exp{\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})} \;, \label{eq:P}
 651 \end{align}
 652 where $c \equiv \ln(C)$ is a constant of integration scaling $P$.
 653
 654 \subsection{Constant unfolding rate}
 655
 656 In the extremely weak tension regime, the proteins' unfolding rate is independent of tension, we have
 657 \begin{align}
 658   P &= C\exp{\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})}
 659      = C\exp{\p({\frac{-1}{kv}\kappa F})}
 660      = C\exp{\p({\frac{-\kappa F}{kv}})} \\
 661   P(0) &\equiv P_0 = C\exp(0) = C \\
 662   h(F) &= \frac{W}{vk} P \kappa
 663      = \frac{W\kappa P_0}{vk} \exp{\p({\frac{-\kappa F}{kv}})}
 664 \end{align}
 665 So, a constant unfolding-rate/hazard-function gives exponential decay.
 666 Not the most earth shattering result, but it's a comforting first step, and it does show explicitly the dependence in terms of the various unfolding-specific parameters.
 667
 668 \subsection{Bell model}
 669
 670 Stepping up the intensity a bit, we come to Bell's model for unfolding
 671 (\citet{hummer03} Eqn.~1 and the first paragraph of \citet{dudko06} and \citet{dudko07}).
 672 \begin{equation}
 673   \kappa = \kappa_0 \cdot \exp\p({\frac{F \dd x}{k_B T}})
 674          = \kappa_0 \cdot \exp(a F) \;,
 675 \end{equation}
 676 where we've defined $a \equiv \dd x/k_B T$ to bundle some constants together.
 677 The unfolding histogram is then given by
 678 \begin{align}
 679   P &= C\exp\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})
 680      = C\exp\p[{\frac{-1}{kv} \frac{\kappa_0}{a} \exp(a F)}]
 681      = C\exp\p[{\frac{-\kappa_0}{akv}\exp(a F)}] \\
 682   P(0) &\equiv P_0 = C\exp\p({\frac{-\kappa_0}{akv}}) \\
 683   C &= P_0 \exp\p({\frac{\kappa_0}{akv}}) \\
 684   P &= P_0 \exp\p\{{\frac{\kappa_0}{akv}[1-\exp(a F)]}\} \\
 685   h(F) &= \frac{W}{vk} P \kappa
 686      = \frac{W}{vk} P_0 \exp\p\{{\frac{\kappa_0}{akv}[1-\exp(a F)]}\} \kappa_0 \exp(a F)
 687      = \frac{W\kappa_0 P_0}{vk} \exp\p\{{a F + \frac{\kappa_0}{akv}[1-\exp(a F)]}\} \label{eq:unfold:bell_pdf}\;.
 688 \end{align}
 689 The $F$ dependent behavior reduces to
 690 \begin{equation}
 691   h(F) \propto \exp\p[{a F - b\exp(a F)}] \;,
 692 \end{equation}
 693 where $b \equiv \kappa_0/akv \equiv \kappa_0 k_B T / k v \dd x$ is
 694 another constant rephrasing.
 695
 696 This looks similar to the Gompertz / Gumbel / Fisher-Tippett
 697 distribution, where
 698 \begin{align}
 699   p(x) &\propto z\exp(-z) \\
 700   z &\equiv \exp\p({-\frac{x-\mu}{\beta}}) \;,
 701 \end{align}
 702 but we have
 703 \begin{equation}
 704   p(x) \propto z\exp(-bz) \;.
 705 \end{equation}
 706 Strangely, the Gumbel distribution is supposed to derive from an
 707 exponentially increasing hazard function, which is where we started
 708 for our derivation.  I haven't been able to find a good explaination
 709 of this discrepancy yet, but I have found a source that echos my
 710 result (\citet{wu04} Eqn.~1).  TODO: compare \citet{wu04} with
 711 my successful derivation in \cref{sec:sawsim:results-scaffold}.
 712
 713 Oh wait, we can do this:
 714 \begin{equation}
 715   p(x) \propto z\exp(-bz) = \frac{1}{b} z'\exp(-z')\propto z'\exp(-z') \;,
 716 \end{equation}
 717 with $z'\equiv bz$.  I feel silly...  From
 718 \href{Wolfram}{http://mathworld.wolfram.com/GumbelDistribution.html},
 719 the mean of the Gumbel probability density
 720 \begin{equation}
 721   P(x) = \frac{1}{\beta} \exp\p[{\frac{x-\alpha}{\beta}
 722                                  -\exp\p({\frac{x-\alpha}{\beta}})
 723                                  }]
 724 \end{equation}
 725 is given by $\mu=\alpha-\gamma\beta$, and the variance is
 726 $\sigma^2=\frac{1}{6}\pi^2\beta^2$, where $\gamma=0.57721566\ldots$ is
 727 the Euler-Mascheroni constant.  Selecting $\beta=1/a=k_BT/\dd x$,
 728 $\alpha=-\beta\ln(\kappa\beta/kv)$, and $F=x$ we have
 729 \begin{align}
 730   P(F)
 731     &= \frac{1}{\beta} \exp\p[{\frac{F+\beta\ln(\kappa\beta/kv)}{\beta}
 732                                -\exp\p({\frac{F+\beta\ln(\kappa\beta/kv)}
 733                                              {\beta}})
 734                              }] \\
 735     &= \frac{1}{\beta} \exp(F/\beta)\exp[\ln(\kappa\beta/kv)]
 736                        \exp\p\{{-\exp(F/\beta)\exp[\ln(\kappa\beta/kv)]}\} \\
 737     &= \frac{1}{\beta} \frac{\kappa\beta}{kv} \exp(F/\beta)
 738                        \exp\p[{-\kappa\beta/kv\exp(F/\beta)}] \\
 739     &= \frac{\kappa}{kv} \exp(F/\beta)\exp[-\kappa\beta/kv\exp(F/\beta)] \\
 740     &= \frac{\kappa}{kv} \exp(F/\beta - \kappa\beta/kv\exp(F/\beta)] \\
 741     &= \frac{\kappa}{kv} \exp(aF - \kappa/akv\exp(aF)] \\
 742     &= \frac{\kappa}{kv} \exp(aF - b\exp(aF)]
 743     \propto h(F) \;.
 744 \end{align}
 745 So our unfolding force histogram for a single Bell domain under
 746 constant loading does indeed follow the Gumbel distribution.
 747
 748 % Consolidate with src/sawsim/discussion.tex
 749
 750 \subsection{Saddle-point Kramers' model}
 751
 752 For the saddle-point approximation for Kramers' model for unfolding
 753 (\citet{evans97} Eqn.~3, \citet{hanggi90} Eqn. 4.56c, \citet{vanKampen07} Eqn. XIII.2.2).
 754 \begin{equation}
 755   \kappa = \frac{D}{l_b l_{ts}} \cdot \exp\p({\frac{-E_b(F)}{k_B T}}) \;,
 756 \end{equation}
 757 where $E_b(F)$ is the barrier height under an external force $F$,
 758 $D$ is the diffusion constant of the protein conformation along the reaction coordinate,
 759 $l_b$ is the characteristic length of the bound state $l_b \equiv 1/\rho_b$,
 760 $\rho_b$ is the density of states in the bound state, and
 761 $l_{ts}$ is the characteristic length of the transition state
 762 \begin{equation}
 763   l_{ts} = TODO
 764 \end{equation}
 765
 766 \citet{evans97} solved this unfolding rate for both inverse power law potentials and cusp potentials.
 767
 768 \subsubsection{Inverse power law potentials}
 769
 770 \begin{equation}
 771   E(x) = \frac{-A}{x^n}
 772 \end{equation}
 773 (e.g. $n=6$ for a van der Waals interaction, see \citet{evans97} in
 774 the text on page 1544, in the first paragraph of the section
 775 \emph{Dissociation under force from an inverse power law attraction}).
 776 Evans then goes into diffusion constants that depend on the
 777 protein's end to end distance, and I haven't worked out the math
 778 yet.  TODO: clean up.
 779
 780
 781 \subsubsection{Cusp potentials}
 782
 783 \begin{equation}
 784   E(x) = \frac{1}{2}\kappa_a \p({\frac{x}{x_a}})^2
 785 \end{equation}
 786 (see \citet{evans97} in the text on page 1545, in the first paragraph
 787 of the section \emph{Dissociation under force from a deep harmonic well}).
 788
 789 \section{Double-integral Kramers' theory}
 790
 791 The double-integral form of overdamped Kramers' theory may be too
 792 complex for analytical predictions of unfolding-force histograms.
 793 Rather than testing the entire \sawsim\ simulation (\cref{sec:sawsim}),
 794 we will focus on demonstrating that the Kramers' $k(F)$ evaluations
 795 are working properly.  If the Bell modeled histograms check out, that
 796 gives reasonable support for the $k(F) \rightarrow \text{histogram}$
 797 portion of the simulation.
 798
 799 Looking for analytic solutions to Kramers' $k(F)$, we find that there
 800 are not many available in a closed form.  However, we do have analytic
 801 solutions for unforced $k$ for cusp-like and quartic potentials.
 802
 803 \subsection{Cusp-like potentials}
 804
 805
 806 \subsection{Quartic potentials}