src/sawsim/discussion.tex

   1 \section{Results and Discussion}
   2 \label{sec:sawsim:results}
   3
   4 \subsection{Force curves generated by simulation}
   5 \label{sec:sawsim:results:force-curves}
   6
   7 \Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
   8 pulling a polymer composed of eight identical protein molecules using
   9 parameters from typical experimental settings.  The order of the peaks
  10 in the force curves reflects the temporal sequence of the unfolding
  11 events instead of the positions of the protein molecules in the
  12 polymer\citep{li00}.  As observed experimentally
  13 (\cref{fig:expt-sawtooth}), the forces at which identical protein
  14 molecules unfold fluctuate, revealing the stochastic nature of protein
  15 unfolding since no instrumental noise is included in the simulation.
  16
  17 After aquiring a series of experimental unfolding curves, we need to
  18 fit the data to an explanatory model.  For velocity-clamp experiments
  19 (\cref{sec:procedure,sec:sawsim:velocity-clamp}), we extract unfolding
  20 forces from the sawtooth curves (\cref{sec:hooke}) and generate
  21 histograms of unfolding forces.  Then we construct a parameterized
  22 model of the experimental system (\cref{tab:sawsim:model}).  We can
  23 then run \insilico\ experiments mimicking our \invitro\ experiments
  24 (\cref{fig:sawsim:sim-hist}).  We extract the model parameters which
  25 provide the best fit using a ``fit quality'' metric and a nonlinear
  26 optimization routine (or a full parameter space sweep, for
  27 low-dimensional parameter spaces).
  28
  29 \begin{table}[btp]
  30   \begin{center}
  31     \subfloat[][]{\label{tab:sawsim:domains}
  32       \begin{tabular}{l l l l}
  33         \toprule
  34         \multicolumn{4}{c}{Domain states} \\
  35         \midrule
  36         Domain name & Initial count & Tension model & Model parameters \\
  37         \midrule
  38         AFM cantilever & 1 & Hooke (\cref{eq:sawsim:hooke}) &
  39           $k_c=0.05\U{N/m}$ \\
  40         Folded I27 & 8 & WLC (\cref{eq:sawsim:wlc}) &
  41           $p=3.9\U{\AA}$, $L=5.1\U{nm}$ \\
  42         Unfolded I27 & 0 & WLC (\cref{eq:sawsim:wlc}) &
  43           $p=3.9\U{\AA}$, $L=33.8\U{nm}$ \\
  44         \bottomrule
  45       \end{tabular}} \\
  46     \subfloat[][]{\label{tab:sawsim:transitions}
  47       \begin{tabular}{l l l l l}
  48         \toprule
  49         \multicolumn{5}{c}{Transition rates} \\
  50         \midrule
  51         Transition & Source & Target & Rate model & Model parameters \\
  52         \midrule
  53         Unfolding & Folded I27 & Unfolded I27 & Bell (\cref{eq:sawsim:bell}) &
  54           $k_{u0}=3.3\E{-4}\U{s$^{-1}$}$, $\Delta x=0.35\U{nm}$. \\
  55         \bottomrule
  56       \end{tabular}}
  57     \caption{\protect\subref{tab:sawsim:domains} Model for
  58       I27\textsubscript{8} domain states and
  59       \protect\subref{tab:sawsim:transitions} transitions between them
  60       (compare with \cref{fig:sawsim:domains}).  The models and
  61       parameters are those given by \citet{carrion-vazquez99b}.
  62       \citet{carrion-vazquez99b} don't list their cantilever spring
  63       constant (or, presumably, use it in their simulations), but we
  64       can estimate it from the rebound slope in their
  65       \fref{figure}{2.a} and \iref{figure}{2.b}, see
  66       \cref{fig:sawsim:kappa-sawteeth}.\label{tab:sawsim:model}}
  67   \end{center}
  68 \end{table}
  69
  70 Because the unfolding behaviors of an individual sawtooth curve is
  71 stochastic (\cref{fig:sawsim:sim-sawtooth}), we cannot directly
  72 compare single curves in our fit quality metric.  Instead, we gather
  73 many experimental and simulated curves, and compare the aggregate
  74 properties.  For velocity-clamp experiments, the usual aggregate
  75 property used for comparison is a histogram of unfolding
  76 forces\citep{carrion-vazquez99b} (\cref{fig:sawsim:sim-hist}).
  77 Defining and extracting ``unfolding force'' is suprisingly complicated
  78 (\cref{sec:hooke:plugins}), but basically it is the highest tension
  79 force achieved by the chain before an unfolding event (the drops in
  80 the sawtooth).  The final drop is not an unfolding event, it is the
  81 entire chain breaking away from the cantilever tip, severing the
  82 connection between the substrate and the cantilever.
  83
  84 \begin{figure}
  85   \begin{center}
  86     \subfloat[][]{\asyinclude{figures/sim-sawtooth/sim-sawtooth}%
  87       \label{fig:sawsim:sim-sawtooth}%
  88     }
  89     \hspace{.25in}%
  90     \subfloat[][]{\asyinclude{figures/sim-hist/sim-hist}%
  91       \label{fig:sawsim:sim-hist}%
  92     }
  93     \caption{\protect\subref{fig:sawsim:sim-sawtooth} Three simulated force
  94       curves from pulling a polymer of eight identical protein
  95       molecules.  The simulation was carried out using the parameters:
  96       pulling speed $v=1\U{$\mu$m/s}$, cantilever spring constant
  97       $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$, persistence
  98       length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
  99       x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$.  The contour
 100       length between the two linking points on a protein molecule is
 101       $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in
 102       the unfolded form.  These parameters are those of ubiquitin
 103       molecules connected through the N-C
 104       termini\citep{chyan04,carrion-vazquez03}.  Detachment from the
 105       tip or substrate is assumed to occur at a force of $400\U{pN}$.
 106       In experiments, detachments have been observed to occur at a
 107       variety of forces.  For clarity, the green and blue curves are
 108       offset by $200$ and $400\U{pN}$ respectively.
 109       \protect\subref{fig:sawsim:sim-hist} The distribution of the unfolding
 110       forces from $400$ simulated force curves ($3200$ data points)
 111       such as those shown in \protect\subref{fig:sawsim:sim-sawtooth}. The
 112       frequency is normalized by the total number of points, \ie, the
 113       height of each bin is equal to the number of data points in that
 114       bin divided by the total number of data
 115       points.\label{fig:sawsim:sim-all}}
 116   \end{center}
 117 \end{figure}
 118
 119 \subsection{The supramolecular scaffold}
 120 \label{sec:sawsim:results:scaffold}
 121
 122 Analysis of the mechanical unfolding data is complicated by the
 123 dependence of the average unfolding force on the unfolding order due
 124 to the serial linkage of the molecules.  Under an external stretching
 125 force $F$, the probability of some domain unfolding in a polymer with
 126 $N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is
 127 higher than the unfolding probability for a single molecule $P_1$.
 128 Consequently, the average unfolding force is lower for the earlier
 129 unfolding events when $N_f$ is larger, and the force should increase
 130 as more and more molecules become unfolded.  However, there is a
 131 competing factor that opposes this trend.  As the protein molecules
 132 unfold, the chain becomes softer and the force loading rate becomes
 133 lower when the pulling speed is constant.  This lower loading rate
 134 leads to a decrease in the unfolding force (in the no-loading limit,
 135 all unfolding events occur at a tension of $0\U{N}$).  The dependence
 136 of the average unfolding force on the unfolding order is the result of
 137 these two opposing effects.  \Cref{fig:sawsim:order-dep} shows the
 138 dependence of the average unfolding force on the unfolding force peak
 139 order (the temporal order of unfolding events) for four polymers with
 140 $4$, $8$, $12$, and $16$ identical protein molecules.  The effect of
 141 polymer chain softening dominates the initial unfolding events, and
 142 the average unfolding force decreases as more molecules unfold.  After
 143 several molecules have unfolded, the softening for each additional
 144 unfolding event becomes less significant, the change in unfolding
 145 probability becomes dominant, and the unfolding force increases upon
 146 each subsequent unfolding event\citep{zinober02}.
 147 %
 148 \nomenclature{$N_f$}{The number of folded domains in a protein chain
 149   (\cref{sec:sawsim:results:scaffold}).}
 150 \nomenclature{$N_u$}{The number of unfolded domains in a protein chain
 151   (\cref{sec:sawsim:results:scaffold}).}
 152
 153 We validate this explanation by calculating the unfolding force
 154 probability distribution's dependence on the two competing factors.
 155 The rate of unfolding events with respect to force is
 156 \begin{align}
 157   r_{uF} &= -\deriv{F}{N_f}
 158     = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
 159     = \frac{N_f k_u}{\kappa v} \\
 160     &= \frac{N_f k_{u0}}{\kappa v}\exp{\frac{F\Delta x_u}{k_B T}}
 161     = \frac{1}{\rho}\exp{\frac{F-\alpha}{\rho}} \;,
 162 \end{align}
 163 where $N_f$ is the number of folded domain, $\kappa$ is the spring
 164 constant of the cantilever-polymer system, $\kappa v$ is the force
 165 loading rate\footnote{
 166   \begin{equation}
 167     \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = \kappa v \;.
 168   \end{equation}
 169   Alternatively,
 170   \begin{align}
 171     F &= \kappa x = \kappa vt \\
 172     \deriv{t}{F} &= \kappa v \;.
 173   \end{align}
 174   See the text before \xref{evans97}{equation}{11} or
 175   \xref{dudko06}{equation}{4} for similar explanations.
 176 }, and $k_u$ is the unfolding rate constant
 177 (\cref{eq:sawsim:bell}).  In the last expression, $\rho\equiv
 178 k_BT/\Delta x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$.
 179 We can approximate $\kappa$ as a series of Hookean springs,
 180 \begin{equation}
 181   \kappa=\p({\frac{1}{\kappa_c}+\frac{N_u}{\kappa_\text{WLC}}})^{-1} \;,
 182   \label{eq:kappa-system}
 183 \end{equation}
 184 where $\kappa_\text{WLC}$ is the effective spring constant of one
 185 unfolded domain, assumed constant for a particular polymer/cantilever
 186 combination.
 187
 188 The event probability density for events with an exponentially
 189 increasing likelihood function follows the Gumbel (minimum)
 190 probability density\citep{NIST:gumbel} with $\rho$ and $\alpha$ being
 191 the scale and location parameters, respectively\citep{hummer03}
 192 \begin{equation}
 193   \mathcal{P}(F) = \frac{1}{\rho} \exp{\frac{F-\alpha}{\rho}
 194                                            -\exp{\frac{F-\alpha}{\rho}}
 195                                            } \;.  \label{eq:sawsim:gumbel}
 196 \end{equation}
 197 The distribution has a mode $\alpha$, mean
 198 $\avg{F}=\alpha-\gamma_e\rho$, and a variance $\sigma^2 =
 199 \pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the Euler--Mascheroni
 200 constant\citep{NIST:gumbel}.  Therefore, the unfolding force
 201 distribution has a variance
 202 \begin{equation}
 203   \sigma^2 = \frac{\p({\frac{\pi k_BT}{\Delta x_u}})^2}{6} \;,
 204   \label{eq:sawsim:variance}
 205 \end{equation}
 206 and and average\citep{brockwell02,hummer03}
 207 % TODO: verify brockwell equivalence (p465)
 208 \begin{equation}
 209   \avg{F(i)} = \frac{k_BT}{\Delta x_u}
 210                \p[{\ln\p({\frac{\kappa v\Delta x_u}{N_fk_{u0}k_BT}})
 211                    -\gamma_e}] \;,  \label{eq:sawsim:order-dep}
 212 \end{equation}
 213 where $N_f$ and $\kappa$ depend on the domain index $i=N_u$.  Curves
 214 based on this formula fit the simulated data remarkably well
 215 considering the effective WLC\index{WLC} stiffness $\kappa_\text{WLC}$
 216 is the only fitted parameter, and that the actual WLC stiffness is not
 217 constant, as we have assumed here, but a non-linear function of $F$.
 218 \citet{dudko08} derived a formula for the loading rate for a WLC, but
 219 as far as I know, nobody has found an analytical form for the
 220 unfolding force histograms produced under such a variable loading
 221 rate.
 222 %
 223 \nomenclature{$r_{uF}$}{Unfolding loading rate (newtons per second)}
 224 \nomenclature{$\alpha$}{The mode unfolding force,
 225   $\alpha\equiv-\rho\ln(N_f k_{u0}\rho/\kappa v)$
 226   (\cref{eq:sawsim:gumbel}).}
 227 \nomenclature{$\rho$}{The characteristic unfolding force,
 228   $\rho\equiv k_BT/\Delta x_u$ (\cref{eq:sawsim:gumbel}).}
 229 \nomenclature{$\gamma_e$}{Euler--Macheroni constant, $\gamma_e=0.577\ldots$}
 230 \nomenclature{$\sigma$}{Standard deviation.  For example, $\sigma$ is
 231   used as the standard deviation of an unfolding force distribution in
 232   \cref{eq:sawsim:gumbel}.  Not to be confused with the photodiode
 233   sensitivity $\sigma_p$.}
 234
 235 From \cref{fig:sawsim:order-dep}, we see that the proper way to
 236 process data from mechanical unfolding experiments is to group the
 237 curves according to the length of the polymer and to perform
 238 statistical analysis separately for peaks with the same unfolding
 239 order.  However, in most experiments, the tethering of the polymer to
 240 the AFM tip is by nonspecific adsorption; as a result, the polymers
 241 being stretched between the tip and the substrate have various
 242 lengths\citep{li00}.  In addition, the interactions between the tip
 243 and the surface often cause irregular features in the beginning of the
 244 force curve (\cref{fig:expt-sawtooth}), making the identification of
 245 the first peak uncertain\citep{carrion-vazquez00}.  Furthermore, it is
 246 often difficult to acquire a large amount of data in single molecule
 247 experiments.  These difficulties make the aforementioned data analysis
 248 approach unfeasible for many mechanical unfolding experiments.  As a
 249 result, the values of all force peaks from polymers of different
 250 lengths are often pooled together for statistical analysis.  To assess
 251 the errors caused by such pooling, simulation data were analyzed using
 252 different pooling methods and the results were compared.
 253 \Cref{fig:sawsim:sim-hist} shows that, for a polymer with eight
 254 protein molecules, the average unfolding force is $281\U{pN}$ with a
 255 standard deviation of $25\U{pN}$ when all data is pooled.  If only the
 256 first peaks in the force curves are analyzed, the average force is
 257 $279\U{pN}$ with a standard deviation of $22\U{pN}$.  While for the
 258 fourth and eighth peaks, the average force are $275\U{pN}$ and
 259 $300\U{pN}$, respectively, and the standard deviations are $23\U{pN}$
 260 and $25\U{pN}$, respectively.  As expected from the Gumbel
 261 distribution, the width of the unfolding force distribution (insets in
 262 \cref{fig:sawsim:order-dep}) is only weakly effected by unfolding
 263 order, but the average unfolding force can be quite different for the
 264 same protein because of the differences in unfolding order and polymer
 265 length.
 266
 267 \begin{figure}
 268   \begin{center}
 269   \asyinclude{figures/order-dep/order-dep}
 270   \caption{The dependence of the unfolding force on the temporal
 271     unfolding order for four polymers with $4$, $8$, $12$, and $16$
 272     identical protein domains.  Each point in the figure is the
 273     average of $400$ data points.  The first point in each curve
 274     represents the average of only the first peak in each of the $400$
 275     simulated force curves, the second point represents the average of
 276     only the second peak, and so on.  The solid lines are fits of
 277     \cref{eq:sawsim:order-dep} to the simulated data, with best fit
 278     $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
 279     respectively, for lengths $4$ through $16$.  The insets show the
 280     force distributions of the first, fourth, and eighth peaks, left
 281     to right, for the polymer with eight protein domains.  The
 282     parameters used for generating the data were the same as those
 283     used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
 284     domains.  The histogram insets were normalized in the same way as
 285     in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
 286   \end{center}
 287 \end{figure}
 288
 289 \citet{benedetti11} have since proposed an alternative
 290 parameterization for \cref{eq:kappa-system}, using
 291 \begin{equation}
 292   \kappa = \p({\frac{1}{\kappa_c}
 293                + \frac{N_f}{\kappa_f} + \frac{N_u}{\kappa_u}})^{-1}
 294     \equiv \frac{\kappa'}{1 - A N_f} \;,
 295   \label{eq:kappa-system-benedetti}
 296 \end{equation}
 297 where $\kappa'$ is the spring constant of the completely unfolded
 298 chain and $A$ is a correction term for the supramolecular scaffold.
 299 This is effectively a first order Taylor expansion for $\kappa^{-1}$
 300 about $N_f=0$, but the remaining analysis is identical.
 301 \begin{align}
 302   f(N_f) \equiv \kappa^{-1}
 303     &= \frac{1}{\kappa_c} + \frac{N_f}{\kappa_f} + \frac{N - N_f}{\kappa_u} \\
 304     &= f(0) + \left.\deriv{N_f}{f}\right|_{N_f=0} N_f + \order{N_f^2} \\
 305     &\approx \p({\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}) +
 306              \p({\frac{1}{\kappa_f} - \frac{1}{\kappa_u}}) N_f
 307   \label{eq:kappa-system-taylor}
 308 \end{align}
 309 In the case where the wormlike chain stiffnesses $\kappa_f$ and
 310 $\kappa_u$ are fairly constant over the unfolding region, there are no
 311 higher order terms and the first order expansion in
 312 \cref{eq:kappa-system-taylor} is exact.  Comparing
 313 \cref{eq:kappa-system-benedetti,eq:kappa-system-taylor}, we see
 314 \begin{align}
 315   \kappa' &= \frac{1}{\kappa_c} + \frac{N}{\kappa_u} \\
 316   -\kappa' A &= \frac{1}{\kappa_f} - \frac{1}{\kappa_u} \\
 317   A &= \frac{\frac{1}{\kappa_u} - \frac{1}{\kappa_f}}
 318             {\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}
 319 \end{align}
 320 By focusing on the $A=0$ case (\ie~$\kappa_f=\kappa_u$),
 321 \citet{benedetti11} avoid running Monte Carlo simulations when
 322 modeling unfolding histograms.  This simplification does not hold for
 323 our simulated data (\cref{fig:sawsim:order-dep}), but for some
 324 experimental analysis the loss of accuracy may be acceptable in return
 325 for the reduced computational cost.
 326
 327 \subsection{The effect of cantilever force constant}
 328 \label{sec:sawsim:cantilever}
 329
 330 In mechanical unfolding experiments, the ability to observe the
 331 unfolding of a single protein molecule depends on the tension drop
 332 after an unfolding event such that another molecule does not unfold
 333 immediately.  The magnitude of this drop is determined by many
 334 factors, including the magnitude of the unfolding force, the contour
 335 and persistence lengths of the protein polymer, the contour length
 336 increase from unfolding, and the stiffness (force constant) of the
 337 cantilever.  Among these, the effect of the cantilever force constant
 338 is particularly interesting because cantilevers with a wide range of
 339 force constants are available.  In addition, different single molecule
 340 manipulation techniques, such as the AFM and laser tweezers, differ
 341 mainly in the range of the spring constants of their force
 342 transducers\citep{walton08}.  \Cref{fig:sawsim:kappa-sawteeth} shows
 343 the simulated force curves from pulling an octamer of protein
 344 molecules using cantilevers with different force constants, while
 345 other parameters are identical.  For this model protein, the
 346 appearance of the force curve does not change much until the force
 347 constant of the cantilever reaches a certain value
 348 ($\kappa_c=50\U{pN/nm}$).  When $\kappa_c$ is lower than this value,
 349 the individual unfolding events become less identifiable.  In order to
 350 observe individual unfolding events, the cantilever needs to have a
 351 force constant high enough so that the bending at the maximum force is
 352 small in comparison with the contour length increment from the
 353 unfolding of a single molecule.  \Cref{fig:sawsim:kappa-sawteeth} also
 354 shows that the back side of the force peaks becomes more tilted as the
 355 cantilever becomes softer.  This is due to the fact that the extension
 356 (end-to-end distance) of the protein polymer has a large sudden
 357 increase as the tension rebalances after an unfolding event.
 358
 359 It should also be mentioned that the contour length increment from
 360 each unfolding event is not equal to the distance between adjacent
 361 peaks in the force curve because the chain is never fully stretched.
 362 This contour length increase can only be obtained by fitting the curve
 363 to WLC\index{WLC} or other polymer models (\cref{fig:expt-sawtooth}).
 364
 365 \begin{figure}
 366 \begin{center}
 367 \asyinclude{figures/kappa-sawteeth/kappa-sawteeth}
 368 \caption{Simulated force curves obtained from pulling a polymer with
 369   eight protein molecules using cantilevers with different force
 370   constants $\kappa_c$.  Parameters used in generating these curves
 371   are the same as those used in \cref{fig:sawsim:sim-all}, except the
 372   cantilever force constant.  Successive force curves are offset by
 373   $300\U{pN}$ for clarity.\label{fig:sawsim:kappa-sawteeth}}
 374 \end{center}
 375 \end{figure}
 376
 377 \subsection{Determination of $\Delta x_u$ and $k_{u0}$}
 378 \label{sec:sawsim:results:fitting}
 379
 380 As mentioned in \cref{sec:sawsim:results:force-curves}, fitting
 381 experimental unfolding force histograms to simulated histograms allows
 382 you to extract best-fit parameters for your simulation model.  For
 383 example, if you have Bell model unfolding
 384 (\cref{sec:sawsim:rate:bell}), your two fitting parameters are the
 385 zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$ from
 386 the native state to the transition state.  \cref{fig:sawsim:v-dep}
 387 shows the dependence of the unfolding force on the pulling speed for
 388 different values of $k_{u0}$ and $\Delta x_u$.  As expected, the
 389 unfolding force increases linearly with the pulling speed in the
 390 linear-log plot\citep{evans99}.  While the magnitude of the unfolding
 391 forces is affected by both $k_{u0}$ and $\Delta x_u$, the slope of
 392 speed dependence is primarily determined by $\Delta x_u$
 393 (\cref{eq:sawsim:order-dep}).  \Cref{fig:sawsim:width-v-dep} shows
 394 that the width of the unfolding force distribution is very sensitive
 395 to $\Delta x_u$, as expected from the Gumbel distribution
 396 (\cref{eq:sawsim:variance}).  To obtain the values of $k_{u0}$ and
 397 $\Delta x_u$ for the protein, the pulling speed dependence and the
 398 distribution of the unfolding forces from simulation, such as those
 399 shown in \cref{fig:sawsim:v-dep} and the insets of
 400 \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
 401 measured results.  The values of $k_{u0}$ and $\Delta x_u$ that
 402 provide the best match are designated as the parameters describing the
 403 protein under study.  Since $k_{u0}$ and $\Delta x_u$ affect the
 404 unfolding forces differently, the values of both parameters can be
 405 determined simultaneously.  The data used in plotting
 406 \cref{fig:sawsim:all-v-dep} includes all force peaks from the
 407 simulated force curves because most experimental data is analyzed that
 408 way.  % TODO: all?  most data analyzed what way?
 409
 410 In most published literature, $k_{u0}$ and $\Delta x_u$ were fit by
 411 carrying out simulations using a handful of possible unfolding
 412 parameters and selected the best fit by eye%
 413 %\citep{us,carrion-vazques99b,schlierf06}
 414 .  This approach does not allow estimation of uncertainties in the
 415 fitting parameters, as shown by \citet{best02}.  A more rigorous
 416 approach involves quantifying the quality of fit between the
 417 experimental and simulated force distributions, allowing the use of a
 418 numerical minimization algorithm to pick the best fit parameters.  We
 419 use the Jensen--Shannon divergence\citep{sims09,lin91}, a measure of
 420 the similarity between two probability distributions.
 421 \begin{equation}
 422   D_\text{JS}(p_e, p_s)
 423     = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;,  \label{eq:sawsim:D_JS}
 424 \end{equation}
 425 where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
 426 bin in the experimental and simulated unfolding force histograms,
 427 respectively.  $D_\text{KL}$ is the Kullback--Leibler divergence
 428 \begin{equation}
 429   D_\text{KL}(p_p,p_q)
 430     = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;,  \label{eq:sawsim:D_KL}
 431 \end{equation}
 432 where the sum is over all unfolding force histogram bins.  $p_m$ is
 433 the symmetrized probability distribution
 434 \begin{equation}
 435   p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.  \label{eq:sawsim:p_m}
 436 \end{equation}
 437 %
 438 \nomenclature{$D_\text{JS}$}{The Jensen--Shannon divergence
 439   (\cref{eq:sawsim:D_JS}).}
 440 \nomenclature{$D_\text{LK}$}{The Kullback--Leibler divergence
 441   (\cref{eq:sawsim:D_KL}).}
 442 \nomenclature{$p_m(i)$}{The symmetrized probability distribution used
 443   in calculating the Jensen--Shannon divergence
 444   (\cref{eq:sawsim:D_JS,eq:sawsim:p_m}).}
 445 % DONE: Mention inter-histogram normalization? no.
 446 %  For experiments carried out over a series of pulling velocities, we
 447 %  simply sum residuals computed for each velocity, although it would
 448 %  also be reasonable to weight this sum according to the number of
 449 %  experimental unfolding events recorded for each velocity.
 450
 451 The major advantage of the Jensen--Shannon divergence is that
 452 $D_\text{JS}$ is bounded ($0\le D_\text{JS}\le 1$) regardless of the
 453 experimental and simulated histograms.  For comparison, Pearson's
 454 $\chi^2$ test\citep{NIST:chi-square},
 455 \begin{equation}
 456   D_{\chi^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)} \;,
 457   \label{eq:sawsim:X2}
 458 \end{equation}
 459 is infinite if there is a bin for which $p_e(i)>0$ but $p_s(i)=0$.
 460 %
 461 \nomenclature{$\chi^2$}{The chi-squared distribution}
 462 \nomenclature{$D_{\chi^2}$}{Pearson's $\chi^2$ test (\cref{eq:sawsim:X2}).}
 463
 464 \Cref{fig:sawsim:fit-space} shows the Jensen--Shannon divergence
 465 calculated using \cref{eq:sawsim:D_JS} between an experimental data
 466 set and simulation results obtained using a range of values of
 467 $k_{u0}$ and $\Delta x_u$.  There is an order of magnitude range of
 468 $k_{u0}$ that produce reasonable fits to experimental data
 469 (\cref{fig:sawsim:fit-space}), which is consistent with the results
 470 \citet{best02} obtained using a chi-square test.  The values of
 471 $k_{u0}$ and $\Delta x_u$ can be determined to higher precision by
 472 using both the pulling speed dependent data and the unfolding force
 473 distribution, as well as any relevant information about the protein
 474 from other sources.
 475
 476 \begin{figure}
 477   \begin{center}
 478   \subfloat[][]{\asyinclude{figures/v-dep/v-dep}%
 479     \label{fig:sawsim:v-dep}%
 480   } \\
 481   \subfloat[][]{\asyinclude{figures/v-dep/v-dep-sd}%
 482     \label{fig:sawsim:width-v-dep}%
 483   }
 484   \caption{\protect\subref{fig:sawsim:v-dep} The dependence of the
 485     unfolding forces on the pulling speed for three different model
 486     protein molecules characterized by the parameters $k_{u0}$ and
 487     $\Delta x_u$.  The polymer length is eight molecules, and each
 488     symbol is the average of $3200$ data points.
 489     \protect\subref{fig:sawsim:width-v-dep} The dependence of standard
 490     deviation of the unfolding force distribution on the pulling speed
 491     for the simulation data shown in
 492     \protect\subref{fig:sawsim:v-dep}, using the same symbols.  The
 493     insets show the force distribution histograms for the three
 494     proteins at the pulling speed of $1\U{$\mu$m/s}$.  The left,
 495     middle and right histograms are for the proteins represented by
 496     the top, middle, and bottom lines in
 497     \protect\subref{fig:sawsim:v-dep},
 498     respectively.\label{fig:sawsim:all-v-dep}}
 499   \end{center}
 500 \end{figure}
 501
 502 \begin{figure}
 503   \begin{center}
 504   \asyinclude{figures/fit-space/fit-valley}
 505   \caption{Fit quality between an experimental data set and simulated
 506     data sets obtained using various values of unfolding rate
 507     parameters $k_{u0}$ and $\Delta x_u$.  The experimental data are
 508     from octameric ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04},
 509     and the other model parameters are the same as those in
 510     \cref{fig:sawsim:sim-all}.  The best fit parameters are $\Delta
 511     x_u=0.17\U{nm}$ and $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$.  The
 512     simulation histograms were built from $400$ pulls at for each
 513     parameter pair.\label{fig:sawsim:fit-space}}
 514   \end{center}
 515 \end{figure}
 516
 517 \subsection{Features}
 518 \label{sec:sawsim:features}
 519
 520 \sawsim\ is a great improvement over existing work in this field.
 521 \citet{best02} are the only authors to mention such automatic
 522 simulation comparisons, and their $\chi^2$ fit only compares mean
 523 unfolding forces over a range of speeds.  They calculate $\avg{F}$
 524 through an iterative method, and assume a standard deviation of
 525 $20\U{pN}$ on their simulated $\avg{F}$.  \sawsim, by comparison,
 526 makes full use of your experimental histograms, which you specify in a
 527 plain-text histogram file:
 528 \begin{center}
 529 \begin{spacing}{1}
 530 \begin{Verbatim}[commandchars=\\\{\}]
 531 #HISTOGRAM: -v 6e-7
 532 #Force (N)      Unfolding events
 533 1.4e-10 1
 534 1.5e-10 0
 535 \ldots
 536 3e-10   116
 537 3.1e-10 18
 538 3.2e-10 1
 539
 540 #HISTOGRAM: -v 8e-7
 541 #Force (N)      Unfolding events
 542 1.4e-10 0
 543 1.5e-10 3
 544 \ldots
 545 3.2e-10 50
 546 3.3e-10 13
 547
 548 #HISTOGRAM: -v 1e-6
 549 #Force (N)      Unfolding events
 550 1.5e-10 2
 551 1.6e-10 3
 552 \ldots
 553 3.3e-10 24
 554 3.4e-10 2
 555 \end{Verbatim}
 556 \end{spacing}
 557 \end{center}
 558
 559 Each \sawsim\ run simulates a single sawtooth curve, so you need to
 560 run many \sawsim\ instances to generate your simulated histograms.  To
 561 automate this task, \sawsim\ comes with a \citetalias{python} wrapping
 562 library (\pysawsim), which provides convenient programmatic and
 563 command line interfaces for generating and manipulating \sawsim\ runs.
 564 For example, to compare the experimental histograms listed above with
 565 simulated data over a 50-by-50 grid of $k_{u0}$ and $\Delta x$, you
 566 would use something like
 567 \begin{minted}[samepage]{console}
 568 $ sawsim_hist_scan.py -f '-s cantilever,hooke,0.05 -N1 -s folded,null -N8
 569 >   -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
 570 >   -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 571 \end{minted}
 572 That's a bit of a mouthful, so let's break it down.  Without the
 573 \sawsim\ template (\imint{sh}|-f ...|), we can focus on the comparison
 574 options:
 575 \begin{minted}[samepage]{console}
 576 $ sawsim_hist_scan.py ... -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 577 \end{minted}
 578 This sets up a two-parameter sweep, with the first parameter going
 579 from $1\E{-5}$ to $1\E{-3}$ in 50 logarithmic steps, and the second
 580 going from $0.1\E{-9}$ to $1\E{-9}$ in 50 linear steps.  The
 581 \sawsim\ template defines the simulation model
 582 (\cref{fig:sawsim:domains,tab:sawsim:model}), and \imint{sh}|%g| marks
 583 the location where the swept parameters will be inserted.
 584
 585 Behind the scenes, \pysawsim\ is spawning several concurrent
 586 \sawsim\ processes to take advantage of any parallel processing
 587 facilities you may have access to (e.g.~multiple cores, MPI, PBS,
 588 \ldots).  A 50-by-50 grid with 400 runs per pixel at about one second
 589 per \sawsim\ pull would take arount 12 days of serial execution.
 590 Moving the simulation to the departments' 16 core file server cuts
 591 that execution time down to 18 hours, which will easily complete over
 592 a quiet weekend.  Using MPI on the departments' 15 box, dual core
 593 computer lab, the simulation would finish overnight.
 594 \nomenclature{MPI}{Message passing interface, a parallel computing
 595   infrastructure}
 596 \nomenclature{PBS}{Portable batch system, a parallel computing
 597   infrastructure.  You should be able to distinguish this from the
 598   other PBS (phosphate buffered saline) based on the context}
 599
 600 \subsection{Testing}
 601 \label{sec:sawsim:testing}
 602
 603 Once a body of code reaches a certain level of complication, it
 604 becomes difficult to convince others (or yourself) that it's actually
 605 working correctly.  In order to test \sawsim, I've developed a test
 606 suite (distributed with \sawsim) that compares simulated unfolding
 607 force histograms with analytical histograms for a number of situations
 608 where solving for the analytical histogram is possible.
 609
 610 The instantaneous likelyhood of a protein unfolding is given by
 611 $\deriv{F}{N_u}$, and the unfolding histogram is merely this function
 612 discretized over a bin of width $W$\footnote{
 613   This is similar to \xref{dudko06}{equation}{2}, remembering that
 614   $\dot{F}=\kappa v$, that their probability density is not a
 615   histogram ($W=1$), and that their probability density function is
 616   normalized to $N=1$
 617 }.
 618 \begin{equation}
 619   h(F) \equiv \deriv{\text{bin}}{F}
 620     = \deriv{F}{N_u} \cdot \deriv{\text{bin}}{F}
 621     = W \deriv{F}{N_u}
 622     = -W \deriv{F}{N_f}
 623     = -W \deriv{t}{N_f} \deriv{F}{t}
 624     = \frac{W}{\kappa v} N_f k_u \label{eq:unfold:hist}
 625 \end{equation}
 626 Solving for theoretical histograms is merely a question of taking your
 627 chosen $k_u$, solving for $N_f(F)$, and plugging into
 628 \cref{eq:unfold:hist}.  We can also make a bit of progress solving for
 629 $N_f$ in terms of $k_u$ as follows:
 630 \begin{align}
 631   k_u &\equiv -\frac{1}{N_f} \deriv{t}{N_f} \\
 632   -k_u \dd t \cdot \deriv{t}{F} &= \frac{\dd N_f}{N_f} \\
 633   \frac{-1}{\kappa v} \integral{0}{F}{F'}{k_0(F')}
 634     &= \left. \ln(N_f(F')) \right|_0^F
 635     = \ln\p({\frac{N_f(F)}{N_f(0)}})
 636     = \ln\p({\frac{N_f(F)}{N}}) \\
 637   N_f(F) &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{k_u(F')}} \;,
 638   \label{eq:N_f}
 639 \end{align}
 640 where $N_f(0) = N$ because all the domains are initially folded.
 641 %
 642 \nomenclature{$W$}{Bin width of an unfolding force histogram
 643   (\cref{eq:unfold:hist}).}
 644
 645 \subsubsection{Constant unfolding rate}
 646
 647 In the extremely weak tension regime, the protein's unfolding rate is
 648 independent of tension, so we can simplify \cref{eq:N_f} and plug into
 649 \cref{eq:unfold:hist}.
 650 \begin{align}
 651   N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u(F')}}}
 652      = N\exp{\frac{-\colA{k_{u0}}}{\kappa v}\colB{\integral{0}{F}{F'}{}}}
 653      = N\exp{\frac{-k_{u0} \colB{F}}{\kappa v}} \\
 654   h(F) &= \frac{W}{\kappa v} N_f k_u
 655      = \frac{W k_{u0} N}{\kappa v} \exp{\frac{-k_{u0} F}{\kappa v}} \;.
 656 \end{align}
 657 A constant unfolding-rate/hazard-function gives exponential decay.
 658 This is not an earth shattering result, but it's a comforting first
 659 step, and it does show explicitly the dependence in terms of the
 660 various unfolding-specific parameters.
 661
 662 \subsubsection{Bell model}
 663
 664 Stepping up the intensity a bit, we come to Bell's model for unfolding
 665 (\cref{sec:sawsim:rate:bell}).  We can simplify the following
 666 calculation by parametrizing with the characteristic force $\rho$
 667 defined in \cref{sec:sawsim:results:scaffold} and the similar
 668 single-domain mode $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$.  With
 669 these substitutions, \cref{eq:sawsim:bell} becomes
 670 \begin{equation}
 671   k_u = k_{u0} \exp{\frac{F}{\rho}} \;.
 672 \end{equation}
 673 The unfolding histogram is then given via \cref{eq:N_f,eq:unfold:hist}.
 674 \begin{align}
 675   N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u}}}
 676      = N\exp{\frac{-1}{\kappa v}
 677        \integral{0}{F}{F'}{\colAB{k_{u0}}{\colA{\exp{\frac{F'}{\rho}}}}}}
 678      = N\exp{\frac{-\colB{k_{u0}}}{\kappa v}
 679        \colA{\integral{0}{F}{F'}{\exp{\frac{F'}{\rho}}}}}
 680      = N\exp{\frac{\colB{-}k_{u0}\colA{\rho}}{\kappa v}
 681        \colAB{\p({\exp{\frac{F}{\rho}}-1})}} \\
 682      &= N\exp{\colA{\frac{k_{u0}\rho}{\kappa v}}
 683        \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
 684      = N\exp{\colAB{\exp{\frac{-\alpha'}{\rho}}}
 685        \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
 686      = N\exp{\colB{\exp{\frac{-\alpha'}{\rho}} -
 687        \exp{\frac{F-\alpha'}{\rho}}}} \\
 688   h(F) &= \frac{W}{\kappa v} \colA{N_f} \colB{k_u}
 689      = \frac{W}{\kappa v}
 690        \colA{N\exp{\exp{\frac{-\alpha'}{\rho}} - \exp{\frac{F-\alpha'}{\rho}}}}
 691        \colB{k_{u0}\exp{\frac{F}{\rho}}}
 692      = \frac{W N \colAB{k_{u0}}}{\colA{\kappa v}}
 693        \exp{\colB{\frac{F}{\rho}} - \exp{\frac{F-\alpha'}{\rho}} +
 694          \exp{\frac{-\alpha'}{\rho}}} \\
 695      &= \frac{W N}{\colA{\rho}}
 696        \exp{\frac{F \colA{-\alpha'}}{\rho} - \exp{\frac{F-\alpha'}{\rho}} +
 697          \colB{\exp{\frac{-\alpha'}{\rho}}}}
 698      = \frac{W N}{\rho}
 699        \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
 700        \colB{\exp{\exp{\frac{-\alpha'}{\rho}}}} \\
 701      &= \frac{W N \exp{\exp{\frac{-\alpha'}{\rho}}}}{\rho}
 702        \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
 703   \label{eq:unfold:bell_pdf}
 704 \end{align}
 705 which matches \cref{eq:sawsim:gumbel} except for a constant
 706 prefactor due to the range\footnote{
 707   The Gumbel distribution in \cref{eq:sawsim:gumbel} is normalized for
 708   the range $-\infty < F < \infty$, but \cref{eq:unfold:bell_pdf} is
 709   normalized for the range $0 \le F < \infty$.  This distinction will
 710   alter the analytical mean and variance listed after
 711   \cref{eq:sawsim:gumbel}, but with the experimental unfolding
 712   histograms showing few zero-force unfolding events, the effective
 713   difference will be negligible.
 714 }.
 715 %
 716 \nomenclature{$\alpha'$}{The mode unfolding force for a single folded
 717   domain, $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$
 718   (\cref{eq:unfold:bell_pdf}).}
 719
 720 \subsubsection{Saddle-point Kramers' model}
 721
 722 For the saddle-point approximation for Kramers' model for unfolding
 723 (\citet{evans97} Eqn.~3, \citet{hanggi90} Eqn. 4.56c, \citet{vanKampen07} Eqn. XIII.2.2).
 724 \begin{equation}
 725   k_u = \frac{D}{l_b l_{ts}} \cdot \exp{\frac{-U_b(F)}{k_B T}} \;,
 726     \label{eq:kramers-saddle}
 727 \end{equation}
 728 where $U_b(F)$ is the barrier height under an external force $F$,
 729 $D$ is the diffusion constant of the protein conformation along the reaction coordinate,
 730 $l_b$ is the characteristic length of the bound state $l_b \equiv 1/\rho_b$,
 731 $\rho_b$ is the density of states in the bound state, and
 732 $l_{ts}$ is the characteristic length of the transition state.
 733 %
 734 \nomenclature{$U_b(F)$}{The barrier energy as a function of force
 735   (\cref{eq:kramers-saddle}).}
 736 \nomenclature{$l_b$}{The characteristic length of the bound state $l_b
 737   \equiv 1/\rho_b$ (\cref{eq:kramers-saddle}).}
 738 \nomenclature{$\rho_b$}{The density of states in the bound state
 739   (\cref{eq:kramers-saddle}).}
 740 \nomenclature{$l_{ts}$}{The characteristic length of the transition
 741   state (\cref{eq:kramers-saddle}).}
 742
 743 \citet{evans97} solved this unfolding rate for both inverse power law
 744 potentials and cusp potentials.
 745
 746 \subsubsection{Double-integral Kramers' theory}
 747
 748 The double-integral form of overdamped Kramers' theory may be too
 749 complex for analytical predictions of unfolding-force histograms.
 750 Rather than testing the entire \sawsim\ simulation (\cref{sec:sawsim}),
 751 we will focus on demonstrating that the Kramers' $k(F)$ evaluations
 752 are working properly.  If the Bell modeled histograms check out, that
 753 gives reasonable support for the $k(F) \rightarrow \text{histogram}$
 754 portion of the simulation.
 755
 756 Looking for analytic solutions to Kramers' $k(F)$, we find that there
 757 are not many available in a closed form.  However, we do have analytic
 758 solutions for unforced $k$ for cusp-like and quartic potentials.
 759
 760 \section{Review of current research}
 761
 762 There are two main approaches to modeling protein domain unfolding
 763 under tension: Bell's and Kramers'\citep{schlierf06,hummer03,dudko06}.
 764 Bell introduced his model in the context of cell
 765 adhesion\citep{bell78}, but it has been widely used to model
 766 mechanical unfolding in
 767 proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to its
 768 simplicity and ease of use\citep{hummer03}.  Kramers introduced his
 769 theory in the context of thermally activated barrier crossings, which
 770 is how we use it here.
 771
 772 \subsection{Evolution of unfolding modeling}
 773
 774 Evans introduced the saddle-point Kramers' approximation in a protein unfolding context in 1997 (\citet{evans97} Eqn.~3).
 775 However, early work on mechanical unfolding focused on the simpler Bell model\citep{rief97a}.%TODO
 776 In the early 2000's, the saddle-point/steepest-descent approximation to Kramer's model (\xref{hanggi90}{equation}{4.56c}) was introduced into our field\citep{dudko03,hyeon03}.%TODO
 777 By the mid 2000's, the full-blown double-integral form of Kramer's model (\xref{hanggi90}{equation}{4.56b}) was in use\citep{schlierf06}.%TODO
 778
 779 There have been some tangential attempts towards even fancier models.
 780 \citet{dudko03} attempted to reduce the restrictions of the single-unfolding-path model.
 781 \citet{hyeon03} attempted to measure the local roughness using temperature dependent unfolding.
 782
 783 \subsection{History of simulations}
 784
 785 There is a long history of protein unfolding and unbinding
 786 simulations.  Early work by \citet{grubmuller96} and
 787 \citet{izrailev97} focused on molecular dynamics (MD) simulations of
 788 receptor-ligand breakage.  Around the same time, \citet{evans97}
 789 introduced a Monte Carlo Kramers' simulation in the context of
 790 receptor-ligand breakage.  The approach pioneered by \citet{evans97}
 791 was used as the basis for analysis of the initial protein unfolding
 792 experiments\citep{rief97a}.  However, none of these earlier
 793 implementations were open source, which made it difficult to reuse or
 794 validate their results.
 795 %
 796 \nomenclature{MD}{Molecular dynamics simulation.  Simulate the
 797   physical motion of atoms and molecules by numerically solving
 798   Newton's equations.}
 799
 800 \subsection{History of experimental AFM unfolding experiments}
 801
 802 \begin{itemize}
 803   \item \citet{rief97a}:
 804 \end{itemize}
 805
 806 \subsection{History of experimental laser tweezer unfolding experiments}
 807
 808 \begin{itemize}
 809   \item \citet{izrailev97}:
 810 \end{itemize}