src/sawsim/discussion.tex

   1 \section{Results and Discussion}
   2 \label{sec:sawsim:results}
   3
   4 \subsection{Force curves generated by simulation}
   5 \label{sec:sawsim:results:force-curves}
   6
   7 \Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
   8 pulling a polymer composed of eight identical protein molecules using
   9 parameters from typical experimental settings.  The order of the peaks
  10 in the force curves reflects the temporal sequence of the unfolding
  11 events instead of the positions of the protein molecules in the
  12 polymer\citep{li00}.  As observed experimentally
  13 (\cref{fig:expt-sawtooth}), the forces at which identical protein
  14 molecules unfold fluctuate, revealing the stochastic nature of protein
  15 unfolding since no instrumental noise is included in the simulation.
  16
  17 After aquiring a series of experimental unfolding curves, we need to
  18 fit the data to an explanatory model.  For velocity-clamp experiments
  19 (\cref{sec:procedure,sec:sawsim:velocity-clamp}), we extract unfolding
  20 forces from the sawtooth curves (\cref{sec:hooke}) and generate
  21 histograms of unfolding forces.  Then we construct a parameterized
  22 model of the experimental system (\cref{tab:sawsim:model}).  We can
  23 then run \insilico\ experiments mimicking our \invitro\ experiments
  24 (\cref{fig:sawsim:sim-hist}).  We extract the model parameters which
  25 provide the best fit using a ``fit quality'' metric and a nonlinear
  26 optimization routine (or a full parameter space sweep, for
  27 low-dimensional parameter spaces).
  28
  29 \begin{table}[btp]
  30   \begin{center}
  31     \subfloat[][]{\label{tab:sawsim:domains}
  32       \begin{tabular}{l l l l}
  33         \toprule
  34         \multicolumn{4}{c}{Domain states} \\
  35         \midrule
  36         Domain name & Initial count & Tension model & Model parameters \\
  37         \midrule
  38         AFM cantilever & 1 & Hooke (\cref{eq:sawsim:hooke}) &
  39           $k_c=0.05\U{N/m}$ \\
  40         Folded I27 & 8 & WLC (\cref{eq:sawsim:wlc}) &
  41           $p=3.9\U{\AA}$, $L=5.1\U{nm}$ \\
  42         Unfolded I27 & 0 & WLC (\cref{eq:sawsim:wlc}) &
  43           $p=3.9\U{\AA}$, $L=33.8\U{nm}$ \\
  44         \bottomrule
  45       \end{tabular}} \\
  46     \subfloat[][]{\label{tab:sawsim:transitions}
  47       \begin{tabular}{l l l l l}
  48         \toprule
  49         \multicolumn{5}{c}{Transition rates} \\
  50         \midrule
  51         Transition & Source & Target & Rate model & Model parameters \\
  52         \midrule
  53         Unfolding & Folded I27 & Unfolded I27 & Bell (\cref{eq:sawsim:bell}) &
  54           $k_{u0}=3.3\E{-4}\U{s$^{-1}$}$, $\Delta x=0.35\U{nm}$. \\
  55         \bottomrule
  56       \end{tabular}}
  57     \caption{\protect\subref{tab:sawsim:domains} Model for
  58       I27\textsubscript{8} domain states and
  59       \protect\subref{tab:sawsim:transitions} transitions between them
  60       (compare with \cref{fig:sawsim:domains}).  The models and
  61       parameters are those given by \citet{carrion-vazquez99b}.
  62       \citet{carrion-vazquez99b} don't list their cantilever spring
  63       constant (or, presumably, use it in their simulations), but we
  64       can estimate it from the rebound slope in their
  65       \fref{figure}{2.a} and \iref{figure}{2.b}, see
  66       \cref{fig:sawsim:kappa-sawteeth}.\label{tab:sawsim:model}}
  67   \end{center}
  68 \end{table}
  69
  70 Because the unfolding behaviors of an individual sawtooth curve is
  71 stochastic (\cref{fig:sawsim:sim-sawtooth}), we cannot directly
  72 compare single curves in our fit quality metric.  Instead, we gather
  73 many experimental and simulated curves, and compare the aggregate
  74 properties.  For velocity-clamp experiments, the usual aggregate
  75 property used for comparison is a histogram of unfolding
  76 forces\citep{carrion-vazquez99b} (\cref{fig:sawsim:sim-hist}).
  77 Defining and extracting ``unfolding force'' is suprisingly complicated
  78 (\cref{sec:hooke:plugins}), but basically it is the highest tension
  79 force achieved by the chain before an unfolding event (the drops in
  80 the sawtooth).  The final drop is not an unfolding event, it is the
  81 entire chain breaking away from the cantilever tip, severing the
  82 connection between the substrate and the cantilever.
  83
  84 \begin{figure}
  85   \begin{center}
  86     \subfloat[][]{\asyinclude{figures/sim-sawtooth/sim-sawtooth}%
  87       \label{fig:sawsim:sim-sawtooth}%
  88     }
  89     \hspace{.25in}%
  90     \subfloat[][]{\asyinclude{figures/sim-hist/sim-hist}%
  91       \label{fig:sawsim:sim-hist}%
  92     }
  93     \caption{\protect\subref{fig:sawsim:sim-sawtooth} Three simulated force
  94       curves from pulling a polymer of eight identical protein
  95       molecules.  The simulation was carried out using the parameters:
  96       pulling speed $v=1\U{$\mu$m/s}$, cantilever spring constant
  97       $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$, persistence
  98       length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
  99       x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$.  The contour
 100       length between the two linking points on a protein molecule is
 101       $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in
 102       the unfolded form.  These parameters are those of ubiquitin
 103       molecules connected through the N-C
 104       termini\citep{chyan04,carrion-vazquez03}.  Detachment from the
 105       tip or substrate is assumed to occur at a force of $400\U{pN}$.
 106       In experiments, detachments have been observed to occur at a
 107       variety of forces.  For clarity, the green and blue curves are
 108       offset by $200$ and $400\U{pN}$ respectively.
 109       \protect\subref{fig:sawsim:sim-hist} The distribution of the unfolding
 110       forces from $400$ simulated force curves ($3200$ data points)
 111       such as those shown in \protect\subref{fig:sawsim:sim-sawtooth}. The
 112       frequency is normalized by the total number of points, \ie, the
 113       height of each bin is equal to the number of data points in that
 114       bin divided by the total number of data
 115       points.\label{fig:sawsim:sim-all}}
 116   \end{center}
 117 \end{figure}
 118
 119 \subsection{The supramolecular scaffold}
 120 \label{sec:sawsim:results:scaffold}
 121
 122 \begin{figure}
 123   \begin{center}
 124   \asyinclude{figures/order-dep/order-dep}
 125   \caption{The dependence of the unfolding force on the temporal
 126     unfolding order for four polymers with $4$, $8$, $12$, and $16$
 127     identical protein domains.  Each point in the figure is the
 128     average of $400$ data points.  The first point in each curve
 129     represents the average of only the first peak in each of the $400$
 130     simulated force curves, the second point represents the average of
 131     only the second peak, and so on.  The solid lines are fits of
 132     \cref{eq:sawsim:order-dep} to the simulated data, with best fit
 133     $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
 134     respectively, for lengths $4$ through $16$.  The insets show the
 135     force distributions of the first, fourth, and eighth peaks, left
 136     to right, for the polymer with eight protein domains.  The
 137     parameters used for generating the data were the same as those
 138     used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
 139     domains.  The histogram insets were normalized in the same way as
 140     in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
 141   \end{center}
 142 \end{figure}
 143
 144 Analysis of the mechanical unfolding data is complicated by the
 145 dependence of the average unfolding force on the unfolding order due
 146 to the serial linkage of the molecules.  Under an external stretching
 147 force $F$, the probability of some domain unfolding in a polymer with
 148 $N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is
 149 higher than the unfolding probability for a single molecule $P_1$.
 150 Consequently, the average unfolding force is lower for the earlier
 151 unfolding events when $N_f$ is larger, and the force should increase
 152 as more and more molecules become unfolded.  However, there is a
 153 competing factor that opposes this trend.  As the protein molecules
 154 unfold, the chain becomes softer and the force loading rate becomes
 155 lower when the pulling speed is constant.  This lower loading rate
 156 leads to a decrease in the unfolding force (in the no-loading limit,
 157 all unfolding events occur at a tension of $0\U{N}$).  The dependence
 158 of the average unfolding force on the unfolding order is the result of
 159 these two opposing effects.  \Cref{fig:sawsim:order-dep} shows the
 160 dependence of the average unfolding force on the unfolding force peak
 161 order (the temporal order of unfolding events) for four polymers with
 162 $4$, $8$, $12$, and $16$ identical protein molecules.  The effect of
 163 polymer chain softening dominates the initial unfolding events, and
 164 the average unfolding force decreases as more molecules unfold.  After
 165 several molecules have unfolded, the softening for each additional
 166 unfolding event becomes less significant, the change in unfolding
 167 probability becomes dominant, and the unfolding force increases upon
 168 each subsequent unfolding event\citep{zinober02}.
 169 %
 170 \nomenclature[sr ]{$N_f$}{The number of folded domains in a protein
 171   chain (\cref{sec:sawsim:results:scaffold}).}
 172 \nomenclature[sr ]{$N_u$}{The number of unfolded domains in a protein
 173   chain (\cref{sec:sawsim:results:scaffold}).}
 174
 175 We validate this explanation by calculating the unfolding force
 176 probability distribution's dependence on the two competing factors.
 177 The rate of unfolding events with respect to force is
 178 \begin{align}
 179   r_{uF} &= -\deriv{F}{N_f}
 180     = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
 181     = \frac{N_f k_u}{\kappa v} \\
 182     &= \frac{N_f k_{u0}}{\kappa v}\exp{\frac{F\Delta x_u}{k_B T}}
 183     = \frac{1}{\rho}\exp{\frac{F-\alpha}{\rho}} \;,
 184 \end{align}
 185 where $N_f$ is the number of folded domain, $\kappa$ is the spring
 186 constant of the cantilever-polymer system, $\kappa v$ is the force
 187 loading rate\footnote{
 188   \begin{equation}
 189     \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = \kappa v \;.
 190   \end{equation}
 191   Alternatively,
 192   \begin{align}
 193     F &= \kappa x = \kappa vt \\
 194     \deriv{t}{F} &= \kappa v \;.
 195   \end{align}
 196   See the text before \xref{evans97}{equation}{11} or
 197   \xref{dudko06}{equation}{4} for similar explanations.
 198 }, and $k_u$ is the unfolding rate constant
 199 (\cref{eq:sawsim:bell}).  In the last expression, $\rho\equiv
 200 k_BT/\Delta x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$.
 201 We can approximate $\kappa$ as a series of Hookean springs,
 202 \begin{equation}
 203   \kappa=\p({\frac{1}{\kappa_c}+\frac{N_u}{\kappa_\text{WLC}}})^{-1} \;,
 204   \label{eq:kappa-system}
 205 \end{equation}
 206 where $\kappa_\text{WLC}$ is the effective spring constant of one
 207 unfolded domain, assumed constant for a particular polymer/cantilever
 208 combination.
 209
 210 The event probability density for events with an exponentially
 211 increasing likelihood function follows the Gumbel (minimum)
 212 probability density\citep{NIST:gumbel} with $\rho$ and $\alpha$ being
 213 the scale and location parameters, respectively\citep{hummer03}
 214 \begin{equation}
 215   \mathcal{P}(F) = \frac{1}{\rho} \exp{\frac{F-\alpha}{\rho}
 216                                            -\exp{\frac{F-\alpha}{\rho}}
 217                                            } \;.  \label{eq:sawsim:gumbel}
 218 \end{equation}
 219 The distribution has a mode $\alpha$, mean
 220 $\avg{F}=\alpha-\gamma_e\rho$, and a variance $\sigma^2 =
 221 \pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the Euler--Mascheroni
 222 constant\citep{NIST:gumbel}.  Therefore, the unfolding force
 223 distribution has a variance
 224 \begin{equation}
 225   \sigma^2 = \frac{\p({\frac{\pi k_BT}{\Delta x_u}})^2}{6} \;,
 226   \label{eq:sawsim:variance}
 227 \end{equation}
 228 and and average\citep{brockwell02,hummer03}
 229 % TODO: verify brockwell equivalence (p465)
 230 \begin{equation}
 231   \avg{F(i)} = \frac{k_BT}{\Delta x_u}
 232                \p[{\ln\p({\frac{\kappa v\Delta x_u}{N_fk_{u0}k_BT}})
 233                    -\gamma_e}] \;,  \label{eq:sawsim:order-dep}
 234 \end{equation}
 235 where $N_f$ and $\kappa$ depend on the domain index $i=N_u$.  Curves
 236 based on this formula fit the simulated data remarkably well
 237 considering the effective WLC\index{WLC} stiffness $\kappa_\text{WLC}$
 238 is the only fitted parameter, and that the actual WLC stiffness is not
 239 constant, as we have assumed here, but a non-linear function of $F$.
 240 \citet{dudko08} derived a formula for the loading rate for a WLC, but
 241 as far as I know, nobody has found an analytical form for the
 242 unfolding force histograms produced under such a variable loading
 243 rate.
 244 %
 245 \nomenclature[sr ]{$r_{uF}$}{Unfolding loading rate (newtons per
 246   second).}
 247 \nomenclature[sg a ]{$\alpha$}{The mode unfolding force,
 248   $\alpha\equiv-\rho\ln(N_f k_{u0}\rho/\kappa v)$
 249   (\cref{eq:sawsim:gumbel}).}
 250 \nomenclature[sg r ]{$\rho$}{The characteristic unfolding force,
 251   $\rho\equiv k_BT/\Delta x_u$ (\cref{eq:sawsim:gumbel}).}
 252 \nomenclature[sg ce ]{$\gamma_e$}{Euler--Macheroni constant,
 253   $\gamma_e=0.577\ldots$.}
 254 \nomenclature[sg s ]{$\sigma$}{Standard deviation.  For example,
 255   $\sigma$ is used as the standard deviation of an unfolding force
 256   distribution in \cref{eq:sawsim:gumbel}.  Not to be confused with
 257   the photodiode sensitivity $\sigma_p$.}
 258
 259 From \cref{fig:sawsim:order-dep}, we see that the proper way to
 260 process data from mechanical unfolding experiments is to group the
 261 curves according to the length of the polymer and to perform
 262 statistical analysis separately for peaks with the same unfolding
 263 order.  However, in most experiments, the tethering of the polymer to
 264 the AFM tip is by nonspecific adsorption; as a result, the polymers
 265 being stretched between the tip and the substrate have various
 266 lengths\citep{li00}.  In addition, the interactions between the tip
 267 and the surface often cause irregular features in the beginning of the
 268 force curve (\cref{fig:expt-sawtooth}), making the identification of
 269 the first peak uncertain\citep{carrion-vazquez00}.  Furthermore, it is
 270 often difficult to acquire a large amount of data in single molecule
 271 experiments.  These difficulties make the aforementioned data analysis
 272 approach unfeasible for many mechanical unfolding experiments.  As a
 273 result, the values of all force peaks from polymers of different
 274 lengths are often pooled together for statistical analysis.  To assess
 275 the errors caused by such pooling, simulation data were analyzed using
 276 different pooling methods and the results were compared.
 277 \Cref{fig:sawsim:sim-hist} shows that, for a polymer with eight
 278 protein molecules, the average unfolding force is $281\U{pN}$ with a
 279 standard deviation of $25\U{pN}$ when all data is pooled.  If only the
 280 first peaks in the force curves are analyzed, the average force is
 281 $279\U{pN}$ with a standard deviation of $22\U{pN}$.  While for the
 282 fourth and eighth peaks, the average force are $275\U{pN}$ and
 283 $300\U{pN}$, respectively, and the standard deviations are $23\U{pN}$
 284 and $25\U{pN}$, respectively.  As expected from the Gumbel
 285 distribution, the width of the unfolding force distribution (insets in
 286 \cref{fig:sawsim:order-dep}) is only weakly effected by unfolding
 287 order, but the average unfolding force can be quite different for the
 288 same protein because of the differences in unfolding order and polymer
 289 length.
 290
 291 \citet{benedetti11} have since proposed an alternative
 292 parameterization for \cref{eq:kappa-system}, using
 293 \begin{equation}
 294   \kappa = \p({\frac{1}{\kappa_c}
 295                + \frac{N_f}{\kappa_f} + \frac{N_u}{\kappa_u}})^{-1}
 296     \equiv \frac{\kappa'}{1 - A N_f} \;,
 297   \label{eq:kappa-system-benedetti}
 298 \end{equation}
 299 where $\kappa'$ is the spring constant of the completely unfolded
 300 chain and $A$ is a correction term for the supramolecular scaffold.
 301 This is effectively a first order Taylor expansion for $\kappa^{-1}$
 302 about $N_f=0$, but the remaining analysis is identical.
 303 \begin{align}
 304   f(N_f) \equiv \kappa^{-1}
 305     &= \frac{1}{\kappa_c} + \frac{N_f}{\kappa_f} + \frac{N - N_f}{\kappa_u} \\
 306     &= f(0) + \left.\deriv{N_f}{f}\right|_{N_f=0} N_f + \order{N_f^2} \\
 307     &\approx \p({\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}) +
 308              \p({\frac{1}{\kappa_f} - \frac{1}{\kappa_u}}) N_f
 309   \label{eq:kappa-system-taylor}
 310 \end{align}
 311 In the case where the wormlike chain stiffnesses $\kappa_f$ and
 312 $\kappa_u$ are fairly constant over the unfolding region, there are no
 313 higher order terms and the first order expansion in
 314 \cref{eq:kappa-system-taylor} is exact.  Comparing
 315 \cref{eq:kappa-system-benedetti,eq:kappa-system-taylor}, we see
 316 \begin{align}
 317   \kappa' &= \frac{1}{\kappa_c} + \frac{N}{\kappa_u} \\
 318   -\kappa' A &= \frac{1}{\kappa_f} - \frac{1}{\kappa_u} \\
 319   A &= \frac{\frac{1}{\kappa_u} - \frac{1}{\kappa_f}}
 320             {\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}
 321 \end{align}
 322 By focusing on the $A=0$ case (\ie~$\kappa_f=\kappa_u$),
 323 \citet{benedetti11} avoid running Monte Carlo simulations when
 324 modeling unfolding histograms.  This simplification does not hold for
 325 our simulated data (\cref{fig:sawsim:order-dep}), but for some
 326 experimental analysis the loss of accuracy may be acceptable in return
 327 for the reduced computational cost.
 328
 329 \subsection{The effect of cantilever force constant}
 330 \label{sec:sawsim:cantilever}
 331
 332 In mechanical unfolding experiments, the ability to observe the
 333 unfolding of a single protein molecule depends on the tension drop
 334 after an unfolding event such that another molecule does not unfold
 335 immediately.  The magnitude of this drop is determined by many
 336 factors, including the magnitude of the unfolding force, the contour
 337 and persistence lengths of the protein polymer, the contour length
 338 increase from unfolding, and the stiffness (force constant) of the
 339 cantilever.  Among these, the effect of the cantilever force constant
 340 is particularly interesting because cantilevers with a wide range of
 341 force constants are available.  In addition, different single molecule
 342 manipulation techniques, such as the AFM and laser tweezers, differ
 343 mainly in the range of the spring constants of their force
 344 transducers\citep{walton08}.  \Cref{fig:sawsim:kappa-sawteeth} shows
 345 the simulated force curves from pulling an octamer of protein
 346 molecules using cantilevers with different force constants, while
 347 other parameters are identical.  For this model protein, the
 348 appearance of the force curve does not change much until the force
 349 constant of the cantilever reaches a certain value
 350 ($\kappa_c=50\U{pN/nm}$).  When $\kappa_c$ is lower than this value,
 351 the individual unfolding events become less identifiable.  In order to
 352 observe individual unfolding events, the cantilever needs to have a
 353 force constant high enough so that the bending at the maximum force is
 354 small in comparison with the contour length increment from the
 355 unfolding of a single molecule.  \Cref{fig:sawsim:kappa-sawteeth} also
 356 shows that the back side of the force peaks becomes more tilted as the
 357 cantilever becomes softer.  This is due to the fact that the extension
 358 (end-to-end distance) of the protein polymer has a large sudden
 359 increase as the tension rebalances after an unfolding event.
 360
 361 It should also be mentioned that the contour length increment from
 362 each unfolding event is not equal to the distance between adjacent
 363 peaks in the force curve because the chain is never fully stretched.
 364 This contour length increase can only be obtained by fitting the curve
 365 to WLC\index{WLC} or other polymer models (\cref{fig:expt-sawtooth}).
 366
 367 \begin{figure}
 368 \begin{center}
 369 \asyinclude{figures/kappa-sawteeth/kappa-sawteeth}
 370 \caption{Simulated force curves obtained from pulling a polymer with
 371   eight protein molecules using cantilevers with different force
 372   constants $\kappa_c$.  Parameters used in generating these curves
 373   are the same as those used in \cref{fig:sawsim:sim-all}, except the
 374   cantilever force constant.  Successive force curves are offset by
 375   $300\U{pN}$ for clarity.\label{fig:sawsim:kappa-sawteeth}}
 376 \end{center}
 377 \end{figure}
 378
 379 \subsection{Determination of $\Delta x_u$ and $k_{u0}$}
 380 \label{sec:sawsim:results:fitting}
 381
 382 As mentioned in \cref{sec:sawsim:results:force-curves}, fitting
 383 experimental unfolding force histograms to simulated histograms allows
 384 you to extract best-fit parameters for your simulation model.  For
 385 example, if you have Bell model unfolding
 386 (\cref{sec:sawsim:rate:bell}), your two fitting parameters are the
 387 zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$ from
 388 the native state to the transition state.  \cref{fig:sawsim:v-dep}
 389 shows the dependence of the unfolding force on the pulling speed for
 390 different values of $k_{u0}$ and $\Delta x_u$.  As expected, the
 391 unfolding force increases linearly with the pulling speed in the
 392 linear-log plot\citep{evans99}.  While the magnitude of the unfolding
 393 forces is affected by both $k_{u0}$ and $\Delta x_u$, the slope of
 394 speed dependence is primarily determined by $\Delta x_u$
 395 (\cref{eq:sawsim:order-dep}).  \Cref{fig:sawsim:width-v-dep} shows
 396 that the width of the unfolding force distribution is very sensitive
 397 to $\Delta x_u$, as expected from the Gumbel distribution
 398 (\cref{eq:sawsim:variance}).  To obtain the values of $k_{u0}$ and
 399 $\Delta x_u$ for the protein, the pulling speed dependence and the
 400 distribution of the unfolding forces from simulation, such as those
 401 shown in \cref{fig:sawsim:v-dep} and the insets of
 402 \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
 403 measured results.  The values of $k_{u0}$ and $\Delta x_u$ that
 404 provide the best match are designated as the parameters describing the
 405 protein under study.  Since $k_{u0}$ and $\Delta x_u$ affect the
 406 unfolding forces differently, the values of both parameters can be
 407 determined simultaneously.  The data used in plotting
 408 \cref{fig:sawsim:all-v-dep} includes all force peaks from the
 409 simulated force curves because most experimental data is analyzed that
 410 way.  % TODO: all?  most data analyzed what way?
 411
 412 In most published literature, $k_{u0}$ and $\Delta x_u$ were fit by
 413 carrying out simulations using a handful of possible unfolding
 414 parameters and selected the best fit by eye%
 415 %\citep{us,carrion-vazques99b,schlierf06}
 416 .  This approach does not allow estimation of uncertainties in the
 417 fitting parameters, as shown by \citet{best02}.  A more rigorous
 418 approach involves quantifying the quality of fit between the
 419 experimental and simulated force distributions, allowing the use of a
 420 numerical minimization algorithm to pick the best fit parameters.  We
 421 use the Jensen--Shannon divergence\citep{sims09,lin91}, a measure of
 422 the similarity between two probability distributions.
 423 \begin{equation}
 424   D_\text{JS}(p_e, p_s)
 425     = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;,  \label{eq:sawsim:D_JS}
 426 \end{equation}
 427 where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
 428 bin in the experimental and simulated unfolding force histograms,
 429 respectively.  $D_\text{KL}$ is the Kullback--Leibler divergence
 430 \begin{equation}
 431   D_\text{KL}(p_p,p_q)
 432     = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;,  \label{eq:sawsim:D_KL}
 433 \end{equation}
 434 where the sum is over all unfolding force histogram bins.  $p_m$ is
 435 the symmetrized probability distribution
 436 \begin{equation}
 437   p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.  \label{eq:sawsim:p_m}
 438 \end{equation}
 439 %
 440 \nomenclature[sr ]{$D_\text{JS}$}{The Jensen--Shannon divergence
 441   (\cref{eq:sawsim:D_JS}).}
 442 \nomenclature[sr ]{$D_\text{LK}$}{The Kullback--Leibler divergence
 443   (\cref{eq:sawsim:D_KL}).}
 444 \nomenclature[sr ]{$p_m(i)$}{The symmetrized probability distribution
 445   used in calculating the Jensen--Shannon divergence
 446   (\cref{eq:sawsim:D_JS,eq:sawsim:p_m}).}
 447 % DONE: Mention inter-histogram normalization? no.
 448 %  For experiments carried out over a series of pulling velocities, we
 449 %  simply sum residuals computed for each velocity, although it would
 450 %  also be reasonable to weight this sum according to the number of
 451 %  experimental unfolding events recorded for each velocity.
 452
 453 The major advantage of the Jensen--Shannon divergence is that
 454 $D_\text{JS}$ is bounded ($0\le D_\text{JS}\le 1$) regardless of the
 455 experimental and simulated histograms.  For comparison, Pearson's
 456 $\chi^2$ test\citep{NIST:chi-square},
 457 \begin{equation}
 458   D_{\chi^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)} \;,
 459   \label{eq:sawsim:X2}
 460 \end{equation}
 461 is infinite if there is a bin for which $p_e(i)>0$ but $p_s(i)=0$.
 462 %
 463 \nomenclature[sg x2 ]{$\chi^2$}{The chi-squared distribution.}
 464 \nomenclature[sr ]{$D_{\chi^2}$}{Pearson's $\chi^2$ test
 465   (\cref{eq:sawsim:X2}).}
 466
 467 \Cref{fig:sawsim:fit-space} shows the Jensen--Shannon divergence
 468 calculated using \cref{eq:sawsim:D_JS} between an experimental data
 469 set and simulation results obtained using a range of values of
 470 $k_{u0}$ and $\Delta x_u$.  There is an order of magnitude range of
 471 $k_{u0}$ that produce reasonable fits to experimental data
 472 (\cref{fig:sawsim:fit-space}), which is consistent with the results
 473 \citet{best02} obtained using a chi-square test.  The values of
 474 $k_{u0}$ and $\Delta x_u$ can be determined to higher precision by
 475 using both the pulling speed dependent data and the unfolding force
 476 distribution, as well as any relevant information about the protein
 477 from other sources.
 478
 479 \begin{figure}
 480   \begin{center}
 481   \subfloat[][]{\asyinclude{figures/v-dep/v-dep}%
 482     \label{fig:sawsim:v-dep}%
 483   } \\
 484   \subfloat[][]{\asyinclude{figures/v-dep/v-dep-sd}%
 485     \label{fig:sawsim:width-v-dep}%
 486   }
 487   \caption{\protect\subref{fig:sawsim:v-dep} The dependence of the
 488     unfolding forces on the pulling speed for three different model
 489     protein molecules characterized by the parameters $k_{u0}$ and
 490     $\Delta x_u$.  The polymer length is eight molecules, and each
 491     symbol is the average of $3200$ data points.
 492     \protect\subref{fig:sawsim:width-v-dep} The dependence of standard
 493     deviation of the unfolding force distribution on the pulling speed
 494     for the simulation data shown in
 495     \protect\subref{fig:sawsim:v-dep}, using the same symbols.  The
 496     insets show the force distribution histograms for the three
 497     proteins at the pulling speed of $1\U{$\mu$m/s}$.  The left,
 498     middle and right histograms are for the proteins represented by
 499     the top, middle, and bottom lines in
 500     \protect\subref{fig:sawsim:v-dep},
 501     respectively.\label{fig:sawsim:all-v-dep}}
 502   \end{center}
 503 \end{figure}
 504
 505 \begin{figure}
 506   \begin{center}
 507   \asyinclude{figures/fit-space/fit-valley}
 508   \caption{Fit quality between an experimental data set and simulated
 509     data sets obtained using various values of unfolding rate
 510     parameters $k_{u0}$ and $\Delta x_u$.  The experimental data are
 511     from octameric ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04},
 512     and the other model parameters are the same as those in
 513     \cref{fig:sawsim:sim-all}.  The best fit parameters are $\Delta
 514     x_u=0.17\U{nm}$ and $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$.  The
 515     simulation histograms were built from $400$ pulls at for each
 516     parameter pair.\label{fig:sawsim:fit-space}}
 517   \end{center}
 518 \end{figure}
 519
 520 \subsection{Features}
 521 \label{sec:sawsim:features}
 522
 523 \sawsim\ is a great improvement over existing work in this field.
 524 \citet{best02} are the only authors to mention such automatic
 525 simulation comparisons, and their $\chi^2$ fit only compares mean
 526 unfolding forces over a range of speeds.  They calculate $\avg{F}$
 527 through an iterative method, and assume a standard deviation of
 528 $20\U{pN}$ on their simulated $\avg{F}$.  \sawsim, by comparison,
 529 makes full use of your experimental histograms, which you specify in a
 530 plain-text histogram file:
 531 \begin{center}
 532 \begin{spacing}{1}
 533 \begin{Verbatim}[commandchars=\\\{\}]
 534 #HISTOGRAM: -v 6e-7
 535 #Force (N)      Unfolding events
 536 1.4e-10 1
 537 1.5e-10 0
 538 \ldots
 539 3e-10   116
 540 3.1e-10 18
 541 3.2e-10 1
 542
 543 #HISTOGRAM: -v 8e-7
 544 #Force (N)      Unfolding events
 545 1.4e-10 0
 546 1.5e-10 3
 547 \ldots
 548 3.2e-10 50
 549 3.3e-10 13
 550
 551 #HISTOGRAM: -v 1e-6
 552 #Force (N)      Unfolding events
 553 1.5e-10 2
 554 1.6e-10 3
 555 \ldots
 556 3.3e-10 24
 557 3.4e-10 2
 558 \end{Verbatim}
 559 \end{spacing}
 560 \end{center}
 561
 562 Each \sawsim\ run simulates a single sawtooth curve, so you need to
 563 run many \sawsim\ instances to generate your simulated histograms.  To
 564 automate this task, \sawsim\ comes with a \citetalias{python} wrapping
 565 library (\pysawsim), which provides convenient programmatic and
 566 command line interfaces for generating and manipulating \sawsim\ runs.
 567 For example, to compare the experimental histograms listed above with
 568 simulated data over a 50-by-50 grid of $k_{u0}$ and $\Delta x$, you
 569 would use something like
 570 \begin{minted}[samepage]{console}
 571 $ sawsim_hist_scan.py -f '-s cantilever,hooke,0.05 -N1 -s folded,null -N8
 572 >   -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
 573 >   -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 574 \end{minted}
 575 That's a bit of a mouthful, so let's break it down.  Without the
 576 \sawsim\ template (\imint{sh}|-f ...|), we can focus on the comparison
 577 options:
 578 \begin{minted}[samepage]{console}
 579 $ sawsim_hist_scan.py ... -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
 580 \end{minted}
 581 This sets up a two-parameter sweep, with the first parameter going
 582 from $1\E{-5}$ to $1\E{-3}$ in 50 logarithmic steps, and the second
 583 going from $0.1\E{-9}$ to $1\E{-9}$ in 50 linear steps.  The
 584 \sawsim\ template defines the simulation model
 585 (\cref{fig:sawsim:domains,tab:sawsim:model}), and \imint{sh}|%g| marks
 586 the location where the swept parameters will be inserted.
 587
 588 Behind the scenes, \pysawsim\ is spawning several concurrent
 589 \sawsim\ processes to take advantage of any parallel processing
 590 facilities you may have access to (e.g.~multiple cores, MPI, PBS,
 591 \ldots).  A 50-by-50 grid with 400 runs per pixel at about one second
 592 per \sawsim\ pull would take arount 12 days of serial execution.
 593 Moving the simulation to the departments' 16 core file server cuts
 594 that execution time down to 18 hours, which will easily complete over
 595 a quiet weekend.  Using MPI on the departments' 15 box, dual core
 596 computer lab, the simulation would finish overnight.
 597 %
 598 \nomenclature[text ]{MPI}{Message passing interface, a parallel
 599   computing infrastructure.}
 600 \nomenclature[text ]{PBS}{Portable batch system, a parallel computing
 601   infrastructure.  You should be able to distinguish this from the
 602   other PBS (phosphate buffered saline) based on the context.}
 603
 604 \subsection{Testing}
 605 \label{sec:sawsim:testing}
 606
 607 Once a body of code reaches a certain level of complication, it
 608 becomes difficult to convince others (or yourself) that it's actually
 609 working correctly.  In order to test \sawsim, I've developed a test
 610 suite (distributed with \sawsim) that compares simulated unfolding
 611 force histograms with analytical histograms for a number of situations
 612 where solving for the analytical histogram is possible.  In the
 613 following subsection, I'll work out the theoretical unfolding force
 614 distribution for a number of tractable cases.  The sawsim test suite
 615 generates simulated unfolding curves for these tractable cases
 616 (e.g. single domain Bell model unfolding with a constant loading
 617 rate), and compares the simulated unfolding force histograms with the
 618 expected theoretical distribution.  The simulated histograms match the
 619 theoretical distributions for each combination of models regardless of
 620 the parameters you feed into the models, so we can be confident that
 621 \sawsim\ correctly implements at those models.
 622
 623 The instantaneous likelyhood of a protein unfolding is given by
 624 $\deriv{F}{N_u}$, and the unfolding histogram is merely this function
 625 discretized over a bin of width $W$\footnote{
 626   This is similar to \xref{dudko06}{equation}{2}, remembering that
 627   $\dot{F}=\kappa v$, that their probability density is not a
 628   histogram ($W=1$), and that their probability density function is
 629   normalized to $N=1$
 630 }.
 631 \begin{equation}
 632   h(F) \equiv \deriv{\text{bin}}{F}
 633     = \deriv{F}{N_u} \cdot \deriv{\text{bin}}{F}
 634     = W \deriv{F}{N_u}
 635     = -W \deriv{F}{N_f}
 636     = -W \deriv{t}{N_f} \deriv{F}{t}
 637     = \frac{W}{\kappa v} N_f k_u \label{eq:unfold:hist}
 638 \end{equation}
 639 Solving for theoretical histograms is merely a question of taking your
 640 chosen $k_u$, solving for $N_f(F)$, and plugging into
 641 \cref{eq:unfold:hist}.  We can also make a bit of progress solving for
 642 $N_f$ in terms of $k_u$ as follows:
 643 \begin{align}
 644   k_u &\equiv -\frac{1}{N_f} \deriv{t}{N_f} \\
 645   -k_u \dd t \cdot \deriv{t}{F} &= \frac{\dd N_f}{N_f} \\
 646   \frac{-1}{\kappa v} \integral{0}{F}{F'}{k_0(F')}
 647     &= \left. \ln(N_f(F')) \right|_0^F
 648     = \ln\p({\frac{N_f(F)}{N_f(0)}})
 649     = \ln\p({\frac{N_f(F)}{N}}) \\
 650   N_f(F) &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{k_u(F')}} \;,
 651   \label{eq:N_f}
 652 \end{align}
 653 where $N_f(0) = N$ because all the domains are initially folded.
 654 %
 655 \nomenclature[sr ]{$W$}{Bin width of an unfolding force histogram
 656   (\cref{eq:unfold:hist}).}
 657
 658 \subsubsection{Constant unfolding rate}
 659
 660 In the extremely weak tension regime, the protein's unfolding rate is
 661 independent of tension, so we can simplify \cref{eq:N_f} and plug into
 662 \cref{eq:unfold:hist}.
 663 \begin{align}
 664   N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u(F')}}}
 665      = N\exp{\frac{-\colA{k_{u0}}}{\kappa v}\colB{\integral{0}{F}{F'}{}}}
 666      = N\exp{\frac{-k_{u0} \colB{F}}{\kappa v}} \\
 667   h(F) &= \frac{W}{\kappa v} N_f k_u
 668      = \frac{W k_{u0} N}{\kappa v} \exp{\frac{-k_{u0} F}{\kappa v}} \;.
 669 \end{align}
 670 A constant unfolding-rate/hazard-function gives exponential decay.
 671 This is not an earth shattering result, but it's a comforting first
 672 step, and it does show explicitly the dependence in terms of the
 673 various unfolding-specific parameters.
 674
 675 \subsubsection{Bell model}
 676
 677 Stepping up the intensity a bit, we come to Bell's model for unfolding
 678 (\cref{sec:sawsim:rate:bell}).  We can simplify the following
 679 calculation by parametrizing with the characteristic force $\rho$
 680 defined in \cref{sec:sawsim:results:scaffold} and the similar
 681 single-domain mode $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$.  With
 682 these substitutions, \cref{eq:sawsim:bell} becomes
 683 \begin{equation}
 684   k_u = k_{u0} \exp{\frac{F}{\rho}} \;.
 685 \end{equation}
 686 The unfolding histogram is then given via \cref{eq:N_f,eq:unfold:hist}.
 687 \begin{align}
 688   N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u}}}
 689      = N\exp{\frac{-1}{\kappa v}
 690        \integral{0}{F}{F'}{\colAB{k_{u0}}{\colA{\exp{\frac{F'}{\rho}}}}}}
 691      = N\exp{\frac{-\colB{k_{u0}}}{\kappa v}
 692        \colA{\integral{0}{F}{F'}{\exp{\frac{F'}{\rho}}}}}
 693      = N\exp{\frac{\colB{-}k_{u0}\colA{\rho}}{\kappa v}
 694        \colAB{\p({\exp{\frac{F}{\rho}}-1})}} \\
 695      &= N\exp{\colA{\frac{k_{u0}\rho}{\kappa v}}
 696        \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
 697      = N\exp{\colAB{\exp{\frac{-\alpha'}{\rho}}}
 698        \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
 699      = N\exp{\colB{\exp{\frac{-\alpha'}{\rho}} -
 700        \exp{\frac{F-\alpha'}{\rho}}}} \\
 701   h(F) &= \frac{W}{\kappa v} \colA{N_f} \colB{k_u}
 702      = \frac{W}{\kappa v}
 703        \colA{N\exp{\exp{\frac{-\alpha'}{\rho}} - \exp{\frac{F-\alpha'}{\rho}}}}
 704        \colB{k_{u0}\exp{\frac{F}{\rho}}}
 705      = \frac{W N \colAB{k_{u0}}}{\colA{\kappa v}}
 706        \exp{\colB{\frac{F}{\rho}} - \exp{\frac{F-\alpha'}{\rho}} +
 707          \exp{\frac{-\alpha'}{\rho}}} \\
 708      &= \frac{W N}{\colA{\rho}}
 709        \exp{\frac{F \colA{-\alpha'}}{\rho} - \exp{\frac{F-\alpha'}{\rho}} +
 710          \colB{\exp{\frac{-\alpha'}{\rho}}}}
 711      = \frac{W N}{\rho}
 712        \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
 713        \colB{\exp{\exp{\frac{-\alpha'}{\rho}}}} \\
 714      &= \frac{W N \exp{\exp{\frac{-\alpha'}{\rho}}}}{\rho}
 715        \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
 716   \label{eq:unfold:bell_pdf}
 717 \end{align}
 718 which matches \cref{eq:sawsim:gumbel} except for a constant
 719 prefactor due to the range\footnote{
 720   The Gumbel distribution in \cref{eq:sawsim:gumbel} is normalized for
 721   the range $-\infty < F < \infty$, but \cref{eq:unfold:bell_pdf} is
 722   normalized for the range $0 \le F < \infty$.  This distinction will
 723   alter the analytical mean and variance listed after
 724   \cref{eq:sawsim:gumbel}, but with the experimental unfolding
 725   histograms showing few zero-force unfolding events, the effective
 726   difference will be negligible.
 727 }.
 728 %
 729 \nomenclature[sg a' ]{$\alpha'$}{The mode unfolding force for a single
 730   folded domain, $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$
 731   (\cref{eq:unfold:bell_pdf}).}
 732
 733 \subsubsection{Saddle-point Kramers' model}
 734
 735 For the saddle-point approximation for Kramers' model for unfolding
 736 (\citet{evans97} Eqn.~3, \citet{hanggi90} Eqn. 4.56c, \citet{vanKampen07} Eqn. XIII.2.2).
 737 \begin{equation}
 738   k_u = \frac{D}{l_b l_{ts}} \cdot \exp{\frac{-U_b(F)}{k_B T}} \;,
 739     \label{eq:kramers-saddle}
 740 \end{equation}
 741 where $U_b(F)$ is the barrier height under an external force $F$,
 742 $D$ is the diffusion constant of the protein conformation along the reaction coordinate,
 743 $l_b$ is the characteristic length of the bound state $l_b \equiv 1/\rho_b$,
 744 $\rho_b$ is the density of states in the bound state, and
 745 $l_{ts}$ is the characteristic length of the transition state.
 746 %
 747 \nomenclature[sr ]{$U_b(F)$}{The barrier energy as a function of force
 748   (\cref{eq:kramers-saddle}).}
 749 \nomenclature[sr ]{$l_b$}{The characteristic length of the bound state
 750   $l_b \equiv 1/\rho_b$ (\cref{eq:kramers-saddle}).}
 751 \nomenclature[sg r_b ]{$\rho_b$}{The density of states in the bound
 752   state (\cref{eq:kramers-saddle}).}
 753 \nomenclature[sr ]{$l_{ts}$}{The characteristic length of the
 754   transition state (\cref{eq:kramers-saddle}).}
 755
 756 \citet{evans97} solved this unfolding rate for both inverse power law
 757 potentials and cusp potentials.
 758
 759 \section{Review of current research}
 760
 761 There is a long history of protein unfolding and unbinding
 762 simulations.  Early work by \citet{grubmuller96} and
 763 \citet{izrailev97} focused on molecular dynamics (MD) simulations of
 764 receptor-ligand breakage.  Around the same time, \citet{evans97}
 765 introduced a Monte Carlo Kramers' simulation in the context of
 766 receptor-ligand breakage.  The approach pioneered by \citet{evans97}
 767 was used as the basis for analysis of the initial protein unfolding
 768 experiments\citep{rief97a}.  However, none of these earlier
 769 implementations were open source, which made it difficult to reuse or
 770 validate their results.
 771 %
 772 \nomenclature[text ]{MD}{Molecular dynamics simulation.  Simulate the
 773   physical motion of atoms and molecules by numerically solving
 774   Newton's equations.}
 775
 776 Within the Monte Carlo simulation approach, there are two main models
 777 for protein domain unfolding under tension: Bell's and
 778 Kramers'\citep{schlierf06,hummer03,dudko06}.  Bell introduced his
 779 model in the context of cell adhesion\citep{bell78}, but it has been
 780 widely used to model mechanical unfolding in
 781 proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to its
 782 simplicity and ease of use\citep{hummer03}.  Kramers introduced his
 783 theory in the context of thermally activated barrier crossings, which
 784 is how we use it here.
 785
 786 Evans introduced the saddle-point Kramers' approximation in a protein
 787 unfolding context in 1997 (\xref{evans97}{equation}{3}).  However,
 788 early work on mechanical unfolding focused on the simpler Bell
 789 model\citep{rief97a}.  In the early 2000's, the
 790 saddle-point/steepest-descent approximation to Kramer's model
 791 (\xref{hanggi90}{equation}{4.56c}) was introduced into our
 792 field\citep{dudko03,hyeon03}.  By the mid 2000's, the full-blown
 793 double-integral form of Kramer's model
 794 (\xref{hanggi90}{equation}{4.56b}) was in use\citep{schlierf06}.
 795
 796 There have been some tangential attempts towards even fancier models:
 797 \citet{dudko03} attempted to reduce the restrictions of the
 798 single-unfolding-path model and \citet{hyeon03} attempted to measure
 799 the local roughness using temperature dependent unfolding.  However,
 800 further work on these lines has been slow, because the Bell model fits
 801 the data well despite its simplicity.  For more complicated models to
 802 gain ground, we need larger, more detailed datasets that expose
 803 features which the Bell model doesn't capture.