--- /dev/null
+\section{Results and Discussion}
+\label{sec:sawsim:results}
+
+\subsection{Force curves generated by simulation}
+\label{sec:sawsim:results-force_curves}
+
+\Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
+pulling a polymer composed of eight identical protein molecules using
+parameters from typical experimental settings. The order of the peaks
+in the force curves reflects the temporal sequence of the unfolding
+events instead of the positions of the protein molecules in the
+polymer\cite{li00}. As observed experimentally
+(\cref{fig:expt-sawtooth}), the forces at which identical protein
+molecules unfold fluctuate, revealing the stochastic nature of protein
+unfolding since no instrumental noise is included in the simulation.
+\Cref{fig:sawsim:sim-hist} shows the distribution of the unfolding forces,
+\ie, the highest force in each peak (except the last peak in a force
+curve), from a total of $400$ force curves ($3200$ force values). The
+unfolding forces have an average of $281\U{pN}$ with a standard
+deviation of $25\U{pN}$.
+% DONE: discuss noise? no.
+
+\begin{figure}
+\vspace{-1in}
+\begin{center}
+\subfloat[][]{\includegraphics{sawsim/sim-sawtooth}\label{fig:sawsim:sim-sawtooth}%
+}\\
+\subfloat[][]{\includegraphics{sawsim/sim-hist}\label{fig:sawsim:sim-hist}%
+}
+\caption{(a) Three simulated force curves from pulling a polymer of
+ eight identical protein molecules. The simulation was carried out
+ using the parameters: pulling speed $v=1\U{$\mu$m/s}$, cantilever
+ spring constant $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$,
+ persistence length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
+ x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$. The contour
+ length between the two linking point on a protein molecule is
+ $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in the
+ unfolded form. These parameters are those of ubiquitin molecules
+ connected through the N-C termini\citep{chyan04,carrion-vazquez03}.
+ Detachment from the tip or substrate is assumed to occur at a force
+ of $400\U{pN}$. In experiments, detachments have been observed to
+ occur at a variety of forces. (b) The distribution of the unfolding
+ forces from $400$ simulated force curves ($3200$ data points) such
+ as that shown in (a). The frequency is normalized by the total
+ number of points, \ie, the height of each bin is equal to the number
+ of data points in that bin divided by the total number of data
+ points ($3200$, for this histogram).\label{fig:sawsim:sim-all}}
+\end{center}
+\end{figure}
+
+\subsection{Dependence of the unfolding force on the unfolding order
+and polymer length}
+\label{sec:sawsim:results-scaffold}
+
+Analysis of the mechanical unfolding data is complicated by the
+dependence of the average unfolding force on the unfolding order due
+to the serial linkage of the molecules. Under an external stretching
+force $F$, the probability of some domain unfolding in a polymer with
+$N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is higher
+than the unfolding probability for a single molecule $P_1$.
+Consequently, the average unfolding force is lower for the earlier
+unfolding events when $N_f$ is larger, and the force should increase
+as more and more molecules become unfolded. However, there is a
+competing factor that opposes this trend. As the protein molecules
+unfold, the chain becomes softer and the force loading rate becomes
+lower when the pulling speed is constant, leading to a decrease in the
+unfolding force. The dependence of the average unfolding force on the
+unfolding order is the result of these two opposing effects.
+\Cref{fig:sawsim:order-dep} shows the dependence of the average unfolding
+force on the unfolding force peak order (the temporal order of
+unfolding events) for four polymers with $4$, $8$, $12$, and $16$
+identical protein molecules. The effect of polymer chain softening
+dominates the initial unfolding events, and the average unfolding
+force decreases as more molecules unfold. After several molecules
+have unfolded, the softening for each additional unfolding event
+becomes less significant, the change in unfolding probability becomes
+dominant, and the unfolding force increases upon each subsequent
+unfolding event\citep{zinober02}.
+
+We validate this explanation by calculating the unfolding force
+probability distribution's depending on the two competing factors.
+The rate of unfolding events with respect to force is
+\begin{align}
+ r_{uF} &= -\deriv{F}{N_f}
+ = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
+ = \frac{N_f k_u}{\kappa v} \\
+ &= \frac{N_f k_{u0}}{\kappa v}\exp\p({\frac{F\Delta x_u}{k_B T}})
+ = \frac{1}{\rho}\exp\p({\frac{F-\alpha}{\rho}}) \;,
+\end{align}
+where $N_f$ is the number of folded domain,
+$\kappa=(1/\kappa_c+N_u/\kappa_\text{WLC})^{-1}$ is the spring
+constant of the cantilever-polymer system ($\kappa_\text{WLC}$ is the
+effective spring constant of one unfolded domain, assumed constant for
+a particular polymer/cantilever combination), $\kappa v$ is the force
+loading rate, and $k_u$ is the unfolding rate constant
+(\cref{eq:sawsim:bell}). In the last expression, $\rho\equiv k_BT/\Delta
+x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$. The event
+probability density for events with an exponentially increasing
+likelihood function follows the Gumbel (minimum) probability
+density\cite{NIST:gumbel}, with $\rho$ and $\alpha$ being the scale
+and location parameters, respectively
+\begin{equation}
+ \mathcal{P}(F) = \frac{1}{\rho} \exp\p[{\frac{F-\alpha}{\rho}
+ -\exp\p({\frac{F-\alpha}{\rho}})
+ }] \;. \label{eq:sawsim:gumbel}
+\end{equation}
+The distribution has a mean $\avg{F}=\alpha-\gamma_e\rho$ and a
+variance $\sigma^2=\pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the
+Euler-Mascheroni constant. Therefore, the unfolding force
+distribution has a variance $\sigma^2 = (\pi k_BT/\Delta x_u)^2/6$,
+and and average
+\begin{equation}
+ \avg{F(i)} = \frac{k_BT}{\Delta x_u}
+ \p[{-\ln\p({\frac{N_fk_{u0}k_BT}{\kappa v\Delta x_u}})
+ -\gamma_e}] \;, \label{eq:sawsim:order-dep}
+\end{equation}
+where $N_f$ and $\kappa$ depend on the domain index $i=N_u$. Curves based
+on this formula fit the simulated data remarkably well considering the
+effective WLC stiffness $\kappa_\text{WLC}$ is the only fitted
+parameter, and that the actual WLC stiffness is not constant, as we
+have assumed here, but a non-linear function of $F$.
+
+From \cref{fig:sawsim:order-dep}, it can be seen that the proper way to
+process data from mechanical unfolding experiments is to group the
+curves according to the length of the polymer and to perform
+statistical analysis separately for peaks with the same unfolding
+order. However, in most experiments, the tethering of the polymer to
+the AFM tip is by nonspecific adsorption; as a result, the polymers
+being stretched between the tip and the substrate have various
+lengths. In addition, the interactions between the tip and the
+surface often cause irregular features in the beginning of the force
+curve (\cref{fig:expt-sawtooth}), making the identification of the
+first peak uncertain. Furthermore, it is often difficult to acquire a
+large amount of data in single molecules experiments. These
+difficulties make the aforementioned data analysis approach unfeasible
+for many mechanical unfolding experiments. As a result, the values of
+all force peaks from polymers of different lengths are often pooled
+together for statistical analysis. To assess the errors caused by
+such pooling, simulation data were analyzed using different pooling
+methods and the results were compared. \Cref{fig:sawsim:sim-hist} shows
+that, for a polymer with eight protein molecules, the average
+unfolding force is $281\U{pN}$ with a standard deviation of $25\U{pN}$
+when all data is pooled. If only the first peaks in the force curves
+are analyzed, the average force is $279\U{pN}$ with a standard
+deviation of $22\U{pN}$. While for the fourth and eighth peaks, the
+average force are $275\U{pN}$ and $300\U{pN}$, respectively, and the
+standard deviations are $23\U{pN}$ and $25\U{pN}$, respectively. As
+expected from the Gumbel distribution, the width of the unfolding
+force distribution (insets in \cref{fig:sawsim:order-dep}) is only weakly
+effected by unfolding order, but the average unfolding force can be
+quite different for the same protein because of the differences in
+unfolding order and polymer length.
+
+\begin{figure}
+\begin{center}
+\includegraphics{sawsim/order-dep}
+\caption{The dependence of the unfolding force on the temporal
+ unfolding order for four polymers with $4$, $8$, $12$, and $16$
+ molecules of identical proteins. Each point in the figure is the
+ average of $400$ data points. The first point in each curve
+ represents the average of only the first peak in each of the $400$
+ simulated force curves, the second point represents the average of
+ only the second peak, and so on. The solid lines are fits of
+ \cref{eq:sawsim:order-dep} to the simulated data, with best fit
+ $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
+ respectively, for lengths $4$ through $16$. The insets show the
+ force distributions of the first, fourth, and eighth peaks, left to
+ right, for the polymer with eight protein molecules. The parameters
+ used for generating the data were the same as those used for
+ \cref{fig:sawsim:sim-sawtooth}, except the polymer length, and the
+ histograms in the insets were normalized in the same way as in
+ \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
+\end{center}
+\end{figure}
+
+\subsection{The effect of polymer inhomogeneity}
+\label{sec:sawsim:results-folded-tension}
+
+The unfolded polypeptide chain has been shown to follow the WLC model
+quite well, though other polymer models, such as the Freely-Jointed
+Chain (FJC)\citep{verdier70}, can be used to fit the force-extension
+relationship\citep{janshoff00}. A short chain of folded proteins,
+however, cannot be described well by polymer models. Several studies
+have used WLC and FJC models to fit the elastic properties of the
+modular protein titin\citep{granzier97,linke98a},
+% TODO: check it really is folded domains \& bulk titin
+but native titin contains hundreds of folded and unfolded domains
+domains. For the short protein polymers common in mechanical
+unfolding experiments, the cantilever dominates the elasticity of the
+polymer-cantilever system before any protein molecules unfold. After
+the first unfolding event occurs, the unfolded portion of the chain is
+already longer and softer than the sum of all the remaining folded
+domains, and dominates the elasticity of the whole chain. Therefore,
+the details of the tension model chosen for the folded domains has
+negligible effect on the unfolding forces, which was also suggested by
+\citet{staple08}. Force curves simulated using different models to
+describe the folded domains yielded almost identical unfolding force
+distributions (data not shown).
+
+\subsection{The effect of cantilever force constant}
+\label{sec:sawsim:cantilever}
+
+In mechanical unfolding experiments, the ability to observe the
+unfolding of a single protein molecule depends on the tension drop
+after an unfolding event such that another molecule does not unfold
+immediately. The magnitude of this drop is determined by many
+factors, including the magnitude of the unfolding force, the contour
+and persistence lengths of the protein polymer, the contour length
+increase from unfolding, and the stiffness (force constant) of the
+cantilever. Among these, the effect of the cantilever force constant
+is particularly interesting because cantilevers with a wide range of
+force constants are available. In addition different single molecule
+manipulation techniques, such as the AFM and laser tweezers, differ
+mainly in the range of the spring constants of their force
+transducers. \Cref{fig:sawsim:kappa-sawteeth} shows the simulated force
+curves from pulling an octamer of protein molecules using cantilevers
+with different force constants, while other parameters are identical.
+For this model protein, the appearance of the force curve does not
+change much until the force constant of the cantilever reaches a
+certain value ($\kappa_c=50\U{pN/nm}$). When $\kappa_c$ is lower than
+this value, the individual unfolding events become less identifiable.
+In order to observe individual unfolding events, the cantilever needs
+to have a force constant high enough so that the bending at the
+maximum force is small in comparison with the contour length increment
+from the unfolding of a single molecule. \Cref{fig:sawsim:kappa-sawteeth}
+also shows that the back side of the force peaks becomes more tilted
+as the cantilever becomes softer. This is due to the fact that the
+extension (end-to-end distance) of the protein polymer has a large
+sudden increase as the tension rebalances after an unfolding event.
+
+It should also be mentioned that the contour length increment from
+each unfolding event is not equal to the distance between adjacent
+peaks in the force curve because the chain is never fully stretched.
+This contour length increase can only be obtained by fitting the curve
+to WLC or other polymer models (\cref{fig:expt-sawtooth}).
+
+\begin{figure}
+\begin{center}
+\includegraphics{sawsim/kappa-sawteeth}
+\caption{Simulated force curves obtained from pulling a polymer with
+ eight protein molecules using cantilevers with different force
+ constants $\kappa_c$. Parameters used in generating these curves
+ are the same as those used in \cref{fig:sawsim:sim-all}, except the
+ cantilever force constant.\label{fig:sawsim:kappa-sawteeth}}
+\end{center}
+\end{figure}
+
+\subsection{Determination of $\Delta x_u$ and $k_{u0}$}
+\label{sec:sawsim:results-fitting}
+
+The zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$
+from the native state to the transition state are the two kinetic
+parameters obtainable for mechanical unfolding experiments by matching
+the simulated data with measured results. \cref{fig:sawsim:v-dep} shows the
+dependence of the unfolding force on the pulling speed for different
+values of $k_{u0}$ and $\Delta x_u$. As expected, the unfolding force
+increases linearly with the pulling speed in the linear-log
+plot\citep{evans99}. While the magnitude of the unfolding forces is
+affected by both $k_{u0}$ and $\Delta x_u$, the slope of speed
+dependence is primarily determined by $\Delta x_u$.
+\Cref{fig:sawsim:width-v-dep} shows that the width of the unfolding force
+distribution is very sensitive to $\Delta x_u$, as expected from the
+Gumbel distribution discussed in \cref{sec:sawsim:results-scaffold}. To
+obtain the values of $k_{u0}$ and $\Delta x_u$ for the protein, the
+pulling speed dependence and the distribution of the unfolding forces
+from simulation, such as those shown in \cref{fig:sawsim:v-dep} and the
+insets of \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
+measured results. The values of $k_{u0}$ and $\Delta x_u$ that
+provide the best match are designated as the parameters describing the
+protein under study. Since $k_{u0}$ and $\Delta x_u$ affect the
+unfolding forces differently, the values of both parameters can be
+determined simultaneously. The data used in plotting
+\cref{fig:sawsim:all-v-dep} includes all force peaks from the simulated force
+curves because most experimental data is analyzed that way.
+
+In most published literature, determination of the values of $k_{u0}$
+and $\Delta x_u$ was mostly done by carrying out simulations using a
+handful of possible unfolding parameters and selected the best fit by
+eye%
+%\citep{us,CV1999}
+. This approach does not allow estimation of uncertainties in the
+fitting parameters, as shown by \citet{best02}. A more rigorous
+approach involves quantifying the quality of fit between the
+experimental and simulated force distributions, allowing the use of a
+numerical minimization algorithm to pick the best fit parameters. We
+use the Jensen-Shannon divergence\citep{sims09,lin91}, a measure of
+the similarity between two probability distributions.
+\begin{equation}
+ D_\text{JS}(p_e, p_s)
+ = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;, \label{eq:sawsim:D_JS}
+\end{equation}
+where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
+bin in the experimental and simulated unfolding force histograms,
+respectively. $D_\text{KL}$ is the Kullback-Leibler divergence
+\begin{equation}
+ D_\text{KL}(p_p,p_q)
+ = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;, \label{eq:sawsim:D_KL}
+\end{equation}
+where the sum is over all unfolding force histogram bins. $p_m$ is
+the symmetrized probability distribution
+\begin{equation}
+ p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.
+\end{equation}
+% DONE: Mention inter-histogram normalization? no.
+% For experiments carried out over a series of pulling velocities, we
+% simply sum residuals computed for each velocity, although it would
+% also be reasonable to weight this sum according to the number of
+% experimental unfolding events recorded for each velocity.
+% DONE: mention $D_\text{JS}$ features to explains selection over $\chi^2$? no.
+\Cref{fig:sawsim:fit-space} shows the Jensen-Shannon divergence calculated
+using \cref{eq:sawsim:D_JS} between an experimental data set and simulation
+results obtained using a range of values of $k_{u0}$ and $\Delta x_u$.
+There is an order of magnitude range of $k_{u0}$ that produce
+reasonable fits to experimental data (\cref{fig:sawsim:fit-space}), which is
+consistent with the results \citet{best02} obtained using a chi-square
+test. The values of $k_{u0}$ and $\Delta x_u$ can be determined to
+higher precision by using both the pulling speed dependent data and
+the unfolding force distribution, as well as any relevant information
+about the protein from other sources.
+
+\begin{figure}
+\begin{center}
+\subfloat[][]{\includegraphics{sawsim/v-dep}\label{fig:sawsim:v-dep}%
+} \\
+\subfloat[][]{\includegraphics{sawsim/v-dep-sd}\label{fig:sawsim:width-v-dep}%
+}
+\caption{(a) The dependence of the unfolding forces on the pulling
+ speed for three different model protein molecules characterized by
+ the parameters $k_{u0}$ and $\Delta x_u$. The polymer length is
+ eight molecules, and each symbol is the average of $3200$ data
+ points. (b) The dependence of standard deviation of the unfolding
+ force distribution on the pulling speed for the simulation data
+ shown in (a), using the same symbols. The insets show the force
+ distribution histograms for the three proteins at the pulling speed
+ of $1\U{$\mu$m/s}$. The left, middle and right histograms are for
+ the proteins represented by the top, middle, and bottom lines in (a),
+ respectively.\label{fig:sawsim:all-v-dep}}
+\end{center}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\includegraphics{sawsim/fit-space}
+\caption{Fit quality between an experimental data set and simulated
+ data sets obtained using various values of unfolding rate parameters
+ $k_{u0}$ and $\Delta x_u$. The experimental data are from octameric
+ ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04}, and the other
+ model parameters are the same as those in \cref{fig:sawsim:sim-all}. The
+ best fit parameters are $\Delta x_u=0.17\U{nm}$ and
+ $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$. The simulation histograms were
+ built from $400$ pulls at for each parameter pair. The color scale
+ shown on the right is $log_{10}(D_\text{JS})$.
+\label{fig:sawsim:fit-space}}
+\end{center}
+\end{figure}
\chapter{Monte Carlo mechanical unfolding simulation}
+\label{sec:sawsim}
-\section{Introduction}
-\label{sec:sawsim:intro}
-
-% why single molecule approach?
-In biological systems the most important molecules, such as proteins,
-nucleic acids, and polysaccharides, are all polymers. Understanding
-the properties and functions of these polymeric molecules is crucial
-in elucidating the molecular mechanisms of structures and processes in
-cells. The large size of these molecules imposes certain limitations
-on the information attainable from bulk measurements, because the
-macromolecules in a population can have diverse conformations and
-react differently to external stimuli. The individualized, and
-sometimes rare, behaviors of macromolecules can have important
-implications for their functions inside the cell. Recently developed
-single molecule techniques, in which these macromolecules are studied
-one at a time, have provided important and complementary information
-about the functional mechanisms of several biological
-systems\citep{bustamante08}.
-
-% why AFM & what an AFM is
-Single molecule techniques for the study of biological macromolecules
-include optical measurements, \ie, single molecule fluorescence
-microscopy and spectroscopy, and mechanical manipulations of
-individual macromolecules, \ie, force microscopy and spectroscopy
-using atomic force microscopes (AFM), laser tweezers\citep{forde02},
-magnetic tweezers\citep{smith92}, or biomembrane force
-probes\citep{merkel99}. Of these mechanical manipulation methods, AFM
-is the most widely used due to the availability of user-friendly
-commercial instruments. AFM has been employed on several types of
-biological macromolecules, such as mechanically unfolding
-proteins\citep{carrion-vazquez99b} and forcing structural transitions
-in DNA\citep{rief99} and polysaccharides\citep{rief97a}. An AFM uses
-a sharp tip integrated at the end of a cantilever to interact with the
-sample. Cantilever bending is measured by a laser reflected off the
-cantilever and incident on a position sensitive photodetector. When
-the bending force constant of the cantilever is known\citep{levy02},
-the force applied to the sample can be calculated. The forces that
-can be applied and measured with an AFM range from tens of piconewtons
-to hundreds of nanonewtons. The investigation of the unfolding and
-refolding processes of individual protein molecules by the AFM is
-feasible because many globular proteins unfold under external forces
-in this range. Since elucidating the mechanism of protein folding is
-currently one of the most important problems in biological sciences,
-the potential of the AFM for revealing significant and unique
-information about protein folding has stimulated much effort in both
-experimental and theoretical research.
-
-% AFM unfolding procedure
-In a mechanical unfolding experiment, a protein polymer is tethered
-between two surfaces: a flat substrate and an AFM tip. The polymer is
-stretched by increasing the separation between the two surfaces
-(\cref{fig:sawsim:schematic}). The most common mode is the constant speed
-experiment in which the substrate surface is moved away from the tip
-at a uniform rate. The tethering surfaces, \ie, the AFM tip and the
-substrate, have much larger radii of curvature than the dimensions of
-single domain globular proteins that are normally used for folding
-studies. This causes difficulties in manipulating individual protein
-molecules because nonspecific interactions between the AFM tip and the
-substrate may be stronger than the forces required to unfold the
-protein when the surfaces are a few nanometers apart. To circumvent
-these difficulties, globular protein molecules are linked into
-polymers, which are then used in the AFM
-studies\citep{carrion-vazquez99b,chyan04,carrion-vazquez03}. When
-such a polymer is pulled from its ends, each protein molecule feels
-the externally applied force, which increases the probability of
-unfolding by reducing the free energy barrier between the native and
-unfolded states. The unfolding of one molecule in the polymer causes
-a sudden lengthening of the polymer chain, which reduces the force on
-each protein molecule and prevents another unfolding event from
-occurring immediately. The force versus extension relationship, or
-\emph{force curve}, shows a typical sawtooth pattern
-(\cref{fig:sawsim:expt-sawtooth}), where each peak corresponds to the
-unfolding of a single protein in the polymer. Therefore, the
-individual unfolding events are separated from each other in space and
-time, facilitating single molecule studies.
-
-% AFM unfolding analysis, what we'll do.
-Much theoretical and computational work has been done in order to
-extract information about the structural, kinetic, and energetic
-properties of the protein molecules from the experimental data of
-force-induced protein unfolding measurements. Steered molecular
-dynamics simulations\citep{lu98}, as well as calculations and
-simulations using lattice\citep{lu99} and off-lattice
-models\citep{klimov00,li01}, have provided insights into structural
-and energetic changes during force-induced protein unfolding.
-However, these simulations often involve time scales that are orders
-of magnitude smaller than those of the experiments, and the parameters
-used in the calculations are often neither experimentally controllable
-nor measurable. As a result, a Monte Carlo simulation approach based
-on a simple two-state kinetic model for the protein is usually used to
-analyze data from mechanical unfolding experiments. A comparison of
-the force curves measured experimentally and those generated from
-simulations can yield the unfolding rate constant of the protein in
-the absence of force as well as the distance from the native state to
-the transition state along the pulling direction. The Monte Carlo
-simulation method has been used since the first report of mechanical
-unfolding experiment using AFM\citep{rief97b},
-%,rief97a,rief98,carrion-vazquez99b,best02,zinober02,jollymore09},
-however, a comprehensive description and discussion of the simulation
-procedures and the intricacies involved has not been reported. In
-this paper, we provide a detailed description of the simulation
-procedure, including the theories, approximations, and assumptions
-involved. We also explain the procedure for extracting kinetic
-properties of the protein from experimental data and introduce a
-quantitative measure of fit quality between simulation and
-experimental results. In addition, the effects of various
-experimental parameters on force curve appearance are demonstrated,
-and the errors associated with different methods of data pooling are
-discussed. We believe that these results will be useful in
-experimental design, artifact identification, and data analysis for
-single molecule mechanical unfolding experiments.
-
-\begin{figure}
-\begin{center}
-\subfloat[][]{\includegraphics{sawsim/schematic}\label{fig:sawsim:schematic}}
-%\hspace{.25in}%
-\subfloat[][]{\includegraphics{sawsim/expt-sawtooth}\label{fig:sawsim:expt-sawtooth}}
-\caption{(a) Schematic of the experimental setup for mechanical
- unfolding of proteins using an AFM (not to scale). An experiment
- starts with the tip in contact with the substrate surface, which is
- then moved away from the tip at a constant speed. $x_t$ is the
- distance traveled by the substrate, $x_c$ is the cantilever
- deflection, $x_u$ is the extension of the unfolded polymer, and
- $x_f=x_{f1}+x_{f2}$ is the extension of the folded polymer. (b) An
- experimental force curve from stretching a ubiquitin polymer with
- the rising parts of the peaks fitted to the WLC
- model\citep{chyan04}. The pulling speed used was $1\U{$\mu$m/s}$.
- The irregular features at the beginning of the curve are due to
- nonspecific interactions between the tip and the substrate surface,
- and the last high force peak is caused by the detachment of the
- polymer from the tip or the substrate surface. Note that the
- abscissa is the extension of the protein chain $x_t-x_c$.}
-\end{center}
-\end{figure}
-
-\section{Methods}
-\label{sec:sawsim:methods}
-
-% simulation overview
-In simulating the mechanical unfolding process, a force curve is
-generated by calculating the amount of the cantilever bending as the
-substrate surface moves away from the tip. The cantilever bending is
-obtained by balancing the tension in the protein polymer and the
-Hookean force of the bent cantilever. The unfolding probability of
-the protein molecules in the polymer is then calculated for that
-tension, and whether an unfolding event occurs is determined according
-to a Monte Carlo method. The simulation was implemented in
-C\footnote{Source code available at
- \url{http://www.physics.drexel.edu/~wking/sawsim/}}.
-
-\subsection{Generating force curves}
-\label{sec:sawsim:methods-tension}
-
-% introduce domains and groups.
-The fundamental abstraction of the simulation is the ``domain'', which
-represents a discrete chunk of the flexible chain between the
-substrate and the cantilever holder. Each of these domains is
-assigned a particular state; for example, the domain representing the
-cantilever is assigned to the ``cantilever'' state, and the domains
-representing protein molecules are assigned to either the ``folded''
-or the ``unfolded'' state. When balancing the tension along the
-chain, we assume that the spatial order of domains along the chain is
-irrelevant\citep{li00}, and therefore, the domains can be rearranged
-and grouped by state. To determine the tension in the chain and the
-amount of cantilever bending when $n$ states are populated, a system
-of $n+1$ equations with $n+1$ unknowns must be solved
-\begin{align}
- F_i(x_i) &= F_t \label{eq:sawsim:tension-balance} \\
- \sum_i x_i &= x_t \;, \label{eq:sawsim:x-total}
-\end{align}
-where $F$ are tensions, $x$ are extensions, and the subscripts $i$ and
-$t$ represent a particular state group and the total chain
-respectively (\cref{fig:sawsim:schematic}). From this $F(x_t)$ may be
-computed using any multi-dimensional root-finding algorithm.
-
-% introduce particular models, and mention parameter aggregation
-Inside this framework, we choose a particular extension model
-$F_i(x_i)$ for each domain state. Cantilever elasticity is described
-by Hooke's law, which gives
-\begin{equation}
- F = \kappa_c x_c \;, \label{eq:sawsim:hooke}
-\end{equation}
-where $\kappa_c$ is the bending spring constant and $x_c$ is the
-deflection of the cantilever (\cref{fig:sawsim:schematic}). Unfolded domains
-are modeled as a Worm-Like Chain (WLC)\citep{marko95,bustamante94}, in
-which the tension $F$ is related to its extension (end-to-end
-distance) $x_u$ by
-\begin{equation}
- F = \frac{k_B T}{p_u}
- \p[{ \frac{1}{4}\p({ \frac{1}{(1-x_u/L_u)^2} - 1 })
- + \frac{x_u}{L_u} }] \;,
- \label{eq:sawsim:wlc}
-\end{equation}
-where $p_u$ is the persistence length and $L_u=N_uL_{u1}$ is the total
-contour length of the unfolded chain. The chain of $N_f$ folded
-domains is modeled as a string free to assume any extension up to some
-fixed contour length $L_f=N_fL_{f1}$
-\begin{equation}
- F = \begin{cases}
- 0 & \text{if $x_f<L_f$} \;, \\
- \infty & \text{if $x_f>L_f$} \;,
- \end{cases} \label{eq:sawsim:piston}
-\end{equation}
-where $L_{f1}$ is the separation of the two linking points of a folded
-domain, and $x_f$ is the end-to-end length of the chain of folded
-domains. In this model, any non-zero tension will fully extend these
-folded domains. As discussed in \cref{sec:sawsim:results-folded-tension},
-the contribution of the folded domains to the elastic behavior of the
-polymer-cantilever system is relatively insignificant.
-
-% address assumptions & caveats
-In the simulation, the protein polymer is assumed to be stretched in
-the direction perpendicular to the surface, which is a good
-approximation in most experimental situations, because the unfolded
-length of a protein molecule is much larger than that of the folded
-form. Therefore, after one molecule unfolds, the polymer becomes much
-longer and the angle between the polymer and the surface approaches
-$90$ degrees\citep{carrion-vazquez00}. The joints between domain
-groups are assumed to lie along a line between the surface tether
-point and the position of the tip (\cref{eq:sawsim:x-total}). The effects of
-this assumption are also minimized due to greater length of the
-unfolded domain. Finally, the interactions between different parts of
-the polymer and between the chain and the surface (except at the
-tethering points) are not considered. This is reasonable since these
-interactions should not make substantial contributions to the force
-curve at the force levels of interest, where the polymer is in a
-relatively extended conformation.
-
-% introduce constant velocity and walk through explicit example pull
-Consider an experiment of pulling a polymer with $N$ identical protein
-molecules at a constant speed. At the start of an experiment, the
-chain is unstretched ($x_t=0$), which means all the domains are
-unstretched, the cantilever is undeflected, and the tip is in contact
-with the surface. There is one domain in the cantilever state, $N$ in
-the folded state, and none in the unfolded state. As the surface
-moves away from the tip at a constant speed $v$, the chain becomes
-more extended (\cref{fig:sawsim:schematic}), such that
-\begin{equation}
- x_t = \sum_i x_i = vt \label{eq:sawsim:const-v} \;.
-\end{equation}
-The simulation assumes that the pulling takes discrete steps in space
-and treats $x_t$ as constant over the duration of one time step
-$\Delta t$. Because of the adaptive time steps discussed in
-\cref{sec:sawsim:methods-timesteps}, the space steps $\Delta x_t = v\Delta t$
-may have different sizes, but each step will be ``small''. At each
-step, the total extension is calculated using \cref{eq:sawsim:const-v}, and
-the tension $F(x_t=vt)$ is determined by numerically solving
-\cref{eq:sawsim:tension-balance,eq:sawsim:x-total} using the models
-\cref{eq:sawsim:hooke,eq:sawsim:wlc,eq:sawsim:piston} for known values of the parameters in
-the various states $(N_u, N_f, v, \kappa_c, L_{f1}, L_{u1}, p_u)$.
-When one of the molecules in the polymer unfolds
-(\cref{sec:sawsim:methods-unfolding}), there will be one domain in the
-unfolded state and $N-1$ in the folded state. In the next step, a
-newly balanced tension between the cantilever and the polymer is
-determined by solving for $F(x_t)$ as discussed above, but with the
-total extension $x_t$ incremented by $v\Delta t$ and the new unfolded
-contour length $L_{u1}$ and folded contour length $(N-1)L_{f1}$. The
-sudden lengthening of the polymer chain results in a corresponding
-abrupt drop in the force, leading to the formation of one sawtooth in
-the force curve. As the pulling continues and more domains unfold,
-force curves with a series of sawteeth are generated
-(\cref{fig:sawsim:sim-sawtooth}).
-
-The tension calculation assumes an equilibrated chain, so
-consideration must be given to the chain's relaxation time, which
-should be short compared to the loading timescale. The relaxation
-time for a WLC is given by
-\begin{equation}
- \tau \approx \eta \frac{k_BT p}{F^2}
-% < \eta \frac{k_BT p}{(k_BTx/pL)^2} =
-% Note: < because F > k_BTx/pL
-% < \frac{\eta p^3 L^2}{k_BT x^2}
-% < \frac{\eta p^2 L}{k_BT} % for x/L > \sqrt{p/L}
- \;, \label{eq:sawsim:tau-wlc}
-\end{equation}
-where $\eta$ is the dynamic viscosity, $F$ is the tension, and $p$ is
-the persistence length\citep{evans99}. For forces greater than
-$1\U{pN}$, with $\eta_\text{water}/k_BT=2.45\E{-10}\U{s/nm$^3$}$,
-$\tau<2\U{ns}$ for the protein polymer used in the simulation.
-% python -c 'print 2.45e-10*(1e9)**3 * (1.38e-23*300)**2 * 0.4e-9 / (1e-12)**2'
-% 1.68...e-09
-Therefore, the polymer chain is equilibrated almost instantaneously
-within a time step, which is on the order of tens of microseconds.
-The relaxation time of the cantilever can be determined by measuring
-the cantilever deflection induced by liquid motion and fitting the
-time dependence of the deflection to an exponential
-function\cite{jones05}. For a $200\U{$\mu$m}$ rectangular cantilever
-with a bending spring constant of $20\U{pN/nm}$, the measured
-relaxation time in water is $\sim50\U{$\mu$/s}$ (data not shown).
-This relatively large relaxation time constant makes the cantilever
-act as a low-pass filter and also causes a lag in the force
-measurement.
-
-\subsection{Unfolding protein molecules by force}
-\label{sec:sawsim:methods-unfolding}
-
-% introduce Bell, probability calculations, and MC comparison
-According to the theory developed by \citet{bell78} and extended by
-\citet{evans99}, an external stretching force $F$ increases the
-unfolding rate constant of a protein molecule
-\begin{equation}
- k_u = k_{u0} \exp\p({\frac{F\Delta x_u}{k_B T}}) \;, \label{eq:sawsim:bell}
-\end{equation}
-where $k_{u0}$ is the unfolding rate in the absence of an external
-force, and $\Delta x_u$ is the distance between the native state and
-the transition state along the pulling direction. The probability for
-a protein molecule to unfold under an applied force is
-\begin{equation}
- P_1 = k_u \Delta t \;, \label{eq:sawsim:prob-one}
-\end{equation}
-where $\Delta t$ is the time duration for each pulling step, over
-which $F$ is constant. This expression is accurate for $P_1 \ll 1$.
-From the binomial distribution, the probability of at least one of a
-group of $N_f$ identical domains to unfold in a given time step is
-\begin{equation}
- P = 1 - (1-P_1)^{N_f} \approx N_fP_1 \;, \label{eq:sawsim:prob-n}
-\end{equation}
-where the approximation is valid when $N_fP_1 \ll 1$. To determine if
-an unfolding event occurs in a particular time step, the probability
-calculated using \cref{eq:sawsim:prob-n} is compared with a randomly
-generated number uniformly distributed between $0$ and $1$. If $P$
-is bigger than the random number, a domain unfolds, changing the
-population of each tension state, and a new balance between the
-polymer and the cantilever is determined. If no unfolding event
-occurs the pulling continues and the unfolding probability is
-calculated again in the next step at a higher force. When all the
-molecules in the polymer have unfolded, the pulling continues until a
-pre-determined force level is reached, where the polymer is assumed to
-detach from one of the tethering surfaces. The cantilever deflection
-becomes zero after this point.
-
-% address unfolding models
-Although the Bell model (\cref{eq:sawsim:bell}) is the most widely used
-unfolding model due to its simplicity and its applicability to various
-biopolymers\citep{rief98}, other theoretical models have been proposed
-to interpret mechanical unfolding data. For example,
-\citet{schlierf06} used the mechanical unfolding data of the protein
-ddFLN4 to demonstrate that Kramers' diffusion model fit the measured
-unfolding force data better than the Bell model for proteins with
-broad free energy barriers. For proteins with relatively narrow
-folded and transition states, the Bell model provides a good
-approximation.
-
-\subsection{Choosing the simulation time steps}
-\label{sec:sawsim:methods-timesteps}
-
-The demands on the time step vary throughout a simulated pull due to
-the non-linear elasticity of the polymer. Within a specified time
-duration (or pulling distance), the force change is small at low force
-levels and large at high force levels. To be efficient, the
-simulation algorithm adapts the time step to keep the time steps large
-where large time steps have little effect, while shrinking the time
-step where smaller steps are necessary.
-
-Within each time step, the total chain extension $x_t$ is treated as a
-constant and a force balance is reached very quickly among the various
-domains (see \cref{sec:sawsim:methods-tension} for equilibration timescales).
-This force is used to determine the unfolding probability
-(\cref{eq:sawsim:prob-one,eq:sawsim:prob-n}), which determines the domain state
-populations in the next time step. Therefore, the chain tension must
-not change appreciably over the course of the time step ($\Delta F <
-1\U{pN}$), and the unfolding probability is only calculated once for
-the entire step. The time step must also be short enough that the
-probability of unfolding in a single time step is low ($P<10^{-3}$).
-Besides ensuring that the approximations made in
-\cref{eq:sawsim:prob-one,eq:sawsim:prob-n} are valid, this restriction makes
-time steps which should have multiple unfoldings in a single time step
-highly unlikely. Experimentally measured unfolding are temporally
-supered, because the unfolding transition is characterized by
-multiple, Markovian attempts over a large energy barrier, where the
-probability of crossing the barrier in a single attempt is very low.
-A successful attempt quickly extends the chain contour length,
-reducing the tension, dramatically reducing the likelihood of a second
-escape in that time step. The time step used is recalculated for each
-step so that both of these criteria are satisfied.
-
-\section{Results and Discussion}
-\label{sec:sawsim:results}
-
-\subsection{Force curves generated by simulation}
-\label{sec:sawsim:results-force_curves}
-
-\Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
-pulling a polymer composed of eight identical protein molecules using
-parameters from typical experimental settings. The order of the peaks
-in the force curves reflects the temporal sequence of the unfolding
-events instead of the positions of the protein molecules in the
-polymer\cite{li00}. As observed experimentally
-(\cref{fig:sawsim:expt-sawtooth}), the forces at which identical protein
-molecules unfold fluctuate, revealing the stochastic nature of protein
-unfolding since no instrumental noise is included in the simulation.
-\Cref{fig:sawsim:sim-hist} shows the distribution of the unfolding forces,
-\ie, the highest force in each peak (except the last peak in a force
-curve), from a total of $400$ force curves ($3200$ force values). The
-unfolding forces have an average of $281\U{pN}$ with a standard
-deviation of $25\U{pN}$.
-% DONE: discuss noise? no.
-
-\begin{figure}
-\vspace{-1in}
-\begin{center}
-\subfloat[][]{\includegraphics{sawsim/sim-sawtooth}\label{fig:sawsim:sim-sawtooth}%
-}\\
-\subfloat[][]{\includegraphics{sawsim/sim-hist}\label{fig:sawsim:sim-hist}%
-}
-\caption{(a) Three simulated force curves from pulling a polymer of
- eight identical protein molecules. The simulation was carried out
- using the parameters: pulling speed $v=1\U{$\mu$m/s}$, cantilever
- spring constant $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$,
- persistence length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
- x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$. The contour
- length between the two linking point on a protein molecule is
- $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in the
- unfolded form. These parameters are those of ubiquitin molecules
- connected through the N-C termini\citep{chyan04,carrion-vazquez03}.
- Detachment from the tip or substrate is assumed to occur at a force
- of $400\U{pN}$. In experiments, detachments have been observed to
- occur at a variety of forces. (b) The distribution of the unfolding
- forces from $400$ simulated force curves ($3200$ data points) such
- as that shown in (a). The frequency is normalized by the total
- number of points, \ie, the height of each bin is equal to the number
- of data points in that bin divided by the total number of data
- points ($3200$, for this histogram).\label{fig:sawsim:sim-all}}
-\end{center}
-\end{figure}
-
-\subsection{Dependence of the unfolding force on the unfolding order
-and polymer length}
-\label{sec:sawsim:results-scaffold}
-
-Analysis of the mechanical unfolding data is complicated by the
-dependence of the average unfolding force on the unfolding order due
-to the serial linkage of the molecules. Under an external stretching
-force $F$, the probability of some domain unfolding in a polymer with
-$N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is higher
-than the unfolding probability for a single molecule $P_1$.
-Consequently, the average unfolding force is lower for the earlier
-unfolding events when $N_f$ is larger, and the force should increase
-as more and more molecules become unfolded. However, there is a
-competing factor that opposes this trend. As the protein molecules
-unfold, the chain becomes softer and the force loading rate becomes
-lower when the pulling speed is constant, leading to a decrease in the
-unfolding force. The dependence of the average unfolding force on the
-unfolding order is the result of these two opposing effects.
-\Cref{fig:sawsim:order-dep} shows the dependence of the average unfolding
-force on the unfolding force peak order (the temporal order of
-unfolding events) for four polymers with $4$, $8$, $12$, and $16$
-identical protein molecules. The effect of polymer chain softening
-dominates the initial unfolding events, and the average unfolding
-force decreases as more molecules unfold. After several molecules
-have unfolded, the softening for each additional unfolding event
-becomes less significant, the change in unfolding probability becomes
-dominant, and the unfolding force increases upon each subsequent
-unfolding event\citep{zinober02}.
-
-We validate this explanation by calculating the unfolding force
-probability distribution's depending on the two competing factors.
-The rate of unfolding events with respect to force is
-\begin{align}
- r_{uF} &= -\deriv{F}{N_f}
- = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
- = \frac{N_f k_u}{\kappa v} \\
- &= \frac{N_f k_{u0}}{\kappa v}\exp\p({\frac{F\Delta x_u}{k_B T}})
- = \frac{1}{\rho}\exp\p({\frac{F-\alpha}{\rho}}) \;,
-\end{align}
-where $N_f$ is the number of folded domain,
-$\kappa=(1/\kappa_c+N_u/\kappa_\text{WLC})^{-1}$ is the spring
-constant of the cantilever-polymer system ($\kappa_\text{WLC}$ is the
-effective spring constant of one unfolded domain, assumed constant for
-a particular polymer/cantilever combination), $\kappa v$ is the force
-loading rate, and $k_u$ is the unfolding rate constant
-(\cref{eq:sawsim:bell}). In the last expression, $\rho\equiv k_BT/\Delta
-x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$. The event
-probability density for events with an exponentially increasing
-likelihood function follows the Gumbel (minimum) probability
-density\cite{NIST:gumbel}, with $\rho$ and $\alpha$ being the scale
-and location parameters, respectively
-\begin{equation}
- \mathcal{P}(F) = \frac{1}{\rho} \exp\p[{\frac{F-\alpha}{\rho}
- -\exp\p({\frac{F-\alpha}{\rho}})
- }] \;. \label{eq:sawsim:gumbel}
-\end{equation}
-The distribution has a mean $\avg{F}=\alpha-\gamma_e\rho$ and a
-variance $\sigma^2=\pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the
-Euler-Mascheroni constant. Therefore, the unfolding force
-distribution has a variance $\sigma^2 = (\pi k_BT/\Delta x_u)^2/6$,
-and and average
-\begin{equation}
- \avg{F(i)} = \frac{k_BT}{\Delta x_u}
- \p[{-\ln\p({\frac{N_fk_{u0}k_BT}{\kappa v\Delta x_u}})
- -\gamma_e}] \;, \label{eq:sawsim:order-dep}
-\end{equation}
-where $N_f$ and $\kappa$ depend on the domain index $i=N_u$. Curves based
-on this formula fit the simulated data remarkably well considering the
-effective WLC stiffness $\kappa_\text{WLC}$ is the only fitted
-parameter, and that the actual WLC stiffness is not constant, as we
-have assumed here, but a non-linear function of $F$.
-
-From \cref{fig:sawsim:order-dep}, it can be seen that the proper way to
-process data from mechanical unfolding experiments is to group the
-curves according to the length of the polymer and to perform
-statistical analysis separately for peaks with the same unfolding
-order. However, in most experiments, the tethering of the polymer to
-the AFM tip is by nonspecific adsorption; as a result, the polymers
-being stretched between the tip and the substrate have various
-lengths. In addition, the interactions between the tip and the
-surface often cause irregular features in the beginning of the force
-curve (\cref{fig:sawsim:expt-sawtooth}), making the identification of the
-first peak uncertain. Furthermore, it is often difficult to acquire a
-large amount of data in single molecules experiments. These
-difficulties make the aforementioned data analysis approach unfeasible
-for many mechanical unfolding experiments. As a result, the values of
-all force peaks from polymers of different lengths are often pooled
-together for statistical analysis. To assess the errors caused by
-such pooling, simulation data were analyzed using different pooling
-methods and the results were compared. \Cref{fig:sawsim:sim-hist} shows
-that, for a polymer with eight protein molecules, the average
-unfolding force is $281\U{pN}$ with a standard deviation of $25\U{pN}$
-when all data is pooled. If only the first peaks in the force curves
-are analyzed, the average force is $279\U{pN}$ with a standard
-deviation of $22\U{pN}$. While for the fourth and eighth peaks, the
-average force are $275\U{pN}$ and $300\U{pN}$, respectively, and the
-standard deviations are $23\U{pN}$ and $25\U{pN}$, respectively. As
-expected from the Gumbel distribution, the width of the unfolding
-force distribution (insets in \cref{fig:sawsim:order-dep}) is only weakly
-effected by unfolding order, but the average unfolding force can be
-quite different for the same protein because of the differences in
-unfolding order and polymer length.
-
-\begin{figure}
-\begin{center}
-\includegraphics{sawsim/order-dep}
-\caption{The dependence of the unfolding force on the temporal
- unfolding order for four polymers with $4$, $8$, $12$, and $16$
- molecules of identical proteins. Each point in the figure is the
- average of $400$ data points. The first point in each curve
- represents the average of only the first peak in each of the $400$
- simulated force curves, the second point represents the average of
- only the second peak, and so on. The solid lines are fits of
- \cref{eq:sawsim:order-dep} to the simulated data, with best fit
- $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
- respectively, for lengths $4$ through $16$. The insets show the
- force distributions of the first, fourth, and eighth peaks, left to
- right, for the polymer with eight protein molecules. The parameters
- used for generating the data were the same as those used for
- \cref{fig:sawsim:sim-sawtooth}, except the polymer length, and the
- histograms in the insets were normalized in the same way as in
- \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
-\end{center}
-\end{figure}
-
-\subsection{The effect of polymer inhomogeneity}
-\label{sec:sawsim:results-folded-tension}
-
-The unfolded polypeptide chain has been shown to follow the WLC model
-quite well, though other polymer models, such as the Freely-Jointed
-Chain (FJC)\citep{verdier70}, can be used to fit the force-extension
-relationship\citep{janshoff00}. A short chain of folded proteins,
-however, cannot be described well by polymer models. Several studies
-have used WLC and FJC models to fit the elastic properties of the
-modular protein titin\citep{granzier97,linke98a},
-% TODO: check it really is folded domains \& bulk titin
-but native titin contains hundreds of folded and unfolded domains
-domains. For the short protein polymers common in mechanical
-unfolding experiments, the cantilever dominates the elasticity of the
-polymer-cantilever system before any protein molecules unfold. After
-the first unfolding event occurs, the unfolded portion of the chain is
-already longer and softer than the sum of all the remaining folded
-domains, and dominates the elasticity of the whole chain. Therefore,
-the details of the tension model chosen for the folded domains has
-negligible effect on the unfolding forces, which was also suggested by
-\citet{staple08}. Force curves simulated using different models to
-describe the folded domains yielded almost identical unfolding force
-distributions (data not shown).
-
-\subsection{The effect of cantilever force constant}
-\label{sec:sawsim:cantilever}
-
-In mechanical unfolding experiments, the ability to observe the
-unfolding of a single protein molecule depends on the tension drop
-after an unfolding event such that another molecule does not unfold
-immediately. The magnitude of this drop is determined by many
-factors, including the magnitude of the unfolding force, the contour
-and persistence lengths of the protein polymer, the contour length
-increase from unfolding, and the stiffness (force constant) of the
-cantilever. Among these, the effect of the cantilever force constant
-is particularly interesting because cantilevers with a wide range of
-force constants are available. In addition different single molecule
-manipulation techniques, such as the AFM and laser tweezers, differ
-mainly in the range of the spring constants of their force
-transducers. \Cref{fig:sawsim:kappa-sawteeth} shows the simulated force
-curves from pulling an octamer of protein molecules using cantilevers
-with different force constants, while other parameters are identical.
-For this model protein, the appearance of the force curve does not
-change much until the force constant of the cantilever reaches a
-certain value ($\kappa_c=50\U{pN/nm}$). When $\kappa_c$ is lower than
-this value, the individual unfolding events become less identifiable.
-In order to observe individual unfolding events, the cantilever needs
-to have a force constant high enough so that the bending at the
-maximum force is small in comparison with the contour length increment
-from the unfolding of a single molecule. \Cref{fig:sawsim:kappa-sawteeth}
-also shows that the back side of the force peaks becomes more tilted
-as the cantilever becomes softer. This is due to the fact that the
-extension (end-to-end distance) of the protein polymer has a large
-sudden increase as the tension rebalances after an unfolding event.
-
-It should also be mentioned that the contour length increment from
-each unfolding event is not equal to the distance between adjacent
-peaks in the force curve because the chain is never fully stretched.
-This contour length increase can only be obtained by fitting the curve
-to WLC or other polymer models (\cref{fig:sawsim:expt-sawtooth}).
-
-\begin{figure}
-\begin{center}
-\includegraphics{sawsim/kappa-sawteeth}
-\caption{Simulated force curves obtained from pulling a polymer with
- eight protein molecules using cantilevers with different force
- constants $\kappa_c$. Parameters used in generating these curves
- are the same as those used in \cref{fig:sawsim:sim-all}, except the
- cantilever force constant.\label{fig:sawsim:kappa-sawteeth}}
-\end{center}
-\end{figure}
-
-\subsection{Determination of $\Delta x_u$ and $k_{u0}$}
-\label{sec:sawsim:results-fitting}
-
-The zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$
-from the native state to the transition state are the two kinetic
-parameters obtainable for mechanical unfolding experiments by matching
-the simulated data with measured results. \cref{fig:sawsim:v-dep} shows the
-dependence of the unfolding force on the pulling speed for different
-values of $k_{u0}$ and $\Delta x_u$. As expected, the unfolding force
-increases linearly with the pulling speed in the linear-log
-plot\citep{evans99}. While the magnitude of the unfolding forces is
-affected by both $k_{u0}$ and $\Delta x_u$, the slope of speed
-dependence is primarily determined by $\Delta x_u$.
-\Cref{fig:sawsim:width-v-dep} shows that the width of the unfolding force
-distribution is very sensitive to $\Delta x_u$, as expected from the
-Gumbel distribution discussed in \cref{sec:sawsim:results-scaffold}. To
-obtain the values of $k_{u0}$ and $\Delta x_u$ for the protein, the
-pulling speed dependence and the distribution of the unfolding forces
-from simulation, such as those shown in \cref{fig:sawsim:v-dep} and the
-insets of \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
-measured results. The values of $k_{u0}$ and $\Delta x_u$ that
-provide the best match are designated as the parameters describing the
-protein under study. Since $k_{u0}$ and $\Delta x_u$ affect the
-unfolding forces differently, the values of both parameters can be
-determined simultaneously. The data used in plotting
-\cref{fig:sawsim:all-v-dep} includes all force peaks from the simulated force
-curves because most experimental data is analyzed that way.
-
-In most published literature, determination of the values of $k_{u0}$
-and $\Delta x_u$ was mostly done by carrying out simulations using a
-handful of possible unfolding parameters and selected the best fit by
-eye%
-%\citep{us,CV1999}
-. This approach does not allow estimation of uncertainties in the
-fitting parameters, as shown by \citet{best02}. A more rigorous
-approach involves quantifying the quality of fit between the
-experimental and simulated force distributions, allowing the use of a
-numerical minimization algorithm to pick the best fit parameters. We
-use the Jensen-Shannon divergence\citep{sims09,lin91}, a measure of
-the similarity between two probability distributions.
-\begin{equation}
- D_\text{JS}(p_e, p_s)
- = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;, \label{eq:sawsim:D_JS}
-\end{equation}
-where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
-bin in the experimental and simulated unfolding force histograms,
-respectively. $D_\text{KL}$ is the Kullback-Leibler divergence
-\begin{equation}
- D_\text{KL}(p_p,p_q)
- = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;, \label{eq:sawsim:D_KL}
-\end{equation}
-where the sum is over all unfolding force histogram bins. $p_m$ is
-the symmetrized probability distribution
-\begin{equation}
- p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.
-\end{equation}
-% DONE: Mention inter-histogram normalization? no.
-% For experiments carried out over a series of pulling velocities, we
-% simply sum residuals computed for each velocity, although it would
-% also be reasonable to weight this sum according to the number of
-% experimental unfolding events recorded for each velocity.
-% DONE: mention $D_\text{JS}$ features to explains selection over $\chi^2$? no.
-\Cref{fig:sawsim:fit-space} shows the Jensen-Shannon divergence calculated
-using \cref{eq:sawsim:D_JS} between an experimental data set and simulation
-results obtained using a range of values of $k_{u0}$ and $\Delta x_u$.
-There is an order of magnitude range of $k_{u0}$ that produce
-reasonable fits to experimental data (\cref{fig:sawsim:fit-space}), which is
-consistent with the results \citet{best02} obtained using a chi-square
-test. The values of $k_{u0}$ and $\Delta x_u$ can be determined to
-higher precision by using both the pulling speed dependent data and
-the unfolding force distribution, as well as any relevant information
-about the protein from other sources.
-
-\begin{figure}
-\begin{center}
-\subfloat[][]{\includegraphics{sawsim/v-dep}\label{fig:sawsim:v-dep}%
-} \\
-\subfloat[][]{\includegraphics{sawsim/v-dep-sd}\label{fig:sawsim:width-v-dep}%
-}
-\caption{(a) The dependence of the unfolding forces on the pulling
- speed for three different model protein molecules characterized by
- the parameters $k_{u0}$ and $\Delta x_u$. The polymer length is
- eight molecules, and each symbol is the average of $3200$ data
- points. (b) The dependence of standard deviation of the unfolding
- force distribution on the pulling speed for the simulation data
- shown in (a), using the same symbols. The insets show the force
- distribution histograms for the three proteins at the pulling speed
- of $1\U{$\mu$m/s}$. The left, middle and right histograms are for
- the proteins represented by the top, middle, and bottom lines in (a),
- respectively.\label{fig:sawsim:all-v-dep}}
-\end{center}
-\end{figure}
-
-\begin{figure}
-\begin{center}
-\includegraphics{sawsim/fit-space}
-\caption{Fit quality between an experimental data set and simulated
- data sets obtained using various values of unfolding rate parameters
- $k_{u0}$ and $\Delta x_u$. The experimental data are from octameric
- ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04}, and the other
- model parameters are the same as those in \cref{fig:sawsim:sim-all}. The
- best fit parameters are $\Delta x_u=0.17\U{nm}$ and
- $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$. The simulation histograms were
- built from $400$ pulls at for each parameter pair. The color scale
- shown on the right is $log_{10}(D_\text{JS})$.
-\label{fig:sawsim:fit-space}}
-\end{center}
-\end{figure}
-
-\section{Conclusions}
-\label{sec:sawsim:conclusions}
-
-We have described the method of performing Monte Carlo simulations
-based on a simple two-state model for the mechanical unfolding of
-protein molecules and discussed the complications involved in the
-simulation procedure. In addition to the extraction of kinetic
-properties of the protein from mechanical unfolding data, such
-simulations can help to elucidate the effects of various experimental
-parameters on the appearance of force curves and to estimate the
-errors associated with data pooling. To date, the force-induced
-unfolding approach has been used to investigate several different
-types of proteins. As the technique is used to study a wider range of
-proteins, this simple simulation method will be useful for data
-analysis, experimental design, and artifact identification.
-
-%% Ispell local dictionary:
-
-%% Science words
-% LocalWords: polysaccharides polypeptide biomembrane biopolymers photodetector
-% LocalWords: piconewtons nanonewtons nanometers vt timesteps timestep
-% LocalWords: ubiquitin titin sawteeth octamer octameric
-% LocalWords: unstretched undeflected unfoldings underdamped
-% LocalWords: rebalances nonspecific equilibrated
-
-%% Abbreviations:
-% LocalWords: Tel AFM afm WLC wlc FJC fjc PACS MSC pN nm MC ddFLN
-
-%% Names and related
-% LocalWords: Hookean hooke Kramers kramers Mascheroni Kullback Leibler
-% LocalWords: Markovian msu Meihong Su gyang Guoliang CSIRO Drexel
-% LocalWords: bustamante forde smith merkel carrion vazquez rief levy fernandez
-% LocalWords: chyan lu klimov li zinober jollymore marko sims evans schlierf
-% LocalWords: janshoff granzier linke verdier dicola lin
-
-%% LaTeX and related
-% LocalWords: documentclass elsarticle pt tnoteref tnotetext fnref fntext
-% LocalWords: corref cortext ead form url cor TODO
-
-%% Reference abbreviations
-% LocalWords: sec fig eq expt sim dep const prob hist hists
+\input{sawsim/introduction}
+\input{sawsim/methods}
+\input{sawsim/discussion}
+\input{sawsim/conclusions}
--- /dev/null
+\section{Methods}
+\label{sec:sawsim:methods}
+
+% simulation overview
+In simulating the mechanical unfolding process, a force curve is
+generated by calculating the amount of the cantilever bending as the
+substrate surface moves away from the tip. The cantilever bending is
+obtained by balancing the tension in the protein polymer and the
+Hookean force of the bent cantilever. The unfolding probability of
+the protein molecules in the polymer is then calculated for that
+tension, and whether an unfolding event occurs is determined according
+to a Monte Carlo method. The simulation was implemented in
+C\footnote{Source code available at
+ \url{http://www.physics.drexel.edu/~wking/sawsim/}}.
+
+\subsection{Generating force curves}
+\label{sec:sawsim:methods-tension}
+
+% introduce domains and groups.
+The fundamental abstraction of the simulation is the ``domain'', which
+represents a discrete chunk of the flexible chain between the
+substrate and the cantilever holder. Each of these domains is
+assigned a particular state; for example, the domain representing the
+cantilever is assigned to the ``cantilever'' state, and the domains
+representing protein molecules are assigned to either the ``folded''
+or the ``unfolded'' state. When balancing the tension along the
+chain, we assume that the spatial order of domains along the chain is
+irrelevant\citep{li00}, and therefore, the domains can be rearranged
+and grouped by state. To determine the tension in the chain and the
+amount of cantilever bending when $n$ states are populated, a system
+of $n+1$ equations with $n+1$ unknowns must be solved
+\begin{align}
+ F_i(x_i) &= F_t \label{eq:sawsim:tension-balance} \\
+ \sum_i x_i &= x_t \;, \label{eq:sawsim:x-total}
+\end{align}
+where $F$ are tensions, $x$ are extensions, and the subscripts $i$ and
+$t$ represent a particular state group and the total chain
+respectively (\cref{fig:schematic}). From this $F(x_t)$ may be
+computed using any multi-dimensional root-finding algorithm.
+
+% introduce particular models, and mention parameter aggregation
+Inside this framework, we choose a particular extension model
+$F_i(x_i)$ for each domain state. Cantilever elasticity is described
+by Hooke's law, which gives
+\begin{equation}
+ F = \kappa_c x_c \;, \label{eq:sawsim:hooke}
+\end{equation}
+where $\kappa_c$ is the bending spring constant and $x_c$ is the
+deflection of the cantilever (\cref{fig:schematic}). Unfolded domains
+are modeled as a Worm-Like Chain (WLC)\citep{marko95,bustamante94}, in
+which the tension $F$ is related to its extension (end-to-end
+distance) $x_u$ by
+\begin{equation}
+ F = \frac{k_B T}{p_u}
+ \p[{ \frac{1}{4}\p({ \frac{1}{(1-x_u/L_u)^2} - 1 })
+ + \frac{x_u}{L_u} }] \;,
+ \label{eq:sawsim:wlc}
+\end{equation}
+where $p_u$ is the persistence length and $L_u=N_uL_{u1}$ is the total
+contour length of the unfolded chain. The chain of $N_f$ folded
+domains is modeled as a string free to assume any extension up to some
+fixed contour length $L_f=N_fL_{f1}$
+\begin{equation}
+ F = \begin{cases}
+ 0 & \text{if $x_f<L_f$} \;, \\
+ \infty & \text{if $x_f>L_f$} \;,
+ \end{cases} \label{eq:sawsim:piston}
+\end{equation}
+where $L_{f1}$ is the separation of the two linking points of a folded
+domain, and $x_f$ is the end-to-end length of the chain of folded
+domains. In this model, any non-zero tension will fully extend these
+folded domains. As discussed in \cref{sec:sawsim:results-folded-tension},
+the contribution of the folded domains to the elastic behavior of the
+polymer-cantilever system is relatively insignificant.
+
+% address assumptions & caveats
+In the simulation, the protein polymer is assumed to be stretched in
+the direction perpendicular to the surface, which is a good
+approximation in most experimental situations, because the unfolded
+length of a protein molecule is much larger than that of the folded
+form. Therefore, after one molecule unfolds, the polymer becomes much
+longer and the angle between the polymer and the surface approaches
+$90$ degrees\citep{carrion-vazquez00}. The joints between domain
+groups are assumed to lie along a line between the surface tether
+point and the position of the tip (\cref{eq:sawsim:x-total}). The effects of
+this assumption are also minimized due to greater length of the
+unfolded domain. Finally, the interactions between different parts of
+the polymer and between the chain and the surface (except at the
+tethering points) are not considered. This is reasonable since these
+interactions should not make substantial contributions to the force
+curve at the force levels of interest, where the polymer is in a
+relatively extended conformation.
+
+% introduce constant velocity and walk through explicit example pull
+Consider an experiment of pulling a polymer with $N$ identical protein
+molecules at a constant speed. At the start of an experiment, the
+chain is unstretched ($x_t=0$), which means all the domains are
+unstretched, the cantilever is undeflected, and the tip is in contact
+with the surface. There is one domain in the cantilever state, $N$ in
+the folded state, and none in the unfolded state. As the surface
+moves away from the tip at a constant speed $v$, the chain becomes
+more extended (\cref{fig:schematic}), such that
+\begin{equation}
+ x_t = \sum_i x_i = vt \label{eq:sawsim:const-v} \;.
+\end{equation}
+The simulation assumes that the pulling takes discrete steps in space
+and treats $x_t$ as constant over the duration of one time step
+$\Delta t$. Because of the adaptive time steps discussed in
+\cref{sec:sawsim:methods-timesteps}, the space steps $\Delta x_t = v\Delta t$
+may have different sizes, but each step will be ``small''. At each
+step, the total extension is calculated using \cref{eq:sawsim:const-v}, and
+the tension $F(x_t=vt)$ is determined by numerically solving
+\cref{eq:sawsim:tension-balance,eq:sawsim:x-total} using the models
+\cref{eq:sawsim:hooke,eq:sawsim:wlc,eq:sawsim:piston} for known values of the parameters in
+the various states $(N_u, N_f, v, \kappa_c, L_{f1}, L_{u1}, p_u)$.
+When one of the molecules in the polymer unfolds
+(\cref{sec:sawsim:methods-unfolding}), there will be one domain in the
+unfolded state and $N-1$ in the folded state. In the next step, a
+newly balanced tension between the cantilever and the polymer is
+determined by solving for $F(x_t)$ as discussed above, but with the
+total extension $x_t$ incremented by $v\Delta t$ and the new unfolded
+contour length $L_{u1}$ and folded contour length $(N-1)L_{f1}$. The
+sudden lengthening of the polymer chain results in a corresponding
+abrupt drop in the force, leading to the formation of one sawtooth in
+the force curve. As the pulling continues and more domains unfold,
+force curves with a series of sawteeth are generated
+(\cref{fig:sawsim:sim-sawtooth}).
+
+The tension calculation assumes an equilibrated chain, so
+consideration must be given to the chain's relaxation time, which
+should be short compared to the loading timescale. The relaxation
+time for a WLC is given by
+\begin{equation}
+ \tau \approx \eta \frac{k_BT p}{F^2}
+% < \eta \frac{k_BT p}{(k_BTx/pL)^2} =
+% Note: < because F > k_BTx/pL
+% < \frac{\eta p^3 L^2}{k_BT x^2}
+% < \frac{\eta p^2 L}{k_BT} % for x/L > \sqrt{p/L}
+ \;, \label{eq:sawsim:tau-wlc}
+\end{equation}
+where $\eta$ is the dynamic viscosity, $F$ is the tension, and $p$ is
+the persistence length\citep{evans99}. For forces greater than
+$1\U{pN}$, with $\eta_\text{water}/k_BT=2.45\E{-10}\U{s/nm$^3$}$,
+$\tau<2\U{ns}$ for the protein polymer used in the simulation.
+% python -c 'print 2.45e-10*(1e9)**3 * (1.38e-23*300)**2 * 0.4e-9 / (1e-12)**2'
+% 1.68...e-09
+Therefore, the polymer chain is equilibrated almost instantaneously
+within a time step, which is on the order of tens of microseconds.
+The relaxation time of the cantilever can be determined by measuring
+the cantilever deflection induced by liquid motion and fitting the
+time dependence of the deflection to an exponential
+function\cite{jones05}. For a $200\U{$\mu$m}$ rectangular cantilever
+with a bending spring constant of $20\U{pN/nm}$, the measured
+relaxation time in water is $\sim50\U{$\mu$/s}$ (data not shown).
+This relatively large relaxation time constant makes the cantilever
+act as a low-pass filter and also causes a lag in the force
+measurement.
+
+\subsection{Unfolding protein molecules by force}
+\label{sec:sawsim:methods-unfolding}
+
+% introduce Bell, probability calculations, and MC comparison
+According to the theory developed by \citet{bell78} and extended by
+\citet{evans99}, an external stretching force $F$ increases the
+unfolding rate constant of a protein molecule
+\begin{equation}
+ k_u = k_{u0} \exp\p({\frac{F\Delta x_u}{k_B T}}) \;, \label{eq:sawsim:bell}
+\end{equation}
+where $k_{u0}$ is the unfolding rate in the absence of an external
+force, and $\Delta x_u$ is the distance between the native state and
+the transition state along the pulling direction. The probability for
+a protein molecule to unfold under an applied force is
+\begin{equation}
+ P_1 = k_u \Delta t \;, \label{eq:sawsim:prob-one}
+\end{equation}
+where $\Delta t$ is the time duration for each pulling step, over
+which $F$ is constant. This expression is accurate for $P_1 \ll 1$.
+From the binomial distribution, the probability of at least one of a
+group of $N_f$ identical domains to unfold in a given time step is
+\begin{equation}
+ P = 1 - (1-P_1)^{N_f} \approx N_fP_1 \;, \label{eq:sawsim:prob-n}
+\end{equation}
+where the approximation is valid when $N_fP_1 \ll 1$. To determine if
+an unfolding event occurs in a particular time step, the probability
+calculated using \cref{eq:sawsim:prob-n} is compared with a randomly
+generated number uniformly distributed between $0$ and $1$. If $P$
+is bigger than the random number, a domain unfolds, changing the
+population of each tension state, and a new balance between the
+polymer and the cantilever is determined. If no unfolding event
+occurs the pulling continues and the unfolding probability is
+calculated again in the next step at a higher force. When all the
+molecules in the polymer have unfolded, the pulling continues until a
+pre-determined force level is reached, where the polymer is assumed to
+detach from one of the tethering surfaces. The cantilever deflection
+becomes zero after this point.
+
+% address unfolding models
+Although the Bell model (\cref{eq:sawsim:bell}) is the most widely used
+unfolding model due to its simplicity and its applicability to various
+biopolymers\citep{rief98}, other theoretical models have been proposed
+to interpret mechanical unfolding data. For example,
+\citet{schlierf06} used the mechanical unfolding data of the protein
+ddFLN4 to demonstrate that Kramers' diffusion model fit the measured
+unfolding force data better than the Bell model for proteins with
+broad free energy barriers. For proteins with relatively narrow
+folded and transition states, the Bell model provides a good
+approximation.
+
+\subsection{Choosing the simulation time steps}
+\label{sec:sawsim:methods-timesteps}
+
+The demands on the time step vary throughout a simulated pull due to
+the non-linear elasticity of the polymer. Within a specified time
+duration (or pulling distance), the force change is small at low force
+levels and large at high force levels. To be efficient, the
+simulation algorithm adapts the time step to keep the time steps large
+where large time steps have little effect, while shrinking the time
+step where smaller steps are necessary.
+
+Within each time step, the total chain extension $x_t$ is treated as a
+constant and a force balance is reached very quickly among the various
+domains (see \cref{sec:sawsim:methods-tension} for equilibration timescales).
+This force is used to determine the unfolding probability
+(\cref{eq:sawsim:prob-one,eq:sawsim:prob-n}), which determines the domain state
+populations in the next time step. Therefore, the chain tension must
+not change appreciably over the course of the time step ($\Delta F <
+1\U{pN}$), and the unfolding probability is only calculated once for
+the entire step. The time step must also be short enough that the
+probability of unfolding in a single time step is low ($P<10^{-3}$).
+Besides ensuring that the approximations made in
+\cref{eq:sawsim:prob-one,eq:sawsim:prob-n} are valid, this restriction makes
+time steps which should have multiple unfoldings in a single time step
+highly unlikely. Experimentally measured unfolding are temporally
+supered, because the unfolding transition is characterized by
+multiple, Markovian attempts over a large energy barrier, where the
+probability of crossing the barrier in a single attempt is very low.
+A successful attempt quickly extends the chain contour length,
+reducing the tension, dramatically reducing the likelihood of a second
+escape in that time step. The time step used is recalculated for each
+step so that both of these criteria are satisfied.