sawsim/discussion.tex: Remove double integral Kramers' testing

[thesis.git] / src / sawsim / discussion.tex
diff --git a/src/sawsim/discussion.tex b/src/sawsim/discussion.tex

index c97d9bf3b08734a26d74e25cd3a602141f90a6db..ec9d006a5fc9b1563a3344eb0a0e617c1e394357 100644 (file)
--- a/src/sawsim/discussion.tex
+++ b/src/sawsim/discussion.tex
@@ -61,25 +61,25 @@ low-dimensional parameter spaces).
        parameters are those given by \citet{carrion-vazquez99b}.
        \citet{carrion-vazquez99b} don't list their cantilever spring
        constant (or, presumably, use it in their simulations), but we
-      can estimate it from the rebound slope in their Figures~2.a and
-      2.b, see
+      can estimate it from the rebound slope in their
+      \fref{figure}{2.a} and \iref{figure}{2.b}, see
        \cref{fig:sawsim:kappa-sawteeth}.\label{tab:sawsim:model}}
    \end{center}
  \end{table}
  
-Because the unfolding behavious of an individual sawtooth curve is
+Because the unfolding behaviors of an individual sawtooth curve is
  stochastic (\cref{fig:sawsim:sim-sawtooth}), we cannot directly
  compare single curves in our fit quality metric.  Instead, we gather
  many experimental and simulated curves, and compare the aggregate
  properties.  For velocity-clamp experiments, the usual aggregate
  property used for comparison is a histogram of unfolding
  forces\citep{carrion-vazquez99b} (\cref{fig:sawsim:sim-hist}).
-Defining and extracting “unfolding force” is suprisingly complicated
-(\cref{sec:hooke:unfolding-force}), but basically it is the highest
-tension force achieved by the chain before an unfolding event (the
-drops in the sawtooth).  The final drop is not an unfolding event, it
-is the entire chain breaking away from the cantilever tip, severing
-the connection between the substrate and the cantilever.
+Defining and extracting ``unfolding force'' is suprisingly complicated
+(\cref{sec:hooke:plugins}), but basically it is the highest tension
+force achieved by the chain before an unfolding event (the drops in
+the sawtooth).  The final drop is not an unfolding event, it is the
+entire chain breaking away from the cantilever tip, severing the
+connection between the substrate and the cantilever.
  
  \begin{figure}
    \begin{center}
@@ -119,6 +119,28 @@ the connection between the substrate and the cantilever.
  \subsection{The supramolecular scaffold}
  \label{sec:sawsim:results:scaffold}
  
+\begin{figure}
+  \begin{center}
+  \asyinclude{figures/order-dep/order-dep}
+  \caption{The dependence of the unfolding force on the temporal
+    unfolding order for four polymers with $4$, $8$, $12$, and $16$
+    identical protein domains.  Each point in the figure is the
+    average of $400$ data points.  The first point in each curve
+    represents the average of only the first peak in each of the $400$
+    simulated force curves, the second point represents the average of
+    only the second peak, and so on.  The solid lines are fits of
+    \cref{eq:sawsim:order-dep} to the simulated data, with best fit
+    $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
+    respectively, for lengths $4$ through $16$.  The insets show the
+    force distributions of the first, fourth, and eighth peaks, left
+    to right, for the polymer with eight protein domains.  The
+    parameters used for generating the data were the same as those
+    used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
+    domains.  The histogram insets were normalized in the same way as
+    in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
+  \end{center}
+\end{figure}
+
  Analysis of the mechanical unfolding data is complicated by the
  dependence of the average unfolding force on the unfolding order due
  to the serial linkage of the molecules.  Under an external stretching
@@ -144,6 +166,11 @@ several molecules have unfolded, the softening for each additional
  unfolding event becomes less significant, the change in unfolding
  probability becomes dominant, and the unfolding force increases upon
  each subsequent unfolding event\citep{zinober02}.
+%
+\nomenclature{$N_f$}{The number of folded domains in a protein chain
+  (\cref{sec:sawsim:results:scaffold}).}
+\nomenclature{$N_u$}{The number of unfolded domains in a protein chain
+  (\cref{sec:sawsim:results:scaffold}).}
  
  We validate this explanation by calculating the unfolding force
  probability distribution's dependence on the two competing factors.
@@ -152,29 +179,47 @@ The rate of unfolding events with respect to force is
    r_{uF} &= -\deriv{F}{N_f}
      = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
      = \frac{N_f k_u}{\kappa v} \\
-    &= \frac{N_f k_{u0}}{\kappa v}\exp\p({\frac{F\Delta x_u}{k_B T}})
-    = \frac{1}{\rho}\exp\p({\frac{F-\alpha}{\rho}}) \;,
+    &= \frac{N_f k_{u0}}{\kappa v}\exp{\frac{F\Delta x_u}{k_B T}}
+    = \frac{1}{\rho}\exp{\frac{F-\alpha}{\rho}} \;,
  \end{align}
-where $N_f$ is the number of folded domain,
-$\kappa=(1/\kappa_c+N_u/\kappa_\text{WLC})^{-1}$ is the spring
-constant of the cantilever-polymer system ($\kappa_\text{WLC}$ is the
-effective spring constant of one unfolded domain, assumed constant for
-a particular polymer/cantilever combination), $\kappa v$ is the force
-loading rate, and $k_u$ is the unfolding rate constant
+where $N_f$ is the number of folded domain, $\kappa$ is the spring
+constant of the cantilever-polymer system, $\kappa v$ is the force
+loading rate\footnote{
+  \begin{equation}
+    \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = \kappa v \;.
+  \end{equation}
+  Alternatively,
+  \begin{align}
+    F &= \kappa x = \kappa vt \\
+    \deriv{t}{F} &= \kappa v \;.
+  \end{align}
+  See the text before \xref{evans97}{equation}{11} or
+  \xref{dudko06}{equation}{4} for similar explanations.
+}, and $k_u$ is the unfolding rate constant
  (\cref{eq:sawsim:bell}).  In the last expression, $\rho\equiv
  k_BT/\Delta x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$.
+We can approximate $\kappa$ as a series of Hookean springs,
+\begin{equation}
+  \kappa=\p({\frac{1}{\kappa_c}+\frac{N_u}{\kappa_\text{WLC}}})^{-1} \;,
+  \label{eq:kappa-system}
+\end{equation}
+where $\kappa_\text{WLC}$ is the effective spring constant of one
+unfolded domain, assumed constant for a particular polymer/cantilever
+combination.
+
  The event probability density for events with an exponentially
  increasing likelihood function follows the Gumbel (minimum)
  probability density\citep{NIST:gumbel} with $\rho$ and $\alpha$ being
  the scale and location parameters, respectively\citep{hummer03}
  \begin{equation}
-  \mathcal{P}(F) = \frac{1}{\rho} \exp\p[{\frac{F-\alpha}{\rho}
-                                           -\exp\p({\frac{F-\alpha}{\rho}})
-                                           }] \;.  \label{eq:sawsim:gumbel}
+  \mathcal{P}(F) = \frac{1}{\rho} \exp{\frac{F-\alpha}{\rho}
+                                           -\exp{\frac{F-\alpha}{\rho}}
+                                           } \;.  \label{eq:sawsim:gumbel}
  \end{equation}
-The distribution has a mean $\avg{F}=\alpha-\gamma_e\rho$ and a
-variance $\sigma^2 = \pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is
-the Euler-Mascheroni constant.  Therefore, the unfolding force
+The distribution has a mode $\alpha$, mean
+$\avg{F}=\alpha-\gamma_e\rho$, and a variance $\sigma^2 =
+\pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the Euler--Mascheroni
+constant\citep{NIST:gumbel}.  Therefore, the unfolding force
  distribution has a variance
  \begin{equation}
    \sigma^2 = \frac{\p({\frac{\pi k_BT}{\Delta x_u}})^2}{6} \;,
@@ -196,8 +241,18 @@ constant, as we have assumed here, but a non-linear function of $F$.
  as far as I know, nobody has found an analytical form for the
  unfolding force histograms produced under such a variable loading
  rate.
+%
  \nomenclature{$r_{uF}$}{Unfolding loading rate (newtons per second)}
-\nomenclature{$\gamma_e$}{Euler-Macheroni constant, $\gamma_e=0.577\ldots$}
+\nomenclature{$\alpha$}{The mode unfolding force,
+  $\alpha\equiv-\rho\ln(N_f k_{u0}\rho/\kappa v)$
+  (\cref{eq:sawsim:gumbel}).}
+\nomenclature{$\rho$}{The characteristic unfolding force,
+  $\rho\equiv k_BT/\Delta x_u$ (\cref{eq:sawsim:gumbel}).}
+\nomenclature{$\gamma_e$}{Euler--Macheroni constant, $\gamma_e=0.577\ldots$}
+\nomenclature{$\sigma$}{Standard deviation.  For example, $\sigma$ is
+  used as the standard deviation of an unfolding force distribution in
+  \cref{eq:sawsim:gumbel}.  Not to be confused with the photodiode
+  sensitivity $\sigma_p$.}
  
  From \cref{fig:sawsim:order-dep}, we see that the proper way to
  process data from mechanical unfolding experiments is to group the
@@ -206,7 +261,7 @@ statistical analysis separately for peaks with the same unfolding
  order.  However, in most experiments, the tethering of the polymer to
  the AFM tip is by nonspecific adsorption; as a result, the polymers
  being stretched between the tip and the substrate have various
-lengths\citep{1li00}.  In addition, the interactions between the tip
+lengths\citep{li00}.  In addition, the interactions between the tip
  and the surface often cause irregular features in the beginning of the
  force curve (\cref{fig:expt-sawtooth}), making the identification of
  the first peak uncertain\citep{carrion-vazquez00}.  Furthermore, it is
@@ -231,28 +286,43 @@ order, but the average unfolding force can be quite different for the
  same protein because of the differences in unfolding order and polymer
  length.
  
-\begin{figure}
-  \begin{center}
-  \asyinclude{figures/order-dep/order-dep}
-  \caption{The dependence of the unfolding force on the temporal
-    unfolding order for four polymers with $4$, $8$, $12$, and $16$
-    identical protein domains.  Each point in the figure is the
-    average of $400$ data points.  The first point in each curve
-    represents the average of only the first peak in each of the $400$
-    simulated force curves, the second point represents the average of
-    only the second peak, and so on.  The solid lines are fits of
-    \cref{eq:sawsim:order-dep} to the simulated data, with best fit
-    $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
-    respectively, for lengths $4$ through $16$.  The insets show the
-    force distributions of the first, fourth, and eighth peaks, left
-    to right, for the polymer with eight protein domains.  The
-    parameters used for generating the data were the same as those
-    used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
-    domains.  The histogram insets were normalized in the same way as
-    in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
-  \end{center}
-\end{figure}
-
+\citet{benedetti11} have since proposed an alternative
+parameterization for \cref{eq:kappa-system}, using
+\begin{equation}
+  \kappa = \p({\frac{1}{\kappa_c}
+               + \frac{N_f}{\kappa_f} + \frac{N_u}{\kappa_u}})^{-1}
+    \equiv \frac{\kappa'}{1 - A N_f} \;,
+  \label{eq:kappa-system-benedetti}
+\end{equation}
+where $\kappa'$ is the spring constant of the completely unfolded
+chain and $A$ is a correction term for the supramolecular scaffold.
+This is effectively a first order Taylor expansion for $\kappa^{-1}$
+about $N_f=0$, but the remaining analysis is identical.
+\begin{align}
+  f(N_f) \equiv \kappa^{-1}
+    &= \frac{1}{\kappa_c} + \frac{N_f}{\kappa_f} + \frac{N - N_f}{\kappa_u} \\
+    &= f(0) + \left.\deriv{N_f}{f}\right|_{N_f=0} N_f + \order{N_f^2} \\
+    &\approx \p({\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}) +
+             \p({\frac{1}{\kappa_f} - \frac{1}{\kappa_u}}) N_f
+  \label{eq:kappa-system-taylor}
+\end{align}
+In the case where the wormlike chain stiffnesses $\kappa_f$ and
+$\kappa_u$ are fairly constant over the unfolding region, there are no
+higher order terms and the first order expansion in
+\cref{eq:kappa-system-taylor} is exact.  Comparing
+\cref{eq:kappa-system-benedetti,eq:kappa-system-taylor}, we see
+\begin{align}
+  \kappa' &= \frac{1}{\kappa_c} + \frac{N}{\kappa_u} \\
+  -\kappa' A &= \frac{1}{\kappa_f} - \frac{1}{\kappa_u} \\
+  A &= \frac{\frac{1}{\kappa_u} - \frac{1}{\kappa_f}}
+            {\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}
+\end{align}
+By focusing on the $A=0$ case (\ie~$\kappa_f=\kappa_u$),
+\citet{benedetti11} avoid running Monte Carlo simulations when
+modeling unfolding histograms.  This simplification does not hold for
+our simulated data (\cref{fig:sawsim:order-dep}), but for some
+experimental analysis the loss of accuracy may be acceptable in return
+for the reduced computational cost.
  
  \subsection{The effect of cantilever force constant}
  \label{sec:sawsim:cantilever}
@@ -354,7 +424,7 @@ the similarity between two probability distributions.
  \end{equation}
  where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
  bin in the experimental and simulated unfolding force histograms,
-respectively.  $D_\text{KL}$ is the Kullback-Leibler divergence
+respectively.  $D_\text{KL}$ is the Kullback--Leibler divergence
  \begin{equation}
    D_\text{KL}(p_p,p_q)
      = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;,  \label{eq:sawsim:D_KL}
@@ -362,8 +432,16 @@ respectively.  $D_\text{KL}$ is the Kullback-Leibler divergence
  where the sum is over all unfolding force histogram bins.  $p_m$ is
  the symmetrized probability distribution
  \begin{equation}
-  p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.
+  p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;.  \label{eq:sawsim:p_m}
  \end{equation}
+%
+\nomenclature{$D_\text{JS}$}{The Jensen--Shannon divergence
+  (\cref{eq:sawsim:D_JS}).}
+\nomenclature{$D_\text{LK}$}{The Kullback--Leibler divergence
+  (\cref{eq:sawsim:D_KL}).}
+\nomenclature{$p_m(i)$}{The symmetrized probability distribution used
+  in calculating the Jensen--Shannon divergence
+  (\cref{eq:sawsim:D_JS,eq:sawsim:p_m}).}
  % DONE: Mention inter-histogram normalization? no.
  %  For experiments carried out over a series of pulling velocities, we
  %  simply sum residuals computed for each velocity, although it would
@@ -373,11 +451,15 @@ the symmetrized probability distribution
  The major advantage of the Jensen--Shannon divergence is that
  $D_\text{JS}$ is bounded ($0\le D_\text{JS}\le 1$) regardless of the
  experimental and simulated histograms.  For comparison, Pearson's
-$\chi^2$ test,
+$\chi^2$ test\citep{NIST:chi-square},
  \begin{equation}
-  D_{χ^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)}) \;,  \label{eq:sawsim:X2}
+  D_{\chi^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)} \;,
+  \label{eq:sawsim:X2}
  \end{equation}
  is infinite if there is a bin for which $p_e(i)>0$ but $p_s(i)=0$.
+%
+\nomenclature{$\chi^2$}{The chi-squared distribution}
+\nomenclature{$D_{\chi^2}$}{Pearson's $\chi^2$ test (\cref{eq:sawsim:X2}).}
  
  \Cref{fig:sawsim:fit-space} shows the Jensen--Shannon divergence
  calculated using \cref{eq:sawsim:D_JS} between an experimental data
@@ -476,7 +558,7 @@ plain-text histogram file:
  
  Each \sawsim\ run simulates a single sawtooth curve, so you need to
  run many \sawsim\ instances to generate your simulated histograms.  To
-automate this task, \sawsim\ comes with a \citet{Python} wrapping
+automate this task, \sawsim\ comes with a \citetalias{python} wrapping
  library (\pysawsim), which provides convenient programmatic and
  command line interfaces for generating and manipulating \sawsim\ runs.
  For example, to compare the experimental histograms listed above with
@@ -484,21 +566,21 @@ simulated data over a 50-by-50 grid of $k_{u0}$ and $\Delta x$, you
  would use something like
  \begin{minted}[samepage]{console}
  $ sawsim_hist_scan.py -f '-s cantilever,hooke,0.05 -N1 -s folded,null -N8
-     -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
-     -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
+>   -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
+>   -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
  \end{minted}
  That's a bit of a mouthful, so let's break it down.  Without the
-\sawsim\ template (\Verb+-f ...+), we can focus on the comparison
+\sawsim\ template (\imint{sh}|-f ...|), we can focus on the comparison
  options:
  \begin{minted}[samepage]{console}
-$ sawsim_hist_scan.py \ldots -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
+$ sawsim_hist_scan.py ... -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
  \end{minted}
  This sets up a two-parameter sweep, with the first parameter going
  from $1\E{-5}$ to $1\E{-3}$ in 50 logarithmic steps, and the second
  going from $0.1\E{-9}$ to $1\E{-9}$ in 50 linear steps.  The
  \sawsim\ template defines the simulation model
-(\cref{fig:sawsim:domains,tab:sawsim:model}), and \Verb+%g+ marks the
-location where the swept parameters will be inserted.
+(\cref{fig:sawsim:domains,tab:sawsim:model}), and \imint{sh}|%g| marks
+the location where the swept parameters will be inserted.
  
  Behind the scenes, \pysawsim\ is spawning several concurrent
  \sawsim\ processes to take advantage of any parallel processing
@@ -515,249 +597,194 @@ computer lab, the simulation would finish overnight.
    infrastructure.  You should be able to distinguish this from the
    other PBS (phosphate buffered saline) based on the context}
  
-\sawsim\ also takes advantage of a number of optimizations for faster
-execution.  One of the bottlenecks in the \sawsim\ code is the TODO:
-interpolating tree.
-
  \subsection{Testing}
  \label{sec:sawsim:testing}
  
  Once a body of code reaches a certain level of complication, it
  becomes difficult to convince others (or yourself) that it's actually
  working correctly.  In order to test \sawsim, I've developed a test
-suite that compares simulated unfolding force histograms with
-analytical histograms for a number of situations where solving for the
-analytical histogram is possible.
-
-\section{Review of current research}
-
-There are two main approaches to modeling protein domain unfolding
-under tension: Bell's and Kramers'\citep{schlierf06,hummer03,dudko06}.
-Bell introduced his model in the context of cell
-adhesion\citep{bell78}, but it has been widely used to model
-mechanical unfolding in
-proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to it's
-simplicity and ease of use\citep{hummer03}.  Kramers introduced his
-theory in the context of thermally activated barrier crossings, which
-is how we use it here.
-
-\subsection{Evolution of unfolding modeling}
-
-Evans introduced the saddle-point Kramers' approximation in a protein unfolding context 1997 (\citet{evans97} Eqn.~3).
-However, early work on mechanical unfolding focused on the simper Bell model\citep{rief97a}.%TODO
-In the early `00's, the saddle-point/steepest-descent approximation to Kramer's model (\citet{hanggi90} Eqn.~4.56c) was introduced into our field\citep{dudko03,hyeon03}.%TODO
-By the mid `00's, the full-blown double-integral form of Kramer's model (\citet{hanggi90} Eqn.~4.56b) was in use\citep{schlierf06}.%TODO
-
-There have been some tangential attempts towards even fancier models.
-\citet{dudko03} attempted to reduce the restrictions of the single-unfolding-path model.
-\citet{hyeon03} attempted to measure the local roughness using temperature dependent unfolding.
-
-\subsection{History of simulations}
-
-Early molecular dynamics (MD) work on receptor-ligand breakage by Grubmuller 1996 and Izrailev 1997 (according to Evans 1997).
-\citet{evans97} introduce a smart Monte Carlo (SMC) Kramers' simulation.
-
-\subsection{History of experimental AFM unfolding experiments}
-
-\begin{itemize}
-  \item \citet{rief97a}: 
-\end{itemize}
-
-\subsection{History of experimental laser tweezer unfolding experiments}
-
-\begin{itemize}
-  \item \citet{izrailev97}: 
-\end{itemize}
-
-\section{Single-domain proteins under constant loading}
-
-eq:sawsim:order-dep
-
-Let $x$ be the end to end distance of the protein, $t$ be the time since loading began, $F$ be tension applied to the protein, $P$ be the surviving population of folded proteins.
-Make the definitions
-\begin{align}
-  v &\equiv \deriv{t}{x} && \text{the pulling velocity} \\
-  k &\equiv \deriv{x}{F} && \text{the loading spring constant} \\
-  P_0 &\equiv P(t=0) && \text{the initial number of folded proteins} \\
-  D &\equiv P_0 - P && \text{the number of dead (unfolded) proteins} \\
-  \kappa &\equiv -\frac{1}{P} \deriv{t}{P} && \text{the unfolding rate}
-\end{align}
-\nomenclature{$\equiv$}{Defined as (\ie\ equivalent to)}
-The proteins are under constant loading because
-\begin{equation}
-  \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = kv\;,
-\end{equation}
-a constant, since both $k$ and $v$ are constant (\citet{evans97} in the text on the first page, \citet{dudko06} in the text just before Eqn.~4).
-
-The instantaneous likelyhood of a protein unfolding is given by $\deriv{F}{D}$, and the unfolding histogram is merely this function discretized over a bin of width $W$(This is similar to \citet{dudko06} Eqn.~2, remembering that $\dot{F}=kv$, that their probability density is not a histogram ($W=1$), and that their pdf is normalized to $N=1$).
+suite (distributed with \sawsim) that compares simulated unfolding
+force histograms with analytical histograms for a number of situations
+where solving for the analytical histogram is possible.
+
+The instantaneous likelyhood of a protein unfolding is given by
+$\deriv{F}{N_u}$, and the unfolding histogram is merely this function
+discretized over a bin of width $W$\footnote{
+  This is similar to \xref{dudko06}{equation}{2}, remembering that
+  $\dot{F}=\kappa v$, that their probability density is not a
+  histogram ($W=1$), and that their probability density function is
+  normalized to $N=1$
+}.
  \begin{equation}
    h(F) \equiv \deriv{\text{bin}}{F}
-    = \deriv{F}{D} \cdot \deriv{\text{bin}}{F}
-    = W \deriv{F}{D}
-    = -W \deriv{F}{P}
-    = -W \deriv{t}{P} \deriv{F}{t}
-    = \frac{W}{vk} P\kappa \label{eq:unfold:hist}
+    = \deriv{F}{N_u} \cdot \deriv{\text{bin}}{F}
+    = W \deriv{F}{N_u}
+    = -W \deriv{F}{N_f}
+    = -W \deriv{t}{N_f} \deriv{F}{t}
+    = \frac{W}{\kappa v} N_f k_u \label{eq:unfold:hist}
  \end{equation}
-Solving for theoretical histograms is merely a question of taking your chosen $\kappa$, solving for $P(f)$, and plugging into Eqn. \ref{eq:unfold:hist}.
-We can also make a bit of progress solving for $P$ in terms of $\kappa$ as follows:
+Solving for theoretical histograms is merely a question of taking your
+chosen $k_u$, solving for $N_f(F)$, and plugging into
+\cref{eq:unfold:hist}.  We can also make a bit of progress solving for
+$N_f$ in terms of $k_u$ as follows:
  \begin{align}
-  \kappa &\equiv -\frac{1}{P} \deriv{t}{P} \\
-  -\kappa \dd t \cdot \deriv{t}{F} &= \frac{\dd P}{P} \\
-  \frac{-1}{kv} \int \kappa \dd F &= \ln(P) + c \\
-  P &= C\exp{\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})} \;, \label{eq:P}
+  k_u &\equiv -\frac{1}{N_f} \deriv{t}{N_f} \\
+  -k_u \dd t \cdot \deriv{t}{F} &= \frac{\dd N_f}{N_f} \\
+  \frac{-1}{\kappa v} \integral{0}{F}{F'}{k_0(F')}
+    &= \left. \ln(N_f(F')) \right|_0^F
+    = \ln\p({\frac{N_f(F)}{N_f(0)}})
+    = \ln\p({\frac{N_f(F)}{N}}) \\
+  N_f(F) &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{k_u(F')}} \;,
+  \label{eq:N_f}
  \end{align}
-where $c \equiv \ln(C)$ is a constant of integration scaling $P$.
+where $N_f(0) = N$ because all the domains are initially folded.
+%
+\nomenclature{$W$}{Bin width of an unfolding force histogram
+  (\cref{eq:unfold:hist}).}
  
-\subsection{Constant unfolding rate}
+\subsubsection{Constant unfolding rate}
  
-In the extremely weak tension regime, the proteins' unfolding rate is independent of tension, we have
+In the extremely weak tension regime, the protein's unfolding rate is
+independent of tension, so we can simplify \cref{eq:N_f} and plug into
+\cref{eq:unfold:hist}.
  \begin{align}
-  P &= C\exp{\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})}
-     = C\exp{\p({\frac{-1}{kv}\kappa F})}
-     = C\exp{\p({\frac{-\kappa F}{kv}})} \\
-  P(0) &\equiv P_0 = C\exp(0) = C \\
-  h(F) &= \frac{W}{vk} P \kappa
-     = \frac{W\kappa P_0}{vk} \exp{\p({\frac{-\kappa F}{kv}})}
+  N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u(F')}}}
+     = N\exp{\frac{-\colA{k_{u0}}}{\kappa v}\colB{\integral{0}{F}{F'}{}}}
+     = N\exp{\frac{-k_{u0} \colB{F}}{\kappa v}} \\
+  h(F) &= \frac{W}{\kappa v} N_f k_u
+     = \frac{W k_{u0} N}{\kappa v} \exp{\frac{-k_{u0} F}{\kappa v}} \;.
  \end{align}
-So, a constant unfolding-rate/hazard-function gives exponential decay.
-Not the most earth shattering result, but it's a comforting first step, and it does show explicitly the dependence in terms of the various unfolding-specific parameters.
+A constant unfolding-rate/hazard-function gives exponential decay.
+This is not an earth shattering result, but it's a comforting first
+step, and it does show explicitly the dependence in terms of the
+various unfolding-specific parameters.
  
-\subsection{Bell model}
+\subsubsection{Bell model}
  
  Stepping up the intensity a bit, we come to Bell's model for unfolding
-(\citet{hummer03} Eqn.~1 and the first paragraph of \citet{dudko06} and \citet{dudko07}).
+(\cref{sec:sawsim:rate:bell}).  We can simplify the following
+calculation by parametrizing with the characteristic force $\rho$
+defined in \cref{sec:sawsim:results:scaffold} and the similar
+single-domain mode $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$.  With
+these substitutions, \cref{eq:sawsim:bell} becomes
  \begin{equation}
-  \kappa = \kappa_0 \cdot \exp\p({\frac{F \dd x}{k_B T}})
-         = \kappa_0 \cdot \exp(a F) \;,
+  k_u = k_{u0} \exp{\frac{F}{\rho}} \;.
  \end{equation}
-where we've defined $a \equiv \dd x/k_B T$ to bundle some constants together.
-The unfolding histogram is then given by
+The unfolding histogram is then given via \cref{eq:N_f,eq:unfold:hist}.
  \begin{align}
-  P &= C\exp\p({\frac{-1}{kv}\integral{}{}{F}{\kappa}})
-     = C\exp\p[{\frac{-1}{kv} \frac{\kappa_0}{a} \exp(a F)}]
-     = C\exp\p[{\frac{-\kappa_0}{akv}\exp(a F)}] \\
-  P(0) &\equiv P_0 = C\exp\p({\frac{-\kappa_0}{akv}}) \\
-  C &= P_0 \exp\p({\frac{\kappa_0}{akv}}) \\
-  P &= P_0 \exp\p\{{\frac{\kappa_0}{akv}[1-\exp(a F)]}\} \\
-  h(F) &= \frac{W}{vk} P \kappa
-     = \frac{W}{vk} P_0 \exp\p\{{\frac{\kappa_0}{akv}[1-\exp(a F)]}\} \kappa_0 \exp(a F)
-     = \frac{W\kappa_0 P_0}{vk} \exp\p\{{a F + \frac{\kappa_0}{akv}[1-\exp(a F)]}\} \label{eq:unfold:bell_pdf}\;.
+  N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u}}}
+     = N\exp{\frac{-1}{\kappa v}
+       \integral{0}{F}{F'}{\colAB{k_{u0}}{\colA{\exp{\frac{F'}{\rho}}}}}}
+     = N\exp{\frac{-\colB{k_{u0}}}{\kappa v}
+       \colA{\integral{0}{F}{F'}{\exp{\frac{F'}{\rho}}}}}
+     = N\exp{\frac{\colB{-}k_{u0}\colA{\rho}}{\kappa v}
+       \colAB{\p({\exp{\frac{F}{\rho}}-1})}} \\
+     &= N\exp{\colA{\frac{k_{u0}\rho}{\kappa v}}
+       \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
+     = N\exp{\colAB{\exp{\frac{-\alpha'}{\rho}}}
+       \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
+     = N\exp{\colB{\exp{\frac{-\alpha'}{\rho}} -
+       \exp{\frac{F-\alpha'}{\rho}}}} \\
+  h(F) &= \frac{W}{\kappa v} \colA{N_f} \colB{k_u}
+     = \frac{W}{\kappa v}
+       \colA{N\exp{\exp{\frac{-\alpha'}{\rho}} - \exp{\frac{F-\alpha'}{\rho}}}}
+       \colB{k_{u0}\exp{\frac{F}{\rho}}}
+     = \frac{W N \colAB{k_{u0}}}{\colA{\kappa v}}
+       \exp{\colB{\frac{F}{\rho}} - \exp{\frac{F-\alpha'}{\rho}} +
+         \exp{\frac{-\alpha'}{\rho}}} \\
+     &= \frac{W N}{\colA{\rho}}
+       \exp{\frac{F \colA{-\alpha'}}{\rho} - \exp{\frac{F-\alpha'}{\rho}} +
+         \colB{\exp{\frac{-\alpha'}{\rho}}}}
+     = \frac{W N}{\rho}
+       \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
+       \colB{\exp{\exp{\frac{-\alpha'}{\rho}}}} \\
+     &= \frac{W N \exp{\exp{\frac{-\alpha'}{\rho}}}}{\rho}
+       \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
+  \label{eq:unfold:bell_pdf}
  \end{align}
-The $F$ dependent behavior reduces to
-\begin{equation}
-  h(F) \propto \exp\p[{a F - b\exp(a F)}] \;,
-\end{equation}
-where $b \equiv \kappa_0/akv \equiv \kappa_0 k_B T / k v \dd x$ is
-another constant rephrasing.
-
-This looks similar to the Gompertz / Gumbel / Fisher-Tippett
-distribution, where
-\begin{align}
-  p(x) &\propto z\exp(-z) \\
-  z &\equiv \exp\p({-\frac{x-\mu}{\beta}}) \;,
-\end{align}
-but we have
-\begin{equation}
-  p(x) \propto z\exp(-bz) \;.
-\end{equation}
-Strangely, the Gumbel distribution is supposed to derive from an
-exponentially increasing hazard function, which is where we started
-for our derivation.  I haven't been able to find a good explaination
-of this discrepancy yet, but I have found a source that echos my
-result (\citet{wu04} Eqn.~1).  TODO: compare \citet{wu04} with
-my successful derivation in \cref{sec:sawsim:results-scaffold}.
-
-Oh wait, we can do this:
-\begin{equation}
-  p(x) \propto z\exp(-bz) = \frac{1}{b} z'\exp(-z')\propto z'\exp(-z') \;,
-\end{equation}
-with $z'\equiv bz$.  I feel silly...  From
-\href{Wolfram}{http://mathworld.wolfram.com/GumbelDistribution.html},
-the mean of the Gumbel probability density
-\begin{equation}
-  P(x) = \frac{1}{\beta} \exp\p[{\frac{x-\alpha}{\beta}
-                                 -\exp\p({\frac{x-\alpha}{\beta}})
-                                 }]
-\end{equation}
-is given by $\mu=\alpha-\gamma\beta$, and the variance is
-$\sigma^2=\frac{1}{6}\pi^2\beta^2$, where $\gamma=0.57721566\ldots$ is
-the Euler-Mascheroni constant.  Selecting $\beta=1/a=k_BT/\dd x$,
-$\alpha=-\beta\ln(\kappa\beta/kv)$, and $F=x$ we have
-\begin{align}
-  P(F)
-    &= \frac{1}{\beta} \exp\p[{\frac{F+\beta\ln(\kappa\beta/kv)}{\beta}
-                               -\exp\p({\frac{F+\beta\ln(\kappa\beta/kv)}
-                                             {\beta}})
-                             }] \\
-    &= \frac{1}{\beta} \exp(F/\beta)\exp[\ln(\kappa\beta/kv)]
-                       \exp\p\{{-\exp(F/\beta)\exp[\ln(\kappa\beta/kv)]}\} \\
-    &= \frac{1}{\beta} \frac{\kappa\beta}{kv} \exp(F/\beta)
-                       \exp\p[{-\kappa\beta/kv\exp(F/\beta)}] \\
-    &= \frac{\kappa}{kv} \exp(F/\beta)\exp[-\kappa\beta/kv\exp(F/\beta)] \\
-    &= \frac{\kappa}{kv} \exp(F/\beta - \kappa\beta/kv\exp(F/\beta)] \\
-    &= \frac{\kappa}{kv} \exp(aF - \kappa/akv\exp(aF)] \\
-    &= \frac{\kappa}{kv} \exp(aF - b\exp(aF)]
-    \propto h(F) \;.
-\end{align}
-So our unfolding force histogram for a single Bell domain under
-constant loading does indeed follow the Gumbel distribution.
-
-% Consolidate with src/sawsim/discussion.tex
-
-\subsection{Saddle-point Kramers' model}
+which matches \cref{eq:sawsim:gumbel} except for a constant
+prefactor due to the range\footnote{
+  The Gumbel distribution in \cref{eq:sawsim:gumbel} is normalized for
+  the range $-\infty < F < \infty$, but \cref{eq:unfold:bell_pdf} is
+  normalized for the range $0 \le F < \infty$.  This distinction will
+  alter the analytical mean and variance listed after
+  \cref{eq:sawsim:gumbel}, but with the experimental unfolding
+  histograms showing few zero-force unfolding events, the effective
+  difference will be negligible.
+}.
+%
+\nomenclature{$\alpha'$}{The mode unfolding force for a single folded
+  domain, $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$
+  (\cref{eq:unfold:bell_pdf}).}
+
+\subsubsection{Saddle-point Kramers' model}
  
  For the saddle-point approximation for Kramers' model for unfolding
  (\citet{evans97} Eqn.~3, \citet{hanggi90} Eqn. 4.56c, \citet{vanKampen07} Eqn. XIII.2.2).
  \begin{equation}
-  \kappa = \frac{D}{l_b l_{ts}} \cdot \exp\p({\frac{-E_b(F)}{k_B T}}) \;,
+  k_u = \frac{D}{l_b l_{ts}} \cdot \exp{\frac{-U_b(F)}{k_B T}} \;,
+    \label{eq:kramers-saddle}
  \end{equation}
-where $E_b(F)$ is the barrier height under an external force $F$,
+where $U_b(F)$ is the barrier height under an external force $F$,
  $D$ is the diffusion constant of the protein conformation along the reaction coordinate,
  $l_b$ is the characteristic length of the bound state $l_b \equiv 1/\rho_b$,
  $\rho_b$ is the density of states in the bound state, and
-$l_{ts}$ is the characteristic length of the transition state
-\begin{equation}
-  l_{ts} = TODO
-\end{equation} 
-
-\citet{evans97} solved this unfolding rate for both inverse power law potentials and cusp potentials.
+$l_{ts}$ is the characteristic length of the transition state.
+%
+\nomenclature{$U_b(F)$}{The barrier energy as a function of force
+  (\cref{eq:kramers-saddle}).}
+\nomenclature{$l_b$}{The characteristic length of the bound state $l_b
+  \equiv 1/\rho_b$ (\cref{eq:kramers-saddle}).}
+\nomenclature{$\rho_b$}{The density of states in the bound state
+  (\cref{eq:kramers-saddle}).}
+\nomenclature{$l_{ts}$}{The characteristic length of the transition
+  state (\cref{eq:kramers-saddle}).}
+
+\citet{evans97} solved this unfolding rate for both inverse power law
+potentials and cusp potentials.
  
-\subsubsection{Inverse power law potentials}
-
-\begin{equation}
-  E(x) = \frac{-A}{x^n}
-\end{equation}
-(e.g. $n=6$ for a van der Waals interaction, see \citet{evans97} in
-the text on page 1544, in the first paragraph of the section
-\emph{Dissociation under force from an inverse power law attraction}).
-Evans then goes into diffusion constants that depend on the
-protein's end to end distance, and I haven't worked out the math
-yet.  TODO: clean up.
-
-
-\subsubsection{Cusp potentials}
-
-\begin{equation}
-  E(x) = \frac{1}{2}\kappa_a \p({\frac{x}{x_a}})^2
-\end{equation}
-(see \citet{evans97} in the text on page 1545, in the first paragraph
-of the section \emph{Dissociation under force from a deep harmonic well}).
-
-\section{Double-integral Kramers' theory}
-
-The double-integral form of overdamped Kramers' theory may be too
-complex for analytical predictions of unfolding-force histograms.
-Rather than testing the entire \sawsim\ simulation (\cref{sec:sawsim}),
-we will focus on demonstrating that the Kramers' $k(F)$ evaluations
-are working properly.  If the Bell modeled histograms check out, that
-gives reasonable support for the $k(F) \rightarrow \text{histogram}$
-portion of the simulation.
-
-Looking for analytic solutions to Kramers' $k(F)$, we find that there
-are not many available in a closed form.  However, we do have analytic
-solutions for unforced $k$ for cusp-like and quartic potentials.
-
-\subsection{Cusp-like potentials}
+\section{Review of current research}
  
+There is a long history of protein unfolding and unbinding
+simulations.  Early work by \citet{grubmuller96} and
+\citet{izrailev97} focused on molecular dynamics (MD) simulations of
+receptor-ligand breakage.  Around the same time, \citet{evans97}
+introduced a Monte Carlo Kramers' simulation in the context of
+receptor-ligand breakage.  The approach pioneered by \citet{evans97}
+was used as the basis for analysis of the initial protein unfolding
+experiments\citep{rief97a}.  However, none of these earlier
+implementations were open source, which made it difficult to reuse or
+validate their results.
+%
+\nomenclature{MD}{Molecular dynamics simulation.  Simulate the
+  physical motion of atoms and molecules by numerically solving
+  Newton's equations.}
+
+Within the Monte Carlo simulation approach, there are two main models
+for protein domain unfolding under tension: Bell's and
+Kramers'\citep{schlierf06,hummer03,dudko06}.  Bell introduced his
+model in the context of cell adhesion\citep{bell78}, but it has been
+widely used to model mechanical unfolding in
+proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to its
+simplicity and ease of use\citep{hummer03}.  Kramers introduced his
+theory in the context of thermally activated barrier crossings, which
+is how we use it here.
  
-\subsection{Quartic potentials}
+Evans introduced the saddle-point Kramers' approximation in a protein
+unfolding context in 1997 (\xref{evans97}{equation}{3}).  However,
+early work on mechanical unfolding focused on the simpler Bell
+model\citep{rief97a}.  In the early 2000's, the
+saddle-point/steepest-descent approximation to Kramer's model
+(\xref{hanggi90}{equation}{4.56c}) was introduced into our
+field\citep{dudko03,hyeon03}.  By the mid 2000's, the full-blown
+double-integral form of Kramer's model
+(\xref{hanggi90}{equation}{4.56b}) was in use\citep{schlierf06}.
+
+There have been some tangential attempts towards even fancier models:
+\citet{dudko03} attempted to reduce the restrictions of the
+single-unfolding-path model and \citet{hyeon03} attempted to measure
+the local roughness using temperature dependent unfolding.  However,
+further work on these lines has been slow, because the Bell model fits
+the data well despite its simplicity.  For more complicated models to
+gain ground, we need larger, more detailed datasets that expose
+features which the Bell model doesn't capture.