1 \section{Results and Discussion}
2 \label{sec:sawsim:results}
4 \subsection{Force curves generated by simulation}
5 \label{sec:sawsim:results:force-curves}
7 \Cref{fig:sawsim:sim-sawtooth} shows three simulated force curves from
8 pulling a polymer composed of eight identical protein molecules using
9 parameters from typical experimental settings. The order of the peaks
10 in the force curves reflects the temporal sequence of the unfolding
11 events instead of the positions of the protein molecules in the
12 polymer\citep{li00}. As observed experimentally
13 (\cref{fig:expt-sawtooth}), the forces at which identical protein
14 molecules unfold fluctuate, revealing the stochastic nature of protein
15 unfolding since no instrumental noise is included in the simulation.
17 After aquiring a series of experimental unfolding curves, we need to
18 fit the data to an explanatory model. For velocity-clamp experiments
19 (\cref{sec:procedure,sec:sawsim:velocity-clamp}), we extract unfolding
20 forces from the sawtooth curves (\cref{sec:hooke}) and generate
21 histograms of unfolding forces. Then we construct a parameterized
22 model of the experimental system (\cref{tab:sawsim:model}). We can
23 then run \insilico\ experiments mimicking our \invitro\ experiments
24 (\cref{fig:sawsim:sim-hist}). We extract the model parameters which
25 provide the best fit using a ``fit quality'' metric and a nonlinear
26 optimization routine (or a full parameter space sweep, for
27 low-dimensional parameter spaces).
31 \subfloat[][]{\label{tab:sawsim:domains}
32 \begin{tabular}{l l l l}
34 \multicolumn{4}{c}{Domain states} \\
36 Domain name & Initial count & Tension model & Model parameters \\
38 AFM cantilever & 1 & Hooke (\cref{eq:sawsim:hooke}) &
40 Folded I27 & 8 & WLC (\cref{eq:sawsim:wlc}) &
41 $p=3.9\U{\AA}$, $L=5.1\U{nm}$ \\
42 Unfolded I27 & 0 & WLC (\cref{eq:sawsim:wlc}) &
43 $p=3.9\U{\AA}$, $L=33.8\U{nm}$ \\
46 \subfloat[][]{\label{tab:sawsim:transitions}
47 \begin{tabular}{l l l l l}
49 \multicolumn{5}{c}{Transition rates} \\
51 Transition & Source & Target & Rate model & Model parameters \\
53 Unfolding & Folded I27 & Unfolded I27 & Bell (\cref{eq:sawsim:bell}) &
54 $k_{u0}=3.3\E{-4}\U{s$^{-1}$}$, $\Delta x=0.35\U{nm}$. \\
57 \caption{\protect\subref{tab:sawsim:domains} Model for
58 I27\textsubscript{8} domain states and
59 \protect\subref{tab:sawsim:transitions} transitions between them
60 (compare with \cref{fig:sawsim:domains}). The models and
61 parameters are those given by \citet{carrion-vazquez99b}.
62 \citet{carrion-vazquez99b} don't list their cantilever spring
63 constant (or, presumably, use it in their simulations), but we
64 can estimate it from the rebound slope in their
65 \fref{figure}{2.a} and \iref{figure}{2.b}, see
66 \cref{fig:sawsim:kappa-sawteeth}.\label{tab:sawsim:model}}
70 Because the unfolding behaviors of an individual sawtooth curve is
71 stochastic (\cref{fig:sawsim:sim-sawtooth}), we cannot directly
72 compare single curves in our fit quality metric. Instead, we gather
73 many experimental and simulated curves, and compare the aggregate
74 properties. For velocity-clamp experiments, the usual aggregate
75 property used for comparison is a histogram of unfolding
76 forces\citep{carrion-vazquez99b} (\cref{fig:sawsim:sim-hist}).
77 Defining and extracting ``unfolding force'' is suprisingly complicated
78 (\cref{sec:hooke:plugins}), but basically it is the highest tension
79 force achieved by the chain before an unfolding event (the drops in
80 the sawtooth). The final drop is not an unfolding event, it is the
81 entire chain breaking away from the cantilever tip, severing the
82 connection between the substrate and the cantilever.
86 \subfloat[][]{\asyinclude{figures/sim-sawtooth/sim-sawtooth}%
87 \label{fig:sawsim:sim-sawtooth}%
90 \subfloat[][]{\asyinclude{figures/sim-hist/sim-hist}%
91 \label{fig:sawsim:sim-hist}%
93 \caption{\protect\subref{fig:sawsim:sim-sawtooth} Three simulated force
94 curves from pulling a polymer of eight identical protein
95 molecules. The simulation was carried out using the parameters:
96 pulling speed $v=1\U{$\mu$m/s}$, cantilever spring constant
97 $\kappa_c=50\U{pN/nm}$, temperature $T=300\U{K}$, persistence
98 length of unfolded proteins $p_u=0.40\U{nm}$, $\Delta
99 x_u=0.225\U{nm}$, and $k_{u0}=5\E{-5}\U{s$^{-1}$}$. The contour
100 length between the two linking points on a protein molecule is
101 $L_{f1}=3.7\U{nm}$ in the folded form and $L_{u1}=28.1\U{nm}$ in
102 the unfolded form. These parameters are those of ubiquitin
103 molecules connected through the N-C
104 termini\citep{chyan04,carrion-vazquez03}. Detachment from the
105 tip or substrate is assumed to occur at a force of $400\U{pN}$.
106 In experiments, detachments have been observed to occur at a
107 variety of forces. For clarity, the green and blue curves are
108 offset by $200$ and $400\U{pN}$ respectively.
109 \protect\subref{fig:sawsim:sim-hist} The distribution of the unfolding
110 forces from $400$ simulated force curves ($3200$ data points)
111 such as those shown in \protect\subref{fig:sawsim:sim-sawtooth}. The
112 frequency is normalized by the total number of points, \ie, the
113 height of each bin is equal to the number of data points in that
114 bin divided by the total number of data
115 points.\label{fig:sawsim:sim-all}}
119 \subsection{The supramolecular scaffold}
120 \label{sec:sawsim:results:scaffold}
124 \asyinclude{figures/order-dep/order-dep}
125 \caption{The dependence of the unfolding force on the temporal
126 unfolding order for four polymers with $4$, $8$, $12$, and $16$
127 identical protein domains. Each point in the figure is the
128 average of $400$ data points. The first point in each curve
129 represents the average of only the first peak in each of the $400$
130 simulated force curves, the second point represents the average of
131 only the second peak, and so on. The solid lines are fits of
132 \cref{eq:sawsim:order-dep} to the simulated data, with best fit
133 $\kappa_\text{WLC}=203$, $207$, $161$, and $157\U{pN/nm}$,
134 respectively, for lengths $4$ through $16$. The insets show the
135 force distributions of the first, fourth, and eighth peaks, left
136 to right, for the polymer with eight protein domains. The
137 parameters used for generating the data were the same as those
138 used for \cref{fig:sawsim:sim-sawtooth}, except for the number of
139 domains. The histogram insets were normalized in the same way as
140 in \cref{fig:sawsim:sim-hist}.\label{fig:sawsim:order-dep}}
144 Analysis of the mechanical unfolding data is complicated by the
145 dependence of the average unfolding force on the unfolding order due
146 to the serial linkage of the molecules. Under an external stretching
147 force $F$, the probability of some domain unfolding in a polymer with
148 $N_f$ folded domains is $N_fP_1$ (\cref{eq:sawsim:prob-n}), which is
149 higher than the unfolding probability for a single molecule $P_1$.
150 Consequently, the average unfolding force is lower for the earlier
151 unfolding events when $N_f$ is larger, and the force should increase
152 as more and more molecules become unfolded. However, there is a
153 competing factor that opposes this trend. As the protein molecules
154 unfold, the chain becomes softer and the force loading rate becomes
155 lower when the pulling speed is constant. This lower loading rate
156 leads to a decrease in the unfolding force (in the no-loading limit,
157 all unfolding events occur at a tension of $0\U{N}$). The dependence
158 of the average unfolding force on the unfolding order is the result of
159 these two opposing effects. \Cref{fig:sawsim:order-dep} shows the
160 dependence of the average unfolding force on the unfolding force peak
161 order (the temporal order of unfolding events) for four polymers with
162 $4$, $8$, $12$, and $16$ identical protein molecules. The effect of
163 polymer chain softening dominates the initial unfolding events, and
164 the average unfolding force decreases as more molecules unfold. After
165 several molecules have unfolded, the softening for each additional
166 unfolding event becomes less significant, the change in unfolding
167 probability becomes dominant, and the unfolding force increases upon
168 each subsequent unfolding event\citep{zinober02}.
170 \nomenclature[sr ]{$N_f$}{The number of folded domains in a protein
171 chain (\cref{sec:sawsim:results:scaffold}).}
172 \nomenclature[sr ]{$N_u$}{The number of unfolded domains in a protein
173 chain (\cref{sec:sawsim:results:scaffold}).}
175 We validate this explanation by calculating the unfolding force
176 probability distribution's dependence on the two competing factors.
177 The rate of unfolding events with respect to force is
179 r_{uF} &= -\deriv{F}{N_f}
180 = -\frac{\dd N_f/\dd t}{\dd F/\dd t}
181 = \frac{N_f k_u}{\kappa v} \\
182 &= \frac{N_f k_{u0}}{\kappa v}\exp{\frac{F\Delta x_u}{k_B T}}
183 = \frac{1}{\rho}\exp{\frac{F-\alpha}{\rho}} \;,
185 where $N_f$ is the number of folded domain, $\kappa$ is the spring
186 constant of the cantilever-polymer system, $\kappa v$ is the force
187 loading rate\footnote{
189 \deriv{t}{F} = \deriv{x}{F}\deriv{t}{x} = \kappa v \;.
193 F &= \kappa x = \kappa vt \\
194 \deriv{t}{F} &= \kappa v \;.
196 See the text before \xref{evans97}{equation}{11} or
197 \xref{dudko06}{equation}{4} for similar explanations.
198 }, and $k_u$ is the unfolding rate constant
199 (\cref{eq:sawsim:bell}). In the last expression, $\rho\equiv
200 k_BT/\Delta x_u$, and $\alpha\equiv-\rho\ln(N_fk_{u0}\rho/\kappa v)$.
201 We can approximate $\kappa$ as a series of Hookean springs,
203 \kappa=\p({\frac{1}{\kappa_c}+\frac{N_u}{\kappa_\text{WLC}}})^{-1} \;,
204 \label{eq:kappa-system}
206 where $\kappa_\text{WLC}$ is the effective spring constant of one
207 unfolded domain, assumed constant for a particular polymer/cantilever
210 The event probability density for events with an exponentially
211 increasing likelihood function follows the Gumbel (minimum)
212 probability density\citep{NIST:gumbel} with $\rho$ and $\alpha$ being
213 the scale and location parameters, respectively\citep{hummer03}
215 \mathcal{P}(F) = \frac{1}{\rho} \exp{\frac{F-\alpha}{\rho}
216 -\exp{\frac{F-\alpha}{\rho}}
217 } \;. \label{eq:sawsim:gumbel}
219 The distribution has a mode $\alpha$, mean
220 $\avg{F}=\alpha-\gamma_e\rho$, and a variance $\sigma^2 =
221 \pi^2\rho^2/6$, where $\gamma_e=0.577\ldots$ is the Euler--Mascheroni
222 constant\citep{NIST:gumbel}. Therefore, the unfolding force
223 distribution has a variance
225 \sigma^2 = \frac{\p({\frac{\pi k_BT}{\Delta x_u}})^2}{6} \;,
226 \label{eq:sawsim:variance}
228 and and average\citep{brockwell02,hummer03}
229 % TODO: verify brockwell equivalence (p465)
231 \avg{F(i)} = \frac{k_BT}{\Delta x_u}
232 \p[{\ln\p({\frac{\kappa v\Delta x_u}{N_fk_{u0}k_BT}})
233 -\gamma_e}] \;, \label{eq:sawsim:order-dep}
235 where $N_f$ and $\kappa$ depend on the domain index $i=N_u$. Curves
236 based on this formula fit the simulated data remarkably well
237 considering the effective WLC\index{WLC} stiffness $\kappa_\text{WLC}$
238 is the only fitted parameter, and that the actual WLC stiffness is not
239 constant, as we have assumed here, but a non-linear function of $F$.
240 \citet{dudko08} derived a formula for the loading rate for a WLC, but
241 as far as I know, nobody has found an analytical form for the
242 unfolding force histograms produced under such a variable loading
245 \nomenclature[sr ]{$r_{uF}$}{Unfolding loading rate (newtons per
247 \nomenclature[sg a ]{$\alpha$}{The mode unfolding force,
248 $\alpha\equiv-\rho\ln(N_f k_{u0}\rho/\kappa v)$
249 (\cref{eq:sawsim:gumbel}).}
250 \nomenclature[sg r ]{$\rho$}{The characteristic unfolding force,
251 $\rho\equiv k_BT/\Delta x_u$ (\cref{eq:sawsim:gumbel}).}
252 \nomenclature[sg ce ]{$\gamma_e$}{Euler--Macheroni constant,
253 $\gamma_e=0.577\ldots$.}
254 \nomenclature[sg s ]{$\sigma$}{Standard deviation. For example,
255 $\sigma$ is used as the standard deviation of an unfolding force
256 distribution in \cref{eq:sawsim:gumbel}. Not to be confused with
257 the photodiode sensitivity $\sigma_p$.}
259 From \cref{fig:sawsim:order-dep}, we see that the proper way to
260 process data from mechanical unfolding experiments is to group the
261 curves according to the length of the polymer and to perform
262 statistical analysis separately for peaks with the same unfolding
263 order. However, in most experiments, the tethering of the polymer to
264 the AFM tip is by nonspecific adsorption; as a result, the polymers
265 being stretched between the tip and the substrate have various
266 lengths\citep{li00}. In addition, the interactions between the tip
267 and the surface often cause irregular features in the beginning of the
268 force curve (\cref{fig:expt-sawtooth}), making the identification of
269 the first peak uncertain\citep{carrion-vazquez00}. Furthermore, it is
270 often difficult to acquire a large amount of data in single molecule
271 experiments. These difficulties make the aforementioned data analysis
272 approach unfeasible for many mechanical unfolding experiments. As a
273 result, the values of all force peaks from polymers of different
274 lengths are often pooled together for statistical analysis. To assess
275 the errors caused by such pooling, simulation data were analyzed using
276 different pooling methods and the results were compared.
277 \Cref{fig:sawsim:sim-hist} shows that, for a polymer with eight
278 protein molecules, the average unfolding force is $281\U{pN}$ with a
279 standard deviation of $25\U{pN}$ when all data is pooled. If only the
280 first peaks in the force curves are analyzed, the average force is
281 $279\U{pN}$ with a standard deviation of $22\U{pN}$. While for the
282 fourth and eighth peaks, the average force are $275\U{pN}$ and
283 $300\U{pN}$, respectively, and the standard deviations are $23\U{pN}$
284 and $25\U{pN}$, respectively. As expected from the Gumbel
285 distribution, the width of the unfolding force distribution (insets in
286 \cref{fig:sawsim:order-dep}) is only weakly effected by unfolding
287 order, but the average unfolding force can be quite different for the
288 same protein because of the differences in unfolding order and polymer
291 \citet{benedetti11} have since proposed an alternative
292 parameterization for \cref{eq:kappa-system}, using
294 \kappa = \p({\frac{1}{\kappa_c}
295 + \frac{N_f}{\kappa_f} + \frac{N_u}{\kappa_u}})^{-1}
296 \equiv \frac{\kappa'}{1 - A N_f} \;,
297 \label{eq:kappa-system-benedetti}
299 where $\kappa'$ is the spring constant of the completely unfolded
300 chain and $A$ is a correction term for the supramolecular scaffold.
301 This is effectively a first order Taylor expansion for $\kappa^{-1}$
302 about $N_f=0$, but the remaining analysis is identical.
304 f(N_f) \equiv \kappa^{-1}
305 &= \frac{1}{\kappa_c} + \frac{N_f}{\kappa_f} + \frac{N - N_f}{\kappa_u} \\
306 &= f(0) + \left.\deriv{N_f}{f}\right|_{N_f=0} N_f + \order{N_f^2} \\
307 &\approx \p({\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}) +
308 \p({\frac{1}{\kappa_f} - \frac{1}{\kappa_u}}) N_f
309 \label{eq:kappa-system-taylor}
311 In the case where the wormlike chain stiffnesses $\kappa_f$ and
312 $\kappa_u$ are fairly constant over the unfolding region, there are no
313 higher order terms and the first order expansion in
314 \cref{eq:kappa-system-taylor} is exact. Comparing
315 \cref{eq:kappa-system-benedetti,eq:kappa-system-taylor}, we see
317 \kappa' &= \frac{1}{\kappa_c} + \frac{N}{\kappa_u} \\
318 -\kappa' A &= \frac{1}{\kappa_f} - \frac{1}{\kappa_u} \\
319 A &= \frac{\frac{1}{\kappa_u} - \frac{1}{\kappa_f}}
320 {\frac{1}{\kappa_c} + \frac{N}{\kappa_u}}
322 By focusing on the $A=0$ case (\ie~$\kappa_f=\kappa_u$),
323 \citet{benedetti11} avoid running Monte Carlo simulations when
324 modeling unfolding histograms. This simplification does not hold for
325 our simulated data (\cref{fig:sawsim:order-dep}), but for some
326 experimental analysis the loss of accuracy may be acceptable in return
327 for the reduced computational cost.
329 \subsection{The effect of cantilever force constant}
330 \label{sec:sawsim:cantilever}
332 In mechanical unfolding experiments, the ability to observe the
333 unfolding of a single protein molecule depends on the tension drop
334 after an unfolding event such that another molecule does not unfold
335 immediately. The magnitude of this drop is determined by many
336 factors, including the magnitude of the unfolding force, the contour
337 and persistence lengths of the protein polymer, the contour length
338 increase from unfolding, and the stiffness (force constant) of the
339 cantilever. Among these, the effect of the cantilever force constant
340 is particularly interesting because cantilevers with a wide range of
341 force constants are available. In addition, different single molecule
342 manipulation techniques, such as the AFM and laser tweezers, differ
343 mainly in the range of the spring constants of their force
344 transducers\citep{walton08}. \Cref{fig:sawsim:kappa-sawteeth} shows
345 the simulated force curves from pulling an octamer of protein
346 molecules using cantilevers with different force constants, while
347 other parameters are identical. For this model protein, the
348 appearance of the force curve does not change much until the force
349 constant of the cantilever reaches a certain value
350 ($\kappa_c=50\U{pN/nm}$). When $\kappa_c$ is lower than this value,
351 the individual unfolding events become less identifiable. In order to
352 observe individual unfolding events, the cantilever needs to have a
353 force constant high enough so that the bending at the maximum force is
354 small in comparison with the contour length increment from the
355 unfolding of a single molecule. \Cref{fig:sawsim:kappa-sawteeth} also
356 shows that the back side of the force peaks becomes more tilted as the
357 cantilever becomes softer. This is due to the fact that the extension
358 (end-to-end distance) of the protein polymer has a large sudden
359 increase as the tension rebalances after an unfolding event.
361 It should also be mentioned that the contour length increment from
362 each unfolding event is not equal to the distance between adjacent
363 peaks in the force curve because the chain is never fully stretched.
364 This contour length increase can only be obtained by fitting the curve
365 to WLC\index{WLC} or other polymer models (\cref{fig:expt-sawtooth}).
369 \asyinclude{figures/kappa-sawteeth/kappa-sawteeth}
370 \caption{Simulated force curves obtained from pulling a polymer with
371 eight protein molecules using cantilevers with different force
372 constants $\kappa_c$. Parameters used in generating these curves
373 are the same as those used in \cref{fig:sawsim:sim-all}, except the
374 cantilever force constant. Successive force curves are offset by
375 $300\U{pN}$ for clarity.\label{fig:sawsim:kappa-sawteeth}}
379 \subsection{Determination of $\Delta x_u$ and $k_{u0}$}
380 \label{sec:sawsim:results:fitting}
382 As mentioned in \cref{sec:sawsim:results:force-curves}, fitting
383 experimental unfolding force histograms to simulated histograms allows
384 you to extract best-fit parameters for your simulation model. For
385 example, if you have Bell model unfolding
386 (\cref{sec:sawsim:rate:bell}), your two fitting parameters are the
387 zero-force unfolding rate $k_{u0}$ and the distance $\Delta x_u$ from
388 the native state to the transition state. \cref{fig:sawsim:v-dep}
389 shows the dependence of the unfolding force on the pulling speed for
390 different values of $k_{u0}$ and $\Delta x_u$. As expected, the
391 unfolding force increases linearly with the pulling speed in the
392 linear-log plot\citep{evans99}. While the magnitude of the unfolding
393 forces is affected by both $k_{u0}$ and $\Delta x_u$, the slope of
394 speed dependence is primarily determined by $\Delta x_u$
395 (\cref{eq:sawsim:order-dep}). \Cref{fig:sawsim:width-v-dep} shows
396 that the width of the unfolding force distribution is very sensitive
397 to $\Delta x_u$, as expected from the Gumbel distribution
398 (\cref{eq:sawsim:variance}). To obtain the values of $k_{u0}$ and
399 $\Delta x_u$ for the protein, the pulling speed dependence and the
400 distribution of the unfolding forces from simulation, such as those
401 shown in \cref{fig:sawsim:v-dep} and the insets of
402 \cref{fig:sawsim:width-v-dep}, are compared with the experimentally
403 measured results. The values of $k_{u0}$ and $\Delta x_u$ that
404 provide the best match are designated as the parameters describing the
405 protein under study. Since $k_{u0}$ and $\Delta x_u$ affect the
406 unfolding forces differently, the values of both parameters can be
407 determined simultaneously. The data used in plotting
408 \cref{fig:sawsim:all-v-dep} includes all force peaks from the
409 simulated force curves because most experimental data is analyzed that
410 way. % TODO: all? most data analyzed what way?
412 In most published literature, $k_{u0}$ and $\Delta x_u$ were fit by
413 carrying out simulations using a handful of possible unfolding
414 parameters and selected the best fit by eye%
415 %\citep{us,carrion-vazques99b,schlierf06}
416 . This approach does not allow estimation of uncertainties in the
417 fitting parameters, as shown by \citet{best02}. A more rigorous
418 approach involves quantifying the quality of fit between the
419 experimental and simulated force distributions, allowing the use of a
420 numerical minimization algorithm to pick the best fit parameters. We
421 use the Jensen--Shannon divergence\citep{sims09,lin91}, a measure of
422 the similarity between two probability distributions.
424 D_\text{JS}(p_e, p_s)
425 = D_\text{KL}(p_e, p_m) + D_\text{KL}(p_s, p_m) \;, \label{eq:sawsim:D_JS}
427 where $p_e(i)$ and $p_s(i)$ are the the values of the $i^\text{th}$
428 bin in the experimental and simulated unfolding force histograms,
429 respectively. $D_\text{KL}$ is the Kullback--Leibler divergence
432 = \sum_i p_p(i) \log_2\p({\frac{p_p(i)}{p_q(i)}}) \;, \label{eq:sawsim:D_KL}
434 where the sum is over all unfolding force histogram bins. $p_m$ is
435 the symmetrized probability distribution
437 p_m(i) \equiv [p_e(i)+p_s(i)]/2 \;. \label{eq:sawsim:p_m}
440 \nomenclature[sr ]{$D_\text{JS}$}{The Jensen--Shannon divergence
441 (\cref{eq:sawsim:D_JS}).}
442 \nomenclature[sr ]{$D_\text{LK}$}{The Kullback--Leibler divergence
443 (\cref{eq:sawsim:D_KL}).}
444 \nomenclature[sr ]{$p_m(i)$}{The symmetrized probability distribution
445 used in calculating the Jensen--Shannon divergence
446 (\cref{eq:sawsim:D_JS,eq:sawsim:p_m}).}
447 % DONE: Mention inter-histogram normalization? no.
448 % For experiments carried out over a series of pulling velocities, we
449 % simply sum residuals computed for each velocity, although it would
450 % also be reasonable to weight this sum according to the number of
451 % experimental unfolding events recorded for each velocity.
453 The major advantage of the Jensen--Shannon divergence is that
454 $D_\text{JS}$ is bounded ($0\le D_\text{JS}\le 1$) regardless of the
455 experimental and simulated histograms. For comparison, Pearson's
456 $\chi^2$ test\citep{NIST:chi-square},
458 D_{\chi^2} = \sum_i \frac{(p_e(i)-p_s(i))^2}{p_s(i)} \;,
461 is infinite if there is a bin for which $p_e(i)>0$ but $p_s(i)=0$.
463 \nomenclature[sg x2 ]{$\chi^2$}{The chi-squared distribution.}
464 \nomenclature[sr ]{$D_{\chi^2}$}{Pearson's $\chi^2$ test
465 (\cref{eq:sawsim:X2}).}
467 \Cref{fig:sawsim:fit-space} shows the Jensen--Shannon divergence
468 calculated using \cref{eq:sawsim:D_JS} between an experimental data
469 set and simulation results obtained using a range of values of
470 $k_{u0}$ and $\Delta x_u$. There is an order of magnitude range of
471 $k_{u0}$ that produce reasonable fits to experimental data
472 (\cref{fig:sawsim:fit-space}), which is consistent with the results
473 \citet{best02} obtained using a chi-square test. The values of
474 $k_{u0}$ and $\Delta x_u$ can be determined to higher precision by
475 using both the pulling speed dependent data and the unfolding force
476 distribution, as well as any relevant information about the protein
481 \subfloat[][]{\asyinclude{figures/v-dep/v-dep}%
482 \label{fig:sawsim:v-dep}%
484 \subfloat[][]{\asyinclude{figures/v-dep/v-dep-sd}%
485 \label{fig:sawsim:width-v-dep}%
487 \caption{\protect\subref{fig:sawsim:v-dep} The dependence of the
488 unfolding forces on the pulling speed for three different model
489 protein molecules characterized by the parameters $k_{u0}$ and
490 $\Delta x_u$. The polymer length is eight molecules, and each
491 symbol is the average of $3200$ data points.
492 \protect\subref{fig:sawsim:width-v-dep} The dependence of standard
493 deviation of the unfolding force distribution on the pulling speed
494 for the simulation data shown in
495 \protect\subref{fig:sawsim:v-dep}, using the same symbols. The
496 insets show the force distribution histograms for the three
497 proteins at the pulling speed of $1\U{$\mu$m/s}$. The left,
498 middle and right histograms are for the proteins represented by
499 the top, middle, and bottom lines in
500 \protect\subref{fig:sawsim:v-dep},
501 respectively.\label{fig:sawsim:all-v-dep}}
507 \asyinclude{figures/fit-space/fit-valley}
508 \caption{Fit quality between an experimental data set and simulated
509 data sets obtained using various values of unfolding rate
510 parameters $k_{u0}$ and $\Delta x_u$. The experimental data are
511 from octameric ubiquitin pulled at $1\U{$\mu$m/s}$\citep{chyan04},
512 and the other model parameters are the same as those in
513 \cref{fig:sawsim:sim-all}. The best fit parameters are $\Delta
514 x_u=0.17\U{nm}$ and $k_{u0}=1.2\E{-2}\U{s$^{-1}$}$. The
515 simulation histograms were built from $400$ pulls at for each
516 parameter pair.\label{fig:sawsim:fit-space}}
520 \subsection{Features}
521 \label{sec:sawsim:features}
523 \sawsim\ is a great improvement over existing work in this field.
524 \citet{best02} are the only authors to mention such automatic
525 simulation comparisons, and their $\chi^2$ fit only compares mean
526 unfolding forces over a range of speeds. They calculate $\avg{F}$
527 through an iterative method, and assume a standard deviation of
528 $20\U{pN}$ on their simulated $\avg{F}$. \sawsim, by comparison,
529 makes full use of your experimental histograms, which you specify in a
530 plain-text histogram file:
533 \begin{Verbatim}[commandchars=\\\{\}]
535 #Force (N) Unfolding events
544 #Force (N) Unfolding events
552 #Force (N) Unfolding events
562 Each \sawsim\ run simulates a single sawtooth curve, so you need to
563 run many \sawsim\ instances to generate your simulated histograms. To
564 automate this task, \sawsim\ comes with a \citetalias{python} wrapping
565 library (\pysawsim), which provides convenient programmatic and
566 command line interfaces for generating and manipulating \sawsim\ runs.
567 For example, to compare the experimental histograms listed above with
568 simulated data over a 50-by-50 grid of $k_{u0}$ and $\Delta x$, you
569 would use something like
570 \begin{minted}[samepage]{console}
571 $ sawsim_hist_scan.py -f '-s cantilever,hooke,0.05 -N1 -s folded,null -N8
572 > -s "unfolded,wlc,{0.39e-9,28e-9}" -k "folded,unfolded,bell,{%g,x%g}"
573 > -q folded' -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
575 That's a bit of a mouthful, so let's break it down. Without the
576 \sawsim\ template (\imint{sh}|-f ...|), we can focus on the comparison
578 \begin{minted}[samepage]{console}
579 $ sawsim_hist_scan.py ... -r '[1e-5,1e-3,50],[0.1e-9,1e-9,50]' --logx histograms.txt
581 This sets up a two-parameter sweep, with the first parameter going
582 from $1\E{-5}$ to $1\E{-3}$ in 50 logarithmic steps, and the second
583 going from $0.1\E{-9}$ to $1\E{-9}$ in 50 linear steps. The
584 \sawsim\ template defines the simulation model
585 (\cref{fig:sawsim:domains,tab:sawsim:model}), and \imint{sh}|%g| marks
586 the location where the swept parameters will be inserted.
588 Behind the scenes, \pysawsim\ is spawning several concurrent
589 \sawsim\ processes to take advantage of any parallel processing
590 facilities you may have access to (e.g.~multiple cores, MPI, PBS,
591 \ldots). A 50-by-50 grid with 400 runs per pixel at about one second
592 per \sawsim\ pull would take arount 12 days of serial execution.
593 Moving the simulation to the departments' 16 core file server cuts
594 that execution time down to 18 hours, which will easily complete over
595 a quiet weekend. Using MPI on the departments' 15 box, dual core
596 computer lab, the simulation would finish overnight.
598 \nomenclature[text ]{MPI}{Message passing interface, a parallel
599 computing infrastructure.}
600 \nomenclature[text ]{PBS}{Portable batch system, a parallel computing
601 infrastructure. You should be able to distinguish this from the
602 other PBS (phosphate buffered saline) based on the context.}
605 \label{sec:sawsim:testing}
607 Once a body of code reaches a certain level of complication, it
608 becomes difficult to convince others (or yourself) that it's actually
609 working correctly. In order to test \sawsim, I've developed a test
610 suite (distributed with \sawsim) that compares simulated unfolding
611 force histograms with analytical histograms for a number of situations
612 where solving for the analytical histogram is possible. In the
613 following subsection, I'll work out the theoretical unfolding force
614 distribution for a number of tractable cases. The sawsim test suite
615 generates simulated unfolding curves for these tractable cases
616 (e.g. single domain Bell model unfolding with a constant loading
617 rate), and compares the simulated unfolding force histograms with the
618 expected theoretical distribution. The simulated histograms match the
619 theoretical distributions for each combination of models regardless of
620 the parameters you feed into the models, so we can be confident that
621 \sawsim\ correctly implements at those models.
623 The instantaneous likelyhood of a protein unfolding is given by
624 $\deriv{F}{N_u}$, and the unfolding histogram is merely this function
625 discretized over a bin of width $W$\footnote{
626 This is similar to \xref{dudko06}{equation}{2}, remembering that
627 $\dot{F}=\kappa v$, that their probability density is not a
628 histogram ($W=1$), and that their probability density function is
632 h(F) \equiv \deriv{\text{bin}}{F}
633 = \deriv{F}{N_u} \cdot \deriv{\text{bin}}{F}
636 = -W \deriv{t}{N_f} \deriv{F}{t}
637 = \frac{W}{\kappa v} N_f k_u \label{eq:unfold:hist}
639 Solving for theoretical histograms is merely a question of taking your
640 chosen $k_u$, solving for $N_f(F)$, and plugging into
641 \cref{eq:unfold:hist}. We can also make a bit of progress solving for
642 $N_f$ in terms of $k_u$ as follows:
644 k_u &\equiv -\frac{1}{N_f} \deriv{t}{N_f} \\
645 -k_u \dd t \cdot \deriv{t}{F} &= \frac{\dd N_f}{N_f} \\
646 \frac{-1}{\kappa v} \integral{0}{F}{F'}{k_0(F')}
647 &= \left. \ln(N_f(F')) \right|_0^F
648 = \ln\p({\frac{N_f(F)}{N_f(0)}})
649 = \ln\p({\frac{N_f(F)}{N}}) \\
650 N_f(F) &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{k_u(F')}} \;,
653 where $N_f(0) = N$ because all the domains are initially folded.
655 \nomenclature[sr ]{$W$}{Bin width of an unfolding force histogram
656 (\cref{eq:unfold:hist}).}
658 \subsubsection{Constant unfolding rate}
660 In the extremely weak tension regime, the protein's unfolding rate is
661 independent of tension, so we can simplify \cref{eq:N_f} and plug into
662 \cref{eq:unfold:hist}.
664 N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u(F')}}}
665 = N\exp{\frac{-\colA{k_{u0}}}{\kappa v}\colB{\integral{0}{F}{F'}{}}}
666 = N\exp{\frac{-k_{u0} \colB{F}}{\kappa v}} \\
667 h(F) &= \frac{W}{\kappa v} N_f k_u
668 = \frac{W k_{u0} N}{\kappa v} \exp{\frac{-k_{u0} F}{\kappa v}} \;.
670 A constant unfolding-rate/hazard-function gives exponential decay.
671 This is not an earth shattering result, but it's a comforting first
672 step, and it does show explicitly the dependence in terms of the
673 various unfolding-specific parameters.
675 \subsubsection{Bell model}
677 Stepping up the intensity a bit, we come to Bell's model for unfolding
678 (\cref{sec:sawsim:rate:bell}). We can simplify the following
679 calculation by parametrizing with the characteristic force $\rho$
680 defined in \cref{sec:sawsim:results:scaffold} and the similar
681 single-domain mode $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$. With
682 these substitutions, \cref{eq:sawsim:bell} becomes
684 k_u = k_{u0} \exp{\frac{F}{\rho}} \;.
686 The unfolding histogram is then given via \cref{eq:N_f,eq:unfold:hist}.
688 N_f &= N\exp{\frac{-1}{\kappa v}\integral{0}{F}{F'}{\colA{k_u}}}
689 = N\exp{\frac{-1}{\kappa v}
690 \integral{0}{F}{F'}{\colAB{k_{u0}}{\colA{\exp{\frac{F'}{\rho}}}}}}
691 = N\exp{\frac{-\colB{k_{u0}}}{\kappa v}
692 \colA{\integral{0}{F}{F'}{\exp{\frac{F'}{\rho}}}}}
693 = N\exp{\frac{\colB{-}k_{u0}\colA{\rho}}{\kappa v}
694 \colAB{\p({\exp{\frac{F}{\rho}}-1})}} \\
695 &= N\exp{\colA{\frac{k_{u0}\rho}{\kappa v}}
696 \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
697 = N\exp{\colAB{\exp{\frac{-\alpha'}{\rho}}}
698 \colB{\p({1 - {\exp{\frac{F}{\rho}}}})}}
699 = N\exp{\colB{\exp{\frac{-\alpha'}{\rho}} -
700 \exp{\frac{F-\alpha'}{\rho}}}} \\
701 h(F) &= \frac{W}{\kappa v} \colA{N_f} \colB{k_u}
703 \colA{N\exp{\exp{\frac{-\alpha'}{\rho}} - \exp{\frac{F-\alpha'}{\rho}}}}
704 \colB{k_{u0}\exp{\frac{F}{\rho}}}
705 = \frac{W N \colAB{k_{u0}}}{\colA{\kappa v}}
706 \exp{\colB{\frac{F}{\rho}} - \exp{\frac{F-\alpha'}{\rho}} +
707 \exp{\frac{-\alpha'}{\rho}}} \\
708 &= \frac{W N}{\colA{\rho}}
709 \exp{\frac{F \colA{-\alpha'}}{\rho} - \exp{\frac{F-\alpha'}{\rho}} +
710 \colB{\exp{\frac{-\alpha'}{\rho}}}}
712 \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
713 \colB{\exp{\exp{\frac{-\alpha'}{\rho}}}} \\
714 &= \frac{W N \exp{\exp{\frac{-\alpha'}{\rho}}}}{\rho}
715 \exp{\frac{F-\alpha'}{\rho} - \exp{\frac{F-\alpha'}{\rho}}}
716 \label{eq:unfold:bell_pdf}
718 which matches \cref{eq:sawsim:gumbel} except for a constant
719 prefactor due to the range\footnote{
720 The Gumbel distribution in \cref{eq:sawsim:gumbel} is normalized for
721 the range $-\infty < F < \infty$, but \cref{eq:unfold:bell_pdf} is
722 normalized for the range $0 \le F < \infty$. This distinction will
723 alter the analytical mean and variance listed after
724 \cref{eq:sawsim:gumbel}, but with the experimental unfolding
725 histograms showing few zero-force unfolding events, the effective
726 difference will be negligible.
729 \nomenclature[sg a' ]{$\alpha'$}{The mode unfolding force for a single
730 folded domain, $\alpha'\equiv-\rho\ln(k_{u0}\rho/\kappa v)$
731 (\cref{eq:unfold:bell_pdf}).}
733 \subsubsection{Saddle-point Kramers' model}
735 For the saddle-point approximation for Kramers' model for unfolding
736 (\citet{evans97} Eqn.~3, \citet{hanggi90} Eqn. 4.56c, \citet{vanKampen07} Eqn. XIII.2.2).
738 k_u = \frac{D}{l_b l_{ts}} \cdot \exp{\frac{-U_b(F)}{k_B T}} \;,
739 \label{eq:kramers-saddle}
741 where $U_b(F)$ is the barrier height under an external force $F$,
742 $D$ is the diffusion constant of the protein conformation along the reaction coordinate,
743 $l_b$ is the characteristic length of the bound state $l_b \equiv 1/\rho_b$,
744 $\rho_b$ is the density of states in the bound state, and
745 $l_{ts}$ is the characteristic length of the transition state.
747 \nomenclature[sr ]{$U_b(F)$}{The barrier energy as a function of force
748 (\cref{eq:kramers-saddle}).}
749 \nomenclature[sr ]{$l_b$}{The characteristic length of the bound state
750 $l_b \equiv 1/\rho_b$ (\cref{eq:kramers-saddle}).}
751 \nomenclature[sg r_b ]{$\rho_b$}{The density of states in the bound
752 state (\cref{eq:kramers-saddle}).}
753 \nomenclature[sr ]{$l_{ts}$}{The characteristic length of the
754 transition state (\cref{eq:kramers-saddle}).}
756 \citet{evans97} solved this unfolding rate for both inverse power law
757 potentials and cusp potentials.
759 \section{Review of current research}
761 There is a long history of protein unfolding and unbinding
762 simulations. Early work by \citet{grubmuller96} and
763 \citet{izrailev97} focused on molecular dynamics (MD) simulations of
764 receptor-ligand breakage. Around the same time, \citet{evans97}
765 introduced a Monte Carlo Kramers' simulation in the context of
766 receptor-ligand breakage. The approach pioneered by \citet{evans97}
767 was used as the basis for analysis of the initial protein unfolding
768 experiments\citep{rief97a}. However, none of these earlier
769 implementations were open source, which made it difficult to reuse or
770 validate their results.
772 \nomenclature[text ]{MD}{Molecular dynamics simulation. Simulate the
773 physical motion of atoms and molecules by numerically solving
776 Within the Monte Carlo simulation approach, there are two main models
777 for protein domain unfolding under tension: Bell's and
778 Kramers'\citep{schlierf06,hummer03,dudko06}. Bell introduced his
779 model in the context of cell adhesion\citep{bell78}, but it has been
780 widely used to model mechanical unfolding in
781 proteins\citep{rief97a,carrion-vazquez99b,schlierf06} due to its
782 simplicity and ease of use\citep{hummer03}. Kramers introduced his
783 theory in the context of thermally activated barrier crossings, which
784 is how we use it here.
786 Evans introduced the saddle-point Kramers' approximation in a protein
787 unfolding context in 1997 (\xref{evans97}{equation}{3}). However,
788 early work on mechanical unfolding focused on the simpler Bell
789 model\citep{rief97a}. In the early 2000's, the
790 saddle-point/steepest-descent approximation to Kramer's model
791 (\xref{hanggi90}{equation}{4.56c}) was introduced into our
792 field\citep{dudko03,hyeon03}. By the mid 2000's, the full-blown
793 double-integral form of Kramer's model
794 (\xref{hanggi90}{equation}{4.56b}) was in use\citep{schlierf06}.
796 There have been some tangential attempts towards even fancier models:
797 \citet{dudko03} attempted to reduce the restrictions of the
798 single-unfolding-path model and \citet{hyeon03} attempted to measure
799 the local roughness using temperature dependent unfolding. However,
800 further work on these lines has been slow, because the Bell model fits
801 the data well despite its simplicity. For more complicated models to
802 gain ground, we need larger, more detailed datasets that expose
803 features which the Bell model doesn't capture.