Fixed _major_ bug in multi-domain unfolding calculations & bumped to 0.8.

author W. Trevor King <wking@drexel.edu>

Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)

committer W. Trevor King <wking@drexel.edu>

Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)
author W. Trevor King <wking@drexel.edu>
Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)
committer W. Trevor King <wking@drexel.edu>
Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)
diff --git a/src/sawsim.nw b/src/sawsim.nw

index f1e7f2c18256fffc84721d5f47fa491064fc2d1b..1132a9fa5b67afeeeb71dbd48191ff4f6f7562c1 100644 (file)
--- a/src/sawsim.nw
+++ b/src/sawsim.nw
@@ -153,8 +153,14 @@ Version 0.7 added tension model inverses, which seems to reduce
  computation time by about a factor of 2.  I was expecting more, but
  I'll take what I can get.
  
+Version 0.8 fixed a \emph{major} bug in unfolding probability for
+multi-domain groups.  The probability of at least one domain unfolding
+had been calculated as $P_N=NP_1$, rather than $P_N=1-(1-P_1)^N$.  I
+dunno how that slipped by before.  Now there are multi-domain tests in
+testing to check for this sort of mistake.
+
  <<version definition>>=
-#define VERSION "0.7"
+#define VERSION "0.8"
  @
  
  \subsection{License}
@@ -258,8 +264,8 @@ following models have been implemented:
  \end{packed_item}
  
  The tension model and parameters used for a particular domain depend
-on whether the domain's current state.  The transition rates between
-states are also handled generally with function pointers, with
+on the domain's current state.  The transition rates between states
+are also handled generally with function pointers, with
  implementations for
  \begin{packed_item}
   \item  Null (Appendix \ref{sec.null_k}),
@@ -394,8 +400,6 @@ interactions between domains of the same group need not be.  Each
  model has a tension handler function, which gives the tension $F$
  needed to stretch a given group of domains a distance $x$.
  
-GROUPS ARE CURRENTLY DISABLED.
-
  Groups represent collections of states which the model should treat as
  a single entity.  To understand the purpose of groups, consider a
  system with two unfolded domain states (e.g.\ for two different
@@ -405,7 +409,10 @@ same group.  This would reduce both the computational cost of
  balancing the tension and the error associated with the inter-group
  interaction linearity.  Note that group numbers only matter
  \emph{within} model classes, since grouping domains with seperate
-models doesn't make sense.
+models doesn't make sense.  Within designated groups, the tension
+parameters for each domain are still checked for compatibility, so if
+you accidentally assign incompatible domains to the same group you
+should get an appropriate error message.
  
  <<find tension definitions>>=
  #define NUM_TENSION_MODELS 5
@@ -449,7 +456,7 @@ into the [[tension_balance]] function.  [[transition]] should be set
  to 0 if there were no transitions since the previous call to
  [[find_tension]] to support some optimizations that assume a small
  increase in tension between steps (only true if no transition has
-occured).  While were messing with the tension handlers, we also grab
+occured).  While we're messing with the tension handlers, we also grab
  a measure of the current linker stiffness for the environmental
  variable, which is needed by some of the unfolding rate models
  (e.g.\ the linker-stiffness corrected Bell model, Appendix
@@ -514,10 +521,12 @@ Each group must define a parameter structure if the tension model is
  parametrized,
   a function to create the parameter structure,
   a function to destroy the parameter structure, and
- a tension function of type [[one_dim_fn_t]] that receives a [[tension_handler_data_t *]] as it's data argument.
+ a tension function of type [[one_dim_fn_t]] that receives a [[tension_handler_data_t *]] as its data argument.
  The tension-specific parameter structures are contained in the domain
-group field.  See the Section \ref{sec.model_selection} for a
-discussion on the command line framework.  See the worm-like chain
+group field.  For optimization, a group may also define an inverse
+tension function as an optimization.  See the Section
+\ref{sec.model_selection} for a discussion on the command line
+framework and inverse function discussion.  See the worm-like chain
  implementation for an example (Section \ref{sec.wlc}).
  
  The major limitation of our approach is that all of our tension models
@@ -541,6 +550,23 @@ $$
    P = k \cdot dt.
  $$
  
+For a group of $N$ identical domains, and therefore identicalunfolding
+probabilities $P$, the probability of $n$ domains unfolding in a given
+timestep is given by the binomial distribution
+$$
+  p(n) = \frac{N!}{n!(N-n)!}P^n(1-P)^(N-n).
+$$
+The probability that \emph{none} of these domains unfold is then
+$$
+  p(0) = (1-P)^N,
+$$
+so the probability that \emph{at least one} domain unfolds is
+$$
+  p(n>0) = 1-(1-P)^N.
+$$
+We take some time to discuss the meaning of $p(n>1)$
+(i.e. multi-unfolding timesteps) in Section \ref{sec.timescales}.
+
  <<does domain transition>>=
  int domain_transitions(double F, double dt, environment_t *env, int num_domains, transition_t *transition)
  { /* returns 1 or 0, F in N, dt in s, pointer to env. data, number of 'reactant' domains, pointer to transition structure */
@@ -548,7 +574,7 @@ int domain_transitions(double F, double dt, environment_t *env, int num_domains,
    k = accel_k(transition->k, F, env, transition->k_params);
    //(*transition->k)(F, env, domain->k_params);
    //printf("k = %g,\tdt = %g,\tk dt = %g\n", k, dt, k*dt);
-  return happens(k*dt*num_domains); /* N dice rolls for prob. k*dt event */
+  return happens(1-pow((1.0-k*dt), num_domains)); /* N dice rolls for prob. k*dt event */
  }
  @ [[happens]] is a random decision making function defined in Appendix \ref{sec.utils}.
  
@@ -593,14 +619,36 @@ see the Appendix of \citet{evans99} and \citet{kroy07}.
  Besides assuming our timestep is much greater than equilibration
  times, we also force it to be short enough so that only one domain may
  unfold in a given timestep.  For an unfolding timescale of $dt_u =
-1/k$ we adapt our timesteps to keep $dt \ll dt_u$, so the probability
-of two domains unfolding in the same timestep is negligible.  This
-approach breaks down as the adaptive timestep scheme approaches $dt
-\sim dt_u$, but $dt_u \sim 1\U{$\mu$s}$ for Ig27-like proteins
-\citep{klimov00}, so this shouldn't be a problem.  To reassure
+1/k$, we adapt our timesteps to keep $dt \ll dt_u$, so the probability
+of two domains unfolding in the same timestep is small (see
+[[determine_dt]] in Section \ref{sec.adaptive_dt} and [[P]] in
+[[main()]] in Section \ref{sec.main} set by [[-P]] in
+[[get_options()]] in Section \ref{sec.get_options}). This approach
+breaks down as the adaptive timestep scheme approaches the transition
+timestep $dt_t$, but $dt_t \sim 1\U{$\mu$s}$ for Ig27-like proteins
+\citep{klimov00}, so this shouldn't be a problem. To reassure
  yourself, you can ask the simulator to print the smallest timestep
  that was used with TODO.
  
+Even if the likelyhood of two domains unfolding in the same timestep
+is small, if you run long enough simulations it will eventually occur.
+In this case, we assume that $dt_t \ll dt$, so even if two domains
+would unfold if we stuck to our unfolding rate $k$ for an entire
+timestep $dt$, in ``reality'' only one of those domains unfolds.
+Because we do not know when in the timestep the unfolding took place,
+we move on to the next timestep without checking for further
+unfoldings.  This ``unchecked timestep portion'' should not contribute
+significant errors to the calculation, because if $dt$ was small
+enough that unfolding was unlikely at the pre-unfolding force, it
+should be even more unlikely at the post-unfolding force, especially
+over only a fraction of the timestep.  The only dangerous cases for
+the current implementation are those where unfolding intermediates are
+much less stable than their precursor states, in which case an
+unfolding event during the remaining timestep may actually be likely.
+A simple workaround for such cases is to pick the value for [[P]] to
+be small enough that the $dt \ll dt_u$ assumption survives even under
+a transition populating the unstable intermediate.
+
  \subsection{Adaptive timesteps}
  \label{sec.adaptive_dt}
  
@@ -848,6 +896,7 @@ void help(char *prog_name, double P, double t_max, double dt_max, double v, doub
  @
  
  \subsection{Get options}
+\label{sec.get_options}
  
  <<get options>>=
  void get_options(int argc, char **argv,
author	W. Trevor King <wking@drexel.edu>
	Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)
committer	W. Trevor King <wking@drexel.edu>
	Thu, 13 Aug 2009 00:06:24 +0000 (20:06 -0400)