From: W. Trevor King Date: Thu, 13 Aug 2009 00:06:24 +0000 (-0400) Subject: Fixed _major_ bug in multi-domain unfolding calculations & bumped to 0.8. X-Git-Tag: v0.8 X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=166322e2b98c0666b7460e7187c6ecd9f7e77050;p=sawsim.git Fixed _major_ bug in multi-domain unfolding calculations & bumped to 0.8. The relavant lines of code are: @@ -548,7 +574,7 @@ int domain_transitions(double F, double dt, environment_t *env, int num_domains, k = accel_k(transition->k, F, env, transition->k_params); //(*transition->k)(F, env, domain->k_params); //printf("k = %g,\tdt = %g,\tk dt = %g\n", k, dt, k*dt); - return happens(k*dt*num_domains); /* N dice rolls for prob. k*dt event */ + return happens(1-pow((1.0-k*dt), num_domains)); /* N dice rolls for prob. k*dt event */ } @ [[happens]] is a random decision making function defined in Appendix \ref{sec.utils}. Yes, I feel very stupid. Sorry. I also took the oportunity to skim over the explanatory text (up to the Appendix break), making a few updates and typo corrections as well as adding more detail to the "Timescales" section discussing multi-unfolding timesteps. --- diff --git a/src/sawsim.nw b/src/sawsim.nw index f1e7f2c..1132a9f 100644 --- a/src/sawsim.nw +++ b/src/sawsim.nw @@ -153,8 +153,14 @@ Version 0.7 added tension model inverses, which seems to reduce computation time by about a factor of 2. I was expecting more, but I'll take what I can get. +Version 0.8 fixed a \emph{major} bug in unfolding probability for +multi-domain groups. The probability of at least one domain unfolding +had been calculated as $P_N=NP_1$, rather than $P_N=1-(1-P_1)^N$. I +dunno how that slipped by before. Now there are multi-domain tests in +testing to check for this sort of mistake. + <>= -#define VERSION "0.7" +#define VERSION "0.8" @ \subsection{License} @@ -258,8 +264,8 @@ following models have been implemented: \end{packed_item} The tension model and parameters used for a particular domain depend -on whether the domain's current state. The transition rates between -states are also handled generally with function pointers, with +on the domain's current state. The transition rates between states +are also handled generally with function pointers, with implementations for \begin{packed_item} \item Null (Appendix \ref{sec.null_k}), @@ -394,8 +400,6 @@ interactions between domains of the same group need not be. Each model has a tension handler function, which gives the tension $F$ needed to stretch a given group of domains a distance $x$. -GROUPS ARE CURRENTLY DISABLED. - Groups represent collections of states which the model should treat as a single entity. To understand the purpose of groups, consider a system with two unfolded domain states (e.g.\ for two different @@ -405,7 +409,10 @@ same group. This would reduce both the computational cost of balancing the tension and the error associated with the inter-group interaction linearity. Note that group numbers only matter \emph{within} model classes, since grouping domains with seperate -models doesn't make sense. +models doesn't make sense. Within designated groups, the tension +parameters for each domain are still checked for compatibility, so if +you accidentally assign incompatible domains to the same group you +should get an appropriate error message. <>= #define NUM_TENSION_MODELS 5 @@ -449,7 +456,7 @@ into the [[tension_balance]] function. [[transition]] should be set to 0 if there were no transitions since the previous call to [[find_tension]] to support some optimizations that assume a small increase in tension between steps (only true if no transition has -occured). While were messing with the tension handlers, we also grab +occured). While we're messing with the tension handlers, we also grab a measure of the current linker stiffness for the environmental variable, which is needed by some of the unfolding rate models (e.g.\ the linker-stiffness corrected Bell model, Appendix @@ -514,10 +521,12 @@ Each group must define a parameter structure if the tension model is parametrized, a function to create the parameter structure, a function to destroy the parameter structure, and - a tension function of type [[one_dim_fn_t]] that receives a [[tension_handler_data_t *]] as it's data argument. + a tension function of type [[one_dim_fn_t]] that receives a [[tension_handler_data_t *]] as its data argument. The tension-specific parameter structures are contained in the domain -group field. See the Section \ref{sec.model_selection} for a -discussion on the command line framework. See the worm-like chain +group field. For optimization, a group may also define an inverse +tension function as an optimization. See the Section +\ref{sec.model_selection} for a discussion on the command line +framework and inverse function discussion. See the worm-like chain implementation for an example (Section \ref{sec.wlc}). The major limitation of our approach is that all of our tension models @@ -541,6 +550,23 @@ $$ P = k \cdot dt. $$ +For a group of $N$ identical domains, and therefore identicalunfolding +probabilities $P$, the probability of $n$ domains unfolding in a given +timestep is given by the binomial distribution +$$ + p(n) = \frac{N!}{n!(N-n)!}P^n(1-P)^(N-n). +$$ +The probability that \emph{none} of these domains unfold is then +$$ + p(0) = (1-P)^N, +$$ +so the probability that \emph{at least one} domain unfolds is +$$ + p(n>0) = 1-(1-P)^N. +$$ +We take some time to discuss the meaning of $p(n>1)$ +(i.e. multi-unfolding timesteps) in Section \ref{sec.timescales}. + <>= int domain_transitions(double F, double dt, environment_t *env, int num_domains, transition_t *transition) { /* returns 1 or 0, F in N, dt in s, pointer to env. data, number of 'reactant' domains, pointer to transition structure */ @@ -548,7 +574,7 @@ int domain_transitions(double F, double dt, environment_t *env, int num_domains, k = accel_k(transition->k, F, env, transition->k_params); //(*transition->k)(F, env, domain->k_params); //printf("k = %g,\tdt = %g,\tk dt = %g\n", k, dt, k*dt); - return happens(k*dt*num_domains); /* N dice rolls for prob. k*dt event */ + return happens(1-pow((1.0-k*dt), num_domains)); /* N dice rolls for prob. k*dt event */ } @ [[happens]] is a random decision making function defined in Appendix \ref{sec.utils}. @@ -593,14 +619,36 @@ see the Appendix of \citet{evans99} and \citet{kroy07}. Besides assuming our timestep is much greater than equilibration times, we also force it to be short enough so that only one domain may unfold in a given timestep. For an unfolding timescale of $dt_u = -1/k$ we adapt our timesteps to keep $dt \ll dt_u$, so the probability -of two domains unfolding in the same timestep is negligible. This -approach breaks down as the adaptive timestep scheme approaches $dt -\sim dt_u$, but $dt_u \sim 1\U{$\mu$s}$ for Ig27-like proteins -\citep{klimov00}, so this shouldn't be a problem. To reassure +1/k$, we adapt our timesteps to keep $dt \ll dt_u$, so the probability +of two domains unfolding in the same timestep is small (see +[[determine_dt]] in Section \ref{sec.adaptive_dt} and [[P]] in +[[main()]] in Section \ref{sec.main} set by [[-P]] in +[[get_options()]] in Section \ref{sec.get_options}). This approach +breaks down as the adaptive timestep scheme approaches the transition +timestep $dt_t$, but $dt_t \sim 1\U{$\mu$s}$ for Ig27-like proteins +\citep{klimov00}, so this shouldn't be a problem. To reassure yourself, you can ask the simulator to print the smallest timestep that was used with TODO. +Even if the likelyhood of two domains unfolding in the same timestep +is small, if you run long enough simulations it will eventually occur. +In this case, we assume that $dt_t \ll dt$, so even if two domains +would unfold if we stuck to our unfolding rate $k$ for an entire +timestep $dt$, in ``reality'' only one of those domains unfolds. +Because we do not know when in the timestep the unfolding took place, +we move on to the next timestep without checking for further +unfoldings. This ``unchecked timestep portion'' should not contribute +significant errors to the calculation, because if $dt$ was small +enough that unfolding was unlikely at the pre-unfolding force, it +should be even more unlikely at the post-unfolding force, especially +over only a fraction of the timestep. The only dangerous cases for +the current implementation are those where unfolding intermediates are +much less stable than their precursor states, in which case an +unfolding event during the remaining timestep may actually be likely. +A simple workaround for such cases is to pick the value for [[P]] to +be small enough that the $dt \ll dt_u$ assumption survives even under +a transition populating the unstable intermediate. + \subsection{Adaptive timesteps} \label{sec.adaptive_dt} @@ -848,6 +896,7 @@ void help(char *prog_name, double P, double t_max, double dt_max, double v, doub @ \subsection{Get options} +\label{sec.get_options} <>= void get_options(int argc, char **argv,