src/introduction/main.tex

   1 \chapter{Introduction}
   2 \label{sec:intro}
   3
   4 Single molecule force spectroscopy (SMFS) is the study of folding and
   5 unfolding transitions in proteins under tension.  By measuring these
   6 transitions, we hope to gain insight into fundamental protein
   7 behavior.  SMFS is an attempt to bridge the gap between chemists
   8 studying folding and unfolding kinetics in bulk solutions and
   9 theorists simulating protein behavior at the amino-acid level.  An
  10 increased understanding of protein folding would guide researchers in
  11 developing drugs targeting biologically significant receptors and
  12 enzymes.  In this chapter, I describe the protein folding problem in a
  13 general sense (\cref{sec:folding-problem}), discuss theoretical
  14 frameworks for understanding protein folding
  15 (\cref{sec:energy-landscape}), highlight the role of SMFS in extending
  16 this understanding (\cref{sec:single-molecule}), and explain the role
  17 of unfolding experiments in understanding protein folding
  18 (\cref{sec:unfolding}).  The last section in this chapter gives a
  19 roadmap for the rest of the thesis (\cref{sec:outline}).
  20
  21 \section{The protein folding problem}
  22 \label{sec:folding-problem}
  23
  24 % Why study protein folding?
  25 In biological systems the most important molecules, such as proteins,
  26 nucleic acids, and polysaccharides, are all polymers.  Understanding
  27 the properties and functions of these polymeric molecules is crucial
  28 in understanding the molecular mechanisms behind structures and
  29 processes in cells.
  30
  31 % What do genes do?  Why is protein folding interesting?
  32 An organism's genetic code is stored in DNA in the cell nucleus.
  33 DNA sequencing is a fairly well developed field, with fundamental work
  34 such as the Human Genome Project seeing major development in the early
  35 2000s\citep{wolfsberg01,mcpherson01,collins03}.  It is estimated that
  36 human genetic information contains approximately 25,000 genes, each
  37 encoding a protein\citep{claverie01,venter01}.  Knowing the amino acid
  38 sequence for a particular protein, however, does not immediately shed
  39 light on the protein's role in the body, or even the protein's
  40 probable conformation.  Indeed, a protein's conformation is often
  41 vitally important in executing its biological tasks
  42 (\cref{fig:ligand-receptor}).  Unfortunately, predicting a protein's
  43 stable conformations from its amino acid sequence has proven to be
  44 remarkably difficult, as has the inverse problem of finding sequences
  45 that form a given conformation.
  46 %
  47 \nomenclature[text ]{DNA}{Deoxyribonucleic acid.}
  48
  49 \begin{figure}
  50   \begin{center}
  51   \includegraphics[width=2in]{figures/biotin-streptavidin/1SWE}%
  52   \caption{Complex of biotin\index{biotin} (red) and a
  53     streptavidin\index{streptavidin} tetramer (green)
  54     (\href{http://dx.doi.org/10.2210/pdb1swe/pdb}{PDB ID: 1SWE})%
  55     \citep{freitag97}.  The correct streptavidin conformation creates
  56     the biotin-specific binding pockets.  Biotin-streptavidin is a
  57     model ligand-receptor pair isolated from the bacterium
  58     \species{Streptomyces avidinii}%
  59     \index{Streptomyces@\species{Streptomyces avidinii}}.  Streptavidin
  60     binds to cell surfaces, and bound biotin increases streptavidin's
  61     cell-binding affinity\citep{alon90}.  Figure generated with
  62     \citetalias{pymol}.
  63     \label{fig:ligand-receptor}}
  64   \end{center}
  65 \end{figure}
  66
  67
  68 \section{Protein folding energy landscapes}
  69 \label{sec:energy-landscape}
  70
  71 % the free energy landscape
  72 Finding a protein's lowest energy state via a brute force sampling of
  73 all possible conformations is impossibly inefficient, due to the
  74 exponential scaling of possible conformations with protein length, as
  75 outlined by \citet{levinthal69}.  This has led to a succession of
  76 models explaining the folding mechanism.  For a number of years, the
  77 ``pathway'' model of protein folding enjoyed popularity
  78 (\cref{fig:folding:pathway})\citep{levinthal69}.  More recently, the
  79 ``landscape'' or ``funnel'' model has come to the fore
  80 (\cref{fig:folding:landscape})\citep{dill97}.  Both of these models
  81 reduce the conformation space to a more approachable analog, and their
  82 success depends on striking a useful balance between simplicity and
  83 accuracy.
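
     As a rough illustration of this exponential scaling, consider a
     back-of-the-envelope estimate.  The values here are illustrative
     assumptions, not figures taken from \citet{levinthal69}: a chain
     of $N = 100$ residues, $m = 3$ accessible backbone conformations
     per residue, and a sampling rate of $\nu = 10^{13}$ conformations
     per second.  The number of conformations $\Omega$ and the time
     $t$ needed to visit each one once are then
     \begin{equation}
       \Omega = m^N = 3^{100} \approx 5 \times 10^{47} , \qquad
       t = \frac{\Omega}{\nu}
         \approx \frac{5 \times 10^{47}}{10^{13}\,\mathrm{s}^{-1}}
         = 5 \times 10^{34}\,\mathrm{s}
         \approx 2 \times 10^{27}\,\mathrm{years} .
     \end{equation}
     Under these assumptions an exhaustive search would take far
     longer than the age of the universe, and the conclusion is not
     sensitive to the particular choice of $m$ or $\nu$, because
     $\Omega$ grows exponentially with $N$.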
  84
  85 \begin{figure}
  86   \begin{center}
  87   \subfloat[][]{
  88     \begin{tikzpicture}[->,node distance=1.5cm]
  89       \tikzstyle{every state}=[draw=white]
  90       \node[state] (U)                 {$U$};
  91       \node[state] (I1)  [right of=U]  {$I_1$};
  92       \node[state] (I1X) [below of=I1] {$I_1^X$};
  93       \node[state] (I2)  [right of=I1] {$I_2$};
  94       \node[state] (I2X) [below of=I2] {$I_2^X$};
  95       \node[state] (N)   [right of=I2] {$N$};
  96
  97       \path[<->] (U)  edge (I1)
  98                  (I1) edge (I1X)
  99                  (I1) edge (I2)
 100                  (I2) edge (I2X)
 101                  (I2) edge (N);
 102     \end{tikzpicture}\label{fig:folding:pathway}}
 103   % \hspace{.25in}%
 104   \subfloat[][]{\includegraphics[width=2in]{figures/schematic/dill97-fig4}%
 105     \label{fig:folding:landscape}}
 106   \caption{\protect\subref{fig:folding:pathway} A ``double T'' example
 107     of the pathway model of protein folding, in which the protein
 108     proceeds from the unfolded state $U$ to the native state $N$ via a
 109     series of metastable intermediate states $I_1$ and $I_2$, with two
 110     ``dead end'' states $I_1^X$ and $I_2^X$.  Adapted from
 111     \citet{bedard08}.
 112     \protect\subref{fig:folding:landscape} The landscape model of
 113     protein folding, in which the protein diffuses through a
 114     multi-dimensional free energy landscape.  Separate folding
 115     attempts may take many distinct routes through this landscape on
 116     the way to the folded state.  Reproduced from
 117     \citet{dill97}.\label{fig:folding}}
 118   \end{center}
 119 \end{figure}
 120
 121 When the choice of theoretical approach becomes murky, you must gather
 122 experimental data to help distinguish between similar models.
 123 Separating the pathway model from the funnel model is only marginally
 124 within the realm of current experimental techniques, but with higher
 125 throughput and increased automation it should be easier to make such
 126 distinctions in the near future.
 127
 128
 129 \section{Why \emph{single} molecule?}
 130 \label{sec:single-molecule}
 131
 132 The large size of proteins relative to simpler molecules limits the
 133 information attainable from bulk measurements, because the
 134 macromolecules in a population can have diverse conformations and
 135 behaviors.  Bulk measurements average over these differences,
 136 producing excellent statistics for the mean, but making it difficult
 137 to understand the variation.  The individualized, and sometimes rare,
 138 behaviors of macromolecules can have important implications for their
 139 functions inside the cell.  Single molecule techniques, in which the
 140 macromolecules are studied one at a time, allow direct access to the
 141 variation within the population without averaging.  This provides
 142 important and complementary information about the functional
 143 mechanisms of several biological systems\citep{bustamante08}.
 144
 145 Single molecule techniques provide an opportunity to study protein
 146 folding and unfolding at the level of a single molecule, where the
 147 distinction between the pathway model and funnel model is clearer.
 148 They also provide a convenient benchmark for verifying molecular
 149 dynamics simulations, because it takes lots of computing power to
 150 simulate even one biopolymer with anything close to atomic resolution
 151 over experimental time scales.  Even with significant computing
 152 resources, comparing molecular dynamics results with experimental data
 153 remains difficult.  For example, experimental pulling speeds are on the
 154 order of \bareU{$\mu$m/s}, while simulation pulling speeds are on the
 155 order of \bareU{m/s}, roughly six orders of magnitude faster\citep{lu98,lu99,rief02,zhao06,berkemeier11}.
 156
 157 % why AFM & what an AFM is
 158 Single molecule techniques for studying biopolymers include
 159 optical measurements, \ie, single molecule fluorescence microscopy and
 160 spectroscopy, and mechanical manipulations of individual
 161 macromolecules, \ie, force microscopy and spectroscopy using atomic
 162 force microscopes (AFMs), laser tweezers\citep{kellermayer97,forde02},
 163 magnetic tweezers\citep{smith92}, biomembrane force
 164 probes\citep{merkel99}, and centrifugal
 165 microscopes\citep{halvorsen09}.  These techniques cover a wide range
 166 of approaches, and even when the basic approach is the same
 167 (e.g.\ force microscopy), the different techniques span orders of
 168 magnitude in the range of their controllable parameters.
 169 %
 170 \nomenclature[text ]{AFM}{Atomic force microscope (or microscopy).}
 171
 172 \section{Why \emph{un}folding?}
 173 \label{sec:unfolding}
 174
 175 There's a lot of talk about protein \emph{folding} in this chapter,
 176 while the rest of the thesis (and the title) is about
 177 \emph{unfolding}.  If you understand protein folding, you can use your
 178 understanding to design drugs with a particular conformation, or
 179 predict the conformation of a biologically important receptor
 180 (\cref{sec:folding-problem}).  Understanding protein unfolding is less
 181 directly useful, because unfolded proteins are rarely biologically
 182 relevant (although some are\citep{dyson05}).
 183
 184 The focus on unfolding is mainly because it's easier to unravel
 185 proteins by pulling on their ends (\cref{sec:procedure}) than it is to
 186 fold them into their native state by pushing on those ends
 187 (\cref{fig:ligand-receptor,fig:I27}).  For proteins with smooth enough
 188 energy landscapes, the folding and unfolding routes will be similar,
 189 so knowledge about the unfolding behavior \emph{does} shed light on
 190 the folding behavior.
 191
 192 Practically, the distinction between folding and unfolding makes
 193 little difference, as drug designers and doctors are not consuming
 194 SMFS results directly.  For researchers calibrating molecular dynamics
 195 simulations, it doesn't matter whether you compare simulated folding
 196 with measured folding, or simulated
 197 unfolding with measured unfolding.  The
 198 important thing is to compare your simulation against \emph{some}
 199 experimental benchmarks.  If your molecular dynamics simulation
 200 successfully predicts a protein's unfolding behavior, it makes me more
 201 confident that it will correctly predict the protein's native folding
 202 behavior.
 203
 204
 205 \section{Thesis outline}
 206 \label{sec:outline}
 207
 208 \Cref{sec:methods} of this thesis outlines the apparatus and methods
 209 for single molecule force spectroscopy with an atomic force
 210 microscope.  \Cref{sec:sawsim} presents my \sawsim\ Monte Carlo
 211 simulation for modeling unfolding/refolding behavior.  By comparing
 212 model simulations with experimental measurements, we can gain insight
 213 into the protein's kinetics.  After \cref{sec:sawsim}, you should have
 214 a pretty firm grasp of the underlying physics, so we'll move on to
 215 \cref{sec:pyafm} and discuss my \pyafm\ experiment control software.
 216 With both the kinetic theory and procedure taken care of,
 217 \cref{sec:calibcant} discusses thermal cantilever calibration,
 218 deriving the theoretical approach and presenting my
 219 \calibcant\ automatic calibration software.
 220
 221 Moving away from experiment control, \cref{sec:hooke} presents the
 222 \Hooke\ suite for extracting unfolding force histograms (for
 223 comparison with \sawsim\ simulations).  In \cref{sec:salt}, I pull all
 224 the pieces together (experiment control, post processing, and
 225 simulation) to carry out unfolding experiments on the
 226 immunoglobulin-like domain 27 from human titin (I27) in buffers with
 227 different ionic strengths.  We close with \cref{sec:future}, which
 228 summarizes my conclusions and discusses possible directions for future
 229 work.