
\chapter{Introduction}
\label{sec:intro}

Single molecule force spectroscopy (SMFS) is the study of folding and
unfolding transitions in proteins under tension.  By measuring these
transitions, we hope to gain insight into fundamental protein
behavior.  SMFS is an attempt to bridge the gap between chemists
studying folding and unfolding kinetics in bulk solutions and
theorists simulating protein behavior at the amino-acid level.  An
increased understanding of protein folding would guide researchers in
developing drugs targeting biologically significant receptors and
enzymes.  In this chapter, I describe the protein folding problem in a
general sense (\cref{sec:folding-problem}), discuss theoretical
frameworks for understanding protein folding
(\cref{sec:energy-landscape}), highlight the role of SMFS in extending
this understanding (\cref{sec:single-molecule}), and explain the role
of unfolding experiments in understanding protein folding
(\cref{sec:unfolding}).  The last section in this chapter gives a
roadmap for the rest of the thesis (\cref{sec:outline}).

\section{The Protein Folding Problem}
\label{sec:folding-problem}

% Why study protein folding?
In biological systems the most important molecules, such as proteins,
nucleic acids, and polysaccharides, are all polymers.  Understanding
the properties and functions of these polymeric molecules is crucial
to understanding the molecular mechanisms behind structures and
processes in cells.

% What do genes do?  Why is protein folding interesting?
An organism's genetic code is stored in DNA%
\nomenclature{DNA}{Deoxyribonucleic Acid}
in the cell nucleus.
DNA sequencing is a fairly well-developed field, with fundamental work
such as the Human Genome Project seeing major development in the early
2000s\citep{wolfsberg01,mcpherson01,collins03}.  It is estimated that
human genetic information contains approximately 25,000 genes, each
encoding a protein\citep{claverie01,venter01}.  Knowing the amino acid
sequence for a particular protein, however, does not immediately shed
light on the protein's role in the body, or even the protein's
probable conformation.  Indeed, a protein's conformation is often
vitally important in executing its biological tasks
(\cref{fig:ligand-receptor}).  Unfortunately, both predicting stable
conformations of a given amino acid sequence and the inverse problem
of finding sequences that form a given conformation have proven
remarkably difficult.

\begin{figure}
  \begin{center}
  \includegraphics[width=2in]{figures/biotin-streptavidin/1SWE.png}%
  \caption{Complex of biotin\index{biotin} (red) and a
    streptavidin\index{streptavidin} tetramer (green)
    (\href{http://dx.doi.org/10.2210/pdb1swe/pdb}{PDB ID: 1SWE})%
    \citep{freitag97}.  The correct streptavidin conformation creates
    the biotin-specific binding pockets.  Biotin-streptavidin is a
    model ligand-receptor pair isolated from the bacterium
    \species{Streptomyces avidinii}%
    \index{Streptomyces@\species{Streptomyces avidinii}}.  Streptavidin
    binds to cell surfaces, and bound biotin increases streptavidin's
    cell-binding affinity\citep{alon90}.  Figure generated with
    \citetalias{pymol}.
    \label{fig:ligand-receptor}}
  \end{center}
\end{figure}


\section{Protein Folding Energy Landscapes}
\label{sec:energy-landscape}

% the free energy landscape
Finding a protein's lowest energy state via a brute force sampling of
all possible conformations is impossibly inefficient, due to the
exponential scaling of possible conformations with protein length, as
outlined by \citet{levinthal69}.  This has led to a succession of
models explaining the folding mechanism.  For a number of years, the
``pathway'' model of protein folding enjoyed popularity
(\cref{fig:folding:pathway})\citep{levinthal69}.  More recently, the
``landscape'' or ``funnel'' model has come to the fore
(\cref{fig:folding:landscape})\citep{dill97}.  Both of these models
reduce the conformation space to a more approachable analog, and their
success depends on striking a useful balance between simplicity and
accuracy.
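
% Illustrative estimate only; k, n, and the sampling rate are assumed
% round numbers, not values taken from the cited references.
To get a feel for the exponential scaling mentioned above, consider a
rough back-of-the-envelope estimate.  The numbers here are assumptions
chosen for illustration, not values quoted from \citet{levinthal69}.
Suppose each of the $n$ residues in a chain can independently adopt
$k$ conformations, so the number of chain conformations is
\begin{equation}
  \Omega = k^n \;.
\end{equation}
For $k = 3$ and $n = 100$, this gives
$\Omega = 3^{100} \approx 5 \times 10^{47}$.  Even at an assumed
sampling rate of $10^{13}$ conformations per second, an exhaustive
search would take about $5 \times 10^{34}$ seconds, or roughly
$10^{27}$ years.  Many proteins fold on timescales of milliseconds to
seconds, so folding cannot proceed by an unbiased search of the full
conformation space.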

\begin{figure}
  \begin{center}
  \subfloat[][]{
    \begin{tikzpicture}[->,node distance=1.5cm]
      \tikzstyle{every state}=[draw=white]
      \node[state] (U)                 {$U$};
      \node[state] (I1)  [right of=U]  {$I_1$};
      \node[state] (I1X) [below of=I1] {$I_1^X$};
      \node[state] (I2)  [right of=I1] {$I_2$};
      \node[state] (I2X) [below of=I2] {$I_2^X$};
      \node[state] (N)   [right of=I2] {$N$};

      \path[<->] (U)  edge (I1)
                 (I1) edge (I1X)
                 (I1) edge (I2)
                 (I2) edge (I2X)
                 (I2) edge (N);
    \end{tikzpicture}\label{fig:folding:pathway}}
  % \hspace{.25in}%
  \subfloat[][]{\includegraphics[width=2in]{figures/schematic/dill97-fig4}%
    \label{fig:folding:landscape}}
  \caption{\protect\subref{fig:folding:pathway} A ``double T'' example
    of the pathway model of protein folding, in which the protein
    proceeds from the unfolded state $U$ to the native state $N$ via a
    series of metastable intermediate states $I_1$ and $I_2$ with two
    ``dead end'' states $I_1^X$ and $I_2^X$.  Adapted from
    \citet{bedard08}.
    \protect\subref{fig:folding:landscape} The landscape model of
    protein folding, in which the protein diffuses through a
    multi-dimensional free energy landscape.  Separate folding
    attempts may take many distinct routes through this landscape on
    the way to the folded state.  Reproduced from
    \citet{dill97}.\label{fig:folding}}
  \end{center}
\end{figure}

When the choice of theoretical approach becomes murky, you must gather
experimental data to help distinguish between similar models.
Distinguishing the pathway model from the funnel model is only
marginally within the reach of current experimental techniques, but
with higher throughput and increased automation it should become
easier to make such distinctions in the near future.


\section{Single Molecule Protein Folding Studies}
\label{sec:single-molecule}

The large size of proteins relative to simpler molecules limits the
information attainable from bulk measurements, because the
macromolecules in a population can have diverse conformations and
behaviors.  Bulk measurements average over these differences,
producing excellent statistics for the mean, but making it difficult
to understand the variation.  The individualized, and sometimes rare,
behaviors of macromolecules can have important implications for their
functions inside the cell.  Single molecule techniques, in which the
macromolecules are studied one at a time, allow direct access to the
variation within the population without averaging.  This provides
important and complementary information about the functional
mechanisms of several biological systems\citep{bustamante08}.

Single molecule techniques provide an opportunity to study protein
folding and unfolding at the level of a single molecule, where the
distinction between the pathway model and funnel model is clearer.
They also provide a convenient benchmark for verifying molecular
dynamics simulations, because it takes a great deal of computing power
to simulate even one biopolymer with anything close to atomic
resolution over experimental time scales.  Even with significant
computing resources, comparing molecular dynamics results directly
with experimental data remains difficult.  For example, experimental
pulling speeds are on the order of \bareU{$\mu$m/s}, while simulation
pulling speeds are on the order of \bareU{m/s}, roughly six orders of
magnitude faster\citep{lu98,lu99,rief02,zhao06,berkemeier11}.

% why AFM & what an AFM is
Single molecule techniques for studying biopolymers include
optical measurements, \ie, single molecule fluorescence microscopy and
spectroscopy, and mechanical manipulations of individual
macromolecules, \ie, force microscopy and spectroscopy using atomic
force microscopes (AFMs), laser tweezers\citep{kellermayer97,forde02},
magnetic tweezers\citep{smith92}, biomembrane force
probes\citep{merkel99}, and centrifugal
microscopes\citep{halvorsen09}.  These techniques cover a wide range
of approaches, and even when the basic approach is the same
(e.g.\ force microscopy), the different techniques span orders of
magnitude in the range of their controllable parameters.
\nomenclature{AFM}{Atomic Force Microscope (or Microscopy)}

\section{Why \emph{unfolding?}}
\label{sec:unfolding}

There's a lot of talk about protein \emph{folding} in this chapter,
while the rest of the thesis (and the title) are about
\emph{unfolding}.  If you understand protein folding, you can use your
understanding to design drugs with a particular conformation, or
predict the conformation of a biologically important receptor
(\cref{sec:folding-problem}).  Understanding protein unfolding is less
directly useful, because unfolded proteins are rarely biologically
relevant (although it does happen\cref{TODO}).

The focus on unfolding is mainly because it's easier to unravel
proteins by pulling on their ends (\cref{sec:procedure}) than it is to
fold them into their native state by pushing on those ends
(\cref{fig:ligand-receptor,fig:I27}).  For proteins with smooth enough
energy landscapes, the folding and unfolding routes will be similar,
so knowledge about the unfolding behavior \emph{does} shed light on
the folding behavior.

Practically, the distinction between folding and unfolding makes
little difference, as drug designers and doctors are not consuming
SMFS results directly.  For researchers calibrating molecular dynamics
simulations, it doesn't matter if you compare simulated folding
with experimental folding, or simulated unfolding with experimental
unfolding.  The
important thing is to compare your simulation against \emph{some}
experimental benchmarks.  If your molecular dynamics simulation
successfully predicts a protein's unfolding behavior, it makes me more
confident that it will correctly predict the protein's native folding
behavior.


\section{Thesis Outline}
\label{sec:outline}

TODO: fill in once structure has stabilized

%\Cref{sec:unfolding} of this thesis discusses the theory of protein
%unfolding for single domains.  \Cref{sec:tension} discusses linker
%tension modeling.  \Cref{sec:unfolding-distributions} pulls
%\cref{sec:unfolding,sec:tension} together to discuss the theory of
%mechanical unfolding experiments.  This theory makes straightforward
%analysis of unfolding results difficult, so \cref{sec:sawsim} presents
%a Monte Carlo simulation approach to fitting unfolding parameters, and
%\cref{sec:contour-space} presents the contour-length space analysis
%for converting force curves to unfolding pathway fingerprints.
%\Cref{sec:temperature-theory} wraps up the theory section by extending
%the analysis in \cref{sec:unfolding,sec:unfolding-distributions} to
%multiple temperatures.
%
%\Cref{sec:apparatus} describes our experimental apparatus and methods,
%as well as calibration procedures.  With both the theory and procedure
%taken care of, \cref{sec:cantilever,sec:temperature}
%present and analyze AFM cantilever- and temperature-dependent
%unfolding behavior of the immunoglobulin-like domain 27 from human
%Titin (I27).
%
%We close with \cref{sec:future}, which presents our conclusions and
%discusses possible directions for future work.