1 Over the last few days I've been trying to teach myself enough
2 genetics to reconstruct [Carrion-Vazquez's poly-I27 synthesis
3 procedure][cv99]. I'm not quite there yet, but I feel like I've made
4 enough progress that it's worth posting my notes somewhere public in
5 case they are useful to others.
10 We buy our poly-I27 from [AthenaES][], who market it as [I27O™][I27O].
Their [technical brief][I27O-tb] makes it clear that I27O™
corresponds to Carrion-Vazquez's I27<sup>RS</sup>₈. In
[their original paper][cv99], Carrion-Vazquez et al. describe the
synthesis of both I27<sup>RS</sup>₈ and a variant,
I27<sup>GLG</sup>₁₂. Their I27<sup>RS</sup>₈ procedure is:
* Human cardiac muscle was used to generate a [cDNA][] library ([Rief 1997][r97]).
* The cDNA library was amplified with [PCR][].
    - The 5' primer contained a BamHI restriction site that permitted
      in-frame cloning of the monomer into the expression vector pQE30.
    - The 3' primer contained a BglII restriction site, two Cys codons
      located 3' to the BglII site and in-frame with the I27 domain,
      and two in-frame stop codons.
* The PCR product was cloned into pUC19 linearized with BamHI and SmaI.
* The 8-domain synthetic gene was constructed by iterative cloning
  of monomer into monomer, dimer into dimer, and tetramer into
  tetramer.
* The final construct contained eight direct repeats of the I27
  domain, an amino-terminal His tag for purification, and two
  carboxyl-terminal Cys codons used for covalent attachment to the
  gold-covered coverslips.
33 They also give the full-length sequence of I27<sup>RS</sup>₈:
35 Met-Arg-Gly-Ser-(His)₆-Gly-Ser-(I27-Arg-Ser)₇-I27-...-Cys-Cys
37 They point out the Arg-Ser (RS) amino acid sequence is the BglII/BamHI
38 hybrid site, [which makes sense](#BglII-BamHI-joint).
40 Back on the Athena site, they have a [page describing their
41 procedure][I27O-syn] (they reference the Carrion-Vazquez paper). They
42 claim to use the restriction enzyme KpnI in addition to BamHI, BglII,
45 Carrion-Vazquez points to the following references:
47 * [Kempe et al. 1985][k85] (CV16), the source of the multi-step cloning technique.
48 * [Rief et al.][r97] (CV10), for I27 subcloning.
53 In their note 11, Rief et al. explain their synthesis procedure:
56 * Titin fragments of interest were amplified by PCR
58 * NH₂-terminal domain boundaries were as in [Politou 1996][p96].
59 * The clones were fused with an NH₂-terminal His₆ tag and a
60 COOH-terminal Cys₂ tag for immobilization on solid surfaces.
62 which doesn't help me very much.
The [Kempe article][k85] is more informative, focusing entirely on the
synthesis procedure (albeit for a different gene). Their figure 2
outlines the general approach, which uses two parallel digests: PstI
with BamHI in one, and PstI with BglII in the other. I'll walk through
their procedure in detail below.
76 Wikipedia has a good page on the [genetic code][gcode] for converting
77 between DNA/mRNA codons and amino acids. I've written up a little
78 [[Python]] script, [[mRNAcode.py]], to automate the conversion of
79 various sequences, which helped me while I was writing this post. I'm
80 sure there are tons of similar programs out there, so don't feel
81 pressured to use mine ;).
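
If you'd rather not track down my script, the conversion is simple
enough to sketch in a few lines. This is a minimal illustration, not
the contents of `mRNAcode.py`: the names `translate` and `complement`
are made up for this example, and it assumes the standard genetic code
and a DNA coding strand read in frame from its first base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "STOP",
}


def translate(dna):
    """Translate an in-frame coding strand into three-letter amino acid codes."""
    dna = "".join(dna.split()).upper()  # drop the spaces between codons
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "-".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


def complement(dna):
    """Return the base-paired strand, written in the same left-to-right order."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))


print(translate("ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG"))
print(complement("ATG CGT CCG AAG"))
```

The first call translates the SP coding sequence from the Kempe
article discussed below. `complement` keeps the left-to-right order of
its input, which is how the two-strand diagrams in this post are laid
out (top strand 5'→3', bottom strand 3'→5').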
86 We'll use the following [restriction enzymes][renz]:
93 [BglI][] (N is any nucleotide)
126 Here's my attempt to reconstruct the details of the polymer-cloning
127 reactions, where they splice several copies of I27 into the expression
133 Inserted their poly-SP into pHK414 (I haven't been able to find any
134 online sources for pHK414. Kempe cites [R.J. Watson et al.
135 *Expression of Herpes simplex virus type 1 and type 2 glyco-protein D
136 genes using the Escherichia coli lac promoter.* Y. Becker (Ed.),
137 *Recombinant DNA Research and Viruses.* Nijhoff, The Hague, 1985,
143 | | Met Arg Pro Lys Pro Gln Gln Phe Phe Gly Leu Met |
144 5’ GA AGC TTC ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG GAT CCG
145 CT TCG AAG TAC GCA GGC TTC GGC GTC GTC AAG AAG CCA GAG TAC CTA GGC 5’
149 _______Linker_sequence______
152 ,PstI. BglII.| |,SmaI. |
153 CTGCAG...AGATCTAAGCTTCCCGGGGATCCAAGATCC
154 GACGTC...TCTAGATTCGAAGGGCCCCTAGGTTCTAGG
156 .......................................
158 ### Synthesizing pSP4-1
160 #### pHK414 + HindIII + BamHI
162 They cut a hole in the plasmid…
166 CTGCAG...AGATCTA GATCCAAGATCC
167 GACGTC...TCTAGATTCGA GTTCTAGG
169 .......................................
171 #### SP + HindIII + BamHI
173 … and cut matching snips off their SP gene.
176 | | Met Arg Pro Lys Pro Gln Gln Phe Phe Gly Leu Met |
177 AGC TTC ATG CGT CCG AAG CCG CAG CAG TTC TTC GGT CTC ATG
178 AG TAC GCA GGC TTC GGC GTC GTC AAG AAG CCA GAG TAC CTA G
182 Mixing the snips together gives the plasmid with a single SP.
185 ,PstI. BglII.| | MetArgProLysProGlnGlnPhePheGlyLeuMet |
186 CTGCAG...AGATCTAAGCTTCATGCGTCCGAAGCCGCAGCAGTTCTTCGGTCTCATGGATCCAAGATCC
187 GACGTC...TCTAGATTCGAAGTACGCAGGCTTCGGCGTCGTCAAGAAGCCAGAGTACCTAGGTTCTAGG
189 ......................................................................
Using `-SP-` to abbreviate the HindIII→Met→Met portion (less the
terminal G, which is part of the BamHI match sequence), this becomes:
195 CTGCAG...AGATCT-SP-GGATCC
196 GACGTC...TCTAGA-SP-CCTAGG
198 .........................
200 ### Synthesizing pSP4-2
202 The single-SP plasmid, pSP4-1, is split in two parallel reactions.
207 ACGTC...TCTAGA-SP-CCTAG
211 CTGCA GATCT-SP-GGATCC
214 .........................
218 Then the SP-containing fragments (shown above) are isolated and mixed
219 together to form pSP4-2.
221 ,PstI. BglII. other. BamHI.
222 CTGCAG...AGATCT-SP-GGATCT-SP-GGATCC
223 GACGTC...TCTAGA-SP-CCTAGA-SP-CCTAGG
225 ...................................
227 where the "other" sequence is the result of the BamHI/BglII splice.
228 Expanding the `-SP-` abbreviation around the SP joint:
    ....SP,other_.HindIII. SP.....
    Leu Met Asp Leu Ser Phe Met Arg
    CTC ATG GAT CTA AGC TTC ATG CGT
    GAG TAC CTA GAT TCG AAG TAC GCA
So the resulting poly-SP will have Asp-Leu-Ser-Phe linking amino
acids between repeats.
238 By repeating the PstI + BamHI / PstI + BglII split-and-join, you can
239 synthesize plasmids with any number of SP repeats.
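
To convince myself that the split-and-join really can be repeated,
here is a toy model of the insert alone. It ignores the plasmid
backbone and the PstI cut, treats `-SP-` as an opaque token, and uses a
made-up helper, `join_inserts`, to mimic ligating a BamHI-cut end onto
a BglII-cut end. It is a sketch of the bookkeeping, not of the
chemistry.

```python
BAMHI = "GGATCC"  # cuts G^GATCC
BGLII = "AGATCT"  # cuts A^GATCT


def join_inserts(left, right):
    """Ligate the BamHI-cut end of `left` onto the BglII-cut end of `right`."""
    assert left.endswith(BAMHI) and right.startswith(BGLII)
    # BamHI leaves ...G on the left fragment; BglII leaves GATCT... on the
    # right fragment. Together they form the hybrid GGATCT site.
    return left[:-len("GATCC")] + right[len("A"):]


def poly_sp(rounds):
    """Start from the pSP4-1 insert and double it `rounds` times."""
    insert = BGLII + "-SP-" + BAMHI
    for _ in range(rounds):
        insert = join_inserts(insert, insert)
    return insert


for rounds in range(4):
    insert = poly_sp(rounds)
    # repeats, remaining BglII sites, remaining BamHI sites
    print(insert.count("-SP-"), insert.count(BGLII), insert.count(BAMHI))

print(poly_sp(1))
```

Each round doubles the repeat count (1, 2, 4, 8) while leaving exactly
one BglII site at the front and one BamHI site at the back, because the
hybrid GGATCT joint is recognized by neither enzyme. Joining inserts of
different sizes, for example `join_inserts(poly_sp(1), poly_sp(0))`,
gives the in-between repeat counts.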
241 I27<sup>RS</sup>₈ procedure
242 ---------------------------
244 Like Kempe, Carrion-Vazquez et al. flank the I27 gene with BglII and
245 BamHI, but they reverse the order. Here's the output of their PCR:
247 BamHI-I27-BglII-Cys-Cys-STOP-STOP
249 From the PDB entry for I27 ([1TIT][]), the amino acid sequence is:
252 MHHHHHHSSLIEVEKPLYGVEVFVGETAHFEIELSEPDVHGQWKLKGQPLTASPDCEIIEDGKKHILI
253 LHNCQLGMTGEVSFQAANAKSAANLKVKEL
To translate this into cDNA, I've scanned through the sequence of
[NM_003319.4][], and found a close match from nucleotides 15991
259 15982 CTAATAAAAG TGGAAAAGCC TCTGTACGGA GTAGAGGTGT TTGTTGGTGA
260 16032 AACAGCCCAC TTTGAAATTG AACTTTCTGA ACCTGATGTT CACGGCCAGT
261 16082 GGAAGCTGAA AGGACAGCCT TTGACAGCTT CCCCTGACTG TGAAATCATT
262 16132 GAGGATGGAA AGAAGCATAT TCTGATCCTT CATAACTGTC AGCTGGGTAT
263 16182 GACAGGAGAG GTTTCCTTCC AGGCTGCTAA TGCCAAATCT GCAGCCAATC
264 16232 TGAAAGTGAA AGAATTG
This cDNA match translates to an amino acid sequence starting with
LIKVEK instead of the expected LIEVEK, but the LIKVEK version matches
amino acids 12677-12765 in [Q8WZ42][] (canonical titin), and there is
a natural variant listed for [12679 K→E][var].
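
The single-residue difference can be confirmed with a short script.
The sketch below translates the cDNA stretch quoted above with the
standard genetic code and compares it, residue by residue, with the
1TIT sequence. The leading `MHHHHHHSS` of the PDB entry is assumed to
be tag rather than titin and is skipped; the helper names are invented
for this example.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

CDNA = (
    "CTAATAAAAGTGGAAAAGCCTCTGTACGGAGTAGAGGTGTTTGTTGGTGA"
    "AACAGCCCACTTTGAAATTGAACTTTCTGAACCTGATGTTCACGGCCAGT"
    "GGAAGCTGAAAGGACAGCCTTTGACAGCTTCCCCTGACTGTGAAATCATT"
    "GAGGATGGAAAGAAGCATATTCTGATCCTTCATAACTGTCAGCTGGGTAT"
    "GACAGGAGAGGTTTCCTTCCAGGCTGCTAATGCCAAATCTGCAGCCAATC"
    "TGAAAGTGAAAGAATTG"
)
PDB_1TIT = (
    "MHHHHHHSSLIEVEKPLYGVEVFVGETAHFEIELSEPDVHGQWKLKGQPLTASPDCEIIEDGKKHILI"
    "LHNCQLGMTGEVSFQAANAKSAANLKVKEL"
)
TAG = "MHHHHHHSS"       # assumed non-titin prefix of the PDB sequence
FIRST_RESIDUE = 12677   # Q8WZ42 numbering of the first translated residue


def translate(dna):
    """One-letter translation of an in-frame coding strand."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def mismatches(a, b, first_position):
    """List (position, residue in a, residue in b) wherever a and b differ."""
    assert len(a) == len(b)
    return [(first_position + i, x, y)
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


protein = translate(CDNA)
print(len(protein), mismatches(protein, PDB_1TIT[len(TAG):], FIRST_RESIDUE))
```

The only reported difference is at residue 12679, K in the cDNA and E
in the PDB entry, which is the listed natural variant.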
Interestingly, this sequence contains a PstI site at nucleotides 16220
through 16225. None of our other restriction enzymes have sites in
this sequence.
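
The PstI position is easy to check mechanically. The sketch below
scans the cDNA stretch quoted above for the recognition sequences of
the enzymes mentioned in this post. The `SITES` table and `find_sites`
helper are names I made up for the example. All seven recognition
sequences are palindromic, so scanning one strand is enough.

```python
import re

# Recognition sequences (N is any nucleotide).
SITES = {
    "BamHI": "GGATCC",
    "BglI": "GCCNNNNNGGC",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
}

CDNA = (
    "CTAATAAAAGTGGAAAAGCCTCTGTACGGAGTAGAGGTGTTTGTTGGTGA"
    "AACAGCCCACTTTGAAATTGAACTTTCTGAACCTGATGTTCACGGCCAGT"
    "GGAAGCTGAAAGGACAGCCTTTGACAGCTTCCCCTGACTGTGAAATCATT"
    "GAGGATGGAAAGAAGCATATTCTGATCCTTCATAACTGTCAGCTGGGTAT"
    "GACAGGAGAGGTTTCCTTCCAGGCTGCTAATGCCAAATCTGCAGCCAATC"
    "TGAAAGTGAAAGAATTG"
)
FIRST_NUCLEOTIDE = 15982  # position of the first base in the listing above


def find_sites(dna, first_position=1):
    """Map enzyme name to a list of (first, last) positions of each site."""
    hits = {}
    for name, site in SITES.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
        found = [(m.start() + first_position,
                  m.start() + first_position + len(site) - 1)
                 for m in pattern.finditer(dna)]
        if found:
            hits[name] = found
    return hits


print(find_sites(CDNA, FIRST_NUCLEOTIDE))
```

For this stretch the only hit is PstI at 16220 through 16225.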
275 Carrion-Vazquez et al. list two vectors in their procedure, but I'm
276 not sure about their respective roles.
280 [pQE30][pQE30-a] ([sequence][pQE30-b]) is listed as the "expression
281 vector", but I'm not sure why they would need a non-expression vector,
282 as they don't reference cross-vector subcloning after inserting their
283 I27 monomer into the plasmid.
285 From the [Qiagen site][pQE30-b], the section around the linker
286 nucleotides 115 through 203 is:
        ,RGS-His epitope__________________. ,BamHI.
    Met Arg Gly Ser His His His His His His Gly Ser Ala Cys Glu Leu
    ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC GCA TGC GAG CTC
    TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG CGT ACG CTC GAG

    Gly Thr Pro Gly Arg Pro Ala Ala Lys Leu Asn STOP
    GGT ACC CCG GGT CGA CCT GCA GCC AAG CTT AAT TAG CTG AG
    CCA TGG GGC CCA GCT GGA CGT CGG TTC GAA TTA ATC GAC TC
However, there is no BglII site in this linker. In fact, there is no
BglII site in the entire pQE30 plasmid, so they'd need to use a third
restriction enzyme to insert their I27 (which does contain a trailing
306 From [BCCM/LMBP][pUC19-a] and [GenBank][pUC19-b], the section around
307 the linker nucleotides 233 through 289 is:
310 HindIII. ,PstI__. ,BamHI_. ,KpnI__.
312 AA GCT TGC ATG CCT GCA GGT CGA CTC TAG AGG ATC CCC GGG TAC CGA
However, there is no BglII site in the entire pUC19 plasmid either, so
they'd need to use a third restriction enzyme to insert their I27.
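
As a sanity check on the linker itself, the sketch below locates the
sites in the pUC19 fragment quoted above and reports which neighboring
sites share bases. Positions are relative to the start of that
fragment, not to the plasmid, and the function names are invented for
this example. It says nothing about the rest of pUC19.

```python
LINKER = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGA"

SITES = {
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "BamHI": "GGATCC",
    "SmaI": "CCCGGG",
    "KpnI": "GGTACC",
    "BglII": "AGATCT",
}


def site_spans(dna):
    """Sorted (first, last, enzyme) tuples, 1-based, for every site found."""
    spans = []
    for name, site in SITES.items():
        start = dna.find(site)
        while start != -1:
            spans.append((start + 1, start + len(site), name))
            start = dna.find(site, start + 1)
    return sorted(spans)


def overlaps(spans):
    """(enzyme, next enzyme, shared bases) for adjacent sites that overlap."""
    return [(a[2], b[2], a[1] - b[0] + 1)
            for a, b in zip(spans, spans[1:]) if b[0] <= a[1]]


spans = site_spans(LINKER)
print(spans)
print(overlaps(spans))
```

In this fragment BamHI and SmaI share one base and SmaI and KpnI share
two, and there is no BglII site.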
321 1. Why do Carrion-Vazquez et al. list two different plasmids?
2. What is the 3'-side restriction enzyme that Carrion-Vazquez et
   al. use to insert their I27 into their plasmid?
3. What is the remote restriction enzyme that Carrion-Vazquez et
   al. use to break their opened plasmids (Kempe PstI equivalent)?
326 4. The BamHI and SmaI sites in pUC19 overlap, so it is unclear how you
327 could use both to "linearize" pUC19. It would seem that either one
328 would open the plasmid on its own, although I'm not sure you could
329 "heal" the blunt-ended SmaI cut.
5. Since the Arg-Ser joint is formed by a BglII/BamHI overlap, why are
   there no BglII-coded amino acids after the last I27 in the I27<sup>RS</sup>₈
   sequence? If there are, why do Carrion-Vazquez et al. not
   acknowledge them when they write [3]:
335 > The full-length construct, I27<sup>RS</sup>₈, results in the
336 > following amino acid additions: (i) the amino-terminal sequence is
337 > Met-Arg-Gly-Ser-(His)6-Gly-Ser-I27 codons; (ii) the junction
338 > between the domains (BamHI-BglII hybrid site) is Arg-Ser; and
339 > (iii) the protein terminates in Cys-Cys.
341 Since they don't acknowledge an I27-Arg-Ser-Cys-Cys ending, might
342 there be more amino acids in the C terminal addition?
346 Since I'm stuck trying to get I27 into either plasmid, let's try and
349 Met-Arg-Gly-Ser-(His)₆-Gly-Ser-(I27-Arg-Ser)₇-I27-...-Cys-Cys
351 #### <a id="BglII-BamHI-joint">BglII/BamHI joint</a>
353 The BglII/BamHI overlap would produce the expected Arg-Ser joint.
356 A + GATCC = AGATCC = Arg-Ser
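
Spelled out as a tiny script, with a codon table limited to the four
codons involved (the variable names are just for illustration):

```python
BGLII = "AGATCT"  # cuts A^GATCT
BAMHI = "GGATCC"  # cuts G^GATCC

# Only the codons needed for these three sites.
CODONS = {"AGA": "Arg", "GGA": "Gly", "TCC": "Ser", "TCT": "Ser"}


def read_site(site):
    """Translate a 6-base site read in frame as two codons."""
    return "-".join(CODONS[site[i:i + 3]] for i in (0, 3))


upstream_end = BGLII[:1]     # "A": left on the upstream I27 by BglII
downstream_start = BAMHI[1:]  # "GATCC": left on the downstream I27 by BamHI
joint = upstream_end + downstream_start

for name, site in (("BglII", BGLII), ("BamHI", BAMHI), ("joint", joint)):
    print(name, site, read_site(site))

# The hybrid site matches neither parent, so neither enzyme cuts it again.
print(joint in (BGLII, BAMHI))
```

An intact BglII site also reads as Arg-Ser and an intact BamHI site as
Gly-Ser (the Gly-Ser after the His tag in pQE30), so only the
BglII-first orientation of the hybrid gives the Arg-Ser joint.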
359 #### Final plasmid (pI27-8)
361 The beginning of this sequence looks like the start of pQE30's linker,
362 so we'll assume the final plasmid was:
    remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
    ???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
    ???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...

    ........I27 joint_. I27 ... final I27 ,BglII. continuation of pQE30?
    ... Glu Leu         Leu ...       Leu Arg Ser Cys Cys STOPSTOP...
    ... GAA TTG AGA TCC CTA ...       TTG AGA TCT TGC TGC TAG TAG ...
    ... CTT AAC TCT AGG GAT ...       AAC TCT AGA ACG ACG ATC ATC ...
374 #### Penultimate plasmid (pI27-4)
    remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
    ???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
    ???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...

    ........I27 joint_. I27 ... fourth I27 ,BglII. continuation of pQE30?
    ... Glu Leu         Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
    ... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
    ... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
386 ##### pI27-4 + BamHI + remote
388 remote ,BamHI. I27...
    ........I27 joint_. I27 ... fourth I27 ,BglII. continuation of pQE30?
    ... Glu Leu         Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
    ... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
    ... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
398 ##### pI27-4 + BglII + remote
    remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
    ??     ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
    ?      ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...

    ........I27 joint_. I27 ... fourth I27 ,BglII.
    ... Glu Leu         Leu ...        Leu
    ... GAA TTG AGA TCC CTA ...        TTG A
    ... CTT AAC TCT AGG GAT ...        AAC TCT AG
    remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
    ???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
    ???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...

    ........I27 joint_. I27 ... fourth I27 ,other. I27...
    ... Glu Leu         Leu ...        Leu Arg Ser Leu Ile ...
    ... GAA TTG AGA TCC CTA ...        TTG AGA TCC CTA ATA ...
    ... CTT AAC TCT AGG GAT ...        AAC TCT AGG GAT TAT ...

    ........I27 joint_. I27 ... fourth I27 ,BglII. continuation of pQE30?
    ... Glu Leu         Leu ...        Leu Arg Ser Cys Cys STOPSTOP...
    ... GAA TTG AGA TCC CTA ...        TTG AGA TCT TGC TGC TAG TAG ...
    ... CTT AAC TCT AGG GAT ...        AAC TCT AGA ACG ACG ATC ATC ...
427 #### Continuing to the first plasmid, pI27-1 must have been
    remote ...     ,RGS-His epitope__________________. ,BamHI. I27...
           ... Met Arg Gly Ser His His His His His His Gly Ser Leu Ile ...
    ???    ... ATG AGA GGA TCG CAT CAC CAT CAC CAT CAC GGA TCC CTA ATA ...
    ???    ... TAC TCT CCT AGC GTA GTG GTA GTG GTA GTG CCT AGG GAT TAT ...

    ........I27 ,BglII. continuation of pQE30?
    ... Glu Leu Arg Ser Cys Cys STOPSTOP...
    ... GAA TTG AGA TCT TGC TGC TAG TAG ...
    ... CTT AAC TCT AGA ACG ACG ATC ATC ...
439 ### Potential pQE30 insertion points
* KpnI (present after BamHI in both plasmids)
443 ### Potential remote restriction enzymes
445 * BglI (pQE30 nucleotides 2583-2593 (GCCGGAAGGGC), Amp-resistance
446 3256-2396; pUC19 has two BglI sites (bad idea))
449 [cv99]: http://dx.doi.org/10.1073/pnas.96.7.3694
450 [r97]: http://dx.doi.org/10.1126/science.276.5315.1109
451 [PCR]: http://en.wikipedia.org/wiki/Polymerase_chain_reaction
452 [cDNA]: http://en.wikipedia.org/wiki/Complementary_DNA
453 [λ]: http://en.wikipedia.org/wiki/Lambda_phage
454 [AthenaES]: http://www.athenaes.com/
455 [I27O]: http://www.athenaes.com/I27OAFMReferenceProtein.php
456 [I27O-tb]: http://www.athenaes.com/tech_brief_I27O_protein.php
457 [I27O-syn]: http://www.athenaes.com/Projects_Polyproteins.php
458 [k85]: http://dx.doi.org/10.1016/0378-1119(85)90318-X
459 [p96]: http://dx.doi.org/10.1006/jmbi.1996.0050
460 [gcode]: http://en.wikipedia.org/wiki/Genetic_code
461 [renz]: http://en.wikipedia.org/wiki/Restriction_enzyme
462 [BamHI]: http://en.wikipedia.org/wiki/BamHI
463 [BglI]: http://en.wikipedia.org/wiki/List_of_restriction_enzyme_cutting_sites:_Bd-Bp#Bd_-_Bp
464 [BglII]: http://en.wikipedia.org/wiki/BglII
465 [HindIII]: http://en.wikipedia.org/wiki/HindIII
466 [KpnI]: http://en.wikipedia.org/wiki/List_of_restriction_enzyme_cutting_sites:_G-K#K
467 [PstI]: http://en.wikipedia.org/wiki/PstI
468 [SmaI]: http://en.wikipedia.org/wiki/List_of_restriction_enzyme_cutting_sites:_S#S
469 [w85]: http://books.google.com/books?id=eA6iSmR0I4wC
470 [1TIT]: http://www.pdb.org/pdb/explore/explore.do?structureId=1TIT
471 [NM_003319.4]: http://www.ncbi.nlm.nih.gov/nuccore/NM_003319
472 [Q8WZ42]: http://www.uniprot.org/blast/?about=Q8WZ42[12677-12765]
473 [var]: http://web.expasy.org/cgi-bin/variant_pages/get-sprot-variant.pl?VAR_040140
474 [pQE30-a]: http://www.qiagen.com/literature/vectors_pqe.aspx
475 [pQE30-b]: http://www.qiagen.com/literature/pqesequences/pqe-30w.txt
476 [pUC19-a]: http://bccm.belspo.be/db/lmbp_plasmid_details.php?NM=pUC19
477 [pUC19-b]: http://www.ncbi.nlm.nih.gov/nucleotide/M77789?report=genbank