Plant Physiol, December 2000, Vol. 124, pp. 1483-1492

A Simple Procedure for the Analysis of Single Nucleotide Polymorphisms Facilitates Map-Based Cloning in Arabidopsis¹

Department of Genetics, Harvard Medical School, and Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114 (E.D., B.G.R., N.A.A., F.M.A.); Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142 (S.R.); Harvard College, Cambridge, Massachusetts 02138 (L.M.S.); and Departments of Biochemistry (M.M., R.J.C., P.J.O., R.W.D.) and Genetics (R.W.D.), Stanford University School of Medicine, Stanford, California 94305
We developed a modified allele-specific PCR procedure for assaying single nucleotide polymorphisms (SNPs) and used the procedure (called SNAP for single-nucleotide amplified polymorphisms) to generate 62 Arabidopsis mapping markers. SNAP primers contain a single base pair mismatch within three nucleotides from the 3' end of one allele (the specific allele) and in addition have a 3' mismatch with the nonspecific allele. A computer program called SNAPER was used to facilitate the design of primers that generate at least a 1,000-fold difference in the quantity of the amplification products from the specific and nonspecific SNP alleles. Because SNAP markers can be readily assayed by electrophoresis on standard agarose gels and because a public database of over 25,000 SNPs is available between the Arabidopsis Columbia and Landsberg erecta ecotypes, the SNAP method greatly facilitates the map-based cloning of Arabidopsis genes defined by a mutant phenotype.
Map-based positional cloning in
Arabidopsis is a standard but, until recently, time-consuming and
expensive procedure for the isolation of genes defined by mutation. The
main obstacle encountered in map-based cloning approaches is the
insufficient number of PCR-based molecular markers available to perform
fine-structure mapping. However, public accessibility to the complete
and annotated sequence of the Arabidopsis genome should greatly
facilitate map-based cloning, because sequence information will provide
the tools necessary for the creation of new molecular markers (Lukowitz
et al., 2000

Based on current estimates, the number of InDels between the Columbia and Landsberg erecta ecotypes is approximately 21,000, an average of one InDel every 6.1 kb (S. Rounsley, personal communication). InDel polymorphisms found in the Columbia and Landsberg erecta ecotypes are frequently polymorphic in other Arabidopsis accessions. Therefore, it should be possible to use the Columbia/Landsberg InDel markers to find a polymorphism in a particular chromosomal region in any pair of accessions (G. Jander, personal communication). InDels are easily detected by amplifying a small region of the genome containing the insertion/deletion element and determining the size of the amplification product. The main drawback of InDel markers is that the differences in fragment lengths of the insertion/deletion elements are sometimes very small, so discriminating the PCR products corresponding to particular InDel alleles can require methods more sophisticated than standard agarose gel electrophoresis (e.g. denaturing PAGE).

SNPs comprise the largest set of sequence variants in most organisms,
including Arabidopsis (Cho et al., 1999

In this paper, we describe the use of a modified allele-specific PCR procedure for assaying SNPs that is sufficiently robust to easily discriminate between the specific and nonspecific alleles. This new procedure facilitates the rapid and reliable creation and analysis of large numbers of molecular markers using simple methods common to most molecular biology laboratories.
Allele-Specific PCR Strategy

Allele-specific PCR, the basis for the PCR-based mapping strategy
described in this paper, is illustrated in Figure 1. The technique uses
primers with a 3'-end mismatch against one allele, which allows
preferential amplification of the other allele because the primer is
fully complementary only to that allele at the site of a DNA sequence
variation (SNP; Ugozzoli and Wallace, 1991
To overcome this problem, we used a modification of the original
allele-specific PCR methodology, in which an additional base pair
change is introduced within the last four bases of the primer (Newton
et al., 1989

Design of Allele-Specific Primers

The main obstacle to the application of the SNAP procedure in the generation of molecular markers is the determination of which additional mismatches to introduce to obtain the required primer specificity. To facilitate the process of primer design, a computer program was written based on a set of empirical data on how different additional mismatches affect PCR amplification (Drenkard et al., article in preparation; http://patho.mgh.harvard.edu/ausubelweb). The program, called SNAPER, generates a list of up to 32 possible primers per SNP site (16 alternatives for each allele) that contain an additional mismatch within the three bases closest to the 3' end. Along with the allele-specific primers for the SNP site, the program also generates a second (reverse) primer that contains no mismatches. For each primer, the program reports the likelihood, predicted from the empirical data, that the primer will be allele-specific, along with the position and type of base pair change introduced to generate the additional mismatch.

Because the work described in this paper was initiated well before the
release of the Cereon SNP database, we based the design of our initial
SNAP primers on a set of 487 loci containing SNPs between the Columbia
and Landsberg erecta ecotypes identified by Cho et al.
(1999)

Testing of Allele-Specific Primer Sets

During previous work, we determined experimental conditions that would assess primer specificity given the range of template DNA concentrations typically encountered in map-based cloning. We found that primers that showed specificity in two sets of PCR reactions that differed by 10 PCR cycles were specific over a 1,000-fold range of template DNA concentration. Therefore, we could test primer specificity by running two identical sets of PCR reactions per primer pair, in which the PCR reactions differ only in the number of cycles used during the amplification process (Drenkard et al., article in preparation). We verified experimentally that primer pairs displaying the presence of a PCR product on agarose gels for the specific allele and the absence of a PCR product for the nonspecific allele in both sets of PCR reactions were suitable for use as molecular markers (E. Drenkard, S. Rozen, M. Mindrinos, B.G. Richter, and F.M. Ausubel, unpublished data).

A total of 331 SNAP primer pairs corresponding to 43 SNPs were tested using both 28 and 38 cycles of PCR amplification. From the list of primers generated by the SNAPER program for each SNP, we selected and tested approximately 8 primer pairs per allele, which represented approximately 16 primer pairs per marker (for some SNPs fewer alternatives were generated by the program). The products amplified by PCR were analyzed by agarose gel electrophoresis, and the presence or absence of bands in both sets of reactions was scored for each primer pair.

Figure 2 shows the results obtained with primer pairs 1-4, 1-6, 4-3, and 2-6 using 28 and 38 cycles of PCR amplification. Primer pairs 1-4 (Landsberg-specific) and 1-6 (Columbia-specific) generated an amplification product for the specific allele and no amplification product for the nonspecific allele at both 28 and 38 cycles, indicating that the primers have the required specificity.
Primer pair 4-3 (Landsberg-specific) generated amplification products for both the specific and nonspecific allele at 38 cycles, suggesting that at conditions of high template DNA concentration the primer could produce a false positive. Primer pair 2-6 (Columbia-specific) failed to generate an amplification product for the specific allele at 28 cycles, indicating that when using low template DNA concentrations the primer could produce a false negative.
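The scoring rule described above can be summarized in a short sketch. The class and function names are illustrative assumptions and are not part of SNAPER. The code only encodes the interpretation given in the text: a primer pair is usable when a band appears for the specific allele and none appears for the nonspecific allele at both 28 and 38 cycles. The fold-range helper assumes ideal doubling of product in every cycle, under which 10 extra cycles correspond to a 1,024-fold difference, close to the 1,000-fold range cited. Band calls that the text does not state explicitly (for example, pair 4-3 at 28 cycles) are filled in as assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BandCalls:
    """Band present (True) or absent (False) on the agarose gel."""
    specific_28: bool      # specific-allele template, 28 cycles
    nonspecific_28: bool   # nonspecific-allele template, 28 cycles
    specific_38: bool      # specific-allele template, 38 cycles
    nonspecific_38: bool   # nonspecific-allele template, 38 cycles


def ideal_fold_range(extra_cycles: int) -> int:
    """Fold difference in product after extra cycles, assuming perfect doubling."""
    return 2 ** extra_cycles


def classify(calls: BandCalls) -> str:
    """Score one primer pair following the interpretation in the text."""
    if calls.nonspecific_28 or calls.nonspecific_38:
        # Any product from the nonspecific allele could be read as a false positive.
        return "possible false positive"
    if not (calls.specific_28 and calls.specific_38):
        # A missing product from the specific allele could be read as a false negative.
        return "possible false negative"
    return "allele-specific"


# Band patterns modelled on the Figure 2 description; calls not stated in the
# text (e.g. pair 4-3 at 28 cycles, pair 2-6 at 38 cycles) are assumptions.
examples = {
    "1-4": BandCalls(True, False, True, False),
    "4-3": BandCalls(True, False, True, True),
    "2-6": BandCalls(False, False, True, False),
}

for name, calls in examples.items():
    print(name, classify(calls))
print(ideal_fold_range(38 - 28))
```

Checking the nonspecific lanes first means that a pair with both problems is reported as a possible false positive, which is the more damaging error when scoring a mapping population.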
For 43 different SNPs examined, the SNAPER program had an overall success rate of approximately 53% in generating primers with the desired range of specificity. Table I shows a list of 50 SNAP markers that we have generated to date using the program. Information about the markers generated is also available at the Ausubel laboratory web site (http://patho.mgh.harvard.edu/ausubelweb) and on The Arabidopsis Information Resource (TAIR) web site (http://www.Arabidopsis.org/). Seventeen of the markers listed in Table I were developed prior to the creation of the SNAPER program and are part of the empirical database used to generate the rules underlying the SNAPER algorithm.
Mapping of SNAP Markers onto the Columbia-Landsberg Recombinant Inbred (RI) Map

The chromosomal locations of 33 of the SNPs corresponding to SNAP
markers in Table I were previously mapped, using an Affymetrix-based mapping technique, to unique chromosomal positions and integrated into
the existing Arabidopsis RI linkage map (Table I; Cho et al., 1999
Map positions for all 50 markers are shown in Table I and are also available at the Nottingham Arabidopsis Stock Centre (http://nasc.nott.ac.uk/RI_data/RI_menu.html). The 50 markers are in general well scattered throughout the five Arabidopsis chromosomes with an average of 10 markers per chromosome (Table I). BAC, YAC, and P1 clones containing the markers described in this paper were identified in the TAIR database (Table I; http://www.Arabidopsis.org/blast/). Approximate physical map positions for these clones, obtained from The Institute of Genomic Research web site (http://www.tigr.org/), were compared with the genetic map positions obtained in this paper (Table I) for all 5 chromosomes. The correlation coefficients (r2) between the genetic and physical map positions were all high for the markers located on chromosomes 1, 2, 3, and 4 (0.999, 0.977, 0.986, and 0.990, respectively; Fig. 4A). We were unable to obtain good estimates for physical map positions on chromosome 5 because information on the extent of the overlap that exists between Arabidopsis clones was not available. Nevertheless, based on a less rigorous analysis, we found some discrepancies between the genetic and physical maps of chromosome 5. Genetic map positions were reversed with respect to physical map positions for at least 3 of the markers located below 93.07 cM (Fig. 4B). Moreover, although markers SGCSNP101 and m558a reside in the same clone (MUA2) they differ considerably in their genetic map positions (142.02 and 113.80 cM, respectively).
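The comparison of genetic and physical map positions can be illustrated with a minimal sketch. It assumes that r2 denotes the squared Pearson correlation coefficient and that a reversal means a pair of physically adjacent markers whose genetic order is opposite to their physical order. The function names and the marker values are hypothetical placeholders chosen for illustration; they are not data from Table I.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("positions must not all be identical")
    return (sxy * sxy) / (sxx * syy)


def order_inversions(markers):
    """Return pairs of physically adjacent markers whose genetic order is reversed.

    markers: list of (name, genetic_position_cM, physical_position_Mb).
    """
    by_physical = sorted(markers, key=lambda m: m[2])
    return [
        (a[0], b[0])
        for a, b in zip(by_physical, by_physical[1:])
        if b[1] < a[1]
    ]


# Hypothetical placeholder markers: (name, cM, Mb). Not taken from Table I.
markers = [
    ("m1", 5.0, 1.2),
    ("m2", 40.0, 9.8),
    ("m3", 32.0, 12.5),
    ("m4", 90.0, 22.0),
]

genetic = [m[1] for m in markers]
physical = [m[2] for m in markers]
print(round(r_squared(genetic, physical), 3))
print(order_inversions(markers))
```

A high r2 and local order reversals are not mutually exclusive, which is why the two checks are kept separate here.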
Mapping of edr5-1

To assess the feasibility of using SNAP markers in map-based
cloning approaches, we localized the mutation edr5-1, which causes enhanced disease resistance to the virulent bacterial pathogen Pseudomonas syringae pv maculicola ES4326 and to
the obligate fungal pathogen Erysiphe orontii. The
edr5-1 mutant was isolated in the Ausubel laboratory in a
genetic screen to identify Arabidopsis mutants in the Columbia
accession with altered susceptibility to P. syringae pv
maculicola ES4326 (Volko, 1998

Once the mutation was positioned on chromosome 4, a group of SNAP (SGCSNP24, SGCSNP64, and SGCSNP102; Table I), CAPS (g4539, AG, and RPS2), and SSLP (nga1111) markers on the long arm of chromosome 4 were used to determine a more defined map position for edr5-1. The analysis of the data obtained from those markers allowed us to narrow down the position of edr5-1 to a 1.7-Mb region (which corresponds to 19 BAC clones) between markers SGCSNP24 and SGCSNP64 (Fig. 5A). For 18 of these BAC clones, two SNPs per BAC clone were selected from the Cereon SNP database (http://www.Arabidopsis.org/cereon) for conversion into SNAP markers. Primer pairs designed by the SNAPER program were tested for specificity under the conditions described previously (28 and 38 PCR cycles). Four primer pairs were tested per allele, representing an average of eight primer pairs tested per SNP site. Because primers/markers were tested in groups, the results obtained from the groups tested initially allowed us to reduce the number of SNPs to be used in subsequent rounds, since more precise map positions were obtained after each round of markers was analyzed. A total of 144 primers (which corresponded to 18 SNPs) were tested, and 12 new SNAP markers located in the region of interest were generated. Markers CER426330, CER426890, CER442145, CER444061, CER444203, CER446565, CER447954, CER447956, CER447203, CER465981, CER466066, and CER466198 are described on the Ausubel laboratory web site (http://patho.mgh.harvard.edu/ausubelweb) and on TAIR (http://www.Arabidopsis.org/). A total of 266 F2 plants and pooled F3 families were used to analyze 10 of the 12 markers that were generated (using 35 PCR cycles). As of the preparation of this manuscript, edr5-1 has been mapped to a 315-kb region on the long arm of chromosome 4 between SNAP markers CER447954 and SGCSNP64 (Fig. 5B).
The recent release of a list of approximately 25,000 SNPs (out of
40,000 predicted) between the Columbia and Landsberg erecta accessions, in conjunction with the SNAP procedure described in this
paper, allows the design of a considerable number of molecular markers
targeted specifically to regions of interest. Sufficient resolution for
fine mapping of a mutation consequently can be achieved in a short
period of time. In the SNAP procedure, SNP alleles are assayed using
specially designed allele-specific primers, which generate
allele-specific patterns (i.e. detectable amplification product only
from one allele) that are rapidly and reliably scored by simple
analytical methods. We used a modification of the original allele-specific PCR methodology, the introduction of an additional mismatch within the last four bases of the primer (Newton et al., 1989 The design of SNAP primers is greatly facilitated by the use of the
SNAPER program. The success rate of the SNAPER program, which uses
empirical rules to design the primers, was approximately 53% out of a
total of 331 primers that corresponded to 43 SNPs examined. Testing an
average of eight primer pairs per allele we obtained specific primer
pairs for both alleles of a given SNP in 27 out of 43 cases. Specific
primer pairs for only one of the alleles were obtained in 14 cases, and
two SNPs failed to generate any allele-specific primers. The failure to
generate allele-specific primers in some cases is consistent with
previous reports that indicated that mismatch extension can vary
significantly (approximately 5-100-fold) depending on the sequences
surrounding the mismatch (Mendelman et al., 1989 In the 14 cases where specific primers were not obtained for one of the alleles after testing eight primer pairs, we tested more primers. We ultimately obtained the second allele-specific primer for six out of seven SNPs examined. However, because of the large number of primers tested, we concluded that the most efficient approach is to simply abandon the SNPs that fail to yield specific primers for both alleles after testing a limited number of primer pairs. Designing primers for new SNPs reduces the time and cost required to generate a marker, and the availability of SNPs is not a limitation. In the case of the mapping of edr5-1, we tested only four primer pairs per allele for each SNP and were able to obtain 12 markers out of 18 SNPs tested. These data suggest that it is only necessary to test a relatively small number of primers to obtain a molecular marker for a specific SNP (approximately four primer pairs per allele). Moreover, we predict that the success rate of the program will increase as the data generated testing the primers is added to the empirical database used to generate the rules underlying the SNAPER algorithm. The 50 SNAP markers generated are evenly distributed throughout the Arabidopsis chromosomes, showing only two gaps at the bottom of chromosome 3 and the top of chromosome 5 (Table I). The maximum distance between any two of the 50 SNAP markers described in Table I is approximately 70 cM (SGCSNP123 and SGCSNP126) at the top of chromosome 5 (Table I). Because there is widespread interest in using RI lines to clone quantitative trait loci, we evaluated the correspondence between the physical map of the Columbia genome with the genetic map derived from the Columbia × Landsberg RI lines. It is interesting that the analysis showed a strong correlation between the genetic and physical maps for chromosomes 1, 2, 3, and 4 (r2 = 0.999, 0.977, 0.986, and 0.990, respectively; Fig. 4A). 
On the other hand, a less rigorous comparison for chromosome 5 showed that the genetic and physical map positions differed considerably in several places (Fig. 4B). Although it is not possible to definitively attribute a physical map position to a marker until the sequence of the genome is complete (because of potential duplications), the data suggest that caution must be used in determining a physical map position based on genetic data. In the fine-structure mapping of the edr5-1 mutation, we created a total of 12 new SNAP markers, and using 10 of the 12 markers generated we were able to delimit the mutation to a 315-kb region on the long arm of chromosome 4 (Fig. 5B) in approximately a 3-week period (not counting the time it took for primers to be synthesized). The rapid mapping of the edr5-1 mutation (which is still in progress) can be credited to a combination of factors: first, the current abundance of SNPs makes it extremely easy to target a region of interest and generate a large number of markers that can be used in fine-structure mapping; second, the SNAPER program facilitates design of primer alternatives for the creation of PCR-based markers using existing SNP sites; and third, the methods used to analyze the markers generated (PCR and agarose gel electrophoresis) are relatively simple. Compared with some of the methods currently used for high-throughput
SNP detection in genome centers, such as pyrosequencing (Ahmadian et
al., 2000
Primer Design

Allele-specific primers that corresponded to SNP sites were designed using an empirically determined algorithm called SNAPER (E. Drenkard, S. Rozen, M. Mindrinos, B.G. Richter, and F.M. Ausubel, unpublished data). The algorithm was implemented in the PERL language for the automatic design of oligonucleotide primers. Optimization of melting temperature, oligonucleotide length, and length of amplified products was achieved using the Primer3 program (S. Rozen and H.J. Skaletsky, code available at http://www-genome.wi.mit.edu/genome_software/other/primer3.html). Primer sequences were screened against an Arabidopsis library of repetitive elements (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY) to minimize mis-priming. Quantitation of primer concentrations was performed using the Quant program (developed by B.G. Richter).

Primer Testing

Testing of the primers designed by the SNAPER program was
performed on a PTC-225 DNA Engine Tetrad (MJ Research, Watertown, MA)
using 20-µL reactions in 384- or 96-well formats. Thirty nanograms of
Columbia or Landsberg erecta genomic DNA, isolated using
standard procedures (Ausubel et al., 1992

RI Mapping

Template DNA from 94 recombinant inbred lines generated using
the Columbia × Landsberg erecta cross (Arabidopsis
Biological Resource Center, Ohio State University, Columbus) was
isolated using standard methods (Ausubel et al., 1992

Mapping of edr5-1

Samples were prepared using a plant genomic miniprep (Edwards et
al., 1991
We thank Sean May and Keith Bradnam from the Nottingham Arabidopsis Stock Centre for the analysis of the RI mapping data; Julie Stone and Mary Wildermuth (Department of Genetics, Harvard Medical School, and Department of Molecular Biology, Massachusetts General Hospital) for help in preparation of the manuscript; Sigrid Volko, Julie Stone, and Jacinto Villanueva (Department of Genetics, Harvard Medical School, and Department of Molecular Biology, Massachusetts General Hospital) for help in mapping edr5-1; and Eric Lander and David Altshuler (Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA) for the use of specialized equipment in the large-scale mapping of the SNAP markers.
Received September 8, 2000; accepted September 22, 2000.

¹ This work was supported by the National Science Foundation (grant no. MCB-9729599).
² Present address: Genomics Collaborative Inc., 99 Erie Street, Cambridge, MA 02139.
* Corresponding author; e-mail ausubel@frodo.mgh.harvard.edu; fax 617-726-5949.