Plant Physiology 140:805-817 (2006) © 2006 American Society of Plant Biologists
TOPAAS, a Tomato and Potato Assembly Assistance System for Selection and Finishing of Bacterial Artificial Chromosomes 1,[W]
Centre for Biosystems Genomics, 6700 AB Wageningen, The Netherlands (S.A.P., J.C.v.H., T.H., M.J.v.S.); Department of Bioscience, Cluster Greenomics, Plant Research International, 6708 PB Wageningen, The Netherlands (S.A.P., J.C.v.H., T.H., M.J.v.S., M.H.C.A.-H., R.M.K.-L.); and Keygene N.V., 6700 AE Wageningen, The Netherlands (T.P.J., D.W., K.J.)
We have developed the software package Tomato and Potato Assembly Assistance System (TOPAAS), which automates the assembly and scaffolding of contig sequences for low-coverage sequencing projects. The order of contigs predicted by TOPAAS is based on read pair information; alignments between genomic, expressed sequence tag, and bacterial artificial chromosome (BAC) end sequences; and annotated genes. The contig scaffold is used by TOPAAS for automated design of nonredundant sequence gap-flanking PCR primers. We show that TOPAAS builds reliable scaffolds for tomato (Solanum lycopersicum) and potato (Solanum tuberosum) BAC contigs that were assembled from shotgun sequences covering the target at 6- to 8-fold coverage. More than 90% of the gaps are closed by sequence PCR, based on the predicted ordering information. TOPAAS also assists the selection of large genomic insert clones from BAC libraries for walking. For this, tomato BACs are screened by automated BLAST analysis and, in parallel, high-density nonselective amplified fragment length polymorphism fingerprinting is used for constructing a high-resolution BAC physical map. BLAST and amplified fragment length polymorphism analysis are then used together to determine the precise overlap. Assembly onto the seed BAC consensus confirms that the selected BACs have an extremely short overlap and the largest extending insert. This method will be particularly applicable where related or syntenic genomes are sequenced, as shown here for the Solanaceae, and is potentially useful for the monocots, Brassicaceae, and Leguminosae.
An established strategy to determine the sequence content of target genomes involves large insert clones that are physically mapped into contigs spanning the target of interest and that are used for shotgun library construction and high-throughput sequencing. Many aspects of the clone-by-clone whole-genome sequencing strategy have been addressed in the literature, and although much progress has been made in developing this strategy, key steps remain the subject of continued evaluation and improvement. Here we present results from the Centre for Biosystems Genomics initiative to sequence chromosome 6 of tomato (Solanum lycopersicum) cv Heinz 1706 by a clone-by-clone sequencing approach and to establish resistance gene homolog profiling for the potato (Solanum tuberosum) genome. In this paper we focus in particular on selecting bacterial artificial chromosomes (BACs) for walking and finishing.
The condition of having large insert clones available was fulfilled by Budimann et al. (2000)
Upon selection of fingerprinted BACs, determining the sequence content is the next important step in rebuilding the genomic content of targets. The method most commonly used for genomic DNA sequencing is shotgunning. The sample DNA is randomly sheared into small fragments and cloned into appropriate sequencing vectors. With double-barreled shotgun sequencing, small insert clones are sequenced from both insert ends, producing read pairs or mates. The aim is to cover the target of interest and to reduce the number of sequence gaps between contigs by producing a sufficient amount of sequences from which a reliable consensus can be determined upon assembly. Theoretically, following Poisson distribution rules, the probability for bases not being sequenced leaving sequence gaps reduces with an increase of coverage, as outlined by Lander and Waterman (1988)
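The Poisson argument above can be illustrated with a short calculation. The following is a minimal sketch, assuming idealized uniformly random reads, so that the chance of a base being hit by zero reads at coverage c is e^(-c). The function names and the 100-kb target length are illustrative assumptions and do not come from TOPAAS.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


def expected_uncovered_bases(target_length: int, coverage: float) -> float:
    """Expected number of unsequenced bases in a target of the given length."""
    return target_length * uncovered_fraction(coverage)


if __name__ == "__main__":
    # Hypothetical 100-kb BAC insert; the size is illustrative only.
    for c in (2, 4, 6, 8, 10):
        bases = expected_uncovered_bases(100_000, c)
        print(f"coverage {c:>2}x: {bases:10.1f} bases expected uncovered")
```

Real shotgun libraries deviate from this model through cloning bias and repeats, so the numbers are best read as a lower bound on the gaps to expect.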
Several tools for contig linking and gap closure have been presented in the past. Among those is the prokaryotic genome assembly assistance system, which was developed to automate contig ordering and gap closure for prokaryotic cyanobacterial genome assembly by finding possible links for Synechococcus contigs with known protein sequences from the closely related Synechocystis sp. (Yu et al., 2002). In addition to existing database information, a powerful data source for contig scaffolding, inherent to the double-barreled shotgun sequencing approach, is the assembly position of a sequence read as constrained by the assembly position and direction of its mate. This information can be used both to position contigs relative to each other and to solve local assembly problems. Reconstruction of target sequences is often complicated by repeats, resulting in collapsed assemblies. To detect such problems, a tool that reports violations of direction and size constraints helps to determine contig quality. We report here the development of a Tomato and Potato Assembly Assistance System (TOPAAS) that uses homology-based searches, comparative alignments, read pair information, and high-density AFLP fingerprint data to link contigs, verify assemblies, and select minimal overlapping BACs.
Dataflow and Output
The main purpose of TOPAAS is to automate key steps in the clone-by-clone sequencing approach. Its tasks are to find contig link information for gapped assemblies resulting from low-coverage sequencing, to analyze the assembly integrity, and to assist the selection of overlapping BAC clones for a subsequent sequence walk. To that end we have built a system that extracts read pair information, carries out homology-based searches, and analyzes this information according to user-defined settings. A schematic representation of the TOPAAS pipeline and dataflow is shown in Figure 1. TOPAAS visualizes the link analysis and presents the user with detailed information on type, order, and number of links (see Fig. 2).
TOPAAS provides a web front end in PHP for uploading assembly data and contig sequences, setting alignment constraints and average insert sizes for shotgun libraries. Homology-based alignments can be uploaded manually or provided by TOPAAS via two automated BLASTs. TOPAAS aligns contigs against the nonredundant sequence database from the National Center for Biotechnology Information (NCBI) and against the BAC end sequence database from SGN. The system also carries out a MUMmer (Delcher et al., 2002 The automated BLASTN analysis of contigs against the BAC end sequence database is used for high-throughput screening and rapid preselection of candidate BACs, having a sequence overlap with seed BACs. The single-pass BAC end sequences are reassembled onto the seed BAC consensus. Base pair inconsistencies are edited to exclude high quality base call mismatches and the position of a nearby cloning site upstream of the BAC end sequence start position is verified. When meeting constraints, corresponding BACs are then selected for further analysis with high-density AFLP fingerprinting. The reassembly of BAC ends and AFLP fingerprinting analysis is carried out independently from TOPAAS.
Sequence Homology-Based Searches
We first searched the contig sequences of P250I21 and P046G10 with TOPAAS against the BAC end database from SGN, which contained 75,000 to 126,000 BAC end sequences from a HindIII and an MboI library, depending on the time of screening. The raw BLASTN output was converted into HTML format to provide a complete overview of hits (Fig. 3; Supplemental Fig. 3). We frequently observed individual seed BAC domains hit by multiple BAC ends. This can result from a repetitive domain within the genome, and it may also reflect redundancy in the BAC library. For example, around the 30-kb position from the start of P250I21, a putative gene predicted by Genscan shows BLASTX homology to a putative retroelement polyprotein from Arabidopsis (Arabidopsis thaliana) and to a hypothetical protein from the wild cabbage (Brassica capitata) transposon Melmoth. Transposable elements account for at least 10% of the Arabidopsis genome, are well represented in other plant genomes, and are most likely also present in Solanaceae genomes (Arabidopsis Genome Initiative, 2000
High-Density Nonselective AFLP Fingerprinting of Tomato BACs
To investigate the relation between BACs over a larger extent, we analyzed AFLP EcoRI/MseI +0/+0 fingerprints by determining the number of comigrating fragments between BACs (Fig. 4) and comparing their sizes with an in silico EcoRI/MseI digest obtained from the seed BAC consensus sequence. From the combinatorial comparison of comigrating fragments, the bins for P250I21 (Fig. 5) and P046G10 (Supplemental Fig. 5) are constructed. In the Mi contig the smallest number of comigrating fragments is shared between P250I21 and P073H07, pointing to a minimal overlap. The other BACs in the Mi contig share a large number of comigrating fragments, suggesting that their overlap with both P250I21 and P073H07 is considerably larger. The deduced order of BACs overlapping P250I21 is consistent with the BLAST hit positions, although we find a 6-kb extension of P111A8 compared with P092A17 (see Fig. 5). The in silico digest of P250I21 indicates that two pairs of consecutive EcoRI/MseI restriction sites are present in this 6-kb domain. However, the corresponding comigrating fragments could not be scored from the gel (Fig. 4, lanes 5 and 6). Several phenomena might account for the failure to detect these fragments. We cannot entirely rule out an excessively deviating gel migration behavior. Furthermore, similar sized fragments comigrating as a single band can mask each other and cause ambiguities when scoring fragments in the gel. Isolation of fragments from the gel and sequencing for positive identification would provide more insight, but this is beyond the scope of this study and will be addressed elsewhere. From experience we assume that each fragment observed in the gel corresponds to an overlap size of approximately 3 kb. In some instances the estimated overlap size per bin differs from the calculated size. Nevertheless, the overall estimated spanning distance is in agreement with the calculated overlap sizes for bin 1 to bin 5.
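The in silico digest and the 3-kb-per-fragment rule of thumb can be sketched as follows. This is a simplified illustration and not the procedure used in the study: it assumes complete digestion, uses the standard recognition sequences GAATTC (EcoRI) and TTAA (MseI), keeps only fragments bounded by one site of each enzyme, and ignores adapter lengths that would shift sizes on a real gel. All function names are hypothetical.

```python
import re

# Recognition sequence and cut offset within the site (G^AATTC, T^TAA).
SITES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_positions(seq: str) -> list[tuple[int, str]]:
    """Return sorted (cut position, enzyme) pairs for both enzymes."""
    seq = seq.upper()
    cuts = []
    for enzyme, (site, offset) in SITES.items():
        # A lookahead pattern also reports overlapping occurrences of a site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, enzyme))
    return sorted(cuts)


def ecori_msei_fragments(seq: str) -> list[tuple[int, int, int]]:
    """Return (start, end, length) of fragments with one EcoRI and one MseI end."""
    cuts = cut_positions(seq)
    fragments = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2:  # skip EcoRI/EcoRI and MseI/MseI fragments
            fragments.append((pos1, pos2, pos2 - pos1))
    return fragments


def estimated_overlap_kb(n_comigrating: int, kb_per_fragment: float = 3.0) -> float:
    """Rough overlap estimate: about 3 kb per shared fragment, as assumed in the text."""
    return n_comigrating * kb_per_fragment
```

Comparing the fragment lengths predicted for a seed BAC consensus with the sizes scored from the gel is then a matter of matching two lists of lengths within a chosen size tolerance.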
Taken together these results make it unlikely P250I21 and P073H07 would share a small repeat and suggest the minimal overlap is authentic. Furthermore bin 1, bin 3, and bin 11 contain fragments unique to P112G05, P250I21, and P073H07, respectively, indicating these BACs make up for the largest spanning distance in the Mi contig.
The nature of the overlap was further investigated by shotgun sequencing P073H07 and P103N18 and assembling them onto the consensus of P250I21 and P046G10, respectively. Both P073H07 and P103N18 align without base inconsistencies, and the overlap start position is similar to that determined by BLAST. Furthermore, the BAC end assembly positions and directions are in agreement with the mapping results (Supplemental Fig. 4). From these results we conclude that P073H07 is the optimal BAC for walking in terms of minimal overlap and largest extending insert. At the time of screening the same held true for BAC P103N18. Over time the sequencing community will be provided with some 400,000 BAC end sequences in total, obtained from three different libraries (Mueller et al., 2005b
To analyze the quality of the contig links predicted by TOPAAS, we have constructed an assembly data set from three potato BACs, which were pulled from two different libraries (Rouppe van der Voort et al., 1999 Subsequently, primers designed by TOPAAS on contig ends were used for PCR analysis on BAC template DNA in combinations according to the contig order predicted by TOPAAS. Figure 6 shows 29 out of 33 primer combinations producing amplicons. Amplified products do not exceed a length of 1 kb except for Figure 6, lane 20, which is well within the size limit for bridging. The PCR analysis shows that all except one primer pair combination produce single amplicon products, indicating that the primer annealing positions are unique and suggesting that the primer redundancy check by TOPAAS is reliable. PCR products have been sequenced and assembled to investigate the gap closure. In all instances, sequences derived from single amplicons (Fig. 6, lanes 4-38) are contig bridging and result in joins between contigs. Multiple amplicons from one primer pair combination were isolated separately, of which the larger product produced a gap-spanning sequence (Fig. 6, lane 3). Four out of 33 primer combinations failed to produce a PCR product, although the contig pairs flanking the gaps are linked by read pairs (Fig. 6, lanes 2, 31, 34, and 38). In one instance the gap-flanking sequences reveal a potential hairpin structure that probably obstructs a proper PCR (Fig. 6, lane 2). We redesigned PCR oligos at the 3' site of both arms of the hairpin structure and adapted the PCR conditions. The redesigned primers facilitated a proper PCR and produced a gap-closing sequence (data not shown). Thus, using the contig ordering information from TOPAAS, we are able to efficiently finish the potato BACs to full closure. Tomato BAC P103N18 was also closed, whereas for P073H07 we could not find sufficient links to complete closure.
These results indicate the integrity of the contig order predicted by TOPAAS and the sufficient quality of the automatically designed primers for gap closure.
Selection of BAC Clones for Sequence Walk
We have presented here a software package, TOPAAS, that automates key steps in the selection and finishing of BAC clones. A combination of nonselective AFLP fingerprinting, BLASTN analysis, and assembly of BAC ends supports accurate physical mapping. The BLASTN search is used for high-throughput screening of BACs and rapid preselection. The selection can be used without laborious screening techniques such as the STS approach (Blake et al., 1996 Alternatively, MegaBlast might be used for screening contigs against BAC ends. MegaBlast is faster than BLASTN and allows a percentage identity cutoff rather than an expected value cutoff. Since e-values depend on the length of the BAC ends and the size of the referenced database, relatively short BAC end sequences with a perfect match might be missed when filtering with a cutoff e-value of 0.0. We have also included the option to screen BAC contig sequences with MegaBlast. The screening presented here works very efficiently. From a total of 75,000 to 126,000 BACs we have identified four and seven candidates for P250I21 and P046G10, respectively, prior to fingerprinting. The fingerprinting and BLASTN analyses complement each other in the physical mapping process. With the BAC end sequence homology search we are able to pinpoint the exact start position and direction of the overlap, and the AFLP fingerprinting is used to determine the relationship between overlapping BACs over a larger domain. Whereas the BLASTN hits disclose information on minimal overlap sizes, the multiple BAC comparisons through nonselective AFLP fingerprinting provide vital information for identifying BACs with the largest extending insert. For BAC P073H07, two comigrating fragments with seed BAC P250I21 have been scored (Fig. 4, lanes 2 and 6). For BAC P103N18 one comigrating fragment is scored (Fig. 4, lanes 9 and 15), which alone would be an insufficient number to declare a reliable overlap.
Furthermore, AFLP fragments are sometimes not detected from gel reads, causing small overlaps to be missed in the physical mapping process. The BLASTN hit positions and the assembly of BAC ends onto the seed BAC consensus have been shown to compensate for this shortcoming. By sequencing and assembling BACs selected for walking, we have confirmed that the overlap of BACs with only a few kilobase pairs of overlap is authentic. The approach we have taken does not depend on the full closure of a seed BAC. The results for P046G10 show that minimal overlapping BACs can also be scored with gapped assemblies, provided the contig ends adjacent to the T7 and SP6 regions are identified. Theoretically, with this approach it should be possible to identify BACs for walking that have an overlap of only a few hundred base pairs. This will depend on the distribution of restriction sites in the tomato genome and the number of BAC clones available to cover the genome. Recently, BAC end sequences from an MboI library have also been made available, and these will be complemented by the United States' part of the SOL initiative with additional sequences from an EcoRI library. The use of multiple libraries produced with different restriction enzymes will increase the likelihood of finding BACs with even shorter overlap sizes.
The mapping for BACs in AFLP contigs Mi and P103 has revealed some striking differences compared to FPC mapping results. Six BACs coassemble into contig Mi (Fig. 5). FPC data obtained from http://www.genome.arizona.edu/fpc/WebAGCoL/tomato/WebFPC/ show three BACs, P112G05, P111A08, and P096H22, respectively, map into three separate FPC contigs, while for the other three BACs no FPC mapping information could be retrieved. Contig P103 was assembled from eight BACs. For five out of eight BACs, including P250I21, no FPC data was available, whereas only three BACs, P061I06, P008K02, and P188J12, respectively, coassemble into a single FPC contig. BACs like P250I21 that are not assigned to FPC contigs probably represent dropouts. Our mapping results indicate BACs P111A08 and P096H22 from AFLP contig Mi overlap approximately 100 kb and share some 30 comigrating AFLP fragments. This finding is not reflected by the FPC data, and, despite this large overlap, P111A08 and P096H22 have been mapped into two different FPC contigs. The information content used to construct the maps for the AFLP contig Mi and P103 is significantly higher and directly relates to the number of bands produced and detectable size ranges in polyacrylamide and agarose gels (Meyers et al., 2004
Other important aspects are cost and labor involved. Recently we have screened 21 seeds from a HindIII library against 350,000 BAC ends. The screening yields 186 BACs from the HindIII library, 126 BACs from the EcoRI library, and 75 BACs from the MboI library (data not shown). Thus on average 18 candidate overlapping BACs have been identified per seed BAC. We can now roughly estimate the total number of BACs to be fingerprinted using the STC approach, and compare this with the classical FPC method. If we follow Batzoglou et al. (1999)
TOPAAS assists the assembly, scaffolding, and finishing of BAC contigs. Read pairs are used commonly for finishing assemblies, and this linking approach has also contributed extensively to the positioning of tomato and potato contigs in this study. The likelihood for finding sequence gap-spanning read pairs depends on the insert sizes used for constructing the shotgun library and the coverage with which the target is sequenced. Approximately 15% of the contigs could not be ordered with gap-spanning mate pairs. This is partly due to the low coverage with which BACs have been sequenced. We have included homology-based searches to increase the chance of finding leads that link contig ends. From the links predicted, approximately 70% belonged to a read pair link type, whereas the remaining 30% were equally divided over BLASTX and ESTs link types.
Multiple factors contribute to the success of the homology-based linking approach. We show here that alignments to single-pass ESTs can successfully be used for tomato and potato contig linking. For many plant genomes extensive numbers of ESTs have been produced, and in combination with genomic sequences the approach is feasible for many sequencing projects, including those from monocots, Brassicaceae, and Leguminosae (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html). The closing efficiency will improve when using unigenes, since their spanning distance is in general larger than that of single-pass EST sequences. Building high-quality unigenes, however, requires base calling, accurate preclustering, and assembly. Reliable linkage by bridging unigenes will thus depend on the consistency and the overall quality of the build. Some 31,000 unigenes for S. lycopersicum and 25,000 for S. tuberosum have been assembled (Mueller et al., 2005a
MUMmer has been used as the matching algorithm. Its suffix tree-based method is computationally relatively inexpensive and very fast. MUMmer can perform a translated alignment, which is preferable for more distantly related genomes. However, it is memory intensive and was originally designed for global rather than local alignments (Delcher et al., 2002
Both BLAST and EST bridging sequences were checked manually for homology against known Solanum repeats. In one instance we found a contig pair linked by a BLAST hit against a repetitive element. The contig pair also shared a bridging read pair, making an aberrant linkage unlikely. Neither BLAST nor TOPAAS is specifically designed to deal with repetitive sequences. Although not used in this study, we have recently included an automated screen in the assembly phase against The Institute for Genomic Research Solanaceae Repeat Database (http://www.tigr.org/tdb/e2k1/plant.repeats) with RepeatMasker to circumvent potential problems (http://www.repeatmasker.org/RMDownload.html). In a Staden environment RepeatMasker is interfaced by PREGAP4 (Bonfield et al., 1995
Ordering contig ends with BLASTX depends on the gene distribution in the tomato and potato genome. In this study we have finished BACs containing inserts of the euchromatic part of tomato chromosome 6. The genes are not evenly distributed in the tomato and potato genome (Van der Hoeven et al., 2002 The TOPAAS software is available for nonprofit, academic, and personal use. Please contact http://www.cbsg.nl for nonexclusive commercial licenses. The software can be downloaded from http://www.appliedbioinformatics.wur.nl.
Sequencing and PCR Analysis
BAC DNA was isolated with the Qiagen large construct kit, sized by hydro shearing, and fractionated by gel electrophoresis, and 2-kb sized fragments were cloned into the dephosphorylated EcoRV site of pBlueScriptSK (Stratagene) or pGEM-TEasy (Promega). Shotgun templates were prepared from XL2 transformants (Stratagene) and sequenced using the ABI PRISM Big Dye Terminator Cycle Sequencing Ready reaction kit with FS AmpliTaq DNA polymerase (Perkin Elmer) or the DYEnamic ET Terminator Cycle Sequencing kit (Amersham). For gap closure, PCR products were amplified with custom-made primers using a regular PCR protocol. Typically a 10-µL PCR reaction contained 1 µL 5 µM forward and 1 µL 5 µM reverse custom primer, 1 µL 2.5 mM dNTPs, 2 µL 25 mM MgCl2, 2 µL 10x sequence buffer (200 mM Tris-HCl pH 9.0, 5 mM MgCl2), 0.2 µL 5 units/µL Goldstar (Eurogentec) polymerase, and 1 µL 10 µg/µL BAC template DNA. PCR products were analyzed on agarose gel, purified using the QIAquick gel extraction kit (Qiagen) as described by the manufacturer, and diluted into 30 µL. Sequence PCR was carried out in a 10-µL reaction mixture with 2 µL Amerdye (Amersham), 1 µL sequence primer, 2 µL sequence buffer (200 mM Tris-HCl pH 9.0, 10 mM MgCl2), and 5 µL template DNA. Sequence PCRs were analyzed on a 3730 XL DNA analyzer (Applied Biosystems).
Using the PREGAP4 interface of the Staden package 2004, raw trace data was processed into assembly ready sequences. Sequences were base called by the PHRED base caller (Ewing and Green, 1998
To manage the sequence, assembly, and scaffolding data we developed TOPAAS from components that are available as open source or under an academic user license. In particular we use MySQL as a database management system (http://www.mysql.com/downloads). Perl (http://www.perl.org) and PHP (http://www.phpmyadmin.net) are used for scripting purposes, and Apache (http://www.apache.org) is used for web hosting. Graphical output relies on the graphics draw library (http://www.sunfreeware.com, or http://www.boutell.com/gd). The core program for primer design is built upon Primer3 (http://www-genome.wi.mit.edu/genome_software/other/primer3.html), with additional scripting to adapt Primer3 to automated primer design for sequence gap closure. The software also includes scripts to build a local database of contig sequences, against which primer sequences are checked for redundancy using BLASTN. To find matching putative functions that can be attributed to contig sequences we rely on BLASTX hits. We have adopted the prokaryotic genome assembly assistance system approach, but we use our own implementation to screen for identical accession IDs. We have extensively revised the table structure so that storage of data sets for multiple projects is supported. The software does not include a local BLAST facility or the environment needed to run BLAST. This should be implemented by the user (for details, see http://www.ncbi.nih.nlm.gov/BLAST). For multiple alignment viewing of BLASTX matches we rely on Mview (http://mathbio.nimr.mrc.ac.uk/
Consensus sequences of contig ends were cured with the GAP4 assembly viewer using a PHRED quality threshold of 40 over a length of 1 kb for both ends of a contig. Assembly information was extracted from the GAP4 assembly database and parsed into the ContigLink database with TOPAAS. Subsequently, read pairs were evaluated with respect to direction and size constraints that underlie the shotgun library properties. Bridging read pairs are considered valid when positioned on different contig ends, pointing toward each other with respect to their sequencing direction, and meeting size constraints. For gap-flanking read pairs we calculate the sequence-spanning distance, excluding the size of the gap itself. The left distance, dleft, is taken from position 1 at the 5'-end of the first mate pair to the end position of the contig it is assembled in, running in the direction similar to the sequence direction of the first mate pair. The right distance, dright, is taken from the start position of the second contig to the 5' end coordinate of the second mate pair running opposite to the sequence direction of the second mate pair. The total spanning distance is calculated as dtot = dleft + dright. The size constraint dtot for read pairs can be set to a value related to the average insert size used to construct a shotgun library. In this study dtot is set to 2.5 kb. To align tomato (Solanum lycopersicum) and potato (Solanum tuberosum) EST sequences to contig sequences, we use an extension of the MUMmer package, designated NUCmer, using mummer2 as the matching algorithm. Consensus sequences in multi-fasta format from assembled contigs are used as a reference, and multi-fasta formatted potato and tomato EST sequences derived from NCBI are used as a query data set. An EST is considered contig bridging when aligning to different contig end sequences, with its domains aligned in a consecutive order, and with a minimal sequence identity threshold of 90% for each aligned domain. 
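The read pair rules described above can be sketched in a few lines. This is one possible reading of the text, not the TOPAAS implementation: it assumes that the relative orientation of two contigs is unknown, so any two mates on different contigs can be arranged to point toward each other, and the remaining test is whether d_tot = d_left + d_right stays within the size constraint (2.5 kb in this study). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedRead:
    contig: str          # contig identifier
    contig_length: int   # length of that contig in bases
    five_prime: int      # 1-based contig coordinate of the read's 5' end
    forward: bool        # True if the read runs toward increasing coordinates


def distance_to_contig_end(read: PlacedRead) -> int:
    """Bases from the read's 5' end to the contig end it points at, inclusive."""
    if read.forward:
        return read.contig_length - read.five_prime + 1
    return read.five_prime


def spanning_distance(first: PlacedRead, second: PlacedRead) -> int:
    """d_tot = d_left + d_right; the unknown gap size itself is excluded."""
    return distance_to_contig_end(first) + distance_to_contig_end(second)


def is_valid_bridge(first: PlacedRead, second: PlacedRead, max_dtot: int = 2500) -> bool:
    """Mates bridge a gap if they lie on different contigs and meet the size constraint."""
    if first.contig == second.contig:
        return False
    return spanning_distance(first, second) <= max_dtot
```

A mate that lies far from the contig end it points at produces a large d_tot and is rejected, which is also how a direction violation shows up in this simplified model.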
To find related putative gene functions, contig sequences were queried against the nonredundant sequence database from NCBI with BLASTX. A link is considered valid when hitting against protein sequences with the same accession ID. The expected value threshold was set to 1 x 10^-5 to avoid low similarity matches.
Primers are automatically designed on contig end sequences, using Primer3 as the core primer design program. The maximum distance of primer positions to contig ends is set to 500 bp. Additional custom scripting is applied to prefer primer sequences pointing outward with respect to the contig end positions and positioned nearest to a contig end. An automated redundancy check is performed by aligning the primer sequence against the consensus sequence of the contigs using BLASTN. The expected value threshold for reporting primers as redundant was set to 0.1. Possible mispriming that could give rise to ambiguous PCR results is output by the program and described in terms of position, number of aligned bases, and alternative melting temperature.
To identify minimal overlapping BAC clones for walking, we use tomato BAC end sequences from the SOL Genomics Network available at ftp://ftp.sgn.cornell.edu/tomato_genome, and perform a BLASTN analysis against assembled tomato contigs. Position and direction of overlap were verified, and candidate BAC clones were preselected by setting the expected value threshold to 0.0. When constraints were met, the corresponding ABI traces were subsequently assembled onto the BAC contig sequences to which the BLAST hit was found and verified at nucleotide level for integrity. Assembled BAC end sequences that show high quality base call differences compared to the contig consensus sequences, or whose assembly start lies more than 50 bp downstream from a candidate HindIII or MboI cloning site, are rejected. Remaining candidate BAC clones are further analyzed by fingerprint analysis.
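The cloning site criterion for BAC ends can be illustrated with a small check. This is a hedged sketch under stated assumptions, not the verification used in the study: it uses the standard recognition sequences AAGCTT (HindIII) and GATC (MboI), takes a 0-based index for the first assembled base of the BAC end, measures the distance from the read start to the near edge of the site, and allows the site to overlap the start of the read. The function names are hypothetical.

```python
from typing import Optional

# Both recognition sequences are palindromic, so one strand is sufficient.
CLONING_SITES = {"HindIII": "AAGCTT", "MboI": "GATC"}


def nearest_upstream_site(consensus: str, start: int, forward: bool,
                          max_distance: int = 50) -> Optional[tuple[str, int]]:
    """Return (enzyme, distance) for the closest site upstream of a BAC end start.

    start is the 0-based consensus index of the first assembled base.
    forward is True if the BAC end reads toward increasing coordinates.
    """
    seq = consensus.upper()
    best = None
    for enzyme, site in CLONING_SITES.items():
        n = len(site)
        if forward:
            # Upstream means lower coordinates; take the rightmost site.
            idx = seq.rfind(site, max(0, start - max_distance), start + n)
            if idx == -1:
                continue
            distance = start - idx
        else:
            # Upstream means higher coordinates; take the leftmost site.
            idx = seq.find(site, max(0, start - n + 1), start + max_distance + 1)
            if idx == -1:
                continue
            distance = idx + n - 1 - start
        if best is None or distance < best[1]:
            best = (enzyme, distance)
    return best


def passes_cloning_site_check(consensus: str, start: int, forward: bool,
                              max_distance: int = 50) -> bool:
    """Reject a BAC end whose start is more than max_distance bp from a site."""
    return nearest_upstream_site(consensus, start, forward, max_distance) is not None
```

In practice this check would be combined with the e-value preselection and the base call comparison described above before a BAC is passed on to fingerprinting.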
BAC DNA was isolated by standard alkaline lysis method (Sambrook et al., 1989
We thank Joyce van Eck for providing us with the MboI and EcoRI library from tomato cv Heinz 1706, and Andy Pereira and Roeland van Ham for reading the manuscript and for advice. Received September 13, 2005; returned for revision December 16, 2005; accepted January 6, 2006.
1 This work was supported by the research program of the Centre of BioSystems Genomics, which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Sander A. Peters (sander.peters{at}wur.nl).
[W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.105.071464. * Corresponding author; e-mail sander.peters{at}wur.nl; fax 31317418094.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Batzoglou S, Berger B, Mesirov J, Lander ES (1999) Sequencing a genome by walking with clone-end sequences: a mathematical analysis. Genome Res 9: 1163-1174
Blake TK, Kadyrzhanova D, Shepherd KW, Islam AKMR, Langridge PL, McDonald CL, Erpelding J, Larson S, Blake NK, Talbert LE (1996) STS-PCR markers appropriate for wheat-barley introgression. Theor Appl Genet 93: 826-832
Bonfield JK, Smith KF, Staden R (1995) A new DNA sequence assembly program. Nucleic Acids Res 23: 4992-4999
Bonierbale MW, Plaisted RL, Tanksley SD (1988) RFLP maps based on a common set of clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095-1103
Budimann MA, Mao L, Wood TC, Wing RA (2000) A deep-coverage tomato BAC library and prospects toward development of an STC framework for genome sequencing. Genome Res 10: 129-136
Delcher AL, Phillippy A, Carlton J, Salzberg SL (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30: 2478-2483
Ewing B, Green P (1998) Basecalling of automated sequencer traces using PHRED. II. Error probabilities. Genome Res 8: 186-194
Ewing B, Hillier L, Wendl MC, Green P (1998) Basecalling of automated sequencer traces using PHRED. I. Accuracy assessment. Genome Res 8: 175-185
Huang S, van der Vossen EAG, Kuang H, Vleeshouwers VGAA, Ningwen Z, Borm TJA, van Eck HJ, Baker B, Jacobsen E, Visser RGF (2005) Comparative genomics enabled the isolations of the R3a late blight resistance gene in potato. Plant J 42: 251-261
Kent JW (2002) The BLAST-like alignment tool. Genome Res 12: 656-664
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software to compare large genomes. Genome Biol 5: R12
Lander ES, Waterman MS (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2: 231-239
Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH (1997) High throughput fingerprint analysis of large-insert clones. Genome Res 7: 1072-1084
Meyers BB, Scalabrin S, Morgante M (2004) Mapping and sequencing complex genomes. Nat Genet 5: 578-588
Mueller AL, Solow TH, Taylor N, Skwarecki B, Buels R, Bins J, Lin C, Wright MH, Ahrens R, Wang Y, et al (2005a) The SOL genomics network: a comparative resource for Solanaceae biology and beyond. Plant Physiol 138: 1310-1317
Mueller AL, Tanksley SD, Giovannoni JJ, van Eck J, Stack S, Choi D, Kim BD, Chen M, Cheng Z, Li C, et al (2005b) The tomato sequencing project, the first cornerstone of the international Solanaceae project (SOL). Comp Funct Genomics 6: 153-158
Rouppe van der Voort JR, Kanyuka K, van der Vossen E, Bendahmane A, Mooijman P, Klein-Lankhorst R, Stiekema W, Balcombe D, Bakker J (1999) Tight physical linkage of the nematode resistance gene Gpa2 and the virus resistance gene Rx on a single segment introgressed from wild species Solanum tuberosum subsp. andigena CPC1673 into cultivated potato. Mol Plant Microbe Interact 12: 197-206
Sambrook J, Fritsch EF, Maniatis T (1989) Molecular Cloning: A Laboratory Manual, Ed 2. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
Soderlund C, Humphray S, Dunham A, French L (2000) Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 10: 1772-1787
Soderlund C, Longdon I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13: 523-535
Vos P, Hogers R, Bleeker M, Rijans M, Van der Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, et al (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23: 4407-4414
Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S (2002) Deductions about the number, organization and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14: 1441-1456
Venter JC, Smith HO, Hood I (1996) A new strategy for genome walking. Nature 381: 364-366
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842-846
Yu Z, Zhao J, Luo J (2002) PGAAS: a prokaryotic genome assembly assistance system. Bioinformatics 18: 661-665