First published online May 28, 2008; 10.1104/pp.108.119081 Plant Physiology 147:1396-1411 (2008) © 2008 American Society of Plant Biologists OPEN ACCESS ARTICLE
Sequence Analysis of Bacterial Artificial Chromosome Clones from the Apospory-Specific Genomic Region of Pennisetum and Cenchrus1,[W],[OA]
Department of Horticulture, University of Georgia, Tifton, Georgia 31793–0748 (J.A.C., S.G., G.G., P.O.-A.); Department of Plant Biology (M.-M.C.-P., C.L., H.W., L.H.P.), Department of Genetics (J.D., L.Y., J.L.B.), and Office of Research Services (V.E.J.), University of Georgia, Athens, Georgia 30602; and Department of Plant Biology Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, Texas 77843 (J.E.M., P.E.K.)
Apomixis, asexual reproduction through seed, is widespread among angiosperm families. Gametophytic apomixis in Pennisetum squamulatum and Cenchrus ciliaris is controlled by the apospory-specific genomic region (ASGR), which is highly conserved and macrosyntenic between these species. Thirty-two ASGR bacterial artificial chromosomes (BACs) isolated from both species and one ASGR-recombining BAC from P. squamulatum, which together cover approximately 2.7 Mb of DNA, were used to investigate the genomic structure of this region. Phrap assembly of 4,521 high-quality reads generated 1,341 contiguous sequences (contigs; 730 from the ASGR and 30 from the ASGR-recombining BAC in P. squamulatum, plus 580 from the C. ciliaris ASGR). Contigs containing putative protein-coding regions unrelated to transposable elements were identified based on protein similarity after Basic Local Alignment Search Tool X analysis. These putative coding regions were further analyzed in silico with reference to the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes using the resources at Gramene (www.gramene.org) and Phytozome (www.phytozome.net) and by hybridization against sorghum BAC filters. The ASGR sequences reveal that the ASGR (1) contains both gene-rich and gene-poor segments, (2) contains several genes that may play a role in apomictic development, (3) has many classes of transposable elements, and (4) does not exhibit large-scale synteny with either rice or sorghum genomes but does contain multiple regions of microsynteny with these species.
Apomixis is a naturally occurring mode of asexual reproduction in angiosperms that leads to embryo and seed formation without a requirement for meiosis or fertilization of the egg. The application of apomixis in plant breeding could have tremendous impact by providing a mechanism for hybrid progeny to avoid segregation and fix heterosis. Although apomixis has been reported in over 300 species in at least 35 angiosperm families, 75% of the apomictic species reported have been from the Poaceae, Rosaceae, and Asteraceae families. Apomixis occurs almost exclusively in polyploid genotypes. Classification of the apomictic mode of reproduction as either sporophytic or gametophytic is based on the developmental origin of the cell from which the embryo is derived (Nogler, 1984
A single dominant "locus" is required for apomeiosis and parthenogenesis in the aposporous grasses Pennisetum/Cenchrus (Sherwood et al., 1994
In Pennisetum and Cenchrus, many markers specific to apospory were found to be hemizygous. Given the hemizygosity, as well as genetic data suggesting that the region responsible for apospory may be physically large, the term ASGR (for apospory-specific genomic region) was coined to describe the locus in Pennisetum/Cenchrus. Ninety-nine bacterial artificial chromosome (BAC) clones containing molecular markers (SCARs, RFLPs, and AFLPs) showing total genetic linkage to the aposporous phenotype have been identified from BAC libraries constructed from the apomictic polyhaploid line MS228-20 and the apomictic Cenchrus ciliaris line B-12-9 (Roche et al., 2002
Subsets of the ASGR-BAC clones and the ASGR-recombinant BAC have been used for fluorescence in situ hybridization (FISH) analysis in both species. In P. squamulatum, FISH analysis has confirmed that the ASGR is physically large (approximately 50 Mb) and is located near the telomere on the short arm of the ASGR-carrier chromosome. The ASGR has also been categorized as hemizygous and heterochromatic in nature (Goel et al., 2003).
Other than the Opie-2-like retrotransposon and the limited sequences generated from the UGT197 ASGR-BACs, very little is known about the sequence composition of the ASGR in Pennisetum/Cenchrus and whether the colinearity between the UGT197 ASGR-BACs and rice chromosome 11 would extend throughout the ASGR. To further investigate this region, 32 ASGR-BACs, 19 from P. squamulatum and 13 from C. ciliaris, in addition to one ASGR-recombinant BAC located approximately 2 cM from the ASGR in P. squamulatum, were shotgun cloned and sample sequenced at approximately 0.5x coverage. Some additional targeted sequences were generated from BACs containing putative protein coding regions (PPCRs) identified by BLASTX analysis of the sample sequences against multiple protein databases. We generated approximately 2.5 Mb of data from 4,521 high-quality sequencing reads. After Phrap assembly, 1,341 sequence contigs (730 contigs from the P. squamulatum ASGR, 581 contigs from the C. ciliaris ASGR, and 30 contigs from the P. squamulatum ASGR-recombining BAC), covering approximately 1.0 Mb, were obtained. Twenty-five C. ciliaris and 23 P. squamulatum ASGR-PPCRs were discovered through similarity to known proteins, along with five ASGR-recombinant PPCRs from P. squamulatum. The predicted protein functions of the ASGR-PPCRs varied widely, with many showing similarity to proteins with functional domains that are known to bind DNA and/or alter DNA transcription and hence could be involved in the apomictic pathway. The strongest candidate ASGR-PPCRs identified were the ASGR-BABY BOOM (BBM)-like genes. This study demonstrates that the colinearity previously identified between the UGT197 SCAR-containing BACs and rice chromosome 11 does not extend throughout the ASGR; instead, multiple small regions of shared synteny to the rice and sorghum (Sorghum bicolor) genomes exist throughout the ASGR.
ASGR Sample and Targeted Sequencing
Shotgun libraries were constructed from 32 ASGR-BAC clones isolated in our laboratory and from one ASGR-recombining BAC clone (Table I). The 13 BACs designated c__ were isolated from the C. ciliaris line B-12-9 library, while the 20 BACs designated p__ were isolated from the polyhaploid apomict library (Roche et al., 2002
Shotgun subclone libraries were sample sequenced at a depth of approximately 0.5x coverage. A total of 2,055 high-quality sequences were obtained from the 33 BAC subclone libraries. Sequence reads derived from an individual BAC clone or from a group of overlapping BAC clones were given a unique Phrap group identification number as shown in Table I. These grouped BAC-derived sequences were then assembled into sequence contigs by Phrap. During Phrap assembly of individual sequences into contigs, two numeric identifiers were added to the Phrap group identification number to generate a unique name for each sequence contig. When an FPC group contained BACs sequenced from both species, the sequences from each species were grouped separately (Phrap assemblies 11 and 22). All generated sequence contigs were analyzed for similarity to known proteins using BLASTX against an internal PIR_NREF database at FUNGEN (www.fungen.org). If a sequence contig contained a BLASTX e-value hit of 10^-6 to a protein unrelated to transposable elements and if the sequence was generated from a BAC subclone library produced at the University of Georgia (UGA; Table I), targeted sequencing data were generated (see "Materials and Methods"). After 2,466 additional targeted high-quality sequences were generated, the random and targeted sequences were assembled again by Phrap into 1,341 sequence contigs ranging in size from 100 to 8,521 bp. Each sequence contig was given a Uniscript name for database reference. All high-quality reads have been deposited in GenBank dbGSS under accession numbers ED544199 to ED548719. Individual sequences can also be accessed through the FUNGEN ASGR database at http://asgr.uga.edu. A detailed description of sequence processing and analysis can be found in Cordonnier-Pratt et al. (2004).
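The contig naming convention described above (a Phrap group identification number followed by two numeric identifiers, as in 14-2-24) is easy to handle programmatically. The following Python sketch is illustrative only: the class and function names are hypothetical, and because the text does not say what the two added identifiers encode, they are kept as opaque integers.

```python
from collections import defaultdict
from typing import NamedTuple


class ContigName(NamedTuple):
    phrap_group: int  # Phrap group identification number (Table I)
    id_a: int         # first added identifier; meaning not stated in the text
    id_b: int         # second added identifier; meaning not stated in the text


def parse_contig_name(name: str) -> ContigName:
    """Split a name such as '14-2-24' into its three numeric parts."""
    parts = name.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected contig name: {name!r}")
    return ContigName(*(int(p) for p in parts))


def group_by_phrap(names):
    """Map each Phrap group number to the contig names assigned to it."""
    groups = defaultdict(list)
    for name in names:
        groups[parse_contig_name(name).phrap_group].append(name)
    return dict(groups)


# Contig names that appear in the text
example = group_by_phrap(["14-2-24", "14-2-25", "33-2-33", "11-2-38"])
```

Grouping by the first field makes it straightforward to ask, for instance, which contigs belong to the same Phrap group before checking them for duplicated PPCRs.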
Eighty-seven sequence contigs (6.5%) contained similarity to proteins from multiple species unrelated to transposable elements upon BLASTX analysis to the UniprotTrEMBL database (version 7) with an e-value hit of
Duplication of ASGR and ASGR-Recombinant PPCRs
Previous analysis of ASGR-BAC clones suggested that regions within the ASGR regardless of species were duplicated. Sample and targeted gene sequencing allowed for the analysis of sequence contigs with PPCR duplications within a Phrap grouping. Intra-Phrap group PPCR duplications were considered verified if the PPCR was present in two or more sequence contigs and a nucleotide polymorphism was detected either by RFLP analysis or by having two or more sequences for each sequence contig as well as an error P value of less than 0.0001 for a putative nucleotide polymorphism. Four intra-Phrap group duplications were identified. Sequence contigs 14-2-24 and 14-2-25 are 99.6% identical over 6,593 bp; sequence contigs 33-2-33 and 33-2-35 are 99.7% identical over 1,180 bp; sequence contigs 11-2-38 and 11-2-30 are 99.1% identical over 570 bp; while sequence contigs 11-2-38 and 11-2-36 are 99.8% identical over 1,000 bp. As full-length sequencing of the PPCRs was not always accomplished, it is unknown whether the intra-Phrap group duplications identified would extend over the complete gene or whether we are finding remnants of duplication. While Gualtieri et al. (2006)
Potential duplication of PPCR-containing sequence contigs between different Phrap groupings within a species is harder to assess by sample sequence analysis alone because of the complexity of the region. For example, sequence contigs 10-2-2 and 12-2-16 contain similarity to the same rice gene and show 100% sequence similarity over the length of the 10-2-2 sequence. The c018 BAC was isolated using ASGR-specific primers derived from an end clone sequence of ASGR-BAC c001. BAC c001 is contained in the same FPC contig as the sequenced c002 BAC found in Phrap group 10. However, as FPC did not group the c018 BAC with the BACs from the c001/c002 group, and as only one PPCR was identified that was similar between the two BACs, it is unclear without extensive BAC sequencing whether this PPCR is a true duplication or the same gene on overlapping BACs. The duplication of the β-D-xylosidase-like protein on sequence contigs 15-0-12 and 13-2-3 reported by Gualtieri et al. (2006)
Whole gene sequencing data for the analysis of PPCR conservation between the two apomictic species were not generated in this study except for the ASGR-BBM-like genes (see below). With the exception of Phrap groups 11 and 22, we chose to sample sequence BACs not considered orthologs between the two species to increase overall coverage across the ASGR. Additionally, if the same PPCR was identified in both species, only one of the species was chosen for targeted gene sequencing.
We chose to fully sequence the ASGR-BBM-like genes from both species. Four ASGR-BACs (c100, c102, p203, and p207) were identified that contained distinct copies of the ASGR-BBM-like genes. BAC c102, considered by FPC analysis to be orthologous to the Phrap 27 BACs, and BAC p203 were not previously sample sequenced. The sequenced regions of the ASGR-BBM-like genes extend from approximately 300 bp upstream of the predicted start codon to the predicted stop codon. The PsASGR-BBM-like1 gene derived from BAC p203 contains 3,826 bp (EU559280); the PsASGR-BBM-like2 gene derived from BAC p207 contains 3,832 bp (EU559277); the CcASGR-BBM-like1 gene derived from BAC c102 contains 3,835 bp (EU559278); and the CcASGR-BBM-like2 gene contains 3,856 bp (EU559279). The four genes were aligned by ClustalW2, and the alignment is shown in Supplemental Figure S1. Table V shows the percentage of consensus positions of the four ASGR-BBM-like genes with each other using global alignment. The two ASGR-BBM-like genes from P. squamulatum share 99.8% identity with each other and differ only at two positions within the first predicted intron (see ClustalW2 alignment). The two C. ciliaris ASGR-BBM-like genes share 98.6% identity with each other. The differences between the two C. ciliaris ASGR-BBM-like genes are found in both the predicted coding and noncoding regions. When analyzed across the species, the CcASGR-BBM-like1 gene is more similar to both P. squamulatum ASGR-BBM-like genes than to the CcASGR-BBM-like2 gene, confirming the FPC analysis. Unlike the P. squamulatum ASGR-BBM-like genes, which are found on overlapping BACs in an FPC contig, the C. ciliaris ASGR-BBM-like genes are located on separate FPC contigs.
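Pairwise identity values of the kind reported above are computed from aligned sequences. As a rough illustration, the Python sketch below calculates percent identity from two rows of a global alignment. It assumes that a column containing a gap in one sequence counts as a mismatch and that columns gapped in both rows are ignored; the article does not state which convention was used for Table V, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a global alignment.

    Gaps are written as '-'. Columns gapped in both rows are skipped;
    a gap opposite a base counts as a mismatch (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: 9 columns, 7 identical positions
toy = percent_identity("ACGT-ACGT", "ACGTTACGA")
```

Under a different convention (for example, excluding all gapped columns from the denominator), the same alignment would give a slightly higher value, so the convention should be fixed before comparing numbers between studies.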
The four ASGR-BBM-like genes were analyzed for the potential to be transcribed using the rice gene prediction program at RiceGAAS (http://ricegaas.dna.affrc.go.jp/usr/). All ASGR-BBM-like sequences were predicted to contain seven exons. PsASGR-BBM-like1, PsASGR-BBM-like2, and CcASGR-BBM-like1 sequences were predicted to encode a 542-amino acid protein containing two AP2 domains. The predicted exons for these genes are highlighted in red in Supplemental Figure S1. The PsASGR-BBM-like1 and PsASGR-BBM-like2 predicted proteins are identical and 99.3% identical to the predicted CcASGR-BBM-like1 protein. The CcASGR-BBM-like2 sequence was predicted to encode a smaller 489-amino acid protein also containing two AP2 domains. Splice site changes between CcASGR-BBM-like1 and CcASGR-BBM-like2 caused the removal of the third exon in the CcASGR-BBM-like2 predicted protein, and a potential stop codon in the seventh exon was removed by the addition of an intron in the ORF. The predicted exons from CcASGR-BBM-like2 are highlighted in blue in Supplemental Figure S1. The alignment of the predicted ASGR-BBM-like proteins can be seen in Supplemental Figure S2. The genomic sequences and predicted proteins for the four ASGR-BBM-like genes were compared against the rice genomic sequence and the TIGR gene model databases at http://www.gramene.org/multi/blastview. The most significant hits for all protein and genomic DNA ASGR-BBM-like queries were, in order of most to least significant, LOC_Os11g19060, LOC_Os02g40070, LOC_Os04g55970, and LOC_Os01g67410. To further verify that LOC_Os11g19060 is the most similar rice gene to the ASGR-BBM-like genes, the above rice hits were aligned using both genomic DNA sequences and predicted protein sequences against the ASGR-BBM-like genes using ClustalW. In both types of alignments, LOC_Os11g19060 was the most similar to the ASGR-BBM-like sequences.
A total of 303 sequence contigs (23%) had predicted coding regions with similarity to known or predicted transposable elements based on BLASTX analysis using the TrEMBL database (version 7). Forty-three sequence contigs contained similarity to transposases from type II transposable elements. The remaining 235 sequence contigs had similarity to various proteins from type I retrotransposable elements. All 1,341 sequence contigs were scanned using RepeatMasker (A.F.A. Smit, R. Hubley, and P. Green, unpublished data; current version, open-3.1.8) at http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker against the rice repeat database (RM database version 20061006) using the default settings. A total of 468 sequence contigs (33%) were identified that contained sequences with similarity to transposable elements: 362 sequence contigs to type I and 106 sequence contigs to type II. Within the type I elements, 304 hits were classified as long terminal repeat retrotransposons. When the analysis was separated by species, similar percentages of repetitive elements were found. The 580 C. ciliaris sequence contigs contain 505,489 bp of sequence and identified 175 type I and 55 type II elements. The 730 P. squamulatum ASGR sequence contigs, containing 452,641 bp of sequence, identified 178 type I and 48 type II elements. The 30 P. squamulatum ASGR-recombinant sequence contigs covering 46,504 bp had nine and three type I and type II elements identified, respectively. Supplemental Table S2 shows the percentage of nucleotides masked by RepeatMasker for each Phrap grouping for type I and type II elements as well as the percentage masked for repeats and low complexity DNA.
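The RepeatMasker counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in the text and confirms that the per-data-set counts sum to the overall totals (362 type I and 106 type II) and that the three data sets together cover about 1.0 Mb. The hits-per-100-kb figure is an illustrative normalization introduced here, not a statistic used in the article.

```python
# Values copied from the text; dictionary keys are illustrative labels
repeat_hits = {
    "C. ciliaris ASGR": {"bp": 505_489, "type_I": 175, "type_II": 55},
    "P. squamulatum ASGR": {"bp": 452_641, "type_I": 178, "type_II": 48},
    "P. squamulatum ASGR-recombinant": {"bp": 46_504, "type_I": 9, "type_II": 3},
}

total_type_I = sum(d["type_I"] for d in repeat_hits.values())
total_type_II = sum(d["type_II"] for d in repeat_hits.values())
total_bp = sum(d["bp"] for d in repeat_hits.values())


def hits_per_100kb(entry: dict) -> float:
    """Repeat hits (type I plus type II) per 100 kb of contig sequence."""
    return 100_000 * (entry["type_I"] + entry["type_II"]) / entry["bp"]


density = {name: hits_per_100kb(d) for name, d in repeat_hits.items()}
```

With these inputs the two ASGR data sets come out at roughly 45 and 50 hits per 100 kb, which is in line with the statement that the two species show similar proportions of repetitive elements.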
A total of 2,033 individual sequences (1,221 from Pennisetum and 814 from Cenchrus) generated from the 0.5x shotgun sequences were analyzed using the Assisted Automated Assembler of Repeat Families (A.A.A.R.F.) algorithm (J. DeBarry and J.L. Bennetzen, unpublished data). The A.A.A.R.F. program takes sequence overlaps from shotgun sample sequence data sets and walks them out to create pseudomolecular "builds" representing the most abundant repeat families within randomly sequenced data sets. The A.A.A.R.F. algorithm was applied to both species separately and to a combined data set that consisted of sequences pooled together from both species. Individual tests produced a number of builds for each species (118 for Pennisetum and 74 for Cenchrus). During build construction for individual species, 474 sequences were used from the Pennisetum data and 321 from the Cenchrus data set. The number of sequences used to create the pseudomolecules accounts for 38.8% and 39.5% of the Pennisetum and Cenchrus data sets, respectively, indicating that these sequences were repetitive within the data sets. The builds were compared with known repeats from the TIGR plant repeat database (Ouyang and Buell, 2004).
The Opie-2-like retrotransposon partial sequence (AY375366), previously identified in our laboratory to give a dispersed signal across the high-copy region of the ASGR, was compared by BLAST with the combined species A.A.A.R.F. output and with the ASGR sequence contigs. In the AY375366 clone, generated from the Phrap 31 BAC, the last 467 nucleotides contain an open reading frame similar to the 5' end of the gag protein of Opie-2 from maize. Neither the A.A.A.R.F. outputs nor the ASGR sequence contigs contained a hit that spanned the AY375366 sequence. Rather, pieces of the AY375366 sequence were found in two A.A.A.R.F. outputs and multiple Phrap 31 contigs.
All sequence contigs were screened for possible Helitrons (Kapitonov and Jurka, 2001
All sequence contigs listed in boldface in Tables II to IV
As shown in Table VI, the ASGR sequence contigs analyzed showed highest similarity to genomic regions on seven different rice chromosomes. Five regions of colinearity (genes located <0.1 Mb apart in rice) were identified for ASGR sequence contigs and one region for the ASGR-recombining sequence contigs. Rice homologs for sequence contigs 29-2-18/29-2-25 and 12-2-16/23-0-3 are colinear on rice chromosomes 4 and 6, respectively. All five rice homologs identified from the sequence contigs 33-2-35, 33-2-29, 33-2-32, and 33-2-34 (genes 1 and 2) are located in a colinear manner on chromosome 4. Sequence contig 19-2-19 contains a rice homolog near the rice homologs found in Phrap group 33 sequence contigs. Supplemental Figure S3 compares these sequence contigs with the rice genome using the Artemis Comparison Tool. The sequence contigs from the ASGR-recombinant BAC showed highest sequence similarity to regions on three different rice chromosomes but identified colinear genes on rice chromosome 7.
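The colinearity criterion used above (rice homologs located less than 0.1 Mb apart) lends itself to a simple clustering step. The Python sketch below is a minimal illustration under stated assumptions: each contig is reduced to a single best rice hit with one coordinate, neighbouring hits on the same chromosome are chained when they lie less than 100 kb apart, and only chains with two or more hits are reported. The contig labels and coordinates are invented for the example and are not values from Table VI.

```python
from collections import defaultdict

MAX_GAP_BP = 100_000  # "genes located <0.1 Mb apart in rice"


def colinear_regions(hits, max_gap=MAX_GAP_BP):
    """Chain hits on the same chromosome that lie closer than max_gap.

    hits: iterable of (contig_label, chromosome, position_bp).
    Returns a list of (chromosome, [contig_label, ...]) for chains of >= 2 hits.
    """
    by_chrom = defaultdict(list)
    for contig, chrom, pos in hits:
        by_chrom[chrom].append((pos, contig))

    regions = []
    for chrom in sorted(by_chrom):
        items = sorted(by_chrom[chrom])
        chain = [items[0]]
        for item in items[1:]:
            if item[0] - chain[-1][0] < max_gap:
                chain.append(item)
            else:
                if len(chain) > 1:
                    regions.append((chrom, [c for _, c in chain]))
                chain = [item]
        if len(chain) > 1:
            regions.append((chrom, [c for _, c in chain]))
    return regions


# Hypothetical hits: labels and coordinates are made up for illustration
hypothetical_hits = [
    ("contig_A", 4, 18_340_000),
    ("contig_B", 4, 18_365_000),
    ("contig_C", 4, 18_390_000),
    ("contig_D", 4, 22_000_000),
    ("contig_E", 6, 5_000_000),
    ("contig_F", 6, 5_060_000),
    ("contig_G", 11, 9_000_000),
]
regions = colinear_regions(hypothetical_hits)
```

Chaining on adjacent gaps means a region can span more than 0.1 Mb in total as long as each neighbouring pair is close; a stricter reading of the criterion would cap the full span instead.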
Two separate sorghum BAC libraries were probed with amplicons covering sections of ASGR-PPCRs plus one potential ASGR-flanking marker, HHU27 (Jessup et al., 2002).
An in silico BLASTN analysis using the same sequences as the rice BLASTN was conducted on September 11, 2007, using the SORprelim.fasta.masked100 database at http://www.phytozome.net/search.php?show=blast. Table VII shows the best hit, the corresponding sorghum supercontig, and its position on the supercontig. Twenty-seven sequence contigs containing ASGR-PPCRs and four sequence contigs containing ASGR-recombinant PPCRs were placed on sorghum supercontigs. In total, four regions of microsynteny could be identified for the ASGR-PPCRs and one for the ASGR-recombining PPCRs. Three of the ASGR regions of microsynteny and the one ASGR-recombining region of microsynteny in sorghum were also present in rice.
Sequence Analysis of ASGR and ASGR-Recombinant BAC Shotgun Libraries
We sample sequenced 32 ASGR-BAC clones chosen to maximize coverage across the ASGR in both species based on FPC analysis (Goel et al., 2006).
The analysis of all sequence contigs generated in this study identified 24 C. ciliaris sequence contigs containing 25 PPCRs, 21 P. squamulatum sequence contigs containing 23 PPCRs, and four P. squamulatum recombinant sequence contigs containing five PPCRs based on BLASTX similarity to the TIGR gene model database. Excluding sequence contigs related to transposable elements, the gene density of BACs within the ASGR was quite variable. Of the six sample-sequenced ASGR-BACs that failed to provide evidence of PPCRs based on BLASTX analysis, four are physically located in the high-copy region of the ASGR. While a greater number of PPCRs were identified on BACs located within the low-copy region of the ASGR, the highest gene density (one PPCR per 21 kb) was identified from Phrap group 33, whose BACs physically map at the edge of one high-copy flanking region toward the low-copy region of the ASGR. The ASGR-recombinant BAC had the highest gene density of approximately one gene every 16 kb. Forty different rice proteins were identified when sequence contigs containing ASGR-PPCRs were combined from both species. Using gene ontology terms identified in the rice proteins containing the highest BLASTX hit to the ASGR-PPCRs, there are four ASGR-PPCRs predicted to encode proteins with functional domains known to bind or alter DNA structure and two ASGR-PPCRs with catalytic kinase domains. Nine ASGR-PPCRs had similarity to hypothetical or expressed proteins in rice. One could postulate a role for any of these ASGR-PPCRs in the apomictic developmental pathway. Mutational load in the sequenced regions of the ASGR was not greatly useful for discarding ASGR-PPCRs as nonfunctional.
Using the gene prediction program at RiceGAAS, the four ASGR-BBM-like genes and the eight ASGR-PPCRs with at least 80% amino acid coverage compared with the corresponding rice protein were all predicted to encode potentially expressed transcripts, even though the sequence contig 11-2-34 and the CcASGR-BBM-like2 gene contained potential stop codon mutations when compared with the corresponding rice protein. The combination of these results suggests that even if full sequencing of the ASGR was accomplished, too many potentially functional ASGR-PPCRs would be identified for gene-by-gene analysis, assuming that the apomictic pathway is controlled by genes located within the ASGR and not by epigenetic factors such as noncoding RNAs or heterochromatin structure.
Of the PPCRs identified in the study, we did choose to fully sequence the ASGR-BBM-like genes as the best potential candidate gene. BBM originally was designated as a transcript induced in microspore cultures of Brassica napus (BnBBM) undergoing somatic embryogenesis. Two copies of BBM were identified in B. napus and are orthologous to a single gene in Arabidopsis (Arabidopsis thaliana). Overexpression of BnBBM in Arabidopsis results in the formation of ectopic embryos on leaves (Boutilier et al., 2002
The duplication of sequence contigs containing PPCRs at the ASGR was not unexpected given our previous FPC and comparative mapping results (Roche et al., 2002
Classification of repetitive elements ranged from 23% to 40% of the sequences generated, depending on the analysis used. As in other higher plants, retroelements constituted more of the DNA in the studied regions than any other repeat class, but even the most abundant element (a previously reported Opie-2-like long terminal repeat retrotransposon) contributed only slightly more than 1% of the sequenced DNA. Helitrons (Kapitonov and Jurka, 2001
The haploid genome sizes of P. squamulatum, C. ciliaris, and the sexual diploid Pennisetum glaucum are approximately 5,150 Mb, approximately 1,500 Mb, and approximately 1,950 Mb, respectively (Roche et al., 2002).
Therefore, it is somewhat puzzling that the sequence contigs derived from the ASGR, a highly heterochromatic region in both P. squamulatum and C. ciliaris, do not show a larger percentage of transposable elements when analyzed with multiple bioinformatics programs. It was also surprising that the frequency of transposable elements and other repeats in the ASGR was not dramatically different between the large P. squamulatum genome and the smaller C. ciliaris genome. Perhaps the repeats in these two genomes are so divergent that they are not detected as homologous by the informatic techniques employed (although the informatic screening tends to be much more sensitive than Cot analysis). More likely, the ASGR may be unusual in its repeat properties compared with the rest of the P. squamulatum or C. ciliaris genomes.
Comparative genetic mapping of members of the Poaceae family, including rice, maize, barley (Hordeum vulgare), wheat (Triticum aestivum), and pearl millet (P. glaucum) has shown a conservation of gene and marker order between the genomes despite an up to 35-fold difference in genome size and 50 to 80 million years of evolution (Moore et al., 1995
Comparative mapping studies using RFLPs have also been attempted, with limited success, in apomicts. Comparative mapping has been used to identify regions of synteny between the distal part of the long arm of rice chromosome 12 and the apomixis locus in Paspalum (Pupilli et al., 2001
Genomic sequence data from other apomicts is also limited. A BAC from the apomictic controlling locus (ACL) in Paspalum simplex was sequenced to approximately 99% completion and organized into 20 contigs. Four of the contigs, containing approximately 13 kb of sequence, did not show any predicted peptides. Nine additional contigs from this BAC contained similarity to 13 repetitive elements. Four genes, unrelated to transposable elements and containing significant hits to rice proteins, were identified through BLASTP alignment of FGENESH predicted proteins to the rice protein database at TIGR. Synteny of the ACL of P. simplex with the distal end of rice chromosome 12 was confirmed at the sequence level for the PsEXS and PsPKD genes (Calderini et al., 2006
With 95% of the rice genome completely sequenced and annotated (International Rice Genome Sequencing Project, 2005
As previous articles presented data tentatively finding synteny of the ASGR in C. ciliaris and sorghum chromosome D (or chromosome 6; Jessup et al., 2002
The many chromosomal rearrangements that our study has uncovered in the ASGR compared with the rice and sorghum genomes indicate that colinearity with other grasses will be a tenuous resource for ASGR analysis. Relative to other grass genome comparisons of microsynteny (Bennetzen, 2005
Apomixis is a fascinating developmental process leading to the clonal propagation of the maternal plant through seed. The ability to harness this potential in food crops could significantly alter agricultural practices. Through genomic technologies, we have sample sequenced and analyzed a small portion of the genomic region required for apomixis in Pennisetum and Cenchrus, two related apomictic genera. Our analyses have identified 40 potentially transcribed genes, four of which contain domains known to bind or alter DNA and two that contain similarity to kinase domains and are thus "apomixis gene candidates." Overall gene density across the ASGR was very low (approximately one per 61 kb), although gene-rich subregions were identified. Regions of microsynteny with the rice and sorghum genomes were identified, suggesting that a narrowly defined ASGR region could use genomic colinearity with rice and/or sorghum as a tool to assist the discovery of the apomixis "gene(s)."
Grouping of ASGR-BAC Clones for Sequence Analysis
Ninety-nine ASGR-BAC clones isolated using 17 ASGR molecular markers totally linked to the apospory phenotype in either and/or both apomictic species were previously analyzed by fingerprinting (Goel et al., 2006
Twenty BAC shotgun libraries (Table I, underlined) were generated at TAMU, and their construction and sequencing are outlined by Gualtieri et al. (2006)
Additional gene sequences were generated by radioactive labeling of rice (Oryza sativa; http://www.genome.arizona.edu/orders/) or sorghum (Sorghum bicolor; http://www.fungen.org/Projects/Sorghum/Clone%20requests.htm) cDNA inserts containing high similarity to the ASGR-PPCR of interest followed by hybridization to filters containing the shotgun library of interest. Hybridization and washes were conducted at low stringency (55°C hybridization; 2x SSC, 0.1% SDS at 55°C wash). If a rice or sorghum cDNA was not available, PCR probes were generated from each end of the genic sequence of interest. The PCR fragments were labeled and hybridized to filters containing the shotgun library of interest. Hybridization and washes were conducted at high stringency (65°C hybridization; 0.2x SSC, 0.1% SDS at 65°C wash). Hybridizing clones not previously sequenced were sequenced using both the T7 and M13REV primers.
Each sequencing reaction was given a unique name and processed for quality using the MAGIC processing pipeline and database (Liang et al., 2006
Individual sequencing reads derived from individual BAC clones or from a group of overlapping BAC clones were given a unique Phrap group identification number and then assembled into sequence contigs using Phrap. During Phrap assembly, two additional numeric identifiers were added to the Phrap group identification number to generate a unique name for each sequence contig. We used BLASTX with the parameters gapopen=11, expect=10.0, gapext=1, and allowgaps=yes to BLAST the consensus sequences of the sequence contigs against the Swiss-Prot and TrEMBL databases. These databases make up the Uniprot Knowledgebase database (UniprotKB). Version 7 (release date February 6, 2006) of the databases was used. More information about the Uniprot databases may be found at http://www.uniprot.org.
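The selection of contigs with putative protein-coding regions unrelated to transposable elements can be pictured as a filter over BLASTX hits. The Python sketch below is a simplified illustration and not the FUNGEN pipeline: it assumes each hit has already been reduced to a contig name, a hit description, and an e-value; it treats the 10^-6 cutoff mentioned for targeted sequencing as "at or below 1e-6"; and it recognizes transposable elements with a short keyword list invented for this example.

```python
from typing import NamedTuple

# Hypothetical keyword list; a real screen would need curated annotation
TE_KEYWORDS = ("transpos", "retroelement", "reverse transcriptase",
               "integrase", "polyprotein")


class Hit(NamedTuple):
    contig: str
    description: str
    evalue: float


def candidate_ppcr_contigs(hits, max_evalue=1e-6, te_keywords=TE_KEYWORDS):
    """Return contigs with at least one significant hit unrelated to TEs."""
    keep = set()
    for hit in hits:
        text = hit.description.lower()
        is_te = any(keyword in text for keyword in te_keywords)
        if hit.evalue <= max_evalue and not is_te:
            keep.add(hit.contig)
    return sorted(keep)


# Toy hits; contig names, descriptions, and e-values are made up
toy_hits = [
    Hit("contig_A", "AP2 domain containing protein", 1e-40),
    Hit("contig_A", "Transposase", 1e-10),
    Hit("contig_B", "Retrotransposon protein, Ty1-copia subclass", 1e-80),
    Hit("contig_C", "Protein kinase", 1e-3),
]
candidates = candidate_ppcr_contigs(toy_hits)
```

In this toy input only contig_A is retained: contig_B matches a transposable element and contig_C fails the e-value cutoff. Keyword matching is a blunt instrument, so borderline descriptions would still need manual review.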
Beginning with the sequences generated from sample and targeted gene sequencing, complete ASGR-BBM-like gene genomic structures from c100, c102, p203, and p207 were generated. These four BACs contain individual copies of the ASGR-BBM-like genes based on restriction mapping and hybridization (Gualtieri et al., 2006
Sorghum library filter 052-SOR-H3 from TAMU (http://hbz.tamu.edu/cgi-bin/htmlassembly?bacs) and sorghum library filter SB_BBc from CUGI (https://www.genome.clemson.edu/cgi-bin/orders?&page=productGroup&service=bacrc&productGroup=26) were hybridized at 55°C following the protocol from CUGI (http://www.genome.clemson.edu/) and washed under low stringency (2x SSC, 0.1% SDS at 55°C). Probes were labeled PCR products covering PPCRs from sequence contigs and amplified from the corresponding BACs listed in Supplemental Table S2. Sorghum BAC addresses were called, and potential orthologous BACs were identified and anchored to FPC contigs and chromosomes using information at http://www.stardaddy.uga.edu/fpc/bicolor/WebAGCoL/WebFPC/ and through personal communication with Patricia Klein at TAMU.
Putative gene contigs were analyzed with BLASTN and BLASTX programs at Gramene release 25.0 (http://www.gramene.org/) using the Genomic Sequence and Rice PEP_TIGR databases, respectively, during the month of September 2007. For both analyses, the distant homologies BLAST parameters used were -E:10; -B:100; -filter:seg; -W:3; -hitdist:40; matrix: BLOSUM62; T:15 (for BLASTX) and -E:10; -B:100; -filter:dust; -W:9, -M:1; -N:-1; -Q:2; -R:1 (for BLASTN). Gene prediction was performed using the RiceGAAS Rice Genome Automated Annotation System (http://ricegaas.dna.affrc.go.jp/).
The rice genomic sequence from chromosome 4 (18330000 to 18425000), derived from the TIGR5 build, was exported from Gramene.org. Double ACT version 2 at http://www.hpa-bioinfotools.org.uk/pise/double_act.html was used to run a BLASTN comparison between the rice sequence and the ASGR sequence contigs 33-2-35, 33-2-29, 33-2-32, 33-2-34, and 19-2-19 using a cutoff score of 10. The resulting output was visualized using the Artemis Comparison Tool program, release 7 (Carver et al., 2005).
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers ED544199 to ED548719 and EU559277 to EU559280.
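As a quick arithmetic check on the coordinates and accession ranges given above, the following sketch expands each accession range and counts its members. It assumes both ranges are contiguous and inclusive, and that the rice coordinates are inclusive base positions. The function `expand_accession_range` is a hypothetical helper, not part of any GenBank tool.

```python
import re
from typing import List


def expand_accession_range(first: str, last: str) -> List[str]:
    """List every accession from first to last, assuming a contiguous inclusive range."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep any zero padding in the numeric part
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


read_accessions = expand_accession_range("ED544199", "ED548719")
gene_accessions = expand_accession_range("EU559277", "EU559280")

# Length of the exported rice chromosome 4 interval, treating both ends as inclusive.
rice_region_bp = 18425000 - 18330000 + 1

print(len(read_accessions), len(gene_accessions), rice_region_bp)
```

Under these assumptions the first range holds 4,521 accessions, the same as the number of high-quality reads reported in the abstract, and the second holds 4, the same as the number of BACs (c100, c102, p203, and p207) analyzed for ASGR-BBM-like genes. This is consistent with one accession per read and per gene sequence, although the article does not state that mapping explicitly. The rice interval spans about 95 kb.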
The following materials are available in the online version of this article.
We thank Evelyn Morgan and Anne Bell for technical support and Benji Adair for computational support in Tifton.
Received March 17, 2008; accepted May 25, 2008; published May 28, 2008.
1 This work was supported by the National Science Foundation (grant no. 0115911) and the University of Georgia Experiment Station.
2 Present address: Department of Botany, University of Delhi, Delhi, India 110007.
3 Present address: Department of Botany, Miami University, Oxford, OH 45056.
4 Present address: Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA 30602.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Peggy Ozias-Akins (pozias{at}uga.edu).
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.108.119081
* Corresponding author; e-mail pozias{at}uga.edu.
Adams KL, Wendel JF (2005) Polyploidy and genome evolution in plants. Curr Opin Plant Biol 8: 135–141
Akiyama Y, Conner JA, Goel S, Morishige DT, Mullet JE, Hanna WW, Ozias-Akins P (2004) High-resolution physical mapping in Pennisetum squamulatum reveals extensive chromosomal heteromorphism of the genomic region associated with apomixis. Plant Physiol 134: 1733–1741
Akiyama Y, Hanna WW, Ozias-Akins P (2005) High-resolution physical mapping reveals that the apospory-specific genomic region (ASGR) in Cenchrus ciliaris is located on a heterochromatic and hemizygous region of a single chromosome. Theor Appl Genet 111: 1042–1051
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Bennetzen JL (2005) Transposable elements, gene creation and genome rearrangement in flowering plants. Curr Opin Genet Dev 15: 621–627
Bennetzen JL, Ma J (2003) The genetic colinearity of rice and other cereals on the basis of genomic sequence analysis. Curr Opin Plant Biol 6: 128–133
Bennetzen JL, Ramakrishna W (2002) Numerous small rearrangements of gene content, order and orientation differentiate grass genomes. Plant Mol Biol 48: 821–827
Boutilier K, Offringa R, Sharma VK, Kieft H, Ouellet T, Zhang L, Hattori J, Liu CM, van Lammeren AAM, Miki BLA, et al (2002) Ectopic expression of BABY BOOM triggers a conversion from vegetative to embryonic growth. Plant Cell 14: 1737–1749
Bowers JE, Abbey C, Anderson S, Chang C, Draye X, Hoppe AH, Jessup R, Lemke C, Lennington J, Li Z, et al (2003a) A high-density genetic recombination map of sequence-tagged sites for Sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses. Genetics 165: 367–386
Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, et al (2005) Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci USA 102: 13206–13211
Bowers JE, Chapman BA, Rong J, Paterson AH (2003b) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438