First published online December 5, 2002; 10.1104/pp.010207
Plant Physiol, December 2002, Vol. 130, pp. 2118-2128
Cloning and Sequencing of cDNAs for Hypothetical Genes from
Chromosome 2 of Arabidopsis
Yong-Li Xiao,* Mukesh Malik, Catherine A. Whitelaw, and Christopher D. Town
The Institute for Genomic Research, 9712 Medical Center Drive,
Rockville, Maryland 20850
ABSTRACT
About 25% of the genes in the fully sequenced and
annotated Arabidopsis genome have structures that are predicted solely
by computer algorithms with no support from either nucleic acid or protein homologs from other species or expressed sequence matches from
Arabidopsis. These are referred to as "hypothetical genes." On
chromosome 2, sequenced by The Institute for Genomic Research, there
are approximately 800 hypothetical genes among a total of approximately
4,100 genes. To test their expression under various growth conditions
and in specific tissues, we used six cDNA populations prepared from
cold-treated, heat-treated, and pathogen (Xanthomonas campestris pv campestris)-infected plants, callus, roots, and young seedlings. To date, 169 hypothetical genes have been tested, and 138 of them were found to be expressed in one or more of the six cDNA
populations. By sequencing multiple clones from each 5'- and 3'-rapid
amplification of cDNA ends (RACE) product and assembling the sequences,
we generated full-length sequences for 16 of these genes. For 14 genes,
there was one full-length assembly that precisely supported the
intron-exon boundaries of their gene predictions, adding only 5'- and
3'-untranslated region sequences. However, for three of these genes,
the other assemblies represent additional exons and alternatively
spliced or unspliced introns. For the remaining two genes, the cDNA
sequences reveal major differences with predicted gene structures. In
addition, a total of six genes displayed more than one polyadenylation
site. These data will be used to update gene models in The Institute
for Genomic Research annotation database ATH1.
INTRODUCTION
With the combined efforts of
scientists from Europe, Japan, and the United States, the first higher
plant genome-sequencing project, the whole-genome sequencing of Arabidopsis,
has been completed. The sequences of chromosomes 2 and 4 were first
released in 1999 (Lin et al., 1999; Mayer et al.,
1999), and the remaining three chromosomes were sequenced by
the end of 2000 (Salanoubat et al., 2000; Tabata
et al., 2000; Theologis et al., 2000). This
provides scientists a wealth of information and knowledge with which to understand plant biology from a genomic perspective. The whole Arabidopsis genome encodes approximately 25,000 genes (The
Arabidopsis Genome Initiative, 2000 ) and the functional
analysis of these genes is a major challenge in this post-sequencing
era. One approach to this, taken by several groups (Ceres, Stanford
Genome Center, Salk Institute, Plant Gene Expression Center [in
collaboration with RIKEN Genomic Sciences Center], and Institut
National de la Recherche Agronomique/Genoplante) is to produce
full-length cDNAs for all of the 25,000+ genes in the Arabidopsis
genome, because these complete sequences are essential to fully
understand their structure and function (Seki et al.,
2001a , 2001b ). Recent comparison of full-length
cDNAs from Ceres and SSP with previous genomic annotation revealed
that the structures of about one-third of the genes could be
improved based on cDNA sequence (Haas et al.,
2002). To complement these large-scale, undirected cloning and
sequencing efforts, we are focusing our attention on a special group of
genes that are not represented in current expressed sequence tag (EST)
collections and are therefore least likely to be sequenced by the large
scale public efforts, thus requiring a targeted approach. This group of
"hypothetical genes" is predicted only by ab initio computer
algorithms such as Genscan (Burge and Karlin, 1997 ; now updated to Genscan+), Genemark.hmm (Lukashin and Borodovsky,
1998 ), and various splice site prediction programs
(Uberbacher and Mural, 1991 ; Hebsgaard et al.,
1996 ; Brendel and Kleffe, 1998 ) and have no
database support.
Our goal is cloning, sequencing, and functional analysis of
full-length cDNAs from these hypothetical genes. These genes represent the most obscure and challenging set, because there is no experimental evidence for the validity of the gene models, for the functions of the
encoded proteins, or for the transcription of that particular region of
the genome as evidenced by EST sequences in public databases. Possible
reasons for the apparent lack of expression of these hypothetical genes
include: (a) the limited variety of tissues from which current EST
libraries have been made; (b) the low level of expression of the genes
and/or their restriction to a limited number of cells or tissues or to
specific environmental/experimental conditions that have not yet been
explored; or (c) an invalid gene prediction. Therefore, a full-length cDNA
will provide the most robust method to validate a gene prediction, both
by demonstrating its expression as a cDNA and by providing sequence
information through which the details of the gene model can be
confirmed, appropriately modified, or refuted. After this, their
function in vivo can be analyzed by examination of their expression
pattern and the plant phenotypes created by overexpression and
expression inhibition.
Our target list is all of the hypothetical genes on
chromosome 2, which, at the time of completion and annotation, numbered 1,094 (Lin et al., 1999 ). However, continued EST
sequencing and the ongoing full-length cDNA efforts, which are using a
more comprehensive range of tissues and experimental treatments, have
produced cognate sequences for some genes that were originally
annotated as hypothetical. Thus the number of genes that should be
considered as hypothetical based on the above criteria is currently
approximately 800 and will probably continue to decrease slightly as
full-length cDNA sequencing progresses.
Expression analysis using microarrays is also of value in
determining the possible expression of hypothetical genes. Using Arabidopsis chromosome 2 microarrays, H. Kim, E. Snesrud, and J. Quackenbush (unpublished data) at The Institute for Genomic Research (TIGR) have examined gene expression from various tissues and
under a wide range of treatments. Their results show that hybridization
can be detected to approximately 81% of the spots representing
hypothetical genes for which no cDNA or EST sequence exists, indicating
that either these genes or their paralogs are expressed in one or more
of the conditions used. We intend to use this expression information to
target first those genes with the lowest (or zero) levels of expression,
because these are least likely to be captured in the large-scale
sequencing efforts.
To test the expression of hypothetical genes, we are currently
using six cDNA populations prepared from cold-treated, heat-treated, and pathogen (Xanthomonas campestris pv campestris)-infected
plants, callus, roots, and young seedlings. The results to date
indicate about 82% of the hypothetical genes tested are expressed in
one or more of our cDNA populations. Sixteen full-length cDNA sequences were obtained by sequencing multiple clones of the 5'- and 3'-RACE products from each hypothetical gene and assembling their sequences. Comparison of the assemblies for each gene with their in silico predictions showed that 14 genes have at least one full-length cDNA
assembly consistent with the predictions, only adding some 5'- and
3'-untranslated region (UTR) sequences, whereas the remaining two genes
had major differences from the predicted gene structure. However, among
the 14 genes consistent with their predictions, three
also display different forms of the cDNA for the same gene,
including instances of alternative splice sites and unspliced introns,
and in two others, introns were discovered in the UTRs. Six genes showed
evidence for more than one polyadenylation site. Therefore, cloning and
sequencing the full-length cDNAs of hypothetical genes not only
provides evidence of expression, sequence information, and a foundation
for functional studies of the hypothetical genes of Arabidopsis, but
also could have some bearing on the regulation of their activities.
RESULTS
The original gene annotations for Arabidopsis
chromosome 2 (Lin et al., 1999 ) served as the starting
point of this study and can be found in their original form in the
Plant division of GenBank. The TIGR version
(http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml) incorporates minor
updates based on new evidence and was used for checking the status of
selected genes before primer design. The 1999 annotation contained
1,094 hypothetical genes for which there was no experimental evidence
for their existence and expression. Our goal is to test their
expression, to obtain full-length sequence for all of them, and to
analyze their function in vivo. In this pilot study, a total of 169 hypothetical genes were analyzed for their expression in six cDNA
populations. Sixteen full-length cDNA sequences were obtained, which
are discussed in detail below.
Selection of Hypothetical Genes
At the outset of the project, we grouped the then incomplete
Arabidopsis proteome into gene families using the Geanfammer/Divclus approach (Park and Teichmann, 1998 ). From these
single-linkage clusters, we identified families of genes that contained
primarily or entirely hypothetical genes and selected one or two
members of each family. We focused this analysis on chromosome 2 because this had been annotated at TIGR and because we were most
familiar with the criteria used for the "hypothetical" assignment.
Before primer design and experimentation, we checked each hypothetical assignment by searching the target gene against GenBank, because in
some cases, sequences released since the original annotation (which
took place over 2 to 3 years) may provide experimental support for the
gene's expression and move it from the hypothetical to the
"unknown" category. Only genes for which there was still no
experimental support were pursued. As a broad class, hypothetical genes
show a similar range of size and intron/exon composition to the
remainder of the genome. Average transcript size is around 900 bases,
with outliers more than 5 kb in length.
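The family-based selection described above can be illustrated with a minimal sketch. It is not the Geanfammer/Divclus implementation; it only shows how single-linkage clustering turns pairwise similarity hits into families, and how families consisting mostly of hypothetical genes might then be flagged. The gene names, the list of similar pairs, and the `min_fraction` threshold are all hypothetical.

```python
from collections import defaultdict


def single_linkage_families(genes, similar_pairs):
    """Group genes into families: any two genes connected by a chain of
    pairwise similarity hits end up in the same family."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative (path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    families = defaultdict(set)
    for g in genes:
        families[find(g)].add(g)
    return list(families.values())


def mostly_hypothetical(family, hypothetical, min_fraction=0.5):
    """True if more than min_fraction of the family is hypothetical
    (the threshold is an assumption made for this illustration)."""
    return len(family & hypothetical) / len(family) > min_fraction


# Toy data; these names are placeholders, not real loci.
genes = ["g1", "g2", "g3", "g4", "g5"]
pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
hypothetical = {"g1", "g2", "g3", "g4"}

families = single_linkage_families(genes, pairs)
targets = [sorted(f) for f in families if mostly_hypothetical(f, hypothetical)]
```

With this toy input, g1, g2, and g3 form one family even though g1 and g3 share no direct hit, which is the defining property of single linkage.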
Expression Pattern of the Tested Hypothetical Genes
For each hypothetical gene, gene-specific primers for both 5'- and
3'-RACE (GSP1 and GSP2) were designed based on the predicted coding
region. If the hypothetical gene was present in the cDNA population, a 200- to 500-bp PCR product would be amplified when the
gene-specific primers were used in combination (Fig.
1). Six cDNA populations were constructed
from cold-treated, heat-treated, and pathogen (X. campestris
pv campestris)-treated plants, tissue culture (induced callus), roots,
and young seedlings. The GSP primers were used in combination to examine
expression of the hypothetical gene in the six cDNA populations (Fig.
2). Among the 12 hypothetical gene tests
shown in Figure 2, PCR products for seven genes (At2g39830, At2g40520,
At2g41150, At2g41050, At2g41350, At2g42330, and At2g42370) were
amplified from all six cDNA populations, whereas the remaining five
genes were shown to be expressed in a variety of different tissues
(summarized in Table I).

Figure 1.
The outline of the experiments. A, cDNA
first-strand synthesis. B, cDNA second-strand synthesis. C, Adaptor
ligation. D, Using two gene specific primers (GSPs) to test
hypothetical gene expression in six cDNA populations. There should be
200- to 500-bp PCR products if the gene is expressed. E, Using GSP1 and
AP1 to amplify the 5' end of the cDNA. F, Using GSP2 and AP1 to amplify
the 3' end of the cDNA. H, Cloning and sequencing several clones each
of the 5'- and 3'-RACE products. I, Assembling all of the 5' and 3'
reads to get the full-length cDNA sequence(s). Dashed line, Poly(A)
RNA; thin black line, cDNA; white box, cDNA adaptor; GSP1 (black
arrow), gene specific primer 1; GSP2 (black arrow), gene specific
primer 2; AP1(white arrow), adaptor primer.
Figure 2.
An example of the results from testing
hypothetical gene expression by PCR. A through D, Four loading
positions. Lanes 1 through 6 at each loading position, cDNA
populations from young seedlings, tissue culture (induced callus),
roots, pathogen-treated plants, heat-treated plants, and cold-treated
plants; lane 7 at each loading position, negative control, no
template; lane 8 at each loading position, positive control using
genomic DNA. Hypothetical gene names are labeled above the lane numbers
for loading position A; Loading positions B through D follow the same
pattern as A. Loading position of the genes from left to right are: B,
At2g39920, At2g40390, and At2g40520; C, At2g41150, At2g41050, and
At2g41350; D, At2g42140, At2g42330, and At2g42370.
For eight of the genes in Figure 2 (At2g39440, At2g39790,
At2g39830, At2g39920, At2g40520, At2g41150, At2g41050, and At2g41350), the amplification product sizes from the cDNA populations were smaller
than those amplified from genomic DNA, indicating that introns had been
spliced from the transcripts of these hypothetical genes. To date, 169 hypothetical genes have been tested for gene expression in the six cDNA
populations. One hundred and thirty-eight of them showed expression in
one or more cDNA populations, with 70 genes being expressed in all six
cDNA populations and 31 showing no expression in any of the cDNA
populations (Table II). Thus, most of the
hypothetical genes (82% of genes tested) are expressed in Arabidopsis.
All of the expression results of these 169 hypothetical genes in our
six cDNA populations are shown in Supplemental Data Table I, which can
be viewed at www.plantphysiol.org.
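The tallies reported above can be reproduced with a small bookkeeping sketch. It assumes each gene's PCR result is recorded as the set of cDNA populations that gave a product; the population labels, gene names, and band sizes are placeholders, and only the counts 138 and 169 come from the text.

```python
# Labels for the six cDNA populations named in the text.
POPULATIONS = {"seedling", "callus", "root", "pathogen", "heat", "cold"}


def summarize(results):
    """Summarize PCR results.

    `results` maps a gene name to the set of populations in which a
    product was amplified with the two gene-specific primers.
    """
    expressed = sum(1 for pops in results.values() if pops)
    in_all = sum(1 for pops in results.values() if pops == POPULATIONS)
    return {
        "tested": len(results),
        "expressed": expressed,
        "in_all": in_all,
        "silent": len(results) - expressed,
    }


def looks_spliced(cdna_product_bp, genomic_product_bp):
    """A cDNA product shorter than the genomic product suggests that an
    intron between the two primers was removed."""
    return cdna_product_bp < genomic_product_bp


# Toy input; gene names are placeholders.
toy = {
    "geneA": set(POPULATIONS),
    "geneB": {"root", "cold"},
    "geneC": set(),
}
summary = summarize(toy)

# The percentage quoted in the text follows from the reported counts.
percent_expressed = round(100 * 138 / 169)
```

The same arithmetic gives the complementary figure: 31 of 169 genes, or about 18%, gave no product in any population.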
Generation of Full-Length cDNA Sequences
For each target gene, a cDNA population showing strong expression
was used as template for 5'- and 3'-RACE (Frohman et al., 1988 ). Five clones each from the 5' and 3' reactions were
sequenced from both ends with approximately 75% success rate yielding
approximately 15 sequences representing the cDNAs from that gene,
which were then run through TIGR assembler (Sutton et al.,
1995 ). For clarity, assemblies arising from 5' and 3' clones
are represented separately in Figure 3.
In some cases, 5' sequences formed a single assembly and 3' sequences
formed a single assembly with a common overlapping region, which
represents the full-length transcript of that gene. However, in many
cases, the output of TIGR assembler was several 5' or 3' assemblies
(Fig. 3). These arise because TIGR assembler will not assemble
individual sequence reads if there are either internal mismatches of
more than 2.5% or mismatched overhangs of more than 10 bases at the
end. For each gene, all of the individual assemblies were aligned,
along with the predicted cDNA, against the genomic sequence using
dds/gap2 (Huang et al., 1997 ) for inspection. The
results revealed that multiple assemblies arose from a single target
gene because of the presence of alternate splice and/or polyadenylation
isoforms in the collection of sequences from a single gene. Thus the
RACE products from more than one distinct transcript from the
same gene had been cloned and sequenced from a single cDNA
preparation. In this study, we obtained full-length cDNA sequences of
16 hypothetical genes from different cDNA populations (Table
III). They displayed some interesting
features, such as differences with their predictions, alternately
spliced variants, and multiple polyadenylation sites. The GenBank
accession numbers of the full-length cDNA sequences and the predicted
transcripts are shown in Table III.
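The two assembly criteria quoted above explain why isoforms separate into distinct assemblies. The sketch below is not TIGR assembler code; it only encodes the stated thresholds (2.5% internal mismatch, 10-base mismatched overhang) as a hypothetical `can_merge` predicate and shows the read-count arithmetic from the text.

```python
def can_merge(overlap_length, mismatches, mismatched_overhang,
              max_mismatch_fraction=0.025, max_overhang=10):
    """Decide whether two reads would be joined under the stated limits.

    overlap_length      -- aligned bases shared by the two reads
    mismatches          -- mismatched bases inside the overlap
    mismatched_overhang -- unaligned bases at the end that disagree
    """
    if overlap_length <= 0:
        return False
    mismatch_fraction = mismatches / overlap_length
    return (mismatch_fraction <= max_mismatch_fraction
            and mismatched_overhang <= max_overhang)


# 5 clones x 2 RACE reactions x 2 read directions at ~75% success.
expected_reads = 5 * 2 * 2 * 0.75

# Two reads from the same isoform with a few sequencing errors merge.
same_isoform = can_merge(overlap_length=400, mismatches=4, mismatched_overhang=0)

# A read carrying an unspliced intron (here an assumed 80 bases) runs
# into sequence absent from the spliced read, so the pair stays apart.
retained_intron = can_merge(overlap_length=400, mismatches=0, mismatched_overhang=80)
```

Under these limits, reads differing only by scattered base-call errors collapse into one assembly, whereas reads that diverge at a splice junction or extend to a different poly(A) site remain separate.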

Figure 3.
Schematic of the alignments of the cDNA assemblies
with the corresponding genomic DNA and curated gene predictions. The
vertical dashed lines indicate points of discrepancy. Tmp5-1, Tmp3-1,
etc. represent separate 5' and 3' assemblies produced by TIGR assembler
of the collection of sequence reads produced from a single gene.
Gene Structures with Major Differences from the Gene
Predictions
At2g05590 and At2g23370 have major differences from their gene
predictions (Fig. 3). At2g05590 is represented by one assembly (Tmp5-1)
from the 5' end and two assemblies (Tmp3-1 and Tmp3-2) from the 3' end.
The 5' assembly agrees with the predicted intron-exon boundaries adding
only 5'-UTR sequence. The two 3' assemblies (Tmp3-1 and Tmp3-2) contain
two additional exons compared with the predicted structure and also
shorten the predicted exon 6 so that the originally predicted stop
codon now falls in an intron. Tmp3-1 and Tmp3-2 show different
poly(A) sites (Figs. 3 and 4). In
addition, Tmp3-2 disagrees with both the gene prediction and
Tmp3-1 at the 3' splice site of intron 2, in that the splice acceptor
position is 5 bp upstream, giving rise to a smaller intron 2 (Figs. 3
and 5). The full-length cDNA
constructed from Tmp5-1 and Tmp3-1 encodes an open reading frame that
terminates in the last exon, resulting in a 303 amino acid peptide
compared with a 263 amino acid peptide encoded by the predicted gene
(Fig. 3). However, the cDNA constructed from Tmp5-1 and Tmp3-2 contains a stop codon just 3 amino acids downstream of the 3' splice site of
intron 2, giving rise to a truncated 164-amino acid peptide (Fig. 5).
At2g23370 is represented by one 5' assembly (Tmp5-1) and two 3'
assemblies (Tmp3-1 and Tmp3-2) because of different poly(A) sites.
These assemblies reveal that the actual gene structure is very
different from that predicted, showing 11 exons compared with the four
predicted. Furthermore, only four of the six predicted splice sites are
supported by the experimental cDNA sequence. Six of the additional
exons lie upstream of the predicted ATG, which itself lies in intron 6 of Tmp5-1 (Fig. 3). The largest open reading frame of this cDNA
assembly encodes a protein of 340 amino acids in contrast to the 175 amino acids encoded by the computationally predicted gene
model.
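Peptide lengths such as those compared above come from locating the largest open reading frame in each assembled cDNA. The following is a minimal sketch of that step, assuming a forward-strand search with the standard stop codons; the function name and the toy sequence are hypothetical and do not correspond to any assembly in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Return (start, end, n_codons) of the longest ATG-to-stop ORF on
    the forward strand. `end` is exclusive and includes the stop codon;
    `n_codons` counts amino acids and excludes the stop."""
    seq = cdna.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons > best[2]:
                    best = (start, i + 3, n_codons)
                start = None  # look for the next ORF in this frame
    return best


# Toy cDNA: 2-base 5' UTR, ATG AAA TTT GGG, stop codon, 2-base 3' UTR.
toy_cdna = "CCATGAAATTTGGGTAACC"
orf_start, orf_end, peptide_length = longest_orf(toy_cdna)
```

Applied to two assemblies of the same gene, a difference in `peptide_length` is what distinguishes a full-length isoform from one truncated by a premature stop codon.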

Figure 4.
Different polyadenylation sites from hypothetical
genes. The top sequence for each gene is the genomic sequence, and the
sequences below are cDNA assemblies with different poly(A) sites. Bases
shown in bold indicate the canonical poly(A) signal.
Figure 5.
Alternative splice sites from some hypothetical
genes. For each gene, the top sequence is the genomic sequence; the
middle and bottom nucleotide sequences are different cDNA assemblies
with alternative splice patterns. The deduced amino acid sequences are
below, and the asterisk following the amino acid sequence denotes the
stop codon. ###, Omitted sequence.
Examples of Alternately Spliced or Unspliced Introns
Among the genes for which full-length cDNA assemblies were
obtained, At2g05590, At2g23050, At2g24440, and At2g44220 display assemblies with alternative splice sites and unspliced introns (Table
III). As mentioned previously, At2g05590 shows major disagreements with
its predicted structure, including an alternative 3' splice site for
intron 2 in the Tmp3-2 assembly, which gives rise to a truncated
peptide. At2g23050 is represented by three 3' end assemblies and in one
of them (Tmp3-3), intron 3 is unspliced (Figs. 3 and 5). The two cDNA
sequences formed by combining the 5' (Tmp5-1) and 3' (Tmp3-1 or Tmp3-2)
assemblies match the prediction, but with different poly(A) sites, and
encode a 481 amino acid peptide, whereas the full-length assembly
formed by Tmp5-1 and Tmp3-3 with the unspliced intron 3 encodes a
peptide of 438 amino acids. At2g24440 produced two 5' assemblies
(Tmp5-1 and Tmp5-2) and one 3' assembly (Tmp3-1). Merging of Tmp5-1 and
Tmp3-1 produces a cDNA entirely consistent with the prediction,
encoding a 183-amino acid peptide and adding 5'- and 3'-UTR sequences.
In the second 5' assembly (Tmp5-2), the first intron uses a different
splice donor site 11 bp downstream of the predicted site, which results in a truncated peptide of 105 amino acids when merged with Tmp3-1 (Figs. 3 and 5). There is also a single-base mismatch between the
genomic sequence and the 11-bp extended 5' exon in Tmp5-2 (Fig. 5).
At2g44220 has one 5' assembly (Tmp5-1) and two 3' assemblies (Tmp3-1
and Tmp3-2; Fig. 3). When Tmp5-1 and Tmp3-1 are merged together, a
full-length cDNA is generated that matches the predicted intron-exon
boundaries and encodes a 393-amino acid protein. However, the other 3'
assembly (Tmp3-2) contains an unspliced intron 6 (Figs. 3 and 5). If
Tmp5-1 is merged with Tmp3-2, a stop codon is created within that
unspliced intron 6 resulting in a truncated 270-amino acid peptide
(Figs. 3 and 5). Each of the intron structures for At2g05590 and
At2g24440 that differs from the intron-exon borders in the corresponding
predicted gene model nonetheless complies with the
conserved splice site dinucleotides 5'-GT, AG-3' (Fig. 5).
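The GT-AG check mentioned above is simple to express in code. The sketch below assumes 0-based, end-exclusive intron coordinates on the forward strand; the toy genomic sequence is invented so that an alternative donor site lies 11 bp downstream of the predicted one, mirroring the At2g24440 case in outline only.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """True if the intron begins with GT and ends with AG."""
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Invented sequence: exon 1, an intron with a second GT 11 bp into it, exon 2.
exon1 = "ATGGCT"
intron = "GTAAGTCTCTA" + "GTATGTTTTCAG"
exon2 = "GGTTAA"
genomic = exon1 + intron + exon2

predicted = (6, 29)     # donor at the predicted site
alternative = (17, 29)  # donor 11 bp downstream, same acceptor

# An 11-bp exon extension is not a multiple of 3, so the downstream
# reading frame shifts, which can expose a premature stop codon.
shift = alternative[0] - predicted[0]
frame_preserved = shift % 3 == 0
```

Both introns in this toy example pass the check, which illustrates why splice site dinucleotides alone cannot tell a predicted donor from a nearby alternative one.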
Multiple Polyadenylation Sites
At2g02540, At2g05590, At2g23050, At2g23370, and At2g44220 are all
represented by two 3' assemblies because of the presence of two
different polyadenylation sites. There are three 3' assemblies for
At2g03620, and each has a different polyadenylation site (Figs. 3 and
4).
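When 3' assemblies end at different positions, a natural follow-up is to look for a poly(A) signal upstream of each end. The sketch below searches for the canonical AATAAA hexamer; the 40-base search window, the function name, and the toy sequence are assumptions made for illustration and are not taken from this study.

```python
def find_polya_signal(genomic, polya_site, window=40, signal="AATAAA"):
    """Return the start index of the signal nearest to the poly(A) site
    within `window` bases upstream of it, or None if it is absent.

    polya_site -- 0-based index of the first base after the cleavage point
    """
    start = max(0, polya_site - window)
    region = genomic[start:polya_site].upper()
    hit = region.rfind(signal)  # rightmost match is closest to the site
    return None if hit < 0 else start + hit


# Toy 3' region: 20 filler bases, the hexamer, 15 bases, then the site.
toy_genomic = "C" * 20 + "AATAAA" + "T" * 15 + "G" * 10
signal_at = find_polya_signal(toy_genomic, polya_site=41)
no_signal = find_polya_signal(toy_genomic, polya_site=10)
```

Running the search once per assembly end would show whether each polyadenylation site has its own nearby signal or whether several sites share one.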
Other Variations
The cDNA assemblies from the remaining genes
(At2g15220, At2g15760, At2g17570, At2g19180, At2g19870,
At2g23790, At2g23940, At2g41660, and At2g42430) precisely match
the predicted gene structures, merely adding 5'- and 3'-UTRs. At2g17570
and At2g19180 have previously unannotated introns in their 5'- and
3'-UTRs, respectively.
The results of all of these comparisons are summarized in Table
III. Disregarding simple 5' or 3' sequence extension and the multiple
poly(A) sites, for five of the 16 genes examined, the existing
prediction needed either to be modified with different splice sites
and/or exons or to be augmented with evidence for alternative splicing.
These results extend our experience in using the Ceres and SSP cDNAs to
validate gene models, where approximately 35% of the models required
some modification following comparison with full-length cDNAs
(Haas et al., 2002 ). However, with only 16 genes
examined, it may be premature to conclude that current models for
hypothetical genes are no more likely to contain errors than those for
more highly expressed genes, where the gene model/annotation most
likely either incorporated or was supported by database matches.
DISCUSSION
Expression Analysis of the Hypothetical Genes
Although hypothetical genes are the group of genes with no
experimental evidence for either their structure or expression, among
the 169 hypothetical genes on chromosome 2 examined to date, about 82%
(138 of 169) were expressed in one or more of the six cDNA populations
tested. The range of Arabidopsis tissues used to date in EST and cDNA
sequencing projects is shown in Table IV.
There are some overlaps between the tissues listed in Table IV and
those selected for this study (e.g. roots and young plants). The absence
of transcripts for hypothetical genes from the EST and cDNA sequencing
projects to date could be attributable to low levels of expression so
that transcripts are missed by these undirected, large-scale
efforts.
There are several possible reasons for the inability to generate
amplification products from any of the cDNA populations for 31 of 169 (18%) hypothetical genes tested. (a) The gene prediction is incorrect;
there is actually no coding sequence present at this region in the
genome. (b) There is an expressed gene at the predicted region, but one or
more exon predictions are incorrect, so that one or both of the GSP
primers lie within an intron and thus cannot amplify the spliced
hypothetical gene transcript. Approximately one-third of the predicted
gene models compared with Ceres cDNAs required some modification
(Haas et al., 2002 ). Therefore, primers for these 31 genes that were not amplified will be placed at other locations
and further attempts will be made to amplify products from cDNA. (c)
The hypothetical genes are not expressed at a sufficient level in any
of the six cDNA populations selected in this study to permit
amplification or may represent nonexpressed pseudogenes.
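Possibility (b) can be made concrete with a small coordinate check. The sketch assumes primers and exons are given as 0-based, end-exclusive intervals on the genomic sequence; the function name and all coordinates are hypothetical.

```python
def primer_within_exon(primer, exons):
    """True if the primer interval lies entirely inside one exon."""
    p_start, p_end = primer
    return any(e_start <= p_start and p_end <= e_end
               for e_start, e_end in exons)


# Invented gene models: the prediction places exon 2 at 200-300, but the
# transcript actually uses a 3' splice site at 250.
predicted_exons = [(0, 100), (200, 300)]
actual_exons = [(0, 100), (250, 300)]

# A primer designed from the prediction, near the start of exon 2.
gsp = (210, 230)

designed_in_exon = primer_within_exon(gsp, predicted_exons)
present_in_transcript = primer_within_exon(gsp, actual_exons)
```

Here the primer looks valid against the prediction but falls in sequence that is spliced out of the real transcript, so no product would be amplified from cDNA even though the gene is expressed. Moving the primer to another predicted exon is the remedy proposed in the text.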
The range of tissues used for cDNA preparations is now being
expanded to increase the chances of capturing cDNAs for most, if not
all, of the hypothetical genes. Tissues that are not well represented
in current EST and cDNA sequencing projects will be selected, including
plants that have been infected with Pseudomonas syringae pv
tomato (which, in contrast to X. campestris infection, induces a hypersensitive response), hormone-treated, or exposed to
drought, salt, UV, and H2O2 stress.
An alternative strategy to examine the expression of hypothetical
genes is using microarray data. TIGR has constructed a 9,000+-element microarray that represents, in duplicate, all of the predicted genes on
chromosome 2 as 3'-biased amplicons approximately 1 kb in size derived
from genomic DNA. RNA isolated from seedlings, seedling roots, young
and mature leaves, various aerial tissues, flowers, callus tissue,
heat-, cold-, salt-, and hydrogen peroxide-stressed plants, and P. syringae- and X. campestris-infected leaves was used in
the hybridizations. There are 200 hypothetical genes showing no
evidence of expression by microarray analysis in any of the tissues so
far examined (H. Kim, E. Snesrud, and J. Quackenbush, personal
communication). Like our PCR data, these results also indicate that
most of the hypothetical genes (approximately 82%) are expressed in
different tissues or treatments. Fifty-five of the hypothetical genes
with no microarray evidence of expression have already been examined by
our PCR method on the six cDNA populations. Thirty-one show expression
in one or more cDNA populations, and 12 of these genes show expression
in all six cDNA populations. Thus, it is possible to detect expression
of and capture the full-length cDNAs for genes that are expressed at
levels below microarray detection. Twenty-two of the 31 hypothetical
genes that showed no expression in the cDNA populations tested in this
study conversely showed expression on microarrays using probes prepared
from tissues that were also represented in our cDNA populations.
Because our results described above demonstrate that genes expressed at
undetectable levels on microarrays can be amplified, it seems likely
that failure to amplify these genes is attributable to inaccurate gene
predictions that resulted in the placement of primer(s) in regions of
the predicted gene that are actually spliced out of the final transcript.
Analysis of the Full-Length cDNA Sequences
Assembling the 5'- and 3'-RACE product nucleotide sequences
revealed a number of different forms of full-length cDNA for the same
gene, including alternative intron donor or acceptor sites (At2g24440
and At2g05590), unspliced introns (At2g23050 and At2g44220) and
multiple polyadenylation sites (At2g02540, At2g03620, At2g05590, At2g23050, At2g23370, and At2g44220) (Fig. 3, and summarized in Table
III). It is unknown at this point whether these are genuine alternatively spliced transcripts with biological functions or merely
mis-spliced products that will be degraded. There have been several
previous reports of alternative splicing in Arabidopsis and some
alternatively spliced transcripts do have different biological functions. Alternative splicing was found in the COP1 gene,
resulting in the deletion of exon 11 and the generation of a truncated
COP1b protein, which functions as a dominant negative regulator of
wild-type COP1 function (Zhou et al., 1998 ).
Interestingly, the splicing factor SR1 gene in Arabidopsis
is also regulated by alternative splicing, and temperature determines
the alternative-splicing ratio. It was proposed that one isoform of SR1
could play a role in cellular adaptation to a high-temperature
environment (Lazar and Goodman, 2000 ). For the
Arabidopsis U1 snRNA 70K gene, two distinct transcripts are produced by
alternative splicing that give rise to two proteins, only the smaller
of which can bind specifically to Arabidopsis U1 snRNA (Golovkin
and Reddy, 1996 ). As with several of the cDNAs identified in
this study, multiple polyadenylation sites were also observed in the
3'-UTRs of the U1 snRNA 70K gene (Golovkin and Reddy,
1996 ). When Kato et al. (1999) analyzed cDNAs on
chromosome 1 of Arabidopsis, they found that alternative splicing
produced two very similar cDNAs (ZCW32 and CW7). One of the transcripts
has an intron donor site 13 bp downstream from that in the other, which
generates an in-frame stop codon just after the splice site. This
alternative splicing pattern is similar to that at intron 1 of
At2g24440 (Fig. 3). Another example of alternative splicing occurs in
the Arabidopsis Spo11 gene, the yeast homolog of which plays
an important role in double-strand break formation at meiosis
(Keeney et al., 1997 ). In Arabidopsis, there are two
SPO11 homologs (AtSPO11-1 and
AtSPO11-2), and each has three different polyadenylation
sites. RT-PCR demonstrated at least 10 different splicing products from AtSPO11-1, whereas there is only one alternative splicing
product from AtSPO11-2 (Hartung and Puchta,
2000 ). Because the alternately spliced isoforms were originally
identified by RACE-PCR of cDNA from a single tissue or treatment, it
will be interesting to determine whether the proportion of the
different isoforms varies among the collection of cDNA populations available.
Variations in transcript structure by alternative splicing are
also known to exist in other plant species. In pumpkin (Cucurbita pepo), two cDNAs are produced by alternative splicing from a
single hydroxypyruvate reductase gene (Mano et al.,
1999 ). The two hydroxypyruvate reductase proteins were
localized in leaf peroxisomes and the cytosol, respectively, indicating
that alternative splicing controls their subcellular localization. This
alternative splicing is regulated by light, and the alternative splice
site is 17 bp downstream of the predicted intron donor site. In spinach
(Spinacia oleracea), there are two cDNA clones encoding
stromal and thylakoid-bound ascorbate peroxidase isoenzymes
(Ishikawa et al., 1996 ), which are produced by
alternative splicing of two 3'-terminal exons (Ishikawa et al.,
1997 ). In cauliflower (Brassica oleracea), a truncated SRK protein is specifically expressed in stigmata and translated from one of several transcripts, which are generated by a
combination of alternative splicing and the use of alternative polyadenylation signals (Giranton et al., 1995 ). In
maize (Zea mays), the ZEMa gene encodes a MADS
box-type transcription factor, for which transcripts are present in
almost all maize tissues, but specific differentially spliced forms
accumulate preferentially in maturing endosperm and leaf (Montag
et al., 1995 ). Alternative splicing also occurs at the
untranslated leading exons of the maize Zmhox1a homeobox gene in that
one transcript gives a normal Zmhox1a open reading frame and the other
gives an unrelated open reading frame. The alternative gene product,
transposon-associated protein, has significant homology to the C
terminus of the Mutator transposase (Comelli et al.,
1999 ). The alternate unspliced, intron-containing transcript
from the maize Bronze-2 locus was increased
50-fold by cadmium stress on maize seedlings and was proposed to have a
role during response to heavy metals (Marrs and Walbot,
1997 ). However, the unspliced introns observed in cDNAs from
At2g23050 and At2g44220 may be either alternative splicing products or
may arise from immature nuclear transcripts present in the cDNA
populations used for RACE.
For At2g24440, two cDNA isoforms were recovered that differ at the
splice donor site of intron 1 (Fig. 5). The 5' splice site of intron 1
in the Tmp5-2 cDNA is 11 bp downstream from that in Tmp5-1 and in the
prediction, whereas the 3' splice site is the same in all three. Both
isoforms of intron 1 use the conserved GT-AG splice sites. Most
surprisingly, within the extra 11 bp in Tmp5-2, there is a single-base
pair mismatch with the genomic sequence. The mismatch could be
attributable to PCR error in our RACE reactions, although these do
include a proofreading polymerase. It could also be attributable to a
mutation in the sequenced BAC from which the genomic sequence was
derived. Another possibility is that this alternatively spliced cDNA
actually arose from the transcript of a mutant At2g24440 allele that
existed in the pool of plants from which the RNA was isolated. Previous
reports have demonstrated that mutations in genomic sequence can affect
a gene's splicing patterns. The Arabidopsis floral homeotic mutant
allele apetala3-1 is temperature sensitive and carries a mutation (from
A to T) in exon 5 near the 5' splice site, which causes a
temperature-dependent splicing defect and the mutant phenotype
(Sablowski and Meyerowitz, 1998; Yi and Jack, 1998). The Arabidopsis
det3-1 mutation is attributable to a T to A mutation 32 bp upstream of a
putative 3' splice site, which causes a reduction of the transcript to
approximately 50% of the wild-type level (Schumacher et al., 1999). The
Arabidopsis cop1-1 allele carries a single-nucleotide change (from G to
A) 4 bases upstream from the 3' splice site of intron 5, which results
in exon skipping (Simpson et al., 1998). At2g24440 is expressed in all
six cDNA populations and was amplified from cold-treated cDNA.
Therefore, if the Tmp5-2 cDNA of At2g24440 really derives from a mutant
allele carrying a mutation near the 5' splice site of intron 1 (from G
to A), the single-nucleotide change and the temperature treatment could
cause the different splicing pattern of At2g24440. However, more
experiments are needed for verification.
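The relationship between the two isoforms can be illustrated with a minimal Python sketch. The sequence below is a hypothetical toy locus, not the real At2g24440 sequence, and the function names are introduced here only for illustration. It checks that an intron uses GT-AG boundaries and shows that moving the 5' splice site 11 bp downstream, with the 3' splice site unchanged, lengthens the spliced product by 11 bases.

```python
def has_gt_ag_boundaries(genomic: str, start: int, end: int) -> bool:
    """True if the intron genomic[start:end] begins with GT and ends with AG."""
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(genomic: str, start: int, end: int) -> str:
    """Remove the intron genomic[start:end] and join the flanking sequence."""
    return genomic[:start] + genomic[end:]


# Hypothetical toy locus, NOT the real At2g24440 sequence.
exon1 = "ATGGCTTCAG"
intron = "GTAAGTTCTGAGTATGTTTTTGCAG"  # a second GT sits 11 bp into the intron
exon2 = "GATTCCGATAA"
genomic = exon1 + intron + exon2

donor_upstream = len(exon1)             # 5' splice site of the first isoform
donor_downstream = donor_upstream + 11  # alternative 5' splice site, 11 bp downstream
acceptor = len(exon1) + len(intron)     # shared 3' splice site (exclusive end)

isoform_1 = splice(genomic, donor_upstream, acceptor)
isoform_2 = splice(genomic, donor_downstream, acceptor)

print(has_gt_ag_boundaries(genomic, donor_upstream, acceptor))    # both introns are GT-AG
print(has_gt_ag_boundaries(genomic, donor_downstream, acceptor))
print(len(isoform_2) - len(isoform_1))                            # extra bases retained
```

A check of this kind only confirms that a candidate intron is compatible with canonical splicing; it cannot distinguish a genuine alternative donor site from a PCR artifact or an allelic difference.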
Multiple polyadenylation sites have been found previously in different
transcripts in plants (Giranton et al., 1995; Golovkin and Reddy, 1996;
Hartung and Puchta, 2000). In our results, six of 16 cDNA sequences of
hypothetical genes display two or more polyadenylation sites (Fig. 4).
The different poly(A) sites of each gene lie close to each other, with
the spacing between any two ranging from 23 to 82 bp (Table III). A
survey of experimentally validated poly(A) sites reveals that the
conservation and use of the canonical AAUAAA element vary widely among
yeast, rice (Oryza sativa), Arabidopsis, fruit fly (Drosophila
melanogaster), mouse, and human, and are especially weak in plants and
yeast (Graber et al., 1999). Only five (At2g15220, At2g23050,
At2g23370, At2g23790, and At2g41660) of the 16 genes in our study
contained the poly(A) signal (AAUAAA) between the predicted stop codon
and the farthest poly(A) site, suggesting that the polyadenylation
mechanism in plants may be more subtle or variable than in animals.
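As an illustration of the kind of scan described here, the following Python sketch looks for the canonical signal (written as AATAAA in cDNA) between the end of the stop codon and a poly(A) site, and computes the spacing between neighboring poly(A) sites. The sequence, the coordinates, and the function names are hypothetical and are not taken from the 16 genes in this study.

```python
from typing import List


def find_polya_signals(cdna: str, stop_codon_end: int, polya_site: int,
                       signal: str = "AATAAA") -> List[int]:
    """Return 0-based start positions of `signal` in cdna[stop_codon_end:polya_site]."""
    region = cdna[stop_codon_end:polya_site].upper().replace("U", "T")
    hits = []
    pos = region.find(signal)
    while pos != -1:
        hits.append(stop_codon_end + pos)
        pos = region.find(signal, pos + 1)
    return hits


def site_spacings(polya_sites: List[int]) -> List[int]:
    """Distances in bp between consecutive poly(A) sites, after sorting."""
    ordered = sorted(polya_sites)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical toy cDNA: a short ORF ending in TGA, followed by a 3' UTR.
cdna = "ATGGCTTGA" + "CTTGTTTCAATAAAGTTCTTTGTGTTCATC"
stop_codon_end = 9  # first base after the TGA stop codon

print(find_polya_signals(cdna, stop_codon_end, len(cdna)))
print(site_spacings([205, 100, 123]))  # hypothetical poly(A) site coordinates
```

An empty result from `find_polya_signals` would correspond to the genes in which no canonical signal was found upstream of the farthest poly(A) site.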
Overall, our study of the hypothetical genes on chromosome 2 indicates
that more than 80% of the genes predicted purely by computer algorithms
are actually expressed in one or more of the six cDNA populations
tested. In addition, alternative splicing and multiple polyadenylation
events frequently occur for the same hypothetical gene. These
observations are valuable not only for validating the predicted gene
structures but also for understanding their expression and regulation.
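The proportions quoted in this summary follow directly from the counts given in the abstract and in the polyadenylation results; a short Python check, with variable names chosen here for illustration:

```python
tested = 169          # hypothetical genes tested (abstract)
expressed = 138       # detected in one or more of the six cDNA populations (abstract)
full_length = 16      # genes with full-length cDNA sequences
multi_polya = 6       # of those, genes with two or more poly(A) sites

expressed_fraction = expressed / tested
multi_polya_fraction = multi_polya / full_length

print(f"{expressed_fraction:.1%}")    # share of tested genes found to be expressed
print(f"{multi_polya_fraction:.1%}")  # share of sequenced genes with multiple poly(A) sites
```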
 |
MATERIALS AND METHODS |
Plant Material
Arabidopsis ecotype Columbia-0 seeds and plants were subjected to a
variety of treatments as described below. After harvesting, the plant
tissue was frozen immediately in liquid nitrogen before RNA isolation.
For young plant tissue, seeds were sown on Redimix, transferred to
4°C for 4 d, and then grown at 25°C and 24-h photoperiod for 3 weeks. The aerial parts were harvested for RNA isolation. For heat and
cold shock, plants were grown as above then either incubated at 4°C
for 4 h (cold shock) or 37°C for 2 h (heat shock) then
harvested. Before infection with Xanthomonas campestris
pv campestris, seeds were cold-treated (4°C for 4 d) and then
grown at 25°C and 8-h photoperiod for 21 d. The leaves were
inoculated with a fresh culture of X. campestris, and the aerial plant
parts were harvested 24 h later. For root tissue, sterile seeds were
imbibed at 4°C
for 24 h, then inoculated into Gamborg's B5 liquid medium, and
grown at 25°C and 24-h photoperiod with shaking for 15 d, and
then the entire tissue mass (predominantly roots) was harvested. For
cultured tissue, sterile seeds were germinated on Murashige and Skoog
medium (Murashige and Skoog+) containing 2,4-dichlorophenoxyacetic acid
(0.1 mg L⁻¹) and isopentenyl adenine (0.5 mg L⁻¹). Callus tissue was
subcultured into liquid Murashige and Skoog+ and harvested after 10 d
for RNA isolation.
Construction of cDNA Populations
Total RNA was isolated from a number of tissues/treatments as
described previously using TRIzol reagent (Invitrogen, Carlsbad, CA)
and then treated with DNA-free (Ambion, Austin, TX) to remove residual
genomic DNA. mRNA was isolated using the Oligotex mRNA isolation kit
(Qiagen USA, Valencia, CA). cDNA was synthesized from mRNA using the
Marathon cDNA amplification kit (BD Biosciences Clontech, Palo Alto, CA).
Primer Design and PCR for Detection of Expression of Hypothetical Genes
For each hypothetical gene, two gene-specific primers (GSP1 and
GSP2) for 3'- and 5'-RACE were designed based on the predicted coding
region, using the Primer 3 program
(http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Primers
were required to be 23 to 28 nucleotides long (optimum 25 nucleotides),
with a GC content of 50% to 70% and a melting temperature of 70°C, which
enables touchdown PCR. The primers were designed to give a 200- to
500-bp overlap between the 5'- and 3'-RACE products, so that, used
together, GSP1 and GSP2 would produce a 200- to 500-bp product from any
cDNA population in which the cognate gene is expressed. PCR conditions
for detecting gene expression in the different cDNA populations were as
follows: 94°C for 4 min; 35 cycles of 94°C for 30 s, 55°C for 30 s, and
72°C for 2 min; followed by 72°C for 10 min. The GSP1 and GSP2 primer
sequences of the 16 cloned hypothetical genes are shown in Supplemental
Data Table II, which can be viewed at www.plantphysiol.org.
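The length, GC-content, and product-size criteria above can be expressed as a simple filter. The Python sketch below is an illustration only: the primer sequence is hypothetical, the function names are introduced here, and the melting-temperature criterion is deliberately left out because it depends on the thermodynamic model used by the primer design program, which is not reproduced here.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def meets_length_and_gc(primer: str, min_len: int = 23, max_len: int = 28,
                        min_gc: float = 0.50, max_gc: float = 0.70) -> bool:
    """Check the 23- to 28-nucleotide and 50% to 70% GC criteria (Tm not checked)."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_content(primer) <= max_gc


def product_size_ok(gsp1_start: int, gsp2_end: int,
                    min_size: int = 200, max_size: int = 500) -> bool:
    """True if the GSP1/GSP2 pair spans a 200- to 500-bp product on the predicted CDS."""
    return min_size <= gsp2_end - gsp1_start <= max_size


# Hypothetical 25-mer, not one of the primers in Supplemental Data Table II.
candidate = "GCTGATCCAGGTTCTGCAAGGACTC"

print(len(candidate), gc_content(candidate))
print(meets_length_and_gc(candidate))
print(product_size_ok(gsp1_start=310, gsp2_end=655))  # hypothetical coordinates
```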
Cloning of 5' and 3' Ends of Full-Length cDNAs Using RACE-PCR and Sequence Analysis
Having identified a cDNA population in which the hypothetical
gene is expressed, 5'- and 3'-RACE for each gene was performed with the Marathon cDNA amplification kit (BD Biosciences
Clontech) using that cDNA population and touchdown PCR with the
following parameters: 94°C for 30 s; five cycles of 94°C for
5 s, 72°C for 4 min; five cycles of 94°C for 5 s, 70°C for
4 min; 25 cycles of 94°C for 5 s, 68°C for 4 min; and 68°C
for 4 min. After examination by gel electrophoresis, RACE reaction
products were cloned into pT-Adv (BD Biosciences Clontech) or
pCR2.1-TOPO (Invitrogen). White colonies were inoculated into 96-well
deep blocks and grown in a 37°C shaker (225 rpm) overnight.
Verification of inserts was done by colony PCR using PCR Master Mix
(Promega). PCR conditions were as follows: 94°C for 4 min, 25 cycles
of 94°C for 30 s, 52°C or 68°C for 30 s, 72°C for 1 min, and 72°C for 10 min.
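The touchdown RACE program described above can be written down as a small data structure, which makes the stepwise drop in annealing/extension temperature (72, 70, then 68°C) explicit. This Python sketch is only a way of tabulating the stated parameters; the class and function names are introduced here, the value 70 is read as 70°C, and ramp times between steps are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: Tuple[Tuple[int, int], ...]  # (temperature in °C, hold time in s) per cycle


TOUCHDOWN_RACE = (
    Stage(1, ((94, 30),)),            # initial denaturation
    Stage(5, ((94, 5), (72, 240))),   # five cycles at 72°C
    Stage(5, ((94, 5), (70, 240))),   # five cycles at 70°C
    Stage(25, ((94, 5), (68, 240))),  # 25 cycles at 68°C
    Stage(1, ((68, 240),)),           # final extension
)


def amplification_cycles(program) -> int:
    """Count cycles in stages that contain both a denaturation and an extension step."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


def total_hold_seconds(program) -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    return sum(stage.cycles * sum(sec for _, sec in stage.steps) for stage in program)


print(amplification_cycles(TOUCHDOWN_RACE))
print(total_hold_seconds(TOUCHDOWN_RACE) / 60)  # programmed hold time in minutes
```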
Four to five independent clones for each of the 5'- and 3'-RACE
products were sequenced from both ends using generic sequencing primers
and the sequences were assembled using TIGR Assembler (Sutton et al.,
1995). The assembled cDNA sequences and the predicted gene structure
for each gene were aligned with the corresponding genomic sequences
using the dds/gap2 program (Huang et al., 1997).
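Because the 5'- and 3'-RACE products were designed to overlap by 200 to 500 bp, a full-length sequence can in principle be reconstructed by joining the two products across their shared region. The Python sketch below shows that idea with an exact suffix/prefix match. It is a deliberately simplified stand-in and not the algorithm used by TIGR Assembler, which handles sequencing errors and multiple reads; the function name and toy sequences are hypothetical.

```python
from typing import Optional


def merge_by_overlap(five_prime: str, three_prime: str,
                     min_overlap: int = 200) -> Optional[str]:
    """Join two sequences on their longest exact suffix/prefix overlap.

    Returns None if no overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(five_prime), len(three_prime))
    for k in range(longest, min_overlap - 1, -1):
        if five_prime[-k:] == three_prime[:k]:
            return five_prime + three_prime[k:]
    return None


# Toy sequences with a 5-base overlap; real RACE products would overlap by 200 to 500 bp.
race_5 = "AAACCCGGG"
race_3 = "CCGGGTTT"

print(merge_by_overlap(race_5, race_3, min_overlap=4))
print(merge_by_overlap(race_5, "TTTTTTTT", min_overlap=4))
```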
 |
ACKNOWLEDGMENTS |
We thank all members of the Arabidopsis group at TIGR for their
help and especially Nadeeza Ishmael for her technical input.
 |
FOOTNOTES |
Received June 17, 2002; returned for revision August 25, 2002; accepted September 9, 2002.
1
This work was supported by the National Science
Foundation (grant no. DBI-9813586).
2
Present address: Gene Logic Inc., 708 Quince
Orchard Rd., Gaithersburg, MD 20878.
[w]
The online version of this article contains
Web-only data. The supplemental material is available at
www.plantphysiol.org.
*
Corresponding author; e-mail yxiao{at}tigr.org;
fax 301-838-0208.
Article, publication date, and citation information can be found
at www.plantphysiol.org/cgi/doi/10.1104/pp.010207.
 |
LITERATURE CITED |
- Brendel V, Kleffe J (1998) Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA. Nucleic Acids Res 26: 4748-4757
- Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268: 78-94
- Comelli P, Konig J, Werr W (1999) Alternative splicing of two leading exons partitions promoter activity between the coding regions of the maize homeobox gene Zmhox1a and Trap (transposon-associated protein). Plant Mol Biol 41: 615-625
- Frohman MA, Dush MK, Martin GR (1988) Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci USA 85: 8998-9002
- Giranton JL, Ariza MJ, Dumas C, Cock JM, Gaude T (1995) The S locus receptor kinase gene encodes a soluble glycoprotein corresponding to the SRK extracellular domain in Brassica oleracea. Plant J 8: 827-834
- Golovkin M, Reddy AS (1996) Structure and expression of a plant U1 snRNP 70K gene: alternative splicing of U1 snRNP 70K pre-mRNAs produces two different transcripts. Plant Cell 8: 1421-1435
- Graber JH, Cantor CR, Mohr SC, Smith TF (1999) In silico detection of control signals: mRNA 3'-end-processing sequences in diverse species. Proc Natl Acad Sci USA 96: 14055-14060
- Haas BJ, Volfovsky N, Town CD, Troukhan M, Alexandrov N, Feldmann KA, Flavell RB, White O, Salzberg SL (2002) Full-length messenger RNA sequences greatly improve genome annotation. Genome Biol 3: RESEARCH0029
- Hartung F, Puchta H (2000) Molecular characterisation of two paralogous SPO11 homologues in Arabidopsis thaliana. Nucleic Acids Res 28: 1548-1554
- Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouze P, Brunak S (1996) Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res 24: 3439-3452
- Huang X, Adams MD, Zhou H, Kerlavage AR (1997) A tool for analyzing and annotating genomic sequences. Genomics 46: 37-45
- Ishikawa T, Sakai K, Yoshimura K, Takeda T, Shigeoka S (1996) cDNAs encoding spinach stromal and thylakoid-bound ascorbate peroxidase, differing in the presence or absence of their 3'-coding regions. FEBS Lett 384: 289-293
- Ishikawa T, Yoshimura K, Tamoi M, Takeda T, Shigeoka S (1997) Alternative mRNA splicing of 3'-terminal exons generates ascorbate peroxidase isoenzymes in spinach (Spinacia oleracea) chloroplasts. Biochem J 328: 795-800
- Kato A, Suzuki M, Kuwahara A, Ooe H, Higano-Inaba K, Komeda Y (1999) Isolation and analysis of cDNA within a 300 kb Arabidopsis thaliana genomic region located around the 100 map unit of chromosome 1. Gene 239: 309-316
- Keeney S, Giroux CN, Kleckner N (1997) Meiosis-specific DNA double-strand breaks are catalyzed by Spo11, a member of a widely conserved protein family. Cell 88: 375-384
- Lazar G, Goodman HM (2000) The Arabidopsis splicing factor SR1 is regulated by alternative splicing. Plant Mol Biol 42: 571-581
- Lin X, Kaul S, Rounsley S, Shea TP, Benito MI, Town CD, Fujii CY, Mason T, Bowman CL, Barnstead M, et al (1999) Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 402: 761-768
- Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26: 1107-1115
- Mano S, Hayashi M, Nishimura M (1999) Light regulates alternative splicing of hydroxypyruvate reductase in pumpkin. Plant J 17: 309-320
- Marrs KA, Walbot V (1997) Expression and RNA splicing of the maize glutathione S-transferase Bronze2 gene is regulated by cadmium and other stresses. Plant Physiol 113: 93-102
- Mayer K, Schuller C, Wambutt R, Murphy G, Volckaert G, Pohl T, Dusterhoft A, Stiekema W, Entian KD, Terryn N, et al (1999) Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402: 769-777
- Montag K, Salamini F, Thompson RD (1995) ZEMa, a member of a novel group of MADS box genes, is alternatively spliced in maize endosperm. Nucleic Acids Res 23: 2168-2177
- Park J, Teichmann SA (1998) DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- and multi-domain proteins. Bioinformatics 14: 144-150
- Sablowski RWM, Meyerowitz EM (1998) Temperature-sensitive splicing in the floral homeotic mutant apetala3-1. Plant Cell 10: 1453-1463
- Salanoubat M, Lemcke K, Rieger M, Ansorge W, Unseld M, Fartmann B, Valle G, Blocker H, Perez-Alonso M, Obermaier B, et al (2000) Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature 408: 820-822
- Schumacher K, Vafeados D, McCarthy M, Sze H, Wilkins T, Chory J (1999) The Arabidopsis det3 mutant reveals a central role for the vacuolar H(+)-ATPase in plant growth and development. Genes Dev 13: 3259-3270
- Seki M, Narusaka M, Abe H, Kasuga M, Yamaguchi-Shinozaki K, Carninci P, Hayashizaki Y, Shinozaki K (2001a) Monitoring the expression pattern of 1300 Arabidopsis genes under drought and cold stresses by using a full-length cDNA microarray. Plant Cell 13: 61-72
- Seki M, Narusaka M, Yamaguchi-Shinozaki K, Carninci P, Kawai J, Hayashizaki Y, Shinozaki K (2001b) Arabidopsis encyclopedia using full-length cDNAs and its application. Plant Physiol Biochem 39: 211-220
- Simpson CG, McQuade C, Lyon J, Brown JWS (1998) Characterization of exon skipping mutants of the COP1 gene from Arabidopsis. Plant J 17: 125-131
- Sutton G, White O, Adams MD, Kerlavage AR (1995) TIGR Assembler: a new tool for assembling large shotgun sequencing projects. Genome Sci Technol 1: 9-19
- Tabata S, Kaneko T, Nakamura Y, Kotani H, Kato T, Asamizu E, Miyajima N, Sasamoto S, Kimura T, Hosouchi T, et al (2000) Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature 408: 823-826
- The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-813
- Theologis A, Ecker JR, Palm CJ, Federspiel NA, Kaul S, White O, Alonso J, Altafi H, Araugo, Bowman CL, et al (2000) Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature 408: 816-820
- Uberbacher EC, Mural RJ (1991) Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc Natl Acad Sci USA 88: 11261-11265
- Yi Y, Jack T (1998) An intragenic suppressor of the Arabidopsis floral organ identity mutant apetala3-1 functions by suppressing defects in splicing. Plant Cell 10: 1465-1477
- Zhou DX, Kim YJ, Li YF, Carol P, Mache R (1998) COP1b, an isoform of COP1 generated by alternative splicing, has a negative effect on COP1 function in regulating light-dependent seedling development in Arabidopsis. Mol Gen Genet 257: 387-391
© 2002 American Society of Plant Biologists