Plant Physiol, May 2003, Vol. 132, pp. 75-83
In Silico Identification of Putative Regulatory Sequence Elements
in the 5'-Untranslated Region of Genes That Are Expressed during Male
Gametogenesis
Raymond Jozef Maurinus Hulzink, Han Weerdesteyn, Anton Felix Croes, Tom Gerats, Marinus Maria Antonius van Herpen, and Jacques van Helden*
Catholic University Nijmegen, Department of Experimental Botany,
Plant Genetics, Toernooiveld 1, 6525 ED Nijmegen, The Netherlands
(R.J.M.H., H.W., A.F.C., T.G., M.M.A.v.H.); and Université Libre
de Bruxelles, Unité de Conformation des Macromolécules
Biologiques, Campus Plaine, CP 263 Boulevard du Triomphe, B-1050
Bruxelles, Belgium (J.v.H.)
ABSTRACT
During pollen development, transcription of a large number
of genes results in the appearance of distinct sets of transcripts. Similar mRNA sets are present in pollen of both mono- and
dicotyledonous plant species, which indicates an evolutionary
conservation of genetic programs that determine pollen gene expression.
In pollen, regulation of gene expression occurs at the transcriptional
and posttranscriptional levels. The 5'-untranslated region (UTR) of several pollen transcripts has been shown to be important for regulation of pollen gene expression. The important regulatory role of
5'-UTR sequences and the evolutionary conservation of genetic programs
in pollen led to the hypothesis that the 5'-UTRs of pollen-expressed
genes share regulatory sequence elements. In an attempt to identify
these pollen 5'-UTR elements, a statistical analysis was performed
using 5'-UTR sequences of pollen- and sporophytic-expressed genes. The
analysis revealed the presence of several pollen-specific 5'-UTR
sequence elements. Assembly of the pollen 5'-UTR elements led to the
identification of various consensus sequences, including those that
previously have been demonstrated to play a role in the regulation
of pollen gene expression. Several pollen 5'-UTR elements
were found to be preferentially associated with genes from dicots,
wet-type stigma plants, or plants containing bicellular pollen.
Moreover, three sequence elements exhibited a preferential association
with the 5'-UTR of pollen-expressed genes from Arabidopsis and
Brassica napus. Functional implications of these
observations are discussed.
INTRODUCTION
Gene expression covers a complex
series of distinctive processes. So far, studies on the regulation of
gene expression in plants mainly have been focused on mechanisms that
underlie the process of transcription. As a consequence, the
architecture and mode of action of promoter sequences of various genes
from different plant systems have been investigated extensively (for
review, see Novina and Roy, 1996 ).
Despite the importance of transcription, it becomes more evident that
posttranscriptional processes also perform a key function in the
regulation of plant gene expression (for review, see Gallie, 1993 ; Fütterer and Hohn, 1996 ;
Bailey-Serres, 1999 ). In precise terms,
posttranscriptional processes comprise all steps downstream of
transcription, i.e. from pre-mRNA modification to protein turnover. In
many cases, the main determinant for posttranscriptional regulation is
the control of translation efficiency. In eukaryotes, control of
translation efficiency often occurs at the translation initiation level
by either posttranslational modification of translation initiation
factors or by posttranscriptional modification of individual or sets of
transcripts (for review, see Pain, 1996 ;
Bailey-Serres, 1999 ; Kozak, 1999 ). In the
latter case, structural properties of the 5'-untranslated region (UTR)
of mRNA molecules often play an important role. Examples of these
properties are length (Gallie et al., 2000 ), the
presence of secondary structures (Klaff et al., 1996 ;
Gallie et al., 2000 ) or upstream open reading frames (Lukaszewicz et al., 1998 ; Wang and Wessler,
1998 ), and the composition of the sequence that surrounds the
translation initiation codon (Geballe and Morris, 1994 ;
Joshi et al., 1997 ). In addition, the presence of
specific sequence elements that serve as interaction sites for
antisense RNAs (Shayig, 1997 ; Hu et al.,
1999 ) or RNA-binding proteins (for review, see Burd and
Dreyfuss, 1994 ; Albà and Pagès, 1998 ) can also contribute to the regulatory capacity of
5'-UTRs.
To identify putative regulatory sequence elements in the 5'-UTR of
coregulated genes, we focused on genes that are highly expressed during
the development and germination of the male gametophyte (pollen).
During pollen development, a large number of genes are transcribed
(Willing and Mascarenhas, 1984 ; Willing et al.,
1988 ; Guyon et al., 2000 ; F. Cnudde, unpublished
data), which leads to the appearance of distinctive sets of transcripts
(Stinson et al., 1987 ; Schrauwen et al.,
1990 ; Hulzink, 2002 ). These mRNA sets can be
found in pollen from both mono- and dicotyledonous plant species, which
argues for a conservation of genetic programs that underlie pollen gene
expression. The 5'-UTRs of several pollen transcripts have been shown
to alter gene expression at the transcriptional (Curie and
McCormick, 1997 ) or posttranscriptional level (Bate et
al., 1996 ; Hulzink, 2002 ; Hulzink et al.,
2002 ). With regard to the evolutionary conservation of genetic
programs in pollen and the important role of the 5'-UTR in pollen gene
expression, we hypothesize that the 5'-UTRs of pollen-expressed genes
share regulatory sequence elements.
To identify these shared (overrepresented) regulatory sequence elements
in the 5'-UTRs of pollen-expressed genes, a statistical analysis has
been carried out. Two different sequence sets were collected: a test
set containing 5'-UTR sequences of pollen-expressed genes (pollen
sequences) and a reference set containing 5'-UTR sequences of genes
that have been isolated from sporophytic tissues (reference sequences).
Both sequence sets were used to identify overrepresented sequence
elements (oligonucleotides) in the pollen sequences (oligo-analysis;
Van Helden et al., 1998 , 2000b ). Although genetic programs in pollen are conserved in different plant species, it
may well be that the presence of several sequence elements is
associated with genes that originate from specific plant species or from
subsets of plants that share similar taxonomic classifications or
morphological features. Hyper-geometric statistics were applied to
investigate whether the presence of the pollen elements was associated
with genes from specific plant species or from sets of plants that are
distinctive in the number of cotyledons (monocots or dicots), stigma
type (wet or dry), or pollen type (bicellular or tricellular).
RESULTS
The 5'-UTR of Pollen-Expressed Genes Shares Several Sequence
Elements
To investigate whether the 5'-UTRs of pollen-expressed genes share
sequence elements (overrepresented oligonucleotides), two datasets
were collected containing 5'-UTR sequences of either pollen-expressed genes (Table I, pollen
sequences) or genes that have been isolated from sporophytic tissues
(reference sequences). The background oligonucleotide frequencies were
estimated by calculation of the relative frequencies of all
oligonucleotides within the reference set. Oligonucleotide occurrences
were counted in the pollen sequences, and their statistical
significance was estimated on the basis of the background frequencies
(for a description of the followed methodology, see "Materials and
Methods").
Table I.
List of pollen-expressed genes that were used for
the "oligo-analysis"
The first column shows the plant species: A.t., Arabidopsis;
B.c., Brassica campestris; B.n., Brassica napus;
B.r., Brassica rapa; H.a., Helianthus annuus;
H.b., Hordeum bulbosum; I.t., Ipomoea trifida;
L.l., Lilium longiflorum; L.p., Lolium perenne;
L.e., Lycopersicon esculentum; M.s., Medicago
sativa; N.a., Nicotiana alata; N.s., Nicotiana
sylvestris; N.t., Nicotiana tabacum; O.s., Oryza
sativa; P.h., Petunia hybrida; P.i., Petunia
inflata; P.p., Pyrus pyrifolia; P.s., Pisum
sativum; S.b., Solanum berthaultii; S.c., Solanum
chacoense; S.t., Solanum tuberosum; T.p.,
Tradescantia paludosa; and Z.m., Zea mays. The
second column shows the gene/clone names. The third column shows the
GenBank accession nos.
Table II shows the hexanucleotides that
are significantly overrepresented in the pollen sequences compared with
the reference sequences. Of the 4,096 possible hexanucleotides, 31 sequence elements are preferentially present in the 5'-UTRs of
pollen-expressed genes (Table II). Similar results were obtained for
penta-, hepta-, and octanucleotides (for these data, see
http://rsat.ulb.ac.be/rsat/). The majority of the overrepresented
oligonucleotides (pollen elements) are A rich, i.e. more than 80% of
the oligonucleotides contain four or more A residues. The most
significantly overrepresented oligonucleotide is AAAAAA, which occurs 74 times (Oocc), whereas 20.4 occurrences
are expected on the basis of the background model
(Eocc). When the total number of possible
oligonucleotides (4,096) is taken into account, the
occurrence probability value converts into the occurrence significance
index (sigocc). For the oligonucleotide
AAAAAA, the sigocc is 15.9, which means
that a hexanucleotide with a similarly high significance value is
expected to occur by chance only once in 10^15.9 datasets of random
sequences. Highly significant values are also found for several AAAAAA
single-substitution variants. To examine whether the overrepresented
oligonucleotides are positionally biased in the pollen sequences,
the distribution of each hexanucleotide was determined, and a test of
homogeneity was applied (Table III). None
of the hexanucleotides passed the significance threshold of 64.64, which indicates that the overrepresented sequence elements are not
positionally biased in the pollen 5'-UTRs.
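The significance index can be read back into an E value and a P value. The sketch below assumes the definitions given in "Materials and Methods" (significance index = -log10 of the E value, and E value = P value multiplied by the 4,096 hexanucleotides tested). The function names are illustrative and are not part of the RSAT programs.

```python
def sig_to_e_value(sig: float) -> float:
    """Invert sig = -log10(E value)."""
    return 10.0 ** (-sig)


def e_value_to_p_value(e_value: float, n_words: int = 4096) -> float:
    """Undo the multitesting correction E value = P value * n_words."""
    return e_value / n_words


if __name__ == "__main__":
    e_value = sig_to_e_value(15.9)  # sig_occ reported for AAAAAA
    p_value = e_value_to_p_value(e_value)
    print(f"E value ~ {e_value:.2e}, P value ~ {p_value:.2e}")
```

Under these assumptions, a sig of 15.9 corresponds to an E value on the order of 10^-16, i.e. far fewer than one such hexanucleotide expected by chance per dataset.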
Table II.
Overrepresented sequence elements in the 5'-UTRs of
pollen-expressed genes
The first column (oligo) shows the overrepresented pollen elements.
Oocc, observed occurrences; Eocc, expected
occurrences; Pocc, probability occurrences;
sigocc, significance index occurrences; Oms,
observed matching sequences; Ems, expected matching
sequences; Pms, probability matching sequences; and
sigms, significance index matching sequences. All sequence
elements with sigocc > 0 were selected. See "Materials
and Methods" for a description of the parameters.
Table III.
Position analysis of the pollen 5'-UTR sequence
elements
Analysis of the distribution of each overrepresented sequence element
was performed with the program "position-analysis" (Van
Helden et al., 2000b ). Chi-square values (χ²)
were calculated using the parameters df = 30 and α = 0.000244. Sequence elements with χ² ≤ 64.64 do not
exhibit a significantly biased position in the pollen sequences. For
details, see "Materials and Methods."
In summary, these data clearly show that several sequence elements are
preferentially present in the 5'-UTR of pollen-expressed genes. Within
the pollen 5'-UTRs, the overrepresented sequence elements are not
positionally biased.
Assembly of Sequence-Related Pollen Elements Gives Rise to Several
Consensus 5'-UTR Elements
Several of the identified pollen elements share sequence
similarity, and their mutual overlap might reveal consensus
sequence elements. To identify these consensus elements, we assembled
the sequence-related pollen elements using the program
"pattern-assembly" (Van Helden et al.,
2000a ). As shown in Table IV,
assembly of sequence-related pollen elements gave rise to several
consensus 5'-UTR sequence elements. Highly significant values are found for the consensus elements CAAATAAAAAT and AAAAAA.
Table IV.
Pattern assembly of the pollen 5'-UTR sequence
elements
The "oligo" column represents the pollen 5'-UTR sequence elements
with their respective significance indexes (column 2:
sigocc). C, Consensus sequence element.
Several Pollen Elements Are Associated with Genes from
Arabidopsis, Brassica napus, Dicotyledonous Plants,
or Plants Containing a Wet Stigma or Bicellular Pollen
The pollen-expressed genes that were used for the 5'-UTR analysis
are derived from 25 different plant species (Table I). With regard to
their taxonomic classification, the plants were separated into subsets
containing mono- or dicotyledonous species. On the
basis of their stigma type, the plant species were
grouped further into the subsets wet and dry stigma. A wet stigma is
covered with a liquid secretion layer, whereas a dry stigma is covered with little or no secretion material (Heslop-Harrison and
Shivanna, 1977 ). Furthermore, the plant species were grouped
into the subsets bi- and tricellular pollen. Sperm cells in tricellular
pollen are formed during pollen maturation, whereas sperm cells in
bicellular pollen are formed during pollen tube growth
(Brewbaker, 1967 ). On the basis of the hyper-geometric
probability, we tested to what extent the pollen elements were
preferentially associated with genes from each subset of plant species.
Furthermore, we analyzed to what extent the pollen elements were
associated with genes from specific plant species. For each sequence
element, the number of matching pollen sequences in a given subset
(B) was compared with the number of matching sequences in a
given set (M), and the corresponding E value was calculated
(for a detailed description of the methodology, see "Materials and
Methods"). The analysis revealed the presence of 12 associations with an E value < 0.1 (Table
V). Of the 31 overrepresented sequence
elements, seven are significantly associated with dicotyledonous plant
species. The most significant example is the sequence element AAAAAT,
which is present in 32 dicotyledonous pollen sequences. The pollen
element ATCAAA is significantly associated with genes from both wet
stigma and bicellular pollen plants. Interestingly, the sequence
elements AAAAAA and AAAAAT are significantly associated with B. napus, whereas AAGAAG is associated with Arabidopsis.
Table V.
Statistical analysis of the extent of association of
pollen elements with genes from different plant species or from plants
of the subsets: one or two cotyledons (monocot or dicot), wet- or
dry-type stigma (wet or dry), or bi- or tricellular pollen (bi or
tri)
The first column shows the subsets that give a significant association
to the pollen elements of column 2. S, No. of pollen
sequences from a given subset; M, no. of pollen sequences
from a given set containing a given pollen element; B, no.
of pollen sequences from a given subset containing a given pollen
element; P, probability value; E, expected value; and
sig, significance index. See "Materials and Methods" for
a description of the methodology.
DISCUSSION
A systematic approach based on the statistical analysis of
oligonucleotide occurrences has led to the identification of several oligonucleotides (sequence elements) that are significantly
overrepresented in the 5'-UTR of pollen-expressed genes (Table II). It
is obvious that the choice of appropriate reference and test datasets
is an important determinant for a reliable outcome of the analysis. Genes from both datasets were selected on the basis of the origin of
their respective cDNAs (male gametophytic or sporophytic tissues) without a priori consideration of the composition of the 5'-UTRs. The
extent of expression of pollen genes during pollen development and tube
growth was ascertained by data from the available literature. From the
analysis, we conclude that the 5'-UTRs of genes that are expressed
during male gametogenesis share pollen-specific sequence elements
(pollen elements). Statistical analyses of the presence of
overrepresented sequence elements have led to the successful identification of regulatory elements in promoter (Van Helden et
al., 1998 , 2000c ; Sinha and Tompa,
2002 ) and 3'-UTR (Jacobs-Anderson and Parker,
2000 ; Van Helden et al., 2000b ) sequences of
coregulated yeast (Saccharomyces cerevisiae) genes.
To our knowledge, the present study is the first report that describes
the in silico identification of sequence elements that are shared in
the 5'-UTRs of coregulated plant genes.
Although many sequence elements are significantly overrepresented in
the 5'-UTR of pollen-expressed genes, their statistical significance
does not necessarily imply that they are functional in the regulation
of pollen gene expression. However, several observations indicate that
some of the pollen elements exhibit a pollen-related
regulatory function. Figure 1 shows a
schematic representation of the 5'-UTR of the pollen-expressed gene
ntp303. A sequence region at the 5' end of the
ntp303 5'-UTR has been shown previously to affect
translation efficiency, whereas a sequence region at the 3' end was
found to modulate mRNA stability (Hulzink, 2002 ;
Hulzink et al., 2002 ). As shown in Figure 1, assembly of several of the pollen elements in the ntp303 5'-UTR leads to
several extended sequence regions. We assume that some of these
extended pollen sequence regions comprise regulatory elements because
several of these regions are also present in the functional
ntp303 5'-UTR regions. In addition, pattern assembly
analysis (Table IV) highlighted the presence of consensus sequence
elements that are conserved in various pollen-expressed genes. Because
some of these consensus pollen elements are also located in the
functional ntp303 5'-UTR regions, it is reasonable to assume that at
least the consensus sequence elements in the functional
ntp303 5'-UTR regions resemble regulatory sequences.
Regulatory sequence elements are often concentrated as short and highly
conserved core elements (for review, see Novina and Roy,
1996 ). A representative example of such a regulatory sequence
element is the consensus sequence AAGAAG, which is present repeatedly
in the 5' functional region of the ntp303 5'-UTR. Deletion analysis strongly suggests that the AAGAAG repeat is involved in
directing pollen gene expression (Hulzink et al., 2002 ).
In this respect, the identification of the AAGAAG sequence as a
pollen-specific 5'-UTR element provides additional clues for its
regulatory function.

Figure 1.
Schematic representation of the distribution of
overrepresented sequence elements (pollen elements) in the 5'-UTR of
the pollen-expressed gene ntp303. Distribution of the pollen
elements is presented above the graphic representation of the
ntp303 5'-UTR. The small arrows indicate the start of a
pollen element. The numbers correspond to sequence elements as
presented in Table II (counted from top to bottom). Sequences below
the ntp303 5'-UTR illustration indicate the position of
consensus pollen sequence elements as presented in Table IV. The gray
regions represent the 5' (left) and 3' (right) regulatory regions of
the ntp303 5'-UTR (see text for description). The arrow at
the 3'-end of the ntp303 5'-UTR indicates the position of
the translation initiation site.
The pollen 5'-UTR sequences that were used in the present study
originate from different plant species. Several pollen elements are
preferentially associated with genes from dicotyledonous plant species,
whereas none of the elements exhibit a significant association with genes
from monocots (Table V). These results indicate co-evolution of
sequence elements in the 5'-UTRs of pollen-expressed genes. It is
plausible that the significant association of pollen 5'-UTR elements
reflects specific properties of dicots that are absent in
monocots. In addition, the pollen element ATCAAA is found to be
preferentially associated with genes from plants containing a wet stigma
and bicellular pollen. With regard to self-incompatibility systems in
angiosperms, a clear relationship exists between pollen and stigma
characteristics, i.e. plant species with a wet-type stigma often have
bicellular pollen (Brewbaker, 1967 ;
Heslop-Harrison and Shivanna, 1977 ). Such a
relationship might explain the preferential association of the ATCAAA
element with genes from wet-type stigma and bicellular pollen plants.
Besides the pollen elements that are preferentially associated with
subsets of plant species, the presence of several other 5'-UTR sequence elements is not related to the number of cotyledons, stigma type, or
pollen type. It is reasonable to assume that these elements play a more general
role in the regulation of pollen gene expression, independent of the
phylogenetic background. In contrast, the preferential association
of three other sequence elements with pollen-expressed genes from
Arabidopsis and B. napus indicates a strong phylogenetic dependency of these elements for these Brassicaceae spp.
It is obvious that in silico identification of conserved sequence
elements in the 5'-UTRs of coregulated genes has to be validated experimentally. Nevertheless, computational identification of shared
5'-UTR sequence elements provides useful indications for new functional
studies. Although gene expression studies in pollen have been
fruitful in several ways, a systematic analysis of the 5'-UTRs of the
growing number of isolated pollen-expressed genes was still lacking.
With the increasing number of publicly available genomic and EST
sequences, we will be able to gain a better understanding of the new
intriguing functional and evolutionary clues provided by our analysis.
MATERIALS AND METHODS
5'-UTR Sequence Datasets
The 5'-UTR sequences of pollen-expressed genes (pollen
sequences) were collected from the GenBank database. The pollen 5'-UTR dataset consists of 5'-UTR sequences from 132 different genes that are
highly expressed in mature pollen or pollen tubes. Expression of the
genes in pollen or pollen tubes was ascertained by data from the
available literature. Because of their expression in anther tissues,
genes related to pollen coat proteins have been excluded from the
analysis. The total number of nucleotides in the pollen 5'-UTR dataset
is 16,645. The average length of the full-length or partial pollen
5'-UTRs is 126 nucleotides; the smallest UTR sequence consists of six
nucleotides, whereas the longest UTR is 620 nucleotides in length.
The detection of overrepresented oligonucleotides (sequence elements)
in the pollen sequences relies on the prior definition of a background
model, which is used to estimate the random expectation for each
oligonucleotide. As a background model, we used a set of non-pollen
5'-UTR sequences (named reference sequences). The reference sequences
were obtained from the GenBank database (March 2001) by selecting the
first 1,076 cDNA entries that did not contain any of the key words
"pollen," "gametophyte," "flower," and "bud." With
regard to the extraction of the reference sequences, the complete
sequence upstream of the translation initiation site was selected. This
procedure resulted in the collection of 113,481 nucleotides. The
average length of the full-length or partial reference 5'-UTRs is 105 nucleotides; the smallest UTR sequence is 14 nucleotides in length,
whereas the longest UTR consists of 1,396 nucleotides.
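The selection of reference sequences described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually run against GenBank: the tuple layout of `entries`, the 0-based `cds_start` convention, and all function names are assumptions made for the example.

```python
EXCLUDED_KEYWORDS = ("pollen", "gametophyte", "flower", "bud")


def is_reference_entry(description: str) -> bool:
    """True if none of the excluded key words occurs in the entry text.

    Plain substring matching is assumed here, so "bud" also matches "budding".
    """
    text = description.lower()
    return not any(keyword in text for keyword in EXCLUDED_KEYWORDS)


def five_prime_utr(cdna: str, cds_start: int) -> str:
    """Return the complete sequence upstream of the initiation codon.

    cds_start is the 0-based index of the A of the ATG (hypothetical
    convention; GenBank feature tables use 1-based coordinates).
    """
    return cdna[:cds_start]


def build_reference_set(entries, max_entries=1076):
    """entries: iterable of (description, cdna, cds_start) tuples."""
    selected = []
    for description, cdna, cds_start in entries:
        if is_reference_entry(description) and cds_start > 0:
            selected.append(five_prime_utr(cdna, cds_start))
        if len(selected) == max_entries:
            break
    return selected
```

Entries without any upstream sequence (`cds_start` of 0) are skipped in this sketch, which is a design choice of the example rather than something stated in the text.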
Sequence Purging
The presence of homologous 5'-UTR sequences in the datasets
might lead to the inclusion of large conserved sequence regions. These
conserved sequence regions can bias the analysis by duplicating all the
oligonucleotides that are found in the redundant sequences. To avoid
this phenomenon, the pollen and reference sequences were purged to
remove sequence repeats larger than 50 nucleotides (containing a
maximum of three mismatches) using the programs "mkvtree" and "vmatch" (Kurtz and Schleiermacher, 1999 ).
Oligonucleotide Analysis
A pattern discovery approach (oligo-analysis; Van Helden
et al., 1998 , 2000b ) was used to detect overrepresented
oligonucleotides in the pollen sequences. All statistics were performed
on nondegenerate oligonucleotides, with no mismatches
allowed. To calculate the expected frequency of each
oligonucleotide, oligo-analysis was first applied to the reference
sequence set. The obtained expected frequency values were used to
estimate the expected number of occurrences for each oligonucleotide in
the set of pollen sequences. The detection of overrepresented
oligonucleotides is based on an estimation of the significance of the
observed occurrences (Oocc). For each
oligonucleotide, the P value
Pocc = P(X ≥ x) was
calculated on the basis of the binomial distribution. Because the
analysis comprised multiple tests (4,096 in the case of
hexanucleotides), the possibility existed that even low
P values appeared by chance. To correct for such a
multitesting effect, the P values were multiplied by the
number of oligonucleotides. This correction resulted in an expected
value, the E value (Eocc). The
significance index sigocc = -log10(Eocc) reflects the degree of
overrepresentation of each oligonucleotide on a logarithmic scale. All
oligonucleotides with a positive significance index
(Eocc < 1) were selected. To prevent
a bias due to self-overlapping occurrences (Kleffe and
Borodovsky, 1992 ), a nonoverlapping counting mode was adopted.
This means that when a self-overlapping hexanucleotide was found at a
given position, its occurrence at the next five positions was ignored.
The number of possible positions was corrected according to the
calculation of the binomial.
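The occurrence statistics described in this section can be sketched as follows. This is a simplified illustration and not the RSAT "oligo-analysis" code: the number of possible positions is taken as the plain sum of L - w + 1 over all sequences (the correction for nonoverlapping counting mentioned above is not reproduced), and all function names are hypothetical.

```python
from math import exp, lgamma, log, log10


def count_nonoverlapping(seq: str, word: str) -> int:
    """Count occurrences of word, ignoring self-overlapping matches."""
    count, i, w = 0, 0, len(word)
    while i <= len(seq) - w:
        if seq[i:i + w] == word:
            count += 1
            i += w  # skip the next w - 1 positions after a match
        else:
            i += 1
    return count


def binomial_right_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p), with 0 < p < 1."""
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(x, n + 1):
        # log-space pmf avoids overflow of large binomial coefficients
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def oligo_significance(pollen_seqs, word, bg_freq, n_words=4096):
    """Observed/expected occurrences, P value, E value and sig for one word."""
    w = len(word)
    observed = sum(count_nonoverlapping(s, word) for s in pollen_seqs)
    positions = sum(max(len(s) - w + 1, 0) for s in pollen_seqs)
    expected = bg_freq * positions
    p_value = binomial_right_tail(observed, positions, bg_freq)
    e_value = p_value * n_words  # multitesting correction
    sig = -log10(e_value) if e_value > 0 else float("inf")
    return {"observed": observed, "expected": expected,
            "p_value": p_value, "e_value": e_value, "sig": sig}
```

Here `bg_freq` stands for the relative frequency of the word in the reference set; a word would be retained when the returned `sig` is positive.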
Overrepresentation of an oligonucleotide can be due to either its
frequent presence in most of the pollen sequences or in a subset of
these sequences. The probability of the first occurrence was taken into
account by calculating an additional index of overrepresentation. For
each oligonucleotide, the number of sequences that contained at least
one occurrence (matching sequence) was determined, and the statistical
significance was calculated with the binomial probability. For
a single sequence, the first occurrence probability was calculated
as:
$$P(X_S \geq 1) = 1 - P(X_S = 0) = 1 - (1 - p_W)^{T_S}$$
where p_W represents the
probability to observe the oligonucleotide W at any
position of the sequence, X_S represents the number of matches on sequence S, and
T_S represents the number of positions
that was calculated from the sequence length
L_S and the oligonucleotide size
w (T_S = L_S - w + 1). The
matching sequence probability Pms was
calculated by taking the right tail of the binomial
probability:
where m represents the number of sequences that
contain at least one copy of the oligonucleotide, whereas
S represents the total number of sequences. The E value
(Ems) and the significance index
(sigms) were calculated from the
P value in the same way as for the occurrences. A
positive sigms value indicates that
the oligonucleotide is present in more sequences than what would be
expected by chance alone.
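A rough sketch of the matching-sequence statistics follows. The first-occurrence probability is computed from the definitions above. For the binomial right tail, the sketch assumes a single success probability equal to the mean first-occurrence probability over all sequences; this is an assumption of the example, since the per-sequence probabilities differ with sequence length and the exact treatment is not spelled out here. Function names are hypothetical.

```python
from math import comb, log10


def first_occurrence_prob(p_w: float, seq_len: int, w: int) -> float:
    """Probability of at least one match of a word of size w in one sequence."""
    positions = seq_len - w + 1  # T_S = L_S - w + 1
    if positions <= 0:
        return 0.0
    return 1.0 - (1.0 - p_w) ** positions


def matching_sequence_stats(seq_lengths, m, p_w, w, n_words=4096):
    """Right-tail binomial P value for observing >= m matching sequences."""
    probs = [first_occurrence_prob(p_w, length, w) for length in seq_lengths]
    n_seqs = len(probs)
    expected_ms = sum(probs)
    p_mean = expected_ms / n_seqs  # ASSUMPTION: one common probability
    p_ms = sum(comb(n_seqs, i) * p_mean ** i * (1.0 - p_mean) ** (n_seqs - i)
               for i in range(m, n_seqs + 1))
    e_value = p_ms * n_words  # same multitesting correction as for occurrences
    sig_ms = -log10(e_value) if e_value > 0 else float("inf")
    return {"expected_ms": expected_ms, "p_ms": p_ms,
            "e_value": e_value, "sig_ms": sig_ms}
```

Using `math.comb` directly is adequate for a set of about a hundred sequences; much larger sets would call for a log-space computation.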
Using the 1,076 non-pollen gene sequences as reference, a
problem of statistical sampling was observed. Some oligonucleotides were present in the pollen sequences and not in the reference set. As a
consequence, their probability was estimated to be zero and their
respective significance was infinite, even if they were found in a low
copy number in the pollen sequences. The problem occurred because the
reference sequence set was too small to reflect all possibilities. It
is likely that these oligonucleotides will appear in much larger
reference datasets. To circumvent this problem for the current
reference sequence set, pseudo-weights were used. This means that the
oligonucleotide frequencies that were calculated from the reference
sequence set contributed for 90% to the estimation of prior
oligonucleotide probabilities, whereas the remaining 10% were left for
the potential presence of additional oligonucleotides in the pollen sequences.
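The pseudo-weight correction can be illustrated with the sketch below. The 90%/10% split comes from the text; how the remaining 10% is shared among oligonucleotides is not stated, so the example assumes it is spread uniformly over all 4^k possible words. Function and parameter names are hypothetical.

```python
from itertools import product


def background_frequencies(ref_counts: dict, k: int = 6, pseudo: float = 0.1) -> dict:
    """Prior word probabilities from reference counts plus a pseudo-weight.

    ASSUMPTION: the pseudo-weight is distributed uniformly over all words.
    """
    words = ["".join(letters) for letters in product("ACGT", repeat=k)]
    total = sum(ref_counts.get(word, 0) for word in words)
    if total == 0:
        raise ValueError("reference counts are empty")
    uniform = 1.0 / len(words)
    return {
        word: (1.0 - pseudo) * ref_counts.get(word, 0) / total + pseudo * uniform
        for word in words
    }
```

With this scheme, a word that is absent from the reference set still receives a small nonzero prior probability, so its significance in the pollen set stays finite.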
Analysis of oligonucleotide distributions in the pollen sequences was
performed with the program "position-analysis" (Van Helden
et al., 2000b ). Occurrences were grouped by intervals of 20 nucleotides. Given the sequence sizes, the positional distributions contained 31 classes with a class interval of 20 nucleotides. The
expected distribution was calculated according to a homogeneous model,
i.e. a position-independent distribution for each oligonucleotide. Observed and expected positional distributions were compared using the
chi-square statistic. To take the number of oligonucleotides (n = 4,096) into account, the threshold was adapted
according to the Bonferroni rule, which recommends a type I error risk
α < 1/n. The degrees of freedom
(df) of the chi-square test depended on the number of
position classes (c), which in turn depended on the
class interval. For the class interval of 20 nucleotides (c = 31, df = 30), the resulting
probability value (α = 0.000244) corresponded to a theoretical
value of χ²theor = 64.64.
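The homogeneity test can be sketched as below. This is an illustration under stated assumptions, not the "position-analysis" program: positions are counted from the 5' end of each sequence, overlapping occurrences are counted, and the expected count per class is taken as proportional to the number of word positions available in that class across all sequences. The function name is hypothetical.

```python
def positional_chi_square(seqs, word, class_width=20):
    """Chi-square statistic and df for the positional distribution of a word."""
    w = len(word)
    usable = [s for s in seqs if len(s) >= w]
    if not usable:
        raise ValueError("no sequence is long enough to contain the word")
    n_classes = max((len(s) - w) // class_width for s in usable) + 1
    observed = [0] * n_classes
    available = [0] * n_classes
    for s in usable:
        for pos in range(len(s) - w + 1):
            cls = pos // class_width
            available[cls] += 1  # positions at which the word could start
            if s[pos:pos + w] == word:
                observed[cls] += 1
    total_obs = sum(observed)
    total_av = sum(available)
    chi2 = 0.0
    for obs, avail in zip(observed, available):
        exp_count = total_obs * avail / total_av  # homogeneous model
        if exp_count > 0:
            chi2 += (obs - exp_count) ** 2 / exp_count
    return chi2, n_classes - 1
```

The returned statistic would then be compared with the threshold given in the text (64.64 for df = 30); a longest sequence of 620 nucleotides and hexanucleotides yield the 31 classes mentioned above.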
Association Analysis
To examine to what extent the oligonucleotides were associated
to specific plant species or to subsets of plants containing one or two
cotyledons, a wet- or dry-type stigma, or bi- or tricellular pollen,
the hyper-geometric probability test was applied. The hyper-geometric
probability estimates the significance of association between the
overrepresented oligonucleotides and different subsets of pollen
sequences. Considering a set of n sequences (e.g. the 132 pollen sequences) that contains a subset of size S
(e.g. the 99 sequences from dicotyledonous plants) and the observation
that a given oligonucleotide is present in M pollen
sequences, the probability that exactly B of these
pollen sequences belong to the given subset is:
$$P(X = B) = \frac{\binom{S}{B}\binom{n-S}{M-B}}{\binom{n}{M}}$$
The P value is the probability to observe at
least B matching sequences in the given subset:
$$P\,\text{value} = P(X \geq B) = \sum_{i=B}^{\min(S,M)} \frac{\binom{S}{i}\binom{n-S}{M-i}}{\binom{n}{M}}$$
In our analysis, each oligonucleotide was compared with
T = 31 different subsets of pollen sequences
(dicots/monocots, wet stigma/dry stigma, bicellular pollen/tricellular
pollen, and 25 different plant species). To correct for this
multitesting, the P value was converted to an E value
by:
$$E\,\text{value} = T \times P\,\text{value}$$
Finally, the E value was converted to a logarithmic significance
index:
$$\text{sig} = -\log_{10}(E\,\text{value})$$
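The association test can be sketched with the standard hypergeometric distribution, using the symbols defined in this section (n, S, M, B, T). The code is a minimal illustration; the function names are hypothetical, and the values of M and B in the usage example are made up for demonstration rather than taken from Table V.

```python
from math import comb, log10


def hypergeom_pmf(b: int, n: int, s: int, m: int) -> float:
    """P(X = b): b of the m matching sequences fall in a subset of size s."""
    return comb(s, b) * comb(n - s, m - b) / comb(n, m)


def association_test(n: int, s: int, m: int, b: int, n_tests: int = 31):
    """Return (P value, E value, sig) for at least b matches in the subset."""
    p_value = sum(hypergeom_pmf(i, n, s, m) for i in range(b, min(s, m) + 1))
    e_value = p_value * n_tests  # T = 31 subsets tested per oligonucleotide
    sig = -log10(e_value) if e_value > 0 else float("inf")
    return p_value, e_value, sig


if __name__ == "__main__":
    # n = 132 pollen sequences and S = 99 dicot sequences are from the text;
    # M = 40 and B = 38 are hypothetical values chosen for illustration.
    p_value, e_value, sig = association_test(n=132, s=99, m=40, b=38)
    print(f"P = {p_value:.3g}, E = {e_value:.3g}, sig = {sig:.2f}")
```

An association would be reported when the E value falls below the chosen cutoff (0.1 in the Results).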
Availability
The "Regulatory Sequence Analysis Tools" are available at
http://rsat.ulb.ac.be/rsat/. The complete sets of data and
calculation procedures are available on the same site.
Distribution of Materials
Upon request, all novel materials described in this publication
will be made available in a timely manner for noncommercial research
purposes, subject to the requisite permission from any third party
owners of all or parts of the material. Obtaining any permission will
be the responsibility of the requestor.
FOOTNOTES
Received September 19, 2002; returned for revision November 28, 2002; accepted January 2, 2003.
* Corresponding author; e-mail jvanheld{at}ucmb.ulb.ac.be; fax 32-2-650-5425.
Article, publication date, and citation information can be found at
www.plantphysiol.org/cgi/doi/10.1104/pp.102.014894.
LITERATURE CITED
- Albà MM, Pagès M (1998) Plant proteins containing the RNA-recognition motif. Trends Plant Sci 3: 15-21
- Bailey-Serres J (1999) Selective translation of cytoplasmic mRNAs in plants. Trends Plant Sci 4: 142-148
- Bate N, Spurr C, Foster GD, Twell D (1996) Maturation-specific translational enhancement mediated by the 5'-UTR of a late pollen transcript. Plant J 10: 613-623
- Brewbaker JL (1967) The distribution and phylogenetic significance of binucleate and trinucleate pollen grains in the angiosperms. Am J Bot 54: 1069-1083
- Burd CG, Dreyfuss G (1994) Conserved structures and diversity of functions of RNA-binding proteins. Science 265: 615-621
- Curie C, McCormick S (1997) A strong inhibitor of gene expression in the 5'-untranslated region of the pollen-specific lat59 gene of tomato. Plant Cell 9: 2025-2036
- Fütterer J, Hohn T (1996) Translation in plants: rules and exceptions. Plant Mol Biol 32: 159-189
- Gallie DR (1993) Posttranscriptional regulation of gene expression in plants. Annu Rev Plant Physiol Plant Mol Biol 44: 77-105
- Gallie DR, Ling J, Niepel M, Morley SJ, Pain VM (2000) The role of 5'-leader length, secondary structure, and PABP concentration on cap and poly(A) tail function during translation in Xenopus oocytes. Nucleic Acids Res 28: 2943-2953
- Geballe AP, Morris DR (1994) Initiation codons within 5'-leaders of mRNA as regulators of translation. Trends Biochem Sci 19: 159-164
- Guyon VN, Astwood JD, Garner EC, Dunker AK, Taylor LP (2000) Isolation and characterization of cDNAs expressed in the early stages of flavonol-induced pollen germination in petunia. Plant Physiol 123: 699-710
- Heslop-Harrison Y, Shivanna KR (1977) The receptive surface of the angiosperm stigma. Ann Bot 41: 1233-1258
- Hu MC-Y, Tranque P, Edelman GM, Mauro VP (1999) rRNA-complementarity in the 5'-untranslated region of mRNA specifying the Gtx homeodomain protein: evidence that base-pairing to 18S rRNA affects translational efficiency. Proc Natl Acad Sci USA 96: 1339-1344
- Hulzink RJM (2002) Post-transcriptional regulation of gene expression during male gametogenesis: regulatory and structural properties of the 5'-untranslated region of pollen-expressed genes. PhD thesis. Catholic University Nijmegen, The Netherlands
- Hulzink RJM, de Groot PFM, Croes AF, Quaedvlieg W, Twell D, Wullems GJ, van Herpen MMA (2002) The 5'-untranslated region of the ntp303 gene strongly enhances translation during pollen tube growth, but not during pollen maturation. Plant Physiol 129: 342-353
- Jacobs-Anderson JS, Parker R (2000) Computational identification of cis-acting elements affecting post-transcriptional control of gene expression in Saccharomyces cerevisiae. Nucleic Acids Res 28: 1604-1617
-
Joshi CP, Zhou H, Huang X, Chiang VL
(1997)
Context sequences of translation initiation codons in plants.
Plant Mol Biol
35: 993-1001[CrossRef][ISI][Medline]
-
Klaff P, Riesner D, Steger G
(1996)
RNA structure and regulation of gene expression.
Plant Mol Biol
32: 89-106[CrossRef][ISI][Medline]
-
Kleffe J, Borodovsky M
(1992)
First and second moment of count of words in random texts generated by Markov chains.
Comput Appl Biosci
8: 433-441[Abstract/Free Full Text]
-
Kozak M
(1999)
Initiation of translation in prokaryotes and eukaryotes.
Gene
234: 187-208[CrossRef][ISI][Medline]
-
Kurtz S, Schleiermacher C
(1999)
REPuter: fast computation of maximal repeats in complete genomes.
Bioinformatics
15: 426-427[Abstract/Free Full Text]
-
Lukaszewicz M, Jérouville B, Boutry M
(1998)
Signs of translational regulation within the transcript leader of a plant plasma membrane H+-ATPase gene.
Plant J
14: 413-423[CrossRef][ISI][Medline]
-
Novina CD, Roy AL
(1996)
Core promoters and transcriptional control.
Trends Genet
12: 351-355[CrossRef][ISI][Medline]
-
Pain VM
(1996)
Initiation of protein synthesis in eukaryotic cells.
Eur J Biochem
236: 747-771[ISI][Medline]
-
Schrauwen JAM, de Groot PFM, van Herpen MMA, van der Lee T, Reijnen WH, Weterings KAP, Wullems GJ
(1990)
Stage-related expression of mRNAs during pollen development in lily and tobacco.
Planta
182: 298-304
-
Shayig RM
(1997)
Role of gene overlap in the regulation of mRNA translation for mitochondrial cytochrome p-450c27/25 in the rat.
J Biol Chem
272: 4050-4057[Abstract/Free Full Text]
-
Sinha S, Tompa M
(2002)
Discovery of novel transcription factor binding sites by statistical overrepresentation.
Nucleic Acids Res
30: 5549-5560[Abstract/Free Full Text]
-
Stinson JR, Eisenberg AJ, Willing RP, Pe ME, Hanson DD, Mascarenhas JP
(1987)
Genes expressed in the male gametophyte of flowering plants and their isolation.
Plant Physiol
83: 442-447[Abstract/Free Full Text]
-
Van Helden J, André B, Collado-Vides J
(1998)
Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies.
J Mol Biol
281: 827-842[CrossRef][ISI][Medline]
-
Van Helden J, André B, Collado-Vides J
(2000a)
A web site for the computational analysis of yeast regulatory sequences.
Yeast
16: 177-187[CrossRef][ISI][Medline]
-
Van Helden J, del Olmo M, Pérez-Ortín J
(2000b)
Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals.
Nucleic Acids Res
28: 1000-1010[Abstract/Free Full Text]
-
Van Helden J, Rios AF, Collado-Vides J
(2000c)
Discovering regulatory elements in non-coding sequences by analysis of spaced dyads.
Nucleic Acids Res
28: 1808-1818[Abstract/Free Full Text]
-
Wang L, Wessler SR
(1998)
Inefficient reinitiation is responsible for upstream open reading frame-mediated translational repression of the maize r gene.
Plant Cell
10: 1733-1745[Abstract/Free Full Text]
-
Willing RP, Bashe D, Mascarenhas JP
(1988)
An analysis of the quantity and diversity of messenger RNAs from pollen and shoots of Zea mays.
Theor Appl Genet
75: 751-753
-
Willing RP, Mascarenhas JP
(1984)
Analysis of the complexity and diversity of mRNAs from pollen and shoots of tradescantia.
Plant Physiol
75: 865-868[Abstract/Free Full Text]
© 2003 American Society of Plant Biologists