First published online November 7, 2002; 10.1104/pp.012179
Plant Physiol, December 2002, Vol. 130, pp. 1626-1635
Contiguous Genomic DNA Sequence Comprising the 19-kD Zein
Gene Family from Maize
Rentao Song and Joachim Messing*
Waksman Institute, Rutgers, The State University of New Jersey, 190 Frelinghuysen Road, Piscataway, New Jersey 08854-8020
ABSTRACT
A new approach has been undertaken to analyze the sequences
and linear organization of the 19-kD zein genes in maize (Zea mays). A high-coverage, large-insert genomic library of the
inbred line B73 based on bacterial artificial chromosomes was used to isolate a redundant set of clones containing members of the 19-kD zein
gene family, which previously had been estimated to consist of 50 members. The redundant set of clones was used to create bins of
overlapping clones that represented five distinct genomic regions.
Representative clones containing the entire set of 19-kD zein genes
were chosen from each region and sequenced. Seven bacterial artificial
chromosome clones yielded 1,160 kb of genomic DNA. Three of them formed
a contiguous sequence of 478 kb, the longest contiguous sequenced
region of the maize genome. Altogether, these DNA sequences provide the
linear organization of 25 19-kD zein genes, one-half the number
previously estimated. It is suggested that the difference is because of
haplotypes exhibiting different degrees of gene amplification in the
zein multigene family. About one-half the genes present in B73 appear
to be expressed. Because some active genes have only been duplicated
recently, they are so conserved in their sequence that previous cDNA
sequence analysis resulted in "unigenes" that were actually derived
from different gene copies. This analysis also shows that the 22- and
19-kD zein gene families shared a common ancestor. Although both
ancestral genes had the same incremental gene amplification, the 19-kD
zein branch exhibited a greater degree of far-distance gene
translocations than the 22-kD zein gene family.
INTRODUCTION
There is evidence that plants have
rather large gene families coding for closely related gene products
(http://www.Arabidopsis.org/info/genefamily/). Although in many cases members of these gene families are
distributed throughout the genome, there are many examples where gene
copies are clustered (Arabidopsis Genome Initiative,
2000 ). By understanding the organization of these gene
families, we can glean new insights into gene amplification, gene
function, gene regulation, and chromosomal architecture.
Traditional methods, like Southern-blot analysis of genomic DNA or
clones of genomic libraries, do not provide the positional information
necessary to understand the linear organization of gene families within
the genome. Furthermore, chromosome-walking methods (Bender et
al., 1983 ) based on genomic libraries are particularly difficult to conduct in larger plant genomes because of the high content of transposable elements and other repeat sequences
(SanMiguel et al., 1996 ). To merge overlapping clones to
construct contiguous genomic sequences beyond the limitations of these
older methods, we have developed a new approach that consists
of the following steps: (a) An expressed sequence tag (EST) database is
used to organize a gene family into subfamilies so that
subfamily-specific gene probes can be developed; (b) High-coverage
bacterial artificial chromosome (BAC)-based genomic libraries are
screened with these probes to identify sets of BAC clones for each
subfamily; (c) BAC clones are DNA fingerprinted to create bins of
overlapping BAC clones; (d) A comparison of DNA restriction fragment
patterns of BAC clones and genomic DNA is used to assess the presence
of all gene copies within the DNA fingerprinted BAC clones; and
(e) BAC clones comprising individual clusters of each subfamily are sequenced. In this study, we applied this approach to a large gene
family in maize (Zea mays) that encodes storage proteins in
the maize endosperm.
Maize, which typically has a large genome ranging in size from 2.3 to 3.3 Gb (http://www.agron.missouri.edu/zeadna.html#Lau1985), belongs to the Gramineae family, which includes many of the cereal crops (Kellogg, 1998 ). It is one of the major sources of
essential amino acids for livestock and humans, which are required for
healthy nutrition. The majority of amino acids in maize kernels are
contained within seed storage proteins that predominantly accumulate in the endosperm (Burr and Burr, 1976 ). These proteins are
rich in the amino acids Pro and Gln, and, therefore, are called
prolamins. The maize prolamins, also known as zeins, consist of a large
protein family that can be divided into two classes. One of these
classes has higher levels of sulfur-rich amino acids and Pro, and is
encoded by only one to two gene copies. The other class, called
α-zeins, contains higher levels of Leu and Gln, and is encoded by a
large gene family (Heidecker and Messing, 1986).
The α-zein gene family can be divided into four subfamilies: z1A,
z1B, z1C, and z1D. These subdivisions were determined using sequence
homology and copy number based on DNA hybridization data as a
classification scheme (for review, see Heidecker and Messing, 1986 ). Three of the subfamilies, z1A, z1B, and z1D, have
relative molecular masses in SDS-polyacrylamide gels of 19 kD,
whereas the z1C subfamily has a molecular mass of 22 kD. cDNA sequence analysis indicated that the sizes of these proteins can vary within subfamilies as a consequence of internal insertions/deletions (Heidecker et al., 1991 ). Our laboratory recently
reported the sequence analysis of the entire z1C subfamily in maize
inbred line BSSS53, which contained a total of 23 gene copies, 22 of them tandemly arranged within 168 kb (Song et al.,
2001 ). The 23rd copy was the normal allele of the
floury-2 locus (Coleman et al., 1995 ) that
was separated from the remaining 22 copies by about 20 cM. Most of
these genes exhibited a size consistent with a relative molecular
mass of 22 kD; but one had a deletion (Zp22/D87),
which enabled it to comigrate in SDS-polyacrylamide gels with the 19-kD
zeins. Although DNA hybridization data had estimated the copy number of the
z1C subfamily to be 15 (Heidecker and Messing, 1986),
this discrepancy reflects the different haplotypes present in
various inbred lines (R. Song and J. Messing, unpublished data).
Because of these different haplotypes, copy number alone
cannot be used to distinguish the subfamilies. It has become
clear that sequence homology is a more reliable parameter for classifying
the α-zein gene family.
The sizes of the other three subfamilies, z1A, z1B, and z1D, were
estimated to be 25, 20, and 5 members, respectively, for a total of roughly 50 members (Heidecker and Messing, 1986). It was also estimated that these three subfamilies occurred on three of
the 10 chromosomes in maize. Given the size and the
distribution of these gene families, traditional genomic analysis was
not sufficient to determine their genomic organization. Furthermore,
expression analysis could not be linked to individual genes in the
absence of their positional information within a contiguous genomic
sequence. To provide a genomic reference of these gene families for
future studies, we applied the approach described above to this set of genes in a single genetic background. Greatly aiding this approach was
the recent construction of a large insert BAC library of
MboI partially digested maize genomic DNA of inbred B73 (Yim
et al., 2002). Following the previously outlined steps, we
show that the three 19-kD α-zein subfamilies in B73 contain only 25 gene
copies and are distributed over five different genomic regions covering a total length of over 1 Mb. This analysis also shows that the three 19-kD
α-zein gene subfamilies underwent a greater degree of far-distance translocation than the 22-kD zein gene subfamily during stages of gene amplification.
RESULTS
Division of Subfamilies by EST Database Analysis
Before genomic analysis of the α-zein gene family was conducted,
gene members were sorted by sequence homology. The simplest and most
accurate approach to obtain the subdivisions is by sequence analysis of
a random collection of cDNAs produced from immature endosperm, where
the zein genes are expressed (Burr et al., 1982 ).
The maize EST database contains non-normalized cDNA sequences from
different sources of zein mRNA (http://www.zmdb.iastate.edu/). Tentative unique contigs (TUC) of all 19-kD zein-specific sequences were collected. A total of 361 cDNA sequences fell into three groups of
related sequences that correspond to the previously defined subfamilies
of the 19-kD zein gene family: z1A, z1B, and z1D (Table
I). The number of ESTs for each subfamily
was also in agreement with its size, based on previous hybridization
data (Burr et al., 1982 ). There were a total of 188 ESTs
for z1A, 126 ESTs for z1B, and only 47 ESTs for z1D. Interestingly,
within each subfamily, there appeared to be members that were expressed at much higher levels than others. Sequences from each subfamily were
then used to draw consensus sequences for each of the three 19-kD zein
gene subfamilies. Based on these consensus sequences, primers were
designed to amplify sequences from the BSSS53 cDNA collection to
generate specific probes for the isolation of the genomic regions
comprising these genes.
BAC Clone Isolation and Identification
The three subfamily-specific probes were used to screen
high-density filters of a BAC library constructed from genomic DNA of maize inbred line B73 that was partially digested with MboI. A
total of 83 clones were identified by hybridization under medium
stringency ("Materials and Methods"). These clones were then
analyzed in Southern blots at high stringency, reducing the set of
positive BAC clones to 57. All 57 clones were subjected to
NotI digestion, and pulsed-field gel electrophoresis
was used to estimate the size of the genomic inserts, which ranged
from 50 to 210 kb (Table II). We obtained
25 clones for the z1A subfamily, 14 clones for the z1B subfamily, and
18 clones for the z1D subfamily. DNA fingerprinting followed by
Southern-blot analysis, however, revealed five different fingerprinting
contigs for the entire 19-kD zein gene family (Fig. 1). z1A and z1B each consisted of two BAC
contigs (Table II), indicating they are in two noncontiguous genomic
locations. The z1D subfamily also fell into two contigs; however,
additional analysis showed they could be merged into one. Therefore,
the z1D subfamily occupies one genomic location but spans an extended genomic region.
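
The binning of clones by fingerprint can be illustrated with a toy sketch. The Python fragment below assumes each BAC clone is reduced to a set of restriction fragment sizes and groups clones whose fingerprints share a minimum fraction of bands. The clone names, band sizes, exact-match comparison, and the 0.5 threshold are all hypothetical, and this is not the scoring scheme used by dedicated fingerprint-assembly software.

```python
from itertools import combinations

def shared_fraction(a, b):
    """Fraction of bands in the smaller fingerprint that also occur in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def bin_clones(fingerprints, threshold=0.5):
    """Group clones into bins by single linkage on the shared-band fraction."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Follow parent links to the bin representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_fraction(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    bins = {}
    for name in fingerprints:
        bins.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in bins.values())

# Hypothetical fingerprints: sets of fragment sizes in kb (illustration only).
example = {
    "cloneA": {1.2, 2.5, 3.1, 4.8},
    "cloneB": {2.5, 3.1, 4.8, 6.0},
    "cloneC": {0.9, 7.5, 8.2},
    "cloneD": {7.5, 8.2, 9.9},
}
print(bin_clones(example))
```

In this toy data set, clones A and B share three bands and clones C and D share two, so two bins result. Real gels require a size tolerance when matching bands, which is omitted here for brevity.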

Figure 1.
Fingerprinting and Southern-blot analysis of
19-kD α-zein BAC clones. To facilitate comparison, BAC clones were
sorted into three groups, z1A, z1B, and z1D, according to the
probes used (three vertical panels in the figure). DNA was
fingerprinted by HindIII digestion and separated by 1%
(w/v) agarose gel electrophoresis. After electrophoresis, the
DNA restriction fragment pattern was photographed, as shown in
the upper part of the figure. DNA from the agarose gel was subsequently
blotted to nylon membranes and subjected to Southern-blot analysis
using z1A-, z1B-, and z1D-specific probes. DNA fragment bands detected
by specific probes were visualized by autoradiography as shown in the
lower part of the figure. BAC clone designations correspond to those in
Table I. M, DNA marker lane, in which a 1-kb DNA ladder was used. In
the z1A and z1B panels, BAC clone designations were labeled by two
different colors, indicating two different DNA fragment patterns within
the group. Size rulers in kb are included on the right side of the
picture.
Sequence Analysis of the Five Genomic Regions Containing 19-kD Zein
Genes
The BAC clones that appeared to comprise all the zein clusters
were chosen for DNA sequencing (underlined clones in Table II). To
confirm their suitability for analysis, candidate clones were subjected
to a comparative Southern blot of the BAC inserts with B73 genomic DNA
(data not shown). Based on the banding pattern, a total of seven BAC
clones was selected that accounted for all gene copies of the 19-kD
zein gene family. BAC clones Z448F14 and Z350D07 were selected for the
z1A subfamily, and BAC clones Z492M16 and Z531H07 were chosen for the
z1B subfamily. BAC clones Z576A02, Z513H09, and Z410H16 formed a
contiguous genomic sequence that contained all the members of the z1D
subfamily (Table III).
DNA sequencing was carried out by the shotgun approach (Messing
et al., 1981 ). BAC DNA was physically sheared and cloned into a
pUC-sequencing vector as described in "Materials and Methods." Each
shotgun clone was sequenced from both ends to provide a pair of linked
sequences (Vieira and Messing, 1982 ). This linkage was critical in the process of turning the assembly of shotgun reads into
contiguous sequence information (contigs). Each BAC was sequenced at a
10- to 15-fold coverage before the sequences were assembled into
contigs. Different sequencing chemistries, or primer walking, were
carried out to fill gaps and/or place different contigs into the
correct order. As a result, a total of 1,160 kb was generated from
these seven BAC clones with only a few gaps remaining. Such BAC clones
are regarded as phase II level sequence. However, the number of ordered
pieces and the sizes of the gaps can vary substantially for phase II
level sequence. The seven BACs sequenced here have zero to four gaps
left in regions that do not contain storage protein gene sequences
(Table III). Gaps can be caused by DNA sequences that are
missing from the shotgun libraries (physical gaps) or by DNA sequences that are
difficult to sequence (sequencing gaps). The size of a physical gap is usually around 500 bp, whereas that of a sequencing gap is around 50 bp. BAC clone
sequencing is summarized in Table III.
Complete Set of 19-kD Zein Genes in the Maize B73 Inbred
Line
All sequences generated in this study were subjected to
sequence homology searches using known 19-kD zein cDNA sequences. A total of 25 copies of 19-kD zein gene sequences,
representing the entire collection of 19-kD zein gene copies in the
maize inbred line B73 (Table IV), were
discovered. The two z1A clones, Z448F14 and Z350D07, contained nine and
three copies, respectively, of 19-kD zein gene sequences. The two z1B
clones, Z492M16 and Z531H07, contained six and two copies,
respectively, of 19-kD zein gene sequences. The three overlapping z1D
clones, Z576A02, Z513H09 and Z410H16, contained three, one, and one
19-kD zein gene sequence copies, respectively. A diagram with the
relative positions of these zein genes on each BAC clone is presented
in Figure 2.
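
As a quick consistency check, the per-clone counts given above can be tallied by subfamily. The sketch below uses only the numbers stated in the text; the data layout and function name are illustrative.

```python
from collections import defaultdict

# (BAC clone, subfamily, number of 19-kD zein gene copies) as stated in the text.
copies_per_bac = [
    ("Z448F14", "z1A", 9),
    ("Z350D07", "z1A", 3),
    ("Z492M16", "z1B", 6),
    ("Z531H07", "z1B", 2),
    ("Z576A02", "z1D", 3),
    ("Z513H09", "z1D", 1),
    ("Z410H16", "z1D", 1),
]

def tally_by_subfamily(records):
    """Sum gene copies over all BAC clones belonging to each subfamily."""
    totals = defaultdict(int)
    for _clone, subfamily, n_copies in records:
        totals[subfamily] += n_copies
    return dict(totals)

subfamily_totals = tally_by_subfamily(copies_per_bac)
total_copies = sum(subfamily_totals.values())
print(subfamily_totals, total_copies)
```

The tally gives 12, 8, and 5 copies for z1A, z1B, and z1D, respectively, which sums to the 25 copies reported for B73.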

Figure 2.
The distribution of 19-kD α-zein genes in
different BAC clones. Each BAC clone is presented as a bar with the
clone name and size (in parentheses) above it. BAC clones are
oriented according to the transcriptional direction of the 19-kD zein
genes (all from 5' to 3'). BAC clones are sorted into the three subfamilies
z1A, z1B, and z1D, as indicated by the three boxed sections in the
figure. A ruler with size in kb is shown at the upper left corner of
the figure. Along the bars, red ovals indicate 19-kD zein genes, and
the numbers within them indicate their order from 5' to 3' in each BAC
clone, corresponding to their gene names in Table IV. Blue ovals
indicate the position of other predicted genes.
Among the 19-kD zein genes, about one-half (12 of 25) had
intact coding regions, whereas the remainder (13 of 25) were
"damaged" by truncation, internal deletion, or stop codons (Table
IV). The intact copies of 19-kD zein genes exhibited two different size
ranges: z1B and z1D had coding sizes of 723 and 726 bp, whereas those
of z1A displayed coding sizes of 702 and 705 bp. There was one
exception to this dichotomy in size: gene Z448F14-2 in the
z1A subfamily had an 804-bp coding region, which is very close to
the size of most of the 22-kD zein gene coding regions (801 bp). As
previously mentioned, the reverse has also been described
for Zp22/D87 from inbred line BSSS53, a 22-kD zein gene that
has a coding region of 716 bp, which is close to that of 19-kD zeins
(Song et al., 2001).
Expression of the 19-kD Zein Gene Family
Because the analysis conducted in this study provided us with not
only all 19-kD zein genes, but also their linear arrangement within the
genome, we can now begin to investigate how each gene is regulated. As
a first step, we matched the EST database with each of the genomic
19-kD zein gene sequences. This analysis greatly depended on the degree
of polymorphism occurring among these genes and between
various haplotypes because the ESTs in the database were derived from
different inbred lines than B73
(http://zmdb.iastate.edu/zmdb/EST/libraries.html). However, in our
study of the 22-kD zein genes (z1C subfamily), we noted that
orthologous positions between different inbred lines exhibit a higher
degree of conservation than nonorthologous positions, except for very
recently amplified gene copies (Llaca and Messing, 1998 ;
R. Song and J. Messing, unpublished data). To gain a preliminary overview of which genes were likely to be expressed, we accounted for the divergence between inbred lines by setting a threshold of 98%
identity over a minimum length of 500 bp of EST sequences with genes
from the 19-kD zein family (Table IV).
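
The matching criterion can be sketched as a simple filter over EST-to-gene alignments. The Python fragment below assumes the alignments have already been computed by some alignment tool and reduced to percent identity and aligned length; the record layout, function names, and example identifiers are hypothetical. Only the 98% and 500-bp thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    est_id: str          # EST identifier (hypothetical in the example below)
    gene_id: str         # genomic gene copy the EST aligns to
    identity: float      # percent identity of the alignment
    aligned_length: int  # aligned length in bp

def passing_hits(hits, min_identity=98.0, min_length=500):
    """Keep alignments that meet both the identity and the length threshold."""
    return [h for h in hits
            if h.identity >= min_identity and h.aligned_length >= min_length]

def est_counts_per_gene(hits, min_identity=98.0, min_length=500):
    """Count passing ESTs per gene; an EST may count toward several genes."""
    counts = {}
    for h in passing_hits(hits, min_identity, min_length):
        counts[h.gene_id] = counts.get(h.gene_id, 0) + 1
    return counts

# Hypothetical alignments for illustration only.
example_hits = [
    Hit("est1", "geneX", 99.2, 640),
    Hit("est1", "geneY", 98.4, 612),  # same EST also matches a near-identical copy
    Hit("est2", "geneX", 97.5, 700),  # fails the identity threshold
    Hit("est3", "geneY", 99.8, 420),  # fails the length threshold
]
print(est_counts_per_gene(example_hits))
```

Because recently duplicated copies are nearly identical, a single EST can pass the threshold for more than one gene, as `est1` does here. Such ambiguous assignments are one reason the resulting counts are best read as a preliminary overview.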
Based on this analysis, all 19-kD zein genes with intact coding regions
are expressed, although their expression levels vary greatly from each
other. None of the 19-kD zein genes with truncations appears to
accumulate mRNA at levels detectable within EST libraries of
this size. However, three 19-kD zein genes from the z1B
subfamily (Z492M16-1, Z492M16-2, and
Z492M16-5) that contain in-frame stop codons appear to
accumulate mRNA, and, therefore, might produce truncated versions of
α-zein proteins. This is in contrast to other genes with in-frame
stop codons that do not direct the accumulation of mRNAs (Van
Hoof and Green, 1996; Patracek et al., 2000).
Although 15 of 25 genes do accumulate mRNAs at detectable levels, these
levels differ by more than 20- to 30-fold. One of the difficulties in
quantifying the mRNA levels of individual genes with this EST data set
was the very recent amplification of some members of the 19-kD zein
gene family. EST contig TUC02-02-07-16440.1 from ZmDB, which contained
a total of 336 ESTs, apparently contained mixed ESTs from five
different 19-kD zein genes from the z1A subfamily
(Z448F14-3, Z448F14-4, Z448F14-5,
Z448F14-6, and Z448F14-7). Another EST contig,
TUC02-02-07-4151.1 also from ZmDB, which consists of 144 ESTs,
contained mixed ESTs from two expressed 19-kD zein genes from the z1B
subfamily (Z492M16-4 and Z492M16-6).
Distance Analysis of α-Zein Genes
Given the different physical linkages of 19-kD zein genes, one
would assume that amplification and mobility of the zein genes occurred
at different times in evolution. This assumption would be consistent
with our previous analysis of the 22-kD zein genes in inbred BSSS53
(Song et al., 2001 ). Therefore, all the coding sequences
of the 19-kD zein genes were used for distance analysis employing the
Clustal method (Higgins and Sharp, 1989 ). Two severely truncated 19-kD zein gene copies were excluded to avoid a bias in the
comparison (Z448F14-9 and Z576A02-1). Although we
already knew that there were differences in gene copy numbers and
sequence polymorphisms between different inbred lines, we included the z1C coding sequences of inbred BSSS53 in the distance analysis as a
reference because allelic sequence differences are too small to disturb
a phylogenetic tree between these subfamilies (shown in Fig.
3). From this analysis, the z1A, z1B, and
z1D subfamilies separated from the z1C subfamily, confirming that the
22- and 19-kD zein genes fall into two groups of genes that originated from a common ancestor.
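
To make the idea of a distance analysis concrete, the sketch below computes the proportion of differing sites (p-distance) between pre-aligned coding sequences. This is a deliberate simplification, not the Clustal method itself, and the short sequences are invented for illustration.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of mismatched sites among aligned, ungapped positions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared

def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = sorted(aligned)
    return {(a, b): p_distance(aligned[a], aligned[b])
            for a in names for b in names}

# Invented toy alignment (not real zein sequences).
toy = {
    "copy1": "CAGCAACAG",
    "copy2": "TAGCAACAG",
    "copy3": "CAG-AACAA",
}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 3))
```

Recently duplicated copies would show distances near zero under such a measure, whereas copies from different subfamilies would show larger values. A matrix of this kind is the usual input for distance-based tree building.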

Figure 3.
Phylogenetic analysis of the maize α-zein genes.
The coding regions of maize α-zein (19- and 22-kD zein) genes were
aligned by the Clustal method to generate a phylogenetic tree. Two
19-kD zein genes with large sequence truncations (Z448F14-9
and Z576A02-1) were not included in this analysis. Asterisks
mark the 19-kD zein genes with intact
coding regions. The data for the 22-kD zein genes came from our previous
study (Song et al., 2001 ). The two major clades in the
figure correspond to 19-kD zein genes (top) and 22-kD zein genes
(bottom). The 19-kD zein gene clade is split into three smaller clades,
corresponding to the three subfamilies z1A, z1B, and z1D, respectively.
The 22-kD zein gene clade contains a single subfamily z1C. Gene names
were color coded according to their relationship within the different
genomic locations: Z448F14 (yellow), Z350D07 (orange), Z492M16 (green),
Z531H07 (blue), z1D contigs (light blue), z1C gene cluster (pink), and
fl2 locus (red). A ruler at the bottom of the figure
provides an estimated evolutionary time scale in million years ago
(mya).
DISCUSSION
Complexity of the 19-kD α-Zein Gene Family
Here, we have described the genomic organization of a large gene
family in maize comprising all members of the 19-kD zein genes. These
genes fall into three subfamilies, and are located in five distinct
genomic regions. Because the maize genome has not yet been sequenced,
the question that must be posed is whether the experimental
approach described in this study was capable of uncovering all the
members of this gene family. We established three basic criteria for
determining the comprehensive isolation of the 19-kD zein genes. First,
the probes used in this study were developed from DNA sequence
information of an EST database, rather than from hybridization data.
DNA sequence information can reveal members of a gene family that have
diverged to a degree that even under reduced stringency would not be
detectable by DNA hybridization experiments. Sequence divergence was
then addressed by selecting as many DNA probes as necessary to detect
all members of the gene family by DNA hybridization.
When this project commenced 2 years ago, the ZmDB database contained
more than 300 ESTs of 19-kD zein cDNAs. At present, the data set has
more than doubled in size, and it remains consistent with the results
from the first data set. Second, the BAC library used for this study
had a comprehensive coverage of the maize genome. This BAC library was
constructed from maize inbred line B73, has an average insert size of
167 kb and a total of 105,579 clones, which provided a 7-fold genome
coverage based on a genome size of 2.5 Gb (Arumunganathan and
Earle, 1991 ). Third, the first screening of BAC high-density
filters was carried out under a medium-stringency hybridization
condition, although we used three different probes. Under such
conditions, even clones from the 22-kD zein gene family that have
further diverged from 19-kD zein genes were detected (Heidecker
and Messing, 1983 ; also see Fig. 3). Therefore, we believe
that, based on these criteria, the isolation of 19-kD zein genes of
maize inbred line B73 was complete.
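
The stated library coverage follows directly from the library parameters, and can be checked with a few lines of Python. The function names are illustrative. The miss probability additionally assumes that clones sample the genome uniformly at random (a Poisson approximation), which is an idealization not claimed in the text.

```python
import math

def genome_coverage(n_clones, mean_insert_kb, genome_size_gb):
    """Fold coverage = total cloned DNA divided by genome size."""
    total_insert_kb = n_clones * mean_insert_kb
    genome_kb = genome_size_gb * 1_000_000  # 1 Gb = 1,000,000 kb
    return total_insert_kb / genome_kb

def miss_probability(coverage):
    """Chance that a given locus is in no clone, assuming random sampling."""
    return math.exp(-coverage)

# Library parameters as given in the text.
coverage = genome_coverage(105_579, 167, 2.5)
print(round(coverage, 2))          # about 7-fold
print(miss_probability(coverage))  # well below 1% under the stated assumption
```

Under this idealization, a 7-fold library leaves only a very small chance that any single zein locus is absent, which is consistent with the second criterion above. Cloning bias at particular restriction sites could raise that chance in practice.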
Genomic Organization of the 19-kD α-Zein Gene Family
Segregation studies of polymorphic 19-kD zein proteins or RFLP of
19-kD zein genes placed these genes on maize chromosomes 1, 4, and 7 (Soave et al., 1981; Soave et al., 1982;
Wilson et al., 1989; Woo et al., 2001). In
a previous study, we demonstrated that the two other locations on
chromosome 4, where α-zein genes have been mapped, represent the
22-kD zein gene family, or the z1C subfamily (Song et al.,
2001). In this study, BAC clones were sorted into three groups
according to three different probes. Within each group, BAC clones were
analyzed by DNA fingerprinting and Southern-blotting techniques. The
resulting fingerprint bins indicated a total of five unlinked genomic locations.
There is additional evidence that these five genomic regions are not
contiguous. A systematic effort was undertaken to DNA fingerprint the
clones of the entire BAC library that we utilized for the isolation of
the 19-kD zein genes (Maize Physical Mapping, http://genome.arizona.edu/fpc/maize). Furthermore, two additional libraries, made with HindIII and EcoRI from the
same B73 germplasm, are also in the process of being fingerprinted. The
BAC DNA fingerprints were used to assemble BAC clones into
fingerprinted contigs (FPCs) with a program called WebFPC. This
analysis also included all the zein clones shown in Table II. To date,
more than 232,000 BAC clones, representing 12-fold coverage of the
maize genome from the three different BAC libraries, have been
analyzed. All five genomic regions comprising the 19-kD zein
genes fall into individual FPCs (Table
V). Because these FPCs span larger
contiguous regions than the zein gene clusters, they also indicate
that these five regions are in noncontiguous locations of the genome. In addition, we have sequenced the ends of several BAC clones, sometimes referred to as sequence-tagged connectors (STC). These STCs
were compared with the sequenced BAC clones, thereby confirming their
linkage to the correct genomic location. These data confirm the
reliability of the WebFPC program because all zein clones were placed
in the correct bins. Therefore, future additional coverage of FPCs and
their placement on the genetic map, coupled with the development of an
STC database, will greatly facilitate the genomic analysis of complex
large gene families.
Comparison with EST Databases
Sequence analysis of the five genomic regions comprising
the 19-kD zein gene family revealed a total of 25 gene copies. Among the 19-kD zein genes, only about one-half exhibited an intact coding
region. The remainder of the gene copies displayed either in-frame stop
codons or truncations at the 5' or 3' end. In terms of gene expression,
all intact copies appeared to be expressed. If truncated copies were
transcribed, mRNAs must be rapidly turned over because no transcript
for these genes was detected. However, it was surprising that some of
the genes containing in-frame stop codons appeared to be expressed,
whereas others appeared not to be. It was previously shown that the
introduction of in-frame stop codons in plant mRNAs leads to mRNA
instability (Van Hoof and Green, 1996 ; Patracek
et al., 2000 ). Furthermore, a 22-kD zein mRNA from inbred W22
with an in-frame stop codon was shown to accumulate mRNA at levels two
orders of magnitude lower (5% versus 0.045%) than 22-kD zein mRNAs
without an in-frame stop codon (Liu and Rubenstein,
1993 ). Therefore, it is possible that the genes in B73 with an
in-frame stop codon that appear to be expressed do not have in-frame
stop codons in the inbreds from which they were derived. Such single
nucleotide differences between orthologous genes were also detected in
W22 and BSSS53 and usually affect a C to T conversion of the CAG or CAA
Gln codon (Llaca and Messing, 1998 ).
A large, privately held EST database of 6,732 endosperm-specific cDNAs
from inbred B73 has also served as the basis for an expression analysis of
zein genes (Woo et al., 2001). Although the genomic data
presented here were also derived from the B73 inbred, the Woo et
al. (2001) study predicted fewer expressed genes than our studies indicate. For instance, we discovered that two highly expressed
19-kD zein genes in the Woo et al. (2001) study (az19B1 and az19B3) are actually a mixture of different expressed genes that
are highly homologous. This result clearly demonstrates that the
assembly of ESTs in "unigene" sets must be considered tentative, and it will be important to maintain single cDNA reads in the databases
for the completion of linear genomic analysis of genes.
Recently Amplified Genes Are Tightly Clustered
A complication in gene expression studies is that some gene copies are
highly homologous and can mainly be discriminated by their chromosomal
position. Interestingly, these copies are tandemly arranged within a
short physical distance. The phylogenetic analysis also showed that
these highly homologous α-zein gene copies must have been
amplified recently relative to the other members of the gene family. The more
extensive amplification of the z1A and z1B subfamilies resulted in gene
translocations, followed by additional amplification, explaining the
five genomic locations of the 19-kD zein genes. On the other hand, the
translocation of a single 22-kD zein gene (Fl2) did not give
rise to additional amplification (Song et al., 2001 ).
Furthermore, the z1D subfamily formed without any far-distance gene
translocation, but it also represents the subfamily with the lowest
degree of gene amplification. Interestingly, previous copy number
estimates for the 19-kD zein gene subfamilies, obtained from
hybridization data of inbred lines other than B73, deviate
substantially from the B73 counts for the z1A subfamily (25 versus 12 copies in B73) and
for the z1B subfamily (20 versus eight copies in B73; Heidecker
and Messing, 1986). However, the estimate for the z1D subfamily
is the same (five versus five in B73), suggesting that the longer
stretches of spacer region between copies of the z1D subfamily might
have prevented generation of more haplotypes based on gene
amplification. Comparison with other haplotypes, therefore, should
provide illuminating insights into how these genes were amplified.
There also seems to be a difference in how the 22- and 19-kD α-zein
genes amplified. Both branches have undergone several rounds of gene
amplification, most of them long after the allotetraploidization of
maize within the last 5 million years (Gaut and Doebley,
1997 ; Song et al., 2001 ). Interestingly, this
period coincided with the extensive expansion of the maize genome by
retrotranspositions (SanMiguel et al., 1998 ). The 19-kD
zein branch split into two major clades, one representing the z1D
subfamily and the other representing the z1A and z1B subfamilies,
probably representing originally three genomic locations. A major
difference between the two branches is that the expansion of the 19-kD
branch involved a greater degree of far-distance movements within
the maize genome than the 22-kD branch. Gene copies of the
22-kD branch are mostly contained within a 168-kb genomic region,
whereas copies of the 19-kD branch are distributed among five different
genomic locations.
MATERIALS AND METHODS
Design of 19-kD Zein Probes
The 19-kD zein-related EST sequences were downloaded from ZmDB
(maize [Zea mays] genome database,
http://www.zmdb.iastate.edu/) in October 2000. These sequences were
assembled on a Macintosh G3 computer (Apple Computer, Cupertino,
CA) using the Seq-Man program of the Lasergene software package
(DNAStar, Inc., Madison, WI). Known sequences of z1A, z1B, and z1D were
included in the comparison. These sequences were aligned by homology
and assembled into three contigs (see Table I). Based on the previously
described representative member of each known subfamily
(Heidecker and Messing, 1986 ), the contigs were
classified as z1A, z1B, and z1D subfamilies, respectively. Primers were
designed based on consensus sequences derived from the Seq-Man assembly,
which allowed PCR amplification for each of the z1A, z1B, and z1D
subfamilies: z1A, 5' primer: 5' AGTGCTGCTACGGCGACCATT; 3'
primer: 5' CGGAAGCCACAAACATCAGACAA; z1B, 5' primer:
5' CGGCACGAGGCAACATAGAAAGT; 3' primer: 5' TAAAAGAGGGCACCACCAATGATG; z1D,
5' primer: 5' ATACAATCCTACAGGCTACAAGAG; and 3' primer: 5' GTGGGCTGCTGCAATAAGGTG.
These primers were used to amplify corresponding fragments from cDNA
clones that were isolated from 18-d after pollination endosperm of
inbred line BSSS53 (R. Song and J. Messing, unpublished data).
PCR fragments were purified from an agarose gel and labeled using
standard procedures.
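As a quick sanity check on the primer pairs listed above, the following minimal Python sketch reports the length and GC content of each primer and can produce the reverse complement of a 3' primer to locate its site on the sense strand. The sequences are copied from the text; the dictionary keys and function names are illustrative assumptions and are not part of the original protocol.

```python
# Primer sequences copied from the text above, all written 5' to 3'.
# The dictionary keys and helper names are illustrative, not from the paper.
PRIMERS = {
    "z1A_5p": "AGTGCTGCTACGGCGACCATT",
    "z1A_3p": "CGGAAGCCACAAACATCAGACAA",
    "z1B_5p": "CGGCACGAGGCAACATAGAAAGT",
    "z1B_3p": "TAAAAGAGGGCACCACCAATGATG",
    "z1D_5p": "ATACAATCCTACAGGCTACAAGAG",
    "z1D_3p": "GTGGGCTGCTGCAATAAGGTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def summarize(primers):
    """Map each primer name to (length in nt, GC content in percent)."""
    return {
        name: (len(seq), round(100 * gc_fraction(seq), 1))
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, (length, gc) in summarize(PRIMERS).items():
        print(f"{name}: {length} nt, {gc}% GC")
```

Calling `reverse_complement(PRIMERS["z1A_3p"])` gives the sequence to search for on the coding strand of a z1A contig, assuming the 3' primer anneals to that strand.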
BAC Clone Isolation and Characterization
The maize BAC library CHORI201 was screened for 19-kD zein
gene sequences. The library was constructed from genomic DNA of
inbred line B73 partially digested with MboI (Yim et al., 2002). High-density BAC library filters were made using a Total Array System (BioRobotics, Inc., Comberton, Cambridge, UK).
Hybridization was carried out in 5× SSC with 7.5% (w/v) SDS at
65°C overnight. Membranes were washed under medium stringency with
1× SSC and 0.1% (w/v) SDS at 65°C twice, for 20 min each.
The membranes were wrapped and exposed to x-ray film. Tentatively
positive BAC clones were streaked on Luria-Bertani broth agar plates,
and DNA minipreparations were made from 3-mL overnight
cultures. BAC DNA was digested with BamHI or
HindIII and separated by agarose gel electrophoresis. DNA was blotted onto membranes and hybridized with the probes described above in 5× SSC with 7.5% (w/v) SDS at 65°C overnight.
High-stringency washes were performed twice for 20 min each with
0.1× SSC and 0.1% (w/v) SDS at 65°C. Clones passing this
stringency selection were sorted into groups according to the
probes to which they hybridized.
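The difference between the hybridization, medium-stringency, and high-stringency conditions above lies mainly in salt concentration. The sketch below estimates the sodium ion molarity of each SSC dilution, assuming the standard 20× SSC recipe (3 M NaCl, 0.3 M trisodium citrate). That recipe is an assumption, because the paper does not state the stock composition, and the sodium contributed by SDS is ignored.

```python
# Assumed standard 20x SSC stock: 3 M NaCl and 0.3 M trisodium citrate.
# The paper does not give the recipe; sodium from SDS is ignored here.
NACL_20X_M = 3.0
CITRATE_20X_M = 0.3


def ssc_sodium_molar(fold):
    """Approximate Na+ molarity of SSC at the given fold concentration."""
    nacl = NACL_20X_M * fold / 20.0
    citrate = CITRATE_20X_M * fold / 20.0
    # Trisodium citrate contributes three sodium ions per molecule.
    return nacl + 3.0 * citrate


if __name__ == "__main__":
    steps = [
        ("hybridization", 5.0),
        ("medium-stringency wash", 1.0),
        ("high-stringency wash", 0.1),
    ]
    for label, fold in steps:
        print(f"{label}: {fold}x SSC, about {ssc_sodium_molar(fold):.4f} M Na+")
```

Lower sodium concentration at the same temperature destabilizes mismatched duplexes, which is why the 0.1× SSC wash is the more stringent selection.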
Insert sizes of all positive BAC clones were estimated by digestion at the
NotI sites in the vector that flank the cloned DNA,
followed by pulsed-field gel electrophoresis using a CHEF
apparatus (Bio-Rad Laboratories, Inc., Hercules, CA). BAC end
sequencing was carried out for some BAC clones using Big Dye
Terminator chemistry and an ABI3700 capillary sequencer (Applied
Biosystems, Inc., Foster City, CA).
BAC Sequencing
Large-scale BAC DNA preparations were made using a Large
Construction Kit (Qiagen, Inc., Valencia, CA). BAC DNAs were physically
sheared using the HydroShear instrument (Genomic Instrumentation Services, Inc., San Carlos, CA). Two
speed codes (11 and 14) were routinely used, which gave average DNA
fragment sizes of 3 and 6 kb, respectively. The ends of the sheared DNA
were repaired with T4 DNA polymerase and the products
separated in agarose gels. DNA fractions from 2 to 4 kb (from speed
code 11) or 5 to 8 kb (from speed code 14) were recovered from gels
using a Gel Extraction Kit (Qiagen, Inc.). DNA was ligated into a
dephosphorylated cloning vector such as pUC18/SmaI/BAP
(Amersham Pharmacia Biotech, Inc., Piscataway, NJ) at a vector-to-insert
molar ratio of 3:1 or 1:1. Ligated DNA was electroporated into
Escherichia coli strain DH10B and transformants were
selected on Luria-Bertani broth agar plates with appropriate
selective agents. Ten to 20 clones were checked for cloning quality
before large-scale shotgun sequencing.
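The vector-to-insert molar ratios above translate into DNA masses through the fragment lengths. The following sketch shows the arithmetic, assuming a pUC18 length of 2,686 bp and a hypothetical 50 ng of vector per ligation; the vector mass and the function name are illustrative and are not taken from the paper.

```python
PUC18_BP = 2686  # length of pUC18; vector mass below is a hypothetical example


def insert_mass_ng(vector_ng, vector_bp, insert_bp, vector_to_insert_ratio):
    """Mass of insert (ng) giving the requested vector:insert molar ratio.

    Moles are proportional to mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) / ratio.
    """
    if vector_to_insert_ratio <= 0:
        raise ValueError("molar ratio must be positive")
    return vector_ng * (insert_bp / vector_bp) / vector_to_insert_ratio


if __name__ == "__main__":
    vector_ng = 50.0  # hypothetical amount of vector per reaction
    for insert_bp in (3000, 6000):  # average fragment sizes from the text
        for ratio in (3.0, 1.0):  # vector:insert ratios from the text
            mass = insert_mass_ng(vector_ng, PUC18_BP, insert_bp, ratio)
            print(f"{insert_bp} bp insert, {ratio:g}:1 ratio: {mass:.1f} ng")
```

Because the 6-kb inserts are twice as long as the 3-kb inserts, the same molar ratio requires twice the insert mass.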
DNA template minipreparations for large-scale shotgun sequencing were
carried out using the QIAprep 96 Turbo Miniprep Kit (Qiagen, Inc.).
Sequencing reactions of both ends of each clone were carried out by
using Big Dye Terminator chemistry (Applied Biosystems, Inc.).
Sequencing products were analyzed on ABI3700 capillary sequencers
(Applied Biosystems, Inc.). Base calling and quality assessment were
conducted with the PHRED program (Ewing and Green, 1998). Sequences were assembled with the PHRAP program, and the assembled shotgun reads were viewed and edited with CONSED (Gordon et al., 1998). Primer walking and full shotgun sequencing were used to close gaps or orient contigs. The dGTP Big Dye Terminator kit (Applied Biosystems, Inc.) was used to resolve the sequence of some
regions that were difficult to sequence. All BAC clones have been
advanced to phase II level sequence and deposited into the HTGS
division of GenBank.
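PHRED quality values, used above for quality assessment, are related to the estimated base-calling error probability P by Q = -10 log10(P). The following sketch shows the conversion in both directions; the function names and the example quality values are illustrative assumptions, not data from this study.

```python
import math


def phred_to_error_prob(q):
    """Convert a PHRED quality value Q to an error probability P."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a PHRED quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given per-base qualities."""
    return sum(phred_to_error_prob(q) for q in qualities)


if __name__ == "__main__":
    example_qualities = [20, 30, 40]  # hypothetical per-base quality values
    for q in example_qualities:
        print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
    print(f"expected errors: {expected_errors(example_qualities):g}")
```

A value of Q20 therefore corresponds to one expected error in 100 bases, and Q30 to one in 1,000.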
Sequence Analysis
Draft sequences generated from high-throughput DNA sequencing
(phase II) were subjected to gene prediction with FGENESH (Softberry, Inc., Mount Kisco, NY). The predicted protein sequences were then used in BLASTP searches against databases in GenBank (Altschul et al., 1990). Only hits with significant
homology (E value of less than 10^-5 to other species) were
considered. Coding sequences of known 19-kD zein genes were
compared with BAC sequences using BLAST2 (Tatusova and
Madden, 1999). The 19-kD zein sequences identified in this
analysis were then further analyzed by sequence distance analysis using
the MegAlign program of Lasergene (DNAStar, Inc.). Gene sequences with
large truncations were not included in the MegAlign analysis.
The coding regions of the different 19-kD zein genes were also
subjected to sequence homology searches against the ZmDB maize EST
database by BLASTN
(http://www.zmdb.iastate.edu/cgi-bin/ZmDBblast/ZMDB; Altschul et
al., 1990). Because the EST database comprises ESTs from
different maize inbred lines, 98% sequence identity
over a minimum length of 500 bp was set as the threshold for the comparison
of genomic and cDNA sequences.
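The matching rule above (at least 98% identity over at least 500 bp) can be expressed as a simple filter over BLASTN hits. The sketch below assumes each hit has already been reduced to an EST identifier, a percent identity, and an aligned length; the `Hit` record and the example values are hypothetical and do not reproduce the output format of the ZmDB BLAST server.

```python
from dataclasses import dataclass

MIN_IDENTITY_PERCENT = 98.0  # threshold stated in the text
MIN_ALIGNED_LENGTH_BP = 500  # threshold stated in the text


@dataclass
class Hit:
    """One BLASTN alignment between a genomic coding region and an EST."""
    est_id: str
    percent_identity: float
    aligned_length_bp: int


def passes_threshold(hit):
    """True if the hit meets both the identity and the length criterion."""
    return (
        hit.percent_identity >= MIN_IDENTITY_PERCENT
        and hit.aligned_length_bp >= MIN_ALIGNED_LENGTH_BP
    )


def matching_ests(hits):
    """Return the identifiers of all ESTs that pass the threshold."""
    return [hit.est_id for hit in hits if passes_threshold(hit)]


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [
        Hit("EST_example_1", 99.4, 642),
        Hit("EST_example_2", 96.8, 710),
        Hit("EST_example_3", 100.0, 310),
    ]
    print(matching_ests(hits))
```

Requiring both criteria excludes short perfect matches as well as long alignments to related but distinct gene copies.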
WebFPC Analysis
A BAC clone address was used to identify the corresponding fingerprinted contig (FPC) in the
maize WebFPC database (http://genome.arizona.edu/fpc/maize/). Other BAC clones within an identified FPC could then be
searched. Some FPCs have been anchored with genetic markers; therefore,
their chromosomal location and genetic map position can be determined.
 |
ACKNOWLEDGMENTS |
We thank Drs. Gregorio Segal and Barbara Miesak More for
helpful comments on the manuscript, and Steve Kavchok and Steve Young for technical assistance.
 |
FOOTNOTES |
Received July 30, 2002; returned for revision August 28, 2002; accepted October 1, 2002.
1 This work was supported by the Department of Energy (grant no. DE-FG05-95ER20194 to J.M.).
* Corresponding author; e-mail messing@mbcl.rutgers.edu; fax 732-445-0072.
Article, publication date, and citation information can be found at
www.plantphysiol.org/cgi/doi/10.1104/pp.012179.
 |
LITERATURE CITED |
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Arumuganathan K, Earle ED (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208-219
Bender W, Spierer P, Hogness DS (1983) Chromosomal walking and jumping to isolate DNA from the Ace and rosy loci and the bithorax complex in Drosophila melanogaster. J Mol Biol 168: 17-33
Burr B, Burr FA (1976) Zein synthesis in maize endosperm by polyribosomes attached to protein bodies. Proc Natl Acad Sci USA 73: 515-519
Burr B, Burr FA, St. John TP, Thomas M, Davis RD (1982) Zein storage gene family of maize. J Mol Biol 154: 33-49
Coleman CE, Lopes MA, Gillikin JW, Boston RS, Larkins BA (1995) A defective signal peptide in the maize high-lysine mutant floury 2. Proc Natl Acad Sci USA 92: 6828-6831
Ewing B, Green P (1998) Base-calling of automated sequencer traces using PHRED: II. Error probabilities. Genome Res 8: 186-194
Gaut BS, Doebley JF (1997) DNA sequence evidence for the segmental allotetraploid origin of maize. Proc Natl Acad Sci USA 94: 6809-6814
Gordon D, Abajian C, Green P (1998) CONSED: a graphical tool for sequence finishing. Genome Res 8: 195-202
Heidecker G, Messing J (1983) Sequence analysis of zein cDNAs obtained by an efficient mRNA cloning method. Nucleic Acids Res 11: 4891-4906
Heidecker G, Chaudhuri S, Messing J (1991) Highly clustered zein gene sequences reveal evolutionary history of the multigene family. Genomics 10: 719-732
Heidecker G, Messing J (1986) Structural analysis of plant genes. Annu Rev Plant Physiol 37: 439-466
Higgins DG, Sharp PM (1989) Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5: 151-153
Kellogg EA (1998) Relationships of cereal crops and other grasses. Proc Natl Acad Sci USA 95: 2005-2010
Liu CN, Rubenstein I (1993) Transcriptional characterization of an alpha-zein gene cluster in maize. Plant Mol Biol 22: 323-336
Llaca V, Messing J (1998) Amplicons of maize zein genes are conserved within genic but expanded and constricted in intergenic regions. Plant J 15: 211-220
Messing J, Crea R, Seeburg PH (1981) A system for shotgun DNA sequencing. Nucleic Acids Res 9: 309-321
Petracek ME, Nguyen T, Thompson WF, Dickey LF (2000) Premature termination codons destabilize ferredoxin-1 mRNA when ferredoxin-1 is translated. Plant J 21: 563-569
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20: 43-45
SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al (1996) Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765-768
Soave C, Reggiani R, Di Fonzo N, Salamini F (1981) Clustering of genes for 20 kd zein subunits in the short arm of maize chromosome 7. Genetics 97: 363-377
Soave C, Reggiani R, Di Fonzo N, Salamini F (1982) Genes for zein subunits on maize chromosome 4. Biochem Genet 20: 1027-1038
Song R, Llaca V, Linton E, Messing J (2001) Sequence, regulation and evolution of the maize 22-kD zein gene family. Genome Res 11: 1817-1825
Tatusova TA, Madden TL (1999) Blast 2 sequences: a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174: 247-250
Van Hoof A, Green PJ (1996) Premature nonsense codons decrease the stability of phytohemagglutinin mRNA in a position-dependent manner. Plant J 10: 415-424
Vieira J, Messing J (1982) The pUC plasmids, an M13 mp7 derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259-268
Wilson CM, Sprague GF, Nelsen TC (1989) Linkage among zein genes determined by isoelectric focusing. Theor Appl Genet 77: 217-226
Woo YM, Hu DW, Larkins BA, Jung R (2001) Genomics analysis of genes expressed in maize endosperm identifies novel seed proteins and clarifies patterns of zein gene expression. Plant Cell 13: 2297-2317
Yim YS, Davis G, Duru N, Musket T, Linton EW, Messing J, McMullen MD, Soderlund C, Polacco M, Gardiner J, Coe EH Jr (2002) Characterization of three maize BAC libraries toward anchoring of the physical map to the genetic map using high density BAC filter hybridization. Plant Physiol 130: 1686-1696
© 2002 American Society of Plant Biologists