The differential transcription network between embryo and endosperm in the early developing maize seed.

Transcriptomic analysis of maize seed soon after pollination aids understanding of how maize embryo and endosperm are differentially regulated in the early development stage. Transcriptome analysis of early-developing maize (Zea mays) seed was conducted using Illumina sequencing. We mapped 11,074,508 and 11,495,788 paired-end reads from endosperm and embryo, respectively, at 9 d after pollination to define gene structure and alternative splicing events as well as transcriptional regulators of gene expression to quantify transcript abundance in both embryo and endosperm. We identified a large number of novel transcribed regions that did not fall within maize annotated regions, and many of the novel transcribed regions were tissue-specifically expressed. We found that 50.7% (8,556 of 16,878) of multiexonic genes were alternatively spliced, and some transcript isoforms were specifically expressed either in endosperm or in embryo. In addition, a total of 46 trans-splicing events, with nine intrachromosomal events and 37 interchromosomal events, were found in our data set. Many metabolic activities were specifically assigned to endosperm and embryo, such as starch biosynthesis in endosperm and lipid biosynthesis in embryo. Finally, a number of transcription factors and imprinting genes were found to be specifically expressed in embryo or endosperm. This data set will aid in understanding how embryo/endosperm development in maize is differentially regulated.

Transcriptome analysis of early-developing maize (Zea mays) seed was conducted using Illumina sequencing. We mapped 11,074,508 and 11,495,788 paired-end reads from endosperm and embryo, respectively, at 9 d after pollination to define gene structure and alternative splicing events as well as transcriptional regulators of gene expression to quantify transcript abundance in both embryo and endosperm. We identified a large number of novel transcribed regions that did not fall within maize annotated regions, and many of the novel transcribed regions were tissue-specifically expressed. We found that 50.7% (8,556 of 16,878) of multiexonic genes were alternatively spliced, and some transcript isoforms were specifically expressed either in endosperm or in embryo. In addition, a total of 46 trans-splicing events, with nine intrachromosomal events and 37 interchromosomal events, were found in our data set. Many metabolic activities were specifically assigned to endosperm and embryo, such as starch biosynthesis in endosperm and lipid biosynthesis in embryo. Finally, a number of transcription factors and imprinting genes were found to be specifically expressed in embryo or endosperm. This data set will aid in understanding how embryo/endosperm development in maize is differentially regulated.
Maize (Zea mays) seeds are one of the most important crop materials that provide resources for food, feed, biofuel, and raw material for processing. Maize seed development initiates from double fertilization, in which two of the pollen sperms fuse with an egg cell and a central cell to produce embryo and endosperm, respectively (Randolph, 1936;Chaudhury et al., 2001). The main function of endosperm is to provide nutrient for the developing embryo and germinating embryo.
After fertilization, the zygote undergoes an asymmetric division into a small apical cell and a large basal cell, giving rise to the embryo proper and the suspensor, respectively. The radial symmetry of the proembryo is shifted to a bilateral symmetry at the transition stage, which is characterized by protoderm formation. The shoot apical meristem and the root apical meristem can be distinguished at the onset of the coleoptile stage, and then the position of the future coleoptile is marked by a small protuberance. The mature embryo is composed of the embryo axis, which is formed by the plumule with five or six short internodes, and leaf primordia and primary root surrounded by the coleoptile and the coleorhiza, respectively, and the scutellum (Randolph, 1936;Abbe and Stein, 1954;Diboll, 1968;Lammeren, 1986;Vernoud et al., 2005).
Maize endosperm follows the nucleus-type endosperm development, where the fertilized central cell undergoes several rounds of synchronous division in the absence of cell wall formation and cytokinesis to produce a syncytium (Lopes and Larkins, 1993;Olsen, 2004). Cellularization allows the formation of the internuclear radial microtubule systems and open-ended alveolation from the periphery of the endosperm toward the central vacuole (Olsen et al., 1999;Olsen, 2004). Four major cell types of the maize endosperm are differentiated: transfer cells, aleurone cells, starchy endosperm cells, and embryo-surrounding region cells.
During maize endosperm development, there are three different types of cell cycles (Kowles and Phillips, 1988): (1) cytokinetic mitosis results in a syncytium; (2) mitosis occurs after cellularization and lasts in the central endosperm but continues in the aleurone and subaleurone layers, and cell division occurs and stops in a wave-like pattern from the basal region to the central region of the endosperm; and (3) endoreduplication, the reiterated rounds of DNA replication without chromatin condensation, sister chromatid segregation, or cytokinesis, results in endopolyploidy cells. Embryo and endosperm are both seed compartments exhibiting dramatic differences in multiple respects. Maize endosperm is not a transient tissue like Arabidopsis (Arabidopsis thaliana) endosperm, which is absorbed at a later stage of seed development.
Gene expression is a key event to determine embryo and endosperm development. Some genes have been functionally characterized in maize seed development. For example, Crinkly4 and defective kernel1 contribute to aleurone differentiation during maize endosperm development (Jin et al., 2000). BETL1 (for basal endosperm transfer layer gene), which is detectable at 12 d after pollination (DAP), is a key marker for transfer cell differentiation (Hueros et al., 1995(Hueros et al., , 1999a(Hueros et al., , 1999b. The outer cell layer gene family, which encode putative HD-ZP IV transcription factors (TFs), play a role in protoderm formation and maintenance (Ingram et al., 1999(Ingram et al., , 2000. The lipid transfer protein2 gene is specifically expressed in the abaxial protoderm of the scutellum and coleoptile (Sossountzov et al., 1991). knotted1 contributes to the formation of embryo shoot apical meristem (Smith et al., 1995;Kerstetter et al., 1997). However, a complete gene expression profile of embryo and endosperm development in maize is still lacking.
The transcriptome is the overall set of transcribed regions of the genome. Transcriptomic analysis of the embryo and endosperm is essential for understanding the developmental process of these two tissues. Recently, development of the next-generation high-throughput DNA sequencing technologies has provided a robust tool for mapping and quantifying the transcriptome, RNA-seq. RNA-seq data are highly reproducible, with few systematic discrepancies among technical replicates (Marioni et al., 2008). RNA-seq technology has been applied to uncovering the entire transcriptome and identifying alternative splicing (AS) and novel transcribed regions (NTRs) as well as chimeric transcripts produced by trans-splicing in human, yeast, mouse, Arabidopsis, and rice (Oryza sativa; Cloonan et al., 2008;Lister et al., 2008;Mortazavi et al., 2008;Nagalakshmi et al., 2008;Pan et al., 2008;Sultan et al., 2008;Wang et al., 2008;Wilhelm et al., 2008;Filichkin et al., 2010;Lu et al., 2010;Zhang et al., 2010).
So far, transcriptomic analyses have been carried out in maize with pericycle cells of the primary root (Dembinsky et al., 2007), endosperm (Prioul et al., 2008), seedling (Fu et al., 2010), leaf bundle sheath , kernel and leaf meristem (Kakumanu et al., 2012), and shoot apical meristem (Takacs et al., 2012). However, to our knowledge, the genome-wide transcriptional profile of embryo and endosperm by RNA-seq technology in maize has not yet been performed. To investigate the transcriptional network that governs seed development in maize, we conducted RNA sequencing using 9-DAP embryo and endosperm to profile gene expression. We explored these data with informatics tools to depict the landscape of the transcriptional network as well as major metabolic activities that are associated with embryo/endosperm development in maize.

Overview of the Transcriptome in 9-DAP Maize Seeds
To obtain an overview of the transcription profile in early-developing maize seeds, we performed highthroughput RNA-seq, utilizing paired-end Illumina sequencing technology, on poly(A)-enriched RNAs isolated from 9-DAP endosperm and embryo tissues (Supplemental Fig. S1). Because maize seed development is highly sensitive to environmental conditions, we present the structural details of the embryo and endosperm samples harvested for RNA-seq analysis (Supplemental Fig. S2). The endosperm already completed differentiation, with the aleurone and transfer cell as well as starchy endosperm cells formed (Supplemental Fig. S2, A and B). The embryo was around 1.3 mm in length and characterized by the emerging leaf primordia (Supplemental Fig. S2C). To further characterize the developmental stage of the maize seeds, expression profiles of storage protein genes were specifically compiled using the RNA-seq data generated in this study (Supplemental Table S1). Previous studies showed that the expression of zein genes is temporally regulated during endosperm development (Kodrzycki et al., 1989;Woo et al., 2001). In our RNA-seq data, two genes, 18-kD d-zein and 19-kD a-zein, were most abundantly expressed. Notably, 27-kD g-zein, which was reported to be expressed at 10 to 25 DAP (Woo et al., 2001), was not detected in our data. All these characterizations are indicative of an early development stage of the maize seeds analyzed in this study.
Sequencing resulted in 11,074,508 and 11,495,788 paired-end reads (100-nucleotide read length) from endosperm and embryo, respectively. The generated reads were then aligned to the maize genome (ZmB73_ RefGen_v2; Schnable et al., 2009) using three aligners, Burrows-Wheeler Alignment (Li and Durbin, 2009), Bowtie (Langmead et al., 2009), and TopHat , and the combined results were used for further analysis (see "Materials and Methods"). We mapped almost 87% of the reads from both samples to the reference genome, of which nearly 86% with both ends were correctly aligned (Table I). As expected, most (nearly 97%) of the reads (68.1% exonic and 28.8% junction) were mapped to annotated gene bodies, demonstrating that the majority of the detected genes have been predicted in the maize annotation. Additionally, 3.3% of reads were mapped to intergenic regions (Fig.  1A). All junction reads corresponded to 97,938 unique splicing junction sites, of which 78,592 junctions (80.2%) were identical to the annotation. The remaining 19,346 junctions (19.8%), however, were newly discovered from our RNA-seq data and have yet to be incorporated into transcript models (Fig. 1B).
The aligned reads were then subjected to Cufflinks assembly (Trapnell et al., 2010), leading to the identification of 120,828 unique exons from both samples (Fig. 1C). Among these, 89,027 exons (73.7%) were consistent with boundaries of annotated exons; the remaining were newly identified ones, mostly from either intergenic (46.36%) or intragenic (45.56%) regions (Fig. 1C). The newly detected intragenic exons and junctions indicated novel transcript isoforms to be annotated or some existing gene models to be refined. However, exons and junctions detected from intergenic regions revealed the existence of some NTRs (see below). The number of detected exons and junctions from endosperm and embryo are also listed separately in Table I.
To gain a global overview of a transcriptome on the genome, the read distribution along each chromosome (divided into 100 windows) was constructed ( Fig. 1D; Supplemental Fig. S3). The overall pattern is correlated with the gene distribution across the chromosomes. However, some regions showed quite different patterns between endosperm and embryo, indicating that genes annotated within these regions were differentially expressed. For example, two differentially expressed genes (GRMZM2G304745, highly expressed in endosperm, and GRMZM2G080054, highly expressed in embryo) were located in such regions (Fig. 1D). The differential expression of these two genes was confirmed by quantitative reverse transcription (qRT)-PCR (Supplemental Table S9).

Discovery of NTRs
We observed that a sizable portion of the RNA-seq reads did not fall within annotated regions of the maize genome ( Fig. 1). To comprehensively identify NTRs that are not linked to any annotated gene models, we developed a computational pipeline that integrates RNAseq data with available annotation data and consists of several highly stringent filtering steps ( Fig. 2A; see "Materials and Methods"): (1) more than one exon per transcript; (2) transcript length (total length of exons in bp) longer than 300 nucleotides; and (3) expression level greater than 1 reads per kilobase of exon model per million mapped reads (RPKM). In addition, we required that these NTRs were at least 300 nucleotides distant from an annotated gene. Using these criteria, we identified 1,286 NTRs containing 2,043 multiexonic transcripts (Supplemental Data S1).
Previous studies have shown that NTRs have fewer and shorter exons, less protein-coding potential, and less tissue-specific expression than annotated protein-coding genes (Bruno et al., 2010;Lu et al., 2010;Wetterbom et al., 2010;Zhang et al., 2010;Aanes et al., 2011;Graveley et al., 2011). To determine whether endospermic and embryonic NTRs have similar features, we analyzed the structure, coding potentials, and expression levels of the NTRs. We found that 80.5% of the NTRs had fewer than five exons and that only 26 transcripts (1.27%) had more than 10 exons. Moreover, NTRs had fewer exons per transcript (about 3.4) than the average protein-coding genes (about 5.2; Fig. 2B). From another point of view, we observed that the novel transcripts were on average about one-half of the length of protein-coding transcripts (median size of 746 bp for novel transcripts versus 1,419 bp for protein-coding transcripts; Fig. 2C). The average expression levels (calculated as RPKM; see "Materials and Methods") of NTRs were comparable to those of annotated transcripts (Fig. 2D). However, the expression pattern of NTRs was quite different between endosperm and embryo, with 45.3% (583 of 1,286) of NTRs showing more than 2-fold change between these two tissues ( Fig. 2E), indicating that these NTRs were tissuepreferentially expressed.
We found that a small fraction (312; 15.3%) of NTRs was supported by available EST data (greater than 80% identity and 80% coverage). Nearly 60% (1,210 of 2,043) of the NTRs that have open reading frames (ORFs) with the potential to encode proteins with greater than 100 amino acids (Fig. 2F) may be bona fide novel protein-coding genes. The remaining NTRs could encode small peptides, but many are likely to serve as noncoding RNAs. Besides, we observed that some (490; 24.0%) NTRs are heterochromatic, with sequence similarity (greater than 100-nucleotide match and 80% identity) to transposable elements (TEs; Fig. 2F), indicating the presence of substantially active TEs or TE  2F). An NTR was found located between GRMZM2G063253 and GRMZM2G063754. We used the RNA-seq data to infer the structure of this NTR and identified at least nine overlapping transcripts within this region (Fig. 3A). Interestingly, all these transcripts had predicted ORFs Figure 1. Overview of the maize seed transcriptome of embryo and endosperm at 9 DAP. A, Pie chart indicating the proportion of RNAseq reads assigned to maize annotated genomic features. Two samples were combined for analysis. B, Shared and unique splicing junctions from annotation and support with junction reads. C, Known and novel exons assembled from RNA-seq reads. The newly identified exons can be assigned to four categories according to their genomic locations: antisense to known genes (blue), intergenic regions (brown), annotated intronic regions (violet), and overlapped with known exons but with distinct boundaries (exonic; green). D, Distribution of RNAseq reads (calculated as RPKM) and annotated genes (dashed red line) across chromosome 2. The whole chromosome was divided into 100 bins for visualization. Two tissue-specific genes (GRMZM2G304745 for endosperm and GRMZM2G080054 for embryo) are shown in the right panel. For the distribution from other chromosomes, see Supplemental Figure  but lacked EST evidence, and they were unambiguously divided into two groups based on junction reads, with one (containing five transcripts) being exclusively expressed in embryo and another (containing four transcripts) in endosperm (Fig. 3A). Moreover, these two groups of RNAs differed mainly in the two variant junction regions (Fig. 3A). It remains to be elucidated whether these RNAs function as regulatory and/or peptide-encoding RNAs in maize seed development.
To confirm the existence of the NTRs we observed in the RNA-seq data, we performed semiquantitative reverse transcription (RT)-PCR on some of the NTRs selected (Supplemental Fig. S4). Of 18 NTRs analyzed, 16 (89%) were detected either in embryo or endosperm or both, and 11 (69%) showed expression patterns consistent with the transcriptomic data. For example, NTR.g0111 was specifically expressed in the embryo (Supplemental Fig. S4).

AS Is Differentially Regulated between Embryo and Endosperm
Transcript AS, a universal phenomenon in higher eukaryotes, is considered a key factor in increasing the diversity of the transcriptome and proteome (Nilsen and Graveley, 2010;Kalsotra and Cooper, 2011). To investigate the role of AS in regulating gene expression in maize seed, we conducted surveys of transcript isoforms in the embryo and endosperm. We first estimated the number of maize multiexonic genes with AS by calculating the fraction of genes with more than one expressed transcript divided by the number of genes with at least one splice junction in our RNA-seq data (multiexonic genes, including NTRs). We found that 50.7% (8,556 of 16,878) of multiexonic genes were alternatively spliced, a little smaller than the most recent estimate based on RNA-seq in the maize leaf transcriptome (56.4%; Li et al., 2010) but larger than that in Figure 2. Discovery and description of NTRs. A, Overview of the RNA-seq-based transcript construction pipeline that was employed to identify NTRs. The main six steps of the approach are numbered on the left. Reads were mapped to the maize genome using TopHat, and initial transcripts were separately assembled by Cufflinks in each sample (step 1). After filtering the annotation-overlapped transcripts (step 2), the novel exons and junctions (step 3) were used to reconstruct the final transcript structures (step 4). These transcripts were further subject to size selection (step 5, multiple exon with length greater than 300 bp) and expression filter (step 6, RPKM . 1). This leads to identifying 1,286 NTRs with 2,043 distinct transcript isoforms. For details, see "Materials and Methods." B, Distribution of exon numbers of novel transcripts identified in this study. The majority have two exons. The numbers of average exons for NTRs and protein-coding genes (PC) are also shown. C, Length distribution of annotated genes and newly identified transcribed regions. Novel transcripts (median size of 746 bp) are much smaller than annotated maize transcripts (median size of 1,419 bp). D, Expression pattern of novel transcripts compared with annotated transcripts. E, Plot showing the different expression levels of the identified NTRs between embryo (orange) and endosperm (violet). For visualization, data points are sorted according to the expression levels from embryo. F, NTRs supported by ORF and EST evidence. ORFs were identified by getorf in the EMBOSS packages, and EST-supported criteria are as follows: identity greater than 80% and coverage greater than 80%. For an example of an NTR, see Figure  the annotation (42.6%; 12,669 of 29,709), further illustrating the strong power of RNA-seq technology for examining splicing diversity (Wang et al., 2009).
We observed that some transcript isoforms were specifically expressed in either endosperm or embryo. For example, the splicing pattern of the NTR mentioned . Splicing dynamics of maize seed development. A, An example illustrating a newly identified transcribed region (chromosome 6: 123,740,814-123,743,530) and its developmentally regulated splicing events. The transcript models were quite different between embryo (orange) and endosperm (violet), as indicated by the two groups of variant junctions (colored shading). The involved three types of AS events (defined in C) in this region are indicated (middle box). The peak and junction reads of RNA-seq data are shown below. B, AS of GRMZM2G146599. Three types of AS events (defined in C) from this gene locus are listed below that RNA-seq data track. Black arrows indicate a novel junction that was not incorporated into existing gene models identified within the gene, and the red arrow indicates that the tissue-specific transcript GRMZM2G146599_T3 was expressed from endosperm. C, Analysis and categorization of eight common types of AS events. The diagram represents the eight different AS event types (left), and the numbers of events are shown as a bar chart (right). Total events included annotated and newly identified splicing events. The numbers of events with both isoforms supported by RNA-seq data from embryo (orange bars) and endosperm (violet bars) are indicated. Events detected as tissue regulated (Fisher's exact test, P , 0.05) are shown with red bars. [See online article for color version of this figure.] above was quite different between endosperm and embryo (Fig. 3A), and one transcript isoform (GRMZM2G146599_ T03) of the gene GRMZM2G146599 was only detected in endosperm based on junction-mapping analysis (Fig. 3B). Based on this observation, we systemically examined splicing differences between endosperm and embryo in seed development. For this purpose, we carried out a computational analysis to categorize all splicing models into eight common types of AS events (Reddy, 2007;Keren et al., 2010; see "Materials and Methods"; Fig. 3C). We identified a total of 15,504 splicing events involving 7,991 multiexonic genes (Supplemental Table S2). Retained intron, in which a single intron is alternatively included or spliced out of the mature mRNA via an intron-definition splicing mechanism, was the most prevalent type of AS event (30.4%; Fig. 3C), consistent with previous studies in plants (Black, 2003;Wang and Brendel, 2006;Barbazuk et al., 2008;Filichkin et al., 2010;Li et al., 2010;Lu et al., 2010;Zhang et al., 2010;Marquez et al., 2012) but in contrast to animal AS events, where exon skipping (cassette exons or coordinate cassette exons) is the predominant mechanism (Sultan et al., 2008;Wang et al., 2008;Daines et al., 2011;Graveley et al., 2011). Furthermore, alternative 39 splice sites (A3SS; 21.0%) or alternative last exons (ALE; 11.5%) and alternative 59 splice sites (A5SS; 12.2%) or alternative first exons (AFE; 11.7%) were other types of common AS events observed in our data (Fig. 3C), in agreement with recent findings in rice , Arabidopsis (Marquez et al., 2012), human , and Drosophila melanogaster (Daines et al., 2011;Graveley et al., 2011). In addition, a higher frequency of tissueregulated events of each type and more junctions were detected in embryo than in endosperm ( Fig. 3C; Table I). However, it remains to be answered if these observations are simply due to numbers of mRNAs per gene detected or types of mRNAs observed. Indeed, we detected more junctions from RNA-seq data in embryo than in endosperm. Finally, by adopting a similar method  to assess tissue-regulated AS, we showed that 7.0% of AS events were differentially regulated in embryo and endosperm (Fisher's exact test, P , 0.05; Fig. 3C). Gene Ontology (GO) analysis of the genes involved in tissue-regulated AS events revealed that these genes were functionally enriched in diverse biological processes (Supplemental Table S3), indicating that transcript processing is a prevalent phenomenon in seed development. In addition, Pearson correlation analysis revealed that NTRs showed higher levels of AFE and ALE events, while protein-coding genes had higher frequency of A3SS and A5SS events. To further confirm the AS events detected in our transcriptomic data, we selected some AS events to validate with RT-PCR analysis. Among the 20 events, 15 (75%) was detected by RT-PCR.
In summary, our analysis shows that AS is an important contributor to the extensive transcriptome complexity in maize seeds and that the transcript isoform abundance seems higher in embryo than in endosperm.

Trans-Splicing Events Occur in Both Embryo and Endosperm
Trans-splicing is one of the RNA-processing mechanisms in which AS is carried out under "trans-mode" by joining exons on separate precursor transcript molecules (Nilsen and Graveley, 2010). Trans-splicing commonly occurs in unicellular organisms but much less often in higher eukaryotes. To identify trans-splicing events in the maize seed, we first used TopHat-Fusion (Kim and Salzberg, 2011) to uncover the potential fusion transcripts, and then we dug out fusion transcripts with fusion points joined by two boundaries of annotated exons from distinct gene models. We required that a candidate fusion transcript produced by a trans-splicing event be supported by five spanning reads and two supporting pairs. As a result, we found a total of 46 trans-splicing events in our data set, with nine intrachromosomal events and 37 interchromosomal events (Supplemental Table S4). Of the intrachromosomal fusions, four came from neighboring genes and the other five were chimera distant genes (Fig. 4A). Although most events occurred in both embryo and endosperm, we observed that some fusion transcripts were subjected to tissue-specific regulation (Fig. 4, B and C). Furthermore, we found that the expression levels of fusion transcripts were lower than those of their precursors, consistent with a previous study in rice .
We randomly chose nine trans-spliced fusion transcripts to validate our observation in the RNA-seq data. Among them, the existence of six fusion transcripts was confirmed with RT-PCR analysis and sequencing. Except for TS12, which showed embryo-specific expression, the other detected trans-spliced fusion transcripts displayed expression patterns similar to the RNA-seq results (Supplemental Table S4; Supplemental Fig.  S5).

Diverse Biological Processes in Maize Seed
RNA-seq has been proved to be a robust tool for the measurement of gene expression in a manner that is more sensitive than other methods, such as traditional hybridization-based microarray technologies (Wang et al., 2009;Wilhelm and Landry, 2009). We thus used our RNA-seq data to calculate the expression levels of annotated genes, transcripts, and NTRs uncovered in this study. The gene expression values were calculated as RPKM (Mortazavi et al., 2008) in each of the samples (see "Materials and Methods"), resulting in 68.5% (27,167) of the annotated genes with detectable expression signals (RPKM . 0; Fig. 5A; Supplemental Table S5). Although most genes were commonly expressed in both tissues, we found that the number of genes expressed in embryo was greater than that in endosperm (Fig. 5A). Using RPKM . 1 as a cutoff for gene expression, we detected a total of 19,904 genes (50.2% of the annotated genes) expressed in the maize seed transcriptome, with 18,384 and 17,100 genes being expressed in embryo and endosperm, respectively, and with 15,580 genes (78.3%) being commonly expressed in both tissues (Fig. 5A). Furthermore, we identified 2,982 up-regulated genes (1,422 in endosperm and 1,560 in embryo) that were highly differentially expressed (with greater than 4-fold change) between endosperm and embryo ( Fig. 5B; Supplemental Table S6), representing 15.0% of the seed transcriptome of 19,904 genes identified in the 9-DAP embryo and endosperm in this study. GO enrichment analysis of the genes up-regulated in endosperm showed that many genes encode proteins functioning in nutrient reservoir activity (P , 2.7 3 10 223 , false discovery rate [FDR] , 2.0 3 10 220 ) and involved in carbohydrate and storage protein metabolic processes (P , 1.6 3 10 25 , FDR , 0.019; Fig. 5C; Supplemental Table S7). In contrast, the genes up-regulated in embryo showed enrichment of biological processes mainly acting in the nuclei, such as chromatin regulation, nucleosome organization, DNA packaging, and transcriptional regulation ( Fig. 5D; Supplemental Fig. S7).
Together, these data demonstrate that the major biochemical differences between embryo and endosperm are reflected in part by highly dynamic, coordinated, and localized transitions in transcriptome abundance, and the results are consistent with the knowledge that the endosperm provides the stored nutrients to feed the embryo and that the embryo develops through a series of cell divisions to pass the genetic information to its offspring.

Survey of the Gene Expression Regulators in Maize Seed
TFs are important regulators for the regulation of gene expression in the plant genome. To explore the accumulation of TFs in the maize seed, we examined the expression of TFs in our RNA-seq data. Of the 1,982 expressed TFs (with RPKM . 1) in the two seed tissues, 937 were differentially expressed (greater than 2-fold change) between embryo and endosperm and were grouped into 61 different TF families (Fig. 6B). Overall, most of the TF families showed tissue-specific expression patterns for either embryo or endosperm ( Fig. 6B;  Supplemental Fig. S17). Many TFs highly expressed in embryo probably have diverse functions in seed development. Among them, several regulators identified may contribute to epigenetic inheritance and reprogramming across generations , including genes encoding histone acetyltransferases and deacetylases, DNA methyltransferases, bromodomain proteins, and the Alfin-like family. Several TF families were found to participate in cell differentiation and vascular development, such as the zf-HD family The key is shown in the box at right. In C, the enriched GO "biological process" and "molecular function" categories are shown (no "cellular component" terms for up-regulated genes from endosperm). In D, only the "biological process" category is shown. For "molecular function" and "cellular component" categories, see Supplemental Figure  (GRMZM2G328438), Myb-related (GRMZM2G158117), ARF family (GRMZM5G874163), TCP (GRMZM2G093895), and HB family (GRMZM2G017087, GRMZM2G162481, and GRMZM2G087741), as well as those implicated in hormone signaling pathways, such as B3 domaincontaining factor (Stone et al., 2001), ABI3/VP1 (abscisic acid signaling), ARF and Aux/IAA (auxin signaling; Schruff et al., 2006;Liu et al., 2007;Sreenivasulu et al., 2008;Xing et al., 2011), GRAS (gibberellin signaling), and ARR (cytokinin signaling) homologs. Several TF family members are likely to play a role in cell fate determination, including C2C2-YABBY (GRMZM2G167824 and GRMZM2G529859) and GeBP (GRMZM2G036966) homologs, as well as in cell growth regulation and germination, including WRKY (Zhang et al., 2011a;GRMZM5G816457) and bHLH (GRMZM2G042920) homologs. On the other hand, the majority of TFs enriched in endosperm were families such as MADS, SBP, NAC, bZIP, Myb, and C2C2-GATA, most of which have been implicated in seed development (Sreenivasulu et al., 2008;Bemer et al., 2010;Le et al., 2010;Agarwal et al., 2011). A total of 33 TF genes were selected for confirmation of the differential expression between embryo and endosperm with real-time RT-PCR analysis. Among the 14 up-regulated genes in endosperm, 12 genes showed expression patterns consistent with RNA-seq data. All of the 11 genes up-regulated in embryo showed expression patterns more or less the same as the RNA-seq results (Supplemental Table S9).
Additionally, it is notable that the relevant core regulators in the small RNA pathway, such as Dicerlike (DCL), Argonaute (AGO), and members of RNAdependent RNA polymerase (RDR) gene family, showed relatively higher expression levels in embryo (Fig. 6C), indicating the potential important biological roles of this pathway for embryo development. Interestingly, we found that DCL1, the crucial component in the microRNA pathway, and RDR2 were up-regulated in endosperm (Fig. 6C). Finally, we observed that most of the surveyed imprinting genes (Zhang et al., 2011b) were preferentially expressed in endosperm. As expected, 41 out of the 45 highly differentially expressed genes (with greater than 4-fold change) were found in endosperm (Supplemental Table S10), consistent with the previous observation that gene imprinting occurs primarily in the endosperm of flowering plants (Huh et al., 2007).
These results demonstrate that our RNA-seq data greatly reflected the major reprogramming events in the seed transcriptome for the transcriptional regulation of gene expression.

DISCUSSION
In this study, we performed deep transcriptomic surveys in maize seed. Using high-throughput RNA sequencing technology (RNA-seq), we mapped in detail the transcriptional differences between the embryo and endosperm. Analysis of the RNA-seq data revealed a complex landscape of the transcriptional network governing maize seed development.

Embryo and Endosperm Are Characterized by Different Metabolic Activities
Among the genes expressed at levels of RPKM . 1, 1,520 genes were specifically expressed in endosperm and 2,804 in embryo (Fig. 5A), demonstrating a more complex biological process in embryo than in endosperm. The main function of endosperm is to provide nutrients for embryo development. Four kinds of cell type can be differentiated in endosperm, including the starchy endosperm, the aleurone layer, the transfer cells, and the embryo-surrounding region (Olsen, 2004). The starch endosperm cells constitute the major part of the endosperm and accumulate starch and storage protein (Olsen, 2004), and the genes involved in nutrient reservoir, storage protein accumulation, and carbohydrate metabolic process were found to be highly expressed in endosperm in this study (Fig. 5,C and D;Supplemental Tables S7 and S8).
The main function of the embryo is to transfer genetic information into the next generation through a series of cell divisions and cell differentiation. The genes involved in genetic information transfer, such as chromatin regulation, nucleosome organization, and DNA packaging, were found to be highly expressed in embryo, likely to maintain, at least in part, the genome fidelity ( Fig. 5D; Supplemental Fig. S6; Supplemental Table S7). It is known that the lipid is mainly stored in embryo (Barthole et al., 2012); consistent with this, lipid biosynthesis genes were preferentially expressed in embryo (Fig. 5D).

Embryo and Endosperm Exhibit Differential AS Patterns
Gene expression regulation occurs at different levels, and transcription regulation is one of the key players. RNA-seq technology offers a powerful tool to uncover the complexity of transcriptional expression and regulation.
AS, a process with exons of pre-mRNA spliced in different arrangements to produce structurally and functionally distinct transcripts, is an essential mechanism to contribute to increasing transcriptome plasticity and proteome diversity in eukaryotes (Blencowe, 2006). It was reported that at least 42% and 48% of genes are alternatively spliced in Arabidopsis and rice, respectively (Filichkin et al., 2010;Lu et al., 2010).
The splicing events function in plant development and response to the environment. For example, alternative processing of the rice WAXY gene contributes to glutinous rice (Isshiki et al., 1998), and that of the disease resistance gene RPS4 is dynamically regulated during the resistance response (Zhang and Gassmann, 2007); FCA, which undergoes both polyadenylation and AS, is involved in the regulation of flowering time (Macknight et al., 2002); SR45.1 and SR45.2, the two alternatively spliced isoforms of SR45, play a major role in flower petal development and root growth, respectively (Zhang and Mount, 2009); alternative HYH transcript contributes to the increased activity of the HYH-Hy5 gene pair (Sibout et al., 2006); and a C2H2domain protein with alternatively spliced transcripts is essential for endosperm development in Arabidopsis (Lu et al., 2012). Additionally, AS has also been functionally implicated in nutrient metabolism. For example, the Arabidopsis TF gene IDD14 generates two splicing variants to competitively form nonfunctional heterodimers to regulate starch metabolism (Seo et al., 2011). In our RNA-seq data of maize embryo and endosperm, 50.7% of multiexonic genes were found to undergo AS, which is higher than that in Arabidopsis and rice (Filichkin et al., 2010;Lu et al., 2010). This indicates that the splicing diversity of the maize transcriptome is more complex than that in Arabidopsis and rice. Interestingly, 7.0% of AS events whose genes are involved in diverse biological processes were tissue specific between embryo and endosperm ( Fig. 3; Table I), and more complex AS events were observed in embryo than in endosperm ( Fig. 3C; Table I). However, it remains to be answered whether higher transcript isoform abundance is required for embryo development than for endosperm.

Differential Regulatory Mechanisms in Embryo and Endosperm
In our data set, 1,982 TFs were detected in embryo and endosperm with RPKM . 1, and approximately one-half of them (937) were differentially expressed (greater than 2-fold change) between embryo and endosperm ( Fig. 6B; Supplemental Fig. S17), implicating a differential transcriptional regulation between embryo and endosperm. It was previously reported that TFs of MADS function in seed and endosperm development (Kang et al., 2008;Sreenivasulu et al., 2008;Bemer et al., 2010;Le et al., 2010;Agarwal et al., 2011;Hehenberger et al., 2012). AGL62, a MADS protein of Arabidopsis, is the direct target of FIS Polycomb group repressive complex2 (PRC2), establishing the molecular basis for FIS PRC2-mediated endosperm cellularization (Hehenberger et al., 2012). In our study, the preferential expression of members of the MADS TF family in maize endosperm probably also indicates an important function of these genes in maize endosperm development (Fig. 6B). TFs involved in hormone signaling such as abscisic acid signaling (ABI3/VP1), auxin signaling (ARF and Aux/IAA), gibberellin signaling (GRAS), and cytokinin signaling (ARR) were up-regulated in embryo, suggesting an important role of the hormone in embryo development (Fig. 6B). Some TFs highly expressed in embryo were found to be probably involved in cell differentiation and vascular development (Fig. 6B), such as zf-HD, Myb-related, ARF family, TCP, and HB family.
Several gene expression regulators highly expressed in embryo contribute to epigenetic inheritance and reprogramming across generations, including genes encoding histone acetyltransferase and deacetylase, DNA methyltransferase, bromodomain protein, and the Alfin-like family (Fig. 6B). Previous studies in Arabidopsis support the idea that transposable elements reactivated in endosperm might enhance the silencing of transposable elements in embryo, and by sacrificing genomic integrity, endosperm might make an epigenetic rather than genetic contribution to the progeny (Hsieh et al., 2009;Mosher and Melnyk, 2010). The RNA-directed DNA methylation (RdDM) pathway functions as the transposable element silencing (Zhang and Zhu, 2011). The genes involved in the RdDM pathway, such as DCL, AGO, and the RDR gene family, were up-regulated in embryo (Fig. 6C). Indeed, in Arabidopsis, the small RNAs generated from the RdDM pathway are specifically produced from and imprinted in endosperm (Mosher et al., 2009). Unlike Arabidopsis, the core regulators in the small RNA pathway expressed in maize embryo indicate that the small RNAs involved in epigenetic silencing of TE in maize are probably produced from embryo, and the producing mechanism and function of small RNAs need to be further elucidated.
The results provided in this study offer an initial step toward the identification of the genes that are specifically expressed in maize embryo or endosperm and shed light, at least in part, on molecular differences between embryo and endosperm at different levels, including transcription, transcript splicing, and transcription regulation as well as epigenetic regulation. In particular, it would be tempting to functionally characterize transcripts, genes, and pathways that are preferentially or specifically expressed in either of the tissues by integrating genetic, biochemical, cytological, and molecular approaches to further our understanding of maize seed development.

Plant Material
The maize (Zea mays) inbred line B73 was grown in the field in the summer of 2009 in Langfang, Hebei province, China. Ears were bagged before silk emergence. Each set of inbred kernels was generated on the same day by selfpollination. At 9 DAP, endosperm and embryo were isolated with tweezers and collected in 300 mM sorbitol solution with 5 mM MES (pH 5.7) from the ovules; then, they were transferred into tubes, snap frozen in liquid nitrogen, and stored at 280°C until further use. In brief, 100 mg of embryos (around 100) or endosperm was used for RNA extraction using Trizol and 10 mg of total RNA was used for complementary DNA (cDNA) synthesis.

Histological Analysis of Maize Seed
For histological analysis, intact maize kernels or excised embryos (9 DAP) were fixed in a solution of 5 mL of formaldehyde, 5 mL of acetic acid, and 90 mL of 50% ethanol at 4°C for 12 h. Following fixation, materials were dehydrated in a graded ethanol series and then infiltrated and embedded in paraffin (Sigma). The embedded samples were sliced into 8-mm slices on a KD202A microtome. Sections were then counterstained with periodic acid-Schiff reagent (Sigma) and 1% Amido black (Sigma). Observation was conducted with an optical microscope (CX21FS1; Olympus).

RNA-seq Library Sequencing
All libraries were sequenced using the Illumina HiSeq 2000. We sequenced two lanes for maize cells, corresponding to 22.6 million reads (Table I).

Bioinformatics Analysis
To systematically characterize the transcriptome of maize seed development, we employed various publicly available tools for mapping, assembly, and quantification of transcripts and integrated these tools with additional informatics filtering steps to enrich the results for the most robust transcriptome construction. Additionally, we specifically designed a browser database using GBrowse (Stein et al., 2002;version 1.70) to visualize all data in this study (Supplemental Fig. S1). Details of the bioinformatic analyses are provided below.

Annotation Databases
The maize reference genome sequences were downloaded from the maize sequence project (B73 AGPv2; http://www.maizesequence.org/). The gene annotation information was retrieved from the B73 filter gene set (release 5b).

Read Alignment and Assembly
RNA-seq reads from each sample were aligned to the maize reference genome using TopHat version 1.3.2), a gapped aligner capable of discovery splice junctions ab initio. TopHat finds splice junctions (without a reference annotation) using a two-step mapping process. Briefly, TopHat first aligned RNA-seq reads to the genome using Bowtie (Langmead et al., 2009) without gaps to determine a set of "coverage islands" that may represent potential exons. Using this initial mapping as well as the presence of GT-AG and GC-AG genomic splicing motifs (for which the sense strand can be reliably inferred), TopHat built a second set of reference sequences spanning exon-exon junctions. The unmapped reads from the first alignment step were then remapped against this splice junction database to discover all the junctionspanning reads in the sample. TopHat reports aligned results from the two mapping steps in shoot apical meristem format for further analysis.
Aligned reads from TopHat mapping were subjected to Cufflinks (Trapnell et al., 2010; version 1.1.0) for ab initio transcript assembly. Cufflinks assembled exonic and splice junction reads into sample-specific transcriptomes using their alignment coordinates. The exon boundaries identified by Cufflinks were further refined based on splice junction coordinates. The method was mainly based on the following criteria: the start end coordinate of an exon corresponds to the end position of a splice junction and vice versa. We retained all exons with both ends supported by splice junctions. For exons that have multiple 59 or 39 end variances, we retained the outermost coordinates.

Discovery of Splice Junctions
We prepared two databases of possible splice junctions in this study. The first one was based on the merged annotated transcripts, which led to the identification of 166,552 splice junctions from annotated exons. The second database was derived from new junctions identified using TopHat based on Plant Physiol. Vol. 162, 2013 451 RNA-seq data, without relying on the genome annotations. For the ab initio splice junction detection, we chose the parameters that allowed putting intron size between 50 bp and 500 kb. We required a minimum eight-nucleotide junction read overhang across the anchor region. Moreover, we required that a junction have at least five supported reads that map to at least two distinct positions. We produced a total of approximately 98,000 high-confidence splice junctions supported by approximately 7,154,000 reads (Table I), approximately 19,000 of which were novel splice junctions (Fig. 1B).

Discovery of Alternative Spliced Exons
Adjacent exons of multiexonic genes were reconnected in multiple ways via the AS mechanism. Eight different types of AS events were generally recognized: intron retention, cassette exons, mutually exclusive exons, coordinate cassette exons, A5SS, A3SS, AFE, and ALE. The AS events were identified using a method similar to previous studies Graveley et al., 2011). To identify exons that were differentially spliced between samples, read counts from each sample to every exclusion and inclusion isoform were calculated. For each AS event, a two-by-two table, in which reads were divided into tissue of origin (e.g. embryo versus endosperm) and read type (i.e. inclusion versus exclusion isoform), was computed. Fisher's exact test was then used to identify differential splicing between each pair of samples, and Benjamini-Hochberg correction was performed. Those events with an adjusted P value cutoff corresponding to an FDR cutoff of 5% were considered sample-specifically regulated.

Analysis of Trans-Splicing Events
Fusion transcripts were initially discovered by the TopHat-Fusion algorithm (Kim and Salzberg, 2011) with parameters "--keep-fasta-order --no-coveragesearch -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13" We then imposed further filters to get trans-splicing events from these potential fusions: (1) fusion points were formed by the boundaries of annotated exons from distinct gene models; (2) we required five spanning reads and two supporting pairs.

Identification of Novel Transcripts
We proposed a bioinformatics analysis pipeline for the detection of unannotated transcripts (termed NTRs). The six main steps of the approach are outlined in Figure 2A. Specifically, step 1 (mapping and assembly) used TopHat and Cufflinks to obtain the splice junctions and candidate exons from RNA-seq data, respectively. In step 2 (filter known annotation), the exons and junctions that overlap or are directly adjacent (less than 200 bp) to existing annotated transcripts were filtered out. The remaining exons (some representing segmental exons; dashed boxes) were merged or extended according to the supported junction (red polylines), as shown in step 3 (merged and extended exons). During step 4 (novel transcript assembly), a graph is created where nodes were merged exons (solid boxes) and orientated edges (red polylines) between two nodes represented validated junctions. The assembled transcripts were filtered further based on their size (two or more exons, and total size of exons greater than 300 bp, as shown in step 5) and expression (RPKM . 1, as shown in step 6). Additionally, we manually checked the refined transcript models to adjust their orientations and boundaries.

Quantification of Gene Expression Levels
Expression levels were computed as described previously (Mortazavi et al., 2008). Briefly, the expression of a transcript in each RNA-seq sample was calculated in RPKM defined as follows: where C is the number of reads (aligned reads plus junction reads) mapped to the transcript, L is the total exonic length of the transcript, and N is the total number of reads mapped in the sample. Gene expression was computed at both the gene and transcript levels.
For hierarchical clustering analysis, we used clustering software Cluster 3.0 (de Hoon et al., 2004) to perform complete linkage hierarchical clustering on both genes and samples, using uncentered Pearson's correlation as a distance measure. The clustering results were visualized using the Java Treeview program (Saldanha, 2004).

Statistical Analysis
All statistical analyses were performed by using the R language (http:// www.r-project.org/). For gene set analysis, based on the GO term enrichment methods, agriGO (Du et al., 2010) was used to detect the significantly enriched GO terms of gene sets of interest compared with the genome-wide background with an adjusted P value (FDR) cutoff of 0.05.

Real-Time RT-PCR Analysis
Total RNA was isolated from 9-DAP embryo and endosperm using Trizol reagent (Invitrogen). To eliminate any residual genomic DNA, total RNA was treated with RNase-free DNase I (New England Biolabs), and 1 to 2 mg of RNA was used to synthesize first-strand cDNAs using the RevertAid First Strand cDNA Synthesis kit (Fermentas). ACTIN primers were used to detect genomic DNA contamination. Relative quantification values for each target gene were calculated by the comparative threshold cycle method (Livak and Schmittgen, 2001) using ACTIN as an internal reference gene to compare data from different PCR runs or cDNA samples (qRT-PCR). qRT-PCR analysis provided relative changes in gene expression, with the 9-DAP endosperm normalized to a value of 1. Data were statistically analyzed using Student's t test. The results shown are representative of two independent experiments, and within each experiment, treatments were replicated three times unless otherwise stated. For semiquantitative RT-PCR for transcript expression, ACTIN was used as an internal reference to equalize RNA loading into the RT-PCR. After 30 cycles of amplification, PCR products were resolved on a 2% agarose gel and stained with ethidium bromide. All primers used for the RT-PCR analysis are listed in Supplemental Table S11.
Sequence data from this article can be found in the GenBank data library.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Experimental steps for RNA sequencing and strategy for analyses of RNA-seq data.
Supplemental Figure S3. Distribution of RNA-seq reads (calculated as RPKM) and annotated genes (dashed red line) across maize chromosomes.
Supplemental Figure S6. GO enrichment analysis for the up-regulated genes from embryo.
Supplemental Figure S7. Overview of cell functions.
Supplemental Figure S8. Heat map of the lipid synthesis pathway.
Supplemental Figure S9. Heat map of the tricarboxylic acid cycle metabolism pathway.
Supplemental Figure S10. Heat map of the nucleotide synthesis pathway.
Supplemental Figure S11. Heat map of the photosynthesis (Calvin cycle) pathway.
Supplemental Figure S12. Heat map of the terpenoid metabolism pathway.
Supplemental Figure S13. Heat map of the Suc-starch metabolism pathway.
Supplemental Figure S14. Heat map of the protein-targeting pathway.
Supplemental Figure S15. Heat map of the flavonid pathway.
Supplemental Figure S16. Heat map of the carotenoid metabolism pathway.
Supplemental Figure S17. Overview of the expression heat map of TFs in the maize genome.