Transcriptome analysis of proliferating Arabidopsis endosperm reveals biological implications for the control of syncytial division, cytokinin signaling, and gene expression regulation.

During the early stages of seed development, Arabidopsis (Arabidopsis thaliana) endosperm is syncytial and proliferates rapidly through repeated rounds of mitosis without cytokinesis. This stage of endosperm development is important in determining final seed size and is a model for studying aspects of cellular and molecular biology, such as the cell cycle and genomic imprinting. However, the small size of the Arabidopsis seed makes high-throughput molecular analysis of the early endosperm technically difficult. Laser capture microdissection enabled high-resolution transcript analysis of the syncytial stage of Arabidopsis endosperm development at 4 d after pollination. Analysis of Gene Ontology representation revealed a developmental program dominated by the expression of genes associated with cell cycle, DNA processing, chromatin assembly, protein synthesis, cytoskeleton- and microtubule-related processes, and cell/organelle biogenesis and organization. Analysis of core cell cycle genes implicates particular gene family members as playing important roles in controlling syncytial cell division. Hormone marker analysis indicates a predominance of cytokinin signaling during early endosperm development. Comparisons with publicly available microarray data revealed that approximately 800 putative early seed-specific genes were preferentially expressed in the endosperm. Early seed expression was confirmed for 71 genes using quantitative reverse transcription-polymerase chain reaction, with 27 transcription factors being confirmed as early seed specific. Promoter-reporter lines confirmed endosperm-preferred expression at 4 d after pollination for five transcription factors, which validates the approach and suggests important roles for these genes during early endosperm development. In summary, the data provide a useful resource that offers novel insight into early seed development and identifies new target genes for further characterization.


Introduction
Most of the world's food calories come from seed and extensive research has been directed at improving nutritional value and traits such as seed size and number.
However, seed are complex organs and improvement by rational design requires an understanding of the contribution of specific tissues during important stages of seed development.
Seed development in most angiosperms begins with double fertilisation where the haploid egg cell and the double haploid central cell are both fertilised by identical haploid sperm cells contributed from a single pollen grain. This generates the diploid embryo and the triploid endosperm, respectively. The embryo and the endosperm grow rapidly in a coordinated manner that is heavily influenced by the surrounding maternal integument tissues that later form the seed coat (Olsen, 2004;Garcia et al., 2005). During early stages of seed development, maternal resources are mainly used for rapid cell division and growth of new tissues. Once cell division slows, the seed enters a maturation phase during which resources are reallocated to the synthesis of storage compounds, such as starch followed by accumulation of oils and proteins.
Biosynthetic activity then slows as the seed moves through a late maturation phase and prepares to desiccate prior to dormancy.
The developing endosperm plays several important roles during seed development (Berger, 2003;Olsen, 2004). In many plant species, including arabidopsis, the triploid primary endosperm nucleus undergoes several rounds of free-nuclear division, growing rapidly as a syncytium (Olsen, 2001). During the first phase of endosperm to the formation of chalazal-nodules and -cysts. Towards the end of this phase the endosperm consists of approximately 200 nuclear-cytoplasmic domains and the embryo reaches the globular stage of development. Phase four sees the initiation of endosperm cellularisation and reduced rates of mitosis (Scott et al., 1998;Boisnard-Lorig et al., 2001;Ingouff et al., 2005).
A role for endosperm in supporting the formation and growth of the embryo during early stages of development is suggested by the positioning of the CZE endosperm and by the fact that seed with severely defective endosperm cannot complete development (Scott et al., 1998). During the maturation stages, endosperm cellularises and storage reserves are produced that accumulate in the endosperm cells (Sorensen et al., 2002). In plants that have ephemeral endosperms, such as arabidopsis and oilseed rape, the embryo develops at the expense of the endosperm and absorbs these reserves, storing them in the cotyledons (Scott et al., 1998;Olsen, 2004). Seeds that generate large endosperms during the early stages of development produce large embryos at maturity. The early proliferation of the endosperm is therefore associated with the growth of seed and final seed size (Scott et al., 1998;Bushell et al., 2003).
The alteration of the rate and duration of cell division in the endosperm has been proposed as a biotechnological strategy for altering seed size (Tiwari et al., 2006).
Arabidopsis provides an important model system for studying the underlying mechanisms of early seed development. Extensive and rapid analysis of many aspects of seed biology can be conducted in arabidopsis due to the established protocols for producing and analysing mutant and transgenic lines, and the availability of a genome sequence facilitating the generation of tools for high throughput molecular analysis.
However, a major drawback to studying seed biology in arabidopsis is its very small seeds. Laser microdissection (LM) is an important method for obtaining individual tissues or cell types for biochemical analysis. Originally developed for isolating cancerous cells from normal tissue (Emmert-Buck et al., 1996), LM has been used successfully to obtain DNA, RNA, proteins and metabolites from a range of plant species and tissue types (reviewed in Day et al., 2005;2007a;Nelson et al., 2006). It therefore provides an ideal tool for analysing gene expression changes in specific cell types during the early stages of arabidopsis seed development (Spencer et al., 2007).
In a previous study, we compared different methods of transcriptome amplification from small amounts of RNA for use with printed long-oligonucleotide microarrays (Day et al., 2007b). A two-round in vitro transcription (IVT)-based amplification was selected and used to obtain array data for proliferating syncytial endosperm 4 days after pollination (DAP).
This corresponded to the end of the third phase of endosperm development where the syncytial endosperm contains many nuclear-cytoplasmic domains but is prior to cellularisation at 5-6 DAP (Scott et al., 1998;Boisnard-Lorig et al., 2001;Ingouff et al., 2005). The microarray data that formed the basis of this study were generated by hybridising LCM endosperm derived target alongside target from similarly treated silique tissues using a two colour microarray approach (Day et al., 2007b). 18,220 unique probes gave signal higher than two-fold background and t-testing identified 12,710 probes as being significantly differentially expressed between the whole silique and endosperm samples using a p-value cut-off of <0.05. Analysis of embryo, seed coat and endosperm markers within the data indicated that a 2-fold differential expression in the endosperm direction provided a stringent cut-off for identification of endosperm-preferred expression (Day et al., 2007b). This procedure identified 2,568 individual loci as being preferentially expressed in the endosperm.
Here, we present extensive validation of the LCM endosperm array data by qRT-PCR and GUS reporter lines and provide a comprehensive analysis of the endosperm transcriptome in the context of existing online resources. The analysis has enabled novel insight into early endosperm development and has identified 793 genes as having early seed-specific and endosperm-preferred expression.

Microarray analysis from laser microdissected endosperm reliably identifies differential expression in the endosperm.
To ensure that the microarray platform was correctly measuring differential expression between the endosperm and silique samples, sixteen differentially expressed genes from the array data were selected for concurrent qRT-PCR analysis.
Excess amplified RNA produced during the microarray target preparation was used to provide template for the qRT-PCR. The expression ratios produced by qRT-PCR and the microarray experiments were very similar (Table I) and all genes were confirmed as preferentially expressed in the endosperm sample by both the microarray and qRT-PCR.

Identification of endosperm-preferred genes specifically expressed during early seed development using online datasets
To help identify genes with early endosperm specific roles, we searched three online datasets that included a wide range of different tissues and at least one early seed or silique sample. This identified many genes with apparent early silique/seed-specific expression.
MPSS data (available from http://mpss.udel.edu/at/GeneQuery.php) (Meyers et al., 2004) include two independent silique libraries (1-2 DAP) that are consistent with the early proliferative stage of endosperm development, as well as data for inflorescences, leaves, roots and germinating seedlings. We identified 200 genes that showed enrichment in the silique libraries. Of these, 68 were preferentially expressed in the endosperm and were MPSS silique library specific. A summary of the 35 endosperm-preferred genes with the highest MPSS tag frequencies and silique-specific MPSS profiles is given in Figure S1.
The AtGenExpress developmental series (Schmid et al., 2005) uses Affymetrix Genechip technology to profile arabidopsis transcripts from different organs and at different stages of development. This library contains data for seeds dissected from the silique 6 DAP onwards but the earlier stages (2-3, 3-4 and 4 DAP) use whole silique material. SUC5 has strong endosperm-preferred expression during the proliferative stages of seed development (Baud et al., 2005) and was used as an expression template to identify 196 genes with similar expression (based on an r value cut-off of 0.75) (Table S1). This list was filtered based on endosperm-preferred expression in our data, a median expression level in AtGenExpress of <100 units in non-seed-containing tissues and a median expression of >100 units across the seed series to generate entries for Figure 1.
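The template-matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the correlation is assumed to be a Pearson r computed across samples, and the gene names, sample order and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def pearson_r(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no pattern to match
    return sxy / sqrt(sxx * syy)

def template_matches(template, profiles, r_cutoff=0.75):
    """Genes whose profile correlates with the template at or above the cut-off."""
    return sorted(g for g, p in profiles.items() if pearson_r(template, p) >= r_cutoff)

def passes_expression_filter(profile, seed_idx, limit=100.0):
    """Median below `limit` outside the seed series and above it within the series."""
    seed = [profile[i] for i in seed_idx]
    nonseed = [v for i, v in enumerate(profile) if i not in seed_idx]
    return median(nonseed) < limit and median(seed) > limit

# Hypothetical sample order: leaf, root, stem, then three seed-series samples.
seed_idx = {3, 4, 5}
template = [10, 12, 8, 400, 600, 300]          # stands in for the SUC5 profile
profiles = {
    "gene_a": [20, 15, 25, 350, 500, 280],     # template-like, low elsewhere
    "gene_b": [500, 450, 480, 30, 20, 25],     # opposite pattern
    "gene_c": [150, 160, 140, 900, 1300, 700], # template-like but high elsewhere
}

matches = template_matches(template, profiles)
filtered = [g for g in matches if passes_expression_filter(profiles[g], seed_idx)]
print(matches, filtered)
```

Correlation alone keeps genes such as the hypothetical `gene_c`, which follow the template shape but are also abundant outside the seed series, so the median-based filter is what enforces specificity.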
A more recent Affymetrix Genechip dataset profiles a range of tissues including ovules and seed dissected from gynoecia and siliques, respectively (available at http://estdb.biology.ucla.edu/genechip/). This dataset includes immature seed at 1, 3-4 and 7-8 DAP and was generated in the Goldberg (UCLA) and Harada (UC Davis) laboratories by Brandon Le (UCLA), Anhthu Bui (UCLA), and Julie Pelletier (UC Davis). We refer to it in this manuscript as GHL data. We identified genes with similar expression patterns to early endosperm markers (see Materials and Methods) and cross-referenced this to genes with differential expression from our arrays. This created a subgroup of 2,608 putative early seed-specific genes (Table S2).

Partitioning the data for further analysis.
To gain insight into the differential processes in operation during early silique and endosperm development at 4 DAP, we analysed endosperm preferred (EP; >2-fold differentially expressed in the endosperm sample compared to the silique sample) or other silique tissue preferred (OST; >2-fold differentially expressed in the silique sample compared to the endosperm sample) gene groups. We also looked at subgroups containing genes that were thought to be early seed specific from our analysis of the GHL data, which we termed ESS-EP and ESS-OST. The GHL data were found to be the most reliable source for identifying early seed-specific expression since they displayed high sensitivity towards known endosperm markers compared to the AtGenExpress data (see Materials and Methods and Figure S2). The lists of AGI numbers used for each partition are available in Table S3.
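The four partitions can be expressed as simple set operations. The sketch below is illustrative only: it assumes that each significantly differentially expressed gene has a log2(endosperm/silique) ratio, that a positive ratio means endosperm enrichment, and that a set of early seed-specific genes has already been derived from the GHL data. The gene names and values are hypothetical.

```python
from math import log2

def partition_genes(log2_ratios, early_seed_specific, min_fold=2.0):
    """Split genes into EP, OST, ESS-EP and ESS-OST sets.

    log2_ratios maps a gene identifier to log2(endosperm / silique);
    the sign convention is an assumption made for this sketch.
    """
    cut = log2(min_fold)
    ep = {g for g, r in log2_ratios.items() if r > cut}    # >2-fold towards endosperm
    ost = {g for g, r in log2_ratios.items() if r < -cut}  # >2-fold towards silique
    return {
        "EP": ep,
        "OST": ost,
        "ESS-EP": ep & early_seed_specific,
        "ESS-OST": ost & early_seed_specific,
    }

ratios = {"gene_a": 3.1, "gene_b": 1.4, "gene_c": -2.2, "gene_d": 0.3, "gene_e": -1.0}
ess = {"gene_a", "gene_c", "gene_d"}
groups = partition_genes(ratios, ess)
for name, members in groups.items():
    print(name, sorted(members))
```

A gene sitting exactly on the 2-fold boundary (the hypothetical `gene_e`) falls into neither partition, because the cut-off is applied as a strict inequality.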

Analysis of representation of gene ontologies and functional categories.
The arabidopsis genome has been extensively annotated. TAIR, as part of the Gene Ontology Consortium (Rhee et al., 2003), and the Munich Information Centre for Protein Sequences (MIPS) Functional Catalogue (FunCat) use controlled vocabularies aimed at providing descriptions of the roles of genes that are applicable to all organisms. We identified statistically significant enrichment of annotation terms using both the TAIR GO and FunCat schemes. Since both gave a similar insight into our microarray data, we present the analysis based only on the TAIR GO terms. However, the complete analysis using both vocabularies, including AGI identifiers for each gene present in a significantly enriched group, is provided in Tables S4 to S7. Analysis of the EP partition indicated enrichment for GO terms associated with the rapidly proliferating nature of the endosperm at 4 DAP, i.e. molecular biosynthesis, protein formation, the cell cycle, DNA metabolism and replication, and microtubule-based movement (Tables S4 to S7).
The OST partition represents genes that are predominantly expressed in non-endosperm tissues of the silique. The less proliferative nature of the growth and development of the majority of non-endosperm tissues manifests as an enrichment of GO terms for growth, development, cell communication, signal transduction and hormone-mediated signalling (Table S4). We also saw an enrichment of a large number of GO terms associated with endogenous and environmental stimuli that presumably reflect the need for the silique to provide a buffered environment for immature seed to develop. Unlike the syncytial endosperm, the non-endosperm tissues of the silique are mostly comprised of cells encased in a cell wall matrix, a difference that is corroborated in our data by enrichment for cell wall organisation/biogenesis and cell wall loosening. Also enriched in this partition are terms for carbohydrate metabolism and biomolecular transport (Table S4).
The GO analysis was refined to only include genes expressed specifically during the early stages of seed development. The ESS-EP partition was heavily enriched for genes associated with aspects of the cell cycle, DNA and chromatin biochemistry, microtubule associated processes and protein synthesis. The ESS-OST partition was enriched for relatively few GO terms (development, ovule development, carpel development, gynoecium development and organ development) consistent with a less proliferative type of tissue development in non-endosperm seed tissues (ESS-EP and ESS-OST GO analysis shown in Table II).

Representation analysis of selected gene families
Several gene families of interest have been characterised in the recent literature or collected in online resources. Representation analyses of selected gene families are summarised in Table III and details are given below. Significance during this stage of the analysis was based on a p-value cut-off of <0.005, unless otherwise stated.

Analysis of cell cycle genes
Plant syncytial development requires a rapid progression through the cell cycle, suppression of phragmoplast formation and an uncoupling of cytokinesis from mitosis (Otegui and Staehelin, 2000). To gain further insights into this process, we analysed our data to identify endosperm-preferred genes that have been implicated in controlling cell cycle progression (Table IV). The core cell cycle genes of arabidopsis (Vandepoele et al., 2002) and genes shown to be regulated by the E2F members of this family were overrepresented in the EP data (Table IV and Table S8). Motifs associated with E2F binding were also highly enriched in this partition, such that the "E2FAT" motif (TYTCCCGCC) was enriched at the p-value <1 x 10^-5 level in both the EP and ESS-EP partitions and the "E2F binding site motif" (TTTCCCGC) was enriched in the EP and ESS-EP partitions at the p-value <1 x 10^-9 and p-value <1 x 10^-10 levels, respectively.
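Scanning promoters for a degenerate motif such as TYTCCCGCC (where Y is the IUPAC code for C or T) can be sketched with a regular expression. This is an illustration only: searching both strands is an assumption, since the strand handling of the original analysis is not described here, and the promoter sequences below are invented.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

def reverse_complement(motif):
    """Reverse complement of a motif written with IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]

def motif_regex(motif):
    return re.compile("".join(IUPAC[base] for base in motif))

def has_motif(promoter, motif):
    """True if the motif occurs on either strand of the promoter sequence."""
    seq = promoter.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    return bool(forward.search(seq) or reverse.search(seq))

promoters = {                       # hypothetical upstream sequences
    "gene_a": "aaattTCTCCCGCCaaa",  # motif on the forward strand
    "gene_b": "cccGGCGGGAAAttt",    # motif on the reverse strand
    "gene_c": "ACGTACGTACGT",       # no motif
}
with_motif = [g for g, seq in promoters.items() if has_motif(seq, "TYTCCCGCC")]
print(with_motif)
```

The resulting gene list is the kind of input that a representation test would then compare against each partition.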
Progression through the cell cycle occurs via coordinated sequential activation of distinct phases. M-phase-specific expression is associated with an M-specific activator sequence (MSA) in the promoter region of a gene. In our array analysis, 161 differentially expressed genes contained the MSA sequence in the 500 bp upstream of the ATG. Of these putative M-phase-specific genes, 27 were in the EP partition and 16 in the ESS-EP partition (Table S8). Hypergeometric testing of these putative M-phase-specific transcripts indicated significant enrichment in the EP and ESS-EP partitions at the p-value <0.01 and p-value <0.0005 levels, respectively.
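A hypergeometric enrichment test of this kind asks how likely it is to find at least k marked genes (for example, MSA-containing genes) in a partition of n genes drawn from a background of N genes, K of which are marked. The sketch below shows the calculation with small invented numbers; the background used in the study is not restated here, so no attempt is made to reproduce the reported p-values.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a draw of n genes from N, of which K carry the feature."""
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example: 10 background genes, 4 carry the motif, the partition holds 3 genes,
# and 2 of those 3 carry the motif.
p = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
print(round(p, 4))
```

Because `math.comb` works with exact integers, the same function can be applied to array-sized gene lists without loss of precision before the final division.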

Analysis of hormone response pathways
The varied distributions of phytohormones and their well-documented ability to regulate growth and development of the seed make them obvious candidates for identifying important components in the control of early endosperm development (Lur and Setter, 1993;Yang et al., 2002). A recent study by Nemhauser et al. (2006) used the AtGenExpress hormone series to identify genes that were only expressed in response to particular hormones, suggesting these genes can be used as markers for hormone action. To gain insight into the influence of plant hormones in the developing silique at 4 DAP, we looked for the presence of these markers in our gene groups. The OST partition (representing many tissues) was significantly enriched for hormone markers, whereas the EP partition (single tissue) had significant under-representation (Table III).
Analysis of the full lists of genes responsive to the hormones ethylene, abscisic acid, brassinosteroid, cytokinin, gibberellin, auxin and jasmonate (ACC, ABA, BL, CK, GA, IAA and MJ) showed that all but one of the hormone responsive gene groups (GA) were significantly enriched in the OST partition (Table III and Table S9), reinforcing the observations made using the marker list. The ACC, ABA and GA responsive genes were well represented in the endosperm-preferred partition but only the CK responsive genes were significantly enriched (Table III). Endosperm-preferred genes involved in cytokinin signalling are given in Table V. The hormone responsive gene lists include genes that are up-regulated, down-regulated, or have a complex regulatory pattern in response to exogenous hormone application. The distributions of up, down and complex CK regulated genes in the data partitions were compared using a chi-square test. Significant differences from the expected distribution for CK regulated genes were seen for both the EP and OST partitions. The EP partition included a much larger than expected number of CK up-regulated genes (observed 94% and expected 66%) and fewer than expected CK down-regulated genes (observed 4% and expected 32%). Conversely, in the OST partition, we saw a much larger than expected number of CK down-regulated genes (observed 52% and expected 32%) and fewer than expected CK up-regulated genes (observed 47% and expected 66%).
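The comparison of observed and expected category frequencies can be illustrated with a chi-square goodness-of-fit calculation. The sketch assumes the test was applied to the three categories (up, down, complex) against the proportions expected from the full CK list. The percentages are taken from the text, but the gene count of 50 is hypothetical, because the underlying counts are not given in this section.

```python
from math import exp

def chi_square_gof(observed, expected_fractions):
    """Chi-square goodness-of-fit statistic for observed counts against expected fractions."""
    total = sum(observed)
    expected = [f * total for f in expected_fractions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

def p_value_two_df(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom."""
    return exp(-stat / 2)  # closed form valid only for 2 degrees of freedom

# EP partition: 94% up, 4% down, 2% complex, applied to a hypothetical 50 genes.
observed = [47, 2, 1]
expected_fractions = [0.66, 0.32, 0.02]  # up, down, complex (complex is the remainder)
stat, expected = chi_square_gof(observed, expected_fractions)
p = p_value_two_df(stat)
print(round(stat, 2), p < 0.001)
```

With three categories there are two degrees of freedom, which is why the closed-form tail probability applies. Small expected counts, such as the complex category here, weaken the chi-square approximation, so an exact test may be preferable for short gene lists.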
Interestingly, none of the 48 ARF and AUX-IAA transcription factors represented in the differentially expressed gene list gave evidence for endosperm-preferred expression (data not shown). Conversely, 19 of these transcription factors were present in the OST partition. Interactions between these two groups of proteins mediate auxin-dependent transcriptional regulation and when taken together as an "Auxin signalling group" (ARFs plus Aux-IAAs), hypergeometric testing showed that the under-representation in the EP partition was significant (p-value = 0.0032).
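Testing for under-representation uses the lower tail of the hypergeometric distribution: the probability of seeing as few or fewer group members in a partition than were observed. The sketch below pools two gene families into one group before testing, mirroring the "Auxin signalling group" idea. All gene names and set sizes are invented, and the background universe of the original test is not restated here, so the reported p-value is not reproduced.

```python
from math import comb

def hypergeom_lower_tail(k, N, K, n):
    """P(X <= k) for a draw of n genes from N, of which K belong to the group."""
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return favourable / comb(N, n)

def pooled_underrepresentation(families, partition, universe):
    """Pool gene families into one group and test its depletion in a partition."""
    group = set().union(*families) & universe
    drawn = partition & universe
    hits = len(group & drawn)
    return hits, hypergeom_lower_tail(hits, len(universe), len(group), len(drawn))

universe = {f"g{i}" for i in range(10)}  # all differentially expressed genes (toy)
arfs = {"g0", "g1"}                      # hypothetical family 1
aux_iaas = {"g2", "g3"}                  # hypothetical family 2
ep = {"g7", "g8", "g9"}                  # hypothetical partition with no group member

hits, p = pooled_underrepresentation([arfs, aux_iaas], ep, universe)
print(hits, round(p, 4))
```

Pooling increases the group size K, which is what gives a zero-hit observation enough statistical weight to reach significance when each family alone might not.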

Analysis of chromatin related and DNA methylation sensitive genes
Transcriptional regulation is closely related to chromatin structure and during syncytial development endosperm has a high proportion of euchromatin, with small chromocenters and distinct heterochromatic foci (Baroux et al., 2007). Euchromatin is associated with active transcription and alterations in chromatin structure have been associated with the onset of cell division, morphogenesis and differentiation (Zhao et al., 2001;Berger and Gaudin, 2003;Williams et al., 2003;Baroux et al., 2007;De Veylder et al., 2007). Enrichment analysis revealed a significant overrepresentation of chromatin related genes (obtained from ChromDB-http://www.chromdb.org/) in the endosperm (Table III; Table S10).
Dynamic changes in chromatin structure are associated with epigenetic alterations, such as DNA methylation and histone modifications. DNA methylation tends to be associated with transcriptional repression and a recent study has identified a number of genes that appear to have methylation sensitive transcription. Enrichment analysis of the methylation sensitive genes in our partitioned data generated a similar distribution pattern to that observed for transcription factors (Table III; Table S11), which implies DNA methylation is a widespread form of transcriptional control throughout developing siliques. (Guo et al., 2005). Our analysis identified differential expression for 943 of these, 187 of which were endosperm preferred (Table S12).

Analysis of transcription factors
Furthermore, 71 transcription factors were found to be endosperm preferred and early seed specific (Table S12). Table VI highlights the transcription factors that our analysis validates (see below) to be early seed specific with strong evidence for endosperm-preferred expression from the microarrays. To see if any transcription factor families were overrepresented in the EP and ESS-EP lists, we calculated the frequency of each family in our data (our analysis was limited to the 28 families that had 10 or more members showing differential expression in our array data). However, little evidence for enrichment of particular types of transcription factors was observed (data not shown).

Evidence for biological significance of protein interactions for MADS box transcription factors expressed during proliferative endosperm development.
Twelve MADS-box genes were found in the EP partition and, interestingly, all but one were type I MADS-box genes (Figure 2). We also identified seven MADS-box genes that gave strong evidence for endosperm specific expression at 4 DAP (AGL35, PHE1, PHE2, AGL33, AGL40, AGL62 and AGL91) (Table VII and Figure 2).

Validation of early seed and early seed-specific expression
All the endosperm-preferred genes discussed in detail as part of our analysis had their expression levels assessed in different plant tissues by qRT-PCR (Figure 3). Samples were taken from leaves, stems, roots, flower buds, whole siliques and seed dissected from 4 DAP siliques. All genes detected showed higher expression in the seed sample than in whole siliques, consistent with the original LCM endosperm array data. Data analysis also predicted early seed-specific expression for a number of transcription factors (Table VI). Of the 25 novel candidates in Table VI, only two (At4g23750 and At5g11510) showed significant transcript expression in a non-seed tissue sample (Figure 3). Both were only additionally expressed in the flower buds, perhaps suggesting expression in the male and/or female gametophytes prior to fertilisation. At4g00140 was not detected in any conventional samples by qRT-PCR.

Identification of promoters driving expression in the early endosperm.
GUS reporter constructs were made for a selection of transcription factors to assess their use as markers for early endosperm development (Figure 4). The promoter for At1g65300 (AGL38/PHE2) drove expression during very early embryo and endosperm development but became restricted to the chalazal endosperm region around the late globular stage of embryo development. Expression was also seen in pollen. The At1g49190 (ARR19) promoter was expressed specifically in the chalazal endosperm during globular and early heart stages of seed development with some evidence of expression in stomatal guard cells of the silique. The At4g21080 (DOF4.5; Yanagisawa, 2002) promoter showed very strong chalazal specific expression at globular stage, but then became more widespread throughout the endosperm with the intensity of staining decreasing around heart stage. The At4g18870 (Hsf-14;Guo et al., 2008) promoter also had very strong expression in the chalazal endosperm plus strong expression in the peripheral endosperm but staining was not apparent in the micropylar domain. At around heart stage expression became restricted to the chalazal endosperm. Expression was also apparent in the cotyledons of the embryo during the mature green stages of seed development. At5g60440 (AGL62) was expressed specifically throughout the early developing endosperm at low levels but staining was not apparent from heart stage (Figure 4).

DISCUSSION
Using laser microdissection and microarray analysis, we have obtained the transcriptome of the syncytial endosperm (Day et al., 2007b). Here, we present a detailed analysis of the transcriptome, focusing on the genes differentially expressed in the endosperm compared with other silique tissues. We also identified a subset of ~800 of these genes that are specifically expressed in the seed and therefore probably play key roles in seed development. Analysis of our data was consistent with the idea that the syncytial endosperm at 4 DAP is locked into a proliferative state dominated by transcripts associated with the regulation of the cell cycle, DNA processes, chromatin assembly, protein synthesis, cytoskeletal/microtubule related processes, and cell/organelle biogenesis and organisation. In the discussion below, we focus on the biological significance of the endosperm-preferred expression of particular genes involved in the cell cycle, hormone biology, transcriptional regulation, and early endosperm development.

Analysis of cell cycle genes suggests roles for gene-family members in the regulation of syncytial division
In arabidopsis, eight mitotic divisions occur during the syncytial phase of then coordinate entry into the next phase of the cell cycle. A- and B-type CDKs are the main drivers of the plant cell cycle and CDKB1;1, CDKB1;2, CDKB2;2, and CDKD1;1 (CDKD3 is just below the 2-fold cut-off) are present in the endosperm-preferred partition and probably play a key role in controlling endosperm proliferation (Table IV).
A striking aspect of our analysis of core cell cycle genes (Vandepoele et al., 2002) is the predominance of cyclin-A and -B genes in the endosperm-preferred partition. Five of the seven A-type cyclins (CYCA1;1, CYCA1;2, CYCA2;1, CYCA2;2, CYCA3;1) and all six of the B-type cyclins (CYCB1;1, CYCB1;2, CYCB1;3, CYCB2;1, CYCB2;3, CYCB3;1) with differential expression on our arrays were endosperm-preferred (Table IV). In contrast to the A and B types, only one out of seven D-type cyclins present on the array (CYCD3;3) (Vandepoele et al., 2002) was detected in the endosperm-preferred partition. Both A- and B-type cyclins are associated with mitotic cycles and it has been predicted that cyclin accumulation is part of the overall programme responsible for syncytial development (Boisnard-Lorig et al., 2001). The Zea mays CYCA1 is concentrated in the phragmoplast during cytokinesis (John et al., 2001). The arabidopsis CYCA1 orthologue is endosperm preferred in our analysis, yet the precise nuclear localization of CYCA1 is not known and formation of the phragmoplast is suppressed during syncytial endosperm development. CYCB1;1 has been studied extensively during early endosperm development (Boisnard-Lorig et al., 2001) and CYCB1;1-3 are endosperm preferred in our analysis. It has been shown that a non-degradable form of CYCB1 suppresses phragmoplast formation at the end of each nuclear division (Weingartner et al., 2004).
Therefore, studies on the regulation of CYCB1 genes should provide further insights into their possible role in syncytial endosperm development.
Regulation of CDK/cyclin complexes through the cell cycle is also mediated through inhibitors of cyclin dependent kinases (CKI). In plants, inhibitors of cyclin dependent kinases are more similar to Kip protein (KRP; Kip related protein) (De Veylder et al., 2001). There are seven KRPs present in the arabidopsis genome. The putative CDK inhibitory protein KRP4 is endosperm preferred (Table IV) (Otegui and Staehelin, 2000;Kosugi and Ohashi, 2002;Mariconti et al., 2002). Consistent with a pronounced role during early endosperm development, E2F target genes (and upstream E2F binding motifs) were significantly enriched in the EP and ESS-EP groups, indicating that some E2F responsive genes have endosperm specific roles (Table III). All six arabidopsis E2F transcription factors (E2Fa-c and DEL1-3) gave signals consistent with endosperm expression, although data for E2Fb and DEL1 were relatively variable. E2Fa-c have important roles regulating transcription during the G1-S transition, with E2Fa and E2Fb acting as positive regulators of the cell cycle (De Veylder et al., 2002;Kosugi and Ohashi, 2003;Sozzani et al., 2006) activating targets that are necessary for DNA replication during S phase (Ramirez-Parra et al., 2003;del Pozo et al., 2006) and Unlike E2Fa-c, DEL1-3 do not interact with DPa and DPb and bind E2F binding sites in a monomeric form (Kosugi and Ohashi, 2002;De Veylder et al., 2007). The DEL proteins have been shown to be abundant in meristematic cells and were thought to balance the activities of E2Fa/b-DPa/b transcription factors by restraining cell proliferation (Ohashi-Ito et al., 2002). DEL3 and DEL2 were present in the endosperm (Table IV).

Analysis of hormone markers indicates an important role for cytokinins in the proliferating endosperm
Analysis of hormone responsive genes showed that only the CK responsive genes were significantly enriched in the EP partition (Table III). This is consistent with studies in rice and maize that show significant correlations between CK levels and the rate of cell division in the early endosperm (Lur and Setter, 1993;Yang et al., 2002).
Moreover, the early endosperm appears to be a site of CK biosynthesis with major components of the CK biosynthetic pathway (isopentenyltransferase genes 4 and 8) being specifically expressed in the chalazal endosperm during early seed development (Miyawaki et al., 2006 and Table I).
Cytokinin signalling genes are expressed in the early endosperm. The enrichment of CK responsive genes and the presence of CK biosynthesis genes in the EP partition indicate that CK signalling is important during the early stages of arabidopsis endosperm development (Table V). CK perception and signalling are similar to two-component phosphorelays in bacteria (Müller and Sheen, 2007). AHK4 shows evidence for endosperm expression in our data and is an example of a sensor histidine kinase (AHKs) that is able to initiate a phosphorelay when bound to CKs. Arabidopsis activating genes in the EP group and down regulating genes in the OST group. A differential response to CK in the endosperm compared to other silique tissues was also evident in our analysis of the ARR genes in the EP and OST partitions (Table V).
The EP partition included three ARRs (ARR18, 19 and 21), all of which are B-type ARRs that act as transcriptional activators during CK signalling. In contrast, the majority of A-type ARRs (negative regulators) in our data were distributed in the OST partition and perhaps indicates a more inhibitory regulation of CK signalling in other silique tissues compared to endosperm at 4 DAP. IPT4 (Kiba et al., 2005). As IPT4 is also expressed in the chalazal endosperm (Miyawaki et al., 2004), it seems likely therefore that ARR21 and perhaps ARR19 act to ensure IPT gene expression and therefore CK biosynthesis during early wild-type seed development.
Although our analysis indicates that the chalazal endosperm plays an important role in directing proliferation of the endosperm via CK signalling, the chalazal endosperm does not appear to undergo mitosis and shows evidence of endoreduplication (Boisnard-Lorig et al., 2001). This is contrary to current thinking that high levels of CKs exist in mitotically dividing cells, the suggested sites of de novo CK biosynthesis (Moncaleán et al., 2001;Friml, 2003;Nordstrom et al., 2004;He et al., 2005). Whilst our data suggest a primary role for CK signalling during syncytial endosperm development, there is likely to be significant interplay with other phytohormones. It is well documented that the ratio of auxins to cytokinins plays an important role in controlling tissue proliferation and differentiation. A study of maize endosperm showed CK levels were high at 9 DAP, which corresponded to the maximal cell division rate in the endosperm, and reduced sharply as auxin levels increased towards the mid- to late-stages of endosperm development. This is consistent with other studies on grain development where CK levels are maximal during early stages and auxin levels reach maximal levels later in seed development (Mengel et al., 1985;Lur and Setter, 1993). Furthermore, auxin has been shown to have a rapid negative control on CK levels by suppressing its biosynthesis (Nordstrom et al., 2004). Our data indicate strong CK-associated and weak auxin-associated transcriptional responses in the syncytial endosperm at 4 DAP (Table III) (Table III and Table S9). In a recent study, Yang et al., (2006) (Table I and (Table III and Table S9). This is consistent with observations that jasmonic acid has a negative effect on G1/S and G2/M transitions (Świątek et al., 2004) and that GAs stimulate cell division (Fabian et al., 2000).
However, the underrepresentation of BL-responsive genes was unexpected since BLs have been shown to promote cell division and are able to up-regulate the expression of CYCD3;1 and CDKB1;1 (Yoshizumi et al., 1999; Hu et al., 2000). It is possible that the underrepresentation of BL-responsive genes in the endosperm is in some way related to the underrepresentation of auxin genes, since auxin and BLs can cooperate to promote organ growth via cell cycle activation (Bao et al., 2004).

Identification of putative early seed specific transcription factors with endosperm preferred expression
Seventy-one transcription factors were found to be endosperm preferred and early seed specific. These genes likely play important roles within the endosperm, and representative genes were further characterised using promoter-GUS reporter lines (Figure 4). This both implicates these genes (At1g65300, At1g49190, At4g21080, At4g18870 and At5g60440) as being important during proliferative endosperm development and provides tools to mis-express genes that may promote endosperm proliferation, as a rational approach to generating larger seed. (Table VII). Recently, AGL23 has been shown to play a role in female gametophyte and embryo development (Colombo et al., 2008). Since a prolonged syncytial growth pattern is limited to early endosperm development in Arabidopsis, it follows that key developmental switches that define syncytial competency are also early seed specific. This is the case for AGL62, which is required for normal syncytial endosperm development: disruption of the AGL62 gene results in very early cellularisation of the endosperm, approximately 24 h after fertilisation (Kang et al., 2008). Both the Drews laboratory AGL62-GFP reporter line (Kang et al., 2008) and the AGL62-GUS reporter presented in this study confirm that AGL62 expression is specific to the syncytial endosperm. AGL62 interacts in vitro with many seed expressed MADS-box genes that may help mediate its ability to enable syncytial proliferation (which requires inhibition of cytokinesis and cell wall formation). As mentioned, AGL62 can form a heterodimer with PHE1, which has also been implicated as a positive regulator of endosperm proliferation and has endosperm specific expression at 4 DAP (this study and Köhler et al., 2003). PHE2 has 72% homology to PHE1 at the amino acid level and has been reported to have a very similar expression pattern, although no data were provided to substantiate this claim (Köhler et al., 2003).
Our PHE2 promoter-GUS analysis confirms that PHE2 expression is largely equivalent to PHE1 during wild type seed development.

AGL23-GUS lines show
Transcription of both AGL62 and PHE1 appears to be regulated by members of the fertilisation independent seed polycomb (FIS-PcG) complex, since levels of both AGL62 and PHE1 are increased and persist longer in fertilisation independent seed (FIS) than during wild-type seed development (Köhler et al., 2003). Developing seed of the FIS class mutant mea abort during heart stage, which coincides with an over-proliferated endosperm. Abortion of mea seed is partly due to abnormally high levels of PHE1, since reducing PHE1 transcript in mea using an antisense construct can rescue the abortion phenotype. Like PHE1, PHE2 expression was also reported to be increased in mea seeds, which is suggestive of similar regulation and perhaps a redundant function. Alternatively, PHE2 may be an antagonist of PHE1 that competes to form heterodimers with AGL62, thus acting to reduce the rate of syncytial proliferation. Our data also predict that the other endosperm specific Mα MADS-box
Seedgenes.org (http://www.seedgenes.org/index.html), a database of Arabidopsis seed mutants, contains six genes described as having female gametophytic inheritance patterns (where siliques produce ~50% mutant seeds following pollination of heterozygotes, regardless of pollen genotype) (Tzafrir et al., 2003). These include the components of the FIS-PcG complex MEA, FIS2, FIE and MSI1 and the DNA glycosylase DME. The sixth gene, EMB2220 (At5g12840), is not early seed specific but is approximately twofold enriched in our endosperm sample, similar to FIE (which is also not seed specific). EMB2220 mutants have not been fully characterised, but it is tempting to speculate that this gene is also involved in endosperm development.
imprinted genes MEA, FWA and FIS2 (Feil and Berger, 2007). This parent-of-origin dependent differential repression and activation of alleles appears to be limited to the endosperm in plants (Feil and Berger, 2007).
Concordantly, all confirmed imprinted genes in Arabidopsis, i.e. PHE1, MEA, FIS2 and FWA (Köhler et al., 2005; Luo et al., 2000; Kinoshita et al., 2004), are present in our EP partition. It is therefore likely that our EP partition is enriched for imprinted genes and provides a shortlist for identifying further imprinted genes.

Conclusions
The use of laser assisted microdissection (LAM) technology has enabled the isolation

LCM endosperm microarray data
The LCM endosperm microarray data analysed in this study have been described (Day et al., 2007) and are deposited at the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE6703.

Preparation of qRT-PCR cDNA template and PCR cycling conditions for analysis of LCM derived samples.
LCM-dissected endosperm was obtained as described (Day et al., 2007) and total RNA was extracted using the Picopure RNA isolation kit (Arcturus) with the optional on-column DNase step (Qiagen). The purified total RNA was quantified using the Ribogreen RNA quantification kit (Invitrogen) according to the manufacturers' instructions. IVT-based amplifications were carried out using the Message Amp II aRNA kit (Ambion) following the manufacturers' instructions.
Comparisons between the ability of qRT-PCR and the microarrays to measure differential expression of seventeen genes in the same samples used cDNA made from aRNA generated using one round of IVT. The remaining first round product was then used as the basis for a second round of IVT, which generated target for the microarray study. For the qRT-PCR, the aRNA was primed with random hexamers and first strand cDNA was synthesised using Superscript III (Invitrogen) according to the manufacturers' instructions. Real-time qRT-PCR was carried out using reagents from the LightCycler FastStart DNA MasterPlus SYBR Green I kit (Roche) in 20 µL volumes using a LightCycler 1.0 (Roche). The amplification conditions for qPCR were: Denature: 95°C for 10 min; Cycling: 94°C for 5 s, 58°C for 17 s, 72°C for 10 s (single acquire); Melt: 95°C for 0 s, 55°C for 20 s, 95°C for 0 s with ramp 0.2°C/s (continuous acquire); Cool: 40°C for 20 s. Reaction products were confirmed by melting curve analysis and by running out the product on a 1.2% agarose gel. The primers used for qRT-PCR are provided in Table S13.
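As an illustration of how qRT-PCR measurements of this kind are commonly compared with array log ratios, the sketch below computes expression relative to a reference gene and a log2 ratio between two samples. It assumes the widely used 2^-ΔCt calculation with equal amplification efficiency for target and reference amplicons; the exact formula used in this study is not stated in this section, and the function names, the direction of the ratio and the Ct values are hypothetical.

```python
import math


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target relative to a reference gene (e.g. ACTIN 2).

    Assumption: both amplicons amplify with the same efficiency;
    efficiency=2.0 corresponds to perfect doubling in every cycle.
    """
    return efficiency ** (ct_reference - ct_target)


def log2_ratio(rel_sample_a, rel_sample_b):
    """log2 of the ratio of relative expression in sample A over sample B."""
    return math.log2(rel_sample_a / rel_sample_b)


# Hypothetical Ct values, for illustration only.
endosperm = relative_expression(ct_target=24.0, ct_reference=20.0)
silique = relative_expression(ct_target=27.0, ct_reference=20.0)
ratio = log2_ratio(endosperm, silique)
```

A lower Ct means more template, so in this toy example the target is more abundant in the first sample and the log2 ratio is positive.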

Preparation of qRT-PCR cDNA template and PCR cycling conditions for analysis of conventional fresh tissue samples.
RNA was extracted from fresh Arabidopsis tissues using the Qiagen Plant RNeasy kit as per the manufacturer's instructions, with some alterations for the seed samples.
Silique, stem, leaf, root and flower bud tissues were harvested into 1.5 mL Eppendorf tubes and flash frozen in liquid nitrogen. Tissues were then quickly ground to a powder in the Eppendorf tubes using a pre-cooled plastic pestle on dry ice. RNA extraction reagent was added before the samples thawed. For the dissected seed samples, developing siliques were removed from plants using tweezers and cut open under a dissecting microscope with a hypodermic needle, taking care not to damage the seed within. The majority of seed were scraped onto the back of the needle and deposited into a pre-cooled Eppendorf tube on dry ice. Frozen seed were transferred to pre-cooled plastic bags embedded in dry ice, and RNA extraction reagent was pipetted into the frozen bag and allowed to thaw.
Individual developing seed (visualised through the plastic using a dissecting microscope) were completely disrupted using pressure from the tip of blunt tweezers and used as input for the Plant RNeasy kit.
RNA was quantified using a Nanodrop spectrophotometer and cDNA was generated using the VILO cDNA synthesis kit (Invitrogen) according to the manufacturer's instructions.
Real-time qRT-PCR was carried out using reagents from the ExpressSYBR GreenER mastermix kit (Invitrogen) in 7 µL volumes using a LightCycler 480 (Roche). The amplification conditions for qPCR were: Denature: 95°C for 10 min; Cycling: 94°C for 5 s, 61 or 58°C for 17 s, 72°C for 10 s (single acquire); Melt: 95°C for 0 s, 55°C for 20 s, 95°C for 0 s with ramp 0.2°C/s (continuous acquire); Cool: 40°C for 20 s. Reaction products were confirmed by melting curve analysis and by 1.2% agarose gel electrophoresis. The primers used for qRT-PCR are provided in Table S13.
To identify genes with early endosperm specific expression, we used the known marker genes FIS2, FWA, PHE1 and SUC5 (Luo et al., 2000; Kinoshita et al., 2004; Baud et al., 2005; Köhler et al., 2005) as bait to search the AtGenExpress developmental series (Arabidopsis transcript profiles from different organs and at different stages of development, determined using Affymetrix Genechip technology) using the Expression Angler (BAR; http://bar.utoronto.ca/). For every other gene in the database, the software calculates the similarity of its expression pattern to that of the marker gene as a Pearson correlation coefficient (r value). Very few genes correlated with the expression patterns of FIS2, FWA and PHE1, probably because the expression patterns of these markers were ill defined. In contrast, SUC5, which has strong endosperm-preferred expression during the proliferative stages of development (Baud et al., 2005), gave a well-defined pattern in the AtGenExpress seed data. The SUC5 expression template was used to identify genes similarly expressed during early seed development in the AtGenExpress data. This list was then entered into the BAR DataMetaformatter tool (http://bar.utoronto.ca/) to create a heat map of probe intensities for genes in the AtGenExpress seed development series.
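The correlation search described above can be illustrated with a minimal sketch. It computes a Pearson r between a bait profile and every other profile and keeps the genes at or above a cut-off. The function names, the toy intensities and the gene labels other than SUC5 are hypothetical, the default of 0.75 mirrors the r value cut-off quoted for the SUC5 heat map, and the Expression Angler itself may differ in its details.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / sqrt(var_x * var_y)


def co_expressed(profiles, bait, cutoff=0.75):
    """Return {gene: r} for genes whose profile correlates with the bait."""
    bait_profile = profiles[bait]
    hits = {}
    for gene, profile in profiles.items():
        if gene == bait:
            continue
        r = pearson_r(bait_profile, profile)
        if r >= cutoff:
            hits[gene] = r
    return hits


# Toy intensities across five hypothetical samples, for illustration only.
profiles = {
    "SUC5": [5, 6, 250, 300, 40],
    "geneA": [4, 7, 200, 280, 35],
    "geneB": [300, 250, 10, 8, 12],
}
hits = co_expressed(profiles, "SUC5")
```

A bait with an ill-defined profile gives few or no hits under such a cut-off, which is in line with the behaviour reported above for FIS2, FWA and PHE1.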

Identification of genes with evidence of early silique/seed-specific expression
The GHL Affymetrix Genechip datasets were downloaded from the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), imported into TIGR MeV software and screened using the pattern matching function to identify genes with expression patterns similar to those of the early endosperm markers SUC5, PHE1, FWA and FIS2 (Table S2). Pattern-matched genes for all markers were combined and cross-referenced to genes with differential expression on our arrays. This ESS subgroup was used in subsequent analysis due to its enhanced sensitivity towards known endosperm markers. For example, Figure S2 shows a heat map of normalised intensity data from both the GHL Genechip data and the AtGenExpress seed development series.
Negligible signal was apparent for known endosperm markers including PHE1, IPT8 and FWA in the AtGenExpress data, whereas signal for these markers was easily apparent in the GHL data. This likely reflects the fact that the GHL data were derived from seed removed from the surrounding silique tissues, whereas the samples representing early seed development in the AtGenExpress data included the whole silique.
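The combining and cross-referencing step described in this section amounts to a union followed by an intersection, and can be sketched as follows. This is only an illustration of the set logic, not the MeV workflow itself; the function name and the placeholder gene labels are hypothetical.

```python
def early_seed_specific(pattern_matched, differentially_expressed):
    """Combine pattern-matched genes for all markers, then keep only those
    that are also differentially expressed on the arrays."""
    combined = set().union(*pattern_matched.values())
    return combined & set(differentially_expressed)


# Placeholder gene labels, for illustration only.
pattern_matched = {
    "SUC5": {"gene1", "gene2"},
    "PHE1": {"gene2", "gene3"},
    "FWA": set(),
    "FIS2": {"gene4"},
}
differentially_expressed = {"gene2", "gene3", "gene5"}
ess = early_seed_specific(pattern_matched, differentially_expressed)
```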

Other data processing and analysis
Genes with methylation sensitive transcription were obtained from . Chromatin related genes were obtained from ChromDB (http://www.chromdb.org/). In some instances a particular locus was represented by more than one probe on our array. In a few cases different probes for the same locus represented by the probes on our array platform, we queried the whole genome with the list of loci on the array. This analysis revealed that several GO and FunCat groups were already over-represented on our arrays (data not shown). To get an accurate assessment of the representation of biological processes in our partitioned data, we therefore used our array list as the background population. t-tests were carried out in MeV with no correction for multiple testing.

Construction and analysis of GUS reporter lines
Table S13. Arabidopsis thaliana was transformed using the standard floral dipping protocol (Clough and Bent, 1998). Developing seeds from hygromycin resistant primary transformants were assessed for GUS activity essentially as described by Stangeland and Salehian (2002). Briefly, fruits at various stages of development were dissected and placed in GUS staining buffer (50 mM phosphate buffer [pH 7.2], 0.5 mM potassium ferri/ferro cyanide and 1 mg/mL X-Gluc) overnight at 37°C. Younger fruits were cleared in Hoyer's medium (100 g chloral hydrate in 30 mL H2O) for 2-4 hours.
Older fruits were placed in 3:1 ethanol:acetic acid for 4-8 h and cleared overnight in Hoyer's medium. Developing seeds were fully dissected from the fruits and mounted in Hoyer's mounting medium with a cover slip. Prepared slides were viewed under DIC optics on an Olympus BX51 microscope equipped with an Optronics Magnafire 2.1A digital imaging system.
Figure S1. Heat map showing endosperm-preferred genes in MPSS data.
Figure S2. Heat maps comparing endosperm marker detection in online datasets.
Table S1. Genes with AtGenExpress expression profile correlating with the early endosperm specific gene SUC5.
Table S2. List of putative early seed-specific genes (ESS) from the GHL data.
Table S12. List of transcription factor genes in each partition.
Table S13. Sequences of the primers used for qRT-PCR and to generate the genomic fragments for the GUS constructs.
-16 are genes that gave a range of differential expression in the array data and M1-7 are endosperm marker genes identified from the literature.
b Log ratio array and Log ratio qPCR correspond to log2 values for the ratio of expression between whole silique and LCM endosperm samples.
c Relative expression by qRT-PCR was calculated by comparison to the ACTIN 2 (At3g18780) reference gene.
(27) that were endosperm preferred in the LCM endosperm array data and called early seed specific by searching online data resources and then confirmed by qRT-PCR and/or reporter lines b.
This heat map includes genes with endosperm preferred expression from our arrays that correlate with SUC5 expression (based on an r value cut-off of 0.75) across the AtGenExpress expression library using online tools provided at the University of Toronto BAR website. The DataMetaformatter tool created a heat map of probe intensities for genes in the AtGenExpress seed development series and also calculated the median expression intensities in all other wild-type tissues.
This heat map only includes genes that had a median expression level in AtGenExpress <100 units in non-seed containing tissues and a median expression of >100 across the seed series.
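The median-based filter described in this legend can be written as a short check. The sketch below is illustrative only: the function name, the gene labels and the intensity values are hypothetical, and the threshold of 100 units is taken from the legend.

```python
from statistics import median


def passes_filter(seed_series, other_tissues, threshold=100.0):
    """True when the median intensity is below the threshold in non-seed
    tissues and above it across the seed development series."""
    return median(other_tissues) < threshold and median(seed_series) > threshold


# Hypothetical probe intensities, for illustration only.
intensities = {
    "geneX": {"seed": [450.0, 380.0, 120.0], "other": [12.0, 30.0, 8.0, 95.0]},
    "geneY": {"seed": [500.0, 420.0, 300.0], "other": [250.0, 180.0, 400.0, 90.0]},
    "geneZ": {"seed": [40.0, 60.0, 20.0], "other": [10.0, 15.0, 5.0, 9.0]},
}
keep = [g for g, v in intensities.items() if passes_filter(v["seed"], v["other"])]
```

Using the median rather than the mean makes the filter tolerant of a single outlying tissue, as with the one high non-seed value for geneX in this toy example.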