Protein-coding cis-natural antisense transcripts have high and broad expression in Arabidopsis.

Protein-coding overlapping genes in Arabidopsis have unexpectedly high levels and breadths of expression. Pairs of genes within eukaryotic genomes are often located on opposite DNA strands such that transcription generates cis-natural sense-antisense transcripts (cis-NATs). This orientation of genes has been associated with the biogenesis of splice variants and natural antisense small RNAs. Here, in an analysis of currently available data, we report that within Arabidopsis (Arabidopsis thaliana), protein-coding cis-NATs are also characterized by high abundance, high coexpression, and broad expression. Our results suggest that a permissive chromatin environment may have led to the proximity of these genes. Compared with other genes, cis-NAT-encoding genes are more often associated with low-nucleosome-density regions and histone H3 lysine-9 acetylation, and less often with H3 lysine-27 trimethylation. Promoters associated with broadly expressed genes are preferentially found in the 5′ regulatory sequences of cis-NAT-encoding genes. Our results further suggest that natural antisense small RNA production from cis-NATs is limited: small RNAs sequenced from natural antisense small RNA biogenesis mutants, including dcl1, dcl2, dcl3, and rdr6, map to cis-NATs as frequently as small RNAs sequenced from wild-type plants. Future work will investigate whether the positive transcriptional regulation of overlapping protein-coding genes contributes to the prevalence of these genes within other eukaryotic genomes.

More than 10,000 small RNAs (sRNAs) from plant stress sRNA libraries map to overlapping regions of cis-NAT transcripts (Zhang et al., 2012). Second, Jen et al. (2005) noted that cis-NAT-encoding genes had far more alternative splice variants and alternative polyadenylation sites than other genes, suggesting that antisense transcripts may induce RNA splicing, alternative splicing, and polyadenylation.
Transcription occurs within physically distinct regions of the nucleus enriched in active RNA polymerase II as well as other transcriptional regulatory and accessory factors (Edelman and Fraser, 2012). Concentrating the transcriptional machinery within a small part of the nuclear space has been proposed to enhance the efficiency of transcription. We postulated that the physical proximity of cis-NAT-encoding genes favors their high and broad expression. Here, we test this hypothesis by analyzing RNA-Seq data from Arabidopsis plants grown in six distinct conditions. We detect cis-NAT transcripts at high frequency and find that they have remarkably broad and high transcript levels. Promoter sequences of broadly expressed genes are highly represented in the upstream regulatory regions of cis-NAT-encoding genes, and the chromatin of cis-NAT-encoding genes is enriched for euchromatic marks. sRNAs appear to play a limited role in cis-NAT regulation, as sRNAs sequenced from sRNA biogenesis mutants map to protein-coding (PC) cis-NATs at a frequency similar to that in wild-type plants.

PC cis-NATs Have High Abundance
The Arabidopsis genome contains 33,239 genes and 33,234 adjacent gene pairs (Table I), as annotated by The Arabidopsis Information Resource (release 9). A total of 5.1% (1,710 of 33,234) of the adjacent gene pairs, evenly distributed across chromosomes, may generate cis-NATs. We classified the pairs into three types. In type I, sense and antisense transcripts are complementary at their 3′ ends (tail-to-tail). In type II, they are complementary at their 5′ ends (head-to-head). In type III, one transcript lies entirely within the region spanned by the other. PC gene pairs are overrepresented among cis-NAT pairs, and type I pairs are highly likely to be PC. Among all cis-NAT-encoding gene pairs, 82% (1,402 of 1,710) encode two PC transcripts (Table II), a significantly larger proportion than expected given the frequency of PC genes within the genome (P < 1e-10). Of the 1,363 type I cis-NAT pairs, 93.3% are PC cis-NATs (1,272 of 1,363; Table II), again significantly more than expected (P < 1e-10).
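The three overlap types can be stated as a rule on genomic coordinates. The Python sketch below is illustrative only: the `Gene` record and the `overlap_type` function are hypothetical names, coordinates are assumed to be 1-based and inclusive, and it does not reproduce the Perl scripts used in the study.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    """Hypothetical gene record with 1-based, inclusive genomic coordinates."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_type(a: Gene, b: Gene) -> Optional[str]:
    """Classify a gene pair as cis-NAT type 'I', 'II', or 'III', or return None."""
    if a.strand == b.strand:
        return None  # same strand: the transcripts are not antisense to each other
    if a.end < b.start or b.end < a.start:
        return None  # the two genes share no bases
    # Type III: one gene's span lies entirely inside the other's.
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "III"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # A plus-strand gene's 3' end is its right end; a minus-strand gene's 3' end is
    # its left end. If the plus gene starts further left, the two 3' ends overlap.
    if plus.start < minus.start:
        return "I"   # tail-to-tail (convergent)
    # Otherwise the minus gene lies to the left and the two 5' ends overlap.
    return "II"      # head-to-head (divergent)
```

For instance, a plus-strand gene spanning 100 to 500 and a minus-strand gene spanning 400 to 900 overlap at their 3′ ends, so `overlap_type` returns "I" for that pair.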
To assay transcript patterns of PC cis-NATs, we mapped RNA-Seq microreads from plants grown in standard, cold, heat, salt, drought, and high-light conditions to PC genes (Filichkin et al., 2010). Of the 2,804 cis-NAT-encoding genes within PC gene pairs, 2,338 (83%) had evidence of transcription in at least one condition. In contrast, among the 24,365 PC genes that do not encode cis-NATs, 18,908 (77%) had evidence of transcription, a significant difference (P < 0.01). Considering only genes with evidence of transcription, we first compared how often microreads mapped to cis-NAT-encoding genes and to other genes in each growth condition. In every condition, a higher proportion of cis-NAT-encoding genes than of non-cis-NAT-encoding genes was expressed (P < 0.001; Fig. 1). This difference persisted when we used stricter criteria for defining evidence of transcription (Supplemental Fig. S1). Second, we investigated whether cis-NAT-encoding genes had higher transcript abundances than non-cis-NAT-encoding genes. In all conditions except standard growth conditions, median cis-NAT abundances were significantly higher than non-cis-NAT abundances (P < 0.05; Fig. 2). Summing across conditions, the median expression level of the 2,338 genes within the 1,174 PC cis-NAT pairs was 62.8 reads per kilobase of exon model per million mapped reads (RPKM), significantly higher (Wilcoxon rank sum test, P = 1.19e-11) than the median expression level of the 18,908 expressed PC non-cis-NAT genes (52.5 RPKM). Third, PC cis-NAT genes were more broadly expressed than PC non-cis-NAT genes: the distribution of expression breadths (the number of conditions in which a gene is expressed) of PC cis-NAT genes was shifted to the right of that of PC non-cis-NAT genes (Wilcoxon rank sum test, P < 2.2e-16; Fig. 3).
Most notably, the proportion of PC cis-NAT genes expressed in all six conditions was 9 percentage points higher than that of PC non-cis-NAT genes (79.2% versus 70.2%; Fig. 3).
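Expression calls and expression breadth can be sketched as follows, using the thresholds given in the Methods (present at 1 RPKM or more in a condition; evidence of transcription if at least 1 RPKM in one condition and at least 2 RPKM summed over the six conditions). The function names are hypothetical, and each gene's profile is assumed to be a list of six per-condition RPKM values.

```python
from typing import Dict, Sequence


def is_expressed(rpkm: Sequence[float], min_single: float = 1.0, min_total: float = 2.0) -> bool:
    """Evidence of transcription: >= min_single RPKM in at least one condition
    and >= min_total RPKM summed over all conditions."""
    return max(rpkm) >= min_single and sum(rpkm) >= min_total


def breadth(rpkm: Sequence[float], present_at: float = 1.0) -> int:
    """Expression breadth: number of conditions in which the gene is called present."""
    return sum(1 for value in rpkm if value >= present_at)


def fraction_expressed_everywhere(profiles: Dict[str, Sequence[float]]) -> float:
    """Among expressed genes, the fraction called present in every condition."""
    expressed = [p for p in profiles.values() if is_expressed(p)]
    if not expressed:
        return 0.0
    return sum(1 for p in expressed if breadth(p) == len(p)) / len(expressed)
```

Passing `min_single=3.0, min_total=4.0` to `is_expressed` corresponds to the stringent criteria described in the Methods.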
A standard method for estimating the transcript levels of a gene from RNA-Seq data is to distribute reads that map to multiple transcripts across the target transcripts. Because each cis-NAT overlaps another cis-NAT and our RNA-Seq reads were not strand specific, this approach could inflate both the number of expressed cis-NAT genes and their expression breadth. We therefore also assayed transcript abundance using only RNA-Seq reads that mapped to a single transcript. The results were similar to those obtained with weighted RNA-Seq microreads. In each condition, unique RNA-Seq microreads mapped to a higher proportion of cis-NAT-encoding genes than of non-cis-NAT-encoding genes (P < 0.001; Supplemental Fig. S2). In all conditions except the standard growth condition, unique reads mapped more frequently to cis-NATs than to non-cis-NAT genes (Wilcoxon rank sum test, P < 0.05; Supplemental Fig. S3). cis-NATs were also detected in more conditions than were non-cis-NATs (Wilcoxon rank sum test, P < 2.2e-16; Supplemental Fig. S4).
cis-NATs are present in more conditions than are non-cis-NATs, but the two transcripts of a cis-NAT pair may occur in different subsets of conditions. We used the Index of Co-Expression (ICE) to determine whether the 1,174 Arabidopsis PC cis-NAT pairs tended to cooccur in the same conditions. A high ICE value indicates that two transcripts are frequently found in the same conditions (Chen et al., 2005). At every ICE cutoff tested, cis-NATs from the same gene pair cooccurred significantly more often than did transcripts from nonoverlapping genes (P < 0.0001; Table III). For example, 94% of PC cis-NAT pairs had ICE values greater than 0.5 and 78% had ICE values greater than 0.9, whereas 87% of PC non-cis-NAT pairs had ICE values greater than 0.5 and 64% had ICE values greater than 0.9 (Table III). We considered a pair of cis-NATs to cooccur across a set of conditions if their ICE value was greater than 0.6, as suggested by Chen et al. (2005). By this criterion, 89% of PC cis-NAT pairs cooccurred, versus 80% of PC non-cis-NAT pairs (P < 0.0001; Table III). Because the ICE metric changes with the criteria used to make transcript presence/absence calls, we repeated the analysis with more stringent criteria for transcript presence; the proportion of cooccurring cis-NATs again greatly exceeded the expected proportion (Supplemental Table S1).
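The published ICE formula (Chen et al., 2005; Lercher et al., 2002) is not reproduced in this text. The sketch below therefore assumes one plausible binary index: the number of conditions in which both transcripts are present, divided by the geometric mean of the numbers of conditions in which each is present. Like the ICE, it ranges from 0 (no cooccurrence) to 1 (perfect cooccurrence). The sketch also shows the permutation test described in the Methods, in which each gene of each pair is replaced by a randomly drawn expressed, nonoverlapping PC gene. All names are hypothetical, and sampling background genes with replacement is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Profile = Sequence[bool]  # presence/absence calls across the six conditions


def cooccurrence_index(a: Profile, b: Profile) -> float:
    """Assumed index: |present in both| / sqrt(|present in a| * |present in b|)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        return 0.0
    return both / math.sqrt(n_a * n_b)


def cooccurrence_rate(pairs: List[Tuple[Profile, Profile]], cutoff: float = 0.6) -> float:
    """Fraction of pairs whose index exceeds the cutoff (0.6, after Chen et al., 2005)."""
    return sum(cooccurrence_index(a, b) > cutoff for a, b in pairs) / len(pairs)


def permutation_p_value(pairs: List[Tuple[Profile, Profile]], background: List[Profile],
                        n_perm: int = 100_000, cutoff: float = 0.6, seed: int = 0) -> float:
    """Share of control sets whose cooccurrence rate is higher than the observed rate.
    Each control set replaces every gene with a random background gene."""
    rng = random.Random(seed)
    observed = cooccurrence_rate(pairs, cutoff)
    higher = 0
    for _ in range(n_perm):
        control = [(rng.choice(background), rng.choice(background)) for _ in pairs]
        if cooccurrence_rate(control, cutoff) > observed:
            higher += 1
    return higher / n_perm
```

A pure-Python loop over 100,000 permutations of about 1,174 pairs is slow; encoding the presence/absence calls as bit masks or arrays would be a natural optimization.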

Evidence for the Transcriptional Regulation of cis-NATs
To investigate whether transcriptional regulation may have contributed to high cis-NAT abundances, we mapped cis-regulatory elements to gene 5′ upstream regions. We ranked each of 200 motifs by the frequency with which it was found upstream of cis-NAT-encoding genes and, separately, by the frequency with which it was found upstream of nonoverlapping genes. Eight motifs were 10 or more ranks higher within the cis-NAT-encoding genes than within the non-cis-NAT-encoding genes (Supplemental Table S2). The eight motifs included GGGCC and GGCCCAWWW, which are enriched in regulatory regions of genes in the Gene Ontology categories "structural constituent of ribosome" (GO:0003735), "ribosome" (GO:0005840), and "ribosome biogenesis and assembly" (GO:0042254). Nine motifs were more than 10 ranks higher in PC non-cis-NAT-encoding genes than in PC cis-NAT-encoding genes (Supplemental Table S2). Three motifs (TGCAAAG, CATGCA, and CATGCAY) were associated with seed storage protein expression (Supplemental Table S2). Other highly represented motifs included GA and auxin response elements.

Figure 1. PC genes that generate cis-NATs are more frequently expressed than are PC genes that do not. The y axis denotes the percentage of expressed PC genes out of 2,338 cis-NATs and 18,908 non-cis-NATs. The x axis denotes the six different growth conditions. Asterisks indicate that the proportion of PC cis-NAT-encoding genes expressed within a given condition is significantly greater than the proportion of PC non-cis-NAT-encoding genes at P < 1e-03.

Figure 2. Expressed PC cis-NATs accumulate to higher abundances than do expressed PC non-cis-NATs. The median expression levels of expressed PC cis-NATs and PC non-cis-NATs are noted on the y axis in RPKM for six growth conditions. Asterisks indicate that the median RPKM of expressed PC cis-NATs within a given condition is significantly greater than the median RPKM of expressed PC non-cis-NATs at P < 0.05.
We further investigated whether the chromatin states of PC cis-NAT-encoding genes differed from those of other PC genes with similar transcript abundances. Among genes expressed at similar levels, H3K27 trimethylation was significantly less frequent among cis-NAT-encoding genes than among non-cis-NAT-encoding genes (Fig. 5A). PC cis-NAT-encoding genes were also more frequently marked by H3K9 acetylation than were PC non-cis-NAT-encoding genes expressed at similar levels (Fig. 5B). Thus, the chromatin of cis-NAT-encoding genes is more open than that of non-cis-NAT-encoding genes whose transcripts accumulate to the same level. We also investigated whether chromatin attributes differed among cis-NAT orientations. The chromatin status of genes encoding type III cis-NATs did not differ significantly from that of non-cis-NAT-encoding genes (Supplemental Table S3). Genes in tail-to-tail orientation (type I) and genes in head-to-head orientation (type II) had similarly high frequencies of H3K9 acetylation and low frequencies of H3K27 trimethylation (Supplemental Table S3).
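The expression-matched comparison summarized in Figure 5 can be sketched as below. The bin edges (2 to 16, 16 to 128, and more than 128 RPKM) come from the Figure 5 caption; treating lower edges as inclusive, and using a single expression value per gene, are assumptions, and the function names are hypothetical. The per-bin fractions for cis-NAT and non-cis-NAT genes could then be compared with a two-sample proportion test.

```python
from typing import Dict, Optional, Set

# Bin edges from the Figure 5 caption; lower edges inclusive, upper edges exclusive (assumed).
BINS = [("low", 2.0, 16.0), ("medium", 16.0, 128.0), ("high", 128.0, float("inf"))]


def expression_bin(rpkm: float) -> Optional[str]:
    """Return the bin name for one expression value, or None below 2 RPKM."""
    for name, lower, upper in BINS:
        if lower <= rpkm < upper:
            return name
    return None


def marked_fraction_by_bin(levels: Dict[str, float], marked: Set[str]) -> Dict[str, float]:
    """Fraction of genes in each expression bin that carry a chromatin mark (e.g. H3K9ac)."""
    totals = {name: 0 for name, _, _ in BINS}
    hits = dict(totals)
    for gene, rpkm in levels.items():
        bin_name = expression_bin(rpkm)
        if bin_name is None:
            continue  # genes below the lowest bin are not compared
        totals[bin_name] += 1
        if gene in marked:
            hits[bin_name] += 1
    return {name: hits[name] / totals[name] for name in totals if totals[name]}
```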

sRNAs Match PC cis-NATs Less Frequently than PC non-cis-NATs
As described above, one explanation for the frequency of overlapping genes is their capacity to generate small regulatory RNAs. However, genes to which sRNAs map have low transcript abundances relative to other genes (Groszmann et al., 2011). Henz et al. [...]

Table III notes: The average proportion of cooccurring gene pairs in 100,000 randomized sets of expressed PC non-cis-NAT gene pairs. c The probability that a randomly selected group of PC non-cis-NATs would have a higher percentage of cooccurrence than the PC cis-NATs by chance.

DISCUSSION
Here, we propose that a central function of overlapping PC gene pairs is to promote high and broad gene transcription. We found that PC cis-NATs are detected more frequently and at a higher abundance than transcripts from nonoverlapping PC genes (Figs. 1 and 2). We also report that PC cis-NATs are broadly expressed and cooccur within harvested tissues more often than are PC non-cis-NAT genes (Fig. 3; Table III). PC genes are significantly overrepresented among cis-NAT-encoding genes, suggesting that genes that give rise to noncoding RNAs may be more sensitive to antisense transcripts than are PC genes (Table I).
Among the PC cis-NAT-encoding genes, gene pairs with 3′ overlaps (type I) were significantly overrepresented (Table II). Recombination and/or mutation likely gives rise to all three orientations of cis-NAT-encoding genes, but most type II and III gene pairs are likely eliminated because the creation of one gene has deleterious effects on the regulatory or coding sequences of the complementary gene.

Figure 4. [...] (Zhou et al., 2010); LND regions, low-nucleosome-density regions (Zhang et al., 2007). Asterisks indicate that the proportion of PC cis-NAT (PC non-cis-NAT) genes with a given chromatin modification is significantly greater than the proportion of PC non-cis-NAT (PC cis-NAT) genes at P < 1e-03.

Figure 5. H3K9 acetylation (H3K9ac) is overrepresented and H3K27 trimethylation (H3K27me3) is underrepresented in PC cis-NAT-encoding genes relative to PC non-cis-NAT-encoding genes with similar expression levels. PC cis-NAT genes and PC non-cis-NAT genes were binned into low, medium, and high groups based on their transcript abundances (2-16 RPKM, 16-128 RPKM, and more than 128 RPKM, respectively). Asterisks indicate that the proportion of PC cis-NAT (PC non-cis-NAT) genes with a given chromatin modification is significantly greater than the proportion of PC non-cis-NAT (PC cis-NAT) genes as follows: ***P < 1e-03, **P < 1e-02, *P < 5e-02. A, The proportions of PC cis-NAT-encoding genes and PC non-cis-NAT-encoding genes with H3K27me3 are noted on the y axis. A significantly lower proportion of PC cis-NAT genes than of PC non-cis-NAT genes is marked with H3K27me3 across all expression levels. B, The proportions of PC cis-NAT-encoding genes and PC non-cis-NAT-encoding genes marked with H3K9ac are noted on the y axis. The frequency of H3K9ac among PC cis-NAT genes is significantly higher than the frequency among PC non-cis-NAT genes.
Our results seem to be at odds with a previous finding that cis-NATs exhibit inverse expression, in which one cis-NAT is highly abundant and its antisense partner scarce in some conditions, while the reverse holds in other conditions (Jin et al., 2008). However, the frequency of inverse expression is strongly correlated with expression breadth. Because gene pairs are declared inversely expressed if their abundances differ across conditions, a gene pair expressed across many conditions is more likely to be inversely expressed than a gene pair expressed across few conditions. Accordingly, we also found that PC cis-NATs were frequently inversely expressed across treatments (51% of PC cis-NATs compared with 42% of other PC transcripts; P = 3.5e-10). We therefore compared the frequency of inverse expression among PC cis-NAT pairs and random PC non-cis-NAT gene pairs in which both members are expressed in three, four, five, or six conditions. Inverse expression increased with expression breadth, and at each breadth the proportion of inversely expressed PC cis-NAT gene pairs did not differ significantly from that of nonoverlapping gene pairs (Fig. 6). High levels of inverse expression have also been reported for human cis-NAT-encoding genes (Chen et al., 2005), and this result too may be a function of expression breadth.

Patterns of cis-NAT Abundances Are Likely Due to Transcriptional Control
Although antisense transcripts may posttranscriptionally stabilize sense transcripts and be generated by sense transcripts (Faghihi et al., 2008; Matsui et al., 2008a, 2008b), the high frequency of certain promoter motifs, low nucleosome density, low H3K27 trimethylation, and high H3K9 acetylation among the cis-NAT-encoding genes suggest that the transcript abundance patterns of cis-NATs are due to transcriptional regulation. Previous studies have reported that overlapping transcripts tend to be coregulated. Although nonsense-mediated mRNA decay widely suppresses non-PC antisense transcripts (Kurihara et al., 2009), non-PC transcripts complementary to highly expressed sense transcripts are expressed at relatively high levels (Luo et al., 2012). COOLAIR, the regulatory, noncoding antisense transcript of the floral repressor FLOWERING LOCUS C (FLC), has transcript levels that positively correlate with FLC transcript levels in most conditions (Ietswaart et al., 2012). Antisense genes may be under the regulatory control of shared enhancer elements that recruit RNA polymerase to both promoters (Tagoh et al., 2004; Ebralidze et al., 2008), and locus control regions can also establish extensive activated chromatin domains encompassing target promoters (Cajiao et al., 2004).

Table IV notes: [...] Kasschau et al., 2007). b sRNAs that matched 2,338 PC cis-NATs or 18,908 PC non-cis-NATs. Number per million, in parentheses, is the expected number of sRNAs out of 10⁶ sRNAs that match a single cis-NAT or a single non-cis-NAT transcript. c The frequency of sRNAs matching PC non-cis-NATs divided by the frequency of sRNAs matching PC cis-NATs. d The number of sRNAs matching a PC transcript was higher in rdr2-1 than in other genotypes because rdr2-1 fails to produce many repeat-associated siRNAs, leading to a high representation of sRNAs with homology to PC sequences.
The transcription of overlapping PC genes itself may be a positive regulatory mechanism. As in our study, Katayama et al. (2005) found that PC cis-NATs were highly coexpressed in mouse cell lines. Furthermore, depleting transcripts derived from one of two overlapping PC genes with RNA interference reduced the transcript abundance of the second gene (Katayama et al., 2005). The expression of enhancer RNAs also positively correlates with the expression of nearby genes (Kim et al., 2010; Ørom et al., 2010). Investigating the transcript abundance and chromatin status of one gene within an overlapping cis-NAT pair when the other gene has been silenced would test whether the transcription of a cis-NAT positively regulates its antisense transcript. As enhancers act independently of orientation and position, both overlapping PC genes and nonoverlapping, nearby PC genes may have high levels of coexpression.
Antisense transcription through, or terminating in, a sense promoter can promote sense gene transcription, likely by chromatin remodeling (Uhler et al., 2007). For example, the abundance of COOLAIR antisense transcripts with poly(A) sites within the sense FLC promoter region is correlated with FLC sense transcript abundance (Hornyik et al., 2010). Nonetheless, antisense transcription likely affects the kinetics but not the final level of gene transcription (Uhler et al., 2007). Furthermore, type I (tail-to-tail) pairs, which make up most PC cis-NAT gene pairs (1,272 of 1,402), do not have overlapping promoters.
As mentioned above, some cis-NATs are processed into siRNAs that target transcripts for cleavage. Thus, one might expect cellular sRNAs to map to cis-NATs at high frequencies. Consistent with this idea, Zhang et al. (2012) recently reported that siRNAs are overrepresented (P < 0.04) among the 3′ UTRs of cis-NATs compared with the 3′ UTRs of non-cis-NATs. In contrast, Henz et al. (2007) found that sRNAs are more enriched in non-cis-NATs than in cis-NATs. Our results are consistent with those of Henz et al. (2007). We found that among wild-type plants, sRNAs matched PC non-cis-NATs 2.8 times more frequently than they matched PC cis-NATs (Table IV). In addition, the number of sRNAs that map to cis-NATs in putative nat-siRNA biogenesis mutants (dcl1-7, dcl2-1, dcl3-1, and rdr6-15) was similar to the number in other genotypes (dcl4-2, rdr1-1, and the wild type; Table IV). We suggest that nat-siRNA regulation of PC gene expression is infrequent. Our results and those of Henz et al. (2007) may differ from the findings of Zhang et al. (2012) because the latter work included noncoding RNAs. Fifty-four of the 84 cis-NAT-encoding gene pairs associated with more than 10 siRNAs per million reads reported by Zhang et al. (2012) had one transcript annotated as "other RNAs." A number of the nat-siRNAs reported by Zhang et al. (2012) were DCL1 or DCL3/RDR2 dependent and may be microRNAs or siRNAs derived from the noncoding transcript alone. RNA structure is important for guiding molecules into sRNA biogenesis pathways (Pouch-Pélissier et al., 2008), and many noncoding RNAs form stable RNA structures and act as precursors for repeat-associated siRNA and microRNA biogenesis (Hirsch et al., 2006; Ben Amor et al., 2009).
We can speculate on how highly and broadly expressed complementary transcripts avoid nat-siRNA production and recruitment into long noncoding RNA-related ribonucleoprotein-silencing complexes (Swiezewski et al., 2009). Perhaps these genes are highly and broadly transcribed in the same tissue but are not transcribed simultaneously. Indeed, Osborne et al. (2004) noted that active genes undergo discontinuous transcription. Computational analyses support this model: Matsui et al. (2008a) reported that strong linear correlations between PC cis-NAT levels were rare. We previously reported 29 highly correlated transcript pairs from convergently oriented genes, of which type I genes are a subset (Zhan et al., 2006). Among the 29 gene pairs, not one is a type I cis-NAT pair, significantly fewer than expected (Table V; binomial test, P = 6.6e-4), and not one type II cis-NAT gene pair is among the 53 highly correlated, divergently oriented gene pairs (Table V). Investigating whether positive transcriptional regulation with low cotranscription characterizes cis-NAT genes in other species will determine the generality of these findings.

Table V notes: a Highly correlated neighboring gene pairs were identified from microarray data sets across 128 experimental conditions with r > 0.7 (Zhan et al., 2006). b Parentheses give the percentage of cis-NAT pairs in the defined orientation of transcription. c Values are given as counts.

Identification of cis-NATs and RNA-Seq Analyses
To identify candidate cis-NAT-encoding genes, we used PERL scripts to retrieve gene model orientation and transcript start and stop positions from the GenBank RefSeqs NC_003070.9, NC_003071.7, NC_003074.8, NC_003075.7, and NC_003076.8. We defined cis-NAT pairs as overlapping transcript pairs that arise from a pair of genes adjacently located on opposite strands of the same genomic locus. Although non-PC antisense transcripts outnumber PC antisense transcripts (Matsui et al., 2008a), we focused on the cis-NATs of overlapping PC genes. Using a χ² test, we calculated the probability of the observed number of PC genes that encoded cis-NATs, given the null expectation that the same proportion of PC genes and non-PC genes encode cis-NATs. Similarly, we calculated the probability of the observed numbers of PC cis-NAT-encoding gene pairs that were type I, type II, and type III, given the null expectation that PC gene pairs would be type I, type II, and type III at the same frequencies as all gene pairs.
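For the comparison of PC and non-PC genes, the χ² test takes a 2 × 2 table with rows for PC and non-PC genes and columns for cis-NAT-encoding and other genes. A minimal Python sketch with a hypothetical function name follows; the counts in the tests are toy values, not the study's data. It omits the continuity correction, whereas R's `chisq.test` applies Yates' correction to 2 × 2 tables by default, so R's p-values would be somewhat larger. The type I/II/III comparison would instead be a goodness-of-fit test with 2 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for [[a, b], [c, d]], no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so its upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```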
To assay transcript abundance, we obtained from Dr. Todd Mockler Arabidopsis (Arabidopsis thaliana) RNA-Seq data from 3-week-old Columbia-0 plants grown in normal, high-light, heat, cold, salt, and drought conditions. Filichkin et al. (2010) described plant growth conditions, RNA isolation, and the preparation of complementary DNA for the Illumina 1G Genome Analyzer. A total of 53.44 million 36-base RNA-Seq microreads were truncated to the first 30 bases (Jiang and Wong, 2008; Filichkin et al., 2010), and we used SeqMap (Jiang and Wong, 2008) to map these reads to the annotated mRNAs. We used rSeq (Jiang and Wong, 2009) to compute expression levels in RPKM (Mortazavi et al., 2008). rSeq weights reads by the number of sites to which they map. For overlapping transcripts, the transcript that gave rise to an RNA-Seq read is unknown, as reads are not strand specific. Thus, we also investigated reads that mapped only to a single transcript. To calculate RPKM, we used the length of a gene's longest transcript as the exon model length.
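RPKM is the read count divided by the exon model length in kilobases and by the number of mapped reads in millions (Mortazavi et al., 2008). The calculation, and the alternative that keeps only uniquely mapped reads, can be sketched as follows. Splitting each multi-mapping read equally among the genes it hits is an assumption about how rSeq weights reads, not a description of rSeq's internals; `rpkm` and its arguments are hypothetical names, and `lengths` would hold each gene's longest-transcript length in base pairs.

```python
from collections import defaultdict
from typing import Dict, List


def rpkm(read_hits: Dict[str, List[str]], lengths: Dict[str, int],
         total_mapped: int, unique_only: bool = False) -> Dict[str, float]:
    """RPKM = reads * 10**9 / (exon model length in bp * total mapped reads).

    read_hits maps each read ID to the genes it aligns to. A read hitting k genes
    contributes 1/k to each (assumed weighting); with unique_only=True, only reads
    with a single hit are counted.
    """
    counts: Dict[str, float] = defaultdict(float)
    for genes in read_hits.values():
        k = len(genes)
        if k == 0 or (unique_only and k > 1):
            continue
        for gene in genes:
            counts[gene] += 1.0 / k
    return {gene: counts[gene] * 1e9 / (length * total_mapped)
            for gene, length in lengths.items()}
```

Comparing the weighted and unique-only outputs for overlapping genes shows how much of a cis-NAT's apparent abundance rests on ambiguously assigned reads.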
We compared only PC cis-NATs and non-cis-NATs with evidence of transcription, defined as having at least 1 RPKM in one condition and a sum of at least 2 RPKM across the six conditions. We used the two-sample proportion test to evaluate the significance of differences in the proportion of expressed genes within each condition (Fig. 1). A gene with a transcript abundance of 1 RPKM or greater in a condition was considered present in that condition. Because the distributions of cis-NAT and non-cis-NAT RPKM values were nonnormal, we used the Wilcoxon rank sum test to evaluate the significance of differences in median expression between cis-NATs and non-cis-NATs (Fig. 2). Similarly, we used the Wilcoxon rank sum test to compare the expression breadths of cis-NATs and non-cis-NATs (Fig. 3). To evaluate whether transcripts from both members of a gene pair tended to be detected in the same treatment, we calculated the ICE as described by Lercher et al. (2002). In this paper, we call this metric the "index of cooccurrence," as the term coexpression is often used to signify correlation. The ICE metric ranges from 0 to 1, corresponding to no cooccurrence and perfect cooccurrence, respectively, of two genes' transcripts across samples. To determine whether cooccurrence of cis-NAT pairs exceeded that expected by chance, we first calculated the cooccurrence rate for all cis-NAT pairs. We then generated a control data set by replacing each gene in the cis-NAT set with a randomly picked gene from expressed, nonoverlapping PC genes. We compared the observed frequency of cis-NAT cooccurrence with the frequencies calculated from 100,000 permuted control data sets and calculated the significance of the cis-NAT cooccurrence rates from this null distribution (Table III). Because the ICE metric depends in part on the criteria used to make transcript presence/absence calls, we also assayed ICE using more stringent criteria.
Under the stringent criteria, a gene was expressed if its RPKM summed across the six conditions was 4 or greater and its RPKM in at least one condition was 3 or greater. Finally, we adopted a method proposed by Chen et al. (2005) to measure inverse expression (Fig. 6). We first Studentized the expression of each gene, allowing the expression differences of genes with high or low transcript abundances to be compared. We then labeled two genes (A and B) as inversely expressed if they had three or more contrasting expression patterns among their 15 pairwise comparisons across the six RNA-Seq experiments. A gene pair was termed contrasting between two experiments if the expression of gene A exceeded that of gene B by more than 0.5 in one experiment and the expression of gene B exceeded that of gene A by more than 0.5 in the other. We again used permutations of expressed PC non-cis-NAT genes to calculate statistical significance. All analyses were performed with the R statistical software (R Development Core Team, 2010).
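The inverse-expression rule can be sketched as follows. Studentizing is taken here to mean subtracting a gene's mean and dividing by its sample standard deviation across the six experiments, which is an assumption; the 0.5 threshold and the three-of-15 rule come from the text, and the function names are hypothetical.

```python
import statistics
from itertools import combinations
from typing import List, Sequence


def studentize(values: Sequence[float]) -> List[float]:
    """Center on the mean and scale by the sample standard deviation (assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # flat profile: all values equal the mean
    return [(v - mean) / sd for v in values]


def contrasting_count(a: Sequence[float], b: Sequence[float], delta: float = 0.5) -> int:
    """Count experiment pairs in which A exceeds B by more than delta in one
    experiment and B exceeds A by more than delta in the other."""
    diff = [x - y for x, y in zip(studentize(a), studentize(b))]
    count = 0
    for i, j in combinations(range(len(diff)), 2):  # 15 pairs for six experiments
        if (diff[i] > delta and diff[j] < -delta) or (diff[j] > delta and diff[i] < -delta):
            count += 1
    return count


def inversely_expressed(a: Sequence[float], b: Sequence[float],
                        min_contrasts: int = 3, delta: float = 0.5) -> bool:
    return contrasting_count(a, b, delta) >= min_contrasts
```

A gene that rises steadily across the six experiments (1, 2, ..., 6) and a gene that falls steadily are contrasting in 9 of the 15 experiment pairs, so `inversely_expressed` returns True for them.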

Epigenetic and Promoter Analyses of cis-NAT-Encoding Genes
To investigate epigenetic marks on cis-NAT-encoding genes, we obtained, from published online supplemental data, lists of genes with the following chromatin attributes: H3K27 trimethylation (Zhang et al., 2007; Lafos et al., 2011), H3K9 acetylation (Zhou et al., 2010), and low nucleosome density (Zhang et al., 2007). We obtained epigenetic information for all genes with evidence of expression. We then calculated the proportions of cis-NAT-encoding PC genes and of non-cis-NAT PC genes that have each epigenetic attribute, and assessed the significance of differences with the two-sample proportions test (Fig. 4). We took a similar approach to compare the frequency of chromatin attributes in PC cis-NAT-encoding genes and other PC genes expressed at a similar level (Fig. 5).
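The two-sample proportion test can be sketched as a pooled normal-approximation z-test. This is a minimal illustration with a hypothetical function name; R's `prop.test` applies a continuity correction by default, so its p-values will differ slightly.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int,
                          alternative: str = "two-sided") -> Tuple[float, float]:
    """Pooled z-test comparing p1 = x1/n1 with p2 = x2/n2 (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    if alternative == "greater":      # H1: p1 > p2
        p_value = 0.5 * math.erfc(z / math.sqrt(2))
    elif alternative == "less":       # H1: p1 < p2
        p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    else:                             # two-sided
        p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 60 of 100 marked genes in one group versus 40 of 100 in the other gives z ≈ 2.83 and a two-sided p ≈ 0.0047.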
To evaluate differences in cis-regulatory elements, we obtained a list of motifs and motif positions from ATCOECIS (Vandepoele et al., 2009). We recorded whether each motif was present within each gene's upstream region, defined as the first 1,000 bp upstream of the translation start site, or a shorter region if the neighboring gene was closer than 1,000 bp (Vandepoele et al., 2009). The genes were divided into cis-NAT-encoding genes and non-cis-NAT-encoding genes, and within each group we ranked the motifs by the frequency with which they were found. We report motifs that differed by more than 10 ranks between the two groups (Supplemental Table S2). Associations of the identified motifs with Gene Ontology categories were retrieved from ATCOECIS (Vandepoele et al., 2009).
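The rank comparison can be sketched as follows. Ranks are based on the number of genes in a group whose upstream region contains each motif; breaking ties alphabetically is an assumption, and the function names are hypothetical. Comparing ranks rather than raw counts sidesteps the very different sizes of the two gene groups.

```python
from typing import Dict, List, Set, Tuple


def motif_ranks(upstream: Dict[str, Set[str]], motifs: List[str]) -> Dict[str, int]:
    """Rank motifs (1 = most frequent) by the number of genes whose upstream region contains them."""
    counts = {m: sum(1 for found in upstream.values() if m in found) for m in motifs}
    ordered = sorted(motifs, key=lambda m: (-counts[m], m))  # ties broken alphabetically (assumed)
    return {m: rank for rank, m in enumerate(ordered, start=1)}


def rank_shifts(group_a: Dict[str, Set[str]], group_b: Dict[str, Set[str]],
                motifs: List[str], min_shift: int = 10) -> Tuple[List[str], List[str]]:
    """Motifs ranked more than min_shift places higher in group A, and in group B."""
    rank_a = motif_ranks(group_a, motifs)
    rank_b = motif_ranks(group_b, motifs)
    higher_in_a = [m for m in motifs if rank_b[m] - rank_a[m] > min_shift]
    higher_in_b = [m for m in motifs if rank_a[m] - rank_b[m] > min_shift]
    return higher_in_a, higher_in_b
```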

Mapping sRNAs to cis-NATs
Biogenesis of nat-siRNAs depends on DCL1, DCL2, and RDR6 (Borsani et al., 2005; Ron et al., 2010). Zhang et al. (2012) also found that some nat-siRNAs were dependent on DCL1, DCL3, and RDR2. To investigate how often sRNAs mapped to cis-NATs in the wild type, nat-siRNA silencing mutants, and other sRNA silencing mutants, we obtained sRNA sequences from accessions GSE5228 (dcl1-7, dcl2-1, dcl3-1, dcl4-2, rdr1-1, rdr2-1, rdr6-15, and the wild type; Kasschau et al., 2007) and GSE6682 (the wild type; Rajagopalan et al., 2006) from the National Center for Biotechnology Information Gene Expression Omnibus. We tallied perfect matches between the sRNA sequences and PC transcripts using PERL scripts. Each sRNA was counted only once per gene, even if it matched several transcripts of that gene or matched one transcript several times. We calculated the number of sRNAs that mapped to cis-NATs and the average frequency with which an sRNA matches each cis-NAT per million sRNAs (Table IV).
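The counting rule can be sketched as follows. Counting matches on either strand through the reverse complement, and the per-million normalization (matches per 10⁶ sRNAs, divided by the number of genes in the set), are assumptions based on the Table IV notes; the function names are hypothetical, and plain substring search stands in for the Perl scripts.

```python
from typing import Dict, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def genes_matched(srna: str, transcripts_by_gene: Dict[str, List[str]]) -> Set[str]:
    """Genes with at least one transcript containing a perfect match on either strand."""
    rc = reverse_complement(srna)
    return {gene for gene, transcripts in transcripts_by_gene.items()
            if any(srna in t or rc in t for t in transcripts)}


def match_summary(srnas: List[str], transcripts_by_gene: Dict[str, List[str]]) -> Tuple[int, float]:
    """Total sRNA-gene matches (each sRNA counted once per gene) and the expected
    number of matches to a single gene per 10**6 sRNAs."""
    total = sum(len(genes_matched(s, transcripts_by_gene)) for s in srnas)
    per_million = total / len(srnas) * 1e6 / len(transcripts_by_gene)
    return total, per_million
```

Because `genes_matched` returns a set of genes, an sRNA that hits two transcripts of the same gene, or one transcript twice, still adds only one match for that gene.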

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Percentage of expressed PC cis-NAT-encoding genes and PC non-cis-NAT-encoding genes with high transcript abundances.
Supplemental Figure S2. Percentage of cis-NAT-encoding genes and non-cis-NAT-encoding genes with uniquely mapped RNA-Seq reads.
Supplemental Figure S3. RNA-Seq reads that map to unique positions are more abundant within PC cis-NATs than PC non-cis-NATs.
Supplemental Figure S4. Percentage of expressed PC cis-NAT and PC non-cis-NAT genes that had uniquely mapped RNA-Seq reads.
Supplemental Table S1. Percentage of cooccurring PC cis-NAT pairs using stringent expression criteria at different ICE cutoff values.
Supplemental Table S2. Motifs overrepresented within PC cis-NATs and PC non-cis-NATs.
Supplemental Table S3. Proportions of H3K9 acetylation and H3K27 trimethylation among types of PC cis-NAT-encoding genes.
Supplemental Table S4. Number and frequency of PC cis-NATs and PC non-cis-NATs matching sRNAs in RNA silencing mutants.