Identification of genes in Thuja plicata foliar terpenoid defenses.

Transcriptome profiling of foliage with and without resin glands identifies candidate gene products in thujone biosynthesis. Thuja plicata (western redcedar) is a long-lived conifer species whose foliage is rarely affected by disease or insect pests, but can be severely damaged by ungulate browsing. Deterrence to browsing correlates with high foliar levels of terpenoids, in particular the monoterpenoid α-thujone. Here, we set out to identify genes whose products may be involved in the production of α-thujone and other terpenoids in this species. First, we generated a foliar transcriptome database from which to draw candidate genes. Second, we mapped the storage of thujones and other terpenoids to foliar glands. Third, we used global expression profiling to identify more than 600 genes that are expressed at high levels in foliage with glands, but can either not be detected or are expressed at low levels in a natural variant lacking foliar glands. Fourth, we used in situ RNA hybridization to map the expression of a putative monoterpene synthase to the epithelium of glands and used enzyme assays with recombinant protein of the same gene to show that it produces sabinene, the monoterpene precursor of α-thujone. Finally, we identified candidate genes with predicted enzymatic functions for the conversion of sabinene to α-thujone. Taken together, this approach generated both general resources and detailed functional characterization in the identification of genes of foliar terpenoid biosynthesis in T. plicata.

Thuja plicata (western redcedar), a member of the Cupressaceae family of conifers, has effective chemical defenses and is known to be highly resistant to insect and fungal damage (Minore, 1983). However, ungulate browsing can cause significant damage to seedlings (Martin and Daufresne, 1999;Martin and Baltzinger, 2002;Vourc'h et al., 2002bVourc'h et al., , 2002cStroh et al., 2008). Reforestation efforts typically rely on physical tree guards to protect newly planted T. plicata seedlings from ungulate browsing at a cost of $20 to $25 million a year in the western parts of Canada (British Columbia) where T. plicata is a native species (Russell, 2008). Ungulate browsing deterrence has been linked to high levels of modified monoterpenes or monoterpenoids. Ungulates selectively feed on individuals with low levels of foliar monoterpenoids, while avoiding plantlets with higher levels (Vourc'h et al., 2002b). The two most important monoterpenoids for deterring ungulate feeding are aand b-thujone (Vourc'h et al., 2002a), which are typically the most abundant monoterpenoids in T. plicata foliage (Kimball et al., 2005). T. plicata foliage consists of scale-like needles, where each scale has a prominent central resin gland. In addition, smaller glands can be found at the base of the scales (Suzuki, 1979). It is unclear, however, where in the foliage highly volatile monoterpenoids such as thujones are produced or stored.
Biochemical studies in sage (Salvia officinalis) found that the monoterpene sabinene is a precursor to aand b-thujone (Dehal and Croteau, 1987). Microsomal preparations from leaf tissue catalyze the NADP (NADPH) and oxygen-dependent hydroxylation of (+)-sabinene to (+)-cis-sabinol (Karp et al., 1987). A (+)-cis-sabinol dehydrogenase catalyzes the formation of sabinone, which then undergoes NADPH-dependent stereoselective reduction to form either aor b-thujone (Dehal and Croteau, 1987). Based on these findings, a pathway for the transformations of sabinene to thujones has been suggested (Dehal and Croteau, 1987), and it is possible that thujone biosynthesis follows a similar path in T. plicata.
Monoterpenes are produced by monoterpene synthases, which in conifers are relatively closely related to sesquiterpene and diterpene synthases (Keeling and Bohlmann, 2006;Chen et al., 2011). Monoterpene synthases convert geranyl diphosphate into a large variety of cyclic or linear monoterpenes. Many terpene synthases have been functionally characterized in conifers of the pine family (Pinaceae), and several conserved motifs have been identified as necessary for enzymatic function; these include the DDXXD and NSE/DTE motifs that bind metal cofactors, the RRX 8 W motifs near the N terminus, and the DXDD motif of some diterpene synthases (Keeling and Bohlmann, 2006;Keeling et al., 2008Keeling et al., , 2011. Encoded monoterpene synthase proteins harbor signal peptides that result in plastid import where monoterpene synthesis takes place. There is extensive documentation that the levels of transcripts of terpene synthase-encoding genes correlate well with enzyme levels as well as levels of specific monoterpenes or monoterpenoids (Miller et al., 2005;Byun-McKay et al., 2006;Zulak et al., 2009;Hall et al., 2011).
Molecular genetic information is limited for T. plicata, with only a handful of complementary DNA (cDNA) sequences present in GenBank. Recent development of next-generation sequencing technologies, however, now allows extensive sequencing and de novo assembly of transcriptomes as a relatively rapid method for gene discovery (Morozova and Marra, 2008). Here, we used Illumina sequencing to generate a T. plicata transcriptome assembly and to identify more than 600 genes whose expression correlated with the presence of foliar monoterpenoids and terpenoid storage glands. Using this new sequence resource, we identified candidate genes for enzymes in the conversion of sabinene to thujones and functionally identified a sabinene synthase-encoding gene. In situ hybridization studies further showed that this sabinene synthase gene is expressed in the epithelial cells lining foliar resin glands.

Characterization of Wild-Type and Variant T. plicata Genotypes
While it is known that T. plicata foliage contains sabinene and thujone (Kimball et al., 2005), the internal site or sites of production and storage are unknown. The limited literature available on T. plicata foliage anatomy describes the presence of resin glands in the scale-like leaves (hereafter referred to as scales; Suzuki, 1979), suggesting that these compounds could be synthesized and possibly stored in these glands, just as terpenoids are stored in needle resin ducts in Pinaceae species (Wooding and Northcote, 1965;Manninen et al., 2002). Figure 1A shows a cross section of central scales of young foliage, revealing the presence of one central resin gland on the adaxial side (arrow I) and one on the abaxial side (arrow II). We used syringe needles to collect samples from the central glands (arrows I and II) and lateral positions (arrow III) and assessed the terpenoid profiles of samples by gas chromatographymass spectrometry (GC-MS). This simple procedure revealed no detectable terpenes in the samples from non-gland positions (Fig. 1B). Samples from glands, on the other hand, showed a terpenoid profile dominated by the monoterpenoids a-thujone, b-thujone, and sabinene (Fig. 1C). The terpenoid profile obtained from glands is qualitatively the same as that published for whole foliage (Kimball et al., 2005). Thus, it appears that the majority of stored foliar terpenoids are confined to resin glands.
We tested this notion further by examining the leaves of wild-type T. plicata and comparing with a natural variant that lacks monoterpenoids (Fig. 2, A  and B). Microscopy showed that foliage from the variant lacked resin glands (Fig. 2B). The lack of resin storage structures in the variant was confirmed by analysis of scale cross sections (not shown). Chemical analysis of leaf terpenoid extracts by GC-MS from the variant confirmed that the foliage does not contain detectable amounts of monoterpenoids ( Fig. 2D), while the presence of the major T. plicata monoterpenoids was confirmed in the wild type (Fig. 2C). In summary, two lines of evidence conclusively link monoterpene storage to foliar resin glands also in T. plicata.

Generation of a de Novo Leaf Transcriptome
There are no genomic resources available in the public domain for T. plicata or any closely related plant in Cupressaceae. We sequenced the poly(A)-enriched portion of RNA extracted from wild-type and variant seedlings foliage (see "Materials and Methods"). The Illumina sequencing resulted in more than 65,000,000 paired-end reads, of which 55,761,915 could be assembled de novo into 74,999 contigs, with an average contig length of 681 bp. The BLASTX application was able to find similar sequences with E values of less than 1e -3 for 33,207 contigs in the National Center for Biotechnology Information (NCBI) database. The contigs with BLASTX matches represented 49,002,907 of the total reads, with mean lengths of 1,039 bp and mean coverage per base of 85.9 times. This transcriptome represents approximately 34.5 Mbp of new sequence information for T. plicata. BLAST2GO software Götz et al., 2008) was used to assign annotations to each contig with BLASTX matches.

Genes with Expression Associated with the Presence of Foliar Resin Glands
The glandless T. plicata variant described above provided us with the opportunity to screen for genes that are expressed at elevated levels in foliage with glands relative to foliage without glands and thereby identify differentially expressed genes that are associated with gland tissues. To this end, we utilized Illumina cDNA tag profiling (see "Materials and Methods") to identify genes whose expression differed between the two samples. Tag profiling quantifies levels of transcripts by sequencing short 39 transcript fragments. Approximately 5 million tags from each library aligned to the transcriptome. Using a stringency criterion of 16 out of 17 bp matches for tag-to-template alignment, a total of 604 contigs with greater than 10fold difference in expression between wild-type and variant seedlings were identified (Fig. 3). In addition, we identified 314 contigs for which no tags were found in the variant tag profiling library, but which were represented with between 10 and 1,499 tags in the wild-type tag profiling library (Fig. 3), providing a correlation between absence of glands and absence of these transcripts in the natural variant in this experiment. The 108 genes with the highest representation within this data set are summarized in Supplemental Table S1. Among the 604 highly differentially expressed genes were a number with predicted gene products that are potentially involved in terpenoid production, including eight terpene synthases, four cytochrome P450s, two dehydrogenases, and two reductases (see below). When the 604 contigs were categorized based on the Gene Ontology for cellular component, biological process, and molecular function, we found that components and activities linked to terpene synthesis and secretion are well represented, including plastids, chloroplasts, and membranes (Supplemental Fig. S1). Figure 1. A, Comparison of monoterpene content in glands (arows I, II) and areas without glands (arrow III). Terpenoids were collected from tissue samples by piercing with sterile needles. B, No monoterpenes were detected by GC from tissue extracted from areas surrounding the resin glands. C, Monoterpenes were detected in resin glands by GC. IS, Internal standard (isobutyl benzene); 1, thujene; 2, a-pinene; 3, sabinene; 4, myrcene; 5, a-thujone; 6, b-thujone. [See online article for color version of this figure.]

Confirmation of Tag Profiling Results for Terpenoid Pathway Candidate Genes by RT-qPCR
We validated the results obtained by tag profiling for the expression of 20 of the 604 genes (Supplemental Table S2) by real-time quantitative PCR (RT-qPCR) in new samples of RNA from foliage with and without glands. Genes for RT-qPCR were selected based on their similarity to genes with annotated functions potentially related to defense and terpenoid production. We used three housekeeping genes with predicted open reading frames (ORFs) of an elongation factor, a polyubiquitin, and an actin as references for assessment of relative transcript abundance. Experiments indicated no differences in expression of these genes in foliage with and without glands (data not shown). For some of the tested genes, RT-qPCR did not yield a detectable product after 35 cycles when RNA from foliage of the glandless variant was used as substrate for cDNA synthesis and PCR. Although these results confirm the findings from the tag profiling experiments, absence of detectable transcripts in the glandless variety prevented us from calculating a relative fold abundance of the same transcript in the foliage with glands. To address this issue, we calculated fold abundance based on an introduced threshold cycle (C T ) of 35 (C T = 35) to calculate a minimum measure of the relative difference of transcript abundance in expression of a gene in foliage with and without glands if the transcript was detectable in foliage without glands.
The above data set of highly differentially expressed transcripts included eight terpene synthase contigs. Five of these were expressed at very low levels in foliage with glands, as indicated by fewer than 10 tags per gene, and were therefore not pursued further as candidate terpene synthases for the production of the highly abundant terpenoids. Analysis of protein motif combinations indicated that the remaining three contigs likely represented one each of a mono-, sesqui-, and diterpene synthase (data not shown). Transcripts of the putative mono-and diterpene synthase could not be detected by RT-qPCR in foliage without glands. Introduction of an artificial C T = 35 value showed that transcript of the putative monoterpene synthase is at least 449-fold more abundant in foliage with glands and that the putative diterpene synthase is at least 5,134-fold more abundant in foliage with glands from the tested genotypes (Fig. 4A). Transcripts of the putative sesquiterpene synthase were detected in foliage without glands, and the RT-qPCR results indicated approximately 318-fold higher abundance in foliage with glands.
Of the four identified putative cytochrome P450encoding transcripts, one could not be amplified from cDNA and was not pursued further. The remaining three transcripts were highly differentially expressed. Based on phylogenetic analysis, these genes have been annotated as CYP805A1 (contig9309), CYP720B18 (contig33366) and CYP750B1 (contig1536; Supplemental  Fig. S4). CYP805A1 and CYP720B18 were not detected in foliage without glands, and an artificial C T = 35 value indicated that CYP805A1 was at least 420-fold more abundant in foliage with glands and CYP720B18 at least 1,055-fold more abundant in foliage with glands ( Fig. 4B). CYP750B1 was detected in foliage without glands and was expressed at approximately 276-fold higher levels in foliage with glands.
Contig30947/dehydrogenase1 was also not detected in foliage without glands, and an artificial C T = 35 value indicated that it was at least 342-fold more abundant in foliage with glands ( Fig. 4C). Contig4118/dehydrogen-ase2 was expressed at approximately 103-fold higher levels in foliage with glands. The one exception among the tested expressed genes was 678/reductase1 because RT-qPCR did not confirm tag profiling results (Fig. 4D). Instead, RT-qPCR indicated a slight down-regulation in foliage with glands. We have not pursued this discrepancy further. On the other hand, putative 44628/Reduc-tase2 could not be detected in foliage without glands, and an introduced C T = 35 value indicated that it was at least 739-fold more abundant in foliage with glands.  We tested nine additional genes for differential expression in foliage with and without glands. These genes were selected based on similarity to genes previously implicated in biotic stress (see Supplemental Table S3 and "Discussion"). Transcripts for none of these genes could be detected in foliage without glands (Supplemental Fig.  S2). With an introduced C T = 35 value, transcripts of seven of the nine genes were expressed at 1,000-fold or higher levels in foliage with glands. The two remaining genes were expressed at higher levels in foliage with glands, albeit at lower levels.
In summary, the RT-qPCR experiments confirmed the highly differential expression obtained by tag profiling for 18 of the 20 tested genes. In addition, the RT-qPCR experiments indicated that transcripts of 15 of the tested genes could not be detected in three biological replicates of RT-qPCR from foliage without glands, providing a strong correlation between absence of expression and absence of glands or, vice versa, a correlation between expression and presence of glands.

Identification of Candidate Proteins in Thujone Biosynthesis by Phylogenetic Analysis
Phylogenetic analysis and identification of characteristic protein motifs were used to further narrow the list of candidates for thujone biosynthesis. In the case of terpene synthases, functional motifs were used to characterize genes as mono-, sesqui-, or diterpenes, as phylogenetic analysis alone may be ineffective in differentiating between mono-and sesquiterpene synthases (Keeling and Bohlmann, 2006).
Plant terpene synthases are subdivided into eight subfamilies (TPS-a-TPS-h; Chen et al., 2011), with gymnosperm terpene synthases of specialized metabolism falling into subfamily TPS-d (Keeling and Bohlmann, 2006;Chen et al., 2011). Based on BLASTX similarities, contig788 was found to closely match two known monoterpene synthases, the Chamaecyparis obtusa limonene/borneol synthase (accession no. AB120957) and the Chamaecyparis formosensis a-pinene synthase (accession no. EU099434). ClustalX phylogenetic analysis showed that the monoterpene synthases from the family Cupressaceae form a subclade, separate from the remaining monoterpene synthases, all from the family Pinaceae, in the TPS-d1 subfamily (Supplemental Fig. S3). The closest BLASTX matches for the other two putative terpene synthases were a diterpene synthase for contig37952 and a sesquiterpene synthase for con-tig14670. Examination of the functional motifs of the predicted ORFs of these sequences supported these functions (data not shown). These genes were not examined further as candidates in thujone biosynthesis. The cDNA and predicted amino acid sequence for the monoterpene synthase contig788 are shown in Supplemental Fig. S5.
The predicted ORFs of two of the three cytochrome P450 contigs were incomplete, and 59 RACE was used to obtain full-length ORFs based on lengths of related proteins. The cDNA sequences and corresponding predicted ORFs can be found in Supplemental Figures S6 to S8. Phylogenetic analysis of the candidate cytochrome P450s found all three contigs to fall into different subfamilies (Supplemental Fig. S4). Contig9309/ CYP805A1 does not share a close similarity to a known conifer cytochrome P450. The closest amino acid identity match for CYP805A1 is at 34% with a cytochrome P450 from California poppy (Eschscholzia californica) in the subfamily CYP719A, which catalyzes a methylenedioxy bridge formation in the isoquinoline alkaloid biosynthesis (Ikezawa et al., 2009). The ORF from contig33366/CYP720B18 groups into the CYP720B subfamily, which contains enzymes involved in diterpene resin acid biosynthesis in loblolly pine (Pinus taeda) and Sitka spruce (Picea sitchensis; Ro et al., 2005;Hamberger and Bohlmann, 2006;Hamberger et al., 2011). The ORF from contig1536/CYP750B1 was found to belong to the CYP75 subfamily, which contains cytochrome P450s involved in mono-, sesqui-, and diterpene metabolism (Ro et al., 2005) and which is also thought to be involved in B-ring hydroxylation during flavonoid biosynthesis (Ayabe and Akashi, 2006).
The proposed thujone biosynthesis pathway shares similarities to menthone biosynthesis in Mentha spp., as both pathways require not only the activity of a monoterpene synthase and a cytochrome P450, but also a dehydrogenase and a reductase (Croteau et al., 2005). The dehydrogenases responsible for the conversion of (2)-trans-isopiperitenol to (2)-isopiperitenone in peppermint (Mentha piperita) belong to the short-chain alcohol dehydrogenase superfamily (Croteau et al., 2005). Tag profiling of T. plicata identified candidates also for the dehydrogenase and reductase steps in the thujone biosynthesis pathway. Contig4118/dehydrogenase2 (Supplemental Fig. S9) was identified as the strongest candidate for a potential sabinol dehydrogenase, as the predicted amino acid sequence of this contig has a high identity to a putative short-chain alcohol dehydrogenase from castor bean (Ricinus communis) and groups into the short-chain alcohol dehydrogenase superfamily (data not shown). Contig44628/reductase2 (Supplemental Fig. S10) is the strongest candidate for the final step in thujone production because it shares similarity with the (2)-isopiperitenone reductase from peppermint, which is known to be involved in menthone biosynthesis (Croteau et al., 2005).

Localized Expression of a Monoterpene Synthase
We used in situ RNA hybridization to sectioned leaf tissues to localize the expression of the putative monoterpene synthase contig788 to resin gland epithelial cells. The negative control sense probes revealed no signal anywhere in the leaf tissue, including the epithelial cells of resin glands ( Fig. 5A; arrow). By contrast, the contig788 antisense probe produced a strong hybridization signal in the gland epithelial cells ( Fig. 5B; arrows). This hybridization signal was consistent and present in resin glands in both central and lateral scales (Fig. 5C) throughout the foliage, including early stages of gland development (Fig. 5D). These in situ hybridizations support the hypothesis that the monoterpene synthase candidate gene, as well as potentially other genes found to be differentially expressed in wild-type and variant foliage, may be expressed specifically in resin gland tissues.

Identification of Monoterpene Synthase Enzymatic Activity
While the motif signature of contig788 suggests it is a monoterpene synthase, it does not provide evidence of specificity of monoterpene synthase activity. To assess the specific activity of the encoded protein, we cloned the complete ORF into an expression vector and produced recombinant protein in Escherichia coli. The poly-His-tagged protein was partially affinity purified and tested with geranyl diphosphate for monoterpene synthase activity. The product profile was assessed by GC-MS using authentic standards for comparison. The recombinant protein produced (+)-sabinene as the primary product (86.5% 6 1.1%) and low quantities of the monoterpenes myrcene (4.9% 6 0.1%), (+)-a-pinene (2.74% 6 0.2%), terpinolene (1.8% 6 0.25%), a-thujene (1.3% 6 0.1%), g-terpinene (1.1% 6 0.4%), (+)-limonene (0.9% 6 0.2%), and (+)-b-pinene (0.7% 6 0.2%; Fig. 6A). The protein did not show any sesqui-or diterpene synthase activity when assayed with farnesyl diphosphate or geranylgeranyl diphosphate substrates, respectively (data not shown). Based on this analysis, we identified the monoterpene synthase encoded by contig788 as T. plicata SABINENE SYNTHASE (TpSAB). Sabinene synthase activity is highly relevant for the production of thujone, as sabinene serves as the monoterpene precursor to thujone in other species ( Fig. 7; Croteau, 1996). Interestingly, the overall profile of the major and minor products of TpSAB, including their stereochemistry, was almost identical to that found in foliar extracts of T. plicata (Fig. 6, A and B). These observations suggest that TpSAB may be the major monoterpene-producing enzyme in the epithelium of foliar glands, at least in the genotype tested here.

DISCUSSION
Among forest trees, T. plicata and other Thuja species are known for having exceptionally strong defenses against pests and pathogens (Minore, 1983). These defenses include a repertoire of potentially toxic specialized (i.e. secondary) metabolites, many of which are named after the Thuja genus. For example, thujaplicins are ring-expanded monoterpenoids with strong antifungal properties that provide heartwood rot resistance (DeBell et al., 1999;Daniels and Russell, 2007;Bentley, 2008). Thujic acid and related lignans serve a similar function, making the wood of T. plicata highly suitable for outdoor use (Taylor et al., 2006;Morris and Stirling, 2012). While thujone is perhaps best known as a component of wormwood (Artemisia absinthium) and the spirit absinthe (Lachenmeier et al., 2006), thujone is present at high levels in the foliage of T. plicata, where it serves to reduce ungulate browsing (Vourc'h et al., 2002a). Thus, Thuja species hold considerable potential for discovery of genes, enzymes, and pathways required for biosynthesis of potent defense chemicals. To that effect, we found that thujones and other monoterpenoids are stored in foliar glands and took advantage of a natural variant of T. plicata that lacked these glands to identify 604 genes whose expression levels are either low or absent in this variant and high in wild-type foliage. While it is too early to conclude that this set of genes represents a gland transcriptome, it allowed us to narrow down a limited set of genes whose gene products are candidates for the synthesis of thujone. The set contained only one highly expressed putative mono-, sesqui-, and diterpene synthase. This number is low, suggesting that relatively few terpene synthases contribute to the terpene profile of glands of T. plicata foliage. While it may also be possible that our sequencing and tag profiling approach may have missed additional terpene synthase genes, it should be kept in mind that the tag profiling method used here is considered both highly sensitive and has a high dynamic range Mizrachi et al., 2010;Wang et al., 2010). In line with these traits, the detection of rare transcripts, down to single tags among 65 million tags, of putative terpene synthases also suggests that we did not miss any highly expressed terpene synthase in the assessed tissue. With an interest of this project in a-thujone biosynthesis, we focused on the monoterpene synthase characterization. The additional putative diand sesquiterpene synthases detected as differentially expressed in foliage with and without glands will be studied in more detail in future work. These genes, which may encode single-or multi-product enzymes, are likely to contribute to the diverse profile of minor and major terpenes of glands of T. plicata foliage. However, specific roles of these genes cannot be predicted without additional functional characterization .
The monoterpene profile of extracted foliage closely resembles that of the product profile of TpSAB identified here. A major additional compound in the foliar extract (peak 10 in Fig. 6B), a-thujone, is predicted to be derived from sabinene, the major TpSAB product, (+)-sabinene (peak 4 in Fig. 6A). Thus, the entire profile of monoterpenoids in wild-type foliage as detected here (Fig. 6B) and described prior (Kimball et al., 2005) can be explained by the activity of TpSAB and the predicted conversion of sabinene to thujone. TpSAB and its protein may be considered a key gene and enzyme, respectively, for monoterpene-based defenses against herbivore feeding in the T. plicata foliage. This finding has important implications for the use of TpSAB or its encoded protein as a potential genetic marker or biomarker, respectively, for screening, selection, and breeding of T. plicata resistant to ungulate browsing. We have amplified, cloned, and sequenced genomic fragments corresponding to full-length transcripts of TpSAB, but found no evidence for multiple copies of this gene based on divergent intron sequences (data not shown), suggesting that TpSAB is encoded by a single genomic locus.
analysis of leaf terpenes in 55 T. plicata populations resulting in a negative correlation between sabinene and aand b-thujone levels (Von Rudloff et al., 1988). This negative correlation indicates interdependence and that sabinene may also be the precursor to thujone in T. plicata. A similar argument can be made from this study, because TpSAB generated monoterpene extracts with a high content of sabinene and no a-thujone, whereas extracts from wild-type foliage contained much lower content of sabinene relative to minor products and a high content of a-thujone (compare Fig. 6, A and B).
Based on the proposed thujone biosynthesis pathway (Croteau, 1996), following the production of sabinene, there are three additional steps in the production of thujone involving cytochrome P450, dehydrogenase, and reductase activity, in that order (Fig. 7). Three putative cytochrome P450 genes and their predicted polypeptides were identified in the tag profiling data set with expression that was absent or very low in the variant. CYP805A1 shares only a weak similarity to a member of the CYP719A subfamily, providing no indication of its enzymatic activity. CYP720B18 is a relevant candidate protein for thujone biosynthesis, as it falls within the CYP720B family, with members known to be involved in diterpene resin acid biosynthesis in members of the pine family (Zulak and Bohlmann, 2010;Hamberger et al., 2011). This cytochrome P450 may alternatively be involved in the biosynthesis of diterpene resin acids, also present in T. plicata foliage (Vourc'h et al., 2001). The third cytochrome P450, CYP750B1, grouped in the CYP75 subfamily with members known to modify terpenoids and flavonoids (Ro et al., 2005;Ayabe and Akashi, 2006). Two partial cDNAs matching a short-chain alcohol dehydrogenase and Rossman superfamily reductase were also identified; however, very few similar genes have been isolated and characterized from any conifer. As such, it is difficult to speculate further on their potential role in thujone biosynthesis.
The tag profiling data set identified a number of other genes that could be involved in defense responses, including signal transduction components that are candidates for activation of genes in terpenoid biosynthesis. For example, one of the transcripts that could not be detected in foliage without glands, but was present at high levels in foliage with glands, closely matched transcripts of phospholipase A-encoding enzymes. Phospholipase A enzymes have been implicated in the regulation of plant biotic and abiotic stress responses, in which it may act by releasing linolenic acid, a precursor to jasmonic acid, from membrane lipids (Ryu, 2004). Jasmonic acid, in turn, is a well-known activator of biotic defense responses, including the induction of terpene biosynthesis (Phillips et al., 2006).
Two APETALA2/ethylene response factor transcription factor matching genes were also present within the tag profiling data set; a member of this gene family, Octadecanoid-derivative Responsive Catharanthus APETALA2-domain protein3, is known to be involved in the jasmonic acid-induced expression of genes involved in terpene alkaloid biosynthesis in Catharanthus roseus (van der Fits and Memelink, 2000). The expression of two other putative transcription factors in the MYB family cannot be detected in foliage without glands, but is present at high levels in foliage with glands, providing additional targets for potential regulation of activities related to gland formation, differentiation, or function.
In summary, we have identified a large set of genes that sheds light on the molecular basis of cell typespecific monoterpene biosynthesis in T. plicata foliage. In particular, TpSAB may have a key function in the biosynthesis of foliar a-thujone and by extension, is a strong candidate for marker-assisted selection to reduce deer browsing of T. plicata seedlings. In future work, we will make use of a large set of different provenances to test genetic associations of variation in candidate genes identified here with variation in sabinene and a-thujone content that exists within the available population.

Characterization of Wild-Type and Variant T. plicata Lines
Wild-type and variant Thuja plicata seedlings in this study were obtained from British Columbia Forest Service (clones 274 and 878). The variant seedling was first identified in an open-pollinated progeny field trial on southern Vancouver Island. The variant seedling was grown from seed collected from a mature tree situated in a bog at the northern edge of the T. plicata coastal maritime range (latitude 54°14' 09" N, longitude 129°36' 58" W, and altitude 140 m). This fragmented population of T. plicata growing in a mixed coniferous forest 50 km east of Prince Rupert, British Columbia was in an isolated narrow valley surrounded by steep mountains on three sides and bordered by the Skeena River to the south, where the probability of selfing would be high (Vourc'h et al., 2002b). Monoterpenes from foliage tips of both wild-type and variant seedlings were extracted following a procedure modified from Kimball et al. (2005). For each seedling, 1 g of foliage was ground in liquid nitrogen. One-half of the ground sample was stored for RNA extraction, and the other 0.5 g was transferred to a 10-mL glass tube and extracted with 5 mL of ethyl acetate on a horizontal shaker for 30 min. To remove solid tissue, the extract was first centrifuged and then passed through a glass wool filter. Either isobutyl benzene or n-nonyl acetate was added as an internal standard (see Figs. 1, 2, and 6). Extracts were then concentrated under a nitrogen stream, and aliquots were analyzed by GC or GC-MS. GC analyses employed a Hewlett Packard 5890 Series II gas chromatograph equipped with a Zebron 5 GC column (Phenomenex; 30 m 3 0.25 mm [length 3 i.d.], 0.25 mm [film]) and programmed with the following temperature program: 5 min at 50°C and 10°C per min to 280°C. For GC-MS analyses, one of two settings was used. The first setting employed a Varian Saturn 2000 Ion Trap Gas Chromatograph-Mass Spectrometer and a Varian workstation version 6.9.1 (Agilent Technologies). The gas chromatograph-mass spectrometer was fitted with a DB-5MS column (30 m 3 0.25 mm [length 3 i.d.], 0.25 mm [film]) using the following temperature program: 5 min at 50°C and 10°C per min to 280°C. Monoterpenes were identified by comparing their retention indices (van Den Dool and Kratz, 1963) and mass spectra with those reported in the literature (Adams, 1989) and with those of authentic standards (a-pinene, myrcene, and aand b-thujone [Sigma-Aldrich], sabinene [Indofine], and a-thujene [available from previous research]). The second setting employed an Agilent 6890A series gas chromatograph coupled with a 5973N mass spectrometer as described by Abbott et al. (2010). Monoterpenes were separated on a DB-WAX capillary column (J&W 122-7032; 30 m 3 0.25 mm [length 3 i.d.], 0.25 mm [film]) using the following program: 4 min at 40°C, increased by 3°C per min to 85°C, then 30°C per min to 250°C and held for 2.5 min.
To determine the storage location of monoterpenes in T. plicata scale leaves, 100 resin glands and 100 locations surrounding resin glands in the wild-type genotype were pierced with a sterile needle. Following penetration of tissue, needles were dipped into 0.5 mL of ethyl acetate to remove any foliar resin, then washed with 70% (v/v) ethanol and dried prior to piercing another part of tissue. Because of the volatile nature of monoterpenes, resin glands and surrounding tissue needles were extracted at different times and on different leaves to prevent cross contamination of samples.

Leaf Transcriptome Construction
RNA was extracted from 1 g foliage of T. plicata as described by Kolosova et al. (2004). Foliage was harvested at midday from the top one-third of 2-yearold wild-type and variant plants grown in a growth chamber at 20°C with daylength of 16 h and relative humidity of 50% to 70%. RNA from both seedlings (clones 274 and 878) was pooled together at an equal ratio and sent to Canada's Michael Smith Genome Sciences Centre for construction of a cDNA library and paired-end sequencing on an Illumina Genome Analyzer IIx with an average read length of 75 bp.
The Illumina sequencing reads were assembled into a transcriptome library using the DNASTAR SeqMan Pro NGen software package. Reads were first screened to remove expected Illumina paired-end adapter and primer sequences and then de novo assembled with a mer length of 21 bp and an expected match rate of 85%. Contigs with fewer than 10 reads were removed from the final transcriptome. BLASTX was used to compare sequences with the NCBI nonredundant database. Only contigs having a BLASTX match with E values of 1e -3 or less were further annotated and used as a template for Illumina tag profiling. This high stringency value was chosen to reduce the chance of incorrect annotations. BLAST2GO Götz et al., 2008) was used to determine functional gene ontology annotations using default parameters.

Illumina Tag Profiling
RNA from both wild-type and variant seedling was used as template for oligo(dT)-primed cDNA library synthesis. Double-stranded cDNA were generated using the Illumina Tag Profiling DpnII Sample Prep Kit and sequenced on an Illumina Genome Analyzer IIx machine with an average read length of 35 bp by Canada's Michael Smith Genome Sciences Centre. The above transcriptome was used as the template for aligning tags using DNASTAR NGen2 software. Each tag library was analyzed individually against the template transcriptome with alignment accuracy set at 90% for 17 bp mer matches, which allowed one single-nucleotide polymorphism between the template and tag. Expression patterns between the wild type and the variant were compared by calculating the number of tags aligned to each contig in the transcriptome. Expression between the two tag profiling libraries was normalized based on the ratio between the total amounts of tags in each library.

RT-qPCR
Primers listed in Supplemental Table S3 were used to amplify gene sequences that aligned closely to the major tag sites. Each set of primers was tested on cDNA from three wild-type and three variant plants with three technical replicates per sample using the Dynamo Flash SYBR Green RT-qPCR Kit (Finnzymes). In addition to the 20 candidate genes tested with RT-qPCR, the geometric mean C T values for three housekeeping genes with BLASTX results matching an elongation factor, a polyubiquitin, and an actin-like protein were chosen from the transcriptome to provide values for normalizing RT-qPCR results. Relative transcript abundance was determined by the ΔCT method, and expression differences between the wild-type (WFP1065) and variant plants (274) were determined by the 2 2ΔΔCT algorithm (Tsai et al., 2006). Statistical analysis was conducted using JMP 8 (SAS Institute).

Selection of Candidate Thujone Biosynthesis Genes
We targeted contigs with BLASTX similarities to terpene synthases, cytochrome P450s, alcohol dehydrogenases, and aldo-keto reductases. Within the terpene synthase family, we examined contigs for their conserved plastid targeting sequences, DDXXD motifs, and RRX 8 W motifs for the presence or absence of the conserved regions to classify the contigs as mono-, sesqui-, or diterpene synthases as described previously (Keeling and Bohlmann, 2006;Chen et al., 2011). Predicted protein sequences for each candidate gene were compared with known sequences from proteins in other conifers and plant species from NCBI GenBank. Phylogenetic trees for each candidate gene family were constructed using ClustalX.

RACE
RACE was performed on putative cytochrome P450 contig9309 and con-tig33366 to acquire the missing 59 terminus by the use of the RLM-RACE kit (Ambion). Two cDNA libraries were generated using SuperScript III with gene-specific primers GSP9309-AAATCTAAATCTACTTTATGTTATGTT and GSP33366-ACAGTGGAACTTCAAACAAC. Forward primers for contig9309 and contig33366 were used in combination with the 59 Ambion Inner RACE primer (forward, 9399-GCATGGCAACTAGGACGACA and forward, 33366-GGCGAGAGTGATGGGAACAT). PCR was conducted using Phusion polymerase (Finnzymes). PCR products were then gel purified using the QIAquick Gel Extraction Kit (Qiagen). Purified PCR products were cloned into vector ptz57RT using the InsTAclone PCR Cloning Kit (Fermentas). Positive colonies were sequenced using standard M13/pUC forward and M13/pUC reverse sequencing primers (Macrogen).

Localized Expression of a Monoterpene Synthase by in Situ Hybridization
Contig788 was cloned from cDNA by PCR with Phusion polymerase (Finnzymes) and primers CCAATGGCTCTTTTCTCTGC and CCAATGGAA-TAGGTTCTAGTAGGATTTG to generate a 1,814-bp product. The PCR product was blunt-end ligated into pBluescript SK-using T4 DNA ligase (Fermentas). Sense and antisense probes were synthesized for each contig by T7 RNA polymerase using digoxygenin (DIG)-labeled UTP (Brewer et al., 2006). Concentration of hybridization probes was determined using the DIG Nucleic Acid Detection Kit (Roche). Leaves were dehydrated, fixed, and embedded in paraffin wax and sectioned following the technique of Brewer et al. (2006). In situ hybridization was performed according to Floyd and Bowman (2006), except that 300 mL per slide pair of Western Blue Stabilized Substrate (Promega) for alkaline phosphatase was used for detection. The staining procedure was stopped after 30 to 60 min by dipping slides into Tris-EDTA buffer and rinsing twice for 3 min each. Slides were dried and pictures were taken in a Nikon E600 microscope using differential interference contrast optics.

Functional Characterization of a Monoterpene Synthase
Expression of recombinant terpene synthase enzyme and functional characterization followed the general procedures described by Martin et al. (2004). Primers GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGCTCTTTT-CTCTGCT and GGGGACCACTTTGTACAAGAAAGCTGGGTAAAATTA-CAATGGAATAGG were designed with attB1 sites to clone the full translated region of contig788 into the gateway vector pDONR/Zeo (Invitrogen). PCR was conducted using Phusion polymerase, and amplicons were cloned into pDONR/Zeo using BP Clonase II (Invitrogen) following the manufacturer's instructions. The fragment was thereafter recombined into the pDEST17 expression vector in frame with the 63 poly-His sequence using LR Clonase II (Invitrogen). Positive clones were screened by sequencing using forward and reverse universal primers (Macrogen). Recombinant protein expression and partial purification were performed as described by Keeling et al. (2008), and assays were performed as previously described (O'Maille et al., 2004;Keeling et al., 2008;Hall et al., 2011).

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Top 10 Gene Ontology term assignments.
Supplemental Figure S6. Nucleic acid and predicted amino acid sequence for contig9309/CYP805A1.
Supplemental Figure S7. Nucleic acid and predicted amino acid sequence for contig33366/CYP720B18.
Supplemental Figure S8. Nucleic acid and predicted amino acid sequence for contig1534/CYP750B1.
Supplemental Figure S9. Nucleic acid and predicted amino acid sequence for contig4118/dehydrogenase.
Supplemental Figure S10. Nucleic acid and predicted amino acid sequence for contig44628/reductase.
Supplemental Table S1. List of contigs with 25-fold or higher transcript abundance in the wild-type tag profiling data set than in the variant data set.
Supplemental Table S2. Annotated function of predicted polypeptides.
Supplemental Table S3. Primers used in qPCR validation of tag profiling.