Acquisition, conservation, and loss of dual-targeted proteins in land plants.

Summary: The dual-targeting ability of proteins arose early in evolution, and although it is conserved in many cases, acquisition and loss still occur. The dual-targeting ability of a variety of proteins from Physcomitrella patens, rice (Oryza sativa), and Arabidopsis (Arabidopsis thaliana) was tested to determine when dual targeting arose and to what extent it was conserved in land plants. Overall, the targeting ability of over 80 different proteins from rice and P. patens, representing 42 dual-targeted proteins in Arabidopsis, was tested. We found that dual targeting arose early in land plant evolution, as it was evident in many cases with P. patens proteins that were conserved in rice and Arabidopsis. Furthermore, we found that the acquisition of dual-targeting ability is still occurring, evident in P. patens as well as rice and Arabidopsis. The loss of dual-targeting ability appears to be rare, but does occur. Ascorbate peroxidase represents such an example. After gene duplication in rice, individual genes encode proteins that are targeted to a single organelle. Although we found that dual targeting was generally conserved, the ability to detect dual-targeted proteins differed depending on the cell types used. Furthermore, it appears that small changes in the targeting signal can result in a loss (or gain) of dual-targeting ability. Overall, examination of the targeting signals within this study did not reveal any clear patterns that would predict dual-targeting ability. The acquisition of dual-targeting ability also appears to be coordinated between proteins. Mitochondrial intermembrane space import and assembly protein40, a protein involved in oxidative folding in mitochondria and peroxisomes, provides an example where acquisition of dual targeting is accompanied by the dual targeting of substrate proteins.


Introduction
Gene transfer to the host nucleus followed the endosymbiotic events that led to the formation of mitochondria and plastids in plant cells (Adams et al., 2000; Dyall et al., 2004; Kleine et al., 2009; Keeling, 2010). This resulted in a reduction of the coding capacity of these organelles to ~5% of the original endosymbiont's genome (Pfannschmidt, 2010), and therefore the majority of organellar proteins are encoded in the nucleus, translated in the cytosol and imported into their respective organelles. This process of protein targeting required that new machinery, not present in the original endosymbionts, be acquired to specifically recognize and translocate thousands of proteins across their respective organelle membranes (Dolezal et al., 2006). Studies into mitochondrial and plastid protein import revealed that targeting and import are specific for each organelle (Rudhe et al., 2002; Glaser and Whelan, 2007).
This specificity is believed to be due to a number of factors: the nature of the targeting signals, the presence of cytosolic 'targeting' factors and the presence of protein receptors on the organelle surface, all of which contribute to maintaining the specificity of protein import (Chew and Whelan, 2004). The molecular mechanisms by which these features maintain specificity are largely unknown, and are further complicated by the growing number of proteins identified that can be targeted to multiple organelles.
The initial report that glutathione reductase from Pisum sativum was targeted to both mitochondria and plastids revealed that targeting could occur to two organelles and that protein targeting was not location specific (Creissen et al., 1995). Since this initial report, 107 proteins have been reported to be dual-targeted to mitochondria and plastids in a variety of plants (Carrie et al., 2009a; Carrie and Small, 2012). The dual targeting of proteins can occur by a variety of mechanisms, such as ambiguous targeting signals, where a single targeting signal has the ability to target a protein to two distinct locations, or via alternative transcription/translation to produce altered targeting signals for each respective organelle (Peeters and Small, 2001; Yogev and Pines, 2011). Dual targeting by ambiguous signals is of particular interest as it questions how two distinct organelle import machineries can recognize these ambiguous signals and yet distinguish between organelle-specific signals. In addition to dual targeting to mitochondria and plastids, dual targeting of proteins to mitochondria and peroxisomes has also been reported (Carrie et al., 2008; Carrie et al., 2009b). The mechanism of targeting differs in that proteins dual-targeted to mitochondria and peroxisomes contain two signals, which result in the same protein being imported into two distinct locations.
https://plantphysiol.org Downloaded on December 30, 2020. -Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.
To date, 72 proteins in Arabidopsis thaliana have been shown to be dual-targeted (Carrie et al., 2009a; Carrie and Small, 2012), but as many as 500 are predicted to be dual-targeted because they contain ambiguous signals (Mitschke et al., 2009). The necessity for dual targeting remains largely unknown, though it has been suggested that it may be related to gene copy number and restriction of genome size (Morgante et al., 2009), or required for the co-ordination of organelle function (Chew et al., 2003; Carrie et al., 2009a). There is only a limited amount of information available regarding the extent of dual targeting of orthologous proteins between species. Dual targeting of four proteins (methionine aminopeptidase, monodehydroascorbate reductase, glutamyl-tRNA synthetase and tyrosyl-tRNA synthetase) has been shown to be conserved in Oryza sativa (rice) and Arabidopsis (Morgante et al., 2009). Organellar seryl-tRNA synthetase is dual-targeted in Arabidopsis and Zea mays (maize) (Rokov-Plavec et al., 2008). Additionally, MutS HOMOLOG 1 was found to be dual-targeted in a number of dicot plants (Xu et al., 2011).
This study was undertaken to gain better insight into the dual targeting of proteins to mitochondria and plastids, or to mitochondria and peroxisomes, by asking: i) when did dual targeting arise in plant evolution? ii) is dual targeting of proteins conserved? iii) is acquisition or loss of dual targeting of proteins still occurring? and iv) is dual targeting of proteins associated with gain of function in organelles? To address these questions, we determined whether proteins that are dual-targeted in Arabidopsis were also dual-targeted in rice and Physcomitrella patens. Furthermore, where differences were observed in the dual targeting ability of proteins from these three species, the targeting ability of Picea glauca orthologs was additionally investigated, thus examining the dual targeting ability of proteins spanning 500 million years of land plant evolution. Overall, these questions were posed to gain a better understanding of the purpose of dual targeting of proteins.

Results
In land plants, 107 proteins have been experimentally reported to be dual-targeted to mitochondria and plastids or mitochondria and peroxisomes (Supplemental Tables S1 and S2) (Carrie et al., 2009a; Carrie and Small, 2012).
In order to determine when dual targeting arose in land plant evolution and whether dual targeting ability is conserved, we investigated whether the orthologs of some dual-targeted proteins in Arabidopsis were also dual-targeted in Physcomitrella and rice. As Physcomitrella diverged from Arabidopsis ~450 million years ago (Figure 1A), it represents the earliest land plant with a complete genome sequence, allowing identification of all orthologs within gene families of Arabidopsis dual-targeted proteins. It should be noted that for many dual-targeted proteins, location-specific orthologs also exist, so identification of all gene family members is required to ensure that all orthologous proteins are identified and tested for targeting ability. The orthologs from Chlamydomonas reinhardtii (Merchant et al., 2007) and Chlorella variabilis (Blanc et al., 2010) were also identified and included in the phylogenetic analysis to obtain a more robust characterization of orthologs.
Therefore the ortholog that showed the highest level of sequence identity and similarity to the Arabidopsis dual-targeted protein was defined as the closest ortholog and its targeting ability was tested (Supplemental Dataset S1). The targeting of Physcomitrella proteins was tested in Arabidopsis cell suspensions and onion epidermal cells, in addition to Physcomitrella protonemal tissues, as there have been no previous reports of targeting Physcomitrella proteins in non-homologous plant tissues. The mitochondrial targeting signals from alternative oxidase (AOX) and the alpha subunit of the mitochondrial processing peptidase (MPP alpha), the plastid targeting signals of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (SSU Rubisco) and Photosystem I (PS I) subunit 2, and the peroxisomal targeting signals from thiolase and malate synthase were tested in Arabidopsis cell suspensions, onion epidermal cells and Physcomitrella tissue to define the fluorescence characteristics of these organelles in the various tissues tested (Figure 1B). Additionally, this demonstrated that the mitochondrial, plastid and peroxisomal RFP markers previously used in Arabidopsis to define these organelles can also be used with Physcomitrella (Carrie et al., 2009b).
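The ortholog-selection rule described above (take the family member with the highest sequence identity to the Arabidopsis dual-targeted protein) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the sequences have already been aligned, it scores identity only (the text also considers similarity), and the sequences and the names `cand1` and `cand2` are hypothetical toy data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a column counts as
    a match only when both residues are identical and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def closest_ortholog(query: str, candidates: dict) -> str:
    """Return the name of the candidate with the highest identity to query."""
    return max(candidates, key=lambda name: percent_identity(query, candidates[name]))


# Hypothetical toy alignment: one query and two candidate family members.
query = "MALSRTA-KLV"
candidates = {
    "cand1": "MALSRSAGKLV",
    "cand2": "MTLQRPA-RLI",
}
best = closest_ortholog(query, candidates)
```

In practice the identity values would come from a full-length alignment of every gene family member, which is why the text stresses identifying all family members first.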

Dual targeting arose early and is conserved during land plant evolution
A number of orthologs of dual-targeted proteins in Arabidopsis were also found to be dual-targeted in rice and Physcomitrella. DNA Topoisomerase represents an example of a protein that is dual-targeted to mitochondria and plastids in Arabidopsis (Carrie et al., 2009b) and was similarly found to be dual-targeted to mitochondria and plastids from rice and Physcomitrella (Figure 2). Analysis of Topoisomerase genes reveals that there are three in Arabidopsis, four in rice and five in Physcomitrella (Figure 2A). While the targeting predictions of the proteins differ using different prediction programs (Figure 2B), it was apparent that Topoisomerases were dual-targeted in all three species when tested by biolistic transformation (Figure 2C). Noticeably, while dual targeting of the rice Topoisomerase was readily apparent in Arabidopsis cell suspensions and onion epidermal cells, dual targeting of the Physcomitrella Topoisomerase was only apparent in onion epidermal cells and in Physcomitrella protonemal tissues, where the mitochondria were elongated and cylinder-like in shape (Figure 2C). This elongated shape was similar to that observed with the control mitochondrial markers (Figure 1B). Thus, while dual targeting of Topoisomerases was conserved, the dual targeting of the Physcomitrella Topoisomerase was not observed in all cell types tested.
Analysis of a large number of other proteins revealed that while many were observed to be dual-targeted, the ability for dual targeting differed between cell types. In the case of DNA helicase (Figure 3), previously shown to be dual-targeted from Arabidopsis (Carrie et al., 2009b), the Physcomitrella ortholog tested (PpHel1) and the only rice ortholog (OsHel) were also determined to be dual-targeted in all tissues tested (Figure 3). Thus it differed from the Physcomitrella Topoisomerase, which was not observed to be dual-targeted in Arabidopsis cell suspensions. However, in the case of DNA Polymerase (Figure 4), whilst the two Arabidopsis orthologs (AtPolgamma 1 and a second AtPolgamma isoform) have been reported to be targeted to mitochondria and plastids (Carrie et al., 2009b), the Physcomitrella orthologs showed a different pattern (Figure 4). Of the four Physcomitrella DNA polymerase orthologs, three (PpPol1, PpPol2 and PpPol3) branched with the Arabidopsis proteins. However, PpPol1 was only targeted to plastids in all tissues tested (Figure 4C). In contrast, whilst PpPol2 was predominantly targeted to mitochondria in Arabidopsis cell suspensions, plastid and mitochondrial targeting was evident in onion epidermal cells (Figure 4C). In Physcomitrella protonemal tissues, as in Arabidopsis cell suspensions, mitochondrial targeting was dominant with plastid targeting only weakly observed (Figure 4C).
Whilst only some examples have been discussed above, overall in the analysis of 38 Arabidopsis proteins that were previously published to be dual-targeted to mitochondria and plastids, in 28 out of 38 cases rice contains an ortholog that was dual-targeted (Supplemental Figure S1; Supplemental Table S1, shaded in green and yellow), and in 9 out of 13 cases the orthologs from both rice and Physcomitrella were shown to be dual-targeted in the tissues tested (Supplemental Table S1; Supplemental Figure S1, shaded in green).
The dual targeting of proteins to mitochondria and peroxisomes or plastids and peroxisomes has also been reported, with 11 cases identified to date (Supplemental Table S2). The dual targeting of orthologs from rice and [...]
Overall, in the analysis of four Arabidopsis proteins that were dual-targeted to mitochondria and peroxisomes or plastids and peroxisomes, three out of four orthologs from rice tested in Arabidopsis cell suspensions and onion were dual-targeted (Supplemental Table S2), and two out of three proteins from rice and Physcomitrella were dual-targeted to mitochondria and peroxisomes or plastids and peroxisomes (Supplemental Table S2).

Dual targeting is gained and lost during land plant evolution
While a variety of proteins examined were observed to be dual-targeted, Physcomitrella DNA polymerase hinted that targeting to one organelle may be stronger or more efficient than to another (Figure 4), and that this may be associated with gene duplication. Therefore a detailed study was carried out investigating the dual targeting ability of a number of proteins that are known to be encoded by small gene families. Enzymes of the ascorbate-glutathione reductase cycle were chosen, as glutathione reductase (GR), ascorbate peroxidase (APX) and monodehydroascorbate reductase (MDHAR) contain orthologs that are dual-targeted in Arabidopsis and are part of multi-gene families (Figure 5A) (Chew et al., 2003). GR orthologs from rice (OsGR1) and Physcomitrella (PpGR1) were similarly observed to be dual-targeted in Arabidopsis, onion and Physcomitrella tissue (Figure 5D). In the case of APX, four genes encode APX in Physcomitrella, five in Picea, seven in Arabidopsis and eight in rice (Figure 6A). The Arabidopsis APX designated as a plastid stromal isoform (AtSAPX) has been previously reported to be dual-targeted (Chew et al., 2003), and this was confirmed in this study (Figure 6). However, examination of Physcomitrella APX1 showed it was only targeted to plastids in all three tissue types tested (Figure 6B). Examination of the rice APX proteins OsAPX5, OsAPX6, OsAPX7 and OsAPX8, which grouped with and displayed the highest sequence identity to AtSAPX (Figure 6A & C), showed that OsAPX5 and OsAPX6 were targeted to mitochondria, while OsAPX7 and OsAPX8 were targeted to plastids (Figure 6B). To gain insight into when dual targeting of APX was gained (or lost), the targeting of the most orthologous APX from Picea, PgAPX1, was examined and found to be dual-targeted (Figure 6). Thus it is proposed that the dual targeting ability of APX arose following the split between Physcomitrella and Picea and was subsequently lost in rice following monocot divergence.
In rice, gene duplication resulted in two genes, each encoding organelle-specific proteins (Figure 6A). The gene family of MDHAR has multiple members, with four genes in Physcomitrella, three in Picea, five in Arabidopsis and six in rice (Figure 7A). AtMDHAR6, previously shown to be dual-targeted (Chew et al., 2003), has two orthologs in Physcomitrella, one of which showed targeting to both mitochondria and plastids. As MDHAR dual-targeted orthologs could be detected in all four species, including Picea (Figure 7), it suggests that, as with GR, dual targeting ability arose early in land plant evolution and has been conserved. However, as PpMDHAR1 and OsMDHAR1.2 are not dual-targeted, it also suggests that dual targeting of these isoforms may have been lost following gene duplication in these organisms, so that only a single dual-targeted isoform is conserved in each organism.
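The gain and loss reasoning for APX and MDHAR can be framed as a small parsimony problem. The sketch below is an illustration and not the authors' method: it applies Fitch parsimony to a presence/absence coding of dual targeting on an assumed four-species topology, (Physcomitrella, (Picea, (rice, Arabidopsis))). The tree encoding, the function name `fitch` and the coding of rice APX as "not dual-targeted" (because each rice gene product is targeted to a single organelle) are assumptions made for this example.

```python
# Assumed species topology as nested pairs; leaves are species names.
TREE = ("Physcomitrella", ("Picea", ("rice", "Arabidopsis")))


def fitch(tree, leaf_states):
    """Bottom-up pass of Fitch parsimony for a binary character.

    Returns (possible states at this node, minimum number of state changes
    in the subtree below it).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_states)
    right_set, right_changes = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: one change is required on one of the two branches.
    return left_set | right_set, left_changes + right_changes + 1


# True = a dual-targeted ortholog was observed, False = none observed.
apx = {"Physcomitrella": False, "Picea": True, "rice": False, "Arabidopsis": True}
mdhar = {"Physcomitrella": True, "Picea": True, "rice": True, "Arabidopsis": True}

apx_root_states, apx_changes = fitch(TREE, apx)
mdhar_root_states, mdhar_changes = fitch(TREE, mdhar)
```

Under these assumptions APX requires at least two changes, which is compatible with the gain-then-loss scenario proposed in the text, although parsimony alone cannot exclude two independent losses. MDHAR requires none, consistent with early origin and conservation.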
Hexokinase proteins represent another example of the acquisition of dual targeting ability throughout land plant evolution. Whilst the targeting of Arabidopsis orthologs has not previously been studied, a recent study (Nilsson et al., 2011) in Physcomitrella identified isoforms that were dual-targeted to mitochondria and plastids. In this study, of the 11 genes encoding hexokinase (PpHXK1-11) (Figure 8A), at least six isoforms are dual-targeted (Figure 9). Compared to the previous study (Nilsson et al., 2011), some differences were observed, in that PpHXK5 and PpHXK9 were found to be dual-targeted whilst PpHXK8 was not (Figure 8B). However, none of the four hexokinase orthologs from Arabidopsis, AtHXK1-4 (or the one hexokinase-like protein, AtHKL1), displayed dual targeting ability, with four orthologs targeted to mitochondria and one to plastids (Figure 8C). Phylogenetic analysis of the dual-targeted hexokinase isoforms from Physcomitrella reveals that five out of the six isoforms branch together (Figure 8A), suggesting that a single gene in Physcomitrella encoding a dual-targeted hexokinase underwent gene duplication, as all proteins in this group display dual targeting ability (Figure 8D). On the other hand, Hexokinase 5 (PpHXK5) also displays dual targeting ability (Figure 8D), most likely acquired throughout evolution, as its closest orthologs (PpHXK1 & 6) (Figure 8A) are not dual-targeted (Figure 8D). As Physcomitrella hexokinases are more similar to each other than to [...]
[...] (PpMia40) was chosen to determine its subcellular location. GFP analysis of PpMia40 found that it did not target to either mitochondria or peroxisomes and instead appeared to reside in the cytosol, as evidenced by a diffuse GFP signal in all tissue types tested (Figure 9C).
Whilst the lack of peroxisomal targeting is expected due to the lack of a PTS1 sequence at the C-terminus, the lack of mitochondrial targeting is surprising considering all other known Mia40 proteins have been shown to be located within mitochondria. In addition, the PpMia40 sequence was observed to contain the conserved cysteine residues and is highly similar to the Arabidopsis Mia40 in the enzymatically active domain. PpMia40 is missing 28 amino acids from the N-terminal end when compared to Arabidopsis Mia40 (Supplemental Figure S3).
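The point about a missing PTS1 can be made concrete with a short check of the C-terminal tripeptide. This is a deliberately simplified sketch: it assumes the commonly cited PTS1 consensus of SKL and conservative variants, written here as [SAC][KRH][LM], whereas real PTS1 recognition also depends on upstream residues. The function name and both example sequences are hypothetical and are not the PpMia40 or Arabidopsis Mia40 sequences.

```python
import re

# Simplified PTS1 consensus (an assumption for illustration): the last three
# residues match S/A/C, then K/R/H, then L/M, with SKL as the canonical form.
PTS1_PATTERN = re.compile(r"[SAC][KRH][LM]$")


def has_pts1_like_cterminus(protein_sequence: str) -> bool:
    """Return True if the sequence ends in a PTS1-like tripeptide."""
    # Normalise case and drop a trailing '*' stop symbol if present.
    cleaned = protein_sequence.upper().rstrip("*")
    return bool(PTS1_PATTERN.search(cleaned))


# Hypothetical toy sequences, not real proteins.
with_pts1 = "MAAGTEQPLSKL"
without_pts1 = "MAAGTEQPLDEV"

results = {
    "with_pts1": has_pts1_like_cterminus(with_pts1),
    "without_pts1": has_pts1_like_cterminus(without_pts1),
}
```

A negative result from such a check only predicts the absence of a PTS1; as in the text, the location still has to be confirmed experimentally, for example by GFP tagging.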
In an attempt to deduce the targeting ability of Mia40 over a wider range of

Discussion
This study used GFP tagging of proteins from several land plants to determine when dual targeting of proteins arose in land plant evolution, and if it was conserved. While it is desirable to use a variety of approaches to determine the location of a protein (Millar et al., 2009), for the analysis of 96 proteins from Arabidopsis, rice, Physcomitrella and three from Picea, GFP tagging is the only realistic approach to determine targeting ability. The use of other approaches to determine the presence of a protein, either by immunodetection or mass spectrometry, was not feasible due to large gene families and thus the requirement for isoform-specific antibodies. Mass spectrometric approaches are only feasible for highly purified organelles and may not identify proteins that are relatively low in abundance. Even in Arabidopsis, the most intensively studied plant in subcellular proteomics, only 13 of the 72 dual-targeted proteins that have been determined by GFP tagging were also detected in two organelles via proteomic studies (Heazlewood et al., 2007). The in vivo targeting assay using GFP tagging offers a realistic approach to assess targeting because, if targeting to one organelle (mitochondria, plastids or peroxisomes) is observed, it indicates that the protein is in an import-competent state.
It was observed that in many cases, dual targeting was conserved. If dual targeting ability arose early in evolution, it remained conserved from Physcomitrella to Arabidopsis and rice, as 11 out of 16 tested proteins were confirmed to be dual-targeted in all three species (Supplemental Tables S1 and S2, shaded in green). Similarly, if dual targeting arose later in plant evolution, it remained conserved, with two out of five tested proteins confirmed to be dual-targeted from rice and Arabidopsis. Loss of dual targeting could be concluded with confidence, as was observed with APX, where the loss of dual targeting in rice was accompanied by gene duplication followed by neo-functionalization, in that the duplicated genes encoded proteins that were targeted to single locations. A similar scenario also appears to have occurred with MDHAR isoforms in rice and Physcomitrella. It could also be observed that dual targeting ability is being acquired, as a number of proteins (seven) are dual-targeted from Arabidopsis alone. The dual targeting ability of Hexokinases from Physcomitrella is likely to be a derived feature, rather than one that was lost from Arabidopsis.
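The conservation figures quoted above can be cross-checked against the per-category counts given in the Results. The short sketch below is simple arithmetic on numbers stated in the text; the assumption is that the "11 out of 16" figure combines the mitochondria/plastid set (9 of 13) with the peroxisome-associated set (2 of 3). The combined rice-only figure is derived here and is not quoted in the text.

```python
from fractions import Fraction

# (conserved, tested) counts as stated in the Results section.
rice_only = {"mito_plastid": (28, 38), "peroxisome": (3, 4)}
rice_and_physcomitrella = {"mito_plastid": (9, 13), "peroxisome": (2, 3)}


def combine(counts):
    """Sum conserved and tested counts across targeting categories."""
    conserved = sum(c for c, _ in counts.values())
    tested = sum(t for _, t in counts.values())
    return conserved, tested


def percent(conserved, tested):
    """Conserved fraction as a percentage, rounded to two decimals."""
    return round(float(Fraction(conserved, tested) * 100), 2)


all_three_species = combine(rice_and_physcomitrella)   # expected to match 11 of 16
rice_combined = combine(rice_only)                     # derived, not quoted in the text

all_three_pct = percent(*all_three_species)
rice_pct = percent(*rice_combined)
```

The rice total of 42 tested proteins matches the number of Arabidopsis dual-targeted proteins given in the Summary, which supports reading the two supplemental tables as a single pooled data set.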
While dual targeting of proteins was conserved in many cases, differences in targeting ability of proteins to mitochondria and plastids were observed with the different tissues/cells used, e.g. Physcomitrella DNA Polymerase 2. This does not appear to be co-ordinated with differences in the mitochondrial protein import apparatus of Physcomitrella, rice and Arabidopsis. The outer mitochondrial membrane protein import receptor 64 (OM64), which is derived from an ancestral gene encoding a plastid outer envelope protein, is only present in rice and Arabidopsis (Chew et al., 2004; Carrie et al., 2010a), yet dual targeting occurs in Physcomitrella. The differences observed may be due to a variety of reasons. Firstly, there are isoforms of the protein import components present in plastids and mitochondria (Soll and Schleiff, 2004; Lister et al., 2007; Jarvis, 2008), and for plastids it has been proposed that these different isoforms may import different sets of proteins. Thus the difference in import between systems may reflect the different abundance of various isoforms in various cells, and/or the fact that there is co-evolution or specialization of precursor proteins to bind to specific isoforms of protein import components. An analysis of the Tom20 import family of proteins in Arabidopsis suggested that different Tom20 isoforms may display some preference or differences for binding different precursor proteins (Lister et al., 2007; Duncan et al., 2012).
Furthermore, although targeting signals are generally considered to be well conserved across wide phylogenetic gaps, an analysis of mitochondrial targeting signals from rice and Arabidopsis revealed differences in length and amino acid composition, suggesting that subtle differences between species are likely and may affect the efficiency of targeting in different cell types (Huang et al., 2009). Another reason for the differences between the various cells tested is that even within a single species the extent of dual targeting varies in cells from different tissues (Carrie et al., 2009b). Finally, it has previously been reported that while some proteins are dual-targeted, targeting to only a single organelle may be observed within any single cell (Beardslee et al., 2002), although the reason(s) for this are unknown. Thus several factors may contribute to the variation in dual targeting observed for some proteins between the various systems.
Analysis of the targeting signals of dual-targeted proteins does not reveal any specific motifs or residues that are associated with dual targeting.
In this study, protein isoforms that display very high levels of sequence similarity were observed to differ in dual targeting ability. In the case of rice MDHAR, one splicing isoform, OsMDHAR1.1, was dual-targeted (Figure 7), yet another differing by only four amino acids, OsMDHAR1.2, was not dual-targeted (Figure 7; Supplemental Figure S4). With Physcomitrella MDHAR1 or 2, four amino acid differences in the predicted targeting region result in [...]
One of the incentives for this study was to gain a better understanding of the purpose of dual targeting. As outlined above, dual targeting appears to be conserved once it arises, and is thus under positive selection to be maintained. However, the functional characterization of several dual-targeted proteins has largely concluded that the effects of inactivation of genes [...]
The evolutionary history of the targeting of Mia40 makes an interesting case. Mia40 is believed to be found in most eukaryotic species but is best studied in yeast model systems (Chacinska et al., 2004). In yeast, Mia40 is located within the mitochondrial intermembrane space, where it plays an essential role in the oxidation and folding of intermembrane space proteins (Chacinska et al., 2004). It was originally thought that this was the role of [...] that a mia40 knockout is not lethal and that it was also targeted to peroxisomes (Carrie et al., 2010b). The role for Arabidopsis Mia40 is thought to involve the oxidation and folding of both the mitochondrial and peroxisomal

Bioinformatic Analyses
The tree showing the evolutionary relationship between the six plant species used in this study was constructed from their times of divergence (in millions of years) as reported in previous studies (Bowman et al., 2007; Rensing et al., 2008; Carrie et al., 2010b).
Protein sequences of all published Arabidopsis dual-targeted proteins (Supplemental Table S1

RNA extraction from Arabidopsis thaliana, Oryza sativa and Physcomitrella patens was carried out using the RNeasy kit (Qiagen, Melbourne) according to the manufacturer's instructions. Picea glauca RNA was obtained from Dr Olivier Keech (Umea Plant Science Center). Reverse transcription was carried out using the SuperScript III First-Strand Synthesis System (Invitrogen, Sydney). The translational start sites for all Picea glauca genes were confirmed by 5'-RACE using the CapFishing full-length cDNA premix kit (Seegene, South Korea). Full-length cDNA was amplified using gene-specific primers flanked by Gateway recombination cassettes (see Supplemental Table S3) according to the manufacturer's instructions. A number of genes (see Supplemental Table S3)

Volvox carteri, Chlorella variabilis, Physcomitrella patens and Picea glauca Mia40 is not predicted to be targeted to any organelle, and this was confirmed for Physcomitrella in this study (Figure 9). This is accompanied by either the