TILLING in Lotus japonicus Identified Large Allelic Series for Symbiosis Genes and Revealed a Bias in Functionally Defective Ethyl Methanesulfonate Alleles toward Glycine Replacements

We have established tools for forward and reverse genetic analysis of the legume Lotus (Lotus japonicus). A structured population of M2 progeny of 4,904 ethyl methanesulfonate-mutagenized M1 embryos is available for single nucleotide polymorphism mutation detection, using a TILLING (for Targeting Induced Local Lesions IN Genomes) protocol. Scanning subsets of this population, we identified a mutation load of one per 502 kb of amplified fragment. Moreover, we observed a 1:10 ratio between homozygous and heterozygous mutations in the M2 progeny. This reveals a clear difference in germline genetics between Lotus and Arabidopsis (Arabidopsis thaliana). In addition, we assembled M2 siblings with obvious phenotypes in overall development, starch accumulation, or nitrogen-fixing root nodule symbiosis in three thematic subpopulations. By screening the nodulation-defective population of M2 individuals for mutations in a set of 12 genes known to be essential for nodule development, we identified large allelic series for each gene, generating a unique data set that combines genotypic and phenotypic information facilitating structure-function studies. This analysis revealed a significant bias for replacements of glycine (Gly) residues in functionally defective alleles, which may be explained by the exceptional structural features of Gly. Gly allows the peptide chain to adopt conformations that are no longer possible after amino acid replacement. This previously unrecognized vulnerability of proteins at Gly residues could be used for the improvement of algorithms that are designed to predict the deleterious nature of single nucleotide polymorphism mutations. Our results demonstrate the power, as well as the limitations, of ethyl methanesulfonate mutagenesis for forward and reverse genetic studies. (Original mutant phenotypes can be accessed at http://data.jic.bbsrc.ac.uk/cgi-bin/lotusjaponicus. Access to the Lotus TILLING facility can be obtained through http://www.lotusjaponicus.org or http://revgenuk.jic.ac.uk.)

Whole genome sequencing and transcriptome analysis have provided in-depth descriptions of the physical structure and the repertoire of gene expression in a growing number of eukaryotic organisms. However, to reveal the functions of individual genes, genetic approaches will remain of paramount importance. Forward genetics aims to identify the causative genetic change in a phenotypically interesting mutant (i.e. mutant first). In contrast, reverse genetics intends to assign a function to a gene of known sequence through phenotypic analysis of individuals in which the function of this gene is altered (i.e. gene sequence first). In higher plants, targeted gene disruption methods are not yet routine and alternative methods are required to obtain individuals in which the gene of interest is impaired. Random insertion mutagenesis using either T-DNA or transposons has been successfully used in Arabidopsis (Arabidopsis thaliana) to assemble mutant libraries, which cover the vast majority of the genes of this plant (http://signal.salk.edu). Moreover, silencing of genes of interest using RNA interference transgenesis has become a popular tool for reverse genetics (Mansoor et al., 2006). An alternative and well-established approach is chemical or fast-neutron mutagenesis followed by identification of mutants from a suitably sized population of mutagenized individuals. Advantages of chemical or physical mutagenesis are their applicability to organisms that are not easily transformable or in which active transposons have not been characterized and the ease of generating large independent mutant populations. Radiation typically induces deletions, which can be readily detected by PCR, using primers flanking the deletion (Li et al., 2001). However, their catastrophic nature limits the number of tolerated deletions per genome. Chemical mutagens, such as ethyl methanesulfonate (EMS), induce point mutations that have the advantage of being tolerated to a high density. This permits near saturation with a manageable number of mutant individuals (Henikoff et al., 2004). In addition, allelic series with single amino acid changes and specific phenotypes bear the potential for providing more detailed information on protein function.
Screening for point mutations without prior knowledge of the mutation is technically challenging. A technology based on mismatch recognition in heteroduplex DNA by endonucleases such as CEL I (Till et al., 2004a) provides a level of specificity that allows the detection of a single mutant allele in a pool of wild-type alleles (Colbert et al., 2001). When combined with chemical mutagenesis, the screening for mutant individuals in a large population using mismatch detection is referred to as TILLING (for Targeting Induced Local Lesions IN Genomes; McCallum et al., 2000; Comai and Henikoff, 2006). The TILLING process involves PCR amplification with fluorescently labeled primers from pooled DNA. Mismatched heteroduplexes are generated between wild-type and mutant DNA by melting and reannealing the PCR products. Heteroduplexes are incubated with the endonuclease CEL I that cleaves mismatched heteroduplex sites, and the resulting products are separated and visualized on sequencing gels or capillaries. Subsequent sequence analysis in heteroduplex regions of individual plant DNAs identifies the mutation. TILLING platforms have been established for a variety of plants, such as maize (Zea mays; Till et al., 2004b), wheat (Triticum aestivum; Slade et al., 2005), rice (Oryza sativa; Till et al., 2007), soybean (Glycine max; Cooper et al., 2008), pea (Pisum sativum; Dalmais et al., 2008), and Medicago truncatula (Le Signor et al., 2009). The technology has been adopted for reverse genetics in animals, including rat (Smits et al., 2004), zebrafish (Sood et al., 2006), Drosophila (Winkler et al., 2005), and Caenorhabditis (Gilchrist et al., 2006). Comai and Henikoff (2006) reviewed the TILLING process.
The legumes, including important crop plants like bean (Phaseolus vulgaris), soybean, pea, and lentil (Lens culinaris), exhibit biological traits that are of agricultural significance but that cannot be analyzed in the model plant Arabidopsis. For example, Arabidopsis does not engage in the ecologically important arbuscular mycorrhiza symbiosis with phosphate-delivering fungi, which is formed by more than 80% of land plants (Harrison, 2005). Moreover, the nitrogen-fixing root nodule symbiosis with rhizobia is almost exclusively found in the legume family (Downie and Oldroyd, 2008). In addition, certain aspects of flower symmetry (Dong et al., 2005) and carbon partitioning (Horst et al., 2007) are peculiar to legumes. To facilitate the genetic analysis of legume-specific biology, a reverse genetic tool for the model legume Lotus (Lotus japonicus) was established that utilizes the TILLING strategy (Perry et al., 2003). Since June 2003, the Lotus TILLING facility (Perry et al., 2003) has been available to the research community, and genes across many aspects of plant growth and development have been subjected to TILLING for 21 research groups covering 10 countries. Here, we describe the genetic resources currently available and provide a quantitative analysis of the distribution and frequency of mutations within our populations. Furthermore, we present allelic series of genes required for root nodule symbiosis obtained through TILLING of these populations.

General TILLING Population
Lotus mutant populations were generated in a continuous effort over 4 years. Following EMS mutagenesis of M1 seeds, the resulting M2 mutant population has been subdivided into several subpopulations according to phenotype. Since the initial description of the Lotus TILLING facility by Perry et al. (2003), the size and structure of the available populations have increased (Fig. 1). The largest subpopulation is the general TILLING population (GENPOP), which is intended primarily for scans for mutations in genes for which no a priori prediction of phenotypic consequence is possible or for which the thematic populations described below did not yield interesting hits. This population is composed of a single representative plant from each M2 family. To assemble the GENPOP, plants were chosen that did not exhibit any severe phenotypic impairment. This selection was to maximize the number of plants for which seeds could be obtained, since seed availability is the prerequisite for phenotypic analysis of lines carrying mutations of interest. The size of GENPOP was increased from 3,840 to 4,904 individuals (Fig. 1). We subjected GENPOP to TILLING with 84 fragments corresponding to 61 genes. For economic reasons, 53 gene fragments representing 36 genes were only tested on a subset of 2,304 plants (population 1). Upon customer request, or in cases where an insufficient quantity or quality of mutations was obtained from this subpopulation, we screened additional sets of 1,297 (population 2) or 1,303 (population 3) GENPOP plants. The total population of 4,904 plants was screened with 11 fragments representing 10 genes (Table I). Taking into account the number of plants screened for each of the fragments, in total we obtained 576 hits from 289 Mb of amplicons (Table I). On average, we obtained two mutations per 1 Mb screened, equivalent to a mutation load in GENPOP of one mutation per 502 kb. The hit frequencies observed in the three subpopulations are not directly comparable because they were obtained with different numbers of gene fragments (Table I). In comparison, the mutation load in the coding portion of the genome after EMS mutagenesis in the Seattle Arabidopsis TILLING project (ATP) was determined on a large data set of about 1,900 independent mutations in 192 genes to be about 1 per 300 kb screened (Greene et al., 2003).
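The mutation load quoted above follows directly from the totals in Table I. The short sketch below only restates that arithmetic; the function name is hypothetical and is not part of the TILLING pipeline.

```python
def mutation_load(hits: int, screened_mb: float) -> tuple[float, float]:
    """Return (mutations per Mb, kb of screened sequence per mutation)."""
    per_mb = hits / screened_mb
    kb_per_hit = screened_mb * 1000 / hits
    return per_mb, kb_per_hit


# Totals reported for GENPOP: 576 hits in 289 Mb of amplicons.
per_mb, kb_per_hit = mutation_load(hits=576, screened_mb=289)
print(f"{per_mb:.2f} mutations per Mb; one mutation per {kb_per_hit:.0f} kb")
```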

Germline Genetics in Lotus
We used a single M2 individual per family as the core unit of our GENPOP. Due to Mendelian segregation and the chimeric nature of the M1 embryo that was exposed to EMS, only a proportion of the mutations present in the germline of an M1 are represented in any M2 individual of the general population, the remainder being carried by other M2 siblings (Henikoff and Comai, 2003). To explain the segregation ratios in offspring of mutagenized embryos, the concept of the genetically effective cell number has been introduced by Li and Rédei (1969). The mechanistic interpretation of the original concept states that a genetically effective number of cells in the apical meristem of the embryo each gives rise to an independent sector in the mature plant.
Figure 1. Lotus populations available from forward screens and for TILLING. Screening M2 families over a period of 4 years generated the populations. For details, see text. A population of 80 ecotypes was assembled for the rapid survey of naturally occurring sequence polymorphisms in genes of interest using the TILLING protocol. This approach has been previously published as "EcoTILLING" using Arabidopsis ecotypes (Comai et al., 2004).
In the Seattle ATP population, a 2:1 segregation ratio of heterozygous versus homozygous mutants was observed after EMS mutagenesis of seeds (Greene et al., 2003). This suggests that the gametes formed in each individual flower are offspring of only a single mutagenized cell. As only one of the two haploid genomes of this cell carries a particular mutation, it will be theoretically distributed to 50% of the gametes, resulting in the observed segregation ratio (Supplemental Fig. S1). Table II shows the distribution of mutation types and zygosity in the GENPOP. It is apparent that in contrast to Arabidopsis, homozygous mutations are significantly underrepresented in our GENPOP, with a ratio of only one homozygous to 10 heterozygous mutations. Since we deliberately removed plants with obvious phenotypic effects from GENPOP, the homozygosity of detrimental alleles has been selected against during the assembly of the GENPOP. However, almost the same 1:10 ratio was observed for silent mutations (Table II), which makes the phenotypic screen an unlikely explanation for the observed ratio. Therefore, we conclude that Arabidopsis and Lotus differ in germline genetics. Our Lotus data are compatible with a model in which six types of gametes are produced within a single flower, one of which carries the mutation, and random mating at selfing within that single flower results in a 25:10:1 segregation ratio of homozygous wild type versus heterozygotes versus homozygous mutants (Supplemental Fig. S1). Because we observe homozygous mutants in the M2, these gametes contribute equally to the male and female germline. This could be achieved by a hypothetical cell cluster of at least three in the embryo at the time of mutagenesis and a model in which an individual flower consists of a mosaic originating from these cells. These would give rise to gametes originating in equal proportion from all three germline cells. Although our genetic data are consistent with this scenario, to our knowledge there is no histological evidence for this hypothesis.
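The segregation ratios discussed for the two species can be reproduced with a small calculation. The sketch below assumes, as in the model described in the text, that one of n equally frequent gamete types carries the mutation and that gametes combine at random on selfing; the function name is illustrative.

```python
from fractions import Fraction
from math import lcm


def m2_segregation(gamete_types: int) -> tuple[int, int, int]:
    """Integer ratio of wild type : heterozygous : homozygous mutant in the M2.

    Assumes one of `gamete_types` equally frequent gamete classes carries the
    mutation and that male and female gametes pair at random.
    """
    p = Fraction(1, gamete_types)   # frequency of the mutant gamete class
    q = 1 - p                       # frequency of all wild-type gametes
    freqs = (q * q, 2 * p * q, p * p)
    scale = lcm(*(f.denominator for f in freqs))
    return tuple(int(f * scale) for f in freqs)


print(m2_segregation(2))  # single mutagenized diploid cell (Arabidopsis model)
print(m2_segregation(6))  # three diploid cells, six gamete types (Lotus model)
```

Under these assumptions, two gamete types give the Mendelian 1:2:1 ratio and six gamete types give the 25:10:1 ratio, in which 25 of 36 M2 plants do not carry the mutation.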
This fundamental difference between Lotus and Arabidopsis has consequences for the mutation frequency that is carried through to the M2. While in Arabidopsis 25% of the mutations are lost due to Mendelian 1:2:1 segregation, in Lotus it is at least 69.4% (25 out of 36) that are not transmitted. If we take this dilution into account when calculating the initial hit rate in the M1, we obtain one hit in 154 kb (502 kb × 30.6%), which is higher than the calculated frequency in Arabidopsis (300 kb × 75% = 225 kb; Greene et al., 2003). Considering that the haploid Lotus genome is approximately 472 Mb in size, a frequency of one mutation per 154 kb is equivalent to 3,065 heritable mutations per M1 plant. However, more relevant for users of the TILLING lines is the mutation load in the M2 and subsequent generations. The M2 generation carries 940 mutations per plant (one hit in 502 kb of diploid sequence of the 472-Mb genome), of which 10% are homozygous. Due to the loss of 25% of the heterozygous mutations during selfing, the mutation load drops to 729 and 624 mutations per plant in the M3 and M4 generations, respectively. It is important to realize that at the same time the theoretical frequency of homozygous mutations increases in each round of selfing. Therefore, second-site mutation phenotypes may only become apparent in later generations.
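The load estimates in this paragraph can be retraced step by step. The sketch below uses only the figures given in the text (472-Mb genome, one hit per 502 kb in the M2, 30.6% transmission, 10% homozygous mutations in the M2) and assumes simple Mendelian selfing, in which one-quarter of heterozygous mutations are lost and one-quarter become homozygous each generation. Variable and function names are illustrative. Under these assumptions the M3 value rounds to 729, and the M4 value comes out within about one mutation of the reported 624, depending on where intermediate values are rounded.

```python
GENOME_KB = 472_000     # haploid Lotus genome size given in the text
M2_KB_PER_HIT = 502     # observed mutation load in GENPOP
TRANSMITTED = 0.306     # share of M1 mutations reaching an M2 plant (11 of 36)

# Back-calculate the M1 hit rate and the number of heritable M1 mutations.
m1_kb_per_hit = round(M2_KB_PER_HIT * TRANSMITTED)
m1_mutations = GENOME_KB / m1_kb_per_hit


def selfing_loads(total: float, hom_fraction: float, generations: int) -> list[float]:
    """Mutations per plant in successive selfed generations."""
    het = total * (1 - hom_fraction)
    hom = total * hom_fraction
    loads = [het + hom]
    for _ in range(generations):
        hom += het * 0.25   # one-quarter of heterozygous sites become homozygous mutant
        het *= 0.5          # one-half stay heterozygous; one-quarter revert to wild type
        loads.append(het + hom)
    return loads


m2, m3, m4 = selfing_loads(GENOME_KB / M2_KB_PER_HIT, hom_fraction=0.10, generations=2)
print(m1_kb_per_hit, round(m1_mutations), round(m2), round(m3), round(m4))
```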
We observed that fertility is a more sensitive parameter for EMS mutagenesis than seed viability. While in our experiments germination was reduced to about 75% to 80%, only about 50% of these plants set seed. In practical terms, this suggests that fertility of the M1 is the most relevant bottleneck limiting the mutation load that can be obtained in the resulting M2. Similar mutation rates observed for Lotus and Arabidopsis, which have a 4-fold difference in genome size, suggest that there is an upper limit of what can be tolerated in the coding part of the genome. A conserved upper limit is consistent with observations and comparisons with animal systems (Greene et al., 2003).

Distribution of Mutation Types
Of the detected mutations in GENPOP, 97.6% were G/C-to-A/T transitions typically induced by EMS (Table III; Supplemental Table S1). A comparable analysis of mutant alleles detected after EMS mutagenesis in the Seattle ATP has revealed more than 99% G/C-to-A/T transitions (Greene et al., 2003). The authors concluded that EMS induces exclusively G/C-to-A/T transitions (Greene et al., 2003). This would mean that the remaining 1% (Arabidopsis) or 2.4% (Lotus) of the observed mutations are not induced by EMS but are spontaneous mutations. For Lotus, this would be equal to a spontaneous mutation frequency of about 1 in 20 Mb per generation, or 25 mutations per plant per generation. These observed numbers are well within the range of an estimate of 0.1 to 100 per genome per sexual generation (Drake et al., 1998).

Successful Screens of M3 Bulked Family Seeds
We increased the number of available seeds per family by collecting seeds in family bulks from M2 siblings that remained after the phenotypic screens ( Fig. 1, BULKs). This collection is organized such that each seed bag can be traced back to the original M1 plant. This was done to support the anticipated user demand for seeds originating from the TILLING of GENPOP but also to support additional forward genetic screens in the background of our increasingly well-characterized mutant families. This is a significant advantage, since it allows rapid cross-referencing between the TILLING results for each family and any phenotypic information resulting from the original M2 screen or novel screen results in the structured collection of bulked M3 seeds. Our collection of 2,204 individual families in separate bulks has been made available for forward genetic screens and has been subjected to successful screens for mutants with novel symbiotic or developmental phenotypes.
A particularly rewarding screen was performed to identify genetic regulators of root nodule development (Tirichine et al., 2006b). Families, including SL0642 [snf5] and SL0935 [snf6], were identified that segregated individuals forming root nodules in the absence of rhizobia (Tirichine et al., 2006b). Two of the so-called spontaneous nodule formation (snf) loci have already been identified by map-based cloning. The snf1 locus was found to encode a calcium- and calmodulin-dependent protein kinase (CCaMK; Tirichine et al., 2006a). A second locus (snf2) has been found to encode a His kinase implicated in cytokinin perception (Tirichine et al., 2006c); both articles provide groundbreaking information about the endogenous regulation of organ formation.
The M3 bulked seeds were also successfully screened by the group of D. Luo at the Shanghai Institute of Plant Physiology and Ecology to isolate flower mutants, some of which have been shown to be alleles of LjLEAFY (Dong et al., 2005).

Thematic Subpopulations
As a special feature of the Lotus TILLING facility, we assembled phenotypically preselected mutant populations. Screening for phenotypes of interest allowed the assembly of three thematic mutant collections (Fig. 1). The first contained mutants with altered shoot, leaf, or flower development (DEVPOP), a second set comprised mutants with altered starch metabolism, and the third and largest population consisted of plants with defects in root nodule symbiosis (NODPOP). With these populations, we have assembled a large-scale forward and reverse genetics tool for the legume Lotus. This community resource has contributed significantly to recent advances in the field of legume research, especially root symbiosis research (Perry et al., 2003; Imaizumi-Anraku et al., 2005; Yoshida and Parniske, 2005; Weerasinghe et al., 2005; Heckmann et al., 2006; Lombardo et al., 2006; Murray et al., 2006; Sandal et al., 2006; Tirichine et al., 2006a, 2006b, 2006c; Horst et al., 2007; Yano et al., 2008; Maekawa-Yoshikawa et al., 2009). Apart from the large allelic series obtained, another advantage of the thematic populations is their smaller size, which means that less enzyme and fewer sequencing runs have to be used, resulting in a lower cost per functionally defective allele.

NODPOP
The NODPOP was assembled from plants that did not form nodules or had fewer and/or smaller nodules by screening M2 families after infection with Mesorhizobium loti. Moreover, plants with a nodule color different from that of the wild type (pink) were included. In addition, the population contained plants that showed signs of early senescence but regreened upon the addition of nitrogen fertilizer. This latter group was expected to include plants whose nodules would be defective in nitrogen fixation.
A total of 5,300 families (63,084 individuals) were screened for nodulation defects over the course of 3 years. This generated a total population of 670 individuals from 396 families (Fig. 1): 248 plants lacking nodules (123 families); 322 having fewer or smaller or white nodules (203 families); 27 root mutants (22 families); and 73 potentially unable to fix nitrogen (48 families). There were 26 additional families where individuals fell into different categories. DNA samples were collected from all mutants to assemble the NODPOP used for TILLING. Upon rescreening of the NODPOP, the original phenotype was so far confirmed in approximately 35% of lines (Fig. 1).
DNA of 43 symbiosis-defective mutants with unidentified genetic defects from the mutant collections of Krzysztof Szczyglowski (Southern Crop Protection and Food Research Centre, Ontario; Murray et al., 2006), of 15 mutants from Jens Stougaard (University of Aarhus; Sandal et al., 2006), and of one mutant from Judith Webb (Institute of Grassland and Environmental Research, Aberystwyth, UK) was added to the NODPOP DNA arrays. The combined population was termed NODPOP+ (Fig. 1; Supplemental Fig. S2). This was done so that mutants with defects in known genes could be identified more rapidly. This strategy has led to the identification of several additional mutant alleles. Together with repetitions in DNA extractions as controls from several lines, NODPOP+ consists of 784 DNA samples.
M3 progeny of nodulation-defective M2 individuals were also scored for their ability to form structurally intact arbuscular mycorrhiza symbiosis. To date, we have identified 36 families that were defective in both nodulation and arbuscular mycorrhiza formation. For the majority of these common symbiosis mutants, it was possible to assign them to one of the previously identified common symbiosis loci (Fig. 2; Supplemental Fig. S3).
Several of our NODPOP individuals did not produce seeds, so we attempted to rescue the corresponding mutants by rescreening sibling seeds available as bulked M3 seeds (Fig. 1, BULKs). We screened the bulked M3 seeds of 97 families (each containing approximately 50 plants) for the segregation of symbiosis mutants. In 55 of those 97 families, no mutant plants could be identified in the bulked M3 seeds. In the remaining 42 families (43%), individual mutant plants were recovered. In 12 of these families, all mutant representatives died during development. Twenty-two families segregated mutants with root defects and therefore may represent pleiotropic phenotypes.

Identification of Allelic Series of Known Nodulation Genes from NODPOP
Subsequent to the identification of the NODULE INCEPTION (NIN) gene (Schauser et al., 1999), several of the genetic components required for root symbiosis have been characterized (Downie and Oldroyd, 2008). In collaboration with other laboratories, we have used the sequence information for a number of symbiosis genes to identify several allelic series (Fig. 2) from our collection of nodulation mutants (NODPOP+). Using TILLING, we have identified a total of 126 mutations in nodulation genes in NODPOP+ (Supplemental Fig. S3), of which 97 were from the NODPOP mutant collection (Supplemental Fig. S2; Supplemental Table S2). For CASTOR, NIN, NSP2, and SYMRK, we also screened for alleles in a subset of GENPOP. This led to the detection of an additional 40 mutations (Supplemental Fig. S2).
In parallel to the TILLING effort, we established a high-throughput mapping pipeline for systematic assignment of genomic map positions to all mutant loci not identified by TILLING. Mutant lines confirmed in the M3 were crossed to the polymorphic mapping parent MG-20 (Kawaguchi et al., 2001), and a population typically of 16 mutant and eight wild-type F2 plants was subjected to a screen for linked simple sequence repeat markers. Simple sequence repeat markers were developed for use on an ABI3730 capillary sequencer equipped with the GeneMapper software for automated genotyping. Using these markers, we assigned 10 additional alleles to known loci in NODPOP and identified a known allele in two lines. This clearly indicates that TILLING failed to detect mutants in the interrogated sequence regions at a rate of approximately 10%. On closer inspection of these results, we could trace all back to failed TILLING PCRs, gel edge effects, or cleavage close to the end of a PCR fragment (Greene et al., 2003). In addition, we identified a new nodulation mutant, brush, which was located on the short arm of chromosome 2 at position 8.8 centimorgan (Maekawa-Yoshikawa et al., 2009).
The complete allelic series of symbiosis genes identified through both TILLING and mapping approaches are listed in Supplemental Figure S3. Detailed phenotypic descriptions of these allelic series have been or will be published elsewhere.

Half of Nodulation Gene Alleles Identified in NODPOP Are Potentially Responsible for a Nodulation Defect
We identified 97 mutant alleles in 12 genes required for symbiosis (Supplemental Table S2) by screening NODPOP. Since the requirement of the 12 genes for root symbiosis had been demonstrated previously, causative alleles should be sufficient, on their own, to cause symbiosis-defective phenotypes. Upon rescreening, 19 lines did not exhibit a nodulation defect; therefore, these 19 mutant alleles do not have an effect on gene function (Supplemental Fig. S2). Of the remaining 78 nodulation-deficient lines, 47 could harbor mutations that are responsible for the observed phenotype, since they are homozygous and lead to missense or nonsense mutations, to frame shifts, or affect splice sites (Table IV; Supplemental Fig. S2). However, the evaluation of the identified 27 homozygous missense mutations (Supplemental Fig. S2; Supplemental Table S3) requires a more detailed analysis because it is difficult to make reliable a priori predictions about the consequences of a particular amino acid substitution for protein function.
Despite the functional preselection of NODPOP, the stochastic distribution of mutations dictates that nonconsequential alleles will be recovered as well. It is important, therefore, to determine the frequency at which mutations are expected to occur by chance alone. We TILLed approximately 53.7 kb of genomic sequence corresponding to 26.8 kb of coding and 26.9 kb of noncoding sequence of the 12 symbiosis genes in Figure 2. By inspection of 224 families with a confirmed nodulation phenotype (Fig. 1), we screened a total of 12.0 Mb of diploid sequence. Taking into account the observed rate of one hit per 502 kb, we expected approximately 24 mutations to occur by chance. We obtained more than three times as many, indicating that enrichment occurred through preselection. From the total of 78 alleles, 31 mutations were heterozygous, in a noncoding region, or silent (Table IV) and so likely to be present by chance. In these cases, the nodulation phenotype should be caused by an additional mutation in the genome. This is consistent with the finding that nine families within NODPOP (SL0317, SL0355, SL0456, SL0605, SL0820, SL1719, SL1913, SL5369, and SL5426) carry mutations in more than one symbiosis gene. To confirm a causative connection between each of the missense mutations and the symbiosis phenotype, complementation tests would be required, which are beyond the scope of the current analysis. A summary of all potentially causative mutations in nodulation-deficient NODPOP lines is provided in Supplemental Table S3.
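The chance expectation used in this comparison is a product of three numbers given in the text. The sketch below simply recomputes it; the function name is hypothetical.

```python
def expected_chance_hits(amplicon_kb: float, families: int, kb_per_hit: float) -> tuple[float, float]:
    """Return (Mb of sequence screened, mutations expected by chance)."""
    screened_kb = amplicon_kb * families
    return screened_kb / 1000, screened_kb / kb_per_hit


# 53.7 kb TILLed per family, 224 confirmed families, one hit per 502 kb in GENPOP.
screened_mb, expected = expected_chance_hits(53.7, 224, 502)
observed = 78
print(f"{screened_mb:.1f} Mb screened, {expected:.0f} hits expected, "
      f"{observed / expected:.1f}-fold enrichment")
```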

Analysis of Functionally Defective Mutant Alleles Reveals Bias toward Gly Replacements
Almost equal numbers of G-to-A and C-to-T transitions were observed among GENPOP alleles, as expected for a random distribution. In contrast, there were 3.4 times as many G-to-A as C-to-T base changes in symbiosis genes in the 47 potentially causative NODPOP alleles (Table III). This enrichment is caused by two additive components. First, five of the six splice site mutations identified in NODPOP are G-to-A transitions (Supplemental Table S3); the remaining one is a G-to-T transversion. Second, we observed that deleterious amino acid exchanges showed strongly biased distributions, both in the amino acids that were affected and in the underlying base changes. Surprisingly, mutations that replace Gly are clearly overrepresented in potentially functionally defective alleles (Fig. 3; Supplemental Tables S3 and S4). The overrepresentation by 11 mutations affecting Gly contributes to an increase in the ratio of G-to-A versus C-to-T transitions, since all nonsynonymous positions in Gly codons are occupied by G.
Table IV. Distribution of mutation types in nodulation genes detected by TILLING in NODPOP families with clear nodulation phenotypes (Supplemental Fig. S2). -, Zero.
Figure 3. Amino acids targeted in EMS alleles in different populations. Relative occurrence of nonsynonymous and nonsense exchanges above or below expectation is indicated by bars (Supplemental Table S4). Significant deviations from expected values are designated with asterisks (**, P < 1%; ***, P < 0.1%) as determined by individual χ² tests.
We analyzed the distribution of amino acids that were replaced by EMS mutagenesis in our collection of missense and nonsense alleles in functionally impaired mutant lines. We asked the question whether replacements in some amino acids are more likely to result in a nonfunctional protein than others. To detect such compositional biases in our allele collection, we first determined the expected distribution of hits between the amino acids occurring in the genes under study. For this, we analyzed the susceptibility to EMS mutagenesis of each of the codons in the genetic code. Assuming random mutagenesis and by taking into account the codon usage of the genes under study and the total number of mutations, we could calculate an expected number of hits for each of the codons for a particular amino acid. This expectation value was compared with the observed frequency in 102 EMS alleles of 14 genes with a variety of functions TILLed in the GENPOP (excluding mutations leading to splice site or frame shift mutations) and showed no significant deviation (Fig. 3; Supplemental Table S4).
In contrast to this, TILLING of 12 genes in the NODPOP resulted in 39 nonsynonymous potentially causative alleles (also excluding mutations leading to splice site or frame shift mutations) with a strongly biased distribution (Fig. 3; Supplemental Table S4). The number of hits in TGG encoding Trp was 5-fold higher than expected by chance (six versus 1.2). Mutations affecting this codon are likely to be overrepresented in a series of functionally impaired alleles, since mutations of either of the two Gs to A results in a (premature) stop codon, TGA or TAG. Likewise, we observed a significant accumulation of mutations (five versus 1.6) in the CAA and CAG codons for Gln, leading to the stop codons TAA and TAG (Fig. 3). Surprisingly, codons for Gly, the smallest amino acid, were hit more than twice as often as expected by chance (11 versus 5.3), whereas codons for Ala, the second smallest amino acid, are slightly underrepresented (two versus 5.6).
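To illustrate how observed and expected hit counts of this kind can be compared, the sketch below runs a one-degree-of-freedom χ² test for a single amino acid against the pooled remainder of the 39 alleles. The exact form of the individual tests behind Figure 3 is not specified here, so this two-class layout is an assumption, and the function name is illustrative. The observed and expected counts are those quoted in the paragraph above.

```python
from math import erfc, sqrt


def chi2_one_class(observed: float, expected: float, total: float) -> tuple[float, float]:
    """Chi-square statistic and P value (1 df) for one class versus all others."""
    rest_obs = total - observed
    rest_exp = total - expected
    stat = (observed - expected) ** 2 / expected + (rest_obs - rest_exp) ** 2 / rest_exp
    p_value = erfc(sqrt(stat / 2))  # upper tail of the chi-square distribution, 1 df
    return stat, p_value


TOTAL_ALLELES = 39
counts = {"Trp": (6, 1.2), "Gln": (5, 1.6), "Gly": (11, 5.3), "Ala": (2, 5.6)}
results = {aa: chi2_one_class(obs, exp, TOTAL_ALLELES) for aa, (obs, exp) in counts.items()}
for aa, (stat, p_value) in results.items():
    print(f"{aa}: chi2 = {stat:.2f}, P = {p_value:.4f}")
```

Under this assumption, the excess of Trp hits falls below P = 0.1% and the excess of Gly hits below P = 1%, whereas the deficit of Ala hits does not reach the 5% level.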
A closer analysis revealed that this bias is, at least partially, the consequence of the rather narrow mutational spectrum that is explored by EMS mutagenesis. For example, the five amino acids Phe, Ile, Lys, Asn, and Tyr are not EMS targets, because of the lack of G and C at nonsynonymous positions in their codons. Of the 410 theoretical amino acid and nonsense interconversions (21 coding/noncoding options × 20 alternatives), only 170 can be achieved by single nucleotide exchanges. And of these, only 26 are possible through EMS mutagenesis (Supplemental Fig. S4). To obtain an approximate quantitative measure for the degree of conservation of these 26 possible changes, we projected them onto the BLOSUM62 matrix (Henikoff and Henikoff, 1992). This matrix is derived from counting the frequency of amino acid pairs at a given position in sequence alignments by focusing on evolutionarily conserved sequence blocks. Apart from the nonsense mutations, only 10 EMS-inducible amino acid replacements have a negative score (i.e. are infrequently observed in evolution) and hence are more likely than conservative exchanges to have a detrimental effect. Of these 10 exchanges, three affect Gly (Supplemental Fig. S4), followed by Pro, Ser, Arg, and Thr, with only two nonconservative replacements each. While this analysis provides a possible explanation why Gly hits are likely to be overrepresented in functionally defective EMS alleles, it does not explain why Pro, Ser, Arg, and Thr are not.
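The narrow mutational spectrum of EMS can be made concrete by enumerating every codon change that a G-to-A or C-to-T transition produces. The sketch below is an illustration rather than the procedure used for Supplemental Figure S4: it assumes the standard genetic code, writes stop codons as "*", and does not score the resulting replacements against BLOSUM62. All names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" marks stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# EMS-type transitions as read on the coding strand.
EMS_CHANGES = {"G": "A", "C": "T"}


def ems_replacements() -> set[tuple[str, str]]:
    """All (original, replacement) residue pairs reachable by one EMS transition."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        for pos, base in enumerate(codon):
            if base not in EMS_CHANGES:
                continue
            mutant = codon[:pos] + EMS_CHANGES[base] + codon[pos + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa != aa:            # skip synonymous changes
                pairs.add((aa, new_aa))
    return pairs


pairs = ems_replacements()
missense = {(a, b) for a, b in pairs if b != "*"}
nonsense_sources = sorted(a for a, b in pairs if b == "*")
untargeted = set(AMINO) - {"*"} - {a for a, _ in pairs}
gly_targets = {b for a, b in pairs if a == "G"}

print(len(missense), nonsense_sources, sorted(untargeted), sorted(gly_targets))
```

With these assumptions the enumeration yields 26 amino acid replacements plus nonsense changes arising from Gln, Arg, and Trp codons, leaves Phe, Ile, Lys, Asn, and Tyr untargeted, and shows that Gly can only be replaced by Asp, Glu, Arg, or Ser.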
To test whether this bias toward Gly hits is due to the particular genes studied here, we analyzed the set of EMS-induced RPM1 alleles (Tornero et al., 2002). In this data set, Gly is also the most frequently hit amino acid, apart from Trp, Arg, and Gln mutations in codons that lead to nonsense mutations (Fig. 3; Supplemental Table S4). In addition, Leu is a slightly overrepresented target specifically in RPM1. Since RPM1 contains Leu-rich repeats, this accumulation is probably a protein-specific effect, due to the important structural role of Leu in the Leu-rich repeat, which apparently does not tolerate the EMS-induced change to Phe.
We observe a significant overrepresentation of Gly replacements in functionally defective EMS alleles in two independent studies, with a total of 94 missense mutations (39 from the nodulation screen and 55 from the RPM1 study; Supplemental Table S4). We suggest that this particular propensity of EMS-induced Gly replacements to impair protein function might be a general phenomenon that has not been recognized previously. Using different mutagens, more than 4,000 amino acid replacements were tested in a large-scale structure-function analysis of the lac repressor (Markiewicz et al., 1994). An emerging pattern was that at many sites, only the substitution of hydrophobic against hydrophobic and small against small was tolerated (Markiewicz et al., 1994). However, a particular susceptibility of Gly versus Ala, which is underrepresented in our series of functionally defective alleles, went unnoticed. Gly only carries hydrogen as a side chain and therefore can adopt exceptional conformations that are sterically forbidden for other amino acids. The torsion angles that define the rotation of the atomic bonds between the alpha C atom and the amino-nitrogen and carbonyl-carbon atoms in the context of the peptide chain are referred to as φ and ψ, respectively. Gly can explore areas in the two-dimensional Ramachandran plot of φ versus ψ that are inaccessible for other amino acids because of steric exclusion (clashes between side chains). Replacement of a Gly with exceptional φ and/or ψ values with an amino acid carrying a side chain will change these angles, thereby rotating the amino- and carboxyl-terminal parts of the protein relative to each other. This is a massive alteration in protein structure and therefore likely to lead to impaired protein function.
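For readers unfamiliar with the two backbone torsion angles described above (conventionally written phi and psi), the sketch below shows how a torsion angle is computed from four atom positions. For residue i, phi uses the atoms C(i-1), N(i), C-alpha(i), C(i), and psi uses N(i), C-alpha(i), C(i), N(i+1). The coordinates in the usage lines are arbitrary toy points, not taken from any protein structure, and the function name is illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy geometry (hypothetical points): a planar cis and a planar trans arrangement.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(cis, trans)
```

Applied to every residue of a structure, pairs of such angles give the points of a Ramachandran plot. Residues with a side chain cluster in a few allowed regions, whereas Gly can also occupy regions outside them.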
In conclusion, we have generated a unique forward and reverse genetics resource for legume research and uncovered several allelic series for genes important in symbiotic interactions. The detailed analysis of a large number of functionally defective EMS alleles has revealed a hitherto unobserved bias in the amino acid replacements leading to defective gene products. This information is important not only for prediction of the deleterious effects of the commonly used mutagen, EMS, on genes subjected to TILLING but may also be used to optimize algorithms designed to predict the deleterious effect of naturally occurring mutations detected in human genome sequences.

Plant Material and EMS Mutagenesis
Seeds of Lotus (Lotus japonicus) ecotype B-129 'Gifu' were a kind gift of Jens Stougaard (Aarhus University). All other Lotus ecotypes were obtained from the National Agricultural Research Center (Toyohira, Sapporo, Japan) and Legumebase, Japan (http://www.shigen.nig.ac.jp/lotusjaponicus/index_e.html). Mutagenesis was performed in 10 successive batches over the course of 3 years starting in spring 1999. Typically, 3.5 g of dry seeds was imbibed per batch, rinsed, and, after removal of washing water, treated overnight with 60 µL of EMS suspended in 10 mL of distilled water. M2 seeds from the resulting fertile M1 plants were sown in individual families (Fig. 1). In the first 2 years, a total of 45,600 M2 individuals were subjected to phenotypic screens to assemble thematic subpopulations.

DEVPOP
During the first year of building the populations (families SL0001-SL3552), M2 individuals that exhibited interesting developmental abnormalities were isolated and photographed, and the resulting data were made available in an online database (Perry et al., 2003). DNA was not isolated from these plants, but for most lines mutant or sibling seeds are available so that populations of interest can be established for DNA extraction and thematic TILLING.

Nodulation Assays and Generation of NODPOP
M2 families were screened for nodulation mutants over a 3-year period. Initial findings for the first 2 years were presented previously (Perry et al., 2003). In the first year, seeds from each family (maximum of 30 seeds) were scarified using fine sandpaper (http://www.lotusjaponicus.org/tillingpages/protocols2.htm) and sown into pots (7 cm, square) filled with expanded clay particles (approximately 1-3 mm diameter; Biosorb, medium grade; Collier Turf Care) and covered with a layer of sand. Seeds were sprinkled on top of the sand layer and covered with more sand. Plants were inoculated after the emergence of the first true leaves (2 weeks after germination) with Mesorhizobium loti (NZP 2235) and screened for the number and appearance of nodules 4 weeks later (population 1 [SL0001-SL3552]; Table I). Biosorb proved a poor germination medium, and subsequently, in the second and third years, 12 scarified seeds from each family were sown individually, in 12 × 24 trays, on F1 compost (Levington; populations 2 [SL4096-SL5428] and 3 [SL5500-SL6964], respectively; Table I). M. loti (Tono) was used to inoculate seedlings 2 weeks after germination. Each individual was screened for nodulation 4 weeks after inoculation. Any individual M2 plants with macroscopically visible defects in root nodule symbiosis (NODPOP) were grown under glasshouse conditions, DNA was extracted, and seeds were collected in individual seed bags (Fig. 1). In some cases, the original M2 mutant individual had died before DNA could be extracted. In such cases, DNA was prepared from M3 self seeds of the mutant. If the mutant did not produce seeds, we attempted to recover the mutation from bulked sibling seeds, and DNA was extracted subsequently.

Generation of GENPOP
After the removal of nodulation and other phenotypically interesting mutants from the family, a single healthy-looking M2 individual, which scored wild-type by phenotypic criteria, was chosen to represent each M2 family in GENPOP. DNA from each GENPOP plant was extracted, and seeds were collected in individual seed bags.

Generation of Bulk Seeds
In the first year, a representative from each family was taken to form the GENPOP, any nodulation mutants within the families were removed to generate the NODPOP, and finally, phenotypically impaired mutants were removed to provide a source for future trait-specific or theme-based TILLING. The remaining individuals from each family were planted outside in the field, and the family bulk was harvested. This generated 2,084 bulked family lines. In the second year, family lines were only bulked if they contained a starch mutant, resulting in 120 bulked lines. These were grown under glasshouse conditions. This collection of bulked family seeds is accessible in 2,204 seed bags, each representing M3 progeny of a single M1 individual.

Starch Analysis
Starch content in Lotus aerial tissue was determined using iodine staining as described by Harrison et al. (1998).

TILLING
Initially, the Lotus TILLING facility utilized an ABI377 sequencer (Perry et al., 2003); between 2003 and 2008, the platform was based on Li-Cor 4300 DNA sequencers and infrared dyes. We are currently using an ABI3730 capillary sequencing machine, which is not gel based, has robust automated fragment detection and analysis, and offers higher throughput and reduced handling times.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Expected segregation ratios for one versus three cells contributing to gamete formation in an individual flower.
Supplemental Figure S2. Overview of mutant alleles in symbiosis genes identified by TILLING.
Supplemental Figure S3. Allelic series of nodulation genes in Lotus.
Supplemental Figure S4. BLOSUM62 table with amino acid changes.
Supplemental Table S1. Mutation types by nucleotide change detected in GENPOP and NODPOP.
Supplemental Table S2. Distribution of mutation types detected in nodulation genes in the complete NODPOP by TILLING.
Supplemental Table S3. Alleles in nodulation genes with potentially causative mutations in the nodulation-deficient NODPOP, ordered by mutation type.
Supplemental Table S4. Expected and observed occurrence of nonsynonymous and nonsense amino acid exchanges in EMS alleles in different populations.