SmartGrain: High-Throughput Phenotyping Software for Measuring Seed Shape through Image Analysis

Seed shape and size are among the most important agronomic traits because they affect yield and market price. To obtain accurate seed size data, a large number of measurements are needed because there is little difference in size among seeds from one plant. To promote genetic analysis and selection for seed shape in plant breeding, efficient, reliable, high-throughput seed phenotyping methods are required. We developed SmartGrain software for high-throughput measurement of seed shape. This software uses a new image analysis method to reduce the time taken in the preparation of seeds and in image capture. Outlines of seeds are automatically recognized from digital images, and several shape parameters, such as seed length, width, area, and perimeter length, are calculated. To validate the software, we performed a quantitative trait locus (QTL) analysis for rice (Oryza sativa) seed shape using backcrossed inbred lines derived from a cross between japonica cultivars Koshihikari and Nipponbare, which showed small differences in seed shape. SmartGrain removed areas of awns and pedicels automatically, and several QTLs were detected for six shape parameters. The allelic effect of a QTL for seed length detected on chromosome 11 was confirmed in advanced backcross progeny; the cv Nipponbare allele increased seed length and, thus, seed weight. High-throughput measurement with SmartGrain reduced sampling error and made it possible to distinguish between lines with small differences in seed shape. SmartGrain could accurately recognize seed not only of rice but also of several other species, including Arabidopsis (Arabidopsis thaliana). The software is free to researchers.

Seed shape and size are among the most important agronomic traits because they affect yield, eating quality, and market price. Therefore, plant research fields such as genetics, functional analysis, and genomics-assisted crop improvement, in addition to breeding programs, could benefit from quantitative evaluation of seed shape. Efficient, reliable, high-throughput phenotyping methods are required.
In general, seed shape can be scored in two ways. The simple way is to measure seed length (L) and width (W) with calipers. However, manual methods limit the amount of data, the quality of measurements, and the variety of shape data that can be gleaned. By contrast, computational methods using digital imaging technology could enable us to automatically measure a variety of shape parameters at very small sizes in high-resolution images (Brewer et al., 2006; Bylesjö et al., 2008; Weight et al., 2008; French et al., 2009; Wang et al., 2009). Several imaging methods have been developed so far. Elliptic Fourier descriptors have been used to examine variations in grain shape (Iwata and Ukai, 2002; Iwata et al., 2010), but only six grains per accession are measured, grains must be exactingly laid in the same direction, and actual lengths are not measured. Thus, this method is not suitable for high-throughput measurement. Herridge et al. (2011) developed a high-throughput method to measure the area of Arabidopsis (Arabidopsis thaliana) seeds, using a desktop scanner and image analysis software to automate labor-intensive tasks, but it measures only the area of a seed, not the shape parameters.
To achieve detailed genetic analyses (such as quantitative trait locus [QTL] analyses or genome-wide association studies), we need a highly accurate method that can quickly measure a large number of samples from genetic mapping populations, such as F2 and recombinant inbred lines, because there is little difference in size among seeds from even one plant at the same plant age (Hoshikawa, 1993; Herridge et al., 2011). In this study, we developed a high-throughput phenotyping program called SmartGrain that uses image analysis to determine seed shape. SmartGrain automatically recognizes all seeds within a digital image, detects outlines, and then calculates L, W, seed area (AS), perimeter length (PL), and other parameters. To validate the software, we used it in QTL analysis for rice (Oryza sativa) seed shape, which is difficult to automatically measure because of pedicels and awns. The pedicel is the stalk supporting a spikelet on a panicle branch (Fig. 1A, arrow; Fig. 2A, arrowhead), and the awn is a filiform extension of varying length protruding from the top of a lemma (Fig. 2A, arrow; Chang and Bardenas, 1965). We used backcross inbred lines (BILs) and chromosome segment substitution lines (CSSLs) derived from a cross between japonica cultivars Koshihikari and Nipponbare, which differ little in seed shape. The results clearly demonstrate the robustness and effectiveness of SmartGrain for use in genetic analysis.
1 This work was supported by grants from the Ministry of Agriculture, Forestry, and Fisheries of Japan (Genomics for Agricultural Innovation NVR-0001 and NVR-0002).
2 These authors contributed equally to the article.
* Corresponding author; e-mail myano@nias.affrc.go.jp.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Masahiro Yano (myano@nias.affrc.go.jp).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.112.205120

SmartGrain: Image Analysis Software for High-Throughput Measurement of Seed Shape
SmartGrain batch-analyzes all images stored in a specified folder (up to 65,500). It can automatically detect overlapping seeds, which are then omitted from analysis, and automatically removes awns and pedicels (Figs. 1, A and B, and 2): Parameters are set for the method of seed detection, degree of seed overlap, and intensity of awn and pedicel removal. Each 600-dpi (dots per inch) A4 image takes a few minutes to analyze. The results are exported as a CSV (comma-separated values) file that can be opened in spreadsheet software (e.g. Microsoft Excel).
Regardless of the placement or number of seeds, SmartGrain can isolate all seeds and measure their shape (Fig. 1B), removing a serious bottleneck in taking images.

Algorithm for Image Processing and Measurement of Seed Shape Parameters
SmartGrain can measure multiple parameters in addition to area. Having detected the outline of each seed, it calculates AS, PL, L, W, circularity (CS), length-to-width ratio (LWR), intersection of length and width (IS), center of gravity (CG), and distance between IS and CG (DS; Fig. 1C).
The values of these parameters are calculated by taking sequential points along the seed perimeter and maximizing or minimizing the values in the following sequence (Fig. 1D): (1) load image; (2) convert to 1-bit image (white seeds on a black background); (3) analyze morphology to delete awns and pedicels; (4) detect outlines, label all seeds in the image, and calculate AS, PL, and CG from the outlines; (5) detect the longitudinal axis and calculate L; (6) detect the transverse axis from the outline and the longitudinal axis and calculate W and IS; and (7) calculate LWR, CS, and DS using data from steps 4 to 6.
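Step 4 of this sequence labels every seed in the binarized image. As a rough illustration only, the following Python sketch labels 4-connected foreground regions with a breadth-first flood fill and counts their pixels. It is not SmartGrain's implementation, which traces outlines with OpenCV; the function name `label_seeds` and the list-of-lists image format are assumptions made for this example.

```python
from collections import deque


def label_seeds(mask):
    """Label 4-connected regions of 1s in a binary image (list of lists).

    Returns (labels, areas): labels has the same shape as mask, with 0 for
    background and 1..k for each region; areas maps label -> pixel count.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already assigned to a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = count
    return labels, areas
```

Pixel counts obtained this way approximate seed area in pixels; they are not expected to match an area computed from the traced outline exactly.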

Automatic Segmentation of AS and Plotting of Seed Perimeter
The image is processed in three steps. First, the image is converted from 24-bit (full color; Fig. 2A) to 1-bit (black and white) using a segmentation method (Tanabata et al., 2010) to give white seeds on a black background (Fig. 2B). This method is robust to lighting levels, and a threshold does not need to be set. Second, any awn or pedicel is removed. SmartGrain uses the OpenCV functions "cvErode" and "cvDilate" based on morphological operations (Fig. 2C). The cvErode process pares away the perimeter, reducing the AS, which the cvDilate process rebuilds (Fig. 2D).
Third, the OpenCV function "cvFindContours," based on the algorithm of Suzuki and Abe (1985), automatically detects the perimeter of each seed in the 1-bit image (Fig. 2E; OpenCV Developers Team, 2012). It acquires a set of perimeter coordinates $P_i(x_i, y_i)$ (Fig. 3A). The origin $O(0, 0)$ for all $P_i$ is defined as the top left corner in the image (Fig. 3A). From the set of perimeter coordinates, the OpenCV function "cvContourArea" computes the area within the perimeter, and "cvArcLength" computes the PL (OpenCV Developers Team, 2012). Using these functions, AS and PL are calculated from the set of $P_i(x_i, y_i)$. CG is also calculated from the set of $P_i(x_i, y_i)$.
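The second and third steps can be illustrated with a minimal pure-Python sketch. It is not SmartGrain's code and does not call OpenCV: erosion followed by dilation is written out with a 3 × 3 square element (pixels outside the image are treated as background, which is a simplification), area uses the shoelace formula, and perimeter length is the length of the closed polyline. All function names and the `iterations` parameter are hypothetical, and taking the mean of the perimeter points as a stand-in for CG is an assumption, because the exact expression is not reproduced here.

```python
import math


def _neighbourhood(mask, r, c):
    """Yield the 3 x 3 neighbourhood of (r, c); outside the image counts as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(mask):
    """Keep a pixel only if its whole 3 x 3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def remove_thin_parts(mask, iterations=1):
    """Erode, then dilate: parts narrower than the element do not come back."""
    for _ in range(iterations):
        mask = erode(mask)
    for _ in range(iterations):
        mask = dilate(mask)
    return mask


def contour_area(points):
    """Area enclosed by an ordered, closed contour (shoelace formula)."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice_area) / 2.0


def arc_length(points):
    """Length of the closed polyline through the contour points."""
    n = len(points)
    return sum(math.hypot(points[i][0] - points[(i + 1) % n][0],
                          points[i][1] - points[(i + 1) % n][1])
               for i in range(n))


def mean_point(points):
    """Mean of the contour points, a stand-in for CG (assumption)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
```

In this sketch, a one-pixel-wide protrusion attached to a solid block disappears after one erosion and is not rebuilt by the dilation, whereas the block itself is restored.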

Calculation of L, W, and Other Shape Parameters
Next, the longitudinal and transverse axes are derived from the set of perimeter points.
To measure L, SmartGrain detects the maximum distance between points on the perimeter (Fig. 3A) by calculating the segment distances $hl_{i,j}$ between all pairs of perimeter points $P_i(x_i, y_i)$ and $P_j(x_j, y_j)$:
$$hl_{i,j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}, \qquad L = \max_{i,j} hl_{i,j}$$
To measure W, SmartGrain detects the longest segment that is perpendicular to the L segment (Fig. 3B). Finally, the other shape parameters (LWR, CS, and DS) are calculated from these values.
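A brute-force version of these axis calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not the SmartGrain implementation: the function names are hypothetical, the perpendicularity tolerance `tol` is an invented parameter, LWR is taken as L/W, and CS uses the common circularity definition 4π·AS/PL², which is assumed here rather than taken from the article. AS, PL, and CG are passed in as already-computed values.

```python
import math
from itertools import combinations


def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def length_axis(points):
    """Return (L, p, q) for the farthest pair of perimeter points p, q."""
    p, q = max(combinations(points, 2), key=lambda pair: _dist(*pair))
    return _dist(p, q), p, q


def width_axis(points, p, q, tol=1e-6):
    """Return (W, a, b): the longest chord ab perpendicular to segment pq.

    tol bounds |cos(angle)| between ab and pq; contours traced on a pixel
    grid would likely need a looser tolerance (assumption).
    """
    ux, uy = q[0] - p[0], q[1] - p[1]
    best = (0.0, None, None)
    for a, b in combinations(points, 2):
        vx, vy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(vx, vy)
        if length <= best[0]:
            continue  # cannot improve on the current best chord
        if abs(ux * vx + uy * vy) <= tol * math.hypot(ux, uy) * length:
            best = (length, a, b)
    return best


def shape_parameters(points, area, perimeter, cg):
    """Derive L, W, LWR, CS, IS, and DS from a contour plus AS, PL, and CG."""
    L, p, q = length_axis(points)
    W, a, _ = width_axis(points, p, q)
    if a is None:
        raise ValueError("no chord perpendicular to the length axis was found")
    # IS: foot of the perpendicular dropped from the width chord onto line pq.
    ux, uy = q[0] - p[0], q[1] - p[1]
    t = ((a[0] - p[0]) * ux + (a[1] - p[1]) * uy) / (ux * ux + uy * uy)
    is_point = (p[0] + t * ux, p[1] + t * uy)
    return {
        "L": L,
        "W": W,
        "LWR": L / W,                                 # assumed: L divided by W
        "CS": 4.0 * math.pi * area / perimeter ** 2,  # assumed circularity formula
        "IS": is_point,
        "DS": _dist(is_point, cg),
    }
```

The pairwise search is quadratic in the number of perimeter points, which is acceptable for a sketch; a production implementation would probably restrict the search, for example to the convex hull.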

Application of SmartGrain to QTL Analysis of Rice Seed Shape in BILs
SmartGrain measured AS, PL, CS, L, W, LWR, IS, CG, and DS. Japanese rice cultivars Nipponbare and Koshihikari have similar seed shapes (Fig. 4A). However, SmartGrain clearly identified significant differences between them in PL, L, LWR, and CS (Fig. 4, B-G): cv Nipponbare seeds are longer than those of cv Koshihikari and are therefore less round.
To detect QTLs for seed shape, we measured 127 cv Nipponbare/Koshihikari BILs developed using cv Koshihikari as the recurrent parent (Matsubara et al., 2008). The BILs showed a normal distribution, continuous variation, and transgressive segregation in all seed parameters examined (Fig. 5), suggesting the involvement of a number of QTLs in seed shape. QTL analysis detected 13 loci (Table I; Fig. 6; Supplemental Fig. S1). Two QTLs for W were detected on the short arm and the long arm of chromosome 3 near single nucleotide polymorphism (SNP) markers NIAS_Os_aa03000545 and NIAS_Os_aa03002423, respectively. The cv Nipponbare allele on the short arm decreased W and that on the long arm increased it (Table I). Three QTLs for L, AS, and PL were detected on the long arm of chromosome 7, near NIAS_Os_aa07005384. The cv Nipponbare alleles at these loci each decreased these parameters. Two QTLs for CS and LWR were identified on the long arm of chromosome 8, near NIAS_Os_aa08005493. The cv Nipponbare allele at each locus made the seeds rounder. Four QTLs were detected for L, LWR, CS, and PL on the long arm of chromosome 11, near NIAS_Os_aa11012252. The cv Nipponbare alleles increased L and PL, making the seeds more slender. We detected another QTL for W near NIAS_Os_aa01004804 on the short arm of chromosome 1 and one for PL near NIAS_Os_aa10003500 on the distal end of chromosome 10.

Mapping of the QTL for L on Chromosome 11 with CSSLs
To verify the QTL detection accuracy of SmartGrain and the allelic effect of the QTL detected on chromosome 11, we performed further analysis using three CSSLs with a cv Nipponbare segment on chromosome 11 in a cv Koshihikari background: SL635, SL636, and SL637 (Fig. 7, A and B). Each CSSL has a single chromosome segment of cv Nipponbare on chromosome 11 in a common genetic background of cv Koshihikari (Hori et al., 2010). L was 0.17 mm longer in SL636 and SL637 than in cv Koshihikari but was not longer in SL635 (Fig. 7, B and C). Furthermore, we manually measured the L of the three CSSLs (10 seeds per plant; five plants per line) by caliper. Although these values were slightly larger than those measured by SmartGrain, the patterns of L of each line corresponded with those of SmartGrain (Supplemental Fig. S2). In addition, the 1,000-grain weight of SL636 and SL637 was significantly greater (by 0.7 g, or 3.6%; Fig. 7D). Thus, SmartGrain could detect and map the QTL accurately. The cv Nipponbare allele increased L and consequently 1,000-grain weight. Substitution mapping using the three CSSLs mapped the QTL between SNP markers SNP3403 and NIAS_Os_aa11005113, within a region of about 7 Mbp (Fig. 7A).

Use of SmartGrain to Measure Seed Shape of Various Plant Species
We also used SmartGrain to measure the seeds of Arabidopsis, soybean (Glycine max), Setaria italica, Setaria viridis, and rice 'Koshihikari'. The procedures and ease of use were the same as for measuring the rice grains, except that we scanned the Arabidopsis seeds at 2,400 dpi instead of 600 dpi on account of their small size. SmartGrain correctly recognized all seeds in the images (Fig. 8) and calculated the seed shape parameters as for the rice seeds (Table II). Thus, SmartGrain can be used to measure the seeds of various plant species.

Effectiveness of SmartGrain for High-Throughput Seed Shape Phenotyping
Our main aim in developing SmartGrain was to reduce the time and labor involved in capturing and analyzing images, which has hitherto prevented the automation of large-scale genetic analysis. The software accurately measures the shape of all seeds in an image, in any orientation (Figs. 1-3). Thus, the labor input has been reduced to spreading the seeds and removing them again (Supplemental Fig. S3). It also accurately removes awns and pedicels (Figs. 1 and 2), eliminating the need to do this by hand. The Ls measured by SmartGrain were slightly shorter than those measured by caliper (Supplemental Fig. S2). This might be due to the automated removal of awns and pedicels in SmartGrain. It is difficult to remove them accurately (especially small parts, such as Fig. 2A, arrowhead) by hand, so the length might be overestimated. SmartGrain thus enables us to measure a large number of seeds easily and quickly, greatly reducing sampling error. The analysis of more than 200 seeds per plant at high resolution (600 dpi, 0.042 mm/pixel) made it possible to distinguish between seed lines with very little difference in shape (Fig. 4).
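The link between seed number and sampling error can be made explicit with the standard error of the mean. As a rough worked example, assuming independent measurements with the same standard deviation $s$ for both methods (an assumption; no such value is reported here), moving from 10 seeds per plant, as in the caliper measurements, to 200 seeds per plant would shrink the standard error of a plant mean by a factor of about 4.5:

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}_{n=10}}{\mathrm{SE}_{n=200}} = \sqrt{\frac{200}{10}} = \sqrt{20} \approx 4.5
```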

Application of SmartGrain to QTL Analysis of Rice BILs and Mapping
cv Koshihikari and Nipponbare, both temperate japonica Japanese rice cultivars, are genetically closely related. Their seeds are similar in shape (Fig. 4A), but measuring them has previously proved difficult. As such small differences are sometimes very important to market value, detailed genetic analysis, including gene isolation, is necessary. Using SmartGrain, we could readily detect small differences in seed shape between these two cultivars. Subsequent QTL analysis in cv Koshihikari/Nipponbare BILs based on more than 12,700 seeds detected several QTLs for seed shape (Table I; Fig. 6).
Figure 7. Seed L and W of SL635, SL636, and SL637 CSSLs with a cv Nipponbare segment of chromosome 11 in the cv Koshihikari background. A, Graphical genotypes of chromosome 11 in the CSSLs. Gray, homozygous for cv Koshihikari; white, homozygous for cv Nipponbare. B, Seeds of the three CSSLs and cv Koshihikari; bar = 5 mm. L (C) and 1,000-grain weight (D; dehulled) of parents and CSSLs. Asterisks indicate significant differences between cv Koshihikari and the CSSLs: *P < 0.05, **P < 0.01; ns, not significant. L was measured in five plants per line, and 1,000-seed weight was measured independently three times in bulked grains of each line. [See online article for color version of this figure.]
Three QTLs (for AS, PL, and L) were detected on chromosome 7, and their nearest markers and directions of additive effects corresponded, suggesting that they are one QTL involved in three seed shape parameters: A decrease in L might result in a decrease in AS and PL. Likewise, two QTLs (for CS and LWR) were detected on chromosome 8, and four QTLs (for PL, CS, L, and LWR) were detected on chromosome 11, and their nearest markers and directions of additive effects also corresponded, although the two-LOD (log of the odds) confidence interval for PL on chromosome 11 did not overlap entirely with that of CS, L, and LWR. It is likely that the two QTLs on chromosome 8 constitute one QTL involved in seed roundness and that the four QTLs on chromosome 11 also constitute one QTL involved in PL, CS, L, and LWR; an increase in L might affect these other parameters. We performed QTL analysis in the BILs in only 1 year. As the effects of the detected QTLs were small, it will be necessary to confirm their effects in another year and in advanced progeny.
Applying SmartGrain to the CSSLs, we mapped the QTL for L within a region of about 7 Mbp on chromosome 11 (Fig. 7). This QTL was confirmed by the 1,000-grain weight, suggesting that SmartGrain can detect a difference in rice L of only 0.17 mm.
Several important seed shape QTLs, such as GS3 (Fan et al., 2006), GW2 (Song et al., 2007), qSW5/GW5 (Weng et al., 2008), GS5 (Li et al., 2011), and GW8 (Wang et al., 2012), have been cloned by using map-based strategies. These QTLs have relatively large allelic effects; even GS5, with the smallest effect among them, caused differences of 0.25 mm in W and 1.59 g in seed weight (Li et al., 2011). However, qSW5, with a large effect on W (0.5 mm, 15%), explained only 38.5% of the phenotypic variance in a cv Nipponbare × Kasalath F2 population, suggesting that the bulk of phenotypic variation is caused by several minor QTLs. To further improve grain yield and to understand the mechanisms controlling seed shape, it will be important to detect not only major but also minor QTLs. SmartGrain can contribute to this by detecting very small effects.
In addition to rice, SmartGrain accurately recognized seeds of several other species (Fig. 8; Table II), with no change to the procedure. In particular, it could measure Arabidopsis seed shapes, which are too small to measure in bulk by other means. SmartGrain will be a useful tool for understanding genes and mechanisms controlling grain or seed size and shape in various plant species.

Imaging
Images were captured on an Epson GT-X820 A4 scanner with the supplied software without image enhancement and saved as TIFF (tagged image file format) files. Seeds were spread uniformly on the glass (Supplemental Fig. S3, A-D), scanned at 600 dpi (23.6 pixels mm⁻¹; Fig. 1A), and removed. This takes only a few minutes, allowing us to process several hundred batches of seeds in a day.
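The relation between scanner resolution and physical size follows directly from 1 inch = 25.4 mm. The short Python sketch below (function names are illustrative, not part of SmartGrain) converts between dots per inch, pixels per millimeter, and millimeters per pixel.

```python
MM_PER_INCH = 25.4


def mm_per_pixel(dpi):
    """Physical edge length of one pixel, in mm, at the given resolution."""
    return MM_PER_INCH / dpi


def pixels_per_mm(dpi):
    """Number of pixels per mm at the given resolution."""
    return dpi / MM_PER_INCH


def mm_to_pixels(length_mm, dpi):
    """Convert a physical length in mm to a length in pixels."""
    return length_mm * pixels_per_mm(dpi)
```

At 600 dpi one pixel spans about 0.042 mm, so a difference in L of 0.17 mm corresponds to roughly four pixels; at 2,400 dpi one pixel spans about 0.011 mm.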

Software Implementation
SmartGrain has been written in Visual C++ in the Microsoft Visual Studio 2010 software creation tool and runs under Microsoft Windows XP, Vista, and 7. The Open Computer Vision Library (OpenCV; http://sourceforge.net/projects/opencvlibrary/) is used for image input and output and some imaging processes (median filtering, morphological operations, finding perimeters, and calculating AS and PL). SmartGrain is free for academic purposes (Supplemental Program S1 and Supplemental Data Set S1).

Plant Materials
To perform QTL analyses for seed shape, we tested 127 cv Koshihikari/Nipponbare BILs (Matsubara et al., 2008) bred with cv Koshihikari as the recurrent parent. We also used three CSSLs (Hori et al., 2010) with cv Nipponbare chromosome segments on chromosome 11 in a cv Koshihikari background to verify the allelic effects of the QTL detected.

Measurement of Rice Seeds
We measured at least 58 grains per BIL, and more than 200 grains per plant of parents and CSSLs (five plants each), and used mean values of each seed parameter in the QTL analyses. Thousand-grain weight was measured on a gauge.
A linkage map was constructed in MAPMAKER/EXP 3.0 software, and the Kosambi mapping function was used to calculate genetic distances (in centimorgans). The genotypes of residual heterozygous regions in the BILs were treated as missing data. QTL analyses were performed by composite interval mapping (Zeng, 1993, 1994), as implemented by the Zmapqtl module (model 6) provided by version 2.5 of the QTL Cartographer software (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm; Basten et al., 2005). Genome-wide threshold values (α = 0.05) were used to detect putative QTLs based on the results of 1,000 permutations (Churchill and Doerge, 1994).
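For readers unfamiliar with the Kosambi mapping function, it converts a recombination fraction r between two markers into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] in Morgans. The Python sketch below is a standalone illustration of that formula and its inverse; the function names are hypothetical and this is not code from MAPMAKER/EXP.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(d_cm / 50.0)
```

For small r the distance in centimorgans is close to 100r, and the correction grows as r approaches 0.5.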

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Log-likelihood (top) and additive-effect plots (bottom) across the 12 rice chromosomes from the QTL analyses for seed shape parameters of cv Koshihikari/Nipponbare BILs.
Supplemental Figure S2. L, as measured by caliper, of recurrent parent (cv Koshihikari) and three CSSLs (SL635, SL636, and SL637) with a cv Nipponbare segment of chromosome 11 in the cv Koshihikari background.
Supplemental Figure S3. Scanning of seeds on a desktop scanner.
Supplemental Data Set S1. Test data set for SmartGrain software program.