Research Article

Recombination and selection pressure in the ipomovirus sweet potato mild mottle virus (Potyviridae) in wild species and cultivated sweetpotato in the centre of evolution in East Africa

  • 1Department of Agricultural Sciences, PO Box 27, FIN-00014 University of Helsinki, Finland
  • 2Department of Botany, Faculty of Science, Makerere University, PO Box 7062, Kampala, Uganda
  • 3Department of Crop Science, Faculty of Agriculture, Makerere University, PO Box 7062, Kampala, Uganda
  • 4Institute of Biotechnology, PO Box 65 (Viikinkaari 1), FIN-00014 University of Helsinki, Finland
  • Correspondence
    Jari P. T. Valkonen
    jari.valkonen{at}helsinki.fi
  • Journal of General Virology 2010; 91(4):1092–1108 · https://doi.org/10.1099/vir.0.016089-0


    Abstract

    Sweet potato mild mottle virus (SPMMV) is the type member of the genus Ipomovirus (family Potyviridae). SPMMV occurs in cultivated sweetpotatoes (Ipomoea batatas Lam.; Convolvulaceae) in East Africa, but its natural wild hosts are unknown. In this study, SPMMV was detected in 283 (9.8 %) of the 2864 wild plants (family Convolvulaceae) sampled from different agro-ecological zones of Uganda. The infected plants belonged to 21 species that were previously not known to be natural hosts of SPMMV. The size of the SPMMV coat protein (CP) was determined by Western blot analysis, N-terminal protein sequencing and peptide mass fingerprinting. Data implicated a proteolytic cleavage site, VYVEPH/A, at the NIb/CP junction, resulting in a CP of approximately 35 kDa. Nearly complete sequences of 13 SPMMV isolates were characterized. Phylogenetic analysis of non-recombinant CP-encoding sequences placed five isolates from wild species sampled in the central zone of Uganda into a separate cluster. Recombination events were detected in the 5′- and 3′-proximal parts of the genome, providing novel evidence of recombination in the genus Ipomovirus. Thirteen amino acids in the N terminus of the P1 protein were under positive selection, whereas purifying selection was implicated for the HC-Pro-, P3-, 6K1- and CP-encoding regions. These data, supported by previous studies on ipomoviruses, provide indications of an evolutionary process in which the P1 proteinase responds to the needs of adaptation.

    • The GenBank/EMBL/DDBJ accession numbers for the SPMMV isolates characterized from wild plants and cultivated sweetpotato in this study are EF155969–EF155973 and FJ999758–FJ999765.

    • Supplementary material is available with the online version of this paper.

    INTRODUCTION

    The majority of viruses known to infect plants have an RNA genome and possess high evolutionary potential, due to fast replication and high genetic variability owing to the lack of proofreading activity of their RNA-dependent RNA polymerases (Malpica et al., 2002; Elena & Sanjuán, 2007). Consequently, evolutionary forces determine the extent of genetic variation and shape the genetic structures in virus populations. Understanding of these processes provides a framework for monitoring changes in virus populations and designing appropriate virus-control measures (García-Arenal et al., 2001; Elena et al., 2008).

    Sweet potato mild mottle virus (SPMMV) (Colinet et al., 1998) is the type member of the genus Ipomovirus in the family Potyviridae, the largest group of positive-sense, single-stranded RNA viruses infecting plants (Fauquet et al., 2005). The other members of the genus Ipomovirus are Cucumber vein yellowing virus (CVYV), Squash vein yellowing virus (SqVYV) and Cassava brown streak virus (CBSV) (Fauquet et al., 2005). Whilst the genome structure of SPMMV is similar to those of viruses of the genus Potyvirus and other monopartite members of the family Potyviridae (Colinet et al., 1998), it differs from the genomes of CVYV, SqVYV and CBSV, which do not encode the helper-component proteinase (HC-Pro) (Janssen et al., 2005; Li et al., 2008; Mbanzibwa et al., 2009a). HC-Pro is a suppressor of RNA silencing and considered to be an important determinant of virus–host interactions (Anandalakshmi et al., 1998; Brigneti et al., 1998; Kasschau & Carrington, 1998; Torres-Barceló et al., 2008; reviewed by Rajamäki et al., 2004). HC-Pro is also necessary for transmission of potyviruses by aphids (Govier & Kassanis, 1974; Blanc et al., 1998; Wang et al., 1998). The lack of HC-Pro in three ipomoviruses raises questions about the importance of this protein for SPMMV.

    Another difference between SPMMV and the other three characterized ipomoviruses is the P1 serine proteinase, located first at the polyprotein N terminus. In the genus Potyvirus, it acts as an auxiliary factor for RNA-silencing suppression, HC-Pro being the main suppressor (Rajamäki et al., 2005). In SPMMV, the silencing-suppression activities seem to have been taken over by P1; HC-Pro shows no such activities (Giner et al., 2008). SPMMV, with a large P1 protein, differs from CVYV and SqVYV, which have two smaller P1 proteinases (P1a and P1b), and from CBSV, which contains only P1b. The P1b proteins of CVYV and CBSV suppress RNA silencing (Valli et al., 2006, 2008; Mbanzibwa et al., 2009a). Hence, elucidation of the variability of and selection pressures on different parts of P1 may provide novel evidence of its roles in the infection cycle of SPMMV.

    SPMMV occurs in cultivated sweetpotatoes (Ipomoea batatas Lam.; Convolvulaceae) grown in countries surrounding the Lake Victoria basin of East Africa (Hollings et al., 1976; Mukasa et al., 2003a; Ateka et al., 2004; Tairo et al., 2004; Njeru et al., 2008). Detection of viruses in all major sweetpotato-growing regions of the world suggests that East Africa is the centre of origin and/or evolution of SPMMV (Loebenstein et al., 2003; Tairo et al., 2005; Valverde et al., 2007; Rännäli et al., 2009). In East Africa, SPMMV is the third most prevalent virus infecting sweetpotatoes, after sweet potato feathery mottle virus (SPFMV; genus Potyvirus, family Potyviridae) and sweet potato chlorotic stunt virus (SPCSV; genus Crinivirus, family Closteroviridae) (Tairo et al., 2005). The negative impact of SPMMV on sweetpotato production in East Africa is enhanced in sweetpotato plants also infected with SPCSV or with both SPCSV and SPFMV; such co-infection results in synergistic interactions and severe diseases (Mukasa et al., 2006). Although SPMMV was originally described as a whitefly-borne virus (Hollings et al., 1976), later studies have failed to confirm its whitefly transmissibility. In comparison to other sweetpotato-infecting viruses, SPMMV has an exceptionally wide experimental host range, including species in 14 plant families (Brunt et al., 1996). SPMMV infects Ipomoea purpurea, Ipomoea nil and Ipomoea rubrocaerulea following experimental inoculation (Hollings et al., 1976), but the natural hosts of SPMMV in the family Convolvulaceae are not known. Because over 80 wild Ipomoea species and several species of other genera of the family Convolvulaceae occur in East Africa (Verdcourt, 1963), which is also a major area for sweetpotato cultivation, the possibilities for detecting natural hosts of SPMMV seem particularly promising there.

    The aim of this study was to detect and analyse molecular signatures of selection pressure and recombination in the genome of SPMMV; to this end, the nearly complete genomes of 13 SPMMV isolates detected in wild species and cultivated sweetpotatoes were characterized.

    RESULTS

    SPMMV isolates from novel natural hosts

    SPMMV was detected in 283 (9.8 %) of the 2864 wild plants tested from Uganda. Infected plants belonged to Hewittia sublobata, Lepistemon owariensis and 19 Ipomoea species (Table 1). All are previously unknown natural hosts for SPMMV. In 10 species (I. acuminata, I. cairica, I. eriocarpa, I. involucrata, I. obscura, I. sinensis, I. tenuirostris, I. wrightii, H. sublobata and L. owariensis) of which over 70 plants each were tested, the overall incidence of SPMMV ranged from 0.7 % in I. eriocarpa to 18.5 % in L. owariensis (Table 1), as shown by a consistently positive reaction in repeated nitrocellulose membrane ELISA (NCM-ELISA). Scions of 25 wild plants that were seronegative for SPMMV were grafted onto 2-week-old plants of SPMMV-susceptible I. setosa Kerr. to increase the chance of detecting the virus (Tugume et al., 2008). The grafted I. setosa plants developed no virus symptoms and tested negative for SPMMV by NCM-ELISA 3 weeks after grafting, which indicated that serological detection of SPMMV in wild species was reliable. The majority of SPMMV-infected plants were detected in districts of the eastern zone (17 species, 214 plants), compared with the central and western zones, where only a few infected plants were detected; no wild plants were found to be SPMMV-infected in the northern zone (Table 1). SPMMV was also detected in 14 % (59 of 419) of cultivated sweetpotato plants that were sampled and tested (data not shown). The ecological aspects and implications of these data will be discussed elsewhere.

    Table 1.

    Numbers of wild plant species collected and tested for SPMMV from different agro-ecological zones of Uganda

    Taxonomic identification of plants was done according to keys described by Verdcourt (1963) as described previously (Tugume et al., 2008). Plant genera: I, Ipomoea; A, Astripomoea; H, Hewittia; L, Lepistemon. Number of plants of each species from a given district that were infected with SPMMV out of the total number tested for that species is shown. Names of districts are as follows. Central zone: LUW, Luwero; MKN, Mukono; MSK, Masaka; RKI, Rakai; PG, Mpigi. Northern zone: LIR, Lira; APC, Apac; GUL, Gulu; ARU, Arua. Eastern zone: KTK, Katakwi; SOR, Soroti; KUM, Kumi; MBL, Mbale; KAP, Kapchorwa; TOR, Tororo; KML, Kamuli; IGG, Iganga. Western zone: RUK, Rukungiri; KNG, Kanungu; KBL, Kabale; BUS, Bushenyi; MBR, Mbarara; KAS, Kasese; MAS, Masindi; HOM, Hoima. The totals at the end of rows indicate the number of plants infected with SPMMV out of the total number of plants tested for that species, whereas the totals at the bottom of columns indicate the number of plants infected with SPMMV out of the total number of plants tested from that district.

    Scions of 20 wild plants seropositive for SPMMV only and displaying leaf-mottling symptoms (Fig. 1b) or leaf deformation (Fig. 1c) under greenhouse conditions were used to graft-inoculate I. setosa, which developed similar vein-clearing and mottling symptoms (Fig. 1e) and tested positive for only SPMMV by NCM-ELISA 3 weeks after grafting. Eight SPMMV isolates from wild species sampled from the eastern and central agro-ecological zones were transmitted mechanically from I. setosa to Nicotiana benthamiana, Nicotiana rustica and Nicotiana tabacum, in which they induced similar leaf puckering and chlorosis in systemically infected leaves (Fig. 1g–i). Hence, the isolates tested did not differ in the symptoms they induced in indicator plants. Eight isolates from five wild species that could be multiplied from cuttings and maintained for experiments were included in molecular analyses (Table 2).

    Fig. 1.

    (a) Map of Uganda showing regions (districts) from where wild plants were tested for SPMMV in four agro-ecological zones. The northern zone is characterized by short grasslands, supports mostly annual crops and has a markedly different cropping system from the other agro-ecological zones, and sweetpotato is not continuously grown. In the tall grass–forest mosaic areas of the central zone and some parts of the western and eastern tall grassland zones, farmers may have many gardens planted in such a way that they get year-round harvest. (b–i) Virus-like symptoms observed under greenhouse conditions: (b) mottling symptoms observed in I. acuminata; (c) leaf deformation in I. cairica; (d) healthy leaf of I. cairica; (e, g, h, i) symptoms induced by an isolate of SPMMV from I. acuminata in different experimental host plants 21 days post-inoculation: (e) vein chlorosis in I. setosa [healthy leaf shown in (f)]; (g) leaf puckering and mottling in N. rustica; (h) severe leaf puckering, mottling and distortion in N. tabacum leaves; (i) chlorosis mostly of the leaf tips (the old part of the leaf) in N. benthamiana.

    Table 2.

    GenBank accession numbers for SPMMV isolates characterized from wild plants and cultivated sweetpotato in this study and those retrieved from GenBank previously characterized from cultivated sweetpotato

    All publicly available sequences of SPMMV were included. The SPMMV isolate from Kenya (GenBank accession no. NC_003797) is the only one whose genome has been fully sequenced.

    Size and variability of the SPMMV coat protein (CP)

    The size of the SPMMV CP has not previously been determined experimentally. Detection of SPMMV CP by Western blot analysis using antibodies raised to SPMMV virions revealed a single band that migrated similarly to the 35 kDa protein marker in all tested SPMMV isolates (Fig. 2). The CPs from purified virions of SPMMV isolates MBL86, KAP90, ARU60 and TOR17 (Table 2) were subjected to N-terminal sequencing, but no signals could be obtained from the eight cycles of analysis, which suggested that the N terminus of CP was blocked. Subsequently, the CP of isolate ARU60 was digested ‘in gel’ and the resulting peptides were subjected to peptide fingerprint analysis by matrix-assisted laser desorption/ionization–time of flight (MALDI-TOF) mass spectrometry. The obtained peptide mass fingerprint included all of the predicted tryptic peptide masses of analysable size from the deduced CP sequence of ARU60 (data not shown). The most N-terminal protonated mass detected (2677.364 Da) corresponded to the tryptic peptide TIEELQQEMEDLDADTTITVVQR (2676.280 Da), indicating that the N terminus of the mature CP in virions was upstream from this peptide and likely to be VYVEPH/A (Adams et al., 2005a).
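
    The correspondence between the observed protonated mass and the tryptic peptide can be checked by simple arithmetic. The following is a minimal sketch, not part of the original analysis, that sums standard monoisotopic residue masses for the peptide, adds one water for the termini and one proton for the singly charged ion, and compares the result with the reported value. The function name and the assumption of an unmodified, singly protonated peptide are illustrative.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N and C termini
PROTON = 1.00728   # mass of one proton, for the [M+H]+ ion


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide (assumption: no modifications)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


peptide = "TIEELQQEMEDLDADTTITVVQR"   # tryptic peptide reported in the text
neutral = monoisotopic_mass(peptide)    # about 2676.280 Da, as reported
protonated = neutral + PROTON           # about 2677.287 Da
observed = 2677.364                     # protonated mass reported in the text
ppm_error = (observed - protonated) / protonated * 1e6

print(f"neutral {neutral:.3f} Da, [M+H]+ {protonated:.3f} Da, error {ppm_error:.0f} ppm")
```

    Under these assumptions, the calculated neutral mass agrees with the 2676.280 Da given for the peptide, and the observed ion lies roughly 29 ppm above the calculated protonated mass.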

    Fig. 2.

    SDS-PAGE and Western blotting of total plant protein extracted from leaves of N. rustica inoculated with SPMMV. (a) GelCode blue staining of excess proteins in acrylamide gel for 1 h after overnight transfer by electroblotting. (b) Detection of SPMMV CP-specific polyclonal antibodies by ECL anti-rabbit IgG–horseradish peroxidase conjugate (GE Healthcare), using a SuperSignal West Femto detection kit (Thermo Scientific) (see Methods). PVY indicates virus particles of potato virus Y strain O purified from N. tabacum; SPMMV indicates virus particles of SPMMV isolate MBL86 purified from N. rustica. Protein size marker positions are indicated to the left of the figure (in kDa). Lanes: H, total protein from healthy N. rustica; 1–4, total protein from sap of N. rustica plants inoculated with isolates TOR17, ARU60, MBL86 and KAP90 of SPMMV, respectively.

    The SPMMV CP N terminus did not contain any DAG motif, implicated in aphid transmissibility of potyviruses (Atreya et al., 1995), but contained a DAD motif (amino acid residues 18–20) in all 28 isolates from Uganda, in contrast to the only characterized isolate of SPMMV (GenBank accession no. NC_003797) from Kenya (Colinet et al., 1998), which contains a DSD motif (Fig. 3a). The CPs of eight isolates characterized from wild species were >92.4 % identical to each other and >92.8 % identical at the nucleotide level to the 21 isolates characterized from sweetpotato in this study (five isolates) and previous studies (16 isolates) (see Supplementary Table S1, available in JGV Online). The length of the 3′-untranslated region (UTR) was 305 nt (one isolate), 307 nt (six isolates), 308 nt (11 isolates), 310 nt (two isolates), 311 nt (eight isolates) or 314 nt (one isolate). The identity of the 3′-UTR was 90.3–99.1 % among the isolates from wild species and 89.6–96.9 % between these isolates and those from sweetpotato; however, identity of the 3′-UTR between isolate NC_003797 and the isolates characterized in this study was only 89.3–90.9 % (see Supplementary Table S2, available in JGV Online).
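
    The identity percentages quoted above are pairwise comparisons of aligned sequences. As an illustration only, the sketch below computes percentage identity for two sequences taken from the same alignment. The treatment of gaps (columns with a gap in either sequence are skipped) is an assumption, because the gap handling used in the original comparisons is not stated in this section; the toy sequences are hypothetical and are not SPMMV data.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Assumption: alignment columns with a gap in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 11 ungapped columns, 10 of them identical.
toy_a = "ATGGCTGATGCT"
toy_b = "ATGGCAGATG-T"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f} % identity")
```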

    Fig. 3.

    (a) Alignment of the C-terminal 26 aa of the NIb protein and 74 aa of the N-terminal part of the CP in 29 isolates of SPMMV. The H/A NIb/CP proteolytic cleavage site determined experimentally in this study is shown by a double-ended vertical arrow. Amino acids P6–P3′ (Adams et al., 2005a) are boxed. The previously proposed proteolytic sites Q/R (Colinet et al., 1998) and E/P (Mukasa et al., 2003b) (broken arrows) are also shown. The most N-proximal tryptic peptide detected by peptide mass fingerprint analysis (in isolate ARU60, used as reference) is shown in bold and underlined. A ‘DAG-like’ motif (DAD), present in 28 isolates, is shaded grey. Names of eight SPMMV isolates characterized in this study from wild plants (in bold) and sweetpotato, as well as from previous studies from sweetpotato, are shown on the left. (b) Schematic presentation of the SPMMV genome and polyprotein structure (according to isolate NC_003797; 10 818 nt). The protein-encoding parts of the genomic regions characterized in this study (5868 nt of the 5′-proximal part and 1811–1820 nt at the 3′ end, depending on the isolate) are shaded and indicated by nucleotide/amino acid genomic positions. Note that P1 of NC_003797 (Colinet et al., 1998) is 15 aa shorter than other isolates, due to a 45 nt gap in the P1-encoding region. Names of mature proteins are shown above the polyprotein: P1, the first protein; HC-Pro, helper-component proteinase; P3, third protein; 6K1 and 6K2, 6 kDa proteins; CI, cylindrical inclusion protein; VPg, viral genome-linked protein; NIa–Pro, the main viral proteinase; NIb, replicase; CP, coat protein; 5′-UTR and 3′-UTR, untranslated regions. Numbers inside boxes indicate the size (kDa) of the mature proteins. (c) Nucleotide diversity (π) as calculated with a 100 nt sliding window moved by steps of 25 nt along an alignment of the nearly complete genome sequences of 13 isolates of SPMMV characterized in this study. The characterized genomic regions are as in (b). The nucleotide positions in the ‘concat’ sequence alignment are shown on the x-axis. The gap in the viral genome sequence indicates the uncharacterized part and is marked by a vertical broken line. (d) Average nucleotide diversity indices for the different genomic regions and mean numbers of nucleotide substitutions per non-synonymous (πa) and synonymous (πs) site. Diversity indices for the CP and 3′-UTR are based on a total of 29 isolates, including 13 characterized in this study and 16 from previous studies.

    Deduced amino acid sequences of mature proteins and their functional domains at the N-proximal part of the SPMMV polyprotein

    The proteolytic cleavage sites in the polyprotein of SPMMV were those described previously (Adams et al., 2005a) in all isolates (Fig. 3b), with the exception of isolate ARU60, in which the site at the P1/HC-Pro junction was MQFY/A instead of IQFY/A. The predicted P1 proteins were of identical size (758 aa) in all isolates except for NC_003797, which contained a gap of 15 aa due to a 45 nt deletion in the P1-encoding region. The amino acid identities of P1 proteins were only 82.1–90.9 % among the 14 isolates compared (Supplementary Table S2). However, the conserved catalytic triad (His655-Asp667-Ser704), with Ser704 in the context GWSG (Verchot et al., 1992), was detected at the active site of the P1 proteinase in all isolates (His640-Asp652-Ser689 in NC_003797). In all isolates, P1 also contained a putative zinc-finger motif and an LxKA motif (Valli et al., 2007), required for RNA silencing-suppression activities of the P1b protein of CVYV (Valli et al., 2008), and putative WG/GW motifs (positions 15–16, 100–101 and 130–131), implicated in binding Argonaute1, an RNase involved in RNA silencing (El-Shami et al., 2007).

    Identities of the HC-Pro amino acid sequences were 90.5–97.8 %, those for the P3 amino acid sequences were 94.2–100 % and those for the 6K1 amino acid sequences were 92.2–100 % among the 14 isolates (Supplementary Table S2). In all isolates, HC-Pro contained the highly conserved cysteine protease motif GxCY (Oh & Carrington, 1989) and motif PTK, involved in aphid transmission of potyviruses (Peng et al., 1998). The KITC, KLSC or RITC motif required for retention of potyviral virions in aphid stylets (Blanc et al., 1998) was replaced by KTCC in 10 SPMMV isolates, by KACC in isolate KAP88 and by RTCC in isolates R58LUW and KINT2. Five isolates (R58LUW, R125MPG, MBL86, KAP90 and 57KAP) from wild plants contained a unique amino acid substitution, Val208Ala, in the C terminus of P3 (data not shown).

    Nucleotide diversity and selection pressure

    The average nucleotide diversity (π) of the studied genomic regions in 14 isolates of SPMMV was relatively high (10.7 %) (Fig. 3c). Peaks of diversity >10.7 % were apparent in the regions encoding P1, HC-Pro, P3 and CP (Fig. 3c). The highest diversity (up to 28 %) was observed in the N-proximal part of the P1-encoding region (π=14.1 %; Fig. 3c). Non-synonymous diversity was 2–18-fold lower than synonymous diversity and ranged from 1.7 to 9.2 % (Fig. 3d).
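
    The sliding-window profile of π referred to above (a 100 nt window moved in steps of 25 nt, according to the legend of Fig. 3c) can be sketched as follows. This is a simplified illustration, not the procedure used in the study: π is taken here as the mean uncorrected proportion of differing sites over all pairs of sequences in each window, whereas the original analysis may have applied a substitution-model correction. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs, gap="-"):
    """Mean uncorrected proportion of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        # Assumption: columns with a gap in either sequence are ignored.
        sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
        if sites:
            total += sum(x != y for x, y in sites) / len(sites)
    return total / len(pairs)


def sliding_pi(seqs, window=100, step=25):
    """Return (1-based window start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    profile = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        profile.append((start + 1, nucleotide_diversity(chunk)))
    return profile


# Hypothetical toy alignment, scanned with a 4 nt window and 2 nt step.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAATAAT"]
profile = sliding_pi(toy, window=4, step=2)
for start, pi in profile:
    print(start, round(pi, 3))
```

    With real data, the defaults (window=100, step=25) reproduce the window geometry described in the figure legend; diversity rises in windows where the sequences diverge, as in the N-proximal part of the P1-encoding region.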

    Model M0, used to assess selection pressures [maximum-likelihood (ML) framework of codon substitution], yielded the value ω≤0.192 over all codon sites in the five proteins at the 5′-proximal part of the SPMMV polyprotein and in the CP, indicating strong purifying selection (Table 3). Heterogeneity of selective pressure, tested by using an M3 vs M0 likelihood-ratio test (LRT), revealed that M3 fitted the data significantly better than M0 (with the exception of 6K1) (Table 3). M3 for P1 suggested that a large set of sites (58.4 %) were evolving under strong purifying selection (ω=0.002), fewer sites (38.8 %) under weak purifying selection (ω=0.453) and only 2.7 % of sites under positive selection (ω=2.491) (Table 3). In HC-Pro, P3 and CP, the majority of sites were under strong purifying selection (Table 3).

    Table 3.

    Selection pressures exerted on the P1, HC-Pro, P3, 6K1 and CP proteins of SPMMV

    Parameter estimates, log-likelihood (lnL) values, ω ratio (dN/dS) and LRT statistics under different ML models of codon substitution were used to investigate selection pressures. Models are according to Yang et al. (2000) (M0, M3, M7, M8), Wong et al. (2004) and Yang et al. (2005) (M1a, M2a). Model M0, one ratio; M1a, nearly neutral; M2a, positive selection; M3, discrete; M7, β; M8, β plus ω. The numbers of parameters estimated for the different models were one (M0), two (M1a), four (M2a), five (M3), two (M7) or four (M8). LRT statistics indicate M3 vs M0 tests of heterogeneity of selection pressures among codon sites, whilst M2a vs M1a and M8 vs M7 are tests of positive selection, all of which assess LRT statistic (2δlnL) against a chi-squared distribution with the degrees of freedom equal to the difference in the number of parameters between the nested models under comparison. Positively selected amino acid sites are those with posterior probabilities (P>95.0 %) of being under positive selection. Identification of amino acids potentially under positive selection is based on either neb inference (under M3) or beb inference under M2a or M8.

    LRTs comparing the log-likelihoods of nested models M2a and M1a showed that M2a was not a better fit than M1a for any of the protein-encoding regions analysed (Table 3), consistent with purifying selective constraints on most of the amino acid sites. Comparison of M8 with M7 showed that M8 fitted the P1 data better than M7 and that 13 sites (1.6 %) in P1 were under positive selection (ω=2.871) (Table 3). These 13 sites of P1, identified by using the Bayes empirical Bayes (beb) inference (Yang et al., 2005), were 203F, 229D, 237F, 238L, 243K, 248P, 255L, 256E, 278L, 288D, 305Q, 327P and 376S. The naïve empirical Bayes (neb) inference (Yang et al., 2000) under M8 provided similar results for 11 sites, with the exceptions being 243K and 288D (Table 3). Under M3, neb predicted the same amino acid residues as under M8 and, moreover, site 361S, to be under positive selection. Four amino acids (248P, 256E, 278L and 305Q) were predicted with posterior probabilities of >95 % to be under positive selection (Table 3). Although some sites in HC-Pro, P3 and CP were predicted to be under positive selection under M8 and/or M2a (Table 3), the LRT statistics for these proteins were not significant (Table 3).
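
    The LRTs described above compare twice the difference in log-likelihood between nested models with a chi-squared distribution whose degrees of freedom equal the difference in the number of estimated parameters (four for M3 vs M0, two for M2a vs M1a and for M8 vs M7, using the parameter counts given in the legend of Table 3). A minimal sketch of that calculation is given below. The log-likelihood values are hypothetical placeholders, not values from Table 3, and the closed-form tail probability is valid only for even degrees of freedom, which covers the three comparisons used here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, params_null, params_alt):
    """Return (2*delta lnL, degrees of freedom, P-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = params_alt - params_null
    return stat, df, chi2_sf_even_df(max(stat, 0.0), df)


# Hypothetical log-likelihoods for an M7 (2 parameters) vs M8 (4 parameters) test.
stat, df, p_value = likelihood_ratio_test(-5010.0, -5000.0, 2, 4)
print(f"2*delta lnL = {stat:.1f}, df = {df}, P = {p_value:.2e}")
```

    In this placeholder example the richer model would be preferred; a non-significant P-value, as reported for HC-Pro, P3 and CP, would instead mean that allowing a class of positively selected sites does not improve the fit.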

    Recombination in SPMMV

    Evidence of phylogenetic conflicts, as depicted by reticulate networks, was detected in the SPMMV genome by Neighbor-Net analysis of the 5′-proximal half of the genome (5868 nt) in 14 isolates [Fig. 4a; P<0.00001, pairwise homoplasy (PHI) test], and in the 3′ end of the genome (1811–1820 nt) of 29 isolates (Fig. 4b; P=1.77×10⁻¹², PHI test), indicating the presence of recombination. For instance, isolates R58LUW and MBL86 (from I. acuminata), KAP88 (from I. obscura) and TOR17 (from sweetpotato) displayed extensive reticulate relationships with other isolates (Fig. 4a). No reticulate relationships were unique to isolates from a certain wild host plant species or from sweetpotato.

    Fig. 4.

    (a, b) Neighbor-Net trees based on (a) 14 nucleotide sequences corresponding to approximately 5.8 kb of the 5′ genomic region (5′-UTR, P1, HC-Pro, P3, 6K1 and partial CI) or (b) 29 nucleotide sequences corresponding to approximately 1.8 kb of the 3′ genomic region (partial NIb, CP and 3′-UTR) of SPMMV. Networked relationships among several virus isolates with box-like structures instead of a bifurcating evolutionary tree indicate the presence of recombination. (c) Phylogenetic analysis of non-recombinant CP-encoding nucleotide sequences of 17 SPMMV isolates. Names of isolates characterized from wild plants are indicated in bold. CVYV (genus Ipomovirus; GenBank accession no. NC_006941) was used as a root. Numbers at branches represent bootstrap values (percentages of 1000 replicates). Only bootstrap values ≥80 % are shown. Bars, Kimura units in nucleotide substitutions per site (Kimura, 1980).

    Isolates MBL86, R58LUW (from I. acuminata), R125MPG (from I. cairica), RUK93 and TOR17 (from sweetpotato) were the most complex recombinants, predicted to contain two recombination events within the 5′-proximal part of the genome (Table 4). Breakpoints were also detected in the CP-encoding region of 12 isolates (Table 4). They included KAP90 (from I. spathulata), MBL86 (from I. acuminata), Bkb3, BUS, KAM, Kam2, KUM, Kum2, RUK2, RUK93, Tar3 and TOR (from sweetpotato) (Table 4), of which Bkb3 and Tar3 were predicted to contain two recombination events (Table 4). Isolates MBL86 and RUK93 contained recombination events at both the 3′ and 5′ parts of the genome (Table 4). No breakpoints were unique for a host plant species. Isolates BUSH1 and ARU60 from sweetpotato were predicted to be the major and minor ‘parent-like isolates’, respectively, for 50 % (nine of 18) of the recombinant isolates (Table 4), suggesting a common origin/parenthood of a large proportion of the recombinant SPMMV isolates in Uganda.

    Table 4.

    Predicted recombination breakpoints in isolates of SPMMV and parent-like isolates of SPMMV, as identified by different recombination-detection methods

    The 5.8 kb data (5868 nt) encompass the 5′-proximal genomic region of SPMMV, including the 5′-UTR, P1, HC-Pro, P3, 6K1 and 1056 nt of the CI-encoding region. The 1.8 kb region (1811–1820 nt) encompasses the 3′ end of the genome, including 624 nt of the NIb-encoding region, the CP region and the 3′-UTR. In KAP88, one of the recombination points could not be pinpointed precisely, indicated by (?). ‘Parent-like’ isolates are isolates in which sequences closely resemble the exchanged sequence tracts in the recombinants, but are not necessarily the actual parents. The methods used to infer recombination breakpoints were: R, rdp; G, geneconv; B, bootscan; M, maximum chi square; C, chimaera; S, siscan. The methods whose P-values are shown are indicated in bold.

    The 17 CP sequences containing no detectable recombination breakpoints and showing no evidence of substitution saturation (see Supplementary Fig. S1, available in JGV Online) were subjected to phylogenetic analysis. Five isolates (Fig. 4c) characterized from wild plants (Table 2) were placed into an independent subcluster, whereas one isolate (KAP88) obtained from a wild plant clustered with isolates originating from cultivated sweetpotato (Fig. 4c).

    DISCUSSION

    The nearly complete genome sequence (7679–7688 nt) was characterized from 13 SPMMV isolates. The whole genome sequence (10 818 nt) is available from a single SPMMV isolate (Colinet et al., 1998) and only a 1.8 kb-long part of the 3′ end has been characterized from the other 15 isolates studied previously (Mukasa et al., 2003b; Tairo et al., 2005). Hence, the data of the present study allowed analysis of evolutionary processes in the 5′ genomic region of SPMMV as the first example in the genus Ipomovirus. In comparison to a number of viruses in the family Potyviridae studied previously (reviewed by García-Arenal et al., 2001), SPMMV exhibited relatively high nucleotide diversity in the P1 region (Adams et al., 2005b). Data of previous studies suggest that P1 may be crucial for host adaptation in potyviruses (Shi et al., 2007; Salvador et al., 2008a). P1 proteins of potyviruses such as turnip mosaic virus and potato virus Y (PVY) have the largest ω values among the mature viral proteins (at the population level), but no P1 amino acids have been reported to be under positive selection (Tomitaka & Ohshima, 2006; Ogawa et al., 2008). Therefore, the finding of 13 amino acids that were detected to be under positive selection in the P1 N terminus of SPMMV was significant. Besides being a proteinase, the P1 of SPMMV is also a suppressor of RNA silencing (Giner et al., 2008). It might interfere with the loaded RNA-induced silencing complexes (Giner et al., 2008); this hypothesis is supported by three reiterated WG/GW motifs in the N-terminal half of P1. Therefore, P1 might suppress RNA silencing via a mechanism involving binding of Argonaute1, similarly to the P0 protein of poleroviruses (Bortolamiol et al., 2007). Reiterated WG/GW motifs form evolutionarily and functionally conserved Argonaute-binding platforms in RNA silencing-related components (El-Shami et al., 2007; Lian et al., 2009). Taken together, the positively selected amino acids in SPMMV P1 may constitute ligand-binding domains that mediate interactions with host factors and drive evolution in SPMMV.

    On the other hand, all amino acids in HC-Pro, P3, 6K1 and CP were under purifying selection. This was unexpected, especially for HC-Pro, which in potyviruses is involved in multiple functions, including viral movement and RNA-silencing suppression (Rajamäki et al., 2004), of which some show host specificity (Sáenz et al., 2002; Salvador et al., 2008b). Hence, substitutions in HC-Pro could be needed for better fitness in the range of host species (Torres-Barceló et al., 2008). However, recent studies indicate that SPMMV HC-Pro does not suppress RNA silencing (Giner et al., 2008), suggesting a lower importance in virus–host interactions than is known for HC-Pro in potyviruses. Indeed, no HC-Pro is encoded by three other ipomoviruses, CVYV, SqVYV and CBSV, and P1b and P1 act as silencing suppressors in CVYV and CBSV, respectively (Janssen et al., 2005; Li et al., 2008; Mbanzibwa et al., 2009a). On the other hand, if SPMMV HC-Pro mediates vector transmission, purifying selection on this protein would seem meaningful. P3 contains virulence determinants in plum pox virus (genus Potyvirus) (Salvador et al., 2008b) and positive selection has been documented on a few amino acids (Glasa et al., 2002), which was not observed with the P3 of SPMMV. A few amino acids in the multifunctional CP are under positive selection in the potyviruses PVY, bean yellow mosaic virus and yam mosaic virus (Moury et al., 2002), but not in the SPMMV CP. Hence, it seems that similar HC-Pro, P3, 6K1 and CP proteins are able to provide the needed putative virus–host interactions during the infection cycle of SPMMV in a large number of host species (Hollings et al., 1976; this study).

    Neighbor-Net analysis supported by strong statistical evidence indicated recombination in SPMMV; recombination has been little studied in the genus Ipomovirus. Recombination was frequent at the 5′ end of the genome, especially in the P1- and HC-Pro-encoding regions, as found previously with potyviruses (Ohshima et al., 2007; Ogawa et al., 2008). Recombination may have contributed to the evolution of P1 proteins and helped in adaptation of members of the family Potyviridae to a wide range of host species (Valli et al., 2007). Many SPMMV recombinants were found in natural virus populations, which implies a measure of selective fitness. Isolates in which recombination does not conform to acceptable regions within the ‘evolutionary space’ are removed from the population (Martin et al., 2005b; Lefeuvre et al., 2007). Among the 17 isolates in which no recombination was detected in the CP-encoding region and that were subjected to phylogenetic analysis, five of the six isolates from wild plants formed a separate cluster, suggesting a possibility of host-mediated selection during the evolutionary process. However, the clustering also correlated with the geographical origin of the isolates. The high frequency of recombination in the CP sequence hampered these analyses by limiting the number of isolates that could be included.

    Experimental evidence of the size of CP and the NIb/CP junction in the polyprotein of SPMMV was not available, but was needed to define the CP-encoding sequence to be subjected to analysis of variability and evolution. Western blot analysis of the CP in virions indicated a molecular mass of approximately 35 kDa. Peptide mass fingerprinting by MALDI-TOF mass spectrometry indicated that the NIb/CP cleavage site could be VYVE/P or VYVEPH/A (Mukasa et al., 2003b; Adams et al., 2005a), but apparently not VVQ/R, proposed initially (Colinet et al., 1998) because the most N-terminal tryptic peptide obtained (TIEELQQEMEDLDADTTITVVQR) was downstream of VYVEP and VYVEPHA and contained VVQR. Lack of signal in N-terminal sequencing suggested that the CP N terminus of SPMMV is blocked, similar to the capsid proteins of many viruses (Driessen et al., 1985). Presuming that blocking was caused by acetylation, to which alanine is more prone than proline (Driessen et al., 1985), these data support VYVEPH/A rather than VYVE/P as the NIb/CP cleavage site, resulting in a CP of 302 aa.
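
    The argument against VVQ/R can be illustrated with a minimal Python sketch. It assumes the usual trypsin rule (cleavage C-terminal to K or R, but not before P); the function names are hypothetical and this is not the software used in the study. The only sequence used is the tryptic peptide reported above.

```python
# Illustrative sketch only; function names are hypothetical.

def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def site_inside_peptide(peptide, upstream, downstream):
    """True if the junction upstream/downstream lies within the peptide,
    i.e. the peptide spans the proposed cleavage site."""
    return (upstream + downstream) in peptide


# Most N-terminal tryptic peptide observed for the SPMMV CP (from the text).
OBSERVED = "TIEELQQEMEDLDADTTITVVQR"

# Its only K/R is the C-terminal R, so trypsin leaves the peptide intact.
print(tryptic_digest(OBSERVED) == [OBSERVED])

# The proposed VVQ/R junction lies inside the observed peptide, so residues
# upstream of R belong to the CP and VVQ/R cannot be the NIb/CP junction.
print(site_inside_peptide(OBSERVED, "VVQ", "R"))
```

    The check only shows that the observed peptide spans the VVQ/R site; distinguishing VYVE/P from VYVEPH/A still relies on the N-terminal blocking argument given above.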

    The aphid-transmissibility motifs typical of potyviruses (Atreya et al., 1995; López-Moya et al., 1999) were not found in the CP N terminus of SPMMV, and transmission of SPMMV by aphids has not been observed (Hollings et al., 1976). However, the HC-Pro of SPMMV contained the highly conserved PTK motif involved in binding HC-Pro to the CP in virions, which might bridge the virion to the stylet during aphid transmission of potyviruses (Blanc et al., 1998; Peng et al., 1998). The KITC, KLSC or RITC motifs are also critical for potyvirus retention in the aphid stylets (Blanc et al., 1998) and a KACC, KTCC or RTCC motif was found at the corresponding position in SPMMV HC-Pro. Previously, a highly aphid-transmissible potyvirus, peanut mottle virus, was found to possess a KVSC motif instead of KITC in the HC-Pro and a unique DAA motif instead of DAG in the CP N terminus (Flasinski & Cassidy, 1998). Hence, identification of vectors for SPMMV will provide an interesting subject for study. One scenario could be that the HC-Pro of a co-infecting potyvirus facilitates aphid transmission of SPMMV from the host (Kassanis & Govier, 1971; Sako & Ogata, 1981; Wang et al., 1998). In East Africa, co-infections of SPMMV with the potyvirus SPFMV are up to 3-fold more common than infections with SPMMV alone in cultivated sweetpotatoes (Mukasa et al., 2003a; Ateka et al., 2004; Tairo et al., 2004; Njeru et al., 2008), which seems to support the hypothesized opportunistic aphid transmission of SPMMV with the help of the SPFMV HC-Pro.

    SPMMV is the third most prevalent virus infecting cultivated sweetpotato in East Africa (Mukasa et al., 2003a; Tairo et al., 2004; Ateka et al., 2004; Njeru et al., 2008), but the wild host plants of SPMMV in the field were hitherto unknown. Hence, identification of 21 wild plant species of the family Convolvulaceae as natural hosts of SPMMV has significantly extended the future possibilities to understand the epidemiology and ecology of SPMMV. The wide host range (Hollings et al., 1976) and geographical restriction largely to East Africa (Tairo et al., 2005) suggest that SPMMV is not a ‘sweetpotato virus’, but rather that it existed in East African wild plants and invaded sweetpotato when introduced from Latin America only approximately 300 years ago (Zhang et al., 2004). CBSV is another ipomovirus that seems to have originated in East Africa (Mbanzibwa et al., 2009b) and a strain of SPFMV also seems to have evolved there (Tairo et al., 2005; Tugume et al., 2008; Rännäli et al., 2009). These findings indicate that the native flora of East Africa offers interesting possibilities to explore the evolution of members of the family Potyviridae.

    Taken together, evolutionary diversification of P1 proteins and variability of genome structures in terms of the presence or absence of HC-Pro in the family Potyviridae have become more apparent in recent studies on ipomoviruses (Valli et al., 2007; Mbanzibwa et al., 2009a). However, the evolutionary forces behind this variability have remained largely unknown. This study has provided novel evidence of the evolutionary processes shaping one lineage of viruses belonging to the family Potyviridae by revealing positive selection of sites in the P1 protein and purifying selection on HC-Pro in SPMMV, the only ipomovirus known to contain HC-Pro. These data, together with previous studies, suggest that ipomoviruses (Valli et al., 2007; Mbanzibwa et al., 2009a) and possibly tritimoviruses (Stenger et al., 2005, 2007) represent evolutionary lineages in which the P1 proteinases respond more significantly to the needs of adaptation than does HC-Pro.

    METHODS

    Virus isolates and detection.

    Wild plants belonging to the genera Astripomoea (106 plants), Hewittia (687 plants), Ipomoea (1974 plants) and Lepistemon (97 plants) of the family Convolvulaceae (Table 1) were collected from 25 districts of Uganda (Fig. 1a) as described in Supplementary Methods (available in JGV Online). In addition, 419 sweetpotato plants were sampled from gardens in whose vicinity the wild plants were collected. Testing for SPMMV was done by NCM-ELISA using polyclonal anti-CP antibodies provided by the International Potato Center (CIP), Lima, Peru, as described previously (Gibb & Padovan, 1993; Tugume et al., 2008). Cuttings were planted in an insect-proof screenhouse at Makerere University Agricultural Research Institute, Kabanyolo, Uganda, to allow further observations of virus-like symptoms on new growth and repeated testing for SPMMV.

    SPMMV isolates from six species sampled in eight districts of the central and eastern zones were graft-transmitted from wild plants to healthy plants of sweetpotato ‘Tanzania’, and cuttings of these plants were transported to the University of Helsinki for molecular characterization. In addition, eight SPMMV isolates were inoculated mechanically to carborundum-dusted leaves of N. benthamiana, N. tabacum and N. rustica to observe symptoms (Fig. 1g–i). Plants were maintained in an insect-proof greenhouse (25–30 °C; relative humidity, 70 %) under natural daylight extended to 16 h by illumination with high-pressure sodium halide lamps (150–200 μmol s⁻¹ m⁻²).

    Analysis of the SPMMV CP.

    Particles of SPMMV were purified and total proteins were extracted from systemically infected leaves of N. rustica as described previously (Fribourg & Nakashima, 1984; Wang et al., 2006) and analysed by SDS-PAGE (Sambrook & Russell, 2001). The proteins were transferred onto a Hybond-P PVDF membrane (Amersham) by electroblotting and probed with the anti-SPMMV CP antibodies described above. Signals were developed by using the Enhanced Chemiluminescence (ECL) system (GE Healthcare) and detected by using a SuperSignal West Femto kit (Thermo Scientific).

    N-terminal sequencing of the SPMMV CP was carried out on purified virions by using a Procise 494A Sequencer (Perkin Elmer). To obtain a peptide mass fingerprint, the CP in the polyacrylamide gel was stained with Coomassie brilliant blue as described by Matsudaira (1987) and protein bands of interest were cut out and digested ‘in gel’ as described by Shevchenko et al. (1996). Proteins were reduced with dithiothreitol and alkylated with iodoacetamide before digestion with trypsin (Sequencing Grade Modified Trypsin, V5111; Promega). The recovered peptides were, after desalting, subjected to peptide mass fingerprinting by MALDI-TOF mass spectrometry using an Ultraflex TOF/TOF instrument (Bruker Daltonik).
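
    As an illustration of how an observed mass is matched to a candidate peptide in peptide mass fingerprinting, the following Python sketch computes a theoretical monoisotopic mass. The residue masses are standard values; the function names are hypothetical, and the optional carbamidomethyl shift on cysteine is an assumption about how the iodoacetamide alkylation would be accounted for, not a description of the software used here.

```python
# Illustrative sketch only; not the software used with the MALDI-TOF data.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # one H2O per peptide (free N and C termini)
PROTON = 1.00728            # added for a singly protonated ion
CARBAMIDOMETHYL = 57.02146  # mass shift on Cys after iodoacetamide


def monoisotopic_mass(peptide, alkylated_cys=True):
    """Neutral monoisotopic mass of an unmodified peptide (Da)."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass


def mh_plus(peptide, alkylated_cys=True):
    """m/z of the singly protonated ion, as typically observed in MALDI."""
    return monoisotopic_mass(peptide, alkylated_cys) + PROTON


# Tryptic peptide of the SPMMV CP discussed in the text.
print(round(mh_plus("TIEELQQEMEDLDADTTITVVQR"), 3))
```

    A measured peak is then assigned to a peptide when it agrees with the theoretical value within the mass tolerance of the instrument.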

    Molecular characterization of SPMMV sequences.

    Total RNA was extracted from leaves by using TRIzol reagent (Invitrogen). First-strand cDNA was synthesized from 3 μg total RNA by using an oligo-dT25 primer and Moloney murine leukemia virus reverse transcriptase (RT) (Finnzymes Oy) according to the manufacturer's instructions. The 3′-proximal part of the SPMMV genome (approx. 1.8 kb) was amplified by PCR using a degenerate forward primer (PVD-2) complementary to the NIb-encoding sequence (Gibbs & Mackenzie, 1997), and the reverse primer 10818R (5′-GGCTTTTGGATAGGCGACAA-3′) complementary to the 3′ end of the 3′-UTR of SPMMV.

    The 5′ end of the SPMMV genome was captured by using 5′ RACE (rapid amplification of cDNA ends) as described by Scotto-Lavino et al. (2006) with minor modifications, which included the use of SuperScript III RT enzyme (400 U) (Invitrogen), overnight incubation of the RT reaction at 50 °C and use of overlapping virus-specific reverse primers and the high-fidelity Phusion DNA polymerase (Finnzymes). The SPMMV-specific reverse primer RAE1 (5′-CCTCCCTGCACGCCCGAATCTTT-3′) was used for first-strand cDNA synthesis. Primers RAE2 (5′-TGCACGCCCGAATCTTTGAATTC-3′) and RAE3 (5′-CCCGAATCTTTGAATTCTTTCTGC-3′) were used in combination with primers QT and QO or QI for the first and second round of PCR amplification, respectively (Scotto-Lavino et al., 2006). The SPMMV-specific primers were complementary to the sequence encoding an RNA-binding and ATPase-activity motif, RAERIQRFGRAGR, in the cylindrical inclusion (CI) protein of SPMMV (GenBank accession no. NC_003797). PCR products were resolved by agarose gel (1.0 %) electrophoresis, stained with ethidium bromide and visualized under UV light.
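
    The relationship between the three reverse primers and the CI motif can be checked with a short Python sketch: each primer is reverse-complemented to give the genomic (sense) sequence, which is translated in all three frames with the standard genetic code and compared with the motif. The function names are illustrative only; the primer and motif sequences are those given above.

```python
# Illustrative sketch only; function names are hypothetical.

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

PRIMERS = {
    "RAE1": "CCTCCCTGCACGCCCGAATCTTT",
    "RAE2": "TGCACGCCCGAATCTTTGAATTC",
    "RAE3": "CCCGAATCTTTGAATTCTTTCTGC",
}
MOTIF = "RAERIQRFGRAGR"


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate complete codons of seq starting at offset frame (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


def motif_frames(primer, motif=MOTIF):
    """Translations of the primer target (sense strand) found within the motif."""
    target = reverse_complement(primer)
    return [pep for pep in (translate(target, f) for f in range(3))
            if pep in motif]


for name, primer in PRIMERS.items():
    print(name, reverse_complement(primer), motif_frames(primer))
```

    Each primer yields exactly one reading frame whose translation lies within RAERIQRFGRAGR, consistent with the nested, overlapping design described above.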

    PCR products were purified by using exonuclease I and calf intestine alkaline phosphatase (Fermentas) from two independent PCRs and sequenced directly in both directions with a BigDye Terminator kit version 3.1 on an ABI automatic 3130XL Genetic Analyzer (Applied Biosystems). Alternatively, the PCR products were excised and purified from the gel, cloned in Escherichia coli (DH5α; Invitrogen) and at least two clones were sequenced.

    Multiple sequence alignments and analysis of phylogenetic signal.

    Nucleotide sequences were aligned by using clustal_x version 1.83 (Thompson et al., 1997), examined visually and translated into amino acid sequences by using the emboss web translation tool. Percentage nucleotide and amino acid identities between sequences were computed by using the clustal w procedure (Thompson et al., 1994) as implemented in the megalign program of the dnastar software package (dnastar Inc.).

    Substitution saturation was tested (Xia et al., 2003) as implemented in dambe version 5.0.0.23. Results were displayed by plotting of transition and transversion rates against divergence based on the Kimura two-parameter nucleotide-substitution model of pairwise comparisons between sequences (Kimura, 1980).
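
    For reference, the Kimura two-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of sites showing transitional and transversional differences, respectively. The Python sketch below computes the quantities plotted in such a saturation analysis for one sequence pair; it is a minimal illustration with hypothetical function names, not the dambe implementation, and it simply skips sites containing gaps or ambiguity codes.

```python
import math

# Illustrative sketch only; not the dambe implementation.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions, compared sites) for two aligned
    sequences; sites with anything other than A, C, G or T are skipped."""
    ts = tv = n = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            ts += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv, n


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences."""
    ts, tv, n = count_ts_tv(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (not SPMMV data): one transition in ten sites.
print(count_ts_tv("AAAAAAAAAA", "GAAAAAAAAA"), k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

    Plotting the transition and transversion counts of every sequence pair against the corresponding distance gives the kind of saturation plot described above; transitions levelling off or falling below transversions at high divergence would indicate saturation.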

    Phylogenetic analysis of recombination and analysis of recombination breakpoints.

    Phylogenetic evidence for recombination was detected by using the Neighbor-Net method (Bryant & Moulton, 2004) in splitstree4 version 4.10. The Kimura two-parameter nucleotide-substitution model (Kimura, 1980) was used, branch support was estimated by bootstrapping with 1000 replicates and the presence of recombination was verified statistically by using the PHI test (Bruen et al., 2006).

    Parent-like sequences and approximations of recombination breakpoints were identified by using the methods rdp, geneconv, bootscan, maximum chi square, chimaera and siscan (rdp3 package version 3.28; Martin et al., 2005a). Analyses were carried out using default settings and the Bonferroni correction P-value cut-off of 0.05. Only breakpoints deduced by more than one method were considered further (Posada, 2002).
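
    The rule of retaining only events supported by more than one method can be sketched as follows. The data layout (tuples of event identifier, method name and corrected P-value) and the function name are hypothetical and do not reflect the rdp3 output format; the P-values in the example are invented for illustration and are assumed to be already Bonferroni-corrected.

```python
from collections import defaultdict

# Illustrative sketch only; the input layout is hypothetical, not rdp3 output.


def consensus_events(detections, alpha=0.05, min_methods=2):
    """Keep events detected at corrected P < alpha by at least min_methods
    different methods.

    detections: iterable of (event_id, method_name, corrected_p_value).
    Returns a dict mapping event_id to the sorted list of supporting methods.
    """
    support = defaultdict(set)
    for event_id, method, p_value in detections:
        if p_value < alpha:
            support[event_id].add(method)
    return {event: sorted(methods)
            for event, methods in support.items()
            if len(methods) >= min_methods}


# Invented example values, for illustration only.
example = [
    ("event_1", "rdp", 1e-6),
    ("event_1", "geneconv", 3e-4),
    ("event_1", "siscan", 0.20),      # not significant, ignored
    ("event_2", "bootscan", 0.01),    # only one method, discarded
]
print(consensus_events(example))
```

    Requiring agreement between methods trades some sensitivity for a lower risk of false-positive breakpoints.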

    Analysis of nucleotide diversity and phylogenetic relationships.

    The 5′- and 3′-proximal sequences of SPMMV isolates were concatenated and aligned with the corresponding sequence of the only SPMMV isolate whose complete sequence has been determined (GenBank accession no. NC_003797; Colinet et al., 1998). Nucleotide diversity (π) was calculated by using a 100 nt sliding window with 25 nt steps. The average number of nucleotide substitutions per non-synonymous (πa) and synonymous (πs) site was calculated for each protein-encoding region. Diversity indices were calculated by using DnaSP version 5 (Librado & Rozas, 2009). Phylogenetic relationships of the sequences encoding P1, HC-Pro, P3, 6K1 or CP were analysed by using the neighbour-joining algorithm (Saitou & Nei, 1987) in mega4 (Tamura et al., 2007) using the Kimura two-parameter nucleotide-substitution model (Kimura, 1980) and tested by performing 1000 bootstrap replications. Sequences with evidence of recombination were excluded from phylogenetic analysis (Posada & Crandall, 2002).
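
    The sliding-window calculation of π can be illustrated with a minimal Python sketch using the window and step sizes given above. It treats π as the mean proportion of differing sites over all sequence pairs and, as a simplifying assumption, ignores gap handling and any substitution-model correction, so it is not equivalent to the DnaSP implementation; the function names are hypothetical.

```python
from itertools import combinations

# Illustrative sketch only; not the DnaSP implementation.


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) for aligned,
    equal-length sequences (no gap handling, no correction)."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def sliding_pi(seqs, window=100, step=25):
    """Return (window midpoint, pi) for each window along the alignment.
    Midpoints are 0-based alignment coordinates."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment (not SPMMV data): identical first half, different second half.
toy = ["A" * 200, "A" * 100 + "C" * 100]
for midpoint, pi in sliding_pi(toy):
    print(midpoint, pi)
```

    Because consecutive windows overlap by 75 nt, neighbouring values are not independent and the resulting profile is best read as a smoothed view of diversity along the genome.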

    Analysis of selection pressures.

    The non-synonymous to synonymous nucleotide-substitution rate ratio (ω) was assessed by using an ML codon-substitution model implemented in the codeml program of the paml4 package (Yang, 2007). Six site models, including M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (β) and M8 (β plus ω), were exploited as described previously (Yang et al., 2000, 2005; Wong et al., 2004). Three LRTs (M3 vs M0, M2a vs M1a and M8 vs M7) were used to assess the models' fit to the data, as described by Wong et al. (2004). Where the LRTs suggested positive selection, the beb approach (Yang et al., 2005) was used to identify amino acids subjected to positive selection (posterior probabilities >95 %). Recombinant sequences were excluded from analysis because recombination creates patterns of genetic variability that closely resemble the effects of molecular adaptation, thus violating the assumptions under the ML framework of codon substitution (Anisimova et al., 2003).
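
    The LRTs compare twice the difference in log-likelihood between nested models with a χ² distribution. A minimal Python sketch is given below; it assumes 2 degrees of freedom for M2a vs M1a and M8 vs M7, and 4 for M3 vs M0 (which holds when M3 is run with three site classes). The log-likelihood values in the example are invented for illustration and are not results of this study; the function names are hypothetical.

```python
import math

# Illustrative sketch only; log-likelihoods below are invented values.

# Assumed degrees of freedom (M3 vs M0 assumes three site classes in M3).
DEGREES_OF_FREEDOM = {"M3_vs_M0": 4, "M2a_vs_M1a": 2, "M8_vs_M7": 2}


def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution for even df,
    using the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2*delta lnL, P-value) for a pair of nested models."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


# Invented example: null (M7) lnL = -1000.0, alternative (M8) lnL = -990.0.
stat, p_value = likelihood_ratio_test(-1000.0, -990.0,
                                      DEGREES_OF_FREEDOM["M8_vs_M7"])
print(stat, p_value)
```

    A significant M2a vs M1a or M8 vs M7 test indicates that a class of sites with ω > 1 improves the fit, after which individual sites are examined with the beb approach as described above.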

    Acknowledgments

    Financial support from the Academy of Finland (grant no. 1110797) is gratefully acknowledged.

    References