Research Article

Evolution, dispersal and replacement of American genotype dengue type 2 viruses in India (1956–2005): selection pressure and molecular clock analyses

  • 1National Institute of Virology, 130/1 Sus Road, Pashan, Pune – 411021, Maharashtra, India
  • 2King Edward Memorial Hospital and Research Center, 489, Rasta Peth, Sardar Moodliar Road, Pune – 411011, Maharashtra, India
  • Correspondence
    Devendra T. Mourya
    mouryadt{at}icmr.org.in
  • Journal of General Virology 2010; 91(3):707–720 · https://doi.org/10.1099/vir.0.017954-0


    Abstract

    This study reports the phylogeny, selection pressure, genotype replacement and molecular clock analyses of many previously unstudied dengue type 2 virus (DENV-2) strains, isolated in India over a time span of almost 50 years (1956–2005). Analysis of complete envelope (E) gene sequences of 37 strains of DENV-2 from India, together with globally representative strains, revealed that the American genotype, which circulated predominantly in India during the pre-1971 period, was then replaced by the Cosmopolitan genotype. Two previously unreported amino acid residues, one in the American (402I) and one in the Cosmopolitan (126K) genotypes, known to be involved functionally in the cellular tropism of the virus, were shown to be under positive selection pressure. The rate of nucleotide substitution estimated for DENV-2 was 6.5×10⁻⁴ substitutions per site per year, which is comparable with earlier estimates. The time to the most recent common ancestor of the pre-1971 Indian strains and the American genotype was estimated to be between 73 and 100 years (1905–1932), which correlates with the historical record of traffic between India and South America and suggests transportation of the virus from the Americas. Post-1971 Indian isolates formed a separate subclade within the Cosmopolitan genotype. The estimated time to the most recent common ancestor of the Indian Cosmopolitan strains was about 47 years, with further estimates indicating the migration of DENV-2 from India to countries across the Indian Ocean between 1955 and 1966. Overall, the present study increases our understanding of the events leading to the establishment and dispersal of the two genotypes in India.

    • †These authors contributed equally to this work.

    • The GenBank/EMBL/DDBJ accession numbers for the sequences reported in this paper are FJ538905–FJ538928 and FJ807632–FJ807640.

    • Three supplementary tables are available with the online version of this paper.

    INTRODUCTION

    Dengue is the most prevalent arthropod-borne viral infection of humans in tropical and subtropical countries. About 100 million infections occur every year, of which the majority are either asymptomatic or self-limiting, acute, severe, febrile, ’flu-like illnesses often accompanied by rash. However, nearly half a million cases of dengue haemorrhagic fever or dengue shock syndrome (DHF/DSS) are estimated to be hospitalized throughout tropical and subtropical regions (WHO, 2002). A high proportion of patients who develop DSS die. Therefore, dengue fever (DF) and/or DHF/DSS have become a major public-health concern in the tropics and subtropics worldwide. Dengue virus (DENV) infections are caused by one or more of four antigenically related DENV serotypes (1–4). DF is commonly described as being confined to urban areas. However, it is now commonly reported in rural settings in both Asia and Latin America (Eram et al., 1979; Mehendale et al., 1991; Mahadev et al., 1993; Hayes et al., 1996; Chareonsook et al., 1999; Kumar et al., 2001; Tewari et al., 2004; Arunachalam et al., 2004). Although Aedes aegypti is the main vector for DENV, epidemiological evidence suggests that other Aedes species play an important role in DENV transmission (Mackerras, 1946; Rosen et al., 1954; Sulianti, 1978; Gratz, 2004; Xu et al., 2007; Fulmali et al., 2008). Enzootic DENV may also be responsible for DF in rural and periurban settings throughout the tropics, except in the Americas (Diallo et al., 2008).

    DENV belongs to the family Flaviviridae, genus Flavivirus, and its genome consists of a non-segmented, single-stranded, positive-sense RNA of approximately 10.7 kb. The genome contains a single open reading frame that encodes three structural (capsid, C; membrane, M; and envelope, E) and seven non-structural (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) proteins (Chambers et al., 1990).

    Although historical records state that DENV was present in India in the 19th century, no viruses were obtained until DENV-2 was first isolated from the sera of American soldiers in 1944 during World War II (Sabin, 1952). Since then, epidemics of DENV have been recorded from almost all over India at different times. Many factors that promote the spread of DENV – including unprecedented population growth, increased population density, unplanned and uncontrolled urbanization, increased global travel, increased density of the vector mosquito, infestation of new geographical areas by vector mosquitoes, warm and humid climate and water storage patterns in houses – have contributed to the marked increase in DENV infections in India during the past few decades (Chaturvedi, 2006). All four DENV serotypes have been reported periodically in this country, but DENV-2 has usually been the main aetiological agent in DF and DHF outbreaks and has emerged as the predominant serotype (Dar et al., 1999; Parida et al., 2002). Various studies have revealed genetic variation amongst DENV-2 isolates on the basis of E gene sequence analysis and grouped them into six genotypes corresponding to Sylvatic, Asian I, Asian II, American/Asian, Cosmopolitan and American, as classified previously (Rico-Hesse, 1990; Rico-Hesse et al., 1997; Wang et al., 2000). Some genotypes reported from multiple geographical regions demonstrate the wide distribution of these viruses, whilst others appear to have more restricted distributions. It has been proposed that DENV-2 genotypes differ in their virulence and in their ability to infect host cells and to be transmitted, thus influencing the outcome and possibly the dispersal of the disease caused by the virus (Leitmeyer et al., 1999; Diamond et al., 2000; Cologna & Rico-Hesse, 2003; Cologna et al., 2005).

    As there is little information regarding the circulating genotypes of DENV-2 in India and because vaccines are now realistically within reach, it is important to genotype and to understand the epidemiology of these viruses (Chaturvedi & Shrivastava, 2004). Limited phylogenetic studies of DENV-2, utilizing a small fragment of either the E/NS1 gene or the capsid–premembrane (C–prM) gene, have been reported, showing a change in the circulating genotype of DENV-2 in India (Singh et al., 1999; Singh & Seth, 2001; Dash et al., 2004). These studies focused mainly on strains of DENV-2 from single epidemics. Studying the evolutionary dynamics of DENV in India over an extended period of time will be useful to understand how different genotypes were established in India and how they dispersed. As the genetic diversity following DENV-2 subtype replacement may have a significant role to play in the outcome of DENV infection and disease severity, we also looked for evidence of selection pressure and clade reintroduction in these viruses, based on previous observations with non-Indian DENV-2 (Zanotto et al., 1996; Twiddy et al., 2002a, b; Myat Thu et al., 2005; Bennett et al., 2006; Vasilakis et al., 2007).

    We report the molecular evolution, geographical dispersal and replacement of the American genotype based on 37 DENV-2 strains from India, sampled during the past 50 years, in the context of globally representative strains. The E gene was selected because it is known to be under immunological selection pressure imposed by the host immune system (Innis et al., 1989; Gritsun et al., 1995). Site-specific selection pressures and rates of nucleotide substitution were estimated.

    RESULTS

    Sequence and phylogenetic analyses

    The percentage nucleotide identity (PNI) of the E gene of Indian strains, when compared with the prototype strain of DENV-2 (PG/NGC44/1944), ranged from 89.9 to 95.6 %, and the deduced amino acid sequence identity ranged from 98.2 to 98.9 %. Single nucleotide substitutions were scattered throughout the entire length of the E gene. The majority of these substitutions occurred at the third codon position. No base insertions or deletions were found among the Indian strains. Transitions were more predominant than transversions. The overall transition/transversion bias was 4.54.
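
    The two summary statistics quoted above can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and uses invented toy sequences, not data from this study. The function names are hypothetical, and the raw transition/transversion counts it returns are not the same as the model-based bias of 4.54 reported in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def percent_identity(seq_a, seq_b):
    """Percentage of aligned, ungapped positions at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def count_ts_tv(seq_a, seq_b):
    """Count transitions (A<->G, C<->T) and transversions between two aligned sequences."""
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical, gapped or ambiguous positions are ignored
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy sequences (hypothetical): one G->A transition and one T->A transversion.
reference = "ATGCGTACGT"
query = "ATGCATACGA"
print(percent_identity(reference, query))  # 80.0
print(count_ts_tv(reference, query))       # (1, 1)
```

    The same identity function could in principle be applied pairwise to drop near-identical sequences, such as the >99.5 % identity filter described in the Methods.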

    Interestingly, the 37 DENV-2 strains of Indian origin segregated into only two different genotypes: the American genotype, comprising DENV-2 from Latin America, the Caribbean and Pacific Islands (including the earliest TT/Trinidad/1953 strain), and the Cosmopolitan genotype, which has a wide geographical distribution, comprising DENV-2 from Australia, China, Africa and South-East Asia (Fig. 1). The mutations observed in the Indian strains of the American and Cosmopolitan genotype, along with other representative strains that show a close relationship to the Indian isolates, are presented in Table 1. Mutations are known to occur in DENV following passage in mice (Bray et al., 1998) and in mammalian cells (Vasilakis et al., 2009). However, in the present study, it was observed that an isolate with a low passage history (NIV 053598) had 12 unique mutations, whereas an isolate with a high passage history (NIV 95455) revealed only one unique mutation.

    Fig. 1.

    Phylogenetic tree of DENV-2 based on complete E gene sequences, using maximum likelihood (ML) with paup 4b10 software. Indian sequences are shown in bold.

    Table 1.

    Amino acid changes observed for Indian isolates of the American and Cosmopolitan genotypes compared with the prototype New Guinea C strain, along with a few other representative global strains

    Unique changes observed in a single strain have been omitted.

    DENV-2 strains that were isolated before and during 1971 (pre-1971 Indian isolates) grouped within the American genotype, whilst those isolated in or after 1974 (post-1971 Indian isolates) grouped within the Cosmopolitan genotype cluster. An exception was strain IN/NIV803347/1980, isolated during the DENV-2 epidemic in Kolkata (West Bengal) in 1980, which grouped within the American genotype cluster. This strain was seen to possess several of the mutations specific to the American genotype, i.e. 81T, 139V, 162V, 203D, 390D and 484I (Table 1). The PNI of the pre-1971 Indian isolates with respect to strain PG/NGC44/1944 was 91.2–92.3 %, whilst that of the post-1971 Indian isolates was 89.0–95.4 %.

    Within the American genotype, the Indian isolates formed four separate clusters with 100 % bootstrap support. In the fourth group, the IN/NIV715541/1971 isolate was found to be related most closely to strain TO/EKB194/1974 (isolated from Tonga) and these were related most closely to the native American isolates from South America from 1967 to 1995. Several of these isolates share a mutation (V308I) specifically with the IN/NIV715541/1971 strain. Within the Cosmopolitan group of viruses, two clusters with strong bootstrap support were observed. The Indian strains were distinct from the other Cosmopolitan strains. Further, within the Indian cluster, two main groups were observed. The Indian strain from 1974 (IN/NIV742295/1974) was related closely to the SC/SEY42/1977 and SL/SL206/1990 strains from the Seychelles and Sierra Leone, respectively. The group of Indian strains from 1994, 1996, 2001 and 2005, one strain from 1993 and one from 2004 showed a close relationship with strains from Sri Lanka (LK/271235/1990), Uganda (UG/CAMR11/1993) and China (CN/FJ11/1999). These isolates shared a mutation Lys126→Glu, whilst strain CN/FJ11/1999 specifically shared a mutation of Ile141→Val with most of the Indian Cosmopolitan strains. Of the six amino acid replacements (Glu71→Ala, Ile129→Val, His149→Asn, Ile164→Val, Asn390→Ser and Ile462→Val) noted in the viruses of the Cosmopolitan genotype (Twiddy et al., 2002a), five were seen in the Indian viruses at positions 71, 149, 164, 390 and 462. At position 129, most Indian viruses retained Ile, except for two strains belonging to the Cosmopolitan genotype that showed either Val (as in the ancestral prototype strain PG/NGC44/1944) or Thr. At position 390, two Indian Cosmopolitan strains possessed Asn, as in strain PG/NGC44/1944. In addition, five of eight Indian strains belonging to the American genotype possessed Lys at position 126, as in the prototype strain PG/NGC44/1944, whereas others possessed Glu. Notably, at the same position, 12 of 21 Indian strains of the Cosmopolitan genotype possessed Glu, whereas the rest had Lys. At position 322, six of 21 Indian strains of the Cosmopolitan genotype possessed Val instead of Ile, which was seen in the rest of the Indian strains. Furthermore, at position 402, most Indian strains possessed Phe, although a minority (two of eight) of Indian strains belonging to the American genotype possessed Leu and one strain from the Cosmopolitan genotype possessed Val at that position.

    Selection pressure analysis

    Site-specific selection pressure was analysed by using likelihood procedures including single-likelihood ancestor counting (SLAC), random-effects likelihood (REL) and fixed-effects likelihood (FEL). Sites were considered to be under positive selection if at least two of the methods indicated this with high statistical significance (P<0.1/Bayes factor >50). Different datasets were used, one consisting of all American genotype sequences (n=21) and one consisting of all Cosmopolitan genotype sequences (n=36). Within the American genotype strains, site 402, and within the Cosmopolitan genotype, site 126, satisfied the criteria (with high significance). In the Cosmopolitan genotype, three other sites, 64, 310 and 360, showed some evidence of positive selection (Table 2). We also considered two other datasets by splitting the Indian sequences into the American genotype (n=8) and the Cosmopolitan genotype (n=21) to examine whether similar selection pressure processes could be identified. We found that site 402 in the American genotype and site 126 in the Cosmopolitan genotype were the only ones to be under significantly high positive selection by at least two of the likelihood methods adopted.
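
    The "at least two of three methods" criterion can be written as a small decision rule. The sketch below is only illustrative. It assumes that SLAC and FEL each report a P value and that REL reports a Bayes factor, which is how the thresholds above are read here. The function name and the example values are hypothetical and are not taken from Table 2.

```python
def positively_selected(slac_p, fel_p, rel_bayes_factor,
                        p_cutoff=0.1, bf_cutoff=50.0):
    """Return True when at least two of the three methods support positive selection.

    Assumption: SLAC and FEL yield P values (significant if below p_cutoff),
    whereas REL yields a Bayes factor (significant if above bf_cutoff).
    """
    votes = [
        slac_p < p_cutoff,
        fel_p < p_cutoff,
        rel_bayes_factor > bf_cutoff,
    ]
    return sum(votes) >= 2


# Hypothetical per-site results, for illustration only.
print(positively_selected(slac_p=0.05, fel_p=0.20, rel_bayes_factor=120.0))  # True
print(positively_selected(slac_p=0.30, fel_p=0.20, rel_bayes_factor=120.0))  # False
```

    Requiring agreement between methods trades some sensitivity for fewer false positives, which matters when each dataset contains only a few dozen sequences.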

    Table 2.

    Parameters from selection pressure analysis of the E gene using the SLAC, REL and FEL methods, based on different datasets as indicated

    Sites under strong positive selection pressure as demonstrated by at least two methods are shown in bold. Other sites that are indicated to be under positive selection by only one of the methods, with the other two methods indicating close to threshold P values, are italicized.

    Molecular clock analysis

    Estimates of the nucleotide substitution rates and the time to most recent common ancestor (tMRCA) of the DENV-2 genotypes as well as Indian DENV-2 strains were inferred from a total of 76 (29 Indian and 47 global) dated E gene sequences. Among all the models (strict, relaxed uncorrelated exponential and relaxed uncorrelated lognormal) of molecular clocks employed, models enforcing relaxed molecular clocks performed better than the strict clock model (Table 3). The logistic growth demographic model did not converge under strict and relaxed (uncorrelated exponential and uncorrelated lognormal) models of molecular clock. Also, the expansion growth demographic model under relaxed (uncorrelated exponential and uncorrelated lognormal) models and the exponential growth demographic model under a relaxed (uncorrelated lognormal) model of molecular clock did not converge with the dataset considered.

    Table 3.

    Parameter estimates for different clock models for E gene sequences of DENV-2 using beast

    The best fit model is indicated in bold.

    The highest marginal likelihood was obtained with the model implementing the relaxed uncorrelated exponential molecular clock and the constant population growth model. The Bayes factor and the Bayesian skyline plot (data not shown) gave stronger support to this model than other clock and population models. Under this model, the mean substitution rate was 6.5×10⁻⁴ substitutions per site per year [95 % highest probability density (HPD), 4.1–8.7×10⁻⁴] (Table 3). The maximum clade credibility tree generated under the best fit uncorrelated exponential molecular clock with constant population growth model is shown in Fig. 2.
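
    Model comparison by marginal likelihood reduces to simple arithmetic on the log scale: the log Bayes factor between two models is the difference of their log marginal likelihoods. The sketch below shows that calculation. The log marginal likelihood values are placeholders invented for illustration (the Table 3 values are not reproduced in this text), and the model labels and function names are hypothetical.

```python
def log_bayes_factor(log_ml_model, log_ml_reference):
    """Natural-log Bayes factor of one model against a reference model."""
    return log_ml_model - log_ml_reference


def rank_models(log_marginal_likelihoods):
    """Return the best-supported model and its log Bayes factor over each rival."""
    best = max(log_marginal_likelihoods, key=log_marginal_likelihoods.get)
    support = {
        name: log_bayes_factor(log_marginal_likelihoods[best], value)
        for name, value in log_marginal_likelihoods.items()
        if name != best
    }
    return best, support


# Placeholder log marginal likelihoods (NOT the study's estimates).
example = {
    "strict/constant": -10250.0,
    "relaxed-exponential/constant": -10205.0,
    "relaxed-lognormal/constant": -10212.0,
}
best, support = rank_models(example)
print(best)     # relaxed-exponential/constant
print(support)  # {'strict/constant': 45.0, 'relaxed-lognormal/constant': 7.0}
```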

    Fig. 2.

    Bayesian Markov chain Monte Carlo (MCMC) tree of E gene sequences of DENV-2 with the constant size demographic model under the relaxed uncorrelated exponential clock model. The tMRCA estimates of the key nodes (highlighted by black and grey circles) are also shown; black circles (labelled A–E) indicate nodes representing divergence of Indian subgroups. Indian sequences are shown in bold.

    The estimate of the tMRCA for all genotypes (excluding the Sylvatic genotype) of DENV-2 was about 200 years (95 % HPD: 90, 377 years) (Fig. 2). Strain TT/Trinidad/1953 and the American genotype strains share a common ancestor about 106 years (95 % HPD: 61, 185 years) ago, whilst all of the pre-1971 Indian isolates with the American genotype shared a common ancestor (node A) about 73 years (95 % HPD: 54, 100 years) ago. Three subgroups of Indian isolates with high bootstrap support were observed. The first group of Indian isolates from 1956 to 1963 showed a tMRCA of about 60 years (95 % HPD: 52, 72 years), whilst the second group of two Indian isolates from 1964 and 1980 showed a tMRCA of about 51 years (95 % HPD: 44, 62 years) and diverged from node B about 62 years (95 % HPD: 50, 79 years) ago. The third group of Indian isolates (1967–1971), along with the native American isolates, shared a common ancestor (node C) about 57 years (95 % HPD: 47, 69 years) ago.

    The common ancestor of all of the Cosmopolitan isolates was about 65 years (95 % HPD: 41, 101 years) old. The subclade consisting of east African and South-East Asian strains showed a tMRCA of about 43 years (95 % HPD: 30, 60 years), whilst that of the post-1971 Indian isolates was about 47 years (95 % HPD: 37, 63 years). Within the Indian subclade, the two subgroups indicated similar tMRCAs of about 39 years (95 % HPDs: 29, 50, and 34, 46 years, respectively) ago (nodes D and E).

    The tMRCA of the Asian I genotype was about 58 years (95 % HPD: 45, 79 years), that of the Asian II genotype was about 75 years (95 % HPD: 65, 93 years) and that of the American/Asian genotype was fairly recent at about 39 years (95 % HPD: 27, 58 years).

    DISCUSSION

    The number of different genotypes seen in each serotype is consistent with the abundance of genetic diversity in DENV (Holmes & Burch, 2000). It is important to note that, among the four serotypes, serotype 2 has been the most genetically diverse (Rico-Hesse et al., 1997). Furthermore, it has been observed that some of the genotypes have a wide geographical distribution (Twiddy et al., 2002a, b; Zhang et al., 2005). Various earlier studies have shown six distinct genotypes of DENV-2 (Rico-Hesse, 1990; Wang et al., 2000; Twiddy et al., 2002a).

    Our phylogenetic analysis based on the E gene of DENV-2 strains isolated during the period from 1956 to 2005 in India, along with globally representative strains, showed that Indian strains segregated into two genotypes, namely the American and Cosmopolitan genotypes. The Indian isolates obtained before and during the 1971 epidemics clustered into the American genotype (Fig. 1), whereas the post-1971 isolates clustered into the Cosmopolitan genotype. There was no indication of any geographical dispersal pattern. Our molecular clock analysis (Fig. 2) showed that the tMRCA of the pre-1971 Indian strains and the American genotype was about 73 years, with an upper bound of 100 years. This time period of 1905–1932 corresponds with the Indian immigration to Surinam and Guyana between 1873 and 1916 (Chickrie, 2003) and suggests transportation of the virus from the Americas to India. Unfortunately, the lack of samples from earlier collection times does not allow for more robust analysis.

    Figs 1 and 2 also show the different subgroups among the pre-1971 Indian isolates; these indicate the repeated establishment of the genotype at least three times in India (nodes A, B and C), the mean divergence times based on the molecular clock analysis being 73 years (about 1932), 62 years (1943) and 57 years (1948), respectively. The tree in Fig. 2 shows clearly that the American strains during the period from 1967 to 1995 share a common ancestor with the Indian strains of the period from 1967 to 1971 at node C. This is indicative of a possible transmission of DENV-2 viruses between India and the Americas sometime around 1948. Incidentally, these dates coincide with World War II and its aftermath, during which global movements showed an increase.
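
    The calendar dates quoted for nodes A, B and C follow from subtracting each tMRCA from the year of the most recent sample. The sketch below assumes that reference year is 2005, which reproduces the dates given in the text. The function name is hypothetical, and the node ages and HPD bounds are those reported in the Results.

```python
MOST_RECENT_SAMPLE_YEAR = 2005  # assumption: ages are counted back from the latest isolate


def tmrca_to_calendar(mean_age, hpd_lower, hpd_upper,
                      reference_year=MOST_RECENT_SAMPLE_YEAR):
    """Convert a tMRCA and its 95 % HPD (years before present) into calendar years.

    Returns (mean_year, earliest_year, latest_year); the upper HPD bound on
    the age gives the earliest calendar year.
    """
    return (reference_year - mean_age,
            reference_year - hpd_upper,
            reference_year - hpd_lower)


# Node ages (mean, HPD lower, HPD upper) as reported for the pre-1971 Indian subgroups.
nodes = {"A": (73, 54, 100), "B": (62, 50, 79), "C": (57, 47, 69)}
for label, (mean_age, low, high) in nodes.items():
    print(label, tmrca_to_calendar(mean_age, low, high))
# A (1932, 1905, 1951)
# B (1943, 1926, 1955)
# C (1948, 1936, 1958)
```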

    The Indian strains isolated during and after 1974 segregated into the Cosmopolitan genotype, suggesting clearly that there was a change in the circulating genotype of DENV-2 in India. A deviation from this pattern was observed with the 1980 strain (IN/NIV803347/1980) isolated from Kolkata (West Bengal), which belonged to the American genotype. The close similarity of this isolate to the Indian isolate of 1964 (NIV_64553) may, however, imply that, rather than it being an imported case, the American genotype was circulating during that time. The Cosmopolitan genotype seems to have been established in India in the post-1980 period.

    An earlier study of genomic sequence analysis of DENV-2 from the 1996 epidemic in Delhi, India (Singh et al., 1999), showed that the replacing Cosmopolitan genotype resembled strains from Somalia and the Torres Strait. Our phylogenetic analysis of E gene sequences showed that the Cosmopolitan genotype segregated into two subclades with 100 % bootstrap support. The subclade consisting of the post-1971 Indian isolates was distinct from the subclade consisting of the Somalia and Torres Strait strains [along with later strains from South-East Asia (specifically Indonesia, the Philippines, Taiwan, Singapore and Malaysia)], indicating that there may have been independent introductions in the two regions. The post-1971 Indian isolates formed two subgroups (nodes D and E in Fig. 2) with strong bootstrap support (100 and 99 %, respectively), indicative of two establishments of the Cosmopolitan genotype in India. In one of these subgroups, the earliest Indian isolate (IN/NIV742295/1974) shared a common ancestor with SC/SEY42/1977 and SL/SL206/1990, implying dispersal of a strain from Trichur, Kerala (IN/NIV742295/1974) to the island in the Indian Ocean and further spread to the West African countries. The Indian isolates of the period from 1983 to 1991 (IN/NIV836379/1983 and IN/NIV916584/1991) grouped with strains from Sri Lanka (LK/271235/1990) and Uganda (UG/CAMR11/1993), indicating dispersal of the Indian strains to these countries. Notably, a strain from Fujian province, China (CN/FJ11/1999), also fell into the cluster of Indian isolates. Close relatedness of this strain to the Indian isolates (IN/NIV965336/1996 and IN/NIV965338/1996) of an earlier timescale is indicative of the possible introduction of this particular strain to China from India. The transfer of viruses across continents and oceans can be attributed to increased global movements, as well as transportation of infected mosquitoes via commercial shipping (Self, 1984; Song et al., 2003; Zeller, 1998).

    The major mutations unique to the subclade defining the Indian strains were identified in the sequence-based analyses carried out in this study. The tMRCA of the group consisting of all post-1971 Indian isolates was about 47 years, with an upper bound of about 63 years, while that of the two subgroups (nodes D and E) was about 39 years and not more than 50 years, implying migration of DENV-2 to countries across the Indian Ocean during 1955–1966.

    The E protein is the major antigenic determinant of DENV. The selection pressure analysis based on the E gene provided strong evidence for positive selection at two previously unreported sites, one (402) in the American genotype and one (126) in the Cosmopolitan genotype. The other sites (64, 310 and 360 in the Cosmopolitan genotype) that showed some evidence of positive selection have also not been reported previously. Sites Ile402 in the American genotype and Lys126 in the Cosmopolitan genotype were the only sites found to be under strong positive selection among the Indian group of pre- and post-1971 isolates. The American genotype has previously been reported to be under conservative evolution, whilst there has been evidence of diversifying selection in the Cosmopolitan genotype (Twiddy et al., 2002a). Among the sites reported in the Cosmopolitan genotype (positions 52, 159 and 390) in the E gene, none were identified to be under positive selection with high statistical significance in our study. Most of these sites were found to be invariable in the Indian context and hence may have ceased to show evidence of any selection pressure.

    Notably, site 402 showed evidence of strong positive selection in the American genotype among both global and Indian viruses. This site has been proposed to be located within the postulated stem-anchor region adjacent to the putative receptor-binding domain (domain III) of the E protein structure. Substitutions at this position were reported to confer neurovirulence in mice (Bray et al., 1998). Site 126, identified in this study to be under selection pressure in the Cosmopolitan genotype, was found to be under mutational pressure during the clade replacement. The amino acid residue at position 126 resides in domain II of the E glycoprotein (Rey et al., 1995). It has been shown to be responsible for a conformational change at low pH and to be involved in exposing the fusion peptide on the virion surface at the time of virus entry into susceptible cells. This residue could, therefore, be responsible for altering virus–cell interactions (Roehrig et al., 1994). Substitutions altering virus–cell interactions may be one of several factors responsible for broad dispersal of the Cosmopolitan genotype. In the case of the Indian viruses, these substitutions may provide a fitness advantage over the American genotype, which led to the detected change in DENV-2 epidemiology during the past few decades. The functional significance of other sites, such as 64, 310 and 360, that were also indicated to be under selection pressure, in terms of viral fitness, is not clear, but may merit further study.

    In earlier studies, the role of natural selection pressure has also been described for strain extinction and/or replacement (Sittisombut et al., 1997; Wittke et al., 2002). Further, strain replacement and/or extinction has also been attributed to stochastic processes, such as changes in the population density and the vector mosquito density during inter-epidemic years, or regarded as a regular occurrence in virus evolution, especially at times when the numbers of susceptible hosts or mosquitoes are low (Holmes & Twiddy, 2003; Vasilakis et al., 2007). In the Indian scenario, it is not possible to elucidate whether the genotype replacement is attributable to the greater fitness of the replacing genotype or to stochastic processes. Based on the findings reported here, however, it seems reasonable to suggest that positive selection may have some role to play in genotype replacement.

    The estimated rate of nucleotide substitution in the E gene of the Indian strains of DENV-2 was shown to be about 6.5×10⁻⁴ substitutions per site per year. This average rate is comparable to the earlier estimates of roughly 6×10⁻⁴ substitutions per site per year, considering global DENV-2 human strains (Twiddy et al., 2003; Wang et al., 2000). The overall time of emergence of human DENV-2 was estimated to be about 200 years (95 % HPD: 90, 377 years) ago, which is comparable to the earlier estimates of Zanotto et al. (1996) and Wang et al. (2000). The estimate of the divergence time of the American, Asian, Cosmopolitan and American/Asian genotypes was found to be about 106 (61, 185), 90 (68, 120), 65 (41, 101) and 39 (27, 58) years, respectively. These estimates are also comparable with the earlier estimates (Twiddy et al., 2003; Wang et al., 2000).

    Notably, the Americas experienced a rapid displacement of the less virulent American genotype by the more virulent Asian and American/Asian genotypes (Rico-Hesse et al., 1997; Leitmeyer et al., 1999). Whilst the Asian strains from Thailand, Indonesia and China may have moved westward and displaced the American genotype, it is noteworthy that these genotypes were never established in India, at least on the basis of the available genetic data.

    Overall, the present study shows a change in the circulating genotype of DENV-2 in India from the American to the Cosmopolitan genotype during the 1970s. The selection pressure analysis on E gene sequences suggests the presence of positive selection at two sites, Ile402 in the American genotype (as well as in Indian pre-1971 strains) and Lys126 in the Cosmopolitan genotype (as well as in Indian post-1971 strains). The rates of nucleotide substitution determined in this study are comparable with the earlier reports suggesting similar rates of substitution for the Indian strains. The tMRCA of the pre-1971 Indian strains with the American genotype was about 73 years and not more than 100 years ago (1905–1932), correlating with historical records of traffic between India and South America. The tMRCA of the Cosmopolitan genotype, including all post-1971 Indian isolates, was about 47 years, with estimates of transmission of DENV-2 to countries across the Indian Ocean about 39 years and not more than 50 years ago, i.e. between 1955 and 1966. Although the present study helps us to understand the establishment and dispersal of the two genotypes in the Indian context, analysis based on full genome sequences of the isolates would consolidate the present conclusions.

    METHODS

    Viruses.

    The DENV-2 strains used in this study were isolated from different parts of India during different epidemics spanning almost 50 years (1956–2005). The earliest isolates, from between 1956 and 1964, were from Vellore district, Tamil Nadu, south India, while the rest of the isolates were from different regions covering north, west, central and south India. All strains except for two (IN/NIV715541/1971 and IN/NIV051774/2005) were isolated from human samples; these two strains were isolated from A. aegypti mosquitoes in 1971 and 2005, respectively. Thirty-three isolates were sequenced as part of this study (details of these 33 virus strains are presented in Supplementary Table S1, available in JGV Online); the other four sequences were retrieved from GenBank. Viruses were procured from the virus repository of the National Institute of Virology, Pune, India. Lyophilized viruses were reconstituted in sterile distilled water and inoculated into Swiss albino suckling mice by the intracerebral route. Mice were maintained as per the guidelines of the Committee for the Purpose of Control and Supervision of Experiments on Animals (CPCSEA). Mice were observed for sickness; sick mice were euthanized and their brains were harvested. A 10 % suspension of mouse brain in 0.75 % bovine albumin phosphate saline (BAPS) was prepared and stored at −70 °C until use.

    RNA extraction, RT-PCR and sequencing.

    Infectious mouse brain suspension representing each strain was used for RNA extraction. Viral RNA was isolated by using a QIAamp Viral RNA Mini kit (Qiagen), according to the manufacturer's instructions. cDNAs were prepared with 10 μl RNA template using avian Moloney virus reverse transcriptase (Promega) as per the manufacturer's instructions. The E gene was amplified by using primers described by Wang et al. (2000) (see Supplementary Table S2, available in JGV Online) with 2.0 μl cDNA by using Platinum Taq polymerase (Invitrogen). PCR products were visualized by ethidium bromide agarose gel staining and the PCR products were gel-purified by using a QIAquick gel extraction kit (Qiagen). The purified PCR products were sequenced by using a BigDye Terminator cycle sequencing ready reaction kit (Applied Biosystems) on an automatic sequencer (ABI PRISM Genetic Analyzer 3100; Applied Biosystems).

    Sequence and phylogenetic analysis.

    Multiple sequence alignment was performed by using CLUSTAL W as implemented in MEGA v. 3.1 (Kumar et al., 2004). A phylogenetic tree was constructed by using the E gene sequences of the 33 strains (Supplementary Table S1), along with 51 other DENV-2 sequences from different geographical regions obtained from GenBank (see Supplementary Table S3, available in JGV Online). During selection of the global sequences, closely related sequences (>99.5 % identity) were eliminated. Two sequences of DENV-2 Sylvatic strains were used as outgroup sequences. The tree, based on 86 sequences, was constructed by the maximum-likelihood (ML) method in PAUP* 4b10 (Swofford, 2003) by using a heuristic search algorithm in two steps. The substitution model used was the best model as selected by the Akaike Information Criterion (AIC) in Modeltest 3.7 (Posada & Crandall, 1998), namely GTR+G4+I (general time-reversible model with gamma-distributed rates of variation among sites and a proportion of invariable sites). In the first step, branch-swapping by nearest neighbour interchange was used with stepwise, random addition of sequences. In the second step, the tree bisection–reconnection method of branch-swapping was applied to the best tree found in the first step. The ML estimates of the parameters of the nucleotide substitution model were as follows: the optimal base frequencies of A, C, G and T were 0.33, 0.21, 0.24 and 0.20, respectively. The estimates of the relative substitution rates were 0.89, 6.41, 1.41, 1.1, 18.71 and 1.0 for the substitution types A↔C, A↔G, A↔T, C↔G, C↔T and G↔T, respectively. The estimated proportion of invariable sites was 24.6 %, whilst the shape parameter of the Γ (gamma) distribution of rate variation among sites was 0.87.
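
    As an illustration of how the reported estimates define the nucleotide substitution process, the following minimal sketch assembles a GTR instantaneous rate matrix from the base frequencies and relative rates given above. It is not the PAUP* implementation: the function and variable names are hypothetical, the rounded base frequencies (which sum to 0.98 as reported) are renormalized, the matrix is scaled to a mean rate of one substitution per unit time, and the gamma and invariable-site components are left out.

```python
# Sketch: GTR rate matrix built from the ML estimates quoted in the text.
# Names are illustrative; rate heterogeneity (G4) and invariable sites (I) are omitted.
BASES = "ACGT"

# Reported base frequencies are rounded and sum to 0.98, so renormalize them.
reported_freqs = {"A": 0.33, "C": 0.21, "G": 0.24, "T": 0.20}
total = sum(reported_freqs.values())
pi = {base: freq / total for base, freq in reported_freqs.items()}

# Reported relative substitution rates (symmetric exchangeabilities).
rates = {
    ("A", "C"): 0.89, ("A", "G"): 6.41, ("A", "T"): 1.41,
    ("C", "G"): 1.1, ("C", "T"): 18.71, ("G", "T"): 1.0,
}


def exchangeability(x, y):
    """Look up the symmetric relative rate for the unordered pair (x, y)."""
    return rates.get((x, y), rates.get((y, x)))


def gtr_matrix(freqs, scale=True):
    """Return Q as a nested dict with Q[i][j] = r_ij * pi_j for i != j."""
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = exchangeability(i, j) * freqs[j]
        # Diagonal entries make every row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    if scale:
        # Rescale so that the expected substitution rate at equilibrium is 1.
        mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
        for i in BASES:
            for j in BASES:
                q[i][j] /= mean_rate
    return q


Q = gtr_matrix(pi)
for i in BASES:
    print(i, [round(Q[i][j], 3) for j in BASES])
```

    With these estimates the two transition types (A↔G and C↔T) have much larger relative rates than the four transversion types, which is visible directly in the off-diagonal entries of the matrix.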

    The topology predicted by the ML tree was also confirmed by two more methods: maximum parsimony (MP) and the Bayesian MCMC (Markov chain Monte Carlo) tree-sampling method. The MP tree was constructed by a heuristic search approach implemented in PAUP* 4b10. MrBayes v. 3 (Ronquist & Huelsenbeck, 2003) was used to build the Bayesian MCMC tree using GTR+G4+I as the nucleotide substitution model with 1 000 000 generations, a sampling frequency of 100 and 25 % of the generations as burn-in. The reliability of the nodes of the ML tree was judged by superimposing the posterior supports (%) at corresponding nodes of the Bayesian MCMC tree.
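
    The posterior support of a node can be read as the percentage of post-burn-in sampled trees that contain the corresponding clade. The sketch below illustrates that calculation on a toy chain; it is not how MrBayes is invoked. The taxon labels and the reduction of each tree to a set of clades are hypothetical simplifications, and the sample counts ignore the starting state, which some programs also log.

```python
# Sketch: posterior clade support (%) after discarding a burn-in fraction.
# Taxon labels and the toy chain are hypothetical.

def posterior_support(sampled_trees, clade, burnin_fraction=0.25):
    """Percentage of post-burn-in trees that contain `clade`.

    Each tree is represented only by the set of clades (frozensets of taxa)
    that it contains.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    target = frozenset(clade)
    hits = sum(1 for tree in kept if target in tree)
    return 100.0 * hits / len(kept)


# Sample counts implied by the settings quoted in the text.
generations, sample_freq = 1_000_000, 100
n_samples = generations // sample_freq      # sampled states after the start
n_discarded = int(n_samples * 0.25)         # states dropped as burn-in

# Toy chain of eight sampled trees.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
toy_trees = [{ac}, {ac}, {ab}, {ab}, {ab}, {ac}, {ab}, {ab}]
support_ab = posterior_support(toy_trees, {"A", "B"})
print(n_samples, n_discarded, round(support_ab, 1))
```

    In the toy chain the first two trees are discarded as burn-in, and the clade (A, B) is present in five of the six remaining trees.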

    Selection pressure analysis.

    To identify positive selection pressure at individual codon sites in the E gene of DENV-2, three likelihood procedures were used: the SLAC and FEL methods and the more powerful REL method (Pond & Frost, 2005). The strength of selection pressure is determined on the basis of the ratio of non-synonymous (dN) to synonymous (dS) substitutions per site (ratio dN/dS). The analysis was carried out using the online Datamonkey facility, incorporating the GTR model of nucleotide substitution, with a phylogenetic tree inferred by using the neighbour-joining method. Four datasets were used: all American genotype, all Cosmopolitan genotype, only Indian sequences of American genotype and only Indian sequences of Cosmopolitan genotype. In all of these datasets, closely related sequences (>99.5 % nucleotide identity) were eliminated. After this filtering, the datasets comprised n=21, 36, 8 and 21 sequences, respectively.
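
    To make the dN/dS idea concrete, the following sketch computes uncorrected proportions of non-synonymous and synonymous differences between two aligned coding sequences by simple counting, in the spirit of the Nei–Gojobori approach. It is not the SLAC, FEL or REL procedure run on Datamonkey, which work on a phylogeny and a codon model. The example sequences are hypothetical, changes to stop codons are counted as non-synonymous, and codons differing at more than one position are skipped rather than averaged over mutational pathways.

```python
# Sketch: counting-based pN and pS for two aligned, gap-free coding sequences.
# A simplified stand-in for dN/dS; no multiple-hit correction is applied.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered with T, C, A, G at each position.
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    count = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                count += 1.0 / 3.0
    return count


def pn_ps(seq1, seq2):
    """Return (pN, pS): differences per non-synonymous and per synonymous site."""
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        syn_sites += s
        nonsyn_sites += 3.0 - s
        n_diffs = sum(x != y for x, y in zip(c1, c2))
        if n_diffs == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
        # Codons with two or three differences are ignored in this sketch.
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# Hypothetical sequences: Lys-Leu-Gly versus Arg-Leu-Gly with one silent change.
pn, ps = pn_ps("AAACTGGGC", "AGACTAGGC")
print(round(pn, 3), round(ps, 3), round(pn / ps, 3))
```

    A ratio above 1 at a site or along a lineage is taken as a signal of positive selection, and a ratio below 1 as a signal of purifying selection; the site-wise methods named above test this statistically rather than by raw counts.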

    Molecular clock analysis.

    The dataset for the molecular clock analysis comprised Indian and global sequences from the five different genotypes (Sylvatic strains excluded). No evidence of recombination was identified in the Indian strains or between the Indian and the globally representative viruses. Closely related sequences (>99.5 % identity) were removed from the analysis. With this restriction, the number of Indian isolates was reduced to 29 and the total dataset consisted of 76 sequences. The rate of nucleotide substitution and the divergence times, i.e. the times to the most recent common ancestor (tMRCA), of DENV-2 strains of Indian origin and of the different genotypes were calculated by using the Bayesian MCMC approach as implemented in BEAST 1.4.8 (Drummond & Rambaut, 2007). We employed both strict and relaxed (uncorrelated exponential and uncorrelated lognormal) molecular clocks (Drummond et al., 2006) with different demographic models (constant size, exponential growth, logistic growth and expansion growth). The best-fitting nucleotide substitution model was selected by AIC as implemented in Modeltest 3.7 (Posada & Crandall, 1998); the GTR+G4+I model was found to be the best fit for our dataset. Five independent MCMC analyses, each of 10 000 000 steps, were performed for each combination of branch-rate and demographic model, and were combined, with the burn-in set to 10 % of the generations, by using LogCombiner 1.4.8 (distributed with BEAST). Convergence was assessed by using Tracer 1.4 (Rambaut & Drummond, 2007). Models were compared by calculating Bayes factors from the relative marginal likelihoods of the models under comparison (Suchard et al., 2001). The maximum clade credibility tree was generated by using TreeAnnotator (available in BEAST), and FigTree 1.2.2 was used for visualization of the annotated trees. The 95 % highest posterior density (HPD) intervals were obtained to quantify the uncertainty in the parameter estimates.
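
    Two small calculations underlie the model comparison and the dating described above: a log Bayes factor is the difference between the log marginal likelihoods of two models, and a tMRCA expressed in years before the most recent sample converts to a calendar year by subtraction. The sketch below shows both. The log marginal likelihood values and model labels are hypothetical and are not results of this study; the reference year of 2005 is an assumption taken from the sampling range and the dates quoted in the Abstract.

```python
# Sketch: Bayes-factor model ranking and tMRCA-to-calendar-year conversion.
# All log marginal likelihoods below are hypothetical illustration values.
import math


def log_bayes_factor(ln_ml_model1, ln_ml_model0):
    """Natural-log Bayes factor of model 1 over model 0."""
    return ln_ml_model1 - ln_ml_model0


def best_model(ln_marginal_likelihoods):
    """Label of the model with the highest log marginal likelihood."""
    return max(ln_marginal_likelihoods, key=ln_marginal_likelihoods.get)


def tmrca_to_year(tmrca_years, latest_sampling_year):
    """Calendar year of a common ancestor dated `tmrca_years` before the latest sample."""
    return latest_sampling_year - tmrca_years


hypothetical_ln_ml = {
    "strict/constant": -10250.0,
    "relaxed-lognormal/constant": -10238.5,
    "relaxed-exponential/expansion": -10241.0,
}
top = best_model(hypothetical_ln_ml)
ln_bf = log_bayes_factor(hypothetical_ln_ml[top],
                         hypothetical_ln_ml["strict/constant"])
log10_bf = ln_bf / math.log(10)

# The Abstract reports a tMRCA of 73 to 100 years, assumed here to be counted back from 2005.
interval = (tmrca_to_year(100, 2005), tmrca_to_year(73, 2005))
print(top, round(ln_bf, 1), round(log10_bf, 2), interval)
```

    Working with log marginal likelihoods avoids numerical underflow, since the likelihoods themselves are far too small to represent directly.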

    Acknowledgments

    The authors appreciate the input from two anonymous reviewers, which has significantly improved the manuscript. S. R. P. K. acknowledges the Council of Scientific and Industrial Research (Government of India) for providing a fellowship to carry out this research. The authors thank Mrs S. S. Athawale and Dr A. B. Sudeep for their help with virus propagation.

    References