Abstract
The GenBank/EMBL/DDBJ accession numbers for the sequences reported in this work are AJ582128–AJ582131 and AJ781117–AJ781124.
Serum samples were obtained from 20 patients with chronic hepatic disease from Hospital Nacional Eduardo Rebagliati Martins (Lima, Peru) and Hospital de Clinicas (Montevideo, Uruguay). The Peruvian patients were screened by using an enzyme immunoassay (Innogenetics) and a confirmatory line immunoassay test (Innogenetics), according to the manufacturer's instructions. The Uruguayan patients were screened by using an enzyme immunoassay (Abbott) according to the manufacturer's instructions.
RNA extraction and cDNA synthesis and amplification.
HCV RNA was extracted from serum samples (100 µl) by using a QIAamp viral RNA kit (Qiagen) according to the manufacturer's instructions. Extracted RNA was eluted from the columns with 50 µl RNase-free water, and cDNA synthesis and PCR amplification of the core region were carried out as described by Bukh et al. (1994). To avoid false-positive results, the recommendations of Kwok & Higuchi (1989) were strictly adhered to. Amplicons were purified by using a QIAquick PCR purification kit (Qiagen) according to the manufacturer's instructions.
Sequencing.
The primers used for amplification were also used for sequencing the PCR fragments. All PCR fragments were sequenced in both directions to avoid discrepancies. The sequencing reaction was carried out by using a BigDye DNA sequencing kit on a 373 DNA sequencer apparatus (both from Perkin Elmer).
Sequence analysis.
The amino acid sequences of the core protein, as well as the F protein (obtained from the core gene in the F reading frame), were aligned by using the CLUSTAL W program (Thompson et al., 1994).
Substitution-rate analysis.
Substitution rates along the HCV F protein were measured by using a sliding window. Pairwise nucleotide distances (synonymous and non-synonymous) within each window were estimated by the method of Comeron (1995), as implemented in the computer program K-estimator. For those windows where the method was inapplicable (due to the negative argument of the logarithm), we used the Jukes–Cantor method (Jukes & Cantor, 1969). The window size used was 30 codons, shifting three codons at a time. The ratios of non-synonymous (dn) to synonymous (ds) substitutions for the core and F proteins were calculated by using data obtained from the computer program SNAP, as implemented by Korber (2000).
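The windowing scheme described above (30 codons, shifted three codons at a time) can be sketched as follows. This is a minimal illustration, not the K-estimator implementation: it computes a single Jukes–Cantor distance per window from the proportion of differing nucleotides and does not separate synonymous from non-synonymous changes as the method of Comeron (1995) does. The function names and the assumption of aligned, gap-free coding sequences are ours.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    Returns None when the argument of the logarithm is not positive
    (p >= 0.75), i.e. when the correction cannot be applied.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return None
    return -0.75 * math.log(arg)


def sliding_windows(n_codons, window=30, step=3):
    """Yield (start, end) codon indices of every full window."""
    for start in range(0, n_codons - window + 1, step):
        yield start, start + window


def window_distances(seq_a, seq_b, window=30, step=3):
    """Jukes-Cantor distance per window for two aligned coding sequences.

    Assumes (for this sketch) equal-length, gap-free sequences whose
    length is a whole number of codons.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and made of whole codons")
    n_codons = len(seq_a) // 3
    distances = []
    for start, end in sliding_windows(n_codons, window, step):
        a = seq_a[3 * start:3 * end]
        b = seq_b[3 * start:3 * end]
        diffs = sum(x != y for x, y in zip(a, b))
        distances.append(jukes_cantor(diffs / len(a)))
    return distances
```

A `None` entry in the returned list marks a window where the logarithm is undefined, mirroring the kind of inapplicability mentioned in the text.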
In order to gain insight into the pattern of amino acid substitutions in the F protein, the deduced HCV F amino acid sequences, obtained from the core gene sequences from the South American patients, were aligned with those from 12 other strains that were representative of all six HCV types isolated elsewhere for which total sequences have been obtained. The origin of the sequences and the strains used are listed in Table 1. Once aligned, we compared the dn/ds ratios obtained for the F and core proteins from all pairwise comparisons among all strains involved in these studies. Examples of the results of these studies are shown in Table 2. Unexpectedly, very high dn/ds ratios were found for the F protein in comparison with the values found for the core protein. The mean dn/ds ratio for all pairwise comparisons for the F protein was 2·5, whereas the mean ratio for the core protein was 0·10.
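The averaging step can be illustrated with a short sketch. We assume here that the reported mean is the arithmetic mean of the per-pair dn/ds ratios (rather than the ratio of mean dn to mean ds) and that pairs with ds equal to zero are excluded; neither detail is stated in the text. The function name and input layout are hypothetical.

```python
def mean_dn_ds(pairwise):
    """Mean of per-pair dn/ds ratios.

    `pairwise` maps a pair of strain names to a (dn, ds) tuple.
    Pairs with ds == 0 are skipped because the ratio is undefined.
    """
    ratios = [dn / ds for dn, ds in pairwise.values() if ds > 0]
    if not ratios:
        raise ValueError("no pair has a non-zero ds")
    return sum(ratios) / len(ratios)
```

A mean ratio above 1 computed in this way corresponds to the situation described for the F protein, whereas a value well below 1 corresponds to that described for the core protein.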
Table 1. Origins of HCV strains
Table 2. Amino acid substitution rates for the F and core proteins across all HCV genotypes
Within-gene covariation between synonymous and non-synonymous substitutions
To estimate how substitution rates vary along the F protein, we used a sliding-window analysis of the rates of synonymous and non-synonymous substitutions.
As shown in Fig. 1, the rates of non-synonymous substitutions were significantly higher than those of synonymous substitutions for all pairwise comparisons, in agreement with previous results (Table 2). Interestingly, the profiles of synonymous and non-synonymous distances exhibited low covariation. This means that those regions of the F protein that are more divergent at the amino acid level are not more divergent at the synonymous level (see Fig. 1).
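One way to put a number on the covariation between the two profiles is a Pearson correlation coefficient across windows. The text does not state which statistic, if any, was used, so the sketch below is an assumption offered for illustration only; a coefficient near zero would correspond to the low covariation described.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` and `ys` would be the per-window synonymous and non-synonymous distances for one pairwise comparison, with windows lacking an estimate removed from both profiles beforehand.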
For the purpose of testing whether the observed pattern of divergence was governed by a deterministic force, such as natural selection, it was necessary to analyse processes of divergence between phylogenetically independent lineages. Therefore, as shown in Fig. 1, we obtained the profiles of synonymous and non-synonymous distances for different HCV genotypes and subtypes. The differences between the pairs of profiles were evident for all examples shown (see Fig. 1). This indicated that the pattern of conservation/divergence along the F protein was not due to deterministic forces and that the intragenic distribution of synonymous and non-synonymous substitutions was random in the HCV F protein.
Distribution of stop codons in the F protein across all HCV genotypes
In order to gain insight into the functionality of the F protein, we studied the distribution of stop codons across all HCV genotypes and subtypes available in the HCV databases. As shown in Table 3, some genotypes, particularly 2 and 3, had more stop codons in their F proteins than subtypes 1a and 1b. This means that the overall structure of the F protein varies greatly among the different genotypes, which may have important consequences in relation to the functionality of the protein in different genotypes and subtypes.
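Counting stop codons in an alternative reading frame of a coding sequence can be sketched as below. The default offset of one nucleotide relative to the core reading frame is an assumption made for illustration, as are the function name and the toy sequence in the usage note; the text specifies only that the F reading frame was used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq, frame_offset=1):
    """Return codon indices (0-based, within the chosen frame) of stop codons.

    `seq` is a nucleotide sequence (DNA or RNA); `frame_offset` is the
    number of nucleotides skipped before the first codon is read.
    """
    seq = seq.upper().replace("U", "T")
    positions = []
    # Stop before any trailing partial codon.
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            positions.append((i - frame_offset) // 3)
    return positions
```

For the toy sequence `ATGAAATAG`, reading with `frame_offset=0` finds a stop at codon index 2, whereas `frame_offset=1` finds one at codon index 0 (the `TGA` that overlaps the start codon). Applied to each core gene sequence, the length of the returned list gives a per-strain stop-codon count of the kind tabulated by genotype.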
Table 3. Distribution of stop codons in the F protein across all HCV genotypes
HCV F protein is a newly discovered HCV gene product that is expressed by a translational ribosomal frameshift in the core gene. Although ribosomal frameshifting for gene expression has been demonstrated for RNA viruses of several different families, including retroviruses (Jacks & Varmus, 1985), coronaviruses (Brierley et al., 1989) and astroviruses (Marczinke et al., 1994), HCV is the first example within the family Flaviviridae to use this mechanism to express a gene that is embedded totally in another coding sequence. Little is known about the biological properties of the HCV F protein (Xu et al., 2003). It is a labile protein with a half-life of less than 10 min in Huh7 hepatoma cells and in vitro (Xu et al., 2003).
Strikingly, we found very high dn/ds ratios for all pairwise comparisons across all HCV genotypes for the F protein (i.e. dn/ds>1; see Table 2). One possible explanation for the F protein having such a large dn/ds ratio is that its non-synonymous variation is simply the result of variation in the overlapping core gene, which is in a different reading frame. Nevertheless, these results showed a different pattern of amino acid substitutions for the F protein than for the core protein and other regions of the HCV genome.
We found a high degree of genetic variability and high amino acid substitution rates along the F protein (Fig. 1). This is in contrast to the results commonly found in other viral systems, such as human immunodeficiency virus (Zanotto et al., 1999) and hepatitis A virus (Costa-Mattioli et al., 2003), and even to those for other HCV proteins, such as the core protein (not shown). This suggests that deterministic forces are not acting to conserve a particular domain or region.
The results of this work are in agreement with previous reports indicating that the F protein displays no clear sequence homologies to other proteins of known function, except that it is highly basic (Xu et al., 2001). Interestingly, the F protein does not appear to be essential for viral RNA replication, as its absence did not abolish the replication of an HCV RNA replicon in Huh7 hepatoma cells (Lohmann et al., 1999; Blight et al., 2000).
HCV chimeras have been constructed and shown to be infectious (Yagani et al., 1998) and the HCV F protein has been expressed (Roussel et al., 2003). Taking this into account, specific experiments can be designed to determine whether HCV F proteins from different HCV genotypes show differences in specific functions, such as morphogenesis, replication and ligand interactions. These will provide a definitive picture of the role of the F protein in the biology of HCV.
We acknowledge support from ICGEB, PAHO and RELAB through Project CRP.LA/URU03-032. We thank Dr Fabián Alvarez-Valin, from Sección Biomatemáticas, Facultad de Ciencias, Montevideo, Uruguay, for helpful discussions. We thank anonymous reviewers from previous versions of this manuscript for helpful suggestions.
References
Blight, K. J., Kolykhalov, A. A. & Rice, C. M. (2000). Efficient initiation of HCV RNA replication in cell culture. Science 290, 1972–1974.
Boulant, S., Becchi, M., Penin, F. & Lavergne, J.-P. (2003). Unusual multiple recoding events leading to alternative forms of hepatitis C virus core protein from genotype 1b. J Biol Chem 278, 45785–45792.
Brierley, I., Digard, P. & Inglis, S. C. (1989). Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot. Cell 57, 537–547.
Bukh, J., Purcell, R. H. & Miller, R. H. (1994). Sequence analysis of the core gene of 14 hepatitis C virus genotypes. Proc Natl Acad Sci U S A 91, 8239–8243.
Choi, J., Xu, Z. & Ou, J. (2003). Triple decoding of hepatitis C virus RNA by programmed translational frameshifting. Mol Cell Biol 23, 1489–1497.
Comeron, J. M. (1995). A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J Mol Evol 41, 1152–1159.
Costa-Mattioli, M., Ferré, V., Casane, D. & 7 other authors (2003). Evidence of recombination in natural populations of hepatitis A virus. Virology 311, 51–59.
Jacks, T. & Varmus, H. E. (1985). Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting. Science 230, 1237–1242.
Jukes, T. H. & Cantor, C. R. (1969). Evolution of protein molecules. In Mammalian Protein Metabolism, pp. 21–132. Edited by H. N. Munro. New York: Academic Press.
Korber, B. (2000). HIV sequence signatures and similarities. In Computational and Evolutionary Analysis of HIV Molecular Sequences, pp. 55–72. Edited by A. G. Rodrigo & G. H. Learn, Jr. Dordrecht: Kluwer.
Kwok, S. & Higuchi, R. (1989). Avoiding false positives with PCR. Nature 339, 237–238.
Lohmann, V., Körner, F., Koch, J.-O., Herian, U., Theilmann, L. & Bartenschlager, R. (1999). Replication of subgenomic hepatitis C virus RNAs in a hepatoma cell line. Science 285, 110–113.
Marczinke, B., Bloys, A. J., Brown, T. D. K., Willcocks, M. M., Carter, M. J. & Brierley, I. (1994). The human astrovirus RNA-dependent RNA polymerase coding region is expressed by ribosomal frameshifting. J Virol 68, 5588–5595.
Reed, K. E. & Rice, C. M. (2000). Overview of hepatitis C virus genome structure, polyprotein processing, and protein properties. Curr Top Microbiol Immunol 242, 55–84.
Roussel, J., Pillez, A., Montpellier, C., Duverlie, G., Cahour, A., Dubuisson, J. & Wychowski, C. (2003). Characterization of the expression of the hepatitis C virus F protein. J Gen Virol 84, 1751–1759.
Simmonds, P. (1999). Viral heterogeneity of the hepatitis C virus. J Hepatol 31 (Suppl. 1), 54–60.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Xu, Z., Choi, J., Yen, T. S. B., Lu, W., Strohecker, A., Govindarajan, S., Chien, D., Selby, M. J. & Ou, J. (2001). Synthesis of a novel hepatitis C virus protein by ribosomal frameshift. EMBO J 20, 3840–3848.
Xu, Z., Choi, J., Lu, W. & Ou, J. (2003). Hepatitis C virus F protein is a short-lived protein associated with the endoplasmic reticulum. J Virol 77, 1578–1583.
Yagani, M., St Claire, M., Shapiro, M., Emerson, S. U., Purcell, R. H. & Bukh, J. (1998). Transcripts of a chimeric cDNA clone of hepatitis C virus genotype 1b are infectious in vivo. Virology 244, 161–172.
Zanotto, P. M. de A., Kallas, E. G., de Souza, R. F. & Holmes, E. C. (1999). Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153, 1077–1089.
Received 9 August 2004; accepted 27 September 2004.