Hepatitis C virus F protein sequence reveals a lack of functional constraints and a variable pattern of amino acid substitution

Abstract

Hepatitis C virus (HCV) is an important human pathogen that affects 170 million people worldwide. The HCV genome is an RNA molecule that is approximately 9·6 kb in length and encodes a polyprotein that is cleaved proteolytically to generate at least 10 mature viral proteins. Recently, a new HCV protein named F has been described, which is synthesized as a result of a ribosomal frameshift. Little is known about the biological properties of this protein, but the possibility that the F protein may participate in HCV morphogenesis or replication has been raised. In this work, the presence of functional constraints in the F protein was investigated. It was found that the rate of amino acid substitutions along the F protein was significantly higher than the rate of synonymous substitutions, and comparisons involving genes that represented independent phylogenetic lineages yielded very different divergence/conservation patterns. The distribution of stop codons in the F protein across all HCV genotypes was also investigated; genotypes 2 and 3 were found to have more stop codons than genotype 1. The results of this work suggest strongly that the pattern of divergence in the F protein is not affected by functional constraints.

The GenBank/EMBL/DDBJ accession numbers for the sequences reported in this work are AJ582128–AJ582131 and AJ781117–AJ781124.

Hepatitis C virus (HCV) is the major causative agent of post-transfusion hepatitis and parenterally transmitted, non-A, non-B hepatitis throughout the world (Alter & Seeff, 2000). HCV is an enveloped RNA virus that is classified in the family Flaviviridae. HCV has high genomic variability and at least six different genotypes and an increasing number of subtypes have been reported (Simmonds, 1999). The HCV genome is approximately 9·6 kb in length and encodes a polyprotein that is cleaved proteolytically to generate at least 10 mature viral protein products (Reed & Rice, 2000). Recently, a new protein named F has been described; it is expressed as a result of a ribosomal frameshift within the capsid-encoding sequence, a mechanism unique among members of the family Flaviviridae (Xu et al., 2001; Choi et al., 2003). This protein was localized in the cytoplasm of infected cells, with a notable perinuclear localization (Roussel et al., 2003), and was found to be associated with the endoplasmic reticulum (Xu et al., 2003). This subcellular localization of HCV F protein is similar to that of the HCV core and NS5A proteins, raising the hypothesis that the F protein may participate in HCV morphogenesis or replication (Xu et al., 2003). In addition, sera from patients who were positive for HCV genotype 1a or 1b were shown to react differently to synthetic peptides encoded by the F reading frame, and these findings have suggested genotype-dependent specific features for the F protein (Boulant et al., 2003). To help clarify these issues, we analysed genetic variability and amino acid substitution rates in the HCV F protein.

Serum samples.
Serum samples were obtained from 20 patients with chronic hepatic disease from Hospital Nacional Eduardo Rebagliati Martins (Lima, Peru) and Hospital de Clinicas (Montevideo, Uruguay). The Peruvian patients were screened by using an enzyme immunoassay (Innogenetics) and a confirmatory line immunoassay test (Innogenetics), according to the manufacturer's instructions. The Uruguayan patients were screened by using an enzyme immunoassay (Abbott) according to the manufacturer's instructions.

RNA extraction and cDNA synthesis and amplification.
HCV RNA was extracted from serum samples (100 µl) by using a QIAamp viral RNA kit (Qiagen) according to the manufacturer's instructions. Extracted RNA was eluted from the columns with 50 µl RNase-free water, and cDNA synthesis and PCR amplification of the core region were carried out as described by Bukh et al. (1994). To avoid false-positive results, the recommendations of Kwok & Higuchi (1989) were strictly adhered to. Amplicons were purified by using a QIAquick PCR purification kit (Qiagen) according to the manufacturer's instructions.

Sequencing.
The primers used for amplification were also used for sequencing the PCR fragments. All PCR fragments were sequenced in both directions to avoid discrepancies. The sequencing reaction was carried out by using a BigDye DNA sequencing kit on a 373 DNA sequencer apparatus (both from Perkin Elmer).

Sequence analysis.
The amino acid sequences of the core protein, as well as the F protein (obtained from the core gene in the F reading frame), were aligned by using the CLUSTAL W program (Thompson et al., 1994).

Substitution-rate analysis.
Substitution rates along the HCV F protein were measured by using a sliding window. Pairwise nucleotide distances (synonymous and non-synonymous) within each window were estimated by the method of Comeron (1995), as implemented in the computer program K-estimator. For those windows where the method was inapplicable (due to the negative argument of the logarithm), we used the Jukes–Cantor method (Jukes & Cantor, 1969). The window size used was 30 codons, shifting three codons at a time. The ratios of non-synonymous (dn) to synonymous (ds) substitutions for the core and F proteins were calculated by using data obtained from the computer program SNAP, as implemented by Korber (2000).
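
The windowing scheme described above (30 codons, shifted three codons at a time) can be sketched as follows. This is a minimal illustration and not the K-estimator or SNAP implementation: it applies the Jukes–Cantor correction to the raw proportion of differing nucleotides in each window and does not separate synonymous from non-synonymous sites. The function names and the choice to return NaN where the logarithm is undefined are assumptions made for this sketch.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites.

    Returns NaN when the argument of the logarithm is not positive,
    i.e. when the correction is inapplicable.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        return float("nan")
    return -0.75 * math.log(arg)


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def sliding_windows(n_codons, size=30, step=3):
    """Yield (start, end) codon indices (0-based, end exclusive)."""
    for start in range(0, n_codons - size + 1, step):
        yield start, start + size


def window_profile(seq_a, seq_b, size=30, step=3):
    """Return (window mid-point codon, distance) pairs for two aligned
    coding sequences of equal frame."""
    n_codons = min(len(seq_a), len(seq_b)) // 3
    profile = []
    for start, end in sliding_windows(n_codons, size, step):
        a = seq_a[3 * start:3 * end]
        b = seq_b[3 * start:3 * end]
        profile.append((start + size // 2, jukes_cantor(p_distance(a, b))))
    return profile
```

Plotting the second element of each pair against the first gives a distance profile of the kind shown in Fig. 1, with the caveat that a full analysis would compute synonymous and non-synonymous distances separately.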

Amino acid substitution rates in the F and core regions across HCV genotypes
In order to gain insight into the pattern of amino acid substitutions in the F protein, the deduced HCV F amino acid sequences, obtained from the core gene sequences from the South American patients, were aligned with those from 12 other strains, isolated elsewhere, that were representative of all six HCV types and for which complete sequences have been obtained. The origin of the sequences and the strains used are listed in Table 1. Once aligned, we compared the dn/ds ratios obtained for the F and core proteins from all pairwise comparisons among all strains involved in these studies. Examples of the results of these studies are shown in Table 2. Unexpectedly, very high dn/ds ratios were found for the F protein in comparison with the values found for the core protein. The mean dn/ds ratio for all pairwise comparisons for the F protein was 2·5, whereas the mean ratio for the core protein was 0·10.
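
As a rough illustration of how a mean dn/ds ratio over pairwise comparisons can be obtained, the sketch below averages the ratios for a set of strain pairs. The strain labels and distance values are hypothetical placeholders and are not taken from Table 2; skipping pairs with ds = 0 is an assumption of this sketch, not a documented behaviour of SNAP.

```python
from statistics import mean


def dn_ds_ratio(dn, ds):
    """Return dn/ds, or None when ds is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def mean_ratio(pairwise):
    """Mean dn/ds over all pairs with a defined ratio.

    `pairwise` maps a (strain, strain) tuple to a (dn, ds) tuple.
    """
    ratios = [dn_ds_ratio(dn, ds) for dn, ds in pairwise.values()]
    defined = [r for r in ratios if r is not None]
    return mean(defined) if defined else None


# Hypothetical values for illustration only.
example = {
    ("A", "B"): (0.4, 0.2),   # ratio 2.0: excess of non-synonymous change
    ("A", "C"): (0.1, 0.2),   # ratio 0.5: excess of synonymous change
    ("B", "C"): (0.1, 0.0),   # undefined, skipped
}
print(mean_ratio(example))
```

A ratio below 1 is usually read as evidence of purifying selection on the encoded protein, whereas a ratio above 1 indicates that amino acid changes accumulate faster than silent changes.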

Table 1. Origins of HCV strains

Table 2. Amino acid substitution rates for the F and core proteins across all HCV genotypes

Within-gene covariation between synonymous and non-synonymous substitutions
In order to examine how substitution rates vary along the F protein, we used a sliding-window analysis of the rates of synonymous and non-synonymous substitutions.

As shown in Fig. 1, the rates of non-synonymous substitutions were significantly higher than those of synonymous substitutions for all pairwise comparisons, in agreement with previous results (Table 2). Interestingly, the profiles of synonymous and non-synonymous distances exhibited low covariation. This means that those regions of the F protein that are more divergent at the amino acid level are not more divergent at the synonymous level (see Fig. 1).
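
One simple way to quantify covariation between the two profiles is a correlation coefficient computed across windows. The text does not state which statistic was used, so the Pearson coefficient below is an assumption made for illustration; `ds_profile` and `dn_profile` are hypothetical names for the per-window synonymous and non-synonymous distances.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric profiles.

    Returns NaN if either profile has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def covariation(ds_profile, dn_profile):
    """Correlation between per-window synonymous and non-synonymous
    distances; values near zero indicate low covariation."""
    return pearson(ds_profile, dn_profile)
```

A coefficient close to zero would correspond to the low covariation described here, whereas a value close to 1 would mean that windows divergent at the amino acid level are also divergent at synonymous sites.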


Fig. 1. Profiles of synonymous and non-synonymous distances in the HCV F protein. Numbers on the y axis denote distance. Numbers on the x axis show the codon positions in the mid-point of the window. Synonymous substitutions are shown by a broken line and non-synonymous substitutions by a solid line. The following comparisons are shown: (a) strains JK1 (genotype 1b) and V-D (genotype 3); (b) strains H77 (genotype 1a) and J8 (genotype 2); (c) strains J8 (genotype 2) and V-D (genotype 3); (d) strains V-D (genotype 3) and ED43 (genotype 4); (e) strains ED43 (genotype 4) and EUH (genotype 5); and (f) strains EUH (genotype 5) and euhk (genotype 6).

For the purpose of testing whether the observed pattern of divergence was governed by a deterministic force, such as natural selection, it was necessary to analyse processes of divergence between phylogenetically independent lineages. Therefore, as shown in Fig. 1, we obtained the profiles of synonymous and non-synonymous distances for different HCV genotypes and subtypes. The differences between the pairs of profiles were evident for all examples shown (see Fig. 1). This indicated that the pattern of conservation/divergence along the F protein was not due to deterministic forces and that the intragenic distribution of synonymous and non-synonymous substitutions was random in the HCV F protein.

Distribution of stop codons in the F protein across all HCV genotypes
In order to gain insight into the functionality of the F protein, we studied the distribution of stop codons across all HCV genotypes and subtypes available in the HCV databases. As shown in Table 3, some genotypes, particularly 2 and 3, had more stop codons in their F proteins than subtypes 1a and 1b. This means that the overall structure of the F protein varies greatly among the different genotypes, which may have important consequences in relation to the functionality of the protein in different genotypes and subtypes.
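
Counting stop codons in an alternative reading frame can be sketched as follows. The core-gene sequence is translated starting at a given offset and the number of stop codons is counted. The offset of the F frame relative to the core frame is passed as a parameter; the value of 1 used as the default is an assumption for illustration, as are the function names. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, offset=0):
    """Translate a nucleotide sequence starting at `offset`.

    Stop codons are written as '*'; codons containing ambiguous bases
    or gaps are written as 'X'. A trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def count_stops(seq, offset=1):
    """Number of stop codons in the frame beginning at `offset`."""
    return translate(seq, offset).count("*")


def stops_per_sequence(sequences, offset=1):
    """Map each sequence name to its stop-codon count in the given frame."""
    return {name: count_stops(s, offset) for name, s in sequences.items()}
```

Applied to a set of core-gene sequences grouped by genotype, `stops_per_sequence` would give the kind of per-genotype tally summarized in Table 3.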

Table 3. Distribution of stop codons in the F protein across all HCV genotypes

HCV F protein is a newly discovered HCV gene product that is expressed by a translational ribosomal frameshift in the core gene. Although ribosomal frameshifting for gene expression has been demonstrated for RNA viruses of several different families, including retroviruses (Jacks & Varmus, 1985), coronaviruses (Brierley et al., 1989) and astroviruses (Marczinke et al., 1994), HCV is the first example within the family Flaviviridae to use this mechanism to express a gene that is embedded totally in another coding sequence. Little is known about the biological properties of the HCV F protein (Xu et al., 2003). It is a labile protein with a half-life of less than 10 min in Huh7 hepatoma cells and in vitro (Xu et al., 2003).

Strikingly, we found very high dn/ds ratios for all pairwise comparisons across all HCV genotypes for the F protein (i.e. dn/ds>1; see Table 2). One possible explanation for the F protein having such a large dn/ds ratio is that its non-synonymous variation is simply the result of variation in the overlapping core gene, which is in a different reading frame. Nevertheless, these results showed a different pattern of amino acid substitutions for the F protein than for the core protein and other regions of the HCV genome.
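
The effect of overlapping reading frames on dn/ds can be made concrete with a toy example: a single nucleotide change that is silent in one frame may alter the amino acid encoded in the overlapping frame. The sketch below classifies one substitution in two frames. The nine-nucleotide sequence is an artificial example and not an HCV sequence, and the frame offset of 1 is an assumption for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)
}


def codon_at(seq, pos, offset):
    """Codon covering position `pos` in the frame starting at `offset`,
    or None if the position lies outside a complete codon."""
    if pos < offset:
        return None
    start = offset + 3 * ((pos - offset) // 3)
    if start + 3 > len(seq):
        return None
    return seq[start:start + 3]


def classify(seq, pos, new_base, offset):
    """Classify a substitution at `pos` as synonymous or non-synonymous
    in the frame starting at `offset`."""
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = codon_at(seq, pos, offset)
    after = codon_at(mutated, pos, offset)
    if before is None:
        return None
    same = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same else "non-synonymous"


toy = "GCTGCAGCC"  # artificial sequence, frame 0 reads Ala-Ala-Ala
print(classify(toy, 2, "C", offset=0))  # GCT -> GCC in frame 0
print(classify(toy, 2, "C", offset=1))  # CTG -> CCG in frame +1
```

In this toy case the same T-to-C change leaves alanine unchanged in frame 0 but replaces leucine with proline in the shifted frame, which is how third-position changes tolerated in one gene can inflate the non-synonymous count of an overlapping one.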

We found a high degree of genetic variability and high amino acid substitution rates along the F protein (Fig. 1). This is in contrast to the results commonly found in other viral systems, such as human immunodeficiency virus (Zanotto et al., 1999) and hepatitis A virus (Costa-Mattioli et al., 2003), and even to those found for other HCV proteins, such as the core protein (not shown). This suggests that deterministic forces are not acting to conserve a particular domain or region.

The results of this work are in agreement with previous reports indicating that the F protein displays no clear sequence homologies to other proteins of known function, except that it is highly basic (Xu et al., 2001). Interestingly, the F protein does not appear to be essential for viral RNA replication, as its absence did not abolish the replication of an HCV RNA replicon in Huh7 hepatoma cells (Lohmann et al., 1999; Blight et al., 2000).

HCV chimeras have been constructed and shown to be infectious (Yanagi et al., 1998) and the HCV F protein has been expressed (Roussel et al., 2003). Taking this into account, specific experiments can be designed to determine whether HCV F proteins from different HCV genotypes show differences in specific functions, such as morphogenesis, replication and ligand interactions. These will provide a definitive picture of the role of the F protein in the biology of HCV.

We acknowledge support from ICGEB, PAHO and RELAB through Project CRP.LA/URU03-032. We thank Dr Fabián Alvarez-Valin, from Sección Biomatemáticas, Facultad de Ciencias, Montevideo, Uruguay, for helpful discussions. We thank the anonymous reviewers of previous versions of this manuscript for helpful suggestions.

References

Alter, H. J. & Seeff, L. B. (2000). Recovery, persistence, and sequelae in hepatitis C virus infection: a perspective on long-term outcome. Semin Liver Dis 20, 17–35.

Blight, K. J., Kolykhalov, A. A. & Rice, C. M. (2000). Efficient initiation of HCV RNA replication in cell culture. Science 290, 1972–1974.

Boulant, S., Becchi, M., Penin, F. & Lavergne, J.-P. (2003). Unusual multiple recoding events leading to alternative forms of hepatitis C virus core protein from genotype 1b. J Biol Chem 278, 45785–45792.

Brierley, I., Digard, P. & Inglis, S. C. (1989). Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot. Cell 57, 537–547.

Bukh, J., Purcell, R. H. & Miller, R. H. (1994). Sequence analysis of the core gene of 14 hepatitis C virus genotypes. Proc Natl Acad Sci U S A 91, 8239–8243.

Choi, J., Xu, Z. & Ou, J. (2003). Triple decoding of hepatitis C virus RNA by programmed translational frameshifting. Mol Cell Biol 23, 1489–1497.

Comeron, J. M. (1995). A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J Mol Evol 41, 1152–1159.

Costa-Mattioli, M., Ferré, V., Casane, D. & 7 other authors (2003). Evidence of recombination in natural populations of hepatitis A virus. Virology 311, 51–59.

Jacks, T. & Varmus, H. E. (1985). Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting. Science 230, 1237–1242.

Jukes, T. H. & Cantor, C. R. (1969). Evolution of protein molecules. In Mammalian Protein Metabolism, pp. 21–132. Edited by H. N. Munro. New York: Academic Press.

Korber, B. (2000). HIV sequence signatures and similarities. In Computational and Evolutionary Analysis of HIV Molecular Sequences, pp. 55–72. Edited by A. G. Rodrigo & G. H. Learn, Jr. Dordrecht: Kluwer.

Kwok, S. & Higuchi, R. (1989). Avoiding false positives with PCR. Nature 339, 237–238.

Lohmann, V., Körner, F., Koch, J.-O., Herian, U., Theilmann, L. & Bartenschlager, R. (1999). Replication of subgenomic hepatitis C virus RNAs in a hepatoma cell line. Science 285, 110–113.

Marczinke, B., Bloys, A. J., Brown, T. D. K., Willcocks, M. M., Carter, M. J. & Brierley, I. (1994). The human astrovirus RNA-dependent RNA polymerase coding region is expressed by ribosomal frameshifting. J Virol 68, 5588–5595.

Reed, K. E. & Rice, C. M. (2000). Overview of hepatitis C virus genome structure, polyprotein processing, and protein properties. Curr Top Microbiol Immunol 242, 55–84.

Roussel, J., Pillez, A., Montpellier, C., Duverlie, G., Cahour, A., Dubuisson, J. & Wychowski, C. (2003). Characterization of the expression of the hepatitis C virus F protein. J Gen Virol 84, 1751–1759.

Simmonds, P. (1999). Viral heterogeneity of the hepatitis C virus. J Hepatol 31 (Suppl. 1), 54–60.

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

Xu, Z., Choi, J., Yen, T. S. B., Lu, W., Strohecker, A., Govindarajan, S., Chien, D., Selby, M. J. & Ou, J. (2001). Synthesis of a novel hepatitis C virus protein by ribosomal frameshift. EMBO J 20, 3840–3848.

Xu, Z., Choi, J., Lu, W. & Ou, J. (2003). Hepatitis C virus F protein is a short-lived protein associated with the endoplasmic reticulum. J Virol 77, 1578–1583.

Yanagi, M., St Claire, M., Shapiro, M., Emerson, S. U., Purcell, R. H. & Bukh, J. (1998). Transcripts of a chimeric cDNA clone of hepatitis C virus genotype 1b are infectious in vivo. Virology 244, 161–172.

Zanotto, P. M. de A., Kallas, E. G., de Souza, R. F. & Holmes, E. C. (1999). Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153, 1077–1089.

Received 9 August 2004; accepted 27 September 2004.