Sequence analysis of the guanylyltransferase (VP3) of group A rotaviruses

Abstract

The RNA segment encoding the guanylyltransferase (VP3) from 12 group A rotavirus isolates has been sequenced following RT-PCR and molecular cloning of the full-length amplicons produced. Alignment of the derived amino acid sequences including those of the four VP3 sequences available from GenBank revealed two levels of sequence divergence. Virus isolates from humans showed greater than 94 % sequence identity, whereas those isolated from different mammalian species showed as low as 79 % sequence identity. The exceptions were avian virus isolates, which diverged ∼45 % from those of mammalian origin, and the human virus isolates DS1 and 69M, which showed much closer (over 90 %) identity to viruses of bovine origin, suggesting that these human isolates may have undergone recent reassortment events with a bovine virus. Analysis of the sequences for a putative enzymic active site has revealed that the KXTAMDXEXP and KXXGNNH motifs around amino acids 385 and 545, respectively, are conserved across both group A and C rotaviruses.

The genome of rotaviruses consists of 11 segments of double-stranded (ds) RNA ranging in size from ∼3·3 kbp to ∼660 bp (Estes, 2001). With the exception of the smallest segment, which is dicistronic (Mattion et al., 1991; Mitchell & Both, 1988), each RNA encodes a single protein (McCrae & McCorquodale, 1982). Rotaviruses have been divided into a number of distinct groups that show no antigenic cross-reactivity and very limited sequence similarity (Pedley et al., 1983, 1986; Saif & Jiang, 1994). There are currently five groups, A–E, with the majority of rotaviruses falling into group A. The protein product of the third largest RNA segment, VP3, has been shown to be located inside the core of the virion associated with genomic RNA (McCrae & Faulkner-Valle, 1981). Several lines of evidence have indicated that at least one function of this protein is to act as a guanylyltransferase (Liu et al., 1992; Pizarro et al., 1991) associated with the 5′ end capping of the virion mRNAs synthesized by the virion-associated RNA transcriptase activity. In comparison with some of the other group A rotavirus genes, that encoding VP3 has only been subjected to limited comparative sequence analysis, with a total of only four sequences being available in the databases and, of these, only one being from a human virus isolate. The recognition that VP3 was a target for the cell-mediated immune response in our work on effector T cell responses to rotaviruses (Heath, 1996) has prompted us to extend the sequence analysis of the gene encoding VP3.

Rotavirus strains (human virus isolates Wa G1P[8], DS1 G2P[4], Hochi G4P[8], ST3 G4P[6], 69M G8P[10], WI-61 G9P[8], A64 G10P[?] and L26 G12P[4]; porcine isolates OSU G5P[7] and YM1 G11P[7]; bovine isolate UKtc G6P[5]; equine isolate L338 G13P[?]; avian isolate CH-2 G7P[?]) were all grown in African green monkey kidney cells (BSC-1) after pre-treatment with 10 μg trypsin ml⁻¹ as described previously (McCrae & Faulkner-Valle, 1981). Total cytoplasmic RNA was extracted from infected cells as described previously (Johnson & McCrae, 1989). The gene encoding VP3 from each of these isolates was subjected to RT-PCR using primers specific for the first 25 bases at the 5′ terminus and the final 26 bases at the 3′ terminus of RNA segment 3, derived from the human KU G1P[8] strain sequence present in GenBank (accession no. AB022767). The sequences of the primers used were: 5′ end primer, 5′-GGCTATTAAAGCAGTACTAGTAGTG-3′; 3′ end complementary primer, 5′-GGTCACATCATGACTAGTGTGTTAAG-3′. RT-PCR was carried out using either a single reaction mixture (Xu et al., 1990) or the more traditional two-step approach, in which an aliquot of the reverse-transcribed viral RNA was added to a separate reaction mixture for the PCR step (Ball, 2000). In some cases, the Qiagen One-Step RT-PCR kit was used according to the manufacturer's instructions for long template regions. The ∼2·6 kbp PCR amplicons were isolated from 1 % agarose gels and cloned into the plasmid pCR2.1 using the TA cloning kit (Invitrogen) according to the manufacturer's instructions. Sequencing of the viral gene cloned in pCR2.1 was carried out using the BigDye Terminator version 3.1 chemistry (Applied Biosystems), with the fragments being resolved on an ABI Prism 3100 Genetic Analyzer. In order to exclude sequence errors introduced by Taq DNA polymerase, at least three independent clones were sequenced for each virus isolate. The consensus VP3 sequences obtained were deposited in GenBank and given the following accession numbers: Wa, AY267335; DS1, AY277914; Hochi, AY277915; ST3, AY277919; 69M, AY277916; WI-61, AY277917; A64, AY277920; L26, AY277918; OSU, AY277921; YM1, AY300922; UKtc, AY300923; L338, AY277922; and CH-2, AY277923. Other sequences already in GenBank included SA11 G3P[2], X16062; RF G6P[?], AY116592; and PO-13 G7P[?], AB009631. The deduced amino acid sequences were aligned using CLUSTAL W (Thompson et al., 1994) at NPS@:Network Protein Sequence Analysis (Combet et al., 2000).
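As a quick check on the primer description above, the following minimal Python sketch confirms the stated primer lengths (25 and 26 bases) and derives the plus-sense sequence to which the 3′ end complementary primer anneals. The primer strings are copied from the text; the helper name reverse_complement is an illustrative assumption and is not part of the original methods.

```python
# Primer sequences as quoted in the text, written 5'->3'.
FORWARD_PRIMER = "GGCTATTAAAGCAGTACTAGTAGTG"
REVERSE_PRIMER = "GGTCACATCATGACTAGTGTGTTAAG"

# Translation table for unambiguous DNA bases only (A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Plus-sense 3' terminal sequence that the complementary primer anneals to.
plus_sense_3prime_end = reverse_complement(REVERSE_PRIMER)

print(len(FORWARD_PRIMER), len(REVERSE_PRIMER))  # 25 26
print(plus_sense_3prime_end)  # CTTAACACACTAGTCATGATGTGACC
```

Because these terminal bases are fixed by the primers, they cannot be taken as independent sequence data for the isolates amplified.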

The RT-PCR strategy used relied on there being a very high level of sequence conservation in the 5′ and 3′ untranslated regions, as indicated by early terminal sequencing and fingerprinting studies (Clarke & McCrae, 1983; Imai et al., 1983; McCrae & McCorquodale, 1983). The validity of this strategy was confirmed by the fact that, in all cases, a major PCR amplicon of the expected size was obtained. The mammalian isolates examined in general had VP3-encoding genes of 2591 nt with 5′ and 3′ untranslated regions of 49 and 34 nt, respectively. The two exceptions to this rule were the bovine isolates; the RF isolate has one additional base in the 5′ untranslated sequence and the gene from the UKtc isolate is one base shorter in the 3′ untranslated region. The mammalian isolate genes all had a single long open reading frame (ORF) encoding a protein of 835 aa with a predicted molecular mass of ∼98 kDa. This molecular mass estimate is consistent with that observed for VP3 by PAGE analysis, and supports the conclusion reached in the earlier protein analyses that this protein is not subjected to any major post-translational modifications in virus-infected cells. The VP3 gene of the avian virus isolate, CH-2, was found to be shorter than those of the mammalian isolates, at 2583 nt, with an ORF capable of encoding a VP3 protein of 829 aa, which nevertheless still had a predicted molecular mass of ∼98 kDa. This is in broad agreement with the observed molecular mass (Kang et al., 1987). The 5′ and 3′ untranslated regions of the avian isolate gene were 49 and 44 nt, respectively (the terminal 25 and 26 bases, respectively, were derived from the PCR primers).
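The gene lengths quoted above can be reconciled with the untranslated region and ORF sizes by simple arithmetic. The sketch below assumes that the ORF begins immediately after the 5′ untranslated region and is followed by a single stop codon that is counted in neither the protein length nor the 3′ untranslated region; the function name gene_length is illustrative only.

```python
def gene_length(utr5: int, orf_codons: int, utr3: int) -> int:
    """Total gene length in nt: 5' UTR + coding codons + one stop codon + 3' UTR.

    Assumes a single stop codon that is not counted in orf_codons or utr3.
    """
    return utr5 + 3 * orf_codons + 3 + utr3


# Typical mammalian isolate: 49 nt 5' UTR, 835 aa ORF, 34 nt 3' UTR.
mammalian = gene_length(utr5=49, orf_codons=835, utr3=34)

# Avian isolate CH-2: 49 nt 5' UTR, 829 aa ORF, 44 nt 3' UTR.
avian = gene_length(utr5=49, orf_codons=829, utr3=44)

print(mammalian, avian)  # 2591 2583
```

Under these assumptions the figures agree with the reported lengths of 2591 nt and 2583 nt.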

Alignment of the amino acid sequences deduced from the corresponding nucleotide sequences allowed estimates to be made of the extent of amino acid identity between the various isolates examined including those already lodged in GenBank. The results (Table 1) revealed two levels of overall amino acid identity; for virus isolates originating from humans, sequence identity was greater than 94 % with the exception of DS1 and 69M (see later). By contrast, isolates originating from other species showed sequence identities as low as 79 % when compared with human isolates. Interestingly, these values are similar in magnitude to those seen for the external shell glycoprotein VP7 when virus isolates from the same and different G-serotypes are compared. There were two exceptions to these levels of amino acid identity. First, the level of divergence at ∼45 % (CH-2 versus PO-13, 95 %; CH-2 versus KU, 56 %; and PO-13 versus KU, 56 %) between virus isolates of mammalian and avian origin was much greater than that between isolates made from different mammalian hosts (Table 1). Secondly, the two human virus isolates DS1 and 69M showed much closer sequence identity to viruses of bovine origin than to other human virus isolates (Table 1). This suggests that these two human virus isolates may in fact be recent genetic reassortants carrying some genes originating from the human virus isolates but with their VP3 genes having been acquired from a bovine virus through reassortment. In the case of the 69M isolate, previous studies (Qian & Green, 1991) on the gene encoding the external shell protein VP4 of this isolate revealed that it had a higher level of sequence similarity to virus isolates of animal origin than those coming from humans. By contrast, the NSP1 gene sequences of DS1 and 69M are clearly not of animal origin (Hua et al., 1993; Xu et al., 1994). Phylogenetic tree analysis of the data (Fig. 1) reiterates the interpretation that can be made from the simple percentage identity figures. The higher levels of sequence identity seen in virus isolates from the same host species may indicate that the selective pressure for evolutionary change in VP3 is for the optimization of its interaction with species-specific host proteins. At the individual amino acid level, proline residues at 15 positions (P120, P211, P223, P289, P309, P313, P392, P401, P426, P502, P652, P697, P709, P717, P796) and cysteine residues at four positions (C144, C380, C381, C659) were completely conserved, suggesting that they may have structural importance.
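The percentage identity figures discussed above are derived from pairwise comparisons within a multiple alignment. The following Python sketch illustrates one common way of computing such values. It is not the procedure used by the alignment server; in particular, the choice to skip any column containing a gap is an assumption, and the three short sequences are invented toy data rather than VP3 sequences.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption; other tools use different denominators).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise percentage identities for every pair of aligned sequences."""
    return {
        (a, b): percent_identity(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }


# Invented toy alignment, for illustration only.
TOY_ALIGNMENT = {
    "isoA": "MKVLAPTG",
    "isoB": "MKVLSPTG",
    "isoC": "MRV-APSG",
}

for (a, b), pid in identity_matrix(TOY_ALIGNMENT).items():
    print(f"{a} vs {b}: {pid:.1f} %")
```

The same function applies unchanged to aligned nucleotide sequences, which is how both halves of an identity matrix of the kind shown in Table 1 could be filled.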

Fig. 1.

Phylogenetic tree compiled from the multiple alignment of group A mammalian VP3 amino acid sequences. Trees were constructed using the neighbour-joining method and drawn using TreeView (Page, 1996). Scale bar indicates an estimated sequence divergence of 10 %. Numbers at nodes indicate bootstrap probabilities with confidence limits of over 70 % (700 out of 1000 trials).

Table 1.

Identity matrix for all identified mammalian group A rotavirus VP3 sequences

Figures are percentage identities of nucleotides (upper right-hand portion) and amino acids (lower left-hand portion).

Previous studies aimed at identifying active site motifs within the eukaryotic cellular and DNA virus guanylyltransferases identified the motif Kx[D/N]G, with K and G being required for enzyme activity (Fresco & Buratowski, 1994). Within the Reoviridae, guanylyltransferase activity is associated with different proteins: VP3 in rotaviruses, VP4 in orbiviruses and λ2 in orthoreoviruses. These proteins have little discernible sequence similarity. Initial studies in mammalian reoviruses found that GMP bound to lysine 226 (K226) of λ2 in the sequence KPTNG, which closely resembled the motif mentioned above (Fausnaugh & Shatkin, 1990). However, it was subsequently found that mutagenesis of K226 did not abolish transferase activity, whereas the mutational change K190A caused a complete loss of activity and K171A resulted in activity being greatly reduced, and, as a result, a novel active site motif, KDLS, was suggested (Luongo et al., 2000). A modified version of this motif, Kx[V/L/I]S, was conserved in the limited number of rotavirus VP3 and orbivirus VP4 sequences available at the time this study was carried out. However, when the KLVS motif of orbivirus or the KRIS motif of rotavirus was substituted in place of the KDLS motif of the reovirus λ2 protein by mutagenesis, its guanylyltransferase activity was greatly reduced, implying that the modified motif did not encompass the active site of the enzyme (Luongo, 2002). This conclusion was strengthened when an avian reovirus homologue of λ2 was sequenced and found not to contain the expected motif (Hsiao et al., 2002). The greatly expanded number of rotavirus VP3 sequences resulting from the present study allows a reassessment of possible active site motifs. Alignment of the mammalian VP3 sequences revealed 35 completely conserved lysine residues (data not shown). However, when the avian and three group C rotavirus VP3 sequences from GenBank were included in the alignment, the number of completely conserved lysine residues dropped to six (data not shown), and excluded the Kx[V/L/I]S motif, as the avian VP3 sequences did not have a lysine residue in the requisite position (Fig. 2a). Of the six lysine residues conserved across all rotavirus VP3 sequences, four positions are single residue conservations and hence are probably not part of a conserved motif. The sequence around K383 is, however, reasonably well conserved across the rotavirus sequences (Fig. 2b) and includes the TAMD sequence that is a possible casein kinase II phosphorylation site. The sequence around K541 in the mammalian group A rotavirus consensus sequence is also well conserved (Fig. 2c). This motif also closely resembles the KPTNG sequence originally identified in the orthoreoviruses and associated with guanylyltransferase activity (Fausnaugh & Shatkin, 1990). This analysis therefore highlights the need to carry out site-directed mutagenesis to identify the active site region of the rotavirus VP3 guanylyltransferase unequivocally; however, it does give an indication of which regions of the protein are good initial targets for such studies.
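The two operations underlying the analysis above, namely counting lysine residues that are conserved in every sequence of an alignment and testing sequences against a motif such as Kx[V/L/I]S, can be sketched in Python as follows. This is an illustration under stated assumptions and not the procedure actually used: the function names are hypothetical, the toy alignment is invented, and positions are reported as alignment columns rather than in the group A mammalian numbering used in the text.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert motif notation (x or X = any residue, [D/N] = alternatives)
    into a regular expression, e.g. 'Kx[D/N]G' -> 'K.[DN]G'."""
    return re.sub(r"[xX]", ".", motif.replace("/", ""))


def conserved_columns(alignment: list[str], residue: str = "K") -> list[int]:
    """1-based alignment columns in which every sequence carries `residue`."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [
        i + 1
        for i, column in enumerate(zip(*alignment))
        if all(c == residue for c in column)
    ]


# The tetrapeptides named in the text all fit the modified motif.
pattern = motif_to_regex("Kx[V/L/I]S")
for peptide in ("KDLS", "KLVS", "KRIS"):
    print(peptide, bool(re.fullmatch(pattern, peptide)))

# Invented toy alignment: adding a more divergent sequence
# reduces the number of completely conserved lysine columns.
close_relatives = ["MKAKLKS", "MKGKLKS"]
with_divergent = close_relatives + ["MKA-LRS"]
print(conserved_columns(close_relatives))  # [2, 4, 6]
print(conserved_columns(with_divergent))   # [2]
```

Note that the pentapeptide KPTNG does not satisfy the four-residue pattern derived from Kx[D/N]G exactly, which is consistent with its description in the text as closely resembling, rather than matching, that motif.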

Fig. 2.

Portions of an alignment of the consensus group A mammalian rotavirus VP3 sequence with the two avian and three group C sequences. Alignments and consensus sequence were derived using CLUSTAL W (Thompson et al., 1994). The numbering is based on the group A mammalian sequence. Asterisks represent identity, colons represent high similarity, periods (full stops) represent low similarity, and spaces represent no similarity.

Acknowledgments

This work was supported by a grant from The Wellcome Trust.

References

Ball, J. (2000). Analysis of virus quasispecies. In RNA Viruses, pp. 105–140. Edited by A. J. Cann. Oxford: IRL Press.
Clarke, I. N. & McCrae, M. A. (1983). The molecular biology of rotaviruses. VI. RNA species-specific terminal conservation in rotaviruses. J Gen Virol 64, 1877–1884.
Combet, C., Blanchet, C., Geourjon, C. & Deleage, G. (2000). NPS@: network protein sequence analysis. Trends Biochem Sci 25, 147–150.
Estes, M. K. (2001). Rotaviruses and their replication. In Virology, pp. 1747–1786. Edited by D. M. Knipe. Baltimore: Lippincott Williams & Wilkins.
Fausnaugh, J. & Shatkin, A. J. (1990). Active site localization in a viral mRNA capping enzyme. J Biol Chem 265, 7669–7672.
Fresco, L. D. & Buratowski, S. (1994). Active site of the mRNA-capping enzyme guanylyltransferase from Saccharomyces cerevisiae: similarity to the nucleotidyl attachment motif of DNA and RNA ligases. Proc Natl Acad Sci U S A 91, 6624–6628.
Heath, R. R. (1996). The role of cell-mediated immune response to rotavirus infection. PhD thesis, University of Warwick, UK.
Hsiao, J., Martinez-Costas, J., Benavente, J. & Vakharia, V. N. (2002). Cloning, expression, and characterization of avian reovirus guanylyltransferase. Virology 296, 288–299.
Hua, J., Mansell, E. A. & Patton, J. T. (1993). Comparative analysis of the rotavirus NS53 gene: conservation of basic and cysteine rich regions in the protein and possible stem-loop structures in the RNA. Virology 196, 372–378.
Imai, M., Akatani, K., Ikegami, N. & Furuichi, Y. (1983). Capped and conserved terminal structures in human rotavirus genome double-stranded RNA segments. J Virol 47, 125–136.
Johnson, M. A. & McCrae, M. A. (1989). Molecular biology of rotaviruses. VIII. Quantitative analysis of regulation of gene expression during virus replication. J Virol 63, 2048–2055.
Kang, S. Y., Nagaraja, K. V. & Newman, J. A. (1987). Characterization of viral polypeptides from avian rotavirus. Avian Dis 31, 607–621.
Liu, M., Mattion, N. M. & Estes, M. K. (1992). Rotavirus VP3 expressed in insect cells possesses guanylyltransferase activity. Virology 188, 77–84.
Luongo, C. L. (2002). Mutational analysis of a mammalian reovirus mRNA capping enzyme. Biochem Biophys Res Commun 291, 932–938.
Luongo, C. L., Reinisch, K. M., Harrison, S. C. & Nibert, M. L. (2000). Identification of the guanylyltransferase region and active site in reovirus mRNA capping protein lambda2. J Biol Chem 275, 2804–2810.
Mattion, N. M., Mitchell, D. B., Both, G. W. & Estes, M. K. (1991). Expression of rotavirus proteins encoded by alternative open reading frames of genome segment 11. Virology 181, 295–304.
McCrae, M. A. & Faulkner-Valle, G. P. (1981). Molecular biology of rotaviruses. I. Characterization of basic growth parameters and pattern of macromolecular synthesis. J Virol 39, 490–496.
McCrae, M. A. & McCorquodale, J. G. (1982). The molecular biology of rotaviruses. II. Identification of the protein-coding assignments of calf rotavirus genome RNA species. Virology 117, 435–443.
McCrae, M. A. & McCorquodale, J. G. (1983). Molecular biology of rotaviruses. V. Terminal structure of viral RNA species. Virology 126, 204–212.
Mitchell, D. B. & Both, G. W. (1988). Simian rotavirus SA11 segment 11 contains overlapping reading frames. Nucleic Acids Res 16, 6244.
Page, R. D. (1996). TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357–358.
Pedley, S., Bridger, J. C., Brown, J. F. & McCrae, M. A. (1983). Molecular characterization of rotaviruses with distinct group antigens. J Gen Virol 64, 2093–2101.
Pedley, S., Bridger, J. C., Chasey, D. & McCrae, M. A. (1986). Definition of two new groups of atypical rotaviruses. J Gen Virol 67, 131–137.
Pizarro, J. L., Sandino, A. M., Pizarro, J. M., Fernandez, J. & Spencer, E. (1991). Characterization of rotavirus guanylyltransferase activity associated with polypeptide VP3. J Gen Virol 72, 325–332.
Qian, Y. & Green, K. Y. (1991). Human rotavirus strain 69M has a unique VP4 as determined by amino acid sequence analysis. Virology 182, 407–412.
Saif, L. J. & Jiang, B. (1994). Nongroup A rotaviruses of humans and animals. Curr Top Microbiol Immunol 185, 339–371.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Xu, L., Harbour, D. & McCrae, M. A. (1990). The application of polymerase chain reaction to the detection of rotaviruses in faeces. J Virol Methods 27, 29–37.
Xu, L., Tian, Y., Tarlow, O., Harbour, D. & McCrae, M. A. (1994). Molecular biology of rotaviruses. IX. Conservation and divergence in gene segment 5. J Gen Virol 75, 3413–3421.