Animal: DNA Viruses

The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox

  • (1) Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
  • (2) Department of Infectious Diseases, Division of Investigative Science, Faculty of Medicine, Imperial College, St Mary’s Campus, Norfolk Place, London W2 1PG, UK
  • Author for correspondence: Geoffrey Smith (at Imperial College). Fax +44 207 594 3973. e-mail glsmith{at}ic.ac.uk
  • Journal of General Virology 2002; 83(4):855–872

    Abstract

    Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.

    Introduction

    Poxviruses are complex viruses that replicate in the cytoplasm and encode many enzymes and immunomodulatory proteins (Moss, 1996 ). They are classified into vertebrate (Chordopoxvirinae) and insect (Entomopoxvirinae) subfamilies. Chordopoxviruses (ChPVs) are subdivided into eight genera, and members of six of these have been sequenced. These are vaccinia virus (VV) strains Copenhagen (COP) (Goebel et al., 1990 ), Tian Tan (accession no. AF095689), modified vaccinia virus Ankara (MVA) (Antoine et al., 1998 ) and Western Reserve (Smith et al., 1991 and references therein), variola virus (VAR) strains Bangladesh-1975 (BSH) (Massung et al., 1993a , 1994 ), India-1967 (IND) (Shchelkunov et al., 1995 ) and Garcia-1966 (GAR) (Shchelkunov et al., 2000 ), myxoma virus (MYX) strain Lausanne (Cameron et al., 1999 ), Shope fibroma virus (SFV) strain Kaza (Willer et al., 1999 ), molluscum contagiosum virus (MCV) (Senkevich et al., 1997 ), fowlpox virus (FPV) (Afonso et al., 2000 ), lumpy skin disease virus (Tulman et al., 2001 ) and Yaba-like disease virus (YLDV) (Lee et al., 2001 ). In addition, the sequence of 50 kb from each end of the genome of cowpox virus (CPV) strain GRI-90 (Shchelkunov et al., 1998 ), and parts of the Yaba monkey tumour virus (accession nos AB025319, AB018404 and AB015885), swinepox virus (Massung et al., 1993b ) and ectromelia virus (Chen et al., 2000 ) genomes have been reported.

    Camelpox virus (CMPV) is a poorly characterized orthopoxvirus (OPV) that causes a severe and economically important disease in camels, especially young animals (McGrane & Higgins, 1986 ). The discovery of CMPV caused concern during the WHO-sponsored smallpox eradication campaign because it was described as smallpox-like (Baxby, 1972 ). CMPV and VAR, the cause of smallpox, each produce a systemic illness in a single host species, form a small, white pock on the chorioallantoic membrane of a fertile hen’s egg, have a similar ceiling temperature for growth and show restricted replication in rabbit skin (Fenner et al., 1989 ). Despite these similarities, CMPV and VAR are distinguishable (Bedson, 1972 ; Baxby, 1974 ; Esposito & Knight, 1985 ), and CMPV has rarely, if ever, caused disease in man (Jezek et al., 1983 ). Likewise, VAR is unable to cause disease in camels, although camels immunized with VAR are resistant to subsequent infection with CMPV (Baxby et al., 1975 ).

    Smallpox was eradicated by vaccination. Originally cowpox virus (CPV) was the vaccine, but vaccinia virus (VV) (Fenner et al., 1988 ), a virus of unknown origin (Baxby, 1981 ), was used in the 20th century. CMPV, VAR, CPV and VV are all OPVs, a ChPV genus that also includes monkeypox virus and ectromelia virus (Fenner et al., 1989 ). To increase our understanding of CMPV and of OPV phylogeny we sequenced the genome of CMPV strain CMS (CMPV-CMS), a virulent virus isolated in 1970 from Iran (Baxby, 1972 ), and also the termini of CMPV strain 903 (CMPV-903) (Douglass & Dumbell, 1996 ) isolated from Somalia. Hitherto, CMPV sequence data were restricted to a few individual genes (Binns, 1992 ; Meyer & Rziha, 1993 ; Douglass & Dumbell, 1996 ).

    Analyses of the CMPV genome sequence, the arrangement of open reading frames (ORFs), the protein sequences and the nature of the repeats within the inverted terminal repeats (ITRs) all showed that CMPV was more closely related to VAR than to any other virus.

    Methods

    ▪ Virus and cells.

    Human TK−143 cells and monkey kidney BS-C-1 cells were grown as described previously (Mathew et al., 2001 ). CMPV-CMS and CMPV-903 were kindly provided by Keith Dumbell (Cape Town, South Africa). Viruses were plaque purified twice on monolayers of BS-C-1 cells, and virus stocks were grown in TK−143 cells.

    ▪ Virus purification and DNA extraction.

    CMPV was purified from the cytoplasm of infected TK−143 cells by sedimentation through two successive sucrose density gradients (Mackett et al., 1985 ), and DNA was extracted from purified virions as for other OPVs (Esposito et al., 1981 ).

    ▪ Construction of CMPV DNA libraries and shotgun sequencing.

    The CMPV DNA sequence was determined using the random shotgun sequencing method (Bankier et al., 1987 ). Virus DNA was sheared by sonication and DNA fragments of 2–5 kb and 5–10 kb were cloned into pUC118 (Yanisch-Perron et al., 1985 ). Plasmid DNA was extracted using a Qiagen Biorobot 9600. The DNA library containing fragments of 2–5 kb was used for most sequencing, whereas fragments of 5–10 kb were useful for gap filling. DNA was sequenced with M13 forward and reverse universal primers or specially designed oligonucleotides (BioLabs) on an Applied Biosystems model 373 Sequencer using the cycle sequencing method with fluorescent dye terminators and AmpliTaq DNA polymerase FS (ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction, Perkin-Elmer). Applied Biosystems sequence software was used for lane tracking and trace extraction and data were transferred to UNIX workstations for all further processing.

    ▪ Genomic DNA assembly.

    Raw data were processed using the program Pregap4, including Phred (Ewing & Green, 1998 ) and Phrap (Ewing et al., 1998 ). A consensus sequence was produced and edited on the graphical interface of Gap4 (Bonfield et al., 1995 ; Bonfield & Staden, 1996 ). Oligonucleotide primers were designed, and genomic DNA composition, inverted repeats and restriction enzyme patterns were determined, with the Wisconsin Package of the Genetics Computer Group (GCG) (Devereux et al., 1984 ).

    ▪ Bioinformatic analysis.

    ORFs were identified with NIP4 software (Staden & McLachlan, 1982 ). Database searches and amino acid sequence analyses were run and viewed using PIX (http://www.hgmp.mrc.ac.uk/Registered/Webapp/pix/). Related proteins were aligned with CLUSTALW (Thompson et al., 1994 ) and edited with GeneDoc (Nicholas & Nicholas, 1997 ). Phylogenetic analyses used the maximum likelihood program PUZZLE version 5.0 (Strimmer & von Haeseler, 1996 , 1997 ) with the VT model of substitution (Müller & Vingron, 2000 ), and the programs SEQBOOT, PROTDIST, NEIGHBOR and CONDENSE from the PHYLIP package version 3.5 (Felsenstein, 1989 ). Phylogenetic trees were viewed with TREEVIEW (Page, 1996 ). OPV genome sequences were compared by dotplot alignment using DOTTER (Sonnhammer & Durbin, 1995 ).

    Results and Discussion

    Genome sequence

    Raw sequence data (1641717 nucleotides) were assembled into a 202182 bp contiguous sequence (average density of 8·12 readings per nucleotide). The entire sequence was read on both strands and had a base composition of 66·9% A+T. The genome was slightly longer than previously reported, but the HindIII restriction map calculated from the sequence (Fig. 1a) was consistent with published maps (Esposito & Knight, 1985 ) except for the 718 bp U fragment, which had been missed previously. The sequence obtained is predicted to extend very close to the terminal hairpins because: (i) the sizes of the terminal restriction fragments, determined by HindIII or EcoRI digestion followed by denaturation, snap-back hybridization and gel electrophoresis, were indistinguishable from those predicted by computer from the nucleotide sequence; and (ii) the sequence contains the motif 5′ TTTTTTTCTAGACACTAAAT 3′, which is identical to the VV sequence needed for the resolution of concatemeric DNA replication intermediates. In sequenced OPVs, this motif lies very close to the terminal hairpin. Compared with other OPVs (Mackett & Archard, 1979 ; DeFilippes, 1982 ), the CMPV genome has a distinctive HindIII restriction map and is clearly a separate OPV species.
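
    The read depth, the A+T content and the check for the concatemer-resolution motif quoted above reduce to simple string operations. The Python sketch below is illustrative only: the helper names are hypothetical, the motif search finds exact matches on either strand, and it is not the software used in this study (Pregap4, Gap4 and GCG; see Methods).

```python
# Illustrative sketch (hypothetical helper names): read depth, A+T content and
# an exact-match search for the concatemer-resolution motif on both strands.

RESOLUTION_MOTIF = "TTTTTTTCTAGACACTAAAT"  # 5'->3' motif quoted in the text

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mean_depth(total_read_bases: int, assembly_length: int) -> float:
    """Average number of readings per nucleotide of the assembly."""
    return total_read_bases / assembly_length


def at_content(seq: str) -> float:
    """Percentage of A+T among the A, C, G and T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


def _all_starts(text: str, pattern: str) -> list[int]:
    """0-based start of every (possibly overlapping) exact occurrence."""
    starts = []
    pos = text.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = text.find(pattern, pos + 1)
    return starts


def find_motif(genome: str, motif: str) -> list[tuple[str, int]]:
    """Return (strand, start) for each exact hit.

    `start` is the 0-based index in the top-strand string where the motif
    ('+') or its reverse complement ('-') begins.
    """
    genome, motif = genome.upper(), motif.upper()
    hits = [("+", i) for i in _all_starts(genome, motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting a palindromic motif
        hits += [("-", i) for i in _all_starts(genome, rc)]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    print(round(mean_depth(1_641_717, 202_182), 2))  # 8.12, as reported above
```

    Searching the reverse complement as well matters here because the motif is expected near both hairpins, and the right-end copy reads on the opposite strand of the top-strand sequence.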

    Fig. 1. (a) HindIII restriction map of the CMPV genome predicted from the DNA sequence. Fragments are lettered A to U in decreasing size according to existing convention (Mackett & Archard, 1979 ). ITRs are indicated with open boxes. (b) ORF map of the CMPV genome. ORFs are represented by coloured arrows (green, assembly/structural; light blue, RNA metabolism; dark blue, DNA metabolism; yellow, host range; red, immunomodulators; black, unknown; grey, gene fragments) and are numbered from the left to the right end of the genome based on the position of the first methionine codon (Table 1). They were assigned the designation L (left) or R (right) to represent the direction of transcription. The ITRs are indicated by horizontal bars at each end of the genome. Each line represents just under 26 kb.

    Inverted terminal repeats

    The OPV genome consists of a linear dsDNA molecule with covalently closed (hairpin) termini and ITRs. ITRs studied previously contain two unique sequences, non-repeated regions 1 and 2 (NR1 and NR2), and one or more blocks of tandem repeats. Two sets of repeats are present in the ITRs of VV-COP (Goebel et al., 1990 ), CPV-GRI (Shchelkunov et al., 1998 ) and racoonpox virus (RCN) (Parsons & Pickup, 1987 ) but not in most VAR strains (Massung et al., 1995 ) (see Fig. 4). NR1 is adjacent to the terminal hairpin loop and contains the motif essential for the resolution of concatemeric DNA replication intermediates (DeLange et al., 1986 ; Merchlinsky & Moss, 1986 ). In contrast, NR2 is located after the first set of repeats and its function remains unknown.

    The CMPV-CMS ITR is 6045 bp long and encodes protein to within 650 bp of the terminus. This differs from VV (Goebel et al., 1990 ) and CPV (Shchelkunov et al., 1998 ) but closely resembles VAR (Massung et al., 1994 ; Shchelkunov et al., 1995 , 2000 ). In the terminal 630 bp, there is a single block of tandem repeats consisting of three 70 bp units, followed by a 52 bp incomplete unit and a 27 bp related sequence. No equivalent of NR2 was identified within the CMPV genome, indicating that NR2 is not essential for OPV replication.
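
    Counting a block of tandem repeats, such as the three 70 bp units and the 52 bp incomplete unit described above, can be sketched as follows. This hypothetical helper is for illustration and is not the method used in the study: it assumes the position of the first unit is already known and tolerates a fixed number of substitutions (not insertions or deletions) per copy, so real ITR repeats containing indels would need an alignment-based approach.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_tandem_copies(seq: str, start: int, unit_len: int,
                        max_mismatch: int = 0) -> tuple[int, int]:
    """Count head-to-tail copies of the unit seq[start:start + unit_len].

    Returns (number of complete copies, length of the trailing partial copy).
    A copy is accepted if it differs from the first unit at no more than
    `max_mismatch` positions. The partial copy is the longest prefix of the
    unit that matches within the same mismatch budget; short chance matches
    after the block can therefore inflate it slightly.
    """
    unit = seq[start:start + unit_len]
    copies, pos = 1, start + unit_len
    while (pos + unit_len <= len(seq)
           and hamming(seq[pos:pos + unit_len], unit) <= max_mismatch):
        copies += 1
        pos += unit_len

    partial = mismatches = 0
    while partial < unit_len and pos + partial < len(seq):
        if seq[pos + partial] != unit[partial]:
            mismatches += 1
            if mismatches > max_mismatch:
                break
        partial += 1
    return copies, partial
```

    On the synthetic string `"AAACCCGGGT" * 3 + "AAACCC" + "TTTT"`, `count_tandem_copies(s, 0, 10)` returns `(3, 6)`: three complete 10 bp copies followed by a 6 bp incomplete unit, the same pattern (on a toy scale) as the CMPV terminal repeat block.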

    General arrangement of ORFs

    Translation of the CMPV DNA sequence identified 206 ORFs, comprising those of ⩾195 nucleotides that start with ATG plus smaller ORFs that are conserved in other ChPVs (Fig. 1b, Table 1). Two of the smaller ORFs encode the RNA polymerase subunit rpo7 (Amegadzie et al., 1992 ) and the intracellular mature virus (IMV) membrane protein A14.5L (Betakova et al., 2000 ). In addition, there are 61 minor ORFs that lie wholly or partly within larger ORFs (data not shown). Major ORFs have an average length of 964 nucleotides and encode proteins of 53 to 1869 amino acids; 101 are transcribed leftwards and 105 rightwards.
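
    The length criterion above (an ATG start and at least 195 nucleotides) can be illustrated with a six-frame scan. This simplified Python version is an assumption-laden sketch, not NIP4: it reports only the first ATG after each stop codon in a frame (so nested same-frame starts are not listed separately), measures length from the ATG up to but excluding the stop codon, and does not add the smaller ORFs that were retained because they are conserved in other ChPVs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(genome: str, min_len: int = 195) -> list[tuple[str, int, int]]:
    """Six-frame scan for ATG-initiated ORFs.

    Returns (strand, start, end) tuples in 0-based, half-open top-strand
    coordinates; `end` includes the stop codon. An ORF is kept when the
    distance from the ATG to the stop codon (stop excluded) is >= min_len.
    """
    genome = genome.upper()
    n = len(genome)
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first ATG after the previous stop
                elif codon in STOP_CODONS:
                    if i - start >= min_len:
                        if strand == "+":
                            orfs.append(("+", start, i + 3))
                        else:
                            # map reverse-strand indices back to the top strand
                            orfs.append(("-", n - (i + 3), n - start))
                    start = None
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))
```

    An ORF running off the end of the sequence without a stop codon is not reported, which is a reasonable default for a complete genome but would need changing for partial sequences.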

    Table 1. Major ORFs encoded by CMPV

    CMPV has a genome organization very similar to other ChPVs. Protein coding sequences are contiguous and are on both DNA strands. ORFs are tightly packed and there is little noncoding DNA. Some ORFs towards the middle of the genome are even slightly overlapping. Blocks of ORFs are transcribed in the same direction, most notably at the ends of the genome (Fig. 1b).

    The CMPV ITRs contain three ORFs that are consequently diploid (1L/206R, 2L/205R and 3L/204R). A fourth ORF, 4L, crosses the left ITR boundary and represents a complete version of the CPV-GRI D4L ORF (Shchelkunov et al., 1998 ). Within the right ITR, ORF 203R represents the C-terminal 376 amino acids of 4L. Upstream of the initiating methionine of 203R, an additional 85 codons are conserved up to the internal boundary of the ITR. This suggests that during CMPV evolution, DNA was copied by a terminal transposition event from the left end of the genome to the right, rather than the converse. Other poxviruses also contain ORFs that cross only one ITR (Willer et al., 1999 ).

    Table 1 shows CMPV major ORFs, their nucleotide co-ordinates, number of amino acids and the best protein matches in public databases. Most proteins have a putative function based on amino acid similarities with other viral or cellular proteins. The general arrangement of CMPV ORFs is collinear with other OPVs. Fig. 1(b) illustrates a genetic map of the CMPV genome with predicted functions represented using a colour code. The central region encodes mainly proteins involved in DNA and RNA metabolism (dark and light blue) or in virion assembly or structure (green). In contrast, the terminal regions encode proteins with known or predicted virulence (red) or host range functions (yellow). ORFs encoding proteins of unknown function are shown in black and incomplete ORFs in grey. The latter are fragments of larger ORFs in other OPVs.

    Minimal gene complement

    Computational analyses identified 87 genes that are conserved in all sequenced ChPVs (Table 1, asterisks), and we define these as the ChPV minimal gene complement. These lie within the central genomic region (107 kb) between CMPV 44L (VV-COP F9L) and CMPV 151R (VV-COP A34R) and mostly encode proteins for RNA transcription, DNA replication or virion structure (Table 1) that are essential for virus replication. Conversely, genes within the terminal regions are non-essential and encode proteins involved in host range, virulence or immunomodulation (Fig. 1b). These non-essential genes were probably acquired later during poxvirus evolution. Consistent with this hypothesis, whereas the average A+T content of the genome is 66·9%, six of the 11 ORFs with an A+T content of less than 59% are clustered near the termini (1L/206R, 2L/205R, 5L and 6L), and several of these encode immunomodulators (see below). In addition, CMPV terminal ORFs have a codon usage that differs from the average codon usage for the whole genome (data not shown). Collectively, this suggests that these terminal genes were acquired more recently.
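
    The two compositional signals used above, an A+T content below a threshold and a codon usage that departs from the genome average, can be computed per ORF as sketched below. The 59% cut-off comes from the text; the use of total variation distance to compare codon usage is our assumption, since the text does not state how codon usage was compared, and the input dictionary of named coding sequences is hypothetical.

```python
from collections import Counter


def at_percent(cds: str) -> float:
    """A+T content (%) of a coding sequence."""
    cds = cds.upper()
    return 100.0 * (cds.count("A") + cds.count("T")) / len(cds)


def _codons(cds: str) -> list[str]:
    """In-frame codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in one coding sequence."""
    codons = _codons(cds)
    return {codon: n / len(codons) for codon, n in Counter(codons).items()}


def genome_codon_frequencies(orfs: dict[str, str]) -> dict[str, float]:
    """Codon usage pooled over all ORFs (the 'genome average')."""
    counts = Counter()
    for cds in orfs.values():
        counts.update(_codons(cds))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_distance(f1: dict[str, float], f2: dict[str, float]) -> float:
    """Total variation distance: 0 for identical usage, 1 for disjoint usage."""
    keys = set(f1) | set(f2)
    return 0.5 * sum(abs(f1.get(k, 0.0) - f2.get(k, 0.0)) for k in keys)


def flag_low_at(orfs: dict[str, str], threshold: float = 59.0) -> list[str]:
    """Names of ORFs whose A+T content is below `threshold` percent."""
    return sorted(name for name, cds in orfs.items() if at_percent(cds) < threshold)
```

    In use, one would call `flag_low_at` on all 206 ORFs and rank ORFs by `codon_usage_distance(codon_frequencies(cds), genome_codon_frequencies(orfs))`, then check whether the outliers cluster near the genome termini.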

    Gene fragments

    Within 60 kb of either end of the CMPV genome, 33 ORFs are incomplete versions, resulting from frameshift or nonsense mutations, of 23 larger ORFs found in other poxviruses. The role, if any, of gene fragments is unknown, and many are unlikely to be expressed. The retention of so many gene fragments in OPV genomes is intriguing; in CMPV they represent 16% of all ORFs. Incomplete ORFs are also found in other OPVs (Smith et al., 1991 ; Aguado et al., 1992 ). To compare fragmented genes, 94 ORFs present within the terminal genomic regions of VV-COP, VAR-BSH, CPV-GRI or CMPV are listed in Table 2. ORFs are named as in VV-COP where possible. Complete ORFs are marked with a tick, fragmented ORFs with the letter F and deleted ORFs with a horizontal dash, and the 26 ORFs that are present in all four OPVs are highlighted. Although conserved in these viruses, some of the latter genes are non-essential for virus replication (Perkus et al., 1991 ). CPV contains the greatest number of complete ORFs in these terminal regions. Comparing fragmented genes in VAR-BSH and CMPV, 20 CMPV ORFs are absent or broken in VAR, whereas 11 VAR ORFs are broken in CMPV (Table 3).
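
    Table 2 is, in effect, a presence/absence matrix with three states per virus (complete, fragmented, deleted). The sketch below shows one way to encode such a matrix and derive the two summaries used in the text: ORFs complete in every virus, and ORFs complete in one virus but fragmented or absent in another. The example rows and ORF names are hypothetical placeholders, not data from Table 2.

```python
from enum import Enum


class State(Enum):
    COMPLETE = "tick"   # complete ORF
    FRAGMENT = "F"      # fragmented ORF
    ABSENT = "-"        # deleted ORF


def complete_in_all(table: dict[str, dict[str, State]]) -> list[str]:
    """ORFs that are complete in every virus column."""
    return sorted(orf for orf, row in table.items()
                  if all(state is State.COMPLETE for state in row.values()))


def complete_here_broken_there(table: dict[str, dict[str, State]],
                               here: str, there: str) -> list[str]:
    """ORFs complete in virus `here` but fragmented or absent in `there`."""
    return sorted(orf for orf, row in table.items()
                  if row[here] is State.COMPLETE
                  and row[there] is not State.COMPLETE)


# Hypothetical rows (ORF name -> state per virus), for illustration only.
EXAMPLE = {
    "orfA": {"VV": State.COMPLETE, "VAR": State.COMPLETE,
             "CPV": State.COMPLETE, "CMPV": State.COMPLETE},
    "orfB": {"VV": State.ABSENT, "VAR": State.FRAGMENT,
             "CPV": State.COMPLETE, "CMPV": State.COMPLETE},
    "orfC": {"VV": State.COMPLETE, "VAR": State.COMPLETE,
             "CPV": State.COMPLETE, "CMPV": State.FRAGMENT},
}
```

    With the real table, `complete_in_all` would correspond to the highlighted rows, and the two directions of `complete_here_broken_there` between CMPV and VAR to the two halves of Table 3.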

    Table 2. Comparison of OPV ORFs near the genome termini

    Table 3. CMPV ORFs that are broken in or absent from the VAR-BSH genome, and VAR-BSH ORFs that are broken in the CMPV genome

    ORFs specific to CMPV

    Hitherto, CPV contained all known OPV genes from terminal genomic regions, suggesting that CPV was the closest virus to the ancestral OPV (Shchelkunov et al., 1998 ). However, five ORFs (8R, 9L, 182R, 183R and 184R) are unique to CMPV. ORFs 8R and 9L are short (65 and 81 amino acids, respectively) and are absent from other OPVs because of rearrangement of the corresponding DNA; no related proteins were detected in public databases. In contrast, ORFs 182R, 183R and 184R encode polypeptides with 23 to 31% amino acid identity to the very large VAR-BSH protein B22R. Complete ORFs related to VAR-BSH B22R are present in CMPV (202R), ectromelia virus (Chen et al., 2000 ), CPV-GRI (Shchelkunov et al., 1998 ), MCV (Senkevich et al., 1997 ), FPV (Afonso et al., 2000 ), SFV (Willer et al., 1999 ), MYX (Cameron et al., 1999 ), YLDV (Lee et al., 2001 ) and lumpy skin disease virus (Tulman et al., 2001 ) but are absent from VV (Goebel et al., 1990 ). These ORFs encode the largest OPV proteins (∼214 kDa), which are predicted membrane glycoproteins of unknown function. The similarity of CMPV ORFs 182R, 183R and 184R to 202R suggests that these smaller ORFs are remnants of another member of this family.

    Immunomodulatory proteins

    CMPV infection of camels can produce severe disease, suggesting CMPV may interfere with the host response to infection. CMPV expresses soluble proteins that bind IFN-γ (Alcamí & Smith, 1995 ), IFN-α/β (Symons et al., 1995 ), CC chemokines (Alcamí et al., 1998a ) and tumour necrosis factor (TNF) (Alcamí et al., 1999 ), and ORFs 181R, 196R, 1L/206R and 2L/205R, respectively, are predicted to encode these activities. In addition, ORFs 11R and 23L encode proteins that are very similar to the VV epidermal growth factor (Blomquist et al., 1984 ) and soluble inhibitor of complement (Kotwal et al., 1990 ). Proteins encoded by ORFs 31L, 188R and 200R have similarity to serpins that have anti-fusion or anti-apoptotic activity; for review see Turner et al. (1995). Proteins encoded by ORFs 32L and 55L are similar to VV proteins K3L and E3L that mediate resistance to IFN; for review see Smith et al. (1998). Additionally, bioinformatic studies suggest that ORFs 6L, 176R and 201R may have immunomodulatory or host range function.

    ORF 6L

    The only counterpart of protein 6L among sequenced poxviruses is a slightly shorter protein (S1R) (210 amino acids) in CPV-GRI (Shchelkunov et al., 1998 ). Hydropathy plots of 6L predict an integral membrane protein with five or six transmembrane domains and a putative signal peptide. Protein 6L is closely related to an uncharacterized human protein of family UPF0005 (72% identity, 82% similarity) whose members contain several membrane-spanning domains and share a signature from the third to fourth transmembrane domain (Walter et al., 1995 ). Protein 6L is also related (33–35% identity and 50–59% similarity) to the rat glutamate binding protein (Kumar et al., 1991 ) and the Bax inhibitor-1 (BI-1) family of anti-apoptotic integral membrane proteins (Xu & Reed, 1998 ; Kawai et al., 1999 ). When overexpressed in mammalian cells, BI-1 suppressed apoptosis induced by Bax, etoposide, staurosporine and growth factor deprivation, indicating BI-1 is a regulator of cell death pathways controlled by Bcl-2 and Bax (Xu & Reed, 1998 ). Possibly 6L regulates apoptosis in CMPV-infected cells.
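
    Hydropathy plots of the kind used to predict the transmembrane domains of 6L are usually sliding-window averages of a residue hydropathy scale. The sketch below assumes the Kyte–Doolittle scale with a 19-residue window and a 1.6 cut-off, a commonly used setting for flagging candidate membrane-spanning segments; the text does not state which scale or settings were used, so this is not a reconstruction of the original analysis, and it does not attempt signal peptide prediction.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i .. i + window - 1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of windows whose mean hydropathy exceeds `threshold`.

    Returns (start, end) residue ranges, 0-based and half-open. Because of
    window smoothing, the ranges are approximate and may extend a few
    residues beyond the true hydrophobic stretch.
    """
    profile = hydropathy_profile(protein, window)
    segments, run_start = [], None
    for i, value in enumerate(profile):
        if value > threshold and run_start is None:
            run_start = i
        elif value <= threshold and run_start is not None:
            segments.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(profile) - 1 + window))
    return segments
```

    For a toy protein of 20 Asp, 25 Leu and 20 Asp residues, the sketch reports one segment, `(15, 50)`, slightly wider than the 25-residue hydrophobic core; counting such segments along 6L is the kind of evidence behind a "five or six transmembrane domains" estimate.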

    ORF 176R

    Protein 176R has similarity (36% identity, 56% similarity) with members of the Schlafen (SLFN) protein family that are expressed preferentially in lymphoid tissues and are regulated differentially during thymocyte maturation. Family members are grouped by size: a short form of about 350 amino acids (SLFN1, SLFN2) and a longer form of about 550 amino acids (SLFN3 to SLFN7). The prototype of the family, SLFN1, inhibits T cell growth and development (Schwarz et al., 1998 ). Related proteins are encoded by several OPVs; however, only CMPV 176R and CPV-GRI B2R (Shchelkunov et al., 1998 ) are undisrupted genes, whereas in VV-COP, VV-WR, VAR-BSH, VAR-IND and VAR-GAR, the ORF is fragmented. SLFN proteins are intracellular and so if 176R had a similar location and function, it is unclear how it would regulate T cell development other than after infection of T cells.

    ORF 201R

    Protein 201R contains a signal peptide and an RGD motif and shows amino acid similarity to the C-terminal domain of the OPV TNF receptors CrmB and CrmD (Alcamí et al., 1998b ). RGD motifs mediate the binding of proteins to cell-surface integrins; therefore, 201R might be a secreted protein that binds back to infected and/or uninfected cells. A similar protein is encoded by CPV-GRI gene B21R (Shchelkunov et al., 1998 ), and VV-COP contains a disrupted version of this gene (C13L and C14L) (Goebel et al., 1990 ).

    Phylogeny

    The relationship of CMPV to other OPVs was analysed by comparison of DNA sequences, predicted protein sequences, repeats within the ITRs and ORFs in the terminal regions. Each comparison gave the same conclusion: CMPV and VAR are more closely related to each other than either is to any other known virus.

    DNA sequence comparisons

    The central region of OPV genomes is highly conserved between different viruses and this close relationship allowed the pairwise alignment of complete genomes using the program DOTTER (Sonnhammer & Durbin, 1995 ). Comparison of CMPV and VAR-BSH showed their genomes are collinear except for the differing length of the ITRs and four insertions of 1·5–2·9 kb in CMPV relative to VAR (Fig. 2a, arrows). An apparent fifth gap (Fig. 2a, arrowhead) represents a region where there are several smaller rearrangements. The CMPV ITRs are longer than VAR-BSH ITRs because they contain three ORFs that are present outside the VAR-BSH ITR near the right end of the genome. A line running perpendicular to the diagonal (Fig. 2a, asterisk) illustrates the presence of this oppositely orientated region present at both ends of the CMPV genome and at only the right end of the VAR-BSH genome.
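
    A dotplot such as Fig. 2(a) marks positions where two sequences share similar words. The sketch below is a simpler exact-word variant substituted for illustration, not DOTTER's own windowed scoring, and its names are hypothetical. It records exact k-mer matches on the same strand ('+', which build the main diagonal) and on the opposite strand ('-', which build lines perpendicular to the diagonal, like the inverted region marked by the asterisk).

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dotplot_points(seq_a: str, seq_b: str, k: int = 12) -> list[tuple[int, int, str]]:
    """Return (i, j, strand) for shared k-mers.

    (i, j, '+') : seq_a[i:i+k] == seq_b[j:j+k]
    (i, j, '-') : reverse complement of seq_a[i:i+k] == seq_b[j:j+k]
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index = defaultdict(list)  # k-mer in seq_b -> list of start positions
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)

    points = []
    for i in range(len(seq_a) - k + 1):
        word = seq_a[i:i + k]
        points.extend((i, j, "+") for j in index.get(word, ()))
        points.extend((i, j, "-") for j in index.get(reverse_complement(word), ()))
    return points
```

    Plotting the '+' points for two collinear genomes gives a near-continuous diagonal; gaps in it correspond to insertions such as the four CMPV-specific regions, and a run of '-' points at the genome ends would reveal an inverted repeat.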

    Fig. 2. (a) Graphical dotplot alignment of the CMPV-CMS and VAR-BSH genomes using the program DOTTER. Regions of high similarity are shown by a diagonal line. Nucleotide numbering for each genome is indicated on the corresponding axis. Arrows indicate regions of DNA ⩾1·5 kb that are present in CMPV but not VAR. The arrowhead indicates a region of sequence rearrangement. The asterisk illustrates the CMPV ITR that is present at only the right end of the VAR genome. (b) Pairwise DNA alignment [GCG program GAP (Devereux et al., 1984 )] of the central 100 kb of the CMPV, VAR and VV genomes divided into blocks of approximately 20 kb, labelled using VV gene nomenclature. Nucleotide identities (%) are shown. (c) DNA distance matrix. DNA sequences between counterparts of VV genes F9L to A34R were aligned using the program CLUSTALW (Thompson et al., 1994 ) and a DNA distance matrix was constructed using the program PUZZLE 5.0 (Strimmer & von Haeseler, 1996 ). CMPV, CMPV-CMS; VAR, VAR-BSH; VV, VV-COP; MYX, MYX Lausanne; SFV, SFV Kaza.

    In contrast to these similarities between CMPV and VAR, more breaks were found when the genomes of VV-COP and VAR-BSH, or VV-COP and CMPV were compared (data not shown). In the terminal regions CPV-GRI could also be compared. Here too, CMPV and VAR were most closely related and this was illustrated by (i) fewer breaks in the aligned genomes and (ii) closer nucleotide sequence identity (data not shown).

    The four significant insertions in the CMPV genome compared with VAR-BSH encode ORFs that are absent from all VAR strains analysed. Near the left end of the genome, ORFs 6L and 7L are present in CPV-GRI, whereas 8R and 9L are unique to CMPV. Near the right end of the CMPV genome, most of the region encoding ORFs 182R, 183R and 184R (see above) is missing in VAR-BSH, although parts of 182R containing several frameshift mutations can be identified (in the region encoding VAR-BSH genes B9R and B10R).

    The relationships between CMPV, VAR-BSH and VV-COP were analysed further by comparison of the percentage nucleotide identity in the conserved central 110 kb of these genomes (encoding CMPV genes 44L to 151R, VV genes F9L to A34R and VAR-BSH genes C13L to A37R) (Fig. 2b). The region was divided into blocks of approximately 20 kb. Alignments of CMPV, VAR and VV showed that throughout this region nucleotide identity was ⩾91%. However, VAR and CMPV are more closely related (96·6–98·6%, average 98·0%) than are CMPV and VV (91·9–98·3%, average 96·7%) or VV and VAR (91·4–97·9%, average 96·0%) (Fig. 2b). For comparison, VAR-BSH and VAR-IND shared 99·8% nucleotide identity in this region.
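
    Block-wise percentage identity of the kind shown in Fig. 2(b) can be computed from a pairwise alignment represented as two gapped strings of equal length. The sketch below assumes one common convention, identity over columns in which neither sequence has a gap, and blocks of a fixed number of alignment columns. GCG GAP may define its percentage differently, and the blocks in the figure are only approximately 20 kb, so this would not reproduce the published values exactly.

```python
from typing import Optional


def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> Optional[float]:
    """Identity (%) over aligned columns in which neither sequence has a gap.

    Returns None when no gap-free column exists (for example, a block that
    falls entirely within an insertion).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return None
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def blockwise_identity(aln_a: str, aln_b: str,
                       block: int = 20_000) -> list[Optional[float]]:
    """Percentage identity in consecutive blocks of `block` alignment columns."""
    return [percent_identity(aln_a[i:i + block], aln_b[i:i + block])
            for i in range(0, len(aln_a), block)]
```

    Averaging the block values for each pair of viruses gives summary figures comparable in spirit to the averages quoted above (98·0% for CMPV and VAR, for example).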

    A DNA distance matrix, constructed with the program PUZZLE 5.0 (Strimmer & von Haeseler, 1997 ), was used to analyse the genetic distance between CMPV, VAR-BSH and VV-COP (Fig. 2c). In this method, identical sequences give a score of zero and unrelated sequences a score of 1. The analysis showed that the genetic distance between CMPV and VAR (0·0166) was lower than that between CMPV and VV (0·0220) or between VAR and VV (0·0267). For comparison, the corresponding regions of SFV and MYX were included: the genetic distance between these two leporipoxviruses was 0·1277 (about 8-fold greater than that between CMPV and VAR), whereas the distances between each leporipoxvirus and the three OPVs ranged from 0·5642 to 0·5696.
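
    A pairwise distance matrix can be sketched as follows. PUZZLE estimates maximum-likelihood distances under an explicit substitution model; the sketch substitutes the much simpler Jukes–Cantor correction of the observed proportion of differing sites, so it illustrates the idea (zero for identical sequences, larger values for more divergent ones) but would not reproduce the values in Fig. 2(c). Under this simpler model the distance is unbounded and becomes infinite once the observed difference reaches 0·75.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among columns where both bases are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # saturated: no finite estimate under this model
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Jukes-Cantor distances for every pair of aligned sequences."""
    names = sorted(aligned)
    return {(x, y): jukes_cantor(p_distance(aligned[x], aligned[y]))
            for x, y in combinations(names, 2)}
```

    Applied to an alignment of the central region (for example, keys such as `"CMPV"`, `"VAR"` and `"VV"`, which are hypothetical labels here), the smallest off-diagonal entry identifies the closest pair.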

    Phylogenetic analyses of protein sequences

    To investigate further the evolutionary relationships between CMPV and other poxviruses, phylogenetic trees for specific proteins were constructed from CLUSTALW alignments of protein sequences. In the terminal regions, a comparison of CMPV, VAR, CPV and VV identified 26 ORFs conserved in all these viruses (Table 2, highlighted). The percentage amino acid identities of the CMPV, CPV and VV proteins to VAR-BSH (Fig. 3a) show that in most cases (20/26) the CMPV protein was more closely related to VAR than were proteins from CPV or VV. In 3/26 cases the CMPV protein and the corresponding protein from either or both VV and CPV were equally closely related to VAR, and in only 3/26 cases was the CPV or VV protein more closely related than the CMPV protein to the VAR protein. Consensus phylogenetic trees were also constructed using the maximum likelihood program PUZZLE (Methods) from 11 of these protein sequences (asterisks in Fig. 3a) and are shown in Fig. 3(b, c) together with the corresponding, more distantly related proteins from MYX or SFV to root the tree. In Fig. 3(c) the OPV grouping is expanded to show the relationships together with the quartet puzzling support values. To obtain an independent analysis of the grouping of these four OPVs, the dataset was bootstrapped using the PHYLIP package version 3.5 (Felsenstein, 1989 ) with programs SEQBOOT, PROTDIST, NEIGHBOR and CONDENSE (Fig. 3e). Both these analyses placed VAR and CMPV together and distinct from CPV and VV.
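
    The bootstrap step (SEQBOOT) resamples alignment columns with replacement to create pseudo-replicate datasets; support for a grouping is then the number of replicates, out of 100, in which it recurs. The sketch below shows the resampling with 100 replicates and seed 123 (the settings given in the Fig. 3 legend), but Python's random number generator differs from SEQBOOT's, so the replicates are not the same. Instead of building neighbour-joining trees with PROTDIST, NEIGHBOR and CONDENSE, it uses a deliberately crude proxy, counting how often a chosen pair of sequences is the closest pair; all function names are hypothetical.

```python
import random
from itertools import combinations


def bootstrap_alignments(alignment: dict[str, str], replicates: int = 100,
                         seed: int = 123) -> list[dict[str, str]]:
    """Resample alignment columns with replacement (same length, same names)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    samples = []
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        samples.append({name: "".join(alignment[name][c] for c in columns)
                        for name in names})
    return samples


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of aligned columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment: dict[str, str]) -> tuple[str, str]:
    """Pair of names with the smallest mismatch fraction (ties: first in sorted order)."""
    return min(combinations(sorted(alignment), 2),
               key=lambda pair: mismatch_fraction(alignment[pair[0]],
                                                  alignment[pair[1]]))


def pair_support(alignment: dict[str, str], pair: tuple[str, str],
                 replicates: int = 100, seed: int = 123) -> int:
    """Number of bootstrap replicates in which `pair` is the closest pair."""
    target = tuple(sorted(pair))
    return sum(closest_pair(rep) == target
               for rep in bootstrap_alignments(alignment, replicates, seed))
```

    A proper analysis would build a tree for every replicate and count shared clades, as CONDENSE does; the closest-pair proxy only shows how resampling turns a single alignment into a support value out of 100.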

    Fig. 3. (a) VAR-BSH terminal ORFs that are conserved in CMPV-CMS, CPV-GRI-90 and VV-COP and the amino acid identities of the encoded proteins compared with VAR-BSH proteins. Consensus phylogenetic trees for 20 proteins (VV-COP F12L, E1L, E9L, I8R, J3R, H6R, D1R, D4R, D5R, D6R, D7R, D10R, D11L, D12L, A1L, A2L, A5R, A7L, A18R and A24R) encoded in the central region (b, c, d) or for 11 proteins [marked with an asterisk in (a)] encoded in the terminal regions (e, f, g) of the OPV genomes were constructed with the program PUZZLE version 5.0 (Strimmer & von Haeseler, 1996 , 1997 ) (b, c, e, f) and with the PHYLIP package version 3.5 (Felsenstein, 1989 ) (d, g). Amino acid sequences of OPV proteins were aligned using the program CLUSTALW version 1.8 (Thompson et al., 1994 ) and phylogenetic trees were viewed using TREEVIEW version 1.6.0 (Page, 1996 ). The scale bar indicates 0·1 substitutions per site (b, e) and the quartet puzzling support values for each branch are indicated in (c) and (f) (VT model of substitution, 25000 puzzling steps). The more distantly related proteins from MYX or SFV were included to root the tree. The branch lengths shown in (d) and (g) are arbitrary and the root position was forced using MYX. The numbers at the forks show the number of bootstrap repetitions, out of 100, in which the given topology was observed. Bootstrapping values were calculated using the modules SEQBOOT (random number seed 123, 100 replicates), PROTDIST (Dayhoff PAM matrix, analysis of 100 data sets), NEIGHBOR (neighbour-joining analysis of 100 data sets) and CONDENSE.

    Seventeen members of the ChPV minimal gene complement were compared next. Phylogenetic trees constructed with PUZZLE for these individual proteins did not give consistent relationships between OPVs, and three different topologies were observed. In 53% of the trees, CMPV and VAR-BSH were grouped together on the same branch independently of VV-COP. In 29% of the cases, VAR-BSH and VV-COP grouped together and were independent of CMPV. Finally, in 18% of the cases, CMPV and VV-COP grouped together independently of VAR-BSH. The inconsistency is explained by the very high conservation of these proteins (up to 99% amino acid identity). Similar results have been reported from analyses of single genes from closely related species (Huelsenbeck & Bull, 1996 ). Therefore, to obtain a reliable phylogenetic relationship, 20 proteins from the central region were compared simultaneously, as for the terminal ORFs (above). This showed that CMPV was most closely related to VAR, and that VV was more distantly related (Fig. 3e, f, g). A comparison of the scale bars in Fig. 3(b, e) shows the greater conservation of the proteins encoded in the centre of the genome.

    Collectively, the phylogenetic analysis of proteins showed VAR and CMPV are most closely related, consistent with analysis of two CMPV genes (Binns, 1992 ; Douglass & Dumbell, 1996 ), which showed CMPV, VAR and taterapoxvirus are closely related.

    Comparisons of inverted terminal repeats

    OPV ITRs vary in length; for instance, VV-COP, CMPV and VAR-BSH have ITRs of 12068, 6045 and 725 bp, respectively. Although CMPV and VAR ITRs differ in length, and therefore might appear divergent, a terminal transposition event (Moyer & Graves, 1981 ) could create larger VAR ITRs similar to those of CMPV. Alternatively, if VAR previously had ITRs of similar size to those of CMPV, a deletion of sequences from within the VAR left ITR could have created the present structure. Evidence supporting this possibility comes from analysis of CMPV ORF 4L, which crosses the left ITR boundary and is repeated in part within the right ITR (ORF 203R). In VAR, sequences related to CMPV ORF 4L are present at each end of the genome (D1* and f*; Fig. 4a, b), suggesting a longer ITR at one stage. In contrast, ORFs related to CMPV 1L/206R, 2L/205R and 3L/204R are found at only the right end of the VAR genome (G3R, G2R and G1R) (Fig. 4a). VAR strain Somalia is unusual in that its ITR is longer than those of other VAR strains, and the repeated sequences that lie outside the ITR in other VAR strains are included within the Somalia ITR. This might suggest that VAR strain Somalia represents a structure intermediate between that of most VAR strains and that of other OPVs (Massung et al., 1995 ).


    Fig. 4. Comparison of CMPV-CMS and VAR-BSH terminal ORFs and repeats within the ITRs. The leftmost 16 kb (a) and rightmost 18 kb (b) of the CMPV genome and the corresponding regions of VAR are shown. ITRs are represented with a grey background, ORFs by blue arrows, fragmented ORFs by orange arrows and regions absent in VAR by a dashed line. Asterisks represent sequences related to CMPV 4L. The block of terminal repeats is illustrated by a black box. (c) OPV terminal repeats. Different repeats are illustrated in colour as indicated. ORFs are shown by arrows. NR1 (1) and NR2 (2) sequences are represented by horizontal lines. Underlined repeats indicate the block of repeats unique to CMPV and VAR strains; in VAR strains Harvey-1947 (HAR), Garcia (GAR), Somalia-1970 (SOM) and Congo-1970 (CNG), this block contains two or more 70 bp repeats (Massung et al., 1995).

    Generally, ORFs in the terminal regions are variable between OPVs and distinctive for each virus. However, the arrangement of ORFs close to and within the ITRs of CMPV and VAR shows a higher degree of similarity (Fig. 4a, b). Firstly, in each virus, ORFs extend to within 650 bp of the termini, a feature that distinguishes CMPV and VAR from other OPVs. Secondly, outside the ITR the gene pattern is conserved, except for the absence of counterparts of CMPV 6L to 9L from the left end of VAR and of CMPV 201R from the right end.

    All OPV ITRs sequenced to date contain blocks of tandem repeats that vary in sequence, number and arrangement. A comparison of OPV terminal repeats (Fig. 4c) shows that CMPV is most similar to VAR. Firstly, CMPV strains CMS and 903 and all five sequenced strains of VAR, but not VV, CPV or raccoonpox virus (Parsons & Pickup, 1987; Massung et al., 1995), contain a block of repeats consisting of three 70 bp repeats (yellow) followed by related sequences of 52 or 64 bp (red) and 27 bp (green). Although the absolute length of these repeats varies slightly between viruses, their sequences are highly conserved, confirming their relationship. Secondly, some repeats (pink and dark blue symbols) that are shared by VV and CPV are absent from all VAR and CMPV strains. CMPV strains CMS and 903 have different numbers of 70 bp repeats, consistent with terminal length heterogeneity among CMPV isolates (Pfeffer et al., 1996), and both lack the NR2 sequence (Fig. 4c).

    Evolution of VAR and CMPV

    All the above comparisons established that VAR and CMPV are more closely related to each other than to any other known virus. This suggests either that one virus evolved from the other, or that each evolved from a closely related ancestral virus distinct from VV and CPV. The first possibility seems unlikely because each virus contains DNA sequences absent from the other (insertions of 1·5–2·9 kb in CMPV and the presence of NR2 in the VAR ITR). Evolution from a closely related ancestor, possibly a rodent virus (Fenner et al., 1988, 1989), therefore seems more probable. When this took place is uncertain, but highly infectious diseases such as measles and smallpox require human populations of between 100000 and 300000 to sustain transmission between susceptible (non-immune) hosts. During human evolution, populations of this size within a reasonably defined geographical area arose when man adopted intensive agriculture in place of an isolated hunter-gatherer existence, between 5000 and 10000 years BC. The presence of camels in areas of human and associated rodent population expansion, such as the Nile, Tigris, Euphrates, Ganges and Indus river basins, makes it possible that an ancestral OPV spread to camels at a similar time. The presence in VAR and CMPV of many broken genes that are non-essential for virus replication, yet have not been jettisoned from these genomes, suggests that VAR and CMPV are relatively recent pathogens of man and camels, respectively. Given more time, these viruses might have become better adapted to man and camels, becoming less virulent and possibly losing some of these non-essential gene fragments.

    Whatever the precise origin of VAR and CMPV, the collinearity of their genomes (except for a few insertions in CMPV), their DNA sequences, their ITRs and their encoded proteins all show that they are the closest known viruses to each other. In addition, CMPV and VAR share other distinctive biological properties, such as the ability to cause high morbidity and mortality in a single host species, similar pock morphology and ceiling temperature for growth on the chorioallantoic membrane of the fertile hen’s egg, and the inability to grow in rabbit skin (Fenner et al., 1989).

    Although smallpox has been eradicated, there are concerns about the potential use of VAR in bioterrorism, and the WHO has scheduled, and then postponed until 31 December 2002, the destruction of the last VAR stocks held in Russia and the USA. CMPV has not caused disease in man, but the possibility of an OPV such as VAR, monkeypox virus, CMPV or taterapox virus emerging or re-emerging as a threat to human health increases as the proportion of the world’s population that is immunologically naïve to OPVs increases. The parallel increase in the number of people immunosuppressed by HIV infection raises the chance of OPVs jumping species and adapting to mankind. This possibility, and the threat of bioterrorism, justify the retention of adequate stocks of vaccine (VV) to combat OPV infections. Finally, it is unclear whether all, a few, or just one of the differences between the CMPV and VAR genomes are responsible for the inability of CMPV to cause human disease. Consequently, genetic modification of CMPV to delete genes that are present in CMPV but absent from VAR might be unwise. It might also be unwise to insert into CMPV genes encoding Th2 cytokines, since a comparable insertion caused a dramatic change in ectromelia virus virulence (Jackson et al., 2001).

    Acknowledgments

    We thank Bela Tiwari, Nicki Gray and Rory Bowden for computing help. This work was supported by the UK Medical Research Council, The Wellcome Trust, the Swiss National Science Foundation (83 EU-56142), Roche, and the University of Lausanne. G.L.S. is a Wellcome Trust Principal Research Fellow.

    Footnotes

    • Accession numbers: AY009089 and AY037935.

    References