Abstract
In 2000, the full genome sequence of Ureaplasma parvum (previously known as Ureaplasma urealyticum) serovar 3 was released. In 2002, after prolonged debate, it was agreed that the former U. urealyticum should be divided into two species – U. parvum and U. urealyticum. To provide additional support for this decision and improve our understanding of the relationship between these two species, the authors studied four ‘core’ genes or gene clusters in ATCC reference strains of all 14 serovars of U. parvum and U. urealyticum. These ‘core’ regions were the rRNA gene clusters, the EF-Tu genes (tuf), urease gene clusters and multiple-banded antigen genes (mba). The known U. parvum genome sequences (GenBank accession no. NC_002162) were used as reference. DNA insertions and deletions (indels) were found in all of the gene regions studied, except tuf, but they were found only between, not within, the two species. An incidental finding was that there was inter-copy heterogeneity for rRNA gene cluster sequences. Sequence analysis (sequence heterogeneity and especially indels) of all four selected targets consistently supported the separation of human ureaplasmas into two species. Except for multiple-banded antigen, there was less heterogeneity in amino acid sequences of proteins, between species, than in the nucleic acid sequences of the corresponding genes. The degrees of heterogeneity at the 5′ end of the species-specific regions of multiple-banded antigen were almost identical for both amino acid and nucleotide sequences. Analysis of the authors' results provided an interesting case study to help resolve some common problems in the use of sequence data to infer phylogenetic relationships and support taxonomic changes. It is recommended that, to avoid confusion, the new nomenclature be used for human ureaplasmas in future publications.
Published online ahead of print on 2 July 2004 as DOI 10.1099/ijs.0.63073-0.
The GenBank/EMBL/DDBJ accession numbers for the sequences described in this article are: rRNA gene cluster, AF272599–AF272604, AF073446–AF073459, AF059322–AF059335 and AF272605–AF272630; tuf, AF270758–AF270770; urease gene clusters, AF085720–AF085733; mba, AF055358–AF055367 and AF056982–AF056984.
Comparisons of two U. parvum serovar 3 rRNA operons (Fig. A), U. urealyticum serovar 8 and U. parvum serovar 3 rRNA operon 1 and operon 2 DNA sequences (Fig. B), U. urealyticum serovars 8 and 13 and U. parvum serovar 3 EF-Tu gene (tuf) DNA and amino acid sequences (Fig. C), and U. urealyticum serovar 8 (upper line) and U. parvum serovar 3 (lower line) urease gene cluster sequences (Fig. D), and a comparison of interspecies heterogeneity of DNA and amino acid sequences of the urease gene clusters of U. parvum and U. urealyticum (Table A) are available as supplementary material in IJSEM Online.
INTRODUCTION
There have been two recent major developments that affect our understanding of human ureaplasmas. Firstly, the full genome sequence of Ureaplasma parvum (previously known as Ureaplasma urealyticum) serovar 3 was released in 2000 (GenBank accession no. NC_002162) (Glass et al., 2000). Secondly, the taxonomy of human Ureaplasma species changed in 2002, when the two former U. urealyticum biovars were given full species status, as U. parvum (previously biovar parvo or biovar 1) and U. urealyticum (previously biovar T960T or biovar 2) (Robertson et al., 2002).
It is now accepted that a decision to create a new species should be based on many independent phenotypic and genotypic characteristics – the theory of ‘polyphasic taxonomy’ (Vandamme et al., 1996). However, molecular methods and genome-based criteria have become more accessible and attractive (Gürtler & Mayall, 2001; Stackebrandt et al., 2002). DNA–DNA hybridization showing less than 70 % homology between whole genomes is accepted as the most definitive criterion or ‘gold standard’ for separate prokaryote species (Murray & Schleifer, 1994). The traditional 70 % DNA–DNA hybridization value used to delineate genomic species was found to correspond to genome mispairings in the range 13–13·6 % or 0·097–0·104 nucleotide substitutions per site (Mougel et al., 2002). Similarity of less than 97 % in the 16S rRNA gene is the most widely used practical alternative (Murray & Schleifer, 1994), but these criteria may conflict and additional alternative targets are required (Pettersson et al., 2000; Dellaglio et al., 2004).
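The two practical thresholds described above (less than 70 % DNA–DNA hybridization and less than 97 % 16S rRNA gene similarity) can be sketched as a simple check. This is only an illustration: the function names and the example values are hypothetical and are not taken from the study, and percentage similarity is computed naively as the fraction of identical columns in a pre-aligned pair of sequences.

```python
def percent_similarity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns in two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


def below_species_thresholds(ddh_percent: float, rrna16s_similarity: float) -> dict:
    """Report, for each criterion, whether the value falls below the conventional cut-off.

    The cut-offs (70 % DNA-DNA hybridization, 97 % 16S rRNA gene similarity)
    are those quoted in the text; the dictionary keys are hypothetical names.
    """
    return {
        "ddh_below_70": ddh_percent < 70.0,
        "16s_below_97": rrna16s_similarity < 97.0,
    }


# Hypothetical values showing how the two criteria can disagree.
print(below_species_thresholds(ddh_percent=60.0, rrna16s_similarity=98.5))
```

With these made-up inputs the hybridization criterion indicates separate species while the 16S rRNA criterion does not, which is the kind of conflict that motivates the use of additional targets.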
The two Ureaplasma species discussed herein exhibit many distinct phenotypic and genotypic properties (including DNA–DNA hybridization showing less than 70 % homology), which support the change in taxonomy and fulfil the requirements of the polyphasic theory (Vandamme et al., 1996). However, continued use of the old single-species nomenclature in some recent publications (Daxboeck et al., 2003; Baier et al., 2003) is potentially confusing. To strengthen the case for acceptance and exclusive use of the new ureaplasma taxonomy (Robertson et al., 2002), we studied four ‘core’ genes/gene clusters – the rRNA gene cluster, EF-Tu gene (tuf), urease gene cluster and multiple-banded antigen gene (mba) – of all 14 human ureaplasma serovars. These four regions were chosen because previous studies have shown that they were promising targets for study of the phylogeny of ureaplasmas and mycoplasmas (Kong et al., 1999b, 2000b; Kamla et al., 1996). In addition, we used this as a case study to help resolve some common problems with the use of sequence data to infer phylogeny and to support the establishment of new taxonomy (Ludwig et al., 1998).
METHODS
Bacterial strains.
Reference strains of four U. parvum and ten U. urealyticum serovars, obtained directly from the American Type Culture Collection (ATCC), were the same as those used in our previous studies (Kong et al., 1999a).
Value of the U. parvum serovar 3 genome in oligonucleotide primer design.
The full genome sequence of U. parvum serovar 3 (Glass et al., 2000) greatly facilitated sequencing of the selected genes and gene clusters of the other three U. parvum and ten U. urealyticum serovars. For this study, we used the following steps. Firstly, to identify conserved regions, we compared known sequences of genes corresponding to our selected genes and gene clusters – rRNA gene cluster, tuf and urease gene clusters – in other Mycoplasma species and Ureaplasma serovars (Fraser et al., 1995; Himmelreich et al., 1996; Neyrolles et al., 1996), in addition to mba, which we and others have studied in detail previously (Zheng et al., 1995; Kong et al., 1999a). Based on the results, we then designed primers and amplified target regions for sequencing. The target regions sequenced were: (a) the whole rRNA gene cluster, including a short region upstream of the 16S rRNA gene (for U. parvum serovars only), the 16S rRNA gene, the 16S–23S rRNA intergenic spacer, the 23S rRNA gene, the 23S–5S rRNA intergenic spacer, the 5S rRNA gene and a short region downstream of the 5S rRNA gene (see supplementary Figs A and B in IJSEM Online); (b) almost the full length of tuf (see supplementary Fig. C in IJSEM Online); (c) the whole urease gene cluster (ureA–ureB–ureC–ureE–ureF–ureG–ureD), including short regions upstream of ureA and downstream of ureD (see supplementary Fig. D in IJSEM Online).
The amplification and sequencing primers used in the study are shown in Table 1. Most primers were used for both amplification and sequencing, but some (as inner sequencing primers) were used only for sequencing.
Table 1. Sequences of oligonucleotide primers used for sequencing three different genes/gene clusters
DNA preparation and PCR.
These were performed as described previously (Kong et al., 1999b).
Sequencing and sequence analysis.
The PCR products were sequenced as described previously (Kong et al., 2000b). All sites showing unexpected heterogeneity, such as those indicating rRNA gene inter-copy sequence variation and the unique heterogeneity site in serovar 13 tuf (see below), were sequenced at least twice, to confirm the results. When necessary, different PCR amplicons and/or inner sequencing primers were used for sequencing.
The initial sequencing results were analysed with the program Bestfit in the Comparison program group and then joined together to determine sequences of whole genes/gene clusters. The multiple sequence alignments were performed with the programs Pileup and Pretty from the Multiple Sequence Analysis program group. All of the programs/program groups are available in WebANGIS, ANGIS (Australian National Genomic Information Service).
RESULTS AND DISCUSSION
Advantages of sequencing multiple strains
In this study, as in our previous study of mba (Kong et al., 2000b), we sequenced the three target genes/gene clusters for 13 ureaplasma serovars (excluding serovar 3 sequences from the full genome, which were used as reference). The advantages of this approach are: (i) since the genes/gene clusters are relatively conserved between serovars within each species, the results for different serovars help to confirm the accuracy of sequencing results (Clayton et al., 1995); and (ii) the results help to differentiate interspecies from intraspecies heterogeneity (Mygind et al., 1998).
A previous study showed that there is some intraserovar as well as interserovar/intraspecies and interspecies heterogeneity in mba (Knox et al., 1998). However, there is limited intraspecies heterogeneity in the other three gene regions studied and, even in mba, intraspecies heterogeneity is much less than between species (Kong et al., 1999b). Therefore, a single reference strain of each serovar provides an example of each species to demonstrate interspecies heterogeneity, which was the focus of this study.
Key characteristics of the ‘core’ genes
Interspecies, intraspecies and inter-copy polymorphisms of the rRNA gene cluster.
There was relatively little interspecies heterogeneity in 16S and 23S rRNA genes but considerably more in the two copies of the 5S rRNA genes and the corresponding intergenic spacer regions (Table 2). In common with other targets studied, the intraspecies heterogeneities in these genes were minor and some were assumed to be due to inter-copy differences between duplicate copies (Table 2). Previous studies have shown sequence variation in duplicate copies of rRNA genes of other mollicutes (Pettersson et al., 1996). Analysis of the two copies of the rRNA gene cluster in the U. parvum serovar 3 genome (GenBank accession no. AE002111 plus AE002112 and AE002127 plus AE002128) showed inter-copy heterogeneity between 16S rRNA genes (one site), 16S–23S rRNA intergenic spacer regions (two sites), 23S rRNA genes (four sites) and 5S rRNA genes (one site) but none in the 23S–5S rRNA intergenic spacer regions (Fig. A in IJSEM Online). In sequences of the corresponding genes of the other 13 serovars the result was ‘N’ (i.e. unknown or unidentifiable nucleotide) rather than ‘A, T, C or G’, at several sites, even after repeat sequencing or use of different amplification and sequencing primers. We assumed that most, if not all, of these were due to inter-copy polymorphisms (see Table 2 and supplementary Figs A and B in IJSEM Online) (Ueda et al., 1999). If these inter-copy polymorphisms were ignored, the intraspecies heterogeneity in rRNA gene clusters within each of the two human Ureaplasma species was very low. In future, the design of primers or probes or study of the phylogenetic relationships should take account of polymorphisms between multi-copy rRNA gene clusters (Gürtler, 1999).
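The reasoning behind the ‘N’ calls can be illustrated with a minimal sketch. It assumes, as a simplification, that a PCR product amplified from both operon copies yields an unambiguous base wherever the copies agree and ‘N’ wherever they differ; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def merged_read(copy1: str, copy2: str) -> str:
    """Simulate a read from a mixed template of two aligned operon copies.

    Positions where the copies agree keep their base; positions where they
    differ are reported as 'N' (simplifying assumption about base calling).
    """
    if len(copy1) != len(copy2):
        raise ValueError("copies must be aligned to the same length")
    return "".join(a if a == b else "N" for a, b in zip(copy1, copy2))


def ambiguous_sites(read: str) -> list:
    """Return the 1-based positions of 'N' calls in a read."""
    return [i for i, base in enumerate(read, start=1) if base == "N"]


# Toy example: two copies that differ at a single position.
read = merged_read("ACGTAC", "ACCTAC")
print(read, ambiguous_sites(read))
```

Under this assumption, persistent ‘N’ calls at the same positions across repeat sequencing runs are what would be expected from inter-copy polymorphism rather than from sequencing error.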
Table 2. Comparison of interspecies, intraspecies and inter-copy heterogeneity in rRNA gene clusters of U. parvum and U. urealyticum
U. urealyticum serovar 13 EF-Tu gene (tuf).
Previous studies have shown that differences in tuf can distinguish species and may reflect some phenotypic relationships better than the 16S rRNA gene (Kamla et al., 1996). EF-Tu gene (tuf) DNA sequences were the same in serovars within each species, except for that of serovar 13 of U. urealyticum. The serovar 13 sequence contains two base differences (but encodes the same amino acid sequence) compared with the other nine U. urealyticum serovars (Fig. C in IJSEM Online). This difference is of interest in view of another reported difference between serovar 13, which gives an intermediate response in the Mn2+ (manganese)-inhibition test, and all other U. urealyticum serovars, which are fully inhibited (Robertson & Chen, 1984).
Intraspecies and interspecies heterogeneity of urease gene clusters.
There were 21, nine and one intraspecies heterogeneity sites in the urease gene cluster DNA sequences for U. parvum serovars 1, 6 and 14 (compared with serovar 3), respectively. There were seven heterogeneity sites (or 12 bp – one site has a 6 bp difference) in U. urealyticum serovar 2, compared with all other U. urealyticum serovars. These results show greater heterogeneity between urease gene clusters of U. parvum serovars than between those of U. urealyticum serovars, as we found previously for mba (Kong et al., 1999a).
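The distinction between heterogeneity sites and differing base pairs (for example, seven sites but 12 bp, where six sites differ by 1 bp and one by 6 bp) can be made explicit with a short sketch. It assumes that a ‘site’ is a contiguous run of differing columns in a pairwise alignment; this definition, the function name and the toy sequences are illustrative assumptions, not the authors' stated method.

```python
def heterogeneity_sites(aligned_a: str, aligned_b: str) -> tuple:
    """Return (number of sites, number of differing columns).

    A site is counted once for each contiguous run of differing columns
    (assumed definition); every differing column counts towards the bp total.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = 0
    differing = 0
    in_run = False
    for x, y in zip(aligned_a, aligned_b):
        if x != y:
            differing += 1
            if not in_run:
                sites += 1  # a new run of differences starts here
            in_run = True
        else:
            in_run = False
    return sites, differing


# Toy example: one single-base difference and one 3-base run.
print(heterogeneity_sites("AAAAAAAAAA", "ATAAGGGAAA"))
```

In the toy example the two sequences differ at four columns grouped into two sites, which mirrors how a single multi-base site inflates the bp count relative to the site count.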
Interspecies heterogeneity between urease gene clusters of the two species was greater in the intergenic spacer regions (where it varies from 16·8 to 31·8 %) than in the genes themselves (where the range of heterogeneity is 5·9 to 9·7 %). Variation in amino acid sequences, between species, is less (range 0·97 to 6·7 %) than in nucleic acid sequences of urease genes (see supplementary Table A in IJSEM Online).
The ‘molecular clock’ is different for different mba regions.
As we described previously, different mba regions apparently evolve at different rates, i.e. according to different ‘molecular clocks’ (Bromham & Penny, 2003). The upstream regions are more heterogeneous than mba itself (Kong et al., 1999a) and the repetitive regions are more heterogeneous than the 5′ end regions. This should be taken into account when using different regions to infer phylogeny (Kong et al., 1999a).
Indels.
Analysis of insertions and deletions (indels) is a very useful tool with which to study bacterial phylogeny (Britten et al., 2003; Gupta & Griffiths, 2002). We compared the distribution of indels in the four ‘core’ genes/gene clusters of four U. parvum serovars with those of ten U. urealyticum serovars (Figs B, C and D in IJSEM Online) (Kong et al., 2000b). All indels were consistent between serovars within each species, which strongly supports the separation of Ureaplasma species based on indels. The rRNA gene cluster of U. parvum differed from that of U. urealyticum as follows: (a) a TGTG insertion in the 16S rRNA gene; (b) an AT (for operon 1) or A and C (for operon 2) deletion and AT insertion (for operons 1 and 2) in the 16S–23S rRNA intergenic spacer region; (c) a G insertion in 23S–5S rRNA intergenic spacer region; and (d) a TTAGG (for operon 1) or AAAAA (for operon 2) deletion in the 5S rRNA gene (Fig. B in IJSEM Online). There were no indels in the EF-Tu gene (tuf) (Fig. C in IJSEM Online). In the urease gene clusters, there were: (a) a TCAAT deletion in the ureA–ureB spacer; (b) AAC, T and CTA insertions in the ureB–ureC spacer; (c) a CA deletion in the ureC–ureE spacer; and (d) an ACATT insertion in the ureF–ureG spacer (Fig. D in IJSEM Online).
Despite these specific differences, the numbers of insertions or deletions, sites and total number of bases in these three genes were not significantly different between the two species. In mba, there were no indels in species-specific sites, but there was an AAATT insertion, an AA deletion, a 45 bp deletion and a TC deletion in U. parvum upstream of mba (Kong et al., 2000b).
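As an illustration of how indels such as those listed above can be read off a gapped pairwise alignment, the following sketch reports each run of gap characters as one insertion or deletion. The function name, the use of ‘-’ as the gap symbol and the toy alignment are assumptions made for the example; the sequences are invented and are not taken from the supplementary figures.

```python
import re


def indel_events(aligned_ref: str, aligned_query: str) -> list:
    """List indels in the query relative to the reference.

    Each event is (kind, 1-based alignment column, bases). A run of gaps in
    the reference is an insertion in the query; a run of gaps in the query
    is a deletion from the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("sequences must be aligned to the same length")
    events = []
    for m in re.finditer(r"-+", aligned_ref):
        events.append(("insertion", m.start() + 1, aligned_query[m.start():m.end()]))
    for m in re.finditer(r"-+", aligned_query):
        events.append(("deletion", m.start() + 1, aligned_ref[m.start():m.end()]))
    return sorted(events, key=lambda e: e[1])


# Toy alignment with one 4-base insertion and one 3-base deletion.
ref = "ACG----TTAGGCA"
query = "ACGTGTGTT---CA"
for event in indel_events(ref, query):
    print(event)
```

Applying such a function to each serovar against a common reference and comparing the resulting event lists is one way to check that indels are shared within a species and differ only between species.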
Genes, intergenic spacers or gene clusters?
Many studies have shown that intergenic spacer regions are more heterogeneous than the neighbouring genes (Garcia-Martinez et al., 1999; Kong et al., 1999b). Our study confirmed this by showing greater heterogeneity in the intergenic spacers, especially the external spacer regions, of both the rRNA gene clusters and the urease gene cluster compared with the corresponding genes (see Table 2 and supplementary Figs A, B and D and Table A in IJSEM Online) (Jung et al., 2003). In addition, for urease gene clusters and mba, indels only existed in the gene spacer regions. Because the gene cluster as a whole is a functional group, we suggest that there are two advantages in considering it as a unit in basic and applied research. (i) Whole gene clusters contain both conserved and variable sequences and phylogenetic data derived from them are stable and discriminatory (Gürtler, 1999), which is valuable in solving taxonomic problems (Harasawa, 1999). (ii) Species-specific primer pairs based on whole gene clusters are generally more specific and easier to design than primers based on any single component (Kong et al., 2000a).
DNA or protein sequence? Which protein or gene region?
To fulfil polyphasic theory requirements (Vandamme et al., 1996), DNA sequences and protein amino acid sequences should be considered together (Agosti et al., 1996). However, DNA sequences often reflect the phylogeny more accurately and have greater (about double) discriminatory power (Simmons et al., 2002). Our study showed that the mba species-specific region (5′ end) DNA (67/430=15·6 %) and the corresponding amino acid sequences (24/147=16·3 %) have nearly identical levels of heterogeneity (Kong et al., 2000b). However, urease gene subunit (Table A in IJSEM Online) and EF-Tu gene DNA sequences are more heterogeneous than their corresponding protein amino acid sequences (Fig. C in IJSEM Online). For example, for the ureaplasma EF-Tu gene, DNA sequence heterogeneity was 54 out of 1185 (4·6 %) bases compared with 2 out of 394 (0·5 %) differences in amino acids between the two species. Presumably, genetic changes that significantly alter the structure, and therefore the function, of proteins such as enzymes are incompatible with survival. On the other hand, genetic variation that causes antigenic variation in multiple-banded antigens is not only consistent with survival but also an advantage if it helps the organism to evade the host immune response.
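The percentages quoted above follow directly from the counts given in the text, as the short calculation below shows. The counts are those stated in the paragraph; the function name and dictionary labels are hypothetical.

```python
def percent_heterogeneity(differences: int, length: int) -> float:
    """Differences as a percentage of sequence length, rounded to one decimal place."""
    return round(100 * differences / length, 1)


# (number of differing positions, sequence length) as quoted in the text
counts = {
    "mba 5' end region, DNA": (67, 430),
    "mba 5' end region, amino acids": (24, 147),
    "tuf, DNA": (54, 1185),
    "tuf, amino acids": (2, 394),
}

for label, (differences, length) in counts.items():
    print(f"{label}: {percent_heterogeneity(differences, length)} %")
```

For tuf the DNA heterogeneity is roughly nine times that of the protein, whereas for the mba 5′ end region the two values are nearly equal, which is the contrast discussed in the text.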
Many surface protein antigen genes are used to study the phylogeny of different microbes and to develop practical identification and typing schemes (Bush & Everett, 2001; Stackebrandt et al., 2002). Sometimes, the gene or even the gene region selected can significantly affect the results (Bromham & Penny, 2003). For example, ureaplasma mba contains both species-specific (5′ end region) and serovar definition sites (repetitive regions or 3′ end). Thus, the 5′ end region would be the appropriate region for studying species-level phylogeny, rather than the repetitive regions (Zheng et al., 1995). If different bacterial species share almost identical protein antigens as a result of lateral gene transfer (Lawrence, 2002) – for example, the Streptococcus agalactiae Alp3 protein and Streptococcus pyogenes R28 protein (Stalhammar-Carlemalm et al., 1999) – the corresponding genes lose their value for studying species-level taxonomy (Thornton, 2002).
Why ‘core’ genes?
Of the two Ureaplasma species discussed herein – U. parvum and U. urealyticum (Robertson et al., 2002) – a full genome sequence was available only for U. parvum (Glass et al., 2000). In future, our understanding of human ureaplasmas would be significantly improved by availability of the full genome sequence of U. urealyticum also. In particular, it would help to elucidate the nature and significance of the >80 kbp size difference between the two human Ureaplasma species (Robertson et al., 1990; Fraser et al., 2000) in reverse evolution (Rocha & Blanchard, 2002) and pathogenesis (Povlsen et al., 2002). Meanwhile, alternative strategies such as analysis of selected ‘core’ genes or gene clusters (as in this study) can be used to infer the phylogenetic relationship between species (Daubin et al., 2002). The rationale for the choice of these four genes was that the rRNA gene cluster (Stackebrandt et al., 2002) and tuf (Kamla et al., 1996) have been widely accepted targets for phylogenetic/taxonomic studies and the urease gene cluster and mba are unique determinants (Stackebrandt et al., 2002) of ureaplasma metabolism (Neyrolles et al., 1996) and antigenicity (Zheng et al., 1995).
Conclusion
Analysis of four ‘core’ genes/gene clusters further supported the establishment of two separate human Ureaplasma species – U. parvum and U. urealyticum. Significant differences between genes/gene clusters in the degree of heterogeneity between and within species shed further light on the relationships between them and make a useful case study to help understand common problems in the use of sequence data to infer phylogeny and support taxonomic change (Ludwig et al., 1998).
Acknowledgments
We thank M. Wheeler for his valuable help with sequencing.