EVOLUTION, PHYLOGENY AND BIODIVERSITY

Phylogeny of the Enterobacteriaceae based on genes encoding elongation factor Tu and F-ATPase β-subunit

  • 1 Centre de recherche en infectiologie de l'Université Laval, Centre hospitalier universitaire de Québec (pavillon CHUL), Sainte-Foy, Québec, Canada G1V 4G2
  • 2 Division de microbiologie, faculté de Médecine, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4
  • 3 département de biochimie et microbiologie, faculté des Sciences et Génie, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4
  • 4 Infectio Diagnostic (I.D.I.) Inc., Sainte-Foy, Québec, Canada G1V 2K8
  • Correspondence
    Michel G. Bergeron
    Michel.G.Bergeron{at}crchul.ulaval.ca
  • International Journal of Systematic and Evolutionary Microbiology 2005; 55(5):2013–2025 · https://doi.org/10.1099/ijs.0.63539-0


    Abstract

    The phylogeny of enterobacterial species commonly found in clinical samples was analysed by comparing partial sequences of their elongation factor Tu gene (tuf) and of their F-ATPase β-subunit gene (atpD). An 884 bp fragment for tuf and an 884 or 871 bp fragment for atpD were sequenced for 96 strains representing 78 species from 31 enterobacterial genera. The atpD sequence analysis revealed an indel specific to Pantoea and Tatumella species, showing, for the first time, a tight phylogenetic affiliation between these two genera. Comprehensive tuf and atpD phylogenetic trees were constructed and are in agreement with each other. Genera that are monophyletic in the atpD tree are Cedecea, Edwardsiella, Proteus, Providencia, Salmonella, Serratia, Raoultella and Yersinia. Analogous trees based on 16S rRNA gene sequences available from databases were also reconstructed. The tuf and atpD phylogenies are in agreement with the 16S rRNA gene sequence analysis, and distance comparisons revealed that the tuf and atpD genes discriminate between pairs of species belonging to the family Enterobacteriaceae better than the 16S rRNA gene does. In conclusion, phylogeny based on the conserved tuf and atpD genes allows discrimination between species of the Enterobacteriaceae.

    • Published online ahead of print on 27 May 2005 as DOI 10.1099/ijs.0.63539-0.

    • The GenBank/EMBL/DDBJ accession numbers for the 16S rRNA, tuf and atpD gene sequences obtained in this study are listed in Table 1.

    • Further trees based on tuf, atpD and 16S rRNA gene sequences, and scatterplots comparing pairwise distance between taxa, are available as supplementary figures in IJSEM Online.

    INTRODUCTION

    Members of the family Enterobacteriaceae are facultatively anaerobic, Gram-negative rods that are catalase-positive and oxidase-negative (Brenner, 1984). They are found in soil, water and plants, and also in animals ranging from insects to humans. Many enterobacteria are opportunistic pathogens. In fact, members of this family are responsible for about 50 % of nosocomial infections in the US (Brenner, 1984). Therefore, this family is of considerable clinical importance.

    The major classification studies on the family Enterobacteriaceae were based on phenotypic traits (Brenner et al., 1980, 1999; Dickey & Zumoff, 1988; Farmer et al., 1980, 1985a, b) such as biochemical reactions and physiological characteristics. However, phenotypically distinct strains may be closely related by genotypic criteria and may belong to the same genospecies (Bercovier et al., 1980; Hartl & Dykhuizen, 1984). Conversely, phenotypically close strains (biogroups) may belong to different genospecies, as is the case for Klebsiella pneumoniae and Enterobacter aerogenes (Brenner, 1984). Consequently, identification and classification of certain species may be ambiguous with techniques based on phenotypic tests (Janda et al., 1999; Kitch et al., 1994; Sharma et al., 1990).

    Further advances in the classification of members of the family Enterobacteriaceae have come from DNA–DNA hybridization studies (Brenner et al., 1980, 1986, 1993; Farmer et al., 1980, 1985a; Izard et al., 1981; Steigerwalt et al., 1976). In addition, the phylogenetic significance of bacterial classification based on 16S rRNA gene sequences has been recognized by many workers (Stackebrandt & Goebel, 1994; Wayne et al., 1987). However, members of the family Enterobacteriaceae have not been subjected to extensive phylogenetic analysis of the 16S rRNA gene (Spröer et al., 1999). In fact, this gene was not expected to resolve taxonomic problems among closely related species because of its very high degree of conservation (Brenner, 1992; Spröer et al., 1999). Another drawback of the 16S rRNA gene is that it is present in several copies within the genome (seven in Escherichia coli and Salmonella typhimurium) (Hill & Harnish, 1981). Because of sequence divergence between the gene copies, direct sequencing of PCR products seldom yields a representative sequence (Cilia et al., 1996; Hill & Harnish, 1981). Other genes, such as gap and ompA (Lawrence et al., 1991), rpoB (Mollet et al., 1997) and infB (Hedegaard et al., 1999), have been used to resolve the phylogeny of enterobacteria. However, none of these studies covered an extensive number of species.

    tuf and atpD are the genes encoding elongation factor Tu and the F-ATPase β-subunit, respectively. Elongation factor Tu is involved in peptide chain formation (Ludwig et al., 1990). The two copies of the tuf gene (tufA and tufB) found in enterobacteria (Sela et al., 1989) share a high level of identity (99 %) in Salmonella typhimurium and in Escherichia coli. Recombination could explain this sequence homogenization between the two copies (Abdulkarim & Hughes, 1996; Grunberg-Manago, 1996). F-ATPase is present in the plasma membranes of eubacteria (Nelson & Taiz, 1989). It functions mainly in ATP synthesis (Nelson & Taiz, 1989), and its β-subunit contains the catalytic site of the enzyme. Elongation factor Tu and F-ATPase have been highly conserved throughout evolution and show functional constancy (Amann et al., 1988a; Ludwig et al., 1990). Phylogenies based on protein sequences from elongation factor Tu and the F-ATPase β-subunit have shown good agreement with each other and with the rRNA gene sequence data (Ludwig et al., 1993). These phylogenies were reconstructed, respectively, from 36 species belonging to 32 bacterial genera and from 29 species belonging to 27 bacterial genera.

    We elected to sequence 884 bp fragments of tuf, and 884 or 871 bp fragments of atpD, from 96 clinically relevant enterobacterial strains representing 78 species from 31 genera. These DNA sequences were used to create phylogenetic trees that were compared with 16S rRNA gene sequence trees generated using sequence data available in public databases. These trees revealed good agreement with each other and demonstrated the high resolution of tuf and atpD phylogenies at the species level.

    METHODS

    Bacterial strains and genomic material.

    All bacterial strains used in this study were obtained from the American Type Culture Collection (ATCC), Manassas, VA, USA, or the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ), Braunschweig, Germany. Whenever possible, type strains were chosen. Identification of all strains was confirmed by classical biochemical tests using the automated MicroScan WalkAway-96 system equipped with a Negative BP Combo Panel Type 15 (Dade Behring Canada). Genomic DNA was purified using the G NOME DNA kit (Bio 101). Genomic DNA from Yersinia pestis was kindly provided by Dr Robert R. Brubaker of Michigan State University. The strains used in this study are described in Table 1.

    Table 1.

    Strains analysed

    Strains used in this study for sequencing of partial tuf, atpD and 16S rRNA genes are listed. Strains used in other studies for sequencing of the 16S rRNA gene are also shown; strain numbers on the same row represent the same strain although strain numbers may vary in the publications.

    PCR primers.

    The eubacterial tuf and atpD gene sequences available from public databases were analysed using the gcg package (version 8.0) (Accelrys). On the basis of multiple sequence alignments, two highly conserved regions were chosen for each gene, and PCR primers were derived from these regions with the help of oligo primer analysis software (version 5.0) (National Biosciences). A second 5′ primer was designed to amplify atpD for a few enterobacteria in which it was difficult to amplify the gene with the first primer set. When required, the primers contained inosines or degeneracies to account for variable positions. Oligonucleotide primers were synthesized with a model 394 DNA/RNA synthesizer (PE Applied Biosystems). The PCR primers used in this study are listed in Table 2.

    Table 2.

    PCR primers used for sequencing

    The nucleotide positions given are for Escherichia coli tuf and atpD sequences (GenBank accession numbers AE000410 and V00267, respectively). Numbering starts from the first base of the initiation codon.

    DNA sequencing.

    An 884 bp portion of the tuf gene and an 884 bp portion (or alternatively an 871 bp portion for a few enterobacterial strains) of the atpD gene were sequenced for all of the enterobacteria listed in Table 1. Amplifications were performed with 4 ng genomic DNA. The 40 μl PCR mixtures used to generate PCR products for sequencing contained 1·0 μM each primer, 200 μM each dNTP (Pharmacia Biotech), 10 mM Tris/HCl (pH 9·0 at 25 °C), 50 mM KCl, 0·1 % (w/v) Triton X-100, 2·5 mM MgCl2, 0·05 mM BSA and 1·0 U Taq DNA polymerase (Promega) combined with TaqStart (Clontech Laboratories). The PCR mixtures were subjected to thermal cycling (3 min at 95 °C and then 35 cycles of 1 min at 95 °C, 1 min at 55 °C for tuf or 50 °C for atpD, and 1 min at 72 °C, with a 7 min final extension at 72 °C) using a PTC-200 DNA Engine thermocycler (MJ Research). PCR products of the predicted sizes were recovered from a methylene-blue-stained agarose gel as described previously (Ke et al., 2000).
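    The final concentrations listed above translate into pipetting volumes through C1V1 = C2V2. The Python sketch below is illustrative only: the stock concentrations (a 10× buffer supplying the Tris/HCl, KCl and Triton X-100, 10 μM primers, 10 mM of each dNTP, 25 mM MgCl2) and the template stock of 2 ng μl−1 are hypothetical assumptions, not values stated in this study, and Taq polymerase, TaqStart and BSA are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    stock: float  # stock concentration (hypothetical values below)
    final: float  # final concentration, in the same unit as `stock`


def volume_to_add(component: Component, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to pipette."""
    if component.final > component.stock:
        raise ValueError(f"{component.name}: final exceeds stock concentration")
    return component.final * total_ul / component.stock


def reaction_recipe(components, total_ul, template_ul):
    """Volumes (ul) for one reaction; water makes up the remainder."""
    recipe = {c.name: round(volume_to_add(c, total_ul), 3) for c in components}
    used = sum(recipe.values()) + template_ul
    if used > total_ul:
        raise ValueError("components exceed the reaction volume")
    recipe["template DNA"] = template_ul
    recipe["water"] = round(total_ul - used, 3)
    return recipe


# Final concentrations follow the text; every stock concentration is an assumption.
components = [
    Component("10x buffer", stock=10.0, final=1.0),      # fold concentration
    Component("forward primer", stock=10.0, final=1.0),  # uM
    Component("reverse primer", stock=10.0, final=1.0),  # uM
    Component("dNTPs (each)", stock=10.0, final=0.2),    # mM (200 uM final)
    Component("MgCl2", stock=25.0, final=2.5),           # mM
]

# 4 ng genomic DNA from an assumed 2 ng/ul stock -> 2 ul of template.
recipe = reaction_recipe(components, total_ul=40.0, template_ul=2.0)
for name, volume in recipe.items():
    print(f"{name:15s} {volume:6.2f} ul")
```

    With these assumed stocks, each 40 μl reaction needs 4 μl of buffer, 4 μl of each primer, 0·8 μl of dNTPs, 4 μl of MgCl2 and 2 μl of template, leaving 21·2 μl of water.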

    Both strands of the purified amplicons were sequenced using the ABI Prism BigDye Terminator cycle sequencing ready reaction kit (PE Applied Biosystems) on an automated DNA sequencer (model 377; PE Applied Biosystems). Amplicons from two independent PCR amplifications were sequenced for each strain to ensure the absence of sequencing errors attributable to nucleotide misincorporations by the Taq DNA polymerase. Sequence assembly was performed with the aid of sequencher 3.0 software (Gene Codes).
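    Because the two amplicons come from independent PCRs, a Taq misincorporation should appear in only one of them. A minimal sketch of that check, assuming the two reads have already been assembled onto the same coordinates; the function name `reconcile_reads` is hypothetical and does not correspond to a feature of the sequencher software.

```python
def reconcile_reads(read_a: str, read_b: str):
    """Compare two aligned sequences of the same amplicon region.

    Returns a consensus in which disagreeing positions are written as 'N',
    plus the 1-based positions that need to be rechecked on the traces.
    """
    if len(read_a) != len(read_b):
        raise ValueError("reads must be aligned to the same length")
    consensus, conflicts = [], []
    for pos, (a, b) in enumerate(zip(read_a.upper(), read_b.upper()), start=1):
        if a == b:
            consensus.append(a)
        else:
            consensus.append("N")  # possible polymerase error in one amplicon
            conflicts.append(pos)
    return "".join(consensus), conflicts


consensus, conflicts = reconcile_reads("ACGTTAGC", "ACGTCAGC")
print(consensus, conflicts)  # ACGTNAGC [5]
```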

    DNA sequences from 16S rRNA genes were obtained mostly from public databases. 16S rRNA gene sequences for Escherichia fergusonii and Escherichia vulneris were obtained using published primers (Lane, 1991). The strains used, and their descriptions, are shown in Table 1.

    Phylogenetic and distance analysis.

    Multiple sequence alignments were performed using PileUp from the gcg package (version 10.0) and checked by eye in the SeqLab editor, which was used to edit sequences when necessary and to identify regions containing gaps, indels or ambiguities to be excluded from the phylogenetic analysis. Haemophilus influenzae, Pasteurella multocida subsp. multocida, Shewanella putrefaciens and Vibrio cholerae were used as an outgroup because they do not belong to the family Enterobacteriaceae but are phylogenetically close to it. Bootstrap replicates (750 or 1000) and phylogenetic trees were generated with the neighbour-joining algorithm in Dr David Swofford's paup (Phylogenetic Analysis Using Parsimony) software, versions 4.0b4a and 4.0b6 (Sinauer Associates). Distances were calculated with the Kimura two-parameter model (Kimura, 1980).
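    The Kimura two-parameter model treats transitions (A↔G, C↔T) and transversions separately. If P and Q are the proportions of compared sites that differ by a transition and by a transversion, the distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The Python sketch below implements that formula for two aligned sequences; skipping any column with a gap or ambiguity code is an assumption of this sketch, and it is not the paup implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One transition (A->G) and one transversion (C->A) over 10 sites: P = Q = 0.1.
print(round(kimura_2p("AAAAAAAAAC", "GAAAAAAAAA"), 4))  # 0.2341
```

    The corrected distance (about 0·234) exceeds the raw proportion of differing sites (0·2), reflecting the model's allowance for multiple substitutions at the same site.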

    Distance Matrices Parsing and Plotting (DiMPP, a software tool freely available at ) was used to obtain scatterplots comparing, for each pair of taxa, the genetic distances given by two genes. These distance plots were analysed visually to determine how well each taxonomic level (in this case species, genera and families) is resolved by each of the two genes compared.
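    DiMPP's internals are not described here; as an illustration only, the Python sketch below shows the underlying idea. Each pair of taxa becomes one point whose coordinates are its distances in the two genes, labelled by whether the pair shares a species, a genus or neither. If the distance ranges of two levels do not overlap along one axis, that gene separates those levels. The function names, the (genus, species) labelling and the toy matrices are hypothetical and are not data from this study.

```python
from itertools import combinations


def relationship(taxon_a, taxon_b):
    """Classify a pair of (genus, species) labels by the rank they share."""
    if taxon_a[0] != taxon_b[0]:
        return "different genera"
    if taxon_a[1] != taxon_b[1]:
        return "same genus"
    return "same species"


def paired_distances(taxa, dist_x, dist_y):
    """One scatterplot point per pair of taxa: (x distance, y distance, level)."""
    return [
        (dist_x[i][j], dist_y[i][j], relationship(taxa[i], taxa[j]))
        for i, j in combinations(range(len(taxa)), 2)
    ]


def ranges_by_level(points, axis):
    """(min, max) distance per relationship level along one axis (0 = x, 1 = y)."""
    ranges = {}
    for point in points:
        level, d = point[2], point[axis]
        lo, hi = ranges.get(level, (d, d))
        ranges[level] = (min(lo, d), max(hi, d))
    return ranges


# Invented toy data: four taxa, two genes.
taxa = [("G1", "s1"), ("G1", "s1"), ("G1", "s2"), ("G2", "s3")]
gene_x = [[0.00, 0.01, 0.05, 0.12],
          [0.01, 0.00, 0.06, 0.13],
          [0.05, 0.06, 0.00, 0.11],
          [0.12, 0.13, 0.11, 0.00]]
gene_y = [[0.000, 0.002, 0.004, 0.030],
          [0.002, 0.000, 0.005, 0.031],
          [0.004, 0.005, 0.000, 0.028],
          [0.030, 0.031, 0.028, 0.000]]

points = paired_distances(taxa, gene_x, gene_y)
print(ranges_by_level(points, 0))
print(ranges_by_level(points, 1))
```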

    Bootstrap and partition homogeneity test.

    To determine the number of bootstrap replications needed for the phylogenetic analyses, each reconstruction was first repeated at least twice with exactly the same parameters and 100 bootstrap replications. If the consensus trees gave different topologies, the number of bootstrap replications was increased and the reconstructions were repeated again (at least twice). The smallest number of bootstrap replications giving a stable consensus topology was chosen: for the tuf and atpD consensus trees, this was 750. The same number of replications was also used for the tuf, atpD and 16S rRNA gene sequence consensus trees (available as Supplementary Fig. S1 in IJSEM Online). We repeated the same procedure for the tuf–atpD tree, which was stable with 1000 replications. Comparison of consensus trees reconstructed with different numbers of bootstrap replications showed that topological instability occurs at nodes with bootstrap values of around 50 % (data not shown), and that this instability is not reduced with longer sequences. One possible explanation is that longer alignments allow a larger number of distinct pseudoreplicate alignments to be generated by bootstrap resampling. Alternatively, these discrepancies could be attributed to incongruent phylogenetic signals between atpD and tuf. Indeed, a partition homogeneity test (ILD test in paup with 100 replicates) gave a P value of 0·01, suggesting an apparent conflict between the tuf and atpD phylogenies.
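    The stopping rule described above can be sketched in Python as follows. This is not paup's procedure: tree building is abstracted into a hypothetical callback, `run_consensus(n_replicates, seed)`, which would resample columns (as `bootstrap_replicate` does), build a tree from each pseudoreplicate, and return the majority-rule set of splits. The candidate replicate numbers are also assumptions; only 100, 750 and 1000 appear in the text.

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """One pseudoreplicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def majority_rule(split_sets):
    """Splits (frozensets of taxa) present in more than 50 % of replicate trees."""
    counts = Counter(split for splits in split_sets for split in splits)
    half = len(split_sets) / 2
    return frozenset(split for split, n in counts.items() if n > half)


def smallest_stable_replicates(run_consensus,
                               candidates=(100, 250, 500, 750, 1000),
                               repeats=2):
    """Smallest replicate number whose consensus topology is identical across runs.

    run_consensus(n_replicates, seed) must return a hashable topology,
    such as the frozenset produced by majority_rule.
    """
    for n in candidates:
        topologies = {run_consensus(n, seed) for seed in range(repeats)}
        if len(topologies) == 1:
            return n
    return None  # no candidate gave a stable topology


alignment = {"taxon1": "ACGTACGT", "taxon2": "ACGTTCGT"}
replicate = bootstrap_replicate(alignment, random.Random(42))
```

    Because `majority_rule` keeps only splits found in more than half of the replicates, a node whose support hovers around 50 % can enter or leave the consensus from one run to the next, which is consistent with the instability reported above at such nodes.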

    RESULTS AND DISCUSSION

    Sequence data

    A PCR product of the expected size of 884 bp was obtained for tuf, and one of 884 or 871 bp for atpD, from all bacterial strains tested. After excluding primer-binding regions and ambiguous single-strand data, 765 bp of tuf and 732 bp of atpD were subjected to phylogenetic analysis. The sequences obtained in this study are comparable to enterobacterial sequences from other studies available in public databases (Abdulkarim et al., 1991; Amann et al., 1988b; Blattner et al., 1997; Christensen & Olsen, 1998; Hudson et al., 1981; Perna et al., 2001; Saraste et al., 1981). However, some degree of polymorphism was observed. Between the Escherichia coli strains sequenced in this study and Escherichia coli K-12 MG1655 (Blattner et al., 1997), zero to three differences were found in tuf and zero to nine in atpD. This polymorphism is comparable to that found between Escherichia coli K-12 MG1655 and Escherichia coli EDL933 (serovar O157 : H7) (Perna et al., 2001), which differ at four positions in tuf and six in atpD. For every strain, the atpD sequence was appended to the tuf sequence. Concatenating two or more genes provides more biological information for phylogenetic analysis, which is preferable when the genes have evolved similarly in the taxa under study. The tuf–atpD dual gene alignment used for phylogenetic inference was 1414 bp long. All of the 16S rRNA gene sequences listed in Table 1, obtained from 58 strains representing 53 species belonging to 28 genera, were aligned, and 1300 bp were subjected to phylogenetic analysis. Gaps were excluded from the tuf, atpD, tuf–atpD and 16S rRNA gene sequence analyses.
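    Concatenation, gap exclusion and difference counting can be sketched in Python as follows. The function names and the toy sequences are hypothetical; removing gapped or ambiguous columns after concatenation, rather than before, is an assumption of this sketch.

```python
def concatenate(first, second):
    """Append each strain's second-gene sequence to its first-gene sequence."""
    if set(first) != set(second):
        raise ValueError("both alignments must contain the same strains")
    return {strain: first[strain] + second[strain] for strain in first}


def drop_gapped_columns(alignment):
    """Keep only columns in which every sequence has an unambiguous base."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] in "ACGT" for s in sequences)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def count_differences(seq1, seq2):
    """Number of mismatched positions between two equal-length aligned sequences."""
    return sum(a != b for a, b in zip(seq1, seq2))


tuf = {"s1": "ACGT", "s2": "ACGA"}   # toy tuf alignment
atpd = {"s1": "GG-T", "s2": "GGCT"}  # toy atpD alignment with one gap
combined = concatenate(tuf, atpd)
filtered = drop_gapped_columns(combined)
print(filtered, count_differences(filtered["s1"], filtered["s2"]))
```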

    Signature sequences

    Multiple sequence alignments revealed no indels in tuf, whereas atpD had three distinct regions with indels. The region between positions 105 and 121 of Escherichia coli atpD (GenBank accession no. V00267) (Saraste et al., 1981) exhibited three different combinations involving one- or two-amino-acid indels: one was shared by Budvicia aquatica, Pragia fontium and Leminorella grimontii, another was unique to Plesiomonas shigelloides and a third was found in species not belonging to the Enterobacteriaceae, including Shewanella putrefaciens, Haemophilus influenzae and Pasteurella multocida, which were used as an outgroup. The lack of conservation of this 105–121 region suggests that parallelism, convergence or back-substitution events could have occurred. Therefore, further analyses will be required to determine the phylogenetic significance of these indels.
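    Identifying distinct indel combinations within a region, as done above for the 105–121 region, amounts to grouping taxa by their gap pattern inside a window of the alignment. A minimal Python sketch, assuming an alignment held as a dict of equal-length strings with '-' as the gap character; the window is given in alignment columns (not Escherichia coli atpD numbering), and the function name and taxon names are placeholders.

```python
from collections import defaultdict


def indel_patterns(alignment, start, end):
    """Group taxa by gap pattern in alignment columns start..end-1 (0-based).

    Residues are written as 'x' and gaps as '-', so taxa sharing an indel
    of the same length and position fall into the same group.
    """
    groups = defaultdict(list)
    for name, seq in alignment.items():
        pattern = "".join("-" if c == "-" else "x" for c in seq[start:end])
        groups[pattern].append(name)
    return dict(groups)


example = {
    "t1": "ACG---TTA",
    "t2": "ACG---TTA",
    "t3": "ACGAAATTA",
    "t4": "ACGA--TTA",
}
print(indel_patterns(example, 3, 6))
```

    Here t1 and t2 share one three-position deletion, t4 carries a different, shorter one, and t3 has none, giving three combinations in the same window.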

    A 5 aa insertion located between positions 327 and 328 of Escherichia coli atpD was observed for the type strains of Pantoea agglomerans, Pantoea dispersa and Tatumella ptyseos. This indel can be considered a signature sequence for Pantoea species and Tatumella ptyseos (Fig. 1). Indeed, a conserved indel of defined length and sequence that is flanked by conserved regions can suggest a common ancestor, particularly when it is shared by the members of a given taxon (Gupta, 1998). To our knowledge, this is the first evidence suggesting a close common ancestor for the genera Pantoea and Tatumella. This 5 aa indel could also be a useful marker for helping to resolve Pantoea classification. The transfer of Enterobacter agglomerans to Pantoea agglomerans was proposed by Gavini et al. (1989). However, rapid phenotypic identification systems cannot unequivocally distinguish the different species belonging to the Erwinia herbicola–Enterobacter agglomerans complex (Gavini et al., 1989). The groups within this complex could be distinguished by DNA hybridization, but the heterogeneity of the complex limits phenotypic identification. Interestingly, atpD sequence data were obtained from a second Pantoea agglomerans strain in addition to the type strain. Pantoea agglomerans ATCC 27989 was found to lack the 5 aa indel, suggesting that this strain may be misclassified and most likely does not belong to the genus Pantoea (Fig. 1). Strain ATCC 27989 was deposited as Enterobacter agglomerans biogroup 7, and, although we could not find a reference justifying the name change for this particular strain, it should be noted that strains of biogroup 7 can be found in at least three different DNA relatedness groups (Brenner et al., 1984).

    Fig. 1.

    Pantoea and Tatumella species-specific signature indel in atpD. The nucleotide positions given are for the Escherichia coli atpD sequence (GenBank accession no. V00267). Numbering starts from the first base of the initiation codon.

    A 7 aa insertion located between positions 603 and 604 of the atpD gene of Escherichia coli was observed in the Vibrio cholerae sequence obtained in this study (data not shown). More Vibrio sequences will be required to evaluate the significance of this indel.

    Phylogenetic trees based on partial tuf, atpD and 16S rRNA gene sequences of members of the Enterobacteriaceae

    Bootstrap consensus trees reconstructed from tuf, atpD and tuf–atpD sequences are shown in Fig. 2(a), (b) and (c), respectively. The phylogenetic trees generated from partial tuf and atpD sequences are similar overall, but they show minor differences in branching. The atpD tree shows more monophyletic groups corresponding to species that belong to the same genus than does the tuf tree. Monophyletic genera observed on the atpD consensus tree are Cedecea, Edwardsiella, Proteus, Providencia, Salmonella, Serratia, Raoultella and Yersinia. Since atpD is more divergent than tuf, the former could allow better resolution for tree reconstruction. Whatever the gene used for tree reconstruction, some genera are not monophyletic, e.g. Escherichia, Klebsiella and Enterobacter. These results support previous phylogenies based on the genes gap and ompA (Lawrence et al., 1991), rpoB (Drancourt et al., 2001; Mollet et al., 1997) and infB (Hedegaard et al., 1999) and on DNA–DNA hybridization studies (Brenner et al., 1986; Farmer et al., 1985a).

    Fig. 2.

    Trees based on sequence data from (a) tuf, (b) atpD and (c) tuf–atpD. The phylogenetic analysis was performed with the neighbour-joining method, using Kimura two-parameter distances. Values on each branch indicate the occurrence (%) of the branching order in 750 bootstrapped trees for (a) and (b), and in 1000 bootstrapped trees for (c). Haemophilus influenzae, Pasteurella multocida subsp. multocida, Shewanella putrefaciens and Vibrio cholerae were used as an outgroup. Strain names and sequence accession numbers are listed in Table 1. Similar trees including only those strains for which 16S rRNA gene sequences were available are shown in Supplementary Fig. S1 in IJSEM Online.
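    The trees in Fig. 2 were built with paup. For readers unfamiliar with neighbour joining, the Python sketch below is a plain implementation of the standard algorithm, not paup's code, so tie-breaking and output formatting may differ. At each step it joins the pair of nodes minimising Q(i, j) = (n − 2)d(i, j) − R(i) − R(j), where R(i) is the sum of distances from node i, assigns branch lengths, and replaces the pair with a new node until three remain. Negative branch lengths, which neighbour joining can produce, are not corrected here.

```python
def neighbour_joining(labels, dist):
    """Build an unrooted neighbour-joining tree and return it as a Newick string.

    labels: taxon names; dist: symmetric distance matrix (list of lists),
    for example pairwise Kimura two-parameter distances.
    """
    if len(labels) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    nodes = list(labels)              # Newick fragment for each active node
    d = [list(row) for row in dist]   # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]   # total distance from each node
        # Pick the pair (i, j) that minimises the Q-criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[x][y] for y in keep] for x in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Join the last three nodes to a central node (unrooted trifurcation).
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Toy distances from a hypothetical additive tree ((A:1,B:2):1,(C:3,D:1)).
taxa = ["A", "B", "C", "D"]
matrix = [[0, 3, 5, 3],
          [3, 0, 6, 4],
          [5, 6, 0, 4],
          [3, 4, 4, 0]]
tree = neighbour_joining(taxa, matrix)
print(tree)  # (C:3.0000,D:1.0000,(A:1.0000,B:2.0000):1.0000);
```

    On additive distances such as these, neighbour joining recovers the generating tree and its branch lengths exactly; with real sequence data the distances are only approximately additive.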

    There were a few minor conflicts in branching between the tuf and atpD trees. These conflicts could reflect small sequence differences, which can affect the branching of genetically close taxa. This is the case for (i) Enterobacter aerogenes and Raoultella species, (ii) Escherichia hermannii and Escherichia vulneris, (iii) Escherichia coli, Escherichia fergusonii and Shigella species, (iv) serovars and subspecies of the same genospecies and (v) species of the same genus.

    Four more substantial discrepancies between the tuf and atpD phylogenies are more difficult to explain. (i) According to the tuf gene, Erwinia amylovora is closer to Pantoea species than to Tatumella ptyseos. Phylogeny based on 16S rRNA gene sequences (Spröer et al., 1999) confirms this branching. Nevertheless, this result is not congruent with the atpD phylogeny or with the indel (Fig. 1) shared only by the type strains of Pantoea species and Tatumella ptyseos. Moreover, bootstrap values better support the atpD branching. Therefore, atpD phylogeny could be more reliable for branching between these three genera. (ii) Branching of Leminorella grimontii with Edwardsiella species with the tuf gene is supported neither by atpD phylogeny nor by 16S rRNA gene sequence phylogeny (Spröer et al., 1999), suggesting that the tuf gene could have evolved at a slower pace in the genus Leminorella. (iii) tuf phylogeny reveals a closer relationship between Trabulsiella guamensis and Citrobacter farmeri, while atpD shows more distant branching. In fact, the distance between these species is much smaller with the tuf gene and corresponds to distances obtained between two taxa of the same genus. (iv) Moellerella wisconsensis is closer to the genera Proteus and Providencia according to atpD gene analysis than according to tuf gene analysis. 16S rRNA gene sequences were not available for Trabulsiella guamensis or for Moellerella wisconsensis. Further phylogenetic studies based on other genes could help to resolve these ambiguities.

    Even though the Pantoea and Tatumella species-specific indel was excluded from phylogenetic analysis, the type strains of Pantoea agglomerans and Pantoea dispersa grouped together and were distant from Pantoea agglomerans ATCC 27989, adding further evidence that careful analysis is required for the identification of species belonging to the heterogeneous Erwinia herbicola–Enterobacter agglomerans complex. In fact, with respect to the tuf and atpD genes, Pantoea agglomerans strain ATCC 27989 exhibits branch lengths similar to those of Enterobacter species. No comparison of 16S rRNA gene sequences could be made, because the 16S rRNA gene sequence of Pantoea agglomerans strain ATCC 27989 is unavailable. Therefore, pending further reclassification of this genus, we suggest that this strain should remain a member of the genus Enterobacter.

    tuf and atpD trees exhibit very short genetic distances between taxa belonging to the same genetic species, including species separated on the basis of clinical considerations. For example, Escherichia coli and Shigella species were confirmed to be of the same genetic species by hybridization studies (Brenner et al., 1972a, b, 1982b), as well as by phylogenies based on 16S rRNA genes (Wang et al., 1997) and rpoB genes (Mollet et al., 1997). Hybridization studies (Bercovier et al., 1980) and phylogeny based on 16S rRNA gene sequences (Ibrahim et al., 1994) also demonstrated that Yersinia pestis and Yersinia pseudotuberculosis are of the same genetic species. Five genospecies analysed in this study are represented by at least two members: E. coli–Shigella species, Yersinia pestis and Yersinia pseudotuberculosis, Klebsiella pneumoniae subspecies, Morganella morganii subspecies and Salmonella choleraesuis subspecies. Salmonella choleraesuis is a less tightly knit species than the other four genospecies. In fact, strains of Salmonella choleraesuis show DNA–DNA hybridization levels of 57–99 % between subspecies and levels above 76 % within each subspecies (Le Minor et al., 1982). The genetic definition of a species generally includes strains with approximately 70 % or greater DNA–DNA relatedness (Wayne et al., 1987). Therefore, Salmonella choleraesuis is a genetically broad species according to both DNA–DNA hybridization analyses and our phylogenetic results.

    atpD phylogeny revealed Salmonella choleraesuis subspecies divisions consistent with the current taxonomy. This result was also observed by Christensen & Olsen (1998). In contrast, Salmonella choleraesuis subspecies are not resolved as well by tuf phylogeny. Both atpD and tuf phylogenies suggest that Salmonella bongori is another Salmonella choleraesuis subspecies. This observation is corroborated by 16S rRNA (Supplementary Fig. S1) and 23S rRNA gene sequence phylogenies (Christensen et al., 1998), is qualified by DNA hybridization values (Le Minor et al., 1982) and is contradicted by multilocus enzyme electrophoresis (Reeves et al., 1989). In fact, the DNA–DNA hybridization level between Salmonella bongori and Salmonella choleraesuis strains ranges from only 51 % to 64 %, whereas intraspecies DNA–DNA hybridization levels for Salmonella bongori strains are above 91 % (Le Minor et al., 1982). Le Minor et al. (1982) suggested that Salmonella bongori could be considered a novel species. Finally, Reeves et al. (1989) proposed the novel combination Salmonella bongori comb. nov. It had previously been observed that recently diverged species might not be recognizable on the basis of conserved sequences even if DNA hybridization established them as being different species (Fox et al., 1992). Therefore, Salmonella bongori and Salmonella choleraesuis could be considered distinct, though recently diverged, species.

    The phylogenetic relationships between Salmonella, Escherichia coli and Citrobacter freundii are not well defined. 16S and 23S rRNA gene sequence data reveal a closer relationship between Salmonella and Escherichia coli than between Salmonella and Citrobacter freundii (Christensen & Olsen, 1998; Spröer et al., 1999), while DNA–DNA hybridization studies (Selander et al., 1996) and infB phylogeny (Hedegaard et al., 1999) showed that Salmonella is more closely related to Citrobacter freundii than to Escherichia coli. In that regard, the tuf and atpD phylogenies are coherent with 16S and 23S rRNA gene sequence analysis, showing a closer relationship between the genus Salmonella and Escherichia coli than between the genera Salmonella and Citrobacter.

    According to the tuf and atpD phylogenies (Supplementary Fig. S1a, b), Escherichia fergusonii is very close to the Escherichia coli–Shigella genetic species. This observation is corroborated by the 16S rRNA gene sequence phylogeny (Supplementary Fig. S1c) (McLaughlin et al., 2000) but not by DNA hybridization values. In fact, the DNA–DNA hybridization level between Escherichia fergusonii and Escherichia coli–Shigella is only 49–63 % (Farmer et al., 1985a). Therefore, Escherichia fergusonii could be a recently diverged species, as is the case for Salmonella bongori.

    To simplify the comparisons, phylogenetic trees for tuf and atpD (Supplementary Fig. S1a, b) were reconstructed using only taxa for which 16S rRNA gene sequences were available in the GenBank/EMBL databases. To complete this dataset, we determined the 16S rRNA gene sequences of Escherichia fergusonii and Escherichia vulneris (Supplementary Fig. S1c). The tuf and atpD trees were similar to those generated using additional taxa (shown in Fig. 2). The 16S rRNA gene sequence tree had poorer resolving power at the species and genus levels than the tuf and atpD trees; indeed, it exhibited more multifurcations (polytomies).

    Notwithstanding the apparent incongruence of tuf and atpD, the phylogeny based on tuf–atpD appears to improve some bootstrap values and, in some cases, to resolve a few of the polytomies. Indeed, according to that consensus tree (Fig. 2c), Budvicia aquatica and Pragia fontium are resolved from the species belonging to the genus Yersinia. Also, Plesiomonas shigelloides branches more deeply than the group Hafnia alvei–Obesumbacterium proteus and Morganella morganii subspecies. Moreover, the branch with Leminorella grimontii and species of the genus Edwardsiella appears as a sister group of the Cedecea–Klebsiella–Enterobacter–Escherichia–Salmonella–Citrobacter group. This latter group has been defined as the ‘core’ of the family Enterobacteriaceae (Brenner et al., 1982a). Finally, the Citrobacter koseri–Citrobacter sedlakii group and Pantoea agglomerans ATCC 27989 branch between the Escherichia coli–Shigella–Escherichia fergusonii–Salmonella group and the other enterobacteria belonging to the ‘core’.

    Distance analysis with DiMPP showed that, for every pair of strains compared, tuf and atpD distances were sufficient to allow clear discrimination between different species, whereas 16S rRNA gene sequences often exhibited much shorter distances between species (see Supplementary Fig. S2 available in IJSEM Online). Other studies confirm that 16S rRNA gene sequence analysis is not an appropriate method for delineation at lower taxonomic levels; for example, sequence heterogeneity among 16S rRNA operons can affect phylogenetic analysis at the species level (Cilia et al., 1996; Clayton et al., 1995). Moreover, the low evolutionary rate of this gene can prevent the distinction of closely related taxa (Palys et al., 1997). In contrast, most phenotypically close enterobacterial species could be easily discriminated genotypically using tuf or atpD gene sequences.

    Conclusion

    In this study, the phylogenetic affiliations of 96 enterobacterial strains representing 78 species from 31 genera were revealed by analyses based on tuf and atpD genes. These genes exhibit phylogenies consistent with the 16S rRNA gene sequence phylogeny. For example, they show that the family Enterobacteriaceae is monophyletic. However, tuf and atpD distances provide a higher discriminating power at the species level. In fact, tuf and atpD provide better discrimination between different genospecies, such that primers and molecular probes could be designed for diagnostic purposes. Therefore, they represent good target genes for distinguishing phenotypically close enterobacteria belonging to different genetic species, e.g. Klebsiella pneumoniae and Enterobacter aerogenes. Preliminary studies support these observations, and diagnostic tests based on tuf and atpD gene sequence data for identifying enterobacteria are currently under development in our laboratory.

    In summary, this study shows that tuf, atpD and a tuf–atpD combination represent highly valuable phylogenetic tools offering discriminatory power superior to that of 16S rRNA gene sequences for distinguishing between species. Moreover, extensive evolutionary distance comparisons using a group of conserved genes should help to better define a genetic basis for classification into genera and families. This would be of great value for revisiting the taxonomy of bacterial species.

    Acknowledgments

    We thank Pascal Lapierre for the design of tuf sequencing primers. S. P. received scholarships from Fondation Dr George Phénix (Outremont, Québec, Canada) and from le Fonds de recherche en santé du Québec. This research project was supported by grant PA-15586 from the Canadian Institutes of Health Research and by Infectio Diagnostic (I.D.I.) Inc., Ste-Foy, Québec, Canada.

    References