Abstract
DNA–DNA hybridization (DDH), the gold standard for bacterial species delineation, is a laborious method, and its alternative, average nucleotide identity (ANI), a parameter derived from genome sequences, cannot be applied to species that have not been sequenced. No universal cut-off value for delineating bacterial species exists, although a DDH value <70 % and an ANI <95±0.5 % have proved useful in selected examples. Here, we compare published DDH and ANI values with the similarity of rpoB gene sequences retrieved from GenBank for strains of 230 bacterial species representing 45 genera. Intraspecific rpoB sequence similarity was 98.2–100 %. An rpoB gene sequence similarity ≤97.7 % correlated significantly with a DDH value <70 % and an ANI value <94.3 %, and an rpoB gene sequence similarity <85.5 % correlated with membership of different genera. When applied to fastidious and as-yet-uncultivated organisms lacking experimental DDH values, these cut-off values suggested that ‘Candidatus Blochmannia pennsylvanicus’ and ‘Candidatus Blochmannia floridanus’ may belong to different genera, that the different endosymbiotic Buchnera aphidicola organisms may belong to different genera, and that, whereas the tsetse fly enteric symbiont Sodalis glossinidius may belong to the Enterobacteriaceae, the endocellular obligate symbiont Wigglesworthia glossinidia from the same host may belong to the group of as-yet-uncultivated gammaproteobacteria. rpoB gene sequence similarity is an efficient supplement to DDH and ANI measurements for delineating bacterial species and genera, including as-yet-uncultivated, non-sequenced organisms.
Details of strains included in this study and a maximum-likelihood tree based on rpoB gene sequences of gammaproteobacteria are available as supplementary material with the online version of this paper.
INTRODUCTION
DNA–DNA hybridization (DDH) has been adopted as the primary molecular tool for delineating bacterial species (Stackebrandt & Goebel, 1994; Wayne et al., 1987); a 70 % cut-off is most often used to place organisms in different species, although some species have higher threshold values. Although DDH was introduced several decades ago, DDH data are consistent with recent results from complete genome sequences. Nevertheless, DDH is a time-consuming and labour-intensive method that requires the manipulation of large amounts of biomass of potentially hazardous organisms, and it is ill-suited to the rapid delineation of prokaryotes. Nor is it applicable to prokaryotes that are currently non-culturable, which are now recognized to represent the majority of prokaryotes in the biosphere (Amann et al., 1995; Rappé & Giovannoni, 2003; Riesenfeld et al., 2004). DDH also requires pairwise comparisons of DNA isolated from the test organism and from members of characterized species sharing >97 % 16S rRNA gene sequence similarity (Stackebrandt & Goebel, 1994); individual strains cannot be analysed and compared with a database using a codified set of criteria to assign them to a known taxon or to propose a new taxon (Gevers et al., 2005).
At the beginning of the 1990s, it was found that bacterial isolates sharing <97 % 16S rRNA gene sequence similarity usually shared <70 % DDH and belonged to different species, whereas isolates sharing ≥97 % similarity might or might not meet the 70 % DDH criterion for inclusion in the same species (Fox et al., 1992; Stackebrandt & Goebel, 1994). A recent study proposed raising the recommended 16S rRNA gene sequence similarity threshold from 97 % to 98.7–99 % in order to facilitate taxonomic studies without sacrificing the quality and precision of a ‘species’ description (Stackebrandt & Ebers, 2006). In that study, about 20 % of the 16S rRNA gene sequence pairs examined showed a similarity value >99 % together with DDH values <70 %.
In the present study, we first compared published rpoB gene sequence similarity values, derived from complete genome sequences and GenBank, with DDH values; second, we examined the relationship between complete rpoB gene sequence similarity and average nucleotide identity (ANI), a parameter derived from genome sequences (Konstantinidis & Tiedje, 2005a). The ANI of conserved genes shared by two sequenced strains is a robust measure of genetic and evolutionary distance: it correlates with 16S rRNA gene sequence similarity and with the genome mutation rate, it is not affected by lateral transfer or variable recombination rates of single (or a few) genes, and it offers resolution at the subspecies level (Konstantinidis & Tiedje, 2005a). rpoB is a single-copy gene that belongs to the common set of genes (Koonin, 2003) and is long enough to contain phylogenetically useful information (Adékambi & Drancourt, 2004; Drancourt & Raoult, 1999; Khamis et al., 2003, 2004; La Scola et al., 2006; Mollet et al., 1998; Taillardat-Bisch et al., 2003). Owing to its housekeeping function, it might be less prone than the 16S rRNA gene to lateral gene transfer (Case et al., 2007; Korczak et al., 2006), and it has already been used for bacterial species delineation in selected cases (Adékambi et al., 2004, 2006a, b, c) as well as to estimate the DNA G+C content of whole bacterial genomes (Fournier et al., 2006). Among several alternative protein-coding genes, rpoB showed the highest correlation with average amino acid identity, which reflects genome-wide relatedness (Konstantinidis & Tiedje, 2005b).
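To make the ANI parameter concrete, the following is a minimal sketch of a gene-based ANI calculation: the identities of matched gene pairs are averaged after discarding pairs that fail an identity and coverage filter. The GeneHit record and the function name are hypothetical, and the default thresholds (60 % identity over 70 % of the gene length) reflect our reading of Konstantinidis & Tiedje (2005a); they should be checked against the original protocol before reuse. The alignments themselves (e.g. from blastn) are assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GeneHit:
    """Hypothetical record for one matched gene pair between two genomes."""
    identity: float          # percent identity of the nucleotide alignment (0-100)
    aligned_fraction: float  # fraction of the query gene covered by the alignment (0-1)


def average_nucleotide_identity(hits, min_identity=60.0, min_coverage=0.70):
    """Mean identity over gene pairs passing the conservation filters.

    The default thresholds are assumptions for illustration only.
    """
    conserved = [
        h.identity
        for h in hits
        if h.identity >= min_identity and h.aligned_fraction >= min_coverage
    ]
    if not conserved:
        raise ValueError("no conserved genes passed the filters")
    return mean(conserved)
```

With hypothetical hits of 96 % and 94 % identity (both well covered), plus one 55 % hit and one hit covering only half of its gene, only the first two are retained and the ANI is 95 %.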
METHODS
DDH and ANI data mining.
We screened DDH values contained in articles published during the last three decades (see Supplementary Table S1 in IJSEM Online). The DDH values were obtained using the microtitre plate technique (Ezaki et al., 1989), the spectrophotometric technique (De Ley et al., 1970), the membrane filter method (Tourova & Antonov, 1987) and dot hybridization (Yip et al., 2007). We further analysed ANI measurements previously determined for 85 bacterial genomes (Goris et al., 2007; Konstantinidis & Tiedje, 2005a) (Supplementary Table S2).
Sequence analysis.
We retrieved complete 16S rRNA gene and rpoB gene sequences from the same bacterial strains in GenBank to build a local database. A blastn search (Altschul et al., 1990) identified the best hits for the 16S rRNA gene and rpoB gene in other species. However, the genes that appear most similar on the basis of blast hits often do not represent the closest phylogenetic relatives (Koski & Golding, 2001). Moreover, when direct blastn comparison yielded rpoB sequence similarity <85.5 %, manual analysis was necessary in most cases. Percentage similarities between 16S rRNA gene sequences or between rpoB gene sequences were determined using the clustal w program hosted on the PBIL website. The rpoB sequences were obtained from strains of the same bacterial species as those used in DDH experiments and ANI measurements (Supplementary Tables S1 and S2), although not necessarily from the same strains. The maximum-likelihood tree was reconstructed using PhyML software with the GTR substitution model (gamma-distributed rates and invariable sites) and 100 bootstrap replications (Guindon & Gascuel, 2003).
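As an illustration of how a percentage similarity can be read from a pairwise alignment, the sketch below uses one common definition: identical positions divided by alignment columns in which neither sequence has a gap. This is an assumption for illustration; clustal w and other programs may use a different denominator, so values need not match exactly. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        identical += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```

For the toy alignment "ACGT-A" / "ACGTTC", five gap-free columns are compared, four of which are identical, giving 80 %.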
RESULTS AND DISCUSSION
Correlation between rpoB gene sequence similarities and DDH values
GenBank contains over 25 000 sequences annotated as partial or complete rpoB gene sequences. Complete rpoB gene sequences (∼3000 sequences) ranged from 3411 bp (Staphylococcus aureus) to 4185 bp (Neisseria meningitidis). The quality of this subset is likely to be high, since most rpoB gene sequences in GenBank were deposited within the last 10 years. The plot of DDH values against rpoB gene sequence similarity for 318 pairwise comparisons fits a linear model (Fig. 1). The correlation between the two parameters is high (r2=0.82), and their linear relationship is best described by the regression formula DDH value = 5.98 × (rpoB similarity) − 516.1. The r2 value probably remained below 0.9 because reciprocal DDH values may differ by up to 15 % and DDH values vary according to the method used, none of which is straightforward to apply without thorough training. Moreover, hybridization is influenced by numerous factors, including physico-chemical parameters, genome size, the presence of large plasmids and DNA purity (Stackebrandt & Ebers, 2006). DDH values may also not represent actual sequence identity, because detectable DNA heteroduplexes form only between strands sharing at least 80 % sequence identity (Coenye & Vandamme, 2004; Rosselló-Mora & Amann, 2001). Multilocus sequence analysis (MLSA) has been proposed as an alternative that can replace DDH experiments (Gevers et al., 2005; Kuhnert & Korczak, 2006; Yamamoto et al., 1999). However, these analyses involved only partial gene sequences, limiting the accuracy of their conclusions, and they were restricted to one genus or family, limiting their usefulness across a broad range of bacteria.
Fig. 1. Scatter plot of the relationship between rpoB gene sequence similarity (%) and DDH (%).
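The regression step described above can be sketched as an ordinary least-squares fit that also reports r2. This is a generic illustration, not the software actually used for Fig. 1; the function name is hypothetical, and the 318 paired values would have to be taken from Supplementary Table S1 (none are reproduced here).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 = 1 - SS_res / SS_tot.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2
```

Passing (rpoB similarity, DDH) pairs as xs and ys would yield the slope, intercept and r2 of the kind reported above (5.98, −516.1 and 0.82); points lying exactly on a line return r2 = 1.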
Species delineation cut-off
Our analysis indicated that a 97.7 % rpoB gene sequence similarity cut-off is useful for delineating species (Fig. 2). Using this cut-off value, the complete rpoB gene sequence resolved 82 of 85 cases (96.5 %) in which species shared ≥99 % 16S rRNA gene sequence similarity but a DDH value <70 % with their closest related species (Fig. 3). Under the rpoB-based criterion, a 99.7 % rpoB sequence similarity confirmed the close phylogenetic relationship between Borrelia recurrentis, which causes louse-borne relapsing fever worldwide, and Borrelia duttonii, which causes East African tick-borne relapsing fever (Cutler et al., 1999). This criterion also agreed with DDH criteria that place Escherichia coli and Shigella flexneri in the same species, as well as Salmonella enterica serovar Typhimurium and Salmonella enterica serovar Typhi, Bacillus anthracis and Bacillus cereus, and Mycobacterium tuberculosis complex isolates. Mycobacterium tuberculosis complex isolates have been regarded as variants of a single clone (Sreevatsan et al., 1997) and could accordingly be designated the clones Tuberculosis, Africanum, Bovis and Microti (Lan & Reeves, 2001). Likewise, Corynebacterium glucuronolyticum and Corynebacterium seminale were placed in the same species, with a DDH value of 78 %, an rpoB sequence similarity of 99.7 % and a 16S rRNA gene sequence similarity of 99.9 % (Devriese et al., 2000; Khamis et al., 2004). A complete rpoB gene sequence similarity >99.4 % also agreed with the previous proposal that the genus Brucella contains only one species (Verger et al., 1985, 1987), with DDH >90 % among all nomenspecies (Marianelli et al., 2006). Our data agreed with the re-elevation of Pseudomonas syringae pv. tomato to a novel species (Gardan et al., 1999), because it shared only 41–46 % DDH and 95.4 % rpoB sequence similarity, but 99.0 % 16S rRNA gene sequence similarity, with Pseudomonas syringae pv. syringae.
Fig. 2. Comparison of rpoB gene sequence similarities (>88.5 %) and DDH values. Interspecies comparisons are indicated by filled diamonds and intraspecies comparisons by open squares.
Fig. 3. Comparison of 16S rRNA gene sequence similarities (>98 %) and DDH values. Interspecies comparisons are indicated by filled diamonds and intraspecies comparisons by open squares.
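The two cut-offs proposed in this study can be combined into a simple decision rule. The sketch below is an illustration under the stated thresholds only (>97.7 % for the same species; <85.5 % for different genera); the constant and function names are hypothetical, and borderline values near either cut-off still call for corroborating evidence, as the exceptions discussed below show.

```python
SPECIES_CUTOFF = 97.7  # rpoB similarity (%) above which strains are placed in one species
GENUS_CUTOFF = 85.5    # rpoB similarity (%) below which strains are placed in different genera


def rpob_rank(similarity: float) -> str:
    """Suggested taxonomic relationship for one pairwise rpoB similarity (%)."""
    if not 0.0 <= similarity <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity > SPECIES_CUTOFF:
        return "same species"
    if similarity >= GENUS_CUTOFF:
        return "same genus, different species"
    return "different genera"
```

Applied to values quoted in the text, 99.7 % (Borrelia recurrentis and Borrelia duttonii) gives "same species", 95.4 % (Pseudomonas syringae pv. tomato and pv. syringae) gives "same genus, different species" and 81.1 % ([Haemophilus] ducreyi and Haemophilus influenzae) gives "different genera".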
A few cases in our dataset were more complex with respect to species delineation. Discrepancies involving isolates with rpoB sequence similarities >97.7 %, 16S rRNA gene sequence similarities >99 % and DDH values <70 % were observed in pairwise comparisons of Mycobacterium fortuitum with Mycobacterium houstonense and of Mycobacterium senegalense with Mycobacterium houstonense, suggesting that the taxonomic status of Mycobacterium houstonense should be re-examined (Adékambi & Drancourt, 2004; Adékambi et al., 2003; Schinsky et al., 2004). For Mycobacterium ulcerans and Mycobacterium marinum, a 99.6 % rpoB sequence similarity and >98 % whole-genome sequence similarity (Stinear et al., 2007) contrasted with a DDH value <59 % (Yip et al., 2007). The presence in Mycobacterium ulcerans of a 174 kb virulence plasmid, two bacteriophages, multiple copies of IS2404 and multiple DNA deletions and rearrangements may explain, at least in part, this strikingly low DDH value (Stinear et al., 2007; Yip et al., 2007). These genome sequence data led to the conclusion that Mycobacterium ulcerans, a mycolactone-producing mycobacterium, recently evolved via lateral gene transfer and reductive evolution from the environmental, mycolactone-non-producing species Mycobacterium marinum to become a niche-adapted specialist (Stinear et al., 2007).
Borrelia bissetii and Borrelia burgdorferi exhibited 95.7 % rpoB sequence similarity but 99.7 % 16S rRNA gene sequence similarity and a 72.8 % DDH value. However, this exception should be interpreted with caution, because no reciprocal DDH value was available (Masuzawa et al., 2001) and the DDH value was close to the 70 % cut-off. Prokaryotic species are considered to be groups of isolates that share 50–70 % DDH and a 5–7 °C difference in thermal stability between the homologous and heterologous duplexes (Ursing et al., 1995). Further experimental DDH data are therefore required to interpret this exception.
The 16S rRNA gene sequences of Burkholderia mallei, Burkholderia pseudomallei and Burkholderia thailandensis are more than 99 % similar. Comparison of complete rpoB gene sequences derived from their genome sequences showed that Burkholderia pseudomallei and Burkholderia thailandensis shared 97.1 % similarity (DDH value <47 %) (Yabuuchi et al., 2000), whereas Burkholderia pseudomallei and Burkholderia mallei shared 99.8 % similarity (DDH value >76 %) (Rogul et al., 1970). This finding agreed with MLSA results based on seven housekeeping genes (ace, gltB, gmhD, lepA, lipA, narK and ndh) (Godoy et al., 2003). The rpoB criterion likewise separated Burkholderia cenocepacia and Burkholderia vietnamiensis, which shared 97.7 % rpoB sequence similarity. These examples illustrate how the rpoB gene sequence approach can help resolve taxonomic ambiguities.
Furthermore, intraspecies rpoB sequence similarity among 440 isolates belonging to 83 species, 44 genera and 14 phyla ranged from 98.2 to 100 % (Supplementary Table S3). However, interspecies rpoB similarity values overlapped with this range for isolates of Bacillus anthracis, Bacillus cereus and Bacillus thuringiensis, Mycobacterium tuberculosis and Mycobacterium bovis, Burkholderia mallei and Burkholderia pseudomallei, Escherichia coli and Shigella species, Yersinia pestis and Yersinia pseudotuberculosis, and Brucella species.
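The intraspecies range and the overlapping interspecies pairs can be derived from a table of pairwise comparisons, as in the minimal sketch below. The input layout (species_a, species_b, similarity) and both function names are hypothetical, and the species labels in the example are placeholders rather than data from Supplementary Table S3.

```python
from collections import defaultdict


def intraspecies_ranges(comparisons):
    """Min and max pairwise rpoB similarity (%) for each species.

    comparisons: iterable of (species_a, species_b, similarity) tuples;
    only pairs with species_a == species_b are intraspecific.
    """
    ranges = defaultdict(lambda: [float("inf"), float("-inf")])
    for sp_a, sp_b, sim in comparisons:
        if sp_a == sp_b:
            low, high = ranges[sp_a]
            ranges[sp_a] = [min(low, sim), max(high, sim)]
    return {sp: (low, high) for sp, (low, high) in ranges.items()}


def overlapping_pairs(comparisons, floor):
    """Interspecific pairs whose similarity reaches the intraspecific floor."""
    return [(a, b, s) for a, b, s in comparisons if a != b and s >= floor]
```

The floor is the lowest intraspecific value over all species (98.2 % in this study); interspecies pairs at or above it are the overlapping cases listed above.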
Correlation between rpoB gene sequence similarity and ANI
The plot of 134 values derived from 85 bacterial genomes (Supplementary Table S2) shows that ANI correlates more strongly with rpoB gene sequence similarity (r2=0.93 for a linear model; Fig. 4) than with 16S rRNA gene sequence identity (r2=0.79 for a linear model) (Konstantinidis & Tiedje, 2005a), and that rpoB gene sequence similarity provides resolution where the 16S rRNA gene is inadequate, such as at the species level. The 97.7 % rpoB gene sequence similarity cut-off for the speciation of closely related bacterial strains corresponds to 94.3 % ANI (Fig. 4). This value is close to the ANI of 95±0.5 % that Goris et al. (2007) recently found to correspond to the recommended 70 % DDH cut-off for species delineation. A borderline ANI of 94.3 % has been found between Neisseria gonorrhoeae and Neisseria meningitidis (Konstantinidis & Tiedje, 2005a).
Fig. 4. Scatter plot of the relationship between rpoB gene sequence similarity (%) and ANI (%).
Genus delineation cut-off
Although the 16S rRNA gene sequence similarity cut-off for genus delineation differs among bacterial clades, the published cut-off of <95 % 16S rRNA gene sequence similarity has been used to delineate genera otherwise delineated by other methods (Clarridge, 2004; Schloss & Handelsman, 2004). Using this value as a standard, a complete rpoB gene sequence similarity <85.5 % delineated genera in 33 of 35 cases (94.3 %) (Fig. 1). For example, [Haemophilus] ducreyi shared only 81.1 % rpoB sequence similarity with Haemophilus influenzae, 80.5 % with Pasteurella multocida and 83.7 % with Mannheimia haemolytica, but 86.6 % with Actinobacillus pleuropneumoniae. It may therefore belong to the same genus as Actinobacillus pleuropneumoniae. These data agree with a previous analysis based on five housekeeping genes (Kuhnert & Korczak, 2006) that suggested the transfer of [Haemophilus] ducreyi to the genus Actinobacillus. The rpoB sequence similarities of >93 % among Escherichia, Shigella and Salmonella species support the suggestion that they belong to a single genus and should be considered highly related (Brenner et al., 1969, 1972; Crosa et al., 1973). The genus Yersinia lay at the borderline of the enteric group, with rpoB gene sequence similarity <85.5 % (Supplementary Table S1). An rpoB gene sequence similarity <85.5 % was also found among species of the family Pasteurellaceae, whose taxonomy at the genus, species and subspecies levels is not well settled (Blackall et al., 2007; Christensen et al., 2007; Kuhnert & Korczak, 2006).
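One way to obtain a figure such as "33 of 35 cases" is to take interspecies comparisons below the 16S rRNA gene cut-off (<95 %) as the reference set of different-genus pairs and count how many also fall below the rpoB cut-off (<85.5 %). This reading of the text is an assumption, and the function name is hypothetical; the sketch shows the counting logic only.

```python
def genus_split_counts(pairs, rpob_cutoff=85.5, rrna_cutoff=95.0):
    """Count how often rpoB separates genera that 16S rRNA separates.

    pairs: iterable of (rpob_similarity, rrna_similarity) percentages.
    Returns (pairs also below rpob_cutoff, size of the 16S-based reference set).
    """
    reference = [rpob for rpob, rrna in pairs if rrna < rrna_cutoff]
    if not reference:
        raise ValueError("no comparisons below the 16S rRNA cut-off")
    agreeing = sum(rpob < rpob_cutoff for rpob in reference)
    return agreeing, len(reference)
```

A result of (33, 35) corresponds to 33/35 ≈ 94.3 %. In a toy input such as [(81.1, 90.0), (86.6, 93.0), (63.6, 89.0), (87.2, 95.0)], where the 16S rRNA values are invented, the last pair is excluded from the reference set and the function returns (2, 3).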
Our analysis suggested that Acidovorax avenae and ‘Verminephrobacter eiseniae’ may belong to the same genus, because they shared 87.2 % rpoB sequence similarity and 95 % 16S rRNA gene sequence similarity. Geobacter sulfurreducens and Geobacter metallireducens may belong to different genera (rpoB sequence similarity 82 %, 16S rRNA gene sequence similarity <95 %), as may Pelobacter carbinolicus and Pelobacter propionicus (rpoB sequence similarity 63.6 %, 16S rRNA gene sequence similarity <90 %) and Desulfovibrio vulgaris and Desulfovibrio desulfuricans (rpoB sequence similarity 78 %, 16S rRNA gene sequence similarity <90 %).
The species in the genera Afipia, Bradyrhizobium, Rhodopseudomonas and Nitrobacter may belong to the same genus (rpoB sequence similarity >85.8 %, 16S rRNA gene sequence similarity >98 %). Similarly, species in the genera Rhizobium, Sinorhizobium and Agrobacterium (rpoB sequence similarity >87 %, 16S rRNA gene sequence similarity >95 %) may belong to the same genus. However, the genera Bosea and Mesorhizobium were distinct from, but closely related to, these genera, as found previously (Khamis et al., 2003). Anabaena and Nostoc species may also belong to the same genus (94.2 % rpoB sequence similarity, 98.9 % 16S rRNA gene sequence similarity).
Data derived from complete genome sequences revealed a more complex situation in the genus Mycoplasma. In the Mycoplasma pneumoniae group (Kim et al., 2003), Mycoplasma pneumoniae and Mycoplasma genitalium shared 97.8 % 16S rRNA gene sequence similarity but only 75 % rpoB sequence similarity; Mycoplasma pneumoniae and Mycoplasma gallisepticum shared 59.5 % rpoB and 89.5 % 16S rRNA gene sequence similarity; Mycoplasma pneumoniae and Mycoplasma penetrans shared 57.1 % rpoB and 81.4 % 16S rRNA gene sequence similarity; Mycoplasma genitalium and Mycoplasma gallisepticum shared 61.3 % rpoB and 89.5 % 16S rRNA gene sequence similarity; Mycoplasma genitalium and Mycoplasma penetrans shared 60.7 % rpoB and 81.3 % 16S rRNA gene sequence similarity; and Mycoplasma gallisepticum and Mycoplasma penetrans shared 64.2 % rpoB and 82.8 % 16S rRNA gene sequence similarity. This group was distant from the Mycoplasma hominis group (Kim et al., 2003), with which it shared <50 % rpoB and <76 % 16S rRNA gene sequence similarity. In the latter group, Mycoplasma synoviae and Mycoplasma pulmonis shared only 66.2 % rpoB and 82.9 % 16S rRNA gene sequence similarity, whereas Mycoplasma capricolum and Mycoplasma mycoides shared 95.8 % rpoB and 99.2 % 16S rRNA gene sequence similarity and 40 % DDH. This large divergence suggests that the current genus Mycoplasma includes members that could be classified into different genera, and that more taxonomic work is needed to reconcile speciation within this genus and provide an improved basis for Mycoplasma taxonomy (Gasparich et al., 2004; van Passel et al., 2005).
rpoB gene sequence-based prediction of DDH values for fastidious or uncultivated endosymbionts
When the <85.5 % rpoB similarity cut-off for genus delineation was applied to as-yet-uncultivated organisms, for which experimental DDH values and biochemical parameters are lacking, we found that ‘Candidatus Blochmannia floridanus’ and ‘Candidatus Blochmannia pennsylvanicus’ may belong to different genera, as they shared 80.6 % rpoB and 93.1 % 16S rRNA gene sequence similarity. This was also true for Anaplasma marginale and Anaplasma phagocytophilum, which shared 76.3 % rpoB sequence similarity and 92.3 % 16S rRNA gene sequence similarity, and for Buchnera aphidicola strains APS, Sg, Bp and Cc, which exhibited rpoB sequence similarities of 74.1–85.0 % and 16S rRNA gene sequence similarities of 88.7–94.1 %. Strains of Buchnera aphidicola are known simply by the host aphid that they infect (Munson et al., 1991); however, naming species on the basis of host association would lead to an unmanageable proliferation of species names (Lo et al., 2007).
Sodalis glossinidius (an enteric symbiont of tsetse flies) and Wigglesworthia glossinidia (an endocellular obligate symbiont of tsetse flies) have been classified within the Enterobacteriaceae. Our analysis showed that Sodalis glossinidius was indeed more closely related to species of the Enterobacteriaceae (82.9–85.0 % rpoB sequence similarity, <93 % 16S rRNA gene sequence similarity) than was Wigglesworthia glossinidia (63.7–66.0 % rpoB sequence similarity, <87 % 16S rRNA gene sequence similarity). Thus, although the two organisms share a name derived from their host, they are not closely related. Furthermore, most of the as-yet-uncultured gammaproteobacteria clustered separately from members of the Enterobacteriaceae in the maximum-likelihood tree, with a bootstrap value of 100 % (Supplementary Fig. S1). This confirmed that the as-yet-uncultured gammaproteobacterial lineage is clearly distinct from those of other recognized gammaproteobacteria, in agreement with results obtained using ‘genome conservation’, a new composite measure whose phylogenetic analysis was not affected by the number of species included in the dataset (Kunin et al., 2005).
Our analysis also confirmed that Wolbachia pipientis wMel (an intracellular bacterium of Drosophila melanogaster) (Wu et al., 2004) and Wolbachia sp. wBm (an endosymbiont of Brugia malayi) (Foster et al., 2005) belong to the same genus but not to the same species (87.6 % rpoB sequence similarity and 97.3 % 16S rRNA gene sequence similarity), with a predicted DDH value of about 7.7 %. This finding agrees with a recent proposal to re-examine the taxonomic relationships of Wolbachia pipientis strains (Lo et al., 2007).
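The predicted DDH value follows from the regression formula reported with Fig. 1 (DDH value = 5.98 × rpoB similarity − 516.1). The sketch below evaluates that formula and its inverse; the function names are hypothetical, and the coefficients are simply those quoted in the text.

```python
SLOPE = 5.98        # regression slope reported with Fig. 1
INTERCEPT = -516.1  # regression intercept reported with Fig. 1


def predicted_ddh(rpob_similarity: float) -> float:
    """DDH (%) predicted from rpoB similarity (%) by the linear model."""
    return SLOPE * rpob_similarity + INTERCEPT


def rpob_for_ddh(ddh: float) -> float:
    """rpoB similarity (%) at which the linear model predicts a given DDH (%)."""
    return (ddh - INTERCEPT) / SLOPE
```

For the Wolbachia pair, predicted_ddh(87.6) gives about 7.7 %, and the model reaches 70 % DDH at about 98.0 % rpoB similarity. Because the formula returns negative values below roughly 86.3 % rpoB similarity (516.1/5.98), it should not be extrapolated far beyond the range of the fitted data.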
Conclusions
Phenotypic, genotypic (DDH) and phylogenetic (16S rRNA gene) information is commonly used to determine a consensus taxonomy of bacteria (Vandamme et al., 1996). Whereas the 16S rRNA gene sequence often lacks resolution compared with DDH, we found that rpoB gene sequence similarity correlated with ANI, a parameter derived from genome sequences, and with DDH, an experimental comparison of whole genomes. rpoB gene sequence comparison may therefore be used as a first, preliminary indication of species and genus delineation. However, the lack of universal primers for rpoB gene amplification and sequencing, due to the saturation of third codon positions over long evolutionary timescales, may limit such an application. Nonetheless, analysis of rpoB gene sequences derived from ongoing genomic and metagenomic sequencing projects could allow DDH values to be estimated even for fastidious and as-yet-uncultured organisms. We therefore propose rpoB gene sequence similarity as a suitable estimate of DDH for bacterial species and genus delineation.
Acknowledgments
We thank Bonnie B. Plikaytis for expert review of the manuscript. Funding was provided by the Unité des Rickettsies, Faculté de Médecine, Université de la Méditerranée, Marseille, France.