Methods

Taxonomic use of DNA G+C content and DNA–DNA hybridization in the genomic age

  • Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, 38124 Braunschweig, Germany
  • Correspondence
    Hans-Peter Klenk hpk{at}dsmz.de
  • International Journal of Systematic and Evolutionary Microbiology 2014; 64(Pt 2):352–356 · https://doi.org/10.1099/ijs.0.056994-0

    Abstract

    The G+C content of a genome is frequently used in taxonomic descriptions of species and genera. In the past it has been determined using conventional, indirect methods, but it is nowadays reasonable to calculate the DNA G+C content directly from the increasingly available and affordable genome sequences. The expected increase in accuracy, however, might alter the way in which the G+C content is used for drawing taxonomic conclusions. Using genomic datasets, we here re-evaluate the assumption from the literature that the G+C content can vary by up to 3–5 % within species. The resulting G+C content differences are compared with DNA–DNA hybridization (DDH) similarities calculated in silico using the GGDC web server, with 70 % similarity as the gold standard threshold for species boundaries. The results indicate that the G+C content, if computed from genome sequences, varies no more than 1 % within species. Statistical models based on larger differences alone can reject the hypothesis that two strains belong to the same species. Because DDH similarities between two non-type strains occur in the genomic datasets, we also examine to what extent and under which conditions such a similarity could be <70 % even though the similarity of either strain to a type strain was ≥70 %. In theory, their similarity could be as low as 50 %, whereas empirical data suggest a boundary closer (but not identical) to 70 %. However, it is shown that using a 50 % boundary would not affect the conclusions regarding the DNA G+C content. Hence, we suggest that discrepancies of ≥1 % between the G+C content data provided in species descriptions and those recalculated after genome sequencing are due to significant inaccuracies of the applied conventional methods, and that such discrepancies accordingly call for emendations of species descriptions.
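As a rough illustration of the direct calculation described in the abstract, the following minimal Python sketch computes the G+C content of a genome sequence and checks whether two values differ by more than 1 %. It is not the authors' pipeline. The function names, the simple FASTA handling, and the choice to count only unambiguous bases (A, C, G, T) are assumptions made for this example.

```python
from typing import Iterable


def read_fasta_sequences(lines: Iterable[str]) -> str:
    """Concatenate all sequence lines of a FASTA file, skipping header lines."""
    return "".join(
        line.strip()
        for line in lines
        if line.strip() and not line.startswith(">")
    )


def gc_content(sequence: str) -> float:
    """Return the G+C content in percent.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    ambiguity codes such as N do not influence the result.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def exceeds_within_species_range(
    gc_a: float, gc_b: float, threshold: float = 1.0
) -> bool:
    """Return True if two G+C values differ by more than `threshold`
    percentage points (1 % is the within-species bound stated in the abstract)."""
    return abs(gc_a - gc_b) > threshold


if __name__ == "__main__":
    # Toy records standing in for two genome assemblies (hypothetical data).
    strain_a = [">contig1", "ATGC", "GGCC", ">contig2", "NNAT"]
    strain_b = [">contig1", "ATATGC", "ATAT"]
    gc_a = gc_content(read_fasta_sequences(strain_a))
    gc_b = gc_content(read_fasta_sequences(strain_b))
    print(f"G+C strain A: {gc_a:.2f} %, strain B: {gc_b:.2f} %")
    print("Difference exceeds 1 %:", exceeds_within_species_range(gc_a, gc_b))
```

For real assemblies, the list of lines would be replaced by an open file handle. A difference above the threshold would, following the abstract, argue against assigning both strains to the same species, whereas a smaller difference by itself would not confirm it.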

    • Three supplementary tables are available with the online version of this paper.

    Abbreviations:
    DDH, DNA–DNA hybridization
    dDDH, digital DNA–DNA hybridization
    GGDC, Genome-to-Genome Distance Calculator