Abstract
Abbreviations: BAC, bacterial artificial chromosome; RD, region of difference
Representative sequences of the junction regions reported in this article have been deposited in the EMBL database under accession numbers AJ583832, AJ583833 and AJ583834.
The 40 M. tuberculosis complex strains were composed of 18 M. tuberculosis strains and 11 M. bovis strains isolated from different organs of humans and animals, originating from different countries. Two M. bovis BCG vaccine strains (Birkhaug and Mérieux), two strains that were listed as M. africanum (001, 940946) in the collection of the Institut Pasteur as well as four M. microti (ATCC 35782, 94/2272, 005004, OV254), two M. canettii (14000059, 990161) and one M. canettii-like clinical isolate (990263) were included. The strains have been extensively characterized by reference typing methods, i.e. IS6110-RFLP typing and spoligotyping. Other tested mycobacterial species were Mycobacterium avium, Mycobacterium marinum and Mycobacterium smegmatis. For the investigation of the micro-deletion in the pks15/1 locus, 21 selected genomic DNAs were taken from a collection previously used for the study of the evolution of the M. tuberculosis complex (Brosch et al., 2002) in addition to 15 DNAs from the macro-array hybridization study.
In silico analyses of mycobacterial species.
The complete genome sequences of M. tuberculosis and M. leprae were shown to differ extensively in size and number of genes. The genome of M. tuberculosis comprises 4 411 532 bp and 3993 protein-coding genes (Cole et al., 1998; Camus et al., 2002), whereas M. leprae contains 3 268 203 bp and only 1605 genes, but numerous pseudogenes (Cole et al., 2001). In silico comparison of the predicted proteins shared by M. tuberculosis and M. leprae was done employing BLAST and FASTA alignment programs against public databases and partial genome sequences (available at web sites and ). To be listed as a conserved mycobacterial protein, 40 % identity at the protein level between M. tuberculosis and M. leprae was used as the cut-off level (Cole, 2002). The presence or absence of these genes in M. avium, M. marinum and M. smegmatis was determined in a similar manner by using 40 % identity over at least 70 % of the complete length of the tested protein. For selected cases, to determine if proteins corresponded to orthologous proteins in two species, the bi-directional best-hit method was applied, by comparing a given protein of M. tuberculosis with the sequence of another species, e.g. M. marinum. The protein sequence from M. marinum, which showed the highest similarity, was then compared back to the M. tuberculosis database, and, in the case of an orthologous protein, showed its best hit with the protein with which the initial comparison was started. This method was particularly useful when genes or proteins from multi-gene families were compared, as high scores due to cross-hybridization may appear.
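The conservation cut-off and the bi-directional best-hit test described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary inputs (best hit per protein, as would be parsed from BLAST or FASTA output) and the toy identifiers are all assumptions, and the comparison is written as "at least 40 % identity over at least 70 % of the protein length".

```python
from typing import Dict, Optional


def is_conserved(percent_identity: float, aligned_length: int, protein_length: int,
                 min_identity: float = 40.0, min_coverage: float = 0.70) -> bool:
    """Apply the identity and coverage cut-offs quoted in the text."""
    coverage = aligned_length / protein_length
    return percent_identity >= min_identity and coverage >= min_coverage


def reciprocal_best_hit(query: str,
                        forward_best: Dict[str, str],
                        reverse_best: Dict[str, str]) -> Optional[str]:
    """Return the partner protein only if the best hits agree in both directions.

    forward_best maps each protein of species A to its best hit in species B;
    reverse_best maps each protein of species B to its best hit in species A.
    """
    candidate = forward_best.get(query)
    if candidate is not None and reverse_best.get(candidate) == query:
        return candidate
    return None


# Toy data with hypothetical identifiers: two paralogues (tb_A, tb_B) both hit
# mm_1, but only tb_A is the best hit of mm_1 in the reverse search.
forward = {"tb_A": "mm_1", "tb_B": "mm_1"}
reverse = {"mm_1": "tb_A"}
```

In the toy data, only the pair tb_A and mm_1 passes the reciprocal test, which is the situation the method is meant to resolve for members of multi-gene families.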
PCR.
PCR amplification was used for making probes, for confirmation of the absence of genes that were suggested to be absent in certain tested strains by macro-array results, and for generating the DNA fragments of junction regions of deleted regions. According to the type of application, different volumes were used. For the production of probes, which were spotted on the macro-array, PCRs were performed in 96-well plates containing 12·5 µl of 10x PCR buffer [600 mM Tris/HCl pH 8·8, 20 mM MgCl2, 170 mM (NH4)2SO4, 100 mM β-mercaptoethanol], 12·5 µl of 20 mM nucleotide mix, 25 µl each primer at 2 µM, 10 ng template DNA, 10 % DMSO, 2 U Taq polymerase (Gibco-BRL) and sterile water to 125 µl. Amplification of junction regions and evaluation of the presence or absence of genes that showed no or weak hybridization by macro-array experiments with genomic DNA from a given strain were performed in a total reaction volume of 12·5 µl, as described previously (Brosch et al., 2002). Thermal cycling was performed on a PTC-100 amplifier (MJ) with an initial denaturation step of 90 s at 95 °C, followed by 35 cycles of 30 s at 95 °C, 1 min at 58 °C and 4 min at 72 °C.
Macro-arrays.
The selection of 500 genes for the focused macro-array included 219 genes common to both M. tuberculosis and M. leprae that code for proteins that did not show any similarity with proteins from other organisms in the public databases. Genes that were classified in the M. tuberculosis H37Rv genome as potentially involved in virulence (Cole et al., 1998) as well as genes that belonged to certain multi-gene families (Cole et al., 1998; Tekaia et al., 1999) were also included in the selection. Differences in the copy number of the insertion element IS1081 in the M. tuberculosis complex are almost entirely restricted to M. canettii. To determine if genes that flank IS1081 elements in the genome of M. tuberculosis H37Rv were conserved throughout the members of the M. tuberculosis complex, these genes were also selected for the construction of the macro-arrays. The selection also included genes that were variable between M. tuberculosis and the highly related vaccine strain M. bovis BCG (Mahairas et al., 1996; Gordon et al., 1999; Behr et al., 1999). Several house-keeping genes and other genes for which oligonucleotides were available in the laboratory were used for control purposes. The sequences of selected genes from M. tuberculosis H37Rv and M. bovis BCG Pasteur were downloaded by using the complete genome sequence () displayed by the ARTEMIS software (Rutherford et al., 2000) or by using in-house databases. The design of primer pairs for the amplification of ∼500 bp portions of these genes was done using the PRIMER 3 software (available via ). The oligonucleotides used for the amplification of the probes were designed to have annealing temperatures in the range 58-60 °C. PCRs were performed in a final volume of 125 µl as described above. Fifty-five microlitres of the 125 µl of the PCR product were transferred from the 96-well plates into 384-well plates using a pipetting robot (Tecan).
After estimation of the amount of amplification product by gel electrophoresis and ethidium-bromide staining, each PCR product was then deposited in duplicate on a 22x22 cm Q-filter N+222 mm membrane (Genetix) by a gridding robot (QPIX). Probes were fixed and denatured by putting the freshly spotted membranes on Whatman paper soaked with fixation solution (0·5 M NaOH, 1·5 M NaCl) and leaving them for 15 min. The reaction was stopped by using distilled water. Membranes were stored wet at -20 °C for further use. Membranes were pre-hybridized and hybridized in 10 ml of a solution containing SSPE buffer (750 mM NaCl, 50 mM NaH2PO4, 5 mM EDTA) with 1 % SDS, Denhardt's reagent composed of 0·01 % Ficoll, 0·01 % polyvinylpyrrolidone and 0·01 % BSA and sonicated salmon DNA at a final concentration of 100 µg ml-1. Pre-hybridization was performed during 1 h at 65 °C. Genomic DNAs from the various mycobacterial strains were labelled by random incorporation of [α-33P]dCTP into the synthesized complementary strand using the Prime-it II kit (Stratagene). Unincorporated nucleotides were removed by exclusion chromatography with the QIAquick Nucleotide Removal kit (Qiagen). Membranes were hybridized overnight at 65 °C with the labelled probe followed by four washing steps in 10 ml of 0·5x SSPE, 0·2 % SDS solution. The first two washes were done at room temperature for 5 min followed by two washes at 65 °C for 20 min. Membranes were sealed in Saran wrap, exposed to a screen for 2 days and scanned on a STORM phosphorimager (Molecular Dynamics); signals were quantified and visualized using the IMAGE QUANT (Molecular Dynamics) software. The hybridization signals of each spot hybridized with the genomic DNAs from the various strains were compared to a control membrane hybridized with the reference strain M. tuberculosis H37Rv using the ARRAY VISION software (Imaging Research). 
For normalization purposes, the intensities from the central and the surrounding area of each spot were calculated. The intensity from the surrounding area, due to non-specific background hybridization signals, was subtracted from the spot intensity. To compare spot intensities from different membranes, a mean background intensity was calculated for each membrane, which was then used for establishing a correction factor to normalize the spot intensities from individual membranes. The log10 ratios between the normalized intensities of each spot from a tested strain compared to the reference strain M. tuberculosis H37Rv were used to estimate whether a gene was present or absent in a given strain. The cut-off was determined using a Gaussian model involving the mean and the standard deviation. For confirmation purposes, the genes which were found absent by this approach were re-tested by PCR analysis in the corresponding strain.
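The normalization and presence/absence call described above can be sketched as follows. The text does not give the exact form of the correction factor or of the Gaussian cut-off, so this sketch makes explicit assumptions: the correction factor is taken as the ratio of the reference membrane's mean background to the test membrane's mean background, background-subtracted intensities are floored at 1.0 so that the logarithm is defined, and a gene is called absent when its log10 ratio lies more than k standard deviations below the mean. All function names and the value of k are hypothetical.

```python
import math
import statistics
from typing import List


def normalize(spots: List[float], backgrounds: List[float],
              reference_mean_background: float) -> List[float]:
    """Subtract each spot's local background, then rescale the membrane.

    Assumption: the correction factor is reference mean background divided
    by this membrane's mean background.
    """
    factor = reference_mean_background / statistics.mean(backgrounds)
    # Floor at 1.0 (assumption) to avoid non-positive values before log10.
    return [max(s - b, 1.0) * factor for s, b in zip(spots, backgrounds)]


def log_ratios(test: List[float], reference: List[float]) -> List[float]:
    """log10 of test-strain intensity over reference-strain intensity, per spot."""
    return [math.log10(t / r) for t, r in zip(test, reference)]


def call_absent(ratios: List[float], k: float = 2.0) -> List[bool]:
    """Flag spots whose log ratio falls below mean - k * SD (assumed cut-off)."""
    cutoff = statistics.mean(ratios) - k * statistics.stdev(ratios)
    return [r < cutoff for r in ratios]
```

As in the study, any gene flagged by such a cut-off would still need confirmation by PCR, because a weak signal can also reflect sequence divergence rather than deletion.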
Sequencing of junction regions.
For genes that were missing from certain strains, PCR confirmation was done as described above. Then, new primers that were situated in the flanking regions of the missing gene(s) were designed. After amplification of the fragment containing the junction region of a given new deleted region, the fragment was purified by using a QIAquick PCR purification kit. For sequencing, we used 500 ng purified amplification product, 3 µl Big Dye sequencing mix (Applied Biosystems), 2 µl (2 µM) flanking primer and 3 µl of 5x buffer (5 mM MgCl2, 200 mM Tris/HCl pH 8·8). Thermal cycling was performed on a PTC-100 amplifier (MJ), with an initial denaturation step of 1 min at 96 °C, followed by 35 cycles of 30 s at 96 °C, 15 s at 56 °C and 4 min at 60 °C. The products were then precipitated with 80 µl of 76 % ethanol, centrifuged, washed with 70 % ethanol and dried. Then, 2 µl of formamide/EDTA buffer were added and, after denaturation, the samples were loaded onto 4 % polyacrylamide gels (48 cm). Electrophoresis lasted for 10-12 h on a model 377 automated DNA sequencer (Applied Biosystems). Obtained sequences were compared to the genome sequence of M. tuberculosis H37Rv using the TubercuList server at the Institut Pasteur, allowing the size and exact location of deleted regions to be determined. Representative sequences of the junction regions were deposited in the EMBL database under accession numbers AJ583832, AJ583833 and AJ583834.
Comparative genome analysis was carried out by screening all proteins in public databases for similarities with putative proteins from M. tuberculosis and M. leprae using the BLASTP and FASTA programs. Initially, this approach identified 219 genes encoding orthologous proteins in M. tuberculosis and M. leprae, but which showed no appreciable similarity with other proteins (Cole, 2002). The conservation of these genes by M. leprae in the face of extensive reductive evolution strongly suggests that they encode essential functions. These genes appear to code for proteins that are restricted to mycobacteria and possibly to closely related actinobacteria, whose sequences were not available in the public databases at the time of writing. Many of these genes were classified in the original genome analysis (Cole et al., 1998) in the group for which no function could be predicted. However, during the re-annotation of the M. tuberculosis H37Rv genome (Camus et al., 2002), based on similarities with new sequence data from other organisms or experimental data (Rosenkrands et al., 2000), the majority of these genes were re-grouped into categories for which a higher level of functional information is now available. Of the 219 genes specific for M. tuberculosis and M. leprae, 102 belong to the cell-wall and cell-processes class and 97 to the class of conserved hypothetical proteins. Ten are PE and PPE proteins with the remainder belonging to other classes (Fig. 1, Table 1). To test whether these genes were indeed conserved throughout the genus Mycobacterium, in silico comparison of the translated sequences with the predicted open reading frames (ORFs) from the almost-finished genome sequences of M. marinum, M. avium and M. smegmatis were undertaken. This analysis showed that, with a few exceptions, the great majority of these genes had orthologues present in M. marinum, M. avium and M. smegmatis (Table 1). 
Most of the genes which were not conserved among these species belong to the PE/PPE families, suggesting that extensive variation in number and sequence of these genes exists among the mycobacteria. M. marinum, one of the closest relatives of M. tuberculosis, as determined by 16S rRNA analysis (Springer et al., 1996), was missing only nine of the 219 conserved genes, whereas M. avium lacked 20, including the genes of the RD1 region, which are also absent from M. bovis BCG and M. microti. In the fast-grower M. smegmatis, 18 of the conserved genes did not have a counterpart.
Table 1. Conservation of core mycobacterial genes in silico Presence or absence of the 219 orthologous genes of M. tuberculosis H37Rv (Mt) and M. leprae (Ml) in the mycobacterial species M. avium (Ma), M. marinum (Mm) and M. smegmatis (Ms). Genes that were found to share >40 % amino acid identity over >70 % of the predicted protein were considered as orthologous genes and their presence is represented on the table by the sign +, whereas absence of the gene is indicated by the sign -. Genes of the PE/PPE families are boxed.
Macro-array analysis of the M. tuberculosis complex
The focused macro-array contained 500 gene fragments chosen according to criteria outlined above. To test the specificity of the array, hybridizations were undertaken with labelled plasmid DNA from a bacterial artificial chromosome (BAC) clone (Mi10C12) containing a 100 kb fragment from M. microti OV254 corresponding to genes Rv3802 to Rv3884 in M. tuberculosis H37Rv. Analysis of the hybridization signals quantified by using a phosphorimager and the ARRAY VISION software and generic file management programs allowed a large dataset to be established (Table 2). Absence of genes was further confirmed by PCR analysis and, where possible, by in silico comparison with finished or unfinished genome sequences. As expected, most of the 219 genes conserved between M. tuberculosis and M. leprae hybridized with the set of strains from the M. tuberculosis complex. Among the 102 genes presumably implicated in cell-wall structure and cell processes, three genes embCAB (Rv3793-94-95) code for arabinosyltransferases, which are targets for ethambutol, a front-line anti-tuberculosis drug (Belanger et al., 1996; Telenti et al., 1997). Apart from results of bioinformatic analyses, confirming the presence of the three genes in M. avium, M. marinum and M. smegmatis, these three genes were also identified by the macro-array hybridizations as being present among all tested strains of the M. tuberculosis complex. This finding is encouraging and suggests that, among the other conserved mycobacterial genes of this group, new drug targets may also be discovered. Only for seven of the 219 genes was variability detected in the genomes of the tested strains from the M. tuberculosis complex and most of these genes are situated in RD regions (Table 2). Interestingly, several of these genes belong to the ESAT-6 family, which has 23 members on the M. tuberculosis H37Rv chromosome at 11 distinct sites (Cole et al., 1998; Tekaia et al., 1999).
Some of these genes were located in previously defined regions of difference, such as RD1, RD5 and RD8 (Brosch et al., 2002; Gordon et al., 1999; Tekaia et al., 1999), whereas others such as esxR (Rv3019c) and esxS (Rv3020c) were identified in this study for the first time as being absent from several strains of the M. tuberculosis complex.
Table 2. Presence or absence of the tested genes in 40 strains of the M. tuberculosis complex Presence of an RD region is indicated by the sign +, whereas absence of the region is indicated by the sign -. Note that for regions RD1, RD2 and RD12 in some strains junction sequences are not identical, as outlined in the results section.
In silico and macro-array analyses of the RD1 region showed that this region is of particular interest because the gene content, as well as the gene order, at this locus is highly conserved among several, sometimes rather distant, mycobacterial species such as M. tuberculosis, M. marinum, M. leprae and M. smegmatis (Fig. 2), whereas portions of this region were found to be absent from M. avium (Gey Van Pittius et al., 2001), M. bovis BCG (Mahairas et al., 1996) and M. microti. In fact, the hybridization experiments presented here (Fig. 2), using DNA from BAC clone MiBAC10C12 (from 4252·3 to 4367·9 kb relative to M. tuberculosis H37Rv), as well as genomic DNAs from two M. microti strains, have contributed to the identification of the RD1mic deleted region, a segment of 14 kb comprising genes Rv3864-Rv3876 that was deleted from the genome of M. microti strains (Brodin et al., 2002). The ESAT-6 family genes esxO-esxP (Rv2346c-47c) located in the RD5 region, as well as esxV-esxW (Rv3619c-20c) from the RD8 region, were absent from all tested M. bovis and M. bovis BCG strains (Table 2), whereas M. microti lacks the genes in the RD8 region but has the ones from RD5, esxO-esxP (Rv2346c-47c), present. For esxR-esxS (Rv3019c-20c), we found that three M. tuberculosis strains (950530, 950531, 950532) and four M. microti strains (OV254, ATCC 35782, 005004 and 94/2272) did not harbour these genes, whereas they were present in all other tested strains. PCR amplification experiments using sequences flanking esxR-esxS showed that parts of the neighbouring genes PPE46 (Rv3018c) and PPE47 (Rv3021c) were also absent from the strains lacking esxR-esxS. Inspection of the sequence of this region in M. tuberculosis H37Rv showed that in this segment numerous genes contain highly repetitive sequences [PPE46 (Rv3018c), PE27A (Rv3018A), PPE47 (Rv3021c), PPE48 (Rv3022c), PE29 (Rv3022A)]. It appears that in M. microti and the three M. tuberculosis strains the deletion of esxR-esxS was mediated independently by recombination between the highly similar PPE genes PPE46 (Rv3018c) and PPE47 (Rv3021c) which share stretches of 364 and 408 bp of identical sequences, removing a 2·4 kb fragment (Fig. 3). This study clearly shows that the variability is greater among the members of the ESAT-6 family than for the other conserved mycobacteria-specific genes. The strong immunogenic character of the ESAT-6 proteins during host-pathogen interactions may be related to this greater variability.
Identification and description of deleted regions
In contrast to the conserved genes between M. tuberculosis and M. leprae, the genes from the known RD regions showed much more variability in the 40 isolates of the M. tuberculosis complex, confirming the finding that certain lineages of strains (e.g. M. bovis) have successively lost genetic material during evolution. In agreement with previously published studies (Brosch et al., 2001, 2002; Niobe-Eyangoh et al., 2003), the RD9 region was absent from the tested M. africanum, M. microti and M. bovis strains, whereas it was present in M. tuberculosis and M. canettii strains. It represents a key element that defines one evolutionary lineage within the M. tuberculosis complex that has separated from the M. tuberculosis lineage and comprises M. africanum, M. microti and M. bovis. Within this lineage, the M. bovis BCG substrains tested, Birkhaug and Mérieux, showed the greatest number of deleted regions, followed by M. bovis strains and M. microti (Table 2). Some deletions were found to be characteristic for certain subspecies, for example, the RD1mic deletion for M. microti strains (Brodin et al., 2002).
Similarly, we identified a specific deletion, RD2seal, for strains which were isolated from infected seals in different parts of the world. Hybridization results suggested that genes Rv1978 and Rv1979 were absent from the seal isolates. Sequence analysis of the junction region in the four tested seal isolates confirmed this finding and showed that in these strains a 1941 bp deletion has removed parts of genes Rv1978 and Rv1979 (Fig. 4a). We named this region RD2seal as it overlaps the 10·7 kb RD2 region, which is missing from some but not all BCG substrains (Mahairas et al., 1996; Behr et al., 1999; Gordon et al., 1999). In addition, strains isolated from seals were deleted for regions RD7, RD8, RD9 and RD10, whereas regions RD4, RD5, RD6, RD11, RD12 and RD13, usually missing from M. bovis (Brosch et al., 2002; Mostowy et al., 2002), were present. As the RD2seal junction regions in the four seal isolates were identical, but different from the RD2 deletion of BCG strains, it appears that this deletion is a specific evolutionary marker for strains prevalent in seals and sea lions (Fig. 4d).
The selection of strains used in this study also included two isolates of M. canettii, which have been proposed to represent the most distant phylogenetic variant presently known within the M. tuberculosis complex (Brosch et al., 2002; Gutacker et al., 2002). We were particularly interested in the unusually low copy number of IS1081 (1 copy) in these strains, as all other members of the M. tuberculosis complex harbour 5-6 copies and display very homogeneous IS1081 RFLP (van Soolingen et al., 1997). In this respect, one key question was whether the different copy number of IS1081 in the M. canettii strains is due to a low rate of IS1081 transposition in this group of strains or if IS1081 copies may have been deleted from the genome. Bioinformatic comparisons showed that these sites were conserved in all completely or partially sequenced strains from the M. tuberculosis complex (i.e. M. tuberculosis strains H37Rv, CDC1551, Beijing 210, M. microti OV254, M. bovis AF2122/97 and M. bovis BCG) and that the genes flanking IS1081 copies were present in these strains. Furthermore, more distantly related mycobacteria, such as M. avium, M. marinum or M. smegmatis, possess orthologues of these flanking genes, without harbouring IS1081 insertion elements (data not shown). In contrast, hybridization results obtained with genomic DNAs from the two M. canettii strains showed that these strains lacked most of the genes that flank the IS1081 copies in the other members of the M. tuberculosis complex (Table 3), suggesting that in M. canettii strains the number of IS1081 copies is low because of deletion events in this lineage. As an example, Fig. 4(b) shows the genomic region containing genes Rv3113-Rv3124 from M. tuberculosis H37Rv, whereas in M. canettii 14000059 a 12 436 bp deletion was observed that truncated genes Rv3111 (moaD) at position 3 491 865 and Rv3127 at position 3 479 429 relative to M. tuberculosis H37Rv, removing the intervening genomic region that carries a copy of IS1081. Interestingly, this deletion in M. canettii partially overlaps deleted region RD12 from M. bovis strains, which is 2·4 kb in size and has not removed the IS1081 copy (Fig. 4b). As for other copies of IS1081 that are missing from M. canettii 14000059 and 990161, the absence of many flanking genes suggests that they may have been removed by deletion events as well (Table 3).
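As a quick consistency check, the size of the M. canettii deletion follows directly from the two H37Rv coordinates quoted above. The calculation assumes the size is taken as the simple difference between the two breakpoint positions; whether the breakpoints themselves are counted as deleted bases is not stated in the text.

```latex
\[
3\,491\,865 - 3\,479\,429 = 12\,436\ \text{bp} \approx 12.4\ \text{kb}
\]
```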
Table 3. Presence or absence of IS1081 flanking genes in some strains from the M. tuberculosis complex In M. tuberculosis H37Rv, ORFs Rv1047, Rv1199, Rv2512, Rv2666, Rv3023 and Rv3115 code for IS1081 transposases. Strain/species: 1, M. tuberculosis CDC1551; 2, M. tuberculosis strain 210; 3, M. microti; 4, M. bovis; 5, M. bovis BCG; 6, M. canettii.
In the selection of strains were three belonging to the M. tuberculosis Beijing type, showing the characteristic spoligotype and IS6110 insertion in the dnaAN region. For these strains, macro-array results suggested that they all lacked regions RvD2 and RvD3, which are also absent from M. tuberculosis H37Rv, as shown previously (Gordon et al., 1999). However, for the Beijing strains, a single deletion of 15·5 kb relative to M. bovis AF2122/97 was detected that has removed both RvD regions, situated next to each other on the chromosome, apparently by homologous recombination between two copies of IS6110. This observation is in agreement with the findings of Ho and colleagues, who showed that deletions in the RvD2 region can be as large as 20 kb (Ho et al., 2000).
Micro-deletions in gene pks15/1
In a recent study, Guilhot and colleagues showed that several well characterized M. tuberculosis strains (H37Rv, Erdman, CDC1551 and MT106) lack a particular phenolglycolipid (PGL) that is produced by M. tuberculosis 210 (Beijing type) and M. canettii 14000059, and they linked this observation to a deletion of 7 bp that introduces a frameshift in a gene encoding a polyketide synthase (pks15/1) in these strains (Constant et al., 2002). Interestingly, in M. bovis AF2122/97 and M. bovis BCG, a 6 bp deletion was observed at the same locus of the pks15/1 gene. As M. bovis and M. bovis BCG both produce PGL, it seems likely that this 6 bp deletion, which does not cause a frameshift in the pks15/1 gene, does not influence the enzymic activity of the resulting gene product for the synthesis of the particular PGL (Constant et al., 2002). However, as this interesting polymorphism has direct phenotypic consequences, we were interested to determine at what stage of the phylogenetic diversification the deletion of 7 or 6 bp occurred. Therefore, we sequenced PCR products from the polymorphic locus in the pks15/1 gene (Fig. 4c) in 15 strains from the present study and in 21 additional strains used previously (Brosch et al., 2002). The results of this approach showed that the 7 bp deletion only occurred in a particular subgroup of M. tuberculosis strains that show the katG463 mutation CGG and, according to the nomenclature of Sreevatsan and colleagues, belong to genetic group 2 or 3 (Sreevatsan et al., 1997). All these strains had region TbD1 deleted and also lacked spacers 33-36 in their spoligotype, which is a characteristic feature of these genetic groups (Brosch et al., 2002; Soini et al., 2000). In contrast, no M. tuberculosis strains of genetic group 1 (katG463 CTG), including strains of the ancestral type that have the TbD1 region present as well as strains of the Beijing type cluster, which lack the TbD1 region (Brosch et al., 2002), showed a deletion in the polymorphic locus of the pks15/1 gene.
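The contrast between the 7 bp and the 6 bp deletion comes down to whether the deleted length is a multiple of three, the codon size. The following minimal sketch illustrates that rule only; the function names are hypothetical, and it does not model the pks15/1 sequence itself.

```python
def causes_frameshift(deleted_bp: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deleted_bp % 3 != 0


def codons_removed_in_frame(deleted_bp: int) -> int:
    """Number of whole codons lost by an in-frame deletion."""
    if causes_frameshift(deleted_bp):
        raise ValueError("deletion is not in frame")
    return deleted_bp // 3
```

Under this rule the 7 bp deletion disrupts the downstream reading frame, whereas the 6 bp deletion removes the equivalent of two codons and leaves the frame intact, which is consistent with PGL production being retained in M. bovis and M. bovis BCG.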
As for the 6 bp deletion previously observed for M. bovis AF2122/97 and M. bovis BCG (Constant et al., 2002), in the present study we found the same 6 bp deletion in the pks15/1 gene of all tested M. bovis strains, seal isolates, M. microti and M. africanum lacking regions RD7-RD10. Only M. africanum strains that lack the RD9 region but have retained regions RD7, RD8 and RD10 did not show the 6 bp deletion, suggesting that it occurred after the RD9 deletion, at about the same period as deletion of regions RD7, RD8 and RD10 occurred in the M. africanum→M. bovis lineage (Fig. 4d). These findings fit well with the proposed evolutionary scenario of the M. tuberculosis complex (Brosch et al., 2002) and suggest that two independent deletion events have occurred in the pks15/1 gene in two distinct branches of the phylogenetic tree of the M. tuberculosis complex. The 7 bp deletion, which inactivated the pks15/1 gene, occurred in the branch of TbD1-deleted modern M. tuberculosis strains at about the same time-range as the katG463 mutation (CTG→CGG), whereas the 6 bp deletion occurred after the RD9 and before the RD10 deletion in the M. africanum→M. bovis lineage. Considering a clonal structure (Supply et al., 2003; Fleischmann et al., 2002) of the M. tuberculosis complex, it seems that this 6 bp deletion in gene pks15/1 was then inherited by the other members of this branch, and can therefore be found in M. microti, seal isolates, M. bovis and BCG strains (Fig. 4d).
In this study, we evaluated the extent of genetic variability among mycobacteria and in particular those belonging to the M. tuberculosis complex, by using bioinformatic comparisons of newly available mycobacterial sequences together with the macro-array technology that allows the simultaneous screening of many more genes than is possible with previously used PCR-based strategies. In this perspective, it was particularly interesting to determine if genes that are conserved between the two major mycobacterial pathogens M. tuberculosis and M. leprae were present in all M. tuberculosis complex members and other more distantly related mycobacterial species. As shown in Results, the combined approach of bioinformatic analyses, macro-array hybridizations and sequencing of selected genes has resulted in a more-refined picture of the M. tuberculosis complex, indicating that within the M. tuberculosis complex and other tested mycobacteria, a very high degree of conservation exists for the genes shared by M. tuberculosis H37Rv and M. leprae.
Among the few exceptions, members of the ESAT-6 family were most prominent. The members of this family are characterized by a small size (∼100 aa) (Cole et al., 1998), common amino acid motifs (Tekaia et al., 1999), and several of them are organized in genomic loci with similar organization, suggesting that the neighbouring genes may have some function in the transport of these proteins out of the bacterial cell (Cole et al., 1998; Tekaia et al., 1999; Pallen, 2002). The first experimental proof for this hypothesis was recently obtained for the RD1 region of M. tuberculosis (Fig. 2), which is absent from BCG and M. microti (Pym et al., 2003). In the same study, it was shown that recombinant vaccine strains that appropriately exported ESAT-6 and CFP10 induced better protection against tuberculosis in animal models.
This finding may be linked to the highly immunogenic character that was demonstrated in several studies for ESAT-6 and other members of this family (Skjot et al., 2002). Indeed, most ESAT-6 proteins, deleted from one or more strains as identified in the present study, are strongly recognized by the immune system of the host (Skjot et al., 2002). Furthermore, two additional members of the ESAT-6 family (Rv3809c, Rv3905c) are reported to be altered in the sequenced M. bovis AF2122/97 strain (Garnier et al., 2003). Taken together, it seems plausible that variation of ESAT-6 family proteins in strains of M. tuberculosis and/or members of the M. tuberculosis complex could contribute to antigenic variation, eventually helping the bacteria to escape immune recognition by the host. To elucidate the biological function of this protein family, further studies are necessary. The finding that the RD1 region is highly conserved in gene content and gene order in several pathogenic and non-pathogenic mycobacterial species (Fig. 2) suggests that ESAT-6 systems may play a fundamental role in survival in specific environments. This knowledge, together with appropriate cosmid and BAC libraries from these species (Brosch et al., 1998), should enable now very focused studies on the role of these proteins in the various mycobacteria.
In the tight-knit M. tuberculosis complex, where single nucleotide substitutions do not seem to be a substantial source of genetic diversity between strains, the presence or absence of certain regions of difference may play important roles in the varying phenotypes, host range and virulence of these bacteria (Pym et al., 2002; Lewis et al., 2003). Analyses of these RD regions in well-defined strains from the M. tuberculosis complex have allowed us to describe distinct phylogenetic lineages within the M. tuberculosis complex (Brosch et al., 2002) that have evolved from a common ancestor. In this study, we describe RD regions that are characteristic for certain subpopulations of the M. tuberculosis complex. One of the regions (RD2seal) is restricted to strains that were isolated from seals. In the past, seals have been described to be susceptible to tuberculosis, but it was not always clear if the infections in seals were caused by M. tuberculosis and/or M. bovis (Zumarraga et al., 1999). However, by the use of macro-arrays and sequencing strategies we show here that the tested strains, which were isolated from seals in different geographical regions (Argentina, France), lack RD7, RD8, RD9 and RD10 and the particular region RD2seal that seems to be specific for tubercle bacilli hosted by seals. The analysis of all available genetic markers (RDs, mmpL6 polymorphism, pks15/1 polymorphism and spoligotype) showed that the seal isolates are phylogenetically more closely related to M. bovis than to M. tuberculosis. Their position in the established evolutionary scheme (Fig. 4d) is somewhere close to M. microti, which also lacks RD7-RD10, shares the mmpL6 codon 551 single nucleotide polymorphism (SNP) of M. bovis (AAG) and presents a particular deletion (RD1mic) that is restricted to this subspecies. The position of the seal isolates in the phylogenetic scheme of the members of the M. tuberculosis complex shown in Fig. 4(d) is in good agreement with a recent SNP analysis by Musser and colleagues (Gutacker et al., 2002), who also placed these isolates as intermediate between M. tuberculosis and M. bovis. In a very recent study, the seal isolates were considered as sufficiently distant from M. bovis and M. tuberculosis to place them in a separate subspecies of the M. tuberculosis complex (Cousins et al., 2003). In this respect, the marker RD2seal is a valuable tool for the rapid identification of such strains.
The analysis of the sequence polymorphism in the pks15/1 gene, which abolishes production of a particular phenolglycolipid in a large group of M. tuberculosis strains (Constant et al., 2002), showed excellent agreement of the observed polymorphism with all other evolutionary markers available and confirmed the phylogenetic position of the strains used in this study (Fig. 4c, d). These results suggest that the 6 bp deletion in the pks15/1 gene in the M. africanum→M. bovis lineage arose independently from the 7 bp deletion observed for M. tuberculosis strains of Sreevatsan's groups 2 and 3. Closer inspection of the flanking sequences of this polymorphic locus (Fig. 4c) in the pks15/1 gene showed that this genomic region is very GC-rich. In genes that code for PE and PPE proteins, such GC-rich regions have previously been associated with increased sequence polymorphism between strains (Cole et al., 1998; Banu et al., 2002). Interestingly, the pks15/1 sequence polymorphism is not the only example where independent deletion events have occurred in different evolutionary lineages of the tubercle bacilli in the same genomic regions. Other examples are the RD1 region of BCG (9·7 kb) and M. microti (14 kb), the RD2 region of BCG (10·7 kb) and seal strains (2 kb), or the RD12 region in M. bovis (2·7 kb) and M. canettii (12·4 kb). Within each of these pairs, the sizes of the deletions and the junction sequences are clearly distinct, indicating that no direct phylogenetic relationship exists between the two deletions. This observation raises an important point for the interpretation of micro- and macro-array data: the junction regions of deleted regions identified by such arrays (Fig. 4a, b) need to be sequenced before the presence/absence of these marker genes can be used in the construction of evolutionary schemes.
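The point about junction sequences can be illustrated with a minimal Python sketch. It assumes that a deletion is recorded by its region name and its two junction coordinates, and it treats two deletions as the same evolutionary event only when both junctions coincide. The class Deletion, the function same_event and all coordinates are hypothetical and chosen only so that the deletion sizes match the RD1 values quoted above (9·7 kb for BCG, 14 kb for M. microti); they are not genome positions from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """A deleted region, described by its left and right junction."""
    region: str   # region name as seen on an array, e.g. "RD1"
    strain: str   # strain or lineage carrying the deletion
    left: int     # left junction coordinate (hypothetical value)
    right: int    # right junction coordinate (hypothetical value)

    @property
    def size(self) -> int:
        return self.right - self.left


def same_event(a: Deletion, b: Deletion) -> bool:
    """Treat two deletions as one event only if region and both junctions match."""
    return (a.region, a.left, a.right) == (b.region, b.left, b.right)


# Made-up coordinates; only the resulting sizes (9.7 kb and 14 kb) follow the text.
rd1_bcg = Deletion("RD1", "M. bovis BCG", 10_000, 19_700)
rd1_mic = Deletion("RD1", "M. microti", 8_000, 22_000)

# An array reports only that "RD1" is absent in both strains.
print(rd1_bcg.region == rd1_mic.region)   # same region name
print(rd1_bcg.size, rd1_mic.size)         # different deletion sizes
print(same_event(rd1_bcg, rd1_mic))       # junctions differ: independent events
```

In this sketch the region name alone would group the two strains together, whereas the junction comparison keeps them apart, which is the distinction that hybridization data on their own cannot make.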
From a practical point of view, the pks15/1 polymorphism may serve as an important additional marker for the identification and classification of members of the M. tuberculosis complex, as well as for the characterization of mycobacterial DNAs amplified from mummified human remains. Recent studies have shown that, according to their spoligotype and their katG463 SNP, M. tuberculosis strains present in former human populations resembled TbD1-deleted M. tuberculosis strains of Sreevatsan's genetic groups 2 and 3 (Zink et al., 2003; Fletcher et al., 2003). As shown in the present study, it appears that a strict correlation exists between these characteristics and the frameshift mutation (deletion of 7 bp) in the pks15/1 gene.
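Why the 7 bp deletion is a frameshift, whereas the 6 bp deletion described for the M. africanum→M. bovis lineage is not, follows from simple codon arithmetic: a deletion shifts the downstream reading frame unless its length is a multiple of three. The short Python sketch below shows only this arithmetic; the function name preserves_reading_frame is hypothetical, and the check says nothing about whether the resulting protein is functional.

```python
def preserves_reading_frame(deleted_bp: int) -> bool:
    """Return True if a deletion of this length keeps the downstream codons in frame."""
    return deleted_bp % 3 == 0


# Deletion lengths as given in the text for the pks15/1 locus.
deletions = {
    "M. africanum -> M. bovis lineage": 6,
    "M. tuberculosis, Sreevatsan's groups 2 and 3": 7,
}

for lineage, length in deletions.items():
    effect = "in frame" if preserves_reading_frame(length) else "frameshift"
    print(f"{lineage}: {length} bp deletion, {effect}")
```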
Mycobacterial research has changed considerably in the last few years due to the information contained in the whole-genome sequence of M. tuberculosis H37Rv, the paradigm strain of tuberculosis research. However, genomic variation may exist among different strains and, for the mycobacteria, only very few studies have addressed this question by the use of DNA arrays, and then only for a limited number of strains (Behr et al., 1999; Kato-Maeda et al., 2001). We therefore evaluated the extent of the conserved gene pool relative to the flexible gene pool in a collection of strains from the M. tuberculosis complex and for some other mycobacterial species; this has led to a better understanding of the genetic criteria that may have played a role in the selection of the most successful M. tuberculosis strains during the evolution of the pathogen. This information is of importance for the development of new therapeutic and preventive strategies in the fight against tuberculosis.
We are grateful to Lionel Frangeul, Thierry Garnier and Aboubakar Maitournam for help in primer design and data comparison, and Stephen Gordon, Sarah Ngo Niobe-Eyangoh and Alexander Pym for fruitful discussions. Preliminary sequence data were obtained from The Institute for Genomic Research (TIGR) web site and the M. marinum sequence database at the Wellcome Trust Sanger Institute. Sequencing of M. avium and M. smegmatis at TIGR was accomplished with support from NIAID, and sequencing of M. marinum at the Sanger Institute with support from Beowulf Genomics. This study received financial support from the Institut Pasteur (PTR 35), the Génopole Programme and the Association Française Raoul Follereau.
References
Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K., Rane, S. & Small, P. M. (1999). Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science 284, 1520–1523.
Belanger, A. E., Besra, G. S., Ford, M. E., Mikusova, K., Belisle, J. T., Brennan, P. J. & Inamine, J. M. (1996). The embAB genes of Mycobacterium avium encode an arabinosyl transferase involved in cell wall arabinan biosynthesis that is the target for the antimycobacterial drug ethambutol. Proc Natl Acad Sci U S A 93, 11919–11924.
Brodin, P., Eiglmeier, K., Marmiesse, M., Billault, A., Garnier, T., Niemann, S., Cole, S. T. & Brosch, R. (2002). Bacterial artificial chromosome-based comparative genomic analysis identifies Mycobacterium microti as a natural ESAT-6 deletion mutant. Infect Immun 70, 5568–5578.
Brosch, R., Gordon, S. V., Billault, A., Garnier, T., Eiglmeier, K., Soravito, C., Barrell, B. G. & Cole, S. T. (1998). Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing, and comparative genomics. Infect Immun 66, 2221–2229.
Brosch, R., Gordon, S. V., Pym, A., Eiglmeier, K., Garnier, T. & Cole, S. T. (2000). Comparative genomics of the mycobacteria. Int J Med Microbiol 290, 143–152.
Brosch, R., Pym, A. S., Gordon, S. V. & Cole, S. T. (2001). The evolution of mycobacterial pathogenicity: clues from comparative genomics. Trends Microbiol 9, 452–458.
Brosch, R., Gordon, S. V., Marmiesse, M. & 12 other authors (2002). A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 99, 3684–3689.
Camus, J. C., Pryor, M. J., Medigue, C. & Cole, S. T. (2002). Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 148, 2967–2973.
Cole, S. T. (2002). Comparative mycobacterial genomics as a tool for drug target and antigen discovery. Eur Respir J Suppl 36, 78–86.
Cole, S. T., Brosch, R., Parkhill, J. & 39 other authors (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.
Cole, S. T., Eiglmeier, K., Parkhill, J. & 41 other authors (2001). Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.
Constant, P., Perez, E., Malaga, W., Laneelle, M. A., Saurel, O., Daffe, M. & Guilhot, C. (2002). Role of the pks15/1 gene in the biosynthesis of phenolglycolipids in the Mycobacterium tuberculosis complex. Evidence that all strains synthesize glycosylated p-hydroxybenzoic methyl esters and that strains devoid of phenolglycolipids harbor a frameshift mutation in the pks15/1 gene. J Biol Chem 277, 38148–38158.
Cousins, D. V., Bastida, R., Cataldi, A. & 16 other authors (2003). Tuberculosis in seals caused by a novel member of the Mycobacterium tuberculosis complex: Mycobacterium pinnipedii sp. nov. Int J Syst Evol Microbiol 53, 1305–1314.
Fleischmann, R. D., Alland, D., Eisen, J. A. & 23 other authors (2002). Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184, 5479–5490.
Fletcher, H. A., Donoghue, H. D., Taylor, G. M., van der Zanden, A. G. & Spigelman, M. (2003). Molecular analysis of Mycobacterium tuberculosis DNA from a family of 18th century Hungarians. Microbiology 149, 143–151.
Garnier, T., Eiglmeier, K., Camus, J. C. & 19 other authors (2003). The complete genome sequence of Mycobacterium bovis. Proc Natl Acad Sci U S A 100, 7877–7882.
Gey Van Pittius, N. C., Gamieldien, J., Hide, W., Brown, G. D., Siezen, R. J. & Beyers, A. D. (2001). The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive bacteria. Genome Biol 2, RESEARCH0044.1-0044.18.
Gordon, S. V., Brosch, R., Billault, A., Garnier, T., Eiglmeier, K. & Cole, S. T. (1999). Identification of variable regions in the genomes of tubercle bacilli using bacterial artificial chromosome arrays. Mol Microbiol 32, 643–655.
Gutacker, M. M., Smoot, J. C., Migliaccio, C. A. & 7 other authors (2002). Genome-wide analysis of synonymous single nucleotide polymorphisms in Mycobacterium tuberculosis complex organisms. Resolution of genetic relationships among closely related microbial strains. Genetics 162, 1533–1543.
Ho, T. B., Robertson, B. D., Taylor, G. M., Shaw, R. J. & Young, D. B. (2000). Comparison of Mycobacterium tuberculosis genomes reveals frequent deletions in a 20 kb variable region in clinical isolates. Yeast 17, 272–282.
Kato-Maeda, M., Rhee, J. T., Gingeras, T. R., Salamon, H., Drenkow, J., Smittipat, N. & Small, P. M. (2001). Comparing genomes within the species Mycobacterium tuberculosis. Genome Res 11, 547–554.
Lewis, K. N., Liao, R., Guinn, K. M., Hickey, M. J., Smith, S., Behr, M. A. & Sherman, D. R. (2003). Deletion of RD1 from Mycobacterium tuberculosis mimics bacille Calmette-Guerin attenuation. J Infect Dis 187, 117–123.
Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. (1996). Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J Bacteriol 178, 1274–1282.
Mostowy, S., Cousins, D., Brinkman, J., Aranaz, A. & Behr, M. A. (2002). Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J Infect Dis 186, 74–80.
Niobe-Eyangoh, S. N., Kuaban, C., Sorlin, P., Cunin, P., Thonnon, J., Sola, C., Rastogi, N., Vincent, V. & Gutierrez, M. C. (2003). Genetic biodiversity of Mycobacterium tuberculosis complex strains from patients with pulmonary tuberculosis in Cameroon. J Clin Microbiol 41, 2547–2553.
Pallen, M. J. (2002). The ESAT-6/WXG100 superfamily and a new Gram-positive secretion system? Trends Microbiol 10, 209–212.
Pym, A. S., Brodin, P., Brosch, R., Huerre, M. & Cole, S. T. (2002). Loss of RD1 contributed to the attenuation of the live tuberculosis vaccines Mycobacterium bovis BCG and Mycobacterium microti. Mol Microbiol 46, 709–717.
Pym, A. S., Brodin, P., Majlessi, L. & 7 other authors (2003). Recombinant BCG exporting ESAT-6 confers enhanced protection against tuberculosis. Nat Med 9, 533–539.
Rosenkrands, I., King, A., Weldingh, K., Moniatte, M., Moertz, E. & Andersen, P. (2000). Towards the proteome of Mycobacterium tuberculosis. Electrophoresis 21, 3740–3756.
Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.-A. & Barrell, B. (2000). ARTEMIS: sequence visualisation and annotation. Bioinformatics 16, 944–945.
Skjot, R. L., Brock, I., Arend, S. M., Munk, M. E., Theisen, M., Ottenhoff, T. H. & Andersen, P. (2002). Epitope mapping of the immunodominant antigen TB10.4 and the two homologous proteins TB10.3 and TB12.9, which constitute a subfamily of the esat-6 gene family. Infect Immun 70, 5446–5453.
Soini, H., Pan, X., Amin, A., Graviss, E. A., Siddiqui, A. & Musser, J. M. (2000). Characterization of Mycobacterium tuberculosis isolates from patients in Houston, Texas, by spoligotyping. J Clin Microbiol 38, 669–676.
Springer, B., Stockman, L., Teschner, K., Roberts, G. D. & Bottger, E. C. (1996). Two-laboratory collaborative study on identification of mycobacteria: molecular versus phenotypic methods. J Clin Microbiol 34, 296–303.
Sreevatsan, S., Pan, X., Stockbauer, K. E., Connell, N. D., Kreiswirth, B. N., Whittam, T. S. & Musser, J. M. (1997). Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94, 9869–9874.
Supply, P., Warren, R. M., Banuls, A. L. & 7 other authors (2003). Linkage disequilibrium between minisatellite loci supports clonal evolution of Mycobacterium tuberculosis in a high tuberculosis incidence area. Mol Microbiol 47, 529–538.
Tekaia, F., Gordon, S. V., Garnier, T., Brosch, R., Barrell, B. G. & Cole, S. T. (1999). Analysis of the proteome of Mycobacterium tuberculosis in silico. Tuber Lung Dis 79, 329–342.
Telenti, A., Philipp, W. J., Sreevatsan, S., Bernasconi, C., Stockbauer, K. E., Wieles, B., Musser, J. M. & Jacobs, W. R., Jr (1997). The emb operon, a gene cluster of Mycobacterium tuberculosis involved in resistance to ethambutol. Nat Med 3, 567–570.
van Soolingen, D., Hoogenboezem, T., de Haas, P. E. & 9 other authors (1997). A novel pathogenic taxon of the Mycobacterium tuberculosis complex, Canetti: characterization of an exceptional isolate from Africa. Int J Syst Bacteriol 47, 1236–1245.
Zink, A. R., Sola, C., Reischl, U., Grabner, W., Rastogi, N., Wolf, H. & Nerlich, A. G. (2003). Characterization of Mycobacterium tuberculosis complex DNAs from Egyptian mummies by spoligotyping. J Clin Microbiol 41, 359–367.
Zumarraga, M. J., Bernardelli, A., Bastida, R. & 10 other authors (1999). Molecular characterization of mycobacteria isolated from seals. Microbiology 145, 2519–2526.
Received 22 July 2003; revised 30 September 2003; accepted 2 October 2003.