GENES AND GENOMES

Genetic diversity of capsular polysaccharide biosynthesis in Klebsiella pneumoniae clinical isolates

  • 1Department of Bioscience Technology, Chang Jung Christian University, Tainan County, Taiwan, ROC
  • 2Genome Research Center, National Yang-Ming University, Taipei, Taiwan, ROC
  • 3Division of Infectious Diseases, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
  • 4Institute of Tropical Medicine, School of Medicine, National Yang-Ming University, Taipei, Taiwan, ROC
  • 5Division of Molecular and Genomic Medicine, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, ROC
  • 6Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, ROC
  • 7Department of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei, Taiwan, ROC
  • Correspondence
    Hung-Yu Shu
    hyshu{at}mail.cjcu.edu.tw
  • Microbiology 2009; 155(12):4170–4183 · https://doi.org/10.1099/mic.0.029017-0


    Abstract

    Klebsiella pneumoniae is an enteric pathogen causing community-acquired and hospital-acquired infections in humans. Epidemiological studies have revealed significant diversity in capsular polysaccharide (CPS) type and clinical manifestation of K. pneumoniae infection in different geographical areas of the world. We have sequenced the capsular polysaccharide synthesis (cps) region of seven clinical isolates and compared the sequences with the publicly available cps sequence data of five strains: NTUH-K2044 (K1 serotype), Chedid (K2 serotype), MGH78578 (K52 serotype), A1142 (K57 serotype) and A1517. Among all strains, six genes at the 5′ end of the cps clusters that encode proteins for CPS transportation and processing at the bacterial surface are highly similar to each other. The central region of the cps gene clusters, which encodes proteins for polymerization and assembly of the CPS subunits, is highly divergent. Based on the collected sequences, we found that either the wbaP gene or the wcaJ gene exists in a given K. pneumoniae strain, suggesting that there is a major difference in the CPS biosynthesis pathway and that the K. pneumoniae strains can be classified into at least two distinct groups. All isolates contain gnd, encoding gluconate-6-phosphate dehydrogenase, at the 3′ end of the cps gene clusters. The rmlBADC genes were found in CPS K9-positive, K14-positive and K52-positive strains, while manC and manB were found in K1, K2, K5, K14, K62 and two undefined strains. Our data indicate that, while overall genomic organization is similar between different pathogenic K. pneumoniae strains, the genetic variation of the sugar moiety and polysaccharide linkage generates the diversity in CPS molecules that could help the bacterium evade host immune attack.

    • †These authors contributed equally to this work.

    • The GenBank/EMBL/DDBJ accession numbers for the capsular polysaccharide synthesis sequences of the Klebsiella pneumoniae strains examined in this paper are AB371289–AB371295.

    • Three supplementary tables, listing primers used in this study, genes encoding membrane or periplasmic proteins in the cps gene clusters identified using the EZ-Tn5 <blaM/R6KγOri> transposon, and predicted cps genes and CPS structures, are available with the online version of this paper.

    Edited by: S. D. Bentley

    INTRODUCTION

    Klebsiella pneumoniae is one of the common pathogens among the Gram-negative bacteria that cause respiratory and urinary tract infections, liver abscesses and bacteraemia (Podschun & Ullmann, 1998; Fang et al., 2007). K. pneumoniae can produce capsular polysaccharides (CPSs, K antigen) and lipopolysaccharides (LPSs, O antigen), both of which are important virulence factors (Podschun & Ullmann, 1998). The functions of CPSs are serum resistance and antiphagocytosis (Podschun & Ullmann, 1998). Among the 77 distinct K serotypes of K. pneumoniae (Mori et al., 1989), K1 and K2 are common causative agents isolated from patients with K. pneumoniae infections in Singapore, Taiwan and South Africa (Fung et al., 2002; Yu et al., 2007).

    K. pneumoniae is phylogenetically related to Escherichia coli. Both K. pneumoniae and E. coli produce a variety of capsules (Podschun & Ullmann, 1998; Jann & Jann, 1997). The genetic organization of the capsular polysaccharide synthesis (cps) region and CPS assembly in E. coli have been well studied (reviewed by Whitfield, 2006; Whitfield & Roberts, 1999). E. coli K antigens are divided into four groups, and the group 1 CPSs are synthesized by the Wzy-dependent polymerization pathway. Previous studies indicate that the mechanism of capsule synthesis and the organization of the cps gene clusters in K. pneumoniae are similar to those of the E. coli group 1 CPSs (Arakawa et al., 1995; Rahn et al., 1999; Chuang et al., 2006; Pan et al., 2008). The genes in the group 1 cps gene clusters can be classified into three groups: (i) genes for sugar nucleotide synthesis, (ii) genes for capsule repeat-unit synthesis and (iii) genes for capsular repeat-unit assembly and export. Moreover, it is well documented that genes encoding proteins for capsule repeat-unit synthesis, assembly and export form an operon (Drummelsmith & Whitfield, 1999).

    The synthesis of the E. coli K30 (group 1 CPS) antigen serves as a model for CPS biogenesis in K. pneumoniae (Whitfield & Paiment, 2003). Four glycosyltransferases (GTs) are involved in the synthesis of K30 lipid-linked antigen repeat units. WbaP, a membrane protein, transfers galactose 1-phosphate from UDP-Gal to undecaprenol phosphate (UndP) to form Gal-p-UndP. Three other GTs (WbaZ, WcaO and WcaN) then transfer three sugar moieties to Gal-p-UndP sequentially to produce the capsule repeat unit. GTs play a significant role in the biosynthesis of CPS and O antigen repeat units. GTs can be classified into two structural superfamilies and 65 families based on different criteria (Franco & Rigden, 2003; Coutinho et al., 2003). The diverse enzymic activities of GTs provide the remarkable diversity of K and O antigens. The genes wza and wzc, which encode membrane proteins, are located at the 5′ end of E. coli group 1 and K. pneumoniae cps gene clusters (Arakawa et al., 1995; Rahn et al., 1999; Chuang et al., 2006; Pan et al., 2008). Wza is an integral outer membrane lipoprotein and Wzc functions as an inner membrane tyrosine autokinase. Wza and Wzc form a channel complex that acts as a K antigen transporter (Dong et al., 2006; Whitfield, 2006). It has been suggested that Wzb, the cognate phosphatase of Wzc, regulates the function of Wzc through its phosphatase activity (Whitfield, 2006). Previous studies (Rahn et al., 1999; Chuang et al., 2006; Pan et al., 2008) have revealed that the genes responsible for K antigen repeat-unit biosynthesis and polymerization, located in the central region (between wzc and gnd) of cps gene clusters, are highly diverse.

    The full-length cps gene clusters of the K. pneumoniae strains NTUH-K2044 (Chuang et al., 2006; Wu et al., 2009), Chedid (Arakawa et al., 1995) and MGH78578 () have been sequenced. Twenty, 18 and 22 ORFs have been predicted in the K1, K2 and K52 cps gene clusters, respectively. Recently, the cps gene clusters (from galF to gnd) of K. pneumoniae A1142 and A1517, each consisting of 16 ORFs, have also been sequenced and characterized (Pan et al., 2008).

    In this study, we sequenced seven cps gene clusters from clinical isolates of K. pneumoniae. Comparative analysis reveals that these cps gene clusters of K. pneumoniae are highly diverse. The number of GT genes and the DNA sequence of these GT genes (except wbaP and wcaJ) are specific to each cps gene cluster. Notably, the gene encoding the initiating GT, which transfers sugar from an activated sugar nucleotide to UndP, is either wbaP or wcaJ in each cps gene cluster. The information provided by this study could be used to investigate the evolution of cps gene clusters in K. pneumoniae.

    METHODS

    The cps gene clusters of 12 K. pneumoniae strains, NTUH-K2044, Chedid, MGH78578, NK8, NK29, NK245, VGH404, VGH484, VGH916, VGH698, A1142 and A1517, are named in this paper as cpskpK1, cpskpK2, cpskpK52, cpskpA, cpskpB, cpskpC, cpskpK5, cpskpK9, cpskpK14, cpskpK62, cpskpK57 and cpskpKpA1517, respectively.

    The sequences of cpskpK1, cpskpA, cpskpB and cpskpC were determined by whole-genome shotgun sequencing (Wu et al., 2009; K. M. Wu and others, unpublished data). The cps sequence of K. pneumoniae NTUH-K2044 has been independently reported (Chuang et al., 2006). The genomic segments containing cpskpK5, cpskpK9, cpskpK14 and cpskpK62 were obtained by fosmid library construction, PCR screening and PCR cloning to generate genomic sequences of the corresponding clones. Sequences for cpskpK2, cpskpK57 and cpskpKpA1517 were downloaded from GenBank, and the sequence of cpskpK52 is available at .

    Bacterial strains, plasmids and growth conditions.

    K. pneumoniae strains and plasmids used in this study are listed in Table 1. K. pneumoniae NK8, NK29 and NK245 were collected from the Department of Pathology, National Cheng Kung University Hospital, Tainan, Taiwan (Wu et al., 2009). K. pneumoniae VGH525, VGH404, VGH484, VGH916 and VGH698 were collected from Veterans General Hospital, Taipei, Taiwan (Fung et al., 2000). E. coli EPI300 (Epicentre) was used to construct fosmid libraries. E. coli and K. pneumoniae were cultured in Luria broth (LB) or on LB agar plates (Miller, 1972) at 37 °C.

    Table 1.

    Bacterial strains and plasmids used in this study

    Fosmid library construction and PCR screening method.

    For fosmid library construction, the genomic DNA of K. pneumoniae strains NK8, NK29, NK245, VGH404, VGH484, VGH916 and VGH698 was prepared using the Wizard Genomic DNA Purification System (Promega) following the manufacturer's instructions. The fosmid libraries were constructed with a CopyControl Fosmid Library Production kit (Epicentre) according to the manufacturer's instructions. Briefly, genomic DNA was first sheared into approximately 40 kb fragments. The sheared DNA was end-repaired to generate blunt, 5′-phosphorylated ends and then size-selected by and recovered from a low-melting-point agarose gel. Finally, the size-selected DNA was ligated with the cloning-ready CopyControl pCC1FOS vector, packaged using ultra-high-efficiency MaxPlax Lambda Packaging Extracts, and used to transfect E. coli EPI300. Transfected cells were plated on agar plates containing chloramphenicol (12.5 μg ml⁻¹).

    Fosmid clones were then picked and placed in LB medium containing chloramphenicol in 96-well microtitre plates (stock plates) and incubated overnight at 37 °C with shaking. The overnight bacterial culture from eight wells down a column of a 96-well microtitre plate (10 μl per well) was pooled (column pool) and stored in a single well of 96-well microtitre plates (pool plates). A 2 μl sample of bacterial culture from each well of the pool plates was used for PCR amplification. WZI1 and WZI2 primers (sequences listed in Supplementary Table S1) were used to identify fosmid clones containing cps gene clusters. Herculase DNA polymerase (Stratagene) was used for the PCR. The PCR products were examined in a 1 % agarose gel. Individual fosmids from a PCR-positive well of the pool plate were purified and fosmid end-sequence pairs were sequenced to determine which fosmid clone contains the cps region.
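
    The column-pooling scheme described above reduces the number of screening PCRs per 96-well stock plate from 96 to 12, after which only the eight clones of a PCR-positive column need to be examined individually. A minimal Python sketch of that bookkeeping follows. The well-naming scheme (rows A to H, columns 1 to 12) and the function names are illustrative assumptions and are not part of the published protocol.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]    # rows A-H of a 96-well plate (assumed naming)
COLUMNS = range(1, 13)        # columns 1-12

def column_pools():
    """Map each column number to the eight wells that are pooled from it."""
    return {col: [f"{row}{col}" for row in ROWS] for col in COLUMNS}

def candidate_wells(positive_columns):
    """Wells to examine individually, given the PCR-positive column pools."""
    pools = column_pools()
    return [well for col in sorted(positive_columns) for well in pools[col]]

if __name__ == "__main__":
    print(len(column_pools()))     # number of pooled PCRs per stock plate
    print(candidate_wells({3}))    # the eight clones behind a positive column 3
```

    With one positive pool, eight clones remain to be resolved, which in this study was done by purifying the individual fosmids and sequencing their ends.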

    Fosmids pCCKPA, pCCKPB, pCCK9CPS and pCCK62CPS containing full-length cps were identified by PCR screening of fosmid libraries of K. pneumoniae NK8 (cpskpA), NK29 (cpskpB), VGH484 (cpskpK9) and VGH698 (cpskpK62), respectively (Fig. 1). Fosmid pCCK5CPS, and pCCK14CPS containing partial cps (5′ end), were identified from fosmid libraries of K. pneumoniae VGH404 (cpskpK5) and VGH916 (cpskpK14) (Fig. 1).

    Fig. 1.

    Recombinant clones containing cps gene clusters of K. pneumoniae. Horizontal lines represent the inserts of plasmids. The vertical lines indicate the boundary of the cps gene cluster (from galF to ugd). The dashed line represents the DNA sequence from K2 CPS of K. pneumoniae Chedid (accession number D21242). The cps gene cluster of K. pneumoniae VGH404 (cpskpK5) and VGH916 (cpskpK14) is represented by three (pCCK5CPS, pCCK5-1 and pCCK5-2) and four (pCCK14CPS, pCCK14-1, pCCK14-2 and pCCK14-3) clones, respectively.

    Cloning the 3′ region of cps of K. pneumoniae strains VGH404, VGH916 and VGH525.

    After PCR screening, fosmid clone pCCK5CPS was identified from the fosmid library of K. pneumoniae VGH404 (serotype K5). Because pCCK5CPS contains only a partial cpskpK5 gene cluster, the primer pairs PR03/PR04 and PR05/PR06 (Supplementary Table S1) were used to amplify the missing 3′ region of cpskpK5 from K. pneumoniae VGH404 genomic DNA. The amplified DNA fragments were then cloned into linearized pCC1FOS (Epicentre) to generate the plasmids pCCK5-1 and pCCK5-2. The overlapping inserts of pCCK5CPS, pCCK5-1 and pCCK5-2 together cover the full length of the cpskpK5 gene cluster (Fig. 1).

    Fosmid clone pCCK14CPS, identified by PCR screening, contained only a partial cpskpK14 gene cluster. To obtain full-length cpskpK14, primer pairs PR07/PR04, PR08/PR09 and PR01/PR06 (Supplementary Table S1) were used to amplify the missing fragments from the K. pneumoniae VGH916 (serotype K14) genome. The PCR products were then ligated with pCC1FOS to generate pCCK14-1, pCCK14-2 and pCCK14-3. The overlapping inserts of pCCK14CPS, pCCK14-1, pCCK14-2 and pCCK14-3 together cover the full length of the cpskpK14 gene cluster (Fig. 1).

    Though the cps gene cluster sequence of K. pneumoniae K2 (cpskpK2) has been reported by Arakawa et al. (1995), the 3′ end of cpskpK2 has not been determined. We used primers PR01 and PR02 (Supplementary Table S1) to amplify the 3′ region of cpskpK2 from K. pneumoniae VGH525 (serotype K2), and the PCR product was ligated with pCC1FOS to generate pCCK2 (Fig. 1).

    Plasmid and fosmid DNA purification and sequencing.

    For plasmid and fosmid DNA purification, bacteria were grown according to the instructions of the CopyControl Fosmid Library Production kit (Epicentre). Plasmid and fosmid DNA from 1 ml bacterial culture was purified with a Montage Plasmid Miniprep96 kit (Millipore).

    To generate sequencing templates, plasmids and fosmids were tagged with the EZ-Tn5 <KAN-2> transposon according to the manufacturer's instructions (Epicentre). Tagged plasmids and fosmids were electroporated into E. coli EPI300 and purified with the Montage Plasmid Miniprep96 kit.

    The sequencing reactions were carried out with BigDye Terminator v3.1 following the standard procedure and the products were analysed with an ABI 3730xl DNA analyser (Applied Biosystems). The Phred/Phrap/Consed package (Ewing & Green, 1998; Gordon et al., 1998) was used for sequence assembling and editing.

    Coding gene prediction and sequence analysis.

    The protein-coding genes were predicted using Glimmer 2.13, GeneMark 2.4 and GeneMark.hmm 2.1 (Delcher et al., 1999; Borodovsky & McIninch, 1993; Lukashin & Borodovsky, 1998), and predictions shorter than 30 translated amino acids were discarded. The gene name, description, probable function and Clusters of Orthologous Groups (COG) group of each predicted gene were assigned according to the results of blastp (E-value <10⁻⁵, identity >40 % and matched length >30 %) against the RefSeq.microbial (Pruitt et al., 2007), non-redundant protein and COG databases from NCBI (). Subsequent manual inspection was applied to correct errors from automatic annotation and information retrieval. Genes within the cps variable region that had no matching gene name were named after the corresponding serotype followed by a letter in alphabetical order, except those in strains NK8, NK29 and NK245, whose serotypes have not yet been identified.
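
    The filtering criteria stated above can be expressed compactly in code. The following Python sketch applies the ORF length cut-off and the three blastp thresholds from the text. The record layout, the field names and the interpretation of "matched length" as the aligned fraction of the query protein are assumptions made for illustration only; they do not describe the scripts actually used in this study.

```python
from dataclasses import dataclass

MIN_ORF_LENGTH_AA = 30    # predictions shorter than this were discarded
MAX_EVALUE = 1e-5         # blastp E-value must be below this
MIN_IDENTITY_PCT = 40.0   # identity must exceed this
MIN_MATCHED_PCT = 30.0    # matched length must exceed this

@dataclass
class Hit:
    """Hypothetical blastp hit record; field names are illustrative."""
    subject: str
    evalue: float
    identity_pct: float
    aligned_length: int   # number of query residues in the alignment

def keep_orf(protein_length_aa: int) -> bool:
    """Keep predicted ORFs of at least 30 translated amino acids."""
    return protein_length_aa >= MIN_ORF_LENGTH_AA

def hit_passes(hit: Hit, query_length_aa: int) -> bool:
    """Apply the E-value, identity and matched-length thresholds together."""
    matched_pct = 100.0 * hit.aligned_length / query_length_aa
    return (hit.evalue < MAX_EVALUE
            and hit.identity_pct > MIN_IDENTITY_PCT
            and matched_pct > MIN_MATCHED_PCT)

if __name__ == "__main__":
    example = Hit("example_subject", 1e-20, 74.0, 400)  # made-up values
    print(hit_passes(example, query_length_aa=476))
```

    All three blastp conditions must hold at once, so a highly significant hit that covers only a short stretch of the query is still rejected.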

    The protein family of each protein was predicted by searching the Pfam protein families database (Finn et al., 2006; Bateman et al., 2004), and the program MacVector (Olson, 1994) was used for pairwise sequence comparison.

    Identification of novel membrane and periplasmic proteins.

    An EZ-Tn5 β-Lactamase Fusion kit (Epicentre) was used to identify genes encoding membrane and periplasmic proteins. Fosmids pCCKPA, pCCKPB, pCCK9CPS and pCCK62CPS were tagged with the EZ-Tn5 <blaM/R6Kγori> transposon in vitro. Tagged fosmids were transformed into E. coli EPI300 by electroporation and plated on an LB plate containing 50 μg ampicillin ml⁻¹. The fosmid DNA of transformants was isolated and the <blaM/R6Kγori> transposon insertion site of each fosmid was determined by sequencing using primer BLA-RP2 (5′-CTTCGGGGCGAAAACTCTCAAGGATC-3′). Ninety-six samples of each fosmid (pCCKPA, pCCKPB, pCCK9CPS and pCCK62CPS) were sequenced.

    RESULTS AND DISCUSSION

    General features of cps gene clusters of K. pneumoniae

    Seven newly sequenced and five previously sequenced cps gene clusters of K. pneumoniae were analysed in this paper. The cps of K. pneumoniae NTUH-K2044, Chedid, MGH78578, NK8, NK29, NK245, VGH404, VGH484, VGH916, VGH698, A1142 and A1517 are named cpskpK1, cpskpK2, cpskpK52, cpskpA, cpskpB, cpskpC, cpskpK5, cpskpK9, cpskpK14, cpskpK62, cpskpK57 and cpskpKpA1517, respectively. The genetic organization of the K. pneumoniae cps gene clusters is shown in Fig. 2(a). In general, the cps region of K. pneumoniae begins at galF and ends at ugd. The length of the K. pneumoniae cps gene clusters ranges from 21 to 30 kb (Table 2). The number of ORFs found in the 12 K. pneumoniae cps gene clusters ranges from 16 to 25 (Table 2). The transcriptional direction of all ORFs is from galF to ugd, except for kpc7 of cpskpC (Fig. 2a). The GC content of each ORF is also shown in Fig. 2(a). The GC content of K antigen gene clusters is lower than the mean GC content (57.7 %) of the K. pneumoniae NTUH-K2044 genome (Wu et al., 2009), especially in the region between wzc and gnd (Fig. 2a).
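
    The per-ORF G+C comparison described above amounts to a simple calculation, sketched here in Python. Only the genome mean of 57.7 % is taken from the text; the function names and the toy sequence are hypothetical, and the sketch assumes an unambiguous nucleotide sequence (any ambiguity codes would count towards the length but not towards G+C).

```python
GENOME_MEAN_GC = 57.7   # mean G+C (%) of the NTUH-K2044 genome, from the text

def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

def below_genome_mean(sequence: str, genome_mean: float = GENOME_MEAN_GC) -> bool:
    """True if the sequence has a lower G+C content than the genome mean."""
    return gc_percent(sequence) < genome_mean

if __name__ == "__main__":
    toy_orf = "ATGAAATTTGGCCATTAA"   # made-up sequence, not from any cps cluster
    print(round(gc_percent(toy_orf), 1), below_genome_mean(toy_orf))
```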

    Fig. 2.

    Organization of cps gene clusters of 12 K. pneumoniae isolates (a) and putative promoters and transcriptional units of the K. pneumoniae cps (b). (a) The arrows represent the direction of transcription and dark-grey arrows represent putative GT genes identified by Pfam. The percentage below each ORF indicates the G+C content of individual genes. (b) The thick lines represent transcriptional units of K. pneumoniae cps. The polycistronic mRNAs are driven by promoters P1, P2 and P3. The polycistronic mRNA driven by P1 consists of galF and orf2. The polycistronic mRNA driven by P2 includes genes of capsule repeat unit synthesis and polymerization as well as surface assembly. The transcriptional direction of ugd is opposite to that of uge. For clarity, only four conserved genes of K. pneumoniae cps are shown. ⧫, JUMPStart sequence; T, stem–loop structure.

    Table 2.

    General features of K. pneumoniae cps

    +, Genes present.

    All of the cps gene clusters in K. pneumoniae contain six conserved genes at the 5′ end of the cps region in the following order: galF, orf2, wzi, wza, wzb and wzc. Genes encoding gluconate-6-phosphate dehydrogenase (gnd) and UDP-glucose dehydrogenase (ugd) were found downstream of all cps gene clusters except cpskpK57 and cpskpA1517, in which ugd has not yet been identified (Fig. 2a). Our data also reveal that in most clusters gnd and ugd are separated by manCB, rmlBADC or manCBrmlBADC genes (Fig. 2a).

    Sugar nucleotide synthesis genes in the cps clusters

    Mannose, rhamnose and glucuronic acid are common components of capsular and O antigens. The genes encoding enzymes for the biosynthesis of sugar nucleotide precursors in the K. pneumoniae capsule are found in the cps gene cluster (Figs 2a and 3). Products of manC and manB are involved in the synthesis of GDP-d-mannose. RmlA, RmlB, RmlC and RmlD proteins are responsible for the conversion of glucose 1-phosphate to dTDP-l-rhamnose (Fig. 3). The genes involved in GDP-d-mannose and dTDP-l-rhamnose synthesis are arranged in the order manCB and rmlBADC, respectively. manCB is present in seven cps gene clusters, whereas rmlBADC is present in only three (Fig. 2a, Table 2). Both manCB and rmlBADC are absent from K. pneumoniae NK29 (cpskpB), whereas K. pneumoniae VGH916 (cpskpK14) contains both (Fig. 2a and Table 2). GDP-d-mannose is synthesized from fructose 6-phosphate by three enzymes, ManA, ManB and ManC (Fig. 3). However, the manA gene is not located in the cps gene cluster of K. pneumoniae (Fig. 2a).

    Fig. 3.

    Nucleotide sugar biosynthesis pathways for UDP-glucose, UDP-glucuronic acid, GDP-d-mannose, GDP-l-fucose and dTDP-l-rhamnose. Enzymes encoded in the cps gene clusters of K. pneumoniae are underlined. Glk, glucokinase; Pgm, phosphoglucomutase; GalU, UDP-glucose pyrophosphorylase; Ugd, UDP-glucose dehydrogenase; RmlA, d-glucose-1-phosphate thymidylyltransferase; RmlB, dTDP-d-glucose 4,6-dehydratase; RmlC, dTDP-4-keto-6-deoxy-d-hexulose 3,5-epimerase; RmlD, dTDP-4-keto-l-rhamnose reductase; Pgi, phosphoglucose isomerase; ManA, phosphomannose isomerase; ManB, phosphomannomutase; ManC, d-mannose-1-phosphate guanylyltransferase; Gmd, GDP-d-mannose 4,6-dehydratase; WcaG, GDP-4-keto-6-deoxy-d-mannose 3,5-epimerase/reductase.

    The rhamnose pathway is widely distributed in bacteria (reviewed by Giraud & Naismith, 2000). dTDP-l-rhamnose, the precursor of rhamnose, is synthesized from glucose 1-phosphate by glucose-1-phosphate thymidylyltransferase (RmlA), dTDP-d-glucose-4,6-dehydratase (RmlB), dTDP-6-deoxy-d-glucose-3,5-epimerase (RmlC) and dTDP-6-deoxy-l-mannose-dehydrogenase (RmlD) (Köplin et al., 1993) (Fig. 3). These genes are clustered in the order rmlBADC in K. pneumoniae (Fig. 2a), whereas they are in the order rmlACBD in the Gram-positive thermophile Aneurinibacillus thermoaerophilus DSM 10155 (Graninger et al., 2002). Since the enzymes for rhamnose synthesis are absent in humans, inhibitors of RmlA, RmlB, RmlC and RmlD activities have been screened for chemotherapy (Graninger et al., 2002; Ma et al., 2001; Giraud & Naismith, 2000). In addition, galF and ugd were found at the 5′ end and the 3′ end of K. pneumoniae cps gene clusters, respectively (Fig. 2a). GalF protein modulates the activity of GalU to elevate the cellular concentration of UDP-glucose in E. coli and to increase the capsule production of E. coli K30 (Marolda & Valvano, 1996; Rahn & Whitfield, 2003). The ugd gene encodes UDP-glucose dehydrogenase, which converts UDP-glucose into UDP-glucuronic acid, the substrate for capsule synthesis.

    Fucose is widely distributed in bacteria, plants and animals (Albermann et al., 2000). GDP-d-mannose 4,6-dehydratase (Gmd) and GDP-4-keto-6-deoxy-d-mannose 3,5-epimerase/reductase (WcaG) are needed for the conversion of GDP-d-mannose to GDP-l-fucose, the precursor of fucose. Both the colanic acid of E. coli and the K1 capsule of K. pneumoniae contain fucose. The gmd and wcaG genes are located in the gene clusters responsible for E. coli colanic acid (Stevenson et al., 1996; Andrianopoulos et al., 1998) and K. pneumoniae K1 capsule biosynthesis (Fig. 2a) (Chuang et al., 2006). An enzyme homologous to WcaG, formerly called FX protein, is also found in humans (Tonetti et al., 1996). The WcaG involved in K. pneumoniae K1 capsule synthesis shows 29 % identity to FX protein (accession number AAC50786) and 77 % identity to the corresponding gene product (accession number AAC77843) of the E. coli colanic acid biosynthesis gene cluster.

    The protein encoded by glf in K. pneumoniae VGH916 (K14) shows 42 % identity to Glf (accession number AAC98417) of K. pneumoniae serotype O1 and 63 % identity to Glf (accession number AAB88403) of E. coli K-12. Glf catalyses the conversion of UDP-galactopyranose into UDP-galactofuranose (Nassau et al., 1996; Lee et al., 1996). The enzymic activity (Nassau et al., 1996; Lee et al., 1996) and crystal structure (Sanders et al., 2001) of Glf of E. coli have been characterized. The presence of a glf homologue suggests that the K. pneumoniae K14 capsule contains galactofuranose. Because Glf is absent in humans, it is a potential target for drug development (Lee et al., 1996).

    Membrane and periplasmic proteins in the cps gene clusters of K. pneumoniae

    Membrane proteins are important in the biosynthesis of E. coli group 1 and K. pneumoniae CPSs. Wzi, Wza, Wzc, Wzx, Wzy, WbaP and WcaJ are well-characterized membrane proteins (Stevenson et al., 1996; Whitfield & Roberts, 1999; Rahn & Whitfield, 2003; Beis et al., 2004; Collins et al., 2006; Dong et al., 2006; Whitfield, 2006). The gene organization and sequence of wzi, wza and wzc are conserved among E. coli group 1 and K. pneumoniae cps gene clusters (Fig. 2a). On the other hand, the sequences of wzx and wzy in the cps gene clusters are more diverse. Therefore, the EZ-Tn5 <blaM/R6Kγori> transposon, which carries a leaderless and promoterless β-lactamase gene, was used to identify the genes encoding Wzx, Wzy and novel membrane or periplasmic proteins in cpskpA, cpskpB, cpskpK9 and cpskpK62. Fosmids pCCKPA, pCCKPB, pCCK9CPS and pCCK62CPS, containing the full-length cps gene clusters cpskpA, cpskpB, cpskpK9 and cpskpK62, respectively, were tagged with the EZ-Tn5 <blaM/R6Kγori> transposon in vitro, transformed into E. coli EPI300, and then plated on LB plates with ampicillin. In-frame fusions of β-lactamase to membrane or periplasmic proteins allow E. coli to survive on plates with ampicillin. Ninety-six transformants of each fosmid were purified and sequenced. Sequencing analysis revealed that 79, 80, 78 and 80 transposon insertion sites were successfully identified for pCCKPA, pCCKPB, pCCK9CPS and pCCK62CPS, respectively (Supplementary Table S2). By using blaM, we identified eight genes (wzi, wza, wzc, wbaP, wzy, orfZ, wzx and manC) of cpskpA, six genes (wzi, wza, wzc, wbaP, kpb2 and gnd) of cpskpB, six genes (wzi, wza, wzc, kp9D, wzx and rmlA) of cpskpK9, and nine genes (orf2, wzi, wza, wzc, wbaP, kp62C, wzy, kp62F and kp62I) of cpskpK62 that encoded membrane or periplasmic proteins (Supplementary Table S2). Wzi and Wza were identified more frequently than the other genes. WbaP is a membrane protein with four hydrophobic segments at the N terminus (Stevenson et al., 1996; Jiang et al., 2001; Whitfield, 2006). In cpskpA, wzx and wzy were identified in four out of 79 transformants. wbaP was also identified at low frequency in three of the four cps gene clusters examined (cpskpA, cpskpB and cpskpK62) (Supplementary Table S2).

    wzx and wzy genes of K. pneumoniae cps gene clusters

    Synthesis of E. coli groups 1 and 4 and K. pneumoniae CPSs is Wzy-dependent (Whitfield, 2006). Wzx functions as a flippase and exports repeat-unit oligosaccharides to the periplasmic region. Wzy in turn polymerizes these repeating units (Whitfield & Roberts, 1999). The sequence similarity of wzx and wzy among different gene clusters is known to be very low (Jiang et al., 2001), and because of this low similarity, wzx and wzy genes should be serotype-specific. Therefore, serotyping by PCR targeting wzx and wzy for E. coli O antigen and Streptococcus pneumoniae K antigen has been developed (DebRoy et al., 2005; Kong et al., 2005). Because of the low level of similarity, both sequence similarity and transmembrane helix profiles have been used in wzx and wzy identification (Jiang et al., 2001; Mazur et al., 2005; Nakhamchik et al., 2007; Cunneen & Reeves, 2008).

    In this study, protein families of putative Wzx and Wzy in K. pneumoniae cps gene clusters were analysed by searching the Pfam database (Finn et al., 2006; Bateman et al., 2004). Additionally, the sequence similarity of Wzx and Wzy from K. pneumoniae cps was analysed by blastp and psi-blast (Altschul et al., 1997), while transmembrane helices were predicted using HMMTOP (Tusnády & Simon, 1998) and TMpred (Hofmann & Stoffel, 1993) (Table 3). Proteins listed in Table 3 were predicted by HMMTOP to contain between nine and 14 transmembrane helices. The Pfam protein family and homologue of each protein were determined by searching the Pfam and GenBank databases. Proteins that belong to the Pfam families Polysacc_synt (PF01943) and Wzy_C (PF04932) were named Wzx and Wzy, respectively. Other Wzx and Wzy proteins with no significant match to a Pfam family were identified according to their homologues. We found that nine cps gene clusters contained ORFs encoding Wzx proteins that belonged to the Polysacc_synt family (Table 3). Although Wzx of cpskpK1 did not belong to the Polysacc_synt family, it showed 20 % identity to O-antigen flippase of Francisella tularensis (YP_170390) (Chuang et al., 2006). Moreover, <blaM/R6Kγori> transposon insertion analysis confirmed that Wzx of cpskpK9 and Wzy of cpskpK62 are membrane proteins (Supplementary Table S2). In addition, Kpb2 and Kp62I were identified using transposon insertion (Supplementary Table S2).
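
    The naming rule described above (Polysacc_synt to Wzx, Wzy_C to Wzy, otherwise fall back to homologue-based identification) can be written as a small lookup. This Python sketch is illustrative only: the function name and the convention of returning None when Pfam is not decisive are assumptions, and the Pfam search itself is not reproduced here.

```python
# Pfam accessions and the protein names assigned to them in the text.
PFAM_TO_NAME = {
    "PF01943": "Wzx",   # Polysacc_synt
    "PF04932": "Wzy",   # Wzy_C
}

def name_from_pfam(pfam_accessions):
    """Return 'Wzx' or 'Wzy' if the Pfam matches point to exactly one of them.

    Returns None when there is no informative match (or conflicting matches),
    in which case the protein has to be identified from its homologues.
    """
    names = {PFAM_TO_NAME[acc] for acc in pfam_accessions if acc in PFAM_TO_NAME}
    if len(names) == 1:
        return names.pop()
    return None

if __name__ == "__main__":
    print(name_from_pfam(["PF01943"]))   # a Polysacc_synt match
    print(name_from_pfam([]))            # no Pfam match: decide by homology
```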

    Table 3.

    Predicted genes for wzx and wzy of K. pneumoniae cps

    Wzx protein, a flippase, is essential for capsule biosynthesis and should be present in K. pneumoniae cps, but no Wzx-like protein was identified in cpskpC and cpskpK62 gene clusters. Kpc4 of cpskpC and Kp62I of cpskpK62 belonged to the MVIN (PF03023) protein family, and both MVIN and Polysacc_synt belonged to the MviN_MATE integral membrane protein superfamily. Since no other Wzx-like protein was identified in cpskpC and cpskpK62, Kpc4 and Kp62I are likely candidates for Wzx.

    Synthesis of the K. pneumoniae capsule is a Wzy-dependent process (Whitfield, 2006). Five of the 12 cps gene clusters contain ORFs encoding a Wzy that belongs to the Pfam family Wzy_C (PF04932), including magA of cpskpK1. The Wzy proteins of cpskpK2, cpskpK52 and cpskpA1517 have low similarity to Wzy of E. coli and Salmonella enterica (Table 3). Owing to the sequence diversity of Wzy, we did not identify Wzy in cpskpB, cpskpK5, cpskpK9 and cpskpK14. However, the number of transmembrane helices predicted by HMMTOP in Kpb2, Kp5A, Kp9D and Kp14E was similar to that of the Wzy proteins identified in K. pneumoniae cps gene clusters (Table 3). Our data suggest that the similarity of wzx and wzy is extremely low in K. pneumoniae cps; therefore, these two genes, together with the GT genes, could be used as targets for serotyping by PCR.

    Transferase genes in K. pneumoniae capsule biosynthesis gene clusters

    Both WbaP and WcaJ are initiating GTs: they transfer the first sugar to the lipid acceptor and thereby initiate the synthesis of the capsular repeat unit (Whitfield, 2006). Either the wbaP gene or the wcaJ gene is present in a given K. pneumoniae cps (Fig. 2a, Table 1). Unlike other GTs, WbaP and WcaJ are membrane proteins with four hydrophobic segments at the N terminus (Stevenson et al., 1996; Jiang et al., 2001; Whitfield, 2006). EZ-Tn5 <blaM/R6Kγori> transposon insertion analysis also suggests that the WbaPs are indeed membrane proteins (Supplementary Table S2). Moreover, according to transposon insertion analysis, none of the other putative GTs identified by searching the Pfam database is a membrane protein.

    It has been shown that WbaP is the UDP-Gal : undecaprenolphosphate galactose-1-phosphate transferase of S. enterica B and E. coli K30 (Liu et al., 1993; Drummelsmith & Whitfield, 1999). The WbaPs of K. pneumoniae NK8, NK29, NK245, VGH484, VGH698, MGH78578, A1142 and A1517 share 98, 71, 74, 74, 72, 76, 77 and 74 % identity with WbaP of E. coli K30 (AAD21565), respectively. A previous study has suggested that WcaJ (AAC77848) functions as a UDP-glucose lipid carrier transferase in the biosynthesis of colanic acid in E. coli K-12 (Stevenson et al., 1996). The WcaJs of K. pneumoniae NTUH-K2044, VGH404, VGH916 and Chedid share 63, 61, 63 and 62 % identity with WcaJ of E. coli K-12 (AAC77848), respectively.
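
    The two-group classification implied by these data can be tabulated directly. The following Python sketch records, for each of the 12 strains, which initiating GT gene is reported in the paragraph above and groups the strains accordingly. The strain-to-gene assignments are transcribed from the text; the variable and function names are illustrative.

```python
# Initiating GT gene reported for each strain in the text.
INITIATING_GT = {
    "NK8": "wbaP", "NK29": "wbaP", "NK245": "wbaP", "VGH484": "wbaP",
    "VGH698": "wbaP", "MGH78578": "wbaP", "A1142": "wbaP", "A1517": "wbaP",
    "NTUH-K2044": "wcaJ", "VGH404": "wcaJ", "VGH916": "wcaJ", "Chedid": "wcaJ",
}

def group_by_initiating_gt(assignments):
    """Group strain names by the initiating GT gene found in their cps cluster."""
    groups = {}
    for strain, gene in assignments.items():
        groups.setdefault(gene, []).append(strain)
    return groups

if __name__ == "__main__":
    groups = group_by_initiating_gt(INITIATING_GT)
    for gene, strains in groups.items():
        print(gene, len(strains), ", ".join(strains))
```

    Eight strains fall into the wbaP group and four into the wcaJ group, and no strain appears in both.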

    GTs of the K. pneumoniae cps gene clusters were identified by searching the Pfam protein families database (Finn et al., 2006; Bateman et al., 2004). Putative GTs determined by searching Pfam are listed in Table 4. The results show that cpskpK1 contains three, cpskpA four, cpskpB four, cpskpC six, cpskpK5 two, cpskpK9 four, cpskpK14 three, cpskpK62 five, cpskpK2 three, cpskpK52 four, cpskpK57 four and cpskpA1517 four genes that encode GTs (Table 4). Earlier studies indicate that the DXD motif is conserved in some GT families (Liu & Mushegian, 2003; Breton et al., 1998; Wiggins & Munro, 1998) and is essential for their enzymic activities (Li et al., 2001; Götting et al., 2004; Klutts et al., 2007; Persson et al., 2007). Amino acid sequence analysis of GTs in K. pneumoniae cps gene clusters revealed that most GTs identified by Pfam HMM contained the DXD motifs (Table 4). All WbaPs in the cps gene clusters of K. pneumoniae were found to contain a conserved DXD motif in the C-terminal region. Moreover, the sequence of the DXD motif was DVD, except in WbaP of K. pneumoniae A1142, where it was DID (Table 4). In contrast to WbaP, two out of four WcaJs identified in K. pneumoniae VGH404 and Chedid were found to contain a DVD motif (Table 4).
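
    The DXD motif search described above can be illustrated with a minimal sketch. This is not the procedure used in the study, which is not detailed here; the function name find_dxd and the toy sequence are hypothetical, and the only rule encoded is "aspartate, any residue, aspartate".

```python
import re

# D-x-D: aspartate, any residue, aspartate. The lookahead allows
# overlapping hits (e.g. "DADAD") to be reported separately.
DXD = re.compile(r"(?=(D[A-Z]D))")


def find_dxd(protein):
    """Return (1-based position, motif) for every DXD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DXD.finditer(protein.upper())]


# Hypothetical toy sequence for illustration only, not a real WbaP or WcaJ.
toy = "MKTLLDVDGSAAKDIDPL"
for position, motif in find_dxd(toy):
    print(position, motif)
```

    Because a three-residue pattern occurs frequently by chance, a hit from such a scan is only a candidate; its position relative to the conserved C-terminal region still has to be checked against an alignment.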

    Table 4.

    GT genes of K. pneumoniae cps

    Since the structure of the K. pneumoniae K1 capsule is →4)-[2,3-(S)-pyruvate]-β-d-GlcpA-(1→4)-α-l-Fucp-(1→3)-β-d-Glcp(1→ (Chuang et al., 2006), we expect that a pyruvyl transferase (Kp1A) and three GTs (Kp1B, WcaI and WcaJ) are involved in repeating-unit synthesis. We also found that ORF15 of cpskpA1517 was a putative pyruvyl transferase which belonged to the Pfam PS_pyruv_trans family. In addition to GTs and pyruvyl transferase, acyltransferases were also identified in K. pneumoniae cps. Kp1C of cpskpK1, ORF13 of cpskpK2, Kp5E of cpskpK5 and OrfY of cpskpA belonged to Pfam family Hexapep (PF00132). Some bacterial acetyl and acyltransferases contain Hexapep repeats, including DapD, GlmU and LacA (Williams & Raetz, 2007). In addition, sequence analysis indicated that Kp52I of cpskpK52 and Kp62F of cpskpK62 belonged to Pfam family Acyl_transf_3. The Kp52I of cpskpK52 showed 34 % identity to acyltransferase 3 of Ralstonia metallidurans CH34 (ZP_00593471). ORF14 of cpskpK57 showed 30 % identity to acyltransferase family protein of Pseudomonas fluorescens (YP_261319). However, acylation of Klebsiella K52 CPS was not noted in an earlier study (Stenutz et al., 1997). In contrast to K52, Klebsiella K5 CPS is acetylated at glucopyranose (Cescutti et al., 1995). In this study we were unable to clarify whether cpskpK5 contains gene(s) responsible for Klebsiella K5 CPS acetylation. The relationship between genes and acylation in K. pneumoniae CPS biosynthesis still requires further investigation.

    Lateral gene transfer and recombination

    Both E. coli group 1 cps and the cps gene clusters in K. pneumoniae contain six (galF, orf2, wzi, wza, wzb and wzc) and two conserved genes (gnd and ugd) at the 5′ end and 3′ end, respectively. The average pairwise sequence identities of galF, orf2, wzi, wza, wzb, wzc, gnd and ugd genes were 97.7±2.0, 95.0±3.2, 87.0±3.7, 80.1±2.6, 65.5±4.3, 66.2±4.3, 92.9±3.2 and 97.3±0.6 %, respectively (Fig. 4). Average pairwise sequence identities of galF, orf2, wzi, wza, wzb, wzc and gnd were calculated from 12 cps gene clusters of K. pneumoniae, whereas the value for ugd was calculated from 10 cps gene clusters, not including cpskpK57 and cpskpA1517 of K. pneumoniae A1142 and A1517, respectively. The average pairwise sequence identities of the 5′-end conserved genes (galF, orf2, wzi, wza, wzb and wzc) gradually decreased toward the centre of the cps gene clusters (Fig. 4). A similar phenomenon has also been observed in the Streptococcus pneumoniae CPS gene clusters (Jiang et al., 2001).
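
    How an average pairwise identity of this kind can be obtained is sketched below. The alignment method and the exact meaning of the ± values are not stated in this section, so the sketch assumes that the sequences are already globally aligned to equal length and that ± denotes the sample standard deviation; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_identity(a, b):
    """Percent identical columns between two aligned sequences.

    Columns in which both sequences carry a gap are ignored; a gap
    opposite a residue counts as a mismatch (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def pairwise_summary(aligned):
    """Mean and sample standard deviation over all unordered pairs."""
    values = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return mean(values), stdev(values)


# Hypothetical toy alignment of one gene from three clusters.
toy_alignment = ["ATGACCGTAA", "ATGACCGTAA", "ATGTCCGTAC"]
average, spread = pairwise_summary(toy_alignment)
print(f"{average:.1f} +/- {spread:.1f} %")
```

    With 12 clusters each gene yields 66 pairwise comparisons, and with 10 clusters (as for ugd) 45.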

    Fig. 4.

    Average pairwise sequence identities of eight conserved genes in 12 cps gene clusters of K. pneumoniae. Average pairwise sequence identities of galF, orf2, wzi, wza, wzb, wzc and gnd were calculated from 12 cps gene clusters, whereas the value for ugd was calculated from 10 cps gene clusters, not including cpskpK57 and cpskpA1517 of A1142 and A1517, respectively. The average pairwise sequence identities of galF, orf2, wzi, wza, wzb, wzc, gnd and ugd were 97.7±2.0, 95.0±3.2, 87.0±3.7, 80.1±2.6, 65.5±4.3, 66.2±4.3, 92.9±3.2 and 97.3±0.6 %, respectively.

    Sequence analysis showed that the pairwise sequence divergence of 12 wzi, wza, wzb and wzc genes in K. pneumoniae cps was significantly higher than that of the gnd gene. The gnd gene, located at the 3′ end of the cps, encoding an NADP-dependent 6-phosphogluconate dehydrogenase, is involved in the pentose phosphate pathway. 6-Phosphogluconate dehydrogenase converts 6-phosphogluconate into ribulose 5-phosphate and produces NADPH (Nelson & Selander, 1994). The pentose phosphate pathway provides NADPH and ribose 5-phosphate for reductive biosynthesis and nucleotide biosynthesis. The fact that the diversity of wzi, wza, wzb, wzc and gnd genes was higher than that of other housekeeping genes indicated that the diversity could be due to recombination (Nelson & Selander, 1994). Previously, the capsule of K. pneumoniae K20 has been shown to be identical to that of E. coli K30 (Laakso et al., 1988; Homonylo et al., 1988), and the capsule of E. coli K42 is serologically identical to that of Klebsiella K63 (Niemann et al., 1978). In the present study, sequence analysis indicated that the cps gene cluster of K. pneumoniae NK8 is identical to that of E. coli K30 (accession number AF104912). This suggests that gene substitution has occurred in an existing cps gene cluster. These clues and the low GC content (Fig. 2a) strongly suggest that the K. pneumoniae cps gene cluster was acquired from a low-GC organism via lateral gene transfer and recombination (Rahn et al., 1999).
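
    The GC-content argument can be made concrete with a short sketch that computes GC content overall and in sliding windows. How the values in Fig. 2a were derived is not described here, so the window and step sizes, the function names and the toy sequence below are all hypothetical.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def sliding_gc(seq, window, step):
    """Return (0-based window start, GC %) for each full window."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an AT-rich stretch around a GC-rich core.
toy = "ATATGCGCATAT"
print(gc_content(toy))
print(sliding_gc(toy, window=4, step=4))
```

    A region whose windowed GC content stays well below the genome-wide value is the kind of signal that the lateral-transfer interpretation relies on, although GC content alone does not identify a donor.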

    K. pneumoniae cps gene expression

    Transcriptional regulation of the cps gene cluster is important for capsule synthesis and pathogenicity of K. pneumoniae (Lai et al., 2003). RmpA2, RcsAB and RfaH modulate the differential expression of the cps gene clusters (Lai et al., 2003; Rahn & Whitfield, 2003). Loss of the conjugative plasmid pLVPK, which carries the rmpA2 gene, dramatically reduces the mucoidy and pathogenicity of K. pneumoniae CG43 (Lai et al., 2003). Previous studies suggest that cpskpK2 includes three promoters (Arakawa et al., 1995; Lai et al., 2003), named P1, P2 and P3 in this study (Fig. 2b). Since the 12 K. pneumoniae cps have conserved genetic organization, their transcriptional organization and regulation mechanism are likely to be similar to those of cpskpK2. Promoters P1 and P2 are located upstream of galF and wzi, respectively (Fig. 2b). The transcript driven by promoter P1 consisted of galF and orf2. The polycistronic mRNA driven by promoter P2 from wzi to gnd consisted of genes of capsule repeat unit synthesis and polymerization, as well as surface assembly (Fig. 2b).

    An earlier study indicates that a transcriptional terminator structure possibly exists just downstream of gnd (Arakawa et al., 1995), suggesting the existence of P3 downstream of gnd (Fig. 2b). uge, encoding uridine diphosphate galacturonate 4-epimerase, is located downstream of ugd in the K. pneumoniae NTUH-K2044 and MGH78578 genomes. The transcriptional direction of ugd is opposite to that of uge, suggesting that transcription by P3 could terminate at a site downstream of ugd (Fig. 2b). Since manCB and rmlBADC could be dispersed between gnd and ugd in K. pneumoniae cps, the possibility of another promoter located downstream of P3 cannot be ruled out.

    RmpA2, an activator of capsule biosynthesis, can bind to P1 and P2 and activate gene expression (Lai et al., 2003). An RcsAB box (TAAGATTATTCTCA) was found immediately upstream of the galF gene in the K. pneumoniae NTUH-K2044 (K1) and MGH78578 (K52) genomes. An RcsAB box was also identified in the K. pneumoniae Chedid (K2) (Arakawa et al., 1995) and E. coli K30 group 1 capsule gene clusters (Rahn & Whitfield, 2003). Meanwhile, rcsA (KP3552), rcsB (KP3872) and rcsC (KP3873) genes are present in the genome of K. pneumoniae NTUH-K2044 (Wu et al., 2009). We speculate that both RmpA2 and RcsAB are required for high-level capsule synthesis of K. pneumoniae.
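
    Locating the RcsAB box quoted above in an upstream region can be sketched as an exact-match scan of both strands. The box sequence is taken from the text; the function names and the test region are hypothetical, and a real search would probably need to tolerate mismatches, which this sketch does not.

```python
RCSAB_BOX = "TAAGATTATTCTCA"  # sequence quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_box(region, box=RCSAB_BOX):
    """Return sorted (1-based start, strand) for exact matches on either strand."""
    region = region.upper()
    hits = []
    for strand, pattern in (("+", box), ("-", revcomp(box))):
        start = region.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = region.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical upstream region built around the box for illustration.
toy_region = "GGCC" + RCSAB_BOX + "AACC" + revcomp(RCSAB_BOX) + "TT"
print(find_box(toy_region))
```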

    In E. coli K30 cps, a transcriptional terminator with a hairpin loop structure has been identified between wzc and wbaP. This terminator is expected to affect transcription of the downstream genes. RfaH, an antiterminator, is involved in the expression of distal genes (Rahn & Whitfield, 2003). The JUMPStart sequence was identified in all 12 K. pneumoniae cps immediately upstream of wzi. Moreover, rfaH (KP0198) is present in the genome of K. pneumoniae NTUH-K2044 (Wu et al., 2009). Therefore, the JUMPStart-RfaH antitermination mechanism could modulate K. pneumoniae cps expression.

    CPSs of K. pneumoniae

    Much of the information about CPS synthesis in K. pneumoniae is modelled after the E. coli pathways. We have predicted gene functions of the available sequences of the K. pneumoniae cps clusters and correlated them with the CPS structures (Supplementary Table S3). The structural repeating unit of CPSs of E. coli K30 consists of →2)-α-d-Manp-(1→3)-β-d-Galp-(1→ chains carrying β-d-GlcpA-(1→3)-α-d-Galp-(1→ branches at position 3 of the mannoses (Chakraborty et al., 1980). The tetrasaccharide repeating unit from the K30 capsule of E. coli is synthesized on UndP by four GTs (WbaP, WbaZ, WcaO and WcaN) (Whitfield & Paiment, 2003). The repeating units containing mannose are synthesized by the products of manA, manB and manC, of which only manB and manC are present in cpskpA. Although cpskpK1 also contains manB and manC, mannose is not included in the structural unit of the K1 capsule (Chuang et al., 2006). However, it consists of fucose, which is synthesized from GDP-d-mannose by Gmd and WcaG. These findings may reinforce the correlation of genes and capsule components. Furthermore, colanic acid of E. coli contains two fucose units, and the gene cluster responsible for colanic acid synthesis indeed contains gmd and wcaG genes (Stevenson et al., 1996; Andrianopoulos et al., 1998).

    Since the K1 capsule also contains pyruvate, synthesis of K. pneumoniae NTUH-K2044 capsule requires pyruvyl transferase. An earlier study shows that kp1A of cpskpK1 encodes a putative pyruvyl transferase (Chuang et al., 2006). In addition to pyruvyl transferase, three GTs found in cpskpK1 (Fig. 2a, Table 3) may be engaged in the synthesis of K1 capsule repeating units by sequential addition of pyruvate and three sugars to UndP.

    The structural unit of K2 polysaccharide of K. pneumoniae has been characterized as →3)-β-d-Glcp-(1→4)-β-d-Manp-(1→4)-α-d-Glcp-(1→ chains carrying α-d-GlcpA-(1→ branches at position 3 of the mannoses (Corsaro et al., 2005). The capsular hexasaccharide repeating unit from Klebsiella type 52 contains two rhamnoses (Stenutz et al., 1997). GDP-d-mannose and dTDP-l-rhamnose biosynthesis genes are present in cpskpK2 and cpskpK52, respectively (Fig. 2a). The presence of manCB and rmlBADC could reflect the existence of mannose and rhamnose in the capsule of K2 and K52. Moreover, the structural unit of K5 polysaccharide of K. pneumoniae has been characterized as →4)-β-d-GlcpA-(1→4)-β-d-Glcp-(1→3)-β-d-Manp-(1→ with a pyruvic acetal group acetylated at the C-2 position of the glucopyranose (Cescutti et al., 1995). The GDP-d-mannose biosynthesis gene was found in cpskpK5 (Fig. 2a). manCB and GDP-l-fucose biosynthesis genes (gmd and wcaG) are both found in cpskpK1; however, only fucose is incorporated into the repeat unit of the K1 capsule. In contrast to other cps gene clusters of K. pneumoniae, except cpskpK57 and cpskpA1517, cpskpB does not contain mannose and rhamnose sugar nucleotide biosynthesis genes. Therefore, the capsule of K. pneumoniae NK29 might not contain mannose and rhamnose.

    The repeating units of capsules usually include uronic acids (Corsaro et al., 2005), and glucuronic acid is found in the repeating unit of K1, K2, K5 and K52 isolated from K. pneumoniae (Cescutti et al., 1995; Stenutz et al., 1997; Corsaro et al., 2005; Chuang et al., 2006). The presence of ugd could explain the prevalence of glucuronic acids in the capsules. Moreover, Wzc could increase Ugd activity by phosphorylation (Grangeasse et al., 2003), and phosphorylated Ugd could produce UDP-glucuronic acids from UDP-glucose (Fig. 4). However, galacturonic acid is present in Klebsiella K57 CPS instead of glucuronic acid (Pan et al., 2008).
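
    The gene-to-sugar reasoning used in this section can be summarized as a small lookup, sketched below. The pathway table encodes only the associations stated in the text (manB and manC with GDP-d-mannose, rmlBADC with dTDP-l-rhamnose, gmd and wcaG with GDP-l-fucose, ugd with UDP-glucuronic acid); the function name and the gene subsets are illustrative assumptions and do not represent complete cluster annotations.

```python
# Genes that the text associates with each sugar nucleotide precursor.
PATHWAYS = {
    "GDP-D-mannose": {"manB", "manC"},
    "dTDP-L-rhamnose": {"rmlA", "rmlB", "rmlC", "rmlD"},
    "GDP-L-fucose": {"gmd", "wcaG"},
    "UDP-glucuronic acid": {"ugd"},
}


def encoded_precursors(genes):
    """Return precursors whose listed genes are all present in the cluster."""
    present = set(genes)
    return sorted(name for name, needed in PATHWAYS.items() if needed <= present)


# Illustrative subsets of genes named in the text, not full annotations.
k1_subset = {"manB", "manC", "gmd", "wcaG", "ugd"}
k52_subset = {"rmlA", "rmlB", "rmlC", "rmlD", "ugd"}

print(encoded_precursors(k1_subset))
print(encoded_precursors(k52_subset))
```

    Such a lookup predicts only which precursors a cluster could supply. As the K1 case shows, manCB is present although mannose is absent from the repeating unit, so gene presence does not by itself establish capsule composition.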

    Conclusions

    Serotyping has been widely used for the epidemiological study of K. pneumoniae. However, the procedure of serotyping is laborious and time-consuming. Recently, methods for the molecular typing of Klebsiella species have been developed (Brisse et al., 2004; Fang et al., 2007; Pan et al., 2008) to supplant the existing serotyping method. In principle, molecular typing of K. pneumoniae is easier to perform than conventional serotyping. Nevertheless, successful typing requires a panel of probes that can easily detect all common strain types. Our study provides seven new cps gene clusters of K. pneumoniae strains that will facilitate the development of PCR-based identification methods.

    Acknowledgments

    We are grateful to Dr June Hsieh Wu for critical reading of the manuscript and helpful discussion about the capsule structures. This study was supported by a grant from the National Science Council of Taiwan (NSC97-3112-B-400-005). The sequencing services were provided by the Sequencing Core Facility of the National Research Program for Genomic Medicine supported by a grant from the National Science Council, Taiwan.

    References