(1–456)
5ezmA/5f15A (578 AA)
6s7tA (826 AA)
6s7oA (705 AA)
6eznF (718 AA)
3wajA (875 AA)
5oglA (713 AA)
6p25A/6p2rA (817 AA)
7bvfA (1102 AA)
6sniX/6snhX (562 AA)
The eight essentially full-length hits with the best E-values and sequence coverage > 90% are tabulated: 5ezm, crystal structure of ArnT from Cupriavidus metallidurans in the apo state [ 58 ] (5f15 is the same as 5ezm but with undecaprenyl phosphate as an analogue of a lipid-linked sugar substrate); 6s7t, cryo-EM structure of human oligosaccharyltransferase complex OST-B [ 65 ]; 6s7o, cryo-EM structure of human oligosaccharyltransferase complex OST-A [ 65 ]; 6ezn, cryo-EM structure of the yeast oligosaccharyltransferase (OST) complex [ 74 ]; 3waj, crystal structure of the Archaeoglobus fulgidus oligosaccharyltransferase (O29867_ARCFU) in complex with Zn and sulfate [ 75 ]; 5ogl, structure of the bacterial oligosaccharyltransferase PglB in complex with an acceptor peptide and a lipid-linked oligosaccharide analogue [ 60 ]; 6p25/6p2r, structure of the Saccharomyces cerevisiae protein O-mannosyltransferase Pmt1-Pmt2 complex bound to the sugar donor and a peptide acceptor/without peptide ligand [ 59 ]; 7bvf, cryo-EM structure of the Mycobacterium tuberculosis arabinosyltransferase EmbA-EmbB-AcpM2 complex with ethambutol [ 76 ]. We also added 6sni/6snh (cryo-EM structure of nanodisc-reconstituted yeast ALG6 in complex with 6AG9 Fab or with Dol25-P-Glc [ 77 ]) because of the much shorter template length. For each query and each PDB structure (listed by PDB ID), we provide the E-value and the sequence ranges hit in the query (Q) and in the template (T; the length of the template is given in parentheses below the PDB identifier). The uppercase letter following the PDB identifier denotes the relevant chain.
Proteins with known structure discovered in these searches belong to the group of well-studied membrane-embedded arabinosyl-, oligosaccharyl- or mannosyltransferases. Their annotated enzymatic domain is fully covered by the alignment. Given the full-length coverage of the N-TMTCs’ sequences queried against the PDB, there is no doubt that N-TMTCs and the annotated enzymatic domains of the sugar transferases detected share a common fold and have a similar 3D structure.
For all N-TMTCs, the sequence of the bacterial aminoarabinose transferase ArnT corresponding to structures 5ezm/5f15 [ 58 ] is the most similar homologue with an almost gapless alignment (with some exceptions for the N-terminal region of the loop between TM7 and TM8). The alignments of N-TMTCs generated by HHpred cover the first 11 of the 13 N-terminal TMs in 5ezm/5f15, nicely supporting the membrane topology considerations in the previous section (to note, TM region TM4 is missing and TM5/6 are annotated as a single large TM both in the PDB entry 5ezm and in the UniProt entry Q1LDT6). As a result of the structural similarity, we can conclude that there are five loops between TM regions that form the structure in the ER lumen (see Fig. 2): (i) two long loops EL1 (between TM1/TM2) and EL4 (between TM7/TM8; both loops contain helical segments) as well as (ii) three short loops EL2 (between TM3/TM4), EL3 (between TM5/TM6) and EL5 (between TM9/TM10). In 5ezm/5f15 (as in other sugar transferases of this type), there are two substrate binding cavities that communicate via a channel limited, on one side, by the TMs in the membrane and, on the other side, by the long loop connecting TM7 and TM8 (i.e., EL4 in the case of TMTCs). One binding region is formed by the segments homologous to EL1, EL2 and EL4 and accommodates the sugar acceptor substrate. The other site (built by EL1 and mainly by EL4) provides for interaction with a lipid-linked carbohydrate (LLC; the sugar donor, e.g., a dolichyl phosphate or pyrophosphate with attached sugar/oligosaccharide moiety). In the zone of contact of the two substrates, a divalent metal ion important for catalysis is coordinated by amino acid residues of the transferase. Despite the vast differences in sequences and possible ligands, homology considerations suggest that the TMTCs are constructed following the same general architecture.
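The assignment of the five lumenal loops follows from simple bookkeeping: every TM helix crosses the membrane once, so consecutive loops alternate between the two sides. The following is a minimal sketch of that bookkeeping, assuming a cytoplasmic N-terminus (an assumption implied by EL1, between TM1 and TM2, being lumenal); the function names are illustrative and not part of any tool used in this study.

```python
# Sketch: alternate loop sides along a multi-pass membrane protein.
# Assumption (not a computed result): the N-terminus is cytoplasmic.
OTHER_SIDE = {"cytoplasm": "lumen", "lumen": "cytoplasm"}


def loop_sides(n_tm, n_term_side="cytoplasm"):
    """Return [(loop_name, side)] for the loops between consecutive TM helices."""
    side = n_term_side
    loops = []
    for i in range(1, n_tm):
        side = OTHER_SIDE[side]  # crossing TM(i) flips the side
        loops.append((f"TM{i}/TM{i + 1}", side))
    return loops


def c_terminal_side(n_tm, n_term_side="cytoplasm"):
    """Side of the membrane reached after the last TM helix."""
    return n_term_side if n_tm % 2 == 0 else OTHER_SIDE[n_term_side]


lumenal = [name for name, side in loop_sides(11) if side == "lumen"]
print(lumenal)              # loops corresponding to EL1..EL5
print(c_terminal_side(11))  # side of the C-terminal segment
```

With 11 TM helices, this yields the five loops EL1 to EL5 named above on the lumenal side and places the segment following TM11 in the lumen as well.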
Most importantly, we see at the level of sequence comparison (even without any structural modelling) that some critical motifs strongly conserved among the TMTCs have a structural and/or functional equivalent (e.g., in ligand binding) in the 3D structures of the enzymes found. The strictly conserved DD motif in the loop between TM1 and TM2 (e.g., D52/D53 in N-TMTC1) aligns with the known active site in several sugar transferases (e.g., D55/E56 in 5ezm_A, D77/E78 in 6p25_A or D281/D282 in 7bvf_A). All the sugar transferases found in our HHpred homology search have at least an aspartate that coincides with the first aspartate in this motif. This residue is described as binding to the polar group of the sugar acceptor and/or a divalent metal ion (e.g., for 5ezm/5f15 [ 58 ], 5ogl [ 60 ], 6s7t/6s7o [ 65 ] or 6sni/6snh [ 77 ]). Thus, these positions are absolutely critical for enzymatic catalysis since any residue substitution leads to loss of function. For example, in 6p25/6p2r [ 59 ], E78 forms a salt bridge with R138 so that D77 points towards the cavity where it binds to the sugar acceptor substrate. Any replacement of D77/E78 abolishes enzyme function [ 59 , 78 ].
In 5ezm/5f15, D158 (in EL2, N-terminal to TM4) interacts with the acceptor substrate and also forms a salt bridge with K203 (in EL3, C-terminal to TM5). The homologous residues are conserved in TMTCs (e.g., D169 and K219 in N-TMTC1) and, thus, are predicted to also play a role in ligand binding.
An arginine in the loop EL5 between TM9 and TM10, close to the N-terminus of TM10 and strictly conserved among TMTCs (e.g., R404 in TMTC1 as part of the conserved sequence AERV), followed by a hydrophobic stretch of residues (from TM10), is also seen in sugar transferase structures (R459 in 6s7t [ 65 ], R405 in 6s7o [ 65 ], R404 in 6ezn [ 74 ], R426 in 3waj [ 75 , 79 ], and R375 in 5ogl [ 60 ]). In all these known structures, this arginine is described as an interaction partner of the LLC’s phosphate group whereas the lipid part of the LLC is accommodated within a hydrophobic groove formed mainly by TM6 and TM7.
The sequence SHKSYRP (with H89/K90 in TMTC1) in EL1 is well conserved among TMTCs (close to the N-terminal end of the second helix in EL1). At the same time, K85 in the 5ezm/5f15 sequence at a homologous position is known to interact with the LLC’s phosphate. Thus, it is reasonable to assume that one of the positively charged residues in TMTCs (e.g., H89 or K90 in TMTC1) has a similar role. This suggestion is supported by the known mutant phenotype in human TMTC3 (the mutation His67Asp introduces a charge swap and leads to cobblestone lissencephaly [ 19 ]; H67 is the position in TMTC3 homologous to H89 in TMTC1).
The limits of a purely sequence-analytic approach can be illustrated with the case of the DW motif conserved among all TMTCs in EL4 (e.g., D330/W331 in N-TMTC1) at the C-terminal end of the helix parallel to the ER membrane. It is problematic to identify the function of an equivalent motif in homologous 3D structures, even in those with a hit to DUF1736. For example, the apparently homologous sequence positions R270/Y271 in 5ezm/5f15 are at the edge of a structurally unresolved loop region. In 6s7t, residues E405/H406 seem the closest to positions homologous to the TMTCs’ DW motif. E405 is directed towards R214 (a residue in the loop homologous to EL2) [ 65 ]. Thus, the function of the conserved DW motif in TMTCs (as well as of several others) cannot be unambiguously derived from such comparisons. Interestingly, a DW motif has been described as critical for subunit interaction in pyruvate dehydrogenase kinase 2 [ 80 ].
Thus, this sequence-analytic comparison of TMTCs with known homologous 3D structures shows that a number of conserved sequence motifs can be understood in the context of ligand binding. TMTCs appear to incorporate divalent metal ions for catalysis and LLCs as donors for a sugar moiety. Given the experimental finding of TMTCs being part of a new O-mannosylation pathway [ 26 ], the LLC applicable here is dolichyl-phospho-mannose (DPM), the universal donor of mannosyl-residues in higher eukaryotes.
When HHpred is applied with N-TMTCs as input against the Pfam library of sequence domain family models, a large variety of annotated entries, besides many domains of unknown function, are hit with clearly statistically significant E-values (E-value < 1e-5; see Table 2 and Additional file 3).
HHpred search with the N-terminal parts of the four human TMTCs against Pfam-A_v33.1
Pfam domain | TMTC1 (1–456) | TMTC2 (1–475) | TMTC3 (1–426) | TMTC4 (1–462) |
---|---|---|---|---|
Glyco_transf_22 (PF03901, 388 AA) | 2.1E-20 | 1E-18 | 1.5E-20 | 6.4E-19 |
| Q: 29–456 | Q: 3–475 | Q: 9–426 | Q: 22–462 |
| T: 1–352 | T: 1–351 | T: 2–350 | T: 1–350 |
STT3 (PF02516, 458 AA) | 5.5E-19 | 2.1E-17 | 1.8E-19 | 9.5E-18 |
| Q: 26–456 | Q: 1–473 | Q: 5–423 | Q: 19–459 |
| T: 3–406 | T: 4–400 | T: 3–401 | T: 3–401 |
PTPS_related (PF10131, 616 AA) | 1.4E-15 | 9.7E-14 | 2.5E-16 | 4.3E-15 |
| Q: 89–456 | Q: 62–475 | Q: 67–425 | Q: 81–462 |
| T: 1–308 | T: 1–308 | T: 1–307 | T: 1–308 |
PMT (PF02366, 247 AA) | 2.3E-14 | 1.3E-13 | 1.5E-14 | 2.1E-13 |
| Q: 30–285 | Q: 3–248 | Q: 9–259 | Q: 23–293 |
| T: 2–242 | T: 1–242 | T: 2–242 | T: 2–242 |
Mannosyl_trans2 (PIG-V) (PF04188, 432 AA) | 6E-14 | 1.9E-12 | 3.8E-14 | 1.4E-12 |
| Q: 51–451 | Q: 25–470 | Q: 30–426 | Q: 44–462 |
| T: 60–425 | T: 60–425 | T: 60–429 | T: 60–427 |
Dpy19 (PF10034, 651 AA) | 8.4E-13 | 1.8E-12 | 4.4E-13 | 3E-12 |
| Q: 46–455 | Q: 20–474 | Q: 27–424 | Q: 39–460 |
| T: 30–502 | T: 30–503 | T: 32–499 | T: 30–499 |
AftA_N (PF12250, 432 AA) | 3.6E-12 | 3.7E-11 | 3.1E-13 | 1.6E-11 |
| Q: 27–446 | Q: 3–465 | Q: 7–399 | Q: 20–435 |
| T: 76–430 | T: 78–431 | T: 76–402 | T: 75–402 |
PMT_2 (PF13231, 159 AA) | 3.7E-13 | 1.1E-11 | 6.3E-13 | 1.6E-12 |
| Q: 91–276 | Q: 64–234 | Q: 69–250 | Q: 83–284 |
| T: 1–156 | T: 1–156 | T: 1–156 | T: 1–159 |
Arabinose_trans (PF04602, 471 AA) | 6.3E-11 | 5.9E-09 | 1.1E-10 | 1.6E-10 |
| Q: 34–456 | Q: 8–468 | Q: 13–426 | Q: 27–462 |
| T: 51–428 | T: 51–423 | T: 51–430 | T: 51–427 |
PIG-U (PF06728, 363 AA) | 9.8E-11 | 7.8E-09 | 2.1E-10 | 5E-09 |
| Q: 47–456 | Q: 6–475 | Q: 14–423 | Q: 45–462 |
| T: 30–349 | T: 1–350 | T: 1–345 | T: 35–349 |
Mannosyl_trans4 (PF15971, 163 AA) | 9.4E-11 | 1E-09 | 6.2E-11 | 3.9E-10 |
| Q: 81–276 | Q: 59–234 | Q: 59–250 | Q: 78–285 |
| T: 1–162 | T: 6–161 | T: 1–162 | T: 6–162 |
Glucos_trans_II (PF14264, 312 AA) | 6.4E-07 | 2.8E-06 | 7.6E-08 | 5.6E-07 |
| Q: 45–413 | Q: 19–431 | Q: 24–385 | Q: 38–421 |
| T: 5–310 | T: 5–310 | T: 5–310 | T: 5–310 |
GT87 (PF09594, 251 AA) | 3.5E-07 | 6.4E-06 | 1.1E-06 | 1.5E-06 |
| Q: 91–389 | Q: 64–406 | Q: 68–360 | Q: 82–396 |
| T: 2–251 | T: 2–249 | T: 1–248 | T: 1–248 |
The functionally annotated hits with the best E-values are listed: PF03901, Alg9-like mannosyltransferase family; PF02516, Oligosaccharyl transferase STT3 subunit; PF10131, 6-pyruvoyl-tetrahydropterin synthase related domain, function unknown; PF02366, Dolichyl-phosphate-mannose-protein mannosyltransferase; PF04188, Mannosyltransferase (PIG-V); PF10034, Q-cell neuroblast polarisation, function unknown; PF12250, Arabinofuranosyltransferase N terminal domain; PF13231, Dolichyl-phosphate-mannose-protein mannosyltransferase; PF04602, Mycobacterial cell wall arabinan synthesis protein; PF06728, GPI transamidase subunit PIG-U; PF15971, Dolichyl-phosphate-mannose mannosyltransferase; PF14264, Glucosyl transferase Gtr II; PF09594, Glycosyltransferase family 87. For each query and each Pfam entry (listed with Pfam entry name and ID), we provide the E-value and the sequence ranges hit in the query (Q) and in the template (T; the length of the template is given in parentheses after the Pfam model name).
Most of the domains found belong to the GT-C clan (CL0111) of glycosyltransferases (out of 19 known GT-C members, nine were detected: Glyco_transf_22, STT3, PTPS_related, PMT, Mannosyl_trans2, PMT_2, Arabinose_trans, PIG-U, GT87). Most informative are the sequence homologies with Glyco_transf_22 (PF03901) and STT3 (PF02516) because their E-values are the lowest (≤ 2.1E-17 for all four queries, see Table 2) and the alignments of the Pfam domains with the N-TMTCs cover both query and template almost completely (according to the ranges in Table 2, about 94% or more of the query and about 87–91% of the template). Certain super-conserved residues in the sequence family alignments of both Pfam families are also conserved among the TMTCs. These include the active site DD motif in EL1 (e.g., D52/D53 in N-TMTC1) and the arginine in front of TM10 (e.g., R404 in TMTC1) that are characteristic of both Pfam domains.
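The coverage of query and template can be recomputed directly from the aligned ranges reported in Table 2. The sketch below does this for the Glyco_transf_22 and STT3 rows only; the numbers are copied from the table, whereas the data layout and function names are illustrative assumptions and not part of the HHpred output format.

```python
# Lengths of the N-terminal query segments (column headers of Table 2).
QUERY_LEN = {"TMTC1": 456, "TMTC2": 475, "TMTC3": 426, "TMTC4": 462}

# Pfam model -> (model length, {query: ((q_start, q_end), (t_start, t_end))}),
# copied from the Glyco_transf_22 and STT3 rows of Table 2.
HITS = {
    "Glyco_transf_22": (388, {
        "TMTC1": ((29, 456), (1, 352)),
        "TMTC2": ((3, 475), (1, 351)),
        "TMTC3": ((9, 426), (2, 350)),
        "TMTC4": ((22, 462), (1, 350)),
    }),
    "STT3": (458, {
        "TMTC1": ((26, 456), (3, 406)),
        "TMTC2": ((1, 473), (4, 400)),
        "TMTC3": ((5, 423), (3, 401)),
        "TMTC4": ((19, 459), (3, 401)),
    }),
}


def coverage(start, end, length):
    """Fraction of a sequence covered by the inclusive range start..end."""
    return (end - start + 1) / length


def coverage_table(hits, query_len):
    """Return rows (model, query, query coverage, template coverage)."""
    rows = []
    for model, (model_len, per_query) in hits.items():
        for query, ((qs, qe), (ts, te)) in per_query.items():
            rows.append((model, query,
                         coverage(qs, qe, query_len[query]),
                         coverage(ts, te, model_len)))
    return rows


ROWS = coverage_table(HITS, QUERY_LEN)
for model, query, q_cov, t_cov in ROWS:
    print(f"{model:16s} {query}  query {q_cov:.1%}  template {t_cov:.1%}")
```

Coverage is defined here as the aligned span divided by the full length, so internal gaps within the span are not subtracted; a gap-aware definition would give somewhat lower values.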
The homology with other groups not directly linked to the GT-C clan, namely dolichyl-phosphate-mannose-dependent mannosyltransferases (Mannosyl_trans4, PF15971), glucosyl transferases GtrII (Glucos_trans_II, PF14264) and the arabinofuranosyltransferase N-terminal domain (AftA_N, PF12250), fits the same general functional prediction for TMTCs as sugar transferases with a similar 3D structure.
The HHpred search results are confirmed by iterative PSI-BLAST [ 32 ] runs with standard parametrization and human TMTC sequences as input. They deliver plentiful hits within the GT-C clan and beyond (results not shown). The diversity of significant homology hits constitutes a problem for function assignment of TMTCs beyond the general prediction as GT-C/PMT-like sugar transferases. It needs to be emphasized that the GT-C clan is a very diverse sequence superfamily comprising membrane-bound sugar transferases with a large variety of different specific activities and substrate types (including the transfer of arabinose, mannose, glucose or oligosaccharides among others).
We also find other proteins, including even enzymatically completely inactive ones such as PIG-U (see reference [ 81 ] for a discussion of PIG-U’s function). Interestingly, the profile built on the basis of our grand alignment of TMTCs is linked by HHpred to the domain BindGPILA [ 81 ] with E-value ~ 0.03 (calculated against the background of all Pfam models). To note, this domain model is derived from homologous sequence segments with 10 TMs and intermittent loops extracted from proteins in the glycosylphosphatidylinositol (GPI) lipid anchor pathway: PIG-B, PIG-M, PIG-U, PIG-V, PIG-W and PIG-Z [ 81 ]. PIG-W is an acyltransferase for the GPI lipid anchor, PIG-U is not an enzyme at all, but the remaining four (PIG-B, PIG-M, PIG-V and PIG-Z) are mannosyltransferases. All of them are united by the ability to bind phospho-lipid linked sugar/carbohydrate moieties.
Thus, the mere homology of TMTCs to the GT-C group of sequences by itself is only informative with regard to fold coincidence, structural similarity and a general level of functional classification. Yet, the conservation of residues known to be important for catalysis and substrate binding, as detailed in the sequence analysis above, indicates that TMTCs are actually enzymatically active. As we see in the 3D structure modelling exercise below, many additional conserved sequence motifs can be rationalized in terms of interactions with ligands and substrate molecules.
We attempted to create 3D structural models of all four TMTCs together with a divalent metal ion and DPM in order to explore whether the sequence motifs that are conserved between TMTCs and sugar transferases of known 3D structure come together spatially for interaction with the ligands.
HHpred scored the aminoarabinose transferase structures ArnTCm (PDB IDs: 5ezm and 5f15, chain A [ 58 ]) as by far the best hit for all human TMTCs (see Table 1) and also for TMTCs from five other organisms including Bos taurus, Gallus gallus, Danio rerio, Xenopus laevis and Drosophila melanogaster (results not shown). Therefore, this X-ray crystal structure was used as a template to build 3D models of TMTC1 (XP_016875493.1), TMTC2 (Q8N394), TMTC3 (Q6ZXV5) and TMTC4 (Q5T4D3) using the functions automodel and loop refine in Modeller (version 9.4) [ 35 ]. The overall structure of 5ezm (apo ArnTCm, resolution 2.70 Å) / 5f15 (UndP-bound ArnTCm, resolution 3.20 Å) [ 58 ] consists of (i) an N-terminal membrane-embedded region and (ii) a periplasmic domain (PD). For this work, only the first segment is of interest. It involves 13 TM helices and interconnecting loops including three juxtamembrane helices (JM1, JM2 and JM3). JM1 and JM2 form the first periplasmic loop between TM1 and TM2 while JM3 leads into a partially disordered flexible periplasmic loop (PL4, homologous to EL4 in TMTCs) between TM7 and TM8.
In this study, only the membrane-embedded domain of TMTCs including the juxtamembrane helices was modelled using the most N-terminal regions of the templates 5ezm and 5f15 (the 11 TM segments together with JM1 and JM2 following 5ezm while JM3 was modelled after 5f15). The major hurdles in generating the 3D structure of TMTCs by homology modelling are (i) the low percent identity (< 15%) with the sequences of the template crystal structures (Table 3) and (ii) several overly long loops between TM regions without equivalent in the structure templates. As we want to understand structural detail at the lumenal side, cytoplasmic loops are not that critical but the lumenal ones are. The loop sequence segments include (i) the cytoplasmic loop between TM2-TM3 (residues 136–146) in TMTC4, (ii) the cytoplasmic loop between TM6-TM7 in all TMTCs and (iii) the lumenal loop TM9-TM10 in all TMTCs. Furthermore, the template 5ezm/5f15 does not account for a loop extension at the N-terminal side of the domain of unknown function, DUF1736 (PF08409), between TM7-TM8 for all TMTCs. Moreover, we note that TMTC2 has another unusually long cytoplasmic loop between TM8-TM9 (residues 337–392) and, therefore, in the absence of any template, residues 337–392 were not modelled. We describe the alignment with the 5ezm/5f15 template, the regions modelled for each TMTC protein and the issues with the overly long loops in Table 3 and in the annotated alignment in Additional File 4 – Supplementary Figure 1.
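The percent identity quoted for a target-template pair depends on how it is counted. A minimal sketch of one common convention (identical residues divided by the number of alignment columns in which both sequences have a residue) is given below; this convention is an assumption for illustration and not necessarily the one used for the values in this study, and the two aligned strings are hypothetical toy sequences.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (not taken from TMTCs or ArnT).
target = "MKT-LLDDAV"
template = "MRTALL-DGV"
print(percent_identity(target, template))
```

Dividing instead by the full alignment length or by the length of the shorter sequence gives lower values, which matters when identities are as low as those discussed here.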
Modelling the 3D structures of TMTCs
| TMTC1 | TMTC2 | TMTC3 | TMTC4 |
---|---|---|---|---|
Sequence identity with 5ezm/5f15 | 9.4% | 10.6% | 9.5% | 11.3% |
Modelled region | 23–456 | 1–336 & 393–474 | 4–428 | 17–464 |
Longer loop TM6-TM7 | 240–257 | 207–220 | 209–231 | 242–262 |
Longer loop TM9-TM10 | 393–406 | 411–419 | *365–373 | 401–409 |
DUF1736 (JM3) | 284–358 (321–335) | 247–321 (284–298) | 258–331 (294–308) | 292–366 (329–343) |
The table provides the sequence identities of the template 5ezm/5f15 with the TMTCs, the ranges of the modelled regions, the loops between TM6-TM7 and TM9-TM10 that are longer than in the templates, and the location of DUF1736 along with JM3 (*residues 365–369 continue to be helical with TM9). TMTC2 has another unusually long cytoplasmic loop between TM8-TM9 (residues 337–392), which is not modelled in the absence of any template.
As we expect that certain long loops, especially those that have no equivalent in the 5ezm/5f15 structure, will not be reconstructed well, the DOPE model scoring system provided by Modeller might not be a good choice for selecting among the various model instances. Instead, we validated our model instances based on TM-align scores [ 82 ]. A TM-score between 0 and 0.3 suggests random structural similarity while a TM-score greater than 0.5 and less than 1.0 suggests that two structures have the same fold. The TM-align scores for TMTC1, TMTC2, TMTC3 and TMTC4 (when compared with 5ezm) are 0.93441, 0.72261, 0.91499 and 0.92104, respectively.
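The interpretation rule for TM-scores stated above can be written down explicitly. The sketch below applies it to the four reported scores; the label "intermediate" for scores between 0.3 and 0.5 is our own placeholder, since the text makes no statement about that range.

```python
# TM-align scores of the four models against 5ezm, as reported in the text.
TM_SCORES = {"TMTC1": 0.93441, "TMTC2": 0.72261, "TMTC3": 0.91499, "TMTC4": 0.92104}


def interpret_tm_score(score):
    """Map a TM-score to the qualitative classes used in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("TM-score must lie in [0, 1]")
    if score <= 0.3:
        return "random similarity"
    if score > 0.5:
        return "same fold"
    return "intermediate"  # placeholder label, not defined in the text


for name, score in TM_SCORES.items():
    print(name, score, interpret_tm_score(score))
```

All four models fall into the same-fold class; the lower score of TMTC2 is in line with the unmodelled loop 337–392 in that model.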
The resulting 3D structure models (see Fig. 3) were used to place a divalent metal ion (following 5ezm for initial positioning) and a DPM moiety (using the crystal-bound ligand UndP in 5f15 as reference for initial posing). We applied the Zn2+ parametrization for the ion in this study although experiment provides no clarity about the exact nature of the divalent metal ion. The crystallographic evidence speaks for zinc in 5ezm [ 58 ]; yet, Mn2+ is the likely ion in the case of 5ogl [ 60 ], and several other reports such as the one for 6s7t [ 65 ] remain silent about the nature of the ion other than emphasizing an electron density consistent with a divalent metal ion. To emphasize, we do not think that the exact parametrization of the ion (beyond carrying two positive charges) is critical for the outcome of this modelling study.
Structure models of TMTC1/2/3/4 with ligands. The cartoon representations of the TMTC1/2/3/4 models (from top to bottom) with docked DPM are shown in side view (left column) and top view (middle column). Close-up (right column) of the binding pocket of TMTCs with docked DPM (cyan sticks) and with important residues (HKSY residues of the conserved SHKSYRP motif M2 in EL1; K and E from motif M4 in EL3) shown as yellow sticks; the divalent metal ion (modelled as zinc) is shown in gray.
3D structure modelling operations including ligands were implemented with the Schrödinger suite [ 36 ]. An induced fit procedure following established protocols [ 36 – 42 ] was applied. In brief, the Schrödinger programs “Protein Preparation Wizard” and “LigPrep” were utilized for preparing the TMTC models and the DPM. With “Glide-SP” and “Prime”, multiple poses of DPM were generated and optimized in multi-step energy minimizations (with the OPLS parameter set and a surface Generalized Born implicit solvent model) that included some stages with softened potentials and side chains mutated to alanine. The procedure was completed with a minimization that allowed all residues within 5 Å of DPM (including their backbone and side chains) and the ligand DPM itself to relax. The complexes were ranked by Prime energy (molecular mechanics energy plus solvation) and those within 30 kcal/mol of the minimum energy structure were passed on to a final round of Glide docking and scoring with GlideScore. The final structures for each of the TMTCs together with the ligands are provided with their atomic coordinates (Additional File 5).
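The energy-window selection step described above (keeping complexes within 30 kcal/mol of the minimum energy structure) can be illustrated independently of the docking software. The sketch below is plain Python and does not call any Schrödinger API; the pose names and energies are hypothetical, and the inclusive treatment of the 30 kcal/mol boundary is an assumption.

```python
def within_energy_window(energies, window=30.0):
    """Return pose names whose energy lies within `window` kcal/mol of the minimum,
    ordered from lowest to highest energy."""
    e_min = min(energies.values())
    kept = [name for name, e in energies.items() if e - e_min <= window]
    return sorted(kept, key=energies.get)


# Hypothetical energies in kcal/mol (illustrative values only).
pose_energies = {
    "pose_1": -12050.0,
    "pose_2": -12035.5,
    "pose_3": -12010.0,
    "pose_4": -12021.0,
}
selected = within_energy_window(pose_energies)
print(selected)
```

Only the selected poses would then enter the final docking and scoring round; pose_3 in this toy set lies 40 kcal/mol above the minimum and is discarded.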
As the most important outcome of the modelling effort, visual inspection of the four model structures shows consistently, for all TMTCs, that the seven conserved sequence motifs M1-M7 listed in Table 4 come together spatially at the lumenal side of the TMTCs and form part of the protein surface that is homologous to the two substrate/ligand binding sites in 5ezm/5f15. They group closely around the DPM moiety and the divalent ion, creating a dome region (see Fig. 4 for the case of TMTC1). We find that residues in motifs M4 and M5 coordinate the divalent metal ion. M2 and M3 are largely engaged in mannose interactions, and M6 tends to contact the dolichyl tail. Motifs M4, M5 and M7 are important for interaction with the phosphate in DPM. Thus, the observed sequence conservation can be rationalized in terms of evolutionarily conserved function.
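The motif definitions M1-M7 (as given in Table 4) translate directly into regular expressions, with "x" standing for any residue. The sketch below locates them in a sequence; the test sequence is a hypothetical construct that merely contains each motif once and is not a TMTC sequence, and the function name is illustrative.

```python
import re

# Motif patterns following the definitions in Table 4 ("x" = any residue).
MOTIFS = {
    "M1": "DD",
    "M2": "SHKSYRP",
    "M3": "R.D",
    "M4": "KE[TQ]..T",
    "M5": "DW",
    "M6": "P..P",
    "M7": "ER..Y",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif: [1-based start positions]}, allowing overlapping matches."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead reports every start position, even overlapping ones.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


# Hypothetical toy sequence containing each motif exactly once.
toy = "MAADDLGSHKSYRPLRADGKETLLTGDWAPLLPERVLY"
print(find_motifs(toy))
```

Short patterns such as DD, DW, RxD or PxxP occur frequently by chance in real sequences, so a match alone does not establish homology; the positions only become meaningful in the context of the alignment and the membrane topology.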
Several conserved sequence motifs in TMTCs are related to DPM binding and divalent metal ion coordination
Motif | Residue | TMTC1 | TMTC2 | TMTC3 | TMTC4 |
---|---|---|---|---|---|
M1 (red) DD in EL1 | D | 52 | 26 | 31 | 45 |
| D | 53 | 27 | 32 | 46 |
M2 (orange) SHKSYRP in EL1 mannose | S | 88 | 61 | 66 | 80 |
| H | 89 | 62 | 67 | 81 |
| K | 90 | 63 | 68 | 82 |
| S | 91 | 64 | 69 | 83 |
| Y | 92 | 65 | 70 | 84 |
| R | 93 | 66 | 71 | 85 |
| P | 94 | 67 | 72 | 86 |
M3 (yellow) RxD in EL2 | R | 167 | 139 | 143 | 172 |
| D | 169 | 141 | 145 | 174 |
M4 (green) KE(T/Q)xxT in EL3 | K | 219 | 186 | 188 | 221 |
| E | 220 | 187 | 189 | 222 |
| T/Q | 221(T) | 188(Q) | 190(Q) | 223(Q) |
| T | 224 | 191 | 193 | 226 |
M5 (blue) DW in EL4 | D | 330 | 293 | 303 | 338 |
| W | 331 | 294 | 304 | 339 |
M6 (violet) PxxP in TM9 | P | 386 | 404 | 358 | 394 |
| P | 389 | 407 | 361 | 397 |
M7 (pink) ERxxY in EL5 | E | 403 | 421 | 375 | 411 |
| R | 404 | 422 | 376 | 412 |
| Y | 407 | 425 | 379 | 415 |
Conserved residues present in the vicinity of the ligand dolichyl-phosphate-mannose (DPM) are part of seven motifs M1-M7 in the TMTC family protein sequences. For each motif, the actual sequence, the location (loop number or TM number), the loop coloring in Fig. 4 and the residue numbers in TMTC1/2/3/4, respectively, are listed. If at least one atom of the residue is within 5 Å, 6 Å or 7 Å of any atom of DPM, the respective residue is marked with the corresponding subscript “A”, “B” or “C”. In bold, we indicate residues in M4 and M5 observed to coordinate the divalent metal ion. We find motifs M2 and M3 largely involved in mannose interactions, M6 accommodates the dolichyl tail, and M4, M5 and M7 are important for interaction with the phosphate.
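The distance criterion behind the subscripts “A”, “B” and “C” can be sketched as follows: the label is determined by the smallest distance between any atom of the residue and any atom of DPM. The coordinates below are hypothetical, and treating the cutoffs as inclusive is an assumption.

```python
import math

# Distance shells in Å and their labels, as defined in the table legend.
SHELLS = ((5.0, "A"), (6.0, "B"), (7.0, "C"))


def min_distance(residue_atoms, ligand_atoms):
    """Smallest Euclidean distance between any residue atom and any ligand atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in ligand_atoms)


def contact_label(residue_atoms, ligand_atoms, shells=SHELLS):
    """Return 'A', 'B' or 'C' for the tightest shell reached, or None beyond 7 Å."""
    d = min_distance(residue_atoms, ligand_atoms)
    for cutoff, label in shells:
        if d <= cutoff:
            return label
    return None


# Hypothetical coordinates in Å (illustrative values only).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "res_near": [(4.0, 0.0, 0.0), (9.0, 0.0, 0.0)],  # closest atom 2.5 Å away
    "res_mid": [(7.0, 0.0, 0.0)],                    # 5.5 Å away
    "res_far": [(8.2, 0.0, 0.0)],                    # 6.7 Å away
    "res_out": [(10.0, 0.0, 0.0)],                   # 8.5 Å away
}
labels = {name: contact_label(atoms, ligand) for name, atoms in residues.items()}
print(labels)
```

In practice, the atom coordinates would be read from the model files in Additional File 5, with hydrogen atoms either included or excluded consistently.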
Sequence motifs M1-M7 come together spatially in the model structures of TMTCs. We illustrate the spatial localization of sequence motifs M1 (red), M2 (orange), M3 (yellow), M4 (green), M5 (blue), M6 (violet) and M7 (pink, all shown in ball mode) against the background of the structural cartoon of the whole protein. DPM is presented as blackish sticks; the divalent metal ion is represented as a reddish sphere. We show the case of TMTC1; the figures for the other TMTCs look very similar. To note, motif M2 in this figure is extended to the conserved region represented by SHKSYRPLCVTLTSFRLN in TMTC1 (88–103 in EL1).
Further, several close contacts between the DPM ligand, the metal ion and TMTC residues were observed (to note, we did not enforce any specific residue contacts during the induced fit docking procedure). Given some sequence diversity among TMTCs and also the large number of degrees of freedom in the modelling process, it is not surprising that not all contacts are found in all models. Yet, a common subset of those was detected in each of the TMTC1, TMTC2, TMTC3, and TMTC4 model structures (see Table 4) and some contacts repeat patterns seen in homologous crystal structures:
The structural models of human TMTCs can only be considered preliminary in many details at this stage since
The average accuracy of C-alpha atom positioning in homology modelling above 30% sequence identity is estimated at about 2 Å [ 83 , 84 ]; hence, at the much lower sequence identities encountered here, the error is expected to be higher for certain regions in our model structures, especially in loop regions without equivalent in the template. On the other hand, the known crystal structures (with moderate crystallographic resolutions of around 3 Å) do not represent the complete protein complex either, for example with regard to the correctness of certain groups of amino acid chains, some inter-TM loops, and the substrates and ligands needed for catalysis.
Despite these restrictions, we see consistent features emerging from the modelling of the various TMTCs, namely the arrangement of TM regions in the membrane as well as of the loops and segments that form the binding site for the lipid-linked sugar and the divalent metal ion; essentially, the major part of the structure located in the ER lumen appears functionally plausible once the conserved sequence segments are brought together spatially by the 3D reconstruction.
Thus, it makes sense to analyze also contacts between the DPM moiety, the metal ion and TMTC residues seen only in a few of the TMTC models. In this way, we will get a more complete picture of the binding cavity and can enlarge the list of potentially relevant residues for interaction with the ligands:
Despite the wealth of sequence-analytic findings available for TMTCs, the systematic analysis of their sequences and of related biomolecular data for the purpose of assigning the biological function of TMTCs has never been performed before. Several roadblocks had to be overcome. First, there are issues with sequence accuracy as, for some TMTCs, several versions of protein sequences are available in databases, some of which lack sequence pieces essential for TMTC function as this study has revealed. Second, the complex nature [ 66 ] of the TM regions sprinkled with polar residues/prolines/glycines makes their accurate prediction in the TMTC sequences difficult. This seriously hampers function discovery since localizing certain loops at the correct side of the membrane might be impossible with errors in membrane topology. Third, just the fact of finding sequence similarity with a large number of sugar transferases is helpful to establish the homology relationship but provides little guidance for biological follow-up work aimed at zooming into the exact molecular and cellular functions of TMTCs, for example with regard to actual catalytic capacity, substrate specificity and ligands bound.
This work has made significant steps forward in understanding the 3D structure and biological function of the membrane-embedded domains covering the N-terminal halves of the TMTC1, TMTC2, TMTC3 and TMTC4 sequences. First, we determined the membrane topology using sequence-analytic, phylogenetic and available experimental data. The assumption of conserved membrane topology for evolutionarily conserved molecular function was key to interpreting TM prediction results for N-TMTCs in a unified manner. The finally determined membrane topology with 11 TMs complies with all known constraints. The C-terminal globular TPR domain is located in the ER lumen together with the conserved sequence motifs critical for function in the loops between TM regions. The homologous sequence segments in the known 3D structures 5ezm/5f15 corresponding to the lumenal loops in TMTCs have the same membrane topology. We can further conclude that TMTC sequences in the database that cannot fit this topology are most likely erroneous.
Whereas the complex nature of TM regions in TMTCs makes TM prediction difficult, it supports establishing gene homology via searches for significant sequence similarity [ 66 , 70 ]. The evidence certifying the homology of N-TMTCs with GT-C/PMT-class and other related sugar transferases is overwhelming; thus, TMTCs must have the same overall fold and a similar tertiary structure. Despite the huge evolutionary distance from bacterial to human representatives in this homology group, higher eukaryote TMTCs share strongly conserved sequence motifs with GT-C/PMT-class enzyme sequences. Even at the purely sequence-analytic level, we can explain a few of these conserved sites as required for catalysis or for ligand binding. Given the close relationship with ArnT from Cupriavidus metallidurans (the structure of which is known: 5ezm/5f15), we suggest that these ligands include a divalent metal ion and an LLC molecule. Since TMTCs are part of an O-mannosylation pathway, we conclude that this LLC is DPM.
3D-structural modelling of N-TMTCs further enhances the association of conserved sequence motifs with ligand binding. Seven conserved sequence motifs from various parts of the protein sequence (including those seen already at the level of just sequence comparison) come spatially together to form the surface of binding sites for the mannosyl residue, the phosphate group and the dolichyl tail of DPM as well as the divalent metal ion; thus, their evolutionary conservation can be rationalized as maintaining the ability to position these two ligands for catalysis. Notably, this spatial co-localization of peptide stretches corresponding to the conserved motifs is sufficiently macroscopic to be a reliable result not affected by the accuracy of the homology procedure applied here.
In addition, we derive, as a result of this homology-supported structural modelling, a further expanded list of residues taken from the set of conserved motifs that potentially interact with the divalent metal ion and the DPM ligand. This list comprises, as a subset, those critical residues previously found with combined phylogenetic arguments (sequence conservation among TMTCs and similarity with sequences of structurally and functionally characterized sugar transferases). Thus, we can relate certain residues strictly conserved among the TMTC sequences to functions in catalysis and ligand binding. This work also clarified the nature of the DUF1736 sequence segment in TMTCs: it is actually a loop between TM7 and TM8, and the accurate positioning of several of its functional residues is critical for catalysis and binding of ligands, especially the lipid-linked sugar moiety.
Notably, we had already established the homology of TMTCs with GT-C/PMT-class sugar transferases when we first analysed their sequences in 2012; yet, a substrate and biological context assignment as well as 3D structural modelling were not possible at that time. With HHpred [ 33 ], significant sequence similarity with DPM-dependent mannosyltransferases (PMTs, PF02366) was detected. With RPS-BLAST [ 85 , 86 ], we found the link to ArnT-like arabinose transferases (COG1807). Their respective 3D structures were not known at that time [ 58 ].
The density of hints derived from sequence analysis, phylogenetic comparisons, homology studies and structural modelling leaves no doubt that the TMTCs have enzymatic activity and perform sugar moiety transferase functions in their biological context. Thus, the O-mannosyltransferases sought in the recently discovered O-mannosylation pathway that selectively processes cadherin-like targets, a pathway identified via combinations of TMTC knock-outs and of which the TMTCs are members [ 26 ], are actually the TMTCs themselves.
Finding the real substrates of the various human TMTCs and rationalising the function of their glycosylation are important questions from the viewpoint of biological science. Additionally, this topic has a critical medical dimension, as several mutations of TMTCs are compatible with survival but severely disable the affected patients in various ways due to the pleiotropic nature of the TMTCs' molecular and cellular functions. Laudably, first steps in this direction have been taken. It can be concluded that the various cadherins/proto-cadherins found as substrates of the new O-mannosylation pathway are protein substrates for O-mannosylation by TMTCs [ 25 , 26 ].
BLAST/PSIBLAST [ 32 ] searches reveal that TMTC proteins are present in a wide range of animals but apparently not in fungi and plants (details not shown). Interestingly, essentially full-length homologous sequences (including the sugar transferase followed by TPR segments) are also found in many prokaryotes, typically not yet well characterized, in addition to hits in lower eukaryotes such as oomycetes and choanoflagellates. One example is protein AMJ42_05695 (from Deltaproteobacteria bacterium DG_8), which is found by a BLAST search with human TMTC3 (24% sequence identity, E-value = 3e-47, alignment of query positions 12–698 against target positions 46–774). Human curiosity will not be satisfied until the diversity of their organic chemistry, the related biomolecular mechanisms and the cellular phenotypes are understood.
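The alignment ranges quoted for the AMJ42_05695 hit can be converted into span lengths with simple arithmetic. The following is a minimal Python sketch and not part of the original analysis. The helper names `span_length` and `coverage` are hypothetical, and sequence lengths must be supplied by the reader because they are not stated in this section.

```python
def span_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive position range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def coverage(start: int, end: int, sequence_length: int) -> float:
    """Fraction of a sequence covered by the aligned range.

    sequence_length is an input assumed to be known to the caller;
    it is not taken from the text.
    """
    if end > sequence_length:
        raise ValueError("range extends beyond the sequence")
    return span_length(start, end) / sequence_length


# Ranges reported in the text for human TMTC3 (query) versus
# AMJ42_05695 (target).
query_span = span_length(12, 698)
target_span = span_length(46, 774)
print(query_span, target_span)  # 687 729
```

With a known sequence length, `coverage(start, end, length)` gives the fraction that can be compared against a coverage threshold such as the 90% criterion used for the tabulated PDB hits.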
This work has been supported by the Biomedical Research Council of the Agency for Science, Technology and Research (A*STAR). We thank Shazib Pervaiz (National University of Singapore) for drawing our attention to the sequence-based function prediction of human TMTC2 in 2012.
AA | Amino acid(s) |
DPM | Dolichyl-phosphate-mannose |
ER | Endoplasmic reticulum |
GPI | Glycosylphosphatidylinositol |
PDB | Protein Data Bank |
TM | Transmembrane (region) |
TMTC | Transmembrane and tetratricopeptide repeat-containing |
TPR | Tetratricopeptide repeat |
BE and FE initiated the project and designed the computational approaches. BE, SS, VS, QWT, and FE made the sequence-analytic computations and evaluated the results. VS and FLS initiated the structural modelling; the final models were created by SS and CKJ and evaluated by FE. BE and FE were the major contributors in writing the manuscript. All authors read and approved the final manuscript.
VS and QWT had several months of student internships in the sequence analysis group of BE and FE at the Bioinformatics Institute Singapore (QWT in 2016 and VS in 2019).
There was no dedicated funding for this project. The writing of this article benefitted from the shutdown of other activities during the COVID-19 lock-down. General financial support from A*STAR is gratefully acknowledged. QWT received an A*STAR Graduate Academy (AGA) scholarship (AUS) for her university studies. The internship of VS was supported by a SIPGA grant from AGA.
Ethics approval and consent to participate.
Not applicable.
Competing interests.
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Birgit Eisenhaber and Swati Sinha are joint first authors.
Birgit Eisenhaber, Email: gs.ude.rats-a.iib@etigrib .
Frank Eisenhaber, Email: gs.ude.rats-a.iib@eknarf .
Additional file 4: Supplementary Figure 1. Alignment of TMTC sequences with those of the template structures 5ezm and 5f15 used for homology modelling. The file AF4-2020-10-modeller-alignment-TMTCs.pdf shows the alignment of the four human TMTC sequences with template structures 5ezm and 5f15 that was actually used for generating their 3D model with the Modeller suite version 9.4.