UPDATE: 14 Sep. 2000
VOLUME 2, 1995
CALCIUM-BINDING PROTEINS 1
EF-hand proteins can be classified into 66 distinct subfamilies of proteins that contain from two to twelve EF-hand domains [6372c]. Table 1 summarizes the characteristics of each of the subfamilies as well as their abbreviations which are used in the text. Calmodulin (CAM) is probably found in all cells of all eukaryotes and has been extensively studied; it alone might well form the subject for an entire treatise. For others, such as CAM-related gene product (CRGP), little more than the sequence of its cDNA is known. Hence, the 66 subsections that deal with individual proteins vary in length and content. The next section describes the general characteristics of the EF-hand domain and how it differs from numerous analogs. We then survey the role of calcium in signal transduction within the cytosol. We conclude by considering some general characteristics of EF-hand proteins and how they might relate to cell signalling.
In the second major section we summarize individual characteristics of each of the 66 subfamilies, for some in extensive tabulations, for others in a sentence. In the third section we present sequence alignments and some of the generalizations that can be drawn from a thousand EF-hands of known sequence. Crystal structures are available for eight subfamilies, the subject of the fourth major section.
The mnemonic of Figure 1 is often used to identify EF-hands because it is easily applied and reflects several structural and functional characteristics. It misses very few true EF-hands but can give false positives. Usually, a candidate sequence is compared against our database of a thousand known EF-hand domains. Consistently we find that an EF-hand homolog scores over 10 identities within the 29-residue-length motif with representatives of several established EF-hand subfamilies. The canonical EF-hand consists of an alpha-helix E (residues 1-10), a loop around the calcium ion (residues 10-21), and a second alpha-helix F (residues 19-29). The stippled alpha-carbons, indicated by n - 2, 5, 6, 9, 17, 22, 25, 26, and (29) usually have hydrophobic side chains. They point inward and interact with the homologous residues of a second EF-hand domain, related to the first by an approximate twofold rotation axis, to form a hydrophobic core. Ile, Leu, or Val at position 17 attaches the loop to the hydrophobic core. '*' indicates variable residues, often hydrophilic; Gly at position 15 permits a sharp bend in the calcium-binding loop. Residues specifically indicated reflect a strong consensus but they are not invariant. The Ca2+ ion is coordinated by an oxygen atom or bridging water molecule (-X) of the side chains of residues 10 (X), 12 (Y), 14 (Z), and 18 (-X). The ligand at vertex -Y is the carbonyl oxygen of residue 16. Position 21 (-Z) is usually Glu and is the sixth residue to coordinate calcium. Calcium binds with both oxygen atoms, contributing the sixth and seventh oxygens of a pentagonal bipyramid whose axis is 10 (X) - 18 (-X).
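The positional rules and the identity count described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to screen candidates against the database: the sets of residues treated as "hydrophobic" and as "oxygen-bearing" are assumptions, the function names are hypothetical, and the 29-residue test sequence is synthetic, built by hand to satisfy the mnemonic rather than taken from any real protein.

```python
# Positions use the 1-29 numbering of the canonical EF-hand described in the text.
HYDROPHOBIC_POSITIONS = (2, 5, 6, 9, 17, 22, 25, 26)  # position 29 is optional in the text
LIGAND_POSITIONS = (10, 12, 14, 18)                   # vertices X, Y, Z, -X

# Assumption: the text does not define which residues count as hydrophobic.
HYDROPHOBIC = set("AVLIMFWYC")
# Assumption: side chains that can offer an oxygen, directly or via bridging water.
OXYGEN_SIDE_CHAINS = set("DNEQST")


def mnemonic_report(domain):
    """Summarize how well a 29-residue candidate fits the positional rules."""
    if len(domain) != 29:
        raise ValueError("expected a 29-residue candidate")

    def res(pos):
        # Convert the 1-based position used in the text to a 0-based index.
        return domain[pos - 1]

    return {
        "hydrophobic_hits": sum(res(p) in HYDROPHOBIC for p in HYDROPHOBIC_POSITIONS),
        "ligand_hits": sum(res(p) in OXYGEN_SIDE_CHAINS for p in LIGAND_POSITIONS),
        "gly15": res(15) == "G",
        "ile_leu_val17": res(17) in "ILV",
        "glu21": res(21) == "E",
    }


def identities(a, b):
    """Count identical residues at matching positions of two aligned domains."""
    return sum(x == y for x, y in zip(a, b))


# Synthetic sequence written to obey the mnemonic; not a real EF-hand.
SYNTHETIC = "ELKEAFKEFDKDGDGKISAKELKELMKKL"
# Same sequence with Gly15 replaced by Ala.
MUTANT = SYNTHETIC[:14] + "A" + SYNTHETIC[15:]

report = mnemonic_report(SYNTHETIC)
mutant_report = mnemonic_report(MUTANT)
```

A candidate would then be compared against known domains with identities(); the text's criterion is a score of over 10 identities with representatives of several established subfamilies.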
In the S100 subfamily there are two residues inserted into the first loop of those domains 1 that are inferred to bind calcium. The Glu at -Z coordinates with both oxygen atoms of its carboxylate group. Four carbonyl oxygens -- X, Y, Z, and -Y -- coordinate calcium. The seventh ligand is the oxygen of water.
Loop 1 of the essential light chain of myosin (ELC) of Aequipecten, and probably also of Patinopecten and Todarodes, has two residues inserted, but not at positions homologous to those inserted in loop 1 of S100. Asp10 coordinates calcium or magnesium with both carboxylate oxygen (X) and carbonyl oxygen (Y); Asp12 (Trp is inserted between 11 and 12) coordinates with both carbonyl oxygen (Z) and carboxylate (-X); Asp14 (Arg is inserted between 13 and 14) uses its carboxylate (-Y); and a carbonyl oxygen comes from 16 (-Z). The $...$ in the molluscan ELCs is either Glu...Asp or Asp...Glu. This special molluscan ELC calcium-binding motif may be absent in other ELCs, because only molluscan ELCs have the inserted Trp and Arg.
Loop 1 of BM40 is similar to the canonical loop except that a Pro is inserted between 12 and 13. The His-Pro peptide bond is cis; as a result, the carbonyl oxygen of Pro bonds to calcium at Y instead of the usual side chain of residue 12. Pro residues are at positions 24 and 28.
The distribution of ligands in the second EF-hand of CBL, the fourth variant, is canonical (Asp, Thr, Asn, Ser, Glu); however, Ser247 does not coordinate Ca2+. Instead, the -X ligand is provided by Glu164 of the fourth helix of a preceding four-helix domain.
One-third of the domains probably do not bind calcium. However, given that the calcium coordinations of neither the S100 nor the ELC variant domains were anticipated, one can anticipate alternative calcium coordinations. In general the (inferred) non-calcium-binding domains show greater variation in sequence than do those that bind calcium.
Pairs of domains
EF-hands usually occur in pairs of adjacent domains from the same polypeptide chain. However, this cannot be the case for parvalbumin (PARV) with three domains nor for calpain (CALP) with five. The N-terminal domain of PARV, indicated as #2 in Table 1, covers the hydrophobic surface of the 3,4 pair. The fifth EF-hands of m- and μ-calpains (CALP), and probably those of the close homolog, sorcin (SORC), pair to form a dimer. The two domains of a pair are related by an approximate twofold rotation axis. Each pair is hemispherical, with the calcium-binding sites on the curved surface. The flat surface is more or less hydrophobic, depending on whether or not calcium is bound. In both aequorin (AEQ) and the Thr/Ser protein phosphatase of Drosophila (PPTS) there are three well-defined domains, which are listed in Table 1 as 1, 3, and 4. The residues (::-::) between each domain 1 and domain 3 do not appear to contain EF-hands. However, these residues might have diverged beyond recognition and pair with domain 1.
The relationship of one pair of domains, referred to as a lobe, to other parts of the molecule varies. In troponin C (TNC) and in CAM the 11 or 8 residue linker between domain pair 1,2 and pair 3,4 is an α-helix as seen in the crystal structure and deduced from solution studies by small-angle X-ray scattering [2738, 6835b, 6836] or by nuclear magnetic resonance [6839, 2586]. The second, or F2, α-helix of domain 2, the linker 2,3, and the first, or E3, α-helix of domain 3 form one continuous central helix seven or eight turns long. A similar dumbbell shape is inferred for monomeric ELC and for RLC. However, when bound to a target, CAM [6839, 6840], ELC, and RLC [6849d] show varying degrees of bending in the 2,3 linker, hence the terms "flexible tether" or "expansion joint". In contrast, in the crystal structure of SARC the 15-18 residue linker is bent, as is the seven residue linker of recoverin (VIS). For both SARC and VIS lobe 1,2 and lobe 3,4 fit tightly together; however, the regions of contact and relative orientations are quite different. These variations are discussed in greater detail in Section 4 on tertiary structures. One can anticipate many variations in the relationships between lobes, especially in calbindin (CLBN), which has six EF-hand domains, in the Lytechinus pictus SPEC-like protein (LPS), which has eight, and in the calcium-binding protein from nematodes (EF12), which has 12 EF-hands.
Classification of subfamilies
All proteins within a subfamily should be congruent. For example, all EF-hand domains 1 should more closely resemble other domain 1 regions within the subfamily than other domains of the same protein. Further, the dendrogram generated from each domain 1 should, within stochastic fluctuation, be the same as the tree for each domain 2. This implies that all members of the subfamily evolved from a precursor protein by speciation and/or gene duplication with no gene fusions or translocations (Figure 1). Members of a subfamily usually share similar, distinguishing physical and functional features. However, some of the assignments are tentative, for example, intestinal calcium binding protein (ICBP) vs. S100 or troponin C (TNC) vs. troponin from nonvertebrates (TPNV). The four subfamilies - CAM, TNC, ELC, and RLC - are strongly congruent; however, they are treated as different subfamilies because of their well-documented differences in function and structure. In contrast, the S100 subfamily contains proteins that have 16 different alphanumeric designations and that occur both as homo- and heterodimers. They are grouped into one subfamily because they all contain the unique first domain and because, unique among the EF-hand proteins of known chromosomal location, nearly all of their genes are clustered (see section on chromosome locations).
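The first congruence criterion (each domain resembles the same-numbered domain of other subfamily members more closely than it resembles the other domains of its own protein) can be written as a simple check. The sketch below is illustrative only. It assumes that a plain identity count over pre-aligned domains of equal length is an adequate stand-in for whatever distance measure underlies the dendrograms, which the text does not specify; the function names are hypothetical and the toy sequences are invented solely to exercise the logic.

```python
def identities(a, b):
    """Count identical residues at matching positions of two aligned domains."""
    return sum(x == y for x, y in zip(a, b))


def is_congruent(subfamily):
    """Return True if every domain passes the same-number-versus-same-protein test.

    `subfamily` maps a protein name to its ordered list of aligned domain
    sequences. Assumptions: at least two proteins, each with the same number
    of domains (two or more).
    """
    for name, domains in subfamily.items():
        for i, dom in enumerate(domains):
            # Best match to the same-numbered domain in any other member.
            best_same_number = max(
                identities(dom, other[i])
                for other_name, other in subfamily.items()
                if other_name != name
            )
            # Best match to a differently numbered domain of the same protein.
            best_within_protein = max(
                identities(dom, d) for j, d in enumerate(domains) if j != i
            )
            if best_same_number <= best_within_protein:
                return False
    return True


# Invented toy data: two "proteins" with two 8-residue "domains" each.
toy_congruent = {
    "protA": ["AAAAKAAA", "CCCCKCCC"],
    "protB": ["AAAAKAAT", "CCCCKCCT"],
}
# Same sequences, but protB's domains are swapped, as after a rearrangement.
toy_shuffled = {
    "protA": ["AAAAKAAA", "CCCCKCCC"],
    "protB": ["CCCCKCCT", "AAAAKAAT"],
}

congruent_result = is_congruent(toy_congruent)
shuffled_result = is_congruent(toy_shuffled)
```

The second criterion, that the dendrogram built from each domain 1 matches the tree built from each domain 2, requires tree construction and comparison and is not covered by this sketch.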
CALCIUM AS A CYTOSOLIC MESSENGER
Calcium is one of the most widely employed of an ever expanding list of cytosolic messengers whose pathways interact in fascinating skeins. To a first approximation, the concentration of free Ca2+ ion in the cytosol of an unstimulated eukaryotic cell is about 6 x 10^-8 M. Following stimulation [Ca2+] rises to 1.6 x 10^-6 M. An apo-EF-hand protein binds calcium and changes conformation. In its calcium form it activates a target enzyme or structural protein.
The story is more complex. The rise in calcium concentration following stimulation may oscillate with varying phase and amplitude in different parts of the cell [3026a]. Calcium enters the cytosol via either the plasma membrane or a membrane enclosing an organelle derived from the endoplasmic reticulum. The effective concentration of Ca2+ ion adjacent to the membrane surface may exceed 10^-5 M.
In the so-called quiescent cell some EF-hands bind magnesium. The concentration of the free Mg2+ ion in the cytosol is nearly constant, about 3 x 10^-3 M. For many calcium-binding proteins, and certainly most EF-hand proteins, the ratio of affinities for Ca2+ and for Mg2+ is about 10^4.2. Sites with high affinity for calcium, Kd = 6 x 10^-8 M, have an affinity for magnesium of Kd = 1 x 10^-3 M and are probably filled with magnesium in the resting cell. This leads to the counter-intuitive situation in which calcium entering a cell first binds to the weaker sites, sometimes referred to as calcium-specific, which are unoccupied. It is not known whether some sites, such as in S100-1 or molluscan ELC-1, are in the calcium or the magnesium form in either the resting or stimulated cell.
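The arithmetic behind this argument can be made explicit with a small calculation. The sketch below assumes a single independent site for which Ca2+ and Mg2+ compete at equilibrium; that model is an assumption introduced here for illustration. It ignores cooperativity between paired domains and says nothing about the kinetics of exchange. The concentrations and dissociation constants are the values quoted in the text; the function and variable names are hypothetical.

```python
import math

# Values quoted in the text (mol/L).
CA_REST = 6e-8    # free Ca2+ in the unstimulated cell
CA_STIM = 1.6e-6  # free Ca2+ following stimulation
MG_FREE = 3e-3    # free Mg2+, nearly constant
KD_CA = 6e-8      # high-affinity site, calcium
KD_MG = 1e-3      # same site, magnesium


def occupancy(ca, mg, kd_ca=KD_CA, kd_mg=KD_MG):
    """Fractional occupancy of one site with two competing ligands.

    Assumption: simple competitive equilibrium at a single independent site.
    """
    a = ca / kd_ca
    b = mg / kd_mg
    total = 1.0 + a + b
    return {"ca": a / total, "mg": b / total, "empty": 1.0 / total}


resting = occupancy(CA_REST, MG_FREE)
stimulated = occupancy(CA_STIM, MG_FREE)

# Ratio of affinities expressed as a power of ten.
selectivity_exponent = math.log10(KD_MG / KD_CA)

print(f"resting:    Ca {resting['ca']:.2f}, Mg {resting['mg']:.2f}")
print(f"stimulated: Ca {stimulated['ca']:.2f}, Mg {stimulated['mg']:.2f}")
print(f"log10(Kd_Mg / Kd_Ca) = {selectivity_exponent:.2f}")
```

Under these assumptions the site is about 60% in the magnesium form and 20% in the calcium form at rest, and about 87% in the calcium form after stimulation; the two dissociation constants correspond to a selectivity of about 10^4.2.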
Functions of EF-hand proteins
Most of the proteins that bind messenger calcium are members of the EF-hand homolog family. Other proteins, some of which are associated with membranes, as are annexins, that have lower affinities for calcium can also bind messenger calcium because of its higher effective concentration near the membrane. A general paradigm for the transduction of information contained in a pulse of Ca2+ ions to a change in conformation of a target enzyme or structural protein is provided by CAM. In the quiescent cell it is in the apo form as a monomer. Following the binding of calcium it changes conformation then binds to and activates its target(s). This schema has been demonstrated only for CAM and even CAM does not always honor it. Even so, it provides a valuable reference for considering other variations.
The EF-hand protein may be part of an oligomer in the resting cell. Four molecules of CAM are part of the hexadecamer, phosphorylase kinase b, α4 β4 γ4 CAM4. Full activation involves binding (probably) four equivalents of calcium to each of the four resident CAMs as well as the binding of four more equivalents of CAM to the hexadecamer. TNC, and perhaps TPNV, is the third component of troponin, which includes the tropomyosin-binding component, TNT, and the inhibitory component, TNI. One molecule each of ELC and of RLC binds to the head of the heavy chain of myosin to form the heterohexamer, (HC*ELC*RLC)2. TNC, ELC, and RLC remain parts of their respective complexes throughout the functional cycles. Several other homologs are components of heterocomplexes and do not dissociate upon release of calcium: for example, the small subunit of the protein phosphatase, calcineurin (CLNB); the homodimer, ACTN, that links the ends of thin filaments to Z lines or to membranes; and p10 (p11), a member of the S100 subfamily that, as a dimer, is part of the heterotetramer, calpactin (annexin II).
Guanylate cyclase activating protein (VIS) in the apo form activates guanylate cyclase but in the calcium form dissociates. A third variation is provided by those proteins that have catalytic domains spliced to two or four tandem EF-hands to form chimeric proteins. Protein kinase (CDPK), diacylglycerol kinase (DGK), the threonine and serine phosphoprotein phosphatase from Drosophila (PPTS), calpain (CALP), and P. falciparum protein kinase (PFPK) are calcium modulated via their EF-hands. There are no indications whether these attached EF-hands interact with the catalytic regions in a manner similar to the interaction of CAM with MLCK or of ELC and RLC with the heavy chain of myosin. The genes encoding the five different catalytic domains are fused to EF-hand genes.
Aequorin (AEQ) is a coelenterazine oxidase. From its amino acid sequence, we recognized only three EF-hands, but a fourth domain (second in sequence), which is beyond our recognition, is a homolog of the EF-hand. The structure of the second domain is very similar to that of the other three domains. AEQ has only a few residues N- and C-terminal to the four EF-hand domains. The C-terminal extension is important to stabilize the peroxide that reacts with coelenterazine to initiate the light-emission reaction.
Not all EF-hand proteins that are not enzymes interact with other molecules. The function of parvalbumin (PARV), which served as the Rosetta stone for the entire family, has yet to be determined; however, it is inferred to function in temporal buffering and/or transport of calcium.
Osteonectin (BM40) and QR1 are extracellular in an environment whose pCa is near constant ~ 2.9. One EF-hand of BM40 binds calcium, supposedly with high affinity, but not with any modulation. This appears to be an example of Nature's having taken a protein initially "designed" for one function and put it to use in extracellular stabilization. S100 and PARV are sometimes found extracellularly. Whether this reflects a normal function or pathology has yet to be determined. All other EF-hand proteins are found in the cytosol or associated with cytoplasmic membranes.
The functions of only 20 of the 66 EF-hand subfamilies are known. Other functional patterns may be found.
GENERAL CHARACTERISTICS OF EF-HAND PROTEINS
Sixteen of the subfamilies consist of two (diacylglycerol kinase (DGK), α-actinin (ACTN), α-spectrin (FDRN), glycerol phosphate dehydrogenase (GPD), osteonectin (BM40), trichohyalin and profilaggrin (HYFL), fimbrin (FIMB), ryanodine receptor (RYR), Cbl (CBL), and calcium- and integrin-binding protein (CIB)) or of four (calcium-dependent protein kinase (CMPK), phospholipase C (PLC), calcium-dependent protein kinase (CDPK), P. falciparum protein kinase (PFPK), protein phosphatase (PPTS), and LAV1 of Physarum (LAV)) or of five (calpain (CALP)) or of six (surface protein of Plasmodium (PFS)) EF-hands that are attached to the C-terminus (CMPK, CALP, ACTN, FDRN, GPD, BM40, CDPK, PFPK, PPTS, LAV, and CIB) of, or to the N-terminus (HYFL and FIMB) of, or between (DGK, PLC, RYR, CBL, and PFS) larger, non-EF-hand domain(s). The catalytic regions of the chimeric enzymes - CMPK, GPD, PLC, DGK, CDPK, CALP, PPTS, and PFPK - are each homologs of non-EF-hand proteins. The remaining 50 subfamilies consist solely of two to twelve EF-hands, with short pre-, inter-, and postdomain sequences.
Protein phosphatase (PPTS) of Drosophila has three canonical EF-hands (1, 3, and 4); domain 2 (indicated by '::-::' in Table 1) might be an EF-hand, but the sequence cannot be assigned as a homolog with confidence. Many of the 66 proteins are isolated, and are assumed to function, as monomers. However, some, such as specific S100s within the broader subfamily of S100, are dimers. Others, such as ELC, RLC, TNC, α-spectrin, and α-actinin, are components of oligomeric complexes. CAM interacts with over a score of target proteins when in its calcium form and is usually free in the cytosol in its apo form.
About one-third of the known EF-hand domains do not bind calcium. Most of the EF-hands that do bind calcium also bind magnesium, albeit with about 10^4.2-fold lower affinity. Hence many EF-hand domains that bind Ca2+ ions with Kd = 6 x 10^-8 M bind Mg2+ with Kd = 1 x 10^-3 M and will bind magnesium in the resting cell. The binding of calcium to an EF-hand protein induces changes in conformation that are detected by various spectroscopic techniques and have been inferred from the crystal structure of TNC, in which domains 1 and 2 are apo and domains 3 and 4 are in the calcium form (see the later section on structure). Binding magnesium also induces changes, some similar to and some different from those induced by calcium, depending upon the subfamily. Most of the subfamilies bind calcium functioning as a cytosolic messenger; however, the functions of 46 of the 66 subfamilies remain unknown. When involved in an information transduction pathway, the binding of calcium imparts not only a change in conformation but also an increased stability, as measured by proteolysis and thermal denaturation. Many of the early preparative procedures involved boiling the crude extracts to denature and precipitate other proteins.
Genetic and evolutionary relationships
For humans, 53 loci belonging to 29 subfamilies have been mapped on the genome [http://www.ncbi.nlm.nih.gov/UniGene/Hs.Home.html]. Except for the clustering of S100 genes on chromosome 1q21 there is no discernible pattern in the distribution of loci.
Several isoforms of TNC, ELC, RLC, S100, and PARV are found in a single organism. Three genes of CAM have been reported in human, all of which encode the same protein sequence. Two isoforms of CAM have been reported in Arbacia punctulata and in Arabidopsis thaliana, and several vertebrates have pseudogenes for CAM. The two isoforms of RLC, L1 and L4, are generated by different initiation sites and different splicing of exons. The C-terminal 140 residues are identical.
The evolution of EF-hand proteins was recently reviewed and summarized:
(1) Multidomain proteins should be classified in terms of their constituent domains. Congruence must be established before the entire sequences of proteins can be compared.
(2) The relationships of the branches in the dendrograms of the EF-hand subfamilies are statistically significant.
(3) CAM, TNC, ELC, and RLC are congruent with one another. The five domains of CALP evolved from a single CALP precursor domain; the four domains of SARC evolved from an ur-SARC domain. The CAM-TNC-ELC-RLC/SARC-CALP dendrogram provides a reference tree for estimating the origins of the domains of the other 60 subfamilies. Several of the subfamilies evolved by complex patterns of gene duplications, translocations, and splicings. This is reflected both in the diversities of their domain origins and their chromosomal locations.
(5) Even within CAM-TNC-ELC-RLC and its congruent subfamilies and certainly for the other subfamilies one cannot recognize a pattern or sequence of subfamily origins. The several subfamilies seem to have radiated from one point. This may reflect turbulent speciation events or, more likely, multiple mutations whose order cannot be reconstructed.
(6) Functional and physical properties, such as calcium coordination, vary within subfamilies.
(7) Conversely, (members of) different subfamilies have often converged toward similar characters. We do not recognize a systematic variation of characteristics among subfamilies. Most of the subfamilies appear to have evolved from a few precursors with no systematic correlation of structure or of function with branching patterns of subfamilies (domains) relative to one another.
(8) Since many of the subfamilies are represented by multiple isoforms, one must exercise extreme caution in using EF-hand proteins as markers to follow the evolution of species.
(9) The distribution of introns in the encoding genes precludes the correspondence of exons with EF-hand domains; exon shuffling did not play a significant role in the evolution of this family (see Table 9).
(10) The dendrograms based on encoding DNAs are parallel within stochastic fluctuation to those based on protein sequence for TNC, ELC, RLC, CALP, PARV, and S100.
(11) The classifications of CAM by protein and by nucleic acid sequences present several intriguing and unresolved contradictions.