Update: 19 Sep. 2000
SEQUENCE INFORMATION
PROTEIN SEQUENCES
Ten Subfamilies (CTER) that Resemble Calmodulin
By definition, the members of one subfamily resemble one another more closely than they resemble any member of another subfamily. When comparing subfamilies, we initially use one representative sequence of each subfamily; conclusions are subsequently confirmed by repeating the key comparisons with other representatives of that subfamily, if available. With minor exceptions, EF-hand 1 of each of the ten subfamilies (CAM, TNC, ELC, RLC, TPNV, CLAT, SQUD, CDC, CAL, and CAST) resembles EF-hand 1 of the other nine more closely than it resembles any other EF-hand of its own or of another subfamily; the same holds for EF-hands 2, 3, and 4. These ten subfamilies are grouped together as "CTER" because, within statistical variation, they are congruent. CVP, EFH5, and PARV show fair congruence with CTER but are not included in the group. We tentatively suggest that all ten evolved from a single four-EF-hand precursor by gene duplication without subsequent gene fusion. That precursor is in turn inferred to have arisen from a single ur-domain by two cycles of gene duplication and fusion.
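The congruence criterion can be made concrete with a small sketch. It assumes that each subfamily is represented by one sequence already split into four pre-aligned EF-hand domains, and it uses plain fractional identity as the similarity score. The helper names and the toy sequences are hypothetical; the authors' actual statistical comparison is not reproduced here.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues in two pre-aligned, equal-length domains."""
    if len(a) != len(b):
        raise ValueError("domains must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def maps_in_order(sub_a: list[str], sub_b: list[str]) -> bool:
    """True if EF-hand i of sub_a is closest to EF-hand i of sub_b, for every i."""
    for i, domain in enumerate(sub_a):
        scores = [identity(domain, other) for other in sub_b]
        if scores.index(max(scores)) != i:  # the first best match wins ties
            return False
    return True


def congruent(sub_a: list[str], sub_b: list[str]) -> bool:
    """Hypothetical congruence test, checked in both directions."""
    return maps_in_order(sub_a, sub_b) and maps_in_order(sub_b, sub_a)


# Toy four-domain "subfamilies" (hypothetical, not real EF-hand sequences).
x = ["AAAA", "CCCC", "DDDD", "EEEE"]
y = ["AAAC", "CCCA", "DDDE", "EEED"]
print(congruent(x, y))                         # True
print(congruent(x, [y[1], y[0], y[2], y[3]]))  # False
```

Real EF-hand comparisons would use a substitution matrix and a significance estimate rather than raw identity; the sketch only shows the domain-by-domain bookkeeping that "congruent" implies.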
Although the ten subfamilies are congruent, they diverge greatly in calcium affinity.
CLAT and CAST are chimeric; the other eight are not. CAM, TNC, ELC, RLC, TPNV, and SQUD all function by activating one or more target enzymes or structural proteins. CAM, TNC, ELC, and RLC bind an α-helix of the target peptide in the cleft between the ODD and EVEN domains.
Seven Subfamilies (CPV) that Resemble Calcineurin B
With minor exceptions, EF-hand 1 of each of the seven subfamilies (CLNB, P22, VIS, CALS, DREM, CMPK, and SOS3) resembles EF-hand 1 of the other six more closely than it resembles any other EF-hand of its own or of another subfamily; the same holds for EF-hands 2, 3, and 4. These seven subfamilies are grouped together as "CPV" because, within statistical variation, they are congruent, although the congruence is weaker than for CTER.
Like the CTER proteins, the CPV proteins vary widely in calcium binding. CALS, DREM, and CMPK are heterochimeric; the others are not. CMPK is an enzyme. CLNB binds an α-helix of a target enzyme in the cleft between the ODD and EVEN domains. Despite these similarities, the EF-hands of the CTER subfamilies are no more closely related to those of the CPV subfamilies than they are to many EF-hands of other subfamilies. Even so, the postulated ODD/EVEN precursor common to all members of the CTER group may also be the common precursor of CPV.
Other General Patterns
Besides congruence within CTER and within CPV, two other patterns are frequently observed. In the "Pairings" category, each member of seven pairs or triplets of subfamilies most closely resembles the other member(s) of its pair or triplet over most of its domains. The placement of such subfamilies in separate subfamilies, e.g. S100 and ICBP, often depends on characters other than amino acid sequence alone (Table 1).
The "Self" category includes EF12, LPS, CLBN, EP15, TCBP, P26, PLC, and CBP. For each of these subfamilies, the domain(s) that most closely resemble a given EF-hand come from the same subfamily. In the most easily interpreted cases, EF-hands 1-6 of EF12 (and domains 1-4 of LPS) most closely resemble domains 7-12 of EF12 (5-8 of LPS). EF12 (LPS) is therefore inferred to have evolved from a six-domain (four-domain) precursor by a recent gene duplication and fusion. This simple pattern of gene duplication and fusion also holds for CLBN, EP15, TCBP, and P26. However, in PLC EF-hands 1 and 2 most closely resemble one another, and in CBP EF-hands 3 and 4 are the most similar.
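The "Self" pattern of EF12 and LPS can be sketched the same way. Assuming a protein of 2n pre-aligned domains and a simple identity score, a recent duplication and fusion of an n-domain precursor predicts that domain i finds its closest match in domain i + n, and vice versa. The function names and toy domains below are hypothetical; they illustrate the pattern, not the authors' procedure.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues in two pre-aligned, equal-length domains."""
    if len(a) != len(b):
        raise ValueError("domains must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_other(domains: list[str], i: int) -> int:
    """Index of the domain (other than i itself) most similar to domain i."""
    others = [j for j in range(len(domains)) if j != i]
    return max(others, key=lambda j: identity(domains[i], domains[j]))


def recent_tandem_duplication(domains: list[str]) -> bool:
    """True if domain i pairs with domain i + n/2 (mod n) for every i, as expected
    when a 2n-domain protein arose by duplication and fusion of an n-domain precursor."""
    n = len(domains)
    if n < 2 or n % 2:
        return False
    half = n // 2
    return all(closest_other(domains, i) == (i + half) % n for i in range(n))


# Toy eight-domain protein, LPS-like in layout only (hypothetical sequences).
lps_like = ["AAAA", "CCCC", "DDDD", "EEEE", "AAAC", "CCCA", "DDDE", "EEED"]
print(recent_tandem_duplication(lps_like))  # True
# PLC-like case: domains 1 and 2 (indices 0 and 1) are each other's closest match.
print(recent_tandem_duplication(["AAAA", "AAAC", "DDDD", "EEEE"]))  # False
```

The PLC and CBP cases fail this test by design: their most similar domains are adjacent rather than offset by half the protein's length.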
DISTRIBUTION OF INTRONS
The positions and phases of known EF-hand introns are summarized in Table 8.
Genomic DNA sequences are available for 86 different EF-hand sequences in 30 different subfamilies. We have classified these intron sites by phase (between codons, or after the first or the second base of a codon) and by position within or between EF-hands into 49 distinct "characters" over the entire lengths of the proteins. Each character can then be scored "+" or "-" for the presence or absence of an intron at that site. Of 48 different intron sites within 31 subfamilies, 13 occur between domains and 35 within domains. This indicates that exons neither correspond to nor define the structural and functional EF-hand domains. CAM of Saccharomyces has only one intron, while CAM of vertebrates has five. Xiang et al. [6516] proposed that the pattern of introns in SPEC from Strongylocentrotus can be recognized twice in the eight domains of LPS from Lytechinus.
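The scoring of intron sites as "+/-" characters can be sketched as follows. The sketch assumes that each intron position is given as the number of coding bases upstream of it, already mapped onto a common alignment, so that the phase is that offset modulo 3. The `IntronSite` type, the character list, and the gene offsets are hypothetical and are not the data of Table 8.

```python
from typing import NamedTuple


class IntronSite(NamedTuple):
    codon: int  # 0-based codon the intron interrupts (phase 1, 2) or precedes (phase 0)
    phase: int  # 0 = between codons, 1 = after base 1, 2 = after base 2


def intron_site(cds_offset: int) -> IntronSite:
    """Convert an intron's offset (coding bases upstream of it) into a site."""
    return IntronSite(codon=cds_offset // 3, phase=cds_offset % 3)


def score(characters: list[IntronSite], intron_offsets: list[int]) -> str:
    """Score each character '+' if the gene has an intron at that site, else '-'."""
    present = {intron_site(offset) for offset in intron_offsets}
    return "".join("+" if site in present else "-" for site in characters)


# Hypothetical characters and gene, for illustration only.
characters = [IntronSite(10, 0), IntronSite(25, 1), IntronSite(40, 2)]
print(score(characters, [30, 122]))  # +-+
```

Classifying each site as within or between EF-hands would additionally require the domain boundaries on the same alignment, which this sketch leaves out.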
A dendrogram was constructed from Table 8 using 29 intron positions (characters) in the genomic sequences of 45 CTER proteins, which show 20 different character states (number of proteins/number of distinct character states: CAM 15/9, TNC 4/1, ELC 15/2, RLC 7/4, CAL 3/3, and CDC 1/1).
The dendrogram based on introns for CAM, TNC, ELC, RLC, CAL, and CDC has limited statistical validity because there are only 29 "+/-" characters. Even so, several features stand out:
1. The intron character states of the skeletal and cardiac TNCs of both human and mouse are identical. Over the time scale of vertebrate evolution they have not changed.
2. Most organisms have multiple isoforms of most proteins. The intron characters of the two ELCs from Drosophila, of the two RLCs from Rattus, and of the two CAMs from Halocynthia are identical within each pair. Since the gene duplications that gave rise to these isoforms, the introns have not changed.
3. The sharp delineation of the ten CTER subfamilies based on amino acid sequences, as illustrated in Figure 5, is blurred in the analysis of introns. The TNCs are well resolved from the others. The RLCs of rat smooth muscle and Drosophila nonmuscle are very close, as are the RLCs of rat skeletal muscle and Drosophila muscle; this is consistent with the dendrogram based on amino acid sequence and reflects an early divergence of the two isoforms.
4. The CAL and CDC taxa, by contrast, are interspersed among the various CAM taxa. For instance, the intron character states of CAL from Caenorhabditis and of CAM from Chlamydomonas are identical. Whether this reflects statistical fluctuation or deeper functional significance is difficult to say; however, the question lends urgency to determining the functions of CAL and CDC.
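The text does not state which clustering method produced the intron dendrogram. As one plausible illustration, the sketch below clusters "+/-" character strings by Hamming distance with average linkage (UPGMA) and returns only the topology, in Newick form. The taxon labels and character states are hypothetical. Taxa with identical character states, like the human and mouse TNCs in point 1, are at distance 0 and therefore join first.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of intron characters whose +/- state differs."""
    if len(a) != len(b):
        raise ValueError("character strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def upgma(states: dict[str, str]) -> str:
    """Average-linkage (UPGMA) clustering; returns the topology as a Newick string."""
    size = {name: 1 for name in states}
    newick = {name: name for name in states}
    dist = {frozenset(pair): float(hamming(states[pair[0]], states[pair[1]]))
            for pair in combinations(states, 2)}
    step = 0
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair; first one found on ties
        new = f"#{step}"
        step += 1
        for c in [c for c in size if c not in (a, b)]:
            # size-weighted mean of the merged clusters' distances to cluster c
            dist[frozenset((new, c))] = (
                dist[frozenset((a, c))] * size[a] + dist[frozenset((b, c))] * size[b]
            ) / (size[a] + size[b])
        for key in [k for k in dist if a in k or b in k]:
            del dist[key]
        newick[new] = f"({newick[a]},{newick[b]})"
        size[new] = size.pop(a) + size.pop(b)
    (root,) = size
    return newick[root] + ";"


# Hypothetical taxa and 4-character intron states (not data from Table 8).
taxa = {"TNC_a": "++--", "TNC_b": "++--", "CAM_x": "--++", "CAL_y": "--+-"}
print(upgma(taxa))  # ((TNC_a,TNC_b),(CAL_y,CAM_x));
```

With few binary characters many distances tie, and the order in which tied pairs are merged (here, the first closest pair found) can change the topology.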
DISTRIBUTION OF LOCI ON THE CHROMOSOMES
Berchtold [19a] summarized the chromosomal assignments of genes encoding EF-hand proteins in Mus and Homo. He concluded that there was no '...selective pressure to maintain chromosomal clustering...', that '...chromosomal translocations occurred before divergence of these species,' and that '...evolution of various EF-hand proteins involved numerous gene duplications, translocations, assemblies, and other rearrangements.'
In Homo there are 57 known loci for members of 29 subfamilies. Except for S100A1-A13 and HYFL, genes within a subfamily are not linked, nor are subfamilies clustered. For instance, the three genes encoding the identical CAM sequence are located at 14q24-q31, 2p21, and 19q13.3, and the calmodulin-like gene is at 10p13-ter.
Berchtold's conclusions still hold true.