Aligning protein sequences by hand
One of the most powerful tools in the bioinformaticist's toolbox is sequence alignment. Let's see why this is so with a few examples.
∙Suppose we have cloned and sequenced a protein, which we believe to be a protease. Which protease could it be? Search using PubMed to find out more about proteases and protease families. A database search using BLAST tells us that this protein is a remote member of the serine protease family. So we make an alignment against the most similar serine protease. This tells us that the overall sequence identity is about 29%. That is not very much, but the local sequence identity around the three active site residues is considerably higher, and because of that we can be confident that our new protein is a serine protease.
∙Suppose we have a protein that we can easily obtain in large quantities, and we want to use it in a bioreactor. Unfortunately, the industrial process requires a temperature of 65 °C, but our protein is heat labile and denatures at temperatures higher than 52 °C. What will you do? Introducing some mutations may make the protein more stable. Simple, but which of the 298 amino acids should be mutated? The only thing we know is that we should not mutate in or near the active site because that would alter the specificity. This is where sequence alignments come in. We align our protein against a series of family members that have been purified from thermophilic members of the domains Bacteria and Archaea. We look at the multiple sequence alignment, and if we see positions where all the thermostable proteins have one type of residue, and our protein another, we may have a site which we could mutate. If one such position is also far away from the active site, and not in an unpleasant position (like the first residue because of cleavage of the pro-peptide, or just in the middle of the epitope that our monoclonal antibody recognizes), we have potentially found a stabilizing mutation.
∙A third example will pop up in due time.
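The search for candidate stabilizing mutations in the second example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sequences are made-up and already aligned, the names `candidate_positions`, `active_site` and `min_distance` are hypothetical, and distance from the active site is measured along the sequence, which is only a crude stand-in for distance in the three-dimensional structure.

```python
def candidate_positions(target, thermostable, active_site, min_distance=3):
    """Return (column, target_residue, consensus_residue) tuples.

    A column qualifies when every thermostable sequence has the same
    residue, the target has a different one, and the column is at least
    `min_distance` columns away from every active-site column.
    """
    hits = []
    for col, residue in enumerate(target):
        column = {seq[col] for seq in thermostable}
        if len(column) != 1:
            continue  # the thermostable proteins disagree here
        consensus = next(iter(column))
        if consensus == residue or "-" in (consensus, residue):
            continue  # same residue, or a gap rather than a substitution
        if any(abs(col - site) < min_distance for site in active_site):
            continue  # too close to the active site (along the sequence)
        hits.append((col, residue, consensus))
    return hits


# Made-up, pre-aligned toy sequences (assumption: not real proteins).
target = "MKTAYSGDHLV"
thermostable = [
    "MRTPYSGDHLI",
    "MKTPYSGDHLI",
    "MRTPYSGDHVI",
]
active_site = [7]  # hypothetical active-site column, 0-based

for col, old, new in candidate_positions(target, thermostable, active_site):
    print(f"column {col}: {old} -> {new}")
```

In practice one would also exclude the other unpleasant positions mentioned above (the pro-peptide cleavage site, the antibody epitope) by adding them to the list of columns to stay away from.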
But let's start with some examples of alignments.
The question that goes with the first example given below is: "Write down in your own words why the green alignment is better than the red one, and why this seems to be wrong at first."
If we have two sequences, with two different alignments:
A   TVTVTGNSITIT        A   TVTVTGNSITIT
B1  TVTVTG--ITIT        B2  TVTVT--GITIT
then the left alignment looks much better, but look at the corresponding structures that are shown below:
| Structure A: TVTVTGNSITIT |
| The structure that would lead to alignment B1: TVTVTGNSITIT / TVTVTG--ITIT |
| The structure that would lead to alignment B2: TVTVTGNSITIT / TVTVT--GITIT |
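Why the B1 alignment "looks much better" at first sight can be made concrete by simply counting identical columns. The sketch below does only that naive counting; the function name `alignment_stats` is an assumption made for this illustration, and the count knows nothing about the structures.

```python
def alignment_stats(a, b):
    """Count identical columns and gap columns in a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if "-" in (x, y))
    return identical, gaps


seq_a = "TVTVTGNSITIT"
b1 = "TVTVTG--ITIT"
b2 = "TVTVT--GITIT"

for name, aligned in (("B1", b1), ("B2", b2)):
    identical, gaps = alignment_stats(seq_a, aligned)
    print(f"{name}: {identical} identical columns, {gaps} gap columns")
```

Both alignments contain the same two-residue gap; B1 has one more identical column than B2, which is all that the naive count can see.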
1 Aligning sequences by hand.
The alignment given below is very straightforward to achieve and does not require software.
-ASTRGFHILTYHGVCIPPYILRTSA
AATTKGFHVISYHGICLPPYMIRT--
However, the following alignment of two sequences was not straightforward and required some thinking.
-ASTRGFHILTYHGVCIPPYILRTSA
AATTQPF--ISFHSICLGNFMIRS--
Nevertheless, I think that this alignment is the best that can be achieved for these two sequences. How can I know that? How did I make this alignment?
Let's think about an alignment. An alignment is a representation of a whole series of events that took place during evolution and that left their traces in the sequence. So, the more likely it is that something happens (or does not happen!) during evolution, the more important it is to have this "something" show up in the alignment.
What kinds of "something" are important? Let's give a few examples:
∙It is much easier to mutate than to insert or delete (indel).
∙Once nature decided on an indel, its length is less important, but longer indels are more difficult to make than shorter ones.
∙Active site residues don't mutate.
∙Residues tend to mutate into similar residues (e.g. V <-> I; S <-> T; etc).
∙Residues mutate more easily to residues encoded by similar codons.
∙Cysteines that sit in cysteine bridges don't mutate easily.
∙Surface residues mutate more easily than core residues.
∙Core residues mutate more easily when they make fewer contacts.
∙It is hard to mutate a glycine that sits somewhere with torsion angles that other residues cannot have.
∙Etc.
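The first few rules in this list can be turned into a simple scoring scheme, sketched below. Everything numeric here is an assumption chosen for illustration: the scores are arbitrary, the similarity groups beyond V/I and S/T are our own choice, and the function name `score_alignment` is hypothetical. Opening a gap costs much more than a mismatch (mutating is easier than making an indel), and extending an existing gap costs little (once there is an indel, its length matters less).

```python
# Toy similarity groups (assumption: illustrative, not a published matrix).
SIMILAR = [set("VILM"), set("ST"), set("FYW"), set("KR"), set("DE"), set("NQ")]


def score_alignment(a, b, match=2, similar=1, mismatch=-1,
                    gap_open=-4, gap_extend=-1):
    """Score a pairwise alignment with affine gap penalties."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # which sequence the currently open gap is in, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            where = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is cheap; a new gap is not.
            score += gap_extend if gap_in == where else gap_open
            gap_in = where
            continue
        gap_in = None
        if x == y:
            score += match
        elif any(x in group and y in group for group in SIMILAR):
            score += similar
        else:
            score += mismatch
    return score


seq_a = "TVTVTGNSITIT"
print(score_alignment(seq_a, "TVTVTG--ITIT"))  # one gap of length two
print(score_alignment(seq_a, "TVTVT-G-ITIT"))  # two gaps of length one
```

With these toy numbers the single gap of length two scores higher than two separate gaps of length one. Note that end gaps are penalized like internal gaps in this sketch, and that the rules about active sites, cysteine bridges, surface versus core and glycine torsion angles are not captured at all; those need structural information that a sequence-only score does not have.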
We will now start working on sequence alignments. We will slowly add one rule after the other, and learn a few new physicochemical properties of amino acids while we are doing this.
2 Hydrophobicity in sequence alignment