Multiple Sequence Alignment (MSA) is usually done for the following reasons:

  • To determine the consensus sequences of the aligned sequences.
  • To characterize protein families and determine evolutionary history behind them.
  • To serve as an initial step towards phylogenetic studies.

Let's study the relationship of the DHFR protein with other proteins of the same family.

Open the Swiss-Prot entry for P00374, which we used earlier in this tutorial.

Click on the Family option on this page and you will find all the Swiss-Prot IDs that belong to this family.

For this tutorial I have selected a few proteins; their Swiss-Prot IDs are given below. Open each sequence in a new window.

Swiss-Prot entries:

P00374 (original sequence)

P11731
P00378
P07807
P00375

This will show us how similar the DHFR proteins of human, mouse, chicken, yeast and E. coli are, i.e., to what extent they are related.

Copy all the above sequences in FASTA format into a single Notepad file. Once you have done that, you can check it against the Notepad file I have prepared for you: Clustal W sequences file.
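If you would rather not copy and paste by hand, the same file can be put together with a few lines of Python. This is only a sketch: the URL pattern for UniProt's REST service and the helper names (fasta_url, fetch_fasta, combine_fasta) are my own assumptions, not something the Swiss-Prot pages tell you, so check the pattern against the current UniProt help before relying on it.

```python
from urllib.request import urlopen

# The five Swiss-Prot accessions listed above.
ACCESSIONS = ["P00374", "P11731", "P00378", "P07807", "P00375"]

# Assumed URL pattern for UniProt's REST service (verify before use).
FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the download URL for one accession."""
    return FASTA_URL.format(accession=accession)


def fetch_fasta(accession):
    """Download one FASTA record as text (needs network access)."""
    with urlopen(fasta_url(accession)) as response:
        return response.read().decode("utf-8")


def combine_fasta(records):
    """Join FASTA records into one text block, one record after another."""
    cleaned = [record.strip() for record in records if record.strip()]
    return "\n".join(cleaned) + "\n"
```

Calling combine_fasta(fetch_fasta(a) for a in ACCESSIONS) should give you one block of text that can be pasted into the CLUSTAL W text box, just like the Notepad file.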

Now open the CLUSTAL W page at EBI: http://www.ebi.ac.uk/clustalw/

Copy all the sequences from Notepad, paste them into the text box, keep the default settings and run CLUSTAL W.

Interpreting Results

The CLUSTAL W results page appears.

Unlike BLAST, which reports local alignments of matching regions, CLUSTAL W tries to find the best alignment over the entire length of the sequences.

To check the alignment, click on Alignment file. The results appear, showing our five sequences aligned. You'll notice various signs beneath the sequences; these mean the following:

"*" (asterisk) denotes that the residue at that position is exactly the same in all sequences, like I and G in the first line in our case.

":" (colon) indicates that the residues at that position are very similar. For example, valine and leucine are both hydrophobic.

"." (dot) indicates that the residues are only weakly similar.

If there is no mark, the residues at that position share no common property.
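To make the meaning of these marks concrete, here is a small Python sketch that rebuilds the mark line from sequences that are already aligned. The residue groups are the "strong" and "weak" groups as I recall them from the Clustal documentation, so treat them as an assumption and verify them for your version. The function names and the choice to leave gap columns unmarked are also mine.

```python
# Residue groups as commonly listed in Clustal documentation; treat these
# as an assumption and verify them against the version you are using.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_mark(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: a column containing a gap gets no mark
    if len(residues) == 1:
        return "*"  # every sequence has the same residue here
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall inside one weak group
    return " "


def conservation_line(aligned):
    """Build the mark line for a list of equal-length aligned sequences."""
    if len(set(len(seq) for seq in aligned)) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_mark("".join(col)) for col in zip(*aligned))
```

For instance, conservation_line(["IVS", "ILG"]) marks the first column as identical, the second (valine and leucine) as very similar, and the third (serine and glycine) as weakly similar.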

So, finally, how should the results be interpreted?

If very few "*" appear in the alignment file, the sequences are probably not related to each other; if you find a fair number of "*", the sequences are likely to be related. Lastly, if the number of "*" is higher still, they might share functional domains.
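Counting the marks by eye gets tedious for long alignments, so here is a rough Python sketch that tallies them for you. It assumes you pass in only the mark characters, with the leading padding under the sequence names already trimmed off; the function name summarize_marks is hypothetical. It deliberately applies no cut-off values, since what counts as "very few" or "a fair number" is a judgment call.

```python
from collections import Counter


def summarize_marks(mark_line):
    """Count each kind of mark and the fraction of identical columns."""
    counts = Counter(mark_line)
    total = len(mark_line)
    return {
        "identical": counts["*"],
        "strongly_similar": counts[":"],
        "weakly_similar": counts["."],
        "unmarked": counts[" "],
        # Fraction of columns marked "*"; 0.0 for an empty line.
        "fraction_identical": counts["*"] / total if total else 0.0,
    }
```

You can join the mark lines from every block of the alignment file into one string and pass it in to get totals for the whole alignment.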