By lliu | Sat, 04/24/2021 - 14:36

Let $A_k=(a_{1,k}, \ldots, a_{n,k})$ for $k=1,\ldots,K$ be the $K$ sequences in the alignment, where $a_{i,k}$ is the nucleotide (or gap) at site $i$ of sequence $A_k$. Note that all sequences in the alignment have the same length $n$. Suppose the similarity score of two aligned characters $a$ and $b$ is defined as

$$S(a,b)=\begin{cases} 1, & a=b \text{ and neither is a gap} \\ -1, & a \ne b \text{ and neither is a gap} \\ 0, & a \text{ or } b \text{ is a gap}\end{cases}$$
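As an illustration, $S(a,b)$ might be written in Python as below. This is a minimal sketch that assumes gaps are written as `-` and that any pair involving a gap (gap against gap included) scores 0. The names `pair_score` and `GAP` are hypothetical and do not come from the original post.

```python
GAP = "-"  # assumption (hypothetical): gaps are represented by the '-' character


def pair_score(a: str, b: str) -> int:
    """Similarity score S(a, b) of two aligned characters."""
    if a == GAP or b == GAP:
        return 0  # any pair involving a gap scores 0 (gap vs. gap included)
    return 1 if a == b else -1  # match scores +1, mismatch scores -1


print(pair_score("A", "A"), pair_score("A", "G"), pair_score("C", "-"))  # prints: 1 -1 0
```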

Each site (i.e., column) in the alignment is scored by summing the scores of all pairs of characters in that site. That is, the similarity score of site $m$ is

$$S_m=\sum_{i=1}^{K-1}\sum_{j=i+1}^K S(a_{m,i},a_{m,j})$$
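The double sum runs over every unordered pair $i<j$ exactly once, which is what `itertools.combinations` produces. The sketch below assumes a column is given as a string with one character per sequence and `-` for gaps. The function names are hypothetical. For the column `AAC-`, the six pairs score $1, -1, 0, -1, 0, 0$, giving $S_m=-1$.

```python
from itertools import combinations

GAP = "-"  # assumption (hypothetical): gaps are represented by the '-' character


def pair_score(a: str, b: str) -> int:
    """S(a, b): +1 for a match, -1 for a mismatch, 0 if either is a gap."""
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else -1


def site_score(column: str) -> int:
    """S_m: sum of S(a_{m,i}, a_{m,j}) over all pairs i < j in one column."""
    # combinations(column, 2) yields each pair (i < j) exactly once,
    # i.e. K*(K-1)/2 pairs for a column of K characters.
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


print(site_score("AAAA"), site_score("AAC-"))  # prints: 6 -1
```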

Finally, the similarity score of the entire alignment is the sum of the similarity scores across all $n$ sites, i.e.,

$$S=\sum_{m=1}^{n} S_m$$

The goal is to find the alignment with the maximum similarity score.
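To make the objective concrete, the total score of a candidate alignment can be computed with a short Python sketch. It assumes the aligned sequences are equal-length strings with `-` for gaps, and the function names are hypothetical. It only scores a given alignment and does not search for the best one. The two alignments of `ACGT` and `AGT` below show how gap placement changes the score: `AGT-` gives $1-1-1+0=-1$, while `A-GT` gives $1+0+1+1=3$, so the second would be preferred.

```python
from itertools import combinations

GAP = "-"  # assumption (hypothetical): gaps are represented by the '-' character


def pair_score(a: str, b: str) -> int:
    """S(a, b): +1 for a match, -1 for a mismatch, 0 if either is a gap."""
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else -1


def alignment_score(seqs: list[str]) -> int:
    """Sum-of-pairs score S = sum over sites m = 1..n of S_m."""
    if not seqs:
        return 0
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("all aligned sequences must have the same length n")
    total = 0
    for column in zip(*seqs):  # column m holds a_{m,1}, ..., a_{m,K}
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


print(alignment_score(["ACGT", "AGT-"]), alignment_score(["ACGT", "A-GT"]))  # prints: -1 3
```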