By lliu | Sat, 04/24/2021 - 14:36

Let $A_k=\{a_{1,k}, \ldots, a_{n,k}\}$ for $k=1,\ldots,K$ be the $K$ sequences in the alignment, where $a_{i,k}$ is the nucleotide at site $i$ of sequence $A_k$ in the alignment. Note that all sequences in the alignment have the same length $n$. Suppose the similarity score of two nucleotides $a$ and $b$ is defined as

$$S(a,b)=\begin{cases} 1, & a=b \\ -1, & a \ne b \\ 0, & \text{$a$ or $b$ is a gap}\end{cases}$$

Each site (i.e., column) in the alignment is scored by summing the scores of all pairs of nucleotides at that site. Specifically, the similarity score of site $m$ is given by

$$S_m=\sum_{i=1}^{K-1}\sum_{j=i+1}^K S(a_{m,i},a_{m,j})$$
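
As a small worked illustration (the nucleotides here are made up for the example, not taken from any particular alignment), suppose $K=3$ and the three sequences have nucleotides A, A, and C at site $m$. There are three pairs, so

$$S_m = S(\mathrm{A},\mathrm{A}) + S(\mathrm{A},\mathrm{C}) + S(\mathrm{A},\mathrm{C}) = 1 + (-1) + (-1) = -1$$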

Finally, the similarity score of the entire alignment is equal to the sum of the similarity scores across sites, i.e.,

$$S=\sum_{m=1}^nS_m$$
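
The scoring scheme above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the gap symbol `-`, the function names, and the toy alignment are all hypothetical, and any pair involving a gap (including gap against gap) is assumed to score 0.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol; the text does not fix one


def pair_score(a, b):
    """S(a, b): 0 if either symbol is a gap, 1 for a match, -1 for a mismatch."""
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else -1


def site_score(column):
    """S_m: sum of pair scores over all unordered pairs in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_score(sequences):
    """S: sum of the site scores over all n columns of the alignment."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*sequences) iterates over the columns (sites) of the alignment
    return sum(site_score(column) for column in zip(*sequences))


# Toy alignment with K = 3 sequences of length n = 4 (made up for illustration)
alignment = ["ACG-", "ACGT", "A-CT"]
print(alignment_score(alignment))  # site scores 3, 1, -1, 1 -> prints 4
```

Note that this only scores a given alignment; it does not search for the best one.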

The goal is to find the alignment with the maximum similarity score.