Skip to main content

User account menu

  • Contact
  • Log in
Home

Main navigation

  • Home
  • File
    • File concatenation
    • File separation
    • Sequence Format Converter
    • Tree Format Converter
  • Alignment
    • Alignment trimming
    • Multiple Sequence Alignment
  • Phylogeny
    • Distance Methods
    • Maximum Likelihood
    • Maximum Parsimony
    • Species Tree Inference
      • MP-EST
      • NJst
      • STAR
    • Tree Summary
      • Consensus Trees
      • Root/reroot
      • Tree Distance
      • Unroot
    • Model Selection/Validation
      • Concatenation model validation
      • Molecular Clock
      • Substitution model selection
      • Substitution model validation
  • Simulation
    • Simulate DNA Sequences
    • Simulate Gene Trees
  • Training
    • Alignment
      • Pairwise Alignment
      • Multiple Sequence Alignment
    • Phylogeny
      • Substitution Models
      • Phylogenetic tree reconstruction
      • Tree Distance
  • Help
  • Service

Global alignment and Dynamic programming

Breadcrumb

  • Home
  • Alignment
  • Multiple Sequence Alignment
  • Global alignment and Dynamic programming
By lliu | Sat, 04/24/2021 - 11:25

Dynamic programming for pairwise alignment can be extended to multiple sequences. Let $A_k=\{a_{1,k}, ..., a_{n_k,k}\}$ for $k=1,...,K$ be the $K$ sequences being aligned, where $n_k$ is the length of sequence $A_k$. To find the best alignment, we first construct a $K$-dimensional array $M$. We use $M_{i_1,...,i_K}$ to denote the value of cell $(i_1,...,i_K)$ of $M$. We fill in the matrix by the following scheme,

$$M_{i_1,...,i_K} = \max\left\{M_{i_1',...,i_K'}+S(a_{i_1',1},...,a_{i_K',K})+d\times (K-1)\times \sum_j I_{(i_j'=i_j)} \right\}$$

Note that $ \sum_j I_{(i_j'=i_j)}$ is the count of gaps. The similarity score $S(a_{i_1',1},...,a_{i_K',K})$ of multiple nucleotides is calcualted by the following formula,

$$S(a_{i_1',1},...,a_{i_K',K})=\sum_{j=1}^{K-1}\sum_{k=j+1}^K I_{(i_j'=i_j-1)}I_{(i_k'=i_k-1)}S(a_{i_j,j},a_{i_k,k})$$

Here $I_{(i_j'=i_j-1)}$ is an indicator funciton for the corresponding nucleotide $a_{i_j,j}$. After the array $M$ is filled by the optimal scores $M_{i_1,...,i_K}$, the alignment with the highest score can be reconstructed by tracing back through the lower-right corner to the upper-left corner of $M$.

Algorithm

  • $K$ sequences are arranged in an array form $M$
  • Calculating $M_{i_1,...,i_K}$ for each cell
  • Tracing back from the lower-right corner to the upper-left corner

The search space thus increases exponentially with $K$ and also increases linearly with sequence length. To find the global optimum for $K$ sequences is an NP-complete problem.

Book traversal links for Global alignment and Dynamic programming

  • ❮ Similarity Scores for multiple sequences
  • Up
  • Progressive alignment construction ❯
Rating (5 = Best)

Copyright 2021 - All Rights Reserved