#maths
Given a Hidden Markov Model with known parameters (initial state distribution, transition probabilities, emission probabilities), we can efficiently calculate the likelihood of seeing a sequence of observations using the forward algorithm.
Let's say we observe a two-symbol sequence $O = (o_1, o_2)$. What is the likelihood of this observation sequence, given our hidden Markov model?
The naive approach is to enumerate all possible hidden state sequences and sum their probabilities. That is:
$$P(O) = \sum_{q} P(O \mid q)\,P(q)$$
Where:
- $q = q_1, q_2, \ldots, q_T$ represents a sequence of states through the HMM that generates the observations $O$.
- $P(O \mid q)$ is the probability of observing the sequence $O$ given the state sequence $q$, which is the product of the emission probabilities for each observed symbol given the state at that step: $\prod_{t=1}^{T} P(o_t \mid q_t)$.
- $P(q)$ is the probability of the state sequence $q$, which is the initial state probability times the product of the transition probabilities for each transition in the sequence: $P(q_1) \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t)$.
- The sum is taken over all possible state sequences $q$ of length $T$.
We can expand the equation to:
$$P(O) = \sum_{q}\left[\left(\prod_{t=1}^{T} P(o_t \mid q_t)\right)\left(P(q_1) \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t)\right)\right]$$
For $O = (o_1, o_2)$ and three hidden states $s_1, s_2, s_3$, the state sequences $q$ that can give rise to $O$ are:

$$q \in \{(s_1, s_1),\ (s_1, s_2),\ (s_1, s_3),\ (s_2, s_1),\ (s_2, s_2),\ (s_2, s_3),\ (s_3, s_1),\ (s_3, s_2),\ (s_3, s_3)\}$$
The number of possible hidden state sequences is $n^r$, where $n$ is the number of states and $r$ is the number of observations (here $3^2 = 9$). This count grows exponentially with the number of observations we make.
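A minimal Python sketch of this naive enumeration, assuming the model parameters are stored as plain nested dictionaries (`pi`, `A`, and `B` are illustrative names, not from any particular library):

```python
from itertools import product

def naive_likelihood(obs, states, pi, A, B):
    """Sum P(O | q) * P(q) over every possible state sequence q.

    pi[s]   : initial probability of state s
    A[s][r] : transition probability P(r | s)
    B[s][o] : emission probability P(o | s)
    """
    total = 0.0
    # Enumerate all n^r state sequences of length r = len(obs).
    for q in product(states, repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]  # P(q1) * P(o1 | q1)
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]  # transition * emission
        total += p
    return total
```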
The probability of making the observation $O$ is:
$$
\begin{aligned}
P(O = (o_1, o_2)) &= \sum_{q}\left[\left(\prod_{t=1}^{T} P(o_t \mid q_t)\right)\left(P(q_1) \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t)\right)\right] \\
&= \sum_{q} P(q_1)\,P(o_1 \mid q_1)\,P(q_2 \mid q_1)\,P(o_2 \mid q_2) \\
&= P(s_1)\,P(o_1 \mid s_1)\,P(s_1 \mid s_1)\,P(o_2 \mid s_1) \\
&\quad + P(s_1)\,P(o_1 \mid s_1)\,P(s_2 \mid s_1)\,P(o_2 \mid s_2) \\
&\quad + P(s_1)\,P(o_1 \mid s_1)\,P(s_3 \mid s_1)\,P(o_2 \mid s_3) \\
&\quad + P(s_2)\,P(o_1 \mid s_2)\,P(s_1 \mid s_2)\,P(o_2 \mid s_1) \\
&\quad + P(s_2)\,P(o_1 \mid s_2)\,P(s_2 \mid s_2)\,P(o_2 \mid s_2) \\
&\quad + P(s_2)\,P(o_1 \mid s_2)\,P(s_3 \mid s_2)\,P(o_2 \mid s_3) \\
&\quad + P(s_3)\,P(o_1 \mid s_3)\,P(s_1 \mid s_3)\,P(o_2 \mid s_1) \\
&\quad + P(s_3)\,P(o_1 \mid s_3)\,P(s_2 \mid s_3)\,P(o_2 \mid s_2) \\
&\quad + P(s_3)\,P(o_1 \mid s_3)\,P(s_3 \mid s_3)\,P(o_2 \mid s_3)
\end{aligned}
$$
Writing it out, we can see that we are recomputing certain parts of the equation. For example, $P(s_3)\,P(o_1 \mid s_3)$ is computed three times. We can make the computation of $P(O)$ much more efficient by computing it recursively and reusing these shared terms. This is where the forward algorithm comes in.
$$
\begin{aligned}
\text{Initialization:} \quad & \alpha_1(i) = \pi[i]\,P(o_1 \mid s_i), && 1 \le i \le N \\
\text{Recursion:} \quad & \alpha_{t+1}(i) = \left[\sum_{j=1}^{N} \alpha_t(j)\,P(s_i \mid s_j)\right] P(o_{t+1} \mid s_i), && 1 \le t \le T-1,\ 1 \le i \le N \\
\text{Termination:} \quad & P(O) = \sum_{i=1}^{N} \alpha_T(i)
\end{aligned}
$$
Let's recompute our example using the forward algorithm:
$$
\begin{aligned}
\text{Initialization } (t=1): \quad & \alpha_1(s_1) = \pi[s_1]\,P(o_1 \mid s_1) \\
& \alpha_1(s_2) = \pi[s_2]\,P(o_1 \mid s_2) \\
& \alpha_1(s_3) = \pi[s_3]\,P(o_1 \mid s_3) \\
\text{Recursion } (t=2): \quad & \alpha_2(s_1) = \left[\alpha_1(s_1)\,P(s_1 \mid s_1) + \alpha_1(s_2)\,P(s_1 \mid s_2) + \alpha_1(s_3)\,P(s_1 \mid s_3)\right] P(o_2 \mid s_1) \\
& \alpha_2(s_2) = \left[\alpha_1(s_1)\,P(s_2 \mid s_1) + \alpha_1(s_2)\,P(s_2 \mid s_2) + \alpha_1(s_3)\,P(s_2 \mid s_3)\right] P(o_2 \mid s_2) \\
& \alpha_2(s_3) = \left[\alpha_1(s_1)\,P(s_3 \mid s_1) + \alpha_1(s_2)\,P(s_3 \mid s_2) + \alpha_1(s_3)\,P(s_3 \mid s_3)\right] P(o_2 \mid s_3) \\
\text{Termination:} \quad & P(O) = \alpha_2(s_1) + \alpha_2(s_2) + \alpha_2(s_3)
\end{aligned}
$$
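The recurrence translates almost line for line into code. A minimal Python sketch, reusing the same illustrative `pi`/`A`/`B` dictionary layout as the naive version above:

```python
def forward_likelihood(obs, states, pi, A, B):
    """Compute P(O) with the forward algorithm in O(N^2 * T) time."""
    # Initialization: alpha_1(i) = pi[i] * P(o1 | s_i)
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    # Recursion: alpha_{t+1}(i) = (sum_j alpha_t(j) * P(s_i | s_j)) * P(o_{t+1} | s_i)
    for o in obs[1:]:
        alpha = {
            i: sum(alpha[j] * A[j][i] for j in states) * B[i][o]
            for i in states
        }
    # Termination: P(O) = sum_i alpha_T(i)
    return sum(alpha.values())
```

Only the $N$ values of $\alpha_t$ are carried from one step to the next, which is exactly where the shared partial products get reused.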
The complexity of the forward algorithm is $O(N^2 T)$. This is because for each of the $T$ observations we compute, for each of the $N$ states, a sum over all $N$ possible previous states, giving $N \cdot N$ operations per step.
In comparison, the complexity of the naive approach is $O(T \cdot N^T)$, which is infeasible for even moderately sized $T$.
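As a sanity check, we can run both sketches on the same made-up three-state model (all numbers below are arbitrary, chosen only so each distribution sums to 1) and confirm they agree:

```python
states = ["s1", "s2", "s3"]
pi = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
A = {
    "s1": {"s1": 0.6, "s2": 0.3, "s3": 0.1},
    "s2": {"s1": 0.2, "s2": 0.5, "s3": 0.3},
    "s3": {"s1": 0.3, "s2": 0.3, "s3": 0.4},
}
B = {
    "s1": {"o1": 0.7, "o2": 0.3},
    "s2": {"o1": 0.4, "o2": 0.6},
    "s3": {"o1": 0.1, "o2": 0.9},
}
obs = ["o1", "o2"]

# Both methods compute the same quantity; only the cost differs.
assert abs(naive_likelihood(obs, states, pi, A, B)
           - forward_likelihood(obs, states, pi, A, B)) < 1e-12
```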
Further Reading: