**Hidden Markov Models**

**[Edit: 5/2/2011]** I realized that the B matrix was all sorts of messed up in the example. It has been fixed below.

Using our previous example of weather, let’s say that you wish to know the weather in a remote location but have no way to view it directly. For example, we’ll say that Alice lives in Seattle and Bob lives in Washington D.C. Bob calls Alice on the phone every day at 8 am and explains his intended activity for the day. Since Bob lives a simple life, his activities include walking outside, shopping, and cleaning his house. Additionally, Bob’s activities are highly dependent on the weather. Alice wants to know the weather in Washington D.C. but fails to ask Bob and has no other means of finding out. However, she knows that Bob has a 60% chance of taking a walk if it is sunny, cleans the house 70% of the time when it is raining, and goes shopping 30% of the time on sunny days. Using this information, she can construct a model for figuring out the weather in Washington D.C.

Because Alice cannot see the weather (or states) in our example, we consider the Markov Model to be *Hidden*. We know that the states exist, but we cannot directly observe them. Instead, we have to rely on a set of observables to model the system. We can construct another matrix to describe the likelihood of each observable given the current state. Each element of the matrix is an individual probability. For example:

b_{11} = P[activity = walk | weather = sunny] = 0.6

which means that Bob has a 0.4 chance of either cleaning or shopping on a sunny day. We will follow this example and fill out the rest of the probabilities.

b_{12} = P[activity = walk | weather = rainy] = 0.1

Following this naming convention, we can re-write the other probabilities:

b_{21} = P[activity = clean | weather = sunny] = 0.1

b_{22} = P[activity = clean | weather = rainy] = 0.7

b_{31} = P[activity = shop | weather = sunny] = 0.3

b_{32} = P[activity = shop | weather = rainy] = 0.2

These can be compressed into a matrix used to describe the output symbol probabilities:

    B = [0.6 0.1]
        [0.1 0.7]
        [0.3 0.2]

where the rows correspond to the activities (walk, clean, shop) and the columns correspond to the weather (sunny, rainy).

Taking the observations into account, we can view the HMM as a state diagram with observables:

The above figure shows how the state transitions from the Markov Chain remain the same while incorporating the probabilities of the observables. These probabilities are appropriately captured in the A and B matrices.

In addition to the A and B matrices, we need to define the probabilities for the initial state. In many HMM applications we cannot know the starting state directly, but we can often assign probabilities to it. Continuing our example, the first day that Bob calls Alice, Alice knows the general weather patterns of Washington D.C. and assumes that the weather has a 75% chance of being sunny and a 25% chance of being rainy. From this, she can construct the initial state probability matrix:

    π = [0.75 0.25]

Using the A, B, and π matrices, we can fully define a Hidden Markov Model. We will use the following notation to describe the model:

    λ = (A, B, π)
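As a concrete sketch, the model λ = (A, B, π) for this example can be written out in Python with NumPy. The B and π values come directly from the probabilities above; the A (transition) values shown here are placeholders, since they come from the earlier Markov Chain example — substitute your own.

```python
import numpy as np

# States: 0 = sunny, 1 = rainy; observations: 0 = walk, 1 = clean, 2 = shop

# A: state transition probabilities (placeholder values -- use the ones
# from the earlier Markov Chain example). A[i][j] = P[tomorrow = j | today = i]
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# B: output symbol probabilities from the text. Rows are activities
# (walk, clean, shop); columns are weather states (sunny, rainy).
B = np.array([[0.6, 0.1],
              [0.1, 0.7],
              [0.3, 0.2]])

# pi: initial state probabilities (75% sunny, 25% rainy)
pi = np.array([0.75, 0.25])

# Sanity checks: each row of A, each column of B, and pi must each sum to 1
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=0), 1.0)
assert np.isclose(pi.sum(), 1.0)
```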

When considering the use of HMMs, three canonical problems arise:

1. Given a model, compute the probability of an output sequence, which can be expressed as P(O|λ). This can be accomplished using the Forward algorithm.

2. Given a model and an output sequence, find the most likely sequence of states. This is covered by the Viterbi algorithm.

3. Given an output sequence, find the model parameters (the state transition and output probabilities) that best explain it. Put differently, find the maximum likelihood estimate of the HMM parameters, which can be accomplished via the Baum-Welch algorithm.
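To make the first two problems concrete, here is a minimal sketch of the Forward and Viterbi algorithms in plain NumPy, using the matrices from our example (B indexed as B[observation, state]; the A values are placeholders, and the function names are illustrative):

```python
import numpy as np

A = np.array([[0.8, 0.2], [0.4, 0.6]])              # transitions (placeholder values)
B = np.array([[0.6, 0.1], [0.1, 0.7], [0.3, 0.2]])  # B[observation, state]
pi = np.array([0.75, 0.25])                         # initial state probabilities

def forward(obs):
    """Problem 1: compute P(O | lambda) with the Forward algorithm."""
    alpha = pi * B[obs[0]]               # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[o]       # propagate, then weight by emission
    return alpha.sum()

def viterbi(obs):
    """Problem 2: find the most likely state sequence with the Viterbi algorithm."""
    delta = pi * B[obs[0]]
    back = []
    for o in obs[1:]:
        trans = delta[:, None] * A       # score of each (previous, next) state pair
        back.append(trans.argmax(axis=0))
        delta = trans.max(axis=0) * B[o]
    # Trace the best path backwards through the stored pointers
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Three days of calls from Bob: walk (0), shop (2), clean (1)
obs = [0, 2, 1]
print(forward(obs))   # probability of observing this activity sequence
print(viterbi(obs))   # most likely weather sequence (0 = sunny, 1 = rainy)
```

Both algorithms run in O(T·N²) time for T observations and N states, which is what makes HMM inference tractable compared to summing over all N^T state sequences.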

HMMs play an important role in the statistical modeling of time- or space-varying systems and have seen widespread use in applications such as speech and handwriting recognition.