
Markov Chains

This document covers Markov chains in both the discrete-time and continuous-time models.

A Markov chain is described by,

  1. State space $\mathcal{S}$: the set of all possible states $s$ that the chain can be in.

  2. Transition Kernel/Matrix $\mathbf{P}$: a two-dimensional matrix that stores the probability of going from state $i$ to another state $j$ in the state space. The cardinality of the state space is $|\mathcal{S}| = N$. The probability is denoted by $p_{ij} := \Pr(s_{t+1} = j \mid s_t = i),\ i, j \in \mathcal{S}$, where $t$ is the timestep counter for a *discrete* Markov chain.

$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1N}\\ p_{21} & p_{22} & \cdots & p_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{bmatrix}$$

Note that the row vectors give the transition probabilities out of a state $i$, which combined with the normalization axiom must give $\sum_{j=1}^{N} p_{ij} = 1 \ \forall i \in \mathcal{S}$.

A self-transition $p_{ii}$ gives the probability of remaining in the same state at the next timestep.
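To make the normalization axiom concrete, here is a minimal Python sketch (the 3-state matrix and its values are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical 3-state chain.
P = np.array([
    [0.7, 0.2, 0.1],  # row i holds p_i1, ..., p_iN: transitions out of state i
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
])

# Normalization axiom: every row must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# Self-transition p_00: the probability of remaining in state 0.
print(P[0, 0])  # 0.7
```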

The current transition at timestep $t$ is independent of the history $\mathcal{H}$: it does not depend on the transitions at previous timesteps $t-1, t-2, \ldots, 0$. Formally, the probability of transitioning to state $j$ is given by,

$$p_{ij} = \Pr(s_{t+1} = j \mid s_t = i_t, s_{t-1} = i_{t-1}, \ldots, s_0 = i_0) = \Pr(s_{t+1} = j \mid s_t = i_t)$$

The transition is said to be memoryless: the equality above holds for every enumeration of the trajectory $i_{t-1}, i_{t-2}, \ldots, i_0$.

Important note
We are not restricted to using only information from the current timestep when transitioning. It is possible to preserve the Markov property by designing the state $s_t$ to include past information up to a window $W_t = [t-1, t-2, \ldots, t-|W_t|]$. One popular example is presented in the DQN paper [1], where the authors used the past three frames as part of the current state. The DQN agent made its decisions based on a state constructed from both current and past information, as sketched below.
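A minimal sketch of this windowed-state construction, assuming a generic stream of `obs` arrays and a window of four observations (the buffer design and names are hypothetical, not taken from the DQN codebase):

```python
from collections import deque

import numpy as np

WINDOW = 4  # current observation plus three past ones

# Fixed-length buffer: appending a new observation evicts the oldest.
frames = deque(maxlen=WINDOW)

def make_state(obs):
    """Stack the last WINDOW observations into a single Markov state s_t."""
    frames.append(obs)
    # At episode start, pad by repeating the first observation.
    while len(frames) < WINDOW:
        frames.appendleft(frames[0])
    return np.stack(frames)  # shape: (WINDOW, *obs.shape)
```

Because $s_t$ now contains the whole window $W_t$, transitioning on $s_t$ alone preserves the Markov property even when a single observation is not Markovian.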



Let $t = 0, 1, \ldots, T$ be the timestep with horizon $T$. A Markov chain transitions according to $\mathbf{P}$ at each $t$.
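A minimal simulation sketch under these definitions, reusing the hypothetical 3-state matrix from above:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.5, 0.3]])

def simulate_dtmc(P, s0, T):
    """Sample a trajectory s_0, ..., s_T; step t+1 is drawn from row s_t of P."""
    trajectory = [s0]
    for _ in range(T):
        trajectory.append(rng.choice(len(P), p=P[trajectory[-1]]))
    return trajectory

print(simulate_dtmc(P, s0=0, T=10))
```

Note that only the last state in `trajectory` is used to draw the next one, which is exactly the memoryless property.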


The continuous-time equivalent builds on top of the discrete version: where defined timeslots exist in the discrete version, the continuous-time version takes the limit over the timeslot interval, so quantities are evaluated as integrals over time.

A stochastic process $X(t)$ is a continuous-time Markov chain (CTMC) if it:

  1. Has continuous timesteps $t \in \mathbb{R}$.

  2. Has state space $X(t) \in \mathcal{S}$, with $\mathcal{S}$ being a countable set ($|\mathcal{S}|$ either finite or infinite).

  3. Holds the Markov property, $$\Pr(X(t+s) \mid X(u),\, u \leq s) = \Pr(X(t+s) \mid X(s))$$

Meaning that the conditional probability depends only on the current state $X(s)$.

Assumption 1
Non-explosiveness: for any finite time interval $\delta > 0$, the chain makes only a finite number of transitions.

Time-homogeneous CTMC: if the transition probabilities $\Pr(X(t+s) \mid X(s))$ are independent of the time $s$, then the CTMC is time-homogeneous.

Transitions in a CTMC are defined as jumps, with $Y(k)$ being the state after $k$ jumps. The time interval between the $(k-1)^{\text{th}}$ and $k^{\text{th}}$ jumps is defined as $T_k$. $T_k$ is an exponentially distributed random variable that depends only on the state $Y(k-1)$. We define the remaining time spent in state $i$ at time $t$ as $\gamma_i(t)$, $$\gamma_i(t) := \inf\{s > 0 : X(t+s) \neq X(t) \text{ and } X(t) = i\}$$

$\gamma_i(t)$ is an exponentially distributed random variable if the CTMC is time-homogeneous. Denote by $\frac{1}{q_i}$ the mean time spent in state $i \in \mathcal{S}$.
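A time-homogeneous CTMC can be simulated through its jump chain: hold in state $i$ for an $\text{Exp}(q_i)$ time, then jump according to the embedded transition probabilities. A minimal sketch, with hypothetical rates $q$ and jump matrix $J$ (where $J_{ij} = \Pr(Y(k) = j \mid Y(k-1) = i)$ and $J_{ii} = 0$):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

q = np.array([1.0, 2.0, 0.5])   # q_i: rate of leaving state i (mean holding time 1/q_i)
J = np.array([[0.0, 0.6, 0.4],  # embedded jump chain; J_ii = 0
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

def simulate_ctmc(q, J, x0, t_end):
    """Return the jump times and states of the chain up to time t_end."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        t += rng.exponential(1.0 / q[x])  # holding time in x is Exp(q_x)
        if t > t_end:
            break
        x = rng.choice(len(q), p=J[x])    # jump to the next state
        times.append(t)
        states.append(x)
    return times, states
```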

Theorem 1
A CTMC with finite state space $\mathcal{S}$ that is irreducible has a stationary distribution $\pi$, and $\lim\limits_{t \rightarrow \infty} p(t) = \pi$ for every initial distribution $p(0)$ (where $p(t)$ is the distribution over states at time $t$). This stationary distribution is unique.

The irreducibility condition alone is not enough to ensure a stationary distribution for infinite state spaces, although a stationary distribution may still exist in that case.
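For a finite chain, the stationary distribution can be computed from the transition rate matrix $Q$ (used in Theorem 2 below), which satisfies $\pi Q = 0$ with $\sum_i \pi_i = 1$. A minimal sketch, using the rate matrix induced by the hypothetical $q$ and $J$ above ($Q_{ij} = q_i J_{ij}$ for $i \neq j$, $Q_{ii} = -q_i$):

```python
import numpy as np

# Hypothetical rate matrix: off-diagonal entries >= 0, rows sum to zero.
Q = np.array([[-1.0,  0.6,  0.4 ],
              [ 1.0, -2.0,  1.0 ],
              [ 0.15, 0.35, -0.5]])

# Solve pi @ Q = 0 subject to sum(pi) = 1 by appending the
# normalization constraint and solving in the least-squares sense.
A = np.vstack([Q.T, np.ones(len(Q))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi, pi @ Q)  # pi @ Q should be numerically ~0
```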

In a CTMC, the states can be categorized as recurrent or transient (same as in DTMCs), but using different time intervals. A state $i$ is recurrent if, $$\lim\limits_{T \rightarrow \infty} \Pr\{\tau_i < T\} = 1$$

With the intervals $\tau_i$ and $\gamma_i$ for state $i$ defined as, $$\tau_i := \inf\{t > \gamma_i : X(t) = i \text{ and } X(0) = i\}$$

$$\gamma_i := \inf\{t > 0 : X(t) \neq i \text{ and } X(0) = i\}$$

If the above condition is not satisfied, then the state $i$ is transient.
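A minimal Monte Carlo sketch of this recurrence condition, reusing the hypothetical 3-state CTMC from above; since that chain is finite and irreducible, the estimate of $\Pr\{\tau_i < T\}$ approaches 1 as $T$ grows:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

q = np.array([1.0, 2.0, 0.5])
J = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

def sample_tau(i):
    """Sample tau_i: the first return time to i for a chain with X(0) = i."""
    t = rng.exponential(1.0 / q[i])  # gamma_i: time of the first jump out of i
    x = rng.choice(len(q), p=J[i])
    while x != i:                    # wander until the chain jumps back into i
        t += rng.exponential(1.0 / q[x])
        x = rng.choice(len(q), p=J[x])
    return t

taus = np.array([sample_tau(0) for _ in range(10_000)])
for T in (1.0, 5.0, 25.0):
    print(f"Pr{{tau_0 < {T}}} ~ {(taus < T).mean():.3f}")
```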

With the same goal as in DTMCs, namely proving positive recurrence for a Markov chain, the Foster-Lyapunov theorem can be extended to the continuous-time domain. This provides another sufficient condition for positive recurrence.

Theorem 2
For an irreducible, non-explosive CTMC with transition rate matrix $Q$, if there exist a function $V : \mathcal{S} \rightarrow \mathbb{R}^{+}$, a finite set $\beta \subseteq \mathcal{S}$, and constants $\epsilon > 0$ and $M < \infty$ such that:

  1. $\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq -\epsilon$ if $i \in \beta^{c}$,

  2. $\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq M$ if $i \in \beta$,

then the CTMC is positive recurrent.
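A minimal numerical check of these drift conditions, using the hypothetical rate matrix from above together with an arbitrary choice of $V$ and $\beta$ (both are illustrative, not canonical):

```python
import numpy as np

# Same hypothetical rate matrix as in the stationary-distribution sketch.
Q = np.array([[-1.0,  0.6,  0.4 ],
              [ 1.0, -2.0,  1.0 ],
              [ 0.15, 0.35, -0.5]])

V = np.array([0.0, 1.0, 2.0])  # hypothetical Lyapunov function V(i) = i
beta = {0, 1}                  # hypothetical finite set beta

for i in range(len(Q)):
    # Drift at state i: sum over j != i of Q_ij * (V(j) - V(i)).
    drift = sum(Q[i, j] * (V[j] - V[i]) for j in range(len(Q)) if j != i)
    label = "in beta: needs <= M" if i in beta else "outside beta: needs <= -eps"
    print(f"state {i}: drift = {drift:+.2f} ({label})")

# Drifts are +1.40, +0.00, -0.65, so both conditions hold
# with, e.g., M = 1.4 and eps = 0.65.
```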


  1. Deep Q-Network (DQN) paper link ↩︎