
Markov Chains

This document covers Markov chains in both the discrete-time and continuous-time settings.

1. Preamble

A Markov chain is described by,

  1. State space $\mathcal{S}$: Set of all possible states $s$ that the chain can be in.

  2. Transition Kernel/Matrix $\mathbf{P}$: A two-dimensional matrix that stores the probability of going from a state $i$ to another state $j$ in the state space. The state space $\mathcal{S}$ has cardinality $|\mathcal{S}| = N$. The probability is denoted by, $$p_{ij} := Pr(s_{t+1} = j | s_t = i), \quad i,j \in \mathcal{S}$$ where $t$ is the timestep counter for a *discrete* Markov chain.

$$ \mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1N}\\ p_{21} & p_{22} & \cdots & p_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{bmatrix} $$

Note that row $i$ gives the transition probabilities out of state $i$, which combined with the normalization axiom must give $\sum_{j=1}^{N} p_{ij} = 1 \hspace{1em} \forall i \in \mathcal{S}$.

A self-transition $p_{ii}$ gives the probability of remaining in the same state at the next timestep.
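To make the row-stochastic structure concrete, below is a minimal Python sketch; the 3-state matrix is an illustrative assumption, not taken from the text.

```python
import numpy as np

# Illustrative 3-state transition matrix; row i holds the transition
# probabilities out of state i.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

# Normalization axiom: every row must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

def step(state: int, rng: np.random.Generator) -> int:
    """Sample the next state s_{t+1} given the current state s_t."""
    return int(rng.choice(len(P), p=P[state]))

rng = np.random.default_rng(0)
trajectory = [0]
for _ in range(10):
    trajectory.append(step(trajectory[-1], rng))
print(trajectory)  # a length-11 sample path starting in the first state
```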

Markov Property

The transition at timestep $t$ is independent of the history $\mathcal{H}$: it does not depend on the transitions made at the previous timesteps $t-1, t-2, \cdots, 0$. Formally, the probability of transitioning to state $j$ is given by,

$$ p_{ij} = Pr(s_{t+1} = j | s_t = i_t, s_{t-1} = i_{t-1}, \cdots, s_0 = i_0) = Pr(s_{t+1} = j | s_t = i_t) $$

The transition is said to be memoryless: the equality holds for every enumeration of the past trajectory $i_{t-1}, i_{t-2}, \cdots, i_0$.

Important note
We are not restricted to using only information from the current timestep when transitioning. It is possible to preserve the Markov property by designing the state $s_t$ to include past information up to a window $W_t = [t-1, t-2, \cdots, t-|W_t|]$. One popular example is presented in the DQN paper 1, where the authors used the past three frames as part of the current state, so the agent made its decisions based on a state constructed from both fresh and past information. A minimal sketch of this idea follows.
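A hypothetical sketch of the construction (the `observe` helper and `WINDOW` constant are invented names for illustration; stacking the current frame with the past three matches the description above):

```python
from collections import deque

import numpy as np

WINDOW = 4  # current frame plus the past three frames

frames = deque(maxlen=WINDOW)  # old frames fall out automatically

def observe(frame: np.ndarray) -> np.ndarray:
    """Push the newest frame and return the stacked Markov state."""
    if not frames:
        frames.extend([frame] * WINDOW)  # pad with copies on the first call
    else:
        frames.append(frame)
    return np.stack(list(frames), axis=0)  # shape: (WINDOW, H, W)

# Usage: the agent acts on `state`, not on the single newest frame.
# state = observe(new_frame)
```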


State Classifications

States of a Markov chain are classified as, for example, recurrent or transient; the continuous-time versions of these definitions are given in Section 3.


2. Discrete-Time Markov Chains (DTMCs)

Let $t = 0,1,\cdots, T$ be the timestep with horizon $T$. The Markov chain makes one transition according to $\mathbf{P}$ at each timestep $t$, so the marginal distribution evolves as $p(t+1) = p(t)\mathbf{P}$ (row-vector convention).
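A short sketch of this evolution, reusing the illustrative 3-state $\mathbf{P}$ from the preamble (an assumption, not from the text):

```python
import numpy as np

P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

p = np.array([1.0, 0.0, 0.0])  # start deterministically in the first state
T = 50                         # horizon
for _ in range(T):
    p = p @ P                  # one timestep: p(t+1) = p(t) P

print(p)  # for this P, p(T) is already close to the stationary distribution
```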


3. Continuous-Time Markov Chains (CTMCs)

The continuous-time equivalent builds on top of the discrete version: where the discrete version has well-defined timeslots, the continuous-time version takes the limit as the timeslot interval goes to zero, so quantities are evaluated as integrals over time rather than sums over timesteps.

A stochastic process $X(t)$ is a CTMC if:

  1. The time index is continuous, $t \in \mathbb{R}^{+}$.

  2. Has state space $X(t) \in \mathcal{S}$, with $\mathcal{S}$ being a countable set ($|\mathcal{S}|$ either finite or countably infinite).

  3. Satisfies the Markov property, $$ Pr(X(t+s) | X(u), u \leq s) = Pr(X(t+s) | X(s)) $$

That is, the conditional probability depends only on the current state $X(s)$, not on the earlier history $X(u), u < s$.

Assumption 1
Non-explosiveness: in any finite time interval $\delta > 0$, the chain makes only a finite number of transitions.

Definitions

Time-homogeneous CTMC: if the transition probabilities $Pr(X(t+s) | X(s))$ are independent of the time $s$, then the CTMC is time-homogeneous.

Transitions in a CTMC are defined as jumps, with $Y(k)$ being the state after $k$ jumps. The time interval between the $(k-1)^{th}$ and $k^{th}$ jumps is denoted $T_k$; it is an exponentially distributed random variable that depends only on the state $Y(k-1)$. We define the time spent in state $i$ from time $t$ onwards as $\gamma_i(t)$, $$\gamma_i(t) := \inf\{s > 0 : X(t+s) \neq X(t) \text{ and } X(t) = i\}$$

$\gamma_i(t)$ is an exponentially distributed random variable if the CTMC is time-homogeneous. Denote by $\frac{1}{q_i}$ the mean time spent in state $i \in \mathcal{S}$.
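As a minimal simulation sketch, assume a transition rate matrix $\mathbf{Q}$ whose off-diagonal entries $Q_{ij}$ are the jump rates from $i$ to $j$ and whose diagonal entries are $Q_{ii} = -q_i$, so rows sum to zero; the matrix below is illustrative, not from the text. The chain alternates exponential holding times and jumps:

```python
import numpy as np

# Illustrative rate matrix: Q[i, j] (j != i) is the jump rate i -> j,
# and Q[i, i] = -q_i so that each row sums to zero.
Q = np.array([
    [-3.0,  2.0,  1.0],
    [ 1.0, -4.0,  3.0],
    [ 2.0,  2.0, -4.0],
])

def simulate(i: int, t_end: float, rng: np.random.Generator):
    """Return the jump chain Y(k) and the jump times up to t_end."""
    t, states, times = 0.0, [i], [0.0]
    while True:
        q_i = -Q[i, i]                    # total rate out of state i
        t += rng.exponential(1.0 / q_i)   # holding time ~ Exp(q_i), mean 1/q_i
        if t >= t_end:
            return states, times
        probs = Q[i].clip(min=0.0) / q_i  # jump probabilities Q_ij / q_i
        i = int(rng.choice(len(Q), p=probs))
        states.append(i)
        times.append(t)

states, times = simulate(0, 10.0, np.random.default_rng(0))
```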

Stationary Distribution $\pi$ of CTMCs

Theorem 1
A CTMC with a finite state space $\mathcal{S}$ that is irreducible has a stationary distribution $\pi$, with $\lim\limits_{t \rightarrow \infty} p(t) = \pi \text{ } \forall \text{ } p(0)$. Since $p(t)$ converges to the same $\pi$ from every initial distribution $p(0)$, this stationary distribution is unique.

For infinite state spaces, the irreducibility condition alone is not enough to ensure a stationary distribution, although one may still exist.
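For a finite, irreducible CTMC the stationary distribution solves $\pi \mathbf{Q} = 0$ together with $\sum_i \pi_i = 1$. A minimal sketch of that computation, reusing the illustrative $\mathbf{Q}$ from above (an assumption, not from the text):

```python
import numpy as np

Q = np.array([
    [-3.0,  2.0,  1.0],
    [ 1.0, -4.0,  3.0],
    [ 2.0,  2.0, -4.0],
])

N = Q.shape[0]
A = np.vstack([Q.T, np.ones(N)])            # stack Q^T pi = 0 with 1^T pi = 1
b = np.concatenate([np.zeros(N), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solve
print(pi, pi @ Q)                           # pi @ Q should be ~0
```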

In a CTMC, the states can be categorized as recurrent or transient (same as in DTMCs), but using continuous time intervals. A state $i$ is recurrent if, $$ \lim\limits_{T \rightarrow \infty} Pr \{ \tau_i < T \} = 1 $$

With the intervals $\tau_i$ and $\gamma_i$ for state $i$ defined as, $$ \tau_i := \inf \{ t > \gamma_i : X(t) = i \text{ and } X(0) = i \} $$

$$ \gamma_i := \inf \{t > 0 : X(t) \neq i \text{ and } X(0) = i \} $$

If the above condition is not satisfied, then the state $i$ is transient.

Global and Local Balance Equations
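In their standard CTMC form, with transition rates $q_{ij}$ (the $Q_{ij}$ of Theorem 2 below), the global balance equations state that in stationarity the probability flow out of each state equals the flow into it, $$ \pi_i \sum_{j \neq i} q_{ij} = \sum_{j \neq i} \pi_j q_{ji} \hspace{1em} \forall i \in \mathcal{S} $$

The local balance equations are the stronger, per-pair condition, $$ \pi_i q_{ij} = \pi_j q_{ji} \hspace{1em} \forall i,j \in \mathcal{S} $$ which holds for reversible chains.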

Foster-Lyapunov for CTMCs

With the same goal as in DTMCs, of proving positive recurrence for a Markov chain, the Foster-Lyapunov theorem can be extended to the continuous-time domain. It provides another sufficient condition for positive recurrence.

Theorem 2
For an irreducible, non-explosive CTMC, if there exist a function $V : \mathcal{S} \rightarrow \mathbb{R}^{+}$, a finite set $\beta \subseteq \mathcal{S}$, and constants $\epsilon > 0$ and $M < \infty$ such that:

  1. $\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq - \epsilon$ if $i \in \beta^{c}$,

  2. $\sum\limits_{j \neq i} Q_{ij} (V(j) - V(i)) \leq M$ if $i \in \beta$,

then the CTMC is positive recurrent.
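As a small numerical aid, the drift terms in conditions 1 and 2 can be evaluated directly; the $\mathbf{Q}$ and the choice $V(i) = i$ below are illustrative assumptions, not from the text:

```python
import numpy as np

Q = np.array([
    [-3.0,  2.0,  1.0],
    [ 1.0, -4.0,  3.0],
    [ 2.0,  2.0, -4.0],
])
V = np.arange(Q.shape[0], dtype=float)  # assumed Lyapunov function V(i) = i

# Since Q_ii * (V(i) - V(i)) = 0, the matrix-vector form below equals the
# sum over j != i in the theorem.
drift = np.array([Q[i] @ (V - V[i]) for i in range(len(V))])
print(drift)  # condition 1 asks drift[i] <= -eps for states outside beta
```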

Reversibility

  1. Deep Q-Network (DQN) paper link ↩︎