
Markov Chains

This document covers Markov chains in both the discrete-time and continuous-time models.

A Markov chain is described by,

  1. State space $\mathcal{S}$: the set of all possible states $s$ that the chain can be in.

  2. Transition Kernel/Matrix $\mathbf{P}$: a two-dimensional matrix that stores the probability of going from state $i$ to another state $j$ in the state space. The cardinality of the state space is $|\mathcal{S}| = N$. The probability is denoted by $p_{ij} := \Pr(s_{t+1} = j \mid s_t = i),\ i, j \in \mathcal{S}$, where $t$ is the timestep counter for a *discrete* Markov chain.

$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1N}\\ p_{21} & p_{22} & \cdots & p_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{bmatrix}$$

Note that the row vectors give the transition probabilities out of a state $i$, which combined with the normalization axiom must give $\sum_{j=1}^{N} p_{ij} = 1 \ \forall i \in \mathcal{S}$.

A self-transition $p_{ii}$ gives the probability of remaining in the same state at the next timestep.
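To make the normalization axiom concrete, here is a minimal Python sketch (the 3-state matrix and its values are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical 3-state chain.
P = np.array([
    [0.7, 0.2, 0.1],  # row i holds p_i1, ..., p_iN: transitions out of state i
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
])

# Normalization axiom: every row must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# Self-transition p_00: the probability of remaining in state 0.
print(P[0, 0])  # 0.7
```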

The current transition at timestep $t$ is independent of the history $\mathcal{H}$: it does not depend on the transitions at previous timesteps $t-1, t-2, \ldots, 0$. Formally, the probability of transitioning to state $j$ is given by,

$$p_{ij} = \Pr(s_{t+1} = j \mid s_t = i_t, s_{t-1} = i_{t-1}, \ldots, s_0 = i_0) = \Pr(s_{t+1} = j \mid s_t = i_t)$$

The transition is said to be memoryless: the equality above holds for every enumeration of the trajectory $i_{t-1}, i_{t-2}, \ldots, i_0$.

Important note
We are not restricted to using only information from the current timestep when transitioning. It is possible to preserve the Markov property by designing the state $s_t$ to include past information up to a window $W_t = [t-1, t-2, \ldots, t-|W_t|]$. One popular example is presented in the DQN paper [1], where the authors used the past three frames as part of the current state. The DQN agent made its decisions based on a state constructed from both current and past information, as sketched below.
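A minimal sketch of this windowed-state construction, assuming a generic stream of `obs` arrays and a window of four observations (the buffer design and names are hypothetical, not taken from the DQN codebase):

```python
from collections import deque

import numpy as np

WINDOW = 4  # current observation plus three past ones

# Fixed-length buffer: appending a new observation evicts the oldest.
frames = deque(maxlen=WINDOW)

def make_state(obs):
    """Stack the last WINDOW observations into a single Markov state s_t."""
    frames.append(obs)
    # At episode start, pad by repeating the first observation.
    while len(frames) < WINDOW:
        frames.appendleft(frames[0])
    return np.stack(frames)  # shape: (WINDOW, *obs.shape)
```

Because $s_t$ now contains the whole window $W_t$, transitioning on $s_t$ alone preserves the Markov property even when a single observation is not Markovian.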



Let $t = 0, 1, \ldots, T$ be the timestep with horizon $T$. A Markov chain transitions according to $\mathbf{P}$ at each $t$.
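A minimal simulation sketch under these definitions, reusing the hypothetical 3-state matrix from above:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.5, 0.3]])

def simulate_dtmc(P, s0, T):
    """Sample a trajectory s_0, ..., s_T; step t+1 is drawn from row s_t of P."""
    trajectory = [s0]
    for _ in range(T):
        trajectory.append(rng.choice(len(P), p=P[trajectory[-1]]))
    return trajectory

print(simulate_dtmc(P, s0=0, T=10))
```

Note that only the last state in `trajectory` is used to draw the next one, which is exactly the memoryless property.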


The continuous-time equivalent builds on top of the discrete version: where defined timeslots exist in the discrete version, the continuous-time version takes the limit over the timeslot interval, so quantities are evaluated as integrals over time.

A stochastic process $X(t)$ is a continuous-time Markov chain (CTMC) if it:

  1. Has continuous timesteps $t \in \mathbb{R}$.

  2. Has state space $X(t) \in \mathcal{S}$, with $\mathcal{S}$ being a countable set ($|\mathcal{S}|$ either finite or infinite).

  3. Holds the Markov property, $$\Pr(X(t+s) \mid X(u),\, u \leq s) = \Pr(X(t+s) \mid X(s))$$

Meaning that the conditional probability depends only on the current state $X(s)$.

Assumption 1
Non-explosiveness: for any finite time interval $\delta > 0$, the chain makes only a finite number of transitions.

Time-homogeneous CTMC: if the transition probabilities $\Pr(X(t+s) \mid X(s))$ are independent of the time $s$, then the CTMC is time-homogeneous.

Transitions in a CTMC are defined as jumps, with $Y(k)$ being the state after $k$ jumps. The time interval between the $(k-1)^{\text{th}}$ and $k^{\text{th}}$ jumps is defined as $T_k$. $T_k$ is an exponentially distributed random variable that depends only on the state $Y(k-1)$. We define the remaining time spent in state $i$ at time $t$ as $\gamma_i(t)$, $$\gamma_i(t) := \inf\{s > 0 : X(t+s) \neq X(t) \text{ and } X(t) = i\}$$

$\gamma_i(t)$ is an exponentially distributed random variable if the CTMC is time-homogeneous. Denote by $\frac{1}{q_i}$ the mean time spent in state $i \in \mathcal{S}$.
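A time-homogeneous CTMC can be simulated through its jump chain: hold in state $i$ for an $\text{Exp}(q_i)$ time, then jump according to the embedded transition probabilities. A minimal sketch, with hypothetical rates $q$ and jump matrix $J$ (where $J_{ij} = \Pr(Y(k) = j \mid Y(k-1) = i)$ and $J_{ii} = 0$):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

q = np.array([1.0, 2.0, 0.5])   # q_i: rate of leaving state i (mean holding time 1/q_i)
J = np.array([[0.0, 0.6, 0.4],  # embedded jump chain; J_ii = 0
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

def simulate_ctmc(q, J, x0, t_end):
    """Return the jump times and states of the chain up to time t_end."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        t += rng.exponential(1.0 / q[x])  # holding time in x is Exp(q_x)
        if t > t_end:
            break
        x = rng.choice(len(q), p=J[x])    # jump to the next state
        times.append(t)
        states.append(x)
    return times, states
```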

Theorem 1
A CTMC with finite state space $\mathcal{S}$ that is irreducible has a stationary distribution $\pi$, and $\lim\limits_{t \rightarrow \infty} p(t) = \pi$ for every initial distribution $p(0)$ (where $p(t)$ is the distribution over states at time $t$). This stationary distribution is unique.

The irreducibility condition alone is not enough to ensure a stationary distribution for infinite state spaces, although a stationary distribution may still exist in that case.
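For a finite chain, the stationary distribution can be computed from the transition rate matrix $Q$ (used in Theorem 2 below), which satisfies $\pi Q = 0$ with $\sum_i \pi_i = 1$. A minimal sketch, using the rate matrix induced by the hypothetical $q$ and $J$ above ($Q_{ij} = q_i J_{ij}$ for $i \neq j$, $Q_{ii} = -q_i$):

```python
import numpy as np

# Hypothetical rate matrix: off-diagonal entries >= 0, rows sum to zero.
Q = np.array([[-1.0,  0.6,  0.4 ],
              [ 1.0, -2.0,  1.0 ],
              [ 0.15, 0.35, -0.5]])

# Solve pi @ Q = 0 subject to sum(pi) = 1 by appending the
# normalization constraint and solving in the least-squares sense.
A = np.vstack([Q.T, np.ones(len(Q))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi, pi @ Q)  # pi @ Q should be numerically ~0
```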

In a CTMC, the states can be categorized as recurrent or transient (same as in DTMCs), but using different time intervals. A state $i$ is recurrent if, $$\lim\limits_{T \rightarrow \infty} \Pr\{\tau_i < T\} = 1$$

With the intervals $\tau_i$ and $\gamma_i$ for state $i$ defined as, $$\tau_i := \inf\{t > \gamma_i : X(t) = i \text{ and } X(0) = i\}$$

$$\gamma_i := \inf\{t > 0 : X(t) \neq i \text{ and } X(0) = i\}$$

If the above condition is not satisfied, then the state $i$ is transient.
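A minimal Monte Carlo sketch of this recurrence condition, reusing the hypothetical 3-state CTMC from above; since that chain is finite and irreducible, the estimate of $\Pr\{\tau_i < T\}$ approaches 1 as $T$ grows:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

q = np.array([1.0, 2.0, 0.5])
J = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])

def sample_tau(i):
    """Sample tau_i: the first return time to i for a chain with X(0) = i."""
    t = rng.exponential(1.0 / q[i])  # gamma_i: time of the first jump out of i
    x = rng.choice(len(q), p=J[i])
    while x != i:                    # wander until the chain jumps back into i
        t += rng.exponential(1.0 / q[x])
        x = rng.choice(len(q), p=J[x])
    return t

taus = np.array([sample_tau(0) for _ in range(10_000)])
for T in (1.0, 5.0, 25.0):
    print(f"Pr{{tau_0 < {T}}} ~ {(taus < T).mean():.3f}")
```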

With the same goal as in DTMCs, namely proving positive recurrence for a Markov chain, the Foster-Lyapunov theorem can be extended to the continuous-time domain. This provides another sufficient condition for positive recurrence.

Theorem 2
For an irreducible, non-explosive CTMC with transition rate matrix $Q$, if there exist a function $V : \mathcal{S} \rightarrow \mathbb{R}^{+}$, a finite set $\beta \subseteq \mathcal{S}$, and constants $\epsilon > 0$ and $M < \infty$ such that:

  1. $\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq -\epsilon$ if $i \in \beta^{c}$,

  2. $\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq M$ if $i \in \beta$,

then the CTMC is positive recurrent.
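A minimal numerical check of these drift conditions, using the hypothetical rate matrix from above together with an arbitrary choice of $V$ and $\beta$ (both are illustrative, not canonical):

```python
import numpy as np

# Same hypothetical rate matrix as in the stationary-distribution sketch.
Q = np.array([[-1.0,  0.6,  0.4 ],
              [ 1.0, -2.0,  1.0 ],
              [ 0.15, 0.35, -0.5]])

V = np.array([0.0, 1.0, 2.0])  # hypothetical Lyapunov function V(i) = i
beta = {0, 1}                  # hypothetical finite set beta

for i in range(len(Q)):
    # Drift at state i: sum over j != i of Q_ij * (V(j) - V(i)).
    drift = sum(Q[i, j] * (V[j] - V[i]) for j in range(len(Q)) if j != i)
    label = "in beta: needs <= M" if i in beta else "outside beta: needs <= -eps"
    print(f"state {i}: drift = {drift:+.2f} ({label})")

# Drifts are +1.40, +0.00, -0.65, so both conditions hold
# with, e.g., M = 1.4 and eps = 0.65.
```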


  1. Deep Q-Network (DQN) paper link ↩︎