R code

library("tidyverse"); theme_set(theme_bw())
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
set.seed(20230307)

Exponential Distribution

The exponential distribution is commonly used as a waiting time distribution, i.e. the time until something happens.

Probability density function

A random variable \(X\) has an exponential distribution if its density function is \[ f(x) = \lambda e^{-\lambda x} \quad \mbox{for }x>0 \] for rate \(\lambda>0\).

r <- 3 # rate

ggplot(data.frame(x = 0:(4/r))) + 
  stat_function(fun = dexp, args = list(rate = r)) 

The probability density function is strictly decreasing for any rate.

Note: Some parameterizations of the exponential use the mean rather than the rate as the parameter.

Cumulative distribution function

The cumulative distribution function is \[ F(x) = P(X \le x) = 1-e^{-\lambda x} \quad \mbox{for }x > 0. \]

ggplot(data.frame(x = 0:(4/r))) + 
  stat_function(fun = pexp, args = list(rate = r)) 

Memoryless property

The probability of being greater than some value has a simple form for the exponential distribution. \[ P(X > x) = 1-P(X\le x) = e^{-\lambda x}. \]

The exponential distribution has a property called the memoryless property that states \[ P(X > x+c | X > c) = P(X > x) = e^{-\lambda x} \]

For example,

c <- 1

# Check probability 
1-pexp(c, rate = r) # if this probability is too small, the simulation will take a while
## [1] 0.04978707
# Simulate exponential random variables that are greater than c
x <- rexp(1e4, rate = r)
for (i in seq_along(x)) {
  while (x[i] < c)
    x[i] <- rexp(1, rate = r)
}
ggplot(
  data.frame(x=x), 
  aes(x-c)         # performs the shift
) +
  stat_ecdf() +
  stat_function(fun = pexp, args = list(rate = r), color = "red") 

Minimum exponential

Let \[ X_i \stackrel{ind}{\sim} Exp(\lambda_i) \] what is the distribution for \(T = \min\{X_1,\ldots,X_n\}\)?

It turns out that \[ T \sim Exp\left(\sum_{i=1}^n \lambda_i\right). \]

If you are only told that \(T=t\) which \(X_i=t\)?
That is, which exponential was the minimum one?

We won’t know, but the probability is proportional to the rate. So, we know \[ P(X_i = t) \propto \lambda_i = \frac{\lambda_i}{\sum_{j=1}^n \lambda_j}. \]

Continuous time Markov process

These types of models are used in a wide variety of fields. Here we will motivate the module using a chemistry experiment called Michaelis-Menton kinetics. This experiment creates a product “P” from a substrate “S” in a reaction that is catalyzed by an enzyme “E”. In order to convert the substrate to the product, the enzyme must bind to the substrate to create the enzyme-substrate complex “ES”.

This system can be represented by this system of reactions \[ E + S \leftrightarrows ES \to P. \] This system of reactions actually has 3 reactions: \[ \begin{array}{rll} I:& E + S &\stackrel{\lambda_1}{\to} ES \\ II:& ES &\stackrel{\lambda_2}{\to} E + S \\ III:& ES &\stackrel{\lambda_3}{\to} E + P \\ \end{array} \] The \(\lambda\) indicate the rate of the reaction.

In this system, reaction I requires that an enzyme molecule come into contact with a substrate molecule.

If these molecules are in a volume, then the system might look like this

set.seed(20230309)
n <- 30

d <- data.frame(
  x = runif(n),
  y = runif(n),
  species = sample(c("E","S","ES","P"), size = n, replace = TRUE, 
                   prob = c(0.2,0.6,0.1,0.1)),
  speed = runif(n,0.05, 0.1),
  direction = runif(n, max = 2*pi)
) %>%
  mutate(
    xend = x + speed*cos(direction),
    yend = y + speed*sin(direction),
    species = factor(species, levels = c("E","S","ES","P"))
  )



ggplot(d, aes(x = x, y = y, xend = xend, yend=yend,
              color = species, shape = species)) +
  geom_point(size = 5) +
  geom_segment(arrow = arrow(length = unit(0.2, "cm")), color = "black") +
  scale_color_manual(
    values = c(
      "E" = "red",
      "S" = "blue",
      "ES" = "purple",
      "P" = "green"
    )
  ) +
  scale_shape_manual(
    values = c(
      "E" = 4,
      "S" = 0,
      "ES" = 7,
      "P" = 19
    )
  )

These reactions can be coded by indicting two matrices: Pre and Post. These matrices indicate the reactants (Pre) and the products (Post).

Pre <- rbind(
#   E  S ES  P
  c(1, 1, 0, 0),
  c(0, 0, 1, 0),
  c(0, 0, 1, 0)
)

rownames(Pre) <- c("I","II","III")
colnames(Pre) <- c("E","S","ES","P")
Pre
##     E S ES P
## I   1 1  0 0
## II  0 0  1 0
## III 0 0  1 0
Post <- rbind(
#   E  S ES  P
  c(0, 0, 1, 0),
  c(1, 1, 0, 0),
  c(1, 0, 0, 1)
)

rownames(Post) <- c("I","II","III")
colnames(Post) <- c("E","S","ES","P")
Post
##     E S ES P
## I   0 0  1 0
## II  1 1  0 0
## III 1 0  0 1

The reaction matrix is constructed by subtracting these and the stoichiometry matrix by transposing the reaction matrix.

A <- Post - Pre # reaction
S <- t(A)       # stoichiometry

A
##      E  S ES P
## I   -1 -1  1 0
## II   1  1 -1 0
## III  1  0 -1 1
S
##     I II III
## E  -1  1   1
## S  -1  1   0
## ES  1 -1  -1
## P   0  0   1

We can update the state of the system by either the stoichiometry or reaction matrix.

x0 <- c(5, 100, 1, 0)
x0 + S[,1]
##  E  S ES  P 
##  4 99  2  0
x0 + S[,2]
##   E   S  ES   P 
##   6 101   0   0
x0 + S[,3]
##   E   S  ES   P 
##   6 100   0   1

To determine which reaction occurs we need to calculate the propensity of the reaction. The propensity is the product of two components: the reaction rate and the number of combinations of the Pre molecules.

For this model we have the following propensities: \[ \begin{array}{rll} I:& \lambda_1 E \cdot S \\ II:& \lambda_2 ES \\ III:& \lambda_3 ES \\ \end{array} \]

Simulation

To simulate this continuous time Markov process, we iterate through these steps

  1. Simulate a time increment which has an exponential distribution with rate equal to the sum of the propensities.
  2. Simulate which reaction occurred which has a discrete distribution with probability proportional to the propensity.
  3. Update time according to the time increment.
  4. Update the state according to the reaction/stoichiometry matrix

R implementation

michaelis_menton_reaction_matrix <- rbind(
  #  E   S  ES P
  c(-1, -1,  1, 0),
  c( 1,  1, -1, 0),
  c( 1,  0, -1, 1)
)

#' Function to simulate a single Michaelis-Menton transition
#' 
#' This function will simulate a single predator-prety transition based on the 
#' current `state` of the system and the reaction `rates`
simulate_michaelis_menton_transition <- function(state, rate, initial_time) {
  propensity <- c(
    rate[1] * state[1] * state[2],
    rate[2] * state[3] ,
    rate[3] * state[3]
  )
  
  time_increment <- rexp(1, rate = sum(propensity))
  reaction       <- sample(1:3, 1, prob = propensity)
  
  state <- state + michaelis_menton_reaction_matrix[reaction,]
  
  return(
    list(
      time  = initial_time + time_increment,
      state = state
    ))
}
simulate_michaelis_menton_system <- function(initial_state, rate, max_rxns = 1e3) {
  # Create storage structures for state and time
  n_possible_rxns <- max_rxns + 1
  state <- matrix(initial_state, nrow = 4, ncol = n_possible_rxns)
  n_rxns <- 1
  time   <- rep(0, n_possible_rxns) 
  
  # Simulate outbreak
  while(any(state[2:3,n_rxns] > 0) & n_rxns <= max_rxns) {
    tmp <- simulate_michaelis_menton_transition(state[,n_rxns], rate, time[n_rxns])
    n_rxns         <- n_rxns + 1
    time[n_rxns]   <- tmp$time
    state[,n_rxns] <- tmp$state
  }
  
  d <- data.frame(
    E  = state[1, ],
    S  = state[2, ],
    ES = state[3, ],
    P  = state[4, ],
    time = time
  ) %>%
    pivot_longer(-time, names_to = "state", values_to = "count") %>%
    mutate(
      state = factor(state, levels = c("E","S","ES","P"))
    )
}

This

set.seed(1)
d <- simulate_michaelis_menton_system(c(5, 100, 0, 0), rate = c(1,1,.1), max_rxns = 1e3)

ggplot(d, aes(x = time, y = count, color = state)) + 
  geom_step()

This model is a continuous time Markov process because

  1. Time can be incremented by any amount (rather than a discrete amount).
  2. The state of the system is updated based on the current state (as opposed to past history).

Examples

SIR Model

The SIR model has three states:

  • [S]usceptible
  • [I]nfectious
  • [R]ecovered

There are two reactions that can occur \[ \begin{array}{r\quad{}ll} I:&S + I &\stackrel{\lambda}{\to} 2I \\ II:&I &\stackrel{\rho}{\to} R \end{array} \]

The propensity of these reactions are \[ \begin{array}{r\quad{}ll} I:& \lambda \, S\cdot I / N \\ II:& \rho \, I \end{array} \]

The system is updated by the reaction matrix

\[ \begin{array}{r|rrr} & S & I & R \\ \hline I & -1 & 1 & 0 \\ II & 0 & -1 & 1\\ \end{array} \]

Due to the two reactions converting a single individual from one state to another, this model has a constant population size.

R implementation

sir_reaction_matrix <- rbind(
  #  S   I  R
  c(-1,  1, 0),
  c( 0, -1, 1)
)

#' Function to simulate a single SIR transition
#' 
#' This function will simulate a single SIR transition based on the current 
#' `state` of the system and the reaction `rates`
simulate_sir_transition <- function(state, rate, initial_time) {
  propensity <- c(
    rate[1] * state[1] * state[2] / sum(state),
    rate[2] * state[2]
  )
  
  time_increment <- rexp(1, rate = sum(propensity))
  reaction       <- sample(1:2, 1, prob = propensity)
  
  # Update state according to stoichiometry
  state <- state + sir_reaction_matrix[reaction,]
  
  return(
    list(
      time  = initial_time + time_increment,
      state = state
    ))
}
simulate_outbreak <- function(initial_state, rate, max_rxns = 5000) {
  # Create storage structures for state and time
  n_possible_rxns <- min(2*initial_state[1] + initial_state[2]+1, max_rxns + 1)
  state <- matrix(initial_state, nrow = 3, ncol = n_possible_rxns)
  n_rxns <- 1
  time   <- rep(0, n_possible_rxns) 
  
  # Simulate outbreak
  while(state[2,n_rxns] > 0 & n_rxns <= max_rxns ) {
    tmp <- simulate_sir_transition(state[,n_rxns], rate, time[n_rxns])
    n_rxns         <- n_rxns + 1
    time[n_rxns]   <- tmp$time
    state[,n_rxns] <- tmp$state
  }
  
  # Clean up data frames
  n <- which(state[2,] == 0)
  
  d <- data.frame(
    S = state[1, 1:n],
    I = state[2, 1:n],
    R = state[3, 1:n],
    time = time[1:n]
  ) %>%
    pivot_longer(-time, names_to = "state", values_to = "count") %>%
    mutate(
      state = factor(state, levels = c("S","I","R"))
    )
}

Sometimes the outbreak will die out right away

set.seed(1)
d <- simulate_outbreak(c(1000, 1, 0), rate = c(2,1))

ggplot(d, aes(x = time, y = count, color = state)) + 
  geom_step()

Sometimes the outbreak will last a while

set.seed(4)
d <- simulate_outbreak(c(1000, 1, 0), rate = c(2,1))

ggplot(d, aes(x = time, y = count, color = state)) + 
  geom_step()

Scientific questions

One question we might ask is: what is the probability the outbreak will die out quickly (before time 5) for a given initial state and reaction rates.

To answer this question, we can run a Monte Carlo study

X0 <- c(1000, 2, 0)
rate <- c(2,1)

die_out <- replicate(1e3, {
  d <- simulate_outbreak(X0, rate)
  return(max(d$time) < 5)
})

mean(die_out)
## [1] 0.258
binom.test(sum(die_out), length(die_out))$conf.int
## [1] 0.2311271 0.2863056
## attr(,"conf.level")
## [1] 0.95

Another question could be, given the outbreak lasts longer than 5, what is the expected number of people who will be infected?

X0 <- c(1000, 1, 0)
rate <- c(2,1)

infected <- replicate(1e3, {
  d <- simulate_outbreak(X0, rate)
  while(max(d$time) < 5)
    d <- simulate_outbreak(X0, rate)
  return(max(d$count[d$state == "R"]))
})

t.test(infected)$conf.int
## [1] 791.6672 797.3448
## attr(,"conf.level")
## [1] 0.95

Lotka-Volterra Model

A simple predator-prey model (stochastic version of the Lotka-Volterra Model) has two states:

  • [R]abbits
  • [F]oxes

There are two reactions that can occur \[ \begin{array}{r\quad{}ll} I:&R &\stackrel{\lambda}{\to} 2R \\ II:&F+R &\stackrel{\rho}{\to} 2F \\ II:&F &\stackrel{\delta}{\to} \emptyset \\ \end{array} \] The first reaction simply

The propensity of these reactions are \[ \begin{array}{r\quad{}ll} I:& \lambda \, R \\ II:& \rho \, F\cdot R\\ II:& \delta \, F \end{array} \]

The system is updated by the reaction matrix

\[ \begin{array}{r|rrr} & R & F \\ \hline I & 1 & 0 \\ II & -1 & 1 \\ III & 0 & -1 \end{array} \]

This model does not have a constant population size.

R implementation

predator_prey_reaction_matrix <- rbind(
  #  R   F
  c( 1,  0),
  c(-1,  1),
  c( 0, -1)
)

#' Function to simulate a single predator-prey transition
#' 
#' This function will simulate a single predator-prety transition based on the 
#' current `state` of the system and the reaction `rates`
simulate_transition <- function(state, rate, initial_time) {
  propensity <- c(
    rate[1] * state[1],
    rate[2] * state[1] * state[2],
    rate[3] * state[2]
  )
  
  time_increment <- rexp(1, rate = sum(propensity))
  reaction       <- sample(1:3, 1, prob = propensity)
  
  # Update state according to reaction matrix
  state <- state + predator_prey_reaction_matrix[reaction,]
  
  return(
    list(
      time  = initial_time + time_increment,
      state = state
    ))
}
simulate_predator_prey_system <- function(initial_state, rate, max_rxns = 1e4) {
  # Create storage structures for state and time
  n_possible_rxns <- max_rxns + 1
  state <- matrix(initial_state, nrow = 2, ncol = n_possible_rxns)
  n_rxns <- 1
  time   <- rep(0, n_possible_rxns) 
  
  # Simulate outbreak
  while(state[2,n_rxns] > 0 & n_rxns <= max_rxns ) {
    tmp <- simulate_transition(state[,n_rxns], rate, time[n_rxns])
    n_rxns         <- n_rxns + 1
    time[n_rxns]   <- tmp$time
    state[,n_rxns] <- tmp$state
  }
  
  d <- data.frame(
    R = state[1, ],
    F = state[2, ],
    time = time
  ) %>%
    pivot_longer(-time, names_to = "state", values_to = "count") %>%
    mutate(
      state = factor(state, levels = c("R","F"))
    )
}

Sometimes the foxes will eat all the rabbits

set.seed(1)
d <- simulate_predator_prey_system(c(20,20), rate = c(1,.01,1), max_rxns = 1e5)

ggplot(d, aes(x = time, y = count, color = state)) + 
  geom_step()