In Lecture 4 we were concerned with models for disease transmission which incorporate households. Although we looked at two distinct cases with regard to what sort of data are available, in this lab session we will focus on the case where only final outcome data are available. In other words, the available data consist only of the final number of cases in each household.
The main objective of this lab session is to draw Bayesian inference for the model parameters by implementing a Markov chain Monte Carlo (MCMC) algorithm. Below is a brief description of the model (see the lecture notes for more details).
We assume that a population of \(N\) individuals is partitioned into households. Each individual in the population has, independently, a constant "risk" per unit time of becoming infected from the community. In addition, within a household, the disease spreads according to the mechanism of an SIR model.
Suppose that the data consist of the set of numbers \({\bf n} = \{n(j,k)\}\) where \(n(j,k)\) is the number of households in which \(j\) out of \(k\) initial susceptibles become infected. Therefore, the likelihood takes the form \[\pi({\bf n}|\alpha, p) = \prod_{j}\prod_{k}q(j,k)^{n(j,k)}\] where \(q(j,k)\) is the probability that in a household of size \(k\) the number of individuals who ever become infected is \(j\).
In the lecture we derived an expression for the probability \(q(j,k)\) and showed that it depends on two parameters: the within-household infection rate \(\alpha\) and the probability \(p\) that an individual avoids infection from the community.
Download the file households.R from the module's website. The file contains two functions:

compute.conditional.prob, which computes the probability that \(j\) out of \(m\) initial susceptibles become infected in an SIR model with a constant infectious period of length \(c\) and \(a\) initial infectives.

# This function computes the final size probability, i.e. the probability
# that j out of m initial susceptibles ever become infected, given a initial
# infectives.
# m     : number of initial susceptibles
# j     : number of susceptibles who ever become infected
# a     : number of initial infectives
# alpha : person-to-person infection rate
# c     : length of the (fixed) infectious period
compute.conditional.prob <- function(m, j, a, alpha, c) {
  if (j == 0) {
    res <- (exp(-alpha*m*c))^a
  }
  else {
    part.one <- exp(-alpha*(m-j)*c)^(j+a)
    part.two <- choose(m, j)
    sum <- 0
    for (k in 0:(j-1)) {
      sum <- sum + choose(m-k, j-k)*compute.conditional.prob(m, k, a, alpha, c)/(exp(-alpha*(m-j)*c)^(k+a))
    }
    res <- part.one*(part.two - sum)
  }
  return(res)
}
compute.marginal.prob, which computes the probability that in a household of size \(n\) the number of individuals who ever become infected is \(k\).

# We wish to compute P(T = k), for k = 0, ..., n.
# k     : number of individuals who ever become infected
# n     : size of the household
# alpha : person-to-person infection rate
# c     : length of the (fixed) infectious period
# p     : probability of avoiding infection from the community
compute.marginal.prob <- function(k, n, c, alpha, p) {
  prob <- 0
  for (y in 0:n) {
    if (k >= y) {
      cond.prob <- compute.conditional.prob(n - y, k - y, y, alpha, c)
    }
    else {
      cond.prob <- 0
    }
    prob <- prob + cond.prob*dbinom(y, n, 1 - p)
  }
  prob
}
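As a quick sanity check, the marginal final-size probabilities for a household must sum to one over \(k = 0, \dots, n\). A short sketch, assuming the two functions above have been loaded into your workspace (the parameter values below are arbitrary illustrative choices):

```r
# The final-size probabilities for a household of size 3 should sum to 1
# (the values of alpha, c and p here are arbitrary)
probs <- sapply(0:3, compute.marginal.prob, n = 3, c = 1, alpha = 0.5, p = 0.8)
sum(probs)  # should equal 1, up to floating-point error
```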
In addition, Table 1 presents some Asian influenza epidemic household data taken from Longini & Koopman (1982). There is a community which consists of 42 households, each of size 3.
cases <- c(0,1,2,3)
households <- c(29,9,2,2)
mat <- matrix(NA, nrow=4, ncol=2)
mat[,1] <- cases
mat[,2] <- households
mat <- as.data.frame(mat)
colnames(mat) <- c("No.of.Cases", "No.of.Households")
mat
## No.of.Cases No.of.Households
## 1 0 29
## 2 1 9
## 3 2 2
## 4 3 2
Go through each of the functions in the file households.R and make sure that you understand what is going on.
Write a function in R which computes the log-likelihood of the data given the parameters \(p\) and \(\alpha\) assuming that the infectious period is constant and of length one unit.
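One possible sketch of such a function, assuming compute.marginal.prob from households.R has been sourced and hard-coding the Table 1 data (each household has \(n = 3\) initial susceptibles and \(c = 1\)):

```r
# Log-likelihood of the Table 1 household data as a function of alpha and p.
# Assumes compute.marginal.prob() is available, e.g. via source("households.R").
log.likelihood <- function(alpha, p) {
  cases      <- c(0, 1, 2, 3)   # number of cases j
  households <- c(29, 9, 2, 2)  # number of households with j cases
  # q[j + 1] = probability of j cases in a household of size 3, with c = 1
  q <- sapply(cases, compute.marginal.prob, n = 3, c = 1, alpha = alpha, p = p)
  sum(households * log(q))
}
```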
Suppose that we are interested in drawing Bayesian inference for the model parameters; we can therefore employ an MCMC algorithm to draw samples from the posterior distribution: \[\pi(p, \alpha|{\bf n}) \propto \pi({\bf n}|\alpha, p) \times \pi(p) \times \pi (\alpha).\] Assume that a priori \[p \sim \mbox{Beta}(\mu_p, \nu_p)\] and \[\alpha \sim \mbox{Gamma}(\mu_{\alpha}, {\nu_{\alpha}}).\]
What values should we choose for the hyper-parameters \(\mu_p, \nu_p, \mu_{\alpha}\) and \(\nu_{\alpha}\) such that we end up with fairly uninformative priors for the model parameters \(p\) and \(\alpha\)?
Write a function in R which draws samples from \(\pi(p, \alpha|{\bf n})\) using MCMC and by adopting uninformative priors for the parameters of interest \(p\) and \(\alpha\).
Hint: In the lecture we discussed two different ways to update the parameter \(p\): either using an independence sampler or a random walk Metropolis algorithm. For the purposes of this exercise, update \(p\) with an independence sampler.
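A minimal sketch of such an algorithm. This is an illustration only: it assumes compute.marginal.prob has been sourced from households.R, hard-codes the Table 1 data, and adopts a uniform Beta(1, 1) prior for \(p\), a vague Gamma(1, 0.001) (shape, rate) prior for \(\alpha\), and a Gaussian random walk for \(\alpha\) whose step size sigma.alpha is a tuning constant you should adjust.

```r
run.mcmc <- function(n.iter, sigma.alpha = 0.5) {
  loglik <- function(alpha, p) {
    q <- sapply(0:3, compute.marginal.prob, n = 3, c = 1, alpha = alpha, p = p)
    sum(c(29, 9, 2, 2) * log(q))
  }
  out <- matrix(NA, nrow = n.iter, ncol = 2,
                dimnames = list(NULL, c("alpha", "p")))
  alpha <- 0.5; p <- 0.5           # arbitrary starting values
  ll <- loglik(alpha, p)
  for (i in 1:n.iter) {
    # Update p with an independence sampler, proposing from the Beta(1, 1)
    # prior: prior and proposal densities cancel, leaving the likelihood ratio.
    p.new <- rbeta(1, 1, 1)
    ll.new <- loglik(alpha, p.new)
    if (log(runif(1)) < ll.new - ll) {
      p <- p.new; ll <- ll.new
    }
    # Update alpha with a Gaussian random walk (proposals <= 0 are rejected;
    # the symmetric proposal cancels, leaving likelihood times prior).
    alpha.new <- rnorm(1, alpha, sigma.alpha)
    if (alpha.new > 0) {
      ll.new <- loglik(alpha.new, p)
      log.ratio <- ll.new - ll +
        dgamma(alpha.new, 1, rate = 0.001, log = TRUE) -
        dgamma(alpha, 1, rate = 0.001, log = TRUE)
      if (log(runif(1)) < log.ratio) {
        alpha <- alpha.new; ll <- ll.new
      }
    }
    out[i, ] <- c(alpha, p)
  }
  out
}
```

A run such as samples <- run.mcmc(10000) then gives a matrix of posterior draws, one row per iteration, which you can inspect with trace plots before discarding a burn-in period.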
Investigate how correlated the posterior samples of \(\alpha\) and \(p\) are.
Draw samples from the posterior distribution of the probability that a susceptible individual avoids infection from one infected household member.
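With a constant infectious period of length \(c\), a susceptible avoids infection by a single infected household member with probability \(\exp(-\alpha c)\); here \(c = 1\), so it suffices to transform the posterior draws of \(\alpha\). A sketch, where alpha.samples is a hypothetical name for the vector holding your posterior draws of \(\alpha\):

```r
# Probability of avoiding infection from one infected housemate is
# exp(-alpha * c) with c = 1; alpha.samples holds the posterior draws of alpha
avoid.samples <- exp(-alpha.samples)
hist(avoid.samples, main = "",
     xlab = "P(avoid infection from one infected household member)")
```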
In the current version of your MCMC algorithm the parameter \(p\) is updated via an independence sampler. Modify your code such that \(p\) is now updated using a random walk Metropolis. Does the mixing of your algorithm improve?
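One way to sketch the change, replacing the \(p\)-step inside your main loop. Here sigma.p is a tuning constant you will need to choose, ll holds the current log-likelihood, and loglik(alpha, p) is a hypothetical name for your log-likelihood function:

```r
# Random walk Metropolis update for p: proposals outside (0, 1) have prior
# density zero and are rejected outright. With a Beta(1, 1) prior the prior
# ratio equals 1, and the Gaussian proposal is symmetric, so only the
# likelihood ratio remains in the acceptance probability.
p.new <- rnorm(1, p, sigma.p)
if (p.new > 0 && p.new < 1) {
  ll.new <- loglik(alpha, p.new)
  if (log(runif(1)) < ll.new - ll) {
    p <- p.new; ll <- ll.new
  }
}
```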