This post looks at implementing estimating a Dirichlet Process mixture via a finite approximation to the stick-breaking construction of a DP. There is a directly analogy to finite mixture models. The model here is (I apologize for the notation…eventually I will figure out how to incorporate LaTeX):

• y_i ~ N(mu_i, sigma_i^2 )
• mu_i, sigma_i^2 ~ P
• P ~ DP(a P0)

where a is the concentration parameter of the Dirichlet process and P0 is the base measure. In this case, the base measure is a Normal-Inverse Gamma(m0,t0, a, b). The stick-breaking construction allows for an alternative representation of the model via

• y_i ~ N(\mu_i, sigma_i^2)
• mu_i, \sigma_i^2 ~ \sum_{h=1}^\infty \pi_h \delta_{mu_h^* , sigma_h^{2*}}
• pi ~ stick-breaking(a)
• mu_h^* , sigma_h^{2*} ~ Normal-Inverse Gamma(m0,t0, a, b)

One way to estimate this model is to use a finite approximation (up to H) to the infinite integral in this stick-breaking construction.

The original version of this code is from here and modified by me. I’m sure any mistakes are my own.

The first part of this code just has the function to build the stick-breaking construction.

Next we simulate the data use the mixture of normals simulator from the ‘mixtools’ package.

Now we set up the prior, initial values for the MCMC, and objects to record the results.

The MCMC involves iterating through the following steps:

• Sample cluster indicators S
• Sampling mixture probabilities pi and mixture parameters mu and sigma^2.

This is analogous to finite mixture models except in the updating of the mixture probabilities we need to use the stick-breaking construction.

Without further ado, we run the MCMC>

Next we calculate some posterior summaries and compare to the truth. We only used point estimates here, but we could have easily provided uncertainty estimates as well.

We can also estimate the unknown density of the data via estimating the approximation on a finite grid of y values. This will be used below to compare to the truth as well as to the ‘DPdensity’ function from the DPpackage.

Here is the same analysis using the DPpackage.

Now we can compare our fit and the DPdensity fit to the truth (as well as the data).