A non-stats colleague asked me yesterday what happens to an MCMC chain when the posterior is multimodal. I believe their mindset is that convergence happens to a point, since this is the way many algorithms work, e.g. hill-climbing algorithms. MCMC chains, as typically used in Bayesian analysis, don't converge to a point; rather, they converge to a distribution. So an MCMC chain will, given enough iterations, explore the entire posterior, which includes all modes of that posterior.
Take a simple example where the posterior is an equal-weighted mixture of two normal distributions, both with variance 1. The means of these distributions are plus and minus some known constant; in the example below I used 3. If the constant is sufficiently large (greater than 1 for this mixture), then the distribution is bimodal and an MCMC algorithm will alternate (although not every iteration) between the two modes as it samples. Below I implemented a random-walk Metropolis algorithm that samples from this distribution. From the partial traceplot it is clear that this algorithm gets stuck in each mode for a few iterations before making its way to the other mode.
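For readers who want to try this, here is a minimal sketch of such a sampler in Python, using only the standard library. The function names `log_target` and `rw_metropolis`, the proposal standard deviation of 2, the starting value of 0, and the seed are all my own illustrative choices, not anything fixed by the problem. The only things taken from the example above are the mixture itself and the constant 3.

```python
import math
import random


def log_target(x, c=3.0):
    """Log density, up to an additive constant, of 0.5*N(-c, 1) + 0.5*N(c, 1)."""
    a = -0.5 * (x - c) ** 2
    b = -0.5 * (x + c) ** 2
    m = max(a, b)  # log-sum-exp trick to avoid underflow far from both modes
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rw_metropolis(n_iter, c=3.0, step=2.0, x0=0.0, seed=1):
    """Random-walk Metropolis with a normal proposal (step = proposal sd, an assumed tuning value)."""
    rng = random.Random(seed)
    x = x0
    lp = log_target(x, c)
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        lp_proposal = log_target(proposal, c)
        # Accept with probability min(1, target(proposal) / target(x));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_proposal - lp:
            x, lp = proposal, lp_proposal
        draws.append(x)  # a rejected proposal repeats the current value
    return draws


draws = rw_metropolis(50_000)

# Crude summaries: share of draws near the positive mode, and how often the
# chain crosses from one side of zero to the other.
frac_positive = sum(d > 0 for d in draws) / len(draws)
switches = sum((a > 0) != (b > 0) for a, b in zip(draws, draws[1:]))
mean_abs = sum(abs(d) for d in draws) / len(draws)

print(f"fraction of draws above zero: {frac_positive:.3f}")
print(f"sign changes between consecutive draws: {switches}")
print(f"mean of |x|: {mean_abs:.3f}")
```

If the chain is visiting both modes, the fraction above zero should be somewhere near one half and the mean of |x| should be close to 3. Shrinking `step` makes the crossings rarer, which is one way to see how a poorly tuned sampler can look stuck in a single mode.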
Of course, problems can get much more complicated, and more sophisticated algorithms, e.g. simulated annealing and parallel tempering (see chapter 10 of this book for recent developments), are necessary for exploring these posteriors.
Edit: Trying out gist.github.com. How can I decrease the font size?