I recently finished grading the final projects for my PSTAT 262MC course _Applied Bayesian Time Series_. What the students had in common was a desire to make their priors as non-informative as possible. Given the students' limited exposure to Bayesian methodology (about half had taken a 10-week introduction to Bayesian methods), I guess I shouldn't be surprised. But I realize that I didn't make my views on the topic clear and convincing. This post by Andrew Gelman is exactly how I think about priors. Here is the quote:
1. Siegfried describes prior probability as "an informed guess about the expected probability of something in advance of the study." He immediately qualifies this: "Often this prior probability is more than a mere guess -- it could be based, for instance, on previous studies." Still, I disagree with his first sentence. I agree that sometimes--often!--a prior distribution is not constructed using previous studies. But when it's not, I'd call it a model or an assumption, not a guess. Why does this matter? Mere semantics? Not quite. I put the prior distribution on the same philosophical dimension as the likelihood [emphasis added by me]. I have no problem with you calling my prior distribution an "informed guess" if you'll also describe your normal distribution or your logistic regression as "informed guesses." My point: the prior distribution, and also the likelihood (in most cases) are assumptions, they're mathematical models, not really "guesses" at the truth so much as useful approximations to the truth. Or, more to the point, approximations to the truth that give useful inferences for quantities of interest.
To take an example from class, suppose you have a scalar outcome variable and a scalar covariate, with observations taken over time. One possibility is to model the data using simple linear regression with no intercept. This model has two parameters: the regression coefficient and the error variance. A Bayesian analysis requires a prior on these two parameters, and a standard non-informative choice is a joint prior proportional to the inverse of the error variance. An alternative model is dynamic regression with no intercept. Here the regression coefficient is allowed to vary in time, and we assume it follows a random walk. This model requires priors on three parameters: the initial regression coefficient, the evolution variance (the variance of the random-walk steps), and the error variance. In analogy with the static regression, we could place a non-informative joint prior on the initial regression coefficient and the error variance that is proportional to the inverse of the error variance. The question is: what prior should we place on the evolution variance? At one extreme, a non-informative prior on the evolution variance would allow the regression coefficient to vary however the data see fit. At the other extreme, a point-mass prior at zero is equivalent to the static regression model. In the middle, a prior that favors small but non-zero values will make the regression coefficient vary smoothly. So the choice of prior is really a choice about how we believe the model should behave. If we were okay with the static regression model but wanted some flexibility for a time-varying regression coefficient, then a dynamic model with an informative prior on the evolution variance is a good approach. There is no need for the statistical modeler to immediately jump to a non-informative prior for all parameters.
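To make the role of the evolution variance concrete, here is a minimal Python sketch (not the code used in class). It simplifies the problem by treating the error variance V and the evolution variance W as known and fixing W at a few values, rather than placing priors on them, so it only illustrates the behavior that a prior concentrated near each value would encourage. The simulated data, the function name filter_dynamic_regression, and the particular values of V, W, and the vague initial variance C0 are all assumptions made for illustration.

```python
import numpy as np

def filter_dynamic_regression(y, x, V, W, m0=0.0, C0=1e6):
    """Kalman filter for y_t = x_t * beta_t + e_t, beta_t = beta_{t-1} + w_t,
    with e_t ~ N(0, V) and w_t ~ N(0, W), both variances assumed known.
    Returns the filtered means E[beta_t | y_1, ..., y_t]."""
    m, C = m0, C0                        # vague prior on the initial coefficient
    means = np.empty(len(y))
    for t in range(len(y)):
        R = C + W                        # prior variance of beta_t after the random-walk step
        Q = x[t] ** 2 * R + V            # one-step forecast variance of y_t
        K = R * x[t] / Q                 # Kalman gain
        m = m + K * (y[t] - x[t] * m)    # update the mean with the forecast error
        C = R * V / Q                    # posterior variance (scalar form of R - K * x * R)
        means[t] = m
    return means

# Simulated data (an assumption for illustration): a slowly drifting coefficient.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
beta_true = 1.0 + np.cumsum(rng.normal(scale=0.05, size=n))
y = x * beta_true + rng.normal(scale=0.5, size=n)

V = 0.25
# W = 0 is the static regression; small W gives a smooth path; large W a rough one.
paths = {W: filter_dynamic_regression(y, x, V, W) for W in (0.0, 0.0025, 1.0)}

# Least-squares estimate for the static regression with no intercept.
ols = np.sum(x * y) / np.sum(x * x)

def roughness(path, burn_in=20):
    """Sum of squared one-step changes in the filtered mean, after a burn-in."""
    return float(np.sum(np.diff(path[burn_in:]) ** 2))

print("final filtered mean with W = 0:", paths[0.0][-1], "| OLS:", ols)
for W, path in paths.items():
    print("W =", W, "roughness =", roughness(path))
```

With W set to zero, the final filtered mean matches the static least-squares estimate up to the negligible influence of the vague initial prior, which is the point-mass extreme described above. Increasing W lets the filtered coefficient chase individual observations, so an informative prior that keeps W small is what produces a smoothly varying coefficient.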