Spatial pattern in residuals

The original question came from a data set attempting to predict historical sea ice levels based on diatom data from core samples. A generalized additive model (GAM) was being used with the concentration (proportion?) of three diatoms as the explanatory variables. The client’s concern was that there was a clear spatial pattern in the residuals with positive residuals generally lying in the north and negative residuals in the south. The client wanted to know how they would deal with these spatial residuals without explicitly incorporate spatial information, e.g. latitude. Ultimately the client’s goal is to

predict historical sea ice levels using diatom data from deeper core samples.

Thoughts

First, we should congratulate this client on looking at the residuals and discovering a pattern that signifies clear departure from model assumptions.

Our first inclination is to include latitude in the model, but the client does not want to do this because latitude is not a very good predictive explanatory variable for historical sea ice levels due to dramatically varying sea ice levels at constant latitudes.

We could possibly build a spatial error term, but the data are clustered into two groups: one northern group and one southern group. The separation between these groups indicates that long range dependency will be poorly estimated.

I’m not sure we can up with anything that we really liked to deal with the spatial dependence.

Extrapolation

Our far greater concern was with the prospect of using these data to extrapolate at all. Since a GAM was used and GAMs are extremely flexible in their estimation of the functional form of the relationship between explanatory variables and the response, predictions outside the range of the observed explanatory variables could be highly misleading. Even if a less flexible model was used, e.g. multiple regression, it isn’t clear the model will be able to reasonable extrapolate at all.