GPs with Functional Inputs for WEPP

Jarad Niemi and Luis Damiano

2023-06-15

Computer Models in Agriculture

Name            Developer                Date
WEPP            USDA                     1990
Agro-IBIS       Wisconsin                1990
APSIM           CSIRO (Australia)        2007
Cycles          Penn State University    2020
CyclesL         Penn State University    2023
HydroGeoSphere  Iowa State University    2023

Input-Output

Inputs (\(X\)):

  • Topography
  • Soil type
  • Management
  • Weather

Outputs (\(Y\)):

  • Yield
  • Nutrients
  • Water
  • Soil

Daily Erosion Project (DEP)

DEP - Precipitation

DEP - Soil Loss

DEP - Expansion

Currently (Iowa and some surrounding regions)

  • ~200,000 runs/day
  • start at midnight, done by 6am
  • single server parallelism

Expansion

  • Midwest
  • U.S.
  • World

Water Erosion Prediction Project (WEPP)

WEPP

WEPP Experiment - Profile

WEPP Experiment - Length and Slope

Scientific questions

  • Out-of-sample prediction accuracy
    • Soil loss
  • Relative importance
    • Profile
      • Position
    • Length
    • Mean slope

Gaussian Process Emulators

Emulator

Deterministic computer model

\[Y = f(X)\]

for

  • input \(X\)
  • output \(Y\)

An emulator is an estimate of \(f\), i.e. \(\widehat{f}\).

Gaussian Process

For \(f:\) 𝒳 \(\to\) 𝒴, assume \[f \sim GP(\mu, k)\]

for

  • mean function \(\mu:\) 𝒳 \(\to \mathbb{R}\) and
  • covariance function \(k:\) 𝒳 \(\times\) 𝒳 \(\to \mathbb{R}^{+}\).

For simplicity, \(\mu(x) = 0\).

Data

For a collection of data \((Y_i, x_i)\) for \(i=1,\ldots,n\), we have

\[Y = (Y_1,\ldots,Y_n)^\top \sim N(0, \Sigma)\]

where

\[\Sigma_{i,j} = k(x_i,x_j).\]

For prediction at a new location \(\widetilde{x}\), the conditional distribution of \(\widetilde{Y}\) given \(Y = y\) involves the covariances \(k(x_i, \widetilde{x})\) for all \(i\).
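
For reference, with the zero mean assumed earlier, standard Gaussian conditioning gives this distribution in closed form. The vector \(k_*\) is shorthand introduced here for illustration and is not notation from the slides.

\[\widetilde{Y} \mid y \sim N\left(k_*^\top \Sigma^{-1} y,\; k(\widetilde{x},\widetilde{x}) - k_*^\top \Sigma^{-1} k_*\right), \qquad k_* = \left(k(x_1,\widetilde{x}),\ldots,k(x_n,\widetilde{x})\right)^\top.\]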

Distance-based covariance kernel

For any 𝒳, let \[k(x_i,x_j) = \sigma^2 e^{-d(x_i,x_j)/2}\] for spatial variance \(\sigma^2\) and some distance function \(d(x_i,x_j)\).

For example,

  • squared exponential (Gaussian) covariance
  • automatic relevance determination
  • automatic dynamic relevance determination
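
A minimal sketch of how a distance-based kernel plugs into a zero-mean GP emulator, assuming NumPy. The function names, the toy model standing in for \(f\), the weight 20, and the small jitter added to the diagonal are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kernel_matrix(xa, xb, dist, sigma2=1.0):
    """k(x_i, x_j) = sigma2 * exp(-d(x_i, x_j) / 2) for every pair."""
    d = np.array([[dist(a, b) for b in xb] for a in xa])
    return sigma2 * np.exp(-d / 2.0)

def gp_predict(x_train, y_train, x_new, dist, sigma2=1.0, jitter=1e-10):
    """Conditional mean and variance of a zero-mean GP at the new inputs."""
    K = kernel_matrix(x_train, x_train, dist, sigma2)
    K = K + jitter * np.eye(len(x_train))  # numerical stabilizer (assumption)
    K_star = kernel_matrix(x_train, x_new, dist, sigma2)
    mean = K_star.T @ np.linalg.solve(K, y_train)
    cov = kernel_matrix(x_new, x_new, dist, sigma2) - K_star.T @ np.linalg.solve(K, K_star)
    return mean, np.diag(cov)

def sq_dist(a, b, w=20.0):
    """Weighted squared distance; the weight 20 is arbitrary."""
    return w * (a - b) ** 2

# Toy deterministic "computer model" standing in for f (hypothetical).
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train)

# Predicting at training inputs should reproduce the model output
# with essentially zero variance, since the model is deterministic.
mean, var = gp_predict(x_train, y_train, x_train[:3], sq_dist)
```

Any of the distance functions listed above can be passed as `dist` without changing the prediction code, which is the practical appeal of the distance-based form.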

Squared-exponential kernel

If \(x \in \mathbb{R}\),

the squared-exponential (Gaussian) covariance kernel uses the distance \[d(x_i,x_j) = w (x_i-x_j)^2\]

where \(w\) is the weight (\(1/\sqrt{w}\) is the length-scale/range).
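
To illustrate how the weight controls how quickly correlation decays, here is a small sketch assuming NumPy. The function names and the weight values are hypothetical choices for illustration.

```python
import numpy as np

def se_distance(xi, xj, w):
    """Weighted squared distance d(x_i, x_j) = w * (x_i - x_j)^2."""
    return w * (xi - xj) ** 2

def se_kernel(xi, xj, w, sigma2=1.0):
    """Covariance sigma2 * exp(-d / 2) built from the distance above."""
    return sigma2 * np.exp(-se_distance(xi, xj, w) / 2.0)

# Correlation between two inputs 0.5 apart for increasing weights.
for w in (0.1, 1.0, 10.0):
    print(w, se_kernel(0.0, 0.5, w))
```

Larger weights give smaller correlations at the same separation, so the emulator treats the output as changing faster in \(x\).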

Automatic relevance determination

If \(x \in \mathbb{R}^P\),

the automatic relevance determination kernel uses the distance \[d(x_i,x_j) = \sum_{p=1}^P w_p [x_{i,p}-x_{j,p}]^2\]

where \(w_p\) controls the strength of the relationship in the \(p\)th dimension.
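
A short sketch of this distance, assuming NumPy, shows why the weights can be read as relevance: a dimension with weight zero contributes nothing to the distance. The function name and the numbers are illustrative only.

```python
import numpy as np

def ard_distance(xi, xj, w):
    """d(x_i, x_j) = sum_p w_p * (x_{i,p} - x_{j,p})^2."""
    xi, xj, w = (np.asarray(v, dtype=float) for v in (xi, xj, w))
    return float(np.sum(w * (xi - xj) ** 2))

w = [5.0, 0.0]  # hypothetical weights: the second input is irrelevant

# Inputs that differ only in the irrelevant dimension are at distance 0,
# so the emulator treats them as identical.
d_irrelevant = ard_distance([0.0, 0.0], [0.0, 3.0], w)
# Inputs that differ in the relevant dimension are separated.
d_relevant = ard_distance([0.0, 0.0], [1.0, 0.0], w)
```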

Automatic dynamic relevance determination

If \(x \in\) ℋ (a Hilbert space of functions on 𝒯), the automatic dynamic relevance determination kernel uses the distance \[d(x_i,x_j) = \int w(t) [x_{i}(t)-x_{j}(t)]^2 \, dt\]

for some weight function \(w:\) 𝒯 \(\to \mathbb{R}^+\).

Without loss of generality, 𝒯 = [0,1].
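
In practice the functional inputs are observed on a grid, so the integral has to be approximated. The sketch below, assuming NumPy, uses the trapezoid rule on an evenly spaced grid over [0,1]. The exponential weight function and the two step-shaped inputs are hypothetical, chosen only to show that differences where \(w(t)\) is large count more than differences where it is small.

```python
import numpy as np

def adrd_distance(xi, xj, w, t):
    """Trapezoid-rule approximation of the integral of w(t) [x_i(t) - x_j(t)]^2 dt."""
    g = w * (xi - xj) ** 2
    return float(np.sum((g[1:] + g[:-1]) / 2.0 * np.diff(t)))

t = np.linspace(0.0, 1.0, 1001)
w = np.exp(-5.0 * t)  # hypothetical weight function, largest near t = 0

flat = np.zeros_like(t)
differs_early = np.where(t < 0.2, 1.0, 0.0)  # departs from flat near t = 0
differs_late = np.where(t > 0.8, 1.0, 0.0)   # departs from flat near t = 1

d_early = adrd_distance(flat, differs_early, w, t)
d_late = adrd_distance(flat, differs_late, w, t)
```

Both inputs differ from the flat one over an interval of the same length, but `d_early` is much larger than `d_late` because the weight function is concentrated near the start of the domain.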

Automatic dynamic relevance determination (ADRD)

Asymmetric double exponential

Fourier expansion

B-splines and hinges

Combined

Combine hillslope profile, length, and mean slope into a single correlation function:

Calculate scaled-integrated weight:

Results

Out-of-sample prediction accuracy

Profile, length, and slope relevance

Hillslope profile relevance

Summary

Novelty

Introduced

  • Automatic dynamic relevance determination
  • Variety of weight functions

Results

  • Similar prediction accuracy
  • Informative relevance

Future questions

  • Calibration
  • Curse of big data
  • Design
  • Shift-invariant distance function
  • Data Fusion
    • Field data
    • Remote sensing data

More information

  • Webpage: jarad.me
  • Slides
  • Luis Damiano’s PhD Dissertation (Link TBD)

Thank you!