GPs with Functional Inputs for WEPP

Jarad Niemi and Luis Damiano

2023-06-15

Computer Models in Agriculture

name	developer	date
WEPP	USDA	1990
Agro-IBIS	Wisconsin	1990
APSIM	CSIRO (Australia)	2007
Cycles	Penn State University	2020
CyclesL	Penn State University	2023
HydroGeoSphere	Iowa State University	2023

Input-Output

Inputs (\(X\)):

Topography
Soil type
Management
Weather

Outputs (\(Y\)):

Yield
Nutrients
Water
Soil

Daily Erosion Project (DEP)

DEP - Precipitation

DEP - Soil Loss

DEP - Expansion

Currently (Iowa and some surrounding regions)

~200,000 runs/day
start at midnight, done by 6am
single server parallelism

Expansion

Midwest
U.S.
World

Water Erosion Prediction Project (WEPP)

WEPP

WEPP Experiment - Profile

WEPP Experiment - Length and Slope

Scientific questions

Out-of-sample prediction accuracy
- Soil loss

Relative importance
- Profile
  - Position
- Length
- Mean slope

Gaussian Process Emulators

Emulator

Deterministic computer model

\[Y = f(X)\]

for

input \(X\)
output \(Y\)

An emulator is an estimate of \(f\), i.e. \(\widehat{f}\).

Gaussian Process

For \(f:\) 𝒳 \(\to\) 𝒴, assume \[f \sim GP(\mu, k)\]

for

mean function \(\mu:\) 𝒳 \(\to \mathbb{R}\) and
covariance function \(k:\) 𝒳 \(\times\) 𝒳 \(\to \mathbb{R}\) (positive semi-definite).

For simplicity, \(\mu(x) = 0\).

Data

For a collection of data \((Y_i, x_i)\) for \(i=1,\ldots,n\), we have

\[Y = (Y_1,\ldots,Y_n)^\top \sim N(0, \Sigma)\]

where

\[\Sigma_{i,j} = k(x_i,x_j).\]

For prediction at a new location \(\widetilde{x}\), we use the conditional distribution \(\widetilde{Y}|y\), which involves the covariances \(k(x_i, \widetilde{x})\) for all \(i\).
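
As a minimal sketch of this conditional distribution (not the authors' implementation): with \(\mu = 0\) and noise-free output, \(\widetilde{Y}|y\) is normal with mean \(k_*^\top \Sigma^{-1} y\) and variance \(k(\widetilde{x},\widetilde{x}) - k_*^\top \Sigma^{-1} k_*\), where \(k_* = (k(x_1,\widetilde{x}),\ldots,k(x_n,\widetilde{x}))^\top\). The names `example_kernel` and `gp_predict`, the `jitter` term, and the toy data are assumptions made for illustration.

```python
import numpy as np

def example_kernel(a, b, sigma2=1.0, w=25.0):
    """Hypothetical kernel: sigma2 * exp(-w * (a - b)^2 / 2) for scalar inputs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]
    return sigma2 * np.exp(-w * diff**2 / 2.0)

def gp_predict(x, y, x_new, kernel, jitter=1e-10):
    """Mean and variance of Y_new | y for a zero-mean, noise-free GP."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Sigma[i, j] = k(x_i, x_j); the jitter is only there for numerical stability.
    Sigma = kernel(x, x) + jitter * np.eye(len(x))
    k_star = kernel(x, x_new)                 # k(x_i, x_new), shape (n, m)
    chol = np.linalg.cholesky(Sigma)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))   # Sigma^{-1} y
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = np.diag(kernel(x_new, x_new)) - np.sum(v**2, axis=0)
    return mean, var

# Toy deterministic "computer model" standing in for f.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2.0 * np.pi * x_train)
mean, var = gp_predict(x_train, y_train, np.array([0.0, 0.33, 10.0]), example_kernel)
```

At a training input the emulator reproduces the model output with variance near zero; far from all training inputs the prediction reverts to the prior mean 0 with variance \(\sigma^2\).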

Distance-based covariance kernel

For any 𝒳, let \[k(x_i,x_j) = \sigma^2 e^{-d(x_i,x_j)/2}\] for spatial variance \(\sigma^2\) and some distance function \(d(x_i,x_j)\).

For example,

squared exponential (Gaussian) covariance
automatic relevance determination
automatic dynamic relevance determination

Squared-exponential kernel

If \(x \in \mathbb{R}\),

the squared-exponential (Gaussian) covariance kernel uses the distance \[d(x_i,x_j) = w (x_i-x_j)^2\]

where \(w\) is the weight (\(1/\sqrt{w}\) is the length-scale/range).
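
A small sketch of how the weight \(w\) controls how quickly the covariance decays with distance. The function name `sq_exp_kernel` and the numeric values are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x_i, x_j, sigma2=1.0, w=1.0):
    """k(x_i, x_j) = sigma2 * exp(-d / 2) with d = w * (x_i - x_j)^2."""
    d = w * (np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)) ** 2
    return sigma2 * np.exp(-d / 2.0)

# Covariance between two inputs one unit apart, for increasing weights.
for w in (0.1, 1.0, 10.0):
    print(w, sq_exp_kernel(0.0, 1.0, w=w))   # equals exp(-w / 2) when sigma2 = 1
```

A larger \(w\) lets the output change faster as \(x\) changes, so nearby runs carry less information about each other.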

Automatic relevance determination

If \(x \in \mathbb{R}^P\),

the automatic relevance determination kernel uses the distance \[d(x_i,x_j) = \sum_{p=1}^P w_p [x_{i,p}-x_{j,p}]^2\]

where \(w_p\) controls the strength of the relationship in the \(p\)th dimension.
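
The weighted sum above can be sketched as follows. The function names `ard_distance` and `ard_kernel` and the toy inputs are assumptions for illustration; setting a weight to zero shows how a dimension becomes irrelevant.

```python
import numpy as np

def ard_distance(X_a, X_b, w):
    """d(x_i, x_j) = sum_p w_p * (x_{i,p} - x_{j,p})^2 for all pairs of rows."""
    X_a = np.atleast_2d(np.asarray(X_a, dtype=float))
    X_b = np.atleast_2d(np.asarray(X_b, dtype=float))
    w = np.asarray(w, dtype=float)
    diff = X_a[:, None, :] - X_b[None, :, :]      # shape (n_a, n_b, P)
    return np.sum(w * diff**2, axis=-1)

def ard_kernel(X_a, X_b, w, sigma2=1.0):
    """k = sigma2 * exp(-d / 2) using the ARD distance."""
    return sigma2 * np.exp(-ard_distance(X_a, X_b, w) / 2.0)

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# w_2 = 0: the second input is irrelevant, so rows 1 and 3 are perfectly correlated.
K = ard_kernel(X, X, w=[4.0, 0.0])
```

Here rows 1 and 3 differ only in the second input, so their covariance equals \(\sigma^2\), while rows 1 and 2 differ in the first input and have covariance \(\sigma^2 e^{-2}\).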

Automatic dynamic relevance determination

If \(x \in\) ℋ (Hilbert space), the automatic dynamic relevance determination kernel uses the distance \[d(x_i,x_j) = \int w(t) [x_{i}(t)-x_{j}(t)]^2 \, dt\]

for some weight function \(w:\) 𝒯 \(\to \mathbb{R}^+\).

WLOG 𝒯 = [0,1].
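
In practice a functional input is observed on a grid \(t_1,\ldots,t_M \in [0,1]\), and the integral can be approximated by a quadrature rule, which turns the distance into an ARD-style weighted sum with weights tied together by \(w(t)\). The sketch below uses the trapezoidal rule; the step-shaped weight function and the toy profiles are assumptions chosen only to make the effect visible, not one of the authors' weight families.

```python
import numpy as np

def trapezoid(values, t):
    """Trapezoidal rule for the integral of `values` over the grid `t`."""
    return np.sum((values[1:] + values[:-1]) / 2.0 * np.diff(t))

def adrd_distance(x_i, x_j, w, t):
    """Approximate d = integral of w(t) * (x_i(t) - x_j(t))^2 dt on the grid t."""
    return trapezoid(w * (x_i - x_j) ** 2, t)

t = np.linspace(0.0, 1.0, 201)

# Hypothetical weight function: only the first half of the domain is relevant.
w = np.where(t <= 0.5, 2.0, 0.0)

x_a = np.sin(2.0 * np.pi * t)
x_b = x_a + np.where(t > 0.6, 1.0, 0.0)   # differs from x_a only where w(t) = 0
x_c = x_a + 1.0                            # differs from x_a everywhere

d_ab = adrd_distance(x_a, x_b, w, t)       # zero: the difference is ignored
d_ac = adrd_distance(x_a, x_c, w, t)       # close to 2 * 0.5 = 1
k_ab, k_ac = np.exp(-d_ab / 2.0), np.exp(-d_ac / 2.0)
```

Two functional inputs that differ only where \(w(t) = 0\) are treated as identical by the kernel, which is what makes the estimated \(w(t)\) interpretable as relevance along the domain.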

Automatic dynamic relevance determination (ADRD)

Asymmetric double exponential

Fourier expansion

B-splines and hinges

Combined

Combine hillslope profile, length, and mean slope into a single correlation function:

Calculate scaled-integrated weight:

Results

Out-of-sample prediction accuracy

Profile, length, and slope relevance

Hillslope profile relevance

Summary

Novelty

Introduced

Automatic dynamic relevance determination
Variety of weight functions

Results

Similar prediction accuracy
Informative relevance

Future questions

Calibration
Curse of big data
Design
Shift-invariant distance function
Data Fusion
- Field data
- Remote sensing data

More information

Webpage: jarad.me
Slides
Luis Damiano’s PhD Dissertation (Link TBD)

Thank you!