Gaussian Processes with Derivative Information

with an application to the Helmholtz kernel

Published

September 29, 2024

Gaussian Processes

A Gaussian process (GP) is an infinite-dimensional stochastic process for which any finite subsample follows a multivariate normal distribution (see here for a good visual introduction). Denote by \(y \in \mathbb{R}\) the random variable over which we define the GP, and by \(x \in \mathbb{R}^p\) the locations at which \(y\) is observed. We write \[y \sim \mathcal{GP}(m(x), k(x, x'))\] where \(m: \mathbb{R}^p \rightarrow \mathbb{R}\) and \(k: \mathbb{R}^p \times \mathbb{R}^p \rightarrow \mathbb{R}\) are the mean and covariance functions, respectively.

This is the mathematical definition of the continuous stochastic process, but in practice we can only make discrete observations. Define the collection of observations as \(\textbf{y} = (y_1, \dots, y_n)\) with corresponding locations \((x_1, \dots, x_n)\). By the definition of a GP, \(\textbf{y}\) follows a multivariate normal distribution, so that \[\mathbf{y} \sim \mathcal{N}(\mathbf{\mu}, \Sigma),\] where \[ \mathbf{\mu} = \begin{bmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{bmatrix}, \quad \text{and} \quad \Sigma = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{bmatrix}.\]
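As a minimal sketch of this finite-dimensional view, the snippet below builds \(\mathbf{\mu}\) and \(\Sigma\) on a 1D grid and draws one sample. The squared-exponential kernel, the zero mean, and the names `rbf` and `mean_fn` are assumptions made for illustration only; nothing above commits to a particular kernel.

```python
import numpy as np

def rbf(x1, x2, ell=1.0, sigma=1.0):
    """Squared-exponential kernel k(x, x') for 1D inputs (assumed choice)."""
    tau = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * tau**2 / ell**2)

def mean_fn(x):
    """Zero mean function m(x) (assumed for illustration)."""
    return np.zeros_like(x)

x = np.linspace(0.0, 5.0, 20)   # observation locations x_1, ..., x_n
mu = mean_fn(x)                 # mean vector, mu[i] = m(x_i)
Sigma = rbf(x, x)               # covariance matrix, Sigma[i, j] = k(x_i, x_j)

# A small diagonal jitter keeps the matrix numerically positive definite.
rng = np.random.default_rng(0)
y = rng.multivariate_normal(mu, Sigma + 1e-8 * np.eye(len(x)))
```

Each call to `rng.multivariate_normal` gives one realisation of the process restricted to the grid `x`.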

Observing Derivatives

Now suppose we observe the derivatives \(\frac{d}{dx} y \equiv \partial_x y\), and that we wish to keep specifying our beliefs about \(y\) via \(m(\cdot)\) and \(k(\cdot, \cdot)\). How do we incorporate this information?

The functions \(m(\cdot)\) and \(k(\cdot, \cdot)\) have been defined in terms of \(x\), and so adjusting them in terms of \(y\) (or, rather, its derivative) may not seem immediately intuitive. Instead, recall that \(m(x_i) = \mathbb{E}[y_i]\) and \(k(x_i, x_j) = \mathrm{cov}[y_i, y_j]\); this follows from the roles that \(m(\cdot)\) and \(k(\cdot, \cdot)\) play in the equations above.

Calculating the first two moments of \(\partial_x y\) we get \(\mathbb{E}[\partial_x y] = \partial_x \mathbb{E}[y] = \partial_x m(x)\) and \(\mathrm{cov}[\partial_x y, \partial_{x'} y'] = \partial_{x x'} \mathrm{cov}[ y, y'] = \partial_{x x'} k(x, x')\). Here I have used the linearity of the derivative and expectation operators and the bilinearity of covariance, and assumed that \(m\) and \(k\) are smooth enough for these derivatives to exist.
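To make these moment identities concrete, here is a small check for one specific kernel. It assumes a 1D squared-exponential kernel \(k(x, x') = \exp(-(x - x')^2 / 2\ell^2)\) with a hypothetical length scale `ELL`, writes out its derivatives by hand, and compares them with central finite differences. The function names are illustrative only.

```python
import math

ELL = 0.7  # length scale (hypothetical value)

def k(x, xp):
    """Squared-exponential kernel k(x, x') = cov[y, y'] (assumed choice)."""
    return math.exp(-0.5 * (x - xp) ** 2 / ELL**2)

def dk_dxp(x, xp):
    """Analytic d/dx' k(x, x') = cov[y, d_x' y']."""
    return (x - xp) / ELL**2 * k(x, xp)

def d2k_dx_dxp(x, xp):
    """Analytic d^2/(dx dx') k(x, x') = cov[d_x y, d_x' y']."""
    return (1.0 / ELL**2 - (x - xp) ** 2 / ELL**4) * k(x, xp)

def fd_dxp(x, xp, h=1e-5):
    """Central finite difference in the second argument."""
    return (k(x, xp + h) - k(x, xp - h)) / (2 * h)

def fd_dx_dxp(x, xp, h=1e-4):
    """Central finite difference of the mixed second derivative."""
    return (k(x + h, xp + h) - k(x + h, xp - h)
            - k(x - h, xp + h) + k(x - h, xp - h)) / (4 * h * h)

x, xp = 0.3, 1.1
err_first = abs(dk_dxp(x, xp) - fd_dxp(x, xp))
err_mixed = abs(d2k_dx_dxp(x, xp) - fd_dx_dxp(x, xp))
```

Both errors should be small compared with the kernel values themselves. At \(x = x'\) the mixed derivative reduces to \(1/\ell^2\), which is the prior variance of the derivative process under this kernel.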

Now define \(\partial_\textbf{x} \textbf{y} = (\partial_{x_1} y_1, \dots, \partial_{x_m} y_m)\) with corresponding locations \((x_1, \dots, x_m)\). As above, we may describe \(\partial_\textbf{x} \textbf{y}\) as being sampled from the normal distribution \[\partial_\textbf{x} \textbf{y} \sim \mathcal{N}(\partial_\textbf{x} \mu, \partial_{\textbf{xx}'} \Sigma)\] where \[ \partial_\textbf{x} \mu = \begin{bmatrix} \partial_{x_1} m(x_1) \\ \vdots \\ \partial_{x_m} m(x_m) \end{bmatrix}, \quad \text{and} \quad \partial_{\textbf{xx}'} \Sigma = \begin{bmatrix} \partial_{x_1 x_1} k(x_1, x_1) & \cdots & \partial_{x_1 x_m} k(x_1, x_m) \\ \vdots & \ddots & \vdots \\ \partial_{x_m x_1} k(x_m, x_1) & \cdots & \partial_{x_m x_m} k(x_m, x_m) \end{bmatrix}.\]

Updating with mixed observations

We may also form the joint distribution between the process and its derivatives. This is useful, for example, for combining observations from different measurement platforms, or for forming conditional distributions. The only additional term we require is the off-diagonal term \(\mathrm{cov}[y, \partial_{x'} y'] = \partial_{x'} \mathrm{cov}[y, y'] = \partial_{x'} k(x, x')\). The joint normal distribution is \[\begin{bmatrix} \textbf{y} \\ \partial_\textbf{x} \textbf{y} \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu \\ \partial_\textbf{x} \mu \end{bmatrix}, \begin{bmatrix} \Sigma & \partial_{\textbf{x}'} \Sigma \\ \partial_{\textbf{x}} \Sigma & \partial_{\textbf{xx}'} \Sigma \end{bmatrix} \right),\] \[\text{where} \quad \partial_{\textbf{x}'} \Sigma = (\partial_{\textbf{x}} \Sigma)^\intercal = \begin{bmatrix} \partial_{x_1} k(x_1, x_1) & \cdots & \partial_{x_m} k(x_1, x_m) \\ \vdots & \ddots & \vdots \\ \partial_{x_1} k(x_n, x_1) & \cdots & \partial_{x_m} k(x_n, x_m) \end{bmatrix}.\] In each entry of this \(n \times m\) block the first argument of \(k\) is one of the \(n\) locations where \(y\) is observed, the second argument is one of the \(m\) locations where the derivative is observed, and the derivative is taken with respect to the second argument.
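The block structure above can be sketched in a few lines of NumPy. The example below assumes a zero mean, a 1D squared-exponential kernel with a hypothetical length scale `ELL`, and toy data generated from \(y = \sin(x)\); all of these are illustrative choices. It builds the joint covariance of \([\textbf{y}, \partial_\textbf{x} \textbf{y}]\) and uses the standard Gaussian conditioning formula to predict \(y\) at new locations from mixed observations.

```python
import numpy as np

ELL = 1.0  # length scale (hypothetical value)

def k(a, b):
    """cov[y(a_i), y(b_j)] for a squared-exponential kernel (assumed choice)."""
    tau = a[:, None] - b[None, :]
    return np.exp(-0.5 * tau**2 / ELL**2)

def k_dxp(a, b):
    """cov[y(a_i), dy(b_j)]: derivative of k in its second argument."""
    tau = a[:, None] - b[None, :]
    return tau / ELL**2 * k(a, b)

def k_dx_dxp(a, b):
    """cov[dy(a_i), dy(b_j)]: mixed derivative of k in both arguments."""
    tau = a[:, None] - b[None, :]
    return (1.0 / ELL**2 - tau**2 / ELL**4) * k(a, b)

x_y = np.linspace(0.0, 2 * np.pi, 8)   # n locations where y is observed
x_d = np.linspace(0.5, 6.0, 8)         # m locations where dy/dx is observed
obs = np.concatenate([np.sin(x_y), np.cos(x_d)])  # toy data from y = sin(x)

# Joint covariance of [y, dy], matching the block matrix in the text.
K = np.block([
    [k(x_y, x_y),       k_dxp(x_y, x_d)],
    [k_dxp(x_y, x_d).T, k_dx_dxp(x_d, x_d)],
])
K += 1e-8 * np.eye(K.shape[0])  # jitter for numerical stability

# Posterior mean of y at new locations (zero prior mean assumed).
x_new = np.array([1.0, 2.5, 4.0])
K_star = np.hstack([k(x_new, x_y), k_dxp(x_new, x_d)])
post_mean = K_star @ np.linalg.solve(K, obs)
```

The cross-covariance row `K_star` reuses the same two kernel functions, since predicting \(y\) only needs \(\mathrm{cov}[y, y']\) and \(\mathrm{cov}[y, \partial_{x'} y']\).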

Example: The Helmholtz Kernel

The primary motivation for writing this article was to provide a proof of equations (4)–(6) and (10)–(12) in our recent article on inferring flow properties from Lagrangian drifters. The mathematics involved in incorporating derivative information for a 1D process is straightforward. Moving to more than one dimension, there are some handy tricks to use.

Define the zonal and meridional velocities, \(u\) and \(v\), as the sum of rotational and divergent components \[u = -\partial_y \psi + \partial_x \phi,\] \[v = \partial_x \psi + \partial_y \phi\] where \(\psi\) and \(\phi\) are the streamfunction and velocity-potential fields defined over a domain \(\textbf{x} = [x, y] \in \mathcal{X}\). We specify the auto- and cross-covariance functions for \(\psi\) and \(\phi\) (\(\mathrm{cov}[\psi, \psi']\), \(\mathrm{cov}[\phi, \phi']\), and \(\mathrm{cov}[\psi, \phi']\)) and from these generate covariance functions for \(u\) and \(v\). To form the joint distribution of \(u\) and \(v\) we require expressions for \(\mathrm{cov}[u, u']\), \(\mathrm{cov}[v, v']\), and \(\mathrm{cov}[u, v']\). These follow from the bilinearity of covariance. In the expansions below the primes are carried by the derivative subscripts, so the prime on the second field is dropped for brevity. For example, \[\begin{align} \mathrm{cov}[u, v'] &= \mathrm{cov}[-\partial_y \psi + \partial_x \phi, \partial_{x'} \psi + \partial_{y'} \phi] \\ &= \mathrm{cov}[-\partial_y \psi, \partial_{x'} \psi] + \mathrm{cov}[\partial_x \phi, \partial_{x'} \psi] + \mathrm{cov}[-\partial_y \psi, \partial_{y'} \phi] + \mathrm{cov}[\partial_x \phi, \partial_{y'} \phi] \\ &= -\partial_{yx'}\mathrm{cov}[\psi, \psi] + \partial_{xx'}\mathrm{cov}[\phi, \psi] - \partial_{yy'} \mathrm{cov}[\psi, \phi] + \partial_{xy'} \mathrm{cov}[\phi, \phi]. \end{align}\] Expressions for \(\mathrm{cov}[u, u']\) and \(\mathrm{cov}[v, v']\) are similarly obtained as \[\begin{align} \mathrm{cov}[u, u'] &= \partial_{yy'}\mathrm{cov}[\psi, \psi] + \partial_{xx'} \mathrm{cov}[\phi, \phi] - \partial_{yx'} \mathrm{cov}[\psi, \phi] - \partial_{xy'} \mathrm{cov}[\phi, \psi], \quad \text{and} \\ \mathrm{cov}[v, v'] &= \partial_{xx'}\mathrm{cov}[\psi, \psi] + \partial_{yy'} \mathrm{cov}[\phi, \phi] + \partial_{xy'} \mathrm{cov}[\psi, \phi] + \partial_{yx'} \mathrm{cov}[\phi, \psi]. \end{align}\]
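The three expansions above translate directly into code. The sketch below is one possible check, not the implementation used in the paper: it evaluates each mixed partial by central finite differences and combines them exactly as in the formulas. For the comparison it assumes independent \(\psi\) and \(\phi\) (zero cross-covariance) with squared-exponential kernels and hypothetical length scales, for which the closed forms can be written by hand.

```python
import math

X, Y = 0, 1  # axis indices for the x and y coordinates

def rbf2d(ell):
    """Isotropic squared-exponential kernel on R^2 (assumed choice)."""
    def kern(p, q):
        r2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return math.exp(-0.5 * r2 / ell**2)
    return kern

def zero(p, q):
    """Zero cross-covariance between psi and phi (an assumption)."""
    return 0.0

def shift(pt, axis, step):
    out = list(pt)
    out[axis] += step
    return out

def mixed(kern, a, b, p, q, h=1e-3):
    """Central finite difference of d^2 kern / (d p[a] d q[b])."""
    return (kern(shift(p, a, h), shift(q, b, h))
            - kern(shift(p, a, h), shift(q, b, -h))
            - kern(shift(p, a, -h), shift(q, b, h))
            + kern(shift(p, a, -h), shift(q, b, -h))) / (4 * h * h)

def helmholtz_cov(k_pp, k_ff, k_pf, k_fp, p, q):
    """Return cov[u,u'], cov[u,v'], cov[v,v'] from the psi/phi covariances."""
    uu = (mixed(k_pp, Y, Y, p, q) + mixed(k_ff, X, X, p, q)
          - mixed(k_pf, Y, X, p, q) - mixed(k_fp, X, Y, p, q))
    uv = (-mixed(k_pp, Y, X, p, q) + mixed(k_fp, X, X, p, q)
          - mixed(k_pf, Y, Y, p, q) + mixed(k_ff, X, Y, p, q))
    vv = (mixed(k_pp, X, X, p, q) + mixed(k_ff, Y, Y, p, q)
          + mixed(k_pf, X, Y, p, q) + mixed(k_fp, Y, X, p, q))
    return uu, uv, vv

ell_psi, ell_phi = 1.0, 1.5           # hypothetical length scales
k_psi, k_phi = rbf2d(ell_psi), rbf2d(ell_phi)
p, q = [0.4, -0.2], [1.0, 0.5]        # two locations [x, y] and [x', y']
uu, uv, vv = helmholtz_cov(k_psi, k_phi, zero, zero, p, q)

# Closed forms for independent squared-exponential psi and phi.
tx, ty = p[0] - q[0], p[1] - q[1]
kp, kf = k_psi(p, q), k_phi(p, q)
uu_ref = (1 / ell_psi**2 - ty**2 / ell_psi**4) * kp + (1 / ell_phi**2 - tx**2 / ell_phi**4) * kf
uv_ref = tx * ty * (kp / ell_psi**4 - kf / ell_phi**4)
vv_ref = (1 / ell_psi**2 - tx**2 / ell_psi**4) * kp + (1 / ell_phi**2 - ty**2 / ell_phi**4) * kf
```

Passing the four covariance functions separately keeps `helmholtz_cov` a line-by-line transcription of the formulas, so a non-zero cross-covariance can be swapped in for `zero` without touching the rest.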

The fields \(\psi\) and \(\phi\) are defined over two dimensions \(x\) and \(y\), and it is most common to define the covariance functions as stationary (so that the covariance depends only on the separation \(\textbf{x} - \textbf{x}'\), not on the absolute location) and isotropic (so that it depends on the separation only through the Euclidean distance). Define distance as \(r = (\tau_x^2 + \tau_y^2)^{1/2}\) where \(\tau_x = x - x'\) and \(\tau_y = y - y'\). Before we proceed, let’s establish the following relations: \[\frac{\mathrm{d} r}{\mathrm{d} \tau_y} = \frac{\tau_y}{(\tau_x^2 + \tau_y^2)^{1/2}} = \frac{\tau_y}{r}, \tag{1}\] \[\frac{\mathrm{d} \tau_y}{\mathrm{d} y} = 1, \quad \frac{\mathrm{d} \tau_y}{\mathrm{d} y'} = -1, \tag{2}\] \[\frac{\mathrm{d}}{\mathrm{d} \tau_y} \frac{\tau_y}{r} = \frac{\mathrm{d}}{\mathrm{d} \tau_y} \tau_y(\tau_x^2 + \tau_y^2)^{-1/2} = (\tau_x^2 + \tau_y^2)^{-1/2} - \tau_y^2(\tau_x^2 + \tau_y^2)^{-3/2} = \frac{1}{r} - \frac{\tau_y^2}{r^3} = \frac{\tau_x^2}{r^3}. \tag{3}\] Similar relations follow for \(x\) and \(\tau_x\). Now consider \(\partial_{yy'}\mathrm{cov}[\psi, \psi]\), where the covariance is a function of \(r\) only. This may be expressed as \[\begin{align}\frac{\mathrm{d}^2}{\mathrm{d} y \, \mathrm{d} y'} \mathrm{cov}[\psi, \psi] &= \frac{\mathrm{d}}{\mathrm{d} y'} \left( \frac{\mathrm{d}}{\mathrm{d} r} \mathrm{cov}[\psi, \psi] \frac{\mathrm{d} r}{\mathrm{d} \tau_y} \frac{\mathrm{d} \tau_y}{\mathrm{d} y} \right) \\ &= \frac{\mathrm{d}}{\mathrm{d} y'} \left( \mathrm{cov}'[\psi, \psi] \cdot \frac{\tau_y}{r} \right) \end{align}\] where we have made use of the chain rule and Equations 1 and 2. Here \(\mathrm{cov}'[\psi, \psi] \equiv \partial_r \mathrm{cov}[\psi, \psi]\) denotes the derivative with respect to \(r\), and similarly \(\mathrm{cov}''[\psi, \psi] \equiv \partial_{rr} \mathrm{cov}[\psi, \psi]\) below.
Now, employing the product rule, we get \[\begin{align}\frac{\mathrm{d}^2}{\mathrm{d} y \, \mathrm{d} y'} \mathrm{cov}[\psi, \psi] &= \left(\frac{\mathrm{d}}{\mathrm{d} y'} \mathrm{cov}'[\psi, \psi] \right) \cdot \frac{\tau_y}{r} + \left(\frac{\mathrm{d}}{\mathrm{d} y'} \frac{\tau_y}{r} \right) \cdot \mathrm{cov}'[\psi, \psi] \\ &= \left(- \mathrm{cov}''[\psi, \psi] \cdot \frac{\tau_y}{r}\right) \cdot \frac{\tau_y}{r} + \left( \frac{\mathrm{d}}{\mathrm{d} \tau_y} \frac{\tau_y}{r} \frac{\mathrm{d} \tau_y}{\mathrm{d} y'} \right) \cdot \mathrm{cov}'[\psi, \psi] \\ &= - \left( \frac{\tau_y^2}{r^2} \cdot \mathrm{cov}''[\psi, \psi] + \frac{\tau_x^2}{r^3} \cdot \mathrm{cov}'[\psi, \psi] \right)\end{align}\] where the second line applies the chain rule to each factor and the final step uses Equations 2 and 3. This leads to Equation (10) in our paper, and Equations (11) and (12) follow similarly.
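A quick numerical sanity check of this final expression is sketched below. It assumes a squared-exponential covariance in \(r\) with a hypothetical length scale `ELL`, for which \(\mathrm{cov}'\) and \(\mathrm{cov}''\) are easy to write down, and compares the derived formula against a central finite difference in \(y\) and \(y'\). The function names are illustrative.

```python
import math

ELL = 1.2  # length scale (hypothetical value)

def cov(r):
    """Isotropic squared-exponential covariance as a function of r (assumed)."""
    return math.exp(-0.5 * r**2 / ELL**2)

def cov_p(r):
    """First derivative of cov with respect to r."""
    return -r / ELL**2 * cov(r)

def cov_pp(r):
    """Second derivative of cov with respect to r."""
    return (r**2 / ELL**4 - 1.0 / ELL**2) * cov(r)

def d_yy_cov(tau_x, tau_y):
    """The derived expression for d^2 cov / (dy dy'), valid for r > 0."""
    r = math.hypot(tau_x, tau_y)
    return -(tau_y**2 / r**2 * cov_pp(r) + tau_x**2 / r**3 * cov_p(r))

def d_yy_fd(x, y, xp, yp, h=1e-3):
    """Central finite difference of the same mixed derivative."""
    def c(y_, yp_):
        return cov(math.hypot(x - xp, y_ - yp_))
    return (c(y + h, yp + h) - c(y + h, yp - h)
            - c(y - h, yp + h) + c(y - h, yp - h)) / (4 * h * h)

x, y, xp, yp = 0.4, -0.2, 1.0, 0.5
tau_x, tau_y = x - xp, y - yp
analytic = d_yy_cov(tau_x, tau_y)
numeric = d_yy_fd(x, y, xp, yp)
```

For this particular kernel the expression simplifies to \((1/\ell^2 - \tau_y^2/\ell^4)\,\mathrm{cov}\), which gives a second, independent check. Note that the formula divides by \(r\), so the case \(r = 0\) has to be handled as a limit.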

Contact

Please email me if you have questions, edits, contributions, or suggestions for improvements, or just to say thanks. I understand that non-refereed online posts are not citable in most academic work. If you need a citation, please consider our recent article.