Algorithm Overview

This page explains the algorithm behind mpspline.

Background

mpspline implements the mass-preserving equal-area quadratic smoothing spline algorithm described in:

Bishop, T.F.A., McBratney, A.B., & Laslett, G.M. (1999). Modelling soil attribute depth functions with equal-area quadratic smoothing splines. Geoderma, 89(3-4), 185-208.

What It Does

Given soil horizon data at varying depths, the algorithm:

  1. Fits a smooth function through the data

  2. Generates 1cm predictions across the profile depth

  3. Averages predictions to standardized depth intervals

  4. Preserves mass: Average values at standard depths equal the original data

Why It Matters

Soil data is usually collected at irregular depths (horizons):

Depth (cm)    Clay (%)
0-20          24.5      ← Ap horizon
20-50         35.2      ← Bt horizon
50-80         32.0      ← BC horizon

But analysis often requires standardized depths:

Depth (cm)    Clay (%)
0-5           21.4      ← Interpolated
5-15          24.1      ← Interpolated
15-30         29.8      ← Interpolated
30-60         34.9      ← Interpolated
60-100        31.2      ← Interpolated

The spline algorithm ensures these new estimates have the same average value as the original horizons.

How It Works

The algorithm has three main steps:

Step 1: Estimate Spline Parameters

  • Fit a continuous function using quadratic spline pieces

  • Each piece defined by parameters at control points

  • Smoothing parameter (lambda) controls how tightly the spline fits

Step 2: Generate Predictions

  • Use the fitted spline to predict values at 1cm intervals

  • Creates a continuous depth function from discrete horizons

  • Ensures mass is preserved across depth intervals

Step 3: Aggregate to Standard Depths

  • Average predictions within each target depth interval

  • Output harmonized values at standardized depths

  • Result: Values that represent the horizon averages

Implementation Details

The roughness penalty matrix \(\mathbf{R}\) is constructed with diagonal elements:

\[R_{ii} = 2(h_i + h_{i+1}) + 6g_i\]

Where \(h_i\) is the thickness of layer \(i\) and \(g_i\) is the gap between layers.

Key Properties

Mass Preservation

The algorithm ensures:

\[\frac{\sum_{d_{top}}^{d_{bottom}} f(d)}{d_{bottom} - d_{top}} = \text{original average}\]

In plain English: The average of the spline function across the original horizon depth equals the original value.

Smoothing Parameter (Lambda)

Controls the trade-off between smoothness and fit:

  • Small lambda (0.001-0.01): Very smooth predictions

  • Default lambda (0.1): Balanced smoothness and fit

  • Large lambda (1.0-10.0): Closer fit to original data

Continuity

The function is: - Continuous: No jumps - Smooth: No sharp angles - Differentiable: Can compute slopes

Parameters That Matter

lambda (lam)

Smoothing parameter (default: 0.1)

Controls the trade-off between: - Smoothness: Smaller lambda = smoother curve - Fidelity: Larger lambda = follows data more closely

Example:

# Smooth curve for noisy data
result = mpspline(profile, lam=0.01)

# Balanced (default)
result = mpspline(profile, lam=0.1)

# Follows data closely
result = mpspline(profile, lam=1.0)

vlow and vhigh

Value constraints (default: 0.0 and 1000.0)

Ensures predictions stay within realistic ranges:

# Clay: 0-100%
result = mpspline(profile, var_name=['clay'], vlow=0, vhigh=100)

# Bulk density: 1-2 g/cm³
result = mpspline(
    profile,
    var_name=['bdensity'],
    vlow=1.0,
    vhigh=2.0
)

Strengths

  1. Mass preservation: Maintains original horizon averages

  2. Smooth predictions: No artificial discontinuities

  3. Flexibility: Works with any horizon configuration

  4. Robustness: Handles variable number of horizons

  5. Speed: Efficient computation

Limitations

  1. Extrapolation: Cannot extend beyond profile depth

  2. Few horizons: Requires minimum 2 horizons per profile

  3. Assumption: Assumes smooth depth function (not appropriate for all properties)

  4. Parameter selection: Choice of lambda affects results

When to Use Different Lambda Values

Use lambda = 0.01 (smooth): - Noisy field measurements - Estimate is uncertain - Want conservative, smooth predictions - Variability between horizons seems excessive

Use lambda = 0.1 (default): - Normal soil profile data - Good data quality - Want balanced fit - When in doubt, use this

Use lambda = 1.0 (fit close to data): - Very accurate laboratory measurements - High precision data - Want predictions close to observed values - Large number of horizons with clear trends

Algorithm Diagram

Simplified flow:

Input Horizons
     ↓
Estimate Spline Parameters (using lambda)
     ↓
Generate Predictions at 1cm intervals
     ↓
Apply Constraints (vlow, vhigh)
     ↓
Aggregate to Standard Depths
     ↓
Output Harmonized Values

Example Walkthrough

Given this profile:

Horizon  | Depth    | Clay
---------|----------|------
Ap       | 0-20cm   | 24.5%
Bt       | 20-50cm  | 35.2%

Step 1: Fit spline function

The algorithm finds parameters that: - Pass through (0, 24.5) and (20, 24.5) boundary - Pass through (20, 35.2) and (50, 35.2) boundary - Smooth transition between horizons - Controlled by lambda parameter

Step 2: Generate 1cm predictions

Spline function evaluated at every 1cm:

Depth  | Prediction
-------|----------
0-1cm  | 24.1%
1-2cm  | 24.2%
...
20-21cm| 28.5%
...
50cm   | 35.2%

Step 3: Aggregate to standard depths

Average predictions within intervals:

Interval    | Average of Predictions
------------|---------------------
0-5cm       | (24.1+24.2+...)/5 = 24.3%
5-15cm      | (average of 5-15)  = 24.5%
15-30cm     | (average of 15-30) = 28.9%
30-60cm     | (average of 30-60) = 35.0%

Result: Harmonized values that preserve original horizon averages!

Further Reading