
Discovery API Reference

Equation discovery framework for recovering governing equations from data.

SINDy Module

SINDy: Sparse Identification of Nonlinear Dynamics.

JAX-native implementation of the SINDy algorithm family for discovering governing equations from time-series data.

Reference

Brunton et al. (2016) "Discovering governing equations from data by sparse identification of nonlinear dynamical systems"

EnsembleSINDyConfig dataclass

EnsembleSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_models: int = 20, bagging_fraction: float = 0.8, library_dropout: float = 0.0)

Bases: SINDyConfig

Configuration for ensemble SINDy.

Attributes:

Name Type Description
n_models int

Number of models in the ensemble.

bagging_fraction float

Fraction of data for each bootstrap sample.

library_dropout float

Fraction of library terms to drop per model.

SINDyConfig dataclass

SINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq')

Configuration for the SINDy sparse identification algorithm.

Attributes:

Name Type Description
polynomial_degree int

Maximum polynomial degree for candidate library.

threshold float

Sparsity threshold for STLSQ optimizer.

alpha float

L2 regularization strength for ridge regression.

max_iter int

Maximum STLSQ iterations.

include_trig bool

Include trigonometric basis functions.

n_frequencies int

Number of Fourier frequencies (if trig enabled).

optimizer str

Optimizer name ('stlsq' or 'sr3').

WeakSINDyConfig dataclass

WeakSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_subdomains: int = 100, test_function_order: int = 4)

Bases: SINDyConfig

Configuration for weak-form SINDy (noise-robust variant).

Attributes:

Name Type Description
n_subdomains int

Number of integration subdomains.

test_function_order int

Order of the polynomial test function.

EnsembleSINDy

EnsembleSINDy(config: EnsembleSINDyConfig)

Ensemble SINDy with bootstrap aggregation for uncertainty.

Fits multiple SINDy models on data subsets (bagging) and reports coefficient statistics (mean, std) across the ensemble. This provides uncertainty estimates on discovered equation terms.
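The bagging idea can be sketched in plain NumPy. This is a minimal illustration rather than the library's implementation: `fit_sparse` and `bagged_coefficients` are hypothetical helpers, a thresholded least-squares fit stands in for the configured optimizer, sampling with replacement is assumed, and NumPy's `default_rng` replaces the JAX PRNG key.

```python
import numpy as np


def fit_sparse(theta, x_dot, threshold=0.1):
    """Hypothetical stand-in for a sparse regressor: least squares, then hard threshold."""
    coef, *_ = np.linalg.lstsq(theta, x_dot, rcond=None)
    coef[np.abs(coef) < threshold] = 0.0
    return coef.T  # shape (n_targets, n_library)


def bagged_coefficients(theta, x_dot, n_models=20, bagging_fraction=0.8, seed=0):
    """Fit one model per bootstrap sample and summarise the coefficients."""
    rng = np.random.default_rng(seed)
    n_samples = theta.shape[0]
    n_sub = max(1, int(bagging_fraction * n_samples))
    coef_list = []
    for _ in range(n_models):
        # Assumption: rows are drawn with replacement for each ensemble member.
        idx = rng.choice(n_samples, size=n_sub, replace=True)
        coef_list.append(fit_sparse(theta[idx], x_dot[idx]))
    stacked = np.stack(coef_list)  # (n_models, n_targets, n_library)
    return stacked.mean(axis=0), stacked.std(axis=0), coef_list


# Toy problem: dx/dt = -2 x with a library of [1, x, x^2].
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
theta = np.hstack([np.ones_like(x), x, x**2])
x_dot = -2.0 * x + 0.01 * rng.standard_normal(x.shape)

coef_mean, coef_std, coef_list = bagged_coefficients(theta, x_dot)
```

A term whose standard deviation is large relative to its mean is a candidate for being spurious.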

Attributes:

Name Type Description
config

Ensemble configuration.

coef_mean ndarray | None

Mean coefficients across ensemble, shape (n_targets, n_library).

coef_std ndarray | None

Std of coefficients across ensemble, shape (n_targets, n_library).

coef_list list[ndarray]

List of all individual model coefficients.

Parameters:

Name Type Description Default
config EnsembleSINDyConfig

Ensemble configuration with n_models, bagging_fraction, etc.

required

fit

fit(x: ndarray, x_dot: ndarray, *, key: Array) -> None

Fit ensemble of SINDy models via bootstrap aggregation.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required
x_dot ndarray

Time derivatives, shape (n_samples, n_features).

required
key Array

JAX PRNG key for random subsampling.

required

predict

predict(x: ndarray) -> ndarray

Predict using mean ensemble coefficients.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Predicted derivatives using mean coefficients.

equations

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get equations using mean coefficients with uncertainty.

Parameters:

Name Type Description Default
input_names Sequence[str] | None

Names for state variables.

None
precision int

Decimal places for values.

3

Returns:

Type Description
list[str]

Equation strings with coefficient ± std notation.

CandidateLibrary

CandidateLibrary(polynomial_degree: int = 2, include_trig: bool = False, n_frequencies: int = 1, custom_functions: Sequence[Callable] | None = None)

Generates candidate function library for SINDy.

Constructs a feature matrix Theta(x) where each column is a candidate nonlinear function evaluated on the data. The SINDy algorithm then finds the sparse subset of columns that best predicts the derivatives.
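As a rough picture of what Theta(x) looks like for the polynomial part, the sketch below builds all monomials up to a given degree. `polynomial_library` is a hypothetical helper; the column ordering, the inclusion of a constant column, and the naming scheme are assumptions and may differ from what `transform` and `get_feature_names` produce.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_library(x, degree=2, input_names=None):
    """Return (Theta, names) with one column per monomial up to `degree`."""
    n_samples, n_features = x.shape
    if input_names is None:
        input_names = [f"x{i}" for i in range(n_features)]
    columns = [np.ones(n_samples)]  # assumed constant term
    names = ["1"]
    for d in range(1, degree + 1):
        # Each multiset of d feature indices is one monomial, e.g. (0, 1) -> x * y.
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(x[:, list(combo)], axis=1))
            names.append(" ".join(input_names[i] for i in combo))
    return np.column_stack(columns), names


x = np.array([[1.0, 2.0], [3.0, 4.0]])
theta, names = polynomial_library(x, degree=2, input_names=["x", "y"])
```

For two features and degree 2 this gives six columns: the constant, `x`, `y`, `x x`, `x y`, and `y y`.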

Attributes:

Name Type Description
polynomial_degree

Maximum polynomial degree.

include_trig

Whether to include sin/cos basis functions.

n_frequencies

Number of Fourier frequencies.

custom_functions

Optional list of custom basis functions.

Parameters:

Name Type Description Default
polynomial_degree int

Maximum degree of polynomial terms.

2
include_trig bool

Include sin/cos of each feature.

False
n_frequencies int

Number of Fourier frequencies (if trig enabled).

1
custom_functions Sequence[Callable] | None

List of callables f(x) -> array, where x is (n_samples, n_features) and output is (n_samples, 1) or (n_samples,).

None

transform

transform(x: ndarray) -> ndarray

Build the candidate library matrix Theta(x).

Parameters:

Name Type Description Default
x ndarray

Data matrix of shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Library matrix Theta of shape (n_samples, n_library_terms).

get_feature_names

get_feature_names(input_names: Sequence[str] | None = None) -> list[str]

Get human-readable names for library terms.

Parameters:

Name Type Description Default
input_names Sequence[str] | None

Names for input features (e.g., ['x', 'y', 'z']).

None

Returns:

Type Description
list[str]

List of feature name strings.

SR3

SR3(threshold: float = 0.1, nu: float = 1.0, max_iter: int = 30, tol: float = 1e-05, regularization: str = 'l0')

Sparse Relaxed Regularized Regression optimizer.

Minimizes: 0.5||y - Xw||² + lambda * R(u) + (0.5/nu) * ||w - u||²

where R is a regularization penalty (L0, L1, or L2) and w, u are alternately optimized.
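The alternating scheme can be sketched for the L1 case, where both updates have closed forms: the w-step is a linear solve and the u-step is soft thresholding at lambda * nu. `sr3_l1` is a hypothetical helper written against the objective above; it is not the library's implementation, and the default `'l0'` penalty would use hard thresholding in the u-step instead.

```python
import numpy as np


def sr3_l1(x, y, threshold=0.1, nu=1.0, max_iter=30, tol=1e-5):
    """Alternate exact minimisation over w and u for R(u) = ||u||_1."""
    n_features = x.shape[1]
    # w-step normal equations: (X^T X + I / nu) w = X^T y + u / nu
    gram = x.T @ x + np.eye(n_features) / nu
    xty = x.T @ y
    u = np.zeros((n_features, y.shape[1]))
    for _ in range(max_iter):
        w = np.linalg.solve(gram, xty + u / nu)
        # u-step: proximal operator of (lambda * nu) * ||.||_1, i.e. soft thresholding.
        u_new = np.sign(w) * np.maximum(np.abs(w) - threshold * nu, 0.0)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u.T  # sparse coefficients, shape (n_targets, n_features)


rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
y = 1.5 * x[:, [1]] + 0.01 * rng.standard_normal((100, 1))
coef = sr3_l1(x, y)
```

With the L1 penalty the surviving coefficient is shrunk towards zero by roughly lambda * nu, which is one reason a non-shrinking penalty such as L0 can be preferable when coefficient magnitudes matter.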

References

Zheng et al. (2019) "A unified framework for sparse relaxed regularized regression"

Parameters:

Name Type Description Default
threshold float

Sparsity threshold (lambda in the SR3 formulation).

0.1
nu float

Relaxation parameter controlling w-u coupling.

1.0
max_iter int

Maximum number of alternating iterations.

30
tol float

Convergence tolerance.

1e-05
regularization str

Regularization type ('l0', 'l1', or 'l2').

'l0'

fit

fit(x: ndarray, y: ndarray) -> ndarray

Fit sparse coefficients via SR3.

Parameters:

Name Type Description Default
x ndarray

Library feature matrix, shape (n_samples, n_features).

required
y ndarray

Target derivatives, shape (n_samples, n_targets).

required

Returns:

Type Description
ndarray

Sparse coefficient matrix, shape (n_targets, n_features).

STLSQ

STLSQ(threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20)

Sequential Thresholded Least Squares optimizer.

The canonical SINDy optimizer. Alternates between ridge regression and hard thresholding until the support (set of nonzero coefficients) stabilizes.

Algorithm
  1. Initialize via ridge regression
  2. Zero out coefficients below threshold
  3. Re-fit on remaining (nonzero) features
  4. Repeat until convergence or max_iter
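The four steps above can be sketched in NumPy as follows. `ridge` and `stlsq` are hypothetical helpers that follow the listed algorithm; details such as the exact convergence test are assumptions and may differ from the library's `STLSQ.fit`.

```python
import numpy as np


def ridge(x, y, alpha):
    """Closed-form ridge regression: (X^T X + alpha I)^-1 X^T y."""
    return np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ y)


def stlsq(x, y, threshold=0.1, alpha=0.05, max_iter=20):
    coef = ridge(x, y, alpha)                    # 1. initialise via ridge regression
    support = np.abs(coef) >= threshold          # 2. zero out small coefficients
    for _ in range(max_iter):
        coef = np.zeros_like(coef)
        for j in range(y.shape[1]):
            active = support[:, j]
            if active.any():
                # 3. re-fit each target on its remaining features only
                coef[active, j] = ridge(x[:, active], y[:, [j]], alpha)[:, 0]
        new_support = np.abs(coef) >= threshold
        coef[~new_support] = 0.0
        if np.array_equal(new_support, support):  # 4. stop once the support is stable
            break
        support = new_support
    return coef.T  # shape (n_targets, n_features)


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
true_coef = np.array([[2.0, 0.0, -0.5], [0.0, -1.0, 0.0]])
y = x @ true_coef.T + 0.01 * rng.standard_normal((200, 2))
coef = stlsq(x, y)
```

Because thresholding and re-fitting are done per target, each state variable ends up with its own sparsity pattern.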

Parameters:

Name Type Description Default
threshold float

Coefficients with absolute value below this are zeroed.

0.1
alpha float

L2 regularization parameter for ridge regression.

0.05
max_iter int

Maximum number of thresholding iterations.

20

fit

fit(x: ndarray, y: ndarray) -> ndarray

Fit sparse coefficients via STLSQ.

Parameters:

Name Type Description Default
x ndarray

Library feature matrix, shape (n_samples, n_features).

required
y ndarray

Target derivatives, shape (n_samples, n_targets).

required

Returns:

Type Description
ndarray

Sparse coefficient matrix, shape (n_targets, n_features).

SINDy

SINDy(config: SINDyConfig | None = None)

Sparse Identification of Nonlinear Dynamics.

Discovers governing equations from data by building a library of candidate nonlinear functions and using sparse regression to find the subset that best explains the observed dynamics.

Usage::

config = SINDyConfig(polynomial_degree=2, threshold=0.1)
model = SINDy(config)
model.fit(x, x_dot)

print(model.equations(["x", "y", "z"]))
x_dot_pred = model.predict(x)

Attributes:

Name Type Description
config

SINDy configuration.

library

Candidate function library.

coefficients ndarray | None

Fitted sparse coefficient matrix (n_targets, n_library_terms).

Parameters:

Name Type Description Default
config SINDyConfig | None

Model configuration. Uses defaults if None.

None

fit

fit(x: ndarray, x_dot: ndarray) -> Self

Fit the SINDy model to data.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required
x_dot ndarray

Time derivatives, shape (n_samples, n_features).

required

Returns:

Type Description
Self

self for method chaining.

predict

predict(x: ndarray) -> ndarray

Predict derivatives using the discovered model.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Predicted derivatives, shape (n_samples, n_features).

Raises:

Type Description
RuntimeError

If model has not been fit.

equations

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get human-readable equation strings.

Parameters:

Name Type Description Default
input_names Sequence[str] | None

Names for state variables (e.g., ['x', 'y', 'z']).

None
precision int

Decimal places for coefficient values.

3

Returns:

Type Description
list[str]

List of equation strings, one per state variable.

Raises:

Type Description
RuntimeError

If model has not been fit.
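To illustrate how a coefficient matrix and a list of feature names map to equation strings, here is a small sketch. `format_equations` is a hypothetical helper, and the `d<name>/dt = ...` layout is an assumption; the strings returned by `equations` may be formatted differently.

```python
import numpy as np


def format_equations(coefficients, feature_names, input_names, precision=3):
    """Build one string per state variable, skipping zero coefficients."""
    equations = []
    for name, row in zip(input_names, coefficients):
        terms = [
            f"{c:.{precision}f} {feat}"
            for c, feat in zip(row, feature_names)
            if c != 0.0
        ]
        rhs = " + ".join(terms) if terms else "0"
        equations.append(f"d{name}/dt = {rhs}")
    return equations


# Rows are state variables, columns are library terms [1, x, y].
coefficients = np.array([[0.0, -10.0, 10.0], [0.0, 28.0, -1.0]])
eqs = format_equations(coefficients, ["1", "x", "y"], ["x", "y"])
```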

feature_names

feature_names(input_names: Sequence[str] | None = None) -> list[str]

Get library feature names.

Parameters:

Name Type Description Default
input_names Sequence[str] | None

Names for state variables.

None

Returns:

Type Description
list[str]

List of feature name strings.

score

score(x: ndarray, x_dot: ndarray) -> float

Compute R² score of the model.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required
x_dot ndarray

True derivatives, shape (n_samples, n_features).

required

Returns:

Type Description
float

R² coefficient of determination.
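For reference, R² compares the residual sum of squares with the variance of the true derivatives. The sketch below pools all state variables into a single score, which is an assumption; the library may instead average per-variable scores.

```python
import numpy as np


def r2_score(x_dot_true, x_dot_pred):
    """1 - SS_res / SS_tot, pooled over every sample and state variable."""
    ss_res = np.sum((x_dot_true - x_dot_pred) ** 2)
    ss_tot = np.sum((x_dot_true - x_dot_true.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)


x_dot_true = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
perfect = r2_score(x_dot_true, x_dot_true)
# Predicting the per-variable mean everywhere is the zero-skill baseline.
baseline = r2_score(x_dot_true, np.tile(x_dot_true.mean(axis=0), (3, 1)))
```

A perfect prediction scores 1, the mean baseline scores 0, and a model worse than the baseline scores below 0.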

WeakSINDy

WeakSINDy(config: WeakSINDyConfig)

Weak-form SINDy for noise-robust equation discovery.

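Instead of computing noisy pointwise derivatives, integrates the governing equations against smooth bump test functions over overlapping time subdomains. Integration by parts moves the time derivative onto the test function, so the data are never differentiated directly. This makes the method robust to substantial measurement noise (low signal-to-noise ratios).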
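The weak formulation can be sketched for a scalar system. For a test function phi that vanishes at both ends of a subdomain, the integral of phi * dx/dt equals minus the integral of dphi/dt * x, so each subdomain yields one linear equation in the unknown coefficients. All helpers below are hypothetical, and the polynomial bump (1 - s^2)^p, the trapezoidal quadrature, and the subdomain layout are assumptions rather than the library's choices.

```python
import numpy as np


def bump(s, p=4):
    """Polynomial test function (1 - s^2)^p on [-1, 1]; zero at both endpoints."""
    return (1.0 - s**2) ** p


def bump_derivative(s, p=4):
    """Derivative of (1 - s^2)^p with respect to s."""
    return -2.0 * p * s * (1.0 - s**2) ** (p - 1)


def trapezoid(f, t):
    """Trapezoidal rule for samples f on the grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))


def weak_system(x, theta, t, n_subdomains=20, width=100, p=4):
    """One row per subdomain: integral(phi * Theta) xi = -integral(dphi/dt * x)."""
    starts = np.linspace(0, len(t) - width, n_subdomains).astype(int)
    lhs, rhs = [], []
    for s0 in starts:
        sl = slice(s0, s0 + width)
        tk = t[sl]
        half = 0.5 * (tk[-1] - tk[0])
        s = (tk - tk[0]) / half - 1.0           # map the subdomain onto [-1, 1]
        phi = bump(s, p)
        dphi = bump_derivative(s, p) / half     # chain rule: ds/dt = 1 / half
        lhs.append([trapezoid(phi * theta[sl, j], tk) for j in range(theta.shape[1])])
        rhs.append([-trapezoid(dphi * x[sl, i], tk) for i in range(x.shape[1])])
    return np.array(lhs), np.array(rhs)


t = np.linspace(0.0, 10.0, 1001)
x = np.exp(-0.5 * t)[:, None]                   # solves dx/dt = -0.5 x
theta = np.hstack([np.ones_like(x), x, x**2])   # library: 1, x, x^2
G, b = weak_system(x, theta, t)
coef, *_ = np.linalg.lstsq(G, b, rcond=None)
```

In a full pipeline the final least-squares solve would be replaced by a sparse optimizer such as STLSQ; plain least squares is used here only to keep the sketch short.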

Usage::

config = WeakSINDyConfig(polynomial_degree=2, n_subdomains=50)
model = WeakSINDy(config)
model.fit(x_data, t)
print(model.equations(["x", "y"]))

Attributes:

Name Type Description
config

WeakSINDy configuration.

coefficients ndarray | None

Fitted sparse coefficient matrix.

Parameters:

Name Type Description Default
config WeakSINDyConfig

Configuration with polynomial degree, threshold, number of subdomains, and test function order.

required

fit

fit(x: ndarray, t: ndarray) -> Self

Fit WeakSINDy model to time-series data.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required
t ndarray

Time array, shape (n_samples,).

required

Returns:

Type Description
Self

self for method chaining.

predict

predict(x: ndarray) -> ndarray

Predict derivatives using discovered model.

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Predicted derivatives, shape (n_samples, n_features).

equations

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get human-readable equation strings.

Parameters:

Name Type Description Default
input_names Sequence[str] | None

Names for state variables.

None
precision int

Decimal places for coefficient values.

3

Returns:

Type Description
list[str]

List of equation strings.

score

score(x: ndarray, x_dot: ndarray) -> float

Compute R² score against true derivatives.

Parameters:

Name Type Description Default
x ndarray

State data.

required
x_dot ndarray

True derivatives.

required

Returns:

Type Description
float

R² coefficient of determination.

distill_ude_residual

distill_ude_residual(neural_residual: Callable[[Array], Array], x_eval: ndarray, *, config: SINDyConfig | None = None) -> SINDy

Distill a UDE neural residual into symbolic equations via SINDy.

Evaluates the neural residual on sample data and uses SINDy to find the sparsest symbolic representation of the learned function.
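Conceptually, distillation is ordinary sparse regression in which the targets come from the network instead of from measured derivatives. The sketch below uses a hand-written function as a stand-in for a trained network and a thresholded least-squares fit in place of SINDy; every name in it is hypothetical.

```python
import numpy as np


def neural_residual(x):
    """Stand-in for a trained network; pretend it has learned -0.5 * x^2."""
    return -0.5 * x**2


x_eval = np.linspace(-2.0, 2.0, 50)[:, None]    # (n_samples, state_dim)
residual = np.asarray(neural_residual(x_eval))  # evaluate the residual on the sample data

# Candidate library [1, x, x^2], then least squares with a hard threshold.
theta = np.hstack([np.ones_like(x_eval), x_eval, x_eval**2])
coef, *_ = np.linalg.lstsq(theta, residual, rcond=None)
coef[np.abs(coef) < 0.1] = 0.0
```

Only the `x^2` column survives with a coefficient of about -0.5, which is the symbolic form of the stand-in residual.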

Parameters:

Name Type Description Default
neural_residual Callable[[Array], Array]

Trained neural network module that maps state to residual dynamics. Should accept input of shape (batch, state_dim).

required
x_eval ndarray

Evaluation data, shape (n_samples, state_dim).

required
config SINDyConfig | None

SINDy configuration. Uses defaults if None.

None

Returns:

Type Description
SINDy

Fitted SINDy model whose coefficients represent the symbolic form of the neural residual.

finite_difference

finite_difference(x: ndarray, dt: float | ndarray, order: int = 2) -> ndarray

Compute time derivatives via centered finite differences.

Uses second-order centered differences for interior points and first-order forward/backward differences at boundaries.
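The stencil described above can be written out directly. `centered_difference` is a hypothetical helper that assumes a scalar, uniform `dt`; it reproduces what `numpy.gradient` does with its default `edge_order=1`.

```python
import numpy as np


def centered_difference(x, dt):
    """Centered differences inside, one-sided differences at the two ends."""
    dx = np.empty_like(x, dtype=float)
    dx[1:-1] = (x[2:] - x[:-2]) / (2.0 * dt)  # second-order centered, interior
    dx[0] = (x[1] - x[0]) / dt                # first-order forward, first sample
    dx[-1] = (x[-1] - x[-2]) / dt             # first-order backward, last sample
    return dx


t = np.linspace(0.0, 1.0, 11)
x = np.column_stack([t**2, 3.0 * t])
x_dot = centered_difference(x, dt=t[1] - t[0])
```

The centered stencil is exact for quadratics, so the interior rows of the first column equal `2 * t`, while the two boundary rows carry a first-order error.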

Parameters:

Name Type Description Default
x ndarray

State data, shape (n_samples, n_features).

required
dt float | ndarray

Time step between samples.

required
order int

Derivative order (only 1 supported currently).

2

Returns:

Type Description
ndarray

Estimated derivatives, shape (n_samples, n_features).

smooth_data

smooth_data(x: ndarray, window_size: int = 5) -> ndarray

Smooth data using a moving average filter.

Applies a uniform moving average along the time axis (axis 0). Boundary values are handled by truncating the kernel.
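One reading of "truncating the kernel" is that samples near the edges are averaged over whichever neighbours exist. The sketch below implements that interpretation; `moving_average` is a hypothetical helper, and the library's boundary handling may differ in detail.

```python
import numpy as np


def moving_average(x, window_size=5):
    """Uniform moving average along axis 0 with a shrinking window at the ends."""
    half = window_size // 2
    n_samples = x.shape[0]
    out = np.empty_like(x, dtype=float)
    for i in range(n_samples):
        lo, hi = max(0, i - half), min(n_samples, i + half + 1)
        out[i] = x[lo:hi].mean(axis=0)  # fewer points are averaged near the boundaries
    return out


x = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
smoothed = moving_average(x, window_size=3)
```

With `window_size=3` the first and last samples are each averaged with a single neighbour, and the interior samples with two.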

Parameters:

Name Type Description Default
x ndarray

Data matrix, shape (n_samples, n_features).

required
window_size int

Number of points in the averaging window.

5

Returns:

Type Description
ndarray

Smoothed data, shape (n_samples, n_features).

Symbolic Discovery

Symbolic regression wrapper for equation discovery.

Provides a thin bridge to PySR (Julia-based symbolic regression) as an optional dependency. When PySR is not installed, falls back to a simplified brute-force search over a small expression set.
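A brute-force search over a small expression set amounts to scoring every candidate and keeping the best one. The sketch below shows the idea with four hand-picked candidates; the candidate set, the mean-squared-error criterion, and the helper `best_expression` are all assumptions, not the library's fallback.

```python
import numpy as np

# Hypothetical candidate set: expression string -> callable on (n_samples, n_features).
candidates = {
    "x0": lambda x: x[:, 0],
    "x0**2": lambda x: x[:, 0] ** 2,
    "sin(x0)": lambda x: np.sin(x[:, 0]),
    "exp(x0)": lambda x: np.exp(x[:, 0]),
}


def best_expression(x, y):
    """Return the candidate expression with the lowest mean squared error."""
    errors = {expr: float(np.mean((f(x) - y) ** 2)) for expr, f in candidates.items()}
    return min(errors, key=errors.get)


x = np.linspace(0.0, 3.0, 50)[:, None]
y = np.sin(x[:, 0])
best = best_expression(x, y)
```

Exhaustive scoring only scales to very small expression sets, which is why an evolutionary search such as PySR is used when it is available.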

Reference

Cranmer (2023) "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl"

SymbolicRegressionConfig dataclass

SymbolicRegressionConfig(*, max_complexity: int = 20, populations: int = 30, niterations: int = 40, binary_operators: tuple[str, ...] = ('+', '-', '*', '/'), unary_operators: tuple[str, ...] = ('sin', 'cos', 'exp', 'sqrt'))

Configuration for symbolic regression.

Attributes:

Name Type Description
max_complexity int

Maximum expression complexity.

populations int

Number of evolutionary populations.

niterations int

Number of search iterations.

binary_operators tuple[str, ...]

Allowed binary operators.

unary_operators tuple[str, ...]

Allowed unary operators.

SymbolicRegressor

SymbolicRegressor(config: SymbolicRegressionConfig | None = None)

Symbolic regression for discovering closed-form expressions.

Uses PySR when available, otherwise falls back to a simple polynomial fit as a baseline.

Usage::

reg = SymbolicRegressor()
reg.fit(x, y)
print(reg.best_equation())
y_pred = reg.predict(x)

Parameters:

Name Type Description Default
config SymbolicRegressionConfig | None

Search configuration. Uses defaults if None.

None

fit

fit(x: Array, y: Array) -> None

Fit symbolic regression to data.

Parameters:

Name Type Description Default
x Array

Input features, shape (n_samples, n_features).

required
y Array

Target values, shape (n_samples,).

required

predict

predict(x: Array) -> ndarray

Predict using the discovered expression.

Parameters:

Name Type Description Default
x Array

Input features, shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Predicted values, shape (n_samples,).

best_equation

best_equation() -> str

Get the best discovered equation as a string.

Returns:

Type Description
str

Human-readable equation string.