Discovery API Reference¶

Equation discovery framework for recovering governing equations from data.

SINDy Module¶

SINDy: Sparse Identification of Nonlinear Dynamics.

JAX-native implementation of the SINDy algorithm family for discovering governing equations from time-series data.

Reference

Brunton et al. (2016) "Discovering governing equations from data by sparse identification of nonlinear dynamical systems"

EnsembleSINDyConfig `dataclass` ¶

EnsembleSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_models: int = 20, bagging_fraction: float = 0.8, library_dropout: float = 0.0)

Bases: SINDyConfig

Configuration for ensemble SINDy.

Attributes:

Name	Type	Description
`n_models`	`int`	Number of models in the ensemble.
`bagging_fraction`	`float`	Fraction of data for each bootstrap sample.
`library_dropout`	`float`	Fraction of library terms to drop per model.

SINDyConfig `dataclass` ¶

SINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq')

Configuration for the SINDy sparse identification algorithm.

Attributes:

Name	Type	Description
`polynomial_degree`	`int`	Maximum polynomial degree for candidate library.
`threshold`	`float`	Sparsity threshold for STLSQ optimizer.
`alpha`	`float`	L2 regularization strength for ridge regression.
`max_iter`	`int`	Maximum STLSQ iterations.
`include_trig`	`bool`	Include trigonometric basis functions.
`n_frequencies`	`int`	Number of Fourier frequencies (if trig enabled).
`optimizer`	`str`	Optimizer name ('stlsq' or 'sr3').

WeakSINDyConfig `dataclass` ¶

WeakSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_subdomains: int = 100, test_function_order: int = 4)

Bases: SINDyConfig

Configuration for weak-form SINDy (noise-robust variant).

Attributes:

Name	Type	Description
`n_subdomains`	`int`	Number of integration subdomains.
`test_function_order`	`int`	Order of the polynomial test function.

EnsembleSINDy ¶

EnsembleSINDy(config: EnsembleSINDyConfig)

Ensemble SINDy with bootstrap aggregation for uncertainty.

Fits multiple SINDy models on data subsets (bagging) and reports coefficient statistics (mean, std) across the ensemble. This provides uncertainty estimates on discovered equation terms.
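As a rough illustration of the bagging idea described above, the sketch below fits a regression on bootstrap row samples and aggregates the coefficients. It is not this module's implementation: `bagged_least_squares` is a hypothetical helper, a NumPy generator replaces the JAX PRNG key, sampling with replacement is an assumption, and ordinary least squares stands in for the sparse optimizer.

```python
import numpy as np

def bagged_least_squares(theta, x_dot, n_models=20, bagging_fraction=0.8, seed=0):
    """Fit least squares on random row subsets; return coefficient mean and std."""
    rng = np.random.default_rng(seed)
    n_samples = theta.shape[0]
    n_sub = max(1, int(bagging_fraction * n_samples))
    coef_list = []
    for _ in range(n_models):
        # Bootstrap sample of rows (with replacement, an assumption).
        idx = rng.choice(n_samples, size=n_sub, replace=True)
        coef, *_ = np.linalg.lstsq(theta[idx], x_dot[idx], rcond=None)
        coef_list.append(coef.T)  # (n_targets, n_library)
    stacked = np.stack(coef_list)
    return stacked.mean(axis=0), stacked.std(axis=0)

# Toy data: x_dot = -2 * x with a library of [1, x].
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
theta = np.hstack([np.ones_like(x), x])
x_dot = -2.0 * x + 0.01 * rng.standard_normal(x.shape)
coef_mean, coef_std = bagged_least_squares(theta, x_dot)
```

The per-term standard deviation indicates how stable each coefficient is across resamples; a term whose std is large relative to its mean is a weak candidate for the governing equation.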

Attributes:

Name	Type	Description
`config`	`EnsembleSINDyConfig`	Ensemble configuration.
`coef_mean`	`ndarray \| None`	Mean coefficients across ensemble, shape (n_targets, n_library).
`coef_std`	`ndarray \| None`	Std of coefficients across ensemble, shape (n_targets, n_library).
`coef_list`	`list[ndarray]`	List of all individual model coefficients.

Parameters:

Name	Type	Description	Default
`config`	`EnsembleSINDyConfig`	Ensemble configuration with n_models, bagging_fraction, etc.	required

fit ¶

fit(x: ndarray, x_dot: ndarray, *, key: Array) -> None

Fit ensemble of SINDy models via bootstrap aggregation.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`x_dot`	`ndarray`	Time derivatives, shape (n_samples, n_features).	required
`key`	`Array`	JAX PRNG key for random subsampling.	required

predict ¶

predict(x: ndarray) -> ndarray

Predict using mean ensemble coefficients.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Predicted derivatives using mean coefficients.

equations ¶

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get equations using mean coefficients with uncertainty.

Parameters:

Name	Type	Description	Default
`input_names`	`Sequence[str] \| None`	Names for state variables.	`None`
`precision`	`int`	Decimal places for values.	`3`

Returns:

Type	Description
`list[str]`	Equation strings with coefficient ± std notation.

CandidateLibrary ¶

CandidateLibrary(polynomial_degree: int = 2, include_trig: bool = False, n_frequencies: int = 1, custom_functions: Sequence[Callable] | None = None)

Generates candidate function library for SINDy.

Constructs a feature matrix Theta(x) where each column is a candidate nonlinear function evaluated on the data. The SINDy algorithm then finds the sparse subset of columns that best predicts the derivatives.
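To make the shape of Theta(x) concrete, here is a minimal sketch of a polynomial library. The helper `polynomial_library`, the column ordering, and the `x0*x1` naming scheme are assumptions for illustration and may differ from what `CandidateLibrary` produces.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_library(x, degree=2):
    """Return Theta(x) with all monomials up to `degree`, plus their names."""
    n_samples, n_features = x.shape
    columns = [np.ones(n_samples)]  # constant term
    names = ["1"]
    for d in range(1, degree + 1):
        # Each combination of feature indices defines one monomial of degree d.
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(x[:, list(combo)], axis=1))
            names.append("*".join(f"x{i}" for i in combo))
    return np.column_stack(columns), names

x = np.array([[1.0, 2.0], [3.0, 4.0]])
theta, names = polynomial_library(x, degree=2)
# theta has one row per sample and one column per candidate term.
```

With two features and degree 2, this yields six columns: the constant, two linear terms, and three quadratic terms.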

Attributes:

Name	Type	Description
`polynomial_degree`	`int`	Maximum polynomial degree.
`include_trig`	`bool`	Whether to include sin/cos basis functions.
`n_frequencies`	`int`	Number of Fourier frequencies.
`custom_functions`	`Sequence[Callable] \| None`	Optional list of custom basis functions.

Parameters:

Name	Type	Description	Default
`polynomial_degree`	`int`	Maximum degree of polynomial terms.	`2`
`include_trig`	`bool`	Include sin/cos of each feature.	`False`
`n_frequencies`	`int`	Number of Fourier frequencies (if trig enabled).	`1`
`custom_functions`	`Sequence[Callable] \| None`	List of callables f(x) -> array, where x is (n_samples, n_features) and output is (n_samples, 1) or (n_samples,).	`None`

transform ¶

transform(x: ndarray) -> ndarray

Build the candidate library matrix Theta(x).

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Data matrix of shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Library matrix Theta of shape (n_samples, n_library_terms).

get_feature_names ¶

get_feature_names(input_names: Sequence[str] | None = None) -> list[str]

Get human-readable names for library terms.

Parameters:

Name	Type	Description	Default
`input_names`	`Sequence[str] \| None`	Names for input features (e.g., ['x', 'y', 'z']).	`None`

Returns:

Type	Description
`list[str]`	List of feature name strings.

SR3 ¶

SR3(threshold: float = 0.1, nu: float = 1.0, max_iter: int = 30, tol: float = 1e-05, regularization: str = 'l0')

Sparse Relaxed Regularized Regression optimizer.

Minimizes: 0.5||y - Xw||² + lambda * R(u) + (0.5/nu) * ||w - u||²

where R is a regularization penalty (L0, L1, or L2) and w, u are alternately optimized.
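The alternating scheme for this objective can be sketched as follows for the L1 case. This is an illustrative NumPy version, not the `SR3` class itself: the function name `sr3_l1` is hypothetical, and returning the relaxed variable u (rather than w) as the sparse coefficients is an assumption.

```python
import numpy as np

def sr3_l1(X, y, lam=0.1, nu=1.0, max_iter=30, tol=1e-5):
    """Alternate a ridge-like solve for w with soft thresholding for u."""
    n_features = X.shape[1]
    # The w-update solves (X^T X + I/nu) w = X^T y + u/nu; the matrix is fixed.
    A = X.T @ X + np.eye(n_features) / nu
    Xty = X.T @ y
    u = np.zeros((n_features, y.shape[1]))
    for _ in range(max_iter):
        w = np.linalg.solve(A, Xty + u / nu)
        # Proximal step for lam * ||u||_1: soft threshold w at lam * nu.
        u_new = np.sign(w) * np.maximum(np.abs(w) - lam * nu, 0.0)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u.T  # (n_targets, n_features)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([[1.5], [0.0], [-2.0]])
coef = sr3_l1(X, y)
```

Note that the L1 penalty shrinks the surviving coefficients toward zero by roughly lam * nu, so the recovered values are slightly smaller in magnitude than the true ones; an L0 penalty would replace the soft threshold with a hard one.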

References

Zheng et al. (2019) "A unified framework for sparse relaxed regularized regression"

Parameters:

Name	Type	Description	Default
`threshold`	`float`	Sparsity threshold (lambda in the SR3 formulation).	`0.1`
`nu`	`float`	Relaxation parameter controlling w-u coupling.	`1.0`
`max_iter`	`int`	Maximum number of alternating iterations.	`30`
`tol`	`float`	Convergence tolerance.	`1e-05`
`regularization`	`str`	Regularization type ('l0', 'l1', or 'l2').	`'l0'`

fit ¶

fit(x: ndarray, y: ndarray) -> ndarray

Fit sparse coefficients via SR3.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Library feature matrix, shape (n_samples, n_features).	required
`y`	`ndarray`	Target derivatives, shape (n_samples, n_targets).	required

Returns:

Type	Description
`ndarray`	Sparse coefficient matrix, shape (n_targets, n_features).

STLSQ ¶

STLSQ(threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20)

Sequential Thresholded Least Squares optimizer.

The canonical SINDy optimizer. Alternates between ridge regression and hard thresholding until the support (set of nonzero coefficients) stabilizes.

Algorithm

1. Initialize via ridge regression.
2. Zero out coefficients below threshold.
3. Re-fit on remaining (nonzero) features.
4. Repeat until convergence or max_iter.
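The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the `STLSQ` class: the function name `stlsq` is hypothetical, and convergence is taken to mean that the support no longer changes between iterations.

```python
import numpy as np

def stlsq(theta, x_dot, threshold=0.1, alpha=0.05, max_iter=20):
    """Sequentially thresholded ridge regression; returns (n_targets, n_library)."""

    def ridge(A, b):
        return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ b)

    coef = ridge(theta, x_dot)  # initial fit, shape (n_library, n_targets)
    support = np.ones_like(coef, dtype=bool)
    for _ in range(max_iter):
        # Drop terms whose magnitude fell below the threshold.
        new_support = support & (np.abs(coef) >= threshold)
        coef = np.zeros_like(coef)
        for j in range(x_dot.shape[1]):
            keep = new_support[:, j]
            if keep.any():
                # Re-fit each target on its surviving library columns only.
                coef[keep, j] = ridge(theta[:, keep], x_dot[:, j])
        if np.array_equal(new_support, support):
            break  # support is stable
        support = new_support
    return coef.T

rng = np.random.default_rng(0)
theta = rng.standard_normal((200, 4))
true_coef = np.array([[3.0, 0.0, -1.5, 0.0], [0.0, 0.0, 0.0, 0.8]])
x_dot = theta @ true_coef.T + 0.01 * rng.standard_normal((200, 2))
coef = stlsq(theta, x_dot)
```

Each target is re-fit independently because different equations generally keep different library terms.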

Parameters:

Name	Type	Description	Default
`threshold`	`float`	Coefficients with absolute value below this are zeroed.	`0.1`
`alpha`	`float`	L2 regularization parameter for ridge regression.	`0.05`
`max_iter`	`int`	Maximum number of thresholding iterations.	`20`

fit ¶

fit(x: ndarray, y: ndarray) -> ndarray

Fit sparse coefficients via STLSQ.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Library feature matrix, shape (n_samples, n_features).	required
`y`	`ndarray`	Target derivatives, shape (n_samples, n_targets).	required

Returns:

Type	Description
`ndarray`	Sparse coefficient matrix, shape (n_targets, n_features).

SINDy ¶

SINDy(config: SINDyConfig | None = None)

Sparse Identification of Nonlinear Dynamics.

Discovers governing equations from data by building a library of candidate nonlinear functions and using sparse regression to find the subset that best explains the observed dynamics.

Usage::

config = SINDyConfig(polynomial_degree=2, threshold=0.1)
model = SINDy(config)
model.fit(x, x_dot)

print(model.equations(["x", "y", "z"]))
x_dot_pred = model.predict(x)

Attributes:

Name	Type	Description
`config`	`SINDyConfig`	SINDy configuration.
`library`	`CandidateLibrary`	Candidate function library.
`coefficients`	`ndarray \| None`	Fitted sparse coefficient matrix (n_targets, n_library_terms).

Parameters:

Name	Type	Description	Default
`config`	`SINDyConfig \| None`	Model configuration. Uses defaults if None.	`None`

fit ¶

fit(x: ndarray, x_dot: ndarray) -> Self

Fit the SINDy model to data.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`x_dot`	`ndarray`	Time derivatives, shape (n_samples, n_features).	required

Returns:

Type	Description
`Self`	self for method chaining.

predict ¶

predict(x: ndarray) -> ndarray

Predict derivatives using the discovered model.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Predicted derivatives, shape (n_samples, n_features).

Raises:

Type	Description
`RuntimeError`	If model has not been fit.

equations ¶

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get human-readable equation strings.

Parameters:

Name	Type	Description	Default
`input_names`	`Sequence[str] \| None`	Names for state variables (e.g., ['x', 'y', 'z']).	`None`
`precision`	`int`	Decimal places for coefficient values.	`3`

Returns:

Type	Description
`list[str]`	List of equation strings, one per state variable.

Raises:

Type	Description
`RuntimeError`	If model has not been fit.

feature_names ¶

feature_names(input_names: Sequence[str] | None = None) -> list[str]

Get library feature names.

Parameters:

Name	Type	Description	Default
`input_names`	`Sequence[str] \| None`	Names for state variables.	`None`

Returns:

Type	Description
`list[str]`	List of feature name strings.

score ¶

score(x: ndarray, x_dot: ndarray) -> float

Compute R² score of the model.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`x_dot`	`ndarray`	True derivatives, shape (n_samples, n_features).	required

Returns:

Type	Description
`float`	R² coefficient of determination.
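For reference, the coefficient of determination can be computed as in the sketch below. How `score` aggregates over multiple state variables is not stated here; pooling the residuals of all targets, as this hypothetical `r2_score` helper does, is an assumption.

```python
import numpy as np

def r2_score(x_dot_true, x_dot_pred):
    """R^2 = 1 - SS_res / SS_tot, pooled over all targets."""
    ss_res = np.sum((x_dot_true - x_dot_pred) ** 2)
    ss_tot = np.sum((x_dot_true - x_dot_true.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
perfect = r2_score(y, y)                                  # exact prediction
baseline = r2_score(y, np.tile(y.mean(axis=0), (3, 1)))   # predict the mean
```

A perfect prediction scores 1, predicting the per-target mean scores 0, and a model worse than the mean scores below 0.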

WeakSINDy ¶

WeakSINDy(config: WeakSINDyConfig)

Weak-form SINDy for noise-robust equation discovery.

Instead of computing noisy pointwise derivatives, integrates the governing equations against smooth bump test functions over overlapping time subdomains. Because the noisy signal is never differentiated directly, the method remains robust at high measurement-noise levels (low SNR).
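The weak formulation can be illustrated with a small NumPy sketch. For each subdomain, integration by parts moves the time derivative onto the test function, so only the data and the library are integrated. Everything named here (`weak_form_system`, `trapezoid`, the fixed window `width`, and the test function (1 - s²)^p) is an assumption for illustration and not this class's implementation.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule along axis 0 for y of shape (n, k)."""
    dt = np.diff(t)[:, None]
    return np.sum(0.5 * (y[1:] + y[:-1]) * dt, axis=0)

def weak_form_system(x, t, library, n_subdomains=50, width=40, p=4):
    """Assemble G @ xi = b from integrals against polynomial bumps."""
    starts = np.linspace(0, len(t) - width - 1, n_subdomains).astype(int)
    G, b = [], []
    for s0 in starts:
        ts, xs = t[s0:s0 + width + 1], x[s0:s0 + width + 1]
        half = 0.5 * (ts[-1] - ts[0])
        s = (ts - ts[0]) / half - 1.0            # map the window to [-1, 1]
        phi = (1.0 - s**2) ** p                  # bump, zero at both ends
        dphi = -2.0 * p * s * (1.0 - s**2) ** (p - 1) / half
        G.append(trapezoid(phi[:, None] * library(xs), ts))
        # Integration by parts: int(phi * x_dot) = -int(dphi * x).
        b.append(-trapezoid(dphi[:, None] * xs, ts))
    return np.array(G), np.array(b)

t = np.linspace(0.0, 2.0, 401)
x = np.exp(-2.0 * t)[:, None]                    # solves dx/dt = -2 x
library = lambda xs: np.column_stack([np.ones(len(xs)), xs[:, 0], xs[:, 0] ** 2])
G, b = weak_form_system(x, t, library)
xi, *_ = np.linalg.lstsq(G, b, rcond=None)       # coefficients for [1, x, x^2]
```

On this noise-free example the least-squares solution should be close to [0, -2, 0]; in practice a sparse optimizer would replace the plain least-squares solve.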

Usage::

config = WeakSINDyConfig(polynomial_degree=2, n_subdomains=50)
model = WeakSINDy(config)
model.fit(x_data, t)
print(model.equations(["x", "y"]))

Attributes:

Name	Type	Description
`config`	`WeakSINDyConfig`	WeakSINDy configuration.
`coefficients`	`ndarray \| None`	Fitted sparse coefficient matrix.

Parameters:

Name	Type	Description	Default
`config`	`WeakSINDyConfig`	Configuration with polynomial degree, threshold, number of subdomains, and test function order.	required

fit ¶

fit(x: ndarray, t: ndarray) -> Self

Fit WeakSINDy model to time-series data.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`t`	`ndarray`	Time array, shape (n_samples,).	required

Returns:

Type	Description
`Self`	self for method chaining.

predict ¶

predict(x: ndarray) -> ndarray

Predict derivatives using discovered model.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Predicted derivatives, shape (n_samples, n_features).

equations ¶

equations(input_names: Sequence[str] | None = None, precision: int = 3) -> list[str]

Get human-readable equation strings.

Parameters:

Name	Type	Description	Default
`input_names`	`Sequence[str] \| None`	Names for state variables.	`None`
`precision`	`int`	Decimal places for coefficient values.	`3`

Returns:

Type	Description
`list[str]`	List of equation strings.

score ¶

score(x: ndarray, x_dot: ndarray) -> float

Compute R² score against true derivatives.

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`x_dot`	`ndarray`	True derivatives, shape (n_samples, n_features).	required

Returns:

Type	Description
`float`	R² coefficient of determination.

distill_ude_residual ¶

distill_ude_residual(neural_residual: Callable[[Array], Array], x_eval: ndarray, *, config: SINDyConfig | None = None) -> SINDy

Distill a UDE neural residual into symbolic equations via SINDy.

Evaluates the neural residual on sample data and uses SINDy to find the sparsest symbolic representation of the learned function.
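The idea can be sketched without the library: evaluate a residual function on sample states, regress its output onto candidate terms, and threshold small coefficients. In this illustration `residual` is a hypothetical closed-form stand-in for a trained network, the three-term library is chosen by hand, and a single least-squares fit with one thresholding pass replaces the full SINDy fit.

```python
import numpy as np

def residual(x):
    # Hypothetical stand-in for a trained network with a known closed form.
    return np.stack([0.5 * x[:, 0] * x[:, 1], -0.3 * x[:, 1]], axis=1)

rng = np.random.default_rng(0)
x_eval = rng.uniform(-1.0, 1.0, size=(300, 2))
targets = residual(x_eval)                       # (n_samples, state_dim)

names = ["x0", "x1", "x0*x1"]
theta = np.column_stack([x_eval[:, 0], x_eval[:, 1], x_eval[:, 0] * x_eval[:, 1]])

coef, *_ = np.linalg.lstsq(theta, targets, rcond=None)
coef[np.abs(coef) < 0.05] = 0.0                  # one hard-thresholding pass
equations = [
    " + ".join(f"{c:.3f} {n}" for c, n in zip(col, names) if c != 0.0)
    for col in coef.T
]
```

Because the evaluation points determine which terms are identifiable, `x_eval` should cover the region of state space where the residual matters.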

Parameters:

Name	Type	Description	Default
`neural_residual`	`Callable[[Array], Array]`	Trained neural network module that maps state to residual dynamics. Should accept input of shape (batch, state_dim).	required
`x_eval`	`ndarray`	Evaluation data, shape (n_samples, state_dim).	required
`config`	`SINDyConfig \| None`	SINDy configuration. Uses defaults if None.	`None`

Returns:

Type	Description
`SINDy`	Fitted SINDy model whose coefficients represent the symbolic form of the neural residual.

finite_difference ¶

finite_difference(x: ndarray, dt: float | ndarray, order: int = 2) -> ndarray

Compute time derivatives via centered finite differences.

Uses second-order centered differences for interior points and first-order forward/backward differences at boundaries.
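The stencil described above can be written out as a short sketch. The helper name `centered_difference` is hypothetical and only a scalar `dt` is handled here; for uniform spacing the result matches `numpy.gradient` with its default `edge_order=1`.

```python
import numpy as np

def centered_difference(x, dt):
    """Centered differences inside, one-sided differences at the two ends."""
    x_dot = np.empty_like(x, dtype=float)
    x_dot[1:-1] = (x[2:] - x[:-2]) / (2.0 * dt)  # second-order, interior
    x_dot[0] = (x[1] - x[0]) / dt                # first-order forward
    x_dot[-1] = (x[-1] - x[-2]) / dt             # first-order backward
    return x_dot

dt = 0.5
t = np.arange(5) * dt
x = (t ** 2)[:, None]                            # exact derivative is 2 t
x_dot = centered_difference(x, dt)
```

For this quadratic the interior values are exact (1, 2, 3), while the one-sided boundary estimates (0.5 and 3.5) are off by dt, which is why the first and last samples are often discarded before fitting.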

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	State data, shape (n_samples, n_features).	required
`dt`	`float \| ndarray`	Time step between samples.	required
`order`	`int`	Derivative order (only 1 supported currently).	`2`

Returns:

Type	Description
`ndarray`	Estimated derivatives, shape (n_samples, n_features).

smooth_data ¶

smooth_data(x: ndarray, window_size: int = 5) -> ndarray

Smooth data using a moving average filter.

Applies a uniform moving average along the time axis (axis 0). Boundary values are handled by truncating the kernel.
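A minimal sketch of a moving average with a truncated kernel at the boundaries is shown below. The helper `moving_average` is hypothetical, and it assumes an odd `window_size` centered on each sample; the library's exact boundary convention may differ.

```python
import numpy as np

def moving_average(x, window_size=5):
    """Uniform average along axis 0; the window shrinks near the ends."""
    half = window_size // 2
    n = x.shape[0]
    out = np.empty_like(x, dtype=float)
    for i in range(n):
        # Clip the window to the valid index range instead of padding.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = x[lo:hi].mean(axis=0)
    return out

x = np.arange(5, dtype=float)[:, None]
smoothed = moving_average(x, window_size=3)
```

Truncating the window keeps the output the same length as the input, at the cost of less smoothing for the first and last few samples.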

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Data matrix, shape (n_samples, n_features).	required
`window_size`	`int`	Number of points in the averaging window.	`5`

Returns:

Type	Description
`ndarray`	Smoothed data, shape (n_samples, n_features).

Symbolic Discovery¶

Symbolic regression wrapper for equation discovery.

Provides a thin bridge to PySR (Julia-based symbolic regression) as an optional dependency. When PySR is not installed, falls back to a simplified brute-force search over a small expression set.

Reference

Cranmer (2023) "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl"

SymbolicRegressionConfig `dataclass` ¶

SymbolicRegressionConfig(*, max_complexity: int = 20, populations: int = 30, niterations: int = 40, binary_operators: tuple[str, ...] = ('+', '-', '*', '/'), unary_operators: tuple[str, ...] = ('sin', 'cos', 'exp', 'sqrt'))

Configuration for symbolic regression.

Attributes:

Name	Type	Description
`max_complexity`	`int`	Maximum expression complexity.
`populations`	`int`	Number of evolutionary populations.
`niterations`	`int`	Number of search iterations.
`binary_operators`	`tuple[str, ...]`	Allowed binary operators.
`unary_operators`	`tuple[str, ...]`	Allowed unary operators.

SymbolicRegressor ¶

SymbolicRegressor(config: SymbolicRegressionConfig | None = None)

Symbolic regression for discovering closed-form expressions.

Uses PySR when available, otherwise falls back to a simple polynomial fit as a baseline.

Usage::

reg = SymbolicRegressor()
reg.fit(x, y)
print(reg.best_equation())
y_pred = reg.predict(x)

Parameters:

Name	Type	Description	Default
`config`	`SymbolicRegressionConfig \| None`	Search configuration. Uses defaults if None.	`None`

fit ¶

fit(x: Array, y: Array) -> None

Fit symbolic regression to data.

Parameters:

Name	Type	Description	Default
`x`	`Array`	Input features, shape (n_samples, n_features).	required
`y`	`Array`	Target values, shape (n_samples,).	required

predict ¶

predict(x: Array) -> ndarray

Predict using the discovered expression.

Parameters:

Name	Type	Description	Default
`x`	`Array`	Input features, shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Predicted values, shape (n_samples,).

best_equation ¶

best_equation() -> str

Get the best discovered equation as a string.

Returns:

Type	Description
`str`	Human-readable equation string.
