Discovery API Reference
Equation discovery framework for recovering governing equations from data.
SINDy Module
SINDy: Sparse Identification of Nonlinear Dynamics.
JAX-native implementation of the SINDy algorithm family for discovering governing equations from time-series data.
Reference
Brunton et al. (2016) "Discovering governing equations from data by sparse identification of nonlinear dynamical systems"
EnsembleSINDyConfig (dataclass)
EnsembleSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_models: int = 20, bagging_fraction: float = 0.8, library_dropout: float = 0.0)
Bases: SINDyConfig
Configuration for ensemble SINDy.
Attributes:
| Name | Type | Description |
|---|---|---|
| n_models | int | Number of models in the ensemble. |
| bagging_fraction | float | Fraction of data for each bootstrap sample. |
| library_dropout | float | Fraction of library terms to drop per model. |
SINDyConfig (dataclass)
SINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq')
Configuration for the SINDy sparse identification algorithm.
Attributes:
| Name | Type | Description |
|---|---|---|
| polynomial_degree | int | Maximum polynomial degree for candidate library. |
| threshold | float | Sparsity threshold for STLSQ optimizer. |
| alpha | float | L2 regularization strength for ridge regression. |
| max_iter | int | Maximum STLSQ iterations. |
| include_trig | bool | Include trigonometric basis functions. |
| n_frequencies | int | Number of Fourier frequencies (if trig enabled). |
| optimizer | str | Optimizer name ('stlsq' or 'sr3'). |
WeakSINDyConfig (dataclass)
WeakSINDyConfig(*, polynomial_degree: int = 2, threshold: float = 0.1, alpha: float = 0.05, max_iter: int = 20, include_trig: bool = False, n_frequencies: int = 1, optimizer: str = 'stlsq', n_subdomains: int = 100, test_function_order: int = 4)
Bases: SINDyConfig
Configuration for weak-form SINDy (noise-robust variant).
Attributes:
| Name | Type | Description |
|---|---|---|
| n_subdomains | int | Number of integration subdomains. |
| test_function_order | int | Order of the polynomial test function. |
EnsembleSINDy
EnsembleSINDy(config: EnsembleSINDyConfig)
Ensemble SINDy with bootstrap aggregation for uncertainty.
Fits multiple SINDy models on data subsets (bagging) and reports coefficient statistics (mean, std) across the ensemble. This provides uncertainty estimates on discovered equation terms.
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | Ensemble configuration. |
| coef_mean | ndarray \| None | Mean coefficients across ensemble, shape (n_targets, n_library). |
| coef_std | ndarray \| None | Std of coefficients across ensemble, shape (n_targets, n_library). |
| coef_list | list[ndarray] | List of all individual model coefficients. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | EnsembleSINDyConfig | Ensemble configuration with n_models, bagging_fraction, etc. | required |
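The bagging procedure can be sketched in NumPy as follows. This is a minimal illustration, not the library's JAX implementation: the helper names `stlsq` and `ensemble_sindy` are hypothetical, bootstrap rows are assumed to be drawn with replacement, and `library_dropout` is assumed to remove a random subset of library columns before each fit.

```python
import numpy as np


def stlsq(theta, y, threshold=0.1, alpha=0.05, max_iter=20):
    """Sequentially thresholded ridge regression; returns (n_targets, n_features)."""
    def ridge(a, b):
        return np.linalg.solve(a.T @ a + alpha * np.eye(a.shape[1]), a.T @ b)

    coef = ridge(theta, y)  # shape (n_features, n_targets)
    support = np.abs(coef) >= threshold
    for _ in range(max_iter):
        coef = np.where(support, coef, 0.0)
        for j in range(y.shape[1]):
            if support[:, j].any():
                coef[support[:, j], j] = ridge(theta[:, support[:, j]], y[:, j])
        new_support = np.abs(coef) >= threshold
        if np.array_equal(new_support, support):
            break
        support = new_support
    return np.where(support, coef, 0.0).T


def ensemble_sindy(theta, y, n_models=20, bagging_fraction=0.8,
                   library_dropout=0.0, seed=0, **stlsq_kwargs):
    """Fit STLSQ on bootstrap samples; return (coef_mean, coef_std, coef_list)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = theta.shape
    n_boot = max(1, int(bagging_fraction * n_samples))
    coef_list = []
    for _ in range(n_models):
        rows = rng.choice(n_samples, size=n_boot, replace=True)  # bootstrap rows
        keep = rng.random(n_features) >= library_dropout          # library dropout
        if not keep.any():
            keep[rng.integers(n_features)] = True                 # keep at least one term
        coef = np.zeros((y.shape[1], n_features))                 # dropped terms stay zero
        coef[:, keep] = stlsq(theta[rows][:, keep], y[rows], **stlsq_kwargs)
        coef_list.append(coef)
    stacked = np.stack(coef_list)
    return stacked.mean(axis=0), stacked.std(axis=0), coef_list


# Synthetic check: y depends on library columns 0 and 2 only.
rng = np.random.default_rng(1)
theta = rng.normal(size=(200, 4))
true_coef = np.array([[1.0, 0.0, -2.0, 0.0]])
y = theta @ true_coef.T
coef_mean, coef_std, coef_list = ensemble_sindy(theta, y)
```

A large `coef_std` entry flags a term whose inclusion depends on which samples (or library terms) a model happened to see.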
CandidateLibrary
CandidateLibrary(polynomial_degree: int = 2, include_trig: bool = False, n_frequencies: int = 1, custom_functions: Sequence[Callable] | None = None)
Generates candidate function library for SINDy.
Constructs a feature matrix Theta(x) where each column is a candidate nonlinear function evaluated on the data. The SINDy algorithm then finds the sparse subset of columns that best predicts the derivatives.
Attributes:
| Name | Type | Description |
|---|---|---|
| polynomial_degree | | Maximum polynomial degree. |
| include_trig | | Whether to include sin/cos basis functions. |
| n_frequencies | | Number of Fourier frequencies. |
| custom_functions | | Optional list of custom basis functions. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| polynomial_degree | int | Maximum degree of polynomial terms. | 2 |
| include_trig | bool | Include sin/cos of each feature. | False |
| n_frequencies | int | Number of Fourier frequencies (if trig enabled). | 1 |
| custom_functions | Sequence[Callable] \| None | List of callables f(x) -> array, where x is (n_samples, n_features) and output is (n_samples, 1) or (n_samples,). | None |
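A minimal NumPy sketch of how such a library can be assembled: a constant column, all monomials up to `polynomial_degree`, optional sin(k·x) and cos(k·x) terms for k = 1 to `n_frequencies`, then any custom callables. The function name `build_library`, the presence of a constant column, the column order, and the `x0*x1`-style labels are illustrative assumptions; the actual `CandidateLibrary` may order or name terms differently.

```python
import itertools

import numpy as np


def build_library(x, polynomial_degree=2, include_trig=False, n_frequencies=1,
                  custom_functions=None, names=None):
    """Return (theta, labels) with theta of shape (n_samples, n_terms)."""
    n_samples, n_features = x.shape
    names = names or [f"x{i}" for i in range(n_features)]
    columns, labels = [np.ones(n_samples)], ["1"]
    # Monomials of total degree 1..polynomial_degree (x0, x1, x0*x0, x0*x1, x1*x1, ...).
    for degree in range(1, polynomial_degree + 1):
        for combo in itertools.combinations_with_replacement(range(n_features), degree):
            columns.append(np.prod(x[:, list(combo)], axis=1))
            labels.append("*".join(names[i] for i in combo))
    # Optional Fourier terms sin(k*x_i), cos(k*x_i) for k = 1..n_frequencies.
    if include_trig:
        for k in range(1, n_frequencies + 1):
            for i in range(n_features):
                columns += [np.sin(k * x[:, i]), np.cos(k * x[:, i])]
                labels += [f"sin({k}*{names[i]})", f"cos({k}*{names[i]})"]
    # Custom callables map (n_samples, n_features) -> (n_samples,) or (n_samples, 1).
    for func in custom_functions or []:
        columns.append(np.asarray(func(x)).reshape(n_samples))
        labels.append(getattr(func, "__name__", "custom"))
    return np.column_stack(columns), labels


x = np.random.default_rng(0).normal(size=(50, 2))
theta, labels = build_library(x, polynomial_degree=2)
```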
SR3
SR3(threshold: float = 0.1, nu: float = 1.0, max_iter: int = 30, tol: float = 1e-05, regularization: str = 'l0')
Sparse Relaxed Regularized Regression optimizer.
Minimizes:

$$\frac{1}{2}\lVert y - Xw \rVert_2^2 + \lambda R(u) + \frac{1}{2\nu}\lVert w - u \rVert_2^2$$

where R is a regularization penalty (L0, L1, or L2) and w, u are optimized alternately.
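One way to carry out the alternating minimization is sketched below in NumPy. With the objective above, the w-step is a linear solve, (XᵀX + I/ν) w = Xᵀy + u/ν, and the u-step applies the proximal operator of λR with step ν: hard thresholding at √(2λν) for L0 and soft thresholding at λν for L1. The function name `sr3_fit` is hypothetical, the L2 option is omitted, λ is taken to be `threshold` as the parameter table states, and the least-squares initial guess and stopping rule are assumptions rather than the library's exact choices.

```python
import numpy as np


def sr3_fit(x, y, threshold=0.1, nu=1.0, max_iter=30, tol=1e-5, regularization="l0"):
    """Alternate a w-step (linear solve) and a u-step (prox); returns (n_targets, n_features)."""
    lam = threshold
    a = x.T @ x + np.eye(x.shape[1]) / nu     # system matrix of the w-step (fixed)
    xty = x.T @ y
    u = np.linalg.lstsq(x, y, rcond=None)[0]  # initial guess, (n_features, n_targets)
    for _ in range(max_iter):
        w = np.linalg.solve(a, xty + u / nu)
        if regularization == "l0":
            u_new = np.where(np.abs(w) > np.sqrt(2.0 * lam * nu), w, 0.0)  # hard threshold
        elif regularization == "l1":
            u_new = np.sign(w) * np.maximum(np.abs(w) - lam * nu, 0.0)     # soft threshold
        else:
            raise ValueError(f"unsupported regularization: {regularization!r}")
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return u.T  # the sparse variable u holds the reported coefficients


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
true_coef = np.array([[0.0, 1.5, 0.0, -0.8, 0.0]])
y = x @ true_coef.T + 0.01 * rng.normal(size=(200, 1))
coef = sr3_fit(x, y)
coef_l1 = sr3_fit(x, y, regularization="l1")  # L1 also shrinks surviving terms by about lam*nu
```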
References
Zheng et al. (2019) "A unified framework for sparse relaxed regularized regression"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| threshold | float | Sparsity threshold (lambda in the SR3 formulation). | 0.1 |
| nu | float | Relaxation parameter controlling w-u coupling. | 1.0 |
| max_iter | int | Maximum number of alternating iterations. | 30 |
| tol | float | Convergence tolerance. | 1e-05 |
| regularization | str | Regularization type ('l0', 'l1', or 'l2'). | 'l0' |
fit
Fit sparse coefficients via SR3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | Library feature matrix, shape (n_samples, n_features). | required |
| y | ndarray | Target derivatives, shape (n_samples, n_targets). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Sparse coefficient matrix, shape (n_targets, n_features). |
STLSQ
Sequential Thresholded Least Squares optimizer.
The canonical SINDy optimizer. Alternates between ridge regression and hard thresholding until the support (set of nonzero coefficients) stabilizes.
Algorithm
- Initialize via ridge regression
- Zero out coefficients below threshold
- Re-fit on remaining (nonzero) features
- Repeat until convergence or max_iter
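The four steps above can be sketched in NumPy as follows. This is an illustrative version, not the library's JAX code: the name `stlsq_fit` is hypothetical, each target column is refit independently on its own support, and convergence is taken to mean that the support stops changing.

```python
import numpy as np


def stlsq_fit(x, y, threshold=0.1, alpha=0.05, max_iter=20):
    """Sequential thresholded least squares; returns coefficients (n_targets, n_features)."""
    def ridge(a, b):
        return np.linalg.solve(a.T @ a + alpha * np.eye(a.shape[1]), a.T @ b)

    coef = ridge(x, y)                        # 1. initialize via ridge regression
    support = np.abs(coef) >= threshold       # 2. small coefficients leave the support
    for _ in range(max_iter):                 # 4. repeat up to max_iter times
        coef = np.where(support, coef, 0.0)
        for j in range(y.shape[1]):           # 3. refit each target on its support
            if support[:, j].any():
                coef[support[:, j], j] = ridge(x[:, support[:, j]], y[:, j])
        new_support = np.abs(coef) >= threshold
        if np.array_equal(new_support, support):  # support stabilized: converged
            break
        support = new_support
    return np.where(support, coef, 0.0).T


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
true_coef = np.array([[1.0, 0.0, 0.0, -2.0], [0.0, 0.5, 0.0, 0.0]])
y = x @ true_coef.T + 0.01 * rng.normal(size=(100, 2))
coef = stlsq_fit(x, y)
```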
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| threshold | float | Coefficients with absolute value below this are zeroed. | 0.1 |
| alpha | float | L2 regularization parameter for ridge regression. | 0.05 |
| max_iter | int | Maximum number of thresholding iterations. | 20 |
fit
Fit sparse coefficients via STLSQ.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | Library feature matrix, shape (n_samples, n_features). | required |
| y | ndarray | Target derivatives, shape (n_samples, n_targets). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Sparse coefficient matrix, shape (n_targets, n_features). |
SINDy
SINDy(config: SINDyConfig | None = None)
Sparse Identification of Nonlinear Dynamics.
Discovers governing equations from data by building a library of candidate nonlinear functions and using sparse regression to find the subset that best explains the observed dynamics.
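Usage:
```python
# x: state samples, shape (n_samples, 3); x_dot: their time derivatives
config = SINDyConfig(polynomial_degree=2, threshold=0.1)
model = SINDy(config)
model.fit(x, x_dot)
print(model.equations(["x", "y", "z"]))  # one equation string per state variable
x_dot_pred = model.predict(x)             # derivatives predicted by the discovered model
```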
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | SINDy configuration. |
| library | | Candidate function library. |
| coefficients | ndarray \| None | Fitted sparse coefficient matrix (n_targets, n_library_terms). |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SINDyConfig \| None | Model configuration. Uses defaults if None. | None |
fit
predict
Predict derivatives using the discovered model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | State data, shape (n_samples, n_features). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Predicted derivatives, shape (n_samples, n_features). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If model has not been fit. |
equations
Get human-readable equation strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_names | Sequence[str] \| None | Names for state variables (e.g., ['x', 'y', 'z']). | None |
| precision | int | Decimal places for coefficient values. | 3 |
Returns:
| Type | Description |
|---|---|
| list[str] | List of equation strings, one per state variable. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If model has not been fit. |
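Turning a coefficient matrix into equation strings can be sketched as below. This only illustrates the idea: the function name `format_equations`, the `x' = ...` left-hand side, and the sign handling are assumptions, and the real `equations` method may format terms differently. Zero coefficients (terms pruned by the sparse regression) are skipped, and each remaining coefficient is printed with `precision` decimal places.

```python
def format_equations(coefficients, feature_names, input_names, precision=3):
    """Render one "name' = ..." string per coefficient row, skipping zero terms."""
    lines = []
    for name, row in zip(input_names, coefficients):
        rhs = ""
        for coef, feature in zip(row, feature_names):
            if coef == 0:
                continue  # term pruned by the sparse regression
            term = f"{abs(coef):.{precision}f}" + ("" if feature == "1" else f" {feature}")
            if not rhs:
                rhs = f"-{term}" if coef < 0 else term
            else:
                rhs += f" - {term}" if coef < 0 else f" + {term}"
        lines.append(f"{name}' = {rhs or '0'}")
    return lines


equations = format_equations(
    [[0.0, -0.1, 2.0, 0.0], [0.5, -2.0, -0.1, 0.0], [0.0, 0.0, 0.0, 0.0]],
    feature_names=["1", "x", "y", "z"],
    input_names=["x", "y", "z"],
)
print("\n".join(equations))
```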
feature_names
score
WeakSINDy
WeakSINDy(config: WeakSINDyConfig)
Weak-form SINDy for noise-robust equation discovery.
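Instead of computing noisy pointwise derivatives, it integrates the governing equations against smooth bump test functions over overlapping time subdomains. Integration by parts moves the time derivative onto the test functions, so the noisy data are never differentiated directly; this makes the method robust to substantial measurement noise (low signal-to-noise ratios).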
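Usage:
```python
# x_data: state samples, shape (n_samples, 2); t: the corresponding time points
config = WeakSINDyConfig(polynomial_degree=2, n_subdomains=50)
model = WeakSINDy(config)
model.fit(x_data, t)  # no derivative estimates are passed
print(model.equations(["x", "y"]))
```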
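The weak-form idea can be illustrated with a short NumPy sketch. For a test function φ that vanishes at both ends of a subdomain [a, b], integration by parts gives ∫ φ ẋ dt = -∫ φ̇ x dt, so the left-hand side needs only samples of x, never their derivatives, while the right-hand side becomes ∫ φ Θ(x) dt. Stacking one row per subdomain gives a linear system G ≈ H Ξᵀ. The bump φ(t) = (t - a)^p (b - t)^p with p ≥ 2, the reading of `test_function_order` as p, evenly spaced subdomains of fixed half-width, and the helper names are all assumptions; plain least squares replaces the sparse optimizer to keep the example short.

```python
import numpy as np


def trapezoid(f, t):
    """Trapezoidal rule along axis 0 for arrays of shape (n_samples, ...)."""
    dt = np.diff(t).reshape(-1, *([1] * (f.ndim - 1)))
    return np.sum(0.5 * (f[1:] + f[:-1]) * dt, axis=0)


def bump(t, a, b, p=4):
    """phi(t) = (t - a)^p (b - t)^p on [a, b] (zero outside, p >= 2) and its derivative."""
    s, r = np.clip(t - a, 0.0, None), np.clip(b - t, 0.0, None)
    phi = s**p * r**p
    dphi = p * s ** (p - 1) * r**p - p * s**p * r ** (p - 1)
    return phi, dphi


def weak_system(x, t, library, n_subdomains=50, half_width=1.0, p=4):
    """Return (H, G) with H[k] = int phi_k Theta(x) dt and G[k] = -int phi_k' x dt."""
    theta = library(x)
    centers = np.linspace(t[0] + half_width, t[-1] - half_width, n_subdomains)
    h_rows, g_rows = [], []
    for c in centers:
        phi, dphi = bump(t, c - half_width, c + half_width, p)
        h_rows.append(trapezoid(phi[:, None] * theta, t))
        # Integration by parts (phi vanishes at both ends): int phi x' dt = -int phi' x dt.
        g_rows.append(-trapezoid(dphi[:, None] * x, t))
    return np.array(h_rows), np.array(g_rows)


# Example: x' = -0.5 x sampled without noise; library [1, x].
t = np.linspace(0.0, 10.0, 1001)
x = np.exp(-0.5 * t)[:, None]
H, G = weak_system(x, t, lambda z: np.column_stack([np.ones(len(z)), z[:, 0]]))
xi = np.linalg.lstsq(H, G, rcond=None)[0].T  # shape (1, 2): [constant, x] coefficients
```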
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | WeakSINDy configuration. |
| coefficients | ndarray \| None | Fitted sparse coefficient matrix. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | WeakSINDyConfig | Configuration with polynomial degree, threshold, number of subdomains, and test function order. | required |
distill_ude_residual
distill_ude_residual(neural_residual: Callable[[Array], Array], x_eval: ndarray, *, config: SINDyConfig | None = None) -> SINDy
Distill a UDE neural residual into symbolic equations via SINDy.
Evaluates the neural residual on sample data and uses SINDy to find the sparsest symbolic representation of the learned function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| neural_residual | Callable[[Array], Array] | Trained neural network module that maps the state to the residual dynamics. Should accept inputs of shape (batch, state_dim). | required |
| x_eval | ndarray | Evaluation data, shape (n_samples, state_dim). | required |
| config | SINDyConfig \| None | SINDy configuration. Uses defaults if None. | None |
Returns:
| Type | Description |
|---|---|
| SINDy | Fitted SINDy model whose coefficients represent the symbolic form of the neural residual. |
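Distillation amounts to sampling the learned residual and running sparse regression on the samples. The NumPy sketch below uses a known function as a stand-in for a trained network, so the expected answer is known in advance. `distill_sketch`, `fake_residual`, the degree-2 library, and the fixed-iteration thresholding loop are illustrative assumptions, not the behavior of `distill_ude_residual` itself.

```python
import numpy as np


def distill_sketch(residual_fn, x_eval, threshold=0.1, alpha=0.05, n_iter=10):
    """Fit sparse degree-2 polynomial coefficients (state_dim, n_terms) to residual_fn."""
    y = np.asarray(residual_fn(x_eval))          # (n_samples, state_dim)
    n, d = x_eval.shape
    cols = [np.ones(n)] + [x_eval[:, i] for i in range(d)]
    cols += [x_eval[:, i] * x_eval[:, j] for i in range(d) for j in range(i, d)]
    theta = np.column_stack(cols)                # [1, x0, x1, x0^2, x0 x1, x1^2] for d = 2

    def ridge(a, b):
        return np.linalg.solve(a.T @ a + alpha * np.eye(a.shape[1]), a.T @ b)

    coef = ridge(theta, y)
    for _ in range(n_iter):                      # threshold, then refit the survivors
        keep = np.abs(coef) >= threshold
        coef = np.where(keep, coef, 0.0)
        for j in range(y.shape[1]):
            if keep[:, j].any():
                coef[keep[:, j], j] = ridge(theta[:, keep[:, j]], y[:, j])
    return np.where(np.abs(coef) >= threshold, coef, 0.0).T


# Stand-in for a trained network: residual(x) = [-0.3 x0^2, 0.5 x0 x1].
def fake_residual(x):
    return np.column_stack([-0.3 * x[:, 0] ** 2, 0.5 * x[:, 0] * x[:, 1]])


x_eval = np.random.default_rng(0).uniform(-2.0, 2.0, size=(500, 2))
coef = distill_sketch(fake_residual, x_eval)
```

The quality of the result depends on `x_eval` covering the region of state space where the residual will be used.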
finite_difference
Compute time derivatives via centered finite differences.
Uses second-order centered differences for interior points and first-order forward/backward differences at boundaries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | State data, shape (n_samples, n_features). | required |
| dt | float \| ndarray | Time step between samples. | required |
| order | int | Derivative order (only 1 supported currently). | 2 |
Returns:
| Type | Description |
|---|---|
| ndarray | Estimated derivatives, shape (n_samples, n_features). |
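The stencil described above can be written directly with array slicing. For a scalar `dt` it should agree with `np.gradient(x, dt, axis=0, edge_order=1)`, which uses the same second-order interior and first-order boundary differences. This NumPy sketch (hypothetical name `finite_difference_sketch`) covers only the scalar-`dt`, first-derivative case.

```python
import numpy as np


def finite_difference_sketch(x, dt):
    """First derivative along axis 0 of x with shape (n_samples, n_features)."""
    x = np.asarray(x, dtype=float)
    dx = np.empty_like(x)
    dx[1:-1] = (x[2:] - x[:-2]) / (2.0 * dt)  # centered, second-order accurate
    dx[0] = (x[1] - x[0]) / dt                # forward difference at the start
    dx[-1] = (x[-1] - x[-2]) / dt             # backward difference at the end
    return dx


t = np.linspace(0.0, 1.0, 11)
x = np.column_stack([t**2, np.sin(t)])
dx = finite_difference_sketch(x, dt=0.1)
```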
smooth_data
Smooth data using a moving average filter.
Applies a uniform moving average along the time axis (axis 0). Boundary values are handled by truncating the kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | Data matrix, shape (n_samples, n_features). | required |
| window_size | int | Number of points in the averaging window. | 5 |
Returns:
| Type | Description |
|---|---|
| ndarray | Smoothed data, shape (n_samples, n_features). |
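A moving average with a truncated kernel can be computed by dividing a box-filter convolution by the number of samples that actually fall under the window, so points near the edges are averaged over fewer neighbors. The NumPy sketch below (hypothetical name `smooth_data_sketch`) assumes an odd `window_size` no larger than `n_samples`, which keeps the window centered.

```python
import numpy as np


def smooth_data_sketch(x, window_size=5):
    """Centered moving average along axis 0; the window shrinks at the boundaries."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window_size)
    # Number of in-range samples under the window at each position (smaller near the ends).
    counts = np.convolve(np.ones(x.shape[0]), kernel, mode="same")
    return np.column_stack(
        [np.convolve(x[:, j], kernel, mode="same") / counts for j in range(x.shape[1])]
    )


x = np.column_stack([np.arange(10.0), np.full(10, 3.0)])
smoothed = smooth_data_sketch(x, window_size=5)
```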
Symbolic Discovery
Symbolic regression wrapper for equation discovery.
Provides a thin bridge to PySR (Julia-based symbolic regression) as an optional dependency. When PySR is not installed, falls back to a simplified brute-force search over a small expression set.
Reference
Cranmer (2023) "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl"
SymbolicRegressionConfig (dataclass)
SymbolicRegressionConfig(*, max_complexity: int = 20, populations: int = 30, niterations: int = 40, binary_operators: tuple[str, ...] = ('+', '-', '*', '/'), unary_operators: tuple[str, ...] = ('sin', 'cos', 'exp', 'sqrt'))
Configuration for symbolic regression.
Attributes:
| Name | Type | Description |
|---|---|---|
max_complexity |
int
|
Maximum expression complexity. |
populations |
int
|
Number of evolutionary populations. |
niterations |
int
|
Number of search iterations. |
binary_operators |
tuple[str, ...]
|
Allowed binary operators. |
unary_operators |
tuple[str, ...]
|
Allowed unary operators. |
SymbolicRegressor
SymbolicRegressor(config: SymbolicRegressionConfig | None = None)
Symbolic regression for discovering closed-form expressions.
Uses PySR when available, otherwise falls back to a simple polynomial fit as a baseline.
Usage:
```python
reg = SymbolicRegressor()  # default SymbolicRegressionConfig
reg.fit(x, y)
print(reg.best_equation())  # best expression found by the search
y_pred = reg.predict(x)
```
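A minimal sketch of both code paths follows. The PySR path passes the configuration to `pysr.PySRRegressor`, whose `niterations`, `populations`, `maxsize`, `binary_operators`, and `unary_operators` arguments are real; treating `max_complexity` as `maxsize` is an assumption about how the wrapper maps its config. The fallback follows the class description (a plain polynomial fit, done here with `np.polyfit` for one-dimensional inputs). The helper names `pysr_kwargs`, `polynomial_baseline`, and `fit_symbolic`, and the fixed degree, are hypothetical.

```python
import numpy as np


def pysr_kwargs(max_complexity=20, populations=30, niterations=40,
                binary_operators=("+", "-", "*", "/"),
                unary_operators=("sin", "cos", "exp", "sqrt")):
    """Translate config fields into PySRRegressor keyword arguments (assumed mapping)."""
    return {
        "maxsize": max_complexity,
        "populations": populations,
        "niterations": niterations,
        "binary_operators": list(binary_operators),
        "unary_operators": list(unary_operators),
    }


def polynomial_baseline(x, y, degree=3):
    """Fallback: least-squares polynomial in a single input variable."""
    coeffs = np.polyfit(x, y, degree)  # highest power first
    terms = [f"{c:.3g}*x^{degree - i}" for i, c in enumerate(coeffs)]
    return coeffs, " + ".join(terms)


def fit_symbolic(x, y, **config):
    """Return (predict, equation_string), using PySR when it is importable."""
    try:
        from pysr import PySRRegressor
    except ImportError:
        coeffs, equation = polynomial_baseline(x, y)
        return (lambda z: np.polyval(coeffs, z)), equation
    model = PySRRegressor(**pysr_kwargs(**config))
    model.fit(x.reshape(-1, 1), y)
    return (lambda z: model.predict(np.reshape(z, (-1, 1)))), str(model.sympy())


x = np.linspace(-1.0, 1.0, 50)
coeffs, equation = polynomial_baseline(x, 3.0 * x**2 - x + 0.5)
```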
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | SymbolicRegressionConfig \| None | Search configuration. Uses defaults if None. | None |