uf3.regression.least_squares.WeightedLinearModel

class WeightedLinearModel(bspline_config, regularizer=None, **params)

Bases: uf3.regression.least_squares.BasicLinearModel

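Handler class for regularized linear least squares fitted to energies and forces, using the basis set provided by bspline.BsplineBasis.

Parameters

bspline_config (bspline.BsplineBasis) – B-spline basis set configuration.
regularizer (np.ndarray) – regularization matrix. If unspecified, set_params initializes a regularizer with default parameters.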

Methods

`batched_predict`	Extract inputs and outputs from HDF5 file and predict energies/forces.
`combine_weighted_gram`	Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.
`dump`	Arrange coefficients/knots map into dictionary.
`fit`	Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.
`fit_from_file`	Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.
`fit_with_gram`	Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).
`gram_from_df`	Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).
`initialize_gram_ordinate`	Initialize empty matrices for gram matrices and ordinates.
`load`	Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.
`predict`	Predict using fit coefficients.
`save`	Save model (coefficients and knots map) to file.
`score`	Evaluate score (negative error metric).
`set_params`	Set parameters from keyword arguments.

Attributes

`col_idx`
`frozen_c`
`mask`
`n_feats`

batched_predict(filename: str, keys: Optional[List[str]] = None, table_names: Optional[List[str]] = None, score: bool = True)

Extract inputs and outputs from HDF5 file and predict energies/forces.

Parameters

filename – path to HDF5 file.
keys (list) – keys to query from df (e.g. training subset).
table_names (list) – list of table names in HDF5 to read.
score (bool) – whether to return root mean square error metrics.

Returns

y_e (np.ndarray) – target values for energies.
p_e (np.ndarray) – predicted values for energies.
y_f (np.ndarray) – target values for forces.
p_f (np.ndarray) – predicted values for forces.
rmse_e (np.ndarray) – RMSE across energy predictions (returned when score is True).
rmse_f (np.ndarray) – RMSE across force predictions (returned when score is True).

combine_weighted_gram(gram_e: numpy.ndarray, gram_f: numpy.ndarray, ord_e: numpy.ndarray, ord_f: numpy.ndarray, energy_weight: float, force_weight: float, weight: float)

Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.

Parameters

gram_e (np.ndarray) – gram matrix (x^T x) for energies.
gram_f (np.ndarray) – gram matrix (x^T x) for forces.
ord_e (np.ndarray) – ordinate (x^T y) for energies.
ord_f (np.ndarray) – ordinate (x^T y) for forces.
energy_weight (float) – 1 / (# energies * sqrt(Var(energies)))
force_weight (float) – 1 / (# forces * sqrt(Var(forces)))
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies. This signature gives it no default; fit and fit_from_file default it to 0.5.

Returns

gram (np.ndarray) – gram matrix (x^T x) for fitting.
ordinate (np.ndarray) – ordinate (x^T y) for fitting.
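
A minimal sketch of how such a weighting could be combined, assuming the two contributions are mixed linearly, with weight * energy_weight applied to the energy terms and (1 - weight) * force_weight applied to the force terms. The mixing rule and the function name `combine_weighted` are assumptions made for illustration; the library's exact scaling may differ.

```python
import numpy as np

def combine_weighted(gram_e, gram_f, ord_e, ord_f,
                     energy_weight, force_weight, weight):
    """Hypothetical helper: linear mix of energy and force contributions."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    w_e = weight * energy_weight          # scale for energy terms
    w_f = (1.0 - weight) * force_weight   # scale for force terms
    gram = w_e * gram_e + w_f * gram_f
    ordinate = w_e * ord_e + w_f * ord_f
    return gram, ordinate

# Synthetic, noiseless data sharing one set of coefficients.
rng = np.random.default_rng(0)
x_e, x_f = rng.normal(size=(20, 3)), rng.normal(size=(60, 3))
c_true = np.array([0.5, -1.0, 2.0])
y_e, y_f = x_e @ c_true, x_f @ c_true

# Weights as described above: 1 / (count * standard deviation).
energy_weight = 1.0 / (len(y_e) * np.std(y_e))
force_weight = 1.0 / (len(y_f) * np.std(y_f))

gram, ordinate = combine_weighted(x_e.T @ x_e, x_f.T @ x_f,
                                  x_e.T @ y_e, x_f.T @ y_f,
                                  energy_weight, force_weight, weight=0.5)
coefficients = np.linalg.solve(gram, ordinate)
```

Because the synthetic energies and forces are consistent with the same coefficients, the combined system recovers `c_true` for any weight strictly between 0 and 1.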

dump(): Arrange coefficients/knots map into dictionary.

fit(x_e: numpy.ndarray, y_e: numpy.ndarray, x_f: Optional[numpy.ndarray] = None, y_f: Optional[numpy.ndarray] = None, weight: float = 0.5, batch_size=2500)

Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.

Parameters

x_e (np.ndarray) – input matrix of shape (n_samples, n_features).
y_e (np.ndarray) – output vector of length n_samples.
x_f (np.ndarray) – input matrix corresponding to forces.
y_f (np.ndarray) – output vector corresponding to forces.
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.
batch_size – maximum batch size for gram matrix construction.
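
The batch_size parameter suggests that the gram matrix is accumulated over row batches rather than formed in a single product. The sketch below illustrates that idea with plain NumPy; `batched_gram` is a hypothetical helper, not a uf3 function, and the actual batching logic may differ.

```python
import numpy as np

def batched_gram(x, y, batch_size=2500):
    """Accumulate x^T x and x^T y over row batches of at most batch_size."""
    n_features = x.shape[1]
    gram = np.zeros((n_features, n_features))
    ordinate = np.zeros(n_features)
    for start in range(0, len(x), batch_size):
        x_batch = x[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        gram += x_batch.T @ x_batch
        ordinate += x_batch.T @ y_batch
    return gram, ordinate

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)
gram, ordinate = batched_gram(x, y, batch_size=128)
```

The accumulated result matches the full products `x.T @ x` and `x.T @ y` up to floating-point rounding, while each intermediate product only touches batch_size rows.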

fit_from_file(filename: str, subset: Collection, index: Collection, weight: float = 0.5, batch_size=2500, energy_key='energy', progress: str = 'bar')

Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.

Parameters

filename (str) – path to HDF5 file.
subset (list) – list of indices for training.
index (list) – list of keys, i.e. from df_data DataFrame.
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.
batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.
energy_key (str) – column name for energies, default “energy”.
progress (str) – style for progress indicators.

fit_with_gram(gram: numpy.ndarray, ordinate: numpy.ndarray)

Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).

Parameters

gram (np.ndarray) – gram matrix (x^T x)
ordinate (np.ndarray) – ordinate (x^T y)
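
As a rough illustration, the gram matrix and ordinate define the normal equations gram · c = ordinate, which can be solved directly. The sketch assumes an optional regularizer enters as regularizer^T regularizer added to the gram matrix; how the class actually applies its regularizer is not stated here, and `solve_with_gram` is a hypothetical name. np.linalg.solve relies on an LU factorization with partial pivoting.

```python
import numpy as np

def solve_with_gram(gram, ordinate, regularizer=None):
    """Solve (gram + R^T R) c = ordinate; the R^T R form is an assumption."""
    system = gram if regularizer is None else gram + regularizer.T @ regularizer
    return np.linalg.solve(system, ordinate)

rng = np.random.default_rng(2)
x = rng.normal(size=(40, 3))
y = rng.normal(size=40)
gram, ordinate = x.T @ x, x.T @ y

# Unregularized: agrees with ordinary least squares on (x, y).
coefficients = solve_with_gram(gram, ordinate)
reference = np.linalg.lstsq(x, y, rcond=None)[0]

# Ridge-like regularizer: shrinks the coefficient norm.
ridge = np.sqrt(10.0) * np.eye(3)
coefficients_reg = solve_with_gram(gram, ordinate, regularizer=ridge)
```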

gram_from_df(df: pandas.core.frame.DataFrame, keys: Collection, e_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, f_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, energy_key: str = 'energy', batch_size: int = 2500)

Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).

Parameters

df (pd.DataFrame) – DataFrame of energy/force features.
keys (list) – keys to query from df (e.g. training subset).
e_variance (VarianceRecorder) – handler for accumulating statistics for energies (mean and variance).
f_variance (VarianceRecorder) – handler for accumulating statistics for forces (mean and variance).
energy_key (str) – column name for energies, default “energy”.
batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.
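
The variance handlers accumulate mean and variance across batches, so the statistics never require all values in memory at once. The sketch below shows one standard way to do this (Welford's online algorithm). `RunningVariance` is a hypothetical stand-in and does not reproduce the VarianceRecorder interface; the use of population rather than sample variance is also an assumption.

```python
class RunningVariance:
    """Hypothetical accumulator for mean and population variance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, values):
        """Fold one batch of values into the running statistics."""
        for value in values:
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        return self.m2 / self.count if self.count else 0.0

recorder = RunningVariance()
recorder.update([1.0, 2.0, 3.0])  # first batch
recorder.update([4.0, 5.0])       # second batch
# recorder.mean is 3.0 and recorder.variance is 2.0 for the values 1..5
```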

initialize_gram_ordinate(): Initialize empty matrices for gram matrices and ordinates.

load(solution: Optional[Dict[Tuple[str], numpy.ndarray]] = None, filename: Optional[str] = None)

Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.

Parameters

solution (dict) – dictionary of 1B, 2B, … terms organized as interaction: vector entries.
filename (str) – filename of json dump containing solution.

predict(x: numpy.ndarray)

Predict using fit coefficients.

Parameters: x (np.ndarray) – input matrix of shape (n_samples, n_features).
Returns: predictions (np.ndarray) – vector of predictions.

save(filename: str): Save model (coefficients and knots map) to file.

score(x, y, weights=None, normalize=True)

Evaluate score (negative error metric).

Parameters

x (np.ndarray) – input matrix of shape (n_samples, n_features).
y (np.ndarray) – output vector of length n_samples.
weights (np.ndarray) – sample weights (optional).
normalize (bool) – whether to normalize by the std of y.

Returns

score (float) – negative weighted root-mean-square error.
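
The following sketch shows what a negative weighted RMSE with optional normalization by the standard deviation of y could look like. It takes predictions directly instead of an input matrix, and `negative_rmse` is a hypothetical function; the method's exact handling of weights and normalization is assumed, not confirmed.

```python
import numpy as np

def negative_rmse(y, predictions, weights=None, normalize=True):
    """Return -RMSE, optionally divided by the standard deviation of y."""
    y = np.asarray(y, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    squared_error = (predictions - y) ** 2
    # np.average falls back to a plain mean when weights is None.
    rmse = np.sqrt(np.average(squared_error, weights=weights))
    if normalize:
        rmse = rmse / np.std(y)
    return -float(rmse)

score = negative_rmse([0.0, 2.0, 4.0], [1.0, 2.0, 3.0])               # about -0.5
raw = negative_rmse([0.0, 2.0, 4.0], [1.0, 2.0, 3.0], normalize=False)
```

Negating the error means that a larger score is better, which matches the convention expected by model-selection tools that maximize a score.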

set_params(**params): Set parameters from keyword arguments. Initializes regularizer with default parameters if unspecified.