uf3.regression.least_squares.WeightedLinearModel

class WeightedLinearModel(bspline_config, regularizer=None, **params)

Bases: uf3.regression.least_squares.BasicLinearModel

Handler class for regularized linear least squares fitted to energies and forces, using the basis set provided by bspline.BsplineBasis.

Parameters

  • bspline_config (bspline.BsplineBasis) – B-spline basis set handler.

  • regularizer (np.ndarray) – regularization matrix.

Methods

batched_predict

Extract inputs and outputs from HDF5 file and predict energies/forces.

combine_weighted_gram

Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.

dump

Arrange coefficients/knots map into dictionary.

fit

Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.

fit_from_file

Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.

fit_with_gram

Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).

gram_from_df

Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).

initialize_gram_ordinate

Initialize empty matrices for gram matrices and ordinates.

load

Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.

predict

Predict using fit coefficients.

save

Save model (coefficients and knots map) to file.

score

Evaluate score (negative error metric).

set_params

Set parameters from keyword arguments.

Attributes

col_idx

frozen_c

mask

n_feats

batched_predict(filename: str, keys: Optional[List[str]] = None, table_names: Optional[List[str]] = None, score: bool = True)

Extract inputs and outputs from HDF5 file and predict energies/forces.

Parameters
  • filename – path to HDF5 file.

  • keys (list) – keys to query from df (e.g. training subset).

  • table_names (list) – list of table names in HDF5 to read.

  • score (bool) – whether to return root mean square error metrics.

Returns

  • y_e (np.ndarray) – target values for energies.

  • p_e (np.ndarray) – predicted values for energies.

  • y_f (np.ndarray) – target values for forces.

  • p_f (np.ndarray) – predicted values for forces.

  • rmse_e (np.ndarray) – RMSE across energy predictions.

  • rmse_f (np.ndarray) – RMSE across force predictions.

combine_weighted_gram(gram_e: numpy.ndarray, gram_f: numpy.ndarray, ord_e: numpy.ndarray, ord_f: numpy.ndarray, energy_weight: float, force_weight: float, weight: float)

Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.

Parameters
  • gram_e (np.ndarray) – gram matrix (x^T x) for energies.

  • gram_f (np.ndarray) – gram matrix (x^T x) for forces.

  • ord_e (np.ndarray) – ordinate (x^T y) for energies.

  • ord_f (np.ndarray) – ordinate (x^T y) for forces.

  • energy_weight – 1 / (# energies * sqrt(Var(energies)))

  • force_weight – 1 / (# forces * sqrt(Var(forces)))

  • weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies.

Returns

  • gram (np.ndarray) – gram matrix (x^T x) for fitting.

  • ordinate (np.ndarray) – ordinate (x^T y) for fitting.
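The weighting described above can be illustrated with a short NumPy sketch. The blending rule used here (a linear mix of the two contributions, scaled by energy_weight and force_weight) is an assumption made for illustration, and blend_gram is a hypothetical helper, not the uf3 implementation.

```python
import numpy as np

def blend_gram(gram_e, gram_f, ord_e, ord_f, energy_weight, force_weight, weight):
    """Hypothetical linear blend of energy and force normal-equation terms."""
    w_e = weight * energy_weight          # share assigned to energies
    w_f = (1.0 - weight) * force_weight   # remaining share assigned to forces
    gram = w_e * gram_e + w_f * gram_f
    ordinate = w_e * ord_e + w_f * ord_f
    return gram, ordinate

# Synthetic feature matrices and targets standing in for energies and forces.
rng = np.random.default_rng(0)
x_e, y_e = rng.normal(size=(20, 4)), rng.normal(size=20)
x_f, y_f = rng.normal(size=(60, 4)), rng.normal(size=60)

# Weights as documented: 1 / (count * sqrt(variance)).
energy_weight = 1.0 / (len(y_e) * np.std(y_e))
force_weight = 1.0 / (len(y_f) * np.std(y_f))

gram, ordinate = blend_gram(
    x_e.T @ x_e, x_f.T @ x_f, x_e.T @ y_e, x_f.T @ y_f,
    energy_weight, force_weight, weight=0.5,
)
```

Under this assumed rule, weight=1 leaves only the energy terms and weight=0 leaves only the force terms.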

dump()

Arrange coefficients/knots map into dictionary.

fit(x_e: numpy.ndarray, y_e: numpy.ndarray, x_f: Optional[numpy.ndarray] = None, y_f: Optional[numpy.ndarray] = None, weight: float = 0.5, batch_size=2500)

Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.

Parameters
  • x_e (np.ndarray) – input matrix of shape (n_samples, n_features).

  • y_e (np.ndarray) – output vector of length n_samples.

  • x_f (np.ndarray) – input matrix corresponding to forces.

  • y_f (np.ndarray) – output vector corresponding to forces.

  • weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.

  • batch_size – maximum batch size for gram matrix construction.

fit_from_file(filename: str, subset: Collection, index: Collection, weight: float = 0.5, batch_size=2500, energy_key='energy', progress: str = 'bar')

Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.

Parameters
  • filename (str) – path to HDF5 file.

  • subset (list) – list of indices for training.

  • index (list) – list of keys, i.e. from df_data DataFrame.

  • weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.

  • batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.

  • energy_key (str) – column name for energies, default “energy”.

  • progress (str) – style for progress indicators.
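Batched accumulation works because x^T x and x^T y are sums over rows, so they can be built one slice at a time without forming the full product in a single step. A minimal NumPy sketch of that idea follows; batched_gram is a hypothetical name, and the HDF5 parsing is replaced by an in-memory array.

```python
import numpy as np

def batched_gram(x, y, batch_size=2500):
    """Accumulate x^T x and x^T y over row batches of at most batch_size."""
    n_features = x.shape[1]
    gram = np.zeros((n_features, n_features))
    ordinate = np.zeros(n_features)
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        gram += xb.T @ xb       # contribution of this batch to x^T x
        ordinate += xb.T @ yb   # contribution of this batch to x^T y
    return gram, ordinate

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
y = rng.normal(size=1000)
gram, ordinate = batched_gram(x, y, batch_size=256)
```

The result matches the unbatched products up to floating-point rounding; only peak memory use changes with batch_size.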

fit_with_gram(gram: numpy.ndarray, ordinate: numpy.ndarray)

Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).

Parameters
  • gram (np.ndarray) – gram matrix (x^T x)

  • ordinate (np.ndarray) – ordinate (x^T y)
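As a rough illustration of a direct solution from a gram matrix and ordinate, the sketch below solves the normal equations with np.linalg.solve, which relies on an LU factorization. Adding the regularization matrix directly to the gram matrix is an assumption, and solve_normal_equations is a hypothetical helper rather than the library's method.

```python
import numpy as np

def solve_normal_equations(gram, ordinate, regularizer=None):
    """Solve (gram + regularizer) @ c = ordinate for the coefficients c."""
    lhs = gram if regularizer is None else gram + regularizer
    # np.linalg.solve calls LAPACK gesv (LU factorization with partial pivoting).
    return np.linalg.solve(lhs, ordinate)

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 4))
c_true = np.array([0.5, -1.0, 2.0, 0.0])
y = x @ c_true  # noise-free targets

gram, ordinate = x.T @ x, x.T @ y
coef = solve_normal_equations(gram, ordinate)

# For a full-column-rank x, the Moore-Penrose pseudoinverse gives the same answer.
coef_pinv = np.linalg.pinv(x) @ y

# An assumed ridge-style regularizer shrinks the coefficients.
coef_ridge = solve_normal_equations(gram, ordinate, regularizer=0.1 * np.eye(4))
```

Working from the gram matrix keeps the linear system at size n_features by n_features regardless of how many samples contributed to it.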

gram_from_df(df: pandas.core.frame.DataFrame, keys: Collection, e_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, f_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, energy_key: str = 'energy', batch_size: int = 2500)

Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).

Parameters
  • df (pd.DataFrame) – DataFrame of energy/force features.

  • keys (list) – keys to query from df (e.g. training subset).

  • e_variance (VarianceRecorder) – handler for accumulating statistics for energies (mean and variance).

  • f_variance (VarianceRecorder) – handler for accumulating statistics for forces (mean and variance).

  • energy_key (str) – column name for energies, default “energy”.

  • batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.
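Accumulating a mean and variance across batches can be done without holding all values in memory. The sketch below uses the standard pairwise-merge update for that purpose; RunningVariance is a hypothetical stand-in and makes no claim about the actual VarianceRecorder interface.

```python
import numpy as np

class RunningVariance:
    """Hypothetical accumulator; not the uf3 VarianceRecorder API."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, batch):
        batch = np.asarray(batch, dtype=float).ravel()
        if batch.size == 0:
            return
        n_b = batch.size
        mean_b = batch.mean()
        m2_b = np.sum((batch - mean_b) ** 2)
        delta = mean_b - self.mean
        total = self.n + n_b
        # Merge the batch statistics into the running statistics.
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta ** 2 * self.n * n_b / total
        self.n = total

    @property
    def std(self):
        return np.sqrt(self.m2 / self.n)  # population std, as in np.std

rng = np.random.default_rng(2)
energies = rng.normal(loc=-5.0, scale=2.0, size=1000)

recorder = RunningVariance()
for start in range(0, len(energies), 300):
    recorder.update(energies[start:start + 300])

# Weight in the documented form 1 / (# energies * sqrt(Var(energies))).
energy_weight = 1.0 / (recorder.n * recorder.std)
```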

initialize_gram_ordinate()

Initialize empty matrices for gram matrices and ordinates.

load(solution: Optional[Dict[Tuple[str], numpy.ndarray]] = None, filename: Optional[str] = None)

Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.

Parameters
  • solution (dict) – dictionary of 1B, 2B, … terms organized as interaction: vector entries.

  • filename (str) – filename of json dump containing solution.
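To make the "interaction: vector" layout concrete, here is a minimal sketch of flattening such a dictionary into one coefficient vector and splitting it back. The element symbol, the key order, and both helper names are hypothetical; the ordering uf3 actually uses is not shown in this page.

```python
import numpy as np

def flatten_solution(solution, order):
    """Concatenate per-interaction vectors into one flat coefficient vector."""
    return np.concatenate([np.asarray(solution[key], dtype=float) for key in order])

def split_coefficients(flat, order, sizes):
    """Inverse of flatten_solution: cut the flat vector back into named pieces."""
    bounds = np.cumsum([sizes[key] for key in order])[:-1]
    return dict(zip(order, np.split(flat, bounds)))

# Hypothetical one-body and two-body terms for a single element.
solution = {("W",): np.array([-1.5]), ("W", "W"): np.linspace(1.0, 0.0, 5)}
order = [("W",), ("W", "W")]

flat = flatten_solution(solution, order)
sizes = {key: len(solution[key]) for key in order}
restored = split_coefficients(flat, order, sizes)
```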

predict(x: numpy.ndarray)

Predict using fit coefficients.

Parameters

x (np.ndarray) – input matrix of shape (n_samples, n_features).

Returns

predictions – vector of predictions.

Return type

np.ndarray

save(filename: str)

Save model (coefficients and knots map) to file.

score(x, y, weights=None, normalize=True)

Evaluate score (negative error metric).

Parameters
  • x (np.ndarray) – input matrix of shape (n_samples, n_features).

  • y (np.ndarray) – output vector of length n_samples.

  • weights (np.ndarray) – sample weights (optional).

  • normalize (bool) – whether to normalize by the std of y.

Returns

score – negative weighted root-mean-square error.

Return type

float
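A negative, optionally normalized RMSE of the kind described above might be computed as in the following sketch. It takes predictions directly instead of an input matrix, and negative_rmse is a hypothetical function written under the assumption that normalization divides by the population standard deviation of y.

```python
import numpy as np

def negative_rmse(y, predictions, weights=None, normalize=True):
    """Return minus the (weighted) RMSE, optionally divided by std(y)."""
    y = np.asarray(y, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    squared_error = (y - predictions) ** 2
    # np.average falls back to a plain mean when weights is None.
    rmse = np.sqrt(np.average(squared_error, weights=weights))
    if normalize:
        rmse = rmse / np.std(y)
    return -rmse

y = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.0, 2.0, 3.0, 6.0])

raw_score = negative_rmse(y, p, normalize=False)
normalized_score = negative_rmse(y, p)
perfect_score = negative_rmse(y, y)
```

Negating the error means that larger scores are better, which is the convention expected by most model-selection utilities.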

set_params(**params)

Set parameters from keyword arguments. Initializes regularizer with default parameters if unspecified.