uf3.regression.least_squares.WeightedLinearModel
- class WeightedLinearModel(bspline_config, regularizer=None, **params)
Bases: uf3.regression.least_squares.BasicLinearModel
Handler class for regularized linear least squares using energies and forces, with the basis set provided by bspline.BsplineBasis.
- Parameters
regularizer (np.ndarray) – regularization matrix.
Methods
Extract inputs and outputs from HDF5 file and predict energies/forces.
Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.
Arrange coefficients/knots map into dictionary.
Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.
Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.
Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).
Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).
Initialize empty matrices for gram matrices and ordinates.
Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.
Predict using fit coefficients.
Save model (coefficients and knots map) to file.
Evaluate score (negative error metric).
Set parameters from keyword arguments.
Attributes
col_idx
frozen_c
mask
n_feats
- batched_predict(filename: str, keys: Optional[List[str]] = None, table_names: Optional[List[str]] = None, score: bool = True)
Extract inputs and outputs from HDF5 file and predict energies/forces.
- Parameters
filename – path to HDF5 file.
keys (list) – keys to query from df (e.g. training subset).
table_names (list) – list of table names in HDF5 to read.
score (bool) – whether to return root mean square error metrics.
- Returns
y_e (np.ndarray) – target values for energies.
p_e (np.ndarray) – predicted values for energies.
y_f (np.ndarray) – target values for forces.
p_f (np.ndarray) – predicted values for forces.
rmse_e (np.ndarray) – RMSE across energy predictions.
rmse_f (np.ndarray) – RMSE across force predictions.
- combine_weighted_gram(gram_e: numpy.ndarray, gram_f: numpy.ndarray, ord_e: numpy.ndarray, ord_f: numpy.ndarray, energy_weight: float, force_weight: float, weight: float)
Apply weighting to gram matrices and ordinates for energy and force contributions to the fit.
- Parameters
gram_e (np.ndarray) – gram matrix (x^T x) for energies.
gram_f (np.ndarray) – gram matrix (x^T x) for forces.
ord_e (np.ndarray) – ordinate (x^T y) for energies.
ord_f (np.ndarray) – ordinate (x^T y) for forces.
energy_weight – 1 / (# energies * sqrt(Var(energies)))
force_weight – 1 / (# forces * sqrt(Var(forces)))
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies. This signature has no default; fit and fit_from_file default to 0.5.
- Returns
gram (np.ndarray) – gram matrix (x^T x) for fitting.
ordinate (np.ndarray) – ordinate (x^T y) for fitting.
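The weighting described above can be sketched in NumPy. This is a minimal illustration, not the library's implementation: the function name `combine_weighted_gram_sketch` is hypothetical, and the linear mixing rule (`weight` on the energy terms, `1 - weight` on the force terms) is an assumption chosen to be consistent with "higher values favor energies".

```python
import numpy as np


def combine_weighted_gram_sketch(gram_e, gram_f, ord_e, ord_f,
                                 energy_weight, force_weight, weight):
    """Hypothetical sketch: mix energy and force normal-equation terms."""
    # Assumed mixing rule: `weight` scales energies, (1 - weight) scales forces.
    w_e = weight * energy_weight
    w_f = (1.0 - weight) * force_weight
    gram = w_e * gram_e + w_f * gram_f
    ordinate = w_e * ord_e + w_f * ord_f
    return gram, ordinate


rng = np.random.default_rng(0)
x_e, y_e = rng.normal(size=(20, 4)), rng.normal(size=20)
x_f, y_f = rng.normal(size=(60, 4)), rng.normal(size=60)

# Normalization factors as documented: 1 / (count * standard deviation).
energy_weight = 1.0 / (len(y_e) * np.std(y_e))
force_weight = 1.0 / (len(y_f) * np.std(y_f))

gram, ordinate = combine_weighted_gram_sketch(
    x_e.T @ x_e, x_f.T @ x_f, x_e.T @ y_e, x_f.T @ y_f,
    energy_weight, force_weight, weight=0.5)
```

Under this assumed rule, `weight=1.0` reduces the fit to the energy terms alone and `weight=0.0` to the force terms alone.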
- fit(x_e: numpy.ndarray, y_e: numpy.ndarray, x_f: Optional[numpy.ndarray] = None, y_f: Optional[numpy.ndarray] = None, weight: float = 0.5, batch_size=2500)
Direct solution from input-output pairs corresponding to energies and forces, with option to weigh their respective contributions.
- Parameters
x_e (np.ndarray) – input matrix of shape (n_samples, n_features).
y_e (np.ndarray) – output vector of length n_samples.
x_f (np.ndarray) – input matrix corresponding to forces.
y_f (np.ndarray) – output vector corresponding to forces.
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.
batch_size – maximum batch size for gram matrix construction.
- fit_from_file(filename: str, subset: Collection, index: Collection, weight: float = 0.5, batch_size=2500, energy_key='energy', progress: str = 'bar')
Accumulate inputs and outputs from batched parsing of HDF5 file and compute direct solution via LU decomposition.
- Parameters
filename (str) – path to HDF5 file.
subset (list) – list of indices for training.
index (list) – list of keys, i.e. from df_data DataFrame.
weight (float) – parameter balancing contribution from energies vs. forces. Higher values favor energies; defaults to 0.5.
batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.
energy_key (str) – column name for energies, default “energy”.
progress (str) – style for progress indicators.
- fit_with_gram(gram: numpy.ndarray, ordinate: numpy.ndarray)
Intermediate function for direct solution using gram matrix and ordinate (Moore-Penrose inverse).
- Parameters
gram (np.ndarray) – gram matrix (x^T x)
ordinate (np.ndarray) – ordinate (x^T y)
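A direct solution from a gram matrix and ordinate amounts to solving the normal equations. The sketch below assumes the regularizer is simply added to the gram matrix before solving; the name `solve_normal_equations` and that additive form are illustrative assumptions, not the library's code. `np.linalg.solve` relies on an LU factorization internally.

```python
import numpy as np


def solve_normal_equations(gram, ordinate, regularizer=None):
    """Hypothetical sketch: solve (gram + regularizer) c = ordinate."""
    if regularizer is not None:
        gram = gram + regularizer  # assumed additive (Tikhonov-style) form
    return np.linalg.solve(gram, ordinate)


rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
true_c = np.array([1.0, -2.0, 0.5])
y = x @ true_c  # noise-free targets, so the fit should recover true_c

coefficients = solve_normal_equations(x.T @ x, x.T @ y)
ridge_coefficients = solve_normal_equations(x.T @ x, x.T @ y,
                                            regularizer=1e-3 * np.eye(3))
```

With a small regularizer the coefficients stay close to the unregularized solution; larger values shrink them toward zero.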
- gram_from_df(df: pandas.core.frame.DataFrame, keys: Collection, e_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, f_variance: Optional[uf3.regression.least_squares.VarianceRecorder] = None, energy_key: str = 'energy', batch_size: int = 2500)
Extract inputs and outputs from dataframe and compute Moore-Penrose components (gram matrices and ordinates).
- Parameters
df (pd.DataFrame) – DataFrame of energy/force features.
keys (list) – keys to query from df (e.g. training subset).
e_variance (VarianceRecorder) – handler for accumulating statistics for energies (mean and variance).
f_variance (VarianceRecorder) – handler for accumulating statistics for forces (mean and variance).
energy_key (str) – column name for energies, default “energy”.
batch_size (int) – batch size, in rows, for matrix multiplication operations in constructing gram matrices.
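Batching works because the gram matrix and ordinate are sums over rows, so they can be accumulated chunk by chunk instead of in one large matrix product. A minimal sketch, with the hypothetical helper `batched_gram` standing in for the library's internals:

```python
import numpy as np


def batched_gram(x, y, batch_size=2500):
    """Hypothetical sketch: accumulate x^T x and x^T y over row batches."""
    n_features = x.shape[1]
    gram = np.zeros((n_features, n_features))
    ordinate = np.zeros(n_features)
    for start in range(0, len(x), batch_size):
        x_batch = x[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        gram += x_batch.T @ x_batch
        ordinate += x_batch.T @ y_batch
    return gram, ordinate


rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)
gram, ordinate = batched_gram(x, y, batch_size=128)
```

The accumulated result matches `x.T @ x` and `x.T @ y` up to floating-point rounding, regardless of the batch size chosen.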
- load(solution: Optional[Dict[Tuple[str], numpy.ndarray]] = None, filename: Optional[str] = None)
Reflatten coefficients (e.g. obtained through arrange_coefficients) and load into model for prediction.
- Parameters
solution (dict) – dictionary of 1B, 2B, … terms organized as interaction: vector entries.
filename (str) – filename of json dump containing solution.
- predict(x: numpy.ndarray)
Predict using fit coefficients.
- Parameters
x (np.ndarray) – input matrix of shape (n_samples, n_features).
- Returns
predictions (np.ndarray) – vector of predictions.
- score(x, y, weights=None, normalize=True)
Evaluate score (negative error metric).
- Parameters
x (np.ndarray) – input matrix of shape (n_samples, n_features).
y (np.ndarray) – output vector of length n_samples.
weights (np.ndarray) – sample weights (optional).
normalize (bool) – whether to normalize by the std of y.
- Returns
score (float) – negative weighted root-mean-square error.
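One way to read "negative weighted root-mean-square error" is sketched below under stated assumptions: the helper `negative_rmse` is hypothetical, `weights` are treated as per-sample weights in a weighted mean of squared residuals, and normalization divides the error by the standard deviation of `y`. The library's exact definition may differ.

```python
import numpy as np


def negative_rmse(y, predictions, weights=None, normalize=True):
    """Hypothetical sketch of a negative (weighted, normalized) RMSE."""
    y = np.asarray(y, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    squared_error = (y - predictions) ** 2
    # np.average falls back to an unweighted mean when weights is None.
    rmse = np.sqrt(np.average(squared_error, weights=weights))
    if normalize:
        rmse = rmse / np.std(y)  # assumed normalization by the std of y
    return -rmse


y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = negative_rmse(y, y)
offset = negative_rmse(y, y + 0.5, normalize=False)
```

Negating the error means a higher score is always better, which matches the convention expected by model-selection tools that maximize a score.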