uf3.representation.process.BasisFeaturizer
- class BasisFeaturizer(chemical_system, bspline_config, fit_forces=True, prefix='x')
Bases: object
- Manage knot-related logic for pair interactions
- Generate energy/force features
- Arrange features into DataFrame
- Process DataFrame into tuples of (x, y, weight)
- Parameters
chemical_system (uf3.data.composition.ChemicalSystem) –
bspline_config (uf3.representation.bspline.BsplineConfig) –
fit_forces (bool) – whether to generate force features.
prefix (str) – prefix for feature columns.
Methods
arrange_features_dataframe – Arrange a map of energy/force feature vectors into a processed dataframe.
batched_to_hdf – Process standard dataframe to generate representation features and arrange into processed dataframe.
evaluate – Process standard dataframe to generate representation features and arrange into processed dataframe.
evaluate_configuration – Generate feature vector(s) for learning energy and/or forces of one configuration.
evaluate_parallel – Process standard dataframe to generate representation features and arrange into processed dataframe.
featurize_energy_2B – Generate 2B feature vector for learning energy of one configuration.
featurize_energy_3B – Generate 3B feature vector for learning energy of one configuration.
featurize_force_2B – Generate 2B feature vectors for learning forces of one configuration.
featurize_force_3B – Generate 3B feature vectors for learning forces of one configuration.
from_config – Instantiate from configuration dictionary.
get_training_tuples – TODO: Remove (deprecated)
Attributes
basis_functions, degree, element_list, interaction_hashes, interactions_map, knot_subintervals, knots_map, partition_sizes, r_cut, r_max_map, r_min_map, resolution_map, trailing_trim
- arrange_features_dataframe(eval_map)
- Parameters
eval_map (dict) – map of energy/force keys to fixed-length feature vectors. If forces and the energy are both provided, the dictionary will contain 3N + 1 entries.
- Returns
df_features (pd.DataFrame) – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.
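The column layout described above can be illustrated with a small standalone sketch. It does not call uf3; `build_row` is a hypothetical helper, plain dictionaries stand in for dataframe rows, and the sample values are made up for illustration.

```python
def build_row(target, features, composition, element_list, prefix="x"):
    """Assemble one row as [y, {prefix}_0..{prefix}_k, n_A..n_Z] (hypothetical helper)."""
    row = {"y": target}
    # Representation features, numbered in order.
    row.update({f"{prefix}_{i}": value for i, value in enumerate(features)})
    # Composition (one-body) features: atom counts per element.
    row.update({f"n_{element}": composition.get(element, 0) for element in element_list})
    return row


# Illustrative values only: one target, four features, a two-element system.
row = build_row(-8.9, [0.0, 1.0, 1.4, 0.6], {"W": 2}, ["Nb", "W"])
columns = list(row)
print(columns)
```

Keeping the target first and the composition counts last means the feature block can be selected by prefix alone.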
- evaluate(df_data, atoms_key='geometry', energy_key='energy', progress='bar')
Process standard dataframe to generate representation features and arrange into processed dataframe. Operates in serial by default.
- Parameters
df_data (pd.DataFrame) – standard dataframe with columns [atoms_key, energy_key, fx, fy, fz]
atoms_key (str) – column name for geometries, default "geometry".
energy_key (str) – column name for energies, default "energy".
progress (str, None) – style of progress counter.
- Returns
df_features (pd.DataFrame) – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.
- evaluate_configuration(geom, name=None, energy=None, forces=None, energy_key='energy')
Generate feature vector(s) for learning energy and/or forces of one configuration.
TODO: refactor to break up into smaller, reusable functions
- Parameters
geom (ase.Atoms) – configuration of interest.
name (str) – if specified, keys in the returned dictionary are tuples {(name, 'e'), (name, 'fx'), …} instead of {'e', 'fx', …}.
energy (float) – energy of configuration (optional).
forces (list, np.ndarray) – array containing force components fx, fy, fz for each atom. Expected shape is (n_atoms, 3).
energy_key (str) – column name for energies, default “energy”.
- Returns
eval_map (dict) – map of energy/force keys to fixed-length feature vectors. If forces and the energy are both provided, the dictionary will contain 3N + 1 entries.
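To make the 3N + 1 count concrete, here is a minimal sketch that assembles such a map from one energy feature vector and a nested list of force feature vectors. `build_eval_map` is a hypothetical helper, not part of uf3, and the per-atom suffix in keys such as `fx_0` is an assumption made so that each force component gets a distinct key.

```python
def build_eval_map(energy_features, force_features, name=None):
    """Map energy/force keys to feature vectors (hypothetical helper).

    force_features is a nested list of shape (n_atoms, 3, n_features).
    """
    def key(label):
        # With a name, keys become (name, label) tuples.
        return (name, label) if name is not None else label

    eval_map = {key("e"): energy_features}
    for atom_index, per_atom in enumerate(force_features):
        for component, vector in zip("xyz", per_atom):
            # Assumed key format: component label plus atom index.
            eval_map[key(f"f{component}_{atom_index}")] = vector
    return eval_map


energy_features = [0.0, 1.0, 1.4]
force_features = [
    [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]],
    [[-0.1, 0.0, 0.0], [0.0, -0.2, 0.0], [0.0, 0.0, -0.3]],
]
plain = build_eval_map(energy_features, force_features)
named = build_eval_map(energy_features, force_features, name="sample_0")
print(len(plain))  # one energy entry plus three force entries per atom
```

With two atoms the map holds 3 × 2 + 1 = 7 entries, one per row of the eventual dataframe.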
- evaluate_parallel(df_data, client, atoms_key='geometry', energy_key='energy', n_jobs=2, shuffle=True, progress='bar')
Process standard dataframe to generate representation features and arrange into processed dataframe. Work is submitted to client as n_jobs parallel jobs.
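- Parameters
df_data (pd.DataFrame) – standard dataframe with columns [atoms_key, energy_key, fx, fy, fz]
client (concurrent.futures.Executor, dask.distributed.Client) –
atoms_key (str) – column name for geometries, default "geometry".
energy_key (str) – column name for energies, default "energy".
n_jobs (int) – number of parallel jobs to submit.
shuffle (bool) –
progress (str, None) – style of progress counter.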
- Returns
df_features (pd.DataFrame) – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.
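The batch-and-submit pattern implied by the client and n_jobs parameters can be sketched with the standard library alone. This is not the uf3 implementation: `featurize_batch`, `split_batches`, and `evaluate_in_parallel` are hypothetical names, the featurization is a stand-in, and the role given to shuffle here (mixing items before batching) is an assumption.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def featurize_batch(batch):
    # Stand-in for per-configuration featurization: pair each item with a number.
    return [(item, len(item)) for item in batch]


def split_batches(items, n_batches):
    # Round-robin split so batch sizes differ by at most one.
    return [items[i::n_batches] for i in range(n_batches)]


def evaluate_in_parallel(items, client, n_jobs=2, shuffle=True, seed=0):
    items = list(items)
    if shuffle:
        # Assumption: shuffling mixes cheap and expensive items across batches.
        random.Random(seed).shuffle(items)
    batches = [batch for batch in split_batches(items, n_jobs) if batch]
    # Executor.submit and dask's Client.submit both return future-like objects.
    futures = [client.submit(featurize_batch, batch) for batch in batches]
    rows = []
    for future in futures:
        rows.extend(future.result())
    return rows


with ThreadPoolExecutor(max_workers=2) as pool:
    rows = evaluate_in_parallel(["a", "bb", "ccc", "dddd", "eeeee"], pool)
print(sorted(rows))
```

Because only `submit` and `result` are used, the same function body would accept either kind of client listed in the parameters.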
- featurize_energy_2B(geom, supercell=None)
Generate 2B feature vector for learning energy of one configuration.
- Parameters
geom (ase.Atoms) – unit cell or configuration of interest.
supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.
- Returns
vector of features.
- Return type
vector (np.ndarray)
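As a rough illustration of a two-body energy feature vector, the sketch below sums basis-function values over all pair distances in a small non-periodic cluster. It is a simplification under stated assumptions: piecewise-linear hat functions (degree-1 B-splines) stand in for whatever basis bspline_config defines, periodic images and the supercell are ignored, each pair is counted once, and `hat_basis` and `featurize_energy_pairs` are hypothetical names rather than uf3 functions.

```python
import math
from itertools import combinations


def hat_basis(r, knots):
    """Evaluate degree-1 B-splines (hat functions), one per knot, at distance r."""
    values = [0.0] * len(knots)
    if r < knots[0] or r > knots[-1]:
        return values  # distances outside the knot range contribute nothing
    for k in range(len(knots) - 1):
        left, right = knots[k], knots[k + 1]
        if left <= r <= right:
            t = (r - left) / (right - left)
            values[k] += 1.0 - t
            values[k + 1] += t
            break
    return values


def featurize_energy_pairs(positions, knots):
    """Sum basis values over unique pairs to get a fixed-length feature vector."""
    features = [0.0] * len(knots)
    for a, b in combinations(positions, 2):
        r = math.dist(a, b)
        for k, value in enumerate(hat_basis(r, knots)):
            features[k] += value
    return features


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
knots = [0.5, 1.0, 1.5, 2.0]
features = featurize_energy_pairs(positions, knots)
print(features)
```

The vector length depends only on the number of knots, not on the number of atoms, which is what lets configurations of different sizes share one linear model.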
- featurize_energy_3B(geom, supercell=None)
Generate 3B feature vector for learning energy of one configuration.
- Parameters
geom (ase.Atoms) – unit cell or configuration of interest.
supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.
- Returns
vector of features.
- Return type
vector (np.ndarray)
- featurize_force_2B(geom, supercell=None)
Generate 2B feature vectors for learning forces of one configuration.
- Parameters
geom (ase.Atoms) – unit cell or configuration of interest.
supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.
- Returns
feature_array (np.ndarray) – feature vectors arranged in array of shape (n_atoms, n_force_components, n_features).
- featurize_force_3B(geom, supercell=None)
Generate 3B feature vectors for learning forces of one configuration.
- Parameters
geom (ase.Atoms) – unit cell or configuration of interest.
supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.
- Returns
feature_array (np.ndarray) – feature vectors arranged in array of shape (n_atoms, n_force_components, n_features).
- get_training_tuples(df_features, kappa, data_coordinator)
TODO: Remove (deprecated)
Weights are generated by normalizing energy and force entries by the respective sample standard deviations as well as the relative number of entries per type. Weights are further modified by kappa, which controls the relative weighting between energy and force errors. A value of 0 corresponds to force-training, while a value of 1 corresponds to energy-training.
- Parameters
df_features (pd.DataFrame) – dataframe with target vector (y) as the first column and feature vectors (x) as remaining columns.
kappa (float) – energy-force weighting parameter between 0 and 1.
data_coordinator (uf3.data.io.DataCoordinator) –
- Returns
x (np.ndarray) – features for machine learning.
y (np.ndarray) – target vector.
w (np.ndarray) – weight vector for machine learning.
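One plausible reading of the weighting scheme described above is sketched here. The exact formula is an assumption (for instance, whether the standard deviation or the variance is used is not stated), and `training_weights` is a hypothetical name; only the roles of kappa, the sample standard deviations, and the entry counts come from the description.

```python
import statistics


def training_weights(energies, forces, kappa):
    """Per-entry weights for energy rows followed by force rows (assumed formula)."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between 0 and 1")
    sigma_e = statistics.stdev(energies)  # sample standard deviation of energies
    sigma_f = statistics.stdev(forces)    # sample standard deviation of force components
    # Normalize by spread and by number of entries, then split by kappa.
    w_energy = kappa / (len(energies) * sigma_e)
    w_force = (1.0 - kappa) / (len(forces) * sigma_f)
    return [w_energy] * len(energies) + [w_force] * len(forces)


energies = [-8.9, -8.7, -9.1]
forces = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
force_only = training_weights(energies, forces, kappa=0.0)
energy_only = training_weights(energies, forces, kappa=1.0)
balanced = training_weights(energies, forces, kappa=0.5)
print(balanced)
```

At kappa = 0 every energy row receives zero weight, and at kappa = 1 every force row does, matching the two limiting cases named in the description.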