uf3.representation.process.BasisFeaturizer

class BasisFeaturizer(chemical_system, bspline_config, fit_forces=True, prefix='x')

Bases: object

Responsibilities:

  • manage knot-related logic for pair interactions;

  • generate energy/force features;

  • arrange features into a DataFrame;

  • process the DataFrame into tuples of (x, y, weight).

Parameters
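  • chemical_system (uf3.data.composition.ChemicalSystem) – chemical system defining the elements and their interactions.

  • bspline_config (uf3.representation.bspline.BsplineConfig) – B-spline basis configuration (knots, cutoffs, and resolution per interaction).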

  • fit_forces (bool) – whether to generate force features.

  • prefix (str) – prefix for feature columns.

Methods

arrange_features_dataframe

Arrange a map of energy/force feature vectors into a processed dataframe.

batched_to_hdf

evaluate

Process standard dataframe to generate representation features and arrange into processed dataframe.

evaluate_configuration

Generate feature vector(s) for learning energy and/or forces of one configuration.

evaluate_parallel

Process standard dataframe in parallel to generate representation features and arrange into processed dataframe.

featurize_energy_2B

Generate 2B feature vector for learning energy of one configuration.

featurize_energy_3B

Generate 3B feature vector for learning energy of one configuration.

featurize_force_2B

Generate 2B feature vectors for learning forces of one configuration.

featurize_force_3B

Generate 3B feature vectors for learning forces of one configuration.

from_config

Instantiate from a configuration dictionary.

get_training_tuples

TODO: Remove (deprecated)

Attributes

basis_functions

degree

element_list

interaction_hashes

interactions_map

knot_subintervals

knots_map

partition_sizes

r_cut

r_max_map

r_min_map

resolution_map

trailing_trim

arrange_features_dataframe(eval_map)
Parameters

eval_map (dict) – map of energy/force keys to fixed-length feature vectors. If forces and the energy are both provided, the dictionary will contain 3N + 1 entries.

Returns

df_features – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.

Return type

pd.DataFrame
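A minimal sketch of the arrangement step, using only the standard library. It assumes, hypothetically, that each value in eval_map already holds the target value first and the features after it; arrange_rows, the example keys, and the omission of the composition columns (n_A, …, n_Z) are illustrative choices, not uf3's implementation.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def arrange_rows(
    eval_map: Dict[Hashable, Sequence[float]], prefix: str = "x"
) -> Tuple[List[str], List[Hashable], List[List[float]]]:
    """Stack fixed-length vectors into a table (hypothetical helper).

    Assumes each vector is [target, feature_0, ..., feature_k].
    """
    lengths = {len(v) for v in eval_map.values()}
    if len(lengths) != 1:
        raise ValueError("all vectors in eval_map must have the same length")
    n_features = lengths.pop() - 1
    # Column names follow the [y, {prefix}_0, ..., {prefix}_k] pattern.
    columns = ["y"] + [f"{prefix}_{i}" for i in range(n_features)]
    index = list(eval_map.keys())
    rows = [list(map(float, eval_map[key])) for key in index]
    return columns, index, rows


eval_map = {
    ("cfg0", "e"): [-3.2, 1.0, 0.5, 0.0],
    ("cfg0", "fx_0"): [0.1, 0.0, -0.2, 0.3],
}
columns, index, rows = arrange_rows(eval_map)
print(columns)  # ['y', 'x_0', 'x_1', 'x_2']
```

With pandas available, pd.DataFrame(rows, index=index, columns=columns) would wrap the result in a dataframe.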

evaluate(df_data, atoms_key='geometry', energy_key='energy', progress='bar')

Process standard dataframe to generate representation features and arrange into processed dataframe. Operates in serial by default.

Parameters
  • df_data (pd.DataFrame) – standard dataframe with columns [atoms_key, energy_key, fx, fy, fz]

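  • atoms_key (str) – column name for geometries (ase.Atoms), default “geometry”.

  • energy_key (str) – column name for energies, default “energy”.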

  • progress (str, None) – style of progress counter.

Returns

df_features – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.

Return type

pd.DataFrame

evaluate_configuration(geom, name=None, energy=None, forces=None, energy_key='energy')

Generate feature vector(s) for learning energy and/or forces of one configuration.

TODO: refactor to break up into smaller, reusable functions

Parameters
  • geom (ase.Atoms) – configuration of interest.

  • name (str) – if specified, keys in the returned dictionary are tuples {(name, ‘e’), (name, ‘fx’), …} instead of {‘e’, ‘fx’, …}.

  • energy (float) – energy of configuration (optional).

  • forces (list, np.ndarray) – array containing force components fx, fy, fz for each atom. Expected shape is (n_atoms, 3).

  • energy_key (str) – column name for energies, default “energy”.

Returns

eval_map – map of energy/force keys to fixed-length feature vectors. If forces and the energy are both provided, the dictionary will contain 3N + 1 entries, where N is the number of atoms.

Return type

dict
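The 3N + 1 count is one energy entry plus three force components per atom. The sketch below enumerates keys in that pattern; eval_map_keys and the per-atom key format (for example 'fx_0') are hypothetical, chosen only to show the count and the effect of name.

```python
from typing import List, Optional, Tuple, Union

Key = Union[str, Tuple[str, str]]


def eval_map_keys(n_atoms: int, name: Optional[str] = None,
                  with_energy: bool = True, with_forces: bool = True,
                  energy_key: str = "energy") -> List[Key]:
    """Enumerate hypothetical eval_map keys for one configuration."""
    keys: List[str] = []
    if with_energy:
        keys.append(energy_key)
    if with_forces:
        # One key per atom and Cartesian component: 3 * n_atoms entries.
        for component in ("fx", "fy", "fz"):
            keys.extend(f"{component}_{i}" for i in range(n_atoms))
    if name is not None:
        # With a name, each key becomes a (name, key) tuple.
        return [(name, key) for key in keys]
    return list(keys)


keys = eval_map_keys(n_atoms=4, name="cfg0")
print(len(keys))  # 13 == 3 * 4 + 1
```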

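evaluate_parallel(df_data, client, atoms_key='geometry', energy_key='energy', n_jobs=2, shuffle=True, progress='bar')

Process standard dataframe to generate representation features and arrange into processed dataframe. Work is distributed across n_jobs jobs submitted to client.

Parameters
  • df_data (pd.DataFrame) – standard dataframe with columns [atoms_key, energy_key, fx, fy, fz]

  • client (concurrent.futures.Executor, dask.distributed.Client) – executor to which jobs are submitted.

  • atoms_key (str) – column name for geometries (ase.Atoms), default “geometry”.

  • energy_key (str) – column name for energies, default “energy”.

  • n_jobs (int) – number of parallel jobs to submit.

  • shuffle (bool) – whether to shuffle the rows of df_data before splitting them into jobs.

  • progress (str, None) – style of progress counter.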

Returns

df_features – processed dataframe with columns [y, {name}_0, …, {name}_x, n_A, …, n_Z] corresponding to target vector, pair-distance representation features, and composition (one-body) features.

Return type

pd.DataFrame
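One way to picture the parallel path: optionally shuffle the rows, split them into n_jobs batches, submit one job per batch to an executor, and concatenate the results. This is a sketch under those assumptions using concurrent.futures, not uf3's code; evaluate_in_parallel, split_batches, and the featurize_rows callback are hypothetical names, and the callback stands in for per-configuration featurization.

```python
import random
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_batches(items: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Split items into at most n_jobs contiguous, near-equal batches."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    size, extra = divmod(len(items), n_jobs)
    batches, start = [], 0
    for j in range(n_jobs):
        stop = start + size + (1 if j < extra else 0)
        if stop > start:
            batches.append(list(items[start:stop]))
        start = stop
    return batches


def evaluate_in_parallel(
    rows: Sequence[T],
    featurize_rows: Callable[[List[T]], List[R]],
    client: Executor,
    n_jobs: int = 2,
    shuffle: bool = True,
    seed: int = 0,
) -> List[R]:
    """Featurize rows in batches submitted to an Executor (illustrative only)."""
    order = list(rows)
    if shuffle:
        # Shuffling can help balance work when configuration sizes vary.
        random.Random(seed).shuffle(order)
    futures = [client.submit(featurize_rows, batch)
               for batch in split_batches(order, n_jobs)]
    results: List[R] = []
    for future in futures:  # keep batch order when concatenating
        results.extend(future.result())
    return results


with ThreadPoolExecutor(max_workers=2) as pool:
    out = evaluate_in_parallel(range(10), lambda b: [x * x for x in b], pool)
print(sorted(out))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```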

featurize_energy_2B(geom, supercell=None)

Generate 2B feature vector for learning energy of one configuration.

Parameters
  • geom (ase.Atoms) – unit cell or configuration of interest.

  • supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.

Returns

vector – vector of features.

Return type

np.ndarray
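In UF3-style models the two-body energy is a linear combination of B-spline basis functions of pair distances, so the energy feature for basis function i can be pictured as the sum of B_i(r) over all pairs. The sketch below evaluates a cubic basis on a uniform, clamped knot sequence via the Cox–de Boor recursion. The knot layout, r_min, r_max, and the interval count are assumptions standing in for what bspline_config provides, and pair distances are passed in directly rather than computed from geom and supercell.

```python
from typing import List, Sequence


def clamped_uniform_knots(r_min: float, r_max: float, n_intervals: int,
                          degree: int = 3) -> List[float]:
    """Uniform knots on [r_min, r_max], repeated degree extra times at each end."""
    step = (r_max - r_min) / n_intervals
    interior = [r_min + j * step for j in range(n_intervals + 1)]
    interior[-1] = r_max  # avoid round-off at the upper end
    return [r_min] * degree + interior + [r_max] * degree


def basis_value(i: int, k: int, t: Sequence[float], x: float) -> float:
    """Cox-de Boor recursion for the i-th B-spline of degree k at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    value = 0.0
    if t[i + k] > t[i]:
        value += (x - t[i]) / (t[i + k] - t[i]) * basis_value(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        value += ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                  * basis_value(i + 1, k - 1, t, x))
    return value


def energy_features_2b(distances: Sequence[float], knots: Sequence[float],
                       degree: int = 3) -> List[float]:
    """Sum each basis function over pair distances (one entry per basis)."""
    n_basis = len(knots) - degree - 1
    features = [0.0] * n_basis
    for r in distances:
        for i in range(n_basis):
            # Distances outside [knots[0], knots[-1]) contribute zero.
            features[i] += basis_value(i, degree, knots, r)
    return features


knots = clamped_uniform_knots(r_min=1.0, r_max=3.0, n_intervals=4)
features = energy_features_2b([1.5, 2.0, 2.7, 3.4], knots)
print(len(features), round(sum(features), 12))  # 7 3.0
```

Because clamped B-splines sum to one inside the knot range, each in-range pair adds exactly 1 to the total of the feature vector, which is a handy sanity check.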

featurize_energy_3B(geom, supercell=None)

Generate 3B feature vector for learning energy of one configuration.

Parameters
  • geom (ase.Atoms) – unit cell or configuration of interest.

  • supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.

Returns

vector – vector of features.

Return type

np.ndarray
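For three-body terms, a UF3-style construction takes, for each triplet of atoms, the product of basis values along its three distances (r_ij, r_ik, r_jk) and sums these tensor-product terms over triplets. The sketch below illustrates that construction with the same hypothetical clamped cubic knots on all three legs. uf3's actual knot maps, symmetry handling, and flattening order may differ, and triplets are supplied as distance tuples instead of being found from geom.

```python
from itertools import product
from typing import List, Sequence, Tuple


def basis_value(i: int, k: int, t: Sequence[float], x: float) -> float:
    """Cox-de Boor recursion for the i-th B-spline of degree k at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    value = 0.0
    if t[i + k] > t[i]:
        value += (x - t[i]) / (t[i + k] - t[i]) * basis_value(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        value += ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                  * basis_value(i + 1, k - 1, t, x))
    return value


def basis_vector(x: float, t: Sequence[float], degree: int = 3) -> List[float]:
    """All basis values at x, one per basis function."""
    return [basis_value(i, degree, t, x) for i in range(len(t) - degree - 1)]


def energy_features_3b(triplets: Sequence[Tuple[float, float, float]],
                       knots: Sequence[float], degree: int = 3) -> List[float]:
    """Sum flattened tensor products B_u(r_ij) * B_v(r_ik) * B_w(r_jk)."""
    n = len(knots) - degree - 1
    features = [0.0] * (n ** 3)
    for r_ij, r_ik, r_jk in triplets:
        b_ij = basis_vector(r_ij, knots, degree)
        b_ik = basis_vector(r_ik, knots, degree)
        b_jk = basis_vector(r_jk, knots, degree)
        for flat, (u, v, w) in enumerate(product(range(n), repeat=3)):
            features[flat] += b_ij[u] * b_ik[v] * b_jk[w]
    return features


# Hypothetical knots: clamped cubic on [1.0, 3.0] with two uniform intervals.
knots = [1.0] * 4 + [2.0] + [3.0] * 4
features = energy_features_3b([(1.2, 1.8, 2.5), (1.1, 2.9, 2.2)], knots)
print(len(features), round(sum(features), 12))  # 125 2.0
```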

featurize_force_2B(geom, supercell=None)

Generate 2B feature vectors for learning forces of one configuration.

Parameters
  • geom (ase.Atoms) – unit cell or configuration of interest.

  • supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.

Returns

feature_array – feature vectors arranged in array of shape (n_atoms, n_force_components, n_features).

Return type

np.ndarray
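If the energy is modeled as E = Σ_i c_i X_i with X_i = Σ_pairs B_i(r_ab), then the force component on atom a along direction d is Σ_i c_i (−∂X_i/∂x_{a,d}), so each force feature is the negative gradient of an energy feature. The sketch below builds that (n_atoms, 3, n_features) array for an isolated, non-periodic cluster using the standard B-spline derivative identity. Periodic images (supercell), the sign convention, and the knot layout are assumptions here, and force_features_2b is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def basis_value(i: int, k: int, t: Sequence[float], x: float) -> float:
    """Cox-de Boor recursion for the i-th B-spline of degree k at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    value = 0.0
    if t[i + k] > t[i]:
        value += (x - t[i]) / (t[i + k] - t[i]) * basis_value(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        value += ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                  * basis_value(i + 1, k - 1, t, x))
    return value


def basis_derivative(i: int, k: int, t: Sequence[float], x: float) -> float:
    """dB_{i,k}/dx from the standard degree-lowering identity."""
    slope = 0.0
    if t[i + k] > t[i]:
        slope += k / (t[i + k] - t[i]) * basis_value(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        slope -= k / (t[i + k + 1] - t[i + 1]) * basis_value(i + 1, k - 1, t, x)
    return slope


def force_features_2b(positions: Sequence[Tuple[float, float, float]],
                      knots: Sequence[float],
                      degree: int = 3) -> List[List[List[float]]]:
    """Negative gradients of 2B energy features, shape (n_atoms, 3, n_basis)."""
    n_atoms = len(positions)
    n_basis = len(knots) - degree - 1
    out = [[[0.0] * n_basis for _ in range(3)] for _ in range(n_atoms)]
    for a, b in combinations(range(n_atoms), 2):
        delta = [positions[a][d] - positions[b][d] for d in range(3)]
        r = math.sqrt(sum(c * c for c in delta))
        for i in range(n_basis):
            dbdr = basis_derivative(i, degree, knots, r)
            for d in range(3):
                grad = dbdr * delta[d] / r  # dX_i / dx_{a,d}
                out[a][d][i] -= grad        # force feature = -gradient
                out[b][d][i] += grad        # atom b sees the opposite gradient
    return out


# Hypothetical knots: clamped cubic on [1.0, 3.0] with four uniform intervals.
knots = [1.0] * 4 + [1.5, 2.0, 2.5] + [3.0] * 4
positions = [(0.0, 0.0, 0.0), (1.6, 0.0, 0.0), (0.0, 2.1, 0.0)]
F = force_features_2b(positions, knots)
print(len(F), len(F[0]), len(F[0][0]))  # 3 3 7
```

Since every pair contributes equal and opposite gradients to its two atoms, the force features of an isolated cluster sum to zero over atoms, mirroring Newton's third law.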

featurize_force_3B(geom, supercell=None)

Generate 3B feature vectors for learning forces of one configuration.

Parameters
  • geom (ase.Atoms) – unit cell or configuration of interest.

  • supercell (ase.Atoms) – optional ase.Atoms output of get_supercell used to account for atoms in periodic images.

Returns

feature_array – feature vectors arranged in array of shape (n_atoms, n_force_components, n_features).

Return type

np.ndarray

static from_config(chemical_system, config)

Instantiate from a configuration dictionary.

get_training_tuples(df_features, kappa, data_coordinator)

TODO: Remove (deprecated)

Weights are generated by normalizing energy and force entries by the respective sample standard deviations as well as the relative number of entries per type. Weights are further modified by kappa, which controls the relative weighting between energy and force errors. A value of 0 corresponds to force-training, while a value of 1 corresponds to energy-training.

Parameters
  • df_features (pd.DataFrame) – dataframe with target vector (y) as the first column and feature vectors (x) as remaining columns.

  • kappa (float) – energy-force weighting parameter between 0 and 1.

  • data_coordinator (uf3.data.io.DataCoordinator) –

Returns

  • x (np.ndarray) – features for machine learning.

  • y (np.ndarray) – target vector.

  • w (np.ndarray) – weight vector for machine learning.

Return type

tuple of (x, y, w)
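The weighting described for get_training_tuples can be sketched as follows. This is one plausible reading of the description, not uf3's exact formula: each energy row gets weight kappa / (n_e · σ_e) and each force row (1 − kappa) / (n_f · σ_f), where n and σ are the entry count and sample standard deviation of each target type. training_weights is a hypothetical name, and whether uf3 divides by σ or σ² is not confirmed here.

```python
import statistics
from typing import List, Sequence


def training_weights(y: Sequence[float], is_energy: Sequence[bool],
                     kappa: float) -> List[float]:
    """Per-row weights balancing energy and force targets (illustrative)."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    energies = [v for v, e in zip(y, is_energy) if e]
    forces = [v for v, e in zip(y, is_energy) if not e]
    # Normalize by sample standard deviation and by the number of entries per type.
    w_energy = kappa / (len(energies) * statistics.stdev(energies))
    w_force = (1.0 - kappa) / (len(forces) * statistics.stdev(forces))
    return [w_energy if e else w_force for e in is_energy]


y = [-10.0, -12.0, 0.5, -0.5, 1.0, -1.0]
is_energy = [True, True, False, False, False, False]
print(training_weights(y, is_energy, kappa=0.0)[:2])  # [0.0, 0.0]
```

With kappa = 0 the energy rows drop out (force-training) and with kappa = 1 the force rows drop out (energy-training), matching the limits stated above.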