Ultra-Fast Force Fields (UF3) - YAML/CLI Guide¶
General Settings¶
outputs_path
(string): directory in which to save outputs such as
trained models, LAMMPS potentials, and plots.
Default: “./outputs”
element_list
(list): elements in the chemical system
Example:
element_list:
- Ca
- Ti
- O
degree
(integer): maximum degree of atomic interactions to consider.
Options: {2, 3}
Default: 2
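To illustrate what element_list implies for pair (degree 2) interactions, the following minimal sketch enumerates the unique element pairs of the example system. The helper name is hypothetical and the hyphenated labels are an assumed naming style, not necessarily the keys UF3 expects.

```python
from itertools import combinations_with_replacement

def pair_interactions(element_list):
    """List unique unordered element pairs (hypothetical helper)."""
    return ["-".join(pair) for pair in combinations_with_replacement(element_list, 2)]

pairs = pair_interactions(["Ca", "Ti", "O"])
```

Three elements give six unique pairs. Setting degree to 3 adds three-body terms on top of these; how they are labeled is not covered by this sketch.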
seed
(integer): seed for random number routines in NumPy. Improves
reproducibility.
Default: 0
Data¶
db_path
(string): filename for cached data. Created during
preprocessing. Used during featurization, learning, and postprocessing.
Default: “data.db”
max_per_file
(integer): maximum number of samples to take per file.
Farthest-point-sampling is used to select subsets of samples. Typically,
when sampling from a relaxation path, more samples are taken from the
beginning than the end. -1 takes all entries without farthest-point-sampling.
Default: -1
min_diff
(float): minimum energy difference in eV between any two
samples selected with farthest-point-sampling. This avoids oversampling
configurations that are very similar in energy, e.g. at the end of a
relaxation path.
Default: 1e-8
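The interplay of max_per_file and min_diff can be sketched with a greedy farthest-point selection over energies. This is only an illustration under stated assumptions: the function name, the choice of the first frame as the starting point, and the use of energy as the sole distance measure are hypothetical and may differ from what UF3 does internally.

```python
def farthest_point_sample(energies, max_per_file=-1, min_diff=1e-8):
    """Pick up to max_per_file indices that are spread out in energy (eV)."""
    n = len(energies)
    if max_per_file < 0:
        return list(range(n))  # -1 keeps every entry
    if n == 0 or max_per_file == 0:
        return []
    selected = [0]  # assumption: start from the first frame
    # Distance from each sample to its nearest already-selected sample.
    nearest = [abs(e - energies[0]) for e in energies]
    while len(selected) < min(max_per_file, n):
        idx = max(range(n), key=nearest.__getitem__)
        if nearest[idx] == 0.0 or nearest[idx] < min_diff:
            break  # remaining samples are too similar in energy
        selected.append(idx)
        nearest = [min(d, abs(e - energies[idx])) for d, e in zip(nearest, energies)]
    return sorted(selected)

# Energies along a made-up relaxation path: large changes first, tiny ones last.
energies = [-10.0, -12.0, -12.9, -13.0, -13.0000000001]
subset = farthest_point_sample(energies, max_per_file=3)
```

In this made-up path, two frames differ by only 1e-10 eV, so at most one of them is selected even when max_per_file would allow more.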
generate_stats
(boolean): whether to analyze and summarize pair
interaction distances, observed peaks, and more.
Default: True
progress
(string): style of printing progress. bar enables tqdm
progress bars while text yields timestamped updates. None or False
disables progress indicators.
Options: {“bar”, “text”, None}
Default: “bar”
vasp_pressure
(boolean): enable correction for external pressure by
subtracting the pressure * volume term from parsed energies. The external
pressure tag PSTRESS is extracted from a file (INCAR, OUTCAR, or
vasprun.xml) in the same directory.
Default: True
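The arithmetic behind this correction can be sketched as follows, assuming PSTRESS is given in kbar (the VASP convention), volumes in cubic angstroms, and energies in eV. The function and constant names are hypothetical, and UF3's own unit handling may differ.

```python
# 1 eV/A^3 = 160.2176634 GPa and 1 kbar = 0.1 GPa.
KBAR_TO_EV_PER_A3 = 0.1 / 160.2176634

def remove_pressure_term(energy_ev, pstress_kbar, volume_a3):
    """Subtract the pressure * volume term (in eV) from a parsed energy."""
    return energy_ev - pstress_kbar * KBAR_TO_EV_PER_A3 * volume_a3

# Made-up numbers: 100 kbar external pressure on a 20 A^3 cell.
corrected = remove_pressure_term(energy_ev=-10.0, pstress_kbar=100.0, volume_a3=20.0)
```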
sources¶
path
(string): path to highest-level directory in which to search
for files.
Default: “./data”
pattern
(string): glob pattern for recursive search in path.
Default: “*”
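How path and pattern combine can be sketched with a recursive glob from the Python standard library. This is an assumption about the search semantics rather than UF3's actual implementation; the helper name and the file names are made up.

```python
import tempfile
from pathlib import Path

def find_sources(path, pattern="*"):
    """Recursively collect files under `path` whose names match `pattern`."""
    return sorted(p for p in Path(path).rglob(pattern) if p.is_file())

# Build a throwaway directory tree to search.
with tempfile.TemporaryDirectory() as root:
    nested = Path(root) / "run_1" / "relax"
    nested.mkdir(parents=True)
    (nested / "vasprun.xml").write_text("")
    (nested / "notes.txt").write_text("")
    matches = [p.name for p in find_sources(root, "*.xml")]
    everything = [p.name for p in find_sources(root)]
```

A narrower pattern such as "*.xml" restricts the search to matching files at any depth, while the default "*" picks up every file under path.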
keys¶
atoms_key
(string): column name for atomic configurations in
DataFrame.
Default: “geometry”
energy_key
(string): keyword for energies in data parsing and
ase.Atoms.info.
Default: “energy”
force_key
(string): keyword for forces in data parsing and
ase.Atoms.arrays.
Default: “forces”
size_key
(string): column name for the number of atoms per atomic
configuration in DataFrame.
Default: “size”
Basis¶
r_min
(dictionary): minimum pair distance per interaction, in
angstroms, to consider for featurization.
Note: this value should be
low enough to account for the smallest pair distances expected to appear in simulations. Otherwise, LAMMPS will fail as soon as two atoms get too close together.
Default: 1.0 for 2B interactions, [1.0, 1.0, 1.0]
for 3B interactions
r_max
(dictionary): maximum pair distance per interaction, in
angstroms, to consider for featurization.
Note: increasing this value
increases the number of neighbors considered during featurization and therefore its computational cost.
Default: 6.0 for 2B interactions, [6.0, 6.0, 6.0] for 3B
interactions
resolution
(dictionary): number of knot intervals per interaction.
Note: Due to local support, featurization time does not scale with this
value. However, memory requirements do. In the case of 3B interactions, the scaling is cubic.
Default: 25 for 2B interactions, [10, 10, 20] for 3B interactions
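The memory scaling can be made concrete by counting basis functions. The sketch below assumes cubic B-splines on a clamped knot vector, for which each dimension has (knot intervals + 3) basis functions, and it ignores any symmetry reductions; the helper name is hypothetical.

```python
from math import prod

def n_basis_functions(resolution, spline_degree=3):
    """Count coefficients for one interaction (assumes clamped B-splines)."""
    if isinstance(resolution, int):
        resolution = [resolution]
    # Each dimension contributes (knot intervals + spline degree) functions.
    return prod(n + spline_degree for n in resolution)

n_2b = n_basis_functions(25)            # default 2B resolution
n_3b = n_basis_functions([10, 10, 20])  # default 3B resolution
```

Under these assumptions, the three-body count is a product over three dimensions, which is why raising all three resolutions together grows memory roughly cubically.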
fit_offsets
(boolean): enable fitting 1-body energies per element,
a.k.a. reference energy or isolated-atom energy.
Default: True
trailing_trim
(integer): force a number of trailing basis functions
for each pair potential to zero during training. Note: if the upper
cutoff distances given in r_max are too low, then this scheme may
slightly increase error.
Default: 3
= 0: hard cutoff at r_max
= 1: function goes to zero at r_max
= 2: first derivative goes to zero at r_max
= 3: second derivative goes to zero at r_max
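The effect of trailing_trim on a set of spline coefficients can be sketched as below. It assumes cubic B-splines on a clamped knot vector, where only the last three basis functions contribute to the value, slope, and curvature at r_max. The helper name and the coefficient values are made up.

```python
def apply_trailing_trim(coefficients, trailing_trim=3):
    """Return a copy with the last `trailing_trim` coefficients set to zero."""
    n_free = max(len(coefficients) - trailing_trim, 0)
    return list(coefficients[:n_free]) + [0.0] * (len(coefficients) - n_free)

trimmed = apply_trailing_trim([0.9, 0.4, -0.2, -0.1, 0.05, 0.02], trailing_trim=3)
untouched = apply_trailing_trim([0.9, 0.4], trailing_trim=0)
```

Zeroing one, two, or three trailing coefficients corresponds to the three smoothness levels listed above, while 0 leaves every coefficient free.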
mask_trim
(boolean): whether to mask all trimmed basis functions
when caching features. For large values of resolution, this option
greatly reduces filesize.
Default: True
knot_strategy
(string): spacing scheme for placing knots, given a
fixed resolution. When load_knots is True and knots_path is
provided, this setting is ignored.
Options: {“linear”, “lammps”, “geometric”, “inverse”}
= linear: uniform spacing of knots.
= lammps: LAMMPS-style r^2 spacing, resulting in higher resolution at longer distances and lower resolution at smaller distances.
= geometric: log(r) spacing, yielding higher resolution at smaller distances.
= inverse: 1/r spacing, yielding higher resolution at smaller distances.
Default: “linear”
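The four spacing schemes can be compared with a small sketch that places knots uniformly in a transformed coordinate (r, r^2, log(r), or 1/r) and maps them back to r. This is one plausible reading of the descriptions above, not UF3's actual routine. The function name is hypothetical, boundary padding of the knot sequence is omitted, and r_min must be positive for the geometric and inverse schemes.

```python
import math

def knot_positions(r_min, r_max, resolution, strategy="linear"):
    """Return resolution + 1 knot positions between r_min and r_max."""
    transforms = {
        "linear": (lambda r: r, lambda u: u),
        "lammps": (lambda r: r * r, math.sqrt),
        "geometric": (math.log, math.exp),
        "inverse": (lambda r: 1.0 / r, lambda u: 1.0 / u),
    }
    forward, backward = transforms[strategy]
    u_min, u_max = forward(r_min), forward(r_max)
    step = (u_max - u_min) / resolution  # uniform in the transformed coordinate
    return [backward(u_min + i * step) for i in range(resolution + 1)]

linear = knot_positions(1.0, 6.0, 5, "linear")
lammps = knot_positions(1.0, 6.0, 5, "lammps")
inverse = knot_positions(1.0, 6.0, 5, "inverse")
```

Comparing the first and last intervals of each list shows where a scheme concentrates its resolution: the r^2 scheme has its narrowest interval near r_max, while the 1/r scheme has its narrowest interval near r_min.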
knots_path
(string): filename for knots. If specified, enables
writing and reading of knot sequences.
Note: if specified and load_knots is on, any
settings (per interaction) specified in r_min, r_max, and
resolution are ignored.
Default: “knots.json”
load_knots
(boolean): If enabled, read knot sequences from knots_path.
Default: False
dump_knots
(boolean): If enabled, write knot sequences to knots_path,
overwriting existing files.
Default: False
Features¶
db_path
(string): filename for cached data. Created during
preprocessing. Used during featurization and learning.
Default: “data.db”
features_path
(string): filename for cached features. Created during
featurization. Used during learning.
Default: “features.h5”
n_cores
(integer): maximum number of parallel processes for
featurization.
Default: 4
parallel
(string): backend for parallel execution.
Options: {“python”, “dask”}
Default: “python”
Model¶
model_path
(string): filename for serialized model. Created during
learning. Used during prediction.
Default: “model.json”
Learning¶
features_path
(string): filename for cached features. Created during
featurization. Used during learning.
Default: “features.h5”
splits_path
(string): filename for cached
training-testing-validation and, optionally, cross-validation splits.
Default: “splits.json”
holdout_split
(integer, float): number of samples (integer) or
fraction of total samples (float < 1) to partition for holdout. The
remainder is used for training.
Default: 0.2
cv_split
(integer): number of partitions to create out of
non-holdout data for cross-validation purposes.
Default: 5
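The way holdout_split and cv_split partition the data can be sketched as follows. This is an illustration only: the function name, the rounding of fractional splits, and the use of Python's random module (rather than NumPy) are assumptions, so the actual indices UF3 produces will differ.

```python
import random

def make_splits(n_samples, holdout_split=0.2, cv_split=5, seed=0):
    """Partition sample indices into holdout, training, and CV folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    if isinstance(holdout_split, float):
        n_holdout = round(n_samples * holdout_split)  # fraction of all samples
    else:
        n_holdout = holdout_split  # absolute number of samples
    holdout, training = indices[:n_holdout], indices[n_holdout:]
    # Cross-validation folds are drawn only from the non-holdout data.
    folds = [training[i::cv_split] for i in range(cv_split)]
    return {"holdout": holdout, "training": training, "folds": folds}

splits = make_splits(100)
by_count = make_splits(100, holdout_split=30)
```

With the defaults and 100 samples, 20 go to holdout and the remaining 80 are divided into 5 folds of 16.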
weight
(float): weighting parameter for error in energies and forces
during training. Lower values emphasize forces while higher values
emphasize energies. 0.0 disables energy contributions to the fit while
1.0 disables force contributions to the fit.
Default: 0.5
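The role of weight can be sketched as a convex combination of energy and force errors. This is a simplified illustration with a hypothetical function name; any normalization of the two terms that UF3 applies (for example, by the spread of the reference data) is not specified here and is left out.

```python
def weighted_loss(energy_errors, force_errors, weight=0.5):
    """Blend mean squared energy and force errors with a single weight."""
    mse_energy = sum(e * e for e in energy_errors) / len(energy_errors)
    mse_force = sum(f * f for f in force_errors) / len(force_errors)
    return weight * mse_energy + (1.0 - weight) * mse_force

forces_only = weighted_loss([1.0, 1.0], [2.0, 2.0], weight=0.0)
energies_only = weighted_loss([1.0, 1.0], [2.0, 2.0], weight=1.0)
balanced = weighted_loss([1.0, 1.0], [2.0, 2.0], weight=0.5)
```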
regularizer
(dictionary): ridge and curvature regularization
strengths for 1-body, 2-body, and 3-body interactions.
Default:
- ridge_1b: 1e-8
- ridge_2b: 0
- ridge_3b: 0
- curvature_2b: 1e-8
- curvature_3b: 1e-8
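The two kinds of regularization can be illustrated on a one-dimensional coefficient vector. The sketch assumes that the ridge term penalizes squared coefficient magnitudes and that the curvature term penalizes squared second differences between neighboring coefficients, which is a common formulation but not confirmed here as UF3's exact one; the function name is hypothetical.

```python
def regularization_penalty(coefficients, ridge=0.0, curvature=1e-8):
    """Ridge plus squared-second-difference penalty on spline coefficients."""
    ridge_term = sum(c * c for c in coefficients)
    second_diffs = [
        coefficients[i - 1] - 2.0 * coefficients[i] + coefficients[i + 1]
        for i in range(1, len(coefficients) - 1)
    ]
    curvature_term = sum(d * d for d in second_diffs)
    return ridge * ridge_term + curvature * curvature_term

straight = regularization_penalty([1.0, 2.0, 3.0, 4.0], ridge=0.0, curvature=1.0)
kinked = regularization_penalty([0.0, 1.0, 0.0], ridge=0.0, curvature=1.0)
```

Under these assumptions, linearly varying coefficients incur no curvature penalty, whereas an isolated spike is penalized, so larger curvature strengths favor smoother potentials.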