-
Notifications
You must be signed in to change notification settings - Fork 1
black_box
black_box calculates a reaction rate constant of a reaction, where only the optimized reaction path is needed as initial input. All other steps -- calculating QM reference data with Gaussian or orca, setting up a TREQ potential energy surface and running umbrella sampling and recrossing calculations on the generated surface -- are done automatically: like in a black_box.
The program is able to determine the class of reaction mechanism from the given reaction path, and sets up the collective variables for the biased samplings in the next steps using this information. Further, two qualities of reference methods can be combined within the TREQ method, such that the path energies can be recalculated with a more expensive method like DLPNO-CCSD(T).
In order to get a reasonable accessment of the expectable accuracy and error of the results, several independent rate constant calculations are usually being done and their results are averaged (calculating their variance as well). Further, if rate constant calculations are ordered to be made at different temperatures, Arrhenius parameters can be fitted and the degree of Arrhenius behavior can be determined (linear correlation).
Concerning the input, most settings defining the PES and the rate constant calculations have default values that are used if no explicit keywords for them are given in the keyfile. Therefore, a much smaller number of keywords is needed compared to the usual calculations. Nevertheless, since many tasks will be done by the program, the keyfile will be of non-neglectable extend.
The whole automated procedure might require a significant amount of calculation time for medium size systems (more than 6-8 atoms): Due to the exponential dependence of k(T) values on barrier heights, QM methods of high quality should be used (hybrid functionals or MP2 with TZ basis, ideally also energy corrections with, e.g., DPLNO-CCSD(T)). With them, the reference calculations are tedious, further, many rate constant calculations are needed if Arrhenius parameters shall be calculated from averaged k(T) values at different temperatures.
Therefore, only small systems are presented as examples, where the calculations can be completed after 2-3 hours on a usual desktop PC. Larger systems, however, were also calculated with that method and gave good results (which will be published in the future).
A black _box calculation consists of the following steps:
- Read the given reaction path and pick its first and last structures. Their geometries are optimized with the QM program and method desired by the user.
- The needed set of QMDFF reference information is calculated with the chosen QM program, the QMDFFs of reactants and products are generated with the program qmdffgen.x
- A number of gradient+Hessian reference points along the path is picked by the preimplemented importance-selection algorithm. QM reference calculations are done for them.
- The TREQ PES description is set up with the generated QMDFFs and the data from the gradient+Hessian calculations along the path. The internal coordinates for the RP-EVB and the direct interpolation region are picked based upon merging the internal coordinate sets of both QMDFFs.
- Rate constants are calculated for the desired temperatures. The reaction mechanism (and thus the dividing surfaces for the definition of the collective variable for the umbrella samplings) is determined based on comparison of bonds used in the QMDFF internal coordinates sets in reactants and products. Based on the average rates at the temperatures, Arrhenius parameters are calculated as well.
List of examples:
- 1: H+H2 with Orca reference (simplest possible reaction with high-quality reference): black_box/h+h2
- 2: PH3+H with Gaussian reference (Slightly larger reaction): black_box/ph3+h
Here again the simplest possible H+H2 reaction. With this small system, it is possible to do the full black-box calculation process in less than one hour.
The reaction path was optimized with Orca (RI-MP2/aug-cc-pVQZ/C).
As for all black-box calculations based on Orca-generated reference, only two input files are needed: The main input keyfile (black_box.key) and the IRC output file (irc_trj.xyz), where the energies were written by Orca in the xyz file command lines.
The black_box.key file begins with a general section:
irc_software O
irc_prefix irc_trj
irc_direction left2right
separate_energy
min_rp_points 25
rate_temps 400 450 500 550 600
kt_average 5
deltat 0.1
rpmd_beads 16
nprocs_total 6
min_nprocs 3
treq_ref_nprocs 1
ens_nprocs 3
ref_memory 2
The software package used for the given IRC is Orca in this case (irc_software O), the prefix of the file is apparently irc_trj.
Usually, the keyword irc_direction is important. With its choice, the whole rate calculation process is either done for the direction first to last frame (left2right, related to a plot of the IRC energies), or last to the first frame (right2left).
Since the H+H2 reaction is symmetric, this choice does not matter here.
The keyword separate_energy activates the recalculation of the energies of all IRC structures with a higher level of theory, which is too expansive to do the whole energy+gradient calculation with it. This is possible due to the separation of the reaction coordinate (which is treated by the high level) from all others, interpolated by Taylor expansions orthogonal to the path.
For small reactive systems it is of no large overhead to add many gradient+Hessian reference points along the path, here 25 as offset, given by min_rp_points 25.
Concerning the rate calculation, the list of temperatures at which k(T) values shall be calculated is important: rate_temps 400 450 500 550 600, as well as the number of independent k(T) calculations at each temperature (kt_average 5).
If only one temperature is given, no Arrhenius behavior will be evaluated and no Arrhenius parameters will be calculated.
Both the MD time step used for all samplings (in fs) deltat 0.1 and the number of RPMD beads (rpmd_beads) must be given explicitly.
The following section is rather technical: The total number of available processors (nprocs_total 6) must be given as well as the number of processors for each geometry optimization and QMDFF reference generation (min_nprocs 3) and the number of processors for each gradient+Hessian reference calculation for the TREQ term (treq_ref_nprocs). Since additional energies along the reaction paths shall be calculated, the number of processors for them must be given as well (ens_nprocs 3). The total number of energy calculations is divided according to the number of batches that can be calculated with the total number of processors. If a large machine with many processors is available, many of those reference calculations can be done in parallel, if an appropriate number of processors is given.
Finally, the available memory for each reference calculation (in GB, ref_memory 2) is given as well.
After the general settings, the gradient+frequency section follows:
grad_freq {
software O
method RI-MP2
basis NoFrozenCore aug-cc-pVQZ/C
charge 0
multi 2
}
Here, all settings for the quantum chemical reference calculations of QMDFF minima and the gradient+frequency points along the path for the setup of the TREQ PES are listed.
First, the QM software package (Orca) is specified, similar to the IRC settings in the general section: software O.
Further, the method and the basis set (and further settings) are noted: method RI-MP2 and basis NoFrozenCore aug-cc-pVQZ/C.
Finally, charge and multiplicity of the full system must be given (which are of course the same for all reference calculations): charge 0 and multi 2.
The next section must only be given if energies along the path are calculated with a higher level of theory: the extra energy section:
energy_extra {
software O
method dlpno-ccsd(t)
basis aug-cc-pVQZ/C
}
Here, software, method and basis set are given analogously to the gradient+frequency section. Since charge and multiplicity will be the same, they must not be given again.
A rather technical section follows, the symlinks section:
symlinks {
qmdffgen ~/bin/qmdffgen.x
calc_rate ~/bin/calc_rate.x
orca ~/Software/orca_5_0_3_linux_x86-64_shared_openmpi411/orca
mpi mpirun -np
}
Since black_box needs to know how it can call the several programs during its execution, their paths as being available on the target machine must be given here (absolute or as available in $PATH variables).
Here, qmdffgen is needed for the setup of reactants/products QMDFFs, calc_rate for the rate constant calculations, orca for the QM reference calculations and mpi for the parallelization of the calc_rate calls.
The final keywords concern details in the calc_rate calculations and are fully optional: For all settings, black_box gives default values if not set explicitly by the user. The only setting which usually should be taken under consideration explicitly, is the number of equivalent reaction paths:
mecha {
n_paths 2
}
Currently, the symmetry is not determined automatically. Therefore, all rate constants would be 50% too small if the twofold symmetry is not given. Umbrella sampling and recrossing settings can be altered, e.g., to make the calculation cheaper or more accurate.
umbrella {
bias 0.20
equi_steps 5000
sample_steps 5000
}
recross {
no_check
mpi
}
The calculation can be start as usual by invoking the black_box program on a single CPU (parallelization realized by the calls within the program, so make sure that enough processors are available!)
black_box.x black_box.key
Now, as the program name states, everything is done in a black-box fashion. On a usual workstation with 8 CPUs, the total calculation should roughly take one hour.
The actual status of the calculation is printed to the command line. Usually, the most interesting parts are written at the end:
Averaged k(T) values for all temperatures:
400K: 1.72396019E+09 cm^3/(mol*s) , 2.86270275E-15 cm^3/(molec*s)
450K: 5.62474406E+09 cm^3/(mol*s) , 9.34010566E-15 cm^3/(molec*s)
500K: 1.70822079E+10 cm^3/(mol*s) , 2.83656687E-14 cm^3/(molec*s)
550K: 3.46224040E+10 cm^3/(mol*s) , 5.74918445E-14 cm^3/(molec*s)
600K: 7.97762034E+10 cm^3/(mol*s) , 1.32471480E-13 cm^3/(molec*s)
Do Arrhenius fit for calculated rates:
--> Done! The results are:
slope: 4562.0930135 K, y_intercept: 2.3592958E-10 cm^3/(mol*s)
or in other units (NIST):
Ea (slope): 37.9313868 kJ/mol, A(y_intercept): 2.3592958E-10 cm^3/(mol*s)
The correlation coefficient: -0.99844816
In order to visualize the Arrhenius plot, open: calc_rate/arrhen_plot.gnu
Timings:
----------
Read in of settings and initialization: 0.012 s.
QMDFF reference calculation : 18.630 s.
QMDFF generation 0.315 s.
TREQ reference data calculation 47.883 s.
Additional energy calculations 62.033 s.
Rate constant calculations + Arrhenius 2734.815 s.
The calculation needed a total time of 2863.688 seconds.
These are: 0 days, 0 hours, 47 minutes and 43 seconds.
Here, the averaged rate constants for the chosen temperatures are listed as well as the Arrhenius parameters obtained from the fit of the data. Further, the timings of the different calculation stages are shown. Here, the rate constant calculation part is by far the most time-consuming, using more than 95 % of the total walltime. The QM reference calculations are much cheaper, of course mainly due to the rather trivial system, containing only three electrons.
During the calculation, a number of folders is written, containing different pieces of output which are helpful to control the overall quality of the calculation and, if needed, to debug possible strange results.
-
reactants: The QMDFF reference calculation for the reactants minimum, geometry optimization and frequency calculation. In this case, the single point energy is calculated with a separate level of theory, therefore, the subfolderadd_ensis present. -
products: Same asreactants, but for the products minimum. -
evb_qmdff: Here, the QMDFFs are generated with theqmdffgenprogram. Currently, only TREQ is supported byblack_box, therefore, the nameevb_qmdffmight be slightly misleading, the nameqmdffwould be enough. The quality of the QMDFFs (see page qmdffgen) can be judged by looking into theqmdffgen.logfile and, more detailed, into thereactants_qmdff.logandproducts_qmdff.logfiles. -
mep_irc: Here, the given input concerning the IRC of the reaction is analyzed, the structures are printed toirc.xyz, the energies (in Hartrees) toirc_ens.dat. Further, the path progress parameter xi and the slope of the path are calculated and plotted, into the filesirc_xi_ens.datandirc_slopes.dat. -
add_ens: If, as it is here the case, the energies are calculated with a separate level of theory, the energies for all structures along the given reaction path are calculated in this folder. If the total number of processors is more than twice the scheduled number of processors for the energy calculations, the path is divided into a number of slices, calculated in parallel. In this case, there are two parts, calculated in foldersround1andround2. -
rp_ref: For the setup of the TREQ coupling term, a number of gradient and Hessian calculations is performed along the given reaction path. How the points are selected in detail can be seen in the Caracal publication. Each point is calculated separately in one folder, in this case 26 in total, located in folders1to26. -
calc_rate: The most important part of the process, the rate constant calculations, are located here. For each temperature in the list afterrate_temps, a subfolder is generated, here,400Kto600K. For each temperature, five independent rate calculations were done, such that foldersrun1torun5are located in each temperature folder. In each run folder, the full output of the particularcalc_ratecalculation is contained, see page calc_rate for details. In thecalc_ratemain folder, all individual rate constants are listed in the fileall_kt.dat. If, e.g., a single rate deviates strongly from all others at its temperature, the reason for that can be searched in therunfolder of the particular calculation.
For this example, large parts of the proceedings work very similar to the first example, therefore, it is described rather shortly.
Here, the reaction path has been calculated with Gaussian16. Since the output files of Gaussian are much larger, they are compressed in the archive irc_reference.tar.gz.
It can be extracted with:
tar -xvf irc_reference.tar.gz
The files irc.chk (checkpoint file) and irc.log (logfile) were extracted.
With this, Caracal can do its job:
black_box.x back_box.key
The overall computation should take 2-3 hours, on a usual workstation with 8 CPUs (6 used).
Evaluation and debugging of results can be made in the same way as described in example 1.
CARACAL Wiki
Getting Started
Program Tutorials
Miscellaneous