Umbrella sampling with PLUMED
Alanine dipeptide in vacuum
First of all, we will download the data required to run the calculations, which
should be in the data/umbrella
folder of this repository.
It should contain the following files from the
21.3 PLUMED masterclass.
foo@bar:~$ ls data/umbrella/
ala_dipeptide_analysis.ipynb topolA.tpr wham.py
reference.pdb topolB.tpr
Using the typical Gromacs syntax, you could use these tpr
files
to run MD simulations using
foo@bar:~$ gmx mdrun -s topolA.tpr -nsteps 200000 -x traj_unbiased.xtc
This will result in an unbiased simulation trajectory of the alanine dipeptide.
When we run simulations with the PLUMED we will use an additional
flag -plumed
that points to the unique input required to bias the
simulations. Below, you can find an example input file to
estimate the values of the \( \phi \) and \( \psi \) dihedrals
and histogram their values.
MOLINFO STRUCTURE=reference.pdb
phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2
# make histograms
hhphi: HISTOGRAM ARG=phi STRIDE=100 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
ffphi: CONVERT_TO_FES GRID=hhphi
DUMPGRID GRID=ffphi FILE=fes_phi.dat STRIDE=200000
hhpsi: HISTOGRAM ARG=psi STRIDE=100 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
ffpsi: CONVERT_TO_FES GRID=hhpsi
DUMPGRID GRID=ffpsi FILE=fes_psi.dat STRIDE=200000
PRINT ARG=phi,psi FILE=colvar.dat STRIDE=100
Note that we are using a number of actions (TORSION
,
HISTOGRAM
and DUMPGRID
, to name a few). You should
familiarize yourself with their
syntax.
Introducing an umbrella bias in this script is very easy. It just requires adding an extra line to the script. For example, to restrain the simulation with a harmonic potential at 0 radians with a spring constant of 200, we would write
# add restraining potential
bb: RESTRAINT ARG=phi KAPPA=200.0 AT=0
Of course, we will be interested in running multiple window umbrella sampling, and hence the structure of our directory should change with as many plumed files as umbrellas windows. To generate 32 PLUMED input files, we can use the following Python script:
at=np.linspace(-np.pi,np.pi,33)[:-1]
print(at)
for i in range(32):
with open("plumed_" + str(i) + ".dat","w") as f:
print("""
# vim:ft=plumed
MOLINFO STRUCTURE=reference.pdb
phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2
bb: RESTRAINT ARG=phi KAPPA=200.0 AT={}
PRINT ARG=phi,psi,bb.bias FILE=colvar_multi_{}.dat STRIDE=100
""".format(at[i],i),file=f)
And then we will run the 32 simulations
foo@bar:~$ for i in $(seq 0 1 31)
do
gmx mdrun -plumed plumed_${i}.dat -s topolA.tpr -nsteps 200000 -x traj_comp_${i}.xtc
done
The result from this will be a dataset of 32 simulations, all initialized at the same conformation, but with umbrella potentials that will bias the sampling to different regions of conformational space.
After completion of the runs, we must analyze the resulting trajectories. In order to do that we concatenate the trajectories
foo@bar:~$ gmx trjcat -cat -f `for i in $(seq 0 1 31); do echo "traj_comp_${i}.xtc"; done` -o traj_multi_cat.xtc
Then, we analyze them using plumed driver:
foo@bar:~$ for i in $(seq 0 1 31)
do
plumed driver --plumed plumed_${i}.dat --ixtc traj_multi_cat.xtc --trajectory-stride 100
done
To visualize the output of the simulations, we can read the
colvar_multi_\*.dat
files and see what region of the
Ramachandran map has been sampled by each of the runs. Using
a Jupyter-notebook we can run the following
import plumed
import wham
col=[]
for i in range(32):
col.append(plumed.read_as_pandas("colvar_multi_" + str(i)+".dat"))
bias = np.zeros((len(col[0]["bb.bias"]),32))
for i in range(32):
bias[:,i] = col[i]["bb.bias"][-len(bias):]
w = wham.wham(bias,T=kBT)
colvar = col[0]
colvar["logweights"] = w["logW"]
plumed.write_pandas(colvar,"bias_multi.dat")
We then write a file called plumed_multi.dat
to obtain
reweighted free energy surfaces. The file should look as
follows:
# vim:ft=plumed
phi: READ FILE=bias_multi.dat VALUES=phi IGNORE_TIME
psi: READ FILE=bias_multi.dat VALUES=psi IGNORE_TIME
lw: READ FILE=bias_multi.dat VALUES=logweights IGNORE_TIME
hhphi: HISTOGRAM ARG=phi GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
ffphi: CONVERT_TO_FES GRID=hhphi
DUMPGRID GRID=ffphi FILE=fes_phi_cat.dat
hhpsi: HISTOGRAM ARG=psi GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
ffpsi: CONVERT_TO_FES GRID=hhpsi
DUMPGRID GRID=ffpsi FILE=fes_psi_cat.dat
hhphir: HISTOGRAM ARG=phi GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05 LOGWEIGHTS=lw
ffphir: CONVERT_TO_FES GRID=hhphir
DUMPGRID GRID=ffphir FILE=fes_phi_catr.dat
hhpsir: HISTOGRAM ARG=psi GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05 LOGWEIGHTS=lw
ffpsir: CONVERT_TO_FES GRID=hhpsir
DUMPGRID GRID=ffpsir FILE=fes_psi_catr.dat
This script is again read by plumed driver, and from this program we get our outputs.
foo@bar:~$ plumed driver --plumed plumed_multi.dat --ixtc traj_multi_cat.xtc --trajectory-stride 100 --kt 2.4943387854
In the folder where you have found the data to run these tests you can find a script
that will let you plot the free energy landscapes as estimated from umbrella
sampling and compare them with those from the equilibrium MD trajectory.
Clearly, using umbrella sampling we have been able to cover much more ground
for our reaction coordinates, while obtaining a very consistent PMF in the
regions that actually matter. One of the assignments in the masterclass is to run
from a different initial state, that corresponding to the topolB.tpr
Gromacs input file.
Running with multiple replicas
Typically, if we want to do molecular simulations using umbrella sampling, we will take advantage of the parallel capabilities of modern computers. Additionally, we may able to swap conformations between umbrella windows in order to enhance the sampling. This may be helpful to avoid getting trapped in local minima. The specifics on how to do this using PLUMED in Gromacs are described in the first part of the PLUMED 21.5 masterclass. In order to take advantage of this, the software must have been compiled using MPI.
The PLUMED input file for this type of calculation is only incrementally more complicated than the one used before
MOLINFO STRUCTURE=../reference.pdb
phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2
bb: RESTRAINT ARG=phi KAPPA=200.0 AT=@replicas:-3.141,-2.945,-2.748,-2.552,-2.356,-2.159,-1.963,-1.767,-1.570,-1.374,-1.178,-0.981,-0.785,-0.589,-0.392,-0.196,0.0,0.196,0.392,0.589,0.785,0.981,1.178,1.374,1.570,1.767,1.963,2.159,2.356,2.552,2.748,2.945
PRINT ARG=phi,psi,bb.bias FILE=../colvar_multi.dat STRIDE=100
Note the AT=@replicas
section in the RESTRAINT
action. In this case, we must generate
as many directories (here termed run0
, run1
, and so on), each of which will contain
the tpr
file. Simulations are then run using
foo@bar:~$ mpiexec -np 32 --oversubscribe gmx_mpi mdrun -multidir run* -plumed ../plumed.dat -s topolA.tpr -replex 200 -nsteps 200000
While much of the analysis will remain unchanged from the case of running each umbrella window independently, it is important to carefully check in the demuxed trajectories whether the different copies of the system diffuse across windows. Some copies may remain isolated from the rest, which may have catastrophic results.