=====================
Energy flow analysis
=====================

**Interresidue energy flow and energy conductivity (G)**

In 2006, we proposed the concept of inter-residue energy conductivity (G) to measure the strength of the residue-residue interactions (Ishikura & Yamato, *Chem. Phys. Lett.* 432: 533-537, (2006)). Later, in 2021, we extended our program to calculate the residue-residue heat current and its contribution to the overall thermal conductivity of a protein (CURP v.1.3). 

For instance, we reported the application of G to illustrate the photosignal transduction mechanism of a small water-soluble photosensory protein--photoactive yellow protein, as mentioned in the reference above. The value of the energy flow from :math:`j` to :math:`i` can be calculated as follows:

.. math::

   J_{i \leftarrow j} = 
    \frac{1}{2}\left( {{{\boldsymbol{v}}_i} \cdot {{\boldsymbol{F}}_{ij}}
    - {{\boldsymbol{v}}_j} \cdot {{\boldsymbol{F}}_{ji}}} \right).

The energy flow between atom groups, A and B, can be calculated by the summation of the inter-atomic energy flow between the pair:

.. math::

   J_{A \leftarrow B} =
      \sum\limits_{i \in A}^{{N_A}} {\mathop \sum \limits_{j \in B}^{{N_B}}
      {J_{i \leftarrow j}}}.

If the atom groups A and B represent residues A and B, respectively, the corresponding energy flow between A and B represents the inter-residue energy flow. Next, we define a sort of energy transport coefficient based on the Green-Kubo
linear response theory as:

.. math::

   G_{AB} = \int_0^\infty  {\left\langle
      {J_{A \leftarrow B}}(t){J_{A \leftarrow B}}(0)
      \right\rangle dt}.

In practice, the time integral is replaced by the sum of the auto-correlation function of :math:`J` at discrete time points. The CURP program performs these calculations for the proteins
and visualizes the energy conductivities using an energy exchange network (EEN).This allows the user to grasp the pattern of the inter-residue interactions at a glance. In principle, it is possible to calculate the values of G based on a single NVE trajectory. However, it is highly recommended to calulate thermal averages of them based on multiple NVE trajectories that start from different points on the energy landscape of the target protein. If the number of the NVE trajectories are sufficiently large, their average values are fair representations of the energy transport network of the thermally fluctuating protein system.  

.. note:: In the previous paper(*CPL* ,2006), we calculated the values of G based on the atomic decomposition of the potential energy using classical force-field functions, where the way of the decomposition involves arbitrariness. In the CURP program, the irEC calculation is performed in a different manner based on the pairwise decomposition of forces (*CPL* 539(2012)144). Accordingly, the numerical results of G may not be coincide with those obtained earlier.

We will illustrate how to perform energy flow analysis. To perform the calculations, coordinates and velocities trajectory of the system of interest are required. CURP analyzes the inter-residue interactions through the following steps:

.. contents::
   :local:
   :depth: 1

Preparing trajectories
=======================

Solvent stripping 
------------------

When we are focusing on residue-residue interactions in a polypeptide chain, solvent coordinates and velocities are not needed. It is possible to save disc space by running the follwoing scripts and disregarding the solvent degrees of freedom:

.. code-block:: bash

   # Solvent spripping from the coordinates trajetory
   curp conv-trj -crd \
       -p system.prmtop   -pf amber \
       -i nev.crd.nc      -if netcdf --irange 1 -1 1 \
       -o stripped.crd.nc -of netcdf --orange 1 -1 1 \
       mask -m mask.pdb

   # Solvent stripping from the velocities trajectory
   curp conv-trj -vel \
       -p system.prmtop   -pf amber \
       -i nev.vel.nc      -if netcdf  --irange 1 -1 1 \
       -o stripped.vel.nc -of netcdf  --orange 1 -1 1 \
       mask -m mask.pdb

Suppose that the time step of the NVE simulation was 2 fs.
In the first (second) example above, the `mask` subcommand removes the solvent degrees of freedom from the coordinates (velocities) trajectory file, `nev.crd.nc` (`nev.vel.nc`), in which coordinates are saved every 10 fs = 5 frames (2 fs = 1 frame), and a solvent-stripped trajectory is stored in `stripped.crd.nc` (`stripped.nev.nc`). The filename of the parameter/topology file is specified after the `-p` option.

**mask**
   This subcommand extracts selected atoms, which are recorded in the PDB file specified after the `-m` option. Make sure that the PDB file contains only the selected atoms, and each atom id coincides with the sequential number of the corresponding atom in the trajectory file. The other atoms will be excluded from the system.

`conv-trj` command options

`-crd`
   Setting coordinates trajectory files, whose type is assumed to be the coordinate type by default.
   Note that this option is mutually exclusive with the `-vel` option.

`-vel`
   Setting velocities trajectories files.
   Note that this option is mutually exclusive with the `-crd` option.

`-p` 
   Specifies the parameter and topology file.

`-pf`
   Specifies the format of the parameter and topology file.

`-i`
   This option specifies the input trajectory file. By repeating this option,
   multiple trajectory files are read in the order you provided.
   
`-if`
   Specifies the format of the input trajectory file.

`\--irange`
   Specifies <first_frame>, <last_frame> and <frame_interval> for the input trajectory file.
   `-1` for <last_frame> represents the last frame of the trajectory.
   For example, if "1 -1 5" is provided, the conv-trj command reads
   1st, 6th, 11th, ... , up to the last frames out from the trajectory.

`-o`
   Specifies the trajectory file for output. You are not allowed to
   specify this option multiple times.

`-of`
   Specifies the format of the output trajectory file.

`\--orange`
   Specifies <first_frame>, <last_frame> and <frame_interval> for the output trajectory file.
   `-1` for <last_frame> represents the last frame of the trajectory.

Note that the parameter and topology file is needed to be modified 
according to the solvent splitting for the subsequent MD simulations of the
new system.

Adjusting the time points for the coordinates and velocities trajectory
------------------------------------------------------------------------

In the Amber restart and trajectory files, the time frames of atomic velocities are shifted by :math:`-\Delta t/2` from those of atomic coordinates.
In the CURP program, however, the time points of the atomic coordinates must coincide with those of the atomic velocities.
Therefore, the atomic velocities in the Amber trajectory file must be 
modified before the energy flow calculations. As explained below, 
the `conv-trj` program of the CURP package estimates the atomic velocities at
the time point of :math:`t` from those at the time points of
:math:`t - \Delta t/2`, and :math:`t + \Delta t/2`. Accordingly, the
velocity frames should be recorded every step to the trajectory file. 

.. note:: Suppose that you started your MD simulation from time :math:`t_0`. Time points of the Amber coordinate and the velocity frames are, then, (:math:`t_0 + \Delta t, t_0 + 2\Delta t, t_0 + 3\Delta t, \cdots`), (:math:`t_0 + \Delta t/2, t_0 + 3\Delta t/2, t_0 + 5\Delta t/2, \cdots`), respectively. In addition, the restart file contains the coordinate and the velocity frames at the time points of :math:`t_0` and :math:`t_0 - \Delta t/2`, respectively.

To modify the time points of the Amber velocities trajectory, the following 
script is available:

.. code-block:: bash

   # adjust the velocity time
   curp conv-trj -vel \
       -p stripped.prmtop -pf amber \
       -i stripped.vel.nc -if netcdf --irange 1 -1 1 \
       -o adjusted.vel.nc -of netcdf --orange 5 -1 5 \
       adjust-vel

Suppose the the NVE simulation was performed with the time step of 2 fs.
In the above example, the 1st, 2nd, ... , up to the last frames are read from 
the `stripped.vel.nc` file obtained in the `Solvent stripping`_ section.
For each of the frame-pairs, (4th, 5th), (9th, 10th), :math:`\cdots` , a new frame is generated at the midtime point of the frame-pair. Consequently, the time points of the the original 5th, 10th, :math:`\cdots`, frames are shifted by :math:`- \Delta t/2` altogether and output to the `adjusted.vel.nc` file. Note that this process is needed for the velocities trajectory obtained by the leap frog algorithm, while not needed for that obtained by the velocity Verlet algorithm.

**adjust-vel**
   This subcommand shifts the time points of the velocities trajectories as described above.  

In this tutorial, we provided the sample trajectories in which the 
time points of the coordinates and velocities trajectories were adjusted.

Avoiding missing frame problem while concatenating multiple trajectory files
----------------------------------------------------------------------------

A special care is needed when you use the leap frog integrator, which is usually employed in the AMBER program, and split your trajectory into multipile files. Suppose that the time period of the `i`-th trajectory segment is :math:`[T_{i-1} + \Delta t, T_{i}]`, and the atomic coordinates and the velocities in this segment are recorded at the time points of :math:`(T_{i-1} + \Delta t, T_{i-1} + 2\Delta t, \cdots ,T_{i} - \Delta t, T_{i})` and :math:`(T_{i-1} + \Delta t/2, T_{i-1} + 3\Delta t/2, \cdots, T_{i} - 3\Delta t/2, T_{i} - \Delta t/2)`, respectively. If you need to consider the velocities at :math:`T_{i}` for the further calculations of energy flow, you need the velocity trajectory files of both `i`-th and `(i+1)`-st segments, because the velocities at :math:`T_{i}` are estimated from those at :math:`T_{i} - \Delta t/2` and :math:`T_{i} + \Delta t/2`. Note that the velocity trajectory file of the `i`-th segment can be replaced with the restart file generated at the end of the `i`-th segment.

**Example**
    When you perform a MD simulation for 10 ps with the time step of :math:`\Delta t` = 2 fs, and save the atomic coordinates every 10 fs, the time points of the atomic coordinates are 10 fs, 20 fs, :math:`\cdots`, 9990 fs, and 10000 fs. On the other hand, you need to save the velocities every step because you need to adjust the time points of the velocities to those of the atomic coordinates. Suppose that you divide the velocities trajectory into halves, and save the first (second) half to the trajectory file named `nve1.vel.nc` (`nve2.vel.nc`). Then the time points of the velocities in `nve1.vel.nc` (`nve2.vel.nc`) are 1 fs, 3 fs, :math:`\cdots`, 4999 fs (5001 fs, 5003 fs, :math:`\cdots`, 9999 fs). 

.. code-block:: bash

   # Example 1: adjust velocities for the 1st half of the velocity trajectory 
   curp conv-trj -vel \
       -p system.prmtop -pf amber \
       -i nve1.vel.nc -if netcdf --irange 1 -1 1 \
       -o stripped1.vel.nc -of netcdf --orange 1 -1 1 \
       mask -m mask.pdb

    curp conv-trj -vel \
        -p strip.prmtop -pf amber \
        -i stripped1.vel.nc -if netcdf --irange 1 -1 1 \
        -o adjusted1.vel.nc -of netcdf --orange 5 -1 5 \
        adjust-vel

Example 1 shows how to adjust the time points of the velocities to those of the atomic coordinates for the first half of the trajectory after removing unnecessary part of the system. Note that `strip.prmtop` represents the parameter/topology file generated for `mask.pdb`. As a result of the adjustment, the velocities at the time points of 10f, 20fs, :math:`\cdots`, and 4990 fs are saved to `adjusted1.vel.nc`. 

.. code-block:: bash

   # Example 2: adjust velocities for the 2nd half of the velocity trajectory 
   curp conv-trj -vel \
       -p system.prmtop -pf amber \
       -i nve1.rst -if restart --irange 1 -1 1 \
       -i nve2.vel.nc -if netcdf --irange 1 -1 1 \
       -o stripped2.vel.nc -of netcdf --orange 1 -1 1 \
       mask -m mask.pdb

   curp conv-trj -vel \
        -p strip.prmtop -pf amber \
        -i stripped2.vel.nc -if netcdf --irange 1 -1 1 \
        -o adjusted2.vel.nc -of netcdf --orange 1 -1 5 \
        adjust-vel

Similarly, example 2 shows the velocity adjustment for the 2nd half of the trajectory. Here we need to read the restart file, `nve1.rst` before reading the velocity trajectory `nve2.vel.nc`. The velocities at `t` = 4999 fs (`t` = 5001 fs) are saved in `nve1.rst` (at the 1st frame of `nve2.vel.nc`), and the velocities at `t` = 5000 fs are estimated from those at 4999 and 5001 fs. If the velocities at `t` = 5000 fs are not necessary for the final output file, you do not need to read the restart file in this example. Note that the final velocity file is generated with `"--orange 1 -1 5"` so that the first frame at `t` = 5000 fs is included in the output file named `adjusted2.vel.nc` and, thus, the velocities at 5000 fs, 5010 fs, :math:`\cdots`, and 9990 fs are save in the output file. 

Calculating interresidue energy flow
=====================================

Here we show an exmaple about how to calculate energy flow with 

#. configuration file for the CURP calculation

To start the calculations, please type in the following command:

.. code-block:: bash

   $ curp compute eflow.cfg > eflow.log

or 

.. code-block:: bash

   $ mpiexec -n 2 curp compute eflow.cfg > eflow.log

for parallel calculations with OpenMPI. In this case the number of cores is 2, ``eflow.cfg`` (see below)  is a configuration file for the energy flow calculations and ``eflow.log`` is the log file.


These commands should produce the following two files:

*  eflow.log
*  flux_grp.nc

``flux_grp.nc`` stores the fime series of energy flow in the netcdf format.
To check the content of this file, type in the following command: 

.. code-block:: bash

   $ ncdump outdata/flux_grp.nc

Setting up ``eflow.cfg``
--------------------------

Here we show an example of ``eflow.cfg``:: 

   [input]
   format = amber
   # first_last_interval = 1 4 1
   # group_file = group.ndx

   [input_amber]
   target = trajectory
   topology_file = ../pdz3/stripped.prmtop.gz
   coordinate_format = netcdf
   coordinate_file = ../pdz3/strip.crd.nc
   velocity_format = netcdf
   velocity_file = ../pdz3/strip.vel.nc

   [curp]
   potential = amber12SB
   method = energy-flux

   group_method = residue
   flux_grain = group
   # target_atoms = 
   # enable_inverse_pair = no
   group_pair_file = gpair.ndx

   remove_trans =  yes
   remove_rotate = yes

   log_frequency = 2

   [output]
   filename = outdata/flux.nc
   format = netcdf
   decomp = no

   output_energy = no

A detailed explanation is provided below:

[input]
~~~~~~~

The input file format.

format = amber
   Read Amber formatted files.
   
first_last_interval = 1 4 1
   For the energy flow calculations, the <first> and <last> frame with the interval of <intraval> frames are set in this line as <first> <last> <interval>.

group_file = group.ndx
   In this line, atom group definition file is specified. In this file, you can define an arbitrary group of atoms that is different than the standard amino acid residues.

[input_amber]
~~~~~~~~~~~~~~

In this section name, 
after ``input_`` comes the keyword specified as the format key in the ``[input]`` section.
The following keywords are used in the ``input_amber`` section.

target = trajectory | restart
   Specifies whether the input file is a trajectory file or a restart file.

topology_file = <prm_top_file>
   Set the path to the parameter and topology file.
   
coordinate_format = ascii | netcdf
   Set the format of the coordinate trajectory file.

coordinate_file = <mdcrd_file>
   File name of the coordinate trajectory file.

velocity_format = ascii | netcdf
   Set the format of the velocity trajectory file.

velocity_file = <mdvel_file>
   File name of the velocity trajectory file.

[curp]
~~~~~~~

In this section, parameters and keywords are set for the irEF calculations.

potential = amberbase | amber94 | amber96 | amber99 | amber99SB | amber03 | amber12SB
   In this line, the type of the potential function is set.

method = momentum-current | energy-flux | heat-flux
   This line specifies whether the calculation is for energy flux or for atomic stress tensors (momentum current) or heat flux. In this example, we choose ``energy-flux``.

group_method = none | united | residue | file
   In this line, the unit of irEF is set.
   ``none``: No groups are defined.
   ``united``: This specifies fixed united atom groups. All of the hydrogen atoms, whether polar or apolar, belong to the united atom group represented by the heavy atom to which they attached.
   ``residue``: Groups are defined by residues unit.
   ``file``: User defined atom groups are adopted. (see ``group_file`` key in the input section.)
   none: No groups are formed.

flux_grain = atom | group | both
   Output option for the energy flow data.
   ``atom``: Output inter-atomic energy flow for all atom pairs. (not recommended) 
   ``group``: Output inter-group energy flow between all pairs of groups defined by the ``group_method`` keyword.
   ``both``: Output both of the above two data (not recommended).
   
target_atoms = 1-33
   Specifies the target segment for the calculations. In this example, atom 1 to
   33 are considered and the other atoms are neglected. If not
   specified, all atoms of the system are considered. Note that the CURP program
   excludes atoms other than the ones specifed by this option from the calculations, even when the group option is set to any of united/residue/file. 
   
group_pair_file = gpair.ndx
   Set group pair file. This option defines the set of group pair for which
   the energy flow is calculated. This can be used to focus only on
   the region of interest, saving the computational time considerably.
   Without this option, CURP calculates the energy flow between all pairs of groups.

remove_trans = yes | no
   If yes, the translational movement of the system is removed. (the default is yes)
   NOTE: Usually, we should use "yes".

remove_rotate = yes | no
   If yes, the rotational movement of the system is removed. (the default is yes) 
   NOTE: Usually, we should use "yes".

log_frequency = 2
   The frequencey of output information to stdout.

[output]
~~~~~~~~

Setting the output format.

filename = outdata/flux.nc
   Filename of the energy flow data.

format = ascii | netcdf
   Format of the energy flow data. (netcdf format is highly recommended.)

decomp = no | yes
   During the calculations, choose whether the energy is decomposed into different
   components.

output_energy = no | yes
   CURP is able to evaluate the energy using the atomic velocities and coordinates of the trajectory files. When set to "no", this energy value is not output. 

Calculating G
=================

After the energy flux calculations, the values of G are calculated based on the linear response theory.
You will need the time series of irEF stored in `flux_grp.nc`. Type in the following command:

.. code-block:: bash

   $ curp cal-tc \
       --no-axes --frame-range 1 10 1 --average-shift 1 \
       -a outdata/acf.nc \
       -o outdata/ec.dat outdata/flux_grp.nc > ec.log

`\--no-axes`
   This option is needed for handling J (scalar flux). 

`\--frame-range <first_frame> <last_frame> <frame_interval>`
   This specifies the range of the time integration of the auto-correlation function of energy flux, :math:`\left\langle J(0)J(t) \right\rangle`.
   The upper and lower limits of the integral are set in <first_frame>, 
   <last_frame>, respectively. During the integration, every <frame_interval>
   frames are used for calculations.  

`\--average-shift <ave_shift>`
   In the calculation of G, J(0)J(t) is integrated from <first_fram> to 
   <last_frame>. Then, the origin of the integration is shifted by <ave_shift>
   and the time integration is again conducted from <first_frame> to <last_frame>.
   This procedure is then repeated until the end point of the time integration
   reaches the end of the trajectory.

`-a <file_name>`
   The time-correlation function data are output to <file_name>.
   The data format is netcdf. If this key is not specified, no data is output.

`-o <file_name>`
   Energy conductivity data is output to <file_name>.

You will then obtain energy conductivity data ``output/ec.dat`` and the
time-correlatioin data file, ``outdata/acf.nc``.

Format of irEC data file
-------------------------

 In each line of the data file, `ec.dat`, a pair of residues and the corresponding value of G is written as <residue_A> <residue_B> <G_AB>. The unit of <G_AB> is measured in :math:`{\rm{(kcal/mol)^2/fs}}`.