1. Introduction
TauTGen (pronounced “tau-t-gen”) is a tautomer generator program. It constructs
tautomers from a molecular frame built of heavy atoms and a given number of
hydrogen atoms. The user has to provide the geometry of the molecular frame and
specify constraints on the minimum and maximum number of hydrogen atoms
connected to each heavy atom. The possible positions of hydrogen atoms, called
sites, also have to be defined by the user.
TauTGen was developed by Maciej Haranczyk under the supervision of Prof. Maciej
Gutowski. It was originally part of a PhD project on anions of nucleic acid
bases.
1.1 Citation
TauTGen is available free of charge under the GNU license. However, we kindly
ask you to cite the following article whenever you publish results produced by
TauTGen:
Maciej Haranczyk and Maciej Gutowski, “Quantum Mechanical Energy-Based
Screening of Combinatorially Generated Library of Tautomers. TauTGen: A
Tautomer Generator Program”, Journal of Chemical Information and Modeling 2007,
47(2), 686-694; DOI: 10.1021/ci6002703
1.2 User feedback
This software comes without warranty or
guarantee of support, but we will try to meet the needs of our user community.
Please send bug reports, requests for enhancement, or other comments to
maharan@chem.univ.gda.pl
2. Installation and usage
TauTGen is provided as a .tgz archive containing three .c files. To compile the
main program, run the C compiler:
gcc main.c -o TauTGen -lm
Two other versions of TauTGen can save the output to PDB and SDF files. To
compile them, run:
gcc main_pdb.c -o TauTGenPDB -lm
gcc main_sdf.c -o TauTGenSDF -lm
To test the input file and obtain the total number of tautomers together with
their names, run:
./TauTGen inputfile
To save each tautomer to a separate .xyz file, run the program with an extra
parameter:
./TauTGen inputfile save
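3. How TauTGen works
TauTGen constructs tautomers from a molecular frame built of heavy atoms and a
given number of hydrogen atoms. The user has to provide the geometry of the
molecular frame and specify constraints on the minimum and maximum number of
hydrogen atoms connected to each heavy atom. The possible positions of hydrogen
atoms, the sites, are also defined by the user. To define a site, the user has
to provide the following information (the exact format is given in section
4.1.3): the name of the site, the heavy atom to which the site is connected,
the site constraint, the stereoconfiguration information, and the Cartesian
coordinates of the site.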
Special care should be taken to name the sites precisely. These names are used
to build the tautomer names, which are later used as filenames. For example,
sites A and B (left figure) should be named “N4cis” and “N4trans” to
distinguish the possible rotamers resulting from rotation of the imino group.
The connectivity information is used to count the number of hydrogen atoms at
each heavy atom.
Each site carries a constraint, called the site constraint: the site is used
only if a specified number of hydrogen atoms is present at the heavy atom to
which it is connected. This option is used to build the proper hybridization of
the frame atoms. For example, sites C and E (middle figure) are used only when
there are two hydrogen atoms at C5 (so it has sp3 hybridization), and site D is
used only if there is one hydrogen atom at C5 (so it has sp2 hybridization). If
the user wants to generate stereoisomers of a molecule, then two sites have to
be defined for each asymmetric atom in order to describe the R and S
configurations of the heavy atom (or sites that are “below” and “above” the
molecular plane, see right figure). Each of these sites has additional
information describing the configuration, e.g. 1 or 2 for the “above” or
“below” configuration, respectively.
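The interplay of the min/max constraints and the site constraint can be illustrated with a minimal Python sketch. It is not taken from the TauTGen source; the names `Site`, `hydrogens_per_atom` and `satisfies_constraints` are hypothetical, and the sketch assumes that a site may be occupied only when its heavy atom carries exactly the number of hydrogen atoms given by the site constraint.

```python
from collections import Counter
from typing import NamedTuple


class Site(NamedTuple):
    name: str        # site name, e.g. "c5"
    atom: int        # 1-based number of the heavy atom the site is connected to
    constraint: int  # number of H required at that atom for the site to be used


def hydrogens_per_atom(chosen):
    """Count how many occupied sites (hydrogens) sit on each heavy atom."""
    return Counter(site.atom for site in chosen)


def satisfies_constraints(chosen, limits):
    """Return True if the occupied sites obey all constraints.

    limits maps an atom number to its (min_h, max_h) pair.
    """
    counts = hydrogens_per_atom(chosen)
    # Min/max check on every heavy atom, including atoms without hydrogens.
    for atom, (min_h, max_h) in limits.items():
        if not min_h <= counts.get(atom, 0) <= max_h:
            return False
    # Site constraint: each occupied site must match the H count at its atom.
    return all(site.constraint == counts[site.atom] for site in chosen)
```

For the C5 example above, one sp2-type site with constraint 1 and two sp3-type sites with constraint 2 would be defined; occupying the sp2 site alone or both sp3 sites together passes the check, while mixing an sp2 site with an sp3 site does not.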
Having all the required information, TauTGen generates all possible
combinations of distributing the available hydrogen atoms among the defined
sites. For each generated combination, TauTGen checks whether it agrees with
the applied constraints. The constraints are checked in the following order:
The generated tautomer is rejected if it fails any of the above checks. The
stereoconfiguration check is an additional routine that includes an enantiomer
detection algorithm. If an enantiomer of a previously generated stereoisomer is
built, it is rejected, so the final set of stereoisomers consists of
diastereoisomers only.
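The generate-and-filter scheme can be sketched in Python as follows. This is an illustration only, not TauTGen's own algorithm. It assumes a planar frame, so that mirroring a structure through the molecular plane simply swaps the labels 1 and 2 on every stereo-relevant site; the names `Site`, `stereo_key` and `generate` are hypothetical, and the constraint checks are passed in as the `is_valid` callable.

```python
from itertools import combinations
from typing import NamedTuple


class Site(NamedTuple):
    name: str
    atom: int    # 1-based number of the heavy atom
    stereo: int  # 0 = no stereo information, 1 or 2 = side of the plane


_MIRROR = {0: 0, 1: 2, 2: 1}  # reflection through the molecular plane


def stereo_key(chosen):
    """Split a choice of sites into (skeleton, configuration)."""
    ordered = sorted((s.atom, s.name, s.stereo) for s in chosen)
    skeleton = tuple((atom, name) for atom, name, _ in ordered)
    config = tuple(stereo for _, _, stereo in ordered)
    return skeleton, config


def generate(sites, n_hydrogens, is_valid=lambda chosen: True):
    """Yield accepted placements of n_hydrogens hydrogens on distinct sites."""
    accepted = set()
    for chosen in combinations(sites, n_hydrogens):
        if not is_valid(chosen):
            continue  # fails the min/max or site constraint checks
        skeleton, config = stereo_key(chosen)
        mirrored = tuple(_MIRROR[s] for s in config)
        if mirrored != config and (skeleton, mirrored) in accepted:
            continue  # enantiomer of an already accepted structure
        accepted.add((skeleton, config))
        yield chosen
```

With two asymmetric atoms, each carrying one site on either side of the plane, the four stereo combinations collapse to two: the pair with both hydrogens on the same side and the pair with hydrogens on opposite sides, which are diastereoisomers of each other.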
The following steps are part of the stereoconfiguration check:
In the last step, TauTGen generates the filenames and saves the coordinates of
each tautomer to a separate file. The filename is composed of the names of the
sites that were used to build the tautomer. If the site names are chosen
properly, the filename uniquely identifies the tautomer (including the specific
rotamer). For molecules with a large number of tautomers, the tautomers can be
divided into groups and subgroups according to the number of hydrogen atoms at
two specified atoms of the molecular frame. If stereoisomers were generated,
the tautomer name is supplemented with stereoconfiguration information, e.g.
“Z_nml”, where n, m and l are the names of sites placed on the same side of the
molecular plane.
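The naming and grouping step might look roughly like the Python sketch below. The exact filename layout used by TauTGen is not specified in this manual, so the "_" separator and the way the "Z_" suffix is appended are assumptions of this sketch, and the function names `tautomer_name` and `group_key` are hypothetical.

```python
from collections import Counter


def tautomer_name(prefix, site_names, same_side=()):
    """Join the prefix and the occupied site names into one name.

    The "_" separator and the "_Z_" suffix layout are assumptions.
    """
    name = "_".join([prefix, *site_names])
    if same_side:
        # Stereo suffix: names of the sites lying on the same side of the plane.
        name += "_Z_" + "".join(same_side)
    return name


def group_key(site_atoms, group_atom, subgroup_atom):
    """Return (#H at the group atom, #H at the subgroup atom).

    site_atoms lists, for every occupied site, the number of its heavy atom.
    """
    counts = Counter(site_atoms)
    return counts[group_atom], counts[subgroup_atom]
```

Tautomers that share the same key would then end up in the same group and subgroup, for example all structures with one hydrogen at N4 and one hydrogen at N1.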
4. Input file
4.1 Input file structure
The input file contains all the information needed to generate tautomers. It
consists of the following blocks:
Definition of molecular frame
Definition of constraints on min. and max. number of hydrogens at each heavy atom
Definition of sites
Number of hydrogens
Definition of stereoconfiguration
Naming and division into groups and subgroups
Each of these blocks is described in the following sections. Sample input files
supplemented with our comments are presented in section 4.2.
4.1.1 Definition of molecular frame
This block describes the geometry of the molecular frame. Currently, only
Cartesian coordinates of atoms can be used to specify a molecule. The format is
as follows:
<integer number_of_atoms_in_frame>
<string 1st_atom_symbol> <real xyz_coordinates_of_1st_atom>
<string 2nd_atom_symbol> <real xyz_coordinates_of_2nd_atom>
…
<string last_atom_symbol> <real xyz_coordinates_of_last_atom>
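To make the layout of this block concrete, here is a minimal Python sketch of a reader for it. It is not part of TauTGen; the function name `parse_frame` is hypothetical, and the sketch assumes one record per line with whitespace-separated fields.

```python
def parse_frame(lines):
    """Parse the molecular-frame block from an iterable of text lines.

    Returns a list of (symbol, (x, y, z)) tuples in input order, so that
    the first entry is atom #1 in the numbering used by later blocks.
    """
    it = iter(lines)
    n_atoms = int(next(it).split()[0])
    frame = []
    for _ in range(n_atoms):
        # Each record is an element symbol followed by three coordinates.
        symbol, x, y, z = next(it).split()[:4]
        frame.append((symbol, (float(x), float(y), float(z))))
    return frame
```

Keeping the atoms in a list preserves the input order, which matters because all later blocks refer to atoms by their position in this block.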
4.1.2 Constraints on minimum and maximum number of hydrogens at each heavy atom
The minimum and maximum values for each atom of the molecular frame are
specified on separate lines. The order of atoms is the same as in the block
defining the geometry of the molecular frame (section 4.1.1).
<integer min_#H_at_1st_atom> <integer max_#H_at_1st_atom>
<integer min_#H_at_2nd_atom> <integer max_#H_at_2nd_atom>
…
<integer min_#H_at_last_atom> <integer max_#H_at_last_atom>
4.1.3 Definition of sites
This block contains the definition of sites in the following format:
<integer number_of_sites>
<string site_name> <int which_atom> <int site_constraint> <int stereo_inf> <real x y z>
The last line is repeated for each site. Sites should have precise names, as
explained in section 3.
The integer which_atom defines the atom to which the site is connected (atoms
are numbered in the same order as in the first block). The site constraint
specifies how many hydrogen atoms are required at that heavy atom to make the
site active. Stereo_inf defines the stereoconfiguration: 0 when the
stereoconfiguration check is not used, 1 or 2 when the stereoconfiguration
check is desired and the site is above or below the molecular plane,
respectively. The position of the site is given in Cartesian coordinates.
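A single site record could be read and validated along the lines of the following Python sketch. It is an illustration rather than TauTGen's own reader; the names `Site` and `parse_site` are hypothetical, and the checks assume 1-based atom numbering (as in the examples of section 4.2) and the stereo values 0, 1 and 2 described above.

```python
from typing import NamedTuple


class Site(NamedTuple):
    name: str
    atom: int        # 1-based number of the heavy atom (which_atom)
    constraint: int  # site constraint
    stereo: int      # stereo_inf: 0, 1 or 2
    xyz: tuple       # Cartesian coordinates of the site


def parse_site(line, n_atoms):
    """Parse one site record and check it against the frame size."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    atom, constraint, stereo = (int(f) for f in fields[1:4])
    if not 1 <= atom <= n_atoms:
        raise ValueError(f"site {fields[0]!r} refers to unknown atom {atom}")
    if stereo not in (0, 1, 2):
        raise ValueError(f"site {fields[0]!r} has invalid stereo_inf {stereo}")
    xyz = tuple(float(f) for f in fields[4:])
    return Site(fields[0], atom, constraint, stereo, xyz)
```

Rejecting malformed records early is a design choice of this sketch: an error such as a site that points to a non-existent atom is reported before any tautomers are generated.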
4.1.4 Number of hydrogens
This block consists of a single line containing an integer that specifies the
total number of hydrogen atoms in the molecule.
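A quick sanity check that follows from the definitions: the total number of hydrogen atoms can only be distributed if it lies between the sum of all minimum values and the sum of all maximum values from the constraints block. The Python sketch below shows this necessary (but not sufficient) condition; the function name `hydrogen_count_is_feasible` is hypothetical and the check is not claimed to be part of TauTGen.

```python
def hydrogen_count_is_feasible(limits, n_hydrogens):
    """Check sum(min) <= n_hydrogens <= sum(max) over all heavy atoms.

    limits is a sequence of (min_h, max_h) pairs, one per heavy atom.
    Passing this test does not guarantee that any tautomer exists,
    because the site constraints are not considered here.
    """
    lowest = sum(min_h for min_h, _ in limits)
    highest = sum(max_h for _, max_h in limits)
    return lowest <= n_hydrogens <= highest
```

For the second cytosine example in section 4.2, the minimum values sum to 3 and the maximum values to 11, so the 5 hydrogen atoms of cytosine pass the check.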
4.1.5 Stereoconfiguration block
Put 0 if the stereoconfiguration check is not required, or 1 to enable it. When
the stereoconfiguration check is enabled, give the number of asymmetric atoms
on the next line, followed by the numbers of those atoms, each on a separate
line (again, the numbering follows the order of atoms in the first block).
4.1.6 Tautomer names and bunching
The first line of this block contains a string that is used as a prefix of each
tautomer name. The next line contains 0 if the tautomers should not be divided
into groups and subgroups, or 1 if such a division is desired. In the latter
case, the two following lines give the numbers of the atoms that split the
tautomers into groups and subgroups, respectively (the numbering of atoms is
the same as in the previous blocks).
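The last three blocks (sections 4.1.4 to 4.1.6) have a variable number of lines, depending on the two 0/1 switches. The Python sketch below shows one way to read them; it is not TauTGen's own reader, the function name `parse_tail` and the dictionary keys are hypothetical, and it assumes one value per line.

```python
def parse_tail(lines):
    """Parse the hydrogen-count, stereoconfiguration and naming blocks."""
    it = iter(line.strip() for line in lines)
    result = {"n_hydrogens": int(next(it))}

    # Stereoconfiguration block: 0, or 1 followed by the number of
    # asymmetric atoms and then one atom number per line.
    result["asymmetric_atoms"] = []
    if int(next(it)) == 1:
        n_asymmetric = int(next(it))
        result["asymmetric_atoms"] = [int(next(it)) for _ in range(n_asymmetric)]

    # Naming block: prefix, then 0, or 1 followed by the group atom
    # and the subgroup atom.
    result["prefix"] = next(it)
    result["group_atoms"] = None
    if int(next(it)) == 1:
        group_atom = int(next(it))
        subgroup_atom = int(next(it))
        result["group_atoms"] = (group_atom, subgroup_atom)
    return result
```

Reading the values sequentially from an iterator mirrors the layout of the file: each switch decides how many of the following lines belong to its block.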
4.2 Input file examples
In this section, cytosine is considered as an example. Cytosine consists of 8
heavy atoms and 5 hydrogen atoms. The number of generated tautomers varies
depending on the constraints defined on the minimum and maximum number of
hydrogen atoms at each heavy atom. Sections 4.2.1 and 4.2.2 present two example
input files. Each of them lets TauTGen generate tautomers according to the
constraints presented in the accompanying table.
4.2.1 Generation of large number of tautomers without E,Z stereoisomers
A large number of tautomers is generated by allowing a wide range between the
minimum and maximum number of hydrogen atoms at each heavy atom. The example
constraints are presented in the table:
Atom | Min. #H at heavy atom | Max. #H at heavy atom | Sites if 1 H present | Sites if 2 H present | Sites if 3 H present
N1 | 0 | 2 | 1 | 2 |
C2 | 0 | 1 | 1 | |
O2 | 0 | 2 | 2 | 2 |
N3 | 0 | 2 | 1 | 2 |
C4 | 0 | 1 | 1 | |
N4 | 0 | 3 | 2 | 2 | 3
C5 | 0 | 2 | 1 | 2 |
C6 | 0 | 2 | 1 | 2 |
In this example, the generation of stereoisomers is omitted. Hence, only one
site is used for each possibly asymmetric atom. (The possibly asymmetric atoms
are C2 and C4, because they are the only ones that have four different
substituents after one hydrogen atom is attached.)
Input file | Comment
8 | Number of atoms in the frame
C 1.187597 -0.015098 -0.971085 | 1st atom of frame: C4
C 0.029536 0.137264 -1.721102 | C5
C -1.206594 0.253832 -1.012796 | C6
N -1.170225 -0.168858 0.339470 | N1
C 0.021137 -0.177554 1.099375 | C2
N 1.205055 0.140332 0.473651 | N3
O 0.038072 -0.804498 2.187450 | O2
N 2.471115 0.178384 -1.560944 | N4
0 1 | Min #H at C4 = 0; Max #H at C4 = 1
0 2 | Min #H at C5 = 0; Max #H at C5 = 2
0 2 | Min #H at C6 = 0; Max #H at C6 = 2
0 2 | Min #H at N1 = 0; Max #H at N1 = 2
0 1 | Min #H at C2 = 0; Max #H at C2 = 1
0 2 | Min #H at N3 = 0; Max #H at N3 = 2
0 2 | Min #H at O2 = 0; Max #H at O2 = 2
0 3 | Min #H at N4 = 0; Max #H at N4 = 3
25 | Number of sites = 25
c4 1 1 0 1.144003 -1.106404 -1.102044 | “c4” site connected to atom #1 (C4), active when #H at C4 = 1
c5 2 1 0 0.074610 0.281884 -2.804140 | “c5” site connected to atom #2 (C5), active when #H at C5 = 1
c5 2 2 0 -0.021099 0.908039 -2.504263 | “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c5 2 2 0 -0.347200 -0.699711 -2.327355 | “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c6 3 1 0 -2.166964 0.015482 -1.487651 | …
c6 3 2 0 -1.684085 1.238137 -0.898138 | …
c6 3 2 0 -2.095573 -0.290079 -1.364759 | …
n1 4 1 0 -2.101124 -0.062984 0.915853 | …
n1 4 2 0 -1.832399 0.463394 0.949210 | …
n1 4 2 0 -1.504583 -1.214007 0.416074 | …
c2 5 1 0 -0.016175 0.836214 1.524678 | …
n3 6 1 0 2.226017 -0.025562 0.847971 | …
n3 6 2 0 1.883148 -0.681857 0.746040 | …
n3 6 2 0 1.895245 0.952142 0.746783 | …
o2t 7 1 0 1.032241 -0.711283 2.648904 | This is a trans position of O2 in hydroxyl tautomer
o2c 7 1 0 -0.946208 -1.263796 2.361341 | This is a cis position of O2 in hydroxyl tautomer
o2 7 2 0 1.032241 -0.711283 2.648904 | …
o2 7 2 0 -0.721033 -0.385125 2.864123 | …
n4c 8 1 0 3.177455 0.018934 -0.842596 | This is a cis position of N4 in imino tautomer
n4t 8 1 0 2.397382 0.427027 -2.545824 | This is a trans position of N4 in imino tautomer
n4 8 2 0 3.177455 0.018934 -0.842596 | …
n4 8 2 0 2.547495 1.093957 -2.000421 | …
n4 8 3 0 2.397060 0.049013 -2.650797 | …
n4 8 3 0 2.828836 1.193918 -1.335719 | …
n4 8 3 0 3.177707 -0.558504 -1.151403 | …
5 | Number of hydrogen atoms = 5
0 | Do not use stereoconfiguration check
c | All tautomer names will start with “c”
1 | Divide tautomers into groups and subgroups
8 | Groups depend on #H atoms at atom #8 (N4)
4 | Subgroups depend on #H atoms at atom #4 (N1)
4.2.2 Generation of reasonable number of tautomers including E,Z stereoisomers
In the following example, a narrower range between the minimum and maximum
number of hydrogen atoms at each atom is used. In addition, stereoisomers are
generated, since C2 and C4 become asymmetric atoms when a hydrogen atom is
attached. Each of C2 and C4 has two connected sites, one below and one above
the molecular plane.
Atom | Min. #H at heavy atom | Max. #H at heavy atom | Sites if 1 H present | Sites if 2 H present | Asymmetric atom
N1 | 0 | 1 | 1 | |
C2 | 0 | 1 | 2 | | Yes
O2 | 0 | 1 | 2 | |
N3 | 0 | 1 | 1 | |
C4 | 0 | 1 | 2 | | Yes
N4 | 1 | 2 | 2 | 2 |
C5 | 1 | 2 | 1 | 2 |
C6 | 1 | 2 | 1 | 2 |
Input file | Comment
8 | Number of atoms in the frame
C 1.187597 -0.015098 -0.971085 | 1st atom of frame: C4
C 0.029536 0.137264 -1.721102 | C5
C -1.206594 0.253832 -1.012796 | C6
N -1.170225 -0.168858 0.339470 | N1
C 0.021137 -0.177554 1.099375 | C2
N 1.205055 0.140332 0.473651 | N3
O 0.038072 -0.804498 2.187450 | O2
N 2.471115 0.178384 -1.560944 | N4
0 1 | Min #H at C4 = 0; Max #H at C4 = 1
1 2 | Min #H at C5 = 1; Max #H at C5 = 2
1 2 | Min #H at C6 = 1; Max #H at C6 = 2
0 1 | Min #H at N1 = 0; Max #H at N1 = 1
0 1 | Min #H at C2 = 0; Max #H at C2 = 1
0 1 | Min #H at N3 = 0; Max #H at N3 = 1
0 1 | Min #H at O2 = 0; Max #H at O2 = 1
1 2 | Min #H at N4 = 1; Max #H at N4 = 2
18 | Number of sites = 18
c4 1 1 1 1.144003 -1.106404 -1.102044 | “c4” site connected to atom #1 (C4), active when #H at C4 = 1, above
c4 1 1 2 1.176055 1.106404 -0.913044 | “c4” site connected to atom #1 (C4), active when #H at C4 = 1, below
c5 2 1 0 0.074610 0.281884 -2.804140 | “c5” site connected to atom #2 (C5), active when #H at C5 = 1
c5 2 2 0 -0.021099 0.908039 -2.504263 | “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c5 2 2 0 -0.347200 -0.699711 -2.327355 | “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c6 3 1 0 -2.166964 0.015482 -1.487651 | …
c6 3 2 0 -1.684085 1.238137 -0.898138 | …
c6 3 2 0 -2.095573 -0.290079 -1.364759 | …
n1 4 1 0 -2.101124 -0.062984 0.915853 | …
c2 5 1 1 -0.016175 0.836214 1.524678 | “c2” site connected to atom #5 (C2), active when #H at C2 = 1, above
c2 5 1 2 -0.016175 -1.05674 0.836214 | “c2” site connected to atom #5 (C2), active when #H at C2 = 1, below
n3 6 1 0 2.226017 -0.025562 0.847971 | …
o2t 7 1 0 1.032241 -0.711283 2.648904 | This is a trans position of O2 in hydroxyl tautomer
o2c 7 1 0 -0.946208 -1.263796 2.361341 | This is a cis position of O2 in hydroxyl tautomer
n4c 8 1 0 3.177455 0.018934 -0.842596 | This is a cis position of N4 in imino tautomer
n4t 8 1 0 2.397382 0.427027 -2.545824 | This is a trans position of N4 in imino tautomer
n4 8 2 0 3.177455 0.018934 -0.842596 | …
n4 8 2 0 2.547495 1.093957 -2.000421 | …
5 | Number of hydrogen atoms = 5
1 | Use stereoconfiguration check
2 | Number of asymmetric atoms
1 | 1st asymmetric atom = atom #1 (C4)
5 | 2nd asymmetric atom = atom #5 (C2)
c | All tautomer names will start with “c”
1 | Divide tautomers into groups and subgroups
8 | Groups depend on #H atoms at atom #8 (N4)
4 | Subgroups depend on #H atoms at atom #4 (N1)