
 

 

1. Introduction

 

TauTGen (pronounced “tau-t-gen”) is a tautomer generator program. It constructs tautomers from a molecular frame built of heavy atoms and a given number of hydrogen atoms. The user has to provide the geometry of the molecular frame and specify constraints on the minimum and maximum number of hydrogen atoms connected to each heavy atom. The positions where hydrogen atoms may be placed, called sites, also have to be defined by the user.

 

TauTGen was developed by Maciej Haranczyk under the supervision of Prof. Maciej Gutowski. It was originally part of a PhD project on anions of nucleic acid bases.

 

1.1 Citation

 

TauTGen is available free of charge under the GNU license. However, we kindly ask you to cite the following article whenever you publish results produced by TauTGen.

 

Maciej Haranczyk and Maciej Gutowski. Quantum Mechanical Energy-Based Screening of Combinatorially Generated Library of Tautomers. TauTGen: A Tautomer Generator Program. Journal of Chemical Information and Modeling 2007; 47(2); 686-694; DOI: 10.1021/ci6002703

 

1.2 User feedback

 

This software comes without warranty or guarantee of support, but we will try to meet the needs of our user community. Please send bug reports, requests for enhancement, or other comments to maharan@chem.univ.gda.pl

 

2. Installation and usage

 

TauTGen is provided as a .tgz archive containing three .c files. To compile the main program, just run the C compiler:

gcc main.c -o TauTGen -lm

 

Two other versions of TauTGen save the output to PDB and SDF files. To compile them, try:

gcc main_pdb.c -o TauTGenPDB -lm

gcc main_sdf.c -o TauTGenSDF -lm

 

To test the input file and obtain the total number of tautomers together with their names, run:

./TauTGen inputfile

To also save each tautomer to a separate .xyz file, run the program with an extra parameter:

./TauTGen inputfile save

 

3. How TauTGen works

 

TauTGen constructs tautomers from a molecular frame built of heavy atoms and a number of hydrogens.

The user has to provide the geometry of the molecular frame and specify constraints on the minimum and maximum number of hydrogen atoms connected to each heavy atom. The positions where hydrogen atoms may be placed, the sites, are also defined by the user. To define a site, the user has to provide the following information: the site name, the heavy atom to which the site is connected, the site constraint, the stereoconfiguration information, and the Cartesian coordinates of the site (see section 4.1.3).

Special care should be taken to name the sites precisely. These names are used to create the tautomer names, which are later used as filenames. For example, sites A and B (left figure) should be named “N4cis” and “N4trans” to distinguish the possible rotamers resulting from rotation of the imino group. The connectivity information is used to count the number of hydrogen atoms at each heavy atom.

Each site has a constraint stating that it is used only if a specified number of hydrogen atoms is present at a particular heavy atom. This is called the site constraint. It is used to build the proper hybridization of the frame atoms. For example, site C and site E (middle figure) are used only when there are 2 hydrogen atoms at C5 (so it has sp3 hybridization), and site D is used only if there is one hydrogen at C5 (so it has sp2 hybridization). If the user wants to generate stereoisomers of a molecule, then two sites have to be used for each asymmetric atom in order to describe the R and S configurations of the heavy atom (or sites that are “below” and “above” the molecular plane, see right figure). Each of these sites carries additional information describing the configuration, e.g. 1 or 2 for the “above” or “below” configuration, respectively.

With all the required information, TauTGen generates all possible combinations of distributing the available hydrogens among the defined sites. For each generated combination, TauTGen checks whether it agrees with the applied constraints. The constraints are checked in the following order:

A generated tautomer is rejected if it fails any of the above checks. The stereoconfiguration check is an additional routine that includes an enantiomer detection algorithm. If an enantiomer of a previously generated stereoisomer is built, it is rejected, so the final set of stereoisomers consists of diastereoisomers only.
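The generate-and-filter idea described above can be sketched in Python as follows. This is only an illustration and not TauTGen's actual C code: the `Site` record, the function name `enumerate_tautomers`, and the assumption that each occupied site holds exactly one hydrogen are ours. The order in which the two checks run here is arbitrary, and the stereoconfiguration check is left out.

```python
from collections import Counter, namedtuple
from itertools import combinations

# Hypothetical record for one site: name, 1-based index of the heavy atom,
# and the number of hydrogens required at that atom for the site to be used.
Site = namedtuple("Site", "name atom constraint")


def enumerate_tautomers(sites, limits, n_hydrogens):
    """Yield every combination of sites that passes both hydrogen-count checks.

    limits[i] is the (min, max) number of hydrogens allowed at atom i + 1.
    Assumes one hydrogen per occupied site.
    """
    for combo in combinations(sites, n_hydrogens):
        counts = Counter(site.atom for site in combo)
        # Atom constraints: min/max number of hydrogens at every heavy atom.
        if any(not lo <= counts[i + 1] <= hi for i, (lo, hi) in enumerate(limits)):
            continue
        # Site constraints: a used site must match the hydrogen count at its atom.
        if any(site.constraint != counts[site.atom] for site in combo):
            continue
        yield combo


# Toy frame with two heavy atoms and two hydrogens to distribute.
toy_sites = [
    Site("a1", 1, 1),   # used when atom 1 carries one hydrogen
    Site("a2x", 1, 2),  # the two sites used when atom 1 carries two hydrogens
    Site("a2y", 1, 2),
    Site("b1", 2, 1),   # used when atom 2 carries one hydrogen
]
toy_limits = [(0, 2), (0, 1)]
toy_result = [tuple(s.name for s in combo)
              for combo in enumerate_tautomers(toy_sites, toy_limits, 2)]
print(toy_result)
```

Of the six possible pairs of sites in this toy case, only two survive: one hydrogen on each atom, or both hydrogens on atom 1 using its two-hydrogen sites.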

The following steps are part of the stereoconfiguration check:

In the last step, TauTGen generates the filenames and saves the coordinates of each tautomer to a separate file. The filename is a concatenation of the names of the sites that were used to build the tautomer. If proper site names are defined, the filename uniquely identifies the tautomer stored in the file (including the specific rotamer). For molecules with a large number of tautomers, the tautomers can be divided into groups and subgroups according to the number of hydrogens at two specified atoms of the molecular frame. If stereoisomers were generated, the tautomer name is supplemented with stereoconfiguration information, e.g. “Z_nml”, where n, m and l are the names of sites placed on the same side of the molecular plane.
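As a rough illustration of the naming and grouping step, the sketch below builds a name from a prefix and the names of the occupied sites, and computes the pair of hydrogen counts that would decide the group and subgroup. The exact layout of TauTGen's filenames is not specified here, so the underscore separator and both function names are assumptions made for this example only.

```python
from collections import Counter


def tautomer_name(prefix, used_sites, separator="_"):
    """Build a name from the prefix and the names of the occupied sites.

    used_sites is a list of (site_name, atom_index) pairs. The separator
    is an assumption of this sketch, not TauTGen's documented format.
    """
    return separator.join([prefix] + [name for name, _ in used_sites])


def group_key(used_sites, group_atom, subgroup_atom):
    """Return (#H at group_atom, #H at subgroup_atom) for one tautomer."""
    counts = Counter(atom for _, atom in used_sites)
    return counts[group_atom], counts[subgroup_atom]


# One cytosine tautomer written as (site name, atom index) pairs:
# one hydrogen at C5, C6 and N1, and two hydrogens at N4 (atom 8).
example = [("c5", 2), ("c6", 3), ("n1", 4), ("n4", 8), ("n4", 8)]
name = tautomer_name("c", example)
key = group_key(example, group_atom=8, subgroup_atom=4)
print(name, key)
```

Tautomers sharing the same key would land in the same group and subgroup.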

 

 

4. Input file

4.1 Input file structure

The input file contains all information needed to generate tautomers. It consists of the following blocks:

 

Definition of molecular frame

Definition of constraints on min. and max. number of hydrogens at each heavy atom

Definition of sites

Number of hydrogens

Definition of stereoconfiguration

Naming and division into groups and subgroups

 

Each of these blocks is described in the following sections. Sample input files with our comments are presented in section 4.2.

 

4.1.1 Definition of molecular frame

This block describes the geometry of the molecular frame. Currently, only Cartesian coordinates of atoms can be used to specify a molecule. The format is as described below:

<integer number_of_atoms_in_frame>

<string 1st_atom_symbol> <real xyz_coordinates_of_1st_atom>

<string 2nd_atom_symbol> <real xyz_coordinates_of_2nd_atom>

<string last_atom_symbol> <real xyz_coordinates_of_last_atom>

 

4.1.2 Constraints on minimum and maximum number of hydrogens at each heavy atom

The minimum and maximum values for each atom of the molecular frame are specified on separate lines. The order of atoms is the same as in the block defining the geometry of the molecular frame (section 4.1.1).

<integer min_#H_at_1st_atom> <integer max_#H_at_1st_atom>

<integer min_#H_at_2nd_atom> <integer max_#H_at_2nd_atom>

<integer min_#H_at_last_atom> <integer max_#H_at_last_atom>

 

4.1.3 Definition of sites

This block contains the definitions of the sites in the following format:

<integer number_of_sites>

<string site_name> <int which_atom> <int site_constraint> <int stereo_inf> <real x y z>

The last line is repeated for each site. Sites should have precise names, as explained in section 3.

The integer which_atom defines the atom to which the site is connected (numbered in the same order as in the first block). The site constraint specifies how many hydrogen atoms are required at that heavy atom to make the site active. Stereo_inf defines the stereoconfiguration: 0 when the stereoconfiguration check is not used, 1 or 2 when the stereoconfiguration check is desired and the site is above or below the molecular plane, respectively (as in section 3 and in the example of section 4.2). The position of the site is given in Cartesian coordinates.
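To make the field layout concrete, here is a small Python sketch that splits one site line into its seven fields. It assumes the fields are separated by whitespace, as in the sample input files; the `Site` class and `parse_site_line` function are hypothetical helpers, not part of TauTGen.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    atom: int        # 1-based index into the frame block
    constraint: int  # number of H at `atom` required to activate the site
    stereo_inf: int  # 0 = no stereoconfiguration check, 1 or 2 = side of the plane
    xyz: tuple       # Cartesian coordinates of the site


def parse_site_line(line):
    """Split one whitespace-separated site line into a Site record."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name = fields[0]
    atom, constraint, stereo_inf = (int(f) for f in fields[1:4])
    x, y, z = (float(f) for f in fields[4:7])
    return Site(name, atom, constraint, stereo_inf, (x, y, z))


# A site line taken from the cytosine example in section 4.2.
site = parse_site_line("n4c 8 1 0     3.177455     0.018934    -0.842596")
print(site)
```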

 

4.1.4 Number of hydrogens

This block contains only one line. It contains an integer specifying the total number of hydrogen atoms in the molecule.

 

4.1.5 Stereoconfiguration block

Put 0 if the stereoconfiguration check is not required, or 1 to enable it. When the stereoconfiguration check is enabled, give the number of asymmetric atoms on the next line, followed by the numbers of those atoms, each on a separate line (again, the numbering follows the order in the first block).

 

4.1.6 Tautomer names and bunching

The first line of this block contains a string that is used as the prefix of each tautomer name. The next line contains 0 if the tautomers should not be divided into groups and subgroups, or 1 if such a division is desired. In the latter case, the two following lines give the numbers of the atoms that split the tautomers into groups and subgroups, respectively (the numbering of atoms is the same as in the previous blocks).
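The last three blocks (sections 4.1.4 to 4.1.6) have a conditional layout, which the following Python sketch tries to make explicit. It is a reading aid based on the descriptions above, not TauTGen's own parser; the function name `parse_tail` and the dictionary keys are hypothetical. The two test inputs are the final lines of the sample files in section 4.2.

```python
def parse_tail(lines):
    """Read the blocks of sections 4.1.4-4.1.6 from the end of an input file.

    `lines` holds the non-empty lines that follow the site definitions.
    """
    it = iter(line.strip() for line in lines)
    result = {"n_hydrogens": int(next(it))}

    # Stereoconfiguration block: 0, or 1 followed by a count and atom numbers.
    result["asymmetric_atoms"] = []
    if int(next(it)) == 1:
        n_asym = int(next(it))
        result["asymmetric_atoms"] = [int(next(it)) for _ in range(n_asym)]

    # Naming block: prefix, then 0, or 1 followed by two atom numbers.
    result["prefix"] = next(it)
    result["group_atoms"] = None
    if int(next(it)) == 1:
        group_atom = int(next(it))
        subgroup_atom = int(next(it))
        result["group_atoms"] = (group_atom, subgroup_atom)
    return result


no_stereo = parse_tail(["5", "0", "c", "1", "8", "4"])
with_stereo = parse_tail(["5", "1", "2", "1", "5", "c", "1", "8", "4"])
print(no_stereo)
print(with_stereo)
```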

 

4.2 Input file examples

In this section, cytosine is used as an example. Cytosine consists of 8 heavy atoms and 5 hydrogens. The number of generated tautomers varies depending on the constraints on the minimum and maximum number of hydrogens at each heavy atom. Sections 4.2.1 and 4.2.2 present two example input files. Each of them lets TauTGen generate tautomers according to the constraints presented in the accompanying table.

 

4.2.1 Generation of a large number of tautomers without E,Z stereoisomers

A large number of tautomers is generated by allowing a wide range between the minimum and maximum number of hydrogens at each heavy atom. The example constraints are presented in the table:

 

Atom   Number of H at heavy atom   Number of sites with
       Minimum   Maximum           1 hydrogen   2 hydrogens   3 hydrogens
N1     0         2                 1            2             -
C2     0         1                 1            -             -
O2     0         2                 2            2             -
N3     0         2                 1            2             -
C4     0         1                 1            -             -
N4     0         3                 2            2             3
C5     0         2                 1            2             -
C6     0         2                 1            2             -

 

In this example, the generation of stereoisomers is omitted. Hence, only one site is used for each possibly asymmetric atom. (The possibly asymmetric atoms are C2 and C4, because they are the only ones that have four different substituents after one hydrogen atom is attached.)

 

 

Input file                                          Comment
8                                                   Number of atoms in the frame
C     1.187597    -0.015098    -0.971085            1st atom of frame: C4
C     0.029536     0.137264    -1.721102            C5
C    -1.206594     0.253832    -1.012796            C6
N    -1.170225    -0.168858     0.339470            N1
C     0.021137    -0.177554     1.099375            C2
N     1.205055     0.140332     0.473651            N3
O     0.038072    -0.804498     2.187450            O2
N     2.471115     0.178384    -1.560944            N4
0 1                                                 Min #H at C4 = 0; Max #H at C4 = 1
0 2                                                 Min #H at C5 = 0; Max #H at C5 = 2
0 2                                                 Min #H at C6 = 0; Max #H at C6 = 2
0 2                                                 Min #H at N1 = 0; Max #H at N1 = 2
0 1                                                 Min #H at C2 = 0; Max #H at C2 = 1
0 2                                                 Min #H at N3 = 0; Max #H at N3 = 2
0 2                                                 Min #H at O2 = 0; Max #H at O2 = 2
0 3                                                 Min #H at N4 = 0; Max #H at N4 = 3
25                                                  Number of sites = 25
c4 1 1 0    1.144003    -1.106404    -1.102044      “c4” site connected to atom #1 (C4), active when #H at C4 = 1
c5 2 1 0     0.074610     0.281884    -2.804140     “c5” site connected to atom #2 (C5), active when #H at C5 = 1
c5 2 2 0    -0.021099     0.908039    -2.504263     “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c5 2 2 0    -0.347200    -0.699711    -2.327355     “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c6 3 1 0    -2.166964     0.015482    -1.487651
c6 3 2 0    -1.684085     1.238137    -0.898138
c6 3 2 0    -2.095573    -0.290079    -1.364759
n1 4 1 0    -2.101124    -0.062984     0.915853
n1 4 2 0    -1.832399     0.463394     0.949210
n1 4 2 0    -1.504583    -1.214007     0.416074
c2 5 1 0    -0.016175     0.836214     1.524678
n3 6 1 0     2.226017    -0.025562     0.847971
n3 6 2 0     1.883148    -0.681857     0.746040
n3 6 2 0     1.895245     0.952142     0.746783
o2t 7 1 0    1.032241    -0.711283     2.648904     This is a trans position of O2 in hydroxyl tautomer
o2c 7 1 0    -0.946208    -1.263796     2.361341    This is a cis position of O2 in hydroxyl tautomer
o2 7 2 0     1.032241    -0.711283     2.648904
o2 7 2 0    -0.721033    -0.385125     2.864123
n4c 8 1 0     3.177455     0.018934    -0.842596    This is a cis position of N4 in imino tautomer
n4t 8 1 0     2.397382     0.427027    -2.545824    This is a trans position of N4 in imino tautomer
n4 8 2 0     3.177455     0.018934    -0.842596
n4 8 2 0     2.547495     1.093957    -2.000421
n4 8 3 0     2.397060     0.049013    -2.650797
n4 8 3 0     2.828836     1.193918    -1.335719
n4 8 3 0     3.177707    -0.558504    -1.151403
5                                                   Number of hydrogen atoms = 5
0                                                   Do not use stereoconfiguration check
c                                                   All tautomer names will start with “c”
1                                                   Divide tautomers into groups and subgroups
8                                                   Groups depend on #H atoms at atom #8 (N4)
4                                                   Subgroups depend on #H atoms at atom #4 (N1)
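As a consistency check, the site definitions of this input file can be tallied per atom and per site constraint and compared with the table of constraints given for this example. The Python sketch below does this; it copies only the first three fields (name, atom, site constraint) of the 25 site lines, and the variable names are our own.

```python
from collections import Counter

# First three fields (name, atom, site constraint) of the 25 site lines.
SITE_LINES = """
c4 1 1
c5 2 1
c5 2 2
c5 2 2
c6 3 1
c6 3 2
c6 3 2
n1 4 1
n1 4 2
n1 4 2
c2 5 1
n3 6 1
n3 6 2
n3 6 2
o2t 7 1
o2c 7 1
o2 7 2
o2 7 2
n4c 8 1
n4t 8 1
n4 8 2
n4 8 2
n4 8 3
n4 8 3
n4 8 3
"""
ATOM_LABELS = ["C4", "C5", "C6", "N1", "C2", "N3", "O2", "N4"]  # frame order

# Count sites per (atom label, required number of hydrogens).
tally = Counter()
for line in SITE_LINES.split("\n"):
    if line.strip():
        _, atom, constraint = line.split()
        tally[ATOM_LABELS[int(atom) - 1], int(constraint)] += 1

# One row per atom: number of sites for 1, 2 and 3 hydrogens.
for label in ATOM_LABELS:
    print(label, [tally[label, n] for n in (1, 2, 3)])
```

Each printed row should match the "number of sites" columns of the table for the corresponding atom, with 0 standing for an empty cell.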

 

 

 

4.2.2 Generation of a reasonable number of tautomers including E,Z stereoisomers

In the following example, a narrower range between the minimum and maximum number of hydrogens at each atom is used. In addition, stereoisomers are generated, since C2 and C4 become asymmetric atoms when a hydrogen is attached. C2 and C4 each have two connected sites, one above and one below the molecular plane.

 

Atom   Number of H at heavy atom   Number of sites with       Asymmetric
       Minimum   Maximum           1 hydrogen   2 hydrogens   atom
N1     0         1                 1            -             -
C2     0         1                 2            -             Yes
O2     0         1                 2            -             -
N3     0         1                 1            -             -
C4     0         1                 2            -             Yes
N4     1         2                 2            2             -
C5     1         2                 1            2             -
C6     1         2                 1            2             -

Input file                                          Comment
8                                                   Number of atoms in the frame
C     1.187597    -0.015098    -0.971085            1st atom of frame: C4
C     0.029536     0.137264    -1.721102            C5
C    -1.206594     0.253832    -1.012796            C6
N    -1.170225    -0.168858     0.339470            N1
C     0.021137    -0.177554     1.099375            C2
N     1.205055     0.140332     0.473651            N3
O     0.038072    -0.804498     2.187450            O2
N     2.471115     0.178384    -1.560944            N4
0 1                                                 Min #H at C4 = 0; Max #H at C4 = 1
1 2                                                 Min #H at C5 = 1; Max #H at C5 = 2
1 2                                                 Min #H at C6 = 1; Max #H at C6 = 2
0 1                                                 Min #H at N1 = 0; Max #H at N1 = 1
0 1                                                 Min #H at C2 = 0; Max #H at C2 = 1
0 1                                                 Min #H at N3 = 0; Max #H at N3 = 1
0 1                                                 Min #H at O2 = 0; Max #H at O2 = 1
1 2                                                 Min #H at N4 = 1; Max #H at N4 = 2
18                                                  Number of sites = 18
c4 1 1 1    1.144003    -1.106404    -1.102044      “c4” site connected to atom #1 (C4), active when #H at C4 = 1, above
c4 1 1 2    1.176055     1.106404    -0.913044      “c4” site connected to atom #1 (C4), active when #H at C4 = 1, below
c5 2 1 0     0.074610     0.281884    -2.804140     “c5” site connected to atom #2 (C5), active when #H at C5 = 1
c5 2 2 0    -0.021099     0.908039    -2.504263     “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c5 2 2 0    -0.347200    -0.699711    -2.327355     “c5” site connected to atom #2 (C5), active when #H at C5 = 2
c6 3 1 0    -2.166964     0.015482    -1.487651
c6 3 2 0    -1.684085     1.238137    -0.898138
c6 3 2 0    -2.095573    -0.290079    -1.364759
n1 4 1 0    -2.101124    -0.062984     0.915853
c2 5 1 1    -0.016175     0.836214     1.524678     “c2” site connected to atom #5 (C2), active when #H at C2 = 1, above
c2 5 1 2    -0.016175    -1.05674       0.836214    “c2” site connected to atom #5 (C2), active when #H at C2 = 1, below
n3 6 1 0     2.226017    -0.025562     0.847971
o2t 7 1 0    1.032241    -0.711283     2.648904     This is a trans position of O2 in hydroxyl tautomer
o2c 7 1 0    -0.946208    -1.263796     2.361341    This is a cis position of O2 in hydroxyl tautomer
n4c 8 1 0     3.177455     0.018934    -0.842596    This is a cis position of N4 in imino tautomer
n4t 8 1 0     2.397382     0.427027    -2.545824    This is a trans position of N4 in imino tautomer
n4 8 2 0     3.177455     0.018934    -0.842596
n4 8 2 0     2.547495     1.093957    -2.000421
5                                                   Number of hydrogen atoms = 5
1                                                   Use stereoconfiguration check
2                                                   Number of asymmetric atoms
1                                                   1st asymmetric atom = atom #1 (C4)
5                                                   2nd asymmetric atom = atom #5 (C2)
c                                                   All tautomer names will start with “c”
1                                                   Divide tautomers into groups and subgroups
8                                                   Groups depend on #H atoms at atom #8 (N4)
4                                                   Subgroups depend on #H atoms at atom #4 (N1)