Chr. Benecke, A. Kerber, R. Laue
In order to apply the previously described algorithm for a canonical numbering of 2D Structures,
we first determine which information can be used to describe the structure.
Building the Initial Classification
Of course, using the atom type description, we can build initial equivalence classes of atoms.
This classification can easily be refined taking into account, for example, the number
of double bonds an atom is connected with, the number of H-atoms connected to a heavy atom
or other typical atom properties ( e.g. atom is part of an aromatic ring,...).
Please remember, that the more properties you use, the faster the algorithm will
get.
To ensure that equivalent molecules with different atom numbering start with the same
initial classification, we have to determine a total order with respect to the atom properties.
In our example, this order can be defined as follows:
Using this order, the starting classification should be fine (that means, even if the molecule consists of only e.g.
C-atoms, there is a good chance to get at least two separate classes which drastically reduces the time spent for canonical
numbering).
Having found this classification, two cases can occur.
- If none of the classes consists of only one member, the backtracking algorithm must be started, which is described in the
Backtracking chapter.
- One or more classes consist of one member only.
Using the Iterated Classification
In the second case, we use the atoms in the single membered classes in order to refine the other classes. First we memorize the
classes with only one member in the order induced by their members.
Starting with the largest class we memorized but did not use for refinement yet, we split all other classes that consist of more
than one atom.
Using the connectivity information of the molecule, the class to be splitted is divided into one class that contains atoms which
are connected to the one the memorized class consists of and another class that contains the atoms which are not connected to
it.
Of course, there is not always a new class created, but if the resulting classes now only contain
one atom, they are also memorized in a fixed order (e.g. the class with the atoms connected
to the atom we are refining with, are memorized first.).
This procedure is used as long as there is a memorized class available which we did not use for refinement so far. If, after that,
all classes consist of only a single atom, the end of the algorithm is reached. If not, we have to go on with the Backtracking.
Backtracking
Starting the Backtracking part, we first determine the class containing the fewest number of atoms. This strategy is expected
to act best in this case.
Before choosing an atom out of this class to continue with, we have to set a rule for this choice:
Assume, that we already memorized three classes with a single element member. We now rearrange the connection
table for the molecule, so that these atoms (A1, A2 and A3) are the first to occur in a row. In addition to that, we only
take into account the lower half of the connection table, for obvious reasons.
A1 *
A2 1 *
A3 1 0 *
A4 . . . *
.
.
In this examplary connection table, A2 is connected to A1, A3 to A1, and A3 not to A2.
We will now define an order for connection tables that enables us to decide, whether one connection table is larger than another.
This is done in a very natural way, just by appending the numbers in the rows to the preceding row. Here, it will look like: 1 1
0 . . . Another Connection table is said to be larger than this, if the first n numbers are equivalent but the
(n+1). number is larger than the one in the first connection table.
What we can do now, is select only those atoms of the class as a new A4, where the connection table
is larger than it would become if we had chosen another atom.
If only one atom fullfills this condition,
we do not need to start the backtracking, but split the actual class in one containing only this
atom and another class containing the rest of the atoms. After memorizing the new single atom member class, we continue with the
iterated classification.
If there are more than one atoms whose choice would lead to the largest connection
table, we have to build new branches for the backtracking algorithm. However, all the up to this point largest
connection tables must be stored so we can immediately cut off useless branches when a larger connection tables occurs.
If no more refinement is necessary, that means if all classes consist of one single atom, the canonical numbering of the structure
is the list of the memorized classes containing only one atom. To show this in a practical example, please follow the link to the
next chapter.
Previous: Canonical Representatives for Nested Structures
Next: Example
© Chr. Benecke, Oct. 1995