# Notes on Premature Convergence in Genetic Algorithms

This post, like earlier ones, applies to PGAPack and PGApy, but also to other Genetic Algorithm (GA) implementations.

When optimizing with a GA, it sometimes happens that only a sub-optimal solution is found because the whole population converges early to a region of the search space where no better solutions can be discovered anymore. This effect is called premature convergence.

Premature convergence is usually hard to detect; a good measure is the mean Hamming distance between individuals. In PGAPack, reporting of the Hamming distance can be enabled with the reporting option PGA_REPORT_HAMMING, set with the function PGASetPrintOptions in PGAPack and with the print_options constructor parameter in PGApy. Unfortunately this is only implemented for the binary data type.
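As a small illustration (plain Python, not the PGAPack implementation), the mean pairwise Hamming distance of a binary population can be computed like this; a value close to zero indicates a converged population:

```python
from itertools import combinations

def mean_hamming(population):
    """Mean pairwise Hamming distance of a list of equal-length bit lists."""
    pairs = list(combinations(population, 2))
    total = sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs)
    return total / len(pairs)

diverse   = [[0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0]]
converged = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 1]]
# mean_hamming(diverse) is much larger than mean_hamming(converged)
```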

One reason for premature convergence is the use of Fitness Proportionate Selection, as detailed in an earlier post in this blog [1]. If during the early stages of the search an individual is discovered that is much better than anything found so far, there is a high chance that this individual takes over the whole population when Fitness Proportionate Selection is in use, preventing any further progress of the algorithm. The reason is that an individual that is much better than all others gets an unreasonably large proportion of the roulette wheel when individuals are selected for the next generation, so all other genetic material has only a slim chance of making it into the next generation.
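The takeover effect is easy to demonstrate with a toy roulette-wheel sketch (plain Python with made-up fitness values): a single individual with fitness 100 among nine individuals of fitness 1 owns more than 90% of the wheel, so nearly every selection picks it.

```python
import random

def roulette_select(fitnesses, rng):
    """Fitness proportionate selection: pick an index with probability
    proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if pick <= acc:
            return i
    return len(fitnesses) - 1

rng = random.Random(42)
fitnesses = [100] + [1] * 9          # one "super-individual", nine average ones
picks = [roulette_select(fitnesses, rng) for _ in range(10000)]
share = picks.count(0) / len(picks)  # fraction of selections won by individual 0
# share is close to 100 / 109, i.e. roughly 0.92
```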

Another reason for premature convergence can be a small population size. Goldberg et al. [9] give estimates of the population size for the classic GA with a small alphabet (the set of different allele values, e.g. 0 and 1 for a binary GA) of cardinality $\chi$, and a problem-specific building block size $k \ll n$ that overcomes deception, where $k$ is the building block size (a measure of the difficulty of the problem) and $n$ is the string (gene) length. They show that the population size should be $O(m\chi^k)$ with $m=n/k$, so that for a given problem difficulty $k$ the population size is proportional to the string length $n$. This result, however, does not readily translate to problems with a large alphabet, e.g. floating-point representations like the real data type in PGAPack. For floating-point representations, difficulty usually translates to how multimodal a problem is, i.e., how many peaks (in case of a maximization problem) or valleys (in case of a minimization problem) the objective function has.

If premature convergence still occurs with an adequate population size and an appropriate selection scheme, there are some mechanisms that can be used.

## Prevention of Duplicates

PGAPack implements a mechanism for preventing duplicate gene strings (individuals). In previous versions the computational effort was quadratic in the population size $N$, i.e. the effort was $O(N^2)$: each new individual was compared with all current members of the population. The latest versions use hashing to detect duplicates, reducing the overhead to $O(N)$, i.e. a small constant overhead for each new individual.

For user-defined data types this means that the user has to define a hash function for the data type in addition to the string comparison function. For the built-in data types (binary, integer, real) this is automatically available.
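The idea behind the $O(N)$ duplicate check can be sketched as follows (a simplified illustration, not PGAPack's actual code): keep a hash set of the genes in the population and reject any candidate whose gene is already present, so each check is a constant-time set lookup on average.

```python
def insert_if_new(population, seen, gene):
    """Add gene to the population unless an identical gene already exists.
    `seen` is a set of hashable gene representations; membership tests are
    O(1) on average, so checking N new individuals costs O(N) overall."""
    key = tuple(gene)   # tuples are hashable, lists are not
    if key in seen:
        return False    # duplicate: the caller would mutate and retry
    seen.add(key)
    population.append(gene)
    return True

population, seen = [], set()
a = insert_if_new(population, seen, [1, 0, 1])  # accepted
b = insert_if_new(population, seen, [1, 0, 1])  # rejected as duplicate
c = insert_if_new(population, seen, [1, 1, 1])  # accepted
```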

Prevention of duplicates works quite well for binary and integer data types, especially if the genetic operators have a high probability of producing duplicates. It does not work so well for the real data type, because new individuals are rarely exact duplicates of existing individuals even when they are very close to them.

## Restricted Tournament Replacement

An algorithm originally called restricted tournament selection by Harik [2], [3], and later adopted under the name restricted tournament replacement (RTR) by Pelikan [4], uses both the old and the new population for deciding whether a candidate individual is allowed into the new population. It works by randomly selecting a number of members from the old population (called the selection window), choosing the individual which is most similar to the candidate, and allowing the candidate into the new population only if it is better than this most similar individual.

The default window size (the number of individuals randomly selected from the old population) is $\min (n, \frac{N}{20})$ [4], where $n$ is the string (gene) length and $N$ is the population size. It can be set by the user with the PGASetRTRWindowSize function in PGAPack and with the rtr_window_size constructor parameter of PGApy.

The RTR algorithm needs a similarity metric to decide how close an individual is to another. By default this is a Manhattan distance (equivalent to the Hamming distance for binary genes), i.e. an allele-by-allele sum of distances, but it can be changed to a Euclidean distance or a user-defined metric with the user-function mechanism of PGAPack. A comparison using a Euclidean distance metric for RTR is in the magic square example in PGApy, where the Euclidean distance can be turned on with a command-line option.
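Putting the window and the similarity metric together, the replacement step can be sketched like this (plain Python, a simplified illustration rather than PGAPack's implementation), here with a Manhattan distance and a minimization problem:

```python
import random

def manhattan(a, b):
    """Allele-by-allele sum of distances; equals the Hamming distance
    for binary genes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def rtr_replace(old_pop, evals, candidate, cand_eval, window, rng):
    """Restricted tournament replacement (minimization): compare the
    candidate against the most similar of `window` randomly chosen old
    individuals and replace that one only if the candidate is better."""
    indices = rng.sample(range(len(old_pop)), window)
    closest = min(indices, key=lambda i: manhattan(old_pop[i], candidate))
    if cand_eval < evals[closest]:
        old_pop[closest] = candidate
        evals[closest] = cand_eval
        return True
    return False

rng = random.Random(1)
pop   = [[0, 0], [5, 5], [9, 9]]
evals = [1.0, 4.0, 8.0]
# A good candidate near [9, 9] competes against it, not against the
# unrelated but better [0, 0], so the niche around [0, 0] is preserved:
rtr_replace(pop, evals, [8, 9], 2.0, window=3, rng=rng)
```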

Restricted tournament replacement works well not only for binary and integer genes but also for real genes. It can be combined with different evolutionary algorithm settings.

The effect of RTR on a problem that tends to suffer from premature convergence can be seen in the test program examples/mgh/testprog.f, which implements several test functions from an old benchmark of nonlinear test problems [5]. The test function that exhibits premature convergence is what the authors call a "Gaussian function", described as example (9) in the paper [5] and implemented as function 3 in examples/mgh/objfcn77.f. This function is given as

\begin{equation*} f_i(x) = x_1 \cdot e^{-\frac{x_2(t_i - x_3)^2}{2}} - y_i \end{equation*}

with

\begin{equation*} t_i = (8 - i) / 2 \end{equation*}

and tabulated values for $y_i$ as given in the paper or in the implementation in examples/mgh/objfcn77.f. The minimization problem built from these equations is

\begin{equation*} f (x) = \sum_{i=1}^{m} f_i^2(x) \end{equation*}

with $m=15$ for this test problem. The authors [5] give the vector $x_0 = (0.4, 1, 0)$ for the minimum $f(x_0) = 1.12793 \cdot 10^{-8}$ they found. The original Fortran implementation in examples/mgh/testprog.f uses a population size of 10000 with default settings for the real data type of PGAPack. The large population size is chosen because otherwise the problem exhibits premature convergence. It finds a solution in 100 generations, $x_0=(0.3983801, 0.9978369, 0.009918243)$ with an evaluation value of $f(x_0)=2.849966\cdot 10^{-5}$. The number of function evaluations needed was 105459. This is a little less than $10000 + 100 \cdot 1000$, i.e. evaluation of the initial generation plus evaluation of 10% of the population in each generation: since the probabilities of crossover and mutation are below 100%, it can happen that neither operator is applied to an individual, in which case it is not re-evaluated.
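For experimenting outside of Fortran, the objective can be sketched in Python. Note the assumption here: the tabulated $y_i$ appear to be the standard normal density at $t_i$ rounded to four decimals, so I generate them that way instead of copying the table; check against the table in [5] or examples/mgh/objfcn77.f before relying on this.

```python
import math

# Assumption: y_i is the standard normal density at t_i, rounded to four
# decimals (verify against [5] / examples/mgh/objfcn77.f).
M = 15
T = [(8 - i) / 2 for i in range(1, M + 1)]
Y = [round(math.exp(-t * t / 2) / math.sqrt(2 * math.pi), 4) for t in T]

def gaussian_objective(x):
    """Sum of squared residuals f_i(x) = x1 * exp(-x2 * (t_i - x3)^2 / 2) - y_i."""
    x1, x2, x3 = x
    return sum(
        (x1 * math.exp(-x2 * (t - x3) ** 2 / 2) - y) ** 2
        for t, y in zip(T, Y)
    )

# Near the published optimum the objective is tiny; far away it is not.
near_opt = gaussian_objective((0.3989, 1.0, 0.0))
far_off  = gaussian_objective((0.4, 2.0, 1.0))
```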

I've implemented the same problem with Differential Evolution [6], [7], [8] in examples/mgh/testprogde.c (the driver program is implemented in C because I really do not speak Fortran, but it uses the same test functions from the Fortran implementation, linking the Fortran and C code into a common executable). This uses a population size of 2000 (large for most problems when using Differential Evolution, again because of premature convergence) and finds the solution $x_0=(0.3992372, 0.9990791, -0.0007697923)$ with an evaluation value of $f(x_0)=7.393123\cdot 10^{-7}$ in only 30 generations. This amounts to 62000 function evaluations (Differential Evolution creates all individuals of the new generation and decides afterwards which to keep).
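The core of Differential Evolution fits in a few lines. This sketch is the classic DE/rand/1/bin variant in plain Python (PGAPack's defaults and parameter names may differ): for each target vector a mutant is built from three distinct random vectors, crossed over component-wise, and all trial vectors are evaluated before any replacement happens, matching the create-all-then-decide behaviour mentioned above.

```python
import random

def de_step(pop, evals, objective, f=0.8, cr=0.9, rng=random):
    """One generation of DE/rand/1/bin (minimization). All trial vectors
    are created first; each then replaces its target if it is not worse."""
    n, dim = len(pop), len(pop[0])
    trials = []
    for i in range(n):
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        jrand = rng.randrange(dim)  # ensure at least one mutated component
        trials.append([
            pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
            if (rng.random() < cr or j == jrand) else pop[i][j]
            for j in range(dim)
        ])
    for i, trial in enumerate(trials):
        e = objective(trial)
        if e <= evals[i]:
            pop[i], evals[i] = trial, e

# Demo on a toy objective (sphere function), not the Gaussian problem:
rng = random.Random(7)
sphere = lambda x: sum(v * v for v in x)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(30)]
evals = [sphere(p) for p in pop]
init_best = min(evals)
for _ in range(100):
    de_step(pop, evals, sphere, rng=rng)
best = min(evals)
```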

When using RTR with this problem in examples/mgh/testprogdertr.c, the population size can be reduced to 250, and even after 100 generations the search has not converged to a suboptimal solution. After 100 generations we find $x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})$ and $f(x_0)=1.339766\cdot 10^{-8}$ (though also with some changed settings of the Differential Evolution algorithm). This amounts to 25250 function evaluations.

## Restart

A last resort when the above mechanisms do not work is to regularly restart the GA whenever the population has converged too much. The restart mechanism implemented in PGAPack uses the best individual of the population to re-seed a new population with variants created by mutation from this best individual. Restarts can be enabled with PGASetRestartFlag in PGAPack or the restart constructor parameter in PGApy. The restart frequency (default: every 50 generations) can be set with PGASetRestartFrequencyValue in PGAPack and the restart_frequency constructor parameter in PGApy.
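The re-seeding step can be sketched as follows (plain Python with a made-up per-allele Gaussian mutation, not PGAPack's actual mutation operator): the best individual is kept unchanged and the rest of the new population consists of mutated copies of it.

```python
import random

def restart_population(best, pop_size, sigma, rng):
    """Re-seed a population from the best individual: keep one unchanged
    copy and fill the rest with Gaussian-mutated variants (illustrative
    mutation, not PGAPack's exact operator)."""
    new_pop = [list(best)]
    for _ in range(pop_size - 1):
        new_pop.append([g + rng.gauss(0, sigma) for g in best])
    return new_pop

rng = random.Random(3)
pop = restart_population([0.4, 1.0, 0.0], pop_size=5, sigma=0.1, rng=rng)
# pop[0] is the unchanged best individual; pop[1:] are mutated variants
```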