Notes on Premature Convergence in Genetic Algorithms
This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations.
When optimizing solutions with GA, it sometimes happens that a sub-optimal solution is found because the whole population converges early to a part of the search space where no better solutions are found anymore. This effect is called premature convergence.
It is usually hard to detect premature convergence, a good measure is
the mean hamming distance between individuals. In PGAPack reporting of
the hamming distance can be enabled with the reporting option
PGA_REPORT_HAMMING
, set with the function PGASetPrintOptions
in
PGAPack and with the print_options
constructor parameter in PGApy.
Unfortunately this is only implemented for the binary datatype.
One reason for the effect of premature convergence is the use of Fitness Proportionate Selection as detailed in an earlier post in this blog [1]. If during the early stages of the search an individual is discovered that is much better than anything found so far, the chance is high that this individual takes over the whole population when Fitness Proportionate Selection is in use, preventing any further progress of the algorithm. The reason is that an individual that is much better than all others gets an unreasonably high proportion of the roulette wheel when selecting individuals for the next generation resulting in all other genetic material having only a slim chance of making it into the next generation.
Another reason for premature convergence can be a small population size.
Goldberg et. al. [9] give estimates of the population size for classic
GA with a small alphabet (the number of different allele values, e.g. 0
and 1 for a binary GA) with cardinality \(\chi\), a problem specific
building block size that overcomes deception \(k \ll n\) where
\(k\) is the building block size (a measure for the difficulty of
the problem) and \(n\) is the string (gene) length. They show
that the population size should be \(O(m\chi^k)\) with \(m=n/k\)
so that for a given difficulty of the problem \(k\) the population
size is proportional to the string length \(n\). This result,
however, does not readily translate to problems with a large alphabet,
e.g. floating-point representations like the real
data type in
PGAPack. For floating point representations, difficulty usually
translates to how multimodal a problem is, i.e., how many peaks (in
case of a maximization problem) or valleys (in case of a minimization
problem) there are in the objective function.
Now if with an adequate population size and an appropriate selection scheme premature convergence still occurs, there are some mechanisms that can be used.
Prevention of Duplicates
PGAPack implements a mechanism for preventing duplicate gene strings (individuals). In previous implementations the computation effort was quadratic in the population size \(N\), i.e. the effort was \(O(N^2)\) (it compared each new individual with all current members of the population, once for each new individual). In the latest versions it uses hashing for detecting duplicates, reducing the overhead to \(O(N)\), a small constant overhead for each new individual.
For user-defined data types this means that the user has to define a hash function for the data type in addition to the string comparison function. For the built-in data types (binary, integer, real) this is automatically available.
Prevention of duplicates works quite well for binary and integer data types, especially if the genetic operators have a high probability of producing duplicates. It does not work so well for the real data type because new individuals tend to be different from other individuals even if they can often be very close to already existing individuals.
Restricted Tournament Replacement
An algorithm originally called restricted tournament selection by Harik [2], [3] and later adopted under the name of restricted tournament replacement (RTR) by Pelikan [4] uses the old and new population for deciding if a candidate individual will be allowed into the new population. It works by randomly selecting a number of members from the old population (called the selection window), chosing the individual which is most similar to the candidate, and allows the candidate into the new population only if it is better than this most similar individual.
The default for the number of individuals randomly selected from the old
population (the window size) by default is \(\min (n,
\frac{N}{20})\) [4] where \(n\) is the string (gene) length and \(N\)
is the population size. This can be set by the user with the
PGASetRTRWindowSize
function for PGAPack and with the
rtr_window_size
constructor parameter of PGApy.
The RTR algorithm needs a similarity metric for deciding how close an individual is to another. By default this uses a manhattan distance (equivalent to the hamming distance for binary genes), i.e. an allele-by-allele sum of distances, but can be set to an euclidian distance or a user-defined metric with the user-function mechanism of PGAPack. Comparison of an euclidian distance metric for RTR is in the magic square example in PGApy where the use of the euclidian distance can be turned on with a command-line option.
Restricted tournament replacement works well not only for binary and integer genes but also for real genes. It can be combined with different evolutionary algorithm settings.
The effect of RTR on a problem that tends to suffer from premature
convergence can be seen in the test program examples/mgh/testprog.f
,
this implements several test functions from an old benchmark of
nonlinear test problems [5]. The test function that exhibits premature
convergence is what the authors call a "Gaussian function", described as
example (9) in the paper [5] and implemented as function 3 in
examples/mgh/objfcn77.f
. This function is given as
with
And tabulated values for \(y_i\) given in the paper or the
implementation in examples/mgh/objfcn77.f
. The minimization problem
from these equations is
with \(m=15\) for this test problem. The authors [5] give the vector
\(x_0 = (0.4, 1, 0)\) for the minimum
\(f(x_0) = 1.12793 \cdot 10^{-8}\)
they found. The original Fortran implementation in
examples/mgh/testprog.f
uses a population size of 10000 with default
settings for the real
data type of PGAPack. The large population
size is chosen because otherwise the problem exhibits premature
convergence. It finds a solution in 100 generations
\(x_0=(0.3983801, 0.9978369, 0.009918243)\) with an evaluation value
of \(f(x_0)=2.849966\cdot 10^{-5}\). The number of function
evaluations needed were 105459 (this is a little less than
\(10000 + 100 \cdot 1000\), i.e. evaluation of the initial
generation plus evaluation of 10% of the population of each generation,
the probability of crossover and mutation is not 100%, so it happens
that none of the operators is performed on an individual and it is not
re-evaluated).
I've implemented the same problem with Differential Evolution [6],
[7], [8] in examples/mgh/testprogde.c
(the driver program
implemented in C because I really do not speak Fortran but using the
same functions from the Fortran implementation linking the Fortran and C
code into a common executable). This uses a population size of 2000
(large for most problems when using Differential Evolution, again for
the reason of premature convergence) and finds the solution
\(x_0=(0.3992372, 0.9990791, -0.0007697923)\) with an evaluation
value of \(f(x_0)=7.393123\cdot 10^{-7}\) in only 30 generations.
This amounts to 62000 function evaluations (Differential Evolution
creates all individuals for the new generation and decides afterwards
which to keep).
When using RTR with this problem in examples/mgh/testprogdertr.c
, the
population size can be reduced to 250 and even after 100 generations the
search has not converged to a suboptimal solution. After 100 generations
we find \(x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})\) and
\(f(x_0)=1.339766\cdot 10^{-8}\) (but also with some changed settings of
the Differential Evolution algorithm). This amounts to 25250 function
evaluations.
Restart
A last resort when the above mechanisms do not work is to regularly
restart the GA whenever the population has converged too much. The
restart mechanism implemented in PGAPack uses the best individual from
the population to re-seed a new population with variations created by
mutation from this best individual. Restarts can be enabled by setting
PGASetRestartFlag
in PGAPack or using the restart
constructor
parameter in PGApy. The frequency (default is every 50 generations) of
restarts can be set with PGASetRestartFrequencyValue
in PGAPack and
the restart_frequency
constructor parameter in PGApy.