.. title: Notes on Premature Convergence in Genetic Algorithms
.. slug: 06
.. date: 2023-01-06 18:45
.. tags: documentation,english,genetic algorithms,howto,open source,optimization
.. description:
.. wp-status: publish
.. has_math: true
.. |--| unicode:: U+2013 .. en dash
.. |ohm| unicode:: U+02126 .. Omega
:trim:
.. |_| unicode:: U+00A0 .. Non-breaking space
:trim:
.. |examples/mgh/testprog.f| replace:: ``examples/mgh/testprog.f``
.. |examples/mgh/objfcn77.f| replace:: ``examples/mgh/objfcn77.f``
.. |examples/mgh/testprogde.c| replace:: ``examples/mgh/testprogde.c``
.. |examples/mgh/testprogdertr.c| replace:: ``examples/mgh/testprogdertr.c``
This post again applies to PGAPack_ and PGApy_ or other Genetic
Algorithm (GA) implementations.
When optimizing solutions with GA, it sometimes happens that a
sub-optimal solution is found because the whole population converges
early to a part of the search space where no better solutions are found
anymore. This effect is called `premature convergence`_.
It is usually hard to detect `premature convergence`_, a good measure is
the mean `hamming distance`_ between individuals. In PGAPack_ reporting of
the `hamming distance`_ can be enabled with the reporting option
``PGA_REPORT_HAMMING``, set with the function ``PGASetPrintOptions`` in
PGAPack_ and with the ``print_options`` constructor parameter in PGApy_.
Unfortunately this is only implemented for the binary datatype.
One reason for the effect of `premature convergence`_ is the use of
`Fitness Proportionate Selection`_ as detailed in an earlier post in
this blog [1]_. If during the early stages of the search an individual
is discovered that is much better than anything found so far, the chance
is high that this individual takes over the whole population when
`Fitness Proportionate Selection`_ is in use, preventing any further
progress of the algorithm. The reason is that an individual that is
much better than all others gets an unreasonably high proportion of the
roulette wheel when selecting individuals for the next generation
resulting in all other genetic material having only a slim chance of
making it into the next generation.
Another reason for premature convergence can be a small population size.
Goldberg et. al. [9]_ give estimates of the population size for classic
GA with a small alphabet (the number of different allele values, e.g. 0
and 1 for a binary GA) with cardinality :math:`\chi`, a problem specific
building block size that overcomes deception :math:`k \ll n` where
:math:`k` is the building block size (a measure for the difficulty of
the problem) and :math:`n` is the string (gene) length. They show
that the population size should be :math:`O(m\chi^k)` with :math:`m=n/k`
so that for a given difficulty of the problem :math:`k` the population
size is proportional to the string length :math:`n`. This result,
however, does not readily translate to problems with a large alphabet,
e.g. floating-point representations like the ``real`` data type in
PGAPack_. For floating point representations, difficulty usually
translates to how multimodal a problem is, i.e., how many *peaks* (in
case of a maximization problem) or *valleys* (in case of a minimization
problem) there are in the objective function.
Now if with an adequate population size *and* an appropriate selection
scheme premature convergence still occurs, there are some mechanisms
that can be used.
Prevention of Duplicates
------------------------
PGAPack_ implements a mechanism for preventing duplicate gene strings
(individuals). In previous implementations the computation effort was
quadratic in the population size :math:`N`, i.e. the effort was
:math:`O(N^2)` (it compared each new individual with *all* current
members of the population, once for each new individual). In the latest
versions it uses hashing for detecting duplicates, reducing the overhead
to :math:`O(N)`, a small constant overhead for each new individual.
For user-defined data types this means that the user has to define a
hash function for the data type in addition to the string comparison
function. For the built-in data types (binary, integer, real) this is
automatically available.
Prevention of duplicates works quite well for binary and integer data
types, especially if the genetic operators have a high probability of
producing duplicates. It does not work so well for the real data type
because new individuals tend to be different from other individuals even
if they can often be very close to already existing individuals.
Restricted Tournament Replacement
---------------------------------
An algorithm originally called *restricted tournament selection* by
Harik [2]_, [3]_ and later adopted under the name of *restricted
tournament replacement* (RTR) by Pelikan [4]_ uses the old and new
population for deciding if a candidate individual will be allowed into
the new population. It works by randomly selecting a number of members
from the old population (called the selection window), chosing the
individual which is most similar to the candidate, and allows the
candidate into the new population only if it is better than this most
similar individual.
The default for the number of individuals randomly selected from the old
population (the window size) by default is :math:`\min (n,
\frac{N}{20})` [4]_ where :math:`n` is the string (gene) length and :math:`N`
is the population size. This can be set by the user with the
``PGASetRTRWindowSize`` function for PGAPack_ and with the
``rtr_window_size`` constructor parameter of PGApy_.
The RTR algorithm needs a similarity metric for deciding how
close an individual is to another. By default this uses a `manhattan
distance`_ (equivalent to the `hamming distance`_ for binary genes), i.e. an
allele-by-allele sum of distances, but can be set to an euclidian
distance or a user-defined metric with the user-function mechanism of
PGAPack_. Comparison of an euclidian distance metric for RTR is
in the `magic square example`_ in PGApy_ where the use of the euclidian
distance can be turned on with a command-line option.
Restricted tournament replacement works well not only for binary and
integer genes but also for real genes. It can be combined with different
evolutionary algorithm settings.
The effect of RTR on a problem that tends to suffer from `premature
convergence`_ can be seen in the test program |examples/mgh/testprog.f|_,
this implements several test functions from an old benchmark of
nonlinear test problems [5]_. The test function that exhibits premature
convergence is what the authors call a "Gaussian function", described as
example (9) in the paper [5]_ and implemented as function 3 in
|examples/mgh/objfcn77.f|_. This function is given as
.. math::
f(x) = x_1 \cdot e^{\frac{x_2(t_i - x_3)^2}{2}} - y_i
with
.. math::
t_i = (8 - i) / 2
And tabulated values for :math:`y_i` given in the paper or the
implementation in |examples/mgh/objfcn77.f|_. The minimization problem
from these equations is
.. math::
f (x) = \sum_{i=1}^{m} f_i^2(x)
with :math:`m=15` for this test problem. The authors [5]_ give the vector
:math:`x_0 = (0.4, 1, 0)` for the minimum
:math:`f(x_0) = 1.12793 \cdot 10^{-8}`
they found. The original Fortran implementation in
|examples/mgh/testprog.f|_ uses a population size of 10000 with default
settings for the ``real`` data type of PGAPack_. The large population
size is chosen because otherwise the problem exhibits `premature
convergence`_. It finds a solution in 100 generations
:math:`x_0=(0.3983801, 0.9978369, 0.009918243)` with an evaluation value
of :math:`f(x_0)=2.849966\cdot 10^{-5}`. The number of function
evaluations needed were 105459 (this is a little less than
:math:`10000 + 100 \cdot 1000`, i.e. evaluation of the initial
generation plus evaluation of 10% of the population of each generation,
the probability of crossover and mutation is not 100%, so it happens
that none of the operators is performed on an individual and it is not
re-evaluated).
I've implemented the same problem with Differential Evolution [6]_,
[7]_, [8]_ in |examples/mgh/testprogde.c|_ (the driver program
implemented in C because I really do not speak Fortran but using the
same functions from the Fortran implementation linking the Fortran and C
code into a common executable). This uses a population size of 2000
(large for most problems when using Differential Evolution, again for
the reason of premature convergence) and finds the solution
:math:`x_0=(0.3992372, 0.9990791, -0.0007697923)` with an evaluation
value of :math:`f(x_0)=7.393123\cdot 10^{-7}` in only 30 generations.
This amounts to 62000 function evaluations (Differential Evolution
creates all individuals for the new generation and decides afterwards
which to keep).
When using RTR with this problem in |examples/mgh/testprogdertr.c|_, the
population size can be reduced to 250 and even after 100 generations the
search has not converged to a suboptimal solution. After 100 generations
we find :math:`x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})` and
:math:`f(x_0)=1.339766\cdot 10^{-8}` (but also with some changed settings of
the Differential Evolution algorithm). This amounts to 25250 function
evaluations.
Restart
-------
A last resort when the above mechanisms do not work is to regularly
restart the GA whenever the population has converged too much. The
restart mechanism implemented in PGAPack_ uses the best individual from
the population to re-seed a new population with variations created by
mutation from this best individual. Restarts can be enabled by setting
``PGASetRestartFlag`` in PGAPack_ or using the ``restart`` constructor
parameter in PGApy_. The frequency (default is every 50 generations) of
restarts can be set with ``PGASetRestartFrequencyValue`` in PGAPack_ and
the ``restart_frequency`` constructor parameter in PGApy_.
.. [1] Ralf Schlatterbeck. `Proportional Selection (Roulette-Wheel) in
Genetic Algorithms`_. Blog post, Open Source Consulting, December
2022.
.. [2] Georges Harik. Finding multiple solutions in problems of bounded
difficulty. IlliGAL Report 94002, Illinois Genetic Algorithm Lab,
May 1994.
.. [3] Georges R. Harik. Finding multimodal solutions using restricted
tournament selection. In Larry J. Eshelman, editor, *Proceedings
of the International Conference on Genetic Algorithms (ICGA)*,
pages 24–31. Morgan Kaufmann, July 1995.
.. [4] Martin Pelikan. *Hierarchical Bayesian Optimization Algorithm:
Toward a New Generation of Evolutionary Algorithms*, volume 170 of
*Studies in Fuzziness and Soft Computing*. Springer, 2005.
.. [5] Jorge J. Moré, Burton S. Garbow, and Kenneth E. Hillstrom.
*Testing unconstrained optimization software.* ACM Transactions
on Mathematical Software, 7(1):17–41, March 1981.
.. [6] Rainer Storn and Kenneth Price. Differential evolution |--| a simple
and efficient heuristic for global optimization over continuous
spaces. *Journal of Global Optimization*, 11(4):341–359, December 1997.
.. [7] Rainer Storn and Kenneth Price. Differential evolution – a simple
and efficient adaptive scheme for global optimization over
continuous spaces. Technical Report TR-95-012, International
Computer Science Institute (ICSI), March 1995.
`Available from the International Computer Science Institute`_
.. [8] Kenneth V. Price, Rainer M. Storn, and Jouni A. Lampinen.
*Differential Evolution: A Practical Approach to Global Optimization.*
Springer, Berlin, Heidelberg, 2005.
.. [9] David E. Goldberg, Kalyanmoy Deb, and James H. Clark. Genetic
algorithms, noise, and the sizing of populations. *Complex
Systems*, 6(4):333–362, 1992. `Available from Complex Systems`_
.. _`PGAPack`: https://github.com/schlatterbeck/pgapack
.. _`PGAPy`: https://github.com/schlatterbeck/pgapy
.. _`Fitness Proportionate Selection`:
https://en.wikipedia.org/wiki/Fitness_proportionate_selection
.. _`premature convergence`:
https://en.wikipedia.org/wiki/Premature_convergence
.. _`Proportional Selection (Roulette-Wheel) in Genetic Algorithms`:
https://blog.runtux.com/posts/2022/12/03/
.. _`Manhattan Distance`: https://en.wikipedia.org/wiki/Taxicab_geometry
.. _`magic square example`:
https://github.com/schlatterbeck/pgapy/blob/master/examples/magic_square.py
.. _`Available from the International Computer Science Institute`:
ftp://ftp.icsi.berkeley.edu/pub/techreports/1995/tr-95-012.pdf
.. _`examples/mgh/testprog.f`:
https://github.com/schlatterbeck/pgapack/blob/master/examples/mgh/testprog.f
.. _`examples/mgh/objfcn77.f`:
https://github.com/schlatterbeck/pgapack/blob/master/examples/mgh/objfcn77.f
.. _`examples/mgh/testprogde.c`:
https://github.com/schlatterbeck/pgapack/blob/master/examples/mgh/testprogde.c
.. _`examples/mgh/testprogdertr.c`:
https://github.com/schlatterbeck/pgapack/blob/master/examples/mgh/testprogdertr.c
.. _`Available from Complex Systems`:
https://wpmedia.wolfram.com/uploads/sites/13/2018/02/06-4-3.pdf
.. _`hamming distance`: https://en.wikipedia.org/wiki/Hamming_distance