Runtux Blog (Posts about genetic algorithms)

Optimizing Floating-Point Problems with Evolutionary Algorithms

Ralf Schlatterbeck — Sun, 07 Jan 2024 17:00:00 GMT

[This is an extended abstract for a talk I'm planning]

Evolutionary Algorithms (EA) are a family of optimization algorithms that includes genetic algorithms and related methods. Genetic algorithms usually work with bits and small integer numbers.

There are other EAs that work directly with floating-point numbers, among them Differential Evolution (DE) [1], [2], [3].

The talk gives an introduction to optimization of floating-point problems with DE. It uses examples from electrical engineering as well as from optimization of actuation waveforms of inkjet printers. Piezo-electric inkjet printers use an actuation waveform to jet drops out of a nozzle. This waveform (among other parameters like the jetted fluid) determines the quality of the jetted drops.

For the software, I'm using PGApy [4], the Python bindings for the Parallel Genetic Algorithm Package PGAPack [5], which was originally developed at Argonne National Laboratory. I have been maintaining both packages for some years now. Among other features, Differential Evolution (DE) and strategies for multi-objective optimization (using NSGA-II [6]) were newly implemented.

Differential Evolution Illustrated

Ralf Schlatterbeck — Fri, 07 Apr 2023 20:00:00 GMT

This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations that support Differential Evolution (DE).

Differential Evolution (DE) [1], [2], [3] is a population-based optimizer that is similar to other evolutionary algorithms. It is quite powerful and is implemented in PGAPack (and usable from the Python wrapper PGAPy). To illustrate some points about this algorithm, I've constructed a simple parcours (an obstacle course) in OpenSCAD.

To test the optimizer with this parcours, the gene of the genetic algorithm (DE calls it the parameters) consists of the X and Y coordinates of a position on the parcours. In the following, we also call these coordinates a vector, as is common in the DE literature. We initialize the population in the region of the start of the ramp but allow the population to leave the initialization range.

Note that the corners of the parcours are flat areas with the same height. Areas like this are generally difficult for optimization algorithms because the algorithm cannot know in which direction better values (if available at all) can be found. This is also the reason why we do not put the initialization range inside the flat area at the bottom of the parcours. Note that normally one would initialize the population over the whole area to be searched. In our case we want to demonstrate that the algorithm is well capable of finding a solution far from the initialization range.

The DE algorithm is quite simple (see, e.g., [3], pp. 37-47): each member \(\mathbf x_{i,g}\) of the population in the current generation \(g\), where \(i\) is the index of the population member, is crossed over with a mutant vector \(\mathbf v_{i,g}\). The mutant vector is generated by selecting three population members at random (without replacement) and combining them as follows:

\begin{equation*} \mathbf v_{i,g} = \mathbf x_{r_0,g} + F \cdot (\mathbf x_{r_1,g} - \mathbf x_{r_2,g}) \end{equation*}

As can be seen, the weighted difference of two random members of the population is added to a third population member. The indices \(r_0, r_1,\) and \(r_2\) are all different from each other and from \(i\). The weight \(F\) is a configuration parameter that is typically in the range \(0.3 \le F < 1\). This algorithm is the classic DE, also often termed de-rand-1-bin. A variant uses the index of the best individual instead of \(r_0\) and is called de-best-1-bin. If more than a single difference is added, the name changes accordingly, e.g., de-rand-2-bin. The last component of the name, bin, refers to uniform crossover [4]: each parameter is taken from the mutant vector with a certain probability \(Cr\), so the number of parameters taken from the mutant vector follows a binomial distribution. Note that DE makes sure that at least one parameter from the mutant vector is selected (otherwise the original vector \(\mathbf x_{i,g}\) would be unchanged).
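To make the mutation and crossover steps concrete, here is a minimal Python sketch of de-rand-1-bin for a single population member. It is not the PGAPack implementation; the function name de_rand_1_bin and the default values for F and Cr are my own choices for illustration, and the population must have at least four members so that three distinct partners different from \(i\) can be drawn.

```python
import random


def de_rand_1_bin(population, i, F=0.85, Cr=0.9, rng=random):
    """Trial vector for population member i (de-rand-1-bin).

    population is a list of equally long lists of floats with at least
    four members, so that r0, r1, r2 can differ from each other and from i.
    """
    target = population[i]
    n = len(target)
    # Three distinct random indices, all different from i
    others = [k for k in range(len(population)) if k != i]
    r0, r1, r2 = rng.sample(others, 3)
    x0, x1, x2 = population[r0], population[r1], population[r2]
    # Mutant vector v = x_r0 + F * (x_r1 - x_r2)
    mutant = [x0[j] + F * (x1[j] - x2[j]) for j in range(n)]
    # Binomial crossover: each parameter comes from the mutant with
    # probability Cr; parameter j_rand always comes from the mutant
    j_rand = rng.randrange(n)
    return [mutant[j] if (j == j_rand or rng.random() < Cr) else target[j]
            for j in range(n)]
```

With Cr set to 0, exactly one parameter (index j_rand) is taken from the mutant vector, which is the "at least one parameter" rule in action.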

More details about DE and the usage with PGAPack can be found in the section Differential Evolution and in the Mutation sections of the PGAPack documentation.

To use the OpenSCAD model for optimization, we convert it to a greyscale heightmap using an STL to PNG converter (STL is a 3D format that can be generated by OpenSCAD). The evaluation function then simply returns the greyscale value at the given location (after rounding the X and Y values of the gene to integers). Locations outside the picture evaluate to 0 (black). The resulting optimization run is shown below in an animated image. The population is quite small: there are only 6 individuals. We clearly see that when the differences between individuals get large (on the straight ridges of the parcours), the optimization proceeds in larger steps. We also see that the algorithm slows down in the flat corners, which can take quite some time to escape from. Finally, in the last corner, the cone is climbed and the individuals converge almost to a single point. The algorithm is stopped when the first individual returns an evaluation of the largest greyscale value (representing white). Note that I cheated a little: many optimization runs take longer (in particular, the optimization can get stuck for many generations in the flat parts of the ridge), and I selected a run that produced a good animation. So the selected run is not an average run but one of the shorter ones.
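The evaluation function described above fits in a few lines of Python. This sketch assumes (hypothetically) that the heightmap is already loaded as a list of rows of greyscale values from 0 to 255, and that X selects the column and Y the row; note that Python's round() rounds halves to the nearest even integer.

```python
def evaluate(heightmap, x, y):
    """Greyscale value (0..255) of the heightmap at gene position (x, y).

    heightmap is a list of rows; x selects the column and y the row after
    rounding to the nearest integer. Positions outside the picture get 0.
    """
    col, row = round(x), round(y)
    if 0 <= row < len(heightmap) and 0 <= col < len(heightmap[0]):
        return heightmap[row][col]
    return 0
```

The stopping criterion then simply checks whether any individual evaluates to 255 (white).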

Notes on Premature Convergence in Genetic Algorithms

Ralf Schlatterbeck — Fri, 06 Jan 2023 17:45:00 GMT

This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations.

When optimizing solutions with GA, it sometimes happens that a sub-optimal solution is found because the whole population converges early to a part of the search space where no better solutions are found anymore. This effect is called premature convergence.

It is usually hard to detect premature convergence; a good measure is the mean Hamming distance between individuals. In PGAPack, reporting of the Hamming distance can be enabled with the reporting option PGA_REPORT_HAMMING, set with the function PGASetPrintOptions in PGAPack and with the print_options constructor parameter in PGApy. Unfortunately, this is only implemented for the binary data type.
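As an illustration of the measure, here is a minimal Python sketch of the mean Hamming distance over all pairs of individuals. The function name is hypothetical, and PGAPack's reported value may be computed or normalized differently; the quadratic number of pairs is fine for a sketch but not for large populations.

```python
from itertools import combinations


def mean_hamming_distance(population):
    """Mean number of differing alleles over all pairs of individuals.

    Needs at least two individuals.
    """
    pairs = list(combinations(population, 2))
    total = sum(sum(a != b for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)
```

A value that drops toward 0 over the generations indicates that the population has (prematurely or not) converged.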

One reason for premature convergence is the use of Fitness Proportionate Selection, as detailed in an earlier post in this blog [1]. If during the early stages of the search an individual is discovered that is much better than anything found so far, chances are high that this individual takes over the whole population when Fitness Proportionate Selection is in use, preventing any further progress of the algorithm. The reason is that an individual that is much better than all others gets an unreasonably large share of the roulette wheel when individuals are selected for the next generation, so all other genetic material has only a slim chance of making it into the next generation.

Another reason for premature convergence can be a small population size. Goldberg et al. [9] give estimates of the population size for a classic GA with a small alphabet of cardinality \(\chi\) (the number of different allele values, e.g. \(\chi = 2\) for a binary GA with alleles 0 and 1) and a problem-specific building block size \(k \ll n\) that is needed to overcome deception, where \(k\) (a measure for the difficulty of the problem) is the building block size and \(n\) is the string (gene) length. They show that the population size should be \(O(m\chi^k)\) with \(m=n/k\), so that for a given problem difficulty \(k\) the population size is proportional to the string length \(n\). This result, however, does not readily translate to problems with a large alphabet, e.g. floating-point representations like the real data type in PGAPack. For floating-point representations, difficulty usually translates to how multimodal a problem is, i.e., how many peaks (in case of a maximization problem) or valleys (in case of a minimization problem) there are in the objective function.
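To make the scaling concrete, here is a small worked example for a binary alphabet (\(\chi = 2\)) with made-up values \(k = 4\) and \(n = 100\) or \(n = 200\). The constant factor hidden in the \(O\)-notation is ignored, so the numbers only show how the estimate grows, not actual recommended population sizes:

\begin{align*} n = 100: \quad m &= \frac{n}{k} = \frac{100}{4} = 25, & m\chi^k &= 25 \cdot 2^4 = 400 \\ n = 200: \quad m &= \frac{n}{k} = \frac{200}{4} = 50, & m\chi^k &= 50 \cdot 2^4 = 800 \end{align*}

Doubling the string length doubles the estimate, while the exponential factor \(\chi^k\) is what makes harder problems (larger \(k\)) expensive.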

If premature convergence still occurs with an adequate population size and an appropriate selection scheme, there are some mechanisms that can be used.

Prevention of Duplicates

PGAPack implements a mechanism for preventing duplicate gene strings (individuals). In previous implementations, the computational effort was quadratic in the population size \(N\), i.e. \(O(N^2)\): each new individual was compared with all current members of the population. The latest versions use hashing to detect duplicates, reducing the overhead to \(O(N)\) overall, i.e. a small constant overhead for each new individual.

For user-defined data types this means that the user has to define a hash function for the data type in addition to the string comparison function. For the built-in data types (binary, integer, real) this is automatically available.
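The idea behind the hash-based check can be sketched in Python with a set of tuples. This is not PGAPack's C implementation, just an illustration under the assumption that genes are lists of hashable alleles; the function name is hypothetical.

```python
def add_without_duplicates(population, seen, candidate):
    """Append candidate unless an identical gene string is already present.

    seen is a set holding a tuple for each gene in population; since tuples
    are hashable, each membership test takes constant time on average
    instead of a comparison with every member.
    """
    key = tuple(candidate)
    if key in seen:
        return False
    seen.add(key)
    population.append(candidate)
    return True
```

Two real-valued genes such as [0.1, 0.2] and [0.1, 0.2000001] produce different tuples, so both are accepted even though they are practically identical.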

Prevention of duplicates works quite well for binary and integer data types, especially if the genetic operators have a high probability of producing duplicates. It does not work so well for the real data type, because new individuals are rarely exact duplicates of existing ones, even when they are very close to them.

Restricted Tournament Replacement

An algorithm originally called restricted tournament selection by Harik [2], [3], and later adopted under the name restricted tournament replacement (RTR) by Pelikan [4], uses the old and the new population to decide whether a candidate individual is allowed into the new population. It works by randomly selecting a number of members from the old population (called the selection window), choosing the individual that is most similar to the candidate, and allowing the candidate into the new population only if it is better than this most similar individual.

By default, the number of individuals randomly selected from the old population (the window size) is \(\min (n, \frac{N}{20})\) [4], where \(n\) is the string (gene) length and \(N\) is the population size. The window size can be set by the user with the PGASetRTRWindowSize function in PGAPack and with the rtr_window_size constructor parameter in PGApy.

The RTR algorithm needs a similarity metric to decide how close an individual is to another. By default this is the Manhattan distance (equivalent to the Hamming distance for binary genes), i.e. an allele-by-allele sum of distances, but it can be set to the Euclidean distance or to a user-defined metric with the user-function mechanism of PGAPack. The magic square example in PGApy allows comparing the metrics for RTR: the Euclidean distance can be turned on there with a command-line option.
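A minimal Python sketch of RTR for a minimization problem follows. It is a simplified in-place variant (the candidate directly replaces the most similar member of the window), not PGAPack's code; the function names are hypothetical, and the max(1, ...) guard for small populations is my own addition.

```python
import math
import random


def manhattan(a, b):
    """Allele-by-allele sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two genes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rtr_insert(pop, evals, candidate, cand_eval, window=None,
               distance=manhattan, rng=random):
    """Restricted tournament replacement for a minimization problem.

    Draws a random window of members, finds the one most similar to the
    candidate and replaces it (in place) only if the candidate is better.
    Returns True if the candidate was accepted.
    """
    n, N = len(candidate), len(pop)
    if window is None:
        # Default window size min(n, N/20), but at least one member
        window = max(1, min(n, N // 20))
    members = rng.sample(range(N), window)
    closest = min(members, key=lambda k: distance(pop[k], candidate))
    if cand_eval < evals[closest]:
        pop[closest] = candidate
        evals[closest] = cand_eval
        return True
    return False
```

Because a candidate only competes against its most similar neighbor, a very good individual cannot displace dissimilar individuals elsewhere in the search space, which keeps the population diverse.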

Restricted tournament replacement works well not only for binary and integer genes but also for real genes. It can be combined with different evolutionary algorithm settings.

The effect of RTR on a problem that tends to suffer from premature convergence can be seen in the test program examples/mgh/testprog.f, which implements several test functions from an old benchmark of nonlinear test problems [5]. The test function that exhibits premature convergence is what the authors call a "Gaussian function", described as example (9) in the paper [5] and implemented as function 3 in examples/mgh/objfcn77.f. This function is given as

\begin{equation*} f_i(x) = x_1 \cdot e^{-\frac{x_2(t_i - x_3)^2}{2}} - y_i \end{equation*}

with

\begin{equation*} t_i = (8 - i) / 2 \end{equation*}

and the tabulated values \(y_i\) given in the paper or in the implementation in examples/mgh/objfcn77.f. The minimization problem built from these equations is

\begin{equation*} f (x) = \sum_{i=1}^{m} f_i^2(x) \end{equation*}

with \(m=15\) for this test problem. The authors [5] give \((0.4, 1, 0)\) as the standard starting point and report a minimum of \(f = 1.12793 \cdot 10^{-8}\). The original Fortran implementation in examples/mgh/testprog.f uses a population size of 10000 with default settings for the real data type of PGAPack. The large population size is chosen because otherwise the problem exhibits premature convergence. After 100 generations it finds the solution \(x_0=(0.3983801, 0.9978369, 0.009918243)\) with an evaluation value of \(f(x_0)=2.849966\cdot 10^{-5}\). This needed 105459 function evaluations, a little less than \(10000 + 100 \cdot 1000\), i.e. the evaluation of the initial generation plus the evaluation of 10% of the population in each generation. The difference arises because the probabilities of crossover and mutation are not 100%: sometimes none of the operators is applied to an individual, and it is then not re-evaluated.
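For experimenting outside of PGAPack, the objective can be written in a few lines of Python. This sketch uses a negative exponent (the tabulated \(y_i\) form a bell curve; they are the standard normal density at \(t_i\) rounded to four digits); please cross-check the values in Y against examples/mgh/objfcn77.f. The function names are hypothetical.

```python
import math

# Tabulated y_i, i = 1..15, for the Gaussian function, example (9) in [5]
# (cross-check against examples/mgh/objfcn77.f)
Y = [0.0009, 0.0044, 0.0175, 0.0540, 0.1295, 0.2420, 0.3521, 0.3989,
     0.3521, 0.2420, 0.1295, 0.0540, 0.0175, 0.0044, 0.0009]


def residual(x, i):
    """f_i(x) = x1 * exp(-x2 * (t_i - x3)**2 / 2) - y_i, t_i = (8 - i) / 2."""
    t = (8 - i) / 2
    return x[0] * math.exp(-x[1] * (t - x[2]) ** 2 / 2) - Y[i - 1]


def gaussian_objective(x):
    """f(x) = sum of f_i(x)**2 for i = 1..m with m = 15."""
    return sum(residual(x, i) ** 2 for i in range(1, len(Y) + 1))
```

At the starting point (0.4, 1, 0) the objective is above \(10^{-6}\), while at the solution found with RTR, (0.398975, 1.000074, -6.719886e-5), it is well below \(10^{-7}\).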

I've implemented the same problem with Differential Evolution [6], [7], [8] in examples/mgh/testprogde.c. The driver program is implemented in C because I really do not speak Fortran, but it uses the same functions as the Fortran implementation; the Fortran and C code are linked into a common executable. This uses a population size of 2000 (large for most problems when using Differential Evolution, again because of premature convergence) and finds the solution \(x_0=(0.3992372, 0.9990791, -0.0007697923)\) with an evaluation value of \(f(x_0)=7.393123\cdot 10^{-7}\) in only 30 generations. This amounts to 62000 function evaluations (Differential Evolution creates all individuals for the new generation and decides afterwards which to keep, so each generation costs 2000 evaluations).

When using RTR with this problem in examples/mgh/testprogdertr.c, the population size can be reduced to 250, and even after 100 generations the search has not converged to a suboptimal solution. After 100 generations we find \(x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})\) and \(f(x_0)=1.339766\cdot 10^{-8}\) (with some changed settings of the Differential Evolution algorithm as well). This amounts to 25250 function evaluations.

Restart

A last resort, when the above mechanisms do not work, is to restart the GA regularly whenever the population has converged too much. The restart mechanism implemented in PGAPack uses the best individual from the population to re-seed a new population with variations created by mutation of this best individual. Restarts can be enabled by calling PGASetRestartFlag in PGAPack or with the restart constructor parameter in PGApy. The restart frequency (by default every 50 generations) can be set with PGASetRestartFrequencyValue in PGAPack and with the restart_frequency constructor parameter in PGApy.
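The re-seeding step can be sketched in Python as follows. This is only an illustration: keeping one unchanged copy of the best individual and the Gaussian mutation operator (with its sigma and prob parameters) are my own assumptions, not necessarily what PGAPack does, and the function names are hypothetical.

```python
import random


def gaussian_mutation(gene, rng, sigma=0.1, prob=0.5):
    """Hypothetical mutation: perturb each allele with probability prob."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < prob else g
            for g in gene]


def restart_population(best, pop_size, mutate, rng=random):
    """Re-seed a population of pop_size genes from the best individual.

    One unchanged copy of best is kept; all other members are mutated
    copies produced by mutate(gene, rng).
    """
    new_pop = [list(best)]
    while len(new_pop) < pop_size:
        new_pop.append(mutate(list(best), rng))
    return new_pop
```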

Proportional Selection (Roulette-Wheel) in Genetic Algorithms

Ralf Schlatterbeck — Sat, 03 Dec 2022 15:28:00 GMT

Some of you know that I maintain PGApack, a genetic algorithm package, and a corresponding Python wrapper, PGApy. Recently the question came up why PGApack errors out, complaining that it cannot normalize the fitness value, when Fitness Proportionate Selection (also known as Roulette-Wheel selection or proportional selection) is used.

In PGApack, the user supplies a real-valued evaluation function (which is called for each individual) and specifies if the result of that evaluation should be minimized or maximized (defining the optimization direction).

When using Fitness Proportionate Selection, this evaluation value must be mapped to a positive value, assigning each evaluation a share of the roulette wheel. If the optimization direction is minimization, small values are better and need a larger part of the roulette wheel so that they have a higher probability of being selected. So we need to remap the raw evaluation values to a nonnegative fitness that increases as the evaluation gets better. For a minimization problem, we compute the maximum (worst) evaluation, and the fitness is then the difference between this maximum and the evaluation of the individual (after scaling the maximum a little so that no fitness value is exactly zero):

\begin{equation*} F = E_{max} - E \end{equation*}

where \(F\) is the fitness of the current individual, \(E_{max}\) is the maximum of all evaluations in this generation, and \(E\) is the evaluation of that individual.
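A minimal Python sketch of this mapping and of the roulette wheel itself follows. How much \(E_{max}\) is moved up is an assumption here (1% of the range of evaluations); PGApack may scale it differently, and the function names are hypothetical.

```python
import bisect
import itertools
import random


def minimization_fitness(evaluations, margin=0.01):
    """Map evaluations of a minimization problem to fitness F = E_max - E.

    E_max is moved up slightly (assumption: by margin times the range of
    evaluations, or by margin itself if all evaluations are equal) so that
    the worst individual does not get a fitness of exactly zero.
    """
    e_max, e_min = max(evaluations), min(evaluations)
    shift = margin * (e_max - e_min) or margin
    return [(e_max + shift) - e for e in evaluations]


def roulette_select(fitness, rng=random):
    """Return an index chosen with probability proportional to its fitness."""
    cumulative = list(itertools.accumulate(fitness))
    r = rng.random() * cumulative[-1]
    # Guard against r landing exactly on the total due to rounding
    return min(bisect.bisect_right(cumulative, r), len(fitness) - 1)
```

For evaluations [1.0, 2.0, 3.0] this gives fitness values of approximately [2.02, 1.02, 0.02], so the best individual occupies about two thirds of the wheel.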

Now when evaluation values differ by several orders of magnitude, it can happen that the difference in that formula ends up being \(E_{max}\) for many different evaluation values. The error message calls this an overflow, which is probably not the best name for it.

That overflow happens when \(E_{max}\) is large compared to the current evaluation value \(E\) so that the difference ends up being \(E_{max}\) (i.e. the subtraction of \(E\) had no effect). In the code we check for this condition:

if (cmax == cmax - evalue && evalue != 0)

This condition triggers when subtracting the evaluation \(E\) from the current \(E_{max}\) does not change \(E_{max}\) even though \(E\) is not zero: \(E\) is so small compared to \(E_{max}\) that the double data type cannot represent the difference. With the default round-to-nearest arithmetic, this happens whenever the value to be subtracted is smaller than about half a unit in the last place of the significand (also called mantissa). Units in the last place are called ulps by Goldberg [1], who is not the same Goldberg who wrote the Genetic Algorithms book, as far as I know.

In our example, \(E_{max} = 1.077688 \cdot 10^{22}\) and the evaluation where this failed was \(E = 10000\). The IEEE 754 double-precision floating-point format has a 53-bit significand, so it can represent all integers exactly only up to \(2^{53} = 9007199254740992\), or about \(9 \cdot 10^{15}\). Since \(E_{max}\) lies between \(2^{73}\) and \(2^{74}\), one ulp at \(E_{max}\) is \(2^{73-52} = 2^{21} = 2097152\), so subtracting 10000 is far below what the significand can resolve. We can try this in Python (which uses double for floating-point values):

>>> 1.077688e22 - 10000 == 1.077688e22
True
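The same check can be written in Python, where math.ulp (available since Python 3.9) shows the size of one ulp at \(E_{max}\). The helper name difference_is_lost is hypothetical; it simply mirrors the C condition quoted above.

```python
import math


def difference_is_lost(cmax, evalue):
    """True if subtracting a nonzero evalue leaves cmax unchanged."""
    return cmax == cmax - evalue and evalue != 0


e_max = 1.077688e22
print(math.ulp(e_max))                   # 2097152.0, i.e. 2**21
print(difference_is_lost(e_max, 10000))  # True
```

Any nonzero evaluation smaller than about half of 2097152 is lost in the subtraction at this \(E_{max}\), so all of them would end up with the same fitness.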

Why do we make this check in the program? Letting the search continue with such an overflow (or whatever we want to call it) would map many different evaluation values to the same fitness, so the genetic algorithm could not distinguish these individuals.

So what can we do about it when that error happens?

The short answer: Use a different selection scheme. There is a reason why the default in PGApack is not proportional selection.

Fitness proportionate selection (aka roulette wheel selection) has other problems, too. It has too much selection pressure in the beginning and too little at the end (also mentioned in the Wikipedia article, but beware, I've written parts of it).

Blickle and Thiele [2] did a mathematical analysis of different selection schemes and showed that proportional selection is typically not a good idea. It was historically the first selection scheme and is described in the book by Goldberg (the other Goldberg) [3], which is probably the reason it is still being used. Note that there is an earlier report from Blickle and Thiele [4] that is more frank about the use of proportional selection: "All the undesired properties together led us to the conclusion that proportional selection is a very unsuited selection scheme. Informally one can say that the only advantage of proportional selection is that it is so difficult to prove the disadvantages" ([4], p. 42). They were not as outspoken in the final paper :-)

We're seeing this in the example above: we have very large differences between good and bad evaluations (in fact so large that the fitness cannot be computed, see above). So with proportional selection, the very good individuals are selected with too high a probability, resulting in premature convergence.

That all said: if you're doing optimization with genes of type 'real' (represented by double values in PGApack), you may want to try Differential Evolution [5], [6], [7]. At least in my experiments with antenna optimization [8] the results outperform the standard genetic algorithm, and this is also reported by several practitioners [7]. Examples of how to use this are in examples/deb/optimize.c or examples/mgh/testprogde.c in PGApack.

The crossover probability set with PGASetDECrossoverProb is critical: for problems where the dimensions cannot be optimized separately, the value should be set close to 1 but not equal to 1.

Epsilon-Constrained Optimization

Ralf Schlatterbeck — Mon, 29 Aug 2022 16:00:00 GMT

[Update 2022-10-18: Replace epsilon with delta in description of example problem 7 in pgapack]

Many optimization problems involve constraints that valid solutions must satisfy. A constrained optimization problem is typically written as a nonlinear programming problem [1].

\begin{align*} \hbox{Minimize} \; & f_i(\vec{x}), & i &= 1, \ldots, I \\ \hbox{Subject to} \; & g_j(\vec{x}) \le 0, & j &= 1, \ldots, J \\ & h_k(\vec{x}) = 0, & k &= 1, \ldots, K \\ & x_m^l \le x_m \le x_m^u, & m &= 1, \ldots, M \\ \end{align*}

In this problem, there are \(M\) variables (the vector \(\vec{x}\) is of size \(M\)), \(I\) objective functions, \(J\) inequality constraints, \(K\) equality constraints, and each variable \(x_m\) must be in the range \([x_m^l, x_m^u]\) (often called a box constraint). The functions \(f_i\) are called the objective functions. If there is more than one objective function, the problem is said to be a multi-objective optimization problem, as described previously in this blog [2]. In the following I will use some terms that were introduced in that earlier blog post without further explanation.

The objective functions are not necessarily minimized (as in the formula) but can also be maximized if the best solutions to a problem require maximization. Note that the inequality constraints are often written with a \(\ge\) sign, but the formula can easily be changed (e.g. by multiplying with -1) to use a \(\le\) sign.

Since it is very hard to fulfil equality constraints, especially if they involve nonlinear functions of the input variables, equality constraints are often converted to inequality constraints using a δ‑neighborhood:

\begin{equation*} -\delta \le h_k(\vec{x}) \le \delta \end{equation*}

where δ is chosen according to the requirements of the problem for which a solution is sought.

One very successful method of solving constrained optimization problems is to use a lexicographic ordering of constraints and objective function values. Candidate solutions to the optimization problem are first sorted by constraint violation (typically the sum over all violated constraints) and then by the objective function value(s) [1]. When comparing two individuals during selection in the genetic algorithm, there are three cases: if both individuals violate constraints, the individual with the lower constraint violation wins. If one violates constraints and the other does not, the one not violating constraints wins. In the last case, where neither individual violates constraints, the normal comparison is used (which depends on the algorithm and on whether we're minimizing or maximizing). This method, originally proposed by Deb [1], is implemented in the genetic algorithm package I'm maintaining, PGAPack, and in my Python wrapper PGAPy for it.
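The three cases can be written as a small Python comparison function. This is a sketch, not PGAPack's implementation: each individual is represented (hypothetically) as a pair of summed constraint violation and objective value, and the violation is the sum of the positive parts of the \(g_j(\vec{x})\).

```python
def violation(g_values):
    """Summed violation of inequality constraints g_j(x) <= 0 (0 if feasible)."""
    return sum(max(0.0, g) for g in g_values)


def deb_better(a, b, minimize=True):
    """True if individual a wins against b under Deb's comparison rules.

    a and b are (violation, objective) pairs.
    """
    va, fa = a
    vb, fb = b
    if va > 0 and vb > 0:   # both violate constraints: lower violation wins
        return va < vb
    if va > 0 or vb > 0:    # exactly one violates: a wins if b is the violator
        return vb > 0
    return fa < fb if minimize else fa > fb  # both feasible: normal comparison
```

Note that the objective value is never looked at as long as one of the two individuals violates constraints, which is why the search first concentrates on the constraints.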

With this algorithm for handling constraints, the constraints are optimized first before the algorithm "looks" at the objective function(s) at all. It often happens that the algorithm ends up searching in a region of the input space where no good solutions exist (but no constraints are violated). Hard problems often contain equality constraints (converted to inequality constraints as indicated earlier) or other "hard" constraints. In my previous blog post [2] on antenna optimization I wrote: "the optimization algorithm has a hard time finding the director solutions at all. Only in one of a handful experiments I was able to obtain the pareto front plotted above".

In that experiment I ran 50 searches, and only 5 of them did not get stuck in a local optimum. A similar thing happens for problem (7) in Deb's paper [1], which has equality constraints. I've implemented this as example problem 7 in PGAPack. It only finds a solution near the (known) optimum when \(\delta \ge 10^{-2}\) for all equality constraints (I didn't experiment with different random seeds for the optimizer; maybe a better solution would be possible with a different random seed). In the paper [1], Deb uses \(\delta = 10^{-3}\) for the same reason.

One method for handling this problem appealed to me because it is so easy to understand and implement: Takahama and Sakai first experimented with a method for relaxing constraints during the early stages of optimization with a formulation they called an α‑constrained genetic algorithm [3]. They later simplified the formulation and called the resulting algorithm ε constrained optimization. It can be applied to different optimization algorithms, not just genetic algorithms and their variants [4]. Of special interest is the application of the method to Differential Evolution [5], [6], but of course it can also be applied to other forms of genetic algorithms.

Note that the ε in the name of the algorithm can be used for the δ used when converting an equality constraint to inequality constraints but is not limited to this case.

During the run of the optimizer, a new value for ε is computed in each generation. The comparison of individuals outlined above is modified so that an individual is treated as if it were not violating any constraints if its constraint violation is below ε. So if both individuals have constraint violations larger than ε, the one with the lower violation wins. If one violation is below ε and the other above, the individual with the violation below ε wins. And finally, if the constraint violations of both individuals are below ε, the normal comparison takes place.
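Compared with the comparison without relaxation, only the feasibility test changes. Here is a Python sketch using the same hypothetical (violation, objective) representation; I treat a violation less than or equal to ε as no violation, so that with ε = 0 the comparison reduces to the one without relaxation.

```python
def eps_better(a, b, eps, minimize=True):
    """True if a wins against b under the epsilon-constrained comparison.

    a and b are (violation, objective) pairs; a violation <= eps is
    treated as no violation at all.
    """
    va, fa = a
    vb, fb = b
    a_ok, b_ok = va <= eps, vb <= eps
    if not a_ok and not b_ok:   # both above eps: lower violation wins
        return va < vb
    if a_ok != b_ok:            # only one within eps: that one wins
        return a_ok
    return fa < fb if minimize else fa > fb  # both within eps: objective
```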

The last case is the key to the success of this algorithm: Even though the search proceeds into a direction where the constraint violations are minimized, at the same time good solutions in terms of the objective function are found.

The algorithm begins by initializing \(\varepsilon_0\) with the constraint violation of the individual with index \(\theta\) in the initial population sorted by constraint violation, where \(\theta\) is a parameter of the algorithm between 1 and the population size. A good choice is the individual at about 20% of the population size, which is also the default in PGAPack. In each generation \(t\), \(\varepsilon_t\) is computed by

\begin{equation*} \varepsilon_t = \varepsilon_0 \left(1-\frac{t}{T_c}\right)^{cp} \end{equation*}

up to generation \(T_c\). After that generation, ε is set to 0. The exponent \(cp\) is between 2 and 10. The 2010 paper [6] recommends updating \(cp\) to \(0.3\, cp + 0.7\, cp_\min\) at generation \(T_\lambda = 0.95 T_c\), where \(cp_\min\) is the fixed value 3. The initial value of \(cp\) is chosen so that \(\varepsilon_\lambda=10^{-5}\) at generation \(T_\lambda\), unless it is smaller than \(cp_\min\), in which case it is set to \(cp_\min\). PGAPack implements this schedule for \(cp\) by default but allows changing \(cp\) at the start and during the run of the optimizer, so it is easy to implement a different schedule for \(cp\); the default works quite nicely, though.
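The schedule can be sketched in Python. The function names are hypothetical, \(\varepsilon_0\) must be positive for the logarithm, and the initial \(cp\) is obtained by solving \(\varepsilon_0 (1 - 0.95)^{cp} = 10^{-5}\) for \(cp\); update_cp only computes the recommended update and leaves it to the caller to decide when to apply it.

```python
import math


def initial_cp(eps0, cp_min=3.0, eps_lambda=1e-5, ratio=0.95):
    """Exponent cp with eps0 * (1 - ratio)**cp == eps_lambda, at least cp_min.

    ratio is T_lambda / T_c; eps0 must be positive.
    """
    cp = math.log(eps_lambda / eps0) / math.log(1 - ratio)
    return max(cp, cp_min)


def epsilon(t, eps0, t_c, cp):
    """eps_t = eps0 * (1 - t / T_c)**cp before generation T_c, 0 afterwards."""
    if t >= t_c:
        return 0.0
    return eps0 * (1 - t / t_c) ** cp


def update_cp(cp, cp_min=3.0):
    """Recommended update of the exponent: 0.3 * cp + 0.7 * cp_min."""
    return 0.3 * cp + 0.7 * cp_min
```

For example, with \(\varepsilon_0 = 100\) and \(T_c = 100\), initial_cp returns about 5.38, and epsilon(95, 100.0, 100, cp) is then \(10^{-5}\) as intended.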

With the ε constraint method, example 7 from Deb [1] can be optimized with a precision of \(10^{-6}\) in my experiments; see the epsilon_generation parameter in the optimizer example.

The antenna-optimizer with an ε‑generation of 50 (that's the \(T_c\) parameter of the algorithm) gets stuck in the local optimum in only one of 50 cases; all other cases find good results:

In that picture all the solutions that are dominated by solutions from another run are drawn in black. It can be seen that the data from run number 16 did not contribute any non-dominated solutions (on the right side in the legend the number 16 is missing). You can turn off the display of the dominated solutions by clicking on the black dot in the legend.

When I increase the ε‑generation to 60, the run with random seed 16 also finds a solution:

We also see that the solutions of all runs are quite good: the black "shadow" of the dominated solutions lies close to the actual Pareto front, so a single run of the algorithm is enough to find a good set of solutions.

Multi-Objective Antenna Optimization

Ralf Schlatterbeck — Mon, 27 Dec 2021 17:05:00 GMT

For quite some time I have been optimizing antennas using genetic algorithms. I'm using the pgapack parallel genetic algorithm package, originally by David Levine from Argonne National Laboratory, which I maintain. For even longer than I have been maintaining pgapack, I have been developing a Python wrapper for it called pgapy.

For the antenna simulation part I'm using Tim Molteno's PyNEC, a Python wrapper for NEC++, the C++ implementation of the Numerical Electromagnetics Code (NEC) version 2.

Using these packages, I've written a small open source framework to optimize antennas called antenna-optimizer. It can use the traditional genetic algorithm method with bit strings as genes as well as a floating-point representation with operators suited for floating-point genes.

The "parallel" in pgapack's name tells us that the evaluation function of the genetic algorithm can be parallelized. When optimizing antennas, we simulate each candidate set of antenna parameters using the antenna simulation of PyNEC. Antenna simulation is still a CPU-intensive undertaking (the original NEC code is from the 1980s and was conceived using punched cards for I/O). So the fact that with pgapack we can run many simulations in parallel using the Message Passing Interface (MPI) standard [1] is good news.

For pgapack – and also for pgapy – I've recently implemented some classic algorithms that have proven very useful over time:

Differential Evolution [2], [3], [4] is a very successful optimization algorithm for floating-point genes that is very interesting for electromagnetics problems
The elitist Nondominated Sorting Genetic Algorithm NSGA-II [5] allows optimizing multiple objectives in a single run of the optimizer
We can have constraints on the optimization using constraint functions that are minimized. For a solution to be valid, all constraints must be zero or negative [6].

Traditionally, a genetic algorithm supports only a single evaluation function, also called the objective function. With NSGA-II it is possible to have several objective functions. We call such an algorithm a multi-objective optimization algorithm.

For antenna simulation this means that we don't need to combine different antenna criteria like gain, forward/backward ratio, and standing wave ratio (VSWR) into a single evaluation function which I was using in antenna-optimizer, but instead we can specify them separately and leave the optimization to the genetic search.

With multiple objectives, however, a solution that is better in one objective is typically worse in another objective and vice versa. So instead of a single best solution, we search for a set of solutions that are not dominated by other solutions. A solution is said to dominate another solution when it is strictly better in at least one objective and not worse in any other objective. Solutions that are not dominated by any other solution are called Pareto-optimal, named after the Italian scientist Vilfredo Pareto, who first defined the concept of Pareto optimality. The Pareto-optimal solutions are said to lie on a Pareto front. For two objectives, the Pareto front can be shown in a scatter plot, as we will see below.
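The dominance relation and the resulting front can be sketched in Python. This assumes that all objectives are maximized (as gain and forward/backward ratio are); the helper names and the sample points are made up for illustration, not simulation results. The quadratic filter is fine for a sketch, while NSGA-II uses a more efficient nondominated sorting.

```python
def dominates(a, b):
    """True if objective vector a dominates b, all objectives maximized:
    a is nowhere worse and strictly better in at least one objective."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Points that are not dominated by any other point (quadratic filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (gain, F/B ratio) pairs for illustration, not simulation results
points = [(7.0, 20.0), (6.5, 25.0), (6.0, 18.0), (7.0, 19.0)]
front = pareto_front(points)  # [(7.0, 20.0), (6.5, 25.0)]
```

Neither point on the front is better in both objectives than the other; trading gain for forward/backward ratio is exactly what the scatter plots below show.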

Since pgapack follows a "mix and match" approach to genetic algorithms we can combine successful strategies for different parts of a genetic algorithm:

We can use Differential Evolution just for the mutation/crossover part of the genetic algorithm
We can combine this with the nondominated sorting replacement of NSGA-II
We can define some of our objectives as constraints. For our problem it makes sense to only allow antennas that do not exceed a given standing-wave ratio. So we do not allow antennas with a VSWR > 1.8. The necessary constraint function is \(S - 1.8 \le 0\) where \(S\) is the voltage standing wave ratio (VSWR).

With this combination we can successfully compute antennas for the 70 cm ham-radio band (430 MHz to 440 MHz). The antenna consists of what we call a folded dipole (the element with the rounded corners) and a straight element. The dimensions shown in the figure are the lengths optimized by the genetic algorithm. The two dots in the middle of the folded dipole element mark the point where the antenna feed line is connected.

A first example simulates the antenna at the lowest, the medium, and the highest frequency of the band. Gain and forward/backward ratio are computed for the medium frequency only:

In this graph (a scatter plot) the first objective, the gain, is plotted against the second objective, the forward/backward ratio. All values are taken at the medium frequency. Each dot represents a simulated antenna. All antennas have a VSWR below 1.8 at the minimum, medium, and maximum frequency.
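
The evaluation in this first example can be sketched like this: the two objectives come from the medium frequency, and there is one VSWR constraint per simulated frequency. The `FreqResult` record, the `evaluate` function, and all numbers are hypothetical; how the values are actually passed to PGApy is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FreqResult:
    """Hypothetical per-frequency result, e.g. parsed from a NEC simulation."""
    gain_dbi: float
    fb_db: float
    vswr: float


def evaluate(results: Dict[float, FreqResult], f_mid: float,
             vswr_limit: float = 1.8) -> Tuple[List[float], List[float]]:
    """Objectives (both maximized) from the medium frequency only;
    one VSWR constraint S - vswr_limit <= 0 per simulated frequency."""
    mid = results[f_mid]
    objectives = [mid.gain_dbi, mid.fb_db]
    constraints = [r.vswr - vswr_limit for r in results.values()]
    return objectives, constraints


# Made-up numbers for the lowest, medium and highest frequency (MHz)
results = {
    430.0: FreqResult(gain_dbi=6.4, fb_db=10.0, vswr=1.6),
    435.0: FreqResult(gain_dbi=6.6, fb_db=11.0, vswr=1.2),
    440.0: FreqResult(gain_dbi=6.8, fb_db=9.5, vswr=1.7),
}
objectives, constraints = evaluate(results, f_mid=435.0)
```

Note that gain and F/B ratio at the band edges do not influence the objectives at all in this variant; only the VSWR is checked there.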

With this success I started experimenting with different settings of the Differential Evolution parameters. It is well known that Differential Evolution performs better on decomposable problems with a low crossover rate and better on non-decomposable problems with a high crossover rate; this was first observed by Salomon in 1996 [7]. A decomposable problem is one where the different dimensions can be optimized separately. I had been using a crossover rate of 0.2, and I hoped that the optimization would be better and faster with a higher crossover rate. The experiment below uses a crossover rate of 0.9.
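
To make the role of the crossover rate concrete, here is a minimal sketch of how one trial vector is generated with the classic DE/rand/1/bin scheme of Storn and Price. With a low crossover rate only a few parameters are taken from the mutant vector, so the search mostly moves along single dimensions, which suits decomposable problems; with a high rate most parameters change together. The `dither` argument randomizes the scale factor \(F\) per generated vector, the setting discussed in the next paragraph; the uniform \(\pm\)dither form used here is an assumption and not necessarily how PGAPack implements it. All names are hypothetical.

```python
import random
from typing import List, Sequence


def de_rand_1_bin(pop: Sequence[Sequence[float]], target_idx: int, f: float,
                  cr: float, dither: float, rng: random.Random) -> List[float]:
    """Generate one trial vector with DE/rand/1/bin.

    Mutant: v = x_r1 + F' * (x_r2 - x_r3), using three distinct random members
    other than the target. Assumed dither form: F' is drawn uniformly from
    [f - dither, f + dither] once per trial vector.
    Binomial crossover: each parameter comes from the mutant with probability
    cr; one random index always does, so the trial differs from the target."""
    others = [i for i in range(len(pop)) if i != target_idx]
    r1, r2, r3 = rng.sample(others, 3)
    f_eff = f + rng.uniform(-dither, dither)
    mutant = [a + f_eff * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]
    target = pop[target_idx]
    j_rand = rng.randrange(len(target))
    return [m if (j == j_rand or rng.random() < cr) else t
            for j, (t, m) in enumerate(zip(target, mutant))]


def mean_from_mutant(cr: float, n: int = 10, trials: int = 10000, seed: int = 1) -> float:
    """Average number of parameters taken from the mutant. The target is all
    zeros and the other members are all ones, so every mutant parameter is
    exactly 1.0 and the sum of a trial vector counts them."""
    rng = random.Random(seed)
    pop = [[0.0] * n] + [[1.0] * n for _ in range(4)]
    total = 0.0
    for _ in range(trials):
        total += sum(de_rand_1_bin(pop, 0, f=0.8, cr=cr, dither=0.2, rng=rng))
    return total / trials
```

The expected number of parameters taken from the mutant is \(1 + CR\,(n - 1)\): with \(n = 10\) parameters that is 2.8 for \(CR = 0.2\) and 9.1 for \(CR = 0.9\). The value \(F = 0.8\) is only an example, not necessarily the value used in these experiments.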

In addition, I was experimenting with dither: Differential Evolution allows the scale factor \(F\), by which the difference of two vectors is multiplied, to be changed slightly at random for each generated variation. In the first implementation I had set the dither to 0; now I used a dither of 0.2. Imagine my surprise when with these settings I found a completely different Pareto front:

To make it easier to see that the second front completely dominates the first one, I've plotted the two fronts in a single graph:

Since the second front looks too good to be true for a two-element antenna (over the whole frequency range), let's take a look at what is happening here. First we show the orientation of the antenna and the computed gain pattern for one of the antennas from the middle of the lower front:

As already indicated in the Pareto-front plot, the antenna has a gain of about 6.6 dBi and a forward/backward ratio of about 11 dB in the middle of the band at 435 MHz. The colors on the antenna denote the currents on the antenna structure. If you want to look at this yourself, here is a link to the NEC input file for antenna 1.

Now let's compare this with one of the antennas of the "orange front", which has much better values:

This antenna is in the middle of the Pareto front above and has a gain of about 6.7 dBi and a forward/backward ratio of about 16 dB in the middle of the band at 435 MHz. Can you spot the difference from the first antenna? Yes: the maximum gain points in the opposite direction. For the first antenna the straight element acts as a reflector, while for the second antenna it acts as a director. If you want to look at this yourself, here is a link to the NEC input file for antenna 2.
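
Because the two designs radiate in opposite directions, the forward/backward ratio has to be measured relative to each antenna's own forward direction. A minimal sketch, assuming the forward direction is the direction of maximum gain and the backward direction is 180° away (other definitions, for example a fixed boresight or a rear sector, are also in use). The pattern format and the numbers are made up.

```python
from typing import Sequence, Tuple


def forward_backward(pattern: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (forward azimuth in degrees, forward gain in dBi, F/B ratio in dB).

    pattern holds (azimuth_deg, gain_dbi) samples in the horizontal plane; the
    grid must contain the direction opposite the gain maximum. The forward
    direction is taken to be the direction of maximum gain."""
    gains = {round(az % 360.0, 6): g for az, g in pattern}
    az_fwd, g_fwd = max(gains.items(), key=lambda item: item[1])
    g_back = gains[round((az_fwd + 180.0) % 360.0, 6)]
    # Both gains are in dB, so the power ratio becomes a difference.
    return az_fwd, g_fwd, g_fwd - g_back


# Made-up patterns: same shape, but the maxima point in opposite directions
pattern_1 = [(0.0, 6.6), (90.0, 0.0), (180.0, -4.4), (270.0, 0.0)]
pattern_2 = [(0.0, -4.4), (90.0, 0.0), (180.0, 6.6), (270.0, 0.0)]
```

Both made-up patterns yield the same gain and F/B ratio, only with the forward azimuth at 0° for the first and at 180° for the second.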

Next we look at gain and forward/backward ratio plotted over frequency. The plot for the first antenna (with the reflector element) is on the left, the plot for the antenna with the director element is on the right.

We see that the forward/backward ratio of the director antenna ranges from more than 10 dB to more than 25 dB, while that of the reflector design ranges from 9.3 dB to 11.75 dB. For the minimum gain the reflector design is slightly better (6.35-6.85 dBi vs. 6.3-7.05 dBi). So this needs further experiments. When forcing a reflector design and changing the evaluation function to return the minimum gain and the minimum F/B ratio over the three frequencies (start, middle, end), we get:

For a director design (again using the minimum gain and F/B ratio over the three frequencies start, middle, end) we get:

With these results, the sweet spot for an antenna to build is probably at or above 10 dB F/B ratio with a gain of about 6.2 dBi. Gaining another tenth of a dBi while sacrificing several dB of F/B ratio doesn't seem sensible. Comparing the director and reflector designs we notice (contrary to at least my intuition) that the director design has a better F/B ratio over the whole frequency range. If, however, the antenna is to be used for relay operation, where the sending frequency (the relay input) is in the lower half of the frequency range and the relay output (the receiving frequency) is in the upper half, we will probably choose a reflector design, because there the gain is higher when sending and the F/B ratio is higher when receiving (compare the two earlier gain and F/B ratio plots).
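
Two small sketches tie the last steps together: the worst-case evaluation used for the last two fronts, which returns the minimum gain and minimum F/B ratio over the three simulated frequencies, and the selection rule above (keep designs with at least 10 dB F/B ratio, then take the highest gain). Function names and numbers are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def evaluate_worst_case(gain_dbi: Dict[float, float], fb_db: Dict[float, float],
                        vswr: Dict[float, float], vswr_limit: float = 1.8
                        ) -> Tuple[List[float], List[float]]:
    """Objectives: minimum gain and minimum F/B ratio over all simulated
    frequencies (both maximized); constraints: S - vswr_limit per frequency."""
    objectives = [min(gain_dbi.values()), min(fb_db.values())]
    constraints = [s - vswr_limit for s in vswr.values()]
    return objectives, constraints


def pick_design(front: Sequence[Tuple[float, float]],
                min_fb_db: float = 10.0) -> Optional[Tuple[float, float]]:
    """front: (gain_dbi, fb_db) pairs. Return the highest-gain point whose
    F/B ratio is at least min_fb_db, or None if there is none."""
    candidates = [p for p in front if p[1] >= min_fb_db]
    return max(candidates, key=lambda p: p[0]) if candidates else None


# Made-up values at start, middle and end of the band (MHz)
objectives, constraints = evaluate_worst_case(
    {430.0: 6.3, 435.0: 6.6, 440.0: 6.9},
    {430.0: 12.0, 435.0: 16.0, 440.0: 10.5},
    {430.0: 1.5, 435.0: 1.2, 440.0: 1.7},
)
choice = pick_design([(6.1, 14.0), (6.2, 10.5), (6.3, 7.0)])
```

Using the minimum over the band makes each objective describe the worst case, so a design only scores well if it performs well across the whole frequency range.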

Also note that the optimization algorithm has a hard time finding the director solutions at all: only in one of a handful of experiments was I able to obtain the Pareto front plotted above. The director design is more narrowband than the reflector design, and the algorithm often converges to a local optimum. The larger variation of gain and F/B ratio over the band for the director design also suggests that it will be harder to build: if the dimensions are not exactly right, the antenna will probably not reach the simulated performance. The reflector design is a little more tolerant in this regard.