Chipselect Glitch with SPI NOR on IMX-28


For a customer I'm maintaining a Linux environment (including the kernel) for a custom board that uses a module based on the IMX-28 ARM processor. For the user space I'm using ELBE, the Embedded Linux Build Environment, which allows building a user-space filesystem from a Debian Linux release. When migrating from an earlier Debian version to the current stable release, code-named Bookworm, we needed a kernel upgrade because systemd requires some features not present in earlier kernels.

After upgrading from a kernel in the 4.14 longterm support series to the 6.1 longterm support series, our SPI NOR flash memory was no longer working. The JFFS2 file system we're using produced errors indicating a problem writing to the NOR flash (some error messages showed that the file system expected bytes to be 0 that were in fact 0xFF, which led to the conclusion that these bytes had never been written).

After verifying that the clock rate of the SPI bus when writing to the flash device was unchanged between kernel versions, I looked at the signals of the SPI bus.

image of glitch in chip select

Glitch in Chip Select During Transfer

image of glitch in chip select higher temporal resolution

Glitch in Chip Select During Transfer: Time Zoom

From the oscilloscope pictures (blue is the chip select, yellow is the SPI clock) it can be seen that there is a short glitch in the chip select line. This terminates the page program command to the NOR flash prematurely. The zoomed-in version shows that the chip select glitches in the middle of a bit during the transfer. The exact position changes slightly from transfer to transfer (it doesn't hit the same bit every time, but it does hit the same byte).

Further experimentation showed that the problem occurs only when the SPI transfer is initiated via direct memory access (DMA), not when the non-DMA version (programmed input/output, PIO) is used. The SPI driver optimizes transfers: only longer transfers are done via DMA, short transfers use PIO because the DMA overhead exceeds the gain. Looking into the DMA implementation for the IMX-28 in drivers/dma/mxs-dma.c it turns out that there was a change in a flag passed to the DMA transfer. The old version has:

if (flags & DMA_CTRL_ACK)
        ccw->bits |= CCW_WAIT4END;

The new version has:

if (flags & MXS_DMA_CTRL_WAIT4END)
        ccw->bits |= CCW_WAIT4END;

According to the IMX-28 processor reference, the bit CCW_WAIT4END means: "A value of one indicates that the channel waits for the end of command signal to be sent from the APBH device to the DMA before starting the next DMA command." This flag is set on the actual DMA transfer to the NOR flash. The IMX-28 is special in that it can also transfer register settings via DMA. Prior to the memory transfer, a register transfer is initiated that sets the IGNORE_CRC bit in the processor's SSP register. From the processor reference on the IGNORE_CRC bit: "In SPI/SSI modes: When set to 1, deassert the chip select (SSn) pin after the command is executed".

So when CCW_WAIT4END is not set, the chip select is deasserted too early.

The DMA transfer initiated from the SPI driver in drivers/spi/spi-mxs.c does not set the MXS_DMA_CTRL_WAIT4END flag, so the CCW_WAIT4END is not set on the transfer.

Now the fix is obvious: we simply need to set this flag on the DMA transfer initiated by the SPI driver in drivers/spi/spi-mxs.c.

Linux Kernel route_localnet setting


The Linux kernel has a per-interface setting that allows routing the local net (127.0.0.0/8 for IP version 4, also known as the loopback range because it is typically assigned to the lo loopback interface). It can be accessed via:

/proc/sys/net/ipv4/conf/eth0/route_localnet

for the typical eth0 network interface. The values that can be written there are either 0 (the default) for no routing or 1 for routing. Each network interface has its own special file. To make the setting persistent, on most Linux distributions it can be configured in /etc/sysctl.conf. The setting takes the form of the path above with the /proc/sys/ prefix removed and all / characters replaced by dots. For the example above this would be written as:

net.ipv4.conf.eth0.route_localnet = 1

Now what does this setting achieve?

In the following I'm using examples from IP version 4 but the same applies to the loopback address ::1 of IP version 6.

Typically the Linux network stack drops IP packets that either originate from the local net 127.0.0.0/8 or have a destination in that network when they come in over an interface that is not configured for this address range. We'll first verify this in the following.

For testing we use a simple UDP server in Python that receives UDP packets and prints their source IP address and port (and the length of the packet):

import socket
[...]
sock = socket.socket (socket.AF_INET, socket.SOCK_DGRAM,
                      socket.IPPROTO_UDP)
sock.bind ((args.address, args.port))
while True:
    buf, adr = sock.recvfrom (1024)
    l = len (buf)
    print ('Received %(l)s bytes from %(adr)s' % locals ())

All the values in args come from the command line via ArgumentParser. The address to bind to is 0.0.0.0 in the following, meaning it accepts packets from all network interfaces.

Now we can first test with a simple client. In the following we have two machines, .9 is the sending machine and .5 is the receiving machine.

When sending a UDP packet we can use the following simple code:

import socket
sock = socket.socket (socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.sendto (b'This is a test', (args.address, args.port))

Again we can specify the parameters on the command line. Now when we send from .9 to .5 we get:

Received 14 bytes from ('XXXXXXX.9', 38232)

where the source port 38232 is a random port chosen by the sending Linux kernel and the XXXXXXX part represents the high bytes of the IP address of the sender. When sending from the receiving machine to 127.0.0.1 we get:

Received 14 bytes from ('127.0.0.1', 58196)

We see that the simple server listens on both the real Ethernet interface and the loopback interface.

Nothing new so far. Let's see what happens when we fake addresses. For creating arbitrary IP packets (or even non-IP packets) we use the Python package Scapy in the following.

Consider the following little script:

from scapy.all import Ether, IP, UDP, Raw, sendp

p = Ether (dst = args.mac_address, src = args.mac_address) \
    / IP (dst = args.address, src = args.source_address) \
    / UDP (dport = args.port, sport = args.source_port) \
    / Raw (b'This is a test')
sendp (p, iface = args.interface)

This builds an Ethernet (Ether) packet with an IP payload. The IP packet contains a UDP payload, which in turn contains a Raw string. This packet is then sent via a raw (network layer 2) network socket. Again all parameters come from the command line. The default for all the ports is 22222, the source IP is the XXXXXXX.9 address, and the destination IP is the XXXXXXX.5 address. The destination and source MAC addresses are both set to the MAC address of the destination machine. We can replicate what we did with our simple client above:

sudo scapyclient.py

which results in the server printing:

Received 14 bytes from ('XXXXXXX.9', 22222)

It will always be port 22222 unless we specify a port option, since this is the default in the settings of the script. We need to use sudo because only the superuser is allowed to send arbitrary packets on layer 2 of the network stack.

Now what happens when we start faking packets? For this we start tcpdump on both the sending and the receiving machine as follows:

sudo tcpdump -n -i eth0 udp port 22222

Of course if your network interface is not called eth0 you would replace the argument to the -i option above. The -n option tells tcpdump not to attempt DNS lookups of received IP addresses. Now let's send a packet with a localhost address as the source address:

sudo scapyclient.py --source-address=127.0.0.1

We see it on both machines in the tcpdump output:

TT:TT:TT.TTTTTT IP 127.0.0.1.22222 > 10.23.5.5.22222: UDP, length 14

where TT:TT:TT.TTTTTT is a timestamp. So the packet is going out via the network interface of the sending machine and it is being received by the receiving machine (because the destination MAC address matches), but it is not printed by our simple server.

The same happens when we fake the destination address:

sudo scapyclient.py --address=127.0.0.1

The tcpdump output is:

TT:TT:TT.TTTTTT IP 10.23.5.9.22222 > 127.0.0.1.22222: UDP, length 14

It is again printed by tcpdump on both machines, and again it is not printed by our simple server.

Now let's see what happens if we enable route_localnet for our Ethernet interface (the following command only works as root, of course):

echo 1 > /proc/sys/net/ipv4/conf/eth0/route_localnet

We again fake the sender address:

sudo python3.11 scapyclient.py --source-address=127.0.0.1

Again the packet is printed by both running tcpdump processes but again not by our simple server. But faking the destination address:

sudo python3.11 scapyclient.py --address=127.0.0.1

Apart from the packet being logged by tcpdump, this time our simple server prints it:

Received 14 bytes from ('10.23.5.9', 22222)

Note that this works even if we tell our simple server to only listen to localhost:

python3 server.py --address=127.0.0.1

the server still prints our faked packet although it did not come in via the loopback interface:

Received 14 bytes from ('10.23.5.9', 22222)

Security implications

Some applications use local servers that listen only on the loopback address and expect them to receive packets only via the loopback interface, i.e. from local processes running on the same machine as the server.

This is always a dangerous assumption, so you should never rely on this type of (non-)authentication for anything security-critical. At least on Linux there is some safeguard as long as you keep route_localnet set to 0. On other operating systems you may not be so lucky. And even on Linux, as we have seen, this safeguard can be turned off.

Why would you want to set this?

Sometimes you want to forward connections that originate on the local machine and connect to a local port to a remote machine instead, e.g. you have in /etc/hosts

127.0.0.1 some.remote.example.com

and you want to connect to that machine with a process from localhost. You can add firewall rules that rewrite packets to achieve this:

iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 443 \
    -j DNAT --to-destination XXXXXXX.5:1443
iptables -t nat -A POSTROUTING -d XXXXXXX.5 -p tcp --dport 1443 \
    -j SNAT --to-source XXXXXXX.9

The first rule does the destination address (and port) rewriting while the second rule makes sure that answer packets reach us by rewriting the source address. But without setting route_localnet on the outgoing Ethernet interface this will not work; the packets will be dropped.

But when setting this, beware of the explanation in the Security implications section above, because it also has an impact on which packets are accepted from the network.

Optimizing Floating-Point Problems with Evolutionary Algorithms


[This is an extended abstract for a talk I'm planning]

Evolutionary Algorithms (EA) are a superset of genetic algorithms and related optimization algorithms. Genetic algorithms usually work with bits and small integer numbers.

There are other EAs that work directly with floating-point numbers, among them Differential Evolution (DE) [1] [2] [3].

The talk gives an introduction to the optimization of floating-point problems with DE. It uses examples from electrical engineering as well as from the optimization of actuation waveforms for inkjet printers. Piezo-electric inkjet printers use an actuation waveform to jet drops out of a nozzle. This waveform (among other parameters like the jetted fluid) determines the quality of the jetted drops.

For the software I'm using PGApy [4], the Python bindings for the Parallel Genetic Algorithm Package PGAPack [5], which was originally developed at Argonne National Laboratory. I have been maintaining both packages for some years now. Among other things, Differential Evolution (DE) and strategies for multi-objective optimization (using NSGA-II [6]) were newly implemented.

Differential Evolution Illustrated


This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations that support Differential Evolution (DE).

Differential Evolution (DE) [1], [2], [3] is a population based optimizer that is similar to other evolutionary algorithms. It is quite powerful and implemented in PGAPack (and usable in the Python wrapper PGAPy). To illustrate some points about this algorithm I've constructed a simple parcours in OpenSCAD.

 

/images/parcours.gif

 

To test the optimizer with this parcours, the gene in the genetic algorithm (DE calls this the parameters) consists of the X and Y coordinates giving a position on the parcours. In the following we also call these coordinates a vector, as is common in the DE literature. We initialize the population in the region of the start of the ramp but allow the population to leave the initialization range.

Note that the corners of the parcours are flat areas of constant height. Areas like this are generally difficult for optimization algorithms because the algorithm cannot know in which direction better values (if there are any at all) can be found. This is also the reason why we do not put the initialization range inside the flat area at the bottom of the parcours. Normally one would initialize the population over the whole area to be searched; in our case we want to demonstrate that the algorithm is well capable of finding a solution far from the initialization range.

The DE algorithm is quite simple (see, e.g., [3] p. 37-47): Each member \(\mathbf x_{i,g}\) of the population of the current generation \(g\), where \(i\) is the index of the population member, is crossed over with a mutant vector \(\mathbf v_{i,g}\). The mutant vector is generated by selecting three population members at random (without replacement) from the population and combining them as follows.

\begin{equation*} \mathbf v_{i,g} = \mathbf x_{r_0,g} + F \cdot (\mathbf x_{r_1,g} - \mathbf x_{r_2,g}) \end{equation*}

As can be seen, the weighted difference of two random members of the population is added to a third population member. The indices \(r_0, r_1,\) and \(r_2\) are pairwise different and different from \(i\). The weight \(F\) is a configuration parameter that is typically \(0.3 \le F < 1\). This algorithm is the classic DE, often termed de-rand-1-bin. A variant uses the index of the best individual instead of \(r_0\) and is called de-best-1-bin. If more than a single difference is added, we get further variants, e.g. de-rand-2-bin. The last component of the name, bin, refers to uniform crossover [4] with a binomial distribution: parameters from the mutant vector are selected with a certain probability \(Cr\). Note that DE makes sure that at least one parameter from the mutant vector is selected (otherwise the original vector \(\mathbf x_{i,g}\) would be unchanged).
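To make the mutation and crossover concrete, here is a minimal sketch of de-rand-1-bin in Python with NumPy. The population layout and the function name are my own for illustration, this is not the PGAPack API:

import numpy as np

rng = np.random.default_rng ()

def de_rand_1_bin (pop, i, F = 0.8, Cr = 0.9):
    # Build the trial vector for population member i (de-rand-1-bin)
    N, dim = pop.shape
    # Select r0, r1, r2 distinct from each other and from i
    choices = [r for r in range (N) if r != i]
    r0, r1, r2 = rng.choice (choices, size = 3, replace = False)
    mutant = pop [r0] + F * (pop [r1] - pop [r2])
    # Uniform (binomial) crossover with probability Cr,
    # at least one parameter is taken from the mutant
    take = rng.random (dim) < Cr
    take [rng.integers (dim)] = True
    return np.where (take, mutant, pop [i])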

More details about DE and the usage with PGAPack can be found in the section Differential Evolution and in the Mutation sections of the PGAPack documentation.

To use the OpenSCAD model for optimization, we convert it to a greyscale heightmap using an STL to PNG converter (STL is a 3D format that can be generated by OpenSCAD). The evaluation function then simply returns the greyscale value at the given location (after rounding the gene's X and Y values to integers). Positions outside the picture evaluate to 0 (black).

The resulting optimization run is shown below in an animated image. The population is quite small, only 6 individuals. We clearly see that when the differences between individuals get large (on the straight ridges of the parcours), the optimization proceeds in larger steps. We also see that the flat corners can take quite some time to escape from: in the corners the algorithm slows down. Finally, in the last corner, the cone is climbed and the individuals converge almost to a single point. The algorithm is stopped when the first individual returns an evaluation of the largest greyscale value (representing white). Note that I cheated a little: many optimization runs take longer (in particular the optimization can get stuck for many generations in the flat parts of the ridge) and I selected a run that produced a good animation. So the selected run is not the average but one of the shorter runs.

 

/images/animated.gif
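The evaluation function described above can be sketched as follows; the file name parcours.png and the use of Pillow are my assumptions for illustration, not necessarily what the actual code does:

from PIL import Image

img = Image.open ('parcours.png').convert ('L')  # 8-bit greyscale heightmap

def evaluate (x, y):
    # Greyscale value at the rounded gene position, 0 (black) outside
    px, py = int (round (x)), int (round (y))
    if 0 <= px < img.width and 0 <= py < img.height:
        return img.getpixel ((px, py))
    return 0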

 

Notes on Premature Convergence in Genetic Algorithms


This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations.

When optimizing solutions with GA, it sometimes happens that a sub-optimal solution is found because the whole population converges early to a part of the search space where no better solutions are found anymore. This effect is called premature convergence.

Premature convergence is usually hard to detect; a good measure is the mean Hamming distance between individuals. In PGAPack, reporting of the Hamming distance can be enabled with the reporting option PGA_REPORT_HAMMING, set with the function PGASetPrintOptions in PGAPack and with the print_options constructor parameter in PGApy. Unfortunately this is only implemented for the binary datatype.

One reason for the effect of premature convergence is the use of Fitness Proportionate Selection as detailed in an earlier post in this blog [1]. If during the early stages of the search an individual is discovered that is much better than anything found so far, the chance is high that this individual takes over the whole population when Fitness Proportionate Selection is in use, preventing any further progress of the algorithm. The reason is that an individual that is much better than all others gets an unreasonably high proportion of the roulette wheel when selecting individuals for the next generation resulting in all other genetic material having only a slim chance of making it into the next generation.

Another reason for premature convergence can be a small population size. Goldberg et al. [9] give estimates of the population size for a classic GA with a small alphabet of cardinality \(\chi\) (the alphabet is the set of different allele values, e.g. 0 and 1 for a binary GA) and a problem-specific building block size \(k \ll n\) that overcomes deception, where \(k\) is the building block size (a measure for the difficulty of the problem) and \(n\) is the string (gene) length. They show that the population size should be \(O(m\chi^k)\) with \(m=n/k\), so that for a given problem difficulty \(k\) the population size is proportional to the string length \(n\). This result, however, does not readily translate to problems with a large alphabet, e.g. floating-point representations like the real data type in PGAPack. For floating-point representations, difficulty usually translates to how multimodal a problem is, i.e., how many peaks (in case of a maximization problem) or valleys (in case of a minimization problem) the objective function has.

Now if premature convergence still occurs with an adequate population size and an appropriate selection scheme, there are some mechanisms that can be used.

Prevention of Duplicates

PGAPack implements a mechanism for preventing duplicate gene strings (individuals). In previous implementations the computational effort was quadratic in the population size \(N\), i.e. the effort was \(O(N^2)\): each new individual was compared with all current members of the population. The latest versions use hashing to detect duplicates, reducing the total overhead to \(O(N)\), a small constant overhead for each new individual.

For user-defined data types this means that the user has to define a hash function for the data type in addition to the string comparison function. For the built-in data types (binary, integer, real) this is automatically available.
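The idea can be sketched in a few lines of Python (the real implementation is in C inside PGAPack; the names here are illustrative):

class Population:
    # Keep a hash set of gene strings: the duplicate check is a constant
    # time lookup per new individual instead of a comparison with all members
    def __init__ (self):
        self.members = []
        self.seen    = set ()

    def add_if_new (self, individual):
        key = tuple (individual)  # hashable representation of the gene string
        if key in self.seen:
            return False          # duplicate: caller can mutate and retry
        self.seen.add (key)
        self.members.append (individual)
        return True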

Prevention of duplicates works quite well for binary and integer data types, especially if the genetic operators have a high probability of producing duplicates. It does not work so well for the real data type because new individuals tend to be different from other individuals even if they can often be very close to already existing individuals.

Restricted Tournament Replacement

An algorithm originally called restricted tournament selection by Harik [2], [3] and later adopted under the name restricted tournament replacement (RTR) by Pelikan [4] uses the old and new population for deciding if a candidate individual will be allowed into the new population. It works by randomly selecting a number of members from the old population (called the selection window), choosing the individual which is most similar to the candidate, and allowing the candidate into the new population only if it is better than this most similar individual.

The default for the number of individuals randomly selected from the old population (the window size) is \(\min (n, \frac{N}{20})\) [4] where \(n\) is the string (gene) length and \(N\) is the population size. It can be set by the user with the PGASetRTRWindowSize function in PGAPack and with the rtr_window_size constructor parameter of PGApy.

The RTR algorithm needs a similarity metric for deciding how close an individual is to another. By default this uses a Manhattan distance (equivalent to the Hamming distance for binary genes), i.e. an allele-by-allele sum of distances, but it can be set to a Euclidean distance or a user-defined metric with the user-function mechanism of PGAPack. A comparison using a Euclidean distance metric for RTR is in the magic square example in PGApy, where the Euclidean distance can be turned on with a command-line option.
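In Python the replacement step might look like the following sketch for a maximization problem; the function names are placeholders, not the PGAPack API:

import random

def manhattan (a, b):
    # The default similarity metric: allele-by-allele sum of distances
    return sum (abs (x - y) for x, y in zip (a, b))

def rtr_replace (old_pop, fitness, candidate, window_size, distance = manhattan):
    # The candidate competes against the most similar member of a
    # randomly chosen window of the old population
    window  = random.sample (range (len (old_pop)), window_size)
    nearest = min (window, key = lambda i: distance (old_pop [i], candidate))
    if fitness (candidate) > fitness (old_pop [nearest]):
        old_pop [nearest] = candidate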

Restricted tournament replacement works well not only for binary and integer genes but also for real genes. It can be combined with different evolutionary algorithm settings.

The effect of RTR on a problem that tends to suffer from premature convergence can be seen in the test program examples/mgh/testprog.f, which implements several test functions from an old benchmark of nonlinear test problems [5]. The test function that exhibits premature convergence is what the authors call a "Gaussian function", described as example (9) in the paper [5] and implemented as function 3 in examples/mgh/objfcn77.f. This function is given as

\begin{equation*} f_i(x) = x_1 \cdot e^{-\frac{x_2(t_i - x_3)^2}{2}} - y_i \end{equation*}

with

\begin{equation*} t_i = (8 - i) / 2 \end{equation*}

and with tabulated values for \(y_i\) as given in the paper or in the implementation in examples/mgh/objfcn77.f. The minimization problem built from these functions is

\begin{equation*} f (x) = \sum_{i=1}^{m} f_i^2(x) \end{equation*}

with \(m=15\) for this test problem. The authors [5] give the vector \(x_0 = (0.4, 1, 0)\) for the minimum \(f(x_0) = 1.12793 \cdot 10^{-8}\) they found. The original Fortran implementation in examples/mgh/testprog.f uses a population size of 10000 with default settings for the real data type of PGAPack. The large population size is chosen because otherwise the problem exhibits premature convergence. In 100 generations it finds the solution \(x_0=(0.3983801, 0.9978369, 0.009918243)\) with an evaluation value of \(f(x_0)=2.849966\cdot 10^{-5}\). The number of function evaluations needed was 105459. This is a little less than \(10000 + 100 \cdot 1000\), i.e. evaluation of the initial generation plus evaluation of 10% of the population in each generation: since the probability of crossover and mutation is not 100%, it happens that neither operator is performed on an individual and it is not re-evaluated.

I've implemented the same problem with Differential Evolution [6], [7], [8] in examples/mgh/testprogde.c (the driver program is implemented in C because I really do not speak Fortran, but it uses the same functions from the Fortran implementation, linking the Fortran and C code into a common executable). This uses a population size of 2000 (large for most problems when using Differential Evolution, again because of premature convergence) and finds the solution \(x_0=(0.3992372, 0.9990791, -0.0007697923)\) with an evaluation value of \(f(x_0)=7.393123\cdot 10^{-7}\) in only 30 generations. This amounts to 62000 function evaluations (Differential Evolution creates all individuals for the new generation and decides afterwards which to keep).

When using RTR with this problem in examples/mgh/testprogdertr.c, the population size can be reduced to 250 and even after 100 generations the search has not converged to a suboptimal solution. After 100 generations we find \(x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})\) and \(f(x_0)=1.339766\cdot 10^{-8}\) (but also with some changed settings of the Differential Evolution algorithm). This amounts to 25250 function evaluations.

Restart

A last resort when the above mechanisms do not work is to regularly restart the GA whenever the population has converged too much. The restart mechanism implemented in PGAPack uses the best individual from the population to re-seed a new population with variations created by mutation from this best individual. Restarts can be enabled by setting PGASetRestartFlag in PGAPack or using the restart constructor parameter in PGApy. The frequency (default is every 50 generations) of restarts can be set with PGASetRestartFrequencyValue in PGAPack and the restart_frequency constructor parameter in PGApy.

Proportional Selection (Roulette-Wheel) in Genetic Algorithms


Some of you know that I'm maintaining PGAPack, a genetic algorithm package, and a corresponding Python wrapper, PGApy. Recently the question came up why PGAPack, when using Fitness Proportionate Selection (also known as roulette-wheel selection or simply proportional selection), errors out because it cannot normalize the fitness value.

In PGAPack, the user supplies a real-valued evaluation function (which is called for each individual) and specifies whether the result of that evaluation should be minimized or maximized (defining the optimization direction).

When using fitness proportionate selection, this evaluation value must be mapped to a positive value, assigning each evaluation a share of the roulette wheel. If the optimization direction is minimization, small values are better and need a larger part of the roulette wheel so that they have a higher probability of being selected. So we need to remap the raw evaluation values to a nonnegative fitness where better evaluations get larger values. For a minimization problem we compute the maximum (worst) evaluation; the fitness is then the difference between this maximum and the evaluation of the individual (after scaling the maximum a little so that no fitness value is exactly zero):

\begin{equation*} F = E_{max} - E \end{equation*}

where \(F\) is the fitness of the current individual, \(E_{max}\) is the maximum of all evaluations in this generation, and \(E\) is the evaluation of that individual.
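As a sketch, the remapping and the roulette wheel for a minimization problem could look as follows in Python; the scaling factor is illustrative, PGAPack's actual scaling differs in detail:

import random

def roulette_select (evals):
    # Fitness proportionate selection for a minimization problem,
    # assuming nonnegative evaluation values
    emax    = max (evals) * 1.01  # scale a little so no fitness is zero
    fitness = [emax - e for e in evals]
    wheel   = random.uniform (0, sum (fitness))
    acc     = 0.0
    for i, f in enumerate (fitness):
        acc += f
        if acc >= wheel:
            return i
    return len (evals) - 1  # guard against rounding at the wheel's end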

Now when evaluation values differ by several orders of magnitude, it can happen that the difference in that formula ends up being \(E_{max}\) for many (different) evaluation values. I'm calling this an overflow in the error message, which is probably not the best name for it.

That overflow happens when \(E_{max}\) is large compared to the current evaluation value \(E\) so that the difference ends up being \(E_{max}\) (i.e. the subtraction of \(E\) had no effect). In the code we check for this condition:

if (cmax == cmax - evalue && evalue != 0)

This condition triggers when subtracting the evaluation \(E\) from the current \(E_{max}\) does not change \(E_{max}\) even though \(E\) is not zero. So \(E\) is so small compared to \(E_{max}\) that the double data type cannot represent the difference. This happens whenever the unit in the last place (called ulp by Goldberg, not the same Goldberg who wrote the Genetic Algorithms book afaik) of the significand (also called mantissa) is larger than the value that should be subtracted [1].

In our example \(E_{max} = 1.077688 \cdot 10^{22}\) and the evaluation where this failed was \(E = 10000\). The IEEE 754 double precision floating point format has 53 bits of significand, which can represent integers exactly up to \(2^{53} = 9007199254740992\) or about \(9 \cdot 10^{15}\). Since \(E_{max}\) is several orders of magnitude larger, the number 10000 is below one ulp of \(E_{max}\). We can try it in Python (which uses double for floating-point values):

>>> 1.077688e22 - 10000 == 1.077688e22
True
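Python can also tell us the ulp of \(E_{max}\) directly (math.ulp exists since Python 3.9); anything smaller than half an ulp vanishes in the subtraction:

>>> import math
>>> math.ulp (1.077688e22)
2097152.0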

Why do we make this check in the program? Letting the search continue with such an overflow (or whatever we want to call it) would map many different evaluation values to the same fitness. So the genetic algorithm could not distinguish these individuals.

So what can we do about it when that error happens?

The short answer: use a different selection scheme. There is a reason why the default in PGAPack is not proportional selection.

Fitness proportionate selection (aka roulette-wheel selection) has other problems, too. It exerts too much selection pressure in the beginning and too little at the end (also mentioned in the Wikipedia article, but beware, I've written parts of it).

Blickle and Thiele [2] did a mathematical analysis of different selection schemes and showed that proportional selection is typically not a good idea (it was historically the first selection scheme and is described in Goldberg's (the other Goldberg) book [3] which is probably the reason it is still being used). Note that there is an earlier report from Blickle and Thiele [4] that is more frank about the use of proportional selection: "All the undesired properties together led us to the conclusion that proportional selection is a very unsuited selection scheme. Informally one can say that the only advantage of proportional selection is that it is so difficult to prove the disadvantages" ([4], p. 42), they were not as outspoken in the final paper :-)

We're seeing this in the example above: we have very large differences between good and bad evaluations (in fact so large that the fitness cannot be computed, see above). So when using proportional selection the very good individuals will be selected with too high a probability, resulting in premature convergence.

That all said: If you're doing optimization with genes of type 'real' (represented by double values in PGAPack), you may want to try Differential Evolution [5], [6], [7]. At least in my experiments with antenna optimization [8] the results outperform the standard genetic algorithm, and similar findings are reported by other practitioners [7]. Examples of how to use this are in examples/deb/optimize.c or examples/mgh/testprogde.c in PGAPack.

The PGASetDECrossoverProb setting is critical: for problems where the dimensions cannot typically be optimized separately, the value should be set close to, but not equal to, 1.

Plotting Antenna Pattern


For my pymininec project (a re-implementation of the original Mininec [1] in Python, see the pymininec README for a description of how I was able to reverse-engineer 40 year old Basic code), I needed a method to graphically display the results. So at first I had included a visualization tool for antenna patterns in pymininec.

But after I had added a quick & dirty parser for the output of nec2c, it turned out that the program was useful on its own. So I made it a standalone program and named it plot-antenna.

NEC (Numerical Electromagnetics Code) was originally developed at the Lawrence Livermore National Laboratory in Fortran. Up to version 2 this is free software; later versions (they're at version 5 of NEC now) are non-free. Neoklis Kyriazis translated the Fortran version of NEC-2 to C and called it nec2c. He did some optimizations, e.g. he integrated the separate program SOMNEC (for computing a Sommerfeld/Norton ground model) directly into nec2c, and got rid of many compile-time limits of the original Fortran code. The C version nec2c is one of a small number of open source antenna modelling programs. It is a command-line utility that compiles an antenna description into an output text file with a lot of tables. There are other programs like xnecview that can display antenna patterns from this output.

Note that I didn't find an official web page for nec2c – it is included in Debian and Ubuntu (Ubuntu also is searching for the official upstream version) and it seems it was originally hosted on Google Code. There is a version on github with a link to another version on github. The software seems to be unmaintained these days – not much of a problem since NEC version 2 hasn't received updates for some decades now.

Now back to plot-antenna: Originally the program only had a parser for the output of pymininec (and of the original Mininec [1] program in Basic). I later added a parser for NEC output. The current parsers are quick & dirty implementations and may not be very robust – but they do work for me.

The first version used Matplotlib for the graphic display and this is still supported. Originally I had implemented azimuth and elevation patterns and a 3D display.

Recently, after having come across plotly, I added HTML/JavaScript output using the plotly library. The 3D capabilities of plotly are much better than those of Matplotlib – even with 1 degree resolution of the antenna angles (i.e. 180 * 360 3D points) the display still runs smoothly in the browser (Firefox). With the Matplotlib 3D plot, we need to use 5 degree resolution (36 * 72 3D points), otherwise rotating the display is very sluggish. You can try the plotly version yourself below (or you might want the full window antenna plot):

Not to mention zooming: This works fine in plotly (using the scroll wheel of the mouse in the graphics above) and doesn't seem to be implemented at all in Matplotlib.

In the plotly version we display the plots for all frequencies in a single plot. For the 3D view all except the currently-selected frequency are hidden. You can select the frequency for which the plot should be displayed in the legend on the right side.

The best feature in plotly is the live-display of the 3D-coordinates of the surface on mouse-over. In the current implementation this displays azimuth and elevation angles as well as the absolute and relative gains at the selected point on the antenna pattern.

On the other hand I needed several workarounds in plotly:

  • There is a bug in hover when displaying coordinates

  • plotly does not support menus for 3D surfaces, so I had to put an additional (empty, except for one white dot) graph into the same display for showing the menu. The menu is used when an antenna pattern is computed for a set of frequencies. In that case the frequencies can be selected in the menu on the right side.

  • In plotly there is no way to show only a single menu item at a time. I wanted the menu buttons to behave like radio buttons in web interfaces: clicking on a new frequency should show only the plot for the clicked-on frequency. For 3D plots it does not make much sense to show several 3D plots in a single display. This is achieved with some custom Javascript.

The antenna pattern in the example above is taken from an article by L.B. Cebik [2]. When you compare the pattern in that article to the 3D pattern above (or you look at the azimuth/elevation diagrams in the description of plot-antenna), you'll notice that the plotly 3D view is slightly distorted (the lobe is too long). I've not yet found out if this is a problem of plotly or of my use of it. I do try to get an equal aspect ratio in all dimensions by setting the xaxis, yaxis, and zaxis ranges to the same value, but as you can see the aspect ratio does not look right.

When we compare this to an azimuth diagram created with plotly:

the dimensions look OK. Notice that also in the 2D diagram you can display the angle (azimuth in this case) on mouse-over. When using Matplotlib this is also available, but the cursor does not snap to the drawn pattern, so you always have to guess whether the cursor is exactly at the correct position. Also note that, again, you can select the frequency to display in the frequency legend on the right side. But this time you can display the plots for several frequencies at the same time (which does not make much sense in an opaque 3D plot).

Epsilon-Constrained Optimization


[Update 2022-10-18: Replace epsilon with delta in description of example problem 7 in pgapack]

Many optimization problems involve constraints that valid solutions must satisfy. A constrained optimization problem is typically written as a nonlinear programming problem [1].

\begin{align*} \hbox{Minimize} \; & f_i(\vec{x}), & i &= 1, \ldots, I \\ \hbox{Subject to} \; & g_j(\vec{x}) \le 0, & j &= 1, \ldots, J \\ & h_k(\vec{x}) = 0, & k &= 1, \ldots, K \\ & x_m^l \le x_m \le x_m^u, & m &= 1, \ldots, M \\ \end{align*}

In this problem, there are \(M\) variables (the vector \(\vec{x}\) is of size \(M\)), \(J\) inequality constraints, and \(K\) equality constraints; each variable \(x_m\) must be in the range \([x_m^l, x_m^u]\) (often called a box constraint). The functions \(f_i\) are called the objective functions. If there is more than one objective function, the problem is said to be a multi-objective optimization problem as described previously in this blog [2]. In the following I will use some terms that were introduced in that earlier blog post without further explanation.

The objective functions are not necessarily minimized (as in the formula) but can also be maximized if the best solutions to a problem requires maximization. Note that the inequality constraints are often depicted with a \(\ge\) sign but the formula can easily be changed (e.g. by multiplying with -1) to use a \(\le\) sign.

Since it is very hard to fulfill equality constraints, especially if they involve nonlinear functions of the input variables, equality constraints are often converted to inequality constraints using a δ‑neighborhood:

\begin{equation*} -\delta \le h_k(\vec{x}) \le \delta \end{equation*}

where δ is chosen according to the requirements of the problem for which a solution is sought.

One very successful method of solving constrained optimization problems is to consider a lexicographic ordering of constraints and objective function values. Candidate solutions to the optimization problem are first sorted by violated constraints (typically the sum over all violated constraints) and then by the objective function value(s) [1]. When comparing two individuals during selection in the genetic algorithm there are three cases: If both individuals violate constraints, the individual with the lower constraint violation wins. If one violates constraints and the other does not, the one not violating constraints wins. In the last case, where neither individual violates constraints, the normal comparison is used (which depends on the algorithm and on whether we're minimizing or maximizing). This method, originally proposed by Deb [1], is implemented in PGAPack, the genetic algorithm package I'm maintaining, and in my Python wrapper PGApy.

With this algorithm for handling constraints, the constraints are optimized first before the algorithm "looks" at the objective function(s) at all. It often happens that the algorithm ends up searching in a region of the input space where no good solutions exist (but no constraints are violated). Hard problems often contain equality constraints (converted to inequality constraints as indicated earlier) or other "hard" constraints. In my previous blog post [2] on antenna optimization I wrote: "the optimization algorithm has a hard time finding the director solutions at all. Only in one of a handful experiments I was able to obtain the pareto front plotted above".

In that experiment I was running 50 searches and only 5 of them did not get stuck in a local optimum. A similar thing happens for problem (7) in Deb's paper [1], which has equality constraints. I've implemented this as example problem 7 in PGAPack. It only finds a solution near the (known) optimum when \(\delta \ge 10^{-2}\) for all equality constraints (I didn't experiment with different random seeds for the optimizer; maybe a better solution would be possible with a different random seed). In the paper [1], Deb uses \(\delta = 10^{-3}\) for the same reason.

One method for handling this problem was appealing because it is so easy to understand and implement: Takahama and Sakai first experimented with a method for relaxing constraints during the early stages of optimization with a formulation they called an α‑constrained genetic algorithm [3]. They later simplified the formulation and called the resulting algorithm ε constrained optimization. It can be applied to different optimization algorithms, not just genetic algorithms and variants [4]. Of special interest is the application of the method to differential evolution [5], [6], but of course it can also be applied to other forms of genetic algorithms.

Note that the ε in the name of the algorithm can be used for the δ used when converting an equality constraint to inequality constraints but is not limited to this case.

During the run of the optimizer, a new value for ε is computed in each generation. The comparison of individuals outlined above is modified so that an individual is handled as if it were not violating any constraints whenever its constraint violation is below ε. So if both individuals have constraint violations larger than ε, the one with the lower violation wins. If one violation is below ε and the other above, the individual with the violation below ε wins. And finally, if the constraint violations of both individuals are below ε, the normal comparison takes place.
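A sketch of this comparison for a minimization problem; with ε = 0 it degenerates to the lexicographic rule described earlier:

def eps_better (viol_a, eval_a, viol_b, eval_b, eps):
    # Does a win against b? viol_* are summed constraint violations,
    # eval_* are objective values (minimization assumed)
    if viol_a <= eps and viol_b <= eps:
        return eval_a < eval_b  # both treated as feasible: compare objectives
    if viol_a <= eps or viol_b <= eps:
        return viol_a <= eps    # the individual within epsilon wins
    return viol_a < viol_b      # both outside: lower violation wins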

The last case is the key to the success of this algorithm: even though the search proceeds in a direction where the constraint violations are minimized, good solutions in terms of the objective function are found at the same time.

The algorithm begins by initializing \(\varepsilon_0\) with the constraint violation of the individual with index \(\theta\) from the initial population sorted by constraint violation, where \(\theta\) is a parameter of the algorithm between 1 and the population size; a good value uses the individual at about 20% of the population size, which is also the default in PGAPack. In each generation \(t\), \(\varepsilon_t\) is computed by

\begin{equation*} \varepsilon_t = \varepsilon_0 \left(1-\frac{t}{T_c}\right)^{cp} \end{equation*}

up to generation \(T_c\). After that generation, ε is set to 0. The exponent \(cp\) is between 2 and 10. The 2010 paper [6] recommends to set \(cp = 0.3 cp + 0.7 cp_\min\) at generation \(T_\lambda = 0.95 T_c\), where \(cp_\min\) is the fixed value 3. The initial value of \(cp\) is chosen so that \(\varepsilon_\lambda=10^{-5}\) at generation \(T_\lambda\), unless it is smaller than \(cp_\min\), in which case it is set to \(cp_\min\). PGAPack implements this schedule for \(cp\) by default but allows changing \(cp\) at the start and during the run of the optimizer, so it's possible to implement a different schedule for \(cp\) – the default works quite nicely, though.
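The schedule itself is simple; a sketch including the cut-off at \(T_c\):

def epsilon (t, eps0, t_c, cp):
    # Epsilon level at generation t: eps0 * (1 - t/T_c)**cp, 0 after T_c
    if t >= t_c:
        return 0.0
    return eps0 * (1.0 - t / t_c) ** cp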

With the ε constraint method, example 7 from Deb [1] can be optimized with a precision of \(10^{-6}\) in my experiments, see the epsilon_generation parameter in the optimizer example.

The antenna-optimizer with an ε‑generation of 50 (that's the \(T_c\) parameter of the algorithm) gets stuck in the local optimum in only one of 50 cases; all other cases find good results:

In that picture all the solutions that are dominated by solutions from another run are drawn in black. It can be seen that the data from run number 16 did not contribute any non-dominated solutions (on the right side in the legend the number 16 is missing). You can turn off the display of the dominated solutions by clicking on the black dot in the legend.

When I increase the ε‑generation to 60, the run with random seed 16 also finds a solution:

We also see that the solutions of all runs are quite good (quite near to the Pareto front): the black "shadow" of the dominated solutions is quite near to the real Pareto front, and a single run of the algorithm is enough to find a good set of solutions.

Impedance matching with a cable of same impedance?


Recently, in the German magazine "Funkamateur", there were two articles in which the author claimed that it is possible to get a better standing wave ratio (VSWR) by using specific lengths of 50Ω cable.

The first article was about a duo-band antenna for the 70cm and 2m ham radio bands [1]. It casually mentioned that the VSWR could be improved by using a specific length of 50Ω cable and provided a link to software by the author to compute that length (in fact it was claimed that there are several different lengths of cable that would achieve that purpose). It was claimed that this length of cable can transform the impedance (the word used was "Leitungstransformation", line transformation).

In a second article – prompted by several readers who wrote letters to the editor – the author provided measurements that clearly showed that, indeed, the VSWR was improved with a certain length of cable [2].

Now transmission line theory says that a transformation resulting in a different VSWR can only be achieved by a piece of line with an impedance different from the characteristic impedance of the system, see my recent article on the subject. So there must be another reason why the impedance is changed – as clearly shown by the author's measurements. The author claimed here that there is a discrepancy between theory and practice – I'll show in the following that there is a reasonable explanation for the findings without dumping the existing electromagnetics theory.

The hint here is that the author did not use a balun or any other means of preventing current on the outer sheath of the feedline. So I had some fun modelling the setup with the antenna simulation program NEC2. The results should be reproducible with any other antenna simulation program, at least if it contains a NEC core (e.g. EZNEC) that can model ground and not just free space. I've modelled the (simple two-element) antenna from the original article.

The following image shows the standing wave ratio (VSWR) and the impedance, split into real and imaginary part, directly at the antenna. The first NEC input file can be used to reproduce the results. Note that I'm using a Sommerfeld/Norton ground model with a medium quality ground. The antenna is located 15m above ground in the model because the antenna is meant to be used on a balcony.

 

/images/an-der-antenne.png

 

The next image shows the same antenna with a length of feedline (the length is the one for which the author claims that it can improve the VSWR). Note that the feedline is not modelled as a piece of wire but as a transmission line (which NEC provides as an abstraction). Again we show the real and imaginary part. The electrical length (NEC uses only the electrical length when modeling transmission lines, no velocity factor is necessary) is about 1.81m, computed by the formula in [1] with a velocity factor of 1. It can be clearly seen that the impedance changes but the VSWR stays the same. The corresponding second NEC input file can be used to reproduce these results.

 

/images/mit-speiseleitung.png

 

Now, to simulate the conditions at the feedpoint when there is no balun, I've modeled a wire between the antenna and the feedpoint at the end of the cable in the third NEC input file. The following image shows the model: we have an additional wire which represents the outside of the coax feedline. It is connected to the antenna at one point and to the end of the coax at the second point. The length of this wire is the physical length of the cable. Except for a small increase of the electrical length due to the insulation (not with the usual coax velocity factor, we're simulating only the outer braid!), the cable in the model needs to use the real physical length of the coax as computed by the formula in the first article [1]. We're using 1.484m from [1], table 3, for an Aircell5 with the factor n=1 in the table. Now we see that, as observed in the second article [2], the standing wave ratio VSWR is indeed smaller. The reason is that, due to the missing balun, the antenna itself has changed: the outer braid of the feedline is part of the antenna.

 

/images/mit-mantelwelle.png

 

A view with the Linux program xnec2c shows the currents in the antenna and on the outer braid of the feedline. We can clearly see that there is current on the outer braid of the feedline (the feedline in the picture is the long part to the right; in parallel to it the modelled transmission line is shown).

 

/images/mit-mantelwelle-antenne.png

 

So electromagnetics theory is saved, there is a physical explanation of the phenomenon. It would still be interesting to model different influences like height above ground or different ground characteristics – and to find out if the results above are reproducible with different parameters, since the author claims that the observed behaviour is well known ([2] p. 35).

Multi-Objective Antenna Optimization


For quite some time I have been optimizing antennas using genetic algorithms. I'm using the pgapack parallel genetic algorithm package, originally by David Levine from Argonne National Laboratory, which I'm now maintaining. For even longer I have been developing a Python wrapper for pgapack called pgapy.

For the antenna simulation part I'm using Tim Molteno's PyNEC, a python wrapper for the Numerical Electromagnetics Code (NEC) version 2 written in C++ (aka NEC++) and wrapped for Python.

Using these packages I've written a small open source framework to optimize antennas called antenna-optimizer. It can use the traditional genetic algorithm approach with bit-strings as genes as well as a floating-point representation with operators suited for floating-point genes.

The parallel in pgapack refers to the fact that the evaluation function of the genetic algorithm can be parallelized. When optimizing antennas we simulate each candidate parameter set for an antenna using the antenna simulation of PyNEC. Antenna simulation is still (the original NEC code is from the 1980s and was conceived using punched cards for I/O) a CPU-intensive undertaking. So the fact that with pgapack we can run many simulations in parallel using the Message Passing Interface (MPI) standard [1] is good news.

For pgapack – and also for pgapy – I've recently implemented some classic algorithms that have proven very useful over time:

  • Differential Evolution [2], [3], [4] is a very successful optimization algorithm for floating-point genes that is very interesting for electromagnetics problems

  • The elitist Nondominated Sorting Genetic Algorithm NSGA-II [5] allows optimizing multiple objectives in a single run of the optimizer

  • We can have constraints on the optimization using constraint functions that are minimized. For a solution to be valid, all constraints must be zero or negative. [6]

Traditionally, with genetic algorithms only a single evaluation function, also called an objective function, is possible. With NSGA-II it is possible to have several objective functions. We call such an algorithm a multi-objective optimization algorithm.

For antenna simulation this means that we don't need to combine different antenna criteria like gain, forward/backward ratio, and standing wave ratio (VSWR) into a single evaluation function which I was using in antenna-optimizer, but instead we can specify them separately and leave the optimization to the genetic search.

With multiple objectives, however, a solution that is better in one objective is typically worse in another objective and vice versa. So we are searching for solutions that are not dominated by any other solution. A solution is said to dominate another solution when it is at least as good in every objective and strictly better in at least one. All solutions that are not dominated by any other solution are said to be pareto-optimal, named after the Italian scientist Vilfredo Pareto who first defined the concept of Pareto optimality. The set of all pareto-optimal solutions is said to lie on a Pareto front. For two objectives the Pareto front can be shown in a scatter plot as we will see below.
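For two objective vectors the dominance test is short; a sketch assuming maximization of all objectives:

def dominates (a, b):
    # a dominates b: at least as good everywhere, strictly better somewhere
    return (all (x >= y for x, y in zip (a, b))
        and any (x >  y for x, y in zip (a, b)))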

Since pgapack follows a "mix and match" approach to genetic algorithms we can combine successful strategies for different parts of a genetic algorithm:

  • We can use Differential Evolution just for the mutation/crossover part of the genetic algorithm

  • We can combine this with the nondominated sorting replacement of NSGA-II

  • We can define some of our objectives as constraints. For our problem it makes sense to only allow antennas that do not exceed a given standing-wave ratio. So we do not allow antennas with a VSWR > 1.8. The necessary constraint function is \(S - 1.8 \le 0\) where \(S\) is the voltage standing wave ratio (VSWR).

With this combination we can successfully compute antennas for the 70cm ham radio band (430 MHz - 440 MHz). The antenna uses what we call a folded dipole (the element with the rounded corners) and a straight element. The measures in the figure represent the lengths optimized by the genetic algorithm. The two dots in the middle of the folded dipole element represent the point where the antenna feedline is connected.

/images/2ele.png

A first example simulates antenna parameters for the lowest, the highest and the medium frequency. The gain and forward/backward ratio are computed for the medium frequency only:

/images/twoele-v1.png

In this graph (a scatter plot) the first objective (the gain) is graphed against the second objective, the forward/backward ratio. All numbers are taken from the medium frequency. Each dot represents a simulated antenna. All antennas have a VSWR lower than 1.8 on the minimum, medium, and maximum frequency.

With this success I started experimenting with different settings of the Differential Evolution parameters. It is well known that Differential Evolution performs better on decomposable problems with a low crossover rate, while it performs better on non-decomposable problems with a high crossover rate. A decomposable problem is one where the different dimensions can be optimized separately; this was first observed by Salomon in 1996 [7]. I had been using a crossover rate of 0.2 and my hope was that the optimization would be better and faster with a higher crossover rate. The experiment below uses a crossover rate of 0.9.

In addition I was experimenting with dither: Differential Evolution allows the scale factor \(F\), by which the difference of two vectors is multiplied, to be varied slightly at random for each generated variation. In the first implementation I had set dither to 0, now I used a dither of 0.2. Imagine my surprise when with these settings I found a completely different Pareto front for the solution:

/images/twoele-v2.png

To make it easier to see that the second discovered front completely dominates the front that was first discovered, I've plotted the two fronts into a single graph:

/images/twoele-v3.png

Now since the second discovered front looks too good to be true (over the whole frequency range) for a two-element antenna, let's take a look at what is happening here. First we show the orientation of the antenna and the computed gain pattern for one of the antennas from the middle of the lower front:

/images/middle-lower-antenna.png/images/middle-lower-pattern.png

The antenna has – as already indicated in the Pareto front graphics – a gain of about 6.6 dBi and a forward/backward ratio of about 11 dB in the middle of the band at 435 MHz. The colors on the antenna denote the currents on the antenna structure. If you want to look at this yourself, here is a link to the NEC input file for antenna 1.

Now let's compare this with one of the antennas of the "orange front", where we get much better values:

/images/middle-higher-antenna.png/images/middle-higher-pattern.png

This antenna is in the middle of the Pareto front above and has a gain of about 6.7 dBi and a forward/backward ratio of about 16 dB in the middle of the band at 435 MHz. Can you spot the difference to the first antenna? Yes: the maximum gain is in the opposite direction. We say that for the first antenna the straight element acts as a reflector while for the second antenna it acts as a director. If you want to look at this yourself, here is a link to the NEC input file for antenna 2.

Now we look at the frequency plots of gain and forward/backward ratio of the two antennas; the plot for the first antenna (with the reflector element) is on the left, while the plot for the antenna with the director element is on the right.

/images/middle-lower-fplot.png/images/middle-higher-fplot.png

We see that the forward/backward ratio of the director antenna ranges from more than 10 dB to more than 25 dB while the reflector design ranges from 9.3 dB to 11.75 dB. For the minimum gain the reflector design is slightly better (6.35-6.85 dBi vs. 6.3-7.05 dBi). So this needs further experiments. When forcing a reflector design and changing the evaluation function to return the minimum gain and F/B ratio over the three frequencies (start, middle, end) we get:

/images/twoele-reflector.png

Doing the same for a director design (also with the minimum gain and F/B ratio over the three frequencies start, middle, end) we get:

/images/twoele-director.png

With these results, the sweet spot for an antenna to build is probably at or above 10 dB F/B ratio and a gain of about 6.2 dBi. Going for a few tenths of a dBi more gain and sacrificing several dB of F/B ratio doesn't seem sensible. Comparing the director vs. the reflector design we notice (contrary to at least my intuition) that the director design has a better F/B ratio over the whole frequency range. If, however, the antenna is to be used for relay operation, where the sending frequency (the relay input) is in the lower half of the frequency range and the relay output (the receiving frequency) is in the upper half, we will probably choose a reflector design, because there the gain is higher when sending and the F/B ratio is higher when receiving (compare the two earlier gain and F/B ratio plots).

Also note that the optimization algorithm has a hard time finding the director solutions at all. Only in one of a handful of experiments was I able to obtain the Pareto front plotted above. The director design is more narrowband than the reflector design and the algorithm often converges to a local optimum. The larger variation in gain and F/B ratio of the director design also tells us that it will be harder to build: not getting the dimensions exactly right will probably mean missing the predicted simulation results. The reflector design is a little more tolerant in this regard.