# Modeling a Wire Antenna with Insulation

[Update 2024-09-19: Correction of citation of article by Stearns, Supplement to Antenna Book [5]]

For my antenna modeling code pymininec (which is a re-implementation of the original Mininec code in Basic) I was researching how to model insulated wires. I had asked Roy Lewallen, W7EL, the author of EZNEC what he is using in EZNEC and received a pointer to a paper by J. H. Richmond [1] which seems the first paper to propose an algorithm for modeling insulated wires in an antenna and actually implementing it. The algorithm was implemented by Jerry McCormack in his masters thesis [2] and later incorporated into a contractor report [3] (the contractor report is almost identical with the thesis except for updated Fortran code but has a different author). The original Fortran code is not only listed in the reports but is still made available by Ray L. Cross on his "Antenna Scatterers Analysis Program" (ASAP) page. The subroutine "DSHELL" implements the insulated wire algorithm. With some modifications I was able to run this code today. Many thanks go to Roy, W7EL for sending me this information.

To test my implementation in pymininec against real measured data, I dug up a paper by David Lamensdorf [4] where the effect of insulation were actually measured. Since at the time the insulation could only be modelled as an infinite cylinder, the experiment used insulation that goes beyond the end of the wire until no more changes in the measured parameters were observed.

Experiments with the formulation by Richmond [1] indicated that the resulting impedance matrix is very ill-conditioned: For short wire segments the formulation yields very large absolute values that are added to the main diagonal of the impedance matrix and subtracted from the second diagonals (the matrix is symmetric and the second diagonal is the same for the lower and upper triangle). Since these large values are subtracted during the matrix inversion this leads to Catastrophic Cancellation and results that are dominated by cancellation effects.

So I investigated further and found a paper [5] and a presentation [6] by Steve Stearns, K6OIK which gives distributed inductance and an equivalent-radius formula for modeling insulation. His presentation also has a great history of various approaches to modeling insulation with programs that originally do not support it (like the original NEC-2 and the original Mininec):

\begin{align*} a_e &= a \left(\frac{a}{b}\right)^{\left(1- \frac{1}{\varepsilon_r}\right)} \\ L &= \frac{\mu_0}{2\pi}\left(1-\frac{1}{\varepsilon_r} \right)\log\left(\frac{a}{b}\right) \\ \sigma_e &= \sigma\left(\frac{a}{b}\right)^{2\left(1-\frac{1}{\varepsilon_r}\right)} \\ \end{align*}

where $a$ is the original radius of the wire, $b$ is the radius of the wire including insulation, $\sigma$ is the conductivity of the wire, $\varepsilon_r$ is the relative dieelectric constant of the insulation, $a_e$ is the equivalent radius and $\sigma_e$ is the equivalent conductivity that is used for removing the effect of the changed radius on the computation of the skin effect loading of the wire. The inductance $L$ is the inductance per length of the insulated wire (or wire segment).

I then searched for a paper that would confirm the formulas by Steve Stearns and found an even older paper by Tai Tsun Wu [7] which has exactly the formula for the equivalent radius $a_e$ and for the distributed inductance $L$.

The formula for the equivalent conductivity is just used to cancel the effect of the equivalent radius on the computation of the skin effect load. In a previous blog post [8] I had given the formula for the skin effect load. In that formula, the conductivity $\sigma$ appears as its reciprocal $\rho$, the resistivity. The square root of $\sigma$ is always used as a product with the radius. So we can undo the equivalent radius correction by multiplying $\sigma$ with the reciprocal of the equivalent radius factor. We get

\begin{align*} \sqrt{\sigma}a &= \sqrt{\sigma_e}a\left(\frac{b}{a}\right)^{\left (1-\frac{1}{\varepsilon_r}\right)} \\ \frac{\sigma_e}{\sigma} &= \frac{1}{\left(\frac{b}{a}\right)^{2\left (1-\frac{1}{\varepsilon_r}\right)}} = \left(\frac{a}{b}\right)^{2\left(1-\frac{1}{\varepsilon_r}\right)} \\ \end{align*}

In pymininec I don't need to correct the equivalent radius for the skin effect computation, I can simply use the original radius there.

In the following three graphics I'm comparing various implementations of insulated wires with the measurements by Lamensdorf [4]. The wire has a diameter (not radius!) of 0.25in, the frequency is 600MHz. I'm using only the measurements with $\varepsilon_r = 3.2$, the higher relative permittivities are unrealistic in practice and will deviate from the formulas due to the unrealistic experimental setup. I'm plotting the real- (conductance) and imaginary (susceptance) parts of the admittance (reciprocal of the impedance). The three graphics are:

• Without any insulation

• Diameter of wire with insulation of 0.375in (insulation 0.125in)

• Diameter of wire with insulation of 0.5in (insulation 0.25in)

The traces in the plots are:

• ASAP with two, four, eight and 16 segments for each half of a dipole, the traces are marked 'AS02', 'AS04', 'AS08' and 'AS16', respectively, I'm not showing all four traces in each plot

• 4NEC2 which, according to Stearns [6], implements a formula derived by Cebik [9], these traces are marked '4N2'

• NEC-2 using Wu's [7] formula for equivalent radius and distributed inductance as well as Stearns' [6] formula for equivalent $\sigma$ marked as 'NEC'

• My pymininec implementation using Wu's [7] formula for equivalent radius and distributed inductance (but not the $\sigma$ correction because I can use the original radius for the skin effect directly) marked as 'mini'

• The measurements from Lamensdorf [4] are the blue and orange bubbles (for the real and imaginary parts of the impedance, respectively)

• In the graphics with insulation I'm also showing the results of EZNEC, the traces are greyed out and can be turned on in the legend on the right side

The Formula by Cebik [9] and by Wu [7] are very similar, the traces '4N2' and 'NEC' are almost identical.

I'm greying out the EZNEC traces and some of the ASAP traces because they exhibit numerical problems due to the ill-conditioned impedance matrix resulting from Richmond's formulation. It is interesting to see that EZNEC also exhibits numerical problems for smaller segments. You can see the traces by clicking into the legend on the right side.

The segmentation in all examples is 21 segments for the wire. This means small segments for the lower end of the lengths computed. It triggers a small segment warning in EZNEC for the first three points (up to and including $\frac{4.2}{32} = 0.06558125$ wavelength) but not for the clearly numerically invalid 6th value in the first graphics with insulation and the points exhibiting numeric instability in the second graphics with insulation.

Note that the measurements by Lamensdorf [4] were performed on monopole-antennas. I'm simulating dipoles. The impedance of monopole is half that of a dipole, so for admittance I'm multiplying by 2. All admittances are in milli-Siemens, called milli-mhos (mho being ohm spelled backwards) with the symbol ℧ at the time of the Lamensdorf [4] paper.

# Plotting Antenna Pattern

I'm the author of an antenna plotting program called plot-antenna. It is written in python and has graphics backends for both, matplotlib and plotly. As of this writing the version available on pypi does not yet support different frequency resolution for pattern and impedance plots. The matplotlib backend is better when generating graphics for use in printed documentation while the plotly backend can generate interactive graphics for the web. The documentation contains some matplotlib examples, so I'm going into more detail here for the plotly backend.

I've recently added support for the "Antenna Scatterers Analysis Program" ASAP, an antenna simulation program written before the Fortran-77 Standard in 1974. It is still interesting today because it uses a formulation for the Method of Moments [1] [2] that is different from both, NEC [3] and Mininec [4]. And it pioneered the simulation of wire insulation when an antenna uses insulated elements.

In the following I'm using a 3-Element Yagi-Uda antenna that is in the Example Section of ASAP. The input file I'm using exports antenna pattern for 280 MHz to 305 MHz in 5 MHz steps. In addition it exports antenna impedance in 1 MHz steps.

When plotting, plot-antenna can use different scaling methods for the antenna pattern. The default is ARRL scaling [5] which I'm using in the following example.

The 3D plot can switch the frequency in the legend on the right side, by clicking on a frequency the display switches to the antenna pattern for that frequency. In addition on mouse-over the gain values at different positions in the 3D-plot can be seen. It is possible to zoom with the mouse wheel and turn the plot by clicking and dragging.

Of course plot-antenna can also display the conventionaly 2D plots for Azimuth and elevation:

With the 2D display in the legend on the right side, multiple traces can be enabled at once. This way it is easier to compare gain pattern than with the 3D plot where only one pattern can be shown at a time. Similar to the 3D-plot we can also see the gain values on mouse-over. It is possible to zoom (althought this is not very useful in a polar plot) and reset to the default view with the "Home" button in the menu.

Often it is useful to see the voltage standing-wave ratio and the antenna impedance. This is achieved with the VSWR plot. In the following plot I've turned on view of the impedance using the --swr-show-impedance option. The vertical green line indicates the position of minimum VSWR. Again we can see the plotted values on mouse-over. We can zoom in by selecting a rectangle with the mouse or by using the +/- buttons in the menu. The view can be reset to the default using the "Home" button in the menu.

Last not least we can plot the antenna geometry. This is again a 3D plot where we can show the positions of the antenna elements on mouse-over. In the legend on the right we can turn off the visibility of the feed point (in orange). Other antennas may show a ground plane in this view or additional impedance loading on the antenna. The geometry view is very useful to check if the antenna has been modeled correctly (e.g. if the feed point is at the correct position).

The program plot-antenna currently supports output from ASAP, the original Mininec core (written in Basic), my Python version pymininec, and output of NEC-2. The format is auto-detected. In addition, with the companion-program plot-eznec, the gain pattern exported from EZNEC can be displayed. As of this writing there is a not-yet released version that can also display VSWR plots using the data exported by EZNEC when plotting VSWR in a file called LastZ.txt.

In my last post [1] I was wondering why my implementation of skin effect load in pymininec did not match the results we get from NEC-2. Turns out I was wrong in assuming that the skin effect load is purely resistive, it has a large reactive component.

Note: This article contains a lot of math. You may want to skip to the end and check how the new implementation in pymininec behaves compared to may last article which used only a purely resistive load from the skin effect resistivity.

The english Wikipedia article on skin effect does have the correct formula:

\begin{equation*} \newcommand{\Int}{{\mathrm\scriptscriptstyle int}} \newcommand{\ber}{\mathop{\mathrm{ber}}\nolimits} \newcommand{\bei}{\mathop{\mathrm{bei}}\nolimits} \end{equation*}
\begin{equation*} Z_\Int = \frac{k\rho}{2\pi r}\frac{J_0 (kr)}{J_1 (kr)} \end{equation*}

where

\begin{equation*} k = \sqrt{\frac{-j\omega\mu}{\rho}} \end{equation*}

and $r$ is the wire radius, $J_v$ are the Bessel functions of the first kind of order $v$. $Z_\Int$ is the impedance per unit length of wire.

NEC-2 uses a different formulation [2] (p.75):

\begin{equation*} Z_i = \frac{j\Delta_i}{a_i} \sqrt{\frac{\omega\mu}{2\pi\sigma}} \left[\frac{\ber (q) + j\bei (q)} {\ber^\prime(q)+ j\bei^\prime(q)}\right] \end{equation*}

with

\begin{equation*} q = (\omega\mu\sigma)^{1/2} a_i \end{equation*}

and $a_i$ the wire radius, $\Delta_i$ the length of the wire segment, $\sigma$ the wire conductivity and $\ber$ and $\bei$ the Kelvin functions. In the following we will use $r$ for the wire radius, not $a_i$. Note that $\ber^\prime$ is the derivative of $\ber$, we can also write $\ber$ as $\ber_0$ (order 0) and $\ber^\prime$ as $\ber_1$ (order 1) and similar for $\bei$.

In the following I'll be a little bit sloppy about the usage of the permeability of free space $\mu_0$ vs. the permeability of the wire $\mu$, for non-ferromagnetic wires these are really (almost) the same: $\mu = \mu_r * \mu_0$ with $\mu_r\approx 1$ for non-ferromagnetic material.

The NEC-2 code, uses a Fortran function ZINT which gets two parameters, the conductivity multiplied by the wavelength $\lambda$ (parameter SIGL) and the radius divided by the wavelength $\lambda$ (parameter ROLAM). Apart from an approximation of the Kelvin functions (the term computing the fraction of the Kelvin functions is BR1), ZINT computes:

X = SQRT (TPCMU * SIGL) * ROLAM
ZINT = FJ * SQRT (CMOTP / SIGL) * BR1 / ROLAM

with TPCMU = 2368.705, CMOTP=60.0, and FJ is $j$. The computed X is passed to the computation of the Kelvin functions resulting in BR1 for the fraction term. Note that the computed ZINT does not include the length, it is equivalent with the Wikipedia definition in unit length. With the reverse-engineered magic numbers

CMOTP $\approx \frac{\mu_0 c}{2\pi}$

TPCMU $\approx 2\pi c\mu_0$

($c$ is the speed of light) we get for X from the code (which we call $q$ now consistent with the formula from [2]):

\begin{align*} q &= \frac{r}{\lambda}\sqrt{2368.705\sigma\lambda} \\ &= r\sqrt{\frac{2368.705\sigma}{\lambda}} \\ &\approx r\sqrt{\frac{2\pi c\mu_0\sigma}{\lambda}} \\ &= r\sqrt{\frac{2\pi c\mu_0\sigma f}{c}} \\ &= r\sqrt{\frac{2\pi \mu_0\sigma \omega}{2\pi}} \\ &= r\sqrt{\omega\mu_0\sigma} \\ \end{align*}

which matches the definition of $q$ in [2] (see above).

for $Z_\Int$ of the Fortran implementation we get:

\begin{align*} Z_\Int &= \frac{j}{r} \sqrt{\frac{\mu_0 c}{2\pi\sigma\lambda}} \left[\frac{\ber_0 (q) + j\bei_0 (q)} {\ber_1(q)+ j\bei_1(q)}\right] \\ &= \frac{j}{r} \sqrt{\frac{\mu_0 c f}{2\pi\sigma c}} \left[\frac{\ber_0 (q) + j\bei_0 (q)} {\ber_1(q)+ j\bei_1(q)}\right] \\ &= \frac{j}{r} \sqrt{\frac{\mu_0 f}{2\pi\sigma}} \left[\frac{\ber_0 (q) + j\bei_0 (q)} {\ber_1(q)+ j\bei_1(q)}\right] \\ &= \frac{j}{r} \sqrt{\frac{\mu_0 \omega}{2\pi 2\pi\sigma}} \left[\frac{\ber_0 (q) + j\bei_0 (q)} {\ber_1(q)+ j\bei_1(q)}\right] \\ &= \frac{j}{2\pi r} \sqrt{\frac{\mu_0 \omega}{\sigma}} \left[\frac{\ber_0 (q) + j\bei_0 (q)} {\ber_1(q)+ j\bei_1(q)}\right] \\ \end{align*}

Which does NOT match the formula for $Z_i$ from [2] above. It looks like the formula from [2] (p. 75) should have the frequency $f$ instead of $\omega$ under the square root. The formula derived from the NEC-2 implementation, however, is consistent with the Wikipedia formula above.

The term before the fraction of Kelvin functions differs by a factor from the Wikipedia formula (I'm leaving out $\Delta_i$, making it a per unit length formula like the one from Wikipedia), also note that $\rho=1/\sigma$:

From code:

\begin{equation*} \frac{j}{2\pi r}\sqrt{\frac{\omega\mu}{\sigma}} \end{equation*}

From Wikipedia:

\begin{equation*} \frac{k\rho}{2\pi r} = \sqrt{\frac{-j\omega\mu}{\rho}}\frac{\rho}{2\pi r} = \frac{1}{2\pi r}\sqrt{-j\omega\mu\rho} = \frac{1}{2\pi r}\sqrt{\frac{-j\omega\mu}{\sigma}} \end{equation*}

This is a factor of

\begin{equation*} \frac{\sqrt{-j}}{j} = -\frac{1+j}{\sqrt{2}} \end{equation*}

because $\sqrt{-j} = \frac{1-j}{\sqrt{2}}$

For the Kelvin functions we have the following equivalence to Bessel functions [3] (10.61, Kelvin Functions):

\begin{equation*} \ber_v x + j \bei_v x = J_v \left(xe^{3\pi j/4}\right) = e^{v\pi j} J_v \left(xe^{-\pi j / 4}\right) \end{equation*}

With the special cases of order 0 and 1:

\begin{align*} \ber_0 x + j \bei_0 x &= e^{0} J_0 \left(xe^{-\pi j / 4}\right) \\ &= J_0 \left(\frac{1-j}{\sqrt{2}}x\right) \\ \ber_1 x + j \bei_1 x &= e^{j\pi} J_1 \left(xe^{-\pi j / 4}\right) \\ &= -J_1 \left(\frac{1-j}{\sqrt{2}}x\right) \\ \end{align*}

The Wikipedia formula passes $kr$ to the Bessel functions:

\begin{equation*} kr = r\sqrt{\frac{-j\omega\mu}{\rho}} = r\sqrt{-j\omega\mu\sigma} \end{equation*}

while the NEC-2 passes

\begin{equation*} q = r\sqrt{\omega\mu\sigma} \end{equation*}

which differ by the factor $\sqrt{-j}$. This means that we pass a factor of $\sqrt{2}$ too large to the Bessel functions. In turn due to the symmetry and the fact that the functions in the denominator are the derivative of the numerator, the output is too large by a factor of $\sqrt{2}$ which compensates the factor of the first term.

Implementing the new formula in pymininec now the reactance part of the impedance behaves similar to NEC-2 and the results for the resistance are even closer than before.

Also when plotting reactance by wire resistivity the results are now as expected. The values for MMANA are still surprising, the reason might be that I was running MMANA under WINE in Linux, the resistance values computed by MMANA look OK, though.

# Skin Effect Load and Impedance

I've recently implemented skin effect loading of wires in pymininec, my re-implementation of the Mininec antenna modeling code. The original Mininec code did not have this feature.

To attempt verification of my implementation I chose to follow an ancient publication by R. P. Haviland, W4MB [1] where he, among other things, investigates effect of wire resistance on feed point impedance. He has two graphics where the real and the imaginary part of the feed point impedance are plotted against the length/diameter ratio of the wire of a dipole. He mentions Mininec in the text but reading closely it is not apparent if for the two graphics with the feed point impedance he really used Mininec or something else. I've tried to reproduce his graphs with both PyNEC, a Python wrapper for a C++ implementation of NEC-2 and my pymininec implementation of mininec. The results of Haviland have the same tendencies as the NEC results in the following but do not match closely.

Note that the X-axis of these graphs is logarithmic and the wire gets thicker to the right (the X-axis is length/diameter of the wire). In all cases we compare ideal wires without resistance to wires with copper resistivity.

Now the resistance part is not surprising: The real part of the antenna impedance gets higher when the wires have higher resistance (because they get thinner).

The interesting part is the imaginary part (the reactance) of the antenna. The NEC result is that the reactance gets higher with higher resistance of the wire (to the left) while my pymininec implementation has a slight decrease of the reactance.

To investigate this further I've decided to just change the wire resistivity without modifying the wire radius to show that the effect is independent of the wire radius. The radius is 66µm at a wavelength of 1m, $\frac{l}{d}\approx 7576$.

In addition to PyNEC and pymininec I've also included results for MMANA-GAL and EZNEC. MMANA is originally based on Mininec. The EZNEC results match closely to the PyNEC results (it's the same engine, one implemented in Fortran, one in C++) but MMANA keeps the reactance completely constant at $42.63\Omega$ at all resistivities of the wire. Note that the most interesting wire resistivities are the first three big dots on the left, the first is an ideal wire (at 0 resistivity) followed by copper and alu.

I currently don't have an idea which result comes closest to the physical reality.

# Beat

Musician know the effect when tuning instruments: If two instruments are playing not quite the same tune, it sounds like the tune has a periodic variation in volume. The period gets larger the better tuned the two instruments are and stops when they both have the same tune. We call this effect "Beat".

Ham radio operators know the "opposite" of that effect when amplitude-modulating a radio frequency with a sine wave in the audio range: The result are two radio frequencies, one lower by the frequency of the audio sine wave, one higher by that frequency.

Amplitude modulation is simply a multiplication: We multiply the radio wave with the audio. We have

$$m(t) = A(t) \cdot\cos(2\pi f t)$$

Where $t$ is the time, $m(t)$ is the modulated signal, $\cos(2\pi f_r t)$ is the radio frequency signal with frequency $f_r$ and $A(t)$ is the audio signal. For a simple audio sine wave we get:

$$m(t) = \cos(2\pi f_a t)\cdot\cos(2\pi f_r t)$$

with $f_a$ the audio frequency.

Now we want to show that both effects are the same thing. We do not need radio frequencies, we can do the same in the audio spectrum.

From Wikipedia we take one of the trigonometric Product sum identities:

$$\displaystyle\cos\theta \cos\phi = \frac{\cos(\theta-\phi) + \cos(\theta+\phi)}{2}$$

When we apply this to our modulation example, we get:

\displaystyle \begin{align} \cos(2\pi f_a t) \cos(2\pi f_r t) & = \frac{\cos(2\pi f_r t-2\pi f_a t) + \cos(2\pi f_r t+2\pi f_a t)}{2} \\ & = \frac{\cos(2\pi(f_r-f_a)t)) + \cos(2\pi(f_r+f_a)t)}{2} \\ \end{align}

So multiplying two cosines with different frequencies results in the sum of two cosine functions, one with the frequency of the difference of the original frequencies, one with the sum.

So no matter how we create the modulated signal, by multiplying two signals or adding two signals we get the same result.

I the following example we multiply a 3Hz signal (the beat) with a 440 Hz signal (that's an 'A' in musical terms). We get a modulated signal that changes volume with the beat frequency as expected.

In [1]:
import numpy as np
# Three seconds with 18000 samples
t = np.linspace (0, 3, 18000)
# Product of 3Hz and 440Hz tune
y = np.cos (2*np.pi*3*t) * np.cos (2*np.pi*440*t)
# Plot only 1/9 of the three seconds = 1/3s = one period of the 3Hz signal
limit = int (2000)

In [2]:
import plotly.graph_objs as go
import plotly.io as pio
from plotly_fix_fn import setup_plotly
setup_plotly ('content/2024/05/plotly', '11-en')
show_opt = dict (include_plotlyjs = 'directory')

fig = go.Figure ()
fig.add_trace (go.Scatter (x = t [:limit], y = y [:limit]))
fig.update_layout (title = "Modulated signal", xaxis_title = "Time (s)", yaxis_title = "Amplitude")
#fig_widget = go.FigureWidget (fig)
fig.show (** show_opt)


Now we do a fourier transform of the result to see the frequency spectrum, we see that the result of the multiplication consists of two frequencies, one 3Hz below the 440 Hz and one 3Hz above.

In [3]:
from numpy.fft import rfft, fftfreq
fft   = rfft (y, norm = 'forward')
freqs = fftfreq (len (y), d = 3/18000)
fig = go.Figure ()
fig.add_trace (go.Scatter (x = freqs [:5000], y = np.abs (fft [:5000])))
fig.update_layout (title = "Frequency spectrum", xaxis_title = "Frequency (Hz)")
fig.update_xaxes (range = [430, 450])
fig.show (** show_opt)


We can arrive at the same result by just adding the two frequencies (and dividing by 2 to get the same amplitude):

In [4]:
y = 0.5 * (np.cos (2*np.pi*(440-3)*t) + np.cos (2*np.pi*(440+3)*t))
fig = go.Figure ()
fig.add_trace (go.Scatter (x = t [:limit], y = y [:limit]))
fig.update_layout (title = "Modulated signal", xaxis_title = "Time (s)", yaxis_title = "Amplitude")
fig.show (** show_opt)


Finally we verify again that this produces the same audio spectrum as before by doing another fourier analysis.

In [5]:
fft   = rfft (y, norm='forward')
freqs = fftfreq (len (y), d = 3/18000)
fig = go.Figure ()
fig.add_trace (go.Scatter (x = freqs [:5000], y = np.abs (fft [:5000])))
fig.update_layout (title = "Frequency spectrum", xaxis_title = "Frequency (Hz)")
fig.update_xaxes (range = [430, 450])
fig.show (** show_opt)


So what you hear when tuning instruments is not an illusion: Adding the two frequencies is the same thing as when modulating a tune with a beat frequency.

# Chipselect Glitch with SPI NOR on IMX-28

For a customer I'm maintaining a Linux environment (including kernel) for a custom board that uses a module based on the IMX-28 ARM processor. For the user space I'm using ELBE, the Embedded Linux Build Environment that allows building a user space filesystem from a Debian Linux release. Migrating from an earlier Debian version to the current stable release code-named Bookworm we needed a kernel upgrade because systemd requires some features not present in earlier kernels.

After upgrading from a Kernel in the 4.14 longterm support series to the 6.1 longterm support series our SPI NOR flash memory was no longer working. The JFFS2 file system we're using produced errors that indicated that we had a problem writing to the NOR flash (some error messages indicated that the file system expected bytes to be 0 which were 0xFF which lead to the conclusion that these bytes had not been written).

After verifying that the clock rate of the SPI bus when writing to the flash device was unchanged between kernel versions I looked at the signals of the SPI bus.

From the Oszilloscope pictures (blue is the chip select, yellow is the SPI clock) it can be seen that there is a short glitch in the chip select line. This has the effect of terminating the page program command to the NOR flash prematurely. From the zoomed-in version it can be seen that the chip-select has a glitch in the middle of a bit during the transfer. The exact position changes slightly from transfer to transfer (it doesn't hit the same bit every time but it does hit the same byte).

Further experimentation showed that the problem occurs only when the SPI transfer is initiated via direct memory access (DMA), not when the non-DMA version (programmed input-output, PIO) was used. The SPI driver optimizes transfers: Only longer transfers are done via DMA, short transfers use PIO because the DMA overhead exceeds the gain. When looking into the DMA implementation of the IMX-28 in drivers/dma/mxs-dma.c it turns out that there was a change in a flag passed to the DMA transfer, the old version has:

if (flags & DMA_CTRL_ACK)
ccw->bits |= CCW_WAIT4END;

The new version has:

if (flags & MXS_DMA_CTRL_WAIT4END)
ccw->bits |= CCW_WAIT4END;

The bit CCW_WAIT4END according the the IMX-28 processor reference: "A value of one indicates that the channel waits for the end of command signal to be sent from the APBH device to the DMA before starting the next DMA command." This flag is set on the actual DMA transfer to the NOR flash. The IMX-28 is special in that it can transfer register settings also via DMA. Prior to the memory transfer a register transfer is initiated that sets the IGNORE_CRC bit in the processors SSP register. From the processor reference on the IGNORE_CRC bit: "In SPI/SSI modes: When set to 1, deassert the chip select (SSn) pin after the command is executed".

So what happens when the CCW_WAIT4END is not set is that the chip select is deasserted too early.

The DMA transfer initiated from the SPI driver in drivers/spi/spi-mxs.c does not set the MXS_DMA_CTRL_WAIT4END flag, so the CCW_WAIT4END is not set on the transfer.

Now the fix is obvious, we simply need to set this flag on the DMA transfer initiated by the SPI driver in drivers/spi/spi-mxs.c.

# Linux Kernel route_localnet setting

The Linux kernel has a setting for network interfaces that allows routing local net (127.0.0.0/8 for IP version 4, also known as the loopback address because it typically is assigned to the lo loopback interface). It can be accessed via:

/proc/sys/net/ipv4/conf/eth0/route_localnet

for the typical eth0 network interface. The values which can be written there are either 0 (the default) for no routing or 1 for routing. Each network interface has its own special file. If a setting should be made persistent this can usually be done by configuring a setting in /etc/sysctl.conf on most Linux distributions. The setting takes the form of the above path where the prefix /proc/sys/ is removed and all / characters are replaced by a dot. In the example above this would be written as:

net.ipv4.conf.eth0.route_localnet = 1

## Now what does this setting achieve?

In the following I'm using examples from IP version 4 but the same applies to the local host (loopback) address ::1 for IP version 6.

Typically in the Linux network stack IP packets which are either originating from local net 127.0.0.0/8 or have a destination in that network are dropped when the come in over an interface that is not configured for this address range. We'll first verify this in the following.

For testing we use a simple UDP server in python that receives UDP packets and prints their source IP address and port (and length):

import socket
[...]
sock = socket.socket (socket.AF_INET, socket.SOCK_DGRAM,
socket.IPPROTO_UDP)
while True:
l = len (buf)
print ('Received %(l)s bytes from %(adr)s' % locals ())

All the values in args come from the command line via ArgumentParser. The address to bind to is 0.0.0.0 in the following, meaning it accepts packets from all network interfaces.

Now we can first test with a simple client. In the following we have two machines, .9 is the sending machine and .5 is the receiving machine.

When sending a UDP packet we can use the following simple code:

import socket
sock = socket.socket (socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.sendto (b'This is a test', (args.address, args.port))

Again we can specify the parameters on the command line. Now when we send from .9 to .5 we get:

Received 14 bytes from ('XXXXXXX.9', 38232)

where the source port 38232 is a random number created by the sending Linux kernel and the XXXXXXX part represents the high bytes of the IP address of the sender. When sending from the receiving machine to 127.0.0.1 we get:

Received 14 bytes from ('127.0.0.1', 58196)

We see that the simple server listens to both, the real ethernet interface and to the loopback interface.

Nothing new so far. Lets see what happens when we fake addresses. For creating arbitrary IP packets (or even non-IP packets) the Python package Scapy is used in the following.

Consider the following little script:

p = Ether (dst = args.mac_address, src = args.mac_address) \
/ UDP (dport = args.port, sport = args.source_port) \
/ Raw (b'This is a test')
sendp (p, iface = args.interface)

This builds an Ethernet (Ether) packet with an IP payload. The IP packet contains a UDP payload, which in turn contains a Raw string. This packet is then sent via a raw (network layer 2) network socket. Again all parameters come from the command line. Default for all the ports is 22222, the source IP is the XXXXXXX.9 address and the destination IP is the XXXXXXX.5 address. The destination and source mac addresses are both set to the mac address of the destination machine. We can replicate what we did with our simple client above:

sudo scapyclient.py

which results in the server printing:

Received 14 bytes from ('XXXXXXX.9', 22222)

It will always be port 22222 unless we specify a port option since this is the default in the settings of the script. We need to use sudo because only the superuser is allowed to send arbitrary packets on layer-2 of the network stack.

Now what happens when we start faking packets? For this we start tcpdump on both, the sending machine and the receiving machine as follows:

sudo tcpdump -n -i eth0 udp port 22222

Of course if your network interface is not called eth0 you would replace the parameter to the -i option above. The -n option tells tcpdump to not attempt DNS lookups of received IP addresses. Now lets send a packet with a localhost address as the source address:

sudo scapyclient.py --source-address=127.0.0.1

We see it on both machines in the tcpdump output:

TT:TT:TT.TTTTTT IP 127.0.0.1.22222 > 10.23.5.5.22222: UDP, length 14

where TT:TT:TT.TTTTTT is a timestamp. So the packet is going out via the network interface of the sending machine and it is being received by the receiving machine (because the destination mac address matches) but it is not printed by our simple server.

The same happens when we fake the destination address:

sudo scapyclient.py --address=127.0.0.1

The tcpdump output is:

TT:TT:TT.TTTTTT IP 10.23.5.9.22222 > 127.0.0.1.22222: UDP, length 14

It is also printed by both machines by tcpdump and again it is not printed by our simple server.

Now lets see what happens if we enable route_localnet for our ethernet interface (the following command only works as root, of course):

echo 1 > /proc/sys/net/ipv4/conf/eth0/route_localnet

We again fake the sender address:

sudo python3.11 scapyclient.py --source-address=127.0.0.1

Again the packet is printed by both running tcpdump processes but again not by our simple server. But faking the destination address:

sudo python3.11 scapyclient.py --address=127.0.0.1

Apart from the packet being logged by tcpdump we get an answer from our server:

Received 14 bytes from ('10.23.5.9', 22222)

Note that this works even if we tell our simple server to only listen to localhost:

python3 server.py --address=127.0.0.1

Still outputs our faked packet although this did not come in via the loopback interface:

Received 14 bytes from ('10.23.5.9', 22222)

## Security implications

Some applications are using local servers that listen only to the loopback address and expect these to be able to only receive packets via the loopback interface – from local processes running on the same machine as the server.

This is always a dangerous assumption, so you should never do dangerous things that rely on this type of (non-) authentication. At least on Linux there is some safeguard as long as you keep the setting of the route_localnet to 0. On other operating systems you may not be so lucky. And even on Linux this behaviour can be turned off.

## Why would you want to set this?

Sometimes you want to forward connections originating on the local machine that try to connect to local port to a remote machine, e.g. you have in /etc/hosts

127.0.0.1 some.remote.example.com

and you want to connect to that machine with a process from localhost. You can add firewall rules that rewrite packets to achieve this:

iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 443 \
-j DNAT --to-destination XXXXXXX.5:1443
iptables -t nat -A POSTROUTING -d XXXXXXX.5 -p tcp --dport 1443 \
-j SNAT --to-source XXXXXXX.9

The first rule is doing the destination address (and port) rewriting while the second rule makes sure that answer packets reach us by rewriting the source address. But without setting route_localnet on the outgoing ethernet interface this will not work, the packets will be dropped.

But when setting this beware of the explanation in the Security implications section. Because it has also an impact on what packets are received from the network.

# Optimizing Floating-Point Problems with Evolutionary Algorithms

[This is an extended abstract for a talk I'm planning]

Evolutionary Algorithms (EA) are a superset of genetic algorithms and related optimization algorithms. Genetic algorithms usually work with bits and small integer numbers.

There are other EAs that directly work with floating point numbers, among the Differential Evolution (DE) [1] [2] [3].

The talk gives an introduction to optimization of floating-point problems with DE. It uses examples from electrical engineering as well as from optimization of actuation waveforms of inkjet printers. Piezo-electric inkjet printers use an actuation waveform to jet drops out of a nozzle. This waveform (among other parameters like the jetted fluid) determines the quality of the jetted drops.

For the software I'm using the Python bindings PGApy [4] for the Parallel Genetic Algorithm Package PGAPack [5] which was originally developed at Argonne National Laboratories. I'm maintaining both packages for some years now. Among others, Differential Evolution (DE) and strategies for multi-objective optimization (using NSGA-II [6]) where newly implemented.

# Differential Evolution Illustrated

This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations that support Differential Evolution (DE).

Differential Evolution (DE) [1], [2], [3] is a population based optimizer that is similar to other evolutionary algorithms. It is quite powerful and implemented in PGAPack (and usable in the Python wrapper PGAPy). To illustrate some points about this algorithm I've constructed a simple parcours in OpenSCAD.

To test the optimizer with this parcours the gene in the genetic algorithm (DE calls this the parameters) is the X and Y coordinate to give a position on the parcours. In the following we're also calling these coordinates a vector as is common in the DE literature. We initialize the population in the region of the start of the ramp. We allow the population to leave the initialization range.

Note that the corners of the parcours are flat areas with the same height. Areas like this are generally difficult for optimization algorithms because the algorithm cannot know in which direction better values (if available at all) can be found. This is also the reason why we do not put the initialization range inside the flat area at the bottom of the parcours. Note that normally one would initialize the population over the whole area to be searched. In our case we want to demonstrate that the algorithm is well capable of finding a solution far from the initialization range.

The algorithm of DE is quite simple (see, e.g. [3] p.37-47): Each member $\mathbf x_{i,g}$ in the population of the current generation $g$, where $i$ is the index of the population member is crossed over with a mutant vector $v_{i,g}$. The mutant vector is generated by selecting three population members at random (without replacement) from the population and combining them as follows.

\begin{equation*} \mathbf v_{i,g} = \mathbf x_{r_0,g} + F \cdot (\mathbf x_{r_1,g} - \mathbf x_{r_2,g}) \end{equation*}

As can be seen, the weighted difference of two random members of the population is added to a third population member. All indeces $r_0, r_1,$ and $r_2$ are different and different from $i$. The weight $F$ is a configuration parameter that is typically $0.3 \le F < 1$. This algorithm is the classic DE, also often termed de-rand-1-bin. A variant uses the index of the best individual instead of $r_0$ and is called de-best-1-bin. If more than a single difference is added, another variant would be, e.g., de-rand-2-bin. The last component of the name bin refers to uniform crossover [4], a binomial distribution. In DE, parameters from the mutant vector are selected with a certain probability $Cr$. Note that DE makes sure that at least one parameter from the mutant vector is selected (otherwise the original vector $\mathbf x_{i,g}$ would be unchanged).

More details about DE and the usage with PGAPack can be found in the section Differential Evolution and in the Mutation sections of the PGAPack documentation.

To use the OpenSCAD model for optimization, we convert it to a greyscale heightmap using an STL to PNG converter (STL is a 3D format that can be generated by OpenSCAD). The evaluation function then simply returns the greyscale value at the given location (after rounding the gene X and Y values to an integer). Values outside the picture are returned as 0 (black). The resulting optimization run is shown below in an animated image. The population is quite small, there are only 6 individuals in the population. We clearly see that when the differences between individuals gets large (on the straight ridges of the parcours), the optimization proceeds in larger steps. We also see that the flat corners can take quite some time to escape from and in the corners the algorithm slows down. Finally in the last corner, the cone is climbed and the individuals converge almost to a single point. The algorithm is stopped when the first individual returns an evaluation of the largest greyscale value (representing white). Note that I cheated a little in that many optimization runs take longer (in particular the optimization can get stuck for many generations in the flat parts of the ridge) and I selected a run that produced a good animation. So the selected run is not the average but one of the shorter runs.

# Notes on Premature Convergence in Genetic Algorithms

This post again applies to PGAPack and PGApy or other Genetic Algorithm (GA) implementations.

When optimizing solutions with GA, it sometimes happens that a sub-optimal solution is found because the whole population converges early to a part of the search space where no better solutions are found anymore. This effect is called premature convergence.

It is usually hard to detect premature convergence, a good measure is the mean hamming distance between individuals. In PGAPack reporting of the hamming distance can be enabled with the reporting option PGA_REPORT_HAMMING, set with the function PGASetPrintOptions in PGAPack and with the print_options constructor parameter in PGApy. Unfortunately this is only implemented for the binary datatype.

One reason for the effect of premature convergence is the use of Fitness Proportionate Selection as detailed in an earlier post in this blog [1]. If during the early stages of the search an individual is discovered that is much better than anything found so far, the chance is high that this individual takes over the whole population when Fitness Proportionate Selection is in use, preventing any further progress of the algorithm. The reason is that an individual that is much better than all others gets an unreasonably high proportion of the roulette wheel when selecting individuals for the next generation resulting in all other genetic material having only a slim chance of making it into the next generation.

Another reason for premature convergence can be a small population size. Goldberg et. al. [9] give estimates of the population size for classic GA with a small alphabet (the number of different allele values, e.g. 0 and 1 for a binary GA) with cardinality $\chi$, a problem specific building block size that overcomes deception $k \ll n$ where $k$ is the building block size (a measure for the difficulty of the problem) and $n$ is the string (gene) length. They show that the population size should be $O(m\chi^k)$ with $m=n/k$ so that for a given difficulty of the problem $k$ the population size is proportional to the string length $n$. This result, however, does not readily translate to problems with a large alphabet, e.g. floating-point representations like the real data type in PGAPack. For floating point representations, difficulty usually translates to how multimodal a problem is, i.e., how many peaks (in case of a maximization problem) or valleys (in case of a minimization problem) there are in the objective function.

Now if with an adequate population size and an appropriate selection scheme premature convergence still occurs, there are some mechanisms that can be used.

## Prevention of Duplicates

PGAPack implements a mechanism for preventing duplicate gene strings (individuals). In previous implementations the computation effort was quadratic in the population size $N$, i.e. the effort was $O(N^2)$ (it compared each new individual with all current members of the population, once for each new individual). In the latest versions it uses hashing for detecting duplicates, reducing the overhead to $O(N)$, a small constant overhead for each new individual.

For user-defined data types this means that the user has to define a hash function for the data type in addition to the string comparison function. For the built-in data types (binary, integer, real) this is automatically available.

Prevention of duplicates works quite well for binary and integer data types, especially if the genetic operators have a high probability of producing duplicates. It does not work so well for the real data type because new individuals tend to be different from other individuals even if they can often be very close to already existing individuals.

## Restricted Tournament Replacement

An algorithm originally called restricted tournament selection by Harik [2], [3] and later adopted under the name of restricted tournament replacement (RTR) by Pelikan [4] uses the old and new population for deciding if a candidate individual will be allowed into the new population. It works by randomly selecting a number of members from the old population (called the selection window), chosing the individual which is most similar to the candidate, and allows the candidate into the new population only if it is better than this most similar individual.

The default for the number of individuals randomly selected from the old population (the window size) by default is $\min (n, \frac{N}{20})$ [4] where $n$ is the string (gene) length and $N$ is the population size. This can be set by the user with the PGASetRTRWindowSize function for PGAPack and with the rtr_window_size constructor parameter of PGApy.

The RTR algorithm needs a similarity metric for deciding how close an individual is to another. By default this uses a manhattan distance (equivalent to the hamming distance for binary genes), i.e. an allele-by-allele sum of distances, but can be set to an euclidian distance or a user-defined metric with the user-function mechanism of PGAPack. Comparison of an euclidian distance metric for RTR is in the magic square example in PGApy where the use of the euclidian distance can be turned on with a command-line option.

Restricted tournament replacement works well not only for binary and integer genes but also for real genes. It can be combined with different evolutionary algorithm settings.

The effect of RTR on a problem that tends to suffer from premature convergence can be seen in the test program examples/mgh/testprog.f, this implements several test functions from an old benchmark of nonlinear test problems [5]. The test function that exhibits premature convergence is what the authors call a "Gaussian function", described as example (9) in the paper [5] and implemented as function 3 in examples/mgh/objfcn77.f. This function is given as

\begin{equation*} f(x) = x_1 \cdot e^{\frac{x_2(t_i - x_3)^2}{2}} - y_i \end{equation*}

with

\begin{equation*} t_i = (8 - i) / 2 \end{equation*}

And tabulated values for $y_i$ given in the paper or the implementation in examples/mgh/objfcn77.f. The minimization problem from these equations is

\begin{equation*} f (x) = \sum_{i=1}^{m} f_i^2(x) \end{equation*}

with $m=15$ for this test problem. The authors [5] give the vector $x_0 = (0.4, 1, 0)$ for the minimum $f(x_0) = 1.12793 \cdot 10^{-8}$ they found. The original Fortran implementation in examples/mgh/testprog.f uses a population size of 10000 with default settings for the real data type of PGAPack. The large population size is chosen because otherwise the problem exhibits premature convergence. It finds a solution in 100 generations $x_0=(0.3983801, 0.9978369, 0.009918243)$ with an evaluation value of $f(x_0)=2.849966\cdot 10^{-5}$. The number of function evaluations needed were 105459 (this is a little less than $10000 + 100 \cdot 1000$, i.e. evaluation of the initial generation plus evaluation of 10% of the population of each generation, the probability of crossover and mutation is not 100%, so it happens that none of the operators is performed on an individual and it is not re-evaluated).

I've implemented the same problem with Differential Evolution [6], [7], [8] in examples/mgh/testprogde.c (the driver program implemented in C because I really do not speak Fortran but using the same functions from the Fortran implementation linking the Fortran and C code into a common executable). This uses a population size of 2000 (large for most problems when using Differential Evolution, again for the reason of premature convergence) and finds the solution $x_0=(0.3992372, 0.9990791, -0.0007697923)$ with an evaluation value of $f(x_0)=7.393123\cdot 10^{-7}$ in only 30 generations. This amounts to 62000 function evaluations (Differential Evolution creates all individuals for the new generation and decides afterwards which to keep).

When using RTR with this problem in examples/mgh/testprogdertr.c, the population size can be reduced to 250 and even after 100 generations the search has not converged to a suboptimal solution. After 100 generations we find $x_0=(0.398975, 1.000074, -6.719886 \cdot 10^{-5})$ and $f(x_0)=1.339766\cdot 10^{-8}$ (but also with some changed settings of the Differential Evolution algorithm). This amounts to 25250 function evaluations.

## Restart

A last resort when the above mechanisms do not work is to regularly restart the GA whenever the population has converged too much. The restart mechanism implemented in PGAPack uses the best individual from the population to re-seed a new population with variations created by mutation from this best individual. Restarts can be enabled by setting PGASetRestartFlag in PGAPack or using the restart constructor parameter in PGApy. The frequency (default is every 50 generations) of restarts can be set with PGASetRestartFrequencyValue in PGAPack and the restart_frequency constructor parameter in PGApy.