Section8.1Forbidden Words

Let $\al$ denote a binary string, and let $L$ denote the set of binary of binary strings that do not contain $\al$ as a substring. How many strings are there in $L$ with length $n$?

Let $M$ denote the binary strings that contain exactly one copy of $\al$, as a suffix. If $\eps$ denotes the empty string, then

$$\eps+L(0+1) = L+M.\label{eq-eL01}\tag{8.1.1}$$
Also $L\al$ clearly contains $M$ as a subset, but for example if $\al=111$, then
$L\al =M+M1+M11.$
We develop a precise description of $L\al$. Let $\al\backslash\be$ denote the set of suffixes $\be_1$ such that $\be_0\be_1=\be$ and $\be_0$ is a suffix of $\al$. For example if
$\al =100110,\quad \be=1011,$
then
$\al\backslash\be = \{11\},\quad \al\backslash\al = \{\eps,0110\}$

Note that $\al\backslash\al$ is a finite language, i.e., a finite set of strings.

If $N$ is a set of binary strings, let $N(t)$ denote its generating series. This is what we get when we substitute $t$ for $0$ and $1$, a string of length $\ell$ maps to $t^\ell$ and the language maps to its generating series. Now \eqref{eq-eL01} yields

$1+2t L(t) = L(t)+M(t)$
and, if $D=\al\backslash\al$ and $|\al|$ denotes the length of $\al$, the lemma yields that
$L(t)t^{|\al|} = M(t) D(t).$

Let's use this to compute some numbers. We first set up the ring of formal power series over the rationals.

If $\al=111$, then $D(t)=1+t+t^2$ and $L(t)$ is

As an exercise, write a program which takes a binary string $\al$ and computes the generating series for $\al\backslash\al$. (The string arrives as a Python string, there is no reason why your procedure should not work for strings over an arbitrary alphabet.)

The square root of the series $L(2t)$ has non-negative integer coefficients.

Does this hold for other forbidden substrings?

For a second variant, suppose $\cF=\{\seq\al1d\}$ is a set of binary strings. Let $L$ be the set of strings that contain no element of $\cF$ and let $M_i$ denote the set of strings that contain one copy of $\al_i$, as a suffix, but do not contain any other copies of strings in $\cF$. If $|\cF|=2$, we have equations

\begin{align*} \eps+L(0+1) &= L +M_1 +M_2\\ L\al_1 &= M_1(\al_1\backslash\al_1) +M_2(\al_1\backslash\al_2)\\ L\al_2 &= M_1(\al_2\backslash\al_1) +M_2(\al_2\backslash\al_2). \end{align*}
Write a program that, given $\al_1$ and $\al_2$, computes the generating series for $L$, $M_1$ and $M_2$.