IFS Driven by Letters

What about other approaches to driving IFS by text? Here is the IFS driven by the text of Benoit Mandelbrot's essay "Mathematics and Society in the Twentieth Century," read as a DNA string. Specifically, we took the text of the essay, removed all punctuation, spaces, and paragraph indents, and converted all letters to lower case, obtaining a string of 12,325 characters. Then we read through the string sequentially, applying T1 for each occurrence of c, T2 for each occurrence of a, T3 for each occurrence of t, and T4 for each occurrence of g, and ignoring every other letter.
CATG picture
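The procedure just described can be sketched in a few lines of Python. This is only an illustration, not the code used to make the picture: the names clean, driving_sequence, and drive_ifs are hypothetical, and the maps are assumed to be T_i(x, y) = (x/2, y/2) plus an offset, with T1 lower left, T2 lower right, T3 upper left, and T4 upper right. The starting point (0.5, 0.5) is also an assumption.

```python
import string

# Assumed maps: T_i(x, y) = (x/2, y/2) + offset, with the unit square split as
# T1 lower left, T2 lower right, T3 upper left, T4 upper right.
OFFSETS = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (0.0, 0.5), 4: (0.5, 0.5)}

# The DNA-style rule from the text: c -> T1, a -> T2, t -> T3, g -> T4.
DNA_RULE = {"c": 1, "a": 2, "t": 3, "g": 4}


def clean(text):
    """Keep only the letters a-z, converted to lower case."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def driving_sequence(text, rule):
    """Translate text into indices 1..4, skipping letters not in the rule."""
    return [rule[ch] for ch in clean(text) if ch in rule]


def drive_ifs(sequence, start=(0.5, 0.5)):
    """Apply T_i for each index i in turn and return the points visited."""
    x, y = start
    points = []
    for i in sequence:
        dx, dy = OFFSETS[i]
        x, y = x / 2 + dx, y / 2 + dy
        points.append((x, y))
    return points


# Short stand-in for the essay text: just its title.
sample = "Mathematics and Society in the Twentieth Century"
seq = driving_sequence(sample, DNA_RULE)
points = drive_ifs(seq)
```

Passing the full essay text in place of sample and scatter-plotting points would give a picture of the same kind, under the assumed layout of the four maps.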

We certainly see a clustering of points along the diagonal, but does that reflect anything more than the greater abundance of a and t? (There are 461 c, 1080 a, 1226 t, and only 192 g.)
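To put a number on that abundance, here is a small sketch that computes the combined share of a and t from the counts quoted above. The helper letter_counts is a hypothetical name, included only to show how such counts could be tallied from a text.

```python
from collections import Counter


def letter_counts(text, letters="catg"):
    """Count how often each of the given letters occurs, ignoring case."""
    counts = Counter(ch for ch in text.lower() if ch in letters)
    return {ch: counts[ch] for ch in letters}


# Counts quoted in the text for the essay.
essay_counts = {"c": 461, "a": 1080, "t": 1226, "g": 192}
total = sum(essay_counts.values())
at_share = (essay_counts["a"] + essay_counts["t"]) / total
print(total, round(at_share, 3))  # prints: 2959 0.779
```

So roughly 78 percent of the transformations applied are T2 or T3, which is worth keeping in mind before reading anything more into the clustering.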

Another approach is to divide the letters of the alphabet into four bins. Here is the driven IFS from the same text, plotted this way:
apply T1 for every a, b, c, d, e, f, g,
apply T2 for every h, i, j, k, l, m,
apply T3 for every n, o, p, q, r, s,
apply T4 for every t, u, v, w, x, y, z

Does this picture suggest any interpretation? (Note there are 4119 occurrences of a through g, 2505 occurrences of h through m, 3526 occurrences of n through s, and 2175 occurrences of t through z.)
Whole text (WT) picture
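The four-bin rule can be written as a lookup table. The sketch below is illustrative only; BINS, BIN_RULE, and bin_sequence are hypothetical names, and the last lines simply check that the four counts quoted above account for all 12,325 characters.

```python
# The four bins from the text: a-g -> T1, h-m -> T2, n-s -> T3, t-z -> T4.
BINS = ["abcdefg", "hijklm", "nopqrs", "tuvwxyz"]
BIN_RULE = {ch: i for i, group in enumerate(BINS, start=1) for ch in group}


def bin_sequence(text):
    """Map every letter of the text to its bin index 1..4, ignoring non-letters."""
    return [BIN_RULE[ch] for ch in text.lower() if ch in BIN_RULE]


# Bin counts quoted in the text; together they cover the whole cleaned string.
quoted_bin_counts = [4119, 2505, 3526, 2175]
assert sum(quoted_bin_counts) == 12325
```

For instance, bin_sequence("cat g") gives [1, 1, 4, 1]: under this rule c, a, and g all fall in the first bin.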

Of course, there is a natural sequential ordering of the real numbers, whereas the ordering of the alphabet is arbitrary. (Glance at your computer keyboard if you think the ordering of the letters is anything other than arbitrary.)

Comparing these pictures can lead to the discovery of a delicate point about driven IFS, most often raised by the question, "The CATG picture is driven by a subsequence of the WT picture, so shouldn't the CATG picture be a subset of the WT picture? Examining the lower right corners, it is clear this is not true." This question has two answers, one obvious, one more subtle.

The obvious answer is, "Look at the bins." For the WT picture, every c, a, and g yields an application of T1, and every t an application of T4, so there should be no relation between the pictures.

The subtle answer is, "Even if the bins of the WT picture were changed so c is in bin 1, a in bin 2, t in bin 3, and g in bin 4, the CATG picture need not be a subset of the WT picture because the order in which the Ti are applied will be different." More generally, after converting to the driving sequence of four symbols, a subsequence need not produce a picture that is a subset of the original picture. For example, suppose the original sequence begins 121..., and has no consecutive 1s. Further, suppose the subsequence is made by deleting the second, fourth, sixth, etc., terms of the original sequence. The IFS driven by the original sequence has no points in the subsquare with address 11, while the IFS driven by the subsequence does have a point in the 11 subsquare.
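The example above can be checked mechanically. This sketch assumes the usual address convention, in which a point lies in the subsquare with address ij when T_i was applied immediately after T_j; it ignores the very first point, whose subsquare depends on the starting point. The function name length2_addresses is hypothetical.

```python
def length2_addresses(sequence):
    """Return the set of addresses (i, j): T_i applied immediately after T_j."""
    return {(cur, prev) for prev, cur in zip(sequence, sequence[1:])}


# An original sequence beginning 1212..., with no consecutive 1s.
original = [1, 2] * 10

# Delete the second, fourth, sixth, ... terms: what remains is 1, 1, 1, ...
subsequence = original[::2]

print((1, 1) in length2_addresses(original))     # prints: False
print((1, 1) in length2_addresses(subsequence))  # prints: True
```

Deleting terms creates new neighbors in the sequence, and it is the pairs of neighbors, not the individual symbols, that determine which length-2 subsquares are occupied.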
