User:Songyunc/square-free word


In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern .

Finite squarefree words

edit

Binary alphabet

edit

Over a binary alphabet  , the only squarefree words are the empty word   and  .

Ternary alphabet

edit

Over a ternary alphabet  , there are infinitely many squarefree words. It is possible to count the number   of ternary squarefree words of length  .

The number of ternary squarefree words of length n[1]
n 0 1 2 3 4 5 6 7 8 9 10 11 12
c(n) 1 3 6 12 18 30 42 60 78 108 144 204 264

This number is bounded by  , where  [2]. The upper bound on   can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves squarefreeness[2].

Alphabet with more than three letters

edit

Since there are infinitely many squarefree words over three-letter alphabets, this implies there are also infinitely many squarefree words over an alphabet with more than three letters.

The following table shows the exact growth rate of the k-ary squarefree words:

The growth rate of the k-ary squarefree words[2]
alphabet size (k) 4 5 6 7 8 9
growth rate 2.6215080 3.7325386 4.7914069 5.8284661 6.8541173 7.8729902
alphabet size (k) 10 11 12 13 14 15
growth rate 8.8874856 9.8989813 10.9083279 11.9160804 12.9226167 13.9282035

2-dimensional words

edit

Consider a map   from   to  , where   is an alphabet and   is called a 2-dimensional word. Let   be the entry  . A word   is a line of   if there exists  such that  , and for  [3].

Carpi[4] proves that there exists a 2-dimensional word   over a 16-letter alphabet such that every line of   is squarefree. A computer search shows that there are no 2-dimensional words  over a 7-letter alphabet, such that every line of   is squarefree.

Generating finite squarefree words

edit

Shur[5] proposes an algorithm called R2F (random-t(w)o-free) that can generate a squarefree word of length   over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a (k+1)-ary squarefree word.

algorithm R2F is
    input: alphabet size  ,
           word length  
    output: a (k+1)-ary squarefree word  of length  .

    (Note that   is the alphabet with letters  .)
    (For a word  ,   is the permutation of   such that   precedes   in   if the 
     right most position of   in   is to the right of the rightmost position of   in  .
     For example,   has  .)

    choose   in   uniformly at random 
    set   to   followed by all other letters of   in increasing order
    set the number   of iterations to 0

    while   do
        choose   in   uniformly at random
        append   to the end of  
        update   shifting the first   elements to the right and setting  
        increment   by  
        if   ends with an square of rank   do
            delete the last   letters of  

    return  

Every (k+1)-ary squarefree word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of  .

The expected number of random k-ary letters used by Algorithm R2F to construct a (k+1)-ary squarefree word of length   is Note that there exists an algorithm that can verify the squarefreeness of a word of length   in   time. Apostolico and Preparata[6] give an algorithm using suffix trees. Crochemore[7] uses partitioning in his algorithm. Main and Lorentz[8] provide an algorithm based on the divide-and-conquer method. A naive implementation may require   time to verify the squarefreeness of a word of length  .

Infinite squarefree words

edit

There exist arbitrarily long squarefree words in any alphabet with three or more letters, as proved by Axel Thue[9].

Examples

edit

First difference of the Thue–Morse sequence

edit

One example of an infinite squarefree word over an alphabet of size 3 is the word over the alphabet   obtained by taking the first difference of the Thue–Morse sequence [9]. That is, from the Thue–Morse sequence

 

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting squarefree word is

 (sequence A029883 in the OEIS).

Another example found by John Leech[10] is defined recursively over the alphabet  . Let   be any squarefree word starting with the letter  . Define the words   recursively as follows: the word   is obtained from   by replacing each   in   with  , each   with   , and each   with  . It is possible to prove that the sequence converges to the infinite squarefree word 

Generating infinite squarefree words

edit

Infinite squarefree words can be generated by squarefree morphism. A morphism is called squarefree if the image of every squarefree word is squarefree. A morphism is called k–squarefree if the image of every squarefree word of length k is squarefree.

Crochemore[11] proves that a uniform morphism   is squarefree if and only if it is 3-squarefree. In other words,   is squarefree if and only if   is squarefree for all squarefree   of length 3. It is possible to find a squarefree morphism by brute-force search.

algorithm squarefree_morphism is
    output: a squarefree morphism with the lowest possible rank  .

    set  
    while True do
        set   to the list of all squarefree words of length   over a ternary alphabet
        for each   in   do
            for each   in   do
                for each   in   do
                    if   do
                        break from the current loop (advance to next  )
                    if   and   do
                        if   is squarefree for all squarefree   of length   do
                            return  
        increment   by  

Over a ternary alphabet, there are exactly 144 uniform squarefree morphisms of rank 11 and no uniform squarefree morphisms with a lower rank than 11.

To obtain an infinite squarefree words, start with any squarefree word such as  , and successively apply a squarefree morphism   to it. The resulting words preserve the property of squarefreeness. For example, let   be a squarefree morphism, then as  ,   is an infinite squarefree word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is squarefree if and only if it is 5-squarefree[11].

Letter combinations in squarefree words

edit
 
Extending a squarefree word to avoid  .

Avoid two-letter combinations

edit

Over a ternary alphabet, a squarefree word of length more than 13 contains all the squarefree two-letter combinations[12].

This can be proved by constructing a squarefree word without the two-letter combination  . As a result,     is the longest squarefree word without the combination   and its length is equal to 13.

Note that over a more than three-letter alphabet there are squarefree words of any length without an arbitrary two-letter combination.

Avoid three-letter combinations

edit

Over a ternary alphabet, a squarefree word of length more than 36 contains all the squarefree three-letter combinations[12].

However, there are squarefree words of any length without the three-letter combination  .

Note that over a more than three-letter alphabet there are squarefree words of any length without an arbitrary three-letter combination.

Density of a letter

edit

The density of a letter   in a finite word   is defined as   where   is the number of occurrences of   in   and   is the length of the word. The density of a letter   in an infinite word is   where   is the prefix of the word   of length  [13].

The minimal density of a letter   in an infinite ternary squarefree word is equal to  [13].

The maximum density of a letter   in an infinite ternary squarefree word is equal to  [14].

edit

A cube-free word is one with no occurrence of www for a factor w. The Thue-Morse sequence is an example of a cube-free word over a binary alphabet.[15] This sequence is not squarefree but is "almost" so: the critical exponent is 2.[16] The Thue–Morse sequence has no overlap or overlapping square, instances of 0X0X0 or 1X1X1:[15] it is essentially the only infinite binary word with this property.[17] Dejean's theorem characterizes the minimum possible critical exponents for each alphabet size.[18]

The Thue number of a graph G is the smallest number k such that G has a k-coloring for which the sequence of colors along every simple path is squarefree.

The Kolakoski sequence is an example of a cube-free sequence.

An abelian p-th power is a subsequence of the form   where each   is a permutation of  . There is no abelian-squarefree infinite word over an alphabet of size three: indeed, every word of length eight over such an alphabet contains an abelian square. There is an infinite abelian-squarefree word over an alphabet of size five.[19]

Notes

edit
  1. ^ "A006156 - OEIS". oeis.org. Retrieved 2019-03-28.
  2. ^ a b c Shur, Arseny (2011). "Growth properties of power-free languages". Computer Science Review: 28–43.
  3. ^ Berthe, Valerie; Rigo, Michel (eds.), "Preface", Combinatorics, Words and Symbolic Dynamics, Cambridge University Press, pp. xi–xviii, ISBN 9781139924733, retrieved 2019-04-08
  4. ^ Carpi, Arturo (1988). "Multidimensional unrepetitive configurations". Theoretical Computer Science. 56 (2): 233–241. doi:10.1016/0304-3975(88)90080-1. ISSN 0304-3975.
  5. ^ Shur, Arseny (2015). "Generating square-free words efficiently". Theoretical Computer Science. 601: 67–72.
  6. ^ Apostolico, A.; Preparata, F.P. (Feb 1983). "Optimal off-line detection of repetitions in a string". Theoretical Computer Science. 22 (3): 297–315. doi:10.1016/0304-3975(83)90109-3. ISSN 0304-3975.
  7. ^ Crochemore, Max (Oct 1981). "An optimal algorithm for computing the repetitions in a word". Information Processing Letters. 12 (5): 244–250. doi:10.1016/0020-0190(81)90024-7. ISSN 0020-0190.
  8. ^ Main, Michael G; Lorentz, Richard J (Sep 1984). "An O(n log n) algorithm for finding all repetitions in a string". Journal of Algorithms. 5 (3): 422–432. doi:10.1016/0196-6774(84)90021-x. ISSN 0196-6774.
  9. ^ a b Berstel, Jean (1994). Axel Thue's papers on repetitions in words a translation. Départements de mathématiques et d'informatique, Université du Québec à Montréal. ISBN 2892761409. OCLC 494791187.
  10. ^ Leech, J. (1957). "A problem on strings of beads". Math. Gazette. 41: 277–278. Zbl 0079.01101.
  11. ^ a b Berstel, Jean (April 1984). "Some Recent Results on Squarefree Words" (PDF). Annual Symposium on Theoretical Aspects of Computer Science: 14–25.
  12. ^ a b Zolotov, Boris. "Another Solution to the Thue Problem of Non-Repeating Words" (PDF). ArXiv 2015.
  13. ^ a b Khalyavin, Andrey (2007). "The minimal density of a letter in an infinite ternary square-free word is 883/3215" (PDF). Journal of Integer Sequences. 10(2): 3.
  14. ^ Ochem, Pascal (2007). "Letter frequency in infinite repetition-free words". Theoretical Computer Science. 380 (3): 388–392. doi:10.1016/j.tcs.2007.03.027. ISSN 0304-3975.
  15. ^ a b Lothaire (2011) p.113
  16. ^ Krieger, Dalia (2006). "On critical exponents in fixed points of non-erasing morphisms". In Ibarra, Oscar H.; Dang, Zhe (eds.). Developments in Language Theory: Proceedings 10th International Conference, DLT 2006, Santa Barbara, CA, USA, June 26-29, 2006. Lecture Notes in Computer Science. Vol. 4036. Springer-Verlag. pp. 280–291. ISBN 3-540-35428-X. Zbl 1227.68074.
  17. ^ Berstel et al (2009) p.81
  18. ^ Rampersad, Narad; Shallit, Jeffrey (2016). "Repetitions in words". Combinatorics, words and symbolic dynamics. Encyclopedia Math. Appl. Vol. 159. Cambridge Univ. Press, Cambridge. pp. 101–150. MR 3525483.
  19. ^ Blanchet-Sadri, Francine; Simmons, Sean (2011). "Avoiding Abelian Powers in Partial Words". In Mauri, Giancarlo; Leporati, Alberto (eds.). Developments in Language Theory. Proceedings, 15th International Conference, DLT 2011, Milan, Italy, July 19-22, 2011. Lecture Notes in Computer Science. Vol. 6795. Berlin, Heidelberg: Springer-Verlag. pp. 70–81. doi:10.1007/978-3-642-22321-1_7. ISBN 978-3-642-22320-4. ISSN 0302-9743.

References

edit