Consider 26 letters $A,..., Z$. A string is a finite sequence consisting of those letters. We say that a string $s$ is nice if it contains each of the 26 letters at least once, and each permutation of letters $A,..., Z$ occurs in $s$ as a subsequences the same number of times. Prove that: (a) There exists a nice string. (b) Any nice string contains at least $2022$ letters.
Problem
Source: Czech-Polish-Slovak Match 2022 P6
Tags: combinatorics
CANBANKAN
17.09.2022 19:21
Let f(n,k) denote the following assertion:
There exists a string $S$ with characters in $[n]=\{1, \cdots, n\}$ such that for any $k$ distinct numbers of $[n]$, call $A=a_1,a_2, \cdots, a_k$, the number of subsequences of $S$ that are congruent to $A$ is equal.
We induct on $k$.
Base case: $(n,1)$ is trivial.
Inductive step: let $S'$ be the string for $(n,k-1)$, which exists by $f(n,k-1)$.
Definitions: Let $S'=s_1s_2 \cdots s_m$. Define $S_1\bigoplus S_2$as the concatenation of $S_1, S_2$. Order the permutations of $[n]$ as $\pi_1, \cdots, \pi_{n!}$. For a permutation $\pi$, define $S'(\pi)$ to be $\pi(s_1) \bigoplus \pi(s_2) \bigoplus \cdots \bigoplus \pi(s_m)$.
Consider the following string:
$$S=S'(\pi_1) \bigoplus S'(\pi_2) \cdots, S'(\pi_{n!})$$I claim this string satisfies the criteria for $(n,k)$. Consider a $T\le k$ distinct numbers in $[n]$, call $t_1 t_2 \cdots t_T$. We divide into two cases:
Case 1: $t_i$ are all from $S'(\pi_z)$ for the same $z$. Then I claim any string of the same length appears an equal number of times. For a string $t_1\cdots t_T$ let $f(t_1, \cdots, t_T)$ be the number of times it appears in $S'(id)$ where $id$ is the identity permutation. Then it appears in $S$ exactly $\sum_{\pi} S'(\pi)$ times where the sum goes over all permutations of $[n]$.
Case 2: $t_i$ are from at least two different $z$'s.
Suppose they are from $z_1, \cdots, z_m$, and $a_j$ letters of $T$ appear in $S'(\pi_{z_j})$. Note $a_j\le k-1$. Therefore by inductive hypothesis the number of ways is exactly $\prod_{i=1}^m $f(\text{some} $a_j\le k-1$ \text{ letters})
Which by inductive hypothesis doesn't depend on the letters. All in all, this completes the induction, and the string appearing in $(n,n)$ is exactly the string we need.
pepat
03.08.2023 23:47
The key is to realize that;
- if you remove any number of letters from a nice string, you get a nice string on the remaining letters; and
- when there are $p$ remaining letters for $p$ a prime, then at least one of them must occur a number of times that is a multiple of $p$. In particular, at most $p-1$ letters do *not* appear a-multiple-of-$p$-times.
One way to conclude from there is to consider primes up to 11. By union bound, at least $26-1-2-4-6-10=3$ of the 26 letters appear a number of times that is a multiple of $2\cdot 3\cdot 5\cdot 7\cdot 11=2310>2022$.
This great problem was proposed by Václav Rozhoň (Czech Republic). It has an alternative formulation in terms of permutation-fair dice, I recommend checking out their video on a polylog channel: https://www.youtube.com/watch?v=-64UT8yikng