AoPS - Problem

27.06.2022 23:18

DottedCaculator wrote: Let $k>1$ be a fixed positive integer. Prove that if $n$ is a sufficiently large positive integer, there exists a sequence of integers with the following properties: Each element of the sequence is between $1$ and $n$, inclusive. For any two different contiguous subsequence of the sequence with length between $2$ and $k$ inclusive, the multisets of values in those two subsequences is not the same. The sequence has length at least $0.499n^2$ It suffices to prove that for sufficient large odd numbers $n$ there is a sequence of length $\geq0.4995n^2$ satisfying the first two criteria. A random Eulerian path in $K_n$ will work with positive probability.

CANBANKAN

28.06.2022 01:54

Solved on the Plane Claim: if $p>k^k$ is a prime, then we can construct a sequence with $\frac 12 p(p-1)$ numbers. Proof: consider this sequence $x_0, \cdots, x_{\frac{p(p-1)}{2}-1}$ in $\mathbb{Z}_p$ Such that for all $0\le a<\frac{p-1}{2}$ and $0\le b\le p-1$, $$x_{ap+b}=bg^a$$ Where $g$ is a generator mod $p$ such that $g\ne \frac ab$ or $a+\frac{\pm 1}{b}$ for any $-2k^2\le a,b\le 2k^2$ where $a,b$ are nonzero. Call this assertion (*) Note (*) is valid (ie $g$ exists) because there are $\varphi(p-1)$ choices for $g$, and for all sufficiently large $n$, $\varphi(n)>99k^4$ I claim this works. We first check $x_i\ne x_{i+t}$ for all $2\le t\le k$ (t=1 is obvious) If $\lfloor \frac ip\rfloor=\lfloor \frac{i+t}{p} \rfloor$ this obviously holds. Otherwise, let $i=ap-m$ where $m<t$, then $x_{ap-m}=g^{a-1} (p-m)=-mg^{a-1}$ while $x_{ap+t-m}=g^a (t-m)$. By (*) they cannot be equal. If $\{x_j,x_{j+1}\}=\{ x_i, x_{i+1}\}$, then note $(x_j-x_{j+1}) = \pm (x_i-x_{i+1})$. Note $x_j-x_{j+1}=-g^{\lfloor \frac jp\rfloor}$ and $g^a\ne \pm g^b$ if $0\le a<b<\frac{p-1}{2}$. Now it's obvious $j=i$. Suppose $\{x_j, \cdots, x_{j+t-1}\}=\{x_i, \cdots, x_{i+t-1}\}$, then first consider the length of the longest arithmetic sequence present $\{x_j, \cdots, x_{j+t-1}\}$. Write $j=ap+b$ and $i=cp+d$ Case 1: $b<p+2-t$, then the common difference is $x_{j+1}-x_j=g^a$, and the number of terms is $t$. It follows that $a=c$ and $d<p+2-t$, so it's clear that $b=d$. Case 2: $b\ge p+2-t$. Then we can see $d\ge p+2-t$ as well. If the common difference of an arithmetic sequence whose elements are a subset of that set is $\pm g^a$ then it must be a subset of $\{x_{ap+b}, \cdots, x_{(a+1)p}\}$. This is true because it cannot contain two terms in $\{ x_{(a+1)p}, \cdots, x_{(a+1)p+(b+t-p-1)} \}$ because $g\notin \{ -k^{-1}, (1-k)^{-1}, \cdots, k^{-1} \}$, so $cg^{a+1}\ne g^a$ for any $c\in [-k,k]$. This is true by (*) Similarly, we can see it cannot contain $x_{(a+1)p+m} (m>0)$ and $x_{ap+b} (b<p)$. This is also true by (*) We can analyse the longest arithmetic sequence with common difference $\pm g^{a+1}$ in the same way. Lemma: for any $d\ne \pm g^a, g^{a+1}$, the length of the longest arithmetic sequence with common difference $d$ in $\{(m-t+1)g^a, (m-t+2)g^a, \cdots, -g^a,0,g^{a+1}, \cdots, mg^{a+1} \}$ is at most $\frac t2$. Proof of Lemma: By dividing by $g^a$, we may assume $a=0$. We have established that $w+xg\ne y+zg$ for all $(w,x)\ne (y,z)$ and $-k\le w,x,y,z\le k$ First, the arithmetic sequence cannot contains at least two terms from $\{m-t+1, \cdots, -1\}$ and at least one term from $\{g, 2g, \cdots, mg\}$. Assume otherwise. Say it contains $-x, -y, zg$ for $x,y,z \in [-k,k]$. Then there exists $l,n\in [-k,k]$ such that $dl=y-x$ and $dn=zg+y$, so $$dln=n(y-x)=l(zg+y)$$ $$lzg=n(y-x)-ly$$ Since $lz, n(y-x)-ly\in \{ -2k^2, \cdots, 2k^2\}$, this is impossible by (*) Therefore the arithmetic sequence is a subset of $\{m-t+1, \cdots, 0\}$ or $\{0, g, \cdots, mg\}$. WLOG it is a subset of $\{m-t+1, \cdots, 0\}$. Then its common difference is in $[-k,k] \backslash \{1,-1\}$, so it contains at most $\frac{t-m}{2}<\frac t2$ terms. The lemma is proven. WLOG the arithmetic sequence in the first set with length at least $\frac{t+1}{2}$ has common difference $\pm g^a$. By our above claim, $c\in \{a-1,a\}$. If $c=a$ then $i=j$ since $x_i,\cdots,x_{i+k}$ are pairwise distinct. Otherwise, $g^a$ is in the second set but not the first. Now, let $c=0.9995$. I claim for all sufficiently large $n$, there exists at least one prime in the interval $[cn,n]$. Let $d_1=1-10^{-8}, d_2=1+10^{-8}$, then $\pi(cn)=\pi(n)$. However, by Prime Number Theorem, for all sufficiently large $n$ $$\frac{d_1n}{\log n} < \pi(n) < \frac{d_2n}{\log n}$$ Where the log is in base $e$. Simple bounding leads to a size contradiction. Therefore, we can take a prime $\ge cn$ and use that sequence for $p$ to produce the desired construction with $\ge 0.499n^2$ terms

DottedCaculator

28.06.2022 02:40

Another way to finish: It is well known that there exists a prime between $n^3$ and $(n+1)^3$ for all $n>e^{e^{33.3}}$, which implies there is a prime between $cn$ and $n$ since $n^3>c(n+1)^3$ for all sufficiently large $n$.

tastymath75025

28.06.2022 05:58

My problem! I wasn’t really expecting this question to make it onto a contest in its current form, since (in my solution) getting to $(0.5 - o(1))n^2$ requires some intuition from analytic number theory, but I suppose that is fine for a TSTST #9 The motivation for this problem came from a chat with an MIT professor, who mentioned that he was trying to find long strings with the property that every contiguous length-$k$ substring contained a distinct multiset of characters (apparently this was relevant for particle detection in physics or something). It’s easy to see from considering $k=2$ that the sequence can’t be longer than $(0.5+o(1))n^2$. I realized pretty quickly that if $n$ was prime and you had a primitive root $g\pmod n$ which was “small” relative to $n$ (specifically, we need $g<\frac{n}{k}$), you could construct a sequence of length $\approx \frac{n^2}{2}$ using properties of primitive roots, essentially matching the upper bound. It’s well-known for all large primes $p$ that there is a primitive root $g< \frac{p}{k}$, but this is difficult to prove and probably not fair to expect contestants to know. Luckily, we can modify the previous idea a little - instead of taking $n=p$, we take $n=p^2$, and now it’s quite easy to show the existence of a primitive root $g < \frac{n}{k}$ (specifically, there is always a primitive root $\pmod {p^2}$ between $0$ and $p$ for $p>2$). So when $n=p^2$ we can construct sequences of length $\approx \frac{p^4}{2}$, and when $n$ is not in this form we can choose a prime $p<\sqrt{n}$ with $\frac{n}{p^2}\approx 1$, which finishes the problem. I’d be curious to see a combinatorial solution to this problem. All of the testsolvers only found similar constructions using primitive roots, but it’s possible they were influenced by this problem being on the N section of the shortlist, and there’s a nice alternative approach that nobody tried as a result.

CANBANKAN

30.06.2022 18:57

Here are some concluding remarks about the problem. SPOILER ALERT

goodbear

01.07.2022 22:43

tastymath75025 wrote: I’d be curious to see a combinatorial solution to this problem. All of the testsolvers only found similar constructions using primitive roots, but it’s possible they were influenced by this problem being on the N section of the shortlist, and there’s a nice alternative approach that nobody tried as a result. I agree with this being N! In my writeup, I cited the following clean, concise, memorable, well-known piece of mathematics: Wikipedias page on Bertrands postulate wrote: Dudek proved that, for all ${\displaystyle n\geq e^{e^{33.3}}}$, there is at least one prime between ${\displaystyle n^{3}}$ and ${\displaystyle (n+1)^{3}}$.

DottedCaculator

01.07.2022 22:46

I cited the fact that there exists a prime between $n^2$ and $(n+1)^2$ for all $n$

ABCDE

02.07.2022 01:33

pi_quadrat_sechstel wrote: DottedCaculator wrote: Let $k>1$ be a fixed positive integer. Prove that if $n$ is a sufficiently large positive integer, there exists a sequence of integers with the following properties: Each element of the sequence is between $1$ and $n$, inclusive. For any two different contiguous subsequence of the sequence with length between $2$ and $k$ inclusive, the multisets of values in those two subsequences is not the same. The sequence has length at least $0.499n^2$ It suffices to prove that for sufficient large odd numbers $n$ there is a sequence of length $\geq0.4995n^2$ satisfying the first two criteria. A random Eulerian path in $K_n$ will work with positive probability. Can you say a bit more about how this works? It seems like a random Eulerian path will have a cycle of length at most $k$ with high probability, which breaks the second condition.

squareman

08.07.2022 22:25

This was a headache. Solved with some help from rama1728. For all large enough primes $p$ we generate a working sequence with the residues $\pmod{p}$ and size $\frac{p(p-1)}{2}.$ Since there is a prime say in $[0.9999n, n]$ for all large enough $n$ this is enough. Call the absolute value $|r|$ of a residue $r$ the distance between $r$ and the closest multiple of $p.$ For a primitive root $g$ (TBD) we construct sequence $a_0, a_1, a_2, \dots a_{\frac{p(p-1)}{2}}$ where if $k = pq+r$ for integers $0 \le q < \frac{p-1}{2}$ and $0 \le r < p$ then $a_k \equiv rg^q \pmod{p}.$ Note $|g^{q_1}| \not\equiv |g^{q_2}|$ for $0\le q_1\ne q_2 < \frac{p-1}{2}.$ It's easy to see subsequences of length $2,3$ are dealt with, so we'll show for subsequences of length $\ge 4.$ To do this we enforce $g \not\equiv \frac{a}{b} \pmod{p}$ for any nonzero $1 \le |a|,|b| \le 2k.$ This can be done since $\phi(p-1)$ gets arbitrarily large compared to $k.$ Claim. Multisets don't appear. If $a_{pq+r} \equiv a_{pq+r+t}$ for some $1 \le t \le k,$ then assume $p-t \le r \le p-1$ (or it's trivial). Then $rg^{q} \equiv (r+t-p)g^{q+1},$ impossible. $\square$ Let $4 \le t \le k$ and we consider subsequences of length $t.$ Let a maximal subset of a subsequence be its largest subset corresponding with terms of an arithmetic sequence. Claim. If $0 \le r \le p-t+1$ then the maximal subset of $S_{pq+r}=\{ a_{pq+r}, a_{pq+r+1}, \dots a_{pq+r+t-1} \}$ is itself and its common difference has absolute value $g^{q}.$ First part trivial. Any arithmetic sequence corresponding exactly with these numbers must have $rg^q, (r+t-1)g^{q}$ as beginning and ending terms (assuming $t$ is small in proportion to $p$) proving the second part. $\square$ Claim. If $ p-t+1 < r < p$ and $S_{pq+r}$ is $\{rg^q, (r+1)g^{q}, \dots 0, g^{q+1}, \dots (r - p +t - 1)g^{q+1} \},$ the maximal subset must be all the terms not right of $0$ (must have common difference $g^q$), all the terms not left of $0$ (must have common difference $g^{q+1}$), or both if they happen to have the same size. Suppose two consecutive terms of an arithmetic sequence are on two different sides of $0$ (say, $-mg^{q}$ on the left and $ng^{q+1}$ on the right). Then the common difference has absolute value $mg^{q} + ng^{q+1}.$ But neither $-2mg^{q} - ng^{q+1}$ nor $mg^{q} + 2ng^{q+1}$ can be in $S_{pq+r}$ since $g \not\equiv \frac{a}{b} \pmod{p}$ for any $1 \le |a|,|b| \le 2k.$ So a maximal subset lies on one side of $0.$ The common difference is fixed with same logic as before. $\square$ We can finish. Consider a sequence $S_{pq+r} = \{ a_{pq+r}, a_{pq+r+1}, \dots a_{pq+r+t-1} \}$ and suppose it is $S_{px+y} = \{ a_{px+y}, a_{px+y+1}, \dots a_{px+y+t-1} \}$ for $4 \le t \le k.$ If $0 \le r \le p-t+1$ then $0 \le y \le p-t+1$ to give the correct size of the maximal subset, forcing $q = x$ to give the correct common difference, forcing $r = y$ to allow terms to match up. If $S_{pq+r}$ is $\{rg^q, (r+1)g^{q}, \dots 0, g^{q+1}, \dots (r - p +t - 1)g^{q+1} \}$ then this forces $|r| \equiv |y|$ to give the right size of the maximal subset(s) and $x$ to be within one of $q$ in order for the maximal subset(s) to have the same common difference(s). It's a quick check to see terms cannot line up unless $pq+r = px+y,$ the end. $\blacksquare$

thevictor

31.07.2022 12:52

In fact it is very easy to find a combinitorial solution since the bound required is merely $0.5n^2-o(n^2)$. We biject a sequence with a tour on a (directed) $K_n$.The statements rewrite as follows: (i)each edge has at most one direction being travelled and only once; (ii)after we visit one node,we don't visit it within $k$ steps; (iii)the nodes we visit in $4,\cdots,k$ steps is not the same set.(Note that for $r=2,3$,it directly follows from (i)) We start by some conjectures.Suppose that we can find an Eular tour of $K_n$ such that the set formed by each $t(4 \le t \le k)$ consecutive elements appears uniformly at random,it is easy to see that there are $O(1)$ coincidences. However,I'm not aware of such a proof.Luckily,some slight modifications can make a randomlized "nearly Eular tour".Here's the writeup: Fix a large prime $M>\max (100k,10^{20})$.We may WLOG assume that $n$ is a multiple of $M$ since ignoring $O(1)$ possible elements is OK.We can also use $O(1)$ additional nodes. Partition $[n]$ into $M$ subsets of equal size.Since $M$ is a large prime,it is well known that $K_m$ can be partitioned into hamilton cycles.For each of these cycles $k_1\to k_2 \to\cdots \to k_M\to k_1$ we consider all the directed edges from the $k_i$ th subset to the $k_{i+1}$ th subset.It is easy to see that the following directed subgraph has a Eular tour. We now have $\frac{M-1}{2}$ Eular tours which we call "PET"(partial Eular tour).Plug in $k$ additional nodes between different Eular tours to avoid corner cases. For concecutive $4 \le t \le k$ nodes in a single PET their sets won't coincide.In fact,they must be in the some sort of $k_l,k_{l+1},\cdots,k_r$ subset,and their order is uniquely determined. Now is our probability method:we let all the PETs to be conquent,but randomly relabel the points when we consider a single one.It turns out that for a string of length $t$,(supposing that there are $L_t$ of them)it occurs in a PET of probability at most $n^2/L_t$,so it occurs in at least two PET with probability at most $M^2n^4/L_t^2$.Summing up for all that kind of strings to get a bound of $M^2n^4/L_t$.Since $L_t=O(n^t)$,summing up for $t$ we get an expected $O(1)$ bound.Again,plugging in some additional nodes avoids this problem.

YaoAOPS

24.01.2024 20:56

We first prove a claim for bootstrapping. Claim (Bootstrapping): There exists some $\varepsilon$ such that the sequence has length at least $\varepsilon n^2$ for all $n$ and some $\varepsilon$. Proof: Split $n$ into $k+1$ groups of the same size $t = \left\lfloor\frac{n}{k+1} \right\rfloor$ ignoring remainders. Then we get at least $t^2 \cdot (k+1) \ge \frac{n}{(1 + o(1))(k+1)}$ tuples by taking a path that goes between groups increasing each time $\blacksquare$. Fix a large odd prime $n = 2m+1$ and define $t = \left\lceil\sqrt{\frac{n(k+1)}{2\varepsilon}}\right\rceil$. We take $t + 2$ additional numbers for gluing our APs together. We then take $\frac{n}{2}$ APs of \[ -k, 0, \dots, (n-2) \cdot k, \]which have length $n$ for $1 \le k \le m$. Claim: Any consecutive multiset of two different APs are disjoint. Proof: It's equivalent to show the result for an AP with common difference $1$ and some other AP. If the other AP has common difference $d$, then for some $d, 2 \cdot d, 3 \cdot d, \dots$, we break the size constraint $\blacksquare$. Note that each number appears at the beginning and end of exactly $2$ APs. Take a path of length $\varepsilon t^2$ which is possible by our claim and cut it up into $\frac{n-1}{2}$ strings of length $k+1$, $S_1, S_2, \dots, S_{\frac{n}{2}}$, then put an AP, one of the two additional numbers, $S_1$, and so forth. Then everything containing two portions of an AP form distinct multisets, the connecting two numbers form distinct multisets for those containing them and one adjacent number in $S_i$ or an $AP$, and the portions $S_i$ are distinct. This means that we get a length $\varepsilon t^2 + \frac{n^2}{2}$ sequence for $n + \sqrt{\frac{n(k+1)}{\varepsilon}} + 2$ elements, which suffices.