Sunday, May 5, 2019

An order statistic week

The weekend of the Mar 25 - Mar 31 week was already the time to travel to Porto for ICPC World Finals for participants, but it still had a couple of rounds. TopCoder SRM 754 was the first one (problems, results, top 5 on the left, analysis). tourist's solution for the medium was challenged, which doesn't happen very often, but he more than made up for that by being the only contestant to solve the hard. Congratulations on the win!

Codeforces Round 549 followed a few hours later (problems, results, top 5 on the left, analysis). It was Um_nik's turn to dominate, being the only one to solve all problems but already leading the field with a sizable margin after solving 4 out of 5. Well done!

In the previous summary, I have mentioned an AtCoder problem: we pick n independent uniformly random points on a circle, and then find two points the (circular/angular) distance between which is as close to 1/3 of the circle as possible. What is the expected value of the difference between that distance, expressed as a fraction of the circle, and 1/3? n is up to 106, and you need to compute the answer using division modulo 109+7.

The official editorial has a similar but not exactly the same explanation that does not require any googling to solve the problem. However, I'd like to share my approach as well.

Let's collapse the circle 3 times, in other words project each point between 1/3 and 2/3 to x-1/3, and each point between 2/3 and 1 to x-2/3. The original question of finding two points with distance as close to 1/3 as possible almost translates into finding two closest points on this projection: if two points x and y are close, then these two points are either also close on the original circle (with probability 1/3), or the distance between them on the original circle is close to 1/3 (with probability 2/3), since 2/3 is also 1/3 when viewed from another angle. Moreover, these probabilities of 1/3 are almost independent for pairs of consecutive points: it's impossible that all pairs of consecutive points are close on original circle as well, but apart from that the probability that any k<n consecutive pairs each are close on the original circle as well is equal to 1/3k.

Now, in order to find the expected value of the difference between the sought distance and 1/3, define f(x) as the probability that this difference is >=x, then our answer is the integral of f(x).

Consider g(x,k) to be the probability that the k-th smallest distance (in sorted order, the so-called order statistic) between consecutive points from n uniformly random ones being >=x. Then we can find the probability that exactly k distances between consecutive points are less than x as g(x,k+1)-g(x,k) (we define g(x,0)=0 here). If the points are picked on a circle of size 1/3 instead of the unit circle, that probability is equal to g(3x,k+1)-g(3x,k). Finally, if we have k such distances on the 1/3 circle less than x, then in order to have our answer >=x, we need all of those small distances to be "also close" on the original circle, the probability of which is 1/3k, therefore we have established that f(x)=sumk((g(3x,k+1)-g(3x,k))/3k).

Since we're interested in the integral of f(x), we can swap summation and integration in the above formula, and learn that we're interested in sumk((int(g(3x,k+1))-int(g(3x,k)))/3k)=sumk((int(g(x,k+1))-int(g(x,k)))/3k+1).

Finally, int(g(x,k)) is simply the expected value of the k-th order statistic of distances between consecutive points when n uniformly random points are chosen on a circle. It turns out it's possible to google it: int(g(x,k))=1/n*(1/(n-k+1)+1/(n-k+2)+....+1/n).

Most of the terms cancel out, and we get int(g(x,k+1))-int(g(x,k))=1/(n*(n-k)), and int(f(x))=sumk(1/(n*(n-k)*3k+1)), where the summation is done over all values of k from 0 to n-1, which solves our problem, and the actual code we submit just computes this sum.

As I was coming up with this solution, I did not just invent it from top to bottom. Instead, I was digging from both ends: first, I've noticed that the setting is similar but not exactly the same as just finding the smallest distance between n random points on a circle of size 1/3. Then, I've googled that such distance has a known closed-form expression, and the same is true for k-th order statistic. Then, I've noticed that knowing how the k-th order statistic behaves allows us to split the probability space into areas where we just need to multiply by 1/3k. Finally, I've put the pieces together and verified that swapping summation and integration checks out, implemented the solution and got wrong answer for all samples: I forgot the extra 1/3 that comes from the fact that our circle is of size 1/3 instead of 1, and thus had 3k instead of 3k+1 in the formula. The contest time was running out, and I've almost given up hope, but still started to make small changes to the code expecting that I have some issue in that vein, 3k-1 didn't help but 3k+1 did, and I very happily submitted with just 1 minute left in the round :)

Thanks for reading, and check back for more!

No comments:

Post a Comment