Thursday, January 2, 2020

A birthday week

Codeforces hosted two rounds during the Nov 18 - Nov 24 week. Round 601 took place on Tuesday (problems, results, top 5 on the left, analysis). Five contestants could solve all problems, and Radewoosh was quite a bit faster than others on problems C and D which gave him the victory. Well done!

Round 602 followed on Sunday (problems, results, top 5 on the left, analysis). This time even more competitors solved everything, and this time it was sunset who was slightly faster than others (some of whom have made great sacrifices to compete). Congratulations on the win!

In my previous summary, I have mentioned a TopCoder problem: you are given three integers n (1<=n<=1011), d (40<=d<=120) and b (5<=b<=62). Your goal is to return any base-b number that has exactly d digits when written without leading zeros, is divisible by n, and has at most four distinct digits when written in base-b without leading zeros.

I have found the correct idea almost immediately after reading the problem: we need to somehow make use of the birthday paradox, which is closely related to meet-in-the-middle algorithm. Roughly speaking, if we generate random numbers, then after generating O(sqrt(n)) numbers we will have generated two that have the same remainder modulo n.

However, it is important to find the right way to apply it. My approach was to split the number in two parts, keep generating random numbers with the same four distinct digits for the first and second part, and wait until we generate two that add up to 0 modulo n. This is not an exact application of the birthday paradox as we have two different random distributions rather than one, therefore it no longer guarantees that we're going to find such pair after generating just O(sqrt(n)) numbers. However, there is a hand-wavy argument: remainders modulo n of such large numbers are more or less random both for the first and for the second part, therefore the birthday paradox should still apply.

Unfortunately, there are cases where the distribution is not random at all. For example, when b=10 and n=1011, all digits before the 11-th one from the end do not affect the remainder modulo n, and therefore all randomly generated first parts will have remainder 0, therefore we'll need on the order of 1011 randomly generated second parts in order to find a match, instead of sqrt(1011). This case is easy to handle separately of course, as we can just include "all zeros" as one of the candidates for the second part. However, there are more tricky cases where gcd(n,b)>1, or when gcd(n,b-1)>1. I have managed to find a way to make this solution pass the system tests, but it was a very dangerous thing to do and the solution could have easily failed.

The right idea is to find a way to apply the birthday paradox directly. Let's generate random numbers of the full length until we find two that give the same remainder modulo n, this is bound to happen within O(sqrt(1011)) steps. The difference between these two is divisible by n, but there are two issues:
  • the difference might have less than d digits
  • even if each of the numbers has only four distinct digits, the difference might have more
It turns out that both issues can be taken care of if we generate random numbers consisting only of digits 0 and 1 in our process. The difference of any two such numbers will have only digits 0, 1, b-2 and b-1, satisfying the at most four distinct digits constraint. And in case the difference has less than d digits, we can just append enough zeros at the end since zero is one of the allowed digits, and appending zeros keeps the number divisible by n.

This solution provably terminates within O(sqrt(1011)) steps with very high probability, and one can submit it with confidence. Note that the fact that we generate numbers consisting only of digits 0 and 1, and therefore their remainders modulo n might not be uniformly distributed, does not hurt us: if the choices are not uniformly distributed, the birthday paradox collision just becomes more likely.

Thanks for reading, and check back for more!

No comments:

Post a Comment