Algorithms Weekly by Petr Mitrichev: April 2021

Sunday, April 25, 2021

A cycle week

Codeforces Round 718 (also known as "Contest 2050") took place on Friday (problems, results, top 5 on the left, analysis). Benq solved the first seven problems 20 minutes faster than Um_nik, who in turn solved them almost half an hour faster than all other contestants. They were the only two top contestants who therefore had time to submit something in the last problem, it did not work out but nevertheless their first and second place were safe with a big margin. Congratulations on the impressive performance!

Google Code Jam 2021 Round 1B finished a few minutes ago (problems, results, top 5 on the left, analysis). neal_wu was the first to score 100 provisional points, but he made one incorrect attempt and therefore had to sit and wait for 4 minutes to see if somebody will overtake him. Nobody else was as fast, though, so Neal kept the first place. Congratulations to him and to all 1500 Round 2 advancers!

In my previous summary, I have mentioned a Codeforces problem: You are given n<=2000 points on the plane such that no three points lie on the same line. Each point has an integer label between 1 and n, and all labels are distinct. Your goal is to rearrange the labels in such a way that the i-th point has label i. You are allowed to take any two points and swap their labels, but when you do that you must draw a straight line segment connecting those two points. The segments you draw must not intersect except at their endpoints (in particular, you are not allowed to draw the same segment twice).

The key trick to make progress in the problem is to find a less general, but still non-trivial case which we can solve. It turns out that such case is the one when the initial labeling of points forms a single cycle, for example: point 1 has label 2, point 2 has label 3, and so on, until point n which has label 1. In this case we can solve the problem using only the swaps that include point 1, and therefore their segments will not intersect: we start with labels 234...n1. We swap the labels of points 1 and 2, and we get 324...n1. We swap the labels of points 1 and 3, and we get 423...n1. We continue with swapping the labels of points 1 and 4, and so on, and eventually we get n234...(n-1)1, and finally we swap the labels of the first and last points to get the identity permutation. Note that our choice of point 1 as the "pivot" of all swaps was arbitrary, and we could do the same with any other point.

What should we do if the labels do not form a single cycle? Let's make some additional swaps before using the above approach to make sure they do! More specifically, let's assume our points are sorted by angle with respect to point 1. The above solution will only draw segments between point 1 and other points. Therefore we are free to swap the labels between adjacent points in this sorted order, as those segments will not intersect each other and the segments to point 1.

The initial permutation has some number of cycles, and whenever we swap two elements from different cycles they merge into one. While we have more than one cycle, we can find two adjacent points in the sorted order that belong to different cycles, and swap them to merge those cycles. We repeat this until we have just one cycle remaining, and then apply the single-cycle solution.

There are some additional small details to be figured out, which you can find in the official editorial. I could not solve this problem myself, in part because the space of possible approaches is so vast, and yet most of them do not seem to work. I've checked the solutions for this problem from the top finishers, and they all seem to use this approach. In fact, I'm really curious: is some fundamentally different solution possible here? If not, does there exist some intuition why?

Thanks for reading, and check back next week.

Sunday, April 18, 2021

A yuri week

Codeforces Round 715 was the main round of this week (problems, results, top 5 on the left, analysis). The first three problems were quite approachable, and the real fight for the top spots started on the remaining three. ecnerwala went for the 4000-point F instead of the 2250-point D, and this turned out to be exactly the right plan, as there was not enough time to solve all problems. Congratulations on the victory!

I have tried to make the same plan work, but unfortunately I could not solve F in time — my solution was written at the end of the contest, but it had two issues:

It did not work on the samples
Its complexity was O(n²*log(n)) in the worst case (even though with a 7-second time limit this was not obviously bad)

I could make it work on the samples in just a couple of minutes after the end of the contest, and the solution passed the system tests — only to be hacked as the complexity was in fact too big. I would've probably reached the second place if I had those couple of minutes, and I'm still on the fence whether the fact that I'd have squeezed in an incorrect solution makes me regret this more or less :)

I'd like to highlight another problem though, for which I had completely no working ideas during the contest: problem D. You are given n<=2000 points on the plane such that no three points lie on the same line. Each point has an integer label between 1 and n, and all labels are distinct. Your goal is to rearrange the labels in such a way that the i-th point has label i. You are allowed to take any two points and swap their labels, but when you do that you must draw a straight line segment connecting those two points. The segments you draw must not intersect except at their endpoints (in particular, you are not allowed to draw the same segment twice). Can you see a way to achieve the goal?

In my previous summary, I have mentioned a Code Jam problem: you are given up to 10¹⁵ cards, each with a prime number between 2 and 499. You need to split the cards into two piles, such that the sum of the numbers in the first pile is equal to the product of the numbers in the second pile, or tell that it's impossible.

The key idea here is that the product grows very quickly, so the second pile will always be very small. Therefore the sum of the numbers in the second pile will be small. But the sum of the numbers in the first pile is the sum of all given numbers minus the sum of the numbers in the second pile. If the sum of all given numbers is S, we just need to check all numbers in the segment [S-C,S] as the candidates for the sum of the numbers in the first pile (which is equal to the product of the numbers in the second pile) for some relatively small value of C.

How do we check a particular candidate? Here the fact that all numbers are prime comes into play: since we know that the candidate is a product of some subset of the given numbers, and all of them are prime, there is in fact a unique decomposition for it as a product of primes. Factorizing a number this big can still be problematic, but since we're only interested in prime factors up to 499, it is fast enough.

You can check the official analysis for more details, such as what is a reasonable value for C. Thanks for reading, and check back next week!

Sunday, April 11, 2021

An ESP week

Google Code Jam Round 1A was the first round of this week (problems, results, top 5 on the left, analysis). The top 1500 have made it to Round 2, with 3000 further spots up for grabs in the remaining two Round 1s. Nevertheless, the spirit of competition was there, and it is of course a great achievement to get the first place. Congratulations to Um_nik, and his achievement is even more impressive given that the round was probably at 4 in the morning :)

I have prepared the test data for the second problem, Prime Time, and I (humbly, of course :)) think it was a great example of how a not particularly difficult problem can still be nice. The problem went like this: you are given up to 10¹⁵ cards, each with a prime number between 2 and 499. You need to split the cards into two piles, such that the sum of the numbers in the first pile is equal to the product of the numbers in the second pile, or tell that it's impossible. Can you see how to handle such a huge amount of cards?

AtCoder has returned with its Grand Contest 053 a few hours later (problems, results, top 5 on the left, analysis). I have solved the first three problems in less than 50 minutes, which was great, but then I could not score any additional points in the remaining 2 hours and 10 minutes, which was a bit frustrating :) I've tried to approach all three remaining problems in the beginning, but eventually focused on the problem E, and it turns out I was really close: I did identify the two cases from the official analysis, I have correctly (I think) handled the first case and tried various formulas that looked very similar to the correct one for the second case, but I could neither figure out all details and come up with a formula on paper, nor get the correct answers for the sample cases by trying small changes to the formulas. It turns out that my problem was that I was inserting the segments in the increasing order of the starting time, instead of the decreasing order of the ending time. These two ideas look symmetric on the surface, but they actually are not equivalent because the problem asks about local maxima, and the symmetric case would be local minima. However, my idea can still handle the first case just as well (because the first case is in fact symmetric, as we have n-1 local minima there as well), which made it hard to move away from it for the second case.

Congratulations to the seven contestants who have managed to handle all details correctly and solve one of the harder problems, and especially to jiangly on the win!

Thanks for reading, and check back next week.

Monday, April 5, 2021

A no-regret week

TopCoder SRM 803 last week wrapped up the third stage of the race for TCO21 qualification (problems, results, top 5 on the left, TCO21 race results, analysis). Even though tourist had already guaranteed himself the first place and the direct ticket (or a direct login, in case it takes place online again :)) to TCO21, he has won the round with a big margin once again. This was his fourth SRM win in a row, which has happened before, but nobody else could win five in a row in the past. tourist has also claimed the all-time high rating during this streak. Congratulations Gennady, and no pressure at all for SRM 804 ;)

Codeforces Round 712 followed on Saturday (problems, results, top 5 on the left, analysis). The last problem kind of screamed that some sort of a greedy approach for embedding each cycle should work, but figuring out all details of the approach, as well as implementing it, was still extremely hard. Very well done to ksun48 on doing that and getting a clear first place!

Finally, the Northern Eurasia region of the ICPC held its onsite finals for the 2020-21 season on Sunday (problems, results, top 12 on the left, online mirror results, analysis). The top 50 teams from the online finals were invited to this round (how does one actually call the round between a semifinal and a final? A 2/3-final? A 1/sqrt(2)-final? :)). In the end, solving the 6 relatively easier problems was the bar for getting a medal, while getting a gold medal required also solving either the traditionally cactus-themed problem C, or the also traditionally interactive problem I. Congratulations to all medalists and especially to the team "Insert your name" from ITMO on the victory!

I set that interactive problem I for this round. Given that the contestants were onsite and did not have internet access during the round, it was a good opportunity to give a problem involving a beautiful but googleable algorithm: the randomized weighted majority algorithm. Preparing the test cases for this problem was extremely tricky, as the space of possible approaches is virtually unlimited, and there seems to be no single way to fail all of them. It turned out that the testcases were good enough for the onsite round, with the only two passing solutions using the provably correct approach, and with the top two teams probably accumulating quite some love for me with their -35 and -29 on this problem.

In the online mirror, some more questionable, but at the same time really creative, approaches passed all tests. In fact, the ML approach pwns the test set, making several times less mistakes than b (the smallest the number of mistakes of experts) in most test cases. It only has difficulties on test cases with a really small b (say, 7 out of 10000), where the allowed leeway of 100 extra mistakes is barely enough for the neural network training to converge.

It looks like I need to add some simple gradient descent and boosted decision forest algorithms to my prewritten code library for the future :)

In my previous summary, I have mentioned a Codeforces problem: there is a hidden array of n b-bit integers (in other words, each element is between 0 and 2^b-1). You do not know the elements of the array, but you can ask questions about them: in one question, you can ask "is it true that the i-th element of the array is greater than j?" for some value of i between 1 and n and j between 0 and 2^b-1. Your goal is to find the value of the maximum element in the array, and you need to do it in O(n+b) queries.

The first question is kind of expected: let's compare the first element of the array with 2^b-1-1, in other words let's learn the highest bit of the first element. If that bit is 1, we're all good: we then know that the highest bit of the answer is also 1, therefore we have effectively reduced b by 1 using just one query, so we're on track for O(n+b) queries overall.

The case where that bit is 0 is more interesting. We can't really afford to keep asking about the first bit of all numbers, since in case they are all 0, we would've spent n queries to reduce b by 1, which does not lead to a linear number of queries. This issue kind of points us to a better approach: in case the answers are always "less than", we want to be left with a really small range after asking the n queries. Therefore let's compare the second number with 2^b-2-1, in other words let's ask "is it true that the two highest bits of the second number are 0"?

In case we keep getting "less than" answers, after n queries we will know that the first number starts with 0, the second with 00, the third with 000, and so on. Now let's go from right to left, and ask "does the n-1-th number start with n zeros?". If not, then we know that the n-th number is smaller than the (n-1)-th number, and can be discarded for the rest of the solution, and we continue by asking "does the (n-2)-th number start with n-1 zeros?" If the (n-1)-th number does start with n zeros, we continue by asking "does the (n-2)-th number start with n zeros"? After we complete this process going from right to left, we will have discarded some amount k of all numbers, and will know that the remaining numbers all start with n-k zeros. Therefore we have reduced n by k, and b by n-k, so n+b was reduced by n using 2n queries, which is good enough for a linear solution!

Finally, we need to figure out what to do in case we get some "greater" answer after a few "less than" answers, for example when we learn that the first number starts with 0, the second with 00, the third with 000, but the fourth does not start with 0000. We will then ask: does the fourth number start with 0001? If the answer is also no, then we know that the fourth number starts with at least 001, therefore it's greater than the third number which can be discarded, and we continue by asking if the fourth number really starts with 001, potentially discarding the second number if not, and so on. If the fourth number does start with 0001, then we continue with the fifth number, but instead of asking if it starts with 00000, we ask if its prefix is at least 00010 (since we're not really interested in numbers smaller than 0001, given that we already have evidence of a number that starts with 0001).

At any moment, our algorithm therefore maintains a stack of numbers, where for each number we know a prefix that is equal to the prefix we know for the previous number in the stack plus one more bit. When we run out of bits, we do the backwards pass as described above, and obtain a problem of reduced size. Just like in the all-zeros case, we spend O(n) queries to reduce n+b by n, therefore achieving a linear solution.

This is the main idea of the solution. There are still some small details to be figured out, which you can find in the official editorial.

Thanks for reading, and check back next week!