Saturday, December 18, 2021

A pseudobubble week

Last week's contests were all concentrated on the weekend. Codeforces Round 758 took place early on Saturday (problems, results, top 5 on the left, analysis). The round seems to have been downvoted into oblivion because of weak pretests, which I find quite unfortunate: I think there is definitely a place for weak pretests in the Codeforces format, and proving/testing one's solution is a part of the game. Congratulations to maroonrk who was able to solve six problems despite the weak pretests and win the round!

Facebook Hacker Cup 2021 Finals followed a few hours later (problems, results, top 5 on the left, analysis). Once again the points+ICPC penalty system has allowed for many different strategies, and this time A+D+F was the winning choice. Congratulations to ecnerwala and Um_nik for the successful execution!

I was trying to solve the same 3 problems, but my solution for F did not manage to finish in 6 minutes because it used too much memory and started swapping. After the round I asked Egor to run it on his machine with 64GB of RAM, and it finished in 3 minutes there, but it turned out it also gave a wrong answer on one testcase. It was surprisingly satisfying to learn that I was not so close, and that my (relatively) weak laptop was not the only issue :)

My real issue in this round was that I treated it as a Facebook Hacker Cup round. I had convinced myself that the Hacker Cup always has tedious implementation problems, and therefore I started implementing the first solution that came to mind instead of thinking further to find a much simpler approach. In problem D, this ended up being segment trees with lazy propagation on a structure with 6 fields, applied to the previsit orders of the components of the centroid decomposition. The reference solution is a relatively simple dynamic programming on a tree after applying the Aliens trick. In problem F, I used a persistent segment tree containing persistent treaps in its nodes to build a graph of size O(n*log^2(n)) to find strongly connected components in, and this was already too much: it took me ages to implement, used too much memory, and also probably had a bug (or was fully incorrect; I still haven't debugged the wrong answer).

Well, I hope this will be a good reminder to look for simpler solutions next time!

Finally, Open Cup 2021-22 Grand Prix of Nanjing wrapped up the week and the 2021 Open Cup rounds on Sunday (problems, results, top 5 on the left, onsite results). Team USA1 was once again in their own league, getting to 12 problems with 1.5 hours remaining, and being the only team to solve everything in the end. Well done!

Problem D was relatively standard, but as part of the story it contained the following algorithm:

for i in 1..n:
  for j in 1..n:
    if a[i] < a[j]:
      swap(a[i], a[j])

At first sight, this looks like a standard sorting algorithm. At second sight, though, it seems to be buggy: the comparison goes the wrong way for sorting in increasing order. But it turns out that this algorithm does in fact sort the array in increasing order :)
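For the skeptical, here is a quick brute-force check of this counterintuitive fact (a standalone sketch, not taken from the problem statement):

```python
import random

def weird_sort(a):
    # the algorithm from the problem, translated to 0-indexed Python
    n = len(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

# compare against the built-in sort on many small random arrays
random.seed(0)
for _ in range(1000):
    a = [random.randint(0, 20) for _ in range(random.randint(0, 12))]
    assert weird_sort(a[:]) == sorted(a)
```

One way to see why it works: after the first outer iteration, a[1] holds the maximum, and one can then show that after the outer iteration for index i, the prefix up to i is sorted, which is exactly why the final order is increasing despite the reversed-looking comparison.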

Thanks for reading, and check back next week!

Sunday, October 10, 2021

A 2020 week

ICPC 2019-20 World Finals was the first event of this week (problems, results, top 12 on the left, official broadcast, our stream). As this is the main event of the year for many competitors, there are quite a few reports to check out from various participants (1, 2, 3, 4, please post more links in comments!). I did not watch the contest myself as we were solving it on stream instead, so let me just congratulate all medalists, and especially the NNSU team on the victory!

It was amazing to meet so many old and new friends in person after the long break, so big thanks to the organizers for making this event possible! At the same time, I think that the 3-computer rule and the fact that it was only announced a few weeks before the event were quite unfortunate. I hope that we will return to the 1-computer format next year!

On the heels of the World Finals where JetBrains was one of the main sponsors, Codeforces hosted Kotlin Heroes Episode 8 (problems, results, top 5 on the left, analysis). Judging by the standings, there was a nice difficulty curve that allowed everyone to have something interesting to solve. 3 contestants were able to solve one of the two hardest problems, SpyCheese finishing ahead of Heltion and eatmore thanks to being much faster on the relatively easier first 8. Congratulations!

TopCoder SRM 815 took place on Friday (problems, results, top 5 on the left). I was solving the round with a cold and a headache; however, somehow this did not affect the results much, maybe because the 250 and 500 were very standard, which allowed me to go through them more or less on autopilot.

Solving the hard problem, on the other hand, required some gut feeling for what would be fast in practice. Given two positive integers n<=50 and m such that m^n<=10^18, what is the largest possible value of the least common multiple of n positive integers, each up to m?

Facebook Hacker Cup 2021 Round 3 wrapped up the week on Saturday (problems, results, top 5 on the left, analysis). The combination of point values with the ICPC penalty time system created an interesting strategic dilemma: solving tasks D2+D3 (which was not really much more difficult than solving one of them) required more time than solving problem B or problem C, but since they counted as two problems for penalty time purposes, it could pay off in terms of penalty time. Of the top two finishers, tourist went for D2+D3 first, and Radewoosh went for B+C first, and the strategies turned out to be really close in practice. Well done to both, and congratulations to the top 25 who will compete in the online finals!

Going with D2+D3 before problem C had another drawback: in case one did not have enough time to solve everything, since D2+D3 were worth fewer points than C, one would lose to the alternative approach. The set of people on 68 points in the scoreboard has quite a few favorites who ended up not qualifying despite solving the most difficult problem. So in effect one had to decide between a strategy that would deliver the smallest penalty time when solving everything, and a strategy that would give more points when solving all problems except one (if we treat D2+D3 as one problem, as they really were).

I would also like to bring two important competitive programming websites to your attention. clist.by has information about virtually all competitive programming contests, and allows everyone to assemble a custom calendar with regexp-based filters to see only the contests one is interested in. It provides Telegram notifications for this customized calendar, and it imports the standings from most of those contests, which allows for contestant-vs-contestant comparisons across platforms. It obtains the links between handles on different platforms through crowdsourcing: you can link all your handles together after registering on clist.by.

The recently launched cphof.org is similar in spirit, as it tries to create a unified profile for contestants across various platforms and one-off competitions. However, it focuses only on big tournaments and not on regular rounds, and this limited scope allowed it to do the linking without relying on crowdsourcing. As a result, it has really good coverage for the top competitors.

Huge thanks to the maintainers of those websites, I use them every week! And thanks to all for reading, please check back next week.

Monday, October 4, 2021

An invitational week

TopCoder SRM 814 was the first round of last week (problems, results, top 5 on the left). ecnerwala was about 5 minutes faster than the rest of the field in solving the 1000-pointer, and therefore got enough margin to maintain the first place through the challenge phase and deny me qualification to Round 4 of TCO22 :) Well done!

Codeforces Round 745 followed later on the same day (problems, results, top 5 on the left, analysis). The last two problems were a lot harder than the rest, and only medium_waxberry and maroonrk were able to crack them. Congratulations to both!

ACM ICPC 2019-20 World Finals finally takes place tomorrow in Moscow. In case you haven't been following, quite a few teams could not make it and participated in a different online contest instead, and for those who could, the format was changed to 3 computers instead of 1. Of the teams that made it to Moscow (list, but you have to exclude the online teams), the strongest three are MIT, Nizhny Novgorod SU and Seoul NU. It seems that the battle tomorrow will be quite exciting!

I hope there will be an online mirror, and we'll solve it together with Gennady Korotkevich and Mikhail Tikhomirov for the fourth time in a row (2017, 2018, 2019), streaming the process online. We will use only 1 computer and stream to Gennady's Twitch channel this time. Tune in tomorrow around 9:15am Moscow time and ask your questions in chat! There is also an amazing official stream with commentary.

Sunday, September 26, 2021

A 202 week

Facebook Hacker Cup 2021 Round 2 narrowed the field to 500 qualifiers on Saturday (problems, results, top 5 on the left, analysis). maroonrk was a lot faster than everybody else, and he also bravely skipped writing a dedicated solution for C1, submitting problem C2 first and then quickly adapting its solution for C1. Congratulations on the clear first place!

This contest ended up being quite a struggle for me. First, I ran into the stack overflow issues that are a recurring topic for the Hacker Cup. I have changed my setup compared to last year, and now even about 15 minutes of googling during the round and another half an hour after the round did not help me find a way to increase the stack size in the new setup. In the end I just gave up and submitted by compiling with g++ on the command line (in hindsight, this decision should have come earlier).

However, I'm still curious how to increase the stack size in my setup, which is CLion using WSL as the toolchain, compiling with clang++. I've tried various compiler/linker flags such as -Wl,-stack_size and -Wl,--stack, but none of them worked. I've tried editing /etc/security/limits.conf inside WSL, but that did not seem to have any effect. I've found answers suggesting that the stack size can only be changed in WSL 2, and I double-checked that I do in fact have WSL 2. I've tried calling setrlimit from within my code, but it did not help either. I've found mentions that running sudo prlimit --stack=unlimited --pid $$; ulimit -s unlimited would help, but it's not clear how to do that when executing my program from within CLion, as I could not find a way to specify that some commands need to be run before my program. Putting that command in .profile did not help.

So, I'm turning to the readers of my blog for help. Do you know how to increase the stack size in the CLion+WSL+clang setup?

EDIT: this has been resolved in the comments: it turns out that I did not double-check enough, and I actually had WSL 1. With WSL 2, /etc/security/limits.conf does the trick.

This was not the end of my fight with C++, however. After implementing the solution for C2, I've tried to check it before submitting by feeding it the input from C1 augmented with many random changes. I've noticed that it did agree with my solution for C1 in debug mode, but it produced a different answer in release mode! I went on to debug this for around half an hour, could not find any issues, and finally noticed that compiling from the command line with g++ with optimization on (the release mode mentioned above was compiling with clang++ from within CLion) produces the correct answers, so I've used that to make my submission which turned out to be correct.

After the contest I discovered what the problem was using binary search on clang++ flags: in release mode, CLion (which inherits this behavior from CMake as I understand) passes -DNDEBUG flag, which makes assert(f()) statements not execute f() at all, and I've used assertions to verify that the elements that I try to erase from sets are actually found: assert(set.erase(x)). I guess everyone has to learn this lesson once :)

Finally, I was quite excited to come up with a cool solution for problem D inspired by the algorithm described in this post from 9 years ago (see the second solution to problem F). However, I missed the fact that even though I only discarded at most 18 numbers in my solution, some of those discarded numbers are differences of original numbers, or differences of differences (and so on), so we can discard many more than 18 original numbers. Therefore my solution failed, but luckily the first four problems were enough to advance. Funnily enough, if I'm reading the feedback page correctly, this solution almost passed, only failing one testcase where it discarded 31 numbers (it was only allowed to discard 25).

Open Cup 2021-22 Grand Prix of XiAn followed on Sunday (results, top 5 on the left, analysis). Team USA1 returned to winning ways, even though they were not the first team to reach 10 problems: Ildar Gainullin did this 20 minutes earlier, but his penalty time was ruined by the +24 in problem I. Congratulations to both teams!

In my previous summary, I have mentioned an Open Cup problem: you are given a positive integer k up to 10^6, and need to produce any set of at most 30 integers with absolute values not exceeding 10^16 such that the number of its subsets with zero sum, including the empty subset, is exactly k.

Since we did not have anything else to code, I got to play with this problem using a computer during the round. First, I've tried a greedy approach where we start with a set containing just the number 1, and then we keep adding numbers that give us as many additional zero-sum subsets as possible (but not exceeding k). This approach found a solution of size ~100 for large values of k if I remember correctly. When looking at the solution, I've noticed that after starting with a few 1s and -1s, it starts using bigger and bigger numbers, which, while locally optimal, means that on later steps we have many different sums, and therefore not so many subsets with the same sum.

I've tried to improve the solution by prescribing it to use more 1s and -1s before falling back to the greedy approach, which brought the solution size to ~70, still quite far from the required 30.

Then I tried to think about what a more constructive solution to this problem might look like. One idea that came to mind is to split the solution into two parts: in the first part, we use only positive numbers, and therefore do not get any zero-sum subsets. Then, we use only negative numbers, and we have some kind of a knapsack problem: for each negative number, we know the number of ways to build its negation as the sum of some positive numbers, and we need those numbers to add up to k. It's not exactly a knapsack though, since there might also be zero-sum subsets with more than one negative number.

However, this line of thought allowed me to come back to my greedy approach with a new idea: instead of starting with 1s and -1s, I tried starting with 1s and 2s first (just iterating over how many of each we have, up to a total of 30), and then greedily adding only negative numbers, each time maximizing the number of new zero-sum subsets, but not exceeding k. This dramatically improved its performance, finding solutions with fewer than 30 numbers for k=10^6 and k=10^6-1. I'm not sure if this rationalization is correct, but I think the reason for such improvement might be the fact that now we do the "exponential blowup" only once instead of twice (using only positive numbers for it).

This solution did not pass, but it passed quite a few tests with 1000 cases in each one, so I've modified it to also try having some 3s, which was enough :)
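To make this concrete, here is a toy sketch of that greedy (my own reconstruction, not the contest code): zero_sum_subsets counts zero-sum subsets with a sum-to-count map, and construct seeds with some 1s and 2s before greedily appending negatives. It only aims at small k; the contest-scale version needs the 3s and much better tuning.

```python
def zero_sum_subsets(nums):
    # count subsets (including the empty one) summing to zero,
    # via a map from subset sum to number of subsets achieving it
    cnt = {0: 1}
    for x in nums:
        nxt = dict(cnt)
        for s, c in cnt.items():
            nxt[s + x] = nxt.get(s + x, 0) + c
        cnt = nxt
    return cnt.get(0, 0)

def subset_sums(nums):
    # set of all achievable subset sums
    sums = {0}
    for x in nums:
        sums |= {s + x for s in sums}
    return sums

def construct(k, max_size=30):
    # toy version of the greedy: seed with some 1s and 2s, then repeatedly
    # append the negative number that maximizes the count without exceeding k
    for ones in range(1, max_size + 1):
        for twos in range(max_size + 1 - ones):
            cur = [1] * ones + [2] * twos
            while len(cur) < max_size and zero_sum_subsets(cur) < k:
                current = zero_sum_subsets(cur)
                best, best_cnt = None, current
                for t in (s for s in subset_sums(cur) if s > 0):
                    c = zero_sum_subsets(cur + [-t])
                    if best_cnt < c <= k:
                        best, best_cnt = -t, c
                if best is None:
                    break  # stuck below k with this seed, try the next one
                cur.append(best)
            if zero_sum_subsets(cur) == k:
                return cur
    return None
```

For small targets this already behaves as described in the post: the seed determines how dense the later counts can get, and the greedy then walks the count up to exactly k.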

Thanks for reading, and check back next week!

Saturday, September 25, 2021

An unwrapping week

TopCoder SRM 813 took place last Friday (problems, results, top 5 on the left). With this victory tourist has guaranteed himself the TCO qualification spot, so that he does not have to wake up for the last SRM which takes place in the middle of the night — well done! I grossly overestimated the chances of other solutions being broken, so my challenges for solutions that did not "look right" brought me -75 points, and that despite one of the challenges being successful :)

The second stage of the new Open Cup season, the Grand Prix of IMO, took place on Sunday (results, top 5 on the left, analysis). Team USA1 continued their streak with the 21st top-3 finish in a row, spanning from the end of the 2019-20 season, but they could not get their fifth victory in a row since team ext71 was 5 penalty minutes faster after getting their 12th problem dramatically at 4:58 from +3. Congratulations to both teams!

I've had some fun digging through the constructive problem K: you are given a positive integer k up to 10^6, and need to produce any set of at most 30 integers with absolute values not exceeding 10^16 such that the number of its subsets with zero sum, including the empty subset, is exactly k. I doubt it's possible to solve this problem completely in one's head or on paper, so if you're interested, I encourage you to try some idea out with a computer :)

Finally, CodeChef September Cook-Off wrapped up the week (problems, results, top 5 on the left, analysis). The last two problems were a lot more difficult than the rest, and only gennady.korotkevich was able to solve them both. Great job!

In my previous summary, I have mentioned a VK Cup problem: you have a tape that is infinite in both directions and is split into cells. You're going to write a given random permutation of size n<=15000 onto this tape in the following manner: you write the first number somewhere, then you either stay in the same cell or move one cell to the left or to the right, then write the second number there, then again either stay or move to an adjacent cell, write the third number there, and so on. Whenever you write a number into a cell that already has a number, the old number is overwritten and therefore discarded. After all n numbers have been written, we look at the resulting sequence written on the tape from left to right. What is the maximum possible size of an increasing subsequence of this sequence? Note that the given permutation is guaranteed to have been picked uniformly at random from all permutations of size n.

The first idea is relatively standard for problems of this type: overwriting is hard to deal with, so let us look at the process from the end. Then instead of overwriting we will just skip writing numbers to cells that already have a number, and therefore we will always have a segment of cells that have numbers that will not change anymore, and the rest of the cells free.

The second thing to get out of the way is the "increasing subsequence" complication. In the new setup where we go from the end, it's clear that there's no point at all writing numbers that will not end up in that increasing subsequence, as we can simply stay in the current position and skip writing the unnecessary number instead of moving left/right to write it. The only exception to this is the very first number we write (in reverse order, so the last number of the input permutation) which will always be written even if it's useless for the increasing subsequence. We can deal with this peculiar property of the last number separately, so for the remainder of this post I will forget about it for simplicity.

Now all numbers we write must form an increasing sequence. The property "increasing" is local, so all subsequent numbers will only be compared with the first or the last number written so far, which leads us to a dynamic programming solution. Having processed some prefix of the reversed permutation, only four things matter:

  1. what is the first number of the sequence we have so far
  2. what is the last number of the sequence we have so far
  3. what is the length of the sequence we have so far
  4. where in this sequence are we
However, these four things plus the length of the prefix we have processed lead to O(n^5) states, so while polynomial, this solution does not really cut it for n<=15000. So we need a few tricks to reduce the state space.

The first trick is to notice that since the permutation is random, the length of the sequence will be O(n^0.5) instead of O(n) (more info). Therefore our position inside the sequence also has just O(n^0.5) options, so we have reduced the number of states to O(n^4) just by staring at the solution for a bit and without changing it :)
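The square-root behavior of the longest increasing subsequence of a random permutation (its expected length is about 2*sqrt(n)) is easy to check empirically; here is a quick standalone experiment, using patience sorting to compute the LIS:

```python
import bisect
import random

def lis_length(seq):
    # patience sorting: tails[i] is the smallest possible tail
    # of an increasing subsequence of length i + 1
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

random.seed(42)
n = 15000
perm = list(range(1, n + 1))
random.shuffle(perm)
# compare the measured LIS length against 2 * sqrt(n) ~ 245
print(lis_length(perm), 2 * n ** 0.5)
```

For n=15000 the measured length lands in the low two hundreds, matching the square-root prediction rather than anything linear in n.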

The next relatively standard trick is to notice that we don't need the first two dimensions in such detail. Instead, we can store the smallest last number for each combination of (first number, length, our position), which brings us down to O(n^3) states. Symmetrically, we could have also stored the biggest first number for each combination of (last number, length, our position).

Here comes another optimization: if we only consider states right after we have written a new number, as opposed to at all moments, then three dimensions of our dynamic programming collapse into one! For example, if we have just written a new first number, then "where in the sequence are we" has just one option — at that first number. Moreover, the prefix of the reversed permutation that we have processed is also fixed — it's the prefix up to that first number. This optimization without the previous optimization would have brought the number of states down to O(n^2.5).

Can we apply both optimizations together? This is what I could not figure out during the contest, but in retrospect it's quite straightforward: yes, we can! The key thing to notice is that we can't apply just "smallest last number" or just "biggest first number" optimizations by themselves. We must use both: if we have just written a new first number, then we need to store the smallest last number for this (first number, length) pair, and if we have just written a new last number, then we need to store the biggest first number for this (last number, length) pair! This brings the number of states down to just O(n^1.5), finally something manageable for n<=15000.

However, having reduced the number of states we have increased the number of transitions: instead of just considering what to do with the next number for a transition, we have to iterate over which will be the next number to be written, leading to O(n) transitions for each state and O(n^2.5) running time.

However, we can use yet another relatively standard trick: instead of naively doing all transitions, we can use a segment tree that stores all "pending" transitions, and be able to find the dynamic programming value for a new state in O(log(n)), solving our problem in O(n^1.5 log(n)), which is fast enough. I hope you enjoyed this long process of unwrapping the present as much as I did :)

Thanks for reading, and check back for this week's summary!

Monday, July 19, 2021

A reversal week

TopCoder SRM 809 was the first event of last week (problems, results, top 5 on the left, analysis). This was apparently the first SRM that counted towards the TCO22 qualification, and I completely overlooked this fact and did not prioritize participating :( Congratulations to those who paid attention, and especially to tourist for solving the hard problem almost twice as fast as everybody else, thus earning a clear first place!

The VK Cup 2021 Elimination Round for those who speak Russian and qualified, which was also available as Codeforces Round 733 for everybody else, followed on Saturday (problems, results, top 5 on the left, parallel round results, top 5 on the right, analysis stream, analysis). Everything went pretty well for me in this round: I got the solution ideas quickly and only had to do heavy debugging once, in problem G. However, Um_nik chose a superior strategy, went for problem H in the end which gave him many more points, and won the round. Well done!

More worryingly for me, despite me feeling pretty good about my performance on the first six problems, some usual suspects in the parallel round were faster to solve those six problems and managed to solve both harder problems to boot! Congratulations to jiangly and ecnerwala. It seems that just having a good day is no longer enough for me to compete for the first place at Codeforces :)

The hardest problem H was a pretty nice example of DP optimization. You have a tape that is infinite in both directions and is split into cells. You're going to write a given random permutation of size n<=15000 onto this tape in the following manner: you write the first number somewhere, then you either stay in the same cell or move one cell to the left or to the right, then write the second number there, then again either stay or move to an adjacent cell, write the third number there, and so on. Whenever you write a number into a cell that already has a number, the old number is overwritten and therefore discarded. After all n numbers have been written, we look at the resulting sequence written on the tape from left to right. What is the maximum possible size of an increasing subsequence of this sequence? Note that the given permutation is guaranteed to have been picked uniformly at random from all permutations of size n (which conveniently makes preparing the testcases for this problem so much easier!)

Thanks for reading, and check back next week!

Sunday, April 25, 2021

A cycle week

Codeforces Round 718 (also known as "Contest 2050") took place on Friday (problems, results, top 5 on the left, analysis). Benq solved the first seven problems 20 minutes faster than Um_nik, who in turn solved them almost half an hour faster than all other contestants. They were therefore the only two top contestants who had time to submit something for the last problem; it did not work out, but their first and second places were nevertheless safe by a big margin. Congratulations on the impressive performance!

Google Code Jam 2021 Round 1B finished a few minutes ago (problems, results, top 5 on the left, analysis). neal_wu was the first to score 100 provisional points, but he made one incorrect attempt and therefore had to sit and wait for 4 minutes to see if somebody would overtake him. Nobody else was as fast, though, so Neal kept the first place. Congratulations to him and to all 1500 Round 2 advancers!

In my previous summary, I have mentioned a Codeforces problem: You are given n<=2000 points on the plane such that no three points lie on the same line. Each point has an integer label between 1 and n, and all labels are distinct. Your goal is to rearrange the labels in such a way that the i-th point has label i. You are allowed to take any two points and swap their labels, but when you do that you must draw a straight line segment connecting those two points. The segments you draw must not intersect except at their endpoints (in particular, you are not allowed to draw the same segment twice).

The key trick to make progress in the problem is to find a less general, but still non-trivial, case which we can solve. It turns out that such a case is when the initial labeling of the points forms a single cycle, for example: point 1 has label 2, point 2 has label 3, and so on, until point n which has label 1. In this case we can solve the problem using only swaps that include point 1, so their segments will not intersect: we start with labels 234...n1. We swap the labels of points 1 and 2, and we get 324...n1. We swap the labels of points 1 and 3, and we get 423...n1. We continue with swapping the labels of points 1 and 4, and so on, and eventually we get n234...(n-1)1; finally we swap the labels of the first and last points to get the identity permutation. Note that our choice of point 1 as the "pivot" of all swaps was arbitrary, and we could do the same with any other point.
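This single-cycle procedure is easy to sanity-check with a short simulation (a sketch of the label bookkeeping only; the geometry is irrelevant here, because every drawn segment has point 1 as an endpoint and so two segments can only meet there):

```python
def sort_single_cycle(n):
    # point i (1-indexed) starts with label i + 1; point n starts with label 1
    labels = list(range(2, n + 1)) + [1]
    segments = []
    for j in range(1, n):  # 0-indexed: always swap with the pivot, point 1
        labels[0], labels[j] = labels[j], labels[0]
        segments.append((1, j + 1))  # record the swap in 1-indexed form
    return labels, segments

labels, segments = sort_single_cycle(8)
assert labels == list(range(1, 9))        # identity permutation reached
assert all(a == 1 for a, b in segments)   # every segment touches the pivot
```

The simulation confirms that n-1 swaps, all anchored at the pivot, take the single cycle to the identity.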

What should we do if the labels do not form a single cycle? Let's make some additional swaps before using the above approach, to make sure they do! More specifically, let's assume our points are sorted by angle with respect to point 1. The above solution will only draw segments between point 1 and other points. Therefore we are free to swap labels between adjacent points in this sorted order, as those segments will not intersect each other or the segments to point 1.

The initial permutation has some number of cycles, and whenever we swap two elements from different cycles they merge into one. While we have more than one cycle, we can find two adjacent points in the sorted order that belong to different cycles, and swap them to merge those cycles. We repeat this until we have just one cycle remaining, and then apply the single-cycle solution.
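The merging step can be checked in the same abstract way (a sketch over 0-indexed labels; the choice of which adjacent-by-angle positions to swap is omitted, since only the cycle structure matters here):

```python
def cycle_count(labels):
    # number of cycles of the permutation "position i holds label labels[i]"
    n = len(labels)
    seen, count = [False] * n, 0
    for start in range(n):
        if not seen[start]:
            count += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = labels[i]
    return count

# two cycles: (0 1) and (2 3 4)
labels = [1, 0, 3, 4, 2]
assert cycle_count(labels) == 2
# swapping the labels of two positions from different cycles merges them
labels[1], labels[2] = labels[2], labels[1]
assert cycle_count(labels) == 1
```

Each such swap reduces the cycle count by exactly one, so at most (number of cycles - 1) merging swaps are needed before the single-cycle solution takes over.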

There are some additional small details to be figured out, which you can find in the official editorial. I could not solve this problem myself, in part because the space of possible approaches is so vast, and yet most of them do not seem to work. I've checked the solutions for this problem from the top finishers, and they all seem to use this approach. In fact, I'm really curious: is some fundamentally different solution possible here? If not, does there exist some intuition why?

Thanks for reading, and check back next week.

Sunday, April 18, 2021

A yuri week

Codeforces Round 715 was the main round of this week (problems, results, top 5 on the left, analysis). The first three problems were quite approachable, and the real fight for the top spots started on the remaining three. ecnerwala went for the 4000-point F instead of the 2250-point D, and this turned out to be exactly the right plan, as there was not enough time to solve all problems. Congratulations on the victory!

I have tried to make the same plan work, but unfortunately I could not solve F in time — my solution was written at the end of the contest, but it had two issues:
  • It did not work on the samples
  • Its complexity was O(n^2*log(n)) in the worst case (even though with a 7-second time limit this was not obviously bad)
I could make it work on the samples in just a couple of minutes after the end of the contest, and the solution passed the system tests — only to be hacked as the complexity was in fact too big. I would've probably reached the second place if I had those couple of minutes, and I'm still on the fence about whether the fact that I'd have squeezed in an incorrect solution makes me regret this more or less :)

I'd like to highlight another problem though, for which I had completely no working ideas during the contest: problem D. You are given n<=2000 points on the plane such that no three points lie on the same line. Each point has an integer label between 1 and n, and all labels are distinct. Your goal is to rearrange the labels in such a way that the i-th point has label i. You are allowed to take any two points and swap their labels, but when you do that you must draw a straight line segment connecting those two points. The segments you draw must not intersect except at their endpoints (in particular, you are not allowed to draw the same segment twice). Can you see a way to achieve the goal?

In my previous summary, I have mentioned a Code Jam problem: you are given up to 10^15 cards, each with a prime number between 2 and 499. You need to split the cards into two piles, such that the sum of the numbers in the first pile is equal to the product of the numbers in the second pile, or tell that it's impossible.

The key idea here is that the product grows very quickly, so the second pile will always be very small. Therefore the sum of the numbers in the second pile will be small: the product is at most the total sum of all cards, which is below 499*10^15 < 2^59, so the second pile holds fewer than 59 cards and its sum is below 499*59 < 30000. But the sum of the numbers in the first pile is the sum of all given numbers minus the sum of the numbers in the second pile. If the sum of all given numbers is S, we just need to check all numbers in the segment [S-C,S] as candidates for the sum of the numbers in the first pile (which is equal to the product of the numbers in the second pile), for some relatively small value of C.

How do we check a particular candidate? Here the fact that all numbers are prime comes into play: since we know that the candidate is a product of some subset of the given numbers, and all of them are prime, there is in fact a unique decomposition for it as a product of primes. Factorizing a number this big can still be problematic, but since we're only interested in prime factors up to 499, it is fast enough.
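This check is easy to sketch in code. Below is my own illustration, not the official solution: the dict-of-counts input format, the function name, and the untuned slack bound C are all my choices (the official analysis derives the proper value of C).

```python
# My own sketch of the check described above. `cards` maps each prime in
# [2, 499] to how many cards carry it; the total card count can be huge,
# but the product pile is tiny, so the target value lies in [S - C, S].

def solve(cards, C=30000):
    primes = sorted(cards)
    S = sum(p * cnt for p, cnt in cards.items())  # sum of all numbers
    # Try candidates from the top so the first hit is the largest valid value.
    for candidate in range(S, max(S - C, 2) - 1, -1):
        # Since all cards are prime, the factorization of `candidate`
        # over primes <= 499 is forced; recover it by trial division.
        rest, used = candidate, {}
        for p in primes:
            while rest % p == 0:
                used[p] = used.get(p, 0) + 1
                rest //= p
        if rest != 1:
            continue  # has a prime factor we hold no card for
        if any(used[p] > cards[p] for p in used):
            continue  # needs more copies of some prime than we have
        # The cards left outside the product pile must sum to `candidate`.
        if S - sum(p * c for p, c in used.items()) == candidate:
            return candidate
    return -1  # signalling "impossible" in this sketch
```

For example, with one card each of 2, 3, 5 and 7, putting 2 and 5 in the product pile gives 2*5 = 10 = 3+7, so the sketch returns 10.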

You can check the official analysis for more details, such as what is a reasonable value for C. Thanks for reading, and check back next week!

Sunday, April 11, 2021

An ESP week

Google Code Jam Round 1A was the first round of this week (problems, results, top 5 on the left, analysis). The top 1500 have made it to Round 2, with 3000 further spots up for grabs in the remaining two Round 1s. Nevertheless, the spirit of competition was there, and it is of course a great achievement to get the first place. Congratulations to Um_nik, and his achievement is even more impressive given that the round was probably at 4 in the morning :)

I have prepared the test data for the second problem, Prime Time, and I (humbly, of course :)) think it was a great example of how a not particularly difficult problem can still be nice. The problem went like this: you are given up to 10^15 cards, each with a prime number between 2 and 499. You need to split the cards into two piles, such that the sum of the numbers in the first pile is equal to the product of the numbers in the second pile, or tell that it's impossible. Can you see how to handle such a huge amount of cards?

AtCoder has returned with its Grand Contest 053 a few hours later (problems, results, top 5 on the left, analysis). I have solved the first three problems in less than 50 minutes, which was great, but then I could not score any additional points in the remaining 2 hours and 10 minutes, which was a bit frustrating :) I tried to approach all three remaining problems in the beginning, but eventually focused on problem E, and it turns out I was really close: I did identify the two cases from the official analysis, I handled the first case correctly (I think), and I tried various formulas for the second case that looked very similar to the correct one, but I could neither figure out all the details and come up with a formula on paper, nor get the correct answers for the sample cases by trying small changes to the formulas. It turns out that my problem was that I was inserting the segments in the increasing order of the starting time, instead of the decreasing order of the ending time. These two ideas look symmetric on the surface, but they are actually not equivalent, because the problem asks about local maxima, and the symmetric case would be local minima. However, my idea can still handle the first case just as well (because the first case is in fact symmetric, as we have n-1 local minima there as well), which made it hard to move away from it for the second case.

Congratulations to the seven contestants who have managed to handle all details correctly and solve one of the harder problems, and especially to jiangly on the win!

Thanks for reading, and check back next week.

Monday, April 5, 2021

A no-regret week

TopCoder SRM 803 last week wrapped up the third stage of the race for TCO21 qualification (problems, results, top 5 on the left, TCO21 race results, analysis). Even though tourist had already guaranteed himself the first place and the direct ticket (or a direct login, in case it takes place online again :)) to TCO21, he has won the round with a big margin once again. This was his fourth SRM win in a row, which has happened before, but nobody has ever won five in a row. tourist has also claimed the all-time high rating during this streak. Congratulations Gennady, and no pressure at all for SRM 804 ;)

Codeforces Round 712 followed on Saturday (problems, results, top 5 on the left, analysis). The last problem kind of screamed that some sort of greedy approach for embedding each cycle should work, but figuring out all the details of the approach, as well as implementing it, was still extremely hard. Very well done to ksun48 on doing that and getting a clear first place!

Finally, the Northern Eurasia region of the ICPC held its onsite finals for the 2020-21 season on Sunday (problems, results, top 12 on the left, online mirror results, analysis). The top 50 teams from the online finals were invited to this round (how does one actually call the round between a semifinal and a final? A 2/3-final? A 1/sqrt(2)-final? :)). In the end, solving the 6 relatively easier problems was the bar for getting a medal, while getting a gold medal required also solving either the traditionally cactus-themed problem C, or the also traditionally interactive problem I. Congratulations to all medalists and especially to the team "Insert your name" from ITMO on the victory!

I set that interactive problem I for this round. Given that the contestants were onsite and did not have internet access during the round, it was a good opportunity to give a problem involving a beautiful but googleable algorithm: the randomized weighted majority algorithm. Preparing the test cases for this problem was extremely tricky, as the space of possible approaches is virtually unlimited, and there seems to be no single way to fail all of them. It turned out that the testcases were good enough for the onsite round, with the only two passing solutions using the provably correct approach, and with the top two teams probably accumulating quite some love for me with their -35 and -29 on this problem.
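The algorithm itself fits in a dozen lines. Here is my own toy sketch of the randomized weighted majority idea (not the reference solution or the judge): each round, follow a random expert chosen with probability proportional to its weight, then multiply the weight of every wrong expert by 1-eps.

```python
import random

# My own toy sketch of the randomized weighted majority algorithm.
# expert_predictions[i][t] is expert i's bit in round t; truth[t] is the
# correct bit, revealed only after we commit to our own prediction.

def run(expert_predictions, truth, eps=0.1, seed=1):
    rng = random.Random(seed)
    n = len(expert_predictions)
    w = [1.0] * n
    mistakes = 0
    for t, answer in enumerate(truth):
        # Sample an expert with probability proportional to its weight
        # and follow its prediction for this round.
        r = rng.random() * sum(w)
        for i in range(n):
            r -= w[i]
            if r <= 0:
                break
        if expert_predictions[i][t] != answer:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        for i in range(n):
            if expert_predictions[i][t] != answer:
                w[i] *= 1 - eps
    return mistakes
```

The classic analysis bounds the expected number of mistakes by roughly (1+eps)*b + ln(n)/eps, where b is the number of mistakes of the best expert, which is what makes staying within b plus a fixed allowance plausible.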

In the online mirror, some more questionable, but at the same time really creative, approaches passed all tests. In fact, the ML approach pwns the test set, making several times fewer mistakes than b (the smallest number of mistakes among the experts) in most test cases. It only has difficulties on test cases with a really small b (say, 7 out of 10000), where the allowed leeway of 100 extra mistakes is barely enough for the neural network training to converge.

It looks like I need to add some simple gradient descent and boosted decision forest algorithms to my prewritten code library for the future :)

In my previous summary, I have mentioned a Codeforces problem: there is a hidden array of n b-bit integers (in other words, each element is between 0 and 2^b - 1). You do not know the elements of the array, but you can ask questions about them: in one question, you can ask "is it true that the i-th element of the array is greater than j?" for some value of i between 1 and n and j between 0 and 2^b - 1. Your goal is to find the value of the maximum element in the array, and you need to do it in O(n+b) queries.

The first question is kind of expected: let's compare the first element of the array with 2^(b-1) - 1, in other words let's learn the highest bit of the first element. If that bit is 1, we're all good: we then know that the highest bit of the answer is also 1, therefore we have effectively reduced b by 1 using just one query, so we're on track for O(n+b) queries overall.

The case where that bit is 0 is more interesting. We can't really afford to keep asking about the first bit of all numbers, since in case they are all 0, we would've spent n queries to reduce b by 1, which does not lead to a linear number of queries. This issue kind of points us to a better approach: in case the answers are always "less than", we want to be left with a really small range after asking the n queries. Therefore let's compare the second number with 2^(b-2) - 1, in other words let's ask "is it true that the two highest bits of the second number are 0"?
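Note that in terms of the allowed queries, checking a zero prefix costs a single comparison. A tiny sketch (my own notation, not from the editorial):

```python
# My own sketch of how "do the k highest of b bits of a[i] equal zero?"
# maps onto the allowed queries: the prefix is all zeros exactly when
# a[i] <= 2^(b-k) - 1, i.e. when "is a[i] > 2^(b-k) - 1?" is answered "no".

def starts_with_zeros(query, i, k, b):
    """query(i, j) answers: is the i-th hidden element greater than j?"""
    return not query(i, (1 << (b - k)) - 1)
```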

In case we keep getting "less than" answers, after n queries we will know that the first number starts with 0, the second with 00, the third with 000, and so on. Now let's go from right to left, and ask "does the (n-1)-th number start with n zeros?". If not, then we know that the n-th number is smaller than the (n-1)-th number, and can be discarded for the rest of the solution, and we continue by asking "does the (n-2)-th number start with n-1 zeros?" If the (n-1)-th number does start with n zeros, we continue by asking "does the (n-2)-th number start with n zeros?" After we complete this process going from right to left, we will have discarded some number k of the elements, and will know that the remaining numbers all start with n-k zeros. Therefore we have reduced n by k, and b by n-k, so n+b was reduced by n using 2n queries, which is good enough for a linear solution!

Finally, we need to figure out what to do in case we get some "greater" answer after a few "less than" answers, for example when we learn that the first number starts with 0, the second with 00, the third with 000, but the fourth does not start with 0000. We will then ask: does the fourth number start with 0001? If the answer is also no, then we know that the fourth number starts with at least 001, therefore it's greater than the third number which can be discarded, and we continue by asking if the fourth number really starts with 001, potentially discarding the second number if not, and so on. If the fourth number does start with 0001, then we continue with the fifth number, but instead of asking if it starts with 00000, we ask if its prefix is at least 00010 (since we're not really interested in numbers smaller than 0001, given that we already have evidence of a number that starts with 0001).

At any moment, our algorithm therefore maintains a stack of numbers, where for each number we know a prefix that is equal to the prefix we know for the previous number in the stack plus one more bit. When we run out of bits, we do the backwards pass as described above, and obtain a problem of reduced size. Just like in the all-zeros case, we spend O(n) queries to reduce n+b by n, therefore achieving a linear solution.

This is the main idea of the solution. There are still some small details to be figured out, which you can find in the official editorial.

Thanks for reading, and check back next week!

Monday, January 4, 2021

A Samara week

"Good Bye 2020" was the last round of the year on Codeforces (problems, results, top 5 on the left, analysis). The round has left me with mixed feelings, as I've spent almost the whole 3 hours solving relatively standard problems, and did not have enough time to solve the very exciting problem I. The main reason for that was that I forgot how to count DAGs, and tried to reinvent the wheel in problem H for a very long time. tourist, on the other hand, did not struggle so much with H, and won the round. Well done! Also well done to scott_wu, fivedemands, mnbvmar and qwerty787788 who were able to solve the last problem.

Here's that exciting problem: there is a hidden array of n b-bit integers (in other words, each element is between 0 and 2^b - 1). You do not know the elements of the array, but you can ask questions about them: in one question, you can ask "is it true that the i-th element of the array is greater than j?" for some value of i between 1 and n and j between 0 and 2^b - 1. Your goal is to find the value of the maximum element in the array, and you need to do it in O(n+b) queries. The problem actually asked to do it in at most 3*(n+b) queries, but I think just doing it in a linear number of queries is the most interesting part.

Note that simply finding each element with binary search would lead to n*b queries, and it's not clear initially how we can do any better as each query only asks about one element, and the elements are somewhat independent. Can you see the way? n and b are up to 200, and the interactor is adaptive (so your solution most likely needs to be deterministic).
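To make the query budget concrete, here is my own sketch of that naive baseline with a query counter: it recovers every element exactly with a top-down binary search, at a cost of n*b queries.

```python
# My own sketch of the naive baseline, far above the O(n+b) budget the
# problem demands, but a useful reference point. query(i, j) answers
# "is the i-th hidden element greater than j?".

def naive_max(query, n, b):
    queries = 0
    best = 0
    for i in range(n):
        value = 0  # known high bits of a[i], lower bits still zero
        for bit in range(b - 1, -1, -1):
            queries += 1
            # a[i] > value | (2^bit - 1) holds iff this bit of a[i] is 1
            if query(i, value | ((1 << bit) - 1)):
                value |= 1 << bit
        best = max(best, value)
    return best, queries
```

With n = b = 200 this spends 40000 queries, two orders of magnitude above the 3*(n+b) = 1200 the problem allows.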

The new year pause was not long, and Open Cup 2020-21 Grand Prix of Samara took place on Sunday (results, top 5 on the left, analysis, overall Open Cup standings). Unlike last year, which saw a fierce competition at the top of the overall Open Cup scoreboard between three teams, this year team USA1 really dominates the proceedings, and they won their 7th Grand Prix out of 8 this time, finishing all problems in 3.5 hours. Congratulations!

The Prime New Year Contest is another staple of the holidays. It is running until the end of this week, and features the problems from 2020 which were not solved by anybody during the respective contests. Good luck getting something accepted there, and huge respect to those who already did!

In my previous summary, I have mentioned an AtCoder problem: you are given an undirected graph with four vertices and four edges that looks like a square (the picture from the statement on the right). You know that a walk started from vertex S, finished at vertex S, has passed the ST edge (in any direction) a times, the TU edge b times, the VU edge c times, and the SV edge d times. How many different walks fit that description? a, b, c and d are up to 500000, and you need to compute the answer modulo 998244353.

If we replace the ST edge with a parallel edges, the TU edge with b parallel edges, and so on, then we're looking for the number of Euler tours in the resulting graph, up to division by some factorials to account for the fact that all parallel edges are interchangeable. However, counting Euler tours in undirected graphs is hard.

But given the simple structure of our graph, we can actually reduce our problem to counting Euler tours in directed graphs, which can be done using the BEST theorem! We can iterate over the number of times we pass the ST edge in the direction from S to T, in other words over how many ST arcs our directed graph would have. This determines the number of TS arcs by subtracting from a, then the number of SV and VS arcs from the constraint that the in-degree and out-degree of S must be the same, and so on, until we know the number of times we pass each edge in each direction, and we can then count the Euler tours in the resulting graph in constant time (because the graph has 4 vertices and 8 "arc types", and the actual number of parallel arcs does not affect the running time of the BEST theorem). Since a is up to 500000, we have to do this constant-time computation 500000 times, which is fast enough.
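To illustrate the constant-time counting step, here is my own sketch (not the reference solution) of the BEST theorem for a small directed multigraph: the number of Eulerian circuits equals the number of spanning arborescences oriented toward any fixed root (computed with the matrix-tree theorem; the choice of root does not matter for a connected Eulerian digraph) times the product of (outdeg(v) - 1)! over all vertices.

```python
from fractions import Fraction
from math import factorial

# My own sketch of the BEST theorem for a small connected Eulerian digraph:
# ec(G) = t_w(G) * prod_v (outdeg(v) - 1)!, where t_w(G) is the number of
# spanning arborescences oriented toward a fixed root w, obtained as a
# minor determinant of the out-degree Laplacian L = D_out - A.

def eulerian_circuits(n, arcs):
    """arcs[u][v] = number of parallel arcs from u to v (0-indexed)."""
    outdeg = [sum(arcs[u]) for u in range(n)]
    # Delete the row and column of root 0 from the Laplacian.
    L = [[Fraction((outdeg[u] if u == v else 0) - arcs[u][v])
          for v in range(1, n)] for u in range(1, n)]
    # Determinant by Gaussian elimination over exact rationals.
    det = Fraction(1)
    m = n - 1
    for c in range(m):
        pivot = next((r for r in range(c, m) if L[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            L[c], L[pivot] = L[pivot], L[c]
            det = -det
        det *= L[c][c]
        for r in range(c + 1, m):
            f = L[r][c] / L[c][c]
            for k in range(c, m):
                L[r][k] -= f * L[c][k]
    result = det
    for v in range(n):
        result *= factorial(outdeg[v] - 1)
    return int(result)
```

For the problem above, one would evaluate this for each candidate split of a into ST and TS arcs, and still account for parallel arcs being interchangeable (the factorials mentioned earlier); those details are in the official analysis.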

Thanks for reading, and check back next week!