Algorithms Weekly by Petr Mitrichev: 2020

Tuesday, December 29, 2020

A rng_58 week

Last week wrapped up the year for two major contest platforms — TopCoder and AtCoder — and with it the races to qualify to the corresponding onsites.

TopCoder SRM 796 was the first round of the week (problems, results, top 5 on the left, analysis). maroon_kuri was leading after the coding phase but failed the system tests; none of myself, tourist and Egor could find a challenge opportunity even though there were lots of them available, including in the 500-pointer which screamed "look for challenges here!" as it had just one sample case.

This round concluded the three-month race for the second TCO21 spot (results, top 5 on the left). Even though six SRMs is quite a lot, the race came down to the wire and to a lot of very close calls: in SRM 792, neal_wu produced an amazing 200-point challenge phase to get a 3-point advantage and deny Um_nik the 5 race points; in SRM 793, tourist's easy was challenged but he would still win the round had ksun48 not taken his turn to earn 200 challenge points; you can see the story of SRM 796 above.

AtCoder held its two last rounds of the year on the weekend, starting with AtCoder Grand Contest 050 on Saturday (problems, results, top 5 on the left, analysis). Um_nik was the fastest on the four relatively easier problems, and protected his lead by finding the main insight and figuring out all details in the very tricky problem F. Congratulations on the win! duality and newbiedmy were also able to solve the hardest problem, and rounded up the top three. newbiedmy in particular must've had one hell of a contest: he started with this problem, and did not give up after 11 incorrect attempts. This unusual strategy was richly rewarded!

AtCoder Grand Contest 051 on Sunday was the last contest of rng_58 as AtCoder admin (problems, results, top 5 on the left, analysis). Even though the point values were different, the scoreboard looks very similar: the first four problems were solved by many, while the last two were very hard and only got accepted solutions in the last 45 minutes of the 4.5-hour contest. semiexp and yutaka1999 went for problem F which gave them more points and the first two places. Congratulations!

I'd like to highlight problem D: you are given an undirected graph with four vertices and four edges that looks like a square (the picture from the statement on the right). You know that a walk started from vertex S, finished at vertex S, has passed the ST edge (in any direction) a times, the TU edge b times, the VU edge c times, and the SV edge d times. How many different walks fit that description? a, b, c and d are up to 500000, and you need to compute the answer modulo 998244353.

Given that Makoto (rng_58) is retiring as AtCoder admin, I'd like to thank him for bringing AtCoder to the English-speaking world, and for making it the place for solving awesome problems. Extremely well done!

The two rounds have concluded the race for the 8 AtCoder World Tour Finals spots (results, top 8 on the left). Congratulations to all finalists! This is the third season of the AtCoder points system. The top 8 has quite a big intersection with those from the two previous years (2018, 2019), with tourist, Um_nik and myself qualifying all three times, and ecnerwala and mnbvmar qualifying last year. It will be the first WTF for Benq, Stonefeang and endagorion, assuming it does actually take place of course :)

Thanks for reading, and Happy New Year!

Sunday, November 22, 2020

A Seattle week

Codeforces Round 684 warmed the contestants up before the remaining TCO20 rounds (problems, results, top 5 on the left, analysis). It was very close between the top 3, but ecnerwala has managed to avoid getting TLE on pretests in the last problem (and therefore having to spend more time speeding up the solution) and thus gained a very small edge. Congratulations on the first place!

The problemsetters received a lot of flak for the tight time limits. I have not solved the round myself, so I can't offer any firsthand experience. However, I found it quite impressive that three different coders were complaining about having to squeeze their solutions only to learn that the have bad asymptotic complexity (1, 2, 3 — thanks a lot to dorijanlendvaj for investigating those!).

TCO20 Semifinal 2 followed next day (problems, results on the left, stream recording, analysis). The problems were significantly easier compared to Semifinal 1, with a relatively standard easy, a unique but not very hard medium, and a hard which was mostly solved by printing out the nimbers for small inputs and figuring out a pattern. Solving all three problems was enough to advance without any additional concerns; Egor and uwi solved two each, and Egor got ahead thanks to uwi's resubmit and his own successful challenge.

The top 4 from this round, together with the top 5 from the first round, competed in the TCO20 Final on Saturday (results on the left, stream recording). The problems were also not too difficult, and the competition came down to speed and random bugs, with less than 50 points separating many contestants at the top. tourist was just a tiny bit faster than everybody except Um_nik, and Um_nik's hard failed the system tests paving the way for the second TCO victory in a row for tourist — congratulations!

My contest was compromised early by two things: first, I had to resubmit the easy because I forgot to apply the modulo in one of the operations and therefore got an integer overflow. I guess it is a good lesson that tells me to start using a "modint" class that everybody else is using (which might be prohibitively slow in Java, but fine in C++). Second, after opening the medium I've immediately started to implement a brute force solution that computes the losing positions for small inputs, inspired by what happened in the semifinal. This strategy has backfired in this problem, as just a tiny bit of thinking on paper could've allowed me to achieve the same insight (that the losing positions are exactly the same as in the normal Nim) much quicker. I was therefore more than 100 points behind tourist and Um_nik after the first two problems.

I have managed to recover a big part of that gap on the hard problem, using the same strategy — implementing a brute force solution to see a pattern that enables a real solution (that all interesting states are the multiples of interesting 0/1 states), so one could say that this strategy worked out about even if one looks at the medium and hard combined.

I was therefore less than 50 points behind the first place, and needed just one successful challenge to win. I could not find any incorrect solutions, though, so I went with a last-second challenge on neal_wu's medium, which used an approach that was different from the approach that I and most others have taken, and which I hypothesized could TLE. It turned out it was fast enough, but I think it was definitely worth a try.

It is also interesting that the only incorrect solution in this round — Um_nik's hard — had a bug that I considered during the coding phase: it computed differences with the previous element instead of differences with the next element on every iteration. However, I was not sure if this bug can affect the results, so I've ran both variants of my solution on the biggest testcase (26, 10¹⁸), and they indeed gave the same output. I've thought that maybe the parity of n plays a role, so I've also tried (25, 10¹⁸) and the results were also the same, which got me completely convinced that the direction does not matter :) It turns out that both (24, 10¹⁸) and (26, 10¹⁷) have different outputs for those two solutions, so I was really close to discovering this and then checking all other solutions for this bug and challenging Um_nik successfully.

It is a bit more painful to come so close to winning in multiple ways (compared to for example 2019 where I did not really have any chance). Well, better luck next time, and congratulations to tourist once again!

In my previous summary, I have mentioned an AtCoder problem: you start with a sequence of n zeros (n<=50), and can apply the following four operations any number of times:

Add one to any number. This operation costs 1.
Subtract one from any number. This operation costs 1.
Add one to any contiguous segment of numbers. This operation costs C.
Subtract one from any contiguous segment of numbers. This operation costs C.

Now, you are given k (k<=50) candidates for each value in the final state of the sequence, each candidate is between 1 and 10⁹. You need to find sum of the minimum costs to obtain the final state for each of kⁿ possible final states, modulo 10⁹+7.

Swistakk has explained my approach in this comment, so I will only clarify that reducing the problem to O(n*k) problems where each element is 0 or 1 is done by considering independent problems "how to add 1 to all numbers which must be >=1 in the end", "how to add 1 to all numbers which must be >=2 in the end" and so on.

November was quite packed with onsite finals that were in limbo given the circumstances, and ended up happening online close to the end of the year. Now most of them have passed (VK Cup, Yandex Cup, TopCoder Open), AtCoder WTF is most likely postponed further (judging from the list of upcoming contests here), so the only remaining big onsite-turned-online final (and my last chance to win something this year) is the Facebook Hacker Cup on December 5.

Thanks for reading, and check back next week!

Sunday, November 15, 2020

A clean slate week

I think it's really past time to admit that I can't keep up with the weekly schedule, and start enjoying writing the posts instead of stressing about the backlog of several months. So, here comes:

This week's competitive programming events started with the Kotlin Heroes 5 on Codeforces (problems, results, top 5 on the left, analysis). Gone is the idea to auto-convert from Java, as everyone in top 5 seemingly writes somewhat idiomatic Kotlin directly (however I'm wondering if it looks idiomatic to Roman Elizarov :)). tourist and Benq were the only contestants to solve all problems, but Benq's chance to catch tourist was mostly gone with the incorrect attempt on problem D on the 12-th minute, which he took 9 minutes to correct and therefore fell behind so much he could not really recover. Congratulations to both on the great performance!

TopCoder Open 2020 has opened its virtual doors with Semifinal 1 (problems, results on the left, stream recording). The round was marred by an incorrect reference solution for the 500, seemingly making the problem unsolvable, at least within the time of the round, and resulting in neal_wu's challenge requests not being processed. In the end, the organizers have awarded him 50 challenge points as well (so we got +100 challenge points with just one incorrect submission :)), and advanced five competitors to the finals instead of four. Congratulations to all five! I think this is a decent solution to this situation, but I'm wondering if rerunning the round from scratch with a different set of problems would (arguably, of course) be more fair.

I have been commentating the round on the stream, and I didn't really do it well. First of all, I've had one job — to read the handles of the Russian competitors correctly — and still managed to mispronounce Um_nik's handle :( To add insult to injury, I did not name him in my list of contestants who have been doing very well recently, which was of course an obvious oversight. I'd like to take this opportunity and apologize to Alexey!

There were also serious connection issues which resulted in my voice not making it to the stream in many cases, and in us talking over each other a few times :( In addition, I have assumed that the first solution for the medium that came to our mind would run in time, while it actually did not. Streaming is hard! Please share any improvement suggestions that you have, I hope to do better next time.

AtCoder Grand Contest 049 followed on Saturday (problems, results, top 5 on the left, analysis). Um_nik breezed through the first five problems, and since the last problem was too tough to crack, his first place was not really in doubt. Well done! The race towards the 8 WTF spots is close to its conclusion (assuming there will be 1 or 2 more qualifying AGCs), with the top 4 most likely already booking their spots, mnbvmar being almost there, and maybe maroonrk as well if he keeps being a writer in the remaining rounds :)

In keeping with the meta-story of problem E, I came up with a solution that seems to have exactly zero things in common with the editorial :) Here is the problem statement: you start with a sequence of n zeros (n<=50), and can apply the following four operations any number of times:

Add one to any number. This operation costs 1.
Subtract one from any number. This operation costs 1.
Add one to any contiguous segment of numbers. This operation costs C.
Subtract one from any contiguous segment of numbers. This operation costs C.

Right after the end of the AtCoder round, Errichto hosted the final 8 of the first open Lockout tournament organized by Geothermal (details, stream recording, top 8 bracket on the left). pseudocoder10 has created an excellent bot that runs the matches and automatically picks Codeforces problems of appropriate difficulty that both participants have not solved yet, and the system with 6 problems and 100-200-300-400-500-600 point values provided for a big strategic variety and made the matches very exciting to watch. Well done to everyone involved, and congratulations to Um_nik on the victory!

rng_58 is running another iteration on new problems (future AtCoder Regular Contests), consider signing up if you're 2800+ on AtCoder!

Codeforces Round 683 wrapped up the week (problems, results, top 5 on the left, analysis). Um_nik has solved problem E in a seemingly normal rhythm, and went on to solve everything with almost half an hour to spare and win the round. Most of the others could not get past pretest 3 in problem E (Um_nik also had two attempts that stopped there), with ksun48 advancing as far as pretest 5. Congratulations to Um_nik on the convincing victory!

The diff between his passing solution and "wrong answer on pretest 3" attempt is small, but it does seem to be a substantial fix to the solution logic (see it on the right).

Thanks for reading, and check back (maybe) next week!

Sunday, September 20, 2020

An unexpected verdict week

Codeforces Round 658 was the first event of the Jul 20 - Jul 26 week (problems, results, top 5 on the left, my screencast, analysis). Benq has managed to finish all six problems in time, even though the first five would be enough for the first place anyway. Congratulations on the confident victory!

I have managed to dig myself out of the piecewise quadratic function world with just 8 minutes to go, which was still quite satisfying even though competing with Benq was completely out of question :)

TopCoder SRM 788 started the race for the first TCO21 spot (problems, results, top 5 on the left, my screencast, analysis). My easy-hard-medium strategy has hurt me this time, as after submitting the hard I saw others get 500+ points for the medium and submitted it without enough thinking, leading to a resubmit later. On the other hand, I could submit, stress-test, debug, fix and resubmit it without the pressure of "should I switch to the hard instead?" :) It was hard to fight for the top places with the resubmission. Even though the medium problem was much harder than the other two for lyrically as well, she got it right from the first attempt and earned 5 TCO21 points. Congratulations!

Here is the problem that caused all this mess: you are given an H times W grid (H*W<=500), with some of the boundaries between cells being passable, and some being walls. All outside boundaries are walls. Your goal is to remove some more non-outside walls in such a way that the remaining walls split the entire grid into rectangular areas without internal walls. What is the maximum number of such rectangles one can get?

Codeforces Round 659 then revealed the origin of "Polish Mafia" team name (problems, results, top 5 on the left, analysis). I could not solve problem C (neither could I solve problem F, but I did not spend much time on it), and I was not alone. That makes the performance of tourist, Benq and Radewoosh who solved all six problems even more impressive, well done! Even they have solved problem C as their last or next-to-last problem, suggesting it was also quite tricky for them.

Here is that problem: you are given a string of length up to 10⁵, consisting of the first 20 lowercase English letters. In one step, you can pick any set of characters in this string that are all equal (not necessarily all occurrences of that character), and change all of them to some other character (the same for all replacements in this step). What is the smallest number of steps needed to obtain the other given string of the same length?

I would also like to highlight the excellent investigation by maroonrk on the practical performance of modern flow algorithms :)

Thanks for reading, and check back for more!

Sunday, September 13, 2020

An Eggheads week

The Jul 13 - Jul 19 week was the second TopCoder-only week in a row, with TopCoder Open 2020 Round 2B (problems, results, top 5 on the left, parallel round results, my screencast, analysis). 203 more contestants advanced to Round 3, and DmitryGrigorev was on the first place thanks to going for easy+hard. Congratulations! Nobody was able to solve all three problems, and the score from easy+medium was only good enough for the 5th place.

Check back for more :)

A stable bubble week

TopCoder Open 2020 Round 2A was the main event of Jul 6 - Jul 12 week (problems, results, top 5 on the left, parallel round results, analysis). Four contestants solved all three problems in the main round, which is even more impressive given that nobody managed to do that in the parallel round, despite that fact that the strongest contestants who qualified directly to Round 4 were competing there. Congratulations to all four, and especially to Kriii on the win!

In my previous summary, I have mentioned a Codeforces problem: you are given an array a with at most 1000 elements. Then we write down all pairs of positions that form an inversion: pairs (u,v) such that u<v and a_u>a_v, getting a long list of all those pairs. Now we want to treat this list as a sorting program: for every pair in the list, we will swap the elements on the corresponding positions. Our goal is to make this program actually sort our array. We are allowed to put the elements of the list in arbitrary order (but we must have all pairs that form an inversion in the starting array exactly once).

If we were allowed to swap any pair that forms an inversion in the current state of the array, then the bubble sort algorithm would work, as it only swaps adjacent elements that form an inversion. However, we can only use the inversions from the initial array (and must use them all).

In order to achieve this, we need to find an algorithm that tries to keep most existing inversions unchanged. Let's do the following: first, we find the maximum number. Then, we find the second highest number, and sort them (within their places). Then, we add the third highest number and sort the three numbers within their places, using up all their inversions, and so on. This way, whenever we process a new number, the only thing that happened to the array is that the higher numbers got reordered, so the inversions involving the new number stay unchanged!

The only remaining step is to learn how to put the new number into its correct place using all its inversions. This is equivalent to putting 1 into the correct place given the array 2 3 4 ... k 1 (k+1) ... n. The following sequence of swaps does the job: swap 2 and 1, then swap 3 and 2 (which is in the original position of 1 now), then swap 4 and 3, and so on until swapping k and k-1.

Thanks for reading, and check back for more!

Friday, August 28, 2020

A respects week

Codeforces Global Round 9 was the main event of the Jun 29 - Jul 5 week (problems, results, top 5 on the left, analysis). The problemsetters warned us to read all problems before spending too much time on one of them, and yet my story is quite similar to Radewoosh's: I've spent 1.5 hours on problem F, and then solved problems G and H immediately after reading their problem statements, but lacked 5 minutes to finish the solution to H (it passed in practice). tourist, on the other hand, just didn't get stuck and got the first place with some margin. Well done!

Let me highlight a (relatively) easier problem this time, problem E: you are given an array a with at most 1000 elements. Then we write down all pairs of positions that form an inversion: pairs (u,v) such that u<v and a_u>a_v, getting a long list of all those pairs. Now we want to treat this list as a sorting program: for every pair in the list, we will swap the elements on the corresponding positions. Our goal is to make this program actually sort our array. We are allowed to put the elements of the list in arbitrary order (but we must have all pairs that form an inversion in the starting array exactly once). Can you see a way to achieve this?

Thanks for reading, and check back for more!

Wednesday, August 26, 2020

A tube week

There were no contests that I'd like to mention during the Jun 22 - Jun 28 week, so let's come back to the Codeforces problem from the previous summary: there are n lamps arranged in a circle (n<=1000), and two players are turning them on and off. Initially all lamps are off. The first player starts by turning on any subset of lamps that are off. Let the number of lamps turned on by the first player be k. Then the second player chooses any subset of k consecutive lamps (along the circle), and turns off all lamps in that subset that were on. Then the first player can turn on any subset of lamps that are off again, and so on. The first player can choose to finish the game instead of making their turn. The goal of the first player is to maximize the number of lamps that are on when the game is finished (and they have to finish the game at some point, they can't continue forever), and the second player tries to minimize that number. What is the optimal strategy for the first player?

Let's call a move of the first player followed by a move of the second player one step. Since the second player always turns off at most the same number of lamps that the first player has just turned on, the number of lamps never decreases after a step. We can further classify the steps into two types: the ones that increase the number of lamps that are on, and the ones that keep it unchanged.

Consider a game that was played optimally by both players, and let's focus on the last increasing step of that game. Suppose the first player has turned k lamps on during their move. If there were k consecutive lamps that were on after that, the second player could have turned them all off, and make the step not increasing. Therefore in order to make an increasing step, the first player needs to keep at least one lamp off among each k consecutive lamps. Therefore the maximum number of lamps that are on after his move is n-ceil(n/k), and then the second player will turn k-1 lamps off in the worst case, so the number of lamps that will be on after this step will be n-ceil(n/k)-k+1.

The first player can pick the value of k that maximizes n-ceil(n/k)-k+1 and achieve such score in the following way: choose any set of lamps of size n-ceil(n/k) that does not have k consecutive lamps, and keep turning on any k lamps from this set that are off. This will guarantee that each step will be increasing until we reach the score of n-ceil(n/k)-k+1.

The second player can guarantee that the score never exceeds max(n-ceil(n/k)-k+1) by simply turning off the maximum number of lamps that they can at each turn, which will make sure that whenever a step is increasing and the first player turned on k lamps, the score after this step will not exceed n-ceil(n/k)-k+1. This is not an entirely formal proof, but the remaining details are left to the reader :)

Thanks for reading, and check back for more!

Friday, July 31, 2020

A specific week

Codeforces Global Round 8 took place during the Jun 15 - Jun 21 week (problems, results, top 5 on the left, analysis). I got a great start by solving problem F 14 minutes earlier than the second solution to this problem (from Egor), but unfortunately could not solve G in the last hour even though I was quite close. Both ecnerwala and tourist solved their 8th problem in the last 10 minutes, and grabbed the first two places with less than 100 points separating them. Congratulations to both!

Here is the problem that I solved so fast: there are n lamps arranged in a circle, and two players are turning them on and off. Initially all lamps are off. The first player starts by turning on any subset of lamps that are off. Let the number of lamps turned on by the first player be k. Then the second player chooses any subset of k consecutive lamps (along the circle), and turns off all lamps in that subset that were on. Then the first player can turn on any subset of lamps that are off again, and so on. The first player can choose to finish the game instead of making their turn. The goal of the first player is to maximize the number of lamps that are on when the game is finished (and they have to finish the game at some point, they can't continue forever), and the second player tries to minimize that number. What is the optimal strategy for the first player?

AtCoder Grand Contest 046 followed on Saturday (problems, results, to 5 on the left, analysis). tourist would be ahead of everybody else with 5 problems even if he stopped competing an hour before the end, but he has used that hour to solve problem E as well. Congratulations on the commanding victory! I could not solve any of the three harder problems in the last two hours.

TopCoder SRM 787 wrapped up the week on Sunday night (problems, results, top 5 on the left, analysis). bqi343 has confirmed his qualification for TCO20 finals, but even with 200 challenge points he could not overtake ksun48 for the first place. Congratulations to both!

The hard problem in this round was somewhat standard, and could be solved by carefully implementing a few standard tricks. However, one could significantly cut down the implementation time and the chances for making a bug by considering the specifics of this problem, which is also a great skill. From the solutions I saw, I think pashka's demonstrates this skill the most (the actual logic of the solution on top of copy-pasted components starts from the line "for (auto b : br2) {"). No binary search, no tree dynamic programming necessary :)

Thanks for reading, and check back for more!

Wednesday, July 29, 2020

A rescue week

The Jun 8 - Jun 14 week did not have contests that I want to mention, so let's discuss the question from the previous summary.

I have found that the following code solves an AtCoder problem:

for (int n = 1; n <= N; ++n) {
for (int a = 1; a <= A && a <= n; ++a) {
if (a == n) {
w0[n][a] = fact[n];
w1[n][a] = fact[n];
} else {
w0[n][a] = ((n - a) * (int64) w1[n - 1][a] + (a - 1) * (int64) w1[n - 1][a - 1]) % MODULO;
w1[n][a] = (w0[n - 1][a - 1] + w0[n][a]) % MODULO;
}
}
}

where w0[N][A] would contain the final answer. This solution runs in O(N*A), which is too slow to get the problem accepted since N<=10⁷ and A<=5000. How to make it faster?

To quote TLE, it's Berlekamp-Massey Algorithm to the rescue! The situation here matches the application from that blog post almost exactly. There are two differences, though. The first difference is that our transition has an extra "if (a == n)" branch. However, that branch never triggers when n>A, so we can just treat the n=A row as our initial values and imagine we don't have that branch.

This is not enough: the answer to the last sample still does not match if we compute w0[A,A], w0[A+1,A],..., w0[3*A,A] and apply Berlekamp-Massey to find w0[N][A]. The reason is the following: in our transition function, we multiply by (n-a), while Berlekamp-Massey expects the transition matrix to be independent of n.

However, we can notice a peculiar property of our transition: we only multiply by (n-a) when going from (n-1,a) to (n,a). In all other cases, the value of n-a stays constant, and we multiply by something independent of n (we go from either (n-1,a-1) or from (n,a) to (n,a)). Therefore, in aggregate, we will multiply by (n-a)! to achieve state (n,a) starting from a state (x,x). And this in turn means that if we define w2[n][a]=w0[n][a]/fact[n-a], then the transition function for these values will be independent of n, and we can apply Berlekamp-Massey to the sequence w2[A,A],w2[A+1,A],..., w2[3*A,A] to find w2[N][A]. Note that we don't need to write down the actual transition function for w2, we just need to believe it has the correct form, and we can use the code above to find w2 through w0, and thus solve our problem in O(N²*log(N)).

Thanks for reading, and check back for more!

A pen testing week

Codeforces Round 647 was the first contest of the first week of June (problems, results, top 5 on the left, analysis). This round seemed to have a bit more "Failed System Test" outcomes than a typical Codeforces round (instructive example), and yet tourist, Um_nik and boboniu did not make mistakes and solved all problems correctly (to be more precise, both tourist and Um_nik resubmitted one of the problems, so they have managed to correct the mistakes they made). Congratulations to all and especially to tourist on the win!

Google Code Jam 2020 Round 3 selected the 25 finalists, who unfortunately will not travel to Munich, and will instead compete in the online finals (problems, results, top 5 on the left). Besides the significantly easier first problem that was solved by everyone in top 25, the contestants found a lot of different ways to qualify using the other three problems. Gennady.Korotkevich solved all inputs but one and won, continuing his exceptional GCJ form.

I have helped prepare the third problem, Pen Testing, that was created by Ian Tullis and which I find quite amazing: you have 15 ballpoint pens having 0, 1, 2, ..., 14 units of ink in them. The pens are given to you in random order and you don't know which pen is which. The only way for you to get information about the pens is to write something with them. In one turn, you pick one pen and try to spend one unit from ink from it, and you get one bit of information doing that: if the pen had at least one unit of ink left before this turn, you succeed in writing with it, and it now has one unit less (possibly 0). If the pen was empty, then you fail in writing with it, and you know that it's definitely empty. You goal is to eventually choose two pens that have at least 15 units of ink remaining in total.

Just choosing two pens at random without any writing gives you a 46.(6)% probability of success, and writing seemingly makes things even worse as it reduces the amount of remaining ink, and you only learn the concrete identity of a pen after it has no ink left :) And yet the problem asks you to succeed in 63.6% of all cases. How is it even possible?

I have written an extensive analysis for it that you can read by clicking "Analysis" after following the above link. I don't have much to add to it :)

AtCoder Grand Contest 045 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis). Nobody could solve problem E, and only apiad has solved problem F, but he did not have enough time left afterwards for the easier problems and finished in 12th place. Solving the first four problems fast was the way to victory this time, and that's exactly what ksun48 did. Congratulations!

I have made most of the steps described in the analysis for problem D, but not all of them. So I have obtained the following recurrence:

for (int n = 1; n <= N; ++n) {
for (int a = 1; a <= A && a <= n; ++a) {
if (a == n) {
w0[n][a] = fact[n];
w1[n][a] = fact[n];
} else {
w0[n][a] = ((n - a) * (int64) w1[n - 1][a] + (a - 1) * (int64) w1[n - 1][a - 1]) % MODULO;
w1[n][a] = (w0[n - 1][a - 1] + w0[n][a]) % MODULO;
}
}
}

where w0[N][A] would contain the final answer. This solution runs in O(N*A), which is too slow since N<=10⁷ and A<=5000. Can you guess how I got this problem accepted (you don't need to read the problem statement :P)?

Thanks for reading, and check back for more!

Thursday, July 23, 2020

A local week

Kotlin Heroes Episode 4 was the main event of the May 25 - May 31 week (problems, results, top 5 on the left, analysis). Only tourist, Golobanov399 and eatmore have managed to solve the hardest problem I, but tourist was done 45 minutes before his competitors and earned a very convincing victory. Well done!

In my previous summary, I have mentioned an AtCoder problem: there are n<=200000 cells arranged in a circle. The i-th cell has two numbers associated with it, a_i and b_i (0<=a_i<=10¹², 0<=b_i<=100). We put a token into one of the cells uniformly at random, and our initial score is zero. Then, we repeatedly face the following choice: we either add a_i to our score and end the process (where i is the number of the cell the token is currently in), or we subtract b_i from our score, move the token uniformly at random into one of the two adjacent cells, and continue the process. What is the maximum expected score we can get if we play optimally?

Since the only state of the process is the cell the token is currently in, and the only decision we make at every moment is whether to stop or to continue, a strategy is completely described by a set S of cells in which we stop (by taking a_i). How do we compute the expected score if we choose a particular set S?

The parts of the circle between consecutive elements of S can be treated completely independently, and we have the following subproblem there: given a segment of length k, we start in a random position and keep moving randomly either to the left or to the right until we leave the segment (because of symmetry we will exit to the left or to the right with probability 50%). What is the expected sum of b_i's of the cells we go through, where one b_i is added multiple times if its cell is visited multiple times?

Because of linearity of expectation, this sum is equal to sum of b_i times the expected number of times the i-th cell is visited. We can denote that expected number of times as f(k, i). Since this number has just two integer parameters, we can implement simulation to check if there is a pattern. I did just that in the beginning of the contest, and the pattern was very clear: f(k, i)=(i+1)*(k-i). There is probably an easy way to deduce that or prove it by induction, but for me the experimental proof was good enough :)

Now that we've learned how to compute the expected score given the set S, we can turn our attention to finding the optimal set S. My intuition told me during the round that greedy should work, and indeed it works because of the following lemma: suppose we have a non-empty set S which is locally optimal: neither adding one cell to it, nor removing one cell from it, leads to a set S' with a better expected score. Then this set S is actually globally optimal, and is the answer to the problem.

Here is the proof: let's denote the expected score we get if we start from cell i and use the set S as the stopping cells as g(S, i). Let's prove that for any other set S', g(S', i)<=g(S, i) by the infinite induction on the number of moves that happen before we stop. If we decide to stop in some cell i, then g(S', i)=a_i, and since S is locally optimal g(S, i)>=a_i, so g(S', i)<=g(S, i) in this case. If we decide to continue in some cell i, then get we g(S', i)=-b_i+(g(S', i-1)+g(S', i+1))/2, which by induction hypothesis (sorry for being a bit informal here, to make it formal we'd have to introduce a third parameter to g) is <=-b_i+(g(S, i-1)+g(S, i+1))/2, which is in turn <=g(S, i) because S is locally optimal.

Therefore all that remains is to find a way to apply local optimizations to the set S in such a way that we converge quickly. We can actually use a very similar argument to guide our way: suppose for some set S' we determine that it's better to remove the cell i from it, in other words it's better to continue if we are in cell i. Then the cell i does not belong to the optimal set S, in other words it is better to continue from the cell i in the optimal solution as well! To see why it's true, let's write down the condition that it's better to remove the cell i from S': a_i<-b_i +(g(S'', i-1)+g(S'', i+1))/2 (where S''=S'-{i}). g(S'', i-1)<=g(S, i-1) and g(S'', i+1)<=g(S, i+1), therefore a_i<-b_i+(g(S, i-1)+g(S, i+1))/2, which means that i will not belong to S since S is locally optimal.

So we can just start with S containing all cells, and keep removing cells while it improves the expected score. In order to avoid traversing the list of cells n times and therefore getting O(n²) running time, we can use the standard idea of keeping a set of "pending" cells, which would improve the expected score when removed, and update the status of two adjacent non-removed cells when we process the removal of a cell. Another way to do it is to start with a cell that is definitely in S (the one with the highest a_i), and keep going to the right and adding cells to S (which will be represented by a stack), and then repeatedly checking if removing next-to-last cell in S leads to a better expected score. This second approach is very similar to the convex hull algorithms.

Thanks for reading, and check back for more!

Tuesday, July 21, 2020

An extract method week

AtCoder Grand Contest 044 was the main event of the May 18 - May 24 week (problems, results, top 5 on the left, analysis). I have started by diving into the problem E, implementing what seemed to be a reasonable greedy but repeatedly getting a wrong answer, and my inability to give up on trying reasonable-looking ideas without proof has ruined the whole contest. To add insult to injury, it turned out that the last idea that I wanted to try (also without proof) when the contest ended could actually get the problem accepted :) tourist, on the other hand, came up with a solution to this problem in a normal manner, without trying random ideas, and won the round. Well done!

Here is the problem statement: there are n<=200000 cells arranged in a circle. The i-th cell has two numbers associated with it, a_i and b_i (0<=a_i<=10¹², 0<=b_i<=100). We put a token into one of the cells uniformly at random, and our initial score is zero. Then, we repeatedly face the following choice: we either add a_i to our score and end the process (where i is the number of the cell the token is currently in), or we subtract b_i from our score, move the token uniformly at random into one of the two adjacent cells, and continue the process. What is the maximum expected score we can get if we play optimally?

Let me also use this summary to update on my C++ experiment. For about half a year, I have been using exclusively C++ on all platforms except TopCoder (I kept using Java on TopCoder because JHelper lacks TopCoder integration). From the only objective measure I could come up with, the experiment doesn't seem to go so well: while on TopCoder I've managed to qualify for the TCO and kept the second place in the overall rating, I've dropped out of the top 10 on Codeforces, my campaign to qualify for AtCoder World Tour Finals 2021 is not in the best shape, and our Open Cup results seem to have taken a turn for the worse in the second half of the season. Of course, all these numbers might well be statistically insignificant if one tries to dig in deeper.

Subjectively, I definitely feel that I'm fighting less with the time limits and can focus on implementing the algorithm, however I also feel that my coding speed dropped quite substantially as I have to think more about every line of code I write, and also as C++ support in JetBrains tools is still being improved to the level of Java support (for example, "Extract Method" which I use extensively used to suggest weird parameter names, but it seems to be fixed now). I wonder if it's possible to quantify the coding speed drop based on the screencasts :)

I'm still hopeful that the coding speed will improve, and that CLion will have all necessary features soon (or already has them), so I think I will continue the experiment at least until the end of the year. Please share any observations or suggestions that you have about this, and of course check back for the next week's summary!

Sunday, July 5, 2020

A Bresenham's week

The May 11 - May 17 week started with Codeforces Round 641 (problems, results, top 5 on the left, analysis). It was quite challenging for some contestants, and nobody could solve more than 4 problems out of 7. Um_nik solved his four in just one half of allotted time and won the round, congratulations!

TopCoder SRM 786 followed on Friday (problems, results, top 5 on the left, analysis). bqi343 had a 50-point lead after the coding phase, and increased it to 100 points during the challenge phase. Congratulations on the victory! These 5 qualification points have also made him a huge favorite to qualify to the TCO from the third online stage.

On Saturday Google held Code Jam 2020 Round 2 (problems, results, top 5 on the left). Placing in the top 1000 was the main goal in this round, but still Benq's jump to the first place with just five minutes remaining in the round was quite impressive, well done!

Open Cup 2019-20 Grand Prix of Serbia wrapped up the week on Sunday (results, top 5 on the left). Teams Polish Mafia, USA1 and NNSU Almost Retired solved 12 problems each, and the deciding factor was Polish Mafia's significantly fewer incorrect attempts. Congratulations on the win!

It was not announced before the round, but this Grand Prix was the last one of the Open Cup 2019-20 season (overall standings, top 10 on the left). The standings were very close at the top, but in the end we have a new champion. For only the second time since 2013, the champion team's last name list does not include Korotkevich :) Congratulations to the team USA1, and looking forward to even more competition at the top in the coming season!

In my previous summary, I have mentioned an Open Cup problem: a string of 0s and 1s is called balanced if the number of 1s in any two its circular substrings of the same length differs by at most 1. A circular substring of a string s is any string with the length not exceeding the length of s that can be read after we write s on a circle; in other words, it's either a normal substring of s, or a suffix of s concatenated with a prefix of s with total length not exceeding the length of s. You are given a string consisting of 0s, 1s and ?s. How many ways are there to replace each ? with a 0 or a 1 such that the resulting string is balanced? The length of the given string is at most 1024.

This setting reminded me of Bresenham's line drawing algorithm, so I tried to get balanced strings like this: let's draw a line from point (0,0) to point (x,y), then let's keep walking starting from point (0,0). We walk like this: if we are below the line, then we go up (increase the y-coordinate by 1), and if we are above the line, then we go right (increase the x-coordinate by 1). If we are on the line, then we go up unless the line is horizontal (y=0), in that case we always go right. We will reach the point (x,y) in x+y steps, and let's write down a 0 every time we go right, and a 1 every time we go up.

Intuitively, such strings should be balanced, as we're always walking very close to the line and all substrings of the same size should represent almost the same vector. Circular shifts of a balanced string are also balanced, so we need to also consider the circular shifts of the strings we get that way. It's not at all clear if there are other balanced strings, though. Imagine how happy I was when I found out that considering only such strings gets correct answers on all samples, and then was accepted by the judging system!

The resulting code is very simple, since the strings we get for different pairs (x,y) are all different, so we just need to find the period of each such string to determine the number of its different circular shifts, and do not need to otherwise check the strings we get for repetitions.

This was one of those rare cases where submitting a solution solely based on intuition and without having even a sketch of a proof worked out :)

Thanks for reading, and check back for more!

Sunday, June 14, 2020

A week without paradox

TopCoder SRM 785 took place during the May 4 - May 10 week (problems, results, top 5 on the left, analysis). The hard problem had this feeling of "it's amazing nobody has asked this before", and it turned out that somebody did :) mochalka had seen this problem before and was the only contestant to get it accepted, winning the match — even in such situation it's important to execute well given the chance, so well done! It turned out that the only bug in my solution was that I was iterating up to bit 29 instead of bit 30 in one place, and it passed in the practice room with that one-byte change. It was disappointing to figure out all tricky details correctly to only fail in that manner :)

Open Cup 2019-20 Grand Prix of Bytedance took place on Sunday (results, top 5 on the left). Team japan02 has really aced it this time, finishing 2 problems ahead of the three leaders of the overall standings and winning their second stage this season with a bang. Congratulations!

Our solution to problem B was quite unusual, and was based on intuition a lot more than is typically reasonable. The problem went like this: a string of 0s and 1s is called balanced if the number of 1s in any two its circular substrings of the same length differs by at most 1. A circular substring of a string s is any string with the length not exceeding the length of s that can be read after we write s on a circle; in other words, it's either a normal substring of s, or a suffix of s concatenated with a prefix of s with total length not exceeding the length of s. You are given a string consisting of 0s, 1s and ?s. How many ways are there to replace each ? with a 0 or a 1 such that the resulting string is balanced? The length of the given string is at most 1024.

In my previous summary, I have mentioned another Open Cup problem: we have a string which is initially empty. Then we repeatedly add characters either to the beginning of the string, or to the end of the string, n times (n<=1000000). Then, we need to print n numbers: how many period lengths did the string have after each addition? A string s has a period length k if 1<=k<=length(s) and for each valid i s_i=s_i+k (note that under this definition the last repetition of a period may be incomplete). For example, the string ababa has 3 period lengths: 2, 4 and 5.

The key observation here is that once a certain length k stops being a period, it will never become one again after we add more characters! This is especially clear from the formal definition above, as we just get more constraints of form s_i=s_i+k as the number of possible values of i increases. Therefore for each length k it will be a period from step k (each string has its own length as a period length) to some step s_k.

We can find each particular s_k using binary search in O(log(n)), using hashes to check if k is a period length in O(1), therefore we can find all s_k's in O(n*log(n)), and the numbers we need to print can then be computed from those in O(n).

It's not particularly important for this problem, but note that we're only doing O(n*log(n)) equality comparisons for the hashes, therefore we're not subject to the birthday paradox, and 32-bit hashes are enough.

Thanks for reading, and check back for more!