Algorithms Weekly by Petr Mitrichev: July 2020

Friday, July 31, 2020

A specific week

Codeforces Global Round 8 took place during the Jun 15 - Jun 21 week (problems, results, top 5 on the left, analysis). I got a great start by solving problem F 14 minutes earlier than the second solution to this problem (from Egor), but unfortunately could not solve G in the last hour even though I was quite close. Both ecnerwala and tourist solved their 8th problem in the last 10 minutes, and grabbed the first two places with less than 100 points separating them. Congratulations to both!

Here is the problem that I solved so fast: there are n lamps arranged in a circle, and two players are turning them on and off. Initially all lamps are off. The first player starts by turning on any subset of lamps that are off. Let the number of lamps turned on by the first player be k. Then the second player chooses any subset of k consecutive lamps (along the circle), and turns off all lamps in that subset that were on. Then the first player can turn on any subset of lamps that are off again, and so on. The first player can choose to finish the game instead of making their turn. The goal of the first player is to maximize the number of lamps that are on when the game is finished (and they have to finish the game at some point, they can't continue forever), and the second player tries to minimize that number. What is the optimal strategy for the first player?

AtCoder Grand Contest 046 followed on Saturday (problems, results, to 5 on the left, analysis). tourist would be ahead of everybody else with 5 problems even if he stopped competing an hour before the end, but he has used that hour to solve problem E as well. Congratulations on the commanding victory! I could not solve any of the three harder problems in the last two hours.

TopCoder SRM 787 wrapped up the week on Sunday night (problems, results, top 5 on the left, analysis). bqi343 has confirmed his qualification for TCO20 finals, but even with 200 challenge points he could not overtake ksun48 for the first place. Congratulations to both!

The hard problem in this round was somewhat standard, and could be solved by carefully implementing a few standard tricks. However, one could significantly cut down the implementation time and the chances for making a bug by considering the specifics of this problem, which is also a great skill. From the solutions I saw, I think pashka's demonstrates this skill the most (the actual logic of the solution on top of copy-pasted components starts from the line "for (auto b : br2) {"). No binary search, no tree dynamic programming necessary :)

Thanks for reading, and check back for more!

Wednesday, July 29, 2020

A rescue week

The Jun 8 - Jun 14 week did not have contests that I want to mention, so let's discuss the question from the previous summary.

I have found that the following code solves an AtCoder problem:

for (int n = 1; n <= N; ++n) {
for (int a = 1; a <= A && a <= n; ++a) {
if (a == n) {
w0[n][a] = fact[n];
w1[n][a] = fact[n];
} else {
w0[n][a] = ((n - a) * (int64) w1[n - 1][a] + (a - 1) * (int64) w1[n - 1][a - 1]) % MODULO;
w1[n][a] = (w0[n - 1][a - 1] + w0[n][a]) % MODULO;
}
}
}

where w0[N][A] would contain the final answer. This solution runs in O(N*A), which is too slow to get the problem accepted since N<=10⁷ and A<=5000. How to make it faster?

To quote TLE, it's Berlekamp-Massey Algorithm to the rescue! The situation here matches the application from that blog post almost exactly. There are two differences, though. The first difference is that our transition has an extra "if (a == n)" branch. However, that branch never triggers when n>A, so we can just treat the n=A row as our initial values and imagine we don't have that branch.

This is not enough: the answer to the last sample still does not match if we compute w0[A,A], w0[A+1,A],..., w0[3*A,A] and apply Berlekamp-Massey to find w0[N][A]. The reason is the following: in our transition function, we multiply by (n-a), while Berlekamp-Massey expects the transition matrix to be independent of n.

However, we can notice a peculiar property of our transition: we only multiply by (n-a) when going from (n-1,a) to (n,a). In all other cases, the value of n-a stays constant, and we multiply by something independent of n (we go from either (n-1,a-1) or from (n,a) to (n,a)). Therefore, in aggregate, we will multiply by (n-a)! to achieve state (n,a) starting from a state (x,x). And this in turn means that if we define w2[n][a]=w0[n][a]/fact[n-a], then the transition function for these values will be independent of n, and we can apply Berlekamp-Massey to the sequence w2[A,A],w2[A+1,A],..., w2[3*A,A] to find w2[N][A]. Note that we don't need to write down the actual transition function for w2, we just need to believe it has the correct form, and we can use the code above to find w2 through w0, and thus solve our problem in O(N²*log(N)).

Thanks for reading, and check back for more!

A pen testing week

Codeforces Round 647 was the first contest of the first week of June (problems, results, top 5 on the left, analysis). This round seemed to have a bit more "Failed System Test" outcomes than a typical Codeforces round (instructive example), and yet tourist, Um_nik and boboniu did not make mistakes and solved all problems correctly (to be more precise, both tourist and Um_nik resubmitted one of the problems, so they have managed to correct the mistakes they made). Congratulations to all and especially to tourist on the win!

Google Code Jam 2020 Round 3 selected the 25 finalists, who unfortunately will not travel to Munich, and will instead compete in the online finals (problems, results, top 5 on the left). Besides the significantly easier first problem that was solved by everyone in top 25, the contestants found a lot of different ways to qualify using the other three problems. Gennady.Korotkevich solved all inputs but one and won, continuing his exceptional GCJ form.

I have helped prepare the third problem, Pen Testing, that was created by Ian Tullis and which I find quite amazing: you have 15 ballpoint pens having 0, 1, 2, ..., 14 units of ink in them. The pens are given to you in random order and you don't know which pen is which. The only way for you to get information about the pens is to write something with them. In one turn, you pick one pen and try to spend one unit from ink from it, and you get one bit of information doing that: if the pen had at least one unit of ink left before this turn, you succeed in writing with it, and it now has one unit less (possibly 0). If the pen was empty, then you fail in writing with it, and you know that it's definitely empty. You goal is to eventually choose two pens that have at least 15 units of ink remaining in total.

Just choosing two pens at random without any writing gives you a 46.(6)% probability of success, and writing seemingly makes things even worse as it reduces the amount of remaining ink, and you only learn the concrete identity of a pen after it has no ink left :) And yet the problem asks you to succeed in 63.6% of all cases. How is it even possible?

I have written an extensive analysis for it that you can read by clicking "Analysis" after following the above link. I don't have much to add to it :)

AtCoder Grand Contest 045 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis). Nobody could solve problem E, and only apiad has solved problem F, but he did not have enough time left afterwards for the easier problems and finished in 12th place. Solving the first four problems fast was the way to victory this time, and that's exactly what ksun48 did. Congratulations!

I have made most of the steps described in the analysis for problem D, but not all of them. So I have obtained the following recurrence:

for (int n = 1; n <= N; ++n) {
for (int a = 1; a <= A && a <= n; ++a) {
if (a == n) {
w0[n][a] = fact[n];
w1[n][a] = fact[n];
} else {
w0[n][a] = ((n - a) * (int64) w1[n - 1][a] + (a - 1) * (int64) w1[n - 1][a - 1]) % MODULO;
w1[n][a] = (w0[n - 1][a - 1] + w0[n][a]) % MODULO;
}
}
}

where w0[N][A] would contain the final answer. This solution runs in O(N*A), which is too slow since N<=10⁷ and A<=5000. Can you guess how I got this problem accepted (you don't need to read the problem statement :P)?

Thanks for reading, and check back for more!

Thursday, July 23, 2020

A local week

Kotlin Heroes Episode 4 was the main event of the May 25 - May 31 week (problems, results, top 5 on the left, analysis). Only tourist, Golobanov399 and eatmore have managed to solve the hardest problem I, but tourist was done 45 minutes before his competitors and earned a very convincing victory. Well done!

In my previous summary, I have mentioned an AtCoder problem: there are n<=200000 cells arranged in a circle. The i-th cell has two numbers associated with it, a_i and b_i (0<=a_i<=10¹², 0<=b_i<=100). We put a token into one of the cells uniformly at random, and our initial score is zero. Then, we repeatedly face the following choice: we either add a_i to our score and end the process (where i is the number of the cell the token is currently in), or we subtract b_i from our score, move the token uniformly at random into one of the two adjacent cells, and continue the process. What is the maximum expected score we can get if we play optimally?

Since the only state of the process is the cell the token is currently in, and the only decision we make at every moment is whether to stop or to continue, a strategy is completely described by a set S of cells in which we stop (by taking a_i). How do we compute the expected score if we choose a particular set S?

The parts of the circle between consecutive elements of S can be treated completely independently, and we have the following subproblem there: given a segment of length k, we start in a random position and keep moving randomly either to the left or to the right until we leave the segment (because of symmetry we will exit to the left or to the right with probability 50%). What is the expected sum of b_i's of the cells we go through, where one b_i is added multiple times if its cell is visited multiple times?

Because of linearity of expectation, this sum is equal to sum of b_i times the expected number of times the i-th cell is visited. We can denote that expected number of times as f(k, i). Since this number has just two integer parameters, we can implement simulation to check if there is a pattern. I did just that in the beginning of the contest, and the pattern was very clear: f(k, i)=(i+1)*(k-i). There is probably an easy way to deduce that or prove it by induction, but for me the experimental proof was good enough :)

Now that we've learned how to compute the expected score given the set S, we can turn our attention to finding the optimal set S. My intuition told me during the round that greedy should work, and indeed it works because of the following lemma: suppose we have a non-empty set S which is locally optimal: neither adding one cell to it, nor removing one cell from it, leads to a set S' with a better expected score. Then this set S is actually globally optimal, and is the answer to the problem.

Here is the proof: let's denote the expected score we get if we start from cell i and use the set S as the stopping cells as g(S, i). Let's prove that for any other set S', g(S', i)<=g(S, i) by the infinite induction on the number of moves that happen before we stop. If we decide to stop in some cell i, then g(S', i)=a_i, and since S is locally optimal g(S, i)>=a_i, so g(S', i)<=g(S, i) in this case. If we decide to continue in some cell i, then get we g(S', i)=-b_i+(g(S', i-1)+g(S', i+1))/2, which by induction hypothesis (sorry for being a bit informal here, to make it formal we'd have to introduce a third parameter to g) is <=-b_i+(g(S, i-1)+g(S, i+1))/2, which is in turn <=g(S, i) because S is locally optimal.

Therefore all that remains is to find a way to apply local optimizations to the set S in such a way that we converge quickly. We can actually use a very similar argument to guide our way: suppose for some set S' we determine that it's better to remove the cell i from it, in other words it's better to continue if we are in cell i. Then the cell i does not belong to the optimal set S, in other words it is better to continue from the cell i in the optimal solution as well! To see why it's true, let's write down the condition that it's better to remove the cell i from S': a_i<-b_i +(g(S'', i-1)+g(S'', i+1))/2 (where S''=S'-{i}). g(S'', i-1)<=g(S, i-1) and g(S'', i+1)<=g(S, i+1), therefore a_i<-b_i+(g(S, i-1)+g(S, i+1))/2, which means that i will not belong to S since S is locally optimal.

So we can just start with S containing all cells, and keep removing cells while it improves the expected score. In order to avoid traversing the list of cells n times and therefore getting O(n²) running time, we can use the standard idea of keeping a set of "pending" cells, which would improve the expected score when removed, and update the status of two adjacent non-removed cells when we process the removal of a cell. Another way to do it is to start with a cell that is definitely in S (the one with the highest a_i), and keep going to the right and adding cells to S (which will be represented by a stack), and then repeatedly checking if removing next-to-last cell in S leads to a better expected score. This second approach is very similar to the convex hull algorithms.

Thanks for reading, and check back for more!

Tuesday, July 21, 2020

An extract method week

AtCoder Grand Contest 044 was the main event of the May 18 - May 24 week (problems, results, top 5 on the left, analysis). I have started by diving into the problem E, implementing what seemed to be a reasonable greedy but repeatedly getting a wrong answer, and my inability to give up on trying reasonable-looking ideas without proof has ruined the whole contest. To add insult to injury, it turned out that the last idea that I wanted to try (also without proof) when the contest ended could actually get the problem accepted :) tourist, on the other hand, came up with a solution to this problem in a normal manner, without trying random ideas, and won the round. Well done!

Here is the problem statement: there are n<=200000 cells arranged in a circle. The i-th cell has two numbers associated with it, a_i and b_i (0<=a_i<=10¹², 0<=b_i<=100). We put a token into one of the cells uniformly at random, and our initial score is zero. Then, we repeatedly face the following choice: we either add a_i to our score and end the process (where i is the number of the cell the token is currently in), or we subtract b_i from our score, move the token uniformly at random into one of the two adjacent cells, and continue the process. What is the maximum expected score we can get if we play optimally?

Let me also use this summary to update on my C++ experiment. For about half a year, I have been using exclusively C++ on all platforms except TopCoder (I kept using Java on TopCoder because JHelper lacks TopCoder integration). From the only objective measure I could come up with, the experiment doesn't seem to go so well: while on TopCoder I've managed to qualify for the TCO and kept the second place in the overall rating, I've dropped out of the top 10 on Codeforces, my campaign to qualify for AtCoder World Tour Finals 2021 is not in the best shape, and our Open Cup results seem to have taken a turn for the worse in the second half of the season. Of course, all these numbers might well be statistically insignificant if one tries to dig in deeper.

Subjectively, I definitely feel that I'm fighting less with the time limits and can focus on implementing the algorithm, however I also feel that my coding speed dropped quite substantially as I have to think more about every line of code I write, and also as C++ support in JetBrains tools is still being improved to the level of Java support (for example, "Extract Method" which I use extensively used to suggest weird parameter names, but it seems to be fixed now). I wonder if it's possible to quantify the coding speed drop based on the screencasts :)

I'm still hopeful that the coding speed will improve, and that CLion will have all necessary features soon (or already has them), so I think I will continue the experiment at least until the end of the year. Please share any observations or suggestions that you have about this, and of course check back for the next week's summary!

Sunday, July 5, 2020

A Bresenham's week

The May 11 - May 17 week started with Codeforces Round 641 (problems, results, top 5 on the left, analysis). It was quite challenging for some contestants, and nobody could solve more than 4 problems out of 7. Um_nik solved his four in just one half of allotted time and won the round, congratulations!

TopCoder SRM 786 followed on Friday (problems, results, top 5 on the left, analysis). bqi343 had a 50-point lead after the coding phase, and increased it to 100 points during the challenge phase. Congratulations on the victory! These 5 qualification points have also made him a huge favorite to qualify to the TCO from the third online stage.

On Saturday Google held Code Jam 2020 Round 2 (problems, results, top 5 on the left). Placing in the top 1000 was the main goal in this round, but still Benq's jump to the first place with just five minutes remaining in the round was quite impressive, well done!

Open Cup 2019-20 Grand Prix of Serbia wrapped up the week on Sunday (results, top 5 on the left). Teams Polish Mafia, USA1 and NNSU Almost Retired solved 12 problems each, and the deciding factor was Polish Mafia's significantly fewer incorrect attempts. Congratulations on the win!

It was not announced before the round, but this Grand Prix was the last one of the Open Cup 2019-20 season (overall standings, top 10 on the left). The standings were very close at the top, but in the end we have a new champion. For only the second time since 2013, the champion team's last name list does not include Korotkevich :) Congratulations to the team USA1, and looking forward to even more competition at the top in the coming season!

In my previous summary, I have mentioned an Open Cup problem: a string of 0s and 1s is called balanced if the number of 1s in any two its circular substrings of the same length differs by at most 1. A circular substring of a string s is any string with the length not exceeding the length of s that can be read after we write s on a circle; in other words, it's either a normal substring of s, or a suffix of s concatenated with a prefix of s with total length not exceeding the length of s. You are given a string consisting of 0s, 1s and ?s. How many ways are there to replace each ? with a 0 or a 1 such that the resulting string is balanced? The length of the given string is at most 1024.

This setting reminded me of Bresenham's line drawing algorithm, so I tried to get balanced strings like this: let's draw a line from point (0,0) to point (x,y), then let's keep walking starting from point (0,0). We walk like this: if we are below the line, then we go up (increase the y-coordinate by 1), and if we are above the line, then we go right (increase the x-coordinate by 1). If we are on the line, then we go up unless the line is horizontal (y=0), in that case we always go right. We will reach the point (x,y) in x+y steps, and let's write down a 0 every time we go right, and a 1 every time we go up.

Intuitively, such strings should be balanced, as we're always walking very close to the line and all substrings of the same size should represent almost the same vector. Circular shifts of a balanced string are also balanced, so we need to also consider the circular shifts of the strings we get that way. It's not at all clear if there are other balanced strings, though. Imagine how happy I was when I found out that considering only such strings gets correct answers on all samples, and then was accepted by the judging system!

The resulting code is very simple, since the strings we get for different pairs (x,y) are all different, so we just need to find the period of each such string to determine the number of its different circular shifts, and do not need to otherwise check the strings we get for repetitions.

This was one of those rare cases where submitting a solution solely based on intuition and without having even a sketch of a proof worked out :)

Thanks for reading, and check back for more!