Sunday, February 20, 2022

A Berlekamp-Massey week

TopCoder SRM 824 was the first event of this week (problems, results, top 5 on the left, my screencast, TCO race standings). I had some kind of blackout and could not solve the medium, which in retrospect was quite standard. However, the hard was also quite standard and straightforward for me, while many others failed to get all details correctly, so the end result was not so bad. Still, I could not really compete with Um_nik and mhq who solved all three problems. Well done!

Open Cup has returned from its winter break with the Grand Prix of Kyoto on Sunday (results, top 5 on the left). Team USA1 continued their domination of this Open Cup season, finishing all problems a good hour before any other team, and with just one incorrect attempt to boot. Very impressive!

Our team struggled quite a bit with the two mathy problems H and M, until Google and OEIS came to the rescue. In problem M, I found the answers to a simplified problem in OEIS, noticed a linear recurrence as one of the methods to compute it, and then hypothesized that the answers to the full problem can also be determined using a linear recurrence :) And in problem H, I had to do a quick dive into some mathematical concepts that were new to me.
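Guessing a linear recurrence from the first few terms, as in problem M, is exactly the job of the Berlekamp-Massey algorithm that gives this week its title. Here is a minimal sketch over a prime field (the function name, the modulus and the Fibonacci check below are illustrative, not anything from the round):

```python
def berlekamp_massey(s, mod):
    # Shortest recurrence c with s[i] = sum(c[j] * s[i-1-j]) (mod `mod`)
    ls, cur = [], []   # previous best recurrence and the current one
    lf, ld = 0, 0      # index and discrepancy where `ls` last failed
    for i in range(len(s)):
        t = sum(cur[j] * s[i - 1 - j] for j in range(len(cur))) % mod
        d = (s[i] - t) % mod
        if d == 0:
            continue             # the current recurrence still fits
        if not cur:
            cur = [0] * (i + 1)  # first nonzero term: any recurrence of
            lf, ld = i, d        # length i+1 trivially fits the zeros
            continue
        # fix the discrepancy by mixing in the older recurrence `ls`
        k = d * pow(ld, mod - 2, mod) % mod
        c = [0] * (i - lf - 1) + [k] + [(-k * x) % mod for x in ls]
        if len(c) < len(cur):
            c += [0] * (len(cur) - len(c))
        for j in range(len(cur)):
            c[j] = (c[j] + cur[j]) % mod
        if i - lf + len(ls) >= len(cur):
            ls, lf, ld = cur, i, d
        cur = c
    return [x % mod for x in cur]
```

For the Fibonacci numbers it recovers the recurrence s[i] = s[i-1] + s[i-2], i.e. the coefficient list [1, 1].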

I found problem J the most beautiful in this round: you are given a string of length n (n<=200000) with characters +, - and ?, and two numbers a and b. You can replace each ? character with either + or -. Then you need to find a balanced substring of the string, which means a substring of length a+b that has exactly a + characters and b - characters, in any order, and remove it from the string (the parts before and after the removed substring become adjacent to each other). Then you need to find a balanced substring in the remaining string, remove it, and so on. Your goal is to replace the ? characters in such a way that the number of times you can remove a balanced substring is as large as possible (this number can never exceed n/(a+b), since we would run out of characters). How to find this maximum number of times?

In my previous summary, I have mentioned a Codeforces problem: you are given a single integer n. You start with a sequence of numbers 1, 2, 3, ..., n. In one step, you can take any two numbers x and y, and replace them with x+y and |x-y|. Your goal is to make all numbers equal, and you also need to make those equal final numbers as small as possible, and you need to do all this in at most 20n steps.

I was not sure how to even approach this problem, so I started by implementing a brute force solution that found the answers for all values of n up to 15. The brute force shows that for n=3,4 we can make all numbers equal to 4, for n=5,6,7,8 we can make all numbers equal to 8, and for n=9,10,11,12,13,14,15 we can make all numbers equal to 16. This naturally suggests that we need to target the smallest power of two that is at least n. I did not prove this fact during the contest, but there exists a very simple proof.
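Such a brute force can be sketched as follows. The cap on intermediate values is a pruning heuristic of mine to keep the search finite; in principle it could miss solutions that need larger intermediate values, but it reproduces the answers above for small n:

```python
from collections import deque
from itertools import combinations

def min_equal_value(n, cap):
    # BFS over sorted tuples reachable from (1, ..., n) via the move
    # (x, y) -> (x + y, |x - y|), pruning states with a value above cap
    start = tuple(range(1, n + 1))
    seen = {start}
    queue = deque([start])
    best = None
    while queue:
        state = queue.popleft()
        if len(set(state)) == 1 and (best is None or state[0] < best):
            best = state[0]  # reached an all-equal state
        for i, j in combinations(range(n), 2):
            x, y = state[i], state[j]
            if x + y > cap:
                continue
            rest = [state[t] for t in range(n) if t != i and t != j]
            nxt = tuple(sorted(rest + [x + y, abs(x - y)]))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best
```

With cap set to twice the conjectured target, this confirms the minimums 4, 4 and 8 for n=3, 4 and 5.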

Apart from this observation, looking at the way the brute force achieved the goal helped me learn a couple of tricks that are useful for the solution. First of all, when we have a 0, then we can double any number: (0,x)->(x,x)->(0,2x). Second, the solution often makes many smaller powers of two, and then makes them all equal using this trick.

So now we need to convert all numbers into powers of two, not necessarily equal ones. Suppose 2^k is the smallest power of two bigger than n (we assume n is not a power of two, as for that case we can just use the solution for n-1 without touching the number n at all). We can apply the operation to n and 2^k-n, to n-1 and 2^k-(n-1), and so on, down to 2^(k-1)+1 and 2^(k-1)-1. This way, we will get many copies of 2^k as sums, and from the differences we will get the numbers 2n-2^k, 2n-2^k-2, ..., 6, 4, 2. And we also have the remaining numbers which we did not touch: those are 2^(k-1) and 1, 2, ..., 2^k-n-1. 2^k and 2^(k-1) are already powers of two, so we don't need to do anything with them. And for the two chains 1, 2, ..., 2^k-n-1 and 2, 4, 6, ..., 2n-2^k-2, 2n-2^k we can just apply the same algorithm recursively: directly for the first chain, and doing a recursive call for 1, 2, 3, ..., n-2^(k-1) and multiplying all results by two for the second chain.

This way we will convert all numbers into powers of two not exceeding 2^k. It turns out that for n>=3 at least two of those powers will be equal and less than 2^k, which allows us to use the (x,x)->(0,2x) step to get a zero, and then use this zero to keep multiplying all numbers by two until we have only one zero and many copies of 2^k, and finally use the (0,x)->(x,x) step to get rid of the zero.

Thanks for reading, and check back next week!

Sunday, February 13, 2022

A global week

Codeforces Global Round 19 was the only round of this week (problems, results, top 5 on the left, analysis). tourist and Um_nik were pretty close, but Um_nik's two wrong attempts on E have set him back both in terms of points and in terms of contest time, so tourist came out on top. Congratulations!

Also well done to Maksim1744 for being the only other contestant to solve all problems. I solved problem G at roughly the same time as he did, but more or less gave up on solving H in the remaining time, while he persevered and was rewarded with third place.

I think problem F, while a bit standard, was quite educational, so I encourage you to check it out as a good practice opportunity. However, since the last problem I described in this blog was also of this kind, let me instead focus on the very original problem G: you are given a single integer n. You start with a sequence of numbers 1, 2, 3, ..., n. In one step, you can take any two numbers x and y, and replace them with x+y and |x-y|. Your goal is to make all numbers equal, and you also need to make those equal final numbers as small as possible, and you need to do all this in at most 20n steps. Can you see a way? At least for me, it was quite helpful to experiment on a computer instead of just solving on paper.

Thanks for reading, and check back next week!

Saturday, February 5, 2022

A Gomory-Hu week

TopCoder SRM 823 was the only round of this week (problems, results, top 5 on the left, analysis, TCO race standings). neal_wu and ksun48 were leading after the coding phase, but it all went downhill for them during the challenges. First, ksun48, who was less than 50 points away from first place, made a quick challenge and earned -25, making neal_wu clear first with a gap of more than 50 points. A couple of minutes later, Kevin earned another -25. Then Neal decided to take matters into his own hands and got a -25 of his own, which meant that while Kevin was still more than one successful challenge away, SSRS_ now needed just one +50 to take the first place and the 5 TCO race points, which he duly did toward the end of the challenge phase. Congratulations on the victory! Neal and Kevin made two more unsuccessful challenges each, ending the round with quite bizarre -75 and -100 challenge points respectively.

In my previous summary, I have described a solution to a Codeforces problem. I'm not going to paste it here since it's very long, so please read the previous summary for context :) As the first step of my solution, I construct another tree in which the largest-edge-weight queries become LCA queries, and then use an inorder traversal of the new tree to answer the LCA queries via RMQ queries.

tourist has pointed out this awesome comment to me, which shows that I did not really have a full understanding of what's going on when writing my solution. Indeed, let us look at the construction process of the new tree more closely and forget about future LCA queries for a second, just trying to keep the property that in the new tree the largest edge weight between any two vertices is the same as in the old tree. Then we can see that whenever we attach two subtrees to a newly created node, we don't really have to use the roots of the subtrees for that, and can use any node instead (because all edges within the subtrees have smaller weight than the currently processed edge, it does not matter which within-subtree edges will end up being used for the paths between the subtrees). And this allows us to keep all subtrees as chains, and connect them to form bigger chains, ultimately creating one big chain.

So it turns out that for any tree, we can construct a chain on the same vertices such that the maximum edge weight on the path between any two vertices is the same in the tree and in the chain, and of course the latter can be found using range-maximum queries. This chain is more or less equivalent to the inorder traversal of the auxiliary tree that was built in my solution, but I think constructing it explicitly is more beautiful and demonstrates a better understanding of the mechanics of the problem.
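A sketch of that chain construction (the helper names are mine): process the edges in increasing order of weight, and maintain, for each DSU component, its current chain and the list of weights between consecutive chain vertices. Since the joining weight is at least every weight already inside either chain, the two chains can be concatenated at arbitrary endpoints:

```python
def build_chain(n, edges):
    # edges: list of (w, u, v) for a tree on vertices 0..n-1.
    # Returns (order, gaps): a permutation of the vertices and the list
    # of weights between consecutive vertices, such that the maximum
    # edge weight on the tree path between u and v equals the maximum
    # gap between their positions in `order`.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chain = {i: [i] for i in range(n)}
    gaps = {i: [] for i in range(n)}
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        # w is >= every weight already inside either chain, so we can
        # concatenate at arbitrary endpoints; here tail of ru to head of rv
        chain[ru] += chain[rv]
        gaps[ru] += [w] + gaps[rv]
        parent[rv] = ru
        del chain[rv], gaps[rv]
    root = find(0)
    return chain[root], gaps[root]
```

The range-maximum property can then be checked directly: for any two vertices, the maximum of the gaps between their positions in the returned order equals the maximum edge weight on their tree path.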

In an attempt to demonstrate an even better understanding of the mechanics, it seems to me that this construction is somewhat related to the Gomory-Hu construction. In fact, this construction seems to suggest that the Gomory-Hu tree can always be replaced by an equivalent Gomory-Hu chain! However, there seems to be no mention of a "Gomory-Hu chain" on the Internet, so I'm probably missing something here?

Update: Maxim Babenko has clarified the matter here: in the Gomory-Hu tree, for any pair of vertices not only is the size of the minimum cut between them equal to the size of the minimum cut in the original graph (as Wikipedia claims), but the minimum cut itself (as a partition of the vertex set into two) is also the same. In a Gomory-Hu chain the size of the cut is indeed still the same, but not the cut itself.

Thanks for reading, and check back next week!

Friday, February 4, 2022

An AlphaWeek

Codeforces Round 768 on Thursday was the first event of last week (problems, results, top 5 on the left, analysis). Um_nik solved the hardest problem F so much faster than everybody else that his gap at the top of the scoreboard was enormous, almost as if he had solved a seventh problem :) Very impressive!

CodeChef January Lunchtime 2022 followed on Saturday (problems, results, top 5 on the left, my screencast, analysis). I haven't solved IOI-format contests for quite some time, but I have dived right in, taking advantage of the fact that there is no penalty for incorrect submissions and wasting way too much time waiting for submission results :) For this reason, or maybe because I just had too many small bugs to find and fix, gennady.korotkevich and maroonrk were out of reach, getting to 500 points much faster. Well done!

In my previous summary, I have mentioned a Codeforces problem: you are given a tree with vertices numbered from 1 to n, initially all colored white. The tree edges have weights. You need to process a sequence of queries of three types:

  1. Given l and r, color all vertices with numbers between l and r black.
  2. Given l and r, color all vertices with numbers between l and r white.
  3. Given x, find the largest edge weight in the union of the shortest paths from x to all black vertices.

Both the number of vertices and the number of queries are up to 300000.

The first step in solving this problem is relatively standard: let's learn to effectively find the largest edge weight on a path in a tree. We will build an auxiliary new tree where the inner nodes correspond to the edges of our tree, and the leaves correspond to the vertices of our tree. We start by having just n independent leaves. Then we iterate over all tree edges in increasing order of weight, and for each edge we find the new trees containing its ends, then make a new tree vertex corresponding to this edge and make the two trees we found its children. This can be accomplished using a disjoint set union data structure.

You can see an example of an initial tree and a new tree corresponding to it on the right. Now, it turns out that the edge with the largest weight between any pair of vertices in the initial tree corresponds to the least common ancestor of those vertices in the new tree.
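This construction can be sketched as follows (the function and variable names are mine): node_of tracks, for each DSU component, the root of its current piece of the new tree, and lca_weight is a naive LCA used only to check the claim:

```python
def build_kruskal_tree(n, edges):
    # Leaves 0..n-1 are the original vertices; every tree edge, processed
    # in increasing order of weight, becomes internal node n, n+1, ...
    total = 2 * n - 1
    parent = list(range(n))       # DSU over the original vertices
    up = [-1] * total             # parent pointers in the new tree
    weight = [0] * total          # edge weight stored at each internal node
    node_of = list(range(n))      # new-tree root of each DSU component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    nxt = n
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        up[node_of[ru]] = up[node_of[rv]] = nxt
        weight[nxt] = w
        parent[ru] = rv
        node_of[find(rv)] = nxt
        nxt += 1
    return up, weight

def lca_weight(up, weight, u, v):
    # weight at the LCA of leaves u and v (naive walk, for checking only)
    anc = set()
    while u != -1:
        anc.add(u)
        u = up[u]
    while v not in anc:
        v = up[v]
    return weight[v]
```

On a small tree one can verify that the weight stored at the LCA of two leaves equals the largest edge weight on the path between them.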

So in the new tree, the query of type 3 can be seen as finding the shallowest least common ancestor between the given vertex x and the black vertices, and it's not hard to notice that if we write down the vertices of the new tree in dfs in-order (in a similar fashion to how LCA is reduced to RMQ), then we just need to find the first and last vertices from the set {x, black vertices} in this order, and take their LCA.

Now we are ready to turn our attention to the queries of type 1 and 2. Consider the array where the i-th element contains the in-order traversal times of the i-th vertex of the original tree in the new tree. Then in order to answer queries of type 3, we need to be able to find the minimum and maximum value in this array over all black vertices. So, we need a data structure that supports the following operations:

  1. Given l and r, color all vertices with numbers between l and r black.
  2. Given l and r, color all vertices with numbers between l and r white.
  3. Find the minimum and maximum value of ai among all i that are colored black.
Given the range update and range query nature of this subproblem, it is reasonable to use a segment tree with lazy propagation to solve it. Indeed, the AtCoder library provides a general implementation of such a data structure that allows one to plug an arbitrary operation into it. You can check out the documentation for the details of how to customize it, but in short we need to come up with a monoid and a set of mappings (in effect, another monoid) acting on the elements of the first monoid in such a way that the distributive law applies; the range update applies a mapping to all elements in a range, and the range query multiplies all elements of a range in the first monoid.

This all sounds abstract, fine and dandy, but how to put our real problem into these algebraic terms? First of all, let's come up with the second monoid, the one that corresponds to the range updates (queries of type 1 and 2). This monoid will have just three elements: neutral, white and black, and the multiplication in this monoid will simply be "if the left argument is not neutral, then take it, otherwise take the right argument", which makes sense since the last color applied to a vertex overrides all previous ones.

Now, what should the first monoid be, the one that corresponds to the range queries? Since we need to find the minimum and maximum value of all elements that are colored black, it's logical to first try to have an element of the monoid store those minimum and maximum, with the natural operation that takes the minimum of the minimums and the maximum of the maximums.

However, it turns out that we can't make the whole system work: how would a "black" range update act on such a pair? It turns out we're missing information: the new maximum of all black elements should be equal to the maximum of all elements, because all of them are colored black after the range update, and we don't store and propagate this information yet.

So here's our second attempt at the range query monoid: its element stores four numbers: the minimum and maximum values of black elements in the corresponding segment, and the minimum and maximum values of all elements in the corresponding segment. The operation is also defined naturally, and we now have enough to define how "black" and "white" range updates act on it.

To summarize, both monoids that we use in our lazy segment tree are not very common:
  1. One where elements are quadruples of numbers, with the operation taking the minimum of two of the numbers and the maximum of the other two.
  2. The other with just three elements and a weird multiplication table.
I realize that this explanation is very abstract and can be hard to follow. Here's an excerpt from my submission that defines the lazy segtree, maybe it will help:

  struct State {
    int minOn = INF;
    int maxOn = -INF;
    int minAll = INF;
    int maxAll = -INF;
  };

  State unitState() {
    return {};
  }

  State combine(State x, State y) {
    State r;
    r.minOn = min(x.minOn, y.minOn);
    r.maxOn = max(x.maxOn, y.maxOn);
    r.minAll = min(x.minAll, y.minAll);
    r.maxAll = max(x.maxAll, y.maxAll);
    return r;
  }

  // the mapping: 0 = neutral, 1 = paint black, -1 = paint white
  State apply(int x, State s) {
    if (x == 0) return s;
    if (x == 1) {
      s.minOn = s.minAll;
      s.maxOn = s.maxAll;
      return s;
    }
    assert(x == -1);
    s.minOn = INF;
    s.maxOn = -INF;
    return s;
  }

  int combine2(int x, int y) {
    if (x != 0) return x; else return y;
  }

  int identity() {
    return 0;
  }

...

atcoder::lazy_segtree<State, combine, unitState, int, apply, combine2, identity> lazytree;

I was pretty happy with myself when I figured this out, even though just implementing a lazy segtree without the explicit mathematical abstractions might have been faster :)

Finally, it turns out that last week the submissions (1, 2, 3) generated by AlphaCode were submitted to Codeforces. I got to take a look at them a couple of days before the publication, but I was not involved in any other way, so I was just as impressed as many of you. Of course, I think this is just the first step and we're still very far from actually solving problems with ML models.

I found this submission to be the most helpful to understand the current state of affairs: the part before "string s;" is quite meaningful, and the model had to have a decent understanding of the problem statement to generate that loop. However, I cannot see any logical justification of the loop after "string s;", which seems to be a pretty arbitrary loop :) The reason the whole solution still works is that after the first part, in case the length of the result string is n-1 we need to append any of the two characters a, b to it, and an arbitrary loop can generate an arbitrary character.

This solution seems to suggest that the model can translate instructions from the problem statement into code implementing them, and also sometimes make small logical steps by virtue of statement-guided random exploration. While this seems enough to solve easy Div 2/3 problems, I think the model needs to improve a lot to generate a solution of the type that I'd appreciate enough to mention in this blog :)

Nevertheless, this has still advanced the state of the art by leaps and bounds, so congratulations to the AlphaCode team!

Thanks for reading, and check back for this week's summary.

Saturday, January 29, 2022

An incremental week

TopCoder SRM 822 was the first event of last week (problems, results, top 5 on the left, analysis). The hard problem did not require too much algorithmic insight, but the implementation and tiny details were extremely tricky to get right. As a result, only bqi343 managed to make it work, and he won the SRM ahead of four challenge phase masters. Well done!

I was almost there with the hard problem: my solution had the same phases as the one from the official analysis, but I did not even manage to make it work on the samples in time.

Codeforces Round 767 followed on Saturday (problems, results, top 5 on the left, analysis). I did not like the long statement of A and I did not immediately see the solution to B, so I've started solving the (relatively) easier problems in a weird order: D1, D2, C, A. When I solved these four, I saw that tourist already had all of them + problem B solved, so it became clear that I needed to go for the harder problems if I wanted to have a shot at the first place. We solved E roughly at the same time which did not change the situation, so I went all in on F, but unfortunately did not manage to finish in time.

At the end of the contest, my solution for F had just one bug: when comparing fractions a/b and c/d, I compared a*d-b*c with zero without taking the signs of b and d into account :( To be fair to tourist, his solution to F also needed just a few more minutes. Congratulations on the win! As he pointed out after the round, I would have probably won the round if he did not participate, since then I would have solved B instead of F.
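For the record, a sign-safe version of that comparison might look as follows (a sketch in Python for brevity; the point is to normalize both denominators to be positive before cross-multiplying):

```python
def frac_less(a, b, c, d):
    # Does a/b < c/d hold? Assumes b != 0 and d != 0.
    # Flip signs so both denominators are positive; only then is the
    # cross-multiplied comparison a*d < c*b valid.
    if b < 0:
        a, b = -a, -b
    if d < 0:
        c, d = -c, -d
    return a * d < c * b
```

For example, 1/(-2) < 1/3, but comparing the raw cross product a*d - b*c = 1*3 - (-2)*1 = 5 with zero would get this case wrong.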

I want to mention problem E, not because it was particularly exciting, but because I want to share an interesting trick from my solution: you are given a tree with vertices numbered from 1 to n, initially all colored white. The tree edges have weights. You need to process a sequence of queries of three types:
  1. Given l and r, color all vertices with numbers between l and r black.
  2. Given l and r, color all vertices with numbers between l and r white.
  3. Given x, find the largest edge weight in the union of the shortest paths from x to all black vertices.
Both the number of vertices and the number of queries are up to 300000.

In the past few months, I've become a lot more proficient using AtCoder's general lazy segtree algorithm, and prefer using it to rolling a custom lazy segtree implementation for every problem. However, in problems like this one it can be quite challenging to express the lazy propagation as a monoid + a set of mappings. All the more satisfaction when you actually manage to do it :) Can you see the trick? I'll describe my solution in the next summary.

CodeChef January Cook-Off 2022 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis). tourist has solved all problems in less than an hour (out of 2.5 hours available) and won in a very convincing fashion. Great job!

In my previous summary, I have mentioned a CodeChef problem: you are given a permutation of length 200000, and need to tell if it's possible to split it into two disjoint subsequences in such a way that both parts have the same number of incremental maximums (numbers that are bigger than all previous numbers in the part).

The first observation is somewhat natural: let's look at the incremental maximums of the original permutation. If their number k is even, then it's easy to find the required split: let's find the (k/2+1)-th incremental maximum, and let the first subsequence consist of all numbers before it, and the second subsequence of this number and all following numbers. Then each of the subsequences will have exactly k/2 incremental maximums, and those will be exactly the incremental maximums of the original permutation.

We can also notice the following "reverse" observation: each of the incremental maximums of the original permutation will be an incremental maximum of one of the two subsequences. But of course the subsequence may have additional incremental maximums. Let's call the incremental maximums of the original permutation the big ones, and all others the small ones.

Notice that we can always get rid of a small incremental maximum without affecting any other big or small incremental maximums: just move it to the other subsequence (where it will be hidden by the big incremental maximum that was hiding it in the original permutation), together with any additional small incremental maximums that appear when we remove this one.

Therefore if there is a solution to this problem where both subsequences have small incremental maximums, we can keep removing one from each until one of the subsequences has only big ones. So without loss of generality we can only look for solutions where the second subsequence has only big incremental maximums.

In a similar spirit to the above argument, we can also introduce new small incremental maximums by moving them from the other subsequence. Therefore the set of incremental maximums of the first subsequence can be any increasing subsequence of the original permutation.

Therefore the problem has been reduced to the following: does there exist an increasing subsequence of the given permutation such that it contains exactly k-m out of k big maximums, where m is its length?

Now we can assign a weight of 2 to all big maximums, and a weight of 1 to all other numbers, and our question simply becomes: "does there exist an increasing subsequence of weight k?"

Finally, we can notice that we can always decrease the weight of a subsequence by 2 by simply removing some elements, so it's enough to answer the question "what is the maximum weight of an increasing subsequence which has the same parity as k?" And this question can be answered by adapting an O(n log n) algorithm for the longest increasing subsequence.
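Putting the whole reduction together, a sketch of the adapted algorithm (the names are mine): two max-Fenwick trees indexed by value, one per weight parity, play the role of the single structure in the classic O(n log n) LIS:

```python
NEG = float('-inf')

def can_split(p):
    # p is a permutation of 1..n; weight 2 for "big" incremental
    # maximums of the whole permutation, weight 1 for everything else.
    # Per the reduction above, the answer is yes iff some increasing
    # subsequence has weight exactly k (k = number of big maximums),
    # i.e. iff the maximum weight of an increasing subsequence with
    # the parity of k is at least k.
    n = len(p)
    w, cur = [], 0
    for x in p:
        w.append(2 if x > cur else 1)
        cur = max(cur, x)
    k = w.count(2)
    tree = [[NEG] * (n + 1), [NEG] * (n + 1)]  # max-Fenwick per parity

    def update(par, i, val):
        while i <= n:
            tree[par][i] = max(tree[par][i], val)
            i += i & (-i)

    def query(par, i):  # max over values 1..i
        res = NEG
        while i > 0:
            res = max(res, tree[par][i])
            i -= i & (-i)
        return res

    best = [NEG, NEG]
    for x, wx in zip(p, w):
        dp = [NEG, NEG]
        for par in (0, 1):
            prev = query((par - wx) % 2, x - 1)
            if prev > NEG:
                dp[par] = prev + wx   # extend a subsequence ending below x
        dp[wx % 2] = max(dp[wx % 2], wx)  # or start a new subsequence at x
        for par in (0, 1):
            if dp[par] > NEG:
                update(par, x, dp[par])
                best[par] = max(best[par], dp[par])
    return best[k % 2] >= k
```

For instance, [2, 1, 3] can be split into [2, 1] and [3] with one incremental maximum each, while [1] obviously cannot be split into two parts with equal counts.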

Notice how we first obtained a working understanding of the solution space here — that the sequences of incremental maximums of the two parts can be any two disjoint increasing subsequences of the original permutation that cover all big maximums — and then the actual algorithmic solution was not that hard to see.

Thanks for reading, and check back (hopefully) tomorrow for this week's summary!

Sunday, January 16, 2022

A Stirling week

There were no contests that I'd like to mention this week, but this offers a good chance to look back at the first full week of 2022 :) The new competitive programming year started with Hello 2022 on Codeforces (problems, results, top 5 on the left, analysis). The new year started in just the same way it ended: with a victory for tourist, who has been in amazing form recently. Congratulations!

My round was unfortunately ruined by getting bogged down in minor details of the problems. I started by forgetting to reset the segment tree back to its initial state after processing each group in problem E, but that was still a relatively minor setback. The next problem, F, was nastier, and I spent quite a lot of time and energy dealing with the first and last groups of zeros properly. Finally, in problem G I just could not untangle myself in time, and my bug turned out to be simply forgetting to ignore smaller numbers to the right of a right-maximum when processing it.

TopCoder SRM 821 on Saturday started the race to qualify for the TCO22 regional events (problems, results, top 5 on the left, analysis). tourist has continued his impressive run of form by becoming the first person ever to win 5 SRMs in a row, also achieving a ridiculous rating of 4034 in the process. Very impressive!

The round featured a 250-pointer that required a math olympiad-style solution. Sure, once you know that such a solution is expected, finding it is not so hard, but I somehow tried to think more algorithmically and failed. Having seen that others were submitting the problem for very high scores, I assumed that the greedy works and submitted it, which at the very least allowed me to get unstuck and move on to the other two problems. Having submitted them, I implemented a stress-test for the 250 and quickly learned that the greedy does not in fact work, but I still could not come up with a working solution, hence no resubmit. I guess I was lucky to believe in the greedy, as otherwise I wouldn't have had enough time left for the two remaining problems even if I had figured the trick out.

I managed to make up the points I knew I was going to lose by challenging others who had submitted greedy and similar solutions for this problem, but even +450 challenge points were not enough, as tourist found +425 for himself. Nevertheless, I enjoyed the excitement of the challenge phase, and the need to trade off confidence that a challenge will succeed against the need to find many challenges in 15 minutes. Props to the TopCoder format for making such exciting rounds possible!

Finally, CodeChef SnackDown 2021 Final Round on Sunday wrapped up the contests of the previous year (problems, results, top 5 on the left, analysis). It was the first online edition of SnackDown, and the first one without the two-person team format (even though the organizers mentioned that they're planning to come back to the team format when the pandemic allows). Mingyang Deng has made steady progress while others stumbled, and therefore had a much better penalty on 6 problems and got the first prize. Well done!

My round went wrong when I quickly reduced the permutation problem to finding the Stirling numbers of the second kind S(n, k) for all k from 1 to a, but then spent at least an hour, and probably more, trying to figure out how to actually do that, including opening a physical copy of Concrete Mathematics searching for some identities that would help me :) However, this was a moment where no magical trick was needed, and one just needed to apply the technique that I described in my blog 5 years ago. I guess my statement back then was wrong: "However, C(n,k) seems to be the only practically relevant function with this property".
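For concreteness, the relevant identity is presumably the inclusion-exclusion formula S(n,k) = (1/k!) * sum_{j=0..k} (-1)^j * C(k,j) * (k-j)^n: expanding C(k,j) turns the whole row into a single convolution of the sequences (-1)^j/j! and i^n/i!, so under a prime modulus FFT gives all k at once. A plain sketch without the modulus:

```python
from math import comb, factorial

def stirling2_row(n, a):
    # S(n, k) for k = 1..a via
    # S(n, k) = (1/k!) * sum_{j=0..k} (-1)^j * C(k, j) * (k - j)^n
    row = []
    for k in range(1, a + 1):
        s = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
        row.append(s // factorial(k))
    return row
```

For example, the row for n=4 is S(4,1)=1, S(4,2)=7, S(4,3)=6, S(4,4)=1.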

This huge delay set the stage for my last-hour comeback: I managed first to deal with the quite tricky problem where one needs to take advantage of the fact that some probabilities are too low to significantly reduce the number of dynamic programming states, then to solve the problem about incremental maximums, and then finally to see the FFT trick with about 5 minutes remaining. I finished coding and submitted my solution with about 5 seconds left on the contest clock, after it had failed to compile locally, and it passed!

The compilation issues were due to the fact that I'm currently using Visual Studio as the backend for CLion, which compiles much faster but unfortunately lacks int128 support. My library used int128 to apply the CRT when doing FFT with three prime moduli, back from the times when I used gcc and clang. JHelper's inlining only works at the file level, so this version of FFT was being included even though it was not actually used.

Here is the incremental maximums problem: you are given a permutation of length 200000, and need to tell if it's possible to split it into two disjoint subsequences in such a way that both parts have the same number of incremental maximums (numbers that are bigger than all previous numbers in the part).

This is one of those problems that look quite ugly after reading the statement, but as you work towards the solution you develop a good understanding of the problem structure, and the solution is therefore quite rewarding when you see it in the end. So I encourage you to take a crack at it, even though of course it is very hard.

This problem has appeared before, but I somehow did not remember it at all even though I solved the relevant round. I guess I focused on other problems then, did not do much upsolving, and did not find it interesting enough for the blog. Well, at least that allowed me to enjoy solving it this time!

Thanks for reading, and check back next week.