Algorithms Weekly by Petr Mitrichev: This week in competitive programming

TopCoder SRM 653 kicked off this week's contests on Tuesday (problems, results, top 5 on the left). Egor and Kazuhiro were in a league of their own with amazingly fast solutions for both the medium and the hard problem, but Egor squeezed out the victory during the challenge phase - congratulations!

The most interesting part of this round, in my view, was coming up with a challenge for the easy problem. You were given a sequence of numbers with some numbers replaced by wildcards, and were guaranteed that the sequence before the replacement consisted of several consecutive segments of equal numbers, where each number is equal to the length of its segment. For example, 3, 3, 3, 4, 4, 4, 4, 2, 2, 2, 2 (one segment of three 3s, one of four 4s, and two segments of two 2s) could become 3, *, 3, *, *, 4, 4, *, *, 2, * with wildcards. The problem asked to check whether there's more than one way to reconstruct the numbers that were replaced by wildcards, and many people simply counted the number of ways to reconstruct the numbers and compared it with 1.

Many of those people used a 32-bit integer type to count the number of ways, but that is not sufficient for a sequence of length 100, so their solutions might fail if the total number of ways is k*2³²+1 for some k > 0. I found one such solution during the challenge phase and tried to create a testcase to fail it - but could not, and the system tests did not fail it either. However, right after the SRM Misha 'Endagorion' posted such a testcase on Codeforces. Can you come up with a tricky testcase without following that link?
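To make the overflow risk concrete, here is a minimal Python sketch of the counting approach, together with a variant that saturates the counter at 2 so that its size no longer matters. The wildcard marker `WILDCARD = -1` and all function names are assumptions made for illustration, not the actual TopCoder interface, and the sketch deliberately does not construct the tricky testcase.

```python
WILDCARD = -1  # hypothetical marker for a number replaced by a wildcard


def count_ways(seq):
    """Exact number of reconstructions (Python integers never overflow)."""
    n = len(seq)
    ways = [0] * (n + 1)
    ways[0] = 1
    for end in range(1, n + 1):
        for length in range(1, end + 1):
            segment = seq[end - length:end]
            # The last segment may have this length only if every known
            # number inside it equals the length.
            if all(x == WILDCARD or x == length for x in segment):
                ways[end] += ways[end - length]
    return ways[n]


def has_multiple_ways(seq):
    """Same DP, but the counter saturates at 2 and so fits in any integer type."""
    n = len(seq)
    ways = [0] * (n + 1)
    ways[0] = 1
    for end in range(1, n + 1):
        for length in range(1, end + 1):
            segment = seq[end - length:end]
            if all(x == WILDCARD or x == length for x in segment):
                ways[end] = min(2, ways[end] + ways[end - length])
    return ways[n] > 1


def wrap32(x):
    """Value an unsigned 32-bit counter would hold after wrapping around."""
    return x & 0xFFFFFFFF


W = WILDCARD
example = [3, W, 3, W, W, 4, 4, W, W, 2, W]
print(count_ways(example), has_multiple_ways(example))  # 7 True
print(wrap32(5 * 2**32 + 1))  # 1, indistinguishable from a unique answer
```

The saturating version answers the question the problem actually asks, "one way or more than one?", without ever storing a large number.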

Codeforces Round 296 (problems, results, top 5 on the left) happened later that day. I skipped the round, but want to congratulate piob on the amazing victory, which he achieved in just 54 minutes out of two hours!

On Saturday, VK Cup 2015 Round 1 pioneered a (relatively) original competition format: two-person teams (problems, results, top 5 on the left). Congratulations to Boris and Adam on the victory! The pre-round favorite team "Never Sorry" led through most of the contest, but had to resubmit the solution for the hardest problem several minutes before the end of the round and dropped to fourth place. The reason for their resubmission? Their solution made an out-of-bounds access: it tried to read the (n+1)-st character of an n-character string. Since they were using C++, this would have worked just fine had it been an ordinary string, but they had their own class since the string was constructed implicitly, and it had an explicit assertion for out-of-bounds accesses. Indeed, removing line 87 "assert(false);" from their first submission makes it pass the system test!

I have to admit that this example goes against my philosophy that stricter languages like Java or Pascal lead to a higher probability of passing the system test because more bugs can be caught during the coding phase. Of course, this is just one example :)

VK Cup 2015 Round 1 online mirror was held several hours later with a slightly modified problemset (problems, results, top 5 on the left). Congratulations to Ivan on his first victory on Codeforces!

Now, let's come back to the problem I described last week and the new data structure. You are given a tree with at most 10⁵ vertices, where each edge has an integer length, and a sequence of 10⁵ updates and queries. Each update asks us to color all vertices of the tree that are within the given distance of the given vertex with the given color. Each query requires you to output the current color of a given vertex.

The data structure, as described in the Russian post-match discussion and in another Codeforces comment, is called "Centroid Decomposition of a Tree". We start by finding the centroid of the tree: a vertex whose removal splits the tree into components of size at most N/2, where N is the number of vertices in the tree. One way to find such a vertex is to pick an arbitrary root, run a depth-first search computing the size of each subtree, and then walk down from the root, each time moving into the largest subtree, until we reach a vertex where no subtree has size greater than N/2.
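The centroid search described above can be sketched in Python as follows. The function name `find_centroid` and the adjacency-list representation (vertices numbered from 0) are assumptions made for this illustration; the depth-first search is written iteratively to stay clear of Python's recursion limit on deep trees.

```python
def find_centroid(adj, root=0):
    """Return a vertex whose removal leaves components of size <= n / 2.

    adj is the adjacency list of a tree with vertices 0..n-1.
    """
    n = len(adj)
    parent = [-1] * n
    order = []
    stack = [root]
    # Iterative DFS: record the visiting order and each vertex's parent.
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    # Parents appear before their children in `order`, so a reverse sweep
    # accumulates subtree sizes bottom-up.
    size = [1] * n
    for v in reversed(order):
        if parent[v] != -1:
            size[parent[v]] += size[v]
    # Walk down from the root into the subtree that is still too large.
    v = root
    while True:
        heavy = next((u for u in adj[v]
                      if u != parent[v] and 2 * size[u] > n), None)
        if heavy is None:
            return v
        v = heavy


# Path 0 - 1 - 2 - 3 - 4: the middle vertex is the centroid.
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(find_centroid(path))  # 2
```

When the walk stops at a vertex, every child subtree has size at most N/2 by the stopping condition, and the part "above" it is smaller than N/2 because we only ever stepped into subtrees larger than N/2.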

Let's mark the centroid with label 0, and remove it. After removing the centroid the tree separates into several parts of size at most N/2. Naturally, now we do the same recursively for each part, only marking the new centroids with label 1, then we get even more parts of size at most N/4, mark their centroids with label 2, and so on, until we reach parts of size 1. Since the size at least halves with each step, the labels will be at most log₂(N).
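Here is one possible Python sketch of the whole labelling process, under the same assumptions as before (adjacency lists, vertices numbered from 0). The names `centroid_decomposition`, `label` and `cparent` are made up for this example; `cparent` stores, for each centroid, the centroid of the enclosing part, which is what later paragraphs call the "parent".

```python
def centroid_decomposition(adj):
    """Return (label, cparent) for a tree given as an adjacency list.

    label[v]   is the level at which v was chosen as a centroid.
    cparent[v] is the centroid of the enclosing part (-1 for the top one).
    """
    n = len(adj)
    removed = [False] * n
    label = [-1] * n
    cparent = [-1] * n
    size = [0] * n

    def centroid_of_part(start):
        # Breadth-first traversal restricted to vertices not removed yet.
        par = {start: -1}
        order = [start]
        for v in order:                 # `order` grows while we iterate
            for u in adj[v]:
                if not removed[u] and u != par[v]:
                    par[u] = v
                    order.append(u)
        for v in order:
            size[v] = 1
        for v in reversed(order):       # children before parents
            if par[v] != -1:
                size[par[v]] += size[v]
        total = len(order)
        v = start
        while True:
            heavy = next((u for u in adj[v] if not removed[u]
                          and u != par[v] and 2 * size[u] > total), None)
            if heavy is None:
                return v
            v = heavy

    # Explicit work list instead of recursion: (any vertex of the part,
    # centroid of the enclosing part, label for this part's centroid).
    work = [(0, -1, 0)]
    while work:
        start, up, level = work.pop()
        c = centroid_of_part(start)
        label[c], cparent[c] = level, up
        removed[c] = True
        for u in adj[c]:
            if not removed[u]:
                work.append((u, c, level + 1))
    return label, cparent


# Path 0 - 1 - ... - 6: vertex 3 gets label 0, vertices 1 and 5 get label 1.
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
label, cparent = centroid_decomposition(path)
print(label)    # [2, 1, 2, 0, 2, 1, 2]
print(cparent)  # [1, 3, 1, -1, 5, 3, 5]
```

Each level of the process touches every remaining vertex a constant number of times, and there are at most about log₂(N) levels, so this construction takes O(N log N) time overall.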

The process of construction is displayed in the pictures on the left. The right subtree displays an interval tree analogy, while the left subtree shows that more unusual things can happen.

Now, consider any two vertices A and B in the tree and the path connecting them, and let's find the vertex C with the smallest label on that path. It's not hard to see that the path connecting A and B lies entirely in the part that vertex C was the centroid of in the above process, and that A and B lie in different parts that appear after removing C (unless one of them is C itself). So our path is a concatenation of two paths: from C to A, and from C to B.

Finding C given A and B is also easy: let's just keep a link from each vertex to its "parent" in the above process (if our vertex has label K, the parent will have label K-1), and let's repeatedly follow this link either on A or on B, whichever currently has a higher label, until the two coincide.
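The link-following procedure is short enough to sketch directly in Python. The function name and the arrays `label` and `cparent` are hypothetical; the sample arrays below were written out by hand for the 7-vertex path 0 - 1 - ... - 6, whose centroid is vertex 3, followed by vertices 1 and 5 at the next level.

```python
def smallest_label_on_path(a, b, label, cparent):
    """Follow decomposition-parent links until the two vertices coincide."""
    while a != b:
        # Lift whichever vertex currently has the higher label; when the
        # labels are equal both have to move up anyway, so either works.
        if label[a] >= label[b]:
            a = cparent[a]
        else:
            b = cparent[b]
    return a


# Hand-written decomposition of the path 0 - 1 - 2 - 3 - 4 - 5 - 6.
label = [2, 1, 2, 0, 2, 1, 2]
cparent = [1, 3, 1, -1, 5, 3, 5]
print(smallest_label_on_path(0, 2, label, cparent))  # 1
print(smallest_label_on_path(0, 6, label, cparent))  # 3
print(smallest_label_on_path(4, 5, label, cparent))  # 5
```

Since labels never exceed about log₂(N), the loop performs O(log N) steps.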

Notice that we've chosen O(N log N) paths in the tree (from each centroid to all vertices in the corresponding part) such that every path in the tree is a concatenation of two paths from that set, and we can find those two paths in O(log N) time. Such a decomposition of paths turns out to be useful in many problems.

Now, how does one solve the problem in question? Well, whenever we need to color all vertices B at distance at most D from the given vertex A with color X, we will group the possible B's by C - the vertex with the smallest label on the path from A to B, as described above. To find all possible C's, we just need to follow the "decomposition parent" links from A, and there are O(log N) such C's. For each candidate C, we will remember that all vertices in its part with distance at most D-dist(A,C) from C need to be colored with color X.

When we need to handle the second type of query, in other words when we know vertex B but not A, we can also iterate over possible candidate C's. For each C, we need to find the latest update recorded there where the distance is at least dist(B, C). After finding the latest update for each C, we just find the latest update affecting B by comparing them all, and thus learn the current color of B.

Finally, in order to find the last update for each C efficiently, we will keep the updates for each C in a stack where the distance decreases and the time increases (so the last item in the stack is always the last update, the previous item is the last update before that one that had larger distance, and so on). Finding the latest update with at least the given distance is now a matter of simple binary search.
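Putting the pieces together, the following Python sketch shows one way the updates and queries described above could be implemented. All names (`build`, `Painter`, `paint`, `color_of`) are invented for this illustration, the initial color 0 is an assumption since the statement above does not specify one, and edge lengths are assumed to be non-negative. Radii are stored negated so that each stack is an ascending list that the standard `bisect` module can search.

```python
from bisect import bisect_right


def build(adj):
    """adj[v] = list of (neighbour, edge_length). Returns anc, where anc[v]
    lists (centroid, distance) for every decomposition part containing v."""
    n = len(adj)
    removed = [False] * n
    anc = [[] for _ in range(n)]
    work = [0]
    while work:
        start = work.pop()
        # Collect the part in BFS order, then compute subtree sizes.
        par, order = {start: -1}, [start]
        for v in order:
            for u, _ in adj[v]:
                if not removed[u] and u != par[v]:
                    par[u] = v
                    order.append(u)
        size = {v: 1 for v in order}
        for v in reversed(order):
            if par[v] != -1:
                size[par[v]] += size[v]
        c = start
        while True:
            heavy = next((u for u, _ in adj[c] if not removed[u]
                          and u != par[c] and 2 * size[u] > len(order)), None)
            if heavy is None:
                break
            c = heavy
        # Distances from the centroid to every vertex of its part.
        dist, queue = {c: 0}, [c]
        for v in queue:
            anc[v].append((c, dist[v]))
            for u, w in adj[v]:
                if not removed[u] and u not in dist:
                    dist[u] = dist[v] + w
                    queue.append(u)
        removed[c] = True
        work.extend(u for u, _ in adj[c] if not removed[u])
    return anc


class Painter:
    def __init__(self, adj, initial_color=0):
        self.anc = build(adj)
        # Per centroid: radii strictly decreasing, times increasing.
        self.neg_radius = [[] for _ in adj]  # negated, hence ascending
        self.stamp = [[] for _ in adj]       # (time, color), same indices
        self.time = 0
        self.initial_color = initial_color

    def paint(self, a, d, color):
        self.time += 1
        for c, dist_ac in self.anc[a]:
            r = d - dist_ac
            if r < 0:
                continue
            radii, stamps = self.neg_radius[c], self.stamp[c]
            # Older updates with radius <= r are fully covered by this one.
            while radii and -radii[-1] <= r:
                radii.pop()
                stamps.pop()
            radii.append(-r)
            stamps.append((self.time, color))

    def color_of(self, b):
        best = (0, self.initial_color)
        for c, dist_bc in self.anc[b]:
            # k = number of stored radii >= dist_bc; the last of them is
            # the latest such update because times increase along the stack.
            k = bisect_right(self.neg_radius[c], -dist_bc)
            if k:
                best = max(best, self.stamp[c][k - 1])
        return best[1]


# Path 0 - 1 - 2 - 3 with edge lengths 1, 2, 1.
adj = [[(1, 1)], [(0, 1), (2, 2)], [(1, 2), (3, 1)], [(2, 1)]]
p = Painter(adj)
p.paint(1, 2, 7)  # reaches vertices 0, 1, 2
p.paint(3, 1, 9)  # reaches vertices 2, 3
print([p.color_of(v) for v in range(4)])  # [7, 7, 9, 9]
```

Every vertex belongs to O(log N) parts, so each update costs amortized O(log N) stack operations and each query costs O(log² N) because of the binary search at each candidate C.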

As usual, I'm expecting that some of you have known this data structure for ages. Still, I'd love to hear what you think about my explanation above! Also, please tell me if you've read a better explanation somewhere else.

And in any case, check back next week!

Sunday, March 22, 2015
